ISIDORE¶
What is ISIDORE?¶
ISIDORE is a search engine for discovering and finding publications, digital data and the profiles of researchers in the social sciences and humanities (SSH) from around the world.
The full text of several million documents (articles, theses and dissertations, reports, datasets, web pages, database records, descriptions of archival holdings, etc.) and event announcements (seminars, conferences, etc.) can be searched. In addition, ISIDORE links these millions of documents together by enriching them with scientific concepts created by SSH research communities.
It is accessible on the Web through the portal isidore.science.
It also offers scientific social network functionalities. As such, it falls into the category of search engines and assistants and offers many features to organize scientific monitoring.
Launched on December 8, 2010, ISIDORE is the result of a collaboration between the CNRS “very large equipment” Adonis (2007-2013), the Center for Direct Scientific Communication and the companies Antidot, Mondéca and Sword. It is currently developed, updated and operated by the IR* Huma-Num.
References on the history of ISIDORE:
- POUYLLAU, Stéphane. (2023) ISIDORE : reprise des mises à jour ! Carnet de recherche d’Huma-Num
- POUYLLAU, Stéphane, CAPELLI, Laurent, MINEL, Jean-Luc, BUNEL, Mélanie, SAURET, Nicolas, BAUDE, Olivier, JOUGUET, Hélène, BUSONERA, Pauline, & DESSEIGNE, Adrien. (2021). ISIDORE à 10 ans. Zenodo. 10.5281/zenodo.5699997
- Philippe Bourdenet, “L’espace documentaire en restructuration : l’évolution des services des bibliothèques universitaires”, Le serveur TEL (thèses-en-ligne), tel-00932683
- Yannick Maignien, “ISIDORE, de l’interconnexion de données à l’intégration de services”, Hyper Article en Ligne - Sciences de l’Homme et de la Société, sic_00593320
- Stéphane Pouyllau et al., “Bilan 2011 de la plateforme ISIDORE et perspectives 2012-2015”, MoDyCo, Modèles, Dynamiques, Corpus - UMR 7114, 10670/1.bqexsj
Who is it for?¶
ISIDORE is aimed at the entire international academic community, with the entire site available in 3 languages. It is also aimed at anyone wishing to deepen their knowledge of subjects relating to the humanities, social sciences and, more broadly, issues affecting societies around the world.
How does ISIDORE work?¶
ISIDORE harvests textual metadata and full text, enriches them and then indexes them. The goal is to analyze this information in order to enrich the documents, to link them to the concepts of scientific vocabularies (thesauri, etc.) and to link them to their authors’ identifiers (ORCID, IDRef, IDHAL, VIAF, etc.).
Several enrichments are performed:
- Semantic annotation: the words present in the metadata of the documents are compared to the entries of the vocabularies through an algorithm based on a morphological analysis of the terms. If an equivalence is found between a term from the document and an entry in one of the vocabularies, then the resource will be linked to that vocabulary entry. The vocabularies are multilingual and aligned with each other. Thus, the semantic annotation is multilingual.
- Disciplinary categorization: ISIDORE uses a semantic classifier that, after being trained on a reference corpus, categorizes all documents in ISIDORE into the SSH disciplines of the MORESS vocabulary. The classifier is trained with the help of the manual categorization completed by researchers in HAL when depositing their publications.
- Detection of the authors: ISIDORE detects the authors of the documents and enriches the author form (first name and last name) with the help of international (ORCID, VIAF, ISNI) and national (IDHAL, IDRef) author identifiers.
ISIDORE indexes, in its search engine:
- Document metadata;
- The full text (if it is available in open access);
- The semantic annotations;
- Disciplinary classification;
- Author enrichment and normalization.
More information is available on the “Vocabularies” page of ISIDORE.
Can ISIDORE index multilingual documents and data?¶
Yes. Since 2015, documents and datasets in English, Spanish and French have been indexed, enriched and linked to scientific repositories by ISIDORE (metadata and full text). Full text in any other language is indexed in the language of the document. For more information, you can consult our blog post on the subject: Isidore speaks English, sino también español et toujours en français.
How often is ISIDORE updated?¶
ISIDORE is updated, incrementally, on average once a day.
What is the circuit for adding collections in ISIDORE?¶
ISIDORE harvests “collections” of documents and data, which is the term used in ISIDORE. These collections may be data warehouses or databases containing metadata, data and links to this data.
Two scenarios:
- A research project, a team, a laboratory or a library can propose collections to be harvested by sending a simple e-mail to isidore-sources@huma-num.fr. The Huma-Num team studies the request and exchanges with the requester in order to fully understand how the metadata and the data to be indexed are described. Most often, a first harvest and a first indexing and enrichment are carried out so that the requester can see and analyze how their data will be indexed in ISIDORE. The exchanges may then continue in order to adjust the indexing process as well as possible.
- The Huma-Num team identifies a data warehouse or a digital library and contacts the data producer or the structure that distributes this data to exchange and propose harvesting and indexing in ISIDORE. A first harvest and a first indexing and enrichment are carried out so that the producer can see and analyze how their data will be indexed in ISIDORE. The exchanges may then continue in order to adjust the indexing process as well as possible.
How to use ISIDORE?¶
ISIDORE offers several tools to search, discover, collect and organize the contents it indexes:
The “isidore.science” portal¶
The isidore.science portal is a website in three languages that provides a relevance search engine that can be used with several query methods.
- By default, ISIDORE searches for all the words in the user's query after removing stop words (“of”, “the”, etc.);
- It is possible to search for a complete sentence or a group of words by putting quotation marks around it, for example: “direction of consciousness” will search for exactly this expression. In this case, “of” is not treated as a stop word.
Search operators¶
Several boolean search operators are available in ISIDORE. Note that operator syntax matters: operators are always written in UPPERCASE (e.g. AND):
- AND (intersection): finds the documents that contain all the terms (or groups of terms) in the query.
For example:
- consciousness AND gender
- “cold war” AND migration
- OR (union): finds the documents that contain either one of the terms (or groups of terms), or both.
For example:
- “semantic web” OR “web 3.0”
- NOT (exclusion): reduces noise by excluding terms. For example:
- revolution NOT French
- NEAR(n): the NEAR(n) operator (i.e. “close to”) links terms by specifying a proximity value n between them. It works like an AND with n word(s) between the terms: the value n indicates the number of words that separate the two terms. NEAR also works without the value n, in which case it is equivalent to NEAR(10), i.e. 10 words between the searched terms (standard spacing). For example:
- house NEAR(4) nobility : searches for house and nobility with a proximity of 4 words
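The operator rules above can be sketched as a small query-building helper. This is only an illustration, not part of ISIDORE: the function names `quote_phrase`, `combine` and `near` are hypothetical, and straight double quotes are assumed to be accepted for exact expressions.

```python
from typing import Optional


def quote_phrase(text: str) -> str:
    """Wrap a multi-word expression in quotes so it is searched exactly."""
    text = text.strip()
    return f'"{text}"' if " " in text else text


def combine(operator: str, left: str, right: str) -> str:
    """Join two terms with a boolean operator, always written in UPPERCASE."""
    op = operator.upper()
    if op not in {"AND", "OR", "NOT"}:
        raise ValueError(f"unsupported operator: {operator!r}")
    return f"{quote_phrase(left)} {op} {quote_phrase(right)}"


def near(left: str, right: str, distance: Optional[int] = None) -> str:
    """Build a proximity query; without a distance, plain NEAR is emitted."""
    op = "NEAR" if distance is None else f"NEAR({distance})"
    return f"{quote_phrase(left)} {op} {quote_phrase(right)}"


print(combine("and", "cold war", "migration"))  # "cold war" AND migration
print(near("house", "nobility", 4))             # house NEAR(4) nobility
```

Forcing the operator to uppercase in one place avoids the most common mistake, a lowercase "and" being treated as an ordinary search word.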
Sorting of search results¶
By default, in isidore.science, the results are sorted by semantic relevance. It is possible to change the sorting of the search results to:
- sorting by novelty
- sorting by author’s name in alphabetical order
- sorting by author’s name in reverse alphabetical order
- sorting by ascending date
- sorting by descending date
- sorting on the title by alphabetical order
- sorting on the title by reverse alphabetical order
Advanced Search¶
An advanced search is available at https://isidore.science/as and can also be reached from the first page of the portal.
Personal space for researchers¶
Isidore.science offers a personal space for researchers allowing them to:
- collect, classify and organize the documents found;
- gather all their scientific production in order to edit it in a personal profile page;
- follow the productions of colleagues;
- record and publish queries and their results for monitoring purposes;
- create bibliographies that can be exported to Zotero.
You need a HumanID account to use these features.
The APIs of isidore.science¶
The isidore.science search engine APIs are available through the GET method on HTTP or HTTPS. They provide a fast, accurate and reliable query service for ISIDORE data with advanced search features (auto-completion, spell checking, multi-criteria, boolean and faceted searches, sorting, aggregation of answers, etc).
Each request to the engine is submitted by means of a URI pointing to a specific web service. The response is a stream in XML (default format) or JSON format.
The isidore.science API web page details all the commands available for each service.
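As a rough sketch of what "a URI pointing to a specific web service" can look like, the snippet below only builds a GET URL and sends nothing. The endpoint `https://api.isidore.science/resource/search` and the parameter names `q` and `output` are assumptions made for illustration; check them against the API web page before use.

```python
from urllib.parse import urlencode

# Assumed endpoint, for illustration only: verify it on the API web page.
API_BASE = "https://api.isidore.science/resource/search"


def build_search_url(query: str, output: str = "json", base: str = API_BASE) -> str:
    """Return a GET URL for a search; 'q' and 'output' are assumed names."""
    return f"{base}?{urlencode({'q': query, 'output': output})}"


print(build_search_url("consciousness AND gender"))
```

Using `urlencode` takes care of escaping spaces and quotation marks, which matters for quoted expressions and boolean operators.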
To be noted
As of Sept. 15, 2023, ISIDORE no longer offers an RDF triplestore or a SPARQL endpoint.
Complementarity between ISIDORE and Zotero¶
Using the Zotero connector from ISIDORE to feed a bibliographic database¶
ISIDORE is compatible with Zotero. Once the user has installed the Zotero connector in their browser, document references can be imported from two places:
- On the page listing the results of a search,
- On the page displaying a document.
Using the ISIDORE search connector from Zotero¶
Zotero (Linux, macOS, Windows client) uses search engines to search for or complete bibliographic references directly from the Zotero interface. We offer two ISIDORE connectors for Zotero that make it possible to query ISIDORE by author.
By adding ISIDORE to Zotero you can:
- complete references from a search on the author’s name: this is the “ISIDORE, help me find what he/she has published” connector.
- find documents in which the author is cited: this is the “ISIDORE, what do you have on the author?” connector.
These connectors and installation documentation are available on the IR* Huma-Num GitLab.
Use of RSS feeds¶
ISIDORE can offer its search results in the form of RSS feeds in order to feed scientific monitoring software (Zotero, for example), research notebooks, etc. The RSS feeds created in ISIDORE are updated, like all the contents of the search engine, approximately once a month during the general update of the ISIDORE contents. It is thus possible to follow, from Zotero, updates to the ISIDORE documents matching the registered queries.
To do so, access your personal space (login required), and click “My queries” to see your registered queries.
For a registered query, click the “Request RSS feed of the query” pictogram on the right and copy the link.
The copied link has the form: https://isidore.science/feed/lt3913
If your browser is equipped with a module for reading RSS feeds, this link can be used directly in your browser. For our example, we will continue with Zotero.
In Zotero, choose New feed > From URI.
Then paste the URL of the feed provided by ISIDORE into the “URL” field of the Zotero RSS feed creation window (N.B. when using Safari under macOS, take care to remove the “feed:” prefix from the URL).
Then give your feed a title, for example: “isidore.science - Query on …”.
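Outside Zotero, the same feed can be read by a short script. The sketch below assumes the feed follows the usual RSS 2.0 layout (`item` elements with `title` and `link` children); the sample feed is invented for the example and the function name `list_feed_items` is hypothetical.

```python
import xml.etree.ElementTree as ET

# Invented sample, standing in for the content returned by a feed URL.
SAMPLE_FEED = """<rss version="2.0">
  <channel>
    <title>isidore.science - Query on example</title>
    <item>
      <title>First document</title>
      <link>https://example.org/doc/1</link>
    </item>
    <item>
      <title>Second document</title>
      <link>https://example.org/doc/2</link>
    </item>
  </channel>
</rss>"""


def list_feed_items(feed_xml: str):
    """Return (title, link) pairs for every item of an RSS 2.0 feed."""
    root = ET.fromstring(feed_xml)
    return [
        (item.findtext("title", default=""), item.findtext("link", default=""))
        for item in root.iter("item")
    ]


for title, link in list_feed_items(SAMPLE_FEED):
    print(title, link)
```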
What can be found in ISIDORE?¶
Organization of documents and data in ISIDORE¶
ISIDORE contains several million documents in SSH that are harvested, enriched with scientific references and indexed. They are organized into:
- Research documents and data (archives, raw materials, photographs, films, datasets, statistics, etc.), identified in the ISIDORE ontology by: http://isidore.science/class/primaires
- Published documents and data (articles, books, dissertations and theses, reports, etc.), identified in the ISIDORE ontology by: http://isidore.science/class/secondaires
- Scientific events (conferences, study days, etc.), identified in the ISIDORE ontology by: http://isidore.science/class/evenementielles
For a large number of SSH disciplines, ISIDORE makes it possible to search documents coming from the main publication platforms worldwide, as well as a large number of digitized collections from national, university and municipal libraries.
For advanced search uses, the ISIDORE advanced search offers, for example, the possibility of searching for documents between two dates and by discipline or by collections.
The main publication platforms (journals and books) present in ISIDORE are:
- OpenEdition
- Cairn
- Persée
- Érudit
- OAPEN
- Redalyc
- SciELO Books
The complete list of collections containing publications can be obtained by querying the search bar with the word “S” (SOURCES).
The main digital libraries (municipal, national, etc.) present in ISIDORE are:
- Gallica (BnF)
- E-rara
- NuBIS
- Octaviana
- Burgerbibliothek
- Berkeley Library Digital Collections
- Argonnaute
- BNE
- Cornell University
- Didómena
- …
Organization of documents and data by types¶
ISIDORE ontology of types¶
ISIDORE also sorts documents and data by their type, i.e. articles, datasets, photographs, theses, etc. This makes it possible to offer the “Document Type” filter in the isidore.science interface (https://isidore.science).
Most databases or data warehouses that feed ISIDORE use one or more standardized vocabularies to define these types. Most often, they are expressed using Dublin Core metadata (Element Set or Terms, see below the section on OAI and RDFa), and even though type repositories exist (see COAR below), there is very large heterogeneity between data producers.
In order to gather as many documents as possible into an exhaustive set of types, ISIDORE performs several processes on them. These mainly consist of grouping and merging types, starting from the type given by the data producer and aligning it with URIs of international reference systems. These grouping processes rely on an “ISIDORE ontology of types” whose entries are aligned with the international repositories COAR, BIBO, RDFS, DCAT and Wikidata.
The ISIDORE ontology of types is available online in XML (SKOS/RDF).
Since this is a processing done by ISIDORE, it should be noted that the labels of the ISIDORE ontology of types are available in English, French and Spanish like the rest of the enrichments created by ISIDORE.
Alignment of type ontologies between ISIDORE and NAKALA¶
In the Huma-Num infrastructure ecosystem, since the redesign of NAKALA in 2020, types are based on the international repository “COAR” (Confederation of Open Access Repositories), developed since 2008 by the European DRIVER program and widely used internationally in most scientific data platforms (OpenAIRE, etc.).
In 2020, an alignment of the ISIDORE ontology of types and the NAKALA types, using COAR, was implemented. This alignment is used both in the NAKALA data repository interface (it populates the “Repository Type” drop-down list) and in the ISIDORE type ontology. This alignment can be obtained by querying the RDF triplestores of ISIDORE or NAKALA. You can also download it.
To be noted
A cross-presentation of the triplestores will be offered as soon as NAKALA offers a triplestore on its new version. It is currently possible to use the NAKALA and ISIDORE APIs for this.
Indexing of the main data platforms in SSH¶
ISIDORE harvests and indexes the contents of many SSH data platforms, allowing researchers to group all their data in their user profile. We encourage researchers, for their research programs, to use platforms offering open interoperability devices and protocols to present documentary and scientific metadata.
The main data platforms (sources, archives but also publications) are harvested by ISIDORE.
Please feel free to report any new source to us.
Can data deposited and documented in NAKALA be referenced by ISIDORE?¶
Yes, data deposited and documented in NAKALA (the data repository for SSH by Huma-Num) can be made accessible in ISIDORE. NAKALA offers the OAI-PMH interoperability protocol as standard, which allows document metadata to be harvested and therefore referenced, enriched and indexed by ISIDORE.
However, referencing by OAI-PMH harvesting is not automatic for the moment, in particular to allow users to prepare and organize their data and metadata. To be referenced, simply send a request by email to isidore-sources@huma-num.fr.
How will scientific articles and images deposited in the HAL, HAL-SHS and MédiHAL open archive be accessible in ISIDORE?¶
All the files (PDF, illustrations, photographs, audio and video) deposited and documented in the open archive HAL, including HAL-SHS, as well as in MédiHAL, are automatically referenced in ISIDORE and indexed at the level of their metadata. All these documents and their records are thus accessible through the various query interfaces of ISIDORE.
Can the data deposited in the Didómena (EHESS) warehouse be referenced by ISIDORE?¶
Yes, Didómena (the research data warehouse of EHESS) offers OAI-PMH interoperability. Note, however, that harvesting is not automatic. For your collection to be referenced, please provide us with the OAI-PMH access point via isidore-sources@huma-num.fr.
Can the data deposited in the Data.sciencespo warehouse be referenced by ISIDORE?¶
Yes, the data deposited and documented in Data.sciencespo (Dataverse) offer interoperability in OAI-PMH. They are harvested automatically by ISIDORE.
Can the data deposited in the COCOON platform be referenced by ISIDORE?¶
Yes, the data deposited and documented in the COCOON platform offer interoperability in OAI-PMH. This platform is automatically harvested by ISIDORE.
Can files and documents deposited in the European Zenodo platform be referenced by ISIDORE?¶
Yes, it is possible for ISIDORE to reference the files and documents deposited and documented on the platform Zenodo.
The referencing is based on OAI-PMH harvesting of a set of files and data (and thus their metadata) identified by one or more “communities” identifiers in Zenodo (see https://developers.zenodo.org/#sets). We can also group several Zenodo identifiers in the same ISIDORE collection, so that depositors of several corpora in Zenodo can bring them together in ISIDORE to give them more visibility.
To add your Zenodo repositories to ISIDORE, please send us the OAI-PMH URL of your repository (see https://developers.zenodo.org/#oai-pmh).
Can files and documents deposited in the Gallica Marque Blanche platform be referenced by ISIDORE?¶
Yes, the data deposited and documented in Gallica Marque Blanche offer interoperability in OAI-PMH with a dedicated “Set”.
Can the Omeka farm powered by INIST-CNRS be referenced by ISIDORE?¶
Yes, it is possible for ISIDORE to reference the files and documents deposited and documented on the Omeka farm powered by INIST-CNRS.
How do I get data referenced by ISIDORE?¶
There are several ways to get data and documents referenced by ISIDORE:
- Submit your data via an XML stream of standardized metadata, using the OAI-PMH protocol with metadata in Dublin Core format. This method is suited to documentary databases, corpora, scientific archives and document/data libraries. As an example, a tool such as Omeka (Classic or S) offers the OAI-PMH protocol via modules.
- Expose Dublin Core metadata in your web pages using RDFa, together with a sitemap. This method is suited to research program websites presenting document or data corpora, scientific blogs (except Hypotheses.org), and web pages in general.
These two methods are also often implemented by data publication tools (CMS, etc.), for example:
Can a web site using Drupal be indexed by ISIDORE?¶
Yes, it is possible to have web pages generated by the Drupal CMS indexed by ISIDORE. There are two ways to do this, depending on the nature of the content of your pages:
- Either via the OAI-PMH protocol and in this case there are several modules for Drupal, see OAI-PMH for Drupal.
- Or via the use of a Dublin Core metadata structure in the web pages generated by Drupal using RDFa and a sitemap.xml. An article dedicated to this way of proceeding is available at the above address.
Can a website using Omeka Classic and Omeka-S be referenced by ISIDORE?¶
Yes, Omeka Classic and Omeka S offer modules to expose metadata according to the OAI-PMH protocol:
- Module for Omeka S
- Module for Omeka Classic
How to report data in ISIDORE with metadata and OAI-PMH protocol?¶
To report your data in ISIDORE using the OAI-PMH protocol, you just have to:
- Prepare your data and metadata using the Dublin Core Element Set or Dublin Core Terms documentary vocabulary, depending on the level of precision you want, and make them accessible via the OAI-PMH protocol;
- Organize and document the Sets in your OAI-PMH repository;
- Write to isidore-sources@huma-num.fr and give the address of the repository to Huma-Num.
Document sets in OAI-PMH: Sets¶
The OAI-PMH protocol makes it possible, through the creation of Sets, to bring together a coherent set of records whose perimeter makes sense from a scientific or editorial point of view and which is left to the discretion of the producer of the data.
It also makes it possible to define a hierarchy of Sets with an inheritance mechanism, by specifying in the Set's setSpec the parent Set and the child Set, separated by the : character. ISIDORE is able to use these Sets to limit harvesting to a set of records or to differentiate between data sources within the same warehouse.
The producer should therefore specify the harvesting methods that seem appropriate in order to make the most of their resources within ISIDORE. To do this, they must indicate the Set or Sets concerned, or a rule that identifies which Sets should be taken into account.
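To make the parent/child convention concrete, the sketch below splits a hierarchical Set identifier on the : character and tests whether a Set falls under a given parent. It is only an illustration of the naming rule, not ISIDORE's harvester; the function names `set_ancestry` and `belongs_to` are hypothetical.

```python
from typing import List


def set_ancestry(set_spec: str) -> List[str]:
    """Return the Set and all its ancestors, from the root down."""
    parts = set_spec.split(":")
    return [":".join(parts[:i]) for i in range(1, len(parts) + 1)]


def belongs_to(set_spec: str, parent: str) -> bool:
    """True if set_spec is the parent Set itself or one of its descendants."""
    return parent in set_ancestry(set_spec)


print(set_ancestry("SDV:SA:AEP"))    # ['SDV', 'SDV:SA', 'SDV:SA:AEP']
print(belongs_to("SHS:ECO", "SHS"))  # True
```

Comparing whole path segments rather than string prefixes avoids false matches between Sets whose names merely start with the same letters.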
Notes
Sets may correspond to the notion of collections in research data repositories like NAKALA, Didómena, MédiHAL, Recherche Data Gouv, etc.
The Sets can present metadata, in the Dublin Core Element Set, which are specific to them. For example:
<set>
<setSpec>OuvColl</setSpec>
<setName>OuvColl</setName>
<setDescription>
<oai_dc:dc xmlns:oai_dc="http://www.openarchives.org/OAI/2.0/oai_dc/" xmlns:dc="http://purl.org/dc/elements/1.1/" xmlns:dcterms="http://purl.org/dc/terms/" xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" xsi:schemaLocation="http://www.openarchives.org/OAI/2.0/oai_dc/ http://www.openarchives.org/OAI/2.0/oai_dc.xsd">
<dc:description>Research works distributed on Cairn.info</dc:description>
</oai_dc:dc>
</setDescription>
</set>
Records in OAI-PMH¶
In the ISIDORE framework, each OAI-PMH “record” corresponds to a document.
The ISIDORE harvester thus exploits the metadata described according to the application profile defined by the Open Archives Initiative for the Dublin Core Element Set (also known as “simple” Dublin Core).
In addition, the harvester also collects the full-text document(s) whose URLs (beginning with https:// or http://) are specified in the <dc:identifier> element.
We recommend that data producers provide records that are as metadata-rich as possible, since relevance ranking in ISIDORE favors rich metadata. The following fields:
<dc:title>
<dc:creator>
<dc:date>
or:
<dcterms:title>
<dcterms:creator>
<dcterms:date>
are required.
The <dcterms:description> field is highly recommended, as it improves indexing and thus the position of the documents in the search results.
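Before exposing a repository, a producer may want to check their own records for the required fields. The following is a minimal sketch of such a check, accepting either the dc: or the dcterms: namespace; the function name `missing_required_fields` and the sample record are invented for the example, and this is not the validation ISIDORE itself performs.

```python
import xml.etree.ElementTree as ET
from typing import List

NAMESPACES = (
    "http://purl.org/dc/elements/1.1/",  # dc:
    "http://purl.org/dc/terms/",         # dcterms:
)
REQUIRED = ("title", "creator", "date")

SAMPLE = """<oai_dc:dc xmlns:oai_dc="http://www.openarchives.org/OAI/2.0/oai_dc/"
                       xmlns:dc="http://purl.org/dc/elements/1.1/">
  <dc:title>Sample title</dc:title>
  <dc:creator>Fok, Michel</dc:creator>
</oai_dc:dc>"""


def missing_required_fields(metadata_xml: str) -> List[str]:
    """List the required fields that are absent or empty in a record."""
    root = ET.fromstring(metadata_xml)
    missing = []
    for name in REQUIRED:
        found = any(
            (element.text or "").strip()
            for namespace in NAMESPACES
            for element in root.iter(f"{{{namespace}}}{name}")
        )
        if not found:
            missing.append(name)
    return missing


print(missing_required_fields(SAMPLE))  # ['date']
```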
Standardization recommendations¶
Authors¶
ISIDORE is able to detect author forms in order to match them with their identifiers (IdRef, ORCID, etc., see above). This relies on some normalization of the <dc:creator> or <dcterms:creator> metadata: ISIDORE can be configured to detect the following forms (first name = %p, last name = %n, birth and death dates = (%t)):
- %n, %p
- %p %n
- %n, %p (%t)
As well as all other forms composed of these three elements.
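To show why a consistent author form matters, here is a small sketch that splits the three forms listed above into their parts. It is not ISIDORE's detection algorithm: the function name `parse_creator` is hypothetical, and for the "%p %n" form it assumes, as a simplification, that the last word is the last name.

```python
import re
from typing import Dict, Optional

# Optional trailing "(dates)" group, as in "%n, %p (%t)".
DATES_PATTERN = re.compile(r"^(.*?)\s*\(([^)]*)\)$")


def parse_creator(value: str) -> Dict[str, Optional[str]]:
    """Split a creator string into first name, last name and dates."""
    value = value.strip()
    dates = None
    match = DATES_PATTERN.match(value)
    if match:
        value, dates = match.group(1), match.group(2)
    if "," in value:
        # "%n, %p": last name first, separated by a comma.
        last, first = [part.strip() for part in value.split(",", 1)]
    else:
        # "%p %n": assume the last word is the last name (simplification).
        first, _, last = value.rpartition(" ")
    return {"first": first, "last": last, "dates": dates}


print(parse_creator("Fok, Michel"))
print(parse_creator("Hugo, Victor (1802-1885)"))
```

The comma form is unambiguous, whereas "first name last name" strings with compound surnames cannot be split reliably, which is one reason to prefer "%n, %p" in metadata.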
Dates¶
ISIDORE is able to detect different forms of dates. This relies on some standardization of the <dc:date> or <dcterms:date> metadata, using one of the following forms:
- ISO8601 (all variants of the standard)
- RFC822
- YYYY_MM_DD
- All variants that can be expressed with the Python strftime/strptime format codes (https://docs.python.org/fr/3/library/datetime.html#strftime-and-strptime-format-codes)
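As a hedged illustration of these date forms, the sketch below tries them in turn with the Python standard library. It covers only the subset of ISO 8601 that `datetime.fromisoformat` understands, and it is not ISIDORE's own date detection; the function name `parse_date` is hypothetical.

```python
from datetime import datetime
from email.utils import parsedate_to_datetime


def parse_date(value: str) -> datetime:
    """Parse an ISO 8601 subset, YYYY_MM_DD or RFC 822 date string."""
    value = value.strip()
    # Older Python versions do not accept a trailing "Z" in fromisoformat.
    iso_candidate = value[:-1] + "+00:00" if value.endswith("Z") else value
    try:
        return datetime.fromisoformat(iso_candidate)
    except ValueError:
        pass
    try:
        return datetime.strptime(value, "%Y_%m_%d")
    except ValueError:
        pass
    # RFC 822 style, e.g. "Sun, 29 Aug 2010 10:00:00 +0000";
    # raises an exception if the string matches none of the forms.
    return parsedate_to_datetime(value)


print(parse_date("2010-08-29"))
print(parse_date("2010_08_29"))
print(parse_date("Sun, 29 Aug 2010 10:00:00 +0000"))
```

Note that the results mix naive and timezone-aware values depending on whether the input carries an offset, so a real pipeline would normalize them before comparing.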
Links to full text or attachments¶
It is important that the metadata (via OAI or Sitemap+RDFa) provide access to an open-access web page, PDF or XML file.
Example of a complete record according to the OAI-PMH protocol:¶
<record>
<header>
<identifier>oai:halshs.archives-ouvertes.fr:halshs-00514304</identifier>
<datestamp>2010-09-02T11:06:50Z</datestamp>
<setSpec>halshs</setSpec>
<setSpec>SHS:ECO</setSpec>
<setSpec>SDV:BIO</setSpec>
<setSpec>INFO:INFO_BT</setSpec>
<setSpec>SDV:SA:AEP</setSpec>
<setSpec>SDV:SA:STA</setSpec>
<setSpec>CIRAD</setSpec>
<setSpec>SHS</setSpec>
</header>
<metadata>
<oai_dc:dc xmlns:oai_dc="http://www.openarchives.org/OAI/2.0/oai_dc/" xmlns:dc="http://purl.org/dc/elements/1.1/" xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" xsi:schemaLocation="http://www.openarchives.org/OAI/2.0/oai_dc/ http://www.openarchives.org/OAI/2.0/oai_dc.xsd">
<dc:identifier>http://halshs.archives-ouvertes.fr/halshs-00514304/en/</dc:identifier>
<dc:identifier>http://halshs.archives-ouvertes.fr/docs/00/51/43/98/PDF/Regulation_GMO_pprint.pdf</dc:identifier>
<dc:identifier>http://halshs.archives-ouvertes.fr/docs/00/51/43/98/PDF/ppt_nocmt_broader_regulation.pdf</dc:identifier>
<dc:title>Broadening the scope of regulation: a prerequisite for a positive contribution of transgenic crop use to sustainable development</dc:title>
<dc:creator>Fok, Michel</dc:creator>
<dc:subject>[SHS:ECO] Humanities and Social Sciences/Economy and finances</dc:subject>
<dc:subject>[SDV:BIO] Life Sciences/Biotechnology</dc:subject>
<dc:subject>[INFO:INFO_BT] Computer Science/Biotechnology</dc:subject>
<dc:subject>[SDV:SA:AEP] Life Sciences/Agricultural sciences/Agriculture, economy and politics</dc:subject>
<dc:subject>[SDV:SA:STA] Life Sciences/Agricultural sciences/Sciences and technics of agriculture</dc:subject>
<dc:subject>regulation</dc:subject>
<dc:subject>coordination</dc:subject>
<dc:subject>GMO</dc:subject>
<dc:subject>biotechnology</dc:subject>
<dc:subject>seed price</dc:subject>
<dc:subject>research</dc:subject>
<dc:subject>weed resistance</dc:subject>
<dc:subject>pest complex shift</dc:subject>
<dc:description>Ex-ante regulation of transgenic crop use generally prevails, before the authorization of commercial release. This kind of regulation addresses the concerns of biosafety and coexistence, under pressure of pros and/or cons of GMO. After fifteen years of large scale use of transgenic crops (notably soybean and cotton) in various countries (USA, China, Brasil, India...), ecological and economic phenomena are observed and which could threaten the sustainable use of transgenic varieties. I advocate that the regulation scope must be extended so as to a) promote a systemic and coordinated approach of transgenic crop use, b) ensure seed purity with regard to the transgenic trait, c) maintain research on non-transgenic varieties, and d) warrant fair pricing of transgenic seeds.</dc:description>
<dc:coverage>Montpelier</dc:coverage>
<dc:coverage>France</dc:coverage>
<dc:date>2010-08-29</dc:date>
<dc:language>english</dc:language>
<dc:type>proceeding with peer review</dc:type>
<dc:source>Proceedings of Agro2010, the XIth ESA Congress</dc:source>
<dc:source>Agro2010, the XIth ESA Congress</dc:source>
</oai_dc:dc>
</metadata>
</record>
The ISIDORE harvester is able to use the Dublin Core Terms and Dublin Core Element Set metadata format and PDF or XML files allowing full-text exposure (including TEI or EAD) thus improving its indexing. The data producer will have to take care to scrupulously respect the specifications of the OAI-PMH protocol version 2.0, in particular as regards:
- Strict respect of the “datestamp” values in the OAI verbs ListIdentifiers and GetRecord, in order to synchronize updates between the producer and ISIDORE;
- Proper handling of deleted records (see the OAI-PMH protocol documentation for details);
- In the case of a publisher’s data warehouse or one of significant size, access to the OAI-PMH warehouse for the IP addresses of ISIDORE’s OAI-PMH harvesters (the harvesting by ISIDORE should be reported to the producer's IT department).
We advise producers to regularly validate the compliance of their repository using, for example, the tools of the Open Archives Initiative. Finally, we advise data producers to contact the Huma-Num team with any request for information.
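The two points about datestamps and deleted records can be illustrated with a short sketch. The request parameters (`verb`, `metadataPrefix`, `from`, `set`) and the `status="deleted"` header attribute come from the OAI-PMH 2.0 specification; the repository address, the sample response and the function names are invented for the example, and this is not ISIDORE's harvester code.

```python
import xml.etree.ElementTree as ET
from typing import List, Optional, Tuple
from urllib.parse import urlencode

OAI_NS = "{http://www.openarchives.org/OAI/2.0/}"

SAMPLE_RESPONSE = """<OAI-PMH xmlns="http://www.openarchives.org/OAI/2.0/">
  <ListIdentifiers>
    <header>
      <identifier>oai:example.org:1</identifier>
      <datestamp>2010-09-02T11:06:50Z</datestamp>
    </header>
    <header status="deleted">
      <identifier>oai:example.org:2</identifier>
      <datestamp>2010-09-03T08:00:00Z</datestamp>
    </header>
  </ListIdentifiers>
</OAI-PMH>"""


def list_identifiers_url(base_url: str, since: Optional[str] = None,
                         set_spec: Optional[str] = None) -> str:
    """Build an incremental ListIdentifiers request from a datestamp."""
    params = {"verb": "ListIdentifiers", "metadataPrefix": "oai_dc"}
    if since:
        params["from"] = since  # only records changed since this datestamp
    if set_spec:
        params["set"] = set_spec
    return f"{base_url}?{urlencode(params)}"


def split_headers(response_xml: str) -> Tuple[List[str], List[str]]:
    """Separate live identifiers from those flagged as deleted."""
    live, deleted = [], []
    for header in ET.fromstring(response_xml).iter(f"{OAI_NS}header"):
        identifier = header.findtext(f"{OAI_NS}identifier")
        (deleted if header.get("status") == "deleted" else live).append(identifier)
    return live, deleted


print(list_identifiers_url("https://example.org/oai", since="2010-09-02", set_spec="halshs"))
print(split_headers(SAMPLE_RESPONSE))
```

If datestamps are not updated when a record changes, an incremental request such as this one never sees the change, which is why the protocol's datestamp rules matter for synchronization.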
How to report data in ISIDORE with RDFa metadata?¶
RDFa can express a metadata structure according to the principles of the Semantic Web (RDF stands for Resource Description Framework) in the HTML code of web pages. The “a” in RDFa stands for “in attributes”, i.e. within the HTML code.
How can the metadata of a web page be expressed very simply using the RDFa syntax, for example in a blog post published with WordPress? While plugins exist to do this, they can become obsolete, which makes them difficult to maintain over time. Another solution is to implement RDFa in the HTML code of the WordPress theme you have chosen. To keep this easy and manageable over time, the simplest way is to use the HTML header to place <meta> tags that will contain some metadata.
Expressing metadata according to the RDF model via the RDFa syntax allows machines (mainly search engines and indexers) to process information better because it becomes more explicit: for a machine, a string can be a title or a summary, and if you do not tell the machine which one it is, it will not guess. So, at the very least, it is possible to use <meta> tags to define an RDF structure that holds the minimal metadata, for example with the Dublin Core Element Set.
How to do it practically?¶
First of all, it is necessary to indicate in the DOCTYPE of the web page, that it will contain information that will use the RDF model, so the DOCTYPE will be:
<!DOCTYPE html PUBLIC "-//W3C//DTD XHTML+RDFa 1.0//EN" "http://www.w3.org/MarkUp/DTD/xhtml-rdfa-1.dtd">
The <html> tag must contain the addresses of the ontologies (via their XML namespaces) which are used to “type” the information. RDFa, which places metadata in the Semantic Web, requires at least the use of the RDF and RDF Schema ontologies and the Dublin Core Element Set (dc). In addition, the Dublin Core Terms (dcterms) can be used to refine the metadata:
<html xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#"
xmlns:rdfs="http://www.w3.org/2000/01/rdf-schema#"
xmlns:xsd="http://www.w3.org/2001/XMLSchema#"
xmlns:dc="http://purl.org/dc/elements/1.1/"
xmlns:dcterms="http://purl.org/dc/terms/">
To encode more information, it is possible to use more document ontologies:
<html
xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#"
xmlns:dc="http://purl.org/dc/elements/1.1/"
xmlns:dcterms="http://purl.org/dc/terms/"
xmlns:skos="http://www.w3.org/2004/02/skos/core#"
xmlns:rdfs="http://www.w3.org/2000/01/rdf-schema#"
xmlns:xsd="http://www.w3.org/2001/XMLSchema#"
xmlns:foaf="http://xmlns.com/foaf/0.1/"
xmlns:cc="http://creativecommons.org/ns#">
In the example above, foaf is used to encode information about a person or object described by the metadata. The CC ontology is used to indicate which Creative Commons license applies to this content.
The RDFa structure is expressed through tags in the <head> header of the HTML page. As a first step, using a <span> tag (with foaf to define the type), we define the digital object to which the RDF-encoded information will be attached:
<span typeof="foaf:page" about="URL of the page">
This tag defines a container for the information that we are going to indicate using the <meta> tags. This container is identified by a URI which is a URL, i.e. the address of the page on the web.
The <meta> tags then define a set of metadata, which in our case is descriptive information about the blog post’s web page:
<span typeof="foaf:page" about="URL of the page">
<meta property="dc:title" content="The title of my post" />
<meta property="dc:creator" content="First name Last name of author 1" />
<meta property="dc:creator" content="First name Last name of author 2" />
<meta property="dcterms:created" content="2011-01-27" />
<meta property="dcterms:abstract" content="A descriptive summary of my page's content in french" xml:lang="fre" />
<meta property="dcterms:abstract" content="A summary in english" xml:lang="eng" />
<meta property="dc:subject" content="keyword A" />
<meta property="dc:subject" content="keyword B" />
<meta property="dc:subject" content="keyword C" />
<meta property="dc:type" content="Web page" />
<meta property="dc:licence" content="Licence" />
<meta property="dc:format" content="text/html" />
<meta property="dc:relation" content="A link to a complementary web page" />
</span>
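Before publishing, it can be useful to check locally which property/value pairs a page actually exposes. The following is a minimal sketch in Python, not ISIDORE's own extractor: the class and function names (RdfaMetaCollector, collect_rdfa_meta) and the sample page are assumptions made for illustration, and only the standard library html.parser module is used.

```python
from html.parser import HTMLParser


class RdfaMetaCollector(HTMLParser):
    """Collect (property, content, language) triples from <meta> tags.

    Hypothetical helper for a local check; it does not validate RDFa.
    """

    def __init__(self):
        super().__init__()
        self.triples = []

    def handle_starttag(self, tag, attrs):
        # Self-closing <meta ... /> tags are also routed here by HTMLParser.
        if tag != "meta":
            return
        attributes = dict(attrs)
        if "property" in attributes:
            self.triples.append(
                (
                    attributes["property"],
                    attributes.get("content", ""),
                    attributes.get("xml:lang"),
                )
            )


def collect_rdfa_meta(html_text):
    """Return the list of triples found in the given HTML text."""
    collector = RdfaMetaCollector()
    collector.feed(html_text)
    collector.close()
    return collector.triples


# Sample fragment modeled on the example above (the URL is a placeholder).
SAMPLE_PAGE = """
<span typeof="foaf:page" about="http://monsiteweb.com/page1/">
<meta property="dc:title" content="The title of my post" />
<meta property="dc:creator" content="First name Last name of author 1" />
<meta property="dcterms:abstract" content="A summary in english" xml:lang="eng" />
</span>
"""

for prop, content, lang in collect_rdfa_meta(SAMPLE_PAGE):
    print(prop, "|", content, "|", lang)
```

A missing dc:title or an empty content attribute shows up immediately in the printed list.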
Depending on the nature of the content of the web page, it is of course possible to be more precise, more refined and more complete in the encoded information. For example, it would be wise to use the DC Terms vocabulary.
The DC Terms allow, for example, a precise form for a bibliographic reference of the content to be included:
<meta property="dcterms:bibliographicCitation" content="Put a bibliographic reference here" />
It would be possible to describe the entire text of a web page using the SIOC vocabulary using the property.
It is also possible to link web pages together (to define a corpus of authors, for example) by using the DC Terms property dcterms:isPartOf.
<meta property="dcterms:isPartOf" content="URL of another web page" />
Creating the Sitemap¶
Once the RDFa encoding has been done in the HTML pages, you still need to create a Sitemap XML file listing the pages you want ISIDORE to harvest and to submit the URL of this sitemap:
<urlset xmlns="http://www.sitemaps.org/schemas/sitemap/0.9">
<url>
<loc>http://monsiteweb.com/</loc>
<lastmod>2018-01-01</lastmod>
<changefreq>monthly</changefreq>
<priority>1.0</priority>
</url>
<url>
<loc>http://monsiteweb.com/page1/</loc>
<lastmod>2018-03-05</lastmod>
<changefreq>weekly</changefreq>
<priority>0.5</priority>
</url>
</urlset>
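For a site with many pages, the sitemap can be generated rather than written by hand. The sketch below is one possible approach using Python's standard xml.etree.ElementTree module. The function name build_sitemap and the page list are assumptions for illustration, and only the loc and lastmod elements from the example above are emitted.

```python
import xml.etree.ElementTree as ET

SITEMAP_NS = "http://www.sitemaps.org/schemas/sitemap/0.9"


def build_sitemap(pages):
    """Return a sitemap XML string for an iterable of (url, last_modified) pairs.

    Hypothetical helper: last_modified is expected as a YYYY-MM-DD string.
    """
    urlset = ET.Element("urlset", {"xmlns": SITEMAP_NS})
    for url, last_modified in pages:
        entry = ET.SubElement(urlset, "url")
        ET.SubElement(entry, "loc").text = url
        ET.SubElement(entry, "lastmod").text = last_modified
    return ET.tostring(urlset, encoding="unicode")


# Placeholder pages taken from the example above.
PAGES = [
    ("http://monsiteweb.com/", "2018-01-01"),
    ("http://monsiteweb.com/page1/", "2018-03-05"),
]

print(build_sitemap(PAGES))
```

The resulting string can be saved as sitemap.xml (UTF-8) at the root of the site.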
You can test how ISIDORE will extract your RDFa metadata using the “ISIDORE on demand” application, available at https://rd.isidore.science/ondemand/fr/rdfa.html
ISIDORE perimeter¶
Why are some items not found in ISIDORE?¶
If you do not find all of your scientific production in ISIDORE, there may be several explanations. It may be that your articles are published in journals that are not electronic or that do not make their articles available even long after they have been published. Since its creation, ISIDORE has favored open access, because indexing is better for articles available in open access. Many electronic journals have made this choice through portals such as OpenEdition Journals (formerly Revues.org), Érudit, Persée, Cairn.info, Redalyc and OApen, and articles from these journals are therefore collected and indexed by ISIDORE.
It is also possible that your articles are published online, but on an ordinary website rather than an electronic publishing platform, or on an electronic publishing platform that does not allow indexing via the standard protocol (see the question and answer on OAI-PMH).
Other journals make their articles available, but only after an embargo period. In this case, ISIDORE indexes only the metadata of the article. If you connect via your university library, documentation center or BibCNRS, you may still have access to these articles.
The collections indexed by ISIDORE can be searched by using the engine itself and by indicating that you want to search the collections.
It is also possible that your article is published as an image-only PDF, in which case ISIDORE can reference it but cannot index its full text.
Lastly, it is possible that some of your articles are published in journals that are not classified in SSH.
In all these cases, you can deposit your articles in an open archive such as HAL (HAL-SHS in particular), which is also indexed by ISIDORE, or contact your university library or documentation center.
If none of these cases corresponds to your problem and you think that there may be an error, you can send us an e-mail at isidore@huma-num.fr.
Why are some books/chapters of books not reported in ISIDORE?¶
ISIDORE can identify documents of the type “book”; more than 500,000 books and book chapters are thus reported in ISIDORE.
It should be noted that relatively few platforms publish online books in open access. In SSH, ISIDORE indexes, for example, the contents of book platforms such as:
- OpenEdition Books (at the chapter level, and to flag them);
- Scielo Books (Brazil);
- OApen (Netherlands);
- Érudit (Canada);
- …
In addition, you can, in agreement with your publisher, deposit your work or book or book chapters in the open archive HAL-SHS. It will then be indexed by ISIDORE within the framework of the indexing of HAL-SHS and recognized as a book chapter.
Why are some databases not reported in ISIDORE?¶
Harvesting by ISIDORE requires that metadata (documentary, scientific, etc.) be exposed in a standardized way, either using the OAI-PMH protocol or using an XML Sitemap and RDFa metadata (see above).
If you know of any databases that are not present in ISIDORE, please inform us so that we can check with their publishers/data producers.
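To see what OAI-PMH exposure looks like from a harvester's side, the sketch below builds a ListRecords request URL. The verb and metadataPrefix parameters and the oai_dc prefix come from the OAI-PMH protocol itself. The function name oai_list_records_url and the endpoint https://example.org/oai are placeholders assumed for illustration, and the sketch does not claim to reproduce how ISIDORE's harvester is implemented.

```python
from urllib.parse import urlencode


def oai_list_records_url(base_url, metadata_prefix="oai_dc", set_spec=None):
    """Build an OAI-PMH ListRecords request URL (hypothetical helper).

    base_url is the repository's OAI-PMH endpoint; set_spec optionally
    restricts the request to one set of records.
    """
    params = {"verb": "ListRecords", "metadataPrefix": metadata_prefix}
    if set_spec is not None:
        params["set"] = set_spec
    return base_url + "?" + urlencode(params)


# Placeholder endpoint, not a real repository.
print(oai_list_records_url("https://example.org/oai"))
# prints https://example.org/oai?verb=ListRecords&metadataPrefix=oai_dc
```

If a database answers such a request with Dublin Core records, it is a reasonable candidate to report for harvesting.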
ISIDORE training courses¶
Here we list training courses, functional presentations and online self-training courses on the use of ISIDORE. Do not hesitate to let us know about any training session you would like to organize:
- e-learning session for ISIDORE by URFIST Méditerranée
- “Isidore, my personal research assistant” by Johanna Daniel (April 2020)
- Introduction to ISIDORE : a discovery tools for social sciences and humanities (2019)
Updates¶
The list of updates and the release log for ISIDORE are available on the website: https://isidore.science/releases.