Note
Document in progress
A quick guide to describing your data in NAKALA
The quality and richness of data description are central criteria of the [FAIR principles](https://doranum.fr/enjeux-benefices/principes-fair/). They are the means of achieving the intended objectives: ensuring that data is accessible, interoperable and reusable. Quality can be achieved, for example, by:
- using standardized reference vocabularies,
- applying the same descriptive conventions throughout a data set,
- choosing the metadata fields best suited to the information at hand.
Richness is achieved by completing as many fields as possible to optimize data comprehension.
In NAKALA, description is based on a minimum set of five pieces of information, which can be enriched extensively and cumulatively.
Note
The description of collections in NAKALA follows the same principles and uses the same model as the description of data. The main difference is that the mandatory metadata for a collection are its Status (private or public) and its Title.
Contents
- Status of the NAKALA data description guide
- Metadata in NAKALA
- NAKALA data description principles
- Mandatory and strongly recommended metadata
  - Data type (mandatory)
  - Title (mandatory)
  - Authors (mandatory)
  - Creation date (mandatory)
  - License (mandatory)
  - Description (recommended)
  - Keywords (recommended)
  - Language (recommended)
- Publication of metadata after DOI assignment
Status of the NAKALA data description guide
The current version of the guide offers a set of tips and best practices for the mandatory metadata fields and for the first level of additional fields. It will be expanded over time.
Feedback that helps it evolve is welcome.
Metadata in NAKALA
- Mandatory fields in NAKALA:
When depositing data in NAKALA, you must complete five mandatory metadata fields:
- Title
- Author
- Date
- Type
- License
These Dublin-Core-inspired fields provide a minimal description of each piece of data.
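As an illustration, the five mandatory fields can be checked locally before a deposit. The sketch below is not NAKALA’s own validation: the record is a plain Python dictionary, and the lowercase key names and the helper name are hypothetical, chosen only for this example.

```python
# Hypothetical key names standing in for the five mandatory NAKALA fields.
MANDATORY_FIELDS = ("title", "author", "date", "type", "license")


def missing_mandatory_fields(record):
    """Return the mandatory fields that are absent or empty in `record`."""
    return [
        name
        for name in MANDATORY_FIELDS
        if not str(record.get(name) or "").strip()
    ]


record = {
    "title": "Tour de la Défense: construction view",
    "author": "Doe, Jane",  # placeholder name
    "date": "2021-03",
    "type": "",  # not chosen yet
}
print(missing_mandatory_fields(record))
```

Running such a check before submission makes it easy to see which of the five fields still need a value.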
- Optional Dublin-Core vocabulary fields:
In addition to the five fields of the NAKALA description record (inspired by Dublin-Core), you can add, and repeat, any other field from the qualified Dublin-Core vocabulary.
The Dublin-Core vocabulary (“DC”) consists of:
- a base (“simple DC”) of fifteen very generic description fields (contributor, coverage, creator, date, description, format, identifier, language, publisher, relation, rights, source, subject, title, type);
- an extension (“qualified DC”) with:
  - additional headings (audience, provenance, rightsHolder…);
  - refinement qualifiers to specify the basic headings (for example, available, created, dateAccepted, dateCopyrighted, dateSubmitted, issued, modified and valid are all qualifiers that specify the generic notion of date);
  - encoding schemes and controlled vocabularies to express field values (for example: DCMIType, W3CDTF…).
- Other vocabularies:
NAKALA’s deposit interface does not currently support any description format other than Dublin-Core. However, depositors can attach their own metadata files, in the format and vocabulary of their choice, to their data. That specific description can then be used, for example, in a web presentation outside NAKALA.
NAKALA data description principles
Data descriptions should be as rich, precise and accurate as possible.
In addition to the mandatory fields, it’s a good idea to add any other information you know about the data:
- Use qualified DC terms whenever possible, rather than simple DC terms.
- When the content of a heading is expressed in a specific language, specify this language using the lang attribute.
- Wherever relevant, use formal syntax or controlled vocabulary rather than free text.
- When several values of the same kind need to be given (several keywords, several contributors), repeat the same term once per value.
- Do not pack several values into a single field using separator characters.
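The principles above on repetition and language tagging can be sketched as follows. The example assumes a record modelled as a list of entries with `term`, `value` and `lang` keys; this structure and the helper name are hypothetical and do not reflect NAKALA’s API. The choice of `;` and `|` as legacy separators is also an assumption.

```python
import re


def split_into_repeated_fields(term, packed_value, lang=None):
    """Turn one separator-packed value into one entry per value.

    Splits on ';' or '|' (an assumption about the separators in use),
    trims whitespace and drops empty pieces.
    """
    values = [v.strip() for v in re.split(r"[;|]", packed_value)]
    return [
        {"term": term, "value": v, "lang": lang}
        for v in values
        if v
    ]


# Discouraged: several keywords packed into a single field.
packed = "architecture; construction site ; skyscraper"

# Preferred: the same term repeated, each value tagged with its language.
for entry in split_into_repeated_fields("dcterms:subject", packed, lang="en"):
    print(entry)
```

Each keyword ends up in its own dcterms:subject entry, so that every value can be searched and tagged with a language independently.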
Note
Huma-Num supports networks of disciplinary or professional data experts via its network of consortia and the regional relays represented by the MSHs. As far as possible, reflection on data description should include disciplinary harmonization in the choice of vocabularies and in the way information is completed. Networks are presented in the [Consortia and Networks](https://www.huma-num.fr/) tab.
Mandatory and strongly recommended metadata
Data type (mandatory)
This field specifies the main type of the data deposited. The list of types available in NAKALA is closed and is taken from the resource type vocabulary of COAR, the Confederation of Open Access Repositories. This field cannot be repeated.
List of data types:
Link
API request to query the list of types:
curl -X GET "https://api.nakala.fr/vocabularies/datatypes" -H "accept: application/json"
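The same request can be issued from a script. Below is a minimal Python sketch using only the standard library. The function names are illustrative, and since the shape of the returned JSON is not described here, the response is simply decoded and handed back without further interpretation.

```python
import json
import urllib.request

API_BASE = "https://api.nakala.fr"


def build_vocabulary_request(name):
    """Build the GET request for a NAKALA vocabulary such as 'datatypes'."""
    return urllib.request.Request(
        f"{API_BASE}/vocabularies/{name}",
        headers={"accept": "application/json"},
        method="GET",
    )


def fetch_vocabulary(name, timeout=10):
    """Send the request and decode the JSON body (requires network access)."""
    request = build_vocabulary_request(name)
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return json.load(response)


# Inspect the request without sending it.
request = build_vocabulary_request("datatypes")
print(request.get_method(), request.full_url)

# To actually run the query:
# data_types = fetch_vocabulary("datatypes")
```

Passing "licenses" instead of "datatypes" targets the license vocabulary endpoint mentioned in the License section.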
It is possible to specify the nature or genre of the resource content using the optional dcterms:type field in “add metadata”.
Title (mandatory)
Describe the data with a title or name. The title should be precise and unique, so that the data can be better understood.
- For a photograph, the main subject of the image: “Tour de la Défense: construction view”.
- For a press article, its title: “Launch of the construction of the Tour de la Défense”.
Depending on the needs and uses of the data concerned, the title can include references to dates, places, people, etc.
The mandatory title field can be repeated to indicate different languages.
To mention a secondary title, an abbreviated title or any other name given to the resource, use the dcterms:alternative field instead.
Note
The data title is different from the name of the associated file(s) in the repository. A data item in NAKALA consists of a description record accompanied by one or more files. The naming of data files must also be organized and planned. Rules are explained in the [prepare data](/nakala-preparer-ses-donnees/) section.
Authors (mandatory)
In the Author field, which is mandatory by default, we recommend naming the producer of the data. However, this is not always appropriate for the data deposited, or even possible (for example when the author is unknown).
To cover the different ways of describing the “Author” role, you can:
- Repeat the Author field.
- Add a dcterms:creator field.
- Leave the Author field anonymous and add an optional dcterms:creator field, for example when the author cannot be expressed as a surname and first name.
Other fields that describe a role in relation to the data are available in “add metadata”:
- dcterms:publisher
- dcterms:contributor
It’s important to take into account how the data will be cited.
Creation date (mandatory)
Give the date on which the content of the resource was created, not the date on which it was digitized, when digitization took place afterwards.
If the deposited resource is only a digital surrogate of the object described, e.g. the digitization of an old manuscript, give the creation date of the original object.
This field accepts the following W3CDTF forms:
- YYYY-MM-DD (year-month-day). Example: 2021-03-02
- YYYY-MM (year-month). Example: 2021-03
- YYYY (year). Example: 2021
This field also accepts the value “Unknown”.
If the forms accepted by W3CDTF are too restrictive, you can leave the Date field set to “Unknown” and add an optional dcterms:created field, whose content is not controlled.
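The accepted forms can be checked locally before a deposit. The following is a minimal sketch, not NAKALA’s own validation: the function name is hypothetical, and it assumes that “Unknown” must be spelled exactly as shown above.

```python
import re
from datetime import date

# YYYY, optionally followed by -MM, optionally followed by -DD.
_W3CDTF_DATE = re.compile(r"(\d{4})(?:-(\d{2})(?:-(\d{2}))?)?")


def is_accepted_date(value):
    """Check a value against YYYY, YYYY-MM, YYYY-MM-DD or 'Unknown'."""
    if value == "Unknown":
        return True
    match = _W3CDTF_DATE.fullmatch(value)
    if not match:
        return False
    # Missing month or day defaults to 1 so the calendar check can run.
    year, month, day = (int(part) if part else 1 for part in match.groups())
    try:
        date(year, month, day)  # rejects month 13, 30 February, year 0000
    except ValueError:
        return False
    return True


for candidate in ["2021-03-02", "2021-03", "2021", "Unknown", "2021-13", "circa 1850"]:
    print(candidate, is_accepted_date(candidate))
```

A value such as “circa 1850” fails this check, which is the situation in which the uncontrolled dcterms:created field is useful.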
License (mandatory)
The License field specifies the conditions under which the data can be reused.
The NAKALA deposit form allows the depositor to select:
- one of the six [Creative Commons](https://creativecommons.org/) licenses,
- the [Etalab](https://www.etalab.gouv.fr/licence-ouverte-open-licence) open license.
To meet other needs, other licenses can be used. NAKALA’s License field autocompletes from a list of some 400 licenses.
Link
API request to query the list of licenses: curl -X GET "https://api.nakala.fr/vocabularies/licenses" -H "accept: application/json"
If the desired license is not in the list, you can request that it be added by writing to nakala@huma-num.fr, giving the license title and its URI.
Resource
See Reuse licenses in the context of Open Data, by Doranum.
Description (recommended)
Corresponds to dcterms:description.
Describes the content of the resource as free text. Specify the language of the description.
A more specific description field can be chosen among the optional fields:
- If the description is a summary of the content, use dcterms:abstract.
- If the description is a table of contents, use dcterms:tableOfContents.
Keywords (recommended)
Corresponds to dcterms:subject.
Describes the subject(s) of the resource content as keywords. For ease of use, this field is linked to the [ISIDORE](https://isidore.science/vocabularies) vocabularies.
Concept labels from the vocabularies that ISIDORE uses for data enrichment (RAMEAU, Pactols, GEMET, LCSH, BNE, GéoEthno, ArchiRès, Geonames) can be searched by autocompletion. Autocompletion is an aid: the depositor can select the desired concept, or enter a specific keyword that autocompletion does not offer. Simply validate the word (by pressing Enter) for it to be taken into account. We recommend that you specify the language of your keywords.
This multivalued field can also be repeated, for example to enter the same list of keywords in another language.
It is also possible to use the optional dcterms:subject field in “add metadata” to give, for example, a code for a concept from a Library of Congress vocabulary (types: dcterms:LCSH or dcterms:LCC), or a classification code (types: DDC or UDC).
Language (recommended)
Corresponds to dcterms:language.
When relevant, the Language field specifies the language of the resource. This field is optional and repeatable. The language is selected by autocomplete search in NAKALA’s language vocabulary (a list of over 7,000 living or extinct languages, following the ISO 639-1 and ISO 639-3 standards).
It is also possible to use the dcterms:language field in “add metadata” to indicate a “language” that is not part of this vocabulary, or to specify, for example, the script used (ISO 15924 code).
Publication of metadata after DOI assignment
Each piece of data published in NAKALA is assigned a DataCite Digital Object Identifier (DOI), a persistent identifier that allows the data to be cited over the long term. This assignment is recorded in the metadata and, to ensure the continuity of citations, the metadata is meant to remain available in the long term.
Thus, data published in NAKALA is exposed by the NAKALA OAI-PMH server and published in DataCite.
Each collection in NAKALA is a set in NAKALA’s OAI-PMH repository: https://api.nakala.fr/oai2.
It is therefore important to have as clear and precise descriptions as possible of the data deposited in NAKALA.
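To see how descriptions are exposed, the OAI-PMH endpoint can be queried with the standard protocol verbs. The sketch below only builds request URLs: `ListSets`, `ListRecords` and the `oai_dc` metadata prefix come from the OAI-PMH specification, while the helper name and the `EXAMPLE_SET` value are placeholders. The actual set identifiers for collections should be read from the `ListSets` response rather than guessed.

```python
from urllib.parse import urlencode

OAI_ENDPOINT = "https://api.nakala.fr/oai2"


def list_records_url(set_spec=None, metadata_prefix="oai_dc"):
    """Build an OAI-PMH ListRecords URL, optionally limited to one set."""
    params = {"verb": "ListRecords", "metadataPrefix": metadata_prefix}
    if set_spec:
        params["set"] = set_spec
    return f"{OAI_ENDPOINT}?{urlencode(params)}"


# URL listing the available sets (one per collection).
print(f"{OAI_ENDPOINT}?{urlencode({'verb': 'ListSets'})}")

# URL harvesting the records of one set; EXAMPLE_SET is a placeholder.
print(list_records_url(set_spec="EXAMPLE_SET"))
```

Opening these URLs in a browser or harvester shows the Dublin Core record that third parties will see for each deposit.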
Links
- A tool for formatting a DOI citation in different styles: DOI Citation Formatter.
- A metadata search tool: DataCite Metadata Search. Metadata can be retrieved from this interface in DataCite (XML or JSON) and Schema.org (JSON-LD) formats.