Note

Document in progress

A quick guide to describe your Data in NAKALA

The quality and richness of data description are central criteria of the FAIR principles (https://doranum.fr/enjeux-benefices/principes-fair/). Good description is a means of achieving their objectives (ensuring that data is accessible, interoperable and reusable). Quality can be achieved, for example, by:

  • using standardized reference vocabularies,

  • applying the same intellectual standards of description across a data set,

  • choosing the metadata fields best suited to the given information.

Richness is achieved by completing as many fields as possible to optimize data comprehension.

In NAKALA, description is based on a minimum set of five pieces of information, which can be enriched extensively and cumulatively.

Note

The description of collections in NAKALA follows the same principles and uses the same model as the description of data. The main difference is that the mandatory metadata for a collection are its Status (private or public) and its Title.

Contents

  • Status of the NAKALA data description guide
  • Metadata in NAKALA
  • NAKALA data description principles
  • Mandatory and strongly recommended metadata
    • Data type (mandatory)
    • Title (mandatory)
    • Authors (mandatory)
    • Creation date (mandatory)
    • License (mandatory)
    • Description (recommended)
    • Keywords (recommended)
    • Language (recommended)
  • Publication of metadata after DOI assignment

Status of the NAKALA data description guide

The current version of the guide offers a set of tips and best practices for mandatory and additional first-level metadata fields. It is intended to be completed.

Comments and suggestions that help it evolve are welcome.

Metadata in NAKALA

  1. Mandatory fields in NAKALA:

When depositing data in NAKALA, you must complete five mandatory metadata fields:

- Title
- Author
- Date
- Type
- License

These Dublin-Core-inspired fields provide a minimal description of each piece of data.
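
As an illustration, the five mandatory fields can be checked before a deposit with a few lines of Python. This is a minimal sketch, not NAKALA's own validation: the dictionary keys (`title`, `author`, `date`, `type`, `license`), the function name and the sample author are assumptions chosen for readability.

```python
# Hypothetical field names: they mirror the five mandatory fields listed
# above, not the identifiers actually used by NAKALA.
MANDATORY_FIELDS = ("title", "author", "date", "type", "license")


def missing_mandatory_fields(record):
    """Return the mandatory fields that are absent or empty in a record."""
    missing = []
    for name in MANDATORY_FIELDS:
        value = record.get(name)
        # None, empty containers and blank strings all count as "not filled in".
        if not value or (isinstance(value, str) and not value.strip()):
            missing.append(name)
    return missing


record = {
    "title": "Tour de la Défense: construction view",
    "author": "Doe, Jane",  # made-up author, for illustration only
    "date": "2021-03",
    "type": "http://purl.org/coar/resource_type/c_c513",
    "license": "",  # deliberately left empty
}
print(missing_mandatory_fields(record))  # ['license']
```

Running such a check locally only catches omissions; it says nothing about whether the values themselves are well chosen.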

  2. Optional Dublin-Core vocabulary fields:

In addition to the five fields of the NAKALA description record (inspired by Dublin-Core), you can add and/or repeat any other field from the qualified Dublin-Core vocabulary.

The Dublin-Core vocabulary (“DC”) consists of:

  1. a base (“simple DC”) of fifteen very generic description fields (contributor, coverage, creator, date, description, format, identifier, language, publisher, relation, rights, source, subject, title, type)
  2. an extension (“qualified DC”) with

    • additional headings (audience, provenance, rightsholder…)
    • refinement qualifiers to specify the basic headings (for example: available, created, dateAccepted, dateCopyrighted, dateSubmitted, issued, modified, valid, are all qualifiers that specify the generic notion of date).
    • encoding schemes and controlled vocabularies to express field values (for example: DCMIType, W3CDTF…).
  3. Other vocabularies:

NAKALA does not currently offer the option of implementing a format other than Dublin-Core in its repository interface. However, depositors can associate their own metadata files in the format and vocabulary of their choice. In this case, it is possible to use this specific description in a web exhibition external to NAKALA.

NAKALA data description principles

Data descriptions should be as rich, precise and accurate as possible.

In addition to the mandatory fields, it’s a good idea to add any other information you know about the data:

  • Use qualified DC terms whenever possible, rather than simple DC terms.
  • When the content of a heading is expressed in a specific language, specify this language using the lang attribute.
  • Wherever relevant, use formal syntax or controlled vocabulary rather than free text.
  • When several pieces of information of the same nature need to be specified, use the same term several times.
  • Do not use systems based on separator characters.
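
The last three principles can be sketched in Python: a keyword string that relies on a separator character is turned into one repeated field per value, each carrying its language. The entry layout (`term`, `value`, `lang`) and the function name are hypothetical and only illustrate the idea; they are not a NAKALA data format.

```python
from __future__ import annotations


def expand_values(term: str, raw: str, lang: str | None = None, sep: str = ";") -> list[dict]:
    """Split a separator-based string into one metadata entry per value."""
    entries = []
    for part in raw.split(sep):
        value = part.strip()
        if not value:
            continue  # skip empty fragments such as a trailing separator
        entry = {"term": term, "value": value}
        if lang is not None:
            entry["lang"] = lang  # language of the value, e.g. "en" or "fr"
        entries.append(entry)
    return entries


# One dcterms:subject entry per keyword instead of a single
# "architecture; construction site ; towers" string.
keywords = expand_values("dcterms:subject", "architecture; construction site ; towers", lang="en")
for entry in keywords:
    print(entry)
```

Each resulting entry repeats the same term, which is what the principle "use the same term several times" asks for.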

Note

Huma-Num supports networks of disciplinary or professional data experts through its consortia and the regional relays represented by the MSHs. As far as possible, decisions about data description should aim for disciplinary harmonization, both in the choice of vocabularies and in the way fields are completed. These networks are presented in the Consortia and Networks tab of the Huma-Num website (https://www.huma-num.fr/).


Mandatory and strongly recommended metadata

Data type (mandatory)

This field specifies the main type of the data deposited. The list of types available in NAKALA is closed and is taken from the resource type vocabulary of COAR, the Confederation of Open Access Repositories. This field cannot be repeated.

List of data types:

Type                  URI
image                 http://purl.org/coar/resource_type/c_c513
video                 http://purl.org/coar/resource_type/c_12ce
sound                 http://purl.org/coar/resource_type/c_18cc
publication           http://purl.org/coar/resource_type/c_6501
poster                http://purl.org/coar/resource_type/c_6670
presentation          http://purl.org/coar/resource_type/c_c94f
course                http://purl.org/coar/resource_type/c_e059
book                  http://purl.org/coar/resource_type/c_2f33
map                   http://purl.org/coar/resource_type/c_12cd
dataset               http://purl.org/coar/resource_type/c_ddb1
software              http://purl.org/coar/resource_type/c_5ce6
other                 http://purl.org/coar/resource_type/c_1843
archive               http://purl.org/library/ArchiveMaterial
art exhibition        http://purl.org/ontology/bibo/Collection
bibliography          http://purl.org/coar/resource_type/c_86bc
bulletin              http://purl.org/ontology/bibo/Series
edition of sources    http://purl.org/coar/resource_type/c_ba08
manuscript            http://purl.org/coar/resource_type/c_0040
correspondence        http://purl.org/coar/resource_type/c_0857
report                http://purl.org/coar/resource_type/c_93fc
periodical            http://purl.org/coar/resource_type/c_2659
pre-publication       http://purl.org/coar/resource_type/c_816b
review                http://purl.org/coar/resource_type/c_efa0
score                 http://purl.org/coar/resource_type/c_18cw
survey data           https://w3id.org/survey-ontology#SurveyDataSet
text                  http://purl.org/coar/resource_type/c_18cf
thesis                http://purl.org/coar/resource_type/c_46ec
web page              http://purl.org/coar/resource_type/c_7ad9
data paper            http://purl.org/coar/resource_type/c_beb9
programmable article  http://purl.org/coar/resource_type/c_e9a0

It is possible to specify the nature or genre of the resource content using the optional dcterms:type field in “add metadata”.
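
Because the type list is closed, a script that prepares deposits can resolve a human-readable label to its URI and reject anything else. The sketch below assumes a hypothetical helper named `type_uri` and copies only a five-row excerpt of the list above; a real script would include every row.

```python
# Excerpt of the closed type list given above (label -> URI).
DATA_TYPES = {
    "image": "http://purl.org/coar/resource_type/c_c513",
    "video": "http://purl.org/coar/resource_type/c_12ce",
    "sound": "http://purl.org/coar/resource_type/c_18cc",
    "dataset": "http://purl.org/coar/resource_type/c_ddb1",
    "text": "http://purl.org/coar/resource_type/c_18cf",
}


def type_uri(label):
    """Return the URI for a type label, or raise ValueError if it is not listed."""
    key = label.strip().lower()
    try:
        return DATA_TYPES[key]
    except KeyError:
        raise ValueError(f"{label!r} is not in this excerpt of the type list") from None


print(type_uri("Image"))  # http://purl.org/coar/resource_type/c_c513
```

Failing loudly on an unknown label is a deliberate choice here: silently falling back to "other" would hide description errors.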

Title (mandatory)

Describe the data with a title or name. The title should be precise and unique, so that the data can be better understood.

  • For a photograph, the main subject of the image: “Tour de la Défense: construction view”.
  • For a press article, its title: “Launch of the construction of the Tour de la Défense”.

Depending on the needs and uses of the data concerned, the title can include references to dates, places, people, etc.

The mandatory title field can be repeated to indicate different languages.

To mention a secondary title, an abbreviated title or any other name given to the resource, use the dcterms:alternative field instead.

Note

The data title is distinct from the name of the associated file(s) in the repository. A data item in NAKALA consists of a description record accompanied by one or more files. File naming should also be planned and organized. The rules are explained in the Prepare data section (/nakala-preparer-ses-donnees/).

Authors (mandatory)

In the Author field, which is mandatory by default, we recommend indicating the data producer. However, this is not always appropriate for the data submitted, and is sometimes impossible (unknown author).

To meet the different needs for describing the “Author” role, you can:

  • Duplicate the Author field
  • Add a dcterms:creator field
  • Leave the Author field anonymous and add an optional dcterms:creator field, for example in the case of data whose author is not known in the form of a surname/first name.

Other fields relate to the description of a role on a piece of data and can meet a description need in “add metadata”:

  • dcterms:publisher
  • dcterms:contributor

It’s important to take into account how the data will be cited.

Creation date (mandatory)

Mention here the date of creation of the content of the resource, and not the date of creation of its digitized form, in the case of a posteriori digitization.

If the deposited resource is only a digital surrogate of the object described, e.g. the digitization of an old manuscript, indicate the creation date of the original object.

This field accepts the following forms of W3CDTF:

  • YYYY-MM-DD (year-month-day). Example: 2021-03-02
  • YYYY-MM (year-month). Example: 2021-03
  • YYYY (year). Example: 2021

This field accepts the value “Unknown”.

If the forms accepted by W3CDTF are too restrictive, you can leave the Date field set to “Unknown” and add an optional dcterms:created field whose content is not controlled.
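
The accepted forms can be checked locally before a deposit. The following is a minimal sketch using a regular expression plus a calendar check; the function name is hypothetical, and rejecting impossible dates such as 2021-02-30 is an assumption of this sketch rather than documented NAKALA behaviour.

```python
import re
from datetime import date

# YYYY, YYYY-MM or YYYY-MM-DD, the three forms listed above.
_DATE_FORM = re.compile(r"([0-9]{4})(?:-([0-9]{2})(?:-([0-9]{2}))?)?")


def is_valid_creation_date(value):
    """Return True for 'Unknown' or for a date in one of the three accepted forms."""
    if value == "Unknown":
        return True
    match = _DATE_FORM.fullmatch(value)
    if match is None:
        return False
    year, month, day = match.groups()
    try:
        # Missing month or day defaults to 1 so partial dates can be checked too.
        date(int(year), int(month or 1), int(day or 1))
    except ValueError:
        return False  # e.g. month 13 or 30 February
    return True


for candidate in ("2021-03-02", "2021-03", "2021", "Unknown", "02/03/2021", "2021-13"):
    print(candidate, is_valid_creation_date(candidate))
```

A value such as "circa 1900" fails this check, which is the situation where setting Date to “Unknown” and adding a free-text dcterms:created field applies.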

License (mandatory)

The License field specifies the conditions under which the data can be reused.

The NAKALA deposit form allows the depositor to select:

  • the 6 Creative Commons licenses (https://creativecommons.org/),
  • the Etalab license (https://www.etalab.gouv.fr/licence-ouverte-open-licence).

To meet other needs, other licenses can be specified: NAKALA’s License field autocompletes from a list of some 400 licenses.

If the desired license is not in the list, you can request its addition by writing to nakala@huma-num.fr, giving the license title and its URI.

Resource

See Reuse licenses in the context of Open Data, by Doranum.

Description (recommended)

Corresponds to dcterms:description.

This field describes the content of the resource as free text. Specify the language of the description.

Refine the kind of description by using the optional fields:

  • If the description is a content summary, use dcterms:abstract.
  • If the description is a table of contents, use dcterms:tableOfContents.

Keywords (recommended)

Corresponds to dcterms:subject.

This field describes the subject(s) of the resource content as keywords. For ease of use, it is linked to the ISIDORE reference vocabularies (https://isidore.science/vocabularies).

Concept labels from the vocabularies used by ISIDORE for data enrichment (RAMEAU, Pactols, GEMET, LCSH, BNE, GéoEthno, ArchiRès, Geonames) are searchable by autocompletion. Autocompletion is an aid to selection: the depositor can pick the desired concept, but can also enter a specific concept that autocompletion does not find. Simply validate the word (by pressing Enter) for it to be taken into account. We recommend that you specify the language of your keywords.

This multivalued field can also be repeated, for example, to enter the same list of keywords in another language.

It is also possible to use the optional dcterms:subject field in “add metadata” to indicate, for example, a code linked to a concept from a Library of Congress vocabulary (types: dcterms:LCSH or dcterms:LCC), or a code from a classification (types: DDC or UDC).

Language (recommended)

Corresponds to dcterms:language.

When relevant, the Language field specifies the language of the resource. This field is optional and repeatable. The language is selected by autocomplete search in the NAKALA language list (over 7,000 living or extinct languages, following the ISO 639-1 and ISO 639-3 standards).

It is also possible to use the dcterms:language field in “add metadata” to indicate a “language” that is not part of this list, or to specify, for example, the script used (ISO 15924 code).


Publication of metadata after DOI assignment

Each piece of data published in NAKALA is assigned a DataCite DOI (Digital Object Identifier), a persistent identifier that enables long-term citation of the data. The assignment is recorded in the metadata and, to ensure the continuity of citations, this information is intended to remain available in the long term.

Data published in NAKALA is therefore exposed by the NAKALA OAI-PMH server and published in DataCite.

Each collection in NAKALA is a set in NAKALA’s OAI-PMH repository: https://api.nakala.fr/oai2.
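
To see how a collection's records would be requested from this endpoint, the sketch below builds a standard OAI-PMH ListRecords URL without sending it. The `verb`, `metadataPrefix` and `set` parameters and the `oai_dc` format come from the OAI-PMH protocol itself. The set identifier `example-collection-id` is a placeholder, since the exact form of NAKALA set identifiers is not given here, and the function name is hypothetical.

```python
from urllib.parse import urlencode

OAI_ENDPOINT = "https://api.nakala.fr/oai2"


def list_records_url(set_spec, metadata_prefix="oai_dc"):
    """Build an OAI-PMH ListRecords request URL restricted to one set."""
    query = urlencode(
        {
            "verb": "ListRecords",
            "metadataPrefix": metadata_prefix,  # oai_dc = simple Dublin Core
            "set": set_spec,  # placeholder: a collection's set identifier
        }
    )
    return f"{OAI_ENDPOINT}?{query}"


print(list_records_url("example-collection-id"))
# https://api.nakala.fr/oai2?verb=ListRecords&metadataPrefix=oai_dc&set=example-collection-id
```

What a harvester receives through such a request is the metadata entered at deposit time, which is why the descriptions matter beyond NAKALA itself.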

It is therefore important to have as clear and precise descriptions as possible of the data deposited in NAKALA.