Data quality

This section describes how NAKALA controls and improves data quality. It is designed as a guide for moderators.

Historical background and current content of NAKALA

NAKALA was launched in 2014 to preserve and disseminate data in the humanities and social sciences (SHS, for sciences humaines et sociales). Its original aim was to provide SHS projects with immediate, secure storage for their data and access to it through a persistent identifier system.

NAKALA is one of a set of services provided by Huma-Num for SHS research data management, from storage through to long-term archiving at CINES.

Beyond these original functions, NAKALA now allows published data to be labeled as moderated, indicating that the deposit follows current best practice, from file preparation to description.

As of March 2024, NAKALA stores and exposes over 700,000 data items and has 2,300 depositor accounts.

Controlling the creation of depositor accounts in NAKALA

Huma-Num identifies every depositor in NAKALA. Access to NAKALA is requested with a HumanID account. To validate an access request, the Huma-Num team checks the following points:

  • the applicant’s affiliation: the applicant must belong to an institution of the French higher education and research sector (ESR), or take part in a research project led by an ESR institution;

  • participation in an SHS research project;

  • the deposit of research data from the SHS domain (no administrative data, for example).

Automatic metadata checks

In NAKALA, certain types of metadata and information are automatically checked.

Depending on the information, automatic checking is done in one of two ways:

  • controlled vocabularies (closed lists) from which the depositor selects a value: only values contained in the list are accepted;

  • constraints on the syntactic form of the information, aligned with current norms and standards: only forms compatible with the documented norm or standard are accepted.

The content of several metadata and type fields is checked automatically at the time of submission.
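The two kinds of automatic check described above can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not NAKALA's implementation: the closed list of type values (`ALLOWED_TYPES`), the `created` date field, its `YYYY[-MM[-DD]]` format and the `validate` helper are all hypothetical.

```python
from __future__ import annotations

import re

# Hypothetical closed list: the real NAKALA vocabularies are not reproduced here.
ALLOWED_TYPES = {"image", "sound", "text", "dataset"}

# Hypothetical syntactic rule: a date written as YYYY, YYYY-MM or YYYY-MM-DD.
DATE_PATTERN = re.compile(r"\d{4}(-(0[1-9]|1[0-2])(-(0[1-9]|[12]\d|3[01]))?)?")


def check_closed_list(value: str, allowed: set[str]) -> bool:
    """Accept only values that appear in the controlled vocabulary."""
    return value in allowed


def check_syntax(value: str, pattern: re.Pattern[str]) -> bool:
    """Accept only values whose whole string matches the documented form."""
    return pattern.fullmatch(value) is not None


def validate(metadata: dict[str, str]) -> list[str]:
    """Return error messages; an empty list means every automatic check passed."""
    errors = []
    if not check_closed_list(metadata.get("type", ""), ALLOWED_TYPES):
        errors.append(f"type: {metadata.get('type')!r} is not in the closed list")
    if not check_syntax(metadata.get("created", ""), DATE_PATTERN):
        errors.append(f"created: {metadata.get('created')!r} does not match YYYY[-MM[-DD]]")
    return errors


print(validate({"type": "image", "created": "2023-09"}))       # → []
print(len(validate({"type": "photo", "created": "09/2023"})))  # → 2
```

The closed-list check rejects any value outside the list, while the syntactic check only looks at the form of the value: under this sketch, "2023-02-31" would still pass, since checking calendar validity is a separate concern.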

Dataset status in NAKALA

Since September 2023, Huma-Num has been building better tools to support depositors, together with a project to moderate data published in NAKALA. Published data can have one of three quality levels:

  • Public data: the data is deposited and published by the depositor and receives a persistent identifier (DOI) and secure storage. The depositor can use and cite the data as needed. At a minimum, its metadata has passed the automatic checks.

  • Moderated data: the data is deposited by the depositor and the “Moderated” status is displayed on its presentation page. In addition to the automatic checks, the documentary quality of the data has been checked by a member of the NAKALA moderator network. The rules for this documentary assessment are set out in a grid of criteria, the moderation grid (see below).

  • Archived public data: the data is deposited and published by the depositor and displays the “Moderated” status on its presentation page. At the depositor’s request, and on the instruction of the liaison committee, this data is checked by the Huma-Num team in charge of archiving at CINES. The rules for this appraisal are set by the requirements of the CINES archiving platform.

Documentary moderation of datasets by a network of NAKALA moderators

In June 2023, Huma-Num launched its project to build a network of NAKALA moderators with five pilot sites. The first step was to set up a workflow linking local support services with data depositors, in order to organize support for depositing data in NAKALA and for its documentary moderation.

Following this experimental phase, the network is now being rolled out nationwide. It brings together volunteer staff from local relays and other stakeholders in the field (Maisons des Sciences de l’Homme - MSH, Ateliers de la donnée de Recherche Data Gouv, Huma-Num consortia, specialized services). The following conditions apply:

  • participants must have received training in quality deposits in NAKALA (webinar, “Accompagner au dépôt de qualité dans NAKALA” cycle, etc.);

  • preference is given to people with an SHS background;

  • for 2024-2025, the period in which the network is being set up, the number of people per local entity is limited to two or three, to make it easier to build the network.

If you are in a position to support your local SHS community in depositing data in NAKALA and would like to take part in moderating datasets, you can express your interest by writing to ateliersdeladonnee@listes.huma-num.fr.

NAKALA deposit training courses

Every year, Huma-Num organizes training courses on depositing data in NAKALA: https://www.huma-num.fr/formations/

These courses take the form of webinars. For the past two years, a specific cycle has also been offered to the staff of Ateliers de la donnée that are certified or in the process of being certified.

The aim of this cycle is to promote quality deposits and to share a common discourse with and among NAKALA users, from the local to the national level.

Three topics are covered:

  • 1. Discovering NAKALA: repository functionalities, resources, presentation of the NAKALA data moderation project

  • 2. Understanding and using the NAKALA metadata schema

  • 3. Using and preserving files: best practices and tools

How to assess the documentary quality of data

In addition to training, several tools are available to help NAKALA moderators support depositors:

  • detailed online documentation of the description schema and advice on file formats;

  • a two-page deposit guide summarizing the main criteria for a rich description and quality files;

  • a moderation grid, made available to the network of moderators, which defines the criteria to check and the evaluation rules used to validate the quality of a deposit and award it the ‘Moderated’ status.

How NAKALA moderates datasets

Documentary evaluation of a deposit and award of the quality label involve the following steps:

  • the data manager requests moderation by selecting a moderator;

  • the moderator is contacted automatically and takes charge of the exchange;

  • the moderator and the depositor then carry out the evaluation and exchange directly between themselves;

  • once the quality criteria have been met, the moderator assigns ‘Moderated’ status to the deposit.

To make this possible, new functionalities have been developed:

  • addition of a moderator role and creation of a ‘Moderated’ status;

  • creation of a “Request moderation of this data” area, with a list of moderators, accessible to data managers;

  • when the quality of the data is assessed as meeting the criteria, the moderator changes its status to ‘Moderated’;

  • once this status has been selected, the moderator’s name, the moderation date and the ‘Moderated’ label are automatically displayed on the data presentation page;

  • if the moderated data is subsequently modified, it automatically loses its ‘Moderated’ status.
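The moderation lifecycle described above can be read as a small state model. The Python sketch below is a hypothetical illustration, not NAKALA's code: the `Dataset` class, its fields, its method names and the "Not moderated" label are invented for the example, and it assumes that only the moderator selected by the data manager can assign the ‘Moderated’ status.

```python
from dataclasses import dataclass
from datetime import date
from typing import Optional


@dataclass
class Dataset:
    """Hypothetical model of a published dataset and its moderation state."""
    title: str
    moderated: bool = False
    moderator: Optional[str] = None
    moderation_date: Optional[date] = None
    requested_moderator: Optional[str] = None

    def request_moderation(self, moderator: str) -> None:
        """The data manager selects a moderator from the list."""
        self.requested_moderator = moderator

    def mark_moderated(self, moderator: str, criteria_met: bool, on: date) -> None:
        """The selected moderator assigns 'Moderated' once the grid criteria are met."""
        if moderator != self.requested_moderator:
            raise PermissionError("only the requested moderator can moderate this data")
        if not criteria_met:
            raise ValueError("quality criteria not met yet")
        self.moderated = True
        self.moderator = moderator
        self.moderation_date = on

    def modify(self, title: str) -> None:
        """Any later modification automatically removes the 'Moderated' status."""
        self.title = title
        self.moderated = False
        self.moderator = None
        self.moderation_date = None

    def presentation_label(self) -> str:
        """Text shown on the (hypothetical) presentation page."""
        if self.moderated and self.moderation_date is not None:
            return f"Moderated by {self.moderator} on {self.moderation_date.isoformat()}"
        return "Not moderated"


ds = Dataset("Field recordings")
ds.request_moderation("A. Moderator")
ds.mark_moderated("A. Moderator", criteria_met=True, on=date(2024, 3, 1))
print(ds.presentation_label())  # → Moderated by A. Moderator on 2024-03-01
ds.modify(title="Field recordings (v2)")
print(ds.presentation_label())  # → Not moderated
```

Clearing the moderator name and date in `modify` mirrors the rule that modified data loses its status: the label can only reappear through a new moderation.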

Huma-Num would like to thank the five pilot sites that tested the workflow and tools for several weeks and provided constructive feedback, which allowed us to test how the moderation workflow works in practice.