Huma-Num in a Nutshell
Supported by the CNRS (the French National Center for Scientific Research), Aix-Marseille University and the Campus Condorcet, Huma-Num is a French Very Large Research Infrastructure (“Très Grande Infrastructure de Recherche”, TGIR) with international reach devoted to Social Sciences and Humanities. It is part of the national ESFRI roadmap, which is in turn aligned with the European Union’s ESFRI framework. Huma-Num is entrusted with France’s participation in two European Research Infrastructure Consortia (ERIC): DARIAH (Digital Research Infrastructure for the Arts and Humanities) and CLARIN (Common Language Resources and Technology Infrastructure). It is also involved in European and international projects.
Huma-Num aims to support research communities by providing services, assessment and tools for digital research data. To carry out its missions, the TGIR Huma-Num bases its activities on an innovative form of organization that combines human resources (collective consultation through Huma-Num’s consortia, which are groups of researchers and engineers, funded by Huma-Num, working on common areas of interest) and technological resources (sustainable digital services; see below) on a national and European scale.
With the consortia it supervises, Huma-Num coordinates the production of digital data while offering a variety of platforms and tools for the processing, conservation, dissemination and long-term preservation of digital research data. One of the scientific objectives of this involvement is to promote data sharing so that other researchers, communities or disciplines can reuse the data, including from an interdisciplinary perspective and in different ways. More generally, the principles and methods of the Web of data (RDF, SPARQL, SKOS, OWL) on which Huma-Num’s services rely enable data to be documented or re-documented for various uses without confining them to inaccessible silos.
As a consequence, Huma-Num promotes core principles and methods such as open science and Web of data technologies in order to foster interoperability, both internally, so that its services can communicate with one another, and externally, so that users can plug their own tools into Huma-Num services.
Technically, the infrastructure itself is hosted in a large data center in Lyon built by and for physicists. Long-term preservation relies on a facility at another data center, CINES, based in Montpellier. In addition, a group of correspondents in the MSH (Maison des Sciences de l’Homme) network (http://www.msh-reseau.fr) all over France is in charge of relaying information about Huma-Num’s services and tools.
What can Huma-Num do for you?
Huma-Num provides tools and services to French communities of researchers and engineers in SSH for each step in the research data lifecycle. It also provides research projects with a range of tools to facilitate the interoperability of various types of digital raw data and metadata.
For digital collections more specifically, the aim is to foster the exchange and dissemination of metadata, but also of the data themselves, via standardized tools and lasting, open formats. The tools developed by Huma-Num are all based on Semantic Web technologies, mainly for their self-describing features and for the enrichment opportunities they provide. Other interoperability technologies, such as OAI-PMH, complement those tools. All our resources are therefore fully compatible with Linked Open Data (LOD).
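As an illustration of how an external tool might consume metadata exposed through OAI-PMH, here is a minimal Python sketch. It only uses elements defined by the OAI-PMH 2.0 standard (the `ListRecords` verb and the `oai_dc` metadata prefix). The endpoint URL, the function names and the sample response are assumptions made up for the example and do not describe an actual Huma-Num service.

```python
import xml.etree.ElementTree as ET
from urllib.parse import urlencode

# Hypothetical endpoint used for illustration only; not a real Huma-Num URL.
BASE_URL = "https://repository.example.org/oai"

# Namespaces defined by the OAI-PMH 2.0 and Dublin Core specifications.
NS = {
    "oai": "http://www.openarchives.org/OAI/2.0/",
    "dc": "http://purl.org/dc/elements/1.1/",
}

# Hand-written sample response (an assumption), so the example runs offline.
SAMPLE_RESPONSE = """<OAI-PMH xmlns="http://www.openarchives.org/OAI/2.0/">
  <ListRecords>
    <record>
      <header><identifier>oai:example.org:1</identifier></header>
      <metadata>
        <oai_dc:dc xmlns:oai_dc="http://www.openarchives.org/OAI/2.0/oai_dc/"
                   xmlns:dc="http://purl.org/dc/elements/1.1/">
          <dc:title>Sample field recording</dc:title>
        </oai_dc:dc>
      </metadata>
    </record>
  </ListRecords>
</OAI-PMH>"""


def list_records_url(base_url, metadata_prefix="oai_dc"):
    """Build a ListRecords request URL for an OAI-PMH endpoint."""
    params = {"verb": "ListRecords", "metadataPrefix": metadata_prefix}
    return base_url + "?" + urlencode(params)


def extract_titles(xml_text):
    """Return (identifier, title) pairs from a ListRecords response."""
    root = ET.fromstring(xml_text)
    pairs = []
    for record in root.iterfind(".//oai:record", NS):
        identifier = record.findtext("oai:header/oai:identifier", namespaces=NS)
        title = record.findtext(".//dc:title", namespaces=NS)
        pairs.append((identifier, title))
    return pairs


print(list_records_url(BASE_URL))
print(extract_titles(SAMPLE_RESPONSE))
```

In a real harvester the URL would be fetched over HTTP and the response could be paginated with resumption tokens, which this sketch leaves out.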
Three services in particular have been developed by Huma-Num to process, store and display research data while making them FAIR and preparing them for re-use and long-term preservation. These services embrace the research data life cycle and are designed to meet the needs arising therefrom:
- Share with NAKALA;
- Show and display with NAKALONA;
- Tag and push with ISIDORE.
These complementary services thus constitute a coherent chain of research data tools. While they interact smoothly with each other, they are also open to external tools using the same technologies.
NAKALA is an interoperable and secure service for depositing all types of data (text files, audio, video, images) in order to share them. Based on Semantic Web technologies, this repository mainly provides three types of services:
- assignment of a PID (Persistent Identifier) making data and metadata citable;
- permanent data access;
- dissemination of metadata through a Triple Store and OAI-PMH.
This allows the separation of data management from data presentation.
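To illustrate how a presentation tool could read metadata from a triple store without touching the stored files, the following Python sketch builds a query URL according to the SPARQL 1.1 Protocol (a `query` parameter on a GET request) and flattens a response in the standard SPARQL JSON results format. The endpoint URL, the choice of `dcterms:title` as the property, and the sample response are assumptions for the example, not a description of NAKALA’s actual data model.

```python
import json
from urllib.parse import urlencode

# Hypothetical SPARQL endpoint, for illustration only.
ENDPOINT = "https://repository.example.org/sparql"

# Assumes titles are stored with Dublin Core Terms; this is a guess.
QUERY = """
PREFIX dcterms: <http://purl.org/dc/terms/>
SELECT ?resource ?title WHERE {
  ?resource dcterms:title ?title .
} LIMIT 10
"""

# Hand-written sample in the SPARQL 1.1 JSON results format (an assumption).
SAMPLE_RESULTS = json.dumps({
    "head": {"vars": ["resource", "title"]},
    "results": {"bindings": [{
        "resource": {"type": "uri",
                     "value": "https://repository.example.org/data/1"},
        "title": {"type": "literal", "value": "Sample field recording"},
    }]},
})


def sparql_get_url(endpoint, query):
    """Encode a query as a SPARQL 1.1 Protocol GET request URL."""
    return endpoint + "?" + urlencode({"query": query.strip()})


def bindings_to_rows(results_json):
    """Flatten SPARQL JSON results into a list of {variable: value} dicts."""
    doc = json.loads(results_json)
    variables = doc["head"]["vars"]
    return [
        {name: binding[name]["value"] for name in variables if name in binding}
        for binding in doc["results"]["bindings"]
    ]


print(sparql_get_url(ENDPOINT, QUERY))
print(bindings_to_rows(SAMPLE_RESULTS))
```

A client would normally send the request with an `Accept: application/sparql-results+json` header; the display layer then works only with the returned rows.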
NAKALONA is a software package which connects the content management system Omeka (created by the Roy Rosenzweig Center for History and New Media, George Mason University, Virginia, USA) and NAKALA, a service created by Huma-Num.
It combines the power of Omeka for editing and displaying digital data with the features of NAKALA’s repository for sharing data and metadata in an interoperable way. The main goal of NAKALONA is to make it possible to share and display the data and metadata already stored in NAKALA while taking advantage of Omeka’s capabilities, such as its powerful search engine and extended OAI-PMH feeds. This software package is entirely managed and administered by the Huma-Num team, and provided as Software as a Service (SaaS).
ISIDORE is a search platform providing access to digital data in the Humanities and Social Sciences. Open to all, it relies on the principles of the Web of data and provides free access to data (open access). More than a simple search interface, ISIDORE standardizes and enriches the collected metadata and data using recognized vocabularies in three languages (French, English and Spanish).
One of the objectives is to prevent the loss of data by preparing their long-term preservation. Huma-Num highlights two aspects:
- Documenting the use of appropriate formats, which are the basis of data interoperability, greatly facilitates the archiving process.
- Making the storage of data independent of the device used to disseminate them is equally important.
Different technologies are provided for cold data (i.e. inactive data that is rarely used or accessed), warm data (i.e. data that gets analysed on a fairly frequent basis, but is not constantly in play or in motion) and hot data (i.e. data used very frequently and data that administrators perceive to be always changing).
For cold data: Backup on tapes
For cold data, the CC-IN2P3 data center, where Huma-Num’s infrastructure is hosted, provides tape backup (currently around 700 TB).
For hot data: NAS service
For hot data, high availability is provided by a NAS combined with regular snapshots (currently around 100 TB).
For warm data: distributed Huma-Num Box system
Huma-Num Box is a distributed file system for warm data. A mesh of distributed storage nodes (currently 9) has been set up all over France, encapsulating different storage technologies. Backup and versioning can thus be carried out on any node connected to this logically private network: the software allows complete flexibility in the type and frequency of backups and versioning (currently around 300 TB).
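The mapping between data temperature and storage technology described above can be summarized in a small Python sketch. The three tiers and their technologies come from the text; the access-frequency thresholds and all names in the code are purely hypothetical, since no numeric criteria are given here.

```python
from enum import Enum


class Tier(Enum):
    """Storage technology associated with each data temperature."""
    COLD = "tape backup"
    WARM = "Huma-Num Box (distributed file system)"
    HOT = "NAS with regular snapshots"


# Hypothetical thresholds: the text gives no numeric criteria.
HOT_MIN_ACCESSES_PER_MONTH = 100
WARM_MIN_ACCESSES_PER_MONTH = 5


def choose_tier(accesses_per_month):
    """Pick a tier from an (assumed) monthly access count."""
    if accesses_per_month < 0:
        raise ValueError("access count cannot be negative")
    if accesses_per_month >= HOT_MIN_ACCESSES_PER_MONTH:
        return Tier.HOT
    if accesses_per_month >= WARM_MIN_ACCESSES_PER_MONTH:
        return Tier.WARM
    return Tier.COLD


for count in (0, 20, 500):
    tier = choose_tier(count)
    print(f"{count} accesses/month -> {tier.name}: {tier.value}")
```

In practice the choice would also depend on factors such as how often the data change, which a single access count does not capture.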
ShareDocs is a file manager that can be used via a web browser, a WebDAV client or file synchronization software. Some of its features are comparable to those of tools like Dropbox or Google Drive, but it has clear advantages concerning the security of data storage.
Huma-Num provides a long-term preservation service based on the CINES facility (https://www.cines.fr/en/long-term-preservation/), which is intended for data of heritage or scientific value. This is much more than the bit preservation done with the above-mentioned technologies. A long-term preservation project means that data have to be organized in such a way that they can be reused by someone who did not participate in their creation, which presupposes a lot of curation. In addition, data should be expressed in a format accepted by the CINES (see https://facile.cines.fr), and additional information must be provided to document the context of data production, metadata, etc. Huma-Num supports this kind of project by acting as a go-between linking data producers, CINES, archivists and other actors.
Provision of Software
Huma-Num also buys licences and can provide on-demand access to commercial software for text, image or sound processing, spatial data management and data analysis, such as Oxygen, XMLmind XML Editor, ABBYY, RStudio and ArcGIS.
See the list of available software here.
Huma-Num’s national Consortia
The main idea of a national consortium is to organize (multi)disciplinary collective dialogue within research communities by bringing together different types of actors (researchers, technical staff, librarians, engineers, etc.) from different institutions, with the aim of creating synergies. In return, a consortium is expected to provide guidelines on technological and/or scientific best practices, new standards and tools.
What is a Huma-Num consortium? What is its life cycle?
A Huma-Num consortium is a group of people, often from different institutions and sometimes from different disciplines, working on the same scientific objects, methods or themes. Together, they submit a joint project to Huma-Num’s Scientific Council, which evaluates it; if the project is approved, the consortium is labelled and funded by Huma-Num for four years.
Every year, the Scientific Council carries out a scientific evaluation of each consortium’s actions and gives a recommendation on the budget requested. Huma-Num’s Steering Committee then validates the budget or proposes modifications to it, and the cycle repeats the following year.
Every year, Huma-Num receives submissions, labels new entrants, and renews existing labels on the basis of a mix of continuing and new projects from each group.
What are the goals of Huma-Num’s consortia and what do they do?
With the help of Huma-Num’s services and personnel, Huma-Num’s consortia are tasked with creating synergies within the SSH research community. Their main goal is to facilitate the adoption of digital tools and their inclusion in open data, open source and open access processes.
To do so, they carry out a variety of actions such as organizing consultation on good practices, running training sessions, developing specific or generic tools, publishing guidelines and promoting multi-scale dialogue.
See the current list of Huma-Num’s national consortia here.
Huma-Num’s International Collaborations
Huma-Num is involved in several European and international projects. It also collaborates with foreign infrastructures worldwide.
The CNRS and the French Ministry of Higher Education and Research (MENESR) have granted Huma-Num responsibility for coordinating French participation in two European Research Infrastructure Consortia (ERICs): Huma-Num is thus a founding member of DARIAH, and currently an observer in CLARIN.
DARIAH (Digital Research Infrastructure for the Arts and Humanities) is an ERIC, a pan-European infrastructure for arts and humanities scholars working with computational methods. It supports digital research as well as the teaching of digital research methods. France is an official member.
CLARIN (Common Language Resources and Technology Infrastructure) ERIC takes up the mission to create and maintain an infrastructure to support the sharing, use and sustainability of language data and tools for research in the Humanities and Social Sciences. Currently CLARIN provides easy and sustainable access to digital language data (in written, spoken, or multimodal form) for scholars in the social sciences and humanities, and beyond. CLARIN also offers advanced tools to discover, explore, exploit, annotate, analyse or combine such datasets, wherever they are located. France is currently an observer member.
OPERAS will be a European Research Infrastructure for open scholarly communication, particularly in the Social Sciences and Humanities. Led by the French OpenEdition, it aims to become an ERIC and groups 35 organizations from 12 European countries. It aims to coordinate and pool university-led scholarly communication activities in Europe with a view to enabling Open Science as the standard practice.
To achieve this goal, it will pool the activities of strategic scholarly communication actors and stakeholders (research institutions, libraries, platforms, publishers, funders) in their transition to Open Science; and it will develop common best practice standards for digital Open Access publishing, infrastructures, services, editorial quality, business models and funding streams.
PARTHENOS (http://www.parthenos-project.eu/) stands for “Pooling Activities, Resources and Tools for Heritage E-research Networking, Optimization and Synergies”. This H2020 project aims at strengthening the cohesion of research in the broad sector of Linguistic Studies, Humanities, Cultural Heritage, Social Sciences and related fields through a thematic cluster of 16 European Research Infrastructures. The project also aims at integrating initiatives, e-infrastructures and other world-class infrastructures and building bridges between different, although tightly interrelated, fields.
PARTHENOS will achieve those objectives through the definition and support of common standards, the coordination of joint activities, the harmonization of policy definition and implementation, and the development of pooled services and of shared tools and solutions to the same problems.
Humanities at Scale (2015-2017)
The project Humanities at Scale (http://has.dariah.eu/) was designed to improve DARIAH’s ability to foster new knowledge and sustain existing knowledge in digitally enabled research in the Arts and Humanities. To achieve these goals, HaS focused on three main activities: scaling up the DARIAH community by fostering common practices and expanding knowledge about Digital Humanities through a pan-European programme, including in regions without a longstanding tradition in the Digital Humanities; developing core services for the ERIC; and informing research communities.
Other International Collaborations
Huma-Num is a partner of the Canadian initiative CO.SHS (the open cyberinfrastructure for the humanities and social sciences), for which it provides assessment and tools. CO.SHS is dedicated to supporting research in the humanities and social sciences and the arts by strengthening the production capacity of digital scholarly publications, by increasing the discoverability of research results disseminated on the Erudit platform, and by facilitating the exploration of vast textual corpora with analysis and visualization tools.
Spanish Speaking World
Since 2010, ISIDORE, the search engine for SSH created by Huma-Num, has been multilingual and annotates documents in three languages: French, English and Spanish.
To promote these features and to advance the principles of open science through the sharing of knowledge, Huma-Num undertook in 2018 to collect documents in Spanish from Latin American universities and infrastructures whose repositories held SSH data ready to be harvested by ISIDORE.
So far, more than 10,000 additional documents in Spanish have been aggregated into ISIDORE, enriched and showcased, and are now available for new research.
The TGIR Huma-Num will continue this work with the support of existing French university networks (UMIFRE network, IFAL, IRD, CEMCA) as well as local partners with a view to closer collaboration.