Huma-Num in a nutshell
Supported by the CNRS (the French National Center for Scientific Research), Aix-Marseille University and the Campus Condorcet, Huma-Num is a French Very Large Research Infrastructure (IR*) with international reach devoted to the Social Sciences and Humanities (SSH). It is on the French national roadmap for research infrastructures, which is aligned with the roadmap of the European Strategy Forum on Research Infrastructures (ESFRI). Huma-Num is entrusted with France’s participation in two European Research Infrastructure Consortia (ERICs): DARIAH (Digital Research Infrastructure for the Arts and Humanities) and CLARIN (Common Language Resources and Technology Infrastructure). It is also involved in European and international projects and participates in the creation of the OPERAS infrastructure in collaboration with OpenEdition.
Huma-Num aims to support research communities by providing services, expertise and tools for digital research data. To carry out its missions, the IR* Huma-Num relies on an innovative form of organization that combines human resources (collective consultation through Huma-Num’s consortia, which are groups of researchers and engineers funded by Huma-Num and working on common areas of interest) and technological resources (sustainable digital services; see below) on a national and European scale.
With the consortia it supervises, Huma-Num coordinates the production of digital data while offering a variety of platforms and tools for the processing, conservation, dissemination and long-term preservation of digital research data. One of the scientific objectives of this involvement is to promote data sharing so that other researchers, communities or disciplines can reuse the data, including from an interdisciplinary perspective and in different ways. More generally, the principles and methods of Linked Open Data (RDF, SPARQL, SKOS, OWL) on which Huma-Num’s services rely enable data to be documented or re-documented for various uses without confining them to inaccessible silos.
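To make the idea of documenting data with Linked Open Data more concrete, the minimal Python sketch below describes a dataset with Dublin Core terms, links it to a multilingual SKOS concept, and serializes the result as N-Triples. The dataset and concept URIs are hypothetical placeholders, and the hand-written serializer is only an illustration; a real project would more likely rely on an RDF library.

```python
# Minimal sketch: describe a dataset with RDF triples and serialize them as N-Triples.
# All example.org URIs are hypothetical placeholders, not Huma-Num identifiers.
DCTERMS = "http://purl.org/dc/terms/"
SKOS = "http://www.w3.org/2004/02/skos/core#"
RDF_TYPE = "http://www.w3.org/1999/02/22-rdf-syntax-ns#type"


def iri(value: str) -> str:
    """Wrap an absolute IRI in angle brackets, as N-Triples requires."""
    return f"<{value}>"


def literal(value: str, lang: str = "") -> str:
    """Quote a string literal, escaping backslashes, quotes and line breaks."""
    escaped = (
        value.replace("\\", "\\\\")
        .replace('"', '\\"')
        .replace("\n", "\\n")
        .replace("\r", "\\r")
    )
    return f'"{escaped}"@{lang}' if lang else f'"{escaped}"'


def to_ntriples(triples) -> str:
    """Serialize (subject, predicate, object) tuples, one statement per line."""
    return "\n".join(f"{s} {p} {o} ." for s, p, o in triples)


dataset = "https://example.org/dataset/42"          # hypothetical dataset URI
concept = "https://example.org/vocab/archaeology"   # hypothetical SKOS concept URI

triples = [
    (iri(concept), iri(RDF_TYPE), iri(SKOS + "Concept")),
    (iri(concept), iri(SKOS + "prefLabel"), literal("archéologie", "fr")),
    (iri(concept), iri(SKOS + "prefLabel"), literal("archaeology", "en")),
    (iri(dataset), iri(DCTERMS + "title"), literal("Excavation photographs", "en")),
    (iri(dataset), iri(DCTERMS + "subject"), iri(concept)),
]

document = to_ntriples(triples)
print(document)
```

Because the dataset points to a shared concept URI rather than a free-text keyword, another dataset tagged with the same concept can be found whatever language its own labels use.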
Huma-Num therefore promotes core principles and methods such as open science and Linked Open Data technologies to foster interoperability, both internally, so that its services can communicate with one another, and externally, so that users can plug their own tools into Huma-Num services.
Technically, the infrastructure itself is hosted in a large data center in Lyon built by and for physicists. Huma-Num also uses the long-term preservation facility of another data center, CINES, based in Montpellier. In addition, a group of correspondents in the MSH (Maison des Sciences de l’Homme) Network across France relays information about Huma-Num’s services and tools.
What can Huma-Num do for you?
Huma-Num provides tools and services to French communities of researchers and engineers in SSH for each step of the research data lifecycle. It also provides research projects with a range of tools that facilitate the interoperability of various types of raw digital data and metadata.
More specifically for digital collections, the aim is to foster the exchange and dissemination of metadata, and also of the data themselves, via standardized tools and lasting, open formats. The tools developed by Huma-Num are all based on Semantic Web technologies, mainly for their self-descriptive features and for the enrichment opportunities they provide. Other interoperability technologies, such as OAI-PMH, complement these tools. All our resources are therefore fully compatible with Linked Open Data (LOD).
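As an illustration of the OAI-PMH side, the following Python sketch builds a `ListRecords` request as defined by the OAI-PMH 2.0 specification and extracts record identifiers, Dublin Core titles and the resumption token from a response. The base URL and the sample response are hypothetical; only the protocol verbs, parameters and XML namespaces come from the specification.

```python
# Sketch of an OAI-PMH 2.0 harvesting step (request building + response parsing).
# The base URL and the sample response below are hypothetical.
import xml.etree.ElementTree as ET
from urllib.parse import urlencode

NS = {
    "oai": "http://www.openarchives.org/OAI/2.0/",
    "dc": "http://purl.org/dc/elements/1.1/",
}


def list_records_url(base_url, metadata_prefix="oai_dc", set_spec=None):
    """First ListRecords request: verb + metadataPrefix (+ optional set)."""
    params = {"verb": "ListRecords", "metadataPrefix": metadata_prefix}
    if set_spec:
        params["set"] = set_spec
    return f"{base_url}?{urlencode(params)}"


def next_page_url(base_url, token):
    """Follow-up request: resumptionToken is an exclusive argument besides verb."""
    return f"{base_url}?{urlencode({'verb': 'ListRecords', 'resumptionToken': token})}"


def parse_records(xml_text):
    """Return ([{identifier, titles}], resumption_token) from a ListRecords page."""
    root = ET.fromstring(xml_text)
    records = []
    for record in root.iterfind(".//oai:record", NS):
        identifier = record.findtext("oai:header/oai:identifier", default="", namespaces=NS)
        titles = [t.text for t in record.iterfind(".//dc:title", NS)]
        records.append({"identifier": identifier, "titles": titles})
    token = root.findtext(".//oai:resumptionToken", default="", namespaces=NS)
    return records, token


SAMPLE = """<OAI-PMH xmlns="http://www.openarchives.org/OAI/2.0/">
  <ListRecords>
    <record>
      <header><identifier>oai:example.org:1</identifier></header>
      <metadata>
        <oai_dc:dc xmlns:oai_dc="http://www.openarchives.org/OAI/2.0/oai_dc/"
                   xmlns:dc="http://purl.org/dc/elements/1.1/">
          <dc:title>Field notes</dc:title>
        </oai_dc:dc>
      </metadata>
    </record>
    <resumptionToken>page-2</resumptionToken>
  </ListRecords>
</OAI-PMH>"""

url = list_records_url("https://example.org/oai")
records, token = parse_records(SAMPLE)
print(url)
print(records, token)
```

A harvester would keep requesting `next_page_url(...)` until the returned token is empty, which is how the specification signals the end of the list.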
(FAIR) Data Services
Two services in particular have been developed by Huma-Num to process, store and display research data while making them FAIR and preparing them for reuse and long-term preservation. These services cover the research data life cycle and are designed to meet the needs that arise at each of its stages:
- Document and share your data in an interoperable way with NAKALA;
- Disseminate your data and retrieve linked objects with ISIDORE.
These services thus constitute a coherent chain of research data tools. While they interact smoothly with each other, they are also open to external tools using the same technologies.
NAKALA is an interoperable and secure service for depositing all types of data (e.g. text files, audio, video, images or other types) in order to share them. This repository mainly provides these services:
- assignment of a PID (Persistent IDentifier) that makes data and metadata citable;
- permanent data access;
- dissemination of metadata through an RDF triple store and an OAI-PMH endpoint;
- dedicated search engine;
- customized presentation with NAKALA Press.
This allows the separation of data management from data presentation.
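A triple store of this kind is typically queried with SPARQL. The Python sketch below shows the shape of a query sent with the SPARQL 1.1 Protocol (HTTP GET with a `query` parameter) and how bindings in the standard SPARQL 1.1 JSON results format can be read. The endpoint URL, the query and the sample response are assumptions made for illustration; they do not describe NAKALA's actual endpoint or data model.

```python
# Sketch: build a SPARQL 1.1 Protocol GET request and read JSON results.
# Endpoint URL, query and sample response are hypothetical.
import json
from urllib.parse import urlencode
from urllib.request import Request

QUERY = """PREFIX dcterms: <http://purl.org/dc/terms/>
SELECT ?item ?title WHERE { ?item dcterms:title ?title . } LIMIT 10"""


def sparql_request(endpoint, query):
    """GET request carrying the URL-encoded query; asks for JSON results."""
    url = f"{endpoint}?{urlencode({'query': query})}"
    return Request(url, headers={"Accept": "application/sparql-results+json"})


def rows_from_json(results_json):
    """Flatten SPARQL JSON bindings into plain dicts (variable -> value)."""
    data = json.loads(results_json)
    variables = data["head"]["vars"]
    return [
        {var: binding[var]["value"] for var in variables if var in binding}
        for binding in data["results"]["bindings"]
    ]


SAMPLE_RESULTS = """{
  "head": {"vars": ["item", "title"]},
  "results": {"bindings": [
    {"item": {"type": "uri", "value": "https://example.org/data/1"},
     "title": {"type": "literal", "value": "Field notes"}}
  ]}
}"""

request = sparql_request("https://example.org/sparql", QUERY)
rows = rows_from_json(SAMPLE_RESULTS)
print(request.full_url)
print(rows)
```

The request is only built here; sending it would be a matter of passing it to `urllib.request.urlopen`. Variables left unbound in a result row (for instance by an `OPTIONAL` clause) are simply skipped.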
ISIDORE is a search engine for discovering and finding publications, digital data and the profiles of researchers in the social sciences and humanities (SSH) from around the world, drawing on more than 10,000 collections.
The full text of several million documents (articles, theses and dissertations, reports, datasets, web pages, database records, descriptions of archival holdings, etc.) and event announcements (seminars, conferences, etc.) can be searched. In addition, ISIDORE links these millions of documents together by enriching them with scientific concepts created by SSH research communities. More than a simple search interface, ISIDORE standardizes and enriches the collected metadata and data using recognized vocabularies in three languages (French, English and Spanish). ISIDORE also indexes full texts from open-access platforms in more than 500 languages and dialects.
It is accessible in English on the Web through the portal isidore.science, and an API is also available. ISIDORE is listed in the European Open Science Cloud (EOSC) marketplace, the European hub offering unified access to research data, tools and services for innovation and education.
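To illustrate why enriching metadata with a multilingual vocabulary helps discovery, the Python sketch below tags documents with concepts whose labels exist in French, English and Spanish, so that a query in one language retrieves a document written in another. The vocabulary, concept URIs and documents are invented for the example, and simple whole-word label matching stands in for whatever annotation method ISIDORE actually uses.

```python
# Sketch: concept-based cross-language retrieval with a tiny SKOS-like vocabulary.
# Vocabulary, URIs and documents are hypothetical; matching is plain label lookup.
import re

VOCABULARY = {
    "https://example.org/concept/archaeology": {
        "fr": ["archéologie"], "en": ["archaeology", "archeology"], "es": ["arqueología"],
    },
    "https://example.org/concept/epigraphy": {
        "fr": ["épigraphie"], "en": ["epigraphy"], "es": ["epigrafía"],
    },
}


def annotate(text, lang):
    """Return the concepts whose labels in `lang` occur as whole words in `text`."""
    concepts = []
    for concept, labels in VOCABULARY.items():
        for label in labels.get(lang, []):
            if re.search(rf"\b{re.escape(label)}\b", text, flags=re.IGNORECASE):
                concepts.append(concept)
                break
    return concepts


def build_index(documents):
    """Map each concept to the ids of the documents annotated with it."""
    index = {}
    for doc_id, lang, text in documents:
        for concept in annotate(text, lang):
            index.setdefault(concept, []).append(doc_id)
    return index


def search(query, lang, index):
    """Annotate the query, then collect documents sharing at least one concept."""
    hits = []
    for concept in annotate(query, lang):
        for doc_id in index.get(concept, []):
            if doc_id not in hits:
                hits.append(doc_id)
    return hits


documents = [
    ("doc-es", "es", "Un estudio de arqueología romana"),
    ("doc-en", "en", "Notes on Latin epigraphy"),
]
index = build_index(documents)
print(search("archéologie", "fr", index))
```

A French query thus reaches a Spanish document because both are annotated with the same concept URI, not because their words match.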
One of the objectives is to prevent data loss by preparing data for long-term preservation. Huma-Num highlights two aspects:
- Documenting the use of appropriate formats, which are the basis of data interoperability, greatly facilitates the archiving process.
- An important point is to make the storage of data independent of the device used to disseminate the data.
Different technologies are provided for cold data (i.e. inactive data that is rarely used or accessed), warm data (i.e. data that gets analysed on a fairly frequent basis, but is not constantly in play or in motion) and hot data (i.e. data used very frequently and data that administrators perceive to be always changing).
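The hot/warm/cold distinction can be expressed as a simple classification rule. The Python sketch below assigns a tier from access statistics and maps it to the storage services described in the following sections (NAS for hot data, Huma-Num Box for warm and cold data). The thresholds (7 days, 100 accesses, 180 days) and the `AccessStats` fields are assumptions chosen for the example, not Huma-Num policy.

```python
# Sketch: classify data as hot, warm or cold from access statistics.
# Thresholds and field names are illustrative assumptions, not Huma-Num rules.
from dataclasses import dataclass
from datetime import datetime, timedelta

HOT_MIN_ACCESSES_30D = 100           # assumed: frequently read data is "hot"
WARM_MAX_IDLE = timedelta(days=180)  # assumed: idle for longer means "cold"

STORAGE_FOR_TIER = {
    "hot": "NAS with regular snapshots",
    "warm": "Huma-Num Box",
    "cold": "Huma-Num Box",
}


@dataclass
class AccessStats:
    last_access: datetime
    accesses_last_30_days: int
    modified_last_7_days: bool


def classify(stats: AccessStats, now: datetime) -> str:
    """Return "hot", "warm" or "cold" for one dataset."""
    if stats.modified_last_7_days or stats.accesses_last_30_days >= HOT_MIN_ACCESSES_30D:
        return "hot"
    if now - stats.last_access <= WARM_MAX_IDLE:
        return "warm"
    return "cold"


now = datetime(2024, 1, 1)
examples = {
    "ongoing transcription": AccessStats(now - timedelta(days=1), 10, True),
    "analysed corpus": AccessStats(now - timedelta(days=30), 12, False),
    "finished survey": AccessStats(now - timedelta(days=400), 0, False),
}
for name, stats in examples.items():
    tier = classify(stats, now)
    print(f"{name}: {tier} -> {STORAGE_FOR_TIER[tier]}")
```

Checking "always changing" before access counts mirrors the definition above: data under constant modification stays hot even if it is rarely read.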
For Hot Data: NAS Service
For hot data, high availability is provided via network-attached storage (NAS) associated with regular snapshots (currently around 100 TB).
For Warm and Cold Data: Distributed Huma-Num Box System
Huma-Num Box is a distributed file system for warm and cold data. A mesh of distributed storage has been established all over France (currently 10 nodes), encapsulating different storage technologies (e.g. disk drives combined with tape backup). Backups and versioning can thus be performed on any node of this logically private network: the software allows complete flexibility in the type and frequency of backups and versioning (currently around 500 TB).
More information about this system: A Techno-Human Mesh for Humanities in France: Dealing with preservation complexity
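Keeping copies on several nodes only protects data if those copies are checked regularly. A common technique for this is a fixity check: compute a cryptographic checksum when data are deposited, then compare each replica against it later. The Python sketch below uses SHA-256 from the standard library; the node names and contents are hypothetical, and nothing here describes how Huma-Num Box itself verifies its copies.

```python
# Sketch: SHA-256 fixity check across replicas (node names are hypothetical).
import hashlib


def sha256_bytes(data: bytes) -> str:
    return hashlib.sha256(data).hexdigest()


def sha256_file(path: str, chunk_size: int = 1 << 20) -> str:
    """Hash a file in 1 MiB chunks so large files never load fully into memory."""
    digest = hashlib.sha256()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def damaged_replicas(reference: str, replicas: dict) -> list:
    """Return the names of replicas whose checksum differs from the reference."""
    return [name for name, data in replicas.items() if sha256_bytes(data) != reference]


original = b"field notes, version 1"
reference = sha256_bytes(original)  # recorded once, at deposit time
replicas = {
    "node-a": original,
    "node-b": original,
    "node-c": original + b"\x00",  # simulated silent corruption
}
print(damaged_replicas(reference, replicas))
```

In practice `sha256_file` would be run on each node's copy, and a mismatching replica would be restored from one that still matches the reference.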
ShareDocs is a file manager that can be used via a web browser, a WebDAV client or file synchronization software. Some of its features are comparable to those of tools like Dropbox or Google Drive, but it offers clear advantages in terms of data storage security.
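Because ShareDocs can be reached through WebDAV, a standard WebDAV client can list its folders. The Python sketch below prepares a `PROPFIND` request as defined in RFC 4918 and reads a `207 Multi-Status` response into a list of entries. The ShareDocs URL is a hypothetical placeholder, the request is built but not sent, and authentication, which a real server will require, is left out.

```python
# Sketch: list a WebDAV folder with PROPFIND (RFC 4918). URL is hypothetical;
# authentication is omitted and the request is only built, not sent.
import xml.etree.ElementTree as ET
from urllib.request import Request

DAV = {"d": "DAV:"}
PROPFIND_BODY = (
    b'<?xml version="1.0" encoding="utf-8"?>'
    b'<d:propfind xmlns:d="DAV:"><d:prop>'
    b"<d:getcontentlength/><d:resourcetype/>"
    b"</d:prop></d:propfind>"
)


def propfind_request(url: str, depth: str = "1") -> Request:
    """Depth 1 asks for the folder itself plus its direct children."""
    return Request(
        url,
        data=PROPFIND_BODY,
        method="PROPFIND",
        headers={"Depth": depth, "Content-Type": "application/xml; charset=utf-8"},
    )


def parse_multistatus(xml_bytes: bytes) -> list:
    """Extract href, folder flag and size from each <d:response>."""
    entries = []
    for response in ET.fromstring(xml_bytes).iterfind("d:response", DAV):
        prop = response.find("d:propstat/d:prop", DAV)
        size = prop.findtext("d:getcontentlength", namespaces=DAV) if prop is not None else None
        entries.append({
            "href": response.findtext("d:href", default="", namespaces=DAV),
            "collection": prop is not None
            and prop.find("d:resourcetype/d:collection", DAV) is not None,
            "size": int(size) if size else None,
        })
    return entries


SAMPLE = b"""<d:multistatus xmlns:d="DAV:">
  <d:response>
    <d:href>/remote/project/</d:href>
    <d:propstat>
      <d:prop><d:resourcetype><d:collection/></d:resourcetype></d:prop>
      <d:status>HTTP/1.1 200 OK</d:status>
    </d:propstat>
  </d:response>
  <d:response>
    <d:href>/remote/project/notes.txt</d:href>
    <d:propstat>
      <d:prop><d:getcontentlength>1234</d:getcontentlength><d:resourcetype/></d:prop>
      <d:status>HTTP/1.1 200 OK</d:status>
    </d:propstat>
  </d:response>
</d:multistatus>"""

request = propfind_request("https://sharedocs.example.org/remote/project/")
print(request.get_method(), parse_multistatus(SAMPLE))
```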
Huma-Num provides a long-term preservation service based on the CINES facility (https://www.cines.fr/en/long-term-preservation/), which is intended for data of heritage or scientific value. This is much more than the bit preservation provided by the above-mentioned technologies. A long-term preservation project means that data have to be organized in such a way that they can be reused by someone who did not participate in their creation, which presupposes a great deal of curation. In addition, data should be expressed in a format accepted by CINES (see https://facile.cines.fr), and additional information must be provided to document the context of data production, metadata, etc. Huma-Num supports such projects by acting as a go-between linking data producers, CINES, archivists and other actors.
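Before preparing a submission, a project may want a quick inventory of its file formats. The Python sketch below identifies a few common formats from their leading "magic" bytes and flags files outside an accepted list. The `ACCEPTED` set is a made-up example policy, not the CINES list, and a signature check is only a first filter; the authoritative check against CINES requirements relies on the resources referenced above (https://facile.cines.fr).

```python
# Sketch: identify file formats by signature and flag those outside an accepted list.
# ACCEPTED is a hypothetical policy for illustration, NOT the CINES list.
SIGNATURES = [
    (b"%PDF-", "application/pdf"),
    (b"\x89PNG\r\n\x1a\n", "image/png"),
    (b"II*\x00", "image/tiff"),  # little-endian TIFF
    (b"MM\x00*", "image/tiff"),  # big-endian TIFF
    (b"\xff\xd8\xff", "image/jpeg"),
    (b"PK\x03\x04", "application/zip"),
    (b"<?xml", "application/xml"),
]
# Assumed policy: containers such as ZIP are unpacked and checked file by file.
ACCEPTED = {"application/pdf", "image/png", "image/tiff", "image/jpeg", "application/xml"}


def identify(header: bytes):
    """Return a MIME type guessed from the first bytes of a file, or None."""
    for magic, mime in SIGNATURES:
        if header.startswith(magic):
            return mime
    return None


def precheck(files: dict) -> dict:
    """Map each file name to (detected type, accepted?) from its header bytes."""
    report = {}
    for name, header in files.items():
        mime = identify(header)
        report[name] = (mime, mime in ACCEPTED)
    return report


files = {
    "report.pdf": b"%PDF-1.7\n",
    "scan.tif": b"II*\x00\x08\x00",
    "bundle.zip": b"PK\x03\x04\x14\x00",
    "data.bin": b"\x00\x01\x02",
}
for name, (mime, ok) in precheck(files).items():
    print(f"{name}: {mime or 'unknown'} ({'ok' if ok else 'needs review'})")
```

A file extension alone says little about the actual content, which is why the sketch looks at leading bytes instead of names.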
Software Provision Service
Huma-Num also buys licences and can provide on-demand access to commercial software for text, image or sound processing, spatial data management and data analysis, such as Oxygen, XMLmind XML Editor, ABBYY, RStudio and ArcGIS.
See how to access the available software here.
The shared web hosting service allows you to host a website to disseminate the data of a research project. It can host any web application built with classic technologies such as PHP, MySQL, PostgreSQL or Java.
Currently, around 800 websites are hosted.
Huma-Num provides virtual servers (Virtual Machines) for the implementation of web applications and complex processing. This service also gives software autonomy to projects.
Currently, around 300 virtual machines are provided.
To enhance the quality of the resources it hosts, Huma-Num supports its users in developing good data management practices at each step of the data lifecycle.
The main lines of support are:
- Help users understand and apply FAIR principles
- Develop the appropriation of Huma-Num services
- Participate in the evolution of the services and tools made available by Huma-Num for the processing, preservation and dissemination of research data in SSH
- Develop information and training materials for users
To this end, the support team:
- Coordinates user support (weekly team meetings)
- Evaluates requests and needs, answers users and meets with them
- Participates in the creation and updating of support materials and documentation
- Participates in Huma-Num’s training activities
- Monitors the quality of corpora, especially those deposited in the NAKALA repository, through regular exchanges with data producers
The support team is made up of 4 full-time staff members with complementary skills (e.g. metadata quality, file formats).
Huma-Num’s National Consortia
The main idea of a national consortium is to organize (multi)disciplinary collective dialogue within research communities by bringing together different types of actors (researchers, technical staff, librarians, engineers, etc.) from different institutions, with the aim of creating synergies. In return, a consortium is expected to provide guidelines on technological and/or scientific best practices, new standards and tools.
What are the Huma-Num Consortia? What is their life cycle?
A Huma-Num consortium is a group of people, often from different institutions and sometimes from different disciplines, working on the same scientific objects, methods or themes. Together, they submit a common project to the Huma-Num Scientific Council, which evaluates it; if the project is approved, the consortium is labelled and funded by Huma-Num for four years.
Every year, the Scientific Council makes a scientific evaluation of each consortium’s actions and gives a recommendation on the requested budget. Huma-Num’s Steering Committee then validates the budget or proposes modifications, and the cycle starts again the following year.
Every year, Huma-Num receives submissions, labels new entrants or renews labels based on a mix of the group’s ongoing work and new projects.
What do the Huma-Num Consortia do?
With the help of Huma-Num’s services and personnel, Huma-Num’s consortia are tasked with creating synergies within the SSH research community. Their main goal is to facilitate the appropriation of digital tools and their inclusion in open data, open source and open access processes.
To do so, they lead a variety of actions, such as organizing consultations on good practices, organizing training sessions, developing specific or generic tools, publishing guidelines and promoting multi-scale dialogue.
See the current list of Huma-Num’s national consortia here.
Huma-Num is involved in several European and international projects. It also collaborates with foreign infrastructures worldwide.
The CNRS and the French Ministry of Higher Education, Research, and Innovation (MESRI) have granted Huma-Num responsibility for coordinating French participation in two European Research Infrastructure Consortia (ERICs): Huma-Num is thus a founding member of DARIAH and currently an observer in CLARIN. It also participates in the creation of the OPERAS infrastructure in collaboration with OpenEdition.
DARIAH (Digital Research Infrastructure for the Arts and Humanities) is an ERIC, a pan-European infrastructure for arts and humanities scholars and educators working with computational methods. As a research infrastructure of people, expertise, information, knowledge, content, methods, tools and technologies from its member countries, DARIAH develops, maintains and operates an infrastructure that supports researchers in building, analyzing and interpreting digital resources. By working with communities of practice, DARIAH brings together individual state-of-the-art digital arts and humanities activities and scales their results to a European level, enabling the transition to Open Science. It preserves, provides access to, and disseminates research that stems from these collaborations and ensures that best practices, methodological and technical standards are followed.
The DARIAH vision is that humanities researchers will be able to assess the impact of technology on their work in an informed manner, access the data, tools, services, knowledge and networks they need seamlessly and in contextually rich virtual and human environments and produce excellent, digitally-enabled scholarship that is reusable, visible and sustainable.
France is a founding member of DARIAH, and part of the DARIAH Coordination Office is based in Paris at Huma-Num. Huma-Num is the national coordinating institution for the DARIAH-FR network, which also includes the CCSD and OpenEdition.
CLARIN (Common Language Resources and Technology Infrastructure) ERIC has the mission of creating and maintaining an infrastructure that supports the sharing, use and sustainability of language data and tools for research in the Humanities and Social Sciences. CLARIN currently provides easy and sustainable access to digital language data (in written, spoken or multimodal form) for scholars in the social sciences and humanities, and beyond. CLARIN also offers advanced tools to discover, explore, exploit, annotate, analyse or combine such datasets, wherever they are located. France is currently an observer member.
OPERAS will be a European Research Infrastructure for open scholarly communication, particularly in the Social Sciences and Humanities. Led by the French OpenEdition, it aims to become an ERIC and brings together 35 organizations from 12 European countries. Its goal is to coordinate and pool university-led scholarly communication activities in Europe with a view to making Open Science the standard practice.
To achieve this goal, it will pool the activities of strategic scholarly communication actors and stakeholders (research institutions, libraries, platforms, publishers, funders) in their transition to Open Science, and it will develop common best practice standards for digital Open Access publishing, infrastructures, services, editorial quality, business models and funding streams.
The TRIPLE project is led by Huma-Num and aims to develop a European platform for the discovery of data, research projects and researchers’ profiles: GOTRIPLE.
This platform should allow researchers in the humanities and social sciences not only to discover and reuse data and projects available in 9 languages, but also to develop a network beyond disciplinary and linguistic borders thanks to various innovative services:
- A crowdfunding platform to promote and launch small research projects, thus contributing to the development of citizen science.
- A recommendation system (ScAR) and a trust-based networking tool (Trust Building System).
- An annotation and visualization tool to highlight the relevance of available data according to researchers’ needs.
SSHOC involves several European SSH actors, mainly ERIC infrastructures, in the construction of the EOSC. As such, this project aims to bring the voice of SSH to the European cloud. In accordance with the FAIR principles, the project partners are creating digital tools and services dedicated to their research community, making them accessible via the EOSC and maintaining them.
In this perspective, one of the project’s major outputs is the SSH Open Marketplace: a platform for the discovery of tools and services for SSH researchers. This platform aims to accompany users at all stages of the data life cycle by offering “contextualized” solutions. More concretely, a researcher looking for a tool will also be offered other related items, such as publications and tutorials, to enrich their solution.
Coordinated by GARR, the Italian National Research and Education Network (NREN), the EOSC-Pillar project brings together representatives of national infrastructures from 5 Member States: Austria, Belgium, Germany, Italy and France. More specifically, it involves the following organizations: University of Vienna (Austria), University of Ghent (Belgium), CINES, CNRS, IFREMER, INRA, INRIA and INSERM (France), DKRZ, Fraunhofer, GFZ and KIT (Germany) and CINECA, CMCC, CNR, INFN and Trust IT (Italy).
Twelve research institutes from CNRS are participating in EOSC-Pillar. IN2P3 coordinates this participation while the IR* Huma-Num is involved through its expertise in the management of SSH research data.
PARTHENOS stands for “Pooling Activities, Resources and Tools for Heritage E-research Networking, Optimization and Synergies”. This H2020 project aims at strengthening the cohesion of research in the broad sector of Linguistic Studies, Humanities, Cultural Heritage, Social Sciences and related fields through a thematic cluster of 16 European Research Infrastructures. The project also aims at integrating initiatives, e-infrastructures and other world-class infrastructures, and at building bridges between different but tightly interrelated fields.
PARTHENOS will achieve those objectives through the definition and support of common standards, the coordination of joint activities, the harmonization of policy definition and implementation, and the development of pooled services and of shared tools and solutions to the same problems.
Humanities at Scale (2015-2017)
The Humanities at Scale (HaS) project was designed to help DARIAH foster new knowledge and sustain existing knowledge in digitally enabled research in the Arts and Humanities. To achieve these goals, HaS focused on three main activities: scaling up the DARIAH community by fostering common practices and expanding knowledge about the Digital Humanities through a pan-European programme that includes regions without a longstanding tradition in the Digital Humanities; developing core services for the ERIC; and informing research communities.
Other International Collaborations
Coordinated by the University of Turin in collaboration with OpenEdition and Huma-Num, CO-OPERAS (Coordination for Open Access to Scientific Communication in the European Research Area) is an Implementation Network of GO FAIR based on the GOCHANGE (technology component) and GOBUILD (digital acculturation component) pillars. It notably benefited from ANR Flash Science Ouverte funding in 2019.
CO-OPERAS is first of all a collaborative network, both European and international, which aims to coordinate open scholarly communication in SSH. To achieve this goal, the network has developed a detailed program based on one of the main tools of open science: the FAIR principles (Findable, Accessible, Interoperable, Reusable). Applicable to any type of data, the FAIR principles make data easy to discover, cite and reuse. As a simple set of recommendations, these principles also allow technical solutions to be adapted to the specific needs of research in the humanities and social sciences. They thus provide a common reference base and working tool for all CO-OPERAS network stakeholders: researchers, infrastructures, publishers and libraries.
In 2017, Huma-Num was a partner of the Canadian initiative CO.SHS (the open cyberinfrastructure for the humanities and social sciences), for which it provided expertise and tools. CO.SHS, which is dedicated to supporting research in the humanities, social sciences and arts, is strengthening the production capacity of digital scholarly publications, increasing the discoverability of research results disseminated on the Erudit platform, and facilitating the exploration of vast textual corpora with analysis and visualization tools.
This work of discovery and cooperation has allowed Huma-Num:
- To be a major actor in collaborations with Canadian communities
- To participate in several seminars and conferences (DH Montreal, COSHS Seminar)
- To be a partner in several research programs (Revue2.0, LINCS, etc.)
Since 2019, Huma-Num has been in partnership with the CRIHN (Centre de recherche interuniversitaire sur les humanités numériques) and the Canada Research Chair on Digital Textualities at the Université de Montréal to develop the Stylo tool.
Spanish-Speaking World
Since 2015, ISIDORE, the search engine for SSH created by Huma-Num, has been multilingual and annotates documents in three languages: French, English and Spanish. To showcase these features and to advance the principles of open science through knowledge sharing, Huma-Num undertook in 2018 to collect Spanish-language documents from universities and Latin American infrastructures whose repositories held SSH data ready to be harvested by ISIDORE.