Overview
INRAE is the French National Research Institute for Agriculture, Food and the Environment. Through research, innovation and support for public policies, it proposes new directions to support the emergence of sustainable agricultural and food systems.
INRAE is the first French institute to have a Department for Open Science.
The Department's objective is to respond to the challenges of opening up scientific research, in a context of digital development and increasingly strong expectations from society.
The main goal of this use case is the production of a recommendations document on PIDs, to be adopted and applied in the Institute. These recommendations concern different resource types, such as people, structures, events, sensors, documents and data. A specific effort has to be made on the versioning of PIDs and resources, especially for evolving data.
This recommendations document should lead research teams to adopt FAIR principles in their data management. Moreover, it should allow the Institute to propose new software services that implement these principles, as the Institute is responsible for registering PIDs and resolving them.
Description
Each day, INRAE research teams produce large amounts of data concerning animals, plants, landscapes and documents. These data, most of which come from observations and experiments, have to be stored properly and then published so that they can be valued and reused. They must also be made easy to find through software systems.
Finally, every data object must be uniquely identified so that it can be cross-referenced with other data produced anywhere in the world.
However, research teams do not currently use the same identification system to represent data. This situation is mainly due to differences in the methods used to store the data. For example, lab data may be stored in Excel files, in relational databases or in flat files. The more technologically advanced teams couple triple stores with their databases to allow cross-usage of their data and to encourage the use of keywords.
In other words, the technological differences, together with a lack of information on how to use known identification systems, lead to disparities in how data are uniquely identified.
Challenges that need to be addressed
The main challenge lies in writing a recommendations document that adequately explains which kind of identifier should be used for each type of data. This document must be as precise as possible in explaining how to organize data versioning and identifier versioning, as well as the identification of evolving datasets. Moreover, it must stay close to the current practices of end users in order to optimise adoption by research teams.
Expected Impact of the Use Case
The recommendations document must inform researchers' choice of PID for each type of resource. It should make it possible to harmonize these choices within the Institute. Researchers and engineers who are willing and able to do so can build technical solutions themselves by following the recommendations. Finally, these recommendations will allow the Institute to offer a PID management service that guarantees the storage and resolution of PIDs.
Expected outputs
PID management software system offered by the Institute to its research teams.