Providing a recommendations document on PID usage

PIDs
Life science


The main goal of this use case is to produce a recommendations document on PIDs, to be adopted and applied within the Institute. These recommendations cover different resource types, such as people, structures, events, sensors, documents and data. Particular effort must go into the versioning of PIDs and resources, especially for evolving data.
This recommendations document should lead research teams to adopt the FAIR principles in their data management. Moreover, it should allow the Institute to propose new software services that implement these principles, since the Institute is responsible for registering PIDs and resolving them.

Each day, INRAE research teams produce large amounts of data concerning animals, plants, landscapes and documents. All of this data, most of which comes from observations and experiments, has to be stored properly and then published so that it can be put to good use and reused. It is also necessary to make this data easy to find by means of software systems.
Finally, every data object must be uniquely identified so that it can be cross-referenced with other data produced anywhere in the world.
However, research teams do not currently use the same identification system for their data. This is mainly due to differences in how the data is stored: lab data may be kept, for example, in Excel files, in relational databases or in flat files. The more technologically advanced teams pair their databases with triple stores, to allow cross-use of their data and to encourage the use of keywords.
In other words, technological differences, together with a lack of information about established identification systems, lead to disparities in how data is uniquely identified.

Challenges that need to be addressed

The main challenge lies in writing a recommendations document that adequately explains which kind of identifier should be used for each type of data. The document must be as precise as possible in explaining how to organize data versioning and identifier versioning, as well as the identification of evolving datasets. Moreover, it must stay close to end users' current practices, to optimise adoption by research teams.
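
To make the versioning question concrete, here is a minimal sketch of one possible convention, not the one the recommendations will necessarily adopt. It assumes each resource gets a base identifier that resolves to its latest state, plus a ".vN" suffix for each frozen version. The class name, the "example.org/pid" prefix and the suffix scheme are all hypothetical and chosen only for illustration.

```python
from __future__ import annotations

from dataclasses import dataclass, field


@dataclass
class VersionedPidRegistry:
    """Toy in-memory registry: one base PID per resource, one suffix per version."""

    prefix: str  # hypothetical naming-authority prefix, e.g. "example.org/pid"
    _versions: dict[str, list[str]] = field(default_factory=dict)

    def register(self, local_id: str, location: str) -> str:
        """Record a new version of a resource and return its versioned PID."""
        locations = self._versions.setdefault(local_id, [])
        locations.append(location)
        return f"{self.prefix}/{local_id}.v{len(locations)}"

    def resolve(self, pid: str) -> str:
        """Return the location of a versioned PID, or the latest version for a base PID."""
        if not pid.startswith(self.prefix + "/"):
            raise KeyError(pid)
        name = pid[len(self.prefix) + 1:]
        # Split on the last ".v" to separate the local identifier from the version.
        local_id, sep, version = name.rpartition(".v")
        if sep and version.isdigit() and local_id in self._versions:
            index = int(version) - 1
            locations = self._versions[local_id]
            if 0 <= index < len(locations):
                return locations[index]
            raise KeyError(pid)
        # No version suffix: treat the name as a base PID and return the latest version.
        if name in self._versions:
            return self._versions[name][-1]
        raise KeyError(pid)
```

With this convention, a citation that uses the versioned identifier always points at the same snapshot of an evolving dataset, while the base identifier follows the dataset as it changes. A real service would also need persistent storage and an agreed policy on what counts as a new version.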

Expected impact of the Use Case

The recommendations document must inform researchers' choice of one PID or another. It should make it possible to harmonize these choices within the Institute. Researchers and engineers who wish to, and are able to, can build technical solutions that implement the recommendations themselves. Finally, these recommendations will allow the Institute to offer a PID management service that guarantees the storage and resolution of these PIDs.

Expected outputs

PID management software system offered by the Institute to its research teams

Contributors

François-Xavier Sennesal - INRAE