Providing documentation on harmonised and citable PIDs for subsets of protected data. Use case by EMBL-EBI

PIDs
Life science
EMBL-EBI


Overview

The European Bioinformatics Institute (EMBL-EBI) is Europe’s largest provider of public biomolecular data resources. The institute is co-located with the ELIXIR Hub and is a partner in many relevant EU projects, among them EOSC-Life, FREYA, and BY-COVID. This use case will explore PID practices for complex data citation and sensitive data in the life science domain, and provide documentation on best practices that can be adopted across domains. In addition to supporting the life sciences, EMBL-EBI increasingly collaborates with other domains, e.g. the social sciences in the context of COVID-19 research. EMBL-EBI provides consistent access to life science data by leveraging compact identifiers through the Identifiers.org resolution service. This service will be fine-tuned during the course of the project to ensure alignment with community FAIR practices and the broader EOSC context.
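As an illustration of how compact identifiers work, the following minimal Python sketch splits an identifier of the form prefix:accession and builds the corresponding Identifiers.org resolver URL. The function names are hypothetical and chosen only for this example, and the validation is deliberately simplistic. The sketch builds the URL without contacting the service.

```python
RESOLVER = "https://identifiers.org"  # public resolver base URL


def split_compact_id(compact_id: str) -> tuple[str, str]:
    """Split 'prefix:accession' at the first colon (illustrative helper)."""
    prefix, sep, accession = compact_id.partition(":")
    if not sep or not prefix or not accession:
        raise ValueError(f"not a compact identifier: {compact_id!r}")
    return prefix, accession


def resolver_url(compact_id: str) -> str:
    """Build the resolver URL for a compact identifier (no network access)."""
    prefix, accession = split_compact_id(compact_id)
    return f"{RESOLVER}/{prefix}:{accession}"


if __name__ == "__main__":
    print(resolver_url("taxonomy:9606"))
```

Splitting at the first colon keeps any further colons inside the accession part, which matters for accessions that themselves contain a colon.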


Description

The ambition of EMBL-EBI in the PID use case is to curate and update components of Identifiers.org. The updates will be aligned with community standards and needs, in accordance with FAIR practices. The next step in this process will be to implement tombstone records in the Identifiers.org registry. Aligning the tombstone entry points with EOSC guidelines would ensure that all necessary entry points are included, so that the last location of the content can be verified and the PID remains persistent. However, the tombstone practices might need some modification so that the system and its maintenance are not overburdened. The intention is to draft an internal proposal on how Identifiers.org tombstones should be implemented and then open it up for community review and discussion.
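To make the tombstone idea more concrete, the sketch below shows one possible shape for a tombstone record and how a resolver could fall back to it once an identifier is no longer active. Everything here is an assumption made for illustration: the field names, the Resolution type, and the resolve function are hypothetical and do not describe the actual or planned Identifiers.org implementation.

```python
from dataclasses import dataclass
from datetime import date
from typing import Optional


@dataclass(frozen=True)
class Tombstone:
    """Hypothetical tombstone record; field names are illustrative only."""
    compact_id: str
    last_known_url: str       # where the content last resided
    deactivated_on: date
    reason: str
    successor: Optional[str] = None  # replacing identifier, if any


@dataclass(frozen=True)
class Resolution:
    compact_id: str
    active: bool
    target: str
    tombstone: Optional[Tombstone] = None


def resolve(compact_id: str,
            live: dict[str, str],
            tombstones: dict[str, Tombstone]) -> Resolution:
    """Return the live target, or the tombstone if the identifier was retired."""
    if compact_id in live:
        return Resolution(compact_id, True, live[compact_id])
    if compact_id in tombstones:
        stone = tombstones[compact_id]
        return Resolution(compact_id, False, stone.last_known_url, stone)
    raise KeyError(f"unknown identifier: {compact_id}")
```

In this sketch a retired identifier still resolves, but the result is flagged as inactive and carries the last known location and the reason for deactivation, so the PID itself persists.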

The ambition is also to study whether automated use of the Identifiers.org registry can be facilitated by adding support for kernel information profiles and the Digital Object Interface Protocol (DOIP). This effort, however, requires further deliberation before the exact approach and scope can be defined.

The Identifiers.org resolution service provides PIDs for data hosted by several repositories. The API endpoints of the service can already provide metadata on the referenced objects. The registry entries are actively curated by a team of specialists to ensure correct behaviour and normalised information. For PIDs that refer to complex data, the service currently offers minimal support in cases where the target repository identifies these objects with local IDs and provides a valid URL pattern for redirecting to them. Providing additional support in such cases is another action point for the use case.
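The redirection based on local IDs and a URL pattern can be sketched as a simple template expansion. The example below assumes that the pattern marks the position of the local ID with a {$id} placeholder. The repository URL and the function name are hypothetical and serve only as illustration.

```python
from urllib.parse import quote

ID_TOKEN = "{$id}"  # assumed placeholder for the local ID in a URL pattern


def expand_pattern(url_pattern: str, local_id: str) -> str:
    """Insert a percent-encoded local ID into a repository URL pattern."""
    if ID_TOKEN not in url_pattern:
        raise ValueError(f"pattern has no {ID_TOKEN} placeholder: {url_pattern!r}")
    # safe="" also encodes '/', so the ID cannot add extra path segments.
    return url_pattern.replace(ID_TOKEN, quote(local_id, safe=""))


if __name__ == "__main__":
    # Hypothetical repository pattern and local ID.
    print(expand_pattern("https://repo.example.org/records/{$id}", "DS-0001.v2"))
```

Rejecting patterns without a placeholder catches misconfigured entries early, which is the kind of check that curation of registry entries would otherwise have to perform by hand.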

Challenges that need to be addressed

Solutions are needed to overcome several challenges related to PID practices, especially in cases where billions of data objects are spread across a large number of resources, where data resources manage their own PIDs, where data are updated frequently, and where there is a high barrier to adopting global PID systems. Aligning PID practices will require discussions among relevant EOSC stakeholders, which will take time and effort.

Expected Impact of the Use Case

More efficient use of PIDs for sensitive data will especially benefit research in health and medicine, and thus has considerable societal and economic value. Documented practices for citing complex data with PIDs support the generation of higher-quality data and have a positive effect on the reusability and reproducibility of research data. Co-designing the PID practices through EOSC alignment provides a research environment that is responsive to the needs of the various research communities. EMBL-EBI is experienced in working with metadata standards, integration, discoverability, and display through its work with the Identifiers.org service. Furthermore, EMBL-EBI brings valuable expertise as a provider of public biomolecular data resources and from its partnership in many relevant EU projects, among them EOSC-Life, FREYA, and BY-COVID.

Expected outputs

Documentation on harmonised, citable PIDs for subsets of protected data. The use case will bring evolving Identifiers.org practices into a broader EOSC context and provide solutions to overcome some challenges related to PID practices.

 


Contributors

Henning Hermjakob, EMBL-EBI
Renato Juacaba Neto, EMBL-EBI
Josefine Nordling, CSC