Transparency guidelines

FAIR-IMPACT supports the implementation of FAIR-enabling practices, tools, and services. To this end, guidelines and a prototype are being developed to improve the transparency of, and trust in, repositories. 

The guidelines help to expose relevant information as metadata at the organisational and object level to facilitate discovery, provide context, and support interoperability. They also recommend providing accompanying evidence in a uniform and transparent way, to build trust in both the services and the service providers that expose the information. The focus is on exposing information about basic characteristics (e.g. repository name, contact information), information that can signal a sense or status of trustworthiness (e.g. a certificate, a preservation policy), and information relating to the FAIRness of the digital objects held (e.g. assessment results, tool(s) used).

The guidelines in their current version (V1.0) are as follows:

1 - Relevant information should be exposed to achieve transparency.

Exposing information on organisations, services, and objects, as well as their functions and characteristics, requires precise descriptions of the particular resources. For example, a dataset should be recognisable as a dataset by the information consumer, and a repository should identify itself as a repository and data catalogue. The denotation should be as detailed as possible, while also referencing superclasses (e.g. ‘Resource’ for a dataset, or ‘DataService’ for a catalogue, depending on the standard used). This enables machine agents to select and process the information in specific as well as generic use cases, and it helps human actors to recognise and understand the information.
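
As an illustration, the following minimal sketch (in Python with the rdflib library; all URIs are hypothetical placeholders) shows how a repository and a dataset might expose such typing using DCAT:

```python
# Sketch: typing a repository and a dataset with DCAT via rdflib.
# All URIs are hypothetical placeholders.
from rdflib import Graph, Literal, URIRef
from rdflib.namespace import DCAT, DCTERMS, RDF

g = Graph()
repo = URIRef("https://repository.example.org/")
ds = URIRef("https://repository.example.org/dataset/42")

# The repository identifies itself as a data catalogue and a service.
g.add((repo, RDF.type, DCAT.Catalog))
g.add((repo, RDF.type, DCAT.DataService))
g.add((repo, DCTERMS.title, Literal("Example Repository", lang="en")))

# The dataset is recognisable as a dataset. dcat:Dataset is a subclass
# of dcat:Resource, so generic consumers that only understand the
# superclass can still select and process the description.
g.add((ds, RDF.type, DCAT.Dataset))
g.add((ds, DCTERMS.title, Literal("Example survey data 2023", lang="en")))
g.add((repo, DCAT.dataset, ds))

print(g.serialize(format="turtle"))
```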

2 - Information about the functions and characteristics of repositories and objects should be expressed in line with defined standards and criteria.

Information exposed for transparency should take account of, and map to, existing standards and criteria. While perfect alignment may not be possible, defining and documenting how the information structure and content relate to existing efforts will minimise divergence and maximise interoperability.

Designing the presentation of information should take account of objects at a generic level (e.g. Dublin Core or DataCite) and at a disciplinary level (e.g. DDI for the social sciences), and of repositories at a functional level (e.g. CoreTrustSeal) or a more granular level (e.g. DRAWG). Mapping the design to the needs of potential consumers of information about repositories (e.g. re3data) or objects (e.g. F-UJI) is also important.

3 - Information should be exposed by, and/or provide references to, an originating source.

As observed in the current landscape, much of the relevant information needed to inform trust in a transparent way is held by multiple third-party service providers. Registration, identification, and aggregation add value to the research landscape and in most cases are a necessity (e.g. PIDs). However, navigating multiple services and merging information that is not equally up to date is complex, and relying on intermediary services to inform decisions requires additional trust in those intermediaries. This guideline therefore specifies that an information provider should be placed in control of its own information and should permit multiple other organisations to consume it for a variety of uses. It is also important that, when the information is exposed in different places, it is consistent across all locations. This should be possible if the information can be validated using the same validation actions, regardless of the location of exposure.

4 - Clarity should be provided on how information should be expressed to support humans and machines.

Defining how different assertions should be presented requires a balance between flexibility and providing enough guidance to ensure uniformity. For example, the following assertion types could be considered:

  • Free-text assertions: statements in response to descriptive criteria about the required information.
  • Controlled assertions: selections from ontologies or controlled vocabularies.
  • Identification: PIDs exposed and referenced with the (meta)data.
  • Evidence artefacts: links to resources containing the asserted information.
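
To illustrate how these types might appear together, the sketch below shows one exposed record combining all four; the field names and values are illustrative assumptions, not a prescribed schema:

```python
# Sketch: the four assertion types combined in one exposed record.
# Field names and values are illustrative assumptions, not a schema.
record = {
    # Free-text assertion: a statement answering a descriptive criterion.
    "preservation_statement": "Datasets are retained for at least ten years.",
    # Controlled assertion: a value from a controlled vocabulary.
    "access_rights": "http://publications.europa.eu/resource/authority/access-right/PUBLIC",
    # Identification: a PID exposed and referenced with the (meta)data.
    "identifier": "https://doi.org/10.1234/example-doi",
    # Evidence artefact: a link to a resource containing the asserted information.
    "preservation_policy": "https://repository.example.org/policies/preservation.pdf",
}
print(record)
```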

Persistent identification, resolution, and associated metadata are essential foundations for this guideline. Organisations such as DataCite or ORCID enable sustainability and availability far into the future. They provide unique references, support deduplication, and in general improve findability and reuse within and between services, independent of any specific technology stack.
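
For example, a DOI can be resolved to machine-readable metadata using standard content negotiation against the doi.org resolver; a minimal sketch in Python (the DOI itself is a placeholder):

```python
# Sketch: resolving a PID (here a DOI) to machine-readable metadata via
# content negotiation at the doi.org resolver. The DOI is a placeholder.
import requests

doi = "10.1234/example-doi"  # hypothetical DOI
resp = requests.get(
    f"https://doi.org/{doi}",
    headers={"Accept": "application/vnd.citationstyles.csl+json"},
    timeout=30,
)
resp.raise_for_status()
metadata = resp.json()
print(metadata.get("title"), "-", metadata.get("publisher"))
```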

To facilitate meaningful validation actions, supporting information and documents should be linked and exposed (see Figure 3). As an example, when a repository claims certification, it should provide a link to the certificate at the certification authority's site. This enables human actors and machine agents to treat assertions as evidence for other assertions and validate them accordingly.
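
A sketch of what such a claim-with-evidence could look like, again using rdflib; the choice of dcterms:conformsTo for the certification standard and foaf:page for the certificate link is an illustrative assumption rather than a prescribed mapping, and all URLs are hypothetical:

```python
# Sketch: a repository exposing a certification claim together with a
# link to the certificate at the certification authority's site.
# Property choices and URLs are illustrative assumptions.
from rdflib import Graph, URIRef
from rdflib.namespace import DCAT, DCTERMS, FOAF, RDF

g = Graph()
repo = URIRef("https://repository.example.org/")
g.add((repo, RDF.type, DCAT.Catalog))

# The claim: the repository conforms to a certification standard ...
g.add((repo, DCTERMS.conformsTo, URIRef("https://www.coretrustseal.org/")))

# ... and the evidence: a resolvable link to the certificate itself.
g.add((repo, FOAF.page,
       URIRef("https://certificates.example.org/repository-42.pdf")))

print(g.serialize(format="turtle"))
```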

Information consumers or communities could work to specify the scope of content of free-text assertions, the ontologies and controlled vocabularies used for the controlled assertions, or acceptable links to use for evidence artefacts. 

5 - When an assertion can be validated, the possible validation action(s) should be defined.

Specified validation actions ensure uniformity in the interactions that human agents and/or machine agents can carry out, and can expect to be carried out, with regard to the exposed information. For example, the following validation actions could be considered:

  • Acceptance of assertion: the assertion is accepted without further validation.
  • Direct machine-actionable validation: given that the assertion is machine-testable, the information presented is automatically validated in an established process.
  • Machine-actionable validation through a third party: given that a third party can be pointed to as the authority on the asserted information, it is called on to validate the information automatically through an established process.
  • Validation through human action: given that the assertion is not machine-testable, the consumer must take manual steps to validate the information presented.
  • Validation through a mixture of human and machine action: given that the assertion is machine-testable, the choice may be made to also validate the information manually to ensure the content or quality of the supplied information (i.e. the machine tests that the information is available, the human validates the content).
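
To make the second case concrete, the following sketch performs a direct machine-actionable validation of the evidence link from the certification example above; the checks shown (resolvability and expected host) are an illustrative assumption about what an established process might test:

```python
# Sketch: direct machine-actionable validation of an evidence link.
# Checks only that the link resolves and is served from the host we
# consider authoritative; a real established process would test more.
from urllib.parse import urlparse
import requests

def validate_evidence_link(url: str, expected_host: str) -> bool:
    """Return True if the evidence URL resolves and is served from the
    expected authoritative host."""
    try:
        resp = requests.head(url, allow_redirects=True, timeout=30)
    except requests.RequestException:
        return False
    final_host = urlparse(resp.url).hostname or ""
    return resp.ok and final_host.endswith(expected_host)

# Hypothetical evidence link and authority host.
print(validate_evidence_link(
    "https://certificates.example.org/repository-42.pdf",
    "certificates.example.org",
))
```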

Choices among validation actions can be based on the purposes of the information consumer, or on the level of information made available by the provider. To support particular use cases of the model, decisions will need to be made about which validation actions are desired and sufficient to reach specific goals.

Technically, validation mechanisms should be provided that allow users to easily identify and verify trustworthiness or FAIR certificates and ratings. This should be done on the basis of tamper-proof badges or seals, such as Open Badges.
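
As an illustration, a minimal sketch of verifying a hosted Open Badges (v2) assertion; the badge URL is hypothetical, and only the basic hosted-verification checks are shown (signed badges and revocation lists are out of scope):

```python
# Sketch: basic verification of a hosted Open Badges v2 assertion.
# Only core hosted-verification checks are shown; signed badges and
# revocation lists are out of scope. The URL is a placeholder.
import requests

def verify_hosted_badge(assertion_url: str) -> bool:
    resp = requests.get(assertion_url,
                        headers={"Accept": "application/json"}, timeout=30)
    resp.raise_for_status()
    assertion = resp.json()
    return (
        assertion.get("type") == "Assertion"
        # Hosted verification: the assertion must live at its own id.
        and assertion.get("id") == assertion_url
        and assertion.get("verification", {}).get("type")
            in ("hosted", "HostedBadge")
        and not assertion.get("revoked", False)
    )

print(verify_hosted_badge("https://badges.example.org/assertions/42.json"))
```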

6 - Appropriate standards should be used to expose dataset metadata, FAIR assessment results, and catalogue information towards harvesters and discovery services.

To enable exploration of (meta)data services, a standard should be used that can properly describe a repository, the digital objects it holds, and their context, such as schema.org or DCAT. For example, DCAT is an established and mature standard recommended by the World Wide Web Consortium (W3C). It is an RDF vocabulary designed to foster interoperability between data catalogues on the web. It allows for tailored profiles while preserving the flexibility and interoperability of the semantic web. DCAT already acknowledges the research data landscape in its specification, e.g. PIDs. With a broad and active community, it also bridges to wider audiences such as the open (governmental) data community, and to subject-specific organisations such as the OGC. Implementing DCAT will thus increase consistency and machine-actionability for the repository and its datasets. It will also enable exposing related (meta)data, such as data quality information, and linking to other Linked (Open) Data resources.

To standardise the outputs of FAIR metrics and associated assessment results, the use of the Data Quality Vocabulary (DQV) is recommended, since it can be used to embed FAIR assessment results within the metadata of assessed datasets via DCAT, as recommended by the W3C ‘Data on the Web Best Practices’. This standard also allows the inclusion of the minimum set of metadata required to reproduce FAIR assessments, namely: test date, assessment target, metric used, and the name and software version of the testing tool.
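
A sketch of what such an embedded result might look like, using rdflib; the dataset URI, metric URI, score, and tool details are hypothetical, and using a prov:SoftwareAgent for the testing tool is one plausible modelling choice:

```python
# Sketch: embedding a FAIR assessment result in dataset metadata with
# DQV, covering the minimum metadata needed to reproduce the assessment:
# test date, assessment target, metric used, and tool name/version.
# URIs, score, and tool details are hypothetical.
from rdflib import BNode, Graph, Literal, Namespace, URIRef
from rdflib.namespace import DCAT, PROV, RDF, RDFS, XSD

DQV = Namespace("http://www.w3.org/ns/dqv#")

g = Graph()
g.bind("dqv", DQV)

dataset = URIRef("https://repository.example.org/dataset/42")
measurement, tool = BNode(), BNode()

g.add((dataset, RDF.type, DCAT.Dataset))
g.add((dataset, DQV.hasQualityMeasurement, measurement))

g.add((measurement, RDF.type, DQV.QualityMeasurement))
g.add((measurement, DQV.computedOn, dataset))  # assessment target
g.add((measurement, DQV.isMeasurementOf,       # metric used
       URIRef("https://example.org/fair-metrics/FsF-F1-01D")))
g.add((measurement, DQV.value, Literal(1.0, datatype=XSD.double)))
g.add((measurement, PROV.generatedAtTime,      # test date
       Literal("2024-05-01T12:00:00Z", datatype=XSD.dateTime)))
g.add((measurement, PROV.wasAttributedTo, tool))

g.add((tool, RDF.type, PROV.SoftwareAgent))
g.add((tool, RDFS.label, Literal("F-UJI v3.0.1")))  # tool name and version

print(g.serialize(format="turtle"))
```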

7 - The levels of care offered by repositories and received by digital objects should be expressed.

Beyond basic information about retention periods, repositories should expose information about the different levels of curation and preservation they provide across their digital object collections. At the digital object level it should be clear what levels of retention, curation and preservation are in place, and how and when these might change. Supporting information would include appraisal and selection criteria, re-appraisal schedules, preservation plans etc.

8 - Multiple calibrated FAIR assessment tools should be used, embedded in a holistic FAIR consultation process that supports contextual understanding.

Since there are several FAIR evaluation tools, each of which evaluates the various FAIR implementation options somewhat differently, multiple evaluation tools should be used, calibrated against a selection of standard FAIR benchmarking datasets, such as the set of benchmarks currently being prepared for FAIR Signposting.
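
A sketch of what running several tools over the same benchmark identifiers could look like; the per-tool adapters below are hypothetical placeholders, since each real tool exposes its own API and scoring scale:

```python
# Sketch: comparing several FAIR evaluation tools over the same benchmark
# identifiers. The adapters are hypothetical placeholders; each real tool
# has its own API and scoring scale, so calibration is essential.
from typing import Callable, Dict, List

def run_fuji(pid: str) -> float:
    """Hypothetical adapter: score a PID with an F-UJI instance, in [0, 1]."""
    raise NotImplementedError

def run_second_tool(pid: str) -> float:
    """Hypothetical adapter for a second evaluation tool."""
    raise NotImplementedError

TOOLS: Dict[str, Callable[[str], float]] = {
    "F-UJI": run_fuji,
    "second-tool": run_second_tool,
}

def compare(benchmark_pids: List[str]) -> None:
    # Divergent scores on calibrated benchmark datasets point at
    # differences in how the tools test FAIR implementation options.
    for pid in benchmark_pids:
        scores = {name: tool(pid) for name, tool in TOOLS.items()}
        print(pid, scores)
```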

Because these tools usually focus on machine-readable FAIR implementations, it is also important to devote separate attention to human-friendly FAIR implementations. FAIR evaluations should therefore always be part of an intensive consultation process that supports a holistic understanding of FAIR and its context.

This process should start with the selection of a representative set of datasets to be studied, and should subsequently help both to interpret the machine-aware FAIR implementations and to match them with the human-friendly ones.

For the full context, rationale, design concepts, scoping considerations, and next steps, please consult the milestone report.

Community feedback

The guidelines are being developed through several iterations, each informed by as much community input as possible. This page will reflect the latest version of the guidelines, and we warmly invite community feedback on this work. You can provide it in several ways:

  • Commenting on this webpage. Please indicate clearly in your commentary which guideline(s) you are referring to.
  • Providing direct comments on the full report. You can leave suggestions and comments on specific parts or ask for clarification from the authors.
  • Would you like more personal contact on the topic of metrics for data? Get in touch with us by sending an email to metrics@fair-impact.eu

Leave a comment

Please feel free to leave us a comment to share your thoughts with the FAIR-IMPACT community.
