Metrics for data

FAIR-IMPACT continues the work on metrics for data objects started in the FAIRsFAIR project. The seventeen minimum viable metrics proposed by FAIRsFAIR for the systematic assessment of FAIR data objects are being refined and extended. These metrics are based on indicators proposed by the RDA FAIR Data Maturity Model Working Group, on the WDS/RDA Assessment of Data Fitness for Use checklist, and on prior work conducted by project partners such as FAIRdat and FAIREnough. At present, the metrics address the FAIR principles, except A1.1, A1.2 (open protocol, authentication and authorization) and I2 (FAIR vocabularies).

The metrics are presented with a description and background and are mapped to their related FAIR principle. The alignment with the CoreTrustSeal Requirements for Trustworthy Digital Repositories is also presented for each metric.

Domain-agnostic metrics for data assessment

According to the latest version developed in FAIR-IMPACT (v0.8), the generic Data Object Assessment Metrics are as follows:

FsF-F1-01MD - Metadata and data are assigned a globally unique identifier.

Description. A globally unique identifier may be assigned to a landing page containing metadata, a metadata file, or a data file or stream so that it can be referenced unambiguously by humans or machines. Globally unique means that an identifier is associated with only one resource at any time. Examples of unique identifiers are Internationalized Resource Identifiers (IRIs) and Uniform Resource Identifiers (URIs) such as URLs. Well-known persistent identifiers such as URN, Digital Object Identifier (DOI), the Handle System, identifiers.org, w3id.org and Archival Resource Key (ARK) are also globally unique. Unique identifiers are not necessarily resolvable; they may also be represented by a UUID (Universally Unique Identifier) or a hash code. A data repository may assign a globally unique identifier to your metadata when you publish it and make it available through the repository's services.
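As an illustration of the identifier families named in this description, the following minimal sketch sorts an identifier string into a coarse category. The function name, the category labels and the restriction to SHA-256 hashes are assumptions made for this example; they are not part of the metric definition or of any assessment tool.

```python
import re
import uuid
from urllib.parse import urlparse

# Assumption: only SHA-256 hex digests are recognised as hash codes here.
SHA256_RE = re.compile(r"^[0-9a-fA-F]{64}$")


def classify_identifier(identifier: str) -> str:
    """Return a coarse label for the identifier scheme, or 'unknown'."""
    value = identifier.strip()
    try:
        uuid.UUID(value)  # accepts plain and urn:uuid: forms
        return "uuid"
    except ValueError:
        pass
    if SHA256_RE.match(value):
        return "hash"
    parsed = urlparse(value)
    if parsed.scheme in ("http", "https") and parsed.netloc:
        return "url"
    if parsed.scheme == "urn" and parsed.path:
        return "urn"
    return "unknown"
```

A label of "uuid" or "hash" indicates an identifier that can be globally unique without being resolvable, which is the case the description singles out.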

Background. While most identifiers today can be represented as actionable URLs, some non-actionable identifiers that are globally unique, such as UUIDs, may still be in use (Philipson, 2017).

FAIR principle. F1. (Meta)data are assigned globally unique and persistent identifiers.

CoreTrustSeal Alignment. R13. The repository enables users to discover the data and refer to them in a persistent way through proper citation.

FsF-F1-02MD - Metadata and data are assigned a persistent identifier.

Description. Both metadata and data should be provided with a persistent identifier. We distinguish between the uniqueness and the persistence of an identifier: an HTTP URL (the address of a given unique resource on the web) is globally unique, but it may not be persistent, because the URL of the data may become inaccessible (the link rot problem) or the data available under the original URL may change (the content drift problem). Identifiers based on the Handle System, DOI and ARK are both globally unique and persistent. They are maintained and governed so that they remain stable and resolvable in the long term. A persistent identifier (PID) may resolve (point) to a landing page, to metadata or to the data content (a downloadable artefact), or to nothing if the data or repository is no longer maintained. Ensuring persistence is therefore a shared responsibility between a PID service provider (e.g., DataCite) and its clients (e.g., data repositories). For example, the DOI system guarantees the persistence of its identifiers through its social (e.g., policy) and technical infrastructures, whereas a data provider ensures the availability of the resource (e.g., landing page, downloadable artefact) associated with the identifier.

Background. The EOSC PID policy requires a PID to be globally unique, persistent, and resolvable (Valle et al., 2020). No authoritative list or registry of persistent identifiers exists yet, but the DataCite identifier type vocabulary (DataCite Metadata Working Group, 2019) lists the most common PID types. These can be used, except for identifiers used exclusively for print products and physical entities (e.g., ISBN, EAN, ROR). In addition, identifiers listed in identifiers.org can be used to complement the controlled list.
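The distinction between a plain URL and a persistent identifier scheme can be sketched as a pattern lookup. The patterns below are simplified assumptions written for this example; they do not reproduce the DataCite vocabulary or identifiers.org, and a real check would also need to resolve the identifier.

```python
import re

# Illustrative, deliberately simplified patterns (assumptions, not a registry).
PID_PATTERNS = {
    "doi": re.compile(r"^(?:https?://(?:dx\.)?doi\.org/|doi:)?(10\.\d{4,9}/\S+)$", re.I),
    "handle": re.compile(r"^(?:https?://hdl\.handle\.net/|hdl:)(\d[\d.]*/\S+)$", re.I),
    "ark": re.compile(r"^(?:https?://[^/]+/)?(ark:/?\d{5,}/\S+)$", re.I),
    "urn": re.compile(r"^(urn:[a-z0-9][a-z0-9-]{0,31}:\S+)$", re.I),
}


def detect_pid(identifier: str):
    """Return (scheme, normalised value) for a recognised PID, else None."""
    value = identifier.strip()
    for scheme, pattern in PID_PATTERNS.items():
        match = pattern.match(value)
        if match:
            return scheme, match.group(1)
    return None
```

An ordinary web address returns `None` here: it may be globally unique, but nothing in its syntax indicates that it is governed as a persistent identifier.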

FAIR principle. F1. (Meta)data are assigned globally unique and persistent identifiers.

CoreTrustSeal Alignment. R13. The repository enables users to discover the data and refer to them in a persistent way through proper citation.

FsF-F2-01M - Metadata includes descriptive core elements (creator, title, data identifier, publisher, publication date, summary and keywords) to support data findability.

Description. Metadata is descriptive information about a data object. Since the metadata required differs depending on the users and their applications, this metric focuses on core metadata. The core metadata is the minimum descriptive information required to enable data finding, including citation which makes it easier to find data. We determine the required metadata based on common data citation guidelines (e.g., DataCite, ESIP, and IASSIST), and metadata recommendations for data discovery (e.g., EOSC Datasets Minimum Information (EDMI), DataCite Metadata Schema, W3C Recommendation Data on the Web Best Practices and Data Catalog Vocabulary). This metric focuses on domain-agnostic core metadata. Domain or discipline-specific metadata specifications are covered under metric FsF-R1.3-01M. A repository should adopt a schema that includes properties of core metadata, whereas data authors should take the responsibility of providing core metadata.

Background. According to data citation guidelines, the metadata properties necessary for proper data citation are: creator, title, publication date, publisher, and identifier.

In addition, an abstract or summary and keywords are essential to enable discoverability, and the indication of a resource type is necessary to distinguish research data objects from other digital objects. The resulting set of core descriptive metadata elements (creator, title, publisher, publication date, summary, keywords, identifier) aligns well with existing recommendations for data discovery and core metadata definition. This set of metadata elements is present in most domain-agnostic metadata standards such as Dublin Core, DCAT-2, schema.org/Dataset, and the DataCite schema.
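The core element set can be checked mechanically once a metadata record has been mapped to common keys. The sketch below assumes a flat dictionary with hypothetical key names; actual records in Dublin Core, DataCite or schema.org use their own property names and would need a mapping step first.

```python
# Hypothetical key names for the seven core elements listed in this metric.
CORE_ELEMENTS = (
    "creator", "title", "identifier", "publisher",
    "publication_date", "summary", "keywords",
)


def missing_core_elements(metadata: dict) -> list:
    """Return the core elements that are absent or empty in `metadata`."""
    return [element for element in CORE_ELEMENTS if not metadata.get(element)]
```

An empty result means all seven elements carry a value; it says nothing about whether the values are accurate.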

FAIR principle. F2. Data are described with rich metadata.

CoreTrustSeal Alignment. R13. The repository enables users to discover the data and refer to them in a persistent way through proper citation.

FsF-F3-01M - Metadata includes the identifier of the data it describes.

Description. The metadata should explicitly specify the identifier of the data (content) such that users can discover and access the data through the metadata. Such identifiers could, for example, be represented by links to downloadable data files but also to services that enable a selection of data.
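One common way to express such links is the schema.org `distribution` property of a Dataset, whose `DataDownload` entries carry a `contentUrl`. The sketch below assumes metadata already parsed from JSON-LD into a dictionary; the function name is hypothetical, and other schemas express the same link with different properties.

```python
def data_identifiers(dataset: dict) -> list:
    """Collect content links from a schema.org Dataset's `distribution`."""
    distribution = dataset.get("distribution", [])
    if isinstance(distribution, dict):  # a single DataDownload object
        distribution = [distribution]
    links = []
    for item in distribution:
        if isinstance(item, dict) and item.get("contentUrl"):
            links.append(item["contentUrl"])
    return links
```

An empty list suggests that the record describes the data without pointing to it, which is the situation this metric is meant to flag.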

FAIR principle. F3: Metadata clearly and explicitly include the identifier of the data they describe.

CoreTrustSeal Alignment. R13. The repository enables users to discover the data and refer to them in a persistent way through proper citation.

FsF-F4-01M - Metadata is offered in such a way that it can be registered or indexed by search engines.

Description. Metadata can be available via multiple endpoints. For example, a repository may distribute its metadata via a metadata protocol (e.g. via OAI-PMH) and/or a custom web service, but these may only support specialized catalogs that are only known to a limited number of people. This metric focuses on those methods of making metadata available that are beneficial to as many user groups as possible, i.e. that can be consumed by well-known, large catalogs and search engines such as Google and Bing. Such metadata should be offered according to the requirements of these search engines.
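Large search engines typically read structured metadata embedded in the landing page, for example schema.org JSON-LD inside a `script` element of type `application/ld+json`. The sketch below extracts such blocks with the Python standard library. The class and function names are hypothetical, and the exact requirements of each search engine should be taken from its own documentation.

```python
import json
from html.parser import HTMLParser


class JsonLdExtractor(HTMLParser):
    """Collect parsed <script type="application/ld+json"> blocks."""

    def __init__(self):
        super().__init__()
        self._in_jsonld = False
        self._buffer = []
        self.documents = []

    def handle_starttag(self, tag, attrs):
        if tag == "script" and dict(attrs).get("type") == "application/ld+json":
            self._in_jsonld = True
            self._buffer = []

    def handle_data(self, data):
        if self._in_jsonld:
            self._buffer.append(data)

    def handle_endtag(self, tag):
        if tag == "script" and self._in_jsonld:
            self._in_jsonld = False
            try:
                self.documents.append(json.loads("".join(self._buffer)))
            except json.JSONDecodeError:
                pass  # ignore malformed blocks instead of failing the page


def embedded_datasets(html: str) -> list:
    """Return embedded JSON-LD objects whose @type is 'Dataset'."""
    parser = JsonLdExtractor()
    parser.feed(html)
    parser.close()
    return [doc for doc in parser.documents
            if isinstance(doc, dict) and doc.get("@type") == "Dataset"]
```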

FAIR principle. F4. (Meta)data are registered or indexed in a searchable resource.

CoreTrustSeal Alignment. R13. The repository enables users to discover the data and refer to them in a persistent way through proper citation.

FsF-A1-01M - Metadata contains access level and access conditions of the data.

Description. This metric determines if the metadata includes the level of access to the data such as public, embargoed, restricted, or metadata-only access and its access conditions. Both access level and conditions are necessary information to potentially gain access to the data. It is recommended that data should be as open as possible and as closed as necessary. Datasets should be released into the public domain and openly accessible without restrictions when possible. Embargoed access refers to data that will be made publicly accessible at a specific date which should be specified in the metadata. Restricted access refers to data that can be accessed under certain conditions or is available to a particular group of users or after permission is granted.
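The access levels and the conditions attached to them can be checked as follows. This is a minimal sketch: the key names (`access_level`, `embargo_end`, `access_conditions`) and the spelling of the level values are assumptions for illustration, since each metadata schema encodes access information differently.

```python
from datetime import date

# Level names follow the description; their spelling here is an assumption.
ACCESS_LEVELS = {"public", "embargoed", "restricted", "metadata-only"}


def check_access(metadata: dict) -> list:
    """Return a list of problems with the access information (empty if none)."""
    problems = []
    level = metadata.get("access_level")
    if level not in ACCESS_LEVELS:
        problems.append("access level missing or not recognised")
    if level == "embargoed":
        try:
            date.fromisoformat(str(metadata.get("embargo_end") or ""))
        except ValueError:
            problems.append("embargoed data needs an ISO 8601 embargo end date")
    if level == "restricted" and not metadata.get("access_conditions"):
        problems.append("restricted data needs stated access conditions")
    return problems
```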

FAIR principle. A1: (Meta)data are retrievable by their identifier using a standardized communication protocol.

Note: This metric is about ensuring the provision of metadata related to data access and is based on metric RDA-A1-01. This metadata is important for retrieving data using a standardized communication protocol; we therefore mapped it to principle A1.

CoreTrustSeal Alignment. R2. The repository maintains all applicable licenses covering data access and use and monitors compliance. R15. The repository functions on well-supported operating systems and other core infrastructural software and is using hardware and software technologies appropriate to the services it provides to its Designated Community.

FsF-A1-02MD - Metadata and data are retrievable by their identifier.

Description. This metric determines whether data and metadata are accessible via their identifiers, i.e. whether the identifiers resolve to a target that actually contains data or metadata.
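A resolution check of this kind can be sketched with the standard library. The function name, the timeout and the rule "status 200 and at least one byte of content" are assumptions chosen for this example; an actual assessment would also inspect what the target contains. The `opener` parameter exists only so that the function can be exercised without network access.

```python
import urllib.request


def is_retrievable(url: str, opener=urllib.request.urlopen, timeout: float = 10.0) -> bool:
    """Return True if `url` resolves (after redirects) to a non-empty response."""
    try:
        with opener(url, timeout=timeout) as response:
            return response.status == 200 and bool(response.read(1))
    except (OSError, ValueError):
        # OSError covers URLError, HTTPError and timeouts;
        # ValueError covers malformed URLs.
        return False
```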

FAIR principle. A1: (Meta)data are retrievable by their identifier using a standardized communication protocol.

CoreTrustSeal Alignment. R15. The repository functions on well-supported operating systems and other core infrastructural software and is using hardware and software technologies appropriate to the services it provides to its Designated Community.

FsF-A1.1-01MD - A standardized communication protocol is used to access metadata and data.

Description. Given an identifier of a dataset, the dataset should be retrievable using a standard communication protocol such as HTTP, HTTPS, FTP, TFTP, SFTP, FTAM and AtomPub. Avoid disseminating data using a proprietary protocol.

FAIR principle. A1: (Meta)data are retrievable by their identifier using a standardized communication protocol.

FsF-A1.2-01MD - Metadata and data are accessible through a standardized communication protocol which supports authentication.

Description. Given an identifier of a dataset, the metadata of the dataset as well as related data should be retrievable using a standard communication protocol which supports authentication such as HTTP, HTTPS, FTPS.
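Both protocol metrics (FsF-A1.1-01MD and FsF-A1.2-01MD) can be approximated by inspecting the URL scheme of the access link. The two sets below only mirror the examples given in the descriptions and are assumptions for this sketch; FTAM and AtomPub are omitted because they are not normally identified by a URL scheme of their own, and protocols outside the second set may support authentication as well.

```python
from urllib.parse import urlparse

# Schemes for the protocols named in the two descriptions (illustrative only).
STANDARD_SCHEMES = {"http", "https", "ftp", "ftps", "tftp", "sftp"}
AUTH_SCHEMES_FROM_TEXT = {"http", "https", "ftps"}


def protocol_report(url: str) -> dict:
    """Report whether the URL scheme appears in the example lists above."""
    scheme = urlparse(url).scheme.lower()
    return {
        "scheme": scheme,
        "standard": scheme in STANDARD_SCHEMES,
        "listed_as_auth_capable": scheme in AUTH_SCHEMES_FROM_TEXT,
    }
```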

FAIR principle. A1. (Meta)data are retrievable by their identifier using a standardized communication protocol.

FsF-I1-01M - Metadata is represented using a formal knowledge representation language.

Description. Knowledge representation is vital for machine-processing of the knowledge of a domain. Expressing the metadata of a data object using a formal knowledge representation will enable machines to process it in a meaningful way and enable more data exchange possibilities. Examples of knowledge representation languages are RDF, RDFS, and OWL. These languages may be serialized (written) in different formats. For instance, RDF/XML, RDFa, Notation3, Turtle, N-Triples and N-Quads, and JSON-LD are RDF serialization formats.
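Whether a server offers one of these serializations can be tested through HTTP content negotiation: the client lists the RDF media types in its Accept header and inspects the Content-Type of the response. The sketch below covers only the media-type bookkeeping and makes no request itself. The function names are hypothetical, and RDFa is absent because it is embedded in HTML rather than served under a media type of its own.

```python
# Registered media types for the RDF serializations named in the description.
RDF_MEDIA_TYPES = {
    "application/rdf+xml": "RDF/XML",
    "text/turtle": "Turtle",
    "text/n3": "Notation3",
    "application/n-triples": "N-Triples",
    "application/n-quads": "N-Quads",
    "application/ld+json": "JSON-LD",
}


def accept_header() -> str:
    """Build an Accept header value that asks for any RDF serialization."""
    return ", ".join(RDF_MEDIA_TYPES)


def rdf_serialization(content_type: str):
    """Map a Content-Type header to a serialization name, or None."""
    media_type = content_type.split(";", 1)[0].strip().lower()
    return RDF_MEDIA_TYPES.get(media_type)
```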

FAIR principle. I1. (Meta)data use a formal, accessible, shared, and broadly applicable language for knowledge representation.

Note: The I1 principle loosely defines the use of knowledge representation. Therefore, we define two metrics corresponding to the principle concerning metadata. The metric FsF-I1-01M focuses on making the metadata available for machine-mediated interpretation, whereas the metric FsF-I1-02M focuses on the use of semantic resources to enrich metadata.

CoreTrustSeal Alignment. R14. The repository enables reuse of the data over time, ensuring that appropriate metadata are available to support the understanding and use of the data.

R15. The repository functions on well-supported operating systems and other core infrastructural software and is using hardware and software technologies appropriate to the services it provides to its Designated Community.

FsF-I2-01M - Metadata uses registered semantic resources.

Description. A metadata document or selected parts of the document may incorporate additional terms from semantic resources (also referred to as semantic artefacts) that unambiguously describe the contents so they can be processed automatically by machines. This metadata enrichment may facilitate enhanced data search and interoperability of data from different sources.

Ontologies, thesauri and taxonomies are kinds of semantic resources, and they come with varying degrees of expressiveness and computational complexity. Knowledge organization schemes such as thesauri and taxonomies are semantically less formal than ontologies. As a base requirement for FAIR semantic resources or vocabularies, these should be registered in a semantic repository that supports metadata indexing.
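A registration check can be approximated by extracting the namespace of each term IRI used in the metadata and looking it up in a list of registered semantic resources. In the sketch below, the registry is a plain in-memory set supplied by the caller and the function names are hypothetical; a real check would query a semantic repository.

```python
def namespace_of(iri: str) -> str:
    """Return the IRI up to and including its last '#' or, failing that, '/'."""
    for separator in ("#", "/"):
        if separator in iri:
            head, _, _ = iri.rpartition(separator)
            return head + separator
    return iri


def registered_namespaces(term_iris, registry) -> list:
    """Return the sorted namespaces of `term_iris` that occur in `registry`."""
    used = {namespace_of(iri) for iri in term_iris}
    return sorted(used & set(registry))
```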

FAIR principle. I2. (Meta)data use vocabularies that follow FAIR principles.

CoreTrustSeal Alignment. R14. The repository enables reuse of the data over time, ensuring that appropriate metadata are available to support the understanding and use of the data.

FsF-I3-01M - Metadata includes qualified references between the data and its related entities.

Description. Linking data to its related entities will increase its potential for reuse. The linking information should be captured as part of the metadata. A dataset may be linked to its prior version, related datasets or resources (e.g. publication, physical sample, funder, repository, platform, site, or observing network registries). Links between data and its related entities should be qualified, that is, expressed through relation types (e.g., the DataCite Metadata Schema specifies relation types between research objects through the fields ‘RelatedIdentifier’ and ‘RelationType’), and should preferably use persistent identifiers for related entities.
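Using the DataCite fields mentioned above, a reference counts as qualified when it carries both a target identifier and a relation type. The sketch below assumes the related identifiers have been parsed from DataCite JSON into a list of dictionaries; the function name and the sample values in the usage note are hypothetical.

```python
def unqualified_references(related_identifiers: list) -> list:
    """Return entries that lack a relation type or a target identifier."""
    return [
        entry for entry in related_identifiers
        if not entry.get("relationType") or not entry.get("relatedIdentifier")
    ]
```

For example, an entry such as `{"relatedIdentifier": "10.1234/example.v1", "relatedIdentifierType": "DOI", "relationType": "IsNewVersionOf"}` would pass, whereas a bare link without `relationType` would be reported.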

FAIR principle. I3. (Meta)data include qualified references to other (meta)data.

CoreTrustSeal Alignment. R11. The repository has appropriate expertise to address technical data and metadata quality and ensures that sufficient information is available for end users to make quality-related evaluations.

FsF-R1-01MD - Metadata specifies the content of the data.

Description. This metric evaluates whether the content of the dataset is specified in the metadata (beyond the provision of the data identifier) and whether it accurately reflects the actual data deposited. Examples of properties specifying data content are resource type (e.g., data or a collection of data), data format and size. Properties such as the variable(s) measured or observed could also be verified; however, since these are not available for some data types (images, movies, etc.), they should not be scored. Ideally, ontological vocabularies should be used to describe data content (e.g., variable) to support interdisciplinary reuse.
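Comparing declared and actual content properties can be sketched for a locally available file. The keys `size` and `format` are hypothetical, and the format is guessed from the file extension only, which is a simplification; a stricter check would inspect the file content.

```python
import mimetypes
import os


def content_mismatches(declared: dict, path: str) -> list:
    """Compare declared size and format with what is found at `path`."""
    mismatches = []
    actual_size = os.path.getsize(path)
    if declared.get("size") != actual_size:
        mismatches.append(
            f"size: declared {declared.get('size')}, actual {actual_size}")
    guessed_type, _ = mimetypes.guess_type(path)  # extension-based guess
    if declared.get("format") != guessed_type:
        mismatches.append(
            f"format: declared {declared.get('format')}, guessed {guessed_type}")
    return mismatches
```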

FAIR principle. R1: (Meta)data are richly described with a plurality of accurate and relevant attributes.

Note: The data quality aspect is not explicitly addressed by the FAIR principles. However, an accurate description of the data content is important for assessing the quality of the data. We regard the properties of data content as part of rich metadata and therefore map this metric to its closest principle, R1.

FsF-R1.1-01M - Metadata includes license information under which data can be reused.

Description. This metric evaluates whether data is associated with a license, because otherwise users cannot reuse it in a clear legal context. We encourage the application of licenses for all kinds of data, whether public, restricted or for specific users. Without an explicit license, users do not have a clear idea of what can be done with your data. Licenses can be standard licenses (Creative Commons, Open Data Commons Open Database License), bespoke licenses, or rights statements that indicate the conditions under which data can be reused.

It is highly recommended to use a standard, machine-readable license such that it can be interpreted by machines and humans. In order to inform users about what rights they have to use a dataset, the license information should be specified as part of the dataset’s metadata.
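One way to make license information machine-readable is to normalise it to an SPDX license identifier. The sketch below maps a few well-known license URLs to their SPDX identifiers; the table is a small illustrative subset and the function name is hypothetical.

```python
import re

# Small illustrative subset; the full SPDX license list is much longer.
LICENSE_URL_TO_SPDX = {
    "creativecommons.org/licenses/by/4.0": "CC-BY-4.0",
    "creativecommons.org/licenses/by-sa/4.0": "CC-BY-SA-4.0",
    "creativecommons.org/publicdomain/zero/1.0": "CC0-1.0",
    "opendatacommons.org/licenses/odbl/1-0": "ODbL-1.0",
}


def spdx_id(license_value: str):
    """Return the SPDX identifier for a license URL or identifier, else None."""
    value = license_value.strip()
    if value in LICENSE_URL_TO_SPDX.values():
        return value  # already an SPDX identifier from the table
    key = re.sub(r"^https?://(www\.)?", "", value.lower()).rstrip("/")
    return LICENSE_URL_TO_SPDX.get(key)
```

A result of `None` does not mean that no license is present; it means that the statement is free text or a bespoke license that a machine cannot interpret without further help.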

FAIR principle. R1.1. (Meta)data are released with a clear and accessible data usage license.

CoreTrustSeal Alignment. R2. The repository maintains all applicable licenses covering data access and use and monitors compliance.

FsF-R1.2-01M - Metadata includes provenance information about data creation or generation.

Description. Data provenance (also known as lineage) represents a dataset’s history, including the people, entities, and processes involved in its creation, management and longer-term curation. It is essential that data producers provide provenance information about the data to enable informed use and reuse. The levels of provenance information needed can vary depending on the data type (e.g., measurement, observation, derived data, or data product) and research domains. For that reason, it is difficult to define a set of finite provenance properties that will be adequate for all domains. Based on existing work, we suggest that the following provenance properties of data generation or collection are included in the metadata record as a minimum.

Sources of data, e.g., datasets the data is derived from and instruments
Data creation or collection date
Contributors involved in data creation and their roles
Data publication, modification and versioning information
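The four minimum provenance properties listed above can be turned into a simple checklist. In this sketch each property is satisfied by any one of several hypothetical metadata keys, and contributors must each state a role; the key names are assumptions for illustration and would differ between metadata schemas.

```python
# Each provenance property maps to hypothetical keys that would satisfy it.
PROVENANCE_GROUPS = {
    "sources": ("derived_from", "instrument"),
    "creation or collection date": ("date_created", "date_collected"),
    "publication, modification or versioning": (
        "date_published", "date_modified", "version"),
}


def provenance_gaps(metadata: dict) -> list:
    """Return the minimum provenance properties missing from `metadata`."""
    gaps = [
        label for label, keys in PROVENANCE_GROUPS.items()
        if not any(metadata.get(key) for key in keys)
    ]
    contributors = metadata.get("contributors") or []
    if not contributors or not all(c.get("role") for c in contributors):
        gaps.append("contributors with roles")
    return gaps
```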

There are various ways in which provenance information may be included in a metadata record. Some of the provenance properties (e.g., instrument, contributor) may be best represented using PIDs (such as DOIs for data, ORCIDs for researchers). This way, humans and systems can retrieve more information about each of the properties by resolving the PIDs. Alternatively, the provenance information can be given in a linked provenance record expressed explicitly in, e.g., PROV-O, PAV or the Vocabulary of Interlinked Datasets (VoID). Suitable metadata properties can also be used. For example, Dublin Core has been mapped to PROV in PROV-DC, which allows further mappings of other metadata standards to PROV.

FAIR principle. R1.2. (Meta)data are associated with detailed provenance.

CoreTrustSeal Alignment. R7. The repository guarantees the integrity and authenticity of the data.

FsF-R1.3-01M - Metadata follows a standard recommended by the target research community of the data.

Description. In addition to core metadata required to support data discovery (covered under metric FsF-F2-01M), metadata to support data reusability should be made available following community-endorsed metadata standards. Some communities have well-established metadata standards (e.g., geospatial: ISO19115; biodiversity: DarwinCore, ABCD, EML; social science: DDI; astronomy: International Virtual Observatory Alliance Technical Specifications) while others have limited standards or standards that are under development (e.g., engineering and linguistics). The use of community-endorsed metadata standards is usually encouraged and supported by domain and discipline-specific repositories.
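For XML metadata, the standard in use can often be recognised from the namespace of the root element. The sketch below uses a short, illustrative namespace table (ISO 19115 in its ISO 19139 XML encoding, DDI Codebook 2.5 and DataCite kernel 4); the table is far from complete and the function name is hypothetical.

```python
import xml.etree.ElementTree as ET

# Illustrative subset of root-element namespaces and the standards they indicate.
NAMESPACE_TO_STANDARD = {
    "http://www.isotc211.org/2005/gmd": "ISO 19115 (ISO 19139 XML encoding)",
    "ddi:codebook:2_5": "DDI Codebook 2.5",
    "http://datacite.org/schema/kernel-4": "DataCite Metadata Schema 4",
}


def detect_standard(xml_text: str):
    """Return the standard indicated by the root namespace, or None."""
    root = ET.fromstring(xml_text)
    if root.tag.startswith("{"):
        namespace = root.tag[1:].split("}", 1)[0]
        return NAMESPACE_TO_STANDARD.get(namespace)
    return None
```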

FAIR principle. R1.3. (Meta)data meet domain-relevant community standards.

CoreTrustSeal Alignment. R14. The repository enables reuse of the data over time, ensuring that appropriate metadata are available to support the understanding and use of the data.

FsF-R1.3-02D - Data is available in a file format recommended by the target research community.

Description. File formats are methods for encoding digital information, for example CSV for tabular data, NetCDF for multidimensional data and GeoTIFF for raster imagery. Data should be made available in a file format that is backed by the research community to enable data sharing and reuse. Consider, for example, file formats that are widely used and supported by the most commonly used software and tools. These formats should also be suitable for long-term storage and archiving; such formats are usually recommended by data repositories. They not only give greater certainty that your data can be read in the future, but also help to increase reusability and interoperability. Using community-endorsed formats enables data to be loaded directly into the software and tools used for data analysis and makes it easy to integrate your data with other data in the same preferred format. Community-endorsed formats are also easier to convert to a newer format if an older one becomes outdated.
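File extensions can be misleading, so a format check may instead look at the first bytes of a file. The sketch below recognises the signatures of NetCDF classic files, HDF5 (the container used by NetCDF-4) and TIFF (of which GeoTIFF is a profile). CSV has no signature and is therefore not detected; the function name and labels are hypothetical.

```python
# Leading byte signatures ("magic numbers") for a few binary formats.
SIGNATURES = [
    (b"CDF\x01", "NetCDF"),            # classic format
    (b"CDF\x02", "NetCDF"),            # 64-bit offset format
    (b"\x89HDF\r\n\x1a\n", "HDF5"),    # also used by NetCDF-4
    (b"II*\x00", "TIFF"),              # little-endian
    (b"MM\x00*", "TIFF"),              # big-endian
]


def sniff_format(path: str):
    """Return a format label based on the file's leading bytes, or None."""
    with open(path, "rb") as handle:
        head = handle.read(8)
    for magic, label in SIGNATURES:
        if head.startswith(magic):
            return label
    return None
```

The returned label could then be compared with the list of formats a given community or repository recommends.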

FAIR principle. R1.3. (Meta)data meet domain-relevant community standards.

CoreTrustSeal Alignment. R14. The repository enables reuse of the data over time, ensuring that appropriate metadata are available to support the understanding and use of the data.

Discipline-specific metrics for data assessment

One of the main objectives of the FAIR-IMPACT project is to build upon the current assessment metrics and tailor them to specific disciplines. Through extensive analysis of the current community practices, some metrics have been identified that can be made specific to certain communities.

For the social sciences community, the following discipline-specific metrics have been created which may be used instead of their corresponding domain-agnostic counterpart:

FsF-F2-01M-ss - Metadata includes descriptive core elements relevant for the social sciences to support data findability.

Description. Metadata is descriptive information about a data object. Since the metadata required differs depending on the users and their applications, this metric focuses on core metadata. The social science community has defined specific requirements for core metadata, and for the individual content to be described with it, in the CESSDA Metadata Model (CMM). These requirements are community-specific with respect to certain properties but coincide to a large extent with discipline-agnostic specifications such as common data citation guidelines (e.g., DataCite, ESIP, and IASSIST) and metadata recommendations for data discovery (e.g., EOSC Datasets Minimum Information (EDMI), DataCite Metadata Schema, W3C Recommendation Data on the Web Best Practices and Data Catalog Vocabulary).

FAIR principle. F2. Data are described with rich metadata.

CoreTrustSeal Alignment. Discovery & Identification R12. The repository enables users to discover the digital objects and refer to them in a persistent way through proper citation.

FsF-F4-01M-ss - Metadata is offered in such a way that it can be retrieved by machines for social sciences catalogues.

Description. This metric refers to the ways in which the metadata of data is exposed or provided in a standard, machine-readable format. In Europe, social sciences community catalogues largely rely on the availability of the standard metadata exchange protocol OAI-PMH. Such interfaces are used, for example, by the GESIS and CESSDA portals and are therefore relevant for the social sciences. Metadata should thus be made available via OAI-PMH for the social sciences community.
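An OAI-PMH harvest is a plain HTTP request whose query string carries one of six protocol verbs and its arguments. The sketch below builds such request URLs. The base URL in the usage note is a placeholder and the function name is hypothetical; `oai_dc` is the metadata prefix every OAI-PMH repository must support, while DDI-specific prefixes vary by repository and should be looked up with `ListMetadataFormats`.

```python
from urllib.parse import urlencode

# The six verbs defined by OAI-PMH 2.0.
OAI_VERBS = {
    "Identify", "ListMetadataFormats", "ListSets",
    "ListIdentifiers", "ListRecords", "GetRecord",
}


def oai_pmh_url(base_url: str, verb: str, **arguments) -> str:
    """Build an OAI-PMH request URL for `verb` with the given arguments."""
    if verb not in OAI_VERBS:
        raise ValueError(f"unknown OAI-PMH verb: {verb}")
    query = {"verb": verb, **arguments}
    return f"{base_url}?{urlencode(query)}"
```

For example, `oai_pmh_url("https://repository.example.org/oai", "ListRecords", metadataPrefix="oai_dc")` yields a URL that a catalogue harvester could request.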

FAIR principle. F4. (Meta)data are registered or indexed in a searchable resource.

CoreTrustSeal Alignment. Discovery & Identification R12. The repository enables users to discover the digital objects and refer to them in a persistent way through proper citation.

FsF-I2-01M-ss - Metadata uses semantic resources relevant for the social sciences research community.

Description. Metadata should incorporate controlled terms from community-specific semantic resources that unambiguously describe the contents so they can be processed automatically by machines.

Semantic resources registered in community catalogues (CESSDA and GESIS vocabulary services) that are relevant for the social sciences community should be preferred. These include, for example, the CESSDA Topic Classification, THESOZ and ELSST.

FAIR principle. I2. (Meta)data use vocabularies that follow FAIR principles.

CoreTrustSeal Alignment. Reuse R13. The repository enables reuse of the digital objects over time, ensuring that appropriate information is available to support understanding and use.

FsF-R1.1-01M-ss - Metadata includes licence information under which data can be reused within the scope of social sciences.

Description. This metric evaluates whether data is associated with a licence preferred by the social sciences community, because otherwise users cannot reuse it in a clear legal context. Within the social sciences, the Creative Commons family of licences is most frequently used and is therefore required.

It is highly recommended to use a standard, machine-readable licence such that it can be interpreted by machines and humans. In order to inform users about what rights they have to use a dataset, the licence information should be specified as part of the dataset’s metadata.

FAIR principle. R1.1. (Meta)data are released with a clear and accessible data usage licence.

CoreTrustSeal Alignment. Rights Management R02. The repository maintains all applicable rights and monitors compliance.

FsF-R1.3-01M-ss - Metadata follows a standard recommended by the social sciences (ss) research community of the data.

Description. For the social sciences, several well-established metadata standards exist, in particular the family of standards defined by the DDI (Data Documentation Initiative) Alliance, but other formats are also used within the community. A FAIR social sciences repository should support the following standards:

DDI Lifecycle
DDI Codebook
da|ra metadata
A discipline-agnostic metadata format that can be mapped to DDI: Schema.org, Dublin Core, DataCite or DCAT for dataset-level metadata description.

FAIR principle. R1.3. (Meta)data meet domain-relevant community standards.

CoreTrustSeal Alignment. Reuse R13. The repository enables reuse of the digital objects over time, ensuring that appropriate information is available to support understanding and use.

Further information

For more information on how these metrics were designed, and how they were implemented in the F-UJI tool, see:

Huber, R. (2025). FAIRsFAIR Data Object Assessment Metrics (0.8). Zenodo. https://doi.org/10.5281/zenodo.15045911
D5.1 Implementing metrics for automated FAIR digital objects assessment in a disciplinary context
M5.4 Practical tests for automated FAIR digital object assessment in disciplinary context
Huber, R. (2025). FAIR metrics for Earth & Environmental Sciences - preliminary results (Version 1). Zenodo. https://doi.org/10.5281/zenodo.15282756
M5.1 Reference collection of test data sets

As you read these metrics please bear in mind the following:

In the FAIR ecosystem, FAIR assessment must go beyond the object itself. FAIR enabling services and repositories are vital to ensure that research data objects remain FAIR over time.
Automated testing depends on clear, machine-accessible criteria. Some aspects (rich, plurality, accurate, relevant) specified in FAIR principles still require human mediation and interpretation.
Until domain/community-driven criteria such as schemas and usage elements have been agreed, the tests must focus on generally applicable data/metadata characteristics.

Metrics for data can be assessed using automated assessment tools. You can find the tools that FAIR-IMPACT works with here.

Please feel free to leave us a comment to share your thoughts with the FAIR-IMPACT community.
