Enhance the semantic functionality of the national Earth & Environmental Data Repository by integrating it with the EarthPortal

Interoperability
Metadata & Ontologies
Earth and environmental sciences
Dataterra (CNRS)


Overview

EaSy Data is the French repository for long-tail data relating to the Earth and the environment. It uses vocabularies initiated by scientists to fill in specific metadata elements such as topics or keywords. These vocabularies are incomplete and do not reflect the diversity of the data deposited in EaSy Data, although other vocabularies exist that could be used to complement them. Our aim is to improve the semantic functionality of EaSy Data by connecting it to the EarthPortal. By combining the vocabularies available in the EarthPortal and the EaSy Data vocabularies with associated services such as the Annotator, we can offer researchers a more comprehensive set of tools and new ways to specify metadata, thereby improving the semantic capabilities of EaSy Data. With four data clusters related to the Earth system and the environment (atmosphere, solid earth, continental surface and ocean), the French research infrastructure Data Terra wants to improve its data repository service, EaSy Data, through better use of semantics.

 

Context and objectives

EaSy Data is a French national data repository dedicated to long-tail data related to the Earth and the environment, which are described using the ISO 19115 standard. EaSy Data uses GeoNetwork as its back end to store metadata, and an ad hoc application layer has been developed to support the data repository and related functions (deposit, search).

 

Fig. 1: EaSy Data repository 

 

To fill in the metadata elements related to keywords or topics, community-controlled vocabularies are used. These vocabularies have been defined by scientists and are maintained in an experimental registry based on UKGovLD, which offers only limited services for administering vocabularies and suggesting new keywords.

 

Fig. 2: EaSy Data Thesaurus (UKGovLD) 

 

 

Furthermore, these vocabularies are incomplete and do not reflect the diversity of the data deposited in EaSy Data. Other vocabularies exist and could be used to complement them. For example, some of the thesauri produced by French data clusters, such as those supplied by Theia/Ozcar, or by European infrastructures, such as those defined by EPOS or ACTRIS, could be used.

 

Fig. 3: Example of Theia/Ozcar vocabulary (Skosmos)

 

Fig. 4: Example of ACTRIS vocabulary (Skosmos)

Thus, one of our aims is to improve the semantic functionality of EaSy Data by making these additional semantic artefacts (SA) available in it.

 

Challenges and solutions implemented

 

In EaSy Data, we need to use community vocabularies at several levels: for administration (to manage users by topic, e.g., moderators linked to specific topics), for depositors (to better describe datasets and avoid spelling inconsistencies, etc.) and for users performing searches (to improve search guidance). This is in line with the FAIR principles, which explicitly require the use of FAIR community vocabularies. Using a catalog to reference and update existing vocabularies, and potentially to host vocabularies specific to EaSy Data, adds significant value.

To achieve this goal, we need to change the vocabulary tool used in EaSy Data to better manage vocabularies and benefit from enhanced services.

The EarthPortal is a thematic semantic artefact catalog and repository for the Earth sciences using the OntoPortal technology. It has been deployed in the context of the FAIR-IMPACT Task 4.2 to host Earth and Environmental semantic artefacts (SA) that can be used by external applications through its REST API. Moreover, EarthPortal provides tools such as the Annotator (makes term suggestions based on text input), the Recommender (suggests relevant SA based on text input) and Mappings (generates, stores and displays mappings between SA). These tools could be useful to improve the EaSy Data semantic functionality.
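As an illustration of how an external application might query such a REST API, here is a minimal Python sketch. It assumes that EarthPortal follows the usual OntoPortal conventions (a /search endpoint with q, ontologies and pagesize parameters, results in a "collection" array, and an API key sent in the Authorization header). The base URL, the function names and the vocabulary acronyms in the usage note are assumptions made for the example and are not taken from this document.

```python
import json
from urllib.parse import urlencode
from urllib.request import Request, urlopen

# Assumed base URL of the EarthPortal REST API (not stated in this document).
API_BASE = "https://data.earthportal.eu"


def build_search_url(query, ontologies=(), page_size=20):
    """Build a term-search URL in the usual OntoPortal /search style."""
    params = {"q": query, "pagesize": page_size}
    if ontologies:
        # Restrict the search to the given semantic artefacts (acronyms).
        params["ontologies"] = ",".join(ontologies)
    return f"{API_BASE}/search?{urlencode(params)}"


def parse_search_results(payload):
    """Reduce a search response to (preferred label, concept IRI) pairs."""
    return [
        (item.get("prefLabel"), item.get("@id"))
        for item in payload.get("collection", [])
    ]


def search_terms(query, api_key, ontologies=()):
    """Run the live query; requires network access and a valid API key."""
    request = Request(
        build_search_url(query, ontologies),
        headers={"Authorization": f"apikey token={api_key}"},
    )
    with urlopen(request, timeout=30) as response:
        return parse_search_results(json.load(response))
```

A call such as search_terms("soil moisture", api_key, ["SWEET", "SOSA"]) would then return candidate concepts that a deposit form could offer as keywords. The acronyms are illustrative and would need to match the identifiers actually used in the portal.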

 

Fig. 5: EarthPortal homepage

 

 

Data Terra's thesauri, vocabularies produced by European infrastructures, and other more generic but commonly used SA (e.g. SOSA, SWEET) are already available in the EarthPortal.

 

Fig. 6: Example of semantic artefacts in the EarthPortal 

 

EaSy Data will harvest these vocabularies directly through the EarthPortal's existing REST API, in order to offer users more terms than those initially defined. EaSy Data will also use the annotation service offered by the EarthPortal.

This will improve the user experience in two ways:

  • To populate keyword and topic metadata, the EarthPortal will allow the user to consult vocabularies in addition to those currently used by EaSy Data.
  • The Annotator service will suggest terms drawn from the user-written abstract to enrich the metadata.
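The second point can be sketched as follows. This is a minimal Python example, assuming that the EarthPortal Annotator follows the usual OntoPortal conventions: an /annotator endpoint taking text and ontologies parameters, and a JSON response in which each entry holds an "annotatedClass" and a list of "annotations" with the matched text. The base URL, the function names and the sample IRIs are hypothetical.

```python
from urllib.parse import urlencode

# Assumed base URL of the EarthPortal REST API (not stated in this document).
API_BASE = "https://data.earthportal.eu"


def build_annotator_url(text, ontologies=()):
    """Build an Annotator request URL in the usual OntoPortal style."""
    params = {"text": text, "longest_only": "true"}
    if ontologies:
        params["ontologies"] = ",".join(ontologies)
    return f"{API_BASE}/annotator?{urlencode(params)}"


def suggest_keywords(annotator_response):
    """Turn an Annotator response into de-duplicated keyword suggestions.

    Returns (concept IRI, matched text) pairs sorted by IRI, keeping only
    the first match recorded for each concept.
    """
    suggestions = {}
    for entry in annotator_response:
        concept_iri = entry["annotatedClass"]["@id"]
        for match in entry.get("annotations", []):
            suggestions.setdefault(concept_iri, match["text"].lower())
    return sorted(suggestions.items())


# Hand-written sample shaped like an OntoPortal Annotator response.
sample_response = [
    {
        "annotatedClass": {"@id": "http://example.org/c/precipitation"},
        "annotations": [
            {"from": 7, "to": 19, "matchType": "PREF", "text": "PRECIPITATION"},
            {"from": 40, "to": 52, "matchType": "PREF", "text": "PRECIPITATION"},
        ],
    },
    {
        "annotatedClass": {"@id": "http://example.org/c/aerosol"},
        "annotations": [
            {"from": 25, "to": 31, "matchType": "SYN", "text": "AEROSOL"},
        ],
    },
]
keywords = suggest_keywords(sample_response)
```

Keeping one suggestion per concept means the depositor sees each candidate keyword once, even when the abstract mentions it several times, and remains free to accept or reject it.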

 

Expected/Measured Impacts

 

From a user perspective, we expect several improvements that should enhance the FAIRness of the datasets and related publications in EaSy Data:

  • Better access to the vocabularies and the possibility of contributing to them;
  • An extended semantic description of the datasets, allowing better discovery of related resources;
  • Semantically enriched dataset metadata, obtained by using the related SA to suggest relevant vocabulary concepts.

From an administrator's point of view, we expect better management of the EaSy Data vocabularies and smarter use of other vocabularies.

 

 

 


Contributors

C. Pierkot
G. Alviset
H. Bressan