Semantic Artefacts Alignment for Improving Interoperability in Astronomy

Interoperability
Metadata & Ontologies
Physical and Technical Sciences
INRAE
Observatoire de Paris



Short Use Case overview

The astronomy community is structured in several sub-communities, with mature but siloed semantic artefact ecosystems. This use case brings all their semantic artefacts into the same catalogue. The goal is to improve the semantic interoperability between the astronomy communities, and in turn the FAIRness of the semantic artefacts.

 

Use case description

The astronomy community is composed of three main sub-communities, each with its own semantic practices:

  • celestial astronomy (objects are referenced by their sky coordinates, e.g., stars and galaxies);
  • planetary sciences (the study of Solar System objects, e.g., planets, comets and asteroids);
  • heliophysics (the study of the Sun and of the plasma environments throughout the Solar System).

Each of these sub-communities has developed its own interoperability and semantic ecosystem, and these ecosystems have so far remained rather siloed.

 

Context and objectives

The sky astronomy community is organized around the IVOA (International Virtual Observatory Alliance, https://ivoa.net), which maintains an operational interoperability framework used by data repositories and science application platforms throughout the world. In this community, semantic artefacts (SA) are composed of terms (vocabularies) and schemas (data models). The Semantics Working Group of the IVOA manages the vocabularies used in the IVOA standards. The vocabularies are available from a dedicated web page (https://ivoa.net/rdf) and are accessible using IVOA or RDF tooling.

 

The planetary science community is less organized than the sky astronomy one. Two main frameworks co-exist with different scopes: the IPDA (International Planetary Data Alliance, https://planetarydata.org), which proposes an advanced data archiving information model for planetary exploration datasets; and the OGC (Open Geospatial Consortium, https://www.ogc.org/), whose framework is used by the teams studying planetary surfaces.

 

The heliophysics community is organized around the IHDEA (International Heliophysics Data Environment Alliance, https://ihdea.net), which proposes a set of tools and standards for finding and accessing datasets in this domain. Semantic artefacts in this community were historically of two kinds:

  • the SPASE (Space Physics Archive Search and Extract, https://spase-group.org) Ontology[1] (XML schema), which includes lists of terms, properties and classes for defining various objects (Persons, Observatories, Instruments, Datasets, Repositories, etc.);
  • the SOLARNET set of keywords[2] (dedicated to solar observations).

 

The IVOA, IPDA and IHDEA alliances are all worldwide working groups, consensus-driven and bottom-up, and based on best-effort contributions. Interdisciplinary links between these communities have been developed thanks to the Europlanet/VESPA (http://www.europlanet-vespa.eu/) project, focusing on discoverability and on the implementation of plugins to extend the capabilities of existing tools. Work on semantic interoperability across the sub-communities started only recently, with the ongoing development of two common semantic artefacts: a vocabulary for “observation facilities” and another one for “reference frames”.

 

The objective of this use case is, as a first step, to enable semantic interoperability between the sub-communities of astronomy, and then to explore interoperability with neighboring fields such as the Earth and environmental sciences, or particle physics.

 

Challenges and solutions implemented 

 

Within FAIR-IMPACT Task 4.2, an OntoPortal instance has been set up with the goal of gathering semantic artefacts from the various astronomy sub-communities in the same place. The astronomy ontology portal is now available, and a series of relevant semantic artefacts have been ingested therein (39 SA, at the time of writing), covering sky astronomy and heliophysics.

 

The main challenge of this use case is to produce semantic artefacts in a form (RDF, OWL or SKOS) that can be ingested into the OntoPortal instance. Most of the current semantic artefacts (except those from sky astronomy) are in diverse forms, from lists of terms in XML schemas to unformatted lists of metadata in specification documents.
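
As an illustration of what such a lifting can look like, the following minimal Python sketch turns a flat list of terms into a SKOS concept scheme serialized as Turtle, using only the standard library. It is not the procedure used by any of the communities mentioned here: the scheme URI, the Term structure, the function names and the example terms are all hypothetical, and a production workflow would more likely rely on a dedicated RDF library.

```python
import re
from dataclasses import dataclass
from typing import Optional

# Hypothetical scheme URI, used only for this illustration.
SCHEME_URI = "https://example.org/vocab/demo"

# Conservative subset of what Turtle accepts as a prefixed local name.
_LOCAL_NAME = re.compile(r"^[A-Za-z_][A-Za-z0-9_-]*$")


@dataclass
class Term:
    identifier: str                # local name of the concept
    label: str                     # human-readable preferred label
    definition: str                # short textual definition
    broader: Optional[str] = None  # identifier of the parent term, if any


def turtle_literal(text: str, lang: str = "en") -> str:
    """Quote a string as a single-line Turtle literal with a language tag."""
    escaped = (
        text.replace("\\", "\\\\")
        .replace('"', '\\"')
        .replace("\n", "\\n")
        .replace("\r", "\\r")
    )
    return f'"{escaped}"@{lang}'


def terms_to_skos_turtle(terms: list[Term], title: str) -> str:
    """Serialize a list of terms as a SKOS concept scheme in Turtle."""
    known = {term.identifier for term in terms}
    blocks = [
        "@prefix skos: <http://www.w3.org/2004/02/skos/core#> .\n"
        "@prefix dcterms: <http://purl.org/dc/terms/> .\n"
        f"@prefix : <{SCHEME_URI}#> .",
        f"<{SCHEME_URI}> a skos:ConceptScheme ;\n"
        f"    dcterms:title {turtle_literal(title)} .",
    ]
    for term in terms:
        if not _LOCAL_NAME.match(term.identifier):
            raise ValueError(f"unsafe identifier: {term.identifier!r}")
        if term.broader is not None and term.broader not in known:
            raise ValueError(f"unknown broader term: {term.broader!r}")
        statements = [
            "a skos:Concept",
            f"skos:inScheme <{SCHEME_URI}>",
            f"skos:prefLabel {turtle_literal(term.label)}",
            f"skos:definition {turtle_literal(term.definition)}",
        ]
        if term.broader is not None:
            statements.append(f"skos:broader :{term.broader}")
        blocks.append(
            f":{term.identifier} " + " ;\n    ".join(statements) + " ."
        )
    return "\n\n".join(blocks) + "\n"


# Invented example terms, not taken from any existing vocabulary.
EXAMPLE_TERMS = [
    Term("body", "Solar System body", "Any natural object of the Solar System."),
    Term("planet", "Planet", "A planet of the Solar System.", broader="body"),
]

print(terms_to_skos_turtle(EXAMPLE_TERMS, "Demo vocabulary"))
```

The sketch rejects identifiers that are not safe as Turtle local names and parents that are not in the list, since both kinds of problem are common when terms come from unformatted documents.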

 

This lifting is to be done by the semantic working groups or authorities of the relevant communities (e.g., the IVOA Semantics WG or the dedicated IHDEA teams), with the support of the ObsParis FAIR-IMPACT team. This interaction has started, as shown in the few examples below.

 

In the IVOA context, all the semantic artefacts (lists of terms) were available on a web page, and an RDF version of each SA was available on its landing page. The IVOA vocabularies are managed according to an IVOA recommendation[1] defining rules and conventions, and specifically how the RDF version of the IVOA SA should be designed, with a limited subset of SKOS and OWL properties. This design decision aims to limit external dependencies and ensure the sustainability of the IVOA infrastructure.

 

Semantic artefact management in the IVOA relies on VEPs[2] (Vocabulary Enhancement Proposals), a process to propose, update or deprecate a term. The VEP process requires a consensus-based decision, after a community discussion.

 

In the IHDEA context, a general overhaul of the semantic artefacts has been under way since the 2023 IHDEA meeting. The previous state was based on the SPASE information model, serialized solely as an XML schema. The lists of allowed values are embedded in the SPASE schema, which requires frequent releases of new schema versions (e.g., for each new item in a list of allowed values). It became clear that many lists of terms needed to be updated, and that convergence with the IVOA semantic artefacts was desirable. The first work on a joint semantic artefact concerns a vocabulary for “solar system reference frames”, which will be merged with the IVOA RefFrame vocabulary[3].
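
To make the idea of value lists embedded in an XML schema more concrete, the following Python sketch extracts enumerated values from named simple types of an XSD document, which is a typical first step before publishing them as standalone vocabularies. The schema fragment, type names and function name are invented for this illustration and are not taken from the SPASE schema, whose actual layout may differ.

```python
import xml.etree.ElementTree as ET

# Namespace of the W3C XML Schema language, in ElementTree notation.
XS = "{http://www.w3.org/2001/XMLSchema}"

# Invented schema fragment for illustration; NOT taken from the SPASE schema.
SAMPLE_XSD = """
<xs:schema xmlns:xs="http://www.w3.org/2001/XMLSchema">
  <xs:simpleType name="ExampleRegion">
    <xs:restriction base="xs:string">
      <xs:enumeration value="Sun"/>
      <xs:enumeration value="Heliosphere"/>
    </xs:restriction>
  </xs:simpleType>
  <xs:simpleType name="ExampleRole">
    <xs:restriction base="xs:string">
      <xs:enumeration value="Author"/>
    </xs:restriction>
  </xs:simpleType>
  <xs:simpleType name="FreeText">
    <xs:restriction base="xs:string"/>
  </xs:simpleType>
</xs:schema>
""".strip()


def extract_enumerations(xsd_text: str) -> dict[str, list[str]]:
    """Map each named simple type to its list of enumerated values.

    Simple types without a name or without enumeration values are skipped.
    """
    root = ET.fromstring(xsd_text)
    result: dict[str, list[str]] = {}
    for simple_type in root.iter(f"{XS}simpleType"):
        name = simple_type.get("name")
        values = [
            enum.get("value")
            for enum in simple_type.iter(f"{XS}enumeration")
            if enum.get("value") is not None
        ]
        if name and values:
            result[name] = values
    return result


for type_name, allowed in extract_enumerations(SAMPLE_XSD).items():
    print(type_name, allowed)
```

Once the lists are extracted in this way, each one can be maintained and versioned independently of the schema, so that adding a term no longer requires a new schema release.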

 

The goal of the FAIR-IMPACT use case with the IVOA and IHDEA communities is to enhance the quality of the semantic artefacts, especially their interoperability. Part of the planned outcome is to update the semantic artefact management practices (e.g., a new version of the “Vocabularies in the VO” recommendation).

 

Expected/Measured Impacts 

 

From the point of view of the interoperability alliances, setting up the OntoPortal instance and assessing and preparing the semantic artefacts have brought both quantitative and qualitative improvements.

  • IVOA context: As an example, a revision of the “Vocabularies in the VO” document (https://ivoa.net/documents/Vocabularies) is being prepared to include a set of terms required for fixing SKOS-based semantic artefact catalogues in the IVOA. This work also started an ongoing fundamental discussion on reusing external semantic artefacts, versus keeping things “simple” (but disconnected) by reducing external semantic dependencies.
  • IHDEA context: Several teams have started working on producing linked data and metadata using RDF tooling. The OntoPortal instance for astronomy has been an important incentive for the adoption of this framework.

 

From a user perspective, the enhanced FAIRness of the semantic artefacts will enhance the FAIRness of published datasets. The semantic artefacts can be plugged into smart DMP tooling (URIs for terms, rather than free text) or search interfaces, and in turn allow more refined queries and selections on data discovery interfaces. In this respect, the astronomy thematic pilot of the OSTrails (https://ostrails.eu) project will build on the semantic artefact catalogue tooling developed thanks to FAIR-IMPACT.

 


Contributors

Baptiste Cecconi, Observatoire de Paris
Laura Debisschop, Observatoire de Paris
Sophie Aubine, INRAE