Advancing access interoperability with ODRL

Interoperability
Metadata & Ontologies
Social Sciences and Humanities
UESSEX-UKDS/CESSDA


Overview

The UK Data Service (UKDS) is a partnership between the University of Essex, the University of Manchester, UCL, the University of Edinburgh and Jisc, and is the UK service provider to the Consortium of European Social Science Data Archives (CESSDA).

This use case has two aims: first, to enable machine actors to better interpret the currently ambiguous semantics of digital objects' access and usage conditions; and second, to provide more specific guidance on how to encapsulate the definition and execution of access and usage conditions in FAIR signposting practice. FAIR signposting references "license" as a link type, but beyond pointing to natural-language license statements, this mechanism currently offers little scope for subsequent machine-actionable negotiation and execution of the access/usage conditions attached to a digital object.
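
To illustrate the limitation, the following minimal sketch reads an HTTP Link header of the kind FAIR signposting uses and extracts the rel="license" targets. All a client recovers is a URI; nothing in the header states which actions are permitted or under what conditions. The example URLs are hypothetical, and the regex-based parser is a simplification of RFC 8288 that assumes well-formed headers with no commas or semicolons inside quoted parameter values.

```python
import re

# Simplified RFC 8288 parsing: a target in <...> followed by ;-separated params.
# Assumes no commas or semicolons inside quoted parameter values.
LINK_RE = re.compile(r'<([^>]*)>((?:\s*;\s*[^;,]+)*)')
PARAM_RE = re.compile(r';\s*([A-Za-z*-]+)\s*=\s*"?([^";,]*)"?')


def parse_link_header(header: str) -> list[dict[str, str]]:
    """Return one dict per link: {'href': ..., 'rel': ..., other params}."""
    links = []
    for href, params in LINK_RE.findall(header):
        link = {"href": href}
        for key, value in PARAM_RE.findall(params):
            link[key.lower()] = value.strip()
        links.append(link)
    return links


def license_links(header: str) -> list[str]:
    """Targets whose rel (a space-separated list) includes 'license'."""
    return [
        link["href"]
        for link in parse_link_header(header)
        if "license" in link.get("rel", "").split()
    ]


# Hypothetical response header for a study landing page.
HEADER = (
    '<https://example.org/doi/study-1234> ; rel="cite-as", '
    '<https://example.org/licence/end-user-licence.html> ; rel="license"'
)

print(license_links(HEADER))
# ['https://example.org/licence/end-user-licence.html']
```

The only machine-readable result is the address of a prose document; interpreting the conditions behind it still requires a human reader.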

Access and usage conditions are typically specified, asserted and managed locally. Beyond a few shared but loosely understood categories such as "Open" and "Closed", access categories are largely bespoke to each repository. For example, UKDS has three top-level categories (Open, Safeguarded and Controlled) and also supports embargoes, whereas OpenAIRE uses openAccess, restrictedAccess, embargoedAccess and closedAccess. Such access categories are pivotal for FAIR, but they encapsulate and signify a set of complex attributes, constraints and workflows in a currently non-normative way. For long-term global interoperability, these locally defined high-level categories are of little practical use beyond simple discovery, because their precise meaning is known only to the repository that assigns them.

An access category is normally assigned as the end result of a (typically human) assessment of the intersection of (a) the regulatory/legal context, (b) the rights and usage prescriptions of the data owner, and (c) the disclosure risk of the data (itself a function of inherent properties of the data in isolation as well as emergent properties when the data is combined with other data). These assessments are often non-deterministic.

W3C standards such as ODRL (Open Digital Rights Language) have emerged, which allow natural-language rights statements to be formally represented as structured RDF data. This use case will deliver the first comprehensive coverage of ODRL statements for a national collection, published in the landing pages of UKDS "studies" (the primary object, which acts as a container for datasets and documentation). This is a foundational first step towards providing a machine-actionable counterpart to hitherto natural-language artefacts such as licenses and data-sharing agreements.
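
As an illustration of what such a statement might look like, the following Python sketch builds a small ODRL Offer in JSON-LD and wraps it in a script element of type application/ld+json for a landing page. It uses terms from the W3C ODRL vocabulary (use, attribute, distribute, purpose, eq) and the ODRL JSON-LD context, but the policy content (use for a stated purpose, an attribution duty, a prohibition on redistribution), every IRI, and the choice of embedding mechanism are hypothetical assumptions, not UKDS's actual policies or implementation.

```python
import json

ODRL_CONTEXT = "http://www.w3.org/ns/odrl.jsonld"


def build_offer(policy_iri: str, study_iri: str, assigner_iri: str, purpose_iri: str) -> dict:
    """Hypothetical ODRL Offer: use for a stated purpose, with attribution, no redistribution."""
    return {
        "@context": ODRL_CONTEXT,
        "@type": "Offer",
        "uid": policy_iri,
        "permission": [{
            "target": study_iri,
            "assigner": assigner_iri,
            "action": "use",
            # The permission applies only when the purpose equals the given concept.
            "constraint": [{
                "leftOperand": "purpose",
                "operator": "eq",
                "rightOperand": {"@id": purpose_iri},
            }],
            # Duty the assignee must fulfil when exercising the permission.
            "duty": [{"action": "attribute"}],
        }],
        "prohibition": [{
            "target": study_iri,
            "assigner": assigner_iri,
            "action": "distribute",
        }],
    }


def to_script_tag(policy: dict) -> str:
    """Serialise the policy as JSON-LD inside an HTML script element."""
    # Escape "</" so no string value can prematurely close the <script> element.
    body = json.dumps(policy, indent=2).replace("</", "<\\/")
    return f'<script type="application/ld+json">\n{body}\n</script>'


# All IRIs below are hypothetical placeholders.
policy = build_offer(
    policy_iri="https://example.org/policy/study-1234",
    study_iri="https://example.org/study/1234",
    assigner_iri="https://example.org/org/data-service",
    purpose_iri="https://example.org/vocab/purpose/non-commercial-research",
)
print(to_script_tag(policy))
```

Embedding the policy directly in the page keeps it alongside the human-readable description; serving it as a separate resource would be an equally plausible design.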


Description

For researchers, accessing data, particularly sensitive data, is too complex and takes far longer than it should. Much effort has been devoted to machine-actionable implementations of the FAIR principles, but less progress has been made in the access arena. Access and usage conditions derive from the intersection of several factors: legal and statutory obligations, rights management assertions, external prescriptions from data owners, and intrinsic properties of the digital object (e.g. more disclosive data will inevitably require more stringent access protocols). With global recognition that interoperability will lead to better services for researchers, access is no longer a second-order problem. Mediating researcher access to data can no longer be left primarily to humans' best administrative efforts, still largely informed by natural-language license artefacts. Rights statements, legal obligations and access workflows need to be systematically modelled and implemented in metadata and code so that machines can execute them at scale. ODRL, while not complete in its coverage of all aspects of rights and access management, is currently the most practicable way forward to better access interoperability.

Challenges that need to be addressed

Attempting to harmonize top-level access categories across domains and repositories is unlikely to be fruitful, given that CESSDA's data access policy took several years to reach agreement on even the most coarse-grained access categories. We will instead pursue a more granular, bottom-up approach that establishes core vocabularies for the key ODRL classes (i.e. Parties, Permissions, Obligations, Prohibitions and Actions), together with best practices for representing them in ODRL policies. In practice, a finite number of terms in these core vocabularies covers the majority of access-related repository activities. Once these terms are available for use in ODRL policies, this will be a significant step forward in modelling traditionally prose-based access/usage condition statements. It is also a precursor to a future goal (not in the scope of this use case) of connecting ODRL Actions to machine-actionable workflow definitions modelled in, for example, Common Workflow Language, leading to full end-to-end machine-actionable messaging and process choreography between repositories.
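
A core vocabulary becomes useful once policies can be checked against it. The minimal sketch below assumes a hypothetical core vocabulary, represented as Python sets of terms taken from the ODRL Common Vocabulary and chosen purely for illustration, and reports any Action or constraint leftOperand in a policy that falls outside it. It handles only compact policies in which actions are plain strings; Party vocabularies and refined actions are left out for brevity.

```python
# Hypothetical core vocabulary: an illustrative selection of ODRL terms,
# not an agreed CESSDA or UKDS vocabulary.
CORE_VOCAB = {
    "action": {"use", "reproduce", "distribute", "attribute", "delete",
               "anonymize", "inform", "obtainConsent"},
    "leftOperand": {"purpose", "recipient", "dateTime", "spatial"},
}

RULE_KEYS = ("permission", "prohibition", "obligation")


def iter_rules(policy: dict):
    """Yield every top-level rule plus any duties attached to it."""
    for key in RULE_KEYS:
        for rule in policy.get(key, []):
            yield rule
            yield from rule.get("duty", [])


def unknown_terms(policy: dict, vocab: dict = CORE_VOCAB) -> list[tuple[str, str]]:
    """Return sorted (kind, term) pairs that are not in the core vocabulary."""
    found = set()
    for rule in iter_rules(policy):
        action = rule.get("action")
        # Only compact string actions are checked; refined actions are skipped.
        if isinstance(action, str) and action not in vocab["action"]:
            found.add(("action", action))
        for constraint in rule.get("constraint", []):
            left = constraint.get("leftOperand")
            if isinstance(left, str) and left not in vocab["leftOperand"]:
                found.add(("leftOperand", left))
    return sorted(found)


example = {
    "permission": [{
        "action": "use",
        "constraint": [{"leftOperand": "purpose", "operator": "eq", "rightOperand": "research"}],
        "duty": [{"action": "attribute"}],
    }],
    "prohibition": [{"action": "sell"}],
}
print(unknown_terms(example))  # [('action', 'sell')]
```

Here sell is a valid ODRL action but is flagged because it is outside the illustrative core set; a real deployment would have to decide whether to extend the vocabulary or reject the policy.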
 
The machine-actionable access arena is relatively immature compared with, for example, practices around discovery. Even communicating why this matters is not trivial for communities that are administratively and culturally accustomed to treating researchers' access to data as a largely human-mediated activity.
 
As well as providing a real-world production implementation of ODRL, we will provide guidance that is both technical and governance-related: the terminological overlaps and relationships between licensing, rights management, access and usage (among others) remain a barrier to articulating problem statements precisely and designing solutions in response.

Expected Impact of the Use Case

Working with our partners in CESSDA in the newly created Sensitive Data Working Group, UKDS will endeavour to be the exemplar for an initial real-world implementation of ODRL and will encourage and advocate for the uptake of similar practice by other Service Providers in CESSDA.
 
We expect the benefits for data consumers to include:

  • Medium-term:
    • transparency and efficiency in requesting data
    • consistency of access experience across different service providers
  • Longer term:
    • automated processes and services across service providers
    • foundational infrastructure for future B2B federation of access workflows

For service providers, the standardized and structured approach to access through ODRL and associated controlled vocabularies will provide:

  • Medium-term:
    • Guidance on minimal best practices and design patterns for new systems development
    • Equity and transparency in processing access requests
  • Longer term:
    • The ability to track and evaluate access requests more systematically, providing more robust evidence to inform improvements to access management practices
    • Opportunities for service providers to participate in multi-organisational and cross-domain collaborations 

Expected outputs

Tangible outcomes/solutions

  • Best practice documentation for embedding machine-actionable ODRL statements in resource landing pages, including how this interacts with current FAIR signposting practice.
  • Production implementation in UKDS catalogue.
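
One possible pattern for the first output, offered here purely as an assumption rather than the guidance this use case will produce, is to keep the existing rel="license" link to the natural-language license and add a second license link whose media type marks it as a machine-readable ODRL policy. The minimal sketch below serialises such an HTTP Link header; the function name and all URLs are hypothetical.

```python
def format_link_header(links: list[tuple[str, dict[str, str]]]) -> str:
    """Serialise (target, params) pairs into one HTTP Link header value (RFC 8288 style)."""
    parts = []
    for target, params in links:
        attrs = "".join(f' ; {name}="{value}"' for name, value in params.items())
        parts.append(f"<{target}>{attrs}")
    return ", ".join(parts)


header = format_link_header([
    # Existing signposting link to the human-readable license.
    ("https://example.org/licence/end-user-licence.html",
     {"rel": "license", "type": "text/html"}),
    # Hypothetical additional link to the ODRL policy for the same study.
    ("https://example.org/policy/study-1234.jsonld",
     {"rel": "license", "type": "application/ld+json"}),
])
print(header)
```

A client could then select the policy by media type instead of following a prose document, while clients unaware of ODRL continue to see a conventional license link.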


Contributors

Darren Bell - UESSEX-UKDS/CESSDA
Hervé L’Hours - UESSEX-UKDS/CESSDA