Summary of discussions on indicators on GitHub

16 Oct 2019

The FAIR Data Maturity Model Working Group has been working since the
beginning of 2019 to develop a set of indicators for the evaluation of the
FAIR principles.
The work started with a landscaping exercise
<...JZG0U4Hg/edit#gid=2080819087> looking at a dozen existing evaluation
approaches to find out which kinds of aspects these approaches were
evaluating. The results of the landscaping exercise were discussed by the
working group and augmented through the use of a collaborative document
<...JZG0U4Hg/edit#gid=11147031> where members of the WG suggested additions and
modifications of the first set of indicators.
Further discussions took place on the group's mailing list, in a set of
meetings on 21 February, 3 April (at the 13th RDA plenary), 18 June and
12 September, and on GitHub.
The current state of the indicators, as of early October 2019, is now
frozen, with the exception of the indicators for the principles that are
concerned with 'richness' of metadata (F2 and R1, see below). The current
indicators will be used in a testing phase where owners of evaluation
approaches are going to be invited to compare their approaches
(questionnaires, tools) against the indicators. As such, the current set of
indicators can be seen as an 'alpha version'. In the first half of 2020, the
indicators may be revised and improved, based on the results of the testing.
Indicators for F1: (meta)data are assigned a globally unique and eternally
persistent identifier
https://github.com/RDA-FAIR/FAIR-data-maturity-model-WG/issues/15
There are four indicators for this principle, two for metadata and two for
data. The indicators make a distinction between persistence and uniqueness
of the identifier. For example, an HTTP URI could be unique, but its
persistence is not guaranteed. On the wording of the indicators for
uniqueness, the working group decided to use the term 'universally unique'
to include objects off-planet.
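As a rough illustration of why the indicators treat uniqueness and
persistence separately, the sketch below checks the two properties
independently. The scheme lists, the function names and the example
identifiers are assumptions made for this illustration; they are not part of
the working group's indicators.

```python
# Assumption: identifier schemes whose names are unique by construction,
# because naming is delegated to a global authority.
GLOBALLY_UNIQUE_SCHEMES = {"http", "https", "doi", "hdl", "urn", "ark"}

# Assumption: schemes backed by an explicit persistence commitment.
# Plain HTTP(S) URIs are left out on purpose: they can be unique, but
# nothing in the scheme itself guarantees that they keep resolving.
PERSISTENT_SCHEMES = {"doi", "hdl", "urn", "ark"}


def identifier_scheme(identifier: str) -> str:
    """Return the lower-cased prefix before the first ':', or '' if none."""
    scheme, separator, _ = identifier.partition(":")
    return scheme.lower() if separator else ""


def is_globally_unique(identifier: str) -> bool:
    """Uniqueness check, based only on the assumed scheme list."""
    return identifier_scheme(identifier) in GLOBALLY_UNIQUE_SCHEMES


def is_persistent(identifier: str) -> bool:
    """Persistence check, based only on the assumed scheme list."""
    return identifier_scheme(identifier) in PERSISTENT_SCHEMES
```

With these assumptions, a hypothetical identifier such as
"https://example.org/dataset/42" passes the uniqueness check but not the
persistence check, while "doi:10.1234/example" passes both. A real
evaluation would need more than the scheme prefix; for instance, a DOI
written as an HTTPS URL would be misjudged by this sketch.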
Indicators for F2: data are described with rich metadata
https://github.com/RDA-FAIR/FAIR-data-maturity-model-WG/issues/16
In the discussion in the working group it was acknowledged that there is no
clear definition of 'richness' of metadata. It was noted that the metadata
under this principle would be focused on the discovery aspect and that the
main source for definition of 'richness' should be a domain or
discipline-specific metadata standard.
Two indicators were proposed, one to recommend the use of a
domain/discipline standard, and one that recommends providing metadata
according to a set of metadata elements proposed by the RDA Metadata
Interest Group.
Further discussion has been postponed until a joint meeting on the 25th of
October 2019 at the 14th RDA plenary with several metadata-related RDA
groups: https://www.rd-alliance.org/metadata-fair-data.
Indicators for F3: metadata clearly and explicitly include the identifier of
the data it describes
https://github.com/RDA-FAIR/FAIR-data-maturity-model-WG/issues/17
There is a single indicator for this principle. It was noted that there is
not always a clear distinction between the identifier for metadata and the
identifier for data. For example, in a DOI/DataCite publication approach,
the DOI often resolves to a landing page that includes metadata and the URL
for access to the data. This discussion is ongoing.
Indicators for F4: (meta)data are registered or indexed in a searchable
resource
https://github.com/RDA-FAIR/FAIR-data-maturity-model-WG/issues/18
Initially, a set of indicators was proposed that distinguished various
places where metadata could be located and indexed: general search engines,
domain-specific portals and institutional repositories. However, it was
noted that it would be difficult and not even useful to enumerate the places
where metadata could be registered and indexed. The outcome of the
discussion was that the important aspect is that the metadata is made
available for harvesting and indexing by any system that is willing to do
that. As a result, a single indicator was retained for this principle.
Indicators for A1: (meta)data are retrievable by their identifier using a
standardised communications protocol
https://github.com/RDA-FAIR/FAIR-data-maturity-model-WG/issues/19
This principle combines two main aspects: one is the retrievability of the
metadata and data, the other is the use of a standard communications
protocol.
Two sets of indicators have been defined for this principle, three for
metadata and four for data. In the case of metadata, it was proposed to
include an indicator concerning access conditions in addition to indicators
for retrieval of a metadata record and for access through a standardised
protocol. For data, two indicators are defined for the way that data can be
retrieved (manually or automatically), one indicator for the retrieval of
a digital object and one for data access through a standardised protocol.
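The protocol aspect of this principle can be sketched as a check that is
reported separately for metadata and for data. The protocol list and the
record keys "metadata_url" and "data_url" are assumptions for this example,
not names used by the working group, and the sketch does not attempt the
actual retrieval.

```python
from urllib.parse import urlparse

# Assumption: a small, non-exhaustive set of standardised protocols.
STANDARD_PROTOCOLS = {"http", "https", "ftp", "sftp"}


def uses_standard_protocol(access_url: str) -> bool:
    """True if the URL scheme names one of the assumed standard protocols."""
    return urlparse(access_url).scheme.lower() in STANDARD_PROTOCOLS


def a1_protocol_check(record: dict) -> dict:
    """Report the protocol aspect separately for metadata and data.

    A missing URL is treated as not satisfying the check.
    """
    return {
        "metadata": uses_standard_protocol(record.get("metadata_url", "")),
        "data": uses_standard_protocol(record.get("data_url", "")),
    }


# Hypothetical record: metadata over HTTPS, data over a made-up protocol.
example_record = {
    "metadata_url": "https://example.org/metadata/42",
    "data_url": "customproto://example.org/data/42",
}
```

For the hypothetical record above, the metadata passes the protocol check
and the data does not, which mirrors the split of the indicators into a
metadata set and a data set.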
Indicators for A1.1: the protocol is open, free, and universally
implementable
https://github.com/RDA-FAIR/FAIR-data-maturity-model-WG/issues/20
Four indicators have been defined for this principle, two for metadata and
two for data. The indicators distinguish between the aspect that use of the
protocol should be free (i.e. without need for payment) and the aspect that
the protocol should be open-source.
Indicators for A1.2: the protocol allows for an authentication and
authorization procedure, where necessary
https://github.com/RDA-FAIR/FAIR-data-maturity-model-WG/issues/21
This principle refers to two separate aspects, i.e. authentication and
authorisation, which should both be supported by the protocol that is used to
access the data. Therefore, two separate indicators cover those aspects. An
additional indicator was agreed for the inclusion in the metadata of
information relevant for access control.
Indicators for A2: metadata are accessible, even when the data are no longer
available
https://github.com/RDA-FAIR/FAIR-data-maturity-model-WG/issues/22
A single indicator was agreed for this principle, related to the guarantee
that metadata will remain available after the data is no longer available.
Indicators for I1: (meta)data use a formal, accessible, shared, and broadly
applicable language for knowledge representation
https://github.com/RDA-FAIR/FAIR-data-maturity-model-WG/issues/23
The discussions in the working group have led to a set of six indicators,
three for metadata and the same three for data. The indicators cover three
aspects: that the knowledge representation is expressed in a standardised
format, that it is machine-understandable and that it is self-describing.
One issue that came up in the discussions on this principle is that there is
no clear definition of what 'knowledge representation' is and that
therefore the evaluation of these indicators may pose problems. A separate
discussion on this pointed to a proposal to distinguish minimal reporting
requirements, terminologies including controlled vocabularies and
ontologies, and models/formats. It was also noted that it would be good to
have a glossary of terms to ensure that these and other terms are widely
understood.
Indicators for I2: (meta)data use vocabularies that follow FAIR principles
https://github.com/RDA-FAIR/FAIR-data-maturity-model-WG/issues/24
The challenge that was identified for this principle is that there are
currently very few, if any, vocabularies that satisfy all the FAIR
principles. The reason for that is that most vocabularies were developed and
published before the FAIR principles were formulated. In addition, it is not
fully understood how deep the compliance with FAIR principles should be.
Some participants would accept compliance with a subset of FAIR principles,
e.g. persistent identification and resolution of vocabulary terms, but others
would like to see deeper compliance. It was also noted that this FAIR
principle is recursive. In order to avoid the situation that this principle
cannot be satisfied until vocabularies are made FAIR, indicators were added
that are less strong and require vocabularies to be standardised if not
fully FAIR. Four indicators have been agreed, two each for metadata and
data, for the use of standard vocabularies and for FAIR-compliant
vocabularies. Further discussion on the depth of FAIRness of vocabularies is
led by the FAIRsFAIR project in a meeting at the 14th RDA plenary
<...cs-and-fair-repositories> in Helsinki on 22 October 2019.
Indicators for I3: (meta)data include qualified references to other
(meta)data
https://github.com/RDA-FAIR/FAIR-data-maturity-model-WG/issues/25
Six indicators have been agreed for this principle, four for metadata and
two for data. A first distinction is made between unqualified and qualified
references. An unqualified reference is similar to a link to a webpage that
carries no information about the type of reference, or to a generic metadata
property like dct:relation, while a qualified reference includes information
about what the reference means. A second distinction is related to the
type of object that is referenced, either metadata or data, i.e. metadata
referencing metadata, metadata referencing data, or data referencing data.
Please note that the indicators for metadata referencing data are not about
the reference in metadata to the data that the metadata describes - that is
an aspect covered in principle F3 - but a link in the metadata to other data
that is somehow related to its 'own' data.
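The difference between unqualified and qualified references can be shown
with a minimal sketch. It assumes that references are held as
(property, target) pairs and that dct:relation is the only unqualified
property; the function name and the example targets are hypothetical.

```python
# Assumption: the generic Dublin Core Terms property that links two
# resources without saying how they are related.
UNQUALIFIED_PROPERTIES = {"dct:relation"}


def classify_references(references):
    """Split (property, target) pairs into unqualified and qualified lists."""
    result = {"unqualified": [], "qualified": []}
    for prop, target in references:
        kind = "unqualified" if prop in UNQUALIFIED_PROPERTIES else "qualified"
        result[kind].append((prop, target))
    return result


# Hypothetical references from one metadata record to other resources.
example_references = [
    ("dct:relation", "https://example.org/dataset/1"),
    ("dct:isVersionOf", "https://example.org/dataset/2"),
    ("dct:references", "https://example.org/metadata/3"),
]
```

Calling classify_references(example_references) puts the dct:relation link
in the unqualified list and the other two in the qualified list, because
dct:isVersionOf and dct:references state what the relationship means.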
Indicators for R1: (meta)data are richly described with a plurality of
accurate and relevant attributes
https://github.com/RDA-FAIR/FAIR-data-maturity-model-WG/issues/26
This principle is similar to principle F2 as it relates to the 'richness' of
metadata. Where principle F2 is primarily concerned with metadata that is
relevant for discovery of the data, principle R1 is concerned with metadata
that enables reuse. In a way, F2 is related to the characteristics of the
data, while R1 is about the context that allows reusers to understand how
the data can be reused.
As a result of this similarity, the two indicators for R1 are the same as
the ones for F2, with the further note that the discussion on which elements
need to be provided has been postponed to the joint meeting in Helsinki,
https://www.rd-alliance.org/metadata-fair-data.
Indicators for R1.1: (meta)data are released with a clear and accessible
data usage licence
https://github.com/RDA-FAIR/FAIR-data-maturity-model-WG/issues/27
Four of the five indicators for this principle are concerned with the
licence under which the data can be reused. They are about whether there is
any information about a licence, whether the licence is a standard reuse
licence, whether the licence is referenced in the appropriate element in the
metadata, and whether the licence is machine-understandable. The fifth
indicator is about consent for reuse of personal or other types of
restricted data.
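The four licence-related aspects can be sketched as independent checks on a
metadata record. Several things here are assumptions for illustration only:
the dct:license element as the 'appropriate element', the free-text
"rights_note" key, the short list of standard licences, and the rule that a
licence given as an HTTP(S) URI counts as machine-understandable. The fifth
indicator, on consent, is not covered.

```python
# Assumption: a tiny sample of standard reuse licences, identified by URI.
STANDARD_LICENCES = {
    "https://creativecommons.org/licenses/by/4.0/",
    "https://creativecommons.org/publicdomain/zero/1.0/",
}


def licence_checks(metadata: dict) -> dict:
    """Evaluate the four licence-related aspects for one metadata record."""
    element_value = metadata.get("dct:license")
    # Fall back to a hypothetical free-text field if the element is absent.
    licence = element_value or metadata.get("rights_note")
    in_element = bool(element_value)
    return {
        "licence_information_present": bool(licence),
        "standard_licence": licence in STANDARD_LICENCES,
        "in_appropriate_element": in_element,
        "machine_understandable": in_element
        and str(element_value).startswith(("http://", "https://")),
    }
```

A record with dct:license set to the CC BY 4.0 URI satisfies all four
checks, while a record that only carries a free-text note such as "Free to
reuse with attribution" satisfies the first check alone.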
Indicators for R1.2: (meta)data are associated with detailed provenance
https://github.com/RDA-FAIR/FAIR-data-maturity-model-WG/issues/28
Two indicators have been agreed for this principle related to provenance
information, one for the inclusion in the metadata of provenance information
in accordance with community-specific guidelines and one for provenance
information according to a cross-domain language, for example the W3C PROV
ontology.
Indicators for R1.3: (meta)data meet domain-relevant community standards
https://github.com/RDA-FAIR/FAIR-data-maturity-model-WG/issues/29
This principle is about the use of standards for the expression of metadata
and data. Of the four indicators for this principle, two are for metadata
and two for data. They distinguish between the use of a community standard
and the use of a machine-understandable community standard.