Defining, managing, and reporting dataset quality in a multidisciplinary Open Data space

22 Feb 2022

Submitted by Carlo Lacagnina


Meeting objectives: 

 

The “Data Quality” subgroup of the European Open Science Cloud (EOSC) Task Force on FAIR metrics and Data Quality organizes this interactive RDA session, in collaboration with the ESIP Information Quality Cluster and the Australia-New Zealand Data Quality Interest Group. This meeting seeks to:

  • Get a broad overview of data quality needs (common principles, concepts and elements) in diverse scientific environments, and identify commonalities, the most relevant opportunities arising from improved data quality, and the risks arising from poor data quality.
  • Identify and promote community interests, opportunities and capabilities to improve the management of dataset quality in a multidisciplinary environment (e.g. directions to select quality dimensions, what metrics are in place or need to be defined).
  • Promote cross-disciplinary consistency when managing and communicating dataset quality information.
  • Engage the RDA community in developing a Working Group for FAIR Dataset Quality Information.
Meeting agenda: 

 

Collaborative session notes: https://docs.google.com/document/d/1KIieicxYX6X1ETKXJY0dPcFZEMLbOlkcNC0K...

 

Presentations:

  • Introduction to EOSC FAIR Metrics and Data Quality Task Force - 5 mins

  • Summary of the last RDA-P18 BoF session “Representing and Communicating Data Quality Information” - 10 mins

  • Needs and challenges to improve representation and communication of dataset quality information in different communities and scenarios - 20 mins. Case studies:

    • Lessons learned from the INSPIRE regulation, a legal requirement on how to share public data, and from finding a balance between detail and feasibility

    • Perspective on Data Quality regarding vocabulary registry, terminology subject indexing, etc.

    • Data quality to ensure unbiased AI algorithms

    • Metadata standards in bioimaging

  • Overview of international community-based guidelines to improve the management and accessibility of Earth science dataset quality information - 10 mins

Multidisciplinary discussion on developing a Working Group for FAIR Dataset Quality Information - 40 mins

 

Closing remarks and next steps - 5 mins

 

Type of Meeting: 
Working meeting
Short introduction describing any previous activities: 

 

As evidenced by the long list of fields covered by the members of the World Data System (WDS), data from a very diverse set of scientific disciplines are regularly managed by organizations from around the world. Informed decisions on, and accurate (re)use of, individual digital datasets depend on knowledge about the quality of data and relevant information. Effectively representing and communicating dataset quality information is important for conducting research within and across many disciplines, for improving data sharing, and ultimately for promoting an effective data ecosystem.

As a first step to address this challenge and promote the creation and (re)use of freely and openly shared dataset quality information, international domain experts have undertaken an effort to develop guidelines for the Earth science community. The initiative started as a collaboration among the Earth Science Information Partners (ESIP) Information Quality Cluster (IQC), the Barcelona Supercomputing Center (BSC) Evaluation and Quality Control (EQC) team, and the Australia/New Zealand Data Quality Interest Group (AU/NZ DQIG). The guidelines are characterized by practical recommendations to promote sharing and reusing of quality information at the dataset level. A peer-reviewed paper titled “Call to Action for Global Access to and Harmonization of Quality Information of Individual Earth Science Datasets” was published in the Data Science Journal (DOI: http://doi.org/10.5334/dsj-2021-019). A set of guidelines was developed and described in a white paper (DOI: https://doi.org/10.31219/osf.io/xsu4p), and a BoF session was organized during the last RDA Plenary Meeting (RDA-VP18) to present this work and seek interest in extending it to other disciplines.

Building on the session at P18, the purpose of this BoF is to broaden the disciplinary base covered by these efforts beyond Earth science and to promote consistency when managing dataset quality information. To this end, the European Open Science Cloud (EOSC) FAIR Metrics and Data Quality Task Force has taken a leading role by including perspectives and experiences from diverse disciplines, including biology, metrology, philosophy, and data and computer sciences. We will focus on common principles, concepts and steps to consider when dealing with quality, and on identifying the actors involved in quality assessments and the organizations seen as a reference point in each discipline. The endeavor shall form the foundation for comprehensive guidelines on dataset quality across different disciplines, ensuring better consistency of the metadata that sustain a FAIR interdisciplinary data space.

 

BoF chair serving as contact person: 
Meeting presenters: 
Carlo Lacagnina, Ge Peng, Robert Downs, Chris Schubert, Andrea Bertino, Sarah Stryeck, Oliver Biehlmaier
Driven by RDA Organisational Member: 
No
Applicable Pathways: 
FAIR, CARE, TRUST - Adoption, Implementation, and Deployment
Training, Stewardship, and Data Management Planning