Complex Data Citations: Formulating a Community Recommendation

26 Feb 2022

Submitted by Shelley Stall

Meeting objectives: 

During a Town Hall at the AGU Fall Meeting in December 2020, the challenge of complex data citations was identified as an urgent concern within the Earth, space, and environmental sciences.

Specifically, we hoped to address the use case of citing a large number of datasets (or other digital objects) such that credit for individual datasets/objects is assigned properly. Such collections of data may contain hundreds to millions of elements, and a citation may need to include subsets of elements, potentially from multiple collections. A number of terms are used for this concept, including Data Collection, Package, Crate, and more. Infrastructure and guidance are still required to make it easier for researchers to use this type of citation and to give and receive credit for the digital objects used in the research. This is critical to enable reproducible research, and important for researchers to be able to trace citation and usage of their work and report on impact to funders. For policy makers, it allows policy decisions to be traced accurately to the supporting data.

In this BoF we want to share the work done during 2021 by a tenacious group of researchers, repository managers, infrastructure providers, journal staff, and indexers to define the problem, a basic approach that applies to most variations, and some of the early drafts of the recommendations.

We want to engage with the broad RDA community to fully develop the recommendations and identify adopters.

We especially invite those with an interest in PIDs (re: DOIs), FAIR Digital Objects, DOI Collections, repositories creating DOI Collections, journals supporting the citation of DOI Collections, and credit for the elements of a DOI Collection.

Meeting agenda: 

Collaborative session notes:

  1. Introduction to the challenge of complex data/digital object citations and current status (20 min)

  2. Value of solving this problem: researchers, institutions, journals, infrastructure, funders, organisations (e.g., IPCC). (10 min)

  3. Overview of approaches (30 min)

    1. British Oceanographic Data Centre, Justin Buck and James Ayliffe

    2. German Climate Computing Center/IPCC DCC, Martina Stockhause

    3. Ameriflux, Deb Agarwal

  4. Discussion (25 min)

  5. Next steps (5 min)

Type of Meeting: 
Working meeting
Short introduction describing any previous activities: 

Starting in December 2020, during a Data FAIR Town Hall at the AGU Fall Meeting, we reached out across the community for interest in supporting virtual working sessions that would lead to recommendations, adoption, and a better way of citing large numbers of data/digital objects in journal articles and other publications.

During 2021 we held three large working sessions as well as smaller group development efforts. 

8 April 2021:

Develop a common agreement on the use case (and variations) as well as hear from those whom it affects. Materials and link to recording: Agarwal, Deborah, Coward, Caroline, Stall, Shelley, & Erdmann, Christopher. (2021, April). Data Citation Community of Practice - 8 April 2021 Workshop. Zenodo. 

8 June 2021: 

Presentations from different repository use cases:

  • RO-Crate, Carole Goble

  • BioStudies, Ugis Sarkans

  • GBIF, Daniel Noesgaard

  • Pangaea, Uwe Schindler

Infrastructure Elements:

  • DOI Collection, Martin Fenner, DataCite

  • Make Data Count, Martin Fenner, DataCite

  • Scholix / OpenAire, Paolo Manghi

Workshop materials and link to recording: Agarwal, Deborah, Goble, Carole, Soiland-Reyes, Stian, Sarkans, Ugis, Noesgaard, Daniel, Schindler, Uwe, Fenner, Martin, Manghi, Paolo, Stall, Shelley, Coward, Caroline, & Erdmann, Chris. (2021). Data Citation Community of Practice - 8 June 2021 Workshop.

29 October 2021: 

We leaned into the term “reliquary” as a temporary word for a collection/package of other DOIs, PIDs, or links.

“Reliquary” Use Cases:

  • British Oceanographic Data Centre, Justin Buck and James Ayliffe

  • German Climate Computing Center/IPCC DCC, Martina Stockhause

  • Ameriflux, Deb Agarwal

Workshop materials and link to recording: Stall, Shelley, Buck, Justin, Ayliffe, James, Stockhause, Martina, Agarwal, Deb, Coward, Caroline, & Erdmann, Chris. (2021, October 29). Data Citation Community of Practice - 29 October 2021 Workshop. Zenodo.

We hope to bring awareness of this effort to the broad RDA community and to move forward using the RDA structure.


Driven by RDA Organisational Member: 
Applicable Pathways: 
FAIR, CARE, TRUST - Principles
FAIR, CARE, TRUST - Adoption, Implementation, and Deployment
Data Infrastructures and Environments - International