RDA Common Descriptive Attributes of Research Data Repositories


31 Jan 2024

By Bridget Walker


 Data Repository Attributes WG

Group co-chairs: Matthew Cannon, Allyson Lister, Washington Luís Ribeiro de Segundo, Kathleen Shearer, Michael Witt, Kazu Yamaji

Recommendation Title: RDA Common Descriptive Attributes of Research Data Repositories

Authors: Michael Witt, Matthew Cannon, Allyson Lister, Washington Segundo, Kathleen Shearer, Kazu Yamaji and the Research Data Alliance Data Repository Attributes Working Group

Impact

A complete and current description of a research data repository is important to help a user find the repository; to learn the repository’s purpose, policies, functionality and other characteristics; and to evaluate the fitness for their use of the repository and the data that it stewards. Many repositories do not provide adequate descriptions in their websites, structured metadata and documentation, which can make this challenging. Even fewer make this information available in a machine-readable or actionable manner, which hampers interoperability. Descriptive attributes may be expressed and exposed in different ways, making it difficult to compare repositories and to integrate repositories with other infrastructures such as registries. They can be difficult to navigate and find: they may be locked behind authentication, obscured within workflows, or buried in myriad documentation and web pages.

Well-described data repositories provide value to a broad set of stakeholders, such as researchers, repository managers, repository developers, publishers, funders, registries and others, who need to discover and use data repositories. Some motivating use cases for the development of these attributes include:

  • As a researcher, I would like to be able to discover repositories where I can deposit my data based on attributes that are important to me.
  • As a repository manager, I would like to know what attributes are important for me to provide to users in order to advertise my repository, its services, and its data collections.
  • As a repository developer, I would like to understand how to express and serialize these attributes as structured metadata for reuse by users and user agents in a manner that can be integrated into the functionality of my repository software platform.
  • As a publisher, I would like to inform journal editors and authors of what repositories are appropriate to deposit their datasets that are associated with manuscripts that are being submitted.
  • As a funder, I would like to be able to recommend and monitor data repositories to be utilized in conjunction with public access plans and data management plans for the research that I am sponsoring.
  • As a registry, I would like to be able to easily identify and index attributes of data repositories to help users find the best repository for their purpose.
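
The repository developer and registry use cases above can be illustrated with a minimal sketch of a repository description serialized as structured metadata and then filtered by a user agent. The class, the field names (`subjects`, `certifications`, `accepts_deposits`, `preservation_years`), the example values and the example.org URLs are all assumptions made for illustration; they are not the attribute list or any schema from the recommendation.

```python
import json
from dataclasses import asdict, dataclass, field
from typing import List, Optional


@dataclass
class RepositoryDescription:
    """Hypothetical repository record; the field names are illustrative only."""
    name: str
    url: str
    subjects: List[str] = field(default_factory=list)
    certifications: List[str] = field(default_factory=list)
    accepts_deposits: bool = False
    preservation_years: Optional[int] = None  # None = not stated by the repository


def to_json(repo: RepositoryDescription) -> str:
    """Serialize one description as JSON so that user agents can read it."""
    return json.dumps(asdict(repo), indent=2, sort_keys=True)


def find_repositories(repos, subject, min_preservation_years=0):
    """Return names of repositories that accept deposits for a subject
    and state a retention period of at least the requested length."""
    return [
        r.name
        for r in repos
        if r.accepts_deposits
        and subject in r.subjects
        and (r.preservation_years or 0) >= min_preservation_years
    ]


# Made-up example records (not real repositories).
repos = [
    RepositoryDescription(
        "Example Earth Archive", "https://example.org/earth",
        ["earth science"], ["CoreTrustSeal"], True, 10,
    ),
    RepositoryDescription(
        "Example Short-Term Store", "https://example.org/store",
        ["earth science"], [], True, None,
    ),
]

print(to_json(repos[0]))
print(find_repositories(repos, "earth science", min_preservation_years=10))
```

In this sketch a missing retention period is treated as zero years, so a repository that does not state one is excluded from any search that asks for a minimum. That is only one possible design choice; a registry could just as well report such repositories separately as "unknown".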

Recommendation package DOI: 10.15497/rda00103

Citation: Witt, M., Cannon, M., Lister, A., Segundo, W., Shearer, K., Yamaji, K., & Research Data Alliance Data Repository Attributes Working Group. (2024). RDA Common Descriptive Attributes of Research Data Repositories (Version 1.0). Research Data Alliance. https://doi.org/10.15497/RDA00103

 

Abstract

The RDA Common Descriptive Attributes of Research Data Repositories outlines a list of common, concise, high-level descriptors that represent information that can be useful in describing a research data repository along with examples of how each attribute can be expressed as metadata in different schemata, limitations and potential complications that each might pose in harmonization, a brief rationale for why each attribute is important and a gap analysis of how easy or difficult it may currently be to locate this information from a data repository. The attributes are conceptual with examples provided for illustration purposes without endorsement of any particular approach, standard or implementation.

 

UN Sustainable Development Goals

While this recommendation does not target any specific SDG, well-described, interoperable and well-promoted data repositories support research that can advance any of the goals in a general manner.

Output Status: Recommendations with RDA Endorsement in Process
Review period: Thursday, 1 February, 2024 to Friday, 1 March, 2024
Primary WG Focus / Output focus: Domain Agnostic

  • Author: Beth Plale

    Date: 01 Feb, 2024

    Excellent work. It would be helpful for the RDA-US region to include a paragraph or so on how this work complements/supports the OSTP Desirable Characteristics 2022 document. Its connection to DataCite should be addressed too (as was asked in the OAB meeting). This could be accomplished more generally through a new section titled "setting guidance in context" that also briefly connects the dots between this document and known significant influencing contexts in other regions.

    https://www.whitehouse.gov/wp-content/uploads/2022/05/05-2022-Desirable-...

  • Author: Yuri Carrer

    Date: 02 Feb, 2024

    Hi!

    In the context of the project https://www.rd-alliance.org/fair-enabling-citation-model-cultural-heritage-objects we published the following deliverable in our Zenodo community (https://zenodo.org/communities/fair-cho):

    A2 Case Studies Examination https://doi.org/10.5281/zenodo.8215438 (Page 10)

    where we explore the use of the “Citation Guideline URL” (re3data:citationGuidelineUrl) metadata field in re3data.org repositories.

    Also available:

    A2.2a Digital repositories data citation practices. Supplementary material https://doi.org/10.5281/zenodo.8188806

  • Author: Allyson Lister

    Date: 22 Feb, 2024

    Dear Yuri,

    Thank you very much for your comments! Indeed, the FAIR CHO work seems really interesting. While that work seems to be primarily about the attributes for data citation at the dataset level (and the DRA WG's work is instead at the repository level), it makes sense that we're aware of each other's efforts and see how we can help each other.

    Separately to the DRA WG, please let me know if you'd like to know more about how FAIRsharing might help your efforts; it may even be that your group would like to provide a FAIRsharing Community Champion (https://fairsharing.org/community_champions) to ensure closer collaboration and feedback. For instance, did you know that FAIRsharing has a record for the FORCE11 Data Citation Principles (https://doi.org/10.25504/FAIRsharing.9hynwc) and there is a relationship graph of the policies and other resources that utilise it? We also align with the DRA WG attributes list and provide many searchable metadata fields for database attributes and conditions (https://fairsharing.gitbook.io/fairsharing/additional-information/databa..., which can be filtered via our Assistant at https://assist.fairsharing.org/) that should be helpful to you.

  • Author: Dorothea Strecker

    Date: 02 Feb, 2024

    Thank you for compiling this excellent draft!

    One point to consider might be the gap analysis categorization for 'certification'. This piece of research shows that out of a sample of research data repositories, the majority provided certification information on their websites:

    Donaldson, D. R. (2020). Certification information on trustworthy digital repository websites: A content analysis. PLOS ONE, 15(12), e0242525. https://doi.org/10.1371/journal.pone.0242525

  • Author: Mingfang Wu

    Date: 08 Feb, 2024

    Excellent work!

    It would be good to have a short explanation of how the gap analysis was derived. For example, is the gap scale based on the analysis of 100 repositories?

  • Author: Francoise Genova

    Date: 13 Feb, 2024

    Thanks for putting together this excellent work. It might be useful to provide, in an annex, a short explanation of how the WG reached the result (information used as input in addition to the use cases, methodology).

  • Author: Claudia Bauzer Medeiros

    Date: 13 Feb, 2024

    Very good and useful work. I would have liked to see an explanation of the methodology (e.g. 3 paragraphs to 2 pages): how did you decide on which attributes to examine and recommend, and which repositories did you analyze?

    Also, I noticed that many of the recommended attributes are used to describe re3data repositories. I would also have liked to see a table describing which of your recommended attributes are also re3data attributes, and which are not. In other words, if I want to register "my" repository with re3data, I need to fill in a form containing many of the attributes you recommend, since they are the metadata that re3data requires for cataloguing a repository. Many of these attributes coincide with your recommendations. Which ones do not, and vice versa?

  • Author: Christin Henzen

    Date: 21 Feb, 2024

    Thanks for preparing an excellent draft. From discussions within our project (nfdi4earth.de), we would be interested in the spatial extent that is covered by the repositories' data. In our use cases, users often look for data with a specific spatial extent, and they would not visit a repository's webpage if they could find out that the repository only provides data for other regions. Have you already considered/discussed this in preparation for the draft?
    Moreover, we see the necessity to provide structured and _machine-readable_ information on preservation, e.g., 'How long will data be archived? 10y.' So far, there is no option to filter for "long-term archives" within re3data. Would it be worth sharpening the preservation attribute with regard to machine-readability?

    As previously mentioned, it would be interesting to learn more about the applied methods/approach for the gap analysis and reuse it for specific disciplinary repository descriptions.
