• Output Type: Working Group Supporting Output
  • Review Status: Endorsed
  • Review Deadline: 2020-02-28
  • Author(s): Mingfang Wu
  • Abstract

    Data Versioning WG

    Group co-chairs: 

    Jens Klump, Lesley Wyborn, Ari Asmi, Robert Downs

    Supporting Output title: Principles and best practices in data versioning for all data sets big and small  

    Authors: Jens Klump, Lesley Wyborn, Robert Downs, Ari Asmi, Mingfang Wu, Gerry Ryder, Julia Martin

    Impact: Provides recommendations for standard practices in the versioning of research data. This adds a central element to the systematic management of research data at any scale, which in turn enhances reproducibility and enables attribution to any person or organisation that contributed to the development or funding of any version of a dataset.

    DOI: 10.15497/RDA00042

    Citation: Klump, J., Wyborn, L., Downs, R., Asmi, A., Wu, M., Ryder, G., & Martin, J. (2020). Principles and best practices in data versioning for all data sets big and small. Version 1.1. Research Data Alliance. DOI: 10.15497/RDA00042.

    Abstract:

    The demand for better reproducibility of research results is growing. More and more data is becoming available online, and in some cases datasets have become so large that downloading them is no longer feasible. Data can also be offered through web services and accessed on demand, meaning that parts of the data are retrieved from a remote source only when needed. In this scenario, it will become increasingly important for researchers to be able to cite the exact extract of the dataset that underpins their research publication. However, while the means to identify datasets using persistent identifiers have been in place for more than a decade, systematic data versioning practices are currently not available.

    Versioning procedures and best practices are well established for scientific software. The related Wikipedia article gives an overview of software versioning practices. The codebase of a large software project bears some resemblance to a large dynamic dataset. Are versioning practices for code therefore also suitable for datasets, or do we need a separate suite of practices for data versioning? How can we apply our knowledge of versioning code to improve data versioning practices? This Working Group investigated the extent to which these practices can be used to enhance the reproducibility of scientific results.

    The Research Data Alliance (RDA) Data Versioning Working Group produced this white paper to document use cases and practices, and to make recommendations for the versioning of research data. To further adoption of the outcomes, the Working Group contributed selected use cases and recommended data versioning practices to other groups in RDA and W3C. The outcomes of the RDA Data Versioning Working Group add a central element to the systematic management of research data at any scale by providing recommendations for standard practices in the versioning of research data. These practice guidelines are illustrated by a collection of use cases.

     

    Please note that the previous version (v1.0) underwent community review. The current version (v1.1) incorporates the updates made in response to that review.

  • Group Technology focus: Data (Output) Management Planning