status: Recognised & Endorsed

Chair(s): Jens Klump, Lesley Wyborn, Mingfang Wu, Kirsten Elger

Group Email: [group_email]

Secretariat Liaison: Stefanie Kethers


The Data Versioning WG has transitioned to the Data Versioning IG as of July 2021. The email address and group space have remained the same. 


The demand for reproducibility of research results is growing. It will therefore become increasingly important for researchers to be able to cite the exact extract of the data set that underpins their research publication. The capacity of computational hardware infrastructure has grown to the point where online petabyte-scale data stores are now common. This has encouraged the development of concatenated, seamless data sets from which users can select subsets through web services, based on spatial and temporal queries. Further, the growth in computing power means that higher-level pre-processed data products can be generated in very short time frames.

This means that data sets and data products need a systematic way to reference the exact version of the data that was used to underpin research findings and/or to generate higher-level products. This was recognised by the RDA Working Group on Data Citation, whose final report acknowledges the need for data versioning. However, there were no specifics on best practice for data versioning, particularly for large-volume, multi-terabyte and even petabyte-scale data sets. A BoF meeting held at the RDA Plenary in Denver in September 2016 highlighted the fact that there are no recognised best practices for versioning of data.

Versioning procedures and best practices are well established for scientific software and can be used to enable reproducibility of scientific results. The codebase of a very large software project does bear some resemblance to a large dynamic data set. Are these practices suitable for data sets, or do we need a separate suite of practices for data versioning?

Ultimately, versioning concepts developed for research data will need to be brought in line with the versioning concepts used in persistent identifier systems.


The BoF initially emerged at Plenary 8 in Denver through the discussion available here: https://www.rd-alliance.org/data-versioning-rda-8th-plenary-bof-meeting

File Repository

20 December 2018

WG Data Versioning - RDA Plenary 12 Notes and Presentation

by Jens Klump

Notes and presentation from the WG Data Versioning working meeting at RDA P12 in Gaborone, Botswana.


Attachments:
WG Data Versioning - RDA Plenary 12 Notes.pdf (99.48 KB)
RDA P12 Data Versioning Session Presentation.pdf (1.7 MB)

25 June 2018

Data Versioning WG presentation from P11

by Jens Klump

Presentation from Data Versioning WG Working Session at P11.

Attachment:
rda p11 data versioning session.pdf (1.87 MB)

25 June 2018

Data Versioning WG notes from P11 Berlin

by Jens Klump

Notes from the Data Versioning WG Working Session at P11 Berlin

Attachment:
wg data versioning - rda plenary 11 notes.pdf (92.42 KB)

20 December 2017

Notes from Data Versioning Session at P10 Montreal

by Jens Klump

Notes from the Data Versioning WG session at the RDA Plenary 10 in Montreal.

Attachment:
WG Data Versioning - RDA Plenary 10 Notes.pdf (249.73 KB)