Data Versioning WG

Group details

Secretariat Liaison: Stefanie Kethers
TAB Liaison: Tobias Weigel
WGs Wrapping up (from ~12 months after RDA endorsement)
 

The demand for reproducibility of research results is growing. It will therefore become increasingly important for researchers to be able to cite the exact extract of the data set that was used to underpin their research publication. The capacity of computational hardware infrastructures has grown, and it is now common to have online petabyte-scale data stores. This has encouraged the development of concatenated, seamless data sets from which users can select subsets through web services, based on spatial and temporal queries. Further, the growth in computing power means that higher-level pre-processed data products can be generated in very short time frames.

This means that data sets and data products need a systematic way of referencing the exact version of the data that was used to underpin research findings and/or to generate higher-level products. This was recognised by the RDA Working Group on Data Citation, whose final report acknowledges the need for data versioning. However, there were no specifics on best practice for data versioning, particularly for large-volume, multi-terabyte and even petabyte-scale data sets. A BoF meeting held at the RDA Plenary in September 2016 in Denver highlighted the fact that there are no recognised best practices for versioning of data.

Versioning procedures and best practices are well established for scientific software and can be used to enable reproducibility of scientific results. The codebase of a very large software project does bear some resemblance to a large dynamic data set. Are these practices suitable for data sets, or do we need a separate suite of practices for data versioning?

Ultimately, versioning concepts developed for research data will need to be brought in line with versioning concepts used in persistent identifier systems.


The BoF initially emerged at Plenary 8 in Denver from the discussion available here: https://www.rd-alliance.org/data-versioning-rda-8th-plenary-bof-meeting



Recent Activity

06 Dec 2019

RDA Plenary 15 Announcements - Registration and Calls for Co-located Events and Posters Now Open

RDA is excited to announce several important pieces of information related to RDA Plenary 15: Data for Real-World Impact, which will be held from 18 to 20 March 2020 in Melbourne, Australia, at the Melbourne Convention and Exhibition Centre, MCEC (https://mcec.com.au/).

06 Dec 2019

P15 Session Received

Dear RDA Chair,

Thank you for your session proposal for Plenary 15 titled “Data Versioning WG: Final Recommendations and next steps”. A review of all submitted proposals is now underway by the RDA Technical Advisory Board, with notifications of acceptance and scheduling details planned to be sent out the first week in January.

Please feel free to contact us with any questions or concerns you have regarding your submission.

Thank you.

Regards,

Alex Delipalta, on behalf of the RDA Secretariat

28 Nov 2019

Deadline For P15 Group Session Submissions Extended To 5 December 2019

Dear RDA members,

RDA's 15th Plenary meeting, Melbourne, Australia, 18-20 March 2020

The deadline to submit session proposals for Plenary 15 in Melbourne, Australia has been extended to 5 December 2019, midnight UTC.

12 Nov 2019

Deadline for P15 Group Session Submissions Fast Approaching

Dear Group Members, 

The deadline for group session submissions for Plenary 15 is 28 November 2019, midnight UTC – just two weeks away.

If your group is interested in submitting a session for a working group, interest group or joint meeting, note that submissions will only be accepted from group chairs, so be sure to coordinate with them on the submission.

If you are a chair submitting for a joint meeting session, all chairs involved in that joint session must be notified of the submission.

09 Oct 2019

P14 Session Chair Update

In preparation for Plenary 14 in Helsinki, please refer to the information presented below pertaining to your breakout sessions.

 

Remote Participation

  • Remote participation will be available in each meeting room through GoToMeeting, which will be running from the laptop provided by the University. By just opening your presentation on this laptop, remote attendees will be able to view your slides and hear you speak.

09 Oct 2019

CODATA Data Science Journal Call For Papers: Research Data Alliance Results Special Collection

Dear members,

I’m writing on behalf of the editorial board of the CODATA Data Science Journal. I would like to remind you of the opportunity to submit the outputs produced by this group to the CODATA DSJ special collection on RDA Results. Publication fees will be covered by the EC project "RDA Europe 4.0".