Data Versioning IG Constitutive Meeting
Short introduction describing the activities and the scope of the group:
The demand for reproducibility of research results is growing. It will therefore become increasingly important for researchers to be able to cite the exact extract of the data set that was used to underpin their research publication. The capacity of computational hardware infrastructures has grown to the point where online petabyte-scale data stores are now common. This has encouraged the development of concatenated, seamless data sets from which users can select subsets through web services, based on spatial and temporal queries. Further, the growth in computing power means that higher-level pre-processed data products can be generated in very short time frames.
This means that data sets and data products need a systematic way of referencing the exact version of the data that was used to underpin research findings and/or to generate higher-level products. This was recognised by the RDA Working Group on Data Citation, whose final report recognises the need for data versioning. However, there were no specifics on best practice for data versioning, particularly for large-volume, multi-terabyte and even petabyte-scale data sets. A BoF meeting held at the RDA Plenary in Denver in September 2016 highlighted the fact that there are no recognised best practices for versioning of data.
Versioning procedures and best practices are well established for scientific software and can be used to enable reproducibility of scientific results. The codebase of a very large software project does bear some resemblance to a large dynamic data set. Are software versioning practices suitable for data sets, or do we need a separate suite of practices for data versioning?
Ultimately versioning concepts developed for research data will need to be brought in line with versioning concepts used in persistent identifier systems.
Additional links to informative material related to the group i.e. group page, Case statement, working documents etc
Case statement: https://www.rd-alliance.org/group/data-versioning-ig/case-statement/data...
Notes from Denver Plenary BoF meeting: https://www.rd-alliance.org/data-versioning-rda-8th-plenary-bof-meeting
Use cases and definitions: https://docs.google.com/document/d/1TfBPlfjTVg0YcFxuw0UszAXPYrRmyZ6PCxtx...
To establish an RDA interest group on developing agreed practices for Data Versioning and develop a work plan.
2. Why, How and What of Data Versioning
Why (Lesley Wyborn)
How (Jens Klump)
Where - Lightning presentations
3. Develop work plan for RDA IG on Data Versioning.
Anyone who is interested in moving data versioning forward.
Anyone who has begun to systematize data versioning and can contribute a use case to the discussion.
Group chairs serving as contacts: Jens Klump
Group maturity: 0-6 months
Remote Participation: To access the remote meeting link for this session, titled “RDA Plenary 9: Data Versioning Interest Group”, on April 5 from 14:00-15:30, please go to https://global.gotomeeting.com/join/311156213
You can also dial in using your phone.
Access Code: 311-156-213
Australia: +61 2 8355 1038
Austria: +43 1 2060 92964
Belgium: +32 28 93 7002
Canada: +1 (647) 497-9373
Denmark: +45 32 72 03 69
Finland: +358 972 52 2971
France: +33 157 329 481
Germany: +49 69 5880 7802 72
Ireland: +353 15 360 756
Italy: +39 0 230 57 81 80
Netherlands: +31 207 941 375
New Zealand: +64 9 913 2226
Norway: +47 21 93 37 37
Spain: +34 912 71 8488
Sweden: +46 775 757 471
Switzerland: +41 225 4599 60
United Kingdom: +44 20 3713 5011
United States: +1 (646) 749-3117