Data Versioning WG Working Meeting (Remote Access Instructions)
Room Location: Commonwealth A2
Collaborative session notes: https://docs.google.com/document/d/1p-VVKhZ6NV-feon-68FMu_dEDO9e3H7Tca5mNxpLRdY/edit
Short introduction describing the scope of the group and any previous activities
The demand for reproducibility of research results is growing. It will therefore become increasingly important for researchers to be able to cite the exact extract of the data set that was used to underpin their research publication. However, systematic data versioning practices are currently not available.
Versioning procedures and best practices are well established for scientific software and can be used to enable reproducibility of scientific results. The codebase of a large software project bears some resemblance to a large dynamic data set. Are versioning practices for code therefore also suitable for data sets, or do we need a separate suite of practices for data versioning? How can we apply our knowledge of versioning code to improve data versioning practices?
Over the past year, we have collected use cases of data versioning practices and extracted data versioning patterns from them. A draft of the Working Group’s report and recommendations for data versioning practices will be presented in this session. We invite data scientists, operators of data repositories, and anyone interested in moving data versioning forward to attend.
Additional links to informative material related to the group
- Working Group Page: https://rd-alliance.org/groups/data-versioning-wg
- Case Statement: https://rd-alliance.org/group/data-versioning-wg/case-statement/data-ver...
- Data Versioning Use Cases: https://rd-alliance.org/data-versioning-use-cases
- Mapping of use cases to W3C Data Exchange WG use cases: https://docs.google.com/document/d/1lI5HyRnHMLkdDpRMg82HCnyzvxxmb_oymJ4T...
- Draft of white paper on data versioning: https://docs.google.com/document/d/1bINRNA2PtumnnNWoPnvzMoCYci6S5HjcX8LG...
Notes and presentations from past plenaries:
- Notes and presentation from P12 Gaborone: https://www.rd-alliance.org/wg-data-versioning-rda-plenary-12-notes-and-...
- Notes from P11 Berlin: https://www.rd-alliance.org/data-versioning-wg-notes-p11-berlin
- Presentation from P11 Berlin: https://www.rd-alliance.org/data-versioning-wg-presentation-p11
- Notes from P10 Montreal: https://rd-alliance.org/notes-data-versioning-session-p10-montreal
- Presentation from P10 Montreal: https://rd-alliance.org/data-versioning-presentation-rda-p10
- Notes from Denver Plenary BoF meeting: https://www.rd-alliance.org/data-versioning-rda-8th-plenary-bof-meeting
The objective of this session is to establish a work plan for finalising the outcomes and recommendations of this RDA Working Group on agreed practices for data versioning. This includes:
- Identifying areas where versioning is required and/or other use cases:
- Identifying groups in RDA and planning how to engage with them
- Identifying external groups
- Overview of collected use cases
- Present the outline of a white paper on recommendations for data versioning:
  - Spectrum of data types to be included (files, databases, unstructured data, model runs, etc.)
  - How to align these with the practices for the assignment of persistent identifiers
  - Identify other topics that should be included
- Data Versioning WG retrospective
- Presentation of report draft and recommendations on data versioning practices
- Adoption of WG recommendations through W3C
- Engagement with other RDA and external groups
- Work plan for final six months of RDA Data Versioning WG
- Scheduling of online meetings up to Plenary 14
Group chair serving as contact person
Type of meeting
- Members of the Working Group.
- Data scientists and operators of data repositories
- Data producers and users
- Publishers who want to be sure that the correct version of a data set is cited in a publication
- Anyone who is interested in moving data versioning forward.