Curating for FAIR and reproducible data and code

27 Jun 2019

Submitted by Limor Peer


Meeting objectives: 
  1. Introduce the topic of curating for FAIR and reproducible data and code
  2. Review environmental scan of existing relevant training and identify gaps
  3. Introduce the CURE Consortium and elicit input on proposed standards, practices, and tools for curating for FAIR and reproducible data and code
  4. Review CURE goals in the context of the Reproducibility IG at RDA and potential synergies with other RDA IGs and WGs
Meeting agenda: 

Collaborative session notes: https://docs.google.com/document/d/1IenuTqrcIK-ktBkohr1ypvGkTjXmhCcSOLHL...

10 minutes: Introductions and goals of the BoF
20 minutes: Background on CURE project, aims, activities
40 minutes: Group activity on CURE practices, gaps, opportunities for collaboration and integration
20 minutes: Open discussion of key issues, potential synergies, and the role RDA can play

Type of Meeting: 
Working meeting
Short introduction describing any previous activities: 

Scientific reproducibility provides a common purpose and language for data professionals and researchers. For data professionals, reproducibility offers a framework for honing and justifying curation actions and decisions; for researchers, it provides a rationale for adopting best practices early in the research lifecycle. Curating for reproducibility (CURE) encompasses activities that ensure that statistical and analytic claims about given data can be reproduced with that data.

Academic libraries and data archives have been stepping up to provide systems and standards for making research materials publicly accessible, but the datasets housed in repositories rarely meet the quality standards required by the scientific community. Even as data sharing becomes normative practice in the research community, there is growing awareness that access to data alone, even well-curated data, is not sufficient to guarantee the reproducibility of published research findings. Computational reproducibility, the ability to recreate computational results from the data and code used by the original researcher, is a key requirement for researchers to reap the benefits of data sharing (Stodden et al., 2013), but one that recent reports suggest is not being met. Verifying findings to confirm the integrity of the scientific record, and building on previous work to develop new innovations, also requires access to the analysis code used to produce the reported results.

The tasks that characterize the traditional data curation workflow for data access (file review and normalization, metadata generation, assignment of persistent identifiers, data cleaning, and assembly of contextual documentation) fall short when research reproducibility is the ultimate goal (Peer et al., 2014). To curate for reproducibility, curation must also include a review of the computer code used to produce the analysis, verifying that the code is executable and that it generates results identical to those presented in the associated publications. CURE has been implementing practices and developing workflows and tools that support curating for reproducibility.
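As an illustration of the kind of code check described above, the sketch below re-executes a deposited analysis script and compares its regenerated outputs against the results supplied by the depositor. The file names, directory layout, and checksum-based comparison are assumptions made for illustration only; they are not CURE's actual workflow or tooling.

```python
"""Minimal sketch of a reproducibility check during curation.

Assumptions (illustrative only, not CURE's actual tooling):
- the deposit contains an analysis script `analysis.py` that writes its
  results into an `output/` directory;
- the depositor also supplied the expected results in `expected_output/`.
The check re-runs the script and compares each regenerated file against
the expected version by SHA-256 checksum.
"""
import hashlib
import subprocess
import sys
from pathlib import Path


def sha256(path: Path) -> str:
    """Return the SHA-256 hex digest of a file."""
    return hashlib.sha256(path.read_bytes()).hexdigest()


def verify(deposit_dir: Path) -> bool:
    # 1. Confirm the code is executable: re-run the deposited script.
    result = subprocess.run(
        [sys.executable, "analysis.py"], cwd=deposit_dir, capture_output=True
    )
    if result.returncode != 0:
        print("Code did not execute cleanly:",
              result.stderr.decode(errors="replace"))
        return False

    # 2. Confirm results match: compare regenerated outputs to expected ones.
    ok = True
    for expected in sorted((deposit_dir / "expected_output").glob("*")):
        produced = deposit_dir / "output" / expected.name
        if not produced.exists() or sha256(produced) != sha256(expected):
            print(f"Mismatch or missing output: {expected.name}")
            ok = False
    return ok


if __name__ == "__main__":
    deposit = Path(sys.argv[1]) if len(sys.argv) > 1 else Path(".")
    print("Reproducible" if verify(deposit) else "Not reproducible")
```

In practice, bitwise identity is often too strict for statistical output (floating-point rounding, timestamps, random seeds), so a curator might instead compare key reported statistics within a stated tolerance.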

BoF chair serving as contact person: