Delimiting the units of open data

25 Jan 2021

Submitted by Tim Vines

Meeting objectives: 

One reason researchers neglect their data sharing responsibilities is the difficulty of determining which datasets should be shared: there is a wide gap between the wording of data sharing policies and the actions required for a particular study. The only way to overcome this obstacle is to be specific and tell researchers exactly which of their datasets they need to share and where they should deposit them. Unfortunately, stakeholders also struggle to work out which data should be shared, and it is far from clear where in the research cycle they should focus their open data efforts.

Researchers typically encounter data sharing policies at journals, so they are accustomed to providing all of the data associated with a particular manuscript. However, the manuscript may not be the most natural ‘unit’ of data sharing (a unit of research effort for which we try to obtain all the underlying data), and we should consider alternatives.

This BoF session considers the strengths and weaknesses of a range of data sharing units:

  • the daily or weekly output of a research lab 

  • published articles 

  • entire grants

Focusing attention on delimiting the units of open data is vital because it addresses the ‘What’ aspect of data sharing, and thus complements the ‘How’ described by the FAIR guidelines. More practically, it enables stakeholders to give their researchers specific expectations around open data at all stages of the research cycle.

Our long-term goal is to form a working group that will formulate a set of guidelines stakeholders can use to determine whether all of the data associated with a particular unit of research have been made public. This BoF meeting will gauge the level of broader interest in this work and lay out the initial steps.

Meeting agenda: 

1. Introduction and overview (5 min)

2. Lightning talks (40 min)

2.1 Monitoring Research Outputs as a Funder – Ekemini Riley (Aligning Science Across Parkinson’s, TBC)

2.2 Considerations in preserving data – Natalie Harrower (Digital Repository Ireland)

2.3 Using AI to identify datasets associated with articles – Tim Vines (DataSeer)

2.4 Monitoring datasets in an Open Science Lab – Timothée Poisot (U Montreal, TBC)

3. Discussion of next steps (10 min)

4. Meeting close (5 min)

Type of Meeting: 
Informative meeting
Short introduction describing any previous activities: 

Tim Vines is the Founder and Project Lead of DataSeer, an AI-powered tool that ‘reads’ research texts to find sentences describing data collection, data re-use, or data sharing. He has published a post on the Scholarly Kitchen blog titled “Articles Are the Fundamental Unit of Data Sharing” to draw attention to the ‘What’ question of open data.

Meeting presenters: 
Ekemini Riley (TBC), Natalie Harrower, Tim Vines, Timothée Poisot (TBC)
Estimate of the required room capacity (Hybrid plenary):