Data Granularity BoF: user perspectives, data citation, and data versioning

27 Nov 2019

Submitted by Fotis Psomopoulos

Meeting objectives: 

The proposed session has the following objectives:

  • To initiate a discussion with all relevant stakeholders (data users, data providers, data repositories) on data citation and data versioning in the context of data granularity and of modifications to data content or structure.
  • To initiate a collaborative study on descriptions of data granularity from the user perspective. The focus would be on identifying the best way for data providers to convey the structure of their data to data users. Formal definitions of granularity would not be required: the description only needs to make sense, be appropriate for the data, and be clear to data users.

Target audience
Data providers and data users; groups that need to describe the granularity levels of their data in a way that others can understand. We will build upon previous and ongoing work within the RDA, such as that of the Data Discovery Paradigms IG, the Research Data Collections WG, the WG on Data Citation, the Data Versioning WG, and Research Data Packaging.


Meeting agenda: 
  • Introduction: why we need to study data granularity from the user perspective
  • Current studies: Presenting the current work of the Data Discovery Paradigms IG Granularity TF
  • Discussion 
    • What resources would be required for such a study
    • Whom we should partner with (e.g. research groups, data repositories)
    • Connections with other RDA WGs/IGs (such as the Data Versioning WG)
  • Wrap-up, next actions for the group

Type of Meeting: 
Working meeting

Short introduction describing any previous activities: 

The work discussed in this BoF will build upon the activities of the Granularity Task Force of the Data Discovery Paradigms IG. The cataloging of data products often focuses on collections of similar digital objects, such as observations from the same instrument over time or samples collected during a particular excursion. Users are then limited to discovering and accessing data at this highest level. Efficient and effective reuse of data requires that users, whether humans or machines, be able to find and access resources at finer levels of granularity. For example, a collection could be a set of files, with each file containing multiple variables or geospatial layers. It is becoming more common for repositories to offer services that allow users to discover and access individual files within a collection, and even individual layers in a complex file or individual columns in a table. This is possible only if metadata exists at the granularity level of interest. Furthermore, if a user accesses and uses only a subset of a collection, it may be useful for that subset to have its own unique identifier so that it can be cited accurately.

