Data publishing: data usability certification services - RDA 8th Plenary BoF meeting

The increasing availability of research data and its evolving role as a first-class scientific output in scholarly communication require a better understanding of fitness for reuse and the means to assess it. The concept of data fitness is multifaceted and covers various aspects of a dataset, such as its level of annotation, curation, peer review, and citability. In addition, the ‘quality’ of datasets could be made even more explicit by allowing social tagging by end users. Moreover, the reliability of the data service providing a dataset, for example the level of certification of the data repository, could also serve as a useful proxy. Currently, the criteria for assessing the reusability of a dataset are not made transparent to users. The decision whether to reuse a dataset is therefore ambiguous, which not only leads to less use of shared data but also decreases the reliability of research results that include reused data. Firstly, a concept of data fitness requires a review of which assessment criteria to include and how to balance the effect of each of them, preferably resulting in a corresponding metric. Secondly, we want to find effective ways to expose and communicate such a metric, e.g. using a labelling or tagging system.

At the 7th plenary meeting, the idea of starting a working group on data quality services was presented during the session of the ‘Certification of Digital Repositories IG’. The session showed sufficient interest to work towards a new working group on the topic. This working group could fall under the umbrella of the Data Publishing IG. The new group will work towards the following deliverables:
• The definition of quality criteria certifying data fitness for reuse
• The development of a system of badges/stamps/labels certifying data fitness for reuse
• The definition of procedures for the certification of data sets 

Case statement for this new group (draft):

Objectives for the meeting:

• Presentation, discussion and finalization of the objectives of the case statement
• Drafting first timelines for working group
• Identification of additional partners

Meeting agenda

Moderation: Helena Cousijn & Markus Stocker
1. Presentation of the idea behind this BoF and the Case Statement (Helena Cousijn) - 15 minutes
2. Specification of the objectives of the WG (discussion, all participants) - 45 minutes
3. Overview of outcomes of the Certification of Digital Repositories IG (Ingrid Dillo & Rob Hoft) - 10 minutes
4. Summary of the results (Markus Stocker) - 10 minutes
5. Specification of timelines for WG (Helena Cousijn) - 10 minutes

This session will be relevant for (1) data centers and repositories carrying out quality processes such as curation, (2) researchers who are interested in improving the assessment of data usability, and (3) funders and publishers who are interested in the application of a data quality/usability label. Participants are encouraged to read the draft case statement before the meeting.

Group chairs serving as contact persons: Helena Cousijn & Michael Diepenbroek