‘Data makes the difference’ RDA Plenary 14, 2019 Helsinki
A blog (not) about Fish and Forks
With so many interesting parallel sessions at the 14th Plenary, choosing which one to attend was tough. And as if we were not already spoilt for choice, a great new addition to the programme was the unconference, where participants had the opportunity to propose topics that had not made it onto the official agenda.
The Dipoli building's beautiful architecture and the wonderful kinetic pine cone in front of it.
Source: J-P Kärnä, CC BY-SA 3.0
In several of the sessions I joined, the question of the lawful use of research data came up quite often, which is why I would like to highlight the working session of the RDA/CODATA Legal Interoperability Interest Group (hereinafter RDA-CODATA WG). Led by co-chair Christoph Burg, the session aimed to explore research data repositories and the problems users face with mixed licences and access types. For data to 'make a difference', researchers must be able to access and use it, which can be a problem in practice. In this regard, a successful outcome of the group has been the publication of “Legal Interoperability of Research Data: Principles and Implementation Guidelines” (Uhlir et al., 2016).
The recommendations for improving this much-needed legal interoperability take the form of six principles for the legal interoperability of research data, together with guidelines on how to implement them. Legal interoperability among multiple datasets is achieved when the legal use conditions for each dataset are accessible and clearly state what is allowed, which should at a minimum include the creation and use of combined or derivative products. Users whose proposed use is not restricted by any of the licences applicable to the datasets should then be able to access and use each dataset lawfully, without having to seek further authorisation from the data rights holders.
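For readers who think in code, the interoperability condition described above can be sketched roughly as follows. This is a hypothetical and deliberately simplified model of my own, not something taken from the RDA-CODATA guidelines: each dataset's licence is reduced to an "accessible" flag plus a set of permitted uses, and the use labels (combine, derive, commercial) are illustrative assumptions. Real licences also attach conditions such as attribution or share-alike, which this sketch ignores.

```python
from __future__ import annotations
from dataclasses import dataclass, field

# Hypothetical minimum the conditions should allow: combined and derivative products.
MINIMUM_USES = {"combine", "derive"}

@dataclass
class UseConditions:
    dataset: str
    accessible: bool                                  # can users find and read the conditions?
    permitted: set[str] = field(default_factory=set)  # uses the conditions clearly allow

def legally_interoperable(conditions: list[UseConditions]) -> bool:
    """True if every dataset's conditions are accessible and allow at least the minimum uses."""
    return all(c.accessible and MINIMUM_USES <= c.permitted for c in conditions)

def needs_authorisation(conditions: list[UseConditions], proposed: set[str]) -> list[str]:
    """Datasets whose conditions do not clearly cover the proposed use, so the user would
    still have to ask the rights holder; an empty list means no further permission is needed."""
    return [c.dataset for c in conditions
            if not (c.accessible and proposed <= c.permitted)]

datasets = [
    UseConditions("A", True, {"combine", "derive", "commercial"}),
    UseConditions("B", True, {"combine", "derive"}),
]
print(legally_interoperable(datasets))                           # True
print(needs_authorisation(datasets, {"combine", "derive"}))      # []
print(needs_authorisation(datasets, {"combine", "commercial"}))  # ['B']
```

In this toy example both datasets are interoperable for combining and deriving, but a commercial reuse would still require asking the rights holder of dataset B.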
The six principles for legal interoperability proposed by the Interest Group are as follows:
- Facilitate the lawful access to and reuse of research data.
- Determine the rights to and responsibilities for the data.
- Balance the legal interests.
- State the rights transparently and clearly.
- Promote the harmonization of rights in research data.
- Provide proper attribution and credit for research data.
As the published principles had received critical feedback from stakeholders, the workshop also aimed to follow up on Plenary 13 and to discuss the proposed Re-Charter in light of that feedback.
To inform the discussion among participants, there were several presentations, including one by CEDA on the findings of its licence review and one on the results of the Jisc survey on licensing for repositories, both of which I discuss in more detail here.
According to the FAIR reusability principle R1.1, datasets must be released with a clear and accessible data usage licence (Wilkinson et al., 2016). In practice, however, this is easier said than done, especially for data packages that combine different types of data, each of which may carry its own licence. Users then face a challenge when these licences are unclear or grant conflicting levels of openness.
Standardised licence classification: an approach born from use-cases and lived experience
Presented at the plenary were the insights gained by the Centre for Environmental Data Analysis (CEDA). In addition to information, CEDA provides access to Earth observation and atmospheric science data for environmental science. It currently holds about 16.3 PB of data in 5,622 datasets, comprising around 224 million files.
Having identified a data discovery problem among their users, they conducted a study to see whether standard licence classifications would improve data discovery. When they first reviewed their licences back in 2014, they found a wide range of licences in use for CEDA data, from very generic ones to very specific bespoke licences, and that both the quality of these licences and the permissions they granted for using the data varied enormously.
Standardised licences would improve the current situation for three reasons:
- They allow uniform definitions to be developed, giving users more clarity about the use restrictions on the data they intend to use.
- Wider use of standardised licences may make it easier to improve them, for example to cover additional aspects such as time or geographical restrictions that none of the licences currently in use may address.
- Once licences are standardised, it becomes easier to achieve a common understanding of their scope among (new) users.
In 2019 they carried out a second licence classification review, which they recommend others also do: it has helped them improve their licensing practice by putting a clear structure in place that enables them to choose better among the available licences and, for example, to assess bespoke licences where these are essential (see presentation). The results of CEDA's work may be of interest to others who seek to develop or improve access to research data. See also the poster presented at Plenary 14 introducing
As mentioned, one of the benefits of standardised licensing is the opportunity to develop uniform definitions. In this regard, CEDA is also looking at designing meaningful icons that help users easily understand any use restrictions.
The next presentation(s), from the Jisc Open Research Hub team, picked up on this point of design, having examined the issue of repository licences from a product design perspective.
Mixed licences & access types in repository-based research outputs: A UX Challenge
As mentioned, files in a dataset may have different levels of access restriction. This causes uncertainty not only for users seeking access but also for researchers who want to deposit their datasets in a repository and may not know, for example, which licence is the most appropriate and legally compliant. To prevent researchers from choosing unnecessarily restrictive licences, and to prevent data from going unused because of uncertainty, specific guidance during the process, whether uploading data or downloading it for further use, would greatly improve not only re-use but also the extent to which data are made available ‘as open as possible’ and restricted only when necessary.
Addressing the problem of deposits with mixed access types and mixed licences from a product design perspective made it clear that a variety of design solutions are possible. The examples presented included validation rules and on-screen messaging, which can help simplify uploads and avoid contradictions between the files being uploaded.
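To make the idea of validation rules with on-screen messaging a little more concrete, here is a minimal sketch of what such checks might look like at deposit time. The access types, the set of "open" licences and the two rules themselves are hypothetical examples chosen for illustration; they are not the rules Jisc presented. The sketch flags a restricted file that would carry an open licence, and asks the depositor to confirm when a file's licence differs from the dataset licence.

```python
from __future__ import annotations
from dataclasses import dataclass

# Hypothetical vocabularies for illustration; a real repository would define its own.
ACCESS_TYPES = {"open", "embargoed", "restricted"}
OPEN_LICENCES = {"CC0-1.0", "CC-BY-4.0"}  # SPDX-style licence identifiers

@dataclass
class DepositFile:
    name: str
    access: str                 # expected to be one of ACCESS_TYPES
    licence: str | None = None  # None means "inherit the dataset licence"

def validate_deposit(dataset_licence: str, files: list[DepositFile]) -> list[str]:
    """Return on-screen messages flagging possible contradictions in a mixed deposit."""
    messages = []
    for f in files:
        if f.access not in ACCESS_TYPES:
            messages.append(f"{f.name}: unknown access type '{f.access}'.")
            continue
        licence = f.licence or dataset_licence  # files without a licence inherit the dataset's
        if f.access == "restricted" and licence in OPEN_LICENCES:
            messages.append(
                f"{f.name}: restricted file would carry open licence {licence}; "
                "choose a licence that matches the restriction, or make the file open."
            )
        elif f.licence is not None and f.licence != dataset_licence:
            messages.append(
                f"{f.name}: licence {f.licence} differs from the dataset licence "
                f"{dataset_licence}; please confirm this mix is intended."
            )
    return messages

files = [
    DepositFile("readme.txt", "open"),
    DepositFile("interviews.csv", "restricted"),
    DepositFile("model.py", "open", "MIT"),
]
for msg in validate_deposit("CC-BY-4.0", files):
    print(msg)
```

Here the README passes silently, while the restricted interview file and the differently licensed code file each produce a message, so the depositor is prompted at upload time rather than leaving the contradiction for later users to discover.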
After the presentations there was little time left to discuss the proposed Re-Charter setting out the group's next work, but the proposed aims of the renewed IG include documenting barriers and possible solutions to improve the implementation of the Principles and Guidelines, so please do consider joining and contributing your insights.
I will end by thanking the Research Data Alliance and the local organisers for making yet another successful plenary possible.
It’s not just the data that makes the difference, it is also the people like you, so thank you.
 If you are curious what this refers to, don’t hesitate to contact me; I am very interested in meeting fellow RDA members.
 See Doldirina C, Eisenstadt A, Onsrud H and Uhlir P (2018) Legal Approaches for Open Access to Research Data. LawArXiv. Available at: osf.io/preprints/lawarxiv/n7gfa.
 Wilkinson MD, Dumontier M, Aalbersberg IJ, et al. (2016) The FAIR Guiding Principles for scientific data management and stewardship. Scientific Data 3: 160018.
 Parton G, Pepler S and Winfield K (2019) Standardised licence classification: an approach born from use-cases and lived experience [slides]. RDA 14th Plenary, Helsinki, 24 October 2019.
 The hub is a single interoperable system for managing, preserving and sharing institutional digital research data.
 They organised a survey and workshops on this issue. Survey results [online]: http://tiny.cc/jisc-mixed-licence
 Davey T (Jisc) Mixed licences & access types in repository-based research outputs: A UX Challenge [slides].
 You can find the RDA/CODATA Legal Interoperability IG at https://www.rd-alliance.org/groups/rdacodata-legal-interoperability-ig.html