The aim of the Long Tail of Research Data IG is, according to the case statement, “to develop a set of good practices for managing research data archived in the university context. The scope of the topic will be limited to the data generated in universities and research institutions and the role of institutional repositories and libraries as agents of the institutional data management.”
The implication here is that there are research datasets of varying sizes, and arising from research across various disciplines, which are not currently curated and preserved by subject-specific datacentres. As a result, these research datasets, where generated by research-performing institutions such as universities, may be less likely to be recognised as valuable, shareable and re-useable research assets.
This is not, however, to say that participants in the group come only from universities. Many do, but datacentres and other relevant institutions are also regular participants. Our speaker line-up for Plenary 4 was as follows:
- Veerle Van den Eynden, 'Incentives for sharing research data'; Knowledge Exchange and the UK Data Archive
- Stefan Kramer, American University, US
- Dimitris Koureas, Natural History Museum London, UK
- Amy L. Nurnberger, Columbia University, US
- Kerstin Lehnert, Integrated Earth Data Applications, US
- Jochen Schirrwagen, Bielefeld University, Germany
As you can tell from the speaker list alone, one of the attractions of this group is as a place for the exchange of ideas between different types of institution across different national contexts: the common thread is that they either generate or host research data, and often do both. Most of our speakers on this occasion provided overviews of strategies to motivate deposit at their respective institutions, including approaches to advocacy and infrastructure development.
The exception was Veerle Van den Eynden; her talk provided a more global view, reporting on the KE/UKDA project, ‘Incentives for sharing data’. Whilst recognising the value of repository deposit as a means to share data, Veerle’s project investigated how researchers from various disciplines and countries currently understood the notion of sharing research data, and how they currently went about it. Sharing methods include emailing resources, using a common filestore, and (the gold star route!) depositing in a sustainable repository with a licence allowing appropriate re-use.
This project also recognised that not all sharing is alike. In some cases there is an immediate reciprocal value to making data available, e.g. in the context of a collaborative project. In others, ‘data sharing’ can actually be understood as ‘data donation’: data is deposited for re-use without an immediate obvious benefit to the donor. In such cases, it is of course important to remember the long-term benefit to researcher impact as described by Piwowar et al., amongst other studies, as well as the increased likelihood that research funders may, either implicitly or explicitly, move to recognise an applicant’s past data-sharing record when considering future bids for funding.
The webpage for the ‘Incentive’ project is at http://www.data-archive.ac.uk/about/projects/incentive.
The other speakers in this session all contributed valuable outlines of research data-related innovations and implementation of infrastructure at their various institutions. Such sessions provide a useful opportunity to compare and review variations and commonalities in practice. But the IG is not just a forum for presentations. Discussion identified some suggested work areas for group members around the themes of interoperability, discoverability, incentives for deposit and use, costs and funding models, and evidence about ‘the long tail’. Members were urged to return to their campuses and find out what kinds of data their institutions hold. In addition, any tools that help specifically with the management of ‘long tail’ data should be identified, shared and reviewed.
The notion of a discovery layer was also raised during discussion: this is something that in some national contexts is being looked at already. For example, the Jisc Research Data Discovery Service (RDDS) is a UK-based pilot initiative which is currently trialling approaches to the harvesting of metadata from research data repositories hosted by universities, subject-specific datacentres and databases, and other large institutions. (Phase 1 is described here: http://www.dcc.ac.uk/projects/research-data-registry-pilot). The RDDS builds on software first developed by the Australian National Data Service for their research data discovery service Research Data Australia, a major initiative by the Australian government: https://researchdata.ands.org.au.
There was also timely discussion about preparing incentives to discourage researchers from opting out of the Horizon 2020 Open Data pilot: the group recognised the opportunity the Open Data action provides for improving awareness and practice across the EC.
Many other issues and questions were raised throughout the presentations and the discussion session. My full notes and co-chair Kathleen Shearer’s summary are available at https://www.rd-alliance.org/group/long-tail-research-data-ig/post/summary-and-notes-meeting-amsterdam-september-23-2014.html .
The group’s webpage can be found at: https://www.rd-alliance.org/group/long-tail-research-data-ig.html .