RDA-MPG Science Workshop on Data Report - Compiled by Bernard Schutz, Leif Laaksonen, Raphael Ritz, Herman Stehouwer, Peter Wittenburg, Rob Baxter
The widespread adoption of the Internet in the 1980s was met with scepticism among scientists as to whether it could truly foster scientific research. Within just a decade, science had fully adopted both the Internet and its various layered infrastructures such as the World-Wide Web, having understood that the exchange of knowledge, information and data between the rapidly increasing number and types of computers could now be done within seconds, almost seamlessly. It relieved scientists of many time-consuming aspects of traditional communication and exchange channels. Agreement on a few basic principles (node numbering, protocols, registries) at a time when many competing proposals were being put forward allowed scientists to shift their attention back to new scientific questions, simply making use of the new facilities rather than trying to invent them.
Currently we seem to be in a comparable situation, where the volume and complexity of data exceed our ability to deal with them manually or through traditional means such as file systems. Fragmentation within disciplines, across disciplines and often across organizational boundaries (projects, institutes, states) is increasing rather than decreasing, and in many scientific domains the amount of time needed to manage and manipulate data to make them re-usable has become intolerable without support from new, highly automated processes. These trends with respect to data in science and beyond require new approaches to our management of data in the coming decades. Hence the Research Data Alliance (RDA), an initiative inspired by the Internet Engineering Task Force (IETF), was started, like the IETF, as a grass-roots, bottom-up organization designed to produce formal agreements, specifications and running code: by data practitioners, for data practitioners.
The RDA-MPG workshop on Scientific Data, held on 10-11 February 2014 at the Max Planck Society headquarters in Munich, brought together a number of leading European scientists and other experts to discuss current points of concern in the context of research data. The main objectives were to discuss whether the participants see a role for the Research Data Alliance (RDA) in their daily activities, what the science community’s expectations might be, and whether the RDA roadmap needs to adapt to meet those expectations. The context of the Workshop was set by a number of questions covering scientific concerns, data sharing & publishing, stakeholder aspects, data infrastructures, technological trends and aspects of data science education.
The outcomes of the Workshop discussions are classified under a number of headings – data sharing and re-use, publishing and citing data, infrastructures and repositories, and general observations on the nature of the research data challenge we face today. The discussions culminated in a number of recommendations for RDA to consider:
- RDA can play an important role if it is able to come up with recommendations, API specifications, guidelines, etc. that help to overcome the many one-shot, point solutions currently being implemented and hence make infrastructure building more cost-effective.
- RDA must indeed be a bottom-up organization, and needs to strike a better balance between bottom-up engagement and its current, rather top-heavy, state.
- RDA must motivate a “middle layer” of data scientists to get engaged, rather than hope for too much engagement from leading researchers.
- RDA must be aware that it may find itself in a race towards specifications and solutions with big commercial players who may win with de facto standards, simply because they arrive first.
- Expectations RDA has to meet:
  - Invest in training younger generations of data scientists.
  - Push demo projects, act as a clearing house, and be able to give advice on data management, access and re-use to everyone in research.
  - Have data experts who can visit institutes and help them implement solutions.
  - Perform good quality assessment on the first working-group results due in September 2014, and take care not to fall into the trap of overselling.