The content on this wiki page contains only the summary. For the full document, please see the attached PDF.
Recommendations for Implementing a Virtual Layer for Management of the Complete Life Cycle of Scientific Data
Edited by: Tobias Weigel, Peter Wittenburg
Supported by: Bridget Almas, Reinhard Budich, Sandra Collins, Michael Diepenbrook, Ingrid Dillo, Francoise Genova, Frank Oliver Glöckner, Rebecca Grant, Wilco Hazeleger, Margareta Hellström, Keith Jefferey, Franciska de Jong, Tibor Kalman, Rebecca Koskela, Dimitris Koureas, Wolfgang Kuchinke, Leif Laaksonen, Larry Lannom, Michael Lautenschlager, Damien Lecarpentier, Jianhui Li, Jay Pearlman, Luca Pezzati, Ralph Müller-Pfefferkorn, Beth Plale, Stefano Nativi, Raphael Ritz, Ulrich Schwardmann, Rainer Stotzka, Achim Streit, Dieter van Uytvanck, Anwar Vahed, Doris Wedlich, Colin Wright, Ramin Yahyapour, Thomas Zastrow, Carlo Maria Zwölf
This note does not deal with research infrastructures in the general sense, but only with those aspects that are related to data. Details are given in the appendices.
There is wide agreement on a set of principles. Action is now required to put in place operational components of a common infrastructure.
We are at a tipping point in the development of a common conceptual framework and a set of tools and components that will revolutionize the management of scientific data. It is widely acknowledged, as detailed below, that the volume and complexity of the data now being collected, and even more so the inevitable and enormous increase in both, have reached the point where action is required. At the same time, and largely in response to this perceived crisis, a number of principles for the management of scientific data have arisen and been widely endorsed. The danger now is that agreement will stop at the level of principles and that multiple non-interoperable, domain- and technology-specific silos will continue to arise, all based on the same abstract principles. We would then lose the opportunity to use the current crisis to create a common set of tools and components based on an agreed conceptual approach.
The real risk at the moment is in not building a common core infrastructure according to our best current information.
What follows is our summary of the currently agreed-upon principles, a more detailed analysis of the requirements implied by those principles, and the current state of work on those requirements as reflected in the work of RDA, which we believe has the broadest base and the most neutral view of the situation. This includes brief summaries of the requirements and the current state of work on repositories, registries, identifiers, metadata, types, licenses and, in general, the whole ecosystem of interlinked digital objects needed for managing the life cycle of scientific data. We end with a more detailed view of the requirements for selected components, which partly extend the FAIR principles (Findable, Accessible, Interoperable, Reusable; see Appendix A).
Many core infrastructure components are already in use. We need to validate these components, encourage their use, connect the components, and begin building a common core infrastructure.
Action is now required to put in place operational infrastructural components based on this and similar analyses. Some of these components already exist at an operational level with wide experience across communities, while others are still at a prototype or concept stage. No design from scratch is intended, and we can build on the extensive knowledge built up in various regions. We should now establish a systematic approach in which these components can mature and ultimately enable communities to build new services and to prove that combining components adds value. There will, of course, be a risk in doing this, and some of these components will surely fail or otherwise prove inadequate. In some cases, waiting another five or ten years might result in better designs and implementations based on technology advances between now and then, but by that time the problem of non-interoperable silos will have gained ground and be difficult to displace. The real risk at the moment is in not building a common core infrastructure according to our best current information.
Beyond the recommendations already widely adopted by funders, the essence of the recommendations that follow can be summarized as:
- Digital objects should be stored in trustworthy repositories that are assessed regularly using DSA/WDS guidelines, and those repositories should be registered in open registries such as re3data.
- Trustworthy repositories need to assign PIDs to all digital objects and register them with trustworthy PID service providers, such as the International DOI Foundation and the European Persistent ID Consortium for eResearch, that guarantee their resolution to meaningful state information.
- The digital objects referenced in the two points above are not restricted to the data itself but also include schemas, queries, concepts and concept vocabularies, all of which need to be registered in open registries and assigned PIDs if they are cited or referenced.
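The relationship between these three points can be illustrated with a minimal sketch. It is a toy in-memory model, not the API of any real PID service: the class names, the record fields, the `certified` flag standing in for a DSA/WDS assessment, and the prefix `99.EXAMPLE` are all assumptions made for illustration.

```python
import hashlib
from dataclasses import dataclass, field


@dataclass
class PIDRecord:
    """State information a PID resolves to (fields are illustrative assumptions)."""
    pid: str
    repository: str
    object_type: str  # e.g. "data", "schema", "query", "vocabulary"
    checksum: str


@dataclass
class PIDRegistry:
    """Toy stand-in for a PID service provider; not a real Handle or DOI API."""
    prefix: str
    records: dict = field(default_factory=dict)

    def register(self, repository, object_type, content):
        checksum = hashlib.sha256(content).hexdigest()
        # Hypothetical naming rule: prefix plus a running sequence number.
        pid = f"{self.prefix}/{len(self.records) + 1:06d}"
        self.records[pid] = PIDRecord(pid, repository, object_type, checksum)
        return pid

    def resolve(self, pid):
        # Resolution returns state information, not the object itself.
        return self.records[pid]


@dataclass
class Repository:
    """Toy repository; `certified` stands in for a regular DSA/WDS assessment."""
    name: str
    certified: bool
    registry: PIDRegistry
    holdings: dict = field(default_factory=dict)

    def deposit(self, object_type, content):
        if not self.certified:
            raise PermissionError(f"{self.name} is not a trustworthy repository")
        # Every deposited object gets a PID, whatever its type.
        pid = self.registry.register(self.name, object_type, content)
        self.holdings[pid] = content
        return pid


registry = PIDRegistry(prefix="99.EXAMPLE")
repo = Repository("example-repo", certified=True, registry=registry)

# Data and the schema describing it are both digital objects with PIDs.
data_pid = repo.deposit("data", b"temperature,12.5\n")
schema_pid = repo.deposit("schema", b"temperature: float\n")
record = registry.resolve(data_pid)
```

In this sketch the schema receives a PID in exactly the same way as the data, which mirrors the third point: anything that may be cited or referenced is treated as a digital object in its own right.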
There are still many issues to be explored and questions to be answered, but we believe that science would be well served if future scientific data infrastructure projects accepted and followed these high-level recommendations.