The need for establishing this working group was articulated during the 9th plenary meeting in Barcelona during the Active DMPs IG session. The discussion was framed by a white paper by Simms et al. on machine-actionable data management plans (DMPs). The white paper is based on outputs from the IDCC workshop held in Edinburgh in 2017 that gathered almost 50 participants from Africa, America, Australia, and Europe. It describes eight community use cases which articulate consensus about the need for a common standard for machine-actionable DMPs (where machine actionable is defined as “information that is structured in a consistent way so that machines, or computers, can be programmed against the structure”)
The specific focus of this working group is on developing common information model and specifying access mechanisms that make DMPs machine-actionable. The outputs of this working group will help in making systems interoperable and will allow for automatic exchange, integration, and validation of information provided in DMPs, for example, by checking whether a provided PID links to an existing dataset, if hashes of files match to their provenance traces, or whether a license was specified. The common information models are NOT intended to be prescriptive templates or questionnaires, but to provide re-usable ways of representing machine-actionable information on themes covered by DMPs.
The vision that this working group will work to realise is one where DMPs are developed and maintained in such a way that they are fully integrated into the systems and workflows of the wider research data management environment. To achieve this vision we will develop a common data model with a core set of elements. Its modular design will allow customisations and extensions using existing standards and vocabularies to follow best practices developed in various research communities. We will provide reference implementations of the data model using popular formats, such as JSON, XML, RDF, etc. This will enable tools and systems involved in processing research data to read and write information to/from DMPs. For example, a workflow engine can add provenance information to the DMP, a file format characterization tool can supplement it with identified file formats, and a repository system can automatically pick suitable content types for submission and later automatically identify applicable preservation strategies.
The deliverables will be publicly available under CC0 license and will consist of models, software, and documentation. The documentation will describe functionality and semantics of terms used, rationale, standard compliant ways for customisation, and requirements for supporting systems to fully utilise the capabilities of the developed model.
The working group will be open to everyone and will involve all stakeholders representing the whole spectrum of entities involved in research data management, such as: researchers, tool providers, infrastructure operators, repository staff and managers, software developers, funders, policy makers, and research facilitators. We will take into account requirements of each group.This will likely speed up and increase adoption of the working group outcomes.
The group will predominantly collaborate online, but will use any possibility to meet in person during RDA plenaries, conferences, workshops, hackathons or other events in which their members participate. All meetings in which decisions are made will be documented and their summaries will be circulated using the RDA website.
The work will be performed iteratively and incrementally following the best practices from system and software engineering. We will evaluate preliminary drafts of the model with community to receive early feedback and to ensure that the developed common model is interoperable and exchangeable across implementations. We will also express existing DMPs using the developed common model and will investigate how to support modification of machine actionable DMPs by various tools involved in data management process, while ensuring that proper provenance and versioning information is stored with. Finally, we will build prototypes to investigate possible system integrations and to evaluate to which degree the information contained in the DMPs can be automatically validated and which actions or alerts depending on a DMP state can be triggered, e.g. by sending notifications to repositories or funder systems.
During our work we will monitor parallel efforts and engage with various research communities to find candidates for pilot studies and to transfer the acquired know-how. Towards the end of the lifetime of the working group we will launch pilot projects in which the model will be customised to suit the needs of the identified interested communities. Pilot studies will use the models to integrate systems and demonstrate how machine-actionable DMPs can work.
We believe that the outcomes delivered by this group will contribute to improving the quality of research data and research reproducibility, while at the same time reducing the administrative burden for researchers and systems administrators.