Hi all,
I did not have the opportunity to attend the TPDL 2018 workshop, so at the risk of being outdated, here are my two cents on this until we can meet in Gaborone and discuss this in person.
1. maDMPs need an ecosystem of “automatable” software (Rules 1 and 2 of the 10 Simple Rules for maDMPs)
One important goal is to have the maDMP automate repetitive work, but people should be able to keep their own repositories and other software platforms in the maDMP workflow, or, at most, install an upgrade to these existing solutions. However, automation across different systems requires interoperability.
Thus, interoperability with existing data management tools is a must, because a maDMP is basically worthless if no software knows how to execute what it prescribes or enforces. An off-the-shelf workflow engine should be able to fire off the proper events to the data management software, with the necessary payload, which would itself comply with a set interoperability standard (this is where we could come in, I think). In a sense, we could work towards an API specification covering a core set of operations that should be supported by any repository or data staging platform that wants to be “maDMP Ready” in the RDA sense.
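To make the idea of a core set of operations a bit more concrete, here is a minimal sketch in Python. Everything in it is an assumption for illustration only: the operation names (`validate_metadata`, `deposit_dataset`), the `Event` fields and the `InMemoryRepository` class are hypothetical, not part of any existing specification or repository API.

```python
from abc import ABC, abstractmethod
from dataclasses import dataclass, field


@dataclass
class Event:
    """An event fired by the workflow engine (field names are hypothetical)."""
    operation: str                       # e.g. "validate_metadata"
    payload: dict = field(default_factory=dict)


class MaDMPReadyPlatform(ABC):
    """Hypothetical core operations a 'maDMP Ready' platform would expose."""

    @abstractmethod
    def validate_metadata(self, payload: dict) -> bool: ...

    @abstractmethod
    def deposit_dataset(self, payload: dict) -> str: ...


class InMemoryRepository(MaDMPReadyPlatform):
    """Toy stand-in for a real repository, used only for illustration."""

    def __init__(self):
        self.datasets = {}

    def validate_metadata(self, payload):
        # Assumed rule: a record needs at least a title and a creator.
        return bool(payload.get("title")) and bool(payload.get("creator"))

    def deposit_dataset(self, payload):
        dataset_id = f"ds-{len(self.datasets) + 1}"
        self.datasets[dataset_id] = payload
        return dataset_id


def dispatch(platform: MaDMPReadyPlatform, event: Event):
    """Route an event to the matching core operation, rejecting anything else."""
    core_operations = {"validate_metadata", "deposit_dataset"}
    if event.operation not in core_operations:
        raise ValueError(f"unsupported operation: {event.operation}")
    return getattr(platform, event.operation)(event.payload)
```

The point of the sketch is only that the workflow engine talks to the abstract interface, so any platform implementing it could be plugged in without changing the maDMP itself.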
2. maDMPs should be modular
Every research project has its own way of handling data, but some needs are common, as shown by the recent tools that aid in building a DMP. A major attraction of maDMPs thus lies in their possible reuse, either as a whole or as building blocks. The maDMP should expose the subprocesses inside it in a modular way, with certain elementary validation and processing workflows being modeled and shareable using existing modeling languages, such as BPMN, as shown at the workshop. Other modeling languages could be used as long as they have both a standard visual representation for inclusion in the “printed” DMP, much like UML, and a machine-processable representation in XML. Such maDMP “building blocks”, so to say, could then be reused in a project’s maDMP and published in a maDMP “directory” for others to access.
People interested in a maDMP for their project could then:
a) Download and reuse an existing maDMP for a project that they know went well
b) Reuse only the maDMP “building blocks” for metadata validation, dataset availability, repository compliance, etc. (modeled using BPMN)
c) Build their own from scratch and share it to this “directory” of maDMPs for others to reuse
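As a rough illustration of options a) to c), the sketch below treats each building block as a small named check and the “directory” as a simple lookup table. It assumes Python 3.9 or later. All names here (`BuildingBlock`, `DIRECTORY`, `publish`, `compose`, `run`) and the validation rules are hypothetical; a real directory would hold BPMN models rather than Python functions.

```python
from dataclasses import dataclass
from typing import Callable


@dataclass(frozen=True)
class BuildingBlock:
    """A reusable piece of a maDMP: a name plus a check on a dataset record."""
    name: str
    check: Callable[[dict], bool]


# Hypothetical shared "directory" of published building blocks.
DIRECTORY: dict[str, BuildingBlock] = {}


def publish(block: BuildingBlock) -> BuildingBlock:
    """Share a block in the directory so other projects can reuse it."""
    DIRECTORY[block.name] = block
    return block


# Two illustrative blocks; the rules themselves are assumptions.
publish(BuildingBlock(
    "metadata-validation",
    lambda ds: "title" in ds and "license" in ds,
))
publish(BuildingBlock(
    "dataset-availability",
    lambda ds: ds.get("url", "").startswith("https://"),
))


def compose(*names: str) -> list[BuildingBlock]:
    """Build a project's plan by picking existing blocks from the directory."""
    return [DIRECTORY[name] for name in names]


def run(plan: list[BuildingBlock], dataset: dict) -> dict[str, bool]:
    """Apply every block in the plan to one dataset record."""
    return {block.name: block.check(dataset) for block in plan}
```

A project that only wants metadata validation would call `compose("metadata-validation")`, while one reusing a whole plan would take the full list of names from the project it is copying.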
3. The code that actually executes the actions in the maDMP’s processes should travel with them (Rules 6, 7, 9)
If possible, the small pieces of code behind every step of the modeled workflows should be open-source and retrievable as needed by the workflow engine as it runs the BPMN processes. That way, vendors could fork and adjust them to the existing APIs of their repository software or other software that the maDMP needs to “remote-control” to execute the automation steps specified in the modeled process. This is somewhat similar to ETL tools, but fetching platform-specific code as needed. A commercial example of a graphical ETL tool is Pentaho Data Integration (by the way, I am not affiliated with them; I have only used the tool myself).
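A minimal sketch of what “fetching platform-specific code as needed” could look like follows. It is an illustration under stated assumptions, not a real workflow engine: `REMOTE_HANDLERS` is a stand-in for an open-source code host, and the platform name `repo_x`, the step name `deposit` and the `HandlerResolver` class are all hypothetical.

```python
from typing import Callable

Handler = Callable[[dict], str]

# Stand-in for a public code host: handlers keyed by (step, platform).
# A vendor fork is modeled as an extra entry for that vendor's platform.
REMOTE_HANDLERS: dict[tuple[str, str], Handler] = {
    ("deposit", "generic"): lambda p: f"generic deposit of {p['title']}",
    ("deposit", "repo_x"): lambda p: f"repo_x deposit of {p['title']}",
}


class HandlerResolver:
    """Fetches the code for a step only when the engine first needs it."""

    def __init__(self, platform: str):
        self.platform = platform
        self._cache: dict[str, Handler] = {}
        self.fetch_count = 0  # how many times we went to the "remote"

    def resolve(self, step: str) -> Handler:
        if step not in self._cache:
            self.fetch_count += 1
            key = (step, self.platform)
            if key not in REMOTE_HANDLERS:
                # No vendor-specific fork: fall back to the generic code.
                key = (step, "generic")
            self._cache[step] = REMOTE_HANDLERS[key]
        return self._cache[step]


def run_process(steps: list[str], resolver: HandlerResolver, payload: dict) -> list[str]:
    """Run the modeled steps in order, resolving each handler lazily."""
    return [resolver.resolve(step)(payload) for step in steps]
```

The modeled process itself only names the steps, so the same maDMP can run against different platforms as long as each one has a matching (or generic) handler available.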
Best,
João Rocha da Silva
Invited Assistant Professor — Dendro Lead Developer — Research Data Management
Faculty of Engineering of the University of Porto, Portugal
ORCID: https://orcid.org/0000-0001-9659-6256 GitHub: https://github.com/silvae86