Canonical Workflow Frameworks for Research (CWFR)
Submitted by Peter Wittenburg
The CWFR (Canonical Workflow Frameworks for Research) initiative is motivated by deep gaps that we found between technological possibilities and principles on the one hand and data practices on the other. Detailed insights into about 70 research projects led us to observe two major paradoxes: (1) Researchers are aware of the FAIR principles but do not yet want to change practices in the data labs. They prefer to defer making data FAIR to the end phase of a data-driven project, when a publication is being submitted. This covers only a very small fraction of the data that was generated and used. Making all relevant data FAIR at the very end is very costly and does little to promote data science. (2) An analysis of the processes of data creation, management, and processing across disciplines reveals recurring patterns of work steps, yet individual researchers believe that their work is unique. Despite much effort over the last decades to create technical workflow frameworks, these are not as widely used in the data labs as might be expected.
G. Strawn describes this as "Standards are good for science, but not for the individual scientists". The reason is of course that the individual researcher is under some stress to publish new results and does not like interruptions caused by new technological innovations. They like tools which help them master their work in the shortest time possible.
The CWFR initiative addresses both paradoxes. Scientifically motivated workflows empowered by FAIR Digital Objects could help researchers in their daily work, making many steps more efficient through automated workflow methods that immediately create FAIR-compliant data without bothering the researcher with details. With this approach we do not want to replace existing and evolving workflow technologies, but to build on top of them, offering libraries of components and basing these on FAIR Digital Objects as the integrative standard.
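As a rough illustration of this idea, the following Python sketch wraps ordinary workflow steps so that every output automatically receives a small metadata record, without the researcher writing any of it by hand. Everything in it is an assumption made for illustration: the `fair_step` decorator, the in-memory `REGISTRY` that stands in for a real PID or FAIR Digital Object service, the identifier prefix, and the attribute names are hypothetical and are not taken from any CWFR or RDA specification.

```python
import hashlib
import json
import uuid
from datetime import datetime, timezone
from functools import wraps

# Hypothetical in-memory registry standing in for a real PID / FDO service.
REGISTRY = {}


def fair_step(step_name):
    """Wrap a workflow step so that each output gets a metadata record."""
    def decorator(func):
        @wraps(func)
        def wrapper(data: bytes, input_pid=None):
            result = func(data)
            # Illustrative identifier only; a real system would mint a PID.
            pid = f"hypothetical-prefix/{uuid.uuid4()}"
            REGISTRY[pid] = {
                "pid": pid,
                "step": step_name,
                "derivedFrom": input_pid,  # provenance link to the input object
                "checksum": "sha256:" + hashlib.sha256(result).hexdigest(),
                "dateCreated": datetime.now(timezone.utc).isoformat(),
            }
            return result, pid
        return wrapper
    return decorator


@fair_step("normalize")
def normalize(data: bytes) -> bytes:
    # Example step: trim whitespace and lower-case the payload.
    return data.strip().lower()


@fair_step("tokenize")
def tokenize(data: bytes) -> bytes:
    # Example step: put one whitespace-separated token on each line.
    return b"\n".join(data.split())


if __name__ == "__main__":
    cleaned, pid1 = normalize(b"  Raw Lab DATA  ")
    tokens, pid2 = tokenize(cleaned, input_pid=pid1)
    # The second record points back to the first via "derivedFrom".
    print(json.dumps(REGISTRY[pid2], indent=2))
```

The researcher only writes `normalize` and `tokenize`; the record keeping happens as a side effect of running the steps, which is the kind of division of labour the paragraph above describes.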
The initial CWFR group of about 40 experts from different regions held two meetings during November 2020 to discuss the concept, what has already been done, and a variety of workflow use cases driven by research needs. Having written a “position paper”, we now want to reach out to many other potentially interested experts and form a subgroup within the RDA Data Fabric IG as the major collaborative platform.
The BoF session will be an interactive working session in which we aim to discuss:
- Whether the CWFR concept is already specified clearly enough
- What has already been done and should be reused
- How FAIR Digital Objects need to be structured to act as the integrative standard
- What further steps should be taken
Collaborative session notes (main session): https://docs.google.com/document/d/1yv9wp6y3xy8EhKjzOhNT8GJSiUM7Q618SjWZCw0Atqo/edit?usp=sharing
Collaborative session notes (repeat session): https://docs.google.com/document/d/1HRoLNGmMLO5t8DeGuFfi7c4rAuSnadBkkC8-y3pWjFk/edit?usp=sharing
- Welcome and short introduction of CWFR (20m)
- Short overview of what has been done (20m)
- Discussion of the CWFR concept (35m)
- Discussion of further steps and final summary (15m)
This BoF has close liaisons with a number of other RDA groups. In particular, we should mention the Data Fabric Interest Group (DFIG), which is continuing its work on (FAIR) Digital Objects, a core pillar of CWFR. There is an agreement with the current DFIG co-chairs to act under the umbrella of DFIG as long as there is no WG definition yet. We should also mention that the CWFR group that emerged out of RDA GEDE has about 50 interested people from many different countries.
- The Data Foundation & Terminology Working Group (historical) and the Data Foundation & Terminology Interest Group described a basic, abstract data organization model. It can be used to derive a reference data terminology shared across communities and stakeholders, in order to better synchronize conceptualization, to enable better understanding within and between communities, and finally to stimulate the building of tools, such as data services, that support the use of the basic model.
- The PID Kernel Information Working Group (historical) produced a recommendation that contains guiding principles for identifying information appropriate as PID Kernel Information, a PID Kernel Information profile, and architectural considerations. PID Kernel Information is defined as the set of attributes stored within a PID record. It will serve as the fundamental information source describing FAIR DOs. The PID Kernel Information Profile Management Working Group will extend the results.
- The RDA Working Groups on Data Type Registries work on the specification and standardization of machine-actionable data type descriptions, which are an essential part of the FAIR DO ecosystem.
- The Data Fabric Interest Group is working on the identification and definition of common components and their characteristics and services that can be used across boundaries in such a way that they can be combined to solve a variety of data scenarios.
- The RDA Working Group FAIR Data Maturity Model was established in January 2019 and aims primarily to develop a common set of core assessment criteria for FAIRness as an RDA Recommendation. In the course of 2019, the WG established a set of indicators and maturity levels for those indicators through participation and contributions from the WG members.
- The CURE-FAIR Working Group, endorsed by RDA in July 2020, will work to establish standards-based guidelines that offer a framework for implementing effective curation workflows for publishing FAIR data and code that support scientific reproducibility. The ultimate objective is to improve the FAIRness and long-term usability of “reproducible file bundles” across domains.
- The RDA/WDS Publishing Data Workflows Working Group (historical) aimed to provide an analysis of a representative range of existing and emerging workflows and standards for data publishing, including deposit and citation, and to provide reference models and implementations for application in new workflows. The group published its recommendations.
- The RDA FAIR4RS (FAIR for Research Software) Working Group is working on applying the FAIR principles to objects other than data, including computational workflows.
The group is actively communicating using OSF (https://osf.io/2cy86/)