With over 14,000 individual members from more than 150 countries, the Research Data Alliance (RDA) provides a neutral space where its members can come together to develop and adopt infrastructure that promotes data sharing and data-driven research.
Max Planck Computing and Data Facility, Garching/Munich
Bio: After completing the Diplom-Ingenieur degree in Electrical Engineering at the Technical University Berlin (TUB) in 1974, with a focus on computer science and digital signal processing, Wittenburg started working as a research assistant, setting up a center for control computation at TUB. In 1976 he became head of the technical group at the newly founded Max Planck Institute for Psycholinguistics. In 2011, Wittenburg became head of a new unit, The Language Archive, built as a collaboration between the Max Planck Society, the Berlin-Brandenburg Academy of Sciences and the Royal Dutch Academy of Sciences. Since 1988 he has been a member of the IT Advisory Board of the Max Planck Society. He has held, and continues to hold, leading roles in European research and data infrastructure initiatives, and he was a member of the High Level Expert Group that produced the “Riding the Wave” report on scientific data.
The Data Fabric IG (DFIG) found that working with data in many scientific labs, and very probably also in other areas such as industry and governance, is highly inefficient and too costly. Excellent scientists working on data-intensive tasks are forced to spend about 75% of their time managing, finding, combining and curating data. What a waste of time and capacity. DFIG is therefore examining the data creation and consumption cycle in order to identify opportunities to optimize work with data, to place current RDA activities in the overall landscape, to look at what other communities are doing in this area, and to foster testing and adoption of RDA outputs. Ultimately, DFIG aims to identify so-called Common Components and to define their characteristics and services so that they can be used across boundaries and combined to solve a variety of data scenarios, such as replicating data in federations or developing virtual research environments. Much important work is being done on data publishing and citation, but DFIG believes that, if we want to meet the requirements of the coming decades, we need to start much earlier, in the "Data Fabrics" of the labs, to organize, document and manage data professionally.
In addition, we can ask what role the FAIR principles and the intentions of the EU Open Science Cloud (EOSC) play in this process. The FAIR principles are valuable because they define a common, global language for what we should do when dealing with data. However, we should not make the mistake of seeing them as blueprints for building infrastructure; rather, they can be used as criteria that infrastructure needs to meet. We also still need to convince industry that they make sense. EOSC needs to address the tension between a reference-architecture approach and a component-based approach. I believe we need both poles: reference models and architectures help us discuss and understand the overall challenges, while components and testbed configurations help us build in dynamics and improve our understanding of the landscape. EOSC could be an excellent platform for carrying out these discussions and for realising testbeds that achieve stepwise improvements.
The webinar will briefly discuss some core statements from current reference models, describe the procedure followed in the Data Fabric IG, present a few stable components, and discuss the usefulness of combining the two approaches in further work.