By Tomasz Miksa, RDA EU Early Career Grant Winner, Vienna University of Technology
If you have ever wondered what the RDA really is, whether you would fit in there, and why it would make sense for you to participate in a plenary meeting, then you should keep reading. In this blog post I will address these questions by discussing my experiences at the 8th RDA Plenary meeting in Denver.
It has been a week since I came back from Denver and I am still going through the notes I took during the meeting. My notes contain new ideas, links to interesting tools, titles of new publications, and, most importantly, contact details of new colleagues working on similar or related problems. RDA is a great mixture of backgrounds, interests and personalities. I believe this is the only place where researchers with a wide range of data-related problems can meet those who are willing to help solve these problems, or point them to already existing solutions. This is possible due to the structure of the meetings. There are common sessions in which themes important to the whole community are discussed. There are also multiple workshop-style meetings in which specific topics are discussed. Furthermore, participants can not only ask questions but also volunteer to deliver their own presentations. Thus, there is a chance to discuss new ideas and receive early feedback from experts in the community. Needless to say, coffee breaks and the accompanying events are also great opportunities for networking and exchanging ideas.
Another reason why my notebook is full of notes is that I planned my meeting very carefully. Before I went to Denver I made a list of presentations I wanted to attend. Sometimes, at first glance, I had doubts whether a given meeting would fit both my expertise and interests. However, these meetings turned out to be the best ones, because they helped me see that similar problems exist in other fields and that an exchange of expertise could help both domains. For example, the discussion in the IG Research data needs of the Photon and Neutron Science community was about containers and virtualisation. Meanwhile, the IG Reproducibility is also very interested in investigating containers as a means of increasing the reproducibility of research. This is how new ideas are born!
I felt from the very beginning that I was in the right place at the right time. This is because many presentations and the resulting discussions referred to reproducibility, data management, data citation, and research infrastructures. These are the topics on which I concentrate most in my research. It was a really great feeling to see data enthusiasts filling the grand ballroom of the Sheraton in Denver, emphasizing how important these topics are, how much we have already achieved, and how much there is still to achieve.
I especially liked the presentation by Unni Karunakara, who talked about Humanitarianism: Action + Data. To begin with, he explained the meaning of humanity, impartiality, neutrality, and independence. These four principles influence how Doctors Without Borders can collect data and to what extent one can trust the data. Data collected in regions of military conflict or humanitarian crisis can affect people's security. Hence, proper mechanisms for collecting and processing data (including aggregation and anonymization) are very important. For this reason, Doctors Without Borders relies on data that its staff collected themselves or heard first hand. The quality of the data underpins the credibility of the whole organization. Unni Karunakara also provided examples of how data can be misused to misinform the public: the same measure quantified using different methods can give significantly different results. Hence, it is important to use data from a trusted source that uses a correct methodology. Finally, not all data can be shared, and that is why embargo periods on data are needed. Data can be misused when it falls into the wrong hands. He concluded his talk with a reference to Florence Nightingale, who was the first person to use medical data to convince decision makers to change policies in hospitals. Her data showed that more people were dying in hospitals as a result of improper treatment than on battlefields. During the open discussion Unni Karunakara received many questions. Most of them dealt with the ethical aspects of collecting data while at the same time helping people in need. He argued that such data is essential not only to attract more public attention and thus improve the quality and quantity of help, but also to verify whether the actions taken bring the expected results. In my opinion such stories give all of us new fuel and power to work on better tools and data management solutions that enable people like Unni Karunakara and Doctors Without Borders to help those in need.
I participated in several workshop-style meetings of interest groups and working groups. One of them was the meeting of the IG Reproducibility, in which there were two excellent presentations. The first one was delivered by Victoria Stodden. In her presentation Victoria focused on explaining three types of reproducibility: empirical, statistical, and computational. She emphasized the ongoing change in how the term reproducibility is understood. Traditionally, computational reproducibility focused on the scientific method. Nowadays, it also covers the infrastructure used in experiments, because technological changes can affect reproducibility. This can be observed in large-scale simulations (data-driven computational science). The presentation also included an overview of tools and software that enhance reproducibility and facilitate the dissemination of scholarly records. The second presentation was delivered by Andreas Rauber, who presented the PRIMAD model that was developed during the Dagstuhl Seminar on Reproducibility of Data-Oriented Experiments in e-Science. The model assumes that reproducibility can be affected by changes to the Platform, Research Objective, Implementation, Method, Actor, and Data (PRIMAD). Depending on which of these is changed, reproducibility is affected in a different way. Furthermore, the presentation included a decision tree that can be used for assessing whether a reproducibility paper is worth publishing. Apart from common criteria like code availability or outcome identity, the tree uses new criteria like “are you able to state why the results differ?” or “do you know how to fix them?”, to name just a few. The presentations were good catalysts for discussions among the participants, which continued over the next coffee break.
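To make the PRIMAD idea a bit more concrete, here is a minimal Python sketch of how one might record the six PRIMAD components (Platform, Research Objective, Implementation, Method, Actor, Data) for two runs of an experiment and list which of them changed. The class name, field values, and the helper function are my own illustrative assumptions and are not part of the model as presented in Denver.

```python
from dataclasses import dataclass, fields, replace


@dataclass(frozen=True)
class Experiment:
    """Hypothetical description of one experiment run, one field per PRIMAD letter."""
    platform: str
    research_objective: str
    implementation: str
    method: str
    actor: str
    data: str


def changed_components(original: Experiment, rerun: Experiment) -> list:
    """Return the names of the PRIMAD components that differ between two runs."""
    return [
        f.name
        for f in fields(Experiment)
        if getattr(original, f.name) != getattr(rerun, f.name)
    ]


# Illustrative values only; they do not describe a real experiment.
original = Experiment(
    platform="Linux server A",
    research_objective="classify music genres",
    implementation="analysis script v1.0",
    method="random forest",
    actor="original author",
    data="dataset A",
)

# A re-run by a different team on a different machine, everything else unchanged.
rerun = replace(original, platform="Linux server B", actor="independent team")

print(changed_components(original, rerun))  # ['platform', 'actor']
```

Knowing which components were changed on purpose is what lets you say what a successful (or failed) re-run actually tells you.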
I also had a chance to present my work on Actionable Data Management Plans (ADMPs) during the joint meeting of IG Active Data Management Plans, IG Preservation e-Infrastructure, and IG Reproducibility. I presented the concept of ADMPs, which cover the same themes as standard Data Management Plans, but with particular sections filled with information obtained from existing tools that foster reproducibility. ADMPs can be thought of as automatically collected metadata about an experiment. They have a machine-readable data model that allows information to be organised in a structured way. The key message of my presentation was that we already have all the tools in place and now it is about making them talk to each other. ADMPs are the information carrier that allows this integration.
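As a rough illustration of what "machine-readable" could mean here, the following Python sketch builds a small plan whose sections are filled by (imaginary) tools and serialises it to JSON. This is not the actual ADMP data model; the class names, section themes, and tool names are hypothetical and chosen only to show the idea of a structured plan that software can read and write.

```python
import json
from dataclasses import dataclass, field, asdict


@dataclass
class Section:
    """One theme of the plan, filled with information reported by a tool."""
    theme: str
    source_tool: str  # hypothetical tool name, not a real product
    content: dict


@dataclass
class ActionablePlan:
    """Hypothetical, simplified stand-in for a machine-readable plan."""
    title: str
    sections: list = field(default_factory=list)

    def add_section(self, theme: str, source_tool: str, content: dict) -> None:
        self.sections.append(Section(theme, source_tool, content))

    def to_json(self) -> str:
        # asdict() converts nested dataclasses into plain dicts and lists.
        return json.dumps(asdict(self), indent=2, sort_keys=True)


plan = ActionablePlan(title="Example experiment")
plan.add_section(
    "Data description",
    "hypothetical-provenance-tracker",
    {"input_files": ["measurements.csv"], "format": "CSV"},
)
plan.add_section(
    "Software environment",
    "hypothetical-environment-capture",
    {"os": "Linux", "language": "Python"},
)

print(plan.to_json())
```

Because the result is plain JSON, any other tool can load it again with `json.loads` and pick out the section it needs.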
I was happy to receive stimulating questions, valuable suggestions, and generally positive feedback (also on social media). It was only a pity that the joint meeting was scheduled as the last one, just before the closing plenary session. Hence, many people were in a hurry to catch their flights back home and we did not have much time for longer discussions. However, we are in touch and I look forward to further excellent cooperation! Because this is what RDA is about, isn’t it?