RDA 13: The Social and the Technical
There is no challenge in research data management that can be met exclusively with a technical approach. But, on the other hand, there is also no such challenge that can be solved without one. This is my take-away message from the 13th RDA plenary.
It is a common trope to separate the social from the technical aspects of the challenges we face in research data management. Oftentimes, they are even treated as antagonists:
Male moderator: We’ve talked a lot about humans and issues on this panel but at the fundamental level it is all about data and technical and infrastructural challenges.
Dr. Devika Madalli: I'll repeat, I think there is an overemphasis on the technological.
— Angela Okune (@Honoluluskye) April 3, 2019
Roughly speaking, defenders of the social aspects discuss what we should do, while technicians try to answer the question of what we can do (and how we can do it). It goes without saying that it is important to answer both (and that RDA plenaries are the events where these answers are most likely to be phrased for the first time). But the twist is: they cannot be answered separately. Research data infrastructures unrelated to their users' context are empty; research without proper technical support is blind.
To become more specific, let me give an example from the past plenary in Philadelphia: I had the honor to present some thoughts about shortcomings of usage metrics in the context of the Data Usage Metrics working group. Since humans assess projects, publications and peers with usage metrics, everyone can instantaneously think of drawbacks: they can (and will) be gamed, they are biased by bandwagon effects and timeliness, and they require trust among metric providers. All those issues are social rather than technical. But finding appropriate responses to them cannot be done without technical expertise.
This even applies to the most radical solution, which is to refrain from using usage metrics at all. Technicians need to point out that every approach to doing anything meaningful with research data needs to scale with the amount of data that is out there. Given that this amount grows exponentially, it follows that assessment is a task that cannot be done by humans alone. We need machines to keep up. Granted that, there are still technical approaches which are more or less suited to the task. We need to find the best answers to these shortcomings. Let's discuss how we can implement counter-measures against forgeries, how usage data can be contextualised, and how we can certify trust in research data repositories. All these discussions took place during the plenary, and they are ongoing. I doubt that we will ever finish such discussions, since technical possibilities advance and social contexts shift. But most importantly: these discussions can only succeed by taking both the social and the technical perspective into consideration.
The need to think the social and the technical together also became obvious to me during the session of the FAIR Data Maturity Model working group. The discussion centered around manual versus automatic assessment of the FAIRness of research data. There are many approaches to assessing FAIRness, but as Barend Mons pointed out in the session, a lot of misunderstandings concerning these principles are floating around, and the publication clarifying many of them is unfortunately not as well known and cited as the original statement of the FAIR principles. One of the key points:
"FAIR is not just about humans being able to find, access, reformat and finally reuse data."
— Mons et al: Cloudy, increasingly FAIR; revisiting the FAIR Data guiding principles for the European Open Science Cloud
Although manual assessment gives humans anecdotal insights into the FAIRness of data, there is no easy way to translate this into machine-actionability. Just one example: when humans arrive on a landing page, they can identify a direct link to download the data. Machines cannot do this as easily. The assessment of Accessibility by a human must also take the current state of technology into account: Is there a standard that allows a machine to access the data? Is this standard properly supported? While it might be necessary for humans to make the first assessment, the gold standard should be a test carried out by a machine, based on a specification written by a human. Again: we must think the two together!
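To illustrate the gap between the two perspectives: a human instantly spots the "Download" link on a landing page, while a machine needs structured metadata it can parse. Here is a minimal sketch in Python of such a machine-actionable check, looking for an embedded schema.org `Dataset` description; the landing page and its metadata are hypothetical example data, not taken from any real repository.

```python
import json
from html.parser import HTMLParser

# Hypothetical landing page: a human sees the download link in the body,
# a machine can instead read the embedded schema.org JSON-LD metadata.
LANDING_PAGE = """
<html><head>
<script type="application/ld+json">
{"@context": "https://schema.org", "@type": "Dataset",
 "name": "Example Dataset",
 "distribution": {"@type": "DataDownload",
                  "contentUrl": "https://example.org/data.csv"}}
</script>
</head><body><a href="/download">Download the data here!</a></body></html>
"""

class JsonLdExtractor(HTMLParser):
    """Collect the contents of <script type="application/ld+json"> blocks."""
    def __init__(self):
        super().__init__()
        self.in_jsonld = False
        self.blocks = []

    def handle_starttag(self, tag, attrs):
        if tag == "script" and ("type", "application/ld+json") in attrs:
            self.in_jsonld = True

    def handle_endtag(self, tag):
        if tag == "script":
            self.in_jsonld = False

    def handle_data(self, data):
        if self.in_jsonld:
            self.blocks.append(data)

def find_download_url(html):
    """Return a machine-readable download URL, or None if the page has none."""
    parser = JsonLdExtractor()
    parser.feed(html)
    for block in parser.blocks:
        metadata = json.loads(block)
        if metadata.get("@type") == "Dataset":
            url = metadata.get("distribution", {}).get("contentUrl")
            if url:
                return url
    return None

print(find_download_url(LANDING_PAGE))  # https://example.org/data.csv
```

A page that only offers the human-readable link would make this check return `None`, which is exactly the kind of pass/fail signal an automated FAIRness test, specified by humans but run by machines, could produce at scale.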
In the end, these thoughts also apply to the slogan of the plenary, "With data comes responsibility": from my point of view, this responsibility is twofold: responsibility for the societal impact of data-driven research, and responsibility to implement infrastructures in a way that makes the most out of the technical resources available to us.