Blog by Heidi Laine, University of Helsinki - RDA Europe Plenary 7 Early Career Programme Winner
When I first attended a Research Data Alliance plenary meeting in Washington D.C. in 2013, I felt like a fish with a bicycle. Don’t get me wrong, I like cycling, but I was in over my head. My background is in social science history, so I consider myself half humanist, half social scientist. I don’t have a single data science bone in my body. What is metadata, what about linked data, why should I care about persistent identifiers, machine readable what?! The question I was burning to ask, but didn’t dare to utter, was ‘What does the word data mean for me, a human science researcher?’.
I was at the 2nd plenary because I was working as a Science Secretary for the Council of Finnish Academies at the time. The Council had just decided to join CODATA and was looking into ways of becoming involved in the fast-growing RDA community.
Flash forward to Tokyo and the 7th plenary, which I attended in the role of a PhD student and an RDA early career program participant. I was now responsible for my personal open research project, acting as the secretary of the Finnish national committee of CODATA and involved in coordinating activities of the Open Knowledge Finland Open Science community. This time I was the indisputable master of my bicycle, wasn’t I? Or was I?
I am nowadays more familiar with concepts like metadata, identifiers and many more. I understand that in order for open research data to become the norm, we need to keep developing tools and services, creating data citation practices and metadata standards, as well as tackling technical and legal interoperability issues, to name a few.
One thing hasn’t changed though: I’m still not sure what ‘data’ means for me.
As a researcher, my primary sources consist of interviews, e-mails, blog posts and comments, newspaper articles, books, reports, guidelines, recommendations and archival documents (yes, I mean paper) of various kinds. I want to be as open as I can, but there are very few easy answers when it comes to, for example, privacy, copyright, repositories and formats.
‘Workflow’ is one buzzword that I have learnt recently. For me, doing research is a constant process of exploring (and discarding) new tools and approaches, one that often resembles floundering rather than flowing. Which part of my chaotic collection of Excel sheets, Evernote notes, Google documents, interview audios and transcripts, Atlas.ti files and SmartArt charts is ‘data’ and how should I make it open?
I guess I am part of what some call the ‘long tail of research data’. Which makes me wonder: is there also a long tail of RDA?
Digital Practices in History and Ethnography Interest Group
As much as I appreciated all of the sessions and discussions I attended at the 7th plenary, the one where I truly felt at home was that of the Digital Practices in History and Ethnography IG, chaired by Mike Fortun and Jason Baird Jackson. Though the IG started as a forum for researchers of history and ethnography, it has become a catch-all group for human sciences. In addition to historians and ethnographers, the Tokyo meeting was attended by at least archeologists, social scientists, computer scientists and librarians. One of the key unifying experiences for this motley crew is having first been introduced to debates on Open Access publishing, and through that gradually getting more and more involved in Open Data issues.
According to Mike and Jason, the IG aims not so much at consensus on any given issue as at facilitating conversation and connections. For more fine-tuned issues the IG can foster new groups. That has already happened, as there is a metadata-themed working group in the works.
One thing that was made clear right at the beginning of the meeting was that, while we as human science researchers might be in the long tail of data, this by no means signifies that our data or work is less relevant or has less impact. What is small in terms of bytes can be rich in so many other ways.
There are benefits for all parties if human scientists get more involved in RDA. Digital humanities and computational social science research is booming all over, but there seem to be very few international platforms for exchanging that knowledge. Human sciences don’t have a strong history of cooperation, neither among fields nor among individuals, and that is something that the IG aims to change.
Digital humanities or computational social sciences, or just plain old humanities and social sciences existing in a digital world, are all about new researcher skills and interdisciplinary cooperation. As one of the IG meeting participants pointed out, training needs are the common ground where the IG can start working with the other RDA groups. RDA could also be in a position to address some of the pressing challenges concerning human data in the digital domain, such as the complex ethical and privacy constraints.
Someone said and/or tweeted during the Tokyo plenary opening that data needs people and that data in fact is people. I’m not sure how the statement was meant to be understood or how other people understood it, but to me it felt like one of those things that are so obvious that you are surprised when someone says it out loud. Which doesn’t mean that it wasn’t, in my mind, an important and valuable thing to say. Even when data has been produced by, say, the Large Hadron Collider, it is all about people. There is history, politics, inclusions, exclusions, interpretations, communication, relationships, hierarchies, style choices even (remember the comic-sans-gate?). In a word, there are stories.
Let’s print this on a t-shirt for the International Data Week next fall: data is always subjective.
We have all heard it: metadata, metadata, metadata. The reason metadata is so darn important is that it tells us what the story is about, like the covers of a book. Historians don’t use the word metadata, but they know all about contextualization. We historians know that there is absolutely nothing that can’t be used as a primary source for historical research (a postcard, a building, someone’s trash). You just need to understand how to investigate your sources and understand their possibilities and limitations. You do that by being aware of the nature and the context of the source, i.e. the metadata.
Another thing that the historical sciences community knows a thing or two about, and that is also at the core of Open Data, is preserving information through time. An interesting practical example brought up during the IG meeting was a comment concerning the online tools and platforms we use and their changing nature: in order for future researchers to understand our data, they need to know the tools that were used to generate them, so that they don’t take choices made out of necessity as choices of preference or evaluations, and vice versa. This is why we also need to document our tools in narrative form, not just in technical terms, be they GitHub, Google Docs, Evernote or MS Office.
Long tail curling in
During the 2nd plenary in Washington there was little to no input or involvement from human scientists, at least according to my impression. In Tokyo there was one IG meeting dedicated to human science research and many sessions with discussions that were in one way or another relatable to human scientists. If this is the trend, then I am very hopeful and looking forward to attending many more RDA plenaries and events.
Just as the human sciences should be rehabilitated from the long tail to the core of research data, human science researchers should embrace RDA and bring their point of view and understanding of all things people to every discussion there. I feel like RDA, both as an organization and as a community, is ready and even yearning for more interdisciplinary dialogue, so the ball is very much in the human science court.
One more thing while I still have the mic: couldn’t we all just be called scientists? I for one call for an RDA recommendation on naming all academic research science and all researchers scientists. Because it’s all about people, remember?