Blog by Ramine Tinati, University of Southampton - RDA Europe Plenary 7 Early Career Programme Winner
The following article summarises my activities during the Research Data Alliance (RDA) Plenary meeting held in Tokyo, Japan, in March 2016. I would first like to thank the RDA for providing me with an early career award to support my travel and attendance.
In order to contextualise my research within the ongoing work at the RDA, I’ll provide a little background on what I currently do. I am a research fellow on the SOCIAM project (http://sociam.org), in the Web and Internet Science department, University of Southampton, and my current activities are two-fold. First, I perform studies and experiments on many heterogeneous types of data, which typically come from primary and secondary sources, mined either by myself or by other researchers. Second, I work on the engineering and development of infrastructure capable of capturing, storing, processing, and analysing this data. Consequently, I interact with large and small data on a daily basis, and over the last few years I’ve seen growing enthusiasm for establishing a community around best practices of data use.
A project which I have been involved in since its inception in 2011 is the Web Observatory (http://webobservatory.org), which is supported by the Web Science Trust and the Web Science Network of Labs. In simple terms, the Web Observatory can be understood as a set of guiding technical and social principles which enable the discovery, sharing, access, and querying of different types of data (not limited to Web data). These principles were realised through the development of the Southampton Web Observatory platform (http://www.webobservatory.soton.ac.uk), which allows users to securely share, search, query, and build applications on data. In particular, my work on the Southampton Web Observatory includes research into the technologies and workflows suitable for the back-end big data infrastructure, ranging from the ingestion, storage, and access of data to the analysis and visualisation of real-time data streams.
Having followed the RDA and its activities for some time, I knew that my line of research aligns very well with the ongoing work of several of its Working and Interest Groups. Being able to attend the 7th RDA Plenary has been a great opportunity to understand how the RDA functions, learn about the ongoing work, contribute to the discussions, and perhaps most importantly, meet the community.
What follows represents only a snapshot of the three days of sessions, plenaries, and social activities during the RDA 7th Plenary meeting. The first two days each opened with an engaging and lively plenary session with a mix of talks, panels, and reports of the RDA’s current achievements. These sessions really illustrated the breadth of the RDA, both in terms of the ongoing work and the community engagement.
As part of my attendance at the RDA meeting, I was fortunate enough to participate in several of the breakout sessions for the various Interest and Working Groups during the three days, and also to support the Chairs of the Repository Platforms for Research Data and Long Tail of Research Data Interest Groups (IGs). Taking part in both of these groups was a great opportunity to contribute to the RDA, and it also helped my thinking about my current activities and the wider development of the Web Observatory. Additionally, the agendas and topics discussed in these IGs were extremely beneficial for understanding how the current development of the Southampton Web Observatory aligns with the discussions within the RDA, specifically with the design of platform standards, the representation and use of metadata for describing data, and perhaps how research data.
In relation to my current work, I found many of the discussions very fruitful and, judging by the level of discussion, so did the other participants attending the breakout sessions. Within the Repository Platforms for Research Data IG, there was a good discussion around the functional requirements of a platform capable of serving research data to the community. This IG has already put a significant amount of effort into constructing a matrix of features, based on the analysis of real-world research platforms. There was unanimous agreement that this would form a useful tool for designers to use when developing new platforms, or even as a guideline (or cheat sheet) for those wishing to use existing platforms. There was also agreement that more platforms need to be analysed in order to get a more holistic view of the features required (and desired), and that the functional requirement analysis should be situated in the extensive academic literature. It was good to see that the Web Observatory platform exhibited many of the functional requirements listed, whilst offering certain features which were unique (querying of real-time data), and thus could be a valuable use case for the matrix to contain. One of several action points from this breakout session will be to expand the current list of use cases, and also to make sure that the IG’s efforts are not being replicated elsewhere in the RDA, and are known to the wider community.
“Open Science is not same as Open Access publishing” (@sjskhalsa)
Another interesting IG which I was able to attend concerned the accessibility of the expanding volume of research data in the long tail. This issue is particularly important to those of us working on the Web Observatory project, because we are starting to find that whilst some datasets are extremely popular, others receive no attention, despite containing very rich content. As identified by the IG, many of these underrepresented datasets are small and badly described, and become overshadowed through preferential attachment: resources which are popular continue to gain more popularity, pushing the less popular further down the long tail. Whilst there was no consensus on how to tackle this problem (indeed, this is a phenomenon which is not just present in the data world), there are several technical mechanisms which could be used to mitigate or reduce this effect. For instance, in the Web Observatory, we are labelling a selection of datasets as “editors’ choice”; these are hand-picked based on their quality and completeness. Whilst this is a manual intervention, it is one way in which we are trying to reduce the long-tail effect.
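To make the preferential attachment effect a little more concrete, here is a minimal sketch of a toy simulation. It is not based on Web Observatory access logs; the function name, the number of datasets, and the number of accesses are all assumptions chosen purely for illustration. Every dataset starts out equal, and each new access picks a dataset with probability proportional to the accesses it has already received.

```python
import random


def simulate_preferential_attachment(n_datasets=50, n_accesses=5000, seed=0):
    """Toy model: each new access favours datasets that are already popular.

    Hypothetical illustration only; the parameters are not taken from any
    real repository.
    """
    rng = random.Random(seed)
    # Every dataset starts with a single access, so all are equally popular.
    counts = [1] * n_datasets
    for _ in range(n_accesses):
        # Choose a dataset with probability proportional to its current count.
        chosen = rng.choices(range(n_datasets), weights=counts, k=1)[0]
        counts[chosen] += 1
    # Return access counts from most to least popular.
    return sorted(counts, reverse=True)


counts = simulate_preferential_attachment()
top_share = sum(counts[:5]) / sum(counts)
print(f"Top 5 of {len(counts)} datasets hold {top_share:.0%} of all accesses")
```

Even though no dataset is "better" than another in this model, the early leaders tend to pull ahead, which is the kind of effect that an intervention such as an "editors' choice" label tries to counteract.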
For a more in-depth overview of the RDA sessions which I supported, please take a look at the notes for the Repository Platforms for Research Data (https://docs.google.com/document/d/1xMPqdDNu6AwFD71JUXbO8LXoqhCOJWnSRB6tdjSZKY0/edit?usp=sharing) and the Long Tail of Research Data (https://docs.google.com/document/d/1NYePmteIq1BC8QFar709pWCcvL6DdHAnmOfJbeiCFCk/edit), which are also going to be made available on the RDA website, under the associated Interest Groups.
Wrapping up the plenary was a closing session discussing the upcoming 8th RDA Plenary, the International Data Week, and the SciData Conference, all running in conjunction with each other. If the level of enthusiasm and energy of the 7th RDA Plenary is anything to go by, September promises to be a really exciting time of discussions, work, and presentation of results related to data!
“What is metadata for you may be data for someone else #RDAPlenary” (@resdatall)
Also, as an added extra, I thought I’d capture and analyse the Twitter communications around the #RDAPlenary hashtag, which you can find below! In total, the #RDAPlenary hashtag generated over 2,000 tweets (which can be found here), with around 63% of these being retweets, and 11% being direct communications.
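For anyone wanting to produce a similar breakdown, here is a minimal sketch of how tweets could be sorted into retweets, direct communications, and everything else. It is not the method I used for the figures above: the rules (a retweet starts with "RT @", a direct communication starts with "@") are simplifying assumptions, and the function names and sample tweets are made up for illustration.

```python
def classify_tweet(text):
    """Label a tweet as 'retweet', 'direct', or 'original' from its text alone.

    Assumed rules (illustrative only): a retweet starts with "RT @",
    and a direct communication starts with "@".
    """
    stripped = text.lstrip()
    if stripped.startswith("RT @"):
        return "retweet"
    if stripped.startswith("@"):
        return "direct"
    return "original"


def category_shares(tweets):
    """Return the fraction of tweets falling into each category."""
    counts = {"retweet": 0, "direct": 0, "original": 0}
    for text in tweets:
        counts[classify_tweet(text)] += 1
    total = len(tweets)
    if total == 0:
        return {label: 0.0 for label in counts}
    return {label: n / total for label, n in counts.items()}


# Invented sample tweets, not taken from the real #RDAPlenary collection.
sample_tweets = [
    "RT @alice: Great opening session #RDAPlenary",
    "RT @bob: Slides are now online #RDAPlenary",
    "@carol see you at the breakout session? #RDAPlenary",
    "Looking forward to day two of #RDAPlenary",
]
shares = category_shares(sample_tweets)
for label, share in shares.items():
    print(f"{label}: {share:.0%}")
```

A real analysis would more likely rely on the metadata supplied with each tweet rather than on its text, so treat the string matching here as a rough approximation.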
A quick analysis of the retweet network (Figure 2) and the mention network, which capture the sharing of information (retweets) and the direct communications (mentions) respectively, reveals a highly interactive and connected community. In case you haven’t come across these types of graphs before: the nodes represent Twitter users, and the edges represent the communications between them. The colours represent the category of user based on their tweeting behaviour (as classified by software I created, Flow140). Interestingly, and somewhat differently from other Twitter conversations, the single connected component shown in the retweet network suggests that the RDA community does not have isolated areas of conversation; rather, the topics draw on many members who may not be directly involved in the Working Groups.
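If you would like to check for a "single connected component" yourself, the sketch below shows one way to do it with nothing but the Python standard library. It is an illustration under stated assumptions rather than the pipeline behind the figures: it assumes tweets are available as (author, text) pairs, that a retweet can be recognised by a leading "RT @user", and it does not attempt the Flow140 user categories. The function names and sample data are hypothetical.

```python
import re
from collections import defaultdict, deque

# Assumed convention: a retweet's text begins with "RT @username".
RETWEET_PATTERN = re.compile(r"^RT @(\w+)")


def build_retweet_edges(tweets):
    """Turn (author, text) pairs into (retweeter, original_author) edges."""
    edges = []
    for author, text in tweets:
        match = RETWEET_PATTERN.match(text)
        if match:
            edges.append((author.lower(), match.group(1).lower()))
    return edges


def connected_components(edges):
    """Group users into components, treating every edge as undirected."""
    neighbours = defaultdict(set)
    for a, b in edges:
        neighbours[a].add(b)
        neighbours[b].add(a)
    seen = set()
    components = []
    for start in list(neighbours):
        if start in seen:
            continue
        # Breadth-first search collects everyone reachable from `start`.
        component = set()
        queue = deque([start])
        seen.add(start)
        while queue:
            node = queue.popleft()
            component.add(node)
            for nxt in neighbours[node]:
                if nxt not in seen:
                    seen.add(nxt)
                    queue.append(nxt)
        components.append(component)
    return components


# Invented sample data, not the real #RDAPlenary tweets.
sample = [
    ("alice", "RT @bob: Metadata matters #RDAPlenary"),
    ("carol", "RT @bob: Metadata matters #RDAPlenary"),
    ("dave", "RT @erin: Long tail session notes #RDAPlenary"),
    ("bob", "Metadata matters #RDAPlenary"),
]
components = connected_components(build_retweet_edges(sample))
print(f"{len(components)} connected component(s)")
```

In this made-up sample there are two separate groups of users; a conversation like the one described above would instead collapse into one component containing every user who retweeted or was retweeted.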
Until then, happy Easter, and I look forward to contributing to the RDA Interest Groups!