Reflections from my first RDA Plenary
It is undeniable that data underpins modern society. Data is everywhere, and is constantly being generated and consumed in increasing quantities. However, to exploit data to its fullest extent, it needs not only to be managed correctly but also to be easily accessed and distributed without barriers. There are still problems within the data community, especially with the infrastructure over which data is shared. The 9th RDA Plenary provided an opportunity to discuss these problems and their potential solutions in the host city of Barcelona, home to Gaudí’s architecture, Barça F.C., and, for 3 days in April 2017, over 500 people associated with the pursuit of bettering the data community.
My first impression upon entering the conference (other than bemusement over Hotel Sant Barcelona’s space theme, including an astronaut statue and flight simulator) was the range of participants. Not only were there people from 45 countries, with representation from academics to government officials and publishers to businesses, but the range of topics was truly extensive: rarely can you go to a conference where medicine, fisheries and earth sciences are all represented. However, given the mission statement of the RDA, openly sharing data to address the challenges of society, this shouldn’t have been surprising. In the opening session Arcadi Navarro Cuartiellas compared the RDA to the scientists of the Renaissance, calling the work of the RDA a second Renaissance that makes modern data accessible and recoverable for anyone who may need it. The keynote address by Augusto Burgueno Arjona, however, was far more provocative, questioning whether the RDA was actually a bottom-up initiative, despite previous speakers highlighting the bottom-up nature of the organisation. This caused quite a reaction from the audience, both during the questions and in conversations with other members at lunch. This challenging of statements was a first glimpse of what the plenary would entail, with every meeting I attended having an almost equal amount of audience and presenter participation. Although at times this meant that, as a newcomer, I found the narrative of some sessions hard to follow, I eventually got used to it, and found that the high level of participation allowed considerably more problems to be highlighted and potential solutions to be found.
Over the next two and a half days there was a plethora of working groups, interest groups and birds of a feather sessions, demonstrations, posters, coffee breaks, and a Women in the RDA breakfast, all of which aided the interdisciplinary nature of the meeting and helped me, as a new participant in the RDA, get to know people from different disciplines. My own work, which I presented as a poster, involves data mining of satellite infrared data to detect active volcanoes through machine learning. Throughout my work, I have constantly had issues with data not being openly available or being poorly stored, and have therefore been reliant on other researchers to send me their data. The hassle involved in getting data that could be essential for hazard management is primarily what led me to apply for the RDA Early Career Bursary, in the hope of understanding how to use data better, how to store my own openly and in line with the data standards set out by the RDA, and how to teach my students how important data really is by incorporating aspects of the RDA outputs in their education.
In addition to presenting a poster, we were also assigned two sessions to participate in, chosen according to our research interests. The first session I was assigned was the scholarly link exchange working group, which concerns the link between data and publications and is providing an infrastructure (Scholix) that can connect datasets with academic publications. It is hoped that connecting these will increase the discoverability and reuse of datasets, and allow greater exploitation of the data. However, sharing data comes with its own problems: what metadata should be included in the exchange (which raises its own questions about licensing metadata), how can we increase awareness of data citation, and how can we incentivise scholars to make their data openly available? The complexity of the problem seemed daunting, yet the team working on Scholix appears to have developed a well-thought-out, sustainable system that doesn’t burden the consumer with additional complexity. Although there were some questions about replication datasets and the complexity of the data types that could be stored, the Scholix system has been laid out to provide an infrastructure to which additional services can be added. With Scholix nearly operational, it is now time for the group to think about how to market it to people so that its incorporation into the data community is successful. The session appeared to be very successful, and Scholix looks to be an incredibly useful tool that should benefit data providers and consumers alike.
The second assigned session was the array database assessment working group. Array databases provide an easier way to store and query large datasets such as space sensor data, geospatial data, statistical data and health data. In a world where Big Data is increasingly used, array databases can offer an effective way to manage data. Database managers need to consider several questions when storing data: what data is necessary, what is the data being used for, where is it being used, what interfaces can be supported, and will the array databases be coupled with the appropriate metadata? As someone who has only ever consumed data from databases, I hadn’t realised that there were so many different points to consider when storing data. As a newcomer to the group, it was interesting to see how, when working through the more complex technical issues, nearly everyone in the room contributed, and even if a solution was not immediately found, the group considered ways to tackle the problem.
In addition to these sessions, I also attended the geospatial interest group, the mapping the landscape interest group, the metadata standards group and the data rescue group sessions, all of which were completely different; however, the common take-away from every session was that collaboration means better science. The RDA’s work is vital, and it is essential that other associations, commissions and educational institutes use the recommendations given by the RDA. I believe that if more people within the scientific community knew about the work the RDA does, they would not only adopt its recommendations but hopefully also contribute. I am grateful to the RDA for giving me the opportunity to come to this meeting and meet people who are truly passionate about data, and who really are fuelling the second Renaissance.