By Fernando Aguilar - RDA EU Early Career Grant Winner – Universidad de Cantabria
According to IBM’s definition of Big Data (https://www-01.ibm.com/software/data/bigdata/what-is-big-data.html), every day we create around 2.5 quintillion bytes of data, and 90% of the data available today has been created during the last two years. Research data is an important contributor to this continuous data creation, and it needs to be used in a way that lets the scientific community extract information and added value from it. The Research Data Alliance is, from my point of view, the most important forum where researchers from around the world can discuss and agree on new, common methodologies to manage data across all stages of its life cycle, a topic that is directly relevant to my PhD thesis. During the International Data Forum day and the RDA 8th plenary, I took the opportunity to learn about different approaches to data management and to see how different communities can contribute to developing a more open science.
It is clear that the large amount of data generated every day needs to be managed correctly to obtain that added value, since analyzing data from different sources is key to doing science nowadays. The use of integrated and linked data will make it possible to address global challenges and to solve complex problems that we were unable to solve years ago. Sharing economy initiatives like Airbnb are models that Open Data should take into account, since they are built on trusted relationships.
Within a proper framework where all scientists generate data following the four FAIR principles (Findable, Accessible, Interoperable, Reusable), as well as other open-data recommendations such as those from RDA, researchers could integrate their data with confidence that it is correct. However, there are basically two problems: trust and the service gap. In many cases, scientists are reluctant to share their data, or do not trust data produced by others, for many reasons. Regarding the service gap, in some scientific fields the different levels of services (data, software, projects) are not connected to each other, or there are simply no services at any level. Both problems can be addressed by creating collaborative environments based on trust, where people can share both data and services supported by e-infrastructures, just as sharing economy initiatives like Airbnb bring together different actors who provide or use services. To achieve this goal, cloud computing could be a key factor because of the flexibility, scalability and variety of services that it can provide.
The European Commission is making efforts to develop such a collaborative environment for research. In the EC context, it is called the European Open Science Cloud, and it would be a common space where all data infrastructures can be integrated to share not only data but also services. Managing the large amount of available data is essential for human development in the coming years, and the data needs to be accessible not only to researchers but also to other stakeholders: policy makers, business and society.
During the RDA 8th plenary, I was able to participate in different WG, IG and BoF sessions that were very interesting. I realized that RDA is a very important forum at the international level, supported by major institutions like NSF, NIST and the European Commission. Around 40 countries were represented, which shows the international character of the event. There were plenty of sessions, hackathons, workshops, etc. The first plenary session addressed the importance of supporting the transition of data-generating projects and initiatives to a sustainable status. For public funders it is clear that data generated with public funds is a public good that needs to be given back to society, but sometimes there are no means to do it. Innovative ways are needed. For instance, Open Source is supported by the community and there is a wide range of options in that sense, including mixed business and academic models, but building a community of stakeholders is necessary to create users of your products.
Combining different methods can lead to success and may be the right path to sustainable data.
The transition from a project to something stable is not easy. In its first stages, a project is well supported (money, staff, etc.). The problem comes after those stages: How to adapt? How to bring in other communities? Strategy is important, but the goals of your project need to be dynamic and able to adapt to new environments.
Another important issue is knowledge preservation in terms of people's skills. How can that knowledge be transferred? How can a project compete with private companies in a very competitive market? The answer is not easy, but I think there are tools available that can help preserve knowledge, such as wikis or repositories where other people can access and understand the work of others. In terms of data generation, adding well-defined metadata to data makes it reproducible and understandable.
One of the most inspiring talks was the overview of the African science landscape and the status of data sharing there. The problems they face are not so different from ours, and there is also a gap between researchers, who are not always well connected and coordinated. The African Academy of Sciences is the key organization for developing this kind of action. It aims to be the mechanism that drives the scientific and technological development of Africa, with five regional offices distributed across the continent. It promotes funding programs like DELTAS, Grand Challenges Africa or CIRCLE and has collaborating institutions in other countries (USA, UK, France…). In my opinion, initiatives to coordinate research in Africa can contribute to the general development of the different countries, since they could create a collaborative environment for science that strengthens universities and other research institutions. This would have a direct impact on African R&D industries. To do so, young (and not so young) scientists need to be inspired and encouraged to embrace Open Science.
During the RDA Plenary, I also had the chance to participate in the joint meeting of IG Metadata, WG Metadata Standards Catalog and IG Data in Context. There I noticed how differently metadata is used across scientific fields: some fields barely use it, while others treat metadata as a basic part of their data management. The benefits of metadata are many: easier searches, integration and interoperability, machine-to-machine processing, etc. I think metadata is key for the future development of a collaborative environment, but there is a lot of work to do in order to guarantee the four FAIR principles. We need to study how metadata is used across the different scientific fields and try to understand how the different metadata standards can be made interoperable, and I think the RDA groups are the right place for that discussion. After that, recommendations can be released so that others know the best way to manage their metadata.
There was also a session where the early career scientists attending the RDA plenary could share thoughts and opinions about the different activities that took place there.
In conclusion, the International Data Forum and the RDA 8th plenary were very helpful and interesting for me, since they gave me an international overview of the research data management activities that are ongoing around the world. The interdisciplinary character of the event also showed me different approaches to data management, which means new ideas for how to solve problems. I had the opportunity to discuss some topics, share my opinions and thoughts, and learn from others. Finally, it confirmed to me the importance of Open Data and Open Science and how key they will be for research in the coming years, making science more open, more accessible and, above all, more equitable and fair for everyone.