“Data! Data! Data! I can't make bricks without clay”, cries Sherlock Holmes impatiently in the era of Open Science
Blog by Paola Masuzzo, Ghent University - RDA Europe Plenary 7 Early Career Programme Winner
Perhaps one of Sherlock’s most famous quotes, the title of this blog post captures the need to gather as much data as possible when reconstructing facts and making logical deductions. Even though their use and scope have changed over time, scientific data have always been and still remain the basis on which objective conclusions are drawn, the basis of the scientific approach for understanding the world.
This was the key message at the 7th Research Data Alliance Plenary Meeting, which took place in Tokyo, Japan, from the 1st to the 3rd of March 2016. I had the privilege to join the meeting as one of the winners of the Early Career Support Program, a great occasion to be introduced to the exciting world of RDA (and to Japan, too, of course!). I was immediately struck by the special audience: students, researchers and policy experts, research data champions, publishers, all getting together in a friendly yet highly professional atmosphere. The degree of interdisciplinarity was impressive as well, with study areas ranging from climate change through human studies, research integrity and medical investigations to marine biodiversity, just to name a few.
To break the ice and set the mood for the following days, a symposium on data sharing took place on the 29th of February, organized by the Japan Science and Technology Agency (JST).
“Why do we need to talk about Open Science?”
A beautiful introduction on “Open Science as a practice” was given by Yuko Harayama, the Executive Member of the Council for Science, Technology, and Innovation, Government of Japan. Harayama started her presentation with a very simple, yet powerful and provocative question: “Why do we need to talk about Open Science?”. I have myself tried to answer this question for a long time now, and have come to the conclusion that “In theory, theory and practice are the same. In practice, they are not.” It is true that science is open by construction, as it is a cumulative process that goes through peer review and revolves around knowledge sharing. In reality, however, things become quite different very quickly. First of all, access to knowledge can be very expensive (just think of the serials crisis in scholarly publishing). Furthermore, conditions of use, ownership, and the boundary between public and private are all issues that need to be taken into account. Yet we now live in the age of the Internet, where access to information is instantaneous, available at any time and anywhere (in theory, at least), and at almost no cost (again, in theory). So this seems to be the perfect time to go for open science! Open in the way we produce knowledge (as in citizen science), open in the way we access scientific publications (as in Open Access), and open for anyone to access and (re)use research data.
“Today’s data, tomorrow’s discoveries”
The introduction to open science from Harayama set the stage for further considerations on how science is drastically changing with the advent of the Internet and the growth of data storage capacity and computing power. Jim Kurose, the Assistant Director of the National Science Foundation (NSF), made it very clear that if we want to create a knowledge-based society we need to build on the scientific results of the past, both in terms of publications and data. This philosophy seems to be very much embraced by the European Commission, represented at the symposium by Jean-Claude Burgelman. In the Web 2.0 era, science is becoming more open and collaborative, as well as more data intensive. These trends are beginning to impact existing academic and scientific institutions, their funding structures and procedures, their hiring processes, and even the type of education being sought out by students. It is of course crucial to identify both the drivers of and the barriers to this paradigm shift in science, but, as Burgelman clearly stated, whatever we do, at whatever level, has to be stakeholder-driven, and it has to be bottom-up.
“Data: first-class citizens in Science”
As mentioned by Mark Parsons, Secretary General of RDA, the main goal of RDA is to build both the social and technical bridges to make data sharing happen in science. However, it is very clear that this will never happen unless data become first-class citizens in science. Several people at the 7th plenary in Tokyo highlighted how data-driven science can be the trigger of scientific development in the 21st century, and how pressing the need is for cybernetic infrastructure, education, training and computational resources.
Haruki Nakamura, from the Institute for Protein Research, Osaka University, presented a new view on the classical Data-Information-Knowledge-Wisdom pyramid in science, which I find absolutely fascinating. Basically, Wisdom in science should be read as Principle: big data and Artificial Intelligence can produce new science, and thus new principles. However, some scientists might argue that Newton did not find the principle of gravitation simply by seeing an apple fall from a tree. If we were to collect big data - lots of apples - this process would still not derive the principle. On the other hand, if we let the data talk, we might be able to discover things we neither knew nor expected, and to see relationships and connections among the elements, whether previously suspected or not. Does this all mean that big data is going to mark the end of theory in science? I personally do not think there is a single answer to this question (after all, big data, computational resources and data analysis all played a crucial role in the discovery of the Higgs boson, but the discovery of the Higgs boson was not data-driven).
“Open data: the force awakens!”
One of the plenary talks that completely captured my attention was “The power of data - from scientific discoveries to societal benefits” by Masaru Kitsuregawa, director of the National Institute of Informatics, University of Tokyo.
I do not think I had ever seen the real, concrete benefits of open research data presented so clearly. Kitsuregawa not only presented powerful infrastructure that makes data sharing possible (just think of the Science Information Network, SINET, a backbone that covers more than 800 universities across Japan), but also highlighted a few amazing scientific and societal results achieved thanks to open data and data sharing. In 2014 Hiroshima Prefecture in Japan was struck by a series of landslides following heavy rain, and natural disasters worldwide are increasing dramatically. The University of Tokyo is involved in the GEOSS initiatives, with continuous monitoring taking place and discharge prediction made possible by means of numerical weather forecasting.
Another important application was shown in the field of healthcare for developing countries. Grameen Village Phone is a unique idea that provides modern telecommunication services to underprivileged people in Bangladesh. These mobile phones have very cheap billing rates and are provided through easy loans from Grameen Bank. Once given a phone, the subscriber is encouraged to provide services to the people in the adjoining area, covering both outgoing and incoming calls. In this way, he or she can earn money to repay the debt to the bank as well as earn a profit. Many inhabitants of rural Bangladesh, particularly underprivileged women, have been able to change their lives with the help of Village Phone. This system could very easily become a method for health management in countries like Bangladesh.
Kitsuregawa closed his presentation with a strong message, which I believe can summarize the entire philosophy of the plenary: “Let us transform our society with Research Data!”.
“An RDA newcomer at work”
As an early career European researcher, I was given the opportunity to take action and do some real work during the plenary meeting in Tokyo. First of all, I presented my PhD research with a poster, together with the other successful early career applicants. The poster, which can be found here, illustrates the need for open data in the cell migration field, and the ecosystem that will be built to satisfy this need through the recently kicked-off EU-H2020 MULTIMOT project.
Furthermore, the RDA organizing committee assigned me to the “Health Data” interest group (IG) and to the “Data Citation” working group (WG). The “Health Data” IG provides an ideal forum for discussion of the specific issues that arise when using data management and analysis techniques in a healthcare setting, focusing particularly (though not exclusively) on privacy and security concerns. During the meeting, several key points were brought up both by the chairs of the session and by participants in the audience. Perhaps the aspect that most needs attention is the development of a form of dynamic consent, which is essential if we want to maximize the use of (bio)medical data (something that is not currently happening). For interested readers, minutes of the meeting can be found here.
The “Data Citation” WG is instead focused on releasing recommendations to make dynamic data citable, a crucial aspect, especially for research data. The WG finished its work at the RDA Plenary 6 in Paris, so it is now focusing on offering support for the implementation of the recommendations, collecting feedback from use cases, and reviewing the lessons learned during the process. For interested readers, minutes of the meeting can be found here.
I would like to conclude this post by expressing my gratitude to RDA Europe for giving me the tremendous opportunity to participate in a great Plenary, and I am certainly already looking forward to the next one!
Paola - @pcmasuzzo