Blog by Mersiha Mahmic-Kaknjo, Split School of Medicine - RDA Europe Early Career Programme Winner
As an early-career researcher in biomedicine who studies data sharing, I was curious to find out about data-sharing practices around the world and across all disciplines. In the beginning, I was afraid that I would not be able to understand the IT language. It turned out that things are not complicated at all: similar principles ran through all the sessions I attended. Yes, there is a need to share data; yes, it can be complicated; yes, there is a need to work on interoperability, infrastructure, and so on. And everybody agreed that persistent identifiers are a must.
I was the official note taker for one stream and two interest group (IG) meetings.
Service orientation to data and high-performance and distributed computing was one of five streams held on Tuesday 22 September in Paris. This is where I learned about computing initiatives both in Europe (PRACE, ETP4HPC, Helix Nebula The Science Cloud, European Grid Infrastructure, INDIGO-Data Cloud) and outside Europe (XSEDE, the Extreme Science and Engineering Discovery Environment; Compute Canada; the National Research and Education Network in Brazil). The session was wrapped up with a review of the day's talks by Alan Blatecky and Karim Chine. They noted that science is clearly changing: both data and HPC infrastructures are getting more and more complicated, and integration is becoming more and more important. Other emerging issues seem to be continuity and the need for more agreement on social contracts.
Also interesting was the second session I took notes on: the joint meeting of IG Libraries for Research Data, IG Long Tail of Research Data and IG Repository Platforms for Research Data. Each IG and its work was briefly presented, and use cases relevant to all three groups were discussed.
My third assignment was to cover the IG Ethics and Social Aspects of Data meeting. There I learned about citizen science, how citizens might help science grow, and the importance of data sharing in this kind of science, through a project started by Corey Jackson and Kalpana Shankar. The project examined a sample of 15 citizen science projects, looking at how these projects communicated their policies and at the content of those policies. The conclusions were that the language on privacy and ethics was almost sufficient, but that ethical issues were insufficiently covered in this still small sample. The project will be continued on a larger sample and probably presented as a deliverable for this group. Some other deliverables were defined too: case studies/use cases on citizens working with scientists, an annotated wiki bibliography, and educational material.
Also eye-catching was the BoF on Big data in health, which was trying to become an IG. A large quantity of patient data is being produced all over the world, and it could be used to tackle many serious issues. Yet the greatest challenge would be anonymization.
For me personally, the most interesting session was the joint meeting of IG ELIXIR Bridging Force and WG Biosharing Registry: Life Sciences and Sensitive data. The most intriguing topic of all was anonymization, or how to protect confidential data, which seems to be one of the biggest barriers to data sharing in the health arena. It was brilliantly presented by George Alter. There are four ways to secure data: safe data, safe places, safe people and safe outputs. The level of data confidentiality needed varies widely, but there are also various solutions for achieving it, each with its benefits and pitfalls. When asked about risk, he emphasized that there are two dimensions along which to assess it: harm and identifiability. Susanna Assunta Sansone gave a very interesting presentation on a registry of repositories in biomedicine.
Also, I presented a “designer” thing in Paris: a poster on preliminary results of Data sharing practices in clinical trials prior to 2000. The poster itself was designed by Nevena Jerić (Apropo media), and it presented the methodology of a scoping literature review done as part of setting the baseline for IMPact observatory, a natural experiment that monitors the transition of clinical research regarding transparency of trial data. The initially vague notion of the concept and benefits of data sharing led to a rich discussion and pioneering efforts that laid the ground for further development. We observed a dynamic that was sometimes led by specific patient groups (such as HIV/AIDS, cancer) or triggered by the media due to scandals, as well as a broad discussion about the pros and cons of data sharing.
I realized that even though I had no idea what a cloud was, I had been using some for quite a while...
The theme of the 6th Plenary was data and climate change. It was astonishing to learn that an army of people works around the clock to collect climate data, while just a few people work on bringing the data together so that we can know what to wear tomorrow or whether our plane will take off. It's a kind of magic, coming out of human capacities: to be able to predict the weather and, by tracking climate change, to be able to predict and maybe even reverse it.
I became aware of the paramount importance of data curation. It is important to build infrastructure, yet the bigger issue is keeping track of what data exist and where they are stored.
We are surrounded by data. We need data. More and more data are emerging and will keep emerging throughout our lives. Data are making our lives both easier and more complicated. It seems that the data fabric is getting denser and thicker, and there are going to be more and more colours. We are going to collaborate across domains more and more. We should get used to it.
I myself produce an abundance of data. I hardly delete anything. "Get organized, you may need many of these data!" I keep saying to myself. Reuse as much as you can. Take advantage of more and more data becoming publicly available. They are fruits: free, beautiful, juicy, sweet, and ... easy to pick :-)