This year I attended RDA Plenary #13 in Philadelphia as one of the experts supported by the RDA Europe project. In this article I would like to reflect on my experience in Philadelphia and summarize how the conference helped me understand data practices and gain an international perspective through the exchange of ideas and expertise and through socializing with counterparts from around the world. It has been a great experience. I had the opportunity to meet many intelligent and sharp colleagues and experienced managers, and hopefully to give good technical advice.
Before the conference
Everyone's RDA experience can be different because the plenary draws a very wide range of professionals, and that range keeps growing over time. There are plenty of meetings for everyone, from policy makers and senior managers to researchers, students and early-career staff, and depending on their career track people are likely to be drawn to very different parts of the conference agenda. My priorities were shaped by my role and responsibilities at Cineca, the Italian centre for High Performance Computing where I work, and by my duties as an RDA expert. My job is half technology scouting and half data management. A large part of my effort is funded by national and EU grants to do applied research in the field of data management and data processing. The other part is spent staying informed on emerging technologies in computing and storage. In this respect, my RDA was a mix of technical sessions and conversations with colleagues and potential new partners.
Personally, I think the conference was very well organized and very successful. I fully enjoyed the entire event, with so many interesting meetings and discussions on various topics. Several of the meetings were very informative and insightful on their particular subjects.
RDA also acts as a strategic think tank that fosters constructive dialogue and collaboration on themes relevant to data science, and as a venue for presenting the latest research results in many areas.
The theme of this year was “With Data Comes Responsibility”, a phrase coined by Hilary Hanahoe, Secretary General of RDA. She emphasized it during her opening talk, and it was reinforced by the keynote speech of Dr. Julia Stoyanovich on the importance of transparency in daily data science practices, especially those that affect people's lives.
During the conference, I attended the following meetings; a brief comment on each follows.
Tuesday, 2 April
IG Software Source Code. A really interesting and intense discussion about the preservation of publicly available source code. The group potentially spans multiple disciplines, perhaps not all directly connected with RDA, but it provides a solid example of how to preserve valuable data using a bottom-up approach.
IG Data Fabric Session. Perhaps one of the first groups of RDA. Although it has gone through several cycles, the discussion is still worth attending. In my opinion, the group's output would be more effective if more concrete and perhaps more narrowly scoped targets were defined. For instance, it would be really great if the group could work to coordinate the convergence, including the technical convergence, of the existing initiatives on this topic. I found the presentation of the RPID (Robust Persistent Identification of Data) project, including the Digital Object Interface Protocol, really interesting from the point of view of interoperability among multiple data services.
Wednesday, 3 April
IG Research data needs of the Photon and Neutron Science community. Although it lies outside my discipline, I decided to attend this meeting because the management of very large data facilities is a topic I am highly interested in. The overall discussion was vibrant, though within a very tight-knit group, and here I recognized a problem from which many other RDA groups also suffer. I return to this point in my conclusions, but it is evident that, although many groups have a very concrete roadmap, they struggle to deliver tangible solutions because nobody puts in the effort to cover the last mile. Many groups prefer to keep reiterating the same concept to keep the discussion alive.
IG Federated Identity Management. The federation of digital identities at any level is a real issue, and this group perfectly reflects the importance of the topic. Technical solutions do exist. I do not consider RDA to be the right place to address this topic, but I think the time is now ripe for digital infrastructures to make a decision and converge towards a common solution. To this end, I believe RDA can only help push the message forward.
IG Virtual Research Environment: Emergence of New Concepts and Technologies. The aim of this group is to coordinate global VRE/SG/VL initiatives with the goal of forming a common architecture and/or delivering best practices. As part of the group's goals, I proposed including a summary of all the solutions available on the market, with their respective use cases, so that communities avoid investing effort in developing new ones. Despite the existence of many domain-tailored solutions, more and more communities are using the Jupyter (https://jupyter.org/) framework to access and manipulate data interactively. This is indeed good news.
IG Early Career and Engagement. Creating a focal point within RDA for early and mid-career researchers and professionals in the data management domain is a complex task, especially now that machine-learning aspects dominate the profile of a data practitioner. For this group to be more effective, RDA should handle this paradigm shift, propose multiple career paths as the market demands, and organize different sessions accordingly.
Thursday, 4 April
WG Blockchain Applications in Health: State of the Art Report. Besides letting me take part in a very interesting discussion, this working group also helped me identify another point on which RDA may improve. Given the technology hype, almost everybody would be interested in attending a discussion about blockchain applications. But the reality is a bit more complex, especially when talking about a technology as non-trivial as blockchain. The goals of this group are extremely valid and very well presented. However, as was also discussed during the session, without a clear scope the group risks becoming much less effective than it could be. I strongly encouraged the organizers to reduce the scope of the working group, starting with the definition of the target audience and then moving from basic concepts to more articulated use cases. I personally think that if the group enables the health community to decide when to use a blockchain and when not to, that will already be great progress with respect to the current situation.
IG Data Foundations and Terminology. This group supports one of the key ingredients of RDA meetings, which is the pursuit of a shared understanding of research, data and its foundations. It is a continuing effort to build a collaborative discussion about data definitions and tools. By definition, the group works to keep enriching data concepts by adding new details, perhaps endlessly. While this aspect is crucial to improving the discussion, on the other hand it risks becoming repetitive and postponing the delivery of stable definitions indefinitely.
If with data comes responsibility, with RDA comes responsibility too. RDA is always a good opportunity to meet friends, colleagues, and highly knowledgeable experts working in the data field. However, as the data ecosystem expands very rapidly and embraces new concepts, the RDA agenda is expanding too. Just as a gas tends to fill all the free space in a room, RDA is trying to incorporate every possible topic related to data, with the intention of pulling together as many experts as possible and thus improving the quality of its outputs. Although this mission is honourable, it tends to consume too much effort and time and, more importantly, it risks producing unpredictable side effects, such as increasing the distance between research discussions and practical solutions. I have always conceived of RDA as a powerful means to improve communities' data management practices by delivering technology selection, service reviews, guidelines and pre-standardization scripts. But exactly on this point I have the impression that the whole RDA movement has still to prove its maturity. The biggest problem lies in the uptake of group recommendations. For instance, some working groups have become so mature over time that the discussion keeps reiterating the same concept without converging towards a final conclusion. RDA is probably suffering from the same problem that affects many similar organizations, that is, how to cope with the “last mile”: getting outputs adopted by communities and integrating new solutions into existing systems. If no countermeasures are taken, RDA risks becoming an endless vortex in which the outputs of such valuable discussions never take off and reach their audience. It is the same vortex that prevents communities, or individual researchers, from feeling really engaged and actively participating in the discussion. How group outputs could really be transformed into concrete solutions is indeed a complex question, but it is exactly what RDA should pay careful attention to.
If RDA does not redefine its identity, no sane community will commit to any uptake plan that reaches far into the future, especially given the uncertainty of the working groups' roadmaps. Still, it is fundamental that they remain receptive to input from colleagues who are formulating their own strategic directions for the same time period. Maintaining these sorts of ongoing conversations is a major part of RDA's strengths. The new mantra could be “do less to do better”.
The poster sessions were also a great opportunity to interact with other colleagues in an informal way. Fortunately I had the chance to go through most of the posters and to see how the topics presented have shifted over the years from data management to data analytics. It is a fact that machine learning techniques are capturing the attention of young researchers.
Below I respectfully offer some suggestions that might help improve RDA.
IG/WG lifespan. If an IG does not evolve into one or more WGs within six months, it should be closed, and possibly revived after going through a new application process. The same approach should apply to WGs. If no outputs are produced within eighteen months, the WG has no reason to continue.
Communication. Adoption stories need to be better communicated during plenary events and, more importantly, to policy makers, funders, etc.
Standardization. Certain outputs need to become standards in order to be adopted. If RDA does not aim to establish a new standardization body, it should at least become a recognised contributor to existing organizations. On the one hand this would make RDA’s work more credible; on the other it would significantly improve the quality of its outputs.
In conclusion, I would like to express my thanks to all the people who made the RDA #13 plenary possible and to those who shared their experiences during those exciting days in Philadelphia. I have benefited very much from this great event.