International Data Week 2023: “A Festival Of The Work Of The Community”.
IDW2023 – A Festival of Data, and in the words of Hilary Hanahoe, RDA Secretary General, “a Festival of the work of the community”.
All photos courtesy of Matti Heikkurinen
This was the fourth edition of International Data Week, a joint gathering of RDA, CODATA and the World Data System. This year’s vast conference welcomed over 800 delegates from 48 countries to the city of Salzburg, as well as over 120 online attendees.
Missed anything and want to access the recordings? Then see the Whova recordings here. The full programme is here, and all recorded sessions will be available on the programme site from 7 December 2023.
If you would like to catch up on the key takeaways from the RDA Plenary sessions at IDW, don't miss our "Post Plenary Webinar" on 16 November: Register here.
Check out the latest Plenary Pathways, which help you navigate the many different sessions at RDA plenaries.
The list of plenaries and breakouts is vast and diverse, reflecting a wealth of activities and hard work, all of it a collaborative effort of the international data community. The plenary sessions themselves were wide-ranging, but one common thread ran through them: inclusivity and interoperability, with a touch of AI. Many talks focused on the importance of “cross cutting domains”, keeping “science practices sustainable” and “relevant at the global level”, and making information “trusted and legitimate”, while at the same time “changing the current rewards system”. If one were to see IDW as a flow of themes, the overarching narrative was: how can we respond to the pressing challenges of the 21st century through Open Science at a global level, bringing everyone along from all regions of the world and all their diverse practices, while ensuring that information is trusted and legitimate?
Then there are the “do-ers” of IDW, who are putting all this into practice. In the many sessions between the plenaries, where the real implementation work continues to be discussed, it was inspiring to see progress on key Open Science themes: making data interoperable, aligning on standards, PIDs, data repositories, federated systems, the digital divide, policies, the Global Open Research Commons, FAIR, and the list goes on.
In short, the plenary environment created excellent opportunities for ad hoc discussions that more often than not led to new areas of common interest and expanded contacts beyond the initial group or the participants in the sessions people were following. Part of this came from getting out of one’s comfort zone and attending sessions one normally wouldn’t. The combined impact of these planned and fortuitous meetings is perhaps best summarised by a comment from one of the RDA Europe awardees: “useful, came back with 20 ideas”.
Selected sessions at a glance:
How can we equitably implement Open Science practices across the globe? The “Inclusivity in Open science while advancing research assessment and career pathway impact session” touched on many vital aspects of implementing Open Science practices worldwide, not least that the standards we develop and accept should always uphold the principles of inclusivity and shift the focus towards the quality rather than the quantity of outputs. Speaker Jinny Barbour reiterated that there is a profound problem with research assessment: the way it is set up is not inclusive, and it excludes more nuanced ways of doing science. While we may acknowledge this in our European setting, it also profoundly disadvantages researchers and even whole countries, because the research process is interconnected; she stressed in particular the disadvantages of the “Matthew” and “Halo” effects. Additionally, what are the needs of minority and Indigenous groups? Another key takeaway, stressed by speaker Maui Hudson, is the Western-centric viewpoint that is so often taken for granted. For example, in AI we often overlook the importance of giving credit to data coming out of Indigenous communities: much data is created, but very little credit is given to the originating source.
RDA Outputs – How do they work in real life, and how can we demonstrate impact?
The session gave a first sneak peek of the brand new RDA strategy 2024-2028. This was followed by an overview of the RDA Organisational Assembly and its crucial role in signing off on RDA outputs. Wim Hugo presented a vision of how RDA can track and tag outputs in order to demonstrate impact, via a front-end ‘Maintenance Facility’ that will better harness RDA outputs. Four RDA adopters then told their stories of how RDA outputs were rolled out in their respective contexts, in four very different presentations focused on practical roll-out. Daniel S Katz also highlighted the deeper issue of how RDA can harness its outputs by learning from processes such as organisational change. The discussion then focused on how to achieve successful impact. It is crucial to make RDA outputs more adoptable, and guidance is needed on how impact is expected to happen. See the full slides here for a deeper understanding of the outputs used in specific contexts.
The session offered much food for thought for RDA as it synthesises its outputs, makes them FAIR and targets them at the right communities. The main takeaway is that we need a way to navigate RDA outputs while also maintaining the integrity of RDA. One solution is to package multiple sets of recommendations together, and regional engagement is useful here. We also need champions, for example to convince top management. It was also pointed out that top management love certification, so it would help if the outputs could head in that direction. Ultimately, RDA might need to set up a Synthesis WG to steer all these valuable outputs.
Ethics as a first-class subject: The Salzburg plenary may be remembered as the plenary where ethics became a “first-class subject” in the research data domain. However, the conclusions and recommendations reached were still very tentative and preliminary. The final plenary session illustrated the urgent need for experts, including those in the RDA community, to be involved in the broader discussion of legal, ethical and societal impact. The session presented participants with kaleidoscopic snapshots of discussions, some between the panelists and the current generation of chatbots, about ethics and the new, hyped technologies being deployed with wild abandon all around us. On the surface, the outcome was perhaps entertaining (in a somewhat Pollockian manner). Primarily, however, it was a call to action for the RDA community. It demonstrated the urgent need to ground the concepts and terminology of ethics, AI, societal impact and related areas coherently. Developing shared moral guidelines for data and ethics requires a common vocabulary and conceptual model covering the relevant technologies and ethical principles.
One of the pleasant surprises of the event was the mainstreaming of ethics issues in many, perhaps even the majority, of the disciplinary discussions. The plenary programme contained only four occurrences of the word “ethics” (in the descriptions of one breakout session and the closing plenary). However, issues around ethics and trust were seen as relevant or even crucial in contexts such as earth observation and disaster risk reduction, too. There seemed to be a broad consensus that understanding the basic ELSA/ELSI (Ethical, Legal, and Societal Aspects/Implications) issues should be an integral part of a data scientist’s competencies.
Getting into the RDA Sessions: We've chosen a small selection of technical and social sessions below. Please refer to the end-of-day emails for deeper insight, as well as the RDA Group pages for the individual sessions' rolling notes.
The FAIR Metadata for Machine Learning session on 25 October was well worth attending. The group is mobilising to solve a specific problem: there is no systematic approach to how we work with machine learning and algorithms. If the community can define the processes and map the FAIR principles to them, we will be closer to knowing how to integrate FAIR into machine learning. The session gave clear demonstrations of harvesting metadata from different sources and making it comprehensive and reproducible. Lessons learned were that better provenance metadata is needed and that technical scripts are often missing. The session also discussed how machine learning algorithms can be interpreted in different ways. Defining a format could be a concrete first step towards solving the issue. There were lots of discussions and ideas; the next step is to synthesise them all and work towards a possible set of outputs, a continuation of the IG or a potential WG.
The session on “National and Institutional Approaches” on 26 October highlighted Australian, Dutch and Swiss approaches to different research repository services. Presenters described many innovative approaches to creating infrastructure that supports local RDM, including data stewards, who are a vital part of championing data services. An emerging concept, the “data mesh”, unifies silos by creating a community that connects those who wish to share data with those who wish to use it. Institutions are often not aware of their own data outputs. One issue raised was the complete absence of measurement of research data, and many presentations covered how to actually measure research outputs. A fascinating example set out by Ingrid Dillo and Herbert van der Sompel showed that by going directly to researchers and students, outputs can be captured at the source and then deposited. Much can be achieved by collecting this information automatically, and metadata enrichment can also be automated. There is strong support for the idea that repositories can both support academic careers and speed up FAIR implementation.
The rapid development of AI applications and Large Language Models (LLMs) was the area where ethics issues came into sharp focus. The session of the Artificial Intelligence and Data Visitation (AIDV) Working Group on Tuesday 24 October provided an in-depth review of these issues in the context of science. In general, the RDA community seemed to have a heightened appreciation of the opportunities AI and LLMs offer for supporting research and making its results more accessible. At the same time, awareness of the challenges and risks is becoming more nuanced and comprehensive. In the research domain, ethics committees were seen as a promising decision point in the research processes of many disciplines: providing their members with the knowledge and support needed to determine the ethical implications of research proposals could increase the impact of ethics on research. As an example of why these new resources for ethics committees would be needed, it was noted that LLMs require rethinking what constitutes informed consent. However, the ethical issues around open data, AI and LLMs are not limited to science and research. The discussions in the AIDV session raised issues around the use of AI in law enforcement settings and how bias in training material can have grave real-world consequences. It was also noted that tight schedules and financial constraints mean that developers of the software solutions behind commercial ventures often need to focus almost exclusively on “getting the product out of the door”. This means that ethics assessment will, in many cases, need to be performed without access to in-depth technical knowledge of the AI/LLM models used.
All RDA plenary meeting breakout sessions were recorded and will be available to all from 7 December. Keep an eye on the full programme.
Looking forward - 2024 and 2025
The event finished with a bang, or rather several: the major community events of 2024 and 2025 were announced. Mark your calendars! The next RDA in-person Plenary will be in Costa Rica on 12-14 November 2024.
And we hope you now know why the koalas made their way around IDW2023: to welcome you to the next edition, International Data Week 2025 in Brisbane, 13-16 October!