Galvanising the Commons - Building Global Bridges for Commons and Disciplines
On 20 March, EOSC Future and RDA held a collaborative workshop in Gothenburg as a satellite event to P20. Over 300 people attended in person, 120 online. The agenda was packed and the discussions lively. This blog presents some highlights and key takeaways. Recordings and links to presentations can be found on the event agenda page.
The three themes covered during the day were: interoperability, discoverability and engagement. EOSC Future was present throughout, and many of the EOSC Future RDA awardees presented their work in the EOSC context under the three themes.
Making global connections - the morning session
Sarah Jones made the most of her short visit to Gothenburg and was tireless in her role as the MC of the event. She welcomed the first speaker, Hilary Hanahoe, who gave the welcome presentation and demonstrated the importance of interoperability by using slides where 15% of the text was invisible...
Hilary reiterated the RDA vision of researchers and innovators openly sharing and re-using data across technologies, disciplines, and countries. She also demonstrated the progress the RDA community has made in building social and technical bridges that enable sharing and reuse of data. The presentation focused on the structure, working practices and values of the RDA community, such as openness, technology neutrality and inclusiveness. RDA therefore provides the natural forum for developing research commons where researchers can seamlessly access trusted research outputs and services.
Karen Payne, co-chair of the GORC IG and WG, followed up with a comprehensive overview of the work of the RDA GORC WG. The WG looks at national, domain and pan-national initiatives and identifies common features or attributes of Commons organizations, including common ideas about governance, AAI, and discoverability. Many of these themes resonate with what EOSC Future is tackling. Karen particularly praised the onboarding processes of EOSC, and their schema to catalogue resources - which the WG examined in the international environment. According to the WG's findings, training and support are less represented across all research commons, and training largely tends to be in-house. One particularly good example is the Malaysian national training programme.
Interoperability and standards are the common factor
The WG identified hundreds of potential features of Commons, most of which were associated with interoperability and standards. The features that occurred most often in the literature were associated with metadata, specifically issues around federated metadata, provenance and licensing. These are important to all repositories, and could provide good use cases to create a global set of interoperable and integrated Commons. The IG will be building on the results of the WG and creating a roadmap for Commons development.
Image courtesy of the RDA GORC IG
For more information on the current activities of the GORC Interest Group, please see, e.g., the call for input related to the definition typology elements.
A keynote by Hans Pfeiffenberger, “The Elephant in the room: The Role of Big Commercial Cloud Providers”, presented us with thought-provoking insights into researchers’ habits and how the most user-friendly route will always prevail. Hans’ talk put researchers’ day-to-day challenges, such as link rot and global DOIs that don’t resolve, into practical terms. These day-to-day problems can often be solved by commercial providers, and the stark reality is that researchers use whichever service is easiest. Moreover, they may adopt repositories such as Zenodo, with less rich metadata than, for example, disciplinary databases, because they are easy to use and quick to access.
In addition, he declared that “Researchers don’t care about service providers or metadata” as long as the impact factor influences and drives research practices. Giving some tips for EOSC, Hans said that science commons, as they are built and adapted for researchers’ needs, should “.... look at the productivity of scientists” and really drill down into their workflows and requirements. Another aspect is that EOSC has been created over the years to match funding resources, naturally limiting how it addresses researchers’ needs. For national players and funders, there is a real need to fund data management operations that are long-term and embedded within the institution.
A pre-lunch Panel session: Big picture challenges faced by the global research data community and big picture solutions
The five panellists set out their views on the main challenges faced and led a thought-provoking discussion. Some of the challenges - or lessons learned - discussed were as follows:
- Impact takes time to see: Very often the infrastructure we manage in the commons is there to make a difference for researchers, yet its impact is downstream: it is the research results that have an impact on the world, and this takes time to filter through, decades in fact.
- Cross-disciplinary research: Commons need to make a concerted effort to reach out to researchers in specific disciplines, and in particular to those who actually leverage the data for cross-disciplinary research, for example marine researchers working with humanities scholars. Knowing which data needs to be combined across disciplines allows us to develop targeted crosswalks for interoperability. Above all, it has to be a collective effort, which “encourages researchers to be part of the community”. Javier Albacete of the EC highlighted the importance of incentives for EOSC with a clear need to meet bibliodiversity.
- It takes a mix of ingredients: One strand of conversation was about how diversity is important in a knowledge commons, for example metadata in different languages, and how more often than not it is primarily developed countries that set the standards, which is unbalanced. Different data types require curation and harmonization.
- Coordination: Commons initiatives differ wildly in terms of scale, policy frameworks and stages of development. Panellists saw a need for national roadmaps to tailor the approach to the specific context and flagged that the cost of infrastructure is a challenge.
- How to deal with commercial providers: a good question was whether it was fair that commercial companies are “soaking up the gravy” of researchers’ hard work. Panellists agreed and flagged that we should be working with private providers on our terms, e.g. by ensuring they commit to our ideals of open standards and academic access. It was indeed noted that the “private sector can amplify the value”.
- Be inclusive: There was a fascinating discussion around the global south and whether EOSC should be inclusive of the global south as it develops. In an ideal world these Commons initiatives would have a levelling effect, ensuring equity of access and opportunities to researchers regardless of the country or institution in which they are based.
Some of the conclusions of the panel
We should not be prescriptive about Commons architecture - they can either be centralised or federated - what is important is that they provide certain key elements and that they connect to each other via consistent standards. No ‘one-size-fits-all’ applies to the Commons and we don’t need benchmarks. RDA can help to build those connections by providing a recipe where we lay out the types of ingredients needed. As was highlighted in this analogy, “it is ok to mix spice and miss other items”. Important elements in Commons are that they address skills and metadata practices and apply mechanisms that build trust.
Commons should put more resources into data curation than at present. This is the cornerstone of a rich and productive commons, with resources accessible and FAIR for all. Commons should also consider roadmaps for working with industry.
Commons, such as EOSC, will work if the researchers lead them. However, until researchers are rewarded for open science, engagement will remain low. This issue is compounded by the sustainability challenge: basically, short-term funding doesn't work. The level of investment compared to major science funding is low and initiatives such as EOSC should have a stable funding environment.
It is critical to collaborate globally: EOSC and others should connect to the global south and other regions as the incentives are clear for all: it accelerates socio-economic development. We should have global-level discussions in the framework of RDA to develop commons in countries where they are less developed to build locally, leverage all competencies and provide the necessary capacity. UNESCO has already laid out some clear principles as a global community for us to follow and to bridge the gaps.
The question of whether we should come up with a set of coordination attributes for commons was also raised. Just as we have FAIR and CARE, can we apply similar principles to a commons? We need to emphasise the core values of research commons, and once that message is expressed coherently and consistently, it can trickle down and facilitate coordination.
The afternoon kicked off with the Interoperability session, chaired by Carole Goble. The five lightning talks presented specific aspects of interoperability addressed by EOSC Future funded projects and other initiatives. The lightning talks were:
- "Harmonisation and alignment between vocabularies for interoperability", Magdalena Szuflita-Zurawska, (Gdansk University of Technology)
- "How FAIR DO's Enable Interoperability, a position from the RDA FAIR Digital Object Fabric IG", Dr Rainer Stotzka, (Karlsruhe Institute of Technology)
- "Interoperability and FAIR in disciplinary repositories using RDAs TRUST principles, based on the RDA grant to Repopsi", Aleksandra Lazic, (University of Belgrade)
- "Vocabularies for cross-domain interoperability", Chris Schubert, (Vienna University of Technology Library)
- “Telling stories of convergence—on shared interoperability practices across infrastructures and global data communities in the life sciences”, Wolmar Nyberg Åkerström, (ELIXIR Sweden / NBIS - National Bioinformatics Infrastructure Sweden)
The Q&A session covered numerous issues, with questions from both on-site and remote participants. As an example of a specific question, the relationship between FAIR Digital Objects and micropublications was discussed (with the former being seen as a more general PID + an object attached to it). The best approach to take when resources are limited (“80-20 approach”) was discussed in some detail. In general, this was deemed a challenging issue, and the wide variety of answers illustrated this:
- Build the right kind of team.
- Be transparent about everything and allow the solution to grow.
- Take calculated risks in terms of having to re-implement components or refactor the architecture.
- Choose the right kind of vocabulary that fits the repository.
- Pick concrete issues to solve and look into existing solutions and best practices.
The tensions between the technology push and the involvement of the scientists, and the different levels of interoperability (from syntax all the way to legal issues), were discussed. Language choices also influence interoperability. For example, the letter and interpretation of laws differ between countries - an interoperability issue that can be difficult to detect until translation efforts are undertaken. Reforming the whole of science to facilitate accurate translations between languages and disciplines was seen as the theoretical solution. However, allowing free choice of user-defined keywords to provide some redundancy would most likely be a more pragmatic and feasible solution.
The Discoverability session that followed was chaired by Louise Bezuidenhout and consisted of three short presentations:
- "Crosswalking to Schema.org", Leyla Jaeol Castro, (ZB Information Centre)
- "Enhancing Generic Data Descriptors with Discipline Specific Metadata", Dr Vaidas Morkevius (Lithuanian Data Archive)
- "European Research Infrastructure solutions", Daan Broeder (CLARIN)
The discussion covered combining metadata schemas, which was deemed possible - as long as it was possible to determine which index to use in specific cases based on the purpose. The practice is relatively common in life sciences. The availability of open source tools for metadata management was deemed a question that is not yet fully resolved, but the SSHOC portal was mentioned as one of the potential starting points. Also, the RDA metadata registry contains conversions between metadata standards. FAIRCORE4EOSC is also planning to create a registry of crosswalks. In closing, it was noted that the responsibility for the metadata should be shared between researchers and infrastructure providers - similar to submitting bibliographical data when submitting a publication. Essentially this would normalise (rather than incentivise) the provision of metadata.
Pathways to Community Engagement
The final session of the day was the panel “Pathways to Community Engagement, Showcasing the RDA/EOSC Future Domain Ambassadors and RDA's work on Engaging Communities in EOSC”, chaired by Najla Rettberg. The panel consisted of the following Domain Ambassadors:
- Dr. Sofie Meeus, (Meise Botanic Garden)
- Dr. Lina Sitz, (LEDataS)
- Dr. Helene N. Andreassen, (UiT The Arctic University of Norway)
- Dr. Marek Cebecauer, (J. Heyrovsky Institute of Physical Chemistry of the Czech Academy of Sciences)
- Prof. Laura Morales, (Fondation Nationale des Sciences Politiques)
- Beth Knazook (Digital Repository of Ireland)
The panel discussed barriers to engagement. Beth Knazook cited her research activities in the EOSC Future project, where the issue was that the Commons (EOSC, in this particular case) was not fully understood. Questions about the mechanisms to access the services and the added value of the services themselves were common among the respondents. The shared infrastructure and common tools used in “Big Data” collaborations have made adopting FAIR approaches mainstream in their communities. The “Small Data” collaborations lack these constraints and incentives. Sofie Meeus reported encountering a lack of awareness of EOSC when discussing its nature among her colleagues. Misconceptions about services provided and the (perceived and real) European focus created additional challenges. Helene N. Andreassen emphasised that linguistics has a lot of variations in terms of tools and approaches, requiring an approach tailored to specific audiences or sub-disciplines.
Helene also noted that the lack of time and ingrained habits of the researchers that are not EOSC-related make it difficult to engage with them. If there is a solution that works, it is hard to convince researchers to invest the time and effort to look into alternatives. Laura Morales echoed these observations: open science is well-known and appreciated. However, EOSC is relatively unknown and seen as very complex - except for users of research infrastructures that are already linked to EOSC. Marek Cebecauer suggested adding trust as a crucial aspect - the trust exists within a field, but broadening this beyond discipline or outside Europe would require a more hub-like approach. A fundamental question from the audience that triggered lively discussion was whether scientists should understand EOSC. The perceived complexity of the system was noted as a major barrier on several occasions.
The role of the ambassadors was seen as crucial in picking up the relevant resources and services for a particular researcher - “EOSC will be different things for different people”. However, to make the engagement strategy scalable and less dependent on the ambassadors, the interfaces and authentication mechanisms need to be simplified. It was also noted that EOSC should be a way to showcase research and results and make them available to a much broader audience. Spreading the word about this aspect of engagement is also crucial. Cross-disciplinary aspects of ambassadorship were touched upon, with cross-pollination across disciplines seen as bringing considerable benefits.
The final question presented to the panel was: what would be the first low-effort step for using EOSC? The role of the ambassadors - or other first movers - is crucial. Demonstrating that EOSC contains relevant, useful resources for the community requires evangelism. However, Laura Morales recommended that anyone interested should invest some time in browsing the EOSC portal and catalogue to get a sense of what kind of resources are available.
The challenging task of wrapping up the themes and discussions fell to Karen Payne. Some of the key things she raised were:
- Hilary was prescient this morning in her opening remarks - the need for regional interaction remains a key area for Global research commons.
- Hans Pfeiffenberger’s presentation set out very clearly the challenge that the tech giants pose to the research community.
- "May all your problems be technical." was one of the better quotes (seen in Patricia Herterich’s tweet).
- The glue to a successful Commons is interoperability. However, to achieve full interoperability, all science should be reformed.
- Discoverability presents different challenges in general and community-specific contexts. Engaging researchers in building interoperability (e.g. by improving metadata) would also require reforming science itself!
Finally, a personal example is very powerful in engaging different communities to build the global research commons. It was referred to as a low-effort step, but I think this is the very essence of how we can build our community. It isn’t low effort - it is a commitment of time, which is an act of love and respect!
Pictured above: a group of EOSC Future awardees and the support team members participating in the event.
Report drafted by: Matti Heikkurinen, Karen Payne and Najla Rettberg