

1. Working Group Context

One of the major challenges of data-driven research is to facilitate knowledge discovery by assisting humans and machines in their discovery of, access to, and integration and analysis of data and their associated research objects, e.g., algorithms, software, and workflows. To address this, an initial effort to define a "DATA FAIRPORT" began in 2014 at the Lorentz workshop and transitioned into developing a set of FAIR Data Guiding Principles in 2016. The FAIR data principles strongly contribute to addressing this challenge with regard to research data. The principles, at a high level, are intended to apply to all research objects, both those used in research and those that form the outputs of research. Here we focus on the adaptation and adoption of the FAIR principles for VREs (Virtual Research Environments).


Digital objects such as data, software and workflows cannot be made FAIR in isolation - digital infrastructure is needed to store, manage, analyse and share the digital objects, and to make them discoverable. 


Virtual research environments (VREs), also called science gateways, research platforms or virtual labs, are increasingly used as the vehicle for collecting or generating digital objects, processing, analysing, annotating and visualising these, then sharing the research outputs. VREs are defined here as a broad category of digital research infrastructure that consists of a set of online services, often with associated integration and/or orchestration functions and connections to specific data, software, workflow, storage and compute resources. One of the main drivers for creating VREs is to lower or remove the barriers researchers face in accessing datasets, performing complex analyses, and sharing their workflows to encourage reproducibility.


How infrastructure such as a VRE is developed, and the functions it supports, therefore have a large impact on the FAIRness of digital objects themselves. VREs should enable FAIRness in the digital objects that they create or produce, and at the very least should not make digital objects that they process less FAIR. 


VREs themselves should also be FAIR, in that they should be easily discoverable and accessible; should interoperate with other digital research infrastructures; and their technical architecture, components and services should be reusable to improve development efficiency. 


While some of the high-level FAIR principles as applied to data can be directly applied to VREs, others are not applicable. Likewise, the recently developed FAIR principles for Research Software do not cover all of the aspects of VREs. The application of the FAIR principles is now also being considered for other areas, including workflows, machine learning, artificial intelligence (AI), and skills and training.


The FAIR4VREs WG will enable coordination between existing communities working with VREs, science gateways, platforms and virtual labs, to define what it means for VREs to be and enable FAIR, and provide guidance to VRE developers in achieving this.


The working group will:

  1. Investigate how the existing applications of the FAIR principles to data, software, workflows, computational notebooks, training materials, AI and machine learning can help VREs to enable FAIR digital objects and to be FAIR themselves, and identify any gaps in the existing work.

  2. Produce guidance on and examples of how VREs can and should be FAIR.

  3. Produce guidance on and examples of how VREs can and should enable FAIRness for other digital objects.


As the working group explores the FAIR ecosystem with respect to VREs, the exact format of the output(s) will be defined. Rather than producing a set of principles specifically for VREs, the output(s) will be another connecting piece in the FAIR toolkit between principles and implementation as they apply to VREs.


2. Value Proposition

The FAIR4VREs WG will identify how VREs can and should be FAIR and also enable FAIRness for other digital objects, and produce guidance for VRE developers on making their VRE both FAIR and FAIR-enabling.


This work will be of value to digital infrastructure developers and managers; researchers and other users of research data; research software engineers and researchers who develop tools; repository managers; policymakers who are responsible for defining digital policies; funders of research; and others with an interest in the FAIR principles for all research outputs.


3. Engagement with existing work in the area

Activities outside RDA

There are specific activities that are underway through the Science Gateways Community Institute (SGCI, US), the Australian Research Data Commons (ARDC, AU), and the European Open Science Cloud (EOSC, EU). All three of these organisations are working through the implications of how to apply the FAIR principles to a range of digital objects.


In Australia, the ARDC has developed, or is developing, guidelines relating to the FAIRness of:

  • Data

  • Software

  • Services

  • Platforms (equivalent to VREs).

In the US, SGCI is developing guidelines relating to the FAIRness of: 

  • Science Gateways (equivalent to VREs)

  • Research Infrastructures

In the case of software, the principles developed by the FAIR4RS WG will be adopted by the ARDC, as will the adoption guidelines and implementation plan being developed.


In the case of services, the relevance of the work undertaken by the FAIRsFAIR project (M2.10 Report on basic framework on FAIRness of services) will be considered as part of the development of the outputs of this WG.


In the case of VREs, the ARDC and SGCI propose to lead international guideline development, building on the successful approach taken by the FAIR4RS WG.


FAIRness of research infrastructures combines aspects of FAIRsFAIR project and the SGCI Tech Summit (


RDA groups

The FAIR4VREs WG will also engage with a range of other RDA WGs whose interests overlap, and ensure that its work aligns with the work of those groups. There is a commitment to engage with co-chairs of relevant RDA groups, and members of the following groups are invited to regularly check our public updates via the mailing list archive.

  • VRE IG seeks to build the required technical bridges, skills and social communities that enable global sharing and processing of data across technologies, disciplines and countries through the creation of shared online virtual environments. As these individual VREs grow, inevitably they need to also connect with other major research infrastructures. This FAIR4VREs WG is sponsored by the VRE IG.

  • FAIR4RS WG is enabling coordination of a range of existing community-led discussions on how to define and effectively apply FAIR principles to research software, to achieve adoption of these principles. FAIR4VREs will incorporate the outputs of FAIR4RS. 

  • FAIR data maturity model WG - working on RDA recommendations for a common set of core assessment criteria for FAIRness and a generic and expandable self-assessment model for measuring the maturity level of a dataset.

  • CURE FAIR WG - focusing on the reproducibility aspect of FAIR for data and code.

Other potential groups to coordinate with include: Go FAIR IG, Exposing Data Management Plans WG, Active Data Management Plans IG, Research Funders and Stakeholders on Open Research Data Management Policies and Practices IG, Research Metadata Schemas WG, FAIR in AI BOF (exploring an IG), and Global Open Research Commons IG.


4. Work Plan

The FAIR4VREs WG will identify how VREs can and should be FAIR and also enable FAIRness for other digital objects by investigating how the existing applications of the FAIR principles to digital objects translate to VREs, and identifying any gaps in the existing work.


Guidance will then be developed for VRE developers on making their VRE both FAIR and FAIR-enabling. 


Note that while the group is interested in FAIR metrics and indicators for VREs, these are not in scope for this WG. We suggest they could be an activity for a follow-on group. We would also like to emphasise that, with co-chairs defined early to coordinate the work, we believe the deliverables are attainable within 18 months of endorsement.


4.1 Milestones and Deliverables

| Milestone and description of work | Deliverable | Due date |
| --- | --- | --- |
| Define scope of work. Identify issues in the application of FAIR to VREs based on preliminary analysis of existing definitions and frameworks. The scope would identify commonalities and differences, and thus key questions for the community to engage with. | A review document outlining the issues that need to be addressed in defining FAIR for VREs | 0-4 months |
| Initiate consultation with the community (including identification of who is in this community, engagement with co-chairs of relevant RDA groups). | | 0-4 months |
| Finalise work plan for identifying any gaps in the existing FAIR work, including identification of use cases, and community consultation. | Revised work plan, collection of use cases and strategies for community consultation | +4 months |
| Finalise analysis of gaps in existing FAIR principles for VREs. | Document identifying how VREs can and should be FAIR, and enable FAIRness for other digital objects | +9 months |
| Crowdsource case studies on how VREs contribute to FAIRness of other digital objects and examples of guidance materials. | Collection of case studies on how to achieve a FAIR and FAIR-enabling VRE | +12 months |
| Review work plan to determine if further guidance materials for VRE developers are required, and create as needed. | Updated work plan | +12 months |
| Draft adoption plan, and coordinate community activities to collect adoption examples and test guidance materials. | Plan for adoption of guidance, including identification of adoption examples | +12 months |
| Presentation of Recommendation at Plenary. | Presentation summarising application of the FAIR principles for digital research objects to VRE development and use, including implemented examples | +18 months |


4.2 Working Group Operations

In addition to meeting in-person or virtually at Plenaries, we will have two or more formal calls in between the Plenaries and share information via a mailing list. 


Documents will be created and made public through Google Docs and GitHub. This allows for collaborative work and also serves as a form of communication. Those individuals actively working on outputs will have ad-hoc virtual meetings as needed (at least monthly). Trello and GitHub will be used for planning and tracking group deliverables.


4.3 Addressing Consensus and Conflicts

The WG will adhere to the stated RDA Code of Conduct and will work towards consensus, which will be achieved primarily through mailing list discussions and online meetings, where opposing views will be openly discussed and debated amongst members of the group. If consensus cannot be achieved in this manner, the group co-chairs will make the final decision on how to proceed. 


The co-chairs will keep the working group on track by reviewing progress relative to the deliverables. Any new ideas about deliverables or work that the co-chairs deem to be outside the scope of the WG defined here will be referred back to the VRE IG to determine if a new WG should be formed.


4.4 Community Engagement

The FAIR4VREs WG will provide a range of ways for community members to engage at any of the three levels:

  • Co-chairs. The Co-chairs are responsible for leadership of the WG.

  • Working and feedback cohort. Community members can choose to engage with the WG by providing feedback at their preferred pace via the WG RDA space, subgroup activities and the WG GitHub repo.

  • Advocates. Those who can play a key role in endorsing and promoting the outcomes of this group.

All community members will receive regular updates through the RDA email list. The email list will facilitate collaborations through invitations to webinars, collaborative documents, surveys, etc. The WG will organise dissemination about the activities and findings and gather community feedback regularly during all the phases of the work. All documentation produced by the group will be publicly accessible via collaborative documents.


5. Adoption Plan

The WG will create an adoption plan for distributing and maintaining the deliverables. A specific plan will be developed to facilitate adoption of the WG Recommendation within the organisations and programs represented by WG members. This will include strategies for adoption more broadly within the global VRE community.


The WG will aim to recruit other potential adopting organisations early in the development process and ensure that their perspectives inform the guidance. Adoptions would ideally start within the 18-month timeframe, before the WG is complete.


6. Initial Membership

The initial membership of this group will be drawn from the RDA VRE IG. Communications around the formation of this WG will continue to promote membership to the wider community. Active participation as a co-chair or working group member will be encouraged as the work plan is further refined.


The initial co-chairs of the RDA FAIR4VREs WG are listed below.


First name(s)

Last name(s)







Australian Research Data Commons





SGCI, Discovery Partner Institute








Sánchez Mondragón




Leyla Jael 

G. Castro


ZB MED – Information Centre for Life Sciences



Contributions to the Case Statement

Kerry Levett, Andrew Treloar, Sandra Gesing, Michelle Barker.



DTL. “Data FAIRPORT”. 2014. Available at

Koers, Hylke; Herterich, Patricia; Hooft, Rob; Gruenpeter, Morane; Aalto, Tero. 2019. “M2.10 Report on basic framework on FAIRness of services” from Fostering FAIR Data Practices in Europe project (FAIRsFAIR). 10.5281/zenodo.4292599


Kok, R. 2014. “Data Fairport: enabling global exchange of research data,” International Innovation, no. 131, pp. 98–101, 25 Feb. Available at


Lamprecht, A.L., Garcia, L., Kuzak, M., et al. Towards FAIR Principles for Research Software. Data Science 3 (2020) 37–59. IOS Press.


Wilkinson, M., Dumontier, M., Aalbersberg, I. et al. The FAIR Guiding Principles for scientific data management and stewardship. Sci Data 3, 160018 (2016).

Review period start:
Wednesday, 11 August, 2021 to Saturday, 11 September, 2021

Case Statement: National PID Strategies Working Group


WG Charter

The existing RDA WGs and IGs linked to PIDs tend to focus on technical challenges, updates from specific PID providers on their activities and the state of the art, or on discipline-specific needs or challenges. The National PID Strategies WG would explore how PIDs form part of national policy implementation frameworks. There are systemic and network benefits from widespread and consistent PID adoption, and funders, government agencies, and national research communities have created PID consortia or policies (including mandates) in pursuit of these benefits.


At the RDA Plenary 17 a Birds of a Feather session examined six case studies of national PID strategies and frameworks, looked at commonalities and divergences between them, and assessed the potential benefit of collaboration and alignment in the development and implementation of future national initiatives.


The consensus from this BoF was that a National PID Strategies WG should be formed with the objective of mapping common activities across national agencies/efforts and reporting on the specific PIDs adopted in the context of national PID strategies.


Commonalities already exist across the example case studies, such as a national PID policy, a coordinating network or group developing roadmaps and policies, similar PIDs being prioritised in national infrastructures, and ORCID/DataCite consortia being common. These will form the basis for discussion within the WG, and input from other countries will be sought.


The WG will enable coordination and community discussion to deliver the following:


  • Coordination and alignment of different national PID strategies, bringing together PID experts to support the group
  • A mapping of common activities across national agencies/efforts and a report on the specific PIDs adopted in the context of national PID strategies
  • Agreed PID categories and defined common metadata and standards for PIDs
  • A minimal set of PIDs for international interoperability
  • Example ideas on governance and common workflows
  • A summary of the benefits of having a national PID strategy and adopting priority PIDs, and of the investment requirements.


Having a WG looking at national PID strategies would provide an opportunity to promote international PID systems rather than isolated national systems, avoid replication of PID development, and exchange experience of national-level PID coordination and map needs.


Value Proposition

The value of PIDs and the underlying metadata associated with PIDs has been recognised across the globe. A number of countries have developed roadmaps, workflows and strategies to encourage integration and adoption of some of the key PIDs (priority PIDs in some countries) to support open access and open research. This WG will coordinate and align different national PID strategies and bring together PID experts to support the group. The work will be of interest to researchers, research managers, research information managers, research administrators, publishers, PID providers, funders, policy makers, repository managers and vendors, and institutions.


The WG will provide various ways for the community members to engage. Everyone who signs up to the group will receive regular updates through the RDA mailing list. These will encourage collaboration and participation in online discussions, group meetings, collaborative documents, etc. All documentation produced by the group will be publicly accessible and open for feedback.


Engagement with existing work in the area

The National PID Strategies WG emerged from the BoF session at the RDA Virtual Plenary 17. The session highlighted case studies from the UK, Netherlands, Finland, Canada, Australia and South America (Brazil and Peru). This WG will bring together representatives from these countries who are currently working on developing and implementing national PID strategies. It also provides the opportunity to encourage contributions from other countries, and the outputs will benefit those countries that have yet to develop, or are planning to develop, their own national strategies. The organisations represented from the above countries include Jisc, ARDC, Research Data Canada, CSC, and SURF. As part of our PID work we engage with organisations such as ORCID, DataCite, Crossref, ROR, British Library, UKRI, EOSC, FORCE11, STM Association, Knowledge Exchange (and others identified in the initial planning phase) and would encourage their participation in the WG.


This WG will be relevant to the existing Persistent Identifiers Interest Group (PID IG), where we have previously shared updates, but will be focused on specific challenges in the national context: multi-disciplinary, policy-driven PID engagement and integration, for example. As such, it will complement the PID IG’s blend of project and provider updates and cross-PID community thematic discussions with an emphasis on specific implementation and coordination challenges.


Other RDA groups to coordinate with include Global Open Research Commons (GORC) IG, SHAring Rewards and Credit (SHARC) IG, PID IG, Persistent Identification of Instruments WG, FAIRsharing Registry: Connecting data policies, standards and databases RDA WG, RDA/WDS Scholarly Link Exchange (Scholix) WG, Research Funders and Stakeholders on Open Research and Data Management Policies and Practices IG, Data policy standardisation and implementation IG.


Engagement Plan

The National PID Strategies WG aims to engage with a number of parties, including PID providers, publishers, PID infrastructure providers, funding agencies, HE institutions and sector bodies.


The National PID Strategies WG plans to organise a WG session at each RDA Plenary during the WG’s lifetime. It is hoped that the first session will be at the RDA Virtual Plenary 18, when it is likely that the WG status will be pending. The plenary sessions will be the main face-to-face meetings in which to provide updates, make decisions and discuss the actions over the six-month period. The WG will also present its progress in the RDA PID IG session at each plenary, if accepted. Individuals may also use regular conferences to promote the group and encourage participation.


Work Plan



The National PID Strategies WG will first map common activities across national agencies/efforts and deliver a report on the specific PIDs adopted in the context of national PID strategies. It will seek community input to agree on PID categories and define common metadata and standards for PIDs, define a minimal set of PIDs for international interoperability, and provide example ideas on governance and common workflows. The report produced is a recommendation that can be adopted or adapted by other countries looking to develop their own national PID strategies. Following the recommendations will encourage standardisation internationally.



Assuming the WG is endorsed in time for the RDA Virtual Plenary 18, the WG will kick off at this event. This will allow us to encourage participation in the WG and build consensus on its core objectives. Each subsequent Plenary during the WG’s lifetime (M1-M18) will be a milestone, as we will provide updates on the WG’s progress and review the plan.


0-6 months - map common activities across national agencies/efforts and deliver a report on the specific PIDs adopted in the context of national PID strategies.


0-12 months - seek community input to agree on PID categories and define common metadata and standards for PIDs, define a minimal set of PIDs for international interoperability, and provide example ideas on governance and common workflows.


12-18 months - complete community consultation and finalise agreement on report, national case studies, PID categories, common metadata and standards for PIDs, and minimal set of PIDs.



The National PID Strategies WG will meet virtually or physically (depending on restrictions) at RDA Plenary meetings (P18, P19, P20; or P19, P20, P21 if not endorsed in time for P18) during its 18-month lifetime. These meetings will serve as a platform to present and discuss progress, address and resolve open issues, and plan the following six-month phase.


The WG will also hold monthly calls for the whole group. These will allow the group members to check on progress against the plan, update the community and obtain input and feedback.


Consensus and Conflicts

The National PID Strategies WG plans to develop consensus by encouraging and ensuring participation in online monthly meetings, as well as the larger plenary meetings. Input will be provided by the initial six case studies, which will be used to define PID categories and common metadata and standards for PIDs. Through community consultation the WG will ensure that the work is not restricted to these countries but is open to as wide and diverse a group as possible.


If consensus cannot be reached, the WG will vote on issues and adopt a weighted multi-vote approach whereby each member can cast up to three votes, ranked from most to least preferred.
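
As an illustration only, a weighted multi-vote of this kind could be tallied as in the minimal Python sketch below. The weights (3, 2 and 1 for first, second and third preference), the rule that a ballot may not list the same option twice, and the function name `tally_weighted_votes` are assumptions made for the example; the case statement does not specify them.

```python
from collections import Counter

# Assumed weights for first, second and third preference.
# The case statement does not define the actual weighting.
RANK_WEIGHTS = (3, 2, 1)


def tally_weighted_votes(ballots):
    """Return (option, score) pairs ordered by total weighted score.

    Each ballot is a list of up to three options, ordered from most
    to least preferred. Disallowing repeated options on one ballot is
    an assumption of this sketch, not a rule from the case statement.
    """
    scores = Counter()
    for ballot in ballots:
        if len(ballot) > len(RANK_WEIGHTS):
            raise ValueError("each member may cast at most three votes")
        if len(set(ballot)) != len(ballot):
            raise ValueError("a ballot may not list the same option twice")
        # Pair each ranked option with the weight for its position.
        for weight, option in zip(RANK_WEIGHTS, ballot):
            scores[option] += weight
    return scores.most_common()


# Hypothetical ballots from four members choosing between options A, B and C.
ballots = [["A", "B", "C"], ["B", "A"], ["C"], ["B", "C", "A"]]
print(tally_weighted_votes(ballots)[0])  # ('B', 8)
```

How ties are broken is left open here; the group would need to agree on that separately.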


The co-chairs are committed to keeping the development on track and within scope. We will also undertake regular retrospectives where we assess progress against milestones and review the plan for the next phase.


Community Engagement

The working group case statement will be disseminated to RDA mailing lists, relevant PID communities and stakeholders, PID providers, funders and publishers to ensure a diverse, international and multidisciplinary membership. As well as engaging via the RDA, there are other groups and events where the PID community come together. Members of this working group are active within these and well placed to ensure the activities of this working group are brought to the attention of an international community, as well as ensuring input into the working group comes from these multiple sources. Engagement will not be focussed solely on the global north: we have already included strategies from South American countries, and we will ensure there is engagement and input from other countries in the global south.


Group members are already involved in events such as PIDapalooza, sit on community steering groups for PID providers and engage with international PID-related work. We will work to ensure the membership of the group is increased to include as many members of the PID community as possible, as well as those who may not yet have participated in such activities and groups. We will engage with EOSC, in particular the PID policy work, to ensure the working group’s activities are aligned with the European PID landscape.


Adoption Plan

The primary deliverable from the working group is the report produced from converging multiple national PID strategies that can be adopted or adapted by other countries looking to develop their own national PID strategies. The working group will actively engage with PID providers, funders, publishers and relevant groups throughout the lifetime of the group in order to communicate progress and obtain feedback.


As well as encouraging countries to adopt national strategies for PIDs, part of the group’s work is to increase the adoption and use of priority PIDs. This group will identify commonalities between existing strategies and those PIDs identified as a priority, but these need to be adopted internationally. PIDs have a vital role to play in the transformation of the research communication system. However, the challenges of achieving consistent, reliable PID adoption, integration and coverage are substantial. By having a set of priority PIDs and implementing national strategies, other countries can target solutions to research challenges, and ensure that the potential benefits of PID usage can be delivered and demonstrated to the research sector. This will bring savings to the sector as identified by the recent cost-benefit analysis report from Jisc.


Initial Membership

During the National PID Strategies BoF at the RDA Plenary 17, 41 people signed the shared document asking if they would like to be kept informed about future discussions, plans and any proposal for an Interest/Working Group. Of these, 9 people indicated that they would be interested in participating in an Interest/Working Group. Communication about the formation of this Working Group will be shared with the RDA and the wider community.


First Name(s)






























Parland-von Essen









The British Library











Vials Moore











Gül Akcaova SURF Netherlands


Working Group Chairs

The co-chairs are Natasha Simons (ARDC, Australia) and Christopher Brown (Jisc, UK).


Review period start:
Monday, 9 August, 2021 to Thursday, 9 September, 2021

RDA Metadata Interest Group Charter Statement (200 words) Keith G Jeffery 20130729


Due to confusion over the definitions/roles of WGs and IGs we have both a WG and an IG with essentially the same remit – metadata standards / directory of metadata schemas.


After discussions, it is proposed that the Metadata IG be wide-ranging and long-standing, acting as an umbrella that coordinates the WGs concerned with metadata (Metadata Standards, Contextual Metadata, likely WGs on particular subject domains such as agriculture, marine…), which are time-limited and task-focused.


The Metadata IG will concern itself with all aspects of metadata for research data.  In particular it will attempt to coordinate the efforts of the WGs concerned with metadata to produce a coherent approach to metadata covering metadata modalities of description, restriction, navigation, provenance, preservation, and the use of metadata for the purposes of discovery, contextualisation, validation, analytical processing, simulation, visualisation and interoperation.  It will also liaise with the other WGs especially Data Foundation and Terminology, PIDs, Standardisation of data categories and codes and Data Citation.  This IG activity relates to data management policies and plans of research organisations and researchers, and to policies and standards of research funders and of research communities which may or may not be official standards.

The metadata IG will organise itself through online meetings and face-to-face meetings of members of the IG present at RDA Plenary events.  It is proposed that – while membership is open to any RDA registered member – key members will be the leaders of the WGs concerned with metadata.  In order to get the renovated IG working, I volunteer to initiate this activity but would expect elections and handover to someone else after an initial period.


Review period start:
Sunday, 8 August, 2021

The Interest Group on Agricultural Data (IGAD) came to life in Gothenburg (Göteborg, Sweden) in 2013 at the beginning of the Research Data Alliance (RDA), and has since grown to include over 260 registered members, from across continents. IGAD is a domain-oriented interest group working on all issues related to food and agricultural data. It represents stakeholders in managing data for food and agricultural research and innovation, including producing, aggregating and consuming data.


Beyond this, IGAD promotes good practices and RDA Recommendations in the research domain, including data sharing policies, data management plans, and data interoperability. As a forum for sharing experiences and providing visibility to research and work in food and agricultural data, IGAD has become a space for networking and blending ideas related to data management and interoperability. It also provides fertile ground to reach out and promote projects among other international organizations and institutions working in food and agricultural research and innovation. On a logistical level, one of IGAD’s chief roles is to serve as a platform that leads to the creation of domain-specific Working Groups.


For more information on why IGAD should transition to become the “Improving Global Agricultural Data” Community of Practice, view the “Objectives” section of our proposal. Benefits and alignment with RDA goals are described below under the Value Proposition, in section 2, “User scenario(s) or use case(s) the CoP wishes to address”.


Review period start:
Thursday, 24 June, 2021 to Friday, 6 August, 2021

The Data Versioning WG has decided to transition to an Interest Group. The proposed Charter, which underwent community review, can be found here. The final version of the Charter, which was updated after the community review, can be found here.


For more background information on the Data Versioning WG, please refer to the comment by Jens Klump on this page.

Review period start:
Friday, 7 May, 2021 to Monday, 7 June, 2021


Name of Proposed Interest Group: Sensitive Data Interest Group

RDA site:


1. Introduction (A brief articulation of what issues the IG will address, how this IG is aligned with the RDA mission, and how this IG would be a value-added contribution to the RDA community)


Sensitive Data: A working definition of sensitive data is: Information that is regulated by law due to possible risk for plants, animals, individuals and/or communities and for public and private organisations. Sensitive personal data include information related to racial or ethnic origin, political opinions, religious or philosophical beliefs, trade union membership and data concerning the health or sex life of an individual. These are data that could be identifiable and potentially cause harm through their disclosure. For local and government authorities, sensitive data is related to security (political, diplomatic, military data, biohazard concerns, etc.), environmental risks (nuclear or other sensitive installations, for example) or environmental preservation (habitats, protected fauna or flora, in particular). The sensitive data of a private body concerns in particular strategic elements or elements likely to jeopardise its competitiveness.
Adapted from: David et al., 2020, “Templates for FAIRness evaluation criteria - RDA-SHARC IG”


A range of disciplines collect data which are potentially sensitive, presenting serious barriers to reuse and reproducibility. There are a number of barriers which need to be overcome before sensitive data can be utilised safely and to its best advantage. One major challenge is that not all sensitive data is alike, with significant disciplinary variation in how sensitive data is defined, linked, managed, stored, and reused. Additionally, common approaches to working with, sharing and managing data are not always appropriate for sensitive data. For example, sensitive data exposes the different perspectives underlying the FAIR and CARE principles. Further, sensitive data requires careful stewarding such that it can be disseminated in an ethically and culturally appropriate way. Nonetheless, sensitive data has significant potential to be utilised in the conduct of novel and impactful work. Therefore, it is essential that a set of community standards and best practices be developed for sensitive data usage and management.


Issues the IG will address

In addition to issues identified by the RDA community as this IG develops, we envisage this IG will address the following issues:

  1. Data carries with it different levels of sensitivity depending on its context (e.g., research discipline, who the data is about, what the data is being used for). However, it is not always clear how we should assess data for sensitivities in different contexts. A resource is needed for those working with data to allow them to make informed decisions about data sensitivity and, consequently, data governance, management, and usage.
  2. Sensitive data is often de-identified. However, re-identification can be possible and can cause serious harm. Resources are needed on mechanisms of re-identification and the different risks for different types of sensitive data.
  3. Data that has been labeled sensitive is often not shared beyond the team that collected or created it. This means that data collection is sometimes duplicated, which is a challenge for reproducible research. More ethically and culturally safe sharing of sensitive data may also enhance the robustness of research design and development. Resources are needed that tell those working with sensitive data how it can be shared and reused in a safe and ethical manner.
  4. At times there is a duality between sharing and reusing data in general and stewarding data in culturally and ethically appropriate ways. This duality is exacerbated in the context of sensitive data due to lower rates of data sharing and increased potential for harm. Guidelines are needed for balancing principles of data sharing and reuse (e.g., FAIR) with ethically and culturally appropriate principles (e.g., CARE) specifically in the context of sensitive data.
  5. Consent is a major consideration when sharing any data, especially sensitive data. However, informed consent can be challenging to obtain, especially when reusing data. This is sometimes a barrier to sharing sensitive data. Guidelines are needed that explore consent models, especially post-hoc consent, for governing the primary and secondary use of sensitive data. 


How this IG is aligned with the RDA mission

The RDA Vision: This IG aligns with the RDA vision because it will develop mechanisms for the responsible reuse of sensitive data, a data source that is extremely valuable but that also carries many ethical and cultural considerations. Sensitive data will play an increasingly significant role in addressing the grand challenges of the 21st century, such as issues of social and environmental justice. Indeed, the benefits and potential harms of sensitive data are increasingly being discussed in public forums as corporations and private companies leverage such data for profit. As mechanisms for sensitive data reuse become widely available (such as through the work of this IG), innovation and invention will be fostered through the reuse of sensitive data. This IG has participants from University and non-University sectors, which strongly positions the IG to engage with the full variety of stakeholders.


The RDA Mission: This IG aligns with the RDA mission as it develops guidelines for the technical components of working with sensitive data, and for addressing the social aspects of working with sensitive data including fostering discussion around the cultural and ethical considerations of data reuse. This IG is well positioned to meet these challenges given the diverse backgrounds of the initial members. The connection between the technical aspects of working with sensitive data (such as secure virtual environments) and the ethical and cultural aspects (such as consent, disciplinary perspectives and norms, and CARE principles) is a key point of interest for this IG.


How this IG would be a value-added contribution to the RDA community

Sensitive Data is ubiquitous. However, its context varies. For this reason, this IG complements the work of a range of existing IGs and WGs, including:




  • Infectious Diseases Community of Practice (forthcoming)



The aim of the Sensitive Data IG is to provide a space to focus explicitly on sensitive data. While the scope is interdisciplinary, this IG focuses on sensitive data types. Our planned activities will complement the above IGs as we address sensitive data in domain-specific terms (e.g., sensitive data in the health domains) as well as in general terms (e.g., systems for sharing sensitive data). The Sensitive Data IG already has members from a number of the above IGs, which will aid us in coordinating our activities with these groups. The Sensitive Data co-chairs are collectively members of over 20 RDA groups.


All members of the Sensitive Data IG are also active members of the RDA community. We will draw on this to ensure that our efforts take account of previous work in the RDA, and to ensure that our group remains up-to-date on RDA activities.


2. User scenario(s) or use case(s) the IG wishes to address
(what triggered the desire for this IG in the first place):


We identified the following key reasons for forming this IG. We envisage that additional use cases will be developed through working with the RDA community following endorsement.

  1. There is a lack of guidelines for working with sensitive data both within and between disciplines/research areas. One reason for this is that sensitive data varies between contexts (e.g., between disciplines). To develop a cohesive but also targeted set of guidelines, a group is needed which comprises members of a range of disciplines with a shared interest in sensitive data.
  2. There is a need for a framework which considers the ethical and cultural aspects of sensitive data, alongside the technical aspects. Individuals may want to share their sensitive data and may have put in place all the necessary ethical/cultural safeguards. However, they may lack an understanding of how this can be achieved with the technical resources available to them, what repository or sharing mechanism can handle such data, and how best to access persistent IDs which allow them to track the use of their data. Conversely, individuals may have the ideal technological solution for sharing without an understanding of the ethical/cultural considerations. A group is needed to facilitate a dialogue between the ethical/cultural and technical aspects of sensitive data sharing, and to produce tangible outputs which progress this discussion.
  3. There is a general consensus that sensitive data is highly valuable but that it is not being utilised to its full potential. While there is a range of anecdotal support for this claim, a body of work is needed which explores and documents the state of sensitive data primary and secondary usage, and which examines the underlying causes of sensitive data reuse practices within and between disciplines.
  4. There is a recognition that there are a number of stakeholders with respect to sensitive data assets, and that each stakeholder has different requirements, needs, expectations, and terminology (e.g., in the case of health data, government, hospitals, researchers, community members). A group is needed which can synthesise the main expectations of different stakeholders to develop resources for individuals and organisations to use when engaging with, sharing, and accessing sensitive data (i.e., a resource for a shared language between stakeholders).
  5. There is a need for adequate and specialised resourcing and infrastructure to manage, work with, and share sensitive data. Different data types require different solutions for management, analysis, and sharing. While a range of solutions are available for these different data types, their suitability for sensitive data is not always clear. Work is required to assess solutions for different sensitive data types specifically.
  6. Our era is experiencing the most brutal collapse in biodiversity that the earth has known. Biodiversity provides many ecosystem services and resources, yet species and habitat diversity is undermined by many human activities. The need to preserve both fragile and overly coveted species and resources makes the publication of their geolocation sensitive. Other data concerning the characteristics of certain pathogens have also proven to be sensitive.
  7. The humanities and social science disciplines likewise require clear guidance regarding collection, use and reuse of sensitive data. This may encompass specific ethical considerations pertaining to data collection (e.g., balancing FAIR v CARE principles), research data collection methods when working with vulnerable individuals or communities often on sensitive topics, the joining of disparate datasets, and considerations of how long such data should be retained, and where.


3. Objectives (A specific set of focus areas for discussion, including use cases that pointed to the need for the IG in the first place. Articulate how this group is different from other current activities inside or outside of RDA.):


  1. Using the definition presented at the top of this document as a starting point, develop a shared understanding and refined definition of sensitive data.
  2. Define various levels of “sensitivity” for data.
  3. Data should be as open as possible and as closed as necessary. Within this context, develop an understanding of how sensitivity relates to openness.
  4. Identify different consent models.
  5. Identify types of sensitive data holdings and resources across various domains.
  6. Identify existing data definitions and standards for different types of sensitive data.
  7. Identify challenges in collecting, using and sharing sensitive data.
  8. Engage with key stakeholders working in the area of sensitive data management/analytics.
  9. Identify existing solutions for sensitive data collection, analysis, storage and dissemination.
  10. Identify differences in how sensitive data is managed between groups and regions.


4. Participation (Address which communities will be involved, what skills or knowledge they should have, and how will you engage these communities. Also address how this group proposes to coordinate its activity with relevant related groups.):

Use these people to help grow the case studies


While the interested participants in this interest group are currently mostly from Australia, we have been working to establish this group as part of a global Community of Practice. We are currently developing a strategy to achieve international engagement.


To further this effort, the group has seen the recent addition of chairs from Europe and the USA to the group. The Social Science Interest Group, which comprises a broad international membership base and chairs from Norway, USA and Australia, also has formal participation in the Sensitive Data interest group.


The next phase of this engagement strategy will be through specific engagement with RDA groups and other stakeholders covering a range of domains and geographic regions. Specific stakeholders to be approached are still to be determined, but will be drawn from the following target groups:

  • RDA Interest Groups: Social Science IG (established), International Indigenous Data Sovereignty IG (Initial approach made, pending response), Ethics and Social Aspects of Data IG, RDA-COVID19 WG (and the various sub-groups), Reproducible Health Data Services WG, Epidemiology common standard for surveillance data reporting WG, Domain Repositories IG, Health Data Interest Group, RDA/NISO Privacy Implications of Research Data Sets IG, Virtual Research Environment IG, Social Dynamics of Data Interoperability IG
  • Communities outside of RDA: Relevant domain and discipline communities, e.g., the SSHOC and EOSC work programs around sensitive data, US and Canadian networks of Research Data Centres, International and Regional Statistical Agencies (WHO, UNStat, Eurostat, National Statistical Offices), (HEALTH DATA EXAMPLE COMMUNITIES??)?



5. Outcomes (Discuss what the IG intends to accomplish. Include examples of WG topics or supporting IG-level outputs that might lead to WGs later on.):


  1. To identify the key expectations of the community and use these to refine the IG's objectives.
  2. List different types of data across disciplines such as health, social sciences, etc., and how different levels of sensitivity apply to those types of data.
  3. Identify best practices in sensitive data management across multiple regions, domains and disciplines and how to adapt the best practices.
  4. Engage with relevant RDA IGs, WGs and CoPs to identify priorities in the area of sensitive data management.
  5. Gather common guidelines and recommendations for working with sensitive data in different disciplines and in different regions.
  6. Catalogue the ethical, philosophical and cultural principles that underpin the use of sensitive data assets.


6. Mechanism (Describe how often your group will meet and how will you maintain momentum between Plenaries.):


The IG will meet every 3 - 4 weeks via Zoom. Meeting times will be alternated to accommodate as many time zones as possible. Google Docs will be used to develop shared documentation. Email will be used to communicate about meetings and tasks requiring follow-up between meetings. The current chairs/members of the IG are already successfully using this system to meet and maintain momentum.


The IG will also meet regularly at Plenaries as an opportunity to workshop new ideas with the RDA community and foster new engagements. The group will also establish an informal communication channel through Slack, or a similar platform, to allow for ongoing conversation. The group will also organise webinars and information sessions between Plenaries to share ideas and for group members to stay in touch with the activities of the group. The IG will also use our RDA page to share documents and communicate regularly with the RDA community.


7. Timeline (Describe draft milestones and goals for the first 12 months.):


Initial activities: The group met for the first time as a Birds of a Feather session at RDA 16. Following this, a core group of interested members met to begin drafting the group charter. This group also submitted a proposal for an IG session for RDA 17. The group will send the draft charter for initial TAB review and community consultation in the lead up to RDA 17. The draft charter will also be sent for feedback specifically to members who have joined the IG page and who attended the BoF session at RDA 16. The draft charter and TAB/community/group feedback will be discussed at the RDA 17 session. Following this, the revised charter will be submitted for formal endorsement.


First 12 months: Once the IG is formally endorsed, we will undertake the following activities in the first 12 months:

  1. Formally launch the IG - update our RDA IG site, call for additional co-chairs, share the approved charter with group members, establish a regular meeting time, establish RDA mailing list for the IG.
  2. Engage in group consultation to identify the main themes of interest and develop a strategy for establishing working groups/task forces to address these.
  3. Engage with stakeholders for feedback on key sensitive data issues and to develop the IG's networks within and outside of RDA.
  4. Invite existing RDA IGs identified in section 4 above to provide feedback on, and participate in, working group/task force themes.
  5. Present a webinar/workshop to discuss working group/task force topics, and open these topics for group comment through interactive platforms like Google Docs.
  6. Formalise the working groups/task forces, share the goals of the working groups/task forces with the group and RDA more broadly to increase participation, prepare for RDA18 as an opportunity to share progress of the IG and working groups/task forces.
  7. Prepare reports and outputs from the working groups/task forces, share reports with the community, present a webinar/workshop to share the outputs with the community.
  8. Hold an IG meeting to assess the progress from the preceding 12 months and determine the next steps for working groups/task forces.



8. Potential Group Members (Include proposed chairs/initial leadership and all members who have expressed interest.)


  • People interested in leadership:







University of Melbourne, Research Data Specialist  




University of Melbourne, Research Data Specialist  



University of Melbourne, Research Data Specialist  



ARDC, Data Technologist



ARDC, Manager (Engagements)



ARDC, RDA Director of Operations




Australian Data Archive, Director



Data manager, Research fellow European Research Infrastructure on Highly Pathogenic Agents


Director of Project Management and User Support
Assistant Research Scientist
Inter-university Consortium for Political and Social Research
University of Michigan
  • People who have joined the Sensitive Data RDA IG so far





Frankie Stevens


Vince Bayrd

United States

Bénédicte Madon


Tiiu Tarkpea


Lars Eklund


Kristan Kang


Amy Nurnberger

United States

Su Nee Goh


Robert Pocklington


Kristan Kang


Genevieve Rosewall


Graham Smith

United Kingdom


  • People who attended the BoF and expressed interest in participating afterwards:


Affiliation and role


Interested in participating further?

Marjolaine Rivest-Beauregard

McGill University, MSc student


Kiera McNeice

Cambridge University Press, Research Data Manager


Matthew Viljoen

EGI Foundation, Service Delivery and information security lead


Stephanie Thompson

Research Data Management, University of Birmingham


Y. G. Rancourt

Portage Network, Curation Officer


Thea Lindquist

University of Colorado Boulder, Center for Research Data and Digital Scholarship, Executive Director


Briana Ezray

Penn State University, Research Data Librarian - STEM


Gen Rosewall

Agile Business Analyst, AARNet


Becca Wilson

University of Liverpool, UK ; Research Fellow


Karen Thompson

University of Melbourne


Jeaneth Machicao

Universidade de São Paulo / Research fellow


Jules Sekedoua KOUADIO

Gustave Eiffel University


Mahamat Abdelkerim Issa

Institut national de recherche scientifique (INRS), Québec, CA, Phd. Student


Erin Clary

Portage, Canadian Association of Research Libraries


Kylie Burgess

Research Data Lead, University of New England





Review period start:
Tuesday, 23 February, 2021 to Tuesday, 23 March, 2021

Please note that the group has revised its Case Statement following review by TAB. This latest version, version 2, was submitted in July 2021. Version 1 underwent community review in February / March 2021 and was reviewed by TAB.

The Data Granularity Task Force of the Data Discovery Paradigms Interest Group (DDPIG) of the Research Data Alliance (RDA) proposes to form an RDA Data Granularity Working Group (WG).  This WG would address issues of data granularity in data discovery, access, interoperability, analysis, citation, and more. More efficient and effective reuse of data requires that users can find and access data at various levels of granularity. The WG will explore key questions and collect and share valuable information for how to best support data granularity, providing guidance to help data professionals to determine the best level of granularity for user discovery, access, interoperability and citability. 

The activities and final recommendations of the Data Granularity WG will build upon and complement existing and ongoing work of several RDA Working and Interest Groups that touch upon the subject of data granularity. The final deliverable for the WG is a set of collected use cases and a guidance document of data granularity approaches for prioritized use cases, including terminology, methods to evaluate approaches, and a summary of community feedback.

Review period start:
Friday, 5 February, 2021 to Friday, 5 March, 2021


GORC International Model WG Case Statement

The GORC International Benchmarking WG changed its name to GORC International Model WG on 2 August 2021. The revised Case Statement with the new group name has now been attached to this page.

The following previous case statements have now been superseded.




The Global Open Research Commons (GORC) is an ambitious vision of a global set of interoperable resources necessary to enable researchers to address societal grand challenges including climate change, pandemics, and poverty. The realized vision of GORC will provide frictionless access to all research artifacts including, but not limited to: data, publications, software and compute resources; and metadata, vocabulary, and identification services to everyone everywhere, at all times.


The GORC is being built by a set of national, pan-national and domain specific organizations such as the European Open Science Cloud, the African Open Science Platform, and the International Virtual Observatory Alliance (see Appendix A for a fuller list). The GORC IG is working on a set of deliverables to support coordination amongst these organizations, including a roadmap for global alignment to help set priorities for Commons development and integration. In support of this roadmap, this WG will establish benchmarks to compare features across commons.  We will not coordinate the use of specific benchmarks by research commons. Rather, we will review and identify features currently implemented by a target set of GORC organizations and determine how they measure their user engagement with these features.

Review period start:
Friday, 8 January, 2021 to Monday, 8 February, 2021


CASE STATEMENT: RDA/CODATA Epidemiology common standard for surveillance data reporting WG

See, also:




A concise articulation of what issues the WG will address within a 12-18 month time frame and what its “deliverables” or outcomes will be.


In May 2020, the Organisation for Economic Co-operation and Development (OECD) discussed why and how Open Science is critical to preventing and combating pandemics such as COVID-19 caused by the novel coronavirus, SARS-CoV-2 (OECD 2020). Open Science is transparent and accessible knowledge that is shared and developed through collaborative networks (Vicente-Saez and Martinez-Fuentes 2018). FAIR (findable, accessible, interoperable, and reusable) data principles are an integral part of Open Science. FAIR data principles emphasise machine-actionability (i.e., the capacity of computational systems to find, access, interoperate, and reuse data with no or minimal human intervention) (GoFAIR).


However, there is an urgent need to develop a common standard for reporting communicable disease surveillance data without which Open Science and FAIR data will be difficult to achieve. Limited by antiquated systems and the lack of an established infrastructure, the tempo of the spread of the disease has outpaced our ability to react and adjust (Austin et al. 2020a,b; Garder et al. 2020). 


The need for developing a common standard for reporting epidemiology surveillance data was articulated by the RDA COVID-19 Epidemiology work group (WG) in their recommendations and guidelines, and supporting output (RDA COVID-19 WG 2020; RDA COVID-19 Epidemiology WG 2020). 


On October 27, 2020, the WHO, UNESCO, HCHR, and CERN issued a Joint Appeal for Open Science, a call on the international community to take all necessary measures to enable universal access to scientific progress and its applications (UNESCO et al. 2020; UNESCO 2020):


"The open science movement aims to make science more accessible, more transparent and thereby more effective. A crisis such as the COVID-19 pandemic demonstrates the urgent need to strengthen scientific cooperation and ensure the fundamental right to universal access to scientific progress and its applications. Open Science is about free access to scientific publications, data and infrastructure, as well as open software, open educational resources and open technologies such as tests or vaccines. Open science also promotes trust in science, at a time when rumours and false information abound."


Michelle Bachelet, United Nations High Commissioner for Human Rights stated, 


"Data are a vital human rights tool."


The WG will build upon existing standards and guidelines to develop uniform definitions and data elements to improve data comparability and interoperability. 


We will build upon the work begun by the RDA COVID-19 Epidemiology WG, and extend beyond the COVID-19 pandemic to provide an actionable specification for reporting communicable disease surveillance data and metadata, including geospatial data.


This work will be a consensus building effort that contributes to CODATA’s Decadal programme:

  • Enabling Technologies and Good Practice for Data-Intensive Science
  • Mobilising Domains and Breaking Down Silos
  • Advancing Interoperability Through Cross-Domain Case Studies



A standard specification for reporting communicable disease surveillance data. 



A specific description of who will benefit from the adoption or implementation of the WG outcomes and what tangible impacts should result.


Epidemiology surveillance data will enable governments and public health agencies to detect and respond to newly emergent threats of disease. Early detection may prevent development of epidemics and pandemics. It will also enable them to deliver more effective responses at all stages of the threat, from emergence through containment, mitigation, and reopening of society in the case of pandemics. Epidemiology surveillance data and geospatial data are large and varied. Treated as a strategic asset, they have the potential to support evidence-informed policy, stimulate new research areas, expand collaboration opportunities, and increase the health and economic well-being of society. A common standard for reporting epidemiology surveillance data will support these outcomes by improving data and metadata management and provision of findable, accessible, interoperable, reusable, ethical, and reproducible (FAIRER) data. 


The common standard for reporting epidemiology surveillance data is intended for implementation by government and international agencies, policy and decision-makers, epidemiologists and public health experts, disaster preparedness and response experts, funders, data providers, teachers, researchers, clinicians, and other potential users.



A brief review of related work and plan for engagement with any other activities in the area.


Nature of the problem to be addressed

The World Health Organization (WHO) defines public health surveillance as, “An ongoing, systematic collection, analysis and interpretation of health-related data essential to the planning, implementation, and evaluation of public health practice” (WHO 2020a). The WHO is a source of international standardized COVID-19 data and evidence-based guidelines, and is an invaluable source of technical guidance (WHO 2020b). Available instruments include a case-based reporting form, data dictionary, template, and aggregated weekly reporting form (WHO 2020c). There is also a global COVID-19 clinical data platform for clinical characterization and management of hospitalized patients with suspected or confirmed COVID-19 (WHO 2020d). The WHO (2020e) also notes that continued vigilance is needed to detect the emergence of novel zoonotic viruses affecting humans. 


Unfortunately, there are inconsistencies in the manner in which various jurisdictions and agencies collect and report their data. This is due to gaps in existing standards, and a failure to comply with those standards that do exist.


COVID-19 threat detection has been slow and ineffective, resulting in rapid development of a pandemic. Countries around the world have implemented a disparate series of public health measures in attempting to suppress and mitigate spread of the disease. The world was not prepared to respond to a novel zoonosis that spreads with the tempo and severity of COVID-19 (Greenfield et al. 2020). The pandemic has resulted in serious health and economic consequences for both High Income Countries (HICs) and for Low and Middle Income Countries (LMICs) (Bong et al., 2020).


The RDA COVID-19 WG recommendations, guidelines, and supporting output highlighted discrepancies in COVID-19 incidence and mortality data across data sources, which could be directly attributed to varying definitions and reporting protocols (RDA COVID-19 Epidemiology WG, 2020a,b). For example, mortality data from COVID-19 are frequently not comparable between and within jurisdictions due to varying definitions (Dudel, 2020). Variations resulting from discrepancies in official statistics limit effective disease-specific strategies (Modi et al., 2020; Modig & Ebeling, 2020). Other variables (e.g., confirmed cases, probable cases, probable deaths, negative tests, recoveries, critical cases) are also inconsistently defined (Austin et al. 2020b). For example, while the WHO (2020f) defines a confirmed case as "a person with laboratory confirmation of COVID-19 infection, irrespective of clinical signs and symptoms", other datasets report confirmed cases as the number of both laboratory positive subjects and probable cases (JHU, 2020). The US CDC (2020) has amended its previous policy and now reports case counts from commercial and reference laboratories, public health laboratories, and hospital laboratories, but still excludes data from other testing sites within a jurisdiction (e.g. point-of-care test sites). In Turkey, the number of cases published until the end of July represented only symptomatic COVID-19 subjects, excluding asymptomatic laboratory positive individuals (Reuters Editors, 2020). Other issues affecting data accuracy include: duplicate event records, laboratory report delays, missing data, and incorrect dates.
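The effect of differing definitions of a "confirmed case" can be shown with a small Python sketch. The numbers, the dictionary keys and the function names are all hypothetical and chosen only for illustration; they do not come from any of the cited datasets. The two functions mirror the two reporting conventions described above: laboratory-confirmed cases only, and laboratory-positive plus probable cases.

```python
# Hypothetical tallies for one jurisdiction on one day (illustrative numbers only).
tallies = {"laboratory_confirmed": 120, "probable": 30}


def confirmed_lab_only(t):
    """Count confirmed cases as laboratory-confirmed infections only."""
    return t["laboratory_confirmed"]


def confirmed_lab_plus_probable(t):
    """Count confirmed cases as laboratory-positive plus probable cases."""
    return t["laboratory_confirmed"] + t["probable"]


lab_only = confirmed_lab_only(tallies)
lab_plus_probable = confirmed_lab_plus_probable(tallies)
difference = lab_plus_probable - lab_only  # cases that appear under one definition only
```

With these made-up tallies the same underlying situation is reported as two different "confirmed case" counts, so figures from sources using different conventions cannot be compared directly unless the definition is published alongside the data.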


Much of the developed world has a notifiable disease surveillance system for effective and efficient reporting within their national limits, with varying fields of data elements. There exist, also, a large number of international data standards which should be used when reporting epidemiology surveillance data (Table 1). However, these do not address specific requirements that would ensure that epidemiology surveillance data are comparable and interoperable. 


Table 1. Initial list of data standards useful for notifiable disease surveillance systems (non-exhaustive list) [SOURCE: Haghiri et al. 2019].



Proposed standards, by category:

  • Machine-organizable data
  • Medical document exchange format: Clinical Document Architecture (CDA), Continuity of Care Document (CCD), and Continuity Care Record (CCR)
  • Markup language: XML Document Transform (XDT)
  • Classification systems: International classification of disease (ICD, ICD9, ICD9-CM); other classification systems (DRG, CPT, ICECI, HCPCS, ICPM, ICF, DSM)
  • Nomenclature systems
  • Standard content-maker formats: Standard address format definition, standard contact number format definition, standard ID format definition, and standard date format definition


Disease surveillance systems rely on complex hierarchies for data reporting. Raw data are collected at the local level, then anonymized and aggregated as necessary before being sent up a hierarchy that includes many levels. Even in many of the most developed regions of the world, much of this process is still done by hand, although the push toward electronic medical records is gaining traction. As a result, most disease surveillance systems across the world experience reporting lags of at least one to two weeks (Fairchild et al., 2018; Janati et al., 2015).


Publicly available data are published on websites that are often difficult to navigate, which makes the data and their associated definitions hard to find.


Historical data are not final when first published, owing to undetected errors, late or missing data, laboratory delays, etc. The dataset is updated as the data become available. This problem, often called “backfill,” stems from the complex reporting hierarchy and the antiquated systems on which disease surveillance relies. Backfill can in some cases drastically affect analyses (Fairchild et al., 2018). The problem is compounded when corrected and missing case counts are added to the date on which the correction was reported instead of the date on which the event occurred.
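The backfill problem can be illustrated with a minimal Python sketch. It assumes a hypothetical line list in which every record carries both the date the event occurred and the date it first appeared in the published dataset; the records, dates, and function names are invented for illustration and do not represent any real surveillance system.

```python
from collections import Counter
from datetime import date

# Hypothetical line list: (date the event occurred, date it was published).
records = [
    (date(2020, 10, 1), date(2020, 10, 1)),
    (date(2020, 10, 1), date(2020, 10, 2)),
    (date(2020, 10, 1), date(2020, 10, 8)),  # late laboratory report
    (date(2020, 10, 2), date(2020, 10, 2)),
    (date(2020, 10, 2), date(2020, 10, 8)),  # late laboratory report
]


def counts_by_event_date(records, as_of):
    """Daily counts by event date, using only records published by `as_of`."""
    return Counter(event for event, reported in records if reported <= as_of)


def counts_by_report_date(records, as_of):
    """Daily counts by publication date, using only records published by `as_of`."""
    return Counter(reported for event, reported in records if reported <= as_of)


# The same two days look different depending on when the dataset is read.
first_release = counts_by_event_date(records, as_of=date(2020, 10, 2))
later_release = counts_by_event_date(records, as_of=date(2020, 10, 8))

# Assigning late records to their report date creates a spurious peak on 8 October.
by_report_date = counts_by_report_date(records, as_of=date(2020, 10, 8))

for label, series in [
    ("by event date, read on 2 Oct", first_release),
    ("by event date, read on 8 Oct", later_release),
    ("by report date, read on 8 Oct", by_report_date),
]:
    print(label, {d.isoformat(): n for d, n in sorted(series.items())})
```

Keeping both dates on every record allows either series to be reconstructed later; a dataset that stores only the report date cannot be corrected after the fact.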


Case definitions used in epidemiological surveillance data are often unclear. Fairchild et al. (2018) have highlighted the challenges with data reporting and stressed the importance of explicit and clear case definitions. Even with standardized definitions, regions with little funding support for public health institutions may struggle to adopt a framework of best practices. We will develop guidelines that recognise these limitations and that will support both low- and middle-income countries (LMICs) and high-income countries (HICs).


Engagement with other related activities

The proposed “Epidemiology common standard for surveillance data reporting WG” will address a high-priority challenge based on assessments of public health needs during a pandemic using COVID-19 as a use case.


The initial WG membership (see Section 6, below) is well connected to various community-based initiatives and WGs that address similar and other relevant topics. The WG will monitor and align its efforts with other related activities, including:


RDA WG and interest groups (IG):


See also “Solicited WG membership” in Section 6, below.



A specific and detailed description of how the WG will operate.



D1 (months 4-12). Epidemiology common standard for surveillance data reporting.

This deliverable will contain the developed common standard specification for reporting epidemiology surveillance data, including variable names, definitions, and rationale.


D2 (months 12-16). Guidelines for adopting the common standard.

Guidelines will be based on lessons learned during development of the standard.



M1 (months 0-1). Engagement of representatives from prominent stakeholders in public health. 

We will seek engagement with the WHO, ECDC, US CDC, ICMR, etc.


M2 (months 0-3). Identification of standards gaps and issues concerning data interoperability and comparability across and within jurisdictions. 

  • We will use COVID-19 surveillance data as a use-case to identify issues that can be resolved by implementation of a common standard for reporting communicable disease surveillance data.

  • Identify related standards and guidelines.

  • Identify standards gaps.


M3 (months 1-3). Definition of the scope of the standard and detailed objectives.

We will develop a detailed project management and work plan. 


M4 (months 3-6). Hackathon.

A hackathon will be conducted for the RDA 17th Plenary in April 2021. The objective will be to combine publicly available COVID-19 related datasets and to present solutions that overcome the barriers encountered. The hackathon will be announced at the 16th Plenary in November 2020 and will be opened in February 2021. Participants will present their results at the 17th Plenary, at which time judging will take place and winners will be announced. Winners will be offered co-authorship on a peer-reviewed publication. From November to January, we will seek sponsors for cash prizes to be awarded to 1st, 2nd, and 3rd place winners.


M5 (months 3-12). Development of a draft standard for reporting epidemiology surveillance data.


M6 (months 12-14). Public review of the draft standard.


M7 (months 12-16). Development of guidelines for adoption of the standard.


M8 (months 15-17). Finalization of the standard.

M9 (months 1-18). Dissemination and Communication.

WG activities and outcomes will be disseminated via the RDA website, preprint(s), submission to a peer-reviewed journal, RDA Plenaries, conference presentations, and social media.


Simplified Gantt Chart
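The schedule stated in the deliverables and milestones above can be rendered as a plain-text chart. The following Python sketch is an illustration only and does not reproduce the original figure. The function name `gantt_row` and the reading of "months a-b" as a bar that starts at column a and ends before column b are assumptions.

```python
# Month ranges copied from the deliverables (D) and milestones (M) listed above.
SCHEDULE = [
    ("D1", 4, 12), ("D2", 12, 16),
    ("M1", 0, 1), ("M2", 0, 3), ("M3", 1, 3), ("M4", 3, 6), ("M5", 3, 12),
    ("M6", 12, 14), ("M7", 12, 16), ("M8", 15, 17), ("M9", 1, 18),
]
TOTAL_MONTHS = 18  # overall WG duration implied by the latest end month


def gantt_row(label, start, end, total=TOTAL_MONTHS):
    """Return one chart row: '#' marks active months, '.' marks inactive ones."""
    bar = "." * start + "#" * (end - start) + "." * (total - end)
    return f"{label:<3}{bar}"


chart = [gantt_row(label, start, end) for label, start, end in SCHEDULE]
print("\n".join(chart))
```

Each row has one column per month, so overlaps such as M6 (public review) running alongside M7 (guideline development) are visible at a glance.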


Work space

The WG will use the following platforms for communication and development:

  • RDA website

  • Google Drive

    • Working documents will be managed on Google Drive to facilitate open collaboration.

  • GitHub 

    • We will develop a public GitHub repository to host the hackathon material, models, source code, and the proposed common standard, and to raise and resolve issues.

  • Zotero


We will use a variety of tools, for example:

  • Visualization
    • Mindmapping
    • Infographics
  • Gantt charting for project management
  • Voting and consensus building tools



  • Meetings will be held weekly.

  • An online platform (e.g., GoToMeeting, Zoom, WebEx, MS Teams, or Google Meet) will be used for meetings. Participants will be asked to turn on their video to improve communication. The WG will meet at RDA Plenaries, the first such meeting being at the 16th Plenary on November 12, 2020 at 12:00 - 1:30 AM UTC.

  • Agendas, minutes, and rolling notes will be circulated via Google Docs.

  • Discussions will be held at the RDA 16th, 17th, and 18th Plenaries, and at other conferences and workshops where possible. 


A description of how the WG plans to develop consensus, address conflicts, stay on track and within scope, and move forward during operation, and


Consensus will be achieved mainly through discussions in our regular weekly meetings, where conflicting viewpoints will be identified and openly discussed and debated by group members. If consensus cannot be reached in this manner, the final decision will be taken by the group co-chairs. By setting realistic deadlines and assessing progress on assigned tasks, the co-chairs will keep the WG on track and within scope.


Community engagement

A description of the WG’s planned approach to broader community engagement and participation. 


To encourage broader community engagement and participation in the development of a standard, the WG case statement will be circulated to various public health organizations and epidemiological societies across the globe, and on social media (LinkedIn and Twitter). Regular updates on events and news related to epidemiology common standards will be posted on the RDA WG webpage to encourage the involvement of specialists in the field.



WG outputs will be published under a CC BY-SA license. 



A specific plan for adoption or implementation of the WG outcomes within the organizations and institutions represented by WG members, as well as plans for adoption more broadly within the community. Such adoption or implementation should start within the 12-18 month timeframe before the WG is complete.


WG members will be encouraged to implement the new standard and guidelines within their organizations. We will pursue adoption by a variety of stakeholders and research communities, particularly those involved in public health. The standard will be disseminated via RDA webinars, other scientific presentations, and Twitter. We will also seek to publish the final standard and guidelines as an open access peer-reviewed journal article. We will follow up with adoption stories.



A specific list of initial members of the WG and a description of initial leadership of the WG.


Co-Chairs: Claire Austin and Rajini Nagrani


RDA Liaison: Stefanie Kethers



Soegianto Ali

Anthony Juehne

Nada El Jundi

Fotis Georgatos

Jitendra Jonnagaddala

Miklós Krész

Gary Mazzaferro

Jiban K. Pal

Carlos Luis Parra-Calderon

Bonnet Pascal

Fotis Psomopoulos

Stefan Sauermann

Henri Tonnang

Marcos Roberto Tovani-Palone    

Anna Widyastuti

Becca Wilson

Eiko Yoneki


Current initial membership

The initial WG includes:

  • Cross-domain expertise 

    • biostatistics, clinical informatics, computer engineering, data science, epidemiology, global health, health informatics, health sciences, interoperability, IT architecture, mathematics, open science, pathology, predictive modeling, public health, research data management, software development, veterinary medicine
  • Experience 

    • academia, editor of scientific journals, government, international WG leadership, program director, research, standards development.
  • Regional representation 

    • Africa (sub-Saharan), Asia (maritime southeast), Asia (south), Australasia, Europe, North America, and South America.
  • Income groups

    • Two lower-middle income, two upper-middle income, and 15 high-income countries.


Initial membership comprises a core group from the RDA-COVID19-Epidemiology WG and further members who bring additional domain-specific expertise. We aim to strengthen the group by expanding global participation (low income, lower-middle income, and upper-middle income countries), interdisciplinary expertise, and stakeholder representation to address this pressing common epidemiology surveillance data challenge across the public health domain.


Actively soliciting WG membership

The initial membership does not currently include any potential adopters. We will solicit active participation in the WG by representatives of key stakeholders, including the following:

Official agencies and funders

  • Official agencies, organizations, and funders having international reach
  • Supranational organizations
  • European Centre for Disease Prevention and Control (ECDC)
  • Global Early Warning System (GLEWS+)
  • Global Health Security Agenda (GHSA)
  • Global Influenza Surveillance and Response System (GISRS)
  • Global Partnership for Sustainable Development Data (GPSDD)
  • GloPID-R
  • Indian Council of Medical Research (ICMR)
  • Observational Health Data Sciences and Informatics (OHDSI)
  • UN Office for Disaster Risk Reduction (UNDRR)
  • United Nations Educational, Scientific and Cultural Organization (UNESCO)
  • U.S. Centers for Disease Control and Prevention (CDC)
  • Wellcome Trust
  • World Data System (WDS)
  • World Health Organization (WHO)
  • World Bank World Development Indicators (WDI)


Data aggregators in academia

  • Johns Hopkins University (Killeen et al. 2020)
  • University of California, Berkeley (Altieri et al. 2020)
  • University of Oxford (Roser et al. 2020)

News Outlets

  • The Atlantic

  • The Economist

  • The Financial Times

  • The New York Times

Communications/graphic artist expertise




Altieri, N., Barter, R. L., Duncan, J., Dwivedi, R., Kumbier, K., Li, X., Netzorg, R., Park, B., Singh, C., Tan, Y. S., Tang, T., Wang, Y., Zhang, C., & Yu, B. (2020). Curating a COVID-19 Data Repository and Forecasting County-Level Death Counts in the United States. Harvard Data Science Review.


Austin, Claire C; Nagrani, Rajini; Widyastuti, Anna; El Jundi, Nada (2020a). Global status of COVID-19 data: A cross-jurisdictional and international perspective. Canadian Public Health Association Conference. October 14-16.


Austin, Claire C; Widyastuti, Anna; El Jundi, Nada; Nagrani, Rajini; and the RDA COVID-19 WG. (2020b). Surveillance Data and Models: Review and Analysis, Part 1 (September 18, 2020). Preprint available at SSRN:


Bong CL, Brasher C, Chikumba E, McDougall R, Mellin-Olsen J, Enright A (2020). The COVID-19 Pandemic: Effects on Low- and Middle-Income Countries. Anesth Analg, 131:86-92.


CDC. Coronavirus Disease 2019 (COVID-19) in the U.S. Centers for Disease Control and Prevention. 2020 [cited 2020 Oct 23]. Available from:


Fairchild G, Tasseff B, Khalsa H, Generous N, Daughton AR, Velappan N, Priedhorsky R, Deshpande A (2018). Epidemiological Data Challenges: Planning for a More Robust Future Through Data Standards. Front Public Health, 6:336.


Gardner, L., Ratcliff, J., Dong, E., & Katz, A. (2020). A need for open public data standards and sharing in light of COVID-19. The Lancet Infectious Diseases, 0(0).


Greenfield J., Tonnang E.Z., Mazzaferro G., Austin, C.C.; and the RDA-COVID19-WG. (2020). Epi-TRACS: Rapid detection and whole system response for emerging pathogens such as SARS-CoV-2 virus and the COVID-19 disease that it causes. IN: COVID-19 Data sharing in epidemiology, version 0.06b. Research Data Alliance RDA-COVID19-Epidemiology WG.


GLEWS (2013). Global Early Warning System.


Haghiri H, Rabiei R, Hosseini A, Moghaddasi H, Asadi F (2019). Notifiable Diseases Surveillance System with a Data Architecture Approach: A Systematic Review. Acta Inform Med, 27:268-277.


Janati A, Hosseiny M, Gouya MM, Moradi G, Ghaderi E (2015). Communicable Disease Reporting Systems in the World: A Systematic Review. Iran J Public Health, 44:1453-1465.


JHU (2020). Coronavirus resource center. Johns Hopkins University.


Killeen, B. D., Wu, J. Y., Shah, K., Zapaishchykova, A., Nikutta, P., Tamhane, A., Chakraborty, S., Wei, J., Gao, T., Thies, M., & Unberath, M. (2020). A County-level Dataset for Informing the United States’ Response to COVID-19. ArXiv:2004.00756 [Physics, q-Bio].


Modig K, Ebeling M (2020). Excess mortality from COVID-19. Weekly excess death rates by age and sex for Sweden. Preprint available at medRxiv:


Norton, A., Pardinz-Solis, R., & Carson, G. (2017). Roadmap for data sharing in public health emergencies. GloPID-R.


OECD (2020). Why open science is critical to combatting COVID-19—OECD. Organisation for Economic Co-Operation and Development, May 12, 2020.


OHDSI (2020). Observational Health Data Sciences and Informatics.


RDA COVID-19 WG (2020). Recommendations and guidelines. Research Data Alliance.


RDA COVID-19 Epidemiology WG (2020). Sharing COVID-19 epidemiology data: Supporting output. Research Data Alliance.


Reuters Editors. Turkey has only been publishing symptomatic coronavirus cases - minister. Reuters. 2020 [cited 2020 Oct 15]; Available from:


Roser, M., Ritchie, H., Ortiz-Ospina, E., & Hasell, J. (2020). Coronavirus Pandemic (COVID-19). Our World in Data.


SDMX (2020). The Business Case for SDMX. SDMX Initiative.


UN (2018). Overview of standards for data disaggregation. United Nations.


UN (2020). IAEG-SDGs—Data Disaggregation for the SDG Indicators. United Nations.


UNESCO (2020). Preliminary report on the first draft of the Recommendation on Open Science—UNESCO Digital Library. United Nations Educational, Scientific and Cultural Organization.


UNESCO, WHO, HCHR, & CERN (2020, October 27). Joint Appeal for Open Science.


Vicente-Saez, R., & Martinez-Fuentes, C. (2018). Open Science now: A systematic literature review for an integrated definition. Journal of Business Research, 88, 428–436.


WHO (2020a). Public health surveillance. United Nations, World Health Organization.


WHO. (2020b). Country & Technical Guidance—Coronavirus disease (COVID-19). World Health Organization.


WHO. (2020c). Global COVID-19 Clinical Data Platform for clinical characterization and management of hospitalized patients with suspected or confirmed COVID-19. World Health Organization.


WHO. (2020d). Global COVID-19 Clinical Data Platform. World Health Organization.


WHO (2020e). Preparing GISRS for the upcoming influenza seasons during the COVID-19 pandemic – practical considerations. United Nations, World Health Organization.


WHO (2020f). COVID-19 case definition.


WHO (2020g). Global Influenza Surveillance and Response System (GISRS). United Nations, World Health Organization.


WHO (2020h). COVID-19 Core Version Case Record Form (CRF). United Nations, World Health Organization. 


WHO (2020i). COVID-19 Rapid Version Case Record Form (CRF). United Nations, World Health Organization.


WHO (2020j). WHO Information Network for Epidemics (EPI-WIN). United Nations, World Health Organization.



Review period start:
Wednesday, 28 October, 2020 to Friday, 25 December, 2020
Review period start:
Monday, 26 October, 2020