
Mapping the road ahead for the Data Discovery Paradigms IG

  • Creator
  • #134108

    Kathleen Gregory

    Collaborative session notes:

    Introduction of the group (10 minutes)

    Discussion on short-term activities on the open topics (60 minutes)

    Metadata enrichment for discovery, for example, using upper-level ontologies

    Possible introductory talk: Mingfang Wu

    Suggested scope:

    Identify existing resources, their purposes, problems that they are designed to address, domains where used, examples of successful application. 

    Challenges or difficulties, e.g. in applying specific controlled vocabularies or upper-level ontologies

    Potential tasks:

    Identify existing surveys that could be used as input

    Might require the design of a new survey, tailored to the particular questions

    Expected outcomes of the session discussion:

    Agreement on Scope

    Discuss possible tasks

    Potential leads

    Machine learning for data discovery, for example, topic modelling

    Possible introductory talk: 

    Suggested scope:

    Conduct a landscape analysis of ML solutions, either as a survey of repositories or as a document/literature review, covering the perspectives (of all stakeholders, e.g. researchers, librarians and citizen scientists), activities and efforts that assist or facilitate data discovery. Based on the landscape, identify best practices across multiple domains.

    Examination of how ML models trained for data discovery can be described and shared.

    How to ensure that repositories can facilitate ML-based workflows while avoiding potential biases (resulting from only certain subsets of relevant data being discovered).

    Facilitate (semi-)automated processes (e.g. recommendation) in repositories for identifying items of interest from the user's perspective.

    Potential tasks:

    Landscape performed as a literature review, and/or a survey of repositories

    A best-practices document that can subsequently be turned into a recommendations document.

    Expected outcomes of the session discussion:

    Agreement on Scope

    Discuss possible tasks

    Potential leads

    User study / meta-research/analysis of data discovery interviews 

    Possible introductory talk: Kathleen 

    Suggested scope:

    Revisit the prior efforts on user study interviews/surveys 

    Meta-analysis / meta-research of these efforts.

    Potential tasks:

    Organize a series of presentations of existing efforts.

    Create a list capturing all the existing/past/ongoing efforts

    Expected outcomes of the session discussion:

    Agreement on Scope

    Discuss possible tasks

    Potential leads

    Next steps and wrap up (20 minutes)

    Additional links to informative material
    The group has delivered the following three supporting outputs: 

    Eleven quick tips for finding research data

    Data discovery paradigms: user requirements and recommendations for data repositories

    A survey of current practices in data search services

    Slides from previous plenary sessions: 

    January 2021 – RDA Virtual Plenary 17:

    Investigating data discovery across domains (Group session, slides)

    November 2020 – RDA Virtual Plenary 16:

    What information about data do users desire for discovery? (Group session, slides)


    April 2020 – RDA Virtual Plenary 15:

    Inferring data searchers’ intent and their interaction with data discovery systems (group session)

    Data Granularity BoF; user perspectives, data citation and data versioning (BoF session)

    October 2019 – RDA Plenary 14: Data Discovery Paradigms IG: Reports from Task Forces and Way Ahead (slides)

    April 2019 – RDA Plenary 13: Data Discovery Paradigms IG: Reports from Task Forces and Way Ahead (slides)

    October 2018 – RDA Plenary 12: IG meets, Task Forces report back

    March 2018 – RDA Plenary 11: IG meets, Task Forces report back

    Slides from earlier plenaries are available from the group page.

    Applicable Pathways
    The FAIR Agenda, Data Infrastructures – Organisational to Environments

    Avoid conflict with the following group (1)
    Data Versioning IG

    Brief introduction describing the activities and scope of the group
    The objective of this IG is to provide a forum where representatives from across the spectrum of stakeholders and roles pertaining to data discovery can work together to identify, study and make recommendations concerning issues related to improving data discovery. The goal is to produce concrete deliverables that will be recognised and valued by the research and data communities.
    This group was officially endorsed at RDA P9. It has worked on the following task forces:

    User study in data discovery (ongoing)

    Data/Metadata granularity (ongoing, a BoF has been submitted)

    Using for research dataset discovery (This task force has spun off to the Research Metadata Schemas Working Group, which was endorsed in Sept. 2019).

    Initial four task forces from the group:

    Relevancy ranking (completed)

    Use cases, prototyping tools and test collections (completed)

    Best practice for making data findable (completed)

    Metadata enrichment (closed)

    Group chair serving as contact person
    Kathleen Gregory

    Meeting objectives

    To report on the group's progress

    To discuss new task forces

    Please indicate the breakout slot(s) that would suit your meeting
    Breakout 16, Breakout 19, Breakout 22
