Describing diverse chemistry datasets across distributed data resources


31 Oct 2022
Group(s) submitting the application: 
Meeting objectives: 

Increasing numbers of chemistry-related datasets are becoming available for re-use through data repositories that range from generic to highly specialised. Some repositories specialise in one particular data type or focus on one particular sub-discipline; some capture only the raw data produced by an experiment, or the results derived from those data; others are less structured and accept whatever files are deposited.


Chemistry data can be complex, spanning a broad variety of data types along with the associated information needed to analyse them. Datasets may contain collections of files relating to different analytical methods that together describe an overall result but are also relevant individually. Much chemical data is rendered meaningless without adequate contextual description (e.g., samples, conditions, measurement parameters), and there are many approaches to documenting this information.


Repositories will necessarily describe datasets in varying depth, and the scope of a repository's coverage can affect how much metadata it exposes. Additionally, it is important to ensure that domain metadata are registered alongside DOIs at an appropriate level of descriptive granularity. Implementation of domain data standards will be critical for maximising interoperability and sustainability.


This session aims to explore the metadata descriptions needed to ensure that deposited chemical data may be fully discovered and robustly re-used broadly across disciplines and use cases.


We will explore specific needs and challenges relating to the description of chemistry datasets including:

  • The vocabularies needed to characterise disciplines and sub-domains relevant to a particular chemistry dataset

  • How to accurately represent the specific type of a chemistry dataset

  • How to describe chemical substances relevant to a chemistry dataset

  • Deficiencies in general-purpose metadata schemas that prevent us from describing chemistry datasets to the level of detail required

For each of these, we will address the following questions:

  • To what extent do existing scientific standards and motifs provide a basis for addressing our needs and how might these need to evolve?

  • How do we ensure that the technical community is aware of existing standards that are applicable to the development of new infrastructure?

  • Are new standards or recommendations, chemistry-related or otherwise, required to ensure the consistency and coherence of description across diverse data resources?

  • How do we engage subject experts in helping to fill scientific gaps in both general and domain-specific repositories?

We will also consider more general concerns such as:

  • The level of description appropriate to include in disciplinary vs general data and metadata repositories and registries

  • How to effectively group together different data objects required to reproduce published results when these may be stored in different repositories

  • Whether new services are required to enable the discovery, reuse and interoperability of chemistry datasets across resources

Meeting agenda: 


Collaborative session notes:


We intend to spend half of the session sharing perspectives and half discussing what the wider community needs, what may already exist to help address those needs, and what further action is required.


Contributions and participants will be drawn from the groups listed below, all of which have indicated some interest. A more detailed agenda will be established nearer the time. Note that we do not expect all of those listed to give presentations; we will aim to hold some discussions among stakeholders in advance so that we can efficiently lay out the challenges to be considered during the session.

  • NFDI4Chem

  • OneGeochemistry 

  • UK Catalysis Hub

  • UK Physical Sciences Data Infrastructure

  • IUPAC standards projects (e.g., Gold Book, FAIRSpec, ThermoML, Adsorption Information Framework, Solubility metadata, etc.) 

  • IUPAC WorldFAIR Working Groups

  • The InChI Trust

  • Chemistry domain repositories

  • European Materials & Modelling Ontology (EMMO)

  • The RDA PDINST Working Group

  • DataCite representatives and/or DataCite Metadata Working Group


Target Audience: 
  • Infrastructure providers supporting the discovery and reuse of chemistry datasets

  • Standards communities developing schemas for domain-level description of data

  • Registration agencies providing services that support discovery of data across domains 

  • Domain experts working with complex research data

Group chair serving as contact person: 
Brief introduction describing the activities and scope of the group: 

The Chemistry Research Data Interest Group exists to provide a forum for discussion of matters relating to the sharing and reuse of research data generated by and relevant to the chemical sciences and aligned disciplines. We periodically convene sessions to address timely topics that may be of broad community interest and can benefit from input by experts from across RDA communities. The activities of the group complement data-related activities being undertaken by other national and international chemistry initiatives, in particular the International Union of Pure and Applied Chemistry (IUPAC), the standards body for chemistry.

Short Group Status: 

The group has been active since 2015 and has periodically run sessions at RDA Plenary meetings, most recently "Developments in the Digital Chemistry Space: An Update" (Plenary 18) and "FAIR publishing of chemistry research data objects" with the Photon and Neutron Interest Group (Plenary 17). The group also contributes to sessions organised by other RDA Interest and Working Groups as requested.

Type of Meeting: 
Working meeting
Avoid conflict with the following group (1): 
Avoid conflict with the following group (2):