Describing Chemical, Physical and Biological samples digitally: Seeking harmonization


11 Jun 2023


Submitted by Leah McEwen

Meeting objectives: 




Samples are taken in every field of study, but they vary widely in type, e.g., single crystals, powders, complex structures, proteins and other biological (macro)molecules, cells, tissues, organisms, archeological artifacts, fossils, artwork, etc. Different fields may categorize samples from multiple perspectives simultaneously (e.g., nanomaterials are considered both physical particles and molecular entities; proteins are molecular entities of biological origin). Samples may consist of multiple components in multiple phases, and may represent either single entities or collections of multiple entities.

The sampling scheme is a critical aspect of designing any experiment intended to yield informative and reproducible results. A number of factors around sample collection, storage and processing are relevant to interpreting the measurement data derived from those samples. Different types of samples may be collected for different purposes, for example biological specimens (or parts of specimens, such as leaves from plants or tissue from animals), soil, and even air. Samples may be affected by handling and storage conditions (e.g., humidity, temperature) and may be subject to further processing workflows (e.g., dispersion, mixing, plating, staining). Samples may have spatial, temporal or other relationships that need to be articulated. A macroscopic sample may be collected and then subsampled at increasing granularity down to the nanoscale, creating multiple series of parent-child relationships that need to be documented.

There are many well-developed identifiers and other semantic descriptions for different facets of sample provenance. A few cross-domain, community-endorsed examples include ISO 19156:2023 Observations, Measurements and Samples and the W3C/OGC Semantic Sensor Network Ontology, which includes the core SOSA (Sensor, Observation, Sample, and Actuator) ontology for its elementary classes and properties. Other, more domain-specific approaches include iSamples and the Global Biodiversity Information Facility (GBIF). How widely known and used are these existing cross-domain ontologies and models? New international cross-domain ontologies are being published, as are community-driven ontologies in fields such as earth sciences and biodiversity. How can these be adapted for additional disciplines?

The ability to combine data from disparate disciplines will make it much easier to address broader, global challenges. Harmonizing sample descriptions will also streamline the workflows of instrument facilities, which apply physical measurement techniques to extremely diverse sample types and must meet a broad range of user documentation requirements. This session brings together a variety of disciplines, including geochemistry, biodiversity, nanomaterials, analytical chemistry and crystallography, among others, to explore approaches to harmonizing sample description and provenance.

The expected outcomes of this discussion will be to:

  • Compile a list of needs for describing sample types, origin, processing workflows and other requirements across disciplines  

  • Identify existing identifiers, classifications, ontologies and terminologies that support these descriptions 

  • Gauge interest in proposing an RDA Working Group project to develop best practices for sample data model specifications

Meeting agenda: 

We intend to spend half of the session sharing perspectives and the other half identifying the wider community's needs, what already exists to address them, and what further action is required.

(40 min) Flash talks: speakers show examples of samples in their field and demonstrate the challenges of describing them

  • Kerstin Lehnert (PSIG - IGSN)

  • Debora Pignatari Drucker (WP10 - IGAD CoP)

  • Alexander Prent (WP05 - ideas about Core Research Object)

  • Rolf Krahl (PaN)

  • Stuart Chalk (WP03 IUPAC-NIST data modeling) 

  • Iseult Lynch (WP04) 

  • Simon Hodson, Arofan Gregory (WP02)

  • Potentially others TBD 

(40 min) Discussion of themes 

  • Needs for describing sample types, origin, processing workflows and other requirements across disciplines

  • Existing identifiers, classifications, ontologies and terminologies that support these descriptions 

  • Interest for an RDA Working Group project to develop best practices for sample data model specifications

Type of Meeting: 
Informative meeting
Short introduction describing any previous activities: 

Physical Samples and Collections in the Research Data Ecosystem IG 

The Physical Samples IG aims to facilitate cross-domain exchange and convergence on key issues related to the digital representation of physical samples and collections. The group has published the RDA 23 Things Physical Samples as an overview of practical, free, online resources and tools available for describing and working with physical samples. The group has also organized several webinars on the challenges of working with samples. In the webinar “Supporting Reproducibility by Capturing Physical Sample Data and Metadata in a Connected Electronic Lab Notebook”, researchers presented their perspective along with a case study of the problems created by inadequate sample tracking and the lack of workable alternatives for associating samples with the experimental record. The webinar series “RRIDs: A Way to Track Samples Through the Scientific Literature” described how RRIDs (Research Resource Identifiers), persistent identifiers used to track key biological resources (i.e., the “physical samples” of the biomedical field), have successfully driven improvements in resource identification.

Chemistry Research Data IG

“Describing diverse chemistry datasets across distributed data resources”: this session, organized at RDA P20 by the Chemistry Research Data Interest Group (CRDIG), provided updates and perspectives from regional and disciplinary initiatives relevant to chemistry, focussing on the challenge of describing chemistry datasets to enable interoperability and reuse across resources and domains. It was followed by a discussion that aimed to identify cross-community challenges that might be addressed through RDA activities. The discussion identified areas of focus that the group aims to take forward in collaboration with other RDA groups and community initiatives. One key outcome was the identification of an opportunity for cross-community collaboration on agreeing on standard approaches for describing samples in chemistry that align with wider initiatives.

CODATA-RDA WorldFAIR project

The WorldFAIR project sets out to produce recommendations, interoperability frameworks and guidelines for FAIR data implementation and assessment. The WorldFAIR approach, outputs and modes of dissemination will significantly strengthen international cooperation in order to increase and mainstream the FAIRness of data and digital objects. The project aims to join up disconnected initiatives on data management, data stewardship and FAIR data practices, within and across disciplines and internationally, through eleven case studies. Several recent deliverables describing recommendations across different disciplines have been posted.

What is a chemical? Webinar series

In this webinar series, WorldFAIR Chemistry (WP03) aimed primarily to understand the notations for chemical substances used across multiple disciplines (geochemistry (WP05), nanochemistry (WP04), atmospheric chemistry, environmental chemistry, oceanography, crystallography, etc.). Participants from these disciplines agreed that they share a common challenge: the complexity of real samples. Can chemical identifiers be designed to reflect that complexity?

Meeting presenters: 
Kerstin Lehnert (IGSN), Debora Pignatari Drucker (WP10/IGAD CoP), Alexander Prent (WP05), Rolf Krahl (PaN), Iseult Lynch (WP04), Stuart Chalk (WP03/CRDIG), Simon Hodson (WP02)
Applicable Pathways: 
FAIR, CARE, TRUST - Adoption, Implementation, and Deployment
Semantics, Ontology, Standardisation
Discipline Focused Data Issues
Please indicate at least three (3) breakout slots that would suit your meeting:
Breakout 1
Breakout 3
Breakout 6