Describing diverse chemistry datasets across distributed data resources
Discussion
Collaborative session notes: https://docs.google.com/document/d/1_X2e1qCEedP4IaKr1uwsgB5Eh7_7Unl4KeUn…
We intend to spend half of the session sharing perspectives and half discussing the wider community's needs, what may already exist to help address them, and what further action is required.
Contributions and participation will be drawn from the groups listed below, all of which have indicated some interest. A more detailed agenda will be established nearer the time. We do not expect every group mentioned to give a presentation, and we aim to hold some discussions among stakeholders in advance so that the challenges to be considered can be laid out efficiently during the session.
NFDI4Chem
OneGeochemistry
UK Catalysis Hub
UK Physical Sciences Data Infrastructure
IUPAC standards projects (e.g., Gold Book, FAIRSpec, ThermoML, Adsorption Information Framework, Solubility metadata, etc.)
IUPAC WorldFAIR Working Groups
The InChI Trust
Chemistry domain repositories
European Materials & Modelling Ontology (EMMO)
The RDA PDINST Working Group
DataCite representatives and/or DataCite Metadata Working Group
Additional links to informative material
CRDIG: https://www.rd-alliance.org/groups/chemistry-research-data-interest-group.html
OneGeochemistry: https://www.earthchem.org/communities/onegeochemistry/
NFDI4Chem: https://www.nfdi4chem.de
PSDI: https://www.psdi.ac.uk/
UK Catalysis Hub: https://ukcatalysishub.co.uk/
IUPAC Digital Standards: https://iupac.org/what-we-do/digital-standards/
IUPAC WorldFAIR: https://iupac.org/worldfair-global-cooperation-on-fair-data-policy-and-practice/
IUPAC FAIR Chemistry: https://zenodo.org/communities/fairchemistry/
FAIRSpec: https://doi.org/10.26434/chemrxiv-2022-t783k
The InChI Trust: https://www.inchi-trust.org/
Chemistry Domain Repositories: https://www.nfdi4chem.de/index.php/repos/
EMMO: https://github.com/emmo-repo/EMMO
Avoid conflict with the following group (1)
Data Usage Metrics WG
Brief introduction describing the activities and scope of the group
The Chemistry Research Data Interest Group exists to provide a forum for discussion of matters relating to the sharing and reuse of research data generated by, and relevant to, the chemical sciences and aligned disciplines. We periodically convene sessions to address timely topics that may be of broad community interest and can benefit from input by experts from across RDA communities. The activities of the group complement data-related activities being undertaken by other national and international chemistry initiatives, in particular the International Union of Pure and Applied Chemistry (IUPAC), the standards body for chemistry.
Group chair serving as contact person
Ian Bruno
Meeting objectives
Increasing numbers of chemistry-related datasets are becoming available for re-use through data repositories that range from generic to very specialised. Specialised repositories may focus on one particular data type or one particular sub-discipline. Some capture only the raw data produced by an experiment, others the results derived from those data, while others are less structured and accept whatever files are deposited.
Chemistry data can be complex, spanning a broad variety of data types together with the associated information needed to analyse them. Datasets may contain collections of files from different analytical methods that together describe an overall result but are also relevant individually. Much chemical data is rendered meaningless without adequate contextual description (e.g., samples, conditions, measurement parameters), and there are many approaches to documenting this information.
Repositories will necessarily describe datasets at varying levels of depth, and the scope of a repository's coverage can affect which metadata it exposes. It is also important to ensure that domain metadata are registered alongside DOIs at an appropriate level of descriptive granularity. Implementing domain data standards will be critical for maximising interoperability and sustainability.
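As an illustration of what registering domain metadata alongside a DOI might look like, the Python sketch below assembles a record using property names from the DataCite Metadata Schema (`types`, `subjects`, `relatedIdentifiers`). It is a minimal sketch under stated assumptions, not a recommended profile: the DOIs are placeholders, the `build_chemistry_record` helper is hypothetical, and recording a substance as an InChIKey under `subjects` is an assumed convention rather than an agreed community practice. The format check only verifies the layout of a standard InChIKey, not that it corresponds to a real structure.

```python
import json
import re

# Standard InChIKey layout: 14 letters, hyphen, 10 letters, hyphen, 1 letter.
INCHIKEY_PATTERN = re.compile(r"^[A-Z]{14}-[A-Z]{10}-[A-Z]$")


def build_chemistry_record(doi, title, technique, inchikey, article_doi):
    """Assemble a DataCite-style record (a plain dict) for one chemistry dataset.

    Property names follow the DataCite Metadata Schema; encoding the substance
    as an InChIKey inside `subjects` is an assumed convention only.
    """
    if not INCHIKEY_PATTERN.match(inchikey):
        raise ValueError(f"not a well-formed InChIKey: {inchikey}")
    return {
        "doi": doi,
        "titles": [{"title": title}],
        # Generic type understood by every repository, plus a domain label.
        "types": {"resourceTypeGeneral": "Dataset", "resourceType": technique},
        "subjects": [
            # Discipline, from the OECD Fields of Science classification.
            {"subject": "FOS: Chemical sciences",
             "subjectScheme": "Fields of Science and Technology (FOS)"},
            # Substance, recorded by its InChIKey (assumed convention).
            {"subject": inchikey, "subjectScheme": "InChIKey"},
        ],
        # Link the dataset to the article whose results it supports.
        "relatedIdentifiers": [
            {"relatedIdentifier": article_doi,
             "relatedIdentifierType": "DOI",
             "relationType": "IsSupplementTo"},
        ],
    }


if __name__ == "__main__":
    record = build_chemistry_record(
        doi="10.5555/example-nmr-dataset",       # placeholder DOI
        title="1H NMR spectra of benzene (example)",
        technique="NMR spectroscopy",             # hypothetical type label
        inchikey="UHOVQNZJYSORNB-UHFFFAOYSA-N",  # benzene
        article_doi="10.5555/example-article",    # placeholder DOI
    )
    print(json.dumps(record, indent=2))
```

The free-text `resourceType` is where a generic schema runs out of expressiveness: nothing constrains the label to a shared vocabulary, which is one of the gaps the session intends to examine.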
This session aims to explore the metadata descriptions needed to ensure that deposited chemical data can be fully discovered and robustly reused across disciplines and use cases.
We will explore specific needs and challenges relating to the description of chemistry datasets, including:
The vocabularies needed to characterise disciplines and sub-domains relevant to a particular chemistry dataset
How to accurately represent the specific type of a chemistry dataset
How to describe chemical substances relevant to a chemistry dataset
Deficiencies in general-purpose metadata schemas that prevent us from describing chemistry datasets to the level of detail required
For each of these, we will address the following questions:
To what extent do existing scientific standards and motifs provide a basis for addressing our needs and how might these need to evolve?
How do we ensure that the technical community is aware of existing standards that are applicable to the development of new infrastructure?
Are new standards or recommendations, chemistry-related or otherwise, required to ensure the consistency and coherence of description across diverse data resources?
How do we engage subject experts in helping to fill scientific gaps in both general and domain-specific repositories?
We will also consider more general concerns such as:
The level of description appropriate to include in disciplinary vs general data and metadata repositories and registries
How to effectively group together different data objects required to reproduce published results when these may be stored in different repositories
Whether new services are required to enable the discovery, reuse and interoperability of chemistry datasets across resources
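To make the question of grouping data objects across repositories more concrete, the Python sketch below groups datasets held in different repositories by the article they supplement, using the DataCite `IsSupplementTo` relation type and the `publisher` property. It is a minimal sketch under stated assumptions: the records, DOIs and repository names are placeholders, and `group_by_article` is a hypothetical helper, not an existing service.

```python
from collections import defaultdict

# Placeholder DataCite-style records: every DOI and repository name is
# illustrative. Each dataset declares the article it supplements.
RECORDS = [
    {"doi": "10.5555/example-nmr", "publisher": "Repository A",
     "relatedIdentifiers": [{"relatedIdentifier": "10.5555/article-1",
                             "relatedIdentifierType": "DOI",
                             "relationType": "IsSupplementTo"}]},
    {"doi": "10.5555/example-xrd", "publisher": "Repository B",
     "relatedIdentifiers": [{"relatedIdentifier": "10.5555/article-1",
                             "relatedIdentifierType": "DOI",
                             "relationType": "IsSupplementTo"}]},
    {"doi": "10.5555/example-ms", "publisher": "Repository C",
     "relatedIdentifiers": [{"relatedIdentifier": "10.5555/article-2",
                             "relatedIdentifierType": "DOI",
                             "relationType": "IsSupplementTo"}]},
]


def group_by_article(records):
    """Map each article DOI to the (dataset DOI, repository) pairs supporting it."""
    groups = defaultdict(list)
    for record in records:
        for rel in record.get("relatedIdentifiers", []):
            # Only follow DOI links stating "this dataset supplements X".
            if (rel.get("relationType") == "IsSupplementTo"
                    and rel.get("relatedIdentifierType") == "DOI"):
                groups[rel["relatedIdentifier"]].append(
                    (record["doi"], record["publisher"]))
    return dict(groups)


if __name__ == "__main__":
    for article, datasets in group_by_article(RECORDS).items():
        print(article)
        for doi, repository in datasets:
            print(f"  {doi} ({repository})")
```

Grouping of this kind only works if every depositor declares the relation consistently and with the same relation type, which is exactly the cross-resource consistency question raised above.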