Version 0.9, 9/17/2017
Name of Proposed Interest Group: Physical Samples and Collections in the Research Data Ecosystem
Introduction (A brief articulation of what issues the IG will address, how this IG is aligned with the RDA mission, and how this IG would be a value-added contribution to the RDA community):
Physical samples are a basic element for reference, study, and experimentation in research. Because samples represent a wider population or a larger context, tests and analyses are conducted directly on them, for example on biological specimens, rock or mineral specimens, soil or sediment cores, plants and seeds, water quality samples, archeological artefacts, or DNA and human tissue samples. Other physical objects, such as maps or analog images, are also direct objects of study and, if digitized, may become a source of digital data. There is an urgent need to better integrate these physical objects into the digital research data ecosystem, in both a global and an interdisciplinary context, to support search, retrieval, analysis, reuse, preservation, and scientific reproducibility. This group aims to facilitate cross-domain exchange and convergence on key issues related to the digital representation of physical samples and collections. These issues include, but are not limited to: the use of globally unique and persistent identifiers for samples to support unambiguous citation and linking of information in distributed data systems and with publications; metadata standards for documenting samples and collections and for landing pages; access policies; and best practices for sample and collection catalogs, covering a broad range of issues from interoperability to persistence.
A growing community of stakeholders, comprising domain scientists, collection curators, information scientists, and data managers, all working at the interface with computational science, is developing detailed practices and standards for identifiers, vocabularies, and software interfaces, which are necessary for their wider application across the community. Publishers and funders are additional stakeholders interested in best practices for sample citation and for registering sample metadata in online catalogs, which are fundamental to the reproducibility of sample-based data and to the future use of valuable collection specimens. Currently, these efforts are fragmented, as is the communication of technical solutions and organizational best practices. This IG will support cross-disciplinary and international dialogue, helping to build technical and social bridges among a broad range of stakeholders in order to align and coordinate ongoing efforts, strengthen solutions, and broaden their adoption.
Birds of a Feather (BoF) sessions held at RDA Plenary 4 (P4) and Plenary 6 (P6) have already gathered an international and multidisciplinary group of stakeholders. Participants in the P6 BoF reviewed a preliminary case statement, and their feedback informed the current version.
User scenario(s) or use case(s) the IG wishes to address (what triggered the desire for this IG in the first place):
Best practices, standards, and infrastructure are needed to properly link physical samples and collections to the digital data generated by their study or to features in the real world. Samples need to be cited in publications with globally unique, persistent, and resolvable identifiers to ensure that they can be unambiguously linked to online metadata profiles (landing pages) and to data generated by other studies of the same sample. Scientists want to search the entire literature for data about a given sample. This is now possible because sample PIDs can be included as related identifiers in the DOI metadata of publications and datasets, where they can be harvested and searched through systems such as Scholix. Scientists also want to find out where a given sample can be accessed in order to reproduce data or add new measurements to the available knowledge about it. Both the approaches to, and the maturity of, technical and organizational solutions and infrastructure differ across the many disciplines that work with physical samples. These diverse and uncoordinated practices make it difficult to advance the adoption of best practices that link physical samples to the digital research data ecosystem. Further, commercial providers of museum and collection catalog software, as well as publishers, are reluctant to implement best practices that differ and are incompatible across domains.
Objectives (A specific set of focus areas for discussion, including use cases that pointed to the need for the IG in the first place. Articulate how this group is different from other current activities inside or outside of RDA.):
RDA brings together a multidisciplinary, international community engaged in research data management, which offers a unique opportunity to achieve the goals of this IG. The objectives of this IG are to:
- Identify commonalities and differences across stakeholders and establish prioritized action items that are appropriate for Working Groups. Relevant issues include: unique sample identifiers; sample documentation, including vocabularies, taxonomies, and alignment with international metadata standards; sample registration and interoperability of digital online catalogs; policies for sample citation in publications; and access to samples and sample metadata.
- Identify and characterize existing systems and solutions relevant to linking physical samples with digital research data; identify gaps and challenges.
- Facilitate international cooperation to develop harmonized approaches and best practices for physical object identification and digital curation; support the development of object and sample identification infrastructure at both the national and international levels.
- Build linkages between object repositories and museums, digital data repositories, scientific publications, museum software providers, and science communities.
Participation (Address which communities will be involved, what skills or knowledge should they have, and how will you engage these communities. Also address how this group proposes to coordinate its activity with relevant related groups.):
Communities that will be involved in this IG range from museum and collection curators, research data managers, and researchers in the domain, information, and computer sciences to publishers and funders. Several workshops held over the last few years have brought together stakeholders interested in the topic of Physical Samples in the Digital Research Ecosystem, including:
- Linking Environmental Data and Samples, CSIRO, Australia, May 2017
- Physical Samples and Digital Collections, iConference, China, March 2017
- Physical Samples, Digital Collections, ASIS&T Conference, Denmark, October 2016
At the previous two BoF sessions at RDA P4 and P6, the following communities were represented:
- French science archive
- Australian meteorology - water quality sampling and provenance
- German Research Center, library
- European PID network
- Geological Society
- Kew Gardens
- CDL - neurobiology, Berkeley museum
- Agricultural research in Italy, soil samples
- Zoology and environmental science
- Provenance and workflows, biodiversity workflows
- Material science, Air Force
- Natural History Museums
- National Repositories
We will facilitate workshops at ASIS&T, JCDL, SPNHC, and domain-specific conferences such as AGU for the Earth and Space Sciences to broaden participation and dissemination of outcomes.
We will work with the following organizations to engage relevant communities.
- International Geo Sample Number (IGSN) e.V. (Kerstin Lehnert, Jens Klump; http://www.igsn.org): global implementation organization for unique sample identifiers, with members on five continents
- Global Biodiversity Information Facility (Donald Hobern)
- Biodiversity Information Standards (TDWG, formerly the Taxonomic Databases Working Group; John Wieczorek, http://www.tdwg.org)
- DiSSCo (Distributed System of Scientific Collections, http://dissco.eu)
- AuScope (Lesley Wyborn, http://www.auscope.org.au/)
- EPOS (Kirsten Elger, https://www.epos-ip.org/)
- SPNHC (Society for the Preservation of Natural History Collections, http://www.spnhc.org)
- DataCite (https://www.datacite.org)
- CODATA Task Group on Coordinating Data Standards amongst Scientific Unions (Marshall Ma, http://www.codata.org/task-groups/coordinating-data-standards)
- ESIP (Earth Science Information Partners) (Erin Robinson, http://www.esipfed.org)
- Scientific Collections International (SciColl, http://scicoll.org)
Related RDA groups
● WG/IG Chairs
Outcomes (Discuss what the IG intends to accomplish. Include examples of WG topics or supporting IG-level outputs that might lead to WGs later on.):
- A report that synthesizes existing best practices for digital curation and sharing of physical samples from disparate disciplines and institutions.
- A journal special volume on sample and collection management in the research data ecosystem (journal TBD).
- Creation of RDA Working Groups to develop recommendations for best practices and standards related to unique sample identifiers, sample metadata, and sample citation, so that samples can be linked to the data and publications derived from them.
- Joint sessions with other RDA groups, such as the Biodiversity Data Integration IG, Long Tail of Research Data IG, PID IG, Research Data Provenance IG, and others as appropriate, to exchange knowledge, align with relevant emerging standards, and promote recommendations from the IG.
- Facilitation of collaborations that advance interoperability among collection catalogs, sample registries, data repositories, and publications to improve data sharing across disparate disciplines, for example through alignment of sample metadata with existing metadata standards.
Mechanism (Describe how often your group will meet and how will you maintain momentum between Plenaries.):
- The primary communication mechanism will be email, supplemented by quarterly web conference meetings and by sessions at RDA Plenary meetings.
- We will also leverage other meetings, such as EGU, AGU, SciDataCon, and ESIP.
- Knowledge gathering and capture will take place via the RDA IG website. We may use other collaboration tools as appropriate, e.g., wikis, GitHub, or the Center for Open Science's Open Science Framework.
Timeline (Describe draft milestones and goals for the first 12 months):
September 2017 - P10 session: BoF session, presentation of Case Statement
December 2017 - AGU meeting and progress report
March 2018 - P11 session, evaluate progress, revisit workplan
September 2018 - P12
Potential Group Members (Include proposed chairs/initial leadership and all members who have expressed interest): Bold indicates co-chairs
The following are additional potential participants who attended the previous BoF sessions at P4 and P6: