Cultural Heritage Science Data at the Crossroads


27 Nov 2019


Submitted by Fenella France


Meeting objectives: 

Cultural heritage data sits at the convergence of historical and cross-cultural humanities, social science, and physical science data. Further, heritage science [1] research data is inherently multi-disciplinary, involving scientists from a diverse range of fields, including chemistry, physics, materials science, engineering, and archaeology. With increasing preservation challenges for both moveable collections and heritage sites, the data and information collected need to be collated and shared with an ever-wider range of colleagues, not infrequently with expertise in previously unrelated fields. For instance, it is more critical than ever that sites and environments are closely monitored, with changes tracked over time, which is itself an important component of many scientific research projects. Fusing these data sets requires a common terminology to allow event-based coordination of research activities, and the ability to crosswalk between appropriate ontologies and schemas.

 

To align with the mission of RDA we consider this data initiative extremely relevant to real-world data issues, especially in relation to the destruction and damage of cultural heritage materials worldwide. As part of our engagement with the Council on Library and Information Resources (CLIR) we are looking to use this infrastructure to address heritage preservation challenges through truly global, linked and shared heritage science data. CLIR has a strong global focus on the advancement of knowledge not bound by national lines, since the challenges that climate change poses to cultural heritage are likewise not bound by nations (https://www.clir.org/global/). CLIR’s current response is an evolving initiative, Pangia: an open, interoperable, advanced quantitative environment that will preserve and make reusable digitized cultural and scientific knowledge, essential to addressing the climate crisis. Our CH data platform is integrally linked with this initiative. Pangia is in partnership with Stanford University and currently in formal discussion with Europeana (Europeana.eu, an initiative of the European Union for sharing heritage collection data and digital tools). We consider this data challenge to have real-world impact, and the BoF will enable us to form a strong WG, networking across science and humanities fields to share data and technical solutions.

 

We are aware of numerous RDA IGs sharing and discussing solutions and challenges relevant to the cultural heritage sphere. Our hope is that interested parties can be drawn together to form a WG to address the increasingly critical need for a baseline level of data sharing, and to establish this through mutual agreement and published guidelines for the integration of data. We have made some headway independently, and would like to demonstrate what we have been working on for discussion and to open up the subject. However, our work in isolation cannot address the need for international, cross-institution, data interoperability.

 

IGs we have coordinated with include the Libraries for Research Data IG, the Research Data Architectures in Research Institutions IG, the Chemistry Research Data IG, the RDA/CODATA Materials Data, Infrastructure & Interoperability IG, and the Physical Samples and Collections in the Research Data Ecosystem IG. There is interest in various overlap areas from colleagues including data scientists, chemists, materials scientists, earth scientists, life scientists, archaeologists, anthropologists, computer and information scientists, and collection and museum curators. We will be setting up conference calls early in the New Year to discuss specific involvement and provide further information for group members.

 

Most disciplines use similar instruments and techniques, so access to non-proprietary file formats and a data model/structure that allows dataset fusion would be of great benefit for collaborations, especially at the international level. The STEM community could make scientific data and datasets available to a broader user base, and share relevant data between disciplines more readily, through structured linked data[2] that reuses existing vocabularies[3] and models[4], rather than having each institution create new structures that meet only its own needs, too specifically, and in ways that prove more or less incompatible with those of other institutions. The United States and Europe have initiatives to move forward with shared open data, especially with regard to broader and easier dissemination for diverse user groups: scientists, researchers, academia, government, and the public.
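To illustrate why reusing a shared vocabulary eases dataset fusion, here is a minimal sketch. It assumes two hypothetical institutions that label the same analytical techniques differently, and a small hand-written crosswalk onto shared terms. The labels, the record layout, and the helper names `normalize_technique` and `fuse` are all illustrative assumptions, not part of any existing standard or of the project's own tooling.

```python
# Hypothetical crosswalk from institution-specific labels to shared terms.
# A real crosswalk would point at terms in an agreed, published vocabulary.
CROSSWALK = {
    "x-ray fluorescence": "XRF",
    "xrf spectroscopy": "XRF",
    "ft-ir": "FTIR",
    "fourier transform infrared spectrometry": "FTIR",
    "fiber optic reflectance spectrometry": "FORS",
}


def normalize_technique(label):
    """Return the shared term for a local label, or None if it is unmapped."""
    return CROSSWALK.get(label.strip().lower())


def fuse(*sources):
    """Group records from several (name, records) sources by shared term."""
    fused = {}
    unmapped = []
    for name, records in sources:
        for rec in records:
            term = normalize_technique(rec["technique"])
            if term is None:
                # Keep track of labels the crosswalk does not yet cover.
                unmapped.append((name, rec["technique"]))
                continue
            fused.setdefault(term, []).append((name, rec["object"]))
    return fused, unmapped


# Illustrative records only; not real collection data.
library_a = [
    {"object": "vol-001", "technique": "X-ray fluorescence"},
    {"object": "vol-002", "technique": "FT-IR"},
]
museum_b = [
    {"object": "ms-17", "technique": "XRF spectroscopy"},
    {"object": "ms-18", "technique": "raking light"},
]

fused, unmapped = fuse(("library_a", library_a), ("museum_b", museum_b))
print(fused)
print(unmapped)
```

Records that share a term can then be queried together, while the unmapped list shows where the crosswalk still needs extending.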

 

The purpose of this proposed BoF meeting is to address the increasingly critical need for a baseline level of data sharing, and to establish this through mutual agreement and published guidelines for the integration of data. To achieve this aim we will explore what could be used as the top 10 ontologies for scientific data terminology, accepted file formats and structure, and required high-level metadata. Further, we aim to understand which other RDA STEM/STEAM group projects on shared LOD authoritative sources for instrumentation and analytical techniques are underway, so that we can make use of them, contribute to them, and integrate them with heritage science data. Some of these groups include the IG on Digital Practices in History and Ethnography, the IG on International Indigenous Data Sovereignty, the IG on Physical Samples and Collections in the Research Data Ecosystem, the IG on Social Science Research, the WG on Empirical Humanities Metadata, and the WG on Metadata Standards for Attribution of Physical and Digital Collections. Discussions with data, informatics, and library colleagues at the National Institute of Standards and Technology (NIST) and other institutions have indicated significant crossover between research data within chemistry, archaeology, materials science, physics and other fields. The scope of the discussion around heritage science will be the current capabilities within STEM disciplines for true linked open data that provides authoritative sources, in order to determine how this may already be represented or developed within existing RDA interest and working groups. Analytical and instrumentation data types include spectral imaging, x-ray fluorescence (XRF), gas chromatography-mass spectrometry (GC-MS), fiber optic reflectance spectrometry (FORS), Fourier transform infrared spectrometry (FTIR), and size exclusion chromatography (SEC), to name a few.

 

Meeting objectives: Discussion of what a baseline level of data sharing would look like, and what guidelines would be needed to begin that integration of data.

  • Discuss the current heritage science data initiative to assess the objective analysis of 3000 of the same volumes, instrumental sources, the challenges of access to active datasets, and avoiding creating new terminology.
  • Discussion of how to move forward in collaboration with RDA partners on the creation of a Working Group
  • Discussion of advantages, issues, challenges and opportunities
  • Current heritage science and STEM/STEAM vocabularies, terminologies
  • Identification of interested and additional partners to expand and engage a broader and more diverse audience (such as small and lower-funded institutions, or those in economically disadvantaged nations and regions)
  • Identification of interested and additional partners for developing an expandable base model suitable for describing cultural heritage scientific data alongside humanities data: discussion will include baseline guidelines with minimal barriers to entry to encourage adoption
Meeting agenda: 

The target audience of this meeting includes other scientific discipline interest groups, cultural heritage members, libraries and archives, those working in data science, reference collections, metadata standards, and ontologies, research data repository or infrastructure developers and providers, and all who are interested in integrated datasets and data fusion between related scientific disciplines. A declared goal of this session is to establish an RDA working group and learn from colleagues in related scientific disciplines who are facing similar issues in linking and sharing data.

  1. Presentation and demonstration of the data modelling for the “Assessing the Physical Condition of the National Collection” research project with active data integration (20 minutes)
  2. Overview of the CLIR Pangia global data sharing initiative (https://www.clir.org/?s=pangia) (5 minutes)
  3. Update on the current status of the European Research Infrastructure for Heritage Science – Digilab (5 minutes)
  4. Review of commonly used research analytical techniques and instrumentation. For heritage data these include spectral imaging, microscopy, x-ray fluorescence (XRF), gas chromatography-mass spectrometry (GC-MS), fiber optic reflectance spectrometry (FORS), and Fourier transform infrared spectrometry (FTIR)
    Presenters: Fenella France* ***, Andrew Forsberg*, John Henry Scott** (* Library of Congress (LC); ** National Institute of Standards and Technology (NIST); *** Council on Library and Information Resources (CLIR))
  5. Overview of the objectives for moving forward and discussion of how to engage with current interest groups (discussion, all participants) (40 minutes)
    Other international partners include the Council on Library and Information Resources (CLIR), the International Image Interoperability Framework (IIIF), Stanford University, the European Research Infrastructure for Heritage Science (E-RIHS)[1] (National Research Council (CNI), Florence, Italy), and the National Gallery, London.
  6. Identification of other potential group members (all participants) (5 minutes)
  7. Summary of the results, actions, and identification of contributions of the group members (TBD) (25 minutes)

The intent of this BoF is to develop a working group to identify specific components of heritage science data that overlap with and can enhance other current RDA WG initiatives.

The minutes of the meeting will be published at the latest one week after the session as an attachment to the session’s web page.

Type of Meeting: 
Informative meeting
Short introduction describing any previous activities: 

Our approach is to assemble a pragmatic selection of use-case-driven solutions for real-world scenarios, one that privileges ‘recipes’ with minimal barriers to entry. To this end, we are currently working on a pilot series of LOD design patterns/recipes for the Mellon-funded ‘Assessing the Physical Condition of the National Collection’[5] (APCNC) project. As is the norm for the cultural heritage domain, APCNC’s data, and its producers and consumers, bridge numerous traditional disciplinary borders. We need to integrate and query data from some quite standard sources, such as online cataloging records (OCLC[6]), but the overwhelming majority of it comes from more disparate sources, such as:

  • the actual publication details collected from the partner institutions’ physical books;
  • quantitative and qualitative book, binding, and paper descriptions and condition assessments; and,
  • data and metadata for a wide array of scientific procedures, instruments, and analyses.

We are using CouchDB[7] as a JSON[8] document store so that we can interrogate heterogeneous datasets for trends and correlations. However, a primary goal is to publish (via an API) the same data serialized as JSON-LD (and Turtle, etc.) for sharing publicly, and for consumption by other internal tools and services. We could use a triple store, but would prefer that it not be a requirement.
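As a minimal sketch of that publishing step, a plain JSON document of the kind held in CouchDB can be exposed as JSON-LD by attaching an `@context` and an `@id`. Everything specific here is an assumption for illustration: the field names, the example.org IRIs, the sample document, and the `to_json_ld` helper are hypothetical and do not describe the project's actual schema or API.

```python
import json

# Hypothetical context mapping local field names to vocabulary IRIs.
# The example.org IRIs are placeholders, not terms from a real ontology.
CONTEXT = {
    "technique": "https://example.org/vocab/technique",
    "instrument": "https://example.org/vocab/instrument",
    "measuredOn": "https://example.org/vocab/measuredOn",
}


def to_json_ld(doc, base_iri="https://example.org/measurement/"):
    """Return a JSON-LD view of a CouchDB-style document."""
    # CouchDB keeps its own bookkeeping fields (such as _id and _rev)
    # under keys starting with an underscore; leave them out of the body.
    body = {key: value for key, value in doc.items() if not key.startswith("_")}
    return {"@context": CONTEXT, "@id": base_iri + doc["_id"], **body}


# Illustrative document only; not real project data.
couch_doc = {
    "_id": "m-0001",
    "_rev": "1-abc",
    "technique": "FTIR",
    "instrument": "hypothetical-spectrometer-01",
    "measuredOn": "volume-0042",
}

json_ld_doc = to_json_ld(couch_doc)
print(json.dumps(json_ld_doc, indent=2))
```

Because the stored document stays plain JSON and the context is applied only on output, this style of serialization does not by itself require a triple store.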


[1] https://en.wikipedia.org/wiki/Heritage_science Heritage science is cross-disciplinary scientific research of cultural heritage. It is the application of science and technology to heritage to improve understanding, engagement and its long-term management.

[2] https://www.w3.org/TR/ld-glossary/#x5-star-linked-open-data

[3] https://en.wikipedia.org/wiki/Controlled_vocabulary Vocabulary terms allow LOD to clearly identify exactly what data refers to in a way that both people and computers can interact with. See also https://www.w3.org/TR/ld-glossary/#vocabulary and https://www.w3.org/TR/ld-glossary/#ontology

[4] https://en.wikipedia.org/wiki/Data_model LOD models define how the data is organized in a standardized way.

[5] https://nationalbookcollection.org/

[6] https://www.oclc.org/en/home.html and https://www.worldcat.org/

[7] Apache CouchDB: https://couchdb.apache.org/

[8] JSON: https://www.w3.org/TR/ld-glossary/#json, JSON-LD: https://www.w3.org/TR/ld-glossary/#json-ld

 

Remote participation availability (only for physical Plenaries): 
Yes