RDA 9th Plenary Joint meeting: IG Data in Context, IG Metadata

Evolving Metadata and the Metadata Lifecycle

Groups: IG Data in Context, IG Metadata

As the amount of data grows massively, metadata have gained increasing importance. Yet the value of metadata itself is not always appropriately acknowledged, notions of metadata differ, and what counts as data and what as metadata also varies. The curation of metadata is expected to improve information quality and reliability, and to help manage access to, and thus the openness of, the underlying data. It is increasingly clear that obtaining an overview of data collections in the age of “big data” is crucial in any advanced information management environment. The Data in Context Interest Group focuses on “contextual metadata”.

The Metadata IG concerns itself with all aspects of metadata for research data. In particular, it coordinates the efforts of the WGs concerned with metadata to produce a coherent approach that covers the metadata modalities of description, restriction, navigation, provenance, and preservation, and the use of metadata for the purposes of discovery, contextualization, validation, analytical processing, simulation, visualization, and interoperation. It also liaises with other WGs, such as Data Foundation and Terminology, PIDs, Standardization of data categories and codes, and Data Citation. The activities of this IG relate to the data management policies and plans of research organizations and researchers, and to the policies and standards of research funders and research communities, which may or may not be official standards.

Data in Context Interest Group: https://www.rd-alliance.org/groups/data-context-ig.html
Metadata Interest Group: https://www.rd-alliance.org/groups/metadata-ig.html

In the near future, emerging collaborative electronic technologies will require reliable, incorruptible, authenticated, secure, and encrypted tracking of all facets of data and software management, analytical pipelines, network and hardware use, and versioning. See references 1–5 for examples of distributed models for information storage, curation, and access in health care, technology, and science. A series of forums (e.g., references 6–8) are currently discussing the technological and societal implementation of this idea: how can universities, funders, repositories, publishers, and others create meaningful networked knowledge objects that can live on these distributed networks?

The intent of this session is to look at the metadata management and curation processes needed to ensure the usability and comprehensibility of such data, both within and between disciplines. What metadata are needed to enable the discovery and reuse of data, both across domains and across silos within domains? What tools are available for creating and improving metadata that can be integrated into workflows to incorporate good data practices?

In particular, this session will focus on what happens after research data are shared or published. We want to explore using metadata to capture a dataset’s ‘microprovenance’: the concept that a ‘trail’ of data use, citation, and modification is stored as an evolving metadata object that is appended to a data file or data collection. There are parallels here with recent applications of blockchain technology (references 9–12). These may serve as useful examples, because we want to enable the distributed and, where possible, automated creation of provenance trails.
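To make the microprovenance idea more concrete, the following is a minimal sketch, not a proposed standard: it assumes a trail is an append-only list of events (use, citation, modification) in which each event stores a SHA-256 hash of the previous one, so that any later edit to an earlier entry becomes detectable. The class `ProvenanceTrail`, its fields, and the dataset and agent identifiers are all hypothetical. This is only a hash chain borrowed from the blockchain idea; it has no distribution or consensus mechanism.

```python
import hashlib
import json
from dataclasses import dataclass, field

GENESIS = "0" * 64  # placeholder "previous hash" for the first event


def _digest(payload: dict) -> str:
    """SHA-256 over a canonical (sorted-key) JSON encoding of the payload."""
    encoded = json.dumps(payload, sort_keys=True, separators=(",", ":")).encode("utf-8")
    return hashlib.sha256(encoded).hexdigest()


@dataclass
class ProvenanceTrail:
    """Hypothetical append-only microprovenance trail for one dataset."""
    dataset_id: str
    events: list = field(default_factory=list)

    def append(self, action: str, agent: str, detail: str = "") -> dict:
        # Each new event records the hash of the previous event, chaining them.
        prev_hash = self.events[-1]["hash"] if self.events else GENESIS
        body = {
            "dataset_id": self.dataset_id,
            "seq": len(self.events),
            "action": action,  # e.g. "use", "citation", "modification"
            "agent": agent,
            "detail": detail,
            "prev_hash": prev_hash,
        }
        event = dict(body, hash=_digest(body))
        self.events.append(event)
        return event

    def verify(self) -> bool:
        """Recompute every hash; editing any earlier event breaks the chain."""
        prev_hash = GENESIS
        for seq, event in enumerate(self.events):
            body = {k: v for k, v in event.items() if k != "hash"}
            if body["seq"] != seq or body["prev_hash"] != prev_hash:
                return False
            if _digest(body) != event["hash"]:
                return False
            prev_hash = event["hash"]
        return True


if __name__ == "__main__":
    trail = ProvenanceTrail("hypothetical-dataset-001")
    trail.append("use", "analysis-pipeline-A")
    trail.append("citation", "article-X", "cited in a journal article")
    trail.append("modification", "curator-B", "corrected units in column 3")
    print(trail.verify())  # True
    trail.events[1]["agent"] = "someone-else"  # tamper with an earlier entry
    print(trail.verify())  # False
```

In a real deployment the events would more likely be expressed in an existing provenance vocabulary and stored alongside the data file or collection; the hash chain only illustrates how an evolving trail can be made tamper-evident.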

In this workshop, we aim to do three things:

  1. Clearly describe the goal of the Interest Group and achieve buy-in from participants towards this goal;
  2. Make a start toward collecting a series of practical use cases, where evolving provenance trails could help solve a real-world problem of data repositories, data creators, funders, and publishers;
  3. Divide up the work for the IG based on this goal and these use cases.

Meeting facilitators: Anita de Waard, Rebecca Koskela, and Mark Musen

  • Welcome and Tour de table (10 min)
  • Introduction of Metadata Lifecycle (5 min)
  • Center for Expanded Data Annotation and Retrieval (Mark Musen, Stanford University) (10 min)
  • Dataverse (Mercè Crosas, Harvard University) (10 min)
  • National Institute of Standards and Technology (Robert Hanisch, Office of Data and Informatics) (10 min)
  • Publishers (Anita de Waard, Elsevier) (10 min)
  • Discussion (30 min)
  • Next Steps (5 min)


The target audience includes anyone interested in the meeting objectives: researchers from various domains interested in reusing data or making their data reusable, representatives of research infrastructures, publishers, and funders.


  1. Open, distributed repositories, e.g., the Earth System Grid Federation: “ESGF P2P is a component architecture expressly designed to handle large-scale data management for worldwide distribution”: https://esgf.llnl.gov/
  2. iRODS (Integrated Rule-Oriented Data System) technology, which enables automated enforcement of policies and provides a reporting infrastructure for policy-based data management, e.g., see http://www.americanlaboratory.com/172718-Policy-Based-Data-Management-Th...
  3. Federated systems for health care, article in Nature Biotechnology: http://www.nature.com/nbt/journal/v33/n4/full/nbt.3180.html
  4. Distributing the nodes where scientific insight is generated: https://www.computer.org/cms/Computer.org/ComputingNow/issues/2016/09/me...
  5. Pop up labs: http://www.popup-labs.com/
  6. NITRD Workshop on “Measuring the Impact of Digital Repositories”: https://www.nitrd.gov/nitrdgroups/index.php?title=DigitalRepositories
  7. Meeting and draft White Paper on “TOKeN: The Open Knowledge Network: Creating the Semantic Information Infrastructure for the Future”, to define an architecture for semantic, open knowledge networks involving terascale networks and uncertainty/evidence bases/semantics, involving Cyc, Google, and Amazon: https://docs.google.com/document/d/1KQaBKCo_e9ku5udZLxThhi8vS_fU8m9ne9Ac...
  8. Imagining Tomorrow’s University, connecting institutional leaders with early-career and Open Science users and open science advocates to address the question: how does Open Science impact institutions?: http://www.ncsa.illinois.edu/Conferences/ImagineU/
  9. Blockchain technology for shipping and tracking goods, article in The New York Times: https://www.nytimes.com/2017/03/04/business/dealbook/blockchain-ibm-bitc...
  10. Using blockchains for storing patient data, article in Wired: https://www.wired.com/2017/02/moving-patient-data-messy-blockchain-help/
  11. Another article on blockchains, in Wired: https://www.wired.com/2017/03/google-deepminds-untrendy-blockchain-play-...
  12. Blog post on blockchain and health care: https://www.constellationr.com/blog-news/blockchain-healthcare-and-leadi...


Paul N. Edwards, Matthew S. Mayernik, Archer L. Batcheller, Geoffrey C. Bowker, and Christine L. Borgman. Science friction: Data, metadata, and collaboration. Social Studies of Science 41(5): 667, 2011. Originally published online 15 August 2011. DOI: 10.1177/0306312711413314. Online version: http://sss.sagepub.com/content/41/5/667

Anita de Waard and Joost Kircz. Metadata in Science Publishing. 2003. http://wwwis.win.tue.nl/infwet03/proceedings/8/


