Metadata Interest Group
Chairs: Keith Jeffery & Rebecca Koskela
A 90-minute breakout session was organised at RDA Plenary 3 in Dublin. More than 50 attendees took part, representing many countries and domain interests, from astronomy and particle physics through bio and materials sciences and physical sciences to social sciences and humanities. Librarians and archivists also attended.
Keith Jeffery introduced the work to date. There was extensive discussion over the uses of metadata leading to agreement on the following major uses:
- discover the dataset(s) of interest;
- evaluate them for suitability for the intended purpose;
- understand them in context (i.e. the dataset related to projects, persons, organisations, facilities, equipment, publications…);
- allow interoperation in the sense of a homogeneous query over heterogeneous datasets.
There was general agreement that there are many metadata ‘standards’ (de facto or de jure) and that there was a need to interoperate metadata from the same and different research domains.
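To illustrate what a homogeneous query over heterogeneous metadata could look like, the following is a minimal sketch rather than anything presented at the session. It assumes two invented schemas (`schema_a`, `schema_b`) and a hand-written crosswalk to a small set of common field names; all names and records are hypothetical.

```python
# Hypothetical crosswalks: native field name -> common field name.
CROSSWALKS = {
    "schema_a": {"title": "title", "creator": "creator", "subject": "keywords"},
    "schema_b": {"name": "title", "author": "creator", "tags": "keywords"},
}


def to_common(record, schema):
    """Translate a native metadata record into the common field names."""
    mapping = CROSSWALKS[schema]
    return {mapping[k]: v for k, v in record.items() if k in mapping}


def query(catalogue, **criteria):
    """Run one query over records that are held in different schemas."""
    hits = []
    for schema, record in catalogue:
        common = to_common(record, schema)
        if all(common.get(field) == value for field, value in criteria.items()):
            hits.append(common)
    return hits


# Invented example records, each described in its own schema.
catalogue = [
    ("schema_a", {"title": "Galaxy survey", "creator": "A. Example", "subject": "astronomy"}),
    ("schema_b", {"name": "Household panel", "author": "B. Example", "tags": "social science"}),
    ("schema_b", {"name": "Star catalogue", "author": "C. Example", "tags": "astronomy"}),
]

results = query(catalogue, keywords="astronomy")
```

The caller asks one question in the common vocabulary and receives matches from both schemas; the cost is that each new standard needs its own crosswalk.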
A diagram was shown and discussed indicating the relationships between MIG (Metadata Interest Group), MSDWG (Metadata Standards Directory Working Group) and DICIG (Data In Context Interest Group). It was also noted that the Group on Provenance was relevant (especially for versions of datasets). There was agreement that for research datasets we need metadata at three levels (although the boundaries are not firmly defined): discovery level (covering the first use listed above), contextual level (covering the second and third uses) and detailed level (covering the fourth use). There was discussion on exactly where the boundaries lie and which kinds of metadata fall into which category. After the discussion there was general agreement on the three levels, though with some reservations and a desire to verify the model against use cases.
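The mapping between the four uses and the three levels can be written down as a small lookup. This is only a sketch of the model as recorded here; the identifiers (`Level`, `USE_TO_LEVEL`, `uses_for`) are invented for illustration, and the boundaries remain open as noted above.

```python
from enum import Enum


class Level(Enum):
    DISCOVERY = "discovery"
    CONTEXTUAL = "contextual"
    DETAILED = "detailed"


# The four agreed uses, in the order listed in this report,
# each assigned to the level said to cover it.
USE_TO_LEVEL = {
    "discover": Level.DISCOVERY,
    "evaluate": Level.CONTEXTUAL,
    "understand in context": Level.CONTEXTUAL,
    "interoperate": Level.DETAILED,
}


def uses_for(level):
    """Return the uses covered by a given metadata level."""
    return [use for use, lvl in USE_TO_LEVEL.items() if lvl is level]
```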
It was agreed that for cross-domain work (even within one research domain) we needed to find metadata down to the lowest level of commonality, below which metadata became specific to a domain, instrument, observation or even experiment / dataset. This lowest common level involved contextual metadata concerning entities / objects related to the dataset, from other datasets and publications to persons, organisations, projects, funding and related facilities and equipment.
Semantics were also discussed. It was agreed that we need to be able to handle multiple semantics related to the same dataset and multilinguality. This implies full ontology facilities. Various examples of implemented vocabularies in ontologies were discussed.
There was a useful discussion on the dataset lifecycle and the need for data management planning to ensure that relevant metadata are collected at each stage of the lifecycle. Planning for incremental collection of metadata (either automated or manual) was approved, since it lowers the barrier for researchers in providing metadata associated with a dataset.
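As a rough illustration of incremental collection, the sketch below accumulates metadata stage by stage and reports what is still outstanding. The stage names, field names and values are assumptions made for the example; they were not defined in the session.

```python
# Hypothetical lifecycle stages and the fields expected by the end of each.
STAGES = [
    ("planning", ["project", "funder"]),
    ("collection", ["instrument", "creator"]),
    ("publication", ["title", "licence"]),
]


def missing_fields(metadata, up_to_stage):
    """List fields expected by the given stage that are not yet recorded."""
    missing = []
    for stage, fields in STAGES:
        missing.extend(f for f in fields if f not in metadata)
        if stage == up_to_stage:
            break
    return missing


metadata = {}
metadata.update({"project": "P1", "funder": "F1"})  # entered manually at planning
metadata.update({"instrument": "sensor-7"})         # captured automatically
gaps = missing_fields(metadata, "collection")
```

Here `gaps` contains only `creator`, so the researcher is asked for one item at this stage rather than for a complete record at the end.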
There was a discussion on the use of triples in RDF for metadata or data with several participants expressing concerns over scalability.
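For readers unfamiliar with the triple model, the sketch below holds metadata as subject-predicate-object tuples in plain Python and answers a simple pattern query. It does not use an RDF library; the `dataset:` and `person:` identifiers and the records are invented, while `dc:title`, `dc:creator` and `foaf:name` follow the Dublin Core and FOAF vocabularies.

```python
# Each statement is one (subject, predicate, object) triple.
triples = {
    ("dataset:1", "dc:title", "Galaxy survey"),
    ("dataset:1", "dc:creator", "person:1"),
    ("dataset:2", "dc:creator", "person:1"),
    ("person:1", "foaf:name", "A. Example"),
}


def match(s=None, p=None, o=None):
    """Return triples matching the pattern; None acts as a wildcard."""
    return sorted(
        t for t in triples
        if (s is None or t[0] == s)
        and (p is None or t[1] == p)
        and (o is None or t[2] == o)
    )


# Which datasets were created by person:1?
datasets_by_person = [s for s, _, _ in match(p="dc:creator", o="person:1")]
```

Every call to `match` scans the whole set, which is workable for a handful of statements but gives a feel for why large triple collections depend on indexing.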
There was a general feeling that we should work towards a general architectural framework for metadata in RDA. The 3-level model is a starting point but needs verification over a range of domains and requirements. It was agreed that MIG, MSDWG and DICIG will organize an easy-to-use template for use cases to be collected via the website over the coming months.
Finally an outline programme of work was suggested and approved:
- Needs of other WGs/IGs
- Architectural principles
September 2014-March 2015
- Reference architecture
- Verify with other WGs/IGs
March 2015-September 2015
The group also agreed to continue working using the available means of communication:
- Meetings at Plenaries
- Email lists
- Virtual meetings between
The meeting concluded with the co-chairs thanking all the participants for their active engagement and encouraging them to register with RDA, if they had not already done so, and to join the group list to receive ongoing information.