Updated 15 January 2016 in response to TAB review
A concise articulation of what issues the WG will address within a 12-18 month time frame and what its “deliverables” or outcomes will be.
The Metadata Standards Catalog (MSC) Working Group will produce a catalog of metadata standards of relevance to research data. Specifically, the catalog system will consist of the following components:
- A set of records describing metadata standards
By ‘metadata standard’ we mean a defined metadata structure and format used within a community. Some metadata standards have formal standardization status but this is not necessarily an indication of quality or utilization.
- A user interface for submitting information, searching, browsing and displaying standards information
- A machine-to-machine interface (API) allowing automated tools to submit information, perform queries and retrieve information from the catalog
In this sense the catalog will be ‘machine readable’. We also intend that the information provided through the API will be structured in such a way that the recipient machine will be able to process and act upon it; in this sense the catalog will also be ‘machine actionable’.
This work builds on the outputs of the Metadata Standards Directory Working Group: both the Metadata Standards Directory (MSD) itself and the set of attendant use cases. The advances to be made by the MSC Working Group beyond that work are improvements to the data structure of the records, an improved user interface, and the addition of an API. The latter development will enable the MSC to participate in and provide services to any autonomic e-research fabric. While the base set of records for the MSC will be derived from those in the MSD, further information will be sought from discipline-specific standards directories and RDA working groups.
A specific description of who will benefit from the adoption or implementation of the WG outcomes and what tangible impacts should result.
Prediction is very difficult, especially if it's about the future. – Niels Bohr
The primary beneficiaries of this work – being a continuation of work performed by the MSD Working Group and the UK Digital Curation Centre – will be researchers, in the following ways:
- Early career researchers, or those unused to sharing their data, will be able to consult the MSC – whether directly or indirectly through other tools – and discover an appropriate standard to use when documenting their data. This will help them to comply with funder requirements for data management from the planning stages through to submission to an archive.
- Having a comprehensive catalogue of metadata standards will make it easier to prevent duplication of standards development effort (in areas where metadata standards already exist) and highlight gaps where standards development activity is needed.
- Following on from the above, it will therefore be easier for peer researchers to discover, validate and reuse datasets that have been shared, due to the expected metadata being in place.
The group recognizes that there exist catalogs of datasets, repositories, software and publications with limited metadata for each entry. Examples include the BioSharing initiative for bioscience and DataONE in environmental sciences, and beyond them many projects and initiatives hold some information on relevant datasets and the metadata associated with them. While these all contribute to the above benefits, in each case the information available covers only one or a few domains and, in the case of BioSharing, is intended for human rather than machine action. The MSC will address both these shortcomings with respect to metadata standards for research data.
More specific and tangible benefits and impacts of the MSC will be identified and targeted as part of the work plan. Use cases continue to be collected by the existing RDA metadata groups: the Metadata Interest Group (MIG), the Metadata Standards Directory Working Group (MSDWG), and the Data in Context Interest Group (DICIG). These, along with use cases to be collected directly by the MSC Working Group, will be used as a basis for the precise use cases to be addressed by the MSC and the information it will contain. The following are a few examples:
- Researchers and research support staff will be able to discover standards relevant to the researcher’s discipline.
- Researchers will be able to browse the catalog for standards in other disciplines that may be appropriate, perhaps to facilitate interdisciplinary or multidisciplinary work.
- Developers of tools such as DMPTool or DMPonline will be able to use information from the MSC, via the API, to suggest relevant standards to researchers.
- Systems will be able to look up, via the API, the converters available for unfamiliar standards, to reduce the friction of importing metadata from other systems.
- In the long term, the MSC may be able to provide information to enable repository software to provide tailored metadata input forms on-the-fly, or to generate a matching between standards that a developer could use as a starting point for a converter.
The medium term goal, however, is to analyze the metadata standards and use cases and generate – working with and through MIG – a set of proposed ‘packages’ of metadata elements for purposes drawn from the wider set of use cases. By ‘packages’ we mean groupings of metadata elements for particular purposes, such as discovery, contextualization, or detailed connection of software to data. For clarity, please note the following:
- Each element may not be a single-valued attribute but a structure.
- There are relationships between elements including those carrying referential and functional integrity.
- Elements may belong to more than one package.
Since the packages represent functions, one possible naming scheme would be ‘Function Package’, with a specific function represented as ‘Function Package: Discovery’. The packages described here differ from application profiles. Application profiles are a mapping from a given data structure to that required for a particular application (business purpose). These packages are generic. The packages so generated will be useful for those writing converters, designing systems and considering new standards. In the longer term they may form the basis of novel presentations of schemas and specifications for metadata standards.
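As a rough illustration of two of the points above (an element may be a structure, and an element may belong to more than one package), function packages could be modelled along the following lines. This is only a sketch in Python: the class names, element names and package contents are hypothetical and are not drawn from any MSC specification, and relationships between elements are left out for brevity.

```python
from dataclasses import dataclass, field


@dataclass(frozen=True)
class Element:
    """A metadata element; it may be a structure with sub-elements."""
    name: str
    parts: tuple = ()  # nested Elements; empty for a single-valued attribute


@dataclass
class FunctionPackage:
    """A grouping of elements serving one function, e.g. discovery."""
    function: str
    elements: list = field(default_factory=list)

    @property
    def label(self) -> str:
        # Follows the naming scheme suggested in the text.
        return f"Function Package: {self.function}"


def packages_containing(element, packages):
    """Return the labels of every package that includes the given element."""
    return [p.label for p in packages if element in p.elements]


# Hypothetical elements: 'creator' is a structure, 'title' is single-valued.
title = Element("title")
creator = Element("creator", parts=(Element("name"), Element("identifier")))
instrument = Element("instrument")

discovery = FunctionPackage("Discovery", [title, creator])
context = FunctionPackage("Contextualization", [creator, instrument])

# 'creator' appears in both packages, 'title' only in Discovery.
shared = packages_containing(creator, [discovery, context])
```

Keeping packages as plain groupings of shared element objects, rather than copying elements into each package, is one way to let the same element serve several functions without duplication.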
The information that the catalog will need to hold about each standard will depend on the use cases chosen for the MSC, but we anticipate it will include information on the schema/specification, converters available, associated vocabularies (on which collaboration with the Vocabulary Services Interest Group would be useful), associated tools, examples of services with expertise in the standard, the version history of the standard, and the provenance (source, date last updated, etc.) of the records themselves.
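To make the anticipated record content more concrete, the following Python sketch shows one way such a record might be structured, together with two simple queries of the kind mentioned in the example use cases (finding standards for a discipline and looking up available converters). All field names, function names and sample values are assumptions made for illustration; the actual data model will depend on the use cases and requirements still to be defined.

```python
from dataclasses import dataclass, field, asdict
import json


@dataclass
class StandardRecord:
    """One catalog record; every field name here is an assumption."""
    name: str
    disciplines: list
    specification_url: str = ""
    converters: dict = field(default_factory=dict)  # target standard -> tool
    vocabularies: list = field(default_factory=list)
    tools: list = field(default_factory=list)
    services: list = field(default_factory=list)    # services with expertise
    versions: list = field(default_factory=list)    # version history
    record_source: str = ""                         # provenance of the record
    record_updated: str = ""                        # date last updated


def find_by_discipline(records, discipline):
    """Names of standards tagged with the discipline (case-insensitive)."""
    wanted = discipline.lower()
    return [r.name for r in records
            if wanted in (d.lower() for d in r.disciplines)]


def converters_from(records, standard_name):
    """Converters registered for a standard, or an empty dict if unknown."""
    for r in records:
        if r.name == standard_name:
            return dict(r.converters)
    return {}


# Placeholder data, not real catalog content.
catalog = [
    StandardRecord(
        name="Example Standard A",
        disciplines=["Ecology"],
        specification_url="https://example.org/standard-a",
        converters={"Example Standard B": "a2b-converter"},
        versions=["1.0", "1.1"],
        record_source="manual entry",
        record_updated="2016-01-15",
    ),
    StandardRecord(name="Example Standard B",
                   disciplines=["Ecology", "Genomics"]),
]

# A JSON rendering of the kind an API might return to a client tool.
payload = json.dumps(asdict(catalog[0]), indent=2)
```

A planning tool could call something like `find_by_discipline(catalog, "ecology")` to suggest standards to a researcher, while an importing system could call `converters_from` before ingesting metadata in an unfamiliar standard.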
Engagement with existing work in the area
A brief review of related work and plan for engagement with any other activities in the area.
The only source of knowledge is experience. — Albert Einstein
The work of the MSC Working Group will build on the outputs of the RDA MSD Working Group, as outlined in the case statement for that group. The group will engage with several actively maintained domain-specific standards directories:
The group will engage with other RDA groups with an interest in metadata standards:
- Wheat Data Interoperability Working Group
- Metadata IG
- Data in Context IG
- Data Fabric IG
- Practical Policy WG
- Brokering Governance WG
- Brokering IG
- ELIXIR Bridging Force IG
- Data Description and Interoperability WG
- Research Data Provenance IG
- Domain Repositories IG
The group will engage with other external activities with an interest in identifying and producing metadata standards:
A specific and detailed description of how the WG will operate.
M1-M6: Continue to collect use cases for interaction with the proposed catalog including both human access/interaction and machine access/interaction. Analyze these (using the MIG template presented at RDA Plenary 6) for intersections and synergies leading to a definition of the requirements and technical specification for the catalog.
M6-M12: Carry out initial design and development of the Catalog system prototype, based on the use cases already collected and the defined requirements and technical specifications. The prototype will follow best practices in metadata standards to describe metadata standards and their relationships to organisations, persons, software, datasets, etc. Thereafter, taking later-arriving use cases into account, refine the prototype into the production catalog system. Cooperate with other RDA Groups (especially domain groups) in refining the prototype system to production status, leading to adoption. Identify potential adopters of the Catalog system.
M12-M18: Evaluate the catalog against requirements and technical specifications with and by application domain communities of RDA and other potential adopters. Validate the Catalog mechanisms for directing users to metadata standard(s) appropriate to their purposes.
M14-M16: During this period the packages will be documented; standards will be entered into the catalog "as they are"; a priority list of standards for mapping will be established by the community; the high-priority standards will be mapped to the packages; the mappings will be stored in the catalog; and a user interface and API to the catalog will be provided.
M18: Report at next RDA Plenary
The form and description of final deliverables of the WG.
A catalog for metadata standards, incorporating
- User interfaces for input/edit/query/reporting
- Appropriate APIs for software interaction with the catalog
- Mechanisms for directing users to metadata standards appropriate to their purposes.
Milestones and Intermediate Deliverables
The form and description of milestones and intermediate documents, code or other deliverables that will be developed during the course of the WG’s work
- One or more white papers to stimulate discussions and steer group activities (M3, M9)
- Requirements and technical specification for the MSC, to serve as the foundation for software development (M6)
- Contributions to the MIG objective of defining ‘packages’ of metadata elements for defined purposes (M12)
Mode and Frequency of Operation
A description of the WG’s mode and frequency of operation (e.g. on-line and/or on-site, how frequently will the group meet, etc.).
The WG will provide the usual forums for discussion: teleconferences or Skype calls (every three months), face-to-face meetings between plenaries associated with group chair meetings, face-to-face meetings at plenaries (twice a year), meetings at plenaries with other groups (especially the metadata groups but also others), and the RDA website forum as appropriate. However, the major mechanism of operation is the provision of software to encourage input/edit and utilisation of metadata standards.
A description of how the WG plans to develop consensus, address conflicts, stay on track and within scope, and move forward during operation
The proposed co-chairs have extensive experience of project management generally and also within the RDA context due to participation in other groups. They have already demonstrated their ability to manage groups. Working from the agreed plan, all group members will participate in discussions on realization of the objectives and the means to do so. Consensus will be developed by frequent online (teleconference or email/forum) discussions mediated by one or more of the co-chairs. This mechanism will also achieve conflict resolution by exposing the different viewpoints and – through discussion – converging to consensus. The milestones defined above provide the anchorage to a timeline and the deliverables will be concrete proof of adherence to the timeline.
Community Engagement Plan
A description of the WG’s planned approach to broader community engagement and participation.
Potentially all RDA groups should be engaged and participate by contributing to this WG. The WG plans to provide advice and assistance upon request especially to domain-based groups enquiring about metadata standards. In particular the groups listed above (Engagement with Existing Work) will be involved in shaping the work of the proposed MSC Working Group acting both as requirements stakeholders and validators of the deliverables.
A specific plan for adoption or implementation of the WG outcomes within the organizations and institutions represented by WG members, as well as plans for adoption more broadly within the community. Such adoption or implementation should start within the 12-18 month timeframe before the WG is complete.
In order to ensure the eventual functionality and the APIs of the MSC are maximally useful, the WG will consult with potential adopters, including tool developers and research data management support staff, both when compiling the requirements and technical specification and when evaluating the catalog against them (see Work Plan above). Entries may also be synchronized with peer standards catalogues, such as the DCC Disciplinary Metadata catalogue.
A specific list of initial members of the WG and a description of initial leadership of the WG.
In some ways, this is a continuation of MSDWG but with a broader scope focused on one aspect of autonomicity of e-Research. The initial leadership will be as for MSDWG:
There are 45 members of the Working Group as of November 2015.