Metadata Standards Catalog WG Case Statement
Updated 15 January 2016 in response to TAB review
A concise articulation of what issues the WG will address within a 12-18 month time frame and what its “deliverables” or outcomes will be.
The Metadata Standards Catalog (MSC) Working Group will produce a catalog of metadata standards of relevance to research data. Specifically, the catalog system will consist of the following components:
- A set of records describing metadata standards
By ‘metadata standard’ we mean a defined metadata structure and format used within a community. Some metadata standards have formal standardization status but this is not necessarily an indication of quality or utilization.
- A user interface for submitting information, searching, browsing and displaying standards information
- A machine-to-machine interface (API) allowing automated tools to submit information, perform queries and retrieve information from the catalog
In this sense the catalog will be ‘machine readable’. We also intend that the information provided through the API will be structured in such a way that the recipient machine will be able to process and act upon it; in this sense the catalog will also be ‘machine actionable’.
This work builds on the outputs of the Metadata Standards Directory Working Group: both the Metadata Standards Directory (MSD) itself and the set of attendant use cases. The advances to be made by the MSC Working Group beyond that work are improvements to the data structure of the records, an improved user interface, and the addition of an API. The latter development will enable the MSC to participate in and provide services to any autonomic e-research fabric. While the base set of records for the MSC will be derived from those in the MSD, further information will be sought from discipline-specific standards directories and RDA working groups.
A specific description of who will benefit from the adoption or implementation of the WG outcomes and what tangible impacts should result.
Prediction is very difficult, especially if it's about the future. – Niels Bohr
The primary beneficiaries of this work – being a continuation of work performed by the MSD Working Group and the UK Digital Curation Centre – will be researchers, in the following ways:
- Early career researchers, or those unused to sharing their data, will be able to consult the MSC – whether directly or indirectly through other tools – and discover an appropriate standard to use when documenting their data. This will help them to comply with funder requirements for data management from the planning stages through to submission to an archive.
- Having a comprehensive catalogue of metadata standards will make it easier to prevent duplication of standards development effort (in areas where metadata standards already exist) and highlight gaps where standards development activity is needed.
- Following on from the above, it will therefore be easier for peer researchers to discover, validate and reuse datasets that have been shared, due to the expected metadata being in place.
The group recognizes that there exist catalogs of datasets, repositories, software and publications with limited metadata for each entry. There exists the BioSharing initiative for bioscience and DataONE in environmental sciences, and beyond them many projects and initiatives with some information on relevant datasets and metadata associated with them. While these all contribute to the above benefits, in each case the information available covers only one or a few domains and, in the case of BioSharing, is intended for human rather than machine action. The MSC will address both these shortcomings with respect to metadata standards for research data.
More specific and tangible benefits and impacts of the MSC will be identified and targeted as part of the work plan. Use cases continue to be collected by the existing RDA metadata groups: the Metadata Interest Group (MIG), Metadata Standards Directory Working Group (MSDWG), and Data in Context Interest Group (DICIG). These, along with use cases to be collected directly by the MSC Working Group, will be used as a basis for the precise use cases to be addressed by the MSC and the information it will contain. The following are a few examples:
- Researchers and research support staff will be able to discover standards relevant to the researcher’s discipline.
- Researchers will be able to browse the catalog for standards in other disciplines that may be appropriate, perhaps to facilitate interdisciplinary or multidisciplinary work.
- Developers of tools such as DMPTool or DMPonline will be able to use information from the MSC, via the API, to suggest relevant standards to researchers.
- Systems will be able to look up, via the API, the converters available for unfamiliar standards, to reduce the friction of importing metadata from other systems.
- In the long term, the MSC may be able to provide information to enable repository software to provide tailored metadata input forms on-the-fly, or to generate a matching between standards that a developer could use as a starting point for a converter.
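The API-driven use cases above (a planning tool suggesting standards for a discipline, a system looking up converters for an unfamiliar standard) can be sketched as two lookups over catalog records. This is a minimal illustration only: the record fields, the function names and the sample entries ("Standard A" and so on) are assumptions made for the example, not part of any agreed MSC specification.

```python
from dataclasses import dataclass, field


@dataclass
class Converter:
    """Hypothetical record for a tool that converts one standard to another."""
    source: str
    target: str
    tool: str


@dataclass
class StandardRecord:
    """Hypothetical, much-reduced catalog record for one metadata standard."""
    name: str
    disciplines: set[str] = field(default_factory=set)
    converters: list[Converter] = field(default_factory=list)


def find_by_discipline(catalog: list[StandardRecord], discipline: str) -> list[str]:
    """Return the names of standards tagged with the discipline (case-insensitive)."""
    wanted = discipline.lower()
    return sorted(
        rec.name
        for rec in catalog
        if wanted in {d.lower() for d in rec.disciplines}
    )


def converters_from(catalog: list[StandardRecord], standard_name: str) -> list[Converter]:
    """Return every listed converter that takes the named standard as input."""
    return [
        conv
        for rec in catalog
        for conv in rec.converters
        if conv.source == standard_name
    ]


# Placeholder entries; these are not real standards or tools.
catalog = [
    StandardRecord(
        "Standard A",
        {"ecology", "hydrology"},
        [Converter("Standard A", "Standard B", "a2b-converter")],
    ),
    StandardRecord("Standard B", {"Ecology"}),
    StandardRecord("Standard C", {"linguistics"}),
]

print(find_by_discipline(catalog, "Ecology"))
print([c.tool for c in converters_from(catalog, "Standard A")])
```

In a real deployment these lookups would sit behind the API rather than run over an in-memory list; the sketch only shows the kind of question a client tool might ask.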
The medium term goal, however, is to analyze the metadata standards and use cases and generate – working with and through MIG – a set of proposed ‘packages’ of metadata elements for purposes drawn from the wider set of use cases. By ‘packages’ we mean groupings of metadata elements for particular purposes, such as discovery, contextualization, or detailed connection of software to data. For clarity, please note the following:
- Each element may not be a single-valued attribute but a structure.
- There are relationships between elements including those carrying referential and functional integrity.
- Elements may belong to more than one package.
Since the packages represent functions, one possible naming scheme would be ‘Function Package’, with a specific function represented as ‘Function Package: Discovery’. The packages described here differ from application profiles: an application profile is a mapping from a given data structure to that required for a particular application (business purpose), whereas these packages are generic. The packages so generated will be useful for those writing converters, designing systems and considering new standards. In the longer term they may form the basis of novel presentations of schemas and specifications for metadata standards.
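One way to picture the package idea is as a small data model in which an element may itself be a structure and may appear in more than one package. The sketch below is an assumption-laden illustration: the class names, the element names (title, creator, name, affiliation) and the choice of packages are invented for the example and are not proposed package contents.

```python
from dataclasses import dataclass, field


@dataclass(frozen=True)
class Element:
    """A metadata element; non-empty `parts` makes it a structure."""
    name: str
    parts: tuple["Element", ...] = ()

    @property
    def is_structure(self) -> bool:
        return bool(self.parts)


@dataclass
class FunctionPackage:
    """A grouping of elements serving one function, e.g. discovery."""
    function: str
    elements: list[Element] = field(default_factory=list)

    @property
    def label(self) -> str:
        # Follows the naming scheme suggested in the text.
        return f"Function Package: {self.function}"


def packages_containing(element: Element, packages: list[FunctionPackage]) -> list[str]:
    """Return the labels of all packages that include the given element."""
    return [p.label for p in packages if element in p.elements]


# Illustrative elements: 'creator' is a structure, 'title' is single-valued.
title = Element("title")
creator = Element("creator", parts=(Element("name"), Element("affiliation")))

packages = [
    FunctionPackage("Discovery", [title, creator]),
    FunctionPackage("Contextualization", [creator]),
]

print(packages_containing(creator, packages))
```

Relationships between elements (referential and functional integrity) are not modelled here; they would need their own representation.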
The information that the catalog will need to hold about each standard will depend on the use cases chosen for the MSC, but we anticipate it will include information on the schema/specification, converters available, associated vocabularies (on which collaboration with the Vocabulary Services Interest Group would be useful), associated tools, examples of services with expertise in the standard, the version history of the standard, and the provenance (source, date last updated, etc.) of the records themselves.
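As a rough illustration of the kinds of information listed above, a record could be laid out as follows and serialised to JSON for delivery through an API. The field names, the JSON layout and all sample values (including the example.org URL) are assumptions for the sake of the sketch; the actual record structure would follow from the use cases and the technical specification.

```python
import json
from dataclasses import asdict, dataclass, field


@dataclass
class Provenance:
    """Where a catalog record came from and when it was last revised."""
    source: str
    last_updated: str  # ISO 8601 date, e.g. "2016-01-15"


@dataclass
class CatalogRecord:
    """Hypothetical record layout; field names are illustrative only."""
    name: str
    specification_url: str
    provenance: Provenance
    converters: list[str] = field(default_factory=list)
    vocabularies: list[str] = field(default_factory=list)
    tools: list[str] = field(default_factory=list)
    services: list[str] = field(default_factory=list)
    versions: list[str] = field(default_factory=list)


def to_json(record: CatalogRecord) -> str:
    """Serialise a record, including its nested provenance, as JSON text."""
    return json.dumps(asdict(record), indent=2, sort_keys=True)


# Placeholder values; not a real standard.
record = CatalogRecord(
    name="Standard A",
    specification_url="https://example.org/standard-a/spec",
    provenance=Provenance(source="MSD import", last_updated="2016-01-15"),
    versions=["1.0", "1.1"],
)

print(to_json(record))
```

Keeping provenance as a nested object separates facts about the standard from facts about the catalog record itself.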
Engagement with existing work in the area
A brief review of related work and plan for engagement with any other activities in the area.
The only source of knowledge is experience. — Albert Einstein
The work of the MSC Working Group will build on the outputs of the RDA MSD Working Group, as outlined in the case statement for that group. The group will engage with several actively maintained domain-specific standards directories:
- Marine Metadata Interoperability Content Standard References: the MSD Working Group co-chairs have already initiated discussions with John Graybeal (Mindspring).
- GEOSS Standards and Interoperability Registry
- BioSharing: the MSD Working Group co-chairs have already initiated discussions with Susanna-Assunta Sansone (University of Oxford).
- Community Inventory of EarthCube Resources for Geosciences Interoperability: the MSD Working Group co-chairs have already initiated discussions with Ilya Zaslavsky (SDSC).
The group will engage with other RDA groups with an interest in metadata standards:
- Wheat Data Interoperability Working Group
- Metadata IG
- Data in Context IG
- Data Fabric IG
- Practical Policy WG
- Brokering Governance WG
- Brokering IG
- ELIXIR Bridging Force IG
- Data Description and Interoperability WG
- Research Data Provenance IG
- Domain Repositories IG
The group will engage with other external activities with an interest in identifying and producing metadata standards:
A specific and detailed description of how the WG will operate.
M1-M6: Continue to collect use cases for interaction with the proposed catalog including both human access/interaction and machine access/interaction. Analyze these (using the MIG template presented at RDA Plenary 6) for intersections and synergies leading to a definition of the requirements and technical specification for the catalog.
M6-M12: Carry out initial design and development of the Catalog system prototype, based on the use cases already collected and the defined requirements and technical specifications, using best practices in metadata standards to describe metadata standards and their relationships to organisations, persons, software, datasets, etc. Thereafter, taking into account later-arriving use cases, refine the prototype into the production catalog system. Cooperate with other RDA Groups (especially domain groups) in refining the prototype system to production status leading to adoption. Identify potential adopters of the Catalog system.
M12-M18: Evaluate the catalog against requirements and technical specifications with and by application domain communities of RDA and other potential adopters. Validate the Catalog mechanisms for directing users to metadata standard(s) appropriate to their purposes.
M14-M16: During this period the packages will be documented; standards will be entered into the catalog "as they are"; a priority list of standards for mapping will be established by the community; the high-priority standards will be mapped to the packages; the mappings will be stored in the catalog; and a user interface and API to the catalog will be provided.
M18: Report at next RDA Plenary
The form and description of final deliverables of the WG.
A catalog for metadata standards, incorporating
- User interfaces for input/edit/query/reporting
- Appropriate APIs for software interaction with the catalog
- Mechanisms for directing users to metadata standards appropriate to their purposes.
Milestones and Intermediate Deliverables
The form and description of milestones and intermediate documents, code or other deliverables that will be developed during the course of the WG’s work
- One or more white papers to stimulate discussions and steer group activities (M3, M9)
- Requirements and technical specification for the MSC, to serve as the foundation for software development (M6)
- Contributions to the MIG objective of defining ‘packages’ of metadata elements for defined purposes (M12)
Mode and Frequency of Operation
A description of the WG’s mode and frequency of operation (e.g. on-line and/or on-site, how frequently will the group meet, etc.).
The WG will provide the usual forum for discussion through teleconferences/Skypes (every three months), face-to-face meetings between plenaries associated with group chair meetings, face-to-face meetings at plenaries (twice a year), meetings at plenaries with other groups (especially the metadata groups but also others) and the RDA website forum as appropriate. However, the major mechanism of operation is the provision of software to encourage input/edit and utilisation of metadata standards.
A description of how the WG plans to develop consensus, address conflicts, stay on track and within scope, and move forward during operation
The proposed co-chairs have extensive experience of project management generally and also within the RDA context due to participation in other groups. They have already demonstrated their ability to manage groups. Working from the agreed plan, all group members will participate in discussions on realization of the objectives and the means to do so. Consensus will be developed by frequent online (teleconference or email/forum) discussions mediated by one or more of the co-chairs. This mechanism will also achieve conflict resolution by exposing the different viewpoints and – through discussion – converging to consensus. The milestones defined above provide the anchorage to a timeline and the deliverables will be concrete proof of adherence to the timeline.
Community Engagement Plan
A description of the WG’s planned approach to broader community engagement and participation.
Potentially all RDA groups should be engaged and participate by contributing to this WG. The WG plans to provide advice and assistance upon request especially to domain-based groups enquiring about metadata standards. In particular the groups listed above (Engagement with Existing Work) will be involved in shaping the work of the proposed MSC Working Group acting both as requirements stakeholders and validators of the deliverables.
A specific plan for adoption or implementation of the WG outcomes within the organizations and institutions represented by WG members, as well as plans for adoption more broadly within the community. Such adoption or implementation should start within the 12-18 month timeframe before the WG is complete.
In order to ensure that the eventual functionality and APIs of the MSC are maximally useful, the WG will consult with potential adopters, including tool developers and research data management support staff, when compiling the requirements and technical specification and when evaluating against them (see Work Plan above). Entries may also be synchronized with peer standards catalogues, such as the DCC Disciplinary Metadata catalogue.
A specific list of initial members of the WG and a description of initial leadership of the WG.
In some ways, this is a continuation of MSDWG but with a broader scope focused on one aspect of autonomicity of e-Research. The initial leadership will be as for MSDWG:
- Alex Ball, firstname.lastname@example.org
- Keith Jeffery, email@example.com
- Rebecca Koskela, firstname.lastname@example.org
There are 45 members of the Working Group as of November 2015.
Author: Ruth Duerr
Date: 29 May, 2015
There is no doubt that this catalog is needed and will be well-used; but I don't see some organization stepping up to make this an operational system, maintaining it for the long-term. Is someone doing this?
Author: Kevin Ashley
Date: 11 Jun, 2015
Ruth, that's a valid point. However, I think we have part of the answer in the genesis of this group and its predecessor, and I think it should be the job of this group, or perhaps the interest group, to tease out the issues that will inform the rest of the answer.
I think maintenance is potentially a separate issue from providing a home for the service. Part of what fed into the predecessor group to this was a catalogue of research data metadata standards that had been built by the DCC - we finished the first version at about the time that the RDA was being established. We were happy to provide a home for that service indefinitely, but I had real concerns that we did not have the resources to maintain the information in it. The RDA working group provided an answer to that - it brought in a far larger community not just to maintain what we had done but to rethink what that service and platform should look like. It's allowed us to let go of the work and we are now just one maintainer amongst many. However, we would still see providing a home for the service as something that's well-aligned with our mission, although we don't claim to be the only people in that position.
So, I think one thing to examine as this work develops is the social and technical issues involved in having a community of maintenance that is larger than and disconnected from the hosting organisation providing the operational system. That must surely be an issue that crops up again and again in dealing with RDA WG outputs.
Author: Ronald Margolis
Date: 29 May, 2015
Since the NIH BD2K effort also includes questions about metadata and community-based standards, the RDA MD Standards Catalog WG should seek to interface with the NIH to assure complete coverage and ultimately acceptance of community-based MD standards.
Author: Rebecca Boyles
Date: 04 Jun, 2015
The NIH BD2K effort on community based standards is working to formulate a path forward for NIH. We are very interested in the work of this group, and the case statement has been circulated among us.
Thank you for all the effort you all are putting into advancing this work.
Author: Ramona Walls
Date: 04 Jun, 2015
Has use case collection already begun? If so, are the existing use cases visible? If not, will there be an open call for use cases?
Author: Rebecca Koskela
Date: 08 Jun, 2015
Yes, collection of use cases has begun and they are located on the Metadata Interest Group page in the File Repository (https://www.rd-alliance.org/node/167/repository). If you have a use case that you would like to submit, the template and example of a completed template are also there.
In addition, we have asked for volunteers (and have a list of the volunteers) to analyze the use cases. If you are interested in this activity, please contact us.
Rebecca Koskela & Keith Jeffery
Author: Keith Jeffery
Date: 08 Jun, 2015
Many thanks for the comments on this. Of course we should (and shall) take this effort into account along with efforts in Europe from DCC and others. Significantly, some communities (biosciences spring to mind, but also others) have their own catalogs (but usually machine readable rather than machine actionable) and we shall take those into account too.
Author: George Alter
Date: 12 Jun, 2015
A metadata catalog is certainly of great value. But the metadata landscape is already very complex, and the work plan does not sufficiently engage with existing institutions and stakeholders. The use cases on the Metadata Interest Group mostly start from the perspective of an individual researcher, and they do not consider links to existing facilities, especially projects such as BioSharing in the life sciences and EarthCube in the geosciences. For an API to be successful in collecting standards information it must be acceptable to existing repositories, databases, and registries. The workplan needs a strategy to engage with others in the diverse domains that already run or are working on such catalogs, including groups that do not come to RDA plenaries.
It is our experience that, even within a specific domain, the creation of a metadata catalog requires time or at least a well-staffed team of developers and curators to tackle both the technical and content sides, and the need for staffing increases with the diversity of domains to be covered. The community outreach and engagement (the social engineering side) is of comparable difficulty to the technical. This will be difficult to accomplish in the 18-month time period of the WG.
Author: Ian Fore
Date: 13 Jun, 2015
Noting the need identified by others to be able to represent their domains, and to make use of existing metadata work, the catalog can only be effective with those contributions. It seems particularly important to focus on the model in which domains can express their metadata standards. Standards (metastandards?) like DCAT, schema.org, ISO11179 and others come to mind for this purpose. A potential benefit of these standards is that various forms of machine actionability exist around them.
This would potentially be within the scope of the API proposed in the case statement.
Author: Denise Warzel
Date: 16 Jun, 2015
(typos and clarifications fixed)
I am new to this group and have been operating a metadata registry for the past 15 years at NCI that holds metadata about CDEs and Case Report Form (Forms), which we will be extending to integrate with other NIH and outside metadata registries and initiatives, so I am very interested in this WG.
From the description in the case-statement, it's difficult to understand what kind of metadata is to be captured in the Catalog and what purpose it serves beyond other resources already available, such as data.gov.
The case-statement indicates that this follows directly from the Metadata Standards Directory (MSD) WG, which publishes a Directory here: http://rd-alliance.github.io/metadata-directory/standards/. However, the MSD directory seems to contain at least some of the same kind of information noted in the Metadata Catalogue Value Proposition. The Metadata Catalog value proposition could be improved by making it more distinct and explaining how/what this effort is contributing beyond what the MSD already provides.
The case-statement for the Metadata Catalog "Value Proposition" is below, but seems ambiguous in the ways noted below:
1. Discover standards relevant to the researchers’ discipline
- the Catalog could be the actual usable metadata as data, like metadata describing CDEs, CRFs, CIMIs, which are standards for collecting data. Or, the Catalog could be an entry that points to the actual resource where the metadata lives...which is it?
2. Browse the catalog for standards in other disciplines that may be appropriate
- This also sounds like the MSD which appears to be fairly well organized by topic, Arts and Humanities, Engineering, etc.
3. Utilize tools they may already be using in the course of their research, such as DMPTool or DMPonline to query the MSC
- this sounds like either metadata about Tools that can be used with specific kinds of data, but where is the data? Are there entries in the Metadata Catalog that point to the data? Or is this a platform where the tools can actually be used along with data, like the HubZero platform? Why are tools in a Metadata Catalog?
4. Utilize services within middleware
- This too sounds like something other than a Metadata Catalog, unless it's metadata about the services?
I am very interested and would like to contribute, however clarification of the types of metadata and how it differs from the MSD and other initiatives would be helpful.
Author: Amy Barton
Date: 13 Jul, 2015
Based on my understanding of what differentiates the MSD from the proposed MSC, I see two key points:
1. Directory contains many domain-specific, as well as general, metadata standards aggregated and organized in one space. However, there is no querying or reporting functionality in the directory. The catalog will include this functionality - which is valuable in terms of discovering not only information about potentially appropriate metadata standards for a discipline/project, but will lead to the actual standards (schemes), documentation about the standard, and possibly implementation guidance where available. Additionally, with reporting functionality in place, perhaps it would be possible to track the usage of the various standards which would inform us of how/where the standards are used and adopted.
2. The directory is just that, a directory. The catalog will be machine actionable, with API development and the ability to build services on top of the catalog. This latter point may suggest aligning the metadata catalog to the DCAT standard.
With that, there is one point in the case statement that I don't quite understand:
"The medium term goal, however, is to analyse the metadata standards and use cases and generate – working with and through MIG – a set of proposed ‘packages’ of metadata elements for each defined purpose."
What makes up a "package"? How will it be used? Does this suggest developing new standards with properties culled from existing standards? Or, does this suggest, for each given standard, the working group will develop "packages" of properties that include the required properties with additional "highly recommended" properties that enhance data discovery, data documentation and data reuse, much like the "super-set" the DataCite Metadata Working Group developed (the super-set: https://schema.datacite.org/meta/kernel-3.1/doc/DataCite-MetadataKernel_... - section 2.3 Overview.)?
All the best!
Amy J. Barton (née Hatfield), MLS
Assistant Professor of Library Science, Metadata Specialist
Purdue University Libraries, Research Data
Member of the DataCite Metadata Working Group