Best Practices for vocabulary-based projects: Development, Standardization, Registration, Harmonization and Support - RDA 12th Plenary Meeting


BoF Meeting Title

Best Practices for vocabulary-based projects: Development, Standardization, Registration, Harmonization and Support


Short introduction describing the scope of the group and any previous activities

Vocabulary use, both for data management and for domain applications, continues to be an important topic in RDA groups and plenaries. Past BoFs and joint meetings at plenaries have ranged from exploring domain vocabulary services built on vocabulary registration to issues and practices in harmonizing domain vocabularies. Growing interest and work are evident in areas such as library-led initiatives and scholarly publications and communications, along with the use of the FAIR principles, which encourage richer and reusable metadata to support discoverability and interoperability among data, systems and communities. Beyond these, efforts to standardize data vocabularies range from RDA's Data Foundations and Terminology (DFT) work and the cooperative International Research Data Management (IRiDiuM) Glossary to NIST's Big Data Solutions Reference Glossary. However, neither these nor our original RDA DFT vocabulary efforts addressed research library data services and related issues. It would be useful to close this gap, possibly by connecting to and leveraging the Publishing and Referencing Ontologies.

RDA has made notable proposals for metadata elements, but a range of other projects cover some of the same issues. One example is Metadata 2020, a community-led initiative organized as a global effort by Crossref in collaboration with associations, publishers, universities, and other scholarly communications organizations. As a result of both the Crossref and RDA efforts, there is now a more mature body of practical, applied field experience in developing, registering and harmonizing metadata vocabularies. This makes it a useful topic for an RDA Plenary BoF: sharing emerging best practices for vocabulary-based projects that leverage such project and group experience.

Beyond the efforts described above, standardized vocabularies such as DCAT provide further standards and practices to leverage. DCAT is useful for registering and managing data catalogs and for facilitating interoperability between data catalogs published on the Web. The recent revision of the DCAT vocabulary adds data services (also called distribution services) to its catalog model, with an explicit class for data distribution services. Such services are also of interest to RDA's Vocabulary Services IG.

To a modest degree, metadata from some vocabulary-based tools can be used to identify and label data, which helps mitigate some of the ambiguity problems associated with data markup. A growing practice is to add at least some basic thesaurus metadata: broader, narrower and related relations expressed as Simple Knowledge Organization System (SKOS) properties. SKOS uses RDF to provide some formalization of various types of controlled vocabulary, including classification schemes, subject heading lists and taxonomies. This promises some degree of automation in finding relevant and similar terms, but it has limitations worth being aware of.
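The DCAT and SKOS points above can be illustrated with a small, dependency-free Python sketch. It stores a hypothetical catalog and concept hierarchy as plain (subject, predicate, object) triples; every `ex:` name is invented for illustration, while the `dcat:` and `skos:` property names follow the published DCAT 2 Recommendation and the SKOS Reference. Finding datasets about a concept by also walking its narrower concepts shows the kind of automation SKOS makes possible, and the comments flag one limitation: SKOS does not declare `skos:broader` transitive, so the transitive walk is an assumption made by this sketch.

```python
# Minimal sketch: DCAT catalog entries annotated with SKOS concepts, held as
# plain (subject, predicate, object) triples. All ex: names are hypothetical.

TRIPLES = [
    # A tiny SKOS concept hierarchy (hypothetical concepts).
    ("ex:SeaSurfaceTemperature", "skos:broader", "ex:Oceanography"),
    ("ex:Salinity", "skos:broader", "ex:Oceanography"),
    ("ex:Oceanography", "skos:broader", "ex:EarthSciences"),
    ("ex:Hydrology", "skos:broader", "ex:EarthSciences"),
    ("ex:SeaSurfaceTemperature", "skos:related", "ex:ClimateModelling"),
    # A DCAT catalog with two datasets and one data service.
    ("ex:catalog", "dcat:dataset", "ex:sstDataset"),
    ("ex:catalog", "dcat:dataset", "ex:riverDataset"),
    ("ex:catalog", "dcat:service", "ex:sstService"),
    ("ex:sstDataset", "dcat:theme", "ex:SeaSurfaceTemperature"),
    ("ex:riverDataset", "dcat:theme", "ex:Hydrology"),
    ("ex:sstService", "dcat:servesDataset", "ex:sstDataset"),
]


def objects(subject, predicate):
    """Return every object o such that (subject, predicate, o) is asserted."""
    return [o for s, p, o in TRIPLES if s == subject and p == predicate]


def subjects(predicate, obj):
    """Return every subject s such that (s, predicate, obj) is asserted."""
    return [s for s, p, o in TRIPLES if p == predicate and o == obj]


def narrower_closure(concept):
    """The concept plus everything below it, following skos:broader upward links.

    SKOS does not declare skos:broader transitive; walking the chain is an
    inference choice made here, not something SKOS itself guarantees.
    """
    seen, stack = {concept}, [concept]
    while stack:
        for child in subjects("skos:broader", stack.pop()):
            if child not in seen:
                seen.add(child)
                stack.append(child)
    return seen


def datasets_about(concept):
    """Catalog datasets whose dcat:theme is the concept or a narrower one."""
    concepts = narrower_closure(concept)
    return sorted(
        ds for ds in objects("ex:catalog", "dcat:dataset")
        if concepts & set(objects(ds, "dcat:theme"))
    )


def services_for(dataset):
    """Data services that declare they serve the given dataset."""
    return sorted(subjects("dcat:servesDataset", dataset))


if __name__ == "__main__":
    for ds in datasets_about("ex:Oceanography"):
        print(ds, services_for(ds))
    # skos:related is a weaker, non-hierarchical hint for "similar" terms.
    print(objects("ex:SeaSurfaceTemperature", "skos:related"))
```

A production system would more likely load real RDF with an RDF library and query it with SPARQL; the in-memory triple list only keeps the example self-contained.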

Additional links to informative material related to the group

Libraries for Research Data IG: https://www.rd-alliance.org/groups/libraries-research-data.html
DCAT Revision: https://www.w3.org/TR/2018/WD-vocab-dcat-2-20180508/
Data Foundations and Terminology IG: https://rd-alliance.org/groups/data-foundations-and-terminology-ig.html
Metadata 2020: http://www.metadata2020.org/
Vocabulary Services Interest Group: https://rd-alliance.org/groups/vocabulary-services-interest-group.html
The International Research Data Management (IRiDiuM) Glossary: http://www.codata.org/working-groups/standard-glossary-for-research-data... and http://dictionary.casrai.org/Category:Research_Data_Domain
Big Data Solutions Reference Glossary: https://bigdatawg.nist.gov/_uploadfiles/M0067_v1_5148194733.pdf
https://onlinelibrary.wiley.com/doi/full/10.1002/asi.23781

Meeting objectives

The intention of this session is to provide a suitable venue to feature, share and discuss best practices for vocabulary-based projects. BoF organizers and participants will illustrate their approaches to using and managing vocabularies, describing how they currently develop, register and manage vocabularies, along with the issues they have encountered. To prepare for discussion, participants will be encouraged to provide one or more use cases and a brief description of their current approach to developing and importing vocabularies and to harmonizing internal and external vocabularies through functions such as mapping.
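Harmonizing internal and external vocabularies through mapping can be sketched with the SKOS mapping properties (`skos:exactMatch`, `skos:closeMatch`, `skos:broadMatch`, `skos:narrowMatch`, `skos:relatedMatch`). In this minimal Python example, all `local:` and `ext:` terms are hypothetical, and both the strictest-to-loosest ordering of the relations and the `harmonize` function are illustrative assumptions rather than an established practice: keywords are translated only when a sufficiently strict mapping exists, and the rest are set aside for a curator to review.

```python
# Minimal sketch of harmonizing an internal vocabulary with an external one
# through SKOS mapping relations. Terms and mappings are hypothetical.

# Assumed ordering, safest to loosest for automatic substitution. broadMatch
# comes before narrowMatch because substituting a broader term loses detail
# but stays true, while a narrower term could over-claim.
STRICTNESS = ["skos:exactMatch", "skos:closeMatch", "skos:broadMatch",
              "skos:narrowMatch", "skos:relatedMatch"]

# (internal term, mapping relation, external term)
MAPPINGS = [
    ("local:seaTemp", "skos:exactMatch", "ext:SeaSurfaceTemperature"),
    ("local:salt", "skos:closeMatch", "ext:Salinity"),
    ("local:riverFlow", "skos:broadMatch", "ext:Hydrology"),  # ext term is broader
    ("local:iceCover", "skos:relatedMatch", "ext:Cryosphere"),
]


def harmonize(keywords, loosest="skos:closeMatch"):
    """Translate internal keywords into the external vocabulary.

    Only mappings at least as strict as `loosest` are applied; every other
    keyword is returned separately so a curator can review it by hand.
    """
    allowed = set(STRICTNESS[: STRICTNESS.index(loosest) + 1])
    translated, needs_review = [], []
    for kw in keywords:
        targets = [ext for local, rel, ext in MAPPINGS
                   if local == kw and rel in allowed]
        if targets:
            translated.extend(targets)
        else:
            needs_review.append(kw)
    return translated, needs_review


if __name__ == "__main__":
    record = ["local:seaTemp", "local:salt", "local:riverFlow", "local:unknown"]
    print(harmonize(record))
    print(harmonize(record, loosest="skos:broadMatch"))
```

Keeping unmapped or loosely mapped keywords visible, instead of silently dropping them, is one way to surface the kind of harmonization issues participants are asked to bring to the session.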

The practices to be addressed include how to:

develop, harvest, reuse and manage rich metadata
use vocabularies to support FAIR data
raise awareness of the importance of sharing richer metadata
provide information for the community on the role of metadata in making scholarly content discoverable
encourage partners such as publishers, aggregators, funders, research institutions, and service providers to commit to increasing the quality of their metadata
equip all stakeholders with tools and information. Examples of relevant tools are those that store various vocabularies in a common repository; the Natural Environment Research Council (NERC) Vocabulary Server, for example, provides access to lists of standardized terms covering a broad spectrum of disciplines.
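Tools that keep several vocabularies in a common repository can be pictured with the following Python sketch. The `VocabularyRegistry` and `Term` names, the collection identifiers and the example.org URIs are all hypothetical; the sketch does not reproduce the NERC Vocabulary Server or its interfaces. It shows two basic services such a repository might offer: resolving a term URI to its collection, and searching labels across collections so that existing standardized terms can be reused instead of new ones being minted.

```python
# Minimal sketch of a common vocabulary repository. The class names,
# collection identifiers and term URIs are hypothetical (not the NVS API).
from dataclasses import dataclass, field


@dataclass
class Term:
    uri: str
    pref_label: str
    alt_labels: list = field(default_factory=list)


class VocabularyRegistry:
    """Holds several controlled vocabularies (collections) in one place."""

    def __init__(self):
        self._collections = {}  # collection id -> {term uri: Term}

    def register(self, collection_id, terms):
        """Add or replace a whole collection of standardized terms."""
        self._collections[collection_id] = {t.uri: t for t in terms}

    def resolve(self, uri):
        """Return (collection id, Term) for a term URI, or None if unknown."""
        for cid, terms in self._collections.items():
            if uri in terms:
                return cid, terms[uri]
        return None

    def search(self, label):
        """Case-insensitive exact match on preferred or alternative labels,
        across all collections, so existing terms can be found and reused."""
        needle = label.casefold()
        return [
            (cid, t.uri)
            for cid, terms in self._collections.items()
            for t in terms.values()
            if needle in (lbl.casefold() for lbl in [t.pref_label, *t.alt_labels])
        ]


if __name__ == "__main__":
    registry = VocabularyRegistry()
    registry.register("ex-parameters", [
        Term("https://example.org/vocab/P/0001/", "Sea surface temperature", ["SST"]),
    ])
    registry.register("ex-instruments", [
        Term("https://example.org/vocab/I/0042/", "CTD profiler"),
    ])
    print(registry.search("sst"))
    print(registry.resolve("https://example.org/vocab/I/0042/"))
```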

Meeting agenda

  • The meeting is intended to support cooperative work on shared issues. To establish a common understanding, the session will start with an overview of the objectives and the proposed agenda.
  • Each group, such as those described above, will provide a brief summary of its work, why it represents a best practice, and which vocabularies and standards are relevant. Where possible, presenters are encouraged to provide an illustrative use case, such as harmonization and mapping between two or more vocabularies, along with an explanation of the issues involved.
  • Following the presentations, most of the time will be devoted to community discussion of, and work on, common interests, issues and best-practice solutions in the vocabulary space.
  • Whenever possible, discussion will focus on issues of vocabulary use as reflected in actual innovative tools and in the supporting activities and services available from current research and development.
  • The session will conclude by discussing follow-up on common interests, suitable venues for it (such as existing vocabulary groups), and interest in future sessions.

Audience

The potential audience includes all RDA groups with a marked interest in vocabulary applications, such as adding semantics to vocabularies or metadata, as well as groups pursuing interoperability. Groups dealing with communication and data sharing issues are also potential target audiences.

Starting with the organizers and representatives from the MIG and DCAT, a core group of presenters will give brief summaries of best-practice work, each illustrated by a use case. Where possible, presenters are also encouraged to explain the issues involved.

Co-chairs from the Libraries for Research Data IG and Data Foundations and Terminology IG will take the lead in recruiting best-practice projects for presentation and in moderating the meeting.

Group chair serving as contact person: Gary Berg-Cross

Type of meeting: Working meeting