The wide use of schema.org to add structured metadata to web pages for commercial search engines has attracted the attention of the data management community. It offers a possible way to leverage robust commercial search engines such as Google, Yahoo and Bing to facilitate discovery of and access to scientific data. Various projects have been exploring this approach, including the US NSF EarthCube p418 project, Google's Dataset Recommendations, BioSchemas, Force11 DCIP, Research Data Australia, DataCite, Harvard Dataverse, NASA’s Distributed Active Archive Center (DAAC) Infrastructure and EOSCpilot. Schema.org has largely been driven by commercial use cases, and its process for adding and defining resource types, properties and vocabularies for the research domain is loosely governed. As a result, there are gaps and deficiencies that make its application to research data problematic.
At the 11th RDA Plenary (P11), the RDA Data Discovery Paradigms Interest Group (IG) started the task force "Using schema.org for research data discovery". Since then, the group has organised sessions at RDA plenaries and online calls to discuss how the research community can come together to embrace the advantages of discovering data via web search engines while addressing these gaps and deficiencies. There is a proposal to form an RDA Working Group with a focused scope and a set of well-defined priorities and objectives.
The four objectives identified in previous meetings are:
Objective 1. Identify and define research schema types and minimum information guidelines for discoverability and accessibility
Create or identify a list of generic, discipline-agnostic entries from schema.org, and identify the properties that would be minimally suggested for users and data providers. The minimal suggested properties could be drawn from existing RDA recommendations, taking into account the original context of those recommendations versus our interest in discoverability and accessibility.
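As a minimal sketch of how a minimal-property guideline could be checked in practice, the Python snippet below tests a schema.org Dataset record, serialised as JSON-LD, against a list of properties. The property list MINIMAL_PROPERTIES and the function missing_properties are hypothetical: the real list is what this objective sets out to define, and the example record is invented. For simplicity the sketch assumes "@type" is a single string, although JSON-LD also allows a list of types.

```python
import json

# Hypothetical minimal set of schema.org Dataset properties; the actual
# list is what Objective 1 aims to define.
MINIMAL_PROPERTIES = ["name", "description", "identifier", "license", "creator", "distribution"]


def missing_properties(jsonld_text: str, required=MINIMAL_PROPERTIES) -> list:
    """Return the required properties that are absent or empty in a Dataset record."""
    record = json.loads(jsonld_text)
    # Simplification: assumes "@type" is a plain string, not a list.
    if record.get("@type") != "Dataset":
        raise ValueError("expected a schema.org Dataset record")
    return [p for p in required if not record.get(p)]


# Invented example record, not a real dataset.
example = """{
  "@context": "https://schema.org/",
  "@type": "Dataset",
  "name": "Example sea surface temperature grid",
  "description": "Illustrative record, not a real dataset.",
  "license": "https://creativecommons.org/licenses/by/4.0/",
  "creator": {"@type": "Organization", "name": "Example Lab"}
}"""

print(missing_properties(example))  # ['identifier', 'distribution']
```

A data provider could run a check like this on landing-page markup before publication; the list of missing properties then doubles as a to-do list for improving discoverability.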
Objective 2. Crosswalk and gap analysis evaluating existing standards and guidelines
Look for gaps between existing RDA recommendations, existing best practices and users’ practices in data searches on the one hand and schema.org on the other, and assess potential solutions. Do a crosswalk between the different standards and identify the gaps through this process (in addition to promoting these standards). Compare the mapping and crosswalking approaches.
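A crosswalk can double as a gap-finding tool: any source field that has no target in schema.org is a candidate gap. The Python sketch below illustrates the idea under stated assumptions. The source field names (title, abstract, instrument, processingLevel) and the CROSSWALK table are invented for illustration and do not reproduce any official crosswalk; a field being unmapped here says nothing about whether schema.org actually lacks an equivalent.

```python
# Hypothetical crosswalk from a community metadata standard (field names
# invented for illustration) to schema.org Dataset properties.
CROSSWALK = {
    "title": "name",
    "abstract": "description",
    "keywords": "keywords",
    "license": "license",
}


def crosswalk(record: dict, mapping: dict = CROSSWALK) -> tuple:
    """Translate a source record to schema.org; return (translated record, unmapped fields)."""
    translated = {"@context": "https://schema.org/", "@type": "Dataset"}
    unmapped = []
    for field, value in record.items():
        target = mapping.get(field)
        if target is None:
            unmapped.append(field)  # no target property: candidate gap for the analysis
        else:
            translated[target] = value
    return translated, unmapped


# Invented example record.
source = {
    "title": "Example ocean profiles",
    "abstract": "Illustrative only.",
    "instrument": "CTD",
    "processingLevel": "L2",
}

translated, unmapped = crosswalk(source)
print(unmapped)  # ['instrument', 'processingLevel']
```

Collecting the unmapped fields across many records and standards would give the raw material for the gap analysis described above.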
Objective 3. Review existing efforts that use schemas to describe scientific types
Identify the commonalities among the schema extensions used by different communities, and find current gaps in the schemas: either only the gaps common to all disciplines, independent of any discipline-specific extensions to schema.org, or all gaps identified across disciplines (i.e. the intersection or the union).
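The difference between the two scopes can be made concrete with set operations. In the Python sketch below, each community reports a set of gaps; the community names and gap labels are invented for illustration and are not findings of the task force. The intersection keeps only gaps every community reports, while the union keeps every gap reported by anyone.

```python
# Hypothetical gap lists reported by three communities (invented labels).
gaps_by_community = {
    "earth_science": {"spatial resolution", "instrument", "processing level"},
    "life_science": {"instrument", "organism", "processing level"},
    "social_science": {"sampling frame", "instrument", "processing level"},
}

# Intersection: gaps shared by every community (discipline-independent).
common_gaps = set.intersection(*gaps_by_community.values())

# Union: every gap reported by at least one community.
all_gaps = set.union(*gaps_by_community.values())

print(sorted(common_gaps))  # ['instrument', 'processing level']
print(len(all_gaps))        # 5
```

The intersection gives a small, broadly agreed agenda for generic extensions, whereas the union gives a complete inventory that is more useful for discipline-specific work.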
Objective 4. Engagement and communication strategy; collaboration with existing efforts
Collaborate with other existing efforts, groups and communities.
To align with the above objectives, we have launched a survey on current practices in using schemas to describe research datasets. The survey is still open, and your participation is more than welcome.
The WG's Wiki Index
This group meets regularly on the second Thursday of each month, starting at 8pm UTC. A meeting reminder with the Zoom ID will be emailed to the group ahead of each meeting.