Open Science Graphs for FAIR Data Interest Group
NB: this case statement is the revised version from October 2019. The initial and revised case statements are attached below for download.
- Amir Aryani (Research Graph Foundation)
- Martin Fenner (DataCite)
- Wouter Haak (Elsevier, Mendeley Data)
- Paolo Manghi (OpenAIRE Infrastructure - Institute of Information Science and Technologies, CNR, IT)
The goal of the Open Science Graphs Interest Group (OSG IG) is to build on the outcomes and broaden the challenges of the Data Description Registry Interoperability (DDRI) and Scholarly Link Exchange (Scholix) RDA Working Groups, in order to investigate the open issues and identify solutions towards achieving interoperability between services and information models of Open Science Graph initiatives. The aim is to improve FAIRness of research data, and more generally FAIR*-ness of science, by enabling the smooth exchange of the interlinked metadata overlay required to access research data at the meta-level of the discovery-for-citation/monitoring and at the thematic level of the discovery-for-reuse. Such “FAIR-ness” and “interlinked-ness” provide strong support for research integrity and research innovation, which in turn underpin significant social, environmental, and economic benefits.
Open Science is urging scientists, communities, institutions, and policymakers to define and adopt methodologies, practices, and tools for publishing research products beyond the scientific article, including research data, software, digital experiments, etc. The ultimate goal is to achieve transparency and reproducibility of science. As a consequence of this trend, researchers are depositing into scholarly communication data sources the metadata and files related to all these products, together with semantic links between them and towards other relevant entities, such as those kept in registries for authors, organizations, and data repositories (e.g. ORCID, ROR, re3data.org). De facto, Open Science publishing practices materialize a distributed, federated, decentralized, and global Open Science Graph, where by “graph” we mean a collection of objects (i.e. scientific product metadata) interlinked by semantic relationships (i.e. claims of object-to-object relationships together with their meaning). There is great interest in contributing to and consuming this Graph for sharing, discovering, and monitoring Open Science. To address this, several initiatives are aggregating targeted subsets of such sources to build specialized Open Science Graphs, subsets of the global Open Science Graph, capable of serving specific user needs: Google Scholar, Microsoft Academic, Scopus, FREYA PID Graph, Research Graph Foundation, OpenAIRE Research Graph, Open Knowledge Graph, Human Brain Project Knowledge Graph, as well as the CERIF graphs built via CRIS systems, are just a few of the real-world graphs being built and consumed today.
Clearly, FAIRness of research data strongly relies on the success and diffusion of such graphs, both at the cross-discipline level and at the thematic level. Research data (as well as software, or any research object) can be contextualized, thus maximizing its value and reusability, and can be reached via navigation from other related objects. Moreover, the value of research data and the related scientific reward may be derived from its constantly updated context, relying on a network of citations and usage statistics. The general architecture and use cases have been studied by the RDA Data Description Registry Interoperability (DDRI) Working Group. In addition, the RDA/WDS Scholarly Link Exchange (Scholix) Working Group has added significantly to our understanding of the subset of the research data graph connecting data and literature. Both working groups have done extensive work on the implementation of services and community adoption and are today in maintenance mode. However, other projects continue to work on these outcomes, including the OpenAIRE Research Graph, the Scholexplorer graph, Research Graph, and the FREYA PID Graph. Driven by these motivations, the co-chairs of the aforementioned WGs organized a BoF on “Research Data Graphs” at the RDA Plenary Conference in Philadelphia to gauge general interest in, and possible commitment to, this topic, which resulted in this IG case statement and proposal.
The Open Science Graphs Interest Group (OSG IG) will investigate the challenges and identify solutions towards achieving interoperability between services and information models of Open Science Graph initiatives. The aim is to improve FAIRness of research data, and more generally FAIR*-ness of science, by enabling the smooth exchange of the interlinked metadata overlay required to access research data at the meta-level of the discovery-for-citation and at the thematic level of the discovery-for-reuse. The following main challenges have been identified as worth investigating:
- Build a community of Open Science Graph initiatives working together in the context of RDA with a focus on FAIR data;
- Build on and provide input to the outcomes of RDA IGs and WGs such as DDRI, Scholix, PID, Metadata, Go-FAIR, FAIR Data Maturity Model, Data Usage Statistics, and Data Citation; this will be achieved through the co-chairs' current membership in these groups and by inviting co-chairs of other groups to build active synergies;
- Analyse the state of the art in this domain by building synergies with the tens of initiatives that are building Open Science Graphs today, and provide an overview of current research data graph activities to frame a definition and classification of such graphs;
- Study the foundations of an information model, a lingua franca, that would enable the realization of an interoperability layer facilitating the exchange of information between graphs;
- Discuss the ideal services, protocols, and APIs required to exchange, query, and navigate graphs in both aggregation scenarios and federated access scenarios;
- Identify one or more dedicated RDA Working Groups to address the relevant challenges.