Scholarly Link Exchange Working Group Case Statement

07 Oct 2016

Scholarly Link Exchange Working Group:

Follow on from: RDA-WDS working group on Data Publishing Services

On Enabling Interlinking of Data and Literature

Charter:

The Scholarly Link Exchange Working Group aims to enable a comprehensive global view of the links between scholarly literature and data. The working group will leverage existing work and international initiatives to work towards a global information commons by establishing:

  • Pathfinder services and enabling infrastructure
  • An interoperability framework with guidelines and standards (see also www.scholix.org)
  • A significant consensus
  • Support for communities of practice and implementation

 

By the end of this 18-month WG period there will be:

  • A critical mass of Scholix conformant hubs providing the enabling infrastructure for a global view of data-literature links
  • Pathfinder services providing aggregations, query services, and analyses
  • Beneficiaries of these services accessing data-literature link information to add value to scholarly journal sites, data centre portals, research impact services, research discovery services, research management software, etc.
  • Operational workflows to populate the infrastructure with data-literature links
  • Better understanding of current data-literature interlinking landscape viewed from the perspective of e.g. disciplines, publishers, repositories etc.

 

The working group follows on from the RDA/WDS Publishing Data Services WG, https://rd-alliance.org/groups/rdawds-publishing-data-services-wg.html. The original working group established demonstrator services and enabling infrastructure; the follow-on working group will support the “hardening” of that infrastructure and those services, as well as an increase in the number of participating hubs and services. The original working group established an interoperability framework; the follow-on group will provide further specification, documentation and profiling of that framework to support adoption by link contributors and consumers. The original working group established a consensus among large infrastructure providers and early adopters; the follow-on group will extend that consensus to the next stage of adopters and to a more diverse set of infrastructure providers. The original working group harnessed the energy and interest of specialists; the follow-on group will provide support for a number of communities and services as they implement and adopt the framework and vision established in the original group.

The working group believes a global system for linking data and literature should be:

  • Cross-disciplinary and global (built for, and aspiring to, comprehensiveness) 
  • Transparent with provenance allowing users to make trust and quality decisions
  • Open and non-discriminatory in terms of content coverage and user access (this also means ranging from formal to informal, and from structured to non-structured content)
  • Standards-based (content standards and exchange protocols)
  • Participatory and adopted, including community buy-in
  • Sustainable
  • An enabling infrastructure, on top of which services can be built (as opposed to a monolithic “one-stop-shop” solution).

Note - This group retains the principles established in its precursor working group (Publishing Data Services)

 

Value Proposition:

The WG aims to oversee and guide the maturation of a distributed global system to collect, normalize, aggregate, store, and share links between research data and the literature. This will build upon the output of the preceding Data Publishing Services Working Group, which delivered a consensus vision and set of guidelines called the Scholix Framework, together with an operational system called the Data-Literature Interlinking (DLI) System, which puts these guidelines into practice as a pathfinder implementation. The WG proposed here will build out these assets into an operational infrastructure and service layer that is to become the de facto go-to place for organizations to deposit or retrieve links between research data and the literature.

 

The value of such a system ultimately rests on the value of links between research data and the literature. The utility of such links is threefold (see also the Case Statement of the Data Publishing Services WG):

  1. They improve the visibility and discoverability of research data (and relevant literature), so that researchers can find relevant material more easily.
  2. They help place research data in the right context, so that researchers can re-use data more effectively.
  3. They support credit attribution mechanisms, which incentivize researchers to share their data in the first place.

 

These value elements are illustrated below, and in more detail in Annex A.

While there is broad support for the value and utility of data-literature links amongst the various stakeholders in research data publishing (including researchers as the ultimate end-users of this information), organizing the associated information space is not an easy feat: there are many disconnected sources with overlapping information, and there is a wide heterogeneity in practices today - both at a technical level (different PID systems, storage systems, etc.) and at a social level (different ways of referencing a data set in the literature, different moments in time to assert a link, etc.). As a consequence, the landscape today is incomplete and patchy, characterized by independent, many-to-many non-standard solutions - for example a bilateral arrangement between a journal publisher and a data center. This is both inefficient and limiting in the value that can be delivered to researchers.

 

The universal linking infrastructure which this WG strives to put in place represents a systemic change. It will offer an overarching, cohesive structure that binds together many of today’s practices into a common interoperability framework - which will ensure that links between research data and the literature can be easily shared, aggregated, and used on a global scale. This will drive a network effect, where the value in the system as a whole is greater than the sum of individual parts: for researchers as end-users, this value lies in the comprehensiveness and quality of link information; for service providers and infrastructure providers (including journal publishers and data centers), the value also lies in simplicity, efficiency, and reduction of friction in the process by being able to work with a single interface to deposit and retrieve links (and, potentially, the possibility to benefit from additional services developed on top of the core infrastructure).

 

Who will benefit and Impact

 

Mapping the value proposition described above to the various stakeholders and actors in research data publishing (copied largely from the Data Publishing Services WG Summary & Recommendations), the benefits and impact may be summarized as follows:

  • For data repositories and journal publishers: linking data and the literature will increase their visibility and usage, and can support additional services to improve the user experience on online platforms (for example, offering links to relevant data sets with articles, or offering links to the literature that will help place data in context). In contrast to the bilateral arrangements that we often see today between data centers and journal publishers, the global linking infrastructure will make the process of linking data sets and research literature a more robust, comprehensive, and scalable enterprise.
  • For research institutes, bibliographic service providers, and funding bodies: the infrastructure will enable advanced bibliographic services and productivity assessment tools that track datasets and journal publications within a common and comprehensive framework.
  • For researchers: firstly, the infrastructure will make the processes of finding and accessing relevant articles and data sets easier and more effective. Secondly it will

 

 

Engagement with existing work in the area:

  1. Building upon previous work of the RDA/WDS Publishing Data Services WG
  2. RDA/WDS Publishing Data IG
    • RDA/WDS Publishing Data Bibliometrics WG
    • RDA/WDS Publishing Data Workflows WG
  3. Infrastructure providers
  4. Infrastructure projects
  5. Related projects
  6. Data Center Community
    • ICSU WDS
    • DataCite
  7. Publisher Community
  8. Institutional Repository Community
    • OpenAIRE
    • SHARE
  9. Discipline-specific Communities
    • Pangaea (Earth and Environmental Science)
    • EBI-EMBL (Life Sciences)
    • ICPSR (Social Sciences)
    • CERN (High Energy Physics)

 

Adoption Plan:

The Adoption Plan for this Working Group is quite mature since it builds on a previous working group, includes adopter work packages, includes outreach and documentation work packages, targets new hubs, and focuses on benefit realisation.

 

Previous Working Group:  The proposed working group builds directly on the Data Publishing Services Working Group which has a considerable membership with an active core of contributors. The WG is representative of publishers, data centres, research organisations and research information infrastructure services who are the key stakeholder and adopter communities. The existing momentum and buy-in of this group will be leveraged for adoption.

 

Technical Development of Hubs: In a similar vein, the WG activity plan includes targeted activity to extend existing hubs (CrossRef, DataCite, OpenAIRE, RMap) and establish new hubs in new community areas (such as Astronomy, Life Sciences).

 

Implementation Sub Projects: The working group case statement “Activities” section provides details of a number of adoption sub projects. The Scholix framework that underpins the WG approach involves content publishers (e.g. journal publishers or data centres) communicating with natural hubs (e.g. CrossRef and DataCite). This WG activity plan includes implementation projects from publisher to hub.

 

Documentation and Support Materials: The WG activity plan includes an extension of the Scholix framework by documenting instantiations of the abstract Scholix information model in various technologies or formats (such as XML, RDF, JSON) and using a number of common protocols (such as open API calls, SPARQL, OAI-PMH, ResourceSync). These specification and implementation materials will also be the product of the development and adoption projects described above.

 

Outreach, Liaison, Collaboration:  This Working Group focuses on a technical solution to the exchange and aggregation of data-literature link information.  Other peak bodies and advocacy groups focus on changing practice and integrating data citation as part of scientific practice.  The WG work plan includes collaboration with those organisations to leverage their established agendas.  Current members of the WG include leaders in these organisations and further such activity is slated in that area of the work plan.

 

Benefit realisation: The sustainable driver of adoption is benefit for the adopter.  The overall work plan is underpinned by the objective of delivering benefits to end users, as outlined in the use cases of the Annex A.

 

Work Plan:

The work plan will be implemented through a set of interconnected activities outlined below. Categories exist only for planning and pragmatic purposes; they are not at all independent and activities will not be siloed.  Cross-category contributions by working group members will be the norm.

Stream 1. Technical Development

The objective of this stream is to put the Scholix framework into practice such that both hubs and services develop operational functionality.

A. Develop Hubs

  1. OpenAIRE
    • Make OpenAIRE APIs compatible with Scholix to export and import links to and from DLI Service
  2. DataCite
    • Further develop standardised interfaces for query and export
  3. CrossRef
    • Further develop standardised interfaces for query and export
  4. New domain-specific hubs, e.g. EMBL/EBI (TBC by opportunity)
  5. Interim hubs (direct feed to DLI): standardisation (using Scholix framework) of feeds from previous working group and improvement of dynamic currency of feeds
    • ANDS to DLI direct (only non-DOI content)
    • ...
  6. Further interoperation of the hubs (extensions to the Scholix conceptual framework during the course of the working group)

B. Develop Services (in relation to the user scenarios defined in previous WG)

  1. DLI aggregation service https://dliservice.research-infrastructures.eu/#/
    • Transition to production at OpenAIRE data centre and infrastructure
    • APIs for PID resolution (Scholix conformant) - Pangaea
    • Improving quality: e.g. de-duplication of objects (datasets and literature)
    • Improving service level: live updates of links
  2. Use of the Scholix framework to access and expose links between articles and data in exemplar end-user services
    • OpenAIRE APIs compatible with Scholix to export and import links to and from DLI Service
    • Data centre/ publisher exemplar projects using DLI as per user scenarios
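
The de-duplication item under the DLI aggregation service can be pictured with a minimal sketch. It assumes that objects are identified by DOIs and that two records with the same DOI, once normalised, describe the same object. The record shape (a dictionary with a `doi` key) and the function names are hypothetical and are not taken from the DLI service.

```python
# Hypothetical sketch: collapse records that share a DOI after normalisation.
DOI_PREFIXES = (
    "https://doi.org/",
    "http://doi.org/",
    "https://dx.doi.org/",
    "http://dx.doi.org/",
    "doi:",
)


def normalise_doi(raw: str) -> str:
    """Lower-case the DOI and strip a resolver or 'doi:' prefix if present."""
    value = raw.strip().lower()  # DOI names are case-insensitive
    for prefix in DOI_PREFIXES:
        if value.startswith(prefix):
            value = value[len(prefix):]
            break
    return value


def deduplicate(records):
    """Keep the first record seen for each normalised DOI, in input order."""
    seen = {}
    for record in records:
        seen.setdefault(normalise_doi(record["doi"]), record)
    return list(seen.values())


records = [
    {"doi": "10.1234/ABC", "title": "first"},
    {"doi": "https://doi.org/10.1234/abc", "title": "duplicate"},
    {"doi": "doi:10.5678/x", "title": "other"},
]
unique = deduplicate(records)
```

Objects without DOIs (for example the non-DOI content mentioned for the interim hubs) would need a different matching rule, which this sketch does not attempt.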

C. Elaborate the Scholix framework

  1. Create profiles of the information model for use in different technologies
    • XML for OAI-PMH
    • JSON for RESTful API
  2. Investigate how best to apply
    • DISCO (through cooperation with RMAP)
    • ResourceSync
    • Others?…(RDF for SPARQL)
  3. Provide documentation and support materials for the above
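
As a rough illustration of what a JSON profile of the information model might involve, the sketch below serialises one data-literature link and reads it back. The field names (`source`, `target`, `relationship`, `provider`, `asserted_on`) and the example identifiers are assumptions made for illustration only; the actual profiles are a deliverable of this stream and may look quite different.

```python
import json
from dataclasses import dataclass, asdict


@dataclass(frozen=True)
class ObjectRef:
    """One end of a link: an article or a data set, named by a PID."""
    pid: str         # persistent identifier (example values below are made up)
    pid_scheme: str  # e.g. "doi"
    kind: str        # "literature" or "dataset"


@dataclass(frozen=True)
class Link:
    """Hypothetical link record; field names are illustrative only."""
    source: ObjectRef
    target: ObjectRef
    relationship: str  # e.g. "references"
    provider: str      # who asserted the link (provenance)
    asserted_on: str   # ISO 8601 date


def to_json(link: Link) -> str:
    # asdict() converts nested dataclasses into nested dictionaries.
    return json.dumps(asdict(link), sort_keys=True)


def from_json(text: str) -> Link:
    raw = json.loads(text)
    return Link(
        source=ObjectRef(**raw["source"]),
        target=ObjectRef(**raw["target"]),
        relationship=raw["relationship"],
        provider=raw["provider"],
        asserted_on=raw["asserted_on"],
    )


example = Link(
    source=ObjectRef("10.1234/article.1", "doi", "literature"),
    target=ObjectRef("10.5678/dataset.9", "doi", "dataset"),
    relationship="references",
    provider="example-hub",
    asserted_on="2016-10-07",
)
round_tripped = from_json(to_json(example))
```

Keeping the provider and assertion date on every record reflects the principle stated earlier that links should carry provenance so that users can make trust and quality decisions.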

 

Stream 2. Community Buy-in

This stream supports buy-in from different communities such that exchange of scholarly link information is implemented and accepted as standard practice.

D. Support Community Adoption:

  1. Create strategies for community adoption:
    • Publishers
    • Data centres
    • Repositories
    • ….
  2. Implement these strategies through:
    • Early adopter groups (e.g. CrossRef early adopters; the Force11 DCIP project; via the THOR project; with COAR)
    • Implementation projects
    • Webinars
    • Presentations
    • Support materials and activities

E. Communicate Broadly

  1. Create communications plans
  2. Implement communications plans

F.  Create Coordination and Governance Materials. Investigate and document issues such as:

  • Quality of data links
  • Requirements to be a hub
  • Access
  • Benefits for contributors
  • Measures of success

 

Key Stakeholder Groups:

The above Activity Plan will be delivered with involvement of the following groups who bring complementary resources, approaches, focus, and expertise.

A. Advocacy and Peak Bodies

  • Force11 (application of data citation standards and advice on implementation standards)
  • CODATA (application of data citation standards and advice on implementation standards)
  • ICSU World Data System (e.g. get more citations into DataCite)
  • STM (outreach, training, Crossref early adopter project)
  • ESIP / COPDESS
  • FAIR Data

B. Other data literature linkage projects

  • National Data Service
  • RMAP (application of DISCO)
  • SHARE
  • RDA Working Groups (Publishing Data IG, ….)

C. Prospective Hubs

  • BIOCaddie (DataMed)
  • EMBL-EBI/ELIXIR
  • NASA ADS

 

Initial Membership

Initial members are coming from the existing working group on an opt-out basis; they will be asked again if they want to join this newly formed working group by e-mail following the RDA

https://rd-alliance.org/groups/rdawds-publishing-data-services-wg.html

 

Adrian Burton

George Mbevi

Kathrin Beck

Paul Dlug

Amir Aryani

Håkan Grudd

Kerstin Helbig

Peter Rose

Amye Kenall

Haralambos Marmanis

Kerstin Lehnert*

Peter Fox

Aris Gkoulalas-Divanis

Howard Ratner

Lars Vilhuber

Rabia Khan

Arnold Rots

Hua Xu

Laura Rueda*

Rainer Stotzka

Arthur Smith

Hylke Koers

Laurel Haak

Richard Kidd

Bernard Avril

Iain Hrynaszkiewicz

Leonardo Candela

Rick Johnson

Carly Strasser

Ian Bruno*

Luiz Olavo Bonino da Silva Santos

Robert Arko

Carole Goble

Ingrid Dillo*

Lyubomir Penev

Rorie Edmunds*

Caroline Martin

Jamus Collier

Mark Donoghue

Sarah Callaghan*

Claire Austin

Jeffrey Grethe

Martin Fenner*

Sheila Morrissey

Claudio Atzori

Jingbo Wang

Martina Stockhause*

Siddeswara Guru

Dan Valen

Jo McEntyre

Michael Diepenbroek*

Simon Hodson*

David Martinsen

Joachim Wackerow

Mohan Ramamurthy

Suenje Dallmeier-Tiessen

David Arctur

Johanna Schwarz

Mustapha Mokrane*

Tim DiLauro

Donatella Castelli

John Helly

Natalia Manola

Timea Biro

Eefke Smit*

Jonathan Tedds

Niclas Jareborg

Tom Demeranville

Elise Dunham

Juanle Wang*

Nigel Robinson

Ui Ikeuchi

Elizabeth Moss

Kate Roberts

Paolo Manghi

William Mischo

Francis ANDRE

Katerina Iatropoulou

Patricia Cruse*

Wouter Haak*

 

 

 

Xiaoli Chen

 

 

 

Yolanda Meleco

* Representatives of a WDS member

 

Initial workstream leads and co-chairs:

    • technical specs and docs (Paolo Manghi)
    • hub development and interoperability (Martin Fenner)
    • Scholix service development (Jeff Grethe)
    • publisher (Iain )
    • repository (Ian Bruno)
    • general outreach (Fiona Murphy)
  1. WG Coordination
    • WG program oversight (Wouter Haak)
    • WG component integration (Adrian Burton)

 

Annex: Use Cases

 

Use Case

Details

 Live linking

As a publisher, I want to know about relevant data for an article that I published so that I can present links to such data sets to the users on my platform

- OR -

As a data center, I want to know about relevant articles for a data set that I published so that I can present links to such articles to the users on my platform

  • Needs to be on-demand, real-time query. Performance is critical.
  • Publisher or data center platform should be able to control UI for smooth platform integration.
  • No need for the service to do any filtering; just return all linked data sets and client can filter as needed.
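
A minimal sketch of the live-linking interaction, under stated assumptions: the service is modelled as an in-memory index keyed by PID (a stand-in for a real query endpoint), it returns every link it holds for that PID, and the client does its own filtering. The class, field names, and identifiers are hypothetical.

```python
from collections import defaultdict
from typing import NamedTuple


class Link(NamedTuple):
    source_pid: str
    target_pid: str
    target_kind: str  # "dataset" or "literature"


class LinkIndex:
    """In-memory stand-in for a link service, keyed by PID for fast lookup."""

    def __init__(self) -> None:
        self._by_source = defaultdict(list)

    def add(self, link: Link) -> None:
        self._by_source[link.source_pid].append(link)

    def links_for(self, pid: str) -> list:
        # No server-side filtering: return everything linked from this PID.
        return list(self._by_source.get(pid, []))


index = LinkIndex()
index.add(Link("doi:10.1234/article.1", "doi:10.5678/dataset.9", "dataset"))
index.add(Link("doi:10.1234/article.1", "doi:10.1234/article.2", "literature"))

# The publisher platform asks for all links, then filters for its own UI.
all_links = index.links_for("doi:10.1234/article.1")
datasets_only = [link for link in all_links if link.target_kind == "dataset"]
```

The sketch indexes links in one direction only; serving the data center variant of the use case would also require lookups by target PID.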

 

Overview

As a data center, I want to obtain a full overview of article/data (and data/data) links for the data sets relevant to me so that I can demonstrate the utility of my data

  • Query should be on-demand, complete, and up-to-date.
  • Precision and comprehensiveness are key
  • Ideally on-demand, pull mechanism.

Notification

As a data center, I want to be alerted that an article may be citing/referencing our data so that I can validate that link and then add it to our own database.

  • For an alerting mechanism, recall is more important than precision (since the data center will still validate)
  • Should be push notifications.
  • Data center needs to be able to selectively receive notifications for their data repository only, need “data center” metadata.
  • This service is not so sensitive to comprehensive coverage
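
The notification requirements can be sketched as a small publish/subscribe mechanism, assuming each candidate link carries the "data center" metadata mentioned above so that delivery can be selective. All names here are hypothetical, and a plain callback stands in for whatever push channel a real service would use.

```python
from collections import defaultdict
from typing import Callable, NamedTuple


class CandidateLink(NamedTuple):
    article_pid: str
    dataset_pid: str
    data_center: str  # metadata used to route the notification


Handler = Callable[[CandidateLink], None]


class Notifier:
    """Pushes candidate links only to handlers of the owning data center."""

    def __init__(self) -> None:
        self._subscribers = defaultdict(list)

    def subscribe(self, data_center: str, handler: Handler) -> None:
        self._subscribers[data_center].append(handler)

    def publish(self, link: CandidateLink) -> int:
        """Deliver the link and return the number of handlers notified."""
        handlers = self._subscribers.get(link.data_center, [])
        for handler in handlers:
            handler(link)
        return len(handlers)


inbox = []  # the data center's queue of links awaiting manual validation
notifier = Notifier()
notifier.subscribe("center-a", inbox.append)

delivered_a = notifier.publish(
    CandidateLink("doi:10.1234/article.1", "doi:10.5678/dataset.9", "center-a")
)
delivered_b = notifier.publish(
    CandidateLink("doi:10.1234/article.3", "doi:10.9999/dataset.2", "center-b")
)
```

Because the data center validates each link before storing it, the publisher side can afford to send uncertain candidates, which matches the preference for recall over precision.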

Exploration

As a researcher interested in a particular topic of study, I want to be able to explore a relevant article/data graph so that I can find the articles or data sets that I am interested in.

  • General “research” use case, could apply to individual researchers, data repositories, and others.
  • Requires a lot of freedom to do exploration at the user’s terms
  • Would expect the user in this case is highly tech-savvy and will want to create their own search logic using a minimal “hopping service” that exposes a set of links given an article or data set PID.
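
To make the "hopping service" idea concrete, the sketch below shows how a user might build their own search logic on top of a single call that returns the PIDs linked to a given PID. The `hop` function and the small graph behind it are hypothetical stand-ins for a remote service; the traversal itself is an ordinary breadth-first search with a hop limit.

```python
from collections import deque
from typing import Callable, Iterable


def explore(start_pid: str,
            hop: Callable[[str], Iterable[str]],
            max_hops: int) -> dict:
    """Breadth-first walk over the link graph.

    Returns each reachable PID mapped to its hop distance from start_pid,
    visiting nothing further than max_hops away.
    """
    distance = {start_pid: 0}
    queue = deque([start_pid])
    while queue:
        pid = queue.popleft()
        if distance[pid] == max_hops:
            continue  # do not expand beyond the hop limit
        for neighbour in hop(pid):
            if neighbour not in distance:
                distance[neighbour] = distance[pid] + 1
                queue.append(neighbour)
    return distance


# Hypothetical link graph standing in for the remote hopping service.
GRAPH = {
    "article:1": ["dataset:A", "dataset:B"],
    "dataset:A": ["article:1", "article:2"],
    "dataset:B": ["article:1"],
    "article:2": ["dataset:A", "dataset:C"],
    "dataset:C": ["article:2"],
}


def hop(pid: str) -> list:
    return GRAPH.get(pid, [])


reachable = explore("article:1", hop, max_hops=2)
```

With this graph, two hops from `article:1` reach the second article through the shared data set `dataset:A`, while `dataset:C` only appears once the limit is raised to three.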

 

 

Review period start: 
Friday, 7 October, 2016
    Author: Debjani Deb

    Date: 06 Feb, 2017

    Hi All,

    I am trying to understand the Scholix framework and how to implement it for my data centre. Any help on this will be greatly appreciated. I will need some hand-holding at the beginning as I am totally lost on where to start.

     

    Thanks

    Debjani Deb

    Scientist and Data Co-ordinator, ORNL DAAC 

    Environmental Sciences Division 

    Oak Ridge National Laboratory 

     
