Bibliometric indicators are essential for obtaining quantitative measures of the quality of research and researchers and of the impact of research products. Systems and services such as the ISI’s Science Citation Index, the h-index (or Hirsch number), and the impact factor of scientific journals have been developed to track and record access to and citation of scientific publications. These indicators are widely used by investigators, academic departments and administrations, funding agencies, and professional societies across all disciplines to assess the performance of individuals or organizations within the research endeavour. They inform and influence the advancement of academic careers and the investment of research funding, and thus play a powerful role in the overall scientific endeavour.
The basic idea of bibliometrics is to evaluate the attention scientific publications receive within the scientific community. The classical approach is based on counting formal citations in the literature, and despite various critical aspects (ambiguity of authorship, self-citations, etc.), these indicators have become widely adopted across all of science. Similar indicators for the value and impact of data publications are needed to raise the value and appreciation of data and data sharing, because the missing recognition for data publication in science is seen as the major cause of the reluctance of data producers to share their data. The overall objective of this working group is therefore to conceptualize data metrics and corresponding services that are suitable to overcome existing barriers and thus likely to initiate a cultural change among scientists, encouraging more and better data citations, augmenting the overall availability and quality of research data, increasing data discoverability and reuse, and facilitating the reproducibility of research results.
In principle, existing metrics for scientific papers could also be applied to data publications. However, extrapolating the classical bibliometric approach to research data is difficult to realize because:
● Citing data is not a standard practice in the scientific community. At present, references to data in the literature are rare and do not follow a generally agreed schema. No recommended best practices for data citation exist. This is also true for data products compiled, in general, from already published data.
● There is a large variety in the structure and practices of data repositories. Many repositories are not prepared for the data publishing concept, and have not implemented formal data publication procedures. Granularity, versioning, persistent identification, metadata, and review of data entities are among the unresolved issues.
Besides the classical approach, various alternative metrics for data have evolved in recent years. These so-called ‘altmetrics’ are based on data usage analysis (excluding citations as indicators of usage) and content evaluation, quantified e.g. through dataset downloads or analysis of annotations of datasets by users (social tagging). However, applications of existing solutions are isolated and scarcely comparable, and thus are currently not usable as a basis for representative indicators. Nevertheless, given the potential and dynamics behind these developments, altmetrics need to be considered as serious concepts alongside the classical approach.
Any approach to data metrics needs to address the challenge of a cultural change in science toward full appreciation and recognition of data as an essential part of the scholarly record. Metrics for data need to be designed and conceived in a way that all stakeholders will embrace them as credible, valuable, and meaningful.
In summary, there is at present no generally acknowledged metric for data. This Working Group will bring together the essential stakeholders in this field, investigate the requirements, and recommend the necessary steps to be taken. Activities will address different levels:
● Organizational: What are the overall changes in the scholarly publishing system needed to foster proper attribution of datasets? Which are the building blocks for an optimal system? Which changes are needed from funders, data centres, science publishers, and science service providers? What is the optimal way of interaction between stakeholders? Do we need commonly operated services?
● Technical: Which are the technical components, interfaces, and standards that need to be developed and used? What current capabilities can be adopted as solutions, and what is missing?
● Methodological: What methodologies for data metrics need to be developed? What are the costs and benefits of altmetrics versus traditional processes? What research into indicators is needed and what are the strengths and weaknesses of individual indicators?
● Financial: What are the costs of data metrics (seen as a cost component of data publication)? Who will pay for them?
This Bibliometrics WG is part of the overarching RDA-WDS IG on Data Publishing and as such covers a particular thematic field. On the one hand, the group relies in part on the results from the other groups, in particular the workflows WG; on the other hand, it delivers results to the other groups, in particular to the data publishing services WG.
Good and practicable bibliometrics are fundamental for establishing data publication and data sharing as a recognized contribution to science. This is a prerequisite for realizing the vision of an open, comprehensive, global knowledge base of scientific data as the new paradigm for scientific discovery in the 21st century.
Who will benefit
Data bibliometrics will allow data producers, data centres/publishers, data managers, research facilities and academic institutions, science publishers, and funding agencies to demonstrate quantitatively and formally the significance and viability of data to the advancement of science.
We anticipate that bibliometrics for data will have a profound impact on the willingness of researchers to make their data openly accessible, on the availability of sustained funding for data centres, and on the changes needed in academic institutions to formally acknowledge data contributions as part of the scholarly record used in tenure and promotion evaluations. These anticipated cultural changes will likely lead to a rapid growth of available data.