RDA WGDC Paper summarizing adoption stories and lessons learned published
Dear all,
I am very happy to inform you that the paper jointly written by many of
the adopters of the recommendations on dynamic data citation has been
published in the Harvard Data Science Review:
Rauber, A., Gößwein, B., Zwölf, C. M., Schubert, C., Wörister, F.,
Duncan, J., … Parsons, M. A. (2021). Precisely and Persistently
Identifying and Citing Arbitrary Subsets of Dynamic Data. Harvard Data
Science Review, 3(4).
https://doi.org/10.1162/99608f92.be565013
https://hdsr.mitpress.mit.edu/pub/si7wzxxa/release/2?readingCollection=1…
(Abstract attached below)
It provides a summary of the recommendations, describes the reference
implementations for different types of data and, most importantly,
presents a number of implementations that have been deployed in various
infrastructures. It is a kind of "opus magnum" for this WG, collecting
much of the work done in a single report and showing how far we have
come: from publishing the recommendations with a few initial
demonstrators, via proper reference implementations, to infrastructures
that have actually deployed them in practice and put them into full
operation.
My thanks go, specifically, to all the WG members who have put their
trust in the recommendations, put in the effort to actually implement and
deploy them, and contributed to this paper summarizing all the lessons
learned so far. Thank you so much!
I hope this paper is useful for other institutions that want to embark on
implementing the recommendations, providing templates and points of
reference. We will, of course, continue to support such adoptions, and
we would be happy to learn about other adoptions taking place,
collecting them and sharing information about them: initially, e.g., via
our webinar series, and then perhaps in a sequel to this paper. We have
been a bit quiet during the last plenaries, as these online meetings are
not the most efficient mechanism for discussion, especially when many
sessions are squeezed into a tight week. But we would be very happy to
pick up the webinar series again if you have any adoption stories to
share.
Best regards,
Andreas Rauber
———————-
Abstract:
Precisely identifying arbitrary subsets of data so that these can be
reproduced is a daunting challenge in data-driven science, the more so
if the underlying data source is dynamically evolving. Yet an increasing
number of settings exhibit exactly those characteristics. Larger amounts
of data are being continuously ingested from a range of sources (be it
sensor values, online questionnaires, documents, etc.), with error
correction and quality improvement processes adding to the dynamics.
Yet, for studies to be reproducible, for decision-making to be
transparent, and for meta studies to be performed conveniently, having a
precise identification mechanism to reference, retrieve, and work with
such data is essential. The Research Data Alliance (RDA) Working Group
on Dynamic Data Citation has published 14 recommendations that are
centered around time-stamping and versioning evolving data sources and
identifying subsets dynamically via persistent identifiers that are
assigned to the queries selecting the respective subsets. These
principles are generic and work for virtually any kind of data. In the
past few years numerous repositories around the globe have implemented
these recommendations and deployed solutions. We provide an overview of
the recommendations, reference implementations, and pilot systems
deployed and then analyze lessons learned from these implementations.
This article provides a basis for institutions and data stewards
considering adding this functionality to their data systems.
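To make the mechanism named in the abstract a bit more concrete (time-stamped, versioned data plus persistent identifiers assigned to the queries that select a subset), here is a minimal Python sketch. It is not one of the reference implementations described in the paper; the class and function names, the `hypothetical-pid:` prefix, and the lower-case/whitespace query normalization are all illustrative assumptions, and a Python predicate stands in for the stored query text (e.g. SQL) that a real system would re-execute.

```python
import hashlib
import json
import uuid
from dataclasses import dataclass, field
from typing import Callable, Dict, List, Optional


@dataclass
class Row:
    key: str
    value: dict
    valid_from: int                  # timestamp at which this version was added
    valid_to: Optional[int] = None   # None means "still the current version"


@dataclass
class VersionedTable:
    rows: List[Row] = field(default_factory=list)

    def _close_current(self, key: str, ts: int) -> None:
        # Mark the current version of `key` as superseded instead of erasing it.
        for r in self.rows:
            if r.key == key and r.valid_to is None:
                r.valid_to = ts

    def upsert(self, key: str, value: dict, ts: int) -> None:
        self._close_current(key, ts)
        self.rows.append(Row(key, value, ts))

    def delete(self, key: str, ts: int) -> None:
        # Deletion only closes the validity interval; the old version is kept.
        self._close_current(key, ts)

    def as_of(self, ts: int) -> List[Row]:
        # Reconstruct exactly the rows that were visible at time `ts`.
        return [r for r in self.rows
                if r.valid_from <= ts and (r.valid_to is None or r.valid_to > ts)]


def result_hash(rows: List[Row]) -> str:
    # Order-independent fingerprint of a subset, used to verify re-execution.
    ordered = sorted(rows, key=lambda r: r.key)
    payload = json.dumps([[r.key, r.value] for r in ordered], sort_keys=True)
    return hashlib.sha256(payload.encode("utf-8")).hexdigest()


def normalize(query_text: str) -> str:
    # Hypothetical normalization: lower-case and collapse whitespace.
    return " ".join(query_text.lower().split())


@dataclass
class QueryRecord:
    pid: str
    query: str
    timestamp: int
    digest: str
    predicate: Callable[[dict], bool]


class QueryStore:
    def __init__(self, table: VersionedTable) -> None:
        self.table = table
        self.records: Dict[str, QueryRecord] = {}

    def _run(self, predicate: Callable[[dict], bool], ts: int) -> List[Row]:
        hits = [r for r in self.table.as_of(ts) if predicate(r.value)]
        return sorted(hits, key=lambda r: r.key)

    def cite(self, query_text: str, predicate: Callable[[dict], bool], ts: int) -> str:
        # Execute the query against the versioned data as of time `ts`.
        subset = self._run(predicate, ts)
        query, digest = normalize(query_text), result_hash(subset)
        # Reuse an existing PID if the same query already yielded the same result.
        for rec in self.records.values():
            if rec.query == query and rec.digest == digest:
                return rec.pid
        pid = "hypothetical-pid:" + uuid.uuid4().hex
        self.records[pid] = QueryRecord(pid, query, ts, digest, predicate)
        return pid

    def resolve(self, pid: str) -> List[Row]:
        # Re-execute the stored query against the data as it was at citation time.
        rec = self.records[pid]
        subset = self._run(rec.predicate, rec.timestamp)
        if result_hash(subset) != rec.digest:
            raise ValueError("re-executed subset does not match the stored hash")
        return subset
```

In use, a subset cited at time 2 still resolves to the same rows after later updates and deletions, because the data is never overwritten in place and the PID points to the timestamped query rather than to a copy of the data:

```python
t = VersionedTable()
t.upsert("s1", {"temp": 20}, ts=1)
t.upsert("s2", {"temp": 25}, ts=1)
store = QueryStore(t)
pid = store.cite("temp > 22", lambda v: v["temp"] > 22, ts=2)
t.upsert("s1", {"temp": 30}, ts=3)   # the data keeps evolving after citation
t.delete("s2", ts=4)
[r.key for r in store.resolve(pid)]  # still the subset seen at time 2
```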