In the software citation rabbit hole (Was: Re: [fair4rs] Newly published - TU Delft Research Software Policy and...)

14 Apr 2021

Dear Paula, dear all,
software citation has been a hot topic for a few years now, and there
are quite a few different takes on it in the growing literature that is
becoming available.
When one looks closely at it, though, it turns out that this issue is
remarkably complex, and despite significant efforts and good will spent on
it, we are still far from having resolved it satisfactorily. Let me try to
summarise some of the key points I see.
At first sight, a "*software citation*" seems a simple matter: one just
wants to know how to mention a software project in the bibliography of an
article. Here is an example of how I do this currently using the
biblatex-software package (see the
entries 2, 6 and 7 below), extracted from the article
*[Rp] Reproducing and replicating the OCamlP3l experiment*
ReScience C, 6 (1), 2020
[image: screenshot of the article's bibliography, showing the software entries]
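For those who have not seen the package in action, the general shape of such entries is sketched below. This is an illustrative sketch from memory of the biblatex-software manual, with placeholder values (project name, author, URL and SWHID are all made up); please check the package documentation for the authoritative list of entry types and fields.

```bibtex
% Illustrative sketch only: entry types and field names as I recall them
% from the biblatex-software manual; all values are placeholders.
@software{myproject,
  title  = {MyProject},
  author = {Doe, Jane},
  date   = {2020},
  url    = {https://example.org/myproject}
}

@softwareversion{myproject-1.0,
  title    = {MyProject},
  author   = {Doe, Jane},
  version  = {1.0},
  date     = {2020-06-01},
  url      = {https://example.org/myproject},
  swhid    = {swh:1:rel:0000000000000000000000000000000000000000},
  crossref = {myproject}
}
```

The point of the dedicated entry types is precisely to separate the project as a whole from the exact version used, which plain @misc entries cannot express.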
But if you dig a bit, you will quickly discover that under the term "*software
citation*" people tend to conflate at least *four different and distinct
concerns*:
- *archival* of the software, to make it available for the long term
- *reference* to the software, ensuring one identifies the exact version
that is used or mentioned, for reproducibility
- *description* of the software, with properly curated metadata about
the software
- *credit* given to the people involved in the software project
These four concerns (aka *ARDC*) are analysed in depth in the EOSC report
on Scholarly Infrastructures for Research Software, published last December
and available from the Publications Office of the EU; it provides
actionable recommendations to improve the current situation by leveraging
and interconnecting existing infrastructures.
Addressing the first two concerns, *archival and reference of software
artifacts*, is the easiest part, via the Software Heritage universal
software archive. You can find detailed guidelines with a running example
in this article (available in open access):
*Archiving and referencing source code with software heritage*. In ICMS,
volume 12097 of Lecture Notes in Computer Science, pages 362--373.
Springer, 2020
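To make the *reference* part concrete: Software Heritage intrinsic identifiers (SWHIDs) follow a simple, stable syntax that can be checked in a few lines of Python. The sketch below covers only the core grammar plus `;key=value` qualifiers (the full specification defines more detail); the example identifier and origin are illustrative values.

```python
import re

# Core SWHID syntax: swh:1:<object-type>:<40-hex-digit hash>, optionally
# followed by ;qualifier=value pairs (e.g. origin, lines). Illustrative
# sketch only; see the SWHID specification for the full grammar.
CORE = re.compile(r"^swh:1:(cnt|dir|rev|rel|snp):([0-9a-f]{40})$")

def parse_swhid(swhid: str) -> dict:
    """Split a SWHID into its core identifier and qualifier dict."""
    core, *quals = swhid.split(";")
    m = CORE.match(core)
    if not m:
        raise ValueError(f"not a valid core SWHID: {core}")
    qualifiers = dict(q.split("=", 1) for q in quals)
    return {"type": m.group(1), "hash": m.group(2), "qualifiers": qualifiers}

# Example values for illustration only.
example = ("swh:1:cnt:94a9ed024d3859793618152ea559a168bbcbb5e2"
           ";origin=https://example.org/myproject")
parsed = parse_swhid(example)
```

Because the hash is computed from the content itself, such an identifier pins down the exact version used, which is what reproducibility requires.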
Concerning the *description of software*, a significant standardisation
effort has been made around the CodeMeta initiative, which maintains a
correspondence table between metadata vocabularies (tools to convert
from/to other metadata formats exist). An open source tool that makes it
easy to generate, edit and validate metadata files is now available (you
can play with it online), and it can be incorporated in any other
web-based workflow, as it is designed as a simple HTML+JS page.
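As an illustration, a CodeMeta description is just a small JSON-LD file. The sketch below uses a handful of CodeMeta 2.0 terms as I recall them (the project name, author and URLs are placeholders); the generator tool mentioned above produces files of exactly this kind.

```json
{
  "@context": "https://doi.org/10.5063/schema/codemeta-2.0",
  "@type": "SoftwareSourceCode",
  "name": "MyProject",
  "version": "1.0",
  "codeRepository": "https://example.org/myproject.git",
  "license": "https://spdx.org/licenses/GPL-3.0-or-later",
  "author": [
    {
      "@type": "Person",
      "givenName": "Jane",
      "familyName": "Doe"
    }
  ]
}
```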
Quality metadata does not come for free, though: here is where *scholarly
repositories play a significant role* when they implement proper moderation
and curation mechanisms to ensure we do not get garbage in the loop. You
can find an extensive description of the process that has been put in place
for the national open access archive in France (HAL) here.
*Curated Archiving of Research Software Artifacts: Lessons Learned from the
French Open Archive (HAL).* International Journal of Digital Curation, 15
(1), pp. 16, 2020.
Finally, when it comes to *giving credit to the people* involved in a
software project, it is really important to take the time to understand *who
should get credit for what, and how*.
For the "*who gets credit for what*" part, I strongly recommend reading the
following article, which provides an overview of the best practices used
for over a decade in the career evaluation of researchers at Inria (the
national research center in Informatics in France, which has originated a
large number of landmark open source software projects over 50+ years, like
OCaml, Coq and Scikit-learn) and also at CNRS (the national research
organization in France):
*Attributing and Referencing (Research) Software: Best Practices
and Outlook From Inria*
Computing in Science & Engineering, 22 (1), pp. 39-52, 2020 (green open
access).
One of the lessons learned is that there is a multiplicity of *roles* at
play in a software project, and the simple term "*author*", even with the
addition of "*contributor*", is definitely not enough to capture them. The
article above clearly identifies *9 key roles*, based on extensive
real-world experience, and we can hope they will soon be incorporated into
mainstream use outside France as well.
This is a complex issue, and it definitely cannot be solved by automated
tools that try to extract an "author list" from the history of git commits
in a repository (a very bad idea that has gotten some good press some time
ago).
The "*how one gets credit*" part is one of the most difficult issues of
all: we have seen the terrible damage inflicted on the research community
by the abuse of bibliometric indicators for articles (see the DORA
declaration, though I particularly cherish this older article from the CS
community), and I definitely do not want to be part of any effort that
would replicate the same schema in the area of software, where counting
numbers of "citations" may be infinitely more damaging. Quoting the
above-mentioned EOSC SIRS report:
*Metrics should not be reduced to simple numeric indicators, to avoid
reproducing in the research software world the negative effect that
bibliographic indicators have had in the research publishing world. It is
necessary to bring together a broad spectrum of expertise, and include in
the conversation representatives of the research community that will be
directly impacted by the creation of these metrics.*
All the best,
Computer Science Professor
(on leave at Inria from IRIF/Université de Paris)
Software Heritage E-mail : ***@***.***
Bureau C328 Twitter :
2, Rue Simone Iff Tel : +33 1 80 49 44 42
CS 42112
75589 Paris Cedex 12
GPG fingerprint 2931 20CE 3A5A 5390 98EC 8BFC FCCA C3BE 39CB 12D3
On Wed, 14 Apr 2021 at 10:29, orchid00 via FAIR for Research Software
(FAIR4RS) WG <***@***.***> wrote: