FAIR Principles for Research Software (FAIR4RS Principles)

10 Jun 2021


By Paula Andrea Martinez


FAIR for Research Software (FAIR4RS) WG
Group co-chairs: Michelle Barker, Paula Andrea Martinez, Leyla Garcia, Daniel S. Katz, Neil Chue Hong, Jennifer Harrow, Fotis Psomopoulos, Carlos Martinez-Ortiz, Morane Gruenpeter

Recommendation Title: FAIR Principles for Research Software (FAIR4RS Principles)

Authors: Neil P. Chue Hong*, Daniel S. Katz*, Michelle Barker*; Anna-Lena Lamprecht, Carlos Martinez, Fotis E. Psomopoulos, Jen Harrow, Leyla Jael Castro, Morane Gruenpeter, Paula Andrea Martinez, Tom Honeyman; Alexander Struck, Allen Lee, Axel Loewe, Ben van Werkhoven, Catherine Jones, Daniel Garijo, Esther Plomp, Francoise Genova, Hugh Shanahan, Joanna Leng, Maggie Hellström, Malin Sandström, Manodeep Sinha, Mateusz Kuzak, Patricia Herterich, Qian Zhang, Sharif Islam, Susanna-Assunta Sansone, Tom Pollard, Udayanto Dwi Atmojo; Alan Williams, Andreas Czerniak, Anna Niehues, Anne Claire Fouilloux, Bala Desinghu, Carole Goble, Céline Richard, Charles Gray, Chris Erdmann, Daniel Nüst, Daniele Tartarini, Elena Ranguelova, Hartwig Anzt, Ilian Todorov, James McNally, Javier Moldon, Jessica Burnett, Julián Garrido-Sánchez, Khalid Belhajjame, Laurents Sesink, Lorraine Hwang, Marcos Roberto Tovani-Palone, Mark D. Wilkinson, Mathieu Servillat, Matthias Liffers, Merc Fox, Nadica Miljković, Nick Lynch, Paula Martinez Lavanchy, Sandra Gesing, Sarah Stevens, Sergio Martinez Cuesta, Silvio Peroni, Stian Soiland-Reyes, Tom Bakker, Tovo Rabemanantsoa, Vanessa Sochat, Yo Yehudi

(*) Lead authors with equal contributions

Impact: Adoption and implementation of the FAIR for research software principles will create significant benefits for many stakeholders, including increased research reproducibility for research organisations, better practices and more software usage for its developers, clarity for funders around their own policies and requirements for software investments, and guidelines for publishers on sharing requirements.

This work will be of value to software project owners, researchers, users of research data and software, the scientific community, research software engineers, software developers who publish their software, software catalogue maintainers, repository managers, software preservation and archival experts, policymakers who are responsible for defining digital policies, and organisations that create, modify, manage, share, protect, and preserve research software, funders of research, and others with an interest in the FAIR principles for research software.

DOI: 10.15497/RDA00065

Citation and download: Chue Hong, N. P., Katz, D. S., Barker, M., Lamprecht, A.-L., Martinez, C., Psomopoulos, F. E., Harrow, J., Castro, L. J., Gruenpeter, M., Martinez, P. A., Honeyman, T., et al. (2021). FAIR Principles for Research Software (FAIR4RS Principles). Research Data Alliance. DOI: 10.15497/RDA00065

 

Abstract:

Research software is a fundamental and vital part of research worldwide, yet there remain significant challenges to software productivity, quality, reproducibility, and sustainability. Improving the practice of scholarship is a common goal of the open science, open source software and FAIR (Findable, Accessible, Interoperable and Reusable) communities, but improving the sharing of research software has not yet been a strong focus of the latter.

To improve the FAIRness of research software, the FAIR for Research Software (FAIR4RS) Working Group has sought to understand how to apply the FAIR Guiding Principles for scientific data management and stewardship to research software, bringing together existing and new community efforts. Many of the FAIR Guiding Principles can be directly applied to research software by treating software and data as similar digital research objects. However, specific characteristics of software — such as its executability, composite nature, and continuous evolution and versioning — make it necessary to revise and extend the principles.

This document presents the first version of the FAIR Principles for Research Software (FAIR4RS Principles). It is an outcome of the FAIR for Research Software Working Group (FAIR4RS WG).

The FAIR for Research Software Working Group is jointly convened as an RDA Working Group, FORCE11 Working Group, and Research Software Alliance (ReSA) Task Force.

 

 

Output Status: Recommendations with RDA Endorsement in Process
Review period: Friday, 11 June, 2021 to Sunday, 11 July, 2021
Group content visibility: Public - accessible to all site users
Primary Domain/Field of Expertise: Social Sciences, Natural Sciences, Engineering and Technology, Medical and Health Sciences, Agricultural Sciences, Humanities
Domain Agnostic: Domain Agnostic
File: FAIR4RS_Principles_v0.3_RDA-RFC.pdf (781.84 KB)
  • Author: Keith Russell

    Date: 14 Jun, 2021

    Hi all,

    Thank you for a really interesting translation of the FAIR principles for software. I like the solutions for addressing the fact that Accessible, Interoperable and (Re-)Usable are different things for software. One thing I wondered about is whether it would be worth explicitly mentioning use of software for specific analysis and therefore include links to identifiers of articles and data sets. I think you do already cover that to some extent under Interoperable, but might it be worth a mention under Provenance?

    I noted one unfinished line "would not have responsibility for making the depende....."

    But again, really interesting and great work.

    Regards

    Keith

     

  • Author: Neil Chue Hong

    Date: 25 Jun, 2021

    Dear Keith,

    on behalf of the FAIR4RS drafting team, thank you for your comment.
     


    One thing I wondered about is whether it would be worth explicitly mentioning use of software for specific analysis and therefore include links to identifiers of articles and data sets. I think you do already cover that to some extent under Interoperable, but might it be worth a mention under Provenance?

    As you note, we partially address this under "I2. Software includes qualified references to other objects".

    Under "R1.2 Software is associated with detailed provenance" (which should be read in conjunction with R1), our intent was that this should primarily cover the provenance of the software itself, rather than the software's role in the provenance of analysis.

    I think the suggestion you are making is implicitly covered by "R1. Software is described with a plurality of accurate and relevant attributes", but I wanted to clarify whether you meant that software is more reusable if it explicitly includes links (with identifiers) to articles and data sets that show its use for a specific type of analysis (effectively documenting its use for a type of research)? Or did you mean something else?


    I noted one unfinished line "would not have responsibility for making the depende....."

    Good spot! That sentence should read "This would ultimately be intractable as the authors of the software would not have responsibility for making the dependencies FAIR."

  • Author: Keith Russell

    Date: 06 Jul, 2021

    Hi Neil,

    To answer your question: 

    'whether you meant that software is more reusable if it explicitly includes links (with identifiers) to articles and data sets that show its use for a specific type of analysis (effectively documenting its use for a type of research)? Or did you mean something else?'   That is indeed what I meant.

    R1 in the FAIR principles is a bit of a tricky one, as it is often just viewed as a header. The real principles are the two sub-principles R1.1 and R1.2. So to cover these rich attributes, they would need to be covered under Provenance.

    For a researcher trying to judge whether they can re-use a piece of software for their purpose, having a link to the article describing the analysis and the data set used in the analysis is extremely valuable; it will give them a much better view of the purpose for which the software was designed. It is great that you are asking for these identifiers under I2. I just wonder whether it would hurt to be more explicit that such information is invaluable for increasing the reusability, and not just the interoperability, of the software. This is possibly more a matter for the guiding documentation than something to be covered in the principles themselves.

    I hope this is helpful. 
    Kind regards,
    Keith

     

     

     

  • Author: Joachim Wuttke

    Date: 14 Jun, 2021

    »To support a wide range of reuse scenarios, the license should be as open as possible« [R1.1] - Adoption of this rule would preclude voluntary choice of the GPL.

    Many researchers consciously choose the GPL. The aporia here is about how to support reuse. Is software reuse at large supported best by allowing unconditional use of our creations? Or by the reasonable and fair demand that those who reuse our code also allow reuse of their extensions?

    Whatever your stance on this, you should make it explicit, and not advise against the GPL without proper explanation.

  • Author: Neil Chue Hong

    Date: 25 Jun, 2021

    Dear Joachim,

    on behalf of the FAIR4RS drafting team, thank you for your comment.


    »To support a wide range of reuse scenarios, the license should be as open as possible« [R1.1] - Adoption of this rule would preclude voluntary choice of the GPL.

    Many researchers consciously choose the GPL. The aporia here is about how to support reuse. Is software reuse at large supported best by allowing unconditional use of our creations? Or by the reasonable and fair demand that those who reuse our code also allow reuse of their extensions?

    Thanks for raising this - it was something that came up in drafting discussions. We are conscious that the original intent of the FAIR principles was "as open as possible, as closed as necessary". Therefore we've drafted the principle itself to only say "R1.1 Software must have a clear and accessible license." In the explanatory text, we've said "To support a wide range of reuse scenarios, the license should be as open as possible. This license must also be compatible with the requirements of the licenses of the software’s dependencies so that the software can be legally combined." The intent here is to persuade people to use open source licenses where possible, not to preclude the use of the GPL. Personally, I would consider both permissive and copyleft licenses to be "as open as possible". We will make this clearer in the guidance that is used by communities.

    If you have a suggestion of a specific rewording to improve the explanatory text itself, please do let us know.

  • Author: Limor Peer

    Date: 15 Jun, 2021

    Thank you for producing a very comprehensive and clear document. I'm pleased to see language in this version that refers to the shared responsibility for applying FAIR4RS Principles -- I think it's important to emphasize that while the primary responsibility lies with software creators and owners, it often falls to those tasked with quality review and stewardship (who are really the first users) to follow through. I suggest also referencing this issue, and the need to build capacity for this type of work, in the section on the path to adoption. Thanks again for great work!

  • Author: Neil Chue Hong

    Date: 25 Jun, 2021

    Dear Limor,

    on behalf of the FAIR4RS drafting team, thank you for your comment.

    You make a very good point, and we will add a reference to this issue in the section on path to adoption as you suggest.

  • Author: Yo Yehudi

    Date: 21 Jun, 2021

    These principles are very clear and well laid out - two small comments, both about possible examples of the principles:

    A1 talks about protocols to access software. I wasn't sure if this meant something like git or https, or whether it meant a defined process document on a website, or something else. Maybe it meant all of those? :)

     

    Similarly, F4: Metadata are FAIR and indexable. I couldn't decide based on this if publishing a software artifact as a ZIP on Zenodo, with embedded .cff, might be compliant with this rule, or if perhaps I am supposed to upload the cff itself to a repo somewhere.... or maybe something else? I broadly understand the _intention_ of this rule but struggled a little to understand the specifics about how one might meaningfully comply.

    Other than that I thought the rules were really clear and was delighted to see the note about overloading accessibility as a term :) it's too little loved as it is and I dread seeing it dropped in favour of FAIR accessibility.

  • Author: Neil Chue Hong

    Date: 25 Jun, 2021

    Dear Yo,

    on behalf of the FAIR4RS drafting team, thank you for your comments.


    A1 talks about protocols to access software. I wasn't sure if this meant something like git or https, or whether it meant a defined process document on a website, or something else. Maybe it meant all of those? :)

    There was considerable debate during the initial discussions and consultations about how to best interpret this specific principle from the way it is phrased for FAIR data. For much software, there are very commonly used technical communications protocols such as git or https to gain access to the software. Arguably (though we did not include this in the explanatory text) a line of text on a website saying "email the author at this address and you'll be emailed the code" could fulfil  A1, A1.1 and A1.2.

    It would be good to get a sense from the community about whether there is broad commonality around how people normally get access to software, in which case we can be more specific in the explanatory text.
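
    As a purely illustrative sketch of one common case (it is not taken from the explanatory text), retrieval by identifier over a standardised protocol can be as simple as turning a DOI into an https URL on the public doi.org resolver. The function name `doi_to_url` is hypothetical; the DOI used in the example is the one assigned to this document.

```python
from urllib.parse import quote


def doi_to_url(doi: str) -> str:
    """Build the https URL that resolves a DOI via the public doi.org resolver."""
    bare = doi.strip()
    # Accept both bare DOIs and the common "doi:" prefix.
    if bare.lower().startswith("doi:"):
        bare = bare[4:].strip()
    # Percent-encode unusual characters, but keep the "/" between prefix and suffix.
    return "https://doi.org/" + quote(bare, safe="/")


if __name__ == "__main__":
    print(doi_to_url("10.15497/RDA00065"))
```

    Other access routes mentioned above, such as git or an emailed request, would need a different mapping from identifier to access mechanism.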


    Similarly, F4: Metadata are FAIR and indexable. I couldn't decide based on this if publishing a software artifact as a ZIP on Zenodo, with embedded .cff, might be compliant with this rule, or if perhaps I am supposed to upload the cff itself to a repo somewhere.... or maybe something else? I broadly understand the _intention_ of this rule but struggled a little to understand the specifics about how one might meaningfully comply.

     I think the challenge here is that we don't have enough community practice around this yet. Adding metadata in a file to a repository would certainly comply, as would registering the metadata when depositing the software in e.g. Zenodo. 

    If you have some suggestions for the kind of guidance that you'd like to see to help you apply this principle, it would be really helpful for us as we develop training and guidance materials.
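
    To make "adding metadata in a file to a repository" concrete, here is a minimal sketch, assuming the Citation File Format (CFF) 1.2.0 that the question refers to. The helper `make_citation_cff`, the software title, and the author name are all hypothetical placeholders, and the sketch does not escape quotation marks inside values.

```python
def make_citation_cff(title: str, authors: list[dict], version: str) -> str:
    """Return the text of a minimal CITATION.cff file (CFF schema version 1.2.0)."""
    lines = [
        "cff-version: 1.2.0",
        'message: "If you use this software, please cite it as below."',
        f'title: "{title}"',
        f'version: "{version}"',
        "authors:",
    ]
    # Each author becomes one YAML list item with family and given names.
    for author in authors:
        lines.append(f'  - family-names: "{author["family"]}"')
        lines.append(f'    given-names: "{author["given"]}"')
    return "\n".join(lines) + "\n"


if __name__ == "__main__":
    # Hypothetical software and author, for illustration only.
    text = make_citation_cff("example-tool", [{"family": "Doe", "given": "Jane"}], "0.1.0")
    print(text)
```

    The resulting text would be saved as CITATION.cff at the top level of the repository. Whether that alone satisfies F4 is exactly the open question raised above.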


    Other than that I thought the rules were really clear and was delighted to see the note about overloading accessibility as a term :) it's too little loved as it is and I dread seeing it dropped in favour of FAIR accessibility.

    There is definitely a need for follow-on work to decide what principles are required to complement FAIR for research software in the same way that, for instance, the CARE principles extend and complement FAIR for data. 

  • Author: Nicola Soranzo

    Date: 24 Jun, 2021

    Thanks for working on this important topic! I haven't had the time to read it through yet, but just noticed a copy-paste typo at page 23: "F4. Metadata are FAIR and is searchable and indexable." should be "F4. Metadata are FAIR and are searchable and indexable."

    Also a question: for "R1.1. Software must have a clear and accessible license.", did you consider the license proliferation issue? This affects the reusability of the software when used as a dependency for other software (see also https://opensource.org/proliferation-report ), so it may be a good idea to recommend choosing, when going for an open source license, one of the "popular licenses" listed at https://opensource.org/licenses .

  • Author: Neil Chue Hong

    Date: 25 Jun, 2021

    Dear Nicola,

    on behalf of the FAIR4RS drafting team, thank you for your comments.


    I haven't had the time to read it through yet, but just noticed a copy-paste typo at page 23: "F4. Metadata are FAIR and is searchable and indexable." should be "F4. Metadata are FAIR and are searchable and indexable."

    Thanks for spotting this - as you can tell, we did a final editing pass to rationalise the way that we used terms that can be both singular, plural and plural singular, but clearly missed some.


    Also a question: for "R1.1. Software must have a clear and accessible license.", did you consider the license proliferation issue? This affects the reusability of the software when used as a dependency for other software (see also https://opensource.org/proliferation-report ), so it may be a good idea to recommend choosing, when going for an open source license, one of the "popular licenses" listed at https://opensource.org/licenses .

    This is a good point to raise, and it did come up in the discussions and earlier consultations. During the drafting, it was decided that we should aim to respect community standards and norms. However, this particular issue is probably common to all communities, so is something that we could address in the explanatory text to improve practice.

  • Author: Nicola Soranzo

    Date: 01 Jul, 2021

    Hi Neil,

    thanks for your reply!

    This is a good point to raise, and it did come up in the discussions and earlier consultations. During the drafting, it was decided that we should aim to respect community standards and norms. However, this particular issue is probably common to all communities, so is something that we could address in the explanatory text to improve practice.

     

    Yes, addressing in the explanatory text would be appropriate, I think. Thanks for considering!

  • Author: Tek Raj Chhetri

    Date: 24 Jun, 2021

    Thank you for producing the comprehensive document. I have a few comments (or suggestions).

    1. The document emphasises rich metadata and talks about maintaining metadata even after the software is no longer available (A2). Further, it also mentions that the metadata should be both human and machine-readable. -- How about making use of Linked Data (or Linked Open Data) (https://www.w3.org/egov/wiki/Linked_Open_Data, https://www.w3.org/standards/semanticweb/data) for metadata? This also enables discoverability, as discussed in F4.
    2. I1, software reads, writes and exchanges data: the data could be of any type, i.e. personal and non-personal. Laws like GDPR, however, restrict how these data are exchanged or processed. The document also talks about transparency (at the beginning, under aims), so I was wondering if we should also adopt (or address) the issues that may arise due to laws like GDPR (probably in future)?
    3. In I1, the document also talks about API documentation, which again should be both human and machine-understandable. I think we can refer to Swagger API documentation as an example.

     

    Great work!

     

    Regards,

    Tek

  • Author: Neil Chue Hong

    Date: 25 Jun, 2021

    Dear Tek,

    on behalf of the FAIR4RS drafting team, thank you for your comments.


    The document emphasises rich metadata and talks about maintaining metadata even after the software is no longer available (A2). Further, it also mentions that the metadata should be both human and machine-readable. -- How about making use of Linked Data (or Linked Open Data) (https://www.w3.org/egov/wiki/Linked_Open_Data, https://www.w3.org/standards/semanticweb/data) for metadata? This also enables discoverability, as discussed in F4.

    The use of linked data is certainly an approach that supports the FAIR principles for research software. In the draft, we didn't explicitly mention it, because its use is not commonplace across all communities, and there are other approaches that are being used by different communities. Here, I think there's a strong role for community specific guidance which explains how linked data can be used to make software (and other research objects) FAIR.
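
    For communities that do use linked data, a minimal sketch of what such software metadata could look like is given below, assuming JSON-LD with the schema.org `SoftwareSourceCode` type. This is one possible approach, not one prescribed by the draft; the function name and all values (name, repository URL, license URL) are hypothetical placeholders.

```python
import json


def software_jsonld(name: str, repository: str, license_url: str, version: str) -> dict:
    """Describe a piece of software as schema.org JSON-LD (one linked data serialisation)."""
    return {
        "@context": "https://schema.org",
        "@type": "SoftwareSourceCode",
        "name": name,
        "codeRepository": repository,
        "license": license_url,
        "version": version,
    }


if __name__ == "__main__":
    # Placeholder values, for illustration only.
    record = software_jsonld(
        name="example-tool",
        repository="https://example.org/example-tool",
        license_url="https://spdx.org/licenses/MIT",
        version="0.1.0",
    )
    print(json.dumps(record, indent=2))
```

    Because the record is plain JSON, it stays readable by people while the `@context` makes each property resolvable by machines.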


    I1, software reads, writes and exchanges data: the data could be of any type, i.e. personal and non-personal. Laws like GDPR, however, restrict how these data are exchanged or processed. The document also talks about transparency (at the beginning, under aims), so I was wondering if we should also adopt (or address) the issues that may arise due to laws like GDPR (probably in future)?

    There is definitely a wider challenge around how legislation (and other policies) affect the exchange and processing of data. However, these are probably at a level above the FAIR principles for research software - it would be possible for the software to be FAIR, even if the data is subject to GDPR or similar laws. An example might be its use on synthetic data.

    If you have a specific scenario which you think isn't currently addressed by the principles around data protection, we'd be interested to hear it, so we can discuss whether this should be explicitly addressed in the FAIR4RS principles.


    In I1, the document also talks about API documentation, which again should be both human and machine-understandable. I think we can refer to Swagger API documentation as an example.

    Would it be more appropriate to refer to the OpenAPI Specification (which is a successor to Swagger API documentation)? 
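
    For readers unfamiliar with it, an OpenAPI description is a single document that both people and tools can read. The following is a minimal sketch, assuming OpenAPI 3.0.3 and serialising to JSON; the function name, the `/status` endpoint, and the service title are hypothetical and only illustrate the required top-level fields (`openapi`, `info`, `paths`).

```python
import json


def minimal_openapi(title: str, version: str) -> dict:
    """Return a minimal OpenAPI 3.0.3 description with one illustrative endpoint."""
    return {
        "openapi": "3.0.3",
        "info": {"title": title, "version": version},
        "paths": {
            # Hypothetical endpoint, included only to show the structure.
            "/status": {
                "get": {
                    "summary": "Report whether the service is running",
                    "responses": {
                        "200": {"description": "Service is running"}
                    },
                }
            }
        },
    }


if __name__ == "__main__":
    print(json.dumps(minimal_openapi("Example research service", "0.1.0"), indent=2))
```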

  • Author: Joris van Eijnatten

    Date: 29 Jun, 2021

    Clearly, a lot of work has gone into drafting these principles. It is great to see the result of this effort. Thanks, on behalf of the community! For the Netherlands eScience Center, transparency, reproducibility and reusability of research software are fundamental and we hope these principles will indeed be helpful in that direction. I note that principle R1.1 is equivalent to the existing recommendation https://fair-software.eu/recommendations/license/.

    In principle R3 “domain-relevant community standards” seems to encompass a very broad range of things (documentation, coding practices, standards for testing). Although we agree that standard practices will be community dependent, it would be great to see at least some level of agreement between these communities. Otherwise, providing guidance is impossible.

    Another question concerns the definition of community itself in the context of R3. If research software is a ‘fundamental and vital part of research worldwide’, it seems an omission that research and researchers as such are hardly mentioned in fleshing out the FAIR principles. For example, in R3 it isn’t made clear what is understood by ‘the community’. If the community is seen as a community of developers/RSEs in a narrower sense, then coherence is easier to come by than if the community is seen as something focused more on research as such. Ideally both communities would form an integral unity, but the point is that research tends to follow its own course. In other words: research follows research problems rather than research software, while research software should follow research demand. This means that ‘fragmentation of community practice’ is necessarily difficult (if not impossible) to avoid.

    Quality is mentioned once as a challenge (in the abstract) and once as a goal (as the need for ‘high quality software’). Yet the relevance and importance of quality software aren’t addressed explicitly anywhere in the document. One would expect ‘quality’ to surface under ‘Reusable’, but it doesn’t. The implication is that software quality has little or no direct bearing on usability or reusability, if only because quality means different things for different people and for different types of software. If that is the case (and it is one that could be defended), why mention quality in the first place? Or is software quality something that partially overlaps with the FAIR principles but is not completely addressed by them?

  • Author: Neil Chue Hong

    Date: 02 Jul, 2021

    Dear Joris,

    on behalf of the FAIR4RS drafting team, thank you for your comments.


    For the Netherlands eScience Center, transparency, reproducibility and reusability of research software are fundamental and we hope these principles will indeed be helpful in that direction.

    We acknowledge the effort and leadership that the NLeSC has provided in this area, and look forward to your support for the implementation and adoption of the FAIR4RS Principles.


    In principle R3 “domain-relevant community standards” seems to encompass a very broad range of things (documentation, coding practices, standards for testing). Although we agree that standard practices will be community dependent, it would be great to see at least some level of agreement between these communities. Otherwise, providing guidance is impossible.

    One of the challenges identified here is that different research communities have different "transitions" between what is considered appropriate for different maturity levels of a piece of research software. Therefore, while it may be possible to get some consensus about the minimum standards that should be used, it would be near impossible for the FAIR4RS Principles to document this in a more specific way that worked for all communities.

    However, the wider work of the FAIR4RS working group, in particular the FAIR4RS Roadmap work, will be addressing the question of how communities can provide guidance, including on how to choose and apply standards in relation to the FAIR4RS Principles.


    Another question concerns the definition of community itself in the context of R3. If research software is a ‘fundamental and vital part of research worldwide’, it seems an omission that research and researchers as such are hardly mentioned in fleshing out the FAIR principles. For example, in R3 it isn’t made clear what is understood by ‘the community’. If the community is seen as a community of developers/RSEs in a narrower sense, then coherence is easier to come by than if the community is seen as something focused more on research as such. Ideally both communities would form an integral unity, but the point is that research tends to follow its own course. In other words: research follows research problems rather than research software, while research software should follow research demand. This means that ‘fragmentation of community practice’ is necessarily difficult (if not impossible) to avoid.

    How the FAIR4RS Principles should define "community" came up in many of the comments in the previous consultations, and it may be that we have not adequately explained how we define the concept. Our intent is that community broadly means the wider definition of a research community (which will include researchers, RSEs and others). We have seen some of these communities address fragmentation of community practice through the agreement of guidance or guidelines, e.g. the ESIP Software Assessment Guidelines for the Earth Sciences.

    We'd be happy to take suggestions of how to improve the explanatory text to define "the community" more clearly.


    Quality is mentioned once as a challenge (in the abstract) and once as a goal (as the need for ‘high quality software’). Yet the relevance and importance of quality software aren’t addressed explicitly anywhere in the document. One would expect ‘quality’ to surface under ‘Reusable’, but it doesn’t. The implication is that software quality has little or no direct bearing on usability or reusability, if only because quality means different things for different people and for different types of software. If that is the case (and it is one that could be defended), why mention quality in the first place? Or is software quality something that partially overlaps with the FAIR principles but is not completely addressed by them?

    This is a good point. In an earlier draft, there was some additional text around potential aspects of software quality under the "Reusable" principles, principally around the "dependability" or "robustness" of software. This was later removed for the reasons that you mention - reusability was redefined slightly, and software quality is seen as something which overlaps with the FAIR4RS principles but is not directly addressed by them. We will review the text to reduce any confusion around this.

    We envisage that there will be complementary principles to the FAIR4RS principles, in the same way that there have been for data, e.g. FAIR+CARE, that address other software principles such as quality or accessibility (in the other software engineering sense).

  • Author: Neil Chue Hong

    Date: 07 Jul, 2021

    The following comments were received via Twitter from Paul Secular, and are cross-posted here with permission.


     

    I share the concern on p.16 about the risk of confusion over the term "accessible": "it may lead to confusion and mean the principle is not well-understood across all domains" Accessible is already widely used to mean that people are not excluded on the basis of disability.

    Thanks for this feedback. It is clear that the multiple meanings of accessibility in the context of software will need to be clearly explained.


    Unfortunately, I also worry that "It will take significant effort to gain wide-spread adoption of the FAIR4RS Principles" is maybe an understatement. I am concerned it may, in practice, require a radical overhaul of academic culture, research, funding, etc.

    As you note, the FAIR4RS Principles on their own are not sufficient for adoption. Further work by the FAIR4RS working group will address adoption, and the wider work of the Research Software Alliance and collaborators is aimed at those changes in research culture, funding and institutions that are required - as noted by Dan Katz, a co-author of the FAIR4RS Principles, FAIR is not the end goal, it's just one part of the solution.


    I recommend removing gendered language. e.g. "she/he needs" should be replaced with "they need".

     

    We agree that non-gendered language should be used.

    In this case, I believe “she/he” is only in Appendix B, from directly quoted text from the GO-FAIR website presented to show the evolution of the principles - I will pass this suggestion on to them.

  • Author: Rob van Nieuwpoort

    Date: 09 Jul, 2021

    I would like to express my thanks and appreciation for this wonderful initiative and report. Great work!

    Here is some feedback on the document.

     

    On P9 the document states: “Many software engineering practices are relevant to various of the FAIR4RS Principles. For instance: localization can improve accessibility, design patterns can improve interoperability, and documentation and encapsulation can improve reusability. Nevertheless, while important more generally for producing high quality software, they are best addressed separately from (but as a complement to) the FAIR4RS Principles.”

     

    I understand the details of these best practices are out of scope here. Nevertheless, this is one essential point that distinguishes software from data, and thus one of the reasons we need specific FAIR principles for software in the first place. I think the most profound impact of the best practices are around reusability (I also like the other examples given, but I would argue they all have an impact on reusability as well). So, shouldn’t we include a general statement about software quality and best practices as principle R4?  For example:

    R4. Software aims to adhere to relevant software engineering best practices.

    And then the examples as given on P9.

     

     

    On page 12, the Interoperable principle:

    “I: The software interoperates with other software through exchanging data and/or metadata, and/or through interaction via application programming interfaces (APIs).”

    When phrased like this, it is unclear what the added value of this principle is. How else should software interoperate than through exchanging data and/or metadata, and/or through interaction via application programming interfaces (APIs)? I.e., this principle always holds and does not exclude any software? Moreover, interoperation is not a goal by itself, it is a means to an end. I think what is meant is that these data formats and APIs should be standardized? If so, can’t we simply start with the text of principle I1?

     

     

    On page 12: “This includes the use of data types and formats that ideally are formally described using controlled vocabularies, to facilitate machine readability and data exchange.”

    This is a very limited view on data exchange. Quite often, data is simply exchanged via formally defined (binary or text-based) standard formats. Not every data exchange standard uses controlled vocabularies, nor should.

    Also, I think we should carefully consider the language we use. It should be understandable by domain researchers. I suspect almost no one (other than computer scientists) knows what “controlled vocabularies” are.

     

    P13: “Where software interacts via APIs, these should be documented so that their capabilities can be inspected and understood by humans and machines.”

    This is a fairly limited scope of this principle. Shouldn't we advise the use of open API protocols when possible here?

     

     

    P14, second paragraph of R3.

    I understand the point of this paragraph, but I think we should consider that communities can be very diverse, as are their members. Many different programming languages and file formats are used within any given community, with good reason. That often is a strength, not a weakness. I don't think we should universally strive for convergence and certainly not in a top-down manner or by including this in the FAIR4RS principles. There are so many aspects to this, such as using the right tool for the job, availability of hardware and software, training, domain-specific aspects, etc. I don't believe in convergence of languages and formats as a goal of the FAIR4RS principles. If this is a goal or not is up to the relevant communities. I would suggest omitting the second paragraph of R3 altogether.

     

    Thanks again for all the hard work, excellent result!

  • Neil Chue Hong's picture

    Author: Neil Chue Hong

    Date: 09 Jul, 2021

    The following comments were received from Wilhelm Hasselbring and are cross-posted with permission.


    I took a quick look at the paper, great effort!

    What I'm missing concerning Interoperability and Reusability is a mention of container technologies such as Docker.

    Furthermore, research software may also be provided as a service (SaaS). For instance, via CodeOcean or BinderHub.
    This could also be mentioned, as it may have impact on the FAIR principles.

    How to refer to containers, and their role in the FAIR principles for software, is something that was discussed in the previous community consultations. It was felt that the use of containers (as opposed to software in another form) does not inherently make software more interoperable or reusable if the FAIR4RS principles are followed. However, it would be reasonable to add some text to include containers as a common form of executable package.

    Software as a Service is primarily discussed under Accessibility (and in Challenges to Implementation). We think there needs to be more work done to understand whether any aspects of SaaS need to be considered by the FAIR4RS Principles or as part of more general FAIR principles for services / workflows. We will add something to the Challenges to Implementation section to note this.

  • Neil Chue Hong's picture

    Author: Neil Chue Hong

    Date: 09 Jul, 2021

    The following comment was received from Peter Hill via the RSE Slack and is cross-posted with permission.


    I2 seems to use "qualified" slightly differently than the FAIR data principles, where they say "For example, X is regulator of Y is a much more qualified reference than X is associated with Y, or X see also Y.". Maybe FAIR4RS should include something like "X takes Y as input"?

    Thanks for this input. We will discuss how best to address this in the text. During the drafting process, we felt that qualified references for software to other objects were slightly different from data. We have provided some examples and were looking to keep the text more open to different ways that these references were expected to be used, but your comment suggests that this could be clarified further.

  • Michelle Barker's picture

    Author: Michelle Barker

    Date: 11 Jul, 2021

    The following comments were received via email and posted with permission from:

    Michael Barton

    Director CoMSES.Net (Network for Computational Modeling in Social and Ecological Sciences)

    Comments for FAIR4RS proposal.

    This is a very useful document to help guide the application of FAIR principles to research software. The Network for Computational Modeling in Social and Ecological Sciences (CoMSES.Net) would be happy to adopt such guidelines if approved. I offer a few detailed comments that might improve these guidelines, referenced by document section.

    F. Metadata should be readable by humans as well as machine-readable. I think this is intended, but never explicitly stated.

    F3. (perhaps more appropriate in the challenges section). This is a good idea but difficult to apply in practice. The identifier cannot be added to a metadata document until it is assigned. However, identifiers like DOIs are commonly not assigned until a digital object (including its metadata) is published. In Zenodo, for example, updating metadata to include the DOI requires a new release, which generates a new DOI, etc. Not sure how to solve this dilemma. An identifier should link back to the metadata anyway, so maybe it is not necessary for it to be in the metadata document.

    F4. Metadata must also be human readable to be human discoverable.

    A1.2. A typo. Replace "...having an embargo period, however this..." with "...having an embargo period. However, this..."

    A1.2. This statement is true, but seems to contradict A1.1. Restricted software is not accessible in the way FAIR is intended. The phrase "not optimal" seems misplaced and confusing. Much software is not accessible according to FAIR principles. As indicated in the preamble, this document is not a judgement about quality, is aspirational, and can serve as a metric. This paragraph does not reflect those goals. Software that is restricted is not accessible, even though it may be findable and even reproducible or interoperable.

    A2. Should release date and any updates be specified in the metadata to help users assess this?

    I: A typo. Replace "...write or otherwise exchange the same formats." with "...write or otherwise exchange data in the same formats."

    I2. If the software is FAIR, external data objects so referenced should not contain sensitive information (e.g., personal information). Alternatively, if such data objects are needed for reproducibility, there needs to be a warning in the relevant metadata/documentation that the data contain sensitive information and can only be used with proper authorization. This is especially important for proliferating machine learning models and their training/test datasets.

    I and R: It is very useful to distinguish between interoperable and reusable as is done in this document.

    R1. Although documentation for usability is a kind of metadata (i.e., data about a data object in the broad sense), it is distinct in format and purpose from more "normal" metadata described in the Findable section. Most metadata (and common metadata standards) are aimed at finding data objects and, related to discoverability, helping a user ensure that s/he has found the desired data object and can reference it appropriately (including provenance information). In this sense, metadata are not about how to use the data object (e.g., how to read a book). For digital research data, this perspective must be expanded somewhat to include the meaning of data fields or codes. But software documentation is about how to use the data object (i.e., software), not how to find it or reference it. Moreover, software documentation is often stored in separate files from structured metadata, and probably could not be stored as structured metadata using common community standards today. So I'm wondering if it would be worthwhile to somehow distinguish between metadata, as it is commonly considered, and 'documentation' described in the Reusable section here? Currently, the two terms are used interchangeably in this section, but metadata is used in the more common way in the Findable section.

  • Morane Gruenpeter's picture

    Author: Morane Gruenpeter

    Date: 11 Jul, 2021

    A few comments were received during the FAIR4RS RDA France atelier in the notes document:

    https://docs.google.com/document/d/19NCSJuPJiAVPb0tclfHJoGURYpEV09o7n7aF...

    I'll translate the comments on the document and copy here ASAP.

     

  • Roberto Di Cosmo's picture

    Author: Roberto Di Cosmo

    Date: 11 Jul, 2021

    Dear all,

    thanks for asking for input from a broader community.

    As a general remark, this document shows how difficult it is to try and adapt the FAIR principles to software: this does not come as a surprise, since FAIR principles were designed with databases in mind, while software is of a totally different nature, and it is not clear at all that the best way to approach its specificities is to try and translate principles designed for something else.

    One striking example is the particularly surprising statement in the current draft that these principles are not concerned with long term preservation of software: this is in total contradiction with clear statements made in various high level documents like the recently released National Plan for Open Science in France, and the EOSC SIRS report published in 2020.

    The absence of documents like the EOSC SIRS report from the bibliography (which is much too short and is not sufficiently used to support the statements made in the document), makes one wonder whether the working group is missing key relevant information.

    The section "Challenges to Implementation" reveals that there is no clear-cut and consensual approach for a broad range of important subjects identified in the report, and it is difficult to understand how one can state, in the "path to adoption" section, that the next step is to "promote the outcomes, aiming to raise awareness and facilitate a wider adoption of the FAIR4RS WG outcomes by existing and emerging initiatives".

    I strongly suggest that the working group take the time to reconsider this draft in depth before moving forward.

     

     

     
