Use Cases and Software Source Code Identification

28 Nov 2019
Group(s) submitting the application: 
Meeting objectives: 

The proposed session has the following objectives:

  • To introduce the group to new audiences

  • To talk about different types of identifiers that can be used for software, e.g. DOI, ARK, Handle, registry identifiers (SwMath-ID, ASCL-ID, HAL-ID, etc.), GitHub URLs, Software Heritage’s SWH-ID, Wikidata entity identifiers, etc. 

  • To review collected identification use cases and collect new use cases.

  • To review collected identification schemes 

  • To document identification schemes used for specific use cases. This continues a discussion started at FORCE2019’s Research Software Hackathon on a software artifact identification mapping. There we took a bottom-up approach, naming the "things" that can be referenced when trying to identify software, in order to build a use case to identifier mapping: [Use case] [Software entity/artifact and granularity level] [ID used].

  • To discuss one of the contexts for which identifiers are of utmost importance: Citation of software projects for proper credit attribution

  • To discuss challenges, pros, and cons of various identifiers for the various use cases  

  • To seek input and feedback from participants and other RDA WGs/IGs for future activities
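To make the variety of identifier types named in the objectives above more concrete, here is a minimal Python sketch that guesses the scheme of an identifier string from its syntax. The function name `classify_identifier` and the regular expressions are assumptions made for illustration; they are deliberately simplified, cover only some of the schemes mentioned, and do not replace the official syntax rules of any scheme.

```python
import re

# Illustrative, simplified patterns (assumptions, not official grammars).
# Order matters: the first matching pattern wins.
PATTERNS = [
    ("SWH-ID", re.compile(r"^swh:1:(cnt|dir|rev|rel|snp):[0-9a-f]{40}(;.*)?$")),
    ("DOI", re.compile(r"^(doi:|https?://(dx\.)?doi\.org/)?10\.\d{4,9}/\S+$", re.I)),
    ("ARK", re.compile(r"^(https?://\S+/)?ark:/?\d{5,}/\S+$", re.I)),
    ("GitHub URL", re.compile(r"^https?://github\.com/[\w.-]+/[\w.-]+(/\S*)?$")),
    ("Wikidata entity", re.compile(r"^(https?://www\.wikidata\.org/(wiki|entity)/)?Q\d+$")),
    ("ASCL-ID", re.compile(r"^ascl:\d{4}\.\d{3}$", re.I)),
]


def classify_identifier(value: str) -> str:
    """Return the name of the first scheme whose pattern matches, else 'unknown'."""
    value = value.strip()
    for scheme, pattern in PATTERNS:
        if pattern.match(value):
            return scheme
    return "unknown"


if __name__ == "__main__":
    # Hypothetical sample values, used only to exercise the patterns.
    for sample in ["10.5281/zenodo.1234567", "Q2005", "ascl:1010.001"]:
        print(sample, "->", classify_identifier(sample))
```

A syntactic check like this says nothing about what the identifier points to (a project, a release, a single file), which is exactly the granularity question the use case mapping is meant to capture.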

Meeting agenda: 

  • Introduce the WG (objectives, outputs, connections with other RDA WGs/IGs, etc.) (~10 min)

  • Present initial use cases for software source code identification (~10 min)

    • Audience may suggest additional use cases

  • Present types of software source code identifiers, with a short presentation on each (speakers TBD): (~15 min)

    • DOIs (M. Fenner)

    • IDOs (hashes, SWH-ID, etc.)

    • URLs

    • Wikidata entities

    • ARKs

    • Registry identifiers (ASCL-ID, RRID, SwMath-ID, etc.)

  • Small group discussion per software citation use case: each group documents its use case and the challenges, pros, and cons of different identifiers for it (~25 min)

  • Report-back, bringing discussions together (~15 min)

  • Discussion (~15 min)

    • Feedback on the group’s progress and future activities

    • Feedback to associated groups and activities 

  • Wrap-up (~5 min)

Target Audience: 
  • Anyone (especially publishers, funders, repository/catalog/registry managers/developers/operators) who is interested and would like to know more about how to identify software source code

Brief introduction describing the activities and scope of the group: 

The objective of this working group is to bring together a broad panel of stakeholders directly involved in software identification.

We believe that bringing together a broad panel of stakeholders is the best approach to avoid fragmentation in the emerging scholarly software identification landscape.

We also believe that connecting scholarly players with the daily practice of software development in industry will make it easier for these emerging scholarly initiatives to adopt standards that are compatible with the well-established practice of software development worldwide.

To this end, we plan to engage in a dialogue with software industry bodies and software foundations that are working on standard approaches to identifying software components, such as the Linux Foundation. An endorsement from such organizations would have a significant positive impact, as a shared standard would allow research and industry software to be referred to in exactly the same way.

The planned outcomes of the working group are recommendations and guidelines for software artifact identification, in particular in its source code form. They are targeted specifically at scholarly stakeholders that are willing to integrate software artifacts into their workflows: scientific publishers, institutional repositories, and archives. The intent is to ensure that the solutions adopted by academic players are compatible with each other and, especially, with the software development practice of tens of millions of developers worldwide.

The WG has as its medium-term goals:

  • An initial collection of software identification use cases and software identifier schemes.

  • An overview of the different contexts in which software artifact identification is relevant, including

    • Scientific reproducibility

    • Fine-grained reference to specific code fragments from scientific articles or documentation

    • Description of dependency information

    • Citation of software projects for proper credit attribution
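For the reproducibility and fine-grained reference contexts listed above, intrinsic identifiers are computed from the content itself rather than assigned by a registry. The Python sketch below illustrates the idea under stated assumptions: it assumes that a Software Heritage identifier for file content has the form `swh:1:cnt:` followed by the Git blob SHA-1 of the bytes, and that a `lines=` qualifier can narrow the reference to a code fragment. The function names are hypothetical, and the Software Heritage documentation remains the authoritative source for the exact syntax.

```python
import hashlib


def git_blob_sha1(data: bytes) -> str:
    """Hash bytes the way Git hashes a blob: SHA-1 over 'blob <len>\\0' + data."""
    header = b"blob " + str(len(data)).encode("ascii") + b"\0"
    return hashlib.sha1(header + data).hexdigest()


def swhid_for_content(data: bytes, lines=None) -> str:
    """Build a content identifier, optionally qualified with a (start, end) line range.

    Assumption: the 'swh:1:cnt:<hash>' core and the ';lines=<start>-<end>'
    qualifier follow the publicly documented SWH-ID syntax.
    """
    core = "swh:1:cnt:" + git_blob_sha1(data)
    if lines is not None:
        start, end = lines
        return f"{core};lines={start}-{end}"
    return core


if __name__ == "__main__":
    # Hypothetical source file content, used only for illustration.
    source = b"def answer():\n    return 42\n"
    print(swhid_for_content(source))
    print(swhid_for_content(source, lines=(1, 2)))
```

Because the identifier is derived from the bytes, anyone holding a copy of the file can recompute it and check that the copy matches the cited version, without contacting a resolver.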

And as long-term goals:

  • A call to other RDA groups, in particular those working on citation and versioning issues, for consultation on the draft guidelines

  • A set of guidelines for persistent software artifact identification, in each of the above contexts

Short Group Status: 

This working group was endorsed around September 2018 as a joint RDA & FORCE11 WG, formed from the RDA’s Software Source Code IG and FORCE11’s Software Citation Implementation WG.

 

So far, the group has organised the following activities:

Type of Meeting: 
Working meeting
Remote participation availability (only for physical Plenaries): 
Yes