Computational Notebooks

31 Jul 2020

Submitted by Hugh Shanahan


Meeting objectives: 

Notebooks, specifically Jupyter notebooks (but also notebooks from platforms such as RStudio), are generating a great deal of excitement and are potentially a significant step forward for reproducibility, education, code documentation and academic publishing. A recent paper found nearly 1.2 million unique notebooks through searches of GitHub; an even more recent blog post gives evidence of nearly 5 million notebooks on GitHub.

This is colossal growth in the use of notebooks by researchers. The RDA absolutely needs to engage with this, as notebooks enable the sharing of both data and software.

More recently, the development and deployment of resources such as JupyterHub, myBinder and EGI notebooks mean that notebooks can be deployed in the cloud, which helps reduce the issues that occur when running them on a user's own system. For beginner and intermediate users this could become the easiest pathway to using cloud computing resources. It is clear that frameworks such as EOSC and the EGI will make extensive use of notebooks, and that their offerings will be affected by them. On the other hand, the closure of services such as Azure Notebooks means that the preservation of these notebooks is also key.

An Interest Group will be proposed to match this BoF session and build on the BoF held at RDA Helsinki. A set of topics (discussed in detail in the objectives below) has been arrived at that a) would make a tangible contribution to the notebook community and b) would make use of a combination of expertise within the RDA community that is not available elsewhere. Briefly, these are:

  • Publishing notebooks

  • Long term preservation of notebooks

  • Notebooks as FAIR digital objects

  • Notebooks for big data and compute

The RDA community benefits from this as it provides an opportunity to shape how this rapidly expanding technology is deployed and used.

The meeting objective is to build on the work carried out at the RDA Helsinki BoF and to discuss the formation of WGs to carry out explicit tasks.

 

Martin Fenner of the Software Source Code Identification WG has agreed to coordinate between the two groups.

 

Meeting agenda: 

 

Collaborative Notes Link: https://docs.google.com/document/d/1TXr6QlD0Byz7uhhrx7vjbzUvuyTg8kZch5Nm...

  1. Introduction and IG proposal. (5 minutes, Hugh Shanahan)

  2. Earthcube project and reviewing/publishing notebooks. (5 minutes, Daniel S. Katz)

  3. Preserving notebooks with current preservation software tools. (5 minutes, Patricia Herterich)

  4. Notebooks and the Science Gateway. (5 minutes, Rob Quick)

  5. Notebooks for big data and compute. (5 minutes, Enol Fernandez)

  6. Break-out groups to discuss proposals for WGs. (45 minutes)

  7. Report back and recommendations for WGs. (20 minutes)

Type of Meeting: 
Working meeting
Short introduction describing any previous activities: 

A successful BoF was run on this topic at RDA Helsinki, where the above topics were initially discussed.

This session proposal is based on a wide variety of activities; names in parentheses indicate individuals who have expressed an interest in this proposal. The Software Source Code Identification WG (Roberto Di Cosmo, Martin Fenner, Daniel S. Katz) and the Software Source Code IG (Neil Chue Hong, Roberto Di Cosmo) are active groups in the RDA, and providing guidelines for the citation of notebooks would be an excellent use case for these groups. In parallel, RDA BoFs on FAIR for research software have led to the proposal of an RDA/FORCE11/ReSA FAIR for Research Software WG (Michelle Barker, Paula Andrea Martinez, Leyla Garcia, Daniel S. Katz, Neil Chue Hong) that is considering what FAIR means for notebooks. The FORCE11 Software Citation Implementation WG (Daniel S. Katz, Martin Fenner, Neil Chue Hong) is developing guidance on citation (including notebooks) for authors, developers and journals/conferences. The research library community is extensively exploring the potential of notebooks to facilitate data and software management and reproducibility (Rosie Higman, Jez Cope and Patricia Herterich). Notebooks provide an interesting use case for PIDs in data (Rob Quick) and likewise for computational services (Gergely Sipos, Enol Fernández and Christine Kirkpatrick). This meeting also provides a place for deployment services such as myBinder (Tim Head) to engage with the RDA. Finally, the Neutron and Photon Science community (Brian Matthews and Frank Schluenzen) and the relevant IG make very extensive use of notebooks and hence complement these activities.

BoF chair serving as contact person: 
Please indicate the breakout slot(s) that would suit your meeting:
Breakout 1
Breakout 3
Breakout 6
Are you willing to host a live second session to accommodate a different time zone?
Yes
Meeting presenters: 
Daniel S. Katz, Patricia Herterich, Rob Quick, Enol Fernandez, Hugh Shanahan
How do you prefer to hold the virtual component of your session: 
live
Other: 
If there is an additional session for a different time zone (i.e. one that is better for East Asia/Australasia), then that session would have pre-recorded videos of the talks from the other session.