Management of Computational Notebooks

27 Jun 2019

Submitted by Hugh Shanahan

Meeting objectives: 

Notebooks, specifically Jupyter notebooks (but also notebooks from platforms such as RStudio), are generating a great deal of excitement and are potentially a significant step forward in terms of reproducibility, education, code documentation and academic publishing. A recent paper found nearly 1.2 million unique notebooks based on searches through GitHub.

More recently, the deployment of resources such as JupyterHub, myBinder and EGI notebooks means that notebooks can be run in the cloud, which obviates issues with locally installed libraries. One potential result is that notebooks become the easiest pathway for users to access cloud computing resources for research, which could transform the use of such resources. It is clear that frameworks such as EOSC and the US National Data Service will make extensive use of notebooks and that their offerings will be shaped by them.


There is an extensive body of work on notebooks (for example, the JupyterCon conference series), but the expertise of RDA members can make important contributions in this area. The initial objectives are to consider the following:

  • Citation of notebooks in a way that conforms to software citation standards,

  • Integration of notebooks with data sources (e.g. EGI DataHub or more abstractly to data sources with a PID),

  • Deploying notebook services on large scientific computational platforms,

  • Long-term preservation of notebooks without losing functionality.


This will be carried out with relatively short introductions to the above topics, followed by break-out groups on each topic to consider what concrete steps can be taken within the RDA, e.g. through the formation of an IG and/or WGs.


Meeting agenda: 


Collaborative session notes: 

Please use the Twitter hashtag #RDACompNotebooks if you are referencing this session.

  1. A brief introduction to notebooks.

  2. Publishing notebooks (Martin Fenner, DataCite)

  3. Notebooks and long-term preservation (Patricia Herterich, DCC)

  4. Notebooks and FAIR digital objects (Christine Kirkpatrick, UCSD/US National Data Service)

  5. Deploying notebooks on large scientific computational platforms (Gergely Sipos, EGI)

  6. Break-out groups to discuss topics 2-5.

  7. Consideration of next steps.

This session proposal is based on a wide variety of activities; names in parentheses indicate individuals who have expressed an interest in this proposal. Software citation (Martin Fenner, Neil Chue Hong) is the focus of an active IG and WG in the RDA, and providing guidelines for the citation of notebooks would be an excellent use case for these groups. The research library community is extensively exploring the potential of notebooks to facilitate data and software management and reproducibility (Rosie Higman, Jez Cope and Patricia Herterich). Notebooks provide an interesting use case for PIDs in data (Rob Quick) and likewise for computational services (Gergely Sipos, Enol Fernández and Christine Kirkpatrick). This meeting also provides an opportunity for deployment services such as myBinder (Tim Head) to engage with the RDA. Finally, the Neutron and Photon Science community (Brian Matthews and Frank Schluenzen) and the relevant IG make very extensive use of notebooks and hence complement these activities.

