Computational Notebooks
Submitted by Hugh Shanahan
Notebooks, specifically Jupyter notebooks (but also notebooks from platforms such as RStudio), are generating a great deal of excitement and are potentially a significant step forward in terms of reproducibility, education, code documentation and academic publishing. A recent paper found nearly 1.2 million unique notebooks based on searches through GitHub; a more recent blog post gives evidence of nearly 5 million notebooks on GitHub.
This is colossal growth in the use of notebooks by researchers. The RDA needs to engage with this, as notebooks enable the sharing of both data and software.
More recently, the development and deployment of resources such as JupyterHub, myBinder and EGI notebooks mean that notebooks can be deployed in the cloud, which helps reduce issues that occur when running on a user's own system. For beginner and intermediate users this could become the easiest pathway to using cloud computing resources. It is clear that frameworks such as EOSC and the EGI will make extensive use of notebooks, and that their offerings will be affected by them. On the other hand, the closure of services such as Azure Notebooks means that the preservation of these notebooks is also key.
An Interest Group will be proposed to match this BoF session and build on the BoF held at RDA Helsinki. A set of topics (discussed in detail in the objectives below) has been identified that a) would make a tangible contribution to the notebook community and b) would make use of a combination of expertise within the RDA community that is not available elsewhere. Briefly, these are:
- Publishing notebooks
- Long term preservation of notebooks
- Notebooks as FAIR digital objects
- Notebooks for big data and compute
The RDA community benefits from this as it provides an opportunity to shape how this rapidly expanding technology is deployed and used.
The meeting objective is to build on the work carried out at the RDA Helsinki BoF and to discuss the formation of WGs to carry out explicit tasks.
Martin Fenner of the Software Source Code Identification WG has agreed to coordinate between the two groups.
Collaborative Notes Link: https://docs.google.com/document/d/1TXr6QlD0Byz7uhhrx7vjbzUvuyTg8kZch5Nm...
- Introduction and IG proposal (5 minutes, Hugh Shanahan)
- Earthcube project and reviewing/publishing notebooks (5 minutes, Daniel S. Katz)
- Preserving notebooks with current preservation software tools (5 minutes, Patricia Herterich)
- Notebooks and the Science Gateway (5 minutes, Rob Quick)
- Notebooks for big data and compute (5 minutes, Enol Fernández)
- Break-out groups to discuss proposals for WGs (45 minutes)
- Report back and recommendations for WGs (20 minutes)
A successful BoF was run on this topic at RDA Helsinki where the above topics were initially discussed.
This session proposal is based on a wide variety of activities; names included in parentheses indicate individuals who have expressed an interest in this proposal. The Software Source Code Identification WG (Roberto Di Cosmo, Martin Fenner, Daniel S. Katz) and Software Source Code IG (Neil Chue Hong, Roberto Di Cosmo) are active groups in the RDA, and providing guidelines for the citation of notebooks would be an excellent use case for these groups. In parallel, RDA BoFs on FAIR for research software have led to the proposal of an RDA/FORCE11/ReSA FAIR for Research Software WG (Michelle Barker, Paula Andrea Martinez, Leyla Garcia, Daniel S. Katz, Neil Chue Hong) that is considering what FAIR means for notebooks. The FORCE11 Software Citation Implementation WG (Daniel S. Katz, Martin Fenner, Neil Chue Hong) is developing guidance on citation (including notebooks) for authors, developers and journals/conferences. The research library community is extensively exploring the potential of notebooks to facilitate data and software management and reproducibility (Rosie Higman, Jez Cope and Patricia Herterich). Notebooks provide an interesting use case for PIDs in data (Rob Quick) and likewise for computational services (Gergely Sipos, Enol Fernández and Christine Kirkpatrick). This meeting also provides a place for deployment services such as myBinder (Tim Head) to engage with the RDA. Finally, the Neutron and Photon Science community (Brian Matthews and Frank Schluenzen) and the relevant IG make very extensive use of notebooks, and their work therefore complements these activities.
A successful BoF was run on this topic at RDA Helsinki. A report on this can be found here.
The following is a set of relevant links:
- Introduction to Jupyter notebooks - https://jupyter.org/about
- Introduction to RStudio notebooks - https://bookdown.org/yihui/rmarkdown/notebook.html
- myBinder - https://gke.mybinder.org/
- EGI notebooks - https://notebooks.egi.eu/hub/login