Re: [rda-oab][synchronisation-assembly] [rda-oab][synchronisation-assembly] US future thoughts

02 May 2016

Hello Harry,
I guess it is very good to try out various ways, and I am of course curious how your schools will develop. Training new teachers is indeed a must, and we need to use all the ways we can think of. Collaboration with research communities is also essential, although the point here is of course that we have so many of them, ELIXIR being just one, and many prefer to come to meetings at national level, as I said earlier. So it is good to have many activities. If you need help from the RDA Europe team for your efforts, then let us know. We can certainly discuss this if there is an interest.
Let me just add that on one point you are wrong: you state that liaising with every expert is killing the productivity of myself and the team. You should only make such statements if you know the details. If you are interested we can talk about that.
Peter Wittenburg Tel: +49 2821 49180
***@***.*** ; ***@***.***
RDA Europe Director, RDA TAB Member, EUDAT Scientific Advisor
Senior Advisor Data Systems, Max Planck Computing and Data Facility
Gießenbachstraße 2, 85748 Garching, Germany,
former affiliation: MPI for Psycholinguistics, Nijmegen, The Netherlands
From: Harrison, Andrew [mailto:***@***.***]
Sent: Sunday, 1 May 2016 16:21
To: Jamie Shiers; Wittenburg, Peter; RDA Organisational Assembly / Organisational Advisory Board (OAB); RDA Europe Synchronisation Assembly (SyA)
Cc: Hugh Shanahan; ***@***.***; Mark Parsons; Berman, Fran; Simon CODATA
Subject: Re: [rda-oab][synchronisation-assembly] [rda-oab][synchronisation-assembly] US future thoughts
Dear Peter, Jamie and colleagues,
I think it might be helpful to compare directly how the CODATA-RDA schools will grow with the current approach. The focus is on the word "how", as this directly addresses Peter's questions about practicalities. This will also allow us to focus on ideas rather than events and people. We can discuss further in Nottingham.
The CODATA-RDA schools WG has put a lot of intellectual effort into the design of the schools. We have tried to avoid bespoke solutions at all costs, as this is where the labour gets spent without an ongoing return. It is the bespoke solutions for each expert webinar that are killing the productivity of Peter and the team, because they have to spend a long time liaising with each expert. Each webinar tends to be a one-off event with no descendants.
The schools, by contrast, focus on how the material will travel onwards. We are working hard to enable the schools to produce a community of teachers who can then pass on the material to other schools. We have placed the ideas in the material centre stage, and each school leads to direct descendants able to pass on the material. So the material propagates: mechanically, the schools resemble mitosis, atomic fission, viruses, population explosions. Each school has many descendants, and it is this that allows us to envisage education at scale.
We are working closely with a number of partners. The approach to propagating material is the one developed by Software Carpentry and Data Carpentry, and we are copying their technology for training new teachers. We are also working with the Digital Curation Centre (who will cover Research Data Management) as well as other partner organisations looking at analytics, visualisation and infrastructures. We want to ensure that they get full credit for their efforts and intellectual contributions.
Where I also think we are working differently from what is currently happening is in placing the disciplines at the forefront. We are working with Elixir/BD2K(NIH)/H3Africa/Goblet, as an example, and they will create an advanced course on Data sciences for the life sciences, which will be taken by people who have previously taken the "Vanilla" school. The interests of the life scientists are fundamentally different from those of the physicists from CERN, and there will be lots of push-pull of ideas. It is this bringing together of many disciplines to build a common educational infrastructure which will be one of the intellectual legacies of the WG (this should produce new advanced courses when partners from different fields meet to discuss and exchange ideas - and this process resembles meiosis, as noted by one of our partners when referring to speciation events for new courses).
We are collaborating with Elixir on exploring advantages/disadvantages of different approaches to Training New Teachers, and it is getting TNT right that will be the thing that gets to 500K in the most effective way (and this will almost certainly be the most cost effective way).
Ultimately people do not scale. This has become very clear as we are now wading through 326 applications to the first school. And the WG are planning to reach out to many industrial partners, each of whom will have their own perspectives and wishes. So we can definitely benefit from more hands and resources in helping to set up this educational machinery.
So regarding Peter's challenge as to whether we can do more with a small team - we can do things differently. And this will produce more as time moves forward (and will certainly get to 500K quicker). It will be good to discuss how best to make use of our collective efforts and resources when we meet in Nottingham. Jamie refers to mobilisation, and most armies march on their stomachs.
Best wishes,
From: Jamie.Shiers=***@***.*** <***@***.***> on behalf of Jamie Shiers <***@***.***>
Sent: 01 May 2016 11:26
To: ***@***.***; RDA Organisational Assembly / Organisational Advisory Board (OAB); RDA Europe Synchronisation Assembly (SyA)
Cc: Hugh Shanahan; ***@***.***; Mark Parsons; Berman, Fran; Simon CODATA; Harrison, Andrew
Subject: Re: [rda-oab][synchronisation-assembly] [rda-oab][synchronisation-assembly] US future thoughts
Dear Peter,
I will let Harry, Hugh and others talk about training as they have concrete things in place.
As far as Collaboration is concerned, I think that the key point is to make the 4000 RDA members feel empowered.
This is a big topic that needs wider discussion and probably some "test cases".
Let's take the case of data preservation and see how this fits.
Hilary is working on a quote for a new version of the RDA Recommendations & Outcomes booklet. I don't know when this will appear. There are also some slides I sent to Mark in May 2015 and Fran uses some material from me in one of her courses.
The benefits of adoption are multiple and can be hard to measure precisely. One of many examples comes from Jamie Shiers, Information Technology Department, CERN & Manager of the Data Preservation for Long-Term Analysis in High Energy Physics (DPHEP), who says that "In DPHEP (Data Preservation in High Energy Physics) we have saved person-years through the knowledge gained through the RDA including the Preservation IG and many others. The most conservative estimate I can think of is that we have saved 5 person years and got something much better and more sustainable than we would have done otherwise. This could well be an under-estimate but 5 person-years at the EU project rate of ~EUR100K/year is quite substantial."
There is material that I have sent to the Preservation e-Infrastructure IG mailing list, including a status report covering the last 3 years (more or less the time of my involvement with the RDA).
It is referenced from the attachment which will appear in an upcoming version of the CERN Courier. (This is still a draft - some small changes are likely). There is also an iPRES paper currently in review.
What does this say? At a minimum we want to go through certification according to ISO 16363 for the WLCG Tier0. Then we will decide whether we want a formal audit or not. I intend to do the self-audit looking at how to extend it to all CERN experiments as well as to all CERN activities (the latter including also photos, videos, memos etc - the organisation's "digital memory" as the article calls it).
We will certainly learn a lot from this exercise: it should be completed by 2018 to allow it to input to the next round of the European Strategy for Particle Physics update (2018/2019 - the exact dates are not yet fixed).
Coupled to this are other activities: the preparation of Active Data Management plans for WLCG, all CERN experiments (I would like to see this part of the formal approval / review process for experiments), Sharing and Re-use in practice (CMS have already made available 3 releases - some 200TB in total), Open Access Policies, Reproducibility, etc etc
Through projects that we are involved in, we are trying to spread this to other disciplines. Also through the EIROForum IT WG.
The time lines are strict: whilst not technically difficult, getting all the necessary formal approval (e.g. the 100+ metrics of ISO 16363) is not going to happen overnight.
How can "the RDA" empower me / us so that the whole community benefits as much as possible?
(Not by telling me that there haven't been many posts on the PeIG mailing list for example).
If we achieve the above I am immodest enough to think that it will be a pretty major achievement.
I have said many times that we could not have come up with this plan, nor advanced so far in its implementation, without expert input from many people at RDA meetings, including WGs and IGs.
If I had never come to Garching in September 2012, then to Gothenburg in 2013 etc we would be in a very different situation.
We would probably be unable to see the wood for the trees, so I think my estimate of the effort (= money) that we have saved is almost certainly an underestimate.
To mis-quote George Bernard Shaw, we are waiting to help, we are willing to help, we are wanting to help!
I hope that I haven't gone too much off track - to come back to concrete steps:
1. We will continue to participate in WGs, IGs, BoFs etc that are relevant to our work and goals;
2. We would like to see how the latter could be leveraged to benefit "all of RDA", e.g. the workshops we are organising, or are likely to organise in the coming 1 / 2 / 3 years (data management, data sharing, reproducibility: "bit preservation" has been a bit overdone in our past DPHEP workshops but we still believe we have some valuable experience to share: at the 100+PB scale today and planning for up to 3 orders of magnitude more, including cost model and business case).
3. We are happy to participate in the production of "success stories" now and in the future.
How to take this further and generalise it?
I would suggest taking 2-3 examples (data preservation, training, and one other) and talking about them, presumably at a plenary, as shining examples of the "power of the RDA". End by calling for other examples to be submitted for show-casing in the future.
Possibly follow with a 30' discussion, explicitly trying to get "the silent majority" to speak up.
Not easy, but it could well be a snowball effect, starting slowly and rapidly gaining size and momentum.
Cheers, Jamie
On 01 May 2016, at 11:47, Wittenburg, Peter <***@***.***> wrote:
Thanks Jamie,
interesting point. How do you want to organise things?
I would like to understand how to do things practically. Currently I see that when organising an event based on some request we will find good experts from our RDA member base whom we could ask to run a course etc. They have shown in the groups where they are heading, they have indicated their use cases or adoption stories, etc. It's all on the wiki.
The 500,000 data scientists you are talking about are a huge mass indeed - some of them we as individuals know by accident, and if we are lucky we know in detail what they are doing, where they have deep knowledge, what they could contribute, etc. So if we were to organise a meeting on preservation, I know that you, for example, have done a lot, but I don't know the details. Yet there is no adoption story or WG/IG output about that.
So what is your suggestion?
-----Original Message-----
From: Jamie Shiers [mailto:***@***.***]
Sent: Sunday, 1 May 2016 08:56
To: Wittenburg, Peter
Cc: Hugh Shanahan; RDA Europe Synchronisation Assembly (SyA);
***@***.***; Mark Parsons; Berman, Fran; RDA Organisational
Assembly / Organisational Advisory Board (OAB); Simon CODATA; Harrison,
Subject: Re: [synchronisation-assembly] [rda-oab][synchronisation-assembly]
US future thoughts
Dear Peter and all,
It is very good that so many of us see training as a key output / deliverable.
And it is clear that there is only so much that a small team can do.
Hence the proposal to leverage the ~4000 strong "RDA Collaboration".
But equally important IMHO is to balance "push" with "pull" (from the
organisations / projects that the "RDA Collaboration" represents and bridges
to). (The bi-directional engagement as I call it).
Then we could truly say: "The Worldwide RDA Collaboration, that represents
science at all scales, mobilises to target the "missing 500,000" data scientists.
It not only offers training in core data principles and values but also addresses
the specific needs of the various communities and projects concerned. This
allows the RDA to implement its vision of "researchers and innovators
openly sharing data across technologies and disciplines and countries to
address the grand challenges of society"."
Scalable. Sustainable. Implementable. Workable.
Cheers, Jamie
On 30 Apr 2016, at 16:16, Peter Wittenburg <***@***.***> wrote:
Dear Hugh, all,
did you ever see the training page - please look here:
Our primary focus for our training efforts as RDA EU is to disseminate the
RDA global results, but we also do slightly more. And also the other regions
will have some activities in this respect. Since all material is open we can all
share our efforts.
So we are already doing training, organising this with a very small team, and it
is a huge challenge to organise up to 3 or 4 events per month. In all this we
act as a broker between interested people and experts whom we can
draw on from the 4000 RDA experts. It may be that you find the focus of
the training courses wrong, but then please fill in the request for ideas and
Let me add here that we are pleased that EDISON volunteered to create a
framework which allows us to synchronise training efforts across Europe.
And I should also mention (and that is perhaps as important as what is
said above) that various countries are doing training courses
organised by active RDA groups and often these national courses are
supported by RDA Europe. As an example just look
So Hugh - what is missing and can you do more with a small team?
Peter Wittenburg Tel: +49 2821 49180
***@***.*** ; ***@***.***
RDA Europe Director, RDA TAB Member, EUDAT Scientific Advisor
Senior Advisor Data Systems, Max Planck Computing and Data Facility
Gießenbachstraße 2, 85748 Garching, Germany,
former affiliation: MPI for Psycholinguistics, Nijmegen, The
From: Hugh Shanahan [mailto:***@***.***]
Sent: Saturday, 30 April 2016 11:27
To: Jamie Shiers
Cc: Wittenburg, Peter; ***@***.***; Mark Parsons; Berman,
Fran; RDA Organisational Assembly / Organisational Advisory Board
(OAB); RDA Europe Synchronisation Assembly (SyA); RDA Europe
Synchronisation Assembly (SyA)
(***@***.***; Simon CODATA;
Harrison, Andrew
Subject: Re: [rda-oab][synchronisation-assembly] US future thoughts
Dear all
I wanted to follow up on the email from Jamie, namely his wish for the
RDA to engage in training. I agree with him when he refers to it as the
under-utilised "killer app" of the RDA.
I don't have to state the obvious that there is a huge requirement of Data
Science skills in Research. What is important to note is the depth and
breadth of training that is required. The skills required change subtly from
those working in a domain dominated by Volume and/or Velocity issues such
as High Energy Physics and Bioinformatics to those mostly facing the Variety
issue (in, for example, the Long Tail of Research).
It's also important to point out that training for specialists in Data Science
is key, but training for the large numbers of researchers who will need to
have a moderate understanding of a variety of different topics within Data
Science is also essential. The recent estimated figure of 500K researchers
who need Data Science skills in Europe alone jumps out here.
There is an analogy here with Engineering - an Engineer will typically
understand some Calculus, Mechanics, Thermodynamics and Linear Algebra
and a variety of other topics. Obviously there are experts in all those fields
but that doesn't mean an Engineer should simply hand off any matrix
inversion to an Applied Mathematician simply because she's not done a PhD
in the topic. As it is, many researchers are wasting much of their time and
effort re-inventing the wheel and view Open Research with suspicion
because they don't see the bigger picture.
The number of Masters programmes in Data Science around the world is
growing rapidly, but they appear to be only addressing parts of the problem. As
noted by one study from the EDISON project, these programmes are often
focussed on particular sub-sets of Data Science rather than giving an
overview. Hence the danger is that without some leadership Data Science
will become a fractured discipline.
There is a need for an organisation to take the lead and propose, not
dictate, best practices in Data Science from introductory to advanced levels;
to point out that it's necessary to have some understanding of all of it and
the fact that being open with data and its analysis makes for more effective
and efficient research; to set up the mechanisms to accredit individuals,
courses and degree programmes to ensure quality and maximise impact.
The RDA is ideally placed to do this. It has the authority based on its
extensive grassroots community of experts and its array of funders.
I cannot think of a more effective way of achieving the goals of the RDA.
All the best
Hugh Shanahan
Senior Lecturer in Bioinformatics
Skype hugh_shanahan
Tel +44 (0)1784 443433
On 28 Apr 2016, at 11:00, Jamie Shiers <***@***.***> wrote:
Dear all,
Stimulated by these various discussions I have written a short very informal
note (aka brain dump) that is attached.
This lists 3 main points (as per Leif's mail) although I confess to running out
of steam, time and page limit (1 double side of A4) on the 3rd (and possibly
most important) point.
However, as the note says, I am sure others will weigh in on this point.
Cheers, Jamie
On 27 Apr 2016, at 18:14, Peter Wittenburg <***@***.***> wrote:
Dear SyA members,
Fran from RDA US allowed us to distribute this brainstorming note about
the future of RDA. Please take it as what it is: first ideas on how RDA could
move from the viewpoint of our US colleagues.
I think it is a great resource to stimulate our discussions in Europe as well.
Peter Wittenburg Tel: +49 2821 49180
***@***.*** ; ***@***.***