Dear Hugh, all,
have you seen the training page? Please look here: http://europe.rd-alliance.org/training-programme
Our primary focus for our training efforts as RDA EU is to disseminate the RDA global results, but we also do slightly more. The other regions will also have some activities in this respect. Since all material is open, we can all share our efforts.
So we are already doing training. We organise it with a very small team, and it is a huge challenge to organise up to 3 or 4 events per month. In all this we act as a broker between interested people and experts we can draw on from among the 4000 RDA experts. You may find the focus of the training courses wrong, but in that case please fill in the request for ideas and wishes.
Let me add here that we are pleased that EDISON volunteered to create a framework which allows us to synchronise training efforts across Europe.
And I should also mention (and that is perhaps as important as what is said above) that various countries are doing training courses organised by active RDA groups and often these national courses are supported by RDA Europe. As an example just look here: https://www.dkrz.de/Nutzerportal/veranstaltungen-1/de-rda-de-trainingswo...
So Hugh – what is missing and can you do more with a small team?
best
Peter
------------------------------------------------------------------------------------------------
Peter Wittenburg Tel: +49 2821 49180
***@***.*** ; ***@***.***
RDA Europe Director, RDA TAB Member, EUDAT Scientific Advisor
Senior Advisor Data Systems, Max Planck Computing and Data Facility
Gießenbachstraße 2, 85748 Garching, Germany
http://www.mpcdf.de, http://www.mpcdf.de/~pewi
former affiliation: MPI for Psycholinguistics, Nijmegen, The Netherlands
From: Hugh Shanahan [mailto:***@***.***]
Sent: Saturday, 30 April 2016 11:27
To: Jamie Shiers
Cc: Wittenburg, Peter; ***@***.***; Mark Parsons; Berman, Fran; RDA Organisational Assembly / Organisational Advisory Board (OAB); RDA Europe Synchronisation Assembly (SyA); RDA Europe Synchronisation Assembly (SyA) (***@***.***-groups-europe.org); Simon CODATA; Harrison, Andrew
Subject: Re: [rda-oab][synchronisation-assembly] US future thoughts
Dear all
I wanted to follow up on the email from Jamie, namely his wish for the RDA to engage in training. I agree with him when he refers to it as the under-utilised “killer app” of the RDA.
I don’t have to state the obvious that there is a huge requirement for Data Science skills in Research. What is important to note is the depth and breadth of training that is required. The skills required differ subtly between those working in a domain dominated by Volume and/or Velocity issues, such as High Energy Physics and Bioinformatics, and those mostly facing the Variety issue (in, for example, the Long Tail of Research).
It’s also important to point out that training for specialists in Data Science is key, but training for the large numbers of researchers who will need to have a moderate understanding of a variety of different topics within Data Science is also essential. The recent estimated figure of 500K researchers who need Data Science skills in Europe alone jumps out here.
There is an analogy here with Engineering – an Engineer will typically understand some Calculus, Mechanics, Thermodynamics and Linear Algebra and a variety of other topics. Obviously there are experts in all those fields, but that doesn’t mean an Engineer should hand off any matrix inversion to an Applied Mathematician simply because she’s not done a PhD in the topic. As it is, many researchers are wasting much of their time and effort re-inventing the wheel and view Open Research with suspicion because they don’t see the bigger picture.
The number of Masters programmes in Data Science around the world is growing rapidly, but these programmes appear to address only parts of the problem. As noted by one study from the EDISON project, they are often focussed on particular sub-sets of Data Science rather than giving an overview. Hence the danger is that without some leadership Data Science will become a fractured discipline.
There is a need for an organisation to take the lead and propose, not dictate, best practices in Data Science from introductory to advanced levels; to point out that it’s necessary to have some understanding of all of it and the fact that being open with data and its analysis makes for more effective and efficient research; to set up the mechanisms to accredit individuals, courses and degree programmes to ensure quality and maximise impact. The RDA is ideally placed to do this. It has the authority based on its extensive grassroots community of experts and its array of funders.
I cannot think of a more effective way of achieving the goals of the RDA.
All the best
Hugh
__________________________
Hugh Shanahan
Senior Lecturer in Bioinformatics
***@***.***
http://www.shanahanlab.org
@hughshanahan
Skype hugh_shanahan
Tel +44 (0)1784 443433
orcid.org/0000-0003-1374-6015
On 28 Apr 2016, at 11:00, Jamie Shiers <***@***.***> wrote:
Dear all,
Stimulated by these various discussions I have written a short very informal note (aka brain dump) that is attached.
This lists 3 main points (as per Leif’s mail) although I confess to running out of steam, time and page limit (1 double side of A4) on the 3rd (and possibly most important) point.
However, as the note says, I am sure others will weigh in on this point.
Cheers, Jamie
On 27 Apr 2016, at 18:14, Peter Wittenburg <***@***.***> wrote:
Dear SyA members,
Fran from RDA US allowed us to distribute this brainstorming note about the future of RDA. Please take it as what it is: first ideas on how RDA could move from the viewpoint of our US colleagues.
I think it is a great resource to stimulate our discussions in Europe as well.
best
peter
------------------------------------------------------------------------------------------------
Peter Wittenburg Tel: +49 2821 49180
***@***.*** ; ***@***.***
Attached files:
RDA_US_2.0_Proposal.pdf
--
Full post: https://rd-alliance.org/group/rda-europe-synchronisation-assembly-sya/po...
Manage my subscriptions: https://rd-alliance.org/mailinglist
Stop emails for this post: https://rd-alliance.org/mailinglist/unsubscribe/52156
Attached files:
RDA_Sustainability_____Thoughts.docx
Author: Jamie Shiers
Date: 01 May, 2016
Dear Peter and all,
It is very good that so many of us see training as a key output / deliverable.
And it is clear that there is only so much that a small team can do.
Hence the proposal to leverage the ~4000 strong “RDA Collaboration”.
But equally important IMHO is to balance “push” with “pull” (from the organisations / projects that the “RDA Collaboration” represents and bridges to). (The bi-directional engagement as I call it).
Then we could truly say: “The Worldwide RDA Collaboration, that represents science at all scales, mobilises to target the ‘missing 500,000’ data scientists. It offers training not only in core data principles and values but also addresses the specific needs of the various communities and projects concerned. This allows the RDA to implement its vision of ‘researchers and innovators openly sharing data across technologies and disciplines and countries to address the grand challenges of society’.”
Scalable. Sustainable. Implementable. Workable.
Cheers, Jamie
Author: Jamie Shiers
Date: 01 May, 2016
Dear Peter,
I will let Harry, Hugh and others talk about training as they have concrete things in place.
As far as Collaboration is concerned, I think that the key point is to make the 4000 RDA members feel empowered.
This is a big topic that needs wider discussion and probably some “test cases”.
Let’s take the case of data preservation and see how this fits.
Hilary is working on a quote for a new version of the RDA Recommendations & Outcomes booklet. I don’t know when this will appear. There are also some slides I sent to Mark in May 2015 and Fran uses some material from me in one of her courses.
The benefits of adoption are multiple and can be hard to measure precisely. One of many examples comes from Jamie Shiers, Information Technology Department, CERN & Manager of the Data Preservation for Long-Term Analysis in High Energy Physics (DPHEP), who says that “In DPHEP (Data Preservation in High Energy Physics) we have saved person-years through the knowledge gained through the RDA including the Preservation IG and many others. The most conservative estimate I can think of is that we have saved 5 person years and got something much better and more sustainable than we would have done otherwise. This could well be an under-estimate but 5 person-years at the EU project rate of ~EUR100K/year is quite substantial.”
There is material that I have sent to the Preservation e-Infrastructure IG mailing list, including a status report covering the last 3 years (more or less the time of my involvement with the RDA).
It is referenced from the attachment which will appear in an upcoming version of the CERN Courier. (This is still a draft - some small changes are likely). There is also an iPRES paper currently in review.
What does this say? At a minimum we want to go through certification according to ISO 16363 for the WLCG Tier0. Then we will decide whether we want a formal audit or not. I intend to do the self-audit while looking at how to extend it to all CERN experiments as well as to all CERN activities (the latter including also photos, videos, memos etc. - the organisation’s “digital memory”, as the article calls it).
We will certainly learn a lot from this exercise: it should be completed by 2018 to allow it to input to the next round of the European Strategy for Particle Physics update (2018/2019 - the exact dates are not yet fixed).
Coupled to this are other activities: the preparation of Active Data Management plans for WLCG and all CERN experiments (I would like to see this as part of the formal approval / review process for experiments), Sharing and Re-use in practice (CMS have already made available 3 releases - some 200TB in total), Open Access Policies, Reproducibility, etc.
Through projects that we are involved in, we are trying to spread this to other disciplines. Also through the EIROForum IT WG.
The timelines are strict: whilst not technically difficult, getting all the necessary formal approval (e.g. the 100+ metrics of ISO 16363) is not going to happen overnight.
How can “the RDA” empower me / us so that the whole community benefits as much as possible?
(Not by telling me that there haven’t been many posts on the PeIG mailing list for example).
If we achieve the above I am immodest enough to think that it will be a pretty major achievement.
I have said many times that we could not have come up with this plan, nor advanced so far in its implementation, without expert input from many people at RDA meetings, including WGs and IGs.
If I had never come to Garching in September 2012, then to Gothenburg in 2013 etc we would be in a very different situation.
We would probably be unable to see the wood for the trees, so I think my estimate of the effort (= money) that we have saved is almost certainly an underestimate.
To mis-quote George Bernard Shaw, we are waiting to help, we are willing to help, we are wanting to help!
I hope that I haven’t gone too much off track - to come back to concrete steps:
1. We will continue to participate in WGs, IGs, BoFs etc that are relevant to our work and goals;
2. We would like to see how the latter could be leveraged to benefit “all of RDA”, e.g. the workshops we are organising, or are likely to organise in the coming 1 / 2 / 3 years (data management, data sharing, reproducibility: “bit preservation” has been a bit overdone in our past DPHEP workshops but we still believe we have some valuable experience to share: at the 100+PB scale today and planning for up to 3 orders of magnitude more, including cost model and business case).
3. We are happy to participate in the production of “success stories” now and in the future.
How to take this further and generalise it?
I would suggest taking 2-3 examples (data preservation, training, and one other) and talking about them, presumably at a plenary, as shining examples of the “power of the RDA”. End by calling for other examples to be submitted for show-casing in the future.
Possibly follow with a 30-minute discussion, explicitly trying to get “the silent majority” to speak up.
Not easy, but it could well be a snowball effect, starting slowly and rapidly gaining size and momentum.
Cheers, Jamie
On 01 May 2016, at 11:47, Wittenburg, Peter <***@***.***> wrote:
Thanks Jamie,
interesting point. How do you want to organise things?
I would like to understand how to do things practically. Currently I see that when organising an event based on some request we will find good experts from our RDA member base whom we could ask to run a course etc. They have shown in the groups where they are heading, they have indicated their use cases or adoption stories, etc. It's all on the wiki.
The 500,000 data scientists you are talking about are a huge mass indeed - some of them we as individuals know by accident, and if we are lucky we know in detail what they are doing, where they have deep knowledge, what they could contribute, etc. So if we were to organise a meeting on preservation, I know, for example, that you have done a lot, but I don't know the details. Yet there is no adoption story or WG/IG output about that.
So what is your suggestion?
best
peter
-----Original Message-----
From: Jamie Shiers [mailto:***@***.***]
Sent: Sunday, 1 May 2016 08:56
To: Wittenburg, Peter
Cc: Hugh Shanahan; RDA Europe Synchronisation Assembly (SyA); ***@***.***; Mark Parsons; Berman, Fran; RDA Organisational Assembly / Organisational Advisory Board (OAB); Simon CODATA; Harrison, Andrew
Subject: Re: [synchronisation-assembly] [rda-oab][synchronisation-assembly] US future thoughts
Dear Peter and all,
It is very good that so many of us see training as a key output / deliverable.
And it is clear that there is only so much that a small team can do.
Hence the proposal to leverage the ~4000 strong “RDA Collaboration”.
But equally important IMHO is to balance “push” with “pull” (from the organisations / projects that the “RDA Collaboration” represents and bridges to). (The bi-directional engagement as I call it).
Then we could truly say: “The Worldwide RDA Collaboration, that represents science at all scales, mobilises to target the ‘missing 500,000’ data scientists. It offers training not only in core data principles and values but also addresses the specific needs of the various communities and projects concerned. This allows the RDA to implement its vision of ‘researchers and innovators openly sharing data across technologies and disciplines and countries to address the grand challenges of society’.”
Scalable. Sustainable. Implementable. Workable.
Cheers, Jamie
On 30 Apr 2016, at 16:16, Peter Wittenburg <***@***.***> wrote:
Dear Hugh, all,
have you seen the training page? Please look here: http://europe.rd-alliance.org/training-programme
Our primary focus for our training efforts as RDA EU is to disseminate the RDA global results, but we also do slightly more. The other regions will also have some activities in this respect. Since all material is open, we can all share our efforts.
So we are already doing training. We organise it with a very small team, and it is a huge challenge to organise up to 3 or 4 events per month. In all this we act as a broker between interested people and experts we can draw on from among the 4000 RDA experts. You may find the focus of the training courses wrong, but in that case please fill in the request for ideas and wishes.
Let me add here that we are pleased that EDISON volunteered to create a framework which allows us to synchronise training efforts across Europe.
And I should also mention (and that is perhaps as important as what is said above) that various countries are doing training courses organised by active RDA groups and often these national courses are supported by RDA Europe. As an example just look here: https://www.dkrz.de/Nutzerportal/veranstaltungen-1/de-rda-de-trainingsworkshop-2016
So Hugh – what is missing and can you do more with a small team?
best
Peter
------------------------------------------------------------------------------------------------
Peter Wittenburg Tel: +49 2821 49180
***@***.*** ; ***@***.***
RDA Europe Director, RDA TAB Member, EUDAT Scientific Advisor
Senior Advisor Data Systems, Max Planck Computing and Data Facility
Gießenbachstraße 2, 85748 Garching, Germany
http://www.mpcdf.de, http://www.mpcdf.de/~pewi
former affiliation: MPI for Psycholinguistics, Nijmegen, The Netherlands
Attached files:
RDA_US_2.0_Proposal.pdf
--
Full post: https://rd-alliance.org/group/rda-europe-synchronisation-assembly-sya/post/us-future-thoughts
Manage my subscriptions: https://rd-alliance.org/mailinglist
Stop emails for this post: https://rd-alliance.org/mailinglist/unsubscribe/52156
--
Full post: https://rd-alliance.org/group/rda-europe-synchronisation-assembly-sya/post/aw-rda-oabsynchronisation-assembly-us-future
Manage my subscriptions: https://rd-alliance.org/mailinglist
Stop emails for this post: https://rd-alliance.org/mailinglist/unsubscribe/52208