RDA P13 in Philadelphia was excellently organised, and it offered many sessions, meeting opportunities and original forms of participation and interaction.
I would like to share some of my experiences there:
A session from the interest group I am co-Chairing (SHARC)
A session from a newly created group on ethics and social aspects of data science
A BoF session on health data
Meeting poster presenters and ambassadors
Participate in a CODATA Committee meeting as a satellite activity
Contribute to an « unconference » discussion
Get to meet colleagues from a new project related to RDA that will start later this year.
1. Session from the interest group SHARC : Wednesday 3 April 2019, 14.30 – 16.00
Evaluating FAIRness Practise - Need for Integrated Ecosystem of Assessment Linked to Crediting and Rewarding Mechanisms
This session attracted about 50 people on site and 8 remotely.
The agenda was as follows :
● Goals of the group’s project; current standing; objectives of the meeting: Introduction to the group?
A. Cambon-Thomsen (FR) 10 min
● Grids in SHARC recommendations, L. Mabile (FR) 5 min
● Grids construction and structure, R. David (FR) 10 min
● Eliane Frankhauser, DANS (NL) 10 min:
DANS FAIR assessment tools and implementation (provisional title).
● Luiz Olavo Bonino, GO FAIR (NL) 10 min:
Essential FAIR criteria, prioritisation and rules for implementation (provisional title)
● Questions-mediated discussion with panelists and audience (approx. 45 min)
Chairs: Alison Specht, Anne Cambon-Thomsen
The first three presenters (myself, Laurence Mabile and Romain David) are members of the SHARC interest group and co-lead it.
I described the origin, history and objectives of the SHARC interest group, which aims to produce tools and recommendations to better integrate sharing activities and the related tasks into the evaluation of scientific activities, whether for individual or group evaluation, promotion and recruitment, or grant attribution. Laurence Mabile explained what had been accomplished so far and which part was addressed in this session. Romain David described the tool for assessing several criteria proposed to appraise each aspect of FAIR activity (findable, accessible, interoperable and reusable), in the form of a decision tree and a semi-quantitative appraisal scale. This tool, so far designed by a small sub-group of the SHARC IG, is intended for two kinds of users: 1) self-evaluators (researchers, other scientists or data-related professionals, project managers, research groups, etc.) and 2) evaluators (using the self-evaluation plus objective elements). After explaining the logic of the tool and showing some examples of its content, he announced the next step: a questionnaire about the tool that will be sent to the SHARC group and to people who provided their e-mail contact, in order to assess its clarity, usability and pertinence. People attending the session were invited to get involved.
The two other speakers gave interesting and complementary perspectives, and this session was a nice follow-up to the previous session in the programme, on the FAIR data maturity model, which I had also attended.
● Eliane Frankhauser described two tools, much simpler than those designed in SHARC, for assessing efforts towards FAIR for data sets. They were prepared within DANS (Data Archiving and Networked Services, in the Netherlands; this institution coordinates the EU-funded project FAIRsFAIR). The two tools are the FAIRdat tool, for evaluation after deposit by data managers and researchers (data curation), and the FAIR checklist, for evaluation before deposit by researchers (data preparation for re-use).
● Luiz Olavo Bonino is from the GO FAIR initiative and is also based in the Netherlands. The GO FAIR mission is to foster the coherent development of the global Internet of FAIR Data & Services (IFDS), with the main focus on early developments in the European Open Science Cloud (EOSC). He examined the various challenges in evaluating compliance with the FAIR principles in a broad manner. It was useful to listen to and interact with other projects addressing different facets of such challenges.
All slides presented and notes taken are available at: https://www.rd-alliance.org/ig-rda-sharc-sharing-rewards-credit-rda-13th...
The general discussion was led by Alison Specht from Australia and myself. The main questions addressed were:
machine readable tools versus human use of criteria in evaluation
the ecosystem around FAIR activity :
who is doing these activities,
who should one credit for it,
the necessary identification of the chain of responsibilities
the differences between domains and the difficulty of balancing generic and specific criteria/assessment,
the current transitional situation, in which many institutions have not yet identified specific professionals for such activities,
the difficulty of FAIR data management and the general under-estimation of the time it requires,
the enormous North-South gap.
Some practical proposals came out of the discussion, e.g. that 5% of a project budget could be dedicated to FAIR data stewardship.
A full report of the session will be available on the SHARC group webpage.
In addition to the session, a poster on the SHARC grid was presented, and it attracted several participants at each break. Following both the poster presentation and the SHARC session, several people joined the SHARC IG and/or asked to participate in the survey to comment on the grid; this will increase the variety (both disciplinary and geographical) of participants in the survey.
It has been decided to propose a SHARC IG session for RDA P14 to present the results of the survey about the grid and the decision trees on evaluation, and to discuss how an adapted grid can be annexed to the recommendations. A publication presenting the grid and using the survey results will also be prepared.
2. A session from the interest group on ethics and social aspects of data science (IG established in 2015) – Tuesday 2 April 14. – 15.30
This session concentrated on Data Science Ethics excluding Privacy, and several speakers addressed various aspects.
Oya Beyan approached the question « Do we learn from data or do we simply perpetuate human bias? ». She showed that data-driven tools can amplify biases or ignore them, and so discriminate against some groups (minorities, vulnerable or under-represented groups, etc.), and that the usual checks and evaluations of tools (especially in machine learning) do not test for these consequences. Unintended consequences must therefore be carefully explored and taken into account before tools are disseminated, and vigilance and follow-up must be part of good practice for data evaluation.
Alison Specht discussed intra-project management and transdisciplinary interaction as a demanding exercise in each community (in this case environmental sciences). The issues encountered, ranging from vocabulary to licensing, differ across the various disciplines that see an a priori advantage in sharing data. Building the trust needed to accomplish this is an intense activity that is not always foreseen and is difficult to quantify, yet it must be integrated in the work plan. Her case study was about an Australian Aerobiology Working Group.
Myrna Morales concentrated her very energetic talk on the discriminatory aspect of data science between privileged and less privileged communities, after a historical review of the patient empowerment movement. She revisited the idea of democracy and its meaning and insisted on the importance of replacing “opinions” with “facts”. Her talk mainly addressed intra-USA issues and was linked to the history of black communities, but the main challenges are of general value.
Fran Berman presented the ethical challenges around IoT, data science and data analytics, with two main aspects: bias at the different steps of the data chain, and misinterpretation of ubiquitous data. A set of issues to address with Big Data was identified:
Big data detect correlations but do not tell us which are meaningful
Big data do not replace scientific inquiry
Big data can be easily gamed
Big data often is less robust than it seems
Echo chamber effect
Risk of too many correlations
Big data is giving scientific-sounding answers to ill-defined problems
Confusion between causation and correlation
In conclusion, the need to introduce a policy dimension and compliance into education and training for data scientists was underlined.
The discussion was lively and somewhat dominated by the issue of discrimination against groups, although members of the various communities represented in the audience also considered other dimensions important. The roles of ethicists/philosophers, researchers and governments in establishing codes of conduct were discussed. I intervened in the discussion to underline that data science ethics is about a chain of responsibilities (institutional and individual) and should be part of any education scheme.
After the session, given my past and present involvement in ethics bodies (I am presently a member of the European Group on Ethics in Science and New Technologies, EGE, a body advising the European Commission on these aspects, and referee for research integrity at my university, Toulouse III, in France), I joined this IG. I intend to be active there, especially as European voices were barely present, at least in this session.
3. A BoF session on Assessing FAIR Data Policy Implementation in Health Research ; Thursday 4 April, 9.00 – 10.30
This meeting mainly addressed how to share information on the new FAIR4Health project (EU-funded) and the landscape analysis it will conduct to assess FAIR implementation in health research. I was especially interested as I am involved in another EU-funded project, IMI FAIRplus (IMI stands for Innovative Medicines Initiative; this kind of project heavily involves the pharmaceutical industry). One of the objectives of the session was to identify international partners, in order to help guide the development of the FAIRification tool and guidelines for implementing FAIR Open data policy in health and social care research.
The topic of health data and the conditions under which they can be shared in different jurisdictions was extensively discussed, and the « silo » effect created by legal environments was commented upon. I could contribute to the discussion given my experience in FAIRplus and in health data sharing in general. Exchanges on existing policies and practices seem to be very important. The question of where this BoF session leads within RDA was addressed, and the possibility of embedding this topic in an existing IG was raised.
Following this session I shall keep in contact with the FAIR4Health project.
4. Meeting poster presenters and ambassadors
This was one of my objectives as I am considering applying for an RDA ambassador role and I was interested in the generic posters about ambassadors. I also had the opportunity to meet two of them at greater length at the dedicated lunch and to discuss how they implement this role in their activity and community.
Regarding posters in general, the exhibition was rich, interesting and diverse. It was easy to meet people or to leave a note for them. However, the space between posters was too small and circulation was sometimes difficult. Having online access to the posters (as PDF) outside the poster session would have been very useful, but I did not find this function in the programme.
Besides the programme itself there were opportunities to meet various groups, have informal discussions or participate in satellite activities.
5. Participate in a CODATA Committee meeting as a satellite activity
I am part of the CODATA international data policy committee (http://www.codata.org/strategic-initiatives/international-data-policy-committee), and we had the opportunity to meet with the members present. This resulted in a fruitful interaction and in decisions on some action points and methods for working together that could hardly have been reached without this in-person interaction.
6. Contribute to an « unconference » discussion
This was the first time I had participated in something called an « unconference » and I was curious about it. It is simply the collective definition (through submission of topics and a vote by the attendees) of one or several questions that are considered interesting to address. I participated in the discussion on « Shared data vocabularies & governance across institutions/governments/regions ». I found the approach quite technical but could contribute regarding health data and the specific case of naming infectious diseases (https://www.who.int/mediacentre/news/notes/2015/naming-new-diseases/en/), which I gave as an example of how « words matter » and of the possible social consequences of using certain terms. I could not stay until the end as I wanted to go to a co-located event (on a new project: PARSEC, see below), but it was an interesting experience. I suppose the outcome could be the construction of a specific session for another occasion (P14). I am curious about the outcome and will follow the notes and opinions about this unconference.
7. Get to meet colleagues from a new project related to RDA that will start later this year
As a member of a new project, funded by the Belmont Forum, that will start in May, I had the opportunity to meet several of the project members for the first time. It is called PARSEC: Building New Tools for Data Sharing and Re-use through a Transnational Investigation of the Socioeconomic Impacts of Protected Areas, and involves RDA. Two of the main leaders (Alison Specht, Australia and Shelley Stall, USA) presented the project in an open session, which made it possible to disseminate information. Partners are from the USA, France, Brazil and Japan, with associated partners from the UK and Australia; international organisations are also involved, namely RDA and ESIP (Earth Science Information Partners), along with several organisations such as ORCID and DataCite. The whole project is articulated around two inter-related strands (extract from the slides presented below):
Synthesis science: Employing transdisciplinary knowledge with multi-source data management, across disciplinary and organisational realms, to enable better data sharing and pioneer new technologies to improve our management and conservation of global biodiversity.
Data science: Removing barriers to data reuse by promoting data management best practices, data deposition in a trusted, community-approved repository that provides proper attribution, and engaging with the e-infrastructure to enable the automation of credit for data sharing and data reuse.
The composition of the teams in the countries involved and a global work plan were presented. It will be an important use case for various tools and policies developed within RDA, the IG SHARC being one example.
In conclusion, being able to attend RDA P13 thanks to the expert grant was an enormous opportunity, and I have tried through this report to highlight the variety of activities and interactions that would otherwise not have been possible. It has also strengthened my wish to become more deeply involved in RDA, as shown by my involvement in more IGs. It also convinced me that my experience, although not in a technical data domain, could effectively contribute to RDA policies and dynamics. I am already planning to participate in P14.