The Research Data Alliance (RDA) builds the social and technical bridges that enable open sharing of data. The RDA vision is of researchers and innovators openly sharing data across technologies, disciplines, and countries to address the grand challenges of society. RDA Plenaries are meetings that take place every six months and are designed to bring together the established RDA community as well as new members in an engaging and productive setting, to help advance the work being led by the RDA Working and Interest Groups and the RDA as a whole. The sixth RDA Plenary meeting was held from 23rd to 25th September 2015 in Paris, France. The plenary featured a special focus on research data for climate change while continuing to drive the adoption of the RDA group outputs that tangibly accelerate progress for global data sharing and increase data-driven innovation.
For the first time, I had the chance to participate in an RDA plenary, one of the best attended and most educational in RDA history, of which I am very proud indeed. This was made possible through RDA Europe’s travel and subsistence support for Early Career European Researchers and Scientists working with Data; I was one of eleven winners selected through a fair and rigorous application process. Winners were given the following tasks: to display a poster summarising their studies and areas of interest during the Plenary meeting; to attend the working meetings of the RDA interest groups and/or working groups assigned to them; to take notes and provide a summary of those meetings for internal use by the group members; and to write a publishable summary of the meetings to be posted as a blog or news piece on the RDA web site, crediting the applicant directly as the author. This blog is that summary. The narrative follows my day-to-day notes and covers six days, from Monday 21st to Saturday 26th September 2015.
Memoir of my daily activities
Day 1 – Monday 21st September 2015
Just before I travelled to Paris, a friend of mine who lives in Paris but was in the United Kingdom for her studies gave me a rundown of what to expect, as well as interesting places to visit, given that it was my first visit to France. Well, prior to RDA, I had passed through France by bus on the way to Germany, but no one can convince me that was a visit! She really whetted my appetite for places to see; so, with a fully packed RDA programme, I planned to weave these visits in carefully somehow. I will talk about the places I was able to visit later.
Since I was keen on attending the pre-workshop on “Persistent Identifiers: Enabling Services for Data Intensive Research” organized by DataCite and EPIC (European Persistent Identifier Consortium), I headed straight to the venue (Institut Henri Poincaré) from Magenta, where my hotel was located. I arrived at about 1:42pm, during the presentation by Larry Lannom (DONA) on administering the Global Handle Registry (GHR). He gave a brief history of the Handle System (HS): early adopters in the 1990s included the Library of Congress, DSpace and publishers, the latter led initially by AAAP and later joined by IPA. Usage of the Handle System expanded during the 2000s. The purpose of the foundation was to provide management, software development and other services for technical coordination, evolution and application. He then moved on to GHR operational changes under DONA management, explaining the old and new paradigms, the layered GHR structure, prefix allotment, and policies and business models. John Kunze (ARK, N2T) presented next, seated and reading from a prepared script; he spoke clearly and concisely. After introducing himself, he posed open identifier infrastructure as a question: Open identifier infrastructure? He talked about non-traditional ID persistence, touching on traditional silos such as the Digital Object Identifier (DOI). He emphasised the need to build the “Principles for open scholarly infrastructure”. Laura Paglione (ORCID) presented on connecting people to their scholarly activity and outputs via orcid.org. Anila Angjeli (ISNI) then presented on managing identities by interconnecting research and other domains.
The second batch of presentations, on the theme “Supporting Data Intensive Research”, started right after the coffee break. The first presentation was by Tobias Weigel (DKRZ). I enjoyed his account of a no-delete policy scenario in which PIDs are kept beyond the lifetime of the object they identify. Personally, from some earlier work, it occurred to me that the producers of the UK National Public Transport Access Node (NaPTAN) data might have used such a policy. He argued that such a policy adds value, such as transparency and an improvement of scientific practice. Anne Cambon-Thomsen (BRIF) talked about the Bioresource Research Impact Factor (BRIF) and ended with a call for workshops: one with EASE and the other with BBMRI-ERIC. Kerstin Lehnert (IGSN) talked about why we need PIDs for samples and the need for the IGSN (International Geo Sample Number). The IGSN started in the USA in 2004, funded by the US NSF. She pointed out that individuals need to be able to create the identifiers themselves; for example, whilst they are in the field. Peter Wittenburg (EUDAT) followed, presenting without slides but aided by a small note in front of him. Seated, with a microphone, he delivered an interesting presentation. He pointed out that, as we all know, in most databases the metadata is hidden somewhere and getting hold of it requires scripting. He said that in RDA there were very clear ways to harmonize the process, and suggested there was a need to reduce the number of solutions. PIDs are being assigned to data, software and more, especially in the Internet of Things (IoT). Before data is published, we build workflows and other things that go far beyond citations, he said. People are associating different kinds of information with PIDs; “what is our boundary?”, he asked somewhat rhetorically. Duplicated information leads to management problems.
He suggested a few points relating to data that one has to keep in mind: 1) data being used in the lab is far from being published; 2) some labs create different PIDs, and these need to be sorted out; 3) APIs for PIDs might be what we need; and 4) trust issues need to be solved. He intimated that what they saw in EUDAT was that data organisations differed, pointing out that whenever EUDAT receives data, the data also gets a PID, and that people (i.e. countries) wanted to run their own systems, which from a computer science perspective was not so relevant. Sünje Dallmeier-Tiessen (THOR) then talked about project-thor.eu, which comprises a technical and a human component, pointing out that THOR was focussing on the data side of things. Four main strands were presented: research, implementation/services, community training and support. THOR is open source and is therefore open to others as early adopters. THOR research and services comprise harmonizing and incorporating identifiers in an interoperable manner, as well as linking PIDs across platforms in an effort to remove “islands”. Jennifer Lin (Making Data Count) took over and, after introducing herself, asked a few questions about whether people had deposited data using their service (DataONE) and whether those who had deposited data had seen any usage activity. Well, the latter left the room silent, which seemed to be what she expected, as she followed up by stating that this was why the project was under way: to provide useful metrics. At the highest level, the project investigates the impact of data.
There was then a roundtable with all the speakers, moderated by Christophe Blanchi (DONA). “Programmers need to develop appropriate common APIs but we need to have a consensus on definitions”, Peter argued. Jennifer suggested a focus on what she (i.e. Making Data Count) needed, referring to the needs of DataONE. Mark (from EUDAT) argued that finding the location of the data was the problem, and that local PID systems were quite a challenge. Converting Handles to DOIs and vice versa was acknowledged as a challenging issue; asked about this, Tobias said that a solution was still a wish, as discussion was ongoing. DOIs or Handles? Peter argued that a DOI is needed only when data is about to be published, and that it seemed appropriate to use Handles before then, since a lot of work is done before a decision to publish is made. Sünje touched on the cost involved and noted that the DOI was seen as a trusted identifier. One example shared by Peter was of a past project in which six communities exchanging about 11TB of data wanted a solution. They had to write a script for each of the six solutions, and this was difficult to scale. He then gave a related example, rhetorically: you have nice young researchers, each with a good database holding their data; how do we harmonize these solutions?
Laura gave a wrap-up and pointed out that the slides would be circulated to participants and also shared on Twitter, so people should look out for them. A reception and a tour followed at around 6pm at another venue (UPMC, Tour Zamansky, Salle Panoramique). Other interesting places I managed to visit were the Panthéon, the Saint-Laurent church and the Pont au Change; I also walked leisurely around the Seine area and along the Boulevard de Sébastopol.
Day 2 – Tuesday 22nd September 2015
This was also a fully packed day for me. As an approved note taker, I was assigned to Stream 4, one of the five pre-workshops organised by RDA during the day; the stream was about "Research Data infrastructures for Environmental related Societal Challenges". Additionally, I was to report on two Interest Group (IG) meetings: Quality of Urban Life IG and Brokering IG, on 24th and 25th respectively. I will not write much here, as I have described this day’s activities in detail in a separate report covering “E-INFRASTRUCTURES & RDA FOR DATA INTENSIVE SCIENCE - pre-RDA plenary workshops”. The venue was the Conservatoire national des arts et métiers (CNAM).
The overall theme for the pre-workshop was e-infrastructures and Research Data Alliance (RDA) for data intensive science. Stream 4 highlights were as follows:
- Many common experiences, issues and opportunities
- Relationship between distributed data repositories, data aggregation and brokering
- Issues of interdisciplinary and cross domain data use
- Data Quality and business cases
- Trust and Capacity Development, Community building and governance
- Interoperability standards, metadata, semantics, licensing
- Research infrastructures and e-infrastructures need to be discussed and planned in a continuum
- Information Management starts with Data Production
- Consolidation and integration versus going to the cutting edge
- Emerging usage patterns of open data, new business through services (Amazon copied the EBI Gene Sequence database)
- Current model of supercomputing might not work for everyone
Additionally, the Stream 4 future plans and actions were as follows:
- Identify Commons
- Many issues are similar at European and international level
- Many actions for improvement could be exploited by a variety of partners (semantics, ontologies, tools)
- Scientific RDA workshops between initiatives to discuss solutions specific to entire domains but also between domains
- Better coordination at European and international level
- Combining infrastructures and e-infrastructures
- Necessary to create real VREs
- Developing Business models
- More efficient public interventions or developing the market
- How to sustain and pay for the infrastructure and service
- Legal framework is needed
This day offered no time for visiting interesting places, but I enjoyed the activities.
Day 3 – Wednesday 23rd September 2015
Day 3 was about making sure people got to know about my presentation, whilst taking the opportunity to network and attend sessions of interest. The Opening Plenary Session started right after registration, with Patrick Cocquet (CEO Cap Digital & Plenary 6 Programme Co-Chair) kicking off the session at 9.06am and tossing the “ball” to Robert-Jan Smits (Director-General, DG Research and Innovation (RTD), European Commission). He intimated that they were allowing for data management plans in their strategy and funding arrangements with beneficiaries, and also for the long-term storage, curation and reuse of research data after its use in the research itself. He said they had set up a high-level expert group to find out how to further develop EU research, and were also putting a lot of money into a large-scale research infrastructure strategy. But to make progress on data management, storage and interoperability, he indicated, input from the science community was needed in these and other areas; he stressed the importance of this given, for example, the need to combat climate change in our society. He said the hope was that interoperability would reach levels similar to those of TCP/IP. He pointed out that the challenge was not a lack of enthusiasm but channelling efforts into concrete outputs and deliverables. He wished the event well, and next came a video address by Günther H. Oettinger (Digital Economy & Society).
Günther H. Oettinger gave one example of why research is needed, citing the Ebola outbreak. He said that supporting the digitalisation of the economy is the priority in Europe. State-of-the-art digital infrastructure was therefore needed; pointing out the importance of big data and cloud computing, he said there were plans for an additional 200 million euros over the next two years in these areas. He cited the Copernicus programme, in which each orbiting satellite will generate several terabytes of data that will be freely available to all; Earth observation data will bring changes and opportunities to be addressed, one of which is data interoperability with the highest standard of security. He pointed out the need to work together across the globe to ensure good standards for interoperability. He concluded by congratulating the RDA organizers on a timely initiative that had been able to capture attention and commitment. He wished the event well and ended at 9.20am. Patrick Cocquet (PC) took over, also thanking the RDA P6 organizers for making the event possible, as well as other partners for providing, among other things, the Wi-Fi system. He also mentioned the challenge programme and hinted that there would be some demonstrations during the RDA meeting.
Mark Parsons (MP) then took the mic and exclaimed, “Hello everybody!”, and there was a rousing response. MP talked about engagement with industry and the need to create with people and not for people. He spoke of the mission to build technical and social bridges. He said “data sharing is really hard” and that, at the moment, no one tells the working groups exactly what to do, as things are kept quite open to allow all sorts of innovation. He noted the recent need for ubiquitous persistent identifiers (PIDs) and for trust in general, stressing that trust is essential. He talked about what builds trust, saying it comes from shared experience and shared perspective, among other things. He encouraged participants to talk to each other, especially to those who disagree with them, as tension tends to create innovation. He said actual data reuse also builds trust, and that trust is also built around consensus. Mark acknowledged Hilary’s contribution as well as additional events such as the data challenge. He pointed out that working groups were delivering and that new deliverables were coming up. He said some of the working groups were showing creativity through community workshops and the like, and that the data citation and metadata groups, among others, were doing great. He indicated that partnerships had been quite fruitful; for example, the Summer School group, working on Active Data Management. I also got to know that one PhD thesis had been about RDA work, and that was cool. He closed his talk with logistics. He said four members were stepping down and there were therefore vacant positions open from 23rd September 2015. He encouraged people to vote strategically in the Technical Advisory Board (TAB) election, as global representation would be good for all.
PC took over and filled the gap whilst waiting for Axelle Lemaire (Minister of State for Digital Technology, French Ministry of Economy, Industry and Digital Technology). He talked about the award ceremony the next day. Just as Hilary was about to use the waiting period to cover “housekeeping” issues, someone suggested a question time. A participant asked “How is the open science clouds facilitating science across EU?” Robert referred to the expert group he had mentioned earlier, which was to identify the key issues and report by the end of 2015. He talked about new ways of doing open science with regard to data storage and curation, and intimated that the group they had commissioned was made up of scientists, as they understand science better. MP also talked about the Funders Forum, which is an opportunity for funders to come together to share ideas. He said the government was keen to see the problems of data sharing solved but still wanted to keep a distance to allow innovation to flourish. There was also a question on how to get from concrete deliverables to recommendations, but there was no time to answer it, as it was announced that Axelle Lemaire had arrived for her presentation. She talked about the Data Driven Bill in France. “I often hear that data is the petrol of the economy”, she said, adding “I rather compared data with light”, since light diffuses, meaning that data needs to be shared to create value. In France, she said, they were trying to work out a strategy, pointing to data.gouv.fr, which hosts thousands of datasets. She acknowledged the importance of international standardisation, giving the example of having been with farmers a day earlier and hearing them talk about standards.
Barbara Ryan (Secretariat Director, Group on Earth Observations – GEO) started her keynote presentation at 10.11am. She stressed that they were interested in observations in, on, and around the Earth. She talked about Global Systems of Systems, noting that thematic observations also create their own silos. They realised that keeping the data closed was not cost efficient, and they therefore went to the White House to suggest that the data should be made available online, since scientists and government agencies were already being paid by taxpayers, who had therefore already paid for the data held by the government. This meant that a government that sells data actually loses in the long run. At this point, I was telling myself “it will be interesting to know if this is the case in other countries”. This was revealing to me. She pointed out that “Countries have borders, but earth observation don’t”, extending this to say “Countries have borders, but research applications don’t”. Her presentation is available on the RDA website (see Appendix A.1). PC then allowed Hilary to talk about housekeeping, where she touched on the use of the bar-coding system for logging attendance.
The poster session then started in parallel with the coffee break. My presentation was about “Harnessing datasets in England to understand determinants of car ownership” and is available on the RDA website (see Appendix A.2). A variety of datasets released by UK Government agencies is now making it possible for researchers and other interest groups to examine different aspects of society, such as car ownership and use, at more socially, spatially and temporally disaggregated levels than previously studied. As part of an Engineering and Physical Sciences Research Council (EPSRC) sponsored project, the poster presented exploratory analysis using novel datasets to understand the relationship between car ownership in English households and potential factors that might influence it. Findings from linear dependency analysis suggest that, with the exception of car driving time to primary schools, the remaining 40 variables were significantly (at the 0.01 and 0.05 levels) correlated with car ownership (CO) with varying correlation strengths, noting that this does not imply any causality. Only correlations with an absolute value of .3 or greater are discussed here; the rest are available in the poster. As expected from previous studies, population density and household median income are negatively (-.6) and positively (.5) correlated with CO respectively. However, our analysis goes much further than this to help explain what it is about density in particular that might impact on CO, and identifies several variables as being worthy of consideration in CO modelling in England. Attendees showed interest in the kinds of variables being looked at in our analysis.
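For readers curious about what this kind of correlation screening might look like in code, here is a minimal sketch. It is not the code used for the poster: it assumes a plain Pearson correlation, leaves out the significance tests entirely, and every variable name and number in it is made up purely for illustration.

```python
from math import sqrt


def pearson(xs, ys):
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need two sequences of equal length, at least 2 long")
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined for a constant sequence")
    return cov / sqrt(var_x * var_y)


def notable_correlations(target, candidates, threshold=0.3):
    """Keep only the candidates whose |r| with the target reaches the threshold."""
    kept = {}
    for name, values in candidates.items():
        r = pearson(target, values)
        if abs(r) >= threshold:
            kept[name] = r
    return kept


# Hypothetical, made-up values for five areas (NOT data from the poster).
cars_per_household = [0.6, 0.9, 1.1, 1.3, 1.5]
candidates = {
    "population_density": [90.0, 60.0, 45.0, 20.0, 10.0],
    "median_income": [21.0, 25.0, 24.0, 30.0, 33.0],
    "unrelated_variable": [3.0, 1.0, 2.0, 1.0, 3.0],
}

strong = notable_correlations(cars_per_household, candidates)
for name, r in strong.items():
    print(f"{name}: r = {r:+.2f}")
```

With these invented numbers, the density variable comes out negatively correlated, the income variable positively correlated, and the unrelated variable falls below the .3 cut-off and is dropped. A real analysis would also need the significance test, and, as noted above, a correlation on its own says nothing about causality.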
I also attended the following sessions during the day: Joint meeting of IG Metadata, IG Biodiversity Data Integration & IG Marine Data Harmonization; Joint Meeting of IG Data Rescue, IG Geospatial, IG Big Data Analytics, IG Domain Repositories & IG Libraries for Research Data; Joint meeting of IG RDA/WDS Publishing Data Cost Recovery for Data Centres & IG Domain Repositories; and, RDA Outputs & Adopter Plenary Session. Updates on these sessions are available on the RDA website (see Appendix A.3). I must say that the choice of venue for the social dinner was fantastic. The venue was “Bateaux Parisiens, Port de la Bourdonnais, Pier No. 7” - Paris by Night on the Seine!
Day 4 – Thursday 24th September 2015
In addition to reporting officially on the “IG Quality of Urban Life: Progress and Next Steps for Quality of Urban Life Indicators” session, I attended the following sessions: RDA Outputs & Adopters & Experimentation Showcase Plenary Session; Experimentation Day Minute Madness; Joint meeting of IG Geospatial & IG Big Data Analytics; Joint meeting of IG Metadata, WG Metadata Standards Catalog & WG BioSharing Registry: Connecting data policies, standards & databases in life sciences; Climate Change Data Challenge Presentations & Award Ceremony; and, Keynote Presentation by Jean-Paul Leroux. Updates on these sessions are available on the RDA website (see Appendix A.3). This was followed by a Networking Cocktail. My report on the “IG Quality of Urban Life: Progress and Next Steps for Quality of Urban Life Indicators” session is separate from this blog.
There were approximately 17 participants in attendance at the Quality of Urban Life IG meeting; a more detailed report can be requested from the chairs. Generally, the following concerns and questions seemed to be emerging as potential areas for the group (with the overarching question: what is it that the group wants to solve or accomplish?):
- Identifying problems and raising red, yellow, green flags for now might be a good way to start
- Lack of a standard for publishing the data; building blocks will be a start for certain kinds of data
- Are we the group to develop standards and vocabularies for cities? Is it city data? There was caution that developing standards might be too broad a task and impossible to do. One response was that the brainstorming nature of the session itself suggested why the group was an Interest Group (IG) and not a Working Group (WG): it had not achieved clarity yet.
- Urban data dictionaries
- Mark Fox offered to put together a one-page note on “How do we map the data etc.” to give a perspective.
- What type of ontologies do we need for indicators?
- The divide between the formalism and the nature of data we have is huge.
- The science of Quality of Life is not set
- We need to map quickly what is out there; some sort of literature review could be helpful
- Time constraint for deliverables was also noted
Day 5 – Friday 25th September 2015
In addition to reporting officially on the “IG Brokering” session, I attended the following sessions: IG National Data Services; IG Big Data Analytics; and, Final Plenary & Presentation of upcoming plenary meetings. My report on the “IG Brokering” session is separate from this blog. Updates on these sessions are available on the RDA website (see Appendix A.3).
There were approximately 20 participants in attendance at the Brokering IG meeting; a more detailed report can be requested from the chairs. RDA outcomes and the possible contributions to and synergies with the Brokering IG were discussed. The report on the Brokering Governance Working Group was also discussed.
I managed to see Notre-Dame de Paris (or Notre-Dame Cathedral), which is on an island in Paris, and it was great! I also witnessed the “love lock” mystery at the Pont des Arts, often talked about in the media, but I was not that fortunate, as many of the “love locks” had been removed by then.
Day 6 – Saturday 26th September 2015
This day was for visiting some other interesting places and travelling back to the United Kingdom. Among the places were the Eiffel Tower, the Luxor Obelisk, L'église de la Madeleine, the Palais Garnier and the Palais de Justice de Paris. After all the lessons learnt at the RDA meeting, the presentation, the networking, and the good food, you will probably agree with me when I say THANK YOU RDA EUROPE!!!