Adoption Day at Plenary 5

Sunday 8 March 2015 - San Diego Supercomputer Center (SDSC)

The first outputs from RDA working groups focus on:

  1. Data Foundation & Terminology: a basic data model and terminology to improve interoperability
  2. Data Type Registries: allowing humans and machines to act on unknown data types
  3. PID Information Types: semantic categories and a common interface for providers of persistent ID services to improve interoperability
  4. Practical Policy: defining best-practice workflows for handling data automatically and in a documented way, to increase trust and reproducibility

Following a brief introduction to the RDA outputs, a set of European and US adopters will illustrate the benefits of adoption. Participants are invited to interact directly with the output representatives present to understand how their adoption ideas and needs can be met.

07:30 Shuttles leave from Conference Venue (please see the shuttle schedule below)
08:00 - 08:30 Registration at SDSC
08:30 - 08:40 Welcome & Logistics 
08:40 - 08:55

Keynote: Dr Patrick Cocquet, Cap Digital

Dr. Patrick Cocquet is the chief executive officer of the Cap Digital cluster, a structure that gathers more than 650 innovative companies and more than 50 research and education organizations working in digital economy fields. Among its various communities, the one working on data (big data, open data, data visualization) is one of the most active and innovative. He graduated from a French engineering school in 1979 and started his career in a large group, where he led international standardization and product-range development for embedded networks and then for internet networks. In 2000, he founded a startup dedicated to software for internet network equipment. He started and organized Cap Digital in 2006, making it the first digital cluster in France and Europe.

08:55 - 09:10

RDA Outputs, Herman Stehouwer, Max Planck Society


Herman will give a brief overview of all RDA outputs to date, especially focussing on the ones that are the subject of the adoption projects mentioned below.

Dr.ir. Herman Stehouwer has a background in computer science and computational linguistics. At the Max Planck Institute for Psycholinguistics he has been responsible for the technical aspects of the CLARIN search infrastructure for CLARIN-D, as well as for a number of software projects. He has been involved in setting up the RDA since the start. Now that the RDA is moving to a steady state, he is the Europe representative in the RDA Secretariat, alongside his regular RDA and European RDA Support work.

 

09:10 - 09:50 US Adoption Call Winners
09:10 - 09:20

Materials Genome Initiative, Laura Bartolo, Kent State University

The U.S. Office of Science & Technology launched the Materials Genome Initiative (MGI) to enable discovery, development, manufacturing, and deployment of advanced materials at least twice as fast as is possible today, at a fraction of the cost. Critical to the success of MGI is the establishment of the Materials Innovation Infrastructure (MII) to seamlessly integrate advanced modeling, data, and experimental tools. Our RDA Adoption Demonstration Project focuses on the Data Type Registry and PID, with guidance and feedback from the National Institute of Standards and Technology, a lead Federal agency developing the key models, tools, standards, and data for the MII.


Laura Bartolo is director of the Center for Materials Informatics at Kent State University, and focuses her research, teaching, and work on the management of materials data and information in collaborative, distributed open-source initiatives. She is Co-Chair of the Interest Group on Materials Data, Infrastructure, & Interoperability, past Chair of the ASM Materials Database Committee, and a Fellow of the U.S. National Academy of Sciences Board on Research Data and Information. 

09:20 - 09:30

Platform for Experimental Collaborative Ethnography, Luis Felipe Rosado Murillo, Rensselaer Polytechnic Institute

In this presentation, we will describe the design and the practical guidelines for the Platform for Experimental, Collaborative Ethnography (PECE), and our efforts to implement comprehensive data management policies based on recommendations from RDA's Working Group on Practical Policies. Our goal is to provide a model for broader adoption of RDA outcomes in the context of empirical humanities research (history, anthropology, folklore, and similar disciplines in the humanities that involve researcher-created qualitative data). Based on our implementation of these data management guidelines and practical policies, we expect wider adoption by individual researchers and research groups in the empirical humanities, enabling easier compliance with the data management expectations of funding agencies while increasing collaboration and data sharing among empirical humanities researchers.

 

Luis Felipe R. Murillo is a PhD candidate at UCLA Anthropology, currently working as a research fellow at the Berkman Center for Internet and Society, Harvard University. His research work is dedicated to the study of computing expertise in Free and Open Source development communities. 

09:30 - 09:40

Early outcomes: Implementation of RDA DFT Recommendations for DataFed.net, Aaron Addison, Washington University St. Louis

The DataFed.net data catalog lists numerous atmospheric and air-quality datasets collected over time. The catalog's metadata is being examined against the recently published output of the RDA Data Foundation and Terminology (DFT) working group. This real-world examination of an RDA outcome aims to adopt a common terminology for both the community of practice and machine-driven applications.


Aaron Addison is the Director of Research Data & GIS for Washington University in St. Louis. Working through the University Libraries, he leads a team of professionals collaborating with faculty, students and staff on data collection, analytics, management and preservation.

09:40 - 09:50

Deep Carbon Observatory, Stephan Zednik, Rensselaer Polytechnic Institute

The Deep Carbon Observatory (DCO) community is building a cyber-enabled platform for linked science, made available to the community through a multi-institutional data portal. Persistent identifiers and domain-specific data types have been identified as key technological issues the portal must address. This presentation focuses on the DCO portal’s planned adoption of RDA DTR and PID methodologies and technologies as a means to address the DCO community's need for persistently identifiable and understandable data type information.

Stephan Zednik is a Senior Software Engineer with the Tetherless World Constellation at Rensselaer Polytechnic Institute. His research interests include researcher collaboration networks, quality representation and semantics, and provenance representation from data science tools. Stephan participated in the W3C PROV working group, contributing to the W3C PROV-O Recommendation and serving as editor of the W3C PROV-XML Note.

09:55 - 10:30 European Adopters
09:55 - 10:10

Practical Policies within the EUDAT Collaborative Data Infrastructure, EUDAT, Mark van de Sanden

Research communities from different disciplines have different ambitions and approaches, particularly with respect to data organization and content, but they share one common point: they all have basic data service requirements. EUDAT offers common data services, supporting multiple research communities as well as individuals, through a geographically distributed, resilient network connecting general-purpose data centres and community-specific data repositories. These shared services and storage resources are distributed across 15 European nations, and data is stored alongside some of Europe’s most powerful supercomputers. EUDAT, a user-driven, service-oriented, trusted, secure and sustainable data infrastructure, offers solutions for finding, sharing, storing, replicating, staging and performing computations with primary and secondary research data. Community-specific data repository managers can join the data infrastructure to archive, replicate, process and catalogue data on behalf of their community. Researchers (from academia and industry), citizen scientists, policy makers, and members of the public can share, discover and re-use data via EUDAT.

Mark van de Sanden is the team leader of the Data Services group at SURFsara. He has more than 20 years of experience in managing computers, supercomputers and large-scale data infrastructure. He has a BSc in computer engineering from the technical college of 's-Hertogenbosch and started work in 1994 at the National Aerospace Laboratory (NLR) as a UNIX system administrator. He joined SURFsara in 1997 as a UNIX system administrator of supercomputing environments. Mark is now a System Architect and, since 2006, the team leader of the Data Services group. He is the work package leader of the service building activities within EUDAT, a member of the management board of the EPIC consortium, and is involved in the WLCG NL-T1, LOFAR Long Term Archive and PRACE data challenges.

10:10 - 10:20

Common Language Resources and Technology Infrastructure Adopting DFT, CLARIN, Dieter van Uytvanck

Dieter van Uytvanck is technical director for the CLARIN ERIC. He graduated in Informatics (2002, Ghent University) and Language and Speech Technology (2007, Radboud University Nijmegen) and has been involved in technical infrastructure building (based at the Max Planck Institute for Psycholinguistics) since CLARIN's preparatory phase in 2008. Since 2012 he has been the chair of the Standing Committee on CLARIN technical centres. He is a member of EUDAT's Services and Architecture Forum and GÉANT's International User Advisory Committee, and is co-chair of the RDA working group on dynamic data citation.

10:20 - 10:30

German Climate Computing Center and PIT, Deutsches Klimarechenzentrum – DKRZ, Stephan Kindermann

The German Climate Computing Center (DKRZ) is a national service facility for Earth system research. Data services at DKRZ can benefit from a cross-project practical adoption of RDA outputs, and initial implementation work will be carried out by the end of 2015. The principles of PID information typing and central type registration can support use cases ranging from data versioning and replication to later hand-over processes from data dissemination to the archival stage. DKRZ aims to maintain an architecture of loosely coupled specialized services, which can benefit from the conceptual underpinning provided by the Data Foundation and Terminology group.

Stephan Kindermann has worked at the German Climate Computing Center (DKRZ), Germany, since 2004. He currently leads the data infrastructure group at DKRZ and is involved in national as well as international e-science projects and efforts. He received a PhD in computer science from the University of Erlangen in 1997. As a postdoc he concentrated on parallel and distributed processing infrastructures. In the course of emerging national and international grid and e-science projects like EGEE, C3Grid and ESG, he joined the DKRZ to help improve data handling and data management in the climate community. Current activities include the integration of DKRZ services with large international efforts like ESGF, IS-ENES and EUDAT, as well as preparing for future big climate data handling use cases. Stephan Kindermann lives in Hamburg as well as Nuremberg, Germany.

10:30 - 11:00 Coffee Break (next session starts during this coffee break)
10:40 - 12:15

Ask the Adopters

After the presentations are over, at least one person from each project will be available at designated spaces around the room to answer questions, provide more information, and speak in detail with anyone who wants to know more about how the output was used. This is open networking time; all are encouraged to stay and continue the discussions. Representatives of the four Working Groups will also be available for questions during this time.

12:15 - 12:30 Observations and Conclusions - capturing some of the key ideas and next steps on adoption
12:30 - 14:00 Lunch & SDSC Tours
14:00 Buses leave from SDSC (for those not attending Large Scale Data Projects)
14:00 - 18:00 Large Scale Data Projects Programme
18:15 Buses leave from SDSC for Paradise Point