Biomedical Data - P5 BoF session

The presentations are now available - linked at the bottom of this page

RDA is increasingly becoming a venue for research data used in the life sciences. Many biomedical researchers and data scientists have joined RDA as individuals, and projects such as ELIXIR in Europe and the NIH Big Data to Knowledge (BD2K) bioCADDIE and CEDAR centres in the US are looking to RDA for connections with emerging data standards and infrastructures. The agenda includes short presentations from these biomedical efforts and an open discussion. This BoF session will provide a setting for RDA members engaged with biomedical data to meet each other and to take stock of current efforts and needs.

Proposers and contact persons:

  • Susanna-Assunta Sansone (University of Oxford e-Research Centre and Nature Publishing Group; CEDAR and bioCADDIE Committee member; ELIXIR UK Node member; RDA BioSharing WG co-chair; RDA TAB member)
  • George Alter (ICPSR, University of Michigan; bioCADDIE Committee member; Domain Repositories IG co-chair)


1. Introduction - George Alter, Susanna-Assunta Sansone
2. Biomedical data infrastructures programmes and the role of the community
  • NIH Big Data to Knowledge Initiative (BD2K) - Ian Fore, NIH NCI

The ability to harvest the wealth of information contained in biomedical Big Data will advance our understanding of human health and disease; however, a lack of appropriate tools, poor data accessibility and insufficient training are major impediments to rapid translational impact. To meet this challenge, the National Institutes of Health (NIH) launched the Big Data to Knowledge (BD2K) initiative in 2012. BD2K is a trans-NIH initiative established to enable biomedical research as a digital research enterprise, to facilitate discovery and support new knowledge, and to maximize community engagement. Overall, the focus of the BD2K initiative is the development of innovative and transformative approaches and tools for making Big Data and data science a more prominent component of biomedical research.

  • bioCADDIE: the NIH BD2K biomedical and healthCAre Data Discovery Index Ecosystem - Lucila Ohno-Machado, UCSD

We will present our plans for developing a data discovery index prototype system under the bioCADDIE project (biomedical and healthCAre Data Discovery Index Ecosystem) with help from a community of users and data scientists. Investigators at several institutions are organizing software development that will be guided by the outcomes of working groups composed of members from various communities (e.g., end-users, data producers and data scientists). The development will consist of base software to which different modules can “dock”, providing a flexible infrastructure that allows experimentation with different approaches for executing searches and for displaying search results. We plan to start with data produced as a result of the NIH BD2K program and increasingly index data that come from several other sources, including popular research data repositories and publications. We will coordinate our efforts with national and international initiatives focused on data indexing.

  • CEDAR: the NIH BD2K Centre for Expanded Data Annotation and Retrieval - Susanna Sansone, on behalf of the CEDAR team

An international partnership among Stanford, Yale and Oxford researchers, CEDAR will develop a unified framework that researchers in all scientific disciplines can use to create consistent, easily searchable metadata, to facilitate its use in the analysis of Big Data sets. The framework will tackle both general and discipline-specific metadata, starting from the NIAID's ImmPort repository as the first use case, and building on several long‑standing projects of national and international scope, such as the National Centre for Biomedical Ontology and the ISA and BioSharing communities (also part of the ELIXIR UK Node, and closely linked to the RDA-Force11 BioSharing WG).

  • RDA ELIXIR Bridging Force IG - Bengt Persson, ELIXIR Sweden Node

ELIXIR - the European life-science Infrastructure for Biological Information - was established in 2014 as a permanent organisation that consolidates Europe’s national centres into a single, coordinated infrastructure. ELIXIR currently has 12 members (Czech Republic, Denmark, Estonia, Finland, Israel, Netherlands, Norway, Portugal, Sweden, Switzerland, UK and EMBL-EBI), represented by their national bioinformatics infrastructures, the ELIXIR Nodes. In addition, ELIXIR includes participation from a further six ELIXIR Observer countries (Belgium, France, Greece, Italy, Slovenia, Spain), which are close to ratifying the ELIXIR Consortium Agreement. ELIXIR brings together national bioinformatics infrastructures from ELIXIR’s member states and, for the first time, connects these with the major European life-science data archives. Through this coordination of local, national and international resources, the ELIXIR infrastructure will meet the data-related needs of Europe’s 500,000 life scientists. By implementing standards and, importantly, a Europe-wide framework of experts and supporting structures, ELIXIR will drive coordination efforts at both national and international levels.


  • Force11: driving new efforts, supporting and promoting existing ones - Maryann Martone, UCSD

Force11 is a community of scholars, librarians, archivists, publishers and research funders that has arisen organically to help facilitate the change toward improved knowledge creation and sharing. Force11 has grown from a small group of like-minded individuals into an open movement with clearly identified stakeholders associated with emerging technologies, policies, funding mechanisms and business models. While not disputing the expressive power of the written word to communicate complex ideas, our foundational assumption is that scholarly communication by means of enhanced, media-rich digital publishing across multiple types of research objects is likely to have a greater impact than communication in traditional print media or electronic facsimiles of printed works. The vision of Force11 was laid out in the Force11 Manifesto, produced in 2011. Since that time, we have seen significant steps towards re-imagining scholarly communication. I will present an overview of Force11 and highlight initiatives underway, particularly in the area of research data (e.g., the Resource Identification Initiative and the Data Citation Principles), as well as new initiatives just starting: the joint RDA-Force11 BioSharing WG, Software Citation, and Defining the Scholarly Commons.

3. Open discussion
  • What are the key needs for data sharing in biomedical research?
  • How can RDA help biomedical research?
  • Can results of RDA WGs be helpful?