The Life Cycle of Structural Biology Data

RDA Structural Biology Interest Group
Supporting Output Title: The Life Cycle of Structural Biology Data
Corresponding author: Chris Morris, STFC, Daresbury Laboratory, WA4 4AD
Contributors: Claudia Alen, Lucia Banci, Alexandre Bonvin, Pablo Conesa, Alfonso Duarte, John Helliwell, Rob Hooft, Brian Matthews, Antonio Rosato, Sameer Velankar, Geerten Vuister, John Westbrook, Martyn Winn


Research data is acquired, interpreted, published, reused, and sometimes eventually discarded. This document reports how structural biologists perform these tasks, and recommends improvements to the infrastructure available to them.



Executive Summary

Research data is acquired, interpreted, published, reused, and sometimes eventually discarded. Understanding this life cycle better will help in developing appropriate infrastructure services that make it easier for researchers to preserve, share, and find data.


Structural biology is a discipline within the life sciences that investigates the molecular basis of life by discovering and interpreting the shapes of macromolecules. It has a strong tradition of data sharing, exemplified by the founding of the Protein Data Bank (PDB) in 1971 (PDB, 1971). In the early years, data were submitted to the archive by mailing decks of punched cards. The culture of structural biology is therefore already in line with the perspective of the European Commission that data from publicly funded research projects are public data (COM(2011) 882 final).


This report is based on the data life cycle as defined by the UK Data Archive, the most clearly defined workflow that the authors are aware of. It identifies six stages: creating data, processing data, analysing data, preserving data, giving access to data, and re-using data. Each will be discussed below. However, the data infrastructure for structural biology is not a perfect match for this workflow, so for clarity ‘preserving data’ and ‘giving access to data’ are discussed together. We also add a final stage to the life cycle, ‘discarding data’.


Changes in research goals and methods have changed some of the requirements for IT infrastructure. A common data infrastructure is required, offering a simple user interface and simple programmatic access to data that is currently scattered. Progress on these tasks will support the development of workflows that make it easier to use datasets from different facilities and techniques together, and the automatic acquisition of metadata can help with this. Large experimental centres already provide a highly professional data infrastructure, but for smaller centres doing so is onerous. It is therefore desirable to provide a standard package that enables them to use European e-infrastructure resources in a way that integrates with other structural biology resources.




    Author: Rainer Stotzka

    Date: 13 Jun, 2017

    (The thoughts I am describing in this comment reflect my personal opinions as an RDA member.)

    The report proposed as an "RDA Supporting Output" summarizes the European situation in research data management in some fields of structural biology. It concludes that a common data infrastructure is required, "making the facilities offered by EUDAT and INDIGO (European projects) more directly accessible to structural biologists".

    The report does not cover the situation and progress of structural biology in other regions (the Americas, Africa, Asia, and Australia), nor in related scientific domains with similar characteristics. It is nearly identical to deliverable D3.1 of the H2020 West-Life project.

    To my knowledge, the IG Structural Biology was inactive from February 2013 until April 2017, holding no meetings or open discussions visible to all RDA members. At P9 in Barcelona the co-chairs organized a session, which I attended from 16:00 until 18:00: 5th April 2017, Breakout 3, 16:00 – 17:30, IG Structural Biology: The Life Cycle of Structural Biology Data. The proposal to submit the report as an "RDA Supporting Output" was neither on the agenda nor discussed during the official meeting time.

    In my opinion the report “The Life Cycle of Structural Biology Data” does not represent an RDA consensus produced by an open and balanced discussion.

    I would recommend facilitating an open and transparent discussion within the IG, involving members from all regions, and examining which existing outputs and recommendations (from RDA, W3C, WDC, CODATA, …) can be adopted to improve research data management in structural biology. The newly forming IG on Disciplinary Interoperability Framework could provide new insights for future implementation. This could result in a genuine RDA community- and consensus-driven output that could be consolidated at a meeting at Plenary 10 in Montreal.

    I would welcome the revitalization of the IG Structural Biology by this process.
