Chris Morris (Science and Technology Facilities Council, UK)
Lucia Banci, Antonio Rosato (University of Florence, IT)
Issues to be addressed
Structural Biology (SB) is concerned with the determination of the three-dimensional (3D) structures of individual biomacromolecules (proteins and nucleic acids), and with those of complexes and higher-order structures formed by association of these individual components. The ultimate goal of SB is to understand the mode of action of these biological nanomachines, based on their static 3D structures, their dynamic characteristics, and the ways in which they interact both with other macromolecules and with small ligands such as drugs and agrochemicals. SB contributes to society by supporting a range of applications, including drug design, crop improvement, and the engineering of enzymes of industrial significance.
The productivity of structural biologists has increased rapidly over the years, thanks to improved instruments, including beamlines and detectors; improved NMR hardware; improved reagents and sample preparation protocols; and improved software for structure solution, simulation and modelling. The next challenge stems from complexity: even individual biomacromolecules, and much more so higher-order structures, are too complex to be fully characterized by a single experimental approach. Consequently, progress in SB will depend crucially on establishing synergy between several experimental techniques, such as X-ray crystallography, NMR spectroscopy and electron microscopy.
However, at present the data generated within each of these disciplines are handled using different formats and different underlying data models, which effectively prevents reuse and interoperability at the level of experimental data. As a result, structural biology data cannot truly be integrated without prior interpretation and processing.
The proposed Interest Group aims to raise awareness of this fundamental bottleneck and to develop, collaboratively, a plan to address it. This will involve identifying interim solutions that could become available in the short term, and formulating a more ambitious plan to promote the development of an intrinsically comprehensive approach that accompanies the evolution of SB into an integrative, multi-technique discipline. The next step will be to seek the resources necessary for implementation.
• M1 A site for document exchange (Google Docs)
• M1 A calendar of teleconferences
• M3 A first set of recommendations for new practices, data formats, and data infrastructure
• M12 A case for support for the necessary development work
• A truly essential point is establishing smooth, shared formats and procedures to store, maintain and exchange experimental data and models. No approach to these issues has yet been generally accepted by the scientific community; indeed, very few attempts in this direction have been made at all. The Interest Group will recommend best practices for the use of existing data formats, and highlight the major challenges for the future development of new formats.
• The Interest Group will additionally identify opportunities for the automatic capture of metadata within current scientific workflows, and propose new infrastructural facilities where necessary.
Engagement with existing work in the area
One of the proposers, Antonio Rosato, plays a leading role in WeNMR, which has developed a web portal providing data processing facilities for nuclear magnetic resonance (NMR) methods. Another proposer has developed the Protein Information Management System (PiMS), which supports data management for recombinant protein production, a step that precedes NMR and other structural techniques. Lucia Banci is an internationally renowned structural biologist with a long track record of using both NMR spectroscopy and X-ray diffraction data to study cellular processes. In addition, she is a key person in the activities and management of INSTRUCT, the ESFRI Infrastructure for Integrated Structural Biology.
We will invite participation, among others, from the Protein Data Bank (PDB), the international repository of 3D structures of biological macromolecules; from experimental facilities such as the Diamond Light Source, which provides world-leading facilities for archiving and processing experimental data; from the BioMagResBank, the international repository of biological NMR data; from structural genomics consortia; and from the developers of structural biology software.
• M1 Organize a survey to gather suggestions and input from the global community (community engagement)
• M1-3 Broaden group membership by involving delegates from major SB centers/teams as well as from representative smaller ones
• M2 Review current relevant initiatives world-wide
• M3-M12 Produce, and continuously update, a set of recommendations comprising both a short-term and a longer-term vision for more comprehensive support of integrated SB, including plans for data infrastructure(s) in the field
Given the scale of the challenge we are taking on, an adoption plan will be part of our final deliverable.
The initial members are:
- Helen Berman, Rutgers University, Director of the RCSB PDB
- Alexandre Bonvin, University of Utrecht, WeNMR coordinator
- Paolo Carloni, German Research School for Simulation Sciences, Expert in simulation of protein structure and dynamics
- Jose Maria Carazo, Centro Nacional de Biotecnología CNB-CSIC, Electron Microscopist
- John Helliwell, University of Manchester, Chair, data management working group of the International Union of Crystallographers
- Graham Kemp, Chalmers University, NMR
- Gerard Kleywegt, PDBe, EBI
- Brian Marsden, SGC
- Torsten Schwede, University of Basel, Expert in Macromolecular Modelling
- David Stuart, Diamond Light Source, Instruct, University of Oxford
- Thomas Terwilliger, LANL
- Rikkert Wierenga, University of Oulu, protein crystallographer
- John Westbrook, RCSB
- Haim Wolfson, University of Tel Aviv, Developer of software for macromolecular complexes
Benefits and Impact
This line of development would foster a leap forward in the way the SB community is organized, and is likely to have a significant impact on its future development as well as on the broader biomedical research community. At present, the integration of SB techniques in Europe is addressed only by the INSTRUCT ESFRI Infrastructure. However, the role of SB research within the wider domain of the Life Sciences depends intimately on its capacity to evolve from the current use of scattered data from one or two sub-disciplines to the true integration and synergistic exploitation of information from the whole array of SB experimental and modelling techniques. The activities of the proposed Interest Group will therefore benefit all researchers who either work in SB or exploit SB and its outcomes, including those without a specific SB background, whether they are based in large or small centers. We also expect the action to benefit funding agencies, by enabling the projects they sponsor to achieve higher-impact results and thus greater visibility. Finally, the availability of common, standardized protocols and formats for data storage will enhance data reusability.