Extending MolProbity Diagnosis & Healing Methods to Empower Better CryoEM & Xray Models at 2.5-4A Resolution, plus Versioned, Redeposited "GEMS" for Important Individual Structures

NIH RePORTER · NIH · R35 · $241,500

Abstract

Summary/Abstract for Supplement

MolProbity macromolecular model validation assigns overall quality scores, but most importantly, it reports probable local errors with graphic flags that help structural biologists fix most of those errors. It is considered state-of-the-art for model validation, has a large worldwide user base, is incorporated into most model-building and refinement software systems, and is central to model validation at the Protein Data Bank. The component programs add and optimize hydrogen atoms to analyze and display all-atom contacts (H-bonds, favorable van der Waals, and serious steric "clashes" of unfavorable overlaps ≥0.4Å). That distinctive feature of MolProbity is augmented with updated versions of traditional validation measures such as Ramachandran and rotamers, with RNA ribose pucker and backbone conformers, and most recently with new criteria such as CaBLAM that are especially useful at the 2.5-4Å resolutions common for structures of exciting "molecular machines" obtained by crystallography or cryoEM. The programs in this effective and widely used toolset are diverse and rather loosely coupled, and some are more than 20 years old. For quite a while they have been in great need of a deep rewrite and modernization beyond what an academic research grant can support, in order to stay maintainable, extensible, and robust. Our problem with the MolProbity code is the sort of dilemma that this new type of supplement is designed to solve, in order to provide open scientific software that enables its users to take full advantage of the broad capabilities now possible for secure and stable accessibility, interoperability, and re-use. We believe MolProbity is worth this investment, first to the structural biology community and second to the biomedical end-users of these structures.
I am very fortunate to have identified a professional contractor, highly experienced in many challenging computational improvements, rewrites, and interactive 3D graphics, who is somewhat familiar with macromolecules and interested in taking on this project. The current plan is to unify our MolProbity code base around Python with calls to well-optimized and hardened libraries in C++ or C, and to use the same standard-format parsers, internal data hierarchy, in-memory communication, and open CCTBX toolbox utilities used by the most recent of our programs which are integrated with the Phenix software system. The rewrite will add more complete regression tests, establish a standardized build process, and explore a variety of cloud hosting options for the web service.

Key facts

NIH application ID
10166392
Project number
3R35GM131883-02S1
Recipient
DUKE UNIVERSITY
Principal Investigator
DAVID Claude RICHARDSON
Activity code
R35
Funding institute
NIH
Fiscal year
2020
Award amount
$241,500
Award type
3
Project period
2019-06-01 → 2024-05-31