Structure-based functional annotation of microbial genomes

NIH RePORTER · NIH · R01 · $722,354

Abstract

Given the recent explosion in the number of sequenced genomes and the relative lack of functional information on their contents, annotating the biological functions of all proteins across different genomes is a major challenge for modern molecular and computational biology. The problem of genome annotation is particularly acute for bacteria: a vast range of commensal and pathogenic bacterial species affect human health, and only computational approaches, appropriately combined with carefully targeted biochemical experiments, can provide the reliable, high-throughput annotations necessary to understand their physiology. Current computational function prediction relies mainly on transferring annotations from known proteins of similar sequence, which becomes increasingly unreliable as the level of homology decreases. Recently, significant progress has been achieved in protein 3D structure prediction, as witnessed by the community-wide blind testing experiments, and current state-of-the-art methods can construct correct protein folds for the majority of genome sequences without using closely homologous templates. Building on the hypothesis that biological function is more directly associated with 3D structure than with sequence, this proposal aims to initiate a paradigm shift from protein structure prediction to structure-based function annotation. Combining expertise from computational biology, microbiology, and structural biology, the PIs will systematically examine the extent to which computational structure models from cutting-edge modeling methods can help provide reliable high-throughput annotations of bacterial genomes, with a particular focus on the difficult targets that cannot be addressed by existing sequence homology-based approaches. This project is designed to develop and test several cutting-edge approaches for protein function prediction using low-resolution (but correctly folded) models from structure prediction.
The specific aims include the development of novel structure-based methods for modeling protein-ligand binding sites as well as enzyme and gene ontologies. The modeling methods and results will be tested by a set of carefully designed experiments, including high-throughput chemical screening and detailed structural biology-based characterization. At all stages, iterative prediction-to-experiment-to-refinement loops will be established between the experiments and the computational annotations to guide the development and advancement of the functional modeling methods. The project will focus on the E. coli K12 strain, for which >10% of the genome remains unannotated despite a long history of use as a model organism; the long-term goal, however, is to build a novel and robust framework that can serve as a resource for reliable function annotation of various other microbial genomes. Compared with current sequence-based approaches, the success of the structure-based pipelines could pot...

Key facts

NIH application ID
9976447
Project number
5R01AI134678-03
Recipient
UNIVERSITY OF MICHIGAN AT ANN ARBOR
Principal Investigator
Yang Zhang
Activity code
R01
Funding institute
NIH
Fiscal year
2020
Award amount
$722,354
Award type
5
Project period
2018-08-01 → 2022-07-31