Structure-based functional annotation of microbial genomes

NIH RePORTER · NIH · R01 · $777,000

Abstract

One of the most pressing challenges in modern biology is translating the massive amount of biological sequence information made available by recent advances in sequencing technologies into corresponding insights into the behavior of biological systems. Determining the functions and physiological roles of proteins remains a major component of this challenge. For many species, especially non-model microbes such as microbial pathogens, poorly annotated proteins may make up nearly 50% of the proteome. This severely limits our ability even to identify mechanisms of pathogenesis and potential therapeutic targets. The sheer number of poorly annotated proteins of potential biological importance necessitates the continued development of efficient and reliable computational approaches for protein functional annotation. Over the past few years, we have developed and applied several new workflows for whole-proteome structure prediction and functional annotation of bacterial genomes, with applications to the laboratory strain E. coli K-12 and to the minimal-genome Mycoplasma strain JCVI-syn3.0. Our workflows are distinguished by integrating structural information (including high-accuracy protein structure prediction) into functional annotation. This sits alongside classical methods such as sequence homology and synteny, and recent developments such as deep-learning-based predictors. We find that, collectively, our workflows provide highly accurate functional annotations that are especially useful for ‘difficult’ protein targets without clear, annotated homologs. We will now shift our focus to applying our tools to the proteomes of bacterial pathogens, with an initial emphasis on uropathogenic E. coli.
Specifically, we will:

- Aim 1: continue to develop our structure/function prediction capabilities to further improve accuracy and increase the richness of the information delivered.
- Aim 2: perform prediction-guided biochemical characterization of likely virulence genes to assess predictive performance and identify potential pharmaceutical targets.
- Aim 3: obtain experimental structures for proteins identified as difficult structural targets, which likely represent novel folds or unusual sequences for known folds.
- Aim 4: test the physiological importance of likely newly identified virulence factors in an in vivo mouse model.

The experimental data gathered under Aims 2-4 will be continuously integrated with the ongoing methods development under Aim 1 to maximize the performance and utility of the developed tools. The results of this project will include further improvements to widely used and cited tools for rapid structure/function prediction. They will also include identification of specific virulence determinants in uropathogenic E. coli, together with preliminary insights into how they may be targeted for pharmaceutical intervention. Finally, they will include additional structural data on potential virulence factors that will aid in structure-based...

Key facts

NIH application ID: 10535650
Project number: 2R01AI134678-05
Recipient: UNIVERSITY OF MICHIGAN AT ANN ARBOR
Principal Investigator: Lydia Petra Freddolino
Activity code: R01
Funding institute: NIH
Fiscal year: 2022
Award amount: $777,000
Award type: 2 (competing renewal)
Project period: 2018-08-01 → 2027-07-31