Integrating protein structure and genomic data to predict antibiotic resistance in Mycobacterium tuberculosis

NIH RePORTER · NIH · F32 · $72,302

Abstract

Tuberculosis causes over one million deaths annually, and increasing antibiotic resistance is rendering the disease more difficult to treat. Rapid genotype-based resistance diagnosis of Mycobacterium tuberculosis, the bacterium that causes tuberculosis, is needed to overcome the long treatment delays associated with culture-based methods. Previous work has established sets of genetic markers of antibiotic resistance to more common antibiotics, but such studies require large numbers of sequenced resistant isolates and are unable to make predictions for rare or newly observed variants. The requirement for large numbers of isolates is especially problematic for five newly introduced antitubercular agents, which have small but increasing numbers of documented resistant isolates. Traditional methods for associating genotype with phenotype assume that every site is independent, and therefore many examples of mutations at a particular site are needed to infer statistically significant effects of variants on phenotype. Biological knowledge tells us that this assumption is not true – most bacterial genes encode proteins, which have distinct three-dimensional shapes and functions. Mutations that cause changes in similar regions of a protein are more likely to have similar effects on phenotype, potentially allowing for sharing of statistical signal that could increase the power of significance testing. In this proposed project, I will develop two complementary statistical approaches that will use protein three-dimensional structure to boost signal from genetic variants that cause antibiotic resistance in M. tuberculosis. Specifically, I will first develop an unsupervised statistical test to determine if repeated mutations within the same protein are clustered in three-dimensional space, which indicates that the mutations confer a fitness benefit.
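To make the idea of a three-dimensional clustering test concrete, the following is a minimal sketch of one possible form such a test could take: a permutation test on the mean pairwise distance between mutated residues. It is an illustration under stated assumptions, not the method developed in this project. The function names, the choice of mean pairwise distance as the statistic, the treatment of each mutated site as a single point (ignoring how often it recurs), and the synthetic coordinates are all hypothetical.

```python
import numpy as np


def mean_pairwise_distance(coords):
    """Mean Euclidean distance over all unordered pairs of rows in coords."""
    diffs = coords[:, None, :] - coords[None, :, :]
    dists = np.sqrt((diffs ** 2).sum(axis=-1))
    upper = np.triu_indices(len(coords), k=1)  # each pair counted once
    return dists[upper].mean()


def clustering_permutation_test(residue_coords, mutated_idx, n_perm=2000, seed=0):
    """One-sided permutation test for spatial clustering of mutated residues.

    Null hypothesis (an assumption of this sketch): mutated sites are a
    uniformly random subset of the protein's residues.
    Returns the observed statistic and the permutation p-value.
    """
    rng = np.random.default_rng(seed)
    mutated_idx = np.asarray(mutated_idx)
    observed = mean_pairwise_distance(residue_coords[mutated_idx])

    n_residues = len(residue_coords)
    k = len(mutated_idx)
    hits = 0
    for _ in range(n_perm):
        sample = rng.choice(n_residues, size=k, replace=False)
        if mean_pairwise_distance(residue_coords[sample]) <= observed:
            hits += 1
    # Add-one correction keeps the p-value strictly positive.
    return observed, (hits + 1) / (n_perm + 1)


# Synthetic example: 200 "residues" placed at random in a 40 Å cube,
# with the 8 residues nearest to residue 0 treated as the mutated sites.
demo_rng = np.random.default_rng(1)
coords = demo_rng.uniform(0.0, 40.0, size=(200, 3))
dist_to_first = np.linalg.norm(coords - coords[0], axis=1)
hotspot = np.argsort(dist_to_first)[:8]

observed, p_value = clustering_permutation_test(coords, hotspot, n_perm=999)
print(f"mean pairwise distance = {observed:.1f} Å, permutation p = {p_value:.3f}")
```

In practice the coordinates would come from a solved or predicted structure of the protein, and a realistic null model would likely need to account for mutation rate variation and recurrence across sites, which this sketch does not.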
This clustering test will have increased sensitivity over traditional methods that look for significant numbers of mutations, and will facilitate the development of mechanistic hypotheses about the effects of mutation on protein function. Second, I will use protein three-dimensional structure as a prior in a Bayesian linear mixed model to predict antibiotic resistance. This prior will allow nearby variants to ‘boost’ one another’s signal and establish associations between genotype and phenotype that are beyond the reach of current methods. The key application of this approach will be establishing resistance-conferring genotypes for five newly introduced antitubercular agents. The approach proposed here will likely generalize to other bacterial pathogens and represent an important leap forward in using pathogen molecular data in the clinic.
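The way a structure-based prior could let nearby variants share signal can be illustrated with a simplified stand-in for the model described above. The sketch below uses a plain Gaussian linear regression, not a full linear mixed model: variant effects get a prior β ~ N(0, τ²K), where K is a Gaussian kernel on three-dimensional distance, and the phenotype is treated as continuous with noise variance σ². The function names, the kernel form, the length scale, and the toy data are all assumptions made for illustration.

```python
import numpy as np


def structure_kernel(coords, length_scale=8.0):
    """Prior correlation between variant effects, decaying with 3D distance (Å)."""
    diffs = coords[:, None, :] - coords[None, :, :]
    sq_dist = (diffs ** 2).sum(axis=-1)
    return np.exp(-sq_dist / (2.0 * length_scale ** 2))


def posterior_mean_effects(X, y, K, tau2=1.0, sigma2=1.0):
    """Posterior mean of beta for y = X beta + noise, with beta ~ N(0, tau2 * K)."""
    n = X.shape[0]
    S = tau2 * X @ K @ X.T + sigma2 * np.eye(n)  # marginal covariance of y
    return tau2 * K @ X.T @ np.linalg.solve(S, y)


# Toy data: three variant sites. Sites 0 and 1 are 3 Å apart; site 2 is far away.
site_coords = np.array([[0.0, 0.0, 0.0],
                        [3.0, 0.0, 0.0],
                        [40.0, 0.0, 0.0]])

# Rows are isolates, columns are sites (1 = variant present).
# No isolate in this toy sample carries the variant at site 1.
X = np.array([[1, 0, 0],
              [1, 0, 0],
              [1, 0, 0],
              [0, 0, 1],
              [0, 0, 1],
              [0, 0, 0],
              [0, 0, 0]], dtype=float)
# Hypothetical continuous phenotype (for example, a shift in log MIC).
y = np.array([2.0, 2.0, 2.0, 0.0, 0.0, 0.0, 0.0])

beta_independent = posterior_mean_effects(X, y, np.eye(3))
beta_structured = posterior_mean_effects(X, y, structure_kernel(site_coords))

print("independent prior:", np.round(beta_independent, 3))
print("structure prior:  ", np.round(beta_structured, 3))
```

With an independent prior, the unobserved variant at site 1 receives an estimated effect of zero. With the structure kernel it inherits part of the effect estimated for its neighbor at site 0, which is the kind of borrowing of signal the abstract describes. A realistic model would also need to handle binary resistance phenotypes and population structure, which this sketch omits.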

Key facts

NIH application ID
10489291
Project number
5F32AI161793-02
Recipient
HARVARD MEDICAL SCHOOL
Principal Investigator
Anna Gustafson Green
Activity code
F32
Funding institute
NIH
Fiscal year
2022
Award amount
$72,302
Award type
5
Project period
2021-07-15 → 2023-10-02