# Integrating protein structure and genomic data to predict antibiotic resistance in Mycobacterium tuberculosis

> **NIH F32** · HARVARD MEDICAL SCHOOL · 2021 · $66,390

## Abstract

Tuberculosis causes over one million deaths annually, and increasing antibiotic resistance is rendering the disease more difficult to treat. Rapid genotype-based resistance diagnosis of Mycobacterium tuberculosis, the bacterium that causes tuberculosis, is needed to overcome the long treatment delays associated with culture-based methods. Previous work has established sets of genetic markers of antibiotic resistance to more common antibiotics, but such studies require large numbers of sequenced resistant isolates, and are unable to make predictions for rare or newly observed variants. The requirement for large numbers of isolates is especially problematic for five newly introduced antitubercular agents, which have small but increasing numbers of documented resistant isolates.

Traditional methods for associating genotype with phenotype assume that every site is independent, and therefore many examples of mutations at a particular site are needed to infer statistically significant effects of variants on phenotype. Biological knowledge tells us that this assumption is not true: most bacterial genes encode proteins, which have distinct three-dimensional shapes and functions. Mutations that cause changes in similar regions of a protein are more likely to have similar effects on phenotype, potentially allowing for sharing of statistical signal that could increase the power of significance testing.
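
The power argument above can be illustrated with a toy calculation. The sketch below is not the proposal's method. It uses a one-sided Fisher's exact test written directly from the hypergeometric distribution, and the isolate counts, the three "neighboring" sites, and the assumption that every carrier is a distinct isolate are all hypothetical values chosen for illustration.

```python
from math import comb


def fisher_one_sided(res_mut, res_total, sus_mut, sus_total):
    """P(X >= res_mut) under the hypergeometric null that mutation carriers
    are spread at random across resistant and susceptible isolates."""
    n_mut = res_mut + sus_mut          # isolates carrying the mutation
    n_total = res_total + sus_total    # all isolates
    upper = min(n_mut, res_total)
    tail = sum(
        comb(n_mut, k) * comb(n_total - n_mut, res_total - k)
        for k in range(res_mut, upper + 1)
    )
    return tail / comb(n_total, res_total)


# Hypothetical cohort: 20 resistant and 180 susceptible isolates.
RES_TOTAL, SUS_TOTAL = 20, 180

# Hypothetical rare variants at three structurally neighboring sites.
# Each is seen once, in a resistant isolate (assumed distinct isolates).
site_counts = [(1, 0), (1, 0), (1, 0)]  # (resistant carriers, susceptible carriers)

# Site-by-site testing: each rare variant is tested on its own.
p_single = fisher_one_sided(1, RES_TOTAL, 0, SUS_TOTAL)

# Pooled testing: treat "any mutation in this region" as one feature.
pooled_res = sum(r for r, _ in site_counts)
pooled_sus = sum(s for _, s in site_counts)
p_pooled = fisher_one_sided(pooled_res, RES_TOTAL, pooled_sus, SUS_TOTAL)

print(f"per-site p-value: {p_single:.4f}")
print(f"pooled p-value:   {p_pooled:.6f}")
```

On this toy data no single site reaches significance, while the pooled region does. Hard pooling is only one simple way to share signal. How sites are grouped, for example by a distance cutoff in the structure, is a modeling choice that the abstract does not specify.
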

In this proposed project, I will develop two complementary statistical approaches that will use protein three-dimensional structure to boost signal from genetic variants that cause antibiotic resistance in M. tuberculosis. Specifically, I will first develop an unsupervised statistical test to determine if repeated mutations within the same protein are clustered in three-dimensional space, which indicates that the mutations confer a fitness benefit. This approach will have increased sensitivity over traditional methods that look for significant numbers of mutations, and will facilitate the development of mechanistic hypotheses about the effects of mutation on protein function. Second, I will use protein three-dimensional structure as a prior in a Bayesian linear mixed model to predict antibiotic resistance. This prior will allow nearby variants to ‘boost’ one another’s signal and establish associations between genotype and phenotype that are beyond the reach of current methods. The key application of this approach will be establishing resistance-conferring genotypes for five newly introduced antitubercular agents. The approach proposed here will likely generalize to other bacterial pathogens and represent an important leap forward in using pathogen molecular data in the clinic.
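
One way to picture the first aim, a test for spatial clustering of mutations, is a permutation test on pairwise distances. The sketch below is an illustration under stated assumptions, not the statistic the project will develop. The abstract does not name a test, so the mean-pairwise-distance statistic, the null model in which every residue is equally likely to mutate, and the toy hairpin coordinates (`toy_hairpin`) are all hypothetical. In practice the coordinates would come from a solved or predicted structure.

```python
import math
import random
from itertools import combinations


def toy_hairpin(n=100, spacing=3.8, gap=5.0):
    """Hypothetical fold: the chain runs out along x, then doubles back,
    so residues far apart in sequence can be close in space."""
    half = n // 2
    coords = {}
    for i in range(1, n + 1):
        if i <= half:
            coords[i] = (spacing * i, 0.0, 0.0)
        else:
            coords[i] = (spacing * (n + 1 - i), gap, 0.0)
    return coords


def mean_pairwise_distance(coords, residues):
    """Average Euclidean distance over all pairs of the given residues."""
    pairs = list(combinations(residues, 2))
    return sum(math.dist(coords[a], coords[b]) for a, b in pairs) / len(pairs)


def clustering_p_value(coords, mutated, n_perm=2000, seed=0):
    """Fraction of random residue sets (same size) that are at least as
    tightly packed as the observed mutated residues."""
    rng = random.Random(seed)
    observed = mean_pairwise_distance(coords, mutated)
    all_residues = list(coords)
    hits = 0
    for _ in range(n_perm):
        sample = rng.sample(all_residues, len(mutated))
        if mean_pairwise_distance(coords, sample) <= observed:
            hits += 1
    # Add-one correction keeps the estimate strictly above zero.
    return (hits + 1) / (n_perm + 1)


coords = toy_hairpin()

# Two sequence-distant stretches that face each other across the hairpin.
clustered = [9, 10, 11, 90, 91, 92]
# Control: mutations spread along the whole chain.
scattered = [5, 20, 35, 60, 75, 95]

p_clustered = clustering_p_value(coords, clustered)
p_scattered = clustering_p_value(coords, scattered)

print(f"clustered set p-value: {p_clustered:.4f}")
print(f"scattered set p-value: {p_scattered:.4f}")
```

The clustered set is split into two stretches about 80 residues apart in sequence, so a sequence-window test could miss the joint cluster, while the distance-based statistic picks it up. A realistic null model would also need to account for uneven mutation rates across sites, which this sketch ignores.
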

## Key facts

- **NIH application ID:** 10312207
- **Project number:** 1F32AI161793-01A1
- **Recipient organization:** HARVARD MEDICAL SCHOOL
- **Principal Investigator:** Anna Gustafson Green
- **Activity code:** F32
- **Funding institute:** NIH
- **Fiscal year:** 2021
- **Award amount:** $66,390
- **Award type:** 1 (new application)
- **Project period:** 2021-07-15 → 2023-10-02

## Primary source

NIH RePORTER: https://reporter.nih.gov/project-details/10312207

## Citation

> US National Institutes of Health, RePORTER application 10312207, Integrating protein structure and genomic data to predict antibiotic resistance in Mycobacterium tuberculosis (1F32AI161793-01A1). Retrieved via AI Analytics 2026-05-22 from https://api.ai-analytics.org/grant/nih/10312207. Licensed CC0.

---

*[NIH grants dataset](/datasets/nih-grants) · CC0 1.0*
