# Deep sequencing of pathogens to precisely define transmission networks using rare variants

> **NIH R01** · HARVARD UNIVERSITY D/B/A HARVARD SCHOOL OF PUBLIC HEALTH · 2021 · $554,524

## Abstract

Project Summary
Transmission trees that define how pathogens have spread through a host network are immensely valuable to
epidemiology, yet for many diseases such trees are difficult or impossible to obtain with existing methods
that compare pathogen genomes. This is because the phylogenetic tree of the infectious agents is not necessarily
equivalent to the transmission tree. For many pathogens the infecting population can harbor substantial
nucleotide diversity that is not adequately characterized by the genomes of one or a few isolates, and which is
predicted to mislead attempts to reconstruct transmission chains. An alternative source of data to infer
transmission is 'shared rare variants': polymorphic sites at which more than one nucleotide is present within
the infection, and which are shared among a small number of cases. The reasoning is that these reflect a
transmission bottleneck that allows more than one genotype through, whereas the same variant site is
vanishingly unlikely to be found by chance in unrelated cases. Preliminary simulations modeling evolution of
pathogens on a transmission network indicate that this approach is greatly superior to existing methods. This is
further supported by recent work on viral pathogens, including influenza and HIV, that correlates shared rare
variants with host networks, but these methods have not been tested experimentally or applied to bacteria.
The proposed research uses deep sequencing to assay shared rare variants in populations of three bacterial
pathogens, through experimental transmission of *Citrobacter rodentium* in mice, a longitudinal cohort study of MRSA
transmission in a high-burden setting, and analysis of tuberculosis outbreaks. Preliminary data from the transmission
experiments indicate multiple polymorphisms have arisen over the relatively short transmission chains (20
animals). The MRSA study will use samples from four body sites collected from ~600 US Army recruits
undergoing basic training, and will test whether shared rare variants are more likely to be found among
close contacts, reflecting the host network. This can be used to determine whether some body sites are more
likely to transmit, and variants found in carriage samples can be compared with those from cases of skin and
soft tissue infection to determine which body site is the likely source. The new 10X Genomics platform, which
by tagging single molecules can increase resolution beyond the basic strategy, will be trialed to test whether it
further discriminates between potential sources. Finally, deep sequencing data from two retrospectively identified
and analyzed outbreaks of TB will be used to develop means to infer the presence of unsampled links,
which can then be applied to samples prospectively collected and sequenced by collaborators. Taken together,
this program of research will provide unparalleled insight into the processes of infection within the host,
which will inform contact tracing and help identify missed links in the ...
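
The definition of a shared rare variant given in the abstract (a site that is polymorphic within an infection and shared among a small number of cases) can be illustrated with a minimal Python sketch. This is not the project's pipeline: the input layout (per-sample dictionaries of within-host allele frequencies), the function names, and the `min_freq` and `max_cases` thresholds are all assumptions chosen for illustration.

```python
from collections import defaultdict
from itertools import combinations


def within_host_variants(allele_freqs, min_freq=0.02):
    """Return the variants that are polymorphic within a single sample.

    allele_freqs maps (position, allele) -> within-host frequency.
    A variant counts as polymorphic if neither it nor the alternative
    is (nearly) fixed. min_freq is an assumed noise threshold.
    """
    return {
        variant
        for variant, freq in allele_freqs.items()
        if min_freq <= freq <= 1.0 - min_freq
    }


def shared_rare_variants(samples, min_freq=0.02, max_cases=3):
    """Link pairs of cases that share within-host polymorphic variants.

    samples maps sample_id -> allele_freqs (see within_host_variants).
    Only variants seen as polymorphic in at least 2 and at most
    max_cases samples are kept; max_cases is an assumed cutoff for
    "a small number of cases".
    """
    # Record which samples carry each variant as a within-host polymorphism.
    carriers = defaultdict(set)
    for sample_id, allele_freqs in samples.items():
        for variant in within_host_variants(allele_freqs, min_freq):
            carriers[variant].add(sample_id)

    # Turn each sufficiently rare shared variant into pairwise links.
    links = defaultdict(set)
    for variant, ids in carriers.items():
        if 2 <= len(ids) <= max_cases:
            for a, b in combinations(sorted(ids), 2):
                links[(a, b)].add(variant)
    return dict(links)


# Hypothetical data: (position, allele) -> within-host frequency.
samples = {
    "case_A": {(1042, "T"): 0.12, (5310, "G"): 0.99},
    "case_B": {(1042, "T"): 0.07, (5310, "G"): 1.00},
    "case_C": {(5310, "G"): 0.99, (7781, "A"): 0.30},
}
links = shared_rare_variants(samples)
```

In this toy input, only `case_A` and `case_B` are linked, through the minority variant at position 1042. The allele at position 5310 is effectively fixed in every sample, so it carries no within-host signal, and the variant at 7781 is polymorphic in one case only. A real analysis would also need to account for sequencing error and contamination, which this sketch ignores.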

## Key facts

- **NIH application ID:** 10196948
- **Project number:** 5R01AI128344-05
- **Recipient organization:** HARVARD UNIVERSITY D/B/A HARVARD SCHOOL OF PUBLIC HEALTH
- **Principal Investigator:** William Hanage
- **Activity code:** R01
- **Funding institute:** NIH
- **Fiscal year:** 2021
- **Award amount:** $554,524
- **Award type:** 5
- **Project period:** 2017-06-26 → 2024-05-31

## Primary source

NIH RePORTER: https://reporter.nih.gov/project-details/10196948

## Citation

> US National Institutes of Health, RePORTER application 10196948, Deep sequencing of pathogens to precisely define transmission networks using rare variants (5R01AI128344-05). Retrieved via AI Analytics 2026-05-25 from https://api.ai-analytics.org/grant/nih/10196948. Licensed CC0.

---

*[NIH grants dataset](/datasets/nih-grants) · CC0 1.0*
