# Fast and flexible Bayesian phylogenetics via modern machine learning

> **NIH R01** · FRED HUTCHINSON CANCER CENTER · 2022 · $744,770

## Abstract

The SARS-CoV-2 pandemic underlines both our susceptibility to and the toll of a global pathogen outbreak.
Phylogenetic analysis of viral genomes provides key insight into disease pathophysiology, spread and
potential control. However, if these methods are to be used in a viral control strategy they must reliably
account for uncertainty and be able to perform inference on 1,000s of genomes in actionable time. Scaling
Bayesian phylogenetics to meet this need is a grand challenge that is unlikely to be met by optimizing
existing algorithms.

We will meet this challenge with a radically new approach: Bayesian variational inference for
phylogenetics (VIP) using flexible distributions on phylogenetic trees that are fit using gradient-based
methods analogous to how one efficiently trains massive neural networks. By taking a variational approach
we will also be able to integrate phylogenetic analysis into very powerful open-source modeling frameworks
such as TensorFlow and PyTorch. This will open up new classes of models, such as neural network models, to
integrate data such as sampling location and migration patterns with phylogenetic inference. These flexible
models will inform strategies for viral control.
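
As a rough illustration of the variational idea described above, the sketch below fits a categorical distribution over a handful of candidate tree topologies by gradient ascent on the evidence lower bound (ELBO). It is a toy under stated assumptions and not the project's method: the three log-joint scores are hypothetical numbers, the function names are invented for this example, and a real analysis would need a structured distribution over a far larger tree space together with stochastic gradient estimates rather than an exact enumeration.

```python
import math


def softmax(logits):
    """Convert unconstrained logits into probabilities."""
    peak = max(logits)
    exps = [math.exp(x - peak) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def elbo(q, log_joint):
    """Evidence lower bound: sum_i q_i * (log p(data, tree_i) - log q_i)."""
    return sum(qi * (lj - math.log(qi)) for qi, lj in zip(q, log_joint))


def fit_categorical_vi(log_joint, steps=5000, lr=0.5):
    """Fit q over topologies by exact gradient ascent on the ELBO.

    log_joint holds one unnormalized log posterior score per topology
    (hypothetical inputs; nothing here computes a phylogenetic likelihood).
    """
    logits = [0.0] * len(log_joint)  # start from a uniform q
    for _ in range(steps):
        q = softmax(logits)
        bound = elbo(q, log_joint)
        # For a softmax parameterization:
        # dELBO/dlogit_j = q_j * ((log_joint_j - log q_j) - ELBO)
        grad = [qi * ((lj - math.log(qi)) - bound)
                for qi, lj in zip(q, log_joint)]
        logits = [t + lr * g for t, g in zip(logits, grad)]
    return softmax(logits)


# Hypothetical unnormalized log posterior scores for three topologies.
toy_log_joint = [-10.0, -11.0, -12.5]
q_fit = fit_categorical_vi(toy_log_joint)
```

Because every topology is enumerated in this toy, the fitted `q_fit` can be compared directly with the normalized exponentiated scores, and the ELBO at the optimum equals the log marginal likelihood. That exact check is unavailable at realistic scale, which is why the abstract emphasizes convergence diagnostics.
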

In Aim 1 we will develop the theory necessary for scalable and reliable VIP, including subtree
marginalization, local gradient updates needed for online algorithms, convergence diagnostics, and parameter
support estimates. We will implement these algorithms in our C++ foundation library for VIP. In Aim 2 we will
develop a flexible TensorFlow-based modeling platform for phylogenetics, enabling a whole new realm of
phylogenetic models based on neural networks to learn phylodynamic heterogeneity with minimal programming
effort. We will provide efficient gradients to this implementation via our C++ library. In Aim 3 we will
use the fact that VIP posteriors are durable and extensible descriptions of the full data posterior to enable
dynamic online computation of variational posteriors, including divide-and-conquer Bayesian phylogenetics.
This work will enable a cloud-based viral phylogenetics solution to rapidly update our current estimate of the
posterior distribution when new data arrive or the model is modified.

## Key facts

- **NIH application ID:** 10434141
- **Project number:** 5R01AI162611-03
- **Recipient organization:** FRED HUTCHINSON CANCER CENTER
- **Principal Investigator:** Frederick Albert Matsen
- **Activity code:** R01
- **Funding institute:** NIH
- **Fiscal year:** 2022
- **Award amount:** $744,770
- **Award type:** 5 (non-competing continuation)
- **Project period:** 2021-07-01 → 2026-06-30

## Primary source

NIH RePORTER: https://reporter.nih.gov/project-details/10434141

## Citation

> US National Institutes of Health, RePORTER application 10434141, Fast and flexible Bayesian phylogenetics via modern machine learning (5R01AI162611-03). Retrieved via AI Analytics 2026-05-23 from https://api.ai-analytics.org/grant/nih/10434141. Licensed CC0.

---

*[NIH grants dataset](/datasets/nih-grants) · CC0 1.0*
