Scalable Inference in Statistical Models of Viral Evolution and Human Health

NIH RePORTER · NIH · F31 · $25,832

Abstract

Despite global public health advances, viruses remain a major threat to human health both in the United States and internationally. Recent and continuing outbreaks of SARS-CoV-2, Ebola, Zika, Lassa fever, and Chikungunya, as well as persistent epidemics such as HIV, have emphasized the need to understand viral evolution and virus-host interactions during epidemics. Phylogenetic statistical models of viral evolution offer a powerful tool for studying the interplay between viral genetics and environmental or host factors. However, current phylogenetic models are often too inflexible to realistically model these relationships, and those that are flexible enough are computationally intractable for even moderately sized data sets. This project aims to develop new statistical models that are both flexible enough to model complex biological relationships and scalable to large data sets of viral and host traits. The first aim is to develop more efficient and less biased statistical methods for estimating the heritability of viral phenotypes (e.g. viral load, host CD4 T-cell count, replicative capacity). Current statistical practices typically produce biased heritability estimates and are intractable for large data sets. This project seeks to extend state-of-the-art inference techniques to model the heritability of viral phenotypes (enabling both unbiased and efficient inference) and to apply these new methods to better estimate the heritability of viral load in HIV-1. The second aim seeks to develop statistical methods for studying complex, high-dimensional viral phenotypes, such as infection severity, which cannot be captured with a single measurement. These phenotypes are difficult to quantify due to their inherent complexity, confounding rigorous efforts at, say, identifying unusually virulent viral clades. While phylogenetic factor analysis enables identification and quantification of high-dimensional phenotypes, it scales poorly to large data sets.
We propose new inference techniques that address these scalability problems and allow previously intractable analyses. We plan to apply these new methods to study patterns of virulence in Ebola and Lassa fever and to identify unusually virulent viral strains. Additionally, these methods are well suited to identifying epistatic interactions between viral mutations and phenotypes of interest, and we plan to explore these interactions in HIV, Zika, and Chikungunya viruses. The third aim is to develop new statistical models specifically designed to predict outcomes of viral infections from viral sequence data. To accommodate the flexibility these models require, we develop new inference strategies that are both highly generalizable (i.e. they do not rely on strict assumptions in existing models) and computationally efficient. Strong predictive performance would enable researchers or clinicians to predict clinically relevant outcomes using viral sequences, which could help inform treatment. We...

Key facts

NIH application ID: 10394133
Project number: 5F31AI154824-02
Recipient: UNIVERSITY OF CALIFORNIA LOS ANGELES
Principal Investigator: Gabriel William Hassler
Activity code: F31
Funding institute: NIH
Fiscal year: 2022
Award amount: $25,832
Award type: 5
Project period: 2021-05-01 → 2022-11-06