Project Summary / Abstract
Despite global public health advances, viruses remain a major threat to human health both in the United States and internationally. Recent and continuing outbreaks of SARS-CoV-2, Ebola, Zika, Lassa fever, and Chikungunya, as well as persistent epidemics such as HIV, have emphasized the need to understand viral evolution and virus-host interactions during epidemics. Phylogenetic statistical models of viral evolution offer a powerful tool for studying the interplay between viral genetics and environmental or host factors. However, current phylogenetic models are often too inflexible to realistically model these relationships, and those that are flexible enough are computationally intractable for even moderately sized data sets. This project aims to develop new statistical models that are both flexible enough to model complex biological relationships and scalable to large data sets of viral and host traits. The first aim is to develop more efficient and less biased statistical methods for estimating the heritability of viral phenotypes (e.g., viral load, host CD4 T-cell count, replicative capacity). Current statistical practices typically produce biased heritability estimates and are intractable for large data sets. This project seeks to extend state-of-the-art inference techniques to model the heritability of viral phenotypes (enabling both unbiased and efficient inference) and to apply these new methods to better estimate the heritability of viral load in HIV-1. The second aim is to develop statistical methods for studying complex, high-dimensional viral phenotypes, such as infection severity, that cannot be captured with a single measurement. These phenotypes are difficult to quantify due to their inherent complexity, which confounds rigorous efforts to, for example, identify unusually virulent viral clades. While phylogenetic factor analysis enables identification and quantification of high-dimensional phenotypes, it scales poorly to large data sets.
We propose new inference techniques that address these scalability problems and allow previously intractable analyses. We plan to apply these new methods to study patterns of virulence in Ebola and Lassa fever and to identify unusually virulent viral strains. Additionally, these methods are well suited to identifying epistatic interactions between viral mutations and phenotypes of interest, and we plan to explore these interactions in HIV, Zika, and Chikungunya viruses. The third aim is to develop new statistical models specifically designed to predict outcomes of viral infections from viral sequence data. To accommodate the flexibility these models require, we develop new inference strategies that are both highly generalizable (i.e., they do not rely on the strict assumptions of existing models) and computationally efficient. Strong predictive performance would enable researchers or clinicians to predict clinically relevant outcomes from viral sequences, which could help inform treatment. We...