PROJECT SUMMARY/ABSTRACT
My lab develops statistical methods to characterize genetic architecture and to translate genetic data into biological insights. We are broadly interested in rare-variant genetic architecture and its relationship to that of common variants, with a particular focus on rare protein-coding variation and its functional consequences. Among known rare-variant-associated genes, the majority are driven by protein-truncating variants (PTVs). PTVs are consistently deleterious, which benefits association testing, but they make up <10% of coding variants; other types of coding variants (missense, splice-site, and UTR variants) are functionally heterogeneous, with a spectrum of effects including loss, gain, and change of function. This heterogeneity poses a challenge when integrating these variants into existing statistical methods, but it also presents a biological opportunity. This application focuses on the question: what is the relationship between the functional effect of a mutation and its phenotypic effect on individuals? Proteins and protein-coding variants are richly annotated. Variant effect prediction (VEP) has emerged as one of the most successful applications of artificial intelligence in biology; for missense variants alone, at least five sophisticated models (EVE, gMVP, ESM-1b, PrimateAI-3D, and AlphaMissense) were published between 2021 and 2023, with clear implications for association testing. Other functional annotations, including protein structure predictions, predicted effects on protein binding, and experimental measurements of variant effects, complement VEP methods by capturing the functional heterogeneity of equally pathogenic alleles. We aim to understand the relationship between the functional and phenotypic effects of protein-coding variation by integrating genetic association data with a wide range of protein-coding functional annotations.
We will identify and characterize functionally informed allelic series: multiple independently associated alleles in a gene whose heterogeneous phenotypic effects align with their functional properties. To do so, we will use two complementary approaches, one focused on individual genes and the other on the exome as a whole. By analyzing individual genes with long, statistically unambiguous allelic series, we will identify relevant functional annotations and learn how genes differ from one another in the functional effects of their trait-associated variants. We will then analyze the functional architecture of protein-coding variation using the functionally informed allelic series model, which integrates any number of structural or functional annotations with genetic association data for one or more traits. In GWAS, combining genetic association data with regulatory annotations through integrative statistical methods has been among the most productive sources of biological insight, both in genome-wide scans and in the di...