PROJECT SUMMARY
Genetic variation affecting gene expression level and splicing accounts for a large proportion of phenotypic variation among humans, including differences in health and disease. The variants that underlie these phenotypic changes are often discovered by associating individuals’ gene expression data with their genotypes. These methods can be confounded by population structure in the sample, which leads to both false positives and false negatives. As such, samples are often selected from relatively homogeneous populations. However, this limits the applicability of results to populations not included in the study and limits the resolution at which potentially causal variants can be identified. Previous work has shown that controlling for population structure locally across the genome reduces error in association studies of diverse samples. However, these methods assign individuals to one of a few ancestral populations and do not fully capture the relatedness between the included samples. To extend the results of association studies to diverse cohorts, I will develop a method to control for local relatedness between samples. The Ancestral Recombination Graph (ARG) is a data structure that encodes the genealogical relationships between samples at each locus along the genome. In Aim 1, I will develop a linear mixed model approach for association mapping that uses a similarity matrix derived from the ARG to control for local relatedness between samples. One barrier to extending the results of gene expression association studies is that the majority of currently available data are from individuals of European descent. To address this limitation, I recently generated gene expression data for a large, globally diverse human sample. In Aim 2, I will use the method developed in Aim 1 to map expression level- and splicing-associated variation in this sample.
I will then investigate enrichment of epigenomic features near associated variants to determine the functional mechanisms by which these variants may drive differences in transcription, and I will intersect my findings with previously discovered disease associations. Using this globally diverse dataset, I will also explore the diversity and evolution of human gene expression, elucidating the extent to which patterns of gene expression are partitioned within versus between populations and the sources of such stratification. Extending association studies to diverse cohorts requires not only diverse datasets, but also tools that can appropriately control for patterns of population structure within those datasets; the research proposed here addresses both goals. Meeting both goals will enable the discovery of associations in previously underrepresented groups and will improve confidence in the identification of causal variants. Together, this proposed work will characterize the functional mechanisms linking genetic variation and phenotypic differences in a globally diverse human coh...