PROJECT SUMMARY/ABSTRACT
In stark contrast to Mendelian disorders, the majority of common variants associated with complex traits map to non-protein coding regions. Because the genetic code is less well developed for the much larger non-protein coding portion of the genome, identifying the gene(s) and causal alleles underlying non-Mendelian/complex traits presents a challenge. Given the rapidity with which genome-wide association studies (GWAS) are discovering regions associated with complex traits, causal allele and susceptibility gene identification have become severe bottlenecks. The overall goal of this proposal is to outline a rigorous and comprehensive strategy to discover functionally causal variants and their target genes. While the proposal focuses on prostate cancer, the strategies can be applied to any non-protein coding locus. The central hypothesis is that cancer risk loci are regulatory elements. Recent data convincingly demonstrate that GWAS loci are enriched for regulatory elements, which control the level of expression of genes. Causal variants are difficult to discover because the scientific community is less adept at annotating the non-protein coding portion of the genome. This proposal seeks to develop a novel computational and statistical framework to prioritize candidate causal variants and then to experimentally validate these predictions. The proposal will jointly model quantitative trait loci (QTL) and allelic imbalance (AI) signals in epigenetic data (ChIP-seq and ATAC-seq) and transcripts (RNA-seq) in a novel framework that we term cistrome-wide association studies (CWAS). The most significant CWAS loci will be subjected to epigenome and genome editing to functionally characterize them and identify causal variants. Aim 1 will utilize novel experimental methods to create the large-scale datasets that will inform Aim 2.
The ultimate goal of Aim 1 is to perform H3K27 acetylation and androgen receptor (AR) chromatin immunoprecipitation followed by high-throughput sequencing (ChIP-seq) to annotate active enhancers, the Assay for Transposase-Accessible Chromatin (ATAC-seq) to identify open chromatin, and RNA-seq. All of these data will be subjected to target enrichment at a predefined set of variants, which will enable the rigorous and systematic measurement of AI at heterozygous sites. Aim 2 will utilize these data in a structured framework to computationally identify statistically significant CWAS prostate cancer risk loci. These loci will be experimentally tested in Aim 3, where Clustered Regularly Interspaced Short Palindromic Repeats (CRISPR)-based evaluation of the candidate causal variants will be performed. At the completion of this project, we fully anticipate that we will have begun to unravel the causal (i.e., pathogenic) variants that initiate human prostate cancer. Discovering the mechanisms underlying human traits will not only inform the biology of disease, but may also reveal opportunities to more rationally intervene in treatment and preven...