ABSTRACT Noncoding genetic variation that alters gene regulation is of paramount importance for health, disease, and evolution. Diseases ranging in incidence from the most common to the most rare all have substantial risk associated with regulatory variation, and most of the genetic differences between closely related species are noncoding. Whole genome sequencing can directly identify that variation, but realizing its potential to elucidate the genetic determinants of health and disease will require accurate annotation of this noncoding variation for functionality. In coding sequence, the genetic code allows variants to be annotated to a rough hierarchy of likely functional effects and pathogenicity. In noncoding sequence, such annotation is less clear. Perturbation assays, i.e., assays that modify genetic or epigenetic states and measure the effect of those perturbations on regulatory endpoints, offer a possible path to annotating noncoding variation. However, to fully leverage these data, novel and sophisticated statistical and machine learning approaches are required to extract useful information from those assays, to integrate that information across regulatory endpoints, and to extrapolate findings so that annotation of previously unobserved (unperturbed) variation in diverse cell types is possible. The goal of the Duke Prediction Center is to develop the analytic approaches and tools that will allow for the routine annotation of noncoding variation for functionality and ultimately pathogenicity. Aim 1 is to establish best practices in perturbation assay design and analysis. This will allow IGVF characterization centers to design their experiments so that, when coupled with optimized analyses, the data produced will be maximally informative for subsequent predictive modeling. Aim 2 is to develop novel mechanistic machine learning approaches for predicting the functional effect of noncoding variation in diverse cell types.
Aim 3 is to identify noncoding genomic regions that are subject to functional constraint, which will be leveraged in prioritizing variants for pathogenicity. The expected outcomes of this project will be (i) robust estimates of optimal experimental design parameters and recommendations for analysis tools and best practices for the various assays used within the IGVF consortium; (ii) predicted functional effects of observed variation, to be shared through the IGVF variant/phenotype catalog, as well as a state-of-the-art machine learning method (and associated tools) that can identify previously unknown interactions among genomic variants, both observed and novel, and predict their functional impact in diverse cell types; and (iii) a list of regulatory elements subject to functional constraint, to be shared through the IGVF variant/phenotype catalog, and a principled prioritization framework (and associated tools) for interpreting variation within patient genomes for pathogenicity. Due to the considerable success of genetics, t...