PROJECT SUMMARY / ABSTRACT
A central goal of genomics is to understand the relationship between genotype and phenotype. In recent years, the ability to quantitatively study genotype-phenotype maps has been revolutionized by the development of multiplex assays of variant effect (MAVEs), which measure molecular phenotypes for thousands to millions of genotypic variants in parallel. MAVE is an umbrella term that includes massively parallel reporter assays for studies of DNA or RNA regulatory sequences, as well as deep mutational scanning assays of proteins or structural RNAs. The rapid adoption of MAVE techniques across multiple genomic disciplines has created an acute need for computational methods that can robustly and reproducibly infer quantitative genotype-phenotype (G-P) maps from the large datasets that MAVEs produce. Here we propose a unified conceptual and computational framework for quantitatively modeling G-P maps from MAVE data. This proposal is motivated by our realization that accounting for the noise and nonlinearities that are omnipresent in MAVE experiments requires explicit modeling of both the MAVE measurement process and the G-P map of interest. This joint inference strategy is more computationally demanding than most MAVE analysis methods, but it is feasible using modern deep learning frameworks. Our extensive preliminary data show that this modeling strategy is able to recover high-precision G-P maps even in the presence of major confounding effects, and thus has the potential to benefit MAVE studies in multiple areas of genomics. Aim 1 will develop methods for modeling the measurement processes that arise in diverse MAVE experimental designs. Aim 2 will develop general methods for modeling genetic interactions within G-P maps, and will use these methods in conjunction with new experiments to elucidate the molecular mechanism of a recently approved drug that targets alternative mRNA splicing.
Aim 3 will develop methods for inferring G-P maps that reflect biophysical models of gene regulation, including both thermodynamic (i.e., quasi-equilibrium) and kinetic (i.e., non-equilibrium steady-state) models. These methods will then be used, in conjunction with new MAVE experiments, to develop a biophysical model for how a pleiotropic transcription factor regulates gene expression throughout the Escherichia coli genome. Aim 4 will study and develop methods for treating gauge freedoms and sloppy modes in the above classes of models, thereby facilitating the comparison, interpretation, and exploration of inferred G-P maps. All of the computational techniques we develop will be incorporated into a robust and easy-to-use Python package called MAVE-NN. We will benchmark MAVE-NN on a diverse array of MAVE datasets, including published datasets and data generated as part of this project. In all, this work will fill a major need in the analysis of MAVE experiments, yielding a robust, flexible, and scalable computational platform that will h...