Project Summary/Abstract
A majority of heritable disease-causing variation resides in the non-coding portions of the human genome. A leading hypothesis is that most of this variation exerts its effects on cell-type-specific cis-regulatory sequences (CRSs). Interpreting such variation will therefore require quantitative models of the ‘regulatory grammar’ that controls the cell-type-specific activities of CRSs. We define the regulatory grammar of a cell type as the independent and interacting contributions of transcription factor binding sites (TFBSs) to cis-regulatory activity. Models of regulatory grammar must also capture how those contributions depend on the number, orientation, spacing, and affinity of TFBSs. Detailed models of regulatory grammars are still in their infancy, partly because we lack systematic training data on how CRSs behave across diverse cell types in vivo. We propose to address this gap by systematically measuring the activities of CRSs across cell types within intact mammalian tissues. To collect these data, we will introduce a single-cell massively parallel reporter gene assay (scMPRA) that measures the cell-type-specific activities of CRSs in vivo. We will fit the resulting data with a formal thermodynamic model in which each TF-DNA or TF-TF interaction is represented by its free energy (ΔG) of interaction. By comparing the magnitudes of the resulting ΔG values, we will quantify the independent and interacting contributions of specific TFBSs, thus deriving quantitative regulatory grammars that capture the differences between cell types within the mammalian retina (Aim 1) and the mammalian brain (Aim 2). By validating our models on sequence variants of endogenous CRSs, we aim to make progress towards a framework for accurately predicting the effects of non-coding genetic variation on the function of CRSs.
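
To make the thermodynamic framing concrete, the following is a minimal illustrative sketch, not the proposal's actual model. It assumes a generic equilibrium (Boltzmann-weighted) occupancy model in which each bound TFBS contributes a TF-DNA ΔG and each pair of co-bound sites may contribute a TF-TF ΔG. The function names (`state_weight`, `site_occupancy`), the relative concentration terms, and the value of `RT` are hypothetical choices made for illustration only.

```python
import math
from itertools import product

# Assumption: thermal energy in kcal/mol at roughly 25 C (R * T ~ 0.593).
RT = 0.593


def state_weight(state, dg_site, dg_pair, conc):
    """Boltzmann weight of one occupancy state.

    state   : tuple of 0/1 flags, one per binding site (1 = TF bound)
    dg_site : TF-DNA free energy per site (kcal/mol, negative = favorable)
    dg_pair : dict mapping (i, j) with i < j to a TF-TF free energy
    conc    : relative TF concentration per site (dimensionless, hypothetical)
    """
    dg_total = 0.0
    weight = 1.0
    n = len(state)
    for i in range(n):
        if state[i]:
            dg_total += dg_site[i]   # independent contribution of site i
            weight *= conc[i]
    for i in range(n):
        for j in range(i + 1, n):
            if state[i] and state[j]:
                dg_total += dg_pair.get((i, j), 0.0)  # interaction term
    return weight * math.exp(-dg_total / RT)


def site_occupancy(dg_site, dg_pair, conc):
    """Equilibrium probability that each site is bound, summed over all states."""
    n = len(dg_site)
    z = 0.0                      # partition function
    bound = [0.0] * n            # summed weight of states with site i bound
    for state in product((0, 1), repeat=n):
        w = state_weight(state, dg_site, dg_pair, conc)
        z += w
        for i, s in enumerate(state):
            if s:
                bound[i] += w
    return [b / z for b in bound]


if __name__ == "__main__":
    # Two sites with equal affinity, with and without a favorable interaction.
    independent = site_occupancy([0.0, 0.0], {}, [1.0, 1.0])
    cooperative = site_occupancy([0.0, 0.0], {(0, 1): -1.0}, [1.0, 1.0])
    print("independent:", independent)
    print("cooperative:", cooperative)
```

In this sketch, a more negative pairwise ΔG raises the occupancy of both sites relative to the independent case, which is the sense in which comparing ΔG magnitudes separates independent from interacting contributions. How occupancy maps to reporter activity, and how ΔG would depend on orientation, spacing, or affinity, is left unspecified here.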