# Quantitative Modeling of Transcription Factor-DNA Binding

> **NIH R35** · UNIVERSITY OF SOUTHERN CALIFORNIA · 2021 · $523,665

## Abstract

PI: Rohs, Remo

Genes are regulated through transcription factor (TF) binding to specific DNA target sites in the genome.
These target sites are recognized through several layers of specificity determinants. The most extensively
studied layer of binding specificity consists of hydrogen bonds and hydrophobic contacts between protein amino acids
and functional groups of the base pairs, mainly in the major groove. This mechanism, base readout, recognizes nucleotide
sequence within a short core-binding site of only a few base pairs. However, these distinct sequence
combinations in a TF binding motif occur many times in the genome and only a very small fraction of putative
binding sites are functional. It is still unknown how a TF locates and identifies its in vivo binding sites in the
plethora of possible genomic target sites. Recognition of three-dimensional DNA structure is an additional layer
that refines base readout. While the latter is restricted to direct contacts with the core motif, shape readout is a
mechanism through which flanking regions of the core motif or spacer regions between half-sites of dimeric
TFs contribute to binding specificity. Other layers of in vivo TF binding determinants are chromatin structure,
DNA accessibility, histone modifications, DNA methylation, cofactors and cooperative binding, and cell type.
Given this multi-layer nature of TF recognition, we will develop quantitative models to predict TF binding with
high accuracy. More important, however, is that our models will reveal recognition mechanisms in the absence
of experiment-based structural information. We will build models where each distinct layer of TF binding
specificity determinants is added to a baseline model combining DNA sequence and shape. Since it is
expected that the importance of each of these TF binding specificity determinants will vary dramatically across
protein families, we will use feature selection to identify relative contributions of each feature group as a
function of TF or TF family. We will also develop a deep learning framework where individual feature modules
can be added or removed from the input layer of convolutional neural networks. This approach will leverage
the advantages of deep learning while circumventing the “black box” nature of standard deep learning
methods. We will also generate experimental data for specific TFs using the SELEX-seq technology. This
approach is currently able to probe the effect of cofactors, cooperative binding, and protein mutations on the
binding specificity of a TF. We will add nucleosomes to the SELEX-seq binding assay and, thereby, probe
chromatin effects on TF binding using an in vitro experiment in the absence of other cellular contributions. This
project will result in a better mechanistic understanding of TF-DNA binding and reveal the impact of various
specificity determinants across multiple scales. The new insights will describe different...
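
The layered design described in the abstract, where feature groups are added to or removed from a sequence-plus-shape baseline, might be sketched roughly as follows. This is an illustrative sketch only and not the project's code: the names `one_hot`, `toy_shape`, `FEATURE_GROUPS`, and `featurize` are hypothetical, and `toy_shape` is a placeholder (local GC fraction) standing in for real DNA shape features.

```python
from typing import Callable, Dict, List, Sequence

BASES = "ACGT"


def one_hot(seq: str) -> List[float]:
    """Sequence feature group: four indicator values per position."""
    return [1.0 if base == b else 0.0 for base in seq.upper() for b in BASES]


def toy_shape(seq: str) -> List[float]:
    """Placeholder 'shape' feature group (assumption, not a real shape model).

    Returns the GC fraction of each sliding 3-mer window, one value per window.
    """
    s = seq.upper()
    return [sum(c in "GC" for c in s[i:i + 3]) / 3.0 for i in range(len(s) - 2)]


# Registry of feature groups; further layers (for example accessibility or
# methylation) would be registered here as additional hypothetical entries.
FEATURE_GROUPS: Dict[str, Callable[[str], List[float]]] = {
    "sequence": one_hot,
    "shape": toy_shape,
}


def featurize(seq: str, groups: Sequence[str]) -> List[float]:
    """Concatenate the selected feature groups into one input vector."""
    vec: List[float] = []
    for name in groups:
        vec.extend(FEATURE_GROUPS[name](seq))
    return vec


if __name__ == "__main__":
    site = "ACGTAC"
    baseline = featurize(site, ["sequence", "shape"])
    sequence_only = featurize(site, ["sequence"])
    print(len(baseline), len(sequence_only))
```

Comparing a model trained on `baseline` vectors with one trained on `sequence_only` vectors is one simple way to estimate how much a feature group contributes for a given TF.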

## Key facts

- **NIH application ID:** 10189652
- **Project number:** 5R35GM130376-03
- **Recipient organization:** UNIVERSITY OF SOUTHERN CALIFORNIA
- **Principal Investigator:** Remo Rohs
- **Activity code:** R35
- **Funding institute:** NIH
- **Fiscal year:** 2021
- **Award amount:** $523,665
- **Award type:** 5
- **Project period:** 2019-07-09 → 2024-06-30

## Primary source

NIH RePORTER: https://reporter.nih.gov/project-details/10189652

## Citation

> US National Institutes of Health, RePORTER application 10189652, Quantitative Modeling of Transcription Factor-DNA Binding (5R35GM130376-03). Retrieved via AI Analytics 2026-05-22 from https://api.ai-analytics.org/grant/nih/10189652. Licensed CC0.

---

*[NIH grants dataset](/datasets/nih-grants) · CC0 1.0*
