# Defining the protein sequence features that control transcriptional activation domain function

> **NIH R35** · UNIVERSITY OF CALIFORNIA BERKELEY · 2023 · $379,181

## Abstract

 Most cell-type-specific gene expression arises from transcription factors binding to
enhancers and promoters [12,13]. The last decade has seen explosive growth in the identification of
enhancers and the cataloging of transcription factor binding data [14], but it is still not possible to
predict gene expression from genome sequence, in large part because we still have a limited
understanding of transcriptional activation domains, the regions that bind coactivator proteins.
 Our group studies the protein sequence features that control the function of
transcriptional activation domains. We seek to understand how activation domains work, how
they evolve and how we can predict them from protein sequence. To study how activation
domains work, we rationally design mutations to test specific hypotheses. To study activation
domain evolution, we survey the diversity of extant orthologs to find highly diverse functional
sequences. To predict activation domains from protein sequence, we integrate all these data
with interpretable mechanistic predictors and with convolutional neural networks.
 We combine three approaches to study activation domain function. First, we use
high-throughput assays in yeast and human cell culture, rationally design thousands of mutations
to test specific hypotheses about function, and integrate these data with machine learning.
Second, we use biophysical simulations to study how mutations in activation domains change
the 3D structures of these intrinsically disordered regions. Third, we use high-resolution imaging
to study how activation domains modulate the movements of individual transcription factor
molecules in the nuclei of living cells. Together, these three approaches will reveal the amino acid
sequence features that determine how activation domains control transcription.
 Our long-term goal is to build a family of computational models that predict activation
domains from protein sequence, predict the coactivators each activation domain recruits, and
predict how activation domains evolve.

## Key facts

- **NIH application ID:** 10714062
- **Project number:** 1R35GM150813-01
- **Recipient organization:** UNIVERSITY OF CALIFORNIA BERKELEY
- **Principal Investigator:** Max Valentin Staller
- **Activity code:** R35
- **Funding institute:** NIH
- **Fiscal year:** 2023
- **Award amount:** $379,181
- **Award type:** 1
- **Project period:** 2023-08-01 → 2028-07-31

## Primary source

NIH RePORTER: https://reporter.nih.gov/project-details/10714062

## Citation

> US National Institutes of Health, RePORTER application 10714062, Defining the protein sequence features that control transcriptional activation domain function (1R35GM150813-01). Retrieved via AI Analytics 2026-05-23 from https://api.ai-analytics.org/grant/nih/10714062. Licensed CC0.

---

*[NIH grants dataset](/datasets/nih-grants) · CC0 1.0*
