Defining the protein sequence features that control transcriptional activation domain function

NIH RePORTER · NIH · R35 · $379,181

Abstract

Most cell-type-specific gene expression arises from transcription factors binding to enhancers and promoters [12,13]. The last decade has seen explosive growth in the identification of enhancers and the cataloging of transcription factor binding data [14], but it is still not possible to predict gene expression from genome sequence, in large part because we have a limited understanding of transcriptional activation domains, the regions that bind coactivator proteins.
Our group studies the protein sequence features that control the function of transcriptional activation domains. We seek to understand how activation domains work, how they evolve, and how we can predict them from protein sequence. To study how activation domains work, we rationally design mutations to test specific hypotheses. To study activation domain evolution, we survey the diversity of extant orthologs to find highly diverse functional sequences. To predict activation domains from protein sequence, we integrate all these data with interpretable mechanistic predictors and with convolutional neural networks.
We combine three approaches to study activation domain function. First, we use high-throughput assays in yeast and human cell culture, rationally design thousands of mutations to test specific hypotheses about function, and integrate these data with machine learning. Second, we use biophysical simulations to study how mutations in activation domains change the 3D structures of these intrinsically disordered regions. Third, we use high-resolution imaging to study how activation domains modulate the movements of individual transcription factor molecules in the nuclei of living cells. Together, these three approaches will reveal the amino acid sequence features that determine how activation domains control transcription.
Our long-term goal is to build a family of computational models that predict activation domains from protein sequence, predict the coactivators each activation domain recruits, and predict how activation domains evolve.

Key facts

NIH application ID: 10714062
Project number: 1R35GM150813-01
Recipient: UNIVERSITY OF CALIFORNIA BERKELEY
Principal Investigator: Max Valentin Staller
Activity code: R35
Funding institute: NIH
Fiscal year: 2023
Award amount: $379,181
Award type: 1
Project period: 2023-08-01 → 2028-07-31