# Modeling gene expression in yeast using large degenerate libraries

> **NIH R01** · UNIVERSITY OF WASHINGTON · 2020 · $347,304

## Abstract

Short sequence elements in DNA and RNA determine the levels and composition of mRNAs and proteins,
making it critical that we can accurately model how any given sequence will affect transcription, splicing or
translation. Such models of cis-regulation will fill in gaps in our knowledge of these core gene expression
processes. Additionally, as large numbers of human genomes are sequenced, the ability to predict the effects
of sequence variation on the ultimate levels of proteins will be integral to the interpretation of variation in
regulatory sequences. Similarly, the construction of metabolic pathways with defined levels of expression and
the engineering of synthetic gene networks require accurate knowledge of how regulatory sequences affect
expression. This application seeks to use the yeast Saccharomyces cerevisiae as a test case for learning how
any short regulatory sequence affects protein levels. A predictive model will be trained on a set of libraries two
orders of magnitude more complex than have been characterized to date. Libraries will be generated of a
growth reporter gene with a million random sequences of 50 nucleotides that comprise either a DNA element
that regulates transcription or an RNA element that regulates splicing or translation. The libraries will be
transformed into yeast, and the yeast will be placed under selection such that they grow according to the ability
of each random sequence to contribute to protein expression. A convolutional neural network approach will be
used to learn the relationship between these “fitness” phenotypes and their associated genotypes. Although
yeast is a single-celled eukaryote, it has been the source of most of the original findings on gene expression,
and these findings form the basis for much of our knowledge of more complex eukaryotes. Furthermore, the
short sequences in yeast that comprise the DNA- and RNA-binding sites of regulatory proteins tend to be
comparable in size to those of other organisms. Yeast is used often in synthetic biology and metabolic
engineering, and the work proposed here will result in novel tools for quantitatively controlling its gene
expression. Initial results with a library of 5' untranslated regions (UTRs) indicate that we can construct a
model to account for a large fraction of the observed variability in expression, and that the model extends to
native sequence elements. The model allowed us to forward engineer 5' UTRs to have increased activity.
Specific aims of this application are to assess the effects of random sequences targeted to upstream
regulatory elements, core promoter elements, 5' UTRs, introns and 3' UTRs; to learn predictive and
interpretable models using convolutional neural networks and to identify novel functional cis-regulatory
elements; and to validate our models on native sequences and combinatorial libraries, and by engineering
synthetic sequence elements with user-specified properties. In sum, the proposal seeks to const...
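
The abstract describes learning a genotype-to-phenotype map from random 50-nucleotide sequences with a convolutional neural network. As a rough illustration only, and not the investigators' actual pipeline, the sketch below shows the two basic ingredients such a model typically relies on: one-hot encoding of a sequence and sliding a single convolutional filter along it. The function names, the library size of five, and the `ATG` filter are all hypothetical choices made for this example; in a trained network the filter weights would be learned from the fitness measurements rather than set by hand.

```python
import random

ALPHABET = "ACGT"

def random_library(n_seqs, length=50, seed=0):
    """Draw n_seqs random DNA sequences of fixed length (a toy degenerate library)."""
    rng = random.Random(seed)
    return ["".join(rng.choice(ALPHABET) for _ in range(length)) for _ in range(n_seqs)]

def one_hot(seq):
    """Encode a sequence as a list of 4-element rows, one row per nucleotide."""
    return [[1.0 if base == letter else 0.0 for letter in ALPHABET] for base in seq]

def conv_scan(encoded, kernel):
    """Slide one filter along the sequence (no padding), then apply ReLU."""
    width = len(kernel)
    scores = []
    for start in range(len(encoded) - width + 1):
        total = sum(
            encoded[start + i][j] * kernel[i][j]
            for i in range(width)
            for j in range(4)
        )
        scores.append(max(0.0, total))
    return scores

def max_pool(scores):
    """Summarize a filter's response by its strongest match anywhere in the sequence."""
    return max(scores)

# Hypothetical hand-set filter that responds to the motif "ATG" (assumption for
# illustration; a real model would learn its filters from data).
motif_filter = one_hot("ATG")

# Toy library: 5 sequences instead of the roughly one million in the abstract.
library = random_library(n_seqs=5, length=50, seed=1)
for seq in library:
    activation = max_pool(conv_scan(one_hot(seq), motif_filter))
    print(seq, activation)
```

With this hand-set filter, the pooled activation equals the largest number of positions at which any 3-nucleotide window matches `ATG`, so it ranges from 0 to 3. A full model would stack many learned filters and feed the pooled activations into further layers that predict the measured fitness.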

## Key facts

- **NIH application ID:** 9929602
- **Project number:** 5R01GM125809-03
- **Recipient organization:** UNIVERSITY OF WASHINGTON
- **Principal Investigator:** STANLEY FIELDS
- **Activity code:** R01
- **Funding institute:** NIH
- **Fiscal year:** 2020
- **Award amount:** $347,304
- **Award type:** 5 (non-competing continuation)
- **Project period:** 2018-08-01 → 2022-05-31

## Primary source

NIH RePORTER: https://reporter.nih.gov/project-details/9929602

## Citation

> US National Institutes of Health, RePORTER application 9929602, Modeling gene expression in yeast using large degenerate libraries (5R01GM125809-03). Retrieved via AI Analytics 2026-05-23 from https://api.ai-analytics.org/grant/nih/9929602. Licensed CC0.

---

*[NIH grants dataset](/datasets/nih-grants) · CC0 1.0*