BioMADE: Development of genomic language models to predict optimal genomes for commercial protein production

NSF Award Search · 01002526DB NSF RESEARCH & RELATED ACTIVIT · $449,999

Abstract

Microbes are used to produce a range of products, including proteins such as insulin and blood-clotting factors. Fungal strains are often chosen because they have protein secretion systems. However, those secretion systems are not optimized for throughput, so strain improvement is needed. To facilitate this process, AI methods will be employed to develop a digital model, commonly referred to as a digital twin, that emulates most of the behaviors of the original fungal cells. The digital twin will be used to identify genetic edits that improve protein synthesis and secretion. This will be accomplished for many fungal strains and behaviors. The models will be made publicly available using open-source software.

A hierarchical genome-to-phenotype model (G2PM) for fungal systems will be developed, with a focus on Pichia strains. This model will link DNA sequence to gene expression and, ultimately, to strain-level protein secretion performance. More than 8,000 diverse fungal genomes will be curated to train genomic language models (gLMs), which are deep neural networks that learn complex probability distributions over nucleotide sequences. These pretrained fungal gLMs and their learned embeddings will then be integrated into a sequence-to-expression model that predicts high-resolution RNA-seq profiles directly from genomic sequence, across multiple fungal species and environmental contexts. In parallel, the team will generate and pub

Key facts

NSF award ID: 2533077
Awardee: University of California-Berkeley (CA)
SAM.gov UEI: GS3YEVSS12N6
PI: Yun S Song
Primary program: 01002526DB NSF RESEARCH & RELATED ACTIVIT
All programs: Quantitative sys bio and biotech, NANOSCALE BIO CORE
Estimated total: $449,999
Funds obligated: $449,999
Transaction type: Standard Grant
Period: 09/01/2025 → 08/31/2027