# Leveraging machine learning and evolution to navigate sequence-function landscapes in multidomain proteins

> **NIH K99** · UNIVERSITY OF CALIFORNIA BERKELEY · 2024 · $113,368

## Abstract

 Allostery, the phenomenon describing how the state of one site in a protein is coupled to
the state of a distal site, is a fundamental driver of functional evolution in protein families. It is
especially impactful in multimeric and multidomain proteins – those that arise from the
recombination of protein domains that are structurally and functionally distinct. The goal of this
proposal is to develop methods that combine computational and experimental approaches to
understand the role of allostery in establishing new functions by coupling enzymatic activity to
biological processes at the membrane. Insights gained in this work will enable us to better
understand how domain recombination has expanded the functional repertoires of protein
families, and will enable more efficient engineering of synthetic proteins.
 In Aim 1 of this proposal, I will leverage recent advances in machine learning and
computational geometry to develop more accurate generative models of protein families that
implicitly account for evolutionary processes that act upon them.
 In Aim 2, I will conduct a systematic investigation into the sequence-function landscape of a
dimeric bacterial bicarbonate transporter that couples proton transport across the membrane to
enzymatic production of bicarbonate. Using deep mutational scans in the context of a
suppressor screen, I will identify sequence positions that decouple enzymatic activity from
proton transport and will use this knowledge to test structure-function hypotheses related to
allostery in this protein system.
 In Aim 3, I will use machine learning models fit to protein families to rationally design
focused deep mutational scans to explore allostery in human atrial natriuretic peptide receptors.
These receptors directly couple ligand binding to secondary messenger production in a single
polypeptide chain containing multiple distinct domains. Using information from evolution will help
me make more effective use of an experimentally limited mutational budget and will allow me to
interrogate the higher order interactions that are a hallmark of allosteric networks.
 My background in structural biology and subsequent training in biological machine
learning give me a unique perspective and skillset to tackle these challenging problems. The
engaging scientific environment at UC Berkeley and the strong support of my mentors, Dr. Yun
Song and Dr. David Savage, will enable me to operate more seamlessly at the interface of
computation and experimentation in biology as I launch my independent research career.

## Key facts

- **NIH application ID:** 10785243
- **Project number:** 1K99GM152766-01
- **Recipient organization:** UNIVERSITY OF CALIFORNIA BERKELEY
- **Principal Investigator:** Antoine Koehl
- **Activity code:** K99
- **Funding institute:** NIH
- **Fiscal year:** 2024
- **Award amount:** $113,368
- **Award type:** 1
- **Project period:** 2023-12-01 → 2025-11-30

## Primary source

NIH RePORTER: https://reporter.nih.gov/project-details/10785243

## Citation

> US National Institutes of Health, RePORTER application 10785243, Leveraging machine learning and evolution to navigate sequence-function landscapes in multidomain proteins (1K99GM152766-01). Retrieved via AI Analytics 2026-05-23 from https://api.ai-analytics.org/grant/nih/10785243. Licensed CC0.

---

*[NIH grants dataset](/datasets/nih-grants) · CC0 1.0*
