# Machine Learning Models for Studying Protein Interactions in the Context of Immune Receptors

> **NIH R35** · ARIZONA STATE UNIVERSITY-TEMPE CAMPUS · 2024 · $339,346

## Abstract

Protein interactions are the fundamental basis of all cellular processes. Proteins interact with each other to form complexes that carry out a wide range of functions, from signal transduction and gene regulation to DNA repair. Disruptions in protein interactions are implicated in a wide range of human diseases. While wet-lab assays to study protein interactions are indispensable, advances in algorithms and machine learning give computational methods for predicting protein interactions the potential to revolutionize our understanding of cellular processes, identify new drug targets, and support the development of more effective therapies. Our research applies domain knowledge from biological sequence analysis, structural biology, and machine learning to computationally predict whether given protein complexes will interact, or to generate novel protein receptors that may recognize target ligands. These computational algorithms and machine learning models can be used to 1) develop new therapeutic molecules to treat infectious diseases or cancer, and 2) produce new diagnostic tools to detect abnormalities in cells. To provide biological sequences (such as protein sequences) as input to these computational methods, one must first express them as a fixed-size numeric vector, often referred to as an embedding of the input sequence. However, the mainstream embedding techniques for biological sequences are simple adaptations of embedding techniques from the field of natural language processing. Biological sequences are highly complex and structured, and their units of information are less apparent than those of natural languages. The two primary goals of the proposed research are: 1) pinpointing the determinants of an effective embedding of biological sequences, in order to derive general principles for designing protein language models for a specific family of proteins, and 2) applying these embedding techniques to better generate immune receptors, such as T cell receptors (TCRs) and B cell receptors (BCRs), that interact with a target epitope. Both research goals build on our previous TCR embedding model, which boosts downstream model performance by a wide margin on TCR-epitope binding prediction and TCR repertoire clustering. The outcome of this project will be a unified computational framework for predicting protein interactions and designing novel TCRs and BCRs, which will have a profound impact on human health.
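
To make the idea of a fixed-size embedding concrete, here is a minimal sketch that maps protein sequences of any length to vectors of the same length using normalized k-mer frequencies. This is not the project's embedding model, which the abstract does not describe; it is a simple baseline chosen only to illustrate the input format. The function name `kmer_embedding`, the choice of `k = 2`, and the two example sequences are assumptions made for illustration.

```python
from itertools import product

# The 20 standard amino acids (one-letter codes).
AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"


def kmer_embedding(sequence: str, k: int = 2) -> list[float]:
    """Map a protein sequence of any length to a vector of length 20**k
    holding normalized k-mer frequencies (illustrative baseline only)."""
    # Enumerate every possible k-mer once so each gets a fixed position.
    kmers = ["".join(p) for p in product(AMINO_ACIDS, repeat=k)]
    index = {kmer: i for i, kmer in enumerate(kmers)}

    counts = [0.0] * len(kmers)
    seq = sequence.upper()
    for start in range(len(seq) - k + 1):
        pos = index.get(seq[start:start + k])
        if pos is not None:  # skip k-mers containing non-standard residues
            counts[pos] += 1.0

    # Normalize so that sequences of different lengths are comparable.
    total = sum(counts)
    return [c / total for c in counts] if total else counts


# Hypothetical receptor-like sequences of different lengths.
short = kmer_embedding("CASSLGQAYEQYF")
longer = kmer_embedding("CASSPGTGGTDTQYFGPG")
print(len(short), len(longer))  # both vectors have 20**2 = 400 entries
```

A count-based vector like this ignores residue order beyond length k and any structural context, which is the kind of limitation that motivates learned embeddings such as protein language models.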

## Key facts

- **NIH application ID:** 10942005
- **Project number:** 1R35GM155417-01
- **Recipient organization:** ARIZONA STATE UNIVERSITY-TEMPE CAMPUS
- **Principal Investigator:** Heewook Lee
- **Activity code:** R35
- **Funding institute:** NIH
- **Fiscal year:** 2024
- **Award amount:** $339,346
- **Award type:** 1
- **Project period:** 2024-07-10 → 2029-05-31

## Primary source

NIH RePORTER: https://reporter.nih.gov/project-details/10942005

## Citation

> US National Institutes of Health, RePORTER application 10942005, Machine Learning Models for Studying Protein Interactions in the Context of Immune Receptors (1R35GM155417-01). Retrieved via AI Analytics 2026-05-23 from https://api.ai-analytics.org/grant/nih/10942005. Licensed CC0.

---

*[NIH grants dataset](/datasets/nih-grants) · CC0 1.0*
