# An Integrated Multilevel Modeling Framework for Repertoire-Based Diagnostics

> **NIH R01** · BETH ISRAEL DEACONESS MEDICAL CENTER · 2021 · $528,873

## Abstract

Immune-repertoire sequence, which consists of an individual's millions of unique antibody and T-cell receptor (TCR) genes, encodes a dynamic and highly personalized record of an individual's state of health. Our long-term goal is to develop the computational models and tools necessary to read this record, to one day be able to diagnose diverse infections, autoimmune diseases, cancers, and other conditions directly from repertoire sequence. The key problem is how to find patterns of specific diseases in repertoire sequence when repertoires are so complex. Our hypothesis is that a combination of bottom-up (sequence-level) and top-down (systems-level) modeling can reveal these patterns by encoding repertoires as simple but highly informative models that can be used to build highly sensitive and specific disease classifiers. In preliminary studies, we introduced two new modeling approaches for this purpose, (i) statistical biophysics (bottom-up) and (ii) functional diversity (top-down), and showed their ability to elucidate patterns related to vaccination status (97% accuracy), viral infection, and aging. Building on these studies, we will test our hypothesis through two specific aims: (1) we will develop models and classifiers based on the bottom-up approach, statistical biophysics; and (2) we will develop the top-down approach, functional diversity, to improve these classifiers. To achieve these aims, we will use our extensive collection of public immune-repertoire datasets, beginning with 391 antibody and TCR datasets we have characterized previously. Our team has deep and complementary expertise in developing computational tools for finding patterns in immune repertoires (Dr. Arnaout) and in the mathematics that underlie these tools (Dr. Altschul), with additional advice available as needed regarding machine learning (Dr. AlQuraishi). This proposal is highly innovative in how our two new approaches address previous issues in the field. (i) Statistical biophysics uses a powerful machine-learning method called maximum-entropy modeling (MaxEnt), improving on past work by tailoring MaxEnt to learn patterns encoded in the biophysical properties (e.g., size and charge) of the amino acids that make up antibodies/TCRs; these properties ultimately determine what targets antibodies/TCRs can bind, and therefore which sequences are present in different diseases. (ii) Functional diversity fills a key gap in how immunological diversity has been measured thus far, by factoring in whether different antibodies/TCRs are likely to bind the same target. This proposal is highly significant for (i) developing an efficient, accurate, generative, and interpretable machine-learning method for finding diagnostic patterns in repertoire sequence; (ii) applying a robust mathematical framework to the measurement of immunological diversity; (iii) impacting clinical diagnostics; and (iv) adding a valuable new tool for integrative/big-data medicine. The expected outco...
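
The functional-diversity idea in the abstract, counting antibodies/TCRs that likely bind the same target as partly "the same", can be illustrated with a similarity-weighted effective-number measure. The sketch below is a minimal illustration under stated assumptions, not the project's actual method: the function name `similarity_weighted_diversity`, the clonotype frequencies, and the pairwise similarity matrix are all hypothetical, and the formula used is the general similarity-sensitive Hill number from the ecology literature, swapped in here as a stand-in.

```python
import math

def similarity_weighted_diversity(freqs, sim, q=1.0):
    """Effective number of functionally distinct clonotypes at order q.

    freqs: clonotype counts or frequencies (hypothetical example input).
    sim:   square matrix, sim[i][j] in [0, 1], with sim[i][i] == 1;
           1 means "assumed to bind the same target", 0 means unrelated.
    """
    total = sum(freqs)
    p = [f / total for f in freqs]
    n = len(p)
    # Ordinariness of clonotype i: expected similarity between i and a
    # sequence drawn at random from the repertoire.
    zp = [sum(sim[i][j] * p[j] for j in range(n)) for i in range(n)]
    if abs(q - 1.0) < 1e-12:
        # q = 1 is the limiting (Shannon-like) case.
        return math.exp(-sum(pi * math.log(zi) for pi, zi in zip(p, zp) if pi > 0))
    return sum(pi * zi ** (q - 1) for pi, zi in zip(p, zp) if pi > 0) ** (1 / (1 - q))

# Three clonotypes; the first two are assumed to bind the same target.
freqs = [1, 1, 2]
identity = [[1, 0, 0], [0, 1, 0], [0, 0, 1]]
shared_target = [[1, 1, 0], [1, 1, 0], [0, 0, 1]]

raw = similarity_weighted_diversity(freqs, identity)             # ignores similarity
functional = similarity_weighted_diversity(freqs, shared_target)  # merges binders
```

With the identity matrix the measure reduces to an ordinary Hill number, so `raw` reflects three unevenly sized clonotypes; with the hypothetical shared-target matrix the first two clonotypes act as one, and `functional` comes out as two equally sized functional groups.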

## Key facts

- **NIH application ID:** 10165490
- **Project number:** 5R01AI148747-02
- **Recipient organization:** BETH ISRAEL DEACONESS MEDICAL CENTER
- **Principal Investigator:** Ramy Arnaout
- **Activity code:** R01
- **Funding institute:** NIH
- **Fiscal year:** 2021
- **Award amount:** $528,873
- **Award type:** 5
- **Project period:** 2020-05-15 → 2025-04-30

## Primary source

NIH RePORTER: https://reporter.nih.gov/project-details/10165490

## Citation

> US National Institutes of Health, RePORTER application 10165490, An Integrated Multilevel Modeling Framework for Repertoire-Based Diagnostics (5R01AI148747-02). Retrieved via AI Analytics 2026-05-22 from https://api.ai-analytics.org/grant/nih/10165490. Licensed CC0.

---

*[NIH grants dataset](/datasets/nih-grants) · CC0 1.0*
