An Integrated Multilevel Modeling Framework for Repertoire-Based Diagnostics

NIH RePORTER · NIH · R01 · $535,171 · view on reporter.nih.gov ↗

Abstract

Immune-repertoire sequence, which consists of an individual's millions of unique antibody and T-cell receptor (TCR) genes, encodes a dynamic and highly personalized record of an individual's state of health. Our long- term goal is to develop the computational models and tools necessary to read this record, to one day be able diagnose diverse infections, autoimmune diseases, cancers, and other conditions directly from repertoire se- quence. The key problem is how to find patterns of specific diseases in repertoire sequence, when repertoires are so complex. Our hypothesis is that a combination of bottom-up (sequence-level) and top-down (systems- level) modeling can reveal these patterns, by encoding repertoires as simple but highly informative models that can be used to build highly sensitive and specific disease classifiers. In preliminary studies, we introduced two new modeling approaches for this purpose: (i) statistical biophysics (bottom-up) and (ii) functional diversity (top-down), and showed their ability to elucidate patterns related to vaccination status (97% accuracy), viral infection, and aging. Building on these studies, we will test our hypothesis through two specific aims: (1) We will develop models and classifiers based on the bottom-up approach, statistical biophysics; and (2) we will de- velop the top-down approach, functional diversity, to improve these classifiers. To achieve these aims, we will use our extensive collection of public immune-repertoire datasets, beginning with 391 antibody and TCR da- tasets we have characterized previously. Our team has deep and complementary expertise in developing computational tools for finding patterns in immune repertoires (Dr. Arnaout) and in the mathematics that under- lie these tools (Dr. Altschul), with additional advice available as needed regarding machine learning (Dr. AlQuraishi). This proposal is highly innovative for how our two new approaches address previous issues in the field. (i) Statistical biophysics uses a powerful machine-learning method called maximum-entropy modeling (MaxEnt), improving on past work by tailoring MaxEnt to learn patterns encoded in the biophysical properties (e.g. size and charge) of the amino acids that make up antibodies/TCRs; these properties ultimately determine what targets antibodies/TCRs can bind, and therefore which sequences are present in different diseases. (ii) Functional diversity fills a key gap in how immunological diversity has been measured thus far, by factoring in whether different antibodies/TCRs are likely to bind the same target. This proposal is highly significant for (i) developing an efficient, accurate, generative, and interpretable machine-learning method for finding diagnostic patterns in repertoire sequence; (ii) applying a robust mathematical framework to the measurement of immuno- logical diversity; (iii) impacting clinical diagnostics; and (iv) adding a valuable new tool for integrative/big-data medicine. The expected outco...

Key facts

NIH application ID
10050030
Project number
1R01AI148747-01A1
Recipient
BETH ISRAEL DEACONESS MEDICAL CENTER
Principal Investigator
Ramy Arnaout
Activity code
R01
Funding institute
NIH
Fiscal year
2020
Award amount
$535,171
Award type
1
Project period
2020-05-15 → 2025-04-30