# COMPUTATIONAL TOOLS FOR THE ANALYSIS OF HIGH-THROUGHPUT IMMUNOGLOBULIN SEQUENCING EXPERIMENTS

> **NIH R01** · YALE UNIVERSITY · 2020 · $415,624

## Abstract

The ability of our immune system to respond effectively to pathogenic challenge or vaccination depends on a
diverse repertoire of Immunoglobulin (Ig) receptors expressed by B lymphocytes. Each B cell receptor (BCR) is
unique, having been assembled during lymphocyte development by recombination of germline encoded V(D)J
genes. During the course of an immune response, B cells that initially bind antigen with low affinity through
their BCR are modified through cycles of somatic hypermutation (SHM) and affinity-dependent selection to
produce high-affinity memory and plasma cells. This affinity maturation is a critical component of
T cell-dependent adaptive immune responses. It helps guard against rapidly mutating pathogens and is the
basis for many vaccines, but its dysregulation can result in autoimmunity and other diseases. Next-generation
sequencing (NGS) technologies have revolutionized our ability to carry out large-scale adaptive immune
receptor repertoire sequencing (AIRR-Seq) experiments. AIRR-Seq is increasingly being applied to profile
BCR repertoires and gain insights into immune responses in healthy individuals and those with a range of
diseases, including autoimmunity, infection, allergy, cancer and aging. As NGS technologies improve, these
experiments are producing ever-larger datasets, with tens to hundreds of millions of BCR sequences.
Although promising, repertoire-scale data present fundamental analytical challenges, requiring the
development of new techniques and the rethinking of existing methods that do not scale to the large
number of sequences being generated. This proposal describes the development of a series of novel
computational methods to explore the central hypothesis that B cell clonal relationships and lineage
structures can be computationally derived from repertoire sequencing data and used to define B cell
migration and differentiation networks in health and disease. Specifically, computational methods will be
developed to: (Aim 1) identify clonally related sequences and improve V(D)J gene assignment by determining the
Ig locus haplotype; (Aim 2) reconstruct clonal lineages and use these to learn B cell migration
and differentiation networks; and (Aim 3) analyze sequences to predict repertoire properties and sequence
motifs that are associated with antigen binding or clinically relevant outcomes. These methods will be validated
through a combination of simulation-based studies and testing on new experimental data from both human
(myasthenia gravis) and murine (endogenous retrovirus emergence) systems. All methods will be
integrated and made available through our widely-used, open-source Immcantation framework, which provides
a start-to-finish analytical ecosystem for AIRR-Seq analysis. Together, these methods provide a window into
the micro-evolutionary dynamics that drive adaptive immunity and the dysregulation that occurs in disease.
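The abstract does not say how clonally related sequences are identified in Aim 1. As a minimal illustrative sketch only, the code below applies one widely used heuristic: sequences that share a V gene, J gene and junction length are grouped by single-linkage clustering on normalized junction Hamming distance. The `Receptor` record, the `define_clones` and `hamming_fraction` helpers, and the 0.15 threshold are all hypothetical and are not the Immcantation API or the proposal's method.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass(frozen=True)
class Receptor:
    """Hypothetical pre-annotated BCR record (not an Immcantation type)."""
    seq_id: str
    v_gene: str    # gene-level V call, e.g. "IGHV1-2"
    j_gene: str    # gene-level J call, e.g. "IGHJ4"
    junction: str  # junction nucleotide sequence


def hamming_fraction(a: str, b: str) -> float:
    """Fraction of mismatched positions between two equal-length strings."""
    if not a:
        return 0.0
    return sum(x != y for x, y in zip(a, b)) / len(a)


def define_clones(records, threshold=0.15):
    """Group records into putative clones (hypothetical helper).

    1. Partition by (V gene, J gene, junction length).
    2. Within a partition, link two sequences when their normalized junction
       Hamming distance is <= threshold; clones are the connected components
       of that graph (i.e. single-linkage clustering).
    """
    partitions = defaultdict(list)
    for rec in records:
        partitions[(rec.v_gene, rec.j_gene, len(rec.junction))].append(rec)

    clones = []
    for members in partitions.values():
        parent = list(range(len(members)))  # union-find forest for this partition

        def find(i):
            while parent[i] != i:
                parent[i] = parent[parent[i]]  # path halving
                i = parent[i]
            return i

        for i in range(len(members)):
            for j in range(i + 1, len(members)):
                if hamming_fraction(members[i].junction,
                                    members[j].junction) <= threshold:
                    parent[find(i)] = find(j)  # merge the two components

        groups = defaultdict(list)
        for i, rec in enumerate(members):
            groups[find(i)].append(rec.seq_id)
        clones.extend(sorted(ids) for ids in groups.values())
    return sorted(clones)


if __name__ == "__main__":
    reads = [
        Receptor("s1", "IGHV1-2", "IGHJ4", "TGTGCGAGAGATTGG"),
        Receptor("s2", "IGHV1-2", "IGHJ4", "TGTGCGAGAGATTGC"),   # 1/15 from s1
        Receptor("s3", "IGHV1-2", "IGHJ4", "TGTACCTCTGATTGC"),   # 6/15 from s1
        Receptor("s4", "IGHV3-23", "IGHJ4", "TGTGCGAGAGATTGG"),  # other V gene
    ]
    print(define_clones(reads))  # [['s1', 's2'], ['s3'], ['s4']]
```

Single linkage makes clone membership transitive, so two distant sequences can end up in the same clone through an intermediate. In practice the distance threshold would be tuned for each dataset rather than fixed at the illustrative value used here.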

## Key facts

- **NIH application ID:** 9849157
- **Project number:** 5R01AI104739-06
- **Recipient organization:** YALE UNIVERSITY
- **Principal Investigator:** Steven H. Kleinstein
- **Activity code:** R01
- **Funding institute:** NIH
- **Fiscal year:** 2020
- **Award amount:** $415,624
- **Award type:** 5 (non-competing continuation)
- **Project period:** 2014-04-15 → 2022-12-31

## Primary source

NIH RePORTER: https://reporter.nih.gov/project-details/9849157

## Citation

> US National Institutes of Health, RePORTER application 9849157, COMPUTATIONAL TOOLS FOR THE ANALYSIS OF HIGH-THROUGHPUT IMMUNOGLOBULIN SEQUENCING EXPERIMENTS (5R01AI104739-06). Retrieved via AI Analytics 2026-05-22 from https://api.ai-analytics.org/grant/nih/9849157. Licensed CC0.

---

*[NIH grants dataset](/datasets/nih-grants) · CC0 1.0*