# New computational methods to dynamically pinpointing the subregions carrying disease-associated rare variants

> **NIH R01** · DUKE UNIVERSITY · 2024 · $394,731

## Abstract

High-throughput sequencing technology allows us to query both common and rare variants for complex
human diseases. When variants are rare, single-variant association analyses suffer from low power. To increase
power, existing whole-exome sequencing studies often aggregate the rare variants (RVs) across an entire gene
to study their collective effect. Presumably, when a gene harbors many pathogenic RVs, aggregation
increases the signal-to-noise ratio and thus the power. However, a gene often carries many mutations, of which only
a subset lead to novel or altered activities, and these functional mutations are usually not distributed uniformly
across the entire gene or domain. For genes whose functional mutations are concentrated in specific
subregions, aggregating all the RVs across the entire gene or domain dilutes the signal, resulting in a loss of
power. Moreover, even when gene- or domain-based analyses identify pathogenic genes, they cannot
pinpoint the pathogenic subregions within them. Pinpointing these subregions is preferable because they are usually
more functionally coherent and more informative for downstream disease-mechanism and translational
studies.

To address these concerns and needs, we propose a novel statistical and computational method for
rare-variant association analysis with three main features. First, it automatically searches GVSs of
different sizes for disease association to optimize power. Second, it can pinpoint disease-associated
GVSs at high resolution to facilitate downstream disease-mechanism studies. Third, it can be easily
customized to special needs, such as preserving data privacy, incorporating functional annotations, and
adjusting for varying ancestry loadings in admixed populations. We will establish a rigorous mathematical and
statistical foundation for GVS analysis and develop software that implements it for high-throughput
sequencing studies. We will apply our method to an ongoing whole-exome sequencing study of
amyotrophic lateral sclerosis (ALS) to identify ALS-related genomic subregions.
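
The abstract's core argument, that pooling every RV in a gene dilutes a localized signal while scanning subregions of several sizes can recover it, can be illustrated with a toy example. This is a generic sketch, not the project's GVS method. It assumes a simplified count-based burden statistic (rare-allele counts pooled over a region, compared between cases and controls with a binomial normal approximation), a hypothetical 20-variant gene whose signal sits in four adjacent variants, and made-up counts; the names `burden_z`, `multiscale_scan`, `CASE_COUNTS`, and `CTRL_COUNTS` are illustrative only.

```python
import math
from typing import Iterable, Sequence, Tuple


def burden_z(case_counts: Sequence[int], ctrl_counts: Sequence[int],
             n_case: int, n_ctrl: int) -> float:
    """Simplified count-based burden z-score for one region (an assumption,
    not the project's test).

    Rare-allele counts are pooled over every variant in the region. Given the
    pooled total t, the case count is Binomial(t, p) under the null, with
    p = n_case / (n_case + n_ctrl); the normal approximation gives the z-score.
    """
    a = sum(case_counts)
    t = a + sum(ctrl_counts)
    if t == 0:
        return 0.0  # no rare alleles in this region: no evidence either way
    p = n_case / (n_case + n_ctrl)
    return (a - t * p) / math.sqrt(t * p * (1 - p))


def multiscale_scan(case_counts: Sequence[int], ctrl_counts: Sequence[int],
                    n_case: int, n_ctrl: int,
                    sizes: Iterable[int]) -> Tuple[int, int, float]:
    """Score every contiguous window of each size (in variant units) and
    return the best one as (start, end, z), end exclusive. The whole gene is
    the starting candidate. No multiple-testing correction is applied."""
    n = len(case_counts)
    best = (0, n, burden_z(case_counts, ctrl_counts, n_case, n_ctrl))
    for w in sizes:
        for start in range(n - w + 1):
            end = start + w
            z = burden_z(case_counts[start:end], ctrl_counts[start:end],
                         n_case, n_ctrl)
            if z > best[2]:
                best = (start, end, z)
    return best


# Hypothetical gene with 20 rare variants (ordered by position), 500 cases and
# 500 controls. Variants 8-11 (0-based) carry the signal; the rest are balanced.
CASE_COUNTS = [1] * 8 + [3] * 4 + [1] * 8
CTRL_COUNTS = [1] * 8 + [0] * 4 + [1] * 8

if __name__ == "__main__":
    whole = burden_z(CASE_COUNTS, CTRL_COUNTS, 500, 500)
    start, end, z = multiscale_scan(CASE_COUNTS, CTRL_COUNTS, 500, 500,
                                    sizes=(1, 2, 4, 8, 16))
    print(f"whole gene: z = {whole:.2f}")                # whole gene: z = 1.81
    print(f"best window: [{start}, {end}) z = {z:.2f}")  # best window: [8, 12) z = 3.46
```

In this toy case the whole-gene z is about 1.81, while the four-variant window reaches about 3.46, which is the dilution effect described above. Taking the maximum over many overlapping windows inflates the false-positive rate, so a real scan needs a multiplicity adjustment (for example, permutation); this sketch applies none.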

## Key facts

- **NIH application ID:** 10924043
- **Project number:** 5R01HG012555-03
- **Recipient organization:** DUKE UNIVERSITY
- **Principal Investigator:** Jichun Xie
- **Activity code:** R01
- **Funding institute:** NIH (National Human Genome Research Institute, NHGRI)
- **Fiscal year:** 2024
- **Award amount:** $394,731
- **Award type:** 5 (non-competing continuation)
- **Project period:** 2022-09-23 → 2026-07-31

## Primary source

NIH RePORTER: https://reporter.nih.gov/project-details/10924043

## Citation

> US National Institutes of Health, RePORTER application 10924043, New computational methods to dynamically pinpointing the subregions carrying disease-associated rare variants (5R01HG012555-03). Retrieved via AI Analytics 2026-05-25 from https://api.ai-analytics.org/grant/nih/10924043. Licensed CC0.

---

*[NIH grants dataset](/datasets/nih-grants) · CC0 1.0*
