New computational methods to dynamically pinpointing the subregions carrying disease-associated rare variants

NIH RePORTER · NIH · R01 · $394,731

Abstract

High-throughput sequencing technology allows us to query both common and rare variants for complex human diseases. When variants are rare, single-variant association analyses suffer from low power. To increase power, existing whole-exome sequencing studies often aggregate the rare variants (RVs) across an entire gene to study their collective effect. Presumably, when a gene harbors many pathogenic RVs, the aggregation increases the signal-to-noise ratio and thus the power. However, a gene often carries many mutations, of which only a subset lead to novel or altered activities, and these functional mutations are usually not distributed uniformly across the gene or domain. For genes whose functional mutations are localized or concentrated in specific subregions, aggregating all the RVs across the entire gene or domain dilutes the signal, resulting in a loss of power. Moreover, even when a gene- or domain-based analysis identifies a pathogenic gene, it cannot pinpoint the pathogenic subregions. Pinpointing the pathogenic subregions is preferable because a subregion is usually more unified in function and is therefore more informative for downstream disease mechanism and translational studies. To address these concerns and needs, we propose a novel statistical and computational method for rare-variant association analysis with three main features. First, it automatically searches GVSs of different sizes for disease associations to optimize power. Second, it can pinpoint the disease-associated GVSs at high resolution to facilitate downstream disease mechanism studies. Third, it can be easily customized to fit special needs, such as preserving data privacy, incorporating functional annotations, and adjusting for varying ancestry loadings in admixed populations.
We will establish a rigorous mathematical and statistical foundation for the GVS analysis and develop software that implements it for high-throughput sequencing studies. We will apply our method to an ongoing whole-exome sequencing study of amyotrophic lateral sclerosis (ALS) to identify ALS-related genomic subregions.
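
The dilution argument in the abstract can be illustrated with a back-of-the-envelope calculation. The sketch below is not the proposed method; it approximates the expected z-score of a simple burden test (difference in mean rare-allele count between cases and controls) under assumptions chosen purely for illustration: variants are independent, every variant has the same carrier frequency, and causal variants are enriched in cases by a fixed relative risk. The function name and all numbers are hypothetical.

```python
import math


def expected_burden_z(n_causal, n_neutral, carrier_freq, relative_risk,
                      n_cases, n_controls):
    """Approximate expected z-score of a simple burden test.

    Illustrative assumptions (not from the abstract): independent variants,
    one shared carrier frequency, causal variants enriched in cases by
    `relative_risk`, neutral variants equally frequent in both groups.
    """
    q = carrier_freq
    q_case = q * relative_risk  # carrier frequency of causal variants in cases
    # Only causal variants shift the mean burden between groups.
    mean_diff = n_causal * (q_case - q)
    # Every aggregated variant, causal or not, adds variance (noise).
    var_controls = (n_causal + n_neutral) * q * (1 - q)
    var_cases = n_causal * q_case * (1 - q_case) + n_neutral * q * (1 - q)
    std_err = math.sqrt(var_cases / n_cases + var_controls / n_controls)
    return mean_diff / std_err


# Hypothetical gene: 200 rare variants, 10 of them causal and clustered
# in a single subregion.
common = dict(carrier_freq=0.001, relative_risk=3.0,
              n_cases=2000, n_controls=2000)
whole_gene = expected_burden_z(n_causal=10, n_neutral=190, **common)
subregion = expected_burden_z(n_causal=10, n_neutral=0, **common)

print(f"whole-gene aggregation: expected z = {whole_gene:.2f}")
print(f"subregion aggregation:  expected z = {subregion:.2f}")
```

Under these assumptions the neutral variants leave the mean difference unchanged but inflate the variance, so the whole-gene statistic is smaller than the one restricted to the causal subregion.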

Key facts

NIH application ID: 10924043
Project number: 5R01HG012555-03
Recipient: DUKE UNIVERSITY
Principal Investigator: Jichun Xie
Activity code: R01
Funding institute: NIH
Fiscal year: 2024
Award amount: $394,731
Award type: 5
Project period: 2022-09-23 → 2026-07-31