# Pathogenic Variant Discovery Across a Broad Spectrum of Human Diseases

> **NIH R01** · WASHINGTON UNIVERSITY · 2020 · $565,107

## Abstract

Falling costs of generating genomic data and computational advances in discerning health-affecting variants
therein are bringing personalized molecular medicine closer to reality. Progress has also been made on
establishing guidelines (e.g., by the American College of Medical Genetics and Genomics) for the
interpretation of sequence variants. However, the crucial step of systematically and accurately interpreting their
clinical implications remains an unsolved problem. Specifically, clinical interpretation is technically challenging
for several reasons, including: 1) the enormous number of variants in individual genomes, which makes it difficult to
pinpoint causal variants; 2) limited functional/clinical data at the gene and variant levels; 3) the tedious, low-throughput
process of discovering novel clinical variants with traditional laboratory and clinical approaches; and 4)
the insufficient precision of conventional bioinformatics tools, owing to limitations imposed by linear
sequence analysis alone. As a result, clinical genomics is still far too costly for routine clinical use. To meet the
urgent need for high-precision clinical variant interpretation, our proposal aims to 1) build upon existing clinical
knowledge (ClinVar) from ClinGen efforts, 2) utilize rich human variation data in public databases (e.g., ExAC
and dbSNP), and 3) leverage existing and upcoming sequencing data from large disease cohorts and small
family studies; all to support developing/employing a cross-cutting computational/experimental strategy for
clinical variant discovery at a massive scale across a broad spectrum of human diseases. We hypothesize
that variants clustering in 3D spatial proximity to known pathogenic variants have high probabilities of affecting
protein function. We hypothesize further that many pathogenic variants in databases such as ExAC remain
undetected/hidden because of their recessive nature or because their rarity limits statistical power for detection in
association analyses. To test these hypotheses and to establish a database for functionally important variants
associated with human diseases, we propose to develop a software system called ClinPath3D to detect and
characterize clinically relevant pathogenic variants. Essentially, it will utilize protein structures and variant
pathogenicity potential to identify 3D spatial pathogenic variant clusters (PVCs) (Aim 1). We will then apply
ClinPath3D to interpret rare variants of unknown significance (VUS) from the ExAC, dbSNP, and other variant
databases using pathogenic variants obtained from ClinVar as nucleation points for clustering, all with a view
toward discerning disease variants in the general population (Aim 2). Finally, we will use large sequencing data
sets (CCDG, TOPMed, UK100K) to statistically assess variant enrichment in specific disease cohorts and will
further improve positive results by experimentally characterizing 50-100 high-priority variants in kinases and
50-100 in tr...
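
The clustering idea in the abstract (known pathogenic variants acting as nucleation points for nearby variants of unknown significance) can be illustrated with a small sketch. This is not ClinPath3D itself, whose algorithm the abstract does not describe. The coordinates, residue numbers, function name `nearby_pathogenic`, and the 8 Å distance cutoff are all hypothetical choices made for illustration.

```python
import math

# Hypothetical C-alpha coordinates (in angstroms) keyed by residue number.
# Real coordinates would come from a protein structure; these are made up.
CA_COORDS = {
    10: (0.0, 0.0, 0.0),
    11: (3.8, 0.0, 0.0),
    45: (4.0, 3.0, 0.0),
    80: (30.0, 30.0, 30.0),
    120: (2.0, 2.0, 5.0),
}


def nearby_pathogenic(coords, pathogenic, candidates, radius=8.0):
    """Map each candidate residue to the known pathogenic residues
    whose coordinates lie within `radius` angstroms of it.

    The 8 A default is an assumed cutoff, not a value from the proposal.
    """
    hits = {}
    for cand in candidates:
        close = sorted(
            p for p in pathogenic
            if p != cand and math.dist(coords[cand], coords[p]) <= radius
        )
        if close:  # keep only candidates with at least one close neighbor
            hits[cand] = close
    return hits


known_pathogenic = {10, 45}   # stand-ins for pathogenic sites (e.g. from ClinVar)
vus_residues = {11, 80, 120}  # stand-ins for variants of unknown significance
clusters = nearby_pathogenic(CA_COORDS, known_pathogenic, vus_residues)
```

In this toy data, residue 120 is far from residues 10 and 45 in the linear sequence but close to both in space, which is the kind of relationship that linear sequence analysis alone would miss. Residue 80 has no pathogenic neighbor within the cutoff and is left out of the result.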

## Key facts

- **NIH application ID:** 9952397
- **Project number:** 5R01HG009711-04
- **Recipient organization:** WASHINGTON UNIVERSITY
- **Principal Investigator:** FENG CHEN
- **Activity code:** R01
- **Funding institute:** NIH
- **Fiscal year:** 2020
- **Award amount:** $565,107
- **Award type:** 5
- **Project period:** 2017-09-04 → 2022-06-30

## Primary source

NIH RePORTER: https://reporter.nih.gov/project-details/9952397

## Citation

> US National Institutes of Health, RePORTER application 9952397, Pathogenic Variant Discovery Across a Broad Spectrum of Human Diseases (5R01HG009711-04). Retrieved via AI Analytics 2026-05-22 from https://api.ai-analytics.org/grant/nih/9952397. Licensed CC0.

---

*[NIH grants dataset](/datasets/nih-grants) · CC0 1.0*
