A next-generation morbid map of the human genome

NIH RePORTER · NIH · R35 · $450,894

Abstract

The impact of next-generation sequencing (NGS) on gene discovery and molecular diagnostics for Mendelian conditions (MCs) is hard to overstate. However, to provide affected individuals with precise natural history, recurrence risk, and prognosis in clinical settings, identification of pathogenic variant(s)/genotypes alone is often insufficient. This is a challenge most notably for genes that cause more than one MC, which account for ~25% of all genes that underlie MCs. In such an instance, even with a known genotype, a patient's phenotype has to be compared to that of all MCs caused by variants in the gene to determine which MC, if any, is the likely diagnosis or whether the patient instead has a novel condition. This comparison is increasingly difficult because delineation of the ~5,100 MCs currently known has typically been based on subjective grouping of affected individuals by phenotypic similarity. We propose to develop a quantitative framework for assessing overlap among the distributions of phenotypes due to pathogenic genotypes in the same gene and to apply this framework genome-wide. NGS has enabled identification of causal genotypes in hundreds of thousands of individuals with MCs, providing a dataset large enough that it is now feasible to use machine learning to quantitatively and systematically identify “clusters” of co-occurring genotypes and phenotypic features for each known gene. We will refine and validate our approach by comparing differences between conventionally delineated and quantitatively delineated MCs and by assessing the similarity of individuals with well-studied atypical phenotypes/genotypes to quantitatively delineated MCs. We will then apply the optimal strategy across the genome to generate a “next-generation morbid map” based on quantitatively delineated MCs.
We will also apply machine learning approaches to identify genomic properties associated with the propensity for each gene to underlie multiple MCs (i.e., the numeric contribution of each gene to the morbid map, or phenotropy). Together, this work will enable a more precise and complete understanding of the genotypic and phenotypic spectrum of each MC, enable more objective diagnosis of individuals with atypical phenotypes, and more robustly identify the existence of multiple MCs among individuals with non-specific “class” phenotypes (e.g., developmental delay, autism, hearing impairment). We will make all newly developed methods publicly available via interactive and programmatic web-based tools to facilitate extension of this work to other human and model organism datasets.
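
The abstract does not specify which algorithm will be used to identify “clusters” of co-occurring phenotypic features. As a rough illustration only, the idea of grouping affected individuals by quantitative phenotypic similarity could be sketched as below. This is not the project's method: the Jaccard similarity measure, the single-linkage threshold grouping, the 0.5 threshold, the function names, and the individuals and phenotype terms are all assumptions made up for the example.

```python
from itertools import combinations


def jaccard(a, b):
    """Jaccard similarity between two sets of phenotype terms."""
    union = a | b
    return len(a & b) / len(union) if union else 1.0


def cluster_by_similarity(profiles, threshold=0.5):
    """Group individuals linked by pairwise similarity >= threshold.

    This is single-linkage grouping: clusters are the connected
    components of the graph whose edges join sufficiently similar
    individuals. It is a stand-in for whatever method the project uses.
    """
    parent = {pid: pid for pid in profiles}

    def find(x):
        # Union-find lookup with path halving.
        while parent[x] != x:
            parent[x] = parent[parent[x]]
            x = parent[x]
        return x

    for p, q in combinations(profiles, 2):
        if jaccard(profiles[p], profiles[q]) >= threshold:
            parent[find(p)] = find(q)

    clusters = {}
    for pid in profiles:
        clusters.setdefault(find(pid), []).append(pid)
    return sorted(sorted(members) for members in clusters.values())


# Hypothetical individuals with pathogenic genotypes in the same gene,
# each described by a set of made-up phenotype labels.
profiles = {
    "ind1": {"short stature", "cleft palate", "hearing impairment"},
    "ind2": {"short stature", "cleft palate"},
    "ind3": {"seizures", "developmental delay"},
    "ind4": {"seizures", "developmental delay", "hypotonia"},
    "ind5": {"cardiomyopathy"},
}

clusters = cluster_by_similarity(profiles, threshold=0.5)
print(clusters)
```

In this toy input, each resulting group would correspond to a candidate quantitatively delineated condition for the gene, and an individual who falls in a group of their own (here `ind5`) would be a candidate atypical or novel presentation. A realistic analysis would also need to incorporate genotype information and handle missing or unevenly recorded phenotype data, which this sketch ignores.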

Key facts

NIH application ID
10427402
Project number
5R35HG011297-03
Recipient
UNIVERSITY OF WASHINGTON
Principal Investigator
Jessica Xiao-Ling Chong
Activity code
R35
Funding institute
NIH
Fiscal year
2022
Award amount
$450,894
Award type
5
Project period
2020-09-01 → 2025-06-30