A Multi-omic approach towards improving candidate gene identification and variant prioritization in patients with congenital heart disease

NIH RePORTER · NIH · R21 · $115,500

Abstract

Project Summary

Identification of the genetic basis of congenital heart disease (CHD) has benefited from advances in exome sequencing (ES) and genome sequencing (GS) pipelines. Large cohort studies, such as the NHLBI-funded Pediatric Cardiovascular Genomics Consortium (PCGC), have sequenced the exomes or genomes of nearly 3000 CHD patients and identified variants with a high likelihood of contributing to CHD. By identifying rare variants enriched in CHD patient populations and applying algorithms that predict damaging effects, these studies have produced a list of potentially pathogenic variants. Further supporting pathogenicity, these variants lie in genes that have a prior association with human CHD or have been implicated in heart development in animal models. Although this approach has aided the identification of novel variants, many patients carry more than one potential causal variant, which makes follow-up analyses difficult.

In the proposed exploratory grant, we will investigate the use of machine learning to make use of transcriptomic data from both mouse and induced pluripotent stem cell (iPSC) models of CHD. Rather than building a common analytical pipeline that includes all possible candidate genes for all CHDs, we will prioritize variants using genes that are differentially regulated in CHD model systems displaying the phenotypes observed in the patient. To achieve this, the patient’s diagnosis will be used to identify RNA-seq datasets in the Gene Expression Omnibus (GEO) database from mouse or iPSC models with similar diagnoses. Genes differentially expressed in these datasets will carry additional weight in the prioritization pipeline. In parallel, we will examine the expression of candidate genes in single-cell RNA-seq datasets from developing human embryonic hearts, which will allow us to evaluate each gene's expression in the cell types that contribute to normal heart development.
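The weighting step described above can be sketched as follows. This is a minimal illustration under assumptions of our own, not the project's implementation: each mouse/iPSC dataset is represented by a hypothetical `ModelDataset` record holding phenotype terms and a set of differentially expressed (DE) genes; a dataset counts as "matched" when it shares at least one phenotype term with the patient's diagnosis; and single-cell data are summarized as an assumed per-cell-type fraction of cells expressing each gene. Dataset IDs (`DS_A`, etc.) and values are placeholders, not GEO accessions or project results.

```python
from dataclasses import dataclass


@dataclass
class ModelDataset:
    """Hypothetical record for one mouse/iPSC RNA-seq dataset (e.g. curated from GEO)."""
    dataset_id: str
    phenotypes: set[str]   # CHD phenotype terms displayed by the model
    de_genes: set[str]     # genes called differentially expressed in that dataset


def match_datasets(diagnosis: set[str], catalog: list[ModelDataset]) -> list[ModelDataset]:
    """Keep datasets whose model phenotypes overlap the patient's diagnosis terms."""
    return [ds for ds in catalog if ds.phenotypes & diagnosis]


def de_weight(gene: str, matched: list[ModelDataset]) -> float:
    """Fraction of matched datasets in which the gene is differentially expressed (0 to 1)."""
    if not matched:
        return 0.0
    return sum(gene in ds.de_genes for ds in matched) / len(matched)


def cell_type_weight(gene: str, pct_expressing: dict[str, dict[str, float]],
                     relevant_types: set[str]) -> float:
    """Highest fraction of expressing cells across the relevant cell types.

    pct_expressing[cell_type][gene] is an assumed summary of a single-cell
    RNA-seq atlas of the developing human heart; missing entries count as 0.
    """
    return max((pct_expressing.get(ct, {}).get(gene, 0.0) for ct in relevant_types),
               default=0.0)


# Toy usage with placeholder datasets and values.
catalog = [
    ModelDataset("DS_A", {"VSD", "ASD"}, {"GATA4", "TBX5"}),
    ModelDataset("DS_B", {"HLHS"}, {"NOTCH1"}),
    ModelDataset("DS_C", {"VSD"}, {"TBX5", "NKX2-5"}),
]
matched = match_datasets({"VSD"}, catalog)
print([ds.dataset_id for ds in matched])  # ['DS_A', 'DS_C']
print(de_weight("TBX5", matched))         # 1.0
print(de_weight("GATA4", matched))        # 0.5

pct = {"cardiomyocyte": {"TBX5": 0.4}, "endocardial": {"NOTCH1": 0.3}}
print(cell_type_weight("TBX5", pct, {"cardiomyocyte", "endocardial"}))  # 0.4
```

Scaling both signals to the 0 to 1 range is a design choice made here so that they can later be combined with other per-variant features on a comparable scale.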
Variants in genes that recur across multiple patients with overlapping CHD subtypes will be presented as prioritized variants. The pipeline will not exclude any genetic variant from consideration; instead, it will rank variants using expression analysis in CHD model systems and single-cell transcriptomic data. The output will be a ranked list of variants for each patient, ordered using information from the datasets described above together with current variant-prioritization criteria such as minor allele frequency and predicted damaging effect. As a result, we expect to discover novel CHD candidate genes and to identify genes with a higher variant burden in a subset of CHD cases. Creating, training, and testing the machine learning model will provide a platform for variant prioritization in patients with CHD, and the model could potentially be extended to other congenital malformations.
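The ranking and recurrence steps described above might look roughly like the sketch below. The abstract does not specify the machine learning model, so this sketch substitutes a plain hand-weighted linear score as a stand-in; `WEIGHTS` holds placeholder values where a trained model would supply learned ones. Further assumptions, all hypothetical: minor allele frequency (MAF) is mapped to a 0 to 1 rarity score on a log scale, the damaging-effect prediction is already rescaled to 0 to 1, `de_w`/`sc_w` come from an expression-weighting step like the one sketched for the preceding paragraph, and two patients "overlap" if they share at least one diagnosis term. `GENE_X` is a placeholder gene name.

```python
import math
from collections import defaultdict
from dataclasses import dataclass


@dataclass
class Variant:
    """Hypothetical per-variant annotations; field names are illustrative."""
    variant_id: str
    gene: str
    maf: float          # population minor allele frequency (0 if unobserved)
    damaging: float     # predicted damaging score, assumed rescaled to 0..1
    de_w: float = 0.0   # weight from DE genes in matched model datasets
    sc_w: float = 0.0   # weight from single-cell expression in relevant cell types


# Placeholder weights standing in for what a trained model would learn.
WEIGHTS = {"rarity": 1.0, "damaging": 1.0, "de": 1.0, "sc": 0.5}


def rarity(maf: float, floor: float = 1e-6) -> float:
    """Map MAF to 0..1 on a log scale (rarer = higher); MAF <= floor scores 1."""
    m = min(max(maf, floor), 1.0)
    return math.log10(m) / math.log10(floor)


def score(v: Variant) -> float:
    """Linear combination of the per-variant features."""
    return (WEIGHTS["rarity"] * rarity(v.maf) + WEIGHTS["damaging"] * v.damaging
            + WEIGHTS["de"] * v.de_w + WEIGHTS["sc"] * v.sc_w)


def rank_variants(variants: list[Variant]) -> list[Variant]:
    """Order every variant by score, highest first; nothing is filtered out."""
    return sorted(variants, key=score, reverse=True)


def recurrent_genes(patients: dict[str, tuple[set[str], list[Variant]]],
                    min_patients: int = 2) -> dict[str, set[str]]:
    """Genes with variants in >= min_patients patients who share a diagnosis term.

    patients maps patient_id -> (diagnosis terms, candidate variants).
    """
    by_term = defaultdict(lambda: defaultdict(set))  # term -> gene -> patient ids
    for pid, (terms, variants) in patients.items():
        for term in terms:
            for v in variants:
                by_term[term][v.gene].add(pid)
    hits: dict[str, set[str]] = defaultdict(set)
    for genes in by_term.values():
        for gene, pids in genes.items():
            if len(pids) >= min_patients:
                hits[gene] |= pids
    return dict(hits)


# Toy usage with placeholder variants and patients.
v1 = Variant("var1", "TBX5", maf=0.0, damaging=0.9, de_w=1.0, sc_w=0.4)
v2 = Variant("var2", "GENE_X", maf=0.01, damaging=0.2)
v3 = Variant("var3", "GATA4", maf=1e-4, damaging=0.7, de_w=0.5)
ranked = rank_variants([v2, v3, v1])
print([v.variant_id for v in ranked])  # ['var1', 'var3', 'var2']

patients = {"P1": ({"VSD"}, [v1, v2]), "P2": ({"VSD", "ASD"}, [v1]), "P3": ({"HLHS"}, [v2])}
print(recurrent_genes(patients))  # TBX5 recurs in P1 and P2 (shared VSD term)
```

Sorting rather than filtering mirrors the stated goal of keeping every variant under consideration; in a trained version, the entries of `WEIGHTS` (or a non-linear model in place of `score`) would be fit on variants with known outcomes instead of being set by hand.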

Key facts

NIH application ID: 10360965
Project number: 1R21HL161823-01
Recipient: RESEARCH INST NATIONWIDE CHILDREN'S HOSP
Principal Investigator: Vidu Garg
Activity code: R21
Funding institute: NIH
Fiscal year: 2022
Award amount: $115,500
Award type: 1
Project period: 2022-01-01 → 2023-12-31