Bayesian genetic association analysis of all rare diseases in the Kids First cohort

NIH RePORTER · NIH · R03 · $169,000

Abstract

Rare diseases affect 1 in 20 people, but fewer than half of the ~10,000 catalogued rare diseases have a resolved genetic etiology. Genetic association analyses of whole-genome sequencing (WGS) data from large, phenotypically diverse collections of rare disease patients enhance the discovery of novel etiologies, compared to within-study analyses, by increasing the probability of multiple cases sharing a genetic etiology and by boosting the number of controls (Turro et al., Nature 2020). The Gabriella Miller Kids First (KF) program has germline WGS data from 20 studies on 18,547 probands or relatives of probands with a birth defect or pediatric cancer. However, due to the bioinformatic and statistical challenges of analyzing such large and complex WGS datasets, a comprehensive cross-cutting genetic association analysis has never been performed. We present a research program of computational and statistical approaches to uncover novel germline etiologies of rare diseases in KF and replicate them in other cohorts to which we have access.

In Aim 1, we will build a compact and portable relational database containing a sparse representation of all the rare variant genotypes in the KF WGS data. Due to natural selection, almost all pathogenic variants responsible for rare congenital or hereditary disorders are rare and will thus be included. We will annotate the variants with scores reflecting their predicted deleteriousness and their minor allele frequencies, and with their predicted molecular consequences. We will load sample-specific information into the database, including pedigree membership, membership of a maximal set of unrelated participants (MSUP) and group memberships for case/control association analyses.

In Aim 2, we will develop a web application allowing authenticated users to browse variants by gene or sample. The web interface will allow users to click on sample IDs directly in a table of genotypes to view the phenotypes of individuals who are heterozygous, homozygous or compound heterozygous for a given consequence class of rare variants in a side panel. The application will also host and display the results of inference, such as posterior probabilities of association (PPAs), posterior probabilities over the mode of inheritance, posterior probabilities over the consequence class of pathogenic variants and posterior probabilities of the pathogenicities of variants. The application will be accessible by authorized collaborating experts across disciplines.

In Aim 3, we will obtain a PPA between each gene and each of a collection of case sets in KF in accordance with each study's data restrictions, if any. We will determine the case sets using Mondo Disease Ontology and Human Phenotype Ontology terms assigned to cases. We will select probands in a given case set using pedigree information and compare them to participants not in the case set who are in other pedigrees and in the MSUP. We will attempt to replicate findings with a PPA >0...
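
As an illustration only, the sparse genotype storage described in Aim 1 and the carrier lookup described in Aim 2 might be sketched as below. This is a minimal sketch under stated assumptions, not the project's actual schema: the table names, column names, gene label, sample IDs and the helper `carriers_by_class` are all hypothetical, and SQLite is used simply because it is a portable relational database. "Sparse" here means that only non-reference genotypes are stored, so a sample with no row for a variant is taken to be homozygous reference. Without phasing or parental genotypes, two heterozygous variants in one gene can only be flagged as a candidate compound heterozygote.

```python
import sqlite3

# Hypothetical schema: one row per variant, one per sample, and one per
# NON-reference genotype (the sparse part of the representation).
SCHEMA = """
CREATE TABLE variants (
    variant_id  INTEGER PRIMARY KEY,
    gene        TEXT NOT NULL,
    consequence TEXT NOT NULL,   -- predicted molecular consequence class
    maf         REAL NOT NULL    -- minor allele frequency
);
CREATE TABLE samples (
    sample_id   TEXT PRIMARY KEY,
    pedigree_id TEXT NOT NULL,
    in_msup     INTEGER NOT NULL -- 1 if in the maximal set of unrelated participants
);
CREATE TABLE genotypes (
    sample_id  TEXT NOT NULL REFERENCES samples(sample_id),
    variant_id INTEGER NOT NULL REFERENCES variants(variant_id),
    alt_count  INTEGER NOT NULL CHECK (alt_count IN (1, 2)),
    PRIMARY KEY (sample_id, variant_id)
);
"""


def build_demo_db():
    """Create an in-memory database holding made-up toy data."""
    conn = sqlite3.connect(":memory:")
    conn.executescript(SCHEMA)
    conn.executemany("INSERT INTO variants VALUES (?, ?, ?, ?)", [
        (1, "GENE_A", "missense", 1e-4),
        (2, "GENE_A", "missense", 5e-5),
        (3, "GENE_A", "loss_of_function", 2e-5),
    ])
    conn.executemany("INSERT INTO samples VALUES (?, ?, ?)", [
        ("S1", "P1", 1), ("S2", "P2", 1), ("S3", "P3", 1), ("S4", "P4", 1),
    ])
    conn.executemany("INSERT INTO genotypes VALUES (?, ?, ?)", [
        ("S1", 1, 1),                # one heterozygous missense variant
        ("S2", 1, 1), ("S2", 2, 1),  # two heterozygous missense variants
        ("S3", 2, 2),                # one homozygous missense variant
        ("S4", 3, 1),                # a different consequence class
    ])
    return conn


def carriers_by_class(conn, gene, consequence):
    """Map each carrier of rare variants in `gene` with the given
    consequence class to a coarse genotype label."""
    rows = conn.execute(
        """
        SELECT g.sample_id, COUNT(*) AS n_variants, MAX(g.alt_count) AS max_alt
        FROM genotypes AS g
        JOIN variants AS v ON v.variant_id = g.variant_id
        WHERE v.gene = ? AND v.consequence = ?
        GROUP BY g.sample_id
        ORDER BY g.sample_id
        """,
        (gene, consequence),
    ).fetchall()
    labels = {}
    for sample_id, n_variants, max_alt in rows:
        if max_alt == 2:
            labels[sample_id] = "homozygous"
        elif n_variants >= 2:
            # Unphased data: the two variants may lie on the same haplotype.
            labels[sample_id] = "candidate compound heterozygous"
        else:
            labels[sample_id] = "heterozygous"
    return labels


if __name__ == "__main__":
    for sample, label in carriers_by_class(build_demo_db(), "GENE_A", "missense").items():
        print(sample, label)
```

Storing only non-reference genotypes keeps the genotype table small when variants are rare, which is the property the abstract relies on for a compact database; the per-gene grouping query is one plausible way a genotype table in a web interface could be populated.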

Key facts

NIH application ID: 10813172
Project number: 5R03HD111492-02
Recipient: ICAHN SCHOOL OF MEDICINE AT MOUNT SINAI
Principal Investigator: Ernest Turro
Activity code: R03
Funding institute: NIH
Fiscal year: 2024
Award amount: $169,000
Award type: 5
Project period: 2023-03-21 → 2025-02-28