# Bayesian genetic association analysis of all rare diseases in the Kids First cohort

> **NIH R03** · ICAHN SCHOOL OF MEDICINE AT MOUNT SINAI · 2024 · $169,000

## Abstract

Rare diseases affect 1 in 20 people, but fewer than half of the ~10,000 catalogued rare diseases have a
resolved genetic etiology. Genetic association analyses of whole-genome sequencing (WGS) data from large,
phenotypically diverse collections of rare disease patients enhance the discovery of novel etiologies, compared
to within-study analyses, by increasing the probability of multiple cases sharing a genetic etiology and by
boosting the number of controls (Turro et al., Nature 2020). The Gabriella Miller Kids First (KF) program has germline
WGS data from 20 studies on 18,547 probands or relatives of probands with a birth defect or pediatric cancer.
However, due to the bioinformatic and statistical challenges of analyzing such large and complex WGS datasets,
a comprehensive cross-cutting genetic association analysis has never been performed. We present a research
program of computational and statistical approaches to uncover novel germline etiologies of rare diseases in KF
and replicate them in other cohorts to which we have access. In Aim 1, we will build a compact and portable
relational database containing a sparse representation of all the rare variant genotypes in the KF WGS data. Due
to natural selection, almost all pathogenic variants responsible for rare congenital or hereditary disorders are rare
and will thus be included. We will annotate the variants with scores reflecting their predicted deleteriousness and
their minor allele frequencies, and with their predicted molecular consequences. We will load sample-specific
information into the database, including pedigree membership, membership of a maximal set of unrelated
participants (MSUP) and group memberships for case/control association analyses. In Aim 2, we will develop a web
application allowing authenticated users to browse variants by gene or sample. The web interface will allow users
to click on sample IDs directly in a table of genotypes to view the phenotypes of individuals who are
heterozygous, homozygous or compound heterozygous for a given consequence class of rare variants in a side panel.
The application will also host and display the results of inference, such as posterior probabilities of association
(PPAs), posterior probabilities over the mode of inheritance, posterior probabilities over the consequence class of
pathogenic variants and posterior probabilities of the pathogenicities of variants. The application will be
accessible by authorized collaborating experts across disciplines. In Aim 3, we will obtain a PPA between each gene and
each of a collection of case sets in KF in accordance with each study's data restrictions, if any. We will determine
the case sets using Mondo Disease Ontology and Human Phenotype Ontology terms assigned to cases. We will
select probands in a given case set using pedigree information and compare them to participants not in the case
set who are in other pedigrees and in the MSUP. We will attempt to replicate findings with a PPA >0...
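
The database design in Aim 1 and the case/control selection in Aim 3 can be illustrated with a minimal sketch. It is not the project's actual schema: every table, column, gene, sample and case-set name below is hypothetical, SQLite is assumed only because it is a compact and portable relational store, and the rows are toy values. "Sparse" is taken here to mean that only non-reference genotypes are stored, so any absent (sample, variant) pair is implicitly homozygous reference.

```python
import sqlite3


def build_demo_db() -> sqlite3.Connection:
    """Create an in-memory database with a hypothetical sparse genotype schema."""
    conn = sqlite3.connect(":memory:")
    conn.executescript("""
        CREATE TABLE variant (
            variant_id  INTEGER PRIMARY KEY,
            gene        TEXT NOT NULL,
            consequence TEXT NOT NULL,   -- predicted molecular consequence
            maf         REAL NOT NULL    -- minor allele frequency
        );
        CREATE TABLE sample (
            sample_id   TEXT PRIMARY KEY,
            pedigree_id TEXT NOT NULL,
            is_proband  INTEGER NOT NULL,
            in_msup     INTEGER NOT NULL -- maximal set of unrelated participants
        );
        CREATE TABLE case_set_member (
            case_set  TEXT NOT NULL,
            sample_id TEXT NOT NULL,
            PRIMARY KEY (case_set, sample_id)
        );
        -- Sparse storage: one row per non-reference genotype only.
        CREATE TABLE genotype (
            sample_id  TEXT NOT NULL,
            variant_id INTEGER NOT NULL,
            alt_count  INTEGER NOT NULL CHECK (alt_count IN (1, 2)),
            PRIMARY KEY (sample_id, variant_id)
        );
    """)
    # Toy rows, invented purely for illustration.
    conn.executemany("INSERT INTO variant VALUES (?, ?, ?, ?)", [
        (1, "GENE_A", "missense", 1e-4),
        (2, "GENE_A", "stop_gained", 1e-5),
        (3, "GENE_B", "missense", 5e-4),
    ])
    conn.executemany("INSERT INTO sample VALUES (?, ?, ?, ?)", [
        ("S1", "P1", 1, 1),
        ("S2", "P1", 0, 0),  # affected relative of S1, not in the MSUP
        ("S3", "P2", 1, 1),
        ("S4", "P3", 1, 1),
    ])
    conn.executemany("INSERT INTO case_set_member VALUES (?, ?)", [
        ("demo_case_set", "S1"),
        ("demo_case_set", "S2"),
    ])
    conn.executemany("INSERT INTO genotype VALUES (?, ?, ?)", [
        ("S1", 1, 1), ("S1", 2, 1), ("S2", 1, 1), ("S3", 3, 2),
    ])
    return conn


def carriers(conn: sqlite3.Connection, gene: str) -> list:
    """Return (sample_id, total alternate alleles) for rare variants in a gene."""
    return conn.execute("""
        SELECT g.sample_id, SUM(g.alt_count)
        FROM genotype AS g
        JOIN variant AS v ON v.variant_id = g.variant_id
        WHERE v.gene = ?
        GROUP BY g.sample_id
        ORDER BY g.sample_id
    """, (gene,)).fetchall()


def cases_and_controls(conn: sqlite3.Connection, case_set: str) -> tuple:
    """Cases: probands in the case set. Controls: MSUP members outside the
    case set whose pedigree contains no member of the case set."""
    cases = [row[0] for row in conn.execute("""
        SELECT s.sample_id
        FROM sample AS s
        JOIN case_set_member AS m ON m.sample_id = s.sample_id
        WHERE m.case_set = ? AND s.is_proband = 1
        ORDER BY s.sample_id
    """, (case_set,))]
    controls = [row[0] for row in conn.execute("""
        SELECT s.sample_id
        FROM sample AS s
        WHERE s.in_msup = 1
          AND s.sample_id NOT IN (
              SELECT sample_id FROM case_set_member WHERE case_set = ?)
          AND s.pedigree_id NOT IN (
              SELECT s2.pedigree_id
              FROM sample AS s2
              JOIN case_set_member AS m2 ON m2.sample_id = s2.sample_id
              WHERE m2.case_set = ?)
        ORDER BY s.sample_id
    """, (case_set, case_set))]
    return cases, controls
```

Calling `carriers(build_demo_db(), "GENE_A")` lists the samples carrying at least one rare alternate allele in the toy gene, which is the kind of by-gene lookup the Aim 2 browser would need. Distinguishing compound heterozygotes from carriers of two variants on the same haplotype would require phasing or pedigree information that this sketch does not model.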

## Key facts

- **NIH application ID:** 10813172
- **Project number:** 5R03HD111492-02
- **Recipient organization:** ICAHN SCHOOL OF MEDICINE AT MOUNT SINAI
- **Principal Investigator:** Ernest Turro
- **Activity code:** R03
- **Funding institute:** NIH
- **Fiscal year:** 2024
- **Award amount:** $169,000
- **Award type:** 5 (non-competing continuation)
- **Project period:** 2023-03-21 → 2025-02-28

## Primary source

NIH RePORTER: https://reporter.nih.gov/project-details/10813172

## Citation

> US National Institutes of Health, RePORTER application 10813172, Bayesian genetic association analysis of all rare diseases in the Kids First cohort (5R03HD111492-02). Retrieved via AI Analytics 2026-05-25 from https://api.ai-analytics.org/grant/nih/10813172. Licensed CC0.

---

*[NIH grants dataset](/datasets/nih-grants) · CC0 1.0*
