# Integrative analysis of whole genomes and transcriptomes from multiple cell types in rare disease patients

> **NIH R01** · ICAHN SCHOOL OF MEDICINE AT MOUNT SINAI · 2024 · $622,388

## Abstract

Whole-genome sequencing (WGS) is revolutionizing the diagnosis of rare diseases. However, at present, even
the most powerful approaches to etiological discovery typically fail to find a genetic cause in a majority of participants
(Turro et al., Nature 2020). There are a number of reasons for this. Firstly, rare disease studies are typically
composed of small sets of unresolved cases, each sharing a different genetic etiology, which constrains statistical
power when only WGS and clinical phenotype data are available on participants. Secondly, the unknown causal
variants may have molecular consequences that are challenging to predict computationally, such as disruptions to
the regulatory elements (REs) of a gene or the introduction of a cryptic splice site. Thirdly, some types of causal
mutations, such as structural variants, are prone to being missed by WGS. Systematic, transcriptomic profiling of
homogeneous cell populations taken from rare disease patients has the potential to overcome these limitations.
We have access to a collection of ~1,000 comprehensively phenotyped rare disease study participants with WGS
and RNA-seq of platelets, neutrophils, monocytes and CD4+ T-cells. Here, we present a research program of
statistical, computational and experimental approaches to uncover novel etiologies of rare diseases that exploits
the high dimensionality and the hierarchical nature of these data. We will concentrate on the etiologies underlying
~300 cases with a rare platelet disorder (RPD), exploiting our expertise in blood genomics. In Aim 1, we
will develop a Bayesian method for identifying rare disease-causing rare variants in REs, treating expression as a
molecular mediator of genetic etiology. Our approach models the causal path between rare variants that overlap
cell type-specific REs, the corresponding cell type-specific changes in expression, and the consequent alteration
in rare disease risk. To include a recently discovered class of enhancer marked by H3K122ac but not H3K27ac
in our hypothesis search space, we will generate H3K122ac data on the relevant cell types from healthy donors.
In Aim 2, we will apply several approaches for identifying pathogenic changes in transcript sequences. For example,
we will apply recently developed methodology for identifying splicing outliers within the cohort. To ensure
these outliers are extreme in the wider population, we will compute splicing frequency spectra in large RNA-seq
datasets such as GTEx. These spectra will capture the population distribution of the within-individual proportion
of RNA-seq reads for a gene that include a given splice junction. We will also exploit the joint availability of WGS
and RNA-seq in patients to identify extreme allelic imbalances at WGS-called heterozygote sites. The candidate
variants that we identify will be validated in cell lines and primary samples. Rare diseases collectively affect one
in 20 people but current etiological knowledge cannot resolve half of pat...
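
The two Aim 2 quantities described above can be illustrated with a minimal Python sketch. It is not the project's method: the function names are hypothetical, the tail fraction is a plain empirical count against a reference set (standing in for a dataset such as GTEx), and the allelic imbalance check uses an exact binomial test with an expected allelic ratio of 0.5, which is a simplifying assumption because real RNA-seq counts are usually overdispersed.

```python
from math import comb
from typing import Sequence


def junction_proportion(junction_reads: int, gene_reads: int) -> float:
    """Within-individual proportion of a gene's RNA-seq reads that
    include a given splice junction."""
    if gene_reads <= 0:
        raise ValueError("gene_reads must be positive")
    return junction_reads / gene_reads


def upper_tail_fraction(value: float, reference: Sequence[float]) -> float:
    """Fraction of reference individuals whose junction proportion is at
    least `value`; small values suggest the observation is extreme in the
    wider population (illustrative empirical summary only)."""
    if not reference:
        raise ValueError("reference must not be empty")
    return sum(1 for r in reference if r >= value) / len(reference)


def allelic_imbalance_p(ref_reads: int, alt_reads: int, p: float = 0.5) -> float:
    """Exact two-sided binomial p-value for read counts at a heterozygous
    site: sums the probabilities of all outcomes no more likely than the
    observed one. Assumes independent reads and expected ratio `p`."""
    n = ref_reads + alt_reads
    pmf = [comb(n, k) * p**k * (1 - p) ** (n - k) for k in range(n + 1)]
    observed = pmf[ref_reads]
    # Small relative tolerance guards against floating-point ties.
    return min(1.0, sum(x for x in pmf if x <= observed * (1 + 1e-9)))


if __name__ == "__main__":
    # Toy numbers, invented purely for illustration.
    patient = junction_proportion(junction_reads=30, gene_reads=100)
    reference = [0.0, 0.01, 0.0, 0.02, 0.0]
    print(patient, upper_tail_fraction(patient, reference))
    print(allelic_imbalance_p(ref_reads=18, alt_reads=2))
```

A junction would be flagged only when its proportion is extreme both within the cohort and in the reference distribution. An overdispersed model such as a beta-binomial would be a natural replacement for the binomial test in practice.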

## Key facts

- **NIH application ID:** 10841545
- **Project number:** 5R01HL161365-02
- **Recipient organization:** ICAHN SCHOOL OF MEDICINE AT MOUNT SINAI
- **Principal Investigator:** Ernest Turro
- **Activity code:** R01
- **Funding institute:** NIH
- **Fiscal year:** 2024
- **Award amount:** $622,388
- **Award type:** 5
- **Project period:** 2023-05-15 → 2028-02-29

## Primary source

NIH RePORTER: https://reporter.nih.gov/project-details/10841545

## Citation

> US National Institutes of Health, RePORTER application 10841545, Integrative analysis of whole genomes and transcriptomes from multiple cell types in rare disease patients (5R01HL161365-02). Retrieved via AI Analytics 2026-05-28 from https://api.ai-analytics.org/grant/nih/10841545. Licensed CC0.

---

*[NIH grants dataset](/datasets/nih-grants) · CC0 1.0*
