Integrative analysis of whole genomes and transcriptomes from multiple cell types in rare disease patients

NIH RePORTER · NIH · R01 · $622,388

Abstract

Whole-genome sequencing (WGS) is revolutionizing the diagnosis of rare diseases. However, at present, even the most powerful approaches to etiological discovery typically fail to find a genetic cause in a majority of participants (Turro et al., Nature 2020). There are a number of reasons for this. Firstly, rare disease studies are typically composed of small sets of unresolved cases, each sharing a different genetic etiology, which constrains statistical power when only WGS and clinical phenotype data are available on participants. Secondly, the unknown causal variants may have molecular consequences that are challenging to predict computationally, such as disruptions to the regulatory elements (REs) of a gene or the introduction of a cryptic splice site. Thirdly, some types of causal mutations, such as structural variants, are prone to being missed by WGS. Systematic, transcriptomic profiling of homogeneous cell populations taken from rare disease patients has the potential to overcome these limitations.

We have access to a collection of ~1,000 comprehensively phenotyped rare disease study participants with WGS and RNA-seq of platelets, neutrophils, monocytes and CD4+ T-cells. Here, we present a research program of statistical, computational and experimental approaches to uncover novel etiologies of rare diseases that exploits the high dimensionality and the hierarchical nature of these data. We will concentrate on the etiologies underlying ~300 cases with a rare platelet disorder (RPD), exploiting our expertise in blood genomics.

In Aim 1, we will develop a Bayesian method for identifying rare disease-causing rare variants in REs, treating expression as a molecular mediator of genetic etiology. Our approach models the causal path between rare variants that overlap cell type-specific REs, the corresponding cell type-specific changes in expression, and the consequent alteration in rare disease risk. To include a recently discovered class of enhancer marked by H3K122ac but not H3K27ac in our hypothesis search space, we will generate H3K122ac data on the relevant cell types from healthy donors.

In Aim 2, we will apply several approaches for identifying pathogenic changes in transcript sequences. For example, we will apply recently developed methodology for identifying splicing outliers within the cohort. To ensure these outliers are extreme in the wider population, we will compute splicing frequency spectra in large RNA-seq datasets such as GTEx. These spectra will capture the population distribution of the within-individual proportion of RNA-seq reads for a gene that include a given splice junction. We will also exploit the joint availability of WGS and RNA-seq in patients to identify extreme allelic imbalances at WGS-called heterozygote sites. The candidate variants that we identify will be validated in cell lines and primary samples.

Rare diseases collectively affect one in 20 people but current etiological knowledge cannot resolve half of pat...
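
As a rough illustration of two quantities named in Aim 2, the sketch below computes a within-individual splice-junction read proportion, places it against a reference distribution of such proportions, and tests a heterozygous site for allelic imbalance. It is not the project's methodology: the function names, the add-one smoothing, the toy reference values, and the use of an exact binomial test against a 50:50 expectation are all assumptions made for this example.

```python
from math import comb


def junction_proportion(junction_reads: int, gene_reads: int) -> float:
    """Within-individual proportion of a gene's reads that include a junction."""
    if gene_reads <= 0:
        raise ValueError("gene_reads must be positive")
    return junction_reads / gene_reads


def upper_tail_frequency(reference: list[float], value: float) -> float:
    """Share of reference individuals whose proportion is at least `value`.

    Add-one smoothing (an assumption of this sketch) keeps the estimate
    above zero when no reference individual is as extreme.
    """
    n_at_least = sum(1 for p in reference if p >= value)
    return (n_at_least + 1) / (len(reference) + 1)


def allelic_imbalance_pvalue(ref_reads: int, alt_reads: int) -> float:
    """Two-sided exact binomial p-value against balanced expression (p = 0.5).

    With p = 0.5 the distribution is symmetric, so the two-sided p-value
    is twice the lower tail at the smaller allele count, capped at 1.
    """
    n = ref_reads + alt_reads
    k = min(ref_reads, alt_reads)
    lower_tail = sum(comb(n, i) for i in range(k + 1)) / 2 ** n
    return min(1.0, 2 * lower_tail)


# Hypothetical toy values, not data from the study.
reference_proportions = [0.0, 0.01, 0.015, 0.02, 0.03]
patient = junction_proportion(junction_reads=30, gene_reads=100)
print(patient, upper_tail_frequency(reference_proportions, patient))
print(allelic_imbalance_pvalue(ref_reads=0, alt_reads=10))
```

In this toy setting, a junction whose patient proportion exceeds every reference value receives the smallest tail frequency the reference size allows, which is why the abstract points to large datasets such as GTEx for the reference distribution.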

Key facts

NIH application ID: 10841545
Project number: 5R01HL161365-02
Recipient: ICAHN SCHOOL OF MEDICINE AT MOUNT SINAI
Principal Investigator: Ernest Turro
Activity code: R01
Funding institute: NIH
Fiscal year: 2024
Award amount: $622,388
Award type: 5
Project period: 2023-05-15 → 2028-02-29