Computational and Statistical Methods to determine variant effect across cell types and development stages

NIH RePORTER · NIH · U01 · $1,913,337

Abstract

Project Summary

Built on the success of the GTEx project, the recently launched dGTEx project will recruit 120 donors to identify genetic variants affecting gene expression across tissues at four developmental stages (postnatal, early childhood, pre-pubertal, and post-pubertal). Because there are considerably fewer samples in the dGTEx project than in the GTEx project, there is a critical need to develop powerful and robust statistical methods to make the best use of the dGTEx data for eQTL analysis. Moreover, single-cell sequencing is planned for the dGTEx project, creating additional challenges and opportunities. The overall objective of our project is to develop and apply novel statistical and computational methods that integrate different data sets to facilitate eQTL analysis of the dGTEx data, and to share the results with the research community. We will accomplish this objective through three specific aims. For the first aim, we will infer tissue-specific eQTLs based on the total read count data by borrowing information across tissues and developmental stages. We will then develop a hierarchical Bayesian method to infer cell-type-specific eQTLs across developmental stages by jointly analyzing single-cell data and bulk samples with computationally estimated cell-type proportions. We will also consider isoform eQTLs for this aim. For the second aim, we will develop methods for identifying allele-specifically expressed genes in different cell types. To gain more power, we will develop methods to jointly call allelic events across tissues and cell types, correct for the specific biases in single-cell expression data, and develop methods for integrating allele-specific chromatin accessibility and allele-specific expression using single-cell multiome data. Single-cell data will then be combined with bulk RNA-seq data to further improve allele-specific expression inference across subjects. Finally, for this aim, we will jointly analyze total read counts and allele-specific data for eQTL inference. In the third aim, we will develop methods to integrate data from other sources, such as the GTEx project, to complement the data collected from the dGTEx project. We will leverage chromatin data to "transfer" known eQTLs from bulk tissues and larger cohorts to the specific (smaller) single-cell cohorts. We will also incorporate predicted effects of genetic variants from deep learning approaches in our modeling and analysis. To facilitate transcriptome-wide association studies for complex traits rooted in early development, we will develop gene expression imputation models based on our eQTL results. We will work with the dGTEx team to share our results with the broader scientific community via the dGTEx portal and AnVIL.

Key facts

NIH application ID: 10990741
Project number: 1U01HG013840-01
Recipient: YALE UNIVERSITY
Principal Investigator: Mark Bender Gerstein
Activity code: U01
Funding institute: NIH
Fiscal year: 2024
Award amount: $1,913,337
Award type: 1
Project period: 2024-09-23 → 2027-08-31