New approaches for leveraging single-cell data to identify disease-critical genes and gene sets

NIH RePORTER · NIH · K99 · $108,778

Abstract

Nominating candidate risk genes and gene sets underlying disease-critical processes is of utmost importance for developing drug targets and informing CRISPR screening experiments. To this end, large-scale single-cell genomic and epigenomic data (from RNA-seq, ATAC-seq, Perturb-seq) can be integrated with genome-wide association studies (GWAS) to enhance our understanding of the genetic architecture of human complex diseases and traits. In this proposal, I plan to develop new computational approaches to integrate single-cell functional genomic and epigenomic data with GWAS data for complex diseases and traits to identify and rank disease-critical genes and gene sets characterizing functional processes, as well as pinpoint short genomic regions linked to these disease-associated genes. My K99 training will be conducted at the Harvard T.H. Chan School of Public Health, as well as the Broad Institute, under the mentorship of Dr. Alkes Price. The key areas of my training will be to develop and evaluate approaches for characterizing the gene-level and gene set-level functional architecture of diseases and traits, and for integrative analysis of single-cell and bulk functional genomics data with human disease genetics. My proposed approaches will attempt to bridge the gaps among functional genomics, human genetics, and downstream clinical drug/gene intervention experiments. The long-term goal of this research is to produce a set of computational tools that identify and rank top disease-critical genes, top disease-critical gene sets characterizing cell types or cellular processes, and gene-linked genomic regions for each disease/trait. These approaches will reshape our understanding of the functional architecture of human diseases at the cellular level and will inform future drug perturbation and CRISPR screening experiments.
The first aim of this proposal is to develop methods to identify and rank disease-critical genes by integrating common and rare variant disease associations with gene-level functional information derived from single-cell genomics experiments. Here, I will develop, compare, and contrast multiple gene prioritization strategies that differ in how they annotate SNPs for a gene, how they aggregate variant-level associations at the gene level, and how they use functional data in performing the gene prioritization. The second aim of this proposal is to develop new computational strategies to assess disease information in sets of genes that underlie a cell type or cellular processes active within or across cell types in a tissue. The third aim of this proposal is to pinpoint and prioritize short genomic regions that are either proximally or functionally linked (for example, as an enhancer) to disease-critical genes and gene sets from Aims 1 and 2. Here, I plan to integrate GWAS association signal near these gene-linked regions with deep learning models that can infer allelic effects at base-pair resolution and with single-cell ATAC-seq data....

Key facts

NIH application ID: 10342464
Project number: 1K99HG012203-01
Recipient: HARVARD UNIVERSITY D/B/A HARVARD SCHOOL OF PUBLIC HEALTH
Principal Investigator: Kushal Kumar Dey
Activity code: K99
Funding institute: NIH
Fiscal year: 2022
Award amount: $108,778
Award type: 1
Project period: 2022-02-01 → 2023-01-31