High dimensional statistical data modeling and integration for studying regulatory variation

NIH RePORTER · NIH · R01 · $364,627

Abstract

Gene regulatory programs of mammalian cells are largely influenced by long-range chromatin interactions. We propose to develop robust and scalable statistical methods for two critical genomic inference problems hinging upon long-range chromatin interactions.
First, the study of long-range interactions at the single-cell level with the 3C-based method scHi-C is fundamental to fully understanding cell type-specific gene regulation. scHi-C measurements harbor unexplored biological diversity. However, these measurements are prone to extreme sparsity, technological bias, and noise. While initial inference methods simply focused on lower-dimensional representations of scHi-C data, the lack of a scalable framework that can exploit nonlinearities in de-noising the data impedes key inference tasks from these experiments. We will address these critical shortcomings by developing a novel deep generative model for scHi-C data. By de-noising the data, these methods will improve the power with which signals of interest can be studied.
Second, while advances in sequencing and the large-scale availability of epigenome data have improved the power and interpretation of genome-wide association studies (GWAS), shortcomings in identifying which genes noncoding SNPs might be impacting through long-range chromatin interactions hinder the translation of GWAS findings into clinical interventions. Leveraging existing large-scale studies of Diversity Outbred mice, we will develop a rigorous framework that integrates multi-omics functional data modalities to fine-map model organism molecular quantitative trait loci and transfer the results to humans for linking noncoding GWAS SNPs to their effector, i.e., susceptibility, genes. A large-scale application to type 2 diabetes (T2D) traits will deliver candidate T2D effector genes and their regulatory loci that are amenable to experimental follow-up.
Both aims will be accomplished through a combination of methodological development, theoretical analysis, data-driven simulation, computational analysis, and experimental validation. Statistical resources generated from this project will be disseminated as open-source software. Successful completion of the project will help to ensure that maximal information is obtained from powerful scHi-C experiments and model organism multi-omics data.

Key facts

NIH application ID: 10213308
Project number: 2R01HG003747-12
Recipient: UNIVERSITY OF WISCONSIN-MADISON
Principal Investigator: Sunduz Keles
Activity code: R01
Funding institute: NIH
Fiscal year: 2021
Award amount: $364,627
Award type: 2
Project period: 2007-04-26 → 2025-03-31