# High dimensional statistical data modeling and integration for studying regulatory variation

> **NIH R01** · UNIVERSITY OF WISCONSIN-MADISON · 2024 · $371,185

## Abstract

Gene regulatory programs of mammalian cells are largely influenced by long-range
chromatin interactions. We propose to develop robust and scalable statistical methods
for two critical genomic inference problems hinging upon long-range chromatin
interactions. First, the study of long-range interactions at the single-cell level with the
3C-based method scHi-C is fundamental to fully understanding cell type-specific gene
regulation. scHi-C measurements harbor unexplored biological diversity. However, these
measurements are prone to extreme sparsity, technological bias, and noise. While initial
inference methods simply focused on lower-dimensional representations of scHi-C data,
the lack of a scalable framework that can exploit nonlinearities in de-noising the data
impedes key inference tasks from these experiments. We will address these critical
shortcomings by developing a novel deep generative model for scHi-C data. By
de-noising the data, these methods will improve the power with which signals of interest can
be studied. Second, while advances in sequencing and the large-scale availability of
epigenome data have improved the power and interpretation of genome-wide association
studies (GWAS), shortcomings in identifying which genes noncoding SNPs might be
impacting through long-range chromatin interactions hinder the translation of GWAS
findings into clinical interventions. Leveraging existing large-scale studies of Diversity
Outbred mice, we will develop a rigorous framework that integrates multi-omics functional
data modalities to fine-map model organism molecular quantitative trait loci and transfer
the results to humans for linking noncoding GWAS SNPs to their effector, i.e.,
susceptibility, genes. Large-scale application with type 2 diabetes (T2D) traits will deliver
candidate T2D effector genes and their regulatory loci that are amenable to
experimental follow-up. Both aims will be accomplished through a combination of
methodological development, theoretical analysis, data-driven simulation, computational
analysis, and experimental validation. Statistical resources generated from this project
will be disseminated as open-source software. Successful completion of the project will
help to ensure that maximal information is obtained from powerful scHi-C experiments
and model organism multi-omics data.

## Key facts

- **NIH application ID:** 10812433
- **Project number:** 5R01HG003747-15
- **Recipient organization:** UNIVERSITY OF WISCONSIN-MADISON
- **Principal Investigator:** Sunduz Keles
- **Activity code:** R01
- **Funding institute:** NIH
- **Fiscal year:** 2024
- **Award amount:** $371,185
- **Award type:** 5
- **Project period:** 2007-04-26 → 2026-03-31

## Primary source

NIH RePORTER: https://reporter.nih.gov/project-details/10812433

## Citation

> US National Institutes of Health, RePORTER application 10812433, High dimensional statistical data modeling and integration for studying regulatory variation (5R01HG003747-15). Retrieved via AI Analytics 2026-05-23 from https://api.ai-analytics.org/grant/nih/10812433. Licensed CC0.

---

*[NIH grants dataset](/datasets/nih-grants) · CC0 1.0*
