Statistical methods for improving reproducibility and utility of chromatin interaction data

NIH RePORTER · NIH · R01 · $306,631

Abstract

Project Summary

The spatial organization of the genome in the nucleus plays an important role in the transcriptional control of genes. Currently, Hi-C is the most widely used high-throughput technique for probing the genome-wide spatial organization of chromatin. However, Hi-C experiments involve multiple complex experimental steps, which introduce various sources of bias. Many data-analytical challenges must still be overcome to reach reliable and reproducible biological interpretations of the data. The small sample size of each individual study further limits the power and reliability of data analyses. When replicate samples are available, reproducibility across replicates indicates the fidelity of the identifications, and it can potentially be used to detect reproducible signals that are too modest to be detected reliably in individual samples. Even for samples from different cells, information may be borrowed through joint analyses to improve the identification of both topologically associating domains (TADs) and regions with different structures.

This project proposes to develop a suite of new statistical methods that use the reproducibility information provided by replicate samples to select reliable identifications and to improve the accuracy of peak calling and TAD calling. Furthermore, it proposes a joint analysis framework to identify condition-specific architectural differences across different cells.

Aim 1 will develop statistical methods to evaluate the reproducibility of identified chromatin loops and to select reproducible identifications. The reproducibility-based selection criterion complements the usual measure of significance on a single sample and has the added benefit of being comparable across data sets, protocols, and different measures of significance.

Aim 2 will develop robust, joint multi-sample peak calling and TAD calling methods. These methods will pool information across samples and properly account for variation across replicates, ultimately improving the power of the analysis and reducing false positives.

Aim 3 will develop statistical methods for detecting differences in TADs and other architectural features between different cell types, cellular conditions, or disease states.

Each Aim includes rigorous evaluation of the methods' output using orthogonal epigenomic data, together with experimental tests of hypotheses derived from the analytical results. These methods will enable users to generate reliable and robust scientific interpretations, and ultimately advance the understanding of nuclear organization and its role in gene expression and cellular function.

Key facts

NIH application ID: 9899253
Project number: 5R01GM109453-06
Recipient: PENNSYLVANIA STATE UNIVERSITY, THE
Principal Investigator: Qunhua Li
Activity code: R01
Funding institute: NIH
Fiscal year: 2020
Award amount: $306,631
Award type: 5
Project period: 2013-09-01 → 2023-03-31