Predicting 3D physical gene-enhancer interactions through integration of GTEx and 4DN data

NIH RePORTER · NIH · R03 · $298,222

Abstract

Program Director/Principal Investigator (Liang, Jie): PROJECT SUMMARY/ABSTRACT We will develop computational tools that facilitate investigation of the fundamental relationship between gene expression and genome topology. Specifically, we will develop machine learning tools that can link enhancers to their target genes at genome-wide scale. The ability to establish relationships between enhancers and their target genes is critically important, as it will aid our understanding of gene regulation and help connect noncoding risk variants from GWAS studies to potential causal genes. Our approach will be based on 3D polymer models of chromatin interactions derived from Hi-C data in the Common Fund 4D Nucleome (4DN) database, and will integrate data from the Common Fund-supported Genotype-Tissue Expression (GTEx) database, as well as data from the ENCODE database. We will 1) construct a trusted, high-quality database of candidate enhancer-gene target pairs, and then 2) use this database to train a machine learning predictor that can predict enhancer-gene target pairs at genome-wide scale.
For 1), we will develop a pipeline to identify a small set of critical specific 3D chromatin interactions through simulation of large-scale folding of 3D chromatin ensembles. This small set of specific interactions will be tested for whether it is sufficient to drive chromatin folding. We will then computationally identify enhancers based on histone modification and chromatin accessibility data from ENCODE and the Roadmap Epigenomics Project. Next, we will select enhancers containing eQTLs from the GTEx database, which are known to affect expression of the target gene. The end result will be a high-quality, trustworthy database of enhancer-gene pairs, each supported by predicted critical specific 3D physical chromatin interactions connecting the eQTL-containing enhancer and the target gene.
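The enhancer-selection logic in step 1) can be illustrated with a minimal Python sketch. It assumes that enhancers are called as accessible-chromatin peaks overlapping an H3K27ac peak, that GTEx eQTLs are given as (chromosome, position, gene) tuples, and that the predicted critical 3D contacts are available as a set of binned contacts at a hypothetical 5 kb resolution. The names `Interval`, `call_enhancers`, `candidate_pairs`, the bin size, and the toy genes `GENE_A`/`GENE_B` are illustrative only and do not come from the proposal, which does not specify its exact marks, thresholds, or contact format.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Interval:
    """Half-open genomic interval [start, end) on one chromosome."""
    chrom: str
    start: int
    end: int


def overlaps(a: Interval, b: Interval) -> bool:
    """True if two half-open intervals share at least one base."""
    return a.chrom == b.chrom and a.start < b.end and b.start < a.end


def call_enhancers(h3k27ac, accessible):
    """Assumed heuristic: keep accessible-chromatin peaks that overlap an H3K27ac peak."""
    return [p for p in accessible if any(overlaps(p, m) for m in h3k27ac)]


def candidate_pairs(enhancers, eqtls, tss, contacts, bin_size=5_000):
    """Return (enhancer, gene) pairs where an eQTL for `gene` lies in the enhancer
    AND a predicted 3D contact links the enhancer bin to the gene's TSS bin.

    eqtls:    iterable of (chrom, pos, gene)
    tss:      dict gene -> (chrom, tss_pos)
    contacts: set of (chrom, bin_i, bin_j) with bin_i <= bin_j (hypothetical format)
    """
    pairs = set()
    for chrom, pos, gene in eqtls:
        if gene not in tss or tss[gene][0] != chrom:
            continue  # only same-chromosome (cis) pairs are considered
        gene_bin = tss[gene][1] // bin_size
        for enh in enhancers:
            if enh.chrom != chrom or not (enh.start <= pos < enh.end):
                continue  # the eQTL must fall inside this enhancer
            enh_bin = ((enh.start + enh.end) // 2) // bin_size  # bin of enhancer midpoint
            if (chrom, min(enh_bin, gene_bin), max(enh_bin, gene_bin)) in contacts:
                pairs.add((enh, gene))
    return pairs


if __name__ == "__main__":
    h3k27ac = [Interval("chr1", 10_000, 12_000)]
    accessible = [Interval("chr1", 11_000, 11_500), Interval("chr1", 50_000, 50_400)]
    enhancers = call_enhancers(h3k27ac, accessible)
    eqtls = [("chr1", 11_200, "GENE_A"), ("chr1", 50_100, "GENE_B")]
    tss = {"GENE_A": ("chr1", 60_000), "GENE_B": ("chr1", 80_000)}
    contacts = {("chr1", 2, 12)}  # enhancer-midpoint bin 2 <-> GENE_A TSS bin 12
    print(candidate_pairs(enhancers, eqtls, tss, contacts))
```

In this toy run only the `GENE_A` pair survives: its eQTL falls inside a called enhancer and a contact links the enhancer and TSS bins, whereas the `GENE_B` eQTL lies in an accessible peak with no H3K27ac overlap, so it is never considered an enhancer.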
For 2), we will develop a machine learning predictor that predicts enhancer-gene interactions from genomic, epigenomic, and Hi-C data at genome-wide scale. We will combine epigenetic data with genomic information (such as TF sequence motifs) as features. We will then train the predictor using hold-out sets and cross-validation on the database of enhancer-target gene pairs constructed in 1). The efficacy of the predictor will then be assessed against gold-standard CRISPRi-FlowFISH data. Finally, we will carry out large-scale computations and construct databases of predicted enhancer-gene relationships for selected cell types. Overall, we will demonstrate the significant added power of integrating two important Common Fund data resources and will provide tools to facilitate understanding of the relationship between genome topology and gene expression. Our computational tools will lead to new insights into the relationship between genome structure and genome function that are important for improving human health.
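The training-and-evaluation step in 2) can be sketched as follows, under stated assumptions. The abstract does not name a learning algorithm, so a plain gradient-descent logistic regression stands in for the predictor. Folds hold out whole chromosomes, an assumed (though common) way to keep nearby, correlated pairs from leaking between training and test sets. `precision_recall` shows how held-out scores could be compared against binary labels such as those derived from CRISPRi-FlowFISH. All function names and the toy features are hypothetical.

```python
import math


def sigmoid(z):
    # Numerically stable logistic function.
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def train_logreg(X, y, lr=0.1, epochs=500):
    """Batch gradient-descent logistic regression; a stand-in, not the proposal's model."""
    n, d = len(X), len(X[0])
    w, b = [0.0] * d, 0.0
    for _ in range(epochs):
        gw, gb = [0.0] * d, 0.0
        for xi, yi in zip(X, y):
            err = sigmoid(sum(wj * xj for wj, xj in zip(w, xi)) + b) - yi
            for j in range(d):
                gw[j] += err * xi[j]
            gb += err
        w = [wj - lr * g / n for wj, g in zip(w, gw)]
        b -= lr * gb / n
    return w, b


def predict(model, x):
    w, b = model
    return sigmoid(sum(wj * xj for wj, xj in zip(w, x)) + b)


def chromosome_folds(chroms, k):
    """Split the distinct chromosomes into k groups; each group is held out once."""
    unique = sorted(set(chroms))
    return [unique[i::k] for i in range(k)]


def cross_validate(X, y, chroms, k=2):
    """Out-of-fold scores: each pair is scored by a model that never saw its chromosome."""
    scores = [None] * len(X)
    for held_out in chromosome_folds(chroms, k):
        train = [i for i, c in enumerate(chroms) if c not in held_out]
        if not train:
            raise ValueError("need at least two chromosomes for chromosome hold-out")
        model = train_logreg([X[i] for i in train], [y[i] for i in train])
        for i, c in enumerate(chroms):
            if c in held_out:
                scores[i] = predict(model, X[i])
    return scores


def precision_recall(scores, labels, threshold=0.5):
    """Compare thresholded scores with binary gold-standard labels (1 = validated pair)."""
    tp = sum(1 for s, lab in zip(scores, labels) if s >= threshold and lab == 1)
    fp = sum(1 for s, lab in zip(scores, labels) if s >= threshold and lab == 0)
    fn = sum(1 for s, lab in zip(scores, labels) if s < threshold and lab == 1)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    return precision, recall


if __name__ == "__main__":
    # Toy data: one feature per pair (e.g. a hypothetical contact score).
    X = [[2.0], [-2.0], [2.0], [-2.0]]
    y = [1, 0, 1, 0]
    chroms = ["chr1", "chr1", "chr2", "chr2"]
    scores = cross_validate(X, y, chroms, k=2)
    print(precision_recall(scores, y))  # (1.0, 1.0) on this separable toy set
```

Holding out whole chromosomes is a deliberately conservative choice: it prevents the model from scoring a test pair with help from a training pair a few kilobases away on the same chromosome. In practice, `precision_recall` would be run on CRISPRi-FlowFISH-labeled pairs rather than on the training labels reused in this toy example.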

Key facts

NIH application ID: 10776871
Project number: 1R03OD036492-01
Recipient: UNIVERSITY OF ILLINOIS AT CHICAGO
Principal Investigator: Jie Liang
Activity code: R03
Funding institute: NIH
Fiscal year: 2023
Award amount: $298,222
Award type: 1
Project period: 2023-09-20 → 2024-09-19