Connecting transposable elements and regulatory innovation using ENCODE data

NIH RePORTER · NIH · U01 · $464,997

Abstract

PROJECT SUMMARY

Repetitive transposable elements (TEs) comprise over 50% of the human genome. While some investigators regard TEs as “parasitic” DNA, other studies suggest that TEs play a more constructive role in genome evolution by providing raw material for new biological functions. For example, TEs commonly harbor active cis-regulatory elements that are occasionally co-opted during evolution to wire new gene regulatory networks. While investigators now recognize the importance of TEs in gene regulation, TEs remain under-analyzed in high-throughput data because of methodological hurdles associated with their repetitive nature. Thus, the impact of TEs on the regulation of the human genome, both in normal development and disease, remains largely uncharacterized. We propose to develop novel computational methods to assess and clarify the impact of TEs in regulatory innovation using ENCODE data.

In Specific Aim 1 we will develop new algorithms and statistical methods to predict active regulatory elements encoded by TEs from heterogeneous ENCODE data. If successful, we will generate a profile of TE-derived regulatory elements and their predicted targets across diverse cell/tissue types and developmental stages, revealing new gene regulatory networks wired by TEs. With these new methods we also intend to examine the extent of TE dysregulation in cancer cells and its transcriptional consequences.

In Specific Aim 2 we will extend the models developed in Aim 1 to understand the role of TEs in shaping the 3D topology of the genome, which is intimately connected to genome function. We will investigate the role of TEs in partitioning the genome into chromosomal domains that orchestrate communication between cis-regulatory elements and their target genes. In particular, we will quantify the extent to which TEs drive conservation and divergence in genome topology across mammal species.

In Specific Aim 3 we will take advantage of the repetitive nature of TEs to develop a novel statistical model that links sequence changes in different copies of TEs to epigenetic and functional differences. The numerous, but slightly different, copies of a TE present in a single genome provide a unique opportunity to identify sequence variants that underlie epigenetic modification, which will further our understanding of how TEs become co-opted for host gene regulation.

Finally, in Specific Aim 4, we will deploy our recently developed Repeat Element Browser as a web portal and downloadable application specifically tailored for investigators to analyze, visualize and explore data produced by ENCODE and others, as well as their own data, in the context of TEs. The methods developed in this proposal will have a high impact on the utility of the data produced by ENCODE and will greatly expand our understanding of the contribution of TEs to non-coding regulatory elements in healthy tissues and disease.

Key facts

NIH application ID: 10241106
Project number: 3U01HG009391-04S1
Recipient: WASHINGTON UNIVERSITY
Principal Investigator: Barak A Cohen
Activity code: U01
Funding institute: NIH
Fiscal year: 2021
Award amount: $464,997
Award type: 3
Project period: 2020-09-01 → 2023-01-31