# Research Project 3

> **NIH P20** · CLEMSON UNIVERSITY · 2022 · $276,376

## Abstract

Transposable elements (TEs, also referred to as jumping genes or mobile elements) are extraordinary
contributors to eukaryotic genome diversity, including in humans. TEs make up more than 50% of the human
genome and are far more common than protein coding genes, which comprise about 1% of the human
genome. Despite their abundance, TEs are understudied and major aspects of their mobile element biology
remain elusive. Because TEs insert randomly within the genome, insertions occur in both intergenic and genic
regions (including in exons). As retrotransposition is ongoing, with ~1 new insertion per 20 live births, there are
millions of polymorphic TEs within the human population, including some associated with disease. Highly
repetitive regions are notoriously difficult to assemble, overrepresented at contig ends, and under-annotated from
short-read sequencing data (presently prevalent in biomedical settings). In Aim I, we will improve the
annotation of the human mobilome (the genome’s entire mobile element content) by building upon the human
reference genome and the Human Genome Structural Variation consortium (providing access to Illumina
short-read and PacBio HiFi sequencing data). Part of our focus will be on improved calling of TEs from short-read
sequencing data. We will (a) implement chimpanzee as an outgroup in order to distinguish between TE
insertions and deletions containing TE sequence; and (b) develop a targeted-sequencing approach for
transposable elements. The latter will be combined with whole genome sequencing. Our targeted sequencing
approach will provide deeper coverage of breakpoints, improving identification of mobile elements. We will also
generate a high-resolution subfamily annotation with well-resolved end-branches. The youngest subfamilies
are commonly collapsed within older subfamilies because of size and few shared diagnostic mutations.
Underidentifying the youngest subfamilies leads to an apparent relative quiescence of TEs in recent history.
Building upon the TE annotation improvement in Aim I, we will investigate TEs to identify and characterize
putative source elements (i.e., TEs capable of generating offspring insertions). Most TE insertions are dead upon
arrival and not able to create offspring TEs. While the identification of active L1s is relatively easy, the
identification of the drivers of Alu and SVA expansion has been far more elusive. A fine-scale TE subfamily
resolution that includes the youngest subfamilies will both shed light on the most recent TE evolution, and
allow investigation of source elements (which tend to be deleterious to their host) within the youngest
subfamilies. This makes the youngest subfamilies a prime target for an integrative, comparative approach to
source element identification.

## Key facts

- **NIH application ID:** 10348702
- **Project number:** 5P20GM139769-02
- **Recipient organization:** CLEMSON UNIVERSITY
- **Principal Investigator:** Miriam Kristine Konkel
- **Activity code:** P20
- **Funding institute:** NIH
- **Fiscal year:** 2022
- **Award amount:** $276,376
- **Award type:** 5
- **Project period:** 2021-02-10 → 2026-01-31

## Primary source

NIH RePORTER: https://reporter.nih.gov/project-details/10348702

## Citation

> US National Institutes of Health, RePORTER application 10348702, Research Project 3 (5P20GM139769-02). Retrieved via AI Analytics 2026-05-22 from https://api.ai-analytics.org/grant/nih/10348702. Licensed CC0.

---

*[NIH grants dataset](/datasets/nih-grants) · CC0 1.0*
