# Development and Maintenance of RepeatMasker and RepeatModeler

> **NIH R01** · INSTITUTE FOR SYSTEMS BIOLOGY · 2024 · $576,412

## Abstract

Project Summary
 Mammalian and most other eukaryotic genomes contain a large number of interspersed repeats (IRs), most of
which are copies of transposable elements (TEs) at varying levels of decay. Their presence complicates many
genome sequence analyses, but their accurate identification in an early analysis stage can reduce these
complications. In addition to their pervasiveness, over the last decades the research community has become
widely familiar with their enormous impact on genome activity and evolution.
 Every species has been exposed to a unique, complex set of TEs leaving recognizable copies from as long
ago as 300 million years to as recently as the present day. These TEs are uncovered and reconstructed by de
novo discovery methods, often by our RepeatModeler tool, while their copies are then annotated by our
RepeatMasker software. De novo methods can create TE libraries at a reasonable pace, but the product is far
from the desired quality that can be reached by hand curation. With the recent explosive growth in sequenced
species, these finishing steps, perhaps never fully automatable, now form a severe bottleneck in genome
analyses due to a lack of manpower and expertise, while the results, especially when produced by different
methods from different research groups, lack consistency and suffer from redundancy. Furthermore, the
annotation of genomes for which high-quality libraries have been created is not keeping up with library
improvements due to the computational burden of re-analysis.
 In this proposal, we describe a plan to refactor RepeatMasker by generalizing and improving TE alignment
adjudication, switching to a family-centric search strategy with support for incremental re-analysis, improving
annotation reporting and supporting cluster environments. Responding to the need for improved methods for
automated TE library generation, we propose making significant changes to RepeatModeler’s core discovery
algorithms and developing a novel model extension tool. In addition, we will extend our novel methods for
exploiting multi-species alignments and ancestral reconstructions and utilize them to build a comprehensive
mammalian TE library.

## Key facts

- **NIH application ID:** 10798299
- **Project number:** 5R01HG002939-18
- **Recipient organization:** INSTITUTE FOR SYSTEMS BIOLOGY
- **Principal Investigator:** Robert MacDonald Hubley
- **Activity code:** R01
- **Funding institute:** NIH
- **Fiscal year:** 2024
- **Award amount:** $576,412
- **Award type:** 5
- **Project period:** 2022-02-04 → 2027-01-31

## Primary source

NIH RePORTER: https://reporter.nih.gov/project-details/10798299

## Citation

> US National Institutes of Health, RePORTER application 10798299, Development and Maintenance of RepeatMasker and RepeatModeler (5R01HG002939-18). Retrieved via AI Analytics 2026-05-23 from https://api.ai-analytics.org/grant/nih/10798299. Licensed CC0.

---

*[NIH grants dataset](/datasets/nih-grants) · CC0 1.0*
