# Long-read assembly and annotation of rat genomes that are important models of complex genetic disease

> **NIH R01** · UNIVERSITY OF TEXAS HLTH SCI CTR HOUSTON · 2022 · $425,340

## Abstract

Project Summary:
The rat is commonly used as an experimental model to investigate a wide diversity of biological processes of
medical relevance. These include neuroscience and brain function, behavioral research, and drug dependency
and addiction. Similarly, aspects of cardiovascular and renal function are more readily investigated in the rat,
where the volumes of fluid within these systems are on a much more manageable scale than in smaller rodents.
The utility of rat models for investigation has led to the development of inbred rat strains that harbor medically
relevant phenotypes. Such models have drawn on pre-existing, natural genetic variation to fix genomic diversity
that creates traits such as disease susceptibility. The existence of such models has spurred genetic
investigations that have sought to uncover genetic variation responsible for disease susceptibility traits and to
utilize such knowledge to reveal mechanistic aspects of disease pathogenesis.

However, rat studies can be impeded by the poor quality of existing rat genomic resources. For example, the rat
reference genome sequence is far less developed than those of human and mouse. Contiguity of the
genome assembly is an order of magnitude lower and gene annotation is far less complete. Further, the reference
genome was generated from the Brown Norway inbred rat strain. This strain is biologically remote from many rat
strains used in research.

The rat is recognized as a highly adaptable species. Evolutionary biologists have recognized that an essential
element of adaptation arises from structural variation events in the genome. For example, the adaptation of
humans to high-starch diets after the introduction of grain-based agriculture is associated with structural variation
in the amylase gene. This structural variation results from gene duplication events that are adaptive in permitting
increased carbohydrate digestion. These duplications and other large-scale events cannot be observed and
understood simply by aligning short-read genome sequences to the reference genome. Long reads are
required to capture structural variation events.

The objective of our project is to advance rat genomics resources to increase their utility to ongoing studies of
this animal model. To do so, we will use PacBio long-read sequencing to capture all genomic variation (SNPs and
structural variants). We will assemble de novo the genomes of 9 widely used rat strains. We will also introduce additional
annotation of the rat genome, using long-read RNA sequencing to increase the numbers of transcripts and proteins
associated with the rat reference genome. The results of our work will be made available through the Rat
Genome Database and Ensembl and will provide essential research information to the many laboratories that
employ rat models in their research.
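
The abstract compares assemblies by contiguity without naming a metric. One commonly used contiguity statistic is N50: the length of the shortest contig in the smallest set of longest contigs that together cover at least half of the assembly. The sketch below is illustrative only; the choice of N50, the function name `n50`, and the contig lengths are assumptions and are not taken from the project.

```python
from typing import Iterable


def n50(contig_lengths: Iterable[int]) -> int:
    """Return the N50 of a collection of contig lengths (in base pairs)."""
    lengths = sorted(contig_lengths, reverse=True)
    if not lengths:
        raise ValueError("at least one contig length is required")

    half_total = sum(lengths) / 2
    running = 0
    # Walk from the longest contig down until half the assembly is covered.
    for length in lengths:
        running += length
        if running >= half_total:
            return length
    # Not reachable for non-empty input: the running sum always reaches the total.
    return lengths[-1]


# Hypothetical contig lengths, invented for illustration (not project data).
# Both toy assemblies total 500,000 bp but differ in how fragmented they are.
fragmented = [200_000, 150_000, 100_000, 50_000]
contiguous = [400_000, 100_000]

print("fragmented N50:", n50(fragmented))
print("contiguous N50:", n50(contiguous))
```

With equal total length, the less fragmented toy assembly yields the larger N50, which is the sense in which one assembly can be described as more contiguous than another.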

## Key facts

- **NIH application ID:** 10449388
- **Project number:** 5R01HG011252-02
- **Recipient organization:** UNIVERSITY OF TEXAS HLTH SCI CTR HOUSTON
- **Principal Investigator:** PETER A DORIS
- **Activity code:** R01
- **Funding institute:** NIH
- **Fiscal year:** 2022
- **Award amount:** $425,340
- **Award type:** 5
- **Project period:** 2021-07-12 → 2026-04-30

## Primary source

NIH RePORTER: https://reporter.nih.gov/project-details/10449388

## Citation

> US National Institutes of Health, RePORTER application 10449388, Long-read assembly and annotation of rat genomes that are important models of complex genetic disease (5R01HG011252-02). Retrieved via AI Analytics 2026-05-24 from https://api.ai-analytics.org/grant/nih/10449388. Licensed CC0.

---

*[NIH grants dataset](/datasets/nih-grants) · CC0 1.0*