Long-read assembly and annotation of rat genomes that are important models of complex genetic disease

NIH RePORTER · NIH · R01 · $425,340

Abstract

Project Summary: The rat is commonly used as an experimental model to investigate a wide range of biological processes of medical relevance. Areas of study include neuroscience and brain function, behavior, and drug dependence and addiction. Similarly, aspects of cardiovascular and renal function are more readily investigated in the rat, where fluid volumes within these systems are on a much more manageable scale than in smaller rodents. The utility of rat models has led to the development of inbred rat strains that harbor medically relevant phenotypes. Such models have drawn on pre-existing, natural genetic variation to fix genomic diversity that creates traits such as disease susceptibility. Their existence has spurred genetic investigations that seek to uncover the genetic variation responsible for disease susceptibility traits and to use that knowledge to reveal mechanistic aspects of disease pathogenesis. However, rat studies can be impeded by the poor quality of existing rat genomic resources. For example, the rat reference genome sequence is far less developed than those of human and mouse: the contiguity of the genome assembly is an order of magnitude lower and gene annotation is much less complete. Further, the reference genome was generated from the Brown Norway inbred rat strain, which is biologically remote from many rat strains used in research. The rat is regarded as a highly adaptable species, and evolutionary biologists have recognized that an essential element of adaptation arises from structural variation events in the genome. For example, the adaptation of humans to high-starch diets after the introduction of grain-based agriculture is associated with structural variation in the amylase gene. This structural variation results from gene duplication events that are adaptive in permitting increased carbohydrate digestion.
Duplications and other large-scale events of this kind cannot be observed and understood simply by aligning short-read genome sequences to the reference genome; long reads are required to capture structural variation events. The objective of our project is to advance rat genomic resources to increase their utility for ongoing studies of this animal model. To do so, we will use PacBio long-read sequencing to capture all genomic variation (SNPs and structural variants). We will assemble de novo the genomes of nine widely used rat strains. We will also use long-read RNA sequencing to add annotation to the rat genome, increasing the number of transcripts and proteins associated with the rat reference genome. The results of our work will be made available through the Rat Genome Database and Ensembl and will provide essential research information to the many laboratories that employ rat models in their research.

Key facts

NIH application ID: 10449388
Project number: 5R01HG011252-02
Recipient: UNIVERSITY OF TEXAS HLTH SCI CTR HOUSTON
Principal Investigator: PETER A DORIS
Activity code: R01
Funding institute: NIH
Fiscal year: 2022
Award amount: $425,340
Award type: 5
Project period: 2021-07-12 → 2026-04-30