DMS/NIGMS 1: Multilayer network approach to tandem repeat variation in genomes

NIH RePORTER · NIH · R01 · $148,350

Abstract

Understanding the genetic bases of biological function is a fundamental question in biological sciences. Traditionally, the conservation of genetic sequences across species and populations has been a primary concept with which to measure functionality. However, recent biochemical characterizations of DNA have challenged this definition of functionality and argued that up to 80% of the human genome is functional. Several studies have pursued the possibility that biological function evolves as an adaptive response to rapid changes under environmental pressures, whereby sequence conservation does not directly predict function. By integrating -omics datasets and multilayer network approaches, we will specifically test the following four hypotheses: (1) Among the millions of tandem repeats, a small portion, still corresponding to thousands of loci, are functionally relevant. We further hypothesize that the majority of these functional tandem repeats will be evolving under negative selection and primarily cluster together in multilayer networks of tandem repeat units. (2) Exonic tandem repeats have evolved as molecular tools to regulate the dosage of a particular functional motif. Thus, we expect that these functional tandem repeats will retain sequence conservation among paralogs as well as among species. (3) There are hundreds of tandem repeats in the mammalian genome that evolve under lineage-specific positive selection. We expect that such positively selected tandem repeats show unusual species-specific copy number expansions or contractions, and may affect gene expression and phenotypic traits more often than neutrally evolving tandem repeats. (4) Tandem-repeat copy number variation, if functional, primarily affects phenotypic variation related to immunity and metabolism in humans. We expect that these repeat loci evolve under positive selection.
To test these hypotheses, we will develop mathematical/computational methods to find groups of core nodes in multilayer genetic networks, and then apply them to multilayer networks that we will build, in which each network layer is based on a specific type of relationship between tandem repeat units.

RELEVANCE: Understanding the genetic bases of biological function can improve our ability to understand and treat human disease. However, variable tandem repeats in the human genome have been difficult to characterize for functional and biomedical relevance. This research will leverage recently available long-read sequencing datasets to develop mathematical methods to investigate tandemly repeated sequences in the human genome, thus providing potentially transformative insights into the genetic basis of human disease.

Key facts

NIH application ID
10592458
Project number
1R01GM148973-01
Recipient
STATE UNIVERSITY OF NEW YORK AT BUFFALO
Principal Investigator
Naoki Masuda
Activity code
R01
Funding institute
NIH
Fiscal year
2022
Award amount
$148,350
Award type
1
Project period
2022-09-24 → 2025-06-30