Software for exploring all forms of genetic variation in any species

NIH RePORTER · NIH · R01 · $457,500

Abstract

Modern DNA sequencing technologies have revolutionized the design of experiments investigating the biology of the genome and the genetic basis of traits. Arguably the most powerful application of these technologies has been the creation of exquisitely detailed catalogs describing the landscape of genetic variation in multiple species. However, discovering genetic variation is merely the beginning: exploring and analyzing the resulting catalogs is required to catalyze new insights into the relationship between genotype and phenotype. This proposal is motivated by two fundamental limitations that inhibit discovery from genetic variation datasets. First, existing software for mining variation to understand disease and other traits does not scale to large datasets involving thousands of samples. Second, most existing tools focus on human studies, which hinders the application of modern DNA sequencing to genetic studies of model organisms, livestock, and newly sequenced species. We propose to solve these challenges by building upon our GEMINI framework, which we have maintained since 2012 as a powerful software framework for exploring genome variation. GEMINI's strength is that it integrates genetic variation with a diverse set of genome annotations in a database to facilitate variant prioritization. It allows researchers to conduct complex analyses with simple queries based on sample genotypes, phenotypes, inheritance patterns, and genome annotations. GEMINI has quickly become a popular tool for rare human disease research, leading to discoveries by multiple labs, including our own. Despite its power and popularity, GEMINI has three important limitations. First, it was not designed for studies involving genetic variation from more than a few hundred samples. Second, it focuses on single-nucleotide polymorphisms (SNPs) and insertion-deletion (INDEL) variants; it is blind to structural and copy number variation.
Finally, GEMINI can only analyze genetic variation datasets for the human genome; no other species or genome builds are supported. Therefore, this proposal seeks to provide geneticists studying any species with a powerful, flexible, and simple-to-use software system that is fast and scalable enough to support genetic research for many years to come. We will do this by achieving the following Specific Aims: (1) Develop a scalable, high-performance genotype and haplotype query engine to empower large-scale genome studies. (2) Devise new methods for genotyping, integrating, and prioritizing structural variation. (3) Enable scalable, flexible genome analysis in any species and genome build. In summary, completing these aims will give researchers studying any species a fast, scalable, and easy-to-use system for exploring all forms of genetic variation.
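The core idea described above, storing genotypes alongside genome annotations in a database so that an inheritance-pattern question becomes one simple query, can be sketched with Python's built-in `sqlite3` module. This is a minimal illustration under stated assumptions: the table layout, column names, genotype codes (0 = hom-ref, 1 = het, 2 = hom-alt), and gene names are hypothetical, and GEMINI's actual schema and query interface differ. The query encodes an autosomal recessive pattern for a trio in which the child is affected: the child is homozygous for the alternate allele, both unaffected parents are heterozygous carriers, and the variant carries a requested impact annotation.

```python
import sqlite3

# Hypothetical, simplified schema for illustration only (not GEMINI's schema):
# one row per variant with chromosome, position, gene, a predicted impact
# annotation, and one genotype column per trio member.
# Assumed genotype codes: 0 = hom-ref, 1 = het, 2 = hom-alt.
EXAMPLE_VARIANTS = [
    ("chr1", 100, "GENE_A", "HIGH", 1, 1, 2),  # fits the recessive pattern
    ("chr1", 200, "GENE_B", "HIGH", 1, 0, 2),  # father is hom-ref
    ("chr2", 300, "GENE_C", "LOW", 1, 1, 2),   # low predicted impact
    ("chr2", 400, "GENE_D", "HIGH", 1, 1, 1),  # affected child is only het
]


def build_db(variants):
    """Load variant records and their annotations into an in-memory database."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE variants ("
        "chrom TEXT, pos INTEGER, gene TEXT, impact TEXT, "
        "gt_mother INTEGER, gt_father INTEGER, gt_child INTEGER)"
    )
    conn.executemany("INSERT INTO variants VALUES (?, ?, ?, ?, ?, ?, ?)", variants)
    return conn


def recessive_candidates(conn, impact="HIGH"):
    """Return variants where the affected child is hom-alt, both unaffected
    parents are het carriers, and the impact annotation matches."""
    cur = conn.execute(
        "SELECT chrom, pos, gene FROM variants "
        "WHERE gt_child = 2 AND gt_mother = 1 AND gt_father = 1 AND impact = ? "
        "ORDER BY chrom, pos",
        (impact,),
    )
    return cur.fetchall()


if __name__ == "__main__":
    db = build_db(EXAMPLE_VARIANTS)
    print(recessive_candidates(db))  # [('chr1', 100, 'GENE_A')]
```

Running the script reports only the chr1 variant; each other row fails one condition (a parental genotype, the impact annotation, or the child's genotype). Expressing the filter declaratively in SQL, rather than as hand-written loops over variants, leaves the scanning to the database engine and mirrors the simple-query style described above.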

Key facts

NIH application ID: 9984424
Project number: 5R01GM124355-04
Recipient: UTAH STATE HIGHER EDUCATION SYSTEM--UNIVERSITY OF UTAH
Principal Investigator: Aaron R Quinlan
Activity code: R01
Funding institute: NIH
Fiscal year: 2020
Award amount: $457,500
Award type: 5
Project period: 2017-08-01 → 2022-07-31