# Software for exploring all forms of genetic variation in any species

> **NIH R01** · UTAH STATE HIGHER EDUCATION SYSTEM--UNIVERSITY OF UTAH · 2020 · $457,500

## Abstract

Modern DNA sequencing technologies have revolutionized the design of experiments investigating the
biology of the genome and the genetic basis of traits. Arguably the most powerful application of these
technologies has been the creation of exquisitely detailed catalogs describing the landscape of genetic
variation in multiple species. However, discovery of genetic variation is merely the beginning. Exploration and
analysis of the resulting catalogs is required to catalyze new insights into the relationship between genotype
and phenotype. This proposal is motivated by two fundamental limitations inhibiting discovery from genetic
variation datasets. First, existing software for mining variation to understand disease and other traits does not
scale to large datasets involving thousands of samples. Second, most existing tools are focused on human
studies; consequently, this inhibits the application of modern DNA sequencing to genetic studies of model
organisms, livestock genetics, and newly sequenced species.

We propose to solve these challenges by building upon our GEMINI framework. Since 2012, we have
maintained GEMINI as a powerful software framework for exploring genome variation. GEMINI's strength is
that it integrates genetic variation with a diverse set of genome annotations into a database to facilitate variant
prioritization. It allows researchers to conduct complex analyses with simple queries based on sample
genotypes, phenotypes, inheritance patterns, and genome annotations. GEMINI has quickly become a very
popular tool for rare human disease research leading to discoveries by multiple labs, including our own.

Despite its power and popularity, GEMINI has three important limitations. It was not designed for
studies involving genetic variation from more than a few hundred samples. Furthermore, its focus is the
analysis of single-nucleotide (SNP) and insertion-deletion (INDEL) variants; it is blind to structural and copy number
variation. Finally, GEMINI can only analyze genetic variation datasets for the human genome; no other species
or genome builds are supported. Therefore, this proposal seeks to provide geneticists studying any species
with a powerful, flexible and simple to use software system that is fast and scalable enough to support genetic
research for many years to come. We will do this by achieving the following Specific Aims:

1. Develop a scalable, high performance genotype and haplotype query engine to empower large scale genome studies.
2. Devise new methods for genotyping, integrating and prioritizing structural variation.
3. Enable scalable, flexible genome analysis in any species and genome build.

In summary, by completing these aims, the proposed research will provide geneticists studying any
species with a powerful, flexible and simple to use software system that is fast and scalable enough to support
genetic research for many years to come.

## Key facts

- **NIH application ID:** 9984424
- **Project number:** 5R01GM124355-04
- **Recipient organization:** UTAH STATE HIGHER EDUCATION SYSTEM--UNIVERSITY OF UTAH
- **Principal Investigator:** Aaron R Quinlan
- **Activity code:** R01
- **Funding institute:** NIH
- **Fiscal year:** 2020
- **Award amount:** $457,500
- **Award type:** 5
- **Project period:** 2017-08-01 → 2022-07-31

## Primary source

NIH RePORTER: https://reporter.nih.gov/project-details/9984424

## Citation

> US National Institutes of Health, RePORTER application 9984424, Software for exploring all forms of genetic variation in any species (5R01GM124355-04). Retrieved via AI Analytics 2026-05-23 from https://api.ai-analytics.org/grant/nih/9984424. Licensed CC0.

---

*[NIH grants dataset](/datasets/nih-grants) · CC0 1.0*
