# A scalable, integrative, multi-omic analysis platform

> **NIH R00** · UNIVERSITY OF COLORADO · 2020 · $223,638

## Abstract

PROJECT SUMMARY
Despite decades of effort, only a small portion of the heritability of genetic disorders can be currently explained.
Two explanations for this gap are that the underlying genetic variants are rare and currently unknown, and that we
have a poor understanding of the impact of the variants that we do have, in particular those residing outside of
the coding regions. Addressing these issues requires both larger cohorts and more whole-genome functional
assays (e.g., RNA-seq, ChIP-seq, ATAC-seq). In recognition of this, projects like the Center for Common
Genetic Disorders (CCGD), the Trans-Omics for Precision Medicine (TOPMed) Program, and ENCODE are
gathering massive amounts of genetic data across many different individuals and tissues. In
aggregate, these data will dramatically improve our power to understand how variation affects genomic
architecture. The challenge is that these data are vast, complex, and multidimensional, and current methods
cannot operate at this scale.
 This proposal addresses this challenge by splitting the data into two distinct types of data, genotypes
and genome annotations, and developing technologies that are optimized to store and search each type
independently. These two highly-scalable methods, which will be extremely valuable on their own, will then be
integrated into a single system that enables queries across variation, gene expression, and regulation. For
example, consider the question, “Are there any tissues where de novo variants in cases have a differential
enrichment versus those in controls?” This question is decomposed into a genotype query that produces two
sets of variants: de novos in cases and de novos in controls. The sets then serve as input queries into a
genome annotation search across all putative enhancers in all tissues.
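
The two-stage decomposition described above can be illustrated with a minimal Python sketch. It is not the GQT implementation or the proposed system: the `Variant` type, the function names, the trio layout, and the toy data are all hypothetical, and a plain overlap count stands in for a proper enrichment statistic. Stage one is a genotype query that yields de novo variants for case and control trios; stage two searches per-tissue enhancer intervals with each variant set.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Variant:
    chrom: str
    pos: int


def de_novo_variants(genotypes, trios):
    """Genotype query: variants carried by a child but by neither parent.

    genotypes maps Variant -> {sample_id: alt-allele count} (hypothetical layout).
    trios is a list of (child, father, mother) sample ids.
    """
    found = set()
    for variant, calls in genotypes.items():
        for child, father, mother in trios:
            if calls[child] > 0 and calls[father] == 0 and calls[mother] == 0:
                found.add(variant)
    return found


def overlap_counts(variants, enhancers_by_tissue):
    """Annotation search: per tissue, count variants inside any enhancer.

    Intervals are (chrom, start, end) with half-open coordinates [start, end).
    """
    counts = {}
    for tissue, intervals in enhancers_by_tissue.items():
        counts[tissue] = sum(
            any(c == v.chrom and start <= v.pos < end for c, start, end in intervals)
            for v in variants
        )
    return counts


# Toy data (assumed for illustration only).
genotypes = {
    Variant("chr1", 150): {"c1": 1, "f1": 0, "m1": 0, "c2": 0, "f2": 0, "m2": 0},
    Variant("chr1", 900): {"c1": 0, "f1": 0, "m1": 0, "c2": 1, "f2": 0, "m2": 0},
    Variant("chr2", 50): {"c1": 1, "f1": 1, "m1": 0, "c2": 0, "f2": 0, "m2": 0},
}
case_trios = [("c1", "f1", "m1")]
control_trios = [("c2", "f2", "m2")]
enhancers_by_tissue = {
    "brain": [("chr1", 100, 200)],
    "liver": [("chr1", 850, 950), ("chr2", 0, 100)],
}

# Stage 1: one genotype query per group.
case_dn = de_novo_variants(genotypes, case_trios)
control_dn = de_novo_variants(genotypes, control_trios)

# Stage 2: each variant set becomes a query against the annotations.
case_counts = overlap_counts(case_dn, enhancers_by_tissue)
control_counts = overlap_counts(control_dn, enhancers_by_tissue)
```

Comparing `case_counts` with `control_counts` tissue by tissue gives the raw material for a differential enrichment test; a real analysis would replace the nested loops with indexed search and apply a statistical test to the counts.
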
 This proposal builds upon both my recently published Genotype Query Tools (GQT), a method that
achieved vast speedups over other methods by operating directly on a compressed genotype index, and my
past research and training in genome arithmetic algorithms, for which I have published multiple novel
algorithms. Up to now I have focused on methods, so while the K99 phase of this project will include
development, it will have a distinct focus on the analysis of disease cohorts. This additional training will be the
foundation of an independent research program that will unlock the potential of large-scale genomics and
functional data sets, providing for the fast and fluid integration between phenotype, genotype, and functional
data.

## Key facts

- **NIH application ID:** 9970515
- **Project number:** 5R00HG009532-05
- **Recipient organization:** UNIVERSITY OF COLORADO
- **Principal Investigator:** Ryan M Layer
- **Activity code:** R00
- **Funding institute:** NIH
- **Fiscal year:** 2020
- **Award amount:** $223,638
- **Award type:** 5
- **Project period:** 2018-08-20 → 2022-06-30

## Primary source

NIH RePORTER: https://reporter.nih.gov/project-details/9970515

## Citation

> US National Institutes of Health, RePORTER application 9970515, A scalable, integrative, multi-omic analysis platform (5R00HG009532-05). Retrieved via AI Analytics 2026-05-24 from https://api.ai-analytics.org/grant/nih/9970515. Licensed CC0.

---

*[NIH grants dataset](/datasets/nih-grants) · CC0 1.0*
