# Deep tensor genomic imputation

> **NIH R01** · UNIVERSITY OF WASHINGTON · 2022 · $384,251

## Abstract

High-throughput sequencing assays allow scientists to measure biochemical properties like transcription factor binding, histone modifications, and gene expression in nearly any cell line or primary tissue (“biosample”). Unfortunately, measuring all possible biochemical properties in every biosample is infeasible, both because of limited sample availability and because the cost would be prohibitive. We have previously developed a state-of-the-art imputation method, called Avocado, that can fill in the holes in such data sets. Avocado couples tensor factorization with a deep neural network. The method is scalable to large data sets and provides more accurate imputations than competing methods such as ChromImpute or PREDICTD. We have already applied Avocado systematically to the NIH ENCODE data set and made the imputations publicly available via the ENCODE web portal.

Here, we propose to extend Avocado in four important ways. First, we will extend Avocado to handle single-cell data sets, thereby effectively turning each single-cell experiment into an in silico co-assay that measures multiple properties of each cell in parallel. Second, we will extend Avocado to work with data such as Hi-C, which measures three-dimensional properties of DNA. The extension involves converting Avocado's 3D tensor (biosample × assay × genomic position) to a 4D tensor with two genomic position axes. This extension will apply to a wide variety of data types, including various types of Hi-C data, SPRITE, GAM, ChIA-PET and PLAC-seq. Third, we will enhance Avocado to use variant-aware genomic sequence to enable high-resolution imputation of regulatory profiles. Finally, we will leverage the imputed data to infer cis-regulatory sequence annotations and the molecular impact of regulatory non-coding variants in one of the most comprehensive collections of cellular contexts.

All of the software produced by this project will be open source, and all of the imputed data and latent factorizations will be made publicly available via the web portals associated with the NIH 4D Nucleome and ENCODE Consortia, providing a valuable public resource for users of these data sets.
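
The abstract describes coupling tensor factorization with a deep neural network, and extending a 3D tensor to a 4D tensor with two genomic position axes. The following is a minimal, untrained sketch of that general idea, not Avocado's actual architecture or code. The class `TensorImputer`, its parameters, the layer sizes, and the choice to give each genomic position axis its own factor table are all assumptions made for illustration.

```python
import random


def make_factors(n, dim, rng):
    """One latent vector of length `dim` for each index along a tensor axis."""
    return [[rng.gauss(0.0, 0.1) for _ in range(dim)] for _ in range(n)]


def mlp(x, w1, b1, w2, b2):
    """Tiny one-hidden-layer network with ReLU; returns a scalar prediction."""
    hidden = [
        max(0.0, sum(w * xi for w, xi in zip(row, x)) + b)
        for row, b in zip(w1, b1)
    ]
    return sum(w * h for w, h in zip(w2, hidden)) + b2


class TensorImputer:
    """Hypothetical sketch: per-axis latent factors feeding a small network.

    `axis_sizes` has one entry per tensor axis, so the same class covers a
    3D tensor (biosample, assay, position) and a 4D tensor with two
    genomic position axes. Weights are random; no training is shown.
    """

    def __init__(self, axis_sizes, dim=4, hidden=8, seed=0):
        rng = random.Random(seed)
        self.factors = [make_factors(n, dim, rng) for n in axis_sizes]
        in_dim = dim * len(axis_sizes)
        self.w1 = [[rng.gauss(0.0, 0.1) for _ in range(in_dim)]
                   for _ in range(hidden)]
        self.b1 = [0.0] * hidden
        self.w2 = [rng.gauss(0.0, 0.1) for _ in range(hidden)]
        self.b2 = 0.0

    def predict(self, index):
        """Concatenate the latent vectors selected by `index`, then apply the network."""
        x = []
        for axis, i in enumerate(index):
            x.extend(self.factors[axis][i])
        return mlp(x, self.w1, self.b1, self.w2, self.b2)


# 3D case: 3 biosamples, 5 assays, 100 genomic positions (toy sizes).
model_3d = TensorImputer(axis_sizes=(3, 5, 100))
y3 = model_3d.predict((0, 1, 42))

# 4D case: the same sizes with a second genomic position axis,
# as for pairwise contact data.
model_4d = TensorImputer(axis_sizes=(3, 5, 100, 100))
y4 = model_4d.predict((0, 1, 42, 77))
```

In this sketch, moving from three axes to four only adds one more factor table and widens the network input. A real model for contact data might instead share factors between the two position axes; that design choice is not specified in the abstract.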

## Key facts

- **NIH application ID:** 10335796
- **Project number:** 5R01HG011466-02
- **Recipient organization:** UNIVERSITY OF WASHINGTON
- **Principal Investigator:** William Stafford Noble
- **Activity code:** R01
- **Funding institute:** NIH
- **Fiscal year:** 2022
- **Award amount:** $384,251
- **Award type:** 5
- **Project period:** 2021-02-01 → 2025-01-31

## Primary source

NIH RePORTER: https://reporter.nih.gov/project-details/10335796

## Citation

> US National Institutes of Health, RePORTER application 10335796, Deep tensor genomic imputation (5R01HG011466-02). Retrieved via AI Analytics 2026-05-24 from https://api.ai-analytics.org/grant/nih/10335796. Licensed CC0.

---

*[NIH grants dataset](/datasets/nih-grants) · CC0 1.0*
