# Tooling for accurately studying the epigenome along the human pangenome reference

> **NIH U01** · UNIVERSITY OF WASHINGTON · 2024 · $1,418,806

## Abstract

This proposal will provide the foundational tooling for understanding the function of the pan-genome reference
through the accurate annotation of regulatory elements within the pan-genome. As the genetic component of the
pan-genome reference comes into focus, the next challenge is understanding the functional relevance of genetic
variants within this reference. However, resolving this challenge requires tooling that enables users to: (1) get
accurate epigenetic data into a pan-genome reference; and (2) use epigenetic data once it is in a pan-genome
reference. This proposal leverages our team’s unique expertise in long-read epigenetics, short-read epigenetics,
pan-genome assembly, and genomic software development to develop transformative tooling for threading
accurate epigenetic information into a pan-genome graph, as well as extracting epigenetic information from a
pan-genome in a manner that is compatible with existing epigenetic and genetic analysis tools. Our tooling is
grounded in first assembling accurate epigenetic annotations at the level of haploid linear contigs, which are then
threaded into a pan-genome reference. This approach significantly improves the accuracy with which both long-
and short-read epigenetic features are mapped into a pan-genome, enables our tooling to readily adapt to new
pan-genomes, and enables user-generated epigenetic data to be incorporated into a pan-genome reference
without having to remake the pan-genome reference itself. Importantly, we are designing this tooling to work for
diverse types of epigenetic data acquired across sequencing platforms. In addition, this tooling will be available
through AnVIL, Conda, and other platforms, enabling users to readily adopt it into their own research pipelines.
Specifically, in Aim 1 we will develop tooling that uses a semi-supervised machine learning approach to
accurately classify long-read epigenetic data collected using diverse experimental methods and sequencing
platforms. In Aim 2, we will develop tooling that accurately aggregates long-read epigenetic data onto haploid
linear contigs, and then threads either long-read or short-read epigenetic data into a pan-genome reference. In
Aim 3, we will create fundamental operation tools for processing epigenetic data within a pan-genome to identify
epigenetic and genetic features at specific points of interest within a pan-genome in a sample-, path-, and read-
aware manner. Finally, we will apply our tooling to existing long-read and short-read epigenetic datasets to
identify genetic variants within the pan-genome reference associated with haplotype-, paralog-, and sample-
specific epigenetic features.

## Key facts

- **NIH application ID:** 10976065
- **Project number:** 1U01HG013744-01
- **Recipient organization:** UNIVERSITY OF WASHINGTON
- **Principal Investigator:** Andrew Ben Stergachis
- **Activity code:** U01
- **Funding institute:** NIH
- **Fiscal year:** 2024
- **Award amount:** $1,418,806
- **Award type:** 1
- **Project period:** 2024-09-19 → 2027-08-31

## Primary source

NIH RePORTER: https://reporter.nih.gov/project-details/10976065

## Citation

> US National Institutes of Health, RePORTER application 10976065, Tooling for accurately studying the epigenome along the human pangenome reference (1U01HG013744-01). Retrieved via AI Analytics 2026-05-25 from https://api.ai-analytics.org/grant/nih/10976065. Licensed CC0.

---

*[NIH grants dataset](/datasets/nih-grants) · CC0 1.0*
