# Interpretable Computational Models of Functional Genomics Data

> **NIH R01** · COLD SPRING HARBOR LABORATORY · 2022 · $417,299

## Abstract

Understanding how the coordination of cis-regulatory elements (CREs) influences biological processes, such as
transcription and alternative splicing, is a major goal in computational genomics. This remains a challenge
because CRE activity at any given locus may depend on a host of other factors, including sequence context
and/or the presence of other CREs nearby. Recent developments in deep convolutional neural networks (CNNs)
have revolutionized our ability to predict regulatory functions from DNA sequence. Unlike previous computational
methods based on position-weight matrices, which capture an additive model of CREs, CNNs can, in principle,
also learn higher-order dependencies within the CRE, with other CREs, and with the broader sequence context.
However, CNNs are essentially black-box models, with parameters that don’t have clear biological meaning.
Hence it remains a challenge to translate the improved predictions of a CNN to new biological insights. Here we
propose to develop three different computational methods that can comprehensively characterize higher-order
interactions within CREs and across different CREs from functional genomics data, specifically ChIP-seq and
CLIP-seq data publicly available through ENCODE. Each method serves as its own separate Aim and will be
developed in parallel. In Aim 1, we will develop a new post hoc model interpretability method based on employing
interpretable quantitative models originally developed to understand complex genetic interactions in
laboratory-based comprehensive mutagenesis (e.g., multiplex assays of variant effects) to characterize CRE dependencies
learned by a CNN, using synthetic sequences to target specific biological hypotheses. In Aim 2, we will develop
new CNN architectures where the learned parameters will express higher-order interactions that have direct
biological interpretations. In Aim 3, we will combine a Bayesian nonparametric framework for modeling CREs
with CNN-based CRE annotations and GPU acceleration to develop new methods for understanding how CREs
are specified in the genome. Successful completion of these Aims will provide a leap forward in our
understanding of higher-order CRE dependencies that are exploited but have not yet been fully revealed by
CNNs. This work will provide the community with: (1) a new suite of open-source computational tools that
address the problem of modeling CREs and their dependencies in functional genomics data; and (2) a
comprehensive genome-wide catalogue of CRE syntax for transcription factors and RNA-binding proteins that
will be hosted on a user-friendly webserver.

## Key facts

- **NIH application ID:** 10453055
- **Project number:** 1R01HG012131-01A1
- **Recipient organization:** COLD SPRING HARBOR LABORATORY
- **Principal Investigator:** Peter K Koo
- **Activity code:** R01
- **Funding institute:** NIH
- **Fiscal year:** 2022
- **Award amount:** $417,299
- **Award type:** 1
- **Project period:** 2022-09-07 → 2027-06-30

## Primary source

NIH RePORTER: https://reporter.nih.gov/project-details/10453055

## Citation

> US National Institutes of Health, RePORTER application 10453055, Interpretable Computational Models of Functional Genomics Data (1R01HG012131-01A1). Retrieved via AI Analytics 2026-05-22 from https://api.ai-analytics.org/grant/nih/10453055. Licensed CC0.

---

*[NIH grants dataset](/datasets/nih-grants) · CC0 1.0*
