# Recovering reproducible and local signal in genomic data

> **NIH P20** · BROWN UNIVERSITY · 2024 · $167,063

## Abstract

Challenge. One of the most important challenges in biological science today is to elucidate the extent to which complex experiments, which measure hundreds of thousands of variables, can be analyzed to generate consistent, global signal when repeated and to identify local signal related to tissues, cancer types, or population structure. Importantly, we must account for the intrinsic diversity of variation across different studies and control for technical confounders as part of this task. Most measurements from high-dimensional biological experiments display variation arising from biological sources, such as genes belonging to different tissues or different positions in the brain, as well as from technical sources. While some components of variation reappear across multiple tissues, global biological signal is more likely than spurious signal to be reproducibly present in multiple tissues. Our challenge is to systematically and reliably identify the global biological factors and to estimate the signal specific to each study.

Aims. To meet this challenge, we propose a novel concept that combines ideas from meta-analysis and statistical modeling and dimension reduction. We posit that one can develop high-dimensional data reduction techniques that simultaneously function as multi-study tools to extract consistent signal and local, study-specific signal. This proposal develops statistical methods for identifying shared and study-specific signal across multiple cancer studies. In this work, it is crucial to understand both the shared signal (here, gene co-expression shared across different cancer types) and the signal specific to each study. This proposal will pilot this concept by building novel classes of multi-logistic regression and factor analysis methods. The key is to decompose data from each study into latent dimensions, some of which are global while others are specific to a local signal. This will simultaneously achieve two goals: learning reproducible biological features shared among studies, and identifying the variation specific to each study. Specific aims include methodology design, software development, and applications.

Impact. The concepts, approaches, and software tools generated by this research will have a direct impact on the ability of the biomedical community to reproducibly identify stable signals across multiple high-throughput biology studies and to capture local signals. Our tools will also enable more reliable identification of artifacts and thus facilitate more efficient experimental designs and guide technological development. We also hope to impact data science beyond genomics. Our study will be the first opportunity to evaluate the novel concept of sharing latent factors while also estimating local latent structures. The proposed work could subsequently provide the inspiration, as well as the practical foundation, for expanding this concept to a variety of other dimension reduction and machine learning techniques.
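
The decomposition described under Aims, in which each study's data is split into global and study-specific latent dimensions, can be written as a linear factor model. The following is a minimal sketch under that assumption. The abstract does not give a formula, so every symbol here ($x_{is}$, $\Phi$, $\Lambda_s$, $f_{is}$, $l_{is}$, $e_{is}$, $\Psi_s$, $K$, $J_s$) is illustrative notation rather than the project's own.

```latex
% Assumed notation (not from the abstract):
%   s = 1..S indexes studies, i = 1..n_s indexes samples, P = number of variables.
%   x_{is} in R^P is the centered data vector for sample i in study s.
%   f_{is}, l_{is}, e_{is} are assumed mutually independent.
\begin{align}
x_{is} &= \Phi f_{is} + \Lambda_s l_{is} + e_{is}, \\
f_{is} &\sim N(0, I_K), \qquad l_{is} \sim N(0, I_{J_s}), \qquad e_{is} \sim N(0, \Psi_s), \\
\operatorname{Cov}(x_{is}) &= \underbrace{\Phi \Phi^\top}_{\text{shared}}
  + \underbrace{\Lambda_s \Lambda_s^\top}_{\text{study-specific}}
  + \Psi_s .
\end{align}
```

In this sketch, $\Phi$ ($P \times K$) holds loadings common to all studies, $\Lambda_s$ ($P \times J_s$) holds loadings that appear only in study $s$, and $\Psi_s$ is a diagonal noise covariance. Under these assumptions, the term $\Phi \Phi^\top$ is the part of the covariance one would expect to reproduce across studies, while $\Lambda_s \Lambda_s^\top$ absorbs local structure.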

## Key facts

- **NIH application ID:** 10904898
- **Project number:** 5P20GM109035-09
- **Recipient organization:** BROWN UNIVERSITY
- **Principal Investigator:** Roberta De Vito
- **Activity code:** P20
- **Funding institute:** NIH
- **Fiscal year:** 2024
- **Award amount:** $167,063
- **Award type:** 5
- **Project period:** 2016-06-01 → 2026-07-31

## Primary source

NIH RePORTER: https://reporter.nih.gov/project-details/10904898

## Citation

> US National Institutes of Health, RePORTER application 10904898, Recovering reproducible and local signal in genomic data (5P20GM109035-09). Retrieved via AI Analytics 2026-05-26 from https://api.ai-analytics.org/grant/nih/10904898. Licensed CC0.

---

*[NIH grants dataset](/datasets/nih-grants) · CC0 1.0*
