# New Statistical Methods for Medical Signals and Images

> **NIH R01** · STANFORD UNIVERSITY · 2021 · $491,805

## Abstract

The analysis of large datasets from computational biology and medicine represents an important challenge for statisticians. These data typically have a large number of correlated features with relatively weak signals for predicting phenotypes of interest. Examples of such data include DNA sequences and GWAS, mass spectra, MRI and EEG images, RNAseq and protein arrays, to name a few. The broad goal of this ongoing three-investigator grant is to develop and study statistical techniques that enhance the analysis and interpretation of these data. Our team combines experience in statistical modeling, algorithmic development, and theoretical analysis of these techniques. In the new projects, our focus is the development of state-of-the-art methods that exploit known or implied structure in order to extract useful information from high-dimensional data.

The renewal will address these goals through four Specific Aims. The investigators will study:
1. Principal curves for modeling chromatin architecture. We propose new statistical methodology for modeling the chromatin structure of DNA based on contact maps derived from Hi-C assays. We use techniques inspired by principal curves, but applied in the context of metric scaling, that take into account local structure along the chromosome.
2. Fitting sparse models to large data and to summary data. Many modern datasets (e.g. GWAS with 1M SNPs and 500K subjects) are computationally challenging. We propose computational advances that enable the lasso to scale to such scenarios. Often the authors of published GWAS studies do not share the raw data for privacy and other reasons. We propose techniques for approximately fitting multivariate versions of these models given only the univariate summary scores typically reported.
3. Estimating high-dimensional eigenstructure in virology and genetics. We will exploit low-rank structure in sequence data to compare different methods for inference about sectors in viral proteins. For quantitative genetics, we will develop statistical theory, methods and software for eigenanalysis of multiple levels of variation, and specifically for genetic covariance matrices.
4. Prediction with side information. Many studies seek biomarker signatures that are predictive of outcomes such as disease status under various treatments. We propose a statistical approach for exploiting side information such as membership in gene pathways or quantitative measures for each biomarker in order to increase the power for discovering signatures in these challenging domains.

Working together, the investigators and their students will implement the new statistical tools in publicly available software, following a pattern established in earlier cycles of this grant, in which our packages have found wide use among medical researchers both at Stanford and around the world.
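
To make the summary-data idea in Aim 2 concrete, the following is a minimal sketch and not the investigators' method. It assumes standardized features and that both the marginal scores r = Xᵀy/n and the feature correlation matrix R = XᵀX/n are available. Under those assumptions the lasso objective (1/2n)‖y − Xb‖² + λ‖b‖₁ depends on the data only through r and R, so it can be minimized by coordinate descent without the raw data. The function names `soft_threshold` and `lasso_from_summary` and the simulated data are hypothetical, chosen for illustration only.

```python
import numpy as np


def soft_threshold(z, lam):
    """Scalar soft-thresholding operator: sign(z) * max(|z| - lam, 0)."""
    return np.sign(z) * max(abs(z) - lam, 0.0)


def lasso_from_summary(r, R, lam, n_iter=200, tol=1e-8):
    """Coordinate descent for  min_b  0.5 * b'Rb - r'b + lam * ||b||_1.

    r : (p,) marginal scores X'y / n   (assumed available)
    R : (p, p) correlation matrix X'X / n   (assumed available)
    """
    p = len(r)
    b = np.zeros(p)
    for _ in range(n_iter):
        max_change = 0.0
        for j in range(p):
            # Score for feature j with the other features' fitted effects removed.
            rho = r[j] - R[j] @ b + R[j, j] * b[j]
            new_bj = soft_threshold(rho, lam) / R[j, j]
            max_change = max(max_change, abs(new_bj - b[j]))
            b[j] = new_bj
        if max_change < tol:
            break
    return b


# Simulated example (hypothetical data): only r and R are passed to the solver.
rng = np.random.default_rng(0)
n, p = 200, 5
X = rng.standard_normal((n, p))
X = (X - X.mean(axis=0)) / X.std(axis=0)  # standardize columns
y = X @ np.array([1.0, 0.0, 0.0, -0.5, 0.0]) + 0.5 * rng.standard_normal(n)
y = y - y.mean()

r = X.T @ y / n
R = X.T @ X / n
b_hat = lasso_from_summary(r, R, lam=0.1)
print(b_hat)
```

In practice R is rarely published alongside GWAS summary scores and would have to be approximated, for example from a reference panel, which is one reason such fits are only approximate.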

## Key facts

- **NIH application ID:** 10218218
- **Project number:** 5R01GM134483-26
- **Recipient organization:** STANFORD UNIVERSITY
- **Principal Investigator:** Iain M Johnstone
- **Activity code:** R01
- **Funding institute:** NIH
- **Fiscal year:** 2021
- **Award amount:** $491,805
- **Award type:** 5
- **Project period:** 1996-09-10 → 2023-06-30

## Primary source

NIH RePORTER: https://reporter.nih.gov/project-details/10218218

## Citation

> US National Institutes of Health, RePORTER application 10218218, New Statistical Methods for Medical Signals and Images (5R01GM134483-26). Retrieved via AI Analytics 2026-05-23 from https://api.ai-analytics.org/grant/nih/10218218. Licensed CC0.

---

*[NIH grants dataset](/datasets/nih-grants) · CC0 1.0*
