# High-Dimensional Regression for Data Integration

> **NIH P01** · UNIVERSITY OF SOUTHERN CALIFORNIA · 2023 · $275,336

## Abstract

Project 1: High-Dimensional Regression for Data Integration 
Associated with high-dimensional omic (e.g. genomic, transcriptomic, epigenomic) features there is a
rich set of functional and regulatory annotations, pathway information, and disease-specific knowledge 
from previous studies that is routinely used to interpret analyses of omic data. In this project, we 
propose to develop integrative regression methods capable of incorporating this array of external 
information a priori, rather than post hoc, to improve prediction performance and the selection of
predictive and associated features, and to gain insight into potential biological mechanisms in studies
with high-dimensional omic data. In our first Specific Aim we propose a general high-dimensional mixed 
modelling framework for integrating meta-features (e.g. functional annotations, pathways) into omic 
studies, with the flexibility to handle quantitative, categorical, and time-to-event outcomes, as well as 
the ability to accommodate correlated data through the inclusion of random effects. Our proposed 
approach brings together mixed modeling, high-dimensional regularized regression, and an empirical 
Bayes strategy that makes the direct estimation of tuning penalty parameters from the data analytically 
and computationally tractable. The proposed integrative models can be deployed in ‘predictive mode’ 
to develop diagnostic and prognostic signatures, or in ‘discovery mode’ to identify omic features 
associated with disease outcomes. We also propose an accompanying set of tools for inference and 
model interpretation. Our second Specific Aim focuses on integrative high-dimensional regression 
models for transcriptome-wide association studies (TWAS). We propose to leverage recent advances 
linking enhancers and other DNA regulatory elements to their proximal target genes to improve the 
prediction of genetically regulated gene expression with the goal of boosting the power and localization 
ability of TWAS. In our third Specific Aim, we focus on applications of the methods in Aims 1 and 2 to
several cancer datasets.
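
As a rough illustration of the kind of model the first Specific Aim describes, the sketch below fits a ridge regression in which the feature effects are themselves regressed on meta-features. It assumes a two-level linear model, `y = X @ beta + noise` with `beta = Z @ alpha + gamma`, where `Z` holds one row of annotations per omic feature. This is not the project's method: the penalties are fixed by hand rather than estimated by empirical Bayes, there are no random effects, the function name `fit_meta_ridge` and its parameters are hypothetical, and the data are simulated.

```python
import numpy as np


def fit_meta_ridge(X, y, Z, lam_gamma=1.0, lam_alpha=0.1):
    """Ridge fit of y = X @ beta + noise with beta = Z @ alpha + gamma.

    Hypothetical helper for illustration only. The two penalties are
    fixed inputs here (an assumption); the abstract proposes estimating
    penalty parameters from the data instead.
    """
    n, p = X.shape
    q = Z.shape[1]
    # Substituting beta = Z alpha + gamma gives
    # y = (X Z) alpha + X gamma + noise, a single linear model.
    design = np.hstack([X @ Z, X])  # shape: n x (q + p)
    # Separate ridge penalties for meta-feature effects and residual effects.
    penalty = np.concatenate([np.full(q, lam_alpha), np.full(p, lam_gamma)])
    # Closed-form generalized ridge solution.
    coef = np.linalg.solve(design.T @ design + np.diag(penalty), design.T @ y)
    alpha, gamma = coef[:q], coef[q:]
    beta = Z @ alpha + gamma
    return beta, alpha


# Simulated data (assumed sizes): 50 samples, 200 features, 2 meta-features.
rng = np.random.default_rng(0)
n, p, q = 50, 200, 2
# Meta-features: an intercept column plus one binary annotation per feature.
Z = np.column_stack([np.ones(p), rng.integers(0, 2, size=p)])
alpha_true = np.array([0.0, 0.5])
beta_true = Z @ alpha_true + 0.05 * rng.standard_normal(p)
X = rng.standard_normal((n, p))
y = X @ beta_true + rng.standard_normal(n)

beta_hat, alpha_hat = fit_meta_ridge(X, y, Z)
```

Here `alpha_hat` summarizes how much each annotation shifts the expected effect of a feature, and `beta_hat` gives the per-feature effects used for prediction. Writing the model as one augmented design keeps the sketch short; it says nothing about how the proposed framework handles categorical or time-to-event outcomes.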

## Key facts

- **NIH application ID:** 10707448
- **Project number:** 5P01CA196569-08
- **Recipient organization:** UNIVERSITY OF SOUTHERN CALIFORNIA
- **Principal Investigator:** Juan Pablo Lewinger
- **Activity code:** P01
- **Funding institute:** NIH
- **Fiscal year:** 2023
- **Award amount:** $275,336
- **Award type:** 5
- **Project period:** 2016-07-01 → 2027-08-31

## Primary source

NIH RePORTER: https://reporter.nih.gov/project-details/10707448

## Citation

> US National Institutes of Health, RePORTER application 10707448, High-Dimensional Regression for Data Integration (5P01CA196569-08). Retrieved via AI Analytics 2026-05-24 from https://api.ai-analytics.org/grant/nih/10707448. Licensed CC0.

---

*[NIH grants dataset](/datasets/nih-grants) · CC0 1.0*
