High-Dimensional Regression for Data Integration

NIH RePORTER · NIH · P01 · $275,336

Abstract

Project 1: High-Dimensional Regression for Data Integration

High-dimensional omic (e.g., genomic, transcriptomic, epigenomic) features come with a rich set of functional and regulatory annotations, pathway information, and disease-specific knowledge from previous studies, all of which is routinely used to interpret analyses of omic data. In this project, we propose to develop integrative regression methods that incorporate this external information a priori, rather than post hoc, in order to improve prediction performance, improve the selection of predictive and associated features, and gain insight into potential biological mechanisms in studies with high-dimensional omic data.

In our first Specific Aim, we propose a general high-dimensional mixed modeling framework for integrating meta-features (e.g., functional annotations, pathways) into omic studies. The framework has the flexibility to handle quantitative, categorical, and time-to-event outcomes, and it can accommodate correlated data through the inclusion of random effects. Our proposed approach brings together mixed modeling, high-dimensional regularized regression, and an empirical Bayes strategy that makes the direct estimation of penalty (tuning) parameters from the data analytically and computationally tractable. The proposed integrative models can be deployed in ‘predictive mode’ to develop diagnostic and prognostic signatures, or in ‘discovery mode’ to identify omic features associated with disease outcomes. We also propose an accompanying set of tools for inference and model interpretation.

Our second Specific Aim focuses on integrative high-dimensional regression models for transcriptome-wide association studies (TWAS). We propose to leverage recent advances linking enhancers and other DNA regulatory elements to their proximal target genes in order to improve the prediction of genetically regulated gene expression, with the goal of boosting the power and localization ability of TWAS.

In our third Specific Aim, we focus on applications of the methods in Aims 1 and 2 to several cancer datasets.
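As a rough illustration of what incorporating meta-features a priori can look like in a regularized regression, the sketch below fits a ridge regression in which each feature's penalty is a function of its annotations. It is not the project's method: the log-linear penalty form, the function name `meta_feature_ridge`, the simulated data, and the fixed values in `alpha` are all assumptions made for illustration. In particular, `alpha` is set by hand here, whereas the abstract describes estimating penalty parameters from the data with an empirical Bayes strategy.

```python
import numpy as np

def meta_feature_ridge(X, y, Z, alpha):
    """Ridge regression with feature-specific penalties (hypothetical helper).

    Assumed penalty model: lambda_j = exp(Z[j] @ alpha), so a feature's
    annotations (row j of Z) decide how strongly its coefficient is shrunk.
    Solves (X'X + diag(lambda)) beta = X'y in closed form.
    """
    penalties = np.exp(Z @ alpha)
    A = X.T @ X + np.diag(penalties)
    return np.linalg.solve(A, X.T @ y)

# Simulated data: 50 samples, 200 features, more features than samples.
rng = np.random.default_rng(0)
n, p = 50, 200
X = rng.standard_normal((n, p))

# Hypothetical meta-feature matrix: an intercept column plus one binary
# annotation that flags the first 20 features.
Z = np.column_stack([np.ones(p), (np.arange(p) < 20).astype(float)])

# Only the annotated features carry signal in this simulation.
beta_true = np.zeros(p)
beta_true[:20] = rng.standard_normal(20)
y = X @ beta_true + 0.5 * rng.standard_normal(n)

# Assumed (not estimated) penalty coefficients: annotated features get
# penalty exp(0) = 1, unannotated features get penalty exp(3).
alpha = np.array([3.0, -3.0])
beta_hat = meta_feature_ridge(X, y, Z, alpha)
```

Setting the second entry of `alpha` to zero recovers ordinary ridge regression with a single shared penalty, which makes that model a natural baseline for judging whether the annotation helps.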

Key facts

NIH application ID: 10707448
Project number: 5P01CA196569-08
Recipient: UNIVERSITY OF SOUTHERN CALIFORNIA
Principal Investigator: Juan Pablo Lewinger
Activity code: P01
Funding institute: NIH
Fiscal year: 2023
Award amount: $275,336
Award type: 5
Project period: 2016-07-01 → 2027-08-31