# Integrated Analysis for Genetic Association and Prediction

> **NIH P01** · UNIVERSITY OF SOUTHERN CALIFORNIA · 2020 · $351,398

## Abstract

The overall goal of this project is to develop novel statistical methods for the integrative analysis of genomic data in cancer research. We propose to develop analytical tools that can integrate data from multiple genomic platforms and incorporate external omic information from publicly available databases. These tools will be applicable both to etiological studies geared toward causal discovery and to clinical and translational studies geared toward predictive modeling.

Advances in high-throughput molecular technologies have enabled large-scale omic projects (e.g., ENCODE, The Cancer Genome Atlas, Epigenome Roadmap) to generate vast amounts of information on the structure, function, and regulation of the genome. In addition to these publicly available data, individual studies increasingly generate multiplatform genomic profiles (e.g., genotypes, gene expression, methylation, copy number variation, miRNA) to elucidate the complex mechanisms of cancer development and progression and to investigate determinants and predictors of health and clinical outcomes. Integrating these multiple genomic "dimensions" and incorporating the available external information can increase the ability to discover causal relationships (e.g., cancer-SNP associations), enhance prediction and prognosis modeling (e.g., of cancer aggressiveness), and provide insights into biological mechanisms.

We propose two analytic approaches that address the challenges of effectively integrating multiplatform genomic data and incorporating external information from omic projects. The first approach (Aim 1) is a Bayesian regression and feature selection method that integrates prior omic information in a very flexible manner, allowing the data to "speak for itself" to determine which pieces of external information are relevant to the problem at hand.
The method works with both individual-level data and meta-analytic summaries, making it well suited to analyzing data from large multi-study consortia.

The second approach (Aim 2) is a regularized regression and feature selection method for integrating multiplatform genomic features measured on the same set of individuals. The method is designed to scale to the very large numbers of features typical of genome-wide platforms, to account for the different properties of each genomic data type, and to incorporate relevant external information to increase efficiency. Both approaches can be applied to causal discovery and to the development of predictive and prognostic models.

We will apply our methods to search for novel risk variants in the CORECT consortium of genome-wide association studies and to construct a prognostic model of colorectal cancer (CRC) recurrence based on genome-wide expression and methylation data in the ColoCare consortium cohort of CRC patients. This work will provide new tools for analyzing high-dimensional, multiplatform genomic data that can take advantage of available external information.
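
As a rough illustration of the Aim 1 idea (external annotations shape each feature's prior probability of inclusion, while the data decide how much those annotations matter), here is a generic empirical-Bayes sketch, not the project's model. It assumes per-variant summary statistics (an effect estimate and its standard error, as available from a meta-analysis), independent variants with no linkage disequilibrium, a Wakefield-style approximate Bayes factor with an assumed N(0, 0.2²) prior on true effects, and one hypothetical binary annotation (for example, overlap with a regulatory element). An EM loop estimates a separate prior inclusion probability for annotated and unannotated variants; all function names and data are hypothetical.

```python
import math
from typing import List, Sequence, Tuple


def log_bayes_factor(beta_hat: float, se: float, prior_sd: float = 0.2) -> float:
    """Log approximate Bayes factor (association vs. no association) from summary data.

    Assumes beta_hat ~ N(beta, se^2), with beta = 0 under H0 and
    beta ~ N(0, prior_sd^2) under H1 (Wakefield-style approximation).
    """
    v, w = se * se, prior_sd * prior_sd
    z2 = (beta_hat / se) ** 2
    return 0.5 * math.log(v / (v + w)) + 0.5 * z2 * w / (v + w)


def sigmoid(x: float) -> float:
    # Numerically stable logistic function.
    if x >= 0:
        return 1.0 / (1.0 + math.exp(-x))
    e = math.exp(x)
    return e / (1.0 + e)


def posterior_inclusion(log_bf: float, prior: float) -> float:
    # Posterior log-odds = prior log-odds + log Bayes factor.
    return sigmoid(math.log(prior / (1.0 - prior)) + log_bf)


def fit_annotation_priors(
    log_bfs: Sequence[float], groups: Sequence[int], n_groups: int,
    iters: int = 200, init: float = 0.01,
) -> Tuple[List[float], List[float]]:
    """EM estimate of one prior inclusion probability per annotation group.

    E-step: posterior inclusion probability (PIP) of each variant given current priors.
    M-step: each group's prior becomes the mean PIP of its members.
    """
    priors = [init] * n_groups
    for _ in range(iters):
        pips = [posterior_inclusion(lb, priors[g]) for lb, g in zip(log_bfs, groups)]
        for g in range(n_groups):
            members = [p for p, gg in zip(pips, groups) if gg == g]
            if members:
                # Clip away from 0 and 1 so the prior log-odds stay finite.
                priors[g] = min(max(sum(members) / len(members), 1e-6), 1 - 1e-6)
    pips = [posterior_inclusion(lb, priors[g]) for lb, g in zip(log_bfs, groups)]
    return priors, pips


# Hypothetical summary statistics: (beta_hat, se, annotation group).
# Group 0 = annotated (e.g., regulatory overlap), group 1 = not annotated.
data = [
    (0.50, 0.05, 0), (0.40, 0.05, 0), (0.45, 0.05, 0),
    (0.01, 0.05, 0), (0.00, 0.05, 0), (0.25, 0.10, 0),
    (0.25, 0.10, 1), (0.01, 0.10, 1), (0.02, 0.10, 1),
    (0.00, 0.10, 1), (0.01, 0.10, 1), (-0.01, 0.10, 1),
]
lbfs = [log_bayes_factor(b, s) for b, s, _ in data]
priors, pips = fit_annotation_priors(lbfs, [g for _, _, g in data], n_groups=2)
# Identical summary statistics (indices 5 and 6) receive a higher PIP in the
# group whose estimated prior is larger.
print(priors, pips[5], pips[6])
```

Because the annotation-specific priors are estimated rather than fixed, an annotation that carries no information tends to yield similar priors in both groups, which is one simple sense in which the data can down-weight irrelevant external information. A realistic model would also need to handle correlated variants and multiple, possibly continuous, annotations.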
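
The Aim 2 ingredients (a penalty that can differ by genomic platform and can be relaxed for features flagged by external information) can be sketched with a weighted lasso, in which each feature j has its own penalty factor w_j, similar in spirit to the `penalty.factor` argument of the R package glmnet. This is an assumed, simplified stand-in for the project's method: the platform scales, the 0.5 annotation discount, the simulated data, and all function names are hypothetical, and the pure-Python coordinate descent is written for readability, not for genome-wide scale.

```python
import random
from typing import Dict, List, Sequence


def soft_threshold(x: float, t: float) -> float:
    """Lasso shrinkage operator S(x, t) = sign(x) * max(|x| - t, 0)."""
    if x > t:
        return x - t
    if x < -t:
        return x + t
    return 0.0


def penalty_factors(platforms: Sequence[str], annotated: Sequence[bool],
                    platform_scale: Dict[str, float],
                    annotation_discount: float = 0.5) -> List[float]:
    """Hypothetical per-feature penalty weights: a platform-specific scale,
    multiplied by a discount when external information flags the feature."""
    return [platform_scale[pl] * (annotation_discount if a else 1.0)
            for pl, a in zip(platforms, annotated)]


def weighted_lasso(X: List[List[float]], y: List[float], lam: float,
                   weights: Sequence[float], iters: int = 500,
                   tol: float = 1e-8) -> List[float]:
    """Coordinate descent for (1/2n)||y - Xb||^2 + lam * sum_j w_j |b_j|.
    Assumes mean-zero y and columns of X (no intercept is fitted)."""
    n, p = len(X), len(X[0])
    beta = [0.0] * p
    resid = list(y)  # y - X @ beta, with beta = 0 initially
    col_sq = [sum(row[j] ** 2 for row in X) / n for j in range(p)]
    for _ in range(iters):
        max_change = 0.0
        for j in range(p):
            if col_sq[j] == 0.0:
                continue
            # Inner product of feature j with the partial residual that excludes feature j.
            rho = sum(X[i][j] * resid[i] for i in range(n)) / n + col_sq[j] * beta[j]
            new = soft_threshold(rho, lam * weights[j]) / col_sq[j]
            delta = new - beta[j]
            if delta != 0.0:
                for i in range(n):
                    resid[i] -= X[i][j] * delta
                beta[j] = new
                max_change = max(max_change, abs(delta))
        if max_change < tol:
            break
    return beta


# Toy example with simulated data (all settings are illustrative).
rng = random.Random(0)
n = 200
platforms = ["expr", "expr", "expr", "meth", "meth", "meth"]
annotated = [True, False, False, True, False, False]
X = [[rng.gauss(0.0, 1.0) for _ in platforms] for _ in range(n)]
y = [2.0 * row[0] + 1.0 * row[3] + rng.gauss(0.0, 0.1) for row in X]
scale = {"expr": 1.0, "meth": 2.0}

beta_ext = weighted_lasso(X, y, 0.2, penalty_factors(platforms, annotated, scale))
beta_plain = weighted_lasso(X, y, 0.2, penalty_factors(platforms, annotated, scale, 1.0))
print([round(b, 2) for b in beta_ext])
print([round(b, 2) for b in beta_plain])
```

Relaxing the penalty on an annotated feature shrinks its coefficient less, so externally supported signals are easier to retain; in a more data-driven variant, the penalty factors could themselves be estimated from the data rather than set by hand.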

## Key facts

- **NIH application ID:** 9991771
- **Project number:** 5P01CA196569-05
- **Recipient organization:** UNIVERSITY OF SOUTHERN CALIFORNIA
- **Principal Investigator:** Juan Pablo Lewinger
- **Activity code:** P01
- **Funding institute:** NIH
- **Fiscal year:** 2020
- **Award amount:** $351,398
- **Award type:** 5 (non-competing continuation)
- **Project period:** — → —

## Primary source

NIH RePORTER: https://reporter.nih.gov/project-details/9991771

## Citation

> US National Institutes of Health, RePORTER application 9991771, Integrated Analysis for Genetic Association and Prediction (5P01CA196569-05). Retrieved via AI Analytics 2026-05-23 from https://api.ai-analytics.org/grant/nih/9991771. Licensed CC0.

---

*[NIH grants dataset](/datasets/nih-grants) · CC0 1.0*