Evaluation and Development of Statistical Methods for Data Harmonization in Molecular Prognostication

NIH RePORTER · NIH · R21 · $496,342

Abstract

PROJECT SUMMARY
Survival analysis plays a foundational role in biomedical transcriptomics studies that aim to develop reliable predictors of patient prognosis and treatment response. Survival analysis methods exist to address high dimensionality and signal sparsity, but little research has addressed data artifacts arising from disparate experimental handling, a prominent feature of transcriptomics data. Published studies often deal with these handling artifacts by borrowing methods developed for differential expression analysis. The most popular of these are quantile normalization for microarray data and scaling normalization for sequencing data. Such ‘off-label’ uses rest on unfounded optimism: we found that normalization may distort a marker’s ordering across samples and thereby compromise both the detection of outcome-associated markers and the accuracy of outcome prediction. There is therefore a pressing need to re-evaluate existing methods for dealing with these artifacts and to tailor new ones specifically to the derivation of molecular prognosticators, so that prognosticators can be derived accurately and reproducibly. In this proposal, we will first fill this knowledge gap for microRNAs (a class of small RNAs that play an important role in regulating gene expression in humans), using data that are realistically distributed and robustly benchmarked. We will then develop new methods for managing handling artifacts that leverage the survival regression framework. We will assess the performance of the new methods against existing methods using simulation tools and demonstrate their use in an application to ovarian cancer data from The Cancer Genome Atlas.
Our project is expected to advance the knowledge needed to optimize data harmonization for microRNA data, thereby accelerating the reproducible translation of these data into clinically useful predictors, and to pave the way for addressing the same issues in other RNA data and their translation.
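The abstract's point that normalization can reorder a marker across samples can be seen in a toy example. The sketch below implements textbook quantile normalization with NumPy: each sample's sorted values are replaced by the across-sample mean at that rank. The helper name `quantile_normalize`, the made-up 3 × 2 matrix, and the simplified tie handling (ties broken by row order rather than averaged) are illustrative assumptions, not data or code from this project.

```python
import numpy as np


def quantile_normalize(x: np.ndarray) -> np.ndarray:
    """Quantile-normalize the columns (samples) of a markers-by-samples matrix.

    Illustrative helper: tied values get distinct ranks in row order
    (stable sort) instead of being averaged.
    """
    # Within-sample rank of each value (0 = smallest in that column).
    ranks = np.argsort(np.argsort(x, axis=0, kind="stable"), axis=0)
    # Reference distribution: mean of the k-th smallest value across samples.
    reference = np.sort(x, axis=0).mean(axis=1)
    # Each value is replaced by the reference value at its within-sample rank.
    return reference[ranks]


# Hypothetical toy data: 3 markers (rows) x 2 samples (columns A, B).
# Sample B has systematically higher values, as a handling artifact might cause.
x = np.array([
    [5.0, 6.0],   # marker 0: raw value higher in sample B than in sample A
    [1.0, 10.0],
    [2.0, 20.0],
])
xn = quantile_normalize(x)

print("marker 0 raw:       ", x[0])
print("marker 0 normalized:", xn[0])
```

In this made-up matrix, marker 0 is higher in sample B than in sample A before normalization (6 vs 5) but lower after it (3.5 vs 12.5). After quantile normalization a value depends only on its within-sample rank, so a marker's ordering across samples can flip even though the ordering within each sample is preserved.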

Key facts

NIH application ID: 10303963
Project number: 1R21HG012124-01
Recipient: SLOAN-KETTERING INST CAN RESEARCH
Principal Investigator: Li-Xuan Qin
Activity code: R21
Funding institute: NIH
Fiscal year: 2021
Award amount: $496,342
Award type: 1
Project period: 2021-09-03 → 2025-02-28