# Evaluation and Development of Statistical Methods for Data Harmonization in Molecular Prognostication

> **NIH R21** · SLOAN-KETTERING INST CAN RESEARCH · 2021 · $496,342

## Abstract

Survival analysis plays a foundational role in biomedical transcriptomics studies for developing reliable
predictors of patient prognosis and treatment response. While survival analysis methods are available to
address the issues of high dimensionality and signal sparsity, research is still lacking on the issue of data
artifacts associated with disparate experimental handling, which is a pivotal feature of transcriptomics data.
Published studies often deal with handling artifacts by borrowing methods that were developed for differential
expression analysis, the most popular of which are quantile normalization for microarray data and scaling
normalization for sequencing data. Despite the unfounded optimism for such ‘off-label’ uses, we found that
normalization may distort a marker’s ordering across samples and subsequently compromise the detection of
outcome-associated markers and the accuracy of outcome prediction. Thus, there is a pressing need to
re-evaluate existing methods for dealing with these data artifacts and to tailor new ones specifically for the derivation
of molecular prognosticators, so that this derivation can be done accurately and reproducibly. In this proposal, we will first fill
the knowledge gap for microRNAs (a class of small RNAs that play an important regulatory role in gene
expression in humans) using data that are realistically distributed and robustly benchmarked. We will then
develop new methods for managing handling artifacts, leveraging the survival regression framework. We will
assess the performance of the new methods in comparison with existing methods using simulation tools and
demonstrate their use with an application to ovarian cancer data from The Cancer Genome Atlas. Our project
is expected to advance the knowledge needed to optimize data harmonization for microRNA data, thereby
accelerating their reproducible translation into clinically useful predictors and paving the way for addressing
the same issues in RNA data and their translation.
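
The abstract's point that normalization may distort a marker's ordering across samples can be illustrated with a toy case. The sketch below is a minimal, plain-Python version of textbook quantile normalization, not the project's own code; the function name `quantile_normalize` and the two three-marker samples are hypothetical values chosen for illustration, and ties are assumed absent.

```python
def quantile_normalize(samples):
    """Quantile-normalize a list of samples (each a list of marker values).

    Illustrative sketch only: assumes equal-length samples with no tied values.
    """
    n_markers = len(samples[0])
    sorted_samples = [sorted(s) for s in samples]
    # Reference distribution: mean across samples of the value at each rank.
    reference = [
        sum(s[rank] for s in sorted_samples) / len(samples)
        for rank in range(n_markers)
    ]
    normalized = []
    for s in samples:
        # Marker indices ordered from smallest to largest value in this sample.
        order = sorted(range(n_markers), key=lambda i: s[i])
        out = [0.0] * n_markers
        for rank, marker in enumerate(order):
            out[marker] = reference[rank]
        normalized.append(out)
    return normalized


# Hypothetical data: marker 0 is the highest value in sample_1
# but the lowest value in sample_2.
sample_1 = [5.0, 1.0, 3.0]
sample_2 = [6.0, 8.0, 9.0]
norm_1, norm_2 = quantile_normalize([sample_1, sample_2])

print(sample_1[0] < sample_2[0])  # raw: marker 0 is lower in sample_1
print(norm_1[0] > norm_2[0])      # normalized: marker 0 is higher in sample_1
```

In this toy case the raw value of marker 0 is lower in the first sample, but after normalization it is higher, because each value is replaced according to its within-sample rank. Whether such a reversal removes an artifact or distorts real signal depends on the source of the between-sample shift, which is the kind of question the project sets out to evaluate.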

## Key facts

- **NIH application ID:** 10303963
- **Project number:** 1R21HG012124-01
- **Recipient organization:** SLOAN-KETTERING INST CAN RESEARCH
- **Principal Investigator:** Li-Xuan Qin
- **Activity code:** R21
- **Funding institute:** NIH
- **Fiscal year:** 2021
- **Award amount:** $496,342
- **Award type:** 1
- **Project period:** 2021-09-03 → 2025-02-28

## Primary source

NIH RePORTER: https://reporter.nih.gov/project-details/10303963

## Citation

> US National Institutes of Health, RePORTER application 10303963, Evaluation and Development of Statistical Methods for Data Harmonization in Molecular Prognostication (1R21HG012124-01). Retrieved via AI Analytics 2026-05-26 from https://api.ai-analytics.org/grant/nih/10303963. Licensed CC0.

---

*[NIH grants dataset](/datasets/nih-grants) · CC0 1.0*
