# An Imputation-Consistency Algorithm for Biomedical Complex Data Analysis

> **NIH R01** · PURDUE UNIVERSITY · 2021 · $285,919

## Abstract

The dramatic improvement in data collection and acquisition technologies over the past decades has enabled scientists to collect vast amounts of health-related data from biomedical studies. If analyzed properly, these data will expand our knowledge and allow new hypotheses to be tested about disease management, from diagnosis to prevention to personalized treatment. However, biomedical data can be rather complex, and analyzing them poses many challenges for existing methods. This proposal attempts to address three fundamental challenges: (i) Missing data are ubiquitous in biomedical research. How can complex biomedical data be put to sufficient use in the presence of missing values? (ii) Growing data size typically brings growing complexity, both in the patterns in the data and in the models needed to account for those patterns. What is a general recipe for estimating the parameters of complex models? (iii) Biomarker identification from high-throughput omics data has been one of the major focuses of cancer research. Yet despite intense effort, the number of biomarkers approved by the FDA each year for clinical use is still in the single digits. An important factor contributing to this failure is the lack of appropriate statistical methods for analyzing such heterogeneous and high-dimensional data. Toward a sufficient use of complex biomedical data, this project proposes an imputation-consistency algorithm as a general algorithm for high-dimensional missing data problems. The algorithm is then extended to address the other two challenges under the principles of conditioning and consistency; in particular, this project proposes efficient and effective statistical algorithms that address the heterogeneity and high-dimensionality issues encountered in biomarker identification and eQTL analysis. The proposed algorithms are applied to (i) select anticancer drug-sensitive genes with the CCLE and SANGER data, (ii) identify prognostic mRNA biomarkers for multiple types of cancer using the TCGA data, (iii) conduct eQTL analysis for multiple types of cancer using the TCGA data, and (iv) identify informative circulating biomarkers for type 1 diabetes. The proposed methods are efficient and general, and can be applied to other types of disease as well. Statistically, this project aims to develop general, effective, and efficient algorithms for complex data analysis; biomedically, it aims to improve the accuracy of biomarker identification from omics data, which would advance the understanding of molecular mechanisms and the development of precision medicine.

## Key facts

- **NIH application ID:** 10073522
- **Project number:** 5R01GM126089-05
- **Recipient organization:** PURDUE UNIVERSITY
- **Principal Investigator:** FAMING LIANG
- **Activity code:** R01
- **Funding institute:** NIH
- **Fiscal year:** 2021
- **Award amount:** $285,919
- **Award type:** 5
- **Project period:** 2018-01-01 → 2023-12-31

## Primary source

NIH RePORTER: https://reporter.nih.gov/project-details/10073522

## Citation

> US National Institutes of Health, RePORTER application 10073522, An Imputation-Consistency Algorithm for Biomedical Complex Data Analysis (5R01GM126089-05). Retrieved via AI Analytics 2026-05-21 from https://api.ai-analytics.org/grant/nih/10073522. Licensed CC0.

---

*[NIH grants dataset](/datasets/nih-grants) · CC0 1.0*
