# Semi-parametric Statistical Methods for Predicting High-cost VA Patients Using High-Dimensional Covariates

> **NIH VA I01** · VA PUGET SOUND HEALTHCARE SYSTEM · 2020 · —

## Abstract

Background: Rising demand and health care costs make it urgent to develop new statistical
methods that accurately predict high-cost VA patients and identify the risk factors associated with
high costs. The ability to prospectively predict high-cost patients is an important step toward
controlling future health care costs. It is also important to identify the disease areas that contribute
most to high health care costs, along with other risk factors that policy makers can target with future
interventions. Health care cost data are characterized by a high level of skewness and heteroscedastic
variances. The large number of variables collected in VA databases provides rich information but also
poses great challenges for statistical analysis and computation. The administrative and electronic
medical record data in VA databases often contain missing values. The new statistical procedure we
propose aims to take advantage of the rich VA databases for analyzing cost data. It employs and
develops state-of-the-art high-dimensional semiparametric statistical procedures to handle the
complexity of VA data sets.

Objectives: The project aims to develop a High Costs Prediction (HCP) system, which employs novel
high-dimensional semiparametric statistical methods and algorithms to analyze large VA databases with
missing values and censoring. The HCP system identifies potential high-cost patients, provides
prediction intervals for future costs, and suggests a list of important risk factors for cost control.
The outcomes of the project will help VA researchers and policy makers design effective interventions
that target potential high-cost patients and reduce their costs without sacrificing quality of care. The
project will collaborate closely with the VA Office of Analytics and Business Intelligence (OABI) to
analyze cost data for patients receiving primary care within VHA. In particular, we will identify a set
of modifiable risk factors (MRF) that are simultaneously important for improving care and reducing
costs. Our proposed work fills an important gap in the analysis of VA health care cost data. By
combining the HCP system with the existing Care Assessment Needs Scoring (CAN) system, we will make
important progress toward the ultimate goal of building a data-driven decision support system.

Methods: The project will develop a novel semiparametric procedure for predicting high-cost patients.
The proposed approach incorporates high-dimensional covariates and nonlinear covariate effects
and addresses the challenge of censoring by death, which improves accuracy and increases modeling
flexibility. It does not require discretizing the cost and hence fully uses the information
contained in the cost data. It does not require any parametric distributional assumption. Another major
contribution of this project is that we propose novel variable selection procedures based on weighted
semiparametric quantile regression, which can ...
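
To make the quantile-based idea concrete, the following is a minimal sketch, not the project's procedure. It shows the check (pinball) loss that underlies quantile regression and a weighted sample quantile that minimizes it, which is why an upper quantile of cost can be targeted directly without discretizing costs or assuming a parametric distribution. The function names, the toy cost values, and the weights are all hypothetical; the abstract does not say how the weights are constructed, so the zero weight below is only a placeholder for down-weighting an observation.

```python
import numpy as np


def check_loss(residuals, tau):
    """Quantile (pinball) loss: rho_tau(u) = u * (tau - 1[u < 0])."""
    residuals = np.asarray(residuals, dtype=float)
    return residuals * (tau - (residuals < 0))


def weighted_objective(q, costs, weights, tau):
    """Weighted check loss of a candidate quantile value q."""
    costs = np.asarray(costs, dtype=float)
    weights = np.asarray(weights, dtype=float)
    return float(np.sum(weights * check_loss(costs - q, tau)))


def weighted_quantile(costs, weights, tau):
    """Smallest observed cost whose cumulative weight share reaches tau.

    This value is a minimizer of weighted_objective over q.
    """
    costs = np.asarray(costs, dtype=float)
    weights = np.asarray(weights, dtype=float)
    order = np.argsort(costs)
    cum_share = np.cumsum(weights[order]) / weights.sum()
    idx = np.searchsorted(cum_share, tau, side="left")
    return float(costs[order][min(idx, len(costs) - 1)])


# Hypothetical, made-up annual costs (in thousands) with one extreme value,
# mimicking the skewness the abstract describes.
toy_costs = np.array([1.0, 2.0, 3.0, 4.0, 100.0])
equal_weights = np.ones_like(toy_costs)
# Placeholder weights: the last observation is given zero weight.
placeholder_weights = np.array([1.0, 1.0, 1.0, 1.0, 0.0])

median_cost = weighted_quantile(toy_costs, equal_weights, 0.5)
upper_cost = weighted_quantile(toy_costs, equal_weights, 0.9)
reweighted_upper = weighted_quantile(toy_costs, placeholder_weights, 0.9)
```

In this toy setting the mean is pulled far above the median by the single extreme value, while the 0.5 and 0.9 quantiles describe the center and the upper tail separately. A pair of lower and upper quantiles can also be read as a simple prediction interval. A regression version would replace the constant `q` with a function of covariates and minimize the same weighted loss.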

## Key facts

- **NIH application ID:** 9695867
- **Project number:** 5I01HX002310-02
- **Recipient organization:** VA PUGET SOUND HEALTHCARE SYSTEM
- **Principal Investigator:** Steven Bacchus Zeliadt
- **Activity code:** I01
- **Funding institute:** VA
- **Fiscal year:** 2020
- **Award amount:** —
- **Award type:** 5
- **Project period:** 2018-05-01 → 2021-04-30

## Primary source

NIH RePORTER: https://reporter.nih.gov/project-details/9695867

## Citation

> US National Institutes of Health, RePORTER application 9695867, Semi-parametric Statistical Methods for Predicting High-cost VA Patients Using High-Dimensional Covariates (5I01HX002310-02). Retrieved via AI Analytics 2026-05-25 from https://api.ai-analytics.org/grant/nih/9695867. Licensed CC0.

---

*[NIH grants dataset](/datasets/nih-grants) · CC0 1.0*
