# Efficient Statistical Learning Methods for Personalized Medicine Using Large Scale Biomedical Data

> **NIH R01** · UNIV OF NORTH CAROLINA CHAPEL HILL · 2020 · $331,147

## Abstract

Project Summary:
Coronavirus disease 2019 (COVID-19) has created a major public health crisis around the world. The novel
coronavirus was observed to have a long incubation period and to be extremely infectious during this period. No proven
effective treatment or vaccine is available. Massive public interventions have been implemented in many countries
and states in the United States (US) at different phases of the outbreak with varying combinations of social
distancing, mobility restriction and population behavioral change. Decisions on how to implement these interventions
(e.g., when to impose and relax mitigation measures) rely on important statistics of COVID epidemiology (e.g.,
effective reproduction number) that characterize and predict the course of the COVID-19 outbreak. However, there is
a lack of robust and parsimonious models of the COVID epidemic that can accurately reflect the heterogeneity between
susceptible populations and regions (e.g., demographics, healthcare capacity, social and economic determinants).
There is no rigorous study to guide precision public health interventions that are tailored to a population or region
depending on their characteristics. Furthermore, due to the non-randomized nature of public health interventions,
it is critical to account for biases and confounding when comparing mitigation measures of COVID-19 across
regions. To address these challenges, this project develops robust and generalizable analytic methods to evaluate
public health interventions and assess individual patient risks of COVID-19 infection and complications. In Aim 1,
we will develop dynamic and robust statistical models to predict the disease epidemic. The models will estimate
the date of the first unknown infection case and the instantaneous effective reproduction number, and account for the
incubation period of the COVID-19 virus. Furthermore, heterogeneity in population demographics, social and economic
indicators, healthcare capacity and geographic locations will be incorporated to reflect their impacts on the COVID
epidemic. Under a longitudinal quasi-experimental design, we will provide valid inference for comparing public
health interventions implemented in different regions while accounting for confounding bias. Multiple sources of
data from different states in the US will be analyzed to empirically test which states' response strategies are more
effective and in which subpopulations. In Aim 2, we will focus on developing a precise risk assessment tool for individual
COVID-19 patients using electronic health records (EHRs) collected at New York Presbyterian Hospital in New
York City, an epicenter of COVID-19. We will engineer features of patients' pre-existing conditions associated with severe
COVID complications, recovery, or death. More importantly, we will engineer features that represent proxies of virus
exposures from patients' geographic information. We will use machine learning techniques to create quantitative
summaries of patient prognosis (e.g., transit...

## Key facts

- **NIH application ID:** 10161345
- **Project number:** 3R01GM124104-03S1
- **Recipient organization:** UNIV OF NORTH CAROLINA CHAPEL HILL
- **Principal Investigator:** Yuanjia Wang
- **Activity code:** R01
- **Funding institute:** NIH
- **Fiscal year:** 2020
- **Award amount:** $331,147
- **Award type:** 3
- **Project period:** 2018-04-01 → 2022-03-21

## Primary source

NIH RePORTER: https://reporter.nih.gov/project-details/10161345

## Citation

> US National Institutes of Health, RePORTER application 10161345, Efficient Statistical Learning Methods for Personalized Medicine Using Large Scale Biomedical Data (3R01GM124104-03S1). Retrieved via AI Analytics 2026-05-24 from https://api.ai-analytics.org/grant/nih/10161345. Licensed CC0.

---

*[NIH grants dataset](/datasets/nih-grants) · CC0 1.0*