Efficient Statistical Learning Methods for Personalized Medicine Using Large Scale Biomedical Data

NIH RePORTER · NIH · R01 · $331,147

Abstract

Project Summary: Coronavirus disease 2019 (COVID-19) has created a major public health crisis around the world. The novel coronavirus was observed to have a long incubation period and to be extremely infectious during this period. No proven effective treatment or vaccine is available. Massive public interventions have been implemented in many countries and in states across the United States (US) at different phases of the outbreak, with varying combinations of social distancing, mobility restriction, and population behavioral change. Decisions on how to implement these interventions (e.g., when to impose and relax mitigation measures) rely on important statistics of COVID epidemiology (e.g., the effective reproduction number) that characterize and predict the course of the COVID-19 outbreak. However, there is a lack of robust and parsimonious models of the COVID epidemic that can accurately reflect the heterogeneity between susceptible populations and regions (e.g., demographics, healthcare capacity, social and economic determinants). There is no rigorous study to guide precision public health interventions that are tailored to the characteristics of a population or region. Furthermore, due to the non-randomized nature of public health interventions, it is critical to account for biases and confounding when comparing mitigation measures of COVID-19 across regions. To address these challenges, this project develops robust and generalizable analytic methods to evaluate public health interventions and assess individual patient risks of COVID-19 infection and complications. In Aim 1, we will develop dynamic and robust statistical models to predict the disease epidemic. The models will estimate the date of the first unknown infection case and the instantaneous effective reproduction number, and will account for the incubation period of the COVID-19 virus.
Furthermore, heterogeneity in populations' demographics, social and economic indicators, healthcare capacity, and geographic locations will be incorporated to reflect their impacts on the COVID epidemic. Under a longitudinal quasi-experimental design, we will provide valid inference for comparing public health interventions implemented in different regions while accounting for confounding bias. Multiple sources of data from different states in the US will be analyzed to empirically test which states' response strategies are more effective and in which subpopulations. In Aim 2, we will focus on developing a precise risk assessment tool for individual COVID-19 patients using electronic health records (EHRs) collected at New York Presbyterian Hospital in New York City, an epicenter of COVID-19. We will engineer features of patients' pre-existing conditions associated with severe COVID complications, recovery, or death. More importantly, we will engineer features that represent proxies for virus exposure from patients' geographic information. We will use machine learning techniques to create quantitative summaries of patient prognosis (e.g., transit...

Key facts

NIH application ID: 10161345
Project number: 3R01GM124104-03S1
Recipient: UNIV OF NORTH CAROLINA CHAPEL HILL
Principal Investigator: Yuanjia Wang
Activity code: R01
Funding institute: NIH
Fiscal year: 2020
Award amount: $331,147
Award type: 3
Project period: 2018-04-01 → 2022-03-21