Race/Ethnicity-Specific Algorithms of Chronic Stress Exposures for Preterm Birth Risk: Machine Learning Approach

NIH RePORTER · NIH · K01 · $144,503

Abstract

Racial/ethnic disparities in preterm birth (PTB) are persistent in the U.S., with a higher prevalence of PTB in non-Hispanic (N-H) Black women than their N-H White counterparts. However, the underlying mechanism of such Black-White differences is not well understood. Even extensive biomedical, behavioral, and sociodemographic risk factors can explain only about half of PTB incidence. Chronic stress has received significant attention as a robust predictor of PTB, particularly among racial/ethnic minority groups. Nevertheless, the literature shows inconsistent evidence on the relationships among race/ethnicity, chronic stress, and PTB, mainly because of the complexities involved in assessing women’s chronic stress exposures. Accurate chronic stress measures should capture the nature of stressors: cumulative, interactive, and population-specific. In this regard, conventional statistical models (e.g., linear regression) have limited ability to model chronic stress exposures with high precision. Thus, this study will adopt machine learning (ML), a state-of-the-art modeling technique, to compute non-linear and synergistic relationships among chronic stressors, detect unknown patterns, and reflect subtle differences in chronic stressors between N-H White and N-H Black women for more accurate prediction of their PTB risk. I will develop simple, accurate, and explainable ML algorithms of chronic stress exposures by building a hybrid algorithm specific to N-H White and N-H Black women and computing SHAP (SHapley Additive exPlanations) values. Specifically, the hybrid algorithm will combine Multivariate Adaptive Regression Splines (MARS) and Deep Neural Network (DNN) algorithms, where MARS will select only “important” chronic stressor variables for each race/ethnicity to serve as the DNN’s input features for PTB risk prediction. Additionally, a SHAP value for each chronic stressor in the final algorithm will quantify its degree of contribution to the predicted PTB risk.
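To make the idea of a per-stressor contribution concrete, the sketch below computes exact Shapley values, the quantity that SHAP values are built on, for a single hypothetical woman. Everything in it is an illustrative assumption rather than part of the study: the three stressor names, the coefficients in `toy_risk`, and the all-zero baseline are made up, and the toy function stands in for a trained model. It includes one synergistic term so that the interaction between two stressors is split between them.

```python
from itertools import combinations
from math import factorial


def shapley_values(predict, x, baseline):
    """Exact Shapley values for one instance x, relative to a baseline input."""
    n = len(x)
    features = range(n)

    def value(subset):
        # Features in `subset` take their observed value; the rest stay at baseline.
        z = [x[i] if i in subset else baseline[i] for i in features]
        return predict(z)

    phi = [0.0] * n
    for i in features:
        others = [j for j in features if j != i]
        for k in range(len(others) + 1):
            # Shapley weight for coalitions of size k that exclude feature i.
            weight = factorial(k) * factorial(n - k - 1) / factorial(n)
            for subset in combinations(others, k):
                s = set(subset)
                phi[i] += weight * (value(s | {i}) - value(s))
    return phi


def toy_risk(z):
    """Hypothetical risk score over three binary stressor indicators (made-up numbers)."""
    financial, partner, trauma = z
    # The product term is a synergistic effect of two stressors occurring together.
    return 0.05 + 0.02 * financial + 0.03 * partner + 0.04 * financial * partner


x = [1, 1, 0]          # hypothetical woman: financial and partner stress, no trauma
baseline = [0, 0, 0]   # assumed reference: no stressors reported
phi = shapley_values(toy_risk, x, baseline)

for name, contribution in zip(["financial", "partner", "trauma"], phi):
    print(f"{name}: {contribution:+.3f}")
```

The contributions sum to the difference between the predicted risk for `x` and for the baseline, which is what allows each value to be read as that stressor's share of the prediction. Exact enumeration is only feasible for a handful of features, so a real analysis would rely on an approximation.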
The ML algorithms will be trained and tested on a large national database, the Pregnancy Risk Assessment Monitoring System (2012-2017), collected by 37 U.S. states. The study’s specific aims are to 1) compare the accuracy among logistic regression (LR) and two ML algorithms (DNN and hybrid) of chronic stress exposures to predict PTB risk using area under the receiver operating characteristic curve (AUC); 2) compare the accuracy between race/ethnicity-combined and race/ethnicity-specific models within LR, DNN, and hybrid algorithms; and 3) determine the extent of the importance of chronic stressors to the predicted PTB risk in the best-performing algorithm using regression coefficients (for LR) or SHAP values (for an ML algorithm). Career development goals are to 1) develop expertise in stress measurement in the context of maternal and child health, 2) acquire knowledge and skills in ML and the analysis of large-scale data, and 3) cultivate health informatics-focused manuscript and grant preparation skills ...
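The AUC comparison named in the aims can be illustrated with a minimal sketch. It computes AUC directly from its rank interpretation: the probability that a randomly chosen PTB case receives a higher predicted risk than a randomly chosen non-case. The labels and the two score lists are made-up numbers for two hypothetical models, not study data or results.

```python
def auc(labels, scores):
    """AUC as the share of (case, non-case) pairs ranked correctly; ties count 0.5."""
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("AUC needs at least one positive and one negative case")
    wins = 0.0
    for p in pos:
        for q in neg:
            if p > q:
                wins += 1.0
            elif p == q:
                wins += 0.5
    return wins / (len(pos) * len(neg))


# Made-up outcomes (1 = preterm birth) and predicted risks from two hypothetical models.
labels = [1, 0, 1, 0, 0, 1, 0, 0]
scores_a = [0.30, 0.20, 0.25, 0.35, 0.10, 0.40, 0.15, 0.28]
scores_b = [0.45, 0.20, 0.28, 0.30, 0.10, 0.50, 0.15, 0.25]

auc_a = auc(labels, scores_a)
auc_b = auc(labels, scores_b)
print(f"model A AUC: {auc_a:.3f}")
print(f"model B AUC: {auc_b:.3f}")
```

The same function could be applied separately to each racial/ethnic subgroup to compare combined and group-specific models. The pairwise loop is quadratic in sample size, so a dataset of realistic size would call for a sort-based implementation instead.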

Key facts

NIH application ID: 10448093
Project number: 1K01NR019651-01A1
Recipient: EMORY UNIVERSITY
Principal Investigator: Sangmi Kim
Activity code: K01
Funding institute: NIH
Fiscal year: 2022
Award amount: $144,503
Award type: 1
Project period: 2022-05-11 → 2025-04-30