# Machine learning in publicly available geotagged data to allow monitoring of maternal and child health

> **NIH R01** · STANFORD UNIVERSITY · 2024 · $643,468

## Abstract

Over 95 percent of maternal and child deaths globally occur in low- and middle-income countries (LMICs)
where reliable death registries are mostly unavailable and other dependable data are scarce. Moreover, within
LMICs, the most disadvantaged and remote communities tend to have the highest mortality rates but the least
reliable data. Knowledge of the state of maternal and child health (MCH) in LMICs relies mostly on household
surveys. These expensive and time-consuming surveys cover merely a small minority (usually <2%) of all
communities in a country and are (at best) carried out only every couple of years. We propose a new approach
to measuring MCH indicators that would provide up-to-date estimates at a very high geographic resolution and
at little to no cost. Specifically, we hypothesize that machine learning applied to geotagged “big data” sources,
with satellite imagery as a key source, can accurately estimate critical MCH indicators for each village and
neighborhood in a country. Satellite imagery is a key data source in this project because it is updated
frequently (at least monthly), covers all areas of a country, and is available free of charge. We will pursue three
specific aims: 1) determining whether machine learning applied to satellite imagery and other publicly available
geotagged data can accurately estimate key indicators of MCH status in a village or neighborhood at a
snapshot in time and longitudinally over time; 2) determining whether this approach can also accurately
estimate coverage with critical MCH services at a snapshot in time and over time; and 3) achieving sufficient
interpretability of our machine learning models to show which additional data are needed to improve predictions,
which interventions to target where, and how well the models generalize. Our approach is to use household
surveys that are geocoded at the village and neighborhood level as the “ground truth” against which we will
train machine learning models on satellite images and other publicly available geotagged data. Our extensive
preliminary data, along with high-impact publications (e.g., in Science and Nature) by our team on using
satellite images to predict important determinants of MCH in LMICs (such as community-level poverty, surface
water quality, water and sanitation infrastructure, crop yields, travel time to the nearest healthcare facility, and
air pollution), demonstrate that this approach is feasible. Crucially, our predictions will improve over time as the
size and quality (e.g., the resolution of satellite imagery) of our data continue to increase. In addition to a
unique dataset and open-access code for our machine learning models, this project will provide maps
with an extremely high geographic resolution of key MCH indicators to inform policymakers and researchers on
the current state of MCH. This project is significant because it can inform the geographically precise planning
and targeting of MCH interventions, may enable the evaluation of past intervent...
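
The abstract describes the core method only at a high level: geocoded household-survey values act as labels, and a model learns to predict them from satellite-derived and other geotagged features. The Python sketch below is a minimal illustration under stated assumptions, not the project's pipeline. It assumes each village has already been summarized as a numeric feature vector (hypothetical image-derived features); it uses closed-form ridge regression as a stand-in, since the abstract does not name a model; and it scores the model by holding out whole groups of villages (for example, regions) at a time, a common precaution against spatial leakage that relates to the generalizability question in aim 3. All function names and the toy data are hypothetical.

```python
from statistics import mean


def solve(a, b):
    """Solve a @ w = b by Gauss-Jordan elimination with partial pivoting."""
    n = len(a)
    m = [list(row) + [b[i]] for i, row in enumerate(a)]  # augmented matrix
    for col in range(n):
        # Swap in the row with the largest pivot for numerical stability.
        piv = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[piv] = m[piv], m[col]
        # Eliminate this column from every other row.
        for r in range(n):
            if r != col:
                f = m[r][col] / m[col][col]
                for c in range(col, n + 1):
                    m[r][c] -= f * m[col][c]
    return [m[i][n] / m[i][i] for i in range(n)]


def fit_ridge(X, y, lam):
    """Closed-form ridge regression; the intercept w[0] is not penalized."""
    Xa = [[1.0] + list(row) for row in X]  # prepend an intercept column
    p = len(Xa[0])
    # Normal equations (Xa^T Xa + lam * I') w = Xa^T y, with I' skipping the intercept.
    A = [[sum(r[i] * r[j] for r in Xa) + (lam if i == j and i > 0 else 0.0)
          for j in range(p)] for i in range(p)]
    b = [sum(r[i] * yi for r, yi in zip(Xa, y)) for i in range(p)]
    return solve(A, b)


def predict(w, X):
    """Linear prediction: intercept plus weighted sum of features."""
    return [w[0] + sum(wi * xi for wi, xi in zip(w[1:], row)) for row in X]


def r2(y, yhat):
    """Coefficient of determination of predictions yhat against labels y."""
    ybar = mean(y)
    ss_res = sum((a - b) ** 2 for a, b in zip(y, yhat))
    ss_tot = sum((a - ybar) ** 2 for a in y)
    return 1.0 - ss_res / ss_tot


def leave_one_group_out(X, y, groups, lam):
    """Hold out each group (e.g. a region) in turn; return pooled held-out R^2."""
    preds = [None] * len(y)
    for g in sorted(set(groups)):
        train = [i for i, gi in enumerate(groups) if gi != g]
        test = [i for i, gi in enumerate(groups) if gi == g]
        w = fit_ridge([X[i] for i in train], [y[i] for i in train], lam)
        for i, p in zip(test, predict(w, [X[i] for i in test])):
            preds[i] = p
    return r2(y, preds)


if __name__ == "__main__":
    # Synthetic toy data: two hypothetical per-village features and a
    # noise-free linear "indicator"; NOT real survey or satellite data.
    X = [[0.1, 0.2], [0.4, 0.1], [0.3, 0.5], [0.8, 0.3],
         [0.6, 0.9], [0.2, 0.7], [0.9, 0.6], [0.5, 0.4]]
    y = [0.2 + 0.5 * a - 0.3 * b for a, b in X]
    regions = ["A", "A", "B", "B", "C", "C", "D", "D"]
    print(f"{leave_one_group_out(X, y, regions, lam=1e-9):.4f}")  # prints 1.0000
```

Because the toy labels are an exact linear function of the features, the held-out score is essentially perfect. Real survey-derived labels are noisy and spatially correlated, so scores on actual data would be lower, and comparing grouped splits with random splits is one way to gauge how much spatial leakage inflates apparent accuracy.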

## Key facts

- **NIH application ID:** 10800091
- **Project number:** 1R01HD111547-01A1
- **Recipient organization:** STANFORD UNIVERSITY
- **Principal Investigator:** Marshall Burke
- **Activity code:** R01
- **Funding institute:** NIH
- **Fiscal year:** 2024
- **Award amount:** $643,468
- **Award type:** 1 (new application)
- **Project period:** 2024-08-21 → 2029-04-30

## Primary source

NIH RePORTER: https://reporter.nih.gov/project-details/10800091

## Citation

> US National Institutes of Health, RePORTER application 10800091, Machine learning in publicly available geotagged data to allow monitoring of maternal and child health (1R01HD111547-01A1). Retrieved via AI Analytics 2026-05-27 from https://api.ai-analytics.org/grant/nih/10800091. Licensed CC0.

---

*[NIH grants dataset](/datasets/nih-grants) · CC0 1.0*
