# Novel machine learning and missing data methods for improving estimates of physical activity, sedentary behavior and sleep using accelerometer data

> **NIH R01** · STANFORD UNIVERSITY · 2023 · $334,475

## Abstract

We propose novel statistical and machine learning methods for processing and analyzing accelerometer data
for studying physical activity, sedentary behavior, and sleep and their effects on outcomes such as
cardiovascular health. Methods to accurately estimate and characterize physical activity, sedentary
behavior and sleep are crucially needed. Accelerometers have been widely adopted as the standard
objective measure of movement in free-living humans. Recent advances have spawned instruments that
collect enormous amounts of data, and this volume has far outpaced the research community’s ability to
meaningfully interpret it. Current studies rely on outdated methods for identifying non-wear and addressing missing data,
potentially yielding biased and inefficient estimates of relationships between behavioral activity
patterns and outcomes. Importantly, methods for distinguishing between non-wear periods and those that
represent sedentary behavior or sleep have not been validated using a gold standard in free-living contexts.
A statistically valid approach to handling non-wear periods that exploits the multivariate and
time-series nature of the data has yet to be developed. Thus, new methods are needed to address current gaps.
We propose developing and validating an ensemble classifier to distinguish non-wear time. We will adapt and
validate multiple imputation methods that exploit the multivariate and time-series nature of the data to handle
non-wear time in analyses that make use of entire profiles of physical activity. Specifically, we will evaluate
methods for incorporating multiple imputation for handling missing data from non-wear when applying adaptive
clustering algorithms to identify distinct patterns of sleep and activity in order to relate them to outcomes in a
generalized linear mixed effects model framework. We will create open-source user-friendly software that can
be adopted and enhanced by the research community. Our approach integrates three novel data resources to
develop our methods – two with knowledge of true activity and non-wear, and a third generated from a unique
four-year longitudinal time series for both accelerometry and cardiovascular risk factor measures in a
real-world setting. This third resource offers an opportunity to develop and illustrate methods using data generated from wearable
devices in a natural environment that includes missing data. This is the first study to incorporate missing data
methods into learning algorithms under a generalized linear mixed effects model framework for accelerometer
studies. Such methods will be critical for both observational and clinical trial research in real-world settings,
where wear and non-wear time are not directly observed. The resulting insights and tools will also be highly
applicable to the processing and analysis of other types of intensively sampled serial data, such as those
generated from mobile digital devices.
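
As a rough illustration of the multiple imputation step mentioned above, the sketch below pools one coefficient across several imputed datasets using Rubin's rules, the standard combining rules for multiple imputation. It is not the project's software or method: the function name `pool_rubin` and all numeric values are hypothetical, and the sketch assumes each imputed dataset has already been analyzed separately (for example, with a mixed effects model) to yield one estimate and one squared standard error.

```python
from statistics import mean, variance


def pool_rubin(estimates, variances):
    """Pool one scalar estimate across m imputed datasets with Rubin's rules.

    estimates: point estimates, one per imputed dataset
    variances: squared standard errors, one per imputed dataset
    Returns (pooled_estimate, total_variance).
    """
    m = len(estimates)
    if m < 2 or m != len(variances):
        raise ValueError("need at least two imputations with matching variances")
    q_bar = mean(estimates)        # pooled point estimate
    within = mean(variances)       # average within-imputation variance
    between = variance(estimates)  # between-imputation variance (divisor m - 1)
    total = within + (1 + 1 / m) * between
    return q_bar, total


# Hypothetical results from five imputed datasets (illustrative numbers only).
estimates = [0.50, 0.54, 0.46, 0.52, 0.48]
variances = [0.010, 0.012, 0.011, 0.009, 0.013]
pooled, total_var = pool_rubin(estimates, variances)
print(f"pooled estimate = {pooled:.3f}, total variance = {total_var:.4f}")
```

The between-imputation term is what carries the extra uncertainty due to the missing non-wear data; analyzing a single imputed dataset would omit it and understate the standard error.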

## Key facts

- **NIH application ID:** 10548871
- **Project number:** 5R01LM013355-03
- **Recipient organization:** STANFORD UNIVERSITY
- **Principal Investigator:** MANISHA DESAI
- **Activity code:** R01
- **Funding institute:** NIH
- **Fiscal year:** 2023
- **Award amount:** $334,475
- **Award type:** 5
- **Project period:** 2021-05-03 → 2025-01-31

## Primary source

NIH RePORTER: https://reporter.nih.gov/project-details/10548871

## Citation

> US National Institutes of Health, RePORTER application 10548871, Novel machine learning and missing data methods for improving estimates of physical activity, sedentary behavior and sleep using accelerometer data (5R01LM013355-03). Retrieved via AI Analytics 2026-05-23 from https://api.ai-analytics.org/grant/nih/10548871. Licensed CC0.

---

*[NIH grants dataset](/datasets/nih-grants) · CC0 1.0*
