Novel machine learning and missing data methods for improving estimates of physical activity, sedentary behavior and sleep using accelerometer data

NIH RePORTER · NIH · R01 · $328,975

Abstract

PROJECT SUMMARY We propose novel statistical and machine learning methods for processing and analyzing accelerometer data to study physical activity, sedentary behavior, and sleep and their effects on outcomes such as cardiovascular health. Methods to accurately estimate and characterize physical activity, sedentary behavior, and sleep are crucially needed. Accelerometers have been widely adopted as the standard objective measure of movement in free-living humans. Recent advances have spawned instruments that collect enormous amounts of data, far outpacing the research community’s ability to meaningfully interpret them. Current studies rely on outdated methods for identifying non-wear and addressing missing data, potentially yielding biased and inefficient estimates of relationships between behavioral activity patterns and outcomes. Importantly, methods for distinguishing between non-wear periods and periods of sedentary behavior or sleep have not been validated against a gold standard in free-living contexts. A statistically valid approach to handling non-wear periods that exploits the multivariate and time-series nature of the data has yet to be developed. Thus, new methods are needed to address these gaps. We propose developing and validating an ensemble classifier to distinguish non-wear time from sedentary behavior and sleep. We will adapt and validate multiple imputation methods that exploit the multivariate and time-series nature of the data to handle non-wear time in analyses that make use of entire profiles of physical activity. Specifically, we will evaluate methods for incorporating multiple imputation of missing data from non-wear into adaptive clustering algorithms that identify distinct patterns of sleep and activity, and then relate those patterns to outcomes in a generalized linear mixed effects model framework. We will create open-source, user-friendly software that can be adopted and enhanced by the research community.
Our approach integrates three novel data resources to develop our methods: two with knowledge of true activity and non-wear, and a third generated from a unique four-year longitudinal time series for both accelerometry and cardiovascular risk factor measures in a real-world setting. It offers an opportunity to develop and illustrate methods using data generated from wearable devices in a natural environment that includes missing data. This is the first study to incorporate missing data methods into learning algorithms under a generalized linear mixed effects model framework for accelerometer studies. Such methods will be critical for both observational and clinical trial research in real-world settings, where wear and non-wear time are not directly observed. The resulting insights and tools will also be highly applicable to the processing and analysis of other types of intensively sampled serial data, such as those generated from mobile digital devices.

Key facts

NIH application ID: 10120571
Project number: 1R01LM013355-01A1
Recipient: STANFORD UNIVERSITY
Principal Investigator: MANISHA DESAI
Activity code: R01
Funding institute: NIH
Fiscal year: 2021
Award amount: $328,975
Award type: 1
Project period: 2021-05-03 → 2025-01-31