# Using a Novel Machine Learning Based Data Integration Procedure to Understand the Cherokee Nation Community Population Health

> **NIH S06** · CHEROKEE NATION · 2021 · $57,750

## Abstract

Previous studies show differences in health and behavior prevalence between American Indian (AI)
populations and other racial or ethnic groups. Most health surveys are of limited use for studying AIs because
their AI sample sizes are small. Data collected by the Cherokee Nation (CN) Health Survey provide
an excellent opportunity for AI research, since the sample is large and the survey contains
extensive information. However, the CN Health Survey covered only CN citizens who used CN clinics, so
the sample may suffer from sampling, coverage, and nonresponse errors unless it is properly
adjusted. Such difficulties greatly hamper the analysis of AI populations in health and behavior research.
Our general hypothesis is that data integration, which combines information from non-probability and probability
samples, can reduce sampling, coverage, and nonresponse errors in the original non-probability sample. The
goal of this project is to develop an accurate and robust data integration methodology for AI population analysis,
specifically tailored to health and behavior research, and to disseminate the methodology to local stakeholders.
In recent years, we have: 1) studied data integration using calibration and parametric modeling approaches; 2)
investigated machine learning and propensity score modeling methods in survey sampling and other fields; and
3) assembled an experienced multi-disciplinary team of experts.
In this project, we propose to capitalize on our expertise and fulfill the following Specific Aims:
Aim 1. Develop our proposed novel data integration approaches, based on machine learning
and propensity score modeling, and evaluate them using real data.
We will use real data to validate the proposed methods in terms of accuracy and robustness across data
types. Performance will also be assessed by comparison with results from existing data integration methods
such as calibration and parametric modeling approaches. The planned study takes advantage of a unique data
source and expands the impact of Indian Health Service (IHS)-funded research. We expect this novel integration
method will vertically advance the field by facilitating the analysis based on non-probability samples, which can
provide in-depth understanding regarding AI population-based health and behavior studies.
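
To make the idea behind Aim 1 concrete, here is a minimal sketch of one common pseudo-weighting device, not the project's own method. It stacks a non-probability sample with a design-weighted probability (reference) sample, fits a propensity model for membership in the non-probability sample, and reweights the non-probability units by the inverse odds. Everything here is an assumption made for illustration: the function names, the single binary covariate, the toy data, and the use of plain logistic regression where a more flexible machine learning model could be substituted.

```python
import math

def fit_weighted_logistic(xs, zs, ws, iters=25):
    """Newton-Raphson fit of P(z=1 | x) = sigmoid(b0 + b1*x) with case weights."""
    b0, b1 = 0.0, 0.0
    for _ in range(iters):
        g0 = g1 = h00 = h01 = h11 = 0.0
        for x, z, w in zip(xs, zs, ws):
            p = 1.0 / (1.0 + math.exp(-(b0 + b1 * x)))
            r = w * (z - p)          # weighted score contribution
            v = w * p * (1.0 - p)    # weighted information contribution
            g0 += r
            g1 += r * x
            h00 += v
            h01 += v * x
            h11 += v * x * x
        det = h00 * h11 - h01 * h01
        # Newton step: beta += H^{-1} g for the 2x2 information matrix H.
        b0 += (h11 * g0 - h01 * g1) / det
        b1 += (h00 * g1 - h01 * g0) / det
    return b0, b1

def pseudo_weighted_mean(np_x, np_y, ref_x, ref_d):
    """Mean of y in the non-probability sample after inverse-odds weighting."""
    # Stack the samples: z=1 marks non-probability units (weight 1),
    # z=0 marks reference units (weight = survey design weight).
    xs = list(np_x) + list(ref_x)
    zs = [1] * len(np_x) + [0] * len(ref_x)
    ws = [1.0] * len(np_x) + list(ref_d)
    b0, b1 = fit_weighted_logistic(xs, zs, ws)
    weights = []
    for x in np_x:
        p = 1.0 / (1.0 + math.exp(-(b0 + b1 * x)))
        weights.append((1.0 - p) / p)  # inverse odds of being in the non-probability sample
    return sum(w * y for w, y in zip(weights, np_y)) / sum(weights)

# Toy data, invented for illustration only.
np_x = [1, 1, 1, 1, 0, 0]          # covariate in the non-probability sample
np_y = [1, 1, 1, 0, 0, 1]          # binary health outcome in that sample
ref_x = [1, 0, 0, 1]               # same covariate in the reference sample
ref_d = [10.0, 20.0, 20.0, 10.0]   # reference-sample design weights

naive = sum(np_y) / len(np_y)
adjusted = pseudo_weighted_mean(np_x, np_y, ref_x, ref_d)
print(f"naive mean: {naive:.4f}, pseudo-weighted mean: {adjusted:.4f}")
```

In this toy setup the group with `x = 1` is over-represented in the non-probability sample relative to the reference sample, so the pseudo-weights pull the estimate toward the under-represented group. With only one binary covariate the adjustment coincides with post-stratification; the propensity formulation matters once there are many covariates.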
Aim 2. Develop county-level small area estimation (SAE) models and examine the association of SAE
estimates with county-level geographic and health-related environmental information.
We will compare the SAE-based estimates with the direct estimates obtained in Aim 1. A multi-level model will be
built to examine the association of health-related outcomes with county-level geographic and
environmental factors.
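
As a rough illustration of how SAE estimates differ from direct estimates, the sketch below applies a simple area-level shrinkage (composite) estimator. It is not the model proposed in the project. It assumes an intercept-only model, sampling variances that are known for each county, and a between-county model variance `model_var` that is also treated as known. The function name and the county numbers are invented for the example.

```python
def shrinkage_estimates(direct, sampling_var, model_var):
    """Shrink each direct estimate toward a precision-weighted overall mean."""
    # Synthetic (model-based) component: overall mean weighted by total precision.
    prec = [1.0 / (model_var + v) for v in sampling_var]
    synthetic = sum(p * d for p, d in zip(prec, direct)) / sum(prec)
    estimates = []
    for d, v in zip(direct, sampling_var):
        # gamma near 1 trusts the direct estimate; gamma near 0 trusts the model.
        gamma = model_var / (model_var + v)
        estimates.append(gamma * d + (1.0 - gamma) * synthetic)
    return estimates

# Hypothetical county-level prevalence estimates and their sampling variances.
direct = [0.30, 0.10, 0.40]
sampling_var = [0.01, 0.01, 0.03]   # the third county has the noisiest estimate
sae = shrinkage_estimates(direct, sampling_var, model_var=0.01)
for d, s in zip(direct, sae):
    print(f"direct {d:.2f} -> shrunk {s:.3f}")
```

The county with the largest sampling variance is pulled hardest toward the overall mean, which is the basic trade-off between direct and model-based estimates. A fuller model would replace the overall mean with a regression on county-level covariates and estimate `model_var` from the data.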
Aim 3. Disseminate our research products to local and national stakeholders.
After CN IRB approval, we will disseminate our proposed methods, usage of our data files, and computational
code (e.g., SAS macros and/or R pack...

## Key facts

- **NIH application ID:** 10223769
- **Project number:** 1S06GM142119-01
- **Recipient organization:** CHEROKEE NATION
- **Principal Investigator:** Ashley Comiford
- **Activity code:** S06
- **Funding institute:** NIH
- **Fiscal year:** 2021
- **Award amount:** $57,750
- **Award type:** 1
- **Project period:** 2021-09-20 → 2025-07-31

## Primary source

NIH RePORTER: https://reporter.nih.gov/project-details/10223769

## Citation

> US National Institutes of Health, RePORTER application 10223769, Using a Novel Machine Learning Based Data Integration Procedure to Understand the Cherokee Nation Community Population Health (1S06GM142119-01). Retrieved via AI Analytics 2026-06-14 from https://api.ai-analytics.org/grant/nih/10223769. Licensed CC0.

---

*[NIH grants dataset](/datasets/nih-grants) · CC0 1.0*
