# Improving the representativeness of American Indian Tribal Behavioral Risk Factor Surveillance System (TBRFSS) by machine learning and propensity score based data integration approach A1

> **NIH R21** · UNIVERSITY OF OKLAHOMA HLTH SCIENCES CTR · 2020 · $115,176

## Abstract

PROJECT SUMMARY

Previous studies have shown discrepancies in the prevalence of health conditions and behaviors between the American Indian (AI) population and other racial or ethnic groups. Most health surveys are limited in their ability to study the AI population because their AI sample sizes are small. Data collected by AI Tribal Epidemiology Centers (TECs) provide an excellent opportunity to study the AI population because of their sufficient sample sizes and extensive information. However, most surveys conducted by TECs use non-probability sampling designs (e.g., convenience samples) because of their lower cost and greater time efficiency. Without proper adjustment, non-probability samples may suffer from sampling, coverage, and nonresponse errors. These difficulties greatly hamper the analysis of the AI population in health and behavior research.

Our general hypothesis is that data integration, which combines information from non-probability and probability samples, can reduce the sampling, coverage, and nonresponse errors of the original non-probability sample. The goal of this project is to develop an accurate and robust data integration methodology for AI population analysis, specifically tailored to health and behavior research.

In recent years, we have (1) studied data integration using calibration and parametric modeling approaches; (2) investigated machine learning and propensity score modeling methods in survey sampling and other fields; and (3) assembled an experienced multidisciplinary team of experts.

In this project, we propose to build on this expertise to accomplish the following Specific Aims:

**Aim 1. Develop a data integration approach using machine learning and propensity score modeling**

We will develop machine learning and propensity score based data integration approaches to combine information from non-probability and probability samples. Compared with existing methods (e.g., calibration and parametric modeling approaches), our proposed approaches are more robust to violations of the underlying model assumptions. The resulting inference is also more general and multipurpose: most parameters of interest, such as means, totals, and percentiles, can be estimated. Simulation studies will be performed to compare the proposed methods with existing methods. A computing package will be built so that the methods can be applied in other settings.

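
To make the general idea of propensity score based data integration concrete, the sketch below uses simulated data (not TBRFSS data) and a simple stacked-sample inverse-odds weighting estimator. This is one approach from the survey sampling literature and is not necessarily the method developed in this project. It assumes a covariate `x` observed in both samples, an outcome `y` observed only in the non-probability sample, and a probability sample drawn by simple random sampling with known design weights; the function names (`weighted_logistic`, `weighted_quantile`) and all simulation parameters are hypothetical. The resulting pseudo-weights are then used to estimate a mean, a total, and a percentile.

```python
import numpy as np


def expit(t):
    """Logistic function 1 / (1 + exp(-t))."""
    return 1.0 / (1.0 + np.exp(-t))


def weighted_logistic(X, z, w, n_iter=100, tol=1e-10):
    """Hypothetical helper: weighted logistic regression of z on X via Newton-Raphson."""
    beta = np.zeros(X.shape[1])
    for _ in range(n_iter):
        p = expit(X @ beta)
        grad = X.T @ (w * (z - p))                       # score of the weighted log-likelihood
        hess = X.T @ (X * (w * p * (1.0 - p))[:, None])  # negative Hessian
        step = np.linalg.solve(hess, grad)
        beta = beta + step
        if np.max(np.abs(step)) < tol:
            break
    return beta


def weighted_quantile(values, weights, q):
    """Hypothetical helper: smallest value whose weighted CDF reaches q."""
    order = np.argsort(values)
    v, w = values[order], weights[order]
    cdf = np.cumsum(w) / np.sum(w)
    i = min(int(np.searchsorted(cdf, q)), len(v) - 1)  # guard against rounding at q = 1
    return v[i]


# --- Hypothetical simulated population (assumption, not project data) ---
rng = np.random.default_rng(2020)
N = 20_000
x = rng.normal(size=N)              # covariate observed in both samples
y = 2.0 + x + rng.normal(size=N)    # outcome observed only in the non-probability sample

# Non-probability sample A: self-selection rises with x; pi_a is unknown to the analyst.
pi_a = expit(-3.0 + x)
in_a = rng.random(N) < pi_a
x_a, y_a = x[in_a], y[in_a]

# Probability sample B: simple random sample without replacement, design weight N / n_b.
n_b = 500
x_b = x[rng.choice(N, size=n_b, replace=False)]
d_b = np.full(n_b, N / n_b)

# Stack A (label 1, weight 1) on design-weighted B (label 0) and model membership in A.
X = np.column_stack([np.ones(len(x_a) + n_b), np.concatenate([x_a, x_b])])
z = np.concatenate([np.ones(len(x_a)), np.zeros(n_b)])
w = np.concatenate([np.ones(len(x_a)), d_b])
beta = weighted_logistic(X, z, w)

# Weighted B stands in for the whole population, so under a correct model the fitted
# odds p / (1 - p) estimate pi_a(x); their inverse serves as a pseudo-weight for A.
p_a = expit(X[: len(x_a)] @ beta)
pseudo_w = (1.0 - p_a) / p_a

true_mean = y.mean()
naive_mean = y_a.mean()
ipw_mean = np.sum(pseudo_w * y_a) / np.sum(pseudo_w)  # Hajek-type weighted mean
ipw_total = d_b.sum() * ipw_mean                       # total, using N-hat from sample B
ipw_median = weighted_quantile(y_a, pseudo_w, 0.5)

print(f"population mean {true_mean:.3f} | naive {naive_mean:.3f} | weighted {ipw_mean:.3f}")
print(f"population median {np.median(y):.3f} | weighted {ipw_median:.3f}")
print(f"population total {y.sum():.0f} | weighted estimate {ipw_total:.0f}")
```

A parametric logistic model is used here only for transparency. The machine learning variants described in Aim 1 would presumably replace `weighted_logistic` with a more flexible classifier, which is where robustness to a misspecified propensity model would come from.
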
**Aim 2. Evaluate the accuracy and robustness of the proposed method in AI health and behavior research**

We will use real data to validate the accuracy of the proposed methods and their robustness across various data types. Performance will also be assessed by comparison with results from existing data integration methods, such as calibration and parametric modeling approaches. The planned study takes advantage of a unique data source and expands the impact of research funded by the Indian Health Service (IHS). We expect this novel integration method to substantially advance the field by facilitating analyses based on non-probability samples, which can provid...

## Key facts

- **NIH application ID:** 10063407
- **Project number:** 1R21MD014658-01A1
- **Recipient organization:** UNIVERSITY OF OKLAHOMA HLTH SCIENCES CTR
- **Principal Investigator:** Sixia Chen
- **Activity code:** R21
- **Funding institute:** NIH
- **Fiscal year:** 2020
- **Award amount:** $115,176
- **Award type:** 1 (new application)
- **Project period:** 2020-09-26 → 2022-06-30

## Primary source

NIH RePORTER: https://reporter.nih.gov/project-details/10063407

## Citation

> US National Institutes of Health, RePORTER application 10063407, Improving the representativeness of American Indian Tribal Behavioral Risk Factor Surveillance System (TBRFSS) by machine learning and propensity score based data integration approach A1 (1R21MD014658-01A1). Retrieved via AI Analytics 2026-05-23 from https://api.ai-analytics.org/grant/nih/10063407. Licensed CC0.

---

*[NIH grants dataset](/datasets/nih-grants) · CC0 1.0*
