PROJECT SUMMARY

Previous studies show discrepancies in health and behavior prevalence between American Indian (AI) populations and other racial or ethnic groups. Most health surveys are of limited use for studying AIs because their AI sample sizes are small. Data collected by the Cherokee Nation (CN) Health Survey provide an excellent opportunity for AI research because the sample is large and the survey contains extensive information. However, the CN Health Survey covered only CN citizens who used CN clinics, so without proper adjustment the sample may suffer from sampling, coverage, and nonresponse errors. Such difficulties greatly hamper the analysis of AI populations in health and behavior research.

Our general hypothesis is that data integration, which combines information from non-probability and probability samples, can reduce sampling, coverage, and nonresponse errors in the original non-probability sample. The goal of this project is to develop an accurate and robust data integration methodology for AI population analysis, specifically tailored to health and behavior research, and to disseminate the methodology to local stakeholders. In recent years, we have: 1) studied data integration using calibration and parametric modeling approaches; 2) investigated machine learning and propensity score modeling methods in survey sampling and other fields; and 3) assembled an experienced multi-disciplinary team of experts.

In this project, we propose to capitalize on our expertise and fulfill the following Specific Aims:

Aim 1. Develop and evaluate our proposed novel data integration approaches based on machine learning and propensity score modeling. We will use real data to validate the proposed methods in terms of accuracy and robustness across data types. Performance will also be assessed by comparing the results with those from existing data integration methods, such as calibration and parametric modeling approaches.
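To make the propensity score idea behind Aim 1 concrete, the following is a minimal sketch and not the project's actual method. It assumes a logistic participation model, a simple random reference sample with equal design weights, and fully synthetic data; the function name `fit_participation_model` and all variable names are hypothetical. The participation model is fit with a pseudo-likelihood in which the weighted probability sample stands in for the unobserved population, and the non-probability sample is then reweighted by the inverse of the estimated participation probabilities.

```python
import numpy as np

rng = np.random.default_rng(0)


def expit(z):
    """Logistic function."""
    return 1.0 / (1.0 + np.exp(-z))


def fit_participation_model(x_nonprob, x_ref, d_ref, n_iter=50, tol=1e-8):
    """Fit a logistic participation model by Newton's method.

    Solves the pseudo-score equation
        sum_{nonprob} x_i - sum_{ref} d_i * pi(x_i) * x_i = 0,
    where the design-weighted reference sample estimates the population sum.
    """
    theta = np.zeros(x_nonprob.shape[1])
    for _ in range(n_iter):
        pi_ref = expit(x_ref @ theta)
        score = x_nonprob.sum(axis=0) - (d_ref * pi_ref) @ x_ref
        info = (x_ref * (d_ref * pi_ref * (1.0 - pi_ref))[:, None]).T @ x_ref
        step = np.linalg.solve(info, score)
        theta = theta + step
        if np.max(np.abs(step)) < tol:
            break
    return theta


# Synthetic population (assumption: purely illustrative, not survey data).
N = 20000
x = rng.normal(size=N)
y = 2.0 + x + rng.normal(size=N)
design = np.column_stack([np.ones(N), x])

# Non-probability sample: units with larger x are more likely to participate.
in_nonprob = rng.random(N) < expit(-2.0 + 1.0 * x)

# Probability (reference) sample: simple random sample with known weights.
n_ref = 1000
ref_idx = rng.choice(N, size=n_ref, replace=False)
d_ref = np.full(n_ref, N / n_ref)

theta_hat = fit_participation_model(design[in_nonprob], design[ref_idx], d_ref)
weights = 1.0 / expit(design[in_nonprob] @ theta_hat)

pop_mean = y.mean()
naive = y[in_nonprob].mean()
ipw = np.sum(weights * y[in_nonprob]) / np.sum(weights)

print(f"population mean: {pop_mean:.3f}")
print(f"naive mean:      {naive:.3f}")
print(f"weighted mean:   {ipw:.3f}")
```

In this toy setting the unweighted mean of the non-probability sample is pulled toward units that are more likely to participate, while the weighted mean moves back toward the population value. The sketch uses only covariates from the reference sample, which mirrors the usual situation in which the outcome is observed only in the non-probability sample.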
The planned study takes advantage of a unique data source and expands the impact of Indian Health Service (IHS)-funded research. We expect this novel integration method to vertically advance the field by facilitating analysis based on non-probability samples, which can provide an in-depth understanding of AI population-based health and behavior studies.

Aim 2. Develop county-level small area estimation (SAE) models and examine the association of SAE estimates with county-level geographic and health-related environmental information. We will compare the SAE-based estimates with the direct estimates obtained in Aim 1. A multilevel model will be built to examine the association between health-related outcomes and county-level geographic and environmental factors.

Aim 3. Disseminate our research products to local and national stakeholders. After CN IRB approval, we will disseminate our proposed methods, usage of our data files, and computational codes...