PROJECT SUMMARY Previous studies showed discrepancies of health and behavior prevalence between American Indians (AI) population and other racial or ethnic groups. Most health surveys have certain limitations when studying AI population due to the small sample sizes for AI population. Data collected by AI Tribal Epidemiology Centers (TECs) provides an excellent opportunity to conduct research for AI population due to sufficient sample size and extensive information. However, most surveys conducted by TECs used non-probability sampling design (e.g. convenient sample) due to its lower cost and increased time efficiency. Non-probability sample may suffer from sampling, coverage and nonresponse errors without further proper adjustments. Such difficulties greatly hampers the analysis of AI population in health and behavior research. Our general hypothesis is that data integration by combining information from non-probability and probability samples can reduce sampling, coverage and nonresponse errors in original non-probability sample. The Goal of this project is to develop an accurate and robust data integration methodology for AI population analysis specifically tailored to health and behavior research. During the past years, we have 1) studied data integration using calibration and parametric modeling approaches; 2) investigated machine learning and propensity score modeling methods in survey sampling and other fields; and 3) assembled an experienced team of multi-disciplinary team of experts. In this project, we propose to capitalize on our expertise and fulfill the following Specific Aims: Aim 1. Develop a data integration approach using machine learning and propensity score modeling We will develop machine learning and propensity score based data integration approaches to combine information from non-probability and probability samples. Compared to existing methods (i.e., Calibration, Parametric approach), our proposed approaches are more robust against the failure of underlying model assumptions. The inference is more general and multi-purpose (e.g. one can estimate most parameters such as means, totals and percentiles). Simulation studies will be performed to compare our proposed methods with other existing methods. A computing package will be built to implement the method in other settings. Aim 2. Evaluate the accuracy and robustness of the proposed method in AI health and behavior research We will use real data to validate the proposed methods in terms of accuracy and robustness to the various data types. The performance will also be assessed by comparing with results from existing data integration methods such as calibration and parametric modeling approaches. The planned study takes advantage of a unique data source and expands the impact of the Indian Health Service (IHS)-funded research. We expect this novel integration method will vertically advance the field by facilitating the analysis based on non-probability sample, which can provid...