PROJECT SUMMARY/ABSTRACT
Clinical recommendations for disease management are mostly based on the average treatment effects observed in randomized controlled trials. However, the beneficial effects of most treatments vary across individuals. Identifying the factors that contribute to treatment response heterogeneity is crucial for improving mechanistic understanding of treatment-disease interactions and for optimizing patient outcomes. Recent advances in high-throughput technologies in biology and the development of large-scale databases provide an unprecedented opportunity for a more comprehensive understanding of the mechanisms underlying inter-individual variation in treatment response. Several methods, including subgroup analyses and summary score-based analyses, have been used to assess treatment response heterogeneity. Machine learning methods have also been developed to handle high-dimensional covariates. However, despite tremendous advances in machine learning, two key limitations have hindered large-scale deployment of current methods to discover, from big data, the markers underpinning treatment heterogeneity. First, current approaches can fail to uncover strong but unexpected predictors of treatment response heterogeneity. A key problem is that an individual's counterfactual treatment responses under two possible strategies cannot both be observed, so individual-level treatment effects cannot be directly identified. To make progress, a common approach is to compare the average observed treatment responses across subgroups of individuals defined by one or more clinical variables. Nonetheless, such approaches can fail to uncover the true signatures of treatment heterogeneity. Second, current methods for predicting treatment heterogeneity often result in models with limited generalizability.
A key reason is that participants in the source population data (on which models are developed) are not a random sample from the target population (on which models will be deployed). When the source population data are not representative of the target population and treatment responses vary across factors that influence participation, tailoring a model for use in the new target population requires cutting-edge data science tools. To address these challenges, we propose novel causal machine learning methods that will enable the identification of markers (and their complex relationships) for individual treatment responses, with algorithms adaptable to a new target population. This project will combine theoretical developments with large-scale simulation studies and empirical evaluations of treatments for patients with stable coronary artery disease. Successful completion of the proposed research will equip investigators with powerful methods to unlock the full potential of big data, advance our understanding of the mechanisms of treatment response heterogeneity, and ultimately improve strategies for preventing and managin...