PROJECT SUMMARY/ABSTRACT

Research. Non-small cell lung cancer (NSCLC) is the world’s deadliest cancer, but patients with NSCLC can have dramatically different outcomes, underscoring an urgent unmet clinical need for improved risk stratification. Our study is motivated by the following unresolved questions in NSCLC oncology: 1) What is the likelihood of recurrence for patients with definitively treated disease? 2) Which patients with advanced disease are most likely to benefit from consolidative radiotherapy? 3) What is the likelihood that a patient will develop central nervous system metastasis? We contend that predictive models derived from real-world data collected as part of standard of care, including tumor genomic profiling, imaging, and clinician notes, combined with newer clinical assays such as circulating tumor DNA (ctDNA) sequencing and radiomics, will advance personalized answers to these questions, leading to improved outcomes for patients. We have recently developed transformer-based natural language processing methods that overcome barriers to using real-world data, eliminating the need for time-intensive manual curation of clinician notes and yielding structured data critical for developing predictive models. In a proof-of-principle study, we validated the prognostic value of ctDNA sequencing merged with radiomic, tumor registry, and tissue genomic data, creating a richly annotated dataset an order of magnitude larger than recent manually curated cohorts. Our preliminary studies show that multimodal models incorporating complementary data streams predict overall survival better than any single data modality, such as stage or tissue genomics, and better than standard-of-care biomarkers. Based on these results, we hypothesize that specific combination models, encompassing real-world data from ctDNA and clinicogenomic sources, more accurately characterize tumor biology and predict patient outcomes than single-modality variables.
We will improve risk stratification and clinical management of NSCLC by studying whether and how real-world data can be used to develop multimodal risk models that could in the future be deployed in clinical settings with minimal patient and clinician overhead.

Candidate. Justin Jee, MD, PhD, is an Instructor in the Thoracic Oncology Service at MSK. His goal is to integrate AI-extracted clinicogenomic data to discover multimodal biomarkers of antineoplastic response for patients with cancer. He will undergo a five-year training period with a multidisciplinary mentorship team including experts in computational oncology, machine learning, genomics, natural language processing, radiomics, and thoracic oncology to obtain the skills necessary to become an independent, tenure-track physician-scientist.

Environment. MSK is an academic cancer center renowned for patient care, innovative research, and training for junior faculty seeking careers as independent physician-scientists. MSK is home to MSK-IMPACT, an FDA-authorized, tumor/normal sequenci...