# Automatic integrated biomarkers to improve prediction of lung cancer outcomes

> **NIH K08** · SLOAN-KETTERING INST CAN RESEARCH · 2024 · $256,487

## Abstract

Research. Non-small cell lung cancer (NSCLC) is the world’s deadliest cancer, but patients with NSCLC can
have dramatically different outcomes, underscoring an urgent unmet clinical need for improved risk stratification.
Our study is motivated by the following unresolved questions in NSCLC oncology: 1) What is the likelihood of
recurrence for patients with definitively treated disease? 2) Which patients with advanced disease are most likely
to benefit from consolidative radiotherapy? 3) What is the likelihood that a patient will develop central nervous
system metastasis? We contend that predictive models derived from real-world data collected as part of standard
of care, including tumor genomic profiling, imaging, and clinician notes, combined with newer clinical assays
such as circulating tumor (ct)DNA sequencing and radiomics will advance personalized answers to these
questions, leading to improved outcomes for patients. We have recently developed methods to overcome
barriers to using real-world data with transformer-based natural language processing, eliminating the need for
time-intensive manual curation of clinician notes, yielding structured data critical for developing predictive
models. In a proof of principle study, we validated the prognostic value of ctDNA sequencing merged with
radiomic, tumor registry and tissue genomic data to create a richly annotated dataset an order of magnitude
larger than recent manually curated cohorts. Our preliminary studies show that multimodal models incorporating
complementary data streams improve overall survival prediction over any single data modality, such as stage or
tissue genomics, and standard of care biomarkers. Based on these results, we hypothesize that specific
combination models, encompassing real-world data from ctDNA and clinicogenomic sources, more accurately
inform tumor biology and patient outcomes than single-modality variables. We will improve risk stratification and
clinical management of NSCLC by studying whether and how real-world data can be used to develop multimodal
risk models that in the future could be deployed in clinical settings with minimal patient and clinician overhead.

Candidate. Justin Jee, MD, PhD, is an Instructor in the Thoracic Oncology Service at MSK. His goal is to integrate
AI-extracted clinicogenomic data to discover multimodal biomarkers of antineoplastic response for patients with
cancer. He will undergo a five-year training period with a multidisciplinary mentorship team including experts in
computational oncology, machine learning, genomics, natural language processing, radiomics, and thoracic
oncology to obtain the skills necessary to become an independent, tenure-track physician-scientist.

Environment. MSK is an academic cancer center renowned for patient care, innovative research, and training
for junior faculty seeking careers as independent physician-scientists. MSK is home to MSK-IMPACT, an
FDA-authorized, tumor/normal sequenci...

## Key facts

- **NIH application ID:** 10985665
- **Project number:** 1K08CA286842-01A1
- **Recipient organization:** SLOAN-KETTERING INST CAN RESEARCH
- **Principal Investigator:** Justin Jee
- **Activity code:** K08
- **Funding institute:** NIH
- **Fiscal year:** 2024
- **Award amount:** $256,487
- **Award type:** 1
- **Project period:** 2024-08-13 → 2029-07-31

## Primary source

NIH RePORTER: https://reporter.nih.gov/project-details/10985665

## Citation

> US National Institutes of Health, RePORTER application 10985665, Automatic integrated biomarkers to improve prediction of lung cancer outcomes (1K08CA286842-01A1). Retrieved via AI Analytics 2026-06-14 from https://api.ai-analytics.org/grant/nih/10985665. Licensed CC0.

---

*[NIH grants dataset](/datasets/nih-grants) · CC0 1.0*
