# Clinicopathologic and Genetic Profiling through Machine Learning and Natural Language Processing for Precision Lung Cancer Management

> **NIH R01** · DARTMOUTH COLLEGE · 2022 · $367,647

## Abstract

Lung cancer is the second-most common type of cancer and the leading cause of cancer death in men and women. Among the different types of lung cancer, non-small cell lung cancer (NSCLC) is the most common, accounting for 85% to 90% of all lung cancer cases. Current cancer research has shown that multiple somatic mutations affect patients' sensitivity to various drugs used to treat NSCLC. These mutations are essential for determining the most effective, “personalized” treatment for each NSCLC patient. However, most NSCLC patients develop resistance to these targeted therapies within their first year of treatment, and many mechanisms of this resistance remain unknown. Designing and prescribing better targeted therapies for NSCLC patients requires a deeper understanding of the relationships among NSCLC tumors' pathological and clinical findings, their genetic profiles, and their response or resistance to targeted therapies. Currently, no computational method connects the findings in pathology reports and medical records with somatic mutations and targeted-therapy resistance. This project will build a novel computational method to identify statistically significant associations between the pathological findings of NSCLC tumors and the presence of clinically actionable somatic mutations. These associations, combined with an innovative set of features extracted from pathology reports and electronic medical records, will then be used to build and validate a machine-learning model that identifies NSCLC patients with clinically actionable somatic mutations. Finally, the associated clinical, pathological, and genetic findings for NSCLC patients will be used in a new machine-learning framework to predict patients' time to resistance to targeted therapies. The data required to build and validate the proposed models will be obtained through a collaboration with the Department of Pathology's Laboratory for Clinical Genomics and Advanced Technologies at Dartmouth-Hitchcock Medical Center. In addition to internal validation, the investigators have established a collaboration with the Department of Pathology at the University of Vermont Medical Center to apply and validate the models on an external data source. Once this bioinformatics approach is successfully implemented, the resulting models will reveal statistically significant links among clinical and pathological findings, clinically actionable somatic mutations, and targeted-therapy responses, advancing the understanding of NSCLC tumor development and treatment. The proposed approach will provide an accurate, fast, and inexpensive preselection method for screening NSCLC patients with clinically actionable mutations for translational research and precision medicine. Furthermore, the proposed machine-learning method to identify NSCLC patients' resistance...

## Key facts

- **NIH application ID:** 10475120
- **Project number:** 5R01CA249758-04
- **Recipient organization:** DARTMOUTH COLLEGE
- **Principal Investigator:** Saeed Hassanpour
- **Activity code:** R01
- **Funding institute:** NIH
- **Fiscal year:** 2022
- **Award amount:** $367,647
- **Award type:** 5 (non-competing continuation)
- **Project period:** 2019-09-25 → 2024-08-31

## Primary source

NIH RePORTER: https://reporter.nih.gov/project-details/10475120

## Citation

> US National Institutes of Health, RePORTER application 10475120, Clinicopathologic and Genetic Profiling through Machine Learning and Natural Language Processing for Precision Lung Cancer Management (5R01CA249758-04). Retrieved via AI Analytics 2026-05-24 from https://api.ai-analytics.org/grant/nih/10475120. Licensed CC0.

---

*[NIH grants dataset](/datasets/nih-grants) · CC0 1.0*
