# Machine Learning Development for Subtyping COPD

> **NIH K25** · BRIGHAM AND WOMEN'S HOSPITAL · 2020 · $189,000

## Abstract

Chronic obstructive pulmonary disease (COPD) is a heterogeneous lung condition characterized by
progressive loss of lung function with subsequent increasing breathlessness and worsening quality of life. This
heterogeneity makes it difficult to predict health decline and develop targeted treatments for better patient care.
To date, researchers have attempted to use standard machine learning methodology to identify more
meaningful subtypes of COPD, but these methods often make general assumptions about the data, limiting
their ability to capture more complex patterns in some data sets. Thus, a meaningful reclassification of
COPD subtypes that could lead to more targeted therapies and interventions has been elusive. The applicant
introduces a new way of looking at the COPD subtyping problem by recasting it in terms of discovering
associations of individuals to disease trajectories – i.e., grouping individuals based on their similarity in
response to environmental and/or disease-causing variables. The machine learning methods proposed build
on the most recent advances in Bayesian nonparametrics, a collection of theoretical ideas and techniques that
permit very flexible data representations. In this career development proposal, the applicant hypothesizes that
these machine learning methods and extensions thereof – together with data sources not previously leveraged
for COPD subtyping – will produce more biologically meaningful sub-groupings of patients, leading to a better
understanding of the genetic and biological underpinnings of the disease and ultimately improved patient
management. Aim 1 of this application involves evaluating the utility of CT-assessed lung mass – a potentially
more discriminative measure of emphysema than conventionally used measures – for defining COPD subtypes
using both K-means clustering and our disease trajectory algorithm. The goal of Aim 2 is to evaluate the utility
of comorbidity data for defining COPD subtypes using our trajectory clustering algorithm. Novel computed
tomography-based measures of muscle wasting (cachexia) and pulmonary vascular pruning will be explored to
determine their efficacy in subtype determination. Additionally, we will extend and test the trajectory algorithm
in order to model discrete outputs (such as physician-diagnosed comorbidities), count data (e.g.,
exacerbations), and time-to-event data (death). In Aim 3, the applicant will extend our trajectory clustering
algorithms to directly incorporate genetic and omics data for subtype discovery. Together, the research
proposed in the aims of this award will take full advantage of the comprehensive data set available through the
COPDGene study.

Execution of the aims in this proposal will be possible through active collaboration with Dr. Ron Kikinis, M.D., a
renowned leader in the field of medical image analysis, and Dr. Ed Silverman, an internationally recognized
expert in the genetic epidemiology of COPD.
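
The trajectory framing described in the abstract, grouping individuals by how they respond to an exposure rather than by their raw measurements, can be pictured with a toy sketch. This is not the applicant's Bayesian nonparametric algorithm: it is a deliberately simplified stand-in that fits a least-squares slope per individual and then groups the slopes with K-means, with the number of groups fixed at two by assumption. The cohort is synthetic, and the names `exposure`, `lung_function`, `fit_slope`, and `kmeans_1d` are hypothetical.

```python
import random


def fit_slope(xs, ys):
    """Ordinary least-squares slope of ys regressed on xs."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxx = sum((x - mx) ** 2 for x in xs)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    return sxy / sxx


def assign(values, centers):
    """Index of the nearest center for each value."""
    return [min(range(len(centers)), key=lambda j: abs(v - centers[j]))
            for v in values]


def kmeans_1d(values, k, iters=100):
    """Lloyd's algorithm in one dimension with quantile-based initial centers."""
    ordered = sorted(values)
    centers = [ordered[(2 * j + 1) * len(ordered) // (2 * k)] for j in range(k)]
    for _ in range(iters):
        labels = assign(values, centers)
        updated = []
        for j in range(k):
            members = [v for v, lab in zip(values, labels) if lab == j]
            updated.append(sum(members) / len(members) if members else centers[j])
        if updated == centers:
            break
        centers = updated
    return assign(values, centers), centers


# Synthetic cohort: every number here is invented for illustration only.
rng = random.Random(0)
exposure = [5.0 * i for i in range(9)]        # hypothetical exposure levels, 0..40
true_slopes = [-0.2] * 10 + [-1.0] * 10       # slow decliners, then fast decliners

slopes = []
for b in true_slopes:
    # Hypothetical lung function measurements with Gaussian noise.
    lung_function = [100.0 + b * x + rng.gauss(0.0, 2.0) for x in exposure]
    slopes.append(fit_slope(exposure, lung_function))

# Group individuals by their estimated response to exposure.
labels, centers = kmeans_1d(slopes, k=2)
print(labels)
print(centers)
```

In this sketch two individuals with the same current lung function can land in different groups if their decline per unit of exposure differs. Fixing `k=2` is a simplification made here; Bayesian nonparametric mixture models generally let the data inform the number of groups.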

## Key facts

- **NIH application ID:** 9948018
- **Project number:** 5K25HL130637-05
- **Recipient organization:** BRIGHAM AND WOMEN'S HOSPITAL
- **Principal Investigator:** James Ross
- **Activity code:** K25
- **Funding institute:** NIH
- **Fiscal year:** 2020
- **Award amount:** $189,000
- **Award type:** 5
- **Project period:** 2016-07-15 → 2022-04-30

## Primary source

NIH RePORTER: https://reporter.nih.gov/project-details/9948018

## Citation

> US National Institutes of Health, RePORTER application 9948018, Machine Learning Development for Subtyping COPD (5K25HL130637-05). Retrieved via AI Analytics 2026-05-26 from https://api.ai-analytics.org/grant/nih/9948018. Licensed CC0.

---

*[NIH grants dataset](/datasets/nih-grants) · CC0 1.0*
