# Advancing Chronic Condition Symptom Cluster Science Through Use of Electronic Health Records and Data Science Techniques

> **NIH R00** · UNIVERSITY OF PITTSBURGH AT PITTSBURGH · 2020 · $248,966

## Abstract

Despite their adverse impact on patient quality of life and healthcare utilization and costs, symptom clusters
(SCs) in common adult chronic conditions such as cancer, heart failure (HF), type 2 diabetes mellitus (T2DM),
and chronic obstructive pulmonary disease (COPD) are understudied and poorly understood. The lack of
access to real-world, longitudinal patient symptom data sets and the inability to adequately model the complexity of
SCs have greatly limited research. Based on our previous work, we propose that these gaps can be addressed
in an innovative way using electronic health records (EHRs) and data science techniques. Our overall objective
is to develop, apply and refine, and implement an optimized data processing and analysis pipeline for the
characterization of SCs in common adult chronic conditions for use with EHR data. We hypothesize that a core
set of SCs is shared among all common adult chronic conditions and that distinct SCs characterize specific
conditions and/or treatments. The long-term training goal of this project is to assist Dr. Koleck in becoming an
independent investigator conducting a program of research dedicated to mitigating symptom burden in patients
with chronic conditions through use of informatics and omics (e.g., genomics and proteomics), the focus of her
pre-doctoral work. Using exceptional resources available from Columbia University, the K99 phase of this
project will focus on the development of a rigorous pipeline; essential competencies in SC analysis and
interpretation; and the data science techniques of clinical data mining, natural language processing, machine
learning, and data visualization. In the R00 phase, Dr. Koleck will independently implement the pipeline in
another medical center to determine the reproducibility of identified SCs and begin to explore clinical predictors
(e.g., socio-demographics, laboratory results, and medications) of SCs. The specific aims are to 1) develop a
data-driven pipeline for the characterization of SCs from EHRs using a cohort of adult patients diagnosed with
cancer, as SCs have been most systematically characterized in this condition; 2) apply the pipeline to three
other common adult chronic conditions that share biological and behavioral risk factors with cancer, i.e., HF,
T2DM, and COPD, and evaluate SCs in these conditions; and 3) determine if SCs differ for cancer, HF, T2DM,
and COPD when implementing the pipeline within another medical center and explore clinically relevant,
EHR-documented predictors of identified SCs. To accomplish research aims and training goals, an interdisciplinary
team of scientists with expertise in symptom science, biomedical informatics, data science, pertinent clinical
domains, and career development mentorship has been assembled. This research is significant because a
pipeline that accommodates the format in which symptom data is already being documented in EHRs has the
potential to greatly accelerate the acquisition of SC knowledge and ...

## Key facts

- **NIH application ID:** 10118580
- **Project number:** 4R00NR017651-03
- **Recipient organization:** UNIVERSITY OF PITTSBURGH AT PITTSBURGH
- **Principal Investigator:** Theresa Ann Koleck
- **Activity code:** R00
- **Funding institute:** NIH
- **Fiscal year:** 2020
- **Award amount:** $248,966
- **Award type:** 4N
- **Project period:** 2018-06-01 → 2023-05-31

## Primary source

NIH RePORTER: https://reporter.nih.gov/project-details/10118580

## Citation

> US National Institutes of Health, RePORTER application 10118580, Advancing Chronic Condition Symptom Cluster Science Through Use of Electronic Health Records and Data Science Techniques (4R00NR017651-03). Retrieved via AI Analytics 2026-05-26 from https://api.ai-analytics.org/grant/nih/10118580. Licensed CC0.

---

*[NIH grants dataset](/datasets/nih-grants) · CC0 1.0*
