Semi-supervised Algorithms for Risk Assessment with Noisy EHR Data

NIH RePORTER · NIH · R21 · $177,510

Abstract

PROJECT SUMMARY Large electronic health record (EHR) research data integrated with -omics data from linked biorepositories have expanded opportunities for precision medicine research. These integrated datasets make it possible to develop accurate EHR-based personalized models of cancer risk and progression, which can be readily incorporated into clinical practice and help realize the promise of precision oncology. However, using EHR data efficiently and effectively for cancer research remains challenging because of practical and methodological obstacles. For example, obtaining precise event time information, such as the time of cancer recurrence, is a major bottleneck in using EHR data for precision medicine research, because it requires laborious medical record review and the events are often poorly documented. Simple estimates of the event time based on billing or procedure codes may poorly approximate the true event time, and naive use of such estimated event times can lead to highly biased estimates because of the approximation error. Such biases make it difficult to conduct pragmatic trials whose study endpoint is a time-to-event outcome captured using EHR data. The overall goal of this proposal is to fill these methodological gaps in risk assessment for cancer research using EHR data, which will advance our ability to achieve the promise of precision oncology. Statistical algorithms and software will be developed to (i) automatically assign event time information using longitudinally recorded EHR information and (ii) perform accurate risk assessment using noisy proxies of event times. By enabling risk assessment with imperfect EHR data without extensive manual chart review, the proposed tools could greatly improve the utility of EHR data for oncology research.
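The claim that naive use of proxy event times produces biased estimates can be illustrated with a toy simulation. The sketch below is not the proposal's method; it assumes, purely for illustration, that true event times follow an exponential distribution, that there is no censoring, and that a code-based proxy records each event after a random documentation delay. The function names, the rate of 0.1 events per month, and the 6-month mean delay are all hypothetical.

```python
import random


def exponential_rate_mle(times):
    """Maximum-likelihood estimate of a constant hazard rate
    from fully observed (uncensored) event times."""
    return len(times) / sum(times)


def simulate(n=20000, true_rate=0.1, mean_delay=6.0, seed=0):
    """Compare the rate estimated from true event times with the rate
    estimated naively from delayed proxy times (hypothetical setup)."""
    rng = random.Random(seed)
    # True event times, e.g. months until recurrence (assumed exponential).
    true_times = [rng.expovariate(true_rate) for _ in range(n)]
    # Proxy times: the event is only recorded after a random delay,
    # mimicking a billing or procedure code that lags the true event.
    proxy_times = [t + rng.expovariate(1.0 / mean_delay) for t in true_times]
    return exponential_rate_mle(true_times), exponential_rate_mle(proxy_times)


rate_from_truth, rate_from_proxy = simulate()
print(f"rate from true event times:  {rate_from_truth:.4f}")
print(f"rate from proxy event times: {rate_from_proxy:.4f}")
```

Under these assumptions the estimate from the true times should land near 0.1, while the naive estimate from the proxy times should land near 1 / (10 + 6) = 0.0625, because the mean delay is added to the mean event time. Collecting more data does not shrink this gap, which is why the measurement error has to be modeled rather than ignored.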

Key facts

NIH application ID: 9955220
Project number: 5R21CA242940-02
Recipient: HARVARD UNIVERSITY D/B/A HARVARD SCHOOL OF PUBLIC HEALTH
Principal Investigator: TIANXI CAI
Activity code: R21
Funding institute: NIH
Fiscal year: 2020
Award amount: $177,510
Award type: 5
Project period: 2019-07-01 → 2022-06-30