# Generating Reproducible Real-World Evidence with Multi-Source Data to Capture Unstructured Clinical Endpoints for Chronic Diseases

> **NIH FDA U01** · HARVARD MEDICAL SCHOOL · 2024 · $1,103,420

## Abstract

Randomized clinical trials (RCTs) are the gold standard for assessing treatment safety and efficacy and are the
primary evidence supporting FDA's regulatory decisions. However, RCTs have a number of limitations,
including the lack of generalizability of study findings and insufficient follow-up to assess long-term outcomes.
With the growing availability of disease-modifying treatments (DMTs), novel approaches are needed to monitor
long-term safety and efficacy of agents used in chronic diseases. Electronic health record (EHR) data present
the opportunity to capture longitudinal treatment response in heterogeneous patient populations and real-world
settings and can be used to generate real-world evidence (RWE) to augment RCT data for these drugs.
However, availability of RWE for DMTs has been limited by the lack of computable information on disease
progression measures, which are the clinical outcomes monitored by physicians directing therapy. This
information is typically captured only in unstructured text during clinical visits and may not be consistently
documented at every encounter, resulting in incomplete data even with labor-intensive manual abstraction.
Further, it is critical that RWE resources for robust post-market assessments of DMTs ensure reproducibility of
findings across healthcare systems. In this proposal, we address this unmet need by developing methods to
generate reproducible and generalizable RWE on unstructured efficacy and adverse event (AE) endpoints
used in the evaluation of therapies for rheumatoid arthritis and multiple sclerosis. We will create scalable
disease progression endpoints from EHR data by linking information in EHRs to registry data and building
algorithms for ordinal disease activity scores using features derived from scoring guidelines. In Aim 1, we
integrate disease activity and progression data from registries to generate scalable RWE on disease
progression endpoints leveraging structured and free-text EHR data. In Aim 2, we develop strategies to correct
for noise in medication prescriptions for DMTs in RWE studies. Aim 3 combines EHR data from multiple
healthcare systems through federated learning to ensure generalizability of RWE. We intend for the methods to
build new capabilities for use of RWE in FDA's regulatory decisions on drug effectiveness, providing an
efficient, scalable, and robust approach to using real-world clinical data to support approval of new drug
indications and conduct of postmarket studies for DMTs.

## Key facts

- **NIH application ID:** 10913529
- **Project number:** 5U01FD007929-02
- **Recipient organization:** HARVARD MEDICAL SCHOOL
- **Principal Investigator:** FLORENCE BOURGEOIS
- **Activity code:** U01
- **Funding institute:** FDA
- **Fiscal year:** 2024
- **Award amount:** $1,103,420
- **Award type:** 5
- **Project period:** 2023-09-01 → 2026-08-31

## Primary source

NIH RePORTER: https://reporter.nih.gov/project-details/10913529

## Citation

> US National Institutes of Health, RePORTER application 10913529, Generating Reproducible Real-World Evidence with Multi-Source Data to Capture Unstructured Clinical Endpoints for Chronic Diseases (5U01FD007929-02). Retrieved via AI Analytics 2026-05-23 from https://api.ai-analytics.org/grant/nih/10913529. Licensed CC0.

---

*[NIH grants dataset](/datasets/nih-grants) · CC0 1.0*
