# Bridging clinical trial and real-world data via machine learning to advance rheumatoid arthritis treatment strategies

> **NIH R01** · BRIGHAM AND WOMEN'S HOSPITAL · 2022 · $748,363

## Abstract

Rheumatoid arthritis (RA) is the most common autoimmune joint disease. It has over 15 treatment options, which reflects both advances in therapy and the heterogeneous response to therapy. After first-line therapy with methotrexate (MTX), patients and their rheumatologists proceed by trial and error to identify the optimal treatment. A landmark randomized controlled clinical trial (RCT), RACAT, compared the effectiveness of triple therapy (MTX, sulfasalazine, and hydroxychloroquine) with MTX plus a tumor necrosis factor inhibitor (TNFi). RACAT subgroup analyses observed that some patients responded better to one treatment strategy than to the other. However, like most RCTs, RACAT was underpowered to characterize these subgroups well. Real-world data (RWD), such as electronic health record (EHR) and registry data, have much larger sample sizes but lack the randomization and precise clinical measurements performed as part of clinical trials.

The objective of this proposal is to apply and rigorously test state-of-the-art methods that combine the strengths of RCT and RWD to extend RCT findings. RACAT was a Veterans Affairs (VA)-based clinical trial, so many of its subjects also have EHR data in parallel. This provides an ideal study design for testing how well RCT results can be replicated using RWD.

In Aim 1, we test semi-supervised machine learning methods that impute RACAT clinical endpoints from EHR data, using the linked RACAT data as the gold standard for comparison. Next, we apply causal inference modeling to compare triple therapy vs TNFi using EHR data with the imputed endpoints, and we validate the results against the linked RACAT data.

In Aim 2, we apply novel causal modeling methods that enable us to examine subgroup findings using RWD. We will identify subjects in the larger EHR and registry populations who resemble RACAT subgroups, i.e., patients who benefited more from triple therapy than from TNFi or vice versa, and subjects who remained on TNFi throughout the trial and did well. These larger populations will provide greater power to study potential predictors of treatment response.
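
The abstract does not name the specific semi-supervised or causal inference methods used in Aim 1. Purely to illustrate the general two-step workflow (impute the endpoint, then compare treatments), the sketch below assumes a binary endpoint, imputes it for unlabelled EHR patients by self-training a logistic regression (a generic semi-supervised technique swapped in here as an assumption), and then compares triple therapy with TNFi using a normalized inverse-probability-weighted (IPW) estimate. The simulated cohort, feature choices, thresholds, and all function names are hypothetical. The final comparison against an estimate computed from the true labels loosely mirrors validation against the linked RACAT data.

```python
import math
import random


def sigmoid(z):
    """Numerically stable logistic function."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    ez = math.exp(z)
    return ez / (1.0 + ez)


def predict(beta, x):
    """P(y = 1 | x) under coefficients [intercept, w1, ..., wp]."""
    return sigmoid(beta[0] + sum(b * v for b, v in zip(beta[1:], x)))


def fit_logistic(X, y, lr=0.5, epochs=300):
    """Fit a logistic regression by full-batch gradient descent."""
    n, p = len(X), len(X[0])
    beta = [0.0] * (p + 1)
    for _ in range(epochs):
        grad = [0.0] * (p + 1)
        for xi, yi in zip(X, y):
            err = predict(beta, xi) - yi
            grad[0] += err
            for j, v in enumerate(xi):
                grad[j + 1] += err * v
        beta = [b - lr * g / n for b, g in zip(beta, grad)]
    return beta


def self_train(X_lab, y_lab, X_unlab, threshold=0.9, rounds=3):
    """Self-training: add confidently pseudo-labelled unlabelled rows, then refit."""
    X, y, pool = list(X_lab), list(y_lab), list(X_unlab)
    beta = fit_logistic(X, y)
    for _ in range(rounds):
        remaining = []
        for x in pool:
            p = predict(beta, x)
            if p >= threshold or p <= 1 - threshold:
                X.append(x)
                y.append(1.0 if p >= 0.5 else 0.0)
            else:
                remaining.append(x)
        pool = remaining
        beta = fit_logistic(X, y)
    return beta


def ipw_effect(T, Y, e):
    """Normalized (Hajek) IPW estimate of E[Y(1)] - E[Y(0)] given propensity scores e."""
    w1 = [t / ei for t, ei in zip(T, e)]
    w0 = [(1 - t) / (1 - ei) for t, ei in zip(T, e)]
    mu1 = sum(w * yi for w, yi in zip(w1, Y)) / sum(w1)
    mu0 = sum(w * yi for w, yi in zip(w0, Y)) / sum(w0)
    return mu1 - mu0


def features(row):
    """Hypothetical imputation features: treatment, EHR surrogate, baseline covariate."""
    x, t, s, _ = row
    return [t, s, x]


if __name__ == "__main__":
    # Simulated toy cohort; every variable here is hypothetical.
    rng = random.Random(0)
    rows = []
    for _ in range(200):
        x = rng.gauss(0, 1)                                       # baseline covariate
        t = 1 if rng.random() < sigmoid(0.5 * x) else 0           # 1 = triple therapy, 0 = TNFi
        y = 1 if rng.random() < sigmoid(-0.2 + 0.4 * t - 0.5 * x) else 0  # true endpoint
        s = y + rng.gauss(0, 0.5)                                 # noisy EHR surrogate of y
        rows.append((x, t, s, y))

    labelled, unlabelled = rows[:40], rows[40:]                   # first 40 treated as trial-linked
    beta = self_train([features(r) for r in labelled], [r[3] for r in labelled],
                      [features(r) for r in unlabelled])
    y_imp = [predict(beta, features(r)) for r in rows]            # imputed P(endpoint = 1)

    ps_beta = fit_logistic([[r[0]] for r in rows], [r[1] for r in rows])
    e = [predict(ps_beta, [r[0]]) for r in rows]                  # estimated propensity scores
    T = [r[1] for r in rows]
    print("IPW effect, imputed endpoint:", round(ipw_effect(T, y_imp, e), 3))
    print("IPW effect, true endpoint:   ", round(ipw_effect(T, [r[3] for r in rows], e), 3))
```

Self-training is used here only because it is short to write down; the confidence threshold and number of rounds would materially change which EHR patients receive pseudo-labels, so both would need tuning against the labelled subset.
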

Integrating EHR data also allows us to study a broader set of potential predictors that are not collected in RCT or registry data. Our overarching hypothesis is that we will identify, within the larger populations of RA patients in EHR and registry data, the clinical subgroups observed in RACAT to have differing responses to treatment.
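
The abstract does not specify how EHR and registry patients "similar to" a RACAT subgroup will be found. One simple possibility, shown here only as an assumption-laden sketch and not as the project's Aim 2 method, is caliper nearest-neighbor matching on scaled baseline covariates: an EHR patient is flagged if their distance to the closest member of a trial-defined subgroup falls within a caliper. The function names, the two covariates, the caliper of 0.5, and the toy values are all hypothetical.

```python
import math
from statistics import pstdev


def column_sds(rows):
    """Per-covariate population SD (1.0 for constant columns to avoid division by zero)."""
    return [pstdev(col) or 1.0 for col in zip(*rows)]


def distance(a, b, sds):
    """Euclidean distance after dividing each covariate difference by its SD."""
    return math.sqrt(sum(((x - y) / s) ** 2 for x, y, s in zip(a, b, sds)))


def resembles_subgroup(ehr_rows, subgroup_rows, caliper=0.5):
    """Flag EHR patients within `caliper` scaled units of their nearest trial-subgroup member."""
    sds = column_sds(subgroup_rows + ehr_rows)
    return [min(distance(r, s, sds) for s in subgroup_rows) <= caliper for r in ehr_rows]


# Hypothetical covariates: (baseline disease activity score, disease duration in years).
trial_subgroup = [(5.0, 2.0), (6.0, 3.0)]  # e.g. trial patients who did better on triple therapy
ehr_patients = [(5.1, 2.1), (9.0, 10.0)]
print(resembles_subgroup(ehr_patients, trial_subgroup))  # → [True, False]
```

Treatment effects would then be re-estimated within the flagged cohort using an adjusted causal method. Widening the caliper trades closeness of the match for sample size, which is the source of the additional power described above.
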

We will also identify novel predictors of response by using the broader set of clinical data available in the EHR. This study is significant because it will provide a blueprint for extending RCT findings in datasets with linked RCT and RWD, applicable to many treatments and conditions. This study is innovative because of its approach of maximizing the data available from RCTs by combining them with existing RWD through linked datasets, powering studies to optimize RA therapy for different patients. This proposal also ...

## Key facts

- **NIH application ID:** 10339668
- **Project number:** 1R01AR080193-01
- **Recipient organization:** BRIGHAM AND WOMEN'S HOSPITAL
- **Principal Investigator:** TIANXI CAI
- **Activity code:** R01
- **Funding institute:** NIH
- **Fiscal year:** 2022
- **Award amount:** $748,363
- **Award type:** 1 (new application)
- **Project period:** 2022-07-01 → 2026-06-30

## Primary source

NIH RePORTER: https://reporter.nih.gov/project-details/10339668

## Citation

> US National Institutes of Health, RePORTER application 10339668, Bridging clinical trial and real-world data via machine learning to advance rheumatoid arthritis treatment strategies (1R01AR080193-01). Retrieved via AI Analytics 2026-05-24 from https://api.ai-analytics.org/grant/nih/10339668. Licensed CC0.

---

*[NIH grants dataset](/datasets/nih-grants) · CC0 1.0*
