# SOCAL: Privacy-protecting Sharing Of Clinical Data Across Laboratories

> **NIH R01** · YALE UNIVERSITY · 2024 · $343,314

## Abstract

Privacy and security of personal information have become one of the major grand challenges in modern society,
especially for healthcare studies. Re-identification risks and data breaches require new policies and regulations
for data sharing across healthcare institutions and research laboratories. While policy cannot solve the problem
on its own, advanced technologies that work hand in hand with policy are important to address the
privacy/security concerns. Predictive analytics can support quality improvement, clinical research, and eventually
impact patient health status. Extensive clinical variable information and voluminous data records from multiple
institutions and laboratories are necessary to further improve the performance of modeling approaches and to
identify medication-outcome associations for diseases. Nonetheless, the transfer of such sensitive data among
institutions/laboratories can present serious privacy risks, which can jeopardize NIH’s mission. Aiming at
mitigating the privacy problem while increasing predictive capability via cross-institutional modeling, prior studies
proposed distributed methods to exchange only the predictive models, but not patient data. However, these
methods still pose many challenges to the clinical cross-institutional learning problem, including the need for
more comprehensive clinical variables and more patient records to achieve better prediction discrimination and
build more generalizable models, the necessity for discovery/alleviation of data manipulation to increase the
trustworthiness of the collaboratively trained models, and the requirement for more validation to ensure usability.
In this proposal, we plan to develop SOCAL (Privacy-protecting Sharing Of Clinical data Across Laboratories), a
distributed framework addressing these challenges by integrating vertical/horizontal modeling methods to
include both more complete variables and more records, discovering/alleviating data manipulation incidents
using models recorded on blockchain, and conducting controlled experiments and designing/testing a web portal
with physician-researchers to increase the usability of the system. SOCAL will be evaluated on a Coronavirus
Disease 2019 (COVID-19) dataset from five University of California (UC) Health medical centers. We expect the
knowledge/capability of collaborative modeling can be improved, the trustworthiness of the learning process can
be enhanced, and the framework will be ready for use. SOCAL is innovative because it will provide a new integration
methodology for vertical/horizontal modeling, a novel data-manipulation-resistant method, and a hardened
prototype for a practical blockchain application. We anticipate that the SOCAL framework will
substantially reduce the privacy concerns of predictive modeling tasks for various stakeholders, including healthcare
providers, clinical researchers, and patients. Upon completion, SOCAL can accelerate the development of
methods/technologi...

## Key facts

- **NIH application ID:** 11110075
- **Project number:** 7R01EB031030-03
- **Recipient organization:** YALE UNIVERSITY
- **Principal Investigator:** Tsung-Ting Kuo
- **Activity code:** R01
- **Funding institute:** NIH
- **Fiscal year:** 2024
- **Award amount:** $343,314
- **Award type:** 7
- **Project period:** 2024-07-01 → 2026-06-30

## Primary source

NIH RePORTER: https://reporter.nih.gov/project-details/11110075

## Citation

> US National Institutes of Health, RePORTER application 11110075, SOCAL: Privacy-protecting Sharing Of Clinical Data Across Laboratories (7R01EB031030-03). Retrieved via AI Analytics 2026-05-23 from https://api.ai-analytics.org/grant/nih/11110075. Licensed CC0.

---

*[NIH grants dataset](/datasets/nih-grants) · CC0 1.0*
