# Computational Methods for Enhancing Privacy in Biomedical Data Sharing

> **NIH DP5** · BROAD INSTITUTE, INC. · 2020 · $392,800

## Abstract

Data sharing is essential to modern biomedical data science. Access to a large amount of
genomic and clinical data can help us better understand human genetics and its impact on health
and disease. However, the sensitive nature of biomedical information presents a key bottleneck
in data sharing and collection efforts, limiting the utility of these data for science. The goal of this
project is to leverage cutting-edge advances in cryptography and information theory to develop
innovative computational frameworks for privacy-preserving sharing and analysis of biomedical
data. We will draw upon our recent success in developing secure pipelines for collaborative
biomedical analyses to address the imminent need to share sensitive data securely and at scale.

Practical adoption of existing privacy-preserving techniques in biomedicine has thus far been
largely limited due to two major pitfalls, which this project overcomes with novel technical
advances. First, emerging cryptographic data sharing frameworks, which promise to enable
collaborative analysis pipelines that securely combine data across multiple institutions with
theoretical privacy guarantees, are too costly to support complex and large-scale computations
required in biomedical analyses. In this project, we will build upon recent advances in
cryptography (e.g., secure distributed computation, pseudorandom correlation, zero-knowledge
proofs) to significantly enhance the scalability and security of cryptographic biomedical data
sharing pipelines. Second, existing approaches that locally transform data to protect sensitive
information before sharing (e.g., de-identification techniques) either offer insufficient levels of
protection or require excessive perturbation in order to ensure privacy. We will draw upon recent
tools from information theory to develop effective local privacy protection methods that achieve
superior utility-privacy tradeoffs on a range of biomedical data including genomes, transcriptomes,
and medical images by directly exploiting the latent correlation structure of the data.

To promote the use of our privacy techniques, we will create production-grade software of our
tools and publicly release them. We will also actively participate in international standard-setting
organizations in genomics, e.g., GA4GH and ICDA, to incorporate our insights into community
guidelines for biomedical privacy. Successful completion of these aims will result in computational
methods and software tools that open the door to secure sharing and analysis of massive sets of
sensitive genomic and clinical data. Our long-term goal is to broadly enable data sharing and
collaboration efforts in biomedicine, thus empowering researchers to better understand the
molecular basis of human health and to drive translation of new biological insights to the clinic.

## Key facts

- **NIH application ID:** 10017554
- **Project number:** 1DP5OD029574-01
- **Recipient organization:** BROAD INSTITUTE, INC.
- **Principal Investigator:** Hyunghoon Cho
- **Activity code:** DP5
- **Funding institute:** NIH
- **Fiscal year:** 2020
- **Award amount:** $392,800
- **Award type:** 1
- **Project period:** 2020-09-10 → 2025-08-31

## Primary source

NIH RePORTER: https://reporter.nih.gov/project-details/10017554

## Citation

> US National Institutes of Health, RePORTER application 10017554, Computational Methods for Enhancing Privacy in Biomedical Data Sharing (1DP5OD029574-01). Retrieved via AI Analytics 2026-05-22 from https://api.ai-analytics.org/grant/nih/10017554. Licensed CC0.

---

*[NIH grants dataset](/datasets/nih-grants) · CC0 1.0*
