# CAREER: Distributed Large-scale Machine Learning with Security Guarantees

> **NSF 01002930DB NSF RESEARCH & RELATED ACTIVIT** · Purdue University (IN) · $564,455

## Abstract

The colossal scale of today's Machine Learning (ML) systems means that only powerful players with sufficient computing resources can participate in large-scale ML development. As a result, ML pipelines tend to lack transparency and auditing mechanisms. Reliance on a small number of service providers also jeopardizes availability and reliability. Alternatively, ML development can be distributed across a network of volunteer organizations and individuals. This reduces dependency on central suppliers, and the transparent use of artifacts for public good motivates users to donate their restricted or private data. However, a distributed setting also opens up a large attack surface for malicious actors, who can tamper with any step of the process. This project addresses this challenge by creating tools for an open, secure, and practical distributed ML development paradigm. The project’s novel contributions center on verification mechanisms for distributed and heterogeneous ML pipelines with private data. More broadly, this project helps stakeholder communities and individuals take part in large-scale ML development without compromising their privacy, contributing to the advancement of Artificial Intelligence (AI) technologies that benefit society. This project also integrates the proposed research into educational activities that train a workforce knowledgeable in the capabilities and vulnerabilities of AI tools, as well as outreach initiatives that engage stakeholder communities and industry practitioners with the research.

The project is divided into three main tasks. The first task proposes verification techniques for distributed data pipelines, allowing data holders to contribute sensitive data with privacy guarantees while attesting to the legitimacy of their submissions. The second task studies proof-of-learning through reproducing computational steps. The research first develops analytical and empirical models for computation output error due to factors such as

## Key facts

- **NSF award ID:** 2542372
- **Awardee organization:** Purdue University (IN)
- **SAM.gov UEI:** YRXVL4JYCEF5
- **PI:** Zahra Ghodsi
- **Primary program:** 01002930DB NSF RESEARCH & RELATED ACTIVIT
- **All programs:** SaTC: Secure and Trustworthy Cyberspace, Artificial Intelligence (AI), CAREER-Faculty Erly Career Dev, Nat Security, Secure Border & Pub Safety
- **Estimated total:** $564,455
- **Funds obligated:** $326,628
- **Transaction type:** Continuing Grant
- **Period:** 07/01/2026 → 06/30/2031

## Primary source

NSF Award Search: https://www.nsf.gov/awardsearch/showAward?AWD_ID=2542372

## Citation

> US National Science Foundation, Award 2542372, CAREER: Distributed Large-scale Machine Learning with Security Guarantees. Retrieved via AI Analytics 2026-06-26 from https://api.ai-analytics.org/grant/nsf/2542372. Licensed CC0.

