CAREER: Distributed Large-scale Machine Learning with Security Guarantees

NSF Award Search · 01002930DB NSF RESEARCH & RELATED ACTIVIT · $564,455

Abstract

The colossal scale of Machine Learning (ML) systems today means that only powerful players with sufficient computing resources are able to participate in large-scale ML development. As a result, ML pipelines tend to lack transparency or auditing mechanisms. Reliance on a small number of service providers also jeopardizes availability and reliability. Alternatively, ML development can be distributed to a network of volunteer organizations and individuals, mitigating dependency on central suppliers and motivating users to donate their restricted or private data through transparent use of artifacts for public good. However, a distributed setting also exposes a large attack surface to malicious actors who can tamper with any step of the process. This project addresses this challenge by creating tools for an open, secure, and practical distributed ML development paradigm. The project’s novel contributions center on verification mechanisms for distributed and heterogeneous ML pipelines with private data. More broadly, this project helps stakeholder communities and individuals take part in large-scale ML development without compromising their privacy, contributing to the advancement of Artificial Intelligence (AI) technologies that benefit society. This project also integrates the proposed research into educational activities that train a workforce knowledgeable in the capabilities and vulnerabilities of AI tools, and into outreach initiatives that engage stakeholder communities and industry practitioners with the research.

The project is divided into three main tasks. The first task proposes verification techniques for distributed data pipelines, allowing data holders to contribute sensitive data with privacy guarantees while attesting to the legitimacy of their submissions. The second task studies proof-of-learning by reproducing computational steps. The research first develops analytical and empirical models for computation output error due to factors such as

Key facts

NSF award ID: 2542372
Awardee: Purdue University (IN)
SAM.gov UEI: YRXVL4JYCEF5
PI: Zahra Ghodsi
Primary program: 01002930DB NSF RESEARCH & RELATED ACTIVIT
All programs: SaTC: Secure and Trustworthy Cyberspace, Artificial Intelligence (AI), CAREER-Faculty Erly Career Dev, Nat Security, Secure Border & Pub Safety
Estimated total: $564,455
Funds obligated: $326,628
Transaction type: Continuing Grant
Period: 07/01/2026 → 06/30/2031