The colossal scale of Machine Learning (ML) systems today means that only powerful players with sufficient computing resources are able to participate in large-scale ML development. As a result, ML pipelines tend to lack transparency or auditing mechanisms. Reliance on a small number of service providers also jeopardizes availability and reliability. Alternatively, ML development can be distributed across a network of volunteer organizations and individuals, mitigating dependency on central suppliers and motivating users to donate their restricted or private data through the transparent use of artifacts for the public good. However, a distributed setting also exposes a large attack surface to malicious actors, who can tamper with any step of the process. This project addresses this challenge by creating tools for an open, secure, and practical distributed ML development paradigm. The project’s novel contributions center on verification mechanisms for distributed and heterogeneous ML pipelines with private data. More broadly, this project helps stakeholder communities and individuals take part in large-scale ML development without compromising their privacy, contributing to the advancement of Artificial Intelligence (AI) technologies that benefit society. This project also integrates the proposed research into educational activities to train a workforce knowledgeable in the capabilities and vulnerabilities of AI tools, as well as outreach initiatives to engage stakeholder communities and industry practitioners with the research. The project is divided into three main tasks. The first task proposes verification techniques for distributed data pipelines, allowing data holders to contribute sensitive data with privacy guarantees while attesting to the legitimacy of their submissions. The second task studies proof-of-learning by reproducing computational steps. The research first develops analytical and empirical models for computation output error due to factors such as