CAREER: From Redundancy to Efficiency -- Scalable, Observable Multicast for AI Datacenters

NSF Award Search · 01002930DB NSF RESEARCH & RELATED ACTIVIT · $700,000

Abstract

Artificial intelligence systems rely on large groups of computers working together, but communication between these machines is becoming a major bottleneck. Today, many systems send the same data repeatedly between machines, which wastes time, energy, and network capacity. This project develops new ways for computers to share information more efficiently by using multicast, a method that allows one machine to send data to many others at once, much like a radio broadcast. The goal is to make communication faster, more reliable, and easier to manage in the large-scale computing systems that support modern artificial intelligence applications.

This project addresses the scalability and observability challenges that currently prevent the deployment of multicast in production environments. The technical work is organized into three integrated thrusts. The first thrust develops a scalable data plane, using topology-aware algorithms to construct efficient transmission trees and introducing a new way to compress network forwarding state so that it fits within the limited memory of standard hardware. The second thrust creates an introspectable control plane and monitoring system that uses advanced probes and machine learning to detect and localize hidden network failures in real time. The third thrust integrates these research findings into the university curriculum through hands-on laboratory modules and a structured mentoring pipeline for students. This project improves the effi…
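To make the redundancy argument concrete, here is a minimal Python sketch. It assumes a two-tier leaf-spine fabric in which every path is pinned to a single spine. The host and switch names (`h0`, `leaf0`, `spine0`) and the helpers `leaf_spine_path`, `unicast_link_load`, and `multicast_tree` are hypothetical illustrations, not the project's topology-aware algorithm. The sketch counts how many copies of one message cross fabric links when the sender unicasts to each receiver, and compares that with merging the same paths into a single multicast tree.

```python
from collections import Counter


def leaf_spine_path(src_host, dst_host, leaf_of, spine="spine0"):
    """Hypothetical helper: directed links between two hosts in a 2-tier leaf-spine fabric.

    Hosts on the same leaf communicate through that leaf only. Otherwise, traffic
    goes leaf -> spine -> leaf. Pinning every path to one spine (an assumption)
    makes paths share upstream links, which is what a tree can deduplicate.
    """
    src_leaf, dst_leaf = leaf_of[src_host], leaf_of[dst_host]
    if src_leaf == dst_leaf:
        nodes = [src_host, src_leaf, dst_host]
    else:
        nodes = [src_host, src_leaf, spine, dst_leaf, dst_host]
    return list(zip(nodes, nodes[1:]))  # consecutive node pairs = directed links


def unicast_link_load(src, receivers, leaf_of):
    """Repeated unicast: each receiver gets its own copy, so shared links carry duplicates."""
    load = Counter()
    for dst in receivers:
        load.update(leaf_spine_path(src, dst, leaf_of))
    return load


def multicast_tree(src, receivers, leaf_of):
    """Multicast: the union of the same paths; each tree link carries exactly one copy."""
    tree = set()
    for dst in receivers:
        tree.update(leaf_spine_path(src, dst, leaf_of))
    return tree


if __name__ == "__main__":
    # Hypothetical placement: six hosts spread over three leaves.
    leaf_of = {"h0": "leaf0", "h1": "leaf0", "h2": "leaf1",
               "h3": "leaf1", "h4": "leaf2", "h5": "leaf2"}
    receivers = ["h1", "h2", "h3", "h4", "h5"]
    unicast = unicast_link_load("h0", receivers, leaf_of)
    tree = multicast_tree("h0", receivers, leaf_of)
    print(sum(unicast.values()), unicast[("h0", "leaf0")], len(tree))  # 18 5 9
```

With five receivers spread over three leaves, repeated unicast places 18 copies on fabric links, 5 of them on the sender's own uplink. The tree uses 9 links and sends one copy over each. The sketch deliberately ignores two concerns: choosing the spine per group to balance load, and fitting tree entries into switch forwarding memory. The first thrust is described as addressing exactly these concerns.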

Key facts

NSF award ID: 2543556
Awardee: Johns Hopkins University (MD)
SAM.gov UEI: FTMTDMBR29C7
PI: Soudeh Ghorbani
Primary program: 01002930DB NSF RESEARCH & RELATED ACTIVIT
All programs: Artificial Intelligence (AI); CAREER-Faculty Erly Career Dev; RES IN NETWORKING TECH & SYS; WOMEN, MINORITY, DISABLED, NEC
Estimated total: $700,000
Funds obligated: $389,092
Transaction type: Continuing Grant
Period: 07/01/2026 → 06/30/2031