CAREER: Integrated Load and Resource Management for High-Utilization Datacenters

NSF Award Search · 01002627DB NSF RESEARCH & RELATED ACTIVIT · $595,834

Abstract

Modern cloud services rely on expensive and power-hungry hardware, making efficient use of computing resources essential for controlling cost and energy consumption. This project focuses on maximizing how much useful work each server can perform without becoming overloaded or unresponsive. The central idea is to make cloud systems determine, within a few microseconds, how much work a server can safely accept, allocate resources to individual tasks accordingly, and then distribute incoming requests across servers based on these allocations. Today, resource allocation and load distribution are handled independently, which leads to inefficient resource use and slow reactions to rapid changes in workload. By combining these operations into a coordinated framework, the project makes these capabilities easier for users to adopt. The overall goal is to improve cloud services without continuously adding more hardware.

The project aims to redesign load and resource management in a coordinated manner across software and hardware layers. This problem is fundamentally challenging because resource demands vary widely across requests, bottlenecks shift over time, and independent control mechanisms often operate at similar timescales and interfere with each other. Addressing these challenges requires fine-grained visibility into application behavior and new control abstractions that coordinate decisions across layers without introducing excessive overhead. To achieve this, the work is organized around three technical thrusts. The first thrust plans to develop unified and transparent mechanisms that track resource usage for each application request and enforce admission decisions across multiple shared bottlenecks. The second thrust plans to integrate these decisions with operating system scheduling, jointly managing application load and the resources allocated to handle it. The third thrust plans to extend these ideas to clusters of servers, redesigning load balancing, backpres

Key facts

NSF award ID: 2542973
Awardee: Georgia Tech Research Corporation (GA)
SAM.gov UEI: EMW9FC8J3HN4
PI: Ahmed Saeed
Primary program: 01002627DB NSF RESEARCH & RELATED ACTIVIT
All programs: CAREER-Faculty Erly Career Dev, RES IN NETWORKING TECH & SYS
Estimated total: $595,834
Funds obligated: $348,081
Transaction type: Continuing Grant
Period: 06/01/2026 → 05/31/2031