CAREER: An Architecture-Aware Optimization Theory for Deep Learning: Non-Euclidean Descent, Structured Preconditioning, and Scale Invariance

Abstract

Training modern artificial intelligence systems requires large amounts of computing time, energy, and money. Many of the optimization methods used to train neural networks are still chosen largely through trial and error because existing theory does not adequately explain why some methods work better than others on different model architectures. This project will develop a scientific foundation for making training faster, more reliable, and more resource efficient by linking optimization methods to the structure of the neural networks they are used to train. The project can reduce the cost and energy use of model training, provide more dependable guidance for practitioners, and support efficient and reliable artificial intelligence development. It will also support graduate education in optimization for deep learning, research-preparation activities for undergraduates, and hands-on artificial intelligence learning modules for local high school students, with participation in project activities open to all.

This project studies how neural network architecture influences optimization through three complementary directions: structured preconditioning, optimization methods adapted to different notions of distance, and scale invariance induced by normalization layers. The research will analyze representative model components such as multilayer perceptrons, attention modules, embedding parameters, and layers preceding normalization. It will develop theory explaining when optimization methods matched to the model architecture improve training efficiency, characterize their behavior on losses whose landscape geometry is shaped by the network architecture, and study how optimization affects the quality of learned solutions beyond training loss alone, including downstream tasks and performance when data differ from the training distribution. These ideas will be tested through controlled experiments on representative neural network architectures and larger-scale validation

Key facts

NSF award ID: 2544658
Awardee: Toyota Technological Institute at Chicago (IL)
SAM.gov UEI: ERBJF4DMW6G4
PI: Zhiyuan Li
Primary program: 01002627DB NSF RESEARCH & RELATED ACTIVIT
All programs: Artificial Intelligence (AI), CAREER-Faculty Erly Career Dev, ROBUST INTELLIGENCE
Estimated total: $599,999
Funds obligated: $349,963
Transaction type: Continuing Grant
Period: 06/01/2026 → 05/31/2031