CAREER: Modeling and Mitigating Error Propagation in High-Performance Computing Applications

Abstract

In high-performance computing (HPC), the growing complexity and shrinking size of hardware components make systems more vulnerable to "soft errors": temporary glitches that can disrupt calculations. Traditionally, these errors were managed through hardware-based solutions such as redundancy, but these approaches consume significant energy, a major concern for modern processors. This project addresses the challenge of making HPC systems more resilient to soft errors without the high energy costs of traditional methods. It focuses on identifying and protecting the most vulnerable parts of a program, that is, the specific states where errors are most likely to cause problems. By doing this efficiently, the project aims to ensure that programs continue to function correctly even when errors occur. The broader benefits of this project include advancing the field of reliable computing, promoting energy-efficient technologies, and supporting education by making cutting-edge resilience techniques accessible to software developers and classrooms. Ultimately, this work contributes to the creation of more robust and efficient computing systems that can handle the increasing demands of modern technology, benefiting industries, education, and society as a whole.

This project aims to address the increasing vulnerability of HPC systems to transient hardware faults, or soft errors, which are exacerbated by larger system scales, advanced technology scaling, and lower operating voltag

Key facts

NSF award ID: 2441136
Awardee: University of Iowa (IA)
SAM.gov UEI: Z1H9VJS8NG16
PI: Guanpeng Li
Primary program: 01002526DB NSF RESEARCH & RELATED ACTIVIT
All programs: CAREER-Faculty Erly Career Dev, HIGH-PERFORMANCE COMPUTING, EXP PROG TO STIM COMP RES
Estimated total: $590,074
Funds obligated: $51,180
Transaction type: Continuing Grant
Period: 07/01/2025 → 09/30/2025