# CAREER: Reforming Profiling Techniques to Guide Systemic Performance Tuning for GPU-Accelerated Deep Learning Workloads

> **NSF 01002425DB NSF RESEARCH & RELATED ACTIVIT** · University of California - Merced (CA) · $604,250

## Abstract

Graphics Processing Units (GPUs) are the go-to choice for deep learning due to their exceptional computational power and massive parallelism. However, maximizing GPU performance for model development and inference remains notoriously challenging as models grow increasingly complex and span multiple abstraction layers: the upstream Python layer, the midstream C/C++ layer, and the downstream GPU kernel layer. While this layered design meets diverse application needs, it also embeds inefficiencies that are difficult to detect because of intricate cross-layer interactions. The project addresses these inefficiencies through a comprehensive, cross-layer performance analysis of deep learning models. Its novelty lies in advancing state-of-the-art profiling techniques to enable systemic performance tuning across all layers. Its broader significance lies in deepening the understanding of systemic performance issues in deep learning, which strengthens foundations in code analysis and advances fields that increasingly rely on deep learning, such as image processing. With interest from industry leaders like Meta, the project shows strong potential for translating academic insights into practical applications. Additionally, the project contributes to educational and outreach goals by integrating its findings into computer science curricula and K-12 programs to cultivate a workforce skilled in performance analysis and optimization.

Three innovat

## Key facts

- **NSF award ID:** 2441754
- **Awardee organization:** University of California - Merced (CA)
- **SAM.gov UEI:** FFM7VPAG8P92
- **PI:** Pengfei Su
- **Primary program:** 01002425DB NSF RESEARCH & RELATED ACTIVIT
- **All programs:** CAREER-Faculty Early Career Dev
- **Estimated total:** $604,250
- **Funds obligated:** $341,525
- **Transaction type:** Continuing Grant
- **Period:** 07/01/2025 → 06/30/2030

## Primary source

NSF Award Search: https://www.nsf.gov/awardsearch/showAward?AWD_ID=2441754

## Citation

> US National Science Foundation, Award 2441754, CAREER: Reforming Profiling Techniques to Guide Systemic Performance Tuning for GPU-Accelerated Deep Learning Workloads. Retrieved via AI Analytics 2026-06-08 from https://api.ai-analytics.org/grant/nsf/2441754. Licensed CC0.

---

*[NSF Awards dataset](/datasets/nsf-awards) · CC0 1.0*
