CAREER: Revolutionizing the Evaluation of AI Agents with Online and Offline Data

NSF Award Search · 01002526DB NSF RESEARCH & RELATED ACTIVIT · $600,000

Abstract

This project focuses on designing new methods to facilitate the evaluation of artificial intelligence (AI) agents. In an era where AI agents are rapidly proliferating and new systems are built on increasingly capable AI technologies, it is crucial to thoroughly understand their performance capabilities and limitations, a prerequisite for both safe deployment and continuous improvement. Traditional evaluation methods require running AI agents in live environments to collect performance data, but this approach can be resource-intensive and can pose significant safety risks. This project addresses these challenges by developing innovative evaluation methods that dramatically reduce the need for expensive and potentially hazardous live testing, thereby accelerating the safe deployment of current AI systems and enabling the development of next-generation AI agents. Additionally, the project will train future AI researchers, helping to expand access to AI research opportunities across the United States.

This project pioneers three research thrusts to fulfill different evaluation needs. First, the project delivers methods to efficiently evaluate an AI agent in a holistic manner with a scalar performance metric by reimagining Monte Carlo methods. The key innovation involves repurposing offline data to inform the online sampling process of Monte Carlo methods, thereby reducing the required sample size for accurate performance estimation. Second, the project develops methods to efficiently eva
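The first thrust, using offline data to inform online Monte Carlo sampling, can be illustrated with a minimal sketch. It is not the project's actual method: it assumes a simple multi-armed bandit, uses the textbook variance-minimizing importance sampling proposal (behavior probability proportional to the target probability times the root of the second moment of the reward), and all function names are hypothetical.

```python
import random


def estimate_second_moments(offline_data, n_actions):
    """Per-action mean of squared rewards from logged (action, reward) pairs."""
    sums = [0.0] * n_actions
    counts = [0] * n_actions
    for action, reward in offline_data:
        sums[action] += reward * reward
        counts[action] += 1
    # Assumption: actions never seen offline get a default second moment of 1.0.
    return [s / c if c else 1.0 for s, c in zip(sums, counts)]


def design_behavior_policy(target, second_moments, eps=1e-8):
    """Sampling distribution proportional to target(a) * sqrt(E[r^2 | a]).

    eps keeps every action the target can take sampled with nonzero
    probability, so the importance sampling estimate stays unbiased.
    """
    weights = [p * (m ** 0.5 + eps) for p, m in zip(target, second_moments)]
    total = sum(weights)
    return [w / total for w in weights]


def importance_sampling_estimate(target, behavior, reward_fn, n_samples, rng):
    """Online Monte Carlo estimate of the target policy's expected reward."""
    actions = range(len(target))
    total = 0.0
    for _ in range(n_samples):
        a = rng.choices(actions, weights=behavior)[0]
        # Reweight each reward by target(a) / behavior(a) to correct for
        # sampling from the behavior policy instead of the target policy.
        total += target[a] / behavior[a] * reward_fn(a, rng)
    return total / n_samples
```

With `behavior` set equal to `target`, the last function reduces to plain on-policy Monte Carlo. The offline-informed distribution concentrates live samples on actions that contribute most to the variance, which is the sense in which fewer online samples may be needed for the same accuracy.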

Key facts

NSF award ID: 2442098
Awardee: University of Virginia Main Campus (VA)
SAM.gov UEI: JJG6HU8PA4S5
PI: Shangtong Zhang
Primary program: 01002526DB NSF RESEARCH & RELATED ACTIVIT
All programs: CAREER-Faculty Erly Career Dev, ROBUST INTELLIGENCE
Estimated total: $600,000
Funds obligated: $506,819
Transaction type: Continuing Grant
Period: 09/01/2025 → 08/31/2030