# CAREER: Revolutionizing the Evaluation of AI Agents with Online and Offline Data

> **NSF 01002526DB NSF RESEARCH & RELATED ACTIVIT** · University of Virginia Main Campus (VA) · $600,000

## Abstract

This project focuses on designing new methods to facilitate the evaluation of artificial intelligence (AI) agents. In an era where AI agents are rapidly proliferating, with new systems of increasingly capable AI technologies, it is crucial to thoroughly understand their performance capabilities and limitations, a prerequisite for both safe deployment and continuous improvement. Traditional evaluation methods require running AI agents in live environments to collect performance data, but this approach can be resource-intensive and pose significant safety risks. This project addresses these challenges by developing innovative evaluation methods that dramatically reduce the need for expensive and potentially hazardous live testing, thereby accelerating the safe deployment of current AI systems and enabling the development of next-generation AI agents. Additionally, the project will train future AI researchers, helping to expand access to AI research opportunities across the United States.

This project pioneers three research thrusts to fulfill different evaluation needs. First, the project delivers methods to efficiently evaluate an AI agent in a holistic manner with a scalar performance metric by reimagining Monte Carlo methods. The key innovation involves repurposing offline data to inform the online sampling process of Monte Carlo methods, thereby reducing the required sample size for accurate performance estimation. Second, the project develops methods to efficiently eva
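The first thrust's idea, using offline data to shape how online Monte Carlo samples are drawn, can be illustrated with a toy sketch. The abstract does not specify the project's algorithms, so everything below is an assumption for illustration: a three-action bandit with made-up rewards (`TRUE_MEANS`), a made-up target policy (`PI`), and a textbook importance-sampling proposal proportional to the target probability times the root-mean-square logged reward. The helper names are hypothetical.

```python
import random
import statistics

TRUE_MEANS = [0.1, 0.2, 5.0]   # hypothetical per-action mean rewards
PI = [0.45, 0.45, 0.10]        # hypothetical target policy to evaluate

def reward(action, rng):
    """Noisy reward from the (simulated) live environment."""
    return TRUE_MEANS[action] + rng.gauss(0.0, 0.1)

def behavior_from_offline(pi, offline_data, floor=1e-3):
    """Build a sampling policy proportional to pi(a) * RMS logged reward of a.

    This is a standard variance-reducing importance-sampling proposal,
    used here as an assumption, not as the project's actual method.
    """
    second_moment_sum = [0.0] * len(pi)
    counts = [0] * len(pi)
    for a, r in offline_data:
        second_moment_sum[a] += r * r
        counts[a] += 1
    scores = []
    for a in range(len(pi)):
        # Fall back to 1.0 when an action never appears in the logged data.
        rms = (second_moment_sum[a] / counts[a]) ** 0.5 if counts[a] else 1.0
        # The floor keeps the sampler positive wherever pi is positive.
        scores.append(pi[a] * max(rms, floor))
    total = sum(scores)
    return [s / total for s in scores]

def monte_carlo(pi, sampler, n, rng):
    """Estimate the value of pi from n online samples drawn from `sampler`.

    Each reward is reweighted by pi(a) / sampler(a); with sampler == pi the
    weights are all 1 and this reduces to ordinary on-policy Monte Carlo.
    """
    actions = rng.choices(range(len(pi)), weights=sampler, k=n)
    return sum(pi[a] / sampler[a] * reward(a, rng) for a in actions) / n

rng = random.Random(0)

# Offline data: 300 logged (action, reward) pairs from a uniform logging policy.
offline = [(a, reward(a, rng)) for a in rng.choices(range(len(PI)), k=300)]
MU = behavior_from_offline(PI, offline)

# Repeat each estimator 200 times with 50 online samples per run.
mc_runs = [monte_carlo(PI, PI, 50, rng) for _ in range(200)]
is_runs = [monte_carlo(PI, MU, 50, rng) for _ in range(200)]

var_mc = statistics.variance(mc_runs)
var_is = statistics.variance(is_runs)
mean_is = statistics.mean(is_runs)
print(f"on-policy variance: {var_mc:.5f}, offline-informed variance: {var_is:.5f}")
```

In this toy setting the offline-informed sampler visits the rare, high-reward action more often and downweights it accordingly, so the estimate stays unbiased while its spread across runs shrinks. That is the sense in which fewer online samples may suffice for the same accuracy.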

## Key facts

- **NSF award ID:** 2442098
- **Awardee organization:** University of Virginia Main Campus (VA)
- **SAM.gov UEI:** JJG6HU8PA4S5
- **PI:** Shangtong Zhang
- **Primary program:** 01002526DB NSF RESEARCH & RELATED ACTIVIT
- **All programs:** CAREER-Faculty Erly Career Dev, ROBUST INTELLIGENCE
- **Estimated total:** $600,000
- **Funds obligated:** $506,819
- **Transaction type:** Continuing Grant
- **Period:** 09/01/2025 → 08/31/2030

## Primary source

NSF Award Search: https://www.nsf.gov/awardsearch/showAward?AWD_ID=2442098

## Citation

> US National Science Foundation, Award 2442098, CAREER: Revolutionizing the Evaluation of AI Agents with Online and Offline Data. Retrieved via AI Analytics 2026-06-07 from https://api.ai-analytics.org/grant/nsf/2442098. Licensed CC0.

---

*[NSF Awards dataset](/datasets/nsf-awards) · CC0 1.0*
