# RI: Small: Characterizing the Meaning of Human Preferences for AI Alignment

> **NSF 01002526DB NSF RESEARCH & RELATED ACTIVIT** · University of Massachusetts Amherst (MA) · $599,969

## Abstract

Artificial Intelligence (AI) systems are becoming increasingly capable, yet deploying them to produce reliable, intended outcomes remains difficult. Large language models often fail to follow instructions, AI systems that govern important functions sometimes behave unpredictably, and autonomous systems, such as self-driving vehicles or robots, can act in ways that diverge from user expectations. To improve reliability and utility, future AI systems will need to demonstrate that their goals and behaviors consistently reflect and support the intentions of their human users. The dominant current paradigm for AI alignment relies on learning from human preferences over possible actions or outcomes of an AI system. However, such methods make a number of assumptions about how preferences should be interpreted and ignore many potential sources of error. The aim of this project is to improve the scientific characterization of human preferences in the context of AI alignment and to leverage that knowledge to practically improve AI systems.

More specifically, Reinforcement Learning from Human Feedback (RLHF) is now at the core of many of the most successful contemporary approaches to AI alignment in applications ranging from robotics to language modeling. RLHF aims to align a policy with the desires implied by human preferences between pairs of trajectories, outcomes, or model outputs. However, such approaches typically rely on very strong assumptions about the meaning of human prefere

## Key facts

- **NSF award ID:** 2437426
- **Awardee organization:** University of Massachusetts Amherst (MA)
- **SAM.gov UEI:** VGJHK59NMPK9
- **PI:** Scott D Niekum
- **Primary program:** 01002526DB NSF RESEARCH & RELATED ACTIVIT
- **All programs:** ROBUST INTELLIGENCE, SMALL PROJECT
- **Estimated total:** $599,969
- **Funds obligated:** $599,969
- **Transaction type:** Standard Grant
- **Period:** 09/01/2025 → 08/31/2028

## Primary source

NSF Award Search: https://www.nsf.gov/awardsearch/showAward?AWD_ID=2437426

## Citation

> US National Science Foundation, Award 2437426, RI: Small: Characterizing the Meaning of Human Preferences for AI Alignment. Retrieved via AI Analytics 2026-06-08 from https://api.ai-analytics.org/grant/nsf/2437426. Licensed CC0.

---

*[NSF Awards dataset](/datasets/nsf-awards) · CC0 1.0*
