Collaborative Research: HCC: Small: Accounting for Focus Ambiguity in Visual Questions

NSF Award Search · 01002526DB NSF RESEARCH & RELATED ACTIVIT · $285,000

Abstract

Ambiguous language is a common part of communication: words or phrases can be interpreted in multiple ways depending on context. This project addresses how a question answering system might handle ambiguous questions about images, where it is unclear which part of an image a question refers to. For example, if someone asks “What is the medicine?” about an image showing several pill bottles, a system should identify all relevant parts of the image and provide an answer for each. The person then receives the full picture and can resolve the ambiguity afterward. However, current visual question answering (VQA) services typically give one answer per question and do not explain how they chose it. This limits a person’s ability to verify whether the system made the intended interpretation. Incomplete information from VQA services can have serious repercussions, with adverse personal, social, professional, legal, and financial consequences for users. The researchers will develop a socio-technical solution that empowers people to recognize when a question is ambiguous and then resolve the ambiguity. The project introduces the first back-end AI model that can identify every plausible image region a question could refer to and pair each region with a natural language answer derived from it. The project will establish eff

Key facts

NSF award ID: 2516628
Awardee: University of Colorado at Boulder (CO)
SAM.gov UEI: SPVKK1RC2MZ3
PI: Danna Gurari
Primary program: 01002526DB NSF RESEARCH & RELATED ACTIVIT
All programs: Cyber-Human Systems, SMALL PROJECT
Estimated total: $285,000
Funds obligated: $285,000
Transaction type: Standard Grant
Period: 09/15/2025 → 08/31/2027