# From Human-Powered to Automated Video Description for Blind and Low Vision Users

> **NIH R01** · ARIZONA STATE UNIVERSITY-TEMPE CAMPUS · 2023 · $649,872

## Abstract

Approximately 12 million people in the United States have been diagnosed with a visual impairment. These
individuals face unique challenges in our modern environment, where much critical information related to
education, employment, entertainment, and community is presented in the form of digital videos. Inaccessible
information can result in social exclusion or become life-threatening if individuals require access to it in order to
make decisions related to their health and safety. For example, in a personal or global health crisis, individuals
may need to access the mass amounts of information conveyed via videos or dynamic infographics in order to
make informed decisions. To address this need, the online platform YouDescribe allows blind and low vision
(BLV) users to request amateur volunteers to create video descriptions, also referred to as audio descriptions
(AD), of YouTube videos. However, the platform has been unable to keep up with the overwhelming demand,
and 92.5% of videos on the YouDescribe user wish list remain undescribed. The overall objective of this proposal
is to build an AI-driven system, suitable for use on a wide scale, to automatically generate descriptions of online
videos, as well as answer questions asked by BLV users about the content of videos. The rationale for this
project is that AI-based tools are necessary to facilitate timely access to the deluge of new videos appearing on
the Internet every day. The proposed work encompasses three specific aims: 1) develop an AI-based tool in
collaboration with sighted describers that more efficiently produces video descriptions and increases the
availability of accessible videos. The goal is to create an AI-driven NarrationBot that will decrease the time
required for novice volunteers to produce video descriptions by 80%; 2) develop an AI-based tool in collaboration
with BLV individuals that offers user-driven access to visual information in online videos. The goal is to develop
an AI-driven QABot that allows users to pause a video, ask questions about content, and receive immediate
answers (e.g., “What breed is the dog?”, “German shepherd”) that are accurate 80% of the time; and 3) develop
and publicly release large-scale datasets to improve machine learning for video accessibility. These novel
datasets will be used to increase the quality and accuracy of NarrationBot and QABot until AI-generated
descriptions and answers need minimal intervention from human volunteers and can serve BLV users directly.
The proposed research is innovative because it focuses on videos, whereas existing AI-driven efforts to address
this problem have focused primarily on static photos or images. It is also one of only a few efforts to directly
partner with BLV individuals to develop AI-driven systems that produce visual descriptions or answer visual
questions. The proposed research is significant because it will result in open-source, AI-driven tools that will give
BLV individuals ...

## Key facts

- **NIH application ID:** 10568469
- **Project number:** 1R01EY034562-01
- **Recipient organization:** ARIZONA STATE UNIVERSITY-TEMPE CAMPUS
- **Principal Investigator:** Pooyan Fazli
- **Activity code:** R01
- **Funding institute:** NIH
- **Fiscal year:** 2023
- **Award amount:** $649,872
- **Award type:** 1 (new application)
- **Project period:** 2023-07-01 → 2028-06-30

## Primary source

NIH RePORTER: https://reporter.nih.gov/project-details/10568469

## Citation

> US National Institutes of Health, RePORTER application 10568469, From Human-Powered to Automated Video Description for Blind and Low Vision Users (1R01EY034562-01). Retrieved via AI Analytics 2026-05-24 from https://api.ai-analytics.org/grant/nih/10568469. Licensed CC0.

---

*[NIH grants dataset](/datasets/nih-grants) · CC0 1.0*
