# CAREER: Self-Supervised Video Representation Learning for Machine Perception

> **NSF 01002930DB NSF RESEARCH & RELATED ACTIVIT** · Massachusetts Institute of Technology (MA) · $600,000

## Abstract

Artificial intelligence systems can now generate realistic video, but they still struggle to learn the world knowledge needed to understand how environments change over time, anticipate the consequences of actions, and support decision-making in the physical world. This limitation is a major barrier to building machines that can safely and effectively assist people in homes, workplaces, and scientific settings. By developing learning methods that extract action-relevant structure directly from raw video and other sensor data, this project will help lay the foundation for more capable and adaptable intelligent systems, with potential benefits for robotics, scientific discovery, and other applications that require reliable machine perception. The project will also create open educational materials and mentorship activities that train students across vision, robotics, and machine learning.

This project develops a self-supervised framework for video representation learning that separates efficient perception modules from generative world models, enabling the discovery of compact representations of scene state, motion, and action from raw sensory streams without dense human annotation. The research will study learning objectives and architectures that support long-context prediction, planning, and action-conditioned world modeling, while also yielding representations that can implicitly support conventional vision capabilities such as 3D reconstruction, motion estimation, and 

## Key facts

- **NSF award ID:** 2543631
- **Awardee organization:** Massachusetts Institute of Technology (MA)
- **SAM.gov UEI:** E2NYLCDML6V1
- **PI:** Vincent Sitzmann
- **Primary program:** 01002930DB NSF RESEARCH & RELATED ACTIVIT
- **All programs:** Artificial Intelligence (AI), CAREER-Faculty Erly Career Dev, ROBUST INTELLIGENCE
- **Estimated total:** $600,000
- **Funds obligated:** $360,000
- **Transaction type:** Continuing Grant
- **Period:** 07/01/2026 → 06/30/2031

## Primary source

NSF Award Search: https://www.nsf.gov/awardsearch/showAward?AWD_ID=2543631

## Citation

> US National Science Foundation, Award 2543631, CAREER: Self-Supervised Video Representation Learning for Machine Perception. Retrieved via AI Analytics 2026-06-08 from https://api.ai-analytics.org/grant/nsf/2543631. Licensed CC0.

---

*[NSF Awards dataset](/datasets/nsf-awards) · CC0 1.0*