Web agents powered by large language models (LLMs) could usher in a new era of intelligent automation, completing complex tasks across websites and transforming how people interact with the web. Yet today’s agents rely mainly on prompting off-the-shelf LLMs, which limits their ability to adapt to specific sites, achieve high reliability, and operate cost-effectively. This project proposes an alternative, modular paradigm in which LLMs are orchestrated with lightweight, task-specific models to enhance adaptability, reliability, and safety while reducing inference costs. The work aims to boost productivity, lower barriers to web access for all users, and train the next generation of scientists through new courses and outreach activities.

This project identifies several major gaps in the existing literature, spanning evaluation, planning, grounding, safety, and continual learning. These gaps prevent web agents from quickly adapting to new websites, continually learning on the job, and achieving high reliability. To bridge them, the project will pursue four coordinated research thrusts: (1) a new public benchmark of complex, realistic web tasks for measuring progress and guiding innovation, (2) a world model for the web that enables model-based planning, in which the agent simulates the effects of different actions before acting, (3) a specialized visual grounding model that maps agent plans to precise actions on websites to improve agent reliability, and (4) a safety contro