Building artificial intelligence (AI) systems that approach human cognitive flexibility requires a better understanding of how the brain uses visual and linguistic information to achieve specific goals. While previous research in cognitive neuroscience and AI has focused on visual classification tasks, such as identifying objects or labeling scenes, real-world behavior is more nuanced and often depends on selecting task-relevant information according to the observer’s goals. Critically, this process draws not only on the visual features of the scene but also on conceptual and linguistic knowledge. This project examines how people flexibly extract and use visual information in context and how this information is represented in computational models, with the goal of advancing theories of cognition and developing more adaptive, human-aligned AI systems. The project integrates methods from visual AI (convolutional neural networks), language-based AI (large language models), neuroscience, and cognitive science. First, deep networks are trained to predict language embeddings of human scene descriptions elicited under different task goals, capturing how semantic meaning maps onto visual features. Next, these networks are reverse-engineered to generate activation maps that identify the regions of each image most relevant for a given task. These maps are validated using both behavioral experiments and electroencephalography (EEG). A novel multivariate analysis techn