Towards a Compositional Generative Model of Human Vision

NIH RePORTER · NIH · R01 · $324,460 · view on reporter.nih.gov ↗

Abstract

Understanding object recognition has long been a central problem in vision science, because of its applied utility and computational difficulty. Progress has been slow, because of an inability to process complex natural images, where the largest challenges arise. Recently, advances in Deep Convolutional Neural Networks (DCNNs) spurred unprecedented success in natural image recognition. The general goal of this proposal is to leverage this success to test computational theories of human object recognition in natural images. However, DCNNs still markedly underperform humans when challenged with high levels of ambiguity, occlusion, and articulation. We hypothesize that humans' superior performance arises from the use of knowledge about how images and objects are structured. Preliminary evidence for this claim comes from the success of hybrid models, that combine DCNNS for identifying features and parts in images, with explicit knowledge of object and image structure. These computations occur within a hierarchy, which includes both top-down and bottom- up processing. The specific goal of the work proposed here is to strongly test whether these computational strategies, structured, hierarchical representations and bidirectional processing, are used to recognize objects in natural images. Human bodies are composed of hierarchically organized configurable parts, making them an ideal test domain. We examine the complete recognition process, from parts, to pairs of parts, to whole bodies, each in its own aim. Each aim also tests important sub-hypotheses about when and how the computational strategies are used. Aim 1 examines recognition of individual body parts, testing whether it is dependent on parsing images into more basic features and relationships, for example edges and materials. Aim 2 examines pairs of parts, testing the importance of knowledge of body connectedness relationships. Aim 3 examines perception of entire bodies, testing whether knowledge of global body structure guides bidirectional processing. In each aim, we first develop nested computer vision models that either do or do not make use of structural knowledge, to test whether it aids recognition. We then test whether human performance can be accounted for by the availability of that structural knowledge. We next measure neural activity with functional MRI to identify where and how it is used in cortex. Finally, we integrate these results to produce even stronger tests, using the nested models to predict human performance and confusion matrices as well as fMRI activity levels and confusion matrices. Altogether, this work will strongly test key theoretical accounts of object recognition in the most important domain, perception of natural images. The work, based on extensive preliminary data, measures and models the entire body recognition system. The models developed and tested here should surpass the state-of-the-art, and be useful for many real-world recognition tasks. The proposal wi...

Key facts

NIH application ID: 10228003
Project number: 5R01EY029700-03
Recipient: UNIVERSITY OF MINNESOTA
Principal Investigator: DANIEL J KERSTEN
Activity code: R01
Funding institute: NIH
Fiscal year: 2021
Award amount: $324,460
Award type: 5
Project period: 2019-09-30 → 2023-07-31