This project aims to advance robotic visual perception by addressing critical limitations of current artificial intelligence systems, particularly their inability to generalize effectively to novel environments and human-centered interactions. These limitations stand in stark contrast to human perception, which enables a comprehensive understanding of social activities and supports efficient visual learning of novel scenes from limited prior knowledge. This project therefore calls for a paradigm shift from machine perception to embodied recognition, with the goal of achieving performance guarantees in real-world applications. This shift will be realized through the proposed human-like visual understanding and efficient learning methods. These advances are expected to foster interdisciplinary applications, including enhanced human-robot collaboration, and to benefit areas such as disaster response, security, and healthcare. The outcomes will also contribute to education by integrating research findings into curricula and engaging diverse student groups, thereby promoting inclusivity and STEM education. The research seeks to establish algorithmic foundations for human-embodied visual understanding that mirrors human recognition. To accomplish this objective, the project proposes two thrusts: human-centered visual understanding and human-like visual learning. Through these two thrusts, the project will develop novel frameworks for understanding human behaviors and interact