This paper presents a general computational treatment of how mammals are able to deal with visual objects and environments. The model tries to cover the entire range from behavior and phenomenological experience to detailed neural encodings in crude but computationally plausible reductive steps. The problems addressed include perceptual constancies, eye movements and the stable visual world, object descriptions, perceptual generalizations, and the representation of extrapersonal space.
The entire development is based on an action-oriented notion of perception. The observer is assumed to be continuously sampling the ambient light for information of current value. The central problem of vision is taken to be categorizing and locating objects in the environment. The critical step in this process is linking visual information to symbolic object descriptions; this is called indexing, by analogy with identifying a book from its index terms. The system must also identify situations and use this knowledge to guide movement and other actions in the environment. The treatment focuses on the different representations of information used in the visual system.
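As a rough illustration of the indexing idea, the sketch below treats stored object descriptions like books and visual features like index terms: observed features vote for the descriptions that list them. This is only a minimal analogy in code, not the paper's proposed mechanism; the feature names and object models are invented for the example.

```python
# Hypothetical sketch of "indexing": visual features vote for stored object
# descriptions, much as index terms point to a book.
# All feature and object names below are illustrative, not from the paper.
from collections import defaultdict

# Symbolic object descriptions, each listed with characteristic visual features.
OBJECT_MODELS = {
    "coffee_mug": {"cylindrical", "handle", "concave_top"},
    "apple":      {"round", "red_or_green", "stem"},
    "book":       {"rectangular", "flat", "text_on_surface"},
}

# Build an inverted index: feature -> set of objects that exhibit it.
FEATURE_INDEX = defaultdict(set)
for obj, features in OBJECT_MODELS.items():
    for f in features:
        FEATURE_INDEX[f].add(obj)

def index_candidates(observed_features):
    """Rank stored object descriptions by how many observed features they match."""
    votes = defaultdict(int)
    for f in observed_features:
        for obj in FEATURE_INDEX.get(f, ()):
            votes[obj] += 1
    return sorted(votes.items(), key=lambda kv: kv[1], reverse=True)

print(index_candidates({"cylindrical", "handle"}))  # -> [('coffee_mug', 2)]
```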
Four representational frames capture this information: a retinotopic frame, a head-based frame, a symbolic frame, and an allocentric frame. The functional roles of the four frames, the communication among them, and their suggested neurophysiological realization constitute the core of the paper. The model is perforce crude, but appears to be consistent with all relevant findings.
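To make the relations among the spatial frames concrete, the sketch below expresses a single feature location in retinotopic, head-based, and allocentric coordinates under simple 2-D assumptions. The transforms and pose parameters are illustrative only and are not taken from the paper's neural-level proposal; the symbolic frame would be linked to these locations through indexing rather than by a coordinate transform.

```python
# A minimal sketch of how the three spatial frames might relate, assuming
# planar 2-D coordinates; eye and head pose values are invented for the example.
import numpy as np

def retinotopic_to_head(p_retina, eye_direction):
    """Re-express a retinotopic location in head-based coordinates by adding
    the current gaze offset, so the location stays stable across eye movements."""
    return p_retina + eye_direction

def head_to_allocentric(p_head, head_position, head_angle):
    """Rotate by head orientation and translate by head position to obtain an
    allocentric (environment-centred) location."""
    c, s = np.cos(head_angle), np.sin(head_angle)
    rotation = np.array([[c, -s], [s, c]])
    return rotation @ p_head + head_position

# A feature seen at retinal location (1, 2) while the eyes are deviated by (3, 0):
p_head = retinotopic_to_head(np.array([1.0, 2.0]), np.array([3.0, 0.0]))
# The same feature placed in the environment, given a head pose:
p_world = head_to_allocentric(p_head,
                              head_position=np.array([10.0, 5.0]),
                              head_angle=np.pi / 2)
print(p_head, p_world)  # head-based (4, 2); allocentric (8, 9)
```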