According to the traditional inferential theory of perception, percepts of object motion or stationarity stem from an evaluation of afferent retinal signals (which encode image motion) with the help of extraretinal signals (which encode eye movements). According to direct perception theory, on the other hand, the percepts derive from retinally conveyed information only. Neither view is compatible with a perceptual phenomenon that occurs during visually induced sensations of ego motion (vection). A modified version of inferential theory yields a model in which the concept of extraretinal signals is replaced by that of reference signals, which encode not how the eyes move in their orbits but how they move in space. Hence, reference signals are produced not only during eye movements but also during ego motion (i.e., in response to vestibular stimulation and to retinal image flow, which may induce vection). The present theory describes the interface between self-motion and object-motion percepts. An experimental paradigm that allows quantitative measurement of the magnitude and gain of reference signals and the size of the just noticeable difference (JND) between retinal and reference signals reveals that the distinction between direct and inferential theories largely depends on (1) a mistaken belief that perceptual veridicality is evidence that extraretinal information is not involved, and (2) a failure to distinguish between (the perception of) absolute object motion in space and relative motion of objects with respect to each other. The model corrects these errors and provides a new, unified framework for interpreting many phenomena in the field of motion perception.
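The comparison between retinal and reference signals can be illustrated with a minimal sketch. It is not the author's formal model. It assumes that both signals are one-dimensional signed velocities in deg/s, that the reference signal equals a gain times the velocity of the eyes in space, and that an object is judged stationary whenever the two signals cancel to within one JND. The names `perceive_object_motion` and `Percept`, the sign convention, and the default values are hypothetical choices made for illustration only.

```python
from dataclasses import dataclass


@dataclass
class Percept:
    moving: bool      # True if the object is judged to move in space
    velocity: float   # signed perceived velocity (deg/s); 0.0 if stationary


def perceive_object_motion(retinal_velocity: float,
                           eye_velocity_in_space: float,
                           gain: float = 1.0,
                           jnd: float = 1.0) -> Percept:
    """Compare a retinal signal with a reference signal (illustrative only).

    Assumed sign convention: retinal_velocity is the object's velocity
    relative to the eye, so a stationary object viewed by an eye moving
    at +v in space yields a retinal_velocity of -v.
    """
    # Assumption: the reference signal is a scaled copy of eye velocity in space.
    reference_signal = gain * eye_velocity_in_space
    # With this convention the two signals cancel for a stationary object
    # when the gain is exactly 1.
    difference = retinal_velocity + reference_signal
    # Assumption: differences smaller than the JND go unnoticed.
    if abs(difference) < jnd:
        return Percept(moving=False, velocity=0.0)
    return Percept(moving=True, velocity=difference)
```

Under these assumptions, a stationary object viewed while the eyes move through space at 10 deg/s is judged stationary when the gain is 1.0, whereas a gain of 0.8 leaves a residual of 2 deg/s in the direction opposite to the eye motion, which exceeds the assumed JND and is therefore judged as object motion.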