1. Introduction
Neuroscientists have long posited anatomically distinct ventral and dorsal streams within the cortical visual system (Mishkin and Ungerleider 1982). According to Milner and Goodale’s (2006; see also Goodale and Milner 2004) account, these distinct streams serve independent functions: the ventral stream functions to generate representations for “perception,” that is, visual recognition and downstream cognition, while the dorsal stream functions to generate representations for the online guidance of action. This “perception/action” account of the function of the visual streams has several subtheses, including the thesis that perception and action employ distinct representations (henceforth, “Two-Reps”) and that these representations are generated by distinct visual subsystems (henceforth, “Two-Systems”).
A critical piece of evidence for the perception/action model is the finding that visual illusions affect action tasks less than they affect visual perception. If illusions have a smaller impact on guidance representations than on conscious perception, then it follows that the representations are not the same. Thus, Milner and Goodale (2006) have argued that this psychophysical evidence provides powerful support for the perception/action model. In recent years, however, this support has come under sharp attack, as critics of the perception/action model have challenged both the validity of the original experiments and, even granting their validity, the support they lend to the perception/action model.
In this article, I reexamine the psychophysical evidence for the perception/action model. As this evidence is wide-ranging and the criticisms of it diverse, I focus on the general ability of psychophysical evidence to support the different components of the perception/action model.
Section 2 further characterizes the perception/action model and examines the logical relationship between Two-Reps and Two-Systems. Section 3 presents the psychophysical evidence in greater depth, focusing on two findings that, I will argue, present the strongest support for the perception/action model. Section 4 briefly examines objections to the psychophysical evidence from experimental design and generality. I argue that these objections present only a limited challenge to the defender of the perception/action model. Section 5 examines Thor Grünbaum’s (2017, 2021) argument that a traditional, One-Rep account of the visual streams also explains the psychophysical evidence. I argue that Two-Reps provides a better explanation of the evidence and thus continues to be supported by it. Section 6 returns to Two-Systems, arguing that the psychophysical evidence only moderately supports this stronger thesis.
2. The perception/action model
Milner and Goodale articulate both an anatomical and a functional model of the two visual streams. The anatomical model, which draws heavily on Mishkin and Ungerleider’s (1982) account, posits two anatomically distinct processing streams within the cortical visual system: a ventral stream extending laterally from V1 to the inferotemporal cortex and a dorsal stream extending dorsally from V1 to the posterior parietal lobe. Although some connectivity between the streams has been identified, the anatomical existence of the two streams is well established (Borra et al. 2008; Schenk and McIntosh 2010; Zanon et al. 2010).
The functional “perception/action” model holds that the ventral and dorsal streams serve distinct functions: “perception” and “action,” respectively. By “perception,” Milner and Goodale mean the representations and upstream processes employed during cognitive performance, such as conscious visual reports. By “action,” they mean the set of representations and upstream processes employed during online visuomotor guidance, such as the guidance of the hands and limbs when reaching for an object. For instance, when I see my mug and decide to have a drink, I rely on ventral stream representations. The dorsal stream, by contrast, underlies “action” by generating representations for online movement control. For example, when I reach for my mug, dorsal stream representations help me determine the exact joint angles and grip aperture I need to use during my approach.
As this example illustrates, both the ventral and dorsal streams are believed to be involved in action to some degree. Without ventral stream representations, I would not have noticed my mug and therefore would not have attempted to pick it up. Without dorsal stream representations, I would not have been able to precisely control my movements as I did so. To clarify the distinct roles that the ventral and dorsal streams play in action, Milner and Goodale often explain that the ventral stream is responsible for “action planning,” that is, the selection of the course of action and the broad characteristics of how it will be carried out. In contrast, the dorsal stream is responsible for “action guidance,” that is, the fine-grained updating of behavior as the action is carried out.
The perception/action model has two main components. First, Milner and Goodale propose that the ventral and dorsal streams constitute independent systems for distinct functions, perception and action, respectively. I will call this first component “Two-Systems” and the more traditional opposing view “One-System.”
It is conceivable that distinct systems could process features that are most relevant to action planning and action guidance, respectively, while nonetheless culminating in a unified representation that combines elements of both, just as, on the what/where model, distinct systems process object and spatial features of the distal scene while (presumably) culminating in a representation (or functionally integrated set of representations) that combine(s) elements of both. However, Milner and Goodale’s contention that the streams are modular “on the output side” precludes this (Milner and Goodale 2006, 14). According to the perception/action model, the ventral and dorsal streams generate distinct representations that are separately deployed in their downstream tasks, perception and action, respectively. I will call this claim “Two-Reps.”
Two-Reps is remarkable, as it conflicts with the overwhelming appearance that the same visual representation is employed in both activities. That is, it seems that my visual experience both helps me notice my mug and guides my activity during the approach (Clark 2001). I will refer to this more conventional conception as “One-Rep.” However, it is important to note that the labels “One-Rep” and “Two-Reps” are misleading, as both views are compatible with multiple local representations being used in their respective tasks. Neither view takes a definitive stance on whether conscious experience is a single representation or composed of several related representations. Moreover, both views recognize that perceptual representations evolve continuously as one continues to receive sensory input. The crux of the disagreement between One-Rep and Two-Reps concerns whether perception and action operate in response to the same representational vehicle(s) regarding a given distal stimulus or whether separate vehicles are employed in these different tasks.
Discussions of the evidential import of illusion experiments have rarely distinguished between these two components of the perception/action model. As we will see, this is a mistake, as these experiments provide much stronger support for Two-Reps than for Two-Systems. Thus, before examining the psychophysical evidence, it is important first to map out the logical relationships between the two theses.
I noted in the preceding text that if distinct systems were responsible for processing the features most relevant to action planning and guidance, respectively, but culminated in a unified representation that employed features of each, then (a version of) Two-Systems would be true while Two-Reps would be false. This is not how Milner and Goodale conceive of the relationship between these two components of the perception/action model, however. Rather, their conception of Two-Systems entails Two-Reps. This is true for two reasons. First, as noted previously, they individuate systems by the downstream tasks they directly function to subserve. For example, the dorsal stream performs its action-guiding role by generating distinct representations that are employed in visuomotor guidance. If each system is individuated by the distinct representations it supplies to its downstream task, then Two-Systems entails Two-Reps.
The second reason that Milner and Goodale posit a tight logical relationship between Two-Systems and Two-Reps concerns their conception of the role that having distinct streams plays in performing their downstream tasks. As Milner and Goodale see it, dividing up perception and action plays a crucial role in optimal performance across both tasks because they place conflicting demands on the corresponding underlying computational systems (Milner and Goodale 2006, ch. 2). That is, they contend, online action guidance requires sensitivity to the evolving relationship between the target and the relevant effector system, necessitating processing that is high in temporal (but not necessarily spatial) resolution, updates representations rapidly, employs an egocentric (subject-centered) spatial frame, and processes guidance-specific features (such as exact grip apertures). Conversely, action planning requires sensitivity to the semantic features of the target (such as the appropriate place to grasp it) and the target’s objective spatial location. This necessitates processing that is high in spatial (but not necessarily temporal) resolution; is relatively constant across dynamic changes in the target’s egocentric direction, which in turn entails longer storage durations and an allocentric (object-centered) spatial frame; and processes selection-specific features (such as object identity). But notice that many of these requirements entail differences in the representations employed for each downstream task. Thus, Milner and Goodale argue that optimal performance in planning and guidance requires distinct streams to produce representations tailored for these distinct tasks. This, too, entails a tight logical relationship between Two-Systems and Two-Reps.
The fact that Milner and Goodale’s conception of Two-Systems entails Two-Reps informs their evidentiary relationship. If Two-Systems entails Two-Reps, then evidence for the former is also evidence for the latter. The contrapositive is also true: Evidence against Two-Reps is also evidence against Two-Systems.
Notice, however, that Two-Systems is stronger than Two-Reps. The former requires, while the latter does not, that the ventral and dorsal streams constitute relatively separate processing units for distinct tasks. The latter, by contrast, is consistent with a unified processing unit that generates distinct representations for different downstream tasks. However, Two-Reps is, as highlighted in the preceding text, fairly surprising. And if, as Milner and Goodale contend, the conflicting task demands of perception and action require distinct kinds of processing, then distinct representations for these distinct tasks are best supported by distinct systems. It follows that Two-Reps is more probable if Two-Systems is true than if One-System is true. Thus, evidence for Two-Reps moderately supports Two-Systems.
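The probabilistic step at the end of the preceding paragraph can be made explicit with Bayes’ theorem. In the sketch below, S abbreviates Two-Systems and R abbreviates Two-Reps; these symbols are introduced here only for convenience. The sketch assumes that One-System is simply the negation of Two-Systems, that 0 < P(S) < 1 and P(R) > 0, and, as a simplification, that R itself is what is learned. Under those assumptions, R raises the probability of S exactly when R is more probable given S than given not-S:

```latex
% S = Two-Systems, R = Two-Reps; One-System is treated as \neg S (an assumption).
\begin{align*}
P(S \mid R) &= \frac{P(R \mid S)\,P(S)}{P(R \mid S)\,P(S) + P(R \mid \neg S)\,\bigl(1 - P(S)\bigr)} \\[4pt]
P(S \mid R) > P(S)
  &\iff P(R \mid S) > P(R \mid S)\,P(S) + P(R \mid \neg S)\,\bigl(1 - P(S)\bigr) \\
  &\iff P(R \mid S)\,\bigl(1 - P(S)\bigr) > P(R \mid \neg S)\,\bigl(1 - P(S)\bigr) \\
  &\iff P(R \mid S) > P(R \mid \neg S) \\[6pt]
\frac{P(S \mid R)}{P(\neg S \mid R)}
  &= \frac{P(R \mid S)}{P(R \mid \neg S)} \cdot \frac{P(S)}{P(\neg S)}
\end{align*}
```

The odds form in the last line indicates why the support is only moderate: the posterior odds on Two-Systems equal the prior odds multiplied by the likelihood ratio P(R | S)/P(R | ¬S), which the argument claims exceeds 1 but gives no reason to think is large.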
Moreover, Milner and Goodale argue that the division between streams itself serves a useful function by allowing the ventral and dorsal streams to realize the ideal processing features specific to perception and action, respectively. Some of these processing features (such as spatial format) will persist in the relevant representations. Therefore, while Two-Reps does not entail Two-Systems, we can examine features of the separate representations for perception and action to draw inferences about the kinds of processes that generate those representations and thus whether distinct systems likely generated those representations. In this way, Two-Reps can serve as a broader evidential basis for Two-Systems. This complex relationship between Two-Reps and Two-Systems will prove crucial, especially when we return to the evidential support for Two-Systems in section 6. However, I will set Two-Systems aside for the next three sections to focus on whether the psychophysical evidence supports Two-Reps.
3. The psychophysical evidence: Two lessons
Numerous studies have indicated that visually guided behaviors, such as reaching, pointing, or grasping, are less susceptible to certain visual illusions than visual perception.¹
For example, the visual system’s tendency to determine absolute size by comparing an object with its surround gives rise to the Ebbinghaus illusion (Fig. 1), where a circle surrounded by relatively small circles appears larger than an identical circle surrounded by relatively large circles. Similarly, a modified version of the illusion can make circles of different sizes appear to be the same size (Fig. 2). Aglioti et al. (1995) presented subjects with 3D versions of the original and modified Ebbinghaus illusions (Fig. 3). They instructed subjects to pick up the right center circle if they believed the center circles to be of equal size and the left center circle if they believed the center circles to be of different sizes (the order of left and right was counterbalanced across blocks). They then measured the maximum grip aperture of the fingers as the subject approached the center disc. Surprisingly, they found that, holding the actual size of the center circle constant, the maximum grip aperture was not significantly different between the trials where the center circles appeared to be of identical size and those where the center circles appeared to be different sizes. That is, the Ebbinghaus illusion appeared to have minimal influence on the guidance of the subject’s grip as they reached for the circle. This suggests that the representation driving this behavior is less sensitive to this comparative size effect than conscious experience is. Because the Ebbinghaus illusion heavily influences conscious experience, the fact that the representations driving motor guidance are less sensitive to it provides evidence that the two are not identical.
The preceding study establishes that online motor guidance is relatively insensitive to the Ebbinghaus illusion, in both its original and modified forms, with respect to the size of objects during grasping tasks. Króliczak et al. (2006) establish that this effect extends to the hollow face illusion with respect to the distance of objects during a “flicking” task. In the hollow face illusion, a concave mask of a face is perceived as a normal convex face (Fig. 4). Crucially, therefore, the hollow face generates illusory representations of the distance of the face’s surface. Króliczak et al. presented subjects with either a normal (convex) face, an illusory (concave) hollow face, or a concave mask lit to reveal its concavity (and thus not generating the hollow face illusion). The task was either to judge the location of a magnet on the mask (by drawing its location) or to reach out and quickly flick the magnet off the mask. To prevent changes to the subjects’ perceptual information during the flicking task, the subjects wore goggles that blacked out immediately upon initiation of the reaching movement. In the trials in which subjects were presented with the illusory mask, subjects’ judgments of the location of the target reflected the illusory distance: Their judgments were similar to their judgments on a normal (convex) face. In the flicking task, however, subjects reached as far when presented with the illusory hollow face as when presented with the nonillusory concave face. That is, their flicking behavior reflected the actual distance of the surface of the mask, not the illusory distance. Again, this held despite a persistent illusory effect in visual experience (as reflected in the judgment task).
Whereas Aglioti et al. (1995) established differences in the representation of the size of distal objects, and Króliczak et al. (2006) established differences in the representation of the distance of distal objects, Bridgeman et al. (1997) established differences in the representation of the locations of distal objects. They exploited yet another context effect in visual perception: When a target is placed within a larger frame, and the center of the frame is not aligned with the observer’s midline, the perceived location of the target is shifted in the direction opposite to the offset of the frame’s center (see Fig. 5). Across four experiments, Bridgeman et al. asked whether this “Roelofs effect” would influence perception and action equally. Subjects were first trained to identify five locations on a display either by pointing to those locations or by pressing a key from 1 to 5. In the experimental trials, subjects saw an object at one of the five locations paired with a frame at one of three locations (center, right, or left) for one second. They then saw a brief display indicating the appropriate response format (pointing or judging), after which they registered their response. Although the location of the frame had a significant impact on judged location (consistent with the Roelofs effect), frame position had comparatively little impact on pointing location, suggesting that guidance representations are relatively insensitive to the Roelofs effect.
I have summarized here just a few of many experiments showing that action systems are generally less sensitive to visual illusions than is conscious perception. Similar results have been obtained regarding the size of distal objects in the context of the Ponzo illusion (Brenner and Smeets 1996; Ellis et al. 1999; Jackson and Shaw 2000; Westwood et al. 2000; Whitwell et al. 2016; see also Gonzalez et al. 2008), crowding effects (Chen et al. 2015), and apparent violations of Weber’s law (Heath et al. 2017); regarding the length of distal lines in the context of the Müller-Lyer illusion (e.g., Dewar and Carey 2006; for a critical review, see Bruno and Franz 2009) and the Sander parallelogram illusion (Whitwell et al. 2018); regarding the orientation of distal objects when perturbed behind a backward mask (Chen and Saunders 2016); regarding the location of a distal object when perturbed during a saccade (Hansen 1979; Hansen and Skavenski 1977; Bridgeman et al. 1979; Wong and Mack 1981); and regarding location despite a perceived change in velocity due to perturbation of the background (Brenner and Smeets 1994; Smeets and Brenner 1995), to name just a few. These effects have also been established for a wide variety of tasks, including various grasping tasks (Brenner and Smeets 1996; Ellis et al. 1999; Jackson and Shaw 2000; Westwood et al. 2000; Dewar and Carey 2006; Gonzalez et al. 2008; Chen et al. 2015; Chen and Saunders 2016; Heath et al. 2017; Whitwell et al. 2018) and pointing tasks (Hansen 1979; Hansen and Skavenski 1977; Bridgeman et al. 1979; Wong and Mack 1981; Brenner and Smeets 1994; Smeets and Brenner 1995).
Two lessons from the illusions literature will be important in what follows. First, the studies employ a variety of behavioral tasks and illusions to establish a relative insensitivity of behavior to those illusions. I have highlighted this variation previously: The Ebbinghaus illusion substantially influences the perceived size of distal objects but not grasping behavior toward those objects; the hollow face illusion (in the context of the Króliczak et al. experiments) substantially influences the perceived distance of distal objects but not flicking behavior toward those objects; and the Roelofs effect substantially influences the perceived egocentric location of distal objects but not pointing behavior toward those objects. Crucially, due to the variety of behavioral tasks and illusions employed, there is no fixed relationship between the illusory property and the correct distal property, nor between corresponding inaccurate and accurate behaviors. As we will see, this variety raises problems for attempts to explain such effects away, since doing so requires many different kinds of counterexplanations for the many different kinds of effects established in the illusions literature.
The second lesson pertains to the types of visual illusions that affect action and the extent of their influence. It is important to note that the evidence does not suggest that visual illusions do not affect action. Instead, the evidence indicates that visual illusions impact action significantly less than perception. This point is often misconstrued in the literature, with some proponents of the perception/action model providing misleading descriptions of the evidence (e.g., Clark 2001) and some critics of the perception/action model incorrectly suggesting that evidence of influence on action is novel (e.g., Schenk and McIntosh 2010). Moreover, it is sometimes argued that evidence for some influence on action undermines the support that illusion studies provide for Two-Reps (see, e.g., Briscoe 2008; Schenk et al. 2011; Briscoe and Schwenkler 2015; Ferretti 2016, 2021). However, the argument for Two-Reps from the illusion studies relies only on the substantial difference in responsiveness to the illusions, not on the complete absence of illusory effects in action (this point has also been made by Wu 2013 and Kozuch 2022, as well as Mole 2009, who is otherwise critical of the perception/action model).
Once we acknowledge that the issue is not whether illusions affect action but to what degree, we can examine which illusions have the most impact on action and which have the least. Defenders of the perception/action model have persuasively argued that the illusions that most impact action appear to be those represented in early visual areas such as V1 (Milner and Dyde 2003; see Kozuch 2022 for a recent summary). If so, it is unsurprising that such illusions impact action, as V1 is a common input to the ventral and dorsal streams. By contrast, illusions that arise later in visual processing have significantly less impact on action. Thus, the second lesson from the two streams literature is that visual illusions affect action as a function of their presence in early visual areas.
4. Objections from experimental design and generality
In this section and the next, I aim to show that the psychophysical evidence strongly supports Two-Reps over One-Rep. In this section, I address the more familiar objections, which concern deficiencies in the experiments and the generalizability of their results.
4.1. Experimental deficiencies
As outlined in the preceding text, the literature on psychophysical differences between perception and action is both large and growing. There are, of course, numerous concerns about the validity of individual experiments. I cannot address even a small portion of the purported deficiencies discussed in the literature. The size of the literature, however, is a source of strength for the defender of the perception/action model, as even if one grants that a particular study or set of studies is confounded, this will only succeed in removing from consideration one of a large number of established effects. As discussed in the next section, strong support for Two-Reps requires just a single replicable result. Thus, opponents who seek to undermine this support have their work cut out for them.
I will discuss three sorts of confounding differences between the action and perception tasks that have the prospects of impacting a wide swath of experiments: differences in time, differences in attention, and differences in available perceptual information. I discuss these three because, besides potentially impacting many experiments, these confounds are very similar and often discussed in the same breath. However, it is possible to control for them differentially. That is, an experimental paradigm that controls for differences in perceptual information, for example, does not necessarily control for differences in attention. Thus, distinguishing between them allows us to identify experimental paradigms that might control for all three.
I will first clarify the three differences by returning to the Aglioti et al. (1995) experiment. In that experiment, subjects first judged whether the Titchener arrays’ center circles were equal sized and then registered their response by grabbing either the left or right center circle. The dependent variable for the latter was the maximum grip aperture as the hand reached for the target. As a reminder, it was found that while the Ebbinghaus illusion substantially influenced the judgment task, determining which circle was reached for, the maximum grip aperture was relatively insensitive to this effect.
As my description of the experiment makes clear, the perception task (judging whether the center circles were of equal size) was always performed before the action task (grasping the center circle). Could this difference in time between the perception and action tasks help to explain the improved size estimate during the latter? Bruno and Franz (2009) have argued precisely this in the case of the Müller-Lyer illusion: that rapid visual processing furnishes action with an improved estimate during the approach. We know this, they argue, because the advantage for action disappears when we remove online visual feedback during the movement (such as by having the subjects wear blinding goggles, as in the Króliczak et al. [2006] experiment discussed previously). This criticism suggests a possible fix: We can control for the influence of time differences on the experimental results by eliminating the potential for online visual feedback during the action task. This is now common practice, with experimenters using so-called open-loop tasks to control for this confound (Bridgeman et al. 1997; Haffenden and Goodale 1998; Króliczak et al. 2006; Bruno et al. 2008; Bruno and Franz 2009; Franz et al. 2009; Whitwell et al. 2018).
A second problem with the Aglioti et al. experiment is a potential difference in the allocation of attention between the perception and action tasks, a task-demand effect. In the perception task, subjects examine both center circles in context to determine which appears larger. In the action task, by contrast, subjects need only determine the graspable size of the target center circle. Among other differences, the former task, but not the latter, requires that the subject attend to both center circles. But it has long been known that the Ebbinghaus effect is most pronounced when one does so. As has been emphasized by Briscoe (2009), the Aglioti et al. results do not generalize to similar experiments using just a single array (Goodale and Milner 2004). Thus, evidence suggests that how subjects allocate attention based on particular task demands influences the results.
Notice that while differences in time and differences in attention are similar in the Aglioti et al. experiment, they are not the same confound. Imagine a version of the experiment in which subjects were told before each trial whether to perform the judgment or reaching task but wore blinding goggles to prevent visual feedback during the reach (as in Króliczak et al. 2006). The latter control would eliminate the time difference between perception and action but would not eliminate the attentional difference. If subjects know whether to perform a judgment or grasping task before the presentation of the arrays, then they can allocate their attention to maximize performance on those tasks.
How, then, can we control for the attention confound? One method is to use a paradigm similar to Bridgeman et al.’s (1997), discussed previously. In that experiment, subjects first saw the target display (a circle with a rectangular surround) for 1 s. Only after the display disappeared were the subjects told whether the trial was a perception (keypad) or action (pointing) trial. This setup thus eliminates the possibility of a task-based difference in attention during target presentation. The setup is also open-loop and thus eliminates the time-difference confound.
A third possible difference between the perception and action tasks is the perceptual information available. For example, could subjects in the Aglioti et al. experiment use tactile or proprioceptive information to refine their grip aperture? This confound is similar to the time difference confound but relates to the kind of perceptual information available. Even if subjects are presented with the same visual information at the same time (as in the Bridgeman et al. study), subjects may have additional perceptual information during the approach.
I want to make two remarks about the perceptual information confound. First, the confound appears to require different kinds of controls from either of the first two. If subjects could be using tactile and proprioceptive information to adjust their grips in the Aglioti et al. experiments, for example, then blinding goggles will not necessarily address this problem. Second, because the kinds of action tasks differ widely from experiment to experiment, as does the potential for different kinds of perceptual information to influence these actions, controlling for this confound is necessarily paradigm specific, with no apparent general strategy. For example, Króliczak et al. (2006) asked subjects to point a fixed distance below the target mask to avoid tactile feedback from the mask, addressing a specific concern for this paradigm in a way that is often unavailable for others. I clearly cannot address all possible concerns of perceptual contamination in the illusion experiments. I can only note that this problem is widely recognized and addressed in the literature (Whitwell et al. 2018). The best experimental paradigms control (as best they can) for such effects. I conclude that there is no systematic failure to address this concern.
In summary, the illusions literature is vast, and so is the potential for confounds in individual studies. Three relatively general confounds (time, attention, and perceptual information) are widely recognized and, as far as possible, controlled for in recent studies. I conclude that, although we should be cautious in making broad claims about the strength of the extant psychophysical evidence, there are no general reasons to doubt the literature as a whole. Thus, there is hope that some presently established effects will continue to hold. In the next subsection, I argue that this more modest conclusion is enough to strongly support Two-Reps.
4.2. Generality
The second kind of criticism of the illusion experiments concerns their generality. Numerous studies now establish that some visual illusions do impact action. If so, the objection goes, then any differential influence on perception and action must itself be illusory.
This general argument has taken at least three forms. First, and most directly, some of the original experiments have simply failed to replicate, suggesting the original results may have been a mere aberration. Second, some have pressed that, even in the experiments cited as supporting the perception/action model, visual illusions have a recognizable impact on action. Third, some critics have run “conceptual replications” of the original experiments, replicating the general structure of the illusion experiments with new kinds of illusions, and found a similar influence of those illusions on perception and action.
Consider first the problem of replication. An illusion experiment can only support the perception/action model if it survives replication. This is not a novel problem for the two-streams literature, and there is nothing uniquely damning about a failed replication in this literature. Many of the illusion experiments have continued to replicate over time (such as those employing the Ponzo illusion, intersaccadic perturbations, and conflicts between perceived movement and location; see section 3 for references). We should focus our attention on these examples.
However, the second and third versions of the generality objection are specific to this literature. To address the second objection, we must ask, “Exactly how different should perception and action be in their sensitivity to visual illusions to support the perception/action model?” To address the third, we must ask, “What kinds of illusions should action be less sensitive to, and in what sorts of situations?”
The answers to these questions depend on the hypothesis one aims to confirm or disconfirm. As noted in the preceding text, our primary concern is whether the psychophysical evidence supports Two-Reps. To reiterate, the argument for Two-Reps from the illusions studies is as follows:
1. If a perception task and action task are differentially sensitive to a visual illusion in the same perceptual circumstances, the representations on which task performance relies must represent different properties (or values of properties).
2. Assuming these representations are coherent, it follows that the representations driving the perception task and action task are not the same.
3. Thus, Two-Reps is true.
Notice that this argument does not rely on any claims about the size of the difference in representations driving perception and action tasks. It merely relies on the representations being different. Thus, the answer to the first question is that any difference between perception and action in their sensitivity to visual illusions is sufficient to support Two-Reps. Similarly, the argument does not rely on particular claims about which illusions should be differentially represented or how large the set of such illusions needs to be. Any difference in content is sufficient to support Two-Reps. Thus, the answer to our second question is that any kind of illusion, in any situation, is enough to support Two-Reps. In short, the argument for Two-Reps from the illusion experiments requires a replicable result (and thus the first kind of generality applies). However, that result does not need to be of any particular size or to generalize across any particular range of illusions or task situations. Because at least some of the psychophysical results replicate, generality poses no major concern, and the existing literature continues to support Two-Reps.
I have not addressed whether the psychophysical evidence has the generality problems that some opponents have claimed. I take this issue to be live in the present debate. Instead, I have argued that Two-Reps is supported by the evidence regardless of the size or generality of the results. To support Two-Reps, what matters is that the results are replicable. Because some illusion experiments are replicable, Two-Reps remains supported by the extant literature.
5. Grünbaum’s Argument
A third criticism takes a different form. Thor Grünbaum (2017, 2021) has argued that, setting aside both the experimental deficiencies and the generality concerns of the illusion experiments, the extant results underdetermine the choice between the perception/action model and the more traditional model that posits a single representational vehicle shared by perception and action (One-Rep). Grünbaum’s argument is surprising, as it is widely assumed that successful, sufficiently generalizable results from the illusion experiments would support the perception/action model. It is also important because it advances an in-principle argument against the illusion experiments’ support: Grünbaum’s concern is not with any particular experimental results but with the basic motivation for reasoning from any behavioral results to the perception/action model generally or to Two-Reps in particular. Thus, this argument demands our attention here.
To understand Grünbaum’s argument, notice that the evidence from illusion studies is purely behavioral: the finding is that there are systematic differences between performance on perception tasks and performance on action tasks that reflect differences in their sensitivity to visual illusions. We infer from differences in task performance to differences in the features represented by the systems driving that performance. Drawing on arguments from Anderson (1978), Grünbaum argues that such inferences ignore a crucial determinant of task performance: the “transformational processes” that determine concrete behaviors based on representations of the distal environment.
To see what he means, consider again the study by Aglioti et al. (1995). They found that although subjects’ experiences were susceptible to contextual effects when judging the size of the center circles in Titchener arrays, their grasping behavior was relatively insensitive to those effects. In keeping with Two-Reps, one explanation of these results is that the representation behind perceptual judgment is illusory while the representation driving grasping behavior is not (or substantially less so). However, another interpretation, consistent with One-Rep, is that both perceptual judgment and grasping rely on a common set of representations but that the mechanism that determines grip aperture from those representations substantially attenuates the influence of the illusory size representation.
To understand the full import of this argument, it is important to distinguish between two ways a “transformational process” might achieve this attenuation. First, the transformational process might draw on additional perceptual information when refining grip aperture. As Grünbaum suggests, “it could be argued that grasping employs the same visual representation as judging but that, in addition to the visual representation of object size, grasping also uses other types of information (e.g., visual information about the hand and haptic information from touch) not available in judgement tasks” (2017, 429). This is possible and, as we have seen, presents a challenge for interpreting particular experimental results. But it does not amount to an in-principle objection to drawing on behavioral evidence in support of Two-Reps. As seen in section 4.1, more refined studies have attempted to identify and eliminate these potential confounds. No argument has been given that such confounds are in principle ineliminable.
Second, however, transformational processes may employ the same visual information (and no further perceptual information) in a different way, allowing for “corrections” to the illusory representation. One thought regarding the results of Aglioti et al., for example, is that action systems use information about the edges of objects rather than their size when determining grip apertures (Kopiske et al. 2016). If edge representations were relatively accurate, despite size representations being inaccurate, this could explain the relative insensitivity of grip apertures to illusions involving object size.
Notice that this argument is general: any difference in behavior can be explained either by a difference in representation (supporting Two-Reps) or by some difference in the transformational mechanism from that representation to concrete behavior (supporting One-Rep). Thus, Grünbaum argues that the behavioral evidence underdetermines the choice between One-Rep and Two-Reps. If so, then that evidence cannot support either position.
To see what is wrong with this argument, notice that while the structure of the argument is general, the alternative explanation from transformational processes is necessarily particular. For example, when explaining the Aglioti et al. results, we might posit a transformational process that relies only on representations of object edges, rather than object size. However, we will need a different sort of transformational process to explain flicking behavior in Króliczak et al.’s (2006) hollow-face illusion experiments (which concerned target distance rather than size) and yet another to explain pointing behavior in Bridgeman et al.’s (1997) apparent motion experiments (which concerned the target’s egocentric location). Grünbaum’s proposed alternative requires a different kind of explanation for (at least) every combination of illusory distal representation and task. As reviewed in section 3, this includes various illusory representations of size, length, orientation, and location (each of which may need to be further broken down according to the transformational mechanisms required to adjust for different visual illusions), and a variety of grasping and pointing tasks. And these are only a sample of the known effects; Grünbaum’s explanation requires a distinct transformational process for each effect, known and unknown. Especially given our limited ability to measure action tasks precisely (hence the current near-restriction to pointing and grasping tasks), the unknown effects likely far outnumber the known ones.
This reveals substantial hidden complexity in Grünbaum’s apparently simple solution from transformational processes: if each illusion-type/task pair requires a different kind of transformational solution, then the result is a highly complex solution to what is, according to Two-Reps, a relatively simple problem. In short, the One-Rep account, so elaborated, has a much lower prior probability than Two-Reps. If so, then the explanation from Two-Reps remains much stronger and is thus better supported by the first lesson of the illusion experiments described above.
This general challenge is made even more pressing by the second lesson from the illusion experiments. As discussed previously, the simple model on which guidance representations are never illusory to any degree is demonstrably false: Most visual illusions impact action to some extent (but often less than they influence perceptual judgment), and different illusions impact action to different degrees. As we saw in section 4.2, the first fact supports Two-Reps because what matters is that the extent of the influence from a given illusion is different, not that guidance representations are immune. Moreover, the second fact supports the perception/action model because that model predicts that any visual illusion that is primarily processed before the anatomical divide (and thus is part of a common input to the ventral and dorsal streams) should impact the dorsal stream (and thus behavior) to a larger extent than illusions that are processed primarily after the anatomical divide in the ventral stream. While the current evidence is limited, this is precisely what has been found (Milner and Dyde 2003; Kozuch 2022).
The One-Rep account, by contrast, does not explain why illusions processed before the anatomical divide should impact action more than those processed after it. For one, that finding is, on its face, inconsistent with the One-Rep account, which posits a single representation shared by perception and action, whereas the cited evidence suggests that some illusory representations arise only after that divide. Moreover, any proposed explanation of these findings would again add complexity to the One-Rep account because it would require that we posit not just a transformational solution for every visual illusion but a transformational solution for exactly the right illusions and to exactly the right degree. Unlike the perception/action model, however, the explanation from One-Rep has no principled reason to distinguish between illusions generated before and after the anatomical divide and thus no principled reason to posit more effective transformational processes for the latter but not the former.
In short, Grünbaum’s apparently simple solution hides significant complexity, which weighs against his explanation. Because Two-Reps and the perception/action model, by contrast, easily explain the illusion evidence, that evidence continues to support Two-Reps over One-Rep. Moreover, as the preceding discussion suggests, the complexity required to salvage the One-Rep account also makes that account extremely ad hoc. No prior or principled considerations built into the One-Rep account suggest these solutions, while the solution from Two-Reps and the perception/action model is both prior and principled.
The argument I have just given points to a familiar issue in the philosophy of science: the relative strength of two theories is determined by more than the set of facts those theories explain. Thus, it does not follow from the fact that two theories are predictively equivalent with respect to a body of evidence that they are equally supported by that evidence. We have thus identified an in-principle objection to Grünbaum’s in-principle argument against illusion studies, and thus to the argument from Anderson (1978) on which he relies.
6. The Psychophysical Evidence and Two-Systems
The last two sections argued that the extant psychophysical evidence strongly supports Two-Reps. In this section, I consider what this evidence shows regarding Two-Systems.
Recall from section 2 that, on Milner and Goodale’s interpretation, Two-Systems entails Two-Reps. Because One-System, by contrast, is consistent with both One-Rep and Two-Reps, Two-Reps only moderately supports Two-Systems. Thus, because the psychophysical evidence strongly supports Two-Reps, that evidence moderately supports Two-Systems. This conclusion, however, is far weaker than the standard position in the literature, which typically regards psychophysical evidence as among the strongest evidence for Two-Systems. Is there a route to stronger support for Two-Systems from the illusion experiments?
One route could rely on what Grünbaum has called “the received view” of mechanistic individuation according to which “the question of whether we are dealing with one or two computational mechanisms can only be answered in terms of representational content” (Grünbaum 2017, 421). Grünbaum grants that this notion is not entirely clear, but suggests that “as a rule of thumb, we can take it to involve the idea that differences in type of representational input, type of representational output, or both, imply different computational mechanisms” (ibid.).
Like Grünbaum, I am not sure what this view amounts to. Grünbaum’s argument relies only on the claim, which I have endorsed in the preceding text, that because Two-Systems entails Two-Reps, One-Rep is inconsistent with Two-Systems. But assuming that a difference in computational mechanism entails a difference in system, the preceding quote suggests the stronger claim that Two-Reps entails Two-Systems. In support of this “received view,” Grünbaum cites Sprevak’s (2010) view that “computation must involve representational content,” a view that he suggests may be necessary to avoid pan-computationalism, as nonrepresentational accounts of computation appear to be realizable by “walls and rocks” (Grünbaum 2017, 422; citing Searle 1990, 1992). But the view that computation essentially involves representations does not entail that computational mechanisms are differentiated by representational contents. It seems possible, for example, that multiple mechanisms might compute the same function or, crucially, that a single mechanism could compute multiple functions.
One way to understand Grünbaum’s “received view” is as a claim, not about “mechanisms” as commonly discussed in the philosophy of neuroscience (e.g., Machamer et al. 2000), but more narrowly about the physical realizers of a single computation. On this reading, and contrary to the counterexample I suggested above, such mechanisms by definition do not perform multiple computations. The problem is that visual streams, as conceived of by Milner and Goodale, are certainly not mechanisms so conceived, as they carry out a variety of computations, producing different representations for different kinds of tasks. Thus, even if Two-Reps entails the existence of two “computational mechanisms,” the latter does not entail Two-Systems. In short, no plausible version of this “received view” can support an entailment from Two-Reps to Two-Systems.
In the last section, I suggested that Two-Reps better explains the psychophysical evidence than One-Rep. Could Two-Systems similarly do a better job of explaining this evidence than One-System? For that to be true, the evidence would need to exhibit features (beyond those that confirm Two-Reps) that are more readily explained by two independent systems than by a single, unified system. In section 3, I noted that, according to Milner and Goodale’s conception, the ventral and dorsal streams employ different processes to generate representations for perception and action, respectively. Thus, if the psychophysical evidence is best explained by such differential processing, then this would support Two-Systems.
Consider, for example, the claim that the ventral stream (and thus perceptual representations) operates over an allocentric spatial format while the dorsal stream (and thus action-guiding representations) operates over an egocentric spatial format (see footnote 2). Might the psychophysical evidence support the existence of processes emphasizing these distinct spatial formats? This would be the case if behavior in perception tasks were more sensitive to illusions emphasized in an allocentric spatial format, while behavior in action tasks were more sensitive to illusions emphasized in an egocentric spatial format.
I will note three barriers to this kind of support for Two-Systems from the psychophysical evidence. First, this inference requires a clear list of illusions more prominently represented in egocentric than in allocentric formats (and vice versa). However, I expect any such list to be quite controversial. To generate the requisite list, we would need to settle contested issues about the provenance of the relevant illusions. This will require more work.
Second, the inference requires that we compare sensitivity to different illusions. That is, we need to show, for example, that perceptual representations are relatively more sensitive to illusions presented allocentrically than to those presented egocentrically. However, it is unclear how these comparative judgments should work across the wide range of illusions discussed. For example, which involves more sensitivity to the relevant illusion: representing a center circle as 20 percent larger than it is in a Titchener array (the Ebbinghaus illusion) or representing a target circle as 10 degrees farther from fixation when a left-aligned frame is provided (the Roelofs effect)? It is simply not clear.
Finally, the inference requires substantial generality in the psychophysical results. I noted in section 4.2 that the inference from the psychophysical evidence to Two-Reps requires (in principle) just a single replicable result. The present inference, however, requires that sensitivity to different illusions reliably tracks the egocentric/allocentric divide. Because the generality of the present results remains in question, so does the corresponding support for Two-Systems.
To conclude this section, I note that the preceding involves substantial speculation about how the evidence could go. The kind of inference just gestured at is not widely discussed in the extant literature (though, for an attempt of this kind, see Goodale and Ganel 2015). This is because Two-Reps and Two-Systems are rarely clearly distinguished, and thus researchers have taken strong support for the former to entail strong support for the latter. Thus, one lesson from the arguments presented in this article is that we must be more careful about which of these theses is at issue when considering the evidential support provided by the present experiments.
7. Conclusion
I have argued that the extant psychophysical evidence strongly supports Two-Reps. Existing experimental paradigms allow us to control for confounds of time, attention, and additional perceptual information while confirming differences in the responsiveness to visual illusions between perception and action tasks. A relatively small number of replicable results is sufficient to establish Two-Reps. While an alternative explanation from One-Rep is available, accounting for the evidence in this way requires substantial hidden complexity, which lowers its prior probability. Thus, Two-Reps continues to be supported by the psychophysical evidence.
Two-Systems, by contrast, is only moderately supported by the psychophysical evidence. Because Two-Systems entails Two-Reps, and because Two-Reps is relatively surprising, strong support for Two-Reps moderately supports Two-Systems. Independent support for Two-Systems, however, would require substantially more generality in the experimental results than is currently established, and there are multiple barriers to using the extant results to confirm particular predictions of Two-Systems (beyond Two-Reps). Thus, the current psychophysical evidence does not offer independent support for Two-Systems.
However, this is not to say that Two-Systems remains unsupported by the extant empirical evidence. The evidence for the perception/action model is far-ranging, including the psychophysical evidence, lesion studies, and computational analyses of imaging data. Whether the latter kinds of evidence support Two-Systems is beyond this article’s scope. My claim is that the psychophysical evidence provides only moderate support for Two-Systems and that defenders of the latter should look elsewhere for stronger support.
Acknowledgments
I would like to thank David Barack, Laurenz Casser, Sam Clarke, Andrea Rivadulla Duró, Alex Kerr, and two anonymous reviewers for their invaluable written comments. I am also grateful to the European Society of Philosophy and Psychology, the Antwerp Centre for Philosophical Psychology, and the Institute of Philosophy at the School of Advanced Study, University of London for providing me with opportunities to present and discuss this work.