Introduction
Our seemingly effortless ability to perceive a world in which all the different visual attributes are in apparently precise temporal and spatial registration belies a complex cortical machinery, which decomposes the visual image into constituents such as form, color and motion, and processes them in separate and specialized visual areas. The evidence for this functional specialization in the primate visual brain comes from anatomical, electrophysiological (Zeki, 1978; DeYoe & van Essen, 1988; Livingstone & Hubel, 1988; Zeki & Shipp, 1988), and human imaging and clinical studies (Meadows, 1974; Zeki, 1990, 1991; Zeki et al., 1991; Zihl et al., 1991). This functional specialization has, moreover, temporal consequences, since we perceive different attributes at different times, color taking temporal precedence over orientation, and orientation over motion (Moutoussis & Zeki, 1997a, b; Zeki & Moutoussis, 1997; Barbur et al., 1998; Arnold et al., 2001).
Of all the visual attributes, perhaps the easiest to separate both physiologically and perceptually are color and motion, color being associated with activity of the V4 complex and motion with activity of a separate system, based primarily on area V5 (Zeki, 1978; Livingstone & Hubel, 1988; Zeki et al., 1991). The evidence in favor of the separation of motion and color also comes from psychophysical experiments, which show that motion detection is impaired under conditions of equiluminance (Ramachandran & Gregory, 1978; Cavanagh et al., 1984), indicating that the motion system, although sensitive to chromatic signals, does not contain neurons tuned to specific hues (Gouras & Kruger, 1979; Dobkins & Albright, 1994). Additional psychophysical evidence is consistent with functional specialization for other visual dimensions (Krumhansl, 1984; Livingstone & Hubel, 1987; Theeuwes, 1992; Hong & Shevell, 2006; Hong & Blake, 2009).
In the study reported here, we investigate functional specialization psychophysically using a visual masking paradigm, by examining the strength of interference between two perceptual signals, arising either from the same visual attribute (homogeneous target-mask pairs) or from different ones (heterogeneous target-mask pairs). Masking refers to the impaired detectability of a target stimulus when immediately preceded or succeeded by a task-irrelevant visual input, referred to as the mask (Breitmeyer & Ogmen, 2006). Visual temporal masking has been reported in both the motion (Braddick, 1973; Ferrera & Wilson, 1987) and the color domain (Schmidt, 2002; Breitmeyer et al., 2004) but not across the two. Moreover, although masking of a target color with a color mask has been reported in two studies (Schmidt, 2002; Breitmeyer et al., 2004), both employed a metacontrast masking technique, in which the target and mask regions did not overlap spatially. Because this type of masking has been hypothesized to rely on a form of “motion deblurring” (Ansorge et al., 2007) rather than direct interference between target and mask signals, we chose to use the simplified backwards masking technique, in which the target and the mask overlap in space. This alone would enable us to draw conclusions regarding functional specialization.
In our study, we manipulated the relationship between the target and mask, such that the target-mask pairing was either homogeneous (e.g., color target and color mask) or heterogeneous (e.g., color target and motion mask). If regions or cells in the visual system are nonspecialized and respond to multiple visual features (integrated representations), mask strength should remain constant across conditions (Fig. 1, Panel C). If cortical representations are exclusively integrated, it should be impossible to selectively mask one feature (e.g., color) while sparing the other (e.g., motion). This would not be true if the demonstrated functional specialization in the cortex is perceptually potent, that is, if signals from target and mask are processed in separate cortical sites or by different cells; in that case, competition or interference would take place over a different time course and would likely be weaker (Fig. 1, Panel B).
Our study is divided into three experiments. In the first, we report the effect of homogeneous and heterogeneous target-mask pairs at both short and long stimulus onset asynchronies (SOAs); functional specialization predicts weaker masking in the case of heterogeneous pairs. In the second experiment, we investigate the time course of homogeneous pair masking in more detail, with the aim of exposing perceptual asynchronies between the visual features of color and motion. In the third experiment, we test the prediction that heterogeneous masking only occurs when the mask is given sufficient processing time (i.e., when the mask occurs prior to the target).
Our results constitute a psychophysical demonstration of functional specialization for the processing of color and motion in the human visual system.
Experiment 1: Feature-selective masking
Method
Apparatus
For all experiments, stimuli were displayed on a Sony Trinitron Multi-scan E450 monitor (refresh rate of 140 Hz; Sony, Tokyo, Japan) and generated using the Cogent toolbox for MATLAB on a Windows XP machine.
Stimuli and Procedure
The target stimulus contained both color and motion, while the mask featured only a single attribute (Footnote 1). Stimuli were presented on a gray background (6.9 cd/m²). The target was a fast-moving (145 deg s⁻¹; left or right) colored circle (Fig. 2). It was presented for 35 ms and covered a region of 5.1 deg. Two types of mask were tested: a color mask, which consisted of a uniformly colored bar (10.2 × 5.1 deg; 200 ms duration; Fig. 3A), and a motion mask, generated from the horizontal cyclic left–right motion of two fast-moving white circles (Fig. 3B), covering the target region. The target colors were green and yellow, while the mask colors were red and blue (Footnote 2). Therefore, the target and mask colors could be either opponent or nonopponent pairs. Fig. 2 shows the four target-mask color pairs.
In the first experiment, one short and one long SOA condition were tested (0–21 ms [Footnote 3] and 504 ms, respectively). The long SOA is useful in ruling out confounding factors that could account for poor discrimination performance, such as general task difficulty or response confusion arising from the integration of target/mask information. Eighty trials per SOA were tested for each subject.
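Because the display refreshes at 140 Hz, any stimulus duration or SOA can only be realised as a whole number of frames of roughly 7.14 ms each. The following is a minimal sketch of that arithmetic, not the Cogent code used in the study; the helper names `frames_for` and `realised_ms` are hypothetical and introduced only for illustration.

```python
# Illustrative only: frame quantisation at the reported 140 Hz refresh rate.
# frames_for and realised_ms are hypothetical helpers, not part of Cogent.

REFRESH_HZ = 140.0  # refresh rate reported in the Apparatus section


def frames_for(duration_ms: float) -> int:
    """Nearest whole number of video frames for a requested duration."""
    return int(round(duration_ms * REFRESH_HZ / 1000.0))


def realised_ms(n_frames: int) -> float:
    """Duration actually produced by showing a stimulus for n_frames frames."""
    return n_frames * 1000.0 / REFRESH_HZ


if __name__ == "__main__":
    # Nominal durations taken from the Stimuli description.
    for label, nominal in [("target", 35), ("color mask", 200), ("shortest SOA step", 7)]:
        n = frames_for(nominal)
        print(f"{label}: {nominal} ms nominal -> {n} frame(s), {realised_ms(n):.2f} ms realised")
```

Under this assumption, the nominal 35 ms target corresponds to 5 frames and the 200 ms color mask to 28 frames.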
Observers
Ten subjects (average age 29 years; seven females) were tested on the initial version containing two different SOAs. All had normal or corrected-to-normal vision.
Procedure
Observers were instructed to report either the color or the direction of motion of the target (in separate sessions) and to ignore all features of the mask. The experiment used a two-alternative forced-choice design and was performed in four sessions, run in a counterbalanced order. Each session was composed of blocks of 40 trials, with a break given after each. Observers completed a single practice block for each new task.
Results
Fig. 4A displays proportion-correct results for all conditions when a motion mask is used. At short SOAs, motion judgments are impaired (mean = 60%), but this is not true of color judgments (mean = 95%) where performance is at ceiling. A reversed pattern is shown in the complementary condition, employing a color mask (Fig. 4B).
Statistical comparison reveals a significant difference at short SOAs for both mask types [Motion mask: t(11) = 8.86, P < 0.001; Color mask: t(11) = 4.63, P < 0.001], thus demonstrating a feature-selective masking effect. Conversely, there is no significant difference in scores for the long SOA conditions [Motion mask: t(11) = 1.48, P = 0.166; Color mask: t(11) = 1.65, P = 0.13], for which masking was predicted to be minimal. Crucially, masking is not only significantly stronger within a visual dimension but is also weak or absent across dimensions. For judgments of color, the motion mask had little or no effect; performance remained at ceiling (95%). Similarly, for judgments of motion, the color mask appears relatively ineffectual, although performance in this condition drops slightly (<90%). Thus, for the display settings used in this experiment, it is possible to strongly mask one feature while having no effect on the other.
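The text does not state which form of t-test was used. Assuming a paired (within-subject) comparison of color and motion judgments, the statistic can be sketched as follows; the function name `paired_t` and the toy numbers in the demo are illustrative and are not data from the study.

```python
# Illustrative sketch of a paired-samples t statistic (assumed test type).
import math
from statistics import mean, stdev


def paired_t(a, b):
    """Return (t, df) for paired observations a[i], b[i] from the same subject."""
    if len(a) != len(b) or len(a) < 2:
        raise ValueError("need two equal-length samples with at least 2 subjects")
    diffs = [x - y for x, y in zip(a, b)]
    n = len(diffs)
    se = stdev(diffs) / math.sqrt(n)  # standard error of the mean difference
    if se == 0:
        raise ValueError("differences have zero variance; t is undefined")
    return mean(diffs) / se, n - 1


if __name__ == "__main__":
    # Arbitrary toy values, chosen only to exercise the function.
    t, df = paired_t([2, 4, 6], [1, 2, 3])
    print(f"t({df}) = {t:.3f}")
```

The degrees of freedom equal the number of subjects minus one, which is how a value such as t(11) would arise under this assumption.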
In a separate analysis of the color masking data, we segregated trials into those containing opponent and nonopponent color pairs. The results failed to show a greater masking effect for opponent color pairs, t(9) = 1.7, P = 0.13.
Experiment 2: Time course of the homogeneous masking effect
Feature selectivity of visual masking, as demonstrated in Experiment 1, lends clear support to the idea of segregated color and motion processing. Another method to investigate this separation is to examine differences in the masking time course. Previous studies, using a different paradigm, have argued for a faster color processing system than for motion, resulting in the generation of a color percept 70–80 ms before that of motion (Moutoussis & Zeki, 1997a). Can this perceptual asynchrony be revealed using a masking paradigm? More specifically, is detectability of color greater than that of motion, at the same SOA, for masks of equal strength? In this experiment, we measure color and motion detectability for homogeneous target-mask pairs, using a range of different target and mask intervals (SOAs).
Method
Observers
Nine subjects (mean age 26 years; five females) were tested. All had normal or corrected-to-normal vision.
Procedure
Using the within-dimension stimuli described in Experiment 1, 10 different SOAs, from 7 to 142 ms (step size ∼15 ms; Footnote 4), were tested (48 trials per SOA). In order for a meaningful comparison of color and motion time courses to be made, it was important to first establish a benchmark at which the color and motion masks were equally effective.
For each subject, using masked stimuli with a constant SOA of 21 ms, mask strength yielding 60% correct was established through the use of an adaptive staircase procedure. Mask strength was varied by increasing or decreasing the luminance of the mask. This was done for both types of homogeneous target-mask pairs (color–color and motion–motion). Each subject’s mask luminance values were then transferred to the main program measuring detectability at multiple SOAs.
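The staircase rule is not specified in the text. One common way to converge on an arbitrary target such as 60% correct is a weighted up-down staircase, in which the step taken after a correct response and the step taken after an error are set in the ratio (1 − p) : p. The sketch below assumes that rule, an abstract "mask strength" between 0 and 1, and a simulated observer with a made-up psychometric function; all names and parameter values are hypothetical.

```python
# Illustrative weighted up-down staircase targeting 60% correct.
# Assumptions: the rule itself, the 0..1 strength scale, and the simulated observer.
import math
import random


def weighted_staircase(respond, start, step_up, target=0.60, n_trials=2000,
                       lo=0.0, hi=1.0):
    """Run the staircase; respond(level) returns True for a correct response.

    A correct response strengthens the mask by step_up; an error weakens it by
    step_down. Choosing step_down = step_up * target / (1 - target) makes the
    expected change zero when the observer is correct on `target` of trials.
    """
    step_down = step_up * target / (1.0 - target)
    level = start
    levels, outcomes = [], []
    for _ in range(n_trials):
        correct = respond(level)
        levels.append(level)
        outcomes.append(correct)
        level += step_up if correct else -step_down
        level = min(hi, max(lo, level))  # keep within the displayable range
    return levels, outcomes


def make_simulated_observer(rng, midpoint=0.5, slope=0.1):
    """Hypothetical 2AFC observer: accuracy falls from ~1.0 to 0.5 as the mask strengthens."""
    def respond(strength):
        p = 0.5 + 0.5 / (1.0 + math.exp((strength - midpoint) / slope))
        return rng.random() < p
    return respond


def run_demo(seed=1):
    rng = random.Random(seed)
    levels, outcomes = weighted_staircase(make_simulated_observer(rng),
                                          start=0.3, step_up=0.01)
    tail = levels[len(levels) // 2:]  # discard the initial approach phase
    return sum(tail) / len(tail), sum(outcomes) / len(outcomes)


if __name__ == "__main__":
    estimate, prop_correct = run_demo()
    # For this simulated observer the true 60% point is 0.5 + 0.1 * ln(4), about 0.64.
    print(f"estimated strength: {estimate:.3f}, overall proportion correct: {prop_correct:.3f}")
```

In practice the resulting strength estimate would be obtained separately for the color–color and motion–motion pairs and then held fixed while SOA is varied.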
Results
Fig. 5 shows detectability of the target feature (color or motion) as a function of SOA (target-mask interval) for an individual subject (Fig. 5A) and averaged across subjects (Fig. 5B). Performance for the second SOA value (21 ms) is approximately equal for color and motion conditions, indicating that mask strength has been successfully equalized (see Method for details). The idealized psychometric function fitted to the data (Fig. 5A) demonstrates that color detectability increases more rapidly than motion detectability; color detectability plateaus at ceiling level by 150 ms SOA, while motion detectability continues to increase. Collapsed across the two longest SOA conditions of the group data (Fig. 5C), there is a significant difference between color and motion detectability, t(8) = 3.85, P < 0.01. This difference is not present at short SOAs, t(8) = 0.06, P > 0.5. These time course differences support a segregated processing scheme for color and motion, with color being processed more rapidly.
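The form of the fitted psychometric function is not given in the text. As a sketch under that caveat, a two-alternative forced-choice function that rises from chance (0.5) toward ceiling with increasing SOA can be written as a scaled logistic and fitted by a simple least-squares grid search. The function names, the logistic form, and the parameter grids below are assumptions for illustration; the demo recovers parameters from noise-free values generated by the function itself, not from experimental data.

```python
# Illustrative 2AFC psychometric function of SOA and a grid-search fit.
# The logistic form and all parameter values are assumptions.
import math


def p_correct(soa_ms, threshold_ms, slope_ms, ceiling=1.0):
    """Proportion correct: 0.5 at very short SOAs, approaching `ceiling` at long SOAs."""
    return 0.5 + (ceiling - 0.5) / (1.0 + math.exp(-(soa_ms - threshold_ms) / slope_ms))


def fit_grid(soas, props, thresholds, slopes):
    """Return the (threshold, slope) pair on the grid with the smallest squared error."""
    best = None
    for t in thresholds:
        for s in slopes:
            sse = sum((p_correct(x, t, s) - y) ** 2 for x, y in zip(soas, props))
            if best is None or sse < best[0]:
                best = (sse, t, s)
    return best[1], best[2]


if __name__ == "__main__":
    soas = list(range(7, 143, 15))  # 10 SOAs from 7 to 142 ms in 15 ms steps
    synthetic = [p_correct(x, 60, 20) for x in soas]  # noise-free, generated values
    print(fit_grid(soas, synthetic, range(20, 121, 5), range(5, 41, 5)))
```

With a fit of this kind, a faster recovery from masking shows up as a smaller threshold parameter, a steeper slope, or both.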
Experiment 3: Extending the time course: Forward and backward masking
Experiment 1 demonstrated the existence of feature-selective masking using a backwards masking paradigm. In Experiment 3, we wanted to learn whether this pattern also applies when the mask is presented before the target (forward masking). Because binding requires more time when the signals to be bound are of a different type (e.g., color and motion; Bartels & Zeki, 2006), we predicted that different target-mask types would require longer to interact, and that we would therefore see a strong cross-dimensional masking effect when the mask appears before, but not after, the target.
Method
Observers
Five observers (mean age 28 years; one female) were tested. All had normal or corrected-to-normal vision.
Procedure
In order to achieve greater flexibility in the temporal relationship between target and mask, and to test a finer scaled range of SOAs, the duration of the mask was decreased from 200 to 100 ms. Observers were tested on two forward masking and two backward masking SOA conditions. In the forward masking conditions, the mask offset preceded the target by either 100 (Forward Long: FL) or 7 ms (Forward Short: FS). In the backward masking conditions, the mask onset succeeded the target by either 100 (Backwards Long: BL) or 7 ms (Backwards Short: BS). For each of these SOAs, observers were tested on all four color–motion target-mask combinations (see Experiment 1 for details), using identical stimuli to Experiment 1. In total, each observer completed 80 trials per single condition.
Results
Fig. 6 displays overall proportion-correct scores for all conditions, averaged across the five observers. The outer bars illustrate scores for the conditions in which the mask and target had the greatest temporal separation. For these conditions, it was expected that masking would be minimal. The central two gray bars (conditions FS and BS) represent the cases in which the target is immediately preceded or succeeded by the mask.
Heterogeneous masking
When target and mask were of different features (Fig. 6A and 6B), performance for the longest SOAs (outer bars) approaches ceiling level (90%) and masking is weak or absent. For the short SOAs, there is a large masking effect when the mask immediately precedes the target (FS); performance is significantly reduced in the forward (FS) versus backward masking condition (BS), for both the color mask / motion judgment [t(4) = 5.7, P < 0.001] and motion mask / color judgment [t(4) = 2.5, P < 0.05] conditions.
Homogeneous masking
Overall performance for within-dimension masking (69%) is lower than that for across-dimension masking (85%), consistent with the results of Experiment 1. In common with the cross-dimension conditions, higher performance is seen for the longest SOAs (Fig. 6C and 6D, outer bars). In contrast to the cross-dimension conditions, there is no significant difference between the short SOA conditions (FS and BS) for the color mask / color judgment condition, t(4) = 1.04, P = 0.35, or for the motion mask / motion judgment condition, t(4) = 0.97, P = 0.39.
A significant interaction between mask position (using the short SOA conditions of FS and BS) and target–mask relationship (either across or within dimensions) provides strong evidence for dissociable masking time courses, F(1,4) = 17.0, P = 0.015. These results indicate that when the target and mask are of different features, presentation of the mask prior to the target is optimal for maximizing the strength of the mask. When the mask is presented subsequent to the target, the masking effect is weak. This finding is in agreement with the results of Experiments 1 and 2, which showed weak or absent heterogeneous backwards masking.
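For a 2 × 2 within-subject design such as mask position (FS, BS) by target–mask relationship (across, within), the interaction F with 1 and n − 1 degrees of freedom equals the square of a one-sample t statistic computed on each observer's difference of differences. The sketch below illustrates that identity; the function name `interaction_f` and the toy values are hypothetical, and the analysis software actually used in the study is not stated.

```python
# Illustrative 2 x 2 within-subject interaction via a difference-of-differences contrast.
import math
from statistics import mean, stdev


def interaction_f(fs_across, bs_across, fs_within, bs_within):
    """Return (F, (df1, df2)) for the interaction; each argument holds one score per observer."""
    contrast = [(ba - fa) - (bw - fw)
                for fa, ba, fw, bw in zip(fs_across, bs_across, fs_within, bs_within)]
    n = len(contrast)
    if n < 2:
        raise ValueError("need at least 2 observers")
    se = stdev(contrast) / math.sqrt(n)
    if se == 0:
        raise ValueError("contrast has zero variance; F is undefined")
    t = mean(contrast) / se
    return t * t, (1, n - 1)


if __name__ == "__main__":
    # Arbitrary toy scores for three observers, chosen only to exercise the function.
    f, df = interaction_f([0, 0, 0], [1, 2, 3], [0, 0, 0], [0, 0, 0])
    print(f"F{df} = {f:.2f}")
```

With five observers, this contrast has 4 error degrees of freedom, matching the form F(1,4) reported above.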
Discussion
Three variations on the visual masking paradigm have provided converging psychophysical support for the existence of functionally specialized color and motion systems in the human visual brain. It has been shown that 1) masking is feature selective, 2) color and motion recover at different rates from mask interference, and 3) the optimal temporal position of the mask is dependent on the feature relation of the target-mask pair. All three lines of evidence point toward a segregated coding scheme for color and motion in the human visual system.
Although feature-selective visual masking is a logical consequence of functionally specialized color and motion systems, this separation has not been extensively explored with psychophysical techniques before. This was the main focus of Experiment 1, in which we show that when a color mask is presented, color judgments are impaired, while motion judgments are spared. When a motion mask is presented, the reverse is true. The selective masking effect is apparent only when the mask is presented subsequently to the target (backwards masking), is significant only at short SOAs, and disappears by 500 ms, consistent with previous masking results (Breitmeyer & Ogmen, 2006) and ruling out the influence of other factors such as response confusion or memory limitations. A more in-depth examination of the time course of the masking effect was carried out in Experiments 2 and 3. The finding that detectability of the target color increases more rapidly than that of motion (Experiment 2) is consistent with a shorter perceptual processing time for color compared to motion (Moutoussis & Zeki, 1997a), and therefore, a shorter time window in which interference from the mask signal is effective. Unequal processing times for color and motion imply that their perceptual encoding is accomplished by different neurons in the visual system. Forward masking is not feature selective but takes place with any combination of target and mask (Experiment 3). This implies that heterogeneous masking can be effective, but the target and mask signals may require more time to interact. This account fits well with the known functional segregation of the visual system and is also supported by evidence that binding across feature dimensions requires more time (Bartels & Zeki, 2006). Additionally, we found that masking strength does not depend on whether the target-mask color pairs are opponent or nonopponent.
The results of these experiments add to previous evidence demonstrating functional specialization in the visual system (Zeki, 1978; DeYoe & van Essen, 1988; Livingstone & Hubel, 1988). They do not rule out the existence of cells in the visual system that respond to multiple properties, as has been reported for color and form (Friedman et al., 2003) and for color and motion (Leventhal et al., 1995; Seymour et al., 2009), but show only that such units, assuming them to exist, do not display their perceptual potency in these experiments. Therefore, although these results point to the importance of separate processing streams, they are not in themselves at odds with the existence of conjunction detectors for multiple visual properties, as previously reported for color and form (Lovegrove & Over, 1973; Lovegrove & Badcock, 1981; Clifford et al., 2003) and color and motion (Seymour et al., 2009). The observation that there was little or no carryover effect (color masking motion or vice versa) indicates that any conjunction-selective cells contribute only weakly to perception, if at all. This of course raises the question of what the role of such putative conjunction-selective cells may be.
It is worth noting that for heterogeneous masking (Experiment 3), the type of target-mask pair made no difference. Regardless of whether color masked motion or motion masked color, the largest effect was found when the mask preceded the target by 100 ms. It is possible, therefore, that the processing latency differences for color and motion (Moutoussis & Zeki, 1997a) relate only to the development of conscious percepts. Interference from the mask may take place before conscious percepts are generated.