Audio-visual Stroop matching task with first- and second-language color words and color associates

Iva Šaban; James R. Schmidt

doi:10.1017/S0142716424000055


Published online by Cambridge University Press: 11 April 2024


Iva Šaban*: Affiliation:
LEAD-CNRS UMR5022, University of Burgundy, Dijon, France; CRFDP, University of Rouen Normandy, Mont-Saint-Aignan, France
James R. Schmidt: Affiliation:
LEAD-CNRS UMR5022, University of Burgundy, Dijon, France
*: Corresponding author: Iva Šaban, Email: [email protected]



Abstract

In the audio-visual Stroop matching task, participants compare one Stroop stimulus dimension (e.g., the color of a written word) to a second stimulus (e.g., a spoken word) and indicate whether the two stimuli match or mismatch. Slower responses on certain trials can be due to conflict between color representations (semantic conflict) or to conflict between the responses evoked by task comparisons (response conflict). The contribution of these conflicts has been investigated with color word distracters. This is the first study to explore how two types of first- and second-language words affect audio-visual matching. Native French speakers performed a bilingual Stroop matching task with intermixed French (L1) and English (L2) color words (Experiment 1) and color associates (Experiment 2) presented in congruent and incongruent colors simultaneously with spoken French color words. Participants were instructed to indicate whether the spoken word “matches” or “mismatches” the font color, while ignoring the written word’s meaning. Interestingly, the results for the critical “mismatch” trials were similar for French and English words. Responses were fastest on trials in which task comparisons activated fewer response alternatives, supporting the response conflict account.

Keywords

audio-visual matching; between-language interference; response conflict; semantic conflict; within-language interference

Type: Original Article
Information: Applied Psycholinguistics, Volume 45, Issue 2, March 2024, pp. 267–298

DOI: https://doi.org/10.1017/S0142716424000055
Copyright: © The Author(s), 2024. Published by Cambridge University Press

Cognitive control measured by the Stroop task and corresponding conflict effects

People make everyday decisions about allocating cognitive control in order to pursue their goals (e.g., what to pay attention to, what to stop themselves from doing). For instance, when confronted with multiple sources of information, our cognitive system shifts attentional resources away from distracting (i.e., goal-irrelevant) stimuli and/or toward goal-relevant stimuli and the action we are supposed to make. The Stroop task is a particularly useful tool for assessing the ability of the cognitive control system to control selective attention. In the Stroop task, participants are instructed to name the ink color of a written word while ignoring its meaning. The standard finding of slower and less accurate responding on incongruent (e.g., “red” in green) relative to congruent (e.g., “red” in red) trials is known as the congruency or Stroop effect (Stroop, 1935; for a review, see MacLeod, 1991). Among other things, the Stroop effect indicates that control over selective attention is not absolute: the distracting word influences color naming, so it is not ignored entirely.

Another question of interest concerns the source of this congruency effect. According to response conflict accounts, word reading and color naming compete for a single response channel (Goldfarb & Henik, 2007; Morton, 1969; Posner & Snyder, 1975). The word reading response becomes available before the color naming response because word reading is a faster and more automatized process (for the debate on the automaticity of reading, see Augustinova & Ferrand, 2014; Besner et al., 1997). Thus, word reading disrupts color naming but not vice versa. Alternatively, semantic (or stimulus) conflict accounts assume that the conflict occurs at an earlier phase of processing (Luo, 1999; Seymour, 1977; Simon & Berbaum, 1988). When the ink color and word meaning are incongruent (e.g., “red” in green), two distinct semantic representations (“red” and “green”) are simultaneously activated. This semantic conflict takes time to resolve, presumably before response selection. Various authors have discussed the relative contributions of semantic and response conflict to the congruency effect, and the current consensus is that both contribute to the standard Stroop effect (Ferrand & Augustinova, 2014). The presence of semantic and response conflict indicates that the distracting word slipped through the attentional filter, either at an early semantic processing phase or at a later response selection phase. Most models (e.g., Glaser & Glaser, 1989) assume that semantic processing occurs earlier in stimulus processing, with the response being selected at a later stage.

Stroop matching task

In the standard Stroop task, the to-be-ignored written word and the required oral response (in both color naming and word reading) are compatible, which has been suggested to be an inherent limitation of the task (Treisman & Fearnley, 1969). That is, a spoken word is required as the response in both color naming and word reading. This might produce a congruency effect only when the irrelevant stimulus attribute (e.g., the word) belongs to the same class as the response. This limitation inspired a variant of the Stroop task, the Stroop matching task, in which responses are neither words nor colors.

In the Stroop matching task, participants are instructed to make matching/mismatching judgments on two simultaneously presented stimuli (Treisman & Fearnley, 1969). That is, participants indicate whether two stimulus dimensions (e.g., two color words, or a word and a color) “match” or “mismatch.” Most importantly, this task permits a test of the contributions of two contrasting potential sources of conflict: semantic and response conflict. For instance, in the meaning decision task of Dyer (1973), participants were asked to compare a color word to a color patch and to ignore the print color of the word. Matching/mismatching judgments were slower when the color word was printed in an incongruent color. Specifically, “match” responses were slower when the word’s print color mismatched its meaning (e.g., “red” in blue) than when the word and print color matched (e.g., “red” in red). This is because the incongruent print color activates a semantic representation (i.e., blue) that competes with the representation activated by the other stimuli (i.e., red). According to this perspective, then, semantic conflict interferes with the matching/mismatching response (Dyer, 1973; Flowers, 1975). This finding challenges the assumptions of certain response conflict accounts, because the supposedly slower color naming response (i.e., “blue”) influenced responding more than the faster word meaning response (i.e., “red”).

Similar findings were observed with the visual decision task, in which participants were asked to decide whether two stimuli have the same ink color (Egeth et al., 1969; Virzi & Egeth, 1985). For instance, on a trial with the word “red” printed in blue and a blue patch, the required response is “match.” Interestingly, the conflicting verbal information provided by the word (i.e., “red”) did not produce interference, seemingly indicating that the word meaning is not accessed fast enough to compete with the semantic unit (“blue”) accessed by the word’s ink color (Egeth et al., 1969; Treisman & Fearnley, 1969). This finding again contradicts the assumptions of the response conflict account, since word reading, although faster than color naming, produced no interference. However, when the color names were replaced with the words “SAME” and “DIFF,” interference reappeared. That is, two simultaneously presented words “DIFF” printed in the same color (e.g., red) resulted in interference, because the correct response for the colors (i.e., “matching” or “SAME”) competed with the response suggested by the distracters (i.e., “mismatching” or “DIFF”). This indicates that participants had difficulty ignoring the written words and responding to the ink color exclusively, as assumed by the response conflict account (Egeth et al., 1969).

The meaning decision and visual decision tasks have been integrated within a single matching procedure to directly test whether interference is due to semantic or response conflict. Luo (1999) replicated both the interference in the meaning decision task and the absence of interference in the visual decision task. Luo argued that only the meaning decision task required participants to access the semantic system. In this task, when a Stroop stimulus “red” printed in blue is presented with a red patch (i.e., a “matching” response is required), the ink color and the color patch activate two competing semantic representations (e.g., “blue” and “red”). According to Luo (1999), this generates semantic conflict. These findings are difficult to explain with the response conflict account: regardless of whether the required response was “matching” or “mismatching,” response latencies were faster for related than for unrelated ink colors.

However, Goldfarb and Henik (2006) pointed out that Luo’s (1999) analysis of the meaning decision task distinguished only between “mismatching” trials in which colored patches appeared together with either an incongruent color word (e.g., “red” in blue paired with a blue rectangle) or a congruent color word (e.g., “red” in red paired with a blue rectangle). Goldfarb and Henik suggested that the congruency of the color word stimuli could play a role in producing conflict: for both “matching” and “mismatching” responses, Stroop stimuli could be either congruent or incongruent. Thus, in addition to the four conditions contrasted by Luo (1999), Goldfarb and Henik (2006) introduced a condition in which both dimensions of the incongruent Stroop stimulus mismatched the color of the patch (e.g., “red” in blue with a green patch). They observed that “matching” responses were faster when Stroop stimuli were congruent (e.g., “red” in red with a red patch) than when they were incongruent (e.g., “red” in green with a red patch). “Mismatching” responses were slowest when the word and ink color were congruent (e.g., “red” in red with a green patch). Delays were similar when the ink color and patch color matched (e.g., “red” in green with a green patch) and when they mismatched (e.g., “red” in blue with a green patch). To sum up, relative to congruent Stroop stimuli, incongruent stimuli slowed “matching” responses and sped up “mismatching” responses. According to Goldfarb and Henik, participants erroneously made an irrelevant match between the word and its ink color. That is, seeing congruent and incongruent Stroop stimuli leads to a covert “matching” and “mismatching” response, respectively, which can either facilitate or interfere with the required response. Thus, they concluded that the results were clearly in line with the response conflict account.

In a related variant of the matching task, Bornstein (2015) asked participants to make audio-visual matching judgments based on task-relevant auditory (i.e., a spoken color word) and visual (i.e., the ink color of a written word) stimuli. On each trial, participants were instructed to indicate whether the color of a written word (while ignoring its meaning) corresponded to a simultaneously presented spoken word. Bornstein (2015) compared the interference produced by congruent and incongruent written words on judgments of whether the spoken word and font color matched. Bornstein observed that incongruent distracters (e.g., “red” in blue while hearing “blue”) interfered more than congruent distracters (e.g., “blue” in blue while hearing “blue”) with “matching” responses, similar to Goldfarb and Henik (2006). Furthermore, written words that were congruent with either task-relevant dimension (i.e., ink color or spoken word) interfered with “mismatching” responses relative to trials in which the word mismatched both (e.g., “green” in red while hearing “blue”).

Both the semantic and response conflict accounts predict the same outcome for “matching” responses: faster responses with congruent color words (i.e., All congruent trials) than with incongruent color words (i.e., Sound-color congruent trials). According to the semantic conflict account, this is because, with congruent color words, all three task dimensions refer to the same color (e.g., blue). The response conflict account explains this difference by the three stimulus comparisons, which all suggest the same response alternative (i.e., “match”). Critically, the two accounts make different predictions for “mismatching” trials. According to the semantic conflict account, All incongruent trials, in which the written color word is incongruent with the remaining two color dimensions (e.g., “green” in red, hear “blue”), should produce the largest interference, because three different semantic representations (i.e., blue, red, and green) are simultaneously activated, slowing responding. In contrast, the response conflict account suggests that incongruent color word distracters should facilitate responding when both the written word and the ink color (e.g., green and red) are incompatible with the spoken word (e.g., blue). This is because all three comparisons (i.e., written vs. spoken word, written word vs. color, and spoken word vs. color) provide evidence toward the same response alternative (i.e., “mismatching”), resulting in faster responses (Bornstein, 2015; Caldas et al., 2012; Goldfarb & Henik, 2006). The shared prediction of the semantic and response conflict accounts for “matching” trials and their contrasting predictions for “mismatching” trials are visualized in Figure 1.

Figure 1. Prediction of semantic (SC) and response (RC) conflict accounts for “matching” and “mismatching” trials.
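The contrasting predictions can be made concrete by counting, for a single trial, how many distinct color representations are active (the quantity the semantic conflict account treats as costly) and how many of the two irrelevant comparisons point toward the wrong response (the quantity the response conflict account treats as costly). The sketch below is an illustrative operationalization of the verbal accounts, not a model taken from the studies cited; the function name `predicted_loads`, the two load counts, and the Word-sound congruent example trial are assumptions introduced here.

```python
def predicted_loads(word, color, sound):
    """Illustrative conflict 'loads' for one audio-visual matching trial (hypothetical helper).

    word  : color named by the written word (e.g., "green")
    color : font color of the written word
    sound : color named by the spoken word
    Returns (correct_response, semantic_load, response_load).
    """
    # Relevant comparison: font color vs. spoken word.
    correct = "match" if color == sound else "mismatch"
    # Semantic conflict account: more distinct color representations -> more conflict.
    semantic_load = len({word, color, sound})
    # Response conflict account: irrelevant comparisons that suggest the wrong response.
    irrelevant_same = [word == color, word == sound]
    response_load = sum(("match" if same else "mismatch") != correct for same in irrelevant_same)
    return correct, semantic_load, response_load


# Example trials as (written word, font color, spoken word).
EXAMPLES = {
    "All congruent": ("blue", "blue", "blue"),
    "Sound-color congruent": ("red", "blue", "blue"),
    "Word-sound congruent": ("blue", "red", "blue"),
    "Word-color congruent": ("red", "red", "blue"),
    "All incongruent": ("green", "red", "blue"),
}

if __name__ == "__main__":
    for name, trial in EXAMPLES.items():
        correct, sc, rc = predicted_loads(*trial)
        print(f"{name:22} {correct:9} semantic={sc} response={rc}")
```

Among the mismatching trials, All incongruent has the highest semantic load (3) but the lowest response load (0), which is exactly the crossover between the two accounts described above.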

Color associates

All previously described Stroop matching task studies used color words. However, similar studies have not been conducted with another common word type with a strong color dimension, namely color associates, which could help further evaluate conflict effects in the Stroop matching task. Color associates are words that are closely related to color words (e.g., “sky” with blue) and their semantic representations (Tanaka & Presnell, 1999). Color associates do produce interference with color naming in the Stroop task. Like color words, color associates can be congruent (e.g., “sky” in blue) or incongruent (e.g., “sky” in red) with the ink color. When the response latencies of these two types of trials are contrasted, a congruency effect occurs, with slower and less accurate responses on incongruent than on congruent color associate trials (Glaser & Glaser, 1989; Klein, 1964; Risko et al., 2006; Schmidt & Cheesman, 2005).

This difference in performance might be due to early semantic processes (Glaser & Glaser, 1989). When a color associate distracter is printed in an incongruent color (e.g., “sky” in red), two competing color representations (i.e., red and blue) are simultaneously activated, thus producing semantic conflict. According to this perspective, color associate congruency effects arise from early, semantic processes. Another account suggests that color associates might directly activate the color response linked to the associate. That is, when the word “sky” is printed in red, both the response linked to the color blue (i.e., the color associated with “sky”) and the response linked to the color red (i.e., the ink color) will be activated. Thus, according to this perspective, incongruent color associates produce response competition, resulting exclusively in response conflict rather than semantic conflict (Klein, 1964). Third, Sharma and McKenna (1998) suggested that interference should occur only when vocal responses are required and should be eliminated with manual responses, though subsequent research clearly indicates the presence of conflict effects in keypress tasks (e.g., Schmidt & Cheesman, 2005).

One reason why color associates might be especially interesting in the context of the matching task relates to a peculiarity of the task. For “matching” trials, the semantic and response conflict accounts make identical predictions. For “mismatching” trials, the two accounts make exactly opposite predictions: the semantic conflict account suggests that All incongruent trials should be slower than the two other types of “mismatching” trials, whereas the response conflict account suggests that All incongruent trials should be faster than the two other types. Therefore, if both semantic and response conflict occur, the larger of the two effects will “mask” the other. In particular, evidence of a response conflict effect could indicate that only response conflict occurs in the matching task, but it could also indicate that response conflict is merely larger than semantic conflict. Thus, if the response conflict effect can be eliminated, the “true” effect of semantic conflict might be revealed. Although competing accounts of color associate conflict effects exist (as discussed above), we hypothesized that color associates would produce only semantic conflict, and some evidence suggests this to be the case in standard Stroop studies (e.g., Schmidt & Cheesman, 2005). All task comparisons (one relevant and two irrelevant) for each color associate trial type are visualized in Figure 2.

Figure 2. Types of trials and example stimuli with relevant (highlighted column) and irrelevant task comparisons.
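The masking argument can be illustrated with a deliberately simple additive model. This model is an assumption made here for illustration, not one proposed in the text: each unit of semantic load (distinct color representations) is assumed to cost s ms and each unit of response load (irrelevant comparisons suggesting the wrong response) r ms, with loads taken from the comparison counts for the three mismatching trial types. The hypothetical function `all_incongruent_contrast` returns the predicted RT difference between All incongruent trials and the mean of the other two mismatching types.

```python
# Hypothetical loads per mismatching trial type:
# (distinct color representations, irrelevant comparisons suggesting "match").
MISMATCH_LOADS = {
    "All incongruent": (3, 0),
    "Word-sound congruent": (2, 1),
    "Word-color congruent": (2, 1),
}


def all_incongruent_contrast(s, r):
    """Predicted RT(All incongruent) minus mean RT(other mismatching types), toy additive model.

    s: hypothetical cost (ms) per unit of semantic load
    r: hypothetical cost (ms) per unit of response load
    Under this model the contrast reduces to s - r.
    """
    cost = {name: s * sc + r * rc for name, (sc, rc) in MISMATCH_LOADS.items()}
    others = (cost["Word-sound congruent"] + cost["Word-color congruent"]) / 2
    return cost["All incongruent"] - others


if __name__ == "__main__":
    print(all_incongruent_contrast(s=20, r=50))  # negative: response conflict masks semantic conflict
    print(all_incongruent_contrast(s=20, r=0))   # positive: semantic conflict revealed
```

Because only the sign of the contrast is observable, a negative value is compatible with pure response conflict or with response conflict that merely outweighs semantic conflict; setting r to zero, as hypothesized for color associates, leaves a positive contrast.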

Bilingualism

The Stroop effect has been frequently investigated in bilinguals (Altarriba & Mathis, 1997; Dyer, 1971; MacLeod, 1991; Mägiste, 1982; Preston & Lambert, 1969; Tzelgov et al., 1990). These studies showed that congruency effects can be observed with both first-language (L1) and second-language (L2) words. However, interference is generally larger for L1 than for L2 words. This could be explained by the nature of L2 connections. For instance, there has been debate about whether L2 words (1) have strong direct connections to semantic representations but weak connections to the L1 lexicon, (2) are strongly connected to the L1 lexicon but not to semantics, or (3) have both semantic and lexical connections (Altarriba & Mathis, 1997; Kroll & Stewart, 1994; Schmidt et al., 2018). Thus, it is unclear whether L2 words would lead to semantic conflict, response conflict, or a combination of both. Specifically, L2 words would not be expected to generate semantic conflict if they have no (or very weak) connections to semantics. If the exact reverse is true and L2 words function as semantic associates of their L1 translations, then only semantic conflict might be expected, as discussed in the previous section on color associates.

Another important question in the bilingual Stroop literature concerns the modulation of Stroop interference by stimulus and response language (i.e., the language of the distracter and the language of the response, respectively). First, the distracter language can match the response language. For instance, naming the color of the distracter “red” printed in green produces within-language (or intralingual) interference when English is the response language (i.e., the correct response is to say “green”). Second, the distracter language can mismatch the response language. For instance, naming the color of the distracter “rouge” (red in French) printed in blue produces between-language (or interlingual) interference when English is the response language (i.e., the correct response is to say “blue”).

The magnitude of within- and between-language interference has been compared repeatedly. A standard finding is larger within-language than between-language interference (Dyer, 1971; Hamers & Lambert, 1972; Kiyak, 1982; MacLeod, 1991; Preston & Lambert, 1969). For instance, MacLeod (1991) reported that between-language interference represents about 75% of within-language interference. However, these findings mostly originated from the standard visual (MacLeod, 1991) and auditory (Hamers & Lambert, 1972) Stroop tasks and have never been confirmed with the Stroop matching task. In a bilingual Stroop matching task, distracters that match the spoken word in language might be expected to produce larger interference than those that mismatch. To test this in the present series of studies, we used distracter words from both the first language (i.e., French) and a second language (i.e., English), whereas spoken words were always French. French distracters are therefore expected to produce larger interference (i.e., within-language interference) than English distracters (i.e., between-language interference).

Present study

In the present series of experiments, a bilingual audio-visual Stroop matching task was designed to further explore (1) the magnitude of interference produced by first-language (L1) and second-language (L2) color words and color associates and (2) the relative contributions of semantic and response conflict. In addition to first-language color words, which are frequently used as distracters in the literature, we introduced second-language color words (Experiment 1). That is, intermixed French (L1) and English (L2) color words served as distracters, while participants had to match their ink color with a spoken French color word. This manipulation allows us to test the consensus of larger within- than between-language interference: if it holds, a larger interference effect should occur with French (L1) than with English (L2) color word distracters. The design of this study is described in the Audio-visual stimulus combination section. Experiment 2 aims to extend these findings by using color associates instead of color words. That is, both French and English color associates were used as distracters, with participants matching their ink color with a spoken French color word. Note that, in contrast to Experiment 1, the spoken word (e.g., “vert,” French for green) does not correspond to the written word (e.g., “herbe,” French for grass). According to some views, this manipulation should eliminate response conflict, since “herbe” might be unable to retrieve the response linked to green. Furthermore, this could reveal the role of semantic conflict, which is possibly masked by a (larger) response conflict effect. The question of larger within- than between-language interference also applies here: French color associates are expected to produce more interference than their English counterparts.

The present series of studies also aims to investigate the source of this interference. As already discussed, interference could be due to conflict between semantic representations (i.e., semantic conflict) or to conflict between response alternatives (i.e., response conflict). Based on the findings of Luo (1999) and Goldfarb and Henik (2006), these two opposing accounts predict similar outcomes for “matching” responses. That is, when the correct response is “match,” Sound-color congruent trials will produce slower responses than All congruent trials. However, the semantic and response conflict accounts make different assumptions for “mismatching” responses, based on the congruency between task dimensions. According to the semantic conflict account, a written distracter should produce more interference when it is incongruent with both task dimensions (i.e., on All incongruent trials) than when it is incongruent with only one of them (i.e., on Word-sound congruent and Word-color congruent trials), because on All incongruent trials the distracting word conflicts with both target dimensions. In contrast, the response conflict account assumes that the smallest interference will be observed on All incongruent trials, where all task comparisons suggest the same “mismatching” response. Interference should instead mostly be observed on Word-sound congruent and Word-color congruent trials, where one of the irrelevant task comparisons suggests the same response alternative as the relevant comparison (i.e., “mismatch”), but the third comparison suggests the other, incorrect response alternative (i.e., “match”).

Experiment 1

Experiment 1 contrasts response latencies for congruent and incongruent French (L1) and English (L2) color word distracters, each accompanied by a French spoken word. Participants were instructed to indicate, by pressing the corresponding key, whether the ink color and the spoken word matched or mismatched. The combinations of visual and auditory stimuli produced five trial types, two “matching” and three “mismatching,” described in detail in the Audio-visual stimulus combination section. The aim of Experiment 1 was to (1) compare the magnitude of interference produced by first- and second-language color words in the audio-visual Stroop matching task and (2) investigate whether this interference is due to semantic or response conflict.

Method

Participants

A total of 34 (31 women) [removed for review] undergraduates (M_age = 19; SD = .78) voluntarily participated in the experiment in exchange for course credit. An a priori power analysis for sample size estimation was conducted using G*Power 3 (Faul et al., 2007), based on data from Goldfarb and Henik (2006; N = 12), who compared response times on matching and mismatching trials separately. The effect size in Goldfarb and Henik’s (2006) study was η_p² = .57, which is considered large. With a significance criterion of α = .05 and power of .95, the minimum sample size needed to detect this effect size in a repeated measures ANOVA is N = 22. Preferring more power than minimally necessary, we decided to collect data from at least 30 participants, stopping at the end of the testing week in which this number was exceeded (resulting in a final sample of N = 34).
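G*Power's F-test procedures take Cohen's f as the effect-size input, so a reported partial eta squared is converted first. A minimal sketch of that standard conversion, f = sqrt(η_p² / (1 − η_p²)), is shown below; the function name is hypothetical. The resulting N additionally depends on G*Power settings the text does not report (e.g., the number of repeated measurements and their assumed correlation), so this sketch does not by itself reproduce N = 22.

```python
import math


def partial_eta_sq_to_f(eta_p_sq):
    """Convert partial eta squared to Cohen's f: f = sqrt(eta^2 / (1 - eta^2))."""
    if not 0 <= eta_p_sq < 1:
        raise ValueError("partial eta squared must lie in [0, 1)")
    return math.sqrt(eta_p_sq / (1 - eta_p_sq))


if __name__ == "__main__":
    f = partial_eta_sq_to_f(0.57)  # effect size reported for Goldfarb & Henik (2006)
    print(f"Cohen's f = {f:.2f}")  # Cohen's f = 1.15
```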

All participants had normal or corrected-to-normal visual acuity, normal color vision, and normal auditory acuity, as assessed via screening questions. Participants gave written informed consent before the study. All procedures were conducted in accordance with the Declaration of Helsinki, although nonbiomedical research in [removed for review] does not require ethics approval. All participants were native French speakers. A language questionnaire (described below) was used to confirm that participants met these criteria. Language background scores (means and standard errors) are presented in Table 1 (see Results section).

Table 1. Mean French and English language scores and standard errors (in brackets)

Apparatus

The experiment was conducted in a sound-attenuated room in the laboratory. Stimulus presentation and response timing were controlled and recorded by PsyToolkit (Stoet, 2010, 2017). The study was run on a PC laptop with an AZERTY keyboard and a 15-inch monitor. Participants pressed the “D” key when the audio and the ink color of the written distracter mismatched (e.g., hear “green” and see “brown” in brown) and the “K” key when they matched (e.g., hear “green” and see “brown” in green). Before the Stroop matching portion of the experiment, participants filled out a short language demographic questionnaire. This questionnaire asked for gender, age, native language, years of English training in school, and a self-rating of English knowledge ranging from 0 (= almost none) to 5 (= perfect). A subset of questions from the French version of the Language Experience and Proficiency Questionnaire (LEAP-Q; Marian et al., 2007) was included. In particular, the questions asking participants to list their languages in order of dominance and acquisition were retained. Participants were also asked to indicate the percentage of time they had used French and English in the recent period. Also retained from the LEAP-Q were two boxes, one for French and one for English, asking for the age at which participants began acquiring the language, became fluent in it, began learning to read in it, and became fluent in reading it. The purpose of this questionnaire was to ensure that participants had the correct language dominance. Finally, in addition to these two questionnaires, participants were asked to give the French translations of the four English words used in the experiment (i.e., “green,” “brown,” “pink,” and “white”).

This was followed by the LexTALE English vocabulary test (Lemhöfer & Broersma, 2012), with instructions translated into French. This test contains 63 English-looking letter strings (3 practice trials and 60 test trials). Two-thirds of the test items are actual English words (e.g., “moonlit,” “fluid”), whereas the remaining third are not (e.g., “plaudate,” “rebondicate”). Participants were instructed to select only the items they were certain are actual English words. Each correct selection of a real word (a hit) earned one point, and each incorrect selection of a nonword (a false alarm) was penalized by two points.
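The scoring rule described above can be written as a short function. This is a sketch of the rule as stated here (+1 per hit, −2 per false alarm) and not necessarily the scoring procedure recommended by the LexTALE authors; `score_lextale` is a hypothetical name, and practice items are assumed to be excluded. With 60 test items, two-thirds words means 40 words and 20 nonwords.

```python
def score_lextale(selected, is_word):
    """Score test items: +1 per real word selected (hit), -2 per nonword selected (false alarm).

    selected: sequence of bools, True if the participant marked the item as a real English word
    is_word : sequence of bools, True if the item is a real English word
    """
    if len(selected) != len(is_word):
        raise ValueError("selected and is_word must have the same length")
    hits = sum(1 for s, w in zip(selected, is_word) if s and w)
    false_alarms = sum(1 for s, w in zip(selected, is_word) if s and not w)
    return hits - 2 * false_alarms


if __name__ == "__main__":
    # Hypothetical participant on the 60 test items (40 words, 20 nonwords):
    # 35 words and 4 nonwords selected.
    is_word = [True] * 40 + [False] * 20
    selected = [True] * 35 + [False] * 5 + [True] * 4 + [False] * 16
    print(score_lextale(selected, is_word))  # 27
```

Under this rule, selecting every item yields 40 − 2 × 20 = 0, so indiscriminate "yes" responding is not rewarded.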

Materials and design

During the experimental phase, participants were presented with a set of French-English translation equivalents (i.e., “green/vert,” “brown/marron,” “pink/rose,” and “white/blanc”), typed in lowercase Courier New Bold font (size 72). The corresponding print colors and their RGB codes were green (0, 128, 0), brown (165, 42, 42), hot pink (255, 105, 180), and white (255, 255, 255). These four word pairs were non-cognates; that is, they do not share phonological or orthographic features across languages, unlike several other color word pairs (e.g., “blue/bleu” or “red/rouge”). Auditory stimuli consisted of the color words /vert/, /marron/, /rose/, and /blanc/ (French for green, brown, pink, and white, respectively), spoken by a female speaker.

The design included two within-subject factors: Trial Type (a “matching” condition containing All congruent and Sound-color congruent trials vs. a “mismatching” condition containing Word-sound congruent, All incongruent, and Word-color congruent trials) and Language (French vs. English). Each experimental block contained 25% matching trials (6.25% All congruent, 18.75% Sound-color congruent) and 75% mismatching trials (18.75% Word-sound congruent, 18.75% Word-color congruent, 37.5% All incongruent). This distribution follows from presenting each combination of color word distracter, print color, and sound equally often, which avoids contingency biases (i.e., learning of regularities between stimuli; Schmidt et al., 2007; see also Lorentz et al., 2016).¹ As a consequence, mismatching responses were more frequent than matching responses. Importantly, however, all of the key comparisons are within response type: we conducted one analysis for matching responses and another for mismatching responses, as previously suggested (Goldfarb & Henik, 2006). Thus, even if participants had learned a strategic tendency to prepare the “mismatching” response, this bias could not affect the comparisons among “matching” responses, and the response imbalance introduces no systematic bias into our statistical tests (i.e., none of our comparisons contrasts a trial requiring a “matching” response with one requiring a “mismatching” response). In total, there were three experimental blocks of 128 trials each (384 trials in total), with trials presented in random order without replacement. This main phase was preceded by a practice block of 32 trials in which the color words were replaced with the string “xxxx.”
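The condition percentages follow directly from presenting every word × ink color × sound combination once. The short sketch below reproduces that arithmetic for one distracter language (4 × 4 × 4 = 64 cells); the English half of the design has the same structure. The variable names are illustrative only.

```python
N_COLORS = 4  # green, brown, pink, white

# Number of word x ink color x sound cells per distracter language when
# every combination is presented once (4 x 4 x 4 = 64 cells).
counts = {
    # "matching" response: spoken word = ink color
    "All congruent": N_COLORS,                           # word = color = sound
    "Sound-color congruent": N_COLORS * (N_COLORS - 1),  # word differs from both
    # "mismatching" response: spoken word != ink color
    "Word-sound congruent": N_COLORS * (N_COLORS - 1),   # word = sound != color
    "Word-color congruent": N_COLORS * (N_COLORS - 1),   # word = color != sound
    "All incongruent": N_COLORS * (N_COLORS - 1) * (N_COLORS - 2),  # all differ
}
total = sum(counts.values())
assert total == N_COLORS ** 3  # the five conditions exhaust all 64 cells

proportions = {name: n / total for name, n in counts.items()}
for name, p in proportions.items():
    print(f"{name:<22} {counts[name]:>2}/{total} = {p:.2%}")
```

The matching conditions together cover 4 + 12 = 16 of the 64 cells (25%), which is why mismatching responses necessarily outnumber matching ones three to one under a fully balanced design.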

Audio-visual stimulus combination

A total of 128 audio-visual stimulus combinations were created from the eight visual stimuli (“vert,” “marron,” “rose,” “blanc,” “green,” “brown,” “pink,” “white”), four font colors (green, brown, pink, and white), and four auditory stimuli (“vert,” “marron,” “rose,” “blanc”). These combinations were grouped into five conditions according to the congruence or incongruence between spoken word meaning, font color, and written word meaning. In two conditions, the font color and the spoken color word (the task-relevant comparison) were congruent and thus required a “matching” response: 1) All congruent and 2) Sound-color congruent. In the other three conditions, the font color and the spoken color word were incongruent and thus required a “mismatching” response: 3) All incongruent, 4) Word-sound congruent, and 5) Word-color congruent. All five conditions were created for both distracter languages. These conditions are presented in Figure 3.

Figure 3. All trial types across two distracter languages (French and English).

Note. All trial types have two equivalents: one with a French distracter (on the left) and one with an English distracter (on the right). Color patches represent the ink color in each trial.
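The five conditions can be derived mechanically from the three stimulus dimensions. The sketch below is a hypothetical illustration: the dictionaries and the classify function are not taken from the authors' PsyToolkit script. It assumes, as the Discussion implies, that an English distracter counts as congruent with the French sound or ink color it translates (e.g., “green” with /vert/). Enumerating all combinations recovers the 128 trials.

```python
from collections import Counter
from itertools import product

# Color concept (named by its French word) denoted by each written distracter.
# English distracters map onto their French translation equivalents.
WORD_CONCEPT = {
    "vert": "vert", "marron": "marron", "rose": "rose", "blanc": "blanc",
    "green": "vert", "brown": "marron", "pink": "rose", "white": "blanc",
}
# Color concept of each font (ink) color.
INK_CONCEPT = {"green": "vert", "brown": "marron", "pink": "rose", "white": "blanc"}
SOUNDS = ["vert", "marron", "rose", "blanc"]  # spoken French color words


def classify(word, ink, sound):
    """Return (condition, required response) for one audio-visual trial."""
    w, c = WORD_CONCEPT[word], INK_CONCEPT[ink]
    if sound == c:  # task-relevant comparison matches
        return ("All congruent" if w == c else "Sound-color congruent"), "matching"
    if w == sound:
        return "Word-sound congruent", "mismatching"
    if w == c:
        return "Word-color congruent", "mismatching"
    return "All incongruent", "mismatching"


trials = list(product(WORD_CONCEPT, INK_CONCEPT, SOUNDS))  # 8 x 4 x 4 = 128
tally = Counter(classify(*t)[0] for t in trials)
print(len(trials), dict(tally))
```

For example, classify("brown", "green", "vert") returns ("Sound-color congruent", "matching"), mirroring the example given for the “K” key. Across both languages the tally is 8, 24, 24, 24, and 48 trials for All congruent, Sound-color, Word-sound, Word-color, and All incongruent, respectively.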

Procedure

After participants completed the survey questions, the main experiment began. Stimuli were presented on a black (0, 0, 0) screen. Each trial began with a gray (128, 128, 128) fixation “+” for 500 ms, followed by a blank screen for 250 ms. The colored distracter then appeared, simultaneously with the auditory stimulus, and remained on the screen until a response was registered or 2000 ms had elapsed. Responses could be registered only from 300 ms after stimulus onset because of how the experiment was programmed: on each trial, an initial event plays the audio and presents the visual stimulus, and it is followed by a second event showing only the stimulus, during which responses are recorded. This was also done because the task required a comparison of the auditory stimulus with the print color; a response made before the auditory stimulus has been played is necessarily anticipatory and would be best excluded anyway. The next trial began after a 750-ms blank screen. The timeline of each trial is shown in Figure 4. If the participant made an error or failed to respond in time, the message “Erreur” (“Error”) or “Trop lent” (“Too slow”), respectively, appeared in red (255, 0, 0) for 1000 ms before the next trial. In both experiments, participants were explicitly instructed to respond as quickly and as accurately as possible and to avoid reading the distracter, since it was task-irrelevant. The “matching” key had to be pressed on trials in which the spoken color word and the font color matched, and the “mismatching” key on trials in which they mismatched.

Figure 4. Timeline of an experimental trial.
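The trial timeline can be summarized as a few constants plus two small functions. This is a hypothetical sketch for illustration, not the authors' PsyToolkit code: the lowercase key labels, the function names, and the assumption that error or time-out feedback adds to (rather than replaces) the 750-ms blank are all assumptions.

```python
# Durations (ms) from the procedure described above.
FIXATION_MS = 500
BLANK_MS = 250
RESPONSE_WINDOW_MS = 2000  # distracter stays up until a response or this deadline
RESPONSE_LOCKOUT_MS = 300  # key presses are only registered from this point on
FEEDBACK_MS = 1000         # "Erreur" / "Trop lent" message
ITI_MS = 750               # blank screen before the next trial

# Hypothetical key labels for the two responses ("K" = matching, "D" = mismatching).
KEYS = {"matching": "k", "mismatching": "d"}


def score_trial(rt_ms, key, required_response):
    """Classify a trial as 'correct', 'error', or 'timeout'.

    rt_ms is measured from stimulus onset; None means no key was registered.
    """
    if rt_ms is None or rt_ms > RESPONSE_WINDOW_MS:
        return "timeout"
    if rt_ms < RESPONSE_LOCKOUT_MS:
        raise ValueError("presses before 300 ms are never registered")
    return "correct" if key == KEYS[required_response] else "error"


def trial_duration(rt_ms, outcome):
    """Total trial length in ms.

    Assumes error/time-out feedback is shown in addition to the 750-ms blank.
    """
    stimulus_ms = RESPONSE_WINDOW_MS if outcome == "timeout" else rt_ms
    feedback_ms = FEEDBACK_MS if outcome in ("error", "timeout") else 0
    return FIXATION_MS + BLANK_MS + stimulus_ms + feedback_ms + ITI_MS


print(score_trial(812, "d", "matching"))  # error
print(trial_duration(812, "error"))       # 500 + 250 + 812 + 1000 + 750 = 3312
print(trial_duration(None, "timeout"))    # 500 + 250 + 2000 + 1000 + 750 = 4500
```

Under these assumptions, a correct trial lasts 1500 ms plus the response time, and the longest possible trial (a time-out) lasts 4500 ms.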

Results

We used French and English words in this experiment to compare a highly fluent L1 with a low-fluency L2. In [removed for review], French is normally the native language, and English is typically learned later in life and not to a very high level of mastery. To verify that this was the case for our sample, we first analyzed average language metric scores,² which are presented in Table 1. All participants met our language criteria: they were native French speakers who had acquired French early in life. Importantly, all participants ranked French as their first language in terms of both dominance and order of acquisition. Their reported percentage of French use showed that they had been using French almost exclusively in their everyday lives. In contrast, English was learned much later, as a foreign language in primary school. Participants were only moderately proficient in English, as shown by their LexTale scores and self-rated English knowledge. Although they had studied English for a considerable time (almost 9 years) and reported becoming fluent in speaking and reading English at approximately age 15, their objective proficiency was rather low.

Data analysis

We analyzed mean correct response times (i.e., responses made within the 2000-ms response window) and mean percentage error. As pre-planned, response times were not trimmed. However, unless otherwise noted, the direction and significance of all effects were unchanged in subsequent analyses using an interquartile range (IQR) trimming method. No participants were excluded from the sample, as every participant's accuracy was 86.35% or higher. Because the congruency variable (Trial Type) had different levels for “matching” and “mismatching” responses, and because there are no relevant comparisons to make between these two trial types, we conducted two separate repeated-measures analyses of variance, each with two within-subject factors. The factor shared by both analyses was Distracter Language, with two levels: French (L1) and English (L2). In the “matching” analysis, Trial Type had 2 levels (All congruent and Sound-color congruent), while in the “mismatching” analysis it had 3 levels (Word-sound congruent, All incongruent, and Word-color congruent).
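The exact IQR criterion is not specified above. A common choice is Tukey's fences, which discard response times more than 1.5 × IQR below the first quartile or above the third quartile. The sketch below assumes that rule; the multiplier k = 1.5, the quartile method, and applying the trim to one participant's RTs within one condition cell are all assumptions, and the RT values are hypothetical.

```python
from statistics import quantiles


def iqr_trim(rts, k=1.5):
    """Keep RTs inside [Q1 - k*IQR, Q3 + k*IQR] (Tukey's fences).

    k = 1.5 and the 'inclusive' quartile method are assumptions; the text
    does not state the exact criterion or the level at which it was applied.
    """
    q1, _, q3 = quantiles(rts, n=4, method="inclusive")
    iqr = q3 - q1
    lower, upper = q1 - k * iqr, q3 + k * iqr
    return [rt for rt in rts if lower <= rt <= upper]


rts = [612, 655, 690, 702, 731, 748, 770, 805, 1890]  # hypothetical RTs (ms), one cell
print(iqr_trim(rts))  # Q1 = 690, Q3 = 770, fences 570..890, so 1890 is dropped
```

Because the fences are based on quartiles rather than the mean and standard deviation, a single very slow response does not shift the cut-offs much, which is one reason IQR-based trimming is often used as a robustness check for RT data.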

Response time (RT)

Response times were recorded in milliseconds as the time elapsed from stimulus onset to key press. A total of 5.98% of trials were excluded from the analyses (5.77% incorrect and .21% time-out responses). Only RTs for correct responses in the “matching” and “mismatching” conditions were analyzed; these are shown in Figure 5.

Figure 5. Mean response times with standard errors for “matching” and “mismatching” trials.

Matching trials

There was a main effect of Trial Type, F(1,33) = 209.609, MSE = 1606.534, ηp² = .864, BF₁₀ > 1000, p < .001. Responses on Sound-color congruent trials (M = 827, SE = 13.30) were slower than responses on All congruent trials (M = 728, SE = 13.93). A significant main effect of Language was also observed, F(1,33) = 11.638, MSE = 1797.765, ηp² = .260, BF₁₀ = 1.124, p = .001, with slower responses in the French condition (M = 790, SE = 14.71) than in the English condition (M = 765, SE = 12.53). The interaction between Trial Type and Language was also significant, F(1,33) = 9.272, MSE = 1649.944, ηp² = .219, BF₁₀ = 11.021, p < .01. Response speed did not differ between French (M = 729, SE = 16.06) and English (M = 726, SE = 14.45) All congruent trials, t(33) = .286, Mdiff = 3, BF₁₀ = .191, BF₀₁ = 5.236, p = .776. However, responses were significantly slower on French (M = 850, SE = 15.13) than on English (M = 804, SE = 12.14) Sound-color congruent trials, t(33) = 6.847, Mdiff = 46, BF₁₀ > 1000, p < .001.
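In the matching-trial analyses both factors have two levels, so every effect has one numerator degree of freedom. For such effects in a fully within-subject design, F(1, n − 1) equals the square of a one-sample t statistic computed on each participant's contrast score, which gives a compact way to see what each reported F(1,33) tests. The sketch below illustrates that identity with hypothetical cell means and standard ±1 contrast weights; it is not the software the authors used, and it does not compute the reported Bayes factors.

```python
from math import sqrt
from statistics import mean, stdev

# Contrast weights over cells ordered (A1B1, A1B2, A2B1, A2B2), e.g.
# A = Trial Type (All congruent, Sound-color congruent), B = Language (French, English).
CONTRASTS = {
    "Trial Type": (1, 1, -1, -1),
    "Language": (1, -1, 1, -1),
    "Interaction": (1, -1, -1, 1),
}


def within_2x2_f(cells):
    """F(1, n - 1) for each effect of a 2 x 2 fully within-subject design.

    cells: one tuple of four cell means per participant, in the order above.
    Each F is the squared one-sample t of the per-participant contrast scores.
    """
    n = len(cells)
    result = {}
    for effect, weights in CONTRASTS.items():
        scores = [sum(w * x for w, x in zip(weights, row)) for row in cells]
        t = mean(scores) / (stdev(scores) / sqrt(n))
        result[effect] = t * t
    return result


# Hypothetical mean RTs (ms) for three participants.
cells = [(700, 690, 800, 760), (720, 700, 830, 780), (690, 695, 790, 770)]
print(within_2x2_f(cells))
```

The interaction contrast (French minus English on All congruent trials, minus the same difference on Sound-color congruent trials) is exactly the quantity behind the follow-up t-tests reported above.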

Mismatching trials

A main effect of Trial Type was observed, F(2,66) = 36.205, MSE = 926.505, ηp² = .523, BF₁₀ > 1000, p < .001. Responses on Word-sound congruent trials (M = 827, SE = 15.79) were significantly slower than responses on All incongruent trials (M = 784, SE = 12.01), t(33) = 7.156, Mdiff = 43, BF₁₀ > 1000, p < .001, and on Word-color congruent trials (M = 796, SE = 12.44), t(33) = 5.085, Mdiff = 31, BF₁₀ > 1000, p < .001. Responses on Word-color congruent trials were also slower than responses on All incongruent trials, t(33) = 4.167, Mdiff = 12, BF₁₀ = 129.88, p < .001. There was no main effect of Language,³ F(1,33) = .278, MSE = 727.161, ηp² = .008, BF₁₀ = .161, BF₀₁ = 6.211, p = .602, indicating no difference in response latencies between French and English trials. The interaction between Trial Type and Language was not significant either, F(2,66) = .664, MSE = 1031.101, ηp² = .02, BF₁₀ = .179, BF₀₁ = 5.586, p = .518.
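Because partial eta squared is SS_effect / (SS_effect + SS_error) and F is the ratio of the corresponding mean squares, ηp² can be recovered from F and its degrees of freedom alone: ηp² = F·df_effect / (F·df_effect + df_error). The sketch below applies this standard identity to the F values reported for the response time analyses as a consistency check; the recomputed values agree with the reported ηp² to within about .001, a gap attributable to rounding.

```python
def partial_eta_squared(f, df_effect, df_error):
    """Partial eta squared recovered from an F ratio and its degrees of freedom.

    eta_p^2 = SS_effect / (SS_effect + SS_error)
            = F * df_effect / (F * df_effect + df_error)
    """
    return f * df_effect / (f * df_effect + df_error)


# (label, F, df_effect, df_error) as reported for the response time analyses.
reported = [
    ("Matching: Trial Type", 209.609, 1, 33),
    ("Matching: Language", 11.638, 1, 33),
    ("Matching: Interaction", 9.272, 1, 33),
    ("Mismatching: Trial Type", 36.205, 2, 66),
    ("Mismatching: Language", 0.278, 1, 33),
    ("Mismatching: Interaction", 0.664, 2, 66),
]
for label, f, df1, df2 in reported:
    print(f"{label:<26} eta_p^2 = {partial_eta_squared(f, df1, df2):.3f}")
```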

Percentage error

The mean percentage error data for all trial types and languages are presented in Figure 6.

Figure 6. Mean percentage error with standard error for “matching” and “mismatching” trials.

Matching trials

There was a main effect of Trial Type, F(1,33) = 113.835, MSE = 115.229, ηp² = .775, BF₁₀ > 1000, p < .001, indicating that participants made significantly more errors on Sound-color congruent (M = 23.07, SE = 2.08) than on All congruent trials (M = 3.43, SE = .89). A main effect of Language was also observed, F(1,33) = 8.034, MSE = 37.752, ηp² = .196, BF₁₀ = .391, BF₀₁ = 2.557, p = .01, with higher percentage error on French (M = 14.75, SE = 1.43) than on English trials (M = 11.76, SE = 1.39). The interaction between Trial Type and Language was marginally significant, F(1,33) = 4.272, MSE = 49.6, ηp² = .115, BF₁₀ = .987, BF₀₁ = 1.013, p = .05. Percentage error did not differ significantly between French (M = 3.68, SE = 1.37) and English (M = 3.19, SE = .86) All congruent trials, t(33) = .338, Mdiff = .49, BF₁₀ = .194, BF₀₁ = 5.155, p = .737. However, as in the response time data, participants made significantly more errors on French (M = 25.81, SE = 2.23) than on English (M = 20.33, SE = 2.29) Sound-color congruent trials, t(33) = 3.144, Mdiff = 5.483, BF₁₀ = 10.617, p < .01.

Mismatching trials

There was a main effect of Trial Type, F(2,66) = 19.381, MSE = 11.884, ηp² = .37, BF₁₀ > 1000, p < .001. Participants made significantly more errors on Word-sound congruent trials (M = 4.095, SE = .69) than on All incongruent (M = .532, SE = .118) trials, t(33) = 5.524, Mdiff = 3.563, BF₁₀ > 1000, p < .001, and Word-color congruent (M = 1.513, SE = .456) trials, t(33) = 3.826, Mdiff = 2.583, BF₁₀ = 54.49, p = .001. Percentage error was also larger in the Word-color congruent than in the All incongruent condition, t(33) = 2.329, Mdiff = .98, BF₁₀ = 1.93, p < .05. No significant main effect of Language was observed, F(1,33) = .102, MSE = 6.423, ηp² = .003, BF₁₀ = .154, BF₀₁ = 6.493, p = .752. The interaction between Trial Type and Language was significant, F(2,66) = 5.112, MSE = 7.647, ηp² = .134, BF₁₀ = 3.078, p = .01. Percentage error did not differ significantly between French and English trials in either the Word-sound congruent condition, t(33) = 1.788, Mdiff = 1.645, BF₁₀ = .766, BF₀₁ = 1.305, p = .083, or the All incongruent condition, t(33) = .397, Mdiff = .08, BF₁₀ = .198, BF₀₁ = 5.05, p = .694. However, participants made significantly more errors on English than on French Word-color congruent trials, t(33) = 2.223, Mdiff = 1.386, BF₁₀ = 1.587, p < .05.

Correlations

As a supplementary analysis, we assessed the extent to which the language metric variables correlated with performance on the different trial types with French (L1) and English (L2) color words in the Stroop matching task. These purely exploratory analyses did not reveal any clear or significant results. We nevertheless present these data in the Appendix for the interested reader.

Discussion

Experiment 1 had two aims: 1) to compare the magnitude of between-language and within-language interference and 2) to investigate the source of interference in a bilingual Stroop matching task with intermixed French (L1) and English (L2) color word distracters. Within-language interference was larger than between-language interference, but only on Sound-color congruent trials, with no significant response time difference between French and English words on the other trial types. That is, when a spoken word (e.g., “vert,” French for green) matched the ink color of the written distracter, French incongruent distracters (e.g., “marron,” French for brown, printed in green) elicited slower and less accurate responses than English incongruent distracters (e.g., “brown” in green). It is plausible that French written distracters lead to a strong task-irrelevant comparison (i.e., written word vs. spoken word) that impairs performance on the task-relevant comparison (i.e., ink color vs. spoken word). Sound-color congruent trials also had markedly higher percentage error than all other trial types. This is probably because both task-irrelevant comparisons activate the “mismatching” response, whereas the task-relevant comparison activates the “matching” response. Overall, the pattern of results for both French and English “matching” trials, with faster responses on All congruent than on Sound-color congruent trials, is consistent with the assumptions of both the stimulus and the response conflict accounts.

Theoretically more interesting are the results for the mismatching trial types. Responses on Word-sound congruent trials were significantly slower and more error-prone than responses on All incongruent and Word-color congruent trials (Bornstein, 2015). That is, both French (e.g., “vert” in brown) and English (e.g., “green” in brown) incongruent distracters slowed responding when the written word corresponded to the auditory stimulus (e.g., hearing “vert”). This contrasts with the results of Goldfarb and Henik (2006), who found the slowest “mismatching” responses for congruent distracters (i.e., Word-color congruent trials). Interestingly, response latencies were almost identical in the French and English conditions, suggesting that responding to a spoken L1 word is equally affected by a written L1 word (i.e., spoken and written words are identical) and by a written L2 word (i.e., spoken and written words are not identical but represent the same color concept, e.g., “vert” and “green”).

Responses were fastest in the All incongruent condition, which supports the response conflict account. This also aligns with the behavioral findings of Caldas and colleagues (2012) and of Goldfarb and Henik (2006), further indicating a role of response conflict in the Stroop matching task. In contrast, the semantic conflict account would have predicted that these trials would be the slowest, because the word, color, and auditory stimulus are all incongruent with one another.
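The opposing predictions can be made concrete with a simplified tally. In the sketch below, response conflict is counted as the number of task-irrelevant comparisons (word vs. sound, word vs. ink) that point to the opposite response from the task-relevant comparison (sound vs. ink), and semantic conflict is approximated by the number of distinct color concepts active on the trial. This is an illustrative operationalization under those assumptions, not the authors' formal model; the abstract labels A, B, and C stand for three different colors.

```python
def conflict_profile(word, ink, sound):
    """Simplified conflict tallies for one trial (hypothetical operationalization).

    word, ink, sound: labels of the color concept carried by each dimension.
    Returns (correct response, response conflict, semantic conflict).
    """
    correct = "matching" if sound == ink else "mismatching"  # task-relevant comparison
    # Responses suggested by the two task-irrelevant comparisons.
    irrelevant = [
        "matching" if word == sound else "mismatching",
        "matching" if word == ink else "mismatching",
    ]
    response_conflict = sum(r != correct for r in irrelevant)
    semantic_conflict = len({word, ink, sound})  # distinct color concepts active
    return correct, response_conflict, semantic_conflict


# (word, ink, sound) for each condition; A, B, C are three different colors.
CONDITIONS = {
    "All congruent": ("A", "A", "A"),
    "Sound-color congruent": ("B", "A", "A"),
    "Word-sound congruent": ("A", "B", "A"),
    "Word-color congruent": ("A", "A", "B"),
    "All incongruent": ("C", "A", "B"),
}
for name, dims in CONDITIONS.items():
    print(f"{name:<22}", conflict_profile(*dims))
```

Among mismatching trials, All incongruent trials have the lowest response conflict (0) but the highest semantic conflict (3 concepts), which is why the two accounts make opposite predictions for them. Note that this simple tally gives Word-sound and Word-color congruent trials identical scores, so it cannot by itself explain why Word-sound congruent trials were the slowest.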

Experiment 2

Experiment 2 conceptually replicates Experiment 1 with one important modification: instead of the color words used in Experiment 1, participants were presented with French and English color associates. A complication with the matching task is that the semantic and response conflict accounts make directly opposing predictions for mismatching trials. The response conflict account predicts that All incongruent trials should be the fastest of the three “mismatching” trial types (as observed), whereas the semantic conflict account predicts that they should be the slowest. Note that the predictions of both the semantic and response conflict accounts for color associates are identical to their predictions for color words, already visualized in Figure 1. If both types of conflict exist, then a (larger) response conflict effect might be concealing a (relatively smaller) semantic conflict effect. Therefore, one way to “reveal” the true effect of semantic conflict (assuming there is one, of course) would be to eliminate response conflict. According to some researchers, color associates produce semantic conflict but not response conflict (e.g., Glaser & Glaser, 1989; Schmidt & Cheesman, 2005). If this logic is correct, semantic conflict might still be observed for color associates. Although probably smaller, semantic conflict might emerge because of strong conceptual links between color associates and their corresponding color words. For example, on a French Sound-color congruent trial (e.g., see “ciel,” French for sky, printed in green, and hear “vert,” French for green), the distracter “ciel,” associated with blue, should interfere little, if at all, with the task-relevant comparison (i.e., “green”-“green”), simply because it does not belong to the same semantic category as the spoken word.
Experiment 2 was therefore designed to further explore the role of semantic conflict, which may have been masked by response conflict in Experiment 1. Another question of interest concerns the distracter language. According to some models of bilingual memory, L2 words do not have strong direct access to semantics (Kroll & Stewart, 1994). Thus, while semantic conflict might be observed for L1 words, these models would predict no semantic conflict effect for L2 words.