1. Introduction
Over the past few decades, a vast amount of psycholinguistic and neurolinguistic research has been dedicated to exploring the temporal dynamics of orthographic, phonological, and semantic processing when bilinguals read, hear, or speak words. Notably, building on the connectionist framework, the bilingual interactive activation model (Dijkstra & van Heuven, 1998), its revised version, the bilingual interactive activation plus model (BIA+ model; Dijkstra & van Heuven, 2002), and the bilingual language interaction network for comprehension of speech model (BLINCS model; Shook & Marian, 2013) proposed highly interactive architectures of bilingual word processing. According to these theoretical frameworks, bottom-up parallel language activation and a higher-level inhibitory control (IC) mechanism govern the lexico-semantic organization of both languages when bilinguals read or hear one language.
The bimodal interactive activation model (BIAM; Diependaele et al., 2010; McClelland & Rumelhart, 1981) describes the temporal dynamics of both visual and auditory word processing, with automatic sublexical/lexical-orthographical or -phonological involvement during speech or print perception in monolingual contexts. Furthermore, as a bilingual extension of the BIAM, the influential BIA+ model initially depicted the bilingual lexico-semantic processing (i.e., orthography, phonology, and semantics) of visual word recognition. Recently, its generalizability to both visual and auditory bilingual word recognition has been observed, as studies have demonstrated that the highly interactive architecture of bilingual word processing is also valid for auditory word recognition (Blumenfeld & Marian, 2013; Frances et al., 2021; Villameriel et al., 2016). Thus, as an integrated version of several models, the BIA+ model proposes a hierarchical architecture involving two subsystems, from word identification to task schema. In the word identification subsystem, letter cluster input leads to on-line co-activation of many lexical-orthographical and -phonological candidates from different languages, attesting to bottom-up ‘language non-selective lexical access’. The task schema subsystem, in turn, is strongly influenced by Green’s (1998) IC model, which emphasizes the impact of IC on bilinguals’ language production and holds that bilinguals process the lexico-semantic system differently in different task contexts. Thus, the top-down IC mechanism is also initiated to handle the competition and selection of lexical-orthographical and -phonological candidates between two languages.
1.1. Behavioral and electrophysiological evidence for cross-linguistic activation
Indeed, under the framework of the BIA+ model, behavioral studies with cognate word pairs (Carrasco-Ortiz et al., 2021), interlingual homophones (Brysbaert et al., 1999; Duyck et al., 2004; Sauval et al., 2017), or interlingual homographs (Bijeljac-Babic et al., 1997) have investigated the effects of cross-linguistic phonological and orthographic overlap on bilinguals’ lexical processing. For bilinguals with different writing systems, such as Greek and Spanish (Dimitropoulou et al., 2011), Russian and English (Jouravlev et al., 2014), or Chinese and English (Zhou et al., 2010), researchers have also provided direct evidence for cross-linguistic interaction at multiple lexical levels. For example, reaction time (RT) is reduced in Greek–Spanish bilinguals during the recognition of target-language words when these are briefly preceded by prime words of the non-target language that are only phonologically related, while adding orthographic similarity to the prime-target word pairs eliminates this facilitatory priming effect (Dimitropoulou et al., 2011). The authors argue that the cross-linguistic lexical competition at the orthographic level offsets the bottom-up priming effect. The findings fit well with the theory of the BIA+ model, in which bottom-up parallel language activation and lexical competition between languages coexist.
Moreover, to specify the nature of lexical processing (i.e., orthography and phonology) across modalities, a series of studies with the Visual World Paradigm (Berghoff et al., 2021; Marian et al., 2008; Shook & Marian, 2019; Veivo et al., 2016; Weber & Cutler, 2004) have revealed an impressive amount of interactivity between lexicons in a bottom-up manner for same-script bilinguals. Converging results were also reported in bilinguals who use different scripts (Giezen et al., 2015; Mishra & Singh, 2014, 2016; Shook & Marian, 2012). For instance, in an investigation of cross-linguistic lexical activation of auditory words in Hindi–English bilinguals, Mishra and Singh (2014, 2016) instructed the participants to look at an array of L1/L2 words while simultaneously being presented with an L2/L1 spoken word. The critical distractors were the phonological cohorts of translation equivalents of the auditory targets. Eye-tracking results revealed that, relative to unrelated distractors, participants more quickly oriented their visual attention toward those critical distractors. As predicted by the above-mentioned interactive models, these findings indicated that the phonology of the non-target lexicon is activated while bilinguals process one of their languages.
With respect to cross-linguistic orthographic coding, complementary studies also revealed that L1 orthography and grapheme-to-phoneme mapping could influence L2 spoken word processing in a modality-independent way (Escudero & Wanrooij, 2010; Marian et al., 2021; Qu et al., 2018; Veivo et al., 2018).
Time-locked event-related potentials (ERPs) provide insights into the temporal dynamics that underlie cross-linguistic interactions in bilinguals. One well-established ERP component is the N400 in the 300–500-ms time-window, which is highly correlated with lexico-semantic processing during word or sentence reading, speech hearing, and picture identification (Kutas & Hillyard, 1984; for a review, see Kutas & Federmeier, 2011). It has been demonstrated that a decreased amplitude in the N400 component reflects the ease of retrieving lexico-semantic representations of target words in bilingualism (for a review, see Jankowiak & Rataj, 2017). For example, the reduced negativity of the N400 component in response to cognate word pairs has been interpreted as facilitation due to greater ease in mapping form onto meaning in bilingual word recognition (Midgley et al., 2011; Peeters et al., 2013).
Additionally, the modulation of the N400 amplitude can be the neural signature of on-line lexical integration and co-activation in bilingual word processing (Chen et al., 2017) and can reveal unconscious native lexical access during non-native word recognition (Thierry & Wu, 2007; Wu & Thierry, 2010). For instance, using an implicit priming paradigm, Wu and Thierry (2010) asked Chinese–English bilinguals and English monolinguals to judge whether written or spoken English word pairs were related in meaning. The critical semantically unrelated word pairs randomly contained an orthographic (e.g., ‘accountant’ [会计/Kuai4Ji4/] – ‘conference’ [会议/Hui4Yi4/]) or phonological (e.g., ‘experience’ [经验/Jing1Yan4/] – ‘surprise’ [惊讶/Jing1Ya4/]) repetition in their Chinese translations. Even though behavioral results did not reveal any significant implicit priming effect, the attenuation of the N400 component on critical word pairs in Chinese–English bilinguals, but not in English monolinguals, revealed an involuntary activation of cross-linguistic phonological rather than orthographic representation in both visual and auditory modalities. In a similar implicit phonological priming task with American Sign Language–English bilinguals, Meade et al. (2017) further demonstrated that the N400 component indexed bottom-up cross-linguistic phonological co-activation.
Electrophysiological studies with the translation recognition paradigm have identified another ERP component that is sensitive to the processing of word pairs that are lexically similar through translation (Guo et al., 2012; Ma et al., 2017; Moldovan et al., 2016). This component is designated as the late positive component (LPC) and temporally and spatially overlaps with another component, the P600. Unlike the N400 component, the LPC is thought to reflect more elaborate processes, such as monitoring in response to spelling, semantic, or syntactic violations (van de Meerendonk et al., 2011), semantic integration difficulty (Brouwer et al., 2012, 2017), or information reanalysis (Stites et al., 2016). In a critical study, Guo et al. (2012) instructed bilinguals to judge whether Chinese–English word pairs were correct translations. The critical non-translation word pairs were related in lexical form through translation (e.g., ‘bee’ [蜂/Feng1/] – ‘峰/Feng1/’ [peak]). In addition to the behavioral interference effects (slower RTs on critical word pairs than on matched controls), the ERP data revealed that the form-related neighbor condition significantly modulated the LPC (500–700 ms). Accordingly, the researchers explained that the LPC effect could be evidence of information reanalysis being implemented for later decision-making processing. Furthermore, compared with the N400 component, the LPC component is usually thought to reflect more explicit processing (Hoshino & Thierry, 2012; Müller et al., 2010).
For example, Hoshino and Thierry (2012) asked Spanish–English bilinguals to judge whether English word pairs were related in their meaning. Prime-target word pairs were related in their meaning in English (e.g., ‘apple’ – ‘pie’), in meaning through translation (e.g., ‘toe’ – ‘pie’; ‘pie’ means ‘foot’ in Spanish), or completely unrelated (e.g., ‘bed’ – ‘pie’). The N400 priming effects were observed when the prime was related to the target in meaning in both English and Spanish, suggesting an automatic semantic activation of both languages. However, the LPC priming effect was only observed for word pairs that were related in their meaning in English. In contrast, English monolinguals showed semantic priming effects in the N400 and LPC time-windows only when word pairs were related in their meaning in English. The researchers explained that although both semantic representations of the interlingual homographs were implicitly activated, as reflected by the attenuation of the N400 component, only the meaning of the target language reached a later explicit stage, as reflected by more positive LPC amplitudes.
1.2. The present study
The studies reviewed thus far have found robust evidence of non-selective cross-linguistic lexical activation in a modality-independent way, even in a purely non-native word context. To the best of our knowledge, however, no study has investigated the electrophysiological dynamics of cross-linguistic orthographic and phonological activation in the processing of auditory L2 words in different-script Chinese–English bilinguals. English is an alphabetic language in which phonemes and graphemes correspond to a large extent. Chinese, by contrast, uses a logographic script that allows a clear dissociation between orthography and phonology. These properties make it possible to document both distinct and interactive effects of orthography and phonology between written L1 and spoken L2 words. As such, it would be valuable to investigate how the assumptions of the BIA+ model can be generalized to different modalities and different-script language systems.
Overall, the present study aims to explore the cross-linguistic orthographic and phonological interaction with time-locked ERP measures. More specifically, to investigate the temporal dynamics of cross-linguistic orthographic and phonological activation in the processing of auditory L2 words, we employed a translation recognition task in which auditory L2 words in English were always preceded by visual L1 words in Chinese. This task is typically used because it requires participants to activate the form-meaning systems of two languages simultaneously and thus taps into the lexical links between the two languages (de Groot, 1992). Previous studies have used the masked priming (Jiang, 1999; Wen & van Heuven, 2018; Xia & Andrews, 2015; Zhang et al., 2011) or implicit priming paradigm (Thierry & Wu, 2007; Wu & Thierry, 2010) to explore bilingual lexical interactions in Chinese–English bilinguals. These priming paradigms, however, are mainly suited to exploring bottom-up or unconscious cross-linguistic lexical priming when word recognition is performed in a purely non-native word context, and they therefore do not require bilinguals to activate form-meaning mappings to the same extent as translation recognition does. Additionally, we used monosyllabic Chinese words as written words, as featured in a behavioral study by Ma and Ai (2018). Thus, there were four different conditions in the task.
The written words (Chinese, L1) could be translation equivalents (e.g., ‘窗/Chuang1/’ – ‘window’), orthographically related (orthographic translation neighbor, e.g., ‘特/Te4/’ [particular] – ‘poem’ [诗/Shi1/]), phonologically related (phonological translation neighbor, e.g., ‘脚/Jiao3/’ [foot] – ‘angle’ [角/Jiao3/]), or completely unrelated (e.g., ‘观/Guan1/’ [observe] – ‘poison’ [毒/Du2/]) to the translations of the English spoken words.
In general, studies with translation recognition tasks have revealed significant interference effects (Guo et al., 2012; Ma & Ai, 2018; Sunderman & Priya, 2012). As mentioned earlier, the LPC component typically reflects top-down decision-making processing in the postlexical stage. Since an increased positivity in LPC amplitude has been reported to be associated with slower responses (Guo et al., 2012; Ma et al., 2017; Wen et al., 2018), we predict increased positivity in LPC amplitude for word pairs with through-translation orthographic and phonological overlap.
Additionally, the processing of semantically unrelated English word pairs has been found to be modulated by the (implicit) lexical overlap between the Chinese translations of the English words (Thierry & Wu, 2007; Wu & Thierry, 2010), which suggests connective links between the lexicons. As suggested by the BIA+ model, the activation across the lexicons also spreads in a bottom-up manner, even for different-script Chinese–English bilinguals. Since less negativity in the N400 component has been demonstrated to be associated with on-line lexical retrieval (for a review, see Jankowiak & Rataj, 2017), we predict a cross-linguistic on-line lexical extraction when listening to English spoken words in the visuo-auditory language context, as reflected by the attenuation of the N400 component on the critical word pairs.
2. Methods
2.1. Participants
Thirty-two right-handed college students (19–25 years old, mean age = 21.8, SD = 2.1, 11 males) from Chongqing University participated in this study. They reported no psychiatric or neurological diseases or hearing problems and had normal or corrected-to-normal visual acuity. Participants were paid monetary compensation and gave written informed consent before participation, following the ethics protocol of the Institutional Research Ethics Committee of Chongqing University.
All participants were native speakers of Mandarin Chinese (L1) and started learning English (L2) at approximately 10 years old in a homogeneous academic context. Their proficiency in L2 was verified as follows. First, their mean College English Test Band 4 (CET 4) score was 549 (SD = 37; the highest possible score is 710, and the pass mark is 425). According to the CET 4 scoring criteria, 510–590 indicates upper intermediate proficiency. Second, all participants completed an English version of the LexTALE test (Lemhöfer & Broersma, 2012; downloaded from http://www.lextale.com). The LexTALE test is an off-line lexical decision task in English that consists of 60 trials, including 40 words and 20 nonwords. The participant’s task is to decide whether each item is an existing English word or not. LexTALE is a quick and valid English vocabulary test that substantially correlates with measures of general English proficiency. The mean LexTALE score was 62.3 (SD = 12.9; the highest score is 100, below 59 indicates lower intermediate proficiency, and 60–80 indicates upper intermediate proficiency). Finally, participants self-rated their English proficiency on a 5-point Likert scale, with 1 being not fluent and 5 being very fluent. The mean self-reported scores for English listening, speaking, reading, and writing abilities were 3.0 (SD = 0.8), 2.8 (SD = 0.7), 3.7 (SD = 0.6), and 3.3 (SD = 0.6), respectively. Taken together, according to the subjective and objective evaluation criteria, the participants were at an upper intermediate level in their English proficiency. The descriptive statistics (mean and SD) of the participants’ self-assessed English proficiency, CET 4 score, and LexTALE test score are summarized in Table 1.
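As an illustration of how a LexTALE score on the 0–100 scale mentioned above can be obtained, the sketch below uses the averaged percentage-correct scoring commonly given for LexTALE (the mean of the percentage of words and the percentage of nonwords answered correctly). The text does not state which scoring rule was applied here, so the formula, the function names, and the handling of the band boundaries are assumptions for illustration only.

```python
def lextale_score(words_correct, nonwords_correct, n_words=40, n_nonwords=20):
    """Averaged percentage correct over word and nonword items (assumed scoring rule)."""
    if not (0 <= words_correct <= n_words and 0 <= nonwords_correct <= n_nonwords):
        raise ValueError("correct counts must lie within the number of items")
    pct_words = 100.0 * words_correct / n_words
    pct_nonwords = 100.0 * nonwords_correct / n_nonwords
    return (pct_words + pct_nonwords) / 2.0


def proficiency_band(score):
    """Map a score to the bands quoted in the text (boundary handling is an assumption)."""
    if score < 60:
        return "lower intermediate"
    if score <= 80:
        return "upper intermediate"
    return "above the bands described here"


# Hypothetical participant: 30 of 40 words and 10 of 20 nonwords correct.
example = lextale_score(30, 10)
print(example, proficiency_band(example))
```

Averaging the two percentages, rather than pooling all 60 items, keeps a participant who answers "yes" to every item from scoring above 50.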
2.2. Materials
We first generated a database of English target words from the SUBTLEX-US corpus (Brysbaert & New, 2009), which were randomly subdivided into four lists of words. Ten postgraduate students who did not take part in the formal experiment were asked to translate the English words into one-character translation equivalents whenever possible (e.g., ‘window’ can be translated as the one-character Chinese word ‘窗/Chuang1/’). Translation equivalents with the highest probability of occurrence were selected. Then, the Chinese visual words of the four stimulus conditions were generated according to their lexico-semantic similarity, and each word was displayed only once to avoid repetition effects. A total of 240 word pairs were selected as stimuli:
1. Sixty word pairs of Chinese–English translation equivalents (e.g., ‘窗/Chuang1/’ – ‘window’);
2. Sixty word pairs of Chinese–English non-translation equivalents in which Chinese words are only orthographically related to the correct translations of the English words (orthographic translation neighbor, e.g., ‘特/Te4/’ [particular] – ‘poem’ [诗/Shi1/], which share the same radical ‘寺’);
3. Sixty word pairs of Chinese–English non-translation equivalents in which Chinese words are only phonologically related to the correct translations of the English words (phonological translation neighbor, e.g., ‘脚/Jiao3/’ [foot] – ‘angle’ [角/Jiao3/], which share the same pronunciation ‘/Jiao3/’);
4. Sixty word pairs of Chinese–English non-translation equivalents in which Chinese words are orthographically, phonologically, and semantically unrelated to the correct translations of the English words (e.g., ‘观/Guan1/’ [observe] – ‘poison’ [毒/Du2/]).
Five postgraduate students from Chongqing University who did not take part in the subsequent experiment rated the translation equivalency of each word pair in each condition on a scale from 1 (non-translation equivalent) to 5 (translation equivalent). The selected translation and non-translation equivalents received mean scores of 4.9 (SD = 0.4) and 1 (SD = 0), respectively. They also rated the semantic relatedness and lexical similarity through translation of the three non-translation equivalent conditions (orthographic translation neighbor, phonological translation neighbor, and unrelated control) on a scale from 1 (semantically unrelated or similar in lexicon through translation) to 5 (semantically related or not similar in lexicon through translation). Paired-sample t-tests showed no significant differences among the three non-translation equivalent conditions in semantic relatedness (all ts < 1, p > 0.99), and the lexical similarity scores of the orthographic and phonological translation neighbor conditions were significantly lower than those of the unrelated control (both ts > 62.8, p < 0.001).
The auditory English stimuli were generated by the voice synthesizer software Balabolka using a female voice with a native American English accent (16 kHz). Each auditory word stimulus was edited using Audacity software to obtain a precise onset time for ERP synchronization and was stored in a separate MP3 file. Stimuli varied in length from 352 to 857 ms, with an average length of 576 ms. A one-way analysis of variance (ANOVA) revealed no significant differences in audio length among the four stimulus-type conditions (F(3, 177) = 0.09, p = 0.97, ηp² = 0.002).
All English words chosen from the SUBTLEX-US corpus (Brysbaert & New, 2009) were highly common words with string lengths ranging from 3 to 8 letters, and their frequencies were taken as the log10W values provided by the corpus (log10W is the base-10 logarithm of the number of times a word has been observed in the corpus). Correspondingly, all Chinese words were one-character words, with log10W frequencies taken from the SUBTLEX-CH-WF corpus (Cai & Brysbaert, 2010). Furthermore, there were no significant differences in the numbers of strokes, string lengths, or mean frequencies among the four stimulus conditions (all Fs < 0.87, p > 0.45). The descriptive statistics (mean and SD) of the number of strokes, string lengths, frequency, audio length, L1–L2 semantic relatedness, and L1–L2 lexical similarity across the four stimulus types are summarized in Table 2.
2.3. Procedure
After electrode preparation, participants were tested individually in a soundproof and electrically shielded room, seated approximately 60 cm in front of a 19-inch monitor and two loudspeakers. The session started with 16 practice trials to reduce the number of missing cells in the formal experiment. The event sequence for each trial was as follows (see Fig. 1): a black fixation cross appeared at the center of the screen for a random duration between 1,000 and 1,500 ms. This was followed by a Chinese word in 40-point black Times New Roman font, displayed for 250 ms, and then by a 500-ms blank screen. Next, an auditory English word was presented from the loudspeakers while a red fixation cross was displayed for 1,000 ms. Finally, a blank screen was presented for up to a maximum of 3,000 ms. Participants were instructed to judge whether the auditory English word was the correct translation of the preceding Chinese word by pressing the ‘F’ or ‘J’ key as accurately and quickly as possible (counterbalanced across participants) and to press the space bar when they were not sure about the translation. After the translation recognition response, an eye-blink symbol ‘(− −)’ was presented for 2,000 ms, during which participants were allowed to blink. Behavioral data (RT and accuracy) were collected simultaneously with the electroencephalogram (EEG) data.
The 240 experimental trials were randomly divided into four blocks. Self-paced rest periods occurred at 60-trial intervals. Stimulus presentation was controlled by E-Prime 3.0 software (Psychology Software Tools, Pittsburgh, PA). The entire experiment lasted for about 35 min.
2.4. Behavioral data processing and analyses
Accuracy, a binary dependent variable (0 or 1), was analyzed using a mixed logit model (Jaeger, 2008). This model was fitted using the glmer function of the lme4 package (Bates et al., 2015) in R (R Core Team, 2019). The fixed effect corresponded to the experimental manipulation of stimulus type (translation, orthography, phonology, and control). The random effects structure of the final fitted model consisted of by-participant random intercepts and slopes and by-item random intercepts. The model did not contain by-item random slopes because stimulus type did not vary within items (Winter, 2019).
RTs were measured from the onset of the auditory English word on each trial. Trials with errors (9.97% of the data) were not included in further analyses. Trials with RTs more than 2.5 standard deviations (SD) above or below the grand mean RT (2.62% of the data) were discarded to reduce the influence of outliers. The RT data were first log-transformed to reduce skewness and then entered as the dependent variable into a linear mixed-effects regression model (Baayen et al., 2008). The model was fitted using the lmer function in the lme4 package, including the fixed effect of stimulus type, by-participant random intercepts and slopes, and by-item random intercepts.
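The RT cleaning steps described above (dropping error trials, trimming at 2.5 SD around the grand mean, and log-transforming) can be sketched as follows. This is a minimal illustration rather than the authors’ script: the function name is hypothetical, and it assumes that the grand mean and the sample SD are computed over all correct trials and that the natural logarithm is used, none of which the text specifies.

```python
import math
import statistics


def trim_and_log(rts_ms, correct, n_sd=2.5):
    """Return log-RTs of correct trials lying within n_sd SDs of the grand mean."""
    # Step 1: keep only correctly answered trials.
    kept = [rt for rt, ok in zip(rts_ms, correct) if ok]
    # Step 2: grand mean and sample SD over the remaining trials (assumption).
    mean = statistics.mean(kept)
    sd = statistics.stdev(kept)
    lower, upper = mean - n_sd * sd, mean + n_sd * sd
    trimmed = [rt for rt in kept if lower <= rt <= upper]
    # Step 3: natural-log transform to reduce skewness (log base is an assumption).
    return [math.log(rt) for rt in trimmed]


# Made-up data: eleven ordinary RTs, one extreme RT, and one error trial.
rts = [780, 800, 820, 790, 810, 805, 795, 815, 785, 800, 800, 5000, 700]
acc = [True] * 12 + [False]
log_rts = trim_and_log(rts, acc)
print(len(log_rts))
```

In this made-up example, the error trial and the 5,000-ms trial are removed, leaving eleven log-transformed RTs for the mixed-effects model.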
2.5. EEG/ERP recording and analyses
During the cross-modal translation recognition task, an actiChamp amplifier and 64 Ag/AgCl active electrodes placed according to the International 10–20 system (Brain Products GmbH, Gilching, Germany) were used for EEG recording at a sampling rate of 1,000 Hz. Cz and Fpz were set as the on-line reference and ground electrode, respectively. The vertical electrooculogram was obtained from an electrode placed 1 cm above the left eye. All electrode impedances were kept below 5 kΩ, and the on-line frequency range was 0.01–70 Hz.
The EEG data were first pre-processed individually with BrainVision Analyzer 2.1 (Brain Products GmbH) as follows. Where necessary, bad channels detected by visual inspection were topographically interpolated with spherical splines. The data were re-referenced off-line to the average activity of the two mastoids (i.e., TP9 and TP10) and filtered with a band-pass filter of 0.3–30 Hz (24 dB/octave slope). Independent component analysis (ICA) was employed to correct artifacts such as eye movements, eye blinks, and muscle noise (Jung et al., 2000), with the data decomposed into 62 independent components. In all datasets, components related to eye blinks were the most frequent cause of rejection according to their spatial and temporal features, and approximately 1–4 independent components were identified and removed by visual inspection. For trials with correct responses, continuous data were segmented time-locked to the onset of the spoken L2 word, beginning 200 ms before word onset and lasting 1,000 ms. Baseline correction was performed using the 200-ms period preceding spoken word onset. All EEG segments with amplitudes beyond ±75 μV were automatically marked as bad and excluded from the grand average (3.45% of the data were excluded). Separate ERPs were formed for the four experimental conditions. To ensure a good signal-to-noise ratio, participants who did not have a sufficient number of artifact-free trials (more than 30) in every condition after EOG correction and artifact rejection were excluded. Four participants were excluded on this basis.
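The last two segment-level steps, baseline correction and amplitude-based rejection, can be illustrated with a single-channel sketch. The pre-processing itself was done in BrainVision Analyzer, so the code below is not that software’s procedure; the function names are hypothetical, and it assumes a segment that starts 200 ms before word onset, a 1,000-Hz sampling rate, and rejection applied to the baseline-corrected signal.

```python
def baseline_correct(epoch_uv, srate_hz=1000, baseline_ms=200):
    """Subtract the mean of the pre-onset baseline from every sample of one channel."""
    n_base = int(baseline_ms * srate_hz / 1000)
    baseline_mean = sum(epoch_uv[:n_base]) / n_base
    return [v - baseline_mean for v in epoch_uv]


def exceeds_threshold(epoch_uv, threshold_uv=75.0):
    """Flag a segment whose absolute amplitude goes beyond the rejection threshold."""
    return any(abs(v) > threshold_uv for v in epoch_uv)


# Made-up 1,000-sample segment (-200 to 800 ms): a 10-uV offset, then a small deflection.
clean = [10.0] * 200 + [12.0] * 800
corrected = baseline_correct(clean)
print(corrected[0], corrected[-1], exceeds_threshold(corrected))

# Made-up segment containing a 100-uV excursion, which should be marked as bad.
noisy = [0.0] * 500 + [100.0] * 10 + [0.0] * 490
print(exceeds_threshold(baseline_correct(noisy)))
```

In a real dataset the same check would be run on every channel of a segment, and the segment would be dropped if any channel exceeded the threshold.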
As shown in Fig. 3, two expected components were analyzed: an N400 (300–500 ms) that was maximal at centro-parietal sites and a subsequent LPC (500–800 ms) that was maximal at fronto-central sites. Twelve electrode sites (F3/Fz/F4, C3/Cz/C4, P3/Pz/P4, and O1/Oz/O2), distributed across four levels of region (Frontal, Central, Parietal, and Occipital) and three levels of laterality (Left, Midline, and Right), were used to analyze the ERP data. Two repeated-measures ANOVAs with the three within-subject factors of stimulus type, region, and laterality were performed separately on the N400 and LPC amplitudes. Greenhouse–Geisser corrections (Greenhouse & Geisser, 1959) were applied whenever Mauchly’s test indicated a violation of sphericity, with conservative degrees of freedom reported. The effect size partial eta-squared (ηp²) was also reported. For significant interaction effects involving the factor of stimulus type, follow-up simple effect analyses tested the effect of stimulus type. To protect against Type I error, further pairwise comparisons, with Bonferroni correction, were restricted to the comparisons between each translation-related condition (i.e., translation equivalent, orthographic translation neighbor, and phonological translation neighbor) and the unrelated control.
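The per-window amplitude that enters such an ANOVA can be sketched as below. The text does not say whether mean or peak amplitude was used, so taking the mean over the window is an assumption, as are the function name, the half-open sample interval, and the segment layout (starting 200 ms before word onset at 1,000 Hz).

```python
def window_mean(epoch_uv, start_ms, end_ms, srate_hz=1000, pre_onset_ms=200):
    """Mean amplitude of one channel between start_ms and end_ms after word onset."""
    # Convert post-onset latencies to sample indices within the segment.
    i0 = int((start_ms + pre_onset_ms) * srate_hz / 1000)
    i1 = int((end_ms + pre_onset_ms) * srate_hz / 1000)
    segment = epoch_uv[i0:i1]
    return sum(segment) / len(segment)


# Made-up averaged waveform: flat until 300 ms, -4 uV up to 500 ms, +6 uV up to 800 ms.
erp = [0.0] * 500 + [-4.0] * 200 + [6.0] * 300
n400 = window_mean(erp, 300, 500)  # N400 window from the text
lpc = window_mean(erp, 500, 800)   # LPC window from the text
print(n400, lpc)
```

One such value per participant, condition, and electrode would then be arranged by stimulus type, region, and laterality for the repeated-measures analysis.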
3. Results
3.1. Behavioral results
Mean accuracies and RTs as a function of stimulus type are shown in Fig. 2. The fitted model revealed a significant main effect of stimulus type on accuracy (χ²(3) = 16.12, p < 0.001). Post-hoc tests with Tukey correction showed that the accuracy of the unrelated control condition was significantly higher than that of the translation equivalent condition (Estimate = 1.00, SE = 0.29, z-value = 3.45, p = 0.003) and that of the phonological translation neighbor condition (Estimate = 0.96, SE = 0.27, z-value = 3.55, p = 0.002). However, there was no significant difference between the unrelated control and the orthographic translation neighbor condition (Estimate = 0.62, SE = 0.28, z-value = 2.23, p = 0.12).
A likelihood ratio test showed a significant main effect of stimulus type on RT (χ²(3) = 53.51, p < 0.001). Post-hoc tests with Tukey correction showed that the participants responded faster to the translation equivalent condition than to the three non-translation equivalent conditions (all z-values > 6.64, p < 0.001). The crucial comparisons among the conditions that required a ‘no’ response, namely between each of the two translation neighbor conditions and the unrelated control, indicated significant interference effects (both z-values > 2.62, p < 0.05).
3.2. ERP results
Grand averaged ERP waveforms for the translation-related conditions are plotted against the unrelated control at the representative electrode sites Fz, Cz, Pz, and Oz in Fig. 3. The topographic distributions of the difference between the unrelated control and the translation-related conditions (unrelated control minus translation-related condition) in the time-windows of 300–500 and 500–800 ms are shown in Fig. 4. Details of the analyses are described below.
3.2.1. N400 (300–500 ms)
The main effect of stimulus type was significant (F(3, 81) = 7.48, p < 0.001, η2p = 0.22) and interacted with region (F(9, 243) = 3.42, p = 0.02, η2p = 0.11), but not with laterality (F(6, 162) = 1.36, p = 0.25, η2p = 0.05); the three-way interaction (F(18, 486) = 1.33, p = 0.23, η2p = 0.05) was not significant. Follow-up simple effect analyses revealed significant effects of stimulus type at the central–parietal–occipital regions (all Fs > 4.85, p < 0.004), but not at the frontal region (F(3, 81) = 0.52, p = 0.67, η2p = 0.02). Further pairwise comparisons showed that the translation equivalent condition elicited less negative N400 amplitudes than the unrelated control at the central–parietal–occipital regions (all ts > 3.85, p < 0.004). Importantly, the orthographic translation neighbor condition elicited less negative N400 amplitudes than the unrelated control at the parietal–occipital regions (both ts > 3.05, p < 0.03). No differences were found between the phonological translation neighbor condition and the unrelated control (all ts < 1, p > 0.99).
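As a reading aid, partial eta-squared can be recovered from an F statistic and its degrees of freedom with the standard identity η2p = (F × df_effect) / (F × df_effect + df_error). The sketch below applies it to the N400 values reported above; the function name `partial_eta_squared` is hypothetical, and this is a cross-check under that identity, not a description of the software the authors used.

```python
def partial_eta_squared(f_value, df_effect, df_error):
    """Partial eta-squared implied by an F statistic and its degrees of freedom."""
    ss_ratio = f_value * df_effect  # equals SS_effect / MS_error
    return ss_ratio / (ss_ratio + df_error)


# (label, F, df_effect, df_error) as reported for the N400 window.
n400_effects = [
    ("stimulus type", 7.48, 3, 81),
    ("stimulus type x region", 3.42, 9, 243),
    ("stimulus type x laterality", 1.36, 6, 162),
    ("three-way interaction", 1.33, 18, 486),
]

for label, f_value, df_effect, df_error in n400_effects:
    eta = partial_eta_squared(f_value, df_effect, df_error)
    print(f"{label}: {eta:.2f}")
```

Because a sphericity correction scales both degrees of freedom by the same factor, the value obtained this way does not depend on whether corrected or uncorrected degrees of freedom are entered.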
3.2.2. Late positive component (500–800 ms)
The main effect of stimulus type was significant (F(3, 81) = 17.33, p < 0.001, η2p = 0.39) and interacted with region (F(9, 243) = 3.05, p = 0.04, η2p = 0.10). However, the two-way interaction between stimulus type and laterality (F(6, 162) = 1.54, p = 0.21, η2p = 0.05) and the three-way interaction (F(18, 486) = 1.16, p = 0.33, η2p = 0.04) were not significant. Follow-up simple effect analyses revealed significant effects of stimulus type at all regions (all Fs > 10.43, p < 0.001). Further pairwise comparisons showed that the translation equivalent condition elicited more positive LPC amplitudes than the unrelated control at the central–parietal–occipital regions (all ts > 3.60, p < 0.008). Importantly, both the orthographic (t(27) = 2.78, p = 0.058) and phonological (t(27) = 2.84, p = 0.051) translation neighbor conditions elicited marginally less positive LPC amplitudes than the unrelated control at the frontal region.
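Bonferroni correction, as applied to the pairwise comparisons above, multiplies each raw p-value by the number of comparisons in the family and caps the result at 1. The sketch below uses made-up raw p-values and an assumed family of three comparisons; this section reports only corrected values and does not state the family size, and `bonferroni` is a hypothetical helper.

```python
def bonferroni(p_values, n_comparisons=None):
    """Bonferroni-adjusted p-values: min(1, p * m) for a family of m tests."""
    m = len(p_values) if n_comparisons is None else n_comparisons
    if m < len(p_values):
        raise ValueError("n_comparisons cannot be smaller than the number of p-values")
    return [min(1.0, p * m) for p in p_values]


# Illustrative raw p-values only (not taken from the study).
raw = [0.004, 0.019, 0.40]
adjusted = bonferroni(raw)
for p_raw, p_adj in zip(raw, adjusted):
    print(f"raw = {p_raw:.3f}, adjusted = {p_adj:.3f}")
```

With this adjustment, a raw p-value just under 0.05 divided by the family size stays significant, while values slightly above that threshold end up in the marginal range after correction.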
4. Discussion
The present study investigated the roles of cross-linguistic orthography and phonology during L2 auditory word recognition in a cross-modal situation. In the critical conditions, written L1 words and spoken L2 words were orthographically or phonologically related through translation (i.e., orthographic and phonological translation neighbors). Compared with the unrelated control, behavioral results revealed that participants responded more slowly in the critical conditions (see Fig. 2). More importantly, both orthographic and phonological translation neighbor conditions elicited marginally less positive LPC amplitudes than the unrelated control condition at the frontal region. Furthermore, only the orthographic translation neighbor condition elicited less negative N400 amplitudes than the unrelated control condition at the parietal–occipital regions (see Figs. 3 and 4).
With respect to behavioral results, in line with other behavioral studies using unimodal translation recognition tasks (Ma & Ai, Reference Ma and Ai2018; Sunderman & Priya, Reference Sunderman and Priya2012), both cross-linguistic orthographic and phonological translation neighbors produced interference effects on identification performance, as reflected by slower response times, indicating that conflict and competition between languages arise at the decision stage. More specifically, when presented with Chinese–English non-translation equivalents that were orthographically or phonologically related through translation, participants may need more effort to re-check whether the pairs were correct translations to avoid incorrect responses. Furthermore, the dissociation between our findings and Wu and Thierry’s (Reference Wu and Thierry2010) report (no behavioral implicit priming effects) may be explained by the different experimental tasks. In contrast to the implicit priming paradigm, the translation recognition task instructed the participants to explicitly activate and compare the form-meaning mappings of both languages. Note that, due to the higher percentage of non-translation word pairs, we found a typical speed-accuracy trade-off in which participants were faster but less accurate in responding to translation equivalents than to non-translation equivalents.
Regarding the ERP data, the results showed expected and widely distributed translation priming, as reflected by less negative or more positive ERP waveforms in the N400 and LPC time-windows (see Figs. 3A and 4A). It is very likely that the typical N400 and its carry-over, LPC, reflected the lexico-semantic activation and integration between words. More importantly, as expected, orthographic translation neighbor elicited less negative N400 amplitudes than unrelated controls at the parietal–occipital regions (see Figs. 3B and 4B). The result suggested a cross-linguistic lexical retrieval in the processing of spoken English words. However, the direction of LPC effects is opposite to the hypothesis. Both orthographic and phonological translation neighbor conditions elicited marginally less positive LPC amplitudes than unrelated controls in the frontal region (see Fig. 3B,C). These findings extended previous findings that cross-linguistic orthographic and phonological information is non-selectively activated for two languages that do not share script and modality, and the cross-linguistic orthographic and phonological activations have different temporal dynamics.
Typically, the N400 component reflects semantic processing and is sensitive to the ease of accessing features associated with the lexical codes (for a review, see Kutas & Federmeier, Reference Kutas and Federmeier2011). Under this framework, any influence boosting lexical access will lead to reduced N400 amplitudes. Thus, the orthographic N400 effect suggested that English auditory words activated the corresponding Chinese orthographic codes via translation, and that this activation was facilitated by the preceding orthographically similar written words. Moreover, the N400 has also been demonstrated to index bottom-up lexical processing (Meade et al., Reference Meade, Midgley, Sevcikova Sehyr, Holcomb and Emmorey2017) and to be less sensitive to decision-related factors that influence behavioral performance (Holcomb et al., Reference Holcomb, Grainger and O’Rourke2002). In the present study, the orthographic effect was obtained at approximately 300–500 ms, during the unfolding of the English spoken words, whose average duration was approximately 576 ms. Accordingly, such a rapid sequence of spreading activation may demonstrate a dynamic process of spoken word recognition, in which cross-linguistic orthography is activated on-line as the auditory stimuli unfold. Moreover, such activation is not strictly a postlexical process.
The temporal dynamics of the effect are compatible with previous studies reporting early and automatic effects of orthography on spoken word recognition in a monolingual context. For instance, electrophysiological studies of the orthographic impact on spoken word recognition have revealed that the earliest effect emerged at around 320 ms post-target onset (Perre & Ziegler, Reference Perre and Ziegler2008). Eye-tracking studies revealed that orthographic representations were rapidly activated and used in mapping spoken words onto potential written words (Salverda & Tanenhaus, Reference Salverda and Tanenhaus2010). As predicted by the BIAM (Diependaele et al., Reference Diependaele, Ziegler and Grainger2010; McClelland & Rumelhart, Reference McClelland and Rumelhart1981), the links between phonology and orthography allow a continuous mapping of speech input onto orthographic representations as speech unfolds. As a bilingual extension of the BIAM, the BIA+ model further emphasizes the lexical links across languages. Bilinguals activate phonological and orthographic representations of both languages during the processing of words in either language. For example, eye-tracking studies with the Visual World Paradigm also revealed that hearing L2 spoken words activated L1 orthographic representations and grapheme-to-phoneme mapping at the early stages of processing (Marian et al., Reference Marian, Bartolotti, Daniel and Hayakawa2021). The current results extend previous findings by showing that different-script Chinese–English bilinguals activate cross-linguistic orthographic information in parallel while listening to spoken words.
Contrary to the hypothesized polarity, both critical conditions elicited marginally less positive LPC responses in the frontal region. As mentioned earlier, the LPC is often observed following the N400 component during word comprehension and is sensitive to top-down decision-making processes in the postlexical stage. An increased positivity in LPC amplitude has been reported to be associated with longer RTs (Guo et al., Reference Guo, Misra, Tam and Kroll2012; Ma et al., Reference Ma, Chen, Guo and Kroll2017; Wen et al., Reference Wen, Filik and van Heuven2018). For example, Guo et al. (Reference Guo, Misra, Tam and Kroll2012) found that critical non-translation word pairs that were related in lexical form through translation generated a behavioral interference effect and a larger LPC from the frontal to parietal regions. The reason for the opposite polarity between our findings and Guo et al.’s (Reference Guo, Misra, Tam and Kroll2012) report is currently unknown, but the less positive amplitudes and the frontal topographical distribution suggest that the effect could be related to executive control functions, such as interference suppression, conflict monitoring, or inhibition, because those functions have mainly been found to involve the frontal lobes (for a review, see Alvarez & Emory, Reference Alvarez and Emory2006). One possible explanation is therefore that the less positive LPC amplitude for the orthographically or phonologically related word pairs reflects additional executive control involved in processing these word pairs. Specifically, when presented with Chinese–English word pairs that were orthographically or phonologically related through translation, participants activated both orthographic and phonological representations of Chinese translation equivalents in the processing of spoken English words, resulting in cross-linguistic competition and interference at the lexical level. To suppress the interference from the preceding visual Chinese words, they may have needed more executive control to avoid incorrect responses. However, as argued by Guo et al. (Reference Guo, Misra, Tam and Kroll2012), it is important that future research tests these results further. Notably, two languages involving distinct perceptual systems usually do not suffer the same degree of lexical competition and interference as unimodal language processing (Emmorey et al., Reference Emmorey, Luk, Pyers and Bialystok2008; for a review, see Emmorey et al., Reference Emmorey, Giezen and Gollan2016). This might be the reason why the LPC effects were only marginally significant.
According to the BIA+ model regarding the temporal dynamics of lexical interaction (see Dijkstra & van Heuven, Reference Dijkstra and van Heuven2002), cross-linguistic lexical activations are more automatic and bottom-up at the early stages of word processing, but more susceptible to the top-down IC mechanisms at the later stages of comprehension. Under such a framework, in an eye-tracking study by Blumenfeld and Marian (Reference Blumenfeld and Marian2013) where English–Spanish bilinguals listened to L2 spoken words, cross-linguistic activation that occurred at the period of 300–500 ms was not impacted by executive control processes. However, cross-linguistic activation highly correlated with executive measures occurred at the period of 633–767 ms. This finding further supports our interpretation of LPC effects, which reflects the modulation of the top-down executive control mechanisms on cross-linguistic lexical competition and interference.
4.1. Theoretical implications
The connectionist BIA+ model (Dijkstra & van Heuven, Reference Dijkstra and van Heuven2002) accounts for an integrated word recognition system that is highly interactive. It suggested a cross-linguistic lexical activation in a bottom-up manner and that the top-down IC mechanism exerts an impact on each language node. Accordingly, it predicts both intra- and interlingual lexical bottom-up priming and competition effects and that top-down IC would deal with this competition. There is ample evidence that the lexical representations of both languages were activated in parallel and competed with each other, thereby leading to within- and between-language facilitatory priming or inhibitory interference effects (Dimitropoulou et al., Reference Dimitropoulou, Duñabeitia and Carreiras2011; Marian & Spivey, Reference Marian and Spivey2003; Midgley et al., Reference Midgley, Holcomb, vanHeuven and Grainger2008; Sunderman & Priya, Reference Sunderman and Priya2012; van Heuven et al., Reference van Heuven, Dijkstra and Grainger1998). For example, in a translation recognition task with Hindi–English bilinguals, Sunderman and Priya (Reference Sunderman and Priya2012) found interference for word pairs with orthographic and phonological overlap through translation, but facilitation for word pairs with purely phonological overlap. The authors argued that parallel cross-linguistic activation and top-down task schema, apparent even in maximally different-script bimodal bilinguals, such as Sign and Speech (Giezen et al., Reference Giezen, Blumenfeld, Shook, Marian and Emmorey2015; Giezen & Emmorey, Reference Giezen and Emmorey2016; Meade et al., Reference Meade, Midgley, Sevcikova Sehyr, Holcomb and Emmorey2017), resulted in differential patterns of the effects. 
Under such a framework, we believe that bottom-up parallel cross-linguistic activation as well as top-down IC led to the patterns discovered in our study because the cross-linguistic co-activation of spoken L2 words was primed and competed with L1 words at the lexical level, as reflected by the modulation of the N400 and LPC. More specifically, since the task required the participants to activate and compare the form-meaning systems of the two languages simultaneously, multiple lexical candidates that linked to the speech input became active in parallel during earlier stages of spoken word recognition. As a consequence, the cross-linguistic orthographic representations became active and were primed by the orthographic neighbors, leading to the earlier N400 effect. Subsequently, the multiple candidates compete with each other as well as with previous orthographic or phonological neighbors. Resolution of such competition and interference is thought to rely on an IC mechanism, leading to later LPC effects.
It is worth noting that the BIA+ model initially emphasized cross-linguistic lexical activation between two languages that share the same script system. It depicts bilingual visual word recognition, in which the two languages usually have cross-linguistic orthographic, phonological, and semantic overlap. Thus, a given input automatically provides bottom-up activation of both languages. However, owing to the lateral connections between lexicons established through abundant practice as bilinguals learn to read and write, the present results further demonstrated that differences in script system and modality do not block cross-linguistic lexical activation. On a theoretical level, the present study offered evidence for the generalizability of the BIA+ model to Chinese–English bilinguals in a cross-modal situation.
As an extension of the BIA+ model, the BLINCS model (Shook & Marian, Reference Shook and Marian2013) also accounts for cross-linguistic interaction across multiple levels of processing, especially for audiovisual integration during speech comprehension for different-script bilinguals. According to its time course of word recognition, early stages of word processing are characterized by parallel bottom-up activation of representations when bilinguals hear one language. A higher-level IC mechanism governs lexico-semantic organization of both languages at later stages. The results of the current study can also be explained under this model.
4.2. Limitations and future directions
However, a few limitations need to be addressed in future studies. First, the present study investigated the temporal dynamics and distinct roles of cross-linguistic orthographic and phonological activation in the processing of L2 words by using an explicit translation recognition task. In contrast to previous studies using masked priming (Jiang, Reference Jiang1999; Wen & van Heuven, Reference Wen and van Heuven2018; Xia & Andrews, Reference Xia and Andrews2015; Zhang et al., Reference Zhang, van Heuven and Conklin2011) or implicit priming paradigms (Thierry & Wu, Reference Thierry and Wu2007; Wu & Thierry, Reference Wu and Thierry2010), the current findings may highlight the impact of top-down processes of the task schema system on bilingual lexical competition. Future studies could further explore the bottom-up activation of orthography and phonology of both languages using a masked priming paradigm. Second, in comparison with the study by Ma and Ai (Reference Ma and Ai2018), which reported only significant orthographic interference effects for both less proficient and more proficient Chinese–English bilinguals in two L2–L1 unimodal translation tasks, the present behavioral results revealed both orthographic and phonological interference effects. The divergent findings may have been caused by several factors, such as English proficiency, types of stimuli, modality, and translation direction (L1–L2 vs. L2–L1). Future research needs to examine whether these factors affect the results.
Finally, we found a parietal–occipital distribution for the orthographic priming effect and a frontal distribution for the orthographic and phonological competition effects, despite the relatively low spatial resolution of ERP measures. Recent neuroimaging studies of Chinese processing demonstrated that both orthographic and phonological processing involved multiple brain areas, such as the left middle frontal gyrus, left superior parietal lobule, and left mid-fusiform gyrus (for a review, see Wu et al., Reference Wu, Ho and Chen2012). It is essential to further investigate the spatial dynamics of the cross-linguistic lexical effects, which will provide a more comprehensive picture of bilingual networks.
In conclusion, the present findings obtained with Chinese–English bilinguals demonstrated that cross-linguistic orthographic and phonological activation during auditory L2 word processing had different temporal dynamics, with both bottom-up parallel cross-linguistic activation and the top-down IC mechanism affecting the lexical organization of the two languages.
Data availability statement
The data and materials in the current study are available from the corresponding author on reasonable request, without undue reservation.
Conflicts of interest
The authors declare none.
Funding statement
This work was supported by the Graduate Education and Teaching Reform Research Project of Chongqing Municipality (Grant No. YJG223018), the Chongqing Social Science Planning Project (Grant No. 2021NDYB148), the Interdisciplinary Supervisor Team for Graduates Programs of Chongqing Municipal Education Commission (Grant No. ydstd1923), the Fundamental Research Funds for the Central Universities of China (Grant No. 2019CDJSK04PT26), and the Graduate Scientific Research and Innovation Foundation of Chongqing, China (Grant Nos. CYB20048 and CYB21047).