Introduction
The term “immersion education” was adopted in Canada during the 1960s to describe programs in which English-speaking children were taught through the medium of French (Cummins, 1998; Genesee, 2006). Nowadays, the term describes an approach to second language (L2) instruction in which the usual curricular activities are conducted in the L2. There are different types of language immersion programs, such as Content and Language Integrated Learning (CLIL), popular in most European countries (De Graaff et al., 2007), and Study Abroad Programs (Dewey, 2004).
Studies examining the effects of immersion education on linguistic performance largely agree that immersion education conducted in an L2-dominant linguistic environment (e.g., study abroad) significantly improves L2 speaking proficiency (Hernández, 2010; Linck et al., 2009; Llanes & Muñoz, 2009; Segalowitz et al., 2004; Serrano et al., 2011; Tanaka & Ellis, 2003). In contrast, findings for other linguistic domains have been inconsistent (for a review, see Pliatsikas & Chondrogianni, 2015). There is increasing evidence that L2 learners who received traditional classroom-based L2 instruction outperformed those receiving L2-immersion instruction in linguistic abilities such as grammatical and lexical abilities (Collentine, 2008; Parafita Couto et al., 2015), pronunciation of selected features of phonology (Segalowitz et al., 2004), and verbal learning (Dahl & Vulchanova, 2014). However, L2-immersion instruction was reported to correlate positively with performance on a lexical categorization task (Malt & Sloman, 2003), while years of classroom-based L2 training were negatively associated with lexical performance (Zinszer et al., 2014).
This inconsistency may be attributed to a range of variables such as self-confidence and attitudes (Serrano, 2011; White & Turner, 2005), L2 proficiency (Hernández, 2010; Tanaka & Ellis, 2003), length of L2 immersion (Malt & Sloman, 2003), age of earliest L2 exposure, code-switching patterns, classroom instruction (Zinszer et al., 2014), and focus of L2 instruction (Collentine, 2008). In addition, the manner in which learners acquire their L2 (e.g., intensive vs. regular learning style) has been shown to correlate highly with L2 linguistic performance: intensive L2 input and exposure predict better L2 competence (Collins et al., 1999; Freed et al., 2004; Serrano et al., 2011). However, little is known about whether these two pathways of language acquisition (i.e., L2 instruction vs. immersion) have different impacts on the cognitive effects associated with language learning.
As for the cognitive consequences of language immersion, most studies have examined children enrolled in immersion education, which provides a nativelike linguistic environment. Prior research has suggested a positive effect of L2 immersion on cognitive control. For example, Nicolay and Poncelet (2013) compared the cognitive performance of children enrolled in English immersion classes for 3 years with that of children enrolled in monolingual French-speaking classes. The results demonstrated that L2 immersion positively influenced attentional/executive control, such as alerting, auditory selective attention, divided attention, and mental flexibility, but not interference inhibition. Their follow-up study (Nicolay & Poncelet, 2015) used a longitudinal design, and the results were consistent with their previous findings. Similar benefits associated with language immersion have been reported for general cognitive abilities, such as intelligence (Woumans et al., 2016), verbal working memory, and word learning (Kaushanskaya et al., 2014). These cognitive benefits were correlated with the level of L2 proficiency and length of immersion (Bialystok & Barac, 2012).
However, other studies reported no cognitive effects of L2 immersion in children, suggesting that these effects might be confounded by language proficiency (Poarch & van Hell, 2012) or other types of experiences (Carlson & Meltzoff, 2008), such as musical experience (Janus et al., 2016), video-gaming experience (Vulchanova et al., 2015), physical activities (e.g., martial arts, training in aerobics, and yoga), and games (Diamond, 2012; Diamond & Lee, 2011). Simonis et al. (2020) found that unmatched background variables between groups (i.e., age, socioeconomic status [SES], and intelligence) could lead to a null result when cognitive differences are assessed through an across-group comparison. In addition, these confounding variables are complex and interact with each other. For example, a study with Chinese-English bilinguals showed that L2 proficiency along with SES and general intelligence were strong predictors of performance on conflict monitoring (Xie & Pisano, 2018). Given that the bilingual experience itself is fluid, complex, and dynamic (Deluca et al., 2019; Luk & Bialystok, 2013), some experience-based factors (e.g., linguistic environment of L2 acquisition, L2 exposure, and L2 proficiency) might lead to different linguistic and cognitive consequences.
Therefore, a longitudinal design has been strongly recommended for assessing the cognitive effects associated with language immersion (Nicolay & Poncelet, 2013, 2015; Simonis et al., 2020; Woumans et al., 2016), because it allows experience-based factors to be correlated with immersion-induced cognitive changes.
Study abroad, a specific immersion education program, provides a naturalistic linguistic environment, which allows L2 learners to have more social interaction in the L2 and dramatically enhances L2 input in terms of quantity and quality (DeKeyser, 2007). The Adaptive Control Hypothesis (ACH; Green & Abutalebi, 2013) has been proposed to describe how cognitive control processes adapt to linguistic contexts in which individuals engage in different modes of language interaction (i.e., single language, dual language, and dense code-switching). For example, in a single language context, the cognitive demands on interference control (i.e., conflict monitoring and interference suppression) are significantly increased in L2 learners, who must select the target language while inhibiting the nontarget language. Specifically, in a single-language, L2-dominant linguistic environment, L2 learners are hypothesized to rely more on interference control to adjust to the linguistic environment. Empirical evidence has also shown that the linguistic environment affects both monolinguals and bilinguals. For instance, monolinguals living in a multilingual environment, compared with those living in a monolingual environment, showed enhanced brain receptivity to learning a new language (Bice & Kroll, 2019). Another study found changes in the structure of a dynamic system (i.e., the cerebellum, forceps minor, caudate, and hippocampus) that reflected an overall adaptation within the language control networks in adult sequential bilinguals who did not receive any linguistic learning or training while living in the L2-dominant environment (DeLuca et al., 2019).
These observations suggest that the linguistic context (i.e., an L2-dominant linguistic environment) plays an important modulating role in individuals’ language learning and cognitive processes in relation to language usage and control.
Given the limited research on the cognitive consequences of different pathways of L2 acquisition (i.e., L2-instruction vs. L2-immersion) in young adults within an L2-dominant linguistic environment, the current study investigates these cognitive effects while considering the theoretical and methodological aspects of previous research. Specifically, we aim to explore (1) whether different pathways of language acquisition affect cognitive control differently, by comparing performance on cognitive tasks in two groups of Chinese-English bilinguals who continued their L2 learning through instruction and immersion education, respectively; and (2) to what extent the L2-dominant linguistic context modulates these cognitive effects, using a longitudinal design to track the potential cognitive changes induced by different language experiences and a naturalistic linguistic environment. The instruction and immersion groups were comparable on background measures (e.g., age, SES, and intelligence). To avoid the issue of task impurity (Miyake & Friedman, 2012) and the limitations associated with using a single indicator to measure a single aspect of cognitive function (von Bastian et al., 2016), we adopted a testing battery comprising a series of theoretically motivated and commonly used experimental tasks that engage specific aspects of cognitive control (e.g., inhibitory control) in both the visual and auditory domains. The nonlinguistic cognitive tasks were selected because language learning entails multiple domains, such as visual (e.g., reading and writing), auditory (e.g., listening and speaking), and working memory (e.g., language processing).
A novel aspect of the study is the use of a longitudinal design with four repeated assessments. This design has the following advantages: (1) traditional cross-sectional comparative designs (e.g., immersion vs. nonimmersion) may not fully capture the effects of experience-based factors (e.g., language exposure and linguistic environment) on cognitive changes induced by language immersion (Deluca et al., 2019), whereas a longitudinal design can examine how specific experience-based factors are reflected in executive functions and, crucially, how they change over time; and (2) whereas previous longitudinal designs normally used two testing points (e.g., pretest vs. posttest), a design with four testing points traces within-subject changes more stably and reliably, which can provide a better understanding of changes in cognitive performance over time due to the two pathways of language acquisition as well as the L2-dominant linguistic environment. The main potential concern with this design, however, is a practice effect (see “Discussion” section).
Method
Participants
Two groups of Chinese students were recruited from the University of Edinburgh. None of them had previously lived in any country other than China, and they participated in this study just after arriving in Edinburgh; thus, the length of naturalistic exposure to English at testing was the same for both groups. One group of participants (n = 39) were prospective master’s students who attended a 10-week Pre-sessional English (PSE) course (i.e., L2-instruction group) starting in June, prior to the start of the university master’s courses in autumn. The L2-instruction group received an intensive language training course. The second group of participants (n = 38) did not attend the language course (i.e., L2-immersion group) but directly started regular subject-based academic courses taught in English in September. The L2-immersion group received a specific “immersion education.” There were three and seven participants from the College of Science and Engineering in the L2-instruction and L2-immersion group, respectively, and the others were from the College of Arts, Humanities, and Social Science. All participants were native speakers of Mandarin Chinese, and English was their main second language (L2). In the L2-instruction group, 11 participants reported having learned an L3: French (2), Japanese (8), and Korean (1), and 4 participants reported having learned an L4: Cantonese (1), French (1), German (1), and Korean (1). In the L2-immersion group, 20 participants reported having learned an L3: French (7), German (1), Japanese (8), Korean (3), and Spanish (1), and one participant reported having learned Spanish as an L4. The self-reported proficiencies in L3s and L4s were very low: average scores were 0.96 and 0.46, respectively (scale: 0–5). Two participants in each group did not attend the last session of testing, but their data from the prior three testing sessions were included in the analyses. Participants’ demographic information is presented in Table 1.
1 APM scores were the number of correct items (the total number was 36).
2 This is a composite score based on parental education level, given by the summed scores of both parents. The scale ranged from 1: primary school, 2: middle school, 3: high school, 4: bachelor’s or equivalent, to 5: postgraduate. Frequency of language use: never, 1; yearly, 2; monthly, 3; weekly, 4; daily, 5.
3 The average hours of practicing the L2 per week over the 10 weeks; activities included watching TV, classes, reading books, conversation in the L2, listening to the radio, reading, and internet use.
4 IELTS is an international standardized test of English language proficiency for nonnative English language speakers. The score ranged from 0 to 9 (high score means better performance), in 0.5 band increments.
5 The scale of self-reported L2 proficiency ranged from 0–5, marked by “none” to “fluent.”
* Significant differences between L2-instruction group and L2-immersion group.
The 10-week PSE is a course for prospective students preparing to enter a postgraduate degree program, which takes place in summer prior to the university’s opening date. Students who attend the PSE usually have not reached the language requirement for their master’s programs. The PSE program consists of two parts: English for Academic Purposes (EAP; 6 weeks) and English for Specific Academic Purposes (ESAP; 4 weeks). The EAP is divided into two 3-week courses (EAP1, weeks 0–3; EAP2, weeks 3–6), which focus on writing (paraphrasing, summarizing, and synthesizing), listening (lecture listening and note-taking), speaking (discussion and scenarios), reading (reading academic texts), and academic language study. The ESAP focuses on the development of subject-based academic writing and presentation skills. The course consists of four 50-minute morning classes every working day. Based on the design of the language program, we conducted longitudinal testing across four sessions: at the beginning of the language course (W0, baseline), end of EAP1 (W3), end of EAP2 (W6), and end of ESAP (W10). Both groups followed the same testing procedure with four repeated sessions, but the L2-instruction group started earlier than the L2-immersion group (see Figure 1).
Background measures
The Raven’s Advanced Progressive Matrices
The Advanced Progressive Matrices (APM) (Raven & Foulds, 1962) was used as a control measure of nonverbal general intelligence. Consistent with a previous study (Xia et al., 2022), we adopted Set I (i.e., Item 5 or Item 7) as practice and Set II as the experimental test. The matrices are designed so that difficulty increases gradually across items. Participants were instructed to complete the matrices item by item in 10 minutes and were told that if they had difficulty with a specific item, they could guess the answer and continue to the next one. They started with Item 1 and had to answer as many items as they could. The results were scored as the number of correct items for each participant.
Questionnaires
Two English-language questionnaires were administered during the first and the last sessions, respectively. The pretest questionnaire collected general demographic information (e.g., age and gender), IELTS (i.e., International English Language Testing System) scores, and self-reported language proficiency before the testing. Using 6-point scales (i.e., 0–5, marked from “none” to “fluent”), participants rated their speaking, understanding, reading, and writing skills in every language they had learned. Information on other confounding variables that might affect executive performance was also collected, including the age of L2 acquisition (AoA), parents’ education (as an index of SES), musical experience, and video-gaming experience. The posttest questionnaire collected information on self-reported English proficiency after 10 weeks, language usage, and the time spent practicing/learning the L2.
Experimental tasks
Four nonlinguistic cognitive tasks were employed to measure different aspects of executive functions. In the computerized tasks, all stimuli were presented with E-Prime (version 2.0) on a 17-inch computer screen. A schematic representation of each task is depicted in Figure 2.
Attention Network Task
This task is a well-established assessment of attentional capacities (i.e., alerting, orienting, and inhibition) (Fan et al., 2002), which has been used to investigate the effects of bilingualism on attentional abilities (e.g., Costa et al., 2008). We used this task to investigate the impact of L2 instruction and linguistic context on these attentional capacities. In this task, participants were instructed to respond to the central arrow in a horizontal row of five arrows presented in the middle of the screen, either below or above a fixation cross. There were three types of trials (congruent, neutral, and incongruent) and four cueing conditions (single, double, center, and no cue). Three attentional indices were obtained by calculating the difference in RTs/accuracy rate between the following trials: Attention Network Task (ANT) conflict (congruent vs. incongruent); ANT alerting (double-cue vs. no-cue); ANT orienting (center-cue vs. single-cue). Participants started with a practice block of 24 trials, followed by three experimental blocks of 96 trials each. Feedback on performance was only provided in the practice block.
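As a minimal sketch of how the three ANT indices can be derived from trial-level RTs, the Python snippet below takes the difference between condition means on correct trials. The field names (`rt`, `flanker`, `cue`, `correct`), the sign convention (slower minus faster condition), and the toy data are assumptions made for illustration; they are not the authors' scripts or data.

```python
from statistics import mean

def ant_indices(trials):
    """Compute ANT difference scores (ms) from correct trials only.

    Each trial is a dict with hypothetical keys:
    'rt' (ms), 'flanker' (congruent/neutral/incongruent),
    'cue' (none/double/center/single), and 'correct' (bool).
    """
    def mean_rt(key, value):
        return mean(t["rt"] for t in trials if t["correct"] and t[key] == value)

    return {
        # Conflict: incongruent minus congruent flankers
        "conflict": mean_rt("flanker", "incongruent") - mean_rt("flanker", "congruent"),
        # Alerting: no cue minus double cue
        "alerting": mean_rt("cue", "none") - mean_rt("cue", "double"),
        # Orienting: center cue minus single (spatial) cue
        "orienting": mean_rt("cue", "center") - mean_rt("cue", "single"),
    }

# Toy data, invented for illustration only
trials = [
    {"rt": 500, "flanker": "congruent", "cue": "none", "correct": True},
    {"rt": 580, "flanker": "incongruent", "cue": "none", "correct": True},
    {"rt": 460, "flanker": "congruent", "cue": "double", "correct": True},
    {"rt": 540, "flanker": "incongruent", "cue": "double", "correct": True},
    {"rt": 480, "flanker": "congruent", "cue": "center", "correct": True},
    {"rt": 560, "flanker": "incongruent", "cue": "center", "correct": True},
    {"rt": 440, "flanker": "congruent", "cue": "single", "correct": True},
    {"rt": 520, "flanker": "incongruent", "cue": "single", "correct": True},
    {"rt": 900, "flanker": "incongruent", "cue": "single", "correct": False},
]
indices = ant_indices(trials)
print(indices)
```

With these invented values the conflict, alerting, and orienting scores are 80, 40, and 40 ms; the error trial is ignored.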
Number Stroop task
To avoid any linguistic influence, a numerical version of the Stroop task was used (adapted from Hernández et al., 2010). This task is well documented as a measure of inhibitory control (i.e., the Stroop effect) (Stroop, 1935) and is the primary focus of the current study. In this task, participants were asked to count the digits or symbols presented at the center of the screen by pressing the keys 1, 2, or 3 on the keyboard while ignoring the numerical value of the digits. There were three experimental conditions: congruent, incongruent, and neutral. Inhibitory control was assessed by the difference in RTs/accuracy rate between incongruent and congruent trials. Participants were given a practice block of 18 trials, followed by two experimental blocks of 90 trials each. Feedback on performance was only provided in the practice block.
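The three conditions of the counting task can be illustrated with a short Python sketch. It assumes, for illustration only, that each stimulus is a string of one to three identical characters, that digits give congruent or incongruent trials depending on whether their value matches the number of items, and that non-digit symbols give neutral trials. The actual stimuli used in the study are not specified here.

```python
def classify_trial(stimulus):
    """Return (correct_response, condition) for a counting-Stroop stimulus.

    Assumes a hypothetical stimulus format: a string of 1-3 identical
    characters, e.g. "22" (two 2s), "3" (one 3), or "##" (two symbols).
    """
    count = len(stimulus)            # the correct key press is the item count
    if not stimulus.isdigit():
        return (count, "neutral")    # symbols carry no numerical value
    value = int(stimulus[0])         # numerical value to be ignored
    condition = "congruent" if value == count else "incongruent"
    return (count, condition)

examples = {s: classify_trial(s) for s in ["22", "333", "3", "11", "##"]}
print(examples)
```

Here "22" and "333" are congruent, "3" and "11" are incongruent, and "##" is neutral; the Stroop effect is then the incongruent minus congruent difference in mean RT.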
Test of Everyday Attention
The Test of Everyday Attention (TEA) (Robertson et al., 1994) is a well-established clinical assessment of attention, which has three parallel versions (i.e., A, B, and C) to avoid practice effects. Most studies examining the cognitive effects associated with language learning/bilingualism have focused on the visual attentional domain. We selected the three Elevator subtests to explore these effects in the auditory domain. To avoid practice effects, the three parallel versions were used in the order A-B-C-A in the current study. All tasks were presented through a media player with a headset.
(a) Elevator with Counting (EC: 7 trials): This task assesses sustained attention. Participants were asked to count tones of the same pitch presented at irregular intervals.
(b) Elevator with Distraction (ED: 10 trials): This task assesses auditory selective attention/inhibition. Participants were asked to count low tones while ignoring interspersed high tones.
(c) Elevator with Reversal (ER: 10 trials): This task assesses auditory attentional switching (auditory-verbal working memory). Participants were presented with high, middle, and low tones. They had to count the middle tones while the high and low tones indicated the counting direction (upward and downward, respectively).
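The counting rule of the Elevator with Reversal subtest can be sketched as a small Python function. The starting floor of 0 and the initial upward direction are assumptions made for illustration; the tone labels and the example sequence are likewise hypothetical.

```python
def elevator_with_reversal(tones, start_floor=0):
    """Return the final count for a sequence of 'high', 'middle', 'low' tones.

    Middle tones are counted; a high tone sets the counting direction to
    upward and a low tone sets it to downward. Counting is assumed to
    start upward from start_floor (an illustrative assumption).
    """
    floor = start_floor
    step = 1
    for tone in tones:
        if tone == "high":
            step = 1
        elif tone == "low":
            step = -1
        elif tone == "middle":
            floor += step
        else:
            raise ValueError(f"unknown tone: {tone}")
    return floor

sequence = ["middle", "middle", "high", "middle", "low", "middle", "middle"]
final_floor = elevator_with_reversal(sequence)
print(final_floor)
```

For this sequence the count goes 1, 2, 3 and then back down to 2 and 1, so the correct answer is 1.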
Corsi Tapping Task
The Corsi Tapping Task (CTT) was adapted from the Wechsler Memory Scale-III (WMS-III; Wechsler, 1997) and is an established measure of visuospatial memory span (forward condition), working memory (backward condition), and mental rotation (rotated condition). Working memory plays a crucial role in L2 acquisition (Ardila, 2003). Previous studies have shown that the management of two languages results in more efficient executive processing, including working memory (Grundy & Timmer, 2017) and spatial reasoning (Greenberg et al., 2013). This task began with the simple forward condition and ended with the difficult rotated condition. It was presented on a plastic whiteboard (27.5 cm × 21 cm) with 10 blue cube-shaped numbered blocks (3 cm × 3 cm; from 1 to 10), but the numbers were only visible to the experimenter. The experimenter tapped block sequences at a rate of approximately 1 second per block, with sequences varying in length from two to nine blocks. Participants reproduced the sequences in their original order (beginning with two two-block sequences) in the forward condition and in reverse order in the backward condition. The task stopped when participants made errors on both sequences at a given length (e.g., six-block sequences).
In addition to the forward and backward conditions (Bialystok et al., 2008), the rotated condition was used to measure mental rotation (adapted from Keehner & Gathercole, 2007). In this condition, there were two identical whiteboards, with numbers facing the experimenter only, and one of the boards was rotated 180° relative to the other. The experimenter tapped on the blocks in a predetermined sequence, and the participants had to reproduce the sequences with the same blocks and in the same order on the rotated board. The sequences varied in length from one to five blocks. The task stopped when the participants made errors on all six sequences at a given length. The rotated condition started with four one-block practice trials; there were no practice trials in the forward and backward conditions. As for scoring, participants got one point for each correctly tapped trial, and results were scored as the percentage of accuracy based on the possible number of trials.
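The discontinuation rule and percentage scoring for the forward and backward conditions can be sketched as follows. The sketch assumes two sequences at each length from two to nine blocks (16 possible trials) and a hypothetical input format mapping each length to its trial outcomes; it does not cover the rotated condition, whose stopping rule differs.

```python
def corsi_accuracy(outcomes_by_length, lengths=range(2, 10), per_length=2):
    """Percentage of correct trials out of all possible trials.

    outcomes_by_length maps sequence length -> list of booleans (one per
    trial). Testing stops after the first length at which every trial
    was wrong; later lengths are ignored even if present in the input.
    """
    possible = len(lengths) * per_length
    points = 0
    for length in lengths:
        results = outcomes_by_length.get(length, [])
        points += sum(results)       # one point per correct trial
        if not any(results):         # both sequences failed: discontinue
            break
    return 100.0 * points / possible

# Hypothetical participant: fails both six-block sequences
outcomes = {
    2: [True, True],
    3: [True, True],
    4: [True, False],
    5: [False, True],
    6: [False, False],
    7: [True, True],   # never administered under the stopping rule
}
accuracy = corsi_accuracy(outcomes)
print(accuracy)
```

This hypothetical participant earns 6 of 16 possible points, giving an accuracy of 37.5%.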
Statistical analyses
All analyses were conducted using the lme4 package (Bates et al., 2015) in R (Version 3.6.1; R Core Team, 2019), and the lmerTest package (Kuznetsova et al., 2017) was used to compute p-values and conduct pairwise comparisons to test significant interactions.
In the initial analysis, background variables on a continuous scale (e.g., age and SES) were analyzed using one-way ANOVA, while other experience measures on a nominal scale (e.g., gender) were analyzed using the chi-squared test. The factors that indicated group differences were added to the models in the main analyses. In the main analyses of the RT-based tasks, linear mixed-effects models (LMMs) were employed. The main motivation for using LMMs rather than traditional analyses is that LMMs are robust against unbalanced datasets and combine by-subject and by-item analyses in a single model (Baayen et al., 2008). Moreover, LMMs offer additional benefits to SLA researchers: They give more weight to participants with more data; they can model change over time as L2 acquisition develops; and they include all raw data and have more statistical power (Cunnings, 2012). Before data analysis, following Xia et al. (2022), we excluded trials with RTs more than 3 SD from each participant’s mean across all trial types, as well as RTs associated with incorrect responses. In total, 2.87% and 4.32% of trials were excluded in the ANT and Stroop, respectively.
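The trial-exclusion step can be sketched in Python as below. The sketch assumes that each participant's mean and SD are computed over all of that participant's trials (including error trials) before incorrect responses are dropped; the source does not state this order, and the field names and toy data are hypothetical.

```python
from statistics import mean, stdev

def trim_rts(trials, n_sd=3.0):
    """Keep correct trials whose RT lies within n_sd SDs of the participant mean.

    Each trial is a dict with hypothetical keys 'participant', 'rt' (ms),
    and 'correct' (bool). Bounds are computed per participant across all
    trial types (assumption: error trials are included in the bounds).
    """
    rts_by_participant = {}
    for trial in trials:
        rts_by_participant.setdefault(trial["participant"], []).append(trial["rt"])

    bounds = {}
    for pid, rts in rts_by_participant.items():
        centre, spread = mean(rts), stdev(rts)
        bounds[pid] = (centre - n_sd * spread, centre + n_sd * spread)

    kept = []
    for trial in trials:
        low, high = bounds[trial["participant"]]
        if trial["correct"] and low <= trial["rt"] <= high:
            kept.append(trial)
    return kept

# Toy data: one slow outlier and one error trial among 21 trials
trials = (
    [{"participant": "p1", "rt": 480, "correct": True} for _ in range(10)]
    + [{"participant": "p1", "rt": 520, "correct": True} for _ in range(9)]
    + [{"participant": "p1", "rt": 520, "correct": False}]
    + [{"participant": "p1", "rt": 3000, "correct": True}]
)
kept = trim_rts(trials)
excluded_pct = 100 * (len(trials) - len(kept)) / len(trials)
print(len(kept), round(excluded_pct, 2))
```

In this toy example the 3000 ms trial and the error trial are removed, leaving 19 of 21 trials.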
In the LMMs, RT was the dependent variable; Group (i.e., L2-instruction group vs. L2-immersion group), Session (i.e., four repeated sessions: W0, W3, W6, and W10), and Trial Type (e.g., congruent and incongruent trials in the model for the conflict effects) were fixed variables unless specified; and participants and items were random variables (i.e., including random intercepts for each participant and item). Fixed variables were allowed to interact with each other in a single model. Trial Type varied according to the effects of interest; thus, there were four models in the ANT (i.e., Overall RTs, Conflict, Alerting, and Orienting) and two models in the Stroop (i.e., Overall RTs and Stroop). Another LMM was run with the fixed variables that indicated significant main/interaction effects to demonstrate the differences between different levels of the fixed variables. Because the accuracy rate was relatively high in the RT-based tasks (i.e., ANT: 98.71%; Stroop: 97.57%), we analyzed only the average accuracy rate across the four sessions. Only the interactions were reported, for reasons of brevity and relevance to the research questions. For the three subtests of the TEA and the CTT, the accuracy rate was obtained from the number of correct responses; therefore, linear regression models (LRs) were used with the accuracy rate as the dependent variable and Group and Session as fixed variables. In the analysis of self-reported English proficiency, LRs were used with the proficiency scores as the dependent variable and Group and Testing Time (i.e., pretest vs. posttest) as fixed variables. The fixed and random effects outputs of the models are presented in the supporting materials.
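In schematic form, the random-intercept structure described in this paragraph can be written as below. The notation is introduced here for illustration and is not the authors': $p$ indexes participants, $i$ indexes items, and $\mathbf{x}_{pi}$ is assumed to collect the dummy-coded Group, Session, and Trial Type terms together with their interactions.

```latex
\begin{aligned}
\mathrm{RT}_{pi} &= \beta_0 + \mathbf{x}_{pi}^{\top}\boldsymbol{\beta} + u_p + w_i + \varepsilon_{pi},\\
u_p &\sim \mathcal{N}(0,\sigma_u^2),\qquad
w_i \sim \mathcal{N}(0,\sigma_w^2),\qquad
\varepsilon_{pi} \sim \mathcal{N}(0,\sigma_\varepsilon^2).
\end{aligned}
```

Under this reading, $u_p$ and $w_i$ are the by-participant and by-item random intercepts, and the fixed-effect coefficients in $\boldsymbol{\beta}$ are expressed relative to the reference levels of the factors.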
We compared models with and without random slopes to select the best-fitting model (Baayen et al., 2008). The results showed that the model adding only a slope for each participant fitted best. Because there were four testing points, we added a quadratic term to the model to test whether the effect of session became progressively stronger or leveled off (Steinberg et al., 2018). We compared models with and without the quadratic term and found that it did not change the results. To keep the model simple and help it converge, we excluded the quadratic term from the main analyses. Because Session had multiple levels (i.e., four), following Spronken et al. (2016), we used the anova function to return F statistics corresponding to the sequential decomposition of the overall effects of fixed variables, and the summary function to return t statistics corresponding to the comparisons between different levels of fixed variables as well as their interactions (Bates et al., 2015; Kuznetsova et al., 2017). We additionally calculated pseudo-R² values for the LMMs using the r.squaredGLMM function of the MuMIn package (Barton, 2014), which provides a value for marginal R² (variance explained by fixed effects) and a value for conditional R² (variance explained by both fixed and random effects). We used the r.squaredLR function for the LRs, which provides a value for R² and a value for adjusted R².
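To make the distinction between marginal and conditional R² concrete, the sketch below computes both from variance components, following the standard definition for random-intercept Gaussian models (fixed-effect variance, or fixed plus random-effect variance, divided by total variance). The function name and the variance values are hypothetical and chosen only for illustration; they are not estimates from the study, and models with random slopes require a more general treatment of the random-effect variance.

```python
def pseudo_r2(var_fixed, var_random, var_residual):
    """Return (marginal, conditional) R² from variance components.

    var_fixed: variance of the fixed-effect predictions
    var_random: list of random-intercept variances (e.g., participant, item)
    var_residual: residual variance
    """
    random_total = sum(var_random)
    total = var_fixed + random_total + var_residual
    marginal = var_fixed / total                       # fixed effects only
    conditional = (var_fixed + random_total) / total   # fixed + random effects
    return marginal, conditional

# Hypothetical variance components (not taken from the study)
marginal, conditional = pseudo_r2(var_fixed=9.0, var_random=[30.0, 1.0], var_residual=60.0)
print(round(marginal, 3), round(conditional, 3))
```

With these invented components the marginal R² is .09 and the conditional R² is .40, a pattern in which most of the explained variance comes from the random effects rather than the fixed effects.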
Results
Initial analyses
Background measures
No group differences were found on age, age of attending primary school, SES, AoA, musical experience, video-gaming experience, and Raven’s APM scores (all ps > .05), indicating comparable fundamental cognitive abilities in the two groups (see Table 1).
Language-related variables
The two groups did not differ in average hours of practicing the L2 over the 10 weeks (p > .05). Due to the design of the PSE program, the class time (i.e., formal English-instruction classes in the L2-instruction group vs. classes of the academic program conducted in English in the L2-immersion group) was longer in the L2-instruction group than in the L2-immersion group (p < .001). As expected, the L2-immersion group obtained significantly higher IELTS scores than the L2-instruction group, both overall and on the subcategories (i.e., speaking, listening, reading, and writing) (all ps < .05). In terms of previous frequency of language use, the L2-immersion group reported a higher frequency of language use in adulthood and in the last two years than the L2-instruction group (all ps < .05), which is consistent with the group difference in language proficiency. In the analysis of self-reported L2 proficiency, the main effect of Group was significant: The L2-immersion group showed higher overall proficiency scores than the L2-instruction group (Est = 0.98, 95% CI [0.29, 1.67], t = 2.82, p = .005). The main effect of Testing Time was also significant: Overall self-reported proficiency increased significantly at posttest (Est = 1.02, 95% CI [0.33, 1.71], t = 2.94, p = .004). The interaction between Group and Testing Time was not significant (p = .27) (R² = .11, adjusted R² = .09).
Prestage analyses
To investigate the effects of the L2 experience (i.e., immersion vs. instruction), we compared the performance of the two groups at the first testing point (i.e., W0) and did not observe group differences in any of the cognitive measures (all ps > .05). To explore longitudinally when the changes occurred and to examine the degree of change over the four repeated sessions, we compared performance between testing points (i.e., W0–W3 vs. W3–W6 vs. W6–W10). These analyses were run by changing the reference level in the original models from “W0” to “W3” and “W6,” respectively. Changes were observed between the testing points W0–W3 in most measures (for details, see “Main Analyses”), and no changes were found between the testing points W3–W6 and W6–W10 in any of the cognitive measures (all ps > .05). These findings indicated that the observed changes occurred at the earlier stage of testing and lasted until the last session.
Main analyses
Attention Network Task
Mean RTs on the respective trial types are given in Table 2.
Note: Conflict: incongruent vs. congruent trials; Alerting: no-cue vs. double-cue trials; Orienting: center-cue vs. single-cue trials.
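As a reading aid, the three network effects defined in the note can be written as simple RT subtractions. This is a minimal sketch with hypothetical mean RTs and our own dictionary keys; the analyses reported here appear to estimate these contrasts as effects within regression models rather than as raw difference scores.

```python
def ant_network_scores(mean_rt):
    """Compute ANT network effects (ms) from mean RTs per trial type.

    Positive values mean slower responses on the first-named trial type
    (incongruent, no-cue, center-cue) than on its comparison type.
    """
    return {
        "conflict": mean_rt["incongruent"] - mean_rt["congruent"],
        "alerting": mean_rt["no_cue"] - mean_rt["double_cue"],
        "orienting": mean_rt["center_cue"] - mean_rt["single_cue"],
    }


# Hypothetical mean RTs in ms; NOT values from Table 2.
hypothetical_rts = {
    "congruent": 520.0,
    "incongruent": 590.0,
    "no_cue": 580.0,
    "double_cue": 525.0,
    "center_cue": 550.0,
    "single_cue": 520.0,
}

scores = ant_network_scores(hypothetical_rts)
print(scores)
```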
Overall performance
No main effects or interactions were found for overall RTs (marginal R² = .003, conditional R² = .38) or accuracy (all ps > .05), indicating that there was no practice effect.
Alerting effect
The Alerting effect was significant, indicating faster responses on double-cue trials than on no-cue trials (Est = 55.66, 95% CI [28.66, 82.66], t = 4.04, p < .001). No other main effects or interactions were significant (all ps > .05) (marginal R² = .063, conditional R² = .376).
Orienting effect
The Orienting effect was significant, indicating faster responses on single-cue trials than on center-cue trials (Est = 30.02, 95% CI [1.65, 58.39], t = 2.07, p = .038). No other main effects or interactions were significant (all ps > .05) (marginal R² = .022, conditional R² = .375).
Conflict effect
The Conflict effect was significant, indicating faster responses on congruent trials than on incongruent trials (Est = 66.30, 95% CI [45.47, 87.12], t = 6.24, p < .001). The interaction between Conflict effect and Session was significant, indicating that inhibition improved over testing sessions from W3 onward (W3 vs. W0: Est = –13.35, 95% CI [–20.05, –6.64], t = –3.90, p < .001; W6 vs. W0: Est = –20.88, 95% CI [–27.59, –14.17], t = –6.10, p < .001; W10 vs. W0: Est = –25.05, 95% CI [–31.85, –18.25], t = –7.22, p < .001). No other main effects or interactions were significant (all ps > .05) (marginal R² = .090, conditional R² = .4) (see Figure 3).
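The estimates, confidence intervals, and t values reported above can be cross-checked against one another. The sketch below assumes that the 95% CIs are symmetric Wald-type intervals with a critical value of about 1.96; this is our assumption, since the text does not state how the intervals were computed, and the function names are ours.

```python
def se_from_ci(lower, upper, z=1.96):
    """Recover the standard error from a symmetric 95% CI.

    Assumes a Wald interval: estimate +/- z * SE (an assumption, see lead-in).
    """
    return (upper - lower) / (2 * z)


def t_from_report(estimate, lower, upper, z=1.96):
    """Recompute the test statistic as estimate / SE."""
    return estimate / se_from_ci(lower, upper, z)


# Values copied from the Alerting and Conflict main effects reported above.
alerting_t = t_from_report(55.66, 28.66, 82.66)
conflict_t = t_from_report(66.30, 45.47, 87.12)

print(round(alerting_t, 2), round(conflict_t, 2))
```

Under this assumption, both recomputed statistics agree with the reported t values (4.04 and 6.24) to two decimal places.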
Number Stroop Task
Mean RTs on the respective trial types are given in Table 3.
Note: Stroop effect: incongruent vs. congruent trials.
Overall performance
Overall accuracy did not change, but mean RTs decreased over testing sessions from W3 onward (W3 vs. W0: Est = –43.74, 95% CI [–67.63, –19.86], t = –3.59, p < .001; W6 vs. W0: Est = –56.95, 95% CI [–80.84, –33.06], t = –4.67, p < .001; W10 vs. W0: Est = –59.58, 95% CI [–83.88, –35.28], t = –4.81, p < .001). No other main effects or interactions were significant (all ps > .05) (marginal R² = .035, conditional R² = .426).
Stroop effect
There were main effects of Stroop effect [F(1, 7.22) = 44.98, p < .001] and Session [F(3, 295.01) = 11.00, p < .001], and a significant interaction between them [F(3, 294.30) = 10.90, p < .001], indicating that inhibitory control improved over testing sessions from W3 onward (W3 vs. W0: Est = –20.06, 95% CI [–29.46, –10.65], t = –4.18, p < .001; W6 vs. W0: Est = –20.43, 95% CI [–29.84, –11.03], t = –4.26, p < .001; W10 vs. W0: Est = –25.40, 95% CI [–34.96, –15.83], t = –5.20, p < .001). No other main effects or interactions were significant (all ps > .05) (marginal R² = .094, conditional R² = .446) (see Figure 4).
Test of Everyday Attention
Performance on each sub-task of the TEA is given in Table 4.
In the EC, no main effects or interactions were found (all ps > .05) (R² = .026, adjusted R² = .003). The average accuracy rate of the two groups was 98.40%, indicating a ceiling effect on performance. In the ED, the main effect of Group was significant: the L2-instruction group performed better on selective attention/inhibition than the L2-immersion group (Est = –6.43, 95% CI [–11, –1.87], t = –2.77, p = .006). The main effect of Session was also significant, with performance improving from W3 onward (W3 vs. W0: Est = 6.43, 95% CI [1.87, 11], t = 2.77, p = .006; W6 vs. W0: Est = 18.57, 95% CI [12.56, 24.58], t = 6.08, p < .001; W10 vs. W0: Est = 19.39, 95% CI [13.3, 25.48], t = 6.27, p < .001). No interaction effect was found (p = .73) (R² = .179, adjusted R² = .159). In the ER, only the main effect of Session was significant, indicating enhanced attentional switching from W3 onward (W3 vs. W0: Est = 7.4, 95% CI [0.55, 14.25], t = 2.13, p = .034; W6 vs. W0: Est = 12.6, 95% CI [5.75, 19.45], t = 3.62, p < .001; W10 vs. W0: Est = 16.09, 95% CI [9.14, 23.03], t = 4.56, p < .001). No other main effects or interactions were significant (all ps > .05) (R² = .079, adjusted R² = .057) (see Figure 5).
Corsi Tapping Task
Performance on each sub-task of the Corsi Tapping Task is given in Table 5.
In the forward condition, no main effects or interactions were found (all ps > .05) (R² = .028, adjusted R² = .005). The average accuracy rate was 53.19%. In the backward condition, the main effect of Session was significant, indicating enhanced working memory performance from W6 onward (W6 vs. W0: Est = 3.41, 95% CI [0.38, 6.44], t = 2.21, p = .028; W10 vs. W0: Est = 4.17, 95% CI [1.1, 7.24], t = 2.67, p = .008) (R² = .039, adjusted R² = .016). No other main effects or interactions were significant (all ps > .05). In the rotated condition, the main effect of Session was significant, indicating enhanced mental rotation performance from W3 onward (W3 vs. W0: Est = 5.85, 95% CI [1.2, 10.5], t = 2.47, p = .014; W6 vs. W0: Est = 8.64, 95% CI [3.98, 13.29], t = 3.65, p < .001; W10 vs. W0: Est = 10.74, 95% CI [6.02, 15.45], t = 4.48, p < .001) (R² = .082, adjusted R² = .06). No other main effects or interactions were significant (all ps > .05) (see Figure 6).
Discussion
The aim of this study was to investigate longitudinally the effect of different pathways of language acquisition (i.e., instruction vs. immersion) and of linguistic context (i.e., an L2-dominant linguistic environment) on cognitive functions in young adult Chinese speakers learning English, who had moved from China and continued their L2 learning in the United Kingdom. The two groups of participants differed mainly in their pathway of language acquisition and were comparable on background measures (e.g., demographic and social backgrounds, nonverbal intelligence, and SES), life experience–based variables (e.g., musical and video-gaming experience), and language-related variables (e.g., AoA and exposure to an L2-dominant linguistic environment). A longitudinal design with four repeated assessments was used to track possible changes in cognitive functions over 10 weeks. Cognitive performance was assessed with four nonlinguistic cognitive tasks: the ANT, the Number Stroop task, the Elevator subtests of the TEA (EC, ED, and ER), and the CTT, providing multidimensional measures of cognitive functions in both the visual and auditory domains.
Both groups showed similar performance on all cognitive measures at the first testing session, suggesting comparable cognitive performance at baseline (i.e., W0). Analyses tracing the degree of change over 10 weeks showed that the changes occurred early in the testing period and were maintained until the last session. The longitudinal results demonstrate a similar enhancement of cognitive performance in both groups in tasks assessing visual and auditory inhibition, attentional switching, and mental rotation, indicating an overall effect of linguistic context. The only significant group difference was observed in the auditory inhibition test, in which the L2-instruction group performed better overall than the L2-immersion group, suggesting a specific cognitive effect of language experience.
Specifically, both groups showed a comparable enhancement of inhibitory control in both the visual and auditory domains over 10 weeks, suggesting a general effect of linguistic context. Given that both groups moved from their L1 country to an L2 (English-speaking) country, the change of linguistic environment (i.e., from L1-dominant to L2-dominant) might account for these enhancements. According to the Adaptive Control Hypothesis (Green & Abutalebi, 2013), L2 learners might require more interference control in a single-language context. In the current study, the Chinese-English participants were living in an L2-dominant linguistic environment (i.e., a single-language context), which may have led to enhanced inhibitory control. Specifically, to adjust to the L2 environment, our participants had to actively use their less-dominant L2 while inhibiting their dominant L1 as far as possible, especially at the early stage. Once they had become used to the L2-dominant environment, the increase in demands on the related cognitive control might not have been as marked as at the beginning, which may explain the absence of changes between W3 and W6 and between W6 and W10. Moreover, it has been suggested that more inhibition is needed to suppress the dominant L1 than to suppress the less-dominant L2 (Costa & Santesteban, 2004). Hence, inhibiting the dominant L1 would have placed greater cognitive demands on our participants.
The only group difference was observed in auditory inhibition, where the L2-instruction group performed better overall than the L2-immersion group. This observation indicates a specific effect of language experience on cognitive performance. Previous research has shown that intensive L2 input and exposure predict better L2 competence (Collins et al., 1999; Freed et al., 2004; Serrano et al., 2011) as well as a significant improvement in auditory attentional switching (Bak et al., 2016). Elmer et al. (2011), for example, used event-related functional magnetic resonance imaging (fMRI) and found that intensive language training can influence brain activity in regions involved in auditory attention. The L2-instruction group in the current study, who attended the PSE program, had longer class hours and received relatively more intensive language training than the L2-immersion group. This specific language experience could have led to the better performance of the L2-instruction group in auditory inhibition. It could alternatively be argued that this advantage resulted from the greater effort needed to inhibit the L1 in the instruction group, given its lower L2 proficiency at W0 compared to the immersion group. It should be noted, however, that both groups showed comparable improvement in language proficiency over 10 weeks and similar performance on all cognitive measures at the first testing session, even though the L2-instruction group had a relatively lower level of proficiency than the L2-immersion group at baseline. This suggests that language proficiency is not a strong predictor of cognitive performance in the current study.
The effect of language experience on inhibitory control was limited to the auditory domain and did not extend to the visual domain, which could be due to the characteristics of the selected tasks. As Ooi et al. (2018) explained, tasks may differ in how they assess attentional control: the ANT and the Stroop task were motivated by a theoretical framework of attention and are mostly used in experimental settings, whereas the Elevator subtests of the TEA are ecologically valid because they are arguably closer to everyday activities. In other words, the TEA subtests may be relatively sensitive in capturing potential group differences in behavioral cognitive measures in young adults. Moreover, these tasks tapped different domains: the ANT and the Stroop task measured visual attentional control, whereas the TEA measured auditory attentional control. Participants in the current study learned English mainly through the auditory modality (i.e., listening and speaking), as reflected in the time spent on L2 learning and use. These factors might explain why the L2-instruction group performed better overall than the L2-immersion group on inhibitory control as measured by the ED subtest of the TEA, but not as measured by the ANT or the Stroop task.
In addition to inhibition, a similar improvement in attentional switching was observed in both groups. The data on time spent practicing the L2 showed that our participants mainly used the L2 in classes, for less than 8 hours per week on average. This suggests that our participants might have frequently switched between their two languages outside classes (e.g., when communicating with their friends in their L1), which might have led to enhanced attentional switching abilities. Previous studies have suggested that bilingualism could increase flexibility in mental-set shifting (Houtzager et al., 2017; Prior & MacWhinney, 2010), a suggestion supported by neuroimaging studies (Garbin et al., 2010; Gold et al., 2013; Green & Abutalebi, 2013; Rodríguez-Pujadas et al., 2013).
There is evidence that an increase in one type of cognitive ability might positively affect other related abilities (Green & Bavelier, 2003). This could partly explain the improved performance on mental rotation. Mental rotation refers to the cognitive ability to process visual-spatial representations. McLeay (2003), for example, investigated the link between bilingualism and spatial ability and reported that bilinguals showed superior performance in spatial tasks involving mental manipulation compared to monolinguals. Studies have suggested that bilingualism may enhance the ability to control selective attention to specific aspects of mental representations (Bialystok, 1999; Bialystok & Martin, 2004). More specifically, bilinguals may rely more heavily on visual or spatial strategies than their monolingual counterparts, leading to better performance on tasks involving mental imagery and spatial manipulation (Ransdell & Fischler, 1991). The enhancement of L2 proficiency and the intensive exposure to the L2 in the current study could have led to more efficient performance in tasks involving mental rotation over 10 weeks.
A significant decrease in overall response time over testing sessions was found in both groups in the Stroop task, but not in the ANT. This could be explained as a global RT effect associated with bilingualism, as proposed by Hilchey and Klein (2011): a domain-general enhancement of executive control in bilinguals could predict overall faster responses. The overall RTs in the ANT were stable, which could be due to the demand level of the task. The ANT, which is less demanding than the Stroop task, might not be demanding enough to reveal any change (Qu et al., 2016). An alternative explanation could be a practice effect, that is, better performance with repeated testing due to familiarity with the task procedure (Donovan & Radosevich, 1999). However, this could not explain the stable overall RTs in the ANT. Notably, the average accuracy rate did not change in either task.
In contrast, stable performance was found in tasks measuring other aspects of attentional control (i.e., alerting and orienting in the ANT and sustained attention in the EC) and working memory (i.e., the forward condition in the CTT). Specifically, participants performed at ceiling in the EC and obtained an average accuracy rate of around 50% in the forward condition of the CTT. The stable alerting performance contrasts with the studies by Nicolay and Poncelet (2013, 2015), in which a significant enhancement of alerting was reported in immersed children compared to nonimmersed children. Their explanation was that the immersed children had low proficiency in the L2 and thus had to maintain continuous readiness for effortful processing, such as monitoring and understanding academic courses taught in the L2; this experience led to an enhancement of alerting abilities in the immersed children. It should be noted that the studies by Nicolay and Poncelet (2013, 2015) involved young children, in whom potential differences in cognitive performance are relatively “easier” to capture. Behavioral cognitive measures might not be sensitive enough to capture such differences in the current study with young adults, who may perform at ceiling level (Bialystok et al., 2005).
There were no significant interactions among the fixed variables, but the figures seem to suggest that the two groups changed at different rates on specific cognitive measures, such as conflict resolution in the ANT, auditory attentional performance in the TEA subtests (i.e., ED and ER), and working memory in the relatively demanding backward condition of the CTT. The reason for the nonsignificant interactions could be the time interval between the repeated assessments: a longer interval might fail to capture earlier differences in change between the two groups. In the current longitudinal design, which was based on the timeline of the 10-week PSE, there were four repeated measures (i.e., W0, W3, W6, and W10) and three intervals (i.e., 3 weeks, 3 weeks, and 4 weeks). Our results showed an initial enhancement in cognitive measures between W0 and W3, but no further enhancement between W3 and W6 or between W6 and W10, indicating that the improvement was concentrated in the early stage of the testing period. It is possible that the groups differed within the first week and had reached comparable performance by Week 3, leading to a statistically nonsignificant interaction. This possible explanation receives support from previous studies using the same Elevator subtests of the TEA (Bak et al., 2016; Long et al., 2020), in which language learners showed comparable performance at baseline and displayed significant improvements in the ER at the end of a 1-week intensive Gaelic course. Hence, future longitudinal studies could use shorter intervals between repeated assessments, such as W0, W1, W2, and W3.
The main limitation of the current study is the possible presence of practice and familiarity effects for some tasks, which are well recognized in psychology (Donovan & Radosevich, 1999). It might be argued that the changes over repeated testing were due to practice effects, interpreted as automatization of performance (Raichle et al., 1994). In the ANT, there is little evidence of major practice effects (Fan et al., 2002). In the current study, no practice effect was observed in the ANT, as reflected by the stability of overall RTs and accuracy over the repeated testing. Furthermore, decreased RTs were only observed on the incongruent trials, which involve conflict resolution, and not on the congruent trials, suggesting that the main changes observed in inhibitory control measured by the ANT for both groups resulted from the L2-dominant linguistic context. In the TEA, three parallel versions were developed to avoid practice effects (Robertson et al., 1994), as longitudinal follow-up (e.g., to evaluate neurorehabilitation) was an important part of the test design. The absence of practice effects in the TEA was confirmed in studies that used it to examine the impact of intensive language learning on attentional functions (Bak et al., 2016; Long et al., 2020). Hence, changes in the performance of our participants are more likely to be due to language experience and the L2-dominant linguistic context than to practice effects. However, no studies have examined practice effects in the Stroop task and the CTT, so the potential influence of practice effects on performance in these two tasks cannot be ruled out.
Importantly, practice effects cannot explain the observed difference between the two groups (i.e., in auditory inhibitory control), as we would expect such effects to be similar in both groups. Likewise, differences between the groups are unlikely to have resulted from nonlinguistic effects of moving to another country, an experience that both groups shared.
In conclusion, this study presents the first longitudinal design with four repeated assessments aimed at investigating the effects of L2 instruction versus immersion on cognitive functions whilst tracing the observed changes within young adult individuals, whose cognitive capacities are assumed to be at their peak. The results suggest a comparable enhancement in most executive functions (i.e., visual and auditory inhibition, attentional switching, and mental rotation) in both groups, but the L2-instruction group showed better overall performance in auditory attention than the L2-immersion group. These observations suggest a general effect of linguistic context (i.e., an L2-dominant linguistic environment) and a specific effect of language experience. Further research could employ a longitudinal design to focus on how specific factors related to L2 acquisition and bilingual experience affect executive functions within individuals, instead of a traditional between-group design (i.e., immersion vs. nonimmersion). Moreover, the linguistic context plays an important role in language acquisition and language control.
Acknowledgments
This work was supported by School of Philosophy, Psychology & Language Sciences Research Support Grants.
Supplementary Materials
To view supplementary material for this article, please visit http://doi.org/10.1017/S0272263122000158.