Introduction
Understanding domain-general cognitive mechanisms (e.g., attention, working memory, inhibitory control) and their connections to language learning and use has recently garnered increased attention in the field of second language acquisition (SLA) (Luque & Morgan-Short, 2021; McManus, 2021). This is, in part, because of the central role these mechanisms are hypothesized to play within usage-based approaches to SLA: Learning a language is accomplished using the same mechanisms used for learning other skills. Relatedly, scholars have sought to explain variability in language learning outcomes as a result of individual differences in these mechanisms (Botezatu et al., 2022; Saito et al., 2022). Inhibitory control, or inhibition, has surfaced as one potentially important executive function for language use. It has been proposed as a cognitive mechanism to explain language selection in bilinguals, who must continuously inhibit and/or activate relevant lexical items in their competing languages (Green, 1998; Green & Abutalebi, 2013). Inhibitory control, in fact, encompasses multiple cognitive processes related to the suppression of irrelevant or distracting information and/or prepotent (i.e., dominant or automatic) responses (Friedman & Miyake, 2004) and is therefore potentially relevant to language learning and use in multiple ways.
Although a variety of taxonomies exist for conceptualizing inhibitory control (see Friedman & Miyake, 2004, for a summary), it is generally understood to consist of processes that vary according to whether they are (i) intentional or unintentional, (ii) behavioral or cognitive (i.e., a motor response or a mental process), and (iii) directed at resisting interference from something previously learned (i.e., proactive interference) or from an immediately distracting stimulus.
Retrieval-induced inhibition has been argued to reflect one type of inhibitory control used during language processing: Through the process of lexical retrieval, non-retrieved lexical items are inhibited (Dijkstra & van Heuven, 2002; Lev-Ari & Peperkamp, 2013). For instance, if one is asked to memorize a list of vocabulary belonging to a particular lexical category (e.g., animals) and then practices a subset of words from the list, the non-practiced words will be inhibited (i.e., have their activation suppressed), such that retrieval of a non-practiced word on a subsequent task will be slowed. Furthermore, the stronger one’s inhibitory control, the longer this retrieval delay should be. In this way, retrieval-induced inhibition represents unintentional, cognitive resistance to proactive interference (Friedman & Miyake, 2004): It operates without conscious awareness (as opposed to active, conscious suppression) and resists the influence of previously learned information.
Although inhibitory control has been investigated relatively extensively in the field of bilingual language processing, particularly in relation to sentence comprehension and lexical access (Costa & Santesteban, 2004; Filippi et al., 2012), Darcy, Mora, and Daidone (2016) was a pioneering study that extended these ideas to L2 phonological processing. They provided preliminary evidence that competition-based accounts of language use might explain individual differences in L2 pronunciation. Motivated in part by the work of Lev-Ari and Peperkamp (2013), which demonstrated a relationship between inhibitory control and the amount of L1 attrition in the perception and production of voiceless stops by English L1 learners of L2 French, Darcy et al. (henceforth DM&D) hypothesized that if inhibitory control could be shown to be related to L1 phonology, then it might also play a crucial role in L2 pronunciation skills: “being able to suppress the L1 more robustly could help L2 users reduce interference from their L1 segment categories during L2 use” (p. 745). On this view, variability in L2 pronunciation outcomes could result from individual differences in inhibitory control.
To explore the relationship between inhibitory control and L2 phonological processing, DM&D compared performance on a retrieval-induced inhibition task with L2 perception and production performance on both consonant and vowel targets. Participants included Spanish L1 learners of English and English L1 learners of Spanish. L2 perception was measured using a speeded ABX categorization task; L2 production was measured using a delayed sentence repetition task; and L2 proficiency was measured as a function of vocabulary size using the X_Lex task (Meara, 2005). The speeded ABX categorization task was designed so that the same stimuli could be used for each L2 group, allowing for generalizability while also providing an internal control for the materials. DM&D predicted that, once proficiency was controlled, greater inhibitory control would be associated with higher accuracy in both perception (more accurate identification of target sounds) and production (more accurate production of target sounds). Instead, their findings appeared to indicate a stronger relationship between inhibition and perception than between inhibition and production. They suggested that this could be because perception relies on category formation, whereas production additionally requires the inhibition of a dominant motor response (i.e., the articulation of an L1 segment). In other words, production skills may additionally rely on another category of inhibitory control, prepotent response inhibition (Friedman & Miyake, 2004), or the intentional resistance of a dominant or automatic response. Therefore, the current study is a close replication of DM&D, with one variable modification: additional tests of inhibitory control measuring prepotent response inhibition.
One of the most common measures of prepotent response inhibition is the Stroop task (Stroop, 1935), which has also been shown to be related to perceptual adaptation (Kim et al., 2020) and L1 speech perception (Lev-Ari & Peperkamp, 2014). In the classic version of the Stroop task, color words (e.g., green, blue) are presented in ink colors that either match or do not match the meaning of the word, and participants respond orally by naming the ink color (e.g., seeing the word red in red ink and responding “red” vs. seeing the word red in blue ink and responding “blue”). More recent iterations (Bearden et al., 2021; Gass & Lee, 2011) use manual versions of the task in which participants are trained to associate the ink colors with different keyboard button presses. Although the Stroop task is a classic test of prepotent response inhibition and thus an ideal candidate for use in the current replication, one further consideration is the distinction between domain-general and language-oriented response inhibition (Lev-Ari & Peperkamp, 2014; Linck et al., 2012). The Simon task (Simon & Rudell, 1967), which also requires the inhibition of a prepotent response, has been argued to be non-linguistic or domain general (Linck et al., 2012) because the interference it involves comes from the spatial location of an object (e.g., whether a box is presented on the left or right side of a computer screen) rather than from linguistic information in written words.
A final point of consideration regarding the findings of DM&D is that the results did not appear to pattern in the same way for consonant and vowel targets. For vowels, higher inhibitory control was related to more accurate perception, but no relationship emerged between inhibitory control and production. For consonants, higher inhibitory control was related to more accurate production, but no perception results could be reported because the L2 English group performed at ceiling on the perception task. Therefore, to provide a complete picture of the relationship between inhibitory control and the perception and production of both consonant and vowel targets, the current replication recruited only L2 Spanish participants, as this group showed sufficient variation on the consonant perception task in DM&D.
Motivation for the current replication
DM&D’s study provided an important contribution to the field, as it sought to better understand how inhibitory control might relate to perception and production in L2 speech. Critically, it demonstrated relationships between language abilities and general cognition, which provides support for usage-based accounts of L2 learning (Ellis, 2006; MacWhinney, 2008). To probe the reliability of the initial study’s findings and to better understand which types of inhibitory control are most robustly related to L2 perception and production, the current study is a close replication (Porte & McManus, 2019, p. 73) of DM&D: All major variables (e.g., participant characteristics, experimental tasks, materials, data analysis) remained unchanged, except that inhibitory control was additionally measured by two tasks, a Simon task and a Stroop task. Because the authors shared their materials and tasks on IRIS (https://www.iris-database.org/details/xSmdt-ui4Xg), replication using the original tasks and materials was feasible.
Research questions and replication assessment
DM&D (2016) did not explicitly state research questions (RQs), but rather identified that their aim was “to examine the relationship between the strength of L2 learners’ inhibitory control and their accuracy in perceiving and producing L2 segments” (p. 745). The current study included three research questions, with RQ1 intended to represent DM&D’s initial question and RQ2 and RQ3 representing the variable modification.
• RQ1: To what extent is inhibitory control, as measured via a retrieval-induced inhibition task, related to L2 learners’ accuracy in perception and production?
• RQ2: To what extent is inhibitory control, as measured via a Simon task, related to L2 learners’ accuracy in perception and production?
• RQ3: To what extent is inhibitory control, as measured via a Stroop task, related to L2 learners’ accuracy in perception and production?
Similarities and differences between the initial study and the current replication were interpreted following the recommendations of Porte and McManus (2019): Means and standard deviations from the descriptive statistics of the two studies were used to compute Hedges’ g effect sizes and corresponding confidence intervals (CIs) (see also McManus & Marsden, 2018). To evaluate between-study differences, the field-specific guidelines of Plonsky and Oswald (2014) were used, such that Hedges’ g < .40 with a corresponding CI passing through zero was taken to indicate a negligible difference (see also McManus & Liu, 2022). For the main inferential tests (correlation and regression), the direction (positive vs. negative) and magnitude of the effect sizes were considered.
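As a rough illustration of the between-study comparison described above, the sketch below computes Hedges’ g and an approximate 95% CI from two sets of summary statistics (mean, SD, n). It assumes the usual pooled-SD formula, the small-sample correction J = 1 − 3/(4(n1 + n2) − 9), and a common large-sample standard error. The Effect Size Calculator used in the study may differ in small details, and the function name `hedges_g` is hypothetical. Here g is computed as group 1 minus group 2.

```python
import math


def hedges_g(m1, sd1, n1, m2, sd2, n2, z=1.96):
    """Hedges' g for (group 1 - group 2) with an approximate CI.

    Assumes two independent groups, a pooled SD, and the common
    large-sample standard error of g (an assumption, not the
    calculator's documented internals).
    """
    df = n1 + n2 - 2
    # Pooled standard deviation of the two groups
    sd_pooled = math.sqrt(((n1 - 1) * sd1**2 + (n2 - 1) * sd2**2) / df)
    d = (m1 - m2) / sd_pooled
    # Small-sample bias correction
    j = 1 - 3 / (4 * (n1 + n2) - 9)
    g = d * j
    # Approximate standard error of g
    se = math.sqrt((n1 + n2) / (n1 * n2) + g**2 / (2 * (n1 + n2)))
    return g, g - z * se, g + z * se


# Consonant production: DM&D (group 1, n assumed to be 18) vs. replication (n = 58)
g, lo, hi = hedges_g(4.09, 2.45, 18, 4.50, 2.31, 58)
print(f"g = {g:.2f} [{lo:.2f}, {hi:.2f}]")  # g = -0.17 [-0.70, 0.36]
```

With these inputs the sketch rounds to the consonant production comparison reported in the Results. That agreement suggests, but does not confirm, that the same formulas were used.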
Method
All materials, experimental and coding protocols, data, and analysis code are publicly available at https://osf.io/fxzvj/, and the preregistration is available at https://osf.io/w4gj2. Table 1 summarizes the tasks used in the initial study and current replication.
^ Original tasks/materials provided by DM&D on IRIS.
a Original task not on IRIS—software used in the current replication is provided on IRIS/OSF.
* Tasks added for replication.
Participants and overall design
A power analysis using G*Power (Faul et al., 2009) based on the R² values from the hierarchical multiple regression in DM&D indicated a target sample size of 60 participants for the current replication (see Supplementary Materials, S1). Seventy L1 English learners of L2 Spanish recruited from classes at the University of Pittsburgh participated in the experiment. Following DM&D, participants were excluded if they were early simultaneous bilinguals or their native speaker (NS) status was unclear (n = 5); they reported a speech or hearing pathology (n = 1); they scored more than two standard deviations (SDs) below the group mean on the control condition of the perception task (n = 3); or they exhibited an extreme value on the retrieval-induced inhibition task (n = 1). Two additional participants were excluded because of missing production data (n = 2). Table 2 compares the demographic characteristics of the participants in DM&D and the final 58 L2 Spanish learners in the current replication.
Participant demographic information was gathered using the same language background questionnaire and motivation survey as in the initial study. The motivation survey included nine statements related to language learning and use, which participants rated on a 9-point Likert scale, with higher values corresponding to greater motivation. To gauge L2 use, participants were asked to indicate how much (e.g., 0%, 1%–25%, 26%–50%) they used the L2 in certain contexts (e.g., texting, reading books). The percentage categories were converted to scores from 0 to 4 (e.g., 0% = 0, 76%–100% = 4), and an average score was calculated such that higher values correspond to greater use. As in the initial study, L2 proficiency was measured using X_Lex (Meara, 2005), a test of receptive vocabulary, as well as through self-report ratings (1–5) of how well participants estimated they could speak, understand, read, and write in Spanish (with higher values corresponding to greater proficiency). As indicated in Table 2, the participant demographic characteristics of the current study and DM&D are numerically similar, and Hedges’ g effect sizes and 95% CIs mostly indicated negligible differences. The only exception was self-evaluation of listening, which showed a small effect with a lower CI bound close to zero (g = 0.56 [0.02, 1.10]).
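The L2-use scoring can be sketched as a lookup followed by an average. The text gives only the end points (0% = 0, 76%–100% = 4); this sketch assumes the intermediate bands are 1%–25% = 1, 26%–50% = 2, and 51%–75% = 3. The names `USE_BANDS` and `l2_use_score` are hypothetical.

```python
# Assumed band-to-score mapping: only 0% = 0 and 76%-100% = 4 are stated in
# the text; the intermediate bands are assumed to be equally spaced.
USE_BANDS = {"0%": 0, "1%-25%": 1, "26%-50%": 2, "51%-75%": 3, "76%-100%": 4}


def l2_use_score(responses):
    """Average 0-4 score across contexts; higher values = more L2 use."""
    scores = [USE_BANDS[band] for band in responses]
    return sum(scores) / len(scores)


# Three hypothetical contexts (e.g., texting, reading books, ...)
print(l2_use_score(["0%", "26%-50%", "76%-100%"]))  # 2.0
```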
Materials and procedure
The current study used the same instruments (https://www.iris-database.org/details/xSmdt-ui4Xg) and followed the same procedure as DM&D but incorporated two additional inhibition tasks. As in DM&D, data collection followed the same order for each participant: production task, cognitive tasks, perception task, and vocabulary task. Data collection sessions lasted approximately 70 minutes. This research was approved by the University of Pittsburgh institutional review board; participants were compensated with a $20 Amazon gift card.
Perception
Perception was tested using the same speeded ABX categorization task, stimuli, and procedure as in DM&D, administered via DMDX. The experimental items included the Spanish vowel and consonant contrasts /e-ei̯/ and /d-ɾ/. Control items with segmental contrasts native to English were also included. All stimuli were presented in trisyllabic nonce words (e.g., [faˈneða] or [fəˈnidɪʃ]). The stimuli were recorded in English and Spanish (i.e., with appropriate phonetic realizations for the respective language) by two female bilingual speakers of Mexican Spanish and American English. An additional set of control segments was included that consisted of vowel and consonant contrasts common to both English and Spanish. Each of the eight subconditions (two native, two nonnative, four control) was tested with four pairs of words.
The English and Spanish stimuli were presented in two separate blocks counterbalanced across participants. Within each block, stimuli were randomized. On each trial of the ABX task, participants heard three stimuli (nonce words) and had to indicate whether the last word (X) matched the first (A) or the second (B) stimulus. Accuracy (i.e., correctly identifying which stimulus X matched) and reaction time (RT) were recorded. If a participant did not respond within 2,500 ms, the next trial began automatically. Within trials, a different voice was used for the A/B versus X recordings, and the recordings of the first and second stimuli were physically different. The four pairs of words for each of the eight subconditions were presented in four combinations (ABA, ABB, BAA, BAB), for a total of 128 trials (64 in each language). Participants completed the task on a laptop while wearing headphones. The task began with eight practice items with feedback to familiarize participants with the procedure. A break was offered between the two blocks. The task took approximately 20 minutes.
Production
Production was tested using the same delayed sentence repetition task as in DM&D, administered via Microsoft PowerPoint. The consonant and vowel targets were the same as those in the perception task: /e-ei̯/ and /d-ɾ/. The task included a total of 16 sentences (four pairs per contrast), each consisting of a question-and-answer sequence. Each item was presented via a series of two slides. On the first slide, participants heard the question and answer (recorded by the same female speakers who produced the perception stimuli) and simultaneously saw the sentences in written form. On the second slide, participants heard the question again and had to repeat aloud the answer they had previously heard. The written forms were not provided on the second slide. While completing the task, participants wore headphones and were seated in front of a computer in a sound-controlled booth. Their responses were recorded with a Shure SM58 microphone via a Focusrite Scarlett 4i4 interface on a Dell computer. Participants who hesitated or misremembered a target word were asked to repeat the trial. Initial instructions and a practice item were provided in the L1 English. The task took approximately 7 minutes.
Inhibitory Control
Three inhibitory control tasks were administered in the current replication: the task used in the initial study (retrieval-induced inhibition) and two additional tasks (Simon and Stroop). The retrieval-induced inhibition task used the same materials and procedure as in DM&D. All three inhibitory control tasks were administered via E-Prime 2.0.8.22 in participants’ L1 English, and the order in which participants completed the tasks was counterbalanced.
Retrieval-Induced Inhibition Task
The retrieval-induced inhibition task was the same task used in DM&D, adopted from Lev-Ari and Peperkamp (2013). The task included three phases. In the first phase, 18 words were presented to participants, who were instructed to memorize them. The 18 words comprised six words (e.g., tiger, duck, elephant, cow, horse, snake) from each of three categories (i.e., animals, occupations, and vegetables). Each word was displayed along with its category (e.g., “animal – tiger”). In the second phase, participants were instructed to recall words they had learned. Specifically, they were asked to recall three words from each of two categories by seeing a prompt such as “animal – t” and typing their response. In the third phase, participants were presented with a list of 34 words and asked to indicate whether each word was one of the 18 they had memorized during Phase 1.
The purpose of Phase 2 was to create three types of experimental items: (a) words that were practiced, (b) words that were inhibited because they belonged to a practiced category but were not practiced themselves, and (c) words that functioned as controls because they belonged to a category that was not practiced. Participants with greater inhibitory control are expected to have lower activation levels for type (b) words and therefore longer reaction times for these items in Phase 3 than for words of type (a) or type (c). Participants were automatically assigned by E-Prime to one of six experimental lists based on their participant number. The task took approximately 8 minutes.
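The three-way item classification in Phase 2 can be expressed as a short decision rule. The sketch below applies only to studied words. The function name and the specific practiced words and categories in the example are hypothetical: the actual assignment depended on the E-Prime list.

```python
def item_type(word, category, practiced_words, practiced_categories):
    """Assign a studied Phase 3 word to item type (a), (b), or (c)."""
    if word in practiced_words:
        return "practiced"   # (a) retrieved during Phase 2
    if category in practiced_categories:
        return "inhibited"   # (b) practiced category, but the word itself was not practiced
    return "control"         # (c) category never practiced in Phase 2


# Hypothetical list: animals and occupations practiced, vegetables not
practiced_categories = {"animal", "occupation"}
practiced_words = {"tiger", "duck", "elephant", "doctor", "teacher", "pilot"}
print(item_type("cow", "animal", practiced_words, practiced_categories))  # inhibited
```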
Simon Task
The Simon task used in the current replication was adapted from Lev-Ari and Peperkamp (2014) and measures domain-general inhibition of a prepotent response. In the task, participants were instructed to identify whether a box was red or blue by pressing “q” or “p” on a keyboard. The boxes appeared on the left, center, or right side of the screen, and participants needed to identify the color of the box while ignoring its location. Each item in the task represented one of three conditions: (i) a congruent condition, in which the location of the box matched the location of the response key (e.g., a box shown on the right-hand side of the screen requiring a right/“p” button press); (ii) an incongruent condition, in which the box appeared on the side opposite the response key (e.g., a box shown on the right-hand side of the screen requiring a left/“q” button press); and (iii) a neutral condition, in which the box was located in the center of the screen. Reaction times are expected to be faster in congruent than in incongruent conditions.
The task began with a practice block of 12 items (two cycles of each of the three conditions for each of the two colored boxes, randomly presented). During the practice block, participants received feedback (i.e., “incorrect” was displayed for any incorrect response). After the practice block, participants completed two blocks of 42 experimental items (84 items in total), which followed the same procedure as the practice block except that no feedback was provided. Within each block, items were randomized. Whether the blue box was associated with the left- or right-hand side was counterbalanced across participants. The task took approximately 4 minutes.
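The Simon design can be sketched as a full crossing of box color with screen location, with congruency determined by the side of the response key. The sketch assumes that “q” is pressed with the left hand and “p” with the right, and that each 42-item block consists of seven complete, shuffled cycles of the six color-location combinations (42 = 7 × 6). Neither assumption is stated in the text. All names are hypothetical and do not describe the E-Prime script.

```python
import random
from collections import Counter
from itertools import product

COLORS = ("red", "blue")
LOCATIONS = ("left", "center", "right")
KEY_SIDE = {"q": "left", "p": "right"}  # assumed hand/side for each key


def color_key_mapping(blue_on_left):
    """Counterbalanced color-to-key assignment."""
    return {"blue": "q", "red": "p"} if blue_on_left else {"blue": "p", "red": "q"}


def simon_condition(color, location, mapping):
    """Congruent when the box appears on the same side as its response key."""
    if location == "center":
        return "neutral"
    return "congruent" if KEY_SIDE[mapping[color]] == location else "incongruent"


def make_block(n_cycles, mapping, rng):
    """Shuffle n_cycles complete crossings of box color x screen location."""
    trials = [
        {"color": c, "location": loc, "key": mapping[c],
         "condition": simon_condition(c, loc, mapping)}
        for _ in range(n_cycles)
        for c, loc in product(COLORS, LOCATIONS)
    ]
    rng.shuffle(trials)
    return trials


rng = random.Random(42)
mapping = color_key_mapping(blue_on_left=True)
practice = make_block(2, mapping, rng)                    # 12 items (with feedback)
blocks = [make_block(7, mapping, rng) for _ in range(2)]  # 2 x 42 = 84 items
print(Counter(t["condition"] for t in blocks[0]))
```

Under these assumptions, each experimental block contains 14 congruent, 14 incongruent, and 14 neutral items.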
Stroop Task
The Stroop task used in the current study was a manual version of the task adapted from Lev-Ari and Peperkamp (2014) and represents the inhibition of a prepotent response. The task included four conditions: (i) a congruent condition in which the word semantics and ink color matched; (ii) an incongruent condition in which the word semantics and ink color conflicted; (iii) a neutral condition in which the symbol string @@@@ was presented; and (iv) a reading condition in which the word was presented in gray ink. Four colors were used: blue, red, green, and yellow.
The task began with a training component in which participants practiced matching each color to its response key (i.e., “s” for yellow, “d” for green, “k” for blue, and “l” for red). This phase included 100 trials presenting the string @@@@ 25 times in each of the four colors. Next, participants practiced the full task with 40 items. Finally, participants completed two blocks of 96 experimental items (192 items in total). Within each block, items were randomized. Participants received feedback during the training and practice sessions but not during the two experimental blocks. The task took approximately 10 minutes.
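The Stroop conditions and key mapping described above can be sketched as follows. The sketch classifies a trial from the displayed string and its ink color and builds the 100-trial key-learning phase. The response rule for gray-ink reading trials is not described in the text, so `correct_key` returns `None` for them. All names are hypothetical.

```python
import random

STROOP_KEYS = {"yellow": "s", "green": "d", "blue": "k", "red": "l"}
NEUTRAL_STRING = "@@@@"


def stroop_condition(word, ink):
    """Classify a trial from the displayed string and its ink color."""
    if ink == "gray":
        return "reading"
    if word == NEUTRAL_STRING:
        return "neutral"
    return "congruent" if word.lower() == ink else "incongruent"


def correct_key(ink):
    """Response key for an ink color (None for gray ink, which is not modeled)."""
    return STROOP_KEYS.get(ink)


def training_trials(rng):
    """Key-learning phase: @@@@ shown 25 times in each of the four colors."""
    trials = [(NEUTRAL_STRING, ink) for ink in STROOP_KEYS for _ in range(25)]
    rng.shuffle(trials)
    return trials


print(stroop_condition("RED", "blue"), correct_key("blue"))  # incongruent k
```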
Coding and analysis
The current study followed the same coding procedures as reported in DM&D. Hedges’ g effect sizes and 95% CIs were calculated using the Effect Size Calculator (https://www.cem.org/effect-size-calculator). The remaining analyses were conducted in R (Version 4.2.3; R Core Team, 2022).
Perception
Following DM&D, only the perception accuracy results are reported; however, RT information from the DMDX output was used to identify unanswered items (any RT greater than 2,500 ms; 77 of a possible 7,424 items, or 1.0%) and to score accuracy (negative RTs indicated incorrect responses). In line with DM&D, the results were screened for outliers (i.e., performance more than two SDs below the mean on the control condition), which resulted in three participants being excluded. Accuracy was calculated as an error rate for each participant based on performance on the Spanish consonant and vowel items.
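The RT-based coding of the DMDX output can be sketched as a three-way classification followed by an error rate. Two points are assumptions. First, an RT whose magnitude reaches the 2,500 ms limit is treated as unanswered, because the exact DMDX timeout coding is not given in the text. Second, the text does not say whether unanswered trials counted as errors, so this choice is exposed as a flag. The function names are hypothetical.

```python
TIMEOUT_MS = 2500


def classify_trial(rt):
    """Code one DMDX RT: negative = incorrect; |RT| at/over the limit = unanswered (assumed)."""
    if abs(rt) >= TIMEOUT_MS:
        return "no_response"
    return "incorrect" if rt < 0 else "correct"


def error_rate(rts, count_timeouts_as_errors=True):
    """Proportion of non-correct trials; optionally drop unanswered trials first."""
    codes = [classify_trial(rt) for rt in rts]
    if not count_timeouts_as_errors:
        codes = [c for c in codes if c != "no_response"]
    errors = sum(c != "correct" for c in codes)
    return errors / len(codes)


# Hypothetical Spanish-item RTs for one participant
rts = [812.3, -950.0, 2500.0, 640.1]
print(error_rate(rts), error_rate(rts, count_timeouts_as_errors=False))
```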
Production
The segmental analysis followed the same procedure as described in DM&D for both vowel and consonant targets.
Vowel Analysis
The vowel analysis compared formant movement between the monophthong and diphthong vowels, with the prediction that learners would produce the Spanish vowels /e-ei̯/ similarly. Each vowel was segmented in Praat (Boersma & Weenink, 2023) independently by both the first author and a research assistant, following a segmentation protocol (provided on OSF), to allow for comparison and checking. Using the segmented TextGrids and a Praat script modified from Brato (2016), measurement points at 20%, 50%, and 80% were identified. To analyze formant movement, average F0 (fundamental frequency), F1, and F2 values were measured in hertz over 10-ms windows centered at the 20%, 50%, and 80% locations. For normalization, these frequency values were converted to Bark using the formula provided by Baker and Trofimovich (2005, p. 9): B = 26.81/(1 + [1,960/F]) – 0.53. Vowel position was then estimated from the Bark-converted frequencies in terms of height (B1 – B0) and frontness (B2 – B1). Overall formant movement was estimated by first calculating the Euclidean distances between the 20% and 50% measurement points and between the 50% and 80% measurement points, and then summing these two distances to represent the amount of formant movement within the vowel. Finally, following DM&D, a z score was computed for each learner’s diphthong formant movement based on the mean and SD reported for the NSs (M = 3.19, SD = 0.71, p. 747).
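The normalization and formant-movement steps can be sketched directly from the formula and definitions above. The sketch converts each of F0, F1, and F2 to Bark, computes height (B1 – B0) and frontness (B2 – B1) at each measurement point, sums the two Euclidean distances, and expresses the result as a z score against the NS mean and SD from DM&D. It is not the Praat script used in the study, and the function names are hypothetical.

```python
import math


def hz_to_bark(f):
    """Bark conversion: B = 26.81 / (1 + 1960 / F) - 0.53."""
    return 26.81 / (1 + 1960 / f) - 0.53


def vowel_position(f0, f1, f2):
    """(height, frontness) = (B1 - B0, B2 - B1) in Bark."""
    b0, b1, b2 = hz_to_bark(f0), hz_to_bark(f1), hz_to_bark(f2)
    return b1 - b0, b2 - b1


def formant_movement(points):
    """points: (F0, F1, F2) in Hz at the 20%, 50%, and 80% locations."""
    p20, p50, p80 = (vowel_position(*p) for p in points)
    return math.dist(p20, p50) + math.dist(p50, p80)


NS_MEAN, NS_SD = 3.19, 0.71  # Spanish NS diphthong values reported by DM&D


def movement_z(movement):
    """z score of a learner's diphthong movement relative to the NS norms."""
    return (movement - NS_MEAN) / NS_SD
```

Because the z transformation is linear, the replication’s mean diphthong movement of 1.58 corresponds to a mean z of about –2.27.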
Consonant Analysis
The L2 Spanish consonants /d-ɾ/ were scored binarily (accurate or inaccurate) by two native speaker raters based on auditory judgments and visual inspection of acoustic characteristics (see protocol on OSF). Following DM&D, to be coded as accurate, /d/ needed to be realized as spirantized [ð] and /ɾ/ needed to be realized as a single-closure tap of short duration. With four tokens of each consonant, participants could receive a maximum score of eight. After training and group rating of two experimental files, the two raters coded the remaining files separately. They agreed on 93% of the ratings, and the remaining differences were discussed and resolved. Interrater reliability was acceptable and comparable to that of DM&D (replication κ = .92, DM&D κ = .92).
Inhibitory Control
Retrieval-Induced Inhibition
Following DM&D, participants’ results were first checked to ensure that they recalled at least two of the six items in the practiced categories. All participants in the current replication passed this check. Next, median RTs were computed for each participant and item type; the item types were compared; and an inhibitory control score was computed by dividing the median RT for inhibited items by the median RT for control (non-practiced category) items. In this way, the further the score is above 1, the stronger the inhibitory control.
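The recall check and the ratio score described above can be sketched in a few lines. The RTs in the example are hypothetical. The text does not specify whether only correct Phase 3 responses entered the medians, so the sketch simply takes whatever RTs it is given.

```python
from statistics import median


def passes_recall_check(n_recalled, minimum=2):
    """Practice-phase check: at least two of the six prompts recalled."""
    return n_recalled >= minimum


def inhibition_score(rts_by_type):
    """Median RT (inhibited) / median RT (control); scores above 1 suggest inhibition."""
    return median(rts_by_type["inhibited"]) / median(rts_by_type["control"])


# Hypothetical Phase 3 RTs (ms) for one participant
rts = {"practiced": [610, 655, 690], "inhibited": [700, 760, 820], "control": [650, 700, 720]}
print(round(inhibition_score(rts), 3))  # 1.086
```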
Simon
A Simon effect is calculated by subtracting RTs on congruent trials from RTs on incongruent trials, using correct responses only (Linck et al., 2012); participants with stronger inhibitory control are therefore expected to show smaller Simon effects. Average RTs (correct responses only) were calculated for each participant by trial type (i.e., congruent, neutral, or incongruent), and a Simon score was computed for each participant by subtracting the average RT on congruent trials from the average RT on incongruent trials.
Stroop
A Stroop effect is calculated by subtracting RTs on neutral trials from RTs on incongruent trials, using correct responses only (Gass & Lee, 2011; Lev-Ari & Peperkamp, 2014); participants with stronger inhibitory control are therefore expected to show smaller Stroop effects. Average RTs (correct responses only) were calculated for each participant by trial type (i.e., congruent, incongruent, neutral, or reading), and a Stroop score was computed for each participant by subtracting the average RT on neutral trials from the average RT on incongruent trials.
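Because the Simon and Stroop scores described above differ only in their baseline condition, one helper can compute both. This is a minimal sketch, assuming each trial is a hypothetical `(condition, rt_ms, correct)` tuple. It implements the raw mean-difference scores described here, not the log-transformed version reported in the Results.

```python
from statistics import mean


def interference_effect(trials, baseline):
    """Mean RT(incongruent) minus mean RT(baseline), correct trials only.

    baseline="congruent" gives a Simon score; baseline="neutral" a Stroop score.
    """
    trials = list(trials)

    def condition_mean(condition):
        return mean(rt for cond, rt, correct in trials if cond == condition and correct)

    return condition_mean("incongruent") - condition_mean(baseline)


# Hypothetical Simon trials; the incorrect 900 ms trial is excluded
simon = [("congruent", 480, True), ("congruent", 470, True),
         ("incongruent", 520, True), ("incongruent", 900, False), ("neutral", 500, True)]
print(interference_effect(simon, baseline="congruent"))  # 45
```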
Relationship between inhibitory control and phonological processing
In line with DM&D, the following variables were included in the partial correlation analyses: Perception scores were the error rates from the ABX perception task; consonant production was the number of accurate productions (out of eight); and vowel production was a z score of diphthong formant movement computed using the Spanish NS mean and SD from DM&D. Following DM&D and as planned in the study preregistration, two main analyses were conducted, with proficiency (X_Lex score) included as a covariate: (i) partial correlation analyses between inhibitory control and the phonological scores, and (ii) a hierarchical regression analysis using inhibitory control as a predictor of vowel perception.
The assumptions of the Pearson partial correlation analyses were checked (i.e., linear relationships among variables, normal distribution, and lack of outliers; see Supplementary Materials, S2). Assumptions were violated: The phonological variables did not appear to be normally distributed, and transforming the variables did not improve their distributions. Therefore, non-parametric Spearman partial correlation analyses were conducted using the partial_Spearman function in the PResiduals package.
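To make the idea of a Spearman partial correlation concrete, the sketch below uses the classic rank-based approach. It ranks all variables, regresses the ranks of each phonological and inhibition variable on the ranks of the covariate (e.g., X_Lex), and correlates the residuals. This is a different technique from PResiduals’ partial_Spearman, which is based on probability-scale residuals, so values will generally differ somewhat. The function names are hypothetical.

```python
import numpy as np
from scipy.stats import rankdata


def _residualize(v, covariate):
    """Residuals of v after OLS regression on an intercept and the covariate."""
    X = np.column_stack([np.ones(len(v)), covariate])
    beta, *_ = np.linalg.lstsq(X, v, rcond=None)
    return v - X @ beta


def partial_spearman(x, y, z):
    """Rank-based partial Spearman correlation of x and y, controlling for z."""
    rx, ry, rz = (rankdata(v) for v in (x, y, z))
    ex = _residualize(rx, rz)
    ey = _residualize(ry, rz)
    return float(np.corrcoef(ex, ey)[0, 1])
```

Usage would look like `partial_spearman(inhibition_scores, vowel_error_rates, xlex_scores)`, where all three are hypothetical per-participant arrays.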
Finally, the assumptions of the hierarchical multiple regression analysis were checked (e.g., linear relationships among variables, homoscedasticity of residuals, lack of influential points; see RegressionAnalysis.R code). Multiple assumptions were violated, including the presence of influential points. Neither removing these points nor transforming the variables improved the model. Therefore, the results of the regression analysis are interpreted cautiously, with reference to visual inspection of the residual plots.
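The logic of the hierarchical regression can be sketched as two nested OLS models: proficiency alone in Step 1, then proficiency plus inhibitory control in Step 2. Evidence for inhibitory control comes from the R² change and its F test. The sketch uses plain least squares rather than the study’s R code (RegressionAnalysis.R), and the names and simulated data are hypothetical.

```python
import numpy as np
from scipy.stats import f as f_dist


def r_squared(y, predictors):
    """OLS R^2 for y regressed on an intercept plus the given predictor columns."""
    X = np.column_stack([np.ones(len(y))] + [np.asarray(p, dtype=float) for p in predictors])
    beta, *_ = np.linalg.lstsq(X, y, rcond=None)
    resid = y - X @ beta
    return 1 - (resid @ resid) / np.sum((y - y.mean()) ** 2)


def hierarchical_step(y, step1, step2):
    """R^2 for each step, the R^2 change, and the F test for that change."""
    y = np.asarray(y, dtype=float)
    r2_1 = r_squared(y, step1)
    r2_2 = r_squared(y, list(step1) + list(step2))
    n, k1, k2 = len(y), len(step1), len(step1) + len(step2)
    delta = r2_2 - r2_1
    df1, df2 = k2 - k1, n - k2 - 1
    f_change = (delta / df1) / ((1 - r2_2) / df2)
    p = f_dist.sf(f_change, df1, df2)
    return r2_1, r2_2, delta, f_change, p


# Simulated data only: 58 participants, proficiency then inhibitory control
rng = np.random.default_rng(0)
prof, inhib = rng.normal(size=58), rng.normal(size=58)
vowel_error = 0.5 * prof + 0.3 * inhib + rng.normal(size=58)
print(hierarchical_step(vowel_error, [prof], [inhib]))
```

Given the assumption violations noted above, such an F test would need to be read alongside residual plots rather than on its own.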
Results
Perception
Table 3 reports the descriptive statistics for the vowel and consonant perception error rates in DM&D and the current replication.
As indicated in Table 3, the perception error rates in the current study and DM&D are numerically similar: Average vowel and consonant perception accuracy was 80.9% and 76.0%, respectively, in DM&D, and 80.8% and 83.2% in the current replication. The SDs were larger in the initial study than in the current replication, whereas the CI widths were relatively similar in both studies (about seven for both consonants and vowels), which may indicate potential outliers in the initial study. The Hedges’ g effect sizes and 95% CIs for these mean differences indicated negligible effects (with CIs including zero) for both vowel (g = –0.01 [–0.52, 0.51]) and consonant (g = 0.43 [–0.09, 0.95]) perception error rates, suggesting negligible differences between the performances of the two groups of participants.
Production
Table 4 reports the descriptive statistics for the vowel and consonant production accuracy scores from DM&D and the current replication. L2 learners in DM&D did not demonstrate clear differences in formant movement between the monophthong and diphthong vowels: M = 0.92 (0.36) compared with M = 1.18 (0.55), g = 0.55 [–0.10, 1.20]; however, L2 learners in the current replication did show differences with a small effect size, M = 1.58 (0.60) compared with M = 1.26 (0.38), g = 0.63 [0.26, 1.01]. Nevertheless, as in DM&D, Spanish L2 learners produced the diphthong /ei̯/ with much less formant movement than NSs: M = 1.58 (0.60) compared with M = 3.19 (0.71), g = 2.61 [1.65, 3.56]. It was also the case that the z score values computed for the L2 learners’ diphthong formant movement were comparable between the studies: M = –2.54 (0.78) compared with M = –2.27 (0.85), g = –0.32 [–0.85, 0.21]. Figure 1 provides a visual comparison of the formant movement in the vowels produced by L2 learners in the current study and is parallel to Figure 2 reported in DM&D (p. 758).
a Value from p. 761 L2 speakers (n = 18) included in the inhibitory control analyses.
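The z scores above can be read as distances from the native-speaker distribution. The following Python sketch illustrates this under the assumption that each learner’s formant-movement value is standardized against the NS group’s mean and SD. The reported figures are consistent with that assumption: (1.58 – 3.19)/0.71 ≈ –2.27 and 0.60/0.71 ≈ 0.85 match the current replication’s z-score mean and SD. The function name and learner values are hypothetical, not study data.

```python
from statistics import mean, stdev


def z_relative_to_reference(values, ref_mean, ref_sd):
    """Standardize each learner value against a reference group.

    Assumption: the reference mean and SD are those of the native-speaker
    group, as the reported descriptives suggest; the text does not state this.
    """
    if ref_sd <= 0:
        raise ValueError("reference SD must be positive")
    return [(v - ref_mean) / ref_sd for v in values]


# Hypothetical formant-movement values for five learners (not study data).
learner_values = [1.1, 1.6, 2.0, 1.4, 1.8]

# NS reference values reported above for the diphthong /ei̯/.
z = z_relative_to_reference(learner_values, ref_mean=3.19, ref_sd=0.71)

# A linear rescaling: mean(z) = (mean(values) - 3.19) / 0.71 and sd(z) = sd(values) / 0.71.
print(round(mean(z), 2), round(stdev(z), 2))
```

Because the transformation is linear, the group-level z mean and SD can be recovered directly from the group descriptives, which is why the two studies’ z scores can be compared without the raw data.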
For the /d-ɾ/ consonant contrast, participants in the current replication scored similarly (M = 4.50 [2.31] out of eight) to participants in DM&D (M = 4.09 [2.45]), with the effect-size 95% CI passing through zero (g = –0.17 [–0.70, 0.36]).
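The between-study comparisons above rely on Hedges’ g with a 95% CI. The article does not state which CI formula was used, so the Python sketch below uses one common large-sample approximation: a pooled SD, the small-sample correction J = 1 – 3/(4df – 1), and a normal-theory standard error. Two inputs are assumptions: the group sizes (n = 18 for DM&D, from the table note above, and n = 58 for the current replication) and the sign convention (DM&D mean minus current mean, which matches the signs of the reported g values).

```python
import math


def hedges_g(m1, s1, n1, m2, s2, n2, z_crit=1.96):
    """Hedges' g for (m1 - m2) with an approximate CI (normal approximation)."""
    df = n1 + n2 - 2
    s_pooled = math.sqrt(((n1 - 1) * s1**2 + (n2 - 1) * s2**2) / df)
    d = (m1 - m2) / s_pooled
    j = 1 - 3 / (4 * df - 1)  # small-sample bias correction
    g = j * d
    se = math.sqrt((n1 + n2) / (n1 * n2) + g**2 / (2 * (n1 + n2)))
    return g, (g - z_crit * se, g + z_crit * se)


# /d-ɾ/ production scores: DM&D (assumed n = 18) vs. current replication (n = 58).
g, (lo, hi) = hedges_g(4.09, 2.45, 18, 4.50, 2.31, 58)
print(f"g = {g:.2f} [{lo:.2f}, {hi:.2f}]")  # g = -0.17 [-0.70, 0.36]
```

With these inputs the sketch returns the values reported for the /d-ɾ/ contrast; this agreement suggests, but does not confirm, that a similar formula was used.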
Inhibitory control
Retrieval-Induced Inhibition
As indicated in Table 5, results in the current study patterned similarly to those of DM&D for retrieval-induced inhibition task performance: (i) Median RTs were shortest for the practiced items of the three item types, and (ii) although a significant difference was found between practiced and inhibited items, no significant difference was found between control and inhibited items. For the inhibitory control scores, as indicated in Table 8, participants in the current replication scored similarly (1.02 [0.23]) to participants in DM&D (1.01 [0.11]), with the effect-size CI passing through zero (g = –0.05 [–0.58, 0.48]).
b Values reported include both L2 speaker groups from DM&D, as results for the L2 Spanish only group were not reported (p. 760).
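As described in the Discussion, the retrieval-induced inhibition score compared above is the median RT for inhibited items divided by the median RT for control items. A minimal Python sketch of that computation follows. The RT values are hypothetical, and because the text does not specify whether only correct or trimmed responses enter the medians, no filtering is shown.

```python
from statistics import median


def rii_score(inhibited_rts, control_rts):
    """Retrieval-induced inhibition score: median inhibited RT / median control RT.

    A score of 1.0 means no slowing; scores above 1.0 mean inhibited items were
    retrieved more slowly than control items (read as stronger inhibition).
    """
    return median(inhibited_rts) / median(control_rts)


# Hypothetical RTs in ms for one participant (not study data).
inhibited = [820, 900, 760, 1010, 880]  # median 880
control = [840, 870, 800, 950, 860]     # median 860

print(round(rii_score(inhibited, control), 3))
```

A ratio centered near 1.0, as in both studies’ group means, is what one would expect if inhibited and control items did not differ reliably in RT.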
Simon
Mean RTs and SDs (correct responses only, 97% of total responses) for each trial type on the Simon task are presented in Table 6. RTs were longer in the incongruent condition (M = 518, SD = 180) than in the congruent condition (M = 479, SD = 147), with non-overlapping CIs.
Stroop
Mean RTs and SDs (correct responses only, 96% of total responses) for each trial type on the Stroop task are presented in Table 7. RTs were longer in the incongruent condition (M = 1,188, SD = 648) than in the neutral condition (M = 857, SD = 414), with non-overlapping CIs.
As planned in the study preregistration, Simon and Stroop effect scores were tested for normality. Because neither Simon nor Stroop scores were normally distributed (see Supplementary Materials, S3), trial-level RTs were log-transformed and median log-RTs were computed, as recommended by Linck et al. (2012). For both the Simon and Stroop effects, this yielded normally distributed scores; therefore, the log-RT-based scores were used in the partial correlation and regression analyses.
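The log-RT scoring described above can be sketched in Python as follows. Because the text does not spell out the exact scoring, the sketch assumes that the effect score is the median log-RT on incongruent trials minus the median log-RT on baseline trials (congruent for Simon, neutral for Stroop), computed per participant from correct trials only. The function and the trial records are hypothetical. A larger positive score indicates more interference, that is, weaker inhibitory control.

```python
import math
from statistics import median


def log_rt_effect(trials, baseline="congruent"):
    """Median log-RT (incongruent) minus median log-RT (baseline), correct trials only.

    `trials` is a list of (condition, rt_ms, correct) tuples; use
    baseline="congruent" for Simon and baseline="neutral" for Stroop.
    """
    def median_log(condition):
        logs = [math.log(rt) for cond, rt, ok in trials if cond == condition and ok]
        return median(logs)

    return median_log("incongruent") - median_log(baseline)


# Hypothetical Simon trials for one participant (not study data).
simon = [
    ("congruent", 470, True), ("congruent", 480, True), ("congruent", 495, True),
    ("incongruent", 500, True), ("incongruent", 520, True), ("incongruent", 610, True),
    ("incongruent", 300, False),  # error trial, excluded from the median
]
print(round(log_rt_effect(simon), 4))
```

Because the logarithm is monotonic, the median log-RT equals the log of the median RT for odd-sized sets, so the score approximates the log of the ratio of median RTs.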
Relationship between inhibitory control and phonological processing
Table 8 reports the means and SDs of the target phonological variables and three measures of inhibitory control included in the analysis. As a reminder, DM&D did not conduct an analysis for consonant perception because the L2 English group performed at ceiling on the perception task; thus, Table 8 contains “n/a” values for consonant (ABX) error rates.
For ease of interpretation, Table 9 summarizes the expected directionality of the relationships for the partial correlation analyses among the phonological variables and the tests of inhibitory control used in the current study. For example, vowel perception is operationalized as error rate on the ABX task, so a higher score corresponds to lower accuracy. For the retrieval-induced inhibition score, higher inhibitory control is associated with longer retrieval RTs, so a higher score corresponds to higher inhibitory control. Thus, an inverse or negative relationship is predicted between vowel perception scores and retrieval-induced inhibition scores if it is the case that stronger inhibitory control is related to more accurate vowel perception abilities. In contrast, for the Simon task, higher inhibitory control is associated with a lower Simon score, so a higher score corresponds to lower inhibitory control. Thus, a direct/positive relationship is predicted between vowel perception scores and Simon scores if it is the case that stronger inhibitory control is related to more accurate vowel perception abilities. The asterisks in Table 9 indicate when the expected directionality of the relationship was indeed found in the current replication study. These results are discussed in more detail in the Discussion.
* Expected directionality of the relationship found in the current study.
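The logic behind Table 9 amounts to multiplying two orientations: if both measures are coded so that a higher score means better performance or stronger inhibitory control, the predicted correlation is positive; if exactly one is reversed, it is negative. The Python sketch below encodes this rule. Orientations for the ABX error rate, retrieval-induced inhibition, and Simon scores follow the text above; those for consonant production and the Stroop score are assumptions made by analogy, and the measure names are illustrative labels only.

```python
# +1: higher score = better performance / stronger inhibitory control
# -1: higher score = worse performance / weaker inhibitory control
PHONOLOGY = {
    "vowel_abx_error_rate": -1,   # error rate: higher = less accurate
    "consonant_production": +1,   # assumed: accuracy score, higher = more accurate
}
INHIBITION = {
    "retrieval_induced": +1,      # higher ratio = stronger inhibition
    "simon": -1,                  # larger effect = weaker inhibition
    "stroop": -1,                 # assumed by analogy with Simon
}


def expected_sign(phon_measure, ic_measure):
    """Predicted sign of r if stronger inhibitory control goes with better phonology."""
    return PHONOLOGY[phon_measure] * INHIBITION[ic_measure]


def direction_as_expected(observed_r, phon_measure, ic_measure):
    """True when the observed correlation has the predicted sign (an asterisk in Table 9)."""
    return observed_r * expected_sign(phon_measure, ic_measure) > 0


print(expected_sign("vowel_abx_error_rate", "retrieval_induced"))  # -1
print(expected_sign("vowel_abx_error_rate", "simon"))              # 1
```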
Table 10 reports the results of the partial correlation analyses (Pearson for DM&D, Spearman for the current study) for comparison. DM&D reported statistically significant relationships with medium effects (r = –0.42 and 0.34) between inhibitory control and vowel perception and between inhibitory control and consonant production, such that participants with higher inhibitory control were more accurate in vowel perception and consonant production. No clear relationship was reported between vowel production and inhibitory control. In the current replication, the relationships were in the expected direction, such that higher inhibitory control corresponded to higher perception and production accuracy. Nevertheless, no statistically significant relationships were found, and the 95% CIs for the effect sizes (all below the threshold for a small effect) passed through zero, indicating negligible effects.
Note: * p < .05, ** p < .01 as reported in DM&D.
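For readers wishing to reproduce partial correlations of this kind, the dependency-free Python sketch below computes a Spearman partial correlation (a Pearson partial correlation on ranks) and an approximate 95% CI via Fisher’s z. Assumptions: a single covariate (e.g., proficiency, which the regression analysis controlled for), and the standard error 1/√(n – 3 – k) for k covariates, which is the usual Pearson approximation. Other standard-error adjustments exist for Spearman coefficients, and the article’s exact method may differ.

```python
import math


def average_ranks(values):
    """1-based ranks; tied values share the mean of the ranks they occupy."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    start = 0
    while start < len(order):
        end = start
        while end + 1 < len(order) and values[order[end + 1]] == values[order[start]]:
            end += 1
        for k in range(start, end + 1):
            ranks[order[k]] = (start + end) / 2 + 1
        start = end + 1
    return ranks


def pearson(x, y):
    mx, my = sum(x) / len(x), sum(y) / len(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def spearman_partial(x, y, covariate):
    """Spearman partial correlation of x and y controlling for one covariate."""
    rx, ry, rz = average_ranks(x), average_ranks(y), average_ranks(covariate)
    r_xy, r_xz, r_yz = pearson(rx, ry), pearson(rx, rz), pearson(ry, rz)
    return (r_xy - r_xz * r_yz) / math.sqrt((1 - r_xz**2) * (1 - r_yz**2))


def fisher_ci(r, n, k=1, z_crit=1.96):
    """Approximate 95% CI for a partial correlation with k covariates (Fisher's z)."""
    se = 1 / math.sqrt(n - 3 - k)
    z = math.atanh(r)
    return math.tanh(z - z_crit * se), math.tanh(z + z_crit * se)


# Hypothetical scores for six participants (not study data).
phon = [1, 2, 3, 4, 5, 6]
inhib = [2, 1, 4, 3, 6, 5]
prof = [1, 3, 2, 5, 4, 6]
r = spearman_partial(phon, inhib, prof)
print(round(r, 3), fisher_ci(r, n=len(phon)))
```

With a sample as small as the illustration, the CI is extremely wide; this is the same mechanism by which the larger replication sample yields narrower CIs than the initial study.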
Table 11 reports the results of the Spearman partial correlation analyses with the two additional measures of inhibitory control included in the current analysis.
As with the results for retrieval-induced inhibition, no statistically significant relationships were found, and the 95% CIs for the effect sizes (all below the threshold for a small effect) passed through zero, indicating negligible effects. Considering the directionality of the relationships, the Simon task showed the predicted relationship for the consonant targets (higher inhibitory control corresponding to more accurate consonant perception and production) but not for the vowel targets, for which higher inhibitory control was associated with less accurate vowel perception and vowel production. The Stroop task showed a third pattern: The predicted relationships emerged for the perception targets (both consonant and vowel) but not for the production targets.
Finally, results from the hierarchical multiple regression analysis are reported. Table 12 reports the results of the hierarchical multiple regression analysis from DM&D along with the values from the current replication (although recall that test assumptions were violated). DM&D reported that inhibitory control was a statistically significant predictor of vowel perception accuracy (p = .018) with a small effect (R² = 0.18; see Footnote 6) once proficiency was controlled. In the current replication, retrieval-induced inhibition was not a statistically significant predictor of vowel perception accuracy (p = .264, R² = 0.03). Figure 2 illustrates this relationship with a scatterplot of ABX error rate (vowels) against retrieval-induced inhibition, showing the fitted regression line in blue and the residuals (differences between observed and predicted values) in red.
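The hierarchical logic (proficiency entered first, then the inhibition score) can be expressed through the change in R² between steps. The Python sketch below computes this change for the two-predictor case from the three pairwise correlations, together with the F statistic for the R² change on (1, n – 3) degrees of freedom. It is an illustration under those assumptions, not a reproduction of Table 12; the correlations are hypothetical, and a p-value would additionally require the F distribution (e.g., `scipy.stats.f.sf`).

```python
def r2_two_predictors(r_y1, r_y2, r_12):
    """R^2 for y regressed on x1 and x2, from the three pairwise correlations."""
    return (r_y1**2 + r_y2**2 - 2 * r_y1 * r_y2 * r_12) / (1 - r_12**2)


def hierarchical_r2_change(r_y1, r_y2, r_12, n):
    """Step 1: x1 (e.g., proficiency). Step 2: add x2 (e.g., an inhibition score).

    Returns (R^2 at step 1, R^2 at step 2, R^2 change, F for the change with df = 1, n - 3).
    """
    r2_step1 = r_y1**2
    r2_step2 = r2_two_predictors(r_y1, r_y2, r_12)
    delta = r2_step2 - r2_step1
    f_change = delta / ((1 - r2_step2) / (n - 3))
    return r2_step1, r2_step2, delta, f_change


# Hypothetical correlations (not study data).
print(hierarchical_r2_change(r_y1=0.30, r_y2=-0.20, r_12=0.10, n=58))
```

When the two predictors are uncorrelated, the R² change reduces to the squared correlation between the outcome and the added predictor; overlap with proficiency shrinks (or, with suppression, inflates) that contribution.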
Discussion
The current study was a close replication of Darcy et al. (2016), which examined the relationship between inhibitory control and L2 speech perception and production and unexpectedly reported different relationships of inhibitory control with perception and with production. Specifically, DM&D found that learners with higher inhibitory control were more accurate in vowel perception and consonant production. No clear relationship was found between vowel production and inhibitory control, and consonant perception could not be tested. In the current replication, the partial correlation analyses did not indicate statistically significant relationships, and effect sizes were small, with 95% CIs crossing zero. Assumptions were not met for the hierarchical multiple regression analysis. At the same time, a comparison of the descriptive statistics (means and SDs) for the tests of phonological processing and retrieval-induced inhibition in DM&D and the current replication indicated negligible differences between the two participant samples in terms of effect sizes and 95% CIs.
The underlying theoretical question at the core of DM&D and the current replication concerns the extent to which L2 phonological skills are related to general cognitive abilities, specifically inhibitory control. The logic behind the hypothesized relationship is that learners with greater inhibitory control might be better at suppressing their L1 during the processing of L2 acoustic-phonetic input, ultimately developing more accurate segmental categories. These more accurate categories would, in turn, result in more accurate perception and production of segments during language use. Put another way, variability in L2 pronunciation outcomes could result from individual differences in inhibitory control.
Taken together, the findings from DM&D and the current replication reveal no strong, clear, or consistent relationship between inhibitory control and L2 perception/production skills. Several explanations could account for the discrepancies between the results of the two studies. The following possibilities are discussed in more detail here: (i) There is no, or only a weak, relationship between inhibition and L2 phonological skills; (ii) the type of inhibition tested or the specific tests used to measure inhibition were not appropriate to capture individual differences in cognitive abilities; and/or (iii) there were differences in study design features (e.g., participant samples, methodological decisions, time frame, context) that affected outcomes.
When considering the combined results of DM&D and the current replication, an important possibility is that, although inhibitory control (both retrieval-induced and prepotent response inhibition) has been shown to relate to language processing in sentence comprehension and lexical access studies (Costa & Santesteban, 2004; Filippi et al., 2012), its relationship to phonological processing may be null or, at best, weak. Even if inhibitory control affects L2 pronunciation skills, it could play a more critical role in acquisition stages or perceptual category formation; therefore, perception measures that reflect those processes might be more likely to show relationships than the measures in the current study, which reflect performance closer to an end state. In fact, an intriguing body of research provides mounting evidence for the role of domain-general auditory processing in second language learning (e.g., Saito et al., 2021, 2022), such that “the ability to precisely encode auditory input [e.g., information about frequency, duration, amplitude] may be a bottleneck for the establishment of knowledge about segmental and suprasegmental linguistic categories” (Zheng et al., 2022, p. 480). Additional evidence links auditory processing to the establishment of higher executive functions via the temporal and sequential patterns present in sound signals and supports frameworks such as the Auditory Scaffolding Hypothesis (Conway et al., 2009). As such, to better understand the relationship between inhibitory control and L2 phonological skills, a useful future replication study could test learners’ auditory processing skills and incorporate those scores in statistical models.
Another important consideration is that the specific tests used to measure inhibition might not have been appropriate to capture individual differences in cognitive abilities. Regarding the retrieval-induced inhibition task, a finding consistent across both studies was that responses to inhibited items were slower than responses to practiced items. Neither study found significant differences between inhibited and control items (recall that the inhibitory control score was computed by dividing the median RT for inhibited items by the median RT for control items). In other words, the RT difference that the inhibitory control score is designed to capture did not emerge in either study, which may indicate that the task is not sensitive enough for this population. Future work examining the role of inhibitory control in language learning and use should more thoroughly consider performance on the tasks used to measure inhibition. For instance, it is necessary to consider whether there is sufficient variability in inhibition scores. The same is true for the Stroop and Simon tasks. At first, it might seem counterintuitive that highly robust and reliable tasks like the Stroop task would be problematic for investigations of individual differences using correlational analyses. Yet, Hedge et al. (2018) provided convincing evidence that such tasks produce robust effects precisely because they show little between-participant variability, which is necessary for ranking individuals in correlational studies like the current replication. This low between-participant variability translates into low reliability of individual difference scores, which in turn reduces statistical power, meaning that the sample sizes needed to detect relationships should likely be two to three times larger than those calculated without taking task reliability into account (Hedge et al., 2018). For the current study, that would mean about 180 participants. This might not be feasible in all cases, but open data sharing in the field would allow researchers to combine data sets over time.
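The reasoning about reliability and sample size can be illustrated with two textbook formulas: Spearman’s attenuation formula (an observed correlation shrinks by the square root of the product of the two measures’ reliabilities) and the Fisher-z approximation to the sample size needed to detect a correlation. The Python sketch below uses a hypothetical true correlation of .30 and hypothetical reliabilities of .60 and .80; it shows the direction and rough size of the inflation, not the calculation behind the estimate of about 180 participants given above.

```python
import math
from statistics import NormalDist


def attenuated_r(true_r, rel_x, rel_y):
    """Expected observed correlation when both measures have imperfect reliability."""
    return true_r * math.sqrt(rel_x * rel_y)


def required_n(r, alpha=0.05, power=0.80):
    """Approximate N to detect correlation r (two-tailed test) via Fisher's z."""
    z_alpha = NormalDist().inv_cdf(1 - alpha / 2)
    z_beta = NormalDist().inv_cdf(power)
    return math.ceil(((z_alpha + z_beta) / math.atanh(r)) ** 2 + 3)


true_r = 0.30                                              # hypothetical
observed_r = attenuated_r(true_r, rel_x=0.60, rel_y=0.80)  # hypothetical reliabilities

print(required_n(true_r))  # 85
print(round(observed_r, 3), required_n(observed_r))
```

Under these hypothetical values the required sample roughly doubles, in line with the two- to threefold increase discussed above.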
The current replication primarily explored whether L2 phonological processing would be related to tests of inhibitory control designed to measure prepotent response inhibition. Specifically, it sought to explore whether the Simon and Stroop tasks would show a stronger relationship to production measures, given the possibility that production might additionally require the inhibition of a dominant motor response (i.e., the articulation of an L1 segment). Similar to the findings for retrieval-induced inhibition, no significant relationships were found, and effect sizes were small, with 95% CIs crossing zero. When the directionality of the results is considered, different patterns emerged in comparison to retrieval-induced inhibition: Whereas the expected directionality was present for perception and production of both consonants and vowels for the retrieval-induced inhibition task, it appeared only for the consonant measures (both perception and production) for the Simon task and for the perception measures (both consonant and vowel) for the Stroop task. In their study, DM&D summarized their findings by stating, “In sum, the relationship between inhibition and perception appeared stronger than the relationship between inhibition and production” (p. 764). One interesting complication, however, is that the pattern of relationship strengths was not identical for vowel and consonant targets. The stronger relationship noted for perception was based only on the vowel perception results, as consonant perception data could not be analyzed in the initial study. Thus, it is possible that the relationships between inhibitory control and L2 perception and production differ depending on whether the target feature is a vowel or a consonant. It is not immediately clear why this might be the case.
One possibility is that what matters is not consonant versus vowel per se, but rather the relative difficulty of perceiving and/or producing the target segment. Another possible explanation is methodological: Vowel and consonant accuracy were measured differently on the production task in the two studies. As acknowledged by DM&D, the formant analysis might allow for more individual variation than the consonant analysis, which was scored binarily, out of eight. At the same time, error rates for consonant and vowel perception would have resulted in comparable score types.
A final consideration relates to differences in study design features that might have affected outcomes. One obvious difference between the studies is that the sample sizes differed (n = 18, 19, 34 in DM&D and n = 58 in the current replication). The potentially underpowered initial study may have been at increased risk of Type I or Type II errors (Loewen & Hui, 2021). One benefit of a larger sample is that it typically yields narrower confidence intervals for effect sizes. As the field continues to improve its reporting practices, it is critical for researchers to report effect-size CIs to allow for comparison in future replications (see Tables 10 and 11). Although the correlation analyses did not yield similar effects, a comparison of the descriptive statistics indicated comparability between the two studies in the outcomes of the phonological measures and the retrieval-induced inhibition task. One difference between the current replication and the initial study is that the correlation analyses in the initial study combined phonological (and inhibition) scores from two different L2 groups. In the current study, the descriptive statistics for the phonological scores were comparable to those of the L2 Spanish group in the initial study, and the inhibition scores were comparable to those of the combined group used in the correlation analysis. Nevertheless, the fact that the initial study’s correlation analyses included data from an L2 English group (whose phonological descriptives may have differed from the current sample) may have affected the outcomes.
In sum, the combined evidence from DM&D and the current replication does not provide consistent evidence of a strong relationship between inhibitory control and L2 perception/production skills. It could be that this relationship is mediated by general auditory processing skills and/or that our tests used to measure inhibition are not appropriate to capture individual differences. Clearly, we have much more to investigate to grow our understanding of the links between cognitive mechanisms and L2 phonological processing. It is hoped that the openly shared materials, data, and analysis code from the current study will prove useful for future replications in this line of research.
Supplementary material
The supplementary material for this article can be found at https://doi.org/10.1017/S0272263124000238.
Acknowledgments
I am grateful to Isabelle Darcy, Joan C. Mora, and Danielle Daidone for their support in the planning stages of this project. I would also like to thank Shiri Lev-Ari and Thorsten Brato for sharing their study materials and Praat script, respectively. I am also thankful to the participants, Irene Soto-Lucena, and Kevin McManus. Finally, I would like to thank SSLA handling editor Kazuya Saito and the three anonymous reviewers for their valuable feedback on this work.
Competing interest
The author declares no competing interests.
Data availability statement
The experiment in this article earned Open Data and Materials badges for transparent practices. The data and materials are available at https://osf.io/fxzvj/. The preregistration is available at: https://osf.io/w4gj2.