INTRODUCTION
Bilinguals often navigate complex communicative situations in which they produce and perceive speech in both of their languages. This necessarily entails switching between language-specific phonetic targets (e.g., Flege & Eefting, 1987b) as well as phonological rules (e.g., Simon, 2010), often in real time. For instance, English and Spanish both contrast bilabial stops based on voicing, /b/-/p/, though the phonetic realizations differ in each language. English distinguishes between short-lag and long-lag stops, whereas Spanish distinguishes between prevoiced and short-lag stops. Bilingual speakers regulate fine-grained phonetic differences such as this by seamlessly adjusting to the ambient language. Numerous studies demonstrate that early and late bilinguals develop language-specific phonetic categories in speech production (Flege & Eefting, 1987b, among many others). There is a growing body of research showing that early and late bilinguals also employ language-specific perceptual categorization routines (Antoniou et al., 2012; Elman et al., 1977; Hazan & Boulakia, 1993, among others). Specifically, perception of the incoming acoustic signal seems to be modulated by the language the bilingual listener believes they are hearing. The listener’s beliefs can derive directly from the acoustic signal (e.g., Casillas & Simonet, 2018; Gonzales & Lotto, 2013), and recent studies suggest they can also be cued conceptually by tapping into listeners’ knowledge about which language is spoken in different language contexts (Gonzales et al., 2019; Yazawa et al., 2019).
That is, bilinguals modify their perceptual categorization routines in accordance with the phonetic realizations of the language they believe they are hearing. It is unclear how this dynamic perceptual categorization develops and whether it can be conceptually cued in late bilinguals. In the present study, a conceptual replication of Gonzales et al. (2019), we tested the extent to which adult second language (L2) learners of Spanish modulated their identification of a resynthesized voicing continuum based on the language they believed they were hearing.
BACKGROUND AND MOTIVATION
Previous studies have explored the notion that bilinguals develop independent phonetic (sub)systems between which they are able to switch as a function of the communicative setting (Bohn & Flege, 1993; Caramazza et al., 1973; Caramazza et al., 1974; Casillas & Simonet, 2018; Elman et al., 1977; Flege & Eefting, 1987a; Garcia-Sierra et al., 2009; Garcia-Sierra et al., 2012; Gonzales et al., 2019; Gonzales & Lotto, 2013; Hazan & Boulakia, 1993; Williams, 1977, 1979). These “language set” experiments have attempted to demonstrate language-specific bilingual speech production and perception behavior by manipulating language mode (Grosjean, 2001b). According to Grosjean (2001a, 2001b), bilingual language modes refer to the level of activation of an individual’s languages at any given moment. This framework posits a continuum between a monolingual mode, in which only one of the languages is activated, and a bilingual mode, which supposes dual activation of the languages. It is uncontroversial that bilinguals produce language-specific phonetic targets in unilingual testing situations (e.g., Flege & Eefting, 1987b), and cross-linguistic interactions are attested when the communicative setting activates both languages (e.g., Simonet, 2014).
There is an increasing amount of evidence indicating that bilinguals also adjust their perceptual categorization routines based on the language context in monolingual (Garcia-Sierra et al., 2009, 2012; Gonzales et al., 2019; Gonzales & Lotto, 2013, among others) and bilingual (Casillas & Simonet, 2018) modes. This has been referred to as the double phonemic boundary effect.
A recent line of research has empirically tested the double phonemic boundary effect by manipulating language context through the acoustic properties of the stimuli presented to the participants (Casillas & Simonet, 2018; Gonzales et al., 2019; Gonzales & Lotto, 2013); that is, language context was cued perceptually. For instance, Gonzales and Lotto (2013) investigated how bilinguals perceptually accommodate the fine-grained phonetic differences in how stop consonants are pronounced in English versus Spanish. This language pair has the same voicing distinction at bilabial, coronal, and velar places of articulation; however, the phonetic realizations differ with regard to voice-onset time (VOT), the duration between the release of the stop consonant and the onset of voicing (Lisker & Abramson, 1964). While Spanish contrasts phonetically voiced /bdg/ with phonetically voiceless /ptk/, English implements the same contrasts as a difference between short-lag and long-lag stops. In Gonzales and Lotto (2013), early Spanish–English bilinguals identified resynthesized VOT continua of pseudowords (“bafri” and “pafri”). Language context was cued conceptually through the instructions, which indicated which language participants would hear, and perceptually through the acoustic properties of the target word endings. The English-specific final syllable [fɹi] was appended to the stimuli to create an “English-like” VOT continuum. A “Spanish-like” version was created by appending [fri] to the same continuum. Importantly, the continua were identical regarding the acoustic properties of the stop segment. Using a between-subjects design, Gonzales and Lotto (2013) found that early bilinguals’ perceptual identification depended on the language-specific continua.
That is, the bilinguals who heard the English-like continuum displayed identification functions typical of English speakers, and the bilinguals who heard the Spanish-like continuum produced more “voiceless” responses, essentially shifting the identification function to the left in a manner consistent with the phonetic realizations of Spanish stop voicing. Figure 1 depicts English and Spanish monolinguals’ typical voicing identification functions consistent with English and Spanish phonemic boundaries.
The findings suggest that bilinguals adjust perceptual categorization across language contexts by switching between language-specific phonetic systems. Gonzales and Lotto (2013) questioned whether the same degree of phonetic (sub)system separation was also present in adult L2 learners. Models of L2 speech suggest that learners’ difficulties with nonnative segments and contrasts can be explained by their acoustic similarities to, and differences from, L1 phonology. The Second Language Linguistic Perception (L2LP) model (Escudero, 2005; Van Leussen & Escudero, 2015) posits that during the initial stages of L2 learning a copy of the L1 perception grammar is made (Full Copying hypothesis), which then develops independently of the L1 grammar. With exposure to the new language, adjustments are made to the L2 perception grammar through a comparison module, the Gradual Learning Algorithm (GLA). The L2LP proposes three distinct learning scenarios. If the learner perceives the contrast as novel (“new scenario”), the L2LP contends that a new phonetic category must be formed. If the learner perceives the contrast as familiar (“similar scenario”), (s)he must then reset the boundary between the acoustic characteristics of this contrast using the GLA. Finally, a “subset” scenario may occur when the L2 has a smaller phonemic inventory and a single L2 category is perceptually assimilated to multiple L1 categories. The Spanish/English voicing contrasts are considered a “similar scenario”; thus, L2 learners of Spanish must adjust their voicing boundaries to accurately produce and perceive Spanish stops. Importantly, the model proposes that, with exposure, learners begin to selectively activate L1/L2 perception grammars during speech perception.
Language activation can be triggered by a range of linguistic and extralinguistic variables, such as the language of instruction or the language a given task requires (see Escudero, 2005; Yazawa et al., 2019).
In effect, the L2LP provides a theoretical framework that incorporates bilingual language modes and predicts that the double phonemic boundary effect should also occur in adult learners under both perceptual and conceptual cueing. Casillas and Simonet (2018) found that both early and late bilinguals showed mode-specific perceptual normalization criteria in conditions of rapid, random mode switching. Importantly, the study showed that bilinguals can exploit language-specific perceptual processes in real time and that this ability appears to be modulated by language proficiency in late learners. Thus, there is evidence for phonemic boundary shifts perceptually cued by the acoustic signal in early and late bilinguals. However, it is still unclear how this ability develops in adult learners. Casillas and Simonet (2018) assessed self-reported proficiency with a composite measure that suggested the sample was limited to low-intermediate/intermediate learners. It remains to be seen whether L2 perceptual processes, such as the double phonemic boundary effect, develop in conjunction with standardized measures of L2 proficiency. There is evidence suggesting that L2 phonological development unfolds in a monotonic relationship with L2 vocabulary size (Bundgaard-Nielsen et al., 2012), though some studies suggest phonetic category development begins at an early stage of learning (Munro et al., 2013; Williams, 1979).
A related question concerns the type of cueing used to elicit the double phonemic boundary effect in L2 learners. Gonzales and Lotto (2013) used perceptual and conceptual cueing, and Casillas and Simonet (2018) used only perceptual cueing. Gonzales et al. (2019) tested for the double phonemic boundary effect using only conceptual cueing by presenting Spanish–English and French–English bilinguals with a truncated version of the “bafri”/“pafri” VOT continuum. The stimuli were not language specific. Language context was manipulated conceptually through the instructions, which informed the participants that they were listening to incomplete utterances of two rare words constituting a /b/-/p/ minimal pair in English or in their other language (e.g., for Spanish–English bilinguals, a Spanish or English bafri-pafri minimal pair). Bilinguals were asked to indicate, on each trial, which “word” the speaker was beginning to say. The continuum was exactly the same in both language contexts, ranging from the beginning portion of one member of the minimal pair (e.g., “ba”) to that of the other member (e.g., “pa”). Thus, the contexts were in no way cued differentially by the acoustic properties of the continuum tokens. Nonetheless, both groups displayed different perceptual categorization routines corresponding to the language they believed they were hearing.
In sum, previous research suggests bilinguals can maintain some degree of separation between sound systems. Language-specific perceptual categorization can be perceptually cued in both early and late bilinguals (Casillas & Simonet, 2018; Gonzales & Lotto, 2013) and conceptually cued in early bilinguals (Gonzales et al., 2019). We build on this line of research by testing the conceptual-cueing hypothesis in adult L2 learners.
THE PRESENT STUDY
This research was a conceptual replication of Gonzales et al. (2019), who provided evidence for conceptually cued language-specific perceptual categorization routines in early Spanish/English and French/English bilinguals. Concretely, we extended the conceptual-cueing hypothesis to late bilinguals by following similar testing procedures and using the same auditory stimuli as Gonzales et al. (2019). Crucially, we tested whether the double phonemic boundary effect can be conceptually cued in adult L2 learners. We operationalize conceptual cueing as the imparting of conceptual knowledge about which language will be spoken in different language contexts. Our language contexts, Spanish and English, differed from each other solely in terms of the conceptual content of the task instructions. Our study differed from Gonzales et al. (2019) in that we employed a within-subjects design and assessed how L2 proficiency affects perceptual categorization routines. Specifically, this work addressed the following research questions:
1. Can late bilinguals be conceptually cued to employ language-specific perceptual categorization routines?
2. If so, how is perceptual categorization modulated by L2 proficiency?
Following the L2LP, we hypothesize that target language boundary adjustments for the Spanish voicing contrast will develop such that conceptually cued language-specific perceptual categorization is possible. Specifically, the model predicts that the L2 perception grammar will develop with increased L2 exposure; therefore, adult L2 learners of Spanish should display evidence for language-specific phonemic boundaries as proficiency in Spanish increases. This research fills a gap in the literature by adding to our knowledge about sound representation in the bilingual mind and how it develops in conjunction with target language proficiency in adult learners.
METHOD
PARTICIPANTS
A total of 169 individuals completed a two-alternative forced choice (2AFC) task in which the identity of a pseudoword VOT continuum was categorized in separate testing sessions. We recruited 139 participants, undergraduate students enrolled in Spanish courses and graduate-level instructors, from a university in the US Northeast. Students received course credit for their time. We also recruited 30 monolingual English speakers using the Prolific.ac online experimental platform. These participants received US$5 for completing the experiment. The pool of online-recruited participants was filtered using criteria set in Prolific.ac to ensure participants self-reported as being monolingual English speakers born, raised, and currently living in the United States with no knowledge of any languages other than English. They reported no hearing difficulties and were required to use headphones on a desktop computer. Upon beginning the experiment, all participants responded to the following screening questions: 1) Was English your first language?, 2) Did you start learning Spanish at the age of 13 or older?, and 3) Do you have significant knowledge of any languages besides English and Spanish? We excluded data from any participant responding “no” to (1) or (2), or “yes” to (3). Participants responding categorically across all trials were also excluded. Based on these criteria, the final dataset included 119 participants.
We administered the Lexical Test for Advanced Learners of Spanish (LexTALE-ESP; Izura et al., 2014; Lemhöfer & Broersma, 2012) to provide a standardized assessment of the participants’ proficiency/vocabulary size in Spanish. On this measure, scores can range from −20 to 60, with native speaker values generally above 50. Individuals with little or no knowledge of Spanish typically score from −20 to 0. For the purposes of the present study, proficiency is treated as a continuous variable; thus, we consider a monolingual English speaker to have little to no proficiency in Spanish. Figure 2 plots the LexTALE data. Participants had a wide range of scores (Min. = −18, Max. = 56), suggesting all proficiency levels were likely represented in the sample. The mean value was 14.60 (95% HDI: [11.48, 17.89]). The horizontal lines in Figure 2 indicate scores located at 1 and 2 standard deviations from the mean (SD = 17.75).
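As a quick check on the reference lines in Figure 2, the reported mean and SD can be used to move between raw LexTALE-ESP scores and the standardized z-LexTALE scale used in the analyses below. This is a minimal sketch: the helper names are hypothetical, and it assumes the standardization used the sample mean (14.60) and SD (17.75) reported above.

```python
# Sample statistics reported for the LexTALE-ESP scores (assumed to be the
# values used when standardizing proficiency).
LEXTALE_MEAN = 14.60
LEXTALE_SD = 17.75


def z_lextale(score: float) -> float:
    """Convert a raw LexTALE-ESP score to a z-score (hypothetical helper)."""
    return (score - LEXTALE_MEAN) / LEXTALE_SD


def raw_lextale(z: float) -> float:
    """Convert a z-score back to the raw LexTALE-ESP scale (hypothetical helper)."""
    return LEXTALE_MEAN + z * LEXTALE_SD


if __name__ == "__main__":
    # Raw scores at -2, -1, 0, +1 and +2 SD from the mean.
    for z in (-2, -1, 0, 1, 2):
        print(f"z = {z:+d}: LexTALE = {raw_lextale(z):.2f}")
```

Under these assumptions, the ±1 SD lines fall at about −3.15 and 32.35, and the ±2 SD lines at about −20.90 and 50.10.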
AUDITORY STIMULI
Instructions
Spoken instructions (duration = 35 seconds) were recorded in English by a 25-year-old female Spanish–English simultaneous bilingual. Following Gonzales et al. (2019), we created separate instructions for each language context. Prerecorded oral instructions conceptually cued the language context by informing participants they were going to hear fragments of rare words in the context-relevant language (Spanish or English), spoken by a native speaker of that language. Importantly, the audio contained no spoken instances of “bafri” or “pafri,” though these forced choice alternatives were presented in orthographic form (see the following text). We used Praat (Boersma & Weenink, 2018) to swap in spoken instances of the words “English” and “Spanish” according to the language context, thus creating separate recordings of the instructions that were identical in content and acoustic information in all ways except for the words “English” and “Spanish.” To illustrate, the instructions began with a sentence such as “In the following experiment you will listen to rare words in X,” where “X” was either “English” or “Spanish” in accordance with the language context. Thus, our language contexts, Spanish and English, differed from each other solely in terms of the conceptual content of the task instructions. We recorded the instructions in a quiet room using an AKG C520 head-mounted condenser microphone. The signal was digitized at 44.1 kHz with 16-bit quantization using a Sound Devices USBPre 2 audio interface.
Voicing Continuum
We used the same voicing continuum described in Gonzales et al. (2019), which was a slightly modified version of the stimuli described in Gonzales and Lotto (2013). For the sake of completeness, we provide an overview of how the stimuli were created. As the 2AFC task required that participants identify whether the speaker was beginning to say “bafri” or “pafri,” the voicing continuum consisted of bilabial stop tokens that served as plausible representations of Spanish and English segments. The continuum was created in Praat (Boersma & Weenink, 2018) and consisted of 14 stop + [af] sequences that varied in VOT from −35 to 35 ms in 5 ms increments (excluding a 0 ms step). An early Spanish–English bilingual produced a “pafri” token that was manipulated to create the other steps of the continuum. Specifically, the token was stripped of the [ɾi] segments and the voiceless interval of the stop, not including the release burst. Lead voicing was added in 5 ms increments to create seven prevoiced tokens. Voiceless intervals were added in 5 ms increments to create seven lag tokens.
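The step structure of the continuum can be made concrete with a short sketch that lists the 14 VOT values and the manipulation each one implies. It only enumerates the design described above; the function name is hypothetical, and the actual tokens were built in Praat, not in Python.

```python
def vot_continuum(step_ms: int = 5, extent_ms: int = 35) -> list[tuple[int, str]]:
    """Return (VOT in ms, manipulation) pairs for the bafri/pafri continuum.

    Negative values are prevoiced tokens (lead voicing added before the burst);
    positive values are lag tokens (voiceless interval added after the burst).
    There is no 0 ms step.
    """
    steps = []
    for vot in range(-extent_ms, 0, step_ms):           # -35, -30, ..., -5
        steps.append((vot, f"add {-vot} ms lead voicing"))
    for vot in range(step_ms, extent_ms + 1, step_ms):  # 5, 10, ..., 35
        steps.append((vot, f"add {vot} ms voiceless interval"))
    return steps


if __name__ == "__main__":
    for vot, manipulation in vot_continuum():
        print(f"{vot:+4d} ms: {manipulation}")
```

Skipping 0 ms is what makes the count come out to 7 prevoiced plus 7 lag tokens.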
Procedure
Participants completed the experiment in separate sessions that varied according to language context (Spanish, English). Session one consisted of the first iteration of the 2AFC task, followed by the LexTALE task. In session two, participants completed the second iteration of the 2AFC task. The temporal order of the two language contexts was counterbalanced across participants. There was a minimum of 1 hour between testing sessions. In both sessions, participants received general instructions before beginning. All interactions between participants and the experimenter took place in English, regardless of language context. PsychoPy3 (Peirce et al., 2019) presented the instructions and all stimuli. Participants received computer-based instructions in written and aural form simultaneously. In the 2AFC task, the name of the language corresponding to the language context (Spanish, English) appeared at the top of the screen. Each trial began with the appearance of a fixation cross. After 500 ms, the target words “bafri” and “pafri” appeared on either side of the screen. At the same time, a randomly drawn continuum token was delivered binaurally through headphones. Participants were instructed to press the left or right arrow key to indicate whether the speaker was beginning to say the “rare word” located on the left or right side of the screen. A key press ended each trial. Participants were instructed to respond as quickly and as accurately as possible. There were four initial practice trials drawn from the extremes of the continuum. During experimental trials, stimuli were drawn randomly in 10 separate blocks (14 steps × 10 blocks = 140 responses per participant). The same procedure was followed in both sessions. PsychoPy3 (Peirce et al., 2019) also presented the stimuli in the LexTALE lexical decision task.
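The experimental trial structure (10 blocks of the 14 continuum steps) can be sketched as below. This assumes that each block presents every step exactly once in a freshly shuffled order, which is one reading of “drawn randomly in 10 separate blocks”; practice trials are omitted, the function name and seed handling are hypothetical, and the experiment itself was run in PsychoPy3.

```python
import random

# The 14 VOT steps (ms) of the continuum: -35..-5 and 5..35, no 0 ms step.
STEPS = [v for v in range(-35, 36, 5) if v != 0]


def trial_order(steps=STEPS, n_blocks=10, seed=None):
    """Return the experimental trial sequence for one session.

    Assumption: each block contains every continuum step once, shuffled
    independently, so 14 steps x 10 blocks = 140 trials.
    """
    rng = random.Random(seed)
    order = []
    for _ in range(n_blocks):
        block = list(steps)   # copy so the shared step list is never shuffled in place
        rng.shuffle(block)
        order.extend(block)
    return order


if __name__ == "__main__":
    trials = trial_order(seed=1)
    print(len(trials))  # 140
```

Blocking in this way guarantees that every step is heard equally often while keeping the order unpredictable within each block.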
The Spanish target words appeared in the center of the screen and participants used the keyboard to indicate whether they believed the targets were fake (0) or real (1) words. The experiment lasted approximately 75 minutes (session one: 10 minutes, break: 60 minutes, session two: 5 minutes).
Statistical Analyses
We report two primary statistical analyses. First, we analyzed the 2AFC identification data using a Bayesian multilevel logistic regression model. Second, we analyzed participants’ perceptual boundaries using the random effects from the first model. The analyses were conducted in R (R Core Team, 2018, version 4.0.0). Both models were fit in Stan (Stan Development Team, 2018) via the R package brms (Bürkner, 2017).
In the first analysis, the participants’ responses, “bafri”/“pafri,” were modeled as a function of the fixed effects VOT, context (English, Spanish), LexTALE score (henceforth z-LexTALE), context order (English first, Spanish first), and all higher order interactions (excluding context order). The outcome was modeled with a binomial likelihood, and the probability of making a “pafri” versus “bafri” response (coded as 1 vs. 0) was mapped to the logistic space using a logit link function. The continuous predictors were standardized and the categorical predictors were deviation coded (Spanish = −0.5, English = 0.5; Spanish first = −0.5, English first = 0.5). The random effects structure included by-subject intercepts with random slopes for VOT, context, z-LexTALE, and the corresponding higher order interactions. The model included regularizing, weakly informative priors (Gelman et al., 2017), which were normally distributed and centered at 0 with a standard deviation of 5 for all population-level parameters. We established a region of practical equivalence (ROPE) of ±0.05 around a point null value of 0 (see Kruschke, 2018).
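To make the coding scheme concrete, the sketch below builds the fixed-effect predictors for a single trial and combines them into a linear predictor on the log-odds scale. It covers only the population-level part of the model (no by-subject random effects), includes context order as a main effect only, and uses hypothetical function names and coefficient keys; the actual model was fit in brms, not with this code.

```python
# Deviation codes stated in the text.
CONTEXT_CODES = {"spanish": -0.5, "english": 0.5}
ORDER_CODES = {"spanish_first": -0.5, "english_first": 0.5}


def fixed_effects_row(vot_z, context, lextale_z, order):
    """Population-level predictors for one trial (hypothetical helper).

    vot_z and lextale_z are standardized; context and order are deviation coded.
    Context order enters as a main effect only.
    """
    c = CONTEXT_CODES[context]
    return {
        "intercept": 1.0,
        "vot": vot_z,
        "context": c,
        "lextale": lextale_z,
        "order": ORDER_CODES[order],
        "vot:context": vot_z * c,
        "vot:lextale": vot_z * lextale_z,
        "context:lextale": c * lextale_z,
        "vot:context:lextale": vot_z * c * lextale_z,
    }


def linear_predictor(row, beta):
    """Log-odds of a "pafri" (voiceless) response; missing coefficients count as 0."""
    return sum(beta.get(name, 0.0) * value for name, value in row.items())
```

With this coding, a three-way coefficient of −0.74 contributes +0.37 log-odds per unit of vot_z × lextale_z in the Spanish context (−0.74 × −0.5) and −0.37 in the English context, which is why a negative estimate means the VOT-by-proficiency effect is stronger in the Spanish context.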
In the second analysis, we used the posterior of the random effects estimates of the aforementioned model to calculate a distribution of plausible points at which the probability of responding “voiceless” was equal to 50%. The average 50% crossover boundary was calculated for each participant in each context. Next, we established a by-subject phonemic boundary effect by subtracting the Spanish boundary from the English boundary, which was then standardized to have a mean of 0 and a standard deviation of 0.5. We used Bayesian linear regression to analyze the phonemic boundary effect as a function of z-LexTALE. The model was fit using a Student’s t likelihood with a Gamma(4, 1) prior on the degrees-of-freedom parameter ν. The priors for β0 and β1 were normally distributed with a mean of 0 and a standard deviation of 1 (Normal(0, 1)), and the prior for σ was a Cauchy distribution centered at 0 with scale 1. We established a ROPE of ±0.025 around a point null value of 0 (see Kruschke, 2018).
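The boundary computation can be sketched as follows. On the logit scale, a participant’s identification function in a given context is a line a + b·VOT_z, so P(“pafri”) = 0.5 where that line crosses 0, at VOT_z = −a/b; that value is then mapped back to milliseconds. In this sketch, a and b are assumed to be per-draw arrays that already combine the population-level and by-subject terms for one participant and context, and vot_mean/vot_sd are assumed to be the values used to standardize VOT before fitting. The function names are hypothetical.

```python
import numpy as np


def crossover_ms(intercept, slope, vot_mean, vot_sd):
    """VOT (ms) where P("pafri") = 0.5, computed for each posterior draw.

    Solves intercept + slope * vot_z = 0 for vot_z, then undoes the z-scaling.
    """
    intercept = np.asarray(intercept, dtype=float)
    slope = np.asarray(slope, dtype=float)
    return vot_mean + vot_sd * (-intercept / slope)


def boundary_effect(english_draws, spanish_draws):
    """Average English boundary minus average Spanish boundary (one participant)."""
    return float(np.mean(english_draws) - np.mean(spanish_draws))


def scale_to_half_sd(effects):
    """Center the by-subject effects at 0 and rescale them to SD 0.5 (divide by 2 SDs)."""
    x = np.asarray(effects, dtype=float)
    return (x - x.mean()) / (2.0 * x.std(ddof=1))
```

A positive boundary effect means the English boundary lies at a longer VOT than the Spanish one, that is, the Spanish identification function is shifted to the left.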
Both models were fit with 2,000 iterations (1,000 warm-up). Hamiltonian Monte Carlo sampling was carried out with 6 chains. For all models, we report mean posterior point estimates for each parameter of interest, along with the 95% highest density interval (HDI), the percentage of the HDI contained within the ROPE, and the maximum probability of effect (MPE). We consider a posterior distribution for a parameter β in which 95% of the HDI falls outside the ROPE and the MPE is high (i.e., close to 1) as compelling evidence for a given effect. See the online supplementary materials for more information.
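The posterior summaries can be computed directly from MCMC draws. The sketch below uses one common set of definitions: the HDI as the shortest interval containing 95% of the draws, the ROPE statistic as the proportion of draws inside that HDI that also fall inside the ROPE, and the MPE as the share of draws on the dominant side of 0. These are assumptions about how the summaries were computed (R packages such as bayestestR offer similar functions); the function names here are hypothetical, and the simulated draws are for illustration only.

```python
import numpy as np


def hdi(draws, prob=0.95):
    """Shortest interval containing `prob` of the posterior draws."""
    x = np.sort(np.asarray(draws, dtype=float))
    n = len(x)
    k = int(np.ceil(prob * n))              # number of draws the interval must span
    widths = x[k - 1:] - x[: n - k + 1]     # width of every candidate interval
    i = int(np.argmin(widths))
    return float(x[i]), float(x[i + k - 1])


def rope_proportion(draws, half_width=0.05, prob=0.95):
    """Proportion of the draws inside the HDI that fall in [-half_width, half_width]."""
    x = np.asarray(draws, dtype=float)
    lo, hi = hdi(x, prob)
    in_hdi = x[(x >= lo) & (x <= hi)]
    return float(np.mean(np.abs(in_hdi) <= half_width))


def mpe(draws):
    """Maximum probability of effect: share of draws on the dominant side of 0."""
    x = np.asarray(draws, dtype=float)
    return float(max(np.mean(x > 0), np.mean(x < 0)))


if __name__ == "__main__":
    rng = np.random.default_rng(2020)
    draws = rng.normal(loc=0.16, scale=0.065, size=6000)  # simulated, not real output
    print(hdi(draws), rope_proportion(draws, half_width=0.025), mpe(draws))
```

Passing half_width=0.05 for the identification model and 0.025 for the boundary-effect model matches the two ROPEs stated above; 6 chains × 1,000 post-warm-up iterations gives 6,000 draws per parameter.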
RESULTS
Figure 3 summarizes the posterior distribution of the omnibus model, illustrating point estimates with 70% and 95% HDIs in graphical form. Table 1 of the supplementary materials provides a numeric summary of the posterior distribution. With all predictors held at 0, the log odds of responding “voiceless” were −0.85, corresponding to a probability of approximately 29.94% (β = −0.85, HDI = [−1.00, −0.68], ROPE = 0, MPE = 1), but, unsurprisingly, the likelihood of voiceless responses increased as VOT increased (β = 3.55, HDI = [3.19, 3.93], ROPE = 0, MPE = 1). At the mean proficiency level (x̄ LexTALE = 14.60, z-LexTALE = 0), participants’ responses differed as a function of language context (β = −0.23, HDI = [−0.37, −0.07], ROPE = 0, MPE = 1). Specifically, holding VOT and proficiency constant at 0, participants were more likely to respond “pafri” in the Spanish context. We found no evidence that the order in which the language contexts were presented affected participants’ responses (β = −0.15, HDI = [−0.39, 0.07], ROPE = 0.16, MPE = 0.9). There was, however, evidence that the probability of responding “voiceless” increased in the Spanish context for participants with higher LexTALE scores, above and beyond the effect of VOT. That is, conditional on the model, the data, and our prior assumptions, there was compelling evidence for a three-way interaction between VOT, language context, and z-LexTALE (β = −0.74, HDI = [−1.35, −0.06], ROPE = 0, MPE = 0.99). Concretely, “voiceless” responses increased as VOT increased, and this effect was compounded in the “Spanish” condition, but only for participants with higher proficiency.
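The conversion from log-odds to probability is the inverse logit. The sketch below reproduces the 29.94% figure from the intercept and, as an illustration only, adds the context estimate (Spanish = −0.5, English = 0.5) at VOT_z = 0 and z-LexTALE = 0, with context order at its coded midpoint. These plug-in values use point estimates, so they approximate, rather than equal, the posterior means of the probabilities.

```python
import math


def inv_logit(eta: float) -> float:
    """Map log-odds to a probability."""
    return 1.0 / (1.0 + math.exp(-eta))


b0, b_context = -0.85, -0.23  # point estimates reported above

print(f"{inv_logit(b0):.4f}")                     # 0.2994 (all predictors at 0)
print(f"{inv_logit(b0 + b_context * -0.5):.4f}")  # 0.3241 (Spanish context)
print(f"{inv_logit(b0 + b_context * 0.5):.4f}")   # 0.2759 (English context)
```

Because Spanish is coded −0.5, a negative context coefficient raises the log-odds of a “pafri” response in the Spanish context, consistent with the direction described above.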
The triptych plot in Figure 4 provides draws from the posterior predictive distribution to illustrate the three-way interaction. Each panel depicts the proportion of voiceless responses as a function of VOT and context while holding proficiency constant at −2, 0, and 2 standard deviations, respectively (see Figure 2). One can see that independent of proficiency, voiceless responses increased as VOT increased. Importantly, there is also an observable shift to the left of the identification function in the “Spanish” context as proficiency in Spanish increases (2nd and 3rd panels). The identification function representing the “English” context is unaffected by changes in proficiency.
The left panel of Figure 5 summarizes the posterior distribution of the bivariate regression model analyzing the phonemic boundary effect. Table 3 of the supplementary materials provides a numeric summary of the posterior distribution. The model intercept represents the phonemic boundary effect at the average proficiency level (x̄ LexTALE = 14.60, z-LexTALE = 0), which equates to approximately 3.66 ms (β = 0.03, HDI = [−0.05, 0.10], ROPE = 0.39, MPE = 0.77). Importantly, the phonemic boundary effect increased as proficiency increased (β = 0.16, HDI = [0.03, 0.28], ROPE = 0, MPE = 0.99). The right panel of Figure 5 illustrates the positive correlation between the phonemic boundary effect and z-LexTALE. More detailed visualizations and table summaries are available in the online supplementary materials.
DISCUSSION
The present work investigated the double phonemic boundary effect in adult L2 learners of Spanish. We administered a two-alternative forced choice task in which the language context was conceptually cued in a within-subjects design. This study replicated the double phonemic boundary effect and extended it to a different population: adult L2 learners. Specifically, we found that participants at the average proficiency level were more likely to identify stimuli drawn at random from a stop voicing continuum as “voiceless” when they were led to believe they were hearing rare Spanish words spoken by a native Spanish speaker. Furthermore, the results supported a theoretically motivated three-way interaction between VOT, the proficiency of the learners, and the language context. At low proficiency, voiceless responses varied only as a function of VOT; however, learners with higher proficiency were more likely to respond “voiceless” when they believed they were hearing Spanish. In other words, the present work provides evidence that language-specific perceptual routines can be conceptually cued in adult L2 learners. Specifically, the double phonemic boundary effect increased as LexTALE scores increased.
Our findings corroborate those of a long line of “language set” experiments in which perceptual categorization has been shown to depend on language context in monolingual language mode in early bilinguals (Bohn & Flege, 1993; Elman et al., 1977; Flege & Eefting, 1987a; Garcia-Sierra et al., 2009, 2012; Gonzales et al., 2019; Gonzales & Lotto, 2013; Hazan & Boulakia, 1993; Williams, 1977, 1979) and in late learners (Casillas & Simonet, 2018). This research can be contextualized in light of recent studies that have explored the double phonemic boundary effect by manipulating context-cueing strategies. Gonzales and Lotto (2013) used explicit instructions and acoustic information to cue language context. Casillas and Simonet (2018) used only acoustic information. Gonzales et al. (2019) showed evidence for conceptual cueing in early bilinguals, and we replicate this finding and extend it to late bilinguals.
The results provide evidence supporting the L2LP’s interpretation of Grosjean’s language mode hypothesis (Escudero, 2005; Van Leussen & Escudero, 2015). Concretely, the model proposes that the ability to selectively activate L1 and L2 perception grammars develops with increased exposure to the L2. Previous research on cue-weighting strategies has supported the selective activation account by showing that Japanese learners of American English adjust their weighting of spectral and duration cues when identifying the /iː/-/ɪ/ vowel contrast (Yazawa et al., 2019). Yazawa et al. (2019) conceive of language mode as a continuum ranging from L1 monolingual mode to L2 monolingual mode, with an intermediate L1–L2 bilingual mode in between. Our findings complement this research and add to the existing evidence for selective activation of L1 and L2 monolingual modes using the double phonemic boundary effect. Yazawa et al. (2019) draw attention to the L1–L2 bilingual mode as an avenue for future research. Casillas and Simonet (2018) found that simultaneous bilinguals and adult L2 learners utilized different categorization criteria in both unilingual (i.e., one language at a time) and bilingual (i.e., English and Spanish concurrently) sessions. Future research could test for conceptually cued language-specific processing routines using cue-weighting and double phonemic boundary effects in L1–L2 bilingual mode.
A point of contention to keep in mind for future research concerns how the model conceives of L2 development. The L2LP specifically posits target language exposure as the driving force behind meaning-driven learning vis-à-vis the perception grammar (Van Leussen & Escudero, 2015, p. 4). In effect, perception improves when the current state of the grammar results in misunderstandings; however, there is no straightforward method for operationalizing or quantifying exposure. There is evidence that phonological development is correlated with L2 vocabulary size (Bundgaard-Nielsen et al., 2012), and other lines of research show that learners’ largest gains often occur at an early stage of development (Munro et al., 2013; Williams, 1979). The present work used a standardized assessment of vocabulary size as a proxy for L2 proficiency (see also Quam & Creel, 2017); consequently, the evidence provided herein cannot partial out the possible mediating effects of vocabulary size and proficiency. It may be the case that proficiency, vocabulary size, quantity and quality of input, or any combination thereof are key to perceptual development. For instance, an individual could receive large amounts of L2 input and not improve in L2 proficiency, and the converse may also hold. On the surface, it seems plausible that there are multiple paths to the development of the L2 perception grammar. The directed acyclic graph in Figure 6 illustrates some possible causal relationships leading to perceptual development.
A case could be made, for instance, that input, vocabulary size, or some version of the construct “proficiency” could lead to perceptual development. However, at present we cannot rule out the possibility that vocabulary size or proficiency are also mediator variables associated with input. Future investigations should consider measuring and controlling for input and vocabulary size in conjunction with standardized measures of proficiency, in order to shed light on how these variables interact during L2 perceptual development and thereby better inform models of L2 speech learning. This would also open new avenues for research on individual differences in the development of the L2 perception grammar.
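To make the notion of a mediator concrete, the sketch below simulates one hypothetical causal chain in which input affects perceptual development only through vocabulary size (input → vocabulary → boundary shift). This structure, the variable names, and all effect sizes are assumptions chosen for illustration; they are not the relationships depicted in Figure 6 and are not data from this study. Once vocabulary size is added as a covariate, the estimated effect of input shrinks toward zero, which is the pattern a mediator produces and the reason both variables need to be measured.

```python
import numpy as np

rng = np.random.default_rng(seed=1)
n = 5000

# Hypothetical chain: input -> vocabulary -> boundary shift (no direct input effect).
input_hours = rng.normal(0.0, 1.0, n)                        # standardized L2 input
vocabulary = 0.8 * input_hours + rng.normal(0.0, 0.6, n)     # vocabulary grows with input
boundary_shift = 0.5 * vocabulary + rng.normal(0.0, 0.5, n)  # perception tracks vocabulary

def ols(y, *predictors):
    """Ordinary least squares coefficients, intercept first."""
    X = np.column_stack([np.ones_like(y), *predictors])
    coef, *_ = np.linalg.lstsq(X, y, rcond=None)
    return coef

total = ols(boundary_shift, input_hours)[1]               # total effect of input
direct = ols(boundary_shift, input_hours, vocabulary)[1]  # input effect holding vocabulary constant

print(f"total effect of input:  {total:.2f}")   # population value: 0.8 * 0.5 = 0.40
print(f"direct effect of input: {direct:.2f}")  # population value: 0
```

If input also had a direct path to perception, the adjusted coefficient would remain reliably above zero; distinguishing these scenarios is only possible when input and vocabulary size are measured separately.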
CONCLUSION
The present study replicated the findings of Gonzales et al. (2019) and provided empirical evidence for conceptually cued language mode selection in late bilinguals. Specifically, we show that adult L2 learners of Spanish also display mode-specific perceptual normalization criteria in accordance with the fine-grained phonetic detail of the language they have been led to believe they are hearing. Additionally, we find that the double phonemic boundary effect develops as proficiency in the L2 increases. The results provide further evidence supporting the notion that there is some degree of separation between phonetic systems in the bilingual mind.
We express our gratitude to Kalim Gonzales, Krista Byers-Heinlein, and Andrew Lotto for sharing with us the auditory stimuli used in this study, and to Olimpia Rosenthal for lending her voice to facilitate this research. We also thank Timo Roettger, Miquel Simonet, Dave Kleinschmidt, and members of the Leap Lab, as well as two anonymous reviewers for comments and recommendations on an earlier version of this work. All errors are ours alone. The data, code, and experimental materials necessary to reproduce the analyses reported in this article are available at https://osf.io/cp9bs/
Supplementary Materials
To view supplementary material for this article, please visit http://dx.doi.org/10.1017/S0272263120000273.