Studies on age effects in second language (L2) acquisition show that pronunciation accuracy in the target language is one of the most difficult skills to acquire for late learners. Such investigations consistently demonstrate that postpuberty learners across different acquisition contexts are detectably different in speech production from monolingual native speakers and from early L2 learners (L2ers). The most robust finding is that foreign accent ratings show a positive correlation with age of acquisition (AOA); that is, the later an L2er is exposed to the L2, the stronger the foreign accent tends to be at the endstate of the acquisition process (e.g., Abrahamsson & Hyltenstam, 2009; Flege et al., 2006). As a consequence, AOA has been taken to be the primary predictor of pronunciation accuracy at L2 ultimate attainment. However, previous research also suggests that the link may not be entirely straightforward. In particular, researchers differ in how they conceptualize age effects in L2 speech production, the crucial question being whether AOA is the cause of persisting transfer from the first language (L1) or whether it is merely associated with it (Bialystok, 1997, 2001).
Some researchers interpret age effects as a direct reflection of a “critical” or “sensitive” period in the L2 acquisition of articulatory phonology and phonetics, which preempts nativelike pronunciation in late learners. In this view, behavioral deviance in L2 production compared to native speech has been linked to constraints in neurological and fine motor skills (e.g., Moyer, 1999). These constraints are argued to follow from maturational reductions in cerebral plasticity, which categorically prevent the reorganization of the speech production (and comprehension) system from the L1 to the L2 after a certain age (e.g., DeKeyser & Larson-Hall, 2005; Hyltenstam & Abrahamsson, 2003). The location of this cutoff age for attaining nativelikeness is controversial, ranging from shortly after birth (Abrahamsson & Hyltenstam, 2009; Hyltenstam & Abrahamsson, 2003) through age 6 (Long, 1990) to puberty (DeKeyser, 2000; Scovel, 1988).
Others have interpreted age-related increments in foreign accent to be a consequence of the degree of L1 entrenchment in phonetic categorization, rather than maturational constraints on speech production. For instance, the speech learning model (Flege, 2002) hypothesizes that L2 learners are increasingly likely to process L2 phonetic categories as instances of L1 categories the longer the L1 has been spoken before the onset of L2 acquisition (Flege, 1999). On this view, even though differences between L1 and L2 vowel and consonant categories may be detectable in comprehension, the classification of L2 phones as functional equivalents of L1 categories leads to the merging of L1 and L2 categories in speech production.
The entrenchment model implies that factors other than just AOA may impact on (non)native L2 pronunciation. Previous research has considered a wide variety of predictors such as length of residence in an L2 environment (LOR), chronological age at testing, typological distance of the languages, language use, language aptitude, and sociopsychological factors (for an overview, see Jesney, 2004). As has often been noted, however, many of these factors are confounded or covary with AOA. LOR, for example, is of necessity longer for earlier learners if they are age-matched to late learners at time of testing (e.g., Flege, 2009; Moyer, 2007; Piske, MacKay, & Flege, 2001). In a similar vein, language use is linked with AOA because early L2ers who arrive in the target-language environment before age 10 tend to adopt the L2 as the main language of communication, whereas late L2ers retain higher degrees of use of the L1 in the target-language environment as well as in contacts with their home community (e.g., Jia, Aaronson, & Wu, 2002; Piske et al., 2001).
In addition, such investigations face the methodological problem that use of the L1/2 (Footnote 1) can only be measured on the basis of self-reports, which may not always be reliable. These problems may account to some extent for multifactorial analyses of L2 speech production often yielding mixed results, with factors other than AOA accounting only for a small amount of variance. For instance, in a meta-analysis of data from 240 L1 Korean and 240 L1 Italian L2ers of English across several tasks and studies, Flege (2009) reports that amount of L1/2 use accounts for less than 10% of the variance in foreign accent rating data. It hence remains to be investigated how factors such as use can be reliably assessed and how they affect L2 pronunciation independently of AOA.
The controversy on the role of the age effect in SLA is further complicated by findings suggesting that even L2 speakers with AOAs well below the onset of puberty do not invariably score within the native range (e.g., MacKay, Flege, & Imai, 2006; Piske et al., 2001). For instance, Korean children who arrived in the United States between the ages of 8 and 9 and who were tested after 3 and 5 years of residence were reliably rated as having a foreign accent compared to age-matched native English children (Flege et al., 2006; see also Flege, Munro, & MacKay, 1995; Flege, Yeni-Komshian, & Liu, 1999). Similar effects are reported for Italian learners of English by MacKay et al. (2006). In contrast, Pallier et al. (2003), Ventureyra (2005), and Ventureyra, Pallier, and Yoo (2004) suggest that speakers who experience sequential monolingualism, that is, a complete break in L1 input followed by a rapid breakdown of this system and full immersion in another language in all contexts of life (as experienced by international adoptees), may become fully nativelike in L2 speech perception even if this language reversal took place as late as age 10 (but see Hyltenstam, Bylund, Abrahamsson, & Park, 2009).
Given the apparently minor contribution of predictors other than age (such as L1/2 use) as well as the fact that even early learners may be perceptibly different from monolingual natives unless a complete language reversal has taken place, one may hypothesize that being a bilingual speaker in and of itself contributes to perceived nonnativeness. It has long been acknowledged that the end state in bilingual development cannot be equated with dual monolingualism (Grosjean, 1998). Rather, it has been argued that multicompetence in more than one language should be taken to be the ultimate goal of L2 acquisition (Cook, 2003). In other words, a proficient bilingual inherently differs from a monolingual by virtue of accessing an integrated language processing system that is partly shared across languages. This in turn implies interactions and cross-linguistic influence across both (or all) languages at multiple cognitive and linguistic levels that monolinguals do not experience.
Cross-linguistic interactive effects in bilinguals have been well documented in cognitive processing (e.g., Bialystok, 2009) and linguistic processing at different levels (e.g., van Hell & Dijkstra, 2002, for the bilingual mental lexicon; Hernandez, Bates, & Avila, 1994, for sentence processing; Cutler, Mehler, Norris, & Segui, 1989, for the structuring of phonetic space). Bilingualism affects strategies and mechanisms of L1 and L2 processing (e.g., Dussias, 2004), as well as the speed of processing in either language (e.g., Hopp, 2010; McDonald, 2000), even if both languages are acquired from early childhood (Foursha, Austin, & van de Walle, 2005; Proverbio, Cok, & Zani, 2002; Werker & Byers-Heinlein, 2008).
For pronunciation, several studies report bidirectional cross-linguistic influence in the speech production of bilinguals. Flege (1987; see also Flege & Hillenbrand, 1984) studied late English–French and French–English bilinguals of different proficiency levels after some period of residence in an L2 environment. A comparison with monolingual controls of either language revealed that voice onset time (VOT) produced by the bilinguals in both their languages diverged from the monolingual norm (i.e., all bilingual speakers had shorter VOTs in English and longer VOTs in French than the controls). The degree of bidirectional influence was modulated by proficiency levels and length of residence in the L2 environment. In a similar vein, Fowler, Sramko, Ostry, Rowland, and Hallé (2008) report that simultaneous (2L1) French–English speakers produce VOTs in either language that are different from those produced by monolingual native speakers (Footnote 2). Although the bilinguals' VOTs clearly differ between English and French, indicating that phones do not merge across languages, the realization of those phones in one language affects their realization in the other.
These effects point to assimilatory processes in speech production (see also Sancier & Fowler, 1997) and perception (e.g., Sundara, Polka, & Genesee, 2006). Such cross-linguistic interaction seems to result from the active use of two languages, since the monolingual controls in the Fowler et al. study did not differ in their VOTs depending on whether they were occasionally exposed to the other language or not (but see Au, Knightly, Jun, & Oh, 2002; Caramazza, Yeni-Komshian, Zurif, & Carbone, 1973).
Despite these widely recognized cross-linguistic interactions that affect speech production and comprehension in bilinguals across AOAs, the reference groups in speech production studies typically consist of monolingual native speakers of the target language (e.g., Flege et al., 2006) or native speakers who overwhelmingly use the target language but might have some knowledge of other languages (e.g., Abrahamsson & Hyltenstam, 2009). It may be argued that the choice of a monolingual control group thus moves the yardstick of nativelikeness to a point that may, by definition, be out of reach for most bilinguals (see also Birdsong, 2005). It may therefore be more appropriate to investigate whether L2ers can approximate the performance of speakers who have acquired the target language from birth but are also advanced late learners of an L2.
In this respect, 2L1 speakers, that is, bilinguals from birth, are unsuitable as a reference point in a direct comparison with late L2 learners because they differ in their chronological onset of bilingualism. In 2L1 speakers, assimilatory tendencies in phonetic categories across languages can be observed from the beginning of language acquisition due to concurrent L1 and L2 input (Fowler et al., 2008). In other words, 2L1 speakers arguably never develop monolingual native categories, which can then be affected by the later onset of bilingualism. In this way, they fundamentally differ from late L2 learners whose native (monolingual) phonetic categories affect speech production of a late-acquired language, and vice versa.
Similarly, early (child) L2ers, who are typically compared with late learners in studies on age effects in L2 acquisition, differ in their degree of entrenchment of L1 phonetic categories. Given the comparatively shorter length of exclusive L1 use for child L2ers, the impact of the L1 on a successively acquired L2 may be quantitatively distinct from the extent of cross-linguistic influence experienced by late learners (Flege et al., 2006).
The present study therefore introduces a bilingual reference group that shares the chronological asymmetry in the onset of L1 and L2 input characteristic of late L2 acquisition: L1 attriters, that is, adult long-term emigrants to a nontarget language environment whose use of their early-acquired L1 is greatly reduced following emigration. Both bilingual populations (late L2 learners and L1 attriters) are then compared against a (largely) monolingual reference group.
L1 ATTRITION
It is widely recognized that the development of L2 knowledge, processing, and use is partly influenced by the preinstantiated knowledge of the L1. At the same time, the presence of a developing L2 has ramifications for the L1, potentially leading to changes in L1 processing and use (Schmid & Köpke, 2007). This process, known as L1 attrition, is most marked among speakers who experience a drastic and persistent change in linguistic habits and language environment (i.e., long-term migrants in a non-L1 setting), but it is by no means confined to such cases. Multicompetence approaches to bi- and multilingualism assume that the development of a second or foreign language system, even in low-proficiency instructed learners, will to some degree impact on the L1 (e.g., Cook, 2003). That notwithstanding, investigations of L2 effects on L1 have so far usually focused on speakers who have experienced long-term immersion in an L2 environment and concomitant reductions in L1 use (for an overview, see Köpke & Schmid, 2004; Schmid, forthcoming).
Although there has been an increasing interest in L1 attrition over the past two decades or so (see Schmid, 2010), virtually all investigations focus on lexical or grammatical features of the attriting language. For these linguistic levels it has been demonstrated that, whereas the reduction or cessation of L1 input and use in childhood can lead to considerable deterioration of grammatical categories (e.g., Schmitt, 2004) and in extreme cases to an apparently complete loss of L1 proficiency (Pallier et al., 2003), even severe reductions in L1 contact after puberty entail only relatively minor effects on L1 maintenance. For postpubescent attrition, it has been shown that while attrition may impair lexical access (Schmid & Köpke, 2008), underlying knowledge of grammar is quite resistant to processes of deterioration, and L2 interference seems to manifest itself predominantly in optionality at the interface level (Schmid, 2009; Tsimpli, Sorace, Heycock, & Filiaci, 2004). It therefore appears that, in both lexicon and grammar, attrition effects can be ascribed to the increased cognitive load of integrating two linguistic systems and retrieving elements from memory that had not been called upon for an extensive period of time, while underlying grammatical knowledge seems to remain quite stable.
Studies on the late attrition of pronunciation skills or phonetic/phonological perception, on the other hand, are few and far between (Footnote 3). Global observations of attriters' pronunciation suggest that L2 effects may be relatively limited at this level, too. For example, Giesbers (1997) examines close to an hour of free speech produced by a Dutch native speaker who had been immersed in an Indonesian context for more than 30 years with few opportunities to use his L1. In the entire speech sample, Giesbers finds only 48 instances of clearly nontarget pronunciation, 30 of which concern stress and intonation patterns in compounds and sentences. In the absence of control data from monolingual speakers it is, of course, difficult to estimate whether this limited number of “deviances” differs at all from the native norm. In any case, it can hardly be considered an extensive “loss.”
However, assessing phonological and phonetic changes on the basis of discrete categories such as “correct” and “deviant” is not a straightforward matter, as L2 effects on L1 speech production can take more subtle forms that might not be classifiable as “errors” but contribute to an overall less-nativelike “acoustic flavor” in attriters' speech. Flege's (1987) seminal study of bidirectional effects in the speech of French–English and English–French bilinguals discussed above is probably the first formal investigation of such a phenomenon among late learners. Flege's (1987) finding that there is bidirectional interference on VOTs among experienced bilinguals is confirmed by Major (1992) in the context of English–Brazilian Portuguese bilingualism. Similar bidirectional effects are reported by Mayr, Price, and Mennen (2011) on VOTs and on vowel shift in a late Dutch–English bilingual speaker, by de Leeuw (2008) on the lateral phoneme /l/ for late German–English bilinguals, and by Mennen (2004) on the suprasegmental level for late Dutch–Greek bilinguals.
Both Major (1992) and Mennen (2004) investigate five late learners of the L2, four of whom show bidirectional cross-linguistic interference. Yet, in both studies, one of the five participants appears to be exempt from cross-linguistic interference in either of her languages in formal (reading) style (Footnote 4). Different suggestions have been made to account for such interindividual variation in L2 influence on L1. Major (1992) appeals to L2 proficiency in order to explain the variable levels of bidirectional interference among his participants, whereas Mennen points out that her exceptional speaker has a lower AOA than the others (15 vs. 20–25). In a case study of a Brazilian Portuguese–English bilingual, Sancier and Fowler (1997) furthermore observe that the extent of L2 influence on L1 VOTs is considerably stronger at the end of two periods of several months spent in an L2 environment than after recent exposure to and use of the L1.
It is interesting that neither Flege (1987) nor Mennen (2004) uses terms such as attrition or loss, instead arguing for a “‘merging’ of the phonetic properties of similar L1 and L2 phones” (Flege, 1987, p. 62). Major (1992), on the other hand, interprets the approximation of L1 VOTs to the L2 norm in an immersion setting as straightforward loss, arguing that “a close correlation exists between VOTs and other aspects of phonological proficiency, including global foreign accent” (p. 191). In other words, the larger the deviation in VOT from the native baseline, the more likely it is that a speaker will not be perceived to be a native, which entails that a change in VOT can contribute to the loss of the native-speaker status. In her study on the lateral phoneme /l/ in late German–English bilinguals, de Leeuw (2008) also opts for the term loss, but she qualifies its implications. Although her findings clearly indicate that “at the level of performance, the bilingual migrants no longer conformed to the German monolingual norm” (in that they had a higher F1 frequency and an earlier alignment of the prenuclear rise), she goes on to assert “that despite these deviations, the German migrants to Canada are still native German speakers” (p. 203; see Footnote 5).
Regardless of the extent to which L2 effects on L1 speech production can be phonetically measured, the question arises as to whether these effects are actually perceptible in bilinguals' L1 speech. The only investigation of perceived global foreign accent in a group of L1 attriters we are aware of to date is de Leeuw (2008; see also de Leeuw, Schmid, & Mennen, 2010; Footnote 6). In this study, short excerpts of spontaneous speech elicited in a narrative task performed by German migrants (to The Netherlands and to Anglophone Canada) were rated for nativelikeness by predominantly monolingual speakers of German in Germany. These ratings were compared to the ratings received in the same experiment by a control group of predominantly monolingual speakers of German residing in Germany. Overall, group comparisons showed that attriters were more likely to be perceived as nonnative than controls (for further details of this study, see below).
There were also, however, considerable differences among the attriters themselves, with 35% of the speakers being consistently perceived as nativelike and less than 25% receiving a clear nonnative rating. De Leeuw et al. (2010) attempt to account for the difference in perceived nativeness on the basis of individual background and language use variables. By means of regression analyses they establish that neither AOA (>17 for all) nor LOR (>15 years for all) is a significant predictor of perceived foreign accent in their sample. The frequency with which speakers use their L1 in informal settings with other bilinguals (family, friends) does not contribute toward perceived nativelikeness either, nor does the frequency of visits back to the home country. It is only L1 use in settings where little codeswitching is to be expected (in formal, work-related contexts or in distance communication with speakers in Germany) that has some (albeit limited) predictive power.
It therefore seems likely that changes in L1 pronunciation such as the ones revealed by the phonetic microanalyses reviewed above can eventually lead to a perceptible foreign accent in the native language for some experienced bilinguals. Other speakers appear to be spared from these developments and remain perceptibly nativelike even after several decades of residence in an L2 environment with little opportunity to use their L1.
On the face of it, the proportion of late bilinguals who do not develop a foreign accent in their L1 appears higher than the number of highly successful L2 learners who ultimately do attain nativelikeness: in de Leeuw et al.'s (2010) investigation, more than a third of the bilingual participants were unambiguously rated as native speakers (Footnote 7). Although the proportion of L2ers identified as achieving nativelikeness varies across studies, most investigations locate it around or below the 5% norm originally proposed by Selinker (1972). However, we are not aware of any study of L2 acquisition and L1 attrition that directly compares the two populations on foreign accent.
Such a direct comparison of L1 attriters and late L2 acquirers has the potential of opening a new perspective on the impact of age effects in language acquisition, as it allows for matching L2 speakers and native speakers on variables such as (the onset of) bilingualism and its effects, thus isolating age of onset. L1 attriters are native speakers of the target language because they have acquired it from birth. At the same time, for L1 attriters (as for L2 speakers) the target language is not the language they predominantly use. Hence, L1 attriters, like L2 speakers, experience asymmetric bilingualism effects, that is, cross-linguistic influence from the dominant language onto the weaker one. Figure 1 illustrates the rationale of the group comparisons.
HYPOTHESES
In this paper, we will present findings from a global foreign accent rating (FAR) experiment, comparing advanced late L2 learners of German, long-term L1 attriters of German and (predominantly) monolingual German control speakers. We aim to assess the relative impact of AOA and bilingualism in late L1 attrition and L2 acquisition. Based on the above rationale, we advance the following hypotheses:
Hypothesis 1: If L2 speech production is maturationally constrained (i.e., if AOA is the cause of the lower ultimate attainment witnessed among later learners), AOA should make an independent contribution to (non)nativelikeness in perceived accent. That is, late L2 learners should differ from native controls and late L1 attriters in foreign accent ratings at the group level. Moreover, the foreign accent ratings of individual late L2 learners should fall outside the range delimited by the L1 speakers of the target language with AOAs of 0.
Hypothesis 2: If L2 speech production is affected by cross-linguistic interference in bilingualism (i.e., if AOA is merely associated with differences in ultimate attainment between early and late learners), there should be a substantial overlap in foreign accent ratings for late L2 learners and late L1 attriters, and both groups should differ from predominantly monolingual native speakers. In addition, factors other than AOA, for example, LOR, use, attitudes, and so forth, should make a contribution toward the proportion of explained variance in foreign accent ratings.
Hypothesis 3: If cross-linguistic interference affects bilingual speech production, there should be differences between bilingual groups depending on language combination.
THE STUDY
Global foreign accent ratings
We conducted a global foreign accent rating experiment on free speech samples collected from two groups of late bilinguals: L1 attriters and L2 learners (see sections Participants and Materials below). Global assessments of perceived foreign accent in bilinguals have been widely used to make inferences about ultimate attainment in L2 phonology (for an excellent overview, see Jesney, 2004). Typically, in these studies, phonetically untrained native judges listen to samples of L2 and native speech and are asked to rate these for degree of nativelikeness. Native speaker performance is thus used as the implicit or explicit (Footnote 8) reference point in judging foreign accent and, concomitantly, the range of the native scores is used as the cutoff criterion for establishing nativelikeness in the analysis of the rating results for the L2 speakers.
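The cutoff logic described above can be illustrated with a minimal sketch. It assumes a scale on which higher scores mean a stronger perceived accent, and it takes the least nativelike mean score among the controls as the upper bound of the native range. The function names and the ratings are hypothetical and serve illustration only; they are not taken from any of the studies cited.

```python
from statistics import mean

def native_cutoff(control_ratings):
    """Highest mean rating received by any native control speaker.

    Assumption: higher ratings mean a stronger perceived foreign accent.
    """
    return max(mean(scores) for scores in control_ratings.values())

def is_within_native_range(speaker_scores, control_ratings):
    """True if the speaker's mean rating does not exceed the native cutoff."""
    return mean(speaker_scores) <= native_cutoff(control_ratings)

# Made-up ratings (one score per rater), for illustration only.
controls = {"control_a": [1, 1, 2], "control_b": [1, 2, 2]}
print(is_within_native_range([1, 2, 1], controls))
print(is_within_native_range([4, 5, 6], controls))
```

Other operationalizations of the native range (for example, a band of standard deviations around the control mean) are equally conceivable; the sketch shows only the simplest range-based criterion.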
Despite the widespread use of foreign accent ratings, no established or standardized methodology exists for such studies (Jesney, 2004). There is therefore considerable variance between studies regarding the following issues:
• Measurement scales. Most studies rate speakers for nativelikeness on discrete, Likert-type scales with between 3 and 10 levels, although there are also investigations using sliding scales or magnitude estimations (for an overview, see Jesney, 2004, p. 2 ff.).
• Number of native control speech samples. It has been pointed out (Flege & Fletcher, 1992) that native speaker judgments of foreign accent are subject to range effects. In other words, the larger the proportion of native or near-native samples included in the experiment, the stronger the perceived foreign accent of the L2ers. The proportion of control samples included in previous studies varies from none (Brennan & Brennan, 1981; Schmid, 2002) to 50% (Munro & Derwing, 1995), with the majority of studies using somewhere between 10% and 20% (see the overview in Jesney, 2004).
• Type of speech. It has been demonstrated that the proportion of L2 learners who are perceived to be nativelike is higher in studies which use material that was formally elicited, such as word or list reading, than in casual style/free speech (e.g., Major, 1992; Moyer, 1999; Oyama, 1976). Abrahamsson and Hyltenstam (2009) therefore argue that formally elicited samples may reflect “language-like behaviour” rather than actual L2 proficiency (p. 254). A further problem of using formally elicited stimuli is that Flege and Fletcher (1992) established that raters were more likely to judge samples as nativelike after they became familiar with them. This may then impact on experiments that recurrently use the same material produced by different speakers, favoring those participants whose speech samples occur later.
• Length of samples. The length of the stimuli used also varies considerably, from single words in reading list style through full sentences to entire paragraphs. In free speech, the clips used are typically 10 to 20 s in length (Jesney, 2004). However, it has been established that native speaker judgments are usually made very fast; indeed, Flege (1984) shows that 30 ms may be enough for natives to accurately judge nativeness.
The highly divergent results across such studies, some of which suggest that a sizeable proportion of late L2ers can attain nativelikeness (e.g., Bongaerts, van Summeren, Planken, & Schils, 1997), whereas others find that such ultimate success is extremely rare (Moyer, 1999), may to some extent be ascribed to these methodological differences. For the experiment reported here, we therefore adopted the methods and criteria that have been applied most often in such investigations: a Likert-scale measurement in the intermediate range (6 points), 20% of native control samples, and free speech samples of 10–20 s in length (for a detailed account of the materials and procedure, see below).
Methodological variance notwithstanding, previous research has demonstrated the impact of a number of independent variables other than AOA on pronunciation accuracy. Perceived foreign accent has been found to correlate with length of residence (e.g., Asher & García, 1969; Flege & Fletcher, 1992; but see Oyama, 1976; Footnote 9), amount of L1/2 use (Yeni-Komshian, Flege, & Liu, 2000), attitude and motivation (Elliott, 1995; Moyer, 1999), and aptitude (Abrahamsson & Hyltenstam, 2008). A gender effect has also occasionally been found, which appears to favor female prepuberty learners but disappears (or even reverses) among older learners (Asher & García, 1969; Flege et al., 1995; Thompson, 1991). These factors will therefore also be considered in the present study.
Precursor studies
The populations whose perceived foreign accent was tested in the present investigation were subsamples drawn from two earlier studies: an investigation of L1 attriters reported by de Leeuw et al. (2010) and an investigation of late L2 learners reported by Hopp (2007).
Precursor study I. L1 attrition of German in an L2 English and L2 Dutch setting. De Leeuw et al. (Reference de Leeuw, Schmid and Mennen2010) present a study of FARs of 57 L1 attriters. All speakers were long-term migrants who had been predominantly monolingual speakers of German prior to migration. Twenty-three of these speakers resided in The Netherlands and had Dutch as a second language, whereas 34 resided in an English-speaking environment (the Greater Vancouver area, Canada). These speakers had been recruited for a large-scale investigation of the L1 attrition of German (e.g., Schmid, Reference Schmid, Köpke, Schmid, Keijzer and Dostert2007, Reference Schmid2009). Migrants from Germany with an LOR of >15 years and an AOA of >17 years were invited to participate in an experiment that purported to investigate language change in Germany since German reunification. No other selection criteria (levels of L1 use, proficiency, etc.) were applied, and the recruitment text (which was circulated through newspapers and other media, German clubs and organizations, and personal contacts) explicitly stated that it was of no concern whether German was used daily or virtually never.
Speech samples ranging in length from 12.6 to 17.7 s were extracted from longer narratives elicited by a film retelling task from all speakers (see Materials Section) and interspersed with similar samples from five native German controls. The study employed 19 German listeners (students of phonetics at the University of Trier, Germany) who were presented with the samples in a soundproof room. Two judgments were invited for each sample: the raters were asked, first, to classify the speaker as native or nonnative and, second, to indicate their confidence in this rating on a 3-point scale (certain, semicertain, uncertain), resulting in an effective 6-point Likert scale (where 1 = certain of native speaker status and 6 = certain of nonnative status). The experiment is replicated in Schmid (Reference Schmid2009) with 15 additional speakers from the same population (2 German migrants in Canada, 1 German migrant in The Netherlands, and 12 reference group speakers) and 21 native raters, so that previous FARs are available for 36 attriters with English as L2, 24 attriters with Dutch as L2, and 17 native controls.
For inclusion in the present study, the 10 speakers who had received the lowest (i.e., most nativelike) and highest (most accented) ratings, respectively, from each of the two language groups were selected from this precursor study, resulting in a population of 40 attriters. This selection procedure was motivated by our desire to include samples representing the whole range of speakers who had been previously tested. In addition, speech samples from 16 predominantly monolingual reference group speakers were included in the analysis as baseline data.
Precursor study II. Late L2 acquisition of German by L1 speakers of English and Dutch. Hopp (Reference Hopp2007) tested 91 advanced late L2 learners of German with English, Dutch, and Russian as their respective L1s in a variety of morphosyntactic and semanticopragmatic domains in off-line and on-line tasks. Participants were recruited in Germany through informal networks, clubs, organizations, and advertisements. All participants had AOAs above 11 (average = 15.6) and advanced German after long periods of residence in Germany (average = 13.0 years). Participation in the study was solely contingent on AOA and results in a C-Test (see below). As part of the proficiency testing, all participants supplied samples of speech elicited in a picture-description task (for details, see Hopp, Reference Hopp2007). These speech samples ranged in length from 45 to 330 s. All speech samples, including those of four monolingual natives, were rated for nativeness on a 10-point scale (from clearly nonnative to clearly native) by three linguistically naive predominantly monolingual native speakers of German. The rating was broken down into discrete categories: fluency, vocabulary, expression, mistakes, and accent. A composite total score was computed from the ratings in each category.
Based on the composite score of these three raters, the 10 lowest scoring and the 10 highest scoring L1 English and L1 Dutch participants, respectively, were selected for the present study. This led to a sample of 40 L2ers, representing the bottom and the top range of late L2ers in the precursor study. In addition, the speech samples from the 4 native control participants were included in the present analysis.
Participants for the present study
The present study compares native speakers of German who are postpuberty emigrants to either Anglophone Canada or The Netherlands (L1Aers, n = 40 selected from de Leeuw et al., Reference de Leeuw, Schmid and Mennen2010, on the basis of the criteria established above) and late L2 learners of German with English or Dutch as L1s (L2ers, n = 40 selected from Hopp, Reference Hopp2007) with native speakers of German living in Germany (n = 20, 16 from de Leeuw et al., Reference de Leeuw, Schmid and Mennen2010, and 4 from Hopp, Reference Hopp2007). The choice to limit the investigation to those bilingual speakers from the two precursor studies who had received the highest and the lowest scores with respect to their perceived global accents was motivated by our desire to capture the full range of cross-linguistic interaction in the pronunciation of the available bilingual data. It should be pointed out that the preselection as well as differences in the recruiting procedure in the two studies imply that the results of the present study represent the possible range of ultimate attainment in bilingual pronunciation, not the general population of (developing) L2 learners and L1 attriters.
Matching the L1Aer and L2er groups in terms of language combinations ensures that the type of cross-linguistic influence is similar across the two groups. Including more than one language combination further allows us to assess whether different language combinations affect perceived pronunciation accuracy in different ways. However, the measure applied here (global foreign accent rating) does not allow for conclusions with respect to cross-linguistic interactions with regard to specific phonemes or phonological environments.
The selection of L1Aers was restricted to late bilinguals who had emigrated from Germany to Canada or The Netherlands after age 17 in order to ensure that the L1 had been fully acquired prior to migration, so that the speakers would qualify as attriters rather than incomplete acquirers of the L1 (see Köpke & Schmid, Reference Köpke, Schmid, Schmid, Köpke, Keijzer and Weilemar2004). For this group, the nontarget language (i.e., the L2) has become the language used most frequently in daily life after an immersion period of more than 15 years.
The L2 group consisted of late L1 English and L1 Dutch L2 acquirers of German who first had contact with the L2 after age 11 and have been long-term residents in a German-speaking environment. For the recruitment of these speakers, advanced proficiency in L2 German was applied as a selection criterion. This process of screening for advanced L2ers was applied since, as Long (Reference Long, Hyltenstam and Viberg1993) points out, “[t]here is no value in studying obviously non-native like individuals intensively in order to declare them non-native like” (p. 204).
It was impossible to establish a truly monolingual control group, because obligatory foreign language instruction has been implemented in most educational systems. The selection of native speakers for the control group in this study was limited to individuals who had not acquired languages other than German before school age, never lived outside Germany, and who did not regularly use a language other than German in their daily lives. Table 1 presents an overview of age, AOA, and LOR of the speakers investigated in this study.
Note: L1Aers, first language attriters; L2ers, second language learners; AOA, age of acquisition; LOR, length of residence in an L2 environment.
Language use and language attitudes
For the speakers investigated by de Leeuw et al. (Reference de Leeuw, Schmid and Mennen2010), a large amount of information on L1/2 use in a variety of situations as well as on language and cultural attitudes and preferences was available, based on their responses to a sociolinguistic, personal background, and attitude questionnaire (see Schmid, Reference Schmid2011). This questionnaire elicited self-reports by means of 110 questions. Some of these were open-ended or categorical (yes/no), but the majority of items were elicited on a 5-point Likert scale, where the highest level (1) indicated overwhelming preference for or predominance of German and the lowest level (0) an equally strong preponderance of L2. By means of principal component analysis, Schmid and Dusseldorp (Reference Schmid and Dusseldorp2010) established the following eight compound variables, which were shown to possess high internal validity:
1. PARTNER: frequency of L1 use with the partner (4 questions)
2. CHILDREN: frequency of L1 use with children (4 questions)
3. FRIENDS: frequency of L1 use with friends with a German migrant background (4 questions)
These first three items thus pertained to the frequency of informal use of the L1 with other bilinguals, that is, L1 use in situations where code switching frequently occurs.
4. INTERMEDIATEFootnote 10: frequency of L1 use in situations where the interlocutors are also bilingual, but where code switching is deemed inappropriate
(a) frequency of attending a German church (2 items)
(b) frequency of attending a German club (2 items)
5. PASSIVE: frequency of passive exposure to targetlike L1
(a) frequency of exposure to German media (2 items)
(b) frequency of visits to Germany
6. WORK: the use of the L1 for professional purposes
7. TOTAL: total frequency of the use of German, calculated as the arithmetic mean of all of the language use variables listed above (20 items or fewer, e.g., in the case of participants who had no children or partner)
8. AFFILIATION: affiliation and identification
(a) L1 use for internal speech, such as thinking, dreaming and counting (3 items)
(b) language and culture of preference (2 items)
(c) perceived importance of L1 maintenance and its transmission to the speaker's children (2 items)
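The TOTAL variable (item 7) can be illustrated with a minimal sketch. It assumes that each item is scored between 0 and 1, as described for the questionnaire, and that items a participant could not answer (e.g., questions about children) are simply left out of the mean; the function name and the use of `None` for missing answers are illustrative choices, not part of the original procedure.

```python
from statistics import mean
from typing import Optional, Sequence


def total_use(item_scores: Sequence[Optional[float]]) -> Optional[float]:
    """Arithmetic mean of all answered language use items (0 = L2, 1 = German).

    Items that do not apply to a participant are passed as None and skipped,
    so the mean is taken over 20 items or fewer.
    """
    answered = [score for score in item_scores if score is not None]
    return mean(answered) if answered else None


# Hypothetical participant without children: those items are None.
print(total_use([1.0, 0.5, None, 0.0]))  # 0.5
```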
In order to obtain the same information from the L2ers, an online questionnaire including all questions that factored into the calculation of the compound variables listed above was constructed. All participants were invited to provide this information via a weblink sent to them by e-mail. Unfortunately, not all of the participants in the original study replied, but 31 of the 40 L2ers included in this study (L1 Dutch, n = 18; L1 English n = 13) filled in the questionnaire.
Figure 2 shows that the L2ers tend to use German more frequently than the L1 attriters in all contexts with the exception of the language spoken with children. In terms of affiliation, the L1Aers in The Netherlands appear to have a somewhat stronger bond with the German language and culture than the other groups, who appear to be largely balanced in their attitudes toward their L1 and their L2.
Materials
Speech samples. All speakers performed a narrative-descriptive task designed to elicit free speech. The L2ers and four of the native controls were asked to describe the Cookie Theft picture (Boston Diagnostic Aphasia Examination) and a cartoon strip, whereas the L1Aers and the remaining 16 control speakers were given the Charlie Chaplin—Modern Times film narration task as described by Perdue (Reference Perdue1993). From these narratives, speech samples ranging between 10 and 20 s in length were extracted. The following criteria were applied to this selection process:
• samples constituted full sentences or clause/intonational units (in order to be recognizable as grammatical structures)
• samples did not contain lexically or grammatically deviant structures, because it has been shown that such “errors” may adversely affect FARs (McDermott, Reference McDermott1986, cited after Jesney, Reference Jesney2004)
• samples did not contain borrowings from the L2, nor items such as proper or place names (because the L1Aers with English as their L2 had the tendency to pronounce names such as Charlie Chaplin in an English-like fashion, e.g., realizing the /r/ in Charlie as a retroflex approximant [ɻ], while German speakers adapted such items to their L1 phonology).
In addition, we attempted to select speech samples that, as far as possible, described different parts of the stimulus. This was done because the two experimental populations (L1Aers and L2ers) had been given different narrative tasks. We wanted to avoid the effect where listeners would be able to categorize speakers on the basis of topic, not accent. Because de Leeuw et al. (Reference de Leeuw, Schmid and Mennen2010) used samples that all referred to the same scene of the film whereas Hopp (Reference Hopp2007) had all speech samples rated in their entirety, the actual excerpts used in the current study were not the same as the ones used in the precursor studies.
All samples were normalized to 3 dB and background noise was reduced in order to eliminate any possibility of perceptible differences between the excerpts from the two precursor studies.
C-Test. The speakers investigated in both precursor studies also completed a German C-Test as an assessment of their general proficiency. The C-Test is a variation on the cloze test in that words are only partially deleted but the gaps are more frequent. The first sentence of the text is left intact, and starting with the second sentence, the second half of every second word is removed and replaced by a gap. This task requires the participant to make full use of the natural redundancy of a text, making it possible to measure not only relatively low level skills (command of vocabulary, grammar, idioms) but also higher order skills such as awareness of intersentential relationships, global reading, and so forth.
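The deletion rule described above can be sketched in a few lines of code. This is only an illustration under stated assumptions: the sentence splitter is naive, the word count continues across sentences, one-letter words are left intact, and for words with an odd number of letters the larger half is kept. None of these details are specified for the C-Tests used in the precursor studies, and the function name `make_c_test` is hypothetical.

```python
import re


def make_c_test(text: str, gap_char: str = "_") -> str:
    """Turn a text into a C-Test: first sentence intact, then the second
    half of every second word is replaced by gap characters."""
    # Assumption: sentences end in ., ! or ? followed by whitespace.
    sentences = re.split(r"(?<=[.!?])\s+", text.strip())
    if len(sentences) < 2:
        return text.strip()
    out = [sentences[0]]
    word_index = 0  # counts words from the second sentence onward
    for sentence in sentences[1:]:
        tokens = []
        for token in sentence.split():
            word_index += 1
            match = re.match(r"^(\w+)(\W*)$", token)
            # Mutilate every second word; skip one-letter words (assumption).
            if word_index % 2 == 0 and match and len(match.group(1)) > 1:
                word, tail = match.groups()
                keep = (len(word) + 1) // 2  # keep the larger half (assumption)
                token = word[:keep] + gap_char * (len(word) - keep) + tail
            tokens.append(token)
        out.append(" ".join(tokens))
    return " ".join(out)


print(make_c_test("The dog sleeps. A small cat chases the mouse."))
# The dog sleeps. A sma__ cat cha___ the mou__.
```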
Although the two precursor studies did not employ the same texts, the results seem to suggest that the two tasks were similar in their level of difficulty: in both studies, the predominantly monolingual controls achieved a mean of 82% to 83% (see Hopp, Reference Hopp2007, p. 200; Schmid & Dusseldorp, Reference Schmid and Dusseldorp2010, p. 138). The results from the C-Tests were therefore included in the present study as an indication of overall proficiency levels. Because it has been demonstrated that such proficiency levels correlate highly with measures of language aptitude (for overview, see Dörnyei & Skehan, Reference Dörnyei, Skehan, Doughty and Long2003), the C-Test score was also used here to indirectly estimate aptitude, because no direct aptitude measure was collected in the precursor studies.
Procedure
We replicated the procedure described by de Leeuw et al. (Reference de Leeuw, Schmid and Mennen2010) in that German listeners made two judgments for each speech sample. The first binary judgment determined native versus nonnative speaker status (in answer to the question “Is this person a native speaker of German?”). The second judgment expressed the level of confidence on a 3-point scale. This resulted in an operative 6-point Likert scale: 6 = certain of nonnative speaker status, 5 = semicertain of nonnative speaker status, 4 = uncertain of nonnative speaker status, 3 = uncertain of native speaker status, 2 = semicertain of native speaker status, and 1 = certain of native speaker status. Hence, a low FAR reflects a speaker who was perceived as native or near-native, whereas a high FAR reflects a speaker who was rated as having a noticeable foreign accent in his or her German speech.
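The mapping from the two judgments onto the 6-point scale can be made explicit as follows. The sketch simply encodes the scale values listed above; the function name and the string labels for the confidence levels are illustrative assumptions.

```python
def far_score(is_native: bool, confidence: str) -> int:
    """Combine the binary native/nonnative judgment with the 3-point
    confidence rating into a foreign accent rating (FAR) from 1 to 6."""
    steps = {"certain": 0, "semicertain": 1, "uncertain": 2}
    step = steps[confidence]
    # Native judgments occupy 1-3, nonnative judgments 4-6.
    return 1 + step if is_native else 6 - step


for native in (True, False):
    for conf in ("certain", "semicertain", "uncertain"):
        print(native, conf, far_score(native, conf))
```

A listener who is certain that a speaker is native thus contributes a 1, and one who is certain of nonnative status contributes a 6; the two "uncertain" responses (3 and 4) sit next to each other in the middle of the scale.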
A silent pause of 7 s followed each sample, and each sample was played only once. During the silent pause, German listeners assessed native- or nonnative-speaker status of the speaker they had heard and indicated how certain they were of this judgment. After the silent pause, the next sample was presented. The total duration of the sequence of 100 samples was 36.06 min. The samples were pseudorandomized, and two lists were created. All 100 speech samples were played from a mediaplayer, which automatically leveled volume across samples and inserted the 7-s silence between stimuli. For logistical reasons, the stimuli were played via a state-of-the-art audio system in a lecture theatre at the University of Mannheim.
Listeners
Two groups of listeners took part in the foreign accent assessment in two separate sessions. Seventy-six listeners took part in the first session, and 73 listeners took part in the second. All 149 listeners were first-year students at the Department of English at the University of Mannheim, Germany. They had received no specific phonetic training. Only those listeners who reported not to have been exposed to languages other than German in childhood were retained for analysis. In all, 130 German listeners were analyzed: 68 in the first group and 62 in the second. The German listeners also had good knowledge of English, as is standard in modern-day Germany.
RESULTS
The large number of raters (n = 130) did not permit an assessment of interrater reliability by means of measures such as the Cronbach α. However, a comparison of the average FARs received by the individual speakers in this experiment with the ones obtained in the precursor studies,Footnote 11 in which ratings obtained from smaller listener populations were shown to have excellent reliability, revealed very strong correlations: for the L1Aers, the correlation with the average scores elicited by de Leeuw et al. (Reference de Leeuw, Schmid and Mennen2010) was r = .839 (p < .001); for the L2ers the correlation with the scores reported by Hopp (Reference Hopp2007) was r = .620 (p < .001). These robust correlations suggest that ratings across listeners and studies are highly consistent, in particular because different extracts from the speech samples by the same speakers were rated in the precursor studies.
In the global foreign accent rating, the control group speakers received a mean FAR of 2.36 (SD = .95). The L1Aers received a mean FAR of 2.84 (SD = 1.25), and the mean FAR of the L2ers was 3.94 (SD = 1.46; see Figure 3). In a one-way analysis of variance, the group differences were highly significant, F (2, 98) = 14.033, p < .001, η2 = 0.47. Post hoc comparisons (Tukey honestly significant difference) revealed that the L2ers were significantly different from both the control speakers and the L1Aers (p < .001). There was no difference between L1Aers and controls at the group level (p = .258).
As for L1 effects, a comparison of the control speakers with the bilingual populations subdivided by contact language (attriters in The Netherlands, mean FAR = 2.67, SD = 1.38; in Canada, mean FAR = 3.01, SD = 1.18; L2ers with Dutch as L1, mean FAR = 3.48, SD = 1.55; L2ers with English as L1, mean FAR = 4.41, SD = 1.24; see Figure 4) revealed a significant difference between the five groups of speakers, F (4, 156) = 8.867, p < .001, η2 = 0.52, in a one-way analysis of variance.
Post hoc procedures (Tukey honestly significant difference) revealed that there were no differences between the two L1A (p = .915) or the two L2er (p = .142) groups. Further, neither of the two L1A subgroups differed significantly from the controls (attriters in The Netherlands p = .801; attriters in Canada p = .294). In contrast, both L2 subgroups received FARs that were significantly higher than those of the controls (L2ers with Dutch L1 p = .018; L2ers with English L1 p < .001). There was no significant difference between the Dutch L2ers of German and the attriters with Dutch or English as L2 (p = .253 and .754, respectively). However, the English–German L2ers differed significantly from both groups of attriters (attriters in The Netherlands p < .001; attriters in Canada p = .006). Moreover, a comparison of all bilingual groups by language revealed a marginally significant difference between those speakers with Dutch as the L1/2 and those with English as the L1/2, F (1, 78) = 3.780, p = .055, with speakers with Dutch as L1/2 obtaining lower FARs (mean = 3.08, SD = 1.51) than those with English (mean = 3.71, SD = 1.39).
These group results suggest (a) that the L2ers are different from both controls and L1Aers, but (b) that there is no difference between L1Aers and controls. However, the descriptive statistics presented above also indicate considerable variance within the populations. We therefore converted the average FARs into categorical ratings. Following de Leeuw et al. (Reference de Leeuw, Schmid and Mennen2010), we defined a “clearly native” range with a FAR between 1.0 and 2.5, an average “uncertain” range (2.5 < FAR < 4.5), and a “clearly nonnative” range (4.5 < FAR < 6). The resulting distribution is presented in Table 2. Group differences were significant (χ2 = 18.649, p = .001).
Note: L1Aers, first language attriters; L2ers, second language learners.
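The conversion of average FARs into the three categories can be sketched as below. The text leaves open how values of exactly 2.5 and 4.5 are classified; assigning 2.5 to the "clearly native" range and 4.5 to the "clearly nonnative" range is an assumption made here for illustration only, as is the function name.

```python
def far_category(mean_far: float) -> str:
    """Map a speaker's average FAR (1 to 6) onto the three rating ranges."""
    if not 1.0 <= mean_far <= 6.0:
        raise ValueError("average FAR must lie between 1 and 6")
    if mean_far <= 2.5:  # boundary handling is an assumption
        return "clearly native"
    if mean_far < 4.5:
        return "uncertain"
    return "clearly nonnative"


# Hypothetical individual averages, not values from the study.
for value in (1.8, 3.1, 5.2):
    print(value, far_category(value))
```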
The scatterplot in Figure 5 depicts the individual FARs in the three groups. As can be seen, 29 L1 attriters fell within the range delimited by the native control group with a FAR of <3.62, whereas 11 scored outside this native range. As for the L2ers, 15 scored within the native range, and 32 fell within the range delimited by the L1 attriters with a FAR of <5.46.
This distribution shows that there are subsamples of bilingual speakers who are perceived differently from the majority of their peers: attriters who come to be perceived as nonnatives, and L2 learners who manage to attain a nativelike accent in German. In an attempt to account for such individual differences in FARs, we tested predictors pertaining to speakers' personal background, language habits, and attitudes.
There was no difference in the FARs given to male (n = 41, mean FAR = 3.32, SD = 1.62) and female (n = 59, mean FAR = 3.05, SD = 1.30) speakers (t = 0.915, p = .363). A correlation of age at testing with FAR initially revealed a significant negative relationship between these two variables (r = −.207, p = .039). It was hypothesized that this correspondence might be due to the fact that the attriting and control population were older on average than the L2ers. This assumption was confirmed: when the effect of group was partialed out, the correlation was no longer significant (r = .122, p = .285).
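How an apparent age effect can disappear once group membership is partialed out may be illustrated with a toy example. The data below are invented for this purpose and are not taken from the study; the sketch removes the group effect by subtracting each group's mean from both variables before correlating the residuals, which is one common way of controlling for a categorical factor.

```python
from collections import defaultdict
from math import sqrt


def pearson(x, y):
    """Pearson correlation coefficient of two equally long sequences."""
    mx, my = sum(x) / len(x), sum(y) / len(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / sqrt(sxx * syy)


def residualize(values, groups):
    """Subtract each group's mean, i.e., remove a categorical factor."""
    by_group = defaultdict(list)
    for value, group in zip(values, groups):
        by_group[group].append(value)
    means = {g: sum(vs) / len(vs) for g, vs in by_group.items()}
    return [value - means[group] for value, group in zip(values, groups)]


# Hypothetical data: the younger group has higher FARs, but within each
# group age and FAR are unrelated.
groups = ["L2er"] * 3 + ["L1Aer"] * 3
age = [30, 40, 50, 55, 65, 75]
far = [4.0, 5.0, 4.0, 2.0, 3.0, 2.0]

raw_r = pearson(age, far)
partial_r = pearson(residualize(age, groups), residualize(far, groups))
print(raw_r < 0, abs(partial_r) < 1e-9)  # True True
```

In this constructed case the raw correlation is negative only because the two groups differ in both age and FAR, which mirrors the interpretation offered above.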
It was not possible to test the effects of LOR in one combined analysis for the two bilingual groups, because this factor does not impact on L1Aers and L2ers in the same way: a longer period of residence is assumed to lead to a stronger perceived foreign accent in L1Aers but to less accented speech production in L2ers. We therefore opted for bivariate correlations within each bilingual population between LOR and FAR. In this analysis LOR did not correlate with perceived foreign accent for either of the bilingual groups (L1Aers: FAR vs. LOR r = −.039, p = .809; L2ers: FAR vs. LOR r = .019, p = .906). The scatterplot in Figure 6 illustrates the lack of a correspondence between FAR and LOR.
Next, it was assessed whether self-reported use of German and attitudes toward the L1/2 impacted in any way on FAR. As is evident from Table 3, there are two significant correspondences: for the L2ers, the amount of use of German with the partner correlated negatively with FAR, indicating that those speakers who frequently use German in this context were more likely to be perceived as nativelike. In fact, of the 11 L2ers who were rated as nativelike, only 1 had a partner whose native language was not German (whereas 1 more was not in a relationship). The averaged total frequency of interactive use of German (i.e., the arithmetic mean of all answers to the 20 questions on use of German as listed above) also reached significance for the L2ers. No such correlations were found for the attriters, and none of the other language use and attitude factors was significant.
Note: L1Aers, first language attriters; L2ers, second language learners.
*p < .05. **p < .01.
Finally, we attempted to investigate the extent to which internal factors, for example, variation in language aptitude, might be associated with differences in FARs (Table 4). Language aptitude is commonly defined as the individual, largely innate talent for processing language. It is taken to encompass grammatical sensitivity, phonetic decoding ability as well as rote and inductive learning ability (e.g., Carroll, Reference Carroll and Diller1981). It has been demonstrated to correlate significantly with proficiency and ultimate attainment in adult L2 acquisition (e.g., Abrahamsson & Hyltenstam, Reference Abrahamsson and Hyltenstam2009; DeKeyser, Reference DeKeyser2000; Robinson, Reference Robinson2005) and the degree of loss of the L1 in late L1 attrition (Bylund, Abrahamsson, & Hyltenstam, Reference Bylund, Abrahamsson and Hyltenstam2010).
Note: FAR, foreign accent rating; L1Aers, first language attriters; L2ers, second language learners; LOR, length of residence in an L2 environment.
*p < .05. **p < .01.
Unfortunately, no direct aptitude measures were available for the participants tested in the present study. Given that aptitude scores are typically strongly correlated with proficiency measures of different types (for an overview, see Dörnyei & Skehan, Reference Dörnyei, Skehan, Doughty and Long2003), we used the C-Test scores obtained for all participants as part of the general proficiency testing to estimate individual aptitude. First, we checked whether proficiency can be taken to reflect aptitude or whether it correlates with any of the factors investigated previously, that is, LOR and use. In a linear regression analysis, the predictors LOR and total use together did not contribute significantly to the total explained variance in the C-Test scores: L1Aers: F (2, 37) = 1.007, p = .375, R2 = .052; L2ers: F (2, 28) = 0.800, p = .459, R2 = .054. We thus surmise that the C-Test scores can be taken to provide an indirect and partial measure of language aptitude. For all bilinguals, there is a moderate correlation of the C-Test score and FAR at r = −.490 (p < .001), which also holds for the L1 attriters (r = −.435, p = .005) and the L2ers (r = −.472, p = .002) separately. The correlation between C-Test score and FAR remains moderate at r = −.413 (p < .001) when the effects of group, LOR, and total use are partialed out, which suggests that the C-Test score is a reliable predictor of nativelike pronunciation, largely independently of other external factors. This conclusion is further strengthened by the fact that for the predominantly monolingual controls, C-Test scores and FAR were not significantly correlated (r = −.063, p = .815).
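As a consistency check on the regression results above, the proportion of explained variance can be recovered from the overall F statistic and its degrees of freedom via the standard identity R² = (F × df1) / (F × df1 + df2). The short sketch below applies this identity to the reported values; the function name is an illustrative choice.

```python
def r_squared_from_f(f_value: float, df_model: int, df_error: int) -> float:
    """Explained variance implied by an overall regression F test."""
    explained = f_value * df_model
    return explained / (explained + df_error)


print(round(r_squared_from_f(1.007, 2, 37), 3))  # L1Aers: 0.052
print(round(r_squared_from_f(0.800, 2, 28), 3))  # L2ers: 0.054
```

Both results agree with the explained variance reported for the two groups, so LOR and total use jointly account for only about 5% of the variance in the C-Test scores.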
DISCUSSION
The findings from the global foreign accent assessment task presented above allow for a number of interesting observations regarding the general development of pronunciation in long-term bilinguals. It was shown that, at group level, late bilingualism leads to stronger foreign accents in the L2 than in the L1. In other words, advanced L2 speakers are overall outperformed in terms of perceived nativelikeness by long-term attriters, who learned the language under investigation from birth but have not been using it dominantly for a substantial period of time. Of the 40 L1Aers investigated, 29 (72.5%) scored within the range of the unattrited, predominantly monolingual native controls, whereas only 15 L2ers (37.5%) fell within this range. This indicates that late bilinguals who start out as native speakers of the target language on average still remain closer to the native benchmark than those who approximate it coming from another language. The group results thus suggest that it is easier to retain an early-acquired language across an extended period of nonexposure than to attain it from scratch at a later age. At first glance, these results seem to support Hypothesis 1, which holds that L2 acquisition is maturationally constrained.
However, the differences in FARs between the bilingual groups are not nearly as categorical as Hypothesis 1 would predict. In other words, the perceived difference between long-term bilinguals and native speakers in terms of their pronunciation cannot be ascribed entirely to AOA or the sequence in which the languages were acquired: a subset of the attriters (who had acquired German from birth) were perceived to be clearly nonnative, whereas a number of late L2ers did fall within the unambiguously native range. Moreover, there was a sizeable overlap between the L1A and the L2 group, with 32 of all L2ers (80%) falling within the range of perceived foreign accent delimited by the L1 attriters. In a direct comparison of L1 attriters and advanced L2 learners, four-fifths of all L2ers were thus rated no worse than (late bilingual) native speakers of German in terms of perceived foreign accent. It is important that this considerable overlap in our study holds true for a population of L2ers who were not prescreened for nativelikeness in speech production. Unlike in most studies on ultimate attainment in L2 speech production (e.g., Abrahamsson & Hyltenstam, Reference Abrahamsson and Hyltenstam2008, Reference Abrahamsson and Hyltenstam2009; Bongaerts, Reference Bongaerts and Birdsong1999), inclusion in the study by Hopp (Reference Hopp2007) was contingent solely on advanced general proficiency in a C-Test. In addition, L2 and native performance overlap in extemporaneous speech which has previously been found to be among the most challenging tasks for late L2ers in L2 global accent studies (e.g., Jesney, Reference Jesney2004; see also above).
Contrary to Hypothesis 1, this finding poignantly illustrates that monolingual acquisition of the target language from birth in and of itself does not ensure sustained nativelikeness in speech production. Rather, the late onset of bilingualism affects speech production in such a way that some native speakers lose their perceived native accents after prolonged immersion in a nontarget language environment. In the present study, the ages of emigration in the L1 attriter group ranged from 17 to 51 years. In view of these late onsets of L2 acquisition, it can safely be assumed that all L1 attriters had acquired German to the monolingual adult standard before emigration. In other words, it is unlikely that any perceived foreign accent in German might have been the result of incomplete acquisition. Instead, the loss of nativelikeness is a consequence of late bilingualism.
In view of the general finding that FARs varied substantially within and across both bilingual populations, we attempted to account for variation of accent ratings in terms of the bilingual experience. We established that the length of time that participants had spent in a bilingual setting (LOR) did not contribute to the FAR. In other words, attriters with longer periods of residence in an L2 environment were not rated to be less nativelike than those whose emigration had taken place more recently. In a similar vein, those L2ers who had spent the longest time in the target language environment did not achieve significantly better FAR scores than those who had come to Germany only a few years ago.Footnote 12 For L1 attrition, it has often been proposed that the bulk of the development takes place within the first decade after emigration (e.g., de Bot & Clyne, Reference Bot and Clyne1994). The speakers investigated here had likely reached their ultimate attrition stage during this time, and the long period of residence (>15 years) stipulated as an inclusion criterion for the present investigation may thus have prevented measurable LOR effects.
We further investigated to what degree perceived foreign accent is affected by whether participants use the target language on a regular basis, or how they feel about the German language and German culture. We therefore assessed the impact of self-reported frequency of use of and exposure to German and of linguistic and cultural affiliation on FARs. Again, this analysis did not yield any tangible explanatory findings for the L1 attriters. These results are in line with previous findings that have also reported null effects of such factors on L1 attrition across different linguistic levels (e.g., Schmid, Reference Schmid, Köpke, Schmid, Keijzer and Dostert2007; Schmid & Dusseldorp, Reference Schmid and Dusseldorp2010). Similarly, for the L2ers, perceived nativelikeness was not influenced by which language the participants predominantly used with their children or friends, in clubs or churches, or for professional purposes. Furthermore, the FARs were not affected by the language or culture that the participant preferred.
Only one of the individual external factors, namely, the amount of German spoken with the partner, did reach significance for the L2 group (but not for the attriters). Based on this finding, we further established that of those L2ers who were rated as nativelike, only one had a partner who was not a native speaker of German (while one other was not currently in a relationship). This finding is interesting in the light of neurobiological investigations on issues such as stimulus appraisal and language learning, which have suggested that emotional involvement may contribute to success in L2 learning (e.g., Schumann, Reference Schumann1998). Alternatively, the effect may reflect that interactions with a partner are usually more frequent and consistent over time than the other types of L2 communication investigated here. When frequency of use was calculated as the arithmetic mean of all the component variables on interactive use of the target language that were collected, it did show a significant contingency with FARs for the L2ers. In other words, when added across contexts and situations, the relative amount of use of the L2 affects the degree of perceived foreign accent for L2 speakers. No such correspondence was found for the L1 attriters.
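The composite measure described above can be illustrated with a minimal sketch. It assumes that each component variable is a self-reported score on a common numeric scale, that missing contexts are skipped, and that the association with FARs is summarized with Pearson's r; the source does not specify the scale or the statistic, and the function names and all values below are hypothetical and are not data from this study.

```python
from math import sqrt
from statistics import mean


def composite_use(component_scores):
    """Arithmetic mean of one speaker's use scores across contexts.

    Contexts with no report (None) are skipped; this handling of missing
    values is an assumption, not a documented step of the study.
    """
    observed = [s for s in component_scores if s is not None]
    if not observed:
        raise ValueError("no use scores reported for this speaker")
    return mean(observed)


def pearson_r(xs, ys):
    """Pearson correlation coefficient between two equal-length sequences."""
    mx, my = mean(xs), mean(ys)
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    var_x = sum((x - mx) ** 2 for x in xs)
    var_y = sum((y - my) ** 2 for y in ys)
    return cov / sqrt(var_x * var_y)


# Hypothetical toy values on an assumed 1-5 scale (5 = most L2 use),
# one row per speaker, one column per context of use.
use_by_speaker = [
    [4, 3, 5, 4],
    [1, 2, None, 3],
    [3, 3, 3, 3],
    [5, 4, 5, None],
]
# Hypothetical accent ratings; higher = stronger perceived accent (assumed).
far_by_speaker = [1.5, 4.0, 3.0, 1.2]

composites = [composite_use(row) for row in use_by_speaker]
r = pearson_r(composites, far_by_speaker)
print(composites, round(r, 2))
```

Averaging across contexts means that no single context has to reach significance on its own for the pooled measure to covary with the ratings, which is the pattern reported for the L2ers.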
These findings suggest that, for speech production, the ability to retain the native status once the language has been acquired is largely independent of external factors. In contrast, the ability to attain the native status in speech production after puberty is affected by the overall amount of use of the target language (see also Flege & Liu, Reference Flege and Liu2001; Yeni-Komshian et al., Reference Yeni-Komshian, Flege and Liu2000). These findings are compatible with Hypothesis 2, and, by consequence, with models of L2 speech production that emphasize the degree of entrenchment of the L1 versus the L2 (e.g., Flege's speech learning model; Flege, Reference Flege, Burmeister, Piske and Rohde2002). According to these models, increased frequency of use of the L2 loosens the relations of L2 sounds to L1 phonetic and phonemic categories. For the L2ers in this study, greater use of German was associated with less perceptible foreign accents.
With respect to Hypothesis 3, the cross-linguistic comparisons in this study also revealed marginally significant L1/2 effects in that bilinguals with Dutch as the L1/2 were perceived to be closer to the native norm than bilinguals with English as the L1/2. In line with Hypothesis 3, this difference suggests that the amount of cross-linguistic interference in speech production may to some extent be conditioned by phonetic and phonological differences between languages. Interference between German and Dutch, which are typologically and phonologically more closely related than German and English, leads to a less perceptible foreign accent in L1 attrition and L2 acquisition. This finding supports models that view L2 speech production as the gradual restructuring of L1 speech categories to the target classification as a function of the relative distance of L1/2 categories and the extent of L2 input.
In this vein, the lack of use effects found here for the L1 attriters would appear to suggest that the L1 categories had been entrenched to a greater degree before the onset of bilingualism, such that late bilingualism has a less pronounced effect on L1 speech production overall. This conclusion is further backed up by the observation that advances in acquiring L2 phonology do not go hand in hand with perceptible decrements in L1 speech production ability (e.g., Fowler et al., Reference Fowler, Sramko, Ostry, Rowland and Hallé2008; see Footnote 13).
Finally, we attempted to investigate whether the development of (non)nativelikeness in late bilinguals is modulated by language aptitude. Such effects have been reported before for late L2ers (e.g., DeKeyser, Reference DeKeyser2000; Hyltenstam & Abrahamsson, 2009) and prepubescent attriters (Bylund et al., Reference Bylund, Abrahamsson and Hyltenstam2010), but they have not yet been investigated in the context of late L1 attrition.
Because no direct measures of aptitude were collected, we resorted to using the scores from a C-Test that had been administered to both bilingual populations in order to estimate individual aptitude. Although such global proficiency measures constitute an amalgam of factors, language aptitude has been shown to explain a large degree of variance in proficiency scores (see above). As the C-Test scores were not related to any of the external factors measured in this study for either of the bilingual groups, it appears likely that they reflect individual differences in language aptitude to a large degree.
The general proficiency score measured by the C-Test correlated with foreign accent in both the L1 attriter and the L2 group. Our finding that there is no effect of L1 use in the L1 attriter group, but a moderate correlation between global proficiency scores and foreign accent ratings, is compatible with the assumption that language aptitude may have some protective function in L1 attrition. In other words, higher degrees of language aptitude might mitigate the adverse effects of cross-linguistic influence on L1 speech production even after prolonged periods of nonuse.
Conversely, language aptitude would seem to serve a facilitatory function in late L2 acquisition. It is interesting that the total amount of use of the L2 does not correlate with the C-Test score (r = .23), our indirect measure of aptitude. This lack of a correlation indicates that use and aptitude independently affect L2 speech production. What this implies is that increased language use facilitates approximating nativelike pronunciation in L2 acquisition across individual variation in terms of aptitude (see also Harley & Hart, Reference Harley and Hart1997). However, the data in the present study cannot address the question of the relative influences of use versus aptitude. In particular, the extent to which above-average language aptitude is a prerequisite for attaining nativelike accents cannot be answered in the present context, as the L2 participants in Hopp (Reference Hopp2007) were selected on the basis of their advanced proficiency. They might thus represent a skewed sample with above-average language aptitude to start with. Summarizing, the indirect effects of language aptitude across the bilingual groups point to a protective function of aptitude in L1 attrition and a facilitatory function of aptitude in late L2 acquisition.
In general, the direct comparison between late L1 attriters and late L2ers undertaken in the present study suggests that late bilingualism is an important factor in the development of overall pronunciation in both L1 and L2. More particularly, the findings highlight that, irrespective of whether the target language investigated is the early-acquired L1 or a late-learned L2, speakers who become bilingual after puberty experience bilingualism effects in terms of bidirectional cross-linguistic influence. The present study thus substantiates the argument that the endpoint of bilingualism does not amount to additive monolingualism, whether in L1 or in L2 (Cook, Reference Cook and Cook2003; Grosjean, Reference Grosjean1998).
Taken together, the findings from this study do not seem to allow us to reject either Hypothesis 1 or Hypothesis 2 straightforwardly; rather, their interpretation depends on the perspective taken on the data.
Looking at the results from the L1 attrition perspective, we find that L1 attriters do not differ in perceived foreign accent from the native speakers at the group level. Moreover, FARs in L1 attrition are not significantly modulated by external factors such as age of emigration, length of time in an L2 environment, L1/2 use, and affiliation. Given that there are also significant differences between L1 attriters and L2ers at the group level, the present study would then appear to support the view that there is a strong effect of AOA on bilingual speech production.
Looking at the results from the L2 acquisition perspective, we find considerable overlap of the L2ers and the L1 attriters, with only 20% of the L2ers investigated here scoring outside the range of native speakers of German. In addition, the FARs attained by the L2ers are significantly correlated with the levels of use of the L2 and language aptitude. On the basis of these similarities between L1 attriters and L2ers, the present findings would then emphasize the strong bilingualism effects on speech production and the influence of factors other than AOA.
Ultimately, however, we would argue that the interpretation of the findings in the present study is not just a matter of which perspective is preferred. Given the nontrivial consequences of postulating maturational constraints on L2 acquisition for our understanding of the neurocognitive architecture and the mental processing of language, proponents of maturational constraints in L2 acquisition have to demonstrate that there is a substantial contribution of maturational constraints in L2 development. Experimentally, this translates into the requirement that AOA needs to be shown to exert a strong and (largely) independent predictive role for convergence in bilingualism (e.g., Birdsong, Reference Birdsong and Birdsong1999). This entails that late L2ers pattern outside the performance range delimited by native speakers with AOAs at birth. The present study can identify no such independent predictive role of AOA on perceived foreign accent. The findings that (a) the bilingual groups overlap to a large extent and (b) foreign accent in the L2 is correlated significantly with other variables indicate that foreign accent in bilingualism cannot be predominantly related to AOA.
Taken together, our results are compatible with interference models of bilingual speech production (e.g., Flege et al., Reference Flege, Birdsong, Bialystok, Mack, Sung and Tsukada2006) and, more broadly, continuity models of L2 acquisition (e.g., Bialystok, Reference Bialystok2001; Hopp, Reference Hopp2007; Schwartz & Sprouse, Reference Schwartz and Sprouse1996). Detailed differences between the models aside, they propose that cross-linguistic interference constrains L2 acquisition initially but subsides after sustained and sufficient L2 input, which leads to the restructuring of the L2 system toward the target language. In these respects, these models can accommodate the finding that cross-linguistic interference in late bilingualism can impact similarly on both first-learned and later-learned languages.
At any rate, our approach of juxtaposing late L2 acquisition and late L1 attrition has wide-ranging methodological implications. Our comparison of different bilingual populations introduces a different frame of reference for studies on L2 ultimate attainment (see also Montrul, Reference Montrul2008; Tsimpli et al., Reference Tsimpli, Sorace, Heycock and Filiaci2004). Late L1 attriters constitute a control group that matches late L2 learners in terms of the asymmetric onset of bilingualism and a concomitantly lower proportion of input and use of the target language. Directly comparing and contrasting late L2 learners and L1 attriters thus allows us to disentangle effects of AOA from effects that affect bilinguals independently of the age of acquisition of the target language. We believe that this approach constitutes a methodological advance over the traditional comparisons of late L2 learners and (predominantly) monolingual native speakers. It seems to us that the direct comparison of late L2 acquisition and L1 attrition provides a fruitful line of inquiry that promises to cast new light on questions of age effects and cross-linguistic interference in bilingualism.
Given that late bilingualism has been demonstrated to lead to perceptible nonnativelikeness even in the mother tongue, the findings of the present study highlight that using the (monolingual) standard of nativelikeness as the only frame of reference in research on L2 acquisition is methodologically problematic. Hence, as far as investigating age effects in (late) L2 acquisition goes, the measure that has been used in virtually all previous research would seem to be questionable. In view of this methodological shortcoming, we believe it is also necessary to reexamine and reevaluate previous research on L2 acquisition that is exclusively based on comparisons of L2ers and monolingual native speakers.
Needless to say, the present study is but a first step toward direct comparisons of late L1 attrition and L2 acquisition. Because the data from L1 attriters and L2 learners were collected in two independent studies, we could not ensure direct comparability of, for example, C-Test scores, speech production samples, and the range of background information available. In addition, it would be desirable to collect aptitude measures in a direct comparison between late L1 attriters and late L2 learners in future research, and to extend such investigations to encompass a wider range of linguistic features beyond foreign accent, for instance, aspects of morphosyntax such as inflection. For reaching more definitive conclusions about the nature of similarities and differences in accent between attriters and L2 learners, we would also need to carry out phonetic and phonological analyses of the production samples. Such analyses are currently being conducted. Bearing these limitations in mind, we hold that the findings nevertheless are more in accordance with a continuity approach to language development than with an account that assumes maturational constraints: an early AOA does not deterministically lead to nativelikeness, and neither does a late AOA deterministically prevent it.
CONCLUSION
This study undertook a direct comparison of perceived foreign accent in late L1 attrition of German and late L2 acquisition of German. We introduced the method of directly comparing L1 attriters and L2 learners to predominantly monolingual natives as a way of disentangling effects of age of onset from effects of bilingualism in speech production. Although the natives and the L1 attriters differ from the L2 learners in perceived foreign accent at the group level, we also found a sizeable overlap between L1 attriters and L2 learners. Variation in foreign accent was not related to length of residence for either bilingual population. Of the other external variables, use of the L2 turned out to be significant to some degree for the L2ers, in particular where highly intensive and sustained contacts were concerned, whereas an indirect measure of aptitude correlated with foreign accent for both bilingual groups. In line with interference accounts of bilingual speech production, these results illustrate that speech production can perceptibly be affected by cross-linguistic interference in both groups of bilinguals even though the attriters acquired the target language as their native language from birth, whereas the L2ers were postpuberty foreign-language learners. We conclude that acquiring a language from birth is not sufficient for ensuring nativelikeness in bilingual speech production. In consequence, nativelikeness, if defined against a predominantly monolingual standard, cannot serve as a performance criterion in investigations of age effects on L2 ultimate attainment.
ACKNOWLEDGMENTS
Part of the research presented here was supported by NWO Grant 275-70-00 (to M.S.S.). We are grateful to Lysbeth Plas, Bregtje Seton, Dieter Thoma, and Gülsen Yılmaz for their help at various stages. We also thank the audiences at EUROSLA 20, ISB 8, and two anonymous reviewers for helpful comments.