At this stage we can say with certainty that language attrition is a genuine phenomenon and a genuine problem, but one about which we know relatively little. However, having identified the problem and outlined several possible areas to investigate, we can anticipate large-scale future research will provide some of the answers. (Freed, 1982, p. 5)
1. Introduction
This plenary is dedicated to the memory of two great scholars, Richard D. Lambert and Theo van Els, and to the celebration of the work of two equally great scholars, Kees de Bot and Bert Weltens. Together, they formed the ‘quadrumvirate’ that throughout the 1980s led the movement to establish research into the loss and deterioration of language skills among cognitively non-impaired adults – commonly and collectively referred to as language attrition – as an area of bilingualism research in its own right. The starting point for this development is usually taken to be in 1980, when a conference on ‘The Loss of Language Skills’ was organized at the University of Pennsylvania by Lambert and his colleague Barbara F. Freed. I had the great honour and pleasure of speaking to Richard Lambert at length about what had prompted him to diversify into this subject area when I was researching the background and history of the field of language attrition in 2003, for an overview chapter (Köpke & Schmid, 2004) introducing the proceedings of the first conference of the series in which the present event is the fourth. Lambert told me his interest in processes of the loss of language skills had been sparked by doubts that he had begun to entertain about the appropriateness of requiring US college students to study a foreign language (FL). Not only he but a broad majority of experts and laypeople alike believed that the language skills that are acquired at such cost and effort are not very durable unless the learner continues to use and hone them. Most graduates did not.
Lambert elaborates on this point of view in his chapter in Lambert and Freed (1982), pointing out that research on the hows, whys and whens of language attrition ‘should lead to the creation of new language skill maintenance and revivification programs to supplement the first-time language skill acquisition programs that now predominate’ (Lambert, 1982, p. 7). As his colleague and co-editor, Barbara F. Freed, notes in her introduction to the volume, ‘vast amounts of time, energy, and funding have been invested to further the development of curriculum materials and methodology to increase second language learning’ but ‘the maintenance of these skills once attained’ had largely been disregarded (Freed, 1982, p. 5).
Forty years on, these observations have lost nothing of their accuracy and relevance: language learners still drop off the horizon of research, policy and pedagogy the moment they have taken their exam or attained their degree or diploma, and no consideration has been given to the development of skill maintenance and revivification programs. It is no exaggeration to say that we currently have no understanding of how – or even if – FL skillsFootnote 1 can attrite; of which grammatical or lexical features are more or less vulnerable, and why; of whether and how different subskills or modalities are differentially affected; of what other factors (length of time, amount of contact, attitudes) will facilitate or impede attrition and to what degree; nor of how former learners can be supported in maintaining or regaining proficiency and whether pedagogical approaches geared towards teaching a language the first time round are fit for purpose in re-learning.
There is even less understanding of how pedagogical approaches and characteristics of the learner experience (or of the learner personality) feature in the attritional process. To give but one example, the ‘earlier is better’ view of FL instruction has been thoroughly debunked in recent years following a number of impressive, large-scale and longitudinal studies (see e.g. Mitchell & Myles, 2019; Muñoz, 2008). These insights are based on observations of learning trajectories and the recognition that, other things being equal, younger children develop their FL skills more slowly than older children or adolescents/adults. However, the question of whether age-related differences in learning trajectories may potentially make language knowledge more or less resilient post-instruction has never been empirically assessed. Surely such insights into what facilitates the maintenance of skills should inform pedagogical and policy decisions alongside what we know about their acquisition?
There are few but notable exceptions to the blinkeredness within the field of second language acquisition (SLA), learning, and teaching to the problem of FL attrition, mainly consisting of a small number of widely known early and large-scale studies (see below, Section 2). Over the past two decades, however, investigations of L2 attrition in general and instructed FL attrition in particular have been few and far between (see Mehotcheva & Köpke, 2019, for a recent overview). The field thus continues to suffer from the same shortcomings pointed out for first language (L1) attrition research two decades ago (Köpke & Schmid, 2004) – a lack of empirical evidence, theoretical frameworks and methodological coherence – compounded by problems that are specific to this field and do not apply in the same way in L1 attrition research (see below, Section 3).
It is, furthermore, difficult to identify any general and overall picture across L2 attrition research, as investigations are often carried out in different contexts and settings. These range from school learners of a community minority language (e.g. Murtagh, 2003) through simultaneous and early bilingual children and adolescents whose parents have returned to their country of origin after an extended stay in another linguistic environment (‘returnees’, e.g. Flores, 2015; Lee, 2002; Taura, 2008) and investigations of former Study Abroad university students (Engstler, 2012; Huensch et al., 2019; Mehotcheva, 2010) to the very specific experience of the Latter Day Saints missionaries in the US, who typically receive a short period of intensive instruction in a FL followed by two to four years of proselytizing in that linguistic environment, but who, more often than not, stop using the foreign language entirely upon their return to the US (e.g. Hansen, 2011; Nagasawa, 1999; Russell, 2012). While each of these contexts contributes important insights into language development and retention, these may not generalize to what is almost certainly the most frequent setting of L2 attrition – a period of instruction in the home country, with exposure largely confined to language classes experienced at school, college, or university and followed by years or decades of decreased or non-existent use and little opportunity to take the language up again. Very few investigations have focused on this context (see Section 2).
However, it probably comprises not only the largest number of L2 attriters but also the ones who are most in need of support, as the instructed setting will likely result in lower proficiency levels and less entrenchment of the language than any of the contexts described above (which all have some immersion element), rendering knowledge more vulnerable to post-instruction erosion. It is also the setting in which the belief that FL knowledge is extremely vulnerable to erosion – the ‘use it or lose it’ tenet – is likely to be most prevalent.
2. The attrition of instructed foreign languages
Lambert and Freed's initiative to introduce language attrition as a research field sparked a number of empirical investigations of L2 attrition throughout the 1980s. Of particular note here is an impressively large survey of 587 participants in the US who had finished learning Spanish at high school or college between 1 and 50 years previously, comparing them with 146 participants in the last week of their course (Bahrick, 1984). The survey comprised a battery of tests spanning reading comprehension as well as recall and recognition of vocabulary, idioms and grammar (no information is provided on what particular items or grammatical features were targeted in these tests, nor how many items were included in each subtest). The statistical analyses conducted revealed the following main conclusions:
• Eight of the ten variables studiedFootnote 2 declined exponentially between year 3 and year 6 after instruction and subsequently remained at a steady level for several decades, followed by another spurt of decline. Bahrick (1984, p. 110f.) coined the term ‘permastore’ for the portion of linguistic knowledge that remains resilient between and beyond these two periods of decline.
• Self-reported amount of exposure or rehearsal did not have any influence on the retention functions – in other words, L2 use did not affect attrition or maintenance. Bahrick ascribes this lack of an effect to very low levels of rehearsal and lack of variance across the population (p. 109) – no-one had used Spanish enough to make a difference.
• The level and success of training had a strong effect, with participants who had studied the language longer and/or received a higher grade retaining more knowledge. In absolute terms, all learners seemed to lose about the same amount, but this represented a smaller proportion of the total knowledge for the more advanced learners. For learners at the lowest proficiency levels, this means that, within six years of ceasing instruction, they became indistinguishable from the control group who had never learned Spanish, while successful learners at higher proficiency levels retained between 62% and 80% of what they had known (p. 111f.).
• Recall of grammar (unlike recognition of grammar) and, to some extent, recognition of idioms continue to decline linearly after the first attrition interval beyond year 6, failing to stabilize in the same way the other variables do (p. 116).
The take-home message from Bahrick's study, often cited in L2 attrition research, is that, while FL knowledge does indeed attrite to some extent between 3 and 6 years after training ceases, ‘the remainder is immune to further losses for at least a quarter of a century, and much of that content survives for fifty years or longer’ (p. 110), despite the fact that Bahrick's participants had rehearsed their knowledge ‘minimally or not at all’ (p. 109).Footnote 3 This has come to be seen as the most surprising finding from this study, while the at least equally puzzling outcome that grammar did not stabilize in the same way as vocabulary has largely been ignored (I return to this point below).
Similar overall themes emerged from Weltens' (1989) study of L2 attrition of French in the Netherlands. This study differed from Bahrick's investigation in several crucial aspects: firstly, level of training was a dichotomous variable, with participants having learned French in secondary school for either four or six years (the extra two years in the latter group were more intensive, so that these students had double the number of contact hours). Secondly, the post-instruction interval was much shorter, with a maximum of four years. A subset of the participants was re-tested two years into the study, combining a cross-sectional with a longitudinal approach. Lastly, while the tested cohort was smaller (n = 25 for each level of training at 0, 2 and 4 years post instruction, total n = 150 – still considerable in a field dominated by very small samples), the linguistic measures used to estimate proficiency were more detailed, spanning holistic proficiency (cloze test, listening and reading comprehension) as well as lexical (French to Dutch translation of missing words in a sentence context), morpho-syntactic (multiple choice) and phonological (discrimination and production) skills, and detailed self-reports on proficiency at the end of learning and at the time of testing.
Weltens concluded that ‘attrition sets in rather quickly and then levels off’ (p. 92), but that this process of deterioration was limited to morpho-syntactic skills and self-assessments. Absolute performance depended on training level (with participants who had had longer and more intensive instruction outperforming the low-instruction group), but rate of attrition did not – like Bahrick, Weltens (1989, p. 92) found that participants ‘lose a fixed amount of knowledge, independent of their original level’. While participants’ subjective impression was that their lexical skills had deteriorated after as little as two years, this was not borne out by their absolute performance on the tasks, with lexical skills remaining stable across the four-year period. Weltens does point out that this might be linked to the untimed paradigm used in this study, and that a more sensitive test, such as lexical decision under time constraints, might have revealed problems of access (p. 93). Phonological skills, on the other hand, actually improved over time (p. 94).
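The ‘fixed amount’ pattern reported by both Bahrick and Weltens can be made concrete with a small arithmetic sketch. The numbers below are invented purely for illustration (they are not taken from either study), and the function name is hypothetical: the point is only that the same absolute loss leaves a much larger share of knowledge intact for a learner who started from a higher level.

```python
def retained_share(initial_knowledge, fixed_loss):
    """Proportion of initial knowledge left after losing a fixed absolute amount.

    Both arguments are in the same arbitrary test-score units.
    The result is floored at zero: a learner cannot lose more than they knew.
    """
    remaining = max(initial_knowledge - fixed_loss, 0)
    return remaining / initial_knowledge


# Hypothetical end-of-instruction scores; every learner loses the same 30 units.
learners = [("lower-proficiency learner", 40), ("higher-proficiency learner", 100)]
for label, initial in learners:
    share = retained_share(initial, fixed_loss=30)
    print(f"{label}: retains {share:.0%} of {initial} units")
```

With these made-up values the lower-proficiency learner keeps a quarter of what they knew, while the higher-proficiency learner keeps 70%, even though both lost exactly the same amount.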
These two large-scale and meticulous investigations thus pointed to a surprising longevity of linguistic knowledge, despite the fact that their participants were not given any input that might have triggered a reactivation of dormant knowledge (and therefore potentially even underestimated the proportion of knowledge that has been retained). As such, they provided an important point of departure for what should have developed into an important research area, informing policy and practice and begging to be replicated in other learning contexts and for other language combinations. However, in contrast to the study of L1 attrition which has grown and flourished (see e.g. Schmid, 2016), investigations of L2 attrition remain few and far between, with the bulk of them being conducted in the form of Ph.D. projects, which tend not to be easily available and therefore do not inform subsequent research as much as they ought to have doneFootnote 4 (overviews of the available research can be found in Bardovi-Harlig & Stringer, 2010; Mehotcheva & Köpke, 2019). Matters are made worse by the fact that there is no commonly adopted methodology and that population size in most studies is small, which makes comparisons between investigations or generalizations to the broader context difficult and attaining an overall picture of the linguistic and extralinguistic drivers of attrition impossible.
3. Unique challenges
By the turn of the millennium it was becoming increasingly evident that the study of the maintenance and deterioration of language skills was in need of its own theoretical and methodological frameworks in order to meet the specific challenges of conducting research on language attrition in its own right, rather than unsuccessfully attempting to establish it as a kind of mirrored clone of SLA research (Köpke & Schmid, 2004; Schmid & Köpke, 2019). This applies to both theory and methodology. Attrition research has often proceeded on the assumption that the predictions made by theoretical frameworks with respect to language acquisition can simply be reversed for language attrition. The most straightforward example of this is probably Jakobson's Regression Hypothesis (Jakobson, 1941; Keijzer, 2007), which predicts that linguistic features will be lost in the reverse order in which they are acquired. This has repeatedly been shown to be an assumption that is not only overly simplistic but also has little explanatory potential (Schmid, 2002): sequences observed in either L1 or L2 acquisition are not simply reversed in the attrition of these skills, and multiple other factors come to bear on this developmental process, of which crosslinguistic similarity/difference is only one. Similar problems abound with respect to other theoretical frameworks but are beyond the scope of this plenary.
In addition to the increasingly evident challenge of reconceptualizing theoretical frameworks so that they are capable of capturing insights from both L1 and L2 attrition, there are two major methodological challenges for attrition research – both to some extent linked to variability: unlike studies of instructed L2 learning, research on language attrition does not have recourse to ready-made classroom populations which tend to be fairly homogenous in terms of a broad range of background variables, learning experiences and proficiency profiles. Language attrition is a developmental experience unfolding across the lifespan, and as such it is shaped by literally everything else that the language user experiences across decades – changes in their language environment, in their occupation, in their family situation or in their circle of friends, the decision to take up another language, and so on. Many of these factors tend to vary to a much smaller extent (if at all) during the time that individuals are in formal education and benefit from instructed language learning. This means that it is not only much more challenging to recruit sizeable participant populations for attrition studies, but that there are many more factors that need to be considered and accounted for in statistical modelling than is the case for language acquisition and learning. To make matters worse, unexpected and nonlinear interactions between some of these factors have often been reported (e.g. Cherciov, 2013; de Bot et al., 1991; Opitz, 2013; Schmid & Yılmaz, 2018).
As Schmid and Köpke (2019) describe, the ‘noughties’ were thus a period characterized by various initiatives and efforts to provide a standardized methodology and solid adaptations of theoretical frameworks for research on L1 attrition. While these efforts are just as relevant for L2 attrition, very little work has been done in this area to explore to what extent a similar approach might be valid and appropriate here.
This may to some extent be because of a second challenge, this one specific to this particular subfield, which seemed – and, to some extent, still seems – insurmountable and which does not similarly apply to L1 attrition: the question of the baseline. Ever since the early studies of child L1 acquisition (e.g., Brown, 1973) it has been established that, individual variability notwithstanding, (monolingual) native languages develop at roughly the same rate and in roughly the same sequence across speakers of the same linguistic community (e.g. Clark, 2003). When investigating L1 attrition among speakers who reached puberty before becoming bilingual, the researcher can thus be reasonably confident that their language skills will have been similar at that time to those of others who had grown up under similar circumstances, and thus establish a valid control group.Footnote 5 The same is not true in L2 acquisition, in particular in instructed L2 learning, where both sequence and ‘endstate’ (that is, proficiency at whatever stage learning ceases) can show dramatic variability. This means that, where an L2 attriter is incapable of performing a particular task, it is hard to know whether this is due to attrition or non-acquisition – how much has been lost versus how much was never acquired in the first place?
In practice, this problem has been dealt with through either longitudinal or cross-sectional approaches. In the former, participants are tested initially, for example at the end of a school year or a course, and then re-tested after a period of non-exposure. While this method is certainly the most rigorous and reliable way of establishing end-state proficiency, it is problematic in other ways. Firstly, it limits the attrition span to relatively short periods for practical reasons (Bardovi-Harlig & Stringer, 2010, note that existing longitudinal studies of L2 attrition typically cover only 1–2 years and none of them exceed 5). The longer this period, furthermore, the larger the inevitable loss of participants, so that the initial sample needs to be very large to allow for sufficient statistical power in the eventual participant population. Secondly, it has to cope with the fact that, if a comparable method of assessment is to be used at all testing moments, there is likely to be some degree of re-testing benefit, particularly for studies attempting to draw a more fine-grained picture of the attritional trajectory by re-testing participants at regular intervals.
The second way of dealing with the baseline issue is the cross-sectional method, which estimates baseline proficiency based on information about the learning experience, such as the length of the course, the level of the attained qualification, self-estimates of proficiency at the endstate, and/or measures of success in terms of grades. In this method, there is a residual degree of uncertainty about the validity of the baseline estimate, and populations therefore need to be sufficiently large and conform to assumptions about random sampling and normal distribution to allow overall patterns to emerge despite such possible errors.
4. Common themes
Notwithstanding the limited body of research on L2 attrition and the methodological challenges and limitations pointed out above, the available cumulative insights from the work conducted over the past 40 years still allow for a number of generalizations to be extrapolated. One of these relates to the consistent finding that higher levels of proficiency facilitate retention (e.g. Xu, 2010) in relative, though not in absolute, terms. On the other hand, insights into the impact of frequency of L2 use are much less straightforward. As mentioned above, Bahrick (1984) dismissed the failure of this variable to emerge as a significant predictor as not meaningful, ascribing it instead to low levels of use and low levels of variance. However, the lack of predictive power of exposure and use is one of the key defining and recurring features that have puzzled researchers of both L2 and L1 attrition for decades. Intuitively, it appears obvious that frequency of use should be the main driver of attrition vs. retention, but findings remain inconclusive (Mehotcheva & Mytara, 2019, p. 358). Insights from L1 attrition suggest a much smaller role for exposure than most people would take for granted (see Schmid, 2019 for an overview). Schmid and Yılmaz (2018) raise the intriguing possibility of an interaction of language learning aptitude and exposure, with high-aptitude individuals being able to maintain their L1 in the absence of frequent contact while skills deteriorate among low-aptitude individuals who do not have the opportunity to use their L1. It is unclear whether this finding will translate to L2 attrition.
There are some isolated investigations suggesting a beneficial role of continued instruction (e.g., Russell, 2012) or self-study (Xu, 2010), but no study has ever tested the impact of different kinds of exposure (e.g. written vs. spoken, active vs. passive) on language maintenance and reactivation, let alone investigated how frequent it should be or at what stages of the attrition process it might be most supportive.
The important questions of how long it takes for FL attrition to set in, how deeply knowledge can eventually erode, and how this is constrained by initial proficiency levels or other factors are even harder to answer, in particular because almost all investigations of attrition to date look at relatively short incubation periods, with only very rare cases exceeding 5 years of non-exposure (Bardovi-Harlig & Stringer, 2010; Larson-Hall, 2019). Within those five years, attrition is limited, and the general trend is that participants tend to overestimate the degree to which their language has deteriorated (Weltens, 1989, p. 93).
With respect to the fabric of the attriting language (linguistic features), grammar tends to be more stable than vocabulary in FL attrition (e.g. Bardovi-Harlig & Stringer, 2019; Larson-Hall, 2019). Earlier findings to the contrary (e.g. Bahrick, 1984; Weltens, 1989, see above Section 2) are likely due to changes in pedagogical styles: the participants in these early investigations had experienced the language classroom between the 1930s and the 1980s, at a time when teaching typically focused on explicit learning of grammatical rules and translation into the L1, rather than on acquiring communicative skills. This is also suggested by the fact that some very early investigations, focusing on the development of knowledge of Latin after relatively short periods of non-use, also show the vocabulary to be more resilient than grammar (e.g., Geoghegan, 1950; Kennedy, 1932). Communication-focused teaching styles, which have become the norm over the past decades (e.g., Mitchell, 1988, 2002), facilitate the acquisition of implicit grammatical skills which are more resilient against deterioration (e.g., Paradis, 2007). However, vocabulary attrition is by far the most broadly studied area of FL attrition compared to very few investigations of what happens to grammatical or phonological/phonetic accuracy (see Bardovi-Harlig & Stringer, 2019 for an overview).Footnote 6
Lastly and possibly most intriguingly, there is no insight at all into how stable proficiency is after it has been re-gained, as compared with proficiency at the beginning of the attrition period. Insights from L1 attrition suggest that reactivated knowledge may be considerably more resilient to erosion than proficiency at the beginning of the attrition period (Köpke & Genevska-Hanke, 2018 – note that, in this study, there were also differences in the speaker's personal circumstances in the post-exposure attrition period which may have contributed to the increased resilience), but there is no way of knowing whether L2 retraining may have a similar ‘booster’ effect.
A final important observation in the context of ‘forgetting’ and re-training is the fact that all studies that have compared production/recall with perception/recognition have found that retention is higher with respect to the latter. In other words, there seems to be a tendency among attriters to be able to identify a word or judge a structure as (un-)grammatical when it is presented to them, even though they may not be able to spontaneously produce it. This is in line with predictions made by the Activation Threshold Hypothesis (e.g. Paradis, 2007) and further indicates that what is affected in attrition is accessibility rather than representation of knowledge. Anecdotally, many L2 attriters also have stories of situations where some kind of emergency or highly emotional situation triggered an ability to use the language, which astonished them but which they were unable to recapture later. Several of the participants in the study described below (Section 5) had stories to tell of how their suddenly resurfaced ability to speak French, German or Spanish saved the day while on holiday, when another family member had suffered a medical emergency, when the car broke down and a rescue service had to be summoned by phone, or when there were mix-ups concerning rooms or luggage. Another participant wrote how they ‘had a blazing argument with a French person I was staying with and I found myself very fluid in the heat of the moment’. Such accounts, along with anecdotal reports of re-immersion spontaneously triggering the effect of the language ‘flooding back’ despite a complete lack of exposure before the reimmersion (e.g. by Bardovi-Harlig & Stringer, 2013), suggest that what underlies ‘deterioration’ may not be the entire decay of the memory trace but the inability to activate it in the absence of extraordinary levels of motivation, energy and effort.
This would indicate that restoring the knowledge to previous levels of availability may need less input, stimulation and effort than would be the case when the same knowledge is taught from scratch: the so-called ‘Savings Paradigm’ (e.g., de Bot et al., 2004; Hansen, 2011). In consequence, traditional language classes designed for the acquisition of new knowledge may be inappropriate for individuals seeking to re-attain their earlier proficiency levels, resulting in boredom and frustration. Interventions specifically targeting the needs of attriters may be able to achieve more in shorter amounts of time, while also being more enjoyable and giving re-learners confidence in their abilities. As was pointed out above, the perceived need to develop such interventions was one of the main drivers behind Lambert and Freed's efforts in the early 1980s – sadly, to date, nothing has come of these.
5. The attrition of instructed French in the UK: A pilot investigation (with Florence Myles and Ángel Osle, University of Essex)
In the remainder of this presentation, I will report on an investigation that was conducted in the hope that it might serve as a pilot, foundation and stimulus for further study and renewed interest in L2 attrition. In designing this study, we drew on the initial pioneering methodologies of the two large studies described in Section 2 (Bahrick, 1984; Weltens, 1989). We opted to initially run a web-based survey, with a comparatively small number of linguistic items, despite the limitations this imposes on fine-grained assessment, partly because this would allow us to collect a large enough sample to draw some initial conclusions and partly because face-to-face experimental research was still severely constrained due to the Covid-19 pandemic.
5.1 Method and materials
Ethical approval for the present study was provided by the Ethics Committee of the Department of Language and Linguistic Science, University of York. Participants were recruited and remunerated by Qualtrics through its panel base. They were native speakers of English with no other home languages and had studied FrenchFootnote 7 at secondary schools in England between 1 and 50 years ago (see below for more detail). All participants had taken one or both of the two official state exams in this language: General Certificate of Secondary Education (GCSE, taken around age 16) or A-level (university-entry level, taken two years later). Students who have taken a French GCSE are typically around A2/low B1 in the Common European Framework of Reference (Curcin & Black, 2019) while A-levels lead to B2 (Milton, 2007). In order to help refine the estimate of individual variability in baseline proficiency, participants were also asked a number of questions relating to the length (in years) and intensity (average hours per week) of instruction. We also asked them to estimate their own position with respect to the rest of their cohort in terms of both talent and diligence by moving a slider on a scale from 0 (least talented/diligent of their cohort) to 100 (most talented/diligent), and to indicate on a similar slider their disagreement/agreement with the statement ‘I am very good at learning languages’. Lastly, we presented them with the short descriptors of the six CEFR levels (https://www.coe.int/en/web/common-european-framework-reference-languages/the-cefr-descriptors) and asked them to indicate which best described their proficiency in French at the end of instruction as well as at the present time.
The survey furthermore contained basic personal background questions on age, gender, education, the age at which they had started learning French and how long ago they had last studied it. We then had a set of questions on attitude towards language learning, the language learning experience at school, and how often the participant used or was exposed to French in a variety of contexts, all of which were measured on a sliding scale from 0 to 100. We also asked participants to self-assess on a sliding scale from 0 to 100 whether their proficiency at present was worse (0–49), the same (50) or better (51–100) now than it had been when they had stopped learning, and asked a few questions about whether they had ever taken any classes or done any other activities to revive their French knowledge since leaving school. (The full questionnaire, which also contained some open questions on activities and anecdotes about the language use, is provided in Supplementary Appendix A).
Once participants had completed the survey, they were given a language assessment consisting of two parts. The first part used a subset of the LexTALE vocabulary assessment (see Lemhöfer & Broersma, 2012 for the original LexTALE task and Brysbaert, 2013 for the French version). The original task consists of 84 sequences of letters which are plausible words in the target language. Fifty-six are existing words, while 28 are nonwords. The participant has to indicate for each sequence whether or not it is an actual word in that language, and the overall score is the average percentage of correct acceptances of words and correct rejections of nonwords. This was deemed too long and demanding for the purposes of the present study, given the relatively low level of proficiency of participants and the length of the rest of the survey; we therefore used 20 words and 10 nonwords. Sixty-six participants for whom there was a gap of more than 70% between the score they achieved on words vs. nonwords were eliminated from the analyses, as we assumed that they had simply responded ‘yes’ or ‘no’ to all or most of the items. The second part used an online French placement test, consisting of 30 multiple-choice items targeting levels A1 to B1. Of these, 12 items test knowledge of verb morphology and use (past tenses (5 items), future (1 item), subjunctive (3 items) and inflection for person (3 items)), seven test the use of pronouns, and 11 relate to closed-class items such as prepositions and conjunctions. This test was developed by the French section of the Department of Language and Linguistics of the University of Essex for determining the level of language proficiency of new students in order to assign them to the appropriate modules; it is used here with the kind permission of the developers. The overall score was the percentage of accurate responses.
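The scoring and exclusion rules for the vocabulary task can be sketched in a few lines of Python. This is an illustration only, not the procedure actually used: the function names are hypothetical, the two example participants are invented, and the ‘gap of more than 70%’ is read here as a difference of more than 70 percentage points between accuracy on words and accuracy on nonwords.

```python
def lextale_score(word_responses, nonword_responses):
    """Return (word_pct, nonword_pct, overall_pct) for one participant.

    word_responses: list of bools, True = participant said "yes" to a real word
    nonword_responses: list of bools, True = participant said "yes" to a nonword
    """
    word_pct = 100.0 * sum(word_responses) / len(word_responses)
    # A nonword is answered correctly when the participant says "no".
    nonword_pct = 100.0 * sum(not r for r in nonword_responses) / len(nonword_responses)
    # Overall score: mean of the two percentages, so words and nonwords
    # count equally even though there are twice as many words.
    overall_pct = (word_pct + nonword_pct) / 2.0
    return word_pct, nonword_pct, overall_pct


def is_response_biased(word_pct, nonword_pct, max_gap=70.0):
    """Flag a participant whose two accuracies differ by more than max_gap points
    (assumed reading of the exclusion criterion)."""
    return abs(word_pct - nonword_pct) > max_gap


# Hypothetical participant who answers "yes" to nearly everything.
yes_sayer = lextale_score([True] * 20, [True] * 9 + [False])
# Hypothetical participant who discriminates reasonably well.
careful = lextale_score([True] * 14 + [False] * 6, [False] * 7 + [True] * 3)

print(yes_sayer, is_response_biased(yes_sayer[0], yes_sayer[1]))
print(careful, is_response_biased(careful[0], careful[1]))
```

Averaging the two percentages, rather than counting correct items overall, means that a participant who accepts every item scores around 50% instead of 67%, which is why indiscriminate responding shows up as a large gap between the two sub-scores.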
5.2 Participant characteristics
A total of 491 participants (101 males, 387 females and three participants who did not identify with either gender or preferred not to respond) completed the survey. The average age was 46.1 years (stdev 16.3). One hundred and twenty-nine had completed secondary education with the GCSE, 141 had taken their A-levels, and 221 had continued to tertiary education (176 took a B.A. and 45 an M.A. or a Ph.D.). There was a skew in terms of the language learning experience, with 65 participants having taken French A-levels and 426 having taken it at GCSE (Footnote 8). Both populations started learning French at age 10.8 on average (GCSE stdev 2.01, A-level stdev 2.97). Participants in the lower proficiency group had studied French for an average of 1.71 years (555 hours), while the higher proficiency group had 2.1 years of instruction (726 hours). In both populations, the length of time since they had studied French averaged around 30 years (29.97 (range: 1.25–61) for GCSE and 29.26 (range: 2.75–63.5) for A-level).
Students in the GCSE group tended to assess their proficiency at the end of learning as rather low: about half of them thought they had been at A1, with a further third rating it as A2 and only about 11% giving themselves a B1. Even among A-level students, around 25% estimated their ultimate proficiency at A1, with a third giving themselves an A2 and another quarter a B1, but only around 9% a B2 (note that this indicates that only 10% of all participants felt that they had reached the target level). Generally, the perception prevailed that participants’ French had deteriorated: over 75% of GCSE students and 63% of A-level students rated their skills at present as worse or a lot worse, at 40 or below on the sliding scale from 0 (a lot worse) to 100 (a lot better), with only 10% of GCSE and 20% of A-level students picking a number larger than 50, indicating that their French had improved since finishing school. This self-assessment correlated weakly with length of time since instruction (r = −.199, p < .001).
Participants in the GCSE group responded correctly to an average of 30.5% of questions on the proficiency task (stdev 9.81), with A-level students averaging 37.9% (stdev 14.4); performance on the LexTALE task was at 54.5% and 57.4%, respectively (stdev 9.3/9.8).
5.3 Background variables
Following Schmid and Dusseldorp (2010), we conducted a Principal Component Analysis (PCA) on the results of our survey in order to arrive at a manageable set of compound variables capable of acting as predictors in the regression models described below. Twenty-seven variables were entered into the PCA, comprising questions about frequency of language exposure and use in various settings (9 questions), how the participant had experienced the language instruction (6 questions), and their attitudes towards French language, culture and society and its impact on their own lives (9 questions). The PCA further comprised three questions about how talented and diligent a learner the participant had been. The PCA used Varimax rotation, and components were retained based on eigenvalues > 1. This yielded a total of five components, cumulatively accounting for 71.7% of variance. Scrutiny of factor loadings revealed that the first component, accounting for 40.9% of variance, related chiefly to L2 exposure, with all nine questions loading highest on this component. The variables loading highest on the second component (16.5% of variance) related to the instructional experience and the enjoyment thereof. The third component (6.1% of variance) appeared to reflect a predominantly instrumental attitude towards L2 learning, comprising questions on whether the participant benefitted from their knowledge of French in terms of occupational opportunities, salary and the ability to read and watch more widely, while the fourth component (4.5%) comprised questions on more affective aspects, such as whether the participant thought French was a beautiful language and useful to communicate with others. Lastly, the final component (3.8%) reflected whether the participant was a diligent and talented learner and had focused on questions of grammar and knowledge in order to pass exams while at school.
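The retention criterion can be related to the reported percentages with a little arithmetic, sketched here in Python. The sketch rests on two assumptions that the text does not state: that the PCA was run on the correlation matrix of the 27 variables (so that total variance equals 27), and that the reported percentages correspond to the eigenvalues used for retention rather than to post-rotation values. Under those assumptions, an eigenvalue above 1 is equivalent to a component explaining more than 1/27 of the variance.

```python
N_VARIABLES = 27  # number of survey variables entered into the PCA (from the text)

# Percentages of variance reported for the five retained components.
reported_pct = [40.9, 16.5, 6.1, 4.5, 3.8]

# Assumption: PCA on the correlation matrix, so total variance equals N_VARIABLES
# and a component's eigenvalue is its share of variance times N_VARIABLES.
implied_eigenvalues = [pct / 100.0 * N_VARIABLES for pct in reported_pct]

# Eigenvalue > 1 criterion, expressed as a share of variance: 1 / N_VARIABLES.
kaiser_threshold_pct = 100.0 / N_VARIABLES
retained = [ev for ev in implied_eigenvalues if ev > 1.0]

for pct, ev in zip(reported_pct, implied_eigenvalues):
    print(f"{pct:5.1f}% of variance -> eigenvalue of about {ev:.2f}")
print(f"Retention threshold: {kaiser_threshold_pct:.2f}% of variance")
```

On this reading, the fifth component (3.8%) only just clears the threshold of roughly 3.7%, which is worth bearing in mind when interpreting the talent/diligence variable.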
In addition, independently quantifiable responses to questions about length and intensity of instruction (total hours), age at beginning and end of instruction, length of time since instruction and self-estimated level of proficiency at the end of instruction were used as separate variables in the statistical models.
5.4 Results
5.4.1 Multiple linear regression
In order to assess to what extent proficiency had changed since the end of instruction and what predictors facilitate maintenance vs. attrition, we built a series of linear regression models. We used the lme4 package (Bates et al., 2015) in R 4.1.2 (R Core Team, 2021). In order to arrive at the best fitting and most parsimonious model, predictors were entered in consecutive steps, with each resulting model being compared with the previous, simpler model. If the more complex model, containing the most recently added predictor, provided a significantly better fit than the simpler previous one (assessed on the basis of a decrease of the Akaike Information Criterion >2 and a significant ANOVA at p < .05), the predictor was retained in subsequent models.
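The two-part retention rule can be written out as a small decision function. The following Python sketch is illustrative rather than a reproduction of the R analysis: the function names are hypothetical, the residual sums of squares and the p-value are invented, and the AIC is computed for an ordinary least-squares model with Gaussian errors, omitting constants that cancel when two models fitted to the same data are compared.

```python
import math


def gaussian_aic(rss, n_obs, n_params):
    """AIC of a least-squares model with Gaussian errors, up to an additive constant.

    n_params counts the regression coefficients plus the residual variance.
    """
    return n_obs * math.log(rss / n_obs) + 2 * n_params


def retain_predictor(aic_simple, aic_complex, anova_p, min_aic_drop=2.0, alpha=0.05):
    """Keep the new predictor only if the AIC falls by more than min_aic_drop
    AND the model-comparison ANOVA is significant at alpha."""
    return (aic_simple - aic_complex) > min_aic_drop and anova_p < alpha


# Hypothetical numbers, for illustration only: 491 participants,
# residual sums of squares before and after adding one predictor.
aic_before = gaussian_aic(rss=52000.0, n_obs=491, n_params=3)
aic_after = gaussian_aic(rss=50500.0, n_obs=491, n_params=4)
keep = retain_predictor(aic_before, aic_after, anova_p=0.0002)

print(f"AIC drop: {aic_before - aic_after:.2f}, retain predictor: {keep}")
```

Requiring both conditions is the more conservative choice: a predictor that lowers the AIC only marginally, or whose improvement in fit does not reach significance, is dropped before the next step.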
In building the models, we first attempted to account for variance at the beginning of the attrition period. In order to achieve this, the first predictors to be entered into the model related to level (GCSE vs. A-level, 2 levels), total hours of instruction (scaled predictor), estimated proficiency at end of instruction (CEFR scale, 6 levels) and the final compound variable on learner talent/diligence that had emerged from the PCA (scaled and centred predictor). In the next step, we assessed the impact of age at beginning and end of instruction. These two predictors were highly correlated (r = .915, p < .001) and could not be entered into the same model. We therefore entered each separately into the previous best model and then assessed the two models against each other before evaluating the better one against the best previous model. We then proceeded to add the predictors relating to L2 exposure, learning experience, instrumental attitude and emotional value of the L2. These all correlated weakly with the total number of hours of instruction (all r's < .2) and with the length of time since instruction (all r's < .25). The final models were evaluated for variance inflation factors (VIF), which were unproblematic and below 1.2 for all predictors.
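The decision to keep the two age predictors apart can be illustrated with the variance inflation factor they would produce if entered together. The Python sketch below uses the simplified two-predictor case, in which the VIF depends only on the pairwise correlation; in a full model the VIF of each predictor is based on its multiple correlation with all other predictors, so the values reported for the final models come from a different calculation. The function name is hypothetical.

```python
def vif_two_predictors(r):
    """Variance inflation factor when a predictor's only collinearity
    is a Pearson correlation r with one other predictor: 1 / (1 - r^2)."""
    return 1.0 / (1.0 - r ** 2)


# Correlation between age at beginning and age at end of instruction (from the text).
vif_ages = vif_two_predictors(0.915)

# Upper bound given in the text for correlations with time since instruction.
vif_weak = vif_two_predictors(0.25)

print(f"VIF at r = .915: {vif_ages:.2f}")
print(f"VIF at r = .25:  {vif_weak:.2f}")
```

A correlation of .915 inflates the variance of the coefficient estimates roughly sixfold, above the threshold of 5 that is often used as a rule of thumb, whereas correlations below .25 leave the VIF close to 1, in keeping with the values below 1.2 reported for the final models.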
Our models also included interactions between level and length, between level and exposure, and between length and exposure. None of these yielded any significant findings.
Table 1 summarizes the three final models with all significant predictors (the full regression models, including intercepts, estimates, significance levels and adjusted R² values for both included and excluded predictors as well as variance inflation factors for each predictor in the final models, are given in Supplementary Appendix B).
*p < .05, **p < .01, ***p < .001.
The first finding to be pointed out is that the final models, comprising all significant predictors, have extremely low levels of explained variance for the two measures of proficiency (12.87% for the grammar assessment – henceforth ‘grammar’ – and 7.21% for the LexTALE task – henceforth ‘vocabulary’). On the other hand, the final model for the question of whether the participant thought their proficiency had improved or deteriorated since the end of instruction (henceforth ‘self-assessment’) was able to account for almost one-third of the variance in this output variable – note that, in the context of attrition research, that can be considered a very good model.
Secondly, the predictors used to estimate proficiency levels at the end of the learning period are all retained in the final model for the grammar assessment. For vocabulary, on the other hand, only the level to which the language was studied and the self-rated proficiency at the end of instruction have predictive power. Self-assessment is only impacted by the level and the total hours of instruction.
The age at which the participant began to learn the language has a negative impact on grammar and self-assessment (participants who started at a younger age perform better), while the age at which they stopped impacts the model positively (participants who studied it up to a higher age perform better). This is in line with expectations. As was explained above, it was not possible to represent both predictors in the same model due to their high correlation, and the model comparison showed that for grammar, initial age was the stronger predictor while for the self-assessment, it was final age, so these were the predictors we retained going forward.
The predictors derived from the survey questions about the frequency of L2 use, how positive the learning experience had been, attitude towards and emotional affiliation with the L2 all emerge as strongly significant predictors for the self-assessments: participants who use the L2 frequently, enjoyed their classes at school, and find the L2 a useful and beautiful language report less self-assessed decline in the post-instruction period than people who do not use it, did not enjoy studying it and do not hold a positive attitude towards it. However, none of these predictors have much of a role to play for measured proficiency. The only significant finding is that there is a relationship between frequency of L2 use and the vocabulary score – however, this relationship not only accounts for less than 2% of overall variance, it is also negative and probably spurious.
Lastly, while length of time since instruction has a strong and negative impact on the self-assessment when it is initially entered into the model, the predictive power of this variable begins to be eroded once the PCA predictors are also entered (see note above about the weakly negative correlation), so in the final model it no longer emerges as a significant predictor. Conversely and rather puzzlingly, length of time since instruction is retained as a significant predictor in both models relating to measured proficiency, but its impact is positive, suggesting a higher rather than a lower score for participants with a longer post-instruction interval.
5.4.2 Non-linear effects: Data visualization
In order to ascertain whether there might be interactions between variables that are not linear and therefore elude the regression models described above, we also conducted a series of data visualizations. Firstly, keeping in mind Bahrick's (1984) finding that proficiency levels drop exponentially relatively early on in the attrition period and hold steady afterwards, in conjunction with the impact of initial proficiency, we wanted to explore the impact of time as a non-linear predictor. We therefore plotted the development of the three outcome variables against the time elapsed since instruction and added a LOESS-fit line for both instruction levels.
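For readers unfamiliar with LOESS, the idea behind the fitted line can be sketched in plain Python: at each point on the time axis, a straight line is fitted to the nearest observations, weighted so that closer observations count more. This is a minimal single-pass version with tricube weights, not the routine used for the figures; the function name, the span value and the simulated scores (an early drop followed by a plateau, of the kind Bahrick describes) are all illustrative assumptions.

```python
import math


def loess_at(xs, ys, x0, span=0.75):
    """Locally weighted linear fit evaluated at x0 (single pass, no robustness step).

    span is the fraction of observations used in each local fit.
    """
    n = len(xs)
    k = min(n, max(2, int(round(span * n))))
    # Keep the k observations closest to x0.
    nearest = sorted(zip(xs, ys), key=lambda point: abs(point[0] - x0))[:k]
    max_dist = max(abs(x - x0) for x, _ in nearest)
    if max_dist == 0:
        return sum(y for _, y in nearest) / k
    # Tricube weights: close observations count most, the farthest counts zero.
    weights = [(1 - (abs(x - x0) / max_dist) ** 3) ** 3 for x, _ in nearest]
    total = sum(weights)
    if total == 0:
        return sum(y for _, y in nearest) / k
    # Weighted least-squares line through the neighbourhood.
    mean_x = sum(w * x for w, (x, _) in zip(weights, nearest)) / total
    mean_y = sum(w * y for w, (_, y) in zip(weights, nearest)) / total
    sxx = sum(w * (x - mean_x) ** 2 for w, (x, _) in zip(weights, nearest))
    sxy = sum(w * (x - mean_x) * (y - mean_y) for w, (x, y) in zip(weights, nearest))
    slope = sxy / sxx if sxx > 1e-12 else 0.0
    return mean_y + slope * (x0 - mean_x)


# Hypothetical scores: a steep early drop followed by a plateau.
years = list(range(40))
scores = [40 + 30 * math.exp(-t / 3) for t in years]

early = loess_at(years, scores, 1, span=0.3)
late = loess_at(years, scores, 20, span=0.3)
print(f"Smoothed score after 1 year: {early:.1f}; after 20 years: {late:.1f}")
```

Because each local fit uses only nearby observations, a curve of this kind can follow an early drop and a later plateau without the analyst having to specify the functional form in advance. The price is that the line becomes erratic where observations are sparse.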
For the two measured variables, grammar and vocabulary, these plots do not show any initial drop (Figures 1 and 2). A-level students consistently perform better than GCSE students, and in both cohorts, participants who studied the language a longer time ago (towards the right of the chart) outperform more recent graduates (on the left). The LOESS line for the A-level cohort does suggest a fall in proficiency over the first ca. 15 years, but this line is more erratic than the others, probably due to the relatively smaller cohort. This becomes more evident in Figure 3, which only shows the participants in this subgroup and indicates that relatively few participants in our sample who attained their A-levels within the past 20 years scored particularly highly, while representation across the lower half of the proficiency spectrum appears to hold steady over the entire attrition period.
On the other hand, a very clear picture emerges from Figure 4 about the impact of length of time since instruction on self-perceived proficiency: this appears to decline very steadily and more-or-less in parallel for both cohorts for the first 20 years since instruction. After this, the decline continues, albeit at a reduced pace, and self-perceived proficiency in particular seems to hold steady between 20 and 30 years.
We finally explored the possibility of two further, non-linear interactions: firstly, we wanted to evaluate to what extent length of time since instruction might interact with amount of use, as reported in L1 attrition by de Bot et al. (1991) who found that length of time only impacted those participants who used their L1 extremely infrequently. We therefore divided the cohort into three approximately even-sized groups according to their score on the cumulative variable measuring L2 exposure and plotted the effect of length on the three outcome variables as a function of this variable (Figures 5–7).
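One way of forming three approximately even-sized groups is to split participants by rank on the exposure variable, sketched here in Python. The text does not specify how the cut-offs were chosen, so this rank-based split, the function name and the example scores are assumptions made for illustration.

```python
def split_into_tertiles(scores):
    """Label each score 'low', 'mid' or 'high' so that group sizes are as even as possible.

    Ranks are used rather than fixed cut-offs; tied scores are ordered
    by their position in the input.
    """
    n = len(scores)
    order = sorted(range(n), key=lambda i: scores[i])
    labels = [None] * n
    for rank, idx in enumerate(order):
        if rank < n / 3:
            labels[idx] = "low"
        elif rank < 2 * n / 3:
            labels[idx] = "mid"
        else:
            labels[idx] = "high"
    return labels


# Hypothetical component scores on the L2 exposure variable for ten participants.
exposure = [-1.2, 0.3, 2.1, -0.4, 0.9, -0.8, 1.5, 0.0, -1.9, 0.6]
groups = split_into_tertiles(exposure)
print(groups)
```

Splitting by rank keeps the groups balanced even when the underlying scores are skewed, at the cost of placing participants with very similar scores on either side of a boundary.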
As is evident in these figures, there is no differential impact of length of time according to the frequency of exposure for the two measured outcome variables. While the levels of self-perceived decline across participants seem to follow a similar trajectory over time in all three exposure groups, it is clear that participants with high levels of exposure think that they fare better overall than those with intermediate or low levels. Even in this group, however, the average dips into the negative side – participants think that their command of French has worsened – after only a few years post-instruction.
Finally, following Schmid and Yılmaz (2018) – whose results suggest that frequency of exposure may be a factor that differentially affects populations with different levels of language learning aptitude, in that high-aptitude immersed speakers can maintain their L1 in the absence of exposure but low-aptitude participants see their L1 attrite unless they use it – we again divided the sample into three subgroups based on the PCA variable relating to whether the participant considered themselves a good language learner, and plotted the outcome scores against the PCA variable relating to frequency of L2 use. However, as Figures 8–10 show, looking at the data from this angle did not yield any additional insights either, except for yet again revealing the dramatic impact frequency of L2 use has on the self-assessment of L2 deterioration or maintenance, and the complete absence of such an effect for the two measured variables.
6. Discussion
The present study aimed to provide a baseline and point of departure for L2 attrition research, 40 years after Bahrick's influential investigation. We opted for a cross-sectional and relatively light-touch investigation of a large data sample in order to assess whether similar patterns of relationships between outcome and predictor variables would hold among participants who were also living in an environment with a strong bias towards English monolingualism and who had studied a FL, but whose learning experiences were more in line with current-day pedagogical trends and insights. We therefore conducted an internet-based survey among native speakers of British English who had taken a GCSE or A-level in French (which is the most frequently taught FL in the UK). Our survey, intended to be a pilot and laying the foundations for future research, combined an admittedly ‘quick and dirty’ assessment of grammatical and vocabulary knowledge with detailed questions about how often participants had L2 exposure, how they had experienced their French instruction in school, and what attitudes they held towards that language.
The most straightforward outcome of this survey is that participants seem to have strong opinions on the inevitability of the decline of FL skills over time, in particular when these skills are not being used frequently: the regression model that was fitted to the response to the question of whether their language was better or worse at the present time than when they had stopped studying it presented a comprehensive picture, with significant impact of the responses on the survey relating to the length of time since instruction, the frequency of use, the learning experience and the attitudes they held towards French. The LOESS plots presented in Section 5.4.2 above underscore this finding.
When the same variables were fitted against the outcomes of two tasks measuring grammatical and lexical proficiency, however, they did not have any predictive power for the outcome variables: Rather than scores dropping over time, they actually seem to improve somewhat. It is possible that this finding reflects a higher end-state proficiency among participants who completed their language instruction further back, but this interpretation remains purely speculative and future work will have to ascertain the mechanisms at work here. Our findings underscore that the maintenance of skills over time seems to be largely independent of how frequently the language is being used as well as of the language learning aptitude of the individual.
While these findings may seem highly surprising and counter-intuitive, they are not entirely unexpected to anyone familiar with research on either L1 or L2 attrition. Virtually all the cumulative data in the field suggest that knowledge of languages is astonishingly resilient. It is very likely that this resilience is linked to the way in which the brain handles and processes language: firstly, linguistic knowledge contains an implicit component absent in other school-learned subjects such as maths or history, and implicit/procedural knowledge has been demonstrated to be more resistant to erosion than explicit/declarative knowledge (e.g., Paradis, 2007). Possibly more importantly, researchers agree that the representation of all the languages in the multilingual brain is strongly interconnected, and that activation spreads through the entire network, stimulating not only the language currently in use but all others, as well. For example, van Hell and Dijkstra (2002) show that, in an experiment conducted entirely in the participants’ L1 (Dutch), word recognition and lexical decision are facilitated when the corresponding words in their L2 (English) and even their L3 (French) are cognates with Dutch, provided that they had attained a reasonable level of proficiency. This may mean that, unlike other school-acquired knowledge such as algebra or history (Footnote 9), FL skills continue to receive a certain amount of stimulation simply because we use our native language, and that this stimulation is sufficient to prevent erosion of the underlying knowledge.
There is, of course, a strong caveat in order here, relating to the nature of the data we collected: due to the exploratory nature of our survey, we did not probe deeply into matters such as the accessibility of language knowledge for actual language use. All that the data above is able to illustrate is that, given enough time to reflect (the survey did not impose any time constraints), there is no evidence that participants who studied French several decades ago have less underlying knowledge of French vocabulary and grammar than those who studied it much more recently; nor that absence of practice leads to the erosion of this knowledge. It is likely that different tasks using online methods may reveal that former language learners do find it harder to reactivate this knowledge for both language comprehension and production.
However, what our findings do suggest is that anyone who has ever learned a FL, regardless of how long ago it was and how little use they made of it, retains a hidden treasure trove of knowledge that is only waiting to be made available for use once more, but the self-reports show that most people are not aware of this. We mentioned above the ‘Savings Paradigm’ approach, a framework associated with the psychology of learning and memory that states that it is easier to re-learn (and subsequently retain) something that you once knew, even if it has become entirely inaccessible, than it is to acquire equivalent knowledge from scratch (e.g. de Bot et al., 2004). English-speaking countries such as the UK have concerningly low levels of FL proficiency – recent estimates show that only about one-third of people in the UK are able to hold a conversation in a language other than English (e.g. https://ec.europa.eu/eurostat/statistics-explained/index.php?title=Foreign_language_skills_statistics) and that this ‘language ignorance’ has an annual cost to the economy of £48 billion or 3.5% of national income (Foreman-Peck & Wang, 2014). Given this lack of FL abilities, efforts should be made to understand how the retrieval of this knowledge can best be facilitated for former learners wishing to do so. Simply expecting them to use language classes or apps that were devised for the first-time learner is not good enough.
Supplementary material
To view supplementary material for this article, please visit https://doi.org/10.1017/S0261444822000301
Acknowledgements
The survey reported here was made possible with the financial support of the Department of Language and Linguistic Science, University of York. The ideas developed as well as the survey used arose from discussions with a number of colleagues, including Dr. Mufeeda Irshad, University of Sri Jayewardenepura; Dr. Tuğba Karayayla, Ankara Yildirim Beyazit University; Professor Merel Keijzer, Rijksuniversiteit Groningen; Professor Honggang Liu, Northeastern Normal University; Professor Florence Myles, University of Essex and Dr. Ángel Osle, University of Essex. All remaining errors and oversights are mine.
Monika S. Schmid obtained her Ph.D. from the Heinrich-Heine Universität Düsseldorf. She has held positions at the Vrije Universiteit Amsterdam, the Rijksuniversiteit Groningen and the University of Essex, and is currently Professor of Linguistics at the University of York.
Her work has focused on various aspects of first language attrition. She has published two monographs and edited several collected volumes and special issues of journals on this topic, most recently the Oxford handbook of language attrition (2019). Her website, https://languageattrition.org, collects information on language attrition and how to study it for non-specialists as well as the research community.