PREDICTION AND ERROR-BASED LEARNING IN L2 PROCESSING AND ACQUISITION: A CONCEPTUAL REVIEW

Giulia Bovolenta; Emma Marsden

doi:10.1017/S0272263121000723

Published online by Cambridge University Press: 09 November 2021

Giulia Bovolenta*, University of York
Emma Marsden, University of York
*Corresponding author. E-mail: [email protected]

Abstract

There is currently much interest in the role of prediction in language processing, both in L1 and L2. For language acquisition researchers, this has prompted debate on the role that predictive processing may play in both L1 and L2 language learning, if any. In this conceptual review, we explore the role of prediction and prediction error as a potential learning aid. We examine different proposed prediction mechanisms and the empirical evidence for them, alongside the factors constraining prediction for both L1 and L2 speakers. We then review the evidence on the role of prediction in learning languages. We report computational modeling that underpins a number of proposals on the role of prediction in L1 and L2 learning, then lay out the empirical evidence supporting the predictions made by modeling, from research into priming and adaptation. Finally, we point out the limitations of these mechanisms in both L1 and L2 speakers.

Type: State of the Scholarship
Information: Studies in Second Language Acquisition, Volume 44, Issue 5, December 2022, pp. 1384-1409

Creative Commons: This is an Open Access article, distributed under the terms of the Creative Commons Attribution-NonCommercial-ShareAlike licence (http://creativecommons.org/licenses/by-nc-sa/4.0), which permits non-commercial re-use, distribution, and reproduction in any medium, provided the same Creative Commons licence is used to distribute the re-used or adapted article and the original article is properly cited. The written permission of Cambridge University Press must be obtained prior to any commercial use.
Copyright: © The Author(s), 2021. Published by Cambridge University Press

There is currently great interest in whether the same mechanisms that underpin language processing also drive language learning. This interest stems, in part, from a desire to account for language learning in the absence of any kind of predetermined grammatical hard-wiring. That is, accounting for both learning phenomena and processing phenomena within the same model would achieve a desirable theoretical parsimony (O’Grady, 2005). In addition, investigating the nature and role of processing in second language (L2) acquisition potentially offers a way to shed light on the ways in which L2 acquisition may differ from first language (L1) acquisition.¹

In the field of second language acquisition (SLA), it has been suggested that prediction has a role in language learning and, specifically, for acquiring complex contingencies, thought to be among the hardest phenomena to learn. For example, prediction during sentence comprehension may be useful for L2 learners by allowing for hypothesis testing, which could help them retreat from overgeneralization when their predictions (their hypotheses) are disconfirmed (Phillips & Ehrenhofer, 2015). The idea that prediction may serve as a learning tool is supported by computational modeling showing that data from L1 acquisition and processing can be reproduced by recurrent neural networks that use prediction and error-based learning (Chang et al., 2006). However, there is evidence that L2 speakers often lag behind L1 speakers in their ability to predict upcoming input, as shown by data from both eye-tracking and EEG (electroencephalography) studies. Indeed, it has been suggested that L2 speakers, in particular, may be affected by a reduced ability to generate expectations (Grüter et al., 2014, 2017). This possibility sparked a concern that limitations in L2 learners’ ability to predict, relative to L1 speakers, may prevent them from using a prediction-based learning mechanism (e.g., Kaan et al., 2019; Phillips & Ehrenhofer, 2015).² Therefore, understanding exactly what is meant by prediction, how L2 learners may differ from native speakers in their ability to predict, and what can be learned through prediction, will be necessary to address these concerns. The aim of this article is to provide an overview of research into prediction in L1 and L2 processing and learning, to offer a frame of reference for those interested in the role prediction may play in L2 acquisition in particular.

Our review is structured as follows: In the first section, we define prediction and describe different prediction mechanisms that have been identified in the literature, showing that it can be conceived of as a continuum going from simpler to more complex instances of prediction. We present the empirical evidence for the different types of prediction in L1 and L2 speakers and highlight the factors that can constrain prediction in both groups. In the second section, we introduce the theoretical debate on the role of prediction in language learning—both L1 and L2. We introduce computational models of L1 processing that show that language acquisition and priming phenomena can be explained by error-based learning. We then review the available empirical evidence for a learning mechanism based on prediction error, from studies on priming and adaptation in both L1 and L2, and conclude by highlighting potential limitations of this mechanism.

Evidence for prediction in L1 and L2 speakers

Defining “prediction”

We first need to draw a preliminary distinction between different conceptions of prediction: prediction as the formulation of expectations during sentence comprehension (as in “preprocessing”; DeLong et al., 2014), which is the focus of this review, and a more general sense of prediction as inference generation (e.g., using contextual cues to assign referents to ambiguous pronouns). Grüter et al. (2017) note that prediction in the narrower sense of “preprocessing” is still not often investigated by SLA researchers. They add that:

The term “prediction” has been used in the SLA literature, primarily in the context of L2 reading, to refer to inference generation, or guessing (e.g., in fill-the-gap tasks), more generally (e.g., McLaughlin, 1987). This usage does not specify the temporal aspect of this process, i.e., when such inference generation takes place during the incremental construction of meaning as we read/listen. As such, [this usage of the term prediction] is compatible with both (retroactive) information integration and prediction in terms of (proactive) linguistic pre-processing.

(Grüter et al., 2017, footnote 6, p. 224)

One example of this generic, temporally nonspecific usage of “prediction” in SLA is the literature on statistical preemption (Ambridge & Brandt, 2013; Boyd & Goldberg, 2011; Foraker et al., 2009; Robenalt & Goldberg, 2016), a proposed learning mechanism that is driven by associative learning: Every time an expected outcome is not encountered after a given cue, the strength of its association with that cue diminishes (Rescorla & Wagner, 1972). This line of research has to date examined inference generation through offline tasks such as acceptability judgments (e.g., Robenalt & Goldberg, 2016) to determine to what extent learners take into account potential alternatives to structures they encounter. While it cannot be ruled out that prediction during processing also plays a role in determining acceptability, these kinds of acceptability tasks also capture the result of processes (such as retroactive information integration) that are not part of prediction in the narrower sense (i.e., linguistic preprocessing), and thus this line of research falls outside the scope of the current review.
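The associative mechanism just described can be illustrated with a minimal sketch of a Rescorla-Wagner-style update for a single cue. This is an illustration only, not a model used in the studies cited: the function name, the single `learning_rate` parameter (which collapses the model's separate salience and learning-rate terms), and all numerical values are assumptions chosen for the example.

```python
def rescorla_wagner_update(strength, outcome_present,
                           learning_rate=0.1, max_strength=1.0):
    """Return the updated cue-outcome association strength after one trial.

    Simplified single-cue form of the Rescorla-Wagner rule:
        new_strength = strength + learning_rate * (target - strength)
    where target is max_strength if the outcome occurred and 0 otherwise.
    The parameter values are illustrative assumptions.
    """
    target = max_strength if outcome_present else 0.0
    prediction_error = target - strength
    return strength + learning_rate * prediction_error


# A cue that initially predicts an outcome fairly strongly (hypothetical value).
strength = 0.8
history = []
for _ in range(5):
    # The expected outcome fails to occur on every trial.
    strength = rescorla_wagner_update(strength, outcome_present=False)
    history.append(strength)
```

Under these assumptions, each trial on which the expected outcome is absent yields a negative prediction error, so the values in `history` shrink toward zero; a trial on which the outcome does occur would instead move the strength toward `max_strength`.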

Another field of research on prediction that is temporally nonspecific investigates the effect of expectation violation on the formation of new declarative memories. This line of research does not originate in SLA research, but it is relevant to language learning, as well as to learning more generally: Findings have shown that novel associations that violate established patterns are remembered better than those that do not (Brod et al., 2018; De Loof et al., 2018; Greve et al., 2017; Greve et al., 2019) and that generating incorrect guesses followed by corrective feedback can, under some circumstances, lead to better learning than simply being exposed to the correct answer (Potts et al., 2019; Potts & Shanks, 2014). While these effects have been observed for a variety of stimuli, such as conceptual knowledge (Brod et al., 2018) and arbitrary picture-word mappings (Greve et al., 2017), it is the role of expectation violation in the acquisition of vocabulary that may be most relevant to SLA. Expectation violation has been shown to aid the acquisition of L1 vocabulary in young children (Stahl & Feigenson, 2017), as well as Dutch–Swahili translation word pairs in adult Dutch L1 speakers (De Loof et al., 2018; see also Gambi [cited in Kaan & Grüter, 2021] for more recent work on L2 vocabulary learning). Although it is relevant to SLA, this particular conceptualization of prediction also falls outside the scope of our review, which focuses on prediction as linguistic “preprocessing,” or the incremental formulation of expectations during sentence comprehension. We now turn to prediction as linguistic “preprocessing” and the ways in which it can be conceptualized.

In the literature on prediction during language comprehension, there is a great amount of variation in terms of approaches and terminology used, with different authors focusing on different aspects of the phenomenon (see Kuperberg & Jaeger, 2016 for a review). Pickering and colleagues (Pickering & Gambi, 2018; Pickering & Garrod, 2013) distinguish between two types of prediction: prediction-by-association and prediction-by-production. The first mechanism, prediction-by-association, is driven by basic associative mechanisms like spreading activation, and may constitute the stage prior to prediction-by-production. A feature specific to this account is that in the more complex route (prediction-by-production) preactivation involves forward speech planning: comprehenders use the language production system to formulate predictions about upcoming input. The prediction-by-production route is considered to be very accurate and, therefore, to aid processing, but it is thought to be an optional mechanism; for example, it is not always available, especially in L2 speakers or in populations with cognitive limitations (Pickering & Gambi, 2018). By contrast, prediction-by-association is less precise and less effective than prediction-by-production; however, it is an integral part of comprehension and, being automatic, it does not take up cognitive resources, which is why it should remain unimpaired even in comprehenders with limited resources (Pickering & Gambi, 2018, p. 1030).

Similar to Pickering and colleagues, other authors also distinguish between two broad types of “preprocessing” prediction: a simple, automatic kind, generally limited to the semantic domain, and a more complex, resource-intensive type, involving prediction of specific linguistic features. Kuperberg and Jaeger (2016) contrast a basic sense of prediction, as expectations based on discourse context, with “predictive activation” of low-level (e.g., phonological, morphological) features. In this latter case, comprehenders can “predictively preactivate” low-level representations (e.g., phonological form) based on high-level inferences, before encountering them in the input, rather than just making a high-level event hypothesis, as happens in the more basic, simple case of prediction. Another potential dual-route account of prediction is offered by Huettig (2015), modeled on Kahneman’s (2011) dual-system model of reasoning: a “dumb” route (System 1), based on simple associative mechanisms, which is contrasted with a “smart” route (System 2), linked to more effortful active reasoning.

The conceptualizations of prediction we have just seen could all be seen as, essentially, dichotomous distinctions; however, the empirical evidence (which will be reviewed in the following text) suggests a more graded process, which can vary in complexity and specificity depending on a variety of factors including context, language proficiency, and the nature of the task. In light of this complexity, Huettig (2015) proposes a multiple-mechanisms account of prediction, called PACS (production-, association-, combinatorial-, simulation-based prediction). According to this account, prediction can be driven by diverse mechanisms. One is basic association, which often involves semantic information, but may also involve other types of representation (e.g., phonological); another is production, where prediction happens through covert speech production. There is also a combinatorial route, where meaning is built by drawing on multiple linguistic constraints, and an event simulation route, where mental imagery may be used to preactivate linguistic representations. Crucially, these four mechanisms interact with each other: For instance, basic association may provide input that then feeds into the combinatorial route (Huettig, 2015).

To further illustrate the graded nature of prediction processes and their context, task, and individual specificity, we now review evidence for prediction in L1 and L2 speakers in growing order of complexity: from basic sensitivity toward word predictability based on semantic context to preactivation of specific morphological and phonological features. All of these align with our broad working definition of prediction for the purposes of this review as the incremental formulation of expectations during sentence comprehension. In the subsequent section, we then move on to examine the factors that can constrain the extent of predictive processing in both L1 and L2 speakers.

Types of prediction: From basic expectations to preactivation of specific features

A simple type of prediction: Sensitivity to word predictability

The predictability of a word from context is known to affect the way it is processed during comprehension. Words that are predictable from their semantic context are easier to process: L1 speakers spend less time fixating on them during reading (Balota et al., 1985; Demberg & Keller, 2008; Ehrlich & Rayner, 1981; McDonald & Shillcock, 2003), and are quicker to react to them in behavioral tasks such as lexical decisions (Schwanenflugel & LaCount, 1988; Schwanenflugel & White, 1991; Stanovich & West, 1983) and naming tasks (Forster, 1981; Stanovich & West, 1981, 1983; Traxler & Foss, 2000). In EEG studies, words that are highly predictable from context elicit a reduced N400³ relative to unexpected words (Kutas & Hillyard, 1980, 1984), a finding that has been widely replicated (see Kutas & Federmeier, 2011 for a review). The size of the N400 elicited by an unexpected sentence-final word is inversely proportional to the cloze probability of the word, that is, how likely the word is to occur at the end of that sentence (DeLong et al., 2012; DeLong et al., 2005; Luke & Christianson, 2016), although it is not affected by the number of potential alternative completions (Kuperberg et al., 2020; Kutas & Hillyard, 1984). The N400 reduction to predictable words is found in L2 speakers, too (Martin et al., 2013). Finally, both L1 and L2 speakers can also exhibit sensitivity to word predictability based on structural cues, not just semantic ones, as evidenced by data from EEG (Kaan et al., 2016) and self-paced reading (Leal et al., 2017).

In the L1 processing literature, there was initial resistance to accept evidence of sensitivity to word predictability as evidence of “prediction,” on the grounds that it could also simply be interpreted as an effect of “integration,” that is, the ease with which the word’s meaning could be accessed or combined with that of preceding words (see Kutas & Federmeier, 2011 and Van Petten & Luka, 2012, for reviews). Indeed, it is very difficult to distinguish between these two accounts (prediction and integration) experimentally. For example, to explain the observation that the N400 in response to predictable words is smaller than that to less predictable words, one could argue that it is because comprehenders were expecting to encounter the specific, highly predictable word (thus, this observation is usable as evidence of prediction). But it is also possible that comprehenders were not expecting anything in particular, and that upon hearing a highly predictable word, it was simply easier for them to process due to its closer semantic fit with the preceding context (thus, this observation could be usable as evidence of integration).

In the body of evidence we have seen so far, a clear-cut distinction between prediction and integration may not always be found; it is now generally accepted that the N400 indexes a cascade of processes that happen before, during, and after word recognition (Nieuwland et al., 2020). However, basic sensitivity to word predictability, based on frequency information and often drawing on associative mechanisms, appears to be a robust feature of language processing, both in L1 and L2. Comprehenders use sentential context, whether highly constraining or not, to update their expectations about the likelihood of potential continuations, in a probabilistic fashion (i.e., where multiple possibilities have varying likelihoods). These expectations then affect processing of upcoming input, depending on how expected each continuation was. However, in the next section we will see evidence for how, when context allows it, comprehenders can also make use of these expectations ahead of encountering input to narrow down the range of possible continuations, thus constituting a more complex type of prediction.

Anticipating content: Integrating cues with context

In experimental settings, at least, it has been shown that comprehenders can combine their expectations with context to identify the most likely referent of an upcoming word from a limited set of candidates. Studies using the visual world eye-tracking paradigm have shown that comprehenders use cues such as verb selectional restrictions to form expectations for upcoming content as the sentence unfolds, and identify likely referents from a set of options based on how well they fit these expectations. For instance, when hearing a verb such as eat in “The boy will eat…,” L1 speakers already restrict the range of potential expected completions to items that can be the object of eat; if the visual scene only contains one item that fits that category (e.g., a cake), they will automatically look at the picture of the cake even before hearing the word cake (Altmann & Kamide, 1999). This shows a more active kind of anticipation that goes beyond simple sensitivity to word likelihood: Rather than responding to a word based on how likely it was, comprehenders used their expectations to narrow down the range of potential referents for an upcoming word, ahead of encountering the word, by picking out the most likely candidate from the ones available.
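The narrowing-down logic described for the visual world paradigm can be sketched as a simple filter over the objects in a scene. This is a deliberately crude illustration and not a model of eye movements or of the cited experiment: the feature labels, the verb "move," and every scene item other than the cake are hypothetical additions made for the example.

```python
# Hypothetical lexicon: the semantic feature each verb requires of its object.
VERB_OBJECT_RESTRICTION = {
    "eat": "edible",
    "move": "movable",
}

# Hypothetical visual scene: displayed objects and their semantic features.
SCENE = {
    "cake": {"edible", "movable"},
    "ball": {"movable"},
    "toy car": {"movable"},
}


def anticipated_referents(verb, scene):
    """Return the scene objects compatible with the verb's restriction.

    If the verb is not in the lexicon, nothing can be ruled out, so every
    object in the scene remains a candidate.
    """
    required = VERB_OBJECT_RESTRICTION.get(verb)
    if required is None:
        return sorted(scene)
    return sorted(name for name, features in scene.items()
                  if required in features)


eat_candidates = anticipated_referents("eat", SCENE)    # restrictive verb
move_candidates = anticipated_referents("move", SCENE)  # unrestrictive verb
```

In this toy scene, a restrictive verb leaves a single candidate before the noun is heard, whereas an unrestrictive verb leaves all objects in play, which is the contrast that anticipatory looks are taken to reflect.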

Again, these effects have been observed in L2 speakers, too, although not to the same extent as in L1 speakers. High-proficiency L2 English speakers behaved similarly to L1 speakers when the object of a sentence could be predicted based on the verb’s meaning (Ito, Corley, & Pickering, 2018) or from the situational context more generally (Ito, Pickering, & Corley, 2018), giving anticipatory looks to suitable objects (from a constrained set) to the same extent as L1 speakers. However, there is also evidence that these effects are slower and weaker in L2 speakers (Dijkgraaf et al., 2019). They are also modulated by proficiency: Lower-skilled L2 speakers are more likely to fixate (give a prolonged gaze on) less relevant themes in a visual world paradigm (e.g., “cat” when listening to the sentence “The pirate will chase … [the ship]”) compared to higher-skilled bilinguals (Peters et al., 2018).

The same visual world paradigm has been used in different languages to investigate features such as gender marking (Lew-Williams & Fernald, 2007) and case marking (Kamide et al., 2003), showing that L1 speakers can also use morphological cues to select possible referents from the items in a visual scene. L2 learners have often failed to show the same ability to anticipate content, whether on the basis of gender marking (Grüter et al., 2012; Lew-Williams & Fernald, 2010), morphosyntactic (Andringa & Curcic, 2015; Hopp, 2015), or morphological information (Mitsugi & MacWhinney, 2016). However, there are also instances of L2 speakers performing similarly to L1 speakers in studies using the visual world eye-tracking paradigm with morphological cues (Dussias et al., 2013; Hopp & Lemmerth, 2018). High-proficiency L1 Russian-L2 German speakers, too, showed native speaker-like prediction using determiner gender marking, even though Russian does not have gender-marked prenominal articles (Hopp & Lemmerth, 2018). Dussias et al. (2013) used the paradigm employed by Lew-Williams and Fernald (2007) to investigate predictive processing of gender marking, extending it to L2 speakers. They showed that highly proficient L1 English (a –gender language) and L1 Italian (a +gender language) speakers of L2 Spanish could use gender agreement in prenominal determiners, as done by the L1 Spanish speakers in Lew-Williams and Fernald’s (2007) study, with all groups of participants giving anticipatory looks to appropriate objects in the visual scene. By contrast, low-proficiency L1 English speakers did not show nativelike prediction (however, it should be noted that the L1 Italian group, which had low proficiency, only showed anticipatory looking for feminine determiners).

The evidence we considered earlier shows that both L1 and L2 speakers are sensitive to word predictability, as evidenced by their processing of more or less predictable words. We have now seen that L1 speakers (and sometimes, highly proficient L2 speakers too) can also use their expectations for upcoming content, based on cues in the input (which may be semantic, such as verb selectional restrictions, or grammatical, such as morphological gender marking) to select suitable referents from those made available by context. However, even preferential looking to suitable targets in a visual world eye-tracking study does not necessarily imply preactivation of a specific lexical item or feature: The visual world paradigm provides the item to the participants, who identify it from a set of options as that which most closely matches their expectations. A desire to establish conclusive evidence of prediction in the strictest sense (rather than integration) has informed more complex experimental work using EEG, aimed at showing that preactivation of specific features is possible, in the appropriate circumstances.

Preactivation of specific features: Evidence from EEG

A series of EEG studies has examined prediction by manipulating the morphological and phonological dependencies between highly predictable words and prior elements in the sentence, such as adjectives and determiners (DeLong, 2009; DeLong et al., 2005; Otten & Van Berkum, 2008; Szewczyk & Schriefers, 2013; Van Berkum et al., 2005; Wicha et al., 2004). For example, DeLong and colleagues (DeLong, 2009; DeLong et al., 2005) investigated whether specific words were being predicted by their participants by manipulating the phonological alternation of the English singular indefinite article (a/an). Participants read sentences such as “The day was breezy so the boy went to fly…,” which is highly constraining for the completion (a) kite. At this point, encountering the an form of the determiner (potentially leading to a less expected noun, e.g., an airplane) elicited a significantly larger N400 compared to the form a, compatible with the more likely kite. This suggests that the expectation for kite was being used to preactivate a specific linguistic representation including its phonological form (the initial consonant), in turn generating an expectation for a instead of an. (However, see Nieuwland et al., 2018 for a failure to replicate this effect in L1 speakers.) The size of the N400 effect on the article was graded based on the cloze probability of the target noun (i.e., how likely subjects were to expect it as the next word, based on an offline sentence completion task done by native speakers), suggesting that participants were making probabilistic predictions of specific words. Similar results using EEG have been obtained by manipulating gender agreement between nouns and determiners in Spanish (Wicha et al., 2004) and Dutch (Van Berkum et al., 2005), as well as animacy marking agreement between nouns and adjectives in Polish (Szewczyk & Schriefers, 2013).
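Cloze probability, as used above, is conventionally estimated as the proportion of respondents in a sentence completion task who supply a given word. The sketch below shows that calculation under stated assumptions: the function name is ours, and the ten responses are invented for illustration rather than taken from any norming study.

```python
from collections import Counter


def cloze_probabilities(completions):
    """Map each completion to the proportion of respondents who produced it.

    Responses are lower-cased and stripped of surrounding whitespace;
    blank responses are ignored. Returns an empty dict if nothing remains.
    """
    normalized = [c.strip().lower() for c in completions if c.strip()]
    counts = Counter(normalized)
    total = len(normalized)
    return {word: count / total for word, count in counts.items()}


# Invented responses to "The day was breezy so the boy went to fly a/an ...".
responses = ["kite"] * 9 + ["airplane"]
probabilities = cloze_probabilities(responses)
```

With these invented responses, "kite" would count as a high-cloze completion and "airplane" as a low-cloze one; graded N400 effects are reported relative to values of this kind.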

Compared to the simpler types of prediction seen previously, there seems to be a greater gap between L1 and L2 speakers when it comes to preactivating specific features. Martin et al. (2013) used the EEG paradigm from DeLong et al. (2005), which required participants to preactivate phonological forms (a/an kite), but they failed to replicate in L2 English speakers the effect observed by DeLong et al. for L1 speakers on the determiner. However, Martin et al. still found a basic effect of noun predictability on the noun: Replacing a highly predictable noun with a less predictable one elicited an increased N400, as it did in L1 speakers. This means Martin et al.’s participants did have probabilistic expectations about possible upcoming nouns, even though they were not building on these to predict the appropriate determiner, as did L1 speakers in DeLong et al.’s study. Furthermore, subsequent research suggests that L2 speakers can preactivate specific features in a manner similar to native speakers, at least if those features exist in their L1. Foucart et al. (2014) exposed native Spanish speakers and two groups of L2 Romance bilinguals (French–Spanish late bilinguals, and Spanish–Catalan early bilinguals) to Spanish sentences with highly predictable final nouns, manipulating the gender of the preceding determiner following Wicha et al.’s (2004) design. Unlike the English a/an alternation, which the L2 participants in the Martin et al. (2013) study did not have in their L1 (Spanish), gender agreement between determiners and nouns is a common feature of Romance languages, meaning that both the bilingual groups in Foucart et al. (2014) would be familiar with this feature from their L1. When morphological gender marking on the determiner was incongruent with the gender of the expected noun, all three groups (Spanish monolinguals and the two bilingual groups) exhibited an increased N400 response, suggesting that L2 speakers were preactivating gender features in a way similar to L1 speakers. This kind of study arguably provides stronger evidence for the preactivation of specific forms than visual world eye-tracking studies because participants were not provided with the possible completions.

The studies we have reviewed here showed that L1 comprehenders (and sometimes, L2 comprehenders too) do not just form expectations based on context, and integrate them with other information, but can also preactivate the phonological and morphological features of the most likely completions, and use those to form further expectations about other elements in the sentence. However, we have also seen that there is variation due to factors (such as L1-L2 similarity for L2 speakers) which can constrain the extent to which comprehenders engage in prediction. In the next section, we review in more detail the factors which constitute the main limitations to predictive mechanisms, in both L1 and L2 speakers.

Limitations to prediction during sentence comprehension in both L1 and L2 speakers

It has been suggested that L2 speakers may suffer from a Reduced Ability to Generate Expectations, or RAGE (Grüter et al., 2014, 2017). Grüter et al. (2014, 2017) showed that L2 English speakers (L1 Japanese and L1 Korean) do not take verb aspect information into consideration when formulating expectations about discourse context, as L1 English speakers do, despite the fact that, in both Japanese and Korean, verb aspect has the same discourse implications as it does in English. Grüter et al. argue that processing limitations make it difficult for L2 speakers to integrate cues to formulate predictions.

However, as the authors point out, the distinction between prediction in the L1 and L2 is far from a monolithic one. L2 speakers’ prediction abilities vary depending on proficiency, and L1 speakers can also be limited in their prediction abilities, prompting several authors to conclude that the difference between L1 and L2 in prediction is probably a quantitative, rather than a qualitative, one (Grüter et al., 2017; Kaan, 2014; Phillips & Ehrenhofer, 2015). Therefore, rather than asking whether or not L2 speakers can predict, we can look at factors which affect prediction in both L1 and L2 speakers, and which tend to affect L2 speakers in a specific way. The prediction of specific linguistic input based on expectations is constrained by a number of factors, relating both to the input (linguistic constraints and the context) and to the comprehender (cognitive abilities, processing speed, and proficiency). Here, we highlight just three of these constraints on prediction, which are not unique to L2 speakers, but can affect processing in both L1 and L2: cognitive abilities, proficiency, and task design. This is not meant to be a comprehensive list of the factors affecting prediction in L2 speakers (see Kaan, 2014 for a review). Rather, it shows how these factors can vary from being intrinsic to speakers (cognitive abilities) to completely extrinsic (task design), highlighting the complexity of the mechanisms involved.

First, predictive mechanisms, other than the most basic sensitivity to word cloze probability, are cognitively demanding and are not consistently observed, even in L1 speakers: They are impaired in elderly L1 speakers (DeLong et al., 2012; see also Huettig, 2015) and low-literacy populations (Mishra et al., 2012). Predictive looks in visual world paradigm eye-tracking studies correlate with working memory capacity and processing speed (Huettig & Janse, 2016) and are delayed under memory load (Ito, Corley, & Pickering, 2018). In a visual world eye-tracking study using Russian, Sekerina (2015) showed a gradient in the speed with which different Russian-speaking populations (L1 adult, heritage speaker adult, L1 child) showed preferential looking toward the upcoming noun based on gender information from the preceding adjective. Prediction in L1 speakers can also vary in speed depending on the cues used to formulate expectations (Chow et al., 2016, 2018). L2 speakers may be particularly affected by time constraints as they tend to be slower in their processing compared to L1 speakers (Frenck-Mestre, 2002; Frenck-Mestre et al., 2014; Hahne, 2001), and so predictive behavior may not be observable.

Second, anticipation of linguistic material is heavily dependent on proficiency, both in the sense of correct knowledge representations, and in the procedural sense: Using morphological dependencies to generate expectations (as in visual world paradigms) or to probe for them (EEG studies) relies on participants both having a knowledge of these dependencies and being able to deploy it rapidly during processing. Such automatized knowledge may not be available to all, and perhaps only to the most advanced L2 learners for some linguistic dependencies.⁴ In fact, as reviewed in the preceding text, while L2 speakers often do not predict to the same extent as L1 speakers, several studies have replicated prediction findings from L1 using high-proficiency L2 speakers, both using eye-tracking and EEG (Dussias et al., 2013; Foucart et al., 2014; Ito, Corley, & Pickering, 2018; Ito, Pickering, & Corley, 2018). Effects of proficiency have been observed in L1, too, both with regard to proficiency in the sense of knowledge representations (e.g., vocabulary size) and in the procedural sense (e.g., verbal fluency). Speed of anticipatory looking in a visual world eye-tracking paradigm correlated positively with vocabulary size in both adults and children (Borovsky et al., 2012), and with word reading skills in children (Mani & Huettig, 2014). In adults, anticipatory looking based on semantic cues was found to correlate with verbal fluency (Hintz et al., 2014), which is compatible with the idea of a prediction-by-production route (Pickering & Gambi, 2018; Pickering & Garrod, 2013). 
Following this account, reduced production skills may explain differences in prediction performance between L2 and L1 speakers, too. Grüter et al. (2012) found that L2 speakers who were unable to use gender cues to anticipate nouns also made errors in gender assignment on determiners in elicited production. Similarly, Hopp (2013) observed that English learners of German showed nativelike anticipatory use of gender information in a visual world paradigm only if they were able to accurately and consistently produce the right gender assignment for those nouns. However, high proficiency in L2 speakers does not necessarily lead to native speaker-like prediction: Even highly proficient L2 speakers may fail to display fully native speaker-like prediction (Dijkgraaf et al., 2019; Kaan et al., 2016) and studies investigating the relation between prediction and L2 proficiency have not found a direct correlation between the two (e.g., Ito, Corley, & Pickering, 2018; Kim & Grüter, 2021; see Kaan & Grüter, 2021 for a discussion).

Third, the nature of the task used can have a significant effect on the emergence and experimental detection of predictive processing, in multiple ways. On the one hand, studies demonstrating prediction during language processing generally employ highly constraining contexts, which are rare in natural language use (as noted by Luke & Christianson, 2016). In fact, due to the rarity of highly constraining contexts in everyday language use, the relevance of predictive processes has been questioned, both in relation to language comprehension (Huettig & Mani, 2016) and L1 acquisition (Rabagliati et al., 2016). On the other hand, even given a context which encourages predictive processing, prediction may not be detected if not enough time is available. Huettig and Guerra (2015) found that anticipatory looking by L1 Dutch speakers based on gendered determiner cues was observed if the visual targets appeared on screen four seconds before the spoken sentence, but not if they only appeared one second before. Trenkic et al. (2014) found that L2 English speakers performed similarly to (though more slowly than) native speakers when processing English determiners in a visual world eye-tracking paradigm, even though they did not have the equivalent feature in their L1 (Mandarin). In fact, neither the native speakers nor the L2 speakers showed evidence of prediction, as preferential looking emerged after the onset of the noun following the determiner in both groups (rather than prior to the noun); however, even this effect was slower to emerge in the L2 group, relative to native speakers. These findings illustrate the critical role of timing in detecting prediction. The reason why the data from Trenkic et al. (2014) did not count as evidence of “prediction” is that participants did not begin looking at potential referents before the onset of the noun; however, as we have seen, the speed with which preferential looking emerges is affected by several factors such as task timing and memory load, even in L1 speakers. Therefore, it is possible that, if participants had had more time, preferential looking would have been observable even without needing to hear the noun first. The same applies more generally to L2 speakers when they fail to show predictive behavior in eye-tracking experiments. When preferential looking emerges ahead of the onset of the target for L1 speakers but not for L2 speakers, it may simply reflect slower processing in L2 speakers (in a context that did not allow for detection at longer time intervals), rather than a qualitative difference between the groups.

In reality, all the factors described in the preceding text—cognitive abilities, proficiency, and task design—are likely to interact with each other. For instance, whether a task will show evidence of prediction depends, among other things, on whether it allows enough time for prediction to emerge and be observed in the particular experimental paradigm being used; in turn, what constitutes “enough time” will be affected by individual differences such as proficiency, verbal abilities, and working memory, for both L1 and L2 speakers. These limitations are relevant to our core question about the extent to which language learning, and L2 learning in particular, may draw on prediction as a learning mechanism, to which we now turn our attention.

Learning from error: Prediction as a learning mechanism

Having examined the extent to which language processing during sentence comprehension involves prediction, and the factors constraining it in both L1 and L2 speakers, we now turn to the question of whether prediction can serve as a learning mechanism. First, we lay out different accounts of the potential role of prediction in SLA, and in language acquisition more generally. We then examine the evidence for error-based learning, starting from the computational modeling that has inspired proposals on the role of prediction in SLA, and also covering empirical evidence from priming and adaptation in both L1 and L2.

What role may prediction play in L1 and L2 acquisition?

While there is abundant evidence that predictive mechanisms operate in language comprehension, the extent to which they may also contribute to language acquisition is debated. While some argue that prediction drives L1 acquisition (Chang et al., 2013; Rowland et al., 2012), there is skepticism about the importance of prediction in this respect (Huettig, 2015; Huettig & Mani, 2016; Rabagliati et al., 2016). Enabling learning is one of the main functions that have been proposed for predictive processing (see Huettig, 2015 for a discussion). Specifically, it has been suggested that prediction and error-based learning are necessary for L1 acquisition, partly because of the large number of studies on statistical learning showing that children use forward transitional probabilities (the likelihood of an element being followed by another) to acquire language (Saffran et al., 1996). However, tracking these probabilities does not necessarily involve predictive processing; in fact, backward probabilities (the likelihood of an element being preceded by another) are also used by children (Pelucchi et al., 2009). The fact that learning can occur without prediction, then, casts doubt on claims that prediction is absolutely necessary for language learning (Huettig, 2015). Overall, the empirical evidence on whether children use prediction for learning their L1 is mixed (Rabagliati et al., 2016). While it is not clear whether prediction during processing is a necessary or pervasive element of L1 acquisition, there is, however, certainly evidence that prediction can be a source of learning. 
Computational models using error-based learning, which rely on prediction, can model data from L1 syntactic acquisition (Chang et al., 2006) and from priming studies in L1 and L2, supporting claims that error-based learning may be the mechanism underpinning these phenomena (Bock et al., 2007). This evidence is reviewed more fully in the following section.
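The distinction between forward and backward transitional probabilities drawn above can be made concrete with a small sketch. The syllable stream below is hypothetical (it is not taken from Saffran et al. or Pelucchi et al.), and the function name is ours; the sketch only illustrates that both statistics are computed from the same pair counts, normalized by the first or the second element respectively.

```python
from collections import Counter

def transitional_probabilities(sequence):
    """Return (forward, backward) transitional probabilities for adjacent pairs.

    forward[(a, b)]  = count(a followed by b) / count(a as first element of a pair)
    backward[(a, b)] = count(a followed by b) / count(b as second element of a pair)
    """
    pair_counts = Counter(zip(sequence, sequence[1:]))
    first_counts = Counter(sequence[:-1])   # items that start a pair
    second_counts = Counter(sequence[1:])   # items that end a pair
    forward = {(a, b): n / first_counts[a] for (a, b), n in pair_counts.items()}
    backward = {(a, b): n / second_counts[b] for (a, b), n in pair_counts.items()}
    return forward, backward

# Hypothetical syllable stream, invented for illustration only.
stream = "bi da ku pa do ti bi da ku go la bu pa do ti".split()
forward_tp, backward_tp = transitional_probabilities(stream)

# "ku" is followed by "go" only half the time (forward TP = 0.5),
# but "go" is always preceded by "ku" (backward TP = 1.0).
print(forward_tp[("ku", "go")], backward_tp[("ku", "go")])
```

In this toy stream the two statistics diverge for the same pair, which is why sensitivity to backward probabilities cannot be reduced to forward-looking prediction.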

Against this backdrop, it has been suggested that prediction may serve as a learning mechanism for certain aspects of SLA (Phillips & Ehrenhofer, 2015). Specifically, this proposal aims to address the problem of overgeneralization, traditionally stressed by generativist approaches to language acquisition: How learners can learn to use rules productively yet avoid producing ungrammatical forms (e.g., “I goed” instead of “I went”), even though they have no direct evidence that such forms are not allowed in the language. In L1 acquisition, children often overgeneralize rules, but eventually converge on the target variety of their language (Pinker, 2009). According to Phillips and Ehrenhofer’s proposal, prediction may offer a way out of overgeneralization for L2 learners, especially when learning complex phenomena (e.g., those that require learners to integrate information from syntax and semantics), by providing the opportunity for hypothesis testing: The ability to make sophisticated predictions about upcoming input, using multiple cues, may allow learners to acquire complex contingencies and, crucially, to retreat from overgeneralization when these hypotheses are not confirmed (Phillips & Ehrenhofer, 2015).

To exploit this mechanism, learners would need to rapidly integrate multiple cues as they process speech; thus, the fact that the proposed hypothesis-testing mechanism relies on processing speed and proficiency makes it an unlikely candidate for early L1 acquisition, as Phillips and Ehrenhofer (2015) acknowledge. In L2 learners, however, it may serve a useful function, if they can formulate the relevant prediction quickly enough and track the source of their predictions so that they may readjust their prediction based on that cue the next time they experience it. To our knowledge, this proposal has not yet been investigated empirically. We will, however, look at the existing evidence for error-based learning. In the following sections, we first review computational models showing that data from L1 processing and acquisition are consistent with a learning mechanism driven by error-based learning that, in turn, requires prediction to occur. We then review empirical evidence from human language processing, which is compatible with the predictions made by these models, and which shows evidence of error-based learning during both L1 and L2 processing.

Insights from computational modeling of L1 processing and acquisition

Computational modeling has shown that certain aspects of language can be acquired through the same mechanisms that are used to process it (Elman, 1990). The models in question use so-called neural networks, a particular type of computational model loosely inspired by brain architecture, which consists of units connected to each other in a network. As each unit is activated, it transmits a signal to the units to which it is connected. The connections between units are weighted, meaning that the extent to which one unit affects the next can be adjusted. Neural network models can be used for a variety of tasks, such as classifying data (e.g., determining whether an image is a picture of a bee). A model can learn to perform the required task through supervised training: It is given a “training set” consisting of input (e.g., a set of images) and a desired, or target, output (e.g., a set of labels, either “bee” or “no bee”) for that input. As the model works its way through the input, it produces its own output (i.e., a label for each image). At each step, the model compares its own output to the target output (i.e., the desired label). The difference between the model’s output and the target output is known as the prediction error, and is used by the model to adjust its connection weights, so that the next time it encounters that input, the output it produces will be closer to the target output. In this manner, by gradually adjusting its connection weights, the model learns to perform the required task.
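The supervised training loop described above can be sketched with a single output unit. This is a minimal illustration under our own assumptions, not a model from any of the cited studies: the two binary input features, the learning rate, and the number of epochs are all hypothetical, and the "bee" label is simply assigned to the one input where both features are present.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical training set: two binary features per item; target 1 = "bee", 0 = "no bee".
inputs = np.array([[0.0, 0.0], [0.0, 1.0], [1.0, 0.0], [1.0, 1.0]])
targets = np.array([0.0, 0.0, 0.0, 1.0])

weights = rng.normal(scale=0.1, size=2)  # connection weights, small random start
bias = 0.0
learning_rate = 0.5                      # assumed value, chosen for illustration

def predict(x):
    """Output of a single logistic unit given the current weights."""
    return 1.0 / (1.0 + np.exp(-(x @ weights + bias)))

mean_abs_error = []
for epoch in range(2000):
    total = 0.0
    for x, target in zip(inputs, targets):
        output = predict(x)
        error = target - output          # prediction error: target minus model output
        # Adjust each weight in proportion to the error and to its input.
        weights += learning_rate * error * x
        bias += learning_rate * error
        total += abs(error)
    mean_abs_error.append(total / len(inputs))

print(mean_abs_error[0], mean_abs_error[-1])
```

Because each adjustment is proportional to the error, large mismatches early in training produce large weight changes, and the changes shrink as the output approaches the target.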

Sentence processing, too, can be modeled with neural networks. It is often modeled using a particular type of neural network called a recurrent neural network, or RNN (Elman, 1990). In an RNN, an additional series of connections allows the model to keep track of its previous states (akin to keeping track of words experienced in a sentence), which allows it to process input unfolding over time. Elman (1990) first used this architecture to train a model on next-word prediction in a miniature language. As it encountered each word in the sentence, the model’s output was a pattern of activation reflecting the probabilities of possible continuations. Any difference between its output and the actual next word (prediction error) was then used to adjust its connections. As the model learned the word-order patterns in the language, words belonging to the same syntactic categories began to produce similar patterns of activation, even though the model had no initial notion of word category. This suggested that it is possible to acquire syntactic structure simply through processing language, by estimating the likelihood of possible continuations, and adjusting it based on experience (Elman, 1990, 1993; see Mikolov et al., 2013, for similar results obtained with a natural language corpus).
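The following is a rough sketch of a simple recurrent network trained on next-word prediction, in the spirit of the architecture described above. It is not a reproduction of Elman's simulations: the six-word miniature language, the layer sizes, the learning rate, and the one-step error propagation are all our own illustrative assumptions. It shows only that the mean prediction error falls as the network is exposed to the word-order patterns.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical miniature language: every sentence is NOUN VERB NOUN "."
nouns, verbs = ["boy", "girl", "dog"], ["sees", "chases"]
vocab = nouns + verbs + ["."]
index = {word: i for i, word in enumerate(vocab)}

def make_corpus(n_sentences):
    """Return a stream of word indices made of randomly generated sentences."""
    words = []
    for _ in range(n_sentences):
        words += [nouns[rng.integers(len(nouns))], verbs[rng.integers(len(verbs))],
                  nouns[rng.integers(len(nouns))], "."]
    return [index[w] for w in words]

V, H = len(vocab), 10                         # vocabulary size, hidden units (assumed)
W_in = rng.normal(scale=0.1, size=(H, V))     # input -> hidden
W_ctx = rng.normal(scale=0.1, size=(H, H))    # context (previous hidden state) -> hidden
W_out = rng.normal(scale=0.1, size=(V, H))    # hidden -> output
lr = 0.1                                      # assumed learning rate

def softmax(z):
    e = np.exp(z - z.max())
    return e / e.sum()

def train_epoch(corpus):
    """One pass over the corpus; returns the mean next-word loss (negative log probability)."""
    global W_in, W_ctx, W_out
    context = np.zeros(H)
    total_loss = 0.0
    for current, nxt in zip(corpus, corpus[1:]):
        x = np.zeros(V)
        x[current] = 1.0
        hidden = np.tanh(W_in @ x + W_ctx @ context)
        probs = softmax(W_out @ hidden)       # probabilities of possible continuations
        total_loss += -np.log(probs[nxt])
        d_out = probs.copy()
        d_out[nxt] -= 1.0                     # prediction error at the output layer
        d_hidden = (W_out.T @ d_out) * (1.0 - hidden ** 2)
        # Error is propagated one step only; the context is treated as a fixed input.
        W_out -= lr * np.outer(d_out, hidden)
        W_in -= lr * np.outer(d_hidden, x)
        W_ctx -= lr * np.outer(d_hidden, context)
        context = hidden                      # copy hidden state into the context layer
    return total_loss / (len(corpus) - 1)

corpus = make_corpus(200)
losses = [train_epoch(corpus) for _ in range(20)]
print(losses[0], losses[-1])
```

The sketch does not analyze the hidden-layer activations, so it does not by itself demonstrate the emergence of word categories reported in the original work.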

The potential relevance of these models to prediction during human language processing, and, in turn, language learning, is demonstrated by research showing that the magnitude of prediction error the models encounter positively correlates with sensitivity to word predictability in humans. RNNs were first trained on next-word prediction using natural language corpora and then applied to the same materials that were given to human participants in experimental studies, making it possible to compare model performance with human processing. Word-by-word prediction error from these models has been shown to reflect reading times (Frank, 2013; Frank & Hoeks, 2019; Goodkind & Bicknell, 2018; Monsalve et al., 2012; Van Schijndel & Linzen, 2018), N400 amplitudes during EEG (Frank et al., 2013, 2015), and MEG responses (Wehbe et al., 2014). In other words, the “error signal” used by neural network models to do error-based learning positively correlates with language users’ expectations about upcoming input, which suggests that these expectations may be what supports error-based learning in humans too.
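In this line of work, word-by-word prediction error is commonly quantified as surprisal, the negative log probability that the model assigned to the word that actually occurred. The sketch below illustrates that measure only; the probabilities are invented for the example and do not come from any trained model or cited study.

```python
import math

def surprisal(probability):
    """Surprisal in bits: the less probable the word was, the larger the value."""
    return -math.log2(probability)

# Hypothetical model probabilities for the word that actually came next.
predictable_word = 0.8     # highly expected continuation
unpredictable_word = 0.05  # unexpected continuation

print(round(surprisal(predictable_word), 2))    # about 0.32 bits
print(round(surprisal(unpredictable_word), 2))  # about 4.32 bits
```

A per-word value of this kind is what can then be entered as a predictor of reading times or N400 amplitude for the same word in human data.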

More support for a potential role of prediction in learning comes from the Dual-Path model (Chang, 2002; Chang et al., 2006). This is a specific instance of RNN model that is particularly relevant to the current debate on prediction in SLA because it has been cited as the theoretical underpinning for error-based learning in L2 and for the potential role of prediction in such learning (e.g., Jackson & Hopp, 2020; Leal et al., 2017; Phillips & Ehrenhofer, 2015). Originally developed as a model of language production (Chang, 2002), the Dual-Path model was adapted to next-word prediction by harnessing its production output to formulate predictions for upcoming words (Chang et al., 2006); that is, it simulates the prediction-by-production route in humans (Huettig, 2015; Pickering & Garrod, 2013). The model was evaluated against data from L1 acquisition, showing that it could simulate findings from preferential looking studies on the acquisition of transitive structures (i.e., Hirsh-Pasek & Golinkoff, 1996; Naigles, 1990). It could also reproduce data on structural priming (Chang et al., 2006) and the acquisition of word order biases in English and Japanese (Chang, 2009). Priming has been suggested to be a case of implicit error-based learning, and we review evidence for this claim in the next section.

Evidence from priming and adaptation effects in L1 and L2 speakers

The main source of experimental behavioral evidence for error-based learning, in both L1 and L2 speakers, comes from studies on structural priming and adaptation. Structural priming refers to the fact that when language users encounter a particular syntactic construction, they are more likely to expect it again, or to use it in production, than they were before encountering it (Arai et al., 2007; Bock, 1986; Ferreira & Bock, 2006; Ledoux et al., 2007). Structural priming effects begin early: They have been observed in children as young as 3 years of age, with priming effects lasting across learning sessions (Branigan & Messenger, 2016; Rowland et al., 2012) and during the earliest stages of L2 learning (Weber et al., 2019). It has been suggested that structural priming should be regarded as a case of implicit error-based learning, which modifies a comprehender’s language system, rather than simply inducing a temporary activation of representations (Bock et al., 2007; Bock & Griffin, 2000; Chang et al., 2012).

When the priming effect is persistent, it is often called adaptation. Kaan and Chun (2018a) define syntactic adaptation as “persistent” or “cumulative” priming, where “comprehension or production is not or not only affected by the most recently encountered structure, but by the cumulative prior exposure to structures of the same type” (p. 87). In computational modeling terms, the updating of expectations seen in adaptation would be akin to “adjusting one’s weights.” Adaptation can be measured by tracking the increase in priming effect following repeated exposure to a structure over time, which may manifest itself as increased likelihood to use it in production (Kaan & Chun, 2018b) or reduced response times when encountering it in comprehension (Fine & Jaeger, 2016). Another method, more familiar to SLA research, is to use a pretest/posttest design (Jackson & Hopp, 2020; Jackson & Ruf, 2017). Adaptation to syntactic structure alternations (such as that between prepositional object and double object dative constructions in English) has been observed in L1 production (Jaeger & Snider, 2013; Kaan & Chun, 2018b; Kaschak, 2007; Kaschak et al., 2006, 2011; Kaschak & Borreggine, 2008), and in L1 comprehension (Farmer et al., 2014; Fine et al., 2013; Fine & Jaeger, 2016). 
Adaptation effects have also frequently been observed in L2 speakers (Hopp, 2020; Jackson & Ruf, 2017; Kaan & Chun, 2018b; McDonough & Trofimovich, 2015; Montero-Melis & Jaeger, 2020; Shin & Christianson, 2012; see Jackson, 2018 for a review; and see Jackson & Hopp, 2020 for an instance of priming without adaptation).

The permanence of priming effects observed in adaptation is compatible with accounts of priming as an instance of learning; however, it does not specifically implicate a role for prediction error as the driving learning mechanism. Additional support for the claim that priming (and adaptation) is a learning mechanism, and specifically an error-based learning mechanism, comes from the observation of inverse frequency effects, which are predicted by an error-based learning model. In the Dual-Path model, low-frequency words would generate greater prediction error, causing a larger adjustment in the weights and therefore a larger learning effect (Chang et al., 2006). Inverse frequency effects are also observed at the level of structure, not just words: In both L1 and L2, structures that have lower frequency in the input elicit greater priming effects (Hartsuiker et al., 1999; Hartsuiker & Westenberg, 2000; Jaeger & Snider, 2013; Kaan & Chun, 2018b; Kaschak, 2007; Kaschak et al., 2006; Montero-Melis & Jaeger, 2020). In L2 learners, frequency effects appear to be based on the statistics of the L1 at lower proficiency levels, moving to more native speaker-like expectations as proficiency increases (Jackson & Ruf, 2017; Montero-Melis & Jaeger, 2020; see Jackson’s 2018 review). Finally, at least in L1, frequency effects also extend to adaptation, with greater adaptation observed for dative structures that are encountered in unexpected contexts (Fazekas et al., 2020). 
More recent research has begun to directly investigate the link between adaptation and language acquisition, showing that children can adapt to different syntactic structures and use their adapted predictions to infer the meaning of new words and interpret ambiguous words (Havron et al., 2019; Havron et al., 2021). All these findings suggest that structural priming and adaptation, driven by a prediction-based mechanism, could potentially play a role in both L1 and L2 acquisition.
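The logic behind the inverse frequency effect can be illustrated with a simple delta-rule update. This is a deliberately reduced sketch, not the Dual-Path model: the prior expectations and the learning rate are hypothetical values, and a single number stands in for a learner's expectation that a given structure will occur.

```python
def update_expectation(expectation, observed, learning_rate=0.1):
    """Delta-rule update: shift the expectation toward the observed outcome.

    Returns the new expectation and the prediction error that drove the change.
    """
    error = observed - expectation
    return expectation + learning_rate * error, error

# Hypothetical prior expectations for a frequent and a rare structure.
frequent_prior, rare_prior = 0.8, 0.2

# Both structures are then encountered once (observed = 1.0).
new_frequent, error_frequent = update_expectation(frequent_prior, observed=1.0)
new_rare, error_rare = update_expectation(rare_prior, observed=1.0)

shift_frequent = new_frequent - frequent_prior  # small error, small adjustment
shift_rare = new_rare - rare_prior              # large error, large adjustment
print(shift_frequent, shift_rare)
```

Under these assumptions the rare structure produces an error four times larger than the frequent one, and therefore a correspondingly larger change in expectation after a single exposure.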

Limitations of error-based learning during processing

Even adaptation, however, is subject to variability. For instance, adaptation to structural alternation may depend on which specific semantically constrained constructions are used. In Experiment 1 in Jackson and Ruf (2017), intermediate English–German L2 learners were exposed to fronted temporal adverbial phrases such as Im Winter trägt Paul eine Jacke (“In winter Paul wears a jacket”), which are marked in both English and German, but more frequently in German. They showed both immediate priming and adaptation to these structures, as measured in a posttest. However, in Experiment 2, exposure to fronted adverbial phrases using locative instead of temporal expressions (e.g., Auf dem Berg trägt der Schüler eine Jacke, “On the mountain the pupil wears a jacket”) led to short-term priming, but no adaptation. Similarly, Jackson and Hopp (2020) found L2 English speakers exposed to fronted adverbials in English exhibited immediate priming but no evidence of adaptation in the posttest, unlike L1 English speakers.

In particular, adaptation to garden-path sentences appears to be less robust than adaptation to simple structural alternations, in both L1 and L2 speakers. While some studies find an adaptation effect to garden-path sentences in L1 (Farmer et al., 2014; Fine et al., 2013; Kaan et al., 2019), others find no such evidence (Dempsey et al., 2020) or show that it is very difficult to detect (Prasad & Linzen, 2021). In L2 speakers, the evidence is again mixed (Kaan et al., 2019; Hopp, 2020). Kaan et al. examined adaptation to garden-path sentences (filler-gap wh-subordinates and ambiguous coordination) in L1 and L2 English speakers, in a study using self-paced reading. Adaptation was only found in the L1 English group, and only for the “easier” structure (coordination). Therefore, learning from prediction error during processing was difficult, not just for L2 speakers but for L1 speakers too, who only showed adaptation to the less cognitively demanding of the two structures examined. Such a finding is arguably compatible with evidence that L2 speakers have difficulty recovering from garden-path sentences in a way that is reflective of the difficulty faced by children in their L1. In Pozzan and Trueswell’s (2016) study, participants listened to instructions and carried them out while interacting with a visual scene. Error rates on garden-path sentences (showing an inability to revise the initial parse) were similar for 5-year-old L1 speakers and adult L2 learners when there were no referential cues supporting the target interpretation. 
However, Hopp (2020, Experiment 1) found that L2 German speakers could adapt to garden-path sentences (specifically, to the intransitive use of optionally transitive verbs such as play) if the sentences provided an unambiguous cue flagging the correct interpretation, in the form of case marking (e.g., in “The boy played and he pleased the parents with the music,” the verb played would be followed by and and the pronoun he in the nominative case, signaling the start of a new clause).⁵ In sum, as is the case for prediction mechanisms, we see that there is variation both among L1 and L2 speakers in the extent to which they can adapt to specific syntactic structures.

Another potential source of variability in adaptation is that prediction depends on context: Evidence suggests that the extent to which predictions are made during language comprehension depends on the overall reliability of context as a source of prediction (Delaney-Busch et al., 2019), and that predictions stop being formulated when cues become unreliable (Brothers et al., 2017). In a self-paced reading study by Brothers et al. (2017, Experiment 2), the global validity of predictive cues affected the extent of prediction found. Participants read critical sentences that all had highly predictable completions (i.e., in highly constrained sentential contexts), and a set of highly constraining filler sentences that were manipulated to have either expected or unexpected completions. Participants who saw the filler sentences with the expected completions showed an effect of predictability on the critical items (i.e., reduced reaction times for predictable completions), while the group who saw the filler items with unexpected completions did not show a statistically significant prediction effect on the predictable completion critical items. This suggests that the overall likelihood of disconfirmed predictions had led the group who had experienced the unpredictable completions to abandon the use of sentential constraint as a cue.

The sensitivity of prediction to cue reliability means that prediction error may sometimes result in abandonment or temporary suppression of predictive mechanisms, instead of leading to adaptation (Brothers et al., 2017; Hopp, 2016; Husband & Bovolenta, 2020; Lau et al., 2013; Van Heugten et al., 2012; Wlotko & Federmeier, 2011). Hopp (2016) manipulated cues that had high predictive value (gender marking in German) so as to make them unreliable. L1 speakers, who had been using them to anticipate upcoming referents in a visual world eye-tracking paradigm, stopped anticipatory looking when gender marking became unreliable. Hopp argues that this explains why L2 German speakers in previous studies (Hopp, 2013, 2016) only used gender predictively if they consistently produced accurate gender marking; for those who did not (meaning that they had nontarget representations), prediction led to error, which perhaps led to it being abandoned. If a cue is not reliably predictive, it is arguably an important part of the learning process to stop using it (as it would cause inefficient processing). However, language users can rapidly adapt to input that is seemingly inconsistent, if they can identify new reliable cues in it which have predictive value. Kroczek and Gunter (2017) exposed listeners to speech by speakers who differed in the relative probabilities of specific syntactic structures they used (OSV/SVO word order in German). While structure usage was not consistent across speakers, listeners developed distinct expectations for syntactic continuations depending on the speaker, as each speaker manifested reliable structure usage.
These findings suggest that language users are constantly evaluating the reliability of cues and potential cues, abandoning them if they are no longer reliable and tuning in to new ones that reliably predict upcoming input.
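The idea that comprehenders keep re-evaluating a cue's reliability can be pictured with a very small error-driven update. The sketch below is purely illustrative and is not the model used in any of the studies reviewed here: it assumes a single cue whose weight is nudged toward each observed outcome in proportion to the prediction error (a delta-rule update), with a hypothetical learning rate of 0.2 and hypothetical function names. While the cue is always confirmed, its weight climbs toward 1; once confirmations and disconfirmations alternate, the weight falls back toward chance, which is one way of thinking about a cue being abandoned.

```python
def update_cue_weight(weight, outcome, learning_rate=0.2):
    """Delta-rule update: shift the weight toward the observed outcome.

    outcome is 1 if the cue's prediction was confirmed, 0 if disconfirmed.
    The learning rate is an arbitrary illustrative value.
    """
    prediction_error = outcome - weight
    return weight + learning_rate * prediction_error


def track_cue(outcomes, initial_weight=0.5, learning_rate=0.2):
    """Return the cue weight after each trial in a sequence of outcomes."""
    weight = initial_weight
    history = []
    for outcome in outcomes:
        weight = update_cue_weight(weight, outcome, learning_rate)
        history.append(weight)
    return history


# Hypothetical exposure: 20 trials in which the cue is always valid,
# followed by 20 trials in which it is valid only half of the time.
reliable_phase = [1] * 20
unreliable_phase = [1, 0] * 10
history = track_cue(reliable_phase + unreliable_phase)

weight_after_reliable = history[19]    # end of the reliable phase
weight_after_unreliable = history[-1]  # end of the unreliable phase

print(f"after reliable phase:   {weight_after_reliable:.2f}")
print(f"after unreliable phase: {weight_after_unreliable:.2f}")
```

A single scalar weight is, of course, a drastic simplification: it says nothing about how listeners might track reliability separately for each speaker, as in Kroczek and Gunter's study, or about the difference between suppressing a cue temporarily and revising the underlying representation.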

Conclusion

We have seen that the picture, when it comes to prediction in comprehension and learning, is extremely nuanced. Prediction consists of different processes, with different levels of complexity, from basic priming mechanisms to preactivation of specific features. Prediction abilities can vary depending on a large number of factors, in L1 and L2 speakers alike. Variability between L1 and L2 speakers (and even among L1 and L2 speakers) increases as prediction becomes more complex. Sensitivity to word predictability is the most robust type of prediction, showing the least difference between L1 and L2 speakers. However, even at the most complex level of prediction (i.e., preactivation of specific features), L2 speakers of sufficient proficiency have the potential to predict in a native speaker-like fashion, though higher proficiency does not necessarily lead to native speaker-like prediction. In both groups of language users, then, prediction is modulated by cognitive abilities, processing speed, and various aspects of proficiency. More research is needed to investigate the complex relationships between these factors. The role of cognitive and linguistic individual differences is clearly a burgeoning area of interest in the language learning sciences (e.g., Bolibaugh & Foster, 2021; Buffington et al., 2021; Pili-Moss, 2021; Riches & Jackson, 2018; Walker et al., 2020; including special issues dedicated to the topic such as those edited by Andringa & Dąbrowska, 2019; Roberts & Meyer, 2012), and investigating the role of individual cognitive abilities in prediction and, specifically, error-based learning, in L1 and L2 (or Lx) speakers would constitute a timely extension of this agenda.
Such research could shed more light on some of the factors we have reviewed, such as the varying effect of L2 proficiency (e.g., helping to clarify the relative contributions of knowledge representation and “procedural” proficiency in enabling prediction). Research focused on individual differences could also help to address broader questions relating to the explanatory power of processing-based accounts of language acquisition (e.g., Havron et al., 2021).

There is ample evidence, from empirical studies on priming and adaptation, that prediction error can be one source of learning, and such evidence is compatible with the predictions made by computational models employing error-based learning. Adaptation to syntactic structure is observed in both L1 and L2 speakers, but again, there is a great deal of variation. Cue reliability affects the extent to which comprehenders make predictions, and factors such as the complexity of the specific syntactic structures encountered can influence the degree of adaptation that can take place, both in L1 and L2 speakers. The extent to which prediction error could also support the acquisition of complex contingencies, as suggested by the Hypothesis Testing proposal, remains an open question to be investigated empirically. More generally, further research will be needed to investigate the question of which kinds of linguistic properties can be learned by prediction error, and through which specific mechanisms. This review has focused primarily on error-based learning during the online processing of syntactic structure, mostly evidenced by syntactic adaptation. Other strands of research, however, have used different paradigms to investigate the effect of prediction error on L2 acquisition, such as research on declarative memory formation and vocabulary learning briefly mentioned at the start of this review (e.g., De Loof et al., 2018). In reality, these mechanisms (error-based learning during online syntactic processing and enhanced declarative memory formation driven by prediction error) are unlikely to operate in isolation, yet the relationship between them is still unclear.
A promising avenue for further research will be to examine the connections between them, for instance, by investigating the extent to which error-based learning during incremental sentence processing involves adjusting existing representations, and to what extent it may rely on the formation of new declarative memories (e.g., see Bernolet et al., 2016, for evidence that syntactic priming is enhanced by explicit memory of the prime sentence). In turn, this understanding may help to address the question of what can and cannot be learned (rather than merely consolidated) through prediction error.

In light of the evidence we have reviewed, there is clearly no simple answer to the question of whether impaired prediction in L2 (or Lx) speakers could be a (qualitative or quantitative) hindrance for acquisition. However, the role of prediction in L2 acquisition is a fruitful avenue for future research, best approached with an awareness of its many nuances and complexities.

Footnotes

¹ We limit our review to L2, though it has relevance to additional languages (Lx) too. Lx involves other issues that would merit further review and as such are beyond the scope of our review.

² For our purposes, L2 learning will be defined here as a change in a speaker’s L2 representations resulting from exposure to the language, in any form.

³ The N400 is a negative shift in potential, detected in the centroparietal region of the scalp, which peaks approximately 400 ms after encountering a new stimulus. In language studies, the stimulus will normally be a word (presented visually or aurally), but this component is also observed in response to other kinds of stimuli, such as pictures.

⁴ However, there is also evidence of predictive processing of determiner-noun agreement emerging after relatively brief exposure to a miniature language (Curcic et al., Reference Curcic, Andringa and Kuiken2019), but only in subjects who became aware of the agreement rule.

⁵ However, participants continued to show garden-path effects in response to optionally transitive verbs when these were embedded in other sentences, without unambiguous case marking. This suggests that they did not adapt to the intransitive use of optionally transitive verbs in general, but rather to the nominative pronoun signaling the start of a new clause (Hopp, 2020).

References

Altmann, G. T., & Kamide, Y. (1999). Incremental interpretation at verbs: restricting the domain of subsequent reference. Cognition, 73, 247–264. https://doi.org/10.1016/s0010-0277(99)00059-1

Ambridge, B., & Brandt, S. (2013). Lisa filled water into the cup: The roles of entrenchment, pre-emption and verb semantics in German speakers’ L2 acquisition of English locatives. Zeitschrift Für Anglistik Und Amerikanistik, 61, 245–263. http://pcwww.liv.ac.uk/~ambridge/Papers/German.pdf

Andringa, S., & Curcic, M. (2015). How explicit knowledge affects online L2 processing: Evidence from differential object marking acquisition. Studies in Second Language Acquisition, 37, 237–268. https://doi.org/10.1017/S0272263115000017

Andringa, S., & Dąbrowska, E. (2019). Individual differences in first and second language ultimate attainment and their causes. Language Learning, 69, 5–12. https://doi.org/10.1111/lang.12328

Arai, M., van Gompel, R. P. G., & Scheepers, C. (2007). Priming ditransitive structures in comprehension. Cognitive Psychology, 54, 218–250. https://doi.org/10.1016/j.cogpsych.2006.07.001

Balota, D. A., Pollatsek, A., & Rayner, K. (1985). The interaction of contextual constraints and parafoveal visual information in reading. Cognitive Psychology, 17, 364–390. https://doi.org/10.1016/0010-0285(85)90013-1

Bernolet, S., Collina, S., & Hartsuiker, R. J. (2016). The persistence of syntactic priming revisited. Journal of Memory and Language, 91, 99–116. https://doi.org/10.1016/j.jml.2016.01.002

Bock, K. (1986). Syntactic persistence in language production. Cognitive Psychology, 18, 355–387.

Bock, K., Dell, G. S., Chang, F., & Onishi, K. H. (2007). Persistent structural priming from language comprehension to language production. Cognition, 104, 437–458. https://doi.org/10.1016/j.cognition.2006.07.003

Bock, K., & Griffin, Z. M. (2000). The persistence of structural priming: transient activation or implicit learning? Journal of Experimental Psychology. General, 129, 177–192. https://doi.org/10.1037/0096-3445.129.2.177

Bolibaugh, C., & Foster, P. (2021). Implicit statistical learning in naturalistic and instructed morphosyntactic attainment: An aptitude‐treatment interaction design. Language Learning. Advance online publication. https://doi.org/10.1111/lang.12465

Borovsky, A., Elman, J. L., & Fernald, A. (2012). Knowing a lot for one’s age: Vocabulary skill and not age is associated with anticipatory incremental sentence interpretation in children and adults. Journal of Experimental Child Psychology, 112, 417–436. https://doi.org/10.1016/j.jecp.2012.01.005

Boyd, J. K., & Goldberg, A. E. (2011). Learning what not to say: The role of statistical preemption and categorization in a-adjective production. Language, 87, 55–83. http://www.jstor.org/stable/23011540

Branigan, H. P., & Messenger, K. (2016). Consistent and cumulative effects of syntactic experience in children’s sentence production: Evidence for error-based implicit learning. Cognition, 157, 250–256. https://doi.org/10.1016/j.cognition.2016.09.004

Brod, G., Hasselhorn, M., & Bunge, S. A. (2018). When generating a prediction boosts learning: The element of surprise. Learning and Instruction, 55, 22–31. https://doi.org/10.1016/j.learninstruc.2018.01.013

Brothers, T., Swaab, T. Y., & Traxler, M. J. (2017). Goals and strategies influence lexical prediction during sentence comprehension. Journal of Memory and Language, 93, 203–216. https://doi.org/10.1016/j.jml.2016.10.002

Buffington, J., Demos, A. P., & Morgan-Short, K. (2021). The reliability and validity of procedural memory assessments used in second language acquisition research. Studies in Second Language Acquisition, 43, 635–662. https://doi.org/10.1017/S0272263121000127

Chang, F. (2002). Symbolically speaking: A connectionist model of sentence production. Cognitive Science, 26, 609–651. https://doi.org/10.1207/s15516709cog2605_3

Chang, F. (2009). Learning to order words: A connectionist model of heavy NP shift and accessibility effects in Japanese and English. Journal of Memory and Language, 61, 374–397. https://doi.org/10.1016/j.jml.2009.07.006

Chang, F., Dell, G. S., & Bock, K. (2006). Becoming syntactic. Psychological Review, 113, 234–272. https://doi.org/10.1037/0033-295X.113.2.234

Chang, F., Janciauskas, M., & Fitz, H. (2012). Language adaptation and learning: Getting explicit about implicit learning. Language and Linguistics Compass, 6, 259–278. https://doi.org/10.1002/lnc3.337

Chang, F., Kidd, E., & Rowland, C. F. (2013). Prediction in processing is a by-product of language learning. The Behavioral and Brain Sciences, 36, 350–351. https://doi.org/10.1017/S0140525X12002518

Chow, W.-Y., Lau, E., Wang, S., & Phillips, C. (2018). Wait a second! Delayed impact of argument roles on on-line verb prediction. Language, Cognition and Neuroscience, 33, 803–828. https://doi.org/10.1080/23273798.2018.1427878

Chow, W.-Y., Momma, S., Smith, C., Lau, E., & Phillips, C. (2016). Prediction as memory retrieval: Timing and mechanisms. Language, Cognition and Neuroscience, 31, 617–627. https://doi.org/10.1080/23273798.2016.1160135

Curcic, M., Andringa, S., & Kuiken, F. (2019). The role of awareness and cognitive aptitudes in L2 predictive language processing. Language Learning, 69, 42–71. https://doi.org/10.1111/lang.12321

Delaney-Busch, N., Morgan, E., Lau, E., & Kuperberg, G. R. (2019). Neural evidence for Bayesian trial-by-trial adaptation on the N400 during semantic priming. Cognition, 187, 10–20. https://doi.org/10.1016/j.cognition.2019.01.001

DeLong, K. A. (2009). Electrophysiological explorations of linguistic pre-activation and its consequences during online sentence processing. Unpublished doctoral dissertation, UC San Diego. https://escholarship.org/uc/item/4q7520sb

DeLong, K. A., Groppe, D. M., Urbach, T. P., & Kutas, M. (2012). Thinking ahead or not? Natural aging and anticipation during reading. Brain and Language, 121, 226–239. https://doi.org/10.1016/j.bandl.2012.02.006

DeLong, K. A., Troyer, M., & Kutas, M. (2014). Pre-processing in sentence comprehension: Sensitivity to likely upcoming meaning and structure. Language and Linguistics Compass, 8, 631–645. https://doi.org/10.1111/lnc3.12093

DeLong, K. A., Urbach, T. P., & Kutas, M. (2005). Probabilistic word pre-activation during language comprehension inferred from electrical brain activity. Nature Neuroscience, 8, 1117.

De Loof, E., Ergo, K., Naert, L., Janssens, C., Talsma, D., Van Opstal, F., & Verguts, T. (2018). Signed reward prediction errors drive declarative learning. PloS One, 13, e0189212. https://doi.org/10.1371/journal.pone.0189212

Demberg, V., & Keller, F. (2008). Data from eye-tracking corpora as evidence for theories of syntactic processing complexity. Cognition, 109, 193–210. https://doi.org/10.1016/j.cognition.2008.07.008

Dempsey, J., Liu, Q., & Christianson, K. (2020). Convergent probabilistic cues do not trigger syntactic adaptation: Evidence from self-paced reading. Journal of Experimental Psychology. Learning, Memory, and Cognition, 46, 1906–1921. https://doi.org/10.1037/xlm0000881

Dijkgraaf, A., Hartsuiker, R. J., & Duyck, W. (2019). Prediction and integration of semantics during L2 and L1 listening. Language, Cognition and Neuroscience, 34, 881–900. https://doi.org/10.1080/23273798.2019.1591469

Dussias, P. E., Valdés Kroff, J. R., Guzzardo Tamargo, R. E., & Gerfen, C. (2013). When gender and looking go hand in hand: Grammatical gender processing in L2 Spanish. Studies in Second Language Acquisition, 35, 353–387. https://doi.org/10.1017/S0272263112000915

Ehrlich, S. F., & Rayner, K. (1981). Contextual effects on word perception and eye movements during reading. Journal of Memory and Language, 20, 641.

Elman, J. L. (1990). Finding structure in time. Cognitive Science, 14, 179–211. https://doi.org/10.1207/s15516709cog1402_1

Elman, J. L. (1993). Learning and development in neural networks: The importance of starting small. Cognition, 48, 71–99. https://doi.org/10.1016/0010-0277(93)90058-4

Farmer, T., Fine, A., Yan, S., Cheimariou, S., & Jaeger, F. (2014). Error-driven adaptation of higher-level expectations during reading. Proceedings of the Annual Meeting of the Cognitive Science Society, 36, 2181–2186. https://escholarship.org/uc/item/00t2m3wr

Fazekas, J., Jessop, A., Pine, J., & Rowland, C. (2020). Do children learn from their prediction mistakes? A registered report evaluating error-based theories of language acquisition. Royal Society Open Science, 7, 180877. https://doi.org/10.1098/rsos.180877

Ferreira, V. S., & Bock, K. (2006). The functions of structural priming. Language and Cognitive Processes, 21, 1011–1029. https://doi.org/10.1080/016909600824609

Fine, A. B., & Jaeger, T. F. (2016). The role of verb repetition in cumulative structural priming in comprehension. Journal of Experimental Psychology. Learning, Memory, and Cognition, 42, 1362–1376. https://doi.org/10.1037/xlm0000236

Fine, A. B., Jaeger, T. F., Farmer, T. A., & Qian, T. (2013). Rapid expectation adaptation during syntactic comprehension. PloS One, 8, e77661. https://doi.org/10.1371/journal.pone.0077661

Foraker, S., Regier, T., Khetarpal, N., Perfors, A., & Tenenbaum, J. (2009). Indirect evidence and the poverty of the stimulus: The case of anaphoric one. Cognitive Science, 33, 287–300. https://doi.org/10.1111/j.1551-6709.2009.01014.x

Forster, K. I. (1981). Priming and the effects of sentence and lexical contexts on naming time: Evidence for autonomous lexical processing. The Quarterly Journal of Experimental Psychology Section A, 33, 465–495. https://doi.org/10.1080/14640748108400804

Foucart, A., Martin, C. D., Moreno, E. M., & Costa, A. (2014). Can bilinguals see it coming? Word anticipation in L2 sentence reading. Journal of Experimental Psychology. Learning, Memory, and Cognition, 40, 1461.

Frank, S. L. (2013). Uncertainty Reduction as a Measure of Cognitive Load in Sentence Comprehension. Topics in Cognitive Science, 5, 475–494. https://doi.org/10.1111/tops.12025

Frank, S. L., & Hoeks, J. (2019). The interaction between structure and meaning in sentence comprehension: Recurrent neural networks and reading times. Proceedings of the 41st Annual Conference of the Cognitive Science Society, 337–343. https://doi.org/10.31234/osf.io/mks5y

Frank, S. L., Otten, L. J., Galli, G., & Vigliocco, G. (2013). Word surprisal predicts N400 amplitude during reading. Proceedings of the 51st Annual Meeting of the Association for Computational Linguistics, 878–883. https://repository.ubn.ru.nl/bitstream/handle/2066/119221/119221.pdf

Frank, S. L., Otten, L. J., Galli, G., & Vigliocco, G. (2015). The ERP response to the amount of information conveyed by words in sentences. Brain and Language, 140, 1–11. https://doi.org/10.1016/j.bandl.2014.10.006

Frenck-Mestre, C. (2002). An on-line look at sentence processing in the second language. In Heredia, R. R. & Altarriba, J. (Eds.), Bilingual Sentence Processing (pp. 217–236). North Holland.

Frenck-Mestre, C., German, E. S., & Foucart, A. (2014). Qualitative differences in native and nonnative semantic processing as revealed by ERPs. In Heredia, R. R. & Altarriba, J. (Eds.), Foundations of Bilingual Memory (pp. 237–255). Springer New York. https://doi.org/10.1007/978-1-4614-9218-4_12

Goodkind, A., & Bicknell, K. (2018). Predictive power of word surprisal for reading times is a linear function of language model quality. Proceedings of the 8th Workshop on Cognitive Modeling and Computational Linguistics (CMCL 2018), 10–18. https://www.aclweb.org/anthology/W18-0102.pdf

Greve, A., Cooper, E., Kaula, A., Anderson, M. C., & Henson, R. (2017). Does prediction error drive one-shot declarative learning? Journal of Memory and Language, 94, 149–165. https://doi.org/10.1016/j.jml.2016.11.001

Greve, A., Cooper, E., Tibon, R., & Henson, R. N. (2019). Knowledge is power: Prior knowledge aids memory for both congruent and incongruent events, but in different ways. Journal of Experimental Psychology. General, 148, 325–341. https://doi.org/10.1037/xge0000498

Grüter, T., Lew-Williams, C., & Fernald, A. (2012). Grammatical gender in L2: A production or a real-time processing problem? Second Language Research, 28, 191–215. https://doi.org/10.1177/0267658312437990

Grüter, T., Rohde, H., & Schafer, A. (2014). The role of discourse-level expectations in non-native speakers’ referential choices. Proceedings of the Annual Boston University Conference on Language Development. https://par.nsf.gov/biblio/10028988

Grüter, T., Rohde, H., & Schafer, A. J. (2017). Coreference and discourse coherence in L2: The roles of grammatical aspect and referential form. Linguistic Approaches to Bilingualism, 7, 199–229. https://doi.org/10.1075/lab.15011.gru

Hahne, A. (2001). What’s different in second-language processing? Evidence from event-related brain potentials. Journal of Psycholinguistic Research, 30, 251–266. https://doi.org/10.1023/a:1010490917575

Hartsuiker, R. J., Kolk, H. H. J., & Huiskamp, P. (1999). Priming word order in sentence production. The Quarterly Journal of Experimental Psychology Section A, 52, 129–147. https://doi.org/10.1080/713755798

Hartsuiker, R. J., & Westenberg, C. (2000). Word order priming in written and spoken sentence production. Cognition, 75, B27–B39. https://doi.org/10.1016/s0010-0277(99)00080-3

Havron, N., Babineau, M., Fiévet, A.-C., Carvalho, A., & Christophe, A. (2021). Syntactic prediction adaptation accounts for language processing and language learning. Language Learning. Advance online publication. https://doi.org/10.1111/lang.12466

Havron, N., de Carvalho, A., Fiévet, A.-C., & Christophe, A. (2019). Three- to four-year-old children rapidly adapt their predictions and use them to learn novel word meanings. Child Development, 90, 82–90. https://doi.org/10.1111/cdev.13113

Hintz, F., Meyer, A. S., & Huettig, F. (2014). The influence of verb-specific featural restrictions, word associations, and production-based mechanisms on language-mediated anticipatory eye movements. The 27th Annual CUNY Conference on Human Sentence Processing. https://pure.mpg.de/rest/items/item_1949864/component/file_1949863/content

Hirsh-Pasek, K., & Golinkoff, R. M. (1996). The intermodal preferential looking paradigm: A window onto emerging language comprehension. In McDaniel, D., McKee, C., & Cairns, H. S. (Eds.), Language, speech, and communication. Methods for assessing children’s syntax (pp. 105–124). The MIT Press. https://psycnet.apa.org/record/1997-97174-005

Hopp, H. (2013). Grammatical gender in adult L2 acquisition: Relations between lexical and syntactic variability. Second Language Research, 29, 33–56. https://doi.org/10.1177/0267658312461803

Hopp, H. (2015). Semantics and morphosyntax in predictive L2 sentence processing. International Review of Applied Linguistics in Language Teaching, 53, 277–306. https://doi.org/10.1515/iral-2015-0014

Hopp, H. (2016). Learning (not) to predict: Grammatical gender processing in second language acquisition. Second Language Research, 32, 277–307. https://doi.org/10.1177/0267658315624960

Hopp, H. (2020). Morphosyntactic adaptation in adult L2 processing: Exposure and the processing of case and tense violations. Applied Psycholinguistics, 41, 627–656. https://doi.org/10.1017/S0142716420000119

Hopp, H., & Lemmerth, N. (2018). Lexical and syntactic congruency in L2 predictive gender processing. Studies in Second Language Acquisition, 40, 171–199. https://doi.org/10.1017/S0272263116000437

Huettig, F. (2015). Four central questions about prediction in language processing. Brain Research, 1626, 118–135. https://doi.org/10.1016/j.brainres.2015.02.014

Huettig, F., & Guerra, E. (2015). Testing the limits of prediction in language processing: Prediction occurs but far from always. The 21st Annual Conference on Architectures and Mechanisms for Language Processing (AMLaP 2015).

Huettig, F., & Janse, E. (2016). Individual differences in working memory and processing speed predict anticipatory spoken language processing in the visual world. Language, Cognition and Neuroscience, 31, 80–93. https://doi.org/10.1080/23273798.2015.1047459

Huettig, F., & Mani, N. (2016). Is prediction necessary to understand language? Probably not. Language, Cognition and Neuroscience, 31, 19–31. https://doi.org/10.1080/23273798.2015.1072223

Husband, E. M., & Bovolenta, G. (2020). Prediction failure blocks the use of local semantic context. Language, Cognition and Neuroscience, 35, 273–291. https://doi.org/10.1080/23273798.2019.1651881

Ito, A., Corley, M., & Pickering, M. J. (2018). A cognitive load delays predictive eye movements similarly during L1 and L2 comprehension. Bilingualism: Language and Cognition, 21, 251–264. https://doi.org/10.1017/S1366728917000050

Ito, A., Pickering, M. J., & Corley, M. (2018). Investigating the time-course of phonological prediction in native and non-native speakers of English: A visual world eye-tracking study. Journal of Memory and Language, 98, 1–11. https://doi.org/10.1016/j.jml.2017.09.002

Jackson, C. N. (2018). Second language structural priming: A critical review and directions for future research. Second Language Research, 34, 539–552. https://doi.org/10.1177/0267658317746207

Jackson, C. N., & Hopp, H. (2020). Prediction error and implicit learning in L1 and L2 syntactic priming. International Journal of Bilingualism, 24, 895–911. https://doi.org/10.1177/1367006920902851

Jackson, C. N., & Ruf, H. T. (2017). The priming of word order in second language German. Applied Psycholinguistics, 38, 315–345. https://doi.org/10.1017/S0142716416000205

Jaeger, T. F., & Snider, N. E. (2013). Alignment as a consequence of expectation adaptation: Syntactic priming is affected by the prime’s prediction error given both prior and recent experience. Cognition, 127, 57–83. https://doi.org/10.1016/j.cognition.2012.10.013 CrossRef Google Scholar PubMed

Kaan, E. (2014). Predictive sentence processing in L2 and L1: What is different? Linguistic Approaches to Bilingualism, 4, 257–282. https://doi.org/10.1075/lab.4.2.05kaa CrossRef Google Scholar

Kaan, E., & Chun, E. (2018a). Syntactic adaptation. In Federmeier, K. D. & Watson, D. G. (Eds.), Psychology of Learning and Motivation (Vol. 68, pp. 85–116). Academic Press. https://doi.org/10.1016/bs.plm.2018.08.003 Google Scholar

Kaan, E., & Chun, E. (2018b). Priming and adaptation in native speakers and second-language learners. Bilingualism: Language and Cognition, 21, 228–242. https://doi.org/10.1017/S1366728916001231 CrossRef Google Scholar

Kaan, E., Futch, C., Fuertes, R. F., Mujcinovic, S., & de la Fuente, E. Á. (2019). Adaptation to syntactic structures in native and nonnative sentence comprehension. Applied Psycholinguistics, 40, 3–27.CrossRef Google Scholar

Kaan, E. & Grüter, T. (2021). Prediction in second language processing and learning: Advances and directions. In Kaan, E. and Grüter, Theres (Eds.) Prediction in second language processing and learning (pp. 2–24). John Benjamins.CrossRef Google Scholar

Kaan, E., Kirkham, J., & Wijnen, F. (2016). Prediction and integration in native and second-language processing of elliptical structures. Bilingualism: Language and Cognition, 19, 1–18. https://doi.org/10.1017/S1366728914000844 CrossRef Google Scholar

Kahneman, D. (2011). Thinking, fast and slow. Macmillan.Google Scholar

Kamide, Y., Altmann, G. T. M., & Haywood, S. L. (2003). The time-course of prediction in incremental sentence processing: Evidence from anticipatory eye movements. Journal of Memory and Language, 49, 133–156. https://doi.org/10.1016/S0749-596X(03)00023-8

Kaschak, M. P. (2007). Long-term structural priming affects subsequent patterns of language production. Memory & Cognition, 35, 925–937. https://doi.org/10.3758/bf03193466

Kaschak, M. P., & Borreggine, K. L. (2008). Is long-term structural priming affected by patterns of experience with individual verbs? Journal of Memory and Language, 58, 862–878. https://doi.org/10.1016/j.jml.2006.12.002

Kaschak, M. P., Kutta, T. J., & Jones, J. L. (2011). Structural priming as implicit learning: Cumulative priming effects and individual differences. Psychonomic Bulletin & Review, 18, 1133–1139. https://doi.org/10.3758/s13423-011-0157-y

Kaschak, M. P., Loney, R. A., & Borreggine, K. L. (2006). Recent experience affects the strength of structural priming. Cognition, 99, B73–B82. https://doi.org/10.1016/j.cognition.2005.07.002

Kim, H., & Grüter, T. (2021). Predictive processing of implicit causality in a second language: A visual-world eye-tracking study. Studies in Second Language Acquisition, 43, 133–154. https://doi.org/10.1017/S0272263120000443

Kroczek, L. O. H., & Gunter, T. C. (2017). Communicative predictions can overrule linguistic priors. Scientific Reports, 7, 17581. https://doi.org/10.1038/s41598-017-17907-9

Kuperberg, G. R., Brothers, T., & Wlotko, E. W. (2020). A tale of two positivities and the N400: Distinct neural signatures are evoked by confirmed and violated predictions at different levels of representation. Journal of Cognitive Neuroscience, 32, 12–35. https://doi.org/10.1162/jocn_a_01465

Kuperberg, G. R., & Jaeger, T. F. (2016). What do we mean by prediction in language comprehension? Language, Cognition and Neuroscience, 31, 32–59. https://doi.org/10.1080/23273798.2015.1102299

Kutas, M., & Federmeier, K. D. (2011). Thirty years and counting: Finding meaning in the N400 component of the event related brain potential (ERP). Annual Review of Psychology, 62, 621–647. https://doi.org/10.1146/annurev.psych.093008.131123

Kutas, M., & Hillyard, S. A. (1980). Reading senseless sentences: Brain potentials reflect semantic incongruity. Science, 207, 203–205. https://doi.org/10.1126/science.7350657

Kutas, M., & Hillyard, S. A. (1984). Brain potentials during reading reflect word expectancy and semantic association. Nature, 307, 161–163. https://doi.org/10.1038/307161a0

Lau, E. F., Holcomb, P. J., & Kuperberg, G. R. (2013). Dissociating N400 effects of prediction from association in single-word contexts. Journal of Cognitive Neuroscience, 25, 484–502. https://doi.org/10.1162/jocn_a_00328

Leal, T., Slabakova, R., & Farmer, T. A. (2017). The fine-tuning of linguistic expectations over the course of L2 learning. Studies in Second Language Acquisition, 39, 493–525. https://doi.org/10.1017/S0272263116000164

Ledoux, K., Traxler, M. J., & Swaab, T. Y. (2007). Syntactic priming in comprehension: Evidence from event-related potentials. Psychological Science, 18, 135–143. https://doi.org/10.1111/j.1467-9280.2007.01863.x

Lew-Williams, C., & Fernald, A. (2010). Real-time processing of gender-marked articles by native and non-native Spanish speakers. Journal of Memory and Language, 63, 447–464. https://doi.org/10.1016/j.jml.2010.07.003

Lew-Williams, C., & Fernald, A. (2007). Young children learning Spanish make rapid use of grammatical gender in spoken word recognition. Psychological Science, 18, 193–198.

Luke, S. G., & Christianson, K. (2016). Limits on lexical prediction during reading. Cognitive Psychology, 88, 22–60. https://doi.org/10.1016/j.cogpsych.2016.06.002

Mani, N., & Huettig, F. (2014). Word reading skill predicts anticipation of upcoming spoken language input: A study of children developing proficiency in reading. Journal of Experimental Child Psychology, 126, 264–279. https://doi.org/10.1016/j.jecp.2014.05.004

Martin, C. D., Thierry, G., Kuipers, J.-R., Boutonnet, B., Foucart, A., & Costa, A. (2013). Bilinguals reading in their second language do not predict upcoming words as native readers do. Journal of Memory and Language, 69, 574–588. https://doi.org/10.1016/j.jml.2013.08.001

McDonald, S. A., & Shillcock, R. C. (2003). Eye movements reveal the on-line computation of lexical probabilities during reading. Psychological Science, 14, 648–652. https://doi.org/10.1046/j.0956-7976.2003.psci_1480.x

McDonough, K., & Trofimovich, P. (2015). Structural priming and the acquisition of novel form-meaning mappings. In Cadierno, T. & Eskildsen, S. W. (Eds.), Usage-based perspectives on second language learning (pp. 105–123). De Gruyter Mouton.

McLaughlin, B. (1987). Reading in a second language: Studies with adult and child learners. In Goldman, S. R. & Trueba, H. T. (Eds.), Becoming literate in English as a second language (pp. 57–70). Ablex Publishing Corporation.

Mikolov, T., Chen, K., Corrado, G., & Dean, J. (2013). Efficient estimation of word representations in vector space. In Proceedings of the ICLR Workshop. http://arxiv.org/abs/1301.3781

Mishra, R. K., Singh, N., Pandey, A., & Huettig, F. (2012). Spoken language-mediated anticipatory eye movements are modulated by reading ability: Evidence from Indian low and high literates. Journal of Eye Movement Research, 5, 1–10. https://pure.mpg.de/pubman/faces/ViewItemOverviewPage.jsp?itemId=item_1406626

Mitsugi, S., & MacWhinney, B. (2016). The use of case marking for predictive processing in second language Japanese. Bilingualism: Language and Cognition, 19, 19–35.

Monsalve, I. F., Frank, S. L., & Vigliocco, G. (2012). Lexical surprisal as a general predictor of reading time. Proceedings of the 13th Conference of the European Chapter of the Association for Computational Linguistics, 398–408. https://dl.acm.org/doi/10.5555/2380816.2380866

Montero-Melis, G., & Jaeger, T. F. (2020). Changing expectations mediate adaptation in L2 production. Bilingualism: Language and Cognition, 23, 602–617. https://doi.org/10.1017/S1366728919000506

Naigles, L. (1990). Children use syntax to learn verb meanings. Journal of Child Language, 17, 357–374. https://doi.org/10.1017/s0305000900013817

Nieuwland, M. S., Barr, D. J., Bartolozzi, F., Busch-Moreno, S., Darley, E., Donaldson, D. I., Ferguson, H. J., Fu, X., Heyselaar, E., Huettig, F., Husband, E. M., Ito, A., Kazanina, N., Kogan, V., Kohút, Z., Kulakova, E., Mézière, D., Politzer-Ahles, S., Rousselet, G., … Von Grebmer Zu Wolfsthurn, S. (2020). Dissociable effects of prediction and integration during language comprehension: Evidence from a large-scale study using brain potentials. Philosophical Transactions of the Royal Society of London. Series B, Biological Sciences, 375, 20180522. https://doi.org/10.1098/rstb.2018.0522

Nieuwland, M. S., Politzer-Ahles, S., Heyselaar, E., Segaert, K., Darley, E., Kazanina, N., Von Grebmer Zu Wolfsthurn, S., Bartolozzi, F., Kogan, V., Ito, A., Mézière, D., Barr, D. J., Rousselet, G. A., Ferguson, H. J., Busch-Moreno, S., Fu, X., Tuomainen, J., Kulakova, E., Husband, E. M., … Huettig, F. (2018). Large-scale replication study reveals a limit on probabilistic prediction in language comprehension. eLife, 7, e33468. https://doi.org/10.7554/eLife.33468

O’Grady, W. (2005). Syntactic carpentry: An emergentist approach to syntax. Lawrence Erlbaum.

Otten, M., & Van Berkum, J. J. A. (2008). Discourse-based word anticipation during language processing: Prediction or priming? Discourse Processes, 45, 464–496. https://doi.org/10.1080/01638530802356463

Pelucchi, B., Hay, J. F., & Saffran, J. R. (2009). Learning in reverse: Eight-month-old infants track backward transitional probabilities. Cognition, 113, 244–247. https://doi.org/10.1016/j.cognition.2009.07.011

Peters, R., Grüter, T., & Borovsky, A. (2018). Vocabulary size and native speaker self-identification influence flexibility in linguistic prediction among adult bilinguals. Applied Psycholinguistics, 39, 1439–1469. https://doi.org/10.1017/S0142716418000383

Phillips, C., & Ehrenhofer, L. (2015). The role of language processing in language acquisition. Linguistic Approaches to Bilingualism, 5, 409–453.

Pickering, M. J., & Gambi, C. (2018). Predicting while comprehending language: A theory and review. Psychological Bulletin, 144, 1002–1044. https://doi.org/10.1037/bul0000158

Pickering, M. J., & Garrod, S. (2013). An integrated theory of language production and comprehension. The Behavioral and Brain Sciences, 36, 329–347. https://doi.org/10.1017/S0140525X12001495

Pili-Moss, D. (2021). Cognitive predictors of child second language comprehension and syntactic learning. Language Learning, 71, 907–945. https://doi.org/10.1111/lang.12454

Pinker, S. (2009). Language Learnability and Language Development. Harvard University Press.

Potts, R., Davies, G., & Shanks, D. R. (2019). The benefit of generating errors during learning: What is the locus of the effect? Journal of Experimental Psychology. Learning, Memory, and Cognition, 45, 1023–1041. https://doi.org/10.1037/xlm0000637

Potts, R., & Shanks, D. R. (2014). The benefit of generating errors during learning. Journal of Experimental Psychology. General, 143, 644–667. https://doi.org/10.1037/a0033194

Pozzan, L., & Trueswell, J. C. (2016). Second language processing and revision of garden-path sentences: A visual word study. Bilingualism: Language and Cognition, 19, 636–643. https://doi.org/10.1017/S1366728915000838

Prasad, G., & Linzen, T. (2021). Rapid syntactic adaptation in self-paced reading: Detectable, but only with many participants. Journal of Experimental Psychology: Learning, Memory, and Cognition, 47, 1156–1172. https://doi.org/10.1037/xlm0001046

Rabagliati, H., Gambi, C., & Pickering, M. J. (2016). Learning to predict or predicting to learn? Language, Cognition and Neuroscience, 31, 94–105. https://doi.org/10.1080/23273798.2015.1077979

Rescorla, R. A., & Wagner, A. R. (1972). A theory of Pavlovian conditioning: Variations in the effectiveness of reinforcement and nonreinforcement. Classical Conditioning II: Current Research and Theory, 2, 64–99.

Riches, N., & Jackson, L. (2018). Individual differences in syntactic ability and construction learning: An exploration of the relationship. Language Learning, 68, 973–1000. https://doi.org/10.1111/lang.12307

Robenalt, C., & Goldberg, A. E. (2016). Nonnative speakers do not take competing alternative expressions into account the way native speakers do: L2 learners and competition dynamics. Language Learning, 66, 60–93. https://doi.org/10.1111/lang.12149

Roberts, L., & Meyer, A. (2012). Individual differences in second language learning: Introduction. Language Learning, 62, 1–4. https://doi.org/10.1111/j.1467-9922.2012.00703.x

Rowland, C. F., Chang, F., Ambridge, B., Pine, J. M., & Lieven, E. V. M. (2012). The development of abstract syntax: Evidence from structural priming and the lexical boost. Cognition, 125, 49–63. https://doi.org/10.1016/j.cognition.2012.06.008

Saffran, J. R., Aslin, R. N., & Newport, E. L. (1996). Statistical learning by 8-month-old infants. Science, 274, 1926–1928. https://www.ncbi.nlm.nih.gov/pubmed/8943209

Schwanenflugel, P. J., & LaCount, K. L. (1988). Semantic relatedness and the scope of facilitation for upcoming words in sentences. Journal of Experimental Psychology. Learning, Memory, and Cognition, 14, 344–354. https://doi.org/10.1037/0278-7393.14.2.344

Schwanenflugel, P. J., & White, C. R. (1991). The influence of paragraph information on the processing of upcoming words. Reading Research Quarterly, 26, 160–177. https://doi.org/10.2307/747980

Sekerina, I. A. (2015). Predictions, fast and slow. Linguistic Approaches to Bilingualism, 5, 532–536. https://doi.org/10.1075/lab.5.4.16sek

Shin, J.-A., & Christianson, K. (2012). Structural priming and second language learning. Language Learning, 62, 931–964. https://doi.org/10.1111/j.1467-9922.2011.00657.x

Stahl, A. E., & Feigenson, L. (2017). Expectancy violations promote learning in young children. Cognition, 163, 1–14. https://doi.org/10.1016/j.cognition.2017.02.008

Stanovich, K. E., & West, R. F. (1981). The effect of sentence context on ongoing word recognition: Tests of a two-process theory. Journal of Experimental Psychology. Human Perception and Performance, 7, 658. https://psycnet.apa.org/record/1982-07095-001

Stanovich, K. E., & West, R. F. (1983). On priming by a sentence context. Journal of Experimental Psychology. General, 112, 1–36. https://doi.org/10.1037/0096-3445.112.1.1

Szewczyk, J. M., & Schriefers, H. (2013). Prediction in language comprehension beyond specific words: An ERP study on sentence comprehension in Polish. Journal of Memory and Language, 68, 297–314. https://doi.org/10.1016/j.jml.2012.12.002

Traxler, M. J., & Foss, D. J. (2000). Effects of sentence constraint on priming in natural language comprehension. Journal of Experimental Psychology. Learning, Memory, and Cognition, 26, 1266–1282. https://doi.org/10.1037/0278-7393.26.5.1266

Trenkic, D., Mirkovic, J., & Altmann, G. T. M. (2014). Real-time grammar processing by native and non-native speakers: Constructions unique to the second language. Bilingualism: Language and Cognition, 17, 237–257. https://doi.org/10.1017/S1366728913000321

Van Berkum, J. J. A., Brown, C. M., Zwitserlood, P., Kooijman, V., & Hagoort, P. (2005). Anticipating upcoming words in discourse: Evidence from ERPs and reading times. Journal of Experimental Psychology. Learning, Memory, and Cognition, 31, 443–467. https://doi.org/10.1037/0278-7393.31.3.443

Van Heugten, M., Dahan, D., Johnson, E. K., & Christophe, A. (2012). Accommodating syntactic violations during online speech perception. Poster presented at the 25th Annual CUNY Conference on Human Sentence Processing, May, 14–16. http://citeseerx.ist.psu.edu/viewdoc/download?doi=10.1.1.670.6140&rep=rep1&type=pdf

Van Petten, C., & Luka, B. J. (2012). Prediction during language comprehension: Benefits, costs, and ERP components. International Journal of Psychophysiology: Official Journal of the International Organization of Psychophysiology, 83, 176–190. https://doi.org/10.1016/j.ijpsycho.2011.09.015

Van Schijndel, M., & Linzen, T. (2018). Modeling garden path effects without explicit hierarchical syntax. In T. Rogers, M. Rau, X. Zhu, & C. W. Kalish (Eds.), Proceedings of the 40th Annual Conference of the Cognitive Science Society (pp. 2603–2608). Cognitive Science Society. http://tallinzen.net/media/papers/vanschijndel_linzen_2018_cogsci.pdf

Walker, N., Monaghan, P., Schoetensack, C., & Rebuschat, P. (2020). Distinctions in the acquisition of vocabulary and grammar: An individual differences approach. Language Learning, 70, 221–254. https://doi.org/10.1111/lang.12395

Weber, K., Christiansen, M. H., Indefrey, P., & Hagoort, P. (2019). Primed from the start: Syntactic priming during the first days of language learning. Language Learning, 69, 198–221. https://doi.org/10.1111/lang.12327

Wehbe, L., Vaswani, A., Knight, K., & Mitchell, T. (2014). Aligning context-based statistical models of language with brain activity during reading. Proceedings of the 2014 Conference on Empirical Methods in Natural Language Processing (EMNLP), 233–243. https://www.aclweb.org/anthology/D14-1030.pdf

Wicha, N. Y. Y., Moreno, E. M., & Kutas, M. (2004). Anticipating words and their gender: An event-related brain potential study of semantic integration, gender expectancy, and gender agreement in Spanish sentence reading. Journal of Cognitive Neuroscience, 16, 1272–1288. https://doi.org/10.1162/0898929041920487

Wlotko, E. W., & Federmeier, K. D. (2011). Flexible implementation of anticipatory language comprehension mechanisms. Journal of Cognitive Neuroscience, 23, 233.