1. Introduction
A recast is a teacher's or other interlocutor's reformulation of a learner's erroneous utterance that corrects one or more errors in it while retaining its semantic content (Révész, 2012). It typically occurs in a context where interlocutors are engaged in communicative activities as opposed to focus-on-forms activities (Long, 2007). The following excerpt (Gass & Varonis, 1989) is often cited in the literature to exemplify the provision of a recast in learner-learner interaction:
Learner A: A man is uh drinking coffee or tea uh with uh the saucer of the uh uh coffee set is uh in his uh knee
Learner B: In him knee
Learner A: Uh on his knee
Learner B: Yeah
Learner A: On his knee
Learner B: So sorry, on his knee
In this conversation, learner B first offers an erroneous recast, followed by learner A's recast, which is then followed by learner B's confirmation and subsequent modified output.
There are mixed findings in the second language acquisition (SLA) literature regarding the effectiveness of recasts in promoting second language development (Révész, 2012). Several researchers have underscored their potential benefits (e.g., Gass, 2003; Long, 2007; Nassaji, 2007, 2016, 2017, 2019), including the provision of both positive and negative feedback, immediate juxtaposition of the error with the reformulated version, facilitation of understanding on the part of the learner, and increased noticing. On the other hand, some have cast doubt on the efficacy of recasts, arguing that they could be potentially ambiguous (e.g., Lyster & Ranta, 1997; Panova & Lyster, 2002) and that learners may perceive them as non-corrective.
Despite this ongoing debate, several studies have investigated the variables that may influence the extent to which recasts can promote second language (L2) acquisition (e.g., Ellis et al., 2001; Goo, 2012; Hawkes & Nassaji, 2016; Lyster & Ranta, 1997; Mackey, 2007; Nassaji, 2017, 2019; Saito, 2013; Sheen, 2011). In this respect, an important question concerns the nature of the relationship between recasts and learner factors, including gender (Ross-Feldman, 2007), language anxiety (Sheen, 2008), and language aptitude (Sheen, 2007), in relation to linguistic development. One learner factor relevant to the present study is working memory (WM), which has been the subject of several investigations (e.g., Goo, 2016; Mackey et al., 2010; Révész, 2012; Trofimovich et al., 2007). It has been defined as ‘the temporary storage and manipulation of information that is assumed to be necessary for a wide range of complex cognitive activities’ (Baddeley, 2003, p. 189), with processing and storage competing for a shared pool of limited resources that vary across individuals. This individual variability, referred to as working memory capacity (WMC), could account for learners’ differences in benefiting from interactional feedback. Another fertile area of research on recasts concerns whether recasts are more likely to result in declarative or procedural knowledge (e.g., Ellis et al., 2006; Loewen & Nabei, 2007).
This idea led Révész (2012) to take this line of research further and examine whether the effects of recasts on various outcome measures are mediated by differences in WM capacity, a learner-internal factor.
The initial study investigated the role of WM in relation to gains from recasts on different outcome measures. However, it did not indicate whether these differences would carry over to the learners’ reactions after receiving recasts (Révész, 2012). The study was concerned with the question of why some learners are better than others at comparing their own erroneous utterance with their interlocutors’ more target-like utterance, seeking the answer in differences in learners’ WM. Révész discussed the hypothesis that, depending on their phonological short-term memory (PSTM) and complex verbal WMC, two of the four components of working memory (Baddeley, 2000), the participants in her study had possibly engaged in different types of learning processes to various degrees, with high PSTM learners being able to maintain the information in recasts longer in short-term memory. It is possible that these different types of learning processes could be partly accounted for by the different types of behavior that learners engage in after receiving the feedback. This is a significant hypothesis to investigate since it is in line with the idea that learners do not perceive feedback in the same way (Mackey et al., 2010).
Furthermore, since a number of researchers (e.g., McDonough & Mackey, 2006) have argued that the process of modifying output could be as important as the product, an interesting question at this juncture is how differences in WM capacity translate into possibly different types of behavior. Some learners may modify their output during interaction whereas others may not, and WM may be responsible for the benefits that learners garner from interaction and even for what they do at the moment of output modification (Mackey et al., 2010). In other words, it is possible that high WM learners may be better able to reprocess and reconstruct their utterances with new grammatical encoding. A second justification for replicating the study concerns whether the three outcome measures that Révész used did draw on different types of knowledge, given that the issue was not an established fact but rather an assumption, rendering some of her interpretations speculative (Révész, 2012). A third justification is that the participants in the initial study were Hungarian learners of English, which, as Révész points out, constrains the generalizability of the findings to learner populations with other first languages and in different contexts.
According to the classification proposed by the Language Teaching Review Panel (2008), the current study would fall more toward the conceptual end of the replication continuum in that (a) it was designed to assess the extent to which the findings of the initial study would be generalizable to other learner populations, and (b) it was aimed at assessing an explanation that the researcher of the initial study provided to account for the results she obtained. The explanation underscored the idea that the participants in the initial study had possibly engaged in different types of learning processes to various degrees. As a follow-up on the initial study and its proposed explanation, the current study attempted to investigate the interaction of learners’ gains from recasts and WM capacity as observed on different outcome measures and the extent to which the participants were behaviorally engaged with recasts as a function of their WM capacity. Specifically, what happens after recasts was the focus of the current study. This was accomplished by a coding method that categorized responses to recasts.
2. Recasts and different outcome measures
In an L2 interaction-based context, learners may focus on form while attending to meaning and use at the same time. One of the primary ways that this opportunity can be provided is through recasts (Doughty, 2001; Nassaji, 2016). What sets recasts apart from other types of feedback is that they (a) provide the correct form and (b) try to maintain the focus on meaning (Nassaji, 2017), thereby prompting learners to make cognitive comparisons (Doughty, 2001) between their non-target constructions and target-like utterances. Long (2007) argued that recasts could afford valuable processing resources for form-meaning connections. He also underscored the advantage that recasts occur in context, and thus involve semantic contingency and joint attentional focus between the erroneous utterance and the reformulated one. In other words, as the learner is engaged in and understands at least part of a message, they can notice the gap (Schmidt, 1990), hence the allocation of more cognitive resources to form-meaning connections. It has also been argued that recasts could potentially lead to modified output and automatization of language knowledge (McDonough, 2005; Swain, 1995). These benefits have rendered recasts the most frequently used feedback technique in L2 classrooms (Lyster & Ranta, 1997). It should be noted, however, that despite the above-mentioned theoretical benefits associated with recasts, some have cast doubt on whether they can promote SLA (e.g., Lyster & Ranta, 1997, 2013; Panova & Lyster, 2002).
As to what type of knowledge recasts could foster, a brief overview of a dichotomy in the SLA literature is in order, that of declarative and procedural knowledge. According to skill acquisition theory (DeKeyser, 2009), language learning occurs in several stages and is, in essence, like other complex cognitive skills such as learning how to swim. In other words, L2 learners first obtain information from the input. At this declarative stage, knowledge is characterized as being slow and demanding in terms of processing. The next stage, known as proceduralization, involves applying these rules to production or comprehension (knowledge how), resulting in faster performance by drawing on ready-made chunks. Finally, having been exposed to extensive practice opportunities, L2 learners are likely to automatize the procedural knowledge for use in fluent and effortless performance (DeKeyser, 2007a, 2007b). DeKeyser (2009) argues that procedural knowledge itself does not turn into automatized knowledge, but that its presence is conducive to the development of automatized knowledge. He also underscores the idea that practice should involve ‘real operating conditions’ (p. 292). This emphasis on the provision of real conditions renders oral feedback, including recasts, a potential candidate for inducing proceduralization and automatization of correct forms.
Révész (2012) concurs with Ellis's (2004) recommendation that different carefully selected assessment tools ought to be included in effects-of-instruction studies to better shed light on the effects of the treatment, if any, on different types of L2 knowledge. Caution should, however, be exercised, as implicit instruction does not necessarily lead to implicit knowledge, nor does explicit instruction necessarily lead to explicit knowledge (Ellis, 2004). Regarding recasts, for example, it has been empirically shown that they can lead to gains on a range of assessment tools, from metalinguistic tests (e.g., Ellis et al., 2006; Loewen & Nabei, 2007; Nassaji, 2017) to oral production tasks (e.g., Mackey & Philp, 1998; Nassaji, 2017; Révész & Han, 2006). Meta-analyses of feedback studies have also shown that oral communicative tasks are more amenable to change by recasts than metalinguistic tasks, although both types of measure seem to benefit from them substantially (Lyster & Saito, 2010; Mackey & Goo, 2007). Until recently, most recast studies employed a single assessment task (e.g., Mackey & Philp, 1998; McDonough & Mackey, 2006). This has been unwarranted, given the need in SLA research to gather a multiplicity of data sources (Doughty, 2003; Ellis et al., 2006; Norris & Ortega, 2003). Over the past few years, however, several studies have recognized the need for this multiplicity (e.g., Nassaji, 2017, 2019; Révész, 2012).
3. WM
WM refers to the cognitive processes that account for the temporary storage and manipulation of information and is assumed to ‘serve the function of facilitating performance of a range of complex tasks’ (Baddeley, 2003, p. 190). The most widely used model of WM has been proposed by Baddeley and Hitch (1974). It was originally made up of three separable components assumed to be working in tandem: two slave systems, namely the phonological loop and the visual-spatial sketchpad, and a central executive with limited attentional capacity. In this model, the phonological loop is held to be responsible for the retention of auditory sequences whereas the visual-spatial sketchpad handles visual and spatial information. The central executive is tasked with monitoring the interaction of these components with other cognitive domains and filtering information. A fourth component was later added to the model: the episodic buffer (Baddeley, 2000). This new slave system has the function of storing information, where information from several different sources can be bound into chunks and episodes and where information from different modalities can be integrated into a single multi-faceted experience.
The phonological loop is further divided into two subsystems, namely a temporary storage system and a subvocal rehearsal system. The temporary storage holds memory traces very briefly such that the information will decay unless the rehearsal component comes into play. The rehearsal process involves both maintaining and registering information within the store, provided that the received visual information can be verbalized (Baddeley, 2003).
Phonological loop capacity is argued to play a key role in certain aspects of SLA (e.g., Ellis & Sinclair, 1996; French & O'Brien, 2008; O'Brien et al., 2006). With specific reference to recasts, some have argued that learners with a higher PSTM are more likely to show grammatical development. Ellis (2005), for example, speculates that learners with high phonological loop capacity have an edge in gaining grammatical benefits from recasts by virtue of their superior ability to hold the reformulated utterance in memory. Research also shows that high PSTM learners outperform low PSTM learners on delayed posttests (Mackey et al., 2002; Trofimovich et al., 2007). It should be noted that there are two widely accepted measures of PSTM, namely the digit span (DS) and non-word span (NWS) tasks (Baddeley, 2003). The former involves the repetition of increasing numbers of digits, while the latter requires subjects to repeat sequences of non-words of varying lengths.
The second component of the model is the central executive, tasked with supervising attentional processes (Baddeley, 2003). As with PSTM, the central executive is limited in capacity and differs across individuals (Baddeley, 2003). Assessment of this component typically involves subjects doing complex WM tasks such as reading or listening to a set of sentences and having to recall the final word of each sentence after the presentation of a set (Conway et al., 2005; Waters & Caplan, 1996), referred to as reading span (RS) and listening span (LS) tasks respectively.
The role of WM in L2 learning and use has been the focus of extensive research over the past few decades. Overall, WM is deemed a core element of language aptitude and overall proficiency, in addition to various L2 processes including reading, writing, sentence processing, speaking, vocabulary development, grammar, and the processing of intake (see Juffs & Harrington, 2011, for an overview). However, there is still a debate in SLA research on whether there is a link between complex WM capacity and gains from recasts, with some studies reporting a positive relationship (e.g., Mackey et al., 2002; Sagarra, 2007) and others suggesting no effect (e.g., Trofimovich et al., 2007). It has been suggested that the differences in these findings could be explained in terms of differences in methodology (Révész, 2012). For instance, whereas in some studies, recasts are delivered exclusively in response to learners’ erroneous utterances (e.g., Mackey et al., 2002; Sagarra, 2007), in others (e.g., Trofimovich et al., 2007), they are provided after each learner utterance, regardless of whether an error occurs, hence more predictability.
4. The initial study
4.1. Method
In order to investigate the effect of different outcome measures on gains from recasts, Révész (2012) collected data as part of a larger study (Révész, 2007), using a pretest-posttest-delayed posttest design. The participants were 90 beginner-level English as a Foreign Language (EFL) learners, who were randomly assigned to recast, non-recast, and control groups. The past progressive construction was chosen as the target structure in the study on the grounds that it is perceptually salient and bears communicative value. Perceptual salience concerns the ease of hearing or perceiving a given structure and is typically measured by three sub-factors, namely the number of phones in the functor, the presence/absence of a vowel in the surface form, and the total relative sonority of the functor (see Goldschneider & DeKeyser, 2001). The past progressive is made up of a free morpheme (was/were) and a bound morpheme (-ing) and is considered to be physically salient. The structure also bears communicative value in that it denotes grammatical tense and aspect. Before and after the treatment, that is, the provision versus absence of recasts, a grammaticality judgment task (GJT), a written production task, and two oral communication tasks were administered. Half of the participants were also invited to take the delayed posttest. In addition, tests of PSTM (DS, NWS) along with a test of complex WM were administered to a few participants in each group (Révész, 2012).
4.2. Materials and procedures
Révész (2012) used three comparable versions of the treatment task, each for one treatment session and each containing ten photos. The materials were designed using the computer program Microsoft PowerPoint and piloted on native speakers of English and Hungarian beginning-level EFL learners. She pointed out that the piloting was intended to determine whether there was an equal number of obligatory contexts for the target structure (i.e., one usage of the past progressive construction), and that the use thereof was natural. Comparability of the three versions in terms of lexical variation was confirmed by Guiraud's index and in terms of syntactic complexity by clauses per Analysis of Speech unit (Foster et al., 2000).
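As an illustration of the lexical variation measure mentioned above, Guiraud's index is conventionally computed as the number of word types divided by the square root of the number of tokens. The following is a minimal sketch under that definition; the tokenization rule and the sample sentence are assumptions made for illustration and are not taken from the initial study's materials.

```python
import math
import re


def guiraud_index(text: str) -> float:
    """Guiraud's index: word types divided by the square root of word tokens."""
    # Assumption: tokens are lowercase runs of letters and apostrophes.
    tokens = re.findall(r"[a-z']+", text.lower())
    if not tokens:
        raise ValueError("text contains no word tokens")
    return len(set(tokens)) / math.sqrt(len(tokens))


# Hypothetical sample description (9 tokens, 7 types).
sample = "the man was reading and the woman was writing"
print(round(guiraud_index(sample), 2))
```

Because the denominator grows with the square root of text length, the index is less sensitive to sample size than a raw type-token ratio, which is presumably why it suits comparing task versions of similar but not identical length.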
The task in the study involved the participants imagining themselves as witnesses of a hypothetical situation, presented in a photo, where a crime was happening. It then required them to carefully describe each of their photos within 40 seconds, with the researcher taking on the role of a police officer. The treatment was provided on three separate days, with each session lasting 15 minutes. In the recast group, recasts were consistently provided to learners by the researcher whenever they produced errors in the use of the past progressive construction. A small number of distracting recasts were also presented randomly in response to other erroneous utterances. The study controlled for recast characteristics by providing only simple declarative recasts with falling intonation. The non-recast group did not receive any type of feedback during the treatment sessions while the control group participated only in the pretest and posttests. To assess the participants’ growth or lack thereof after the treatment, Révész used three versions of three different outcome measures. A brief overview of these measures follows.
4.3. Assessment measures and scoring
4.3.1. GJT
To tap the declarative knowledge of participants regarding the past progressive construction, Révész (2012) employed a GJT, consisting of 36 sentences and 12 distractors. The error types were determined by recourse to past SLA research (Bardovi-Harlig, 2000). In the piloting phase, she invited 15 Hungarian EFL learners from a similar population to take the test. No significant differences were found among the three versions. In the main study, no time limit was set for the task to allow learners to draw on their declarative knowledge. Each learner response was classified into one of four categories, motivated by the developmental stages that learners pass through in learning this construction (Bardovi-Harlig, 2000), namely stage 1: bare progressive (e.g., talking), stage 2: present progressive (e.g., is talking), and stage 3: past progressive (e.g., was talking). Révész (2012) used the following scoring method: Three points were awarded for grammatical items judged as grammatical and for ungrammatical items supplied with appropriate corrections. Two points were awarded for changing grammatical sentences to present progressive, for changing ungrammatical bare progressive sentences to present progressive, and for judging ungrammatical present progressive sentences as grammatical. One point was given for correcting a grammatical item into a bare progressive, for changing an ungrammatical present progressive form into a bare progressive, and for judging a bare progressive item as grammatical. Zero points were awarded for grammatical items judged as ungrammatical with any non-target-like change to a form different from the ones listed above. Any sentence that was judged ungrammatical but for which no correction was supplied was excluded. Also excluded were items where the corrections indicated that the sentences were judged based on linguistic forms other than the target form.
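The scoring scheme described above can be read as awarding points according to the form the learner ends up endorsing on each item (by accepting it or by supplying a correction). The sketch below is a simplified illustration of that reading, not the scoring procedure actually implemented in the initial study; the label names and the use of None for excluded items are assumptions.

```python
# Hypothetical labels for the form a learner ends up endorsing on one item.
STAGE_POINTS = {
    "past_progressive": 3,     # target-like judgment or correction
    "present_progressive": 2,
    "bare_progressive": 1,
    "other": 0,                # any other non-target-like change
}


def score_gjt(outcomes):
    """Return (total points, number of scored items).

    `outcomes` holds one label per target item, or None for an item that is
    excluded (no correction supplied, or a correction unrelated to the target).
    """
    scored = [STAGE_POINTS[label] for label in outcomes if label is not None]
    return sum(scored), len(scored)


# Illustrative responses for five items; the third is excluded.
example = ["past_progressive", "present_progressive", None,
           "bare_progressive", "other"]
total, n_scored = score_gjt(example)
print(total, n_scored)
```

Returning the number of scored items alongside the total keeps excluded items from being silently treated as zeros.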
4.3.2. Written picture description task
Révész (2012) used this task on the grounds that it would allow for the use of both declarative knowledge, by virtue of being unpressured, and procedural knowledge, by requiring the participants to supply the target construction. A 10-minute time limit was set for the task to allow for relatively unpressured written production. It involved describing a picture that showed eight people engaged in different activities. Then, the data were coded, yielding a total score for each participant. The coding involved identifying the obligatory contexts for the target structure, checking to see if any progressive marking had been produced, and analyzing the data based on the four developmental categories suggested in Bardovi-Harlig (2000).
4.3.3. Oral description tasks
Six versions of two time-pressured oral tasks, which involved the participants describing five photos, were developed and piloted. These tasks were employed on the assumption that they would provide appropriate contexts for the use of procedural knowledge. A total of 87 hours of oral data was recorded, transcribed, and subsequently coded (Révész, 2012).
4.3.4. Tests of PSTM and complex WMC
Révész used Hungarian versions of the DS and NWS (Racsmány et al., 2005) tests to measure the participants’ PSTM. The DS test involved repeating four lists of numbers increasing progressively from three to nine digits. The NWS test contained 36 non-words that conformed to Hungarian phonotactics and ranged in length from one to nine syllables. It required the participants to repeat the non-words as they were presented one by one. The greatest syllable length at which a participant could correctly repeat at least two non-words was recorded as the participant's NWS. To measure the participants’ RS, sentences of increasing length were presented. In addition to requiring the participants to remember the last word of each sentence after the presentation of each set, the task also involved answering comprehension questions for a random number of sentences to ensure that participants were not merely concentrating on recalling the words (Révész, 2012).
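The span criterion described above (the greatest length recalled correctly at least twice) can be sketched as follows. This is an assumption-laden illustration: the data layout is hypothetical, and the sketch does not model any discontinuation rule that the actual test administration may have applied.

```python
def span_score(trials, min_correct=2):
    """Return the greatest item length repeated correctly at least
    `min_correct` times, or 0 if no length meets the criterion.

    `trials` maps an item length (digits or syllables) to a list of booleans,
    one per attempt at that length (True = repeated correctly).
    """
    passed = [length for length, results in trials.items()
              if sum(results) >= min_correct]
    return max(passed, default=0)


# Hypothetical participant: four attempts at each of three lengths.
participant = {
    3: [True, True, True, True],
    4: [True, True, False, True],
    5: [True, False, False, False],
}
print(span_score(participant))
```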
4.4. Data collection and analysis
Révész (2012) administered 40-minute pretests and posttests to all and equivalent delayed posttests to half of the participants. Three separate Many-facet Rasch measurement (MFRM) analyses were used (Linacre, 1989) to assess the impact of the treatment on the participants’ development of the target construction, with the three analyses involving the different types of outcome measures. To examine the relationship between the participants’ WMC and gains from recasts, a number of Pearson correlations were conducted.
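For readers less familiar with the correlational step, the following sketch shows how a Pearson correlation between a WM measure and gain scores can be computed from first principles. The numbers are invented purely for illustration and do not reproduce any data from the initial study; the variable names are likewise hypothetical.

```python
import math


def pearson_r(x, y):
    """Pearson product-moment correlation between two equal-length sequences."""
    n = len(x)
    if n != len(y) or n < 2:
        raise ValueError("need two sequences of equal length (at least 2)")
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        raise ValueError("correlation is undefined for a constant sequence")
    return sxy / math.sqrt(sxx * syy)


# Invented scores for six hypothetical learners.
reading_span = [2, 3, 3, 4, 5, 5]
pretest = [1.0, 0.5, 1.2, 0.8, 1.0, 0.6]
posttest = [2.0, 2.0, 2.4, 3.2, 3.2, 3.6]
gains = [post - pre for pre, post in zip(pretest, posttest)]
print(round(pearson_r(reading_span, gains), 2))
```

Correlating the WM measure with gain scores rather than with raw posttest scores keeps pre-existing differences between learners from inflating the estimate.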
5. Results
5.1. Research question 1
The first research question was concerned with whether the type of outcome measure employed influences the observed effects of recasts on L2 development. The results suggested that it does. However, the extent of development varied considerably from measure to measure, with recasts producing the greatest impact on gain scores on the oral production test (approximately 5.5 logits), followed by the written production test (about 4 logits) and the GJT (approximately 2.5 logits). These results were interpreted as suggesting that while recasts could promote the acquisition of both declarative and procedural knowledge, they seem to be more geared to the development of procedural knowledge.
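To give a rough sense of what gains expressed in logits mean, recall that a logit is a log-odds unit, so a difference of d logits corresponds to multiplying the relevant odds by e^d. The sketch below applies this conversion to the approximate gains reported above; it is a loose interpretive aid under the assumption of a simple log-odds reading and does not reproduce the MFRM analysis itself.

```python
import math


def odds_multiplier(logit_gain: float) -> float:
    """Factor by which odds change for a given gain on the logit scale."""
    return math.exp(logit_gain)


# Approximate gains reported for the three outcome measures.
reported_gains = {"oral production": 5.5, "written production": 4.0, "GJT": 2.5}
for measure, gain in reported_gains.items():
    print(f"{measure}: odds multiplied by about {odds_multiplier(gain):.1f}")
```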
5.2. Research question 2
The second research question dealt with the relationship between PSTM and/or complex WMC and the recast-induced gains on different outcome measures. A series of correlations were conducted between the WM measures and the pretest-posttest gain scores. In the recast group, the participants’ gains on the GJT and written description tests showed moderate to strong correlations with their RS test scores (GJT: r = 0.53; p < 0.05; written description: r = 0.47; p < 0.05). However, no significant correlations were found between the GJT and the DS and NWS test scores. On the other hand, moderate to strong correlations were identified between the gain scores achieved by participants on the oral description test and the results of the PSTM tests (DS: r = 0.49; p < 0.05; NWS: r = 0.55; p < 0.01), but the oral test gain scores showed no significant correlations with the RS test results. In contrast, the non-recast group showed no significant correlations between the WM and developmental measures. It was implied that WM did not seem to be related to the development of learners in the non-recast group, but was associated with the gains achieved by the recast group (Table 1).
6. Learner engagement
Based on these results, Révész concluded that recasts are conducive to the development of procedural rather than declarative knowledge. In addition, the strong correlation between PSTM scores and oral test scores and that between complex WM scores and the written test scores were attributed to the extent to which learners were engaged with recasts as a function of their complex WM and PSTM differences (Révész, 2012).
These kinds of differences in terms of the degree of involvement may also be confirmed by any teacher observing his or her students in the classroom (Philp & Duchesne, 2016). While some students may be completely off-task, some others may be compliant, yet inattentive. The term frequently used to talk about learners’ interest and participation in an activity is engagement. Definitions of the term are highly variable, possibly because of the varied research contexts and foci, each underscoring particular facets. Generally, it refers to a state of heightened attention and involvement and has been described as a multifaceted construct including, at the least, three components: cognitive, behavioral, and emotional (Fredricks et al., 2004; Philp & Duchesne, 2016). Philp and Duchesne (2016) observe that behavioral engagement concerns the degree and quality of learners’ participation, with such indicators as the amount of effort, persistence, and active involvement. Cognitive engagement refers to sustained attention and mental effort, whereas emotional engagement is found in the learners’ subjective or affective responses during tasks. A fourth dimension that some consider to be also involved in tasks concerns social engagement, highlighting the learners’ degree of affiliation and willingness to be involved. A perfect example of the positive involvement of all factors is the concept of flow (Csikszentmihalyi, 1990), representing the all-encompassing involvement of the individual to the exclusion of all else.
Given the consensus that engagement could provide an optimal condition for learning, it comes as a surprise that, to date, there has been little principled understanding of the term in applied linguistics research. However, following the emphasis on the need for L2 learners to pay attention to the connections between language form and its meaning in use, the construct has recently drawn the attention of SLA research. This kind of research (e.g., Leow, 2015; Robinson, 1995) recognizes gradations of cognitive involvement, with the term engagement being used as a near synonym, although, as noted above, paying attention is only one dimension of engagement. For the purposes of the current study, only the behavioral dimension, involving such indicators as answering questions or participating in tasks (Philp & Duchesne, 2016), was taken into account.
Considering the important role of responses to feedback in today's interaction research, it would be plausible to explore learners’ level of engagement induced by recasts via such methods as coding, as suggested by Gass and Valmori (2015). In the case of recasts, this level of engagement may be manifested in some learners by simply parroting or repeating the feedback they receive, possibly involving immediate recall from a short-term memory store. In contrast, using the correct form learners receive through recasts later in the discourse places higher demands on WM, possibly because learners need to recall their earlier utterance while exploring possibilities for modification (Gass, 2006).
7. The replication study
The first two research questions of this study are the same as the ones in the initial study while the third one extends the research by following up on the learners’ reactions to recasts:
1. Does the type of outcome measure employed influence the observed effects of recasts on L2 development?
2. Are PSTM and/or complex WMC related to whether any effects of recasts are observed on different types of outcome measures?
3. What is the relationship between recast-induced behavioral engagement and PSTM and/or complex WMC?
7.1. Method
As the current study is a conceptual replication, we attempted to reduce heterogeneity between the two studies in terms of method, in line with the recommendations of Marsden et al. (Reference Marsden, Morgan-Short, Thompson and Abugaber2018), with the data being elicited in the same way, and the participants’ age, proficiency level and the learning context being similar. The differences concerned the samples’ first languages (L1s), the language of the WM tests, and the addition of the variable learner engagement in the current study, which led to the third research question.
7.1.1. Setting and participants
The current study examined whether the type of outcome measure influences the observed effectiveness of recasts, and how learners’ WM capacity relates to their gains from recasts and to their engagement with the recasted form across task types. The participants were 90 beginner-level EFL learners at a language school in Iran. Biographical information collected about the participants indicated that their age ranged from 18 to 22 (M = 18.90, SD = 0.6) and that none had been to an English-speaking country. The approach employed at the language school consisted of an amalgamation of mechanical drills and communicative activities. A pretest, consisting of only the GJT and the written production task, was administered to an initial pool of 120 participants. Based on the pretest results, 102 students who showed a lack of knowledge of the target construction were invited to participate in the oral task. Finally, 90 students, 50 males and 40 females, all native speakers of Persian, were randomly selected to take part in the study. Their prior English instruction lasted from two to three years (M = 2.6, SD = 0.2), and was not found to be significantly different across the three groups.
7.1.2. Materials and procedures
The materials and procedures in the current study were similar to those of Révész (Reference Révész2012). Persian versions of the RS test, DS and NWS tasks were developed and piloted on ten individuals, yielding the internal consistencies of 0.94 and 0.96, respectively. Also, test-retest reliability of the measures was obtained with a five-week interval and reported to be r = 0.76, p < 0.01 for the DS task and r = 0.74, p < 0.01 for the NWS task. The tasks were used in line with the guidelines in Conway et al. (Reference Conway, Kane, Bunting, Hambrick, Wilhelm and Engle2005). The target structure chosen for the current study was, as in Révész (Reference Révész2012), one usage of the past progressive construction and was presented in the same way as in Révész (Reference Révész2012), that is, a description task requiring the participants to describe what was happening in a picture, with one of the researchers taking on the role of the police officer. The reason for not opting for a target structure different from that of Révész (Reference Révész2012) was that, given a different setting, the replication study would have been potentially incomparable to the initial study. Developmental readiness for the target structure was operationalized as the second stage that learners go through to develop knowledge of this construction (Bardovi-Harlig, Reference Bardovi-Harlig2000). In the first stage, the learner has knowledge of bare progressive (e.g., walking). In the second stage, they gain knowledge of present progressive (e.g., is walking), followed by the third stage in which past progressive is acquired (e.g., was walking). Learners’ developmental readiness was specified in the same way as in the initial article, namely based on the learners’ performance on the written pretest (see Révész, Reference Révész2012).
A coding method, intended to provide information on responses to recasts, was developed to measure the extent to which the students with different degrees of WM capacity would be engaged with recasts. The reason for the use of a coding scheme was that Révész (Reference Révész2012) attributed differences in performance to the extent to which the participants were engaged in recasts as a function of different WM capacities. In the current study, the coding of learners’ reactions towards recasts, based on the categories suggested by Gass and Valmori (Reference Gass and Valmori2015), ran the gamut from: (1) no opportunity, (2) opportunity, but did not repeat, (3) repeated the recasted form, (4) negotiated the response, to (5) used the recasted form later in the discourse. Examples of the categories are provided in the Appendix. The coding was carried out via collaboration between the first researcher and a research assistant who listened to the audio recordings of the recast-embedded episodes. Collaborative coding (Smagorinsky, Reference Smagorinsky2008) was used to ensure the reliability of the decisions by providing room for discussion in relation to the data, with each decision being the outcome of a thoughtful exchange between the coders regarding what to call each and every data segment.
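The five response categories listed above can be sketched as a small data structure. This is only an illustration, not the authors' instrument: the class name `RecastResponse`, the function `engagement_score`, and the scoring of each category by its number (1 to 5) are assumptions made here, and the sample episodes are invented rather than taken from the study's data.

```python
from enum import IntEnum
from statistics import mean


class RecastResponse(IntEnum):
    """Five response categories; the 1-5 numeric values are an assumed scoring."""
    NO_OPPORTUNITY = 1
    OPPORTUNITY_NOT_REPEATED = 2
    REPEATED = 3
    NEGOTIATED = 4
    USED_LATER = 5


def engagement_score(codes):
    """Average the category codes assigned to one learner's recast episodes."""
    if not codes:
        raise ValueError("at least one coded episode is required")
    return mean(int(code) for code in codes)


# Hypothetical learner with four coded recast episodes (not study data).
episodes = [
    RecastResponse.REPEATED,
    RecastResponse.REPEATED,
    RecastResponse.NEGOTIATED,
    RecastResponse.USED_LATER,
]
score = engagement_score(episodes)
```

Treating the categories as an ordered scale is itself a modeling choice; whether "no opportunity" episodes should enter a learner's average at all is left open in this sketch.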
8. Results
8.1. Results of the GJT
The means and standard deviations for the performance of each group over time are presented in Table 2. The results of an analysis of variance showed that there were no significant differences between the groups on the pretest, F(2, 127) = 0.15, p = 0.85.
A mixed between-within analysis of variance was used to measure the impact of the treatment on participants’ scores on the GJT across the pretest, posttest, and delayed posttest. With Greenhouse-Geisser corrections applied, there was a significant Time × Group interaction, F(4, 252) = 2.60, p = 0.04, ηp² = 0.03, and a significant main effect for time, F(2, 126) = 39, p = 0.00, ηp² = 0.23, but no significant main effect for group, F(2, 127) = 1.51, p = 0.22. It is noteworthy that the data did not meet the assumptions of normal distribution and homoscedasticity, even after applying transformations, which, according to Larson-Hall (Reference Larson-Hall2010), may have reduced the power to detect statistically significant effects. Post-hoc pairwise comparisons indicated significant within-group gains in the performance of the two experimental groups over time but no significant difference between the groups. Cohen's (Reference Cohen1988) effect sizes for comparisons between groups at posttest 1 were: recast versus non-recast, d = 0.26; recast versus control, d = 0.45; non-recast versus control, d = 0.19. Also, comparisons between groups at posttest 2 yielded the following contrast effect sizes: recast versus non-recast, d = 0.15; recast versus control, d = 0.27; non-recast versus control, d = 0.11.
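For readers less familiar with the between-group effect sizes reported above, the following minimal sketch shows one common way of computing Cohen's d, namely the mean difference divided by the pooled sample standard deviation. The study does not state which variant of d was used, so the pooled formula is an assumption; the function name `cohens_d` and the scores are hypothetical and are not the study's data.

```python
from math import sqrt
from statistics import mean, variance


def cohens_d(group_a, group_b):
    """Cohen's d: mean difference divided by the pooled sample standard deviation."""
    n_a, n_b = len(group_a), len(group_b)
    # Pooled variance weights each group's sample variance by its degrees of freedom.
    pooled_var = (
        (n_a - 1) * variance(group_a) + (n_b - 1) * variance(group_b)
    ) / (n_a + n_b - 2)
    return (mean(group_a) - mean(group_b)) / sqrt(pooled_var)


# Hypothetical posttest scores for two groups (not study data).
recast = [4, 6, 8]
control = [3, 5, 7]
d = cohens_d(recast, control)  # mean difference 1, pooled SD 2
```

A positive d here means the first group scored higher on average; the sign simply follows the order of the arguments.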
8.2. Results of the oral description task
The descriptive statistics for the oral description task appear in Table 3. An analysis of variance found no significant difference between the three groups on the pretest, F(2, 127) = 0.09, p = 0.91.
A mixed between-within analysis of variance was run to examine the effect of the treatment on participants’ scores on the oral task across the pretest, posttest, and delayed posttest. With Greenhouse-Geisser corrections applied, significant effects were found for the Time × Group interaction, F(4, 254) = 54, p = 0.00, ηp² = 0.46, for group, F(2, 127) = 41, p = 0.00, ηp² = 0.39, and for time, F(2, 126) = 121, p = 0.00, ηp² = 0.48. Note that the data did not meet the assumptions of normal distribution and homoscedasticity, even after the use of transformations. Post-hoc pairwise comparisons indicated significant within-group gains in the performance of the two experimental groups over time. Also, Games–Howell post-hoc pairwise comparisons revealed that on both posttests 1 and 2, the recast group performed significantly better than the other two groups, whereas the difference between the non-recast and control groups did not reach significance. Contrast effect sizes for comparisons between groups at posttest 1 were as follows: recast versus non-recast, d = 2.29; recast versus control, d = 2.56; non-recast versus control, d = 0.49. Also, comparisons between groups at posttest 2 yielded the following effect sizes: recast versus non-recast, d = 1.80; recast versus control, d = 1.61; non-recast versus control, d = 0.02.
8.3. Results of the written description task
Table 4 presents the group means and standard deviations for each condition over time. The results of an ANOVA showed a similar performance by the three groups on the pretest, F(2, 127) = 1.00, p = 0.36.
A mixed between-within analysis of variance was conducted on the pretest, posttest, and delayed posttest scores of the written description tasks. With the homoscedasticity and normal distribution assumptions violated and the Greenhouse-Geisser corrections applied, there was a significant Time × Group interaction, F(4, 254) = 15, p = 0.00, ηp² = 0.19, a significant effect for group, F(2, 127) = 27, p = 0.00, ηp² = 0.30, and a significant effect for time, F(2, 126) = 21, p = 0.00, ηp² = 0.14. Post-hoc pairwise comparisons indicated significant within-group gains in the performance of only the recast group over time. Also, Games–Howell post-hoc pairwise comparisons showed that the recast group performed significantly better than the other two groups on both posttests 1 and 2, whereas the difference between the non-recast and control groups was not significant. Contrast effect sizes for comparisons between groups at posttest 1 were as follows: recast versus non-recast, d = 0.98; recast versus control, d = 1.13; non-recast versus control, d = 0.05. Similar comparisons between groups at posttest 2 yielded the following effect sizes: recast versus non-recast, d = 1.27; recast versus control, d = 1.62; non-recast versus control, d = 0.32.
As for the second research question, a series of correlation tests conducted on each of the WM test scores and gains on the three measures yielded two significant correlations only in the recast group: one between the oral gains and NWS scores (r = 0.88, p = 0.00) and one between the gains on the written description task and the RS scores (r = 0.73, p = 0.00).
8.4. Recast-induced engagement and WM
To address the third research question, which concerned the relationship between recast-induced engagement and WM capacity, a series of Pearson product-moment correlations was conducted. The data included the engagement scores, calculated as the average of the coding scores obtained by each participant in the recast group, and the scores on the DS, NWS, and RS tasks. A significant correlation was found between the DS and engagement scores (r = 0.63, p = 0.00), and between the NWS and engagement scores (r = 0.77, p = 0.00), but the correlation between RS and engagement scores did not reach significance (r = −0.05, p = 0.7).
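As an illustration of the Pearson product-moment correlation used in this analysis, the sketch below computes r between a WM measure and engagement scores. The function name `pearson_r` and both score lists are hypothetical, chosen only to show the calculation; they are not the study's data and do not reproduce the reported coefficients.

```python
from math import sqrt


def pearson_r(x, y):
    """Pearson product-moment correlation between two equal-length sequences."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("x and y must have the same length (at least 2)")
    mean_x = sum(x) / len(x)
    mean_y = sum(y) / len(y)
    # Sum of cross-products of deviations, and sums of squared deviations.
    cross = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    ss_x = sum((a - mean_x) ** 2 for a in x)
    ss_y = sum((b - mean_y) ** 2 for b in y)
    return cross / sqrt(ss_x * ss_y)


# Hypothetical NWS scores and engagement scores for five learners (not study data).
nws_scores = [1, 2, 3, 4, 5]
engagement_scores = [2.0, 2.5, 2.5, 3.5, 4.0]
r = pearson_r(nws_scores, engagement_scores)
```

The coefficient alone says nothing about significance; a p-value would additionally depend on the number of learners in the recast group.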
9. Discussion and conclusion
According to the statistical analyses, the answer to the first research question for the most part echoed the finding reported in the initial study. On the one hand, it was indicated that the recast group had an advantage over the non-recast and control groups on two of the measures, that is, the written and oral description tasks. On the other, unlike what was the case in Révész (Reference Révész2012), the recast group did not significantly outperform the other two groups on the GJT on the posttest and the delayed posttest. However, consistent with Révész's finding, recasts seemed to engender a greater impact on the oral production task than the written production task. We arrived at this finding on the grounds that, based on the new field-specific benchmarks for interpreting effect sizes (Plonsky & Oswald, Reference Plonsky and Oswald2014), greater comparison effect sizes were found for the oral production task. This could suggest that recasts were more conducive to the later use of the target structures on such tasks. Furthermore, the pairwise comparisons revealed within-group gains on the written task scores only in the recast group but for both experimental groups on the oral production task. Therefore, it could be argued that the difference between the two tasks is not as pronounced as in the initial study.
The idea that the GJT, unlike the oral and written production tasks, was least impacted by recasts may lie in the different nature of knowledge that it seems to draw on. The GJT, by nature, involves the use of metalinguistic knowledge, which is declarative, whereas the oral production task, which benefited most from the provision of recasts, draws on procedural knowledge. Written production tasks could be argued to fall somewhere between the declarative-procedural knowledge types continuum, allowing for unpressured language production on the one hand and a focus on meaning on the other. Therefore, as reported in a number of previous studies (e.g., Ellis et al., Reference Ellis, Loewen and Erlam2006; Révész & Han, Reference Révész and Han2006), and in the initial study, recasts are more geared toward tasks that require procedural knowledge.
A possible explanation for the non-effectiveness of recasts for the GJT in this study may be that the participants in our study were from a slightly different instructional context. In other words, the participants in the two studies had different L1 backgrounds. It remains to be seen if a different L1 background, possibly interacting with learners’ RS, could predispose learners to perform differently on certain WM measures. That is, the language of the WM tasks could have acted as an intervening variable, despite the claim that WM tasks are language independent (Osaka & Osaka, Reference Osaka and Osaka1992).
The reason the effects found for recasts were weaker, compared with the ones in the initial study, could be sought in what has come to be referred to as the Proteus effect, where large effects are reported in the early phases of an area of research, with the effects gradually regressing toward the mean effect and even non-significance. However, as Thompson (Reference Thompson2001) aptly warned, interpreting effect sizes with fixed benchmarks would merely be being stupid in another metric. With that in mind, in evaluating the potential benefits of recasts in the present study, it could be argued that even the poorer effects reported in this replication study could justify the treatment. That is, the minimal manipulation and effort associated with delivering recasts, coupled with the positive outcomes they can induce in learners’ L2, encourage the use of this implicit type of feedback since the investment of time and effort seems to be lower, compared with explicit kinds of feedback. Alternatively, it could be that the violation of normal distribution and homoscedasticity assumptions in the current study has led to a loss of power to find statistical differences between the groups.
The better performance of the recast group on the oral production task could be attributable to the notion of transfer-appropriate processing (TAP) (Lightbown, Reference Lightbown and Han2008; Révész, Reference Révész2012). TAP basically posits that learning considerably hinges on the extent to which the learning and retrieval conditions are similar. The implication of this for the results of the current study would be that the retrieval conditions for the GJT were quite dissimilar to the conditions afforded by recasts, which occurred in a communicative context with a primary focus on meaning and the existence of time pressure, whereas the oral task conditions were congruent with the nature of the recasts, hence better gains. From this perspective, the superiority of the recasts on the written description task, compared with the GJT, could possibly be attributed to the idea that in the former, the participants were required to describe pictures, similarly to what was the case during the oral tasks in which recasts were delivered.
The second research question dealt with the relationship between PSTM and/or complex WMC and the recast-induced gains on different outcome measures. As in the initial study, a possible explanation for the higher engagement levels reached by high PSTM learners is that a high phonological capacity possibly predisposes learners to go beyond merely repeating their interlocutor's utterance and venture more demanding responses to recasts. It is possible that participants with high complex WM, on the other hand, were not intrinsically primed for such noticing. This result was confirmed by the follow-up coding method employed to address the third research question concerned with the relationship between recast-induced behavioral engagement and PSTM and/or complex WMC. It should be noted that the correlation found in the initial study between GJT gain scores and the RS scores and the one between the oral task gain scores and the DS scores were absent in the replication study, possibly showing that WM tests could be influenced by the language being used, unlike what has been claimed to be the case (Osaka & Osaka, Reference Osaka and Osaka1992).
The novelty of this replication study was the addition of the variable engagement and the significant correlation found between DS and engagement scores, and that between NWS and engagement scores and the lack thereof between RS and engagement scores. Future research can use more ecologically sound approaches, such as motivation questionnaires and introspective interviews including stimulated recall, to look more closely at the learners’ level of engagement and provide insight into learner perceptions (Philp & Duchesne, Reference Philp and Duchesne2016) in relation to recasts from a multidimensional perspective, reflecting the cognitive, social, affective, and behavioral aspects of student engagement.
Acknowledgements
We fall short of words in expressing immense gratitude to Dr. Andrea Révész for her meticulous reading of an earlier version of the manuscript. Also, our sincere thanks go to the anonymous reviewers for their constructive feedback and the editor of Language Teaching, Dr. Graeme Porte. Any errors are, of course, exclusively our own.
Appendix:
Example 1. No opportunity
Learner: A man *comes and then he *walk and there is a girl…
Example 2. Ignoring the opportunity to repeat the recasted form.
Learner: A *girl is talking with a boy.
Researcher: Yeah, a girl was talking with a boy.
Learner: Ok, then, there was a man …
Example 3. Repeating the recasted form.
Learner: A man *walk in the park.
Researcher: Yeah, a man was walking in the park.
Learner: Yes, a man was walking in the park.
Example 4. Negotiating the response.
Learner: There was a boy. I think he *runs very fast.
Researcher: Yeah, he was running very fast.
Learner: You mean I should say he was running?
Researcher: Yes, go ahead.
Example 5. Using the recasted form later in the discourse
Learner: Two men *are walking in the street.
Researcher: Yes, two men were walking there.
Learner: And two girls were sitting, I think.