Introduction
One strand of instructed second-language acquisition (SLA) research seeks to understand how properties of the native tongue (L1) can be harnessed for the purpose of learning a second or additional language (L2). Prior L1 knowledge and experience can play pivotal roles in L2 learning (Ellis & Sagarra, 2010; Ellis & Wulff, 2019), serving to inform the ways in which explicit instruction is researched and put into practice (McManus & Marsden, 2018, 2019a; Michaud & Ammar, 2022). This approach derives from the delineation of cross-language relationships, especially competition between them (Ellis, 2006; MacWhinney, 2012), as these are thought to influence the routes and rates of L2 learning (Avery & Marsden, 2019; Cunnings, 2022). Previous investigations have examined the architecture of such mechanisms through manipulating both explicit information (EI), or metalinguistic explanation, and practice type during input. However, despite major advances in the field, especially over the past decade, there remains much debate as to which specific variables, and in which combinations, induce the greatest gains in L2 morphosyntactic development.
Some studies have explored how the presence or absence of EI about the L2 may be of benefit, albeit with mixed results (Henry et al., 2009; Henry et al., 2017; but see Tolentino & Tokowicz, 2014). Others have teased apart the effects of EI and task-essential practice, particularly about the L1, offering considerably more promise (McManus & Marsden, 2017, 2018, 2019a, 2019b). However, these investigations relied heavily upon corresponding L1–L2 form-meaning mappings, such as inflectional case marking and verb tense. Form-meaning mappings in other L2 learning contexts may be harder to pin down. One such example is plural marking (I like cats), in which the polysemous –s suffix may be less salient (Ellis, 2006, 2022) due to its overlap with third-person singular (My friend owns a cat) and possessives (I like my friend’s cat); this low contingency amid competing cues makes the suffix an unreliable predictor of its function (Gries, 2015; MacWhinney, 2008). Because plurals are thought to be difficult to acquire through unguided, naturalistic exposure alone (Shintani & Ellis, 2010), it is important to consider how explicit instruction can be tailored to match the nature of the learning problem (McManus, 2022).
One possible means of tailoring L2 instruction with respect to plural marking is through contrastive linguistic input, otherwise known as contrastive instruction, whereby the L1 and L2 are used in tandem (as opposed to separately) to highlight grammatical differences between them (Kupferberg, 1999). Such an approach may be appropriate for this particular learning problem, as some lines of enquiry have suggested that even when form-meaning mappings are present, attention to form through EI and task-essential noticing practice may suffice to elicit improvements in accuracy (Kasprowicz & Marsden, 2018). Early investigations into contrastive instruction offered promise (Ammar et al., 2010; Horst et al., 2010; Kupferberg & Olshtain, 1996; Spada et al., 2005). However, the nature of the instruction used in those earlier studies was comparatively nebulous in light of how the field has since evolved. For example, the EI and practice activities lacked specificity in what they sought to probe (such as the quantity of exposure to the target features and the precise degree to which the L1 and L2 were involved in the instruction), compounded by the absence of delayed post-testing. These limitations suggest the need for a clearer understanding of which instructional factors may positively affect L2 learning within the contrastive framework.
Only two known studies have investigated the potential use of contrastive instruction to treat plural-marking accuracy among learners whose L1 lacks a morphological system to represent countability, both of which appeared to facilitate learning (Lucas, 2020; Lucas & Yiakoumetti, 2019). Critically, however, those investigations failed to identify which specific factors among their contrastive EI and L2 practice activities induced the gains in subsequent performance. Accordingly, it is possible that an alternative form of instruction, including that of a non-contrastive nature, could have led to similar outcomes.
The present study attempts to extend this line of enquiry by examining whether contrastive instruction is more effective than non-contrastive instruction in treating plural-marking accuracy and, if so, which instructional variables exert the greatest impact when delivered via a web-based medium. In particular, although L1 practice may yield a positive effect when administered in conjunction with L1–EI (McManus & Marsden, 2017, 2018, 2019a, 2019b), it remains unclear what might happen if L1 practice were placed alongside contrastive EI. Doing so may increase the salience of L1–L2 pluralization differences and help reduce cue competition between both languages, enhancing accuracy through more efficient processing routines (Ellis, 2022; Shirai, 2018). Additionally, little research has investigated how learners may orient themselves towards the plural-marking acquisition process and instruction tailored to that end, particularly when incorporating the explicit use of L1. In the broader context of studies that foreground L2 development, these research gaps are of great significance because “work in cross-linguistic influence and L2 instruction is relatively more novel and underexplored” (McManus, 2022, p. 83). The overarching aim is to offer concrete suggestions for how empirical insights might translate into theoretical developments and pedagogical solutions.
Literature review
The role of L2–EI and practice type in L2 instruction
The presence of competing cues between two languages has long been acknowledged as a source of learning difficulty, primarily due to its potential to block L2 uptake despite repeated input (Ellis, 2006, 2022; Ellis & Sagarra, 2010). Consequently, one goal of L2 instruction is to reduce such competition (MacWhinney, 2012), especially by moderating the type of EI and practice activities (McManus, 2022). This is a complex matter considering how the EI and practice type may differ, as well as interact with a target L2 feature, against the backdrop of a learner’s L1 (Henry et al., 2009). In a bid to explore these factors, several studies have focused on the role of L2–EI and associated practice activities.
Fernández (2008) investigated the use of L2–EI, building upon VanPatten’s (2015) Processing Instruction (PI), a pedagogical framework that forces learners to attend to form in order to process its meaning. Fernández (2008) examined Spanish object–verb–subject (OVS) and subjunctive sentences, exposing one group to L2–EI and another to structured input (SI) without EI. Online tests showed no clear benefits for L2–EI in OVS sentences but improved response times for subjunctive sentences. This suggests that the effectiveness of EI depends on the nature of the task and the processing problem it entails. Henry et al. (2009) studied the impact of L2–EI on L1 English speakers’ learning of L2 German accusative case marking. This target was chosen due to the common misinterpretation of OVS sentences as SVO, ignoring case markings as cues. Online tests revealed that L2–EI led to faster response times (PI group) compared with its absence (SI group). This supported Fernández’s (2008) supposition that the nature of EI plays a role in the interaction between structures and processing. In a follow-up study, Henry et al. (2017) examined the function of L2–EI in combination with prosodic cues. Immediate post-tests showed improvements in reception and production, indicating that morphosyntactic accuracy may be enhanced by other processing cues such as prosody.
At the same time, not all evidence is as straightforward or compelling. Sanz and Morgan-Short (2004) discovered that, although manipulating L2–EI and feedback led to improvements in productive measures, task-essential practice alone may have been sufficient to elicit such gains, most likely because it forced learners to process the form for meaning. Stafford et al. (2012) examined Latin case morphology and found that even for groups receiving less feedback, participants were still able to make significant progress in interpretation tasks, while explicit metalinguistic feedback during practice resulted in greater gains in productive tasks. Tolentino and Tokowicz (2014) investigated grammatical features in L1 English and L2 Swedish learners and observed improvements in receptive measures for all test groups, with the L2–EI group performing less accurately. Taken together, these mixed findings concerning L2–EI corroborate the view that solutions to L2 morphosyntactic issues may rather lie in the nature of the learning problem (Henry et al., 2009; McManus, 2022), including that of an underlying processing difficulty (VanPatten, 2015), stressing the need to refine the variables involved in explicit L2 instruction.
The role of L1–EI and practice type in L2 instruction
A lack of clear conclusions has prompted a shift away from L2–EI and more towards L1–EI. Whereas the role of L1 knowledge in L2 learning has been recognized for decades (see Sparks, 2022, for a historical overview of how L1 skills derived from individual differences, including proficiency, awareness, and understanding, are implicated in L2 achievement), the means by which L1–EI may be manipulated as a contributory factor to SLA processes has remained largely uncharted until recently.
A major contribution in this regard has come from McManus and colleagues (McManus, 2019; McManus & Marsden, 2017, 2018, 2019a, 2019b). These researchers selected the French Imparfait as a grammatical target due to its cross-linguistic complexity. These landmark investigations largely incorporated three instructional variations: (1) a “core L2-only” condition comprising L2–EI followed by L2 practice, (2) an “L2+L1” condition involving the same core treatment but with additional L1–EI and L1 practice (in an attempt to reduce L1–L2 cue competition), and (3) an “L2+L1prac” condition consisting of the same treatment as the second condition except without the L1–EI (while retaining the L1 practice to ascertain whether the L1–EI alone would induce a positive effect). Although these variables were consistent between studies, there were several distinguishing features. For instance, some outcome measures emphasized judgement tasks in reading and listening (McManus & Marsden, 2017, 2018), whereas others incorporated picture description tasks in oral production (McManus & Marsden, 2019a). Although the results demonstrated improvements across all treatment types and modalities, only the L2+L1 group sustained gains into the delayed post-tests. Moreover, trajectories of learning, as evidenced by measurements of L2 performance over the full span of the intervention in McManus and Marsden (2019b), indicated that the L2+L1 group started off slowly but picked up over time compared with the L2-only and L2+L1prac groups, which remained stable throughout. This suggests that different instructional elements may produce different effects over time.
There is little doubt that these investigations offer significant insights into the ways in which explicit instruction is devised and executed. Nevertheless, gaps remain. It is still unclear whether L1–EI and/or L1 practice can assist L2 learners in acquiring other challenging grammatical features, particularly in different learning contexts. Crucially, it is yet to be ascertained how these developments might fit into the contrastive framework, especially because studies involving EI thus far have exploited the use of the L1 and L2 separately. As McManus (2022) himself pointed out:
It seems that a relevant question going forward could be whether explicit information that includes a contrastive component is helpful in L2 learning or whether clarifying the form-meaning connections in the respective languages is just as effective. (p. 102)
Aligning L2 instruction to the problematic nature of plural marking
While taking into account underlying cognitive mechanisms, such as cue competition (Fernández, 2008; Henry et al., 2009; McManus & Marsden, 2019b) and associated processing constraints (VanPatten, 2015), it is now necessary to examine the grammatical target under investigation, plural marking, within the context of the research gaps for the contrastive framework. This is important, as recent reports have suggested that instructional studies commonly fail to state the motivation for both their design and selection of the linguistic target (Bardovi-Harlig & Comajoan-Colomé, 2022).
Plurality signifies a meaning that can be represented by form through the –s suffix. Although plurality in English can be marked in other ways (e.g., cacti, children, fish, teeth, women), the present study is solely concerned with obligatory plural marking (e.g., bananas, boxes, cities, knives, volcanoes) since its comparative regularity makes it easier to both teach and operationalize. It warrants investigation because it is known to be a challenging L2 form among learner groups whose L1 does not acknowledge countability through inflectional morphemes (Shintani & Ellis, 2010), such as L1 Japanese speakers of English, the learner group forming the target population of this study.
As already mentioned, the polyfunctionality of the –s suffix (used also for third-person singular and possessives) means that its ubiquity creates low contingency (Ellis, 2006) and, as a result, becomes an unreliable predictor of its function (see also low cue validity; MacWhinney, 2005). A ramification of this “conspiracy” (Ellis, 2022, p. 44) is that L2 plurals become a difficult form to master (DeKeyser, 2005). The issue is compounded in a language like Japanese because, although a form-meaning connection clearly exists in English for plurals, Japanese can convey their meaning simply through number marking, rendering them communicatively redundant (Ellis, 2006; Ellis & Wulff, 2019). Consider the following example:
In Japanese, the noun cat does not require an inflectional marker, as plurality is expressed through the numerical context, unlike in English, where it is necessarily pluralized as cats. This discrepancy can pose challenges for Japanese learners of English, particularly when making generalized statements that lack quantification through number marking (I like cats vs. I like cat in literal Japanese). Moreover, generality in English is conveyed not only by the plural morpheme but also by the absence of the definite article, which provides a further such cue. Similarly, the –s suffix denotes countability, yet this property is determined by the lexical item (I like books [countable noun] vs. I like literature [uncountable noun]). Conversely, plural marking is not required in Japanese for communicating these non-specific meanings, where the bare form alone is acceptable.
One associated problem arising from these cross-linguistic differences is that repeated L1 use may lead to entrenchment of non-pluralized forms in L2 (Ellis, 2006, 2022; MacWhinney, 2017). Even with explicit knowledge of plural-marking rules, learners may still face difficulties due to such entrenchment and other usage-based factors like low contingency creating cue competition, as discussed earlier. Consequently, these constraints on processing may compromise accuracy (see also Malovrh & Lee, 2022, for the relationship between explicit knowledge, grammatical rules, and processing).
Collectively, these factors suggest the need to move beyond instructional techniques that rely solely on corresponding one-to-one mappings between form and meaning, as per the preceding studies involving L1–EI and L2–EI. One such option is contrastive instruction.
Contrastive instruction as a viable pedagogy
Contrastive instruction explicitly points out the similarities and differences between the L1 and L2 together (as opposed to in isolation) for the purpose of L2 learning (Kupferberg, 1999). Although the systematic examination of L1–L2 differences dates back to Contrastive Analysis (Lado, 1957), that approach was related more to how educators and researchers might understand learner difficulties. Nevertheless, it did give rise to classroom practices that targeted specific L2 features prescribed for a given L1 learner group (for language examples, see Swan & Smith, 2001). Despite these parallels, the term contrastive instruction is used here in preference, not only to emphasize that tandem use of the L1 and L2 can be used as a learning resource and a potential variable in intervention research, but also to “avoid unhelpful value-laden interpretations” (Lucas & Yiakoumetti, 2019, p. 960). It is also worth noting that contrastive instruction is distinct from pedagogical translanguaging (e.g., Cenoz & Gorter, 2022) because, although both practices harness a learner’s linguistic repertoire to facilitate language learning, contrastive instruction tends to be used more for improving specific areas of L2 morphosyntactic weakness (here, plural marking) and is not dependent upon a synchronous delivery.
Investigating this pedagogy is a seemingly worthy venture because, as Gutierrez-Mangado et al. (2019) stated, advances in related practices are “profusely lacking” (p. xvii). Additionally, there have been recent calls for detailed profiling of learners’ L1 backgrounds and metalinguistic awareness to be better incorporated into formal language-teaching repertoires (Ballinger et al., 2020; Roehr-Brackin, 2018; Woll & Paquet, 2021).
Despite this major gap, only a handful of studies have sought to explore the potential of contrastive instruction. Early research indicated apparent benefits, not only from promoting cross-linguistic awareness in the classroom (Ammar et al., 2010; Horst et al., 2010), but also, and most notably, from contrastive EI coupled with practice activities across diverse language targets and learner groups (e.g., possessive determiners: Spada et al., 2005 [L1 English/L2 French]; relative clauses: Ammar & Lightbown, 2005 [L1 Arabic/L2 English], Kupferberg & Olshtain, 1996 [L1 Hebrew/L2 English]; tense aspect: Kupferberg, 1999 [L1 Hebrew/L2 English]). Although offering promise insofar as learning gains were observed, these studies suffered from methodological weaknesses, predominantly through a lack of delayed post-testing, but also through inadequate sample sizes (fewer than 10 participants per subgroup in Ammar & Lightbown, 2005), questionable instruments (decontextualized test items in Kupferberg & Olshtain, 1996), and findings that were most likely skewed by confounding variables (non-contrastive control group participants being inadvertently exposed to contrastive EI in Spada et al., 2005), all of which strongly indicate the need for further testing.
Only two known studies have explored contrastive instruction for improving plural-marking accuracy among L1 Japanese learners of L2 English. In the first study (Lucas & Yiakoumetti, 2019), participants received teacher-fronted contrastive EI and practice activities. Although results showed significant improvements in written receptive and productive measures, the study lacked delayed post-testing, leaving the durability of the gains open to question. The second study (Lucas, 2020) focused exclusively on loanwords because they may be more susceptible to plural-marking errors than non-loanwords, possibly due to entrenched attention to their L1 non-pluralized forms (Ellis, 2006, 2022; MacWhinney, 2017). In Japanese, a loanword such as banana (バナナ, banana) may be harder to pluralize than a non-loanword such as persimmon (柿, kaki), a supposition that was subsequently verified in Lucas (2018; for further justifications on marked features requiring greater pedagogical attention, see Gass, 1982; and for the relationship between markedness and plural morphemes, see Tamura, 2023). Using a web-based delivery, Lucas (2020) incorporated contrastive EI, highlighting L1–L2 differences between how plurality is expressed in Japanese and English, along with error-identification and translation practice activities. Compared with a control group, the experimental group showed improved receptive and productive accuracy in written plurals. It is possible that rather than employing explicit-deductive rules of the target structure as previous studies have done (Spada et al., 2005; Tolentino & Tokowicz, 2014), the cross-linguistic contrasts could have contributed to these improvements.
Although speculative, the dual-language input may have helped learners to gain sensitivity to morphosyntactic distributions in their developing language systems that promoted L2 remapping (McManus & Marsden, 2018), thus serving to reduce a processing burden that ultimately led to the gains in plural-marking accuracy.
Critically, however, no known studies concerning contrastive instruction have attempted to isolate variables with the same degree of precision as those involving L2–EI and L1–EI. What remains uncertain is whether explicit non-contrastive instruction from rule explanation alone (DeKeyser, 1995; Norris & Ortega, 2000) might be equally effective in treating plural-marking accuracy. Importantly, greater illumination of the contrastive elements is required. For example, no known research has investigated the effects of additional L1 practice for contrastive instruction following the successes documented in McManus and Marsden (2017, 2018, 2019a, 2019b). One way to incorporate such L1 practice is by applying English pluralization conventions to Japanese. Although rarely used in common parlance, a plural suffix does, in fact, exist in Japanese in the form of –tachi (達・たち). This represents countability in certain formal contexts, especially when referring to people (e.g., 学生達, gakusei-tachi, “students”) but almost never to inanimate objects (for a semantic analysis of –tachi, see Nakanishi & Tomioka, 2004). The question, then, is what might happen if –tachi were applied to Japanese nouns that do not require pluralization in L1 but do so in L2.
Learner orientations towards the use of L1 in L2 learning
The debate surrounding the explicit use of L1 in L2 learning remains a perennial point of contention, challenging the traditional notion of strict adherence to the target language (de la Fuente & Goldenberg, 2022). Recent years have witnessed a gradual erosion of the monolingual assumption, marked by a growing, albeit hesitant, acceptance of the role of the native language (Hall & Cook, 2012). Despite this evolving acknowledgment, a critical gap persists in understanding individual learner orientations concerning L1 usage, especially when examining language-learning processes traditionally reported using only quantitative methods. Such orientations, which largely encompass attitudes but can also include preferences, beliefs, and behaviours, are typically explored in studies of dual-language usage through qualitative approaches (e.g., Horst et al., 2010).
Although there are some notable exceptions that have adopted a mixed-methods approach to investigate constructs relevant to the present study, such as cross-linguistic awareness (Ammar et al., 2010; McManus, 2019) and the conceptual bases of plurality (Lucas, 2020, 2022; Tsang, 2017), a more nuanced understanding is needed from the localized perspective of the learner. These insights can provide valuable information about the processes involved when implementing contrastive instruction, shedding light on possible dynamics at play during related interventions (for an overview of mixed-methods research in instructed SLA, see Sato, 2022).
The present study
Based on the research gaps identified, the present study set out to investigate the effects of EI and practice type on plural-marking accuracy among Japanese learners of English as a foreign language (EFL) across four conditions: (1) non-contrastive EI (rule explanation of pluralization only) + L2 practice, (2) contrastive EI + L2 practice, (3) as per Condition 2 + additional L1 practice (application of L2 pluralization to L1 through the Japanese suffix –tachi), and (4) using prepositions as a control, non-contrastive EI (rule explanation of prepositional use only) + L2 practice. Accordingly, the following research questions were formulated:
• RQ1 (General): To what extent is there a difference between contrastive and non-contrastive instruction with respect to L1 Japanese learners’ subsequent written accuracy of L2 English plural marking?
• RQ2 (Specific): Which combination of explicit information and practice type has the greatest impact on learning outcomes?
• RQ3 (Supplementary): What are participant orientations towards the target feature and the present study’s intervention, particularly regarding the explicit use of L1 in L2 learning?
Given that previous research has repeatedly demonstrated that raising awareness of cross-linguistic features can lead to improvements in grammatical accuracy (as detailed above), it is hypothesized that the contrastive conditions in the present study (Conditions 2 and 3) will be more effective than the non-contrastive conditions (Conditions 1 and 4). Because additional L1 practice has previously proved beneficial in other learning contexts (McManus & Marsden, 2017, 2018, 2019a, 2019b), it is also predicted that, of the two contrastive conditions, Condition 3 will be more effective than Condition 2. Finally, due to these projected outcomes, participant orientations towards the intervention, particularly the use of L1 for the learning of L2 plural marking, are expected to be favourable.
Method
Research design
The study adopted an exploratory sequential mixed-methods design (Sato, 2022), incorporating primary data from a quantitative-based intervention to address RQ1 and RQ2 and supplementary data from semi-structured retroactive interviews, as well as from closed- and open-ended questionnaire responses, to address RQ3. This design was chosen because, although quantitative and qualitative data independently offer distinct perspectives, the opportunity to adopt both approaches can lead to richer insights (Tashakkori et al., 2021).
Participants and sampling
Before beginning the research, approval was obtained from the Ethics Committee at the institution where the study was conducted. Thereafter, all participants provided written informed consent for their anonymized data to be used.
Aged 18 to 19 years, all participants were L1 speakers of Japanese studying at a large, private university in Western Japan. In all, 127 participants (51 male, 76 female) took part in the primary, quantitative portion of the study. In addition, five participants (three male, two female) volunteered to join semi-structured interviews, some conducted individually and others in pairs, distributed over two separate time intervals (see below for further details, including interviewee profiles).
Recruitment took place via opportunity sampling from several intact compulsory non-English major EFL classes (e.g., economics, engineering, law) during the second semester of a four-year bachelor’s programme. The primary data were collected by five instructors (including the author), aiming to enhance ecological validity (Loewen & Plonsky, 2016).
Proficiency level was determined from the university’s placement test, Global Test of English Communication (GTEC), which yielded a combined listening/reading mean score of 213 (SD = 27.05). This performance corresponds to the Common European Framework of Reference for Languages (CEFR) upper A1 level (Basic User). An entry questionnaire indicated that exposure to English before entering university came predominantly from six years of formal education, while study-abroad experience and knowledge of other languages were minimal.
Intervention
The intervention, administered online as homework over six consecutive weeks, was accessed on the participants’ digital devices via the university’s learning management system (LMS), a platform developed specifically for the institution but much akin to Moodle (Dougiamas, 2002). Performance measures were recorded via the LMS during designated class time at pre, post, and delayed intervals, separated by seven weeks (pre-post) and four weeks (post-delayed). The participants were randomly assigned to one of the four conditions within each class by dividing the registration list into four equal parts. Figure 1 illustrates the timeline, while Table 1 summarizes the conditions and variables.
Note. PlurNC = Plural-Non-contrastive, PlurC = Plural-Contrastive, PrepNC = Preposition-Non-contrastive.
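The within-class assignment procedure can be illustrated with a minimal sketch. It assumes that each registration list was shuffled before being cut into four parts; the text states only that the list was divided into four equal parts, so the shuffling step, the function name, and the student IDs are hypothetical.

```python
import random

CONDITIONS = ["PlurNC-L2", "PlurC-L2", "PlurC-L1/L2", "PrepNC-L2"]

def assign_conditions(registration_list, seed=None):
    """Shuffle one class list and cut it into four near-equal parts."""
    rng = random.Random(seed)
    shuffled = list(registration_list)
    rng.shuffle(shuffled)  # assumption: randomization by shuffling
    base, extra = divmod(len(shuffled), len(CONDITIONS))
    assignment = {}
    start = 0
    for index, condition in enumerate(CONDITIONS):
        # The first `extra` parts take one additional student each
        size = base + (1 if index < extra else 0)
        for student in shuffled[start:start + size]:
            assignment[student] = condition
        start += size
    return assignment

example_class = [f"S{n:02d}" for n in range(1, 26)]  # hypothetical IDs
groups = assign_conditions(example_class, seed=1)
```

Where a class size is not divisible by four, this sketch lets the part sizes differ by one; how the study handled such remainders is not stated.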
To account for time on task, the number of target sentences varied across the four conditions (see below) but was controlled and counterbalanced (for criteria, see Appendix 1 in the Supplementary Material). The settings on the LMS were adjusted in such a way as to restrict access to only the intended participants of each condition. Its timed countdown function ensured that each of the six sessions lasted no more than 20 minutes in all four conditions. Although most participants completed all six sessions, those who failed to finish at least four were trimmed from the data set, resulting in an overall completion rate of 5.7 out of 6 (95.5%). To promote comprehension, written instructions were presented in Japanese. The entire instructional content in both Japanese and English for all four conditions can be found on IRIS (https://www.iris-database.com).
After the delayed post-test, the participants were debriefed on the study’s purpose and offered Conditions 3 and 4 (as these covered all aspects of the training), as well as the tests and their answers, as self-study activities to avoid any lost learning opportunities.
Condition 1: Experimental 1 (PlurNC–L2)
Condition 1 took plural marking as its target feature and was non-contrastive in that no reference was made to L1–L2 morphological or conceptual similarities and differences but only to explicit-deductive rule explanation (DeKeyser, Reference DeKeyser1995; Norris & Ortega, Reference Norris and Ortega2000). Practice activities were completed in L2 English only. The rationale for Condition 1 was to ascertain whether explicit instruction of plural marking without contrastive EI or L1 practice would be beneficial, specifically by clarifying form-meaning connections alone. The condition was labelled “Plural-Non-contrastive-L2” (PlurNC–L2).
Condition 1 began with brief EI in Japanese about L2 rule explanation for pluralization (Figure 2). This brevity was intended to prevent the participants from drawing L1–L2 contrasts for themselves, even implicitly (Fukuta & Yamashita, Reference Fukuta and Yamashita2023). Moreover, by solely focusing on rule-based morphology, the EI could avoid details of cross-linguistic differences related to countability, thereby mitigating conceptual contrasts.
The following page was accessed by clicking either Yes or No to confirm comprehension of the EI. While this confirmation did not guarantee that the participants understood the EI, it was included to maintain engagement.
Prior to the practice activities, participants were semantically primed for the target nouns by completing an L2 vocabulary–definition matching activity (e.g., coffee = A type of drink). This priming task aimed to facilitate comprehension of the upcoming nouns without focusing on their form (Trofimovich & McDonough, Reference Trofimovich and McDonough2011).
The practice activities comprised two types of cloze tasks, presented as Parts 1 and 2, each containing five unrelated sentences. In Part 1, participants were instructed to select one of the five nouns in its appropriate form—indefinite article, singular, or plural—from a dropdown menu of 15 options (Figure 3, left). The sentence prompts and their appropriate responses were intended to be as unequivocal as possible, dealing only with grammatical forms and not with lexical items. Immediate feedback through providing the most appropriate response was displayed after each prompt, including corrections if an inappropriate selection had been made. For example, if Item 15 in Figure 3, coffees, were erroneously selected in response to the prompt (… he dislikes the taste of _____), the correction would subsequently be displayed underneath as Correct answer: 14, along with the entire sentence featuring the prompt capitalized in its appropriate form (… he dislikes the taste of COFFEE). This feedback ensured that the participants received full exposure to the target forms while attending to appropriate usage/non-usage of the plural suffix. In this way, explicit clarification of L2 form-meaning connections was provided without referencing the L1. As the participants worked through each of the five sentences, the prompts in the dropdown menu did not decrease in number but remained as the full list of 15 options.
Part 2 operated on much the same principle as Part 1, except instead of a dropdown menu, participants were instructed to manually type the noun in its appropriate form into a textbox using the indefinite article, singular form, or plural form (Figure 3, right). Again, immediate feedback of the same nature was provided.
There were 10 target nouns and 10 corresponding practice sentences in each session.
Condition 2: Experimental 2 (PlurC–L2)
Condition 2 incorporated contrastive EI about L1–L2 similarities and differences concerning plural marking in conjunction with L2 practice activities (Lucas, Reference Lucas2020). The rationale was to isolate whether contrastive instruction with L2 practice would be more beneficial than either rule explanation alone or contrastive instruction with additional L1 practice. This condition was labelled “Plural-Contrastive-L2” (PlurC–L2).
Each session consisted of three parts. Parts 1 and 2 were the same cloze activities as Condition 1 (Figure 3), although these differed in two respects: (1) the number of practice items in each part was reduced from five to four to account for there being an additional variable in Part 3 (which also controlled for time on task) and (2) following each prompt, EI was provided in Japanese about L1–L2 similarities and differences (Figure 4). The EI covered morphological and conceptual aspects of both languages, particularly countability (Lucas, Reference Lucas2020, Reference Lucas2022; Tsang, Reference Tsang2017). As with Condition 1, the same vocabulary–definition matching activity preceded the cloze activities. However, it was distinct from Condition 1 in that there was accompanying text explicitly stating that the target nouns were loanwords (since, as previously explained, they are thought to be more difficult to pluralize than non-loanwords, Lucas, Reference Lucas2018) and that their differing L1–L2 forms can be used to express the same meanings in both Japanese and English.
Part 3 sought to engage participants in dual-language use to promote further cognitive involvement through cross-linguistic comparisons and contrasts (Leow, Reference Leow2015; Lucas, Reference Lucas2020), whereby appropriate noun forms were elicited through a series of five L1–L2 translation sentences. For example:
The five sentences were presented as a short narrative instead of decontextualized sentences to aid comprehension. A short hint serving as a sentence starter (such as I live … in the example above) was provided to assist with word order so that attention could instead be directed to morphology. Additionally, the instructions guided participants to focus on the L1–L2 differences of the noun forms, with feedback being provided as an exemplar translation to encourage such reflection. The omission of this translation activity in Condition 1 was another conscious decision to minimize any cross-linguistic contrasts being drawn, even on an implicit level. Prior to these translations, participants were primed for the five target nouns by completing another vocabulary–definition matching activity.
Each session consisted of 13 target nouns and 13 corresponding practice sentences. Given the fixed paragraph format used for presenting the five nouns in Part 3, it was impractical to precisely match the number of stimuli with that of Condition 1. As mentioned, this is the principal reason that the number of practice items in Condition 2 was reduced from five to four, serving to buffer this difference.
Condition 3: Experimental 3 (PlurC–L1/L2)
Condition 3 contained all the same parts as Condition 2 but sought to build upon it by investigating whether the effects of additional L1 practice would be beneficial (McManus & Marsden, Reference McManus and Marsden2017, Reference McManus and Marsden2018, Reference McManus and Marsden2019a, Reference McManus and Marsden2019b). This condition was labelled “Plural-Contrastive-L1/L2” (PlurC–L1/L2).
As previously explained, the extra contrastive variable involved utilizing an L2 rule in L1, namely by applying English pluralization conventions to Japanese through the suffix –tachi (達・たち). This process began with the same vocabulary–definition matching activity as in Part 3 of Condition 2. However, this was followed by a list of the five nouns, for each of which participants were required to indicate via a two-choice dropdown menu whether the noun was countable or uncountable. This was included because, unlike Conditions 1 and 2, determining countability before operating on the target language was an essential task component, as it dictated whether –tachi would be applied. The ratio was fixed at three countable to two uncountable across all six sessions. Following immediate feedback on whether each noun was correctly identified as countable or uncountable, the same five sentences from Part 3 of Condition 2 were presented as a cloze activity with the target nouns removed. The cloze featured a dropdown menu listing 10 options in Japanese derived from the five nouns in one of two forms: either (1) the lemma (labelled そのまま, sono mama, “as it is”) or (2) the lemma + the Japanese plural suffix –tachi to reflect how the plural form is not necessary in Japanese but would be applied in English to countable nouns (Figure 5). Taking the same sentence as in the example above, since the word スーパーマーケット (sūpāmāketto “supermarket”) requires the plural form, the appropriate response from the dropdown menu would be スーパーマーケット達 (sūpāmāketto-tachi) as opposed to スーパーマーケット(そのまま) (sūpāmāketto (sono mama)).
Finally, Part 4 involved translating these five Japanese sentences into English, as per Part 3 in Condition 2.
Each session comprised 11 target nouns and 16 practice sentences. The paragraph format and the fixed ratio of countable versus uncountable nouns inevitably led to a further discrepancy in item distribution between conditions. To address this issue, the number of practice items in Parts 1 and 2 was reduced from four to three. Additionally, it is worth reiterating that the countable/uncountable identification and translation items used the same sentences as those in Condition 2. Consequently, although each session featured 16 practice sentences, Parts 3 and 4 involved five repetitions, reducing the content to 11 unique practice sentences.
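The item counts reported for Conditions 1 to 3 can be checked with a short tally. The sketch below simply re-adds the per-part figures given in the text; the dictionary layout, part labels, and function name are illustrative and not part of the study materials.

```python
# Practice sentences per part, as described for each condition.
SESSIONS = {
    "Condition 1": {"Part 1 cloze": 5, "Part 2 cloze": 5},
    "Condition 2": {"Part 1 cloze": 4, "Part 2 cloze": 4,
                    "Part 3 translation": 5},
    "Condition 3": {"Part 1 cloze": 3, "Part 2 cloze": 3,
                    "Part 3 L1 cloze": 5, "Part 4 translation": 5},
}
# Parts that reuse sentences already counted in an earlier part.
REPEATED = {"Condition 3": {"Part 4 translation"}}

def totals(condition):
    """Return (all practice sentences, unique practice sentences)."""
    parts = SESSIONS[condition]
    total = sum(parts.values())
    repeats = sum(count for part, count in parts.items()
                  if part in REPEATED.get(condition, set()))
    return total, total - repeats

for name in SESSIONS:
    total, unique = totals(name)
    print(f"{name}: {total} practice sentences, {unique} unique")
```

The tally reproduces the reported figures of 10, 13, and 16 practice sentences, with the 16 in Condition 3 reducing to 11 unique sentences once the five repeated translation items are discounted.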
Condition 4: Control (PrepNC–L2)
Condition 4 served as a control and took prepositions as its target feature. The rationale was to incorporate a different grammatical focus so as to compare performance with the other three conditions. As with Condition 1, no contrastive elements were included, and it used only L2 English in the practice activities. This condition was labelled “Preposition-Non-contrastive-L2” (PrepNC–L2).
Condition 4 began with EI about L2 use of prepositions, particularly regarding different categories such as multi-word verbs, prepositions of place and time, and when no prepositions are required (e.g., discuss about something vs. discuss something; for further details, see instructional content on IRIS). Thereafter, the number of target nouns and practice sentences and their sequence were identical to those of Condition 1, except the sentences were adapted to ensure that they each contained two prepositions. The prepositions were to be selected from a choice of 10 (including the option of “no preposition required,” indicated as (–)), followed by immediate feedback of appropriate responses. For example, the right-hand sentence in Figure 3 was modified to:
The sandwiches for sale (a) _____ the café were good value (b) _____ money.
[Answers: (a) at, (b) for]
This format ensured that the sentences provided exposure to the same target nouns in their appropriate forms, but attention was directed towards prepositions instead of plural marking.
Instruments and procedure
A battery of receptive and productive online instruments was used to measure performance. Based on Lucas (Reference Lucas2020), three versions of each instrument were created following a Latin square (labelled Patterns A, B, and C), counterbalanced at each test interval and within each condition. These were subsequently checked by 10 L1 speakers of English, piloted on a comparable group of learners from the same institution, and revised accordingly. To avoid participants deducing the purpose of the investigation (thereby producing a possible priming effect), the productive task was administered first. However, for logical flow, they are presented here in reverse order.
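One way to picture this counterbalancing is a 3 × 3 cyclic Latin square, in which each sequence group meets every pattern once and every pattern appears once at each test interval. The rotation scheme and the notion of three sequence groups are assumptions made for illustration; the exact allocation used in the study is not specified here.

```python
PATTERNS = ["A", "B", "C"]
INTERVALS = ["pre-test", "post-test", "delayed post-test"]

def latin_square(items):
    """Return a cyclic Latin square: row i is the list rotated left by i."""
    n = len(items)
    return [[items[(row + col) % n] for col in range(n)] for row in range(n)]

square = latin_square(PATTERNS)
for group, row in enumerate(square, start=1):
    schedule = ", ".join(f"{interval}: {pattern}"
                         for interval, pattern in zip(INTERVALS, row))
    print(f"Sequence group {group} -> {schedule}")
```

Under this scheme, no participant sees the same test version twice, and version difficulty is spread evenly across the three intervals.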
Error recognition: Timed acceptability judgement task
The receptive instrument was an acceptability judgement task (AJT) designed to test participants’ ability to recognize and correct plural-omission errors. Metalinguistic judgements are widely used in SLA research (Norris & Ortega, Reference Norris and Ortega2000) for reasons such as their versatility, convenience in scoring/showing changes over time, and their ease of development (Plonsky et al., Reference Plonsky, Marsden, Crowther, Gass and Spinner2020), while corrections enable the source of an error to be identified (Mackey & Gass, Reference Mackey and Gass2016).
The instrument took its design from Lucas (Reference Lucas2020) and incorporated a 20-sentence narrative on a specified topic (e.g., shopping). This narrative included six plural-omission errors (see example in Figure 6), six random distractor errors (e.g., prepositions, verb aspect), and eight non-errors. All three test versions were uniform in meeting these criteria, standardized for word frequency, sentence length, sentence type (affirmative, negative, or interrogative), and error position (central or posterior), while differing only in content (for full test criteria and sample content, see Appendix 2 in the Supplementary Material, and for all three AJT versions, see IRIS). Although the number of test items (n = 20) and its subset of target grammar items (k = 6) were small, the aim was to avoid overtaxing the participants beyond their capacity considering their relatively low level of linguistic proficiency.
After a short practice run of two sentences (one acceptable and one unacceptable using a verb-aspect error as a distractor), each of the 20 sentences was displayed, one at a time, on separate pages. Once a subsequent page was accessed, it was not possible to go back and alter responses. If a sentence was judged grammatically acceptable, the participants were simply to type “OK” into a textbox below the sentence. However, if a sentence was deemed grammatically unacceptable, it was to be corrected by typing in the amended form as a single word (Figure 6). Although these dichotomous judgements may be less common than scalar ones (Spinner & Gass, Reference Spinner and Gass2019), they are nevertheless helpful in gaining insight into whether knowledge can be practically applied. Using the LMS timer function, 10 minutes was allotted to complete the task. Although this may seem stringent, the time limit was determined from the piloting, which showed that with each sentence being relatively short, judgements could adequately be made within 30 seconds. Furthermore, Mackey and Gass (Reference Mackey and Gass2016) suggested that “it is advisable to get quick responses without a great deal of thinking time” (p. 60).
Although time pressure was applied to the task, it was applied holistically rather than to individual test items. As a result, this may have enabled the participants to deliberate over their responses, increasing the likelihood that the AJT tapped into explicit rather than implicit knowledge. Allocating time pressure to each test item separately would have been preferable, but the LMS did not possess such a function. Nevertheless, the allotted period of 10 minutes served to ensure that the overall task was completed within a given timeframe. It is also important to recognize that the boundaries between implicit and explicit knowledge are not clearly defined (Ellis, Reference Ellis and Hornberger2008; Pawlak, Reference Pawlak and Lewandowska-Tomaszczyk2019), and there remains much debate surrounding the use of timed versus untimed AJTs in measuring these two types of knowledge (Plonsky et al., Reference Plonsky, Marsden, Crowther, Gass and Spinner2020).
Because the data were analyzed following a Latin square design, instrumental reliability was calculated for the combination of the six target items across all three test versions (6 × 3 = 18). According to Kline’s (Reference Kline2000) standard criteria, Cronbach’s alpha was “Acceptable” (α = .77).
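For readers unfamiliar with the statistic, Cronbach's alpha can be computed from a participants-by-items score matrix as k/(k − 1) × (1 − sum of item variances / variance of total scores). The following is a minimal sketch using that standard formula; the response matrix is hypothetical and far smaller than the 18-item combination described above, and it does not reproduce the software used in the study.

```python
from statistics import variance

def cronbach_alpha(scores):
    """scores: list of rows, one per participant, one column per item."""
    k = len(scores[0])
    # Sample variance of each item across participants
    item_vars = [variance([row[i] for row in scores]) for i in range(k)]
    # Sample variance of each participant's total score
    total_var = variance([sum(row) for row in scores])
    return (k / (k - 1)) * (1 - sum(item_vars) / total_var)

# Hypothetical binary responses (1 = error corrected, 0 = not); not study data.
demo = [
    [1, 1, 1, 0],
    [1, 0, 1, 1],
    [0, 0, 0, 0],
    [1, 1, 1, 1],
    [0, 0, 1, 0],
]
alpha = cronbach_alpha(demo)
print(round(alpha, 2))
```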
Written production: Timed picture description task
The productive instrument was a picture description task (PDT) designed to elicit loanword nouns in their plural form. PDTs are a common data collection tool (Mackey & Gass, Reference Mackey and Gass2016) and can be useful in that they “elicit sufficient spontaneous writing for assessment purposes within a short period of time” (Sasayama et al., Reference Sasayama, Garcia Gomez and Norris2021, p. 1).
Based on Lucas (Reference Lucas2020), three distinct test versions were used, each featuring pictures of shops specializing in musical instruments, furniture, and electronics, respectively. Although these represent specific semantic fields, they were chosen because the cognates within each category allowed for lexical familiarity (e.g., piano, bed, smartphone; for a complete list of nouns produced from all three categories as part of the marking criteria, see IRIS). Each test version maintained uniformity by including (1) a picture of the designated shop with an additional image of people engaging in shopping activities, aimed at eliciting richer responses beyond the depicted items on sale (Figure 7), and (2) a standardized set of prompts administered in the participants’ L1 to serve as a guideline for writing. The English equivalent of these prompts is as follows: Describe what kind of shop this is; what it generally sells; who these two people are; what kind of items they probably like; why they came to the shop; what they are doing now and what will happen next. Together, the pictures and prompts were designed to elicit a range of nouns in both singular and plural forms to prevent participants from overproducing plurals (for further sample content, see Appendix 3 in the Supplementary Material, and for all three PDT versions, see IRIS).
The participants were instructed to write as much as possible within 10 minutes (determined by the same reasoning as the AJT), measured using the LMS timer function. As with the AJT, because participants had sufficient time to consider the accuracy of what they were writing within the given timeframe, it is likely that the PDT also drew more on explicit than implicit knowledge.
Task scoring
After all data had been downloaded in spreadsheet format, task scoring for the AJT was based on a binary outcome variable: one point for each successful instance of a corrected plural-marking error and zero for each unsuccessful instance. In the rare event that a plural was misapplied to a singular or uncountable noun (e.g., musics, Pattern A, Item 16), the response was regarded as grammatically unacceptable and consequently scored as zero. Similarly, if non-target words were corrected, these were also scored as zero. Measured against the maximum of six errors within each test version, the total scores were then converted into percentages.
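The AJT scoring rules can be sketched as a small function. The answer key, item numbers, and responses below are hypothetical (the real items are available on IRIS), and the case-insensitive, whitespace-trimmed matching is an assumption rather than a documented part of the marking procedure.

```python
def score_ajt(responses, answer_key):
    """Return the percentage of target plural-omission errors corrected.

    responses:  {item_number: typed response}
    answer_key: {item_number: expected plural form}, target items only
    """
    points = 0
    for item, expected in answer_key.items():
        typed = responses.get(item, "").strip().lower()
        if typed == expected.lower():
            points += 1  # successful correction of a plural-omission error
        # Anything else ("OK", a wrong form, no answer) scores zero.
    # Responses to non-target items never add points.
    return 100 * points / len(answer_key)

# Hypothetical key and responses for one six-target test version.
key = {2: "pianos", 5: "guitars", 9: "drums",
       12: "violins", 15: "radios", 19: "cameras"}
answers = {2: "pianos", 5: "OK", 9: "Drums ", 12: "violins",
           15: "radio", 16: "musics", 19: "cameras"}
ajt_score = score_ajt(answers, key)
print(round(ajt_score, 2))
```

In this illustration, the misapplied plural on Item 16 earns nothing because it is not a target item, in line with the zero-scoring rule described above.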
For the PDT, the first step in scoring the data was to separate nominal loanwords from partial loanwords and non-loanwords through colour coding. To demonstrate this process, the loanwords in the following example have instead been underlined:
This store sells furniture. For example, bed, chairs, tables , shelf, lights and carpets. Two women want to buy a light. They think which light to buy. They decided to buy white one.
Because only full loanwords were examined, nouns that possessed (1) both a Japanese form and an equivalent loanword form, i.e., partial loanwords (“chair”: 椅子—isu / チェア—chiea; “carpet”: 絨毯—jūtan / カーペット—kāpetto), or (2) only a Japanese form (“shelf”: 棚—tana), i.e., non-loanwords, were excluded from the analyses (for further details on these nominal categorizations, see Lucas, Reference Lucas2018). Uncountable nouns (furniture) and nouns necessitating the indefinite article ( a light) were also excluded. Focusing exclusively on full loanword forms requiring pluralization, conversion to percentages was then calculated using the following formula:
Accuracy (%) = (number of plurals appropriately produced ÷ number of plurals possible) × 100
In the example above, since two plurals have been appropriately produced out of a possible three (bed, tables, lights), accuracy would be scored as 66.67%. To ensure consistency between noun categories (full loanwords/partial loanwords/non-loanwords), a set of marking criteria was developed with a second rater for all three test versions (available at IRIS). This protocol enabled standardization between potentially ambiguous items. Using 20% of the data for all three time intervals across all four conditions, standard measures of agreement judged against Landis and Koch (Reference Landis and Koch1977) demonstrated that both the receptive and productive Kappa readings were “Almost perfect” (AJT: κ = .98; PDT: produced: κ = .97, possible: κ = .86, mean percent accuracy: κ = .94).
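The PDT accuracy calculation from the worked example (two plurals produced out of a possible three) can be expressed as a short function. This is a sketch that assumes accuracy is the ratio of produced to possible plurals multiplied by 100, which matches the 66.67% figure; returning None when a script contains no obligatory plural contexts is an assumption, as the handling of such cases is not described.

```python
def pdt_accuracy(produced, possible):
    """Percent of obligatory plural contexts (full loanwords) marked as plural."""
    if possible == 0:
        return None  # assumption: accuracy is undefined without any contexts
    return 100 * produced / possible

# Worked example from the text: bed, tables, lights -> 2 of 3 pluralized.
nouns = {"bed": False, "tables": True, "lights": True}
produced = sum(nouns.values())
print(f"{pdt_accuracy(produced, len(nouns)):.2f}%")
```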
Quantitative data analysis
The quantitative data were analyzed following non-parametric procedures. The reasons for this were twofold. Firstly, most of the data sets did not conform to a normal distribution according to Shapiro-Wilk tests (p < .05). Secondly, there were no missing data points across the three time intervals throughout the testing trajectory for both the receptive and productive measures. However, applying a parametric test (3 × 4 analysis of variance with planned contrasts) resulted in an identical pattern of results.
Between-group differences (for Condition) were assessed using Kruskal-Wallis tests, followed by post hoc analyses using Mann-Whitney U tests to locate any significant differences in performance. Within-group differences (for Time) were determined via Friedman tests, with any significant differences identified using Wilcoxon Signed-Rank tests. The alpha level was set at .05, including Bonferroni adjustments for the post hoc procedures. All analyses were performed using R via Langtest (Mizumoto, Reference Mizumoto2015; Mizumoto & Plonsky, Reference Mizumoto and Plonsky2016).
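To make the omnibus between-group test concrete, the sketch below computes the Kruskal-Wallis H statistic (with the standard tie correction) in plain Python, together with a Bonferroni-adjusted alpha for all pairwise comparisons. It is illustrative only: the study's analyses were run in R via Langtest, the function names are hypothetical, and dividing alpha by the number of pairings (six for four conditions, three for three time points) is an assumption about how the adjustment was applied.

```python
from itertools import combinations

def average_ranks(values):
    """Rank values from 1 to N, giving tied values their mean rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    pos = 0
    while pos < len(order):
        end = pos
        while end + 1 < len(order) and values[order[end + 1]] == values[order[pos]]:
            end += 1
        mean_rank = (pos + end) / 2 + 1
        for i in order[pos:end + 1]:
            ranks[i] = mean_rank
        pos = end + 1
    return ranks

def kruskal_wallis_h(groups):
    """Kruskal-Wallis H statistic, corrected for ties."""
    pooled = [x for group in groups for x in group]
    n = len(pooled)
    ranks = average_ranks(pooled)
    weighted, start = 0.0, 0
    for group in groups:
        rank_sum = sum(ranks[start:start + len(group)])
        weighted += rank_sum ** 2 / len(group)
        start += len(group)
    h = 12 / (n * (n + 1)) * weighted - 3 * (n + 1)
    ties = sum(c ** 3 - c for c in (pooled.count(v) for v in set(pooled)))
    correction = 1 - ties / (n ** 3 - n)
    # If every score is identical, the groups cannot differ.
    return h / correction if correction > 0 else 0.0

def bonferroni_alpha(n_levels, alpha=0.05):
    """Per-comparison alpha when every pair of levels is compared."""
    n_pairs = len(list(combinations(range(n_levels), 2)))
    return alpha / n_pairs

print(round(bonferroni_alpha(4), 4), round(bonferroni_alpha(3), 4))
```

Because percentage scores based on six items produce many tied values, the tie correction matters for data of this kind.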
Because non-parametric tests rely on ranks, the magnitude of change for both the omnibus tests and the post hoc analyses is presented as the effect size r, which is interpreted using Plonsky and Oswald’s (Reference Plonsky and Oswald2014) field-specific benchmarks: .25 (small), .40 (medium), .60 (large).
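The benchmark interpretation can be sketched as a simple lookup. Treating each benchmark as a lower bound, and labelling values under .25 as "below small", are assumptions made for illustration; the function name is hypothetical.

```python
def interpret_r(r):
    """Label an effect size r using the .25/.40/.60 benchmarks cited above."""
    magnitude = abs(r)
    if magnitude >= 0.60:
        return "large"
    if magnitude >= 0.40:
        return "medium"
    if magnitude >= 0.25:
        return "small"
    return "below small"

for value in (0.10, 0.32, 0.47, 0.63):
    print(value, interpret_r(value))
```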
Supplementary data for triangulation
Four semi-structured interviews were conducted online with the five volunteer participants, representing all three experimental conditions across three separate classes. Audio-recorded using the teleconferencing platform Zoom (https://zoom.us), each lasted approximately 10 minutes. To facilitate ease of communication, the interviews were carried out bilingually in English and Japanese by the author. The first two interviews took place immediately after the study’s pre-test, whereas the remaining two followed the immediate post-test (Figure 1; for participant assignment to condition, see Appendix 4 in the Supplementary Material). Of the five interviewees, quantitative data from Participants 3 and 4 were not included in the main analyses due to attrition. Although the remaining three were included in those analyses, it is unlikely that 10 minutes of extra reflection would have skewed the findings, particularly since (1) the interviews involved only one participant from each of the respective experimental conditions and (2) a gap of either seven or four weeks separated the interviews from the subsequent test (pre-post or post-delayed), reducing the potential for any sustained effects.
Two interview protocols were developed to explore RQ3, each aligned with the respective interval. For the interviews following the pre-test, the main objectives were to enquire about (1) reactions to the pre-test and (2) personal experiences of plural-marking acquisition. The interviews carried out after the immediate post-test set out to explore (1) the perceived effectiveness of the intervention, including whether the use of L1 was helpful for those in the contrastive conditions, and (2) the cognitive processes underlying plural marking through brief stimulated-recall prompts (for the full interview question bank, see Appendix 4 in the Supplementary Material).
Upon completion of the immediate post-test, four tailored versions of an online exit questionnaire were administered to the four groups, mirroring each condition. The aim, likewise corresponding to RQ3, was to gather further data on (1) the participants’ self-rated awareness of the target grammatical feature and (2) the perceived effects and general impressions of the online treatment. Although the number of items varied slightly across conditions, the average totalled 20. Most responses were collected using closed-ended Likert-style questions, although a final section soliciting overall impressions was open-ended, while a section enquiring about perceptions of the training components included both types of questions (see below for further details). In this way, the questionnaire produced both quantitative and qualitative data.
Supplementary data analysis
For the interviews, a process modelled on the six steps of Braun and Clarke’s (Reference Braun and Clarke2021) reflexive thematic analysis was chosen as a suitable means to interpret the data due to its systematic yet flexible approach. The audio recordings were first translated and transcribed verbatim by the author, with hesitation markers omitted for clarity, before being read multiple times to become intimately familiar with the content (Step 1). Initial codes were then manually generated, primarily through an inductive, data-driven approach (open coding), supplemented by a few codes derived deductively when specific constructs, such as the influence of the L1 on L2 learning, needed to be identified (Step 2). Patterns among the codes were then used to generate potential themes (Step 3), which were thoroughly checked against the data set in a cyclical manner (Step 4). This process was facilitated by a research assistant holding a doctoral degree in Applied Linguistics, with expertise in qualitative-based research methodology in general and thematic analysis in particular. The initial themes were cross-checked, reviewed, and refined reflexively and recursively against the coded data until a final iteration of themes for each protocol was unanimously agreed upon (Step 5). Lastly, the findings were contextualized to RQ3 for the write-up (Step 6).
Regarding the questionnaire, the closed-ended responses were analyzed descriptively through mean percentages. This decision was made on the grounds of research manageability given the present study’s large-scale nature, associated space restrictions, and single authorship. Similarly, sample responses were randomly selected due to the substantial volume of qualitative data. These responses were translated by the author and then subjected to the same iterative process of interpretation through reflexive thematic analysis (Braun & Clarke, Reference Braun and Clarke2021) in the same way as described above, including with the same research assistant.
Results
Error recognition of plural omissions
Table 2 presents the descriptive statistics for the error recognition of plural omissions and Figure 8 shows the differences in performance across Condition and Time (for raw receptive scores, see IRIS).
The Kruskal-Wallis tests indicated no significant difference between the four conditions in the pre-test [H(3) = 1.74, p = .628, r = .04]. However, significant differences were detected in both the immediate post-test [H(3) = 34.98, p < .001, r = .47] and the delayed post-test [H(3) = 18.60, p < .001, r = .32].
Between-group comparisons (Table 3) revealed no significant differences among any of the pairings in the pre-test, indicating that all four groups started off at a similar level of ability. In the immediate post-test, however, Conditions 2 and 3 (PlurC–L2 and PlurC–L1/L2) differed significantly from Conditions 1 and 4 (PlurNC–L2 and PrepNC–L2). This suggests that the contrastive variables in the treatment elicited greater learning gains, particularly for Condition 3, since it yielded a larger effect size. In the delayed post-test, Conditions 1, 2, and 3 differed significantly from Condition 4.
Note. Sum. = summary, – = non-significant, ↑ = significant increase, S/M/L = small/medium/large effect size.
The Friedman tests demonstrated that there was a significant difference between the three time intervals for Condition 1 [χ2(2) = 9.74, p = .008, r = .47], Condition 2 [χ2(2) = 16.88, p < .001, r = .63], and Condition 3 [χ2(2) = 29.38, p < .001, r = .92]. However, no significant difference was detected for Condition 4 [χ2(2) = 1.10, p = .58, r = .10].
Within-group analyses (Table 4) indicated significant differences between the pre-test and immediate post-test for Conditions 1, 2, and 3 but not for Condition 4. Although this shows that all three experimental groups improved, it was Condition 3 that displayed the greatest gains, as indicated by the largest effect size. Similarly, differences between the pre-test and delayed post-test were significant for all three experimental groups, suggesting that improvements were maintained following the treatment. The only significant difference between the immediate post-test and delayed post-test was that of Condition 3, for which performance decreased. This illustrates that, although the treatment induced the biggest impact for Condition 3, this group likewise showed the steepest post-treatment decline (Figure 8). (For additional post hoc statistics for between- and within-group comparisons, see Appendix 5 in the Supplementary Material.)
Note. Sum. = summary, – = non-significant, ↑/↓ = significant increase/decrease, S/M/L = small/medium/large effect size.
Written production of plural forms
The descriptive statistics for the written production of plural forms are set out in Table 5, while the differences in performance across Condition and Time are plotted in Figure 9. Although this set of results, like error recognition, is reported as percentages, the mean number of target plurals produced per session across all three time intervals and four conditions was five (for raw denominators of instances of produced and possible plural forms, see Appendix 6 in the Supplementary Material, and for raw productive scores, see IRIS).
The Kruskal-Wallis tests revealed no significant differences between the four conditions in either the pre-test [H(3) = .68, p = .879, r = .014] or the delayed post-test [H(3) = 6.88, p = .076, r = .16]. However, a significant difference was observed in the immediate post-test [H(3) = 17.76, p < .001, r = .31].
Between-group comparisons (Table 6) demonstrated no significant differences among condition pairings in the pre-test, indicating baseline parity from the outset. In the immediate post-test, however, Conditions 2 and 3 differed significantly from Condition 4, implying that the contrastive variables led to greater improvements in plural production compared with the control group. There were no significant differences between any of the other pairings. In the delayed post-test, none of the pairings were statistically different, suggesting that the productive gains were lost following the treatment.
Note. Sum. = summary, – = non-significant, ↑ = significant increase, M = medium effect size.
The Friedman tests revealed a statistically significant difference between the three time intervals for Condition 1 [χ²(2) = 9.26, p = .01, r = .45], Condition 2 [χ²(2) = 12.87, p = .002, r = .54], and Condition 3 [χ²(2) = 12.07, p = .002, r = .55], although not for Condition 4 [χ²(2) = .17, p = .92, r = .018].
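As an illustration of how a repeated-measures statistic of this kind is computed, the sketch below implements the tie-corrected Friedman chi-square in plain Python. It is offered under stated assumptions rather than as the study's procedure: the function names are hypothetical, the scores are invented for demonstration, and the reported effect sizes are not recomputed because their formula is not given here.

```python
def average_ranks(values):
    """Return 1-based ranks; tied values share the mean of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        mean_rank = (i + j) / 2 + 1
        for pos in range(i, j + 1):
            ranks[order[pos]] = mean_rank
        i = j + 1
    return ranks


def friedman_chi_square(rows):
    """Return (chi-square, df). Each row holds one participant's scores,
    one score per time point, and scores are ranked within each row."""
    n = len(rows)
    if n == 0:
        raise ValueError("need at least one participant")
    k = len(rows[0])
    if k < 2 or any(len(row) != k for row in rows):
        raise ValueError("every row needs the same number (>= 2) of scores")

    rank_sums = [0.0] * k
    tie_term = 0
    for row in rows:
        for col, rank in enumerate(average_ranks(row)):
            rank_sums[col] += rank
        # Count ties within this participant's row for the correction factor.
        counts = {}
        for v in row:
            counts[v] = counts.get(v, 0) + 1
        tie_term += sum(t ** 3 - t for t in counts.values())

    stat = 12.0 / (n * k * (k + 1)) * sum(r * r for r in rank_sums) - 3 * n * (k + 1)
    correction = 1 - tie_term / (n * k * (k * k - 1))
    if correction == 0:
        raise ValueError("every row is fully tied; the statistic is undefined")
    return stat / correction, k - 1


# Hypothetical [pre-test, immediate post-test, delayed post-test] percentages
# for five participants in one condition (NOT the study's data).
toy_rows = [
    [40, 70, 60],
    [35, 65, 55],
    [50, 80, 50],
    [45, 60, 65],
    [30, 75, 50],
]
chi2, df = friedman_chi_square(toy_rows)
print(f"chi-square({df}) = {chi2:.2f}")
```

As with the between-group test, the p-value comes from the chi-square distribution with k - 1 degrees of freedom; a library routine such as `scipy.stats.friedmanchisquare` would normally be preferred for real analyses.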
The final set of pairwise analyses located the within-group differences (Table 7). Significant differences between the pre-test and immediate post-test were detected for the contrastive groups (Conditions 2 and 3) but not for the non-contrastive groups (Conditions 1 and 4). Regarding statistical differences between the pre-test and delayed post-test, a retention of learning gains occurred in Conditions 1 and 2, whereas these were lost in Conditions 3 and 4. Finally, immediate and delayed post-test analyses indicated that improvements were not sustained following the treatment. (For additional post hoc statistics, see Appendix 5 in the Supplementary Material.)
Note. Sum. = summary, – = non-significant, ↑ = significant increase, S/M = small/medium effect size.
Participant orientations
Detailed findings about participant orientations from the supplementary data are available in Appendices 7 and 8 in the Supplementary Material. These materials include core excerpts from the interview transcripts with their codes and themes, as well as the full set of exit questionnaire data representing participants’ post-treatment feedback from the closed- and open-ended questions. What follows is a concise summary.
During the pre-treatment interviews, participants expressed their reactions to the pre-test, describing perceived difficulty in both the receptive and productive components, compounded by a lack of confidence and self-rated ability. In sharing their personal experiences of grammar acquisition, it also became apparent that they generally paid little attention to plural marking prior to the intervention, despite recognizing the challenges it presented to their learning. For instance, some reflections noted potential cross-linguistic effects, with Participant 3 (Condition 2) remarking: “Native English speakers probably have a feel for [plural forms], but I think for Japanese learners, it’s pretty difficult for us to notice them.” Nevertheless, none of the participants were able to offer any concrete solutions.
Post-treatment, however, the participants in the three experimental groups displayed positive attitudes towards the intervention and its online format, with perceived improvements in their performance between the pre-test and immediate post-test. Notably, when reflecting on the dual-language approach of the contrastive instruction, a change in perception towards the target feature emerged. Participant 4 (Condition 3) exemplified this shift, stating: “In Japanese, plurals aren’t important, but in English, I can now see that they’re quite important.” This heightened awareness was echoed by Participant 5 (Condition 3), who expressed how it seemed to manifest in subsequent written performance: “[Since the training] I’ve become much more attentive to using plurals when I write.” Lastly, stimulated recall indicated that the interviewees were able to select appropriate responses and justify their choices by explaining pluralization rules, particularly through countability.
The exit questionnaire responses offered further illumination. The first overarching aim of the questionnaire was to gain insight into the participants’ self-rated awareness of plurals. Respondents from all three experimental conditions generally deemed them problematic: 66.9% reported being aware of frequently making plural-omission errors before the intervention, while 72.7% believed that loanwords tend to be harder to pluralize than non-loanwords.
The second aim of the questionnaire was to explore the perceived effects and general impressions of online training. Overall, improvements were noted, and orientations were positive. For example, the majority of respondents believed that they better understood how to distinguish between countable and uncountable nouns following the treatment, particularly in Conditions 2 and 3 (86.2% and 94%, respectively). Additionally, directing attention towards L1–L2 conceptual differences, especially for those in Condition 3, appeared to be valuable, with one respondent commenting: “Learning the distinction between countable and uncountable nouns was helpful because there is no such distinction in Japanese between these [categories of] words.” The potential benefits of L1 use in L2 learning were also acknowledged. In Condition 3, 70% of respondents considered L1 pluralization practice advantageous for L2 grammar development, a sentiment echoed in the comments: “The approach of using –tachi was very helpful because it enabled me to make connections with Japanese.” Finally, highlighting morphological L1–L2 differences through translation practice also appeared to be a beneficial component of the instruction, with more than 80% in the two contrastive conditions believing that it aided subsequent performance.
Discussion
Contrastive versus non-contrastive instruction
RQ1 probed the extent to which a significant difference would exist between contrastive and non-contrastive instruction. The results largely patterned with those of Lucas and Yiakoumetti (2019) and Lucas (2020) in that the contrastive conditions led to greater plural-marking gains than the non-contrastive conditions, thereby supporting the first hypothesis.
A possible source of the learning problem is that the more habituated nominal singular forms become from repeated L1 use and experience (since plurals are not obligatory in Japanese), the more biased processing routines may become from blocking and learned attention (Ellis, 2006, 2022; Ellis & Sagarra, 2010). Considering also how the plural morpheme possesses low contingency (Ellis, 2006, 2022), the combined effect is that plural marking can become difficult to acquire despite repeated exposure to the form (DeKeyser, 2005; Shintani & Ellis, 2010). One interpretation of this first finding, then, is that the L1–L2 contrasts may have directed attention to how the same meanings can be expressed through different forms in Japanese and English, facilitating the reduction of competing cues and leading to morphological and conceptual remapping of plurals (Fernández, 2008; Henry et al., 2009; MacWhinney, 2012; McManus & Marsden, 2019b). Conceptual remapping may play a critical role when it comes to plural marking since differentiating the countability of English nouns is a notorious source of difficulty for L2 English learners whose L1 lacks such a system (Tsang, 2017), not least because plural forms are likely to require additional linguistic encoding (Kurumada & Grimm, 2019). The inclusion of a non-contrastive condition that similarly targeted plural marking (Condition 1, PlurNC–L2) suggests that EI serving to clarify form-meaning connections through L2 rule-explanation alone was not as effective as the explicit cross-linguistic contrasts, at least in the immediate post-tests.
However, performance across the two contrastive conditions (Conditions 2 and 3, PlurC–L2 and PlurC–L1/L2) in the delayed post-tests fell to the same degree of accuracy as the non-contrastive condition (Condition 1), implying that, in the long run, all types of practice were beneficial.
The architecture of instruction: EI and practice type
RQ2 sought to establish which combination of EI and practice type would produce the greatest impact on learning outcomes. Applying L2 rules to L1 (by adding the Japanese plural marker –tachi to nominal loanwords) through L1 practice activities in Condition 3 appeared to trigger a stronger response in the receptive immediate post-test (AJT) compared with the other three conditions, which supports the second hypothesis. This finding is particularly relevant because it substantiates the role of L1 practice in L2 learning, aligning with previous research (McManus & Marsden, 2017, 2018, 2019a, 2019b). In addition to the contrastive EI, the L1 practice may have further reinforced morphological and conceptual L1–L2 differences, which, despite failing to reach statistical significance, still led to around 10% greater gains in Condition 3 than in Condition 2 (Figure 8).
Conversely, the productive data for the immediate post-test (PDT) indicated no significant differences between the two contrastive conditions (Table 6), which counters support for the second hypothesis. A possible explanation for the inconsistency between the receptive and productive outcomes is that, although the novelty of the L1 pluralization may have enhanced the salience of the target feature (Ellis, 2006; Fukuta & Yamashita, 2023; Spada & Lightbown, 2008), this heightened sensitivity only pertained to recognizing associated errors in the AJT, without manifesting as a discernible difference in performance in the PDT. Note also that productive accuracy for both contrastive conditions dropped in the delayed post-test, plateauing at a level of accuracy similar to that of Condition 1.
In summary, the current study aligns with a growing corpus of research demonstrating that the effectiveness of both EI and practice type for the purpose of L2 learning is likely to hinge upon the nature of the EI and practice activities themselves along with the way they interact with the target structure and its associated processing problem (Alonso-Aparicio, 2018; Fernández, 2008; Henry et al., 2009; Malovrh & Lee, 2022), with use of the L1 seeming to play a particularly beneficial role (de la Fuente & Goldenberg, 2022; McManus & Marsden, 2017).
Other considerations in interpreting the findings
One commonality between the receptive and productive data sets is that accuracy for all three experimental groups fell to roughly the same level in the delayed post-tests. For the AJT, there was a significant difference between the experimental groups and the control group, indicating that, although performance deteriorated, the treatment nevertheless yielded a sustained improvement. On the other hand, none of the delayed post-test performances for the PDT in the experimental groups were statistically maintained, despite these groups being descriptively at least 20% more accurate than the control group (Figure 9). This may have been attributable to the instrument or perhaps to the fact that it was completed online as opposed to offline, which in some way may have affected performance (e.g., manual typing in L2, particularly given the learners’ limited experience for their age, may have required more cognitive resources than using pen and paper).
The productive findings from the PDT, in particular, suggest that more intensive instruction involving greater time and effort may be necessary before L1 form-meaning connections and processing routines can manifest as a weakening of L1-entrenched attention (Ellis, 2006, 2022; MacWhinney, 2017). Indeed, McManus and Marsden (2017) employed around 30 frequency counts per target exemplar in their reading-based treatments, a figure roughly triple that of the present study, implying that the quantity of practice may be just as crucial as the content. As previously elucidated, this may be especially true when considering the concept of countability (Jarvis & Pavlenko, 2010; Tsang, 2017), as it creates the foundation of plurality, which is known for its resistance to morphosyntactic change among native speakers of Japanese (Lucas, 2022; Shintani & Ellis, 2010). Furthermore, the participants in McManus and Marsden (2017) were at a more advanced level (around CEFR B2 compared with upper A1 in the present study), suggesting that language proficiency may also have a mediating effect.
Another factor to consider is that since both the AJT and PDT more likely tapped into explicit knowledge, automatized knowledge may not have been available (Godfroid, 2021). For this reason, the findings should be approached with caution and point towards the need for future research to measure plurals through implicit knowledge (Fukuta & Yamashita, 2023). This could be achieved, for example, through online measures that force learners to attend to the target feature in an AJT in individually timed sentences (as opposed to collectively, as in the present study), which might employ additional data sources from reaction times (since incongruent forms are thought to slow down processing; Jiang, 2007) or even from eye-tracking (where it is posited that longer ocular fixations are indicative of slower processing; Rusk et al., 2020).
Additional insights from participant orientations
RQ3 enquired about participant orientations towards plural marking and the intervention, particularly regarding the explicit use of L1 in L2 learning, as determined through interview and questionnaire data. Overall, the participants reiterated the problematic nature of plural marking across all three experimental conditions. However, they exhibited increased sensitivity to the target feature and positive attitudes towards the instructional techniques, expressing satisfaction with the learning outcomes. These findings fell in line with the third hypothesis.
Although all three experimental conditions appeared to benefit, the dual-language use in the two contrastive conditions seemed to facilitate learning in two principal ways. Firstly, the participants indicated that by developing an awareness of cross-linguistic differences through contrastive EI and practice tasks involving L1–L2 translation, not only did they come to recognize the significance of plural forms and the underlying concept of countability, but they also reported viewing the approach as a valuable learning resource.
Secondly, the additional L1 practice in Condition 3 was generally well received, suggesting that, despite commonly reported aversion associated with the use of the native tongue in L2 learning among commentators (de la Fuente & Goldenberg, 2022), it might hold a more substantial role than previously assumed for both SLA processes (McManus & Marsden, 2017) and formal learning contexts (Horst et al., 2010).
This triangulation adds weight to the emergent pattern from the statistical findings, particularly given the paucity of prior research seeking to investigate morphological and conceptual contrasts by supplementing quantitative data with qualitative insights.
Conclusion
Overall, the findings of the present study support recent developments testifying to the merits of contrastive instruction and L1 practice in L2 learning (Lucas, 2020; McManus, 2022). L1 practice may help to increase contingency from competing cues from the polysemous –s suffix, thereby assisting learners to improve their awareness of low L1 cue validities (Ellis, 2006, 2022; McManus & Marsden, 2017, 2018, 2019a, 2019b), as well as reduce the effects of blocking and learned attention from repeated use and experience of singular forms in L1 (Ellis, 2006, 2022; Ellis & Sagarra, 2010). This reiterates the function of dual-language use as part of the acquisition process (Ballinger et al., 2020; Woll & Paquet, 2021), including in online settings (Lee, 2023), when the instructional elements of EI and practice type are tailored to align with the nature of the learning problem (Fernández, 2008; McManus, 2022).
At the same time, the study equally gives rise to new questions, particularly on the basis of its limitations. It should be emphasized that the findings are merely tentative and warrant circumspection. Because any observable gains from the contrastive instruction eventually tapered off in the delayed post-tests, exposure to a greater number of target exemplar tokens (McManus & Marsden, 2017), along with exploring the effects of higher proficiency levels and online versus offline deliveries, is recommended. Moreover, while the number of target nouns and practice sentences across conditions was approximately equal, future replications should standardize them to precisely the same amount. Finally, although a non-contrastive plural-marking condition was included, it lacked L1 practice activities, making it difficult to establish whether the nature of the EI or the practice type had a greater influence on the outcomes. Therefore, future research should aim to increase and standardize opportunities for learners to operate on the target language to assess the durability of effects. In addition, further variables could be included in the non-contrastive instruction to observe their impact in relation to contrastive instruction, not only for plural marking but also for other challenging linguistic forms relevant to homogeneous learner groups.
Supplementary material
The supplementary material for this article can be found at http://doi.org/10.1017/S027226312400038X.
Funding
This project was partly funded by the Kansai University Support Fund for Newly Appointed Faculty.
Acknowledgements
I am grateful to the handling editor, Dr. Luke Plonsky, and the three anonymous reviewers for their meticulous and insightful feedback, which significantly enhanced the quality of this manuscript. I extend my appreciation to Dr. Kevin McManus for his constructive comments and encouragement when this paper was originally presented at AAAL in Portland, Oregon, in 2023. Additionally, I thank Dr. Atsushi Mizumoto for his invaluable assistance with the statistical analyses, as well as Dr. Myles Grogan for his help with the inter-rater work and qualitative analyses. The completion of the data collection owes much to the support of Peter Chu, Andy Decker, William Marcus, and Jose Porras, whose contributions were fundamental to this study. Lastly, I express my gratitude to the students whose participation made this research possible. Any remaining errors are my own.