Working memory refers to the mental system responsible for maintaining, processing, and using information in service of complex cognition (e.g., Baddeley, Reference Baddeley2017; see Cowan, Reference Cowan2017 for a taxonomy of definitions). Particularly among theoretical accounts of second language acquisition (SLA) that emphasize the role of attention (e.g., Leow, Reference Leow2015; Schmidt, Reference Schmidt1993; Skehan, Reference Skehan2018; VanPatten, Reference VanPatten2004), it is broadly assumed that the cognitive processes that support successful L2 development and performance rely heavily on the effective deployment of limited working memory resources. Kormos (Reference Kormos, Granena and Long2013), for instance, identifies a central role for working memory across all language learning processes, including input processing, noticing, knowledge integration, and automatization. In line with this view, individual differences in working memory capacity (WMC) have been shown to predict a wide array of L2 outcomes, including vocabulary and grammar development (e.g., Li, Reference Li2015; Li, Ellis, & Zhu, Reference Li, Ellis and Zhu2019; Martin & Ellis, Reference Martin and Ellis2012; Serafini & Sanz, Reference Serafini and Sanz2016; Zalbidea & Sanz, Reference Zalbidea and Sanz2020; for a review, see McCormick & Sanz, Reference McCormick, Sanz, Schwieter and Wen2022), reading and listening comprehension (e.g., Abu-Rabia, Reference Abu-Rabia2003; In’nami, Hijikata, & Koizumi, Reference In’nami, Hijikata and Koizumi2022; Sagarra, Reference Sagarra2017), and accuracy and complexity in L2 production (e.g., Vasylets & Marín, Reference Vasylets and Marín2021; Zalbidea, Reference Zalbidea2017).
Research has also documented positive, albeit variable, links between WMC and dimensions of explicit L2 learning aptitude (e.g., Robinson, Reference Robinson and Robinson2002; Roehr & Gánem-Gutiérrez, Reference Roehr and Gánem-Gutiérrez2009; Sáfár & Kormos, Reference Sáfár and Kormos2008; Yoshimura, Reference Yoshimura, Bonch-Bruevich, Crawford, Hellermann, Higgins and Nguyen2001), a construct shown to predict L2 achievement across domains (e.g., Granena, Reference Granena2014; Saito, Reference Saito2017; Saito, Suzukida, & Sun, Reference Saito, Suzukida and Sun2019; Sparks, Humbach, Patton, & Ganschow, Reference Sparks, Humbach, Patton and Ganschow2011; Sparks, Patton, & Luebbers, Reference Sparks, Patton and Luebbers2019; Yalçin & Spada, Reference Yalçin and Spada2016; Yilmaz & Granena, Reference Yilmaz and Granena2016; for a review, see Li, Reference Li, Wen, Skehan, Biedroń, Li and Sparks2019). As discussed further below, a key observation is that these prior L2 studies have measured WMC using complex span tasks, such as reading or operation span tasks, which combine a memory storage requirement (e.g., remembering letters) with an extraneous processing requirement (e.g., solving math equations).
Although decades of research have documented the theoretical and empirical importance of working memory as a construct in SLA, methodological discussions surrounding the measurement of WMC have only recently burgeoned (see Shin & Hu, Reference Shin, Hu, Schwieter and Wen2022; Wen, Reference Wen2016; Wen, Juffs, & Winke, Reference Wen, Juffs, Winke, Winke and Brunfaut2021). Such discussions have centered mostly around the validity of complex span tasks, such as debates about how manipulating different task parameters impacts their psychometric properties (e.g., Alptekin, Erçetin, & Özemir, Reference Alptekin, Erçetin and Özemir2014; In’nami et al., Reference In’nami, Hijikata and Koizumi2022; Linck, Osthus, Koeth, & Bunting, Reference Linck, Osthus, Koeth and Bunting2014; Shin, Reference Shin2020) or how certain measurement procedures may lead to confounding effects (e.g., Wen, Reference Wen2016; Wen et al., Reference Wen, Juffs, Winke, Winke and Brunfaut2021). For example, Alptekin et al. (Reference Alptekin, Erçetin and Özemir2014) showed that the demands of the processing requirement in the reading span test (judging if a sentence contains a morphosyntactic vs. semantic anomaly) modulate the relationship between L2 reading achievement and the processing component of this complex span measure. These studies have important implications for the administration and assessment of complex span tasks, particularly as they relate to the widely popular reading span test, and have aided in identifying factors that impact the criterion validity of these tasks in L2 research. Interestingly, however, there has been less consideration of how WMC is operationalized across different memory tests, and what the implications of researchers’ test choices can be for the theoretical interpretations of the links between WMC and L2 outcomes.
These issues warrant attention, as addressing them can lead to more grounded rationales for the selection of WMC tasks that combine storage and processing demands and, ultimately, can help increase the explanatory and predictive utility of WMC in SLA research.
Here, we sought to contribute toward this broader goal by bringing attention to the potential of content-embedded WMC tasks. While both complex span and content-embedded tasks implement a dual-task paradigm that requires processing and maintenance of information, they differ critically in that the former demand maintenance of extraneous (i.e., task-irrelevant) memory elements during processing (e.g., memorize letters while solving math equations unrelated to said letters), while the latter demand processing and maintenance of the same (i.e., task-relevant) memory elements (see Zamary, Rawson, & Was, Reference Zamary, Rawson and Was2019). Since cognitive accounts of SLA posit that processing information that is stored in working memory is critical for successful linguistic processing and development to take place (e.g., Leow, Reference Leow2015; Schmidt, Reference Schmidt1993; Skehan, Reference Skehan2018; VanPatten, Reference VanPatten2004), it is reasonable to expect that performance in content-embedded tasks, relative to complex span tasks, would be more predictive of certain L2 outcomes.
We begin this methods forum by describing the functional differences between the two types of WMC tasks in further detail, outlining the rationale for investigating the value of content-embedded tasks for informing SLA research and theory. Next, we report and discuss initial empirical evidence on the potential of content-embedded and complex span tasks for predicting performance in tests of explicit L2 aptitude and L2 reading comprehension, two measures for which links with WMC have been documented in the literature.
Measuring working memory capacity: Complex and content-embedded tasks
Complex span tasks comprise a type of WMC task that reflects concurrent storage and processing demands (e.g., Conway, Kane, Bunting, Hambrick, Wilhelm, & Engle, Reference Conway, Kane, Bunting, Hambrick, Wilhelm and Engle2005; Engle et al., Reference Engle, Tuholski, Laughlin and Conway1999; Kane, Bleckley, Conway, & Engle, Reference Kane, Bleckley, Conway and Engle2001). Of relevance to the present work, a key characteristic of most complex span tasks—such as reading, listening, or operation span tasks—is that they require the maintenance of information that is extraneous or independent from the information being processed (see Miyake, Reference Miyake2001). In other words, such complex span tasks require storing elements in memory that are not relevant to the processing task. For instance, recent versions of the operation span task ask participants to remember letters presented one at a time in serial order (i.e., the storage component) while they judge the accuracy of a solution for a math equation (i.e., the processing component). Reading or listening span tasks are conceptually similar in design, requiring participants to remember letters, digits, or unrelated words while judging the well-formedness or plausibility of a series of sentences. As illustrated, these complex span measures tap into “the ability to maintain information that is irrelevant to the information being processed in the working memory system” (Zamary et al., Reference Zamary, Rawson and Was2019, p. 2547).
Researchers in cognitive psychology have also employed another category of WMC measures, which we refer to here as content-embedded tasks, following Was, Rawson, Bailey, and Dunlosky (Reference Was, Rawson, Bailey and Dunlosky2011) and Zamary et al. (Reference Zamary, Rawson and Was2019). Like the aforementioned complex span tasks, content-embedded tasks are dual tasks involving processing and storage requirements (Ackerman, Beier, & Boyle, Reference Ackerman, Beier and Boyle2002; Kyllonen & Christal, Reference Kyllonen and Christal1990). However, content-embedded tasks are functionally different from complex span tasks in that they require the maintenance and processing of the same memory elements. In this sense, content-embedded tasks involve maintaining “task-relevant” information. For example, in the ABCD task, participants are asked to consider three premises about the order of the letters A, B, C, and D with the goal of creating a string in which the letters are arranged in a particular order. They read about the ordering of A and B first (e.g., “A comes after B”), followed by the ordering of C and D (e.g., “D comes before C”), and then the ordering of the two sets of letters (e.g., “Set 1 comes after Set 2”). Participants are instructed to indicate the solution afterward (in this example, DCBA). As shown, the information being maintained for later recall (e.g., DCBA) is dependent on the processed information (i.e., premises about letter and set ordering). Therefore, unlike the functional requirements of complex span tasks, content-embedded tasks ask participants to maintain information that is relevant, rather than extraneous, to processing.Footnote 1
To our knowledge, no empirical L2 studies to date have relied on content-embedded tasks to measure participants’ WMC. SLA research examining the contributions of WMC to L2 outcomes, including research on L2 aptitude and L2 reading comprehension, has prioritized the use of complex span tasks. Among them, the reading span test remains the most commonly administered measure, although some scholars have cautioned against using it as a single WMC measure given its language-reliant processing component (see, e.g., Juffs & Harrington, Reference Juffs and Harrington2011; Shin & Hu, Reference Shin, Hu, Schwieter and Wen2022; Wen, Reference Wen2016). This state of affairs is well illustrated, for instance, with a frequency analysis of the WMC tests considered in the studies in In’nami et al.’s (Reference In’nami, Hijikata and Koizumi2022) meta-analysis on the link between WMC and L2 reading achievement. As summarized in Table 1, out of the 68 empirical studies included in their sample, around 69% of them administered a reading span test. With a notable difference, the next most common complex span measure was the operation span test, administered in almost 15% of the studies.
Table 1. Distribution of WMC tests in studies included in In’nami et al. (Reference In’nami, Hijikata and Koizumi2022)

Of interest to SLA, research in cognitive psychology indicates that both complex span and content-embedded tasks are positively associated with higher-order cognitive skills, such as reading comprehension (e.g., Unsworth & Engle, Reference Unsworth and Engle2007; Was & Woltz, Reference Was and Woltz2007) or reasoning ability (e.g., Kyllonen & Christal, Reference Kyllonen and Christal1990). Two key empirical studies have directly compared the predictive value of complex span and content-embedded tasks: Was et al. (Reference Was, Rawson, Bailey and Dunlosky2011) predicted that content-embedded tasks (ABCD, digit, and alphabet tasks) would explain a greater amount of variance in L1 reading comprehension, as assessed by a series of standardized tests, relative to complex span tasks (reading, operation, and counting span tasks). They found that the ABCD and digit tasks loaded most strongly onto a content-embedded latent factor and that the reading span and operation span tasks did so onto a complex span latent factor. As expected, content-embedded tasks were more strongly linked to L1 reading comprehension scores than complex span tasks, highlighting the utility of the former for predicting cognitive activities that require processing of information maintained in memory.
In a more recent study using the same WMC measures, Zamary et al. (Reference Zamary, Rawson and Was2019) hypothesized that content-embedded tasks would also explain more variance in inductive reasoning than complex span tasks. Their prediction was driven by the fact that inductive reasoning requires finding common connections among stimulus elements through explicit thought processes, such as hypothesis formation and testing (e.g., Johnson-Laird & Khemlani, Reference Johnson-Laird, Khemlani and Ross2013; Klauer, Willmes, & Phye, Reference Klauer, Willmes and Phye2002). In line with Was et al. (Reference Was, Rawson, Bailey and Dunlosky2011), they found that the ABCD and digit tasks loaded most strongly onto a content-embedded latent factor, whereas the reading span and operation span tasks loaded most strongly onto a complex span latent factor. Their results also highlighted the advantages of content-embedded tasks relative to complex span tasks for explaining variance in inductive reasoning skills. Together, both studies provide evidence indicating that cognitive processes that require the maintenance of task-relevant information can be more strongly predicted by content-embedded tasks than complex span tasks.
Building on this argument and considering prior empirical findings in cognitive psychology, the present study begins exploring the utility of content-embedded tasks in L2 research relative to complex span tasks. As noted, it is uncontroversial to claim that working memory plays a central role in regulating and orchestrating the attentional resources that are implicated in key acquisitional processes, particularly at the beginning stages of SLA in intentional learning contexts, such as those typically found in classroom instruction. Such processes include input processing, conscious noticing of forms, the initial creation of form-meaning links, or the identification of patterns and rules, among others (e.g., Kormos, Reference Kormos, Granena and Long2013; McCormick & Sanz, Reference McCormick, Sanz, Schwieter and Wen2022; Wen, Reference Wen2016). WMC as measured by both complex span and content-embedded tasks is expected to positively relate to linguistic outcomes that are subserved by such processes. However, because manipulating information that is stored in working memory is considered particularly important for successful L2 processing and development, it is reasonable to expect content-embedded tasks to show certain predictive advantages in some cases.
To begin exploring this hypothesis, we consider the predictive value of the most popular complex span tasks (reading span and operation span tests) and established content-embedded tasks (ABCD and digit tests) in connection with L2 learners’ performance in explicit L2 aptitude and L2 reading comprehension tests. We chose to focus on explicit L2 aptitude—specifically, as it relates to language analytic and associative learning skills—and L2 reading comprehension here as a first step because their associations with WMC have been long theorized and continue to receive ample empirical interest. Regarding aptitude, both language analysis and associative learning involve cognitive abilities that require intentionally searching and working out relations and rules in datasets, and thus one’s ability to coordinate the processing and maintenance of task-specific memory elements, as gauged by content-embedded tasks, should serve an important role. Similarly, performance in content-embedded tasks should be particularly predictive of L2 reading comprehension, as readers must maintain relevant textual material in memory while simultaneously integrating it with new incoming material and their long-term prior knowledge to build a coherent mental representation. Effective storage and processing of task-specific textual elements should therefore be critical in managing key reading processes, such as making inferences or updating interpretations.
In sum, while both theory and research indicate connections between WMC and performance on explicit L2 aptitude and L2 reading comprehension tests, we explore the extent to which WMC is linked to individual variability in these criterion measures when operationalized using content-embedded and complex span tasks. In what follows, we provide some conceptual and empirical background to contextualize our results within the broader literature on WMC and its connections to explicit L2 aptitude and L2 reading comprehension.
Working memory capacity and explicit L2 aptitude
Language learning aptitude refers to a “conglomerate of different abilities that can assist in the different stages and processes of language learning” (Kormos, Reference Kormos, Granena and Long2013, p. 141). L2 aptitude has been shown to predict successful L2 learning processes and outcomes, particularly in instructed contexts (e.g., Bylund, Abrahamsson, & Hyltenstam, Reference Bylund, Abrahamsson and Hyltenstam2010; Granena, Reference Granena, Granena and Long2013; Li, Reference Li2015, Reference Li2016, Reference Li, Wen, Skehan, Biedroń, Li and Sparks2019; Kormos & Sáfár, Reference Kormos and Sáfár2008; Linck et al., Reference Linck, Hughes, Campbell, Silbert, Tare, Jackson and Doughty2013; Saito, Reference Saito2017; Saito et al., Reference Saito, Suzukida and Sun2019; Sparks et al., Reference Sparks, Humbach, Patton and Ganschow2011, Reference Sparks, Patton and Luebbers2019; Yalçin & Spada, Reference Yalçin and Spada2016; Yilmaz, Reference Yilmaz2013; Yilmaz & Granena, Reference Yilmaz and Granena2016). Explicit L2 learning aptitude specifically has been operationalized using tests such as the Modern Language Aptitude Test (MLAT; Carroll & Sapon, Reference Carroll and Sapon1959), the Pimsleur Language Aptitude Battery (PLAB; Pimsleur, Reed, & Stansfield, Reference Pimsleur, Reed and Stansfield2004), or the LLAMA tests (Meara, Reference Meara2005). As noted by Li and Zhao (Reference Li and Zhao2021), these tests are considered explicit to the extent that they “require learners to engage in conscious, effortful information processing” (p. 39).
Researchers have also been interested in examining the cognitive abilities that underlie explicit L2 aptitude (see Li, Reference Li, Li, Hiver and Papi2022). Since working memory is the cognitive system responsible for the temporary storage and manipulation of information, and thus the central site for conscious processing (e.g., Baddeley, Reference Baddeley2017; see Cowan, Reference Cowan2017 for an overview of definitions), it is reasonable to posit a strong link between an individual’s WMC and their aptitude for explicit L2 learning (e.g., Robinson, Reference Robinson and Robinson2002, Reference Robinson2005; Sawyer & Ranta, Reference Sawyer, Ranta and Robinson2001; Skehan, Reference Skehan2018; Wen, Reference Wen2016). More specifically, the conscious and effortful mental processes that underlie successful performance on explicit associative learning or inductive grammatical learning tests, where learners are asked to work out or infer relationships between elements, are assumed to depend heavily on one’s capacity for the temporary storage and manipulation of information during online mental operations.
In line with this view, positive links have been reported between WMC and explicit L2 aptitude, although these connections have often been weaker than anticipated. Yoshimura (Reference Yoshimura, Bonch-Bruevich, Crawford, Hellermann, Higgins and Nguyen2001) found that L1 and L2 reading span tests were positively related to the language analytic and word association subtests of the Language Aptitude Battery for Japanese (LABJ; Sasaki, Reference Sasaki1996). Robinson (Reference Robinson and Robinson2002) also reported a moderate positive relationship between the LABJ and L1 reading span scores. Similarly, Sáfár and Kormos (Reference Sáfár and Kormos2008) found a medium-sized positive correlation between L1 Hungarian-L2 English learners’ WMC, measured by a backward digit span testFootnote 2, and a language analysis subtest adapted from the PLAB, as well as total scores in a Hungarian L2 aptitude test. Likewise, Yalçin, Çeçen, and Erçetin (Reference Yalçin, Çeçen and Erçetin2016) reported positive weak-to-moderate correlations between L1 Turkish-L2 English learners’ L1 and L2 reading span scores (but not operation span scores) and the LLAMA-F as well as the LLAMA total score. Roehr and Gánem-Gutiérrez (Reference Roehr and Gánem-Gutiérrez2009), conversely, found no significant correlations between L1 and L2 reading span tests and MLAT scores among L1 English learners of L2 German and Spanish. Lastly, Li’s (Reference Li2016) meta-analysis reported moderate to weak positive correlations between WMC and L2 aptitude components but also called for a more nuanced investigation of the links between WMC and different aspects of L2 aptitude.
In all, findings to date have been variable, leading to interpretations that suggest a reduced role of working memory skills in relation to some L2 aptitude measures (see, e.g., Yalçin et al., Reference Yalçin, Çeçen and Erçetin2016). As noted by Li (Reference Li, Li, Hiver and Papi2022), although WMC and L2 aptitude are related to one another, the constructs appear to be differentiated. Moreover, of relevance to this study, one important methodological observation is that prior research has relied only on complex span measures (primarily the reading span test) to assess learners’ WMC.
Working memory capacity and L2 reading comprehension
Reading is a complex endeavor that draws on various sources of linguistic and nonlinguistic knowledge, as well as substantial attentional resources. Working memory is hypothesized to serve a central role in a wide range of cognitive processes involved in skilled L1 and L2 reading, such as the encoding of sentence-level information and the inference and integration of messages across the text (e.g., Daneman & Carpenter, Reference Daneman and Carpenter1980; Harrington & Sawyer, Reference Harrington and Sawyer1992; Koda, Reference Koda2005). More specifically, Li and D’Angelo (Reference Li, D’Angelo, Chen, Dronjic and Helms-Park2016) explain that working memory “serves as a buffer for recently read text, enabling the reader to make meaningful connections within and between sentences, while retrieving information from long-term memory to facilitate comprehension through the integration of background knowledge” (p. 162). Indeed, decades of research have documented that WMC is an important determinant of L1 reading comprehension (for meta-analytic reviews, see, e.g., Carretti, Borella, Cornoldi, & De Beni, Reference Carretti, Borella, Cornoldi and De Beni2009; Peng et al., Reference Peng, Barnes, Wang, Wang, Li, Swanson and Tao2018).
Working memory is also expected to be strongly implicated in L2 reading comprehension, even more so than in the L1. Since L2 knowledge tends to be more limited and less automated, the type of processing involved in L2 reading is typically more effortful and places heavier demands on learners’ limited working memory resources (e.g., Alptekin & Erçetin, Reference Alptekin and Erçetin2009; Rai, Loschky, & Harris, Reference Rai, Loschky and Harris2015). While extensive research has investigated the role of WMC in L2 reading (e.g., Leeser, Reference Leeser2007; Sagarra, Reference Sagarra2017), findings to date have also been far from consistent, as highlighted in In’nami et al.’s (Reference In’nami, Hijikata and Koizumi2022) most recent meta-analysis on the relationship between WMC and L2 reading comprehension. Their results revealed a small synthetic correlation (r = .30) between the two constructs, which the authors interpreted as indicative of a “weak relationship” (p. 16). Additionally, they found that correlations were larger when WMC measures were administered in the L2 rather than the L1, which can be explained by the fact that L2 WMC tests, unlike L1 or language-neutral ones, “are subject to learners’ L2 proficiency” (Li, Reference Li2023, p. 668). Conversely, correlations were not moderated by factors such as memory task modality (verbal, nonverbal, combined), L2 proficiency, or L2 reading comprehension test standardization. These observations largely align with and expand prior meta-analytic findings (e.g., Jeon & Yamashita, Reference Jeon and Yamashita2014; Linck et al., Reference Linck, Osthus, Koeth and Bunting2014; Shin, Reference Shin2020). Similar to what was observed for explicit L2 aptitude research, it is notable that the reading span test remains the most commonly used test in this line of inquiry, while content-embedded tasks have remained unexplored.
Building on earlier research, the present study begins to examine the utility of complex span and content-embedded tasks for predicting measures of explicit L2 aptitude and L2 reading comprehension among L1 English learners of L2 Spanish. We expected complex span and content-embedded tasks to show similarities since both types of tasks tap into the maintenance and processing of information. Nonetheless, considering that content-embedded tasks are designed to measure the ability to maintain and process task-relevant information—which is deemed critical for successful performance in explicit L2 aptitude and L2 reading comprehension tests—we hypothesized that they would show certain advantages over complex span tasks.
Before describing the study methodology, it is relevant to acknowledge that, while the present study treats explicit L2 aptitude and L2 reading comprehension as criterion variables to assess the predictive utility of complex span and content-embedded tasks, the theoretical connections between WMC and these constructs continue to be a matter of discussion, particularly the directionality and nature of the working memory-aptitude link (see, e.g., Kormos, Reference Kormos, Granena and Long2013; Jackson, Reference Jackson2020; Robinson, Reference Robinson2005). Highlighting that both play important roles in conscious L2 learning, Li (Reference Li, Li, Hiver and Papi2022) notes that WMC and explicit L2 aptitude “are significantly correlated with, and yet dissociable from” each other (p. 39), as suggested by the meta-analyzed findings in Li (Reference Li2016). We recognize, in line with recent proposals and as brought up by an anonymous reviewer, that the interconnections between WMC and L2 aptitude, among other cognitive variables, can be conceptualized as dynamic and reciprocal (see Jackson, Reference Jackson2020; Serafini, Reference Serafini, Kersten and Winsler2022).
That said, we wish to emphasize that our research was not aimed at contributing to theoretical debates regarding the working memory-aptitude link, but rather sought to provide an initial testing ground for advancing methodological discussions surrounding the functionality of different types of WMC tests. Since we did not seek to maximize explained variance in the study’s L2 outcomes, but rather to consider how the two types of WMC tasks behaved, the complex span and content-embedded tasks were treated as the sole cognitive predictors of both L2 aptitude and L2 reading comprehension. Investigating the connection between L2 aptitude and L2 reading comprehension was outside the scope of this work.
Method
Participants
The final sample included 42 undergraduate L1 English learners of L2 Spanish from two public universities in the United States.Footnote 3 Participants, whose mean age was 19.79 years (SD = 2.16; 32 female, 10 male), were enrolled in Spanish language courses and reported studying various majors (e.g., political science, biology, Spanish, social work, neuroscience). They had begun learning Spanish as a foreign language at around 12.69 years of age (SD = 3.95) and indicated a mean total of 5.88 years (SD = 3.50) of formal Spanish instruction, combining classroom experience at university (M = 1.45, SD = 1.59), high school (M = 2.54, SD = 1.34), middle school (M = 1.32, SD = 1.16), and elementary school (M = .82, SD = 1.68). Eight participants reported having some working knowledge of an additional foreign language (e.g., German, Japanese). Four participants indicated having studied abroad in a Spanish-speaking country in programs ranging from 2 weeks to 4.5 months in duration.
Materials
As part of a larger project, the study comprised two data collection sessions, each lasting approximately 1.5 hours. All data were collected individually in a research lab and participants received a $50 Amazon gift card.
Complex Span Tasks
We employed the short versions of the reading span (RSpan) and operation span (OSpan) tasks developed and validated by Oswald, McAbee, Redick, and Hambrick (Reference Oswald, McAbee, Redick and Hambrick2015). Both tasks are available at https://englelab.gatech.edu/taskdownloads and were administered in participants’ L1 (English) using E-Prime 2.0 (Psychology Software Tools Inc., Pittsburgh, PA). We opted for the abridged automated versions of both complex span tasks because their shorter administration time better aligned with the administration time of the content-embedded tasks of this study. Since limited testing time is a common logistic concern for SLA researchers, and our aim here was to compare the predictive utility of each task type, we found it pertinent to ensure that test administration time remained as similar as possible.
Reading span task
In the RSpan task, participants read a set of sentences of approximately 10-15 words in length and are instructed to judge whether or not each sentence is semantically sensible. After each sentence, participants are shown a letter to be recalled at the end of the set. Set sizes range from four to six sentences, and there are two administrations for each set size. The time limit for each processing-storage trial is automatically set to be equivalent to 2.5 standard deviations above the mean time for the processing-only answers that participants provide during the practice trials.
Operation span task
In the automated OSpan task, participants read a set of arithmetic operations and are instructed to judge whether a proposed solution to each math equation is true or false. Following each operation, participants are shown a letter to be recalled at the end of the set. Set sizes range from four to six operations, and there are two administrations for each set size, as in the RSpan. The time limit for each processing-storage trial is also set to be equivalent to 2.5 standard deviations above the mean time for the processing-only answers that participants provide during the practice trials.
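For concreteness, the response-deadline rule shared by both automated span tasks (mean practice processing time plus 2.5 standard deviations) can be sketched as follows. The function name and the practice times are our own illustration, not part of Oswald et al.'s implementation.

```python
import statistics

def trial_time_limit(practice_times_ms, sd_multiplier=2.5):
    # Deadline for each processing-storage trial: the mean of the
    # processing-only response times from the practice trials,
    # plus 2.5 sample standard deviations.
    mean = statistics.mean(practice_times_ms)
    sd = statistics.stdev(practice_times_ms)
    return mean + sd_multiplier * sd

# Hypothetical practice response times in milliseconds
limit = trial_time_limit([1800, 2100, 1950, 2300, 2050])
```

Responses slower than this per-participant limit are counted as processing errors, which keeps the deadline calibrated to each individual's baseline speed.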
Content-embedded tasks
The ABCD and digit tasks were adapted from Zamary et al. (Reference Zamary, Rawson and Was2019) (see also Was et al., Reference Was, Rawson, Bailey and Dunlosky2011) and also administered in participants’ L1 (English) using E-Prime 2.0. Variants of both content-embedded tasks have been employed in prior research (e.g., Ackerman et al., Reference Ackerman, Beier and Boyle2002; Kyllonen & Christal, Reference Kyllonen and Christal1990; Shipstead, Harrison, & Engle, Reference Shipstead, Harrison and Engle2016; Was & Woltz, Reference Was and Woltz2007). The E-Run files for the ABCD and digit tasks can be accessed in the IRIS repository.
ABCD task
The ABCD task asks participants to attend to three premises to establish the order of the letters A, B, C, and D. Participants are first provided with information about the ordering of the letters A and B (e.g., “B comes after A”). Next, participants click a button to replace the first premise with one about the ordering of the letters C and D (e.g., “D comes before C”). This process is repeated once more, replacing the second premise with one about the ordering of the two letter sets (e.g., “Set 2 comes before Set 1”). After clicking a button, participants are shown the eight possible orderings of the four letters and are instructed to select the correct answer (see Figure 1). This cycle is repeated for each of the 23 trials. All trials are self-paced, although participants are asked to respond as quickly as possible. The task includes three practice trials.

Figure 1. Sample response screen from the ABCD task.
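The premise-integration logic of the ABCD task can be sketched in a few lines. The function names and the boolean encoding of the premises below are our own illustration, not part of the original E-Prime implementation:

```python
from itertools import product

def candidate_orders():
    """The eight orderings offered on the response screen: each letter pair in
    either internal order, and the two sets in either order."""
    orders = []
    for set1, set2, set1_first in product([("A", "B"), ("B", "A")],
                                          [("C", "D"), ("D", "C")],
                                          [True, False]):
        pair = set1 + set2 if set1_first else set2 + set1
        orders.append("".join(pair))
    return orders

def solve(a_before_b: bool, c_before_d: bool, set1_first: bool) -> str:
    """Apply the three premises to recover the unique letter order."""
    set1 = ("A", "B") if a_before_b else ("B", "A")
    set2 = ("C", "D") if c_before_d else ("D", "C")
    pair = set1 + set2 if set1_first else set2 + set1
    return "".join(pair)

# Premises from the example: "B comes after A", "D comes before C",
# "Set 2 comes before Set 1"
print(solve(a_before_b=True, c_before_d=False, set1_first=False))  # DCAB
```

Note that the three premises jointly determine exactly one of the eight candidate orders, which is what makes a single correct response possible on each trial.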
Digit task
In the digit task, participants are shown six single-digit numbers one at a time for 2 seconds each. Afterward, participants respond to one or two questions about the number string (e.g., “How many even numbers are there?,” “What is the sum of the first two numbers?”). Participants are told to hold the numbers in memory to answer the questions accurately. All responses are numeric, and participants indicate their answers using the keyboard (see Figure 2). The response slide is untimed, but participants are instructed to respond as quickly and accurately as they can. The task includes a block of 12 single-question trials followed by an additional block of 12 double-question trials. Participants first complete four practice trials that include single questions similar to those in the critical items.

Figure 2. Sample response screen from the digit task.
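The two example question types can be illustrated with a short sketch; the function names and the sample digit string are hypothetical, not drawn from the task materials:

```python
def count_evens(digits):
    """'How many even numbers are there?'"""
    return sum(1 for d in digits if d % 2 == 0)

def sum_first_two(digits):
    """'What is the sum of the first two numbers?'"""
    return digits[0] + digits[1]

digits = [3, 8, 5, 2, 9, 4]  # a hypothetical six-digit string
print(count_evens(digits), sum_first_two(digits))  # 3 11
```

Because the digits are no longer on screen when the questions appear, answering correctly requires holding the full string in memory while performing the requested computation, which is the content-embedded storage-plus-processing demand.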
Explicit L2 aptitude test
Explicit L2 aptitude measures included two tests of language analytic ability and two tests of associative learning skills. The first measure of language analytic ability was Part 4 (Language Analysis) of the PLAB (Pimsleur et al., Reference Pimsleur, Reed and Stansfield2004), which gauges participants’ ability to reason logically about a foreign language by asking them to figure out an artificial language grammar. The test includes directions with sample items and 15 critical items in which participants have to select the correct translation in the artificial language (out of four possible options) for each English phrase or sentence provided.Footnote 4 The second measure was subtest F of the computerized LLAMA (v. 2) aptitude battery (LLAMA-F; Meara, Reference Meara2005), which also tests participants’ ability to infer the grammatical system of an unknown language. Participants see a set of 20 pictures and accompanying sentences as they try to deduce the rules that govern the language. A 5-minute study phase is followed by a self-paced test phase.
For associative learning, the first measure was Part V (Paired Associates) of the MLAT (MLAT-5; Carroll & Sapon, Reference Carroll and Sapon1959), which tests rote learning ability. Participants are instructed to learn 24 English-Kurdish word pairs in 2 minutes. Afterward, there is a 2-minute practice phase where they are given the chance to write down the English equivalents, with the option to look at the study page if needed. Lastly, they are given 4 minutes to complete a multiple-choice test where they have to select the English equivalent of each Kurdish word (out of four possible options). The second measure was subtest B of the LLAMA (v. 2) aptitude battery. Participants have to learn as many associations as possible from a set of 20 novel words from a Central American language and their corresponding images. They have a 2-minute study phase, followed by a self-paced test phase.
We selected these four tests because they operationalize language analytic ability and associative learning skills in a highly comparable manner, and thus anticipated that they would effectively capture common aspects of explicit L2 aptitude. Both the PLAB-4 and the LLAMA-F require participants to infer and apply grammatical rules, such as rules about word-internal morpheme order, in an unknown artificial language. The LLAMA-B and MLAT-5 also gauge participants’ ability to learn novel words similarly, by asking them to form and memorize connections with L1 translations or pictures.
L2 reading comprehension test
The L2 reading comprehension test comprised two subtests at the levels A1 and A2 from the Common European Framework of Reference for Languages, adapted from the standardized Diploma de Español como Lengua Extranjera by the Instituto Cervantes (see Appendix S1 online for materials). Both subtests followed the same format, instructing participants to read a short text and then answer five multiple-choice questions, for a total of 10 items. Since our participants were recruited from various curricular levels, they were told to complete the reading comprehension test at their own pace, taking as much time as needed within reasonable limits. Time on task was recorded and then employed as a covariate in the statistical analyses, as we report subsequently.
To ensure the test’s suitability for the US-based Spanish foreign language context, we sought the expertise of four native speakers representing various Latin-American Spanish varieties (Mexico, Puerto Rico, Colombia, Ecuador), all of whom had experience teaching Spanish in the US at the university level. They were tasked with reviewing the texts and identifying any linguistic elements that they found challenging to understand or were not commonly used in their respective varieties. Additionally, we requested feedback from the basic Spanish language program director of a large public US university to also ascertain the test’s appropriateness for students at lower curricular levels. Based on their input, we made any necessary changes to ensure that the texts and questions were in line with the linguistic expectations of US-based L2 Spanish learners. For instance, we replaced the word “piso” with “apartamento,” since the former is primarily used in Spain to refer to an apartment and may not be as familiar to learners in the US.
Scoring and reliability analyses
Complex span and content-embedded tasks
Descriptive statistics for all WMC measures are summarized in Table 2. Performance in the complex span tasks was represented as the total score, following Conway et al. (Reference Conway, Kane, Bunting, Hambrick, Wilhelm and Engle2005), which reflects the number of correctly recalled elements across all trials regardless of their order (see Oswald et al., Reference Oswald, McAbee, Redick and Hambrick2015).Footnote 5 All participants scored above the 50% accuracy threshold in the processing portion of both tasks, as advised by Richmond et al. (Reference Richmond, Burnett, Morrison and Ball2022).Footnote 6 The mean processing accuracy score was 28.20 (SD = 1.29; Min-Max = 25-30) and 28.00 (SD = 1.77, Min-Max = 23-30) for the RSpan and OSpan tasks, respectively. Cronbach’s alpha was .61 and .73 for the RSpan and OSpan tasks, respectively.
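For illustration, Cronbach’s alpha for a trial-level score matrix can be computed as below. The score matrix here is invented purely for demonstration; it is not the study’s data:

```python
import numpy as np

def cronbach_alpha(items: np.ndarray) -> float:
    """Cronbach's alpha for an (n_participants, n_trials) score matrix."""
    k = items.shape[1]
    item_variances = items.var(axis=0, ddof=1).sum()
    total_variance = items.sum(axis=1).var(ddof=1)
    return (k / (k - 1)) * (1 - item_variances / total_variance)

# Hypothetical trial-level recall scores (participants x trials), illustration only
scores = np.array([
    [3, 4, 4, 3],
    [5, 5, 6, 5],
    [2, 3, 2, 2],
    [4, 4, 5, 4],
    [3, 3, 4, 3],
], dtype=float)
print(round(cronbach_alpha(scores), 2))
```

Alpha approaches 1 as trials covary strongly relative to their individual variances, which is why the short RSpan (fewer, noisier set sizes) can plausibly yield lower reliability than the longer content-embedded tasks.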
Table 2. Descriptive statistics for the WMC tests

Performance in the content-embedded tasks was calculated as the number of correct responses per minute, following Zamary et al. (Reference Zamary, Rawson and Was2019).Footnote 7 For the digit task, only time spent on the response screen was considered, as the stimulus presentation time for each digit was fixed.Footnote 8 Cronbach’s alpha was .90 and .75 for the ABCD and digit tasks, respectively.Footnote 9
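The rate-based scoring can be sketched as follows; the counts and timings are hypothetical, and only response-screen time enters the denominator, as described above:

```python
def rate_score(n_correct: int, response_time_seconds: float) -> float:
    """Correct responses per minute, counting only time on the response screens
    (stimulus presentation time is fixed and therefore excluded)."""
    return n_correct / (response_time_seconds / 60.0)

# Hypothetical digit-task totals: 20 correct answers across response screens
# totaling 5 minutes
print(rate_score(20, 300.0))  # 4.0 correct responses per minute
```

This scoring rewards both accuracy and speed, so two participants with the same accuracy can receive different scores if one responds more quickly.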
A principal component analysis with oblimin rotation and Kaiser’s criterion of eigenvalue > 1.0 (Bartlett’s sphericity test, χ2 (6) = 67.07, p < .001, KMO = .70) revealed that all four WMC measures loaded on a single component (RSpan = .82; OSpan = .74; ABCD = .84; digit = .85), as expected given that all tests were designed to tap into the maintenance and processing of information. Forcing a two-component structure led to the RSpan (.75) and OSpan (.97) tests loading strongly onto one component, and the ABCD (.96) and digit (.91) tests loading strongly onto another, indicating that the two complex span tasks and the two content-embedded tasks each pattern together (see Appendix S2 online).
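The Kaiser-criterion step can be sketched minimally as follows, with a made-up correlation matrix standing in for the real data (and omitting the oblimin rotation used to obtain the loadings):

```python
import numpy as np

def kaiser_retained(corr: np.ndarray):
    """Eigenvalues of a correlation matrix (descending) and the number of
    components retained under Kaiser's eigenvalue > 1.0 criterion."""
    eigvals = np.linalg.eigvalsh(corr)[::-1]  # eigvalsh returns ascending order
    return eigvals, int((eigvals > 1.0).sum())

# Hypothetical 4x4 correlation matrix with one dominant shared component,
# standing in for the four WMC measures (illustration only)
corr = np.array([
    [1.0, 0.6, 0.5, 0.5],
    [0.6, 1.0, 0.5, 0.5],
    [0.5, 0.5, 1.0, 0.6],
    [0.5, 0.5, 0.6, 1.0],
])
eigvals, n_components = kaiser_retained(corr)
print(n_components)  # only one eigenvalue exceeds 1.0 here
```

With the hypothetical matrix above, the first eigenvalue (2.6) dominates and the rest fall below 1.0, mirroring the single-component outcome reported for the actual data.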
Explicit L2 aptitude
Descriptive statistics for the aptitude measures are displayed in Table 3. Participants received 1 point for each accurate response to the 15 and 24 multiple-choice items on the PLAB-4 and MLAT-5, respectively. The LLAMA-F and LLAMA-B were automatically scored by the program on a 0-100 scale. Cronbach’s alpha for the PLAB-4 and MLAT-5 tests was .77 and .87, respectively. Reliability statistics could not be computed for the LLAMA (v.2) tests because the software did not record item-level data.
Table 3. Descriptive statistics for the explicit L2 aptitude tests

A principal component analysis with oblimin rotation and Kaiser’s criterion of eigenvalue > 1.0 (Bartlett’s sphericity test, χ2 (6) = 37.19, p < .001, KMO = .73) revealed that all four tests loaded on a single component indicative of explicit L2 aptitude (PLAB-4 = .75; LLAMA-F = .71; MLAT-5 = .83; LLAMA-B = .73) (see Appendix S3 online).
L2 reading comprehension
Participants received 1 point for each correct response to the 10 multiple-choice questions in the reading comprehension test. The average score was 6.40 points (SD = 2.18; Min-Max = 2-10). Cronbach’s alpha for this test was .65, indicating adequate instrument reliability given the relatively small item set. Participants took an average of 8.04 minutes (SD = 3.10; Min-Max = 4.40-17.40) to complete the test.Footnote 10 While we provide the continuous mean score for this test here as part of the descriptive statistics, the dichotomous (correct/incorrect) item-level data were employed in the analyses, as we describe subsequently.
Analysis
We first computed a correlation matrix of all the standardized measures of the study to explore connections at the task level, using all available data. Next, to examine the utility of complex and content-embedded tasks for predicting explicit L2 aptitude and L2 reading comprehension, a series of mixed-effects models were computed. To limit the number of tests and coefficients in the models, we derived composite variables for Complex Span (mean of z-OSpan and z-RSpan scores) and Content-embedded (mean of z-ABCD and z-digit scores) WMC estimates. For explicit L2 aptitude, we fitted two separate linear mixed-effects models with proportion correct as the dependent variable, Complex Span and Content-embedded WMC as fixed effects, and Participant and Test as random effects, since individual accuracy data were considered across four aptitude tests. For L2 reading comprehension, we fitted two separate logit mixed-effects models with accuracy (1 = correct, 0 = incorrect) on the L2 reading comprehension test as the dependent variable, Complex Span and Content-embedded WMC as fixed effects, and Participant and Item as random effects. We also included time on task (in minutes, standardized) as a fixed-effect covariate in the baseline model. Random slopes were not retained in either set of models due to singular fits. For both dependent variables, nested models were compared using chi-squared tests to examine whether the inclusion of fixed effects improved model fit. Descriptive and inferential analyses were performed in jamovi (v.2.3.28.0), with the exception of the mixed-effects models, which were fitted and compared in R (v.4.1.1) using the lme4 (Bates, Mächler, Bolker, & Walker, Reference Bates, Mächler, Bolker and Walker2015), lmerTest (Kuznetsova, Brockhoff, & Christensen, Reference Kuznetsova, Brockhoff and Christensen2017), and jtools (Long, Reference Long2022) packages. Models were estimated using maximum likelihood, with listwise deletion of observations, and optimized with BOBYQA.
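The composite construction can be sketched as follows, with invented scores standing in for the real data (the actual analyses were run in jamovi and R):

```python
import numpy as np

def zscore(x: np.ndarray) -> np.ndarray:
    """Standardize a score vector (mean 0, SD 1, sample SD)."""
    return (x - x.mean()) / x.std(ddof=1)

def composite(scores_a: np.ndarray, scores_b: np.ndarray) -> np.ndarray:
    """Mean of two standardized score vectors, as for the Complex Span
    and Content-embedded WMC composites."""
    return (zscore(scores_a) + zscore(scores_b)) / 2

# Hypothetical RSpan and OSpan total scores for five participants
rspan = np.array([30.0, 42.0, 25.0, 38.0, 35.0])
ospan = np.array([28.0, 45.0, 22.0, 40.0, 33.0])
complex_span_wmc = composite(rspan, ospan)
print(complex_span_wmc.round(2))
```

Standardizing before averaging ensures that neither task dominates the composite simply because its raw scale has more variance.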
Results
Table 4 displays the correlation matrix of all task-level measures. Regarding the complex span tasks, only the RSpan showed significant positive correlations with multiple measures of L2 aptitude and reading comprehension. Both content-embedded tasks evidenced significant positive associations with various measures. The Complex Span WMC and Content-embedded WMC composites were also significantly correlated, r = .57, p <.001. Of all the significant correlation coefficients, only the bootstrapped 95% CIs for the coefficient between the RSpan and the MLAT-5 crossed zero (see Appendix S4 online for full table). We now turn to describing results from the mixed-effects models.
Table 4. Correlation matrix: WMC, L2 aptitude, and L2 reading comprehension tests

Note: L2RCT = L2 Reading Comprehension Test.
*p < .05, **p < .01, ***p < .001
WMC and explicit L2 aptitude
Tables 5 and 6 display the estimates for the linear mixed-effects models containing Complex Span WMC and Content-embedded WMC as fixed effects, respectively. Both WMC variables were significant positive predictors of explicit L2 aptitude. The marginal R2, representing the variance explained by fixed effects, was slightly larger for the Content-embedded WMC model (.11) than for the Complex Span WMC model (.08).
Table 5. Complex span WMC and L2 aptitude: Estimates for linear model

Note: R2 marginal = .08; R2 conditional = .58.
Table 6. Content-embedded WMC and L2 aptitude: Estimates for linear model

Note: R2 marginal = .11; R2 conditional = .57.
To probe deeper into the contributions of each WMC predictor, a series of nested models (see Table 7) were compared using chi-square tests. A model with Complex Span WMC as a fixed effect (Model 1a) provided a significantly better fit than the random-effects baseline model (Model 0), with only participant- and test-level intercepts, χ2 (1) = 8.90, p = .003, as did a model with Content-embedded WMC as a fixed effect (Model 1b), χ2 (1) = 13.32, p <.001. Additionally, a more complex model that included both WMC measures (Model 2) provided a significantly better fit than a model with only Complex Span WMC as a fixed effect (Model 1a), χ2 (1) = 6.14, p = .013, but this was not the case when compared to a model with only Content-embedded WMC as a fixed effect (Model 1b), χ2 (1) = 1.71, p = .191. That is, while adding Content-embedded WMC into a model that included Complex Span WMC as a fixed effect significantly improved model fit, the opposite was not true, suggesting that Content-embedded WMC explains variance beyond Complex Span WMC.
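The chi-square comparisons reported here are likelihood-ratio tests between nested models. A minimal sketch for a one-parameter difference follows, with hypothetical log-likelihoods chosen only to reproduce the reported Model 0 vs. Model 1a statistic:

```python
from math import erfc, sqrt

def lr_test_1df(loglik_reduced: float, loglik_full: float):
    """Likelihood-ratio test for nested models differing by one parameter.
    Returns the chi-square statistic and its p-value (df = 1)."""
    stat = 2 * (loglik_full - loglik_reduced)
    p = erfc(sqrt(stat / 2))  # chi-square survival function with df = 1
    return stat, p

# Hypothetical log-likelihoods chosen to reproduce the reported comparison
# of Model 0 vs. Model 1a (chi-square (1) = 8.90)
stat, p = lr_test_1df(loglik_reduced=-512.30, loglik_full=-507.85)
print(round(stat, 2), round(p, 3))
```

The asymmetry in the comparisons above follows directly from this logic: adding a predictor is justified only when it raises the log-likelihood enough relative to the one extra parameter spent.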
Table 7. Explicit L2 aptitude: Summary of models

WMC and L2 reading comprehension
Tables 8 and 9 display the estimates for the logit mixed-effects models with Complex Span WMC and Content-embedded WMC as fixed effects, respectively. While only Content-embedded WMC emerged as a significant coefficient, the odds ratios associated with both WMC predictors followed the expected positive pattern of association. For each unit increase in Content-embedded WMC, the odds of answering an item correctly on the L2 reading comprehension test are multiplied by 1.63, compared to the slightly lower factor of 1.54 associated with each unit increase in Complex Span WMC. In line with the results for explicit L2 aptitude, the marginal R2 was slightly larger for the Content-embedded WMC model (.04) than for the Complex Span WMC model (.02).
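To aid interpretation of the odds ratios, a small sketch converts an odds ratio into a change in predicted accuracy, assuming a hypothetical 60% baseline probability of a correct response:

```python
def shift_probability(p_baseline: float, odds_ratio: float) -> float:
    """Apply an odds ratio to a baseline probability of a correct response."""
    odds = p_baseline / (1 - p_baseline)
    new_odds = odds * odds_ratio
    return new_odds / (1 + new_odds)

# With the reported odds ratio of 1.63 and a hypothetical 60% baseline,
# a one-unit increase in Content-embedded WMC lifts accuracy to about 71%
print(round(shift_probability(0.60, 1.63), 3))
```

Because the logistic link is nonlinear, the same odds ratio implies different absolute accuracy gains at different baselines, which is worth keeping in mind when comparing the 1.63 and 1.54 values.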
Table 8. Complex span WMC and L2 reading comprehension: Estimates for logit model

Note: R2 marginal = .02; R2 conditional = .40.
Table 9. Content-embedded WMC and L2 reading comprehension: Estimates for logit model

Note: R2 marginal = .04; R2 conditional = .40.
Nested models (see Table 10) were again compared using chi-square tests. A model with Complex Span WMC as a fixed effect (Model 1a) provided a descriptively better fit than the baseline model (Model 0), with only participant- and item-level random intercepts and time on task as a covariate, but this difference was not statistically significant, χ2 (1) = 3.28, p = .070. In contrast, a model with Content-embedded WMC as a fixed effect (Model 1b) provided a significantly better fit than the baseline, χ2 (1) = 4.79, p = .029. A model with both WMC predictors (Model 2) did not provide a significantly better fit than one with either only Complex Span WMC (Model 1a), χ2 (1) = 2.16, p = .141, or Content-embedded WMC as a fixed effect (Model 1b), χ2 (1) = .64, p = .422. In sum, unlike for explicit L2 aptitude, only a model with Content-embedded WMC as a fixed effect provided a significantly better fit to the data than the baseline model.
Table 10. L2 reading comprehension: Summary of models

Supplemental analyses
In line with Zamary et al.’s (Reference Zamary, Rawson and Was2019) and Was et al.’s (Reference Was, Rawson, Bailey and Dunlosky2011) reasoning, we interpret these results in connection with the functional differences previously described between content-embedded and complex span tasks. However, following Zamary et al. (Reference Zamary, Rawson and Was2019), we decided to conduct a series of supplemental analyses to discount alternative, artifact-related explanations for our results.
Supplemental model set 1
First, even though complex span tasks are typically scored using total accuracy on the storage component of the task (see Conway et al., Reference Conway, Kane, Bunting, Hambrick, Wilhelm and Engle2005), individuals’ performance on the processing component (i.e., judging sentences or solving math equations) can also vary nontrivially (see Richmond et al., Reference Richmond, Burnett, Morrison and Ball2022). It is possible that scoring complex span tasks by considering performance on both the storage and processing components increases their predictive value in relation to explicit L2 aptitude. To test this, we ran a series of parallel models with a composite Complex Span WMC variable derived from the mean of the standardized total span and processing accuracy scores in the RSpan and OSpan tasks (see Appendix S5 online). To summarize, the models produced the same qualitative pattern of results as the main analyses:
Regarding explicit L2 aptitude, a model with the new Complex Span WMC composite as a fixed effect provided a significantly better fit than the baseline model, χ2 (1) = 9.62, p = .002. A fuller model that included both WMC predictors also provided a significantly better fit than a model with only the new Complex Span WMC composite as a fixed effect, χ2 (1) = 5.28, p = .021, but this was not the case when compared to a model with only Content-embedded WMC as a fixed effect, χ2 (1) = 1.56, p = .211.
Regarding L2 reading comprehension, a model with Complex Span WMC as a fixed effect did not provide a significantly better fit than the baseline model, χ2 (1) = 2.86, p = .091. Additionally, a model with both cognitive predictors did not provide a significantly better fit than a model with either Complex Span WMC, χ2 (1) = 2.14, p = .143, or with Content-embedded WMC as a fixed effect, χ2 (1) = .21, p = .645.
Supplemental model set 2
Lastly, an additional set of alternative models was computed without considering OSpan performance, that is, with only the RSpan storage and processing accuracy scores included in the composite (see Appendix S6 online). While this supplemental analysis entailed excluding part of the data, we found it to be pertinent given that the RSpan remains the most widely employed WMC measure in SLA research, with usage frequencies far exceeding those of the OSpan.
For explicit L2 aptitude, the additional models also yielded the same pattern of results as our main analyses: a model with the RSpan estimate as a fixed effect provided a significantly better fit than the baseline model, χ2 (1) = 9.30, p = .002. A fuller model that included both cognitive predictors also provided a significantly better fit than a model with only RSpan as a fixed effect, χ2 (1) = 5.37, p = .021, but this was not the case when compared to a model with only Content-embedded WMC as a fixed effect, χ2 (1) = 1.34, p = .246.
For L2 reading comprehension, results departed slightly from the main models. In this case, a model with RSpan as a fixed effect did provide a significantly better fit than the baseline model, χ2 (1) = 4.17, p = .041. However, a model with both WMC predictors did not provide a significantly better fit than a model with either only RSpan as a fixed effect, χ2 (1) = 1.44, p = .230, or one with only Content-embedded WMC as a fixed effect, χ2 (1) = .82, p = .365.
Discussion
The main goal of this methods forum was to begin exploring the potential value of content-embedded tasks for advancing SLA research. To this end, we first discussed their functional differences relative to complex span tasks and then made a case to consider them in light of core theoretical assumptions in cognitive accounts of L2 processing and development. Next, we reported preliminary empirical evidence suggesting that both content-embedded tasks and complex span tasks can predict measures of explicit L2 aptitude and L2 reading comprehension, but that content-embedded tasks can show stronger links relative to complex span tasks in some cases. Specifically, we found that explicit L2 aptitude was significantly predicted by both task types, but also that the inclusion of content-embedded WMC improved model fit when complex span WMC was already considered. This suggests that content-embedded WMC tasks provide explanatory power beyond that offered by complex span WMC tasks with regard to explicit L2 aptitude. Regarding L2 reading comprehension, results were less definitive: comprehension was significantly predicted only by content-embedded WMC, and considering complex span WMC along with content-embedded WMC did not improve fit over a single-predictor model. While this can still suggest a stronger link between L2 reading comprehension and content-embedded WMC relative to complex span WMC, results should be considered more tentatively, as we further discuss later on.
Overall, this set of findings may be interpreted in connection with the functional requirements of each task type. It can be argued that, relative to complex span tasks, content-embedded tasks can offer “a more direct measure of an individual’s ability to maintain information in [working memory] that is relevant to the cognitive process being performed” (Was et al., Reference Was, Rawson, Bailey and Dunlosky2011, p. 914). Whereas complex span tasks require storing information (e.g., letters) active in working memory while completing an unrelated processing component (e.g., solving math equations), content-embedded tasks require continuous manipulation of task-relevant information that is maintained in memory. In this regard, it is possible to argue that the demands for successful performance in the ABCD and digit tasks more closely approximate the nature of simultaneous processing and storage required in common explicit L2 aptitude tests, as well as in other complex cognitive tasks that are ubiquitous in intentional learning conditions. Under such conditions, learners are typically prompted to consciously infer or figure out links between various linguistic elements, and such processes can be assumed to rely heavily on their capacity for the temporary storage and manipulation of task-relevant data (see Granena, Reference Granena, Granena and Long2013, Reference Granena2014; Kormos, Reference Kormos, Granena and Long2013; Wen, Reference Wen2016).
Study outcomes extend previous research investigating connections between learner individual differences in WMC and explicit L2 aptitude (e.g., Sáfár & Kormos, Reference Sáfár and Kormos2008; Roehr & Gánem-Gutiérrez, Reference Roehr and Gánem-Gutiérrez2009; Yalçin et al., Reference Yalçin, Çeçen and Erçetin2016) as well as L2 reading comprehension (e.g., Jeon & Yamashita, Reference Jeon and Yamashita2014; Leeser, Reference Leeser2007; Sagarra, Reference Sagarra2017), with methodological implications for future work beyond these lines of inquiry. Findings highlight that complex span tasks that tap into “the ability to maintain information that is irrelevant to the information being processed in the working memory system” (Zamary et al., Reference Zamary, Rawson and Was2019, p. 2547), which have been prioritized in L2 research to date, may be less predictive of explicit L2 aptitude outcomes than content-embedded tasks, and possibly of certain L2 reading comprehension measures. These results also add to the evidence supporting the benefits of content-embedded tasks over complex span tasks for predicting several higher-level cognitive functions, including L1 reading comprehension (Was et al., Reference Was, Rawson, Bailey and Dunlosky2011) and inductive reasoning (Zamary et al., Reference Zamary, Rawson and Was2019).
Following Was et al.’s proposal (2011), the more robust connection observed between the content-embedded WMC composite and explicit L2 aptitude—as well as L2 reading comprehension, albeit to a lesser degree—may also suggest that this variable is reflective of participants’ general cognitive capacity to a greater extent than the complex span WMC composite. The RSpan and the OSpan have highly comparable designs, with both tasks similarly structured to tap into individuals’ ability to store memory elements during interference or distraction. In contrast, the storage and processing demands of the ABCD and digit tasks are coordinated slightly differently across modalities. While both of these tasks require participants to encode, maintain, and manipulate information in their working memory, there are differences in the way verbal or numerical information is presented and how the processing demand is intertwined across trials in each task. In this respect, it is possible that content-embedded tasks can better complement each other methodologically, while still tapping into the same construct, potentially capturing a wider breadth of working memory performance.
From a methodological standpoint, the insights provided by the supplemental analyses, which aimed to rule out alternative artifact-related reasons to explain results, also warrant some discussion. Findings indicate that the stronger link between the criterion measures and content-embedded WMC remains even when individual differences in both storage and processing performance are considered when scoring the complex span tasks, in line with Zamary et al. (Reference Zamary, Rawson and Was2019). This suggests that the predictive strength observed here for content-embedded tasks, particularly with regard to explicit L2 aptitude, may not be accounted for by claiming that they are a better measure of an individual’s ability to store and process information during a cognitive task. The most plausible explanation comes back to the aforementioned functional differences across task types with respect to the maintenance of task-relevant information in the interdependent storage and processing components.
For L2 reading comprehension, a slightly different picture emerged, since the supplemental models showed that both the new RSpan and the content-embedded WMC composites were significant predictors. Since the RSpan requires some basic knowledge and skills in the verbal domain, it is reasonable to expect this test to predict verbal outcome measures better than tests with a numerical processing component, such as the OSpan, due to the shared dependence on general verbal ability (see, e.g., Juffs & Harrington, Reference Juffs and Harrington2011; Shin & Hu, Reference Shin, Hu, Schwieter and Wen2022). In the present study, the close alignment between the cognitive and linguistic demands of the L2 reading comprehension test and the verbal processing component of the RSpan may have contributed to some of its predictive effects observed in the supplemental analyses. Nonetheless, any inferences from the L2 reading comprehension data should be drawn carefully because, unlike the explicit L2 aptitude data, which were elicited using four distinct tests with a considerable number of items each, the reading comprehension data were derived from a single type of test containing a small number of items.
In sum, the findings of this study have ramifications that align with our general aim of exploring the potential value of content-embedded tasks for advancing cognitively-oriented SLA research and theory. From a methodological standpoint, our results suggest that for L2 researchers interested in understanding how WMC is associated with explicit L2 aptitude, and possibly L2 reading comprehension, considering content-embedded tasks as part of their design can be worthwhile—either as a way to understand participant profiles or within an individual-differences approach. Additionally, there are practical reasons that can make the inclusion of these tasks advantageous for researchers, such as their fast and uncomplicated administration, similar to the abridged versions of the complex span tasks. From a theoretical perspective, going forward, the use of these tasks in conjunction with complex span tasks may allow researchers to delve deeper into the cognitive mechanisms underlying intentional L2 learning. This, in turn, can contribute to the development of more comprehensive models of SLA that account for the interplay between WMC and different L2 learning processes and outcomes.
Lastly, we wish to highlight that our intention is not to discourage the continued use of complex span tasks, or to diminish their theoretical relevance and robustness for SLA research. Quite the contrary, their utility is undoubtedly evidenced in decades of empirical research, which have demonstrated the superiority of these tests over traditional short-term memory span tests (i.e., tests requiring only storage of target memory elements) for explaining linguistic outcomes (see, e.g., Linck et al., Reference Linck, Hughes, Campbell, Silbert, Tare, Jackson and Doughty2013). Certainly, both complex span and content-embedded tasks tap into individual differences in the limited-capacity attentional resources that are broadly relevant for engaging in intentional L2 learning. Initial evidence from this study suggests that considering learners’ performance in content-embedded tasks can provide additional unique value for cognitively-oriented L2 research, at least in certain cases, and that the increased adoption of these tasks alongside complex span tasks can, ultimately, contribute to enhancing the explanatory and predictive scope of WMC in SLA.
Limitations and conclusions
Findings from this study should be interpreted considering its limitations. While the initial evidence presented here highlights the potential of content-embedded tasks, further research utilizing a wide range of L1 and L2 measures is necessary to support more fine-grained claims about the functional significance of these tests for advancing SLA research and theory. As previously mentioned, the current study employed one type of L2 reading comprehension assessment which included a small set of test items; thus, future work may wish to consider more comprehensive test formats that can broadly capture the diverse processes involved in L2 reading comprehension. Additionally, our sample of university students was relatively modest in size. Future research studies with larger sample sizes would allow for a more complex examination of the relationships among cognitive and linguistic variables through advanced statistical techniques, such as structural equation modeling, which can also account for measurement error. Studies considering process-level data in conjunction with outcome-level data would also be fruitful in understanding the mechanisms underlying WMC effects as measured by different task types.
To conclude, this study contributes to recent discussions on the measurement of WMC in L2 studies (e.g., Shin & Hu, Reference Shin, Hu, Schwieter and Wen2022; Wen et al., Reference Wen, Juffs, Winke, Winke and Brunfaut2021). While primarily focused on methodology, the findings also hold theoretical relevance. The demands of both complex span and content-embedded tasks are in accordance with the notion of working memory as a workspace for the processing and storage of information. However, it may be argued that content-embedded tasks better align with the core requirements associated with certain intentional L2 learning processes. One possibility is that successful L2 outcomes in conscious and effortful learning and processing conditions are not just predicted by one’s overall WMC for managing and storing information; for a range of L2 outcomes, which need to be further investigated, the emphasis may lie on the coordination of task-relevant information within a limited capacity system as an important individual difference underpinning variability.
Supplementary material
The supplementary material for this article can be found at http://doi.org/10.1017/S0272263125000154.
Data availability statement
The experiment in this article earned an Open Materials badge for transparent practices. The data are available at https://url.avanan.click/v2/r02/ and https://www.iris-database.org/details/iv6nR-HD9NQ.
Acknowledgments
Financial support for this research was provided by the College of Liberal Arts at Temple University. We would like to extend our sincere gratitude to Christopher Was and Amanda Zamary for their invaluable assistance with the content-embedded tasks. We are grateful to Nick Pandža for statistical advice, and to Philip Hamrick and Christopher Was, again, for valuable discussions about this work. We also thank Alex Alarcón, Molly Clark, Alberto Fernández del Valle, Daniel Guarín, Georgia Kikis, Josh Pongan, and Coral Zayas-Colón for their help with various aspects of this research. Finally, we thank the editors and anonymous reviewers at SSLA for their helpful feedback on earlier versions of this manuscript. Any errors are our own.
Competing interest
The authors declare none.