Introduction
Researchers in the field of second language acquisition (SLA) have long been interested in learners’ individual differences (IDs) and the complex role they play in the second language (L2) learning process. For example, Larsen-Freeman and Long’s (1991) classic text notes that “it is undeniable that important individual differences between language learners exist” (p. 153). A substantial body of L2 research on IDs has accumulated, addressing motivation (e.g., Dörnyei & Kormos, 2000), L2 anxiety (e.g., Teimouri, Goetze, & Plonsky, 2019), working memory capacity (e.g., Mackey, Adams, Stafford, & Winke, 2010), and aptitude (e.g., Li, 2016; Sparks, 2012), often focusing on how these factors might moderate L2 development (DeKeyser, 2012; Robinson, 2005). This research includes theoretical, empirical, and meta-analytic studies.
Moving from general SLA work to studies of task-based language teaching (TBLT), an important line of research has focused on how individual differences might help to explain the extent to which learners can benefit from tasks (Awwad & Tavakoli, 2019; Butler & Zeng, 2014; Kim et al., 2015; Sato & McDonough, 2020). Researchers have used a variety of methods and techniques to understand the impact of IDs on task-based interaction and learning, ranging from assessments and interviews to questionnaires and stimulated recalls, amongst others. The current paper presents a methodological review of practices used by researchers studying learner IDs in task-based language learning, with a detailed analysis of what emerged as the five most frequently investigated IDs in TBLT research to date. We pay particular attention to the instruments, coding, analyses, and reporting practices utilized by researchers in this area, with the goals of surveying the domains that have been of greatest interest to researchers, providing empirically grounded methodological guidance, and highlighting potential avenues for further investigation.
Literature review
The goal of this paper is to examine how IDs are studied within task-based research. Most task-based researchers agree that a task can be broadly defined as an activity with a communicative purpose and a non-linguistic outcome (Ellis, 2018; Long, 2015; Mackey, 2020a). Task-based approaches in the literature vary, including models that follow a pre-task, post-task sequence (Ellis, 2003), those that are based on a task cycle with an element of focus on form (Willis, 1996), and those that follow a sequence of pedagogic tasks approximating real-life target tasks (Long, 2015). Regardless of the approach, task-based researchers and practitioners are interested in how tasks facilitate the kinds of negotiation for meaning and interaction known to support successful SLA (Gass & Mackey, 2006; Mackey, 2020a). Researchers are often also interested in how manipulating specific task-related variables impacts linguistic and non-linguistic outcomes. These manipulations include increasing the cognitive complexity of the task (e.g., Robinson, 2011a), repeating the task (e.g., Bygate, 2018; Mackey, 1999), or offering planning time (e.g., Bygate & Samuda, 2005). In addition to pedagogic uses, tasks are also used as tools for eliciting oral or written L2 production in empirical SLA investigations (e.g., Housen et al., 2012; Yousefi, 2016).
Research on individual differences
A subset of the research into tasks and second language learning investigates how individual differences among learners might mediate task outcomes and processes. Following Li et al. (2022) and Ortega (2009), individual differences can be broadly categorized into four groups: cognitive (e.g., aptitude), conative (e.g., motivation), affective (e.g., anxiety), and demographic (e.g., age) differences. IDs are generally conceptualized as learner-internal factors, either fixed or changeable, that can affect the process and/or products of second language acquisition and may be mediated by the environment. IDs have been investigated within learners as well as for other interlocutors like teachers (e.g., Bryfonski, 2021) and non-teachers (Gurzynski-Weiss & Plonsky, 2017). However, a few ID variables have garnered sustained attention from second language acquisition researchers for decades: aptitude, working memory, cognitive creativity, motivation, and anxiety.
Aptitude has generally been used to mean cognitive abilities that are posited to be predictive of speed, efficiency, and success in terms of language learning. Carroll’s classic (1981) definition describes aptitude as “an individual’s initial state of readiness and capacity for learning a foreign language, and probable facility in doing so given the presence of motivation and opportunity” (p. 86). Aptitude has been a topic of research interest since at least the 1950s (Gass & Mackey, 2012; Skehan, 2015). Aptitude has been measured in a number of ways, and, as our own analysis suggests, researchers tend to believe that there is not one single aptitude factor. For example, some scholars view working memory as a subset of aptitude (e.g., Wen, 2016). Studies that have discussed or measured aptitude and tasks in some way include Yilmaz and Granena (2015), with overviews in Dörnyei and Skehan (2003), Skehan (2015), and Wen et al. (2017) raising interesting ongoing questions that should be addressed by more research in this area. Aptitude has been the topic of a great deal of interest in the general SLA literature, with theoretical, empirical, and synthetic papers, including a comprehensive and critical synthesis of the methods utilized in studies of aptitude in second language (L2) learning by Li and Zhao (2021).
Working memory capacity is another cognitive area where learners differ. Working memory involves not only storage capacity, which is what we usually think of when we hear the term “memory,” but also processing, which is what is meant by the word “working” (in other words, doing something). In an early study in this area, Mackey et al. (2010) looked at the relationship between working memory and output, concluding that individuals with greater working memory capacity produced more modified output in L2 Spanish interaction. Other studies carried out by Kim et al. (2015), Révész (2012), Sagarra (2007), Trofimovich et al. (2007), and Yilmaz and Sağdıç (2019) all point to the fact that working memory capacity is associated with learners’ development of the target language and mediated by other learner-external factors such as task complexity and feedback type. In terms of how we assess working memory, most tests originate from research in cognitive psychology, with three that are commonly used in SLA being operation span, counting span, and sentence span (for more information, see Gass et al., 2020).
Differences in learners’ levels of cognitive creativity typically involve looking at constructs like originality, elaboration, flexibility, and fluency. Early studies involving cognitive creativity and task performance were carried out by Albert and Kormos (2004, 2011), who demonstrated a relationship between creativity and performance on an L2 narrative task. McDonough et al. (2015) also showed that creativity was associated with the use of questions and coordination in a group problem-solving task, and Suzuki et al. (2022) demonstrated a close relationship between creativity and the discourse of speaking tasks. Pipes (2023) provides a helpful overview of research and practice in this area.
A commonly studied conative variable that differs by individual is motivation, which is often seen as how much active, personal involvement in L2 learning there is, as well as how long learners persevere and maintain L2 skills (e.g., Dörnyei, 2009b). One of the earliest studied individual differences in L2 research (e.g., Larsen-Freeman & Long, 1991), motivation has attracted dramatically more attention recently, with ~277,000 citations in Google Scholar for “motivation in second language acquisition” in the last 10 years, compared with ~74,000 in the 10 years prior. Dörnyei’s (2005) highly influential theory of the L2 motivational self-system upended traditional frameworks of motivation and inspired many later studies to investigate motivational thinking as part of learner psychology, concepts of self, and identity. Meta-analytic research (Al-Hoorie, 2018; Yousefi & Mahmoodi, 2022) investigating the L2 motivational self-system has tied motivation to learners’ subjective intended effort, underscoring the importance of motivation as an ID in L2 learning. More recently, Leeming and Harris (2022) have called for using Self-Determination Theory to understand the motivational benefits of tasks within a TBLT framework.
Finally, anxiety, one of the most extensively researched affective factors, has also been shown to vary amongst individual second language learners. What has often been termed “foreign language anxiety” concerns three related performance anxieties: communication apprehension, test anxiety, and fear of negative evaluation (Horwitz et al., 1986). Anxiety can be dynamic, fluctuating throughout tasks that might be associated with changes in linguistic performance (see, for example, Bashori et al., 2022; Papi & Khajavy, 2023). Early research in L2 learning posited optimal levels of anxiety (which introspective measures suggest might be related to tasks and interlocutors) where language learning could be enhanced versus negative levels, which were assumed to be associated with impending anxiety. Baralt and Gurzynski-Weiss (2011) compared learners’ state anxiety during task-based interaction in computer-mediated and face-to-face communication, finding learners’ reported state anxiety to be comparable across modalities. Current research on anxiety has explored the construct from the perspective of complex dynamic systems theory, motivating researchers to delve into the very sources that drive the dynamic nature of anxiety (Papi & Khajavy, 2023). This also encourages practitioners to design pedagogical interventions that may help learners manage anxiety more efficiently.
Syntheses in task-based L2 research
We now turn to our methodological synthesis of current practices in task-based research that has investigated learner IDs. Our general approach follows that used by earlier synthetic research (e.g., Plonsky & Kim, 2016) in that we review substantive and methodological features rather than quantitatively synthesize effect sizes. Prior TBLT meta-analyses have examined the extent to which task-based interaction facilitates the acquisition of grammatical and lexical knowledge by synthesizing effect sizes (Cobb, 2010; Keck et al., 2006; Mackey & Goo, 2007). Mackey and Goo (2007) investigated how different task and design features mediated interaction-driven learning, as well as whether the effects of task-based interaction were durable over time. Ziegler (2016) examined methodological features of task-based interaction research by investigating the context of the interaction, focusing on computer-mediated communication (CMC) versus face-to-face (FTF) interaction. She found only a small difference between CMC and FTF interaction, favoring CMC for productive measures, but she cautioned about the stability of the finding due to the lack of delayed posttests in the primary studies.
Other meta-analyses have investigated specific task-based features and variables, such as Jackson and Suethanapornkul’s (2013) examination of nine studies testing Robinson’s Cognition Hypothesis (Robinson, 2001), which resulted in a small but positive effect for accuracy but not fluency when complexity was increased along resource-directing dimensions. Sasayama et al. (2018) subsequently updated the finding that increasing task complexity by manipulating the tense needed to complete tasks (“here and now” versus “there and then”) led to greater syntactic complexity, whereas manipulating complexity by the number of elements or reasoning demands led to greater lexical complexity (also see Révész, 2009).
While these meta-analyses examined task-based L2 outcomes, other meta-analytic work has examined TBLT from a programmatic perspective. For example, a meta-analysis by Cobb (2010) built on work investigating task-based interaction (e.g., Mackey & Goo, 2007) by looking at 15 studies of learners performing oral communication tasks, finding differences on outcome measures that examined grammatical knowledge. Another programmatic meta-analysis by Bryfonski and McKay (2017) examined 52 studies of longitudinal implementation of TBLT (as defined by primary authors), finding a positive effect for task-based approaches for a variety of learning outcomes as well as positive qualitative stakeholder perceptions.
Finally, there has been methodological work, including syntheses of TBLT research focusing on substantive rather than statistical findings, and on methodological choices made by primary authors. Plonsky and Brown (2015), for example, meta-analyzed 18 meta-analyses of corrective feedback (focusing on its role as a key element in interaction-based tasks), finding that differing domain definitions led the meta-analyses to draw different conclusions. Plonsky and Kim (2016) examined the substantive and methodological features of task-based learner production research. They analyzed 85 primary studies from 2006 to 2015, concluding, interestingly, that task-based researchers showed a preference for investigations of grammar, vocabulary, accuracy, and interaction, with much less focus on pronunciation, pragmatics, and task performance work. In summary, while syntheses of TBLT research to date have reviewed prior studies with a focus on various methodological practices and findings, no studies have yet targeted the role of individual differences in task-based research, which is the goal of the current paper.
Motivation for the study
Given the ongoing interest in both individual differences as they relate to task-based language learning and teaching, and the focus on understanding methodological choices, the current study was guided by the following questions:
1) What are the demographic features of recent task-based research that investigated individual differences?
2) What kinds of individual differences have been investigated in recent task-based research?
3) How have individual differences been operationalized and measured in recent task-based research?
4) What sorts of analyses and reporting practices are most commonly seen in recent task-based research that focuses on individual differences?
Method
To answer these research questions, we carried out a substantive and methodological review, meaning that rather than synthesizing effect sizes (e.g., Cohen’s d, r) from the outcomes of quantitative studies, we systematically examined features of prior research. In doing this, we follow best practices in meta-analytic research recommended by a number of researchers (including Mackey, 2020b; Norris & Ortega, 2006; Plonsky & Oswald, 2015) and prior methodological syntheses (e.g., Plonsky & Kim, 2016; Plonsky & Oswald, 2015; Plonsky et al., 2020) in the domain of TBLT.
Inclusion and exclusion criteria
To systematically sample prior task-based research that has examined learners’ individual differences, we applied the following inclusion and exclusion criteria. The first defining characteristic of included studies was a focus on individual differences in the domain of TBLT.
We took an inclusive perspective on individual difference variables, operationalized from top-down and bottom-up perspectives. Top-down perspectives included the individual differences that commonly appear in texts on tasks and have long histories of being studied in the field (e.g., aptitude and working memory). Bottom-up perspectives included individual differences that emerged from our grounded coding of the types of individual difference variables included in TBLT studies. Any learner-internal variables that mediated the processes and/or outcomes of second language acquisition were included. Exclusion criteria ruled out studies from non-task-based perspectives, for example, studies that examined individual differences but used linguistic tests like Grammaticality Judgement Tasks (e.g., Yilmaz & Granena, 2019) without tasks being a focus. Also excluded were studies that examined TBLT from non-learner perspectives, such as studies that explored teachers’ individual differences (e.g., Bryfonski, 2021), or individual differences that were not examined in light of task-based interventions, implementations, or interactions.
We adopted a similarly broad operationalization of both individual differences and TBLT, including, for example, studies that examined TBLT from the perspective of learners’ needs, pedagogic tasks approximating target tasks (Long, 2015), task-supported language teaching (as in Ellis et al., 2020), task cycles (as in Willis, 1996), and/or pre-, during-, and post-tasks (as in Ellis, 2003, 2018). We included quantitative studies that utilized tasks to examine L2 production or outcome data (e.g., oral or written measures of Complexity, Accuracy, and Fluency [CAF] or Complexity, Accuracy, Lexis, and Fluency [CALF]; Bui & Skehan, 2018; Housen et al., 2012; Skehan, 1989), as well as qualitative studies of learners’ perceptions of TBLT and task-based interaction.
Following prior task-based methodological syntheses, we included only published peer-reviewed journal articles, meaning we excluded dissertations, theses, book chapters, conference presentations, and all types of unpublished research.
In statistical meta-analyses, methodologists typically recommend an inclusive approach to avoid publication bias. In other words, only including published studies may lead to positively skewed effect sizes due to the bias for statistically significant findings in academic publishing. However, in the meta-synthesis reported here, we aimed to systematically describe the popular areas, methods, and practices, rather than aggregate statistical effects (see, for example, a similar decision and motivation by Li and Zhao, 2021). So, while book chapters and unpublished work such as theses and doctoral dissertations offer valuable contributions to the field, journal articles tend to have greater visibility and impact in terms of readership, and so we believe they reflect the most current areas of inquiry in this domain, and unpublished, non-refereed work can be safely excluded for the purpose of this study. Finally, to limit the scope of our search to recent, accessible research, we only included studies published between 2000 and 2023, where we expected to see the most growth and interest in IDs in task-based research at the time this study was written. We had to exclude studies that were not available in English as they were not accessible to us. A full list of synthesized studies is available at iris-database.org.
Search techniques
To access the relevant body of literature, four databases were searched: Linguistics and Language Behavior Abstracts (LLBA), Google Scholar, Educational Resources Information Center (ERIC), and Web of Science. We utilized the following terms in various combinations to search these databases: “task-based language teaching,” “TBLT,” “task supported,” “task-based,” “language learning,” and “individual differences.” We then cross-checked our list against articles recently published in eleven journals that publish research related to our research questions: Applied Linguistics, Language Learning, Language Teaching Research, the Modern Language Journal, Studies in Second Language Acquisition, System, TASK Journal, TESOL Quarterly, Language Learning & Technology (LLT), the Annual Review of Applied Linguistics (ARAL), and Computer Assisted Language Instruction Consortium (CALICO). We also examined review articles relevant to our research questions (Chong & Reinders, 2020; Donate, 2022; Ehrman et al., 2003; Li & Zhao, 2021; Nikolov & Djigunović, 2006; Roberts, 2012; Robinson, 2011b; Smith & González-Lloret, 2021) and cross-checked the reference sections against the results from our database searches.
The total studies retrieved from the databases included 323 possible candidates for inclusion, with 133 studies being ultimately selected based on the inclusion and exclusion criteria discussed above. During the coding process, nine studies that were previously included via the criteria described above were found to be outside the scope of the study (e.g., because they did not use tasks as defined by any of the common standards outlined above) and were excluded. This resulted in a total sample of 133 studies included, contributing 135 unique samples. While we believe our sample paints an accurate and current picture of the domain of ID research in TBLT, of course, we do not believe or claim it is exhaustive. Other search terms, backwards-citation checks, a wider range of journals, and/or larger databases could all have uncovered additional studies. Our lack of time, space, and resources to examine literature not printed in English is also a limitation. Despite these shortcomings, given that we did manage to identify what we view as a substantial sample of included studies, spanning a range of timeframes and journals, we took the sample as sufficiently representative to proceed with the analysis, as shown in Tables 1 and 2.
Note: Table 2 only includes journals that contributed more than one unique sample. All other journals included in this study contributed only one study to the sample.
Table 1 shows that most of the included studies (88.15%) were published from 2012 to 2023, while only a few (11.85%) were published before 2012.
Coding and analysis
To synthesize the relevant characteristics of the included studies, a coding scheme was developed to extract data from the following key areas: general study characteristics (journal, year, etc.), study context characteristics (country, language, modality, etc.), study participant characteristics (L1s, TLs, learner proficiency levels, etc.), research variables under investigation (IDs, dependent variables, etc.), task and design characteristics (task types, implementations, etc.), ID instrument characteristics (methods), statistical analyses (if applicable), coding methods, and open science practices. These characteristics and coding methods are illustrated in Table 3, with the full coding scheme and data set available for download on IRIS (iris-database.org). To ensure the coding scheme would effectively capture the characteristics listed above for our area of interest, the scheme was piloted and then revised and refined before being applied to the full sample of included studies. We then conducted inter-coder reliability testing. Two coders first discussed the coding scheme together and then independently coded 10 sample studies. The results from those 10 samples were then compared to ensure similar coverage for each coded category. Given the low-inference nature of the coding scheme, the coders achieved 91% agreement after their first meeting (with disagreements in seven categories). To resolve these coding discrepancies, which were mainly in the areas of study context (foreign versus second language) and statistical tests used, the ratings from a third coder were used, and the first two coders discussed and agreed upon how to code the disputed categories going forward. A second round of inter-coder reliability testing was then conducted on the previously disputed categories, with two raters coding five additional studies from the sample. Once 100% agreement was achieved, the remainder of the studies were split between the two raters.
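The percent agreement check described above can be illustrated with a minimal Python sketch. It assumes that each coding decision is stored as a simple label and that agreement means an exact match; the function name, category names, and codes are hypothetical illustrations and are not taken from the review's data set or coding scheme.

```python
from typing import List


def percent_agreement(coder_a: List[str], coder_b: List[str]) -> float:
    """Percentage of coding decisions on which two coders chose the same code."""
    if len(coder_a) != len(coder_b):
        raise ValueError("Both coders must rate the same set of decisions.")
    if not coder_a:
        raise ValueError("At least one coding decision is required.")
    matches = sum(a == b for a, b in zip(coder_a, coder_b))
    return 100.0 * matches / len(coder_a)


# Hypothetical codes for five decisions (illustrative only, not the review's data).
categories = ["context", "setting", "modality", "statistical test", "context"]
coder_a = ["foreign", "lab", "F2F", "ANOVA", "second"]
coder_b = ["foreign", "lab", "F2F", "regression", "second"]

agreement = percent_agreement(coder_a, coder_b)

# Categories with at least one mismatch, i.e., the ones to discuss or send to a third coder.
flagged = sorted({c for c, a, b in zip(categories, coder_a, coder_b) if a != b})

print(f"{agreement:.0f}% agreement; categories to resolve: {flagged}")
```

Simple percent agreement does not correct for chance agreement; a chance-corrected index would be a different technique from the one reported here.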
* CMC = computer-mediated communication; F2F = face-to-face; SEM = structural equation modeling; TL = target language
In terms of analysis, the features listed in Table 3 that were based on categorical coding were analyzed using frequencies and percentages. For continuous data such as n sizes, treatment lengths, and number of tests conducted, we examined measures of central tendency and dispersion. For all open-ended items, we collapsed categories where possible and again analyzed them using frequencies and percentages.
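As a rough illustration of these two kinds of summaries, the sketch below tabulates frequencies and percentages for a categorical code and reports central tendency and dispersion for a continuous one. It uses only the Python standard library; the helper names and all values are hypothetical and are not drawn from the coded studies.

```python
from collections import Counter
from statistics import mean, median, stdev
from typing import Dict, List, Tuple


def frequency_table(codes: List[str]) -> Dict[str, Tuple[int, float]]:
    """Map each code to (count, percentage of all coded items)."""
    total = len(codes)
    counts = Counter(codes)
    return {code: (n, round(100 * n / total, 2)) for code, n in counts.most_common()}


def describe(values: List[float]) -> Dict[str, float]:
    """Central tendency and dispersion for a continuous feature."""
    return {
        "mean": mean(values),
        "median": median(values),
        "sd": stdev(values),  # sample standard deviation
        "min": min(values),
        "max": max(values),
    }


# Hypothetical coded values (illustrative only).
settings = ["lab", "lab", "classroom", "lab"]
sample_sizes = [6, 40, 70, 84]

table = frequency_table(settings)
summary = describe(sample_sizes)

print(table)
print(summary)
```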
Results
RQ1: The Demographic Features of the Recent Task-based Research
Demographics of the sample
The studies we analyzed included 9433 participants with an average n size per study of 70 and a range of 6 to 612 participants.
Context
As illustrated in Table 4, our analysis showed that the studies mainly focused on students learning languages in foreign language settings (89.63%), where they had relatively limited access to the target language. Also, the majority of studies were lab-based (62.22%) versus classroom-based studies (37.03%). As documented in studies of trends in applied linguistics research (e.g., Andringa & Godfroid, 2020), the majority of studies took place in university contexts (71.85%), followed by language institutes (17.78%), with a relatively small percentage of studies taking place at the secondary (9.63%) or elementary school level (7.41%). Finally, most studies in our sample were conducted in face-to-face modes (85.93%), with the sample also representing a few (k = 19) computer-mediated settings.
* Percentages do not always add up to 100 because some studies met multiple criteria
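A small sketch may help show why such percentages can exceed 100 in total: when a study can carry more than one code, each percentage is computed against the number of studies rather than the number of codes. The labels and counts below are hypothetical and only illustrate the arithmetic.

```python
from collections import Counter
from typing import Dict, List, Set


def multilabel_percentages(study_labels: List[Set[str]]) -> Dict[str, float]:
    """Percentage of studies carrying each label; a study may carry several."""
    total = len(study_labels)
    counts = Counter(label for labels in study_labels for label in labels)
    return {label: round(100 * n / total, 2) for label, n in counts.items()}


# Four hypothetical studies; the third spans two school levels.
studies = [
    {"university"},
    {"university"},
    {"secondary", "elementary"},
    {"language institute"},
]

pct = multilabel_percentages(studies)
print(pct, "sum =", sum(pct.values()))
```

Because the third study counts toward both "secondary" and "elementary", the four percentages here sum to 125 rather than 100.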
Participants
Examining the participants within the included studies, we found that the largest share of participants (43.70%) were rated as intermediate level and that the vast majority (94.07%) were non-heritage language learners, as illustrated in Table 5. Note that percentages do not add up to 100 because some studies met multiple criteria. The L1 backgrounds of the learners in this sample were varied, with 17.78% of studies examining learners from a mix of L1 backgrounds and a sizable portion of the studies (23.70%) not reporting the L1 backgrounds of the learners. This is because we took a strict coding approach to L1 background; for example, when authors described participants as “Chinese learners of English” we did not assume an L1 background of Mandarin (given that, to take just one example, there are hundreds of recognized languages in China, with Mandarin and Cantonese being the two most commonly spoken). For a clearer picture of the range of world regions represented by the included studies, we plotted the setting where the study took place in Figure 1.
In summary, in keeping with previously described trends in applied linguistics research, the majority of the studies we analyzed investigated the learning of English (85.19%) as opposed to other L2s. After English, the only other TLs investigated were Spanish (8.89%), Korean (2.22%), Mandarin (1.48%), German (1.48%), French (0.74%) and Russian (0.74%).
RQ 2: Types of Individual Differences in Recent Task-based Research
To answer Research Question 2, “What kinds of individual differences have been investigated in recent task-based research?”, we identified 30 individual differences being studied in the included studies. We examined both the independent and dependent variables (where applicable) in each included study. For the majority of studies, the independent variables were the individual differences (e.g., anxiety, aptitude, cognitive style, creativity, gender, motivation, personality, prior knowledge, proficiency, and working memory), which were examined in relation to a variety of dependent variables that were typically outcome variables. However, in some cases, IDs also emerged as dependent variables. This is especially the case in motivation research, which often examines the impact of various task manipulations on motivation as an outcome.
The most commonly examined ID was motivation, closely followed by working memory and L2 proficiency. Anxiety, aptitude, gender, prior knowledge, and learner interests were also commonly examined. These findings point to the variety of sub-areas of interest within task-based research, although some of the IDs identified, as illustrated in Table 6, represent overlapping constructs. For example, working memory is often examined as a sub-construct of aptitude. For the purposes of the study reported in this paper, we coded based on the terms as they were used by primary authors.
RQ 3: Operationalization and Measurement of Individual Differences
To answer Research Question 3, “How have individual differences been operationalized and measured in recent task-based research?”, we examined the sorts of instruments used to elicit or measure each of the IDs previously identified to gain insight into how these constructs were operationalized in task-based research. Due to space constraints, this study presents only the five most commonly examined ID variables but the full dataset is available on IRIS (iris-database.org) together with operationalizations and methods for the less commonly examined ID variables.
As noted in relation to Research Question 2 above, the most common ID investigated in the included studies was motivation (30 of 135 studies, or 22.22%). This could be an artifact of time, as motivation was one of the first individual difference variables to be investigated in L2 research (Larsen-Freeman & Long, 1991). Researchers investigating motivation mainly did so through the use of questionnaires (93.33%), as presented in Table 7. Authors adapted their questionnaires from a variety of pre-existing sources, citing instruments described in Boekaerts (2002), Clément et al. (1994), Gardner (1985), Lam and Law (2007), Martin et al. (1999), Pietri (2015), Pyun et al. (2014), Taguchi et al. (2009), and Troia et al. (2012), amongst others. Gardner’s (1985) Attitude/Motivation Test Battery and the questionnaire assessing trait-based L2 regulatory focus from Taguchi et al. (2009) were the only materials of this kind to appear in more than one study each. Several studies created questionnaires specifically tailored to the study or tasks utilized in the classroom. For example, Torres and Serafini (2016) developed a questionnaire consisting of items related to learners’ persistence with the task, interest in the activities, and satisfaction with their performance. Other methods of elicitation included journal entries (Sampson, 2012), thermometer ratings (Azkarai & Kopinska, 2020), and interviews (Ruan et al., 2015).
Six of the motivation studies examined how learners’ motivational profiles impacted their L2 production during or after task performance as measured by CALF (e.g., Han & McDonough, 2021). Ten studies examined how various task manipulations or conditions were related to learners’ motivation (e.g., Torres & Serafini, 2016). For example, five out of those ten studies examined the relationship between motivation and task complexity, five examined motivation across task types or conditions, and one examined motivation and task repetition. Some of these studies also assessed motivation in conjunction with other IDs such as anxiety, attitudes, task engagement, interest, and proficiency. Studies of how TBLT is mediated by motivation, then, clearly represent rich and interesting areas.
Working memory was the second most commonly investigated ID in task-based research (17.78% of studies, as shown in Table 8). All studies that investigated working memory utilized some form of a memory span task, which can be loosely operationalized as the longest list of items (words, digits, sounds, etc.) a participant can recall. The most commonly used were operation-span tasks (41.67%), where participants complete math problems, and reading span tasks (29.17%), where participants are asked to read sentences and remember the final word. Studies cited classic reading span tasks by Daneman and Carpenter (Reference Daneman and Carpenter1980) and the speaking-span version (Daneman & Green, Reference Daneman and Green1986). Authors also utilized reading span adaptations for other languages such as for Hungarian (Révész, Reference Révész2012) and Farsi (Shahnazari, Reference Shahnazari2013). For spatial working memory tasks, authors implemented forward Corsi block-tapping tasks (Zalbidea & Sanz, Reference Zalbidea and Sanz2020) or online spatial tasks such as Blockspan and Shapebuilder (Nielson, Reference Nielson2014), both of which ask participants to remember and reproduce flashing or multi-colored shapes in a grid. Several studies note the drawbacks of classic reading-span and listening-span tasks such as Daneman and Carpenter’s (Reference Daneman and Carpenter1980) for learners who might be asked to complete the tasks in their L2, as justification for using other types of non-language working memory tasks such as spatial memory tasks. The majority of TBLT studies involving working memory (54.17%) investigated the impact of working memory on some dimension of task performance (as measured by CAF/CALF). 
Five of the included studies investigated the relationship between working memory and corrective feedback during task-based interactions (Goo, Reference Goo2012; Kim et al., Reference Kim, Payant and Pearson2015; Lai et al., Reference Lai, Fei and Roots2008; Liao & Zhang, Reference Liao and Zhang2022; Révész, Reference Révész2012), and one investigated the production of modified output following corrective feedback (Mackey et al., Reference Mackey, Adams, Stafford and Winke2010).
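The loose definition of memory span given above (the longest list of items a participant can recall) can be made concrete with a short sketch. The scoring rule, the function name `memory_span`, and the trial data below are illustrative assumptions only; the studies reviewed here used a range of span tasks and scoring procedures that this sketch does not attempt to reproduce.

```python
def memory_span(trials):
    """Return the longest list length that was recalled correctly.

    `trials` is an iterable of (list_length, recalled_correctly) pairs.
    This all-or-nothing rule is an assumption made for illustration;
    published span tasks often use partial-credit or other scoring.
    """
    correct_lengths = [length for length, recalled in trials if recalled]
    # A participant with no correct trials receives a span of 0.
    return max(correct_lengths, default=0)


# Hypothetical participant: lists of 2 to 6 items, recall fails from length 5.
hypothetical_trials = [(2, True), (3, True), (4, True), (5, False), (6, False)]
print("Span:", memory_span(hypothetical_trials))
```

Under this simplified rule, a participant's score depends only on the single longest correct trial, which is one reason researchers may prefer scoring schemes that use information from every trial.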
The next most commonly investigated ID in task-based research was L2 proficiency (17.78%). The issue of operationalizing L2 proficiency, namely that it is often not clearly operationalized in applied linguistics research, has been discussed extensively in the literature (see, for example, Bachman and Clark’s (Reference Bachman and Clark1987) early work, as well as Malovrh and Benati’s (Reference Malovrh and Benati2018) and Park et al.’s (Reference Park, Solon, Dehghan-Chaleshtori and Ghanbar2022) more recent contributions). While it is a frequently used outcome variable in L2 research, we are conceptualizing proficiency as an ID in the current study due to its routine use as an internal mediator of task effects in TBLT research.
We found that studies in task-based research also use a variety of methods to operationalize L2 proficiency (see Table 9). The primary studies we investigated examined the extent to which L2 proficiency mediated L2 outcomes based on a variety of task-related variables such as task complexity (e.g., Awwad & Tavakoli, Reference Awwad and Tavakoli2019; Ghahdarijani, Reference Ghahdarijani2012; Kim, Reference Kim2011; Xu & Fan, Reference Xu and Fan2021), pre-task planning (e.g., Bui, Reference Bui2019) and task type (e.g., oral vs. written, Kim, Reference Kim2011; or receptive vs. productive, Zareinajad et al., Reference Zareinajad, Rezaei and Shokrpour2015). Studies that investigated L2 proficiency as an ID utilized outcome measures such as CAF (25% of the proficiency studies), listening comprehension (8.33%), interaction/discourse patterns (4.17%; Butler & Zeng, Reference Butler and Zeng2014), vocabulary development (8.33%; Kim, Reference Kim2011), how often learners noticed others’ errors (4.17%; Sato & McDonough, Reference Sato and McDonough2020), and learners’ awareness of L2 pragmalinguistic features (4.17%; Takahashi, Reference Takahashi2005). To operationalize L2 proficiency, authors utilized the instruments identified in Table 9. The most common assessment was a standardized TOEFL test (20.83%). Other frequently used assessments included enrollment status in a particular grade (Butler & Zeng, Reference Butler and Zeng2014) or class (Kim, Reference Kim2011) and C-tests (e.g., Dörnyei & Kormos, Reference Dörnyei and Kormos2000; Monteiro & Kim, Reference Monteiro and Kim2020).
The next most commonly examined ID was anxiety (11.85%, see Table 10). All of the included studies utilized questionnaires to measure anxiety. One study (Wang et al., Reference Wang, East and Li2021) also included semi-structured interviews and stimulated recalls (Gass & Mackey, Reference Gass and Mackey2016) to inform the development of a subsequent anxiety questionnaire. Each of the studies utilized or adapted its anxiety questionnaire from a different source, with sources including the Foreign Language Classroom Anxiety Scale (Horwitz et al., Reference Horwitz, Horwitz and Cope1986); Abolghasemi’s Test Anxiety Inventory (Abolghasemi et al., Reference Abolghasemi, Asadi Moghaddam, Najarian and Shokrkon1996); Brunfaut and Révész (Reference Brunfaut and Révész2015), which was adapted from the Foreign Language Listening Anxiety Scale (Elkhafaifi, Reference Elkhafaifi2005); the Second Language Writing Anxiety Inventory (Cheng, Reference Cheng2004); MacIntyre and Gardner (Reference MacIntyre and Gardner1994); a self-perceived communication competence scale (McCroskey & McCroskey, Reference McCroskey and McCroskey1988); Pyun et al. (Reference Pyun, Kim, Cho and Lee2014); Robinson (Reference Robinson2001); and Yashima (Reference Yashima2002). The Horwitz et al. (Reference Horwitz, Horwitz and Cope1986) scale was identified as the most commonly used instrument to measure anxiety in general L2 research in Teimouri et al.’s (Reference Teimouri, Goetze and Plonsky2019) meta-analysis of L2 anxiety and achievement. However, in our sub-set of task-based studies, we found a wider range of approaches being implemented.
In these studies, 37.50% utilized CAF as an outcome measure, while one study utilized listening comprehension assessments (Ghahdarijani, Reference Ghahdarijani2012), and one examined the quantity and quality of interactions (Révész, Reference Révész2011). Six of the studies examined anxiety in conjunction with other IDs such as task motivation (Mahdavirad, Reference Mahdavirad2017; Wang et al., Reference Wang, East and Li2021), attitudes (Pyun, Reference Pyun2013), and willingness to communicate (van de Guchte et al., Reference van de Guchte, van Batenburg and van Weijen2022). Researchers also examined how task complexity (56.25%) or task repetition (6.25%) was related to anxiety during task-based interventions.
Aptitude was the fifth most commonly investigated ID in task-based research (6.67%). Many studies that investigated aptitude (44.44% of them) utilized CALF as the outcome measure. The Modern Language Aptitude Test (MLAT; Carroll & Sapon, Reference Carroll and Sapon1959) was the most commonly used method of operationalizing language aptitude in these studies followed by the LLAMA aptitude tests (Kourtali & Révész, Reference Kourtali and Révész2020; Monteiro & Kim, Reference Monteiro and Kim2020) and Pimsleur’s Language Aptitude Battery (Kormos & Trebits, Reference Kormos and Trebits2012; Li et al., Reference Li, Ellis and Zhu2019). However, two other aptitude tests were also utilized by task-based researchers in our sample: the Hungarian Language Aptitude test and the Oxford Language Aptitude test (see Table 11).
Researchers investigating aptitude in TBLT did so by examining the relationship between manipulating task complexity and aptitude (44%, all but one manipulated reasoning demands), planning time (22%), or task type (oral vs. written modes, 11%; picture description vs. narrative tasks, 11%).
RQ 4: Analyses and Reporting Practices in Recent Task-Based Research
Finally, to answer Research Question 4, “What sorts of analyses and reporting practices are most commonly seen in recent task-based research that focuses on individual differences?”, we first looked at the study designs. We found that the majority of the research was quantitative (72.59%) or mixed methods (23.70%), with the rest being qualitative (2.96%), as illustrated in Table 12. Thirty-nine (28.89%) of the studies were longitudinal, and twenty-eight (20.74%) tracked changes over time using pre/post and/or immediate and/or delayed posttests, although only ten (7.41% of the sample) utilized delayed posttests. On average, the length of treatment in the longitudinal studies was 10 weeks, ranging from one to 40 weeks. More studies utilized oral tasks (67.41%) than written tasks (36.30%); however, both were well represented in the sample. Over a third of the studies utilized some form of CAF measures to examine L2 outcomes.
We next examined the most commonly implemented statistical analyses and coding practices of the included studies. More than a third (37.69%) of the quantitative studies in our sample utilized more than 10 statistical tests per study, whereas 7.69% of the included studies ran no statistical tests at all. Around half of the studies (55.38%) ran fewer than 10 statistical tests. Most studies utilized frequencies and percentages (54.81%), followed by correlations (37.04%), t-tests (28.89%), and ANOVAs (25.19%), as demonstrated in Table 13. These findings for task-based ID research differ slightly from those reported in prior methodological syntheses of task research. For example, Plonsky and Kim (Reference Plonsky and Kim2016) found that in task-based learner production studies, ANOVA was the most common test utilized by researchers.
Finally, we examined the sorts of open science practices implemented by authors of included studies. Forty-nine studies (36.30%) made their full tasks available in an appendix or an online repository (IRIS, iris-database.org, or the Task Bank, tblt.indiana.edu). Thirty-nine studies (28.89%) made other instruments (such as background questionnaires) available on IRIS. Overall, 74 of 135 studies (54.81%) did not make any tasks or instruments available. Seven studies made their full datasets available, and two studies acknowledged receiving badges for open science. These low rates might reflect the fact that open science practices have increased in recent years but were seldom practiced in the earlier period for which we collected studies (see Figure 2).
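The percentages reported throughout this section are proportions of the 135 included studies. The following is a minimal sketch of that arithmetic, offered for readers who wish to check or extend the figures; the helper name `share` and the dictionary labels are hypothetical, and the counts are those stated in the text.

```python
TOTAL_STUDIES = 135  # number of included studies, as reported in the text


def share(count, total=TOTAL_STUDIES):
    """Return `count` as a percentage of `total`, rounded to two decimals."""
    return round(100 * count / total, 2)


# Counts taken from the text; the labels are illustrative shorthand.
counts = {
    "longitudinal designs": 39,
    "delayed posttests": 10,
    "full tasks shared": 49,
    "other instruments shared": 39,
    "no tasks or instruments shared": 74,
}

percentages = {label: share(n) for label, n in counts.items()}
for label, pct in percentages.items():
    print(f"{label}: {pct:.2f}% of {TOTAL_STUDIES} studies")
```

Reporting the raw count alongside each percentage, as the text does for most figures, lets readers recompute proportions against a different denominator (for example, quantitative studies only).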
Discussion
Our research provides an overview of the range of IDs investigated in recent, peer-reviewed TBLT research along with information about how they are being investigated. We found that this domain of research is growing in popularity, with relatively few articles in this domain published in the early 2000s, up to nearly 10 per year in the 2010s and 15 per year in the 2020s. Our analysis shows that researchers are interested in a diverse array of IDs, with motivation, working memory, L2 proficiency, anxiety, and aptitude standing out as the most commonly researched. This finding aligns with interest in L2 research in general, where these IDs have robust enough empirical histories to have been the subjects of other meta-analyses. For example, there are prior meta-analyses on motivation (Al-Hoorie, Reference Al-Hoorie2018), working memory (Shin, Reference Shin2020), anxiety (Teimouri et al., Reference Teimouri, Goetze and Plonsky2019), and aptitude (Li, Reference Li2016), among others. More than 20 IDs emerged from our analysis, meaning there is ample room for more work in various domains of task-based ID research. Interestingly, ten IDs appeared in only one study each: emotional intelligence, heritage identity, interaction mindset, L1 fluency, multiple intelligences, tolerance of ambiguity, risk-taking, emotions, L2 self-system, and metacognitive strategies. This may be due to the fact that some of these IDs can be linked to or subsumed under other IDs. For example, L2 risk-taking has been tied to specific domains of personality (Brown, Reference Brown2000; Pyun et al., Reference Pyun, Kim, Cho and Lee2014). These less commonly investigated IDs point to potential avenues along which task-based ID research might progress.
Our methodological synthesis also uncovered that researchers of the most commonly investigated domains of task-based ID research tend to rely on the same methodological tools. For example, the majority of studies investigating motivation and anxiety relied on questionnaires to operationalize ID variables. This leads us to question whether less commonly implemented tools, for example, those from motivation research, such as journals and written feedback could be triangulated with the more commonly used questionnaires and whether this might lead to a more robust operationalization of the dynamic nature of L2 motivation (e.g., Dörnyei, Reference Dörnyei, Ellis and Larsen-Freeman2009a).
While Derrick (Reference Derrick2016) found that only 58% of L2 studies reported the origins of their instruments, we found for task-based research that authors noted whether they adapted from an existing instrument or developed an instrument in-house for the purposes of the study.
Echoing previous findings in task-based methodological syntheses (Plonsky & Kim, Reference Plonsky and Kim2016), we found that ID researchers also rely heavily on changes in L2 output based on the CAF/CALF framework (Housen et al., Reference Housen, Kuiken and Vedder2012; Skehan Reference Skehan1998a; Reference Skehan1998b; Reference Skehan2009) to operationalize L2 performance and development. Other methods used include assessing listening comprehension, interaction/discourse patterns, vocabulary development, how often learners noticed others’ errors, and learners’ awareness of L2 pragmalinguistic features.
In terms of the task variables investigated in these studies, our study shows that researchers were mainly interested in investigations of task complexity (27.41%), planning time (12.59%), manipulating task types (11.85%), and corrective feedback (5.93%), among other variables. This range of interests in task-based ID research seems to be representative of domains of interest in TBLT more generally, as evidenced by the recent trends in conferences (Sasayama, Reference Sasayama2019), handbooks (Samuda & Bygate, Reference Samuda and Bygate2008), encyclopedias, and edited collections (Wen et al., Reference Wen, Biedroń and Skehan2017) (as noted in a review of recent edited collections by Bryfonski, Reference Bryfonski2020).
From a methodological standpoint (our fourth research question), only 39 (28.89%) of the studies we investigated were longitudinal, in contrast to 88 (65.19%) that were cross-sectional. Historically, many IDs have been considered to be fixed, unchangeable characteristics, which may lead researchers to focus on cross-sectional study designs. However, there is also evidence suggesting that IDs like aptitude or working memory might in fact be improvable via training exercises (Bialystok & DePape, Reference Bialystok and DePape2009; Davidson et al., Reference Davidson, Kabat-Zinn, Schumacher, Rosenkranz, Muller, Santorelli, Urbanowski, Harrington, Bonus and Sheridan2003; Linck et al., Reference Linck, Osthus, Koeth and Bunting2014). Other studies have found that constructs like motivation or anxiety might be dynamic rather than static, fluctuating across contexts and over time. We are encouraged that for the included longitudinal studies, the average time frame studied was 10 weeks, or slightly less than one academic semester. Many researchers in our field have called for more long-term research (e.g., Long, Reference Long2016; Mackey & Goo, Reference Mackey, Goo and Mackey2007). Additionally, the majority of the research we investigated was concentrated in a few contexts, namely, EFL contexts with adult language learners. To move the domain of task-based ID research forward, we believe it is important to recognize the need for and value of research conducted outside the “WEIRD” (Western, Educated, Industrialized, Rich, and Democratic) contexts traditionally investigated by applied linguists, and social scientists more generally (Andringa & Godfroid, Reference Andringa and Godfroid2020; Henrich et al., Reference Henrich, Heine and Norenzayan2010), and to support such research. Because investigations have focused mainly on a single target language (English), the generalizability of findings from these studies of IDs in TBLT is limited.
In terms of our fourth research question, we found that investment in open science practices in the domain of task-based ID research is still developing. Derrick (Reference Derrick2016) reported that only 17% of authors in three journals provided instruments in an appendix or in an online repository. We found slightly more (29%) for task-based ID research. Applied linguistics has heralded a push towards open-science practices in recent years, including recognition of open data and materials through badges in major journals (e.g., Studies in Second Language Acquisition, Annual Review of Applied Linguistics), repositories for instruments and materials (IRIS, Marsden et al., Reference Marsden, Mackey, Plonsky, Mackey and Marsden2015), repositories for tasks (the Task Bank; Gurzynski-Weiss, Reference Gurzynski-Weiss2021), and registered replications and reports (Morgan-Short et al., Reference Morgan-Short, Marsden, Heil, Issa, Leow, Mikhaylova, Mikołajczak, Moreno, Slabakova and Szudarski2018). Open science practices are an important way to promote scientific equity through the sharing of knowledge, instruments, and findings in freely accessible and permanent repositories. While there is growing excitement around open access in applied linguistics research, practices such as open-access publishing (e.g., Zhu, Reference Zhu2017) or making data freely available have not yet been fully embraced by L2 researchers (and academics more broadly), and this was borne out in our findings as well.
Recommendations for Future Research
From a content perspective, the results of this methodological review demonstrate that task-based ID research is expanding beyond the most often studied constructs (motivation, working memory, proficiency). While there is always room for development of studies involving these most commonly researched IDs, we uncovered many other lesser-studied IDs that have the potential to impact TBLT research. To take one example, a few studies have investigated cognitive creativity as an ID (including, for example, Albert & Kormos, Reference Albert and Kormos2011; McDonough et al., Reference McDonough, Crawford and Mackey2015; Zabihi et al., Reference Zabihi, Rezazadeh and Ansari2013). IDs like cognitive creativity have the potential to shed light on interesting relationships in how learners approach tasks or task-based interaction, for example, by investigating how learners’ cognitive creativity interacts with their ability to find solutions to task-based problems or utilize learning strategies. However, research in this area has yet to pick up momentum. Less studied IDs, like creativity and emotions, might be profitably combined with other more commonly studied IDs like motivation (as in Pipes, Reference Pipes2023) to better understand the various ways in which learner IDs mediate outcomes during task-based interactions or interventions.
The task-based ID research that has been conducted so far has relied on a relatively small set of methodological approaches. For example, researchers investigating L2 proficiency could aim to triangulate data from multiple sources in order to present the most accurate, and most transferable, view of participants’ developmental levels. This might mean triangulating from standardized test scores in addition to enrollment status and in-house tests or assessments. The results of task-based assessments (e.g., Ellis et al., Reference Ellis, Skehan, Li, Shintani and Lambert2020; Noroozi & Taheri, Reference Noroozi and Taheri2022; Norris et al., Reference Norris, Brown, Hudson and Bonk2002) would also be useful to examine in conjunction with other standardized proficiency tests as they are often more representative and better aligned to the kinds of tasks learners complete in task-based interventions (e.g., see Boers et al., Reference Boers, Bryfonski, Faez and McKay2021).
From a methodological standpoint, we recommend more research focusing on IDs in TBLT from qualitative or mixed methods perspectives. Only four studies (2.96%) included in our methodological synthesis were qualitative, and 32 (23.70%) utilized mixed methods. Again, triangulating quantitative results from questionnaires (the most commonly implemented tool in TBLT ID research according to our findings) with qualitative measures, such as semi-structured or stimulated recall interviews, journals, role-plays, classroom discourse, long-term case studies, or other qualitative datasets, would facilitate our understanding of how learners’ individual differences might impact task performance and outcomes. In quantitative studies, we also recommend more longitudinal research that examines changes in L2 outcomes or learners’ IDs over time, with a greater focus on longer-term effects through the use of delayed posttests or follow-up interviews. Some task-based interventions, such as interactively provided corrective feedback, have been shown to have delayed effects (Lee & Lyster, Reference Lee and Lyster2016; Mackey, Reference Mackey1999; Mackey & Goo, Reference Mackey, Goo and Mackey2007; Sheen, Reference Sheen2010). As such, delayed posttests are necessary to observe the contribution of learners’ IDs to how durable outcomes are over time.
In the domain of statistical practices, we found that more than a third (37.69%) of the quantitative studies in our sample employed more than 10 parametric statistical tests (and some studies utilized many more). This should be viewed in the light of calls in prior work (e.g., Larsson et al., Reference Larsson, Plonsky, Sterling, Kytö, Yaw and Wood2023; Plonsky, Reference Plonsky2013; Reference Plonsky and Plonsky2015) for researchers to expand their repertoire of statistical practices in quantitative and mixed methods research and to prioritize examinations of descriptive statistics, effect sizes, and confidence intervals over running large numbers of null hypothesis significance tests. In terms of reporting practices, we likewise recommend that authors be explicit about demographic data, including clearly stating the L1s of participants, describing the full context in which the study took place, and including as much descriptive data as possible so that future meta-analytic work can be easily conducted and studies can be replicated if necessary.
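To illustrate what reporting an effect size with a confidence interval might look like in practice, the sketch below computes Cohen's d for two independent groups together with an approximate 95% interval. It is an assumption-laden illustration rather than a procedure drawn from the reviewed studies: the scores are invented, the function names are hypothetical, and the interval relies on a common large-sample approximation to the standard error of d that may be imprecise for small groups.

```python
from math import sqrt
from statistics import mean, stdev


def cohens_d(group_a, group_b):
    """Standardized mean difference between two independent groups (pooled SD)."""
    n_a, n_b = len(group_a), len(group_b)
    pooled_var = (
        (n_a - 1) * stdev(group_a) ** 2 + (n_b - 1) * stdev(group_b) ** 2
    ) / (n_a + n_b - 2)
    return (mean(group_a) - mean(group_b)) / sqrt(pooled_var)


def d_confidence_interval(d, n_a, n_b, z=1.96):
    """Approximate CI for d using a large-sample standard error (z=1.96 for 95%)."""
    se = sqrt((n_a + n_b) / (n_a * n_b) + d ** 2 / (2 * (n_a + n_b)))
    return d - z * se, d + z * se


# Invented scores for illustration only: fluency under two hypothetical conditions.
planning = [14, 17, 15, 19, 16, 18]
no_planning = [12, 15, 13, 14, 16, 12]

d = cohens_d(planning, no_planning)
low, high = d_confidence_interval(d, len(planning), len(no_planning))
print(f"d = {d:.2f}, approximate 95% CI [{low:.2f}, {high:.2f}]")
```

Reporting the interval alongside the point estimate conveys the precision of the estimate, and supplying group sizes, means, and standard deviations allows later meta-analysts to recompute the effect size themselves.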
Finally, we note that outreach and inclusivity are critical in task-based research. Task-based pedagogy is of worldwide interest and therefore requires a global perspective. We believe an important priority in this area is for research to investigate learners studying languages other than English. While we recognize the global impact of English, our understanding of language learning cannot currently be generalized without the addition of a robust variety of other target languages and more diverse contexts. Additionally, researchers excited about task-based ID research should consider making their materials, such as tasks and data, freely accessible in online repositories to aid in replication efforts and to expand the usage of common tools and tasks.
Data availability statement
The experiment in this article earned Open Data and Materials badges for transparent practices. The data and materials are available at https://www.iris-database.org/details/pWxNY-asADp
Competing interest
The author(s) declare none.