Introduction
Boredom has been recognized as an important affective individual difference factor in second and foreign language (L2) learning (Dewaele, Botes, & Greiff, 2023; Li, 2021; Pawlak et al., 2020). Yet, more empirical evidence is urgently needed to determine its exact role, for three reasons. Firstly, most extant studies on foreign language learning boredom (FLLB) have used scores in curriculum-based language course exams (e.g., midterm/end-term exams) or self-perceived L2 proficiency as an indicator of L2 achievement (e.g., Dewaele, Botes, & Meftah, 2023; Li & Han, 2022; Li & Li, 2023) rather than results from more objective and widely used international language proficiency tests, which would allow more solid, comparable, and generalizable findings. Secondly, many prior studies have examined the predictive effects of FLLB on overall L2 achievement (e.g., Li & Li, 2023), but very few have investigated its specific role in different L2 skills (i.e., listening, speaking, reading, and writing). The focus on the four L2 skills is important because the emotional experiences (e.g., intensity and frequency of emotions and how emotions arise and impact other learning outcomes) and their nomological network (i.e., how emotions are linked with learner-internal/-external factors and outcomes) may vary across L2 skills (Li et al., 2024). Indeed, the skills differ from each other in their linguistic goals, cognitive demands, visibility, recursiveness, evanescence, time constraints, and interactiveness (Li, Li et al., 2023). Lastly, most prior studies have utilized cross-sectional designs (e.g., Zhao & Wang, 2023), neglecting the fact that both learner emotions and L2 achievement evolve over time.
Against this background, the current study is primarily concerned with the following two core questions: 1) How does FLLB relate to overall and skill-specific L2 achievement and proficiency? 2) How do FLLB and L2 achievement relate to each other over time? Before addressing these questions, we first set out to refine the existing measurement of FLLB, given that different conceptualizations and measures of a construct may lead to inconsistent and incomparable findings (Borsboom, 2006). Specifically, the 32-item Foreign Language Learning Boredom Scale (Li, Dewaele et al., 2023) is to be reduced to the Foreign Language Learning Boredom Scale–Short Form (FLLBS–SF), with its psychometric properties assessed to facilitate future research.
Literature review
Foreign language learning boredom: Conceptualization and measurement
Boredom was neglected in L2 research until recently (e.g., Li, 2021; Pawlak et al., 2020). For any new construct, its research rationale, definition, conceptualization, and measurement need to be solid to allow and encourage further empirical explorations. In this regard, Pawlak et al. (2020) and Li, Dewaele et al. (2023) took the initiative, but the time has come to advance this work.
Pawlak et al. (2020) first argued that boredom itself justifies the emerging investigations due to its associated negative symptoms, including learners’ distractions, demotivation, and dissatisfaction. The authors then explored the underlying structure of boredom in an English as a foreign language (EFL) context among 107 English majors in Poland using an exploratory factor analysis (EFA). They identified a two-factor structure representing the 23-item scale, Boredom in Practical English Language Classes–Revised: (1) Disengagement, monotony, and repetitiveness, and (2) Lack of satisfaction and challenge. This seminal study extended L2 emotion research by including and measuring boredom.
The pioneering scale developed by Pawlak et al. (2020) shows some limitations. Firstly, the process of generating the original item pool remains largely unknown, undermining methodological transparency and replicability. Secondly, the scale did not consider out-of-classroom boredom that learners experience outside the formal instructed classroom learning environment (Pawlak et al., 2023) or general trait boredom that learners are inclined to experience across circumstances (Li, Dewaele et al., 2023). However, both types could potentially differentiate an individual’s emotional profile of boredom and its role in L2 learning. Indeed, L2 learning is not restricted to classroom settings, and boredom could also arise after class, such as when doing homework or taking part in massive open online courses, virtual language classes within the curriculum, or extracurricular mobile learning (Li, 2021; Li et al., 2024; Pawlak et al., 2022). Recognizing out-of-classroom boredom is thus urgently needed, especially in the current era of technology-enhanced L2 learning, as it is part of the boredom in relation to L2 learning (Li et al., 2024; Pawlak et al., 2022). In addition, as noted, some learners are more likely to feel bored across life domains, not only in L2 learning but also in other subjects (e.g., maths) or gaming (Li, Dewaele et al., 2023). That is, the general trait of boredom contributes to the experience of L2-specific boredom and becomes part of it.
Thirdly, the factor structure identified with EFA should have been further confirmed in a different group of participants using confirmatory factor analysis (CFA), ideally followed by assessment of validity (e.g., criterion/convergent/discriminant/predictive validity), reliability (internal consistency and test–retest reliability), and measurement invariance (across groups and time). Fourthly, the sample size was relatively small for instrument validation. Although there is no consensus on the required sample size for instrument validation, it is without doubt that the sample “should be sufficiently large to eliminate subject variance” (DeVellis, 2016, p. 130) and to represent the target population better (DeVellis, 2016). Boateng et al. (2018) recommended ten respondents for each scale item as a minimum sample size (there are 23 items in the Boredom in Practical English Language Classes–Revised), while Hair et al. (2019) recommended 200 cases as the required size. Lastly, the scale was developed and validated among English majors. Their academic selves, achievement goals, and value appraisal systems, which are assumed to be distal and proximal antecedents of academic emotions (Pekrun, 2006), are inherently distinct from those of non-English majors. This may limit the applicability of this scale to non-English majors.
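The two sample-size rules of thumb cited above can be made concrete with a short calculation. The following is only an illustrative sketch: the function name and variable names are hypothetical, and the inputs (23 items, 107 participants, ten respondents per item, 200 cases) are the figures reported in the text.

```python
def min_sample_per_item(n_items: int, respondents_per_item: int = 10) -> int:
    """Minimum sample size under a respondents-per-item rule of thumb."""
    return n_items * respondents_per_item


N_ITEMS = 23       # items in the scale by Pawlak et al. (2020)
OBSERVED_N = 107   # participants in that study
FIXED_MIN = 200    # fixed minimum number of cases (Hair et al., 2019)

per_item_min = min_sample_per_item(N_ITEMS)

# How far the observed sample falls short of each recommendation
shortfalls = {
    "ten per item": per_item_min - OBSERVED_N,
    "fixed 200 cases": FIXED_MIN - OBSERVED_N,
}

for rule, gap in shortfalls.items():
    print(f"{rule}: short by {gap} participants")
```

Under either rule, the original validation sample falls well below the recommended minimum, which is the point the paragraph above makes.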
Taking into account the concerns about Pawlak et al.’s (2020) scale mentioned above, Li, Dewaele et al. (2023) moved a step forward. Before developing and validating the Foreign Language Learning Boredom Scale (FLLBS), the authors first explored the existence of boredom and defined it in a Chinese EFL context. They found that boredom was one of the most frequently experienced emotions of Chinese non-English majors. It manifested itself in sleepiness, inattention, mind wandering, and disengagement. The authors defined FLLB as “a negative, deactivating achievement emotion arising from ongoing learning activities or tasks” (p. 3) based on the three-dimensional taxonomy of the control–value theory in educational psychology (Pekrun, 2006). Based on such a preliminary understanding and conceptualization of FLLB, the FLLBS was then developed and validated among a relatively large sample (n = 2,223) following a series of systematic psychometric tests. Consequently, Li, Dewaele et al. (2023) identified a seven-factor model representing a psychometrically sound 32-item FLLBS: “Foreign Language Learning Classroom Boredom, Underchallenging Task Boredom, PowerPoint Presentation Boredom, Homework Boredom, Teacher Dislike Boredom, General Learning Trait Boredom, and Overchallenging or Meaningless Task Boredom” (p. 4).
The full version and different short versions of the FLLBS have been applied in heterogeneous L2 contexts. Firstly, its full version has been mainly used and validated in diverse groups of L2 learners in China, such as 868 university EFL students (Li, 2022), 954 secondary EFL learners in a southeastern rural area (Li & Li, 2023), 504 ethnic minority and Han EFL students in a northwestern rural area (Zhao & Wang, 2023), 517 FL majors (Zhang, 2022), and 348 Chinese-as-a-second-language international students in an online context (Chen et al., 2022). Secondly, its eight-item Foreign Language Classroom Boredom Subscale has been applied in more diverse FL groups, such as 1,555 urban and 600 rural EFL students from central China (Ma et al., 2023), 1,205 urban and 616 rural EFL students from southeastern China (Li & Li, 2024), Chinese university EFL students in online contexts (n = 348, Li & Dewaele, 2020; n = 880, Wang & Li, 2022), 168 Arab and Kurdish EFL learners in both in-person and emergency remote teaching contexts (Dewaele, Albakistani, & Kamal Ahmed, 2024), 118 English majors from Thailand (Apridayani & Waluyo, 2022), and 332 FL (English, French, and Spanish) learners across educational levels from the UK, China, and Italy (Dewaele, Botes, & Greiff, 2023). Lastly, some researchers extracted several items from the FLLBS. For example, Wang et al. (2023) extracted 21 items out of the 32 items, three items for each factor, and Zhao et al. (2023) extracted five items.
There is an empirical trend toward using shorter versions of the FLLBS. Nevertheless, the existing shorter versions have several limitations. Firstly, the eight-item Foreign Language Classroom Boredom Subscale only partially represents FLLB since the subscale is restricted to the classroom. As mentioned earlier, one of the advantages of the full FLLBS is that it includes both in-class and out-of-class boredom, and both L2-specific and domain-general boredom. Secondly, for the extracted versions, it remains unclear how the items were selected and whether the extracted versions are representative, valid, reliable, and invariant across groups (Wang et al., 2023; Zhao et al., 2023).
Foreign language learning boredom: Links with L2 achievement and proficiency
FLLB has also been theorized to have debilitating effects on learners’ motivation (e.g., demotivation), engagement (e.g., pseudo-/superficial/procedural engagement), cognition (e.g., distraction, short attention span, and superficial information processing), strategy use (e.g., less self-regulation), behavior (e.g., withdrawal), and flow experience in class, which further impairs L2 achievement and proficiency (Dewaele et al., 2023; Kruk & Zawodniak, 2017; Li, Dewaele et al., 2023; Li et al., 2024; Pawlak et al., 2020). Underpinned by the control–value theory of achievement emotions (Pekrun, 2006), which posits the dynamic reciprocal linkages between emotions and their antecedents and outcomes, we further assume that FLLB and L2 achievement are bidirectionally linked to each other over time.
Prior empirical studies have explored the links between FLLB and L2 achievement (commonly operationalized as scores in curriculum-based language course exams such as midterm/end-term exams) or self-perceived L2 proficiency in diverse L2 contexts (e.g., Li & Li, 2023; Liu & Wang, 2023; Zhao & Wang, 2023). The findings are mixed. In addition, very few have examined the links between FLLB and L2 proficiency measured with international language proficiency tests (e.g., Tsang & Dewaele, 2023).
FLLB was found to be a negative predictor of overall L2 achievement in most prior relevant studies. Özsaray and Eren (2018), for example, found that boredom had a large negative predictive effect on English achievement of undergraduate students from Turkey (β = –.50, p < .05). Smaller negative predictive effects on English achievement were found among Moroccan English learners from secondary schools, universities, and language centers (β = –.14, p < .01) (Dewaele, Botes, & Meftah, 2023) and English-as-an-L3 (third language) and Chinese-as-an-L2 ethnic minority students from China (β = –.17, p < .01) (Zhao & Wang, 2023). Li and Li (2023) further confirmed the small negative achievement effect of FLLB among secondary EFL students from rural China. Notably, they also revealed the limited durability of such achievement impact: The effect faded over time (the first week: β = –.14, p < .001; the fifth week: β = –.11, p < .001; the ninth week: nonsignificant).
FLLB was also found to be negatively associated with self-perceived English proficiency. Liu and Wang (2023), for example, found a small-to-medium negative correlation among Chinese secondary students (r = –.28, p < .001). A negative predictive effect of similar size was found in an online L2 context among Chinese university learners (β = –.29, p < .001) (Li & Han, 2022).
By contrast, some other studies found no significant predictive effects of FLLB on L2 achievement/proficiency. For example, Dewaele, Botes, and Greiff (2023) found no significant predictive effect of FLLB on English achievement among international students. In two other studies among primary school EFL learners from Hong Kong, Tsang and Dewaele (2023) and Yeung et al. (2023) found that boredom did not significantly predict overall English proficiency or skill-specific English achievement (reading and writing).
Motivation of the current study
To determine the role of foreign language learning boredom in L2 learning
More empirical evidence is urgently needed to determine the exact role of FLLB. Firstly, and surprisingly, international language proficiency tests (e.g., the Cambridge English Assessment, International English Language Testing System [IELTS], Test of English as a Foreign Language [TOEFL], and Oxford Placement Test) have been scarcely used in L2 boredom (also enjoyment) research, which is in sharp contrast with the extensive use of curriculum-based course exams (e.g., end-term/midterm exams) (e.g., Li & Li, 2023; Zhao & Wang, 2023). However, L2 proficiency is of paramount importance to L2 learners as “an index of the comprehension and production abilities that L2 learners develop across linguistic domains (e.g., lexical competence, grammatical competence, discourse competence) and modalities (spoken and written) to communicate” (Tremblay, 2011, p. 340). Although course exams are the primary ways to assess curriculum implementation, teaching effectiveness, and students’ mastery of skills and knowledge in the period of coursework (Li et al., 2024), they are regional and administered at different levels of the education system within a country (e.g., national, municipal, and school levels). Thus, they are limited in external validity (e.g., applicability, transferability, and generalizability of relevant findings and implications) (Li et al., 2024). In addition, unlike established L2 proficiency tests, most course exams were administered with few considerations about their reliability and construct validity. Secondly, very few studies have taken a skill-specific approach to examine the distinctive roles of FLLB (and other emotions) in impacting overall L2 and different L2 skills (i.e., listening, speaking, reading, and writing) (e.g., Yeung et al., 2023), leaving this area largely uncharted.
Such a skill-specific approach to L2 learner emotion is warranted because learners’ emotions may vary across language skills (subsystems) as a result of the variations in terms of linguistic goals, cognitive demands, visibility, recursiveness, evanescence, time constraints, self-paced learning, interactiveness, and social pressure involved in different skills (Li, Li et al., 2023; Li et al., 2024). Lastly, the widely used cross-sectional designs cannot capture the dynamic nature of FLLB and L2 achievement. Moreover, longitudinal investigations can help reveal causal relationships by establishing the temporal ordering of variables (Kenny, 1979).
All in all, the current study aims to provide a fuller picture of the role of FLLB in L2 learning by 1) using diverse measurements (both curriculum-based course exam scores and international L2 proficiency test scores), 2) considering both general L2 and specific L2 skills, and 3) utilizing both cross-sectional and longitudinal research designs.
To develop a short but sound measure of foreign language learning boredom
As reviewed previously, there is a clear empirical preference for shorter versions of the FLLBS. The rationale for reducing the items in the existing FLLBS to a minimum number is evident. Firstly, lengthy questionnaires may impair the initial willingness to participate in and complete the questionnaire (Galesic & Bosnjak, 2009; Rolstad et al., 2011). The shorter the questionnaire is, the more respondents start, and the fewer respondents drop out (Galesic & Bosnjak, 2009). Secondly, lengthy questionnaires potentially cause a response burden (Dörnyei & Dewaele, 2022; Galesic & Bosnjak, 2009). The shorter the questionnaire is, the more likely respondents are to maintain their attention and interest, which contributes to a higher response rate and better response quality (Galesic & Bosnjak, 2009; Rolstad et al., 2011). Thirdly, the longer the questionnaire, the more time it takes to complete, and the more likely it is to be perceived as a burden and declined in practice (Fowler, 2014; Rolstad et al., 2011). Fourthly, although scales are expected to use multiple items that are semantically similar but phrased differently to measure the same construct or dimension of a construct, unnecessarily redundant items not only increase the length of the scale but may also impact participants’ affective states (e.g., impatience, fatigue, and hastiness) when responding, which may further undermine both the construct validity and reliability of the survey results (Clark & Watson, 1995).
To sum up, short questionnaires have the advantages of being convenient, user-friendly, and feasible and tend to have higher response rates and quality (Dörnyei & Dewaele, 2022). Nevertheless, as identified earlier, both the existing eight-item Foreign Language Classroom Boredom Subscale and the other extracted versions of the FLLBS have conceptual or methodological limitations. The current study therefore aims to develop and validate a Foreign Language Learning Boredom Scale–Short Form (FLLBS–SF). This involved a series of systematic psychometric processes (from item selection to scale validation, including reliability assessment, validity assessment, and invariance assessment), following consistent guidelines and stringent statistical criteria, and utilizing data from a large integrated sample with heterogeneous backgrounds (e.g., age, education level, linguistic background, region, ethnicity, and economic status) over different time points.
The current study
Research questions and hypotheses
The following research questions (RQs) guided the present study:
1. Is the newly developed Foreign Language Learning Boredom Scale–Short Form (FLLBS–SF) reliable, valid, and invariant?
2. How does FLLB predict overall and skill (subsystem)-specific L2 achievement and proficiency?
3. How are FLLB and L2 achievement related to each other over three consecutive semesters in the span of a year?
For RQ2 and RQ3, we proposed the following hypotheses (Hs):
H1: FLLB would negatively predict overall L2 academic achievement and achievement in specific skills or subsystems (i.e., vocabulary and grammar, listening, reading, and writing) (see Figure 1);
H2: FLLB would negatively predict overall L2 proficiency and skill-specific proficiency (listening, reading, and writing) (see Figure 2);
H3: FLLB and L2 achievement would negatively and reciprocally predict each other across time (see Figure 3).
The first hypothesis was proposed based on the emotion–achievement link assumption of control–value theory and relevant empirical literature (e.g., Dewaele, Botes, & Meftah, 2023; Li & Han, 2022; Li & Li, 2023). Regarding the second hypothesis, there are very few theoretical assumptions/predictions or empirical investigations specifically addressing the effects of emotions on (overall/skill-specific) L2 proficiency. However, as reviewed, extant theoretical assumptions (e.g., the emotion–achievement link assumption of control–value theory) and empirical evidence (Li, Dewaele et al., 2023; Pawlak et al., 2020) suggest that FLLB has a significant impact on learning outcomes such as cognition, motivation, engagement, and behavior, which are linked to L2 proficiency. The second hypothesis is thus that FLLB could impair L2 proficiency. In addition, although L2 proficiency is inherently different from L2 achievement, they are interconnected, especially in FL learning contexts where L2 proficiency is mainly developed in instructed L2 learning within a particular curriculum, whose effectiveness is typically assessed with course scores (a commonly used indicator of L2 achievement). The third hypothesis was underpinned by the assumption of the control–value theory on the reciprocal and dynamic relationships between achievement emotions and academic achievement, which has been insufficiently corroborated in L2 learner emotion research (Pekrun, 2006).
Research design
The three RQs were answered in three substudies. Study 1 aimed to develop and validate the Foreign Language Learning Boredom Scale–Short Form (FLLBS–SF) based on different datasets within diverse EFL learning contexts (see Figure 4). Study 2 adopted a cross-sectional design, and the FLLBS–SF was applied to assess the links between FLLB and (overall and skill-specific) L2 achievement and proficiency (see Figures 1 and 2). Study 3 was longitudinal in design, and the FLLBS–SF was applied to investigate the longitudinal associations between FLLB and English achievement (see Figure 3).
Datasets and participants
Table 1 shows the four datasets used in the three substudies. Datasets 1 and 2 were used in prior studies by Li, Dewaele et al. (2023) and Zhao and Wang (2023), respectively, and the other two are novel ones. The entire integrated dataset consists of 4,770 students: 2,223 (46.60%) university students from Dataset 1 and 2,547 (53.40%) secondary students from Datasets 2–4. The data on the same variables (e.g., FLLB, FLE, and FLCA) under discussion were obtained with the same instruments across datasets, enabling the integration of the datasets (Isbell & Son, 2022). In Study 1, all the datasets were merged to develop and validate the FLLBS–SF. In Study 2, the cross-sectional part of Dataset 4 was used to examine the predictive effect of FLLB on overall and skill-specific L2 achievement and proficiency. In Study 3, the longitudinal part of Dataset 4 was used to explore the associations between FLLB and L2 achievement over three consecutive semesters.
Note. FLE = foreign language enjoyment, FLCA = foreign language classroom anxiety, BS = boredom susceptibility, ABE = academic boredom in English; T1–T3 = Time points 1–3.
Dataset 1 included 2,223 non-English majors from nine universities in China (M age = 18.33, SD age = 2.15). There were 901 (40.53%) men and 1,089 (48.99%) women. Their English proficiency was roughly at the B1 level in the Common European Framework of Reference for Languages (CEFR) as they were non-English majors who were going to take or had already passed the College English Test–Band 4 (passing the test is a graduation requirement for non-English majors in China), which is commonly aligned with the B1 level in CEFR (Jin et al., 2022).
Dataset 2 included 504 participants (50.80% men, 49.20% women). They were junior secondary school students in rural areas with a mean age of 14.29 (SD = 2.07). Over half of the participants (n = 297, 58.93%) were Chinese-as-an-L2 and English-as-an-L3 students from various minority groups (e.g., Tibetan, Tujia, and Mongolian). The rest were all Chinese L1 and English L2 speakers (n = 207, 41.07%). As regulated by the Chinese Ministry of Education (2022), the participants were all English beginners, roughly at the A1 level in the CEFR.
Dataset 3 contained 934 participants (473 men and 461 women) from rural and urban areas in east China. They were all junior secondary school students with a mean age of 13.49 (SD = .88). All the participants were Chinese L1 speakers learning English as a foreign language. Their proficiency was at the beginning level (roughly equivalent to A1 in the CEFR), as regulated by the Chinese Ministry of Education (2022).
Dataset 4 consisted of 1,109 junior secondary EFL students in a rural boarding school in eastern China. They were from 26 intact classes, among which only four classes lived with their parent(s), and the other 22 classes were all left-behind children whose parent(s) worked in remote urban regions. Their mean age was 13.50 (SD = .77). There were 696 men (62.80%) and 413 (37.20%) women. Participants’ scores in the Cambridge A2 Key for Schools English Test (M = 45.63, SD = 15.23) indicate relatively low English proficiency, roughly at A1 in the CEFR.
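The sample sizes reported for the four datasets can be cross-checked against the size of the integrated sample with a few lines of arithmetic. This is a minimal sketch: the dictionary layout and variable names are hypothetical, while the counts and education levels are those given in the dataset descriptions above.

```python
# Education level and sample size per dataset, as reported in the text
dataset_sizes = {
    "Dataset 1": ("university", 2223),
    "Dataset 2": ("secondary", 504),
    "Dataset 3": ("secondary", 934),
    "Dataset 4": ("secondary", 1109),
}

total = sum(n for _, n in dataset_sizes.values())

# Aggregate the counts by education level
by_level = {}
for level, n in dataset_sizes.values():
    by_level[level] = by_level.get(level, 0) + n

# Percentage share of each education level in the integrated sample
shares = {level: round(100 * n / total, 2) for level, n in by_level.items()}

print(total, by_level, shares)
```

The totals agree with an integrated sample of 4,770 students, of whom roughly 46.60% are university students and 53.40% are secondary students.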
Instruments
Table 2 presents the main instruments used in the three substudies: (1) Emotion scales measuring FLLB and its criterion variables (e.g., enjoyment and anxiety), (2) end-term English exams, and (3) English proficiency tests. All the scales were in Chinese and responded to on a 5-point Likert scale ranging from 1 “strongly disagree” to 5 “strongly agree.” All the instruments showed acceptable reliability across datasets.
Note. K = the number of scale items, test items, or sections; T1–T3 = Time points 1–3; FLLBS = Foreign Language Learning Boredom Scale, CFLES = Chinese Foreign Language Enjoyment Scale, FLCAS = Foreign Language Classroom Anxiety Scale, BSS = Boredom Susceptibility Scale, ABES = Academic Boredom in English Scale.
Foreign Language Learning Boredom Scale (FLLBS). The original 32-item FLLBS (Li, Dewaele et al., 2023) was used to measure general boredom in relation to EFL learning with seven factors (see Literature Review).
The Chinese Foreign Language Enjoyment Scale (CFLES). The CFLES (Li et al., 2018) was used to measure student foreign language enjoyment, a criterion variable selected for FLLB based on literature (e.g., Li & Li, 2023). The scale contains eleven items measuring three factors: FLE–Private, FLE–Teacher, and FLE–Atmosphere.
Foreign Language Classroom Anxiety Scale (FLCAS). The short FLCAS was used to measure students’ anxiety levels in foreign language classrooms, another criterion variable for FLLB (e.g., Li & Li, 2023). Li and Li (2023) extracted six items from the eight-item FLCAS (Dewaele & MacIntyre, 2014), applying and validating the shortened scale in a similar Chinese EFL context.
Boredom Susceptibility Scale (BSS). The BSS (Zuckerman, 1979) was used to elicit learners’ general proneness to boredom, a criterion variable of FLLB (Li, Dewaele et al., 2023). The scale includes ten items.
Academic Boredom in English Scale (ABES). Students’ academic boredom in the English subject, a criterion variable of FLLB (Li, Dewaele et al., 2023), was measured via the three-item ABES that Li (2021) adapted from the Achievement Emotions Questionnaire (Pekrun et al., 2011).
End-term English exam. Participants’ scores in their school-level end-term English exams were used as the index of their L2 achievement. Although the specific content of these exams differed across datasets, their structures were largely the same, containing sections for listening, vocabulary and grammar, reading, and writing, with the same maximum score of 120. For Dataset 2, we only obtained the global score. For Dataset 4, we obtained the scores for all 81 items in the sections on listening, vocabulary and grammar, reading, and writing, with the maximum scores of 20, 40, 40, and 20, respectively.
Cambridge A2 Key for Schools English Test. A practice version of the test was used to measure young school-age learners’ English proficiency in Dataset 4. The maximum score of the test utilized in the current study was 85, with 25, 30, and 30 for the three sections on listening, reading, and writing (two tasks: an email writing task and a picture-based story-telling task), respectively. The research site declined the original speaking section because it was not part of their curriculum or high-stakes English exams in China. Participants’ writing samples were assessed in terms of three dimensions of content (0–5 points), language (0–5 points), and organization (0–5 points) following the guidelines of the Cambridge Writing Assessment Subscales (see https://assets.cambridgeenglish.org/schools/CER%206647%20V1c%20JUL20_Teacher%20Guide%20for%20Writing%20A2%20Key%20for%20Schools.pdf).
Six English teachers were recruited from the research site to complete the (inter-)rating. The first author trained them systematically in three 60-minute sessions: 1) Familiarizing them with the rating rubrics, 2) showing them how to assess essays, and 3) guiding them to practice rating and holding group discussions on challenging and confusing parts. Ten percent of the compositions were double-scored, and the inter-rater reliability for all six rating dimensions (three dimensions for each of the two writing tasks) was acceptable (all rs > .70; Koo & Li, 2000).
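To illustrate how inter-rater agreement on one rating dimension could be checked against the .70 threshold, the sketch below computes a correlation between two raters’ scores. It assumes the reported coefficient is a Pearson correlation, which the notation r suggests but the text does not confirm; the function name and the scores are hypothetical and are not data from the study.

```python
from math import sqrt


def pearson_r(x, y):
    """Pearson correlation between two equally long lists of scores."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("need two score lists of equal length (>= 2)")
    mean_x = sum(x) / len(x)
    mean_y = sum(y) / len(y)
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    if var_x == 0 or var_y == 0:
        raise ValueError("scores must vary for a correlation to be defined")
    return cov / sqrt(var_x * var_y)


# Hypothetical 0-5 "content" scores from two raters on eight double-scored essays
rater_a = [3, 4, 2, 5, 1, 3, 4, 2]
rater_b = [3, 5, 2, 4, 1, 2, 4, 3]

r = pearson_r(rater_a, rater_b)
acceptable = r > 0.70  # threshold reported in the text

print(f"r = {r:.2f}, acceptable = {acceptable}")
```

In practice the same check would be repeated for each of the six rating dimensions; an intraclass correlation could be substituted if absolute agreement rather than consistency is of interest.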
Data collection and ethics
For Dataset 1 (Li, Dewaele et al., Reference Dewaele, Botes and Greiff2023) and Dataset 2 (Zhao & Wang, Reference Zhao and Wang2023), both of which had been used in prior publications, we obtained consent to use the data from both author teams. For Datasets 3 and 4, official approval was obtained from the first author’s institution and the research sites. Written consent was obtained from participants and their guardians after they had been offered sufficient information on the nature, purpose, and duration of the project, their unconditional nonparticipation/withdrawal rights, rewards after the completion of the project, and data anonymization in research outcomes. Questionnaire surveys were then carried out via the online questionnaire platform of WenJuanXing (https://www.wjx.cn/) in computer classes. English exams/proficiency tests were completed in classrooms.
Study 1: Development and validation of the Foreign Language Learning Boredom Scale-Short Form (FLLBS-SF)
Study 1 aimed to reduce the 32-item FLLBS (Li, Dewaele et al., Reference Dewaele, Botes and Greiff2023) to the FLLBS–SF and assess its psychometric properties utilizing Datasets 1–4. Following a series of steps recommended by Marsh et al. (Reference Marsh, Ellis, Parada, Richards and Heubeck2005), Study 1 involved three phases: 1) Preparing the data, 2) developing the FLLBS–SF, exploring, and confirming its structure, and 3) validating the FLLBS–SF (Figure 4).
Phase 1: Preparing the data for scale development and validation
Datasets 1–3 were first merged and then randomly divided into two subsamples. Subsample 1 was used to develop the FLLBS–SF and explore its structure via EFA (n = 320, ten times the number of the scale items; Hair et al., Reference Hair, Black, Babin and Anderson2019). Subsample 2 was used to further confirm the structure through CFA (n = 3,341). T-tests showed no significant differences between the two subsamples in terms of their mean age, levels of boredom, enjoyment, and anxiety.
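The split described above can be sketched as follows. This is a minimal illustration rather than the authors' procedure: the function name, the seed, and the use of integer participant IDs are assumptions, and the merged sample size of 3,661 is simply the sum of the two reported subsample sizes (320 + 3,341).

```python
import random


def split_subsamples(participant_ids, n_items=32, ratio=10, seed=2024):
    """Randomly split participants into an EFA subsample and a CFA subsample.

    The EFA subsample holds `ratio` participants per scale item
    (10 x 32 = 320 here); everyone else goes to the CFA subsample.
    The seed is a hypothetical value chosen only for reproducibility.
    """
    n_efa = n_items * ratio
    shuffled = list(participant_ids)
    if n_efa > len(shuffled):
        raise ValueError("not enough participants for the requested EFA subsample")
    random.Random(seed).shuffle(shuffled)
    return shuffled[:n_efa], shuffled[n_efa:]


# 3,661 = 320 + 3,341, derived from the reported subsample sizes.
efa_ids, cfa_ids = split_subsamples(range(3661))
print(len(efa_ids), len(cfa_ids))
```

Fixing the seed keeps the split reproducible, which matters when the equivalence of the two subsamples is checked afterwards.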
Phase 2: Developing the FLLBS-SF and confirming its structure
Phase 2 involved a series of procedures to develop the FLLBS–SF and to explore and verify its underlying factor structure. The scale development and EFA steps used data from Subsample 1 (n = 320). All procedures followed either the same criteria used in the validation of the original FLLBS or more stringent ones (Li, Dewaele et al., Reference Dewaele, Botes and Greiff2023).
Item analysis
Item analyses were conducted to assess the quality and measurement characteristics of the individual scale items (Reynolds et al., Reference Reynolds, Altmann and Allen2021). Specifically, we performed the item–total correlation analysis, inter-item analysis, and item discrimination analysis.
Item–total correlation analyses were first conducted to assess whether an item is measuring the same construct as the overall test measures, following a more stringent criterion (> .40; Loiacono et al., Reference Loiacono, Watson and Goodhue2002) than that used in Li, Dewaele et al. (Reference Dewaele, Botes and Greiff2023) (> .30). No items were eliminated after the assessment.
Then, the inter-item correlations were calculated to determine the consistency among items. Coefficients between .30 and .80 were considered acceptable (Field, Reference Field2013). Consequently, Items 4, 7, 13, 14, 15, 16, 17, 20, 24, and 28 were deleted.
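As an illustration of the item–total criterion, the sketch below computes corrected item–total correlations (each item against the sum of the remaining items) and flags items that exceed .40. The corrected variant is an assumption, since the text does not say whether the item was removed from the total, and the response matrix is hypothetical toy data.

```python
from statistics import mean


def pearson(x, y):
    """Pearson correlation between two equally long lists of numbers."""
    mx, my = mean(x), mean(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / (sxx * syy) ** 0.5


def corrected_item_total(responses, threshold=0.40):
    """Map each 1-based item number to (r, passes_threshold).

    `responses` holds one row of item scores per participant. The item is
    removed from the total before correlating (corrected variant).
    """
    results = {}
    for j in range(len(responses[0])):
        item = [row[j] for row in responses]
        rest = [sum(row) - row[j] for row in responses]
        r = pearson(item, rest)
        results[j + 1] = (r, r > threshold)
    return results


# Hypothetical responses: 5 participants x 3 items on a 1-5 scale.
toy = [[1, 2, 1], [2, 2, 3], [3, 4, 3], [4, 4, 5], [5, 5, 4]]
item_total = corrected_item_total(toy)
for item, (r, keep) in item_total.items():
    print(item, round(r, 2), keep)
```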
Item discrimination analysis assessed how well the items can differentiate participants (Reynolds et al., Reference Reynolds, Altmann and Allen2021). Participants whose scores on the FLLBS fell within the upper and lower 27% (n = 85 and n = 86, respectively) constituted two comparison groups for independent sample t-tests (Kelley, Reference Kelley1939). Significant differences were detected on each item between the two groups, indicating no need for item deletion.
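The extreme-groups comparison can be sketched as below. The data are hypothetical, ties at the cut-off are ignored, and Welch's unequal-variance t statistic is used as a stand-in for the independent-samples t-test reported above; it returns only the statistic, not a p-value.

```python
from statistics import mean, variance


def extreme_groups(total_scores, proportion=0.27):
    """Return participant indices for the lowest and highest scoring groups."""
    order = sorted(range(len(total_scores)), key=lambda i: total_scores[i])
    k = round(len(total_scores) * proportion)
    return order[:k], order[-k:]


def welch_t(a, b):
    """Welch's t statistic for two independent samples."""
    standard_error = (variance(a) / len(a) + variance(b) / len(b)) ** 0.5
    return (mean(a) - mean(b)) / standard_error


# Hypothetical data: 100 participants with total scale scores 1..100 and
# one item whose scores tend to rise with the total.
totals = list(range(1, 101))
item = [t % 5 + (4 if t > 50 else 0) for t in totals]

low_idx, high_idx = extreme_groups(totals)
t_stat = welch_t([item[i] for i in high_idx], [item[i] for i in low_idx])
print(len(low_idx), len(high_idx), round(t_stat, 2))
```

A large positive statistic indicates that the item separates high scorers from low scorers, which is the property the discrimination analysis checks.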
Exploratory factor analysis
To explore the factor structure underlying FLLB, EFAs were conducted after the assessment of Bartlett’s test of sphericity and the Kaiser–Meyer–Olkin (KMO) measure of sampling adequacy via SPSS 27. The extraction method was principal component analysis with the oblique (promax) rotation. The final factor number was decided by jointly considering theoretical underpinnings and the following indices: the eigenvalue, the scree plot, and the parallel analysis (Hair et al., Reference Hair, Black, Babin and Anderson2019). Firstly, factors with eigenvalues greater than one were retained. Secondly, an inflection point occurs in the scree plot when the eigenvalue transitions from a steep slope to a horizontal line. Factors before this point are suitable for retention. Lastly, factors with higher eigenvalues than values generated by parallel analysis were retained (Hair et al., Reference Hair, Black, Babin and Anderson2019).
The KMO (KMO ≥ .80) and Bartlett’s test (p < .001) results indicate that Subsample 1 was suitable for EFA. The first EFA was then conducted, revealing four factors with eigenvalues higher than 1. However, Items 18 and 19 were deleted due to their low factor loadings on all four factors (< .40).
Then, a second EFA was conducted. The eigenvalue, scree plot, and parallel analysis indicated four factors, one factor and three factors representing FLLB, respectively (Figure 5). The one-factor solution was abandoned because FLLB is theorized as a multidimensional construct (Li, Dewaele et al., Reference Dewaele, Botes and Greiff2023; Pawlak et al., Reference Pawlak, Kruk, Zawodniak and Pasikowski2020). The three-factor solution of parallel analysis was finally selected because parallel analysis offers more accurate and parsimonious results (e.g., Hair et al., Reference Hair, Black, Babin and Anderson2019; Hayton et al., Reference Hayton, Allen and Scarpello2004) compared to the eigenvalue criterion, which has been criticized for being “amongst the least accurate methods” for factor retention decisions (Costello & Osborne, Reference Costello and Osborne2005, p.2).
Item selection
The items were selected for each identified factor based on the EFA results and theoretical considerations. We deleted items according to the following criteria (e.g., Awang, Reference Awang2012; Clark & Watson, Reference Clark and Watson1995; Hair et al., Reference Hair, Black, Babin and Anderson2019).
1. Communality lower than .50;
2. Factor loadings on the focal factor lower than .60;
3. Cross-loadings higher than .30;
4. Conceptually inconsistent with other items under the same factors.
Items 21, 22, 23, 25, and 32 were deleted due to low factor loadings. Items 30 and 31 were also deleted due to their conceptual inconsistency with other items under the same factor. Resultantly, 13 items were retained in the scale.
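The three statistical criteria above can be expressed as a simple filter, sketched here with hypothetical communalities and loadings (the fourth criterion, conceptual consistency, is a judgment call and is not coded). The function and factor names are illustrative assumptions.

```python
def flag_item(communality, loadings, focal):
    """Return the statistical criteria an item fails (empty list = retain).

    `loadings` maps factor names to the item's loading on each factor;
    `focal` names the factor the item is meant to measure.
    """
    reasons = []
    if communality < 0.50:
        reasons.append("communality < .50")
    if abs(loadings[focal]) < 0.60:
        reasons.append("focal loading < .60")
    if any(abs(value) > 0.30 for name, value in loadings.items() if name != focal):
        reasons.append("cross-loading > .30")
    return reasons


# Hypothetical EFA output for two items.
clean_item = flag_item(0.72, {"F1": 0.81, "F2": 0.10, "F3": 0.05}, focal="F1")
weak_item = flag_item(0.45, {"F1": 0.52, "F2": 0.35, "F3": 0.02}, focal="F1")
print(clean_item)
print(weak_item)
```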
Face validity assessment and the final FLLBS–SF
The initial version of FLLBS–SF was evaluated by 13 researchers in applied linguistics. Following Hair et al. (Reference Hair, Black, Babin and Anderson2019), they were invited to judge each scale item in terms of their (a) relevance in meaning (1 “imprecisely measuring boredom,” 10 “accurately measuring boredom”) and (b) clarity in the linguistic expression (1 “very ambiguous,” 10 “very clear”) on a 10-point scale. Then, they were asked a further open-ended question about whether/why certain items were problematic in measuring FLLB.
Experts’ quantitative ratings on item relevance and clarity were both high (Ms > 7.30). However, they raised concerns in the open-ended questions about Items 2 and 5, particularly in terms of their conceptual inconsistency. The two items were deleted after a group discussion among authors. In the end, 11 items were maintained in the final FLLBS–SF, representing three factors (Table 3).
The three identified factors explained 56.93%, 11.84%, and 10.31% of the variance, respectively, and 79.08% in total. The three-factor structure represents FLLB as a multilevel and mixed emotional experience construed in language activities/tasks at the micro level, in language classrooms at the meso level, and in general learning at the macro level. Factor 1 is named Foreign Language Activity Boredom because the items concern boredom experienced in specific FL activities both in and out of class (e.g., English exercises). Factor 2 is named Foreign Language Classroom Boredom, and the items reflect boredom experienced in FL classrooms in general. Factor 3 is named General Learning Boredom because such boredom, although arising in L2 learning contexts, is shared in general learning as an emotional proneness ready to be brought to any subject.
Confirming the structure of FLLBS–SF
CFA was conducted to confirm the three-factor structure identified in EFAs, using the data of Subsample 2 via Mplus 8.0 (n = 3,341). The model fit indices and criteria were the root mean square error of approximation (RMSEA < .08), the standardized root mean square residual (SRMR < .08), the comparative fit index (CFI > .95), and the Tucker–Lewis index (TLI > .95) (Hu & Bentler, Reference Hu and Bentler1999). Additionally, standardized factor loadings higher than .70 were considered to be ideal (Hair et al., Reference Hair, Black, Babin and Anderson2019), a higher standard than Li, Dewaele et al. (Reference Dewaele, Botes and Greiff2023) (> .50).
The results show good model fit (χ²/df = 517.768/41, CFI = .984, TLI = .979, SRMR = .020, RMSEA [90% CI] = .059 [.055, .064]) and acceptable factor loadings (see Figure 6). We thus confirmed the three-factor structure representing the 11-item FLLBS–SF.
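The reported RMSEA can be checked against the chi-square, degrees of freedom, and sample size with the conventional point-estimate formula, sqrt(max(χ² − df, 0) / (df × (N − 1))). This is a sketch under the assumption of a single-group model; some software divides by N rather than N − 1, which gives the same value to three decimals here.

```python
import math


def rmsea(chi2, df, n):
    """Point estimate of RMSEA from a model chi-square, its df, and sample size."""
    return math.sqrt(max(chi2 - df, 0.0) / (df * (n - 1)))


# Values reported for the CFA on Subsample 2.
cfa_rmsea = rmsea(517.768, 41, 3341)
print(round(cfa_rmsea, 3))  # 0.059
```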
Phase 3: Validating the FLLBS-SF
The newly developed 11-item FLLBS–SF was subjected to a series of psychometric validations, including reliability assessment, validity evaluation, and measurement invariance tests.
Reliability assessment
The reliability of the FLLBS–SF was evaluated by assessing its internal consistency and test–retest reliability. We calculated Cronbach’s α, McDonald’s ω, and Guttman’s split-half coefficient as indicators of internal consistency using Datasets 1, 2, and 3, with values above .70 considered acceptable (Hair et al., Reference Hair, Black, Babin and Anderson2019). The test–retest reliability was evaluated with two-way random effects intraclass correlation coefficients (ICC) based on Dataset 4: poor reliability (ICC < .40), moderate reliability (.40 ≤ ICC < .60), good reliability (.60 ≤ ICC < .75), and excellent reliability (ICC ≥ .75) (Cicchetti, Reference Cicchetti1994).
The results show excellent internal consistency for the overall scale (α = .94, ω = .93) and the three subscales (Foreign Language Activity Boredom: α = .92, ω = .92; Foreign Language Classroom Boredom: α = .90, ω = .91; General Learning Boredom: α = .91, ω = .91). The Guttman’s split-half reliability of the FLLBS–SF was also good (.87). Its test–retest reliability was acceptable (ICC = .44 for T1–T2 and .46 for T2–T3). Taken together, the FLLBS–SF demonstrated satisfactory reliability.
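Two of the reliability steps can be sketched in a few lines: Cronbach's α computed from a participant-by-item response matrix with the standard formula, and the interpretation bands for the ICC as listed above. The response matrix is hypothetical toy data, and the ICC itself is not computed here; only the reported values are classified.

```python
from statistics import variance


def cronbach_alpha(responses):
    """Cronbach's alpha: k/(k-1) * (1 - sum of item variances / total variance)."""
    k = len(responses[0])
    item_vars = [variance([row[j] for row in responses]) for j in range(k)]
    total_var = variance([sum(row) for row in responses])
    return k / (k - 1) * (1 - sum(item_vars) / total_var)


def icc_band(icc):
    """Classify an ICC using the cut-offs stated in the text."""
    if icc < 0.40:
        return "poor"
    if icc < 0.60:
        return "moderate"
    if icc < 0.75:
        return "good"
    return "excellent"


# Hypothetical responses: 5 participants x 3 items.
toy = [[1, 2, 1], [2, 2, 3], [3, 4, 3], [4, 4, 5], [5, 5, 4]]
alpha = cronbach_alpha(toy)
print(round(alpha, 2))
print(icc_band(0.44), icc_band(0.46))
```

Under these bands, the two reported test–retest coefficients fall in the moderate range.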
Validity assessment
We further assessed convergent validity, discriminant validity, criterion validity, and predictive validity of the FLLBS–SF. Convergent validity was first evaluated to reflect whether items share a high proportion of variance in common (Hair et al., Reference Hair, Black, Babin and Anderson2019). Specifically, we calculated (1) the correlations between the scores on the FLLBS–SF and the original FLLBS and (2) the average variance extracted (AVE) and composite reliability (CR) based on Datasets 1, 2, and 3.
The large positive correlation between the scores on FLLBS–SF and FLLBS (r = .97, p < .001) indicated good convergent validity (Hair et al., Reference Hair, Black, Babin and Anderson2019). The high values of AVEs (> .50) and CRs (> .70) of the three factors also indicated good convergent validity (Hair et al., Reference Hair, Black, Babin and Anderson2019).
The discriminant validity was then assessed using Datasets 1, 2, and 3, to decide the extent to which the factors of FLLB are distinct from each other. Specifically, discriminant validity is established if all factor AVEs are greater than the squared correlation estimates (r²) between any pair of factors (Hair et al., Reference Hair, Black, Babin and Anderson2019). The comparison revealed that the AVE of each subscale was larger than the r² between any two subscales (Table 4), indicating good discriminant validity.
Note. AVE = average variance extracted, CR = composite reliability; r² = coefficients of determination; ***p < .001.
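For readers unfamiliar with these indices, the sketch below shows how AVE and CR are conventionally obtained from standardized factor loadings, and how the AVE versus r² comparison works. All loadings and factor correlations are hypothetical; the study's actual values are in Table 4.

```python
def ave(loadings):
    """Average variance extracted: mean of the squared standardized loadings."""
    return sum(l * l for l in loadings) / len(loadings)


def composite_reliability(loadings):
    """CR = (sum of loadings)^2 / ((sum of loadings)^2 + sum of error variances)."""
    squared_sum = sum(loadings) ** 2
    error = sum(1 - l * l for l in loadings)
    return squared_sum / (squared_sum + error)


def discriminant_ok(aves, factor_corrs):
    """True if, for every factor pair, both AVEs exceed the pair's squared correlation."""
    return all(min(aves[a], aves[b]) > r * r for (a, b), r in factor_corrs.items())


# Hypothetical standardized loadings for three factors.
loadings = {"F1": [0.80, 0.80, 0.80], "F2": [0.85, 0.75, 0.80], "F3": [0.90, 0.70, 0.75]}
aves = {name: ave(values) for name, values in loadings.items()}
crs = {name: composite_reliability(values) for name, values in loadings.items()}
# Hypothetical correlations between factors.
corrs = {("F1", "F2"): 0.60, ("F1", "F3"): 0.55, ("F2", "F3"): 0.50}

print({k: round(v, 2) for k, v in aves.items()})
print({k: round(v, 2) for k, v in crs.items()})
print(discriminant_ok(aves, corrs))
```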
Criterion validity was examined to assess the correlation between FLLB and nonincidentally but theoretically related constructs (Devellis, Reference Devellis2016). Specifically, we correlated scores on the FLLBS–SF with a series of criterion variables, including foreign language enjoyment (Datasets 1, 2, and 3), anxiety (Datasets 1 and 3), academic boredom in English (Dataset 1), and boredom susceptibility (Dataset 1). The results indicate that FLLB measured by the FLLBS–SF was positively correlated with foreign language anxiety (r = .36, p < .001), academic boredom in English (r = .76, p < .001), and boredom susceptibility (r = .36, p < .001), and negatively correlated with foreign language enjoyment (r = –.39, p < .001). Taken together, the FLLBS–SF had good criterion validity.
Finally, the predictive validity concerns the extent to which the FLLBS–SF can predict learning outcomes. We examined the associations between FLLBS–SF scores and English achievement based on the merged Datasets 2 and 3 (n = 1,438). Linear regression results show that FLLBS–SF scores significantly and negatively predicted L2 achievement (β = –.48, p < .001), suggesting that the FLLBS–SF had excellent predictive validity.
Invariance assessment
Cross-sectional invariance and longitudinal measurement invariance were assessed by employing multigroup CFAs in Mplus. Cross-sectional invariance tests were conducted across groups of gender [Male vs. Female], age [Adolescent: age < 18 vs. Adult: age ≥ 18], region [Rural vs. Urban], educational level [Secondary vs. Tertiary], and language status [English as L2 vs. L3] based on Datasets 1, 2, and 3. Dataset 4 was used for the longitudinal invariance test (i.e., time invariance).
To assess measurement invariance, we specified a set of increasingly stringent models, starting from the configural model (assuming equivalence of factor structures), moving to the metric model (assuming equivalence of factor loadings), and ending with the scalar model (assuming equivalence of item intercepts). Then, we compared the more stringent model with the less stringent one to detect changes in model fits. Invariance is supported when the CFI drops by no more than .010 (ΔCFI ≥ –.010), ΔRMSEA ≤ .015, and ΔSRMR ≤ .030 (for metric invariance) or ΔSRMR ≤ .015 (for scalar invariance) (Chen, Reference Chen2007).
The results of the invariance test are summarized in Table 5. As displayed, no significant difference was detected between a series of increasingly strict CFA models, indicating that the FLLBS–SF was interpreted and responded to invariantly across distinctive gender, age, regional, educational, and language status groups and across three time points spanning over a year.
Note. All p < .001.
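The decision rule for comparing nested invariance models can be sketched as a small function. It reads the CFI criterion as a drop of no more than .010, takes each change as the stricter model's index minus the less strict model's index, and requires all three criteria to hold, as listed in the text; the fit changes passed in are hypothetical.

```python
# SRMR cut-offs per invariance level, as stated in the text.
SRMR_CUTOFF = {"metric": 0.030, "scalar": 0.015}


def invariance_supported(level, d_cfi, d_rmsea, d_srmr):
    """Check fit changes (stricter minus less strict model) against the cut-offs."""
    return (
        d_cfi >= -0.010
        and d_rmsea <= 0.015
        and d_srmr <= SRMR_CUTOFF[level]
    )


# Hypothetical changes in fit between nested models.
metric_ok = invariance_supported("metric", d_cfi=-0.002, d_rmsea=0.001, d_srmr=0.004)
scalar_ok = invariance_supported("scalar", d_cfi=-0.015, d_rmsea=0.004, d_srmr=0.006)
print(metric_ok, scalar_ok)
```

In the second call the CFI falls by .015, so scalar invariance would not be supported under this reading of the criteria.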
Study 2: The links between FLLB and overall/skill-specific L2 achievement and proficiency
Study 2 was designed to reveal the predictive effect of FLLB on overall/skill-specific L2 achievement (vocabulary and grammar, listening, reading, and writing) and proficiency (listening, reading, and writing).
Data analysis and results
Structural equation modeling (SEM) was adopted to examine the predictive effects of boredom. It involved two steps: (1) estimating the measurement model to assess the measurement validity of indicators and (2) fitting the structural model to reveal the relationships among variables of interest. The data used in the SEM were the cross-sectional part of Dataset 4 (T2, n = 1,109). Of note, the data for boredom were based on scores on the FLLBS–SF.
The measurement models fit well for the overall L2 achievement model (χ²/df = 296.677/51, CFI = .976, TLI = .969, SRMR = .019, RMSEA [90% CI] = .069 [.062, .077]) and the skill-specific L2 achievement model (χ²/df = 327.028/81, CFI = .982, TLI = .976, SRMR = .019, RMSEA [90% CI] = .054 [.048, .060]) (Hu & Bentler, Reference Hu and Bentler1999), hence allowing for the assessment of the structural models. Results of the structural models show that the overall L2 achievement model was a good fit (χ²/df = 296.677/51, CFI = .976, TLI = .969, SRMR = .019, RMSEA [90% CI] = .069 [.062, .077]), and so was the skill-specific L2 achievement model (χ²/df = 327.028/81, CFI = .982, TLI = .976, SRMR = .019, RMSEA [90% CI] = .054 [.048, .060]). The results (Table 6) indicate that FLLB significantly (all ps < .001) and negatively predicted learners’ overall L2 achievement (β = –.19), vocabulary and grammar achievement (β = –.14), listening achievement (β = –.12), reading achievement (β = –.16), and writing achievement (β = –.13), with modest effect sizes (Cohen et al., Reference Cohen, Manion and Morrison2017).
Note. All p < .001.
Results also indicate that the fit indices were satisfactory for the measurement models of the overall L2 proficiency model (χ²/df = 293.637/51, CFI = .976, TLI = .970, SRMR = .019, RMSEA [90% CI] = .068 [.060, .075]) and the skill-specific L2 proficiency model (χ²/df = 306.810/71, CFI = .979, TLI = .973, SRMR = .019, RMSEA [90% CI] = .057 [.050, .063]). The structural models demonstrated satisfactory fits for the overall L2 proficiency model (χ²/df = 293.637/51, CFI = .976, TLI = .970, SRMR = .019, RMSEA [90% CI] = .068 [.060, .075]) and the skill-specific L2 proficiency model (χ²/df = 306.810/71, CFI = .979, TLI = .973, SRMR = .019, RMSEA [90% CI] = .057 [.050, .063]). As indicated in Table 6, FLLB significantly (all ps < .001) and negatively predicted overall L2 proficiency (β = –.23), listening proficiency (β = –.19), reading proficiency (β = –.19), and writing proficiency (β = –.20), with modest effect sizes (Cohen et al., Reference Cohen, Manion and Morrison2017).
Study 3: The longitudinal relationships between foreign language learning boredom and achievement
Study 3 sought to reveal the longitudinal relationships between FLLB and L2 achievement.
Data analysis
Their associations were examined with cross-lagged panel modeling using Dataset 4 (T1–T3, n = 1,109) following the two steps for SEM in Study 2. The model fit was assessed based on the same recommended cutoff values in the CFA of Studies 1 and 2 (Hu & Bentler, Reference Hu and Bentler1999).
Results
The fit of the measurement model of the cross-lagged panel model was good (χ²/df = 1777.487/573, CFI = .970, TLI = .967, SRMR = .034, RMSEA [90% CI] = .044 [.041, .046]), allowing for the structural modeling. The fit of the structural model was also satisfactory (χ²/df = 1851.368/577, CFI = .968, TLI = .965, SRMR = .063, RMSEA [90% CI] = .045 [.043, .047]) (see Figure 7).
As the figure indicates, FLLB and L2 achievement were negatively correlated at Time Point 1 (r = –.20, p < .001), but their correlation became nonsignificant at Time Point 2 (r = –.07), and the effect size further decreased at Time Point 3 (r = –.05). Moreover, FLLB at Time Points 1 and 2 did not predict L2 achievement at Time Points 2 and 3, respectively. By contrast, English achievement consistently and negatively predicted subsequent FLLB (both βs = –.12, p < .001) with modest effect sizes (Cohen et al., Reference Cohen, Manion and Morrison2017).
Discussion
RQ 1 concerned the development and validation of the Foreign Language Learning Boredom Scale–Short Form (FLLBS–SF). We reduced the original 32-item FLLBS to a parsimonious 11-item FLLBS–SF measuring three factors, namely Foreign Language Activity Boredom, Foreign Language Classroom Boredom, and General Learning Boredom. The three-factor structure not only echoes the multidimensional nature of FLLB revealed in prior studies (e.g., Li, Dewaele et al., Reference Dewaele, Botes and Greiff2023; Pawlak et al., Reference Pawlak, Kruk, Zawodniak and Pasikowski2020), but also reflects its multilevel framework from language activities/tasks (in- and out-of-class) at the micro level, to language classrooms at the meso level, and to learning in general at the macro level. Informed by the items and the factor structure of the final FLLBS–SF, we further define FLLB as an individual’s proneness or inclination to feel bored in relation to L2 learning in general, which is relatively stable (although evolving) and persists over a relatively long period as a combination of boredom accumulated in foreign language classrooms, (in- and out-of-class) foreign language learning activities and general learning.
Regarding the psychometric properties of the FLLBS–SF, we conducted a series of rigorous validations and assessments using a large, merged dataset with participants of diverse L2 backgrounds. The FLLBS–SF was found to have satisfactory (1) validity (i.e., face validity, construct validity, convergent/discriminant validity, criterion validity, and predictive validity), (2) reliability (i.e., internal consistency, test–retest reliability, and split-half reliability), and (3) measurement invariance (across time and across multiple heterogeneous groups including age, gender, region, educational level, and language status). To sum up, the FLLBS–SF is a conceptually sound, psychometrically robust, stable, and parsimonious instrument to measure L2 learners’ general and long-term (trait-like) boredom in relation to L2 learning.
Results for RQ2 show that FLLB had consistent negative and modest predictive effects on overall L2 achievement/proficiency and skill-specific L2 achievement/proficiency, supporting our first and second hypotheses. In other words, general FLLB was found to be a negative predictor of academic success in L2 learning across measures (curriculum-based course exams and the proficiency test) and skills or subsystems (i.e., vocabulary and grammar achievement, listening, reading, and writing). The results indicate that participants with higher levels of boredom in English learning tended to have lower scores in overall end-term English exams and the global English proficiency test, and their subsections on different skills. Resultantly, our findings extend the argument of the skill-specificity of L2 learner emotions (Li et al., Reference Li, Dewaele and Hu2023; Li et al., Reference Li, Li and Jiang2024) by highlighting the skill generality simultaneously. That is, although L2 learners’ emotional experiences (e.g., intensity, frequency, and causes) may vary from skill to skill, the boredom–achievement/proficiency link is a consistent pattern across skills.
The FLLB–overall L2 achievement link obtained among Chinese secondary EFL students in the current study dovetails with prior significant results obtained from EFL learners from China, Turkey, and Morocco (e.g., Dewaele, Botes, & Meftah, Reference Dewaele, Botes and Meftah2023; Li & Li, Reference Li and Li2023; Özsaray & Eren, Reference Özsaray and Eren2018; Zhao & Wang, Reference Zhao and Wang2023). The significant links differ from the nonsignificant correlations obtained from international students (Dewaele, Botes, & Greiff, Reference Dewaele, Botes and Greiff2023). The difference might be attributed to the variations in the measures of boredom and groups of participants.
Our findings also extend the literature on the role of FLLB by including overall L2 proficiency and skill-specific L2 proficiency/achievement as the outcome variables of FLLB. As reviewed, although there were no directly relevant theoretical assumptions or prior empirical evidence for the role of emotions in affecting L2 proficiency and skill-specific L2 proficiency/achievement, their links could be explained by drawing on relevant theories and literature (e.g., the control–value theory, Pekrun, Reference Pekrun2006). FLLB, a negative achievement emotion with a low activation and an activity-related focus, is linked to a frequent lack of interest in ongoing L2 learning activities, constant demotivation, recurrent disengagement or pseudo-/superficial/procedural engagement in L2 learning, habitual distractions, short attention span, superficial information processing in L2 task completion, less self-regulated L2 learning, and fewer motivated learning behaviors (Li, Dewaele et al., Reference Dewaele, Botes and Greiff2023; Li et al., Reference Li, Li and Jiang2024; Li & Li, Reference Li and Li2024; Pawlak et al., Reference Pawlak, Kruk, Zawodniak and Pasikowski2020; Pekrun, Reference Pekrun2006). These adverse effects may further converge and impede L2 achievement/proficiency. The findings also show that the predictive effects of the general FLLB on L2 achievement/proficiency were skill-general. However, boredom itself is skill-specific in terms of the way it is organized and instigated or the way it influences various learning outcomes (Li, Dewaele et al., Reference Dewaele, Botes and Greiff2023; Li & Li, Reference Li and Li2024).
RQ3 was concerned with the association between FLLB and L2 achievement across the three time points that spanned over a year. Our results only partially support the third hypothesis that FLLB and English achievement would reciprocally and negatively predict each other over time. First, the cross-lagged panel modeling results showed cross-sectional links between FLLB and L2 achievement at three different time points. Specifically, FLLB and L2 achievement were significantly negatively correlated at Time Point 1. However, this relationship was no longer significant at Time Point 2 and was weakened further by Time Point 3. This counterintuitive finding suggests that the effects of learners’ emotions fluctuate and may interact with each other and with motivational constructs over time (Dewaele & Meftah, Reference Dewaele and Meftah2023). The finding that FLLB no longer predicted L2 achievement at Time 2 and 3 could mean that more positive emotions and attitudes had neutralized its effect as the year progressed.
Secondly, the cross-lagged panel modeling results also show that L2 achievement consistently predicted subsequent FLLB negatively. This result extends prior cross-sectional studies on their associations (e.g., Li, Dewaele et al., Reference Dewaele, Botes and Greiff2023) by specifying the direction of the effect over time. Cross-sectional studies typically assume that the effect goes from FLLB to achievement; however, such a design cannot establish the direction of the effect. Our study adopted a longitudinal design, revealing that the impact went from L2 achievement to FLLB rather than the opposite. The current findings dovetail with Alamer and Lee’s (Reference Alamer and Lee2021) findings that the effect went from L2 achievement to anxiety rather than the reverse. The unidirectional negative impact of L2 achievement on FLLB might result from a change in the emotional dynamics. According to the control–value theory (Pekrun, Reference Pekrun2006), previous learning experience is an essential environmental factor that precedes emotions. Good scores in L2 exams may have initiated a positive feedback loop, resulting in reduced FLLB and vice versa. That is, in response to prior language achievement indicated by high course exam scores, learners may also have felt increased subjective control over the learning process, which further alleviates boredom and vice versa.
Another interesting finding was that FLLB did not predict subsequent L2 achievement across time. Similar patterns emerged in the study by Li and Li (Reference Li and Li2023). The local educational context may help to explain this pattern. Specifically, our participants in Study 3 were all Chinese secondary students in a competitive educational context where the government, the educational sector, school administrators and instructors, and parents generally place a high value on education and on English as a school subject. English has been emphasized as a crucial asset for success and personal development (e.g., university admission and job opportunities) (Hu, Reference Hu2005). The participants’ extrinsic motivation to achieve higher in English is expected to grow as their transition from middle school to high school approaches (temporally in parallel with our research project). We could thus expect that the debilitative effect of perceived boredom on L2 achievement was declining as a result of the increasing extrinsic motivation to achieve higher in English. In other words, participants may have decided that despite their FLLB, they would work hard to get good results, which would explain why FLLB did not predict any subsequent L2 achievement. In addition, the strong autoregressive stability among FLLB across time (i.e., the effect from FLLB at a prior time point to FLLB at a subsequent time point) could have reduced the power of FLLB in explaining the variance of L2 achievement (Adachi & Willoughby, Reference Adachi and Willoughby2015).
Limitations, Implications, and Suggestions for Future Research
The study has the following limitations. First, our participants were all Chinese EFL learners, which limits the generalizability of our findings. Future studies could cross-validate the FLLBS–SF in more heterogeneous L2 contexts. Second, since English proficiency was only collected at one time point, it was not possible to establish causality between FLLB and English proficiency. Future studies could adopt a longitudinal design to explore their causal relationships and their developmental trajectories (Alamer & Lee, Reference Alamer and Lee2021; Kenny, Reference Kenny1979).
Some pedagogical implications can be drawn from this study. Firstly, L2 educators should raise awareness about the importance of L2 learner emotions and help learners boost positive emotions and avoid or reduce FLLB. It is worth noting that how students feel matters not only for their well-being but also for their overall/skill-specific L2 performance. That is, an intervention on FLLB is not only an intervention on students’ emotional well-being in L2 contexts but also could be an indirect intervention to boost their L2 achievement/proficiency. Secondly, the factors identified in Study 1 have some clear pedagogical implications for L2 teachers. For example, they could try to reduce boredom by diversifying task/exercise format, avoiding excessive repetition in tasks, reducing the length of the same task (see the items for Foreign Language Activity Boredom), establishing a positive L2 classroom atmosphere (see the items for Foreign Language Classroom Boredom), and collaborating with teachers for other subjects to improve students’ emotional experiences in learning in general (see the items for General Learning Boredom). The other implication for L2 educators is the need to identify, remove, or loosen the effect from prior poor L2 achievement to subsequent higher levels of boredom.
Conclusion
The current study has corroborated, extended, and enriched the control–value theory in the L2 context. FLLB was found to predict L2 achievement/proficiency in a skill-general way. The skill generality should be highlighted in the control–value theory framework, in parallel with the skill (subsystem) specificity (Li, Dewaele et al., Reference Dewaele, Botes and Greiff2023; Li & Li, Reference Li and Li2024). This underlines the need to include both curriculum-based language course exams and global proficiency tests as measures for academic achievement in L2, which is the core dependent variable in the control–value theory framework and in L2 learner emotion research. In addition, our findings specified the directionality between FLLB and L2 achievement over time. L2 achievement was found to negatively influence FLLB unidirectionally, which only partially supports the reciprocity claimed by the control–value theory (Pekrun, Reference Pekrun2006). Moreover, our findings resonate with the waxing and waning of the effects of FLLB (and other emotions) highlighted in Li and Li (Reference Li and Li2023). To conclude, the new 11-item FLLBS–SF was successfully developed and validated. It has sound psychometric properties and robust measurement invariance, and we recommend it for future research.
Author contribution
Chengchen Li: Conceptualization; Research Design; Data Collection; Data Process and Analyses; Writing – Abstract, Introduction, Literature Review, Motivation and Rationale, Discussion, and Implications; Revision & Editing; Resources & Project Administration; Funding Acquisition. Enhao Feng: Statistical Analyses; Writing – Methodology and Results; Drafting – Discussion; Revision. Xian Zhao: Drafting – Literature Review. Jean-Marc Dewaele: Drafting – Conclusion; Revision & Editing.
Acknowledgments
The authors wish to thank the participants of the study. Datasets 1 and 2 utilized in the current study were initially used by Li et al. (Reference Li, Dewaele and Hu2023) and Zhao and Wang (Reference Zhao and Wang2023). Datasets 3 and 4 are from the projects funded by the National Social Science Foundation of China (Recipient: Chengchen Li; Grant No.: 19CYY017). Sincere thanks also go to the anonymous reviewers and editors of SSLA for their helpful and constructive comments. The authors are solely responsible for any limitations and errors.
Competing interest
The authors declare that they have no competing interests.
Appendix
The Foreign Language Learning Boredom Scale–Short Form
To what extent do you agree with the following statements in relation to English learning?
1 = Strongly Disagree, 2 = Disagree, 3 = Undecided, 4 = Agree, 5 = Strongly Agree