Introduction
Replicating meta-analysis
Reproducibility of meta-analysis is important (Plonsky, 2012; van IJzendoorn, 1994; Weissgerber et al., 2021), given that meta-analytically derived results have a massive impact on the relevant research domain and practice. Examining reproducibility is at the core of replication research, which assesses the extent to which study findings can be reproduced in replication attempts (Marsden et al., 2018). Because meta-analysis involves various methodological choices in the process (Borenstein et al., 2021), different analysts may produce different results even from the same raw data (van IJzendoorn, 1994). Therefore, Weissgerber et al. (2021) emphasized the importance of methods’ reproducibility for improving the quality of meta-analysis. They claimed that if the description of meta-analytical approaches is sufficient, an independent researcher can reproduce the analysis results within an acceptable margin of error. The accessibility of the raw data, analysis code, and statistical output also contributes significantly to enhancing the reproducibility of meta-analysis.
Given the importance of meta-analysis reproducibility, the first goal of this study was to reproduce the findings of Jeon and Yamashita (2022; hereafter, “the 2022 study”), an updated version of “L2 Reading Comprehension and its Correlates: A Meta-Analysis” published in Language Learning (Jeon & Yamashita, 2014; hereafter, “the 2014 study”), which systematically summarized bivariate correlations between L2 reading comprehension and its underlying components. The 2014 study has contributed significantly to L2 reading research, having been cited more than 500 times at the time of this writing, in accordance with the importance of understanding L2 proficiency (Jeon & In’nami, 2022). Moreover, both studies examined the relative importance of reading components for L2 reading comprehension, which serves as a valuable resource for L2 teachers to determine the degree to which each component should be emphasized in their reading instruction (Hamada, 2020).
Another motivation for replicating meta-analysis lies in its methodological progress that enables the exploration of questions left unresolved in previous meta-analyses (Plonsky, 2012). This aligns with the stringent definition of replication, or a series of modified repetitions of an original study (Porte & McManus, 2019). The 2014 and 2022 studies concluded that their unresolved issue was the establishment of a comprehensive model of L2 reading comprehension. Therefore, the second goal of this study was to extend the findings of the 2022 study to the modeling of the Simple View of Reading (SVR) in L2.
SVR is a leading theory holding that the process of learning to read entails mapping written words to spoken language; consequently, reading comprehension is determined by the interaction between word decoding and oral language comprehension skills (Hoover & Gough, 1990). Word decoding is the cognitive process of deciphering written words into their corresponding spoken language forms, requiring knowledge of letter-sound correspondence. Oral language comprehension represents linguistic knowledge and processes used to understand spoken language, such as vocabulary knowledge and listening comprehension. Moreover, word decoding skills explain more variance in reading comprehension in the early stages of learning to read, while the contribution of oral language comprehension to reading comprehension increases as the word decoding processes become automatized (see Sparks, 2021 for a more comprehensive review). The potential of this theory to characterize strengths and weaknesses in L2 readers has also been discussed both theoretically (Sparks, 2021) and empirically (Verhoeven & van Leeuwe, 2012). While the traditional meta-analysis, as employed in the 2014 and 2022 studies, is prevalent in the field of L2 research, it only provides aggregated results of bivariate relationships (e.g., L2 reading comprehension and vocabulary, grammar, and metacognitive knowledge) and small-scale interactions among them (Raeisi-Vanani et al., 2022). This limitation impedes the investigation of questions concerning the extent to which interactions among these components predict variance in L2 reading comprehension, aligning with the SVR framework.
Identifying the structural relationships among correlated components of L2 proficiency using meta-analytic structural equation modeling (MASEM) has received increasing attention in L2 research (In’nami, Cheung, et al., 2022). The MASEM approach combines meta-analysis and structural equation modeling methodologies (e.g., Cheung, 2015b), allowing for the formulation of a hypothesized psychological model with multiple observed and latent variables using a synthesized/pooled correlation matrix. For example, Lee et al. (2022) applied the SVR model to L2 reading in their MASEM using 81 independent samples from 67 primary studies (n = 10,526). As depicted in Figure 1, rather than examining a direct relationship between L2 reading comprehension and each component, their model revealed an association between the two latent variables of L2 decoding and comprehension skills with reading comprehension. Collectively, these factors accounted for over 60% of the variance in L2 reading comprehension, with L2 comprehension abilities contributing more significantly. In comparison to Lee et al. (2022), the 2022 study synthesized a broader array of variables from a larger pool of samples. Consequently, the application of MASEM has the potential to extend their findings by providing a comprehensive and robust structure between the correlated components of L2 reading comprehension. To achieve this, we will review the 2022 study and describe our replication approach.
The original study
The purpose of both the 2014 and 2022 studies was to examine the strength of the correlation between L2 reading comprehension and 10 components recognized as significant predictors of L2 reading variance. These components include L2 decoding, L2 phonological awareness, L2 orthographic knowledge, L2 morphological knowledge, L2 vocabulary knowledge, L2 grammar knowledge, L1 reading comprehension, L2 listening comprehension, working memory, and metacognition. Guided by the theoretical framework of the multi-component view of reading, these studies sought to investigate the relative magnitudes of the correlations, aiming to identify which components exhibit stronger associations with L2 reading comprehension.
As a meta-analytic procedure, the 2014 and 2022 studies systematically searched for primary studies that met specific criteria: (a) reporting both sample size and the original correlation coefficient between passage-level L2 reading comprehension and one or more of the identified components, (b) including only participants without language-related disabilities, and (c) being published in English. Consequently, the 2014 study identified 67 independent samples from 58 studies published between 1979 and 2011. The 2022 study expanded this database by adding research published between 2011 and 2017, comprising a total of 107 independent samples from 88 studies. Study characteristics were coded according to the following scheme: (a) if studies appeared to have reported on duplicate samples, only one study for each was included in the meta-analysis; (b) for longitudinal studies, only data collected at one time point were included by selecting the one most comparable with the rest of the studies in the analysis pool (e.g., “if there were 31 samples in the pool and 30 of them involved adolescent/adult participants, and one last sample was from a longitudinal study following a group of sixth graders for four years, we used the data taken in the tenth grade,” Jeon & Yamashita, 2014, p. 177); and (c) if studies reported multiple measures for a construct that could be defined as unitary, the average value of the reported correlations was used.
Correlations, weighted for sample size and corrected for attenuation, were synthesized using the random-effects model with Comprehensive Meta-Analysis Version 2 (Borenstein et al., 2021). As a result, the 2022 study concluded that the relationship between L2 linguistic knowledge, such as vocabulary and grammar, and L2 reading comprehension was robust. L2 processing skills, such as L2 decoding, phonological, orthographic, and morphological skills, also showed moderate-to-strong correlations, suggesting their significance as predictors of L2 reading comprehension. Notably, L2 listening comprehension emerged as the strongest factor in achieving higher levels of L2 reading comprehension, aligning with the SVR model for L2 (Lee et al., 2022). In addition, the modest correlations of language-general variables, such as working memory and metacognition, suggest the relative importance of language processing skills based on sufficient linguistic knowledge in L2 reading comprehension (Nassaji, 2014).
The scope and approaches of replication
As noted, the scope of this replication is twofold: (a) to ascertain whether the meta-analytic findings of the 2022 study can be reproduced with respect to the strength of correlations between L2 reading comprehension and the 10 components and (b) to explore the extension of their findings to the L2 SVR model using the MASEM approach. Methodologically, this replication is a complete secondary analysis, according to a process model of different types of replications (van IJzendoorn, 1994). Regardless of the type of research, this approach involves the recoding and reanalyzing of the original data to recreate a dataset. Illustrating a complete secondary meta-analysis, Weissgerber et al. (2021) examined outcome reproducibility, entailing the re-creation of the dataset used in the original study, to inspect whether and to what extent the reanalyzed results vary when applying the same analytical approach (i.e., computational reproducibility and verification) or different analytical choices (i.e., analysis reproducibility). As the 2014 and 2022 studies presented a comprehensive list of publicly available literature and a detailed coding scheme with clearly described analytical methods, we can examine the reproducibility of their outcomes.
As a modified repetition of the original study, we also employ the MASEM approach instead of the correlation-based analysis adopted in the 2022 study. Although the 2014 and 2022 studies estimated individual effects only, MASEM allows the investigation of more than one effect simultaneously (In’nami, Cheung, et al., 2022), such as the simple effects of L2 vocabulary and grammar knowledge, along with their interaction effect on L2 reading comprehension. Another advantage is that MASEM can manage latent variables (e.g., L2 decoding) underlying several observed variables (e.g., word-reading accuracy and fluency). Considering that the SEM approach allows for extending the findings of original studies without SEM (Hamada & Takaki, 2021), it is essential to investigate the robustness of specified models with meta-analyzed data. This minimal yet theoretically motivated change facilitates an extension of the 2022 study as well as a fair comparison of its findings with this replication (Porte & McManus, 2019).
The re-creation of the 2022 study dataset has the advantage of extending the SVR model developed by Lee et al. (2022). As shown in Figure 1, while their SVR model consists of six components (i.e., L2 vocabulary and grammar knowledge, listening comprehension, and L2 decoding accuracy, fluency, and efficiency), the 10 components commonly synthesized in the 2014 and 2022 studies can be applied to Peng et al.’s (2021) SVR model for L1 readers. As shown in Figure 2, this model incorporates the three components of vocabulary and grammar knowledge and listening comprehension for a latent variable of language comprehension skills; two components of decoding accuracy and fluency for a latent variable of word decoding skills; and four components of phonological awareness, morphological knowledge, orthographic knowledge, and rapid automatized naming for a latent variable of metalinguistic skills. Peng et al. added metalinguistic skills (defined as the ability to think about language with awareness by manipulating structural features of language, such as phonemes, words, and sentences; Tunmer et al., 1988) to the SVR model because they are robust predictors of developing word decoding and language comprehension skills. Their MASEM results showed that decoding and language comprehension together explained 53% of the variance in reading comprehension. In addition, although metalinguistic skills significantly contributed to decoding and showed a strong relationship with language comprehension, they did not directly contribute to reading comprehension. As previous studies have found that cognitive skills in L2 comprehension are also associated with the development of L2 metalinguistic skills (e.g., Siu & Ho, 2015; Wang et al., 2006), this study examined the SVR model established by Peng et al. for L2 readers.
Method
Design of the current meta-analysis
Table 1 summarizes the differences in meta-analytic choices between the 2022 and present studies. As described in the Study Coding section, the change in the handling of data dependency from the 2022 study to perform MASEM was practically motivated, not theoretically (i.e., unmotivated change; see Marsden et al., 2018). In the Meta-Analytic Procedure section, we explain the motivation for the change in the methodology for this secondary meta-analysis. The use of different software for statistical computing would not result in a difference between the two studies (Schwarzer et al., 2015).
Note. Asterisks indicate motivated (*) and unmotivated (**) changes that potentially affected the replicated results. The methods used in the 2022 study were identical to those used in the 2014 study.
Study coding
The identical 88 studies used in the 2022 study were included in the present meta-analysis (see Supplementary Material 1). Two independent raters coded the study characteristics, such as study names, sample sizes, and correlations between L2 reading comprehension and each of the 10 components. Appendix A of the 2022 study provides detailed information on acceptable measures of both L2 reading comprehension and its components, which were referenced to determine which correlation coefficients reported in the primary studies should be recorded.
To satisfy the meta-analytic assumption of no more than one effect size per construct per study, studies by the same (co)authors were examined for data duplication (see Supplementary Material 2). In cases where multiple assessments were conducted for a single construct, they were averaged to control for data dependency. Similarly, if a longitudinal study reported multiple correlations from the same sample, only one set of correlations was included in the analysis. Whereas the 2022 study selected the data to include based on comparability with the other pooled studies per construct, we used only the first correlation so that all variables could be combined into one pooled correlation matrix for the MASEM approach.
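The dependency rules described above can be sketched in a few lines of Python. This is only an illustration, not the authors' coding procedure: the record layout, the sample identifiers, and the correlation values are hypothetical, and the sketch assumes that "the first correlation" refers to the earliest time point of a longitudinal sample.

```python
from collections import defaultdict
from statistics import mean

# Hypothetical coded records: (sample_id, construct, time_point, r).
# None of these values come from the primary studies.
records = [
    ("S1", "vocabulary", 1, 0.60),
    ("S1", "vocabulary", 1, 0.70),  # second vocabulary measure, same time point
    ("S1", "vocabulary", 2, 0.40),  # later time point of a longitudinal sample
    ("S2", "grammar", 1, 0.55),
]

def one_effect_per_construct(rows):
    """Keep the earliest time point per (sample, construct), then average
    multiple measures of the same construct taken at that time point."""
    earliest = {}
    for sample, construct, time_point, _ in rows:
        key = (sample, construct)
        earliest[key] = min(time_point, earliest.get(key, time_point))

    grouped = defaultdict(list)
    for sample, construct, time_point, r in rows:
        key = (sample, construct)
        if time_point == earliest[key]:
            grouped[key].append(r)

    return {key: mean(values) for key, values in grouped.items()}

effects = one_effect_per_construct(records)
for key, r in sorted(effects.items()):
    print(key, round(r, 2))
```

Each (sample, construct) pair ends up with exactly one correlation, which is the property the pooled analysis relies on.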
To use the correlation matrices in the MASEM, correlations between the 10 components were also recorded. In addition, the measures of L2 word-decoding skills were further divided into decoding accuracy and fluency (tasks that measure the ability to read words aloud accurately in an untimed or timed format, respectively) to construct MASEM models in the same manner as Peng et al. (2021). As the intercoder agreement was sufficiently high (90%), any disagreements were resolved through discussion.
Meta-analytic procedure
Meta-analytic calculations were conducted following the procedures reported in the 2022 study. First, information regarding the sample sizes and correlations was entered into the {meta} package of R (Schwarzer et al., 2015). Correlations attenuated by measurement error were corrected using the relevant reliability indices reported in the primary studies. In cases where statistical information was not provided, the mean reliability for each measure was employed for calculations. Then weighted average correlations between L2 reading comprehension and the 10 components were computed along with their 95% confidence intervals using the random-effects model. Based on the Open Science Collaboration (2015), the criteria for replication success were set as a combination of significance and small telescopes. Traditionally, replication was considered successful if it produced (non)significant p-values, as in the original study. However, Simonsohn (2015) proposed small telescopes that combined null hypothesis significance testing with effect size estimation to examine whether the replicated effect size was significantly smaller than that of the original study. Therefore, in the present study, the success of replicating the previous findings was determined by considering that the average correlations between L2 reading comprehension and each of the 10 components did not differ significantly, with small standardized mean differences between the two studies (i.e., Cohen’s d < 0.40; see Plonsky & Oswald, 2014).
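As a rough illustration of the two computational steps named above (correction for attenuation and random-effects pooling), the following Python sketch may help. It is not the authors' R code and does not call the {meta} package; it assumes the classical disattenuation formula r / sqrt(rel_x * rel_y), Fisher's z transformation with variance 1 / (n - 3), and the DerSimonian-Laird estimator of between-study variance. Whether the original analysis used exactly these settings is an assumption, and the sketch ignores the extra sampling variance that the correction itself introduces. The function names are hypothetical.

```python
import math

def correct_for_attenuation(r, rel_x, rel_y):
    """Disattenuate an observed correlation using the two reliabilities.
    The result can exceed 1 when reliabilities are low; such values would
    need handling before the Fisher z step below."""
    return r / math.sqrt(rel_x * rel_y)

def pool_random_effects(rs, ns):
    """Pool correlations with a DerSimonian-Laird random-effects model
    on the Fisher z scale; returns (pooled r, 95% CI, tau^2)."""
    zs = [math.atanh(r) for r in rs]
    vs = [1.0 / (n - 3) for n in ns]          # sampling variance of z
    w = [1.0 / v for v in vs]                 # fixed-effect weights
    z_fixed = sum(wi * zi for wi, zi in zip(w, zs)) / sum(w)
    q = sum(wi * (zi - z_fixed) ** 2 for wi, zi in zip(w, zs))
    c = sum(w) - sum(wi ** 2 for wi in w) / sum(w)
    tau2 = max(0.0, (q - (len(rs) - 1)) / c)  # between-study variance
    w_re = [1.0 / (v + tau2) for v in vs]     # random-effects weights
    z_re = sum(wi * zi for wi, zi in zip(w_re, zs)) / sum(w_re)
    se = math.sqrt(1.0 / sum(w_re))
    ci = (math.tanh(z_re - 1.96 * se), math.tanh(z_re + 1.96 * se))
    return math.tanh(z_re), ci, tau2

# Hypothetical inputs for illustration only.
corrected = [correct_for_attenuation(r, 0.85, 0.80) for r in (0.45, 0.52, 0.38)]
pooled_r, ci, tau2 = pool_random_effects(corrected, [91, 150, 60])
print(round(pooled_r, 3), [round(x, 3) for x in ci], round(tau2, 4))
```

Because the random-effects weights are 1 / (v + tau^2), large samples still receive more weight than small ones, but less disproportionately so than under a fixed-effect model.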
For the MASEM, we used correlation matrices collected from primary studies to analyze Peng et al.’s (2021) SVR model. We followed a two-stage MASEM approach using the {metaSEM} package (Cheung, 2014, 2015a); 78 correlation matrices from 68 studies were combined to construct a pooled correlation matrix in Stage 1, and the pooled matrix was used to test the theoretical model in Stage 2. The primary studies did not include all the correlations between L2 reading comprehension and the components related to these models, and thus maximum likelihood estimation was used for such missing correlations. Among the four components of metalinguistic skills, L2 orthographic knowledge and rapid automatized naming were removed because of a lack of sufficient samples. The theoretical model was assessed using goodness-of-fit indices (see Kline, 2023, for review), including the comparative fit index (CFI ≥ .95), the Tucker-Lewis index (TLI ≥ .95), the root mean square error of approximation (RMSEA ≤ .06), and the standardized root mean square residual (SRMR ≤ .05).
We built three theoretical models (see Figure 3) to determine whether (a) metalinguistic skills could be included in the SVR model and (b) metalinguistic skills made a direct contribution to L2 reading comprehension. Based on Peng et al. (2021), we examined whether in Model A, L2 metalinguistic skills directly impacted L2 decoding skills and were correlated with L2 comprehension skills. Model B integrated a direct effect of L2 metalinguistic skills on L2 reading into Model A. In Model C, any contribution from L2 metalinguistic skills was eliminated by fixing the relevant parameters to zero. Considering that the three models were nested, a likelihood ratio test was performed to determine the best-fitting model. We chose a complex model if it was significantly more accurate than a nested model. Otherwise, a nested model was selected for model parsimony (Peng et al., 2021). Supplementary Material 3 contains the raw data, analysis codes, and statistical output (see also https://osf.io/jksp8/) in the manner proposed by In’nami, Mizumoto, et al. (2022).
Finally, we performed exploratory publication bias testing and found no evidence that publication bias in the compiled data substantially influenced the present results. Supplementary Material 4 provides the details of the publication bias analyses and results.
Results and discussion
Replication of the 2022 study
Table 2 compares the number of samples used and the aggregated correlations between the 2022 and the present studies. Supplementary Material 5 provides forest plots for each component. Successfully reproduced effect sizes were observed for decoding, orthographic knowledge, morphological knowledge, grammar knowledge, listening comprehension, working memory, and metacognition, with no significant differences between the two studies. We first considered cases where there were no significant differences in effect sizes and sample selection (i.e., L2 orthographic knowledge, L2 listening comprehension, working memory, and metacognition). For L2 orthographic knowledge and metacognition, we combined the two groups sampled by Abu-Rabia and Sanitsky (2010). An additional sample was included for L2 orthographic knowledge (Noonan et al., 1997) and working memory (Shiotsu, 2010) because they each administered a spelling and reading span test. Although not significant, the effect of L2 listening comprehension was reduced in terms of the standardized mean difference. We added one sample that used a standardized L2 listening test (Woodcock Language Proficiency Battery: August et al., 2006) and removed the sample in Crosson and Lesaux (2010) due to sample overlap with Lesaux et al. (2010).
Note. The original data were adopted from Table 2 of the 2022 study. Correlation coefficients were corrected for attenuation. For the components marked with an asterisk, there were significant differences in effect size between the two studies.
Second, although no significant difference in the aggregated effect size for L2 decoding was observed, the number of samples included in the meta-analysis differed considerably between the 2022 and the present studies. We examined the types of L2 decoding tasks used for the independent samples that were not included in the 2022 study. Of the nine added samples, six used standardized tests, such as the Letter-Word Identification subtest of the Woodcock Language Proficiency Battery (Chen et al., 2012; Francis et al., 2006), Woodcock Reading Mastery Test-Revised (Jared et al., 2011; Jia et al., 2014; Xue & Jiang, 2017), Woodcock-Muñoz Language Survey-Revised (Leider et al., 2013), Wide Range Achievement Test-Revised (Wang et al., 2006), and Würzburg Silent Reading Test (Limbird et al., 2014) for decoding accuracy, and the Test of Word Reading Efficiency for decoding fluency (Jared et al., 2011; Jia et al., 2014; Jiang et al., 2012; Lesaux et al., 2010). The other samples used tailor-made tests that measured the ability to decode L2 graphemes into their corresponding phonemes (Koda, 1998; Li et al., 2012; Siu & Ho, 2015). For these reasons, including these samples was reasonable and improved the robustness of the findings regarding the effects of L2 decoding skills on reading comprehension.
Similarly, we sampled seven additional correlations for L2 grammar knowledge, wherein Swanson et al. (2011) used a standardized test (the Morphological Closure subtest of the Illinois Test of Psycholinguistic Ability III), and the other samples employed tailor-made tests such as multiple-choice sentence completion (Brisbois, 1995; Larson, 1983; Shiotsu, 2010; Shiotsu & Weir, 2007) and grammaticality judgment (Mecartty, 2000). However, the difference in the effect size for L2 grammar knowledge was small and did not reach significance. In contrast, four samples were removed from the analysis of L2 morphological knowledge due to data duplication (Chen et al., 2012; Lam et al., 2012). These removed samples showed higher correlations with L2 reading comprehension (average r = .62) than those retained in this study, so their removal resulted in a lower aggregated correlation coefficient.
The effect sizes that were not successfully replicated were for L2 phonological awareness, L2 vocabulary knowledge, and L1 reading comprehension. First, two samples were added to L2 phonological awareness (Francis et al., 2006; Kieffer & Vukovic, 2013), in which the Comprehensive Test of Phonological Processing was used as a standardized measure. Meanwhile, three samples were removed for the same reason as for L2 morphological knowledge. While the added samples yielded smaller correlation coefficients than in the 2022 study (average r = .50), the removed samples exhibited larger coefficients (average r = .71), reducing the overall impact of L2 phonological awareness. Second, regarding L2 vocabulary knowledge, we sampled eight more correlations than in the 2022 study. Among them, six samples used the Peabody Picture Vocabulary Test (Grant et al., 2012; Jia et al., 2014; Li et al., 2012) and the Wechsler Abbreviated Scale of Intelligence vocabulary subtest (Kim, 2012). Four samples employed tailor-made tests (e.g., cloze and meaning-recall formats) to measure L2 receptive vocabulary knowledge (Brisbois, 1995; Sang et al., 1986; Shiotsu, 2010). In cases with multiple samples in one study, the other three samples were not included in the 2022 study (Edele & Stanat, 2016: either Russian or Turkish sample; Shiotsu & Weir, 2007). These samples produced lower correlations (average r = .61) than those synthesized in the 2022 study, reducing the strength of the correlation between L2 vocabulary knowledge and reading comprehension in this replication.
Finally, the effect size for L1 reading comprehension was also significantly smaller than that in the 2022 study. We included four additional samples that used standardized L1 reading comprehension tests: the Woodcock-Muñoz Language Survey-Revised (Goodwin et al., 2015; Swanson et al., 2011), the Gray Oral Reading Test (Jared et al., 2011), and the Neilson Reading Test (adapted for Chinese L1 reading; Pasquarella et al., 2011). As these samples exhibited lower correlations (average r = .14) compared to the 2022 study, the correlation aggregated in this study was accordingly reduced.
Possible reasons for the failure to reproduce the previous results are (a) the recoding of the original raw data to recreate a dataset and (b) the motivated change in sample selection from a study that repeated the same assessment. Consistently, additional samples for each variable showed a lower correlation with L2 reading comprehension in the current meta-analysis than in the 2022 study. Some of these samples included a significantly higher number of participants than the median sample size (Mdn = 91) of pooled studies (e.g., n = 12,252 for L2 vocabulary knowledge: Sang et al., 1986; n = 471 for L1 reading comprehension: Swanson et al., 2011). This suggests that these large-scale studies affected the pooled effect sizes, as they were weighted by sample size (Schwarzer et al., 2015). The synthesis of effect sizes may also have been affected by the motivated change in sample selection. However, given that a longitudinal design was used across the 10 components, the addition of the studies reporting small effect sizes may have led to failure to replicate the findings of the 2022 study.
Secondary analysis of the 2022 study
By using the restructured dataset of the 2022 study, pooled correlations among the variables were estimated (78 samples from 68 studies with 12,062 participants). Table 3 shows that the pooled correlation coefficients ranged from .30 to .63, with all correlations being statistically significant (p < .05), aligning closely with the results of Lee et al. (2022; r = .21–.62).
Note. Values above the diagonal indicate the number of samples for each pooled correlation.
Based on the pooled correlation matrix, we built Models A, B, and C (see Figure 3), in which L2 reading comprehension was the dependent variable and the three latent variables were the independent variables. The likelihood ratio test showed no significant difference between Models A and B, χ²(1) = 2.51, p = .113, suggesting that Model A was more parsimonious than Model B regarding the structural complexity of SVR. The SVR model without metalinguistic skills (Model C) was not supported because the other two models were significantly more accurate, χ²(2) = 1163.79, p < .001 (vs. Model A), χ²(3) = 1166.30, p < .001 (vs. Model B). Therefore, based on Model A, we will discuss the relative contributions of L2 decoding and comprehension skills to L2 reading comprehension.
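The reported likelihood ratio tests can be checked by hand, since each p-value is simply the upper-tail probability of a chi-square distribution at the observed difference statistic. The sketch below is an independent check written in plain Python, not output from {metaSEM}; the helper name chi2_sf is hypothetical, and it uses the standard closed-form expressions for integer degrees of freedom.

```python
import math

def chi2_sf(x, df):
    """Upper-tail probability P(X >= x) of a chi-square distribution
    with positive integer degrees of freedom df."""
    if x <= 0:
        return 1.0
    half = x / 2.0
    if df % 2 == 0:
        # Even df: exp(-x/2) times a finite sum of (x/2)^k / k!.
        term, total = 1.0, 1.0
        for k in range(1, df // 2):
            term *= half / k
            total += term
        return math.exp(-half) * total
    # Odd df: erfc term plus a finite series in x^k / (2k+1)!!.
    term, total = 1.0, 0.0
    for k in range((df - 1) // 2):
        if k > 0:
            term *= x / (2 * k + 1)
        total += term
    return (math.erfc(math.sqrt(half))
            + math.sqrt(2.0 * x / math.pi) * math.exp(-half) * total)

# Model A vs. Model B: chi-square difference of 2.51 on 1 df.
print(round(chi2_sf(2.51, 1), 3))      # 0.113
# Model C vs. Models A and B: both far below .001.
print(chi2_sf(1163.79, 2) < 0.001)     # True
print(chi2_sf(1166.30, 3) < 0.001)     # True
```

The three difference statistics are also mutually consistent: 1163.79 + 2.51 = 1166.30, as expected when Model C is nested in Model A, which is in turn nested in Model B.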
As depicted in Figure 4, the three latent variables were composed of significant and strong factor-loading coefficients (standardized path coefficient βs > .60). The observed data fit the model with sufficient goodness-of-fit: χ²(16) = 22.15, p = .138, CFI = .998, TLI = .997, RMSEA = .006 (95% CI [.000, .011]), SRMR = .043. Both L2 decoding (β = .18, 95% CI [.01, .34], p = .033) and comprehension skills (β = .58, 95% CI [.43, .73], p < .001) significantly contributed to L2 reading comprehension. There was a significant difference in their relative importance in L2 reading comprehension as the 95% CIs of the two path coefficients did not overlap. This result suggests that L2 comprehension skills contribute more to L2 reading comprehension than L2 decoding skills. Consistent with Peng et al. (2021), metalinguistic skills made a direct contribution to L2 decoding skills (β = .86, 95% CI [.77, .96], p < .001) and were strongly correlated with L2 comprehension skills (β = .99, 95% CI [.90, 1.08], p < .001). In the SVR model with metalinguistic skills, 53% of the variance in L2 reading comprehension was explained.
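The reported RMSEA can be recovered from the model chi-square, its degrees of freedom, and the sample size. The sketch below uses the common point-estimate formula sqrt(max(χ² − df, 0) / (df × (N − 1))) and assumes that the total of 12,062 participants served as N in Stage 2; both the formula variant and the choice of N are assumptions rather than details confirmed by the text.

```python
import math

def rmsea(chi2, df, n):
    """RMSEA point estimate from a model chi-square, its degrees of
    freedom, and the sample size (N - 1 in the denominator)."""
    return math.sqrt(max(chi2 - df, 0.0) / (df * (n - 1)))

# Values reported for Model A; N is assumed to be the pooled sample size.
estimate = rmsea(22.15, 16, 12062)
print(round(estimate, 3))  # 0.006
```

Under these assumptions the estimate is about .0056, which rounds to the reported .006 and lies well below the .06 cutoff used to judge model fit.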
Regarding the primary finding of the MASEM approach, a larger amount of variance in L2 reading comprehension could be attributed to both L2 decoding and comprehension skills in the SVR model with the indirect effects of metalinguistic skills compared to the model without metalinguistic skills. This coefficient of determination was comparable to that found in previous MASEM studies on L1 (53%; Peng et al., Reference Peng, Lee, Luo, Li, Joshi and Tao2021) and L2 (61%; Lee et al., Reference Lee, Jung and Lee2022) reading comprehension. Consistent with the 2014 and 2022 studies, the present MASEM results reinforce the conclusion that L2 comprehension skills contribute more substantially to L2 reading comprehension than L2 decoding skills.
Limitations and future directions
The present replication has several limitations. First, it is necessary to investigate the complicated causes of the failure to replicate certain results of the 2022 study. The inherent flexibility in dataset creation can pose challenges in replicating meta-analytic results (Norouzian, Reference Norouzian2021). Moreover, both the present and 2022 studies may oversimplify the intricate outcomes by either discarding or averaging correlations from primary studies with longitudinal designs or multiple measurements (see also Norouzian & Bui, Reference Norouzian and Bui2023). While ideal solutions may involve leveraging modern approaches to meta-analysis and MASEM, such capabilities are not currently available. A promising avenue for improvement is to strive for transparency in meta-analytic approaches and for open-science practices in generating and reporting results, which facilitate secondary analysis (Marsden & Plonsky, Reference Marsden, Plonsky, Gudmestad and Edmonds2018). Although In’nami, Mizumoto, et al. (Reference In’nami, Mizumoto, Plonsky and Koizumi2022) advocate for computationally reproducible research that involves sharing supplementary information such as raw data, analysis code, and output files for reanalysis and evaluation of statistical results, it is essential to consider the other three aspects of reproducibility proposed by Weissgerber et al. (Reference Weissgerber, Brunmair and Rummer2021) for the best reporting practice in meta-analyses.
In addition, for the MASEM results, including a larger number of samples is desirable to establish a stable and precise SVR model for L2. The use of the 2022 study dataset not only supports the applicability of SVR to L2 readers as in Lee et al. (Reference Lee, Jung and Lee2022), but also expands our understanding of the structural relationships between cognitive and metacognitive skills in L2 reading proficiency. Unfortunately, due to the limited availability of samples, the current MASEM model could not incorporate the effects of moderator variables (e.g., age, language setting, L1-L2 language distance, L1-L2 script distance, and L2 proficiency). While transcribing correlation coefficients from primary studies, it was observed that in some cases, only the correlation between L2 reading comprehension and target variables was reported, omitting correlations among the variables. Future studies should present a correlation matrix between L2 reading comprehension and its components, even if some components are not the primary focus theoretically or practically. Specifically, as shown in Table 3, the number of correlations between L2 decoding fluency and listening comprehension was relatively limited, despite being central to the SVR model. It is also necessary to explore correlations between L2 listening comprehension and L2 phonological awareness and morphological knowledge when applying Peng et al.’s (Reference Peng, Lee, Luo, Li, Joshi and Tao2021) SVR model. Moreover, more research on L2 orthographic knowledge is urgently required to examine the moderating effects of L1-L2 script distance on L2 reading comprehension.
Lastly, the SVR model examined in this study does not represent the whole picture of L2 reading. The latest meta-analysis by Lee and Lee (Reference Lee and Lee2023) has proposed an extended simple view of L2 reading that includes intelligence and working memory as cognitive abilities in addition to Peng et al.’s (Reference Peng, Lee, Luo, Li, Joshi and Tao2021) model. Consistent with the findings of the present study, they have also shown the parsimony of the SVR model by confirming that cognitive variables do not uniquely contribute to L2 reading comprehension. However, given the complex factors that influence L2 reading difficulties, future studies need to be systematic in establishing a comprehensive model of L2 reading that can explain reading problems not identified by the SVR model alone (Sparks, Reference Sparks2021). For example, the component model of reading (Joshi & Aaron, Reference Joshi and Aaron2000) incorporates affective and instructional factors that have long been studied in the context of L2 reading (e.g., Hamada & Takaki, Reference Hamada and Takaki2021). Testing the applicability of such reading models to L2 contexts will contribute to the science of the multi-component view of reading.
Implications
Despite these limitations, this study represents the first attempt to replicate the meta-analytic results of L2 research. Our replication makes unique contributions to L2 research in the following ways:
1. Highlighting the importance of secondary analyses for evaluating research reproducibility, which aligns with the open science movement and methodological reforms in L2 research (Marsden & Plonsky, Reference Marsden, Plonsky, Gudmestad and Edmonds2018).
2. Calling for replicating meta-analyses in L2 research because different meta-analysts may propose different procedures to extend and generalize the findings of original meta-analyses.
3. Extending the insights of the SVR model accumulated by individual studies with a large dataset, which will help develop a comprehensive model of L2 reading comprehension, affording a better understanding of the complicated relationships among reading components.
This study also has several theoretical and practical implications for L2 reading. From a theoretical perspective, our findings are consistent with the argument that the SVR model can be applied to L2 as an influential reading model (e.g., Lee et al., Reference Lee, Jung and Lee2022; Lee & Lee, Reference Lee and Lee2023; Sparks, Reference Sparks2021; Verhoeven & van Leeuwe, Reference Verhoeven and van Leeuwe2012). They also support the idea that, in L2 reading proficiency, L2 comprehension skills are stronger predictors than L2 decoding skills. More importantly, L2 metalinguistic skills have an indirect effect on L2 reading comprehension through L2 decoding and comprehension skills, which function as possible mediators. These findings suggest that although the SVR model may be as parsimonious as possible for L2, L2 metalinguistic skills play a significant role in enhancing L2 decoding and comprehension skills for L2 reading comprehension.
For L2 reading instruction, the SVR model is helpful for obtaining information about struggling learners (Sparks, Reference Sparks2021). In the current SVR model (see Figure 4), L2 vocabulary and grammar knowledge, along with L2 listening comprehension, equally explained a large amount of the variance in L2 comprehension skills. These components are crucial for distinguishing successful and struggling L2 readers and should be the focal point of L2 reading instruction. Improving L2 word-reading accuracy is expected to do more to make learners stronger L2 readers than improving word-reading fluency (see also Lee et al., Reference Lee, Jung and Lee2022). As L2 metalinguistic skills underlie L2 decoding skills and are associated with L2 comprehension skills, phonological and morphological (and orthographic) knowledge can be taught explicitly and learned implicitly through L2 comprehension activities for efficient word reading (Nassaji, Reference Nassaji2014).
Conclusions
We replicated Jeon and Yamashita’s (Reference Jeon, Yamashita, In’nami and Jeon2022) meta-analytic findings regarding the bivariate relationships between L2 reading comprehension and its components. The previous results that were successfully reproduced in terms of correlational strength included L2 decoding, L2 morphological, orthographic, and grammar knowledge, L2 listening comprehension, L1 reading comprehension, working memory, and metacognition. Through a complete secondary analysis, this replication improved the statistical robustness concerning the significance of these correlates in L2 reading comprehension. The MASEM revealed that metalinguistic skills were associated with L2 decoding and comprehension skills but did not directly contribute to L2 reading comprehension, supporting the parsimonious structure of SVR, even in the context of L2 reading. The finding that L2 comprehension skills were stronger predictors of L2 reading comprehension than L2 decoding skills aligns with systematic (Jeon & Yamashita, Reference Jeon and Yamashita2014, Reference Jeon, Yamashita, In’nami and Jeon2022) and narrative (Sparks, Reference Sparks2021) inquiries, as well as previous MASEM findings (Lee et al., Reference Lee, Jung and Lee2022; Lee & Lee, Reference Lee and Lee2023).
Supplementary material
The supplementary material for this article can be found at http://doi.org/10.1017/S0272263124000226.
Data availability statement
The dataset, analysis code, and output files used in this study are available from the OSF Data Repository (https://osf.io/jksp8/).
Acknowledgements
This research was supported by the Japan Society for the Promotion of Science Grants-in-Aid for Scientific Research 20H01287 and 22K18579. We would like to thank Dr. Kevin McManus and the anonymous reviewers for their constructive suggestions for improvement.