1. Introduction
Corpus linguistics is perceived to have revolutionised language learning, since learners are encouraged to become researchers who can deduce language rules from fresh, rich, and authentic corpus data (Boulton & Cobb, 2017; Johns, 1991; Sinclair, 1991). Corpora have been used extensively in nearly all branches of language studies, including lexicography, grammar, semantics, pragmatics, language variation/change, contrastive and translation studies, and discourse analysis (McEnery & Xiao, 2011). Accordingly, corpus linguistics has greatly advanced our understanding of language analysis, shifting it from intuition-based accounts of language to corpus-based inductive observation. However, its influence on language teaching remains marginal (Boulton, 2017; Callies, 2019; Ma, Tang & Lin, 2021).
Despite continuous calls for the integration of corpus training into teacher education programmes (Breyer, 2009; Callies, 2019; Chen, Flowerdew & Anthony, 2019; Farr, 2008; Leńko-Szymańska, 2017), a key issue remains to be solved – what skills and knowledge teachers should develop in the training. This study argues that corpus literacy (CL) needs to be developed before teachers can develop the necessary pedagogical skills to integrate corpora into their teaching. Building on the work of Mukherjee (2006) and Callies (2019), we proposed a five-component CL framework and collected survey data from 183 teachers and student teachers to test whether improving CL would increase teacher intention to use corpora in their teaching. No previously published study has conducted a comprehensive survey to verify the specific skills and knowledge involved in the CL construct. In addition, we explored teacher-perceived advantages and limitations of integrating corpora into classroom teaching.
A structural equation modelling (SEM) approach was adopted to analyse the collected Likert-scale survey data. Further, the teacher-perceived benefits and limitations of corpora collected from open-ended questions were analysed qualitatively. Our study adds empirical evidence to support CL as a key theoretical construct, which contributes to our theoretical understanding of corpus research on teacher training. The findings of this research can also offer practical guidance to various stakeholders (corpus linguists, corpus website developers, and teacher educators) on how to provide effective corpus training for teachers.
2. Literature
2.1 Corpus linguistics in language education
Scholars began using corpus linguistics as a research tool in the 1980s, and educators subsequently discovered its practical applications for language learning and teaching (Johns, 1991; Sinclair, 1991). Leech (1997) outlined two types of corpus applications for language teaching: (1) using corpora during classroom teaching (direct use) and (2) aiding teaching and teacher development through the development of corpus-based references/dictionaries, teaching materials, or testing materials (indirect use). Several decades after the calls to use corpora for teaching, studies have documented evidence of both indirect use, primarily involving reference publishing (Leńko-Szymańska & Boulton, 2015; McEnery & Xiao, 2011), and direct use of integrating corpora into language teaching (Boulton & Cobb, 2017; Boulton & Vyatkina, 2021; Lee, Warschauer & Lee, 2017; Pérez-Paredes, 2019). However, there is insufficient evidence of direct use regarding the professional development of corpus training for in- and pre-service teachers (Callies, 2019; Latif, 2021; Leńko-Szymańska, 2017; Naismith, 2017; Pérez-Paredes, 2022; Schmidt, 2022), particularly those at primary or secondary schools.
While corpus linguistics and corpora use have energised language research in recent decades, few teachers have integrated them into their classroom teaching, partially due to the absence of in- and pre-service teacher training (Boulton, 2017; Breyer, 2009; Callies, 2019; Leńko-Szymańska, 2017). Teachers and students also report difficulties in using corpus technology (Fitzgerald, 2018; Poole, 2022). Students with limited language proficiency often lack sufficient metalinguistic knowledge to formulate corpus search queries (Chang, 2014; Yeh, Liou & Li, 2007; Yoon & Hirvela, 2004). In addition, messy or overwhelming concordance results may have hindered students from locating language features (Chen, 2011; Rodgers, Chambers & Le Baron-Earle, 2011). Similarly, teachers often reported technical problems regarding how to use corpora, such as searching for, analysing, or interpreting concordance lines (Breyer, 2009; Farr, 2008; Leńko-Szymańska, 2014; Zareva, 2017). Furthermore, many teacher participants lacked the requisite knowledge or skills to use corpora independently, even after corpus training (O’Keeffe & Farr, 2003; Tribble, 2015).
2.2 Teacher knowledge and skills for integrating corpora into teaching
Despite the small number of studies on corpus teacher training (Breyer, 2009; Farr, 2008; Heather & Helt, 2012; Leńko-Szymańska, 2017; Zareva, 2017) that demonstrated varying levels of success, a key issue remains for corpus educators: to determine the skills and knowledge that should be focused on in training programmes (Callies, 2016; Heather & Helt, 2012; Mukherjee, 2006). Here, the concept of CL, which was raised by Mukherjee (2006), could be used as a starting point to understand the essential knowledge and skills desired in corpus training. Mukherjee’s CL included four components: (1) a basic understanding of what a corpus is, (2) what one can (and cannot) do with a corpus, (3) how concordances can be analysed, and (4) how one may (or may not) extrapolate general trends in language use from corpus data. Callies (2016) further proposed a searching corpora component – how to search corpora using corpus resources and tools. To develop CL, one needs to work with a corpus and use a concordancer, which allows the user to enter a keyword/phrase and obtain many search results formulated as concordance lines. Heather and Helt (2012) formally defined CL as the ability to use corpus resources/technology to carry out language analysis.
Different researchers (e.g. Heather & Helt, 2012; Leńko-Szymańska, 2017; Zareva, 2017) have proposed certain knowledge and skills that teachers should develop in corpus training. Although these proposals overlap in part and differ in part, a closer examination reveals that they all fall into two broad areas: (1) CL that is similar to that proposed by Mukherjee (2006) and (2) some pedagogy-related skills that aim to integrate corpora into language classrooms (Breyer, 2009; Leńko-Szymańska, 2017; Lin & Lee, 2015). In Ma et al. (2021), the latter is formally referred to as corpus-based language pedagogy (CBLP), meaning “the ability to integrate corpus linguistics technology into classroom language pedagogy to facilitate language teaching” (p. 2). In our view, CL relates primarily to content knowledge, whereas CBLP is more concerned with pedagogical content knowledge, based on Shulman’s (1987) work.
Such a differentiation of the two dimensions of teacher knowledge is supported by recent empirical work that explored corpus teacher training. For example, the study conducted by Leńko-Szymańska (2017) included both corpus linguistic and pedagogical skills for pre-service teachers. Further, in the study by Çalışkan and Kuru Gönen (2018), teachers initially experienced using corpus as a learning tool and then as a teaching tool. The pre-service teachers in the study by Ebrahimi and Faghih (2017) were first instructed on the use of corpora and then on how to design pedagogical applications of corpora. Such a sequence of training content suggests that CL precedes CBLP; that is, the pedagogical applications of corpora in classroom teaching depend on the CL acquired by the trainees. Mindful of the complex and intriguing relationship between CL and CBLP, this research focused on CL and aimed to identify its key components. This could provide a sound foundation for teachers to develop their pedagogical skills further to apply corpora in teaching.
2.3 Empirical studies measuring teachers’ CL
With regard to the measurement of teacher CL, the predominant approach is to use qualitative methods, including interviews, open-ended questions, reflections, and lesson analysis (e.g. Breyer, 2009; Heather & Helt, 2012; Latif, 2021; Leńko-Szymańska, 2017; Zareva, 2017). A few studies have also used survey-based approaches, including both closed- and open-ended questions (Callies, 2019; Farr, 2008; Leńko-Szymańska, 2014).
The majority of the aforementioned studies focused on measuring the attitudes and perceptions of teachers and student teachers towards the use of corpora in learning and teaching. For example, the experiences of corpus training received by participants have been examined (Breyer, 2009; Ebrahimi & Faghih, 2017; Leńko-Szymańska, 2014; Zareva, 2017). Participant attitudes towards using corpus resources and tools for learning and teaching purposes have also been investigated extensively (Çalışkan & Kuru Gönen, 2018; Farr, 2008; Latif, 2021; Mukherjee, 2006; Naismith, 2017).
Recent research has indicated that corpus skills are multifaceted, including understanding and working with concordance lines (e.g. Charles, 2018; Pérez-Paredes & Mark, 2021) and building corpora (e.g. Charles, 2012, 2014; Charles & Hadley, 2022). Corpora are often built for classroom-specific contexts of language teaching or learning, especially for ESP/EAP instruction at tertiary level where corpora can be tailored to help university lecturers identify student-specific language use, such as analysing ESP/EAP writing tasks (Ackerley, 2021; Chang, 2014; Charles, 2012). Unfortunately, the majority of teachers and student teachers working in school settings have little corpus knowledge, and we believe it is essential to help them build a basic CL, as outlined by Mukherjee (2006) and Callies (2016). Building corpora is considered an advanced corpus skill that can be focused on at a later stage, when school teachers have gained familiarity and developed confidence with their use.
Since the majority of our participants are from school settings, we focus on and measure all the basic CL components outlined by Mukherjee (2006) and Callies (2016): understanding, advantages and disadvantages, analysis, and search skills of corpora. Among these components, understanding is a relatively easy component to measure, where relevant questions (open-ended or Likert scale) can be designed to measure the level of understanding of corpora before and after training (Leńko-Szymańska, 2014; Mukherjee, 2006; Özbay & Kayaoğlu, 2015). Occasionally, the perceived advantages and disadvantages of corpora have also been explored (Callies, 2019; Lin & Lee, 2015; Mukherjee, 2006; Zareva, 2017). Similarly, participants’ perceived confidence (or challenges) in corpus analysis and their interpretation skills of corpus data have also been examined (Breyer, 2009; Leńko-Szymańska, 2014; Zareva, 2017).
2.4 Issues to be addressed and research questions
Past studies (often with small samples) have identified one or two CL subskills but have not comprehensively explicated a full system of critical CL skills. Leńko-Szymańska (2014) measured the understanding component of CL by asking 13 trainee teachers to define two corpus-related terms, such as concordance and concordancer, via open questions. Similarly, Callies (2019) asked 26 teachers to complete a survey with Likert-type scales to investigate three aspects of corpus use in the classroom: their familiarity with corpus applications (entailing understanding), their actual use of corpora in their language teaching, and the advantages of corpus use. Using qualitative data, Heather and Helt (2012) examined four CL components among six student teachers: understanding, advantages, limitations, and analysis. To examine all CL subskills thoroughly, this larger study surveys many participants regarding all five components of CL: understanding, search, analysis, advantages, and limitations.
In addition to the above observations, teacher training studies often include training in using corpora for both learning and teaching purposes. Hence, theoretically, participants are expected to use (or at least have the intention to use) corpora in their teaching after the training. Empirically, some small-scale studies (working with a small number of participants and using primarily qualitative data) have revealed that teachers or student teachers were less motivated to apply corpora in the classroom after the corpus training (Ebrahimi & Faghih, 2017; Latif, 2021; Lin & Lee, 2015; Zareva, 2017). Moreover, similar to measuring CL, the intention to use corpora in classroom teaching has rarely been investigated systematically. In this study, we work with a large sample of teachers/student teachers and investigate their intentions to adopt corpora in classroom teaching after corpus training.
Finally, Heather and Helt’s (2012) findings revealed that participants who had developed good CL were more likely to identify the limitations of corpus linguistics and propose possible solutions. However, relatively few studies have evaluated the limitations of corpora among teachers, partly because an examination of corpus limitations is perceived to be most challenging during corpus teacher training (Zareva, 2017).
To address the aforementioned issues, we adopted a Likert-type scale survey to measure teachers’ self-reported CL after receiving corpus training. Our large survey study of 183 participants aims to verify empirically the five subskills of CL proposed by Mukherjee (2006) and Callies (2019) to gather evidence regarding the structure of the CL as an important theoretical construct, thus advancing our understanding regarding corpus research on teacher education. Our study may also shed light on how to provide effective corpus training for teachers. The following research questions guide this study:
1. What are the key subskills involved in developing CL?
2. Do participants with greater CL display a stronger intention to integrate corpora into classroom teaching?
3. What do participants perceive as the benefits and limitations of exploiting corpora in classroom teaching?
3. Methods
3.1 Context and participants
Using convenience sampling, corpus training was provided for 410 teachers and student teachers who were enrolled in courses offered by a university in Hong Kong from 2018 to 2020. Since this was an exploratory study, no control was imposed on the sample distribution. At the end of the corpus-based training, the participants were invited to complete an online survey to measure their perceived CL and their intention to integrate corpora into classroom teaching. Since the survey was voluntary and conducted outside the training, only 183 participants completed the survey, representing a response rate of approximately 45%. Of these participants, most were of Chinese origin (either from Hong Kong or mainland China), while approximately 5% were international students from other countries (such as Japan, the Philippines, the United States, or Canada). Approximately half (n = 94) were English teachers, while the other half were a mixture of undergraduate and postgraduate student teachers (n = 89). All student teachers were pursuing programmes related to English language education and were expecting to become English teachers in primary or secondary schools after graduation. A large proportion of the in-service teachers (59%) worked in secondary schools, some (30%) worked in primary schools, and a small proportion (11%) worked in tertiary institutions. Accordingly, the majority of the teachers and student teachers had similar needs in terms of teaching English in either secondary or primary schools, justifying the decision to provide them with similar corpus training. Teachers with doctoral degrees were often more familiar with corpus technology than other teachers (Pérez-Paredes, Ordoñana Guillamón & Aguado Jiménez, 2018). Although we did not collect data on participants’ educational background, our personal knowledge of them suggests that less than 3% of the participants held a doctoral degree.
Our pre-survey also showed that, prior to attending our corpus training, 50% had never heard of a corpus, and only 11% had more than one year of experience with corpora. Hence, most of our participants had limited CL knowledge or skills. Please see Table 1 for a list of the participants’ demographic information.
3.2 Procedure
These participants received corpus training following one of two procedures: integrated into one course on vocabulary learning and teaching for student teachers or in stand-alone training programmes offered to teachers. All the training was conducted by the first researcher of this article, who is a CALL researcher specialising in corpus technology and has been working on practical corpora applications with pre- and in-service English teachers during the past few years. In both cases, the participants underwent similar training in terms of length (approximately 4 weeks) and amount of learning content. Since the majority of our participants were already teaching (or would become teachers) at primary or secondary schools, the corpus training focused on corpus resources and skills suitable for these two settings. Table 2 provides details of the training procedure.
Although the training covered both CL and pedagogical skills, we focus on CL in this study. With regard to the CL training, workshops and independent online learning tasks were provided to help the participants develop their understanding of basic concepts relating to corpora and concordance lines and learn how to search free online corpora (e.g. Lextutor: https://www.lextutor.ca/conc/; Word and Phrase: https://www.wordandphrase.info/; and COCA: http://corpus.byu.edu/coca/). First, corpus searches were demonstrated, and then participants were provided with hands-on experience in performing various corpus searches. Taking COCA as an example, all the essential search functions (including lists, wildcards, parts of speech, collocates, and comparisons) were introduced, and the participants were then given opportunities to practise using the search functions. Along with learning how to use various search functions for different corpora, the participants were guided in analysing and summarising the associated language patterns. Finally, the participants were provided with opportunities to reflect on the advantages and limitations of corpora usage in relation to teaching and learning.
At the end of the training, a survey containing both Likert-scale and open-ended questions was administered to the participants. The data collected from the 183 participants who responded to our survey formed the source for data analysis.
3.3 Constructing CL and developing measurement items
Mukherjee (2006) suggested that CL comprised four components: (a) understanding of corpora, (b) knowing what corpora use can and cannot achieve, (c) knowing how to analyse corpus data, and (d) knowing how to draw conclusions about language use trends. For practical reasons, we divided the second component into two: the limitations and advantages of using corpora. Being able to draw conclusions about language use upon observations of corpus data is a salient advantage of corpora use. Therefore, we decided to integrate this component with the advantages component. When exploring any corpus data, the key corpus tool is the concordancer. However, as indicated by Naismith (2017), “use of concordancers also typically requires some expertise and training, and the results [of concordance lines] may be challenging to interpret” (p. 275). We believe that teachers and students should be provided with adequate search skills to enable them to perform corpus searches and to observe and analyse language use. Callies (2016) proposed an additional component (searching corpora), which we further modified and termed corpora search skills. After proposing the five subskills to be included in CL, the next step was to develop relevant scale items targeting each of the key CL components. This resulted in a total of 16 items across five components of CL (see Table 3 for detailed items).
Participants could answer on a 6-point Likert scale ranging from “strongly disagree” to “strongly agree” for all the items. In addition, the following two open-ended questions were included in the survey to collect teacher-perceived benefits and limitations of corpora: (1) In your view, what are the main advantages of corpus knowledge for your language learning and teaching? and (2) What are the limitations of using corpus resources for language learning and teaching?
We initially piloted these items with 40 student teachers and then revised the wording of a few items after receiving student feedback, and all 16 items were retained (see Table 3 for the finalised items).
Although the development of CL may not result in teachers immediately integrating corpora into the classroom, we postulated that the development of CL could have a positive influence on teachers’ intention to integrate corpora into classroom teaching. Following the concept of “Intention to Use Technology” in the technology acceptance model (TAM) (Davis, 1989), intention to integrate corpora into classroom teaching was operationalised as a holistic propensity to integrate them into classroom teaching. Accordingly, our instrument also had to consider developing scale items with respect to the intention to integrate corpora into classroom teaching (ITICT). For this purpose, we added three items to form this component (see Table 4 for details).
In essence, the survey instruments were developed to measure two areas of corpus training: (1) the five components to be included in CL development and (2) whether CL can result in ITICT.
3.4 Data analysis
3.4.1 Item analysis
The corrected item-total correlation method was used for item analysis. An item was considered weakly related to its subscale if its corrected item-total correlation was below 0.40 (Shieh & Wu, 2016). For a good scale, Ferketich (1991) recommended that corrected item-total correlations should range between 0.30 and 0.70. The results of our item analyses revealed very good values for all items in the six components (five for CL and one for ITICT), ranging from 0.55 to 0.89.
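The item analysis described above can be illustrated with a minimal sketch. It assumes that the corrected item-total correlation is the Pearson correlation between an item and the sum of the remaining items in the same subscale, and it uses the 0.40 cut-off mentioned in the text. The response matrix and the names `corrected_item_total` and `toy_subscale` are hypothetical and are not the study's data or code.

```python
from math import sqrt


def pearson(x, y):
    """Pearson correlation between two equal-length sequences."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    return cov / sqrt(var_x * var_y)


def corrected_item_total(responses):
    """Correlate each item with the sum of the other items in its subscale.

    `responses` holds one row per respondent; each row is a list of item scores.
    """
    n_items = len(responses[0])
    result = []
    for j in range(n_items):
        item = [row[j] for row in responses]
        rest = [sum(row) - row[j] for row in responses]  # total without item j
        result.append(pearson(item, rest))
    return result


# Hypothetical 6-point Likert responses (5 respondents x 3 items).
toy_subscale = [
    [5, 6, 5],
    [3, 3, 4],
    [4, 5, 4],
    [2, 2, 3],
    [6, 5, 6],
]

correlations = corrected_item_total(toy_subscale)
# Indices of items that fall below the 0.40 cut-off (none in this toy example).
flagged = [j for j, r in enumerate(correlations) if r < 0.40]
```

Removing the item from its own total avoids inflating the correlation, which is why the corrected version is generally preferred for short subscales.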
The next step involved conducting a reliability analysis to test the extent to which the instrument was likely to measure various components of CL consistently. We used Cronbach’s alpha coefficients to evaluate the internal consistency reliability associated with scores derived from a scale. By convention, a Cronbach’s alpha exceeding 0.70 is deemed acceptable.
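As a rough illustration of the reliability check, the sketch below computes Cronbach's alpha from the standard formula, alpha = k / (k - 1) × (1 - sum of item variances / variance of total scores), and compares it with the 0.70 cut-off. The data and the names `cronbach_alpha` and `toy_subscale` are hypothetical; the study's own coefficients were obtained from its survey data and are reported in Table 5.

```python
from statistics import variance


def cronbach_alpha(responses):
    """Cronbach's alpha for one subscale.

    `responses` holds one row per respondent; each row is a list of item scores.
    Sample variances (n - 1 denominator) are used throughout.
    """
    k = len(responses[0])
    item_variances = [variance([row[j] for row in responses]) for j in range(k)]
    total_variance = variance([sum(row) for row in responses])
    return (k / (k - 1)) * (1 - sum(item_variances) / total_variance)


# Hypothetical 6-point Likert responses (5 respondents x 3 items).
toy_subscale = [
    [5, 6, 5],
    [3, 3, 4],
    [4, 5, 4],
    [2, 2, 3],
    [6, 5, 6],
]

alpha = cronbach_alpha(toy_subscale)
is_acceptable = alpha > 0.70  # cut-off stated in the text
```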
3.4.2 Confirmatory factor analysis
To assess the structural validity of the 5-factor scale for CL (see Figure 1) based on the developed items, we conducted a confirmatory factor analysis (CFA) in R (R Core Team, 2021) using Rosseel’s (2012) lavaan package. Each item was specified to load on its respective latent subscale. To evaluate the goodness of fit of the 5-factor model, we examined the chi-squared (χ2) statistic together with the following fit indices: the comparative fit index (CFI), the Tucker–Lewis index (TLI), the root-mean-square error of approximation (RMSEA), and the standardised root-mean-square residual (SRMR). According to Hu and Bentler’s (1995) recommendation, a model with CFI and TLI values > 0.90 and an RMSEA value of < 0.08 fitted well with the data. An SRMR value of < 0.08 was considered to be a good fit and a value of 0.00 was a perfect fit (Hu & Bentler, 1999). The p-value of the χ2 should be > 0.05. However, when the sample size is large – as in the present study – a nonsignificant χ2 may be difficult to obtain (Barrett, 2007).
3.4.3 SEM analysis
After the structural validity was tested, we proceeded to test the SEM using STATA 14 MP to predict the ITICT. Since the factor loadings were already identified by the CFA (see Comrey & Lee, 1992), we used the sum scores of the five factors as observed variables for the latent variable CL. The sum score method is especially applicable when the scales used are still exploratory (DiStefano, Zhu & Mindrila, 2009; Hair, Black, Babin, Anderson & Tatham, 2006), and exploratory studies generally find this approach acceptable (Tabachnick & Fidell, 2001). Subsequently, we used the latent variable CL to predict the latent variable regarding ITICT with its three observed variables (see Figure 2).
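The sum score step can be sketched as follows. This is only an illustration under stated assumptions: the item labels and the number of items per subscale in `SUBSCALE_ITEMS` are hypothetical (the actual items are listed in Table 3), and the respondent record is invented.

```python
# Hypothetical item-to-subscale mapping; the real item labels are in Table 3.
SUBSCALE_ITEMS = {
    "understanding": ["U1", "U3"],
    "search": ["S1", "S2", "S3"],
    "analysis": ["A1", "A2", "A3"],
    "advantages": ["AD1", "AD2", "AD3"],
    "limitations": ["L1", "L2"],
}


def sum_scores(respondent):
    """Collapse one respondent's item scores into one sum score per subscale."""
    return {
        subscale: sum(respondent[item] for item in items)
        for subscale, items in SUBSCALE_ITEMS.items()
    }


# Invented 6-point Likert answers for a single respondent.
example_respondent = {
    "U1": 5, "U3": 4,
    "S1": 4, "S2": 5, "S3": 3,
    "A1": 4, "A2": 4, "A3": 5,
    "AD1": 6, "AD2": 5, "AD3": 5,
    "L1": 3, "L2": 4,
}

scores = sum_scores(example_respondent)
```

Each of the five resulting sum scores would then serve as one observed indicator of the latent variable CL in the structural model.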
3.4.4 Analysis of corpus-based teaching activities
Our survey only measured the participants’ perceived CL, not their actual performance. To compensate for this limitation, we analysed some of the CL skills demonstrated in corpus-based lesson activities designed by the participants. However, for brevity, only one corpus-based lesson was selected for analysis. This analysis focused on whether the designed activities reflected some of the five CL components outlined in Table 3.
3.4.5 Open-ended data analysis
All the answers to the two open-ended questions were collected and entered into Excel spreadsheets for the content analysis. The analysis proceeded in two steps (coding and thematic analysis) following the guidance provided by Creswell and Guetterman (2019) and Braun and Clarke (2006). The data were coded independently by two of the authors, and the inter-coder reliability reached 88%. All disputed cases were resolved through discussion and agreement, after which the codes were combined to form a number of themes.
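The text does not state which reliability index was used. The sketch below assumes simple percent agreement, that is, the share of responses to which both coders assigned the same code. The code labels and the names `percent_agreement`, `coder_1`, and `coder_2` are hypothetical.

```python
def percent_agreement(codes_a, codes_b):
    """Percentage of responses that two coders labelled identically."""
    if len(codes_a) != len(codes_b):
        raise ValueError("Both coders must code the same set of responses.")
    matches = sum(a == b for a, b in zip(codes_a, codes_b))
    return 100 * matches / len(codes_a)


# Invented codes assigned by two coders to the same eight open-ended answers.
coder_1 = ["authenticity", "collocation", "time", "authenticity",
           "difficulty", "collocation", "time", "difficulty"]
coder_2 = ["authenticity", "collocation", "time", "authenticity",
           "difficulty", "collocation", "difficulty", "difficulty"]

agreement = percent_agreement(coder_1, coder_2)
# Positions where the coders disagree; these would be resolved through discussion.
disputed = [i for i, (a, b) in enumerate(zip(coder_1, coder_2)) if a != b]
```

Percent agreement does not correct for chance agreement; a chance-corrected index such as Cohen's kappa is a common alternative when the number of codes is small.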
4. Results
4.1 Likert-scale data
The results of our reliability analysis, measured by Cronbach’s alpha for each subscale, indicated high reliability (0.775–0.954). As a result, we retained all 19 items for the overall scale. The reliability results for the final subscales are listed in Table 5.
To test the structural validity of the scale, we tested the 5-factor structure using all the items. This 5-factor structure was based on the five theoretical components of CL we constructed based on the work of Mukherjee (2006) and Callies (2016). This model included all the items (see Model 1 in Table 6). Model 1 had the following indices: CFI = 0.931, TLI = 0.914, and RMSEA = 0.098, with a 90% confidence interval of 0.087–0.110 and an SRMR of 0.06. Since two items (U2 and L3) had cross-loadings of ≥ 0.45, they fitted the model poorly (see Brown, 2015; Tabachnick & Fidell, 2012). Accordingly, we removed these two items and ran the CFA again, which resulted in Model 2 (see Figure 1). Model 2 improved on all the fit indices: CFI = 0.956, TLI = 0.943, and RMSEA = 0.086, with a 90% confidence interval of 0.072–0.100 and an SRMR of 0.03. This suggested that the 5-factor model without items U2 and L3 had a better fit than Model 1. To reiterate, while both models had significant χ2 values considering the sample size (Barrett, 2007), Model 2 had better fit indices and was thus more desirable.
Note. CFI = comparative fit index; TLI = Tucker–Lewis index; RMSEA = root-mean-square error of approximation; CI = confidence interval; SRMR = standardised root-mean-square residual.
After validating the 5-factor structure of the CL scale, we tested whether CL would predict the ITICT using an SEM. To evaluate the model fit of the SEM, we examined the CFI, TLI, RMSEA, and SRMR. The SEM model (see Figure 2) had a good fit with the data, χ2(19, N = 181) = 39.89, p < 0.05; CFI = 0.985, TLI = 0.977; RMSEA = 0.078 (90% CI = 0.00–0.04); SRMR = 0.023. The effect of CL in predicting the ITICT in the SEM model was positive and significant (B = 0.41, p < 0.001). The full model demonstrated that 95.71% of the variance of ITICT could be predicted by all the entered observed and latent variables of CL. Specifically, the latent variable CL significantly predicted 92% of the variance of ITICT.
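For readers who wish to see how the reported RMSEA relates to the χ2 statistic, the following sketch applies one common formula for the point estimate, RMSEA = sqrt(max(χ2 − df, 0) / (df × (N − 1))), to the values reported above. It is an illustration only: some software divides by N rather than N − 1, and the function name `rmsea` is hypothetical rather than part of the authors' analysis.

```python
from math import sqrt


def rmsea(chi_square, df, n):
    """Point estimate of RMSEA from a chi-squared statistic.

    Uses the (n - 1) convention; the estimate is truncated at zero when
    the chi-squared value is smaller than the degrees of freedom.
    """
    return sqrt(max(chi_square - df, 0.0) / (df * (n - 1)))


# Values reported for the SEM: chi-squared(19, N = 181) = 39.89.
estimate = rmsea(39.89, 19, 181)
print(round(estimate, 3))  # 0.078
```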
4.2 Corpus-based teaching activities
The following corpus-based teaching activities were selected from a lesson co-designed by two student teachers. The aim of the lesson was to help Chinese junior secondary 1 students differentiate three easily confused English verbs (“read”, “watch”, and “see”), because these can be approximately translated into the common Chinese word ‘看’ (kan). The following analysis focuses on the five previously identified CL skills.
4.2.1 Understanding
Overall, the lesson plan (see supplementary material) featured a series of well-designed corpus-based activities using COCA and included two relevant stages: “hands-on corpus search by students” and “inductive discovery by students”. The two trainees demonstrated a good understanding of corpus knowledge by integrating direct corpus consultations into their teaching steps. In addition, they designed a step-by-step COCA search guide (see Figures 3–5 for some excerpts). This demonstrated their high motivation toward adopting corpus technology as a teaching tool in the classroom.
4.2.2 Search
The two trainees designed an activity in which students used the “collocates” function on COCA. After watching a demonstration, the students were asked to follow the instructions (Figure 3) to search for nouns that collocate with “read”, “watch”, and “see”.
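The idea behind a collocate search can be illustrated with a small window-based count. The sketch below is only an approximation for illustration: it is not COCA's implementation, the sentences are invented rather than taken from any corpus, and the hand-made noun list stands in for the part-of-speech tagging that a real corpus tool would provide.

```python
from collections import Counter

# Invented toy sentences (assumption: not real corpus data).
TOY_SENTENCES = [
    "i read a book every night",
    "she read the newspaper",
    "he read a book about birds",
    "we watch television after dinner",
    "they watch a match on television",
    "i see a bird in the tree",
    "can you see the bird",
]

# Hand-made noun list standing in for part-of-speech tags (assumption).
TOY_NOUNS = {"book", "newspaper", "television", "match", "bird", "tree"}


def noun_collocates(sentences, node, nouns, span=2):
    """Count nouns found within `span` tokens to the right of `node`."""
    counts = Counter()
    for sentence in sentences:
        tokens = sentence.lower().split()
        for i, token in enumerate(tokens):
            if token != node:
                continue
            # Look only at the window immediately after the node word.
            for neighbour in tokens[i + 1:i + 1 + span]:
                if neighbour in nouns:
                    counts[neighbour] += 1
    return counts


results = {
    verb: noun_collocates(TOY_SENTENCES, verb, TOY_NOUNS)
    for verb in ("read", "watch", "see")
}
for verb, counts in results.items():
    print(verb, dict(counts))
```

Even on this toy data, each verb attracts a different set of nouns, which is the kind of contrast the students were guided to notice.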
4.2.3 Analysis
The two trainees understood how to encourage students to analyse the concordance results from COCA and guided them to understand the differences between the three verbs (see Figure 4). Towards the end, the students were also provided with a “fill-in-the-blanks” exercise as a way of summarising the meaning and use of the three words (see Figure 5).
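Concordance lines of the kind analysed in this activity follow a keyword-in-context (KWIC) layout, with the search word aligned in the centre. The following is a minimal sketch of how such lines can be generated; the sentences are invented for illustration, and the function names are ours rather than those of any corpus tool.

```python
# Invented toy sentences (assumption: not real corpus data).
TOY_SENTENCES = [
    "we watch television after dinner",
    "they watch a match on television",
    "i see a bird in the tree",
]


def kwic(sentences, node, width=3):
    """Return (left context, node, right context) for every hit of `node`."""
    lines = []
    for sentence in sentences:
        tokens = sentence.lower().split()
        for i, token in enumerate(tokens):
            if token == node:
                left = " ".join(tokens[max(0, i - width):i])
                right = " ".join(tokens[i + 1:i + 1 + width])
                lines.append((left, token, right))
    return lines


def format_kwic(lines):
    """Right-align the left context so that the node words line up."""
    pad = max((len(left) for left, _, _ in lines), default=0)
    return [f"{left:>{pad}}  [{node}]  {right}" for left, node, right in lines]


watch_lines = kwic(TOY_SENTENCES, "watch")
for line in format_kwic(watch_lines):
    print(line)
```

Aligning the node word makes it easier for learners to scan the words to its right and left for recurring patterns.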
4.2.4 Advantages and limitations
The two trainees took advantage of the collocate function of COCA when teaching collocations of the three target words in a series of coherent activities, as shown previously. In addition, they were aware of the difficulties faced by junior secondary school learners when summarising language use patterns from concordance lines. Therefore, they tried to integrate some interactive and non-corpus activities into the corpus-based lesson. For example, before asking students to summarise the word meanings and uses (Figure 5), they designed a pair-work activity (see Figure 6) in which students used paper slips to match nouns/noun phrases (e.g. recipe, letters, television, and teacher’s angry face) with the three target verbs. The activity shown in Figure 6 demonstrates the two student teachers’ awareness of the limitations of corpus data and their attempts to ameliorate the situation by scaffolding student learning with interactive, non-corpus resources, as recommended in Ma et al. (2021).
Similar to the studies by Ma et al. (2021, 2022) and Crosthwaite et al. (2021), the analysis of the selected lesson activities indicates that our trainees had developed a good level of CL skills. Moreover, they had a good understanding of corpus technology, tried to integrate it into their lesson plan, and could take advantage of corpus tools and search functions to develop their lesson materials.
4.3 Open-ended question data
The qualitative data collected from the two open-ended questions were analysed to identify the teacher-perceived benefits and limitations of corpora use. Thematic analysis generated several themes, and the results are shown in Table 7.
4.3.1 Benefits
Five key themes emerged regarding the benefits of corpora. The first was (1) access to authentic data, with both teachers and learners being able to access authentic language data and see how target words/phrases are naturally used in different contexts.
The second advantage was (2) promotion of autonomous learning. For learners, corpora can help in discovering, summarising, or self-correcting language patterns/usage in language learning. For teachers, they can help to encourage student independent/inductive/discovery learning through hands-on corpus searches.
The third advantage was (3) opportunity for learning and teaching collocations, and the fourth advantage was (4) learning/teaching difficult or easily confused language/lexical items. Learners can learn confusing lexical pairs and enhance their accuracy of word choice in academic writing. For teachers, corpora can provide evidence that enables them to teach confusing lexical pairs and can help them confirm their intuitive language use, as revealed by the following statement:
I can use corpora to check my understanding before I deliver a course to students. It also enables learners to enhance the accuracy of their word choices. (Open-ended questions, data 4)
The last advantage was (5) using corpus resources for designing teaching activities:
We can design appropriate vocabulary teaching activities, which is very helpful. (Open-ended questions, data 5)
The results indicated that corpora can be of great benefit to both teachers and learners, reflecting their potential to enhance language learning and, to a lesser extent, language teaching. The most frequently mentioned benefits were the authenticity of the corpus data and the promotion of autonomous learning through corpus use. Next came the learning/teaching of linguistic items, such as collocations, lexis, or grammar. Comparatively few teacher participants perceived the benefits of corpus resources for designing teaching activities.
4.3.2 Limitations
Five main limitations regarding the use of corpus resources for language teaching and learning emerged from the data analysis. The first was (1) learning difficulty for lower-level students in summarising language patterns. Language data sampled in corpora may be too difficult for learners (especially those with low proficiency) to summarise language use patterns without a teacher’s guidance, as the following response shows:
It may be difficult for beginners to draw conclusions from their observations without guidance. Teachers should prepare well to help learners to draw conclusions, which is essential. (Open-ended questions, data 6)
The analysis of the corpus-based teaching activities designed by the two student teachers demonstrated their awareness of this limitation. Accordingly, they tried to reduce the learning difficulties of the low-level learners by designing some scaffolding activities to facilitate an inductive summary of the language patterns from concordance lines.
The second limitation was that corpus use is (2) time-consuming for both learners and teachers. The third limitation was (3) a lack of access to a computer and limited (pedagogical) ICT skills. The fourth limitation was (4) teachers’ difficulty in selecting and modifying appropriate texts. Teachers need to filter appropriate and grammatically accurate texts from the corpus to prepare students for lessons. The final limitation was (5) teachers’ difficulty in integrating corpora with other teaching resources:
I find it difficult to integrate corpus-based activities with other resources. (Open-ended questions, data 10)
Some of the limitations perceived by the teachers aligned with the reasons that previous studies have given for teachers’ reluctance to use corpora: it can be time-consuming for learners to analyse language data and for teachers to select appropriate language data, teachers may lack confidence and corpus-related ICT skills, and corpus use may be incompatible with their already packed teaching schedules.
5. Discussion and implications
The CFA validated the five key factors on which CL is built, and the SEM demonstrated that teachers with higher CL tended to have stronger intentions to integrate corpora into classroom teaching. The analysis of the corpus-based teaching activities included in one selected lesson provided a snapshot of the trainees’ performance data on CL skills. The analysis of teacher-perceived benefits confirmed the potential of corpora for language learning/teaching, while the perceived limitations highlighted teachers’ reluctance to explore corpus resources as a direct teaching tool.
5.1 Key factors for CL
Our study proposed and confirmed five subskills for CL, advancing our understanding of CL as a theoretical concept that lays the foundation for corpus teacher training. Our research also measured CL systematically using both Likert-scale and open questions, making important contributions to the measurement of CL. Building on Mukherjee’s (2006) four-component CL, we empirically validated Callies’ (2016) proposal that search skills should be included in CL. The newly added search skills turned out to be the most important factor for acquiring CL, since the path coefficient for search skills had the highest value of 0.93 (see Figure 2). Essentially, CL is “a bundle of complex skills conceived of as the ability to use the tools and technology of corpus linguistics” (Callies, 2016: 391) to enhance language learning and teaching. The key to exploring any corpus data is to learn how to use concordancers to perform various corpus searches and to generate concordance lines for observations. The advantages factor of using corpora also yielded a high path coefficient of 0.91. This factor was similar to the often quoted “perceived usefulness” in the TAM (Davis, 1989), which is a key component for determining an individual’s attitude towards the use of technology and their acceptance behaviour. If teachers clearly perceive the usefulness (or advantages) of corpora, this would be a driving force in their decision to learn and make use of corpora to enhance their language teaching. The importance of the search and analysis skills is supported by the analysis of the corpus-based teaching activities designed by our trainees (see above).
The two factors of understanding of corpora and analysis of corpus data also had relatively high path coefficients (0.83 and 0.82, respectively). A basic understanding of corpora is a prerequisite, providing motivation for using corpus tools and technology. Moreover, knowing how to analyse corpus data (especially concordance lines) is an essential skill that needs to be mastered. This skill facilitates manipulating corpus tools and extrapolating meaning and usage from the authentic language examples provided by the corpora. The analysis of the lesson designed by our trainees also demonstrated the importance of understanding and analysing corpus data with regard to developing essential CL.
At 0.71, the coefficient for the limitations of using corpora was slightly lower than those for the other four factors of CL; however, it remained statistically significant (p < 0.001). No technology is perfect (including corpora), and understanding the limitations of corpora was one of the most challenging aspects of CL development (Zareva, 2017). In addition, greater awareness of the limitations was associated with stronger CL development among teacher trainees (Heather & Helt, 2012). In this sense, we were pleased to establish that some of our trainees were aware of the limitations of corpus data and tried to use other means to compensate for these limitations when designing their corpus-based lesson materials.
5.2 The relationship between CL and teachers’ intention to integrate corpora into classroom teaching
Previous research investigating teacher perceptions of corpora and teacher attitudes towards incorporating corpora into their classroom teaching has typically yielded a contradictory picture. While the majority of teachers or student teachers clearly acknowledge the great potential that corpora have for language learning, they are much less willing to use corpora in their classroom teaching (Breyer, 2009; Naismith, 2017; Zareva, 2017). The current study went a step further by investigating teachers’ CL and demonstrated that CL positively predicts teachers’ intention to use corpora in classroom teaching. In other words, those teachers or student teachers with a higher level of CL are more likely to form a positive intention of incorporating corpora into classroom use. Given the high importance of search skills, more relevant training activities should be provided to help teachers gain familiarity with the various corpus search functions available on popular corpus websites. Moreover, enhancing their corpus search skills may also increase their confidence in working with corpora (Heather & Helt, 2012; Naismith, 2017).
5.3 Teacher-perceived benefits and limitations of exploiting corpora in classroom teaching
In this study, the benefits that participants perceived in corpus use in language classrooms were also identified. Participant responses concentrated on five main themes: access to authentic data, learning/teaching collocations, promotion of autonomous learning, learning/teaching difficult language items, and using corpus resources for designing teaching activities. These teacher perceptions aligned with the reported benefits of corpora for language education (Heather & Helt, 2012; Leńko-Szymańska, 2017; McEnery & Xiao, 2011). For example, corpora are perceived to offer a considerable advantage in addressing learners’ collocation difficulties/errors (Fang, Ma & Yan, 2021; McEnery & Xiao, 2011; Tsai, 2019). From a pedagogical perspective, another important advantage of using corpora in language teaching is that they help teachers to promote a more learner-centred, autonomous approach to language learning and teaching (Boulton, 2017). An analysis of these teacher-perceived benefits confirmed the great potential that corpora have for language learning. Finally, the results indicated that teachers were inclined to view corpora more as a learning tool than as a classroom teaching tool. This may be partly due to the perceived limitations of corpora usage.
The results revealed five teacher-perceived limitations. One new finding that has rarely been reported was the teachers’ difficulty in integrating corpora with other resources. Again, this could explain why teachers preferred to use corpora as a learning rather than teaching tool (Breyer, 2009; Leńko-Szymańska, 2017; Naismith, 2017). The literature also presents various reasons why teachers are reluctant to apply corpora in their teaching. These include frustration with the technical problems associated with corpora (Breyer, 2009; Farr, 2008; Naismith, 2017), the heavy workload involved (Leńko-Szymańska, 2017; Lin & Lee, 2015), and a lack of corpora training (Boulton, 2017; Tribble, 2015). The current study extended this understanding by revealing a new reason why teachers are reluctant to employ corpora in classroom teaching: the difficulty of integrating corpora with other resources.
5.4 Pedagogical implications
Based on our results, we propose several pedagogical implications for corpus linguists, corpus website developers, and teacher educators to help them provide successful CL training for teachers or students.
The first concerns what corpus skills should be the focus in CL training. Since the understanding of corpora is usually the first topic to be covered in corpus teacher training, it is assumed that this understanding can be relatively easily acquired. Further, it has been reported that teachers and student teachers are able to identify the benefits of corpora use after their training (e.g. Farr, 2008; Heather & Helt, 2012; Naismith, 2017; Zareva, 2017), which are mainly due to the advantages of corpus linguistics itself. Therefore, this aspect of training would also not be too challenging. The remaining three subfactors (analysis, awareness of limitations, and especially search skills), which are core components of the training, are worthy of greater attention.
The second concerns the sequence of introducing corpus websites to participants. The initial learning of corpora may be quite challenging for teachers and students, as this includes the functions and skills needed to perform essential concordance searches of corpus data, including searches for keywords or collocates (Leńko-Szymańska, 2017; Naismith, 2017; Zareva, 2017). Although the keyword search function is similar for many corpus concordancers, some online corpus websites have their own unique operating systems and different search functions (e.g. Lextutor, BNC, and COCA). We suggest focusing on only one corpus website at a time. Only after the participants have become familiar with the key search functions of a particular corpus should other (similar or different) corpus websites be introduced sequentially.
Third, to facilitate teachers and learners using corpora, corpus website developers may consider using similar search interfaces and simplifying search syntax. This was recently done for the COCA website, where many complicated search syntaxes (previously only understood by corpus linguists) were simplified and reduced to a limited set of straightforward search formulae. This practice is desirable and presents a good example for other corpus websites to follow to encourage increasing numbers of teacher and learner users.
Fourth, corpus analysis skills should also be given sufficient attention in the training. Closely related to search skills, corpus analysis skills may be considered demanding by teachers and student teachers. Zareva (2017) reported that teacher trainees considered the analysis of corpus data time-consuming and that it was not easy to summarise patterns of language use. Further, they encountered difficulties in understanding the information in tables and charts. For these reasons, we suggest that more hands-on corpus practice involving corpus analysis should be provided during training; for example, both Breyer (2009) and Farr (2008) emphasised that more opportunities should be provided for trainees to interact with corpora as learners. From the learner perspective, Heather and Helt (2012) proposed that teacher trainees should learn how to “organize and present concordance data in ways that lead more clearly to autonomous learning for their students” (p. 436).
Finally, helping teachers to become aware of the limitations of corpora usage and to identify alternative solutions are important aspects of CL training. The implication was that the more aware of limitations teachers are, the more likely they are to develop methods of compensating for any limitations. Hence, they would make better use of corpora to facilitate student learning. This implies that more specific training on how to compensate for corpus limitations is needed to develop rounded CL for teachers. In addition, previous research suggests that understanding corpus limitations may be indicative of a high level of CL (Heather & Helt, 2012). Understanding teacher-perceived limitations pertaining to corpora use may serve three purposes. First, knowing about these limitations may help corpus linguists and concordancer developers to understand the difficulties teachers experience with corpora usage. This would allow them to develop ideas for improving corpus tools and to align them with teacher needs. Second, educators providing corpus-based training for teachers should consider these limitations and design effective instructional activities/procedures to overcome them. Third, raising teacher awareness of the limitations means enhancing their metacognition and helping them to manage and plan effective corpus-based teaching.
6. Conclusions and limitations
Adopting an SEM approach, this research established five key components of CL that account for teachers’ intention to integrate corpora into their classroom teaching. Although it is commonly agreed that CL should be the focus of corpus teacher training, researchers have tended to interpret and select rather idiosyncratically the subskills that should be included in CL training. Our research clearly established the five factors that comprise CL (understanding, search, analysis, and the advantages and limitations of corpora); therefore, successful training should adequately cover these five factors. Moreover, since it is relatively easy to learn about (and understand) the advantages of corpora, greater effort should be made to provide training in the three subfactors of analysis, limitations, and especially search skills (which are the foundation of corpus-based training). Finally, understanding the limitations of corpora can raise teachers’ metacognition, especially for devising alternative solutions to make better use of corpora in their classroom teaching.
This study has two limitations: its reliance on self-report data and its lack of concrete tests to assess participant use of corpora in their teaching. As self-report data can be subjective, the Likert-type scale measures of CL might differ from those of CL performance data. Hence, future studies could develop and psychometrically validate CL test items to measure teachers’ CL performance data more objectively and accurately. Additional psychometrically validated objective measures could include teacher use of corpus search functions, analysis of concordance lines, and how they integrate CL into designing and using suitable teaching materials to facilitate their classroom teaching.
Supplementary material
To view supplementary material referred to in this article, please visit https://doi.org/10.1017/S0958344022000180
Acknowledgements
The article was supported by two CRAC projects (Ref: 03AAB; 04A32) at The Education University of Hong Kong.
Ethical statement and competing interests
Accessing the data archive is determined by the funding body of the research, namely The Education University of Hong Kong. All participants participated voluntarily in this research and appropriate ethical procedure was followed to obtain participant consent. The authors declare no competing interests.
About the authors
Qing Ma is an associate professor at the Department of Linguistics and Modern Language Studies, The Education University of Hong Kong. Her main research interests include second language vocabulary acquisition, corpus linguistics, computer-assisted language learning and mobile-assisted language learning. Her current research focuses on how to theorise and validate empirically a corpus-based language pedagogy.
Ming Ming Chiu is Chair Professor of Analytics and Diversity (Honor) in the Department of Special Education and Counselling and Director of the Assessment Research Centre at The Education University of Hong Kong. He studies automatic statistical analyses, inequalities, culture, and learning in 65 countries. His research interests include learning analytics, group processes, inequality, corruption, and online sexual predators.
Shanru Lin is a research assistant at the Department of Linguistics and Modern Language Studies, The Education University of Hong Kong. Her main research interests include data-driven learning, technology-enhanced learning, corpus linguistics, mobile-assisted language learning, personalised learning, and vocabulary learning.
Norman B. Mendoza is a postdoctoral fellow at The Education University of Hong Kong, in the Department of Curriculum and Instruction. His research interests are in assessment, motivation, and psychology in the school and educational context.
Author ORCIDs
Qing Ma, https://orcid.org/0000-0003-3125-3513
Ming Ming Chiu, https://orcid.org/0000-0002-5721-1971
Shanru Lin, https://orcid.org/0000-0002-1439-2514
Norman B. Mendoza, https://orcid.org/0000-0003-0344-0709