Introduction
Various types of formulaic language, including collocations, are widespread in language and are of great importance in learning an additional language (e.g., Erman & Warren, 2000; Wray, 2002). Especially when it comes to the fluent, accurate, and idiomatic production of both spoken and written language, collocation knowledge plays a crucial role (e.g., Sinclair, 1991). However, despite the large and growing number of pertinent studies, and the fact that collocations have been shown to be quite challenging even for advanced L2 learners (e.g., Boers et al., 2014; Laufer & Waldman, 2011; Nesselhauf, 2003), relatively few studies have addressed the longitudinal development of learners’ productive collocation knowledge. The few studies that have addressed this subject have mainly focused on learners’ use of collocations in (academic) writing (e.g., Edmonds & Gudmestad, 2021; Li & Schmitt, 2010; Siyanova-Chanturia, 2015; Siyanova-Chanturia & Spina, 2020) or were intervention studies designed to test the effect of a specific type of input on the incidental acquisition of collocations (e.g., Vu & Peters, 2021). These longitudinal studies have undoubtedly provided valuable insights into learners’ collocation development. Even so, much remains to be learned about how learners’ L2 collocation knowledge develops over a longer time span as well as about the factors that affect this knowledge.
In a foreign language learning context, vocabulary and collocation learning can occur both intentionally and incidentally (e.g., Hulstijn, 2003). The term “intentional” has commonly been used in collocation research when investigating the effectiveness of different forms of classroom-based teaching, and “incidental” when examining learners’ collocational gains after they engaged in meaning-focused communicative activities such as reading or viewing television (for a review, see Szudarski, 2017). However, language learners, especially university foreign language majors who wish to develop their language skills for professional or personal reasons, might also have the intention to learn when reading or listening in the L2, as “coming across unfamiliar words during reading may trigger different kinds of processes, from basic visual intake and semantic integration to deliberate attempts to encode form and derive meaning” (Elgort et al., 2018, p. 363). Therefore, we believe that the term “contextual learning” from Elgort et al. (2018) is the most suitable term to refer to collocation learning in this study, in which we aim to investigate university students’ productive form recall knowledge of L2 German collocations by focusing on 35 specific target collocations and by examining a range of item-related and learner-related variables.
Background
Identifying collocations
In L2 research, there is a firm consensus that collocations (e.g., pay attention) are one type of formulaic language, alongside other types of multiword expressions such as idioms (e.g., it’s a piece of cake), conventional situational expressions (e.g., nice to meet you), and lexical bundles (e.g., I don’t know if). In a broad sense, the concept “collocation basically refers to a syntagmatic relationship among words which co-occur” (Wood, 2019, p. 31). However, precise definitions have varied according to researchers’ analytical approach. Two approaches, the phraseological and the frequency-based, have been predominant in past research (e.g., Granger & Paquot, 2008). The former approach identifies a collocation as a type of restricted word combination based on the semantic and/or syntactic relationship between two (or more) words (e.g., Howarth, 1998). The frequency-based approach sees collocations as sets of words that have a high statistical probability of appearing together in natural language (e.g., Firth, 1957; Sinclair, 1991). Granger and Paquot (2008) argued for a definition that takes account of frequency, semantics, and syntax all together. In their hybrid view, collocations are seen as habitually co-occurring lexical partnerships that have relatively transparent meanings (e.g., make a mistake, strong coffee) unlike, say, idioms. We adopt this hybrid view. Specifically, we take a collocation to be a word combination that (a) represents a specific syntactic pattern (e.g., adjective + noun, verb + noun), (b) occurs within a given word span in a corpus, and (c) has a relatively transparent meaning.
Item-related and learner-related variables affecting L2 collocation knowledge
Previous research on the processing and use of collocations indicates that L2 collocation development is influenced by item-related and learner-related variables. Item-related variables reported to influence the acquisition and processing of L2 collocations include L1-L2 congruency (e.g., Ding & Reynolds, 2019; Vu & Peters, 2021; Wolter & Gyllstad, 2011, 2013), collocation frequency (e.g., Durrant, 2014; Wolter & Gyllstad, 2013; Wolter & Yamashita, 2018), and frequency of node words (e.g., Nguyen & Webb, 2017). Important learner-related variables include learners’ knowledge of single-word L2 vocabulary (e.g., Gyllstad, 2009; Nguyen & Webb, 2017; Vilkaitė, 2017; Vu & Peters, 2021) and L2 immersion (e.g., Edmonds & Gudmestad, 2021; Siyanova & Schmitt, 2008). Although these variables have been shown to have a (positive) influence on L2 collocation learning, they are rarely studied together in a single study. With respect to incidental collocation learning, Vu and Peters (2021) explored the effect of three different modes of reading (reading-only, reading-with-listening, and reading with textual input enhancement), prior vocabulary knowledge, and five item-related variables (congruency, frequency of occurrence, Mutual Information [MI] score, corpus frequency, and type of collocation). They found that all three modes of reading resulted in learning gains, with a superior effect for reading with textual input enhancement. Learners’ prior vocabulary knowledge and congruency were found to be significant predictors of incidental learning. Vu and Peters’s study was carried out over a 9-week period; to the best of our knowledge, no study to date has investigated the aforementioned variables over a longer time span.
Item-related variables
Congruency
It has been widely observed that even advanced L2 learners produce unconventional and perhaps odd L2 collocations owing to overreliance on word-for-word translation from their L1 (e.g., Laufer & Waldman, 2011; Nesselhauf, 2003). In most studies on L1 influence, the present study included, collocations are considered “congruent” in L1 and L2 if there is a word-for-word translation of the L1 expression for the concept that the learner has in mind, and “incongruent” if there is no such translation equivalent (e.g., Nesselhauf, 2005; Wolter & Gyllstad, 2013). An example of a congruent German collocation for Dutch learners of German is German eine Rolle spielen, Dutch een rol spelen (“play a role”). An incongruent collocation is German in Ruhe lassen, Dutch met rust laten (“leave alone”), because the word-for-word translation would be *mit Ruhe lassen (*“leave with silence”). Note that only one word (the preposition) of in Ruhe lassen is incongruent according to the preceding definition. However, there are collocations that have more than one incongruent part, for example German Wert legen (auf), Dutch belang hechten (aan) (“attach importance [to]”). In our study, a collocation is considered incongruent when there is no literal translation equivalent of at least one of the constituent parts.
The effect of L1-L2 congruency on the learning and processing of L2 collocations has been well documented in the SLA literature. For example, Peters (2016) and Vu and Peters (2021) reported evidence that learners’ ability to recall forms is generally better for congruent collocations than for incongruent ones. Additionally, it has been found that congruent L2 collocations are processed more quickly and accurately than incongruent ones (e.g., Ding & Reynolds, 2019; Wolter & Gyllstad, 2011, 2013; Wolter & Yamashita, 2018; Yamashita & Jiang, 2010). The fact that this effect is observed even in very advanced learners suggests that there might be a continuing influence of the L1 (Wolter & Gyllstad, 2013). To the best of our knowledge, no study to date has examined the influence of congruency over multiple years to see how, or even whether, the effect of L1-L2 congruency changes.
Frequency and association strength
Usage-based theories hold that language learning is experience driven and that an extremely important fact of this experience is that different vocabulary items occur in input with different frequencies (Ellis, 2002). In SLA research, corpus frequencies are used to estimate real-world input frequencies. There is much evidence that learners tend to acquire high-frequency words before low-frequency words because high-frequency words are encountered more often (e.g., Ellis, 2002; Nation, 2001). Some researchers suggest that frequency matters in this way for collocations as well (e.g., Durrant, 2014). It has been reported that the learnability of a collocation is influenced not only by the frequency of the collocation as a whole but also by the frequencies of its constituent words (e.g., Nguyen & Webb, 2017; Wolter & Yamashita, 2018). However, Vu and Peters (2021) found that corpus frequency of the target collocations (consisting of high-frequency words) was not a significant predictor of students’ learning gains. Given these mixed findings, more research is needed into the effect of corpus frequency on L2 collocation development.
Inextricably linked to corpus frequency is interword association strength, which is often measured by t-scores or MI scores. Rankings based on t-scores tend to highlight very frequent word combinations (e.g., good example), whereas high MI scores tend to highlight relatively infrequent combinations made up of words that are strongly associated (e.g., tectonic plates) (Durrant & Schmitt, 2009). The t-score is computed as an adjusted value of collocation frequency: the raw frequency minus the random co-occurrence frequency, with the difference divided by the square root of the raw frequency (Gablasova et al., 2017). The MI score compares the probability of observing the two words of the collocation together with the probabilities of observing the words independently (Church & Hanks, 1990). Both measures are widely used in learner corpus studies on L2 writing, in which collocations are usually extracted from learners’ written productions. The few longitudinal studies on this topic, with participants in an immersion context, present differing results in terms of change over time. Yoon (2016) found no significant changes in MI scores when comparing the essays written at the start and the end of one semester. Li and Schmitt (2010) found a moderate increase in the t-score, whereas the MI scores remained relatively stable, although they also found considerable variation between the individual students. In a large-scale learner corpus study, Siyanova-Chanturia and Spina (2020) did not find that MI scores underwent a statistically significant change over time. However, in an earlier study, Siyanova-Chanturia (2015) found that learners’ writings at the end contained not only more higher-frequency combinations but also more collocations with relatively high MI scores than did the writings at the beginning. Edmonds and Gudmestad (2021) also found a change in MI scores in learners’ output 8 months after these learners’ time abroad but no change in the overall frequencies of collocations. A possible explanation for these mixed findings relates to the immersion experience, in which the degree of language acquisition might depend not only on the amount of exposure but also on the comparatively active engagement with the L2 in social interactions (e.g., González Fernández & Schmitt, 2015).
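The verbal definitions of the t-score and the MI score given above can be made concrete with a short sketch. It assumes the simplest textbook formulation, in which the random (expected) co-occurrence frequency is the product of the two word frequencies divided by the corpus size; corpus tools often add a correction for the collocation window, which is omitted here. All function names and counts are hypothetical and are not taken from the studies cited.

```python
import math


def expected_cooccurrence(freq_word1, freq_word2, corpus_size):
    """Co-occurrence count expected if the two words were independent.

    Assumption: no window-size correction is applied.
    """
    return freq_word1 * freq_word2 / corpus_size


def t_score(observed, freq_word1, freq_word2, corpus_size):
    """(observed - expected) divided by the square root of observed."""
    expected = expected_cooccurrence(freq_word1, freq_word2, corpus_size)
    return (observed - expected) / math.sqrt(observed)


def mi_score(observed, freq_word1, freq_word2, corpus_size):
    """Base-2 log of the ratio of observed to expected co-occurrence."""
    expected = expected_cooccurrence(freq_word1, freq_word2, corpus_size)
    return math.log2(observed / expected)


# Hypothetical counts in a corpus of one billion words.
CORPUS_SIZE = 1_000_000_000
frequent_pair = (10_000, 1_000_000, 500_000)  # common words, often together
rare_pair = (100, 200, 500)                   # rare words, almost always together

for label, (obs, f1, f2) in [("frequent", frequent_pair), ("rare", rare_pair)]:
    print(label,
          round(t_score(obs, f1, f2, CORPUS_SIZE), 2),
          round(mi_score(obs, f1, f2, CORPUS_SIZE), 2))
```

With these invented counts the frequent pair receives the higher t-score and the rare pair the higher MI score, which mirrors the contrast between the two measures described in the text.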
Studies of L2 collocation knowledge in a nonimmersion context seem to point to a lack of sensitivity of L2 learners to the association strength between words. For example, using a prompted productive collocation test, González Fernández and Schmitt (2015) measured 108 Spanish learners’ productive form recall knowledge of 50 English collocations that vary widely with respect to corpus frequency, t-score, and MI score and observed the following Pearson’s correlations with learners’ collocation scores: raw corpus frequency, .45; t-score, .41; MI score, –.16. The authors concluded that “increasing the ‘tightness’ of the combinational bonding does not seem related to collocation learning” (p. 107). Weak mean correlations between MI scores and learners’ collocation knowledge were found in a meta-analysis carried out by Durrant (2014). He argued that L2 learners, unlike L1 speakers, may notice only whole collocation frequency and not association strength between the constituent parts, concluding that “L1 learners notice both collocations and their components, while L2 learners focus only on the whole collocation” (ibid., p. 472). In contrast, Wray (2002) suggested that L2 learners tend to focus on individual words. Again, findings have been mixed and further research is needed.
Imageability
Imageability, defined as a lexeme’s “capacity to evoke a mental image” (Steinel et al., 2007, p. 449), and concreteness, defined as “the degree to which the concept denoted by a word refers to a perceptible entity” (Brysbaert et al., 2014a, p. 904), are often used interchangeably because of the typically high correlation between the two measures (e.g., Brysbaert et al., 2014b). Both imageability and concreteness are known to be potent facilitators of L2 word learning (e.g., De Groot & Keijzer, 2000; Ding et al., 2017; Ellis & Beaton, 1993). According to Steinel and colleagues (2007), imageability may also facilitate the learning of L2 idioms. Because imageability has not yet been investigated as a possible variable in collocation learning, our analysis also took account of this variable.
Learner-related variables
Prior L2 vocabulary knowledge
A learner-related variable thought to be especially important for L2 collocation learning is learners’ prior L2 vocabulary size, that is, the number of known words, operationalized as “knowledge of the form–meaning connection” (Schmitt, 2014, p. 915). Vilkaitė (2017), who investigated the effects of adjacency and prior vocabulary knowledge on the incidental acquisition of L2 collocations, found that learners’ prior receptive vocabulary knowledge, as measured by the Vocabulary Levels Test (VLT) (Nation, 2001; Schmitt et al., 2001), had a positive effect on the learning of collocations. Specifically, with an increase of one point in a learner’s VLT score, the predicted probability of learning a collocation increased by 10% in the immediate posttest and by 13% in the delayed posttest. Vu and Peters (2021) found that learners with a higher score on the VLT had a better chance of learning the form of the collocation: With an increase of one unit in the VLT score, the odds of a correct response increased by 2.2%. However, in the study of Toomer and Elgort (2019), participants’ L2 vocabulary knowledge, also measured by the VLT, seemed not to affect L2 learners’ gains of collocations. It should be noted, though, that the VLT measures form recognition. In general, researchers agree that learners’ mastery of form (and meaning) recall lags behind their mastery of form (and meaning) recognition (e.g., Schmitt, 2014) and that productive tasks are often more demanding than receptive ones (e.g., Webb, 2008). Thus, if receptive vocabulary size is an influential factor in collocation learning, then it may be assumed that learners with a larger productive vocabulary size tend to be especially able to acquire productive collocation knowledge. Therefore, it seemed worth investigating whether learners’ prior productive vocabulary knowledge, as measured by a German version of the Productive VLT (PVLT), influences the acquisition of collocations over time.
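Because the findings above are reported partly as changes in predicted probability and partly as changes in odds, a small worked example may help to keep the two apart. The sketch below converts a percentage increase in odds per test point into a new probability; the starting probability of .50 and the function name are illustrative assumptions, and only the 2.2% figure comes from the text.

```python
def probability_after_odds_increase(start_probability, pct_increase_per_unit, units=1):
    """Apply a multiplicative odds increase and convert back to a probability.

    start_probability: probability of a correct response before the increase
    pct_increase_per_unit: percentage by which the odds grow per score point
    units: number of score points gained
    """
    odds = start_probability / (1 - start_probability)
    new_odds = odds * (1 + pct_increase_per_unit / 100) ** units
    return new_odds / (1 + new_odds)


# Illustrative starting point: a 50% chance of a correct response.
one_point = probability_after_odds_increase(0.5, 2.2, units=1)
ten_points = probability_after_odds_increase(0.5, 2.2, units=10)
print(round(one_point, 4), round(ten_points, 4))
```

A 2.2% increase in odds is therefore a smaller shift in probability than the percentage alone might suggest, and the effect compounds multiplicatively over several score points.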
L2 immersion
Another factor that has been shown to influence the learnability of collocations is learners’ exposure to the L2. Usage-based theories predict that extensive exposure is needed for language learning in general (e.g., Ellis, 2002) and for collocation learning in particular (e.g., Durrant & Schmitt, 2010). However, González Fernández and Schmitt (2015) have argued that “it may not be exposure per se that is important, but the kind of high-quality engagement with language that presumable occurs in a socially-integrated environment, where learners wish to use the L2 for meaningful and pleasurable communication” (p. 101). Thus, it would be reasonable to expect that spending time in an L2 environment may facilitate collocation learning. Some studies have indeed observed a positive effect of L2 immersion abroad on learners’ collocation knowledge (González Fernández & Schmitt, 2015; Groom, 2009; Macis & Schmitt, 2017; Siyanova & Schmitt, 2008). In other studies, it was found that a stay in the target language country did not lead to an appreciably higher or more accurate use of L2 collocations (Boone, 2021; Li & Schmitt, 2010; Nesselhauf, 2005). Because findings have been mixed, L2 immersion will be added as a variable in our analysis.
The present study
Although the studies reviewed above have undoubtedly contributed to a deeper understanding of L2 collocation development, more longitudinal studies are needed (e.g., Siyanova-Chanturia & Spina, 2020). First, the longitudinal studies outlined above ranged in duration from 9 weeks (Vu & Peters, 2021) to 21 months (Edmonds & Gudmestad, 2021). Examining development over a longer period may contribute to a better understanding of the language learning process, which is often dynamic, complex, and long-ongoing (e.g., Larsen-Freeman, 1997). Second, most longitudinal studies of collocation development have been corpus studies focusing on L2 learner output in writing, elicited by means of writing assignments, from which collocations are extracted. Longitudinal studies of L2 productive collocation knowledge (i.e., form recall) of specific target items, rather than collocation use, are scarce, but might provide additional insights into the learnability of specific collocations. Third, with the exception of Vu and Peters (2021), previous relevant longitudinal studies did not take into account a broad range of item- and learner-related variables. Examining item-related variables may help to identify characteristics that make collocations comparatively easy or hard to learn, and whether the effect of these variables changes during the learning process. Additionally, it is crucial to take account of individual learner profiles (e.g., Boers, 2020) to see how these learner-related variables influence collocation development. Fourth, although L2s other than English are starting to be explored (e.g., Edmonds & Gudmestad, 2021; Siyanova-Chanturia, 2015), the majority of studies in the field so far have focused on use or acquisition of collocations in L2 English. In sum, our study aimed to add to the existing body of research on collocation development by (a) adopting a 3-year longitudinal design, (b) testing learners’ productive collocation knowledge by focusing on their correct or incorrect (written) production of a given collocation in a form recall test format, (c) taking into account several item- and learner-related variables, and (d) exploring an underrepresented L2, namely German.
The research questions are the following:
1) How does learners’ collocation knowledge develop over time?
2) How do several item-related variables (congruency, corpus frequency, association strength, and imageability) influence this development?
3) How do two learner-related variables (prior productive vocabulary knowledge and L2 immersion) influence this development?
4) Does the influence of these item-related and learner-related variables change over time?
Methodology
Participants
The participants in this study were 50 L1 Dutch undergraduate students (9 male, 41 female), majoring in German and an additional foreign language at a Belgian university. Twenty-one of them were studying French, 16 English, 9 Spanish, 2 Italian, 1 Russian, and 1 Turkish as their extra foreign language. They were all exposed to the same formal classroom instruction in German at university (190 contact hours in the first year, 215 in the second, and 140 in the third year). Their bachelor’s program consists of an in-depth study of Dutch and two foreign languages and has an explicit focus on grammar and vocabulary in the first year, whereas there is more language practice (e.g., within translation, speaking, and writing courses) in the second and third year. No prior knowledge of German is required for the program, and the targeted level for graduating is a B2/C1 level (upper-intermediate for speaking and writing; advanced for listening and reading) according to the Common European Framework of Reference (Council of Europe, 2001). As a curriculum requirement, during the third academic year the students were expected to participate in a compulsory 5-month exchange program abroad. Of the 43 students who participated in the third collocation test, 24 went to a German-speaking country and 19 spent the semester in a non-German-speaking country. All students continued to study German at the host universities.
Data collection started at the beginning of students’ university program and ended after 3 years. All 50 students participated in the PVLT and in at least two data collection points of the collocation test. Participants took part on a voluntary basis and provided informed consent.
Target collocations
We wanted to develop a sample of targets representative of collocations that students might encounter during a learning trajectory aiming for a B2/C1 level. Identification of representative collocations for this level is complicated because of the variety and sheer number of collocations, and because there is no published, validated list of collocations to work from. However, there is an official German B1 word list (Glaboniat et al., 2013), which comprises about 2,400 lexical items that learners should know at this level. A number of collocations can be found in the example sentences next to the lexical items in this list (e.g., packen: Ich muss noch meinen Koffer packen, “I still have to pack my suitcase”). For this study, we selected only adjective-noun, noun-verb, and preposition-noun-verb collocations. The German collocation dictionaries of Quasthoff (2011) and Häcki Buhofer et al. (2014) were then consulted, and candidate collocations appearing in at least one of these dictionaries were selected. The result was a pool of 55 collocations. To make sure there was sufficient variety in terms of frequency and association strength, these collocations were cross-checked, using the concordance tool in SketchEngine (https://www.sketchengine.eu/), against the German Web Corpus 2013 (deTenTen), a corpus of 16.5 billion words made up of texts collected from the internet. We also checked that the targets did not appear in the vocabulary lists of students’ course textbooks and had not been addressed explicitly in vocabulary class, which was confirmed by the teachers.
Instruments
Productive collocation test
To measure students’ productive collocation knowledge of the target collocations, a productive collocation form recall test was developed, which took the form of a gap-fill translation test. Students had to complete the German sentences by adding the appropriate German collocation, as indicated by an L1 (Dutch) translation provided in parentheses. For example: Zwischen Gesundheit und Armut besteht ein (nauw verband) _________________. (“There is a close link between health and poverty.”)
We ran a pilot study, in which we administered the collocation test to 77 first-year students of German. The aim was to test the internal consistency of the 55 items and to identify and omit ambiguous candidate collocations. Internal consistency of the items was measured using Cronbach’s alpha and was found to be high (α = .92). However, the pilot showed that several items were not suitable for the purposes of our study. For example, the Dutch collocation moeite doen (“make an effort”) can be translated into German with the collocation sich Mühe geben, but also with the reflexive verb sich anstrengen (which is not a collocation). Another issue was that some collocations have multiple correct translations in German. An example is Nebel (“fog”), which occurs in dichter Nebel, dicker Nebel, starker Nebel, all meaning “dense fog” (Dutch dichte mist). After exclusion of the collocations deemed problematic for these reasons, 35 collocations remained (13 adjective-noun, 15 noun-verb, and 7 preposition-noun-verb). Cronbach’s alpha showed a good internal consistency of the 35 items in the task (α = .87). The same 35 items were used in the same test format each year, but the items were put in randomized order, and students did not receive feedback on their performance. The 35 target collocations can be found in Appendix A, the collocation test in Appendix B (Supplementary Material).
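For readers unfamiliar with the internal consistency measure reported above, the following is a minimal sketch of how Cronbach’s alpha can be computed from a matrix of dichotomous item scores (rows are test takers, columns are items). The tiny data set is invented for illustration and has nothing to do with the pilot data; the function name is likewise hypothetical.

```python
from statistics import variance


def cronbach_alpha(score_rows):
    """Cronbach's alpha for a list of per-person item score lists.

    alpha = k / (k - 1) * (1 - sum of item variances / variance of totals)
    """
    n_items = len(score_rows[0])
    item_variances = [
        variance([row[i] for row in score_rows]) for i in range(n_items)
    ]
    total_variance = variance([sum(row) for row in score_rows])
    return n_items / (n_items - 1) * (1 - sum(item_variances) / total_variance)


# Invented scores: 1 = collocation produced correctly, 0 = not.
toy_scores = [
    [1, 1, 1, 0],
    [1, 1, 0, 0],
    [0, 0, 0, 0],
    [1, 1, 1, 1],
    [0, 1, 0, 0],
]
print(round(cronbach_alpha(toy_scores), 2))
```

Alpha approaches 1 when test takers who succeed on one item also tend to succeed on the others, which is the sense in which values such as those reported for the pilot indicate high internal consistency.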
Because we did not want to attract participants’ attention to the targets at the very beginning of their learning trajectory, we did not administer a pretest (e.g., Toomer & Elgort, 2019). Instead, to estimate baseline knowledge of the target collocations we collected proxy pretest scores from a very similar sample of learners: 32 Dutch-speaking undergraduate students of German at the beginning of their first year of university. The test was the same as the year-end tests in the main study. The proxy pretest scores show that only one congruent collocation, Ziel erreichen (“achieve a goal”), had a mean score of 0.47 (SD = 0.50). The mean score for the remaining 16 congruent collocations was 0.09, and the mean score for the 18 incongruent collocations was 0.01. These scores show that there was negligible productive knowledge of the target collocations in this group.
Item-related and learner-related variables in the study
As part of the study, we collected measures for several potential predictors of learners’ L2 collocation knowledge. Appendix C (Supplementary Material) gives the values for all variables.
Congruency: To determine the effect of L1 congruency, the target items were labeled as congruent (1) or incongruent (0), based on the ratings of 11 university lecturers who were asked to decide whether the target item has a literal L1 translation equivalent in the L2 (“+congruent”) or not (“–incongruent”). These university lecturers had between 4 and 30 years of experience in teaching German. To estimate the reliability of the ratings for the 35 targets, the intraclass correlation coefficient (ICC) was computed. The relevant version of the ICC as a measure of consistency is “2-way average random raters.” We used the psych package in R (Revelle, 2020) and found ICC = 0.95, 95% confidence interval (CI) [.93, .97], which indicates excellent reliability (Koo & Li, 2016). The target items were assigned the congruency values 1 or 0 depending on the rating of the majority of the lecturers. The result was that 17 collocations were labeled as congruent (1) and 18 as incongruent (0). Of the 18 incongruent collocations, 13 contain one word that does not translate literally from Dutch, 4 contain two such words, and 1 contains three (including the preposition). Congruency was included as a dichotomous variable in the analysis.
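The majority rule used to assign the final congruency value can be sketched as follows. The votes are invented and the function name is hypothetical; with an odd number of raters, such as the 11 lecturers here, a tie cannot occur, so the sketch simply raises an error in that case rather than guessing a tie-breaking rule.

```python
def majority_congruency(votes):
    """Return 1 (congruent) or 0 (incongruent) from a list of 0/1 ratings."""
    congruent_votes = sum(votes)
    if congruent_votes * 2 == len(votes):
        # Only possible with an even number of raters; no rule is assumed.
        raise ValueError("tie between congruent and incongruent ratings")
    return 1 if congruent_votes * 2 > len(votes) else 0


# Invented ratings from 11 raters for two hypothetical items.
print(majority_congruency([1, 1, 1, 1, 1, 1, 1, 0, 0, 0, 0]))
print(majority_congruency([0, 0, 0, 0, 0, 0, 1, 1, 1, 1, 1]))
```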
Frequency and association strength: For all 35 targets, raw corpus frequency values (for the entire collocation and for the noun), t-score, and MI score were obtained from the German Web Corpus 2013. We used the SUBTLEX Zipf scale (Van Heuven et al., 2014) to log transform all frequency counts with the formula log10(frequency per million words) + 3. The advantage of this scale is that it is logarithmic and that the values are easy to interpret (ibid.).
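As a worked illustration of the Zipf transformation, the sketch below applies the formula exactly as stated in the text, log10(frequency per million words) + 3. The corpus size of 16.5 billion words is the figure given for the German Web Corpus 2013; the raw counts and the function name are hypothetical, and whether the authors normalized by words or by tokens is not specified, so the denominator is an assumption.

```python
import math

CORPUS_SIZE_WORDS = 16_500_000_000  # size reported for the German Web Corpus 2013


def zipf_value(raw_count, corpus_size=CORPUS_SIZE_WORDS):
    """Zipf value: log10 of the frequency per million words, plus 3."""
    per_million = raw_count / corpus_size * 1_000_000
    return math.log10(per_million) + 3


# Hypothetical raw counts, chosen to land on round per-million frequencies.
for count in (16_500, 1_650_000, 16_500_000):
    print(count, round(zipf_value(count), 2))
```

On this scale a combination occurring once per million words receives a value of 3, and each additional unit corresponds to a tenfold increase in frequency, which is what makes the values easy to interpret.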
Imageability: To determine imageability, we collected subjective ratings of imageability on a 7-point Likert scale from 17 very advanced L2 speakers of German (all holding a master’s degree in German language/literature and using German regularly in their jobs or daily lives) and seven L1 German speakers for a list of 66 collocations, including the 35 target collocations and 31 nontarget collocations. One purpose of the added 31 items was to allow our new imageability ratings to be validated (i.e., compared with a previously published set of collocation ratings). Seven of the 31 nontarget collocations functioned as list-initial “calibrator” items intended to serve as examples of the various levels of the rating scale. All 31 nontarget collocations were selected from the database compiled by Citron et al. (2016) that gives concreteness ratings of 619 German phrases. Given the typically strong correlations between ratings of concreteness and imageability and because no collections of imageability ratings for German are yet available, we used the database of Citron et al. (2016) to be able to validate our ratings.
The randomized list of to-be-rated collocations was presented to the raters. To increase the reliability of the ratings, raters were invited to rate the collocations twice, with a pause before the second round. For the second round, the collocations were presented in a new randomized order. Sixteen of the raters rated the collocations twice and eight rated them once. The two sets of ratings from the raters who completed the ratings twice were averaged to yield a single set of mean ratings for that person. Finally, a mean rating across all raters was calculated for each collocation (Appendix D).
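The two-step averaging described above (first within each rater, then across raters) can be sketched as follows. The ratings are invented for illustration, and the function names are hypothetical; this is not the authors' code.

```python
from statistics import mean


def rater_mean(rounds):
    """Average one rater's rounds item by item (one or two rounds)."""
    return [mean(values) for values in zip(*rounds)]


def item_means(all_raters):
    """Mean rating per item across raters, after within-rater averaging."""
    per_rater = [rater_mean(rounds) for rounds in all_raters]
    return [mean(values) for values in zip(*per_rater)]


# Invented 7-point ratings for three collocations.
ratings = [
    [[6, 2, 4], [7, 2, 5]],  # a rater who completed both rounds
    [[5, 3, 4]],             # a rater who completed one round
]
print(item_means(ratings))
```

Averaging within raters first means that a rater who completed two rounds carries the same weight in the final mean as a rater who completed one.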
To validate our ratings, we calculated the correlation between the existing concreteness ratings (ibid.) and our new imageability ratings for the 31 nontarget collocations, finding r = .79, CI [.61, .89], which is very similar to the range reported for correlations between imageability and concreteness in the literature (e.g., Brysbaert et al., Reference Brysbaert, Stevens, De Deyne, Voorspoels and Storms2014b). To estimate the reliability of the ratings for the 35 target collocations, we calculated the appropriate version of the ICC and found ICC = 0.92, CI [.88, .95]. This indicates excellent reliability (Koo & Li, Reference Koo and Li2016).
Prior productive vocabulary size: To assess students’ prior vocabulary, we administered the PVLT for German developed by the German Institute for Test Research and Test Development in Leipzig and modeled after Nation’s PVLT for English (Nation, Reference Nation2001). The test contains five subtests, which measure learners’ vocabulary knowledge on the vocabulary levels of 1,000, 2,000, 3,000, 4,000, and 5,000 words, respectively. These levels are based on the frequency lists derived from the Herder/BYU-German corpus (Jones et al., Reference Jones, Tschirner, Goldhahn, Buchwald and Ittner2006). There are 18 cloze items per subtest (i.e., per frequency level). Each target word is embedded in one or two sentences. To disambiguate the target items, the first letter (or letters) of a targeted word is provided. For example: In dem Dorf steht eine alte Ki_______________. (“In the village, there is an old ch__________.”)
L2 immersion: All students participated in a compulsory 5-month exchange program in a country in which one of their languages of study is spoken. In our analysis, L2 immersion or study abroad (SA) was coded as SA_TL if the participant spent the semester abroad in a German-speaking country (n = 24), or SA_nonTL if otherwise (n = 19). It should be remarked, however, that for Time 1 and Time 2 L2 immersion was not coded, because students only went abroad between Time 2 and Time 3.
Procedure
Participants were tracked for 3 academic years. Both the PVLT and the collocation test were administered in class and students could not use a dictionary. In total, there were four test sessions. The PVLT was administered at the beginning of students’ first year of university, as a paper-and-pencil test. The time needed for completion was 30 minutes. The first collocation test was administered at the end of students’ first year of university, the second at the end of the second year, and the third at the end of the third year. The first two versions of the collocation test were taken as paper-and-pencil tests, but due to the COVID-19 pandemic, the third version had to be administered online. The time needed to complete this test was 25 minutes.
Scoring and analyses
The PVLT and the collocation tests were corrected manually, and each test item was scored either one point for a (completely) correct answer (e.g., Ziel erreichen for “achieve a goal”) or zero points for an incorrect or incomplete answer (e.g., …erreichen or Zweck erzielen). For the collocation test, a binary score for each collocation was given, which was used in the analyses. For the PVLT, learners’ mean percentage score across all subtests (measuring learners’ vocabulary knowledge on the vocabulary levels of 1,000, 2,000, 3,000, 4,000, and 5,000 words, respectively) was used in subsequent regression modeling. All analyses were carried out using the R software environment (version 4.1.2; R Core Team, 2021).
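The scoring rules above can be sketched in code. Scoring in the study was done manually, so the exact-match rule below (after whitespace normalization) is an assumption made for illustration, as are the function names and the example PVLT counts; only the 18 items per level and the example answers come from the text.

```python
def score_item(response: str, target: str) -> int:
    """Return 1 only for a completely correct answer, otherwise 0."""
    def normalize(text: str) -> str:
        # Assumption: ignore only differences in spacing.
        return " ".join(text.split())
    return 1 if normalize(response) == normalize(target) else 0


def pvlt_mean_percentage(correct_per_level, items_per_level=18):
    """Mean of the percentage scores across the frequency-level subtests."""
    percentages = [100 * correct / items_per_level for correct in correct_per_level]
    return sum(percentages) / len(percentages)


print(score_item("Ziel erreichen", "Ziel erreichen"))   # complete answer
print(score_item("erreichen", "Ziel erreichen"))        # incomplete answer
print(score_item("Zweck erzielen", "Ziel erreichen"))   # incorrect answer

# Hypothetical numbers of correct answers on the five subtests.
print(round(pvlt_mean_percentage([18, 15, 12, 9, 6]), 1))
```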
To visualize how learners’ collocation knowledge develops over time (RQ1), we calculated descriptive statistics for the proxy baseline test and the three collocation tests. The ggplot2 package (version 3.3.5; Wickham, Reference Wickham2016) was used to create line plots. To explore the influence of the learner- and item-related variables on students’ collocation score (RQ2 and 3) and the influence of time (RQ4), a generalized linear mixed model was used because the outcome variable (collocation score) is binary. A generalized linear mixed model extends a generalized linear model (here, logistic regression) by combining fixed effects with random effects that account for individual variation between items and participants. The model was constructed using the glmer function from the lme4 package (version 1.1.26; Bates et al., Reference Bates, Mächler, Bolker and Walker2021).
First, the continuous variables were centered on the mean. Then, a basic model was built with only random effects: items and learners. Next, the fixed effect “time” and the learner-related fixed effect “baseline productive vocabulary” were added. The other learner-related variable (L2 immersion) was examined in a separate model, because students went abroad during their third year and consequently only the results of the final test might have been influenced by this L2 immersion experience. The item-related fixed effects congruency, collocation frequency, MI, and imageability were also added. To avoid a collinearity problem, noun frequency and t-score were not included (see Table 2 for a correlation matrix). Interactions between time and the other fixed effects were then added. Finally, variables and interactions were omitted until the best fit was identified. Models were fit by maximum likelihood (Laplace approximation). Model fit was assessed using the anova function in R. Using the performance package (version 0.7.2; Lüdecke et al., Reference Lüdecke2021) in R, we calculated marginal R2, which measures the variance explained by the fixed effects only, and conditional R2, which measures the variance explained by both the fixed effects and the random effects.
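The general structure of such a model can be sketched as follows: the log-odds of a correct answer are the sum of a fixed part (an intercept plus coefficients times mean-centered predictors) and random intercepts for the learner and the item. This is a conceptual sketch in Python rather than the R code used for the analyses, and every coefficient and random-effect value below is made up for illustration.

```python
import math


def center(values):
    """Center a continuous predictor on its mean."""
    m = sum(values) / len(values)
    return [v - m for v in values]


def predicted_probability(intercept, coefs, predictors,
                          learner_effect=0.0, item_effect=0.0):
    """Inverse-logit of fixed part plus learner and item random intercepts."""
    eta = intercept + sum(coefs[name] * predictors[name] for name in coefs)
    eta += learner_effect + item_effect
    return 1 / (1 + math.exp(-eta))


# Made-up values, for illustration only (not estimates from the study).
coefs = {"time": 0.5, "vocab_centered": 0.03, "congruent": 1.0}
predictors = {"time": 1, "vocab_centered": 10.0, "congruent": 1}

p_easy = predicted_probability(-1.0, coefs, predictors, item_effect=1.5)
p_hard = predicted_probability(-1.0, coefs, predictors, item_effect=-1.5)
print(round(p_easy, 2), round(p_hard, 2))
```

Holding the fixed part constant, the two calls differ only in the item's random intercept, which is how the model represents some collocations being easier or harder than the predictors alone would suggest.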
Results
The study collected binary (correct vs. incorrect) learner responses to 35 German collocations at three times. Because 50 learners were enrolled in the study, the potential number of binary scores for tests 1 to 3 was 5,250. However, owing to learner absences the actual total was 4,235. Baseline knowledge of the 35 collocations was estimated by testing 32 learners similar to the ones participating in our study. The test-to-test correlations between the by-item scores on tests of productive collocation knowledge, with bootstrapped 95% CIs, are as follows: Proxy test to Test 1: r = .71 [.48, .84], Test 1 to Test 2: r = .77 [.59, .89], Test 2 to Test 3: r = .90 [.76, .96].
To visualize the results for RQ1 (How does learners’ collocation knowledge develop over time?), descriptive statistics (mean, median, standard deviation, and range) of the by-item scores are given in Table 1. As can be seen, there is general progress from Time 0 to Time 3.
Figure 1 represents the learning trend of the 35 target collocations. In each time point represented in the figure there would be 35 dots (one per collocation) if no collocation had the same mean score as any other. Although all the dots have been randomly “jittered” to minimize complete overlaps when multiple collocations have the same mean score, it is still the case that some dots are not visible. A dot that is especially dark corresponds to more than one collocation. Dots in two columns that relate to the same collocation are connected by a line. To sum up, Figure 1 shows that most baseline scores were at or near zero. General progress in collocation learning is indicated by the fact that most of the lines slope upward from left to right.
Twenty-one learners took all three year-end collocation tests. For these learners the total per-collocation test scores correlate fairly strongly from test to test: Test 1 to Test 2, r = .60; Test 2 to Test 3, r = .71. Figure 2 shows the trend of collocation learning for the 21 learners who took all three year-end tests of productive collocation knowledge. Overall progress is indicated by the fact that the great majority of the lines slope upward from test to test. It is plain, however, that there was some forgetting, especially during the final year.
Lastly, Figure 3 gives an overview of the collocation learning of all learners who were present for at least two consecutive year-end tests. Again, there was general progress but also some forgetting.
To explore which variables contributed to learners’ collocation development (RQ2 and 3), two mixed-effects logistic regression models were built. Table 2 provides descriptive statistics and a correlation matrix for the continuous item-related variables.
a Zipf transformed values.
* p < .05; **p < .01.
First, models were run without the variable L2 immersion because this variable could only affect the results at Time 3. The basic generalized linear mixed-effects model included only the random effects of “learner” and “item,” and showed that the variable “item” explained most of the variation (variance = 2.21, SD = 1.49, ICC = .39). Far less variation was explained by the variable “learner” (variance = 0.21, SD = 0.46, ICC = .04). Then, two basic models were compared using the anova function for lme4 models in R, which gives a chi-square test of the relative fit of two nested regression models (Brysbaert, Reference Brysbaert2020). Adding the random intercept for “learner” significantly improved the model fit, χ2 (1) = 82.25, p < .001. The best model to answer our research questions included random intercepts for item and learner and three significant fixed effects (time, learners’ productive vocabulary knowledge, and congruency). Table 3 shows the final model, which has a marginal R2 of .20 and a conditional R2 of .45. This means that the fixed effects in the model explain 20% of the variance, and that an extra 25% of the variance was explained by the random effects. The odds ratio for productive vocabulary was 1.03, CI95% [1.01, 1.04], meaning that a one-unit positive difference in a participant’s mean percentage score on the PVLT corresponds to a 3% positive difference in the odds of having productive knowledge of a collocation.
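Two of the quantities above can be illustrated with a short calculation. The sketch assumes the common latent-variable formulation for logistic mixed models, in which the residual variance is fixed at π²/3; under that assumption the variance components reported above give values that round to the reported ICCs. The second function shows how an odds ratio of 1.03 translates into a change in probability, using a starting probability of .50 that is purely hypothetical. Function names are illustrative.

```python
import math

# Assumption: latent-scale residual variance of a logistic model.
LOGISTIC_RESIDUAL_VARIANCE = math.pi ** 2 / 3  # about 3.29


def latent_icc(variance, other_variances):
    """Share of total latent-scale variance due to one random effect."""
    total = variance + sum(other_variances) + LOGISTIC_RESIDUAL_VARIANCE
    return variance / total


def probability_after_change(p_start, odds_ratio, units=1):
    """Apply an odds ratio `units` times to a starting probability."""
    odds = p_start / (1 - p_start) * odds_ratio ** units
    return odds / (1 + odds)


item_var, learner_var = 2.21, 0.21  # variance components from the text
print(round(latent_icc(item_var, [learner_var]), 2))
print(round(latent_icc(learner_var, [item_var]), 2))

# Hypothetical starting probability of .50 with the reported odds ratio.
print(round(probability_after_change(0.50, 1.03), 4))
```

The last line shows why the odds ratio is best read as a multiplicative change in the odds: starting from a probability of .50, one extra PVLT percentage point raises the probability by less than one percentage point, not by three.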
Note: Baseline for congruency = incongruent.
*p < 0.05; **p < 0.01; ***p < 0.001.
Then, another model was built with only the data of Time 3, including the variable L2 immersion (1,505 observations, 43 learners). This model also showed a significant effect of learners’ prior vocabulary knowledge and congruency. Having studied in the target language country did not affect the results on the collocation test significantly. The results of the final model can be found in Table 4. For this model, we found a marginal R2 of .13 and a conditional R2 of .42.
Note: Baseline for congruency = incongruent.
*p < 0.05; **p < 0.01; ***p < 0.001.
To answer RQ4, which was to determine whether the influence of these item-related and learner-related variables changes over time, we explored interactions between time and the significant predictors (productive vocabulary and congruency). A significant interaction between time and congruency was found. To make this interaction effect easier to interpret, it is plotted in Figure 4. The figure shows a gradual rise in the knowledge of congruent collocations. In contrast, the predicted probability that learners will know an incongruent collocation rises clearly. Specifically, at Time 1 the two contrasted probabilities are far apart. At Times 2 and 3 they are markedly less far apart, showing that the congruency effect diminishes over time. The learning curve for incongruent collocations is steeper than that for congruent collocations, even though more congruent than incongruent collocations are still known at Time 3.
Discussion
How does learners’ collocation knowledge develop over time?
Our results show that there was a general increase in collocation knowledge (i.e., form recall knowledge) after 3 years of studying German. This seems unsurprising because our participants were motivated language specialists who engaged with German almost daily at university in classes that include language production. However, if we look at the trend of collocation learning (Figures 1, 2 and 3), we see that at Time 3, not one learner was able to produce all 35 collocations correctly. Figures 2 and 3 show that total per learner scores range from 13 to 29. Although our learners were German majors, some of them still had rather limited knowledge of the target collocations after 3 years. This finding is in line with previous studies that indicate that the acquisition of collocations is slow and challenging even for advanced learners (e.g., Boers et al., Reference Boers, Lindstromberg and Eyckmans2014; Laufer & Waldman, Reference Laufer and Waldman2011; Nesselhauf, Reference Nesselhauf2003). It also appears to confirm the evidence in vocabulary research that “form recall is the most difficult degree of mastery of the form-meaning link” (Schmitt, Reference Schmitt2014, p. 929). In addition, it seems that the learning process was rather nonlinear, both with respect to the items (Figure 1) and the individual learners (Figures 2 and 3). These outcomes seem to be in line with the dynamic systems approach to language learning, in which language development is expected to be a nonlinear, chaotic, and highly individual process, with a learning curve “filled with peaks and valleys, progress and backsliding” (Larsen-Freeman, Reference Larsen-Freeman1997, p. 151). For collocation development, this type of process was already illustrated in the longitudinal study of Li and Schmitt (Reference Li, Schmitt and Wood2010), who reported the variation in collocation development of four learners followed over one year. 
This is confirmed in our study, in which there is considerable variation both in how well individual collocations were learned and in how well individual learners learned collocations. For the majority of the learners, there is a clear upward trend in the learning curve, but for some learners, some attrition from Time 2 to 3 was observed. This attrition might be explained by the fact that some of these individuals had less exposure to L2 German (e.g., through out-of-class activities such as reading books or articles, watching television, listening to music, or using social media in the L2), which also means fewer opportunities for contextual vocabulary learning. To gain more insight into the causes of this attrition, qualitative interview data could prove useful.
The per learner scores at Time 3 do not only show that collocation learning is slow, but also raise the question of the effectiveness of contextual (incidental) learning. Research has shown that L2 collocations can be acquired both incidentally and intentionally, and that intentional learning results in greater gains (Szudarski, Reference Szudarski2017). It is likely that the long-term retention of the 35 collocations would have been better if they had been used as targets in a study on intentional learning. Explicit collocation instruction is definitely needed but because only a small number of collocations can be taught in the classroom, it is important to know which variables affect learning to make informed choices about which collocations should be selected for classroom learning, and how to deal with individual differences.
How do several item-related variables (i.e., congruency, corpus frequency, association strength, and imageability) influence L2 collocation development?
This study found that congruency had a statistically significant positive effect on learners’ productive collocation knowledge. Students’ better knowledge of congruent collocations than of incongruent collocations at Times 1, 2, and 3 might be explained by the fact that German and Dutch are closely related Germanic languages. However, this positive congruency effect has been shown for other language pairs too: for less closely related Germanic language pairs like English–Dutch (Peters, Reference Peters2016), English–German (Nesselhauf, Reference Nesselhauf2003, Reference Nesselhauf2005), and English–Swedish (Wolter & Gyllstad, Reference Wolter and Gyllstad2011, Reference Wolter and Gyllstad2013) and also for much less related language pairs like English–Chinese (Ding & Reynolds, Reference Ding and Reynolds2019), English–Japanese (Yamashita & Jiang, Reference Yamashita and Jiang2010), and English–Vietnamese (Vu & Peters, Reference Vu and Peters2021). Our results are thus in line with previous studies and indicate that (a) students often tend to rely on word-for-word translation when producing L2 collocations (Laufer & Waldman, Reference Laufer and Waldman2011) and that (b) L1-L2 congruency is an important factor in the processing and use of collocations, which should be taken into account in teaching.
With respect to the other item-related variables (i.e., collocation frequency, MI, imageability), the results were nonsignificant in the final model. Although some studies did find that collocation frequency was related to some degree to L2 collocation knowledge, they also point out that corpus frequency is only one factor of influence (e.g., Durrant, Reference Durrant2014; González Fernández & Schmitt, Reference González Fernández and Schmitt2015). Vu and Peters (Reference Vu and Peters2021), who included a larger number of factors in their study, found no significant effect for corpus frequency. It is thus clear that the relationship between corpus frequency and collocation knowledge is not straightforward, and that a study’s findings may also depend on the corpus and the target items used (e.g., collocations consisting of infrequent words may yield different results compared to collocations consisting of high-frequency words or delexical verbs). The nonsignificant effect of MI in this study seems to be in line with other findings on L2 collocation knowledge in a non-immersion context, in which the strength of association between the words of a collocation does not seem related to L2 collocation learning (Durrant, Reference Durrant2014; González Fernández & Schmitt, Reference González Fernández and Schmitt2015). Regarding imageability, it has been shown that it facilitates L2 word learning (e.g., De Groot and Keijzer, Reference De Groot and Keijzer2000), and that it may also facilitate the learning of L2 idioms (Steinel et al., Reference Steinel, Hulstijn and Steinel2007). Our study could not confirm a facilitating effect for L2 collocations, although we think that due to the limited number of items, further research into the influence of imageability on L2 collocation learning is needed.
How do the learner-related variables prior vocabulary knowledge and L2 immersion influence L2 collocation development?
Learners’ baseline productive vocabulary emerged as a statistically significant predictor of productive collocation knowledge at the form-recall level. This finding seems to support our hypothesis that, if receptive vocabulary knowledge predicts collocation knowledge, both on a receptive level (e.g., Gyllstad, Reference Gyllstad, Barfield and Gyllstad2009; Nguyen & Webb, Reference Nguyen and Webb2017; Vilkaitė, Reference Vilkaitė2017) and on a form-recall level (e.g., Peters, Reference Peters2016; Vu & Peters, Reference Vu and Peters2021), productive vocabulary will do so too. In our study, we found that with an increase of one point in the mean percentage PVLT score, the odds of knowing a collocation increased by 3%. These results extend the evidence of the “rich-get-richer” phenomenon in vocabulary learning, whereby larger vocabulary sizes, receptive or productive, are associated with better learning outcomes (e.g., James et al., Reference James, Gaskell, Weighall and Henderson2017). The findings might also indicate that two widely recognized constructs of vocabulary knowledge, namely vocabulary size (i.e., knowledge of the form–meaning connection) and vocabulary depth (e.g., collocation knowledge) (Schmitt, Reference Schmitt2014), are related, because students’ productive vocabulary size predicted their development of productive collocation knowledge.
Although some studies indicate that L2 immersion plays a role in collocation knowledge, we did not find a significant effect of a stay abroad in a German-speaking country on learners’ collocation knowledge at Time 3. This might be explained by the fact that 5 months might be quite short for collocation development, or by the fact that the students going to a non-German-speaking country also continued studying German at the universities abroad. It might also depend on students’ active engagement with the L2 in social interaction abroad (e.g., González Fernández & Schmitt, Reference González Fernández and Schmitt2015), which might take place outside the target language country too (e.g., Boone, Reference Boone, Mitchell and Tyne2021). Here too, qualitative interview data, for example on students’ L2 exposure and use or their L2 learning experience during SA, could yield relevant information.
Does the influence of these item-related and learner-related variables change over time?
A significant interaction effect between time and congruency was found in this study, indicating that the influence of congruency may change over time. Specifically, we found that students’ knowledge of congruent collocations remained comparatively stable from Time 1 to Time 3, whereas their knowledge of incongruent collocations significantly rose with time (Figure 4). However, it should be noted that for learners who already produced some of the congruent collocations correctly at Time 1, it was mathematically impossible to make as much progress as was the case with respect to the incongruent collocations.
Because of the important impact of the L1 on the processing of L2 collocations even at advanced levels of proficiency, Wolter and Gyllstad (Reference Wolter and Gyllstad2013) assumed a persisting congruency effect in L2 collocation processing. In our study, we see that the probability of knowing a congruent collocation compared to an incongruent collocation is higher at Time 3, but our study also suggests that, as learners’ proficiency level rises, learners may have increasing success in acquiring incongruent collocations. In short, the results of our study are consistent with previous findings of a fairly general positive effect of congruency but also show that the substantive importance of the effect dwindles as learning progresses.
Interestingly, the two collocations with the lowest score at Time 3 are incongruent collocations with more than one incongruent constituent part, which points to the possibility that the level of congruency may play a role in collocation learning. If this finding were borne out by further research, there would be pedagogical implications for the foreign language classroom. Teachers could devote extra attention to incongruent collocations, especially to the “very incongruent” ones (i.e., with both constituent parts being incongruent) because those are likely to cause problems for learners (e.g., Nesselhauf, Reference Nesselhauf2003). A contrastive L1-L2 approach, making students aware of L1-L2 differences, can be recommended. Although there was one maximum by-item score (at Time 3 for eine Rolle spielen [“play a role”]), we believe that congruent collocations need attention too, as Wolter and Gyllstad (Reference Wolter and Gyllstad2011) have already pointed out. We suggest that especially at the beginning of a learning trajectory, when the gap between learners’ knowledge of congruent and incongruent collocations is large, teachers should give attention to incongruent collocations by setting exercises with known potential to enhance learners’ collocation knowledge (Boers & Lindstromberg, Reference Boers and Lindstromberg2012; Szudarski, Reference Szudarski2017). However, because the learning curve of congruent collocations hardly changes from Time 1 to Time 3, it may be useful in a later phase to give extra attention to congruent collocations that seem likely to be relatively hard to learn, for example because they are infrequent or contain low-frequency words.
Limitations and suggestions for future research
Our findings have to be seen in light of several limitations. First, the number of both targets and learners was fairly low. Here, it is relevant that L2 learners of German are not as numerous as L2 English learners. However, smaller numbers of learners should not dissuade researchers from investigating other languages because each language has its own characteristics and deserves its place in the field of applied linguistics. Additionally, compared to other longitudinal studies, the sample size is reasonable. Nevertheless, we think it is advisable to adopt a mixed approach in future studies, where quantitative results of smaller samples are complemented with qualitative results to see how individual learners deal with the challenges of learning L2 collocations and to get more context for the findings (e.g., on the attrition or on the L2 immersion experience). Qualitative insights are important to get the complete picture, because, as Henriksen (Reference Henriksen, Bardel, Lindqvist and Laufer2013) puts it: “It is more than likely that collocational acquisition is much more idiosyncratic in nature and dependent on specific language use situations than single-word acquisition” (pp. 48–49). A second methodological issue is the use of multiple tests to measure learners’ development. In this study, a positive testing effect cannot be entirely ruled out; that is, some learning might have happened during test taking. However, we tried to reduce this effect by leaving a gap of a year between the tests. Third, the number of variables influencing collocation knowledge is undoubtedly much higher than the number investigated in this study. The predictors investigated here explain about 20% of the variation in test scores, which provides an opportunity for further studies to identify other factors involved.
Fourth, it is possible that participants were able to translate some congruent collocations correctly even if they had never encountered these collocations before. For cognates (i.e., words with a similar form and meaning in the L1 and the L2), it has been shown that they “can grant learners access to a reservoir of potential target language vocabulary without explicit instruction” (Vanhove & Berthele, Reference Vanhove and Berthele2015, p. 2). It is likely that the same applies to congruent collocations. Using a literal L1 equivalent works perfectly for congruent collocations, but not in the case of incongruent collocations. However, this kind of “guessing effect” is difficult to avoid and might also be an indication of how students produce language. In relation to this, the other languages known by the participants may have had an effect on students’ collocation scores. An incongruent collocation targeted in our study, for example, might have been congruent in another foreign language with which our participants were familiar. It is possible that students’ additional languages served as a bridge to translate the L2 target collocations. We did not investigate this in the present study, but it might be interesting to explore in future research. Finally, because most of our incongruent targets contained only one word without a literal translation equivalent, future studies should look at learners’ acquisition of incongruent collocations of different incongruency levels.
Conclusion
The goal of this study was to investigate L2 learners’ productive collocation development in German and to examine the effect of several item- and learner-related variables. The results indicate that there was general progress, despite some forgetting. The variation in both per learner and per collocation scores shows that collocation learning is influenced by multiple variables.
As to item-related variables, our results corroborate previous findings that L1-L2 congruency is an important predictor of collocation knowledge. What is more, the congruency effect was found to persist throughout learners’ 3-year trajectory. To maximize collocation learning, we recommend that teachers and materials creators direct learners’ attention toward both congruent and incongruent collocations, with special attention to incongruent collocations at the beginning of the learning trajectory. As to learner-related variables, we found that learners with a comparatively large productive vocabulary at the beginning of the learning trajectory were more likely to produce correct L2 German collocations, which shows the importance of increasing one’s vocabulary as much as possible even in the early stages of learning an additional language.
Acknowledgments
We thank the anonymous reviewers for their valuable comments and suggestions for improving the manuscript. We extend special thanks to Seth Lindstromberg for his relentless assistance with statistical analyses and graphing, and for his invaluable advice and feedback on earlier drafts of this manuscript. Thanks also to all participants for their time and contribution to this study.
Data availability statement
This article received the Open Data and Open Materials badges for transparent practices. To view supplementary material for this article, please visit https://osf.io/yp2j4/?view_only=b0a7c06c30904072a6240f86a4ff1ff2.