Introduction
It is widely acknowledged that lexical knowledge contributes to second language (L2) learners’ overall language proficiency (e.g., Crossley et al., Reference Crossley, Salsbury, McNamara and Jarvis2011; Zareva et al., Reference Zareva, Schwanenflugel and Nikolova2005), and enhances learners’ mastery of language skills (e.g., Milton, Reference Milton, Bardel, Lindqvist and Laufer2013; Miralpeix & Muñoz, Reference Miralpeix and Muñoz2018; Stæhr, Reference Stæhr2008). Because lexical knowledge is viewed as a multifaceted construct (Henriksen, Reference Henriksen1999) that involves the acquisition of multiple word knowledge components (Nation, Reference Nation2013), several researchers have attempted to examine these various components, including knowledge of single words and knowledge of collocations at the receptive and productive levels. Generally, the literature on lexical knowledge includes more measures of receptive word (e.g., Nation & Beglar, Reference Nation and Beglar2007; Schmitt et al., Reference Schmitt, Schmitt and Clapham2001; Webb et al., Reference Webb, Sasao and Ballance2017) and collocation (e.g., Gyllstad, Reference Gyllstad, Barfield and Gyllstad2009) knowledge than of productive word (e.g., Laufer & Nation, Reference Laufer and Nation1999) and collocation (e.g., Frankenberg-Garcia, Reference Frankenberg-Garcia2018) knowledge. Additionally, there is evidence that receptive knowledge of collocations develops in relation to receptive knowledge of single-word items (e.g., Nguyen & Webb, Reference Nguyen and Webb2017). However, little is known about the interrelationship between productive word and collocation knowledge, which is considered a higher-level aspect of lexical mastery (e.g., Bahns & Eldaw, Reference Bahns and Eldaw1993; González Fernández & Schmitt, Reference González Fernández and Schmitt2020; Webb & Kagimoto, Reference Webb and Kagimoto2009).
The present study aims to address this gap by examining the interrelationship between the higher-level productive knowledge of single words and collocations, using newly developed lemma-based measures of words and collocations at the first three 1,000-word frequency levels of English. The tests were administered to native speakers of English as well as to nonnative speakers in an English-as-a-Foreign-Language (EFL) context. To situate the present study in the literature, the next sections survey research on productive measures of single words, the definition of collocations, and measures and determinants of productive collocation knowledge.
Background
Measuring productive knowledge of single words
Despite the availability of several measures of receptive word knowledge, such as the Vocabulary Levels Test or VLT (Nation, Reference Nation1990; Schmitt et al., Reference Schmitt, Schmitt and Clapham2001; Webb et al., Reference Webb, Sasao and Ballance2017) and the Vocabulary Size Test or VST (Nation & Beglar, Reference Nation and Beglar2007), only a limited number of measures are available to assess productive word knowledge. One such measure is the lexical translation task (Webb, Reference Webb2008). In this test, L2 speakers are given L1 meanings and asked to provide their equivalent L2 forms. Webb reported that such an L1-L2 translation test can elicit varied responses for target items, which means less control over the intended L2 forms. Although it is possible to restrict responses by providing the first letter(s) of the target word, a productive translation test may not reflect production during actual language use.
Another test that is intended to measure productive vocabulary knowledge is Lex30 (Meara & Fitzpatrick, Reference Meara and Fitzpatrick2000). This is a word-association test, where test-takers are required to produce a number of responses to stimulus words. While this test was found to indicate breadth of productive vocabulary, it appears to behave differently when used with learners of different proficiency levels (Walters, Reference Walters2012). Walters further argues that Lex30 scores are difficult to interpret.
Furthermore, CATSS (the new computer adaptive test of size and strength) (Aviad-Levitzky et al., Reference Aviad-Levitzky, Laufer and Goldstein2019) was developed to measure vocabulary knowledge in the receptive recall, productive recall, receptive recognition, and productive recognition modalities. The test targets word knowledge across 14 frequency bands (1K–14K). Productive recall, which is relevant to the present study, was measured through recalling a word form (e.g., She is a l_____ girl. (small)). Because the test measures word knowledge from 14 frequency bands (including a range of low-frequency items), it may go far beyond the level of our target participants, who are EFL learners with varied proficiency levels.
A slightly different controlled productive word knowledge test is the Productive Vocabulary Levels Test (PVLT), which was developed by Laufer and Nation (Reference Laufer and Nation1999). The test is “controlled” in that it assesses learners’ ability to use a specific target L2 word when compelled to do so. The PVLT format is a gap-fill task in which a meaningful sentence context is presented, and a missing target word is to be supplied. To restrict the responses, the first letters of the target word are provided (e.g., The book covers a series of isolated epis______ from history—Answer: episodes). The guiding principle is to include the minimal number of letters needed to disambiguate the cue. The PVLT is similar to the VLT in that it targets sets of words that represent distinct frequency bands. A total of 18 items are sampled per frequency band: 2,000, 3,000, 5,000, University Word List, and 10,000. The scoring system is dichotomous (correct/incorrect), and minor spelling mistakes and grammatical errors are ignored. The examinee receives six scores: a score for each frequency band and a total score across bands.
The PVLT has been used widely as a measure of controlled productive word knowledge, but we opted for devising a new controlled productive word knowledge test in the present study for several reasons. First, the original PVLT (ibid.) measures items from the 2,000-, 3,000-, 5,000-, and 10,000-word levels and the University Word List, which may go far beyond the level of our EFL participants. Thus, we opted to avoid low-frequency lemmas and focus instead on the 3,000 most frequent lemmas in English: the 1,000, 2,000, and 3,000 levels. Furthermore, the PVLT uses the word family (the headword and its inflectional and derivational forms, e.g., embarrass, embarrassed, and embarrassment) as the counting unit. While further empirical evidence is still needed to advance our understanding of the different lexical units (see Webb, Reference Webb2021 for an overview), research on L1 users (Wysocki & Jenkins, Reference Wysocki and Jenkins1987) and L2 learners (Schmitt & Zimmerman, Reference Schmitt and Zimmerman2002) seems to suggest that derivational knowledge develops with age and proficiency. For many less advanced L2 learners, the appropriate lexical unit for both receptive and productive purposes is likely to be a lemma (the headword and its inflectional forms in a given part of speech or PoS, e.g., embarrass and embarrassed, when used as a verb, are members of the same verb lemma) or a flemma (the headword and its inflectional forms regardless of PoS, e.g., embarrass, embarrassed as a verb, and embarrassed as an adjective are members of the same flemma). In the present study, the measure of productive word knowledge is similar in design to the PVLT but takes the aforementioned points into consideration.
Thus far, we have examined measures of productive knowledge of individual words. Because the aim of the present study is to link knowledge of words to knowledge of collocations, the following sections will focus on collocation knowledge.
Definition of collocations
Scholars interested in collocation research distinguish between two main approaches to defining collocations, namely, the phraseological approach (Cowie, Reference Cowie and Asher1994; Howarth, Reference Howarth1996; Nesselhauf, Reference Nesselhauf2003) and the frequency-based approach (see McEnery & Wilson, Reference McEnery and Wilson2001; Sinclair, Reference Sinclair1991). The phraseological approach identifies collocations based on co-occurrence restrictions among words and on the relative semantic compositionality and restrictedness of meaning that distinguish pure idioms (e.g., iron man) from collocations (e.g., handsome man) and free lexical combinations (e.g., funny man). The frequency-based approach, in contrast, identifies collocations based on a co-occurrence frequency that is higher than would be expected by chance, as indicated by strength-of-association measures such as mutual information (MI).
In the present study, we follow the frequency-based approach to identifying collocations. This means that collocations refer to word combinations “that emerge from a corpus at greater frequency than could occur by chance, irrespective of their level of compositionality and/or semantic transparency” (Nguyen & Webb, Reference Nguyen and Webb2017, p. 300). This approach is highly valued in L2 learning because corpus-based frequency is often considered a proxy for language exposure; more frequent items are more likely to be encountered first in the language input (Peters, Reference Peters and Webb2020).Footnote 1 Given our frequency-based approach to defining collocations, we will review three common measures of collocation strength: MI, t-score, and Log Dice.
MI is among the most widely used measures of collocation strength. It has been related to the “coherence” (Ellis et al., Reference Ellis, Simpson-Vlach and Maynard2008), “tightness” (González-Fernández & Schmitt, Reference González Fernández and Schmitt2015), and “appropriateness” of word combinations (Siyanova & Schmitt, Reference Siyanova and Schmitt2008). The MI score “uses a logarithmic scale to express the ratio between the frequency of the collocation and the frequency of random co-occurrences of the two words in the combination” (Gablasova et al., Reference Gablasova, Brezina and McEnery2017, p. 163). MI scores are especially high for combinations of rare words that very often co-occur, such as “tectonic plate,” reflecting the exclusivity of the adjective “tectonic” with the noun “plate” (Durrant et al., Reference Durrant, Siyanova-Chanturia, Kremmel and Sonbul2022). The t-score has also been used as a measure of the “certainty of collocation” (Hunston, Reference Hunston2002, p. 73) and “the strength of co-occurrences” (Wolter & Gyllstad, Reference Wolter and Gyllstad2011, p. 436). However, Evert (Reference Evert2005) argues that t-scores lack a transparent mathematical grounding, which makes it difficult to establish statistically reliable and valid cut-off points (Hunston, Reference Hunston2002). Unlike MI, the t-score favors frequent collocations in the corpus (e.g., “of the” and “on the”; see Gablasova et al., Reference Gablasova, Brezina and McEnery2017). Log Dice, in turn, is in principle relatively similar to the MI score except that it gives less weight to rare combinations (ibid.). Gablasova et al. (Reference Gablasova, Brezina and McEnery2017) provide the example of “zig zag” as a collocation with a high Log Dice score and explain that Log Dice is considered preferable to MI when the language learning construct requires highlighting exclusivity between collocates without the bias toward rare items. However, Log Dice has not yet been extensively explored in language learning research. In the current study, we opted to use MI scores because they are the most widely employed measure of strength of association. Moreover, to avoid the MI bias toward rare combinations, we combined MI with a raw frequency threshold.
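For illustration only, a common formulation of the MI score (the exact computation varies slightly across corpus tools, so the expression below is a sketch rather than the formula implemented in any particular interface) compares the observed co-occurrence frequency O of a node and a collocate within a given span to the frequency E expected by chance in a corpus of N running words:

\[
\mathrm{MI}(w_1, w_2) = \log_2 \frac{O(w_1, w_2)}{E(w_1, w_2)}, \qquad E(w_1, w_2) = \frac{f(w_1)\, f(w_2) \times \mathit{span}}{N}
\]

On this formulation, exclusive pairings of two relatively rare words (e.g., tectonic plate) yield high MI values, whereas combinations built around very frequent words (e.g., of the) yield low ones.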
The often-cited MI threshold for “significant” collocations is 3 (Hunston, Reference Hunston2002). However, Evert (Reference Evert2008) proposed a ranking approach to operationalizing collocations on a cline from weaker to stronger ones, which allows the MI threshold value to be lowered. This ranking approach is the one employed in the present study because the study involves both native speakers of English, who might produce very strong collocations, and nonnative speakers, who might produce weaker ones. Thus, following this ranking approach within a frequency-based framework, we operationalize collocations in the present study as sequences of words (two or more) with a minimum MI of 1 and a minimum frequency of 30 in the COCA (Corpus of Contemporary American English).
Measuring productive knowledge of collocations
Earlier studies have shown that L2 learners’ productive knowledge of collocations is limited. This research has either examined corpus-based evidence (e.g., Laufer & Waldman, Reference Laufer and Waldman2011; Nesselhauf, Reference Nesselhauf2003; Siyanova & Schmitt, Reference Siyanova and Schmitt2008) or used paper-and-pencil tests (e.g., Frankenberg-Garcia, Reference Frankenberg-Garcia2018; González Fernández & Schmitt, Reference González Fernández and Schmitt2015, Reference González Fernández and Schmitt2020; Nizonkiza, Reference Nizonkiza2012).
One of the earliest corpus-based collocation studies is Nesselhauf (Reference Nesselhauf2003), who examined the use of verb-noun collocations, such as take a break or shake one’s hand, by advanced German-speaking learners of English in free written production. The results showed that, despite participants’ high level of proficiency, they exhibited notable difficulty in producing collocations. The most common type of collocation mistake was the wrong choice of verb (e.g., carry out races instead of hold races), followed by the wrong choice of noun (e.g., close lacks instead of close gaps). Similarly, Laufer and Waldman (Reference Laufer and Waldman2011) investigated the use of English verb-noun collocations in the writing of native speakers of Hebrew at three proficiency levels. The results revealed that learners of all proficiency levels produced a higher number of deviant collocations and far fewer collocations than native speakers. It is notable that Laufer and Waldman (Reference Laufer and Waldman2011) mainly employed a dictionary-check method to classify verb-noun combinations as acceptable or deviant collocations. As noted in the preceding text, the present study uses a pure frequency-based approach to identifying collocations and may thus depict a different picture.
Another relevant corpus-based study is Siyanova and Schmitt (Reference Siyanova and Schmitt2008, Study 1), who examined English adjective-noun collocations produced in essays by Russian learners of English in comparison to native speakers of English. Appropriate collocations were identified based on joint frequency and MI scores in the BNC. An MI threshold of 3 was set for appropriate collocations in line with Hunston’s (Reference Hunston2002) criteria. A frequency criterion of at least six occurrences in the BNC was also added; this figure was chosen because it allowed for the inclusion of almost half of the identified collocation data. Surprisingly, the results revealed very little difference between native speakers and nonnative speakers in the use of collocations (48.1% vs. 44.6% of the combinations produced were appropriate based on BNC counts, respectively).
One great advantage of corpus-based studies is the examination of authentic L2 production. However, such corpus-based research may not reveal all aspects of productive knowledge, as learners may avoid using certain collocations (ones they are not confident with) or may overproduce a few collocates that they have practiced well (referred to as “safe bets” or “zones of safety”) (Boers & Lindstromberg, Reference Boers and Lindstromberg2009).
Paper-and-pencil tests constitute a more direct measure of productive collocation knowledge than corpus-based evidence. Gap-fill tests have been the most common measures and were employed in different formats. One recurrently used format is to provide a sentential context and ask the learners to complete missing collocates (e.g., She was about to ______ a huge mistake) (e.g., González Fernández & Schmitt, Reference González Fernández and Schmitt2020; Nizonkiza, Reference Nizonkiza2012). To restrict the learners’ options, the first letter/syllable of the missing collocate is often supplied. To further constrain the range of potential collocations elicited, an L1 statement could be added to provide context for the English sentence.
An obvious advantage of such a format is that researchers can control which items are targeted and thus can manipulate various variables, such as frequency and congruency. On the minus side, however, these tests do not examine the learners’ authentic language use and cannot reveal the actual options available to learners during real-time production. In real life, speakers/writers need to consider the context and think about all possible collocates of the word at hand before producing the most appropriate collocate in the target context.
To overcome this limitation, an alternative gap-fill format has been developed by Frankenberg-Garcia (Reference Frankenberg-Garcia2018). The format requires participants to complete the gap in a sentential context/frame with as many collocates as they could think of. For example, in response to the sentential frame “They attempted to __________ the effect of …” participants could supply several collocates, including measure, examine, and analyze. In her study of collocations in an English-for-Academic-Purposes (EAP) context, Frankenberg-Garcia (Reference Frankenberg-Garcia2018) consulted the COCA to choose a range of collocations attested in different disciplinary areas. As shown in the illustrative frame, the nouns were presented within context, and the participants supplied the missing verbs/adjectives that collocate with these nouns. A major advantage of this format is that it simulates real-life performance whereby writers consider several possible collocates in context. However, scoring the test is not as straightforward as traditional gap-filling tests that target specific collocations (see preceding text). Frankenberg-Garcia (Reference Frankenberg-Garcia2018) used Pearson International Corpus of Academic English (PICAE) and employed Log Dice scores (a minimum of 3) and frequency of co-occurrence (five analogous co-occurrences) to identify acceptable collocations. In the present study, we will be using Frankenberg-Garcia’s (Reference Frankenberg-Garcia2018) format to simulate real-life productive collocation performance and examine factors that influence productive collocation knowledge. However, because our focus is on general, rather than academic, collocations we will use the COCA as our reference corpus.
Determinants of productive collocation knowledge
One important determinant of productive collocation knowledge is first language (L1) similarity. In her analysis of verb-noun collocations, Nesselhauf (Reference Nesselhauf2003) found that the learners’ L1 (i.e., German) had a clear effect on collocation errors. Likewise, Laufer and Waldman (Reference Laufer and Waldman2011) found that interlingual collocation errors by native Hebrew learners of English persisted across the three proficiency levels. Another relevant factor is grammatical configuration or PoS. Collocations are often grouped into two main categories: lexical and grammatical (Benson et al., Reference Benson, Benson and Ilson1997), with the latter including a preposition or a grammatical structure. Most of the research on the effect of configuration on L2 collocation knowledge has focused on lexical collocations, but the evidence in this regard is still limited. For example, Lee and Shin (Reference Lee and Shin2021) found no significant effect of collocation type (i.e., verb-noun, adjective-noun, adverb-adjective, and adverb-verb) on the learners’ scores in a sentence writing task and a gap-fill task when collocation frequency was held constant. Similarly, Nguyen and Webb (Reference Nguyen and Webb2017) found no effect of grammatical configuration (verb-noun vs. adjective-noun) on receptive knowledge of collocations. Although the available evidence is inconclusive regarding the effect of PoS on collocation knowledge development, studies that analyzed L2 learners’ collocational errors showed that adjective + noun and verb + noun collocations were the most problematic (Nesselhauf, Reference Nesselhauf2003; Yan, Reference Yan2010). Thus, in the present study, we focus on adjective (or more generally modifier) + noun and verb + noun collocations (see the following text for more details).
In addition to the influence of L1 and collocation type, Nizonkiza (Reference Nizonkiza2012) highlighted the important role of collocation frequency in the development of productive collocation knowledge. Belgian and Burundian learners of English completed a gap-fill collocation test. Similar to Laufer and Waldman (Reference Laufer and Waldman2011; see preceding text), target collocations were identified based on a dictionary check. The results showed that learners’ collocation knowledge developed as corpus-based frequency increased. Frequency was also identified as an important determinant by González Fernández and Schmitt (Reference González Fernández and Schmitt2015), who found that their L1 Spanish – L2 English learners’ collocation knowledge correlated moderately with corpus frequency (r = .45) and t-score (r = .41). However, no significant relationship was found between collocation knowledge and MI score. The results also highlighted a clear influence of the amount of exposure on L2 collocation knowledge. The learners’ knowledge of collocations moderately correlated with engagement with English outside the classroom (r = .56) and years of English study (r = .45).
A fourth factor that was prominent in Laufer and Waldman’s (Reference Laufer and Waldman2011) study is learners’ L2 proficiency. Although the collocation production errors persisted across all levels of proficiency in that study, the number of appropriate collocations the learners produced increased at the advanced level. L2 proficiency was also a crucial factor in Nizonkiza (Reference Nizonkiza2012), who explored the relationship between controlled productive knowledge of collocations and a measure of L2 proficiency. The results showed that both tests distinguished between the proficiency levels and were highly correlated. This finding strongly indicates that collocation knowledge develops as L2 proficiency increases. Another related finding by Ellis et al. (Reference Ellis, Simpson-Vlach and Maynard2008) is that raw frequency is a better predictor of nonnatives’ collocation processing, while MI better predicts the processing of collocations by native speakers. However, Ellis et al. (Reference Ellis, Simpson-Vlach and Maynard2008) did not examine whether and how nonnatives develop their productive collocational knowledge (both in terms of frequency and association strength) as a function of proficiency. One might speculate that MI can be the distinguishing feature of nonnative collocation performance at higher levels of proficiency (approaching nativelike performance), precisely because it is at high levels of proficiency that learners acquire lower frequency words, and then also their word partnerships. Conversely, at lower proficiency levels, learners’ vocabulary knowledge may be largely confined to high-frequency words, which seldom form partnerships with high MI scores. The present study examines this speculation by including data from both natives and nonnatives and through examining the effect of increased productive word knowledge (as a proxy of L2 proficiency) on collocational frequency and association strength (MI scores).
Thus, of most relevance to the current study is the association between knowledge of single words and collocation knowledge. The relationship between these two types of lexical knowledge has rarely been examined. On the receptive front, Nguyen and Webb (Reference Nguyen and Webb2017) investigated EFL learners’ knowledge of verb-noun and adjective-noun collocations at the first three 1,000-word frequency levels, and the extent to which several factors (including knowledge of single-word items at the same word frequency levels) influenced receptive knowledge of collocations. The results revealed large, significant positive correlations between receptive knowledge of single-word items and collocations (r = .67 for verb-noun collocations and r = .70 for adjective-noun collocations). Based on this result, the question arises: what about the relationship between single-word knowledge and collocation knowledge on the productive side? In his research agenda, Schmitt (Reference Schmitt2019) has called for more research exploring the productive level of mastery, which has often been reported as lagging behind receptive knowledge. This is the gap that the present study aims to fill.
The present study
This study aims to explore the association between productive word knowledge and the productive knowledge of collocations of the most frequent 3,000 lemmas in English. The study limited itself to three frequency bands (1K, 2K, and 3K) for practicality considerations. Testing more levels would have required more time and resources. The focus of the study was also limited to two types of collocations: modifier-noun (MN) and verb-noun (VN), which are most problematic for L2 learners (Nesselhauf, Reference Nesselhauf2003; Yan, Reference Yan2010). It should be noted that the frequency bands in the productive collocation test refer to the frequency of the noun node (the shared component in both configurations) rather than the frequency of the elicited collocation (see Nguyen & Webb, Reference Nguyen and Webb2017, for a similar approach). Moreover, we used the term modifier in its broadest sense (i.e., any word that describes the noun node or limits its meaning in some way) in place of adjective to account for the variation of responses in the productive collocation test (see “Measures” section for more details).
Another aspect of the present study concerns the focus on three collocational features: appropriacy, frequency, and strength of association. Most of the previous research on receptive collocation knowledge has focused on the appropriacy of the elicited responses (e.g., selecting the appropriate collocate out of several options). Because the present study focused on productive knowledge, we additionally examined the frequency of the elicited responses and their association strength with the noun node.
We used Laufer and Nation’s (Reference Laufer and Nation1999) and Frankenberg-Garcia’s (Reference Frankenberg-Garcia2018) test formats to develop productive measures of word and collocation knowledge, respectively, and refer to them as the Controlled Productive Word Test (CPWT) and the Controlled Productive Collocation Test (CPCT). The study addresses the following research questions:
RQ1: To what extent is productive word knowledge associated with the appropriacy of the elicited collocations?
RQ2: To what extent is productive word knowledge associated with the corpus-based frequency of the elicited collocations?
RQ3: To what extent is productive word knowledge associated with the strength of the elicited collocations?
It should be noted that the participants in the present study included both native speakers (NSs) and non-native speakers (NNSs) of English, who took both the CPWT and the CPCT. Including an NS group was essential for two reasons: (a) to validate the CPWT with a group of NSs who should have knowledge of the target lemmas and (b) to establish a baseline against which the NNS group’s CPCT results can be compared. Thus, we included “Group” (NSs versus NNSs) as a controlling factor in the analysis to examine how productive knowledge of collocations develops at the highest levels of proficiency.
In addition to controlling for the effect of Group, we also included several item-related (frequency and length of individual words) and participant-related (amount of exposure and age of acquisition) factors as covariates in the analysis. Because collocation knowledge is a complex construct, we needed to partial out the effect of several variables before looking at the main focus of the present study, namely, the relationship between productive word knowledge and productive knowledge of collocations.
Methods
Participants
Two groups of participants took part in the present study. The first group comprised 27 NSs of English. The other group included 55 NNSs of English who spoke Arabic as their first language.Footnote 2 The NNSs were students at a university in Saudi Arabia, either in the preparatory-year program (n = 18) or as seniors completing their BA degree through the medium of English (n = 37). They showed mastery of the most frequent 1,000 (1K) word families in English as indicated by their scores (out of 30) on the updated VLT, Version A (Webb et al., Reference Webb, Sasao and Ballance2017): Minimum = 25, Maximum = 30, M = 28.31, SD = 1.63. We set a somewhat lenient threshold of 25/30 for receptive mastery of the 1K level in the present study, as the purpose was to generally ensure adequate comprehension of the contexts in the CPWT and CPCT, which all belonged to the most frequent 1,000 word families in English (see the following text).
Table S1 (see Supplementary Materials) details the characteristics of participants under each group as indicated in their responses to a language background questionnaire. Our analysis models (see “Analysis” section) included participants’ average exposure to English in the four skills and the age at which they started learning English (coded as 0 years for native speakers) to partial out their effect.
Measures
As our purpose was to measure productive knowledge of both single-word items and collocations, we developed two measures for the present study: CPWT and CPCT. In the text that follows, we provide a full description of test creation, piloting, and scoring procedures.
The Controlled Productive Word Test
We sampled items for the CPWT from Davies’s (Reference Daviesn.d.) COCA frequency list. The list is based on COCA (Davies, Reference Davies2008–) frequency counts and is lemma based. It includes the most frequent 60,000 lemmas in English. The list was developed based on raw frequency, but a dispersion measure (Juilland D; see Juilland & Chang-Rodríguez, Reference Juilland and Chang-Rodríguez1964) of 0.30 was set as a threshold to eliminate lemmas that are limited to a specific genre or domain (Davies, personal communication). We opted to use the COCA frequency list as it is based on lemma rather than word-family counts, which might be more suitable for our nonnative participants in the EFL context with varied proficiency levels (see “Background” section). However, it should be noted that there are limitations associated with using the COCA frequency list. We will return to these in the “Discussion” section.
As our aim was to examine knowledge of the most frequent 3,000 lemmas in English, we limited our sampling to the 1K (1,000), 2K (2,000), and 3K (3,000) levels/bands of the COCA frequency list. For the purpose of developing the PVLT, Laufer and Nation (Reference Laufer and Nation1999) sampled 18 items at each frequency level. Also, Aviad-Levitzky et al. (Reference Aviad-Levitzky, Laufer and Goldstein2019) used 10 items to represent each of the 14 frequency bands in the development of CATSS. The sampling rate is usually higher in receptive vocabulary measures (i.e., multiple-choice, checklist formats), where many items can be developed, administered, and scored in a relatively short time. However, for practicality purposes, the sampling rate is usually smaller in productive measures, taking into consideration test development, administration, and scoring time. In the present study, we initially opted for a round number that is closest to Laufer and Nation’s (Reference Laufer and Nation1999) sampling rate, that is, 20 lemmas per band. We employed stratified sampling (based on percentages) to specify the number of items to be drawn from each lexical PoS (nouns, verbs, adjectives, and adverbs), excluding grammatical lemmas. Based on the percentages presented in Table S2 (see Supplementary Materials), each 1,000-lemma level was tested using 11 nouns, 5 verbs, 3 adjectives, and 1 adverb (total = 20). With 20 headwords under each of the three frequency levels, the test assessed the productive knowledge of 60 lemmas in total.
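To illustrate the stratified sampling step, the sketch below draws the quota of 11 nouns, 5 verbs, 3 adjectives, and 1 adverb from a hypothetical data frame coca_band holding the lemmas of one 1,000-lemma band; the data frame and its columns (lemma, pos) are our own stand-ins, and base R’s sample() is used here in place of the online List Randomizer described in the next paragraph.

```r
# Minimal sketch of the stratified sampling of target lemmas (hypothetical data and columns)
set.seed(123)                                               # reproducibility of the illustrative draw
quota <- c(noun = 11, verb = 5, adjective = 3, adverb = 1)  # PoS quotas per 1,000-lemma band

sample_band <- function(band, quota) {
  picks <- lapply(names(quota), function(p) {
    pool <- band[band$pos == p, ]              # all lemmas of this PoS in the band
    pool[sample(nrow(pool), quota[[p]]), ]     # random draw of the required number
  })
  do.call(rbind, picks)                        # 20 candidate target lemmas
}

cpwt_candidates_1k <- sample_band(coca_band, quota)
```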
All lemmas belonging to a given PoS at each COCA-frequency level were randomized through the List Randomizer (https://www.random.org/lists/) to select target items. For each candidate target word, a short defining sentence context was provided. Words in the surrounding context belonged to the most frequent 1,000-word families (BNC/COCA List; Nation, Reference Nation2012). This was important to ensure that our NNSs, who showed mastery of that frequency level (see “Participants” section), would be able to fully comprehend the sentences.
The approach we employed to restrict responses was slightly different from that of Laufer and Nation (Reference Laufer and Nation1999; see “Background” section). They provided the first letter(s) as clues and decided on the number of clue letters based on the possible orthographic neighborhood (between one and six letters). To unify the number of clue letters, we provided only the first letter but represented the remaining letters by dashes as an additional clue. Here is an example for the noun violation:
We must report what he has done to the police. This is a v_ _ _ _ _ _ _ _ of the law.
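For consistency across items, such a cue (first letter plus one dash per remaining letter) can be generated mechanically; a minimal sketch, with a helper name of our own:

```r
# Build a cue such as "v _ _ _ _ _ _ _ _" from a target lemma (illustrative helper)
make_cue <- function(word) {
  paste0(substr(word, 1, 1), strrep(" _", nchar(word) - 1))
}
make_cue("violation")  # "v _ _ _ _ _ _ _ _" (first letter followed by eight dashes)
```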
The initial draft of the test went through several piloting stages. Two groups of natives took the test in different rounds. Items that did not elicit the target lemma were either replaced or modified. It was difficult to reach a perfect score for all items. Thus, we decided to use the test as it is and then exclude items where our main pool of 27 NSs did not achieve an acceptable score (see the following text). Appendix S1 presents the target items for the CPWT and Appendix S2 presents the actual test along with the answer key (see Supplementary Materials).
Responses in the CPWT were scored dichotomously (0/1) based on accuracy. Following Laufer and Nation (Reference Laufer and Nation1999), minor mistakes in grammar (i.e., a different lemma form: violations instead of violation) and in spelling (an incorrect but recognizable form, e.g., comet instead of commit and bleam instead of blame) were ignored. However, because the test is lemma-based, a different PoS (e.g., violate instead of violation) was coded as inaccurate (0). All responses were scored by a proficient Arabic–English bilingual research assistant who holds an MA degree in English. She was given detailed instructions before she started scoring the responses. Additionally, another Arabic–English bilingual research assistant holding an MA degree in English scored a random sample of 30% of the responses based on the same guidelines. Interrater reliability was high (ICC = .99, 95% confidence interval (CI) [.97, .99]). Therefore, only the scores awarded by the first rater were included in the analysis.
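For reference, an interrater agreement check of this kind can be computed with the icc() function from the irr package on the doubly scored 30% subset; the object names below are hypothetical, and this is a sketch of one common option rather than the exact procedure used here.

```r
# Two-way agreement ICC for the double-scored CPWT responses (hypothetical objects)
library(irr)
ratings <- cbind(rater1 = rater1_scores, rater2 = rater2_scores)  # one column of 0/1 scores per rater
icc(ratings, model = "twoway", type = "agreement", unit = "single")
```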
As indicated in the preceding text, we initially examined responses by natives to exclude items for which more than 20% of the NSs provided incorrect answers. This resulted in excluding the following words from the analysis:
1K: walk, series, realize, chair
2K: expand, perception, content, achieve
3K: emotion, mixture, stability, impose, practical
Thus, we ended up with 16 items at the 1K level, 16 items at the 2K level, and 15 items at the 3K level. It should be noted that the incorrect responses provided by NSs in the CPWT do not reflect lack of knowledge (as NSs surely know these highly frequent words) but may have been caused by the clues not properly restricting the required response. This limits the usefulness of our results for estimating productive vocabulary size at each frequency band. We will return to this point when we discuss the limitations of the study.
Table S3 (Supplementary Materials) presents the percentage of accurate/inaccurate CPWT responses provided by NSs and NNSs under the three frequency levels for the final item pool. We will not further analyze scores in the CPWT as the focus of the present study is on collocation knowledge. These scores will be included as a main factor in the analysis of the CPCT scores to answer the three research questions.
It is worth noting that the 20% exclusion criterion employed in the present study is higher than that employed by Laufer and Nation (1 out of 7, i.e., around 15%). However, our format is arguably more challenging than Laufer and Nation’s format, as we provided only the first letter as a clue, in addition to dashes to restrict the number of letters. Thus, 20% may be a more suitable threshold for the present study.
We calculated the internal reliability of the final CPWT form (16 words at the 1K level, 16 words at the 2K level, and 15 words at the 3K level), including scores achieved by NSs and NNSs, and found it to be high: 1K (Cronbach’s alpha = .86), 2K (Cronbach’s alpha = .91), 3K (Cronbach’s alpha = .95), total score (Cronbach’s alpha = .97).
The controlled productive collocation test
As indicated previously, we used Frankenberg-Garcia’s (Reference Frankenberg-Garcia2018) test format to assess controlled productive knowledge of MN and VN collocations at the first three 1,000-word frequency levels (1K, 2K, and 3K). We used the same noun nodes for both configurations (MN and VN) to allow a direct comparison.
Nouns from the COCA Frequency List were randomized using the List Randomizer (https://www.random.org/lists/) to select target items. To minimize any transfer effect between the two tests (CPWT and CPCT), no noun was repeated in both measures. The sampled target nouns at each frequency level were checked individually in the COCA interface (Davies, Reference Davies2008–) to establish how varied their modifier and verb collocates were. For modifiers, the span was set to –1, and for verbs the span was set to –2 to allow for an intervening determiner. The search involved nouns as lemmas (e.g., node: [book]_nn*; collocates: _v* and _j*), and the resulting collocates were sorted by frequency followed by MI (mutual information) value. As indicated in the preceding text (see “Background” section), we used Evert’s ranking approach to operationalizing collocations with a minimum MI threshold of 1 and a minimum raw frequency of 30. However, as the purpose of this initial stage of test development was to explore strong collocations, we employed the stricter MI and frequency thresholds of 3 (Hunston, Reference Hunston2002) and 50 (Nguyen & Webb, Reference Nguyen and Webb2017), respectively. Only nouns that allowed a range of variation for both configurations with highly frequent and strong modifiers and verbs (which our NNSs might know) were included in the initial pool. For example, the noun node chance was considered suitable for the present measure as it allows several modifier (e.g., good, real, great, fair, excellent) and verb (e.g., have, get, take, stand) collocates. However, the noun node bit was not considered suitable for our purposes. This is because while the COCA search for collocating modifiers of bit resulted in several options (little, tiny, small), the search for collocating verbs resulted in only one record, blow. We ended up with 11 candidate noun nodes for the CPCT at each frequency level.
Then, for each target noun node, we developed a context that was general enough to allow the elicitation of as many collocates as possible. We also made sure that none of the contexts for MN collocations gave away VN collocations or vice versa. For example, we did not use the verb create in the sentence eliciting “modifier + image” collocations as this might lead to an effect on the “verb + image” item. Similar to the CPWT, all words in the surrounding contexts were at the 1,000-word level to ensure full comprehension by our NNSs. The target noun was underlined in each sentence context to stress the target node for which collocations were to be produced. After two piloting rounds with NSs and NNSs to ensure variation in responses, we ended up with 10 noun nodes at each frequency band. Each noun node was presented twice in the test to elicit modifier collocates and then verb collocates. Target items for the CPCT are presented in Appendix S3 and the actual test is presented in Appendix S4 (see Supplementary Materials) with examples of typical collocations. Here are examples of CPCT items for the noun node chance, with possible strong collocates provided in brackets:
MN: This is a/an ____________ chance. (possible responses: good, real, great, fair, excellent)
VN: They ____________ a chance to win. (possible responses: have, get, take, stand)
Following Frankenberg-Garcia (Reference Frankenberg-Garcia2018), the CPCT format instructed participants to insert as many collocates as they could (modifiers in the first section and single verbs in the second section) for the noun presented in context. This resulted in 12,491 data points, with responses ranging from 0 to 16 (M = 1.64, SD = 1.07) per noun node for NNSs and from 1 to 48 (M = 4.11, SD = 3.31) per noun node for NSs. The large variation in the number of responses provided by NSs may point to the possibility that NSs approached the task differently than NNSs. We will revisit this point when we discuss the limitations of the study (see “Discussion” section).
Test scoring and response classification went through three stages: initial accuracy coding, data recording, and appropriacy classification. These steps are explained in detail in Appendix S5 (Supplementary Materials). As previously noted, we use the term modifier in the MN section of the test in its broadest sense. Thus, under the MN category, we accepted adjectives, attributive nouns (or noun adjuncts), and determiners as modifier responses.Footnote 3
Finally, each response was coded as “appropriate” or “inappropriate” based on COCA frequency: an MI threshold of 1 and a frequency threshold of 30. Thus, our definition of collocation “appropriacy” is related to corpus frequency rather than to any evaluative judgment of the responses provided. Each collocation response was included in the data log along with its COCA collocation frequency and calculated MI score (across various lemma forms).Footnote 4
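A minimal sketch of this classification step, assuming a response log with one row per elicited collocate and columns (named here by us) for the response itself, its COCA co-occurrence frequency, and its MI score:

```r
# Flag each elicited collocate as appropriate (1) or inappropriate (0) under the
# study's operationalization: MI >= 1 and COCA co-occurrence frequency >= 30;
# unanswered gaps are coded 0, as in the appropriacy analysis reported below
responses$appropriate <- ifelse(
  is.na(responses$collocate),
  0L,
  as.integer(responses$coca_freq >= 30 & responses$mi >= 1)
)
```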
Procedures
After signing the consent form, the participants were administered the CPWT. The sheets were then collected and participants were administered the CPCT. Finally, they completed the language background questionnaire and the 1K updated VLT test.
It should be noted that due to the COVID-19 pandemic and class suspension in Saudi Arabia, we could not run all participants in face-to-face sessions. The NS group and a subset of the NNS group (i.e., the seniors) took the test online (with cameras on) and were instructed not to use any sources of help. The whole test battery took between 60 and 75 minutes to complete for the NNSs and only 45 minutes for the NSs.
Analysis
To address the three research questions, three separate analyses were conducted in R version 4.1.1 (R Core Team, 2021). We will refer to the three analyses as Model 1, Model 2, and Model 3. Dichotomous (0/1) outcome values (collocation appropriacy scores in Model 1) were analyzed using a mixed-logit regression analysis for binary data (glmer function in the lme4 package). This analysis targeted “appropriacy” scores for the CPCT (Model 1) and addressed the first research question. For continuous dependent variables, including collocation frequency (Model 2, second research question) and MI values (Model 3, third research question), we employed linear mixed-effects (LME) models (lmer function in the lme4 package). The random-effect structure of all models was the same, including random intercepts for items and subjects, random by-item slopes for Group (NSs vs. NNSs), and random by-subject slopes for Frequency Level (1K, 2K, 3K). All analyses were conducted in a stepwise manner, evaluating the contribution of each factor to model fit using AIC values in a forward-selection procedure. In the following text, we describe the structure of each model.
The three models (Model 1, Model 2, and Model 3) are concerned with the results of the CPCT. Model 1 was a mixed-logit regression with the binary collocation appropriacy score (1 = appropriate, 0 = inappropriate) as the dependent measure. The full CPCT data set (12,491 data points) was entered into this analysis of appropriacy. Covariates included target node lemma length,Footnote 5 configuration (MN vs. VN), average exposure to English, and the age at which the participant started to learn English. Main fixed variables included Group (NSs as the reference level), Frequency Level (1K as the reference level), and total CPWT score (out of 47). The total CPWT scores were included in the analysis to examine the effect of increased productive vocabulary size on the odds of providing appropriate collocations (Research Question 1). We also tested for the interaction between Frequency Level and CPWT scores. Odds ratios transformed from log odds (Exp(β) values) were used as estimates of the strength of each significant predictor in the model. We also calculated Cohen’s d values based on log odds as standardized estimates of effect size.
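A minimal sketch of Model 1 under these specifications, with illustrative variable names standing in for the data set and predictors described above (the forward AIC-based selection itself is omitted); the log-odds-to-Cohen’s-d conversion in the last line is one common option, shown for illustration only.

```r
library(lme4)

# Mixed-logit model of collocation appropriacy (Model 1); names are illustrative
m1 <- glmer(
  appropriate ~ node_length + configuration + exposure + start_age +  # covariates
    group + freq_level * cpwt_total +                                 # main effects and interaction
    (1 + group | item) + (1 + freq_level | subject),                  # random-effect structure
  data = cpct, family = binomial
)

exp(fixef(m1))            # odds ratios from the log-odds estimates
fixef(m1) * sqrt(3) / pi  # one common log-odds-to-Cohen's-d conversion (illustration only)
```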
The other two models (Model 2 and Model 3) focused on a subset of the CPCT data (6,156 data points, 49.36%, for “appropriate” collocations only) to evaluate factors that predict the log frequency of appropriate collocations (Model 2; Research Question 2) and those that predict the strength of appropriate collocations, that is, MI value (Model 3; Research Question 3). Other than this difference in the dependent measures, Models 2 and 3 had similar structures. Covariates for both models included node lemma length, collocate lemma length, log collocate lemma frequency, configuration, average exposure to English, and age when the participant started to learn English. These were only included to partial out their effect before testing for the main factors. Main fixed factors included Group, Frequency Level, CPWT scores, and the interaction between Frequency Level and CPWT scores. Effect sizes for these models are represented by marginal and conditional R2 values. The former involves only fixed effects, but the latter incorporates random effects as well (see Winter, Reference Winter2019 for a fuller explanation). We also employed Brysbaert and Stevens’s (Reference Brysbaert and Stevens2018) guidelines to calculate Cohen’s d of significant variables in the LME models.
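Models 2 and 3 can be sketched analogously with lmer(), with the MuMIn package as one option for obtaining the marginal and conditional R² values; all object and variable names below are illustrative.

```r
library(lme4)
library(MuMIn)

# LME model of the log COCA frequency of appropriate collocations (Model 2);
# Model 3 is identical except that the outcome is the MI value of the collocation
m2 <- lmer(
  log_colloc_freq ~ node_length + collocate_length + log_collocate_freq +
    configuration + exposure + start_age +            # covariates, partialled out first
    group + freq_level * cpwt_total +                 # main factors and interaction
    (1 + group | item) + (1 + freq_level | subject),
  data = cpct_appropriate                             # appropriate responses only (n = 6,156)
)

r.squaredGLMM(m2)  # marginal (fixed effects only) and conditional (fixed + random) R^2
```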
Table S4 (see Supplementary Materials) presents a summary of the continuous variables. Collinearity was checked for significant predictors in each model using the variance inflation factor (VIF). All VIF values were below 2, indicating no collinearity issues.
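One way to carry out such a collinearity check in R is via the check_collinearity() function from the performance package; this is shown as an option, not necessarily the implementation used for the reported VIF values.

```r
# Variance inflation factors for the predictors of a fitted model (illustrative)
library(performance)
check_collinearity(m1)  # reports a VIF value for each predictor in the model
```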
Results
Table 1 presents the percentage of appropriate/inappropriate collocation responses in the CPCT as well as the mean frequency and MI values of appropriate collocations for both NSs and NNSs and for both configurations (MN/VN). It is interesting to note that of the total 12,491 data points in the CPCT, 53.3% (6,658 responses) were produced by NSs and only 46.7% (5,833) by NNSs. Furthermore, the 5,833 data points for NNSs included 442 empty cells (no responses). NSs, however, did not leave any unfilled gaps in the CPCT. We opted to keep the empty cells in the appropriacy analysis (Model 1, RQ1, coded as 0 = inappropriate) as they represent lack of collocation knowledge.
* Total responses for NNSs include instances when the participant provided no answer (coded as 0 = inappropriate).
The percentage scores and average frequency/MI values in Table 1 seem to indicate several similarities between NSs and NNSs. First, the percentage of appropriate responses ranged between 63% and 45% for NSs and between 54% and 36% for NNSs. The percentage of appropriate responses produced by NSs might seem counterintuitive, with more than 40% inappropriate responses. It should be noted, though, that these results are similar to (and even higher than) Siyanova and Schmitt’s (Reference Siyanova and Schmitt2008) findings of 48.1% and 44.6% appropriate collocations for NSs and NNSs, respectively.
Another notable finding in relation to the number of “appropriate” collocation responses is that they gradually decreased as a function of the frequency band (i.e., fewer appropriate responses for lower frequency levels) for both the NNSs and NSs. Similarly, for COCA-based frequency, both groups showed a gradual decrease as a function of frequency band (with lower frequency values overall for the NSs than the NNSs). Finally, MI showed a gradual increase for lower frequency bands for both participant groups.
Regarding the effect of configuration (MN versus VN) on the appropriacy, frequency, and MI of collocations, there seems to be a tendency for more appropriate, less frequent, and stronger MN than VN collocations for both NSs and NNSs.
The following three subsections will present the best-fit Models 1, 2, and 3 to answer Research Questions 1, 2, and 3, respectively.
Association between productive word knowledge and the appropriacy of collocations (Model 1, RQ1)
The best-fit mixed-logit Model 1 for variables predicting appropriate collocation responses is presented in Table S5 (Supplementary Materials). None of the covariates tested contributed to the model. Of the main effects, only CPWT score and Frequency Level were significant. Group was initially significant but ceased to be so when CPWT scores were added to the model. The results suggest that participants who scored higher on the CPWT were more likely to produce appropriate collocations in the CPCT (small effect). For Frequency Level, the model showed that more appropriate responses were provided at the 1K (reference) level than at the 3K level (small effect). However, the difference between the 1K and 2K levels was not significant. To examine the remaining contrast between the 2K and 3K levels, we redefined the reference level as 2K. The difference was significant with a small effect (β = –0.48, z = –2.62, p = .009, d = –0.27). Finally, the interaction between the CPWT score and Frequency Level was not significant, suggesting that frequency band did not modulate the observed productive word knowledge effect.
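The 2K–3K contrast reported here can be obtained by refitting the model with the reference level changed; a one-step sketch using the illustrative objects introduced earlier:

```r
# Refit Model 1 with 2K as the reference level to test the remaining 2K vs. 3K contrast
cpct$freq_level <- relevel(cpct$freq_level, ref = "2K")
m1_2k <- update(m1)  # same formula and data; only the reference level differs
```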
Association between productive word knowledge and the frequency of collocations (Model 2, RQ2)
Table S6 (Supplementary Materials) presents the best-fit LME model for variables predicting the COCA frequency of appropriate collocations. One notable significant covariate is the frequency of the collocate lemma. The positive, large effect suggests that the higher the frequency of the provided collocate, the higher the frequency of the collocation as a whole. This effect is expected as highly frequent lemmas are more likely to form part of highly frequent collocations.
After controlling for several significant covariates, we found that out of the three main variables (Group, Frequency Level, and CPWT score), only Frequency Level and CPWT score were significant. Like Model 1, the Group variable ceased to be significant when CPWT scores were added. Moreover, similar to Model 1, Model 2 showed no interaction between Frequency Level and CPWT scores.
We will now further explore the main effects reported in the preceding text for Frequency Level and CPWT scores. For Frequency Level, the model showed that the COCA frequency of collocations at the 1K level was significantly higher than at the 2K and 3K levels (small to medium effects). Upon redefining the reference level as 2K, we found no significant difference between the 2K and 3K levels (β = –0.21, t = –1.48, p = .15, d = –0.21). Moving now to the effect of CPWT scores, the results showed a significant (though very small) increase in the frequency of elicited collocations as the CPWT score increased.
Finally, the fact that the interaction between frequency level and CPWT score was not significant suggests that this positive effect of increased CPWT scores was omnipresent for all frequency bands.
Association between productive word knowledge and the strength of collocations (Model 3, RQ3)
Our final LME model (Model 3) examined the factors that predict the strength or MI value of the elicited collocations. Results presented in Table S7 (see Supplementary Materials) show two significant covariates. The most interesting of these is the contribution of configuration with a medium effect: lower overall MI values for VN collocations in comparison to MN collocations.
Similar to Model 2, the CPWT score and Frequency Level significantly contributed to the model fit, but the interaction between them did not. The Frequency Level contrasts seem to suggest that 3K collocations were significantly more likely to reflect higher MI values than 1K collocations (medium effect), but the difference between the 1K and 2K levels was not significant. Upon redefining the reference level, we found that the difference between the 2K and 3K levels was significant with a small effect (β = 0.61, t = 2.71, p = .009, d = 0.40). For the effect of the CPWT, similar to Model 2, the results suggest an omnipresent significant (though very small) effect: higher MI values (i.e., stronger collocations) as productive knowledge increased.
Discussion
Lexical knowledge is not merely about developing the form-meaning link of individual words but, most importantly, about knowing how lexical items are used in context (Frankenberg-Garcia, Reference Frankenberg-Garcia2018). From this perspective, corpora have informed research on whether and how language users conventionally put words together to make appropriate utterances. Utilizing the COCA, we devised a lemma-based controlled productive word test (CPWT) and a controlled productive collocation test (CPCT). The aim was to examine the interrelationship between the productive knowledge of words and collocation appropriacy, frequency, and strength.
Regarding the appropriacy of collocations produced in the CPCT, our results (RQ1) revealed that productive word knowledge test scores and frequency band significantly contributed to collocation appropriacy. If we consider productive word knowledge in the present study as a proxy of proficiency, our findings can be interpreted as support for previous research in the area. For example, Laufer and Waldman (Reference Laufer and Waldman2011) reported that the appropriacy of collocations produced by their learners improved as a result of increased proficiency. Similarly, L2 proficiency, as measured with TOEFL, was a central factor contributing to productive knowledge of collocations in Nizonkiza’s (Reference Nizonkiza2012) study.
Looking more directly at the association between word knowledge and collocation knowledge, Nguyen and Webb (Reference Nguyen and Webb2017) found a strong relationship between receptive knowledge of single-word items and the accuracy of receptive collocation knowledge (r ≈ .70). Our results extend Nguyen and Webb’s findings to productive vocabulary knowledge; the wider the productive knowledge of words is, the more appropriate the collocations produced by language users are, though the effect is small (d = 0.24). The fact that the effect was small here but large in Nguyen and Webb’s study might simply be owing to different analysis methods (mixed logit model vs. correlation, respectively) or might be a genuine difference between productive and receptive knowledge. Further research in this area can help tackle this issue. It is also notable that the results of both NSs and NNSs were fairly similar (no effect of Group) with the percentages of appropriate responses being in line with findings of Siyanova and Schmitt (Reference Siyanova and Schmitt2008). This might be due to the fact that both studies employed a frequency-based approach to identifying appropriate collocations.
Regarding the effect of the frequency band on collocation appropriacy, the fact that the difference between the 1K and 2K levels was not significant may suggest that productive knowledge of collocations declines mainly at the 3K level, at least for the EFL participants in the present study.
We also examined the association between productive word knowledge and the corpus-based frequency of the collocation responses (RQ2). The results showed that productive word knowledge significantly contributed to the model (still with a small effect), with higher frequency collocations being produced overall by participants with wider word knowledge. Concerning frequency bands, noun nodes at the 1K level elicited more frequent collocations than those at the 2K and 3K levels, but no significant difference was established between the 2K and 3K levels. This finding might be due to the presence of advanced learners among the cohort, for whom the differences in the frequency of elicited collocations at the lower frequency bands might be minimal. Overall, the collocation frequency analysis seems to suggest that language users who know more words productively were more likely to produce higher frequency collocations, suggesting a close relationship between productive vocabulary knowledge and productive collocation competence. Although this direction of the CPWT effect might seem counterintuitive given that Ellis et al. (Reference Ellis, Simpson-Vlach and Maynard2008) showed that nativelike performance is associated with stronger collocations (higher MI and thus lower frequency), we believe this discrepancy in findings is related to a difference in the measures used in the two studies (see the following text).
The last research question (RQ3) concerns the potential association between productive word knowledge and the strength of the produced collocations (MI scores). Overall, similar to the effect reported for collocation frequency (see preceding text), as productive word knowledge increased so did collocation strength, though the effect was small. Two other significant predictors emerged. First, grammatical configuration (MN vs. VN) was significant. While Lee and Shin (Reference Lee and Shin2021) and Nguyen and Webb (Reference Nguyen and Webb2017) did not find any effect of grammatical configuration on collocation accuracy, our study established an effect on collocation strength: significantly lower MI scores were observed for VN combinations than for MN combinations. In part, this lower average MI score for VN collocations could be attributed to the fact that some common verbs (e.g., delexical verbs like make and do) combine with many nouns, resulting in relatively low MI scores.
Second, in contrast with the collocation frequency measure (see Model 2), collocation strength was observed to increase significantly as frequency level decreased, though not between the 1K and 2K levels. This is in fact an expected result, bearing in mind that MI scores are influenced by frequency; higher frequency words that collocate with a vast number of other words tend to have smaller MI scores than lower frequency words that collocate with a relatively limited number of words (Gablasova et al., Reference Gablasova, Brezina and McEnery2017; Nguyen & Webb, Reference Nguyen and Webb2017; see also Bestgen, Reference Bestgen2017).
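The dependence of MI on frequency can be illustrated with a commonly used formulation of the score; the exact computation implemented in a given corpus tool may differ slightly (e.g., in how the collocation span enters the expected frequency), and the notation below is ours rather than that of any particular tool:

MI = \log_2 \dfrac{O}{E}, \qquad E = \dfrac{f(\text{node}) \times f(\text{collocate}) \times \text{span}}{N}

where O is the observed co-occurrence frequency of the node and collocate within the chosen span, f(·) is each word’s frequency in the corpus, and N is the corpus size. Because high-frequency items inflate the expected frequency E, a pair involving a very frequent collocate (e.g., a delexical verb) must co-occur disproportionately often to reach a high MI score, whereas a lower frequency pair can reach the same MI with far fewer co-occurrences.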
Overall, the results of the present study seem to suggest that productive collocation knowledge is associated with productive knowledge of individual words. Participants who know more individual words productively are more likely to produce appropriate collocations that are highly frequent and strongly associated. This effect was consistent across frequency levels (1K, 2K, and 3K). The fact that an increase in productive word knowledge was associated with higher frequency and stronger association might seem contradictory to the findings of Ellis et al. (Reference Ellis, Simpson-Vlach and Maynard2008), who found that nativelike processing is associated with higher MI but not higher frequency. It should be noted, however, that the measures employed in the two studies are fundamentally different. Ellis et al.’s (Reference Ellis, Simpson-Vlach and Maynard2008; Experiment 2) productive outcome measure was articulation latency in a reading-aloud task, with frequency and MI examined as predictor variables. Conversely, our study elicited open-ended responses in a gap-fill task and included frequency and MI as outcome variables. Our results seem to suggest that, at least for highly frequent noun nodes, more proficient speakers of the language are more likely to produce more conventionalized collocations (i.e., those that are highly frequent and strongly associated according to corpus data). Thus, as NNSs develop their proficiency to nativelike levels, the collocations they produce become not only stronger but also more frequent (based on corpus counts).
Pedagogical implications
Results of the NNSs in the present study seem to show limited productive knowledge of both words and collocations (appropriacy, frequency, and strength), lagging behind the knowledge exhibited by native speakers. In fact, productive vocabulary knowledge, needed for accurate writing and speaking in the L2, has often been reported to represent a more advanced level of mastery than receptive knowledge (e.g., Aviad-Levitzky et al., Reference Aviad-Levitzky, Laufer and Goldstein2019). This receptive/productive distinction is reflected in Nation and Webb’s (Reference Nation and Webb2011) Technique Feature Analysis (TFA), which was developed to evaluate vocabulary activities based on 18 criteria. The TFA gives higher overall scores to exercises that involve some level of “form retrieval,” assumed to reflect vocabulary production.
But how can productive lexical knowledge be enhanced? Empirical research in this area is fairly limited, as rightly noted by Schmitt (Reference Schmitt2019): “an under-researched area of particular interest is how to push learners’ knowledge from receptive mastery to the point where they can independently use lexical items fluently and appropriately in their own output” (p. 264). However, several scholars have made useful suggestions. Nation (Reference Nation2007, Reference Nation2013), for example, developed the four-strand principle, postulating that any effective vocabulary development program should involve balanced attention to four major components: language-focused practice, meaning-focused input, meaning-focused output, and fluency development. At least two of these strands can be directly related to enhancing productive vocabulary knowledge: meaning-focused output and fluency development. In meaning-focused output activities, the focus should be on the successful communication of meaning. Laufer (Reference Laufer and Webb2020) claims that this kind of practice can help the L2 learner cross the receptive/productive vocabulary boundary. As for the often-ignored fluency development strand, practice can also involve productive activities, but these should always be fairly easy so as to improve speed of access to already known vocabulary.
Thus, the limited research on productive vocabulary development seems to suggest that language learning programs may need to invest considerable time and effort in enhancing learners’ ability to retrieve words in speaking and writing activities. Our results suggest that such investment will be reflected not only in the productive knowledge of individual words but also in collocational knowledge. We hope that, with recent calls for empirical evidence in this area, researchers will begin to conduct more studies examining which activities are most useful for enhancing productive vocabulary knowledge.
Limitations and future research
This study has a number of limitations that need to be addressed in future research. First, our definition of “appropriate” collocations in the CPCT was confined to frequency-based counts. Such a definition might have masked possible differences between NSs and NNSs in their use of collocations. This may also be related to the nature of the collocation knowledge measure employed. Following Frankenberg-Garcia (Reference Frankenberg-Garcia2018), the CPCT in the present study instructed participants to provide “as many collocates as possible.” NSs are likely to approach such a task very differently from NNSs. Because NSs do not need to “prove” that they know the conventions of their native language, they can deviate from the predictable and be creative (hence the large number of responses provided by some NSs for certain items and the counterintuitively low percentage of appropriate collocations based on corpus frequency). NNSs, by contrast, probably treat the task as a test of how well their knowledge approximates L2 conventions. Thus, future research examining productive knowledge of collocations may need to develop other measures that avoid such a built-in bias when NS and NNS performance is compared.
A second limitation of the study is related to item sampling. For that purpose, we used the COCA list, which sets a relatively low dispersion threshold (a Juilland dispersion of 0.3, compared with 0.6 in Dang et al., Reference Dang, Coxhead and Webb2017, and 0.8 in Gardner & Davies, Reference Gardner and Davies2014). This may mean that at least some lemmas in the COCA list were not evenly distributed across the corpus. Moreover, the fact that we used a frequency list based on American English makes our measures biased toward that dialect. We hope that more lemma-based general-language frequency lists will become available soon to allow further investigations of vocabulary knowledge.
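For reference, Juilland’s dispersion measure is commonly computed as follows (the symbols here are ours, and implementations may differ in how corpus sections are defined):

D = 1 - \dfrac{CV}{\sqrt{n - 1}}, \qquad CV = \dfrac{\sigma}{\mu}

where the corpus is divided into n equal-sized sections, \mu and \sigma are the mean and standard deviation of a lemma’s frequencies across those sections, and CV is thus the coefficient of variation. D approaches 1 for lemmas spread evenly across the corpus and 0 for lemmas concentrated in a few sections, so a 0.3 threshold admits considerably more unevenly distributed lemmas than the 0.6 or 0.8 thresholds cited previously.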
Another limitation of the study is related to the sampling rate of the CPWT. After multiple rounds of screening the items in the CPWT, fewer than 20 words were chosen to represent each frequency level, whereas Schmitt et al. (Reference Schmitt, Schmitt and Clapham2001) suggest a sampling rate of 30 per 1,000 words to reach acceptable reliability. When designing productive measures of word knowledge, future research should include more items per frequency level. This can be done through rigorous piloting and validation processes. Moreover, the fact that only the first letter was provided as a clue to the target word in the CPWT (cf. Laufer & Nation’s, Reference Laufer and Nation1999, PVLT) resulted in NSs producing orthographically similar synonyms in place of target words (e.g., extending and enlarging in place of the target expanding). We hope that further validation studies in the area of productive vocabulary testing can identify the best way to elicit the required response in a controlled gap-fill format.
A fourth limitation of the study is that we included only three frequency levels (1K, 2K, 3K). This may have resulted in a ceiling effect that prevented differences from emerging between higher-level NNSs and NSs. To overcome this limitation, lower frequency levels should be included in future research to differentiate learners at higher proficiency levels. Additionally, L1 congruency has been established as an important determinant of collocation knowledge (e.g., Nesselhauf, Reference Nesselhauf2003; see “Background” section). Therefore, it would be useful to include congruency as a variable in the categorization of appropriate collocation responses. This would allow for exploring the potential effects of several factors on the production of congruent/incongruent items.
Conclusion
The present study is, to our knowledge, the first attempt to explore the interrelationship between productive word knowledge and productive knowledge of collocations. The results suggest that productive word knowledge is associated with the appropriacy, frequency, and strength of elicited collocations. We hope this study will open the door for more research into productive knowledge of single words and collocations to better understand factors that affect vocabulary development.
Supplementary Materials
To view supplementary material for this article, please visit http://doi.org/10.1017/S0272263122000341.
Acknowledgments
We would like to thank Prince Sultan University for funding this research project under Grant [Applied Linguistics Research Lab- RL-CH-2019/9/1]. We would also like to thank Professor Norbert Schmitt for his useful comments on the initial design of the study. Thanks are also due to three anonymous reviewers for their very useful comments which greatly improved the article. Any shortcomings are entirely our own responsibility.