The development of oral communication in a second language (L2) is of critical importance to second language acquisition (SLA) research and L2 pedagogy. An essential component of oral communication skills is productive vocabulary use, which can affect learners’ everyday interactions and their academic achievement (Daller et al., Reference Daller, van Hout and Treffers-Daller2003). The study of productive vocabulary development and use has been an area of research interest in SLA more broadly and in learner corpus research more specifically over the past 25 years (Laufer & Nation, Reference Laufer and Nation1995; Meara & Bell, Reference Meara and Bell2001; Wolfe-Quintero et al., Reference Wolfe-Quintero and Kim1998). Characteristics of L2 vocabulary use have been found to correlate with how comprehensible and accented an L2 learner’s speech is perceived to be (Appel et al., Reference Appel, Trofimovich, Saito, Isaacs and Webb2019; Crowther et al., Reference Crowther, Trofimovich, Saito and Isaacs2018; Saito, Reference Saito2020; Saito et al., Reference Saito, Webb, Trofimovich and Isaacs2016) and have been associated with achievement on spoken assessment tasks (Berger et al., Reference Berger, Crossley and Kyle2019; Crossley et al., Reference Crossley, Clevinger and Kim2014; Eguchi & Kyle, Reference Eguchi and Kyle2020; Kyle et al., Reference Kyle, Crossley and McNamara2016).
Despite the clear importance of productive L2 vocabulary, there is less agreement on precisely how vocabulary use should be measured. Generally, indices of productive vocabulary use fall under the umbrella of lexical richness (Read, Reference Read2000; Yule, Reference Yule1944). Lexical richness refers to the breadth and depth of productive vocabulary knowledge (Read, Reference Read2000), which is most often measured using the constructs of lexical diversity and lexical sophistication (see Kyle, Reference Kyle and Webb2020). Lexical diversity is most commonly measured using an index of lexical variety such as D, MATTR, or MTLD (see Jarvis, Reference Jarvis, Jarvis and Daller2013a; Zenker & Kyle, Reference Zenker and Kyle2021). Lexical sophistication refers to the proportion of advanced words used in a language production task and is most commonly measured using an index of word frequency (Kyle & Crossley, Reference Kyle and Crossley2015; Laufer & Nation, Reference Laufer and Nation1995; Read, Reference Read2000). Several recent studies have emphasized the importance of taking a multivariate approach to the measurement of lexical sophistication (Eguchi & Kyle, Reference Eguchi and Kyle2020; Kim et al., Reference Kim, Crossley and Kyle2018; Kyle et al., Reference Kyle, Crossley and Berger2018). These studies have suggested, for example, that lexical sophistication is most accurately indexed when features beyond single-word frequency are considered, including psycholinguistic properties of words (such as concreteness; Crossley & Skalicky, Reference Crossley and Skalicky2019; Guo et al., Reference Guo, Crossley and McNamara2013) and the strength of association between word combinations (Bestgen & Granger, Reference Bestgen and Granger2014; Durrant & Schmitt, Reference Durrant and Schmitt2009; Garner et al., Reference Garner, Crossley and Kyle2019; Granger & Bestgen, Reference Granger and Bestgen2014).
The majority of productive vocabulary use research has investigated data from cross-sectional corpora, with a relatively small number of studies investigating oral development longitudinally (Berger et al., Reference Berger, Crossley and Kyle2019; Crossley et al., Reference Crossley, Salsbury and McNamara2010; Crossley et al., Reference Crossley, Salsbury, McNamara and Jarvis2011a, Crossley et al., Reference Crossley, Skalicky, Kyle and Monteiro2019; Tavakoli, Reference Tavakoli2018). Longitudinal research on the same individuals is needed to inform developmental research and support SLA theories of vocabulary acquisition (Hasko, Reference Hasko2013; Meunier, Reference Meunier, Granger, Gilquin and Meunier2015; Ortega & Byrnes, Reference Ortega and Byrnes2009). In addition, most productive vocabulary research has focused on L2 English, likely because of the availability of both English learner corpora and tools for the automatic analysis of English texts. There has not been enough research to determine the degree to which these findings extend to other L2s or even if different L1–L2 pairs may yield different results. For instance, Spanish has a rich verb morphology system that might affect lexical development in different ways than the English verb system does, especially when considering verb inflections (Montrul, Reference Montrul2004; Schnur & Rubio, Reference Schnur and Rubio2021). In more practical terms, Spanish is the second most common native language in the world, with more than 496 million speakers according to the Instituto Cervantes annual report (Fernández Vítores, Reference Fernández Vítores2022). Spanish is also the most common L2 studied in the United States across all educational levels and is also commonly studied worldwide. 
It is therefore critical to investigate and better understand how multiple features of L2 Spanish may develop in both university and study abroad (SA) contexts, as this can have not only theoretical implications but also practical implications for pedagogy and for the design of SA programs.
A number of studies have investigated the characteristics of productive vocabulary use in L2 Spanish by looking at a single feature of lexical richness (Asencion-Delaney & Collentine, 2011; Berton, 2020; Castañeda-Jimenez & Jarvis, 2014; McManus et al., 2021; Schnur & Rubio, 2021; Tracy-Ventura, 2017). These studies contribute to the new field of L2 Spanish vocabulary development and learner corpus research, but more research is needed to make wider generalizations (Lozano, 2015; Mendikoetxea, 2013). The current study builds on previous research by investigating productive lexical development in L2 Spanish with regard to various features of lexical variation and sophistication. It uses spontaneous oral data from the longitudinal Languages and Social Networks Abroad Project corpus (LANGSNAP; Mitchell et al., 2017), which follows 27 university learners over 21 months.
Defining lexical richness
The term lexical richness was initially used in literary stylometric studies to refer to the size of a particular author’s vocabulary (Yule, Reference Yule1944). Although Yule’s use of lexical richness referred specifically to a particular calculation of lexical diversity (Yule’s K), researchers in applied linguistics have used the term more broadly. Read (Reference Read2000), for example, explains that lexical richness refers to the breadth and depth of lexical knowledge that is demonstrated in productive language use. Read further outlined three subconstructs of lexical richness—namely lexical density, lexical diversity, and lexical sophistication.
Lexical density refers to the proportion of content words in a text. Although it was hypothesized that more proficient users of a language (who have wider and deeper productive vocabulary knowledge) will produce more informationally dense texts, empirical evidence has suggested that density is more closely related to register (less interactive texts tend to be more lexically dense) than proficiency (Engber, Reference Engber1995; Lu, Reference Lu2012; O’Loughlin, Reference O’Loughlin1995). Accordingly, lexical density indices are rarely used as measures of lexical richness.
Lexical diversity typically refers to the variety of words used in a text (Engber, Reference Engber1995; Jarvis, Reference Jarvis2013b; Kyle et al., Reference Kyle, Crossley and Jarvis2021) and is a productive measure of lexical breadth. As language learners become more proficient, we presume that their productive vocabulary will grow. We also presume that individuals with a larger productive vocabulary will use a wider variety of lexical items to complete a particular language task. Accordingly, we presume that more proficient language users will produce texts that are more lexically diverse than less proficient users (when the language task is kept consistent). An important confound with many indices of lexical diversity is that they conflate text length and lexical variety (Koizumi & In’nami, Reference Koizumi and In’nami2012; McCarthy & Jarvis, Reference McCarthy and Jarvis2010; Zenker & Kyle, Reference Zenker and Kyle2021). In order to estimate lexical breadth, it is therefore important to use indices of lexical diversity that are stable across different text lengths such as moving average TTR (MATTR; Covington & McFall, Reference Covington and McFall2010).
The third subconstruct of lexical richness is lexical sophistication. Lexical sophistication has been conceptualized from two related perspectives. The first perspective focuses on the learnability of a particular word and highlights lexical breadth. For example, words that are more frequent in an individual’s language experience are (with some caveats) easier to learn (and use) than words that are less frequent (Ellis, 2002). We therefore presume that more proficient language learners will know (and use) a higher proportion of less frequent words (see Laufer & Nation, 1995; Read, 2000). As lexical sophistication research has matured, other features of word learnability such as concreteness (Brysbaert et al., 2014; Paivio, 1971) have been used to complement frequency indices. The second perspective focuses on reader and/or listener perceptions of lexical proficiency. Although perceptions of lexical proficiency are affected by word learnability features (and therefore vocabulary breadth), they are also affected by features of vocabulary depth such as the use of vocabulary items in the appropriate lexicogrammatical contexts and registers (Garner et al., 2019; Kim et al., 2018; Kyle et al., 2018; Nation, 2001). From this second perspective, indices related to both individual word use (e.g., frequency and concreteness) and collocation use (e.g., n-gram association strength) are used to measure lexical sophistication.
In this paper we adopt the latter perspective, which is in line with a large body of lexical sophistication research published over the past decade (Crossley, Salsbury, et al., 2013; Eguchi & Kyle, 2020; Kyle et al., 2018; Kyle & Crossley, 2015).
Lexical richness in learner corpus research
Indices of lexical richness have been found to correlate with measures of L2 lexical proficiency (Berger et al., 2019; Crossley, Salsbury, et al., 2013; Kyle et al., 2018) and holistic scores of speaking proficiency (Crossley et al., 2014; Eguchi & Kyle, 2020; Kyle & Crossley, 2015) as well as with judgements of communicative competence, such as fluency, comprehensibility, or accentedness (Appel et al., 2019; Saito, 2020; Saito & Akiyama, 2017; Tavakoli & Uchihara, 2020; Uchihara & Saito, 2019). Empirical studies have demonstrated that both lexical diversity and lexical sophistication are important predictors of speaking and writing proficiency (Crossley & McNamara, 2013; Guo et al., 2013; Kyle & Crossley, 2015), but the majority of research to date has used written corpora. Findings of L2 writing studies do not always transfer directly to studies involving L2 speaking. For example, the assumption that learners will use less frequent words as their language develops seems to be accepted in L2 writing research (Crossley et al., 2011b; Kyle et al., 2018; Laufer & Nation, 1995).
In contrast, studies examining L2 spoken corpora present mixed results (Bardel et al., Reference Bardel, Gudmundson and Lindqvist2012; Crossley et al., Reference Crossley, Salsbury and McNamara2010; Crossley et al., Reference Crossley, Salsbury, McNamara and Jarvis2011a; Crossley et al., Reference Crossley, Salsbury and Mcnamara2015; Kyle & Crossley, Reference Kyle and Crossley2015; Lindqvist et al., Reference Lindqvist, Gudmundson, Bardel, Bardel, Lindqvist and Laufer2013). Spoken language differs from written language in that it generally involves less planning and lack of editing, especially in interpersonal communication. Linguistic features of spoken and written language are influenced not only by mode but also by register (Biber, Reference Biber1988; Biber & Conrad, Reference Biber and Conrad2019; Kyle et al., Reference Kyle, Eguchi, Choe and LaFlair2022). Interpersonal registers vary in their situational context (e.g., everyday conversation versus office hours), which is also characterized by different linguistic features (Biber & Conrad, Reference Biber and Conrad2019). Less formal registers tend to be characterized by less sophisticated lexical items (though these items may still be diverse; Biber et al., Reference Biber, Conrad, Reppen, Byrd, Helt, Clark, Cortes, Csomay and Urzua2004; Kyle et al., Reference Kyle, Eguchi, Choe and LaFlair2022). Consequently, research investigating spontaneous spoken data has found that more proficient learners tend to produce more frequent words (Crossley et al., Reference Crossley, Salsbury, McNamara and Jarvis2011a; Crossley et al., Reference Crossley, Skalicky, Kyle and Monteiro2019; Eguchi & Kyle, Reference Eguchi and Kyle2020; Kyle & Crossley, Reference Kyle and Crossley2015). However, relatively few studies have examined this phenomenon longitudinally, and even fewer studies have done so in L2 Spanish.
In the SA context, a few studies have investigated development of lexical diversity (McManus et al., 2021; Tavakoli, 2018) and/or lexical sophistication (Tavakoli, 2018; Tracy-Ventura, 2017; Zaytseva et al., 2021). The mixed results suggest a complex interplay between time spent abroad, task, and mode. For example, Tavakoli (2018) found that, after a 1-month stay abroad, English learners improved their oral lexical diversity (as measured by D and MTLD) in a dialogue task, but no significant changes were observed in the monologic task. In another study using monologic tasks, Leonard and Shea (2017) found no significant increase in the development of Spanish lexical diversity (as measured by D) after 3 months abroad. However, McManus et al. (2021) found lexical diversity scores (as measured by D) to increase significantly after 9 months abroad using an oral narrative task. A few studies have also used spontaneous spoken tasks. Mora and Valls-Ferrer (2012) used oral interviews in their 15-month longitudinal study, finding that a 3-month SA period resulted in a significant increase in lexical diversity scores (as measured by Guiraud’s index). Similarly, Serrano et al. (2012) used oral interviews and found a significant increase in spoken lexical diversity (Guiraud’s index) after the first 3 months abroad, but improvements in written lexical diversity scores were not significant until after 8 months abroad.
In contrast, two longitudinal studies found SA to be more beneficial for the development of written than oral lexical diversity as measured by Guiraud’s index (Pérez-Vidal et al., Reference Pérez-Vidal, Juan-Garau, Mora, Valls-Ferrer and Muñoz2012; Zaytseva et al., Reference Zaytseva, Miralpeix and Pérez-Vidal2021), with formal instruction having a greater impact on oral lexical diversity than SA. However, the studies that used Guiraud’s index should be interpreted with caution, given the well-documented intrinsic relationship between Guiraud’s index and text length (e.g., McCarthy & Jarvis, Reference McCarthy and Jarvis2010; Koizumi & In’nami, Reference Koizumi and In’nami2012; Zenker & Kyle, Reference Zenker and Kyle2021). Fewer studies have examined lexical sophistication during SA using frequency band-based indices (Tracy-Ventura, Reference Tracy-Ventura2017; Zaytseva et al., Reference Zaytseva, Miralpeix and Pérez-Vidal2021). Despite the important contributions these studies make to our understanding of longitudinal vocabulary development and the field of SA, more research is still needed in this area.
Multivariate approach to lexical richness
Lexical diversity has been widely studied in L2 research (Engber, 1995; Jarvis, 2013a, 2013b) and is calculated by considering the number of types (different words) and the number of tokens (total number of words) in a text. Because of an intrinsic link between text length and simple measures of diversity (such as the type-token ratio or Guiraud’s [1960] index), measures such as moving average TTR (MATTR; Covington & McFall, 2010) and the measure of textual lexical diversity (MTLD; McCarthy & Jarvis, 2010) are increasingly used (see Koizumi & In’nami, 2012; Vidal & Jarvis, 2020; Zenker & Kyle, 2021). Although a consistent relationship between lexical diversity and lexical development has been found, lexical diversity indices only account for the use of different words, not how sophisticated the words themselves are. For example, the following Spanish sentences would get a similar diversity score: el gato come la comida (the cat eats the food) and el mamífero devora un manjar (the mammal devours a delicacy), yet one could argue that the latter uses more advanced and sophisticated vocabulary. Combining lexical diversity and lexical sophistication indices to examine lexical use in a multivariate manner is therefore needed to provide a broader understanding of vocabulary development (Jarvis, 2017; Kyle, 2020).
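To make the contrast between a simple type-token ratio and a moving-average index concrete, the following is a minimal sketch and not the implementation used by any of the tools cited here. The function names, the whitespace tokenization, the default window of 50 tokens, and the fallback to plain TTR for texts shorter than the window are all illustrative assumptions.

```python
def type_token_ratio(tokens):
    """Number of distinct word forms divided by the total number of tokens."""
    if not tokens:
        raise ValueError("tokens must not be empty")
    return len(set(tokens)) / len(tokens)


def mattr(tokens, window=50):
    """Moving-average TTR: the mean TTR over every window of `window` tokens.

    Assumption for this sketch: a text shorter than the window falls back
    to the plain TTR of the whole text.
    """
    if window < 1:
        raise ValueError("window must be at least 1")
    if len(tokens) < window:
        return type_token_ratio(tokens)
    ratios = [
        type_token_ratio(tokens[i:i + window])
        for i in range(len(tokens) - window + 1)
    ]
    return sum(ratios) / len(ratios)


# The two example sentences from the text, split on whitespace.
simple = "el gato come la comida".split()
advanced = "el mamífero devora un manjar".split()

# Each sentence has five tokens and five types, so both score 1.0.
print(type_token_ratio(simple), type_token_ratio(advanced))
```

Because every window has the same length, the averaged ratio does not shrink simply because a learner produced a longer text. As the identical scores for the two sentences show, however, a diversity index alone says nothing about how sophisticated the words are.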
Measures of lexical sophistication often make use of a reference corpus and include a variety of indices related to word frequency, range, and collocation. As learners become more proficient, they tend to use less frequent words (at least in written productions; Crossley et al., Reference Crossley, Salsbury, McNamara and Jarvis2011b). An early approach to calculating frequency was the Lexical Frequency Profile (LFP; Laufer & Nation, Reference Laufer and Nation1995). An LFP is calculated by grouping word families of a reference corpus and dividing them into frequency bands. The percentage of words in a learner text that occur in each band is then calculated. Laufer and Nation found that more proficient writers tend to use more low-frequency words and more words from the university word list than novice writers, who tend to use high-frequency words found in the 1,000 and 2,000 frequency bands. More recent investigations of Spanish as an L2 have also indicated that more advanced writers tend to produce more low-frequency words (Berton, Reference Berton2020; Schnur & Rubio, Reference Schnur and Rubio2021). A recent longitudinal study found that using more low-frequency words can be a predictive feature of development when written and spoken texts are combined (Tracy-Ventura, Reference Tracy-Ventura2017). An alternative method for calculating frequency scores is to use the mean frequency of words in a text. Mean frequency is calculated by identifying the precise frequency of each word in a learner text (based on a reference corpus) and then calculating the average frequency score. Mean frequency indices have been found to be reasonably strong predictors of L2 proficiency (Crossley, Cobb, & McNamara, Reference Crossley, Cobb and McNamara2013, p. 967). 
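The two frequency approaches described above can be sketched as follows. This is an illustration under stated assumptions: the reference counts in TOY_FREQ are invented for demonstration, bands are formed over individual word forms rather than word families (a simplification of the LFP), the band size of 2 is chosen only so that the toy list spans several bands, and raw counts are averaged without any transformation.

```python
# Hypothetical reference-corpus counts, invented for illustration only.
TOY_FREQ = {"el": 1000, "la": 900, "gato": 50, "come": 40, "comida": 30, "manjar": 2}


def band_profile(tokens, ranked_words, band_size=1000):
    """Proportion of tokens falling in each frequency band.

    `ranked_words` lists reference-corpus words from most to least frequent;
    band 1 holds the first `band_size` words, band 2 the next, and so on.
    Tokens missing from the list are counted as "off-list".
    """
    rank = {word: i for i, word in enumerate(ranked_words)}
    counts = {}
    for token in tokens:
        band = rank[token] // band_size + 1 if token in rank else "off-list"
        counts[band] = counts.get(band, 0) + 1
    return {band: count / len(tokens) for band, count in counts.items()}


def mean_frequency(tokens, freq):
    """Average reference-corpus frequency of the tokens found in `freq`."""
    known = [freq[token] for token in tokens if token in freq]
    if not known:
        raise ValueError("no token was found in the reference list")
    return sum(known) / len(known)


ranked = sorted(TOY_FREQ, key=TOY_FREQ.get, reverse=True)
tokens = "el gato come la comida".split()

print(band_profile(tokens, ranked, band_size=2))
print(mean_frequency(tokens, TOY_FREQ))
```

The band profile yields one proportion per band, whereas the mean-frequency index collapses the text into a single score. How off-list words are handled is a design decision that this sketch makes arbitrarily.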
Tools such as Coh-Metrix (Graesser et al., 2004) and TAALES 2.0 (Kyle et al., 2018) calculate mean-frequency scores for English texts but not for other languages. Research in L2 English has found that the use of lower frequency words positively correlates with proficiency levels and holistic scores, especially in written registers (Crossley et al., 2011b; Kim et al., 2018; Laufer & Nation, 1995).
However, although we presume that learners gain access to a wider range of less frequent words as they become more proficient users of a language, this does not necessarily mean that they use (or should use) less frequent words in all contexts. A number of studies have found a positive relationship between speaking proficiency and word frequency across a range of relatively informal speaking task types (Berger et al., 2019; Crossley et al., 2011a; Crossley et al., 2014; Eguchi & Kyle, 2020). For example, Kyle and Crossley (2015) found a small, positive relationship between frequency and proficiency scores on an independent TOEFL speaking task, which asks test takers to provide their opinion on an everyday topic. Berger et al. (2019) found similar results using a corpus of L2 conversations rated for lexical proficiency. In a recent study that investigated the relationship between lexical sophistication and oral proficiency interview scores (Eguchi & Kyle, 2020), a strong, positive relationship was found between the “common word” factor (which included several frequency indices) and holistic oral proficiency interview scores. These findings suggest that for some spoken registers, advanced oral proficiency may be characterized by the use of comprehensible higher frequency words. Kyle et al. (2016), for example, found that opinion-based TOEFL iBT independent speaking-task responses included more frequent words than integrated speaking-task responses and that less formal integrated tasks (i.e., campus situation) included more frequent words than more formal integrated tasks that required the synthesis of technical academic information.
The register of the task (i.e., campus situation versus academic) affected word frequency. Task type has also been shown to affect lexical features that correlate with holistic judgements of comprehensibility and accentedness (Appel et al., 2019; Crowther et al., 2018) and written lexical sophistication in L2 Spanish texts (Schnur & Rubio, 2021). Thus, register effects should be accounted for in the investigation of lexical sophistication.
It should be noted, however, that not all studies that involve informal speaking tasks have found positive relationships between frequency and proficiency (Bardel et al., Reference Bardel, Gudmundson and Lindqvist2012; Lindqvist et al., Reference Lindqvist, Bardel and Gudmundson2011). Using the same small sample of interview data from L2 learners of French (n = 14) and Italian (n = 20) but different methods of differentiating between basic and advanced vocabulary, Lindqvist et al. (Reference Lindqvist, Bardel and Gudmundson2011) and Bardel et al. (Reference Bardel, Gudmundson and Lindqvist2012) found that “advanced high” learners used a lower proportion of frequent words than “advanced low” learners in each language. Clearly, more research is needed to determine the factors that affect the production of high and low frequency words such as mode, register, L2, and the proficiency levels that are under investigation.
N-grams and strength of association
Collocation use is also an important indicator of proficient word use that taps into one aspect of vocabulary depth (Gries, 2013; Nation, 2001; Paquot, 2019; Sinclair, 1991). Research analyzing lexical sophistication from a multivariate approach has found corpus-based measures of n-gram frequency and strength of association (SOA) to be strong predictors of language development and proficiency (Crossley et al., 2015; Eguchi & Kyle, 2020; Gablasova et al., 2017; Garner et al., 2019; Kyle et al., 2018). N-grams are multiword sequences of n words (e.g., en el, soy un), and SOA measures the conditional probability that two words in an n-gram will occur together, based on a reference corpus. Common SOA measures include mutual information (MI), which tends to highlight highly exclusive collocations, and T score, which tends to highlight collocations between frequent words.
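As an illustration of the two SOA measures named above, the sketch below uses the textbook bigram formulas for MI and T score, both of which compare the observed co-occurrence count with the count expected if the two words were independent. The function names and the toy counts are assumptions made for this example and are not taken from the tools or reference corpora cited in this paper.

```python
import math


def expected_count(freq_w1, freq_w2, corpus_size):
    """Co-occurrence count expected if the two words were independent."""
    return freq_w1 * freq_w2 / corpus_size


def mutual_information(observed, freq_w1, freq_w2, corpus_size):
    """MI = log2(observed / expected); rewards exclusive pairings."""
    if observed <= 0:
        raise ValueError("observed count must be positive")
    return math.log2(observed / expected_count(freq_w1, freq_w2, corpus_size))


def t_score(observed, freq_w1, freq_w2, corpus_size):
    """T = (observed - expected) / sqrt(observed); rewards frequent pairings."""
    if observed <= 0:
        raise ValueError("observed count must be positive")
    expected = expected_count(freq_w1, freq_w2, corpus_size)
    return (observed - expected) / math.sqrt(observed)


# Toy counts (hypothetical): a bigram seen 8 times in a 1,024-token corpus,
# with component words seen 16 and 32 times, giving an expected count of 0.5.
print(mutual_information(8, 16, 32, 1024))
print(t_score(8, 16, 32, 1024))
```

Because MI is a ratio, a rare pair of words that almost always occur together can score very highly, whereas the T score grows with the absolute number of co-occurrences. This is consistent with the tendencies of the two measures noted above.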
Strength of association and n-gram frequency are generally indicators of both L2 spoken and written proficiency. N-gram indices positively correlate with holistic scores of writing (Gablasova et al., 2017; Garner et al., 2019, 2020; Granger & Bestgen, 2014) and predict longitudinal development trajectories related to writing proficiency (Bestgen & Granger, 2014; Paquot, 2019). These measures also account for a large percentage of the variance in holistic scores of lexical proficiency in writing (Garner et al., 2020; Granger & Bestgen, 2014; Kim et al., 2018; Kyle et al., 2018), oral lexical proficiency (Eguchi & Kyle, 2020; Kyle & Crossley, 2015), and rater judgements of comprehensibility (Saito, 2020). Eguchi and Kyle (2020), for example, found that advanced oral proficiency interview score samples were characterized by more strongly associated n-grams (measured using both MI and T scores) and n-grams used in wider contexts. These studies suggest that appropriate collocation use is an important predictor of spoken and written L2 proficiency (at least in English). However, related research on collocational use in Spanish is scarce (Vincze et al., 2016), and more research is needed to determine the degree to which these relationships are stable across L2s.
Psycholinguistic word information
In addition to frequency and n-gram measures, psycholinguistic word information indices are an important factor when modeling L2 development via lexical sophistication. These word norms are based on behavioral studies (Brysbaert et al., Reference Brysbaert, Warriner and Kuperman2014; Stadthagen-Gonzalez et al., Reference Stadthagen-Gonzalez, Imbault, Pérez Sánchez and Brysbaert2017) and are related to a word’s saliency (Crossley et al., Reference Crossley, Kyle and Salsbury2016; Crossley & Skalicky, Reference Crossley and Skalicky2019; Salsbury et al., Reference Salsbury, Crossley and McNamara2011), which in turn affects the difficulty of learning and using a word (see Ellis, Reference Ellis2002). Therefore, psycholinguistic word information indices are a measure of vocabulary depth. Psycholinguistic word information includes indices such as concreteness (how concrete or abstract a word is), familiarity (how often that word is encountered), and imageability (how easy it is to create a mental image of a word), among others.
Psycholinguistic properties of word knowledge have contributed to the variance explaining lexical proficiency and holistic scores in both spoken and written assessment contexts (Crossley et al., Reference Crossley, Kyle and Salsbury2016; Crossley et al., Reference Crossley, Salsbury, McNamara and Jarvis2011a; Eguchi & Kyle, Reference Eguchi and Kyle2020; Kyle et al., Reference Kyle, Crossley and Berger2018). Longitudinal studies of L2 speech samples have indicated that learners use words that are less concrete, less meaningful, and less imageable as a function of time (Crossley & Skalicky, Reference Crossley and Skalicky2019; Salsbury et al., Reference Salsbury, Crossley and McNamara2011). Cross-sectional studies have found a similar relationship between learner proficiency and the use of less salient words (e.g., words that are less concrete) in L2 English (Crossley et al., Reference Crossley, Salsbury, McNamara and Jarvis2011a; Eguchi & Kyle, Reference Eguchi and Kyle2020; Kyle & Crossley, Reference Kyle and Crossley2015). To our knowledge, however, there has been no empirical research on how these norms can be used to index lexical development of L2 Spanish.
Studies of lexical richness of L2 English have found several measures to be relatively stable across written and spoken corpora (e.g., lexical diversity, concreteness) but not all (e.g., word frequency). Most studies of productive lexical use have been cross-sectional in nature. These studies provide an account of the lexical characteristics of learner produced texts at various benchmark levels, but they do not necessarily indicate how lexical use develops over time. Although the number of published longitudinal studies has been increasing (Berger et al., Reference Berger, Crossley and Kyle2019; Crossley et al., Reference Crossley, Skalicky, Kyle and Monteiro2019; Crossley & Skalicky, Reference Crossley and Skalicky2019), more research is needed (particularly in languages other than English) to understand the ways in which lexical use develops. In particular, more studies that investigate the development of productive lexical use from a multivariate perspective are needed (Eguchi & Kyle, Reference Eguchi and Kyle2020; Kim et al., Reference Kim, Crossley and Kyle2018; Kyle et al., Reference Kyle, Crossley and Berger2018). To date, only a small number of studies have examined lexical richness in Spanish (Asencion-Delaney & Collentine, Reference Asencion-Delaney and Collentine2011; Berton, Reference Berton2020; Castañeda-Jimenez & Jarvis, Reference Castañeda-Jimenez, Jarvis and Geeslin2014; Vincze et al., Reference Vincze, García-Salido, Orol, Alonso-Ramos and Alonso-Ramos2016), and only two have used longitudinal designs (McManus et al., Reference McManus, Mitchell and Tracy-Ventura2021; Tracy-Ventura, Reference Tracy-Ventura2017). As part of a larger study investigating complexity, accuracy, and fluency (CAF) measures, McManus et al. (Reference McManus, Mitchell and Tracy-Ventura2021) found that spoken lexical diversity scores as measured by D (Malvern & Richards, Reference Malvern and Richards2002) increased over three 1-year collection points. 
Using a frequency-band approach, Tracy-Ventura (Reference Tracy-Ventura2017) found that participants used significantly more low-frequency words in the 3k–5k bands after studying abroad for 9 months. These studies have provided an excellent starting point for research into the development of L2 Spanish productive lexical use. However, more research is needed to understand how L2 Spanish develops with respect to lexical diversity, frequency, saliency, and collocation use.
Current study
The present study adds to previous findings of lexical richness in Spanish learner corpus research and in longitudinal development of spoken language by investigating several lexical and collocational features of language use in L2 Spanish learners over a 21-month period.
Multiple indices of lexical sophistication commonly used in previous learner corpus studies were calculated to allow for comparisons between longitudinal research and oral data in languages other than English. This study is guided by the following research questions:
(1) How do features of lexical richness develop over time in L2 Spanish?
(2) To what extent are indices of lexical richness in L2 Spanish collinear?
Method
Learner corpus
The learner corpus used for this study was a subset of the Spanish oral data from the longitudinal learner corpus LANGSNAP [Footnote 1] (Mitchell et al., Reference Mitchell, Tracy-Ventura and McManus2017; Tracy-Ventura et al., Reference Tracy-Ventura, Mitchell, McManus and Alonso-Ramos2016). The LANGSNAP corpus includes written and oral data from 27 L2 Spanish learners who spent 9 months abroad. The data were collected at six points over a 21-month period: before departure, three visits during their stay, and two post-SA collection points. At each collection point, each learner completed a written argumentative task, a picture-based oral narrative task, and a semistructured interview, each of which was designed to elicit rich interactive language. In total, the corpus includes 486 texts (303,920 words). There were three prompts for the written argumentative essay and the picture-narrative task, each administered approximately a year apart. Preliminary analyses indicated that there were strong task and prompt effects in the written and oral narrative data, reflecting previous research (Biber & Gray, Reference Biber and Gray2013; Kyle et al., Reference Kyle, Crossley and McNamara2016). Therefore, we analyzed only the oral interviews in this study.
The semistructured oral interviews consisted of preestablished questions related to students’ opinions and experiences about their lives abroad, their host family, or language learning. As described in Mitchell et al. (Reference Mitchell, Tracy-Ventura and McManus2017), the interviews were designed to elicit a variety of forms and vocabulary from interactive and spontaneous L2 speech samples that could be relevant for the analysis of complexity, accuracy, lexicon, and fluency measures. For example, even though at Visits 2 and 3 the preestablished topic is about students’ immediate experiences, participants are also asked to reflect on future plans and what they would miss when returning home. Each of the 27 participants produced one spoken text at each collection point, with a total of 162 texts and 254,828 words. The interviews were conducted by a member of the research team and lasted an average of 15 min. The LANGSNAP website provides files with the responses transcribed by the research team. The overview of the corpus and starting topics are shown in Table 1.
The 27 learners were language majors at a university in the United Kingdom where students are required to study abroad during the third year of their undergraduate degree. While abroad, students were exchange students (n = 9), teaching assistants (n = 16), or work interns (n = 2). There were more females (n = 20) than males (n = 7), and most students had English as their L1 (n = 25), except two L1 Polish speakers. Two thirds of the learners (n = 18) spent their year abroad in Spain and the rest (n = 9) in Mexico. Participants’ ages at the time of collection varied from 20 to 25, and their mean length of Spanish study before data collection began was 5.5 years (for more information on the participants, design, and collection of LANGSNAP, see Mitchell et al., Reference Mitchell, Tracy-Ventura and McManus2017; Tracy-Ventura et al., Reference Tracy-Ventura, Mitchell, McManus and Alonso-Ramos2016). Participants’ overall proficiency was measured three times (before departure, after 5 months abroad, and after returning) using the Spanish Elicited Imitation Test (EIT; Ortega, Reference Ortega2000). Participants’ proficiency was at an intermediate level at the beginning of the study. The results of a repeated measures analysis of variance showed a significant effect for time and large effect sizes between times (Mitchell et al., Reference Mitchell, Tracy-Ventura and McManus2017), indicating that participants’ overall language proficiency improved while abroad and continued to improve after their return.
Indices of lexical richness
Indices of lexical richness were calculated using a freely available, newly developed tool, TAALES_ES. The tool processes texts using the es_core_news_sm (version 2.1.0) model and spaCy (version 2.1.8). For the analyses in this paper, all words were lemmatized and homographs were distinguished by parts of speech (e.g., noun, verb, etc.). The scripts are freely available at https://github.com/LCR-ADS-Lab/TAALES_ES. Research in Spanish L2 acquisition has pointed out that appropriate verb inflection and the use of the subjunctive mood may signal development for L1 English speakers (Asencion-Delaney & Collentine, Reference Asencion-Delaney and Collentine2011; Collentine, Reference Collentine2010; Montrul, Reference Montrul2004; Schnur & Rubio, Reference Schnur and Rubio2021). Given the rich verb morphology system in Spanish and previous research findings, verb lemmas were distinguished by tense and mood (but not person). This allowed (for example) differentiating between present indicative conjugations of a verb (e.g., comes [you eat] and comemos [we eat] are represented as comer_VERB_Ind_Pres) and subjunctive conjugations (e.g., comieras [were you to eat] and comiera [were he/I/she to eat] are represented as comer_VERB_Sub_Imp). spaCy reports high accuracy for the features used in these analyses (Explosion AI, 2022), including coarse-grained parts of speech (Noun, Verb, Adjective, etc.; F1 = .982), tense (F1 = .973), and mood (F1 = .965). All corpus-based indices were calculated using a 450-million word subset of the Spanish version of the Corpus of the Web (ESCOW14; Schäfer, Reference Schäfer, Bański, Biber, Breiteneder, Kupietz, Lüngen and Witt2015; Schäfer & Bildhauer, Reference Schäfer, Bildhauer, Choukri, Declerck, Doğan, Maegaard, Mariani and Moreno2012). The following indices were used to measure lexical richness:
Moving average type token ratio (MATTR)
MATTR is an index of lexical diversity that has been shown to be independent of text length (Covington & McFall, Reference Covington and McFall2010; Zenker & Kyle, Reference Zenker and Kyle2021). In this study, MATTR is calculated using a moving 50-word window. First, a type token ratio (TTR) is calculated for words 1–50 in a text, followed by words 2–51, 3–52, and so on until the end of the text is reached. Final MATTR scores are calculated by averaging the TTR scores for all 50-word windows. It is expected that higher proficiency language users will produce more lexically diverse texts given a particular language production task, which is indexed by higher MATTR scores.
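The windowing procedure described above can be illustrated with a short Python sketch. The function name `mattr` and the fallback for texts shorter than the window (a plain TTR) are assumptions made for illustration; they are not taken from TAALES_ES.

```python
from typing import Sequence


def mattr(tokens: Sequence[str], window: int = 50) -> float:
    """Average the type-token ratio over every full moving window."""
    if window <= 0:
        raise ValueError("window must be positive")
    n = len(tokens)
    if n == 0:
        raise ValueError("tokens must not be empty")
    # Assumption: a text shorter than the window is scored with plain TTR.
    if n < window:
        return len(set(tokens)) / n
    # One TTR per window: tokens 1..w, 2..w+1, and so on to the end.
    ttrs = [
        len(set(tokens[i:i + window])) / window
        for i in range(n - window + 1)
    ]
    return sum(ttrs) / len(ttrs)
```

In this study the tokens would be the lemmatized forms, so `mattr(lemmas, window=50)` would mirror the 50-word setting reported above.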
Content word frequency
Content word frequency scores are calculated using all adjectives, adverbs, nouns, and verbs in a text. Mean content word frequency scores are calculated based on the average ESCOW frequency score for content words in a learner text. Traditionally, lower frequency content words (e.g., chirrido [squeak], egregio [egregious], innumerables [countless]) have been considered more difficult and/or less likely to be known by a language learner than more frequent content words (e.g., hermano [brother], bosque [forest], verano [summer]), and they are therefore considered more sophisticated, particularly in written registers.
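As a rough illustration of the averaging step, the sketch below computes a mean frequency over content words only. The tag names follow Universal Dependencies conventions, the frequency values are invented toy numbers, and skipping items that are missing from the reference list is an assumption; the actual tool may handle these details differently (for example, by transforming raw frequencies).

```python
from typing import Dict, Iterable, Optional, Tuple

# Assumption: content words are identified by these coarse POS tags.
CONTENT_POS = {"ADJ", "ADV", "NOUN", "VERB"}

Item = Tuple[str, str]  # (lemma, coarse POS tag)


def mean_content_word_frequency(
    tagged: Iterable[Item], freq: Dict[Item, float]
) -> Optional[float]:
    """Mean reference-corpus frequency of the content words in a text."""
    scores = [
        freq[item]
        for item in tagged
        # Function words and items absent from the reference list are skipped.
        if item[1] in CONTENT_POS and item in freq
    ]
    return sum(scores) / len(scores) if scores else None


# Toy values for illustration only; these are not ESCOW frequencies.
toy_freq = {("hermano", "NOUN"): 100.0, ("chirrido", "NOUN"): 2.0}
toy_text = [("el", "DET"), ("hermano", "NOUN"), ("chirrido", "NOUN")]
toy_mean = mean_content_word_frequency(toy_text, toy_freq)
```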
Verb frequency
Verb frequency scores, which accounted for tense and mood, indicated the mean frequency for verbs based on the ESCOW corpus. Less frequent verbs, such as indicative past tense conjugations of desestimar (e.g., desestimar_VERB_Ind_Past [e.g., desestimé; I dismissed]), present subjunctive forms of comprometer (comprometer_VERB_Sub_Pres [e.g., comprometa; were I to compromise]), or future indicative forms of describir (describir_VERB_Ind_Fut [e.g., describiré; I will describe; describirán; they will describe]) are considered more sophisticated than verbs that are more frequent, such as the present indicative of the verb querer (e.g., querer_VERB_Ind_Pres [e.g., quiero, I want; queremos, we want]), the present indicative of decir (decir_VERB_Ind_Pres [e.g., dice; she/he says]), or past tense indicative forms of dar (dar_VERB_Ind_Past [e.g., di; I gave]). It is expected that more proficient learners will (on average) use less frequent verb forms. By measuring verb frequency in this way, we do not assume that learners who produce one verb form can produce all tenses. We approach verb inflection as a feature of L2 Spanish development, as less common verb forms may take longer to learn (Asención-Delaney & Collentine, Reference Asencion-Delaney and Collentine2011; Montrul, Reference Montrul2004) and may be predictive of proficiency (Schnur & Rubio, Reference Schnur and Rubio2021).
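The verb representations used above (e.g., comer_VERB_Ind_Pres) could be assembled from a lemma, a part-of-speech tag, and a set of morphological features along the following lines. This is a hypothetical sketch: the function `lemma_key` and the feature names (Mood, Tense, VerbForm, in the style of Universal Dependencies) are assumptions and do not reproduce the TAALES_ES source code.

```python
from typing import Mapping


def lemma_key(lemma: str, pos: str, feats: Mapping[str, str]) -> str:
    """Build a lookup key such as 'comer_VERB_Ind_Pres'."""
    parts = [lemma, pos]
    if pos in {"VERB", "AUX"}:
        if "Mood" in feats and "Tense" in feats:
            # Finite forms keep mood and tense; person and number are dropped.
            parts += [feats["Mood"], feats["Tense"]]
        elif "VerbForm" in feats:
            # Non-finite forms (e.g., infinitives) keep only the verb form.
            parts.append(feats["VerbForm"])
    return "_".join(parts)


# comes (2nd person singular) and comemos (1st person plural) share one key.
key_comes = lemma_key(
    "comer", "VERB",
    {"Mood": "Ind", "Tense": "Pres", "Person": "2", "Number": "Sing"},
)
key_comemos = lemma_key(
    "comer", "VERB",
    {"Mood": "Ind", "Tense": "Pres", "Person": "1", "Number": "Plur"},
)
```

Because person and number are ignored, a learner's use of any present indicative form of comer contributes to the same frequency entry, while subjunctive forms are counted separately.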
Bigram MI score
Bigram MI scores comprise the mean MI score for bigrams in a learner text. Word combinations that are more exclusive earn higher MI scores (e.g., caer_AUX_Inf-derrotar_ADJ [e.g., cayendo derrotado; falling defeated], platillo volante [flying saucer], inversamente proporcional [inversely proportional]), and less exclusive word combinations earn lower MI scores (e.g., solo_ADJ-saber_VERB_Ind_Pres [e.g., solo sé; I only know], cuando ellas [when they (female)], trabajar_VERB_Inf-a_ADP [e.g., trabajar a; to work to]). Previous L2 English research has indicated that more advanced L2 learners tend to use more strongly associated bigrams.
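For readers unfamiliar with the measure, the sketch below uses the standard corpus-linguistic formulation of mutual information, the base-2 logarithm of the ratio between the observed bigram frequency and the frequency expected if the two words were independent. Whether TAALES_ES applies exactly this variant is an assumption, and the counts in the example are invented.

```python
import math


def bigram_mi(bigram_freq: float, w1_freq: float, w2_freq: float,
              corpus_size: float) -> float:
    """MI = log2(observed / expected) for one bigram."""
    if min(bigram_freq, w1_freq, w2_freq, corpus_size) <= 0:
        raise ValueError("all counts must be positive")
    # Expected co-occurrence count if the two words were independent.
    expected = w1_freq * w2_freq / corpus_size
    return math.log2(bigram_freq / expected)


# Invented counts: the bigram occurs 32 times more often than chance predicts.
toy_mi = bigram_mi(bigram_freq=8, w1_freq=16, w2_freq=16, corpus_size=1024)
```

Because the expected count shrinks when either word is rare, bigrams made of low-frequency words that mostly occur together receive high MI scores, which is why the measure is described as rewarding exclusivity.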
Bigram T score
Bigram T scores comprise the mean T score for bigrams in a learner text. Frequently occurring word combinations tend to earn higher T scores (e.g., muy bueno [very good], una vez [one time], en mi [in my]), whereas less frequently occurring word combinations tend to earn lower T scores (e.g., similar porque [similar because], familiar en [relative in], leer_VERB_Ind_Pres-a_ADP [e.g., leo a; read to]). Previous research in English (e.g., Eguchi & Kyle, Reference Eguchi and Kyle2020; Garner et al., Reference Garner, Crossley and Kyle2019; Granger & Bestgen, Reference Granger and Bestgen2014) has indicated that more advanced L2 learners tend to use bigrams that earn higher T scores.
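The T score can be sketched in the same way, using the standard formulation in which the difference between observed and expected bigram frequency is divided by the square root of the observed frequency. As with the previous measures, the exact variant implemented in TAALES_ES is an assumption here, and the counts are invented.

```python
import math


def bigram_t_score(bigram_freq: float, w1_freq: float, w2_freq: float,
                   corpus_size: float) -> float:
    """T = (observed - expected) / sqrt(observed) for one bigram."""
    if min(bigram_freq, w1_freq, w2_freq, corpus_size) <= 0:
        raise ValueError("all counts must be positive")
    # Expected co-occurrence count if the two words were independent.
    expected = w1_freq * w2_freq / corpus_size
    return (bigram_freq - expected) / math.sqrt(bigram_freq)


# Invented counts for illustration only.
toy_t = bigram_t_score(bigram_freq=16, w1_freq=32, w2_freq=32, corpus_size=1024)
```

Since the numerator grows with the raw bigram count, the T score favors frequent combinations, in contrast to MI, which favors exclusive ones.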
Word concreteness
Concreteness scores represent the average concreteness value for words in a text. Concreteness refers to the degree to which a word refers to a perceptible entity. In this study, concreteness scores collected by Guasch et al. (Reference Guasch, Ferré and Fraga2016) were used. Words such as abeja (bee), manzana (apple), and silla (chair) earn higher concreteness scores, whereas words such as amargura (bitterness), salud (health), and suerte (luck) earn lower concreteness scores. Previous research has suggested that more proficient learners will (on average) use words with lower concreteness scores. Given the relatively small number of words included in the Guasch et al. (Reference Guasch, Ferré and Fraga2016) study, concreteness scores were available for approximately 20% of the content words in the learner corpus.
Statistical analyses
To analyze the longitudinal data, a series of linear mixed-effects (LME) models were developed using R (R Core Team, 2021) to determine whether indices of lexical richness were predictive of language development as a function of time. This advanced statistical analysis allows us to examine development over time while also considering participants and their individual trajectories (Gries, Reference Gries, Granger, Gilquin and Meunier2015). The analyses were calculated using the R package lme4 (Bates et al., Reference Bates, Mächler, Bolker and Walker2015). In each model, the lexical index (e.g., frequency_CW, MATTR_lemmas) was set as the dependent variable, time as the fixed effect, and participant as a random effect with random intercepts. This model presumes that although the characteristics of each participant’s productive lexical use may be at a different starting point before SA, their development would follow similar trajectories and increase or decrease at approximately the same rate. The same formula was used for all models: lmer(lexical_index ~ Time + (1 | Participant), data). The R package lmerTest (Kuznetsova et al., Reference Kuznetsova, Brockhoff and Christensen2015) was used to estimate p values. To calculate the effect size of each model, we used the R package MuMIn (Bartoń, Reference Berton2020). Both marginal R² (R²m) and conditional R² (R²c) values are reported. The R²m values indicate the amount of the variance explained by the fixed effects alone, whereas the R²c values indicate the amount of the variance explained by both fixed effects and random effects. Finally, the R package emmeans (Lenth et al., Reference Lenth, Singmann, Love, Buerkner and Herve2018) was used to obtain estimated marginal means and run post hoc pairwise comparisons of the marginal means to identify significant differences among collection points and how time can predict growth in these models.
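Written out, the random-intercept model described above has roughly the following form, where i indexes participants and j indexes collection points. Treating Time as a categorical predictor with Time 1 as the reference level is an assumption based on the pairwise comparisons among collection points, and the expressions for the two effect sizes are the usual variance-ratio definitions rather than formulas quoted from the study.

```latex
\[
  y_{ij} = \beta_0 + \beta_j + u_i + \varepsilon_{ij},
  \qquad \beta_1 = 0,
  \qquad u_i \sim \mathcal{N}(0, \sigma_u^2),
  \qquad \varepsilon_{ij} \sim \mathcal{N}(0, \sigma_\varepsilon^2)
\]
\[
  R^2_m = \frac{\sigma_f^2}{\sigma_f^2 + \sigma_u^2 + \sigma_\varepsilon^2},
  \qquad
  R^2_c = \frac{\sigma_f^2 + \sigma_u^2}{\sigma_f^2 + \sigma_u^2 + \sigma_\varepsilon^2}
\]
```

Here \(\sigma_f^2\) is the variance of the fixed-effect predictions, \(\sigma_u^2\) the variance of the participant intercepts, and \(\sigma_\varepsilon^2\) the residual variance. A large gap between the conditional and marginal values therefore points to substantial between-participant variation.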
Correlation analyses were conducted to determine the strength of the relationship between lexical indices used in this study. A repository with all files for all the analyses can be found at https://osf.io/atbws/?view_only=1b01cfd8aa3c41e8b7bac87288095757.
Results
Several LME models were fitted to examine the change in the characteristics of productive lexical use during the 21-month study period. To measure lexical richness, an index of lexical diversity and two indices of word frequency were calculated. Two indices of n-gram association strength were also measured. Additionally, scores for concreteness were calculated at each collection point. All indices met the assumption of normality. The score of each lexical index was set as the dependent variable in each of the models, collection points were set as the fixed effects, and participants were set as random effects. The results of each LME model are reported below. Descriptive statistics, visualization of group means and individual trajectories, and post hoc pairwise comparisons between collection points are included. Furthermore, a correlation matrix showing the relationship of all indices is included.
Lexical diversity
An LME model was conducted to investigate the change of lexical diversity as measured by MATTR during the 21 months. Descriptive statistics can be found in Table 2. The group means and individual trajectories are visualized in Figure 1.
The results of the LME model indicated a meaningful and significant relationship (p = .003) between the fixed effects (collection point) and lexical diversity. The model indicated that the fixed effects (collection points) explained 4.14% of the variance in lexical diversity scores (R²m = .0414). The model also indicated that the combination of fixed and random effects accounted for 64.5% of the variance (R²c = .6449) in lexical diversity scores, suggesting a high degree of variation across participants. Post hoc pairwise analyses showed a large, significant increase (p = .016, d = -0.897) in lexical diversity scores between Time 1 (predeparture) and Time 6 (9 months after returning to the home country). There was no significant change in lexical diversity scores during the first 2 months abroad (Time 2). However, a medium, significant increase (p = .047, d = -0.794) was observed between Time 2 (after 2 months abroad) and Time 6. See Tables 3 and 4 for a summary of the results and Figure 2 for a visualization of the results.
Lexical sophistication
Content word frequency
Descriptive statistics for content word frequency can be found in Table 5 and are visualized in Figure 3.
The results of the LME model indicated a meaningful and significant relationship (p < .001) between collection point and content word frequency scores. The model indicated that the fixed effects (collection points) explained 17.4% of the variance in content word frequency scores (R²m = .174). The model also indicated that the combination of fixed and random effects explained 59.9% of the variance (R²c = .599), suggesting a high degree of variation across participants. Post hoc pairwise analyses showed a large, significant increase (p < .001, d = -1.399) in content word frequency scores between Time 1 (predeparture) and Time 2 (2 months in country). This increase remained during the remainder of the study period, but no further significant increases or decreases were observed after Time 2. See Tables 6 and 7 for a summary of the results and Figure 4 for a visualization.
Verb frequency
An LME model was conducted to investigate the degree to which the average frequency of verbs changed during the study period. Descriptive statistics can be found in Table 8 and are visualized in Figure 5.
The results indicated a meaningful and significant relationship (p < .001) between collection point and verb frequency scores. The model indicated that the fixed effects (collection points) explained 10.6% of the variance in verb frequency scores (R²m = .106). The model also indicated that the combination of fixed and random effects accounted for 49.4% of the variance (R²c = .494) in verb frequency scores, suggesting a high degree of individual variation. Post hoc pairwise analyses indicated a large, significant decrease (p < .001, d = 1.321) in verb frequency scores between Time 1 (predeparture) and Time 4 (after 9 months abroad). This decrease remained significant (p < .05, d = 0.923) at Time 5 (after 5 months back home). The analysis also indicated a large, significant decrease (p < .001, d = 1.228) between Time 2 (after 2 months abroad) and Time 4 (after 9 months abroad), which remained significant (p = .032, d = 0.830) at Time 5. See Tables 9 and 10 for a summary of the results and Figure 6 for a visualization.
N-gram association strength
Mutual information
Descriptive statistics for MI scores can be found in Table 11 and are visualized in Figure 7.
The results indicate a meaningful and significant relationship (p = .009) between collection point and MI scores. The model indicated that the fixed effects (collection points) explained 4.7% of the variance in MI scores (R²m = .047), and that the combination of fixed and random effects explained 52.8% of the variance in MI scores (R²c = .528), suggesting a high degree of variation across participants. Post hoc pairwise analyses showed a large, significant decrease (p = .035, d = 0.824) between Time 1 (predeparture) and Time 2 (the first 2 months abroad), which remained significant until Time 3 (after 5 months abroad; p = .036, d = 0.821). No further significant increase or decrease was observed for the rest of the study. See Tables 12 and 13 for a summary of the results and Figure 8 for a visualization.
T scores
Descriptive statistics for T scores can be found in Table 14 and are visualized in Figure 9.
The results indicate a nonsignificant relationship (p = .107) between collection point and T scores. The model indicated that the fixed effects (collection points) explained 2.7% of the variance in T scores (R²m = .027), and the combination of fixed and random effects explained 53.2% of the variance (R²c = .532).
Psycholinguistic norms
Concreteness
Descriptive statistics for concreteness scores can be found in Table 15 and are visualized in Figure 10.
The results of the model indicate a meaningful and significant relationship (p < .001) between collection point and concreteness scores. The model indicated that the fixed effects (collection points) explained 19.7% of the variance in concreteness scores (R²m = .197) and the combination of fixed and random effects explained 61.1% of the variance (R²c = .611), suggesting a high degree of variation across participants. Post hoc pairwise analyses showed a large, significant increase (p = .002, d = -1.923) between Time 1 (predeparture) and Time 2 (after 2 months abroad). The increase remained significant (p < .001, d = -1.717) until Time 4 (after 9 months abroad). A large, significant decrease (p = .022, d = 0.865) in concreteness scores was observed between Time 4 (the last month abroad) and Time 5 (after being home for 5 months), which remained significant until Time 6 (after 9 months back home). See Tables 16 and 17 for a summary of the results and Figure 11 for a visualization of the results.
Correlation analyses
To determine the relationships between indices of L2 Spanish lexical richness used in this study, correlation analyses were conducted. A correlation matrix with the correlation coefficients can be found in Table 18.
The results from the correlational analyses indicate that most indices were not strongly correlated with each other, with a few exceptions. The results show a large, positive correlation between T scores and MI scores (r = .711). There is also a medium, positive correlation between MATTR and T scores (r = .523), and between MATTR and MI scores (r = .483).
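A pairwise correlation table of this kind can be sketched as follows. The code assumes Pearson product-moment correlations (suggested by the r notation but not stated explicitly above), and the index names and scores are invented placeholders rather than values from the learner corpus.

```python
import math
from typing import Dict, Sequence, Tuple


def pearson_r(x: Sequence[float], y: Sequence[float]) -> float:
    """Pearson product-moment correlation between two score lists."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("need two equally long lists with at least 2 values")
    mean_x = sum(x) / len(x)
    mean_y = sum(y) / len(y)
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        raise ValueError("correlation is undefined for constant scores")
    return sxy / math.sqrt(sxx * syy)


def correlation_matrix(
    scores: Dict[str, Sequence[float]]
) -> Dict[Tuple[str, str], float]:
    """Correlate every pair of indices once (upper triangle only)."""
    names = list(scores)
    return {
        (a, b): pearson_r(scores[a], scores[b])
        for i, a in enumerate(names)
        for b in names[i + 1:]
    }


# Invented per-text scores for three indices, for illustration only.
toy_scores = {
    "MATTR": [1.0, 2.0, 3.0, 4.0],
    "MI": [2.0, 4.0, 6.0, 8.0],
    "T": [4.0, 3.0, 2.0, 1.0],
}
toy_matrix = correlation_matrix(toy_scores)
```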
Discussion
In this study we investigated the development of lexical richness in L2 oral interviews across multiple dimensions of lexical use using advanced natural language processing tools. First, the results suggest that lexical diversity (as measured by MATTR) sees meaningful growth over the 21-month period. It appears that participants’ spoken lexicon slowly grows while they are abroad and continues to grow after they return home. The findings indicate that it may take some time to incorporate a wider variety of words in spontaneous speech, as MATTR values only reach significance after participants have been back home for 9 months. Although the results support some of the previous findings of lexical diversity being a good indicator of L2 development (Jarvis, Reference Jarvis2017; Mora & Valls-Ferrer, Reference Mora and Valls-Ferrer2012; Serrano et al., Reference Serrano, Tragant and Llanes2012; Tavakoli, Reference Tavakoli2018), this study adds further insight into how advanced learners develop their spontaneous spoken lexicon and shows that the growth in their vocabulary may take time to appear. A related study that used the same learner corpus but unlemmatized lexical items and a different index of lexical diversity (Mitchell et al., Reference Mitchell, Tracy-Ventura and McManus2017) had somewhat divergent findings. Mitchell et al. found increases in lexical diversity between predeparture interviews and all times abroad, after which lexical diversity scores decreased. There are at least two issues that warrant further exploration. The first is the degree to which operationalization of lexical items (lemmatized versus unlemmatized orthographic forms) affects measurements of diversity in Spanish L2 texts (see Jarvis & Hashimoto, Reference Jarvis and Hashimoto2021, for some explorations into this issue for L2 English texts). The second issue is operationalization of lexical diversity.
In this study MATTR was used, which has been demonstrated to be particularly independent of text length (Vidal & Jarvis, Reference Vidal and Jarvis2020; Zenker & Kyle, Reference Zenker and Kyle2021). Mitchell et al. (Reference Mitchell, Tracy-Ventura and McManus2017) used D to index lexical diversity. The results of research concerning D have been mixed, with some studies finding a positive relationship between D and text length (Koizumi & In’nami, Reference Koizumi and In’nami2012), which suggests that D conflates text length and diversity. This may help to explain the differences found between the two studies.
The results also suggest that lexical sophistication, as measured by mean content word frequency, changes meaningfully during the first 2 months abroad. More time in the country does not appear to affect content word frequency, but the change does appear to be durable, as content word frequency values do not significantly decrease after returning home. As participants travel abroad and advance in their proficiency, they incorporate more frequent words into their spontaneous speech. These results are mostly consistent with previous findings of spoken lexical use (Crossley et al., Reference Crossley, Salsbury and McNamara2010; Eguchi & Kyle, Reference Eguchi and Kyle2020; Kyle & Crossley, Reference Kyle and Crossley2015; Tracy-Ventura, Reference Tracy-Ventura2017; cf. Bardel et al., Reference Bardel, Gudmundson and Lindqvist2012). It may be possible that as learners spend time in the host country, they begin to use more frequent words as they move away from the textbook language they most likely had experienced in their classrooms. These results diverge from the findings of Tracy-Ventura (Reference Tracy-Ventura2017), who measured lexical sophistication in the LANGSNAP corpus but combined written and spoken tasks at each collection point. Using frequency bands for all words across two points (predeparture and end of stay), Tracy-Ventura found that participants used more low-frequency words by the end of their stay than at predeparture. The observed differences between the two studies suggest that register differences play an important role in the measurement of lexical sophistication. Although previous research has found relatively similar results across band-based and mean-based frequency norms (Crossley et al., Reference Crossley, Cobb and McNamara2013), differences in operationalizations may have also contributed to differences across studies.
In contrast to content word frequency, verb frequency values decreased by the end of the SA period, indicating that learners incorporated more sophisticated verbs. The change seems durable, as it remains significant 5 months after returning home. The results suggest that learners need to spend several months abroad until a decrease in verb frequency values reaches significance, but that the change is long-lasting. As participants spend more time abroad and advance in their L2, they are able to produce more infrequent verb forms. For example, the present indicative of the verb “to have,” tengo ([I have], tener_VERB_Ind_Pres; 2159.445 per million), would receive a higher frequency score than the subjunctive imperfect form, tuviera ([were I to have] tener_VERB_Sub_Imp, 34.683 per million). The findings suggest that the use of more infrequent verb forms may be indicative of advanced spoken L2 Spanish. This implies that certain verb forms in Spanish, such as the past subjunctive, may take longer to learn and may be harder to retrieve in spontaneous speech than other forms. These results shed light on how low-frequency verbs in Spanish may be a good predictor of L2 development (Schnur & Rubio, Reference Schnur and Rubio2021), and suggest that the relationship between lexis and grammar needs to be considered when examining certain features of lexical richness.
Recent research has highlighted the importance of measuring collocation use as a multifaceted construct. In this study, we analyzed two indices that measure the association strength between two words—namely, MI and T score. The results suggest that MI values significantly decrease during the first 2 months of the stay abroad, and this decrease remains significant 5 months later. By the end of the 9 months abroad, participants start using bigrams with higher MI scores again, although the increase is not statistically significant.
These findings may be related to the formulaic language used in a classroom setting that uses highly exclusive n-grams or collocations composed of low-frequency words. As participants get immersed in the target language, they produce a greater proportion of low-MI-scored bigrams. It could be possible that when participants move abroad, the everyday language required from them is not characterized by bigrams formed by infrequent words, as it could be in their previous classroom setting. However, by the end of the stay abroad, bigrams with high MI scores increase. Participants may need more time abroad to incorporate bigrams composed of lower frequency words into their lexical repertoire. Examples of collocations in the learner corpus that receive high MI scores are habla hispana (“Hispanic language,” 8.8144), hablante nativo (“native speaker,” 8.26333) or madres solteras (“single mothers,” 7.695), whereas the collocations un hablante (“a speaker,” 0.77765) or las madres (“the mothers,” 1.45042) would receive a lower MI score. As their vocabulary grows abroad, so does the use of highly exclusive collocations. The results show no significant change in T scores during the study. High T score values tend to highlight bigrams composed of frequent words, yet the production of collocations composed of frequent items (high T scores) does not seem to be predictive of development in this study.
The results for these strength-of-association (SOA) norms partly align with previous research, in which more advanced learners’ texts generally include bigrams with higher MI scores (Ellis et al., Reference Ellis, Simpson-Vlach and Maynard2008; Gablasova et al., Reference Gablasova, Brezina and McEnery2017; Garner et al., Reference Garner, Crossley and Kyle2019; Granger & Bestgen, Reference Granger and Bestgen2014; Kyle et al., Reference Kyle, Crossley and Berger2018; Paquot, Reference Paquot2019). Even though there is a significant decrease in MI scores during the first few months abroad, they end up increasing later in the study. There is no change in T scores, unlike what previous studies have found. More research on collocational use of L2 Spanish is needed to better interpret these findings. In particular, it may be useful to investigate bigrams with particular parts of speech (Bestgen & Granger, Reference Bestgen and Granger2014) and/or dependency bigrams (Kyle & Eguchi, Reference Kyle, Eguchi and Granger2021; Paquot, Reference Paquot2019). Nevertheless, this study highlights the importance of a multidimensional approach to collocation use and other indices of lexical sophistication of L2 Spanish.
Studies that have examined psycholinguistic word information investigate the extent to which L1 norms can predict L2 spoken lexical proficiency (Salsbury et al., Reference Salsbury, Crossley and McNamara2011). These psycholinguistic norms are related to processing, saliency, retrieval, and learnability of a word. The results of this study show that concreteness scores changed meaningfully during the study. Participants’ use of concrete words increased after 2 months abroad, an increase that remained significant until the end of their stay abroad. However, after they returned home, participants started using fewer concrete words in comparison to the words produced after arrival.
A significant decrease in concreteness scores was observed 5 months after their return home. However, the results suggest that the use of more concrete words throughout the study period is still meaningful, as evidenced by the fact that concreteness scores were significantly higher 5 months after returning home than before departure.
These findings suggest that the use of more salient (more concrete) words may be an indicator of L2 spoken development. However, these changes are not durable after returning home. These findings differ from previous research using L2 corpora, where proficient speech is generally characterized by less salient and more difficult to retrieve lexical items (Crossley et al., Reference Crossley, Kyle and Salsbury2016; Crossley & Skalicky, Reference Crossley and Skalicky2019; Eguchi & Kyle, Reference Eguchi and Kyle2020; Kyle et al., Reference Kyle, Crossley and McNamara2016; Salsbury et al., Reference Salsbury, Crossley and McNamara2011). A possible explanation could be linked to the oral corpus used in this study, which is conversational in nature, unlike studies using controlled tasks or higher stakes tasks, such as those in testing settings. It may be that being immersed in the language in comparison with taking foreign language courses at a university might affect word processing as well.
As a whole, these findings show how several measures of lexical richness may be indicative of oral L2 development. A valuable finding of this study is that vocabulary development is not linear and that the change in some indices is more durable than in others. The results suggest that as learners advanced in their L2 oral skills, they tended to use a more varied vocabulary and more frequent content words, but less frequent verb forms. The correlation analyses suggest that there is not a strong relationship between most measures of lexical richness but that the bigram SOA measures (T scores and MI scores) are strongly correlated. This finding is not necessarily surprising, given that the indices measure related aspects of the same subconstruct. Lexical diversity also appears to be correlated with SOA indices. However, studies of L2 Spanish collocational features are rare. Research on the development of Spanish productive collocational use in a variety of written and spoken registers is clearly needed to better interpret these findings.
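To illustrate why the two bigram SOA indices can be expected to pattern together, the following minimal sketch computes MI and t-scores from the standard corpus-linguistic formulas, both of which compare an observed bigram frequency with the frequency expected if the two words were independent. The function name, argument names, and counts are hypothetical and chosen for illustration only; this is not the TAALES_ES implementation, which may differ in its reference corpus and in other details.

```python
import math


def bigram_association(bigram_freq, w1_freq, w2_freq, corpus_size):
    """Return (MI, t-score) for one bigram using the standard textbook formulas.

    Hypothetical helper for illustration; not taken from TAALES_ES.
    """
    # Expected bigram frequency if the two words occurred independently.
    expected = w1_freq * w2_freq / corpus_size
    # MI: base-2 log of the ratio of observed to expected frequency.
    mi = math.log2(bigram_freq / expected)
    # t-score: observed minus expected, scaled by the square root of observed.
    t_score = (bigram_freq - expected) / math.sqrt(bigram_freq)
    return mi, t_score


# Made-up reference-corpus counts, not data from the study.
mi, t = bigram_association(bigram_freq=8, w1_freq=16, w2_freq=32, corpus_size=1024)
print(round(mi, 2), round(t, 2))
```

Because both indices are built from the same observed and expected frequencies, some correlation between them is expected. They nonetheless weight those quantities differently: MI tends to reward exclusive, lower frequency pairings, whereas the t-score tends to favor high-frequency pairings.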
As the findings suggest, immersion in the country of residence, as opposed to taking advanced language courses at the home university (Time 5 and Time 6), may play a role in the characteristics of participants’ productive lexical use, especially in oral conversations, given everything that may influence L2 use in each setting (e.g., host family, social life, exposure, formal instruction). The change of setting from the home university context to immersion appears to affect certain features of productive vocabulary more than others, particularly association strength indices and psycholinguistic word norms. Future research should explore differences in spontaneous speech development between instructed foreign language settings and SA programs (Collentine, 2004; Segalowitz & Freed, 2004).
Conclusion
This study examined six dimensions of lexical richness of L2 Spanish using a longitudinal spoken learner corpus. We measured a series of frequency-based indices, association strength measures, and psycholinguistic norms that have been found to be representative of advanced L2 proficiency or lexical use. For the analysis, we used TAALES_ES, a tool for the automatic analysis of lexical sophistication in Spanish text that will allow for replicable analysis in Spanish learner corpora. The results on lexical diversity, frequency, and to a certain degree bigram association strength support previous findings on the characteristics of productive vocabulary use. The results regarding psycholinguistic word norms differ to some extent from those of past studies. However, most of those studies have investigated written and cross-sectional corpora. Research on longitudinal and oral corpora is needed to understand actual L2 development of spontaneous speech. The reported findings need to be corroborated by further studies of L2 Spanish and of oral corpora in order to understand how speech develops over time.
The current study has pedagogical implications for teaching L2 speaking and, in particular, for vocabulary instruction that is sensitive to register. Because advanced spontaneous speech may be characterized by features that differ from those of written proficiency and other spoken registers (e.g., the use of higher versus lower frequency words), teachers may consider implementing awareness-raising tasks that highlight the type of vocabulary that is appropriate for different formal and informal registers (e.g., everyday conversation versus workplace small talk). This may require, for instance, emphasizing the teaching of highly exclusive collocations or highlighting the importance of frequent and salient words during spontaneous everyday conversational tasks. Furthermore, tasks in which students are pushed to produce verb forms that take longer to learn but may be associated with more advanced speech could help automatize the processes behind verb conjugation in L2 Spanish.
A limitation of this study is that the psycholinguistic word lists accounted for only 20% of the content words in the learner texts. The results should therefore be interpreted with caution, as they provide only a first look into the psycholinguistic properties of words in L2 Spanish. Additionally, when calculating average verb frequency, we only tagged for tense and mood. Analyzing the accuracy and agreement of verb forms, as well as other aspects of lexico-grammar such as gender agreement, would give a more in-depth picture of the characteristics of L2 Spanish productive vocabulary use.
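The coverage limitation can be made concrete with a small sketch. The code below shows one way coverage by a norm list and the resulting mean score might be computed, assuming a text is represented as a list of content-word tokens and the norms as a word-to-rating dictionary. The function name, the toy tokens, and the rating are all invented for illustration and do not reproduce the study's data or the TAALES_ES procedure.

```python
def norm_coverage(content_words, norm_scores):
    """Return (proportion of tokens with a norm, mean norm over covered tokens).

    Hypothetical helper for illustration; not taken from TAALES_ES.
    """
    # Keep the ratings only for tokens that appear in the norm list.
    covered = [norm_scores[w] for w in content_words if w in norm_scores]
    # Share of content-word tokens that the norm list accounts for.
    coverage = len(covered) / len(content_words) if content_words else 0.0
    # The mean is based solely on covered tokens; None if nothing is covered.
    mean_score = sum(covered) / len(covered) if covered else None
    return coverage, mean_score


# Made-up toy data: five content-word tokens, only one of which has a rating.
tokens = ["casa", "idea", "viajar", "amigo", "aprender"]
norms = {"casa": 6.0}  # invented concreteness rating
coverage, mean_concreteness = norm_coverage(tokens, norms)
print(coverage, mean_concreteness)
```

In this toy case the mean rests on a single token, which shows why a text-level score derived from a small covered subset may not represent the text as a whole.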
Future research should also explore individual development. The LME models show that much of the variance was explained at the individual level, which could indicate that individuals progress at different rates and follow different paths. Taking a dynamic systems perspective (Cameron & Larsen-Freeman, 2007) to study productive lexical use would give us insight into the complexities of a learner’s interlanguage development.
Data availability statement
The experiment in this article earned an Open Data badge for transparent practices. The materials are available at https://osf.io/atbws/?view_only=1b01cfd8aa3c41e8b7bac87288095757