Evaluating evidence for the reliability and validity of lexical diversity indices in L2 oral task responses

Kristopher Kyle; Hakyung Sung; Masaki Eguchi; Fred Zenker

doi:10.1017/S0272263123000402

Evaluating evidence for the reliability and validity of lexical diversity indices in L2 oral task responses

Published online by Cambridge University Press: 30 August 2023

and

Kristopher Kyle*: Affiliation:
Department of Linguistics, University of Oregon, Eugene, OR, USA
Hakyung Sung: Affiliation:
Department of Linguistics, University of Oregon, Eugene, OR, USA
Masaki Eguchi: Affiliation:
Department of Linguistics, University of Oregon, Eugene, OR, USA
Fred Zenker: Affiliation:
Department of Second Language Studies, University of Hawaii at Manoa, Honolulu, HI, USA
*: Corresponding author: Kristopher Kyle; Email: [email protected]

Article contents

Abstract
References

Get access

Rights & Permissions

Abstract

Although lexical diversity is often used as a measure of productive proficiency (e.g., as an aspect of lexical complexity) in SLA studies involving oral tasks, relatively little research has been conducted to support the reliability and/or validity of these indices in spoken contexts. Furthermore, SLA researchers commonly use indices of lexical diversity such as Root TTR (Guiraud’s index) and D (vocd-D and HD-D) that have been preliminarily shown to lack reliability in spoken L2 contexts and/or have been consistently shown to lack reliability in written L2 contexts. In this study, we empirically evaluate lexical diversity indices with respect to two aspects of reliability (text-length independence and across-task stability) and one aspect of validity (relationship with proficiency scores). The results indicated that neither Root TTR nor D is reliable across different text lengths. However, support for the reliability and validity of optimized versions of MATTR and MTLD was found.

Type: Methods Forum
Information: Studies in Second Language Acquisition , Volume 46 , Issue 1 , March 2024 , pp. 278 - 299

DOI: https://doi.org/10.1017/S0272263123000402 [Opens in a new window]
Copyright: © The Author(s), 2023. Published by Cambridge University Press

Access options

Get access to the full version of this content by using one of the access options below. (Log in options will check for institutional or personal access. Content may require purchase if you do not have access.)

Article purchase

Temporarily unavailable

References

ACTFL-ALC Press. (1996). Standard Speaking Test manual.Google Scholar

ALC Press. (2010). The Standard Speaking Test (SST). http://tsst.alc.co.jp/e/assessment.html Google Scholar

Alexopoulou, T., Michel, M., Murakami, A., & Meurers, D. (2017). Task effects on linguistic complexity and accuracy: A large-scale learner corpus analysis employing natural language processing techniques. Language Learning, 67, 180–208.CrossRef Google Scholar

Bartoń, K. (2019). MuMIn: Multi-Model Inference (1.43.6) [Computer software]. https://cran.r-project.org/web/packages/MuMIn/index.html Google Scholar

Bates, D., Mächler, M., Bolker, B., & Walker, S. (2015). Fitting linear mixed-effects models using lme4. Journal of Statistical Software, 67, 1–48.CrossRef Google Scholar

Biber, D., Conrad, S. M., Reppen, R., Byrd, P., Helt, M., Clark, V., Cortes, V., Csomay, E., & Urzua, A. (2004). Representing language use in the university: Analysis of the TOEFL 2000 Spoken and Written Academic Language corpus. TOEFL Monograph Series.Google Scholar

Biber, D., Gray, B., & Poonpon, K. (2011). Should we use characteristics of conversation to measure grammatical complexity in L2 writing development? TESOL Quarterly, 45, 5–35.CrossRef Google Scholar

Bulté, B., & Housen, A. (2019). Beginning L2 complexity development in CLIL and non-CLIL secondary education. Instructed Second Language Acquisition, 3, 153–180.CrossRef Google Scholar

Bulté, B., & Roothooft, H. (2020). Investigating the interrelationship between rated L2 proficiency and linguistic complexity in L2 speech. System, 91, Article 102246.CrossRef Google Scholar

Carlson, S. B., Bridgeman, B., Camp, R., & Waanders, J. (1985). Relationship of admission test scores to writing performance of native and nonnative speakers of English. ETS Research Report Series, 1985, i–137.CrossRef Google Scholar

Chapelle, C. A., Enright, M. K., & Jamieson, J. M. (2008). Building a validity argument for the Test of English as a Foreign Language. Routledge.Google Scholar

Chotlos, J. W. (1944). Studies in language behavior IV: A statistical and comparative analysis of individual written language samples. Psychological Monographs, 56, 75–111.CrossRef Google Scholar

Cohen, J. (1988). Statistical power analysis fo the behavioral sciences. Routledge.Google Scholar

Covington, M. A., & McFall, J. D. (2010). Cutting the Gordian knot: The moving-average type–token ratio (MATTR). Journal of Quantitative Linguistics, 17, 94–100.CrossRef Google Scholar

Cumming, A. H., Kantor, R., Baba, K., Erdosy, U., Eouanzoui, K., & James, M. (2005). Differences in written discourse in independent and integrated prototype tasks for next generation TOEFL. Assessing Writing, 10, 5–43.CrossRef Google Scholar

Engber, C. A. (1995). The relationship of lexical proficiency to the quality of ESL compositions. Journal of Second Language Writing, 4, 139–155.CrossRef Google Scholar

Explosion AI. (2018). SpaCy language models. https://spacy.io/models/en#en_core_web_sm Google Scholar

Fergadiotis, G., Wright, H. H., & Green, S. B. (2015). Psychometric evaluation of lexical diversity indices: Assessing length effects. Journal of Speech, Language, and Hearing Research, 58, 840–852.CrossRef Google Scholar PubMed

Geertzen, J., Alexopoulou, T., & Korhonen, A. (2014). Automatic linguistic annotation of large scale L2 databases: The EF-Cambridge Open Language Database (EFCAMDAT). In Millar, R. T., Martin, K. I., Eddington, C. M., Henery, N. M., & Tseng, A. (Eds.), Selected proceedings of the 31st Second Language Research Forum (pp. 240–254). Cascadilla Proceedings Project.Google Scholar

Guiraud, P. (1960). [Problems and methods of linguistic statistics]. Reidel.Google Scholar

Hess, C. W., Sefton, K. M., & Landry, R. G. (1986). Sample size and type-token ratios for oral language of preschool children. Journal of Speech, Language, and Hearing Research, 29, 129–134.CrossRef Google Scholar PubMed

Hwang, H. (2020). A contrast between VP-Ellipsis and Gapping in English: L1 acquisition, L2 acquisition, and L2 processing [Unpublished doctoral dissertation]. University of Hawaiʻi at Mānoa.Google Scholar

Ishikawa, S. (2011). A new horizon in learner corpus studies: The aim of the ICNALE project. In Weir, G., Ishikawa, S., & Poonpon, K. (Eds.), Corpora and language technologies in teaching, learning and research (pp. 3–11). University of Strathclyde Press.Google Scholar

Iwashita, N., Brown, A., McNamara, T., & O’Hagan, S. (2008). Assessed levels of second language speaking proficiency: How distinct? Applied Linguistics, 29, 24–49.CrossRef Google Scholar

Izumi, E., Uchimoto, K., & Isahara, H. (2004). The NICT JLE Corpus: Exploiting the language learners’ speech database for research and education. International Journal of The Computer, the Internet and Management, 12, 119–125.Google Scholar

Jarvis, S. (2002). Short texts, best-fitting curves and new measures of lexical diversity. Language Testing, 19, 57–84.CrossRef Google Scholar

Jarvis, S. (2013). Capturing the diversity in lexical diversity. Language Learning, 63, 87–106.CrossRef Google Scholar

Jarvis, S. (2017). Grounding lexical diversity in human judgments. Language Testing, 34, 537–553.CrossRef Google Scholar

Jarvis, S., Grant, L., Bikowski, D., & Ferris, D. (2003). Exploring multiple profiles of highly rated learner compositions. Journal of Second Language Writing, 12, 377–403.CrossRef Google Scholar

Johnson, W. (1944). Studies in language behavior I: A program of research. Psychological Monographs, 56, 1–15. https://doi.org/10.1037/h0093508CrossRef Google Scholar

Kane, M. T. (2013). Validating the interpretations and uses of test scores. Journal of Educational Measurement, 50, 1–73.CrossRef Google Scholar

Kobayashi, Y., & Abe, M. (2016). Automated scoring of L2 spoken English with random forests. Journal of Pan-Pacific Association of Applied Linguistics, 20, 55–73.Google Scholar

Koizumi, R., & Hirai, A. (2012). Comparing the story retelling speaking test with other speaking tests. JALT Journal, 34, 35–60.CrossRef Google Scholar

Koizumi, R., & In’nami, Y. (2012). Effects of text length on lexical diversity measures: Using short texts with less than 200 tokens. System, 40, 554–564.CrossRef Google Scholar

Koizumi, R., In’nami, Y., & Jeon, E. H. (2022). L2 speaking and its internal correlates: A meta-analysis. In Jeon, E. H. & In’nami, Y. (Eds.), Understanding L2 proficiency: Theoretical and meta-analytic investigations (pp. 307–338). John Benjamins.CrossRef Google Scholar

Kyle, K. (2022). Pylats Python package (.37) [Python]. https://pypi.org/project/pylats/Google Scholar

Kyle, K., & Crossley, S. A. (2018). Measuring syntactic complexity in L2 writing using fine-grained clausal and phrasal indices. The Modern Language Journal, 102, 333–349.CrossRef Google Scholar

Kyle, K., Crossley, S. A., & Jarvis, S. (2021). Assessing the validity of lexical diversity indices using direct judgements. Language Assessment Quarterly, 18, 154–170.CrossRef Google Scholar

Kyle, K., Crossley, S. A., & McNamara, D. S. (2016). Construct validity in TOEFL iBT speaking tasks: Insights from natural language processing. Language Testing, 33, 319–340.CrossRef Google Scholar

Kyle, K., Eguchi, M., Choe, A. T., & LaFlair, G. (2022). Register variation in spoken and written language use across technology-mediated and non-technology-mediated learning environments. Language Testing, 39, 618–648.CrossRef Google Scholar

Lambelet, A. (2021). Lexical diversity development in newly arrived parent-child immigrant pairs: Aptitude, age, exposure, and anxiety. Annual Review of Applied Linguistics, 41, 76–94.CrossRef Google Scholar

Lennon, P. (2000). The lexical element in spoken second language fluency. In Riggenbach, H. (Ed.), Perspectives on fluency (pp. 25–42). University of Michigan Press.Google Scholar

Lenth, R., Singmann, H., Love, J., Buerkner, P., & Herve, M. (2019). R pagage Emmeans: Estimated marginal means, AKA least-squares means (1.47) [Computer software]. https://github.com/rvlenth/emmeans Google Scholar

Lu, X. (2012). The relationship of lexical richness to the quality of ESL learners’ oral narratives. The Modern Language Journal, 96, 190–208.CrossRef Google Scholar

Maas, H. D. (1971). Über den Zusammenhang zwischen Wortschatzumfang und Länge eines Textes [On the connection between vocabulary breadth and text length]. Zeitschrift Für Literaturwissenschaft Und Linguistik, 2, 73–96.Google Scholar

MacWhinney, B. (2000). The CHILDES project: Tools for analyzing talk (3rd ed., Vol. 2). Lawrence Erlbaum Associates.Google Scholar

Malvern, D. D., & Richards, B. J. (1997). A new measure of lexical diversity. In Ryan, A. & Wray, A. (Eds.), Evolving models of language (Vol. 12, pp. 58–71). Multilingual Matters.Google Scholar

Malvern, D. D., Richards, B., Chipere, N., & Durán, P. (2004). Lexical diversity and language development: Quantification and assessment. Palgrave Macmillan.CrossRef Google Scholar

McCarthy, P. M. (2005). An assessment of the range and usefulness of lexical diversity measures and the potential of the measure of textual, lexical diversity (MTLD) [Doctoral dissertation, The University of Memphis].Google Scholar

McCarthy, P. M., & Jarvis, S. (2007). vocd: A theoretical and empirical evaluation. Language Testing, 24, 459–488.CrossRef Google Scholar

McCarthy, P. M., & Jarvis, S. (2010). MTLD, vocd-D, and HD-D: A validation study of sophisticated approaches to lexical diversity assessment. Behavior Research Methods, 42, 381–392.CrossRef Google Scholar PubMed

Norris, J. M., & Ortega, L. (2009). Towards an organic approach to investigating CAF in instructed SLA: The case of complexity. Applied Linguistics, 30, 555–578.CrossRef Google Scholar

Pfenniger, S. (2020). The dynamic multicausality of age of first bilingual language exposure: Evidence from a longitudinal content and language integrated learning study with dense time serial measurements. The Modern Language Journal, 104, 662–686.CrossRef Google Scholar

Polat, B., & Kim, Y. (2014). Dynamics of complexity and accuracy: A longitudinal case study of advanced untutored development. Applied Linguistics, 35, 184–207.CrossRef Google Scholar

Read, J. (2000). Assessing vocabulary. Cambridge University Press.CrossRef Google Scholar

Révész, A., Ekiert, M., & Torgersen, E. N. (2016). The effects of complexity, accuracy, and fluency on communicative adequacy in oral task performance. Applied Linguistics, 37, 828–848.Google Scholar

Tracy-Ventura, N., Huensch, A., & Mitchell, R. (2021). Understanding the long-term evolution of L2 lexical diversity: The contribution of a longitudinal learner corpus. In Le Bruyn, B. & Paquot, M. (Eds.), Learner corpus research meets second language acquisition (pp. 148–171). Cambridge University Press.Google Scholar

Tracy-Ventura, N., Mitchell, R., & McManus, K. (2016). The LANGSNAP longitudinal learner corpus. In Alonso-Ramos, M. (Ed.), Spanish learner corpus research: Current trends and future perspectives (Vol. 78, pp. 117–142). John Benjamins.CrossRef Google Scholar

Treffers-Daller, J. (2013). Measuring lexical diversity among L2 learners of French. In Jarvis, S. & Daller, M. (Eds.), Vocabulary knowledge: Human ratings and automated measures (pp. 79–104). John Benjamins.CrossRef Google Scholar

Treffers-Daller, J., Mukhopadhyay, L., Balasubramanian, A., Tamboli, V., & Tsimpli, I. (2022). How ready are Indian primary school children for English medium instruction? An analysis of the relationship between the reading skills of low-SES children, their oral vocabulary and English input in the classroom in government schools in India. Applied Linguistics, 43, 746–775.CrossRef Google Scholar

Treffers-Daller, J., Parslow, P., & Williams, S. (2018). Back to basics: How measures of lexical diversity can help discriminate between CEFR levels. Applied Linguistics, 39, 302–327.Google Scholar

Tweedie, F. J., & Baayen, R. H. (1998). How variable may a constant be? Measures of lexical richness in perspective. Computers and the Humanities, 32, 323–352.CrossRef Google Scholar

Vercellotti, M. L. (2017). The development of complexity, accuracy, and fluency in second language performance: A longitudinal study. Applied Linguistics, 38, 90–111.CrossRef Google Scholar

Verspoor, M., Schmid, M. S., & Xu, X. (2012). A dynamic usage based perspective on L2 writing. Journal of Second Language Writing, 21, 239–263.CrossRef Google Scholar

Vidal, K., & Jarvis, S. (2020). Effects of English-medium instruction on Spanish students’ proficiency and lexical diversity in English. Language Teaching Research, 24, 568–587.CrossRef Google Scholar

Yoon, H.-J. (2017). Linguistic complexity in L2 writing revisited: Issues of topic, proficiency, and construct multidimensionality. System, 66, 130–141.CrossRef Google Scholar

Zenker, F., & Kyle, K. (2021). Investigating minimum text lengths for lexical diversity indices. Assessing Writing, 47, Article 100505.CrossRef Google Scholar

Article contents

Evaluating evidence for the reliability and validity of lexical diversity indices in L2 oral task responses

Abstract

Access options

Article purchase

Temporarily unavailable

References

Save article to Kindle

Save article to Dropbox

Save article to Google Drive

Reply to: Submit a response

Your details

You have entered the maximum number of contributors

Conflicting interests