
Generalizability in mixed models: Lessons from corpus linguistics

Published online by Cambridge University Press:  10 February 2022

Freek Van de Velde
Affiliation:
Department of Linguistics, KU Leuven, Blijde Inkomststraat 21/3308, BE-3000 Leuven, Belgium. https://www.arts.kuleuven.be/ling/qlvl/people/pages/00039016; https://www.arts.kuleuven.be/ling/qlvl/people/pages/00102617; https://www.arts.kuleuven.be/ling/qlvl/people/pages/00013279
Stefano De Pascale
Affiliation:
Department of Linguistics, KU Leuven, Blijde Inkomststraat 21/3308, BE-3000 Leuven, Belgium.
Dirk Speelman
Affiliation:
Department of Linguistics, KU Leuven, Blijde Inkomststraat 21/3308, BE-3000 Leuven, Belgium.

Abstract

Some of the generalizability issues that haunt controlled lab experiment designs in psychology, and more particularly in psycholinguistics, can be alleviated by adopting corpus-linguistic methods, which work with natural data. This advantage comes at a cost: in corpus studies, lexemes and language users can show different kinds of skew. We discuss a number of solutions to bolster the control.

Type
Open Peer Commentary
Copyright
Copyright © The Author(s), 2022. Published by Cambridge University Press

