Book contents
- Data and Methods in Corpus Linguistics
- Copyright page
- Contents
- Figures
- Tables
- Contributors
- Acknowledgements
- Introduction: Comparative Approaches to Data and Methods in Corpus Linguistics
- Part I Corpus Dimensions and the Viability of Methodological Approaches
- Part II Selection, Calibration and Preparation of Corpus Data
- Part III Perspectives on Multifactorial Methods
- 6 Comparing Generalised Linear Mixed-Effects Models, Generalised Linear Mixed-Effects Model Trees and Random Forests
- 7 Comparing Logistic Regression, Multinomial Regression, Classification Trees and Random Forests Applied to Ternary Variables
- 8 Comparing Bayesian and Frequentist Models of Language Variation
- 9 Comparing Methods for the Evaluation of Cluster Structures in Multidimensional Analyses
- Part IV Applications of Classification-Based Approaches
- Index
- References
7 - Comparing Logistic Regression, Multinomial Regression, Classification Trees and Random Forests Applied to Ternary Variables
Three-Way Genitive Variation in English
from Part III - Perspectives on Multifactorial Methods
Published online by Cambridge University Press: 06 May 2022
Summary
The authors apply logistic regression, multinomial regression, classification trees and random forests to a ternary outcome variable: the variation between the ’s-genitive, the of-genitive and functionally equivalent noun + noun combinations. The statistical approaches discussed fall into two groups: regression models on the one hand and classification trees on the other. Specifically, as an alternative to successive binomial regression analyses, the authors implement a multinomial model, which analyses the entire dataset with all three outcome categories simultaneously. In addition, a basic classification tree is fitted alongside a more complex (and more robust) random forest. The chapter not only weighs the advantages and shortcomings of all four models but also explicates the different rationales and interpretations that come with them. A major insight is that the nature of the dataset, the analytic purpose and the statistical model are interdependent and condition each other in several non-trivial respects.
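The contrast between successive binomial analyses and a single multinomial model can be illustrated with a minimal sketch. It assumes a standard multinomial logit (softmax) link, which this summary does not confirm as the chapter's exact specification. The variant labels, the function names and the score values are hypothetical and are not taken from the chapter's data.

```python
import math

# Hypothetical labels for the three outcome categories.
VARIANTS = ["s-genitive", "of-genitive", "noun+noun"]


def multinomial_probs(scores):
    """Softmax: map one linear score per variant to probabilities that sum to 1."""
    peak = max(scores)  # subtract the maximum for numerical stability
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def binomial_prob(score_a, score_b):
    """Logistic probability of variant A over variant B, ignoring the third variant."""
    return 1.0 / (1.0 + math.exp(-(score_a - score_b)))


# Made-up linear predictor values for a single token (assumption, not real data).
scores = [0.8, 0.2, -0.5]

# The multinomial view: all three categories receive a probability at once.
probs = multinomial_probs(scores)
for name, p in zip(VARIANTS, probs):
    print(f"{name}: {p:.3f}")

# The binomial view: one pairwise comparison at a time.
pairwise = binomial_prob(scores[0], scores[1])
print(f"s-genitive vs of-genitive only: {pairwise:.3f}")
```

Under this idealised link, the pairwise probability equals the multinomial probability of the first variant conditional on one of the two compared variants being chosen. Separately fitted binomial models are estimated on subsets of the data, so in practice their estimates need not agree with those of a multinomial model.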
- Data and Methods in Corpus Linguistics: Comparative Approaches, pp. 194-223. Publisher: Cambridge University Press. Print publication year: 2022