Book contents
- Data and Methods in Corpus Linguistics
- Copyright page
- Contents
- Figures
- Tables
- Contributors
- Acknowledgements
- Introduction: Comparative Approaches to Data and Methods in Corpus Linguistics
- Part I Corpus Dimensions and the Viability of Methodological Approaches
- Part II Selection, Calibration and Preparation of Corpus Data
- Part III Perspectives on Multifactorial Methods
- 6 Comparing Generalised Linear Mixed-Effects Models, Generalised Linear Mixed-Effects Model Trees and Random Forests
- 7 Comparing Logistic Regression, Multinomial Regression, Classification Trees and Random Forests Applied to Ternary Variables
- 8 Comparing Bayesian and Frequentist Models of Language Variation
- 9 Comparing Methods for the Evaluation of Cluster Structures in Multidimensional Analyses
- Part IV Applications of Classification-Based Approaches
- Index
- References
9 - Comparing Methods for the Evaluation of Cluster Structures in Multidimensional Analyses
Concessive Constructions in Varieties of English
from Part III - Perspectives on Multifactorial Methods
Published online by Cambridge University Press: 06 May 2022
Summary
This chapter begins by discussing how multidimensional techniques and visualizations have been used to analyse linguistic data. While multidimensional scaling and unrooted phenograms (or NeighborNets), for instance, were designed primarily for exploratory purposes, the author argues that they are in fact regularly used to put linguistic assumptions or hypotheses to the test. In such approaches, cluster goodness (in terms of internal coherence and external distance from other clusters) is typically evaluated on the basis of a two-dimensional visualization. The author compares the affordances and limitations of visual inspection with a set of quantitative metrics that relate directly to the visual displays but add a degree of precision not attainable by the human eye. The empirical part of the chapter applies both approaches to a study of concessive constructions in six varieties of English, based on spoken and written material from the International Corpus of English. The author suggests that the new metrics can usefully be applied to a variety of multidimensional techniques to endow them with a measure of objectivity.
- Type: Chapter
- Information: Data and Methods in Corpus Linguistics: Comparative Approaches, pp. 259-288. Publisher: Cambridge University Press. Print publication year: 2022