Book contents
- Frontmatter
- Contents
- List of figures
- List of tables
- List of contributors
- List of abbreviations
- Introduction. Analysing variation in English: what we know, what we don't, and why it matters
- Part I Investigating variation in English: how do we know what we know?
- 1 Collecting data on phonology
- 2 How to make intuitions succeed: testing methods for analysing syntactic microvariation
- 3 Corpora: capturing language in use
- 4 Hypothesis generation
- 5 Quantifying relations between dialects
- 6 Perceptual dialectology
- Part II Why does it matter? Variation and other fields
- Notes
- References
- Index
3 - Corpora: capturing language in use
from Part I - Investigating variation in English: how do we know what we know?
Published online by Cambridge University Press: 03 May 2011
Summary
Introduction
Language cannot be invented; it can only be captured.
(Sinclair 1997: 31)
The enterprise of investigating language variation is based on access to empirical data – language as actually used by speakers and writers. This is not trivial. We only know what we do about variation in English (or for that matter, in any variety, dialect, register, etc.) through analysis of language in some collection of materials. This collection, ‘the corpus’, is the foundation of everything we do. The data might consist of a collection of letters and diaries, spoken narratives of personal experience, or a compilation of text logs from instant messaging conversations. The materials that provide data for variation studies are diverse, but what unites them is their empirical validity as representations of language in use and, as a consequence, our dependence on them. The simple truth is that we cannot engage in the study of language variation without access to a corpus of data on which to test our hypotheses, base our analyses, and inform our theories, yet this simple truth masks a number of not-so-simple issues. How are corpora constructed? If a corpus contains spoken language, what is the best way to represent the speech in written format? How are corpora accessed and mined? What methods achieve what results? How should the results be interpreted (i.e. what do they mean, what do they tell us?)? This chapter explores these kinds of questions but it intentionally presents few solutions.
- Type: Chapter
- Information: Analysing Variation in English, pp. 49-71
- Publisher: Cambridge University Press
- Print publication year: 2011