Book contents
- Designing and Evaluating Language Corpora
- Copyright page
- Contents
- Figures
- Tables
- Acknowledgments
- 1 Introduction
- 2 Approaches to Representativeness in Previous Corpus Linguistic Research
- 3 Corpus Representativeness
- 4 Domain Considerations
- 5 Distribution Considerations
- 6 The Influence of Domain and Distribution Considerations on Corpus Representativeness
- 7 Corpus Design and Representativeness in Practice – With Daniel Keller
- Glossary
- Book part
- References
- Index
3 - Corpus Representativeness
A Conceptual and Methodological Framework
Published online by Cambridge University Press: 07 April 2022
Summary
As discussed in Chapter 1, corpus representativeness depends on two sets of considerations: domain considerations and distribution considerations. The linguistic research goal, which involves both a linguistic feature and a discourse domain of interest, forms the foundation of corpus representativeness. Representativeness cannot be designed for or evaluated outside the context of a specific linguistic research goal. Linguistic parameter estimation is the use of corpus-based data to approximate quantitative information about linguistic distributions in the domain. Domain considerations focus on what should be included in a corpus, based on qualitative characteristics of the domain: they involve describing the arena of language use and operationally specifying the set of texts that could potentially be included in the corpus. Distribution considerations focus on how many texts should be included in a corpus, relative to the variation of the linguistic features of interest. Corpus representativeness is not a dichotomy (representative or not representative) but a continuous construct: a corpus may be representative to a certain extent, in particular ways, and for particular purposes.
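The link between linguistic parameter estimation and distribution considerations can be illustrated with a small sketch. It is not the chapter's own procedure: it assumes the parameter of interest is the mean per-text rate of a feature, and it uses a simple normal approximation to relate the number of texts to the variation of that feature. The function names (`estimate_parameter`, `texts_needed`) and the per-text rates are hypothetical and invented for illustration only.

```python
import math
import statistics


def estimate_parameter(rates_per_text):
    """Estimate a domain-level mean rate from per-text rates.

    Returns (mean, sample standard deviation, standard error of the mean).
    """
    n = len(rates_per_text)
    if n < 2:
        raise ValueError("need at least two texts to estimate variation")
    mean = statistics.mean(rates_per_text)
    sd = statistics.stdev(rates_per_text)  # sample standard deviation (n - 1)
    se = sd / math.sqrt(n)                 # precision improves as n grows
    return mean, sd, se


def texts_needed(sd, margin, z=1.96):
    """Approximate number of texts needed so that the mean is estimated
    within +/- margin, under a normal approximation (an assumption)."""
    if margin <= 0:
        raise ValueError("margin must be positive")
    return math.ceil((z * sd / margin) ** 2)


# Hypothetical rates of a feature per 1,000 words in five texts.
rates = [12.0, 15.0, 9.0, 14.0, 10.0]
mean, sd, se = estimate_parameter(rates)
print(f"mean={mean:.2f} sd={sd:.2f} se={se:.2f}")
print("texts for +/-1.0:", texts_needed(sd, 1.0))
print("texts for +/-0.5:", texts_needed(sd, 0.5))
```

Under these assumptions, a feature that varies more from text to text requires more texts for the same precision, which is one way of reading the claim that the number of texts is relative to the variation of the linguistic feature of interest. The sketch says nothing about domain considerations: no sample size compensates for texts drawn from the wrong domain.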
- Designing and Evaluating Language Corpora: A Practical Framework for Corpus Representativeness, pp. 52-67. Publisher: Cambridge University Press. Print publication year: 2022