Book contents
- English Corpus Linguistics
- Copyright page
- Contents
- Figures
- Tables
- Preface
- Acknowledgments
- 1 The Empirical Study of Language
- 2 Planning the Construction of a Corpus
- 3 Building and Annotating a Corpus
- 4 Analyzing a Corpus
- Concluding Remarks
- Discussion Topics
- Appendix: Corpora
- Bibliography
- Index
2 - Planning the Construction of a Corpus
Published online by Cambridge University Press: 15 June 2023
Summary
This chapter describes both the process of creating a corpus and the methodological considerations that guide this process. It opens with a detailed discussion of the planning that went into the building of four different types of corpora: the British National Corpus (BNC), the Corpus of Contemporary American English (COCA), the Corpus of Early English Correspondence (CEEC), and the International Corpus of Learner English (ICLE). The structure of each of these corpora is also discussed: their length, the genres that they contain (e.g. prose fiction, press reportage, blogs, spontaneous conversations, scripted speech), and other pertinent information. Subsequent sections discuss other topics relevant to building a corpus, such as defining exactly what a corpus is (can the web be considered a corpus?); determining the appropriate size of a corpus and the length of the particular texts that the corpus will contain (complete texts versus shorter samples from each text, e.g. 2,000 words); selecting the particular genres to be included in a corpus (e.g. press reportage, technical writing, spontaneous conversations, scripted speech); and ensuring that the writers or speakers whose speech or writing is included are balanced for such variables as gender, ethnicity, and age.
- Type: Chapter
- Information: English Corpus Linguistics: An Introduction, pp. 42-76
- Publisher: Cambridge University Press
- Print publication year: 2023