Book contents
- Frontmatter
- Contents
- Preface
- 1 Introduction: goals and methods of the corpus-based approach
- Part I Investigating the use of language features
- Part II Investigating the characteristics of varieties
- Part III Summing up and looking ahead
- Part IV Methodology boxes
- 1 Issues in corpus design
- 2 Issues in diachronic corpus design
- 3 Concordancing packages versus programming for corpus analysis
- 4 Characteristics of tagged corpora
- 5 The process of tagging
- 6 Norming frequency counts
- 7 Statistical measures of lexical associations
- 8 The unit of analysis in corpus-based studies
- 9 Significance tests and the reporting of statistics
- 10 Factor loadings and dimension scores
- Appendix: commercially available corpora and analytical tools
- References
- Index
2 - Issues in diachronic corpus design
Published online by Cambridge University Press: 05 June 2012
Summary
Designing a diachronic corpus can be even more complicated than designing a synchronic corpus (discussed in Methodology Box 1): in addition to concerns relating to size and register diversity, the added parameter of time must be adequately represented. Further, the universe of available texts is much smaller for earlier historical periods, making it difficult even to assess when a representative sample has been achieved.
When designing either a synchronic or a diachronic corpus, the first step is to determine the intended research purposes. For historical research, those purposes might be as narrow as studying the style of a single author's novels. Designing a representative corpus for this purpose would be relatively straightforward – in fact, it might be reasonable to aim for an exhaustive sampling in this case. However, broader research goals quickly result in much more complicated corpus designs. For example, designing a corpus to study a period style (e.g., early eighteenth-century prose) or a single genre (e.g., the novel) raises serious questions about sampling methods.
At the far extreme of complexity is the multi-purpose diachronic corpus, designed to represent a wide range of register diversity across historical periods. The Helsinki Corpus and the ARCHER Corpus were both designed for this purpose: the Helsinki Corpus covers the period from c. 750 to c. 1700, and the ARCHER Corpus covers the period from 1650 to the present.
- Type: Chapter
- Information: Corpus Linguistics: Investigating Language Structure and Use, pp. 251-253
- Publisher: Cambridge University Press
- Print publication year: 1998