Book contents
- Frontmatter
- Contents
- Preface
- 1 Introduction: goals and methods of the corpus-based approach
- Part I Investigating the use of language features
- Part II Investigating the characteristics of varieties
- Part III Summing up and looking ahead
- Part IV Methodology boxes
- 1 Issues in corpus design
- 2 Issues in diachronic corpus design
- 3 Concordancing packages versus programming for corpus analysis
- 4 Characteristics of tagged corpora
- 5 The process of tagging
- 6 Norming frequency counts
- 7 Statistical measures of lexical associations
- 8 The unit of analysis in corpus-based studies
- 9 Significance tests and the reporting of statistics
- 10 Factor loadings and dimension scores
- Appendix: commercially available corpora and analytical tools
- References
- Index
6 - Norming frequency counts
Published online by Cambridge University Press: 05 June 2012
Summary
When corpus-based studies examine the frequency of features across texts and registers, it is important to make sure that the counts are comparable. In particular, if the texts in a corpus are not all the same length, then frequency counts from those texts are not directly comparable. For example, imagine that you analyzed two texts and found that each one has 20 modal verbs. It might be tempting to conclude that modals are equally common in the texts. However, further imagine that the first text has a total length of 750 words, and the second text is 1,200 words long – in this case, your conclusion would be wrong. Because the second text is longer, there are more opportunities for modals to occur, and therefore simply comparing the raw counts does not give an accurate account of the relative frequencies of modals in the two texts.
“Normalization” is a way to adjust raw frequency counts from texts of different lengths so that they can be compared accurately. The total number of words in each text must be taken into consideration when norming frequency counts. Specifically, the raw frequency count should be divided by the number of words in the text, and then multiplied by whatever basis is chosen for norming.
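The calculation described above can be sketched in a few lines of Python, using the two example texts (20 modals each, in texts of 750 and 1,200 words). This is a minimal illustration, not code from the book: the function name `normed_frequency` is hypothetical, and the norming basis of 1,000 words is an assumption chosen for the example, since the summary leaves the choice of basis open.

```python
def normed_frequency(raw_count: int, text_length: int, basis: int = 1000) -> float:
    """Return the frequency of a feature per `basis` words of text.

    raw_count   -- number of occurrences of the feature in the text
    text_length -- total number of words in the text
    basis       -- norming basis (1,000 words is an assumption for this example)
    """
    if text_length <= 0:
        raise ValueError("text_length must be a positive number of words")
    # Divide the raw count by the text length, then scale to the chosen basis.
    return raw_count / text_length * basis


# Both texts contain 20 modal verbs, but they differ in length.
text_1 = normed_frequency(20, 750)
text_2 = normed_frequency(20, 1200)

print(f"Text 1: {text_1:.2f} modals per 1,000 words")  # 26.67
print(f"Text 2: {text_2:.2f} modals per 1,000 words")  # 16.67
```

With the same basis applied to both texts, the normed counts show that modals are considerably more frequent in the shorter text, even though the raw counts are identical.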
- Type: Chapter
- Information: Corpus Linguistics: Investigating Language Structure and Use, pp. 263-264. Publisher: Cambridge University Press. Print publication year: 1998