Published online by Cambridge University Press: 10 July 2002
In their conceptual framework for linguistic literacy development, Ravid & Tolchinsky synthesize research from several perspectives. One of these is corpus-based research, which has been used in several large-scale studies of spoken and written registers over the past 20 years. In this approach, a large, principled collection of natural texts (a ‘corpus’) is analysed using computational and interactive techniques to identify the salient linguistic characteristics of each register or text variety. Three characteristics of corpus-based analysis are particularly important (see Biber, Conrad & Reppen 1998):
• a special concern for the representativeness of the text sample being analysed, and for the generalizability of findings;
• overt recognition of the interactions among linguistic features: the ways in which features co-occur and alternate;
• a focus on register as the most important parameter of linguistic variation: strong patterns of use in one register often represent only weak patterns in other registers.