18 - Quantifying variation and estimating the effects of sample size on the frequencies of linguistic variables

Published online by Cambridge University Press:  05 June 2014

Heikki Mannila
Affiliation:
Department of Information and Computer Science, Aalto University, Finland
Terttu Nevalainen
Affiliation:
Department of Modern Languages, University of Helsinki, Finland
Helena Raumolin-Brunberg
Affiliation:
Department of Modern Languages, University of Helsinki, Finland
Manfred Krug
Affiliation:
Otto-Friedrich-Universität Bamberg, Germany
Julia Schlüter
Affiliation:
Otto-Friedrich-Universität Bamberg, Germany

Summary

Introduction

The work we report in this chapter began with the aim of finding techniques to minimize the problems that arise from small data samples in fields such as historical sociolinguistics. However, the solutions we propose are not limited to historical sociolinguistics, but are applicable to quantitative sociolinguistic and corpus studies in general. Establishing the frequency of given linguistic forms is a crucial issue in studying differences in linguistic usage between populations or points in time. In its simplest form, the question can be posed as follows: suppose there are two alternative forms, A and B, of a linguistic variable – alternative pronunciations, words or phrases with the same meaning, functionally equivalent grammatical structures – what is the frequency of use of each? The basic questions we address include the use of aggregate data and its relation to individual variation when individuals contribute different amounts of data to the aggregate.
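To illustrate why unequal contributions matter, the following minimal sketch compares two ways of estimating the frequency of form A: pooling all tokens into one aggregate, and averaging the proportions of individual speakers. The speaker labels, the counts and the function names are hypothetical and are not taken from the chapter; the sketch also assumes that every speaker contributes at least one token.

```python
from statistics import mean

# Hypothetical data: (tokens of form A, tokens of form B) per speaker.
counts = {
    "speaker_1": (90, 10),  # contributes 100 tokens
    "speaker_2": (1, 4),    # contributes 5 tokens
    "speaker_3": (2, 3),    # contributes 5 tokens
}


def pooled_proportion(counts):
    """Share of form A when all tokens are pooled into one aggregate."""
    total_a = sum(a for a, _ in counts.values())
    total = sum(a + b for a, b in counts.values())
    return total_a / total


def mean_individual_proportion(counts):
    """Average of per-speaker shares of form A; each speaker weighs equally."""
    return mean(a / (a + b) for a, b in counts.values())


print(round(pooled_proportion(counts), 3))
print(round(mean_individual_proportion(counts), 3))
```

In this invented example the pooled figure is dominated by the one speaker who contributes most of the tokens, whereas the per-speaker average treats all three individuals alike, so the two estimates diverge sharply.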

The other problem we discuss is similarly a fundamental one: what is the minimum sample size – number of speakers, writers or texts, depending on the research topic – that is required to yield consistent results for a given linguistic variable? For a historical sociolinguist using a public corpus, this may be a question of a scarcity of data due to a high rate of illiteracy in a particular period. For sociolinguists who have to elicit their interview data, it is an issue of research economy. In Tagliamonte’s words (2006: 33): ‘The size of the sample must necessarily be balanced with the available time and resources for data handling.’ Looking back at 40 years of sociolinguistic research, Labov (2006 [1st edn. 1966]: 400–401) notes that the analysis of the stratification by age, gender and social class of a given city has usually required 60–100 speakers. Without introducing any testing of sample size, he considers the 120 speakers used in a Montreal study to be ideal, although he emphasizes the care with which the sampling was designed.
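One way to explore the sample-size question empirically is to resample speakers with replacement and observe how the spread of the estimate narrows as the number of speakers grows. The sketch below is an assumption-laden illustration rather than the procedure used in the chapter: the per-speaker proportions are invented, and the function name, the 95% percentile interval and the fixed random seed are illustrative choices.

```python
import random
from statistics import mean


def bootstrap_interval(proportions, n_speakers, n_resamples=2000, seed=0):
    """Approximate 95% percentile interval for the mean proportion of form A
    when n_speakers are drawn with replacement from the observed speakers."""
    rng = random.Random(seed)  # fixed seed so the sketch is reproducible
    estimates = sorted(
        mean(rng.choices(proportions, k=n_speakers))
        for _ in range(n_resamples)
    )
    low = estimates[int(0.025 * n_resamples)]
    high = estimates[int(0.975 * n_resamples) - 1]
    return low, high


# Hypothetical per-speaker proportions of form A.
observed = [0.1, 0.2, 0.25, 0.4, 0.5, 0.55, 0.7, 0.8, 0.9, 0.95]

for n in (5, 20, 80):
    low, high = bootstrap_interval(observed, n)
    print(n, round(high - low, 3))  # interval width for n speakers
```

A researcher could read off the smallest number of speakers at which the interval becomes narrow enough for the comparison at hand, bearing in mind that the result depends on how representative the observed speakers are.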

Type
Chapter
Information
Publisher: Cambridge University Press
Print publication year: 2013
