Recognising Groups among Dialects

9 - Recognising Groups among Dialects

Published online by Cambridge University Press: 12 September 2012

Jelena Prokić and John Nerbonne

Edited by John Nerbonne, Charlotte Gooskens, Sebastian Kürschner and Renée van Bezooijen

Jelena Prokić: Affiliation:
University of Groningen
John Nerbonne: Affiliation:
University of Groningen
Charlotte Gooskens: Affiliation:
University of Groningen
Sebastian Kürschner: Affiliation:
Friedrich-Alexander-Universität Erlangen-Nürnberg
Renée van Bezooijen: Affiliation:
University of Groningen

Summary

Abstract: In this paper we apply various clustering algorithms to dialect pronunciation data. We also propose several evaluation techniques that should be used to deal with the instability of clustering techniques. The results show that three hierarchical clustering algorithms are not suitable for the data we are working with. The remaining algorithms successfully detected a two-way split of the data into Eastern and Western dialects. At the aggregate level used in this research, no further division of sites can be asserted with high confidence.

INTRODUCTION

Dialectometry is a multidisciplinary field that uses various quantitative methods in the analysis of dialect data. Very often those techniques include classification algorithms, such as hierarchical clustering algorithms, used to detect groups within a certain dialect area. Although known for their instability (Jain and Dubes, 1988), clustering algorithms are often applied without evaluation (Goebl, 2007; Nerbonne and Siedle, 2005) or with only partial evaluation (Moisl and Jones, 2005). Very small differences in the input data can produce substantially different groupings of dialects (Nerbonne et al., 2008). Without proper evaluation, it is very hard to determine whether the results of the applied clustering technique are an artifact of the algorithm or reflect real groups in the data.
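The instability described above can be illustrated with a minimal sketch. It is not the evaluation procedure used in this chapter: it assumes a small symmetric matrix of aggregate distances between sites, clusters it bottom-up, then repeats the clustering after adding small random noise and counts how often the original partition is recovered. The function names `agglomerate`, `perturb` and `stability`, the noise scale, and the toy matrix `TOY` are all hypothetical and chosen only for illustration.

```python
import random


def agglomerate(dist, k, linkage=max):
    """Merge sites bottom-up until k clusters remain.

    dist is a symmetric distance matrix (list of lists).
    linkage=max gives complete linkage, linkage=min gives single linkage.
    Returns the partition as a frozenset of frozensets of site indices.
    """
    clusters = [frozenset([i]) for i in range(len(dist))]
    while len(clusters) > k:
        best = None
        for a in range(len(clusters)):
            for b in range(a + 1, len(clusters)):
                d = linkage(dist[i][j] for i in clusters[a] for j in clusters[b])
                if best is None or d < best[0]:
                    best = (d, a, b)
        _, a, b = best
        merged = clusters[a] | clusters[b]
        clusters = [c for n, c in enumerate(clusters) if n not in (a, b)]
        clusters.append(merged)
    return frozenset(clusters)


def perturb(dist, scale, rng):
    """Return a copy of dist with uniform noise in [0, scale] added to each pair."""
    n = len(dist)
    noisy = [row[:] for row in dist]
    for i in range(n):
        for j in range(i + 1, n):
            noisy[i][j] = noisy[j][i] = dist[i][j] + rng.uniform(0, scale)
    return noisy


def stability(dist, k, runs=100, scale=0.05, seed=0):
    """Fraction of noisy runs that reproduce the partition of the clean data."""
    rng = random.Random(seed)
    reference = agglomerate(dist, k)
    hits = sum(
        agglomerate(perturb(dist, scale, rng), k) == reference
        for _ in range(runs)
    )
    return hits / runs


# Hypothetical toy data: six sites forming two well-separated groups.
TOY = [
    [0.0, 0.1, 0.2, 0.8, 0.9, 0.8],
    [0.1, 0.0, 0.1, 0.9, 0.8, 0.9],
    [0.2, 0.1, 0.0, 0.8, 0.9, 0.8],
    [0.8, 0.9, 0.8, 0.0, 0.1, 0.2],
    [0.9, 0.8, 0.9, 0.1, 0.0, 0.1],
    [0.8, 0.9, 0.8, 0.2, 0.1, 0.0],
]

if __name__ == "__main__":
    print(sorted(sorted(c) for c in agglomerate(TOY, 2)))
    print(stability(TOY, 2))
```

On this toy matrix the two groups are far apart, so the two-way split survives every noisy run. On data where the groups are less distinct, a score well below 1 would suggest that the reported grouping may depend on small differences in the input.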

The aim of this paper is to evaluate algorithms used to detect groups among dialect varieties measured at the aggregate level. The data used in this research are dialect pronunciation data consisting of various pronunciations of 156 words collected all over Bulgaria.

Type: Chapter
Information: Computing and Language Variation
International Journal of Humanities and Arts Computing Volume 2, pp. 153-172

Publisher: Edinburgh University Press

Print publication year: 2009
