The automatic identification of lexical variation between language varieties
Published online by Cambridge University Press: 11 October 2010
Abstract
Languages are not uniform. Speakers of different language varieties use certain words differently – more or less frequently, or with different meanings. We argue that distributional semantics is the ideal framework for the investigation of such lexical variation. We address two research questions and present our analysis of the lexical variation between Belgian Dutch and Netherlandic Dutch. The first question involves a classic application of distributional models: the automatic retrieval of synonyms. We use corpora of two different language varieties to identify the Netherlandic Dutch synonyms for a set of typically Belgian words. Second, we address the problem of automatically identifying words that are typical of a given lect, either because of their high frequency or because of their divergent meaning. Overall, we show that distributional models are able to identify more lectal markers than traditional keyword methods. Distributional models also have a bias towards a different type of variation. In summary, our results demonstrate how distributional semantics can help research in variational linguistics, with possible future applications in lexicography or terminology extraction.
- Type
- Papers
- Information
- Natural Language Engineering, Volume 16, Special Issue 4: Distributional Lexical Semantics, October 2010, pp. 469–491
- Copyright
- Copyright © Cambridge University Press 2010