
Distributions of cognates in Europe as based on Levenshtein distance*

Published online by Cambridge University Press: 11 August 2011

JOB SCHEPENS*
Affiliation:
Donders Institute for Brain, Cognition and Behaviour, Donders Centre for Cognition, Radboud University Nijmegen, The Netherlands
TON DIJKSTRA*
Affiliation:
Donders Institute for Brain, Cognition and Behaviour, Donders Centre for Cognition, Radboud University Nijmegen, The Netherlands
FRANC GROOTJEN
Affiliation:
Donders Institute for Brain, Cognition and Behaviour, Donders Centre for Cognition, Radboud University Nijmegen, The Netherlands
* Address for correspondence: Job Schepens/Ton Dijkstra, Donders Centre for Cognition, Radboud University Nijmegen, P.O. Box 9104, 6500 HE Nijmegen, The Netherlands

Abstract

Researchers studying bilingual processing can benefit from computational tools developed in artificial intelligence. We show that a normalized Levenshtein distance function can efficiently and reliably simulate bilingual orthographic similarity ratings. Orthographic similarity distributions of cognates and non-cognates were identified across pairs of six European languages: English, German, French, Spanish, Italian, and Dutch. Semantic equivalence was determined using the conceptual structure of a translation database. By applying a similarity threshold, we could select large numbers of cognates that included nearly all of the stimulus materials used in experimental studies. The numbers of form-similar and identical cognates identified in this way correlated highly with branch lengths of phylogenetic language family trees, supporting the usefulness of the new measure for cross-language comparison. The normalized Levenshtein distance function can be considered a new formal model of cross-language orthographic similarity.
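The abstract does not state how the Levenshtein distance is normalized. The sketch below assumes the raw edit distance is divided by the length of the longer word and subtracted from 1, a common convention that is assumed here rather than confirmed by the text, so identical words score 1.0. It also shows how a similarity threshold could separate form-similar translation pairs from the rest. The helper names (`levenshtein`, `normalized_similarity`, `select_cognates`), the English–Dutch example pairs, and the cutoff of 0.5 are hypothetical illustrations, not the authors' materials or settings. The semantic-equivalence step (pairing words via a translation database) is taken as already done.

```python
from typing import Iterable, List, Tuple


def levenshtein(a: str, b: str) -> int:
    """Minimum number of single-character insertions, deletions, and substitutions turning a into b."""
    # Dynamic programming, keeping only the previous row of the table.
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        curr = [i]
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            curr.append(min(prev[j] + 1,          # delete ca
                            curr[j - 1] + 1,      # insert cb
                            prev[j - 1] + cost))  # substitute (or match)
        prev = curr
    return prev[-1]


def normalized_similarity(a: str, b: str) -> float:
    """1 - LD / length of longer word (assumed normalization): 1.0 means identical spelling."""
    longest = max(len(a), len(b))
    if longest == 0:
        return 1.0  # two empty strings are trivially identical
    return 1.0 - levenshtein(a, b) / longest


def select_cognates(pairs: Iterable[Tuple[str, str]],
                    threshold: float = 0.5) -> List[Tuple[str, str, float]]:
    """Keep translation pairs whose similarity reaches a hypothetical threshold."""
    selected = []
    for a, b in pairs:
        sim = normalized_similarity(a, b)
        if sim >= threshold:
            selected.append((a, b, round(sim, 3)))
    return selected


# Hypothetical English-Dutch translation pairs, for illustration only.
example_pairs = [("tomato", "tomaat"), ("house", "huis"),
                 ("hotel", "hotel"), ("dog", "hond")]
print(select_cognates(example_pairs))
# [('tomato', 'tomaat', 0.667), ('hotel', 'hotel', 1.0)]
```

With these pairs, tomato/tomaat (distance 2 over 6 letters) and the identical cognate hotel/hotel pass the cutoff, while house/huis (0.4) and dog/hond (0.25) fall below it. Normalizing by the longer word keeps scores in the range 0 to 1 regardless of word length, which is what makes a single threshold usable across language pairs.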

Type: Research Notes
Copyright © Cambridge University Press 2011


Footnotes

* In our study, we used the standard input–output functions of the following translation database: Euroglot professional 5.0 (2008), developed by Linguistic Systems B.V. We are grateful to Walter van Heuven, Gerard Kempen, Frank Leoné, Steven Rekké, Bastiaan du Pau, and two anonymous reviewers for their thoughtful comments on an earlier version of this paper.

Supplementary material: Dijkstra Supplementary Material (PDF, 84.5 KB)