
Borders and boundaries in Bosnian, Croatian, Montenegrin and Serbian: Twitter data to the rescue

Published online by Cambridge University Press: 17 April 2019

Nikola Ljubešić*
Affiliation:
Department of Knowledge Technologies, Jožef Stefan Institute, Ljubljana, Slovenia; Department of Information and Communication Science, Faculty of Humanities and Social Sciences, University of Zagreb, Croatia
Maja Miličević Petrović
Affiliation:
Department of General Linguistics, Faculty of Philology, University of Belgrade, Belgrade, Serbia
Tanja Samardžić
Affiliation:
Language and Space Lab, University of Zürich, Zürich, Switzerland
*Address for correspondence: Nikola Ljubešić, Jožef Stefan Institute, Ljubljana, Slovenia; University of Zagreb, Zagreb, Croatia.

Abstract

This paper examines the spatial distribution of 16 linguistic features known to vary between Bosnian, Croatian, Montenegrin, and Serbian. The analyses are based on a dataset of geo-encoded Twitter status messages collected from mid-2013 to the end of 2016, and are of two types. The first finds boundaries in the spatial distribution of the linguistic variable levels by means of kernel density estimation smoothing; these boundaries are then plotted over the state borders for visual comparison. The second addresses linguistic distance between the states: groupings of linguistic variables and countries are calculated from the state borders and the Jensen-Shannon divergence between the distributions of the 16 variables within each state. This analysis is complemented by a measure of variable consistency for each country. Together, the analyses are intended to show the extent to which current state borders correspond to linguistic boundaries. They suggest that Croatia and Serbia still represent the two extremes, reflecting a history of normative divergences, while Bosnia-Herzegovina and Montenegro lean to one side or the other depending on the variable.
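To make the second analysis more concrete, the following is a minimal sketch of how a Jensen-Shannon divergence between per-country distributions of a single linguistic variable could be computed. It is an illustration only, not the authors' implementation: the counts, the two-level variable, the country codes, and the function names `js_divergence`, `normalize`, and `pairwise_distances` are all hypothetical, and the paper's actual aggregation over the 16 variables is not reproduced here.

```python
import math
from itertools import combinations


def normalize(counts):
    """Turn raw counts of variable levels into a probability distribution."""
    total = sum(counts)
    return [c / total for c in counts]


def js_divergence(p, q):
    """Jensen-Shannon divergence (log base 2) between two discrete distributions."""
    # Mixture distribution; it is positive wherever p or q is positive.
    m = [(a + b) / 2 for a, b in zip(p, q)]

    def kl(x, y):
        # Kullback-Leibler divergence, skipping zero-probability terms of x.
        return sum(a * math.log2(a / b) for a, b in zip(x, y) if a > 0)

    return (kl(p, m) + kl(q, m)) / 2


def pairwise_distances(distributions):
    """Divergence for every unordered pair of countries."""
    return {
        (a, b): js_divergence(distributions[a], distributions[b])
        for a, b in combinations(sorted(distributions), 2)
    }


# Hypothetical counts of two levels of one variable per country (not real data).
counts = {
    "HR": [900, 100],
    "RS": [100, 900],
    "BA": [500, 500],
    "ME": [300, 700],
}
distributions = {country: normalize(c) for country, c in counts.items()}
distances = pairwise_distances(distributions)

for pair, value in sorted(distances.items(), key=lambda item: item[1]):
    print(pair, round(value, 3))
```

With base-2 logarithms the divergence is bounded between 0 (identical distributions) and 1 (no overlap), which makes the values comparable across variables. A matrix of such pairwise values is the kind of input a clustering step over countries would take.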

Type
Articles
Copyright
© Cambridge University Press 2019 

