1. Introduction
Indefinite pronouns, as their name suggests, are pronominal words whose main function is to express indefinite reference (Haspelmath 1997:11), such as nothing, someone, anywhere, etc. in English. In Estonian they typically refer to an undefined or unknown object, phenomenon, or characteristic (Erelt, Erelt & Ross 2007:187).
This study focuses on the use of the indefinite pronoun keegi in Estonian dialects, which can have multiple functions depending on the context and polarity of the sentence, and corresponds to the English indefinite pronouns someone, nobody/no one, anybody, etc., as illustrated by the following examples from Standard Estonian.
In Standard Estonian, the indefinite pronouns keegi and miski ‘something, anything, nothing’ are differentiated by what they can refer to: keegi is strictly used to refer to animate entities, while miski refers to inanimate entities (Erelt 2017a:743). However, in some Estonian dialects, this distinction in animacy is not as clear, because keegi can also refer to inanimate entities, as in (5). In this paper, we aim to find out just how common such reference to inanimates is and how it is distributed geographically and functionally.
A similar irregularity exists for the pronoun kes ‘who’ (see Pook 2019), but as with kes, the phenomenon is rarely mentioned in previous studies. In fact, only Viikberg (2020:174) mentions the possibility of keegi being used to refer to inanimate entities in the Mulgi dialect. Based on our previous research, however, this phenomenon exists in a much wider area than just that one dialect.
In this paper we regard animacy as a binary variable. Following Fowler (1977:16–17), we classify as animate all beings that are capable of movement and of initiating action and change. This means that all humans and animals are categorised as animate and everything else as inanimate. However, it must be acknowledged that animacy in language is typically not a binary variable at all, but rather a scale from most to least animate. This scale is called the animacy hierarchy, which Dixon (1979:85) presents as follows:
1st, 2nd personal pronoun > 3rd personal pronoun > proper name > human noun > non-human animate noun > inanimate noun
For some languages or for some constructions, the distinction between these categories might be more fine-grained (e.g. having 1st and 2nd person as separate categories) or less fine-grained (e.g. only opposing animate to inanimate), but overall it is a universal tendency to grammatically distinguish those categories which are higher in the hierarchy from those which are lower. Higher categories are often treated as more central to the clause structure and are more likely to act as an agent in events (Comrie 1989:185; Croft 1990:113; Whaley 1996:172; Kittilä, Västi & Ylikoski 2011:6).
The choice of treating animacy as binary in this paper stems from the nature of the data, which contain spoken texts on topics such as the informant’s personal life, lifestyle, past events, or working methods, and where the marking of pronouns as biologically animate or inanimate was straightforward, i.e. without any borderline cases of animacy. Moreover, since this article studies the animacy of an indefinite pronoun, many of the finer categories in the animacy hierarchy cannot be applied to it at all.
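The relation between the hierarchy and the binary coding used here can be sketched in a few lines of Python. This is only an illustration: the enum name, the member names, and the rule that every category above ‘inanimate noun’ collapses to animate are our assumptions for the sketch, not part of Dixon's formulation or of the annotation scheme itself.

```python
from enum import IntEnum


class AnimacyRank(IntEnum):
    """Categories of the animacy hierarchy; a larger value means a higher position."""
    INANIMATE_NOUN = 0
    NON_HUMAN_ANIMATE_NOUN = 1
    HUMAN_NOUN = 2
    PROPER_NAME = 3
    THIRD_PERSON_PRONOUN = 4
    FIRST_SECOND_PERSON_PRONOUN = 5


def binary_animacy(rank: AnimacyRank) -> str:
    """Collapse the scale to two values.

    Assumption for this sketch: everything above INANIMATE_NOUN is animate.
    """
    return "inanimate" if rank == AnimacyRank.INANIMATE_NOUN else "animate"


# Categories can be compared because IntEnum members are ordered by value.
human_outranks_animal = AnimacyRank.HUMAN_NOUN > AnimacyRank.NON_HUMAN_ANIMATE_NOUN
```

The ordered enum keeps the full scale available for comparison, while the helper shows how much information the binary coding discards.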
This study has two aims. The first aim is to examine the data acquired from the Corpus of Estonian Dialects and determine how keegi is used and what functions it fulfils in the dialects. This paper is a needed contribution to the field, as keegi (and most other indefinite pronouns in Estonian) and its use have never been thoroughly described before. The second and main aim, which continues previous research (see Pook 2019), is to study the use of keegi in regard to the animacy of its referent in order to ascertain which dialectal areas allow the variation of referring to both animate and inanimate entities with keegi and which variables influence this variation. The linguistic variables we use in our study help to explain under which conditions the inanimate keegi can be used. Our purpose is therefore to analyse this variation in spoken language and its relation to other relevant variables.
In addition, we aim to find out whether the geographical and morphosyntactic variables that affect the animacy-related use of the interrogative pronoun kes ‘who’, as shown in Pook (2019), are similar for the indefinite pronoun keegi. In a sense, we want to discern whether the reason why keegi may select only animate entities or both animate and inanimate entities is due to its interrogative component kes, which serves as a source of grammaticalisation for indefinite keegi. We expect that the non-selectivity between animate and inanimate referents is spread in the same dialect area for both keegi and kes, and that the choice between animate and inanimate reference is conditioned at least partially by the same factors. As a working hypothesis we expect that the animacy distinction has less importance in the scope of negation, and consequently the use of keegi referring to inanimates occurs mostly when keegi functions as a negative polarity item.
This paper is structured as follows. In Section 2 we provide a brief overview of Estonian dialects and describe our dataset. In Section 3 we describe the use of Estonian indefinite pronouns and discuss the functions of the pronoun keegi. Section 4.1 explains our annotation system and Section 4.2 describes the statistical methods used in the analysis. Section 5 presents the results of the statistical analysis, while a discussion and our conclusions are included in Section 6.
2. Data
The Estonian dialect area is traditionally divided into 8–10 dialects and 105–120 subdialects. According to the latest classifications, the North Estonian dialect group includes the Insular, Western, Mid, and Eastern dialects, the Northeastern–Coastal dialect group is composed of the Coastal and Northeastern dialects, and the South Estonian dialect group consists of the Tartu, Mulgi, Võru, and Seto dialects (Pajusalu 2007:231). This is the division used in the Corpus of Estonian Dialects and therefore also in this study (see Figure 1). It should be mentioned, however, that in earlier classifications the Northeastern and Coastal dialects were regarded as one dialect and the Seto dialect was considered to be a subdialect of Võru (Kask 1984). Every dialect is, in addition, divided into subdialects, which are based on the borders of historical parishes.
All the dialects are distinct from contemporary Standard Estonian, which is based on North Estonian but is also a compromise between various dialects, conscious language planning, and recent influences of contact languages. Northern dialects share the most with Standard Estonian, with up to 58% common features (which include phonetic and grammatical features and core vocabulary) between the Mid dialect and Standard Estonian, while the southern dialects differ the most from Standard Estonian, with the Võru dialect sharing only 18% of its features with Standard Estonian (Pajusalu 2007:233).
The most significant differences in phonology, morphology, and lexis can be found between the southern and northern dialects, since South Estonian diverged from Proto-Finnic before other Finnic languages (Sammallahti 1977; Viitso 1985; Kallio 2012). However, recent dialect studies have found that on a (morpho)syntactic level, the biggest differences are between the eastern and western dialects instead, with the Coastal and Mulgi dialects fitting in with either group depending on the phenomenon studied (Lindström et al. 2009; Uiboaed 2013; Uiboaed et al. 2013; Lindström, Uiboaed & Vihman 2014; Lindström et al. 2015; Ruutma et al. 2016; Lindström & Uiboaed 2017; Lindström, Pilvik & Plado 2018; Pook 2021).
The data used in this study come from the Corpus of Estonian Dialects. The corpus contains authentic dialectal recordings from all dialect areas. The recordings are transcribed phonetically and annotated for morphological features. The speakers are typically older people, who have often lived in the same place their entire life and are therefore a good representation of their home dialect. The conversations cover a range of topics, such as their current lifestyle and family, past events, traditions, and working practices (Lindström, Lippus & Tuisk 2019).
This study uses the morphologically annotated texts, from which 1,857 observations of the pronoun keegi were compiled into our dataset. This also includes a few observations of the pronoun kes ‘who’ from the southern dialects, where kes (and its variants) have an indefinite meaning even without the affix -gi, as in (6). It has been claimed that previously the interrogative pronouns in Finno-Ugric languages were used for expressing indefiniteness; the gi-affixed forms are a later development in Finnic languages (Alvre 1986:49). Nowadays, the option to use interrogative pronouns indefinitely has receded from the written language, but can still be found in Votic, Veps, and in some Estonian and Finnish dialects (Alvre 1977:21, 1986:46–49; Van Alsenoy & van der Auwera 2015:28; Karjalainen 2019).
Table 1 gives an overview of the data used in this study.
3. The use of keegi and other indefinite pronouns
3.1 Indefinite pronouns
According to Martin Haspelmath’s classic definition, indefinite pronouns are pronouns ‘whose main function is to express indefinite reference’ (Haspelmath 1997:11). However, as shown by Haspelmath himself and later by, for example, Denić, Steinert-Threlkeld & Szymanik (2022), indefinite pronouns may have various functions and various referential values, showing that indefiniteness is not a clear-cut category and is internally heterogeneous. Haspelmath (1997) has listed nine main functions of indefinite pronouns, and Denić et al. (2022) have reduced this number to six main semantic ‘flavours’: specific known, specific unknown, nonspecific, negative polarity, free choice, and negative indefinite. Most European languages have more than one indefinite pronoun for covering this range of meanings; however, in Estonian, keegi can be used for all of them.
Indefinite pronouns are very common within the scope of negation. Most European languages use special negative indefinite pronouns (Bernini & Ramat 1996:120), such as nobody in English. Estonian is one of the few European languages that does not have dedicated negative indefinites; only mitte keegi (which includes the non-sentential negation marker mitte) has grammaticalised into this function to a certain degree (Bernini & Ramat 1996:124–125). Negative indefinites may co-occur with verbal negation or themselves suffice to express sentential negation (as in English) (Haspelmath 1997:36). In Estonian, mitte keegi always occurs with verbal negation.
Another widely discussed function of indefinites in negative contexts is negative polarity. Negative polarity items are words or phrases that can be used only in sentences that include at least one negative element (Zwarts 1999:295). In relation to indefinite pronouns, well-known polarity items are the English any-series (anybody, anything). In addition to negative clauses, they can be used in other negative-polarity environments, such as conditional or interrogative clauses, i.e. in typical irrealis contexts, and are not strictly related to the expression of non-existence (Haspelmath 1997:37–39). Estonian, again, does not have a dedicated indefinite pronoun for expressing negative polarity and also uses keegi in negative polarity contexts.
In many languages, however, indefiniteness can also be expressed in negative contexts by other means. Partee (2008) has explained the use of the Russian partitive genitive within the scope of negation by referring to decreased referentiality and non-veridicality in this context. Furthermore, based on Kiparsky (1998), Partee shows that the partitive marking of an object in Finnish occurs in a context of lowered referentiality (compared to the total object in the accusative). The connection between non-referentiality under the scope of negation and partitive marking of NPs with reduced referentiality has been found in many languages, but especially in Balto-Finnic and Slavic languages (Miestamo 2014; Seržant 2015). According to Seržant (2015), the partitive-under-negation rule is a language-contact phenomenon and a common Eastern Circum-Baltic innovation. The use of partitive marking of objects and existential subjects under negation is obligatory in Estonian as well; it also applies to indefinite pronouns, e.g. keegi (nominative) > kedagi (partitive).
3.2 Indefinite pronouns in Estonian
While personal, demonstrative, and interrogative pronouns in Finno-Ugric languages are fairly old word classes, indefinite pronouns formed considerably later, as evidenced by their varied origins and the existence of compound forms (Alvre 1980:539, 1986:5).
Van Alsenoy and van der Auwera (2015:32, 39, 66) categorise Uralic indefinites into four groups: negative indefinites (morphologically negative), negative indefinites (morphologically non-negative), negative polarity indefinites, and neutral indefinites. Out of these four categories, Estonian mostly uses neutral indefinites, which do not have any distributional restrictions: even when used with a negative verb they acquire their negative or specific meaning from the context. This can result in ambiguity in meaning in some cases. However, Estonian also has a non-sentential negative marker mitte ‘not’, which, used together with keegi ‘nobody’ or miski ‘nothing’, has the function of emphasising the negativity and clarifying the meaning. In the previously mentioned categories, mitte + indefinite pronoun can be considered to be a morphologically negative indefinite, or a negative indefinite in terms of Haspelmath (1997) and Denić et al. (2022).
Interestingly, the word mitte is etymologically related to the partitive form of the interrogative mis ‘what’ (Mägiste 2000:1545). Since indefinites have developed from interrogatives in Estonian, the proposed development from *mitä-ä-hen > mittää > mitta > mitte (Mägiste 2000:1545) indicates how tightly the use of interrogative-indefinite pronouns and partitive case marking are related to each other, especially in negation contexts.
Moreover, mitte is also used as a constituent negator with infinitive and converb clauses (e.g. mitte tea-des not know-conv ‘not knowing’) in Standard Estonian (see Tamm 2015), and as a negation word or polarity item in some dialects, especially in the Insular and Western dialects, as in (7). Thus, the use of interrogative/indefinite pronouns in the context of negation was also common in the past and it has developed into a polarity item and/or a negation word in Estonian.
This development can be explained by the fact that the use of partitive case under the scope of negation is a common feature in Estonian as well as in other Finnic languages and in Baltic and Slavic languages; in these languages partitive marking is used for expressing indefinite, non-referential meanings (Miestamo 2014; Seržant 2015). Thus partitive indefinite pronouns are something that could be expected to occur in negated clauses (as a subject or object argument under the scope of negation), and therefore the development from a partitive indefinite pronoun to a polarity item and later into a negation word seems possible.
One of the most productive affixes for deriving indefinite pronouns is -gi/ki, which works in Estonian in a way similar to discourse particles and has various meanings related to information structuring, quantification, etc. (Metslang 2003). The original meaning of the affix -gi/ki is unclear; in present-day data it has both additive (‘also’) and scalar (‘even’) meanings. In negative contexts it behaves as a negative polarity item, as many words with this affix are used only with negative polarity (Sang 1983:121–122; Paldre 1998:49–51). It is possible that -gi/ki has become a part of many indefinite pronouns precisely through negative polarity.
The Estonian indefinite pronouns with the suffix -gi/ki are keegi, miski, mingi ‘some, a certain’, kumbki ‘(n)either’, and ükski ‘none’; the first four of these are based on early interrogative stems, the last one on the numeral üks ‘one’ (Alvre 1980:539; see also Nevis 1984). Deriving indefinites from interrogatives is common typologically (Haspelmath 2013) and is characteristic of the Uralic languages (Van Alsenoy & van der Auwera 2015). When looking at our dialectal data, only the South Estonian Võru and Seto varieties use bare interrogatives (without -gi/ki) as indefinites (kiä ‘who, somebody’).
Deriving indefinites with the -gi/ki clitic is thus a relatively late development, which can also be seen from the position of -gi/ki. As an enclitic particle, it is attached to the very end of the word after any number and case markers (ilusa-te-le-gi ‘beautiful-pl-all-cli’), but as an affix on indefinites its position varies: it is used before or after the case marker, e.g. kelle-le-gi – kelle-gi-le (see Pant 2018, 2020). This positional variation is an indicator of the ongoing lexicalisation process, whereby the -gi/ki clitic becomes a part of the stem and therefore its natural position is before the case and number suffixes (kellegi-le). However, language planning still suggests the placement of -gi/ki after other suffixes, similarly to the use of the -gi/ki clitic as a discourse particle (Pant 2018). In dialects, the typical position of -gi/ki is before the case marker, at least in the allative form (Saareste 1955:16), and this does appear in our data: out of 35 allative forms, 23 have the case marker at the end, while 10 pronouns end with -gi/ki (and two pronouns from the Seto dialect lack a marker for indefiniteness).
Other indefinite pronouns in Estonian are kõik ‘all’, iga ‘each’, mõlemad ‘both’, kogu ‘all’, mitu ‘many’, mõni ‘some’, üks ‘one’, teine ‘other’, etc. (Erelt, Erelt & Ross 2007:187). The use of the pronouns mingi and üks has been more thoroughly examined by Pajusalu (2000, 2001, 2004): while both of these pronouns express vagueness in spoken language, using mingi leaves an impression that the referred entity is unfamiliar to both the speaker and the listener, while üks conveys the meaning that in that given context the referent is unknown only for the listener; mingi can also have a negative or evaluative connotation, while üks typically does not (Pajusalu 2000). It has been argued that indefinite pronouns such as kõik, mõni, and mitu should more accurately be called quantifying pronouns, as they are often used as definite pronominal NPs in spoken language (Pajusalu 2009:135).
3.3 Functions of keegi in the data
In this section we describe the possible functions that the pronoun keegi can have based on the data from the Corpus of Estonian Dialects (CED). The functions are defined on the basis of syntax. The indefinite pronoun can be used as an argument (subject, object, oblique argument), an attribute, a negative polarity item, and in some other minor functions that are mostly related to spoken use of language and are therefore not mentioned in Estonian grammars. We have broadly referred to all of these uses as functions of keegi. This categorisation is our own and does not follow any previously described functions for the pronoun keegi.
Nominative subject
The subject argument in Estonian is typically in the nominative case and agrees with the verb in person and in number (Erelt, Metslang & Plado 2017:240). The indefinite pronoun keegi often occurs in subject position and indicates that the subject’s referent is unknown or even irrelevant for the speaker and/or listener, as in (8).
Partitive subject
Estonian has the option of using partitive subjects which alternate with nominative subjects, a case of differential subject marking (see e.g. de Hoop & de Swart 2009). The use of a partitive subject is more restricted than that of a nominative subject: a partitive subject occurs most commonly in existential and possessive clauses with XVS word order, and is obligatory in negative existential (as in (9)) and possessive clauses (Erelt & Metslang 2006:255); in all of these clause types, it alternates systematically with a nominative subject. However, the use of a partitive subject is not limited only to these clause types (Huumo & Lindström 2014; Lindström 2017); its use is mostly linked to quantitative indefiniteness (Metslang 2012; Lindström 2017). Partitive subjects here are categorised separately from nominative subjects since keegi as a partitive subject behaves significantly differently from keegi as a nominative subject, as shown in the statistical analysis in Section 5.
Object
Estonian has differential object marking, meaning that the marking of the direct object varies and is dependent on several semantic and syntactic factors (see e.g. Ogren 2015). The object is most typically marked with the partitive case (for partial objects) and with the genitive or nominative case (for total objects). The choice between using a partial or a total object is dependent on polarity, aspect, and the referent’s boundedness. If a clause is perfective, the referent is quantitatively bounded, and the clause is affirmative, a total object is used. If even one of these conditions is not met, a partial object is used instead (Metslang 2017:258, 264–267). Some verbs, however, take only partitive objects and do not allow object marking alternations (see Tamm & Vaiss 2019). Interestingly, in the dataset of this study, all the objects are in the partitive case; 87% of them occur in a negative sentence.
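The three-condition rule for choosing between a total and a partial object can be restated as a small decision function. The sketch below is a simplified illustration of the rule as summarised above; the function name, parameter names, and return labels are hypothetical, and real object marking depends on further factors that the sketch ignores.

```python
def object_type(perfective: bool, bounded: bool, affirmative: bool,
                partitive_only_verb: bool = False) -> str:
    """Return "total" (genitive/nominative) or "partial" (partitive).

    A total object requires all three conditions at once; verbs that only
    take partitive objects never alternate.
    """
    if partitive_only_verb:
        return "partial"
    if perfective and bounded and affirmative:
        return "total"
    return "partial"


# Negation alone is enough to force a partial (partitive) object.
negated_clause = object_type(perfective=True, bounded=True, affirmative=False)
```

Because negation by itself triggers the partitive, the high share of negative sentences among the objects in the dataset is consistent with all of them being partitive.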
Adverbial
In the Estonian grammar tradition, the term adverbial covers both oblique arguments (such as arguments marking experiencer, possessor, or addressee) and adjuncts (e.g. time and location adverbials). The border between oblique arguments and adjuncts is not always clear-cut in Estonian: on the one hand, the option to have an oblique argument and the form of it are selected by the predicate; on the other hand, their presence in the clause is far from being obligatory and is more likely context-dependent (see e.g. Lindström & Vihman 2017), making obliques closer to adjuncts. Therefore we use the cover term adverbial in this study, without drawing a clear distinction between obliques and adjuncts. In (11) keegi is an adjunct (semantically beneficiary), in (12) it is a possessor argument, and in (13) it is an addressee. Most of the uses of keegi in this group are related to the marking of possessors, addressees, and beneficiaries. Note that some typical adjuncts, such as locatives and time adverbials, cannot be formed with the indefinite pronoun keegi.
Genitive attribute
A genitive attribute occurs within the NP and precedes the head noun. Estonian genitive attributes may express the possessor, author, place, time, quantum, purpose, etc. (Pajusalu 2017a:388). In our data, all the uses were more or less closely related to possessor marking, as in (14). Only the uses where the indefinite pronoun has the meaning ‘proper, true’ could be seen as a separate group, as in (15).
Postnominal attribute
Estonian has mostly prenominal attributes in noun phrases (e.g. genitive attributes), as they are strongly preferred over postnominal attributes, but postnominal attributes are also possible (Pajusalu 2017a:382). Keegi as a postnominal attribute typically belongs to a pronoun (me ‘we’, nad ‘they’, as in (16)) or to a noun referring to a group of people (e.g. rahvas, inimesed ‘people’). This construction has the meaning ‘any of the group’ or ‘none of the group’.
Determiner
Since Estonian lacks grammatical articles, indefinite article-like determiners keegi, miski ‘something, nothing’, üks ‘one’, mingi ‘some, a certain’, etc. can be used to express indefiniteness. These determiners are more frequent in spoken than in written language (Pajusalu 2017a:382–384, 2017b:573). In this context, grammatically keegi can be replaced by mingi or üks, changing only minute nuances in the meaning (see Section 3.2), and keegi can be considered (as with üks and mingi) to function like an indefinite article (Pajusalu 2000:89), with the main function of indicating that the referent of the NP is unknown, as in (17).
Negative polarity item
A negative polarity item (NPI) is a word associated with a negation environment, which means it normally appears in sentences with negative polarity, but it is also common in certain non-negative contexts such as conditional or interrogative sentences. Typical NPIs in English are any (and the any-series), ever, at all, etc., although in different languages NPIs can range from nouns and adverbs to even verbs and constructions (Sang 1983:120; Haspelmath 1997:33–34; Giannakidou 2011:1661–1662; Erelt 2017b:193).
The affix -gi has been considered to be an NPI itself, as words like ükski ‘none’, iialgi ‘never’, sugugi ‘(not) at all’, etc. are all used only with negative polarity. Although pronouns like keegi, miski ‘something, nothing’, mingi ‘some, any’ and adverbs like kunagi ‘ever, never’ have both positive and negative meanings, the first interpretation of their meaning in a negated sentence is negative exactly because of the affix -gi (Sang 1983:121–122; see also Paldre 1998). A study about negation in Estonian dialects found that keegi is used as an NPI in all of the analysed subdialects (the study included one subdialect from each dialect), but it was a more frequent means of emphasising negation in the subdialects of the Western, Mid, Eastern, and Mulgi dialects (Klaus 2009:148).
Since keegi can be used as a subject or object under negation and, based on its form, we cannot distinguish its use as a negative polarity item from other uses, we have taken a narrower approach to the definition of an NPI here: specifically, NPIs are those uses of indefinite pronouns in negated clauses that do not fill any argument position of the negated verb, i.e. their use is not related to the meaning of the main verb but only to the negation. NPIs in our data only appear in negative environments and have the purpose of emphasising the negation.
More than half of the NPIs in our data are preceded by ega ‘nor’, ei ‘no’, or muud ‘other:prt’, as seen in (18), forming a somewhat grammaticalised construction. For the other NPIs, keegi typically acquires the meaning of ‘at all’, as seen in (19).
Generalising alternative
In the data of this study, a generalising alternative follows an NP and refers to an indefinite, unspecified option similar to that NP (20). The NP in this structure is separated from the generalising alternative by või/ehk ‘or’, with the NP being in focus, while the following või/ehk keegi denotes uncertainty or possible other alternatives (Lindström 2001:96).
The distribution of the aforementioned functions in the data is depicted in Table 2. Keegi is most commonly used as a nominative subject and an object, followed by the functions of partitive subject, adverbial, and negative polarity item. Keegi is less often used as any type of attribute or as a generalising alternative.
4. Methods
4.1 Annotation
Our dataset consists of observations of keegi and its variants from the corpus. Each datapoint includes the preceding and following context (up to 20 words), the case marking of keegi, and information about the speaker. Each of the sentences in the dataset was manually annotated with the following variables.
Animacy of the referent
This is the dependent variable of the study and marks whether the entity that keegi is referring to is animate or inanimate. In this study, all humans (including human collectives) as well as animals are marked as animate, and everything else is marked as inanimate. As mentioned previously, in real language use, animacy is a much more complex concept and not just a binary division, but in the interest of operationalisation, while also taking into account the topics and themes in the spoken data used, it is reasonable to differentiate only between animate entities, as in (21), and inanimate entities, as in (22).
Polarity of the clause
This marks whether the polarity of the clause containing keegi is affirmative, as in (23), or negative, as in (24). We predict that the animacy distinction has less importance within the scope of negation; therefore referring to inanimate entities with keegi could be more common in negative clauses.
Function of keegi
This marks which syntactic function keegi fills in a clause. These functions are as follows: nominative subject, partitive subject, object, adverbial, genitive attribute, postnominal attribute, determiner, negative polarity item, and generalising alternative. See Section 3.3 for a more detailed description of the functions.
Position of keegi in the clause
This marks one of three places in the clause for keegi to be situated: clause-initially, as in (25), clause-internally, as in (26), or clause-finally, as in (27).
Case marking of keegi
This variable was extracted directly from the extant corpus annotation and marks the case of keegi in the clause. Out of the 14 Estonian cases, eight are found in the data: nominative, genitive, partitive, elative, allative, adessive, ablative, and comitative. In a previous animacy study of kes ‘who’, it was found that case was significantly associated with the referent’s animacy, with elative and comitative being the most frequently used cases to refer to inanimate referents (Pook 2019), so it is highly likely that the case of keegi also affects its use.
Dialect
This marks which dialect area the speaker is from: the Coastal, Northeastern, Insular, Western, Mid, Eastern, Mulgi, Tartu, Võru, or Seto dialect. We predict that dialects are a very significant factor determining the probability of referring to an inanimate entity with keegi. In a previous study of kes ‘who’, the pronoun was used to refer to inanimate referents most frequently in the northern dialects, particularly in the Eastern, Western and Coastal dialects, while using kes in that manner was rare or unattested in the southern dialects (Pook Reference Pook2019). We expect the area where keegi is used for inanimates to be roughly the same.
Table 3 gives an overview of all the variables used in this study.
4.2 Statistical analysis
When studying dialect syntax, it is highly beneficial to have a large number of natural language recordings since it can be difficult to reproduce syntactic phenomena in a controlled environment. However, this type of data collection can also result in an unpredictably unbalanced dataset, in which the phenomenon of interest can be represented many times in one dialectal area or construction and hardly ever in another due to arbitrary and uncontrollable factors during data collection, but not necessarily due to the actual distribution of the phenomenon.
Hence, in this study, we have used three different statistical methods, none of which imposes any particular requirements on the data, making them highly suitable for unbalanced datasets with categorical variables. Specifically, these methods are conditional inference trees, random forests, and multiple correspondence analysis. We applied all of these in order to determine which variables affect the use of keegi in referring to animate or inanimate entities.
Conditional inference trees and random forests are methods based on binary recursive partitioning. At each stage, the tree model’s algorithm tests the association between the independent variables and the given response variable (which, in this study, is the animacy of the pronoun keegi). The variable most strongly associated with the response variable is the one used to split the data into two sets. This kind of partitioning continues until no variable is associated with the response at a level of statistical significance. At this point, the results are depicted as a tree with binary splits (Hothorn, Hornik & Zeileis Reference Hothorn, Hornik and Zeileis2006; Strobl, Malley & Tutz Reference Strobl, Malley and Tutz2009).
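As a rough illustration of the variable-selection step described above, the following Python sketch tests each categorical predictor against the response and picks the most strongly associated one, stopping when nothing reaches significance. It is not the algorithm implemented in the party package: the chi-square statistic, the permutation test, the Bonferroni adjustment, the function names, and the toy data are all assumptions made for illustration, and the sketch does not go on to choose the binary grouping of the selected variable's levels.

```python
import random
from collections import Counter


def chi_square(xs, ys):
    """Pearson chi-square statistic for two categorical sequences."""
    n = len(xs)
    x_counts, y_counts = Counter(xs), Counter(ys)
    joint = Counter(zip(xs, ys))
    stat = 0.0
    for x, nx in x_counts.items():
        for y, ny in y_counts.items():
            expected = nx * ny / n
            stat += (joint[(x, y)] - expected) ** 2 / expected
    return stat


def permutation_p_value(xs, ys, n_perm=999, seed=0):
    """Share of shuffled responses at least as associated as the observed one."""
    rng = random.Random(seed)  # fixed seed so the sketch is reproducible
    observed = chi_square(xs, ys)
    shuffled = list(ys)
    hits = 0
    for _ in range(n_perm):
        rng.shuffle(shuffled)
        if chi_square(xs, shuffled) >= observed - 1e-12:
            hits += 1
    return (hits + 1) / (n_perm + 1)


def select_split_variable(rows, predictors, response, alpha=0.05):
    """Return the predictor to split on, or None if none is significant."""
    ys = [row[response] for row in rows]
    p_values = {
        var: permutation_p_value([row[var] for row in rows], ys)
        for var in predictors
    }
    best = min(p_values, key=p_values.get)
    # Assumed multiplicity correction: Bonferroni over the tested predictors.
    adjusted = min(1.0, p_values[best] * len(predictors))
    return (best if adjusted <= alpha else None), p_values


# Hypothetical toy data: case fully determines animacy, position does not.
toy_rows = [
    {
        "case": "partitive" if i % 2 == 0 else "nominative",
        "position": "initial" if (i // 2) % 2 == 0 else "final",
        "animacy": "inanimate" if i % 2 == 0 else "animate",
    }
    for i in range(40)
]
chosen, toy_p_values = select_split_variable(
    toy_rows, ["case", "position"], "animacy"
)
```

In a full tree, the same selection would be repeated within each of the two subsets produced by a split until the function returns None.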
For random forests, the model outputs a measure of importance for each variable, averaged over many conditional inference trees. These measures, in turn, reflect how strongly each variable affects the response. The goal of these two methods is to predict the chances of the dependent variable occurring in a given context, specified by the independent variables (Breiman Reference Breiman2001).
Correspondence analysis (CA) is an exploratory technique designed specifically for the analysis of categorical variables. CA takes the frequency of co-occurring features and converts them to distances, which are then plotted on a two- or three-dimensional graph to visualise how the variable values are associated with each other (Glynn Reference Glynn, Glynn and Robinson2014:445). Multiple correspondence analysis is an extension of simple CA that can analyse more than two factors simultaneously (Hill & Lewicki Reference Hill and Lewicki2006:136).
All three of these methods have been successfully used in many other studies of Estonian, Estonian dialects and (dialect) syntax (see e.g. Uiboaed Reference Uiboaed2013; Ruutma et al. Reference Ruutma, Kyröläinen, Pilvik and Uiboaed2016; Lindström & Uiboaed Reference Lindström and Uiboaed2017; Taremaa Reference Taremaa2017; Lindström, Pilvik & Plado Reference Lindström, Pilvik and Plado2018; Pook Reference Pook2019; Hint et al. Reference Hint, Taremaa, Reile and Pajusalu2021; Lindström, Pilvik & Plado Reference Lindström, Pilvik and Plado2021; Pook Reference Pook2021).
All of the calculations were performed using the statistical software R (R Core Team 2018). The conditional inference trees and random forests were computed using the functions ctree() and cforest() from the party package (Hothorn, Hornik & Zeileis Reference Hothorn, Hornik and Zeileis2006). The correspondence analysis was computed using the function mjca() from the ca package (Nenadic & Greenacre Reference Nenadic and Greenacre2007).
5. Results
In this section we present our analysis of all the variables included in the study in terms of how they relate to keegi referring to animate and inanimate entities. In Section 5.1 we look at all the variables individually: dialect, function, case marking, polarity, and position. In Section 5.2 we show the conditional inference tree and random forest models in order to determine how these variables together affect the speaker’s choice in referring to animate or inanimate entities with keegi. In Section 5.3 we use a multiple correspondence analysis to visualise the associations between all the variables on a two-dimensional graph.
5.1 Impact of the studied variables
Out of the 1,857 observations of keegi in the dataset, 987 (53.2%) referred to animate and 870 (46.8%) to inanimate entities. While in Standard Estonian keegi can only refer to animate beings, in dialects this restriction clearly does not always exist and keegi is used almost equally to refer to both animate and inanimate entities.
In order to find out which variables affect the use of keegi in terms of referring to animate or inanimate referents, in this section we analyse all of them in comparison to the animacy of the referent. The variables examined are dialect (and subdialects), function, case, polarity, and position.
5.1.1 Dialects and subdialects
First we compared the frequency of referring to inanimate entities in the dialects and subdialects. As can be seen in Table 4 and Figure 2, the dialects for which it is most probable to refer to inanimate entities with keegi are the Western, Mid, and Eastern dialects, where over half of the pronouns refer to an inanimate being. All in all, referring to inanimates is possible in all of the dialects except for the Võru and Seto dialects, where all of the instances refer to an animate being.
Looking more closely at the subdialects (Figure 3), we can see that most of the subdialects in the Western, Mid, and Eastern dialects have a high percentage of references to inanimate entities. The Insular dialect is split into two – although it has a moderately high probability of referring to inanimate beings with keegi, the data from the subdialects show that on the island of Saaremaa (the biggest island in the Insular dialect) most of the pronouns refer to animate entities, while on the island of Hiiumaa (the second largest island in the Insular dialect) keegi is very likely to refer to inanimate entities as well.
We compared the dialectal results obtained in this study about keegi with the results of the study about the use of the interrogative/relative pronoun kes ‘who’ (Pook Reference Pook2019), which can also be used to refer to both animate and inanimate entities in Estonian dialects. Table 5 shows that the area where this variation occurs is quite similar. Although kes is predominantly used to refer to animate beings, with an average of only 9.7% of the pronouns referring to inanimates, the Western, Mid, and Eastern dialects have a higher percentage of inanimate referents, while the Võru and Seto dialects have few or no inanimate referents for both pronouns. The use of the pronouns in the Insular dialect is also divided in a similar manner between the islands of Saaremaa and Hiiumaa.
The significant differences in the percentages show, however, that while kes is mostly still perceived to be associated with animate entities, keegi has lost some of its distinction in animacy in the minds of the speakers and can be more easily used to refer to both animate and inanimate beings. Heine and Kuteva (Reference Heine and Kuteva2006:206, 227) have noted that, in many languages, as the interrogative markers have gone through the stages of grammaticalisation – from being just an interrogative marker to a marker that can introduce headed relative clauses – they have lost their distinction in gender, animacy, number, case, etc. Pook (Reference Pook2019) showed that this was also true for kes, as the pronoun was much more likely to refer to an inanimate entity when it was a relative pronoun than when it was used as an interrogative pronoun. Since indefinite pronouns have also grammaticalised from interrogatives, it is interesting to see that the semantic bleaching in animate–inanimate distinction is even more common with keegi than with kes (as can be inferred from frequency information).
It is also interesting to note that (based on the data in the Corpus of Estonian Dialects) while kes ‘who’ and mis ‘what’ are both used to refer to both animate and inanimate entities in certain Estonian dialects, the same cannot be said about their counterparts keegi and miski ‘something, nothing, anything’, as miski can only refer to inanimate entities in Standard Estonian as well as in all the Estonian dialects.
We briefly examined the normalised frequencies of keegi and miski in the Corpus of Estonian Dialects to see whether the dialects that overwhelmingly use keegi to refer to both animate and inanimate entities therefore have a lower frequency of miski overall, since keegi fills the function of both pronouns, and whether the dialects that use keegi predominantly to refer to animate entities have a higher overall frequency of miski.
As can be seen in Table 6, our hypothesis holds for most dialects. The Western, Mid, and Eastern dialects, which have a very high percentage of keegi referring to inanimates, have a much lower usage frequency of miski than of keegi. Conversely, in the Coastal, Tartu, Võru, and Seto dialects, where references to inanimate entities using keegi are less common or even completely unattested, the frequency of miski is three or more times higher than the frequency of keegi.
The Insular dialect stands out because of its opposite behaviour: although almost half of the instances of keegi in that dialect refer to inanimates, the frequency of miski in the corpus is almost four times higher than the frequency of keegi. It is possible that there are other words used in a similar function and position, for example üht(i) ‘(not) at all’ or mitte ‘not’ in the scope of negation. Previous researchers have also noticed the frequent occurrence of miski and mitte for emphasising negation in the Insular dialect (see e.g. Vitsberg Reference Vitsberg1958:27, 202).
5.1.2 Function
Next we looked at all the functions of keegi in comparison to the animacy of what keegi was referring to (see Table 7). The nominative subject (see (28)), attributes, and adverbials stand out as they are rarely or never used to refer to inanimate entities. Keegi as a polarity item, object, or partitive subject is, however, used predominantly to refer to inanimate referents. Generalising alternatives and determiners are also more likely to be inanimate.
5.1.3 Case marking
For indefinite keegi, case seems to be strongly associated with the referent’s animacy, as can be seen from Table 8. Partitive stands out as the typical case used to refer to inanimate referents, with 83.7% of partitive pronouns referring to inanimate beings. Meanwhile, nominative, adessive, allative, and genitive are strongly associated with referring to animate beings. These percentages correspond well to the results in the previous section since subjects and objects showed similar probabilities of animate/inanimate references relative to their prototypical cases, nominative and partitive.
The rest of the cases have too few observations in the dataset to draw any clear conclusions about their use in this variation.
For the pronoun kes, case was also a significant factor determining whether the pronoun was used to refer to animate or inanimate entities. However, for kes the elative and comitative cases were the ones where the majority of pronouns were used to refer to inanimates (see Pook Reference Pook2019).
5.1.4 Polarity
The speaker’s choice of using keegi to refer to inanimate entities is also affected by the polarity of the clause. Table 9 shows that in clauses with negative polarity, it is much more likely that keegi refers to inanimate beings (59%) than in affirmative clauses (17.5%). Thus the restriction that keegi has an animate referent does not hold up well at all in negative clauses.
5.1.5 Position
Finally, using keegi to refer to inanimates is particularly probable if keegi is situated at the end of the clause as opposed to the beginning (see Table 10). This is most likely associated with the function keegi serves in the clause, as functions that encourage referring to inanimate entities are either overwhelmingly (in the case of partitive subjects and objects) or always (in the case of polarity items) in the middle or at the end of the clause.
5.1.6 Summary of the variables’ effects on the referent’s animacy
Looking at all these variables separately, we can say that keegi is mostly used to refer to inanimate entities in the Western, Mid, and Eastern dialects, in negative clauses, as an object, a partitive subject, or a polarity item, and towards the end of the clause, as illustrated in (29).
In order to further verify these results, we have used multifactorial statistical methods in the next sections of this paper, which also give us the opportunity to measure the relations and interactions between the studied variables.
5.2 Conditional inference tree and random forest
In order to assess the significance of all the variables in association with each other, we ran a conditional inference tree model on the data. Figure 4 shows the conditional inference tree graph for the animacy of the referent of the pronoun keegi. Here we focus on linguistic/functional variables only: the variables included in this model were case, function, polarity, and position; the response in this model was animacy. We have excluded the variable of dialect from this model, as its effect on the animacy of the referent has already been demonstrated in Section 5.1.1. Data from all dialects are still included in the analysis.
The figure displays all the possible splits significant at the level of 0.05 or less. The bar plots at the bottom show the proportion of animate (light grey) and inanimate (dark grey) observations with the given combination of variable values.
It can be seen that the animacy of keegi is significantly associated with all four included variables: case, function, position, and polarity. Case is the variable to first split the dataset into two: keegi in elative and partitive has a higher probability of being inanimate than keegi in other cases included in the data. Raw data show, however, that there are only six instances of keegi in elative in the dataset, which is definitely not enough to draw any solid conclusions, so it should rather be said that partitive is the only case in the dataset that clearly licenses the inanimate use of the pronoun. The other cases are next divided by function: determiners and polarity items have a 20% chance of being inanimate (Node 6). The rest of the functions are split again by case: comitative and genitive have a low probability of referring to an inanimate entity (Node 4), while ablative, adessive, allative, and nominative almost exclusively refer to animate beings (Node 5).
The set of partitive and elative is also split by function: adverbials, determiners, generalising alternatives, objects, and polarity items have a very high chance of referring to inanimate beings. If keegi in one of those functions is in a negative clause, the probability of it referring to an inanimate entity is even higher (Node 10) than when it is in an affirmative clause (Node 9). Partitive subjects and postnominal attributes behave according to their position in the clause: clause-initial or clause-internal keegi is less likely to refer to an inanimate entity (Node 12) than clause-final keegi (Node 13).
The C-index of concordance for this model is 0.94. The C-index evaluates the predictions made by the algorithm: it is the number of concordant pairs divided by the total number of possible evaluation pairs. A value of 0.5 means that the model is not able to discriminate between the variants at all, a value between 0.5 and 0.7 shows poor discrimination, a value between 0.7 and 0.8 suggests an acceptable discrimination, a value between 0.8 and 0.9 shows excellent discrimination, and any value above 0.9 means that the model is able to discriminate between different variants exceptionally well (Hosmer Jr., Lemeshow & Sturdivant Reference Hosmer, Lemeshow and Sturdivant2013: 177). Therefore this model fits the data exceedingly well.
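The pairwise definition of the C-index can be illustrated with a minimal Python sketch. It assumes a binary response coded 1 for inanimate and 0 for animate, a predicted score for each observation, and that a tied pair counts as half concordant; the function name is hypothetical and the sketch is not the routine used to obtain the values reported here.

```python
def c_index(scores, labels):
    """Share of (inanimate, animate) pairs that the scores order correctly."""
    positives = [s for s, y in zip(scores, labels) if y == 1]
    negatives = [s for s, y in zip(scores, labels) if y == 0]
    if not positives or not negatives:
        raise ValueError("both outcome classes are needed to form pairs")
    concordant = 0.0
    for p in positives:
        for q in negatives:
            if p > q:
                concordant += 1.0  # inanimate observation scored higher
            elif p == q:
                concordant += 0.5  # assumption: a tie counts as half
    return concordant / (len(positives) * len(negatives))
```

Under these assumptions, a model whose scores rank every inanimate observation above every animate one obtains 1.0, and a model assigning the same score to all observations obtains 0.5.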
While the conditional inference tree shows the significant associations between independent variables and the response, it does not show the strength of those associations. Therefore the random forest model was applied to the same dataset. This analysis includes the same variables as the conditional inference tree model, with the addition of the variable of dialect. The impact of the variables is shown in Figure 5. The names on the y-axis show the variables included in the analysis, and the numbers on the x-axis show the relative difference between the importance of the variables.
Figure 5 shows that the most important predictor for the animacy of the pronoun keegi is dialect (0.042), followed by case (0.021), function (0.007) and position of keegi (0.003). Polarity does not seem to have any discriminatory power in this model. The C-index of this model is 0.97, which suggests an outstanding fit.
This mostly reflects the results of the conditional inference tree, showing that the variables of case, function, and position affect the animacy of the pronoun both significantly and strongly. However, while polarity significantly determines the animacy of the pronoun in a certain context in the dataset, the association between polarity and animacy is weak and it cannot be generalised for the entire dataset.
5.3 Correspondence analysis
As a final method, we visualised all the studied variables with a multiple correspondence analysis (MCA) in Figure 6. For most datasets, the combination of the first two dimensions offers the most accurate and easily interpretable visualisation of how the variables and their values are associated with each other (Glynn Reference Glynn, Glynn and Robinson2014:447). The further a value is from the origin (the point where the x-axis and y-axis intersect), the more discriminating it is. Inversely, the closer a value is to the origin, the less discriminating it is, but only in the context of the chosen variables. This means that a variable or a value might still contribute to the studied variation, but not in the visualised dimensions.
To analyse the relationship between one variable’s value and another variable’s value, one should look at the angle connecting the two values via the origin: the smaller the angle, the stronger the positive association probably is. If the angle is 90 degrees, the values are most likely not associated at all, and if the angle is 180 degrees, the values are probably negatively associated with each other.
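The angle-based reading described above can be made concrete with a short Python sketch that measures the angle, at the origin, between two plotted values. The function name and the coordinates used in any call are hypothetical and are not taken from Figure 6.

```python
import math


def angle_via_origin(point_a, point_b):
    """Angle in degrees between two 2-D points, measured at the origin."""
    ax, ay = point_a
    bx, by = point_b
    if (ax == 0 and ay == 0) or (bx == 0 and by == 0):
        raise ValueError("a point at the origin has no direction")
    dot = ax * bx + ay * by    # positive when the points lie on the same side
    cross = ax * by - ay * bx  # zero when the points are collinear with the origin
    return math.degrees(math.atan2(abs(cross), dot))
```

An angle near 0 would then be read as a probable positive association, an angle near 90 as no association, and an angle near 180 as a probable negative association, subject to the check against raw data.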
It is important to note here that the MCA does not show whether the associations between the variable values are significant or relevant at all since the primary purpose of this technique is to just produce a simplified representation of the data. Therefore one must check all conclusions made with the MCA using raw data (Greenacre Reference Greenacre1984:10; Hill & Lewicki Reference Hill and Lewicki2006:134; Glynn Reference Glynn, Glynn and Robinson2014:444).
We can see in Figure 6 that the first, vertical dimension appears to be a continuum from most animate to least animate and describes 73.9% of the variance in the data. This dimension also mostly seems to follow the argument-marking schema, where prototypical nominative subjects are most likely to be animate (referring to a person), while objects and partitive subjects tend to refer to inanimates (concrete or abstract entities, events, or even non-referential use (see Metslang Reference Metslang, Luraghi and Huumo2014:202)). It is not as obvious what the second, horizontal dimension represents. However, it only describes another 14.4% of the variance, so the vertical dimension is plainly much more important in describing the use of keegi. Combined, the first two dimensions describe 88.3% of the variance. This means that only 11.7% of the variance of these studied variables is left unexplained by this MCA.
In addition to objects, inanimacy is also linked to partitive subjects and polarity items, to the partitive case in general and to clause-final position. These variable values are, however, not only associated with inanimacy, but many of them are also associated with each other. All objects and partitive subjects and a majority of polarity items are in the partitive case. Polarity items, in turn, typically occur at the end of the clause. Although negative polarity is situated a bit farther from the centre of this group, all three of the aforementioned functions are typically in the scope of negation and are associated with each other through that characteristic as well.
In fact, in some types of sentences, it can be somewhat difficult to make the distinction between objects, partitive subjects, and polarity items. See for example (30): in this clause, häda ‘problem’ could be interpreted as a partitive subject, making kedagi ‘someone:prt’ a polarity item. However, if we consider häda olema ‘to be wrong (with something)’ to be a lexicalised verb construction, kedagi instead becomes the partitive subject of the clause.
Therefore, as it is sometimes difficult even to distinguish these three functions from each other, it is not at all unusual that they also function in a similar fashion in this variation, and the differences between them are more vague for the pronoun keegi. All in all, it is a cluster of values that truly function as a group, and none of them can be disregarded in analysing the use and variation of keegi.
Another group of associated values is the subject, clause-initial position, and the nominative case. These values are not as strongly linked to animates as the previously discussed values were to inanimates, as they are farther from each other on the plot and the angle connecting them to the origin is wider. Nevertheless, it is safe to say that this is another important cluster in describing the variation of keegi. While negative polarity is, in the given context of variables, not as discriminating in describing this variation, affirmative polarity is in fact very closely related to animate entities. Similarly, while none of the dialects are very strongly associated with inanimacy, the southern dialects and the Coastal dialect are clearly more connected to referring to animate entities.
A separate group is formed with the adverbial and genitive attribute functions and with genitive, ablative, allative, comitative, and adessive cases. Based on their position on the graph, it seems that both functions tend to be associated with animate entities. This is confirmed by the raw data, as there are only eight adverbials and no genitive attributes that refer to inanimate beings. As the name suggests, genitive attributes are all in the genitive case, while the rest of the mentioned cases are typically associated with adverbials, which explains why exactly these values are presented together on the graph.
All in all, this MCA nicely illustrates the results obtained in the previous parts of the analysis: there are several significant variables in this study that all affect the use of keegi, and they do this in association with each other.
6. Conclusions and discussion
In this paper we examined the use of the indefinite pronoun keegi ‘someone, nobody, anybody’ in Estonian dialects. We described functions and positions in which keegi can be used in these dialects and analysed the phenomenon of using the otherwise animate keegi to refer to inanimate entities as well, a variation that is characteristic only of dialects and not of Standard Estonian.
Based on the data in the Corpus of Estonian Dialects, the pronoun keegi is used in the following functions: as a nominative and a partitive subject, an object, an adverbial, a genitive and a postnominal attribute, a determiner, a negative polarity item and a generalising alternative. Almost half of all the uses are subjects, but objects, adverbials, and negative polarity items are also very frequent.
The results show that keegi is most often used to refer to inanimate entities in the Western, Mid, and Eastern dialects, where over half of keegi pronouns refer to inanimates. At the same time, in the Võru and Seto dialects it does not seem to be at all possible to use keegi to refer to inanimate beings. Similar results were obtained in the study of the pronoun kes ‘who’ (Pook Reference Pook2019), where it was possible to refer to inanimate entities with kes in the northern dialects, but this variation was rare or non-existent in the Võru and Seto dialects. The Insular dialect’s two biggest islands were also divided similarly in both studies – both keegi and kes can be used to refer to inanimates on Hiiumaa, but rarely or never on Saaremaa. The similar distribution of inanimate uses of kes and keegi shows us that such developments are probably not coincidental: in this area the animate–inanimate distinction has for some reason started to fade.
Nevertheless, we cannot draw a direct line between the use of kes and the use of keegi in this similar variation. While the region where the speakers do not distinguish between animate and inanimate clearly overlaps for kes and keegi, the same cannot be said about their morpho-syntactic use. As our results show, the indefinite pronoun is most often used to refer to inanimate entities when keegi is an object, a partitive subject, or a negative polarity item, when it is in the partitive case, and positioned at the end of a negative clause. When keegi refers to an animate being, it is most likely a nominative subject at the beginning of an affirmative clause. In terms of kes, negation does not have a strong influence on this variation, and – contrary to keegi – the percentage of inanimate kes pronouns is three times higher in affirmative clauses than in negative clauses. In addition, instead of the partitive marking of keegi being the one most likely to refer to inanimates, it is the elative and comitative forms of kes that show the most prevalent lack of distinction in animacy.
However, it was shown in Pook (Reference Pook2019) that the lack of distinction in animacy for the pronoun kes was most prevalent when kes was used as a relative pronoun, as opposed to an interrogative pronoun; that is, the more grammaticalised functions also showed the least selectivity in terms of animacy. A similar connection can be made for keegi, as the most grammaticalised function of a negative polarity item also increased its non-selectivity. So there are certainly parallels between the use of kes and keegi, but it is obviously not only the interrogative component kes in keegi that causes this variation.
Our results show how tightly indefinite pronouns and partitive case marking are interrelated in the scope of negation, as well as how the animate–inanimate distinction has become irrelevant in this specific context. From this we may also infer that we are dealing with a case of grammaticalisation: the loss of semantic distinctions or semantic bleaching more widely can be an early stage in the grammaticalisation process (see e.g. Heine & Kuteva Reference Heine and Kuteva2002:2, Reference Heine and Kuteva2006:60–61). The same has happened in the process of grammaticalisation of interrogatives into relative pronouns in many European languages (Heine & Kuteva Reference Heine and Kuteva2006:209), including in Estonian (Pook Reference Pook2019). The loss of a semantic distinction for indefinites, which is in the current case an extension of the inanimate uses of keegi, can be seen as an analogous grammaticalisation process, which can potentially result in developing into a negation word or a polarity item. We have seen that already happen with the word mitte ‘not’, which has grammaticalised from an interrogative/indefinite pronoun (in the partitive case) to a polarity item and/or negation word in Estonian (Mägiste Reference Mägiste2000:1545). Thus the inanimate use of keegi in Estonian dialects seems to be following the same path of grammaticalisation, and does not seem to affect the animacy distinction very much in syntactic positions that are outside the scope of negation and for which differentiating between animate and inanimate referents is still relevant to understanding the content of the clause, such as for nominative (canonical) subjects or attributes.
Acknowledgements
This research has been supported by the Centre of Excellence in Estonian Studies (European Regional Development Fund). We want to thank the anonymous reviewers of the Nordic Journal of Linguistics for their valuable comments and suggestions.