Acquisition of Morphological Variation: An Elicitation Experiment on Children’s Production of Parallel Forms in Croatian and Estonian

Published online by Cambridge University Press:  21 February 2025

Virve-Anneli Vihman*
Affiliation:
Institute of Estonian and General Linguistics, University of Tartu, Tartu, Estonia
Gordana Hržica
Affiliation:
Department of Speech and Language Pathology, University of Zagreb, Zagreb, Croatia
Mari Aigro
Affiliation:
Institute of Estonian and General Linguistics, University of Tartu, Tartu, Estonia
Sara Košutar
Affiliation:
Department of Language and Culture, UiT The Arctic University of Norway, Tromsø, Norway
Tomislava Bošnjak Botica
Affiliation:
Department of General Linguistics, Institute for the Croatian Language, Zagreb, Croatia
* Corresponding author: Virve-Anneli Vihman; Email: [email protected]

Abstract

Children’s acquisition of variation in the target language depends on a number of factors not yet well understood. This study probes the acquisition of morphological variation in two unrelated languages, Croatian (Slavic) and Estonian (Finnic), focussing on parallel forms of a lexeme expressing a single grammatical category (a phenomenon known as morphological overabundance). We conducted a cross-linguistic elicitation experiment with 140 monolingual, typically developing children aged 3;0 to 6;11 (80 learning Croatian, 60 Estonian). We elicited genitive plural forms in Croatian and partitive plural in Estonian, with lexemes which either are invariant or allow more than one form. Children in both languages were less accurate with lexemes with parallel forms, indicating that the morphological variation hindered acquisition. Pattern type frequency was found to affect accuracy in both languages. Children’s choice between two parallel forms was unaffected by age, but significant language-specific differences emerged.

Type
Article
Creative Commons
This is an Open Access article, distributed under the terms of the Creative Commons Attribution licence (http://creativecommons.org/licenses/by/4.0), which permits unrestricted re-use, distribution and reproduction, provided the original article is properly cited.
Copyright
© The Author(s), 2025. Published by Cambridge University Press

1. Introduction

Variation has been acknowledged as a factor in language development for decades, but only recently has it become a focus of systematic research (see, e.g., Shin & Miller, 2022). The pervasiveness of variation in language has become increasingly apparent through variationist corpus linguistics (e.g., Szmrecsanyi, 2017), boosted by the development of more diverse corpora and more sophisticated analytical approaches. Children encounter linguistic input which is full of variable expressions of meaning, such as synonyms or variable order, and probabilistic information regarding how frequently each of two alternatives is used. Variation is both a help and a hindrance: consistent, transparent cues are known to facilitate the acquisition of word forms and argument structure (Chan et al., 2009; Slobin, 1985), yet children need a degree of variation to motivate generalisation across a grammatical category (e.g., Perfors et al., 2010).

Variation is also inherent to children’s production and results from various aspects of the developmental process. Variability in early language production may reflect patterns of variation in the input, but variation in children’s production which is not present in the input can result from vague or underspecified representations, memory limitations, insufficient planning and articulatory skills, or advances in knowledge which are not implemented across the board, leading to, e.g., the parallel use of accurate (e.g. past tense ran) and overgeneralised (*runned) forms (Maslen et al., 2004). Much research has shown that children acquire some variability early (see overviews in, e.g., Smith & Durham, 2019; Johnson & White, 2020, p. 6; Shin & Miller, 2022), but the timing depends on both the salience and frequency of the target structure (Smith & Durham, 2019).

Variation has been a focus of recent research in heritage languages as well as more generally in the first language acquisition of sociolinguistically and structurally conditioned variation, including lexical (Smith & Durham, 2019), phonological (Miller, 2013) and morphosyntactic (Shin, 2016) variation. Yet we still know little about how children acquire the variable patterns in their input (such as optional subject omission or the use of synonyms). Shin and Miller (2022) propose a trajectory for the acquisition of a category expressed with two variants, beginning with (a) the production of only one of the two forms, followed by (b) the use of both forms but in mutually exclusive contexts, giving way to (c) the use of both forms in a small number of more frequent, overlapping contexts, before eventually expanding to (d) the use of two variants in a similar probabilistic distribution as found in the input.

Selecting only one of two variants, to begin with, makes use of initially limited resources when no semantic distinction is apparent. The well-known Mutual Exclusivity Assumption (Au & Markman, 1987; Markman et al., 2003) is posited to underlie children’s word learning, showing effects up through age five. This was extended to a bias against grammatical synonymy by Clark (1987) in the Principle of Contrast (also dubbed the Doctrine of Form-Function symmetry by Poplack, 2018), in order to explain why children assume differing functions for differing syntactic structures. These theoretical principles may go some way to explain what leads children to posit conditions governing variation. If they assume that each form ought to be mapped to a single meaning, then they may seek conditions for variation, and impose conditions when they are absent in the input. However, Shin and Miller (2022) note that the research to date has not focussed on how children recognise two forms as being interchangeable, nor how different types of input variation have differing effects on acquisition.

This paper addresses the second of these questions through a cross-linguistic experimental study of children’s use of parallel morphological forms. Children acquiring languages with complex morphological systems have been shown to have a good grasp of the more frequently encountered aspects of the system by age three (see Dressler, 2012; Smoczynska, 1985; Xanthos et al., 2011; in Croatian, e.g., Kovačević et al., 2009; in Estonian, e.g., Argus, 2009), but they need more time to fully acquire the nuances of the system and less frequent forms and categories (e.g., in Croatian, Hržica et al., 2024; in Estonian, Argus, 2009, Argus & Bauer, 2020; in Estonian, Finnish and Polish, Granlund et al., 2019), which are still being acquired at age five (e.g., Abbot-Smith et al., 2008).

The studies noted above investigate the acquisition of categorical (invariable) patterns, while the acquisition of patterns involving variation in morphologically rich languages is largely unexplored. Yet languages with complex morphology are known to exhibit plentiful deviations from canonical ‘form-function symmetry’. The inflectional systems in these languages exhibit syncretism (form homonymy across different cells in the paradigm, like her in English, used for both genitive and accusative forms of the third singular feminine pronoun, compared to his and him), defectiveness (the lack of a form for an expected cell, e.g. the lack of a genitive inanimate relative pronoun to match whose in English or the lack of a first-person singular form for the Russian verb pobedit ‘to be victorious’) and parallel forms for a single cell, also known as ‘overabundance’ (e.g. dived and dove in English, both acceptable past tense forms of dive). The last of these (parallel forms, or overabundance) is the focus of the present study.

We do not know how or when variation in the realisation of inflectional categories is acquired by children learning languages like these. Since they are sensitive at an early age to morphological structure (Xanthos et al., 2011), they may be expected to also acquire morphological variation early, reflecting similar distribution frequencies as encountered in their input (Ambridge et al., 2015). On the other hand, in the absence of any functional motivation for parallel forms used to express a single category, they may well follow Shin and Miller’s (2022) proposed pathway, or make use of some of the strategies along the pathway, before attaining adult-like use of the variation.

We have little knowledge about how children navigate variable systems, and much of the research on the acquisition of variation is based on longitudinal corpus data (see Shin & Miller, 2022), which may underestimate children’s knowledge and ability to produce variation, given appropriate contexts (Requena, 2023). In this study, we aim to provide triangulation in three ways: (a) we focus on a type of variation in the input which, to our knowledge, has not been investigated in first language acquisition before, namely morphological overabundance unconditioned by sociolinguistic factors, semantic content, or syntactic context, (b) we conduct an experiment, which enables us to target a particular construction with a controlled context, including a broader sample of children and more tokens than corpus data typically allows, and (c) we include two languages, Croatian and Estonian, giving insight into the acquisition of overabundance with systematic and distributional differences. We have chosen two inflectionally rich languages from unrelated language families which both exhibit a considerable number of parallel forms in language usage but with differences between the distribution of these forms and the linguistic features of lexemes allowing the parallel forms (e.g. Croatian verbs in Bošnjak Botica & Hržica, 2016, and Estonian nouns, Aigro & Vihman, 2023). Within nouns, overabundance is distributed differently depending on gender class in Croatian, whereas its distribution in Estonian cuts across multiple inflectional classes in many morphological categories; yet the use of parallel forms has not been reported to be conditioned by sociolinguistic factors in either language. The differences in distribution and linguistic factors may be expected to also affect rates of acquisition.

In the next section, we introduce the notion of morphological overabundance and describe the nominal inflection systems of Croatian and Estonian, followed by our research questions and the preregistered hypotheses guiding our study. The methodology section includes the factors affecting variation which we include in our study as predictors, followed by results. We discuss the implications of the results for both research on the acquisition of variation and psycholinguistic treatments of overabundance in the Discussion.

2. Background

2.1. Parallel forms, or morphological overabundance

In morphological theory, overabundance is the situation in which a morphological category (such as past tense or dative case) can be expressed by more than one form for a single lexical item (Thornton, 2011), such as the verb dive in English, acceptable in the past tense as either dove or dived, and often listed in dictionaries as both. Thornton (2019) shows that overabundance may occur as a rare, lexically conditioned phenomenon (meaning that it occurs on a word-by-word basis, as in English, restricted to very few lexemes) or, at the other extreme, as a widespread phenomenon affecting many lexemes or many morphological paradigm cells throughout the system. Parallel forms may occur with differing proportions (i.e. the degree to which one variant is preferred varies across lexemes), and may be conditioned or occur in free variation (Thornton, 2019). Aigro and Vihman (2023) emphasise the need to distinguish between potential (available, theoretical) and realised overabundance, as evidenced in usage. Investigations of the phenomenon in particular languages have shown a wide range of cross-linguistic variation in the usage of overabundant paradigms (Bermel & Knittl, 2012; Bošnjak Botica & Hržica, 2016; Fehringer, 2004; Aigro & Vihman, 2023). The frequency ratios of parallel forms vary greatly between lexemes (Bermel & Knittl, 2012), but balanced 50–50 usage is attested and not necessarily a rare, temporary situation (Aigro & Vihman, 2024).

Theoretical approaches have cast doubt on the existence or persistence of unstructured variation, in which parallel forms are used for a particular expression with no apparent distinction emerging in semantic or pragmatic interpretation, suggesting that languages might be expected to tend toward reducing this form of deviation over time (Thornton, 2011). However, overabundant usage is widely attested and found to be stable (e.g., Bošnjak Botica & Hržica, 2016; Guzmán Naranjo & Bonami, 2021; Aigro & Vihman, 2023). Cappellaro (2013) links diachronic preservation to low lexical frequency, while Bermel and Knittl (2012) link realised overabundance to higher frequencies. Aigro and Vihman (2024) found that overall cell frequency (the occurrence of the grammatical category in any form) is a factor in the usage ratios of parallel forms, with lower frequencies facilitating more even usage of both variants. Speakers of some languages have been found to use the two variants in complementary distribution, conditioned by syntactic factors (Bermel & Knittl, 2012) or morphophonological properties (Aigro & Vihman, 2024; Guzmán Naranjo & Bonami, 2021); completely free variation between forms is rarer (Dąbrowska, 2008).

However, even conditioned variation may not be governed by factors to which young children are attuned: use of nouns as arguments vs adjuncts (Bermel & Knittl, 2012) or behaviour according to morphological paradigms (Aigro & Vihman, 2024) may not reflect categories that young children are sensitive to. The use of parallel forms in similar contexts might be compared to children’s parallel production of target-like and overgeneralised forms: two options for expressing a single grammatical category are used interchangeably. The unstructured distribution of form variants in the input and the unclear role played by lexical or cell frequency in overabundant usage in the target language raise a host of questions regarding children’s acquisition of the system. This study narrows the questions to the distribution found in the input in Croatian and Estonian. The following section describes the usage of parallel forms found in the nominal inflection of the two languages.

2.2. Linguistic background: Croatian and Estonian

This section gives brief overviews of noun inflection in the languages included in this study, Croatian (Slavic) and Estonian (Finnic). Both languages include rich systems of inflectional morphology, with multiple cases and declension classes. Nominal inflection classes in Croatian are characterised by a three-way system of grammatical gender made more complex through accentual diversity in inflectional paradigms and morphologically driven phonological changes. Estonian has 26 morphophonologically conditioned inflection classes and no grammatical gender (Viks, 1992). Both languages exhibit overabundant paradigms for both nouns and verbs.

The present study investigates Croatian genitive plural and Estonian partitive plural. These two grammatical categories, represented in the morphological paradigms by a single ‘cell’ in a table, are frequent in usage, even in language used with and by young children, and they are characterised by parallel forms across a wide selection of lexemes. In addition, both are used to mark plural nouns co-occurring with frequent quantifiers, as well as having other functions (the Croatian genitive indicating possession and the Estonian partitive being the default object case). These features make the two paradigm cells comparable in terms of acquisition and enable us to use them in a single experimental design across the two languages for comparability and generalisability.

Croatian

The nominal system of Croatian includes three genders, two numbers (singular and plural) and seven cases (nominative, genitive, dative, accusative, vocative, locative, instrumental). The singular paradigm is morphologically differentiated, while the plural paradigm displays syncretism in three cases. Reference grammars (e.g., Barić et al., 2005) categorise nouns into three distinct inflection classes: masculine and neuter nouns inflect according to a single paradigm, while feminine nouns are divided into two classes.

Most of the forms expressing grammatical categories are predictable from the citation form (nominative singular). However, the presence of morphophonological and accentual variation within a paradigm opens up the possibility for a grammatical category to be expressed with multiple acceptable forms for a particular lexeme. According to the most recent description of overabundance in Croatian (Bošnjak Botica, 2024), there are close to 30 categories of potentially overabundant forms in the Croatian nominal system. In certain paradigm cells, these parallel forms are found in both singular and plural, where phonological conditions render morphological changes optional, hence facilitating the simultaneous availability of both forms. Some of these cells are overabundant for a large, productive group of thousands of lexemes, while others have only a narrow range of lexemes exhibiting overabundance.

Recent research (Bošnjak Botica & Hržica, 2016; Bošnjak Botica et al., 2023) has revealed a discrepancy between the potential overabundance listed in reference grammars, and realised overabundance, as evidenced in language corpora. One of the most prevalent cells with realised overabundance is the genitive plural of feminine nouns ending with -a with a stem-final consonant cluster. Unlike most overabundant cells, which can have two parallel forms, this group can be realised with up to three forms. Most feminine nouns in Croatian (around 80%) belong to the group of nouns ending with -a. Over 85% of these have a relatively simple genitive plural, which differs from the nominative singular only in phonological length (the word-final -a is lengthened), as shown in the first two rows of Table 1.

Table 1. Genitive plural formatives in Croatian

* A noun was considered to belong to a particular pattern if it was attested in this pattern in at least one of the two corpora.

However, a smaller number of feminine nouns with a stem-final consonant cluster (from now on, CC nouns, e.g. čizma, torba) may exhibit parallel forms (Bošnjak Botica et al., 2023). The frequencies reported here derive from the Croatian language corpus (Ćavar & Brozović Rončević, 2012) and Croatian Web corpus (subcorpus Forum.hr, Ljubešić & Klubička, 2014). In these corpora, there are more than 200 CC nouns with a frequency of over 20. Such nouns can allow up to three competing GEN.PL forms: (a) a lengthened word-final /a/, (b) an additional /a/ inserted in the stem, breaking the CC (a-a), and (c) an alternative ending -i, replacing the /a/ (last three rows of Table 1). Although the i-option is less preferred by language authorities (Babić et al., 2012; Birtić et al., 2012), it seems to be spreading in usage (Kapović, 2018; Lečić, 2016). Some feminine CC nouns appear in all three forms in at least one of the corpora (e.g., čizma ‘boot’), others in only two forms (e.g., torba ‘bag’). There are certain phonological restrictions that block the usage of the (b) form (torba ‘bag’ in Table 1). However, some nouns that do not display any phonological restrictions also appear in only two forms; different combinations are attested (see Table 1). Out of all CC nouns that display overabundance, most are attested with two endings. The combination of (a) and (c) forms and the combination of (b) and (c) forms have similar distributions (39% and 38% of lexemes with GEN.PL overabundance sharing the pattern). All three forms appear in 23% of overabundant nouns. It also has to be noted that not all CC nouns are attested as overabundant, as 12% only appear in the (c) form.
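The three competing GEN.PL options (a) to (c) can be sketched as simple string operations on the citation form. This is an illustrative sketch only, not the authors' procedure: the function name is hypothetical, it works on plain orthography (where the lengthened final -a of option (a) is not marked), it treats every letter as one segment (so digraphs such as lj or nj are not handled), and it deliberately does not model the phonological restrictions that block form (b) for some nouns.

```python
VOWELS = set("aeiou")


def genitive_plural_candidates(nom_sg):
    """Return orthographic GEN.PL candidates for a feminine CC noun in -a.

    Hypothetical helper for illustration: it only generates candidate
    strings and does not check which of them are attested or permitted.
    """
    if not nom_sg.endswith("a"):
        raise ValueError("expected a noun ending in -a")
    stem = nom_sg[:-1]
    if len(stem) < 2 or stem[-1] in VOWELS or stem[-2] in VOWELS:
        raise ValueError("expected a stem-final consonant cluster")
    return {
        # (a) final -a lengthened; vowel length is not written in standard orthography
        "a": stem + "a",
        # (b) an extra a inserted between the two stem-final consonants
        "a-a": stem[:-1] + "a" + stem[-1] + "a",
        # (c) ending -i replacing the final -a
        "i": stem + "i",
    }


print(genitive_plural_candidates("čizma"))
```

For čizma ‘boot’, which the text reports as attested in all three forms, the sketch yields čizma, čizama and čizmi. For a noun like torba it would still generate a (b) candidate, which is exactly the kind of form the phonological restrictions mentioned above rule out.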

The most frequent GEN.PL formative among all Croatian nouns is the -a pattern, as it is present in the two most common declensions (one used for masculine and neuter gender and the other primarily for feminine nouns). It is the only option for the large majority of masculine and neuter nouns. Regarding feminine nouns, two additional formatives appear in certain numbers, but the frequency of occurrence of the three genitive plural formatives is heavily biased towards the lengthened word-final -a. Feminine nouns can belong to two classes, one more frequent than the other (80% vs. 20%). In the more frequent class, most of the nouns (85%) do not display overabundance and form GEN.PL almost exclusively by lengthening the -a. For the rest of the nouns in that class (CC nouns), the -a formative is one of the overabundant variants. Table 1 shows that many CC nouns (over 60%) appear in patterns that accommodate the -a formative. Altogether, counting feminine nouns from both classes, overabundant or not, -a appears in 72% of nouns. Other GEN.PL formatives are less frequent in Croatian feminine nouns. The formative -i is present in the other class of feminine nouns (comprising 15% of feminine nouns) and in CC nouns. Altogether, -i is the only formative or one of the formatives for 35% of feminine nouns. The formative -a-a is only present in some CC nouns, as evident in Table 1. Altogether, it is one of the formatives of 14% of feminine nouns.

Table 2. Partitive plural formatives in Estonian

Estonian

Estonian nominal morphology includes two numbers and fourteen cases, but no gender distinctions. Three cases, nominative, genitive, and partitive, can all serve as stems for the remaining eleven cases, which are for the most part regular and predictable. The system includes stem change, syncretism, and overabundance, and the form of a lexeme in a particular morphological paradigm cell is not always predictable from any one ‘diagnostic’ form alone (see Blevins, 2008; Kaalep, 2012; Viht & Habicht, 2019). However, it is predictable from the nominative, genitive, and partitive forms. Current approaches describe 26 nominal inflection classes (Raadik, 2013; Viht & Habicht, 2019; Viks, 1992).

The existence of these alternate patterns across classes gives rise to the possibility of multiple acceptable forms for a particular lexeme. Aigro and Vihman (2023) give an overview of realised overabundance in Estonian nouns (as attested in the corpus), contrasted with potential overabundance listed in reference grammars (Erelt et al., 2018; Viht & Habicht, 2019). They find overabundant forms in usage in the Balanced Corpus of Estonian for most cells in the plural paradigm and a number of singular cells. While the greatest proportion of lexemes is attested with overabundant forms in the elative plural (8.4%), the partitive plural cell, because of its more frequent usage overall, is by far the most prevalent in terms of absolute numbers of overabundant lexemes: 366 lexemes were found to be in use with parallel partitive plural forms, amounting to 4.1% of the noun dataset. Usage of partitive plural parallel forms is much more balanced between ‘cellmates’ (the parallel forms) than most semantic cases, which are heavily biassed toward the long, affixal forms (see also Aigro & Vihman, 2024).

Partitive plural formation is known to be a challenge to acquire for L2 learners (e.g. Metslang, 2009), with five possible formatives, not all phonologically predictable, as shown in Table 2, some of which have to be lexically learned (Blevins, 2008; Kaalep, 2010). Table 2 shows examples of non-overabundant nouns: two lexemes with affixal partitive plural formations, one a fused form expressing partitive and plural (kiisu-sid), the other agglutinative (raamatu-i-d), both attached to a GEN.SG stem, and three vowel partitive plural forms (banaane, õunu, lehti), where the ending is considered to be part of the stem itself, and is lexically specified as -i, -e, or -u.

The partitive plural also has the most varied overabundant patterns, distributed across ten distinct formative pairs. The alternation patterns exhibited by most lexemes involve the long-form -sid and one of the three vowel endings (Aigro & Vihman, 2023, 2024). Table 3 exemplifies the three most frequent patterns, all alternating between the -sid affix and a vowel-final plural. The vowel-final plural is shorter, and it may involve the addition of a vowel, compared to the nominative singular stem (as in kell-i ‘clock-PAR.PL’, in Table 3), or stem-internal change, as in kive ‘stone.PAR.PL’ and maju ‘house.PAR.PL’, both of which exhibit a vowel change compared to the nominative singular stem. In the analysis, we call this feature (presence of stem change) morphological complexity.

Table 3. Distribution of partitive plural endings in the noun dataset based on the Balanced Corpus of Estonian (Aigro & Vihman, 2023)

In terms of lexical form proportions, shorter (vowel-final) forms are preferred significantly more often and frequently dominate over the -sid affix for individual lexemes. This preference pattern has been linked to lexeme frequency (Kaalep, 2009, 2018; see also Aigro & Vihman, 2024), in that the more frequent the lexeme or that particular lexeme cell, the more likely it is to be used with the vowel plural. This is easily explained, as the vowel plural requires lexically specific knowledge (regarding which vowel to select); hence, higher frequency means increased exposure and easier access during processing to the lexically specific information about vowel selection, in turn leading to the dominance of the vowel form with more frequent lexemes. For less frequent lexemes, the -sid variant is found to dominate over the vowel variant. Overall, usage proportions are not heavily skewed toward either variant for the partitive plural, although the dominance of the -sid form in a particular lexeme is rare (Aigro & Vihman, 2023).

2.3. Factors affecting morphological acquisition

Various distributional factors are expected to guide the acquisition of morphology, based on the importance of properties of the input in the acquisition of morphologically complex languages (e.g., Ambridge et al., 2015; Granlund et al., 2019; Xanthos et al., 2011). We expect children to glean probabilistic information about the acceptability of parallel forms from frequency information, and usage ratios for particular lexemes. At an early stage of Shin and Miller’s (2022) proposed pathway to acquiring variation, we would also expect children to initially reduce the variation and systematically select one form type, especially in the absence of any evidence for a functional need for both forms. This sort of selective usage might occur globally across a number of lexemes, or in lexically specific ways, depending on frequency and other factors.

Children have been amply shown to pick up on and imitate probabilistic information regarding the frequency of form usage. Frequency effects are pervasive in language usage (Divjak, 2019) and first language acquisition (Ambridge et al., 2015; Hickmann et al., 2018). In acquisition studies, the focus has been on token frequency (the number of times a particular word form appears in the input) and type frequency (the number of distinct lexemes occurring in a particular pattern or form in the input). The latter is also referred to as class size and is measured similarly to phonological neighbourhood density (PND) or morphological neighbourhood density, which examines lexemes showing analogous patterns of morphological formation (see Granlund et al., 2019, for a comparison of two approaches to measuring neighbourhood density). The frequency of a certain form has been shown to predict the occurrence of overgeneralised inflected forms in that particular pattern (e.g., Aguado-Orea & Pine, 2015; Dąbrowska & Szczerbiński, 2006; Hržica et al., 2024; Maslen et al., 2004; Räsänen et al., 2016).
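The distinction between token frequency and type frequency can be made concrete with a small counting sketch. The data below are invented for illustration (they reuse the English examples dived/dove and ran from earlier sections), and the pattern labels are hypothetical annotations, not categories used in the study.

```python
from collections import Counter

# Hypothetical toy input: one (lexeme, word form, pattern) tuple per token heard.
tokens = [
    ("dive", "dived", "-ed"),
    ("dive", "dove", "ablaut"),
    ("dive", "dived", "-ed"),
    ("walk", "walked", "-ed"),
    ("run", "ran", "ablaut"),
]

# Token frequency: how many times each particular word form occurs.
token_freq = Counter(form for _, form, _ in tokens)

# Type frequency (class size): how many distinct lexemes occur in each pattern.
lexemes_by_pattern = {}
for lexeme, _, pattern in tokens:
    lexemes_by_pattern.setdefault(pattern, set()).add(lexeme)
type_freq = {pattern: len(lexemes) for pattern, lexemes in lexemes_by_pattern.items()}

print(token_freq["dived"])  # 2 tokens of the form "dived"
print(type_freq["-ed"])     # 2 distinct lexemes (dive, walk) use the -ed pattern
```

Note that an overabundant lexeme such as dive contributes to the type frequency of both patterns, while each of its forms has its own token frequency.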

Numerous studies have found type frequency (class size, or PND) effects on morphological acquisition in morphologically complex languages (e.g. in Polish, Dąbrowska & Szczerbiński, 2006; Finnish, Kirjavainen et al., 2012; Estonian, Argus, 2009; Lithuanian, Savičiūtė et al., 2018; Croatian, Hržica et al., 2023, Hržica et al., 2024, and a comparison of Finnish and Polish, Engelmann et al., 2019), but the evidence for the interacting effects of token and type frequency is mixed. Räsänen et al. (2016) found that the effect of PND (class) decreased with increasing surface form frequency, but the effect disappeared when age was added to the model. Granlund et al. (2019), in a study of the acquisition of case forms across three languages (Estonian, Finnish and Polish), found significant positive effects of form token frequency for Polish and Estonian, and of PND in all three languages. Separate models for the languages found the predicted interaction only in Polish, with PND having a greater effect in forms with lower token form frequency (less available for retrieval); a Bayesian analysis revealed some evidence for a negative interaction in both Polish and Estonian. An exploratory model including the three languages found an interaction in Polish and Finnish (but none in Estonian) indicating that, contrary to predictions, PND became less important with age.

For parallel forms, the question arises of what sort of frequency matters (Ambridge et al., 2015). The acquisition of a variant may be predicted by the input token frequency (the frequency of one parallel form) or its overall cell frequency (the sum of token frequencies for each parallel form in a cell). In addition, the acquisition of a cell involving variation in the input may be affected by the relative frequency (Ambridge et al., 2015) of the target cell compared with other cells. In the case of overabundant paradigms, we expect children to be sensitive to the relative probability of each parallel form in the input, a measure we have called lexical form proportion. The proportional usage of a morphological formative overall among nouns or verbs may also play a role; in this study, we refer to this as corpus form proportion (percentage of occurrence of a formative among all forms expressing a given cell).

Finally, language-specific factors related to morphological complexity have also been shown to affect acquisition alongside these more general, frequency- and pattern-based measures. As demonstrated by Granlund et al. (2019) in exploratory analyses, the presence of stem change affected accuracy in both Estonian and Finnish, and targeted cases with more possible formatives were also more difficult for children in all three languages (Estonian, Finnish and Polish) than those with a single form. In the present study, we include morphological complexity, or the effect of stem change in Estonian, and system complexity, in which we examine the effect of the number of available parallel forms in overabundant cells in Croatian.

The operationalisation of these factors in our study is presented at the end of the Methodology section.

3. Research questions and hypotheses

In the present study, we aimed to test, first, whether overabundance facilitates acquisition (providing more than one potentially accurate target form) or hinders it (as variability may inhibit the formation of a clear representation of target forms, or even abstraction of the grammatical category). In addition, we were interested in whether children systematically conform to one form type over the other. We formulated the following research questions, along with preregistered hypotheses, available on OSF (https://osf.io/n7ku4). For descriptions of the variables used, please see “Variables”, in the Methodology section.

RQ1: What affects the accuracy of children’s responses in both overabundant (OA) and non-overabundant (non-OA) items?

H1. The effect of OA: Participants will perform more accurately on OA items than non-OA items across both languages. This is based on the likelihood of an accurate response being higher with two accepted outcomes than with only one. Forms which would be considered overgeneralisations if produced with non-OA items are considered accurate if produced with OA items. Hence our prediction is that the difference in OA condition would favor OA items.

H2. The effect of age: The accuracy of forms will improve with age across both types of items (both OA and non-OA).

H3. The effect of the input: For both types of items, children’s responses will be predicted by cell frequency and pattern frequency in the input, and an interaction between them, such that for higher frequency forms, the effect of pattern frequency will be smaller and for lower cell frequency lexemes, the effect of pattern frequency will be greater.

RQ2: What affects the choice of form in OA items?

H4. The effect of age: For OA items, the likelihood of producing the more frequent input form (vowel-final forms in Estonian; forms with -a in Croatian) will decrease with age, as the child acquires a fuller set of resources and is expected to rely less on global form frequency.

H5. The effect of input: Children’s responses will be predicted by the cell frequency and pattern frequency (forms with higher frequency being more likely in responses), and an interaction between them (such that pattern frequency will be less significant with higher frequency forms). As for the proportional usage of a form, this was differently defined in each language because of the locus of variation: responses are expected to be facilitated by lexical form proportion in Croatian, and corpus form proportion in Estonian (see Variables). In addition, we expect the effect of pattern frequency to increase with age, as the child acquires a better grasp of the overall system. This hypothesis was added after preregistration.

H6. Morphological complexity: For Estonian, we define complexity as the presence or absence of stem change. Form choice is expected to be affected by complexity, such that forms with no stem change will be preferred. For Croatian, complexity is defined as the number of available parallel forms. Form choice will be affected by the number of forms used by adults for a particular noun (2 or 3).

4. Method

4.1. Ethics and preregistration

This study received ethics approval from the Research Ethics Committee of the University of Tartu (protocol nr. 356-T23) and the Ethics Committee of the Faculty of Education and Rehabilitation Sciences of the University of Zagreb (nr. 251-74/21-01/2). Written consent was obtained from the preschools in which testing took place. Signed informed consent forms were received from each child’s parent or guardian prior to investigators testing the children. Children were asked for verbal assent on the day of testing and given the choice of whether to participate in the study or not.

This study’s design and hypotheses were preregistered on the website of the Open Science Framework before any testing began. The project site (https://osf.io/g4u5y) hosts additional information on the experimental stimuli, the raw data, and analysis scripts. The preregistration form can be found at: https://osf.io/n7ku4.

4.2. Participants

Participants included in the analysis comprise a total of 140 children aged 3;0 to 6;11 in two languages: 80 Croatian speakers (mean age = 5;4, 41 girls) and 60 Estonian speakers (mean age = 5;8, 32 girls). Six participants were excluded due to bilingualism (5 Estonian, 1 Croatian). Five participants were excluded from the Croatian data for only having produced responses in the nominative case.

All the children were reported to be typically developing, native speakers of their respective languages. Our aim was to include children old enough to participate in the elicitation study and young enough to be in the process of acquiring certain aspects of complex morphology. This age range aligns with that of similar, earlier studies (e.g. Granlund et al., 2019; Räsänen et al., 2016). The wide age range was chosen to enable the observation of potential developmental effects. Children were recruited from kindergartens in Zagreb (Croatia) and Tartu (Estonia) where the respective languages are dominant societal languages.

4.3. Procedure

Two production tasks were designed to elicit the target noun forms, genitive plural in Croatian and partitive plural in Estonian. The experimental protocol was detailed and consistent across both languages, aiming for maximal comparability. Each child was tested individually, in a quiet room at pre-school. The experimenter managed the experiment through MS PowerPoint on a laptop, recording responses both on paper and with an audio recorder. Four lists, ordered pseudo-randomly, were generated for each language.

Children were presented with an image of a rabbit, accompanied by a recorded female voice saying, “This is a bunny/rabbit.” Four practice trials were administered and repeated if needed, to ensure the child’s comprehension of the task. Subsequently, the participants were exposed to the experimental items, including 60 test items and 60 fillers.

For the test items in both languages, participants saw an image of the target noun, such as a single apple, accompanied by an auditory stimulus, “This is an apple,” with the nominative singular form of the target noun. The following slide displayed a picture of the rabbit observing a collection of multiple apples (as illustrated in Fig. 1), coupled with an audio prompt: “Bunny/Rabbit sees many…” (Croatian: Zec vidi puno…?; Estonian: Jänku näeb palju…?). The Croatian quantifier puno ‘many’ requires nouns in the genitive plural form; the Estonian quantifier palju ‘many’ requires nouns in the partitive plural.

Figure 1. Example of the image shown with prompt ‘Bunny/Rabbit sees lots of…?’ (Croatian: Zec vidi puno…?; Estonian: Jänku näeb palju…?)

The experimenter moved to the next item once the child had completed the sentence. They praised the children and encouraged them when they were confused, but did not repeat the audio or offer answers. The experiment lasted approximately 15 minutes. Children were offered two breaks during the task and were rewarded with stickers during and after the experiment, regardless of their responses.

Prompts for fillers differed from the test items. In Croatian, fillers elicited the instrumental singular, with the phrase To je zec s… ‘This is the rabbit with…’ In Estonian, fillers prompted allative singular forms, the picture depicting a bunny waving at the object, with the audio: Jänku lehvitab… (kellele?) ‘The bunny is waving to… (who?)’

4.4. Stimuli

Estonian stimuli included 120 items: 60 experimental items and 60 fillers. The 60 test items included both overabundant and non-overabundant nouns, based on whether they were attested in parallel forms in language corpora. Accuracy analysis includes all 60 experimental items, while form choice analysis only focuses on the overabundant items.

For Croatian, 30 feminine CC nouns were selected as overabundant based on reference books (Birtić et al., 2012; Jojić et al., 2015). Corpus analysis showed that 5 of these were attested in only one form. Therefore, the Croatian accuracy analysis includes 35 non-overabundant items and 25 overabundant items. The Estonian experiment includes 30 overabundant and 30 non-overabundant stimuli. In the form choice analysis, however, 13 Croatian overabundant nouns are included, as they involve the option of the a-form in the genitive plural, varying between -a and -i, or between -a, -i and -a-a (nouns varying between -i and -a-a patterns were excluded from the form choice analysis). The selection criteria for the items were kept as similar as possible across the two languages.

As corpora of child language and child-directed speech (CDS) in these languages are too sparse to produce robust frequency information for two parallel forms of even frequent overabundant nouns (see Behrens, 2006; Lieven et al., 2009; Parisse, 2019), we made use of both CDS and adult language corpora to obtain frequency data, as has been done in previous studies (e.g., Räsänen, Ambridge & Pine, 2016, p. 1714; Engelmann et al., 2019; Kirjavainen et al., 2012).

The Croatian nouns originate from the CHILDES Croatian Kovacevic Corpus (Kovačević, 2002) to ensure that children would be likely to have encountered them in their input by this age. This corpus includes language samples of spontaneous interactions between three children and caregivers, collected during early language acquisition up to about 3 years of age. All Croatian overabundant nouns were feminine CC nouns. We extracted feminine CC nouns occurring in some form in both child speech and CDS. All selected nouns had high imageability ratings in the Croatian psycholinguistic database, with a minimum index of 3.7 out of 5 (mean 4.7; Peti-Stantić et al., 2021). All target nouns were confirmed to have realised overabundance in the larger written corpora (Croatian language corpus, Ćavar & Brozović Rončević, 2012; Croatian Web Corpus, Ljubešić & Klubička, 2014). While many of the nouns have three theoretically available parallel forms, some (N = 10) were attested in two forms in the adult corpus, while others were attested in all three forms (N = 20). The number of realised forms was a language-specific variable in the form choice analysis (‘Number of OA forms’; see Variables). Non-overabundant nouns did not contain consonant clusters and therefore had no conditions for overabundance in the genitive plural.

The Estonian stimuli are based on the dataset of nouns found to have realised overabundance in partitive plural in the adult written corpus (Aigro & Vihman, 2023). A total of 30 nouns were selected from the list of 366 overabundant nouns (available at https://osf.io/ehvsj) and checked for age-appropriateness against the CDS in three Estonian child language corpora available on CHILDES (Argus, 1998; Vija, 2004; Zupping, 2016; see Footnote 1). Together, the CDS corpora consist of language samples of caregiver speech in spontaneous interactions with three children (between the ages of 1;3 and 4;2), amounting to approximately 171,000 CDS tokens. The CDS corpora attest to the use of 26/30 overabundant nouns and 29/30 non-overabundant nouns. These were supplemented by nouns showing overabundance in the Balanced Corpus of Estonian. There are no imageability scores for Estonian nouns; their imageability was decided by the authors. Non-overabundant forms were confirmed not to occur with multiple forms in adult usage, based on the same Balanced Corpus of Estonian. For all Estonian overabundant nouns, the partitive plural varies between the -sid affixal form and a vowel-final form, the latter of which is represented by eight -u, eight -i and fourteen -e nouns (see Table 3). Twenty-two of the nouns used in the two languages overlap in meaning, such that we were able to use the same image in Estonian and Croatian (e.g. virsik and breskva, both meaning ‘peach’).

The Croatian fillers included 60 masculine nouns, 30 of which may have two parallel forms in the instrumental singular (e.g., gusar.NOM ‘pirate’ ~ gusarom/gusarem.INSTR), while 30 nouns appear in only one form (e.g., slon.NOM ‘elephant’ ~ slonom.INSTR). The Estonian target form used with fillers, the allative singular, does not exhibit overabundance.

Corpus frequencies for the selected target items and the parallel forms of OA items are given in Appendix A.

4.5. Variables

We describe here the operationalisation of the factors discussed in the Background and list the formal measures used in this study: first, the variables coded for all items (both overabundant, OA, and non-overabundant, non-OA), and then the variables coded only for OA items. The adult corpus measures refer to the Balanced Corpus of Estonian (https://www.cl.ut.ee/korpused/grammatikakorpus/) or a combination of the Croatian language corpus (Ćavar & Brozović Rončević, 2012) and the Croatian Web Corpus (Ljubešić & Klubička, 2014).

Variables coded for all items (OA and non-OA):

Accuracy: Binary (0,1). This was treated as a binary category, despite the variable nature of children’s productions. In order to operationalise target-likeness, we counted as accurate any form that is standardly accepted in the adult language (i.e. two or three target forms for OA items, and a single form for non-OA items).
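The accuracy coding described above can be sketched as a simple set-membership check. This is only an illustration and not the authors' coding script: the function name `code_accuracy` and the non-target response `küünlasid` are hypothetical, while the accepted forms are those cited in this section.

```python
def code_accuracy(response, accepted_forms):
    """Return 1 if the response matches any form accepted in adult usage, else 0."""
    normalised = response.strip().lower()
    return int(normalised in {form.lower() for form in accepted_forms})

# OA item koer 'dog': either partitive plural form counts as accurate.
oa_targets = {"koeri", "koerasid"}
# Non-OA item küünal 'candle': only one form is accepted.
non_oa_targets = {"küünlaid"}

print(code_accuracy("koerasid", oa_targets))       # accepted variant
print(code_accuracy("koeri", oa_targets))          # accepted variant
print(code_accuracy("küünlasid", non_oa_targets))  # hypothetical overgeneralisation
```

Under this scheme, an affixal form that would be an overgeneralisation for a non-OA item is coded as accurate for an OA item, which is the reasoning behind H1.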

Item type: Binary (OA, non-OA).

Participant age: A continuous variable measured in months.
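Ages in this paper are reported in the years;months notation (e.g. 3;0 to 6;11), whereas the age predictor is measured in months. A minimal sketch of the conversion, with the hypothetical helper name `age_to_months`:

```python
def age_to_months(age):
    """Convert an age written as 'years;months' (e.g. '5;4') to months."""
    years, months = age.split(";")
    return int(years) * 12 + int(months)

# The sampled age range and the two reported mean ages.
for label in ("3;0", "6;11", "5;4", "5;8"):
    print(label, age_to_months(label))
```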

Cell frequency (‘Word form token frequency’ in the preregistration): The frequency per million words in an adult corpus of the cell in question. For non-OA items, this refers to simple token frequency. For OA items, it is the sum of frequencies of all accepted OA forms, used for expressing the single target grammatical category for a particular lexeme. For instance, in Estonian, the token counts of the two different partitive plural forms of koer ‘dog’ (koer-i and koera-sid) make up the cell frequency of ‘dog.PAR.PL’.
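As an illustration of this definition, the following sketch sums the per-million frequencies of all accepted forms in a cell. The token counts and corpus size are invented for the example and are not taken from the Balanced Corpus of Estonian; the function names are likewise hypothetical.

```python
def per_million(count, corpus_size):
    """Normalise a raw token count to a frequency per million words."""
    return count * 1_000_000 / corpus_size

def cell_frequency(form_counts, corpus_size):
    """Sum the per-million frequencies of every accepted form in one cell."""
    return sum(per_million(count, corpus_size) for count in form_counts.values())

CORPUS_SIZE = 10_000_000  # hypothetical corpus size in tokens

# Hypothetical token counts for the two partitive plural forms of koer 'dog'.
koer_par_pl = {"koeri": 300, "koerasid": 100}
# A non-OA cell has a single form, so cell frequency equals its token frequency.
kuunal_par_pl = {"küünlaid": 50}

print(cell_frequency(koer_par_pl, CORPUS_SIZE))
print(cell_frequency(kuunal_par_pl, CORPUS_SIZE))
```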

Pattern frequency (‘class size’ in preregistration): The number of distinct lexemes (type count) with which each morphological pattern is found in the adult corpus. A pattern is defined as the morphology with which a cell function is expressed, based on the forms in which it is attested in the corpus. For non-OA items, the pattern is a single formative, e.g. the Estonian noun küünal ‘candle’ has the partitive plural formative -id (küünla-id ‘candle-PAR.PL’). For OA items, pattern frequency is the sum of type frequencies for all available formatives. For instance, Estonian koer ‘dog’ has partitive plural forms using -sid and -i (koera-sid ~ koeri). Hence, its pattern frequency is the sum of -i and -sid type frequencies in the corpus (the number of nouns occurring with those two formatives).
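The type-count logic can be sketched as follows, assuming the corpus has already been reduced to (lexeme, formative) attestations. The toy attestation list is illustrative only, and the helper names are hypothetical; real counts would come from the adult corpus.

```python
from collections import defaultdict

def type_counts(attestations):
    """Count distinct lexemes per formative; repeated tokens count once."""
    lexemes_by_formative = defaultdict(set)
    for lexeme, formative in attestations:
        lexemes_by_formative[formative].add(lexeme)
    return {f: len(lexemes) for f, lexemes in lexemes_by_formative.items()}

def pattern_frequency(formatives, counts):
    """Sum type frequencies over all formatives available for an item."""
    return sum(counts[f] for f in formatives)

# Toy attestations: the second ("koer", "-i") is a repeated token of one type.
toy_corpus = [
    ("koer", "-sid"), ("koer", "-i"), ("koer", "-i"),
    ("luik", "-i"), ("küünal", "-id"),
]
counts = type_counts(toy_corpus)

print(pattern_frequency(["-sid", "-i"], counts))  # OA item: both formatives summed
print(pattern_frequency(["-id"], counts))         # non-OA item: a single formative
```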

Variables coded only for overabundant (OA) items:

The analysis of form selection only includes those OA items for which an accurate form was produced.

Form choice: To investigate which accurate form was produced, the dependent variable was operationalised as a binary variable (0,1) depending on whether the child produced the more (1) or less (0) frequent OA form type. For Croatian, the more frequent form was defined as the -a ending as it is prevalent in adult usage. For Estonian, it was the vowel-final form, which is more frequent overall in adult usage (Aigro & Vihman, 2023).
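A minimal sketch of this binary coding, assuming responses have already been coded as accurate and are written orthographically. The function names and the vowel inventory used for the check are assumptions made for the example.

```python
ESTONIAN_VOWELS = set("aeiouõäöü")

def form_choice_estonian(response):
    """1 for the vowel-final variant, 0 for the affixal -sid variant."""
    return int(response[-1] in ESTONIAN_VOWELS)

def form_choice_croatian(response):
    """1 for an a-final genitive plural, 0 for the -i variant."""
    return int(response.endswith("a"))

print(form_choice_estonian("koeri"), form_choice_estonian("koerasid"))
print(form_choice_croatian("čizama"), form_choice_croatian("čizmi"))
```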

Lexical form proportion: The proportion of use (percentage of tokens) of each formative out of the total count of cell frequency, e.g. the proportion with which the form čizama ‘boot.GEN.PL’ occurs out of all the genitive plural forms of čizma ‘boot’ (čizma ~ čizama ~ čizmi). This predictor was only used in the Croatian OA model, as nearly all Estonian OA items have vowel-final forms as the more frequent formative: the variable does not vary enough in Estonian to be used in the model.
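To make the measure concrete, the sketch below divides each form's token count by the total for the cell. The counts for čizma ‘boot’ are invented for illustration, and the result is expressed as a proportion between 0 and 1 rather than a percentage.

```python
def lexical_form_proportions(form_counts):
    """Share of each parallel form among all tokens of one lexeme's cell."""
    total = sum(form_counts.values())
    return {form: count / total for form, count in form_counts.items()}

# Hypothetical genitive plural token counts for čizma 'boot'.
cizma_gen_pl = {"čizama": 60, "čizmi": 30, "čizma": 10}

proportions = lexical_form_proportions(cizma_gen_pl)
print(proportions)
```

Corpus form proportion follows the same arithmetic, but the denominator is the total number of tokens expressing the cell across all lexemes rather than within one lexeme.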

Corpus form proportion: The percentage of the partitive plural formative given in the response, out of the overall pool of partitive plural forms. Unlike pattern frequency above, which is based on type frequency, this variable reflects pattern token frequency (number of forms found with a particular pattern in the adult corpus, expressed as a proportion of total PAR.PL forms). This variable is only used in the Estonian OA model.

Number of OA forms: Binary (2,3), language-specific predictor used only in the Croatian analysis to describe whether the target item has two or three forms attested in the adult corpus.

Complexity: Binary (1,0), based on the presence or absence of stem change in the partitive plural inflection of target nouns. With complex items, the vowel-final OA form does not contain the nominative form (e.g., kivi ‘stone.NOM’, kive ‘stone.PAR.PL’). With non-complex items, the vowel-final form contains the nominative form: luik ‘swan.NOM’, luiki ‘swan.PAR.PL’.

5. Results

Our results are presented in two sets of two models, one for each language. In the Accuracy section, we present results for RQ1 or the accuracy of all responses. In Form Choice we investigate RQ2, addressing the factors affecting the choice of response for overabundant items.

Analyses are all logistic mixed-effects models, built using the lme4 package in R (Bates et al., 2015), with participants as a random intercept. Explanatory variables were selected using forward stepwise selection, starting with an intercept-only model and adding variables only when they improved the model (as determined by ANOVA tests).
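The selection logic can be illustrated with a small sketch. It assumes that the ANOVA comparison between nested models is a likelihood-ratio test and that each candidate predictor adds exactly one parameter; the log-likelihood values are invented, the function names are hypothetical, and the code is not a port of the authors' R scripts.

```python
import math

def lrt_p_value(loglik_base, loglik_extended):
    """Likelihood-ratio test for nested models differing by one parameter."""
    statistic = 2.0 * (loglik_extended - loglik_base)
    # With 1 degree of freedom, the chi-square tail equals erfc(sqrt(x / 2)).
    p_value = math.erfc(math.sqrt(max(statistic, 0.0) / 2.0))
    return statistic, p_value

def forward_step(loglik_current, candidate_logliks, alpha=0.05):
    """Return the candidate that most improves the model, or None if none does."""
    best_name, best_p = None, alpha
    for name, loglik in candidate_logliks.items():
        _, p_value = lrt_p_value(loglik_current, loglik)
        if p_value < best_p:
            best_name, best_p = name, p_value
    return best_name

# Hypothetical log-likelihoods of models that each add one predictor.
candidates = {"age": -997.0, "cell_frequency": -999.5}
chosen = forward_step(-1000.0, candidates)
print(chosen)
```

Repeating `forward_step` with the chosen predictor added to the baseline, until it returns `None`, gives the stepwise procedure in outline.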

5.1. Accuracy

Both Estonian and Croatian models include all test items (n(Est) = 3,589, n(Cro) = 4,794). Both models test the effect of item type (OA vs non-OA), age, cell frequency, pattern frequency and interaction between the last two on the overall accuracy of children’s responses.

The final model for Estonian (AUC = 0.80, 1.0 < predictors’ VIF values < 1.3, showing low collinearity) is the following:

$$ \begin{array}{l} \mathtt{glmer(responseaccuracy \sim itemtype + age + log1p(lexicalcellfrequency\_pmw)} \\ \quad \mathtt{+\ log(patterntypefrequency) + (1 \mid participantcode),\ data = dat\_est\_all,} \\ \quad \mathtt{family = binomial(link = "logit"),\ control = glmerControl(optimizer = "bobyqa"))} \end{array} $$

The final model for Croatian (AUC = 0.91, 1.1 < predictors’ VIF values < 1.35) is the following:

$$ \begin{array}{l} \mathtt{glmer(responseaccuracy \sim itemtype + log(patterntypefrequency)} \\ \quad \mathtt{+\ log(lexicalcellfrequency\_pmw) : log(patterntypefrequency)} \\ \quad \mathtt{+\ (1 \mid participantcode),\ data = dat\_cro\_all,\ family = binomial())} \end{array} $$

The output for both models is in Appendix B (Tables B1 and B2).

The effect of item condition

Our first hypothesis predicted that participants would perform more accurately on overabundant (OA) than non-OA items. Children learning Estonian gave more target-like responses overall than the children learning Croatian (Est: 75.4%, Cro: 66.1%). However, responses to OA items were significantly less accurate than non-OA items in both languages (see Fig. 2; Estonian: β = -0.99, p < .001; Croatian: β = -13.73, p < .001):

Figure 2. Accuracy of responses by language and item condition. OA = overabundant (more than one target response), non-OA = non-overabundant (items with one target response).

The difference in accuracy by item type is greater in Croatian (Fig. 2B) than in Estonian (Fig. 2A). Of the non-target-like forms produced for Croatian OA items, 84% constitute a-final forms for lexemes unattested in this form in adult usage. Inaccurate responses in Estonian were less uniform and generally involved the production of a non-target-like stem vowel (e.g., *kokke pro kokki ‘chef.PAR.PL’).

In summary, our hypothesis was not confirmed. Instead, the converse was shown to hold: accuracy was lower with OA items than non-OA items.
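Because the models are logistic, the reported coefficients are on the log-odds scale. The sketch below converts the Estonian item-type coefficient (β = -0.99) to an odds ratio and shows its effect on a predicted probability. The baseline probability of 0.8 is an arbitrary value chosen for illustration, not an estimate from the model, and the function names are hypothetical.

```python
import math

def odds_ratio(beta):
    """Exponentiate a log-odds coefficient to obtain an odds ratio."""
    return math.exp(beta)

def shift_probability(baseline_probability, beta):
    """Apply a log-odds shift to a baseline probability."""
    log_odds = math.log(baseline_probability / (1.0 - baseline_probability))
    return 1.0 / (1.0 + math.exp(-(log_odds + beta)))

BETA_OA_ESTONIAN = -0.99  # item-type coefficient reported for Estonian

print(round(odds_ratio(BETA_OA_ESTONIAN), 3))
# Hypothetical baseline: a non-OA item answered accurately 80% of the time.
print(round(shift_probability(0.8, BETA_OA_ESTONIAN), 3))
```

An odds ratio below 1 means that the odds of an accurate response are lower for OA items than for non-OA items, holding the other predictors constant.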

The effect of age

Our second hypothesis predicted that accuracy would increase with age. This was borne out in Estonian (β = 0.03, p = .017, see Fig. 3), but not in Croatian, where age did not affect accuracy (see Fig. 4). Age was significant in Estonian, with a smaller effect than either item type (OA, non-OA) or the input properties discussed below.

Figure 3. Accuracy by age (Estonian).

Figure 4. Accuracy by age (Croatian).

Examining the accuracy of responses given by individual children, we can see two very different patterns of responses. As shown in Fig. 3, the Estonian-speaking children gave varying proportions of accurate responses, with very few children performing at the ceiling or floor, and their overall individual proportion of accurate responses increased with age.

Croatian-speaking children show a very different pattern, with two children on the floor, but 60 children (75%) giving nearly identical proportions of accurate responses (42–44 out of 60), suggesting that their non-target-like responses are conditioned by the target items. Indeed, most children gave inaccurate responses to the same words, with a nearly categorical binary distribution: 17 nouns elicited almost universally inaccurate responses and 43 nearly universally accurate responses. In short, the distribution shows that in Croatian, most children, regardless of age, were heavily biased toward a-final responses, regardless of whether the target lexeme had an a-final genitive plural in the input or not. This is expanded on in the Discussion section.

The effect of input properties

Our third hypothesis predicted an effect of input frequency. We expected children to be more accurate with items used more frequently in the target cell, giving children more exposure to the target forms. Again, we find a significant effect in Estonian (β = 0.29, p < .001), but not in Croatian.

We also predicted greater accuracy with items belonging to more productive patterns for the target category (greater type frequency, or patterns occurring with a wider range of lexemes).

As shown in Fig. 5, pattern frequency had a significant positive effect in both Estonian (β = 1.17, p < .001, Fig. 5A) and Croatian (β = 1.81, p < .001, Fig. 5B). This is exemplified in Estonian with the partitive plural formation pattern using the stem vowel -e, which occurs with over seven times as many lexemes (n = 10,567) as the stem vowel -u (n = 1,443) in an adult corpus. Children’s responses demonstrate sensitivity to this distinction in type frequency: among non-OA items, 88% of responses to lexemes with -e partitive plural forms were accurate, compared to 19% of target items with -u forms. Similarly, the Croatian -a genitive plural form occurs with 40 times as many lexemes as the -i genitive plural (n = 9,643 and 238, respectively). Accuracy was 90% with items using the former strategy, and 4% with the latter.

Figure 5. Accuracy by pattern frequency (significant in both languages).

Finally, the predicted interaction between pattern frequency and cell frequency was found to hold in Croatian, but not in Estonian. For higher-frequency forms, pattern frequency affected accuracy significantly less, and low-frequency forms showed a stronger type frequency effect (Croatian: β = -0.04, p < .001).
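The direction of this interaction can be made explicit by computing the slope of log pattern frequency at different cell frequencies, using the Croatian coefficients reported above (β = 1.81 for pattern frequency, β = -0.04 for the interaction). The cell frequency values of 1 and 100 per million are hypothetical, and the sketch assumes the interaction term enters the model as the product of the two log-transformed predictors.

```python
import math

BETA_PATTERN = 1.81       # reported effect of log pattern frequency (Croatian)
BETA_INTERACTION = -0.04  # reported cell-by-pattern interaction (Croatian)

def pattern_slope(cell_frequency_pmw):
    """Slope of log pattern frequency at a given cell frequency (per million)."""
    return BETA_PATTERN + BETA_INTERACTION * math.log(cell_frequency_pmw)

# Hypothetical low- and high-frequency cells.
for cell_frequency in (1, 100):
    print(cell_frequency, round(pattern_slope(cell_frequency), 3))
```

The slope stays positive across this range but is smaller for the higher-frequency cell, which is what a negative interaction coefficient expresses.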

5.2. Form choice

Two models (one for each language) describe the factors affecting the choice of the more frequent form for overabundant items. These models only include accurate responses to overabundant items, i.e. responses using either or any of the accepted formatives (n(Est) = 1,308, n(Cro) = 947). Both models test the effects of age, cell frequency, and pattern frequency. The Croatian model also includes the language-specific predictors lexical form proportion (proportion of the produced formative in corpus data per lexeme) and number of OA forms. The Estonian model includes the effect of corpus form proportion (proportion of the produced formative in the corpus data overall) and complexity. All predictors are described in the Methodology section.

The Croatian model investigates factors increasing the likelihood of children producing responses with the more frequent a-final form; the Estonian model investigates factors increasing the likelihood of the vowel-final form.

The final model for Croatian (AUC = 0.99, VIF = 6.5) is the following:

$$ \begin{array}{l} \mathtt{glmer(producedmorpheme \sim log(lexicalformantproportion)} \\ \quad \mathtt{+\ log(lexicalcellfrequency\_pmw) : log(patternfrequency)} \\ \quad \mathtt{+\ (1 \mid participantcode),\ data = dat\_cro\_oa,\ family = binomial)} \end{array} $$

The final model for Estonian (AUC = 0.91, 1.1 < predictors’ VIF values < 1.3) is the following. The output of both models is given in Appendix C (Tables C1 and C2).

$$ \begin{array}{l} \mathtt{glmer(producedfreqmorph \sim stemchange} \\ \quad \mathtt{+\ log(lexicalcellfrequency\_pmw) + log(patterntypefrequency)} \\ \quad \mathtt{+\ (1 \mid participantcode),\ data = dat\_est\_oa,\ family = binomial)} \end{array} $$

Although Croatian children’s responses to OA items included a higher proportion of non-target-like forms (as discussed above), the accurate responses had a much higher proportion of the more frequent form compared to Estonian children (94.5% vs 60.1%). Neither language showed an effect of age on form choice in OA responses overall.

As shown in Fig. 6, Estonian children vary greatly in the degree to which they produce the same formative across OA pairs. Most children (70%) produce the vowel-final form in more than half of their responses, but only 10% do so exclusively. The proportion of vowel-final responses does not depend on age, contrary to our hypothesis (H4), which predicted that older children would be less likely to produce the more frequent form in OA items than younger children.

Figure 6. Individual children’s form choice for OA items in Estonian (shown as the proportion of more frequent, vowel-final responses), by age.

The picture looks strikingly different in the Croatian responses, explaining the difference noted above in accuracy. As illustrated in Fig. 7, 70% of children exclusively produce the more frequent -a formative. The few children (6%) who conform to the alternative variant (as shown by the data points at 0 in Fig. 7) all produced the -i formative in accurate responses, both for nouns using the a~i pattern with two parallel forms, and for nouns using the a~-a-a~i pattern with three parallel forms.

Figure 7. Individual children’s form choice for OA items in Croatian (shown as the proportion of more frequent, a-final responses), by age.

The effect of input properties

Our fifth hypothesis predicted that children would be more likely to select the more frequent variant when cell frequency was high. This effect was observed only in Estonian (β = 0.31, p < .001). Using log values of frequencies, we found that the more frequent vowel-final forms are more likely to be produced for items with higher cell frequencies. Cell frequency had no significant effect on form choice in Croatian.

We also predicted a pattern frequency effect in both languages. This variable only plays a role in Estonian, where it had a significant positive effect on the production of vowel-final variants (β = 2.0, p < .001). Children were more likely to produce more frequent OA variants for patterns occurring with a wider range of nouns in the target language. In Croatian, we found a significant negative interaction between cell and pattern frequencies (β = -0.48, p = .002).

We found a negative effect of lexical form proportion in Croatian, meaning children are more likely to produce the more frequent -a form when these forms make up a smaller proportion of all forms used for marking that cell for that lexeme (β = -9.78, p = .001). See Discussion.
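As a reminder of what the lexical form proportion measures, the following minimal sketch computes, for a single lexeme and paradigm cell, the share of corpus tokens realised with a given formative. The counts are invented for illustration and the function name is ours; only the definition (one form's tokens over all tokens attested for that cell) is taken from the text.

```python
def form_proportion(counts, form):
    """Share of a lexeme's tokens in one cell that use `form`.

    `counts` maps each attested formative to its token count (hypothetical data).
    """
    total = sum(counts.values())
    if total == 0:
        raise ValueError("no attested forms for this cell")
    return counts.get(form, 0) / total


# Invented counts for a noun with two parallel genitive plural forms.
two_way = {"a": 30, "i": 90}
# Invented counts for a noun with three parallel forms.
three_way = {"a": 10, "a-a": 5, "i": 5}

a_share_two_way = form_proportion(two_way, "a")
a_share_three_way = form_proportion(three_way, "a")
```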

In Estonian, the corpus form proportion was too divided to emerge as significant in the model (W = 0, p < .001). We did find that the more prominent (productive) the vowel form was as a partitive strategy in the corpus, the more likely children were to produce it instead of the affixal form. To illustrate, -e makes up 22% of partitive plural forms across the corpus, and when presented with a lexeme with -e ~ -sid alternatives, most children (74%) produced the vowel variant. The -u formative, however, is much less frequent in the corpus (6%), and children are also much less likely to produce -u-forms, opting for the -sid formative in 70% of instances, even though the latter is less frequent in adult speech than the -u variant.

Regarding the language-specific hypotheses, Estonian responses were strongly affected by complexity, or whether the target form involves a stem change (β = -3.55, p < .001). Items for which the vowel-based variant involves a change to the nominative stem (e.g. muna ‘egg.NOM.SG’, mune ‘egg.PAR.PL’) elicited far fewer vowel-final responses. For these items, children were more likely to make use of the affixal variants (82% of responses involving a stem-changing variant, N = 212), even though these are less frequent both in the input overall and in their respective OA pairs. For items without stem change in the vowel-final variant (e.g. käpp ‘paw.NOM.SG’, käppi ‘paw.PAR.PL’), participants preferred vowel variants, producing forms with affixal -sid in only 29% of responses. This pattern applies consistently for all five stem-changing nouns in our stimuli (vowel variants produced in 8–25% of responses). Similarly, children consistently produced vowel variants more frequently for most nouns without stem change (19 out of 25 nouns).

In Croatian, the language-specific variable concerned the number of available forms. We found no effect of the number of forms on children’s production of the a-form. This response was produced with similar likelihood in both a~a-a~i and a~i patterns. Moreover, children also produced more a-final forms for lexemes which did not have these as target forms.

6. Discussion

Parallel forms of the sort investigated in this study provide a source of variation for children to acquire which is not obviously conditioned by phonological, semantic or sociolinguistic context. Our study set out to investigate how children acquire parallel forms: whether parallel forms in the input accelerate or impede acquisition compared to lexemes with a single target form (RQ1), and what affects children’s choice between the forms available for overabundant lexemes (RQ2). Our experiment tested 3–6-year-olds acquiring Croatian or Estonian as their native language, two systems with plentiful parallel forms, which nevertheless differ in details regarding how the parallel forms are distributed in the system and what factors affect their availability and realisation. Let us review the results, based on our preregistered hypotheses, before discussing issues and implications.

6.1. Summary of results

First, overabundance was found to negatively affect accuracy, contrary to our hypothesis: in both languages, children gave significantly fewer accurate responses to items with parallel forms (OA items) than to those without parallel forms (non-OA items), even though OA items, with more than one acceptable form, offer a greater chance of an accurate response. We also found higher overall accuracy in Estonian, because of a greater difference in accuracy by item type in Croatian. Main effects of lexical cell frequency and age (the latter weaker) were found only in Estonian. The other input factor considered in the accuracy analysis, pattern type frequency, had a significant effect on responses in both languages. The predicted interaction – that pattern frequency would have a greater effect where cell frequency was lower – held only in Croatian.

The second set of analyses only included accurate responses given for OA items. For Croatian, a main effect was found for the lexical form proportion predictor, meaning the proportion of use of one target parallel form over the other(s). A negative interaction was found between the other input properties, cell and pattern frequency, meaning pattern frequency had a greater effect on children producing the a-form when lexical cell frequency was low. No effect was found for the number of available forms.

For Estonian, main effects were found for two input properties – lexical cell frequency and pattern frequency – with no effect of the usage ratios and no interactions. A significant main effect was also found for morphological complexity, meaning stem change in the target form. This result confirms what has been reported in earlier studies of morphological acquisition in Estonian (Hallap et al., 2014; Granlund et al., 2019), namely, that forms involving stem change are more difficult to acquire. Since the stem change variable affected only vowel-final forms in our stimuli, the negative effect of complexity on the selection of vowel-final forms is readily apparent. Although vowel-final forms are more frequent in the input, and preferred by children for most lexemes, the stem-changing forms were selected much less frequently. Only five target items included stem change, yet all of these show a reversal of children’s tendency to select the vowel partitive plural form.

6.2. The categorical responses of Croatian children

The Croatian results require some further explanation. As suggested by Fig. 4 and confirmed in Fig. 7, children’s responses were nearly categorical and showed little variation. The overabundant items were all attested in more than one form in corpora and were selected according to corpus frequencies. However, because of the unavailability of CDS corpora of sufficient size, we used corpora of adult written Croatian, which may not be reflective of spoken language or input to children, particularly as the system seems to be in flux.

Yet this does not explain why the Croatian children, when producing accurate responses to overabundant items, nearly universally chose the a-final form. How can we explain this? First of all, although the i-form is the most frequent form in 12 of the target items, the a-final formative is more frequent overall in the genitive plural: both in feminine nouns and across the other genders. Secondly, many of the overabundant target items are compatible with the a-formative, hence lending support to a strategy of preferring the a-forms. Finally, the a-forms produced by the children were sometimes syncretic with the nominative singular (the most frequent, unmarked form, which was provided for each noun before the prompt). Although the genitive plural form is described as having a long /a/ compared with the nominative singular, this is not true for all regional variants of Croatian, and in children’s productions, it is not always easy to tell the difference between a long and short /a/. Hence, there are some systemic reasons for the children to prefer a-forms, and there is also a possible artefact of the experimental situation, where the children may have been more likely to produce the a-form because of its (near-)syncretism with the form given by the experimenter. Note, however, that none of the participants included in the analysis gave only nominative forms as responses. All the participants also inflected nouns in the instrumental case for the fillers.

Moreover, although the non-overabundant items were never (or only marginally) attested in multiple forms in the corpus, some have the same consonant cluster stem as our overabundant target items, meaning that it is compatible with the system to assume they might have overabundant forms. Inaccurate responses were consistently given for 17 nouns. Among these were five non-overabundant nouns, four of which (japanka ‘flip-flop sandal’, kanta ‘bucket’, ljuljačka ‘swing’, maska ‘mask’) use the i-final form, and one (trešnja ‘cherry’) has an a-a genitive plural. All of these were treated by children as a-compatible forms even though they were not labelled as overabundant according to our procedure. The remaining consistently inaccurate items were overabundant items which were classified as having only a-a and i-forms in the input. Yet these were also given a-forms by the children.

Hence, the Croatian model output can be explained almost entirely by the overproduction of a-forms. The lack of both age and lexical cell frequency effects falls out of this. Even the great item type effect on accuracy in Croatian derives from the target nouns including only five non-overabundant nouns incompatible with a-forms (with high inaccuracy), while the overabundant nouns included 12 nouns incompatible with a-forms. Therefore, this turns out to be an inadvertent effect of our target lexical item choices, rather than of item type. The negative interaction between input token and type frequency is determined by this pattern of responses, as is the pattern type frequency effect: the patterns which do not include -a are of low type frequency, and these exhibit a high proportion of inaccurate responses.

Although no previous research has specifically investigated the use of parallel forms in Croatian child language, earlier studies show that plural forms emerge later (Kovačević et al., 2009) and that children make more errors with morphologically complex nouns (Cvikić & Kuvač, 2005). According to Kovačević et al. (2009), even school-age children experience difficulties in producing morphologically complex plural forms. Cvikić & Kuvač (2005) further demonstrate that children employ various strategies to manage the morphological complexity of genitive singular and genitive plural forms: simplification of declension, omission of morphophonological rules, and overgeneralization. The insertion of /a/ into the stem to break a consonant cluster is more complex than the other two concatenative strategies for forming the genitive plural. Furthermore, over 85% of feminine nouns ending in -a have a relatively simple genitive plural, which differs from the nominative singular only in terms of phonological length (lengthening the word-final -a; Bošnjak Botica et al., 2023). Our findings align with studies showing the effects of form frequency (e.g., Hržica et al., 2024 for verbs) and morphological complexity on the production of case forms when multiple options are available.

6.3. Frequency measures

The Croatian data discussed above underscore the issue of which frequencies matter, discussed in the literature on frequency effects in language usage, processing (Gries & Divjak, 2012) and language acquisition. Ambridge et al. note that frequency of all kinds affects language use on every level: “There are effects of ABSOLUTE FREQUENCY (e.g. high-frequency words will be learned earlier than low-frequency words) and RELATIVE FREQUENCY (e.g. of two competing forms, the most frequent will be dominant)” (2015, p. 241, our italics). Two competing forms always evoke the possibility of including a measure of relative frequency, but it is not always clear what the relevant domain is for measuring it.

In the case of overabundance, because of the particularities of the data, we included the usage ratio predictor per lexical item in Croatian, but per formative pattern in Estonian. This choice may have an effect on what is significant in the model. Importantly, we examined the frequency of patterns according to the full set of available formatives in the corpus: if a noun was overabundant, then it was grouped for pattern frequency with other nouns showing the same variation. However, for nouns which children usually (or perhaps always) encounter in one of two or three possible variants, there are at least two classes they could be grouped with. Although we grouped them with a morphological class including nouns showing a particular alternation pattern (as shown to be informative in Aigro & Vihman, 2024), it is more likely that children would initially group them with other nouns inflecting with the highly frequent formative. The Croatian a-form dominance demonstrates this: many of the nouns classified as overabundant seem not to be overabundant in the children’s usage. Hence, the high frequency of genitive plurals formed with -a overall in the corpus may be a more relevant measure than the smaller group of feminine CC nouns which exhibit overabundance.

6.4. Cross-linguistic comparison

Although the discussion above shows that the Croatian results derive from an overwhelming preference for one form, the cross-linguistic comparison is enlightening. For the Estonian items, both lexical cell frequency and pattern frequency were significant. This result, as well as the overall pattern of high variation in responses (both per child and per lexical item), indicates that the measures we used are relevant to the Estonian input and how it guides the acquisition process.

The difference in results between the two languages may stem from (a) structural reasons (properties of the linguistic systems being acquired), (b) differences in the input, as well as (c) specifics of our experimental design. Structurally, the overabundant category we selected has important linguistic differences in the two languages. In Croatian, the overabundance in genitive plural forms is mostly found within a morphological class of feminine nouns ending with -a. In Estonian, overabundance in partitive plural is distributed broadly across the nominal system. Moreover, there is variance in both languages within the overabundant lexical set, but the variance also differs across the languages. In Croatian, three possible formatives exist, and nouns differ as to which ones they use. In Estonian, all the overabundant nouns alternate between an affixal and a vowel-final form, but the vowel-final form varies across lexemes. Both of these differences mean that the Croatian child encounters variation differently from the Estonian child. The Croatian child has an input which is highly amenable to assuming nouns can be inflected with -a, as most nouns have that as at least one of the options. Estonian children could generalise the -sid affix, which holds across all overabundant nouns and is salient. This would have the additional advantage of avoiding the stem-changing vowel forms. However, this would not be as effective a strategy as it is for the Croatian a-forms, as the affixal forms are not dominant in Estonian, either in overabundant lexemes or across nouns more generally. And it is not the strategy used by the participants in our experiment.

As for differences in the input, it is clear that the frequency patterns of the parallel forms differ for the children in the two languages. Because the plural genitive and partitive forms are not frequent enough to gather data on usage ratios from the CDS corpora, we cannot say exactly what relative frequencies and usage ratios the children encounter in the input. It is likely that the frequencies from the Estonian corpora better reflect the children’s input than the Croatian corpora, because of differences in written and spoken registers and regional dialects in Croatian. However, it is also apparent that the differences in proportional usage in the two languages are great enough to lead to very different outcomes. Whether it is mostly due to structural or distributional factors, Croatian children have learned to select a single formative for most nouns most of the time, while Estonian children show a great deal of variation in forms. There may be a threshold of variance, below which the variation is more resistant to acquisition. The structural factors may play a more important role, leading Croatian children to exemplify the early stages of Shin and Miller’s (2022) trajectory, while Estonian children are in the later stages of the trajectory at the same age. We cannot tell, based on this study, whether Croatian children use other forms in other, mutually exclusive contexts, nor whether Estonian children are more selective at an earlier age. However, we know that in both languages, speakers eventually arrive at productive knowledge of the variation.

Regarding our experimental design, the effort to match the design across languages means imposing criteria which may leave out important details of the system in one of the languages in order to make a better match. We attempted to select a grammatical category which was testable in the same context and reflected similar variation, but some of the imposed conditions may have meant a selection of target items which biassed children in Croatian differently from Estonian.

Even if this is the case, we believe the value of the comparison outweighs the problems. More cross-linguistic studies which match an identical experimental protocol across languages are crucial for advances in the field: in order to evaluate findings across different studies, we must know that we are comparing like with like. Considering the effects of differences in linguistic structure and input distribution, being able to examine data elicited in similar contexts with similar methods allows us better to generalise our results. We turn to their implications in the following section.

6.5. Implications

The results of this study have implications for our understanding of the acquisition of morphological variation as well as our views of parallel form usage in language. Variation is pervasive in language. Children face variation in the input, but the variation they produce may only partially overlap with that encountered in the input. The emergent process of acquiring the system produces its own variation (e.g. over-generalisation), as well as simplifying the variation which is beyond the grasp of the child at any given point (e.g. regularisation).

Target forms were found to be more difficult for items with parallel forms than those with single target forms in both languages, contrary to our hypothesis. We predicted that children’s errors would be consistent with the system, meaning that overabundant nouns would provide a wider bull’s eye for responses to hit the mark, whereas nouns with only one accurate response would leave the other potential option open, as a temptation to err. Instead, we found that overabundance itself has a negative effect on accuracy. This is an important finding. The existence of multiple forms for a single function increases uncertainty in the system, but the overabundance described here is not said to be unstable or moving toward regularisation. It seems to be a stable feature of the languages; speakers seem to navigate it with ease, though the uncertainty associated with these forms can be identified computationally. The children in our study, however, show that overabundance reduces accuracy in acquisition. This may be partly due to encountering each of the forms less often, but our frequency predictors remain significant in the same model alongside item type. This indicates that overabundance itself leads to uncertainty, over and above the effect of frequency on memory, entrenchment and retrieval.

This provides an elaboration of Shin and Miller’s (2022) proposed trajectory. If the participants in our study had simply narrowed the target and selected one possible formative for overabundant nouns, they would have been fully accurate. Likewise, at the stage of mutually exclusive contexts, they would have produced one or the other, again remaining accurate. They instead produced more inaccurate forms where variation existed. This suggests that the uncertainty associated with the variation in forms leads to less solid, or slower, acquisition of the forms encountered in the input than those with symmetric form-function mapping: their representation may be less clearly specified, accessing the forms is more challenging, and the forms are reproduced less faithfully. We propose that the children in our study may be at various points along the trajectory, perhaps differing both by language and individual differences.

Two further wrinkles should be added to the trajectory: first, variation may prolong the process of acquisition of parallel forms, and second, variation of different kinds may have different effects on the trajectory. Despite the prolonged period of acquisition, the children’s overall accuracy was similar to the results in Granlund et al. (2019), in which 3 to 5-year-old Estonian-learning children gave 75% accurate elicited case forms for singular nouns (the same as the overall accuracy rate in the current study), and children learning Polish (a Slavic language related to Croatian) gave 70% accurate responses (compared to 66% in this study).

Regarding variation of different kinds, this study includes both differences across languages as well as some hidden variations not discussed above. The languages provide different distributions of parallel forms, and we do not have fully reliable data on what input the children have been exposed to. Moreover, the Estonian children’s inaccurate responses were often inaccurate in the selection of a non-target-like theme vowel in vowel-final forms. This is the hidden variation: vowel-final forms vary in their formatives and are lexically conditioned. The responses to individual items, as well as the rates of overgeneralisation, need to be looked into further.

Forms which rely on lexically specific information require some minimum amount of exposure and practice in order to be stored and reproduced accurately. Plural forms of nouns are rarer than singular forms, and overabundant items will provide less exposure still to each individual form (assuming the input actually provides exposure to parallel forms). Hence, the inaccurate vowel-final responses in Estonian essentially conform to the system, but the lexically specified morphological information (which vowel to choose) takes longer to acquire, and its acquisition follows a more gradual course than for items with single forms. This implies, then, that – even if children at first reduce the variation and select just one form – the presence of parallel forms affects the acquisition trajectory, compared to contexts with no variation. Importantly, the vowel-final forms in Estonian are more frequently used, yet more complex to learn. A model of the acquisition of variation needs to be able to account for (a) the reduced input exposure to each form, as well as (b) the finding that reducing the variation by producing only one form may not circumvent the effects of the variation.

The Croatian children in our study overproduced one of the variants, at the expense of the others, resulting in a simplified system with far less overabundance than is present in the target system. Whether this is an artefact of our experiment or a developmental point for these children, their strategy is founded on both the frequency of the form and its distribution among nouns. Despite the wide age range in our study (3;0 to 6;11), we did not find an effect of age in Croatian, perhaps because the strategy of defaulting to the a-form is so effective with the target items.

The Estonian children of the same age, facing a similarly complex system, produced a wide range of variation. The likelihood of children reproducing the variation in the target language derives from a combination of their emergent knowledge of the morphological patterns, their ability to represent and retrieve the forms and their understanding of how the forms are conditioned and used. Their knowledge, representation and use are affected both by the properties of the system and by their input.

The finding that overabundance negatively affects accuracy aligns with early theoretical approaches which claimed that children favour linguistic forms with a one-to-one correspondence between meaning and form (e.g. Slobin’s Operating Principles, 1973, or MacWhinney’s Transparency Hypothesis, 1978). These approaches claim that in the early stages of acquisition, children seek regular morphological patterns in which one form corresponds to one function, as this simplifies the cognitive task of learning complex morphology. Parallel morphological forms present a challenge to this preference, with two or more forms for a single grammatical function. Faced with this challenge, children may rely on frequency and/or simplicity to navigate the problem, initially using the more frequent or simpler forms (Ambridge et al., 2015; Bybee, 1985; Shin & Miller, 2022). The Estonian data indicate an interaction between frequency and morphological complexity, showing a preference for the more frequent form unless that is overly complex, in which case the simpler form is selected; the Croatian data demonstrate a preference for the morphologically simpler, more frequent forms. This aligns with theoretical principles emphasising the avoidance of complexity as a strategy in language acquisition.

6.6. Limitations and outlook

Some limitations of the study – regarding measures, cross-linguistic design and selection of target items – are discussed above. This study includes only two languages and examines only production, but another crucial element of the puzzle is the question of how comprehension of this sort of variation develops. To what extent do production and comprehension proceed in step with each other? Do the children who consistently produce only one of the variants have more trouble comprehending the other, or is passive knowledge more advanced than active production?

In future studies, it would be useful to include a wider age range, more languages, and evidence from more varied grammatical structures. However, the most important limitation of the study at hand is the lack of actual data on the children’s input. A study which was able to draw on CDS corpora, or – even better – a sample of each participant’s input, would go a long way to resolving some of the open questions. A mixed approach using corpus data, experimental elicitation and comprehension tests would give a fuller picture of how children acquire parallel morphological forms and why their trajectories look the way they do.

7. Conclusion

This paper reports on a study of the acquisition of parallel forms in two morphologically rich languages, Croatian and Estonian. The cross-linguistic experimental paradigm reveals both similarities and differences in the acquisition of morphological variation in the two languages. Children were less accurate with items associated with parallel forms (two or three possible target responses) than with those items with only a single target response, indicating that variation hindered acquisition. Pattern type frequency also affected accuracy in both languages, and stem change affected Estonian children’s responses.

Croatian children resorted overwhelmingly to one formative in their responses, while the Estonian children exhibited great variability and development with age. Theoretical implications of the study include the incorporation of a more gradual acquisition trajectory for items with variation as well as the acute need for comparable data from multiple languages in deriving models of acquisition of variation.

Acknowledgements

This research formed part of the Feast and Famine project, funded by the UK Arts and Humanities Research Council (grant no. AH/T002859/1, PI Neil Bermel). We are deeply grateful to our colleagues in the project for inspiration, numerous discussions and feedback during the course of this study. During part of the writing up of the study, Sara Košutar was funded by the Norwegian Research Council (grant no. 316103). We are also grateful to speech and language practitioners Andrea Adašević, Majda Čadež and Zrinka Urukalović, who collected data in Croatia, and to research assistants Ester Erik and Elo Petron, who collected most of the data in Estonia. We would also like to express our gratitude to two anonymous reviewers, who helped improve the paper, and to audiences at the SLE (Athens) and Many Paths to Language (Nijmegen) conferences in 2023, and IASCL (Prague) in 2024 for useful discussions.

Competing interest

The authors declare no competing interests.

Appendices

Appendix A. List of stimuli in both languages (overabundant and non-overabundant), with all target forms listed and token frequency per million in the (written, adult) corpora

Corpus frequencies: For retrieving frequency data, we used two corpora for Croatian and a balanced corpus with three sub-corpora for Estonian. The first Croatian corpus is Riznica: Croatian Text Corpus (Ćavar & Brozović Rončević, 2012). It consists mainly of newspapers, books and articles (101 782 863 tokens). The other resource was a sub-corpus of the Croatian Web Corpus (hrWaC; Ljubešić & Klubička, 2014), known as ForumHr, which consists of texts from the largest Croatian Internet forum (241 694 709 tokens in total). Two corpora were used to ensure better coverage, including journalistic, fiction and non-professional texts. For Estonian, the Balanced Corpus of Estonian was used (https://www.cl.ut.ee/korpused/grammatikakorpus/). The corpus consists of an equal number of journalistic texts, fiction and scientific texts (5 million each, totalling 15 million tokens).
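The per-million token frequencies listed in the stimulus tables can be obtained from raw counts and the corpus sizes given above. The sketch below is illustrative only: the corpus sizes are taken from the text, but the raw counts are invented, and pooling the two Croatian corpora by summing counts and sizes is our assumption, since the text does not say how the two sources were combined.

```python
# Corpus sizes in tokens, as reported in the text.
RIZNICA_TOKENS = 101_782_863
FORUMHR_TOKENS = 241_694_709
ESTONIAN_BALANCED_TOKENS = 15_000_000


def per_million(raw_count, corpus_tokens):
    """Normalise a raw token count to occurrences per million corpus tokens."""
    return raw_count / corpus_tokens * 1_000_000


def pooled_per_million(counts_and_sizes):
    """Per-million frequency over several corpora, pooled by summing counts and sizes.

    Pooling in this way is an assumption, not a procedure described in the source.
    """
    total_count = sum(count for count, _ in counts_and_sizes)
    total_tokens = sum(size for _, size in counts_and_sizes)
    return per_million(total_count, total_tokens)


# Invented raw counts for one word form.
estonian_freq = per_million(15, ESTONIAN_BALANCED_TOKENS)
croatian_freq = pooled_per_million([(100, RIZNICA_TOKENS), (243, FORUMHR_TOKENS)])
```

Normalising in this way makes counts from the much larger Croatian corpora comparable with those from the 15-million-token Estonian corpus.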

In both languages, words were checked for their use by caregivers in child language corpora available on CHILDES, but the Child-Directed Speech available in those corpora is too small to extract reliable form proportion frequencies, or even to use them as a primary source for what is age-appropriate vocabulary. Age-appropriateness was instead established through a combination of corpus attestation, CDIs and authors’ knowledge.

Table A1. Non-overabundant Croatian stimuli

Table A2. Overabundant Croatian stimuli, with usage frequency

Table A3. Non-overabundant Estonian stimuli

Table A4. Overabundant Estonian stimuli, with usage frequency

Appendix B. Output for the logistic mixed-effects models for accuracy (described in Results)

Table B1. Accuracy. Model output for Croatian

Table B2. Accuracy. Model output for Estonian

Appendix C. Output for the logistic mixed-effects models for form choice (described in Results)

Table C1. Form choice. Model output for Croatian

Table C2. Form choice. Model output for Estonian

Footnotes

1 Although these corpora were selected here based on Granlund et al. (2019), who used these three corpora, note that other, larger corpora could be considered in the future in place of the smaller Argus corpus.

* OA forms marked with asterisk were excluded from the OA form choice analysis because the variant with -a was not attested in the corpora.

References

Abbot-Smith, K., Lieven, E., & Tomasello, M. (2008). Graded representations in the acquisition of English and German transitive constructions. Cognitive Development, 23(1), 48–66. https://doi.org/10.1016/j.cogdev.2007.11.002
Aguado-Orea, J., & Pine, J. M. (2015). Comparing different models of the development of verb inflection in early child Spanish. PLOS One, 10(3), e0119613. https://doi.org/10.1371/journal.pone.0119613
Aigro, M., & Vihman, V. (2024). Preferences in the use of overabundance: Predictors of lexical bias in Estonian. Cognitive Linguistics, 35(2), 289–312. https://doi.org/10.1515/cog-2023-0035
Aigro, M., & Vihman, V.-A. (2023). Realised overabundance in Estonian noun paradigms: A corpus study. Word Structure, 16(2–3), 154–175. https://doi.org/10.3366/word.2023.0227
Ambridge, B., Kidd, E., Rowland, C. F., & Theakston, A. L. (2015). The ubiquity of frequency effects in first language acquisition. Journal of Child Language, 42(2), 239–273. https://doi.org/10.1017/S030500091400049X
Argus, R. (1998). CHILDES Estonian Argus Corpus. https://doi.org/10.21415/T5888B
Argus, R. (2009). The early development of case and number in Estonian. In Voeikova, M. D. & Stephany, U. (Eds.), Development of nominal inflection in first language acquisition: A cross-linguistic perspective (Studies on language acquisition, 30) (pp. 111–152). Mouton de Gruyter.
Argus, R., & Bauer, A. (2020). Muutevormide ilmumine eesti keelt esimese keelena omandavate laste kõnesse [Emergence and productive use of inflectional forms in early Estonian]. Philologia Estonica Tallinnensis, 5, Article 5. https://doi.org/10.22601/PET.2020.05.01
Au, T. K., & Markman, E. M. (1987). Acquiring word meanings via linguistic contrast. Cognitive Development, 2(3), 217–236. https://doi.org/10.1016/S0885-2014(87)90059-1
Barić, E., Lončarić, M., Malić, D., Pavešić, S., Peti, M., Zečević, V., & Znika, M. (2005). Hrvatska gramatika. Školska knjiga.
Bates, D., Mächler, M., Bolker, B., & Walker, S. (2015). Fitting linear mixed-effects models using lme4. Journal of Statistical Software, 67(1), 1–48. https://doi.org/10.18637/jss.v067.i01
Behrens, H. (2006). The input–output relationship in first language acquisition. Language and Cognitive Processes, 21(1–3), 2–24. https://doi.org/10.1080/01690960400001721
Bermel, N., & Knittl, L. (2012). Corpus frequency and acceptability judgments: A study of morphosyntactic variants in Czech. Corpus Linguistics and Linguistic Theory, 8(2), 241–275. https://doi.org/10.1515/cllt-2012-0010
Birtić, M., Blagus Bartolec, G., Hudeček, L., Jojić, Lj., Kovačević, B., Lewis, K., Matas Ivanković, I., Mihaljević, M., Miloš, I., Ramadanović, E., & Vidović, D. (2012). Školski rječnik hrvatskoga jezika [School Dictionary of Croatian]. Zagreb: Institut za hrvatski jezik i jezikoslovlje & Školska knjiga.
Blevins, J. P. (2008). Declension classes in Estonian. Linguistica Uralica, 43(4), 241–267. https://doi.org/10.3176/lu.2008.4.01
Bošnjak Botica, T., & Hržica, G. (2016). Overabundance in Croatian dual-class verbs. Fluminensia, 28(1), 83–106.
Bošnjak Botica, T., Polančec, J., Musulin, M., Hržica, G., & Košutar, S. (2023). The rise and fall of overabundance: The case of Croatian genitive plural forms. In SLE Conference, Athens, August 30th, 2023.
Bošnjak Botica, T. (2024). Morfološko preobilje u hrvatskom jeziku. Zagreb: Institut za hrvatski jezik. 139 pp.
Bybee, J. (1985). Morphology: A study of the relation between meaning and form. Amsterdam: John Benjamins.
Cappellaro, C. (2013). Overabundance in diachrony: A case study. In Cruschina, S., Maiden, M., & Smith, J. C. (Eds.), The boundaries of pure morphology: Diachronic and synchronic perspectives (pp. 209–220). Oxford: Oxford University Press. https://doi.org/10.1093/acprof:oso/9780199678860.001.0001/acprof-9780199678860
Ćavar, D., & Brozović Rončević, D. (2012). Riznica: The Croatian language corpus. Prace filologiczne, 63, 51–65.
Chan, A., Lieven, E., & Tomasello, M. (2009). Children’s understanding of the agent-patient relations in the transitive construction: Cross-linguistic comparisons between Cantonese, German, and English. Cognitive Linguistics, 20(2), 267–300. https://doi.org/10.1515/COGL.2009.015
Clark, E. V. (1987). The principle of contrast: A constraint on language acquisition. In MacWhinney, B. (Ed.), Mechanisms of Language Acquisition. Lawrence Erlbaum Associates.
Cvikić, L., & Kuvač, J. (2005). The Acquisition of Croatian Masculine Noun Morphology in Croatian as Second Language. In Proceedings from VII. International Conference of Language Examination, Applied and Medicinal Linguistics. Dunaujvaros, Hungary.
Dąbrowska, E. (2008). The effects of frequency and neighbourhood density on adult speakers’ productivity with Polish case inflections: An empirical test of usage-based approaches to morphology. Journal of Memory and Language, 58(4), 931–951. https://doi.org/10.1016/j.jml.2007.11.005
Dąbrowska, E., & Szczerbiński, M. (2006). Polish children’s productivity with case marking: The role of regularity, type frequency, and phonological diversity. Journal of Child Language, 33(3), 559–597. https://doi.org/10.1017/S0305000906007471
Divjak, D. (2019). Frequency in language: Memory, attention and learning. Cambridge: Cambridge University Press. https://doi.org/10.1017/9781316084410
Dressler, W. U. (2012). On the acquisition of inflectional morphology: Introduction. Morphology, 22, 1–8. https://doi.org/10.1007/s11525-011-9198-1
Engelmann, F., Granlund, S., Kolak, J., Szreder, M., Ambridge, B., Pine, J., Theakston, A., & Lieven, E. (2019). How the input shapes the acquisition of verb morphology: Elicited production and computational modelling in two highly inflected languages. Cognitive Psychology, 110, 30–69. https://doi.org/10.1016/j.cogpsych.2019.02.001
Erelt, T., Leemets, T., Mäearu, S., & Raadik, M. (2018). Eesti õigekeelsussõnaraamat [Dictionary of Correct Usage in Estonian]. Tallinn: Eesti keele sihtasutus. Retrieved from http://www.eki.ee/dict/qs/
Fehringer, C. (2004). How stable are morphological doublets? A Case Study of /[schwa]/ ∼ Ø Variants in Dutch and German. Journal of Germanic Linguistics, 16(4), 285–329. https://doi.org/10.1017/S1470542704040425
Granlund, S., Kolak, J., Vihman, V., Engelmann, F., Lieven, E. V. M., Pine, J. M., Theakston, A. L., & Ambridge, B. (2019). Language-general and language-specific phenomena in the acquisition of inflectional noun morphology: A cross-linguistic elicited-production study of Polish, Finnish and Estonian. Journal of Memory and Language, 107, 169–194. https://doi.org/10.1016/j.jml.2019.04.004
Gries, S., & Divjak, D. (2012). Frequency effects in language learning and processing (Vol. 1). De Gruyter Mouton. https://doi.org/10.1515/9783110274059
Guzmán Naranjo, M., & Bonami, O. (2021). Overabundance and inflectional classification: Quantitative evidence from Czech. Glossa: A Journal of General Linguistics, 6(1), 1–31. https://doi.org/10.5334/gjgl.1626
Hallap, M., Padrik, M., & Raudik, S. (2014). Käändevormide kasutamise oskus eakohase arenguga vene-eesti kakskeelsetel ning spetsiifilise kõnearengu puudega ükskeelsetel lastel [Estonian case morphology in second language acquisition and Specific Language Impairment]. Eesti Rakenduslingvistika Ühingu aastaraamat/Estonian Papers in Applied Linguistics, 10, 73–90. https://doi.org/10.5128/ERYa10.05
Hickmann, M., Veneziano, E., & Jisa, H. L. (2018). Sources of variation in first language acquisition: Languages, contexts, learners. Amsterdam: John Benjamins. https://doi.org/10.1075/tilar.22
Hržica, G., Bošnjak Botica, T., & Košutar, S. (2023). Stem overgeneralizations in the acquisition of Croatian verbal morphology: Evidence from parental questionnaires. Word Structure, 16(2–3), 176–205. https://doi.org/10.3366/word.2023.0228
Hržica, G., Košutar, S., Bošnjak Botica, T., & Milin, P. (2024). The role of entrenchment and schematisation in the acquisition of rich verbal morphology. Cognitive Linguistics, 35(2), 251–287. https://doi.org/10.1515/cog-2023-0022
Johnson, E. K., & White, K. S. (2020). Developmental sociolinguistics: Children’s acquisition of language variation. Wiley Interdisciplinary Reviews: Cognitive Science, 11(1), 1–15. https://doi.org/10.1002/wcs.15
Jojić, Lj., Nakić, A., & Zečević, V. (Eds.) (2015). Veliki rječnik hrvatskoga standardnog jezika [Large dictionary of standard Croatian]. Zagreb: Školska knjiga.
Kaalep, H.-J. (2018). Parallel forms in Estonian finite state morphology. In Proceedings of the 4th international workshop for computational linguistics for Uralic languages (pp. 139–153). Association for Computational Linguistics. https://doi.org/10.18653/v1/W18-0212
Kaalep, H.-J. (2012). Eesti käänamissüsteemi seaduspärasused [Patterns in the declension system of Estonian]. Keel ja Kirjandus, 6, 418–449.
Kaalep, H.-J. (2010). Mitmuse osastav eesti keele käändesüsteemis [Partitive plural in the Estonian case system]. Keel ja Kirjandus, 2, 94–111.
Kaalep, H.-J. (2009). Kuidas kirjeldada ainsuse lühikest sisseütlevat kasutamisandmetega kooskõlas? [Describing the short illative in accordance with usage data]. Keel ja Kirjandus, 6, 411–425.
Kapović, M. (2018). Širenje nastavka -i u genitivu množine e–deklinacije u suvremenoj štokavštini. Suvremena lingvistika, 44, 39–72.
Kirjavainen, M., Nikolaev, A., & Kidd, E. (2012). The effect of frequency and phonological neighbourhood density on the acquisition of past tense verbs by Finnish children. Cognitive Linguistics, 23(2), 273–315. https://doi.org/10.1515/cog-2012-0009
Kovačević, M. (2002). CHILDES Croatian Kovacevic Corpus. https://doi.org/10.21415/T5FS5X
Kovačević, M., Marijan, P., & Hržica, G. (2009). The acquisition of case, number, and gender in Croatian. In Voeikova, M. D. & Stephany, U. (Eds.), Development of nominal inflection in first language acquisition: A cross-linguistic perspective (Studies on language acquisition, 30) (pp. 111–152). Mouton de Gruyter.
Lečić, D. (2016). Morphological doublets in Croatian: A multi-methodological analysis [Doctoral dissertation, University of Sheffield]. White Rose eTheses Online. https://etheses.whiterose.ac.uk/16068/
Lieven, E., Salomo, D., & Tomasello, M. (2009). Two-year-old children’s production of multiword utterances: A usage-based analysis. Cognitive Linguistics, 20(3), 481–507. https://doi.org/10.1515/COGL.2009.022
Ljubešić, N., & Klubička, F. (2014). {bs,hr,sr} WaC – Web Corpora of Bosnian, Croatian and Serbian. In Bildhauer, F. & Schäfer, R. (Eds.), Proceedings of the 9th web as Corpus workshop (WaC-9) @ EACL 2014 (pp. 29–35). Association for Computational Linguistics. https://doi.org/10.3115/v1/W14-0405
MacWhinney, B. (1978). The acquisition of morphophonology. Monographs of the Society for Research in Child Development, 43(1/2), 1–123.
Markman, E. M., Wasow, J. L., & Hansen, M. B. (2003). Use of the mutual exclusivity assumption by young word learners. Cognitive Psychology, 47(3), 241–275. https://doi.org/10.1016/s0010-0285(03)00034-3
Maslen, R. J., Theakston, A. L., Lieven, E. V., & Tomasello, M. (2004). A dense corpus study of past tense and plural overregularization in English. Journal of Speech, Language, and Hearing Research, 47(6), 1319–1333. https://doi.org/10.1044/1092-4388(2004/099)
Metslang, H. (2009). The pitfalls of a language: Estonian as L2. In Fernandez-Vest, M.M.J. & Do-Hurinville, D.T. (Eds.), Plurilinguisme et traduction–des enjeux pour l’Europe (pp. 119–134). Paris: L’Harmattan.
Miller, K. (2013). Acquisition of variable rules: /s/-lenition in the speech of Chilean Spanish-speaking children and their caregivers. Language Variation and Change, 25(3), 311–340. https://doi.org/10.1017/S095439451300015X
Parisse, C. (2019). How large should a dense corpus be for reliable studies in early language acquisition? CogniTextes, 19. https://doi.org/10.4000/cognitextes.1483
Perfors, A., Tenenbaum, J. B., & Wonnacott, E. (2010). Variability, negative evidence, and the acquisition of verb argument constructions. Journal of Child Language, 37(3), 607–642. https://doi.org/10.1017/S0305000910000012
Peti-Stantić, A., Anđel, M., Keresteš, G., Ljubešić, N., Stanojević, M., & Tonković, M. (2021). Psiholingvističke mjere ispitivanja 3.000 riječi hrvatskoga jezika: konkretnost i predočivost. Suvremena lingvistika, 44(85), 91–112. https://doi.org/10.22210/suvlin.2018.085.05
Poplack, S. (2018). Categories of grammar and categories of speech: When the quest for symmetry meets inherent variability. In Shin, N. & Erker, D. (Eds.), Questioning theoretical primitives in linguistic inquiry: Papers in honor of Ricardo Otheguy (pp. 7–34). John Benjamins. https://doi.org/10.1075/sfsl.76.02pop
Raadik, M. (2013). Õigekeelsussõnaraamat [Dictionary of correct usage]. Eesti keele sihtasutus.
Räsänen, S. H., Ambridge, B., & Pine, J. M. (2016). An elicited-production study of inflectional verb morphology in child Finnish. Cognitive Science, 40(7), 1704–1738. https://doi.org/10.1111/cogs.12305
Requena, P. E. (2023). Linguistic variation and grammatical complexity in child heritage speakers. Spanish as a Heritage Language, 3(1), 1–23. https://doi.org/10.5744/shl.2023.1006
Savičiūtė, E., Ambridge, B., & Pine, J. M. (2018). The roles of word-form frequency and phonological neighbourhood density in the acquisition of Lithuanian noun morphology. Journal of Child Language, 45(3), 641–672. https://doi.org/10.1017/S030500091700037X
Shin, N. L. (2016). Acquiring constraints on morphosyntactic variation: Children’s Spanish subject pronoun expression. Journal of Child Language, 43(4), 914–947. https://doi.org/10.1017/S0305000915000380
Shin, N., & Miller, K. (2022). Children’s acquisition of morphosyntactic variation. Language Learning and Development, 18(2), 125–150. https://doi.org/10.1080/15475441.2021.1941031
Slobin, D. I. (Ed.). (1985). The crosslinguistic study of language acquisition: Volume 1: The data (1st ed.). Psychology Press. https://doi.org/10.4324/9781315802541
Slobin, D. I. (1973). Cognitive prerequisites for the development of grammar. In Ferguson, C. A. & Slobin, D. I. (Eds.), Studies of child language development (pp. 175–208). Holt, Rinehart, and Winston.
Smith, J., & Durham, M. (2019). Sociolinguistic variation in children’s language: Acquiring community norms. Cambridge: Cambridge University Press.
Smoczynska, M. (1985). The Acquisition of Polish. In Slobin, D. (Ed.), The Crosslinguistic study of Language Acquisition: The Data (pp. 595–686). Lawrence Erlbaum Associates.
Szmrecsanyi, B. (2017). Variationist sociolinguistics and corpus-based variationist linguistics: Overlap and cross-pollination potential. Canadian Journal of Linguistics/Revue canadienne de linguistique, 62(4), 685–701. https://doi.org/10.1017/cnj.2017.34
Thornton, A. M. (2011). Overabundance (multiple forms realizing the same cell): A non‐canonical phenomenon in Italian verb morphology. In Maiden, M., Smith, J. C., Goldback, M., & Hinzelin, M.-O. (Eds.), Morphological autonomy: Perspectives from Romance inflectional morphology (pp. 358–381). Oxford: Oxford University Press. https://doi.org/10.1093/acprof:oso/9780199589982.003.0017
Thornton, A. M. (2019). Overabundance: A canonical typology. In Rainer, F., Gardani, F., Dressler, W. U. & Luschützky, H. C. (Eds.), Competition in inflection and word formation (pp. 223–258). Springer. https://doi.org/10.1007/978-3-030-02550-2_9
Viht, A., & Habicht, K. (2019). Eesti keele sõnamuutmine [Estonian inflection]. Tartu: University of Tartu Press.
Vija, M. (2004). CHILDES Estonian Vija Corpus. https://doi.org/10.21415/T5QS41
Viks, Ü. (1992). Väike vormisõnastik I. Sissejuhatus ja grammatika [Little dictionary of word forms, I: Introduction and grammar]. Tallinn: Academy of Sciences, Keele ja Kirjanduse Instituut.
Xanthos, A., Laaha, S., Gillis, S., Stephany, U., Aksu-Koç, A., Christofidou, A., Gagarina, N., Hrzica, G., Ketrez, F. N., Kilani-Schoch, M., Korecky-Kröll, K., Kovačević, M., Laalo, K., Palmović, M., Pfeiler, B., Voeikova, M. D., & Dressler, W. U. (2011). On the role of morphological richness in the early development of noun and verb inflection. First Language, 31(4), 461–479. https://doi.org/10.1177/0142723711409976
Zupping, S. (2016). CHILDES Estonian Zupping Corpus. https://doi.org/10.21415/T5K89H
Table 1. Genitive plural formatives in Croatian

Table 2. Partitive plural formatives in Estonian

Table 3. Distribution of partitive plural endings in the noun dataset based on the Balanced Corpus of Estonian (Aigro & Vihman, 2023)

Figure 1. Example of the image shown with prompt ‘Bunny/Rabbit sees lots of…?’ (Croatian: Zec vidi puno…?; Estonian: Jänku näeb palju…?)

Figure 2. Accuracy of responses by language and item condition. OA = overabundant (more than one target response), non-OA = non-overabundant (items with one target response).

Figure 3. Accuracy by age (Estonian).

Figure 4. Accuracy by age (Croatian).

Figure 5. Accuracy by pattern frequency (significant in both languages).

Figure 6. Individual children’s form choice for OA items in Estonian (shown as the proportion of more frequent, vowel-final responses), by age.

Figure 7. Individual children’s form choice for OA items in Croatian (shown as the proportion of more frequent, a-final responses), by age.
