1. Introduction
In classic cross-linguistic work, Greenberg (1963) reported that more languages have suffixes than prefixes and that languages that have both use suffixes more. This is known as the ‘suffixing preference’, and Cutler et al. (1985) and Hawkins & Cutler (1988) suggest explaining it, in part, in terms of research showing that spoken word recognition relies most heavily on the beginnings of words, making it advantageous to have no prefix. To understand how complex words are processed, we must understand how words with affixes in different positions (i.e. prefixes, suffixes and infixes) are processed. Toward this end, in a companion paper, we examined the perception and production of words with affixes (prefixes and suffixes) in Georgian and English (Harris & Samuel 2025); that paper provides valuable context for the current study, including a more extensive review of relevant literature regarding affixes. Our goals were to test the core claim of the Cutler–Hawkins Hypothesis (hereafter CHH) and to better understand the processing of complex words. Based on earlier research showing that the beginnings of words are ‘psychologically most salient’, the CHH predicts that words with suffixes are easier (faster and more accurate) to process than words with prefixes. In English, we conducted one experiment, auditory lexical decision, with an equal number of prefixes and suffixes. Lexical decision is a standard test of lexical recognition in which the participant indicates by pressing one of two computer keys whether an item is a word. In Georgian, we conducted a large set of experiments, including multiple auditory lexical decision tasks and a production test. We tested a pair of inflectional prefixes/suffixes and a derivational pair.
Collectively, the experiments with prefixed and suffixed words did not support the hypothesis: there was no systematic advantage for perceiving suffixed words relative to prefixed words.
In this paper, we investigate clitics rather than affixes. Clitics are little words that are pronounced with a host word. Enclitics are clitics that follow the host and its suffixes, whereas proclitics precede the host and its prefixes. Although clitics are similar in some respects to affixes, we rely on Zwicky & Pullum (1983) and other authors to distinguish between the two. Cysouw (2005: 18) cites ‘a strong cross-linguistic preference for clitics to be enclitic rather than proclitic, just as affixes show a strong preference for being suffixes rather than prefixes’, and Dryer (2017) provides data to support this. We ask whether the hypothesized differences between the processing of affixes in different positions can reasonably be extended to processing of clitics in different positions. Such an extension seems logical, not only because of the preference that Cysouw reports, but also because, historically, affixes often develop from clitics (e.g. Harris & Campbell 1995: 63–65). Himmelmann (2014: 946–948) observes that clitics may cliticize to an unrelated host and be separated from their following grammatical host by a prosodic boundary, whereas this does not occur when clitics follow their grammatical hosts. Such asymmetries in phrasing may prevent clitics that precede grammatical hosts from becoming affixes, whereas no such barrier exists for enclitics. Observations by Halpern (1998: 119) and Asao (2015: 163) also support this extension. Here we report a series of experiments that investigate the perception and production of words with clitics in different positions to test whether differences in processing difficulty do, in fact, account for the observed distributional pattern.
For example, if listeners are asked to verify whether an utterance is a real word in their language, processing difficulties could manifest as less accurate or slower responses. Thus, we can test the CHH by measuring accuracy and response times for words with a preceding unit (a proclitic) versus matched words with a following unit (an enclitic).
For the reasons stated above, we might expect the processing of clitics in various positions to be similar to that of affixes in the analogous positions. Our experiments with clitics allow us to test whether an extension of the CHH might gather more empirical support than we found for the more direct test of affixes. In addition, the languages that we study here allow us to examine an extension of the processing-difficulty hypothesis that Cutler et al. (1985) themselves made, the idea that infixes (or, correspondingly, endoclitics) are disfavored because of their presumed processing difficulty:
We believe that the infrequ[e]ncy of infixing is also motivated by a general processing consideration, namely that languages are reluctant to break up structural units…. It appears highly likely that the adjacency of immediate constituents, both in morphology and syntax, facilitates processing, whereas discontinuities and crossed branching complicate it. By this explanation either prefixing or suffixing should be vastly preferable to infixing, as indeed the distributional facts attest that they are (Cutler et al. 1985: 752).
Of course, the fact that infixes are infrequent and endoclitics are truly rare among languages of the world does not show that they are difficult to process and produce. We need to determine this empirically. We do that here through a series of experiments: lexical decision of real words, lexical decision of nonce words, and production. In the lexical decision test, a classic for this purpose, a faster or more accurate response is interpreted as a consequence of easier processing. Similarly, production accuracy indexes the ease of using a given structure.
Cutler et al. added a potential caveat to their extended hypothesis but ultimately cast doubt on the caveat:
Of course, if a stem has been effectively recognized by the time its uniqueness point has been processed, one might argue that infixing a morpheme between the uniqueness point and the end would provide all the continuity necessary (since the end should be irrelevant), AND get the important affixed information in at the earliest possible useful point, i.e. just when the word has been recognized. There is yet another processing reason for avoiding this, however: the relative insalienc[e] of middle positions in a word. The evidence … suggests that the middle of a word is its least salient part. (Cutler et al. 1985: 752)
Indeed, the evidence that Cutler and colleagues considered (1985: 742–743) – regarding recall from writing, letter reversals (in writing), tip-of-the-tongue phenomena, slip of the ear phenomena and word retrieval – showed that the middle of the word is least salient from these points of view. All of these studies involved English or Dutch, and none were studies of affixes or clitics. Thus, we do not know whether the middle of the word is the least salient in all respects, or in all languages or with respect to affixes or clitics. On the basis of the research they cite, we do not know whether endoclitics and infixes are easy or difficult to process and produce, or even whether suffixes are easier than prefixes as inferred from the study of the vague notion of word beginnings, middles and ends. (What is a beginning? Is it the first segment, the first syllable, the first third of the word?) We aim to shed some light on all of this in the series of experiments reported here.
With the proper choice of target languages, it is possible to design strong tests of the processing costs for clitics in various positions. Among the languages known to have endoclitics, Udi (Harris 2002) provides the most positions for comparison. Although other languages have some variation of position, only Udi has clitics before the verb, after the verb, between morphemes and inside the root. Thus, it permits us to test how speakers process a morpheme broken up by another word and compare this position with the other three.
In (1) we see the first-person singular clitic pronoun, -z(u)-, breaking up the unanalyzable root aq’- ‘receive’; we refer to this position as ‘intramorphemic’. In (2) the third-person singular clitic, -ne-, occurs between the two morphemes of the complex verb stem, oʕne ‘cry’ and x ‘say’; we refer to this clitic position as ‘intermorphemic’.Footnote 1
To complement the Udi tests, we also report experiments we have conducted with European Portuguese (EP), which traditionally has clitics in three positions: proclitic, enclitic and intermorphemic. Using two languages, rather than just one, provides a measure of generality. For both languages, we test whether the pattern of processing differences predicted by the CHH is present.
Some have argued that some of the morphemes at issue in Udi and in EP are affixes rather than clitics (Crysmann 2000, 2001, 2003, Luís 2004, Spencer & Luís 2005, 2012). In both languages, these morphemes have some of the characteristics of clitics, and we refer to them as clitics informally, while recognizing their other characteristics. Note that Cutler et al. base their generalization not on affix type (prefix, suffix) but on the position relative to the stem (preceding, following, 1985: 747). The clitic pronoun that precedes the verb when a negative is present (in the order NEG Clitic(s) Verb) may be analyzed as a true proclitic or as an enclitic to the negative, which precedes the verb (see Himmelmann 2014: 946–948, summarized in the introduction). The linear order is the same in Udi and EP. Here, we refer to these informally as proclitics, while recognizing their complex status.
The literature offers little prior empirical work on how affixes or clitics may affect processing. Many experiments have relied on the written medium (following the influential work of Taft & Forster 1975), which is a less accurate representation of natural language than speech. Bridgers & Kacinik (2017) find that in written Italian, words with suffixes are recognized with greater speed and accuracy, thus supporting the CHH. However, they interpret certain electroencephalogram (EEG) evidence, mostly from Dutch and English, as showing that prefixed words are processed more quickly. They observe, further, that other researchers have shown that prefixes slow word recognition.
Hupp, Sloutsky & Culicover (2009) show that the suffixing preference is a general cognitive strategy that applies not only to language but also to music and vision. Martin & Culbertson (2020) replicated their results and extended the experiments to speakers of Kîîtharaka, a prefixing language. Those speakers attended to variation at the beginning of a sequence. Their conclusion is that the site of variation in the native language determines where speakers attend most to variation, in all domains.
Because cross-modal priming involves spoken language test items, it may be better suited to studying stored word structure than studies using purely visual stimuli. Using auditory primes for visual targets (AV), Marslen-Wilson et al. (1994) showed that responses to a suffixed target (e.g. payment) were faster following a prefixed prime (e.g. prepay) than following a suffixed prime (e.g. payable). Feldman & Larabee (2001) worked with a variety of prime-target combinations: visual primes for auditory targets (VA), AV and VV. They found that the failure of suffixed primes to facilitate recognition seemed to depend on modality.
There are few, if any, prior studies of the processing or production of clitics by unimpaired adults that can inform our experiments. Kuehnast’s (2009) study used printed words, not spoken language. We nevertheless review it because it seems to be the only study directly relevant to the experiments reported here. Kuehnast examined the processing of clitics in Bulgarian in Broca’s aphasics compared with that of unimpaired controls. In Bulgarian, as in Udi, clitics cluster after negative elements, and this whole cluster precedes the verb (see discussion below). In both languages, clitics are enclitic to the verb under certain circumstances. Bulgarian clitics are object pronouns and reflexives, like those in EP. The clitics we are testing in Udi, however, are subject pronouns. (There are additional differences between the pronominal clitics in the two languages.) We report here only on elements of Kuehnast’s study that are relevant to our interests, including only the results for unimpaired controls.
In self-paced reading with a fixed window, Kuehnast compared three conditions: (i) grammatical sequences in which the pronoun was enclitic to the negative, with this cluster preceding the verb, (ii) ungrammatical sequences of the negative, immediately followed by the verb, with the pronoun enclitic to that (ungrammaticality results from the order) and (iii) grammatical affirmative sequences with a verb followed by enclitic pronouns. It was found that for unimpaired speakers, pronominal clitics are read more quickly in the first condition than when they follow the verb. This finding runs counter to the CHH, but note that it comes from printed words. Between conditions two and three, ungrammatical enclitics were read somewhat more slowly than grammatical ones.
In this article, we report on a series of experiments in Udi and in EP undertaken to better understand the roles of proclitics, enclitics and endoclitics. Our processing experiments are based entirely on spoken language because this is more natural than written language. The CHH predicts that words with suffixes are easier (faster and more accurate) to process, but as noted above, our empirical tests of this idea (Harris & Samuel 2025) found no support for suffixed items being consistently easier. We assume that proclitics are analogous to prefixes. On this basis, we test whether proclitics are easier to process and produce than enclitics (the pattern reported in Bulgarian), whether instead enclitics lead to easier processing (the CHH pattern, regarding affixes) or if there is no clear advantage one way or another (the pattern we found in Georgian, regarding affixes). To make this more concrete, in both languages, we use verification tasks in which listeners must indicate (with a Yes/No button-push response) whether a given sequence is legal in their language. For example, in Udi, te-yan bak-e ‘we were not’ and bak-al-yan ‘we will be’ are both legal, but they differ in the placement of the clitic. The CHH would predict faster and/or more accurate responses for the enclitic items. The same logic holds for EP. In both languages, we also investigate whether clitic placement affects production.
In any set of word stimuli, there will be some variation in familiarity (typically indexed by some kind of word frequency measure) and usually in length (in segments, syllables, morphemes). What is critical in an experiment is matching such factors across conditions that are being compared, not keeping such things constant within a condition. In fact, in our experiments, we carefully match item sets as closely as possible in this way so that comparisons of performance across the different clitic-position conditions are not contaminated by systematic differences in such factors. To show this, we have made our complete sets of stimuli available on the website of this journal.
Researching the processing of prefixes and suffixes in most languages involves different affixes. For example, Harris & Samuel (2025) compared processing of the Georgian prefix v- (first-person subject) with that of the suffix -s (third-person singular subject). These are directly comparable in the sense that both are subject person markers and both are a single consonant, but they are different morphemes. In the experiments in Udi and EP that we report here, however, we are comparing the positioning of the same clitic. For example, the Udi clitic =zu (first-person singular subject) can occur enclitic to the verb, proclitic to the verb or inside the verb. Thus, the present paper provides several important advances to the companion paper on English and Georgian affixes.
In Section 2, we describe the clitics of Udi in greater detail; experiments with clitics in Udi are described in Section 3. The clitics of EP are introduced in Section 4, and the experiments with these clitics in Section 5. Section 6 provides a general discussion.
2. Clitics in Udi
The general subject clitic pronouns of Udi are given in Table 1. These clitics are collectively referred to as person markers (PMs). All three plural clitics are pronounced more or less as shown in Table 1 in all phonological environments, but there is a great deal of variation in the forms of the singular clitics. Moreover, a different morpheme -ne (irregular present tense of the light verb pesun ‘say’) and a third morpheme -en (hortative ending) exist. Because of potential confusion, singular subjects in our experiments are limited to filler stimuli. We also used plurals because they are phonologically more robust (CVC), and this will be important in the next experiment, which we wanted to be parallel to this one. All experiments are limited to the clitics in Table 1.
Udi is the perfect language in which to study the positioning of clitics since it permits a given clitic to occupy more positions than other languages. All subject clitic pronouns in Udi may occupy different positions under various specific conditions. These positions are summarized in Table 2, using the third-person plural =t’un as an example.
A complex verb in Udi is formed with an incorporated element (noun, adjective, etc., aš ‘work’ in Table 2) and a light verb (-b- ‘do’ in Table 2). There are about half a dozen light verbs; some can be used independently and thus have a meaning, and others cannot. Simplex verbs in Udi are of the form (C)VC(C) and lack a light verb; their stems each have only one morpheme. With verbs of all types, PMs must be enclitic to the verb in certain tenses, including the future II, illustrated in Table 2.Footnote 3 PMs must cliticize to negative particles (illustrated in Table 2), question words (e.g. šu ‘who?’) and other focused constituents.Footnote 4 The PM may be in intermorphemic position with complex stems; with simplex stems, the PM never occurs between morphemes. The PM may be intramorphemic with simplex stems; it never occurs in this position with complex stems.
Another aspect of clitic placement should be mentioned here. Negatives usually stand immediately before the verb, as illustrated in Table 2, but the negative te and its clitic pronoun may instead immediately follow the verb, as illustrated in (3) or, under certain circumstances, occur in intermorphemic (4) or intramorphemic position (5).
Further, under the conditions where we expect intramorphemic or intermorphemic occurrence of clitics, the clitic is occasionally found in enclitic position. This position is not considered ungrammatical, but dispreferred.
Udi is spoken in two dialects. One is found in the village of Vartašen in Azerbaijan and in the village of Zinobiani in Georgia. The other dialect is spoken in the village of Nij (Nic, Nich, Nizh) in Azerbaijan, and it was here that our experiments were carried out.
3. Description of Experiments in Udi
To test the CHH extended to clitics and to learn about the perception and production of clitics in different positions, we ran three experiments with native speakers of Udi. These included processing experiments of two kinds – two types of verification task – and a production task.
3.1. Experiment 1: Perception of existing Udi words and utterances that violate placement rules for Udi clitics – verification task
Experiment 1 was a verification task in which subjects heard a series of experimental items and, for each, decided whether it was a legal verb form (Yes) or a verb form that violates the rules for placement of clitic pronouns (No). The goal was to see whether subjects recognize items with clitics in one position more easily (more quickly and/or more accurately) than in another position. To verify that a stimulus is indeed a legal Udi word, the listener must recognize the item – the input must make contact with the stored lexical item. Thus, any differences in accuracy and/or response time, as a function of clitic position, indicate processing difficulty differences. The CHH says that the suffix preference derives from such processing differences that favor later positions over earlier positions because of the hypothesized privileged status of word onsets. It thus predicts faster/better verification for later-positioned units (suffixes and enclitics) than for earlier ones (prefixes and proclitics).
In experiments of this sort, prior research has shown that performance is influenced by how familiar a word is likely to be for the listeners. In well-researched languages, there are databases available with measures of word frequency that can be used to predict the familiarity of each word. No such data are available for languages like Udi, but research on understudied languages (e.g. Kgolo & Eisenbeiss 2015) shows that native speakers’ estimates of frequency are relatively accurate. Therefore, after they had completed the experiments, we asked two of our subjects to estimate the frequency of verbal lexemes. The averages of their answers are listed with experimental items in Appendices A and E.
3.1.1. Materials
As described above, in Udi, there are two positions for endoclitics. Complex verbs require clitics to occur between the two morphemes of the stem under certain circumstances, whereas under the same circumstances, clitics occur within the single morpheme of the stem of a simplex verb. Among grammatical items, Experiment 1 included 24 with enclitics, 24 with proclitics, 24 with clitics between the morphemes of a complex stem and 24 with clitics inside the single morpheme of the simplex stem. These were balanced by 24 ungrammatical items in each of the four conditions.
In each condition, there were eight each of first-person plural subjects, second-person plural subjects and third-person plural subjects in correct forms and the same number in incorrect forms. Critical items with enclitics included simplex and complex verbs in the future II, one of the tenses that forces clitics into this position. Critical items with proclitics included simplex and complex verbs in the aorist I (a past tense with perfective aspect) with negative te, which immediately precedes the verb and attracts clitics. Critical items with the clitic in one of the endoclitic positions were also in the aorist I, but without a negative.
Incorrect forms were made ungrammatical by breaking rules of PM clitic placement. In the future II, which requires enclitics, endoclitics were used instead. For example, beside the correct form biq’-al-yan ‘we will get’, we constructed the incorrect *bi-yan-q’-al, where the PM yan ‘we’ intrudes in the monomorphemic root, biq’ ‘get’. We expect shorter reaction times and fewer errors with the correct form. The same expectation would extend to the correct complex form ser-b-al-yan ‘we will make’ compared with the incorrect *ser-yan-b-al. In the negative, which favors the proclitic (see options in (3)–(5) above), the PM was instead enclitic to the verb (leaving the negative before the verb). For example, beside the grammatical te-nan bos-i ‘y’all did not throw away’, we presented the ungrammatical *te bos-i-nan. A comparable example with a complex stem is te-nan k’os’-b-i ‘y’all did not bend’ for which we constructed the incorrect *te k’os’-b-i-nan. In the aorist I (without a negative), which requires an endoclitic, clitics were placed between the verb stem and the aorist I suffix, a position which is never grammatical for clitics in Udi. (Enclitic position was not used because it is grammatical, though dispreferred, with this tense–aspect–mood.) For example, parallel to the correct simplex a-t’un-k’-i ‘they saw’, we presented the incorrect *ak’-t’un-i. The correct complex p’uč’ur-t’un-b-i ‘they peeled’ was paralleled by the ungrammatical *p’uč’ur-b-t’un-i. These 96 correct and 96 incorrect critical items are listed in Appendix A, where clitics that occur inside monomorphemic stems are coded as ‘endoclitic A’, and those that occur between the morphemes of a complex stem are coded as ‘endoclitic B’.Footnote 5 Additional examples are given in Table 3.
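The placement contrasts described above can be sketched in a few lines of Python. This is only an illustration of the logic, not the procedure actually used to build the stimuli: the function names are hypothetical, morphemes are supplied already segmented, and a simplex root is passed as a pre-split pair (initial part, final consonant) on the assumption, taken from the examples in the text (bi-yan-q’, a-t’un-k’), that the intramorphemic clitic sits before the root-final consonant.

```python
# Illustrative sketch only: names and data layout are hypothetical,
# not the authors' stimulus-construction procedure.

def enclitic(stem_morphemes, tam_suffix, pm):
    """Place the person marker (PM) after the verb stem and its suffix."""
    return "-".join(list(stem_morphemes) + [tam_suffix, pm])


def intermorphemic(stem_morphemes, tam_suffix, pm):
    """Place the PM between the two morphemes of a complex stem."""
    if len(stem_morphemes) != 2:
        raise ValueError("intermorphemic placement needs a two-morpheme stem")
    first, light_verb = stem_morphemes
    return "-".join([first, pm, light_verb, tam_suffix])


def intramorphemic(root_parts, tam_suffix, pm):
    """Place the PM inside a simplex root.

    root_parts is a pre-split pair (initial part, final consonant);
    the split point is an assumption based on the examples in the text.
    """
    initial, final_consonant = root_parts
    return "-".join([initial, pm, final_consonant, tam_suffix])


# Future II requires an enclitic, so the endoclitic variants are the
# ungrammatical counterparts; in the aorist I the endoclitic is correct.
print(enclitic(["biq’"], "al", "yan"))              # biq’-al-yan
print(intramorphemic(("bi", "q’"), "al", "yan"))    # bi-yan-q’-al (ungrammatical in future II)
print(intermorphemic(["ser", "b"], "al", "yan"))    # ser-yan-b-al (ungrammatical in future II)
print(intramorphemic(("a", "k’"), "i", "t’un"))     # a-t’un-k’-i
```

Passing the root pre-split avoids having to encode a segmentation rule for ejectives such as q’ and k’, which the text does not spell out.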
In addition, 192 fillers, half correct and half incorrect, were used. Grammatical fillers involved a mix of simplex and complex verbs in tense–aspect–mood forms not used for critical items, including hortative (which has no clitic), present tense (which requires an endoclitic position), future I tense (which in this dialect requires enclitic position), zero copula (which requires that the clitic be enclitic to the predicate nominal, also present), aorist II (which requires endoclitic position), singular imperative (which has no clitic) and the -ala subjunctive (which requires enclitic position). Two fillers – one a hortative and one an imperative – included the negative ma (prohibitive). The filler items that required clitics had singular subjects, equally divided among first, second, and third persons. Incorrect items were made ungrammatical by placing the clitic incorrectly. These 192 stimuli are listed in Appendix B.
Experimental items were recorded by our consultant, a native speaker of the Nij dialect, using a Tascam DR-40 recorder and a Shure SM10A microphone in a quiet room in her home. Sound files of individual items were digitally cut from the original recordings using a waveform editor. File onsets and offsets were kept close to the actual sound onsets and offsets to assure accurate reaction time measurements. Amplitudes were adjusted when necessary to have a uniform set of stimuli. Stimuli and the experimental software were transferred to a laptop computer.
As we noted above, our findings with affixes in Georgian and English did not favor either prefixes or suffixes, and Kuehnast’s findings with clitics in Bulgarian favored proclitics; neither result is consistent with the CHH. Our experiments here provide another set of tests of this idea. In addition, based on the insight of Cutler et al. (1985) quoted in the introduction, we hypothesized that endoclitics would be more difficult than proclitics and enclitics, and that, of the two endoclitic types, intramorphemic clitics would be the more difficult.
3.1.2. Method
With the help of our consultant, we were able to recruit 40 participants for our three experiments. Of these, 28 were female, and 12 were male. A few participants declined to complete Experiment 3, but no one was permitted to do Experiment 2 unless they had completed Experiment 1, and no one was permitted to do Experiment 3 unless they had completed Experiment 2 (details are provided below). Experiments were carried out at our consultant’s home, at the participants’ homes, or in a quiet room in a local school during the summer vacation. Participants were offered payment whether or not they completed all three experiments.
Participants heard a series of experimental items through headphones from the laptop computer. On each trial, the participant indicated Yes or No answers by pressing one of two labeled keys – the J key (Yes) or the F key (No) on the computer keyboard. Reaction times were measured from the onset of the speech to when the key was pushed. Presentation of the 384 experimental items was randomized for each participant. Our consultant gave instructions in Udi, including how to press the Yes and No keys, and each participant completed a short practice session consisting of 18 items, including nine correct and nine incorrect items. Participants were instructed to answer as quickly and accurately as possible.
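To make the scoring of a single trial concrete, the following minimal sketch derives correctness and reaction time from the key pressed and two timestamps. The key mapping (J = Yes, F = No) and the measurement from speech onset to key press follow the description above; the `Trial` fields and the function name are assumptions introduced for illustration and do not describe the experimental software that was actually used.

```python
# Hypothetical sketch of trial scoring; field and function names are
# illustrative assumptions, not the actual experimental software.
from dataclasses import dataclass

KEY_TO_RESPONSE = {"j": "yes", "f": "no"}  # J = Yes, F = No, as in the text


@dataclass
class Trial:
    is_legal: bool          # True if the item obeys the clitic-placement rules
    speech_onset_ms: float  # time at which the audio began
    keypress_ms: float      # time at which the key was pressed
    key: str                # key pressed by the participant


def score_trial(trial):
    """Return (correct, reaction_time_ms) for one verification trial."""
    response = KEY_TO_RESPONSE[trial.key.lower()]
    expected = "yes" if trial.is_legal else "no"
    # Reaction time runs from speech onset to the key press.
    reaction_time = trial.keypress_ms - trial.speech_onset_ms
    return response == expected, reaction_time
```

For example, `score_trial(Trial(True, 100.0, 950.0, "J"))` yields a correct response with a reaction time of 850 ms.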
3.1.3. Results and discussion
The results for five participants were excluded because of high error rates. A participant’s data were excluded if average accuracy for either Yes or No (or both) stimuli was at or below chance. The results for the remaining 35 participants were submitted to two analyses of variance (ANOVA), one for the error rates and one for the reaction times. Each ANOVA had a single factor corresponding to the four locations that the clitics could appear in. Figure 1 presents the means for the four conditions (left panel: error rates, right panel: reaction times). Here and in subsequent graphs, we have used ‘simplex’ for intramorphemic clitics and ‘complex’ for intermorphemic clitics.
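The exclusion rule can be stated compactly in code. The sketch below assumes that chance is 50% in this two-alternative task and that each participant's responses are available as lists of correct/incorrect outcomes for Yes items and for No items; both the threshold value and the function names are assumptions made for illustration.

```python
# Illustrative sketch of the exclusion criterion; CHANCE = 0.5 is an
# assumption for a two-alternative (Yes/No) task.
CHANCE = 0.5


def accuracy(outcomes):
    """Proportion of correct responses in a list of booleans."""
    return sum(outcomes) / len(outcomes)


def exclude_participant(yes_outcomes, no_outcomes, chance=CHANCE):
    """True if accuracy on Yes items or on No items (or both) is at or below chance."""
    return accuracy(yes_outcomes) <= chance or accuracy(no_outcomes) <= chance
```

Checking the two response types separately matters because a participant who answers Yes to nearly everything could score well on legal items while being at chance or worse on illegal ones.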
As the left panel of Figure 1 shows, accuracy on the task was very high – the overall error rate was 2.5%. With this compressed range, the small differences among the conditions did not reach significance: F(3,102) = 2.628, p = 0.054. As the right panel shows, the reaction time measure produced a much wider range of performance. The reaction time differences were highly reliable: F(3,102) = 255.124, p < 0.001.
Recall that, extended to clitics, the CHH predicts three differences, given its reliance on widely accepted views regarding spoken word processing. One prediction is that interrupting a stem should be disruptive of processing. This view predicts that the stem-internal clitics should lead to worse performance than the stem-external ones. As Figure 1 shows, the reverse was the case – reaction times were actually faster for the endoclitics than for proclitics or enclitics: F(1,34) = 483.982, p < 0.001. Along the same lines, interrupting a morpheme in a simplex verb was predicted to be more disruptive than placing a clitic between the two morphemes of a complex verb. The data also show a reversal of this prediction, with faster responses to the simplex than the complex endoclitic cases: F(1,34) = 128.363, p < 0.001. The data are, however, consistent with the third prediction of the hypothesis – responses were, in fact, slower when a clitic was placed before the verb than when it came after it: F(1,34) = 127.056, p < 0.001.
Collectively, the results of Experiment 1 are not supportive of the extended CHH: one prediction was confirmed, but in the other two cases, there were actually significant reversals, with significantly worse performance in a condition that was predicted to be better.
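The degrees of freedom reported above are those expected if clitic position is treated as a within-participant factor, which is our reading of the design rather than something stated explicitly. Under that assumption, the sketch below computes the F ratio for a one-way repeated-measures ANOVA from a participants-by-conditions table. The function names are hypothetical, and the small table in the usage note is a toy example, not data from the experiment.

```python
# Minimal sketch of a one-way repeated-measures ANOVA, assuming clitic
# position is a within-participant factor. Function names are hypothetical.

def rm_anova_df(n_participants, n_conditions):
    """Degrees of freedom (effect, error) for a one-way repeated-measures ANOVA."""
    return n_conditions - 1, (n_participants - 1) * (n_conditions - 1)


def rm_anova(scores):
    """Return (F, df_effect, df_error); scores[i][j] is participant i in condition j."""
    n, k = len(scores), len(scores[0])
    grand = sum(sum(row) for row in scores) / (n * k)
    cond_means = [sum(row[j] for row in scores) / n for j in range(k)]
    subj_means = [sum(row) / k for row in scores]
    ss_cond = n * sum((m - grand) ** 2 for m in cond_means)
    ss_subj = k * sum((m - grand) ** 2 for m in subj_means)
    ss_total = sum((x - grand) ** 2 for row in scores for x in row)
    # Error term: variation left after removing condition and participant effects.
    ss_error = ss_total - ss_cond - ss_subj
    df_cond, df_error = rm_anova_df(n, k)
    f_ratio = (ss_cond / df_cond) / (ss_error / df_error)
    return f_ratio, df_cond, df_error


print(rm_anova_df(35, 4))  # (3, 102): four positions, 35 participants
print(rm_anova_df(35, 2))  # (1, 34): a two-level comparison
```

With the toy table `[[1, 2], [2, 4], [3, 3]]` (three participants, two conditions), `rm_anova` returns F = 3.0 with 1 and 2 degrees of freedom.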
3.2. Experiment 2: Perception of Udi nonce words – verification task
One of the challenges in studying language processing, and other cognitive functions, is that humans are so good at the task that they perform at ceiling levels. A general strategy for overcoming this problem is to make the task more difficult than usual – to challenge the system. The strategy we adopt here is to eliminate the subject’s familiarity with the stimuli. Here, we eliminate familiarity by having subjects do the same task as in Experiment 1, but with nonce words – stimuli that follow the phonotactic rules of Udi, but that are not known Udi words.
3.2.1. Materials
Stimuli for this experiment mirrored those of Experiment 1, consisting of (i) nonce words structured correctly and (ii) nonce words structured incorrectly (that is, with the clitics in a position that is incorrect for the context used). For example, (i) p’oš-al-t’un and af-b-al-yan are correctly structured simplex and complex stem verbs in the enclitic condition, and (ii) p’o-t’un-š-al and af-yan-b-al are incorrectly structured nonce words. To set the context, we used real clitics, real tense–aspect–mood suffixes and (with some items) the real default negative. The plural subject markers are CVC in form, and this robust form helps the participant to identify them as subject clitics. We included 30 correct items with enclitics, 30 with proclitics, 30 with endoclitics between the morphemes of complex verb stems and 30 with endoclitics inside the single morpheme of simplex verb stems. We included the same numbers of incorrect forms, made ungrammatical by placing the clitics in positions that are incorrect for the circumstances but not disallowed overall in Udi. These 240 critical items are listed in Appendix C.
Nonce words were based on correct and incorrect real words, and our consultant was asked to verify that each could be pronounced. If not, we modified the proposed item together.
In addition, we used 111 correct and 111 incorrect fillers, including hortative, present tense, future I tense, zero copula, aorist II, singular imperative and the -ala subjunctive. With fillers that take a subject clitic, singulars were used. All fillers for this experiment are listed in Appendix D.
3.2.2. Method
All 40 participants did this experiment, but results for two of these failed to record for technical reasons. This experiment was carried out after Experiment 1, in the same locations, and all participants in this experiment had previously completed Experiment 1. Using the same two response keys as before, participants were asked to indicate whether each item could be an Udi word, even though it is not. The participants first heard a correctly formed masdar (citation form) of the nonce verb, followed by a correctly or incorrectly formed finite form. Because this is a difficult task, we spent a substantial amount of time explaining it and permitted participants to go through the training phase twice if they wished. Each participant completed a training session consisting of 18 items – nine correct and nine incorrect. Other methods were as in Experiment 1.
3.2.3. Results and discussion
As noted above, we explicitly designed Experiment 2 to be difficult by removing any lexical support for the task. Participants did, in fact, find this more abstract task to be quite challenging. We set a criterion of accuracy on correct critical items (i.e. critical items that did follow the correct pattern of clitic placement) of at least 50% overall. Using this cutoff, 24 of the 38 participants were included in the data analyses, with an overall accuracy of over 70% on these items.
The issues and predictions in Experiment 2 are the same as those in Experiment 1. Figure 2 shows the error rates (left panel) and reaction times (right panel) for these nonce stimuli.
The shift to nonce words was successful in making the task much more difficult. Even for the participants who exceeded our threshold for inclusion, error rates were much higher than for real words, and overall reaction times were also longer. For the errors, it is clear that one condition was much worse than the other three. When we interrupted a single morpheme (the simplex nonce verbs), participants rejected the resulting legal form almost half of the time, a rate significantly higher than the other three cases: F(1,23) = 17.988, p < 0.001.
The contrast between the low error rate in all conditions in Experiment 1 (Figure 1), on the one hand, and the much higher error rate with nonce words specifically in the intramorphemic condition in Experiment 2 (Figure 2), on the other, suggests that grammatical forms of real words are stored in memory, rather than derived by rule. There are a total of 18 clitic pronouns in Udi, but any given verb occurs with only one set of six, such as the ones in Table 1 (and only those verbs requiring ones in the table were used in our experiments). In addition, either q’a- subjunctive or te ‘not’ may precede clitic pronouns in endoclitic position. Since each verb can occur with any one of six pronouns, plus any of those with q’a-, or any of the clitic pronouns with te, there are 18 possible clitic insertions in this condition. Given that there are few verbs that permit intramorphemic clitics (about 43), this would mean that the speaker would need to store 18 x 43 forms, or 774 forms. This is well within the capacity of memory (consider the many thousands of different objects that humans can easily recognize, and the tens of thousands of words that an adult knows). On this view, during perception, intramorphemic clitics are not functionally interrupting the simplex verb – the whole form is stored. This explanation could account for the much higher error rate for intramorphemic clitics in Experiment 2 because listeners have never had the opportunity to develop memory representations for these nonce forms; in effect, these really are interrupting the base form, leading to many errors.
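The storage estimate above is simple arithmetic, and a short Python sketch makes the steps explicit. The counts are those given in the text; the variable names are ours and purely illustrative, and the figure of 43 verbs is the approximate count reported above.

```python
# Counts taken from the text; variable names are illustrative only.
PRONOUNS_PER_SET = 6  # clitic pronouns available to any given verb
HOSTS = ("bare", "q'a-", "te")  # clitic alone, with subjunctive q'a-, with negative te
SIMPLEX_VERBS = 43  # approximate number of verbs permitting intramorphemic clitics

# Each verb allows every pronoun with every host option.
insertions_per_verb = PRONOUNS_PER_SET * len(HOSTS)

# Total number of forms a speaker would need to store.
stored_forms = insertions_per_verb * SIMPLEX_VERBS

print(insertions_per_verb, stored_forms)  # 18 774
```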
For the reaction times, recall that we considered three predictions for the real words and only found support for one of them. The same is true for the nonce words. On average, interrupting a root did not slow down processing compared to the average of placing a clitic before or after a word: F(1,23) = 0.604, p = 0.445, n.s. And, interrupting a morpheme did not slow responses compared to placing the clitic between two morphemes (but there was the accuracy difference noted above): F(1,23) = 1.414, p = 0.247, n.s. The prediction that held up, as it did for the real words, was that placing a clitic before a word led to slower responses than placing it afterward: F(1,23) = 36.577, p < 0.001.
The results for the nonce verification task are mixed. On one hand, the CHH prediction that proclitics should be worse than enclitics was supported by the reaction times, as was the prediction that interrupting a morpheme should be disruptive (at least in the accuracy measure). On the other hand, these two effects only showed up in one measure (reaction time for the first, accuracy for the second), and the overall prediction of interrupting a root being problematic did not hold.
3.3. Experiment 3: Production of existing Udi words – generation task
There is a sizable literature that investigates how humans generate spoken language, though the work in this field is disproportionately based on English and Dutch (e.g. Warker & Dell Reference Warker and Dell2015, Ferreira, Morgan & Slevc Reference Ferreira, Morgan, Slevc, Rueschemeyer and Gaskell2018). By manipulating conditions for these productions, and measuring response times, errors and/or neural activity, researchers can learn about the representations and processes that support speech production. We adopt this generation approach in Experiment 3 to investigate whether there are differences in the difficulty of producing words with clitics in different positions.
3.3.1. Materials
In this production experiment, we aurally presented the verb in citation form together with one of three environments to elicit forms with the clitic in a certain position. The citation form was presented in a female voice, and the environment information (see below) was presented in a male voice; the male voice’s onset lagged the female voice by 500 msec. The citation form gives the speaker the verb stem to be used in her response. The environments were established by presenting a negative in some instances and a time adverb designed to elicit a future or a past tense. In addition, an independent pronoun (similar in some persons to the clitic pronoun) was presented to indicate the person of the subject; as in previous experiments, all pronouns associated with the critical items were plural. Aural presentation was used because, while Udi is a written language, few speakers read and write in it, preferring instead Russian or Azeri for written communication. The use of two different voices (one male, one female), and the 500 msec offset between them, made it easier for listeners to perceptually separate the input.
The citation forms included simplex and complex verbs, with 24 of each accompanied by the future (to elicit enclitics) and by negatives (in the past, to elicit proclitics). In addition, citation forms of 24 simplex verbs were accompanied by the past (to elicit intramorphemic endoclitics); citation forms of 24 complex verbs were also accompanied by the past (to elicit intermorphemic endoclitics). These 96 critical items are listed in Appendix E.Footnote 6 In Appendix E, some third-person pronouns are listed as ʂorox (šorox) ‘they’ and others as ʂot’oǧon (šot’oɣon) ‘they’. These are the absolutive and ergative case forms, respectively – the former generally required by intransitive verbs, the latter by transitives.
Table 4 lists the prompts we used for various conditions and sample correct responses. There are two future tenses (see Footnote 3), but in this dialect, both require an enclitic. We expected that incorrect responses would omit the clitic or place it in a position that is inconsistent with the prompts.
In addition to the 96 critical items, there were 96 fillers – 48 items in the present tense and 48 in the future tense. With the fillers, equal numbers of first-, second- and third-person singular pronouns were used. These filler items are listed in Appendix F. Some third-person singular subjects are given as ʂo (šo) ‘he, she, it’ and others as ʂot’in (šot’in) ‘he, she, it’. The former is the absolutive case form, and the latter the ergative.
3.3.2. Method
Of the 40 participants from the prior experiments, 35 completed this experiment, but data from one of these were lost due to technical problems. The data from two participants were eliminated because of high error rates, leaving a final sample of 32 participants.
Presentation of the 192 experimental items (96 critical items and 96 fillers) was pseudo-randomized across two groups, with one having the experimental items in reverse order with respect to the other. Fifteen participants completed the A version of the production task, and 20 completed the B version. A practice session consisted of nine items.
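The two-version design can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the procedure actually used: the text does not say how the pseudo-randomization was done, so a seeded shuffle stands in for it, and the function name `make_orders` and the item labels are hypothetical.

```python
import random


def make_orders(items, seed=0):
    """Return two presentation orders: a shuffled list (A) and its reverse (B).

    Assumption: a plain seeded shuffle stands in for whatever constraints
    the original pseudo-randomization applied.
    """
    rng = random.Random(seed)
    order_a = list(items)
    rng.shuffle(order_a)
    order_b = order_a[::-1]  # version B is version A in reverse order
    return order_a, order_b


# Hypothetical labels for the 96 critical items and 96 fillers.
items = [f"critical_{i:02d}" for i in range(96)] + [f"filler_{i:02d}" for i in range(96)]
order_a, order_b = make_orders(items)
print(order_a[0] == order_b[-1])  # True
```

Reversing one list relative to the other means that any item presented early to one group is presented late to the other, so position effects are balanced across groups.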
The first item in each of the two pseudo-randomized groups had to be omitted from the count since a hardware error prevented participants from hearing these items. The two omitted items were both fillers, so the numbers of critical items were not affected.
3.3.3. Results and discussion
Each production was matched with the key produced by our consultant (keys are listed in Appendices E and F). If the two were the same, ‘y’ was scored. If the two differed, the response was transcribed. Repetition of part of the prompt with a correct response was not scored as an error. We counted as incorrect those responses with verb stem, tense, or person or number of the clitic not corresponding to the prompts, a negative when it had not been prompted, or no negative when it had been prompted. A response was considered correct if it had a verb stem, tense and person–number combination of the clitic corresponding to the prompts, as well as a negative if and only if prompted. In complex verbs, a stem error was scored whether it was the incorporated element (usually noun or adjective) or the light verb that was in error. Use of a causative of the correct verb was also scored as a stem error. There are many pairs differing only by light verb, such as tam-bak-e ‘she/he/it was fulfilled’ and tam-b-e ‘she/he/it fulfilled’; use of one of these when the other was prompted was scored as a stem error. Either of the future tenses was deemed correct in response to the ‘tomorrow’ prompt; in this dialect, both require enclitic placement. Either the aorist I or the aorist II was accepted as a valid answer with the past tense prompt, and the more complex past tenses were also accepted. All of these condition the same (endoclitic) placement. Several other minor variants were accepted as correct, since the prompts indicated only ‘yesterday’, ‘right now’, and ‘tomorrow’. We interpret errors in stem, tense, person–number and negation as a reflection of the difficulty encountered by a participant in generating the required form. Across all conditions, there were a total of 211 tense errors, compared with 153 stem errors, 363 negation errors and 454 errors in person or number.
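The scoring rule described above can be summarized as a comparison of four features. The Python sketch below is only an illustration: the `Form` record, its field names and the coding of tense as a broad category (so that either future, or any accepted past tense, counts as a match) are our assumptions, not the coding scheme actually used.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Form:
    """Hypothetical coding of a prompt or a response."""
    stem: str           # verb stem, including the light verb in complex verbs
    tense: str          # broad category: "past", "present" or "future"
    person_number: str  # person and number of the clitic, e.g. "2pl"
    negative: bool      # whether a negative is present


def score(response, target):
    """Return the list of error types; an empty list means 'correct'."""
    errors = []
    if response.stem != target.stem:
        errors.append("stem")
    if response.tense != target.tense:
        errors.append("tense")
    if response.person_number != target.person_number:
        errors.append("person-number")
    if response.negative != target.negative:  # negative if and only if prompted
        errors.append("negation")
    return errors


target = Form(stem="bap", tense="past", person_number="2pl", negative=True)
response = Form(stem="bap", tense="past", person_number="2pl", negative=False)
print(score(response, target))  # ['negation']
```

Clitic position is deliberately left out of the comparison, since grammatical alternative placements were not counted as errors.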
Participants seldom placed a clitic in an ungrammatical position, omitted a clitic or doubled a clitic. In the fillers, there is one example of a doubled clitic and one omitted clitic. In the enclitic condition, every response that was otherwise error-free had the clitic in enclitic position. In the condition in which the clitic is expected to precede the verb, some responses (92 out of 302 correct responses) placed the clitic and its host negation after the verb (e.g. bap-i te-nan instead of te-nan bap-i ‘you have not arrived’), but this is permitted by Udi grammar (see discussion in Section 2). In the same condition, a few responses (seven out of 302 correct responses) placed the negative and clitic together in endoclitic position (e.g. port-te-nan-b-i instead of te-nan port-b-i ‘you were not patient’); again, this is permitted by the grammar, though it is not often used (see discussion in Section 2). There was no response in this condition that put the clitic alone in enclitic position (e.g. *te bap-i-nan, *te port-b-i-nan) or endoclitic position (e.g. *te ba-nan-p-i, *te port-nan-b-i), either of which would be ungrammatical. In the two endoclitic conditions, some responses put the clitic not in the expected endoclitic position but in enclitic position (e.g. həzir-b-i-yan instead of the expected həzir-yan-b-i ‘you prepared it’), which is also grammatical. We cannot count these as incorrect, even though under normal circumstances, a speaker does not say this.Footnote 7 These results are summarized in Table 5. No response had a proclitic unless a negative also preceded the verb to trigger this position. No response had a clitic in intramorphemic position in a complex verb. It is logically impossible to place a clitic between the morphemes of a stem in the intramorphemic condition – that is, with a simplex verb.
Recall that the negative–clitic combination sometimes follows the verb or occurs in intramorphemic or intermorphemic position. These positions are not ungrammatical, just infrequent. In Table 5, the row for condition 2 shows that about 30% of correct negatives were actually enclitic to the verb – more than expected based on this position in texts and conversation. The table also shows about 2% of correct negatives occurred in endoclitic position. Enclitics are infrequent in conversation under the conditions where endoclitics (both intramorphemic and intermorphemic) are expected, but in this production experiment, the rows for conditions 3 and 4 show that the enclitic position is more common than the preferred position. Together, the results for conditions 2, 3 and 4 show an unexpected frequency of the enclitic position in conditions that ordinarily are associated with a different position. From the point of view of production, the enclitic position appears to be the default.
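The two percentages for condition 2 follow directly from the response counts reported earlier (92 and seven out of 302 correct responses). A brief Python check, with illustrative variable names:

```python
# Counts reported in the text for the condition with a prompted negative.
correct_responses = 302
after_verb = 92   # negative plus clitic placed after the verb
endoclitic = 7    # negative plus clitic placed in endoclitic position

share_after_verb = after_verb / correct_responses
share_endoclitic = endoclitic / correct_responses

print(f"{share_after_verb:.1%}")  # 30.5%
print(f"{share_endoclitic:.1%}")  # 2.3%
```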
It proved to be very difficult for participants to keep in mind the verb that they were given, put it in the tense that corresponds to the adverb that they heard, use the clitic appropriate to the pronoun they heard, in some instances add a negative, and still put the clitic in the expected position. Scores were therefore generally fairly low.
The left panel of Figure 3 presents the error rates for the four conditions for the 32 usable participants. As the figure shows, error rates for clitics after the verb and for those within the stem (both intramorphemically and intermorphemically) were essentially the same (approximately 27%). In contrast, for items where the clitic was expected before the stem, error rates were extremely high (approximately 61%), leading to a robust effect of condition: F(3,93) = 29.214, p < 0.001.
In scoring the individual-participant data, it became apparent that for a subset of the participants, trials involving a negative generated a disproportionate share of the errors. It may be that under the challenging conditions of the task, some participants treated negation as the most dispensable aspect. To check whether this might overly influence the overall pattern of results, we analyzed the results for the 22 participants who do not show this tendency. The right side of Figure 3 shows the results for these 22 ‘clean’ participants. As is clear, although removing a subset of the participants brought down the error rate for items with proclitics (which includes the negatives), there is still a major difference among the conditions: F(3,63) = 9.638, p < 0.001.
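The degrees of freedom in the two F tests above change with the number of participants. Assuming the tests are one-way repeated-measures ANOVAs across the four conditions (the text does not name the test, so this is our assumption), the values can be reproduced as follows; the function name `rm_anova_df` is hypothetical.

```python
def rm_anova_df(n_participants, n_conditions):
    """Degrees of freedom for a one-way repeated-measures ANOVA.

    Effect df = k - 1; error df = (k - 1) * (n - 1).
    """
    df_effect = n_conditions - 1
    df_error = (n_conditions - 1) * (n_participants - 1)
    return df_effect, df_error


print(rm_anova_df(32, 4))  # (3, 93)
print(rm_anova_df(22, 4))  # (3, 63)
```

Under that assumption, the full sample of 32 participants yields F(3,93) and the subset of 22 yields F(3,63), matching the reported values.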
The results for the Udi production task are simple to summarize. The participants had difficulty with items that required clitics to be placed in the earliest position. The other three cases were easier for them and did not differ from each other. This is consistent with the CHH. The original CHH was based on theories of spoken word recognition that emphasize the importance of word onsets for making contact with the mental lexicon. We do not expect access in the mental lexicon to be problematic when the task does not involve word recognition; the lexical item was supplied in the production experiment. The poor performance here for proclitic items suggests that the preferences for suffixes over prefixes might be grounded in some other aspect of the system. The equivalent performance on endoclitic items and enclitic items shows that just as for the perception experiments, participants do not find it problematic to break up the verb, either intermorphemically or intramorphemically.
3.4. General discussion of Udi experiments
We chose to test the CHH in Udi because it offered the most complete set of clitic positions. In particular, the same clitics can occur before a verb, after a verb and within verbs both intramorphemically and intermorphemically. Across the three experiments, the results were quite mixed with regard to predictions of the hypothesis. On the positive side, for both real and nonce Udi words, verification response times were slower when a clitic preceded a verb than when it followed a verb. The comparable disadvantage for producing Udi utterances when a proclitic is required might initially seem like additional support. However, the original hypothesis was framed as a consequence of disrupting lexical access during word recognition, and during production, it is not clear that the difficulty could be seen in this way – the participant is given the base verb on each trial.
The tests of predictions for clitics that break up a stem – endoclitics – were uniformly unsupportive of the CHH. With real words, response times were actually faster for items with these internal elements than for ones in which the clitics were external. Similarly, for the real words, the stimuli that should have been most disruptive – clitics that break up a morpheme – were actually responded to most quickly. The corresponding results for the nonce stimuli were also not as a processing account would predict, with internal clitics leading to no slower responses than external ones, and intramorphemic clitics being no more disruptive than intermorphemic clitics.
In the introduction to this article, we quoted from Cutler et al. (Reference Cutler, Hawkins and Gilligan1985), who suggested a sensible reason for infixes and, by extension, endoclitics being difficult to process and produce. We quote part of their statement again here, focusing on a caveat that they included:
Of course, if a stem has been effectively recognized by the time its uniqueness point has been processed, one might argue that infixing a morpheme between the uniqueness point and the end would provide all the continuity necessary (since the end should be irrelevant), AND get the important affixed information in at the earliest possible useful point, i.e. just when the word has been recognized. (Cutler et al. Reference Cutler, Hawkins and Gilligan1985: 752).
The authors (p. 739) define UP, saying ‘the point in the word at which all other members of the initial cohort have dropped out is called the word’s uniqueness point’. To explain it, they use the word dwindle, where the UP is [ɪ], since no other word in English begins with the sequence [dwɪ]. The word word is used in two ways in linguistics – it can refer to a lexeme or a word form. In English, library and libraries are distinct word forms belonging to one lexeme. In Udi, (8) and (9) are distinct word forms of one lexeme, ak’ ‘see’.
If we interpret the UP based on the word form, it falls at [z] in (8), given ak’-al-nu ‘you will see’.Footnote 8 The UP is at [i] in (9), given a-z-k’-e ‘I have seen’. In fact, on this interpretation, the UP in Udi will always fall on the first sound segment of the last suffix or clitic. The same is true with complex stems, given forms such as zom-bak-al-zu ‘I will study’ and zom-ez-bak-i ‘I studied’, where the UPs are [z] and [i], respectively. In that case, in Udi it would be true of every verb form that the ‘stem has been effectively recognized by the time its uniqueness point has been processed’, yet only certain verb forms permit endoclitics.
However, if we interpret the UP based on the lexeme, it falls at [k’] in (8)–(9), given ač-al-zu ‘I will get lost’ and others. Here, we must process the UP itself before the stem can be recognized, but in one example, there is nevertheless an endoclitic, and in the other, an endoclitic is not permitted. Thus, neither interpretation of the UP accurately predicts permitted use of an endoclitic.
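The definition of the uniqueness point quoted above can be made concrete with a short Python sketch. It is an illustration under simplifying assumptions: the function name `uniqueness_point` is ours, the toy lexicon is invented, and ordinary spelling stands in for a phonemic transcription.

```python
def uniqueness_point(word, lexicon):
    """Return the 0-based index of the segment at which all other members of
    the initial cohort have dropped out, or None if the word never becomes
    unique (for example, when it is a prefix of another entry)."""
    others = [w for w in lexicon if w != word]
    for i in range(len(word)):
        prefix = word[: i + 1]
        # The cohort is the set of other entries still sharing this onset.
        if not any(w.startswith(prefix) for w in others):
            return i
    return None


# Invented toy lexicon; spelling is used in place of phonemic forms.
toy_lexicon = ["dwindle", "dwell", "dwelling", "dwarf", "wind"]
up = uniqueness_point("dwindle", toy_lexicon)
print(up, "dwindle"[up])  # 2 i
```

The same function can be applied to either interpretation discussed here by changing what the lexicon contains: full word forms for the first interpretation, or one entry per lexeme for the second.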
The observation of Cutler et al. (Reference Cutler, Hawkins and Gilligan1985) seems entirely reasonable. Nevertheless, one is left wondering how the listener understands intramorphemic clitics at all, let alone more efficiently than clitics in any other position. The uniqueness point is not the answer. As discussed in Section 3.2.3, a different possibility is that speakers understand intramorphemic clitics in Udi because they have stored the relevant sequences in memory.
Taken together, the results of the three Udi experiments provide little support for the Cutler–Hawkins processing account of the suffix advantage. Of course, the results from a single language may be skewed by (unknown) idiosyncratic properties of that language. Indeed, one reason we undertook the current study was to increase the range of languages for testing the CHH beyond the affix-based experiments we conducted in English and Georgian (Harris & Samuel Reference Harris and Samuel2025). We continue this effort in the following sections, reporting on results for similar tests in EP. Although EP does not offer the distinction between intramorphemic and intermorphemic clitics, it does allow us to compare clitics that occur before the verb to those that occur after, and to compare clitics that occur internally (endoclitics) to those that are external to the verb.
4. Clitics in European Portuguese
In EP, the clitic pronoun represents the direct or indirect object, and the two can be combined.Footnote 9 The default position for clitics in EP is enclitic. If there is a negative, a question word, a fronted focus phrase or certain other features, clitics occur proclitic to the verb, preceded by these features. In future tense and conditional, clitics are mesoclitic,Footnote 10 occurring between the verb stem (resembling the infinitive form) and the future or conditional suffix, which also indicates the person and number of the subject. Clitics distinguish all six person–number combinations, a dative vs. accusative distinction with all these variations and a masculine vs. feminine distinction in the third-person accusative. Dative clitic pronouns are shown in Table 7, and accusative in Table 8.
Since dative and accusative may both occur in a clause, we get clitic clusters, shown in Table 9. Tables 7–9 are based on the tables in Spencer & Luís (Reference Spencer and Luís2012: 205).
The gender and number of the dative are neutralized in clitic clusters; that is, the forms in the lhe row of Table 9 encode either ‘to him/her/it’ or ‘to them’. EP also has reflexive clitics, which are important in the language but occur only in fillers in our experiments. We therefore do not discuss them further here. EP clitic allomorphy affects only third-person accusative clitics (o, a, os, as):
As described above, EP differs from Udi in that, among other things, EP has only a single position for mesoclitics. The questions of whether either language has true proclitics or true endoclitics are controversial.
5. Description of Experiments in EP
The EP experiments were designed to be parallel to those that we had carried out on Udi.
5.1. Experiment 4: Perception of existing EP words and utterances that violate placement rules for EP clitics – verification task
5.1.1. Materials
Experiment 4 included 24 grammatical items in each condition – that is, with the clitics in each of the three positions. See Table 6 above for examples of legal placement for a clitic in each position. Correct critical items with enclitics and proclitics had verbs in the present, imperfect or perfect. For example, mostramos-te ‘we show you’ is in the present tense, with a first-person plural subject, -mos, and an enclitic second-person singular indirect object, -te. Those with proclitics occurred with negative não, which precedes the verb and ordinarily causes clitics to immediately precede the verb. In the item não lhe comunicaste ‘you have not communicated to him/her’, the negative não causes the clitic lhe ‘to him/her’ to precede the verb, comunicaste, in the perfect, with a second-person singular subject, -ste. Correct critical items with the clitic in the mesoclitic position were in the future tense (without a negative). For example, fritá-los-emos ‘we will fry them’ has the third-person plural masculine accusative clitic los in mesoclitic position between the stem, fritá ‘fry’, and the first-person plural future suffix -emos. Additional examples are given in Table 10.
Subjects included all six person–number combinations; in EP, these are expressed as suffixes, which express at the same time the tense–aspect–mood. A wide variety of clitics was used, including all person–number combinations, both feminine and masculine, and both dative and accusative. Reflexive clitics and clitics requiring allomorphic change were avoided in the correct forms of critical items.
Critical items also included 24 ungrammatical items in each condition. Incorrect forms were made ungrammatical by (i) using a mesoclitic where an enclitic or proclitic is required (e.g. *mostra-lhe-mos ‘we show him’ instead of the correct mostramos-lhe), (ii) using true allomorphic variants of clitics (lo, la, los, las, variants of o, a, os, as) but in contexts where they are inappropriate (e.g. *beijá-los-rás ‘you will kiss them’ instead of the correct beijá-los-ás) and (iii) inserting clitics in a position that is never possible in Portuguese – namely, between the stem and the r of the infinitive (instead of between the infinitive and the ending, which expresses tense and person–number of the subject). An example of (iii) is *dominá-vos-rei instead of the correct dominar-vos-ei ‘I will dominate y’all’, where the suffix -ei encodes first-person singular subject and future tense. These 72 correct and 72 incorrect critical items are listed in Appendix G. To these critical items, 72 correct and 72 incorrect fillers were added. With the fillers, we tried to make the totality of the experimental items more like the language, using patterns that had not been used in the critical items. For example, some fillers used intransitive verbs, which do not use a dative or accusative clitic. Some fillers involved the conditional (which requires a mesoclitic), avoided for critical items. Other tense–aspect–mood categories were also used, such as imperative. In some fillers, we used the n allomorphs (no, na, nos, nas, allomorphs of o, a, os, as), which were avoided in the critical forms. For example, based on the infinitive destruir ‘destroy’, we have the grammatical filler destruiam-nos ‘they were destroying them’, where the third-person plural masculine clitic os is realized as nos because of its proximity to [m]. Finally, the reflexive clitic se was avoided in the critical items but was included in some fillers (e.g. decidiram-se ‘they decided’).
Some items have more than one of these four characteristics (e.g. dormiríamos ‘if we sleep’, which is both intransitive and conditional). Incorrect fillers were formed by changing the order of morphemes, placing clitics between morphemes of the verb where they never actually occur or placing n allomorphs in positions in which these allomorphs never occur. A variety of subjects was used, as well as a variety of clitics. No verbs were used in more than one category. These 144 stimuli are listed in Appendix H.
5.1.2. Method
A total of 49 participants were recruited for our three experiments. Of these, 34 appeared to be female and 15 male. All had been prescreened as native speakers. Because the mesoclitic is falling out of use and younger speakers avoid it,Footnote 11 we invited participants who were older. They were staff members of the University of Coimbra. Experiments were carried out in a quiet office at the University. Participants were compensated.
To the extent possible, the method was the same as that used for Experiment 1. Presentation of the 288 experimental items was randomized for each participant. Instructions were given in Portuguese. Each participant completed a short practice session consisting of 17 items, including nine correct and eight incorrect items. They were instructed to answer as quickly and accurately as possible.
5.1.3. Results and discussion
Data from one participant were not included due to extremely high error rates on the items that called for a No response. Figure 4 presents the error rates and response times for the remaining 48 participants for the three types of items. These results bear on two predictions that follow from extending the CHH to clitics: clitics that occur at the beginning of a word should be more disruptive than those at the end, and breaking up a stem (mesoclitics) should be more disruptive than placing a clitic before or after an intact word. Inspection of Figure 4 indicates that there was a clear speed–accuracy trade-off, undercutting any clear result that might support the predictions.
Overall, there were significant differences in error rates, with the most errors occurring for the mesoclitic items: F(2,94) = 17.611, p < 0.001. Conversely, for reaction times, the robust effect reflected faster responses for the mesoclitics: F(2,94) = 17.243, p < 0.001. Looking specifically at whether mesoclitics are more disruptive than clitics outside the word, the error data are supportive: F(1,47) = 24.701, p < 0.001. However, reflecting the speed–accuracy trade-off, the reaction time data show a significant advantage for the mesoclitics: F(1,47) = 18.965, p < 0.001.
The focused tests comparing proclitics to enclitics yield the same conflict between error rates and response times. Participants were a bit more accurate when the clitic preceded the word than when it followed – a marginal difference statistically: F(1,47) = 3.407, p = 0.071. But the response times showed the opposite pattern: F(1,47) = 15.371, p < 0.001. Given the conflicting results across the two measures, the data do not provide any clear evidence in favor of (or against) the CHH.
5.2. Experiment 5: Perception of EP nonce words – verification task
5.2.1. Materials and method
We ran an experiment with nonce words for the same reason as in Udi – nonce items are more challenging and can reveal processing that may be difficult to see when people are dealing with familiar stimuli. The task was structured to be parallel to both Udi Experiment 2 and EP Experiment 4 and involved stimuli consisting of nonce verbs, real infinitive endings, real tenses, real subjects and real clitics. For example, the correctly constructed nonce item apitilaste-me consists of the nonce verb stem apitila- (from the nonce infinitive apitilar with the real infinitive ending -ar) and the real second-person singular perfect suffix -ste and occurs with the real first-person singular clitic -me. Grammatical stimuli were correctly structured, and an equal number of ungrammatical stimuli were incorrectly structured, with respect to morpheme positions. For example, the incorrectly structured *apitila-lha-ste has the real dative/accusative third-person singular feminine clitic lha in a position where no clitic occurs in EP – namely, between the stem and the perfect suffix. We made sure that no nonce forms violated the phonological norms of EP and that all were pronounceable.
Items that were incorrectly structured were made incorrect in the same ways real words were made incorrect for Experiment 4, by (i) placing the clitic incorrectly for the context, (ii) using the lo, la, los and las allomorphs of third-person clitics in contexts where they are inappropriate and (iii) inserting clitics in a position that is never possible in Portuguese – namely, between the stem and the r of the infinitive. The subjects of nonce verbs were balanced for person and number, and the clitics were balanced for person, number and case (dative or accusative). These 144 correct and incorrect critical items are listed in Appendix J.
We accompanied these with 144 fillers, again using patterns that had not been used in constructing the critical items. Fillers again included intransitive verbs, verbs in the conditional and imperative, verbs with the n allomorphs (no, na, nos, nas) and verbs with the reflexive clitic se. (Intransitive nonce words have no accompanying clitics.) Incorrect fillers again were formed by changing the order of morphemes, placing clitics between morphemes of the verb where they never actually occur, or placing n allomorphs in positions in which these allomorphs never occur. A variety of subjects was used, as well as a variety of clitics. These items are listed in Appendix K. The method was as in Experiment 4.
5.2.2. Results and discussion
Recall that a core motivation for using nonce stimuli is that such stimuli can challenge the system sufficiently to allow us to detect differences that may be hard to find when processing proceeds extremely quickly and accurately. In fact, six of the participants produced chance-level accuracy on the legal and/or illegal forms. Their data were not included in the analyses reported here. The data file from one other participant was lost due to a technical problem, leaving 42 participants in the analyses.
Even for the remaining participants, as a comparison of Figures 4 and 5 indicates, the nonce stimuli did indeed increase errors and slow down responses. Surprisingly, although errors went up substantially for stimuli with clitics before or after the word, accuracy was not impaired for the mesoclitic items. The error rates across the three conditions were reliably different: F(2,82) = 40.065, p < 0.001. The response times, although noticeably slower than for the real words, followed the same pattern, with slow responses to items with a proclitic and faster responses to those with mesoclitics: F(2,82) = 42.434, p < 0.001.
For the specific question of whether a mesoclitic is disruptive, Figure 5 makes it clear that exactly the reverse was true. There were fewer errors for mesoclitics than for items with clitics before or after the word: F(1,41) = 103.165, p < 0.001. Similarly, the mesoclitic stimuli also yielded faster responses: F(1,41) = 47.433, p < 0.001. The specific tests comparing proclitics versus enclitics produced the same speed–accuracy trade-off found for the real words. Nonce items with a proclitic produced fewer errors than those with enclitics: F(1,41) = 9.932, p = 0.003, but the reaction times showed an advantage for the enclitics: F(1,41) = 35.107, p < 0.001. Thus, as with the real words, the results do not provide any clear support for the CHH.
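The computations behind the omnibus tests are not spelled out in the text, but the reported degrees of freedom are consistent with a one-way repeated-measures design over the three clitic positions: with 42 participants, (3 − 1) = 2 and (3 − 1)(42 − 1) = 82. The following is a minimal sketch of such a computation, assuming one score per participant per condition; the function name and the toy data are hypothetical and are not the study's data.

```python
def rm_anova(scores):
    """One-way repeated-measures ANOVA (illustrative sketch).

    scores: one row per participant, one value per condition.
    Returns (F, df_effect, df_error).
    """
    n = len(scores)       # number of participants
    k = len(scores[0])    # number of conditions (here: clitic positions)
    grand = sum(sum(row) for row in scores) / (n * k)
    cond_means = [sum(row[j] for row in scores) / n for j in range(k)]
    subj_means = [sum(row) / k for row in scores]

    # Partition the total sum of squares into condition, participant and error.
    ss_cond = n * sum((m - grand) ** 2 for m in cond_means)
    ss_subj = k * sum((m - grand) ** 2 for m in subj_means)
    ss_total = sum((x - grand) ** 2 for row in scores for x in row)
    ss_error = ss_total - ss_cond - ss_subj

    df_effect = k - 1
    df_error = (k - 1) * (n - 1)
    f_value = (ss_cond / df_effect) / (ss_error / df_error)
    return f_value, df_effect, df_error


# Hypothetical toy data: 3 participants x 3 conditions.
toy = [[1, 2, 3], [2, 4, 6], [3, 3, 3]]
print(rm_anova(toy))  # (3.0, 2, 4)
```

With 42 rows instead of 3, the same function returns the degrees of freedom (2, 82) reported for Experiment 5.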
5.3. Experiment 6: Production of existing EP words – generation task
5.3.1. Materials
The production experiment was similar to Experiment 3 in Udi, except for the presentation modality of the environments. In Portuguese, we presented the verbs in aural form and their environments in written form, gaining the advantages of cross-modal priming identified in previous research (e.g. Marslen-Wilson et al. 1994). In Udi, we did not have that option (see Section 3.3.1). In EP, the verb was presented aurally in the infinitive form, and one of three environments was presented visually on the laptop screen to elicit forms with clitics in certain positions, as summarized in Table 11.
In all environments, PRONOUN represents any one of the possible object pronouns. On the screen, these were presented as in Figure 6.
Twenty-four items were presented to prompt a proclitic, 24 to prompt an enclitic and 24 to prompt a mesoclitic. These 72 critical items are listed in Appendix L. Fillers were not used. The practice session had 17 items.
5.3.2. Method
The 72 critical items were pseudo-randomized to create one sequence of trials. This sequence was reversed to create a second pseudo-randomized list. Forty-six participants completed the production experiment, with 23 receiving one list (Group A) and 23 receiving the other list (Group B). The data from five of the participants were not used because of very large numbers of errors, leaving 21 in Group A and 20 in Group B.
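The construction of the two lists can be sketched as follows. This is only an illustration: the item labels, the fixed seed and the use of an unconstrained shuffle are assumptions, since the text does not state which constraints the pseudo-randomization observed.

```python
import random

CONDITIONS = ("proclitic", "enclitic", "mesoclitic")
ITEMS_PER_CONDITION = 24  # from the text: 24 prompts per clitic position


def build_lists(seed=0):
    """Return (list_a, list_b): one shuffled sequence and its reversal."""
    # Hypothetical item labels; the actual items are listed in Appendix L.
    items = [(cond, i)
             for cond in CONDITIONS
             for i in range(1, ITEMS_PER_CONDITION + 1)]
    rng = random.Random(seed)  # fixed seed so the sketch is reproducible
    list_a = items[:]
    rng.shuffle(list_a)        # assumption: plain shuffle, no ordering constraints
    list_b = list_a[::-1]      # the second list is the first one reversed
    return list_a, list_b


list_a, list_b = build_lists()
```

Reversing the sequence means that an item presented early to Group A is presented late to Group B, so any effect of position in the session is balanced across the two groups.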
Participants were instructed to express something in the past when they heard the prompt ontem ‘yesterday’, and similarly for the other tense prompts. They were instructed on how to respond to the não ‘not’ prompt and to the pronouns as prompts for the clitics. We wanted all responses to have third-person singular subjects, but as we did not know a way to request this without using explicit linguistic terminology, we relied on modeling this in examples we gave in our explanation of the method. We recognize that this opens the possibility that a participant might be more creative and use a variety of subjects. We discuss this problem further below.
5.3.3. Coding and analyses
Each production was transcribed and matched with the keys listed in Appendix L. Answers with a repetition or correction were counted as incorrect. Also counted as incorrect were responses in which the verb stem, the tense, the person or number of the subject suffix, or the person or number of the object clitic did not correspond to the prompts, as well as responses with a negative that had not been prompted or without a negative that had been prompted. For clitics, an error was scored if the clitic that had been prompted was omitted. A response was considered correct if its verb stem, tense and the person–number combination of the clitic corresponded to the prompts and if it contained a negative if and only if one had been prompted.
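The scoring criteria above can be expressed as a small checking routine. The sketch below is an illustration under assumed data structures: the `Form` fields, the error labels and the example values are hypothetical and do not reproduce the coding scheme or the items actually used.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass(frozen=True)
class Form:
    """Hypothetical coding of a key or of a transcribed response."""
    stem: str
    tense: str
    subject: str                  # person-number of the subject, e.g. "3sg"
    clitic: Optional[str]         # person-number of the object clitic; None if omitted
    negative: bool
    self_corrected: bool = False  # response contained a repetition or correction


def error_types(key: Form, response: Form) -> List[str]:
    """List the ways a response departs from its key (empty list = correct)."""
    errors = []
    if response.self_corrected:
        errors.append("repetition/correction")
    if response.stem != key.stem:
        errors.append("stem")
    if response.tense != key.tense:
        errors.append("tense")
    if response.subject != key.subject:
        errors.append("subject agreement")
    if response.clitic is None:
        errors.append("clitic omitted")
    elif response.clitic != key.clitic:
        errors.append("clitic person/number")
    if response.negative != key.negative:
        errors.append("negation")  # negative present iff prompted
    return errors


# Made-up example: the key calls for a 3sg subject, the response uses 1sg.
key = Form(stem="lav", tense="past", subject="3sg", clitic="1sg", negative=False)
resp = Form(stem="lav", tense="past", subject="1sg", clitic="1sg", negative=False)
print(error_types(key, resp))  # ['subject agreement']
```

Returning the list of error types, rather than a single correct/incorrect flag, makes it possible to identify responses whose only error is in subject agreement.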
The first item in each of the two pseudo-randomized stimulus lists had to be omitted from the count since a hardware error prevented participants from hearing these items. One eliminated item was from the enclitic condition, and one was from the mesoclitic condition, leaving 23 in each. This resulted in a total of 2,870 responses to experimental items.
Applying the criteria stated above, we found 1,965 correct answers. There were 134 agreement errors, including 92 responses whose only error was in subject agreement.
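The reported totals can be checked with simple arithmetic. The sketch assumes that both affected items were dropped for every participant, which is the reading consistent with the reported total of 2,870; the variable names are illustrative.

```python
participants = 21 + 20                # Group A + Group B after exclusions
items_per_participant = 24 + 23 + 23  # proclitic + enclitic + mesoclitic items kept

total_responses = participants * items_per_participant
correct = 1965                        # correct answers reported in the text

print(total_responses)                                # 2870
print(round(100 * correct / total_responses, 1))      # 68.5 (percent correct)
```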
5.3.4. Results and discussion
We present the results in two ways. The left side of Figure 7 shows the error rates for producing items with clitics in the three possible positions, counting as an error the use of anything other than a third-person singular subject. The right side shows the corresponding data when agreement errors were not included in the computations (see Footnote 12). In both cases, the pattern is very clear: producing items with a proclitic was most accurate, and producing those with mesoclitics was least accurate. These differences were quite reliable in both cases (full set: F(2,80) = 13.040, p < 0.001; omitting agreement errors: F(2,80) = 12.736, p < 0.001).
For the specific question of whether placing a clitic before the word (disrupting its onset) is more disruptive than placing one after the word, the results are just the opposite: clitics before the word yielded more accurate responses than those occurring after. This reversal was significant both for the full data set (F(1,40) = 5.031, p = 0.030) and for the set omitting agreement errors (F(1,40) = 8.500, p = 0.006). As the patterns in Figure 7 suggest, producing stimuli that interrupted a stem (mesoclitics) led to more errors than producing the other two types, both for the full data set (F(1,40) = 17.564, p < 0.001) and for the set excluding agreement errors (F(1,40) = 15.087, p < 0.001). Thus, the production results confirm the prediction that interrupting a stem with a clitic is more difficult than merely placing one before or after the word. However, the production results run counter to the view that clitic placement before a word is more difficult than placement after one.
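The focused tests have one numerator degree of freedom and, with 41 participants, 40 denominator degrees of freedom. One standard way to obtain a test of this form is to compute a contrast score for each participant and test whether its mean differs from zero, in which case F(1, n − 1) equals the square of the one-sample t statistic. The sketch below illustrates that approach; it is an assumption that the tests reported here were computed this way, and the function name, contrast weights and toy error proportions are hypothetical.

```python
# Condition order assumed throughout: (proclitic, enclitic, mesoclitic).
PRO_VS_ENC = (1, -1, 0)               # proclitic versus enclitic
MESO_VS_EXTERNAL = (-0.5, -0.5, 1)    # mesoclitic versus mean of the external clitics


def contrast_f(scores, weights):
    """F(1, n-1) for a within-participant contrast (illustrative sketch).

    scores: one row per participant, one value per condition.
    weights: contrast weights that sum to zero.
    """
    n = len(scores)
    # One contrast score per participant.
    c = [sum(w * x for w, x in zip(weights, row)) for row in scores]
    mean_c = sum(c) / n
    var_c = sum((v - mean_c) ** 2 for v in c) / (n - 1)
    f_value = mean_c ** 2 / (var_c / n)   # equals t squared for a one-sample t test
    return f_value, 1, n - 1


# Hypothetical error proportions for four participants.
toy = [[0.10, 0.20, 0.40],
       [0.05, 0.15, 0.30],
       [0.20, 0.20, 0.50],
       [0.10, 0.30, 0.35]]
f_value, df_effect, df_error = contrast_f(toy, MESO_VS_EXTERNAL)
```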
As with our experiments in Udi, we can ask whether the results support two extensions of the CHH, according to which the distribution of clitics is driven by two aspects of speech perception: disrupting onsets should be more problematic than disrupting offsets, and breaking up a word (with a mesoclitic) should be more problematic than preceding or following a word with a clitic. The Portuguese results do not provide affirmative evidence in either case.
5.4. General discussion of EP experiments
For the real word test (Experiment 4), there was a clear speed–accuracy trade-off. Thus, although proclitics did lead to slower response times than enclitics, the accuracy results showed the opposite pattern. Similarly, the higher error rate for mesoclitics compared to clitics that did not break up the word was offset by faster responses for the mesoclitics. The same speed–accuracy trade-off was found for proclitics versus enclitics in Experiment 5, and mesoclitics were actually better than external clitics on both the accuracy and reaction time measures. Finally, in the production task, mesoclitics were the most difficult to produce accurately, whereas producing words with proclitics was consistently easier than producing words with enclitics. Collectively, the results of the Portuguese experiments do not offer support for a processing explanation for a distributional preference for suffixes (enclitics) over prefixes (proclitics).
6. General Discussion
Our research effort is fundamentally grounded in Greenberg’s (1963) classic observation of a suffixing preference across the languages of the world and in Cysouw’s (2005) extension of that observation to clitics: distributionally, suffixes are more common (both within and across languages) than prefixes, and enclitics are more common than proclitics (Dryer 2017). Cutler et al. (1985) and Hawkins & Cutler (1988) hypothesized that the suffixing preference could stem from well-established differences in the relative importance of early- versus late-occurring information in spoken words during word recognition; they extended this idea to suggest that because word-medial information is least accessible during perception, infixes (and by our extension, endoclitics) should be relatively rare. (In this section, we use ‘endoclitic’ to include EP mesoclitics; see Footnote 10.)
The CHH makes good sense, but empirical tests of the ideas have not been undertaken previously. Moreover, most of the evidence regarding the importance of early- versus late-occurring information during word recognition comes from studies using English or Dutch, whereas the basis for the suffixing preference is the distribution of suffix versus prefix use across hundreds of languages. Thus, we undertook a research program to test whether there are measurable processing differences for affixes and clitics that follow the patterns predicted by the CHH. We report our tests of suffixes and prefixes in a companion paper (Harris & Samuel 2025), whereas the current paper reports the tests focused on clitics (the extension suggested by Cysouw (2005)). To address the concern that the original basis for the CHH was limited to two closely related Germanic languages (Dutch and English), our experiments have employed a more diverse set of languages – Georgian and English for the tests of affixes, and Udi and EP for the clitic tests.
The experiments testing prefixes and suffixes in English and in Georgian were similar to those used in the current study. Overall, they did not produce results to support the CHH. In the current study, we have taken advantage of the properties of clitics in Udi and in EP to conduct tests of extensions of the hypothesis that have two advantages over what could be done with affixes in English and Georgian. The first advantage is that the clitics in Udi and in Portuguese are constant across positional changes, a property that is not available in Georgian or English affixes. For example, in Georgian, we were able to compare words with a first-person verb prefix (e.g. v-ban ‘I bathe’) to matched verbs with a third-person suffix (e.g. ban-s ‘he/she/it bathes’), with both affixes consisting of a single fricative, but not the same single fricative. The second advantage is that Udi and Portuguese provide a rich set of endoclitics to test, allowing us to test whether the expected disadvantage for these ‘buried’ sounds would be found.
Broadly speaking, just as the experiments with affixes failed to support the CHH, the experiments with clitics did not produce results consistent with that idea. More precisely, across the various experiments and measures, there were at least as many cases in which the observed pattern was at odds with the predictions of the processing account as cases that were consistent with it.
Arguably, the verification tasks using real words (Udi: Experiment 1; Portuguese: Experiment 4) are the most direct tests of the hypothesis, as they involve actual words in the languages, and participants made perceptual judgments – the situation that comes closest to the source of the CHH. The results were quite similar across Udi and Portuguese for this situation. In both experiments, the participants made relatively few errors, making the reaction times the appropriate focus. In both languages, consistent with the hypothesis, verification times were slower for words with a preceding clitic than for ones with a following clitic. However, the reaction time patterns for endoclitics, in both languages, were at odds with the hypothesis because these words were responded to more quickly than words with external clitics.
We included comparable tests using nonce words to push performance away from possible ceiling performance, and, in fact, the error rates were quite substantial in these experiments compared to their real-word counterparts. Across languages, the patterns of reaction times were similar. As with the real words, items with a preceding clitic yielded slower response times than ones with a following clitic – consistent with the hypothesis. This is important, but the CHH makes broader predictions. When we look at the other predictions in Cutler et al. (1985) and Hawkins & Cutler (1988), the evidence is much less supportive. For example, for both languages, response times were not slower for cases that ‘broke up’ a stem (endoclitics) compared to those that left it intact (proclitics, enclitics). In Udi, these two cases led to the same response times, whereas in Portuguese, the mesoclitics actually produced the fastest response times. The accuracy data are also problematic for the hypothesis: in Portuguese, mesoclitic items yielded more accurate responses than items with external clitics, and items with proclitics were responded to more accurately than those with enclitics. Both of these results are the opposite of what would be predicted by the hypothesis. Similarly, in Udi, endoclitics produced better accuracy than external clitics. Thus, just as with the real words in the two languages, the results for nonce words were quite scattered with respect to the hypothesis.
We included production experiments in both languages, even though the original hypothesis was grounded in perception, to see whether there might be a comparable pattern in production. Finding such a pattern would call for re-examining the assumed perceptual basis for the distribution of affixes and clitics cross-linguistically. In Udi, we did find that participants had great difficulty producing the required forms when a preceding clitic was called for. However, in Portuguese, this was actually the easiest condition. It is difficult to know exactly what might be driving these differences, but one obvious factor was the difference in how the participant received the pieces to put together. Because Portuguese is a viable written language, we delivered three of the four pieces to the participant in written form. But because Udi is not widely used in written form, participants were given the information auditorily, presumably imposing a heavier memory load.
Cutler (1985) observed that the way speakers deal with processing or producing a particular phenomenon is not necessarily universal, as had been assumed. This is supported by a wide variety of cross-linguistic results. As we noted, when we compare the results of the production experiments, we find that Udi and EP are very different. Udi speakers found it most difficult to place a clitic before the verb without error (see Figure 3), whereas for speakers of EP, this was the easiest (most error-free) position (see Figure 7). Recall that in both languages, the expected order is as in (7):
We suggested above that the procedures for Udi might have imposed a greater memory load on the participants. In addition, in Udi, but not EP, the speaker must decide between three grammatical options for placing the negative + clitic. These three positions were discussed in Section 2 and illustrated in the second row of Table 2 and in examples (3)–(5). We summarize the positions in Table 12. Udi speakers could have produced significantly more errors for the condition with the negative (labeled in figures as proclitic) because they had all these options to weigh. It is true that the conditions intended to elicit endoclitics also had two options, but one of these was the enclitic position – by hypothesis, the default. We suggest that placing the clitic in the default position does not elicit the errors that considering the infrequent endoclitic positions elicits in the negative condition.
The fact that two words are involved in the proclitic condition predicts that processing should be harder. But the CHH also predicts this: prefixed, and by extension, proclitic position will be harder to process (slower and less accurate) than suffixed and enclitic. The failure to find evidence for the CHH is thus simultaneously evidence that this factor is not important.
Whatever underlies the different error patterns in production for Udi versus Portuguese, neither language provides perception or production results that clearly match what would be expected based on our extensions of the CHH to clitics. Coupled with a similar lack of supportive evidence from tests of affixes in Georgian and English (Harris & Samuel 2025), the robust distributional patterns for affixes (Greenberg 1963) and for clitics (Cysouw 2005, Dryer 2017) may not depend on the kind of processing influences that form the basis of the CHH. If not, empirical tests are needed for alternative explanations for these clear distributional facts. There is some evidence (e.g. Hupp et al. 2009) that there may be a domain-general preference for onsets versus offsets that cuts across both language and non-linguistic stimuli. Greenberg’s (1963) observations, and converging findings in later studies, leave no doubt that there is an asymmetry in how languages employ affixes. The results we have reported here for clitic placement, together with our examination of affixes (Harris & Samuel 2025), demonstrate that an explanation is unlikely to be grounded in the processing factors suggested by Cutler & Hawkins. As such, the alternative accounts merit further investigation.
Acknowledgements
The research reported here was supported in part by grant number BCS 1729256 from the National Science Foundation. Additional support was provided by the Economic and Social Research Council (UK) grant #ES/R006288/1, the Ministerio de Ciencia E Innovación (Spain) grants PSI2017-82563-P and PID2020-113348GB-I00, the Basque Government through the BERC 2018-2021 and BERC 2022-2025 programs, the Spanish State Research Agency through BCBL Severo Ochoa excellence accreditations SEV-2015-0490 and Grant # PID2020-113348GB-I00, and the University of Massachusetts Amherst. We are very grateful to Venera Antonova, our consultant for Nij Udi. She worked with us to verify and record every item, she found experiment participants for us and in many cases took us to their homes, and she hosted us during both of our visits to Nij. We are very grateful also to Ana Luís, our colleague who is a native speaker of European Portuguese and has done important work on clitics and affixes in the language. She constructed lists of stimuli and recorded all of them, she found and scheduled experiment participants and she temporarily turned her office into a lab for our research. In addition, we are grateful to Alexandre Alves Santos, Jack Duff, John Kingston and Kristine Yu.
Supplementary material
The supplementary material for this article can be found at http://doi.org/10.1017/S0022226724000045.