Estonian belongs to the Finnic branch of the Finno-Ugric language family and is closely related to Finnish. It has about one million native speakers in Estonia and about 150,000 elsewhere in the world. In phonetics, Estonian is probably best known for its three degrees of contrastive quantity: short (Q1), long (Q2) and overlong (Q3).
In the following, the sound system of Standard Estonian is described. Standard Estonian was established in the course of the 19th century on the basis of Northern Estonian dialects and is currently used by most speakers of Estonian independent of their geographical origin. The transcription of the narrative passage is based on the recording of a female speaker in her early thirties from Central Estonia.
Consonants
Estonian plosives [p t tʲ k] are pronounced as voiceless unaspirated consonants. Short plosives are spelt with the letters b, d and g, e.g. kabi [kɑpiˑ] ‘hoof’, sodin [sotʲiˑn] ‘scribble (pres. 1st sg.)’, pered [pereˑt] ‘families’. In spontaneous speech they can become partly or fully voiced in intervocalic position. Long plosives occur as geminates in a voiced surrounding, or in voiceless consonant clusters, e.g. kota [kottɑ] (Q2) ‘worn-out shoe (gen. sg.)’, kotta [kotːtɑ] (Q3) ‘worn-out shoe (part. sg.)’, palka [pɑlkːkɑ] (Q3) ‘wage (part. sg.)’, kotka [kotːkɑ] (Q3) ‘eagle (gen. sg.)’.
The fricatives [s sʲ ʃ f] behave similarly to plosives, occurring both as short and long, e.g. isa [isɑˑ] ‘father’, maste [mɑsʲːte] ‘mast (part. pl.)’, duši [tuʃʃi] ‘shower (gen. sg.)’, bluffi [plufːfi] ‘bluff (part. sg.)’. Fricatives /f/ and /ʃ/ do not belong to the native phoneme inventory of Estonian and appear only in foreign and loan words. The letter ž, which in most other languages marks a voiced fricative, is pronounced as a short unvoiced fricative [ʃ], e.g. beeži [peːʃi] ‘beige (gen. sg.)’, or even as [s] by some speakers (Pajusalu 2003).
The glottal fricative /h/ is usually not pronounced in word-initial position in spontaneous speech, e.g. häbi [æpiˑ] ‘shame’, homme [omːme] ‘tomorrow’. However, it is pronounced in most cases in formal speech, and can become fully voiced in a voiced surrounding, e.g. lähen [læɦeˑn] ‘go (pres. 1st sg.)’.
Voiced consonants [m n r v l] become unvoiced in word-final position after [t], [h] or [s], e.g. latv [lɑtːv̥] ‘tree top’, lehm [lehːm̥] ‘cow’, mahl [mɑhːl̥] ‘juice’, käsn [kæsːn̥] ‘sponge’. The phoneme /n/ has a context-dependent allophone and is pronounced as [ŋ] before /k/, e.g. panga [pɑŋkɑ] ‘bank (gen. sg.)’, panka [pɑŋkːkɑ] ‘bank (part. sg.)’.
The palatal approximant /j/ occurs in syllable-initial position, e.g. jänes [jæneˑs] ‘rabbit’, paljas [pɑlʲjɑs] ‘bare, naked’. In addition, it also occurs as a glide before a vowel following a long /i/ or a diphthong whose second component is /i/, e.g. siia [siːː.jɑ] ‘here’, maia [mɑiː.jɑ] ‘sweet-toothed (gen. sg.)’. Between low and mid vowels, /j/ can be realized as a raised mid vowel [e̝], e.g. kaja [kɑ.e̝ɑˑ] ‘echo’, oja [o.e̝ɑˑ] ‘brook’. The bilabial approximant [w] is not an independent phoneme but occurs only as a glide at a syllable boundary after a long /u/ or a diphthong containing /u/ as a second component that is followed by a vowel, e.g. luua [luːː.wɑ] ‘to create’, laua [lɑu.wɑ] ‘table (gen. sg.)’.
All word-internal and word-final alveolar consonants (except /r/) can be palatalized. Estonian has pre-palatalization: palatalization occurs before rather than after the consonant and is characterized by a longer i-like transition from vowel to consonant and a quality change in the first part of a single or geminate consonant or consonant cluster (e.g. [lʲ] has a much higher F2 value than [l]; Lehiste 1965, Eek 1972). In effect, palatalization is caused by /i/ or /j/ following the consonant, e.g. pani [pɑnʲiˑ] ‘put (past 3rd sg.)’, palju [pɑlʲju] ‘many’. As palatalization also occurs in Estonian without the presence of an overt /i/ or /j/, it is phonemic, e.g. müts [mytʲːs] ‘cap’ – mütsi [mytʲsi] ‘cap (gen. sg.)’ (cf. müts [mytːs] ‘thump’ – mütsu [mytsu] ‘thump (gen. sg.)’), kutsu [kutʲsu] ‘doggie’ (cf. kutsu [kutsu] ‘invite (imp. 2nd sg.)’), punu [punʲuˑ] ‘belly’ (cf. punu [punuˑ] ‘braid (imp. 2nd sg.)’).
Vowels
Estonian has nine vowel phonemes, represented by the letters i ü u e ö õ o ä a, which are pronounced as [i y u e ø ɤ o æ ɑ]. They are divided according to tongue height as: high [i y u], mid [e ø ɤ o], and low vowels [æ ɑ].
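The letter-to-sound correspondences and height classes just listed can be written down as a small lookup table. The following is only an illustrative sketch in Python; the names `VOWEL_IPA`, `HEIGHT_CLASSES` and `vowel_height` are hypothetical and chosen for this example.

```python
# Orthographic vowel letters and their IPA values, as listed in the text.
VOWEL_IPA = {
    "i": "i", "ü": "y", "u": "u",
    "e": "e", "ö": "ø", "õ": "ɤ", "o": "o",
    "ä": "æ", "a": "ɑ",
}

# Tongue-height classes, as listed in the text.
HEIGHT_CLASSES = {
    "high": ["i", "y", "u"],
    "mid": ["e", "ø", "ɤ", "o"],
    "low": ["æ", "ɑ"],
}


def vowel_height(letter: str) -> str:
    """Return the height class of an Estonian vowel letter.

    Raises KeyError if the letter is not one of the nine vowel letters.
    """
    ipa = VOWEL_IPA[letter]
    for height, members in HEIGHT_CLASSES.items():
        if ipa in members:
            return height
    raise KeyError(letter)  # unreachable if the two tables stay in sync


print(vowel_height("õ"))  # the letter õ stands for [ɤ]
```

The table covers single vowel letters only; long vowels and diphthongs would need to be handled as sequences.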
All nine vowels occur in a primary stressed syllable. The unrounded back vowel /ɤ/ can be realized depending on the speaker as a mid back vowel [ɤ], a close back vowel [ɯ] or a mid central vowel [ɘ].
All vowels in primary stressed syllables occur as short and long. As short and long vowels do not differ much in quality, long vowels are phonologically considered to be double vowels, i.e. sequences of two identical segmental phonemes (Eek & Meister 1999), e.g. vere /vere/ [vereˑ] ‘blood (gen. sg.)’, veere /veere/ [veːre] ‘edge (gen. sg.)’, veere /vee:re/ [veːːre] ‘to roll (imp. 2nd sg.)’. Long vowels occur only in primary stressed syllables. Estonian has virtually no vowel reduction in stressed syllables. It is only the vowels in unstressed syllables that can be reduced to some degree (Eek & Meister 1998).
Unlike the related languages Finnish and Hungarian, Estonian does not have vowel harmony. Only five vowels, [ɑ e i o u], occur in non-initial syllables whereas [o] can only be found in a non-initial syllable of proper names, foreign and loan words such as Arno [ɑrnoˑ], foto [fotto] ‘photo’, auto [ɑutto] ‘car’. Word-final /e/ is often more open than in a stressed syllable and is realized as []. In the neighbourhood of /j/, back vowels can be more fronted [ ] (both in initial and non-initial syllables).
Estonian has 36 diphthongs in primary stressed syllables (see table 1). All nine vowels can appear as the first component of a diphthong but only five vowels, [ɑ e i o u], as the second component. Twenty-six diphthongs can be found in native or loanwords; 18 of them (marked in the table with bold) occur both in Q2 and Q3 words (e.g. naeru [nɑeru] (Q2) ‘laughter (gen. sg.)’, koera [koeːrɑ] (Q3) ‘dog (part. sg.)’) whereas the rest only in Q3 (e.g. söed [søeːt] ‘charcoals’). Diphthongs which only occur in foreign words are given in parentheses in the table (e.g. kiosk [kioːsk] ‘kiosk’). Only three diphthongs, [ɑi ei ui], are allowed in secondary stressed syllables, e.g. naljakaid [ˈnɑlʲjɑkˌkɑiːt] ‘funny one (part. pl.)’, teateid [ˈteɑtːˌteiːt] ‘message (part. pl.)’, õnnetuid [ˈɤnnetˌtuiːt] ‘unhappy one (part. pl.)’, and also in unstressed syllables, e.g. tänamatuid [ˈtænɑˑˌmɑttuit] ‘ungrateful one (part. pl.)’.
Prosodic features
Stress
Primary stress in native Estonian words is fixed, falling almost always on the first syllable of a word (and is therefore unmarked in the present examples). Additionally, Estonian has a complex system of secondary stresses, the placement of which is not always predictable. Words of more than three syllables can consist of combinations of monosyllabic, disyllabic and trisyllabic feet. A tetrasyllabic word is generally made up of two disyllabic metric feet where the secondary stress falls on the third syllable. In longer words, secondary stresses fall on successive odd-numbered syllables. Exceptions are suffixes that always attract stress, e.g. -mine (laulmine [ˈlɑuːlˌmine] ‘singing’), -line (aluseline [ˈɑluˑseˌline] ‘basic, alkaline’), -lik (ohtlikku [ˈohːtˌlikːku] ‘dangerous (part. sg.)’).
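The default pattern described above (secondary stress on the third syllable of a tetrasyllabic word, and on successive odd-numbered syllables in longer words) can be sketched as a short Python function. This is a deliberately simplified illustration, not a model of Estonian stress: the function name and the `stress_final` switch are hypothetical, the treatment of a word-final odd-numbered syllable is an assumption made for the example, and stress-attracting suffixes and heavy syllables are ignored.

```python
def default_secondary_stresses(n_syllables: int, stress_final: bool = False) -> list:
    """Return 1-based positions of default secondary stresses.

    Primary stress is taken to fall on syllable 1 and is not returned.
    Assumption (not from the text): a word-final syllable is left
    unstressed unless stress_final is True.
    """
    if n_syllables < 1:
        raise ValueError("a word has at least one syllable")
    if n_syllables < 4:
        # The text states the default rule only for words of four or more syllables.
        return []
    last = n_syllables if stress_final else n_syllables - 1
    # Odd-numbered syllables from the third onwards.
    return list(range(3, last + 1, 2))


for n in (4, 5, 6, 7):
    print(n, default_secondary_stresses(n))
```

The suffix exceptions show why this can only be a default: in the five-syllable word aluseline the secondary stress falls on the fourth syllable (-line), not on the third as the function would predict.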
Quantity system
The Estonian quantity system is a complicated prosodic phenomenon in which duration interacts closely with stress and pitch (Lehiste 1997).
For vowels, the three-way durational opposition occurs only in the first syllable of a word, which as a rule carries the primary stress, as for example in the triplet: kalu [kɑluˑ] (Q1) ‘fish (part. pl.)’, kaalu [kɑːlu] (Q2) ‘scales (gen. sg.)’, kaalu [kɑːːlu] (Q3) ‘scales (part. sg.)’. For consonants, the opposition operates between the first and second syllable of a disyllabic sequence, e.g. kala [kɑlɑˑ] (Q1) ‘fish’, kalla [kɑllɑ] (Q2) ‘calla lily’, kalla [kɑlːlɑ] (Q3) ‘pour (imp. 2nd sg.)’. As can be seen from these examples, contrastive quantity marks differences in both lexical meaning and grammatical function. All monosyllabic words, e.g. saad [sɑːːt] ‘get (pres. 2nd sg.)’, are treated as overlong in traditional grammar.
In addition to duration, a decisive factor in determining the degree of quantity is the duration ratio of the first (stressed) and second (unstressed) syllable in a disyllabic sequence. The characteristic ratios between these syllables are 2:3 for Q1, 3:2 for Q2 and 2:1 for Q3 (Lehiste 1960), and they have been shown to be stable also in spontaneous speech (Krull 1993). These ratios give evidence of foot isochrony: the longer the first syllable the shorter the second. Usually the second-syllable vowel in a disyllabic Q1 foot (containing an open short syllable) is so-called half-long (marked in the transcriptions with the IPA diacritic [ˑ]). A half-long vowel is considered to be a characteristic of Q1 and is best perceived in words pronounced in isolation.
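As a rough illustration of how the characteristic ratios could be used, the Python sketch below assigns a measured first-to-second syllable duration ratio to the nearest of the three reference values. Everything beyond the three ratios themselves is an assumption made for the example: the function name, the millisecond inputs and the choice to compare ratios on a logarithmic scale are hypothetical, and the sketch uses duration only.

```python
import math

# Characteristic first:second syllable duration ratios given in the text.
CHARACTERISTIC_RATIOS = {"Q1": 2 / 3, "Q2": 3 / 2, "Q3": 2 / 1}


def nearest_quantity(s1_ms: float, s2_ms: float) -> str:
    """Return the quantity degree whose characteristic ratio is closest
    to the measured ratio s1/s2.

    Assumption: closeness is measured on a log scale, so that doubling
    and halving count as equally large deviations.
    """
    if s1_ms <= 0 or s2_ms <= 0:
        raise ValueError("durations must be positive")
    log_ratio = math.log(s1_ms / s2_ms)
    return min(
        CHARACTERISTIC_RATIOS,
        key=lambda q: abs(log_ratio - math.log(CHARACTERISTIC_RATIOS[q])),
    )


# Hypothetical durations in milliseconds, chosen to match the ratios exactly.
print(nearest_quantity(120, 180))  # ratio 2:3
print(nearest_quantity(240, 160))  # ratio 3:2
print(nearest_quantity(300, 150))  # ratio 2:1
```

A duration-only rule of this kind separates Q2 from Q3 poorly when the measured ratio lies between 3:2 and 2:1, which is one reason to treat it as a sketch only.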
A crucial additional difference in distinguishing Q2 and Q3 lies in the realization of pitch in a metric foot, which in Q1 and Q2 is a step-down F0 contour between the end of the first syllable nucleus and the beginning of the second syllable, while Q3 is associated with a fall early during the first syllable.
Rhythm and intonation
Rhythmically, Estonian is neither a pure stress-timed nor a syllable-timed language. Estonian strives towards foot-isochrony (e.g. Eek & Meister 1999, Ross & Lehiste 2001) while at the same time tending towards relatively even syllable durations (Asu & Nolan 2006).
The most common intonational pitch accent in Estonian is a high tone associated with a metrically strong syllable followed by a trailing low tone (H*+L). Rising nuclei where a low accented tone is followed by a high boundary tone (L* H%) occur frequently in colloquial interactional speech where their function is above all to signal continuation. Additionally, a frequent intonational pattern in both read and colloquial speech is a high leading tone followed by a low accented tone (H+L*) (Asu & Nolan 2007).
Transcription of the recorded passage
Orthographic version
Põhjatuul ja päike. Ükskord vaidlesid põhjatuul ja päike selle üle, kumb neist on tugevam. Just siis tuli mööda teed rändaja, seljas soe mantel. Põhjatuul ja päike leppisid kokku, et see, kellel esimesena õnnestub sundida rändajat mantlit seljast võtma, on teisest tugevam. Põhjatuul puhuski kõigest jõust, aga mida rohkem ta puhus, seda enam koomale tõmbas rändaja oma mantli hõlmad. Lõpuks loobus põhjatuul katsest. Siis hakkas aga päike nii soojalt paistma, et rändaja kohe oma mantli seljast võttis. Ja nõnda pidigi põhjatuul tunnistama, et päike on tast tugevam.
Acknowledgements
We are grateful to Karl Pajusalu, Jaan Ross, Tiit-Rein Viitso and all the other members of the phonetics seminar at the University of Tartu for their many valuable suggestions when discussing the paper with us, as well as to Francis Nolan and Marilyn May Vihman for their insightful comments on an earlier draft of this paper. We would also like to thank Ilse Lehiste and the late Arvo Eek for reviewing the paper.