1 Introduction
Taiwanese, also known as Taiwan Southern Min or Taiwanese Hokkien, is a language spoken in Taiwan, a small yet densely populated island off the southeast coast of China (Figure 1). It had been the lingua franca of the island until 1945, when the government mandated Mandarin as the official language (Lin, Reference Lin2021). Currently, it is still the second most widely spoken language on the island, and about 70 percent of the population have at least some passive knowledge of the language (Huang, Reference Huang1993).
Taiwanese is genealogically related to Min, a Sino-Tibetan language spoken in Fujian province on the southeast coast of China (Lin, Reference Lin2001). During the seventeenth century, population pressure initiated a prolonged wave of migration of Min speakers from Fujian to Taiwan. As the influx continued into the eighteenth century, Min speakers eventually outnumbered the indigenous Austronesian speakers of the island and became the majority of the population. The diaspora originated mainly from two cities in Fujian, Tsiangtsiu and Tsuantsiu. The Min dialects spoken at these two places differ slightly in their sound inventories and phonological rules, but are in general mutually intelligible. They became the main ingredients for the later formation of Taiwanese.
During the early stage of immigration, the two dialects were kept relatively distinct, as the original immigrants tended to cluster amongst their kin. However, in time, dialectal mixing and merging became unavoidable due to frequent contacts between the two varieties. Language contact with Hakka (also a Chinese immigrant language), Pepohoan (Austronesian languages of the plain indigenous tribes), Dutch (during Dutch colonial rule 1624–62 and 1664–8), and Japanese (during Japanese colonial rule 1895–1945) also brought in a large number of lexical items, eventually resulting in a new language we now call Taiwanese (Lin, Reference Lin2001). In fact, the first mention of the term Taiwanese probably appeared during the Japanese colonial era. In his A Composite Japanese-Taiwanese Dictionary (Ogawa, Reference Ogawa1907) and A Composite Taiwanese-Japanese Dictionary (Ogawa, Reference Ogawa1931, Reference Ogawa1932), Ogawa referred to the language as Taiuango, which means Taiwanese in Japanese. The first ever textbooks on Taiwanese were probably also created during this time in order to equip Japanese civil servants and police officers with an adequate level of language proficiency with which to interact smoothly with local residents (Ichikawa, Reference Ichikawa2013).
Mixing between the Tsiangtsiu and Tsuantsiu dialects was not homogeneous across Taiwan. Some had more traits of one while others had more of the other. Based on the degree of mixing between the two dialects, Ang (Reference Ang2003) categorized Taiwanese into three major dialects: Pro-Tsiang, which has a greater flavor of the Tsiangtsiu variety; Pro-Tsuan, which has a greater tang of the Tsuantsiu variety; and Mix, which incorporates both varieties in a more balanced fashion, although it still leans slightly toward the Tsiangtsiu variety. Figure 2 shows the approximate distributions of the three dialects. The Pro-Tsuan dialect is mainly spoken in the north and on the west coast of central Taiwan, while the Pro-Tsiang dialect mainly appears along the north coast and in central Taiwan. Finally, the Mix dialect is prevalent in the southern part of Taiwan. Because proportionately more speakers have adopted Taiwanese as their primary language in southern Taiwan (Table 1) (National Statistics ROC, 2021), the Mix dialect is currently the most dominant variety (Ang, Reference Ang1992) and is recognized as the mainstream accent of Taiwanese (Ministry of Education ROC, 2020). Notice that this mainstream variety is not yet completely homogeneous. Regional subdialects still exist in the middle- and old-aged generations, while the younger generation shows more accent merging (Hsu Reference Hsu2015, Reference Hsu2016). Nevertheless, it is currently a widely recognized variety by Taiwanese speakers and is adopted in dictionaries published by the government (Ministry of Education ROC, 2020).
% | Northern | Central | Southern | Eastern |
---|---|---|---|---|
Primary | 18.17 | 41.40 | 48.42 | 18.57 |
Secondary | 61.43 | 49.52 | 46.23 | 47.43 |
Throughout history, Taiwanese has encountered two government-mandated language promotion movements that threatened its status as a lingua franca. The first one was during the Japanese colonial time. During the early stage of Japanese rule, the Japanese language was promoted but not to the exclusion of local languages. However, between 1937 and 1945, a stricter Japanization policy (the Kōminka Movement) was implemented (Chou, Reference Chou, Duus, Myers and Peattie1996; Lin, Reference Lin2001). Taiwanese newspapers and schools were banned, and the use of Taiwanese was forbidden in public institutions. Families were awarded the honorable title of Kokugo Katei or Kokugo no Ie (‘national language family’) when they could demonstrate that all members conversed only in Japanese at home. Such families were granted better opportunities in education, career development, and business permit approval (Wu, Reference Wu2000). Despite all this, Taiwanese demonstrated resilience to the challenges. By 1943, although at least 80 percent of the population could speak Japanese at a fluency level of a sixth grader or above, less than 1.3 percent of the households were granted Kokugo Katei (Chou, Reference Chou, Duus, Myers and Peattie1996). In other words, Taiwanese speakers became fluent in everyday Japanese, and educated people were literate in Japanese writing, but their mother tongue and daily language remained largely Taiwanese.
The second language promotion movement was the Mandarin-only policy implemented after the retrocession of Taiwan from Japan to the Republic of China in 1945 at the end of World War II (Huang, Reference Huang1993; Lin, Reference Lin2001). In a rush to consolidate power, the new government instigated strict measures to promote Mandarin. The use of Taiwanese was banned from schools and public domains. Poor Mandarin ability was considered a school misconduct and the use of Taiwanese was seriously punished. Broadcasting companies were strictly regulated and highly censored with regard to the proportion of Taiwanese programs. Successful applications to government and teaching positions depended heavily on the Mandarin proficiency of the applicants, as Taiwanese and all the other non-Mandarin languages were demoted to vernaculars and were thought unfit for formal occasions and respectable job positions. In other words, Taiwan had become a strictly diglossic society. Mandarin was the exclusive high language and Taiwanese, despite being the native language of more than 70 percent of the population (Huang, Reference Huang1993), could merely function as a low language. The Mandarin-only policy ended officially in 1987, when martial law was lifted.
Although on the surface Taiwanese also seemed to have survived the second language promotion movement, its vitality has been in serious decline (Chen, Reference Chen2010a). According to National Statistics ROC (2021), there is a positive correlation between speaker age and the likelihood of acquiring Taiwanese as one’s first language (Table 2). For people aged sixty-five and older, 75 percent acquired Taiwanese first. However, for youngsters aged fourteen and younger, only 22 percent did the same, and 76 percent acquired Mandarin first instead. The adoption of Taiwanese as one’s primary language shows the same trend. For people aged sixty-five and older, more than 65 percent adopted Taiwanese as their primary language, while for those aged under thirty-five, no more than 15 percent did so, and at least 84 percent adopted Mandarin as their everyday language instead. More important, within each age group, the percentage for acquiring Taiwanese first is always higher than that for adopting it as the primary language. This bleakly suggests that Taiwanese is quickly losing ground. Even though the government did not officially ban the use of Taiwanese in private domains during the Mandarin-only Movement, people have spontaneously refrained from using the language in their everyday lives, possibly due to self-censorship (Chen, Reference Chen2010a; Yap, Reference Yap2018). This creates a large intergenerational transmission gap for Taiwanese language and culture. Nowadays, it is not uncommon to observe grandparents struggling to utter Mandarin words in order to communicate with their grandchildren, even in southern parts of Taiwan where Taiwanese is in fact prevalent (see Table 1). As a consequence, Taiwanese has not only lost its status as a lingua franca in this second battle, but its survival is also seriously jeopardized. Boundaries between Taiwanese dialects have thus been increasingly blurred among the younger generation due to the loss of speakers and decline in speaker proficiency.
Age range | First acquired: Taiwanese (%) | First acquired: Mandarin (%) | Primary language: Taiwanese (%) | Primary language: Mandarin (%) |
---|---|---|---|---|
6–14 | 22 | 76 | 7 | 92 |
15–24 | 30 | 67 | 11 | 89 |
25–34 | 38 | 58 | 15 | 84 |
35–44 | 51 | 43 | 22 | 77 |
45–54 | 62 | 30 | 32 | 67 |
55–64 | 70 | 21 | 49 | 48 |
65 and older | 75 | 14 | 66 | 28 |
Since the 1990s, the government has initiated a series of reversing language shift movements to right the wrongs. Bans on the use of Taiwanese, Hakka, and other indigenous languages in public domains were lifted, and language courses were designed and taught in schools to familiarize students with what should have been their mother tongues (Chen, Reference Chen1998). Several government-led projects on standardization of written Taiwanese have also been implemented to increase literacy. Tai-lo, a romanization alphabet designed for Taiwanese, was created by merging earlier major systems (Ministry of Education ROC, 2008). Assignment criteria for Taiwanese sinographs were also established for high-frequency words to eliminate idiosyncratic usages (Ministry of Education ROC, 2014). Finally, an online Taiwanese dictionary was compiled to facilitate language learning and use (Ministry of Education ROC, 2020). Research on Taiwanese began to gain popularity (Khoo, Reference Khoo and Shei2019). In 2018, the Taiwan Congress passed the Development of National Languages Act, officially granting all languages spoken in Taiwan an equal legal status (Ministry of Culture ROC, 2019).
This Element intends to provide an overview of the phonetics of Taiwanese. Section 2 reviews the major previous literature on the consonants, vowels, tones, syllables, and prosody of the standard dialect of Taiwanese. Section 3 focuses on some major dialectal variations that are still robust in society. Section 4 introduces two research materials, one read and the other spontaneous, to give interested researchers quick access to actual Taiwanese data, and Section 5 utilizes these materials to provide acoustic measurements for some of the phenomena mentioned in previous literature. Finally, in Section 6, we propose future research directions in Taiwanese worth exploring.
Revitalizing a language already on the decline is truly a mammoth undertaking, and we are all aware of the numerous obstacles ahead. However, it is not without hope. According to a large survey (N = 2,139) in Chen (Reference Chen2010a), more than 70 percent of the interviewees agreed that Taiwanese is a marker of solidarity and is worth preserving. As we currently still have enough fluent speakers in the language community, it is possible to provide an enriched environment for both our beginning and advanced learners to develop a fully functional language system of Taiwanese (Hsu, Reference Hsu2018). It merely requires all of us to pitch in and patiently wait for the moment to enjoy the fruits of our toil.
2 Existing Research
This section introduces the consonants, vowels, tones, syllable structure, and prosody of mainstream Taiwanese.
2.1 Consonants
Mainstream Taiwanese has eighteen consonants (Chang, Reference Chang1989). It has a large set of noncontinuants, including nine plosives, three nasals, and three affricates, and only three continuants, including two fricatives and one approximant (Table 3). In the following, we will look at the phonological patterning of each sound category in turn.
Manner | Bilabial | Dental | Velar | Glottal |
---|---|---|---|---|
Plosive | p pʰ b | t tʰ | k kʰ ɡ | ʔ |
Nasal | m | n | ŋ | |
Affricate | | ts tsʰ dz | | |
Fricative | | s | | h |
Approximant | | l | | |
2.1.1 Voiced Stops, Nasal Stops, and /l/
Aside from the missing /d/, Taiwanese has a very balanced set of oral stops, including the voiceless unaspirated /p t k/, the voiceless aspirated /pʰ tʰ kʰ/, and the voiced /b ɡ/ (Chang, Reference Chang1989). The nasal stops /m n ŋ/ also parallel neatly along the three places of articulation. In the literature, however, there has been a long-standing tradition to include /l/ as part of the voiced stop series to stand in place of what should have been a /d/ (Chung, Reference Chung1996; Lin, Reference Lin2001). Although some have argued that there is some phonetic basis to this and have claimed that Taiwanese /l/ has a stop-like quality (Chang, Reference Chang1989), the motivation is really phonological.
In Taiwanese, a phonological rule dictates that the nasality of a voiced stop onset and its following vowel has to be consistent. Voiced oral stops are followed by oral vowels, while nasal stops are followed by nasal vowels (Chang, Reference Chang1989; Chung, Reference Chung1996; Lin, Reference Lin2001). For the bilabial and velar positions, this indicates that [b ɡ] are in complementary distribution with their homorganic nasals [m ŋ]. For the dental position, since /d/ is missing, the system picks the next closest candidate possible, and the alternation occurs between [l] and [n]. The rule could be summarized as (1). Table 4 shows some examples (see footnote 1).
(1)
Tai-lo | IPA | Gloss | Tai-lo | IPA | Gloss |
---|---|---|---|---|---|
bā | [ba7] | ‘to fit’ | mā | [mã7] | ‘to scold’ |
lî | [li5] | ‘to leave’ | nî | [nĩ5] | ‘year’ |
gōo | [ɡɔ7] | ‘five’ | ngōo | [ŋɔ̃7] | ‘to realize’ |
Superscript numbers indicate tones.
A controversy related to the rule in (1) is the phonemic status of the onsets [b l ɡ] and [m n ŋ]. Since the two sets of sounds are in complementary distribution, some researchers followed the classical analysis and considered them as allophones of the same phonemes. [m n ŋ] are thus deemed as mere allophonic realizations of /b l ɡ/ before nasalized vowels, and do not have a phonemic status of their own (Chang, Reference Chang1989; Lin, Reference Lin2001; Tung, Reference Tung1968). However, other scholars viewed the distribution as an accidental gap and regarded the two sets of sounds as separate phonemes (Cheng & Cheng, Reference Cheng and Cheng1987; Ting, Reference Ting1985). Experimental data supported the first view more. Pan (Reference Pan2004) found that listeners tended to ignore the phonetic differences between onsets [b] and [m] and categorize them as the same phoneme. Wang (Reference Wang1996) also showed that both the [b m] pair and the [ɡ ŋ] pair were accepted as allophones of the same phonemes in the onset position. However, similar results were not observed for the [l n] pair. This implies that even though /l/ takes the place of the missing /d/ in Rule (1), it likely has a phonemic status apart from /n/, unlike the [b m] and [ɡ ŋ] pairs. Therefore, the phonemic transcription of mā ‘to scold’ and ngōo ‘to realize’ in Table 4 should probably be /bã/ and /ɡɔ̃/, respectively, while that of nî ‘year’ should still be /nĩ/. However, this does not mean that Taiwanese lacks /m ŋ/ in its phoneme inventory. Both can act as syllabic nasals as in m̄-koh /m̩.kə/ ‘but’ and n̂g-sik /ŋ̩.sik/ ‘yellow’, and are allowed in the coda position like /n/, as in kâm /kam/ ‘to hold in mouth’ and kâng /kaŋ/ ‘same’, and are thus still regarded as phonemes.
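The nasality agreement described by Rule (1) can be illustrated with a minimal Python sketch. It only models the surface alternation between [b l ɡ] and [m n ŋ]; it takes no position on the phonemic status of the [l n] pair discussed above. The function name `surface_onset` and the convention of detecting a nasal vowel by its combining tilde are assumptions made for this illustration.

```python
import unicodedata

# Voiced oral onsets and their homorganic nasal counterparts, following Rule (1).
ORAL_TO_NASAL = {"b": "m", "l": "n", "ɡ": "ŋ"}

COMBINING_TILDE = "\u0303"  # marks a nasalized vowel once the string is decomposed


def surface_onset(onset: str, rime: str) -> str:
    """Return the onset that agrees in nasality with the rime (hypothetical helper)."""
    # Decompose so that precomposed letters such as "ã" expose their tilde.
    rime_is_nasal = COMBINING_TILDE in unicodedata.normalize("NFD", rime)
    if rime_is_nasal:
        # Voiced oral onsets surface as nasals; all other onsets are unaffected.
        return ORAL_TO_NASAL.get(onset, onset)
    return onset


print(surface_onset("b", "a"))  # b, as in bā 'to fit'
print(surface_onset("b", "ã"))  # m, as in mā 'to scold'
print(surface_onset("l", "ĩ"))  # n, as in nî 'year'
```

Nasal codas and syllabic nasals are deliberately left out of the sketch, since the rule concerns only onsets before nasalized vowels.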
2.1.2 Sibilant Realization
Taiwanese has four sibilants in total, including one fricative /s/ and three affricates /ts tsʰ dz/. Of the four, the voiced affricate is the most variable and has several allophones. Besides the canonical [dz], there are also two free variants, [z] and [l], as shown in (2) (Ang, Reference Ang1997, Reference Ang2003; Chen, Reference Chen1995; Lin, Reference Lin1995). The developmental process is construed to be from [dz] to [z] to [l], and is deemed to be motivated by ease of articulation (Chuang & Fon, Reference Chuang and Fon2017a, Reference Chuang and Fon2018). Voiced affricates are composed of a voiced stop and a voiced fricative, both of which are physiologically strenuous (Ohala, Reference Ohala and MacNeilage1983). Voiced stops require low intraoral pressure relative to the subglottal pressure to maintain voicing. However, when the closure is long, the transglottal pressure difference can drop quickly, and voicing ceases. Voiced fricatives are even more difficult. Besides the transglottal pressure difference required for voicing, the intraoral pressure also has to be higher than the atmospheric pressure to maintain high air velocity. When the air velocity is too low, frication noise cannot be created, and the result is an approximant, such as [l] (Ohala, Reference Ohala and MacNeilage1983). Therefore, the development of [dz]→[z]→[l] is considered a weakening process with an articulatory basis.
(2)
There is an additional variant [ɡ] for /dz/, which only occurs before /i/, as shown in (3). This variant is generally believed to be influenced by Hakka through language contact (Ang, Reference Ang2003, Reference Ang2012). Speakers are not homogeneous with regard to how they treat this variant. Some use it in free variation (Chen, Reference Chen1995), while others use it in complementary distribution with [dz], [z], and [l] (Lin, Reference Lin1995).
(3)
All four of the sibilants undergo an assimilation rule of palatalization when preceding /i/, as shown in (4). Since /dz/ has two free variants that are sibilants (i.e., [dz] and [z]), its palatalized form also has two free variants, [dʑ] and [ʑ] (Ang, Reference Ang1997, Reference Ang2003; Chen, Reference Chen1995; Lin, Reference Lin1995).
(4)
2.2 Vowels
Mainstream Taiwanese has six oral monophthongs /i e a ɔ u ə/ and four nasal monophthongs /ĩ ẽ ã ɔ̃/ (Figure 3) (Chang, Reference Chang1989). If only oral vowels are considered, it is deemed a symmetric six-vowel system and is assumed to be relatively stable (Chen, Reference Chen2010b). Table 5 shows some examples (see footnote 2). In addition to the monophthongs, Taiwanese also allows eight diphthongs, including five rising diphthongs /i̯a i̯ə u̯a u̯e u̯i/, and three falling diphthongs /ai̯ au̯ iu̯/. The nonsyllabic vowel targets are always /i̯/ or /u̯/, but the syllabic vowel targets are somewhat different between the rising and falling diphthongs. For the former, vowels of all three height levels can be used, including /a ə e i/, while for the latter, they are restricted to only /a/ and /i/. Rounded vowels cannot act as the syllabic target regardless of diphthong types. The designation of the syllabic target in /u̯i/ and /iu̯/ is especially interesting. Although both are composed of the two high vowel targets /i/ and /u/, Hsu (Reference Hsu2004) argued that /i/ has higher sonority than /u/ using rhyming patterns and acoustic measurements as evidence, and designated /i/ as the syllabic target for both. Of the eight diphthongs, seven have a nasal counterpart, including four of the rising diphthongs /ĩ̯ã ũ̯ã ũ̯ĩ ũ̯ẽ/, and all three of the falling diphthongs /ãĩ̯ ãũ̯ ĩũ̯/. Taiwanese also has two triphthongs, /i̯au̯/ and /u̯ai̯/, with the syllabic vowel always being /a/. Both of them have a nasal counterpart. Some examples are given in Table 6.
Oral: Tai-lo | Oral: IPA | Oral: Gloss | Nasalized: Tai-lo | Nasalized: IPA | Nasalized: Gloss |
---|---|---|---|---|---|
tī | /ti7/ | ‘chopsticks’ | tīnn | /tĩ7/ | ‘full’ |
se | /se1/ | ‘muslin’ | senn | /sẽ1/ | ‘to give birth’ |
ta | /ta1/ | ‘dry’ | tann | /tã1/ | ‘to bear’ |
kôo | /kɔ5/ | ‘to paste’ | kôonn | /kɔ̃5/ | ‘to snore’ |
Oral: Tai-lo | Oral: IPA | Oral: Gloss | Nasalized: Tai-lo | Nasalized: IPA | Nasalized: Gloss |
---|---|---|---|---|---|
kai | /kai̯1/ | ‘should’ | kainn | /kãĩ̯1/ | ‘to moan’ |
háu | /hau̯2/ | ‘to cry’ | ha̍unnh | /hãũ̯ʔ8/ | ‘half-cooked’ |
iû | /iu̯5/ | ‘oil’ | iûnn | /ĩũ̯5/ | ‘sheep, goat’ |
kià | /ki̯a3/ | ‘to mail’ | kiànn | /kĩ̯ã3/ | ‘mirror’ |
kua | /ku̯a1/ | ‘song’ | kuann | /kũ̯ã1/ | ‘liver’ |
khuì | /kʰu̯i3/ | ‘breath’ | khuìnn-ua̍h | /kʰũ̯ĩ3.u̯aʔ8/ | ‘happy’ |
bue̍h-á | /bu̯eʔ8.a2/ | ‘sock’ | muê-á | /bũ̯ẽ5.a2/ | ‘plum’ |
iau | /i̯au̯1/ | ‘hungry’ | iaunn | /ĩ̯ãũ̯1/ | ‘peek-a-boo’ |
kuai | /ku̯ai̯1/ | ‘obedient’ | kuainn | /kũ̯ãĩ̯1/ | ‘to close’ |
2.3 Tones
Being a tonal language, Taiwanese is rich in tones, and it has an extensive set of tone sandhi rules. It also uses tones to distinguish between stressed and unstressed syllables. In the following, tonal categories and their corresponding rules regarding tone sandhi and stress are introduced.
2.3.1 Tonal Categories
There are in total seven tones in Taiwanese and almost all syllables in Taiwanese are realized with a particular tone (Chang, Reference Chang1989; Lin, Reference Lin2001). Taiwanese tones are defined by two factors, pitch contour and syllable structure. Tone 4 and Tone 8 are reserved for checked syllables, which are syllables with obstruent codas /p t k ʔ/. Because the voicing ends abruptly, the duration of these two tones is extremely short. The remaining five tones are called smooth tones and occur in all the nonchecked syllables. These tones are longer, and have more variations in their tonal contours. Table 7 shows the descriptions of these tones using both word labels and Chao’s (Reference Chao1968) five-point tonal scale, with 1 being the lowest and 5 being the highest in pitch. Notice that although Taiwanese has seven tones, they are numbered from Tone 1 to Tone 5, and then from Tone 7 to Tone 8. Tone 6 is missing. This is because Taiwanese originally had eight tones, but Tone 6 was merged with Tone 2 in its historical evolution. The absence of the historical Tone 6 and the different tone sandhi behaviors of checked and unchecked tones (discussed later in this Element) suggest that this system of tone description might be revised. Nevertheless, here we will continue to use the traditional system.
Tones | Word labels | Chao: C1 | Chao: Y | Chao: T | Chao: A | Chao: C2 | Example | Gloss |
---|---|---|---|---|---|---|---|---|
Tone 1 | high-level | 55 | 44 | 44 | 44 | 55 | kun | ‘king’ |
Tone 2 | high-falling | 53 | 53 | 53 | 41 | 51 | kún | ‘to boil’ |
Tone 3 | low-falling | 21 | 31 | 11 | 21 | 11 | kùn | ‘rod’ |
Tone 4 | mid-short | 21 | 32 | 32 | 32 | 3 | kut | ‘bone’ |
Tone 5 | mid-rising | 24 | 13 | 24 | 23 | 13 | kûn | ‘group’ |
Tone 7 | mid-level | 33 | 33 | 33 | 33 | 33 | kūn | ‘county’ |
Tone 8 | high-short | 53 | 33 | 4 | 4 | 5 | ku̍t | ‘slippery’ |
C1: Chang (Reference Chang1989); Y: Yang (Reference Yang1991); T: Tung (Reference Tung1996); A: Ang (Reference Ang1997); C2: Chang (Reference Chang1999)
Of the five smooth tones, there are two level tones, two falling tones, and one rising tone. The two level tones differ in pitch register; one is high-level (Tone 1) and the other is mid-level (Tone 7; see footnote 3). Similarly, the two falling tones also differ in pitch register; one is high-falling (Tone 2) and the other is low-falling (Tone 3). There is some debate regarding the actual contour of Tone 3. Some claim that it is a falling tone (Ang, Reference Ang1997; Chang, Reference Chang1989; Yang, Reference Yang1991), while others argue that it is a low-level tone (Chang, Reference Chang1999; Tung, Reference Tung1996). One suspects that this is due to the limits of methodology. Since all of the researchers in Table 7 utilized only subjective judgments but not acoustic measurements in determining tonal values, and since listeners’ perceptual acuity is positively correlated with pitch height (Jongman et al., Reference Jongman, Qin, Zhang and Sereno2017), it is possible that the pitch movements of Tone 3 occur at a pitch register that is too low to be reliably detected. However, since three of the five researchers heard a falling contour, and since acoustic studies also showed a falling contour (Hong & Chan, Reference Hong and Chan2022), Tone 3 is labeled as a low-falling tone in this Element.
As for the two checked tones, although there is large variability among the five researchers with regard to tonal values, especially for Tone 8, at least all of them are unanimous with regard to the relative tonal register. Tone 8 is slightly higher than Tone 4. As for the tonal contour, it seems that Tone 4 is slightly falling, while Tone 8 is more varied. It seems to be either a level or a falling tone. This also coincides with previous acoustic studies (Hong & Chan, Reference Hong and Chan2022). However, since the two tones are rather short in duration, one suspects that the exact tonal contour does not play an important role in tonal perception due to perceptual limits (cf. Jongman et al., Reference Jongman, Qin, Zhang and Sereno2017). Tonal perception thus likely places more weight on the tonal register instead. Also, although Tone 3 and Tone 4 seem to have very similar tonal contours for most analyses based on Table 7, they are actually perceptually rather distinct, as the two are very different in tonal duration, with the former almost twice as long as the latter (Hong & Chan, Reference Hong and Chan2022).
2.3.2 Tone Sandhi
The aforementioned tones are usually called base tones or citation tones. Taiwanese also has a fairly extensive set of sandhi tone rules that apply to all tonal categories. In a tone sandhi group (TSG), only the last stressed syllable is exempt and receives the base tone. All the preceding syllables undergo the tone sandhi rules and are realized with corresponding sandhi tones, which are phonologically and phonetically different from their base tone counterparts (Chang, Reference Chang1989; Lin, Reference Lin2001). A TSG does not have a definite length, and its scope is jointly determined by morphology, syntax, and prosody. For example, when Tâi-uân ‘Taiwan’ and gín-á ‘child’, each of which is a TSG, are compounded together to form Tâi-uân gín-á ‘native Taiwanese (lit. children of Taiwan)’, a larger TSG is formed and only the last syllable á is realized in its citation form. The three preceding syllables of tâi, uân, and gín are realized in their sandhi forms. The rule for sandhi tones is summarized in (5).
(5) $T_{\text{base}} \rightarrow T_{\text{sandhi}} \;/\; \_\_\_ + \sigma_1 + \sigma_2 + \dots + \acute{\sigma}_n\,]_{\text{TSG}}$
Figure 4 shows the tone sandhi circle of mainstream Taiwanese. The five smooth tones go around in a circle when linked by their tone sandhi rules (hence the tone sandhi circle). Except for Tone 5, which can only occur as a citation form but not a sandhi form, all the remaining four tones can act as both. For example, when a high-level tonal contour is encountered, there are two reasonable possibilities. It can either be a Tone 1 in its citation form or a Tone 2 in its sandhi form. Only the positioning of the tone could help differentiate between the two. For the mid-level tonal contour, it could be even more complicated, as there are then three possibilities. It can be a Tone 7 in its citation form, or a Tone 1 in its sandhi form, or a Tone 5 in its sandhi form. For a word like pîng-an ‘peace’, which is composed of a /piŋ/ syllable in Tone 5 and an /an/ syllable in Tone 1, it is pronounced as a /piŋ/ in Tone 7 plus an /an/ in Tone 1 when the two syllables are strung together (Table 8).
Tai-lo | Sandhi form | Gloss |
---|---|---|
tang-pîng | /taŋ1.piŋ5/→[taŋ7.piŋ5] | ‘east’ |
báng-thâng | /baŋ2.tʰaŋ5/→[baŋ1.tʰaŋ5] | ‘bugs’ |
tàng-sng | /taŋ3.sŋ̩1/→[taŋ2.sŋ̩1] | ‘stingy’ |
kak-tōo | /kak4.tɔ7/→[kak8.tɔ7] | ‘angle’ |
kah-ì | /kaʔ4.i3/→[ka2.i3] | ‘like’ |
tâng-o | /taŋ5.ə1/→[taŋ7.ə1] | ‘crown daisy’ |
kāng-khuán | /kaŋ7.kʰu̯an2/→[kaŋ3.kʰu̯an2] | ‘same’ |
ga̍k-khì | /ɡak8.kʰi3/→[ɡak4.kʰi3] | ‘musical instrument’ |
tsia̍h-pn̄g | /tsi̯aʔ8.pŋ̩7/→[tsi̯a3.pŋ̩7] | ‘eat’ |
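The ambiguity described for the tone sandhi circle can be made concrete with a short Python sketch. It encodes only the regular sandhi mapping of the five smooth tones as exemplified in Table 8 and inverts it, so that each surface tone lists the citation tones that could underlie it. The names `SMOOTH_SANDHI` and `candidate_tones` are illustrative assumptions, and checked syllables are left aside here.

```python
# Regular sandhi mapping for the five smooth tones (citation tone -> sandhi tone).
SMOOTH_SANDHI = {1: 7, 2: 1, 3: 2, 5: 7, 7: 3}


def candidate_tones(surface_tone: int) -> dict:
    """List the readings compatible with a surface smooth tone (hypothetical helper)."""
    return {
        # The tone may simply be in its citation form.
        "citation": [surface_tone] if surface_tone in SMOOTH_SANDHI else [],
        # Or it may be the sandhi form of one or more other citation tones.
        "sandhi_of": sorted(t for t, s in SMOOTH_SANDHI.items() if s == surface_tone),
    }


print(candidate_tones(1))  # {'citation': [1], 'sandhi_of': [2]}
print(candidate_tones(7))  # {'citation': [7], 'sandhi_of': [1, 5]}
print(candidate_tones(5))  # {'citation': [5], 'sandhi_of': []}
```

The empty `sandhi_of` list for Tone 5 reflects the observation that Tone 5 occurs only as a citation form.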
Tone sandhi of the two checked tones follows a different set of rules, and is coda-dependent. For syllables ending with /p t k/, Tone 4 and Tone 8 act as the sandhi tone for each other (Figure 4) (Chang, Reference Chang1989; Cheng, Reference Cheng1973; Lin, Reference Lin2001). For example, kak-tōo /kak4.tɔ7/ ‘angle’ is realized as [kak8.tɔ7], and ga̍k-khì /ɡak8.kʰi3/ ‘musical instrument’ is realized as [ɡak4.kʰi3] (Table 8). However, this does not necessarily entail that the phonetic realization of a sandhi Tone 4 is identical to that of a citation Tone 8, or vice versa. Instead, it merely specifies the fact that the sandhi Tone 4 is a high short tone while the sandhi Tone 8 is a low short tone.
There is a different set of sandhi rules regarding checked tones ending with /ʔ/ (Figure 4) (Chang, Reference Chang1989; Cheng, Reference Cheng1973; Lin, Reference Lin2001). This is probably because there is an additional coda-dropping rule for /ʔ/ prior to the sandhi rules when the syllable is in a TSG-internal position, as shown in (6). With the final obstruent gone, the syllable now becomes lengthened, and what was originally the pitch excursion for Tone 4 and Tone 8 is now phonetically respectively closer to a low-falling Tone 3 and a mid-level Tone 7 instead. Therefore, they follow the rules for Tone 3 and Tone 7 and their sandhi forms become high-falling Tone 2 and low-falling Tone 3, respectively (Figure 4). Please see Table 8 for some examples.
(6) $\text{/ʔ/} \rightarrow \varnothing \;/\; \_\_\_ + \sigma_1 + \sigma_2 + \dots + \sigma_n\,]_{\text{TSG}}$
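Taken together, the regular rules for smooth tones, /p t k/-final checked tones, and /ʔ/-final checked tones can be sketched in Python as follows. This is a simplified illustration rather than a full model: it assumes that the input list is exactly one TSG, that its last syllable is the stressed one, and that syllables are given as (segments, tone) pairs. All function and variable names are hypothetical.

```python
SMOOTH_SANDHI = {1: 7, 2: 1, 3: 2, 5: 7, 7: 3}  # citation -> sandhi, smooth tones
CHECKED_SWAP = {4: 8, 8: 4}                      # /p t k/-final checked tones
GLOTTAL_AS_SMOOTH = {4: 3, 8: 7}                 # after /ʔ/ drops, treat as Tone 3 / Tone 7


def sandhi(segments: str, tone: int) -> tuple:
    """Return the (segments, tone) pair of a syllable in TSG-internal position."""
    if tone in CHECKED_SWAP:
        if segments.endswith("ʔ"):
            # Rule (6): drop the glottal stop, then follow the smooth-tone rules.
            return segments[:-1], SMOOTH_SANDHI[GLOTTAL_AS_SMOOTH[tone]]
        return segments, CHECKED_SWAP[tone]
    return segments, SMOOTH_SANDHI[tone]


def realize_tsg(syllables: list) -> list:
    """Apply sandhi to every syllable except the last one, as in Rule (5)."""
    *nonfinal, final = syllables
    return [sandhi(seg, tone) for seg, tone in nonfinal] + [final]


# Tâi-uân gín-á: the first three syllables change, the last keeps its base tone.
print(realize_tsg([("tai", 5), ("uan", 5), ("gin", 2), ("a", 2)]))
# [('tai', 7), ('uan', 7), ('gin', 1), ('a', 2)]
print(realize_tsg([("kaʔ", 4), ("i", 3)]))  # [('ka', 2), ('i', 3)]
```

As noted in the text, the tone numbers returned for checked syllables label the sandhi category only; they do not imply phonetic identity with the corresponding citation tone.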
2.3.3 Exceptions to Regular Tone Sandhi Rules
There are three groups of words that are exceptions to the aforementioned tone sandhi rules (Lu, Reference Lu2003). All of them abide by rules that are somewhat different from the default sandhi rules. The first group involves some high-frequency words, such as khì ‘go’ and kah ‘and’. Instead of the sandhi rules applying once, they apply twice. For example, khì tó /kʰi3.tə2/ ‘where do you want to go? (lit. go where)’ should have been realized as *[kʰi2 tə2] according to the sandhi rules (Figure 4). However, the actual realization is in fact [kʰi1 tə2] instead. Similarly, guá kah lí /ɡu̯a2 kaʔ4 li2/ ‘me and you’ should have been *[ɡu̯a1 ka2 li2], but is instead realized as [ɡu̯a1 ka1 li2].
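The double application described for this first group can be checked with a small Python sketch. It covers only the two attested cases in the text, a smooth tone and a /ʔ/-final Tone 4; the function name `double_sandhi` and its `glottal_coda` flag are assumptions made for the illustration.

```python
SMOOTH_SANDHI = {1: 7, 2: 1, 3: 2, 5: 7, 7: 3}  # citation -> sandhi, smooth tones


def double_sandhi(tone: int, glottal_coda: bool = False) -> int:
    """Apply the regular smooth-tone sandhi rule twice (hypothetical helper)."""
    if glottal_coda and tone in (4, 8):
        # After /ʔ/ dropping, Tone 4 and Tone 8 pattern with Tone 3 and Tone 7.
        tone = {4: 3, 8: 7}[tone]
    once = SMOOTH_SANDHI[tone]
    return SMOOTH_SANDHI[once]


print(double_sandhi(3))                     # 1, as in khì 'go'
print(double_sandhi(4, glottal_coda=True))  # 1, as in kah 'and'
```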
The second group of exceptions involves the diminutive suffix -á. More than half of the tones abide by a different set of sandhi rules when followed by the suffix, including Tone 3, Tone 7, /ʔ/-ending Tone 4, and /ʔ/-ending Tone 8 (Table 9). Tone 3 and Tone 4 have Tone 1 as their sandhi tone by applying the sandhi rules twice. Tone 8 has Tone 7 as its sandhi tone by applying the sandhi rule once and then reversing the sandhi rule after [ʔ] deletion. For example, thòo-á /tʰɔ3.a2/ ‘rabbit’ should have been *[tʰɔ2.a2], but is realized as [tʰɔ1.a2] instead. Similarly, hio̍h-á /hi̯əʔ8.a2/ ‘leaf’ should have been *[hi̯ə3.a2], but is realized as [hi̯ə7.a2] instead. Tone 7 is the oddball in this set, as it does not undergo the tone sandhi rules at all. For example, phōo-á /pʰɔ7.a2/ ‘booklet’ should have been *[pʰɔ3.a2], but it remains [pʰɔ7.a2] instead. One suspects that the motivation for these exceptions might be avoidance of low-ending tones. Since -á begins high, preventing the previous tone from ending low might make it easier for tonal articulation.
Tai-lo | Before -á | Gloss |
---|---|---|
kim-á | /kim1.a2/→[kim7.a2] | ‘gold’ |
láng-á | /laŋ2.a2/→[laŋ1.a2] | ‘cage’ |
thòo-á | /tʰɔ3.a2/→[tʰɔ1.a2] | ‘rabbit’ |
tik-á | /tik4.a2/→[tik8.a2] | ‘bamboo’ |
ah-á | /aʔ4.a2/→[a1.a2] | ‘duck’ |
iûnn-á | /ĩũ̯5.a2/→[ĩũ̯7.a2] | ‘sheep, goat’ |
phōo-á | /pʰɔ7.a2/→[pʰɔ7.a2] | ‘booklet’ |
tsha̍t-á | /tsʰat8.a2/→[tsʰat4.a2] | ‘thief’ |
hio̍h-á | /hi̯əʔ8.a2/→[hi̯ə7.a2] | ‘leaf’ |
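The pattern in Table 9 can be compared with the regular rules in a short Python sketch. Both mappings are keyed by a (tone, has glottal coda) pair and are transcribed from Tables 8 and 9; the names `REGULAR`, `BEFORE_A`, and `exceptions_before_a` are illustrative assumptions.

```python
# Regular sandhi tones, keyed by (citation tone, syllable ends in /ʔ/).
REGULAR = {
    (1, False): 7, (2, False): 1, (3, False): 2, (5, False): 7, (7, False): 3,
    (4, False): 8, (8, False): 4,  # /p t k/-final checked tones
    (4, True): 2, (8, True): 3,    # /ʔ/-final checked tones
}

# Sandhi tones before the diminutive suffix -á; only four entries differ.
BEFORE_A = {**REGULAR, (3, False): 1, (4, True): 1, (7, False): 7, (8, True): 7}


def exceptions_before_a() -> list:
    """Return the categories whose sandhi tone before -á departs from the default."""
    return sorted(key for key in REGULAR if REGULAR[key] != BEFORE_A[key])


print(exceptions_before_a())
# [(3, False), (4, True), (7, False), (8, True)]
print({BEFORE_A[key] for key in exceptions_before_a()} <= {1, 7})  # True
```

The second check is in line with the suggested motivation: every exceptional category surfaces as a level Tone 1 or Tone 7 rather than as a low-ending tone.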
Finally, the third group of exceptions involves a morphological structure specific to adjectives. In Taiwanese, triple reduplication is adopted for emphasis (Chang, Reference Chang1989; Lin, Reference Lin2001). For example, while âng means red, âng-âng-âng means extremely red. Special sandhi rules are applied to the first syllable of these structures for four of the tones, including Tone 1, Tone 5, Tone 7, and Tone 8, so that it becomes a Tone 5 (Table 10). The second and the third syllables follow the regular sandhi rules. For instance, when kim /kim1/ ‘shiny’ is triply reduplicated, it becomes [kim5.kim7.kim1], not *[kim7.kim7.kim1]. Notice that for Tone 8, the exception sandhi rule would result in a checked syllable being paired with a smooth tone – for example, ba̍t /bat8/ ‘fitting tightly’ becomes [bat5.bat4.bat8]. The first syllable is realized as a mid-rising Tone 5 without deleting the final /t/. However, since the first syllable of triply reduplicated adjectives is always lengthened, the checked Tone 8 is lengthened accordingly and would not have trouble realizing the smooth tone fully.
Tai-lo | Triple reduplication | Gloss |
---|---|---|
kim | /kim1.kim1.kim1/→[kim5.kim7.kim1] | ‘shiny’ |
tsún | /tsun2.tsun2.tsun2/→[tsun1.tsun1.tsun2] | ‘accurate’ |
tàng | /taŋ3.taŋ3.taŋ3/→[taŋ2.taŋ2.taŋ3] | ‘freezing’ |
kip | /kip4.kip4.kip4/→[kip8.kip8.kip4] | ‘rushed’ |
khuah | /kʰu̯aʔ4.kʰu̯aʔ4.kʰu̯aʔ4/→[kʰu̯a2.kʰu̯a2.kʰu̯aʔ4] | ‘spacious’ |
tâm | /tam5.tam5.tam5/→[tam5.tam7.tam5] | ‘wet’ |
tīng | /tiŋ7.tiŋ7.tiŋ7/→[tiŋ5.tiŋ3.tiŋ7] | ‘hard’ |
ba̍t | /bat8.bat8.bat8/→[bat5.bat4.bat8] | ‘fitting tightly’ |
pe̍h | /peʔ8.peʔ8.peʔ8/→[pe5.pe3.peʔ8] | ‘white’ |
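The pattern in Table 10 can be summarized as a small lookup. The sketch below is illustrative only: the `DEFAULT` mapping is inferred from the second-syllable forms in the table, the three syllable types (`"smooth"`, `"stop"`, `"glottal"`) are labels introduced here, and segmental effects such as /ʔ/ deletion are not modelled.

```python
# Default sandhi tones keyed by (base tone, syllable type); inferred from the
# second-syllable forms in Table 10 and assumed here for illustration only.
DEFAULT = {
    (1, "smooth"): 7, (2, "smooth"): 1, (3, "smooth"): 2,
    (5, "smooth"): 7, (7, "smooth"): 3,
    (4, "stop"): 8, (8, "stop"): 4,        # /p t k/-final syllables
    (4, "glottal"): 2, (8, "glottal"): 3,  # /ʔ/-final syllables
}
SPECIAL_FIRST = {1, 5, 7, 8}  # tones whose first copy surfaces as Tone 5


def triple_reduplication(tone, kind="smooth"):
    """Return the surface tones of the three copies, left to right."""
    regular = DEFAULT[(tone, kind)]
    first = 5 if tone in SPECIAL_FIRST else regular
    return [first, regular, tone]  # the last copy keeps its base tone
```

Calling `triple_reduplication(1)` gives the tone sequence of kim ‘shiny’, and `triple_reduplication(8, "stop")` gives that of ba̍t ‘fitting tightly’.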
2.3.4 Tone versus Stress
Besides a complicated tonal system, Taiwanese also has stress, and its stress is realized largely through tone (Lu, Reference Lu2003). All stressed syllables in Taiwanese receive a tone, either a base tone or a sandhi tone. On the other hand, unstressed syllables are not assigned any tone and are intrinsically short. Based on how their pitch register is realized, unstressed syllables can be of two types. The first adopts the pitch register of the end of the preceding syllable. The nominalization marker --ê and the aspect marker --ah are of this type (Footnote 4). The second type consistently assumes a low falling contour, much like that of a Tone 3 or Tone 4, regardless of the preceding tonal environment. Verbal complements --tio̍h and --khí-lâi are of this type. Table 11 shows some examples of unstressed syllables. Notice that the syllable preceding an unstressed syllable is always realized in its base tone.
Tai-lo | Before unstressed syllable | Gloss |
---|---|---|
sio--ê | /sio1.e5/→[sio1.eH] | ‘things that are hot’ |
líng--ê | /liŋ2.e5/→[liŋ2.eL] | ‘things that are cold’ |
lim--ah | /lim1.aʔ4/→[lim1.aʔH] | ‘have drunk’ |
khùn--ah | /kʰun3.aʔ4/→[kʰun3.aʔL] | ‘have slept’ |
tsim--tio̍h | /tsim1.ti̯əʔ8/→[tsim1.ti̯əʔL] | ‘have kissed’ |
khuànn--tio̍h | /kʰũ̯ã3.ti̯əʔ8/→[kʰũ̯ã3.ti̯əʔL] | ‘have seen’ |
pue--khí-lâi | /pu̯e1.kʰi2.lai̯5/→[pu̯e1.kʰiL.lai̯L] | ‘take off’ |
tsáu--khí-lâi | /tsau̯2.kʰi2.lai̯5/→[tsau̯2.kʰiL.lai̯L] | ‘start to run’ |
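The two types of unstressed syllables in Table 11 differ only in where their pitch register comes from, which the following sketch makes explicit. The names `END_REGISTER` and `unstressed_register` and the type labels `"copy"` and `"low"` are hypothetical, and only the three preceding tones attested in Table 11 are covered.

```python
# Pitch register at the end of the preceding base tone. Only the tones
# attested in Table 11 are listed; other tones are deliberately left out.
END_REGISTER = {1: "H", 2: "L", 3: "L"}


def unstressed_register(prev_base_tone, kind):
    """Register (H or L) of an unstressed syllable.

    kind="copy": first type, e.g. --ê and --ah (copies the preceding register)
    kind="low":  second type, e.g. --tio̍h and --khí-lâi (always low)
    """
    if kind == "low":
        return "L"
    if kind == "copy":
        return END_REGISTER[prev_base_tone]
    raise ValueError(f"unknown type of unstressed syllable: {kind!r}")
```

Because the preceding syllable keeps its base tone, the lookup is keyed by base tone rather than by sandhi tone.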
2.4 Syllables
The syllable structure of Taiwanese is of a CGVX skeleton, in which only V is obligatory (Chang, Reference Chang1989; Cheng & Cheng, Reference Cheng and Cheng1987; Chung, Reference Chung1996). It can be filled by a vowel or a syllabic nasal /m̩ ŋ̩/. All consonants except for /ʔ/ can occur in the C slot. The G slot is reserved for the onglide of a rising diphthong – that is, /i̯/ and /u̯/. The final X slot can be filled with an offglide of a falling diphthong, a nasal /m n ŋ/, or an unreleased stop of /p t k ʔ/.
Traditionally, a Taiwanese syllable is composed of an initial and a final (Chappell, Reference Chappell, Vittrant and Watkins2019; Chung, Reference Chung1996). The initial is the onset consonant in the C slot, and the final is the rest of the syllable. The final is further divided into the medial and the rhyme. The medial position is reserved for the onglide of a diphthong – that is, the G slot. The rhyme is composed of the nucleus (the V slot) and the coda (the X slot). Figure 5 shows an example.
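As a rough illustration of the CGVX skeleton, the sketch below assigns a pre-segmented syllable to the four slots from right to left. It is a simplification under several assumptions: the input is already split into segments, syllabic nasals carry their syllabicity diacritic so that they are not mistaken for codas, and the restriction on /ʔ/ in the C slot is not checked. The function name `parse_cgvx` is hypothetical.

```python
I_GLIDE, U_GLIDE = "i\u032f", "u\u032f"  # non-syllabic /i̯/ and /u̯/
GLIDES = {I_GLIDE, U_GLIDE}
CODAS = GLIDES | {"m", "n", "ŋ", "p", "t", "k", "ʔ"}


def parse_cgvx(segments):
    """Assign a list of segments to the C, G, V and X slots."""
    segs = list(segments)
    slots = {"C": None, "G": None, "V": None, "X": None}
    if len(segs) > 1 and segs[-1] in CODAS:
        slots["X"] = segs.pop()          # offglide, nasal or unreleased stop
    slots["V"] = segs.pop()              # the obligatory nucleus
    if segs and segs[-1] in GLIDES:
        slots["G"] = segs.pop()          # onglide of a rising diphthong
    if segs:
        slots["C"] = segs.pop()          # onset consonant
    if segs:
        raise ValueError("more segments than CGVX slots")
    return slots
```

In traditional terms, the C slot corresponds to the initial, G to the medial, and V plus X to the rhyme.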
Syllable boundaries are relatively fluid in Taiwanese and can be modified, weakened, or even erased altogether through phonetic, phonological, morphological, and prosodic rules. Four types of boundary modification are discussed in what follows: coda obstruent deletion, which is phonetically licensed; anticipatory assimilation, which is phonetically and phonologically motivated; gemination, which is morphologically triggered; and syllable fusion, which is largely predicted by prosody.
2.4.1 Deletion of Coda Obstruents
In Section 2.2, we mentioned a rule regarding the deletion of /ʔ/ in TSG-internal positions. However, recent studies have shown that there are more cases of coda deletion than what is prescribed by the phonological rule. In other words, deletion is observed for all coda obstruents /p t k ʔ/, but the rate is especially high for /ʔ/. Using acoustic measurements and electroglottography, Pan (Reference Pan2017) observed that the deletion rate for /ʔ/ is higher than 80 percent, while that for /p t k/ is lower than 15 percent. Similar results were also found in Chen (Reference Chen2009b, Reference Chen2010b) and Pan and Lyu (Reference Pan and Lyu2021). /ʔ/ is always more prone to be deleted than /p t k/, and more prone to be deleted in Tone 8 than Tone 4. As /ʔ/ not only has little vowel transition (Chu & Lin, Reference Chu and Lin2010), but also lacks the visual cues commonly found in oral stops, it might be harder for listeners to perceive and is thus more prone to deletion.
2.4.2 Anticipatory Assimilation of Codas
Assimilation in Taiwanese is largely anticipatory, and mainly occurs on coda consonants (Chang, Reference Chang1989; Lin, Reference Lin2001). Table 12 shows some examples. It is interesting to note that not all lexical items apply the rule with the same frequency. For some, such as sin-pū ‘daughter-in-law’, assimilation is obligatory. The underlying form /sin.pu/ is practically never heard. However, for others, such as sin-bûn ‘news’, assimilation is optional, and its application is probably governed by a variety of performance factors, such as speech rate, genre, and personal preferences.
Tai-lo | Assimilation | Gloss |
---|---|---|
sin-pū | /sin.pu/→[sim.pu] | ‘daughter-in-law’ |
kim-nî | /kim.ni/→[kin.ni] | ‘this year’ |
pak-tóo | /pak.tɔ/→[pat.tɔ] | ‘stomach’ |
sin-bûn | /sin.bun/→[sim.bun] | ‘news’ |
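The forms in Table 12 all follow one pattern: the coda keeps its manner (nasal or stop) but takes on the place of articulation of the following onset. A minimal sketch is given below. The place classes in `PLACE` are a generalization assumed for illustration, since Table 12 attests only some of the combinations, and the sketch always applies the rule, ignoring the fact that assimilation is optional for some words. All names are hypothetical.

```python
# Place of articulation of onset stops and nasals (assumed classes).
PLACE = {
    "p": "labial", "b": "labial", "m": "labial",
    "t": "coronal", "n": "coronal",
    "k": "velar", "ɡ": "velar", "ŋ": "velar",
}
NASAL_AT = {"labial": "m", "coronal": "n", "velar": "ŋ"}
STOP_AT = {"labial": "p", "coronal": "t", "velar": "k"}


def assimilate(first, second):
    """Return the first syllable with its coda assimilated to the next onset."""
    coda, onset = first[-1], second[0]
    place = PLACE.get(onset)
    if place is None:
        return first                       # no place feature to anticipate
    if coda in NASAL_AT.values():
        return first[:-1] + NASAL_AT[place]
    if coda in STOP_AT.values():
        return first[:-1] + STOP_AT[place]
    return first                           # no assimilating coda
```

Only the first character of the second syllable is inspected, so aspirated onsets and affricates are classified by their initial stop.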
2.4.3 Gemination of Coda Stops
The gemination of coda stops can be triggered by a diminutive suffix -á [e.g., niau-á ‘cat (lit. cat-dim.)’], a nominal suffix --ê [e.g., thâi-ti--ê ‘butcher (lit. kill-pig-nom.)’], or a classifier ê [e.g., gōo ê lâng ‘five people (lit. five-CL-people)’]. It applies to all coda stops except for the glottal stop /ʔ/ (Chappell, Reference Chappell, Vittrant and Watkins2019; Chiang, Reference Ang1992; Lien, Reference Lien1995; Lin, Reference Lin2001). The process is straightforward for nasal codas as only gemination is involved. The nasal coda of the first syllable is copied to the onset slot of the following syllable to reach the final output – for example, kim-á /kim.a/→[kim.ma] ‘gold’ (Table 13). The situation is a bit more complicated for the oral stops. In addition to gemination, there is also intervocalic voicing, so a̍p-á /ap.a/ ‘box’ is realized as [ab.ba]. For syllables ending with the coda /t/, such as tsi̍t ê /tsit.e/ ‘one’, intervocalic voicing turns the coda into [l] instead, as there is no /d/ in the Taiwanese sound inventory (Table 3).
Tai-lo | Gemination (+voicing) | Gloss |
---|---|---|
a̍p-á | /ap.a/→[ab.ba] | ‘box’ |
kim-á | /kim.a/→[kim.ma] | ‘gold’ |
gín-á | /ɡin.a/→[ɡin.na] | ‘child’ |
tsi̍t ê | /tsit.e/→[tsil.le] | ‘one’ |
tik--ê | /tik.e/→[tiɡ.ɡe] | ‘Tik’ (a name) |
âng--ê | /aŋ.e/→[aŋ.ŋe] | ‘something red’ |
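The gemination pattern in Table 13 can be expressed as a short function over broad transcriptions. This is a sketch under simplifying assumptions: tones are omitted, the stem and suffix are passed as plain strings, and stems that do not end in a nasal or /p t k/ are returned unchanged, since other processes such as /ʔ/ deletion are not modelled. The names `VOICED`, `NASALS`, and `geminate` are hypothetical.

```python
# Intervocalic voicing of oral stops; /t/ surfaces as [l] because the
# inventory has no /d/.
VOICED = {"p": "b", "t": "l", "k": "ɡ"}
NASALS = {"m", "n", "ŋ"}


def geminate(stem, suffix):
    """Join stem and vowel-initial suffix, geminating the stem coda."""
    coda = stem[-1]
    if coda in NASALS:
        return f"{stem}.{coda}{suffix}"              # copy the nasal
    if coda in VOICED:
        voiced = VOICED[coda]
        return f"{stem[:-1]}{voiced}.{voiced}{suffix}"  # voice, then copy
    return f"{stem}.{suffix}"                        # no gemination
```

For example, `geminate("kim", "a")` returns `"kim.ma"` and `geminate("tsit", "e")` returns `"tsil.le"`, matching the corresponding rows of Table 13.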
2.4.4 Syllable Fusion
Syllable fusion is a rather productive process in Taiwanese (Lin, Reference Lin2001). Table 14 shows some examples. It is clear from the table that the onset of a fused syllable usually comes from that of the first syllable, while the coda usually comes from that of the second syllable. This is captured by Chung’s (Reference Chung1996) “edge-in” rule, in which the first segment of the first syllable and the last segment of the last syllable are selected to be in the final fused form. Selection of the vowel nucleus of the fused form seems somewhat trickier. For tsit-tsūn ‘now’, the vowel of the first syllable is chosen, but for sàu--lo̍h-lâi ‘to sweep down’ and sì tsa̍p it ‘forty-one’, both vowels are incorporated in the final fused form. For sî-tsūn ‘time’, forms with a vowel from either the first or the second syllable have been attested (Li & Myers, Reference Li and Myers2005). Chung (Reference Chung1996) stated that the nucleus of the fused syllable is awarded to the vowel of the highest sonority (the “vocoid association” rule), with the ones on the left having a higher priority than those on the right (the “L-R-scanning” rule). The final fused form also has to obey the phonotactic constraints of the language (the “maximality constraint” rule). This could then easily explain the fused forms for tsa-hng ‘yesterday’, sàu--lo̍h-lâi ‘to sweep down’, and sì tsa̍p it ‘forty-one’. However, there are also limitations to the rules. For example, they fail to explain why there are two fused forms for sî-tsūn ‘time’ but only one for tsit-tsūn ‘now’ when the two practically have the same vowel combination.
Tai-lo | Syllable fusion | Gloss |
---|---|---|
tsa-hng | /tsa1.hŋ̩1/→[tsaŋ5] | ‘yesterday’ |
tsit-tsūn | /tsit4.tsun7/→[tsin2] | ‘now’ |
sî-tsūn | /si5.tsun7/→[sin7] or [sun7] | ‘time’ |
sàu--lo̍h-lâi | /sau̯3.ləʔ8.lai̯5/→[sau̯3.lu̯ai̯] | ‘to sweep down’ |
sì tsa̍p it | /si3.tsap8.it4/→[si̯ap8.it4] | ‘forty-one’ |
2.5 Prosody
Grouping and prominence are considered the two pillars in prosody (Pierrehumbert, Reference Pierrehumbert1980). The former refers to how words are organized into units of various sizes, while the latter refers to how some words are emphasized over others through highlighting devices. The following introduces the two main elements of prosody in Taiwanese.
2.5.1 Prosodic Grouping
With regard to grouping, Peng and Beckman (Reference Peng and Beckman2003) detailed three levels of prosodic constituents in their prosodic labeling system of Tone and Break Indices for Taiwanese (TW-ToBI), including the syllable, the TSG, and the intonational phrase (IP). Among the three, the syllable is the lowest level. Although syllable boundaries are largely regulated by phonotactics, they are not at all unbreakable. In addition to resyllabification due to affixation (see Section 2.4.3), syllable fusion caused by juncture weakening is often observed in spontaneous speech (see Section 2.4.4), as is illustrated in (7). The bisyllabic phrase hōo lâng (passive marker, lit. ‘give man’) is commonly realized as a fused syllable [hoŋ] in everyday speech.
(7)
I hōo lâng phah.
3rd sg. give man hit
‘He was hit’
syllable fusion: /i1 ho7 laŋ5 pʰaʔ4/→[i7 hoŋ5 pʰaʔ4]
The intermediate boundary of the TSG is realized through the positioning of base tones. In addition to being an indicator of morphological structure and syntactic phrasing (Chen, Reference Chen1987), base tones can also act as a highlighting marker through prosodic grouping, as shown in (8). By inserting a TSG boundary after guá ‘1st sg.’ and setting it apart from the following verb mn̄g ‘ask’ in the highlighting version, the word reverts to its base tone and receives a narrow focus.
(8)
Uānn guá mn̄g --ah.
change 1st sg. ask PART
‘(It’s) my turn/MY TURN to ask’
no highlighting: /ũ̯ã7 ɡu̯a2 mŋ̩7 aʔ4/→[ũ̯ã3 ɡu̯a1 mŋ̩7 aʔ]TSG
highlighting guá: /ũ̯ã7 ɡu̯a2 mŋ̩7 aʔ4/→[ũ̯ã3 ɡu̯a2]TSG [mŋ̩7 aʔ]TSG
adapted from Peng and Beckman (Reference Peng and Beckman2003)
The highest level of prosodic disjuncture in Taiwanese is the IP boundary, which is considered the largest phonological unit in a language (Jun, Reference Jun2005). Although its internal structure remains to be explored, previous studies showed some consensus on how such a boundary could be elicited in Taiwanese. Besides absolute utterance-final positions, both vocatives (Pan, Reference Pan2007c; Pan & Tai, Reference Pan and Tai2006) and subordinate clauses (Hsu & Jun, Reference Hsu and Jun1998) have been used in read speech elicitation, as shown in (9) and (10).
(9)
[A-pah]IP [lâi-khì tńg --ah.]IP
Dad leave return PART
‘Dad, let’s go home!’
/a1.paʔ4 lai̯5.kʰi3 tŋ̩2 aʔ4/→[a7.paʔ4 lai̯7.kʰi1 tŋ̩2 aʔ]
(Pan, Reference Pan2007c)
(10)
[Pîng-iú kóng]IP [ta-ke m̄-bián tsiap--lâi.]IP
friend say mother-in-law not-have-to bring-home
‘(My) friend said (I) don’t have to bring (my) mother-in-law home.’
/pi̯əŋ5.i̯u2 koŋ2 ta1.ke1 m̩7.ben2 tsi̯ap4.lai̯5/→[pi̯əŋ7.i̯u2 koŋ1 ta7.ke1 m̩3.ben1 tsi̯ap4.lai̯]
adapted from Hsu and Jun (Reference Hsu and Jun1998)
2.5.2 Labeling for Prosodic Grouping
Peng and Beckman (Reference Peng and Beckman2003) set up a labeling scheme of break indices (BIs) to label the three levels of prosodic boundaries. In their system, each syllable is designated a BI, which gauges the disjuncture between the current syllable and the next. As shown in Table 15, the three levels of boundaries are designated as b2, b3, and b4 in an ascending order. An example is given in Figure 6. The whole utterance constitutes one IP. Therefore, the sentence-final syllable lâi ‘come’ is followed by a b4. The phrase tsha̍t-á ‘thief’ forms a TSG, as the final syllable diminutive suffix á is lengthened and realized with a high-falling base tone. Therefore, it is followed by a b3. Syllables like the first sī ‘yes’, m̄ ‘no’, and ài ‘want’, which assume their sandhi tones and do not show much lengthening, are thus followed by a b2.
BI | Description |
---|---|
b4 | IP boundary |
b3 | TSG boundary |
b3m | percept of TSG without base tone |
b2m | base tone without percept of TSG |
b2 | syllable boundary |
b1 | resyllabification |
b0m | syllable fusion |
Besides labels for the three prosodic boundaries, there are other BIs in the system to accommodate some of their variants. Break index b0m is used for syllable fusion. The bisyllabic phrase bô ài (‘do not want, lit. no want’) in Figure 6 is realized as a fused syllable so that /bə.ai̯/ becomes [bu̯ai̯]. The syllable bô is thus labeled a b0m. Break index b1 is used to label resyllabification. The tsha̍t-á ‘thief’ in Figure 6 is a good example. The final /t/ in the first syllable tsha̍t ‘thief’ is geminated and resyllabified as the onset of the second syllable -á ‘diminutive suffix’ so that /tsʰat.a/ becomes [tsʰal.la] instead (Chang, Reference Chang1989; Chung, Reference Chung1996). As a result, the break after the syllable tsha̍t is designated a b1.
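The labels in Table 15 can be read as a small decision procedure over the properties of the juncture that follows a syllable. The sketch below is only one way of arranging those labels and is not part of the TW-ToBI specification: the function and parameter names are hypothetical, and the order in which the conditions are checked is an assumption.

```python
def break_index(boundary="none", base_tone=False,
                resyllabified=False, fused=False):
    """Pick a break index for the juncture after a syllable.

    boundary: perceived disjuncture, one of "ip", "intermediate", "none"
    base_tone: True if the syllable surfaces with its base tone
    """
    if fused:
        return "b0m"                         # syllable fusion
    if resyllabified:
        return "b1"                          # coda resyllabified as onset
    if boundary == "ip":
        return "b4"                          # IP boundary
    if boundary == "intermediate":
        return "b3" if base_tone else "b3m"  # TSG percept with/without base tone
    if boundary == "none":
        return "b2m" if base_tone else "b2"  # base tone without TSG percept
    raise ValueError(f"unknown boundary type: {boundary!r}")
```

Under these assumptions, the IP-final lâi receives `break_index("ip", base_tone=True)`, that is b4, while the fused bô receives `break_index(fused=True)`, that is b0m.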
The three levels of prosodic boundaries are accompanied by gradient phonetic cues. At the right edge of a prosodic phrase, both final lengthening and boundary pause are longer at IP than TSG boundaries, and final lowering is more drastic at boundaries higher in hierarchy (Kuo, Reference Kuo2011, Reference Kuo2012; Pan & Tai, Reference Pan and Tai2006; Peng, Reference Peng1997). One can also see the effect of prosodic hierarchy on syllable duration from Figure 6. In the utterance, the IP-final syllable lâi ‘come’ is much longer than the TSG-final syllable -á ‘diminutive suffix’, which is in turn longer than the regular syllables of the first sī ‘yes’, m̄ ‘no’, and ài ‘want’. Voice quality also plays a role. Intonational phrase boundaries are often accompanied by a breathy or creaky voice while smaller boundaries are more modal (Kuo, Reference Kuo2012). Segmental information can code prosodic hierarchy as well. Nasals at IP boundaries are often accompanied by inhalation and tend to have a longer nasal plateau and nasal airflow than those at smaller boundaries (Pan, Reference Pan2007c).
Hierarchical cues are also found at the left edge of a prosodic phrase. Duration again plays a role, but the trend is the opposite of that at the right edge. Syllables at IP boundaries tend to be shorter than those at TSG boundaries, and word-initial and -internal syllables are the longest of all (Pan & Tai, Reference Pan and Tai2006). In contrast, the F0 trend patterns similarly to that at the right edge (Pan, Reference Pan, Gussenhoven and Riad2007b; Pan & Tai, Reference Pan and Tai2006). Falling tones tend to have a larger pitch excursion by lowering the pitch floor at higher boundaries, and their falling velocity is also higher. Segments are hierarchically coded as well. Voiceless stops have a longer closure at IP than TSG boundaries, and prenasalization commonly found in voiced stops is accompanied by longer nasal airflow at higher boundaries (Hayashi, Hsu & Keating, Reference Hayashi, Hsu and Keating1999; Hsu & Jun, Reference Hsu and Jun1998) (Footnote 5). Interestingly, voice onset time (VOT) measurements across stops show paradigmatic strengthening at larger disjunctures. Aspirated stops have the longest voice lag at IP boundaries and the shortest at syllable boundaries, while voiced stops have the longest voice lead at IP boundaries and the shortest at syllable boundaries. Not much hierarchical effect is observed for unaspirated stops. In other words, VOTs for voiced, unaspirated, and aspirated stops are coordinated so that they become more distinct from one another at higher boundaries.
Despite the phonetic evidence, not everyone agrees with the organization of the hierarchy. Among the three levels, the TSG boundary is the most controversial (Hsu & Jun, Reference Hsu and Jun1996; Pan, Reference Pan, Gussenhoven and Riad2007b). As mentioned, the application of the tone sandhi rules is to a large extent determined by morphology and syntax, not prosody. In addition, TSG is not strictly layered under IP. Utterance (10) is a good example. Although the first IP ends at kóng ‘say’, the word usually does not coincide with the TSG boundary, as reporting verbs are more commonly realized with a sandhi tone than a base tone. This suggests that TSG violates the strict layer hypothesis assumed in Selkirk’s (Reference Selkirk1986) prosodic organization, and is thus probably not a fully legitimate candidate for the prosodic hierarchy (Footnote 6).
Peng and Beckman (Reference Peng and Beckman2003) also acknowledged the peculiarity of TSGs, and have built this into the break indices of their TW-ToBI labeling system. As shown in Table 15, b2m is used for word-internal syllables that do not undergo tone sandhi rules. This occurs in a few words of a subject-predicate structure, as in tē-tāng /te7.taŋ7/→[te7.taŋ7] ‘earthquake, lit. earth-move’. Although the first syllable tē is nonfinal, it is realized with a base tone instead of a sandhi tone, and is thus designated b2m. Break index b3m is another example demonstrating the unusual characteristics of TSG boundaries. It is used for intermediate disjunctures that end with a sandhi tone instead of a base tone. Figure 6 shows such an example. There is an intermediate boundary after sī m̄ sī ‘a grammatical construction for yes-no questions, lit. yes-no-yes’, which is evidenced by the lengthening of the second sī. However, the syllable is realized with the low-falling sandhi tone instead of the mid-level base tone. It is thus designated a b3m to accommodate the mismatch. As b2m and b3m are proposed alongside the regular indices of b2, b3, and b4 in the TW-ToBI system, it suggests that Peng and Beckman (Reference Peng and Beckman2003) also recognize the dual morphosyntactic and prosodic functions of TSG boundaries. It also implies that the intermediate disjuncture in Taiwanese is mostly accompanied by a TSG boundary, but can grammatically do without on some occasions.
The results in Pan and colleagues (Reference Pan, Lyu, Huang and Mu-fan2018) also support this view. They studied prosodic boundaries of different sizes in spontaneous speech, and found that none of them completely coincides with TSG boundaries. However, there is a positive correlation between prosodic hierarchy and TSG. Larger boundaries tend to be accompanied by TSG boundaries more often than smaller ones, although cross-dialectal differences do exist (IP: 50–80 percent; intermediate: 50–70 percent; word boundaries: 30 percent; syllable boundaries: ≤ 10 percent). In other words, a TSG could be deemed a prosodic boundary marker due to its gradient nature, much like the more commonly cited phonetic cues, such as initial strengthening, final lengthening, and final lowering.
Based on these studies, it seems safe to assume three levels of prosodic boundaries in Taiwanese. However, the defining features of the intermediate disjuncture are less clear. Peng and Beckman (Reference Peng and Beckman2003) chose the obvious TSG as the main characteristic of the intermediate level, while at the same time allowing for exceptions to accommodate the inconsistencies between TSG and Selkirk’s (Reference Selkirk1986) strict layer hypothesis. On the other hand, Hsu and colleagues adopted the “small phrase” (Hayashi et al., Reference Hayashi, Hsu and Keating1999) and “word” (Hsu & Jun, Reference Hsu and Jun1998) as alternatives for the intermediate disjuncture. Although this indeed bypasses the dilemma faced by Peng and Beckman (Reference Peng and Beckman2003), it is also problematic in its own right, as they did not provide clear definitions for a small phrase, and the adoption of a word as part of the prosodic hierarchy inevitably requires recourse to TSG. Further studies are thus necessary in order to shed light on the nature of the intermediate disjuncture in Taiwanese.
2.5.3 Prosodic Prominence and Its Labeling
Turning to the other pillar of prosody, one finds prominence in Taiwanese to be mainly realized through tonal realization. Peng and Beckman (Reference Peng and Beckman2003) mentioned two ways of highlighting in the language. One is through the manipulation of the phonological tone sandhi rules, as illustrated in (8). Certain syllables commonly realized as sandhi tones could be deliberately changed back to their base tones to create a highlighting effect of narrow focus. This is also supported by Pan and colleagues (Reference Pan, Lyu, Huang and Mu-fan2018), as they found a negative correlation between the occurrence of base tones and word frequency. Rare words in spontaneous speech, which are also more likely to be highlighted, tend to end in base tones rather than sandhi tones. This implies that the intermediate TSG boundary of Taiwanese could serve a dual role of both prosodic grouping and highlighting, and the former can at times become a way to achieve the latter.
The other highlighting device is through the phonetic realization of tones, termed stress in Peng and Beckman’s (Reference Peng and Beckman2003) TW-ToBI system. There are three levels of stress, and each syllable is designated a stress level (Table 16). When a syllable is realized with a full tone, it is an s2. This is not prevalent in everyday speech, and only occurs when there is focal prominence. Instead, most syllables are realized with some degree of tonal reduction, including tonal undershoot and duration shortening. Syllables of this type are designated as s1. Finally, s0 is used when a syllable is reduced and has completely lost its tonal specification. This often occurs in a prosodically weak position, such as the diminutive suffix -á in word-medial positions, as in gín-á-lâng (‘child, lit. child-dim.-human’), or can be morphologically determined (e.g., kiann-sí ‘afraid of dying, coward’ vs. kiann--sí ‘scared to death, lit. afraid-die’). Figure 7 shows three different renditions of kiann-sí. Figure 7a is elicited by putting a narrow focus on the second syllable of kiann-sí ‘afraid of DYING (not LIVING)’. The syllable sí is realized with a full-blown high-falling tone and is thus an s2. Figure 7b is elicited by putting a broad focus on the whole word of kiann-sí ‘afraid of dying, coward’. Here sí is still realized with a high-falling contour, but the high tonal target is somewhat compromised, and the syllable is thus assigned an s1. Finally, Figure 7c is elicited by neutralizing the tone of the second syllable of kiann--sí ‘scared to death’. The syllable is shortened and loses its high tonal target altogether and is thus assigned an s0.
Stress | Description |
---|---|
s2 | syllable with fully realized tone |
s1 | syllable with some tonal reduction |
s0 | syllable that has lost its tonal specification |
Like boundary cues, duration and pitch both play a role in the phonetic realization of stress (Pan, Reference Pan, Lee, Gordon and Büring2007a). S2 syllables tend to be longer than s1, and this lengthening effect interacts with syllable positioning. Those that are nearer the end of an utterance tend to show a stronger effect. S2 syllables also have a larger F0 range and a higher mean F0 than s1 syllables, and the two cues seem to complement each other. However, in general, duration is still a more reliable cue of stress than pitch.
Unlike many stress-timed Indo-European languages, in which some kind of highlighting device is obligatory in a prosodic phrase (Jun, Reference Jun2005), prominence does not play as big a role in Taiwanese prosody. Although Peng and Beckman’s (Reference Peng and Beckman2003) labeling system requires each syllable to be assigned a stress level, there is no stipulation with regard to the minimal number of any stress level within a phrase. In other words, highlighting is not grammatically essential, and a phrase can be realized without a single s2 syllable. One suspects that this has to do with the nature of tone languages. As pitch has already been largely occupied by the realization of lexical tones, there is little room for pitch manipulation that only serves a structuring purpose. Similar patterns are found in tone languages like Mandarin (Peng et al., Reference Peng, Chan, Tseng, Huang, Lee, Beckman and Jun2005) and Cantonese (Wong, Chan & Beckman, Reference Wong, Chan, Beckman and Jun2005). However, this does not mean a prominence-cueing device is lacking altogether in Taiwanese. Rather, both stress (e.g., s2) and boundary signals (e.g., TSG boundaries) contribute when highlighting is necessary for pragmatic purposes.
3 Dialectal Variations in Taiwanese
In addition to the mainstream Taiwanese, there are also several dialectal variations with regard to Taiwanese phonetics. In the following, major variations of consonants, vowels, and tones are introduced.
3.1 Variations in Consonants
Because consonants have specific anchor points for their places of articulation, their pronunciation is relatively stable. The one consonant in Taiwanese that shows substantial dialectal variation is the voiced sibilant /dz/. As mentioned, voiced sibilants are physiologically strenuous to produce (Ohala, Reference Ohala and MacNeilage1983). Therefore, ease of articulation provides a strong motivation for this sound to change.
Across dialects, /dz/ is not always present, even among old speakers (Ang, Reference Ang2003, Reference Ang2012). The Mix variety is the most conservative, and its old speakers still predominantly retain [(d)z] (Table 17). On the other hand, Pro-Tsuan is the most progressive, and its old speakers almost exclusively use [l] instead. The Pro-Tsiang dialect is somewhere in between. Both [(d)z] and [l] are adopted as major free variants among old speakers. For the younger generation, some of the dialectal variations have been blurred (Chuang & Fon, Reference Chuang and Fon2017a). The realization of /dz/ in all three dialects is partly dependent on the rounding of the following vowel. In rounded environments, [l] is now the predominant realization, and little variability is observed. Realizations other than [l] have become rather marginal. For unrounded environments, more variability is found. The Mix dialect is still the most conservative, and [(d)z] is still a robust realization, along with [l] and [ɡ]. For Pro-Tsiang, [l] and [d] are used interchangeably, but [(d)z] is still used, albeit to a lesser extent. For Pro-Tsuan, [l] is always the dominant form, but [d] can also occasionally occur.
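Generation | Pro-Tsuan | Pro-Tsiang | Mix (mainstream) |
---|---|---|---|
Old | [l] ≫ [(d)z] | [(d)z] and [l] as major free variants | predominantly [(d)z] |
Young | rounded: predominantly [l]; unrounded: [l] dominant, [d] occasional | rounded: predominantly [l]; unrounded: [l] and [d] interchangeable, [(d)z] to a lesser extent | rounded: predominantly [l]; unrounded: [(d)z] robust, along with [l] and [ɡ] |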
It is interesting that two new realizations, [ʐ] and [d], are found among the younger generation for /dz/. They are not necessarily the dominant realizations, but their appearance is worth mentioning. The retroflex [ʐ] probably arose from negative transfer from the official language Mandarin, which includes a rich set of retroflexes in its consonant inventory. As mentioned, due to the Mandarin-only policy, the younger generation in Taiwan is almost always more dominant in Mandarin than Taiwanese (Chen, Reference Chen2010a; Yap, Reference Yap2018). Therefore, it is not surprising for them to adopt [ʐ] as a way to realize /dz/, as both are voiced sibilants by nature. The adoption of [d] is even more intriguing. It is suspected that this has to do with the fact that Taiwanese lacks a /d/ in its voiced stop set, and the use of [d] might be a way for the speakers to fill in the gap for the system.
Besides dialectal influences, gender and speaker proficiency are also influential factors (Chuang & Fon, Reference Chuang and Fon2018). Males are more inclined to retain the voiced sibilant feature and use [(d)z] and [ʐ] as realizations of /dz/, while females are more inclined to adopt the weakened form [l]. Those with higher proficiency are more likely to retain [ɡ] and [ʐ], while those with lower proficiency are more likely to adopt [l] and [d]. It thus seems safe to say that the future fate of /dz/ is still not yet completely determined. If the two newly adopted forms, [ʐ] and [d], are gaining popularity, or if speakers (especially males) are consciously regaining their language proficiency in response to the reversing language shift movements currently promoted by the government, then the phoneme /dz/ could still survive in one form or another. However, if the weakening process is strong enough to override all other possibilities, then in the near future, Taiwanese will be left with only seventeen consonants (cf. Table 3), and /dz/ will be merged with /l/.
3.2 Variations in Vowels
Compared to consonants, vowels tend to vary much more. In this section, two common vowel variations are examined. One is the oral vowel /ə/ and the other is the nasal vowel /ĩũ̯/. They are chosen because their variations are still fairly widespread among the younger generation and therefore are likely to influence the future path of Taiwanese.
3.2.1 Oral Vowel /ə/
Although /ə/ is currently a phoneme in the vowel system of mainstream Taiwanese (Figure 3 in Section 2.2), it is conjectured to be originally derived from a former phoneme /o/ through delabialization (Chang, Reference Chang2000; Chen, Reference Chen2009a; Tung, Reference Tung1991). In other words, the current /ə/-/ɔ/ contrast was previously an /o/-/ɔ/ contrast instead, resulting in an asymmetric six-vowel system (Figure 8a) (Chen, Reference Chen2004). The two contrasts are geographically neatly distributed among members of the older generation (Table 18) (Hsu, Reference Hsu2016). Northern speakers use the /o/-/ɔ/ contrast exclusively while southern speakers use only the /ə/-/ɔ/ contrast. On the other hand, central speakers, residing geographically in the middle, occupy the middle ground by showing a mixture of the two, and both /o/-/ɔ/ and /ə/-/ɔ/ contrasts can be observed.
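Generation | Northern | Central | Southern |
---|---|---|---|
Old | /o/-/ɔ/: 100% | mixture of /o/-/ɔ/ and /ə/-/ɔ/ | /ə/-/ɔ/: 100% |
Young | /o/ merged with /ɔ/: 40%; otherwise mostly /ə/-/ɔ/ | mostly /ə/-/ɔ/ | mostly /ə/-/ɔ/ |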
For the younger generation, dialectal variation in the mid-central/back vowel contrast is largely attenuated. As shown in Table 18, except for 40 percent of the northern young speakers, who lost their parents’ /o/-/ɔ/ contrast by merging the former with the latter, resulting in a symmetric five-vowel system (Figure 8b) (Chen, Reference Chen2004; see Footnote 7), the majority of young speakers follow mainstream Taiwanese and show the /ə/-/ɔ/ contrast regardless of geographical location. However, most instances of this contrast are not etymologically justified, but are instead influenced by Mandarin. Table 19 shows some examples of Taiwanese words containing the vowel in question that also have cognates in Mandarin. These words are commonly produced with an /ɔ/ vowel in the north and an /ə/ vowel in the south among old speakers (Hsu, Reference Hsu2016). For the younger generation, however, the choice of /ɔ/ or /ə/ is not determined by geographical location, but largely by the Mandarin pronunciations of the words. If the Mandarin cognate has a vowel close to /ɔ/, as in the case of só ‘lock’, then /sɔ/ is preferred over /sə/. On the other hand, if the Mandarin cognate has a vowel close to /ə/, as in ko ‘older brother’, then /kə/ is preferred over /kɔ/.
Word | Gloss | Taiwanese | Mandarin | Preferred |
---|---|---|---|---|
só | ‘lock’ | /sɔ/N, /sə/S | /su̯o/ | /sɔ/ |
kó | ‘fruit’ | /kɔ/N, /kə/S | /ku̯o/ | /kɔ/ |
ko | ‘older brother’ | /kɔ/N, /kə/S | /kɤ/ | /kə/ |
hô | ‘river’ | /hɔ/N, /hə/S | /xɤ/ | /hə/ |
thô | ‘peach’ | /tʰɔ/N, /tʰə/S | /tʰau̯/ | no preference |
pò | ‘to report’ | /pɔ/N, /pə/S | /pau̯/ | no preference |
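The Mandarin-guided pattern in Table 19 can be summarized as a small lookup. The sketch below is illustrative only: the name `preferred_vowel` and the mapping from Mandarin rimes to Taiwanese vowels are assumptions drawn from the six examples in the table, not a rule stated in the cited studies.

```python
# Illustrative lookup based only on the six examples in Table 19.
# The rime constants and the function name are assumptions made for this sketch.
RIME_UO = "u̯o"   # Mandarin rime close to Taiwanese /ɔ/
RIME_E = "ɤ"      # Mandarin rime close to Taiwanese /ə/
RIME_AU = "au̯"   # Mandarin rime close to neither vowel

PREFERRED = {RIME_UO: "ɔ", RIME_E: "ə"}

# Gloss -> Mandarin rime, copied from Table 19.
TABLE_19 = {
    "lock": RIME_UO,
    "fruit": RIME_UO,
    "older brother": RIME_E,
    "river": RIME_E,
    "peach": RIME_AU,
    "to report": RIME_AU,
}

def preferred_vowel(mandarin_rime):
    """Return the vowel young speakers tend to choose, or None if no preference."""
    return PREFERRED.get(mandarin_rime)

for gloss, rime in TABLE_19.items():
    print(gloss, "->", preferred_vowel(rime) or "no preference")
```

A rime outside the two listed cases simply yields no preference, which mirrors the last two rows of the table.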
Based on these examples, it is probably safe to say that the symmetric six-vowel system in mainstream Taiwanese (Figure 3 in Section 2.2), although relatively new, is here to stay, as it has support from the sheer number of speakers due to both geographical locations (for old speakers) and Mandarin influence (for young speakers). The symmetric five-vowel system created by merging /ɔ/ and /o/ might also have a chance of survival (Figure 8b) (Chen, Reference Chen2004). According to Chen (Reference Chen2009a, Reference Chen2009b), both systems are symmetric and are therefore phonologically stable. The old asymmetric six-vowel system would probably gradually disappear, not only because it is phonologically unstable (Chen, Reference Chen2010b), but also because of the dwindling population of the old generation (Figure 8a) (Hsu, Reference Hsu2016).
3.2.2 Nasal Vowel /ĩũ̯/
The /ĩũ̯/ sound in mainstream Taiwanese is homogeneous across most regions in Taiwan, except for Tainan. Instead of /ĩũ̯/, Tainan speakers use a slightly different vowel /ĩ̯ɔ̃/ (Chen, Reference Chen2009a; Tung, Reference Tung1991). Table 20 shows some examples.
Word | Gloss | Mainstream | Tainan |
---|---|---|---|
siunn | ‘box’ | /sĩũ̯/ | /sĩ̯ɔ̃/ |
iûnn | ‘sheep, goat’ | /ĩũ̯/ | /ĩ̯ɔ̃/ |
tsiùnn | ‘sauce’ | /tsĩũ̯/ | /tsĩ̯ɔ̃/ |
Although this sound is specific to the Tainan variety, it is still worth mentioning for two reasons. First, by incorporating /ĩ̯ɔ̃/ instead of /ĩũ̯/ in the system, the Tainan variety has created a mismatch between its nasal and oral vowels, violating the universal tendency for nasal vowels to show the same distribution as their oral counterparts (Clements et al., Reference Clements, Vaissière, Amelot, Montagu, Rialland, Ridouane and van der Hulst2015). The universal tendency holds true for mainstream Taiwanese, which includes both /ĩũ̯/ and /iu̯/ in the system (Table 6). However, the Tainan variety is an exception, as it allows /iu̯/ but not /ĩũ̯/, and includes /ĩ̯ɔ̃/ but not /i̯ɔ/. Second, the Tainan variety has played a very important role in the development of Taiwanese. Tainan used to be the political center of Taiwan (Liu, Reference Liu2010), and currently, it is also the only region in Taiwan in which more residents adopt Taiwanese as their primary or secondary language than Mandarin (National Statistics ROC, 2021). Therefore, compared to other parts of Taiwan, Tainan speakers have probably exerted more influence on the development of Taiwanese. However, studies have shown a positive correlation between speaker age and the usage of /ĩ̯ɔ̃/ (Chen, Reference Chen2009a). Younger speakers are more inclined to shift to the mainstream /ĩũ̯/ than to maintain the local /ĩ̯ɔ̃/. This suggests that the universal tendency and/or the mainstream variety are strong forces in molding a dialectal system, even when there is a robust group of speakers with an alternative pronunciation. More time is needed to see whether the unique nasal vowel /ĩ̯ɔ̃/ will continue to exist in the future.
3.3 Variations in Tones
Tones in Taiwanese can also be susceptible to dialectal changes. In this section, two variations regarding tones are introduced. One is the realization of Tone 8, and the other is a tone sandhi rule regarding Tone 5 commonly found in Pro-Tsuan.
3.3.1 Realization of Tone 8
Although Tone 8 is canonically a high short tone (Table 7), its realizations can be highly variable. Table 21 compares several studies that looked at the realizations of Tone 8. It is evident from the table that there is a robust age effect. For the older generation, the realization of Tone 8 is dialect-dependent. Pro-Tsiang seems to be the most consistent. All three locations examined show a mid tone. This implies that Tone 8 is merged into Tone 4 in this dialect. On the other hand, Pro-Tsuan is rather heterogeneous. Depending on the location, realizations can range from a high, to a mid, to a rising tone. The Mix dialect is somewhere in between. All locations show a high tone, but there are also two places that allow a mid tone. This implies that they might be at the early stage of merging.
Major dialect | Location | Old | Young | Studies |
---|---|---|---|---|
Pro-Tsuan | Taipei | high/mid | mid | Chen (Reference Chen2010b, Reference Chen2013) |
Pro-Tsuan | Changhua | high/rising | --- | Tu (Reference Tu2011) |
Pro-Tsuan | Hsinchu | high/rising | mid | Chen (Reference Chen2018) |
Pro-Tsuan | Taichung | mid | mid | Chen (Reference Chen2021) |
Pro-Tsiang | Changhua | mid | mid | Chen (Reference Chen2010b) |
Pro-Tsiang | Yilan | mid | mid | Chen (Reference Chen2014, Reference Chen2017) |
Pro-Tsiang | Taichung | mid | mid | Chen (Reference Chen2021) |
Mix | Hualien | high/mid | --- | Chen & Chen (Reference Chen and Chen2020) |
Mix | Tainan | high/mid | mid/falling | Chen (Reference Chen2009a, Reference Chen2010b) |
Mix | Tainan | high | --- | Yang (Reference Yang1988) |
Mix | Kaohsiung | high | --- | Ang (Reference Ang1997) |
Mix | Kaohsiung | high | mid | Khng (Reference Khng2014) |
For the younger generation, however, the dialectal differences are almost completely leveled. In all except for one instance (Chen, Reference Chen2009a, Reference Chen2010b), younger speakers realize Tone 8 as a mid short tone, showing a complete merger between Tone 4 and Tone 8. Notice that this merger only occurs in the base tone. For the sandhi tone, Tone 8 still follows its own sandhi rule: it becomes Tone 4 when the coda is /p t k/, and Tone 3 when the coda is /ʔ/.
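The coda-dependent sandhi behavior of Tone 8 can be written as a two-branch rule. The following is a minimal sketch; the function name `tone8_sandhi` is hypothetical, and the sketch covers only the four codas named in the text.

```python
def tone8_sandhi(coda):
    """Return the sandhi tone of a Tone 8 syllable given its coda."""
    if coda in ("p", "t", "k"):
        return 4  # oral stop codas: Tone 8 surfaces as Tone 4
    if coda == "ʔ":
        return 3  # glottal stop coda: Tone 8 surfaces as Tone 3
    raise ValueError("this sketch only covers the codas /p t k ʔ/")

print(tone8_sandhi("t"))
print(tone8_sandhi("ʔ"))
```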
3.3.2 Tone Sandhi Rule of Tone 5
In mainstream Taiwanese, Tone 5 is realized as a Tone 7 in a sandhi position (Figure 4) (Chang, Reference Chang1989; Lin, Reference Lin2001). This is true of both Pro-Tsiang and the Mix dialects. However, for Pro-Tsuan, Tone 5 is realized as a Tone 3 instead (Lu, Reference Lu2003; Tung, Reference Tung1991). Table 22 shows some examples. This rule seems fairly ubiquitous within the dialect, and no difference across age groups was observed (Chen, Reference Chen2021). It is therefore considered a robust rule and should remain one of the signature characteristics of the dialect for some time to come.
Word | Underlying form | Mainstream | Pro-Tsuan | Gloss |
---|---|---|---|---|
sîng-kong | /siŋ5.kɔŋ1/ | [siŋ7.kɔŋ1] | [siŋ3.kɔŋ1] | ‘success’ |
lâng-kheh | /laŋ5.kʰeʔ4/ | [laŋ7.kʰeʔ4] | [laŋ3.kʰeʔ4] | ‘guest’ |
nâ-kiû | /nã5.kiu̯5/ | [nã7.kiu̯5] | [nã3.kiu̯5] | ‘basketball’ |
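The dialectal difference in Table 22 amounts to a single parameter: the value that Tone 5 takes in sandhi position. The sketch below makes this explicit under two simplifying assumptions that are not claims of the text: each word is treated as one tone group in which every non-final syllable is in sandhi position, and the sandhi of tones other than Tone 5 is not modeled. The names `TONE5_SANDHI` and `apply_tone5_sandhi` are hypothetical.

```python
# Sandhi value of Tone 5 by dialect, as described in the text.
TONE5_SANDHI = {"mainstream": 7, "pro_tsuan": 3}

def apply_tone5_sandhi(syllables, dialect):
    """Apply Tone 5 sandhi to a list of (segments, base_tone) pairs.

    Simplifying assumptions: the list is one tone group, every non-final
    syllable is in sandhi position, and only Tone 5 is changed.
    """
    target = TONE5_SANDHI[dialect]
    last = len(syllables) - 1
    return [
        (seg, target if tone == 5 and i < last else tone)
        for i, (seg, tone) in enumerate(syllables)
    ]

word = [("siŋ", 5), ("kɔŋ", 1)]  # sîng-kong 'success' from Table 22
print(apply_tone5_sandhi(word, "mainstream"))
print(apply_tone5_sandhi(word, "pro_tsuan"))
```

The final syllable keeps its base tone, which is why the second Tone 5 in nâ-kiû is unchanged in both columns of the table.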
4 Materials for Research on Taiwanese
We would like to make two sets of materials available to researchers interested in the phonetics of Taiwanese. The first is a read speech dataset of The Story of Aju, which is a short passage originally designed for studying the phonetic realization of the voiced sibilant /dz/ in Taiwanese (Chuang & Fon, Reference Chuang and Fon2017b, Reference Chuang and Fon2018). The second is a spontaneous speech dataset of monologues elicited through an interview format. It is part of a large corpus construction project on the spontaneous speech of Mandarin-Taiwanese bilinguals (Fon, Reference Fon2004).
4.1 The Story of Aju
Although read speech elicitation is a common way of obtaining speech data, it is technically more difficult in Taiwanese than in many better-known languages. As the current romanization system (Ministry of Education ROC, 2008) and the standardization of character writing (Ministry of Education ROC, 2014) were only recently introduced, many adult speakers are still not very proficient in the spelling and writing rules and tend to revert to a character-based make-do system when the need to write Taiwanese arises. This grassroots system is not standardized, can vary from person to person, and is often produced extemporaneously. Therefore, longer passages of read speech are rather difficult to elicit in Taiwanese without substantial experimenter intervention. The Story of Aju was created against this backdrop. It is a deliberately short passage of only ninety syllables. However, it includes all oral monophthongs and diphthongs in Taiwanese except for /u̯e/. Two of the nasal vowels, /ĩ/ and /ĩũ̯/, and both syllabic nasals, /m̩/ and /ŋ̩/, are also included. All onset consonants except for /pʰ/ are included. Finally, all codas are included. The full text of the passage is shown in (11). For the convenience of readers, both the standard character system and the romanization system are provided, along with a fairly literal English translation. Researchers are encouraged to use whichever writing system best fits their own situations.
(11) 有一个囡仔叫做阿如。伊足溫柔,毋過字寫甲足䆀,所以伊的同學攏共伊笑。頂禮拜二,天氣足熱,阿如想欲入去教室,毋過伊同學共伊欺負,無愛予阿如入去。阿如心內足艱苦,所以轉去共媽媽講。媽媽就共伊講:「阿如,你著愛忍耐!」
Ū tsi̍t ê gín-á kiò-tsò A-jû. I tsiok un-jiû, m̄-koh jī siá kah tsiok bái, sóo-í i ê tông-o̍h lóng kā i tshiò. Tíng lé-pài-jī, thinn-khì tsiok jua̍h, A-jû siūnn-beh ji̍p-khì kàu-sik, m̄-koh i tông-o̍h kā i khi-hū, bô ài hōo A-jû--ji̍p-khì. A-jû sim-lāi tsiok kan-khóo, sóo-í tńg-khì kā ma-ma kóng. Ma-ma tō kā i kóng, “A-jû, lí tio̍h ài jím-nāi!”
There was a child named Aju. She was very sweet, but had poor handwriting, so her classmates all laughed at her. Last Tuesday, it was very hot, and Aju wanted to enter the classroom, but her classmates bullied her by not letting her in. Aju was in anguish, so she went home and told her mom. Mom then told her, “Aju, you just have to put up with it!”
Recordings and Praat TextGrid files from two males and two females are made available for public use (M1.wav, M1.TextGrid, M2.wav, M2.TextGrid, F1.wav, F1.TextGrid, F2.wav, F2.TextGrid). All speakers were young Mandarin-Taiwanese early bilinguals who had acquired Min from birth. To gauge their Min proficiency, two sets of self-ratings on a Likert scale of 1 to 7 were used, one for Mandarin and Min proficiency, and the other for their frequency of use. Ratings for Mandarin were included as a reference since it is the official language in Taiwan, and all speakers of the younger generation are expected to develop full proficiency in the language and use it on a daily basis. Table 23 shows the biographical details of the speakers.
Speaker | Age | Gender | Hometown | AoA | Proficiency (Min:Mandarin) | Usage (Min:Mandarin) |
---|---|---|---|---|---|---|
M1 | 21 | male | Taipei | birth | 6:7 | 4:7 |
M2 | 22 | male | Changhua | birth | 6:7 | 6:7 |
F1 | 23 | female | Tainan | birth | 6:7 | 6:7 |
F2 | 21 | female | Taichung | birth | 6:7 | 6:7 |
Recordings were done in a sound-treated room using a KORG MR-1000 digital recorder and a Sennheiser HMD 25-1 head-mounted microphone at a sampling rate of 44,100 Hz, and were later downsampled to 22,050 Hz using Adobe Audition CS6. The experimenter was a Mandarin-Taiwanese bilingual native speaker. Before the recording, she checked with each subject to make sure they could produce the paragraph fluently and correctly. Subjects were asked to read in a natural fashion. The recording session lasted less than fifteen minutes.
Figure 9 is an illustration of the labeling of The Story of Aju in Praat. Syllables were labeled based on their underlying forms. For example, even though tsi̍t ê ‘one-CL’ is resyllabified as [tɕil.le] in the actual rendition, the labeling still follows the syllable boundaries stipulated by phonology, which is /tɕit.e/.
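For readers who wish to process the syllable labels outside Praat, the sketch below reads labeled intervals from a TextGrid saved in Praat's long text format. It is a minimal illustration under stated assumptions: the tier layout and encoding of the released files are not described in the text and should be checked, the sample string is made up for the demonstration, and the regular expression does not handle labels that contain quotation marks. The names `INTERVAL_RE` and `read_intervals` are hypothetical.

```python
import re

# Matches one interval entry (xmin, xmax, text) in a long-format TextGrid.
INTERVAL_RE = re.compile(
    r'xmin = ([0-9.]+)\s*\n\s*xmax = ([0-9.]+)\s*\n\s*text = "(.*)"'
)

def read_intervals(textgrid_text):
    """Return (start, end, label) for every non-empty interval."""
    return [
        (float(start), float(end), label)
        for start, end, label in INTERVAL_RE.findall(textgrid_text)
        if label.strip()
    ]

# Made-up two-syllable sample, not taken from the released files.
SAMPLE = '''File type = "ooTextFile"
Object class = "TextGrid"

xmin = 0
xmax = 0.5
tiers? <exists>
size = 1
item []:
    item [1]:
        class = "IntervalTier"
        name = "syllable"
        xmin = 0
        xmax = 0.5
        intervals: size = 2
        intervals [1]:
            xmin = 0
            xmax = 0.2
            text = "tsit"
        intervals [2]:
            xmin = 0.2
            xmax = 0.5
            text = "e"
'''

for start, end, label in read_intervals(SAMPLE):
    print(label, round(end - start, 3))
```

Because labels follow the underlying syllabification, durations computed this way reflect phonological rather than surface syllable boundaries.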
4.2 The Spontaneous Speech Corpus
The Spontaneous Speech Corpus was constructed to facilitate research on the spontaneous speech of Mandarin-Taiwanese bilinguals in Taiwan, who constitute the largest ethnic group in the country (Huang, Reference Huang1993). Speech was elicited in an interview format by a fluent Mandarin-Taiwanese bilingual experimenter in a quiet room using a SONY TCD-D8 recorder and a Sennheiser HMD 25-1 head-mounted microphone at a sampling rate of 44,100 Hz, and was later downsampled to 22,050 Hz using Adobe Audition CS6. Each speaker contributed roughly thirty minutes of Mandarin and thirty minutes of Taiwanese speech data.
The experimenter usually started off with some neutral preset questions, but was given the liberty to branch into topics that seemed of interest to the speakers along the way. Common topics included hometown, food, travel, movies, and careers. Short excerpts of the Taiwanese recordings and their corresponding Praat TextGrid files from two males and two females are made available for public use (M3.wav, M3.TextGrid, M4.wav, M4.TextGrid, F3.wav, F3.TextGrid, F4.wav, F4.TextGrid). To control for dialectal variations, all speakers were young adults from Taichung, a central metropolitan area in Taiwan. However, it is difficult to pinpoint exactly which dialect they spoke based on their hometowns due to the various degrees of dialect mixing shown in the data. Table 24 shows the biographical details of the speakers. In the excerpts, they were all talking about their hometowns. A full transcription of the excerpts is provided in the Appendix.
Speaker | Age | Gender | Hometown | Recording (mm:ss) |
---|---|---|---|---|
M3 | 33 | male | Qingshui, Taichung | 03:16 |
M4 | late 20s | male | Longjing, Taichung | 03:04 |
F3 | mid 20s | female | Dongshi, Taichung | 03:09 |
F4 | 24 | female | Taiping, Taichung | 03:01 |
5 Acoustic Analysis of the Pronunciation of Taiwanese
In this section, we use the two data sets described in Section 4 to examine some of the phonetic realizations of the Taiwanese phonological system. We first examine the allophonic variations of onset plosives, the voiced sibilant /dz/, vowels, and coda plosives. We then look at realizations of tones and syllable fusion. Finally, we show how the prosodic structure can be manifested through tone. Please note that the phenomena discussed are intended to provide a glimpse of how some major elements of Taiwanese phonology are phonetically realized, and are not meant to comprise an exhaustive list. Although some of the counts in the tables that follow are small owing to the limited size of the data provided in Section 4, the phenomena themselves are not fortuitous, but are in fact rather common in at least some dialects of Taiwanese, as per our observations. Many of the characteristics discussed can be developed into one or several full-blown lines of research, and the purpose of this section is to provide some pointers for interested researchers to delve deeper into Taiwanese phonetics.
5.1 Onset Plosives
As discussed in Section 2, Taiwanese has nine plosives in total (Table 3) (Chang, Reference Chang1989). All except for /ʔ/ can act as syllable onsets. In the following, the phonetic realizations of voiced, voiceless unaspirated, and voiceless aspirated plosives are examined in turn.
Table 25 shows the top realizations of voiced /b/ and /ɡ/. In both cases, there are stark differences between read and spontaneous speech. For /b/, the most common realization in read speech was the default [b], accounting for 54 percent of the data. The prenasalized [mb] was the second most common, accounting for 23 percent. In contrast, the role of [b] was drastically diminished in spontaneous speech, accounting for only 31 percent of the data, while voiced fricatives [v] and [β] became more common candidates, and accounted for 24 percent and 22 percent, respectively.
| | /b/ | /ɡ/ |
|---|---|---|
| Read | | |
| Spontaneous | | |
| Spontaneous (pron. excluded) | --- | |
Turning to /ɡ/, we found a different picture. First of all, the default [ɡ] was not a common realization for /ɡ/ (Table 25). We found no instance of [ɡ] in read speech, and only three instances in spontaneous speech. Instead, the dominant realization for /ɡ/ in read speech was the prenasalized [ŋɡ], accounting for 75 percent of the data, and total deletion was actually the most prevalent “realization” in spontaneous speech, with a deletion rate as high as 81 percent. The prenasalized [ŋɡ] came in a distant second, accounting for only 8 percent of the data. Careful inspection showed that the predominance of total deletion was largely driven by the two first-person pronouns, guá ‘1st pers. sg.’ and guán ‘1st pers. pl.’, which accounted for about 84 percent of the cases. However, deletion was not limited to these pronouns, as content words like Tâi-gí ‘Taiwanese’ and gín-á ‘child’ were also found to omit their /ɡ/. Even when personal pronouns were disregarded, deletion was still the most preferred option, accounting for 50 percent of the data, and the prenasalized [ŋɡ] was still the second, accounting for 29 percent.
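The effect of excluding the pronouns on the realization rates can be reproduced with a simple tally. The sketch below is illustrative: the function name `realization_rates` is hypothetical, and the token list is toy data chosen for easy arithmetic, not counts from the corpus.

```python
from collections import Counter

def realization_rates(tokens, exclude_words=()):
    """Return {realization: percent} for (word, realization) pairs.

    Words listed in exclude_words are dropped before the percentages
    are computed.
    """
    kept = [real for word, real in tokens if word not in exclude_words]
    if not kept:
        return {}
    counts = Counter(kept)
    return {real: round(100 * n / len(kept), 1) for real, n in counts.items()}

# Toy data only: eight pronoun tokens and two content-word tokens.
toy = (
    [("gua", "deleted")] * 6
    + [("guan", "deleted")] * 2
    + [("tai-gi", "deleted"), ("gin-a", "prenasalized")]
)

print(realization_rates(toy))
print(realization_rates(toy, exclude_words=("gua", "guan")))
```

With the toy data, deletion drops from 90 percent to 50 percent once the two pronouns are removed, which illustrates how a few high-frequency items can dominate a token-based rate.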
The variability of /b/ and /ɡ/ is higher than what was found in previous studies on careful speech (cf. Pan, Reference Pan1995), in which only [mb] and [ŋɡ] were mentioned, not [v], [β], and total deletion. However, since speakers were encouraged to speak naturally in this study, larger variability was expected. Maintaining voicing throughout closure is a physically strenuous task (Ohala, Reference Ohala and MacNeilage1983). Therefore, voiced stops are often either (partially) devoiced to maintain stop closure or become prenasalized or spirantized to maintain voicing.
Based on our data, Taiwanese did not take the route of devoicing, as we found no instance of /b/→[p] and only one instance of /ɡ/→[k]. With three-way voicing contrast in stops, devoicing might potentially jeopardize the between-category discriminability in the language and is thus dispreferred. Instead, when a voiced stop is intended, speakers tend to aim for either the voiced stop itself, its prenasalized variant, or its spirantized version. The latter two are useful for sustaining voicing since they allow an opening in the supralaryngeal cavity, keeping the intraoral pressure low. The adoption of these two alternative pronunciations also implies that voicing is probably a more valued feature than closure for the realization of voiced stops in Taiwanese. Figure 10 shows some spectrographic examples of /b/ realizations.
The difference in realization between /b/ and /ɡ/ is also of interest. In read speech, the default [b] was the most common realization for /b/, while the prenasalized [ŋɡ] was the most common for /ɡ/. In spontaneous speech, the most common realization for /b/ was the spirantized [β, v], while the most common for /ɡ/ was total deletion. In both cases, the realizations of /ɡ/ were more lenited than those of /b/, more so in spontaneous speech than in read speech. This suggests that the closure of the voiced set is probably gradually eroding away, and the velar position is taking the lead. This is likely due to both physiological and lexical factors. Physiologically, active voicing is harder to maintain in /ɡ/ than in /b/ because of a smaller supralaryngeal cavity and a more limited soft surface on which air pressure can impinge (Ohala, Reference Ohala and MacNeilage1983). Lexically, /ɡ/ is also much less productive than /b/ in the language. TJ’s Dictionary of Non-literary Taiwanese (Tiun, Reference Tiun2009) included 1,009 entries for /b/-initial words, but only 465 entries for /ɡ/. Similarly, we found twenty-one unique /b/-words in our spontaneous speech corpus, but only eleven /ɡ/-words. Furthermore, if we disregard the two overrepresented personal pronouns guá ‘1st pers. sg.’ and guán ‘1st pers. pl.’, then there were only twenty-seven tokens containing /ɡ/, as compared to fifty-five tokens containing /b/. All of this shows that in terms of both type and token frequencies, /ɡ/ is consistently the lesser used of the two. Therefore, the loss of /ɡ/ might not create as much confusion as losing /b/.
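The three frequency comparisons above can be put on a common footing by expressing each /ɡ/ count as a proportion of the corresponding /b/ count. The short calculation below uses only the figures reported in the text; the names `COUNTS` and `g_to_b_ratio` are hypothetical.

```python
# (count for /b/, count for /ɡ/) as reported in the text.
COUNTS = {
    "dictionary entries": (1009, 465),
    "unique words in spontaneous speech": (21, 11),
    "tokens in spontaneous speech, pronouns excluded": (55, 27),
}

def g_to_b_ratio(b_count, g_count):
    """Return the /ɡ/ count as a proportion of the /b/ count."""
    return g_count / b_count

for label, (b, g) in COUNTS.items():
    print(f"{label}: {g_to_b_ratio(b, g):.2f}")
```

All three proportions come out at roughly one half (0.46, 0.52, and 0.49), so the type and token measures point in the same direction.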
Turning to the voiceless unaspirated set, we seemed to find much less variability (Table 26). For read speech, the default realization was close to 100 percent for all three stops. Even for spontaneous speech, the realization rates were still 70 percent. /t/ and /k/ also had an additional prenasalized voiced allophone of around 10 percent in spontaneous speech. A close inspection showed that more than 90 percent of these cases were due to carryover assimilation from preceding nasals or vowels (e.g., /t/→[nd] in guán tau ‘my place’). For /p/, there was a secondary realization of total deletion in spontaneous speech. All of them came from a single lexical item, tsha-put-to ‘about’. Of the seven instances found, five showed some kind of syllable contraction. The trisyllabic word was conventionally coalesced into a monosyllabic tshiâu (Figure 11). This shows that total deletion is unlikely to be a regular allophonic realization of /p/, but is rather a result of lexicalized syllable contraction specific to tsha-put-to.
| | /p/ | /t/ | /k/ |
|---|---|---|---|
| Read | | | |
| Spontaneous | | | |
Realizations for the aspirated set were even more stable (Table 27). Both /tʰ/ and /kʰ/ were predominantly realized as their respective default [tʰ] and [kʰ] in read and spontaneous speech. Unfortunately, we only collected one case of /pʰ/ in spontaneous speech, and none in read speech, so it is unclear whether /pʰ/ would also have a consistent realization as [pʰ]. However, since labials did not show more variability than the other two places in the voiced (Table 25) and the unaspirated set (Table 26), we would predict that /pʰ/ would also generally be realized as [pʰ].
| | /pʰ/ | /tʰ/ | /kʰ/ |
|---|---|---|---|
| Read | --- | | |
| Spontaneous | | | |
Figure 12 shows the VOT values of the three sets of onset plosives that were realized as their default in read and spontaneous speech. For voiced stops, since prenasalized realizations were prevalent (Table 25), sometimes even more prevalent than the default realizations, both the default and the prenasalized realizations were included in the VOT calculation. As shown in the figure, the three sets of voiced, voiceless unaspirated, and voiceless aspirated onset stops showed distinct VOT ranges in both read and spontaneous speech. Voiced VOT had an average of −45 ms to −65 ms, voiceless unaspirated VOT had an average of 10 ms to 25 ms, and voiceless aspirated VOT had an average of 60 ms to 75 ms. The results for the voiceless unaspirated set coincided well with most previous studies (Chiung, Reference Chiung2002; Hsieh, Reference Hsieh2007; Lin, Reference Lin and Tse2013; Tseng & Huang, Reference Ang1992), although it was somewhat shorter than those in Huang (Reference Huang2009). For the voiced set, our voice lead was consistent with what was found in Hsieh (Reference Hsieh2007) and Tseng and Huang (Reference Ang1992), but was longer than that in Huang (Reference Huang2009), and shorter than that in Chiung (Reference Chiung2002). Finally, for the voiceless aspirated set, the voice lag in this study was approximately the same as that in Hsieh (Reference Hsieh2007), but was shorter than that in Chiung (Reference Chiung2002), Lin (Reference Lin and Tse2013), and Huang (Reference Huang2009), and longer than that in Tseng and Huang (Reference Ang1992). Since all of these studies employed various methodologies, this suggests that the voiceless unaspirated set is the most stable in terms of its VOT measures, probably because it is the set in between the two extremes. Both the voiced and the voiceless aspirated sets are more susceptible to various factors, such as genre, speech rate, and speaker idiosyncrasies. 
However, it is also clear from previous studies and from Figure 12 that the three-way distinction was largely maintained despite the effect of these performance factors.
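Because the three VOT ranges do not overlap, a token can in principle be assigned to a category from its VOT alone. The sketch below illustrates this with two cut-off values. The boundaries of 0 ms and 40 ms are assumptions chosen to fall between the average ranges reported above; they are not thresholds proposed in this study, and the function name `classify_vot` is hypothetical.

```python
# Hypothetical category boundaries in milliseconds (assumptions, see lead-in).
VOICED_MAX_MS = 0.0        # voice lead gives negative VOT
UNASPIRATED_MAX_MS = 40.0  # between the reported 10-25 ms and 60-75 ms averages

def classify_vot(vot_ms):
    """Assign a stop token to one of the three voicing categories by VOT."""
    if vot_ms < VOICED_MAX_MS:
        return "voiced"
    if vot_ms < UNASPIRATED_MAX_MS:
        return "voiceless unaspirated"
    return "voiceless aspirated"

for vot in (-55.0, 15.0, 70.0):
    print(vot, classify_vot(vot))
```

Real tokens near the boundaries would need more than a fixed threshold, since genre, speech rate, and speaker idiosyncrasies shift the voiced and aspirated ranges.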
5.2 Voiced Sibilant /dz/
The realization of the voiced sibilant /dz/ has been known to be varied (Chuang & Fon, Reference Chuang and Fon2017a, Reference Chuang and Fon2018). This is not surprising, as it is physiologically a difficult sound to produce (Ohala, Reference Ohala and MacNeilage1983). Table 28 shows the common allophonic realizations of /dz/ in read and spontaneous speech. We found four major realizations, [z d l ᵑɡ]. [z] was only observed in read speech. This is expected. Although it is considered one of the canonical pronunciations of /dz/ (Ang, Reference Ang1997, Reference Ang2003; Chen, Reference Chen1995; Lin, Reference Lin1995), it is also physiologically strenuous (Ohala, Reference Ohala and MacNeilage1983), and probably harder to maintain in spontaneous speech. [ᵑɡ] is a dialectal variant, and it only occurs before an unrounded vowel (Ang, Reference Ang2003). In our corpus, only Subjects F1 and F4 sometimes adopted this variant. [l] is a lenited form of /dz/. It was observed in all four environments, but was more common before rounded than unrounded vowels. In unrounded environments in read speech, only one token was found. [d] also appeared in all environments except before rounded vowels in spontaneous speech. It seems this newly adopted variant is very popular among the younger generation (Chuang & Fon, Reference Chuang and Fon2017a, Reference Chuang and Fon2018).
| | Rounded | Unrounded |
|---|---|---|
| Read | | |
| Spontaneous | | |
There are at least two interesting points from Table 28. First, between the two major traits of the voiced sibilant /dz/, voicing seems to be valued more than frication. Of the 67 /dz/ tokens collected in our corpus, voiced allophonic realizations accounted for more than 90 percent, including all four of the major realizations [z d l ᵑɡ]. On the other hand, there were only thirteen tokens of sibilants, including eight [z]’s, two [dz]’s, two [ts]’s, and one [s], accounting for less than 20 percent (see Footnote 8). In other words, as with voiced stops (Table 25), Taiwanese speakers viewed voicing as more important to the realization of /dz/ than maintaining the designated manner of articulation. It is possible that voicing itself creates a perceptual similarity between sounds of different manners, which is not easily achieved by sounds of the same manner with different voicing status (Balise & Diehl, Reference Balise and Diehl1994). Second, apart from the rounded-vowel environment in spontaneous speech, which had only two tokens, [d] was a prominent realization in the remaining three environments. This is especially intriguing, as the modern Taiwanese consonant inventory does not include /d/. However, since [l] is a dominant lenition form for /dz/ (Ang, Reference Ang2003; Wang, Reference Wang2014), young speakers nowadays seem to have filled the gap created by the missing /d/ through the realization of /dz/→[l]→[d], thus making the voiced stop series complete (Chuang & Fon, Reference Chuang and Fon2017a, Reference Chuang and Fon2018). If we consider [d] to be also a realization subtype of [l], then [l] has indeed become the dominant realization of the voiced sibilant /dz/ in Taiwanese, as 61 percent of the tokens were realized either as [l ɭ] or [d ˡd nd] [cf. Ang (Reference Ang2003)]. Acoustically, the two realizations are not very different either (Figure 13). [d] only has a low-frequency voice bar, while [l] has additional higher-frequency formants. The transition is also slightly more abrupt in [d] than in [l] due to the release of the stop closure. However, the two could potentially sound very similar to a native listener’s ear.
5.3 Vowels
As most of our speakers are from central and southern Taiwan, their vowel choices lean, as expected, toward the southern pattern, with occasional northern tokens. This mixture of both varieties is consistent with the findings in Hsu (Reference Hsu2016). Figure 14 shows the normalized values of F1 and F2 of the vowels adopted by the speakers. The vowels were taken from CV and V syllables with a nonnasal onset and an oral vowel. This was to avoid complications in formant smearing caused by nasal formants and anti-formants. Also, since juxtaposition of adjacent vowel targets in diphthongs often results in target undershoot, only monophthongs were included in the calculation. Formant values were first extracted from the midpoint of a vowel by a Praat script, and Lobanov’s (Reference Lobanov1971) normalization was applied using the vowels package (Kendall & Thomas, Reference Kendall and Thomas2018) in R (R Core Team, 2021).
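Lobanov normalization is commonly described as a per-speaker z-score of each formant: a raw value F is replaced by (F − μ) / σ, where μ and σ are the mean and standard deviation of that formant over all of the speaker's vowel tokens. The sketch below illustrates the calculation in Python rather than R. It is not the implementation in the vowels package; the use of the sample standard deviation is an assumption, the function name `lobanov` is hypothetical, and the formant values are made up for easy checking.

```python
from statistics import mean, stdev

def lobanov(tokens):
    """Z-score F1 and F2 within each speaker.

    tokens: list of (speaker, vowel_label, f1_hz, f2_hz).
    Each speaker needs at least two tokens with differing formant values.
    """
    per_speaker = {}
    for speaker, _, f1, f2 in tokens:
        f1s, f2s = per_speaker.setdefault(speaker, ([], []))
        f1s.append(f1)
        f2s.append(f2)

    # Sample standard deviation is an assumption of this sketch.
    stats = {
        spk: (mean(f1s), stdev(f1s), mean(f2s), stdev(f2s))
        for spk, (f1s, f2s) in per_speaker.items()
    }

    normalized = []
    for speaker, vowel, f1, f2 in tokens:
        m1, s1, m2, s2 = stats[speaker]
        normalized.append((speaker, vowel, (f1 - m1) / s1, (f2 - m2) / s2))
    return normalized

# Made-up values for one hypothetical speaker, not measurements from the corpus.
toy = [("S1", "v1", 300, 2200), ("S1", "v2", 500, 1500), ("S1", "v3", 700, 800)]
for row in lobanov(toy):
    print(row)
```

Because each speaker is scaled by their own mean and spread, differences in vocal tract size are largely factored out before speakers are plotted together.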
For read speech, all four speakers adopted the symmetric six-vowel system of /i, e, a, ə, ɔ, u/ (cf. Figure 3), as all etymologically /ə/-tokens except for one were realized as such. The exception was an [ɔ] uttered by M1 in one of the two tokens of tông-o̍h /tɔŋ.əʔ/ ‘classmates’. This shows that /ə/ is the dominant vowel in the vowel system for read speech for these speakers. As shown in Figure 14a, the three corner vowels /i, a, u/ demonstrated very little variation. This is probably due to the fact that they are located at the edges of the vowel space, so little variability is allowed. Interestingly, the non-corner vowel /ɔ/ also showed little variation. We suspect that this is because its neighboring vowel /ə/ is too close by, leaving it little room for variability. In contrast, the two mid vowels of /e/ and /ə/ showed larger inter-speaker variation. This is possibly due to the fact that phonemically there is only one level of mid vowels in Taiwanese and substantial leeway is allowed.
For spontaneous speech, the picture was somewhat different, and there was an additional vowel of [ɔ], labeled as [ɔə] (Figure 14b). This was because all four speakers showed free variation between [ə] and [ɔ] for what should have been /ə/. Figure 15 displays the [ɔ] realization rates for these speakers. It is interesting to see a potential gender effect. Both M3 and M4 had a realization rate of around 60 percent, while F3 had 44 percent and F4 only had 26 percent.
Except for /ə/, all the other vowels in spontaneous speech demonstrated more variability than their read speech counterparts. This is expected, since coarticulation and reduction are more prevalent due to faster speech rate and higher spontaneity. /ə/ showed little variation, probably also because it had little room for variability due to crowdedness in the region.
The fact that different vowel realizations were observed across the two genres is rather interesting. The predominance of /ə/→[ə] in read speech suggests that [ə] is probably considered the canonical and dominant form for these speakers. This is consistent with the claim by Chang (Reference Chang2000) and Tung (Reference Tung2001). On the other hand, the mixture of [ə] and [ɔ] in spontaneous speech implies that [ɔ] has become an allophone of /ə/, and [ə] and [ɔ] are in free variation. This is in line with Hsu’s (Reference Hsu2016) findings. Currently, [ə] seems to be winning, as it was the canonical and dominant form in read speech. The fact that females also preferred [ə] over [ɔ] in both genres implies that [ə] does not have a negative connotation, as female speakers usually tend toward more prestigious speech (Labov, Reference Labov2001). Even though northern speakers generally use /ɔ/ not /ə/ (Hsu, Reference Hsu2016), they also use Taiwanese less often than their central and southern counterparts (National Statistics ROC, 2021), and are thus less likely to exert much influence on the future direction of the language. Therefore, [ə] might still be the dominant and canonical realization of /ə/ for some time to come.
5.4 Coda Obstruents
Taiwanese allows four obstruents /p t k ʔ/ in the coda position (Chang, Reference Chang1989). Previous studies suggested a tendency for the final obstruents to be deleted, and omission is both segment- and tone-dependent. It tends to be more common for the glottal /ʔ/ than for the oral /p t k/, and more common for Tone 8 than Tone 4 for /ʔ/ (Chen, Reference Chen2009b, Reference Chen2010b; Pan & Lyu, Reference Pan and Lyu2021).
Figure 16 shows the realization of the obstruent codas in our data. “Regular” refers to realizations dictated by phonological rules. This includes the default [p t k ʔ] realizations, the intervocalic lenition of [p], [t], and [k] into [β], [l] and [ɣ], respectively, the word-medial deletion of [ʔ], and the realizations of anticipatory assimilation to the following sound. Figure 17 shows some examples for the realizations of /k/. Figure 17a is a default instance of [k] in sann ê a-tsik ‘three uncles, lit. three CL uncle’, Figure 17b is a lenition example of /k/ →[ɣ] in tāi-ha̍k ê sî-tsūn ‘during college, lit. college GEN time’, and Figure 17c is an example of anticipatory assimilation of /k/→[t] in tsha tsiok tsē ‘differ very much’.
The “innovative” category refers to realizations not prescribed by any known phonological rule. For example, the final /k/ of kám-kak ‘to feel’ was realized as [ʔ] instead (Figure 17d). Since there is currently no phonological rule dictating such a change, it is operationally defined as an innovative form by the speaker. There are potentially two accounts underlying this type of realization. First, speakers might be unfamiliar with the exact realization of the final obstruent of the lexical item, and either mistook [kaʔ] as the pronunciation for /kak/ or used [ʔ] as a default replacement for any coda obstruent they were unsure of. Second, speakers might be starting a new process of further simplifying the system, so that the oral gestures of the final /p t k/ are slowly eroding away and only the glottal gesture that often accompanies oral stops remains. This is not unheard of. For example, Pó-chéng-uā, a related Min language spoken on the small outlying island of Wuqiu (see Figure 1), has undergone such erosion, losing all the final oral obstruents and keeping only the final /ʔ/ (Dai, Reference Dai2007). There are also other types of innovative realizations besides realizing oral stops as [ʔ]. For example, an instance of /p/ in --ji̍p-khì ‘to enter’ was found to be realized as [t] instead. Table 29 shows the distribution of the innovative usages in our data. Of the 428 instances of final obstruents collected, there were only 14 such tokens. However, they were not distributed evenly. There seemed to be a stronger tendency for the oral /p t k/ to be realized as [ʔ] than as a different oral stop, especially in spontaneous speech. More importantly, there was no instance of /ʔ/→[p t k] that could not be explained by anticipatory assimilation. Regarding the predominance of /p t k/→[ʔ], it was unclear from the distribution which of the two accounts above is more plausible. More studies are required in order to determine this.
| | /p t k/→[ʔ] | /p t k/→[p t k] | Others |
|---|---|---|---|
| Read | 2 | 1 | --- |
| Spontaneous | 8 | 2 | 1 |
Finally, the “deletion” category in Figure 16 refers to the total deletion of a coda obstruent that is not otherwise regulated by a phonological rule. This pertains to the realizations of the final /p t k/ and also the word-final /ʔ/, but not the word-medial /ʔ/. Although deletions in these positions all indicate a total loss of the final coda gesture, only the former two are not designated by a phonological rule, while the latter one is (see [6]). Therefore, the former two are included in this category, but the latter one is considered as one of the “regular” realizations of /ʔ/ instead. Figure 17e shows an example of the “deletion” category. The final /k/ in kok-gí ‘national language’ was completely deleted, resulting in an open syllable.
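The three-way categorization used in Figure 16 can be summarized as a small decision procedure. The following Python sketch is illustrative only: the function name `classify_coda`, its flags, and the assumption that the annotator supplies the stop expected under anticipatory assimilation are all hypothetical conveniences for this example, not the labeling procedure actually used for the corpus.

```python
# Intervocalic lenition targets for the oral coda stops, as listed in the text.
LENITED = {"p": "β", "t": "l", "k": "ɣ"}


def classify_coda(underlying, realized, *, intervocalic=False,
                  word_medial=False, assimilated_to=None):
    """Return 'regular', 'innovative', or 'deletion' for one coda token.

    underlying     -- one of 'p', 't', 'k', 'ʔ'
    realized       -- the observed segment, or '' if nothing was produced
    intervocalic   -- True if the coda sits between vowels (hypothetical flag)
    word_medial    -- True if the coda is not word-final (hypothetical flag)
    assimilated_to -- stop predicted by anticipatory assimilation, if any
                      (assumed here to be supplied by the annotator)
    """
    if realized == "":
        # Word-medial /ʔ/ deletion is rule-governed; all other deletions are not.
        if underlying == "ʔ" and word_medial:
            return "regular"
        return "deletion"
    if realized == underlying:
        return "regular"  # default [p t k ʔ]
    if intervocalic and LENITED.get(underlying) == realized:
        return "regular"  # /p t k/ -> [β l ɣ]
    if assimilated_to is not None and realized == assimilated_to:
        return "regular"  # anticipatory assimilation
    return "innovative"


# Tokens modeled on the examples discussed in the text.
print(classify_coda("k", "ɣ", intervocalic=True))   # tāi-ha̍k ê
print(classify_coda("k", "t", assimilated_to="t"))  # tsha tsiok tsē
print(classify_coda("k", "ʔ"))                      # kám-kak
print(classify_coda("k", "", word_medial=True))     # kok-gí
```

Checking for deletion first keeps the one rule-governed deletion (word-medial /ʔ/) separate from the unregulated ones, which is the distinction the “deletion” category depends on.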
The distribution in Figure 16 is intriguing, as it shows at least two traits that do not exactly coincide with previous studies (cf. Chen, Reference Chen2009b, Reference Chen2010b; Pan & Lyu, Reference Pan and Lyu2021). First of all, /ʔ/ did not appear to be eroding at the fastest pace in our data. In fact, it seemed fairly comparable to /t/ and /k/ in terms of its regular realizations. We suspect this discrepancy might be at least partly due to the adoption of different stimuli and categorization criteria. Chen (Reference Chen2009b, Reference Chen2010b) studied word-final obstruent codas of monosyllabic and bisyllabic words in isolation (e.g., káu-tsa̍p ‘ninety’), while Pan and Lyu (Reference Pan and Lyu2021) examined word-medial obstruent codas of bisyllabic words in a carrier sentence (e.g., Sing thiann sik-lāi ê siann-tiāu ‘First listen to the tone of indoors.’). To make our data a little more comparable, we first limited our observations to only prepausal /p t k ʔ/ tokens, which would be prosodically more similar to the isolated words in Chen (Reference Chen2009b, Reference Chen2010b). Results are shown in the second column of Table 30, and the /ʔ/ deletion rate was indeed much higher than /k/ deletion (64% vs. 16%), while no deletion was found in this position for /p/ and /t/. Second, we limited our observations to only sentence-medial coda obstruents to be more comparable to the dataset used in Pan and Lyu (Reference Pan and Lyu2021). Since they seemed to have treated all four obstruents equally, and did not take into consideration the word-internal deletion rule of /ʔ/, we also did the same with our data. Results are shown in the third column of Table 30. Again, the /ʔ/ deletion rate was much higher than the /t/ and /k/ deletion rates (57% vs. 30% vs. 22%), and there was no instance of deletion for /p/ in this position.
This shows that if one does not take the phonological deletion rule of /ʔ/ into account, then /ʔ/ deletion is always more prevalent than that of /p t k/. However, this does not necessarily imply that /ʔ/ will disappear. As there was also a tendency for random tokens of /p t k/ to be realized as [ʔ] (Table 29), it is currently difficult to predict which of the final obstruents will disappear first.
| | Prepausal | Sentence-medial |
|---|---|---|
| /p/ | 0/13 | 0/19 |
| /t/ | 0/5 | 30/100 |
| /k/ | 11/67 | 12/55 |
| /ʔ/ | 124/195 | 92/162 |
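The percentages quoted in the text follow directly from the deleted/total counts in Table 30. As a minimal sketch, the Python below recomputes them; the counts are copied from the table, while the names `TABLE_30` and `deletion_rates` are introduced here purely for illustration.

```python
# (deleted, total) token counts copied from Table 30.
TABLE_30 = {
    "prepausal": {"p": (0, 13), "t": (0, 5), "k": (11, 67), "ʔ": (124, 195)},
    "sentence-medial": {"p": (0, 19), "t": (30, 100), "k": (12, 55), "ʔ": (92, 162)},
}


def deletion_rates(counts):
    """Map each coda to its deletion rate in percent, rounded to an integer."""
    return {coda: round(100 * deleted / total)
            for coda, (deleted, total) in counts.items()}


for position, counts in TABLE_30.items():
    print(position, deletion_rates(counts))
```

With these counts, /ʔ/ comes out at 64 percent prepausally and 57 percent sentence-medially, against 16 and 22 percent for /k/, which matches the figures cited in the discussion.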
The largest discrepancy regarding /ʔ/ between the current findings and previous studies actually lies in whether /ʔ/ deletion is tone-dependent. Previous studies suggested that /ʔ/ in Tone 8 is more prone to be deleted than that in Tone 4 (Chen, Reference Chen2009b, Reference Chen2010b; Pan & Lyu, Reference Pan and Lyu2021). However, this was not observed in our data. Figure 16b shows that the deletion rate was in fact higher in Tone 4 than in Tone 8. If phonologically allowed /ʔ/ deletion was also included in the calculation, then the overall deletion rate of /ʔ/ was 66 percent for Tone 4 and 70 percent for Tone 8. We suspect the discrepancy stemmed from different sampling methods. Unlike Chen (Reference Chen2009b, Reference Chen2010b) and Pan and Lyu (Reference Pan and Lyu2021), we did not have comparable sets for the two tones. In fact, our collection consisted of predominantly Tone 4 rather than Tone 8 tokens (N = 210 vs. 24). Therefore, studies with more balanced tokens from both tones would be needed in order to see whether a stable tone-dependency effect exists in /ʔ/ deletion.
The second interesting pattern illustrated in Figure 16 is the resilience of the final /p/. Of the twenty tokens collected, there was only one single instance of innovative realization, in which --ji̍p-khì /dzip.kʰi/ was realized as [dzit.kʰi]. All the remaining realizations were either the orthodox [p] or [β], or [t k] due to anticipatory assimilation. In other words, there was no deletion at all. This implies that the erosion of coda /p/ is, at most, progressing more slowly than that of /t k ʔ/. However, this was not mentioned in previous studies (cf. Chen, Reference Chen2009b, Reference Chen2010b; Pan & Lyu, Reference Pan and Lyu2021). We suspect that this might have something to do with the higher visibility of the oral gesture, as /p/ is the most visible of the four final obstruent codas. Since we had relatively few tokens of coda /p/ in our data, more research would be required in order to confirm the stability of /p/.
Previous studies showed that the realizations of final /p t k ʔ/ were substantially influenced by syllable position, dialect, and gender (Chang, Reference Chang1989; Chen, Reference Chen2009b, Reference Chen2010b; Pan & Lyu, Reference Pan and Lyu2021). The discrepancies between the current data and the previous data suggest that speech genre probably also plays a role. It is thus safe to say that the final obstruents in Taiwanese are under a long-term process of erosion. However, the progression of individual obstruents is largely influenced by both linguistic and nonlinguistic factors. More studies at various future time points are required in order for us to have a clearer picture of this phenomenon.
5.5 Tone
To observe how base tones are realized in Taiwanese, syllables at utterance-final positions were examined. F0 measurements were taken from the voiced portions of these syllables (Howie, Reference Howie1974), and Figure 18 shows the average realizations of the five smooth tones. There are three interesting observations.
First, although Taiwanese has two sets of tones that differ phonemically only in pitch register, Tone 1 and Tone 7, and Tone 2 and Tone 3, their acoustic realizations were not exactly comparable. For the pair of level tones, Tone 1 and Tone 7, both the F0 height and the F0 contour were somewhat different. Tone 1 was indeed higher than Tone 7 as prescribed, but only Tone 1 was relatively level. Tone 7 had a slightly dipping contour, and was closer to Tone 5. On the other hand, the pair of falling tones, Tone 2 and Tone 3, was more faithful to its phonemic prescriptions. Both were falling in contour and Tone 2 was consistently higher than Tone 3 regardless of genre and gender.
Second, Tone 5 is acoustically found to be a dipping tone, even though it is phonologically deemed as rising. Across languages in the area, it is not uncommon to find rising tones to be realized with a slight initial fall. Both Taiwan and Mainland Mandarin show such a tendency (Fon & Chiang, Reference Fon and Chiang1999; Fon, Chiang & Cheung, Reference Fon, Chiang and Cheung2004; Shi & Wang, Reference Shi and Wang2006). The initial fall is considered a byproduct of the physiological effort required for achieving a rise, and is regarded as phonologically insignificant (Chao, Reference Chao1956, Reference Chao1968; Shih, Reference Shih1988). However, perceptual experiments have shown that listeners do actively use the initial portion to help tonal detection when necessary (Fon et al., Reference Fon, Chiang and Cheung2004). What strikes us as intriguing is the high resemblance in the realizations of the rising tones between Taiwanese and Taiwan Mandarin. Both tend to have their turning points situated around the center of the tone, and both have a shallow rise following a shallow fall. For both languages, the rising tone could at most be considered as a mid-rising tone. On the other hand, although Shi and Wang (Reference Shi and Wang2006) also showed a slight dipping contour for Mainland Mandarin, the turning point is fairly early, at around 20 percent of the tone. The rise is also a prominent one, and usually ends at the high end of the tonal range, making it a high-rising tone. Figure 19 shows a comparison of three bisyllabic words of rising tones in Taiwanese, Taiwan Mandarin, and Mainland Mandarin. It is clear from the figure that there is high resemblance in the realization of the rising tones between Taiwanese (Figure 19a) and Taiwan Mandarin (Figure 19b), while the rising tones in Mainland Mandarin are more dissimilar (Figure 19c). 
In Mainland Mandarin, the pitch range of the rise is larger, and the initial falling portion is shorter (Fon & Chiang, Reference Fon and Chiang1999; Fon et al., Reference Fon, Chiang and Cheung2004; Shi & Wang, Reference Shi and Wang2006). This suggests that the resemblance between Taiwanese and Taiwan Mandarin is probably due to language contact, and the direction is more likely from Taiwanese to Mandarin than vice versa.
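The contrast in turning-point location and rise size can be made concrete by locating the F0 minimum as a fraction of the tone's duration and measuring the rise that follows it. The Python sketch below assumes equally spaced F0 samples over the voiced portion of a syllable. The two contours are invented values chosen only to mimic the two shapes described (a shallow dip turning near the center versus an early turn followed by a large rise); they are not measurements from this study or from the cited work, and the function names are hypothetical.

```python
def turning_point_fraction(f0):
    """Location of the F0 minimum as a fraction (0 to 1) of the contour,
    assuming equally spaced samples over the voiced portion."""
    if len(f0) < 2:
        raise ValueError("need at least two F0 samples")
    lowest = min(range(len(f0)), key=lambda i: f0[i])
    return lowest / (len(f0) - 1)


def rise_excursion(f0):
    """F0 change from the minimum to the final sample (same unit as f0)."""
    return f0[-1] - min(f0)


# Hypothetical F0 samples in Hz at 0%, 10%, ..., 100% of the tone.
shallow_dip = [200, 196, 192, 189, 187, 186, 187, 189, 192, 196, 200]
early_dip = [200, 195, 192, 196, 204, 214, 225, 236, 246, 254, 260]

for name, contour in [("shallow dip", shallow_dip), ("early dip", early_dip)]:
    print(name, turning_point_fraction(contour), rise_excursion(contour))
```

Real contours would first need time normalization and, for cross-speaker comparison, some form of pitch normalization, neither of which is attempted here.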
Finally, there were also some patterns in duration worth noting. In read speech, the rising Tone 5 was longer than the two falling tones, Tone 2 and Tone 3, though it was only longer than Tone 2 and not longer than Tone 3 in spontaneous speech (see Figure 18). This is in line with previous studies (Gandour, Reference Gandour1977; Yu, Reference Yu, Fougeron, Kühnert, D’Imperio and Vallée2010). Moreover, the high-falling Tone 2 was always longer than the mid-falling Tone 3, probably because the mid-falling tone spanned a smaller F0 range and thus required less time to achieve the tonal targets. On the other hand, the duration difference between the high-level Tone 1 and the mid-level Tone 7 was quite intriguing, as it clearly violated observations from previous studies that vowel duration is inversely correlated with F0 (Gandour, Reference Gandour1977). We suspect this might have something to do with the acoustic realizations of the two aforementioned tones. As dynamic tones tend to be perceptually longer than static ones, it is possible that the mid-level Tone 7 is perceptually licensed to be short. More studies are needed in order to observe the interplay between acoustic tonal realizations and their perceptual duration.
As for the checked tones, the realizations were much messier, especially for Tone 8. As shown in Table 31, we collected more tokens of utterance-final Tone 4 than Tone 8. Tone 4 tokens were predominantly realized as the canonical mid tone (Figure 20a). Tokens of Tone 4 being realized as either Tone 1 or Tone 8 were all contributed by the utterance-initial interjection ah ‘well’, which was always realized with a high-level contour. Therefore, if the final obstruent remained, then it became a Tone 8, and if it was deleted, then it became a Tone 1 (Figure 20b).
[Table 31: realizations of utterance-final Tone 4 (T4) and Tone 8 (T8) tokens in read and spontaneous speech; cell contents missing]
The realization of Tone 8 was even more varied. Of the ten tokens collected, only one was realized as the canonical high-falling tone (Figure 21a) (Chang, Reference Chang1989). The majority were realized as a mid Tone 4 instead (Figure 21b). In other words, the base Tone 8 was to a large extent merged with the base Tone 4. When the final obstruent was deleted, the tone became a mid-level Tone 7 (Figure 21c). This suggests that Taiwanese is probably undergoing a simplifying process in terms of its checked base tones, and is gradually moving from a seven-tone system to a six-tone one. This is consistent with Chen’s (Reference Chen2009b, Reference Chen2010b) observations and predictions.
5.6 Syllable Fusion
Syllable fusion is a common phenomenon in Taiwanese. Table 32 shows the words that were fused multiple times in the corpus. Almost all fused instances were originally bisyllabic; the only exception was tsha-put-to ‘about’, which was originally trisyllabic. At the segmental level, most fusion followed the edge-in rule and the L-R-scanning rule (Chung, Reference Chung1996) (see Section 2.4.4). According to these two rules, the fused forms khng [kʰŋ̩] and iunn [ĩũ] for khó-lîng ‘possibly’ and in-uī ‘because’, respectively, would be considered exceptions. The former chose [ŋ] over [ə] for the vowel nucleus, violating the L-R-scanning rule and the vocoid association rule, while the latter chose [u] over [i] from the second syllable, violating the edge-in rule.
| Phrase/word | Fused form | Gloss | N |
|---|---|---|---|
| (a) Read | | | |
| bô ài | | ‘do not want’ | 3 |
| (b) Spontaneous | | | |
| khó-lîng (HH+MM) | khng [kʰŋ̩] | ‘possibly’ | 5 |
| tsha-put-to | | ‘about’ | 4 |
| sóo-í | | ‘so’ | 4 |
| kám-kak | | ‘to feel’ | 3 |
| in-uī | iunn [ĩũ] | ‘because’ | 2 |
| --khí-lâi (LL+LL) | | ‘start to’ | 2 |
| | | ‘to people’ | 2 |
| | | ‘at’ | 2 |
At the tonal level, the edge-in principle was even more likely to be violated. From Table 32, it is clear that the first tonal target of the first syllable seemed to have priority and was always chosen to be in the fused form. However, the target chosen seemed somewhat less predictable for the second syllable. In some cases, the first target was chosen, as in bô ài ‘do not want’ and in-uī ‘because’; in other cases, the second target was chosen, as in sóo-í ‘so’ (Figure 22). The fused form of the trisyllabic word tsha-put-to ‘about’ was an even more blatant violation of the rule, as it was the tonal target from the second syllable that was chosen, not the third. More importantly, the fused form was found in three different speakers (M4, F3, and F4), so it could not be easily explained as idiosyncratic pronunciation.
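Judging from the violations just described, one way to read the tonal edge-in principle is that the fused syllable keeps the first tonal target of the first syllable and the last tonal target of the last syllable. The Python sketch below encodes that reading only. The function names and the encoding of tonal targets as strings (following the HH+MM notation of Table 32) are assumptions made for illustration, and the outputs are what the principle would predict under this reading, not attested fused tones.

```python
def edge_in_tone(syllable_tones):
    """Predicted fused tone under the edge-in reading assumed here:
    first target of the first syllable + last target of the last syllable.

    syllable_tones -- list of strings such as ["HH", "MM"], one per syllable
    """
    if not syllable_tones or any(len(tone) == 0 for tone in syllable_tones):
        raise ValueError("each syllable needs at least one tonal target")
    return syllable_tones[0][0] + syllable_tones[-1][-1]


def violates_edge_in(syllable_tones, fused_tone):
    """True if an observed fused tone differs from the edge-in prediction."""
    return fused_tone != edge_in_tone(syllable_tones)


print(edge_in_tone(["HH", "MM"]))  # khó-lîng, tones as given in Table 32
print(edge_in_tone(["LL", "LL"]))  # --khí-lâi, tones as given in Table 32
```

Note that when a syllable carries two identical targets, as with level tones, this check cannot tell which of the two targets survived, so violations are only detectable for contour tones or for words of three or more syllables.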
5.7 Prosodic Hierarchy
As a tone language, Taiwanese probably imposes a greater restriction on how prosodic tunes can vary, much like Mandarin (Peng et al., Reference Peng, Chan, Tseng, Huang, Lee, Beckman and Jun2005) and Cantonese (Wong et al., Reference Wong, Chan, Beckman and Jun2005). However, this does not imply the total absence of prosody in Taiwanese; rather, an intricate interaction between tone and prosodic tunes is at work. We followed Peng and Beckman’s (Reference Peng and Beckman2003) system and labeled Tone 2 and Tone 5 at both IP- and TSG-final positions (Figure 23). As both tones span a substantial tonal range, one could thus observe how pitch range in prosody interacts with tonal register and tonal range. One could also examine whether tonal contours are preserved under the effect of declination. As shown in the figure, the overall tonal contour was not different at the two prosodic levels for the high-falling Tone 2. Both were realized as falling. However, the degree of the fall was much larger before IP than TSG boundaries, and the difference was larger in read than spontaneous speech. In addition, TSG-final Tone 2 started at a much higher register than IP-final Tone 2 in read speech, while not much difference was found in spontaneous speech. This suggests that final lowering was indeed more prominent in the IP-final position for Tone 2, but the effect was stronger in read speech, exerting an influence on both the initial tonal register and the final fall. In spontaneous speech, the effect was more obvious in the final fall only. There was also a substantial difference in duration between the two prosodic levels. Regardless of speech genre, IP-final Tone 2 was consistently longer than TSG-final Tone 2, indicating a stronger final lengthening effect at higher prosodic boundaries. This was generally in line with previous studies (Kuo, Reference Kuo2011, Reference Kuo2012; Pan & Tai, Reference Pan and Tai2006; Peng, Reference Peng1997).
Although previous studies only looked at the falling Tone 2, we also included the other contour tone in Taiwanese, the rising Tone 5. Because the second half of the tonal contour goes against the direction of declination, it would be interesting to see how they interact. As shown in Figure 23, the typical dipping contour of Tone 5 was only well preserved in IP-final position. In TSG-final position, only female spontaneous speech showed such a contour. For male spontaneous speech and female read speech, only the initial fall was observed, but not the final rise. For male read speech, even the initial fall was not realized, and the contour became flat. In other words, within the same genre, females were generally more conservative for contour preservation than males. For Tone 5 tokens realized with the initial falling and/or the final rising portion, the IP-final position showed a larger excursion, but the difference was not as large as that in Tone 2, probably because of the intrinsic physiological constraint of the tone (Shi & Wang, Reference Shi and Wang2006), and conflict between the tonal contour and the declination trend. There was not much difference in the initial tonal register between IP- and TSG-final positions either. Except for female spontaneous speech (Figure 23d), which showed a large difference, Tone 5 tokens at these two prosodic levels began at about the same tonal register. This shows that final lowering did not exert a differential effect on the realization of Tone 5 at the two prosodic levels.
Comparing Tone 2 and Tone 5 across different prosodic positions and speech genres suggested that the effect that prosodic tunes and pitch range impose on tones at prosodic boundaries is likely dependent on a variety of factors. In general, tonal contours are preserved better at large prosodic boundaries than smaller ones, in falling tones than rising ones, in spontaneous speech than read speech, and among females than males. This implies that linguistic, sociolinguistic, and physiological factors might all be involved in Taiwanese prosody. More studies would be needed in order to understand the strength of these factors and how they interact.
6 Future Research
Looking across research on Taiwanese, one finds at least two areas that are especially lacking. The first is prosody. Although Peng and Beckman (Reference Peng and Beckman2003) have sketched out a rough prosodic structure for Taiwanese, very little has been done to test how the model could be refined or modified. The majority of research on Taiwanese still focuses on segmental and tonal variations across dialects only. This is unfortunate, as Taiwanese provides a perfect testing ground for examining interactions among tone, stress, rhythm, and prosody. Compared to Mandarin, it has a richer tone inventory, a more robust stress system, and a more varied syllable structure, in addition to a tone sandhi rule set that is far more intricate and sensitive to suprasegmental elements. It would thus be theoretically interesting to see how prosody could be instantiated in such a complex system.
The other area that is often lacking is a cognitive perspective on how learners and speakers process such a system. Because of the switch from a Taiwanese-dominant to a Mandarin-dominant society due to the Mandarin-only policy (Huang, Reference Huang1993; Lin, Reference Lin2001), research on the age factor is often framed from the standpoint of language decay and language loss. Younger speakers are regularly regarded as losing ground in Taiwanese due to insufficient proficiency through processes such as simplification, lenition, and deletion. While the decline of Taiwanese usage and proficiency in the younger generations is real, such a perspective will not be of much help in revitalizing the language itself. More studies on how a weaker language is acquired and processed would be extremely helpful for the younger speakers today to develop a full Taiwanese system and for the language to regain its vitality in the society as a whole. The future awaits, but its brightness depends largely on what we do today.
Appendix
What follows are the transcripts of the Taiwanese recording excerpts of M3, M4, F3, and F4 from the Mandarin-Taiwanese Spontaneous Speech Corpus (Fon, Reference Fon2004), given in both the standard character system and the romanization system. English translation is also provided to facilitate understanding. Both of the female speakers code-switched to Mandarin several times in their excerpts. As Mandarin is the dominant and official language in Taiwan, this is a fairly common phenomenon. Mandarin utterances are indicated by underlining, and their romanization follows the Hanyu pinyin system. To protect the speakers’ privacy, their names were edited out.
[Transcript tables for speakers M3, M4, F3, and F4, arranged as question (Q) and answer (A) turns; transcript text missing]
About the Series
The Cambridge Elements in Phonetics series will generate a range of high-quality scholarly works, offering researchers and students authoritative accounts of current knowledge and research in the various fields of phonetics. In addition, the series will provide detailed descriptions of research into the pronunciation of a range of languages and language varieties. There will be elements describing the phonetics of the major languages of the world, such as French, German, Chinese and Malay as well as the pronunciation of endangered languages, thus providing a valuable resource for documenting and preserving them.