Shanghai Chinese (Shanghainese; 上海话) is a Wu dialect (ISO 639-3 code: wuu) spoken in the city of Shanghai (CN-31), one of the four municipalities in the People's Republic of China. Over the last century, the dialect has been heavily influenced by neighbouring dialects spoken in the provinces of Jiangsu and Zhejiang, such as Jianghuai Mandarin (江淮官话), the Suzhou Wu dialect (吴语苏州话), and the Ningbo Wu dialect (吴语宁波话), in addition to two other, more distant dialects, Cantonese (广东话) and Northern Mandarin (北方官话). Most native speakers of Shanghai Chinese are in fact descendants of immigrants from Jiangsu and Zhejiang provinces who moved to Shanghai in the late nineteenth and early twentieth centuries. More recently, the position of Shanghai Chinese has been eroded with the influx of immigrants from other parts of the country and the widespread adoption of Standard Chinese. Today, virtually all speakers also speak (Shanghainese-accented) Standard Chinese. There has been considerable research on the synchronic and diachronic changes of the dialect. Representative works include Xu & Tang (1962) on socially stratified variation in Shanghainese; Qian (2003) and Hu (2003), which provide detailed accounts of the evolution of Shanghainese over the last century; Liu (2004), which focuses on the sound structure of Shanghainese and attempts to trace its historical development; Shi & Jiang (1987), which reports significant individual variation among speakers of broadly similar age (born between 1928 and 1948); Xu, Tang & Tang (1982) on synchronic variation in Shanghai Chinese; and Xu, Tang & Tang (1988), which investigates variation among three generations of ten families (grandparents: born between 1807 and 1915; parents: 1930–1948; and children: 1967–1968).
Shanghai Chinese is generally understood to be the modern dialect spoken in the urban districts that were recognized as the city centre of Shanghai before the incorporation of various surrounding suburbs and towns into the municipality since the 1980s. The dialect is therefore also known as 上海闲话, the ‘Shanghai urban variety’, as opposed to 本地闲话, the ‘Shanghai suburban variety’ (see Xu & Tao 1997 for further details and You 2010 for a detailed survey of the sound system changes in suburban varieties). The commonly recognized first attempt at a systematic description of Shanghai Chinese is Edkins (1853). Since the beginning of the 20th century, various aspects of the sound structure of Shanghai Chinese have been investigated, with some providing an overview (e.g. Chao 1928, Sherard 1972, Xu & Tang 1988, Xu & Tao 1997), some addressing segmental properties (e.g. Ren 1992, Shen, Wooters & Wang 1987, Ping 2005, and Z. Chen 2010 on consonant production; Svantesson 1989 and Chen 2008a on vowels), and most focusing upon lexical tone and tone sandhi (e.g. Zee & Maddieson 1980; Shen 1981a, b, 1982, 1985; Xu, Tang & Qian 1981, 1982, 1983; Jin 1986; Rose 1993; Zhu 1999; Z. Chen 2007; Chen 2008b). Shanghai Chinese has also been taken as an important case language in the development of phonetic and phonological theories of tone realization (e.g. Selkirk & Shen 1990; Duanmu 1995, 1997, 1999; M. Chen 2000; Yip 2002; Chen 2011) and syllable structure (e.g. Duanmu 1994, 2008).
The present description is accompanied by recordings of a female native speaker who was born in the 1950s and grew up in the Huangpu District. According to Xu & Tao (1997), she belongs to the Middle Generation Shanghainese group of speakers (市区中派方言), those born roughly around 1940–1965. She has lived mostly in the Netherlands since 1989, but visits Shanghai regularly. She speaks mainly Shanghai Chinese at home. Our motivation for the present account is not only to bring together existing descriptions of Shanghainese in an accessible form, but also to propose a number of analytical innovations relative to traditional treatments of the Shanghainese data. These are: (i) the analysis of prevocalic glottal stop [Ɂ] and voiced [ɦ] as allophonic features of onsetless syllables conditioned by tone; (ii) the rejection of [i] or [j] in diphthongs/triphthongs following the alveolo-palatal obstruents; (iii) the rejection of a fricative vowel symbol in favour of a syllabic fricative; (iv) the absence of a contrastive palatal nasal; and (v) the interpretation of the front rounded glide as allophonic.
Lexical tone
Shanghai Chinese has evolved from the eight-tone system recorded in Edkins (1853) to the current five-tone system (Qian 2003). Figure 1 illustrates the f0 contours of the five tones (T1–T5) uttered in isolation (i.e. /tɔ1/ ‘knife’, /tɔ2/ ‘island’, /dɔ3/ ‘peach’, /tʊʔ4/ ‘to supervise/check’, and /dʊʔ5/ ‘to read’) by our female speaker.
Researchers vary greatly in the conventions of numerical values used to describe the pitch contours of the five tones. For example, Xu & Tang (1988), a classic description of Shanghai Chinese, adopts the five-scale pitch system developed by Chao (1930), which divides a speaker's pitch range into five levels with 5 indicating the highest end and 1 the lowest. T1–T5 are transcribed as 53, 34, 23, 55, and 12, respectively. Strictly speaking, this system does not accurately reflect the f0 contours plotted in Figure 1 (even with non-linear transformation of the f0 values). This discrepancy is an indication of the considerable variation in pronunciation that exists both within the same generation of Shanghainese speakers and across generations, as is also evident in the various transcriptions offered by other researchers (e.g. Zee & Maddieson 1980 [T1: 51, T2: 34, T3: 14, T4: 5, T5: 14]; Shen 1981b, 1985 [T1: 52, T2: 334/34, T3: 113/13, T4: 4/5, T5: 23]; Zhu 1999 [T1: 41i, T2: 23i, T3: 14a, T4: 33i, T5: 24a] with i indicating high pitch register and a indicating low pitch register).
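The relation between measured f0 and five-level numerals can be made concrete with a small sketch. It is an illustration only and not the procedure behind Figure 1 or any of the cited transcriptions: it assumes that the speaker's pitch range is divided into five equal bands on a logarithmic scale, and the function names and the Hz values in the usage note are hypothetical.

```python
import math


def chao_level(f0_hz, f0_min_hz, f0_max_hz):
    """Map an f0 value (Hz) to a level from 1 (lowest) to 5 (highest).

    Assumption: the range [f0_min_hz, f0_max_hz] is split into five
    equal bands in the log domain.
    """
    if not (0 < f0_min_hz < f0_max_hz):
        raise ValueError("need 0 < f0_min_hz < f0_max_hz")
    # Clamp values outside the assumed range to its edges.
    f0 = min(max(f0_hz, f0_min_hz), f0_max_hz)
    position = (math.log(f0) - math.log(f0_min_hz)) / (
        math.log(f0_max_hz) - math.log(f0_min_hz)
    )  # 0.0 at the bottom of the range, 1.0 at the top
    return min(5, int(position * 5) + 1)


def chao_digits(contour_hz, f0_min_hz, f0_max_hz):
    """Transcribe a contour by the levels of its first and last samples."""
    start = chao_level(contour_hz[0], f0_min_hz, f0_max_hz)
    end = chao_level(contour_hz[-1], f0_min_hz, f0_max_hz)
    return f"{start}{end}"
```

With a hypothetical range of 100–400 Hz, `chao_digits([380, 300, 210], 100, 400)` yields a falling two-digit transcription. Different choices of range and scale shift the band edges, which is one reason why numerals from different sources are not directly comparable.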
At a more abstract level, these transcriptions converge on the basic f0 patterns of the lexical tones. T1 is a Falling tone, while T2 starts at a high register with a rise towards the end and can be termed high Rising. T4 shows a high and slightly falling f0. It has a short duration and sounds like a high-level pitch. Hereafter, we will refer to it as a short High tone. These three tones all start within the relatively higher f0 range, traditionally known as the Yin (阴) register tones. T3 is a low Rising tone and T5 is a low Rising tone with the same relatively short duration as T4. T3 and T5 start within a low f0 range, traditionally known as the Yang (阳) register tones. Note that both tones end with a quite high pitch level. In other words, tonal register in Shanghainese is evident in the beginning part of the tonal contours, indicating that it is in part governed by the laryngeal specification of the onset consonant. T4 and T5 only occur in syllables closed by a glottal stop and are known as Rusheng (入声), as opposed to the other three tones, which are known as Shusheng (舒声). Because of this set of complex tone–segment/rime co-occurrence patterns, tonal contrasts in Shanghai Chinese have been argued to be syllable-level contrasts (e.g. Zee & Maddieson 1980, Zhu 1999). In multisyllabic constituents, lexical tones contrast only in the initial syllable of a tone unit and thus neutralize in non-initial positions. (See the sections on syllable structure and tone contrasts below for more details.)
Consonants
We have identified 28 consonants. Corresponding key words/bound morphemes are provided below the consonant chart. A prominent feature of Shanghai Chinese is the three-way laryngeal contrast in obstruents, known as quanqing 全清 ‘all clear’, ciqing 次清 ‘secondary clear’, and quanzhuo 全浊 ‘all muddy’. In modern phonetic terms, they are often labeled as voiceless unaspirated, voiceless aspirated, and voiced, respectively (Chao 1967), and we adopt these labels in this description. These labels should not mask the important observation that these obstruents vary in their phonatory state from modal or stiff for the ‘clear’ types to slack for the ‘muddy’ types (see Ladefoged & Maddieson 1996: 63) and may also be classified as tense, aspirated, and lax, respectively.
In the initial position of a prosodic tone unit, there is little VOT difference between the voiceless unaspirated and voiced categories, both of which differ in VOT from voiceless aspirated obstruents. In the non-initial position of a tone unit, voiced obstruents are fully voiced, giving rise to a three-way laryngeal distinction in VOT. While VOT thus serves as a cue in these positions, there are other acoustic and articulatory correlates for the three-way contrast in both tone unit initial and medial positions (e.g. Shen et al. 1987 on closure duration; Ren 1992 on transillumination/photoglottography data in the stops’ laryngeal adjustments; Cao & Maddieson 1992 on phonation cues such as H1−H2 around the onset of the following vowel).
Among the alveolar consonants of Shanghai Chinese, /tʰ t d/ tend to be denti-alveolar and /s z tsʰ ts/ apical alveolar, with /s z/ having a contact area slightly further front than /tsʰ ts/ (there is no */dz/). /n l/ are typically laminal alveolar but their place of articulation varies due to coarticulation. Consonants are palatalized before high front segments (i.e. /i y j/), which is particularly noticeable in alveolars, as in /ti 1/ [tʲi 1] ‘low’, /njɔ3/ [nʲjɔ] ‘to circle’. It is important to note that there is no contrast between alveolar /tʰ t d n l/ and their palatalized versions, and we therefore do not posit any palatal or palatalized nasal in the system, despite its inclusion in other descriptions of the language (e.g. Xu & Tang 1988). The contact areas for the alveolo-palatals (/tɕʰ tɕ dʑ ɕ ʑ/) include the alveolar ridge and the forward part of the palatal region, again with the contact being laminal. These consonants generally show a raised tip/blade and front of the tongue and are thus laminal palatalized alveolo-palatal (Ladefoged & Maddieson 1996: 180). Further, /ɕ ʑ/ have a slightly more front contact area than /tɕʰ tɕ dʑ/, an observation that is supported by electropalatographic data in Ping (2005). Phonotactically, alveolo-palatal sounds share the distribution of palatal /j/. However, following widely observed practice in Illustrations of the IPA, we have placed the alveolo-palatals and /j/ in different columns in the above chart.
Syllables with an alveolo-palatal fricative or affricate onset evolved from alveolar or velar consonants due to the palatalization triggered, historically, by the following high front vowel or glide /j/ (see Liu 2004 for further discussion and references). The convention among Sinologists has been to transcribe syllables with an alveolo-palatal onset with a high vowel /i/ after the consonants (e.g. Xu & Tang 1988), as in /tɕiɔ1/ ‘tender’ and /ɕiɔ1/ ‘vanish’. The spectrograms in Figure 2, however, do not support this practice. They show a considerable period of time over which the second formant converges toward the value of the back rounded vowel in the case of /tjɔ1/, while the transition after /tɕ/ is more rapid, temporally comparable to the transition in /tɔ1/. Moreover, the inclusion of /j/ after alveolo-palatals would amount to the presence of a phoneme which is fully predictable from the context. We therefore treat any transitional effects after an alveolo-palatal onset as phonetic, without the need to posit an underlying phoneme /i/ or /j/.
The labial, alveolar, and alveolo-palatal fricatives have a two-way laryngeal contrast, commonly labeled as voiceless vs. voiced. Similar to the stops, their phonatory states vary from stiff in the voiceless ones to more slack in the voiced ones. In the initial position of a tone unit, the voiced fricatives are fully devoiced, paralleling the lack of VOT cues for initial voiced obstruents. Other cues are nevertheless present; voiceless fricatives show greater amplitude of the noise component and are often longer than their voiced counterparts. Their difference is also evident in their different spectral centres of gravity (7140 Hz in /z/ of /z 3/ ‘tree’ vs. 7638 Hz in /s/ of /s 2/ ‘try’; 1834 Hz in /v/ of /vu 3/ ‘father’ vs. 5143 Hz in /f/ of /fu 1/ ‘husband’). Non-initially in the tone unit, the voiceless fricatives remain voiceless, but their voiced counterparts are fully voiced and are produced with glottal pulse turbulence. This contrast is illustrated in Figure 3 (/fu1/ ‘man/husband’ vs. /zɐŋ3 + fu 1/ ‘husband’) and Figure 4 (/vu3/ ‘father’ vs. /jɐŋ3 + vu 3/ ‘adopted father’). Lastly, /h/ is pronounced [x] before /w/ and non-low back vowels, as in /hwɛ1/ [xwei 1] ‘dust’, and [h] before other vowels.
Syllabic / / occurs in syllables with an alveolar sibilant onset and no coda, e.g. /s 2/ ‘try’ and /z 3/ ‘tree’. The later portion of syllabic / / tends to lose its friction, in which case it ends with the spectral quality of a central close-mid vowel, commonly transcribed as /ɿ/ (e.g. Xu & Tang 1988 after Karlgren 1940, but see Ladefoged & Maddieson (1996: 314) on fricative vowels). The suspension of friction is visible in Figure 5 for the larger part of / / in /s 2/ and for the final part in /z 3/.
Sonorants
Labial and velar nasals can form syllable nuclei, e.g. / 3/ ‘fish’, / 3 məʔ/ ‘not in possession’, and / 1 ma/ ‘mother’. /ŋ/ is the only nasal that may occur in all three positions in the syllable – onset, nucleus, and coda. However, the specific place of articulation for the nasal coda varies, partly due to the articulatory latitude that the nasal coda enjoys, given that the language neutralizes place contrasts in coda position. For example, following /ə/ and /ʏ/, the nasal coda is closer to an alveolar [n] or alveolo-palatal [nʲ] (as in /kəŋ3/ [kən 3] ‘to follow’ and /tɕʏŋ1/ [tɕʏnʲ1] ‘army’). An open vowel before coda /ŋ/ is strongly nasalized (as in /zɑŋ3/ [z ŋ3] ‘bed’). Very often, the vocal tract configuration for the low vowel is maintained till the end of the syllable and there is no complete velic closure (as in /kɐŋ1/ [k 1] ‘hard (rice)’). (See the section on vowels below for more discussion on nasalized vowels.) When a sonorant serves as the onset of a syllable, its phonatory state varies from stiff with a high-register tone to more slack with a low-register tone. It is common to annotate this allophonic change of phonatory state with [Ɂ] and [ɦ], respectively, as in [Ɂm] and [mɦ] (Xu & Tang 1988: 7).
Vowels
There are 15 vowels in the basic inventory. Nine monophthongs occur in open syllables, as plotted in Figure 6. Vowels in closed syllables are plotted in Figure 7, where six occur in syllables closed by a nasal coda (left diagram) and five in syllables closed by a glottal coda (right diagram). We adopted the same set of symbols for vowels followed by a glottal stop and those followed by nasal coda, although their articulations often differ. However, the symbols for the vowels in Figure 6 are all different from those in Figure 7, so as to emphasize the fact that for no pair of open-syllable vowel and closed-syllable vowel do we assume phonological equivalence. In general, vowels in closed syllables are more central and lower than vowels in open syllables. The auditory plots here are based on accompanying sound files produced by our informant. For acoustic analyses of vowels produced by several speakers, readers are referred to Chen (2008a).
Monophthongs in open syllables
Monophthongs in closed syllables
The back unrounded vowel /ɤ/ is more diphthongal (i.e. [ɤɯ]) in our sound files than in the speech of the older speakers in Chen (2008a), whose monophthongal pronunciation is illustrated in /dɤ3/ ‘head’ by one of them. No durational difference was found between the diphthongized and non-diphthongized realizations. The back rounded /u/ and /o/ are both close to close-mid back monophthongs with compressed lip rounding. The lips for /o/ are more protruding, while in the case of /u/, the lips are less rounded but more compressed. The lips typically converge towards the end of the vowel, sometimes making a light contact. The difference in lip position enhances the difference between the two vowels /u/ and /o/ (see Ladefoged & Maddieson 1996: 295). The lip convergence is particularly clear when /u/ occurs in combination with an onset /t k/ (e.g. /tu 1/ ‘capital city’ and /ku 1/ ‘song’, compared to /vu 3/ ‘father’).
Typologically, the vowel system in open syllables is remarkable for the clustering in the close-mid to close area. The mid front unrounded vowel resulted from a merger of /e/ and /ɛ/ by the late 1980s (Xu & Tang 1988). Given the height of the vowel produced by our informant, we may symbolize it as /e/, but we have chosen /ɛ/ because native speakers appear to enjoy considerable latitude in the tongue height of this vowel, and some may have a more open realization, as illustrated in /ɦɛ3/ ‘salty’ by an older speaker reported in Chen (2008a). Our informant also produces the vowel as [ei] in some lexical items which contain /ei/ in Standard Chinese. This is evidently due to the influence of Standard Chinese, and /ei/ thus has the status of a ‘marginal’ vowel of our informant's language. A contrastive pair is /tei 1/ ‘to accumulate’ vs. /tɛ1/ ‘dumbfounded’. /a/ is often transcribed as /A/ in the Sinological literature, a non-IPA symbol for a central, open unrounded vowel (e.g. Xu & Tang 1988).
For some older speakers and our informant, there is a contrast between /tʰjɛ1/ [tʰ ] ‘sky’ vs. /tʰi 1/ ‘ladder’, /pjɛ2/ [p ] ‘to change’ vs. /pi 2/ ‘arm’, and /ʑɛ3/ [ʑ ] ‘front’ vs. /ʑi 3/ ‘surname Xu’. While we assume that the underlying forms are /jɛ/ or /ɛ/ and /i/, respectively, this difference has been documented in different ways (Qian 2003). The two forms are known to have merged in the younger generation by the 1980s (Xu & Tang 1988). In terms of articulation, the tongue body for /ɛ/ after a palatal is raised, which results in a more peripheral (i.e. mainly higher F2) realization, as shown in the contrast between /ɕɛ1/ [ɕ ] ‘fresh’ (left) vs. /ɕi 1/ ‘west’ (right) in Figure 8.
As for vowels in closed syllables, the contrast between /ɐ/ and /ɑ/ only exists before /ŋ/ (i.e. not before /ʔ/). /ɐ/ is more centralized, while /ɑ/ is further back. These two vowels are often described as fully nasalized (e.g. as [ã] and [ ] in Xu & Tang 1988). However, spectrogram inspection of the rhyme realizations by our informant shows that the vowels are indeed consistently nasalized, but often followed by a brief velar closure at the end. A case in point here is /zɑŋ3/ [z ŋ3] ‘bed’ in Figure 9. This observation is supported by Ping (2005: 24).
Vowels preceding a glottal stop in the coda show a general displacement towards an open back position, possibly due to a retracted tongue root for the glottal constriction. Lip unrounding occurs just before the glottal closure and the velar nasal, such that /tɕʏʔ4/ ‘bound morpheme for resolution’ is realized as [tɕʏᶦʔ4], and /kʊʔ4/ ‘country’ as [kʊəʔ] where the [ɪ/ə] element is very brief. Finally, /əʔ/ and /ɐʔ/ are interchangeable for some speakers, particularly after /w/.
Syllable structure
Shanghainese syllable structure is (C)(G)V(C), where G is either /j/, as in /tja 1/ ‘daddy’, or /w/, as in /kwa 1/ ‘well-behaved’. As noted in the section on vowels, the coda C is either /ŋ/ or /ʔ/, as in /kʊŋ1/ ‘public’ and /kʊʔ4/ ‘country’. The glides /j w/ may occur syllable-initially, as in /jɔ3/ ‘to shake’ and /wa3/ ‘rotten’, or following C, but /j w/ do not appear in combination (*/wj/ */jw/). Within an open syllable, /j/ before /i y o u/ is banned, while /w/ is only found before /a ɛ ø/. Within a closed syllable, /j/ is absent before /ɪ ʏ ə/ and /w/ before /ɪ ʏ ʊ/.
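The glide restrictions just listed can be restated as a small checker. This is a sketch under stated assumptions rather than a full phonotactic model: the function and constant names are hypothetical, the vowel set licensed after /w/ in open syllables is taken to be /a ɛ ø/ in line with the examples /kwa 1/ and /wa3/, and neither the onset–glide restrictions nor the vowel inventory itself is checked.

```python
GLIDES = {"j", "w"}
CODAS = {"ŋ", "ʔ"}

# Glide + vowel restrictions as stated in the text.
OPEN_BANNED_AFTER_J = {"i", "y", "o", "u"}
OPEN_ALLOWED_AFTER_W = {"a", "ɛ", "ø"}  # assumption: /a/, as in /kwa/, /wa/
CLOSED_BANNED_AFTER_J = {"ɪ", "ʏ", "ə"}
CLOSED_BANNED_AFTER_W = {"ɪ", "ʏ", "ʊ"}


def glide_is_licensed(glide, vowel, coda=None):
    """Check the (G)V(C) part of a (C)(G)V(C) syllable.

    glide: "j", "w" or None; coda: "ŋ", "ʔ" or None (open syllable).
    """
    if coda is not None and coda not in CODAS:
        return False
    if glide is None:
        return True
    if glide not in GLIDES:  # also rules out sequences such as */wj/, */jw/
        return False
    closed = coda is not None
    if glide == "j":
        banned = CLOSED_BANNED_AFTER_J if closed else OPEN_BANNED_AFTER_J
        return vowel not in banned
    if closed:
        return vowel not in CLOSED_BANNED_AFTER_W
    return vowel in OPEN_ALLOWED_AFTER_W
```

For example, `glide_is_licensed("j", "a")` and `glide_is_licensed("w", "a")` hold for the rimes of /tja 1/ and /kwa 1/, whereas `glide_is_licensed("j", "i")` and `glide_is_licensed("w", "ʊ", "ŋ")` do not.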
Before a rounded vowel, /j/ is rounded to [ɥ], as in /jʊɁ5/ [ɥʊɁ] ‘bath’, /pʰjɔ1/ [p°ɥɥɔ] ‘to float’, and /jø3/ [ɥø] ‘round’ (compare /ø3/ [ɦø] ‘cold’ and /y 3/ [ʝʷy] ‘rain’). While a rounded glide /ɥ/ is commonly posited as a separate phoneme, we treat it here as an allophone of /j/, since it never occurs before a non-round vowel.
In the literature, /j w/ have been analyzed as initial elements in the syllable nucleus, i.e. the ‘medial’ in traditional Chinese phonology, and as onset consonants. In combination with a preceding C, they have additionally been analysed as secondary articulations (see Yip 2003, Duanmu, forthcoming, for further details). Regardless of the analysis of prevocalic glides as either onset or nucleus, there are a number of complementary distributions between (non-glide) onset consonants and glides. Before /j/, either a labial or alveolar consonant can appear (e.g. /pʰjɔ1/ ‘to float’, /pjɔ1/ ‘to mark’, /bjɔ3/ ‘prostitute’, /mjɔ1/ ‘to peek’ and /tʰjɔ1/ ‘to shoulder’, /tjɔ1/ ‘marten’, /djɔ3/ ‘stripe’, /njɔ3/ ‘to circle’, /ljɔ1/ ‘to uncover’). In addition to the alveolo-palatal and velar obstruents rejecting a following /j/, the sequences */fj vj wj hj/ are excluded. Before /w/, only velar plosives can appear (e.g. /kʰwɛ1/ [kʰwei 1] ‘debit’, /kwɛ1/ ‘to close’ and /gwɛ3/ ‘to circle (n.)’) or /h/ (e.g. /hwɛ1/ [xwei 1] ‘dust’).
Further phonotactic restrictions can be observed in CV sequences. First, while the alveolar sibilants /s z ts tsʰ/ do not appear before /i y ɪ ʏ j/, syllabic / / occurs precisely after this group of consonants. Examples include /ts 2/ ‘paper’, /tsʰ 2/ ‘here’, /s 2/ ‘try’, and /z 3/ ‘tree’. Secondly, /i y ɪ ʏ j/ do not appear after velars (e.g. */ki/, */ŋy/, */gj/, etc.). Thirdly, the high front rounded /y ʏ/ only occur after /l n/ and /tɕʰ tɕ dʑ ɕ ʑ/. Table 1 summarizes the distribution of post-C /j i w u/.
Onsetless syllables
Onsetless syllables at the start of a tone unit begin with a glottal stop if they have T1, T2 or T4. That is, initial [ʔ] is a predictable beginning of vowel-initial syllables with (non-low) Yin register tones. Generally, syllables with these non-low tones have stiff voice. By contrast, slack voice is a predictable beginning of vowel-initial syllables with low tone, T3 or T5, the Yang register tones in traditional Chinese. While it is common to assume phonemic glottal stops and voiced glottal fricatives as syllable onsets, we propose that these syllables should be analyzed as onsetless. Initial slack voice is an enhancement of the realization of T3 and T5, just as initial stiffness can be seen as an enhancement of T1, T2 and T4.
The phonation difference between stiffness vs. slackness of the vowel in these syllables is shown in Figure 10, which plots the Fast Fourier transform (FFT) spectrum of the vowel /a/ in /a 1/ [ʔa 1] ‘bound morpheme a’ (top panel) vs. /a 3/ [ɦa 3] ‘shoe’ (bottom panel), taken over an interval of approximately 30 ms from the beginning of the vowel. It is clear from the measurements on both H1−H2 (i.e. amplitude difference between the first and second harmonics) and H1−A1 (i.e. amplitude difference between the first harmonic and first formant) that there is a phonation difference between the two vowels with /a 3/ [ɦa 3] showing more breathiness (i.e. slack voice) than /a 1/ [ʔa 1]. (See Blankenship 2002 for more details on the acoustic correlates of phonation types.)
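For readers unfamiliar with the H1−H2 measure, the following sketch shows how the amplitude difference between the first two harmonics can be computed from a short stretch of signal. It is not the measurement procedure behind Figure 10: it assumes that f0 is already known, that the analysis window spans a whole number of pitch periods, and that no windowing or formant correction is applied. The function names and the synthetic signal in the usage note are hypothetical.

```python
import math


def harmonic_amplitude(samples, sample_rate_hz, freq_hz):
    """Amplitude of the sinusoidal component at freq_hz (single-bin DFT)."""
    n = len(samples)
    step = 2 * math.pi * freq_hz / sample_rate_hz
    real = sum(s * math.cos(step * i) for i, s in enumerate(samples))
    imag = sum(s * math.sin(step * i) for i, s in enumerate(samples))
    return 2 * math.hypot(real, imag) / n


def h1_minus_h2_db(samples, sample_rate_hz, f0_hz):
    """H1-H2 in dB: first-harmonic level minus second-harmonic level."""
    h1 = harmonic_amplitude(samples, sample_rate_hz, f0_hz)
    h2 = harmonic_amplitude(samples, sample_rate_hz, 2 * f0_hz)
    return 20 * math.log10(h1 / h2)
```

As a check on a synthetic 30 ms signal sampled at 8000 Hz (240 samples) with a 200 Hz fundamental of amplitude 1.0 and a second harmonic of amplitude 0.5, the function returns about 6 dB, i.e. 20·log10(2). A larger positive value is the pattern associated with breathier (slack) phonation.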
The tone-dependent slack voice is heard as the phonetic segment [ɦ] only with mid and open vowels, as in /ø3/ [ɦø] ‘cold’, /ɛ3/ [ɦɛ] ‘to harm’, /ɔ3/ [ɦɔ] ‘unrestrained’, /a 3/ [ɦa] ‘shoe’. Before less open vowels, the glottal friction is absent, causing /ɪʔ5/ ‘leaf’, /ʏʔ5/ ‘moon’ and /ʊʔ5/ ‘to live’ to sound like [jɪʔ], [ɥʏɪʔ] and [wʊəʔ], respectively. In the case of the high vowels /i y u/, there tends to be weakly voiced cavity friction produced along the place of articulation of the vowel. Thus, /i 3/ ‘aunt’ is [ʝi], /y 3/ ‘rain’ is [ʝʷy] and /u 3/ ‘fox’ is [βu]. As explained in the section on tone contrasts below, the tone of non-initial syllables in the tone unit is deleted, with the tone of the first syllable determining the f0 contour for the tone unit. The relevance of this fact for the segmental analysis is that tone-dependent segmental features in non-initial syllables of the tone unit disappear. Thus, the homorganic glides/fricatives disappear in medial position in the tone unit, while underlying /j w/ remain, as shown in Table 2. Likewise, the tone-dependent status of the phonetic segments [ʔ] and, for non-high vowels, [ɦ] is shown by their disappearance in non-initial position in the tone unit. This can be seen in Table 3, where [ʔ] and [ɦ] disappear after a domain-initial morpheme /tɕʰɪŋ1/. By contrast, the phoneme /h/ is retained in non-initial position.
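The tone-conditioned realizations of onsetless syllables described above can be summarized as a lookup. The sketch covers only the vowels exemplified in the text, and the function name and data layout are hypothetical; it illustrates the proposed allophony and is not a complete realization rule.

```python
YIN_TONES = {1, 2, 4}   # non-low register: initial [ʔ]
YANG_TONES = {3, 5}     # low register: slack voice

# Audible beginning of an onsetless Yang-register syllable, by vowel,
# restricted to the vowels exemplified in the text.
YANG_PREFIX = {
    "ø": "ɦ", "ɛ": "ɦ", "ɔ": "ɦ", "a": "ɦ",   # mid and open vowels
    "i": "ʝ", "y": "ʝʷ", "u": "β",            # homorganic weak friction
    "ɪ": "j", "ʏ": "ɥ", "ʊ": "w",             # heard as glides
}


def onsetless_prefix(vowel, tone, initial_in_tone_unit=True):
    """Return the predictable phonetic beginning of an onsetless syllable."""
    if tone not in YIN_TONES | YANG_TONES:
        raise ValueError("tone must be 1-5")
    if not initial_in_tone_unit:
        # Tone-dependent features disappear in non-initial position.
        return ""
    if tone in YIN_TONES:
        return "ʔ"
    if vowel not in YANG_PREFIX:
        raise ValueError("vowel not exemplified in the text")
    return YANG_PREFIX[vowel]
```

Thus `onsetless_prefix("a", 1)` gives the glottal stop of [ʔa], `onsetless_prefix("a", 3)` the [ɦ] of [ɦa] ‘shoe’, and `onsetless_prefix("u", 3)` the [β] of [βu] ‘fox’, while any of these in non-initial position yields an empty string.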
Tone contrasts
Tone contrasts in Shanghai Chinese are signalled by a complex set of features. In addition to the f0 differences, there is a constellation of phonetic and phonological features of the tone-bearing syllable, which include voice quality (slack voice for T3/T5 and stiff voice for T1/T2/T4), pre-vocalic state of the glottis (a glottal closure before onsetless syllables with T1/T2/T4 and slack vocal fold vibration/weak voiced friction for those with T3/T5), as well as the post-vocalic state of the glottis and the duration of the vowel (short vowel and presence of a glottal closure after T4/T5 vs. long vowel and absence of a glottal closure for T1/T2/T3). A similar point was made in Sherard (1972).
Table 4 summarizes the distributional restrictions. In an open syllable or a syllable with nasal rhyme, aspirated and unaspirated obstruent onsets allow for a two-way contrast between T1 and T2 (column 2). When the onset is a voiced obstruent, only T3 is possible (column 3). As for nasal and approximant onsets (column 4), T1 and T3 occur frequently on syllables with nasal onsets, while T2 is rare. In the speech of our informant, a sub-minimal triplet is: /mɛ1/ ‘very’, /mei 2/ ‘beautiful’, /mɛ3/ ‘slow’. The pitch for T3 after nasal onsets seems to be not quite as low as that after voiced obstruents, perhaps reflecting the absence of the f0 perturbation effect that occurs after voiced obstruents on the one hand and the low functional load of the contrast with T2 on the other. In checked syllables, voiceless obstruents condition T4 (column 5), leaving voiced obstruents to co-occur with T5 (column 6). After nasal and approximant onsets (column 7), T5 abounds and T4 is rare.
T = voiceless obstruent, S = sonorant consonant, D = voiced obstruent, V = vowel
Onsetless syllables and syllables beginning with /j w/ allow the full set of tonal contrasts: T1 vs. T2 (or T4) vs. T3 (or T5). Table 5 summarizes the phonetic effects of the tone for syllables beginning with a high vowel and a low vowel. All of these examples in fact occur as morphemes, as indicated by the glosses.
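The co-occurrence restrictions between onset class, syllable type and tone described for Table 4, together with the full contrast in onsetless syllables, can be restated as a lookup table. The encoding below is a hypothetical sketch: the class labels follow the legend (T, D, S, V), "V" is used here for onsetless syllables and those beginning with /j w/, and the split into regular and rare tones follows the prose rather than any frequency count.

```python
# (onset class, checked syllable?) -> (regular tones, rare tones)
TONE_TABLE = {
    ("T", False): ({1, 2}, set()),      # voiceless obstruent onset
    ("D", False): ({3}, set()),         # voiced obstruent onset
    ("S", False): ({1, 3}, {2}),        # nasal/approximant onset
    ("V", False): ({1, 2, 3}, set()),   # onsetless or /j w/-initial
    ("T", True): ({4}, set()),
    ("D", True): ({5}, set()),
    ("S", True): ({5}, {4}),
    ("V", True): ({4, 5}, set()),
}


def allowed_tones(onset_class, checked, include_rare=True):
    """Return the set of tones compatible with an onset class.

    checked=True means a syllable closed by a glottal stop (Rusheng).
    """
    regular, rare = TONE_TABLE[(onset_class, checked)]
    return regular | rare if include_rare else set(regular)
```

For instance, `allowed_tones("D", False)` returns only T3, while `allowed_tones("S", False, include_rare=False)` leaves out the rare T2 after sonorant onsets.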
Understandably, given these distributional restrictions, there have been conflicting summary statements of the number of tones in Shanghainese. For example, Jin (1986) and Selkirk & Shen (1990) assume a three-way contrast, Duanmu (1993, 1999) a two-way contrast, and Zee & Maddieson (1980) work with five phonetic contours of the lexical tones.
Within a tone unit, the tonal contours of non-initial syllables never surface, and the tone of the first syllable determines the f0 contour for the whole domain. After the second syllable, f0 falls gradually, and converges to low pitch as the number of non-initial syllables increases. (Readers are referred to Chen 2008b for further details.) This pattern is known as guangyong shi 广用式 ‘commonly applied pattern’, as opposed to zhaiyong shi 窄用式 ‘less commonly applied pattern’ (Xu & Tang 1988), with the latter referring to tonal reduction processes due to the lack of phrasal-level prominence – a topic for future research.
Many factors can affect f0 contour realization, including prosodic grouping and the information status of the constituents (see e.g. Chen 2012 for a review of tonal variation). There is a large literature on tonal variation over the non-initial syllables of tone units as well as on how tone units, together with larger prosodic domains, are realized (see Zee & Maddieson 1980, Selkirk & Shen 1990, Duanmu 1999, Chen 2008b). It is beyond the scope of this paper to evaluate these proposals or to provide a new analysis.
Transcription of recorded passage
This passage is transcribed phonemically, using the symbols presented in the vowel and consonant charts. Tones are marked for each tone unit on the basis of the tone of the initial syllable. The boundaries between syllables are indicated by spaces, the boundaries of tone units are marked by parentheses, while | marks the end of major phrases and || that of utterances.
(jɤ3 t hɑŋ ts ) | (pʊʔ4 fʊŋ) | (təʔ4) (t h a 2 jɐŋ) | (tsəŋ2 hɔ)| (ləʔ5 ləʔ) (tsɐŋ1) |(sa 2 nɪŋ) | (pəŋ2 z ) (du 3) || (tsɐŋ1 lɛ) | (tsɐŋ1 tɕʰi) | (tsɐŋ1 vəʔ tɕʰɪŋ1 sɑŋ) | (ɡəʔ zəŋ3 kwɑŋ) || (lu 3 lɑŋ) |(tsɤ2 ku lɛ) | (ɪʔ4 ɡə) (nɪŋ3) || (səŋ1 lɑŋ) | (tsɐʔ4 lə) (ɪʔ4 dʑɛ) | (ɤ3 da i) || (i 3 la) (ljɐŋ3 ka dɤ) | (dʑɤ3) (kɑŋ2 hɔ) || (tɕa 2 s ) (sa 2 nɪŋ) | (nəŋ3 kɤ) (tɕɔ2) | (ɡəʔ5 ɡəʔ) (nɪŋ3) | (ɕɛ1) (t hɐʔ1 t həʔ) (da 3 i) || (dʑɤ3) (s 2ø) | (sa 2 nɪŋ) | (pəŋ2 z ) (du 3) || (pʊʔ4 fʊŋ) (dʑɤ3) | (jʊŋ3 tsʊʔ) (lɪʔ5 tɕʰi) | (dʑʊŋ3 tsʰ ) | (pɐʔ4 tsʰ ) || (pəʔ4 ku) || (i 3) | (ʏʔ 5) (tsʰ 1 təʔ) (tɕɪʔ4 kwəŋ) || (ɡəʔ5 ɡəʔ) (tsɤ2 lu ɡəʔ) (nɪŋ3) | (nɛ1) | (da 3 i) | (ku 2 təʔ) | (ʏʔ5) (tɕɪŋ2) || (ɤ3 lɛ) | (pʊʔ4 fʊŋ) | ( 3 məʔ) (tɕɪŋ2 ləʔ) || (tsəʔ4 hɔ) | (vəʔ5) (ts h 1 ləʔ) || (kəʔ4 məʔ) | (tʰa 2 jɐŋ) (tsʰəʔ4 lɛ ləʔ) || (i 3) (gəʔ5 nəŋ ka) | (ɪʔ4) (so 2) || (ɡəʔ5 ɡəʔ) | (tsɤ2 lu gəʔ) (nɪŋ3) | (mo 3 zɐŋ) | (tɕɤ2) (nɛ1) (i 3 dʑɛ) (da 5 i) | (tʰɐʔ4 tʰə ləʔ) || (kəʔ4 məʔ) (pʊʔ4 fʊŋ) | (tsəʔ4 hɔ) (zəŋ3 nɪŋ) || (ljɐŋ3 ka dɤ) | (tɑŋ1 tsʊŋ) || (ɛ3 z ) (t h a 2 jɐŋ) | (tswɛ2) (jɤ3) | (pəŋ2 z ) ||
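The bracketing conventions of the transcription (spaces between syllables, parentheses around tone units, | after major phrases and || after utterances) lend themselves to mechanical parsing. The sketch below is a hypothetical illustration: it assumes that tone digits are written directly after the syllable they belong to, without an intervening space, and the function names are not part of the original description.

```python
import re


def parse_passage(text):
    """Split a transcription into utterances > phrases > tone units > syllables."""
    utterances = []
    for utterance in text.split("||"):
        phrases = []
        for phrase in utterance.split("|"):
            # Each parenthesized group is one tone unit; syllables are
            # separated by whitespace inside it.
            units = [u.split() for u in re.findall(r"\(([^()]*)\)", phrase)]
            if units:
                phrases.append(units)
        if phrases:
            utterances.append(phrases)
    return utterances


def unit_tone(unit):
    """Return the tone (1-5) marked on the initial syllable of a tone unit."""
    match = re.search(r"[1-5]", unit[0])
    return int(match.group()) if match else None
```

Applied to a short excerpt such as `"(pʊʔ4 fʊŋ) | (təʔ4) (tsəŋ2 hɔ) ||"`, `parse_passage` returns one utterance with two phrases, and `unit_tone` on the first tone unit returns 4, the tone that determines the contour of the whole unit.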
Orthographic version
有趟子, 北风搭太阳正好勒勒争啥人本事大。争来争去争勿清爽格辰光, 路浪走过来一个人, 身浪着勒一件厚大衣。伊拉两家头就讲好, 假使啥人能够叫格个人先脱忑大衣, 就算啥人本事大。北风就用足力气穷吹八吹。不过伊越吹得结棍, 格个走路格人拿大衣裹得越紧。后来北风勿没劲勒, 只好勿吹勒。格么太阳出来勒。伊格能嘎一晒, 格个走路格人马上就拿伊件大衣脱忑勒。格么北风只好承认, 两家头当中, 还是太阳最有本事。
Acknowledgements
We would like to thank first and foremost our informant Zhiqi Deng for making this study possible. In addition, we are grateful to Yueling Ping, Huan Tao, Rujie You, and Dunming You for sharing their thoughts with us on various linguistic aspects of the language, and to Zhongmin Chen and Menghui Shi for sharing references. We are also greatly indebted to the anonymous reviewers, whose comments have led to a large number of improvements in the exposition as well as to various additions to the text. The editorial assistance from Ewa Jaworska, Roger Lo, and Myrthe Wildeboer at various stages of the writeup is gratefully appreciated. The work was in part supported by a VENI grant and a VIDI grant awarded to the first author, and the Network grant ‘Forms and Functions of Prosody’, awarded to both authors, all by the Netherlands Organisation for Scientific Research (NWO). YC was also supported by the European Research Council (ERC-Starting Grant 206198).