Lili Wu Chinese is a Wu dialect (ISO 639-3 code: wuu) spoken by approximately 38,000 people who reside in the town of Lili, one of the ten major towns in the Wujiang district. The Wujiang district belongs to the prefecture-level municipality of Suzhou in Jiangsu province, the People’s Republic of China. It is located at the junction of the cities of Shanghai, Suzhou, and Jiaxing, as shown in Figure 1.
Lili Wu Chinese is commonly considered to belong to the Suhujia dialect cluster, which in turn is classified as a member of the Tai Lake subgroup of the Northern Wu dialect group, a Sinitic branch within the Sino-Tibetan family (Wurm et al. 1987: B–9). The dialect is famous for the so-called aspiration-induced tonal split phenomenon, which refers to the lowering of f0 contours after voiceless aspirated obstruents in certain tonal contexts. Lili Wu Chinese has therefore attracted much attention over the last six decades, which has led to a handful of descriptions not only of the dialect itself but also of closely related dialects in the Wujiang area that appear to have similar aspiration-induced tonal splits. Perhaps because of this salient tonal-split feature, much less attention has been paid to the segmental properties of Lili Wu Chinese in the existing literature. This description aims to bring together existing descriptions of Lili Wu Chinese in an accessible form, and to propose a number of methodological and analytical innovations and new perspectives on both lexical tones and segmental features. Specifically, these are: (i) an instrumental analysis of the lexical tones and a reanalysis of the co-occurrence pattern between lexical tone and onset; (ii) acoustic realizations of voiceless vs. voiced fricatives; (iii) detailed phonetic analyses of two high front vowels /i/ and /i̟/; and (iv) the addition of two syllabic approximants /ɹ̩/ and /ɹ̹̍/ in the sound system of Lili Wu Chinese.
The description is mainly accompanied by recordings of a sixty-eight-year-old male native speaker, who was born in 1948 and raised in Lili town. All acoustic data we present in this description were elicited from this consultant. Our consultant spent most of his life living in Lili and speaking Lili Wu Chinese, except for the three years attending a college in a nearby city. According to his self-report, he can speak (accented) Standard Chinese and limited Shanghainese when the situation requires him to do so, but he speaks only Lili Wu Chinese at home. All video recordings were elicited from another male native speaker, who was born in 1947 and raised in Lili town as well.
Lexical tones and aspiration-induced tonal split
There are eight lexical tones in Lili Wu Chinese. Plotted in Figure 2 are the f0 contours of the example morphemes, labelled as T1 to T8, respectively. Generally speaking, lexical tones marked as odd numbers start within a higher f0 range (above 160Hz, high-register hereafter), while those marked as even numbers start within a lower range (under 160Hz, low-register hereafter). T1 (black solid) has a level f0 contour within the high-register (high–level) while T2 (dark grey solid) is a low-register rising tone (low–rising). T3 (black round dot) starts within the high-register and falls (high–falling). T4 (dark grey round dot) is a low-register level tone (low–level). T5 (black square dot) has a convex contour which starts at the high-register, falls and ends with a slight rise (high–dipping). T6 (dark grey square dot) is realized with a similar f0 contour to that of T5 but starts at the low-register (low–dipping). Both T7 (black dash-dotted) and T8 (dark grey dash-dotted) are associated with syllables that have a much shorter duration than the other tone-bearing syllables. T7 starts within the high-register and despite the slight falling contour, sounds like a high-register level tone (short–high–level). T8 is a low-register level tone (short–low–level).
These eight lexical tones exhibit interesting co-occurrence patterns with both the onset and coda. Lili Wu Chinese features a three-way laryngeal contrast in obstruents: voiceless unaspirated, voiceless aspirated, and voiced. (See the section on consonants below for more details.) Syllables with voiceless unaspirated onsets only allow high-register tones (T1, T3, T5, and T7), while voiced onsets co-occur with low-register tones (T2, T4, T6, and T8). T1 to T6 only co-occur with open syllables or syllables with a nasal coda and are therefore also known as smooth/non-checked tones (developing from the Ping [level], Shang [rising], and Qu [departing] tonal categories of Middle Chinese), while T7 and T8 only co-occur with closed syllables with a glottal coda /ʔ/ and are known as abrupt/checked tones (developing from the Ru [entering] tonal category of Middle Chinese).
In the vast majority of Northern Wu dialects such as Shanghainese (Chen & Gussenhoven 2015), both voiceless unaspirated and aspirated onsets condition high-register tones, leaving voiced onsets to co-occur with low-register tones. What makes Lili Wu Chinese interesting is the effect of obstruent aspiration on lexical tonal realization, as exemplified by /tʰʊŋ¹/ ‘unblocked’, /tʰʊŋ⁴/ ‘to unify’, /tʰʊŋ⁶/ ‘ache’, and /tʰʊʔ⁸/ ‘baldy’. Their f0 contours are plotted in Figure 3 (labelled as T1–A, T3–A, T5–A, and T7–A, where A indicates voiceless aspirated onsets), in comparison to the f0 contours of the presumably same lexical tones realized after voiceless unaspirated onsets (indicated with U). Except for T1 (i.e. T1–U vs. T1–A), we see a clear f0-lowering effect in syllables with voiceless aspirated onsets. This lowering effect, which looks like a split of one tone into two as a function of voiceless unaspirated vs. aspirated onsets, is known as aspiration-induced tonal split.
Perhaps due to this prominent phenomenon, the tonal inventory of Lili Wu Chinese has been a point of debate in recent decades. To our knowledge, there are at least eleven descriptive works focusing on this aspiration-induced tonal-split phenomenon in Lili Wu Chinese (Chao 1928: 82; Ye 1983; Zhang & Liu 1983; F. Shi 1992; Qian 1992: 48; Shen 1994; P. Wang 2008, 2010: 26–27; Z. Xu 2009: 55; Hirayama 2010; Yanhong Xu 2013: 32). Researchers differ greatly in their treatment and interpretation of the tonal-split phenomenon. The main debate concerns whether the f0 contours of lexical tones after aspirated onsets have merged with those after voiced onsets or have emerged as distinct tonal categories independent of the existing eight. It is important to note that previous studies typically explore this phenomenon on the basis of impressionistic descriptions (e.g. Chao 1928: 8–10) or with data from a very limited number of speakers (e.g. F. Shi 1992 for one male and one female speaker; Shen 1994 for two young speakers).
M. Shi, Chen & Mous (2016), with data from twenty native speakers (eight males and twelve females, with a mean age of 67 years and a standard deviation of six years), show comparable f0 contours after voiceless aspirated (T1–A) and voiceless unaspirated (T1–U) onsets, both of which are realized within the high register, as shown in Figure 4(a). However, the f0 contours after voiceless aspirated onsets can also pattern more like those after voiced onsets, resulting in the merger of the f0 contours of T3–A and T4, T5–A and T6, and T7–A and T8, respectively, as plotted in Figure 4(b–c). F0 contours after aspirated onsets show a trend towards slightly higher f0 than those after voiced onsets. However, the statistical results (growth curve analysis, GCA) in M. Shi et al. (2016) suggest that there is no significant difference between the f0 contours after voiceless aspirated and voiced onsets for each pair.
In summary, the lexical tonal system of Lili Wu Chinese includes two level tones (high–level T1 and low–level T4), one falling tone (high–falling T3), one rising tone (low–rising T2), and two dipping tones (high–dipping T5 and low–dipping T6). For short syllables with a glottal coda, two level tones are identified (short–high–level T7 and short–low–level T8). The numerical representations of the eight lexical tones and their co-occurrence patterns with onsets are provided in Table 1. Here, we adopt the tonal transcription system developed by Chao (1930), which divides a speaker’s pitch range into five levels, with 5 indicating the highest and 1 the lowest.
T1 can co-occur with both voiceless onsets (i.e. unaspirated and aspirated). T3, T5, and T7, on the other hand, can only co-occur with voiceless unaspirated onsets. The three low-register tones (T4, T6, and T8) are licensed by both voiceless aspirated and voiced onsets, while T2 is only allowed after voiced onsets. It is important to note that the co-occurrence pattern (i.e. voiceless onsets co-occurring with high-register tones, while voiced onsets with low-register ones), which is commonly observed in most Northern Wu dialects, falls apart in Lili Wu Chinese where voiceless aspirated onsets can co-occur with low-register tones.
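The licensing pattern just described can be written down as a small lookup table. The following Python sketch is only an illustration of the pattern as stated in the text; the names LICENSED_TONES, is_licensed, and onsets_for are hypothetical labels introduced here, and the table covers plosive and affricate onsets only (sonorants and fricatives are discussed separately).

```python
# Tones licensed by each laryngeal class of obstruent onset, as stated in the text.
# All names here are illustrative, not part of the original description.
LICENSED_TONES = {
    "voiceless_unaspirated": {1, 3, 5, 7},  # high-register tones only
    "voiceless_aspirated": {1, 4, 6, 8},    # T1 plus the low-register T4, T6, T8
    "voiced": {2, 4, 6, 8},                 # low-register tones only
}


def is_licensed(onset_class, tone):
    """Return True if the tone can co-occur with the given onset class."""
    return tone in LICENSED_TONES[onset_class]


def onsets_for(tone):
    """Return the onset classes (alphabetically sorted) that license a tone."""
    return sorted(c for c, tones in LICENSED_TONES.items() if tone in tones)


if __name__ == "__main__":
    for tone in range(1, 9):
        print(f"T{tone}: {', '.join(onsets_for(tone))}")
```

Laid out this way, the table makes the departure from the usual Northern Wu pattern easy to see: the aspirated row shares three of its four tones with the voiced row rather than with the unaspirated row.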
In addition, it is worth noting that in Lili Wu Chinese, sonorants (i.e. nasals and liquids) mainly co-occur with low-register tones and share the same tonal pattern as voiced plosives. A set of words with nasal initials can also co-occur with high-register tones, such as /mu³/ [məʊ³] ‘bound morpheme for the literary address of mother’. With respect to fricatives, since there is only a two-way laryngeal distinction (i.e. voiceless vs. voiced), voiceless fricatives co-occur with high-register tones while their voiced counterparts co-occur with low-register ones.
Consonants
Lili Wu Chinese has 28 consonants. Corresponding key words/bound morphemes are provided below the consonant chart. Lili Wu Chinese features a three-way laryngeal contrast in obstruents: voiceless unaspirated, voiceless aspirated, and voiced (Chao 1967). This three-way contrast is a prominent feature of the Northern Wu dialects. The contrast, however, has different phonetic manifestations in word-initial vs. word-medial position. Generally speaking, in initial position, these obstruents vary in their phonation from clearly modal (voiceless unaspirated), through aspirated with breathiness (voiceless aspirated), to breathy (voiced) (M. Shi et al. 2016). In medial position, voiced obstruents are realized with noticeable voicing throughout the closure, leading to a three-way laryngeal distinction in terms of voice onset time (VOT). In Shanghai Wu Chinese, a Northern Wu dialect closely related to Lili Wu, other phonetic properties also signal the three-way laryngeal contrast in both initial and medial positions (e.g. Shen, Wooters & W. S.-Y. Wang 1987 on closure duration; Ren 1992: 95–111 on transillumination data; Gao 2015: 199–207 on motion-capture-system data; see also a review in Chen 2011). Impressionistically, Lili Wu behaves similarly to Shanghainese, but more research is needed to examine whether these properties also function in Lili Wu Chinese.
Fricatives show a two-way laryngeal contrast (voiceless vs. voiced). As with the plosives and affricates, in initial position their phonatory states vary from clearly modal in the voiceless ones to slightly breathy in the voiced ones. In medial position, the voiced category is realized with vigorous voicing, leading to a two-way contrast in terms of VOT. It is worth noting that the fricative voicing contrast is also signaled via durational differences, similar to what has been reported for the voicing contrast in English fricatives (Cole & Cooper 1975), as shown in the following pairs: /f/ (/fu¹/ ‘husband’) vs. /v/ (/vu²/ ‘support somebody with one's hand’); /s/ (/sɛ¹/ ‘three’) vs. /z/ (/zɛ²/ ‘greedy’). Figure 5 illustrates the acoustic realization of /f/ in /fu¹/ (5a) and /v/ in /vu²/ (5b). Although neither is realized with regular vocal pulses (i.e. both are phonetically voiceless), the fricative duration of /f/ (131 ms; 29% of the total duration) is about 2.3 times that of /v/ (56 ms; 12% of the total duration).
To further confirm these observations, we elicited ten minimal pairs for each minimal set of voicing contrast. All stimuli were lexemes of relatively high familiarity, as confirmed by our consultant. Both the absolute duration of the frication and the percentage of the frication duration over the whole syllable duration were calculated. The fricative duration was measured from the onset of clear frication noise to the first periodic cycle of the vowel. Results in Table 2 show that the percentage of the frication duration of voiceless onsets is significantly greater than that of their voiced counterparts, confirmed by the results of the independent samples t-tests (one-tailed) for each pair.
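The two duration measures and the group comparison can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the procedure actually used: the function names are hypothetical, the t statistic assumes equal variances (the text does not say which variant of the independent-samples t-test was run), and no p-value is computed because that would require the t distribution.

```python
from math import sqrt
from statistics import mean, variance


def frication_share(frication_ms, syllable_ms):
    """Frication duration as a percentage of the whole syllable duration."""
    return 100.0 * frication_ms / syllable_ms


def duration_ratio(voiceless_ms, voiced_ms):
    """How many times longer the voiceless frication is than the voiced one."""
    return voiceless_ms / voiced_ms


def pooled_t(sample_a, sample_b):
    """Student's t statistic for two independent samples.

    Assumes equal variances in the two groups (an assumption made here).
    """
    na, nb = len(sample_a), len(sample_b)
    pooled_var = ((na - 1) * variance(sample_a)
                  + (nb - 1) * variance(sample_b)) / (na + nb - 2)
    return (mean(sample_a) - mean(sample_b)) / sqrt(pooled_var * (1 / na + 1 / nb))


if __name__ == "__main__":
    # Single-token durations reported in the text for /f/ (131 ms) and /v/ (56 ms).
    print(round(duration_ratio(131, 56), 2))  # 2.34
```

For a one-tailed test of the hypothesis that voiceless fricatives take up a larger share of the syllable, the percentages for the voiceless items would be passed as sample_a and those for the voiced items as sample_b, so that a positive t supports the hypothesis.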
Last but not least, /dz/ and /z/ are sometimes in free variation for the same lexical item, as exemplified in ‘groceries’ (/ʣaʔ⁸ hu⁵/ vs. /zaʔ⁸ hu⁵/). This finding may imply that in Lili Wu Chinese, the affricate /dz/ and fricative /z/ are undergoing merger at the lexical level. The glottal plosive /ʔ/ only appears in coda position as a phoneme and co-occurs with short syllables as in /paʔ⁷/ ‘hundred’. Phonetically speaking, the [ʔ] segment can be observed at the beginning of onsetless syllables with the high-register tones (i.e. T1, T3, T5, and T7) (see the section below for further details on onsetless syllables).
Sonorants
/n l/ are typical laminal alveolars. The alveolar nasal /n/ is palatalized before high front segments (i.e. /i i̟ y j/), as in /ni²/ [ɲi²] ‘year’ and /njɛ⁶/ [ɲjɛ⁶] ‘to read’. Labial and velar nasals can form syllable nuclei, as in /ŋ̍⁴/ ‘five’ and /m̩⁴/ ‘parcel of land’. These two syllabic nasals can be found in many Southern Chinese dialects (i.e. Wu, Min, Hakka, Xiang, and Yue) but are relatively rare in dialects belonging to the Mandarin family (Shen 2006). In addition, /ŋ/ occurs as a nasal coda, but its acoustic realization varies according to the preceding vowel. After a front vowel, the nasal coda acquires the anterior feature and sounds like [n] (as in /zɪŋ²/ [zɪn²] ‘to look for’ and /tɕʏŋ¹/ [tɕʏn¹] ‘army’), in contrast to its realization after a non-front vowel (as in /dzəŋ¹/ ‘deity’ and /dʊŋ²/ ‘copper’). Following the treatment of Chen & Gussenhoven (2015) for Shanghainese, we posit an underlying /ŋ/ in coda position.
There are two glides, /j/ and /w/, in Lili Wu Chinese. Glides are typically defined as vowel-like segments that function as consonants and belong to the approximant class (Ladefoged & Maddieson 1996: 322). In Lili Wu Chinese, /j/ and /w/ differ from the corresponding vowels (i.e. /i/ and /u/) in that both tend to be produced with a narrower constriction of the vocal tract, as indicated by lower F1 values. Following Maddieson & Emmorey (1985), we compared the mean F1 of the beginning interval (i.e. 50 ms) of /j/ (/jɘ̝o¹/ ‘surname, Ou’) with /i/ (/i¹/ ‘smoke’) and of /w/ (/wɛ²/ ‘to return’) with /u/ (/u¹/ ‘crow’), respectively. Results showed that the F1 values of /j/ (265 Hz) and /w/ (314 Hz) are lower than those of the corresponding vowels (/i/: 271 Hz; /u/: 354 Hz). Existing descriptions of Lili Wu Chinese such as P. Wang (2010: 26) have typically posited high vowels /i u/ instead of glides /j w/ in words like /jɘ̝o¹/ and /wɛ²/ (/iəu¹/ and /uᴇ²/ in P. Wang 2010: 26, respectively), despite the consensus among sinologists that they are glides. We have adopted the approximants /j w/ to transcribe the sounds. Note that before the rounded vowels /o/ and /ø/, /j/ is realized as [ɥ], as in /joʔ⁸/ [ɥoʔ⁸] ‘bath’ and /jø²/ [ɥø²] ‘rounded’. Because of this complementary distribution, [ɥ] is treated as a context-specific (i.e. before /o/ and /ø/) variant of /j/.
A controversial issue is whether it is necessary to posit /j/ after an alveolo-palatal affricate or fricative onset (i.e. /ʨʰ ʨ ʥ ɕ/) in Wu Chinese (see a brief discussion in Chen & Gussenhoven 2015). Historically, these alveolo-palatal onsets are commonly believed to have developed from the velar or glottal onsets (i.e. /kʰ k ɡ h/) through a palatalization process triggered by the following high front segments (e.g. L. Wang 1985: 394). Synchronically, there is no contrast between /ʨʰ ʨ ʥ ɕ/ and /ʨʰj ʨj ʥj ɕj/ in Lili Wu Chinese. More remarkably, the transition from the alveolo-palatal affricate to the following vowel is rather brief. Figure 6 illustrates the different transitional characteristics among /tɑ¹/ ‘knife’ (6a), where there is no glide; /tjɑ¹/ ‘marten’ (6b) and /tsjɑ¹/ ‘scorched’ (6c), where the presence of /j/ is commonly recognized; and /ʨɑ¹/ ‘to converge’ (6d), where we propose the absence of /j/. Adapting the method of Chitoran (2002), we marked the beginning of the transition at the start of the sonorant part (i.e. glide or vowel) and the end of the transition at the turning point from a falling F2 to an F2 steady state, i.e. the point from which F2 consistently falls by less than 20 Hz. The F2 values were automatically measured in Praat (Boersma & Weenink 2020) with a window length of 5 ms. Note that we would have expected a much more stable realization of /j/, with a longer transition from /ʨ/ to /ɑ/, if there were a glide /j/ following /ʨ/. These observations motivated us not to posit an underlying /j/ after alveolo-palatal onsets (following the analysis of Chen & Gussenhoven 2015 for Shanghainese). We would nevertheless like to stress the importance of further experimental studies to investigate the phonological status and phonetic realization of /j/ after alveolo-palatals in Lili Wu Chinese as well as in other Chinese dialects.
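The criterion for the end of the F2 transition can be made concrete with a short sketch. The Python below is one possible reading of that criterion, not the measurement script itself: it assumes an F2 track sampled at equal steps, and the function name, the min_stable_steps parameter (how many consecutive small steps count as "consistently"), and the sample values are all hypothetical.

```python
def transition_end(f2_track, threshold_hz=20.0, min_stable_steps=3):
    """Return the index of the frame where the F2 steady state begins.

    The steady state is taken to start at the first frame from which F2
    changes by less than threshold_hz for min_stable_steps consecutive
    frame-to-frame steps. Returns None if no such frame exists.
    """
    # Frame-to-frame F2 change; a positive value means F2 is falling.
    steps = [f2_track[i] - f2_track[i + 1] for i in range(len(f2_track) - 1)]
    for i in range(len(steps) - min_stable_steps + 1):
        window = steps[i:i + min_stable_steps]
        if all(abs(step) < threshold_hz for step in window):
            return i
    return None


if __name__ == "__main__":
    # Invented F2 values (Hz) for illustration only, not measured data.
    track = [2300, 2200, 2120, 2060, 2050, 2045, 2042, 2040]
    print(transition_end(track))  # 3
```

Multiplying the returned index by the analysis time step gives the transition duration, which is the quantity compared across /tɑ¹/, /tjɑ¹/, /tsjɑ¹/, and /ʨɑ¹/.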
Vowels
[Figure: traditional vowel quadrilaterals of Lili Wu Chinese. Panels: monophthongs and diphthong in open syllables; monophthongs in closed syllables and nasalized vowels.]
In open syllables, there are nine monophthongs (7a) and one diphthong (7b) in Lili Wu Chinese, as plotted in Figure 7. These nine monophthongs (/i y i̟ ɛ ø u o ɔ ɑ/) constitute a four-way distinction in height (i.e. close, close-mid, open-mid, and open) and a two-way distinction in backness (i.e. front and back). /i y/ contrast in rounding. In addition, there is one diphthong occurring in open syllables, /ɘ̝o/, which glides towards the back. Monophthongs in closed syllables and nasalized vowels are plotted in Figure 8, where four (/ɪ ʏ ə ʊ/) occur in syllables closed by a nasal coda (8a), four (/ɪ a ʊ ʌ/) in syllables closed by a glottal coda (8b), and two (/æ̃ ɑ̃/) are nasalized vowels (8c). Compared to open syllables, closed syllables have considerably fewer vowels and a smaller acoustic vowel space. Generally speaking, vowels in closed syllables or with nasalization are more central and lower than those in open syllables. Following Chen & Gussenhoven (2015), we adopted the same set of symbols (i.e. /ɪ/ and /ʊ/) for monophthongs followed by a nasal coda and those followed by a glottal coda, although their articulations do differ. The plots of the F1–F2 values are based on the accompanying sound files produced by our consultant. The mean formant value of a vowel was calculated by averaging over ten tokens (except for /u/, which was calculated based on five tokens).
Lili Wu presents an interesting case of a fricative vowel, as illustrated in Figure 9, which plots the spectrograms of the minimal pair /i/ in /ti³/ ‘dot’ (9a) and /i̟/ in /ti̟³/ ‘bottom’ (9b). The F2 of /i/ (2399 Hz) is higher than the F2 of /i̟/ (2009 Hz). Perceptually, a striking difference between /i/ and /i̟/ is the frication present in /i̟/. Figure 10 shows narrow-band spectrograms of /ti³/ (10a) and /ti̟³/ (10b). Harmonics can be clearly identified in /ti³/ but not in /ti̟³/, especially in the frequency bands above 2 kHz. Furthermore, there is a substantial amount of aperiodic energy in the higher frequency region, particularly above 4 kHz, in /ti̟³/, which suggests the presence of strong fricative noise. This observation is further confirmed by the HNR (Harmonics-to-Noise Ratio) results, with /i̟/ in /ti̟³/ (8.1 dB) showing more noise than /i/ in /ti³/ (9.8 dB).
A similar contrast has been reported in Suzhou Wu Chinese (Chao 1928: 38; P. Wang 1987; Hu 2007; Ling 2007, 2011). In order to illustrate the frication, Ling (2011) adopted the symbol /i/ with a subscript z for the phoneme and [ʒ̻̍] (i.e. the syllabic laminal postalveolar voiced fricative) for its phonetic realization. However, this treatment is problematic. First, a subscript /z/ does not conform to the conventions for diacritics in the IPA. Second, articulatory data (i.e. palatographic, linguagraphic, and electromagnetic articulographic studies) of Suzhou Wu Chinese have shown that the constriction of /i̟/ is located at a more anterior position than that of /i/ (Ling 2007, 2011; Hu & Ling 2019). Consequently, the lengthening of the back resonating cavity lowers the F2 of /i̟/, as argued by Ling (2011) following Stevens (1989). Third, we have also noted that the frication in Lili, compared to that in Suzhou Wu Chinese, is not consistently audible in all /i̟/ words produced by our consultant and is also not as strong. For instance, there is little frication in /fi̟¹/ [fᵊɨ̟¹] ‘to fly’ (which tends to be diphthongal). For these three reasons, we have adopted the diacritic /̟/ to highlight the more anterior constriction of /i̟/ and its weaker frication. Such an articulatory gesture is also accompanied by the raising of the lower jaw in words such as /i̟¹/ ‘clothes’, which, however, is not observed in words such as /i¹/ ‘smoke’, as shown in the video recordings. The contrast between the high front vowels /i/ and /i̟/ is an areal feature in many Chinese dialects, especially in the Jianghuai Mandarin family (R. Shi 1998, Zhu 2004b, Zhao 2007). Similar contrasts have also been argued to occur in modern African languages, such as Len Mambila (Connell 2007) and Ring languages (Faytak & Merrill 2015).
Both /u/ (in /u¹/ ‘crow’) and /o/ (in /ko¹/ ‘melon’) are close/close-mid back monophthongs with compressed lip rounding. The lips are more protruded for /o/, whereas for /u/ they are less rounded and more compressed (similar to the /u o/ contrast in Shanghainese as discussed in Chen & Gussenhoven 2015). After bilabials and labio-dentals, /u/ is produced as [v̩] (i.e. the syllabic labiodental voiced fricative), as exemplified in /pu¹/ [pv̩¹] ‘wave’. After alveolar, alveolo-palatal, and velar consonants, /u/ is realized with a diphthongal quality (i.e. [əʊ]), as shown in /ku¹/ [kəʊ¹] ‘song’. According to a Suzhounese syllabary named A Syllabary of the Soochow Dialect, recorded by A Committee of the Soochow Literary Association (1892) to help missionaries acquire Suzhounese, such a diphthongal realization of /u/ after alveolar, alveolo-palatal, and velar consonants can be traced back to the end of the 19th century.
The front vowel /ø/ tends to be produced with a lower F2 such as in /ø¹/ [ʔø̈¹] ‘in safe’ (1228 Hz) than in /jø²/ [ɥø²] ‘rounded’ (1425 Hz). Both, however, are produced with a lip rounding gesture, as shown in the video recordings.
/ɘ̝o/ is a diphthong and only co-occurs with the glide /j/ (e.g. /vjɘ̝o²/ ‘to float’ and /kjɘ̝o¹/ ‘to tick off’) or alveolo-palatals (e.g., /dʑɘ̝o⁶/ ‘used’ and /ɕɘ̝o¹/ ‘to rest’).
Vowels preceding a glottal stop coda show a much shorter duration. When high vowels (i.e. /ɪ/ and /ʊ/) occur before /ʔ/, a general displacement towards an open back position often results in a brief schwa after nuclei, such as /ʨɪʔ⁷/ [ʨɪᵊʔ⁷] ‘hurry’ and /kʊʔ⁷/ [kʊᵊʔ⁷] ‘surname, Guo’.
/æ̃/ and /ɑ̃/ are two nasalized vowels, as illustrated in /tsʰæ̃⁶/ ‘unimpeded’ and /tsʰɑ̃⁶/ ‘to sing’. Both vowels are consistently nasalized without recognizable velum closure in Lili Wu Chinese, different from the case of Shanghainese where a brief velar nasal coda has been reported (Chen & Gussenhoven 2015).
Syllabic approximants
There are two syllabic approximants in Lili Wu Chinese, which are exemplified in /sɹ̩¹/ [sɹ̪̍¹] ‘silk’ and /sɹ̹̍¹/ [sʷɹ̻̹̍¹] ‘book’. The syllabic approximant /ɹ̩/ [ɹ̪̍] in Lili Wu Chinese is similar to that in Standard Chinese.
With respect to /ɹ̹̍/, two features are to be further noted. First, the lip rounding gesture of the approximant contributes to the labialization of the preceding alveolar sibilant onset (i.e. /s/ [sʷ] before /ɹ̹̍/). Labialized alveolar sibilants are rare in the world’s languages (but see Lao, a Tai-Kadai language reported in Erickson 2001). The rounding feature is believed to have evolved from /u/ or /y/, the two rounded vowels reported to be present instead of /ɹ̹̍/ in other Wu dialects, such as /su¹/ in Danyang Wu and /ɕy¹/ in Songjiang Wu for ‘book’ (Qian 1992: 88). Second, /ɹ̹̍/ is articulated more laminally. Laminal consonants have been widely reported to exist in Australian languages (Butcher 1990, Anderson & Maddieson 1994). Such an articulatory gesture of /ɹ̹̍/ is reflected in Figure 11 as a lowered F4 (3375 Hz, compared to 4221 Hz for /ɹ̩/ in /sɹ̩¹/) and the proximity of F3 and F4. F4 lowering is generally said to be related to articulatory retraction (e.g. Fant 1960: 121; Stevens & Blumstein 1975; Vaissière 2011). The proximity of F3 and F4 is known to be a consequence of weakly coupled resonators resulting from a relatively larger front cavity (Stevens 1989). For instance, a significant convergence of F3 and F4 is observed in laminal alveolar and postalveolar fricatives in English, as well as in apico-laminal alveolars in French (Dart 1991: 104). In short, compared with its counterpart /ɹ̩/, /ɹ̹̍/ is produced with a more laminal articulation combined with a lip rounding gesture. Such differences were also noticed by our consultant, who volunteered his native-speaker intuitions. Given the impressionistic nature of the description, more instrumental studies (e.g. ultrasound) are needed for a precise description of their articulation and acoustic consequences.
It is worth noting that there exist different proposals to transcribe these sounds. For example, among sinologists (after Karlgren 1915: 294), /ɹ̩/ and /ɹ̹̍/ have often been transcribed as /ɿ/ and /ʮ/, respectively, and are known as ‘apical vowels’. /ɹ̩/ is sometimes treated as [z̩] (e.g. Ladefoged & Maddieson 1996: 314; Wiese 1997: 239; Duanmu 2000: 36 for Standard Chinese; Chen & Gussenhoven 2015 for Shanghainese). Such a treatment, however, has been questioned with ultrasound imaging data (Lee-Kim 2014, Faytak & Lin 2015) and acoustic analyses (Howie 1976: 10). Lee-Kim (2014) further argues that it is more appropriate to describe [z̩] as ‘dental approximant [ɹ̪̍]’. A similar treatment can also be found in Lee & Zee (2003).
Last but not least, an increasing body of literature has shown that such syllabic approximants have played a role in diachronic changes of high vowels at different points in time across a large number of Sino-Tibetan languages (e.g. Baron 1974, R. Shi 1998, Zhu 2004b, Zhao 2007, Hu & Ling 2019).
Syllable structure
Generally speaking, eight syllable combinations can be identified in Lili Wu Chinese. The canonical syllable minimally consists of an obligatory nucleus (V) and a lexical tone, as in /u¹/ ‘crow’ and /ø²/ [ɦø̈²] ‘cold’. The nucleus can be either a vowel or a syllabic consonant (i.e. /ɹ̩ ɹ̹̍ m̩ ŋ̍/). The syllable may also contain up to three optional elements in the following linear structure: (C1)(G)V(C2), where C1 can be any consonant in the consonant inventory except /ʔ/; G is either /j/, as in /kjɘ̝o¹/ ‘to tick off’, or /w/, as in /kwɛ¹/ ‘to close’; and C2 is either /ŋ/ or /ʔ/, as in /kʊŋ¹/ ‘public’ and /kaʔ⁷/ ‘to clip’. Parentheses indicate optional constituents. All combinations are demonstrated in Table 3.
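The count of eight follows directly from the template: each of the three optional slots is either present or absent, giving 2 × 2 × 2 combinations. The Python sketch below enumerates them and checks a pre-segmented syllable against the slot restrictions stated above. It is an illustration only; the function names are hypothetical, C1 is not checked against the full consonant inventory, and the onset–rhyme co-occurrence constraints are not modelled.

```python
from itertools import product

GLIDES = {"j", "w"}   # permitted fillers of the G slot
CODAS = {"ŋ", "ʔ"}    # permitted fillers of the C2 slot


def syllable_shapes():
    """Enumerate every shape generated by the template (C1)(G)V(C2)."""
    return [c1 + g + "V" + c2
            for c1, g, c2 in product(("", "C1"), ("", "G"), ("", "C2"))]


def is_well_formed(c1, glide, nucleus, c2):
    """Check a pre-segmented syllable against the template restrictions.

    Empty strings stand for absent optional constituents.
    """
    if not nucleus:
        return False              # the nucleus is obligatory
    if c1 == "ʔ":
        return False              # /ʔ/ is excluded from the onset
    if glide and glide not in GLIDES:
        return False
    if c2 and c2 not in CODAS:
        return False
    return True


if __name__ == "__main__":
    shapes = syllable_shapes()
    print(len(shapes), shapes)
```

For example, /kwɛ¹/ corresponds to is_well_formed("k", "w", "ɛ", ""), and /kʊŋ¹/ to is_well_formed("k", "", "ʊ", "ŋ").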
As illustrated in Table 4, co-occurrence constraints on onset and rhyme combinations can be observed. First, /i ɪŋ ɪʔ i̟/ behave similarly except that /i̟/ can appear after labio-dentals as in /fi̟¹/ ‘to fly’ and /vi̟²/ ‘fat’. /i/, on the other hand, is prohibited in this context (i.e. */fi/, */vi/). Second, /y ʏŋ/ are only allowed after alveolar sonorants and alveolo-palatals, or without an onset. Third, before /ø o ɔ ɑ æ̃/, labio-dentals are prohibited (*/fø fo fɔ fɑ fæ̃/) but /ɛ u/ are possible as in /fɛ¹/ ‘to turn over’ and /vu²/ ‘to support somebody with one’s arm’. Fourth, the two syllabic approximants /ɹ̩ ɹ̹̍/ occur only after alveolar homorganic sibilant onsets /ts tsʰ ʣ s z/. /j w/ can serve as an onset as in /jø²/ ‘rounded’ and /wɔ⁴/ ‘broken’.
The distribution of the two glides is summarized in Table 5. /j/ is allowed in the majority of cases (e.g. /pjɑ¹/ ‘watch’, /vjɘ̝o²/ ‘to float’, /tjɑ¹/ ‘marten’, /kjɘ̝o¹/ ‘to tick off’, /hjɘ̝o³/ ‘to roar’ and /jø²/ ‘rounded’) except after alveolo-palatals. /w/, however, is more constrained and only allowed after velars (e.g. /kwɛ¹/ ‘to close’), glottal fricative /h/ (e.g. /hwɛ¹/ ‘dust’), or serves as a glide onset (e.g. /wɛ²/ ‘to be back’).
Onsetless syllables
In onsetless syllables with high-register tones (i.e. T1, T3, T5, and T7), the phonetic segment [ʔ] can be observed at the onset of the tone-bearing syllable, as in /ø¹/ [ʔø̈¹] ‘in safe’ and /si̟¹ ø¹/ [si̟⁴⁴ ʔø̈⁴²] ‘a city, Xi’an’. With respect to onsetless syllables with low-register tones (i.e. T2, T4, T6, and T8), we observe a phonetic [ɦ] before a non-high vowel (e.g. /ø²/ [ɦø̈²] ‘cold’, /ɔ²/ [ɦɔ²] ‘shoes’, and /aʔ⁸/ [ɦaʔ⁸] ‘box’), but not before a high vowel or glide (e.g. /i²/ [ʝi²] ‘salt’, /jø²/ [ɥø²] ‘rounded’, /u²/ ‘river’, and /wʌʔ⁸/ ‘alive’). [ɦ] disappears in non-initial position within a prosodic word, e.g. /tʰɑ⁴ ɔ²/ ‘galoshes’. The general pattern is therefore similar to Shanghainese (Chen & Gussenhoven 2015).
In Lili Wu Chinese, syllables with low-register tones show relatively stronger breathiness than their high-register counterparts. Figure 12 plots the Fast Fourier Transform (FFT) spectra of /ø¹/ [ʔø̈¹] ‘in safe’ (dark) and /ø²/ [ɦø̈²] ‘cold’ (light), taken within an interval of approximately 30 ms from the first regular vocal pulse of the vowel, which show the phonation contrast in the vowel /ø/. As shown by the measurements of H1 – H2 (i.e. the amplitude difference between the first and second harmonics), there is a phonatory difference between the two vowels, with /ø²/ (4.5 dB) showing more breathiness than /ø¹/ (2 dB). This contrast has also been observed in other Northern Wu dialects (Cao & Maddieson 1992).
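The H1 – H2 measure can be illustrated with a short Python sketch that reads the amplitudes of the first two harmonics off a single-frequency Fourier projection. This is a simplified illustration under stated assumptions rather than the analysis actually performed: f0 is taken as known, no window function or formant correction is applied, and the function names and the synthetic test signal are hypothetical.

```python
import math


def harmonic_amplitude_db(samples, sample_rate, freq_hz):
    """Amplitude (dB) of one frequency component, via a direct Fourier projection."""
    n = len(samples)
    re = sum(s * math.cos(2 * math.pi * freq_hz * i / sample_rate)
             for i, s in enumerate(samples))
    im = sum(s * math.sin(2 * math.pi * freq_hz * i / sample_rate)
             for i, s in enumerate(samples))
    magnitude = 2 * math.hypot(re, im) / n
    return 20 * math.log10(magnitude)


def h1_minus_h2(samples, sample_rate, f0_hz):
    """Amplitude difference (dB) between the first and second harmonics."""
    return (harmonic_amplitude_db(samples, sample_rate, f0_hz)
            - harmonic_amplitude_db(samples, sample_rate, 2 * f0_hz))


if __name__ == "__main__":
    # Synthetic 30 ms signal (not real speech): f0 = 100 Hz, with the second
    # harmonic at half the amplitude of the first.
    rate, f0 = 8000, 100
    signal = [math.sin(2 * math.pi * f0 * i / rate)
              + 0.5 * math.sin(2 * math.pi * 2 * f0 * i / rate)
              for i in range(240)]
    print(round(h1_minus_h2(signal, rate, f0), 2))  # 6.02
```

A larger positive value indicates a relatively stronger first harmonic, which is the direction reported for the breathier /ø²/.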
Tone sandhi: A preliminary overview
Lexical tones over monosyllabic morphemes undergo changes when they are combined into compounds or phrases. In this description, we offer some preliminary observations concerning tone sandhi variations in Lili Wu Chinese over disyllabic compounds (hereafter called the tone unit). Tonal realization is mainly contingent upon the lexical tone of the second syllable (σ2). Broadly, two patterns are observed.
First, when σ2 carries an abrupt tone (i.e. T7 and T8 over a glottal-coda syllable), regardless of the syllable structure of the first syllable (σ1), only level f0 contours surface, and the specific f0 height depends on the lexical tone of σ1: a low tone appears after a high tone, and a high tone appears after a low tone. Both patterns are illustrated in Figure 13, which shows T1 (high–level) + T7/T8 (13a /tsʰəŋ¹ tsɪʔ⁷/ ‘the Spring Festival’, 13b /tɕɪŋ¹ dʑɪʔ⁸/ ‘Peking Opera’) and T6 (low–level) + T7/T8 (13c /tʰɔ⁶ kwɅʔ⁷/ ‘Thailand’, 13d /ʨʰi̟⁶ dɪʔ⁸/ ‘steam whistle’).
Second, when σ2 carries a non-abrupt tone (i.e. T1 to T6 over an open syllable or a syllable with a nasal coda), the lexical tonal contour of σ1 remains and affects the pitch realization of σ2. The specific f0 contour of σ2 hinges upon the lexical tonal register of σ1. When σ1 is produced with a high-register tone (i.e. T1, T3, T5, and T7), σ2 is typically realized with a falling f0 contour, as shown in Figure 14 (e.g. 14a /sɪŋ¹ zəŋ⁴/ ‘new kidney’, 14b /kɛ³ zɑ⁴/ ‘to remold’, 14c /tɕɔ⁵ zɑ⁴/ ‘introduction’, and 14d /kʊʔ⁷ tʰu⁴/ ‘territory’). However, other patterns have also been observed. For example, in the combination of T7 + σ2, when σ2 bears T1 (e.g. /kʊʔ⁷ kʰu¹/ ‘orthopaedics’), the underlying form of T1 in /kʰu¹/ (high–level) is preserved, instead of a predictable falling contour like /tʰu⁴/ in /kʊʔ⁷ tʰu⁴/ ‘territory’.
When σ1 is pronounced with a low-register tone (i.e. T2, T4, T6, and T8), the sandhi pattern tends to be more complicated. The tonal contour of σ2 also seems to exert influence on the overall tonal realization. For example, Figure 15 shows the contrast of /pʰɔ⁶ tɕʰi⁴/ ‘to dispatch’ (15a) vs. /ʨʰi̟⁶ pʰɑ⁶/ ‘bubble’ (15b). Here, T4 in /tɕʰi⁴/ completely loses its underlying form (low–level) and is realized with a high-falling contour, similar to Shanghainese (Chen & Gussenhoven 2015). However, the lexical tone of the preceding tone T6 in /pʰɑ⁶/ (low–dipping) is only preserved to a certain extent. The same tone (i.e. T6) is realized with an audible pitch level difference: T6 in /pʰɔ⁶ tɕʰi⁴/ is overall lower than that in /ʨʰi̟⁶ pʰɑ⁶/.
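The two general patterns, as far as they are stated in this overview, can be summarized in a small lookup. This is a deliberately incomplete sketch and not a full account of Lili Wu sandhi: the function name and return labels are hypothetical, "after a high tone, a low tone appears" is read here as a statement about the f0 height of σ2, "high" and "low" are read as tonal registers, and combinations for which the text gives no general prediction return None.

```python
HIGH_REGISTER = {1, 3, 5, 7}  # T1, T3, T5, T7
LOW_REGISTER = {2, 4, 6, 8}   # T2, T4, T6, T8
ABRUPT = {7, 8}               # tones over glottal-coda syllables


def sigma2_contour(tone1, tone2):
    """Return a label for the surface contour of σ2, or None if unspecified."""
    if tone1 not in HIGH_REGISTER | LOW_REGISTER:
        raise ValueError("tone1 must be an integer from 1 to 8")
    if tone2 not in HIGH_REGISTER | LOW_REGISTER:
        raise ValueError("tone2 must be an integer from 1 to 8")

    if tone2 in ABRUPT:
        # Pattern 1: level contours only; height depends on σ1.
        return "low level" if tone1 in HIGH_REGISTER else "high level"

    if tone1 in HIGH_REGISTER:
        # Pattern 2 with a high-register σ1: σ2 typically falls,
        # except that T1 keeps its high-level form after T7.
        if tone1 == 7 and tone2 == 1:
            return "high level (T1 preserved)"
        return "falling"

    # Low-register σ1 before a non-abrupt σ2: described as more complicated,
    # so this sketch makes no prediction.
    return None


if __name__ == "__main__":
    print(sigma2_contour(1, 7))  # as in Figure 13a
    print(sigma2_contour(3, 4))  # as in Figure 14b
    print(sigma2_contour(6, 4))  # as in Figure 15a: no general rule given
```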
In addition, it is worth noting that syllables with aspirated onsets show two different patterns of changes. They pattern either with syllables that have unaspirated onsets and carry T1, or with syllables that have voiced onsets and carry T4, T6, or T8. For example, the sandhi change of /tsʰɪŋ¹ zɹ̩⁶/ ‘in person’ patterns with that of /sɪŋ¹ zɑ̃⁶/ ‘heart’; while /tsʰɅ⁸ djɘ̝o²/ ‘to stand out’ patterns with /zɅ⁸ djɘ̝o²/ ‘tongue’.
It is important to note here that even within the arguably simplest construction beyond a monosyllabic morpheme (i.e. disyllabic compounds), Lili Wu Chinese already exhibits different patterns of tonal realization from its neighboring Northern Wu dialects such as Shanghainese (Chen & Gussenhoven 2015). Tonal realization is not only subject to the influence of the preceding tone, but also seems sensitive to the tonal properties of the second syllable. In this illustration, we have presented only a preliminary glimpse into the pitch contours of disyllabic compounds in Lili Wu Chinese. Needless to say, more data and further research are needed.
Transcription of recorded passage ‘North Wind and the Sun’
The passage is transcribed phonemically, using the symbols presented in the vowel and consonant charts. Tones have been transcribed phonemically before ⁻ (with pitch levels for the eight tonal categories provided in the section on lexical tones). Listeners will find significant deviation in the actual pitch contours of these lexical tones due to contextual tonal variation (see Chen 2012 for a comprehensive review of tonal variation). Given the salience of tone sandhi in Lili Wu Chinese (and in Wu dialects generally), we have also provided a transcription of tonal contours based on perceptual impressions of the pitch levels according to Chao’s system (after ⁻). For personal pronouns and modal particles, only actual pitch contours have been transcribed. The boundaries between syllables are indicated by spaces. The boundaries of tone units are marked by parentheses, with | marking the end of major phrases and || that of utterances. Our consultant tends to produce more creakiness in running speech. For example, creaky voice can be identified at the end of /kʰø²¹³/ ‘to look’ in ‘make a comparison (a bit, as an attempt)’ with a sudden rise of f0. It is worth noting that segmental reduction usually occurs in running speech. For example, /Ʌ/ is sometimes reduced to [ə]. Square brackets [] mark the (allophonic) cases that auditorily deviate saliently from the phonemic transcription.
Phonemic transcription
Orthographic transcription
Acknowledgements
We would like to thank our principal consultant Mr Liangquan Cheng for making this possible. We are also grateful to Mr Haimin Li for his coordination and arrangement of the three fieldwork sessions conducted in Lili. In addition, we would like to thank Maarten Mous and Ruiqing Shen for valuable comments on earlier versions of our paper; Zhongmin Chen, Hang Cheng, Maarten Kossmann, Feng Ling, Zhongwei Shen, Yimin Sheng, Rujie Shi, Huan Tao, Ping Wang, Xinyi Wen, and Dan Yuan for sharing their thoughts with us on various linguistic aspects of the language; Feifan Wang for collecting pilot recordings; and He Huang, Huaqiang Song, and Lei Wang for sharing references. Moreover, we gratefully acknowledge the anonymous reviewers of this journal for helpful comments and suggestions. The proofreading assistance from Seamus Leith and the audio editing guidance from André Radtke are greatly appreciated. This work is supported by the China Scholarship Council (CSC) and Leiden University Centre for Linguistics (LUCL) scholarship to the first author as well as the KNAW–China Exchange grant (13CDP012) from the Netherlands Royal Academy of Sciences to the second author. Neither the individuals and institutions cited herein nor the funding agencies, however, should be held responsible for the views expressed in this paper.
Supplementary material
To view supplementary material for this article, please visit https://doi.org/10.1017/S0025100320000092.