Zhushan Mandarin is a dialect of Mandarin Chinese (ISO 639-3 code: cmn) spoken in Zhushan County, which belongs to the city of Shiyan in Hubei Province, the People’s Republic of China. As shown in Figure 1, the county borders the city of Chongqing to the south and Shaanxi Province to the north. It has an area of 3,586 km² and a population of about 4.7 million residents (Hubei Province Annals Committee 2017). The general consensus is that it is a Mandarin dialect (LAC 2012). However, its proper classification is debated: some researchers assign it to the Jianghuai Mandarin group (e.g. Coblin 2005, X. B. Liu 2007) and others to the Southwestern Mandarin group (e.g. Ting 1996, X. C. Liu 2005, L. Li 2009).
The debated status of Zhushan Mandarin is believed to result not only from the influence of dialects from surrounding regions but also from the “combined effects of inheritance and convergence” in its linguistic system (Coblin 2005: 111). Many speakers of Zhushan Mandarin are descendants of migrants from different regions where not only different Mandarin dialects (e.g. Jianghuai Mandarin, Southwestern Mandarin, and Zhongyuan Mandarin) but also other Sinitic varieties (e.g. Gan dialects) are spoken (Yakhontov 1986: 134; Zhushan County Annals Committee 2002; Coblin 2005; Guo 2012). Despite the different proposals on the dialectal classification of Zhushan Mandarin, researchers agree that it possesses a mixture of phonetic/phonological features characteristic of Jianghuai and Southwestern Mandarin at both the segmental and tonal levels.
Within the county, impressionistic descriptions also suggest different subdialects. In addition to the main variety spoken in the county of Zhushan (hereafter the ‘Urban Zhushan Mandarin’), there are three locally well-known varieties, according to the Zhushan County Annals Committee (2002). One is spoken in southern Zhushan (e.g. in Liulin Township, Hongping Township, and Liangjia Township) and seems to resemble the Southwestern Mandarin dialects spoken in the eastern part of Sichuan Province. Another is spoken in western Zhushan (i.e. the area to the west of Leigu Township) and is closer to the varieties spoken in Shaanxi Province. A third one, spoken in northern Zhushan (e.g. in Canglang Township, Loutai Township, and Shenhe Township), differs from the other two varieties in its rich r-suffixation. These sub-regional varieties provide further evidence that Zhushan Mandarin is spoken in a transitional zone where several dialects are present in the surrounding areas. The different views on the classification of Zhushan Mandarin in the literature may be due, in part, to the different regional varieties which field researchers have investigated. Recent descriptive studies on Zhushan Mandarin include Ding (2017) (on the Baofeng variety of Zhushan Mandarin), Zhu (2015), and Sheng (2016), with the latter two on unspecified varieties. To date, however, there is no systematic and comprehensive documentation of the sound system of Zhushan Mandarin.
The current description of the sound system of Zhushan Mandarin aims to fill this gap, with data based on the speech of two male speakers recorded by the second author. Both speakers produced 3900 monosyllabic morphemes; one also produced 1160 bisyllabic (lexical) collocations. The target stimuli were chosen on the basis of word lists compiled for dialect surveys by the Linguistic Committee of Chinese Academy of Social Sciences (2004). One speaker was interviewed and recorded in 2007 (at the age of 37) and the other in 2016 (at the age of 36). Both speakers are from the town of Shangyong; the variety spoken there is known as the Urban Zhushan Mandarin. The recordings that accompany this text were produced by the second speaker, who grew up in Nanba Village of Shangyong Town. We chose his productions because of the superior recording quality (i.e. less background noise) of his speech compared to that of the first speaker. This speaker uses Zhushan Mandarin daily but also communicates fluently in Standard Chinese. Prior to the year when the recording was done, he had lived his entire life in Zhushan.
We will show that Zhushan Mandarin differs significantly from the best known Mandarin variety (i.e. Standard Chinese) not only in its lexical tonal system and contextual tonal variation but also in several segmental properties. This description thus provides further evidence for the diversity of Mandarin dialects, a family of dialects consisting of no fewer than seven distinct groups with salient phonological, lexical, and grammatical differences (Yuan 1960, R. Li 1985, LAC 2012). In particular, we wish to draw readers’ attention to the phonetic and phonological characteristics of important features of Zhushan Mandarin (often in comparison to Standard Chinese). These features include three aspects of the segmental properties of Zhushan Mandarin: (i) the neutralization of alveolar and post-alveolar obstruent onsets (compared to Standard Chinese) and the phonetic realization of the neutralized obstruents as a function of the following rhyme, (ii) the phonotactics of /n/ and /l/ as syllable onset, and (iii) the distribution of /w ɻ/ and their related diachronic sound changes. We also provide a detailed description of the tonal inventory and contextual tonal variations within bi-syllabic collocations. These segmental and tonal features jointly provide further insights for those who are interested in diachronic sound changes and synchronic typological tendencies of the sound systems of Sinitic languages.
Lexical tone
Zhushan Mandarin has four lexical tones, marked here with the numerical superscripts 1–4 representing the four tonal categories. The f0 contours of the tones uttered in isolation are illustrated in Figure 2A, where the tone-bearing syllables have an obstruent onset (i.e. /tʰi¹/ ‘ladder’, /tʰi²/ ‘to mention’, /tʰi³/ ‘body’, and /tʰi⁴/ ‘to shave’), and in Figure 2B, where the tone-bearing syllables have a sonorant onset (i.e. /mɐ¹/ ‘mother’, /mɐ²/ ‘hemp’, /mɐ³/ ‘horse’, and /mɐ⁴/ ‘to scold’). The onset exerts an effect on the initial portion of the f0 contour. (See e.g. Y. Chen 2011 and references therein for further discussions on f0 perturbation effects of consonants.) Generally speaking, Tone1 and Tone3 both show a dipping f0 contour but are realized in different registers: T1 within the lower register and T3 within the higher one. Thus, although T3 words in Zhushan Mandarin are mostly translation equivalents of T3 words in Standard Chinese, in terms of the actual f0 contours of their lexical tones, T1 in Zhushan Mandarin sounds more similar to T3 in Standard Chinese. Both T2 and T4 are realized with a falling f0 contour, with T2 showing a shallow falling slope and T4 a much steeper fall. The two contours of T2 in this figure seem different (due to segmental perturbation), but they are considered to have the same T2 by our speakers. From Figure 2, it is also clear that the dipping tones (T1 and T3) tend to be longer than T2 and T4.
For ease of comparison, Figure 3 plots overlaid f0 contours for the four lexical tones, where each curve represents the mean f0 contour of one lexical tone, taken at eleven equidistant points of the rhyme and averaged over ten words sampled from the same tonal category. The sample words have either a stop onset or no onset and were produced in isolation by our second informant. To reflect the tone-intrinsic duration differences, we used the average duration of the ten words for each tonal category.
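The averaging procedure described for Figure 3 can be sketched as follows. This is a minimal illustration, not the authors' script: it assumes each token is available as a list of time stamps and a matching list of f0 values covering the rhyme, and it uses linear interpolation to obtain the equidistant points. The function names `resample_contour` and `mean_contour` are hypothetical.

```python
from bisect import bisect_right


def resample_contour(times, f0, n_points=11):
    """Linearly interpolate an f0 track at n_points equidistant times
    between the first and last sample (assumed to span the rhyme).
    `times` must be strictly increasing."""
    if len(times) != len(f0) or len(times) < 2:
        raise ValueError("need two or more (time, f0) samples of equal length")
    start, end = times[0], times[-1]
    out = []
    for k in range(n_points):
        t = start + (end - start) * k / (n_points - 1)
        # Right-hand neighbour of t, clamped so that a valid segment exists.
        j = min(max(bisect_right(times, t), 1), len(times) - 1)
        t0, t1 = times[j - 1], times[j]
        w = (t - t0) / (t1 - t0)
        out.append(f0[j - 1] + w * (f0[j] - f0[j - 1]))
    return out


def mean_contour(tokens, n_points=11):
    """Point-by-point mean over several tokens of one tonal category.
    `tokens` is a list of (times, f0) pairs."""
    resampled = [resample_contour(t, f, n_points) for t, f in tokens]
    return [sum(col) / len(col) for col in zip(*resampled)]
```

Because every token is reduced to the same number of points, tokens of different durations can be averaged directly; the mean duration per tone then has to be computed separately if the plot is to reflect tone-intrinsic length.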
Following the International Phonetic Alphabet (IPA), we may transcribe the tones as T1 (), T2 (), T3 (), and T4 (). We may also adopt the five-level pitch marking system developed by Chao (1930), known as the tone letters, to provide a numerical impression of the f0 contours of the lexical tones: T1 (324), T2 (54), T3 (435), and T4 (51). Here, the numbers represent the pitch level of the tones within a speaker’s pitch range, which is divided into five levels, with 5 indicating the highest level and 1 the lowest. Note that these systems do not necessarily imply any theoretical position on tonal representation. To follow the convention of representing tones in terms of H and L level tones and tonal registers (Snider 1990, Bao 1999, Yip 2002), we see two possibilities. One is to represent T1 as a low-register rising tone (l-LH), T2 as a high tone (H), T3 as a high-register rising tone (h-LH), and T4 as a falling tone (HL). Another possibility is to allow only H and L level tones in the system and represent T1 as a Low tone, T2 as a High tone, T3 as a Rising tone, and T4 as a Falling tone. It is clear that as the phonological system becomes more economical, the mapping between tonal representations and their corresponding surface f0 contours becomes more obscure. More research is needed to adjudicate between the different possibilities.
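As an illustration of the five-level scale, the sketch below maps an f0 value onto one of five bands within a speaker's pitch range and converts a string of Chao numerals to the corresponding IPA tone letters. The log-scaled, equal-width banding is an assumption made for this example; the text does not state how the numerals were derived. The names `chao_level` and `chao_to_tone_letters` are hypothetical.

```python
import math

# Standard correspondence between Chao numerals and IPA tone letters.
TONE_LETTERS = {1: "˩", 2: "˨", 3: "˧", 4: "˦", 5: "˥"}


def chao_level(f0_hz, f0_min, f0_max):
    """Return a level from 1 (lowest) to 5 (highest) for f0_hz, assuming the
    speaker's range [f0_min, f0_max] is split into five equal log-scaled bands."""
    if not 0 < f0_min < f0_max:
        raise ValueError("expected 0 < f0_min < f0_max")
    pos = (math.log(f0_hz) - math.log(f0_min)) / (math.log(f0_max) - math.log(f0_min))
    pos = min(max(pos, 0.0), 1.0)  # clamp values outside the stated range
    return min(int(pos * 5) + 1, 5)


def chao_to_tone_letters(digits):
    """Convert Chao numerals such as '324' to IPA tone letters."""
    return "".join(TONE_LETTERS[int(d)] for d in digits)
```

For example, `chao_to_tone_letters("51")` yields the falling tone-letter sequence for T4, and `chao_level(100, 100, 200)` places the bottom of a 100–200 Hz range at level 1.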
Consonants
Zhushan Mandarin has 25 consonants. Example morphemes illustrating consonants are listed below, where most of the words have Tone1, with a few remaining ones having T2, T3, or T4.
Like all Mandarin dialects, Zhushan Mandarin shows a two-way (i.e. voiceless unaspirated and voiceless aspirated) laryngeal contrast in plosives and affricates (e.g. /tʰəŊ¹/ ‘through’ vs. /tsʰəŊ¹/ ‘scallion’ and /tɛn¹/ ‘lamp’ vs. /tsɛn¹/ ‘to quarrel’). Furthermore, all fricatives in Zhushan Mandarin are voiceless. Unlike Standard Chinese, whose alveolar sounds tend to be dental (Lee & Zee 2003), /s ts tsʰ/ in Zhushan Mandarin are laminal alveolars. They are produced with the tongue blade (rather than the tip) behind the alveolar ridge, with narrow contact/constriction formed mainly through the raising of the tongue sides, as suggested by the palatographic and linguographic records of /ts/ produced in the morpheme /tsɹ̩¹/ ‘to know’ in Figure 4. /s ts tsʰ/ can be followed by the syllabic /ɹ̩/ as shown in /sɹ̩³/ ‘dead’, /tsɹ̩³/ ‘purple’, and /tsʰɹ̩³/ ‘here’.
/tʃ tʃʰ ʃ/ are post-alveolar, occurring only with the retroflex approximant /ɻ/ to form a complex onset as shown in the examples above, or with the syllabic /ɻ̩/ (e.g. /tʃɻ̩¹/ for both ‘pig’ and ‘residence’). The palatal obstruents in Zhushan Mandarin /tɕ tɕʰ ɕ/ co-occur with the high front vowel /i/ (e.g. /tɕi¹/ ‘machine’, /tɕʰi¹/ ‘to paint’, and /ɕi¹/ ‘west’) or the palatal glide /j/ (e.g. /tɕjan³/ ‘to reduce’ and /ɕjo²/ [ɕɥo²] ‘to study’). Thus, /tʃ tʃʰ ʃ/ and /tɕ tɕʰ ɕ/ can be considered context-specific allophonic variants of /ts tsʰ s/, with the latter showing the widest distribution in Zhushan Mandarin. In summary, /ts tsʰ s/ are realized as [tʃ tʃʰ ʃ] before the retroflex approximant /ɻ/ and as [tɕ tɕʰ ɕ] before /i/ and /j/. Given the prominent role the three groups of sounds play in the lexicon of Zhushan Mandarin and our native consultants’ intuition that they are separate and independent sounds, we have listed all three series of sounds as phonemes in the consonant chart. Zhushan Mandarin is known for the neutralization of alveolar and post-alveolar obstruent onsets in certain contexts (compared to Standard Chinese). Our novel finding is that the specific place of articulation of the merged onsets varies as a function of the following rhyme, which we will discuss further in the section on areal features below.
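The allophonic statement summarized in this paragraph can be written as a small rule. The sketch below is only an illustration of that summary, with hypothetical names (`realize_sibilant`, the two lookup tables); it treats the syllabic /ɻ̩/ like the non-syllabic /ɻ/ and deliberately ignores the gradient, rhyme-conditioned variation reported in the section on areal features.

```python
# Context-specific variants of /ts tsʰ s/ as summarized in the text.
POST_ALVEOLAR = {"ts": "tʃ", "tsʰ": "tʃʰ", "s": "ʃ"}
PALATAL = {"ts": "tɕ", "tsʰ": "tɕʰ", "s": "ɕ"}


def realize_sibilant(onset, following):
    """Return the surface variant of an alveolar sibilant onset given the
    segment that follows it; other onsets are returned unchanged."""
    if onset not in POST_ALVEOLAR:
        return onset
    if following.startswith("ɻ"):  # covers both /ɻ/ and syllabic /ɻ̩/
        return POST_ALVEOLAR[onset]
    if following in ("i", "j"):
        return PALATAL[onset]
    return onset  # elsewhere: plain laminal alveolar
```

For instance, `realize_sibilant("s", "i")` gives the palatal fricative, while `realize_sibilant("tsʰ", "a")` leaves the alveolar affricate unchanged.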
There are three nasals in Zhushan Mandarin (/m n Ŋ/). /m/ only occurs in the onset position. /Ŋ/ can be both an onset and a coda (e.g. /tɑŊ³/ ‘party’). /n/ can also appear in both syllabic positions. As an onset, it is typically before a high front vowel or a palatal glide. Note that /n/ in such contexts also tends to be relatively retracted (e.g. /ni³/ [n̠i³] ‘you’ and /njɛ¹/ [n̠jɛ¹] ‘to knead with fingers’), which may be perceived as palatalized. In our corpus, we have also observed that an alveolar nasal coda may assimilate to the place of articulation of the following onset within a compound as illustrated in the following triplet: /san¹ pjɛn¹/ [sam¹ pjɛn¹] ‘mountainside’, /san¹ tɪn³/ [san¹ tɪn³] ‘mountain top’, and /san¹ xəʊ⁴/ [saŊ¹ xəʊ⁴] ‘mountain back’. Future research is needed to establish how general the assimilation pattern is.
Zhushan Mandarin has five approximants: /w j ɹ ɻ l/. /l/ serves as an onset (/la²/ ‘to take’). Complementary to /n/, it is prohibited before a front vowel or a palatal glide. Further details on the /n/ and /l/ alternation as syllable onset are provided in the section on areal features. /w/ and /j/ both can serve as a simplex onset or as a glide in a complex onset (e.g. /kwɐ¹/ ‘melon’, /xwei¹/ ‘grey’, /pʰjaʊ⁴/ ‘ticket’, and /tjaʊ⁴/ ‘to hang’). /j/ occurs after several onsets with different places of articulation, including bilabial, alveolar, and palatal (/ɕjɛn¹/ ‘fresh’). /w/, however, does not occur after bilabial, labiodental, or palatal onsets and is also prohibited after alveolars, which is discussed further in the section on areal features. In words such as /tɕjo¹/ [tɕʷɥo¹] ‘foot’ and /jo¹/ [ɥo¹] ‘medicine’, we perceive cues which suggest the presence of a labial-palatal approximant. [ɥ] is treated here as an allophone of the unrounded palatal /j/ which undergoes labialization due to the following rounded vowel.
/ɹ/ can serve both as onset and as syllabic rhyme after an alveolar homorganic onset (e.g. /sɹ̩³/ for both ‘dead’ and ‘history’, /tsɹ̩³/ for both ‘purple’ and ‘paper’, /tsʰɹ̩³/ for both ‘this’ and ‘tooth’). /ɹ/ seems to form a relatively narrow constriction with the (post-)alveolar region, giving rise to more friction. Among Sinologists, this approximant onset is commonly described as the voiced fricative /ʑ/ given its weak frication (e.g. Ding 2017). We opted to treat it as an approximant given two observations. First, there is no other voiced fricative in the language. Second, despite the weak frication, the percept of this sound for us is closer to a sonorant than a strident fricative. Further studies are certainly needed to verify the observation and to understand the articulatory and acoustic features of this sound fully.
/ɻ/ is a retroflex approximant and differs from /ɹ/ not only in terms of its place of articulation but also in its distinct lip position, which can involve either protrusion or compression depending on its syllabic role, giving rise to the percept of different types of rounding. The supplementary video recordings illustrate the articulation of these two sounds in /ɹɛn²/ ‘person’ and /ɻɛn²/ ‘cloud’ (Video01). Note that the adoption of the retroflex symbol /ɻ/ does not imply that in Zhushan Mandarin, its articulation involves extreme displacement and curling of the tongue tip as is typical for retroflex sounds spoken in South Asia (Ladefoged & Maddieson 1996; Namboodiripad & Garellek 2017 for Malayalam). Rather, the retroflexion is likely due to a retracted tongue back and raised tongue blade, more similar to what has been reported in ultrasound data for Standard Chinese (Lee-Kim 2014). Future research, preferably with ultrasound imaging, would be necessary to clarify further the articulation of this sound and the extent of its acoustic and articulatory uniformity with the preceding onset.
/ɻ/ can be syllabic in an onsetless syllable (/ɻ̩²/ for both ‘if’ and ‘fish’) or following a post-alveolar onset (/tʃɻ̩¹/ for both ‘pig’ and ‘residence’, and /ʃɻ̩¹/ for both ‘book’ and ‘humble’). When serving as a syllabic approximant, /ɻ/ is realized with a rounding feature better characterized as due to lip compression. /ɻ/ also co-occurs with post-alveolar obstruents as complex onsets. It may be realized very briefly, giving the impression that the complex onsets are merely retroflex sounds with protruded lips. We have, therefore, analyzed /ɻ/ as secondary articulation in /tʃʰɻo⁴/ [tʂʰʷo⁴] ‘mistake’, /tʃɻo⁴/ [tʂʷo⁴] ‘to sit’, /tʃɻan¹/ [tʂʷan¹] for both ‘brick’ and ‘to donate’ and /ʃɻɐ¹/ [ʂʷɐ¹] ‘to brush’. The retroflex feature and lip rounding are further illustrated in /tʃɻɐ¹/ [tʂʷɐ¹] ‘to catch’ (in comparison to /tsa¹/ [tsɐ¹] ‘residue’).
/ɻ/ can also serve as an onset by itself (e.g. /ɻe²/ ‘moon’ and /ɻɛn²/ ‘cloud’). As a simplex onset, /ɻ/ is realized with a rounding feature better characterized as due to lip protrusion. Thus, we have posited the same retroflex approximant /ɻ/ for two positional variants with different lip rounding features, given that they are not minimally contrastive. This treatment is similar to the way different lip positions for vowels have been discussed in Ladefoged & Maddieson (1996). The rounded onset /ɻ/ sometimes even induces labialization of the following alveolar nasal coda as observed in /ɻan²/ [ɻam²] ‘round’, /ɻan³/ [ɻam³] ‘soft’, and /ɻan³/ [ɻam³] ‘far’. Note that when our informant was video-recorded while producing five renditions of /ɻan²/ ‘round’ (Video02) with a self-administered pause in between, we observed labialization of /n/ as [m] only in the first and the fourth renditions ([ɻam²]). We consider the bilabial [m] to be a free variant of the alveolar nasal coda /n/ due to the spreading of the labial feature from /ɻ/ for ease of articulation. The distribution of the approximants /w ɻ/ and their related diachronic sound changes will be discussed below as another important areal feature of Zhushan Mandarin.
Vowels
Zhushan Mandarin has nine monophthongs in open syllables: three (/y u o/) are rounded and four (i.e. /I a ə ɑ/) only appear in closed syllables with a nasal coda. Note that the vowel /ɜ/ tends to be rhotacized as in /kɜ¹/ [kɜ˞¹] ‘song’.
In addition, there are four diphthongs, /ei ae ɑo əʊ/.
/u/ is prohibited after alveolar onsets as evident in */lu du tsu/ but appears after bilabials (e.g. /pu⁴/ ‘cloth’) and velars (e.g. /kʰu³/ ‘bitter’ and /xu¹/ ‘to breathe’). The absence of alveolar onsets before the vowel /u/ is due to a diachronic change, where the monophthong /u/ changed to the diphthong /əʊ/ after alveolar onsets as in /tsʰəʊ¹/ ‘unrefined’, /səʊ⁴/ ‘vegetarian’, /tsəʊ¹/ ‘to rent’, /ləʊ¹/ ‘deer’, and /tʰəʊ⁴/ ‘rabbit’. This change has led to homophones such as /tsəʊ³/ for both ‘ancestor’ and ‘to walk’. In Standard Chinese, these pairs of words have different vowels (i.e. /tsu³/ ‘ancestor’ vs. /tsəʊ³/ ‘walk’).
Figures 5–7 show the F1-F2 values (in Hz) of the monophthongs in open syllables (Figure 5) and closed ones (Figure 6) based on the mean formants of ten samples for each vowel, randomly selected from the data corpus. The four diphthongs are plotted in Figure 7; they are also based on the mean formants of ten samples for each vowel.
As mentioned earlier and illustrated in Figure 8, Tone4-bearing syllables in Zhushan Mandarin are intrinsically shorter than the other tone-bearing syllables. The tone-intrinsic syllable duration difference seems to exert an effect on the diphthong quality of the vowel /ae/. Figure 8A shows that the T1-bearing syllable (/tae¹/ ‘dull’) is much longer than the T4-bearing syllable (/tae⁴/ ‘to wear’), despite the fact that they have the same segmental syllable. During the last 50 ms of the syllables, the vowel spectral pattern shows a perceptually salient difference with an F2 of around 2200 Hz in /tae¹/ ‘dull’ and 1930 Hz in /tae⁴/ ‘to wear’. Such a tone-intrinsic vowel spectral difference is further illustrated in /Ŋae¹/ ‘sad’ vs. /Ŋae⁴/ ‘love’ and /tsʰae¹/ ‘to guess’ vs. /tsʰae⁴/ ‘vegetables’. Interestingly, in syllables with the other diphthongs (/ei əʊ ɑo/), despite tone-intrinsic syllable duration differences, only subtle differences in their spectral qualities can be observed, as illustrated in /tɑo¹/ ‘knife’ and /tɑo⁴/ ‘to arrive’ (Figure 8B), /tei¹/ ‘to heap’ vs. /tei⁴/ ‘correct’ (Figure 8C), and /təʊ¹/ ‘capital’ vs. /təʊ⁴/ ‘to fight’ (Figure 8D).
Important areal features
One important areal feature of Zhushan Mandarin is the across-the-board neutralization of the contrast between alveolars (/s ts tsʰ/) and post-alveolars (/ʃ tʃ tʃʰ/), which is distinctive in Standard Chinese. What has not been reported is the coarticulatory assimilation between the place of articulation of the merged onsets and their following rhymes.
When the rhyme is a front vowel or followed by an alveolar nasal coda, we observe laminal alveolar fricative or affricate onsets, as evident in /tsaɛ⁴/ (for both ‘again’ and ‘debt’), /tsʰaɛ²/ (for both ‘talent’ and ‘firewood’), /san³/ (for both ‘scattered’ and ‘flash’), /tsan⁴/ (for both ‘to praise’ and ‘to stand’), and /tsʰɛn²/ (for both ‘storey’ and ‘surname–CHENG’). These homophones in Zhushan Mandarin are all minimal pairs contrasting between alveolar and post-alveolar onsets in Standard Chinese. Further examples include /tse²/ ‘responsibility’ and /tse²/ ‘philosophy’.
When the rhyme contains a vowel further back (/əʊ ɑo/) or ends with a velar nasal coda as in /əŊ ɑŊ/, the place of articulation of the onset moves correspondingly back towards the post-alveolar region and perceptually is better characterized as a post-alveolar sound. Examples for /əʊ/ are /tsəʊ³/ [tʃəʊ³] (for both ‘to walk’ and ‘elbow’) and /tsʰəʊ⁴/ [tʃʰəʊ⁴] (for both ‘to gather’ and ‘smelly’). Examples for /ɑo/ are /sɑo³/ [ʃɑo³] (for both ‘sister-in-law’ and ‘little’), /tsɑo³/ [tʃɑo³] (for both ‘early’ and ‘to find’) and /tsʰɑo³/ [tʃʰɑo³] (for both ‘grass’ and ‘to stir fry’). Examples for /əŊ/ are /tsʰəŊ¹/ [tʃʰəŊ¹] (for both ‘to charge’ and ‘scallion’) and /tsəŊ¹/ [tʃəŊ¹] (for both ‘clan’ and ‘clock’). Examples for /ɑŊ/ are /sɑŊ¹/ (for both ‘mulberry’ and ‘wound’) and /tsɑŊ¹/ [tʃɑŊ¹] (for both ‘dirty’ and ‘surname–ZHANG’).
The assimilatory effect between rhymes and their preceding onsets is further illustrated in two triplets (/tsaɛ⁴/ ‘again’, /tsʰaɛ⁴/ ‘vegetable’, and /saɛ⁴/ ‘race’; /tsəʊ³/ [tʃəʊ³] ‘to walk’, /tsʰəʊ³/ [tʃʰəʊ³] ‘ugly’, and /səʊ³/ [ʃəʊ³] ‘hand’). The effect of the nasal coda is further illustrated in the pairs /tsɛn¹/ ‘to quarrel’ vs. /tsəŊ¹/ [tʃəŊ¹] ‘clock’ and /tsʰɛn²/ ‘city’ vs. /tsʰəŊ²/ [tʃʰəŊ²] ‘insect’. There are, however, both inter- and intra-speaker variations in such onset-rhyme coarticulation. Further research, with a larger number of speakers and more stimuli, is necessary to investigate the distributional patterns and the cognitive representation(s) of such variations.
A second important areal feature is the alternation of /n/ and /l/ as syllable onset. /n/ as an onset is typically observed before a high front vowel /i/ (e.g. /ni²/ for ‘nun’ and ‘pear’, /nIn²/ for ‘peace’ and ‘zero’) or /y/ (e.g. /ny²/ ‘donkey’), or before a palatal glide (e.g. /j/ in /njɛ¹/ ‘to pinch’, /njɑo³/ for both ‘bird’ and ‘to settle’, /njan²/ for both ‘year’ and ‘to unite’, /njɑŊ⁴/ for both ‘to brew’ and ‘bright’, /njəʊ²/ for both ‘cow’ and ‘to flow’). The alveolar /n/ is not as dental as typically observed in Standard Chinese; rather, it is retracted. /l/ is observed elsewhere, as evident in /lae³/ ‘milk’, /lɑo³/ for both ‘old’ and ‘brain’, /lɐ¹/ ‘spicy’, /lan²/ for both ‘difficult’ and ‘orchid’, /lo²/ for both ‘gong’ and ‘to move’, /lɛn²/ ‘capable’, /ləʊ²/ for both ‘building’ and ‘slave’, and /lei⁴/ for both ‘inside’ and ‘tear’. These examples suggest that the neutralization and complementary distribution of /n/ and /l/ is conditioned by the following rhyme. In Standard Chinese, /n/ and /l/ are distinctive phonemes with a large number of minimal pairs in all of the contexts mentioned above.
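The complementary distribution described here amounts to an elsewhere rule, which the following sketch states explicitly. It is an illustration only: the function name `nl_onset` and the trigger set are assumptions assembled from the examples in this paragraph (including the lax vowel written /I/ in /nIn²/), and the rule describes the typical pattern rather than every attested form.

```python
# Segments before which the merged onset surfaces as /n/, per the examples above:
# high front vowels and the palatal glide.
NASAL_TRIGGERS = {"i", "I", "y", "j"}


def nl_onset(first_rhyme_segment):
    """Return 'n' before a high front vowel or palatal glide, 'l' elsewhere."""
    return "n" if first_rhyme_segment in NASAL_TRIGGERS else "l"
```

Under this rule, Standard Chinese minimal pairs that differ only in /n/ versus /l/ collapse to a single onset, which is why the paragraph lists so many homophones.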
The third important areal feature of Zhushan Mandarin is the distribution of the two approximants /w ɻ/ as well as their related diachronic sound changes. /w/ in Zhushan Mandarin is barred after an alveolar onset. Thus, Standard Chinese (SC) words with a complex onset (i.e. alveolar obstruent followed by /w/) all have a simplex onset in Zhushan Mandarin, as evident in /tei⁴/ ‘correct’, /tsʰɛn⁴/ ‘inch’, /lɛn⁴/ ‘to discuss’ (SC: /tweI⁴/, /tsʰwən⁴/, /lwən⁴/, respectively). The loss of /w/ resulted in quite a large number of homophones in Zhushan Mandarin (e.g. /tan⁴/ for both ‘light color’ and ‘to break’, /lan⁴/ for both ‘rotten’ and ‘messy’), compared to Standard Chinese (/tan⁴/ for ‘light (color)’ and /twan⁴/ for ‘to break’; /lan⁴/ for ‘rotten’ and /lwan⁴/ for ‘messy’). Following the common assumption that Chinese dialects evolved from Middle Chinese, the loss of the glide /w/ must have occurred before the context-specific neutralization of the alveolar and post-alveolar onsets, given the presence of minimal pairs such as /tsʰei¹/ ‘to urge’ (SC: /tsʰwei¹/) and /tʃʰɻei¹/ [tʂʰʷei¹] ‘to cook a meal’ (SC: /tʃ̺ʰwei¹/), /tsan⁴/ ‘diamond’ (SC: /tswan⁴/) and /tʃɻan⁴/ [tʂʷan⁴] ‘to turn’ (SC: /tʃ̺wan⁴/). For further details on the evolution of the onglide /w/ across Chinese dialects, see G. Y. Zhang (2006).
Concerning /ɻ/, what makes Zhushan Mandarin interesting is that /ɻ/ has evolved from different sources, as evident in the different sounds (/u/, /y/, /w/ and /ɥ/) present in their transliterated cognates in Standard Chinese (Table 1). This table also shows that morphemes that surface with the same vowel in Standard Chinese (e.g. /ɕy¹/ for ‘modest’ and ‘mustache’) may be realized with different phonemes in Zhushan Mandarin. We conjecture that this was conditioned by their different onsets in the Middle Chinese (MC) sound system, which are provided in the table in italics with an asterisk (to indicate that they are reconstructed). The reconstruction of MC was done in consultation with the website for historical sound system reconstructions provided by the East Asian Languages Data Center at Fudan University (http://ccdc.fudan.edu.cn/linguae/ltcPhonology.jsp) and following Karlgren (1915–1926). In the table, morphemes that surface with /i j/ in modern Zhushan Mandarin had dental sibilant onsets in Middle Chinese (see also Norman 1988). Note that for /ɕIn²/ ‘ten-day period’ and /ɕi¹/ ‘mustache’, younger generations of Zhushan Mandarin speakers (like our second informant) now pronounce them as /ʃɻɛn²/ and /ʃɻ̩¹/, respectively. This is presumably due to an analogical effect of sound change, given that these words belong to the same sound category as /tɕɥɛn¹/ ‘army’ and /ɕy¹/ ‘modest’ in Standard Chinese. What remains for future research is to understand further the various paths of sound change that have led to the current sound system of Zhushan Mandarin.
ᵃ The alveolar nasal coda in /ʃɻɛn²/ ‘ten-day period’ is rounded, likely due to the rounding feature of /ɻ/. The narrow transcription is [ʃɻɛm²].
It is important to note that the syllabic /ɻ̩/ in our analysis is transcribed as /ʯ/ by most Sinologists and is known as one of the four apical vowels introduced by Karlgren (1915–1926). Pullum & Ladusaw (1996: 80) use /ʯ/ to represent ‘a rhotacized non-open central or back rounded “apical” vowel with friction’ and consider it essentially a syllabic fricative with lip rounding /ʐʷ/. We have opted to transcribe the sound as an approximant for two reasons. First, although the duration of the approximant varies depending on its syllabic role, its formant pattern remains rather stable as that of a retroflex approximant. This is illustrated in Figure 9, where we can observe a consistently lowered F3 across the three contexts: an approximant onset (/ɻe/ ‘moon’), a syllabic approximant in an onsetless syllable (/ɻ̩/ ‘fish’), and a syllabic approximant after an affricate onset (/tʃɻ̩/ ‘pig’). Worth further research is the clearly audible frication noise in the approximant, especially over the first half of the rhyme, and the extent to which the preceding onsets affect the articulatory and acoustic properties. Second, treating the sound as a syllabic approximant avoids transcribing it with a phonetic symbol that is not yet recognized in the IPA chart.
Syllable structure
Zhushan Mandarin has the syllable structure (C1)(A)V(C2) with an obligatory tone. Here, ‘A’ stands for ‘approximant’ (e.g. /j/ as in /tjɑo¹/ ‘to fish’, /w/ as in /kwɐ⁴/ ‘to hang’, or /ɻ/ as in /tʃɻei¹/ ‘to chase’). The coda C2 can be either /n/ or /Ŋ/ as in /kɛn¹/ ‘root’ and /kəŊ¹/ ‘public’, respectively.
There are several phonotactic constraints on the onset C1A combination. Before /j/, a range of consonants is possible, as in /pjɑo³/ ‘watch’, /njɑŊ²/ ‘mother’, /tjɛn¹/ ‘to jolt’, /tɕjɑo⁴/ ‘to call’, /njəʊ²/ ‘cow’, /ɕjɛ³/ ‘blood’, and /tɕjɐ³/ ‘fake’, but labiodental, post-alveolar and velar onsets are banned. /w/ in Zhushan Mandarin, however, co-occurs only with velars (e.g. /kwɐ¹/ ‘melon’ and /xwei¹/ ‘grey’). /ɻ/ co-occurs only with a post-alveolar obstruent (e.g. /tʃɻɛn³/ ‘accurate’, /tʃʰɻɛn³/ ‘stupid’, /ʃɻɛn³/ ‘bamboo shoot’, /tʃɻan⁴/ ‘to turn around’, and /tʃʰɻan⁴/ ‘to string together’).
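The C1A constraints just described can be checked mechanically. The sketch below is a coarse, place-based illustration: the `PLACE` table is assembled from the examples in this description and is an assumption rather than a reproduction of the consonant chart, the helper name `onset_cluster_ok` is hypothetical, and finer restrictions (for example, the ban on /l/ before a palatal glide) are not modelled.

```python
# Place of articulation for onsets mentioned in the text (assumed grouping).
PLACE = {
    "p": "bilabial", "pʰ": "bilabial", "m": "bilabial",
    "f": "labiodental",
    "t": "alveolar", "tʰ": "alveolar", "n": "alveolar",
    "tʃ": "post-alveolar", "tʃʰ": "post-alveolar", "ʃ": "post-alveolar",
    "tɕ": "palatal", "tɕʰ": "palatal", "ɕ": "palatal",
    "k": "velar", "kʰ": "velar", "x": "velar",
}

# Places of C1 that each approximant may follow, as stated in the paragraph.
ALLOWED = {
    "j": {"bilabial", "alveolar", "palatal"},
    "w": {"velar"},
    "ɻ": {"post-alveolar"},
}


def onset_cluster_ok(c1, approximant):
    """Return True if the C1 + A combination is licensed by the stated constraints."""
    if c1 not in PLACE or approximant not in ALLOWED:
        raise ValueError("unknown consonant or approximant")
    return PLACE[c1] in ALLOWED[approximant]
```

A check such as `onset_cluster_ok("t", "w")` returns `False`, reflecting the absence of /w/ after alveolars, whereas `onset_cluster_ok("k", "w")` returns `True`.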
While /w j ɻ/ are traditionally described as part of the rhyme among Sinologists, we have posited them as part of a complex onset, following Lin (2007). An alternative treatment is that these approximants serve as the secondary articulation of a simplex onset (e.g. Duanmu 2007). In our view, the latter option may be adopted for /ɻ/ in Zhushan Mandarin, given that its presence can be fleeting and that it is evident mainly in its influence on the articulation of the remaining segments within the syllable. Such an approach then leads to the possibility that even within the same dialect, different approximants may have different syllabic affiliations. To adjudicate what should be the cognitively plausible representation(s) for Zhushan Mandarin approximants, experimental studies on the phonetic realization and phonological representation of these sounds are needed.
Contextual tonal variations within bi-syllabic constituents
As introduced earlier, Zhushan Mandarin has four lexical tones. In connected speech, we observe different levels of segmental and tonal reduction. Segmentally speaking, reduced syllables tend to have more centralized vowels, lenited consonants, less intensity, and/or shorter duration. Suprasegmentally speaking, the pitch contours of these syllables are more influenced by the surrounding lexical tones. Some of these contextual reduction processes may be attributed to the lack of stress which results in a neutral tone, indicated by the superscript 0 hereafter. Typical examples of neutral tone syllables include reduplicated forms (e.g. /sɑo³sɑo³/ ‘sister-in-law’) and grammatical particles (e.g. /tʰɑo²tsɹ̩o³/ ‘peach’). There are also bi-syllabic lexical items in our corpus such as /pan⁴ fɐo³/ ‘method’ (in contrast to /fan⁴ fɐ²/ ‘to commit crime’) and /kʰwae⁴ xo³/ ‘joyful’ (in contrast to /tsəŊ⁴ xo²/ ‘heavy lifting’). For bi-syllabic lexical items, speakers seem to show variation in whether and to what extent the tonal articulation is reduced. This suggests the need for further research to differentiate underlying neutral tone syllables (as in reduplication and particles) from those that derive neutral tone from underlying full lexical tones as well as from those that undergo tonal reduction due to other processes.
In this section, we will restrict our attention to the four lexical tones in Zhushan Mandarin and show their contextual tonal variations within bi-syllabic collocations, which may be words, compounds, or lexicalized phrasal constructions. Details aside, given a bi-syllabic lexical collocation (without neutral tone), Zhushan Mandarin shows a right-dominant tonal variation pattern similar to that of Tianjin Mandarin (e.g. Q. Li & Y. Chen 2016) and Standard Chinese (e.g. Xu 1997). (For further discussions of left- vs. right-dominant tone sandhi systems, see e.g. Yue-Hashimoto 1987, M. Y. Chen 2000, J. Zhang 2007.) In right-dominant tonal systems, the lexical tone of the first syllable exhibits deviations from its f0 realization in isolation that cannot be accounted for simply as contextual tonal coarticulation; such f0 deviations are categorized as tone sandhi changes. The lexical tone of the second syllable, despite co-articulatory influence from the preceding tone, typically resembles that produced in isolation. In contrast, in left-dominant tonal systems, the tonal identity of the initial syllable is usually preserved and exerts a strong influence over the non-initial syllables. Such patterns are typified by some Northern Wu dialects (e.g. Y. Chen 2008 and references therein on Shanghainese), although many Wu dialects show a mixture of both left-dominant and right-dominant tonal variation patterns.
Illustrated in Figures 10 and 11 are contextual tonal variations in Zhushan Mandarin within bi-syllabic lexical collocations. Each curve represents the z-score normalized f0 contour of one tonal combination, based on measurements from ten equidistant points within each of the two tone-carrying syllable rhymes, averaged over a varying number of items with the same tonal combination (as shown in Table 2). The shaded areas represent ±1 standard error of the plotted mean f0. Duration is time-normalized across all tonal combinations. A space indicates the boundary between the syllable rhymes. Figure 10 plots the f0 contours of the first syllable as a function of the tones of the second syllable (Figure 10A: T1 + Tx; Figure 10B: T2 + Tx; Figure 10C: T3 + Tx; Figure 10D: T4 + Tx). This set of figures highlights the anticipatory effect of the second tone on the first tone, where, in right-dominant tonal systems, drastic f0 variations may be observed. Figure 11 plots the f0 contour of the second syllable as a function of the tone of the first syllable (Figure 11A: Tx + T1; Figure 11B: Tx + T2; Figure 11C: Tx + T3; Figure 11D: Tx + T4). This set of figures highlights the carry-over effect of the first tone on the second one, where, in right-dominant tonal systems, no drastic f0 variations are expected. In Standard Chinese, the former type of change is typically categorized as tone sandhi and the latter as co-articulatory f0 change (Xu 1997). Although the terms ‘tone sandhi’ and ‘tonal coarticulation’ suggest two broad, categorically different classes of contextual tonal variation, the distinction between the two is not always straightforward, as will be evident in the data discussed below.
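The normalization steps described above (ten equidistant measurement points per rhyme, z-score normalization, averaging over items) can be sketched as follows. This is only a minimal illustration under stated assumptions: the function names, the use of linear interpolation, z-scoring raw Hz values against one speaker's pooled mean and population standard deviation, and the sample values are all hypothetical, since the text does not specify these details.

```python
from statistics import mean, pstdev


def resample_equidistant(times, f0, n_points=10):
    """Linearly interpolate an f0 track at n_points equally spaced times.

    Assumes `times` is strictly increasing and covers one syllable rhyme.
    """
    if len(times) != len(f0) or len(times) < 2:
        raise ValueError("need at least two (time, f0) samples")
    start, end = times[0], times[-1]
    step = (end - start) / (n_points - 1)
    out, j = [], 0
    for i in range(n_points):
        t = min(start + i * step, end)  # guard against rounding past the end
        # Advance to the segment [times[j], times[j + 1]] that contains t.
        while j < len(times) - 2 and times[j + 1] < t:
            j += 1
        w = (t - times[j]) / (times[j + 1] - times[j])
        out.append(f0[j] + w * (f0[j + 1] - f0[j]))
    return out


def zscore_normalize(contours):
    """Z-score every f0 value against the pooled mean and SD (assumed: one speaker)."""
    pooled = [v for contour in contours for v in contour]
    mu, sd = mean(pooled), pstdev(pooled)
    return [[(v - mu) / sd for v in contour] for contour in contours]


def mean_contour(contours):
    """Point-by-point mean over items sharing the same tonal combination."""
    return [mean(column) for column in zip(*contours)]


# Made-up rhyme tracks (time in s, f0 in Hz); these are not data from the study.
item_a = resample_equidistant([0.00, 0.05, 0.10], [120.0, 110.0, 130.0])
item_b = resample_equidistant([0.00, 0.09], [200.0, 182.0])
print(mean_contour(zscore_normalize([item_a, item_b])))
```

Because each rhyme is reduced to the same number of points regardless of its duration, contours from items of different lengths can be averaged point by point, which is what makes the time-normalized plots comparable across tonal combinations.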
Figure 10A shows the f0 realizations of T1 (in the first syllable) as a function of four different following tones (in the second syllable, indicated with different colors). Although T1 is a low dipping tone when produced in isolation, it is realized with a rising f0 contour before another T1. This is illustrated in /san¹ pjan¹/ ‘mountainside’. Before T2 and T3, T1 is realized with a slightly falling f0 contour, as shown in /san¹ tsʰəŊ²/ ‘mountain city’ and /san¹ tIn³/ ‘mountain top’. Perceptually, this contour is level and may be considered the realization of the first half of the dipping T1 contour over the whole syllable. Before T4 (/san¹ xəʊ⁴/ ‘mountain back’), T1 is realized with a dipping contour whose f0 offset is lower than that before T1. The f0 rises before T1 and T4 are likely due to two different underlying mechanisms: tone sandhi dissimilation triggered by the Obligatory Contour Principle (OCP) in the T1 + T1 sequence, and anticipatory assimilation triggered by the high f0 onset of the following T4 in the T1 + T4 sequence.
As shown in Figure 10B, T2 is realized with a slightly falling f0 trajectory, resembling that in isolation and sounding (high) level. The consistent T2 realization preceding different lexical tones is illustrated in the quadruplet /lae² pIn¹/ ‘guests’, /lae² ɻan²/ ‘source’, /lae² ɕi³/ ‘name–LAIXI’, and /lae² tɕjan⁴/ ‘incoming letter’.
Figure 10C shows that T3, as in isolation, is realized within the higher register of the speaker’s pitch range, as illustrated in the quadruplet /xo³ ɕIn¹/ ‘Mars’, /xo³ ləʊ²/ ‘stove’, /xo³ tʰei³/ ‘ham’, and /xo³ tɕjan⁴/ ‘rocket’. Before T2, T3 is realized with a level f0 contour, which may be viewed as due to anticipatory assimilation to the following T2. Before T1, T3, and T4, however, T3 is realized with an audible f0 rise. We conjecture that there are three different anticipatory effects. T3 is realized with a raised pitch register before T1 due to anticipatory dissimilation. The raising of T3 before T3, however, is better accounted for via the OCP. The third anticipatory effect is the raising of T3 before T4, which seems to be an assimilation effect (as we have observed in T1T4).
T4 shows a steep falling f0 contour, regardless of the identity of the following lexical tone, as shown in Figure 10D and further illustrated in the quadruplet /tɑo⁴ tɕjɐ¹/ ‘Taoism’, /tɑo⁴ ɻɛn²/ ‘Taoist’, /tɑo⁴ tʰəŊ³/ ‘Confucian orthodoxy’, and /tɑo⁴ sɹ̩⁴/ ‘Taoist priest’. In our corpus, we also noticed an item-specific variant of T4 before T4: a falling f0 contour with a clear rising tail, as shown in /ɕjɑŊ⁴ xwɐ⁴/ ‘proper and reasonable’, /ɕjɑŊ⁴ pʰjan⁴/ ‘photo’, and /tʰei⁴ se⁴/ ‘retiring from a society’. This variant is much less frequent, constituting only about 10% of all T4T4 sequences, and its tokens are not plotted in Figure 10D. Further research is needed to understand the origin and distribution of the two T4 variants.
The same set of data is plotted differently in Figure 11 to illustrate the carry-over effects of the lexical tone of the first syllable on that of the second syllable. Across the figures, we see that the f0 contours of the four lexical tones over the second syllable resemble, to a great extent, those of the tones produced in isolation. When there is a visible influence from the preceding tone, the effect tends to be assimilatory in that the high f0 offset of the preceding lexical tone tends to raise the f0 onset of the following tone. Interestingly, while T2 and T4 show little anticipatory effect (in Figures 10B, D), they appear much more prone to the carry-over influence from the preceding lexical tones (in Figures 11B, D). Two points are worth noting. First, while the carry-over influence on T1 and T3 seems no longer present over the second half of the tonal contour, the carry-over effect on T2 and T4 may last throughout the whole tone-carrying syllable (e.g. T3T2 and T2T4). Second, a proper understanding of the carry-over effect observed over T2 and T4 needs to take into consideration the dynamic vs. static nature of the lexical tonal target in the preceding syllable. For example, in the TxT4 sequences (Figure 11D), the carry-over effect is not simply contingent upon the static f0 offset of the preceding syllable. Rather, the rising tones in T1T4 and T3T4 show a more general raising effect over T4 than that after T2 and T4.
In summary, we have observed both carry-over tonal variations, which are mainly assimilatory, and anticipatory tonal variations, which can be assimilatory or dissimilatory. As suggested earlier, local contextual tonal variations are generally classified as tonal coarticulation or tone sandhi, with the former typically considered a gradient phonetic process and the latter a categorical phonological process. We conjecture that the carry-over assimilatory changes in Zhushan Mandarin are not likely to be perceived as categorical tonal changes, given the resemblance between the f0 contours produced in the final syllable (within a bi-syllabic constituent) and those in isolation. On the other hand, the anticipatory raising of T1 preceding T1 and T4 is likely to lead to categorical perception of the tonal changes. What remains unclear is whether T1 has one allotone (i.e. a rising variant) before T1 and T4 (i.e. Toneme T1 → rising T1 allotone if preceding T1 or T4), or two rising allotones (i.e. Toneme T1 → rising allotone T1A if preceding T1 and rising allotone T1B if preceding T4). Acoustic studies do not provide any evidence that adjudicates between the two competing possibilities. What also remains for further research is how to explain the f0 variations in T3 (Figure 10C). Perceptually, the magnitudes of T3 f0 raising before T1, T3, and T4 are somewhat similar. Without further experimentation, it is also not clear whether T3 before these tones is different from T3 before T2. (For more detailed discussions on the categorization of contextual tonal variations into phonetic coarticulation and phonological sandhi, see M. Y. Chen 2000, Y. Chen 2012, Li & Y. Chen 2016, and references therein.)
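The two competing analyses of T1 mentioned above can be written as context-dependent mappings. The sketch below is purely illustrative: the function names and the labels for the variants are hypothetical, and only the conditioning contexts (before T1, before T4) are taken from the text.

```python
# Hypothetical labels; only the contexts (before T1, before T4) come from the text.

def t1_one_allotone(next_tone):
    """Analysis 1: a single rising allotone before both T1 and T4."""
    return "T1-rising" if next_tone in ("T1", "T4") else "T1-elsewhere"


def t1_two_allotones(next_tone):
    """Analysis 2: distinct rising allotones before T1 (T1A) and before T4 (T1B)."""
    return {"T1": "T1A", "T4": "T1B"}.get(next_tone, "T1-elsewhere")


for next_tone in ("T1", "T2", "T3", "T4"):
    print(next_tone, t1_one_allotone(next_tone), t1_two_allotones(next_tone))
```

The two mappings predict a rising variant in exactly the same contexts and differ only in whether those contexts share one category.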
Three further points are to be noted. First, Zhushan Mandarin presents a clear case in which lexical tones do assimilate or dissimilate due to neighboring lexical tones in context. However, contextual tonal variations in Zhushan Mandarin, be they assimilation or dissimilation, cannot be represented as the categorical change of one lexical tone to another. This is because the f0 contour modifications (as discussed above) cannot be easily classified as pitch contour changes from one lexical tone to another. Second, while Chao’s (1930) tone letters are excellent visual representations of the lexical tone contours, they fall short in revealing the mechanisms that underlie the contextual tonal variations observed in the bi-syllabic constituents. Last, more data from multiple speakers are essential to verify the patterns described here. Further experiments are also needed to gain insights not only into the mechanisms of contextual tonal variations in Zhushan Mandarin but also into the effect of these variations on the representation and processing of lexical tones in general.
Transcription of the recorded passage
Phonemic version
The passage is transcribed phonemically at the segmental level, using the symbols presented in the consonant table and the vowel chart above. Four important patterns of variation in connected speech are worth noting. The first is the rich r-suffixation observed in the passage reading, as indicated by the symbol ˞ (e.g. /lɐ˞²/). The second is nasal assimilation (e.g. /In² lae²/ is realized as [In² nae²]), which has also been observed in bi-syllabic constituents produced in isolation. The third is segmental lenition, such as the loss of fricative onsets (/ʃɻ̩¹ In²/ [ɻ̩¹ In²], /sɛn¹ sɑŊ⁴/ [sɛn¹ ɑŊ⁴]) and the voicing of voiceless obstruents (e.g. /ko³/ [go³], /təʊ¹/ [dəʊ¹]). Last, there are a good number of neutral tone syllables in bi-syllabic lexicalized collocations (e.g. /pe²foŊo³/ ‘north wind’) and grammatical particles (e.g. /ləo³/ – aspect marker).
Lexical tones are transcribed in two ways: first as the pitch contours of Tones 1–4 in their citation forms produced in isolation (except where neutral tone, Tone 0, is observed), and then as numerical representations that approximate the contextual tonal variations. Each citation form is paired with its contextual variant via the symbol - connecting them. For both, we adopted Chao’s (1930) tone letters, but see the earlier discussion of the limitations of this coding system.
The boundaries between syllables are indicated by a space. The symbol | marks the end of a phonological word, || the end of a phonological phrase, and ||| the end of an intonational phrase.
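The boundary notation above lends itself to a simple nested parse. The sketch below is an illustration only: the function name is hypothetical, the input uses placeholder syllables rather than material from the passage, and it assumes that a higher-level boundary also closes all lower-level units (e.g. ||| also ends the current phonological phrase and word), which the text does not state explicitly.

```python
def parse_prosody(transcription):
    """Split a transcription into a nested list:
    intonational phrases > phonological phrases > phonological words > syllables.
    """
    result = []
    # Split on the longest marker first so that "|||" is not read as "||" + "|".
    for intonational_phrase in transcription.split("|||"):
        phrases = []
        for phonological_phrase in intonational_phrase.split("||"):
            words = [w.split() for w in phonological_phrase.split("|") if w.strip()]
            if words:
                phrases.append(words)
        if phrases:
            result.append(phrases)
    return result


# Placeholder syllables (a, b, c, ...), not taken from the recorded passage.
print(parse_prosody("a b | c || d e ||| f |||"))
# → [[[['a', 'b'], ['c']], [['d', 'e']]], [[['f']]]]
```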
Orthographic version
Acknowledgements
We are grateful to our language consultants for making this study possible. We would also like to thank Maarten Kossmann, Menghui Shi, Daan van de Velde, Thom van Hugte, and Xinyi Wen for their comments on earlier versions of the manuscript. In addition, we would like to thank our editors and the anonymous reviewers for their questions and suggestions, as well as Andre Radtke and Ewa Jaworska for their assistance. The support from the Netherlands Royal Academy of Sciences (KNAW-China Exchange Program 13CDP012), the Netherlands Organization for Scientific Research (NWO VI.C.181.040), the Shanghai Philosophy and Social Sciences Foundation (2016BYY005), and the National Social Sciences Foundation (18ZDA 297) is gratefully acknowledged.
Supplementary material
To view supplementary material for this article, please visit https://doi.org/10.1017/S0025100320000183.