The Hmu language is spoken by approximately 1,250,000 people who reside in Qiandongnan Miao and Dong Autonomous Prefecture (黔东南苗族侗族自治州), Guizhou Province (贵州省), the People's Republic of China (Wang & Mao 1995: 3–4; Lewis, Simons & Fennig 2016).
Hmu is also known as the eastern dialect of the Hmongic language (Wang 1983: 5–7; Wang 1985: 103–104; The Language Atlas of China (Chinese Academy of Social Sciences and Australian Academy of the Humanities 1987); Ratliff 2010: 3), the latter of which, along with Hmong, A-Hmao, Bunu, Qo Xiong, Jiongnai, Ho Ne, and Pa-Hng, comprises the Hmongic subgroup of the Hmong-Mien language family (for more details, see Wang & Mao 1995: 2–3; Ratliff 2010: 3). The Hmong-Mien family was once proposed to be a member of the Sino-Tibetan family (see especially Li 1937/1973), but Benedict (1942, 1972, 1975) believes that the sound correspondences the hypothesis is based on could have resulted from language contact. Following his own observation of related data, he suggests that Hmong-Mien and Austronesian could be genetically related. For the latest discussions of the relationship between Hmong-Mien and Sino-Tibetan, the reader is referred to Wang (2015) and Wang & Liu (2017).
The speech form investigated in the present study is the Xinzhai variety of Hmu. Xinzhai (新寨) is a small mountain village in Sankeshu (三棵树) Township, which is about 30 kilometers northeast of Kaili City (凯里市) (see Figure 1). Among the 1000 Hmu speakers who reside in Xinzhai, people over the age of eighty are basically monolingual in Hmu. Over the last few decades, with the development of the economy and transportation, the younger generations have largely become bilingual in Southwest Mandarin and Hmu. But Hmu still stands as the primary language of oral communication in family and community events. It does not have its own writing system.
The present Illustration is based on the data collected with the third author, Zhenghui Yang (a.k.a. tɕʰo51 ɕoŋ44 in Hmu), aged 26 years in 2017. Mr Yang is fluent in both Hmu and Chinese (the Southwestern dialect and Mandarin). He worked with the other three authors intensively in 2014, and before and after that has served as the main consultant in the various projects the first author has undertaken.
In the present study, the phonetic characteristics are analyzed and described on the basis of acoustic and EGG (electroglottography) data. The devices used during the recording sessions include a clip-on condenser microphone (Sony ECM-44B), Eggforsingers 7050A, and a laptop computer (Thinkpad, X1 Carbon, Lenovo, Beijing, China). The recording software was Adobe Audition 2.0; sound files were saved in .wav format with a sampling rate of 44.1 kHz and 16-bit resolution.
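As a hedged illustration of the recording format described above, the following minimal Python sketch checks whether a .wav file has the stated sampling rate and bit depth. It uses only the standard-library wave module; the function names matches_recording_spec and make_silent_wav and the in-memory test files are illustrative assumptions, not part of the authors' workflow.

```python
import io
import wave


def matches_recording_spec(wav_file, rate_hz=44100, sample_width_bytes=2):
    """Return True if the file has the given sampling rate and sample width.

    16-bit resolution corresponds to a sample width of 2 bytes.
    """
    with wave.open(wav_file, "rb") as reader:
        return (reader.getframerate() == rate_hz
                and reader.getsampwidth() == sample_width_bytes)


def make_silent_wav(rate_hz, sample_width_bytes, n_frames=10):
    """Build a short mono .wav in memory (hypothetical test data only)."""
    buffer = io.BytesIO()
    with wave.open(buffer, "wb") as writer:
        writer.setnchannels(1)
        writer.setsampwidth(sample_width_bytes)
        writer.setframerate(rate_hz)
        writer.writeframes(b"\x00" * sample_width_bytes * n_frames)
    buffer.seek(0)
    return buffer


print(matches_recording_spec(make_silent_wav(44100, 2)))  # 44.1 kHz, 16-bit
print(matches_recording_spec(make_silent_wav(22050, 2)))  # wrong sampling rate
```

In practice a file path can be passed in place of the in-memory buffer.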
Consonants
As demonstrated in the Consonant Table, the consonant inventory of Hmu comprises 32 phonemes. The places of articulation range from the lips (bilabial) to the glottis (glottal). Non-nasal obstruents (plosives and affricates) show a general two-way manner distinction in terms of aspiration: voiceless aspirated vs. voiceless unaspirated. Fricatives (except for lateral, alveolo-palatal, velar, and glottal fricatives), on the other hand, show a three-way distinction in manner: voiced, voiceless unaspirated, and voiceless aspirated. Nasals at two places of articulation (i.e. bilabial and alveolar) contrast in voicing. Moreover, this language contrasts velar and uvular consonants.
The following minimal pairs/sets and near-minimal pairs/sets illustrate the contrasts summarized above:
In our database, all consonant contrasts can occur in the onset position of monosyllabic roots in the language, but only a subset (i.e. the nasals) is observed in the coda position.
Throughout the paper, the transcription of the accompanying sound files is phonemic.
Plosives and affricates
Hmu plosives show four places of articulation (i.e. bilabial, alveolar, velar, and uvular), while affricates are restricted to the alveolar and alveolo-palatal regions. Plosives and affricates are all voiceless in Hmu, and they come with a general two-way contrast in aspiration. That is, they are either voiceless unaspirated or voiceless aspirated. Unaspirated plosives and affricates have a shorter VOT than their aspirated counterparts. Consider /ta33/ ‘thick’ vs. /tʰa33/ ‘to plane’ in Figure 2. In the left panel, the VOT of /t/ is approximately zero because release and voicing take place almost simultaneously. In the right panel, by contrast, there is a period of aspiration between release and voicing, and the VOT of /tʰ/ is about 70 ms.
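The VOT comparison above can be sketched in a few lines of Python. This is only an illustration of the definition (voicing onset time minus release time); the landmark times below are hypothetical values chosen to mirror the approximate figures in the text, not measurements from the recordings.

```python
def vot_ms(release_s, voicing_onset_s):
    """Voice onset time in milliseconds.

    VOT is the interval from the stop release to the onset of voicing;
    both landmarks are given in seconds.
    """
    return (voicing_onset_s - release_s) * 1000.0


# Hypothetical landmark times (seconds), chosen for illustration only.
unaspirated_vot = vot_ms(release_s=0.100, voicing_onset_s=0.100)
aspirated_vot = vot_ms(release_s=0.100, voicing_onset_s=0.170)

print(f"unaspirated: {unaspirated_vot:.0f} ms")
print(f"aspirated:   {aspirated_vot:.0f} ms")
```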
Fricatives
The places of articulation where Hmu fricatives are produced are labio-dental, alveolar, alveolo-palatal, velar, and glottal. The fricatives can be bifurcated into voiced and voiceless subgroups; and the voiceless fricatives can be further subdivided in terms of aspiration. Such a three-way distinction in manner is observed in labio-dental and alveolar fricatives.
Aspirated fricatives in Hmu serve to distinguish Eastern Hmongic varieties from the other Hmongic languages. The reader is referred to Shi, Liu & Yang (2017) for details. Figure 3 and Table 1 show that aspirated fricatives have a longer duration than their unaspirated counterparts in syllable-initial position.
Perceptually, the voiceless unaspirated alveolo-palatal fricative and affricate are often produced with a palatal offglide, but the situation is not as clear for their aspirated counterparts.
Nasals
Hmu nasals can be pronounced at three places of articulation: bilabial (/m/, /m̥/), alveolar (/n/, /n̥/), and velar (/ŋ/). It should be noted that such contrasts in place only occur in syllable-initial position. Word-finally, only [m], [n], and [ŋ] are observed, and their occurrences are predictable based on the vowel preceding them. That is, [m] only occurs after the vowel /o/, [n] occurs after front vowels, and [ŋ] occurs after back vowels; and since /o/ is a back vowel, [m] and [ŋ] are interchangeable, though native speakers seem to prefer [ŋ] rather than [m] in such cases. In other words, the phonemes /m/, /n/, and /ŋ/ are neutralized in syllable-final position. We therefore represent the syllable-final nasal as an archiphoneme /N/, whose place of articulation is determined by the backness of the vowel preceding it, or by whether the preceding vowel is /o/.
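The distribution of the archiphoneme /N/ described above can be expressed as a small lookup. The sketch below is an assumption-laden illustration: the grouping of /i e ɛ/ as front and /ɔ o u/ as back follows their usual IPA values rather than an explicit statement in this passage, and /a/ and /ə/ are deliberately left unclassified, so the function raises an error for them.

```python
# Assumed vowel classes (standard IPA values; not stated explicitly in the passage).
FRONT_VOWELS = {"i", "e", "ɛ"}
BACK_VOWELS = {"ɔ", "o", "u"}


def coda_nasal_realizations(vowel):
    """Return the surface realizations of /N/ after `vowel`, preferred variant first."""
    if vowel == "o":
        # [m] occurs only after /o/ and alternates with [ŋ]; speakers prefer [ŋ].
        return ["ŋ", "m"]
    if vowel in BACK_VOWELS:
        return ["ŋ"]
    if vowel in FRONT_VOWELS:
        return ["n"]
    raise ValueError(f"no rule assumed here for /N/ after /{vowel}/")


for v in ["i", "ɛ", "o", "u"]:
    print(v, coda_nasal_realizations(v))
```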
When a vowel is followed by a nasal final, the vowel always becomes nasalized. The nasal final consonant can disappear with its preceding vowel nasalized. The factors that underlie these two variations of the nasal final (that is, as a nasal consonant or as a nasalized vowel) have yet to be fully determined. What we have observed so far is that in the citation form (i.e. when the speaker is reading a vocabulary list), the nasalized vowel seems to be the only option; and the nasal consonant occurs more in casual or connected speech.
Consider the following examples of nasal finals in Hmu:
So far in our database, we have found no example in which the schwa (/ə/) is followed by a nasal final.
Hmu nasals also contrast in voicing, but this contrast is limited to the bilabial and alveolar nasals. The velar nasal /ŋ/ does not have a voiceless counterpart. The following examples demonstrate nasals that contrast in voicing: /mu33/ ‘illness’ vs. /m̥u33/ ‘Hmong’, /nɛ33/ ‘grandma (loan word)’ vs. /n̥ɛ33/ ‘sun’.
The production of Hmu voiceless nasals begins with a substantial amount of nasal airflow before the nasal release. The spectrograms in Figure 4 demonstrate the contrast between voiced and voiceless nasals (/m/ vs. /m̥/; and /n/ vs. /n̥/). The voiceless nasals start with a noise portion, and show little or even no voicing before the release of the nasal. Auditorily the voiceless nasals in Hmu could be perceived as ending with a homorganic stop; in fact Fang-Kuei Li also reports that in Black Miao (another variety of Hmu), the voiceless nasal /n̥/ sounds like [n̥tʰ] (Li 1980: 19). However, it should be noted that in the acoustic analysis, as demonstrated in the spectrograms in Figure 4, no burst occurs between the voiceless nasal initial and the following vowel. What induces the auditory perception of the plosive ending is a topic worth pursuing in future research on Hmongic languages.
On the other hand, if a voiceless nasal occurs after the vowel of the preceding syllable, it is realized with a voiced nasal onset before the voicelessness starts; consider Figure 5.
It is also worth noting that voiceless nasal onsets always make the following vowel nasalized. We have not found any syllable composed of a voiceless nasal initial and an oral (i.e. non-nasalized) vowel. The same situation is not observed with voiced nasal onsets.
Laterals
The lateral consonants also show a tripartite distinction among unaspirated voiceless lateral fricative /ɬ/, aspirated voiceless lateral fricative /ɬʰ/, and lateral approximant /l/. This minimal set demonstrates the contrastive status of /l/, /ɬ/ and /ɬʰ/: /la11/ ‘vegetable garden’ vs. /ɬa11/ ‘rich’ vs. /ɬʰa11/ ‘to cut’ (see Figure 6).
Approximants
Hmu approximants contrast two places: palatal (/j/) and velar (/ɰ/). When produced at an utterance-initial position in a prosodically strengthened way, the two approximants tend to be realized respectively as voiced alveolo-palatal fricative (/j/ → [ʑ]) and voiced velar fricative (/ɰ/ → [ɣ]). In such situations, however, the approximants and their fricative counterparts are interchangeable, resulting in no difference in meaning.
The approximant /j/ is the only consonant that can serve as the second component of a two-member consonant cluster (CC). See the section on syllable structure for more details.
Vowels
As demonstrated in the diagram and the above examples, Hmu has eight oral monophthongs (/i e ɛ a ə ɔ o u/) and the vowel-to-consonant ratio is about 0.24. The mean vowel-to-consonant ratio in languages of the world amounts to 0.39 (Maddieson 1984: 9). From this perspective, the Hmu vowel system is simpler than average. Moreover, Hmu vowels do not contrast in length, nasality, or tenseness.
See also the vowel chart plotted on the relative F1/F2 formant values in Figure 7. F1 and F2 were measured at three spots: one quarter of the vowel duration (25%), the mid point (50%), and three quarters of the vowel duration (75%). The mean formant value of a vowel was calculated by averaging over the three measurements. The consonants in the words examined are varied and not controlled for places and manners of articulation.
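The averaging step described above (one mean value per formant from the measurements at 25%, 50%, and 75% of the vowel duration) can be sketched as follows. The formant values in the example are hypothetical and serve only to show the arithmetic; they are not taken from the Hmu data.

```python
from statistics import mean


def mean_formants(measurements):
    """Average (F1, F2) pairs measured at 25%, 50%, and 75% of a vowel.

    `measurements` is a sequence of three (F1, F2) tuples in Hz.
    Returns the mean F1 and mean F2 as a tuple.
    """
    if len(measurements) != 3:
        raise ValueError("expected measurements at 25%, 50%, and 75%")
    mean_f1 = mean(f1 for f1, _ in measurements)
    mean_f2 = mean(f2 for _, f2 in measurements)
    return mean_f1, mean_f2


# Hypothetical values in Hz for a single vowel token.
token = [(700, 1200), (750, 1250), (800, 1300)]
print(mean_formants(token))
```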
It should be noted that /a/ is realized as [ɑ] when it is followed by the velar nasal; and /u/ is realized as [əu] after alveolar consonants, such as [təu55] ‘pipeful (measure unit)’.
Tones
Hmu is a tone language. According to our database, monosyllabic words can take eight phonemic tones: five level tones (High Level: 55, Mid–High Level: 44, Mid Level: 33, Mid–Low Level: 22, Low Level: 11), two rising tones (High Rising: 24, Low Rising: 23) and a high falling tone (51). See Table 2 for some contrastive sets.
Note: the Proto-Hmong language has been proposed to have four tone categories (level 平, rising 上, falling 去 and entering 入), each of which can further split in two based on the voicing of the onset, resulting in a total of eight tone categories: Yinping 阴平 (T1), Yangping 阳平 (T2), Yinshang 阴上 (T3), Yangshang 阳上 (T4), Yinqu 阴去 (T5), Yangqu 阳去 (T6), Yinru 阴入 (T7), Yangru 阳入 (T8) (Chang 1947, 1953, 1972; Wang 1994: 1; Wang & Mao 1995: 23). The tone categories 1–8 are the labels used in the previous studies on Proto-Hmong reconstruction.
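The eight-tone inventory given in Chao tone numerals can be grouped by contour shape with a simple comparison of the first and last digits. The sketch below is an illustration of the notation only; the function name tone_shape is a hypothetical helper.

```python
from collections import Counter


def tone_shape(chao):
    """Classify a Chao tone numeral string by comparing its start and end pitch."""
    start, end = int(chao[0]), int(chao[-1])
    if end > start:
        return "rising"
    if end < start:
        return "falling"
    return "level"


# The eight phonemic tones listed in the text.
HMU_TONES = ["55", "44", "33", "22", "11", "24", "23", "51"]

shape_counts = Counter(tone_shape(t) for t in HMU_TONES)
print(dict(shape_counts))
```

Comparing only the endpoints is sufficient for this inventory because none of the eight tones is a complex (e.g. dipping) contour.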
The pitch trajectory of each tone in terms of semitones is illustrated in Figure 8. These plots are based on the pitch contours of a set of monosyllabic words. Two things are especially worth noting. First, the only falling tone (51) is of remarkably shorter duration than the other tones. Second, the f0 trajectories of rising tones T3 (24) and T7 (23) are special in that each trajectory can be divided primarily into two parts based on its movement. In the first part the pitch moves up from low to high through half of the duration, then in the second part it remains level towards the end. This is somewhat different from the rising tone we commonly observe in languages like Mandarin Chinese, for which the pitch trajectory only moves upward. The rising trajectories as observed in Xinzhai Hmu are also found in Yuliang, another Hmu variety (Liu & Zhang 2016: 202).
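Since the pitch trajectories are reported in semitones, a short sketch of the standard Hz-to-semitone conversion may be useful. The reference frequency is an assumption here: the passage does not state which reference was used, so ref_hz below is a hypothetical parameter (for instance a speaker's minimum or mean f0).

```python
import math


def hz_to_semitones(f0_hz, ref_hz):
    """Convert an f0 value in Hz to semitones relative to `ref_hz`.

    One octave (a doubling of frequency) equals 12 semitones.
    """
    if f0_hz <= 0 or ref_hz <= 0:
        raise ValueError("frequencies must be positive")
    return 12.0 * math.log2(f0_hz / ref_hz)


# Hypothetical f0 track (Hz) and reference, for illustration only.
track_hz = [100.0, 120.0, 150.0, 200.0]
print([round(hz_to_semitones(f, ref_hz=100.0), 2) for f in track_hz])
```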
An important feature of the Hmu tone system is that it has five level tones. The Xinzhai variety of Hmu is one of the several Hmongic languages that have up to five level tones (see Kwan 1971, Wang 1994, Wang & Mao 1995, and Kuang 2013a, b for five level tones in other varieties of Hmu; and see Kong 1992 for five level tones in Western Hmongic). As far as we know, no other language in the world has been reported to have as many as five contrastive level tones (Chao 1948; Maddieson 1978: 338; Kuang 2013a: 76; 2013b: 1).
In addition, tone sandhi is not observed in Hmu (Liu et al. 2017: 18–22). That is to say, the realization of a tone is not affected by the surrounding tonal environment.
Voice quality
Apart from the pitch and duration differences, voice quality is also a component of Hmu tones. Two phonation types are observed in the production of the vowels in Hmu: modal and breathy voice. The breathy phonation is only observed with the low-level tone /11/. According to our preliminary analysis, the production of the low-level tone (/11/) comes with the largest open quotient value in comparison with that of the other tones. We examined tokens with the low-level tone (/11/) for the ratio of the open phase to the whole cycle in the EGG signal, and obtained an open quotient value of 60–65%. With modal voice having an open quotient of approximately 55% (Kong 2001: 172), the EGG signal of the low-level tone suggests that it is produced with breathy voice. This accords with our preliminary judgement during our first field investigation.
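The open quotient discussed above is the ratio of the open phase to the whole glottal cycle. The following sketch shows only that arithmetic; how the open phase is located in an EGG signal is outside its scope, and the durations used are hypothetical values, not measurements from the study.

```python
def open_quotient(open_phase_s, cycle_s):
    """Open quotient: duration of the open phase divided by the full cycle duration."""
    if cycle_s <= 0:
        raise ValueError("cycle duration must be positive")
    if not 0 <= open_phase_s <= cycle_s:
        raise ValueError("open phase must lie within the cycle")
    return open_phase_s / cycle_s


# Hypothetical cycle of 10 ms (f0 = 100 Hz) with a 6.2 ms open phase.
oq = open_quotient(open_phase_s=0.0062, cycle_s=0.010)
print(f"OQ = {oq:.0%}")
```

A value in the 60–65% range, as in this hypothetical token, would sit above the approximately 55% figure cited for modal voice.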
The breathy voice of vowels on the low-level tone (/11/) is readily perceived by non-native speakers. Vocal pulses in breathy voice have far less intensity at the higher harmonics and more energy at the fundamental frequency. One can therefore measure the energy of the fundamental frequency (the first harmonic) and higher harmonics, and determine the presence of breathiness by the amplitude increase of the fundamental frequency and the amplitude decrease of the higher harmonics (Ladefoged & Antoñanzas-Barroso 1985: 79, 81). Another acoustic measure that is commonly used to detect breathiness is the amplitude difference between the first and second harmonics (Fischer-Jørgensen 1967, Bickley 1982, Kirk, Ladefoged & Ladefoged 1984, Maddieson & Ladefoged 1985, Huffman 1987 among many others), with higher H1–H2 differences expected for breathy sounds (Andruski & Ratliff 2000: 43, 46; Ladefoged 2003: Section 7.2). Acoustic measurements of Hmu data show that the non-modal voice is characterized by less intensity at higher frequencies and a larger difference between the first and second harmonics, which is strongly indicative of breathy voice. These acoustic features are illustrated in the spectra in Figure 9, in which the H1–H2 value of modal voice is about 9.7 dB, while it is about 13.5 dB in breathy voice.
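To make the H1–H2 measure concrete, the sketch below estimates the amplitudes of the first and second harmonics of a signal with a single-bin discrete Fourier transform and reports their difference in dB. This is a simplified illustration under stated assumptions (a known f0 and an analysis frame containing a whole number of cycles); it is not the procedure used to produce Figure 9, and the synthetic signal is hypothetical test data.

```python
import math


def harmonic_amplitude_db(samples, sample_rate, freq_hz):
    """Amplitude (in dB) of one frequency component, via a single-bin DFT.

    Raises ValueError from math.log10 if the component has zero amplitude.
    """
    n = len(samples)
    re = sum(s * math.cos(2 * math.pi * freq_hz * i / sample_rate)
             for i, s in enumerate(samples))
    im = sum(s * math.sin(2 * math.pi * freq_hz * i / sample_rate)
             for i, s in enumerate(samples))
    amplitude = 2 * math.hypot(re, im) / n
    return 20 * math.log10(amplitude)


def h1_minus_h2(samples, sample_rate, f0_hz):
    """Difference in dB between the first (f0) and second (2 * f0) harmonics."""
    h1 = harmonic_amplitude_db(samples, sample_rate, f0_hz)
    h2 = harmonic_amplitude_db(samples, sample_rate, 2 * f0_hz)
    return h1 - h2


# Synthetic signal: f0 = 100 Hz with amplitude 1.0, second harmonic with amplitude 0.25.
RATE = 8000
signal = [math.sin(2 * math.pi * 100 * i / RATE)
          + 0.25 * math.sin(2 * math.pi * 200 * i / RATE)
          for i in range(800)]  # exactly 10 cycles of f0

print(f"H1-H2 = {h1_minus_h2(signal, RATE, 100):.2f} dB")
```

For natural speech, f0 would first have to be estimated and the frame windowed, so values obtained this way are only a rough guide.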
The non-modal phonation type that co-occurs with the low-level tone /11/ can vary from person to person. So far breathy, harsh, and nearly modal voices have been observed with tone /11/. The harsh voice is likely a by-product of breathy voice with a very low tone, in which a speaker may tend to lower the larynx. Physiologically, lowering the larynx can not only decrease the longitudinal tension of the vocal folds controlled by the cricothyroid muscles, but also bring about the engagement of other supralaryngeal muscles, leading to supraglottal constriction (Edmondson et al. 2001: 90; Edmondson & Esling 2006: 169–171). The voice quality can therefore be perceived as harsh voice when the vocal folds and ventricular folds are compressed in a narrowing supraglottic space. The production of very low pitch can also induce the aryepiglottic folds to vibrate (Esling 2013: 3). Through their laryngoscopic experiments on harsh voice, Esling & Harris (2005: 373) found that when glottal adduction was quickly followed by ventricular incursion, aryepiglottic constriction occurred.
Syllable structure
The canonical Hmu syllable minimally consists of an obligatory nucleus (V) on a specific tone. The syllable may also comprise up to three optional elements in the following linear structure: (C) (C) V (C). The archiphonemic nasal (N) is the only syllable-final consonant, and is often merged with the vowel, resulting in a nasalized vowel. A complex (CC) initial always has /j/ as the second element.
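The syllable template above can be checked mechanically with a regular expression over an abstract skeleton. The skeleton alphabet is an assumption made for this sketch: C stands for any single initial consonant (including /j/ when it is the sole onset), j for the glide as the second member of a cluster, V for the nucleus, and N for the archiphonemic nasal. Tone, which is obligatory on every syllable, is not modeled.

```python
import re

# (C)(C)V(C), where the second onset slot is always /j/ and the coda is always /N/.
SYLLABLE_SKELETON = re.compile(r"(?:Cj?)?VN?")


def is_valid_skeleton(skeleton):
    """Return True if the skeleton string fits the Hmu syllable template."""
    return SYLLABLE_SKELETON.fullmatch(skeleton) is not None


for s in ["V", "CV", "CjV", "CjVN", "VN", "CCV", "CVC", "VNN"]:
    print(s, is_valid_skeleton(s))
```

The pattern treats a cluster as possible only when an initial consonant is present, which follows from /j/ being the second element of a complex initial.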
Similar to its linguistic neighbors, Hmu is phonologically monosyllabic with a strong tendency towards disyllabicity in its lexicon. Multi-syllabic words are mostly compounds.
Transcription of the recorded passage
The passage is a Hmu version of ‘The North Wind and the Sun’ story, transcribed phonemically using the symbols presented and discussed in the main text of this illustration. The eight tones described above are marked with Chao's five-point scale notation. In the transcription, the symbol ‘ǀ’ marks a pause, while ‘ǁ’ marks the end of a complete sentence. Interlinear glossing and a free translation are also provided.
Abbreviations
Acknowledgements
This work was funded by the Ministry of Education of the People's Republic of China (No. 17JJD740001). We would like to thank the three anonymous reviewers for their detailed and insightful comments. We would also like to thank the Editor of JIPA, Prof. Amalia Arvaniti, for her very helpful suggestions.
Supplementary material
To view supplementary material for this article, please visit https://doi.org/10.1017/S0025100318000336