1 Introduction
Tŝilhqot’in [Footnote 1] (ISO 639-3) is a Northern Dene language spoken in the central interior of British Columbia, Canada (see Figure 1). According to the 2018 Report on the Status of B.C. First Nations Languages, there are currently 765 fluent speakers, the largest number of speakers of any First Nations language spoken strictly within the boundaries of B.C. (Dunlop et al. 2018).
The bulk of the existing linguistic research on Tŝilhqot’in has been done by Eung-Do Cook, culminating in his 2013 A Tsilhqút’ín Grammar. While Cook’s grammar includes thorough descriptions of Tŝilhqot’in phonology, morphology, and syntax, it does not provide any details on the phonetic structures of the language. Tŝilhqot’in has an extremely rich sound inventory, including features typical of Dene languages (e.g. tone) as well as ones shared with neighbouring Salish and Wakashan languages (e.g. contrastive pharyngealization). Table 1 provides the consonant inventory, including a three-way voicing contrast in obstruents, a velar–uvular contrast (including secondary labialization), and a plain–pharyngealized contrast in coronal fricatives and affricates. Table 1 is organized phonetically, based on the International Phonetic Association’s (2015) International Phonetic Alphabet chart. However, it is important to note that Cook (2013) groups the sounds /l z zʕ j w ʁ ʁw/ together as ‘voiced continuants (spirants)’ (p. 15), based on phonological evidence. In van Eijk’s (1997) work on neighbouring St’át’imcets (Interior Salish), /z zʕ ʁ ʁw/ are grouped together with /l l’ lʕ lʕ’ j j’ w w’ ɣ ɣ’ ʕ ʕ’ ʕw ʕ’w ʔ h/, in this case as (voiced) ‘resonants’ (p. 2). [Footnote 2] Thus, both Cook and van Eijk recognize that voiced fricatives and (non-nasal) resonants share certain phonological properties; we return to the relevance of their classification system for the study of /z/ and /zʕ/ in Section 4.
Tŝilhqot’in also has a relatively complex vowel system: tense (long) vowels (/i a u/) contrast with lax (short) ones (/ɨ e o/), and each of these six underlying vowels has (at least) two realizations, one occurring in retracted environments, termed ‘flat’ by Cook (1993, 2013), and the other occurring elsewhere, termed ‘sharp’ by Cook (1993, 2013). The presence of non-retracted vs. retracted vowel allophones is a reliable perceptual cue to the quality – retracted or not – of adjacent consonants (see Sections 3.1.1 and 3.1.2 below). In fact, a Tŝilhqot’in mother-tongue speaker and language expert with whom co-author Bird has worked has talked about reforming the orthography, so that the diacritic used to mark pharyngealization on consonants (< ˆ >) is instead written on the vowels. At least in some dialects (Stone in particular), there is evidence for contrastive nasalization (Cook 2013: 21). Finally, Tŝilhqot’in is a tone language: vowels can have high (marked) or low (unmarked) tone.
The study described here focuses on the voiced alveolar plain and pharyngealized fricatives, /z/ and /zʕ/. Contrastive pharyngealization is a key feature of the Tŝilhqot’in sound system (see Table 1), and is also found in adjacent Interior Salish languages (Bessell 1992, Shahin 2002, Namdaran 2006). Cook (1993) provides a phonological analysis of the local and non-local assimilation (retraction) processes that are triggered by pharyngealized consonants in Tŝilhqot’in, but no detailed phonetic work has yet been done on these sounds. In St’át’imcets (Interior Salish), which is spoken to the south-east of Tŝilhqot’in (see Figure 1), pharyngealized coronal consonants are articulated with significant tongue root retraction towards the lower pharyngeal wall, pulling back both preceding and following vowels (Namdaran 2006). This results in raised F1 and lowered F2 values associated with pharyngealized consonants compared to their plain counterparts (Shahin 1997, 2002), a pattern reflective of pharyngealized sounds cross-linguistically, including consonants (Shar & Ingram 2010, Al-Tamimi 2017) and vowels (Chiu & Sun 2020).
Cook (1993) describes both /z/ and /zʕ/ as ‘spirants rather than fricatives, i.e. non-strident, especially in syllable final position, so that they are sometimes perceived mistakenly as dark l’ (p. 159). In his more recent work, Cook (2013) does not say anything particular about the phonetic realization of /z/ and /zʕ/, although he does describe a more general weakening (lenition) process of continuant consonants in coda position, i.e. a ‘stronger articulation (fricative) in initial position and weaker articulation (spirant, glide) in final position of the continuants’ (p. 44). This lenition process is unusual in that, cross-linguistically, lenition occurs most often in intervocalic position (Ennever et al. 2017, Katz & Pitzanti 2019). We return to the question of whether ‘lenition’ is the best term to use to describe the observed variation in /z/ and /zʕ/ in Section 4.2.
To our ears (as trained phoneticians), /z/ and /zʕ/ clearly have an elusive phonetic target, exhibiting substantial variation beyond stronger vs. weaker manners of articulation. In previous work on the language, co-author Bird has transcribed these sounds using a variety of symbols: [z z̞ zð zʁ ɫ ɫð ʁ ʁ̞ ɮ] (Bird 2014). The purpose of this study is to characterize the phonemes /z/ and /zʕ/ phonetically, to determine (a) what the possible phonetic variants of these phonemes are, and (b) to what extent these variants are systematically distributed, based on prosodic position and segmental environment. In terms of (b), our expectations are threefold: (i) Lenited realizations will be more common in coda position than elsewhere (Cook 2013: 44). Furthermore, weakening may be affected by adjacent segments, although previous studies disagree on precise effects (Kirchner 2001, 2004; Ennever et al. 2017). (ii) Retracted realizations will also be more common in coda position than elsewhere, based on cross-linguistic findings showing that, in articulatorily complex segments, tongue body articulations (as opposed to tongue tip articulations) are more prominent in coda than in onset position (Krakow 1993, Gick et al. 2006). Finally, (iii) in terms of lateralization, following Cook’s (1993) observations, we anticipate that lateralized realizations will be more common in coda position than elsewhere.
2 Method
The study reported on below came out of an elicitation session in the fall of 2013 with a single speaker who had worked closely with Eung-Do Cook in his linguistic fieldwork in the 1970s. She grew up speaking Tŝilhqot’in in the home, where her mother was a monolingual speaker. She is bilingual in English, and has continued to be involved in language work, as a linguist and as a teacher. At the time of the elicitation session, co-author Sonya Bird was in Tŝilhqot’in territory for other reasons and had the unique opportunity to work with her for an afternoon. Because we only had a very limited time together, we made do with available word lists, based primarily on Cook’s (1989, 1993, 2004) materials; this is reflected in the unevenness of the token numbers across conditions, summarized in Table 2 below. Unfortunately, we were limited to making audio recordings, and so were not able to capture articulation directly; this is clearly an area for future exploration.
We recognize that working with a single speaker and using recording materials that were not all specifically tailored for this study has implications in terms of the reliability of the patterns described below. We note that the variable realizations of /z/ and /zʕ/ in the speaker’s recordings are matched by several speakers in the FirstVoices Tŝilhqot’in (Xeni Gwet’in) language portal (FirstVoices 2021) [Footnote 3], providing evidence that these are robust patterns that hold across speakers of the language.
2.1 Stimuli and recording procedure
Stimuli consisted of words extracted from Cook’s (1989, 1993, 2004) materials [Footnote 4] illustrating Tŝilhqot’in sounds, complemented by materials available from a Field Methods course offered in spring 2006 at the University of Victoria by Dr. Leslie Saxon. Impressionistically, /z/ and /zʕ/ share phonetic properties with /ʁ/ and /l/, especially in coda position. Therefore, the set of words analyzed for this study included ones containing /z/ and /zʕ/ as well as representative tokens of /ʁ/ and /l/ for comparison. The words themselves ranged in length from one to six syllables, the majority being disyllabic, with three- and four-syllable words also being common (see Appendix A for the full word list).
The word list was recorded in a quiet room in the home of the speaker’s sister, using a Zoom H4N portable recorder and a head-mounted microphone set at approximately 3 cm from the consultant’s mouth. The microphone was kept in a fixed position for the entire recording session, ensuring that intensity could be reliably compared across tokens (Kingston 2008: 19). For each word, the speaker was asked to check that she knew the word, and (if so) to repeat it three times in a row. Because, in some cases, the pronunciation of the target sounds (/z/ and /zʕ/ in particular) varied across repetitions, all three repetitions of each word were included in the analysis.
Table 2 summarizes the number of tokens per phoneme analyzed [Footnote 5], organized by position. Note that the token counts are unevenly distributed and, in some cases, quite small. They also do not include word-initial onsets. For /z/ and /zʕ/, the token counts reflect the distribution of the sounds in available written materials. Cook (2013: 16) notes that his corpus does not include any word-initial /z/ or /zʕ/ tokens. Based on the materials available to us, /z/ also seems relatively infrequent in non-intervocalic, word-medial onset position; in short, /z/ occurs most often in VCV and VC# positions. The distribution of /zʕ/ is somewhat broader, including (in this dataset) a fair number of word-medial coda (VCCV) tokens as well. Note also that lexical tone was not incorporated into our study, because (i) we had no indication it might affect the realization of the target segments, and (ii) our dataset did not allow us to include it as a predictor variable, given that only seven words had a lexical high tone. [Footnote 6] /l/ and /ʁ/ did not vary much in their realization and were included for comparison only, and as such made up relatively small sets.
V = vowel; C = consonant; target consonant is bolded and underlined.
In 41 of the 308 elicited word tokens, the target phoneme was not present phonetically, most often in coda position (29/41). This turned out to be the case for all 15 tokens with /ʁ/ in coda position (12 final and three medial), e.g. bilogh /biluʁ/ ‘knife’ was pronounced [biloː], and for 12/45 words with /zʕ/ in medial coda position, e.g. tiẑlin /tizʕlin/ ‘Chilco Lake’, pronounced [tɛɫɛin]. In addition, /z/ was not realized in final coda position in 2/38 words. Note that in such cases of deletion, underlying consonantal retraction was generally still evident in the adjacent vowels, e.g. in /biluʁ/, /u/ is retracted to [o] (see Cook 2013: 24 example c.); in /tizʕlin/, both /i/ vowels are retracted, as is /l/. [Footnote 7] Deletion in coda position is mentioned in Cook (2013: 44) as a process affecting all continuant consonants; it was therefore not surprising to observe in this data set. In addition, target phonemes were not realized in 9/18 words with /ʁ/ in intervocalic position and 3/37 words with /zʕ/ in intervocalic position. All medial onsets (VCCV) were realized phonetically.
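The deletion counts reported above can be cross-checked with a short tally. The following Python sketch is purely illustrative (it is not part of the study's own scripts, and the variable names are our own); it uses only the figures stated in this paragraph and shows that they are internally consistent, leaving 267 realized tokens, the same total that is transcribed in Section 2.2.1.

```python
# Counts of target phonemes that were not realized phonetically,
# taken directly from the figures reported in the text.
deleted = {
    ("ʁ", "coda"): 15,           # 12 final + 3 medial
    ("zʕ", "medial coda"): 12,
    ("z", "final coda"): 2,
    ("ʁ", "intervocalic"): 9,
    ("zʕ", "intervocalic"): 3,
}

elicited = 308  # total elicited word tokens

# Deletions in any coda position (keys whose position label mentions "coda").
coda_deleted = sum(n for (_, pos), n in deleted.items() if "coda" in pos)
total_deleted = sum(deleted.values())
analysed = elicited - total_deleted

print(coda_deleted, total_deleted, analysed)  # 29 41 267
```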
2.2 Data analysis
Each token was analyzed qualitatively (Section 2.2.1) and quantitatively (Section 2.2.2), with a primary focus on the target consonant itself. As mentioned above, vowels provide robust information about the quality – retracted vs. non-retracted – of adjacent consonants. As such, we did include vowels adjacent to the target consonants in our analysis. However, small sample sizes by vowel meant that we were not able to conduct reliable statistical analyses on them.
2.2.1 Qualitative analysis
Qualitatively, tokens were categorized in terms of (i) manner of articulation (two levels: non-lenited vs. lenited), based on the presence of visible frication and/or formant structure (Lee-Kim 2014, Shao & Ridouane 2018, Katz & Pitzanti 2019); (ii) retraction (two levels: non-retracted vs. retracted), based on the quality of the target consonant and the adjacent vowels (observed auditorily and visually); and (iii) laterality (two levels: non-lateral vs. lateral), based primarily on auditory observation. Katz & Pitzanti (2019: 11) have pointed out the limitations of using subjective judgments to ‘force a binary classification onto continuous phonetic properties’. Given the complexity of the observed variation in /z/ and /zʕ/, it seemed nonetheless likely that these discrete categorizations would be useful in describing /z/ and /zʕ/ realizations and their distribution. Note that we use the general term ‘retraction’ to describe perceived backing of the tongue body (in both /z/ and /zʕ/), without specifying precisely where the articulatory target of this backing is (raised vs. lowered).
Coding was done using Praat textgrids (Boersma & Weenink 2018); six tiers were used, to mark segments (both target consonants and adjacent vowels) of interest, token number, underlying consonant and position, phonetic realization of manner specifically, phonetic realization more generally (via phonetic transcription) of consonant, adjacent vowel quality (sharp vs. flat), and phonemic transcription of adjacent vowel. We recognize that there are certain limitations to our auditory coding, and consequently to the acoustic analyses that are based on this auditory coding (Section 3.1.2). First, although we are trained phoneticians, we are not speakers of Tŝilhqot’in, and the auditory cues we used to classify sounds may not correspond exactly to those that fluent speakers would use; future work should include a perceptual study with fluent speakers, especially to disentangle realizations coded as having ambiguous (to us) place features. Second, many of the realizations were ambiguous, varying along dimensions (e.g. of lenition) in continuous ways, making auditory judgments challenging. [Footnote 8] To increase the reliability of our analyses, we coded the data in a two-step process. Initial auditory coding and transcription was carried out by co-author Bird. Subsequently, co-author Onosson independently transcribed the entire dataset, using a Praat textgrid that did not specify the underlying phonemes of the target sounds so as to minimize potential bias towards any particular phonetic realizations. We initially agreed on the transcriptions of 223 out of 267 tokens (an inter-rater agreement rate of 83.5%), both having noted a number of individual tokens of uncertain quality. A consensus of opinion was reached on several of these, bringing the number of agreed-upon tokens to 230, for an overall inter-rater agreement rate of 86.1%.
For those tokens which were not fully agreed-upon (37 of 267, or 13.9%), co-author Bird’s auditory coding was used in the quantitative analysis, given her more extensive experience listening to these sounds and the language more generally.
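The agreement figures above are simple percent agreement (agreed tokens divided by all tokens), not a chance-corrected statistic. As a minimal sketch, the following Python snippet reproduces them from the reported counts; the function name `agreement_rate` is our own illustrative choice.

```python
def agreement_rate(agreed, total):
    """Percent agreement, rounded to one decimal place."""
    return round(100 * agreed / total, 1)

total_tokens = 267

initial = agreement_rate(223, total_tokens)                   # before consensus discussion
final = agreement_rate(230, total_tokens)                     # after consensus discussion
unresolved = agreement_rate(total_tokens - 230, total_tokens)  # tokens still not agreed upon

print(initial, final, unresolved)  # 83.5 86.1 13.9
```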
2.2.2 Quantitative analysis
Quantitative analysis focused on several measurements automatically extracted using a Praat script. Within the target consonants themselves, we measured duration, mean intensity and band-pass filtered zero crossing rate (bp-zcr) as correlates of lenition, as well as spectral moments 1–4 and mean F1 & F2 (Hz) as correlates of retraction and lateralization.
Mean intensity (dB) across the duration of the target consonants was extracted from the Intensity object created from the Sound object using Praat’s To Intensity… function. Bp-zcr has been used as an alternative to harmonic to noise ratio (HNR) to quantify noisiness in a signal without reference to periodicity (Gordeeva & Scobbie 2010, Westerberg 2018); the higher the bp-zcr value, the noisier (i.e. less lenited) the sound is. Bp-zcr was measured in Praat following Westerberg (2018), separately within each one-third of target consonant duration, by dividing the number of zero crossings (taken from a PointProcess ‘zeros’ object, set by default to include both ‘raisers’ and ‘fallers’) by a third of the token duration. A mean per-token bp-zcr was then calculated in R by averaging across the three one-third rates. Spectral moments (centre of gravity, standard deviation, skewness and kurtosis) were included in our analysis, as measures of fricative place of articulation (Jongman, Wayland & Wong 2000). They were measured within a 30 ms Hamming window from the centre of each token, band-pass filtered between 200 Hz and 22,050 Hz [Footnote 9] and extracted at a power setting of 2.0 (the default). In addition to spectral moments, we also measured the mean first and second formants within the consonants themselves (Jassem 1965, Soli 1981, Alwan 1986, Jongman et al. 2000) because, even for realizations coded as (voiced) fricatives, formant structure was often visible within the consonant. F1 and F2 were calculated using the Burg method and the following settings: five formants, ceiling of 5500 Hz, 25 ms window length, and 50 Hz pre-emphasis.
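To make the bp-zcr calculation concrete, the following Python sketch mirrors the procedure described above: zero crossings are counted within each third of the consonant, each count is divided by the duration of that third, and the three rates are averaged. It is an illustration only, not the Praat and R code used in the study. It assumes the samples have already been band-pass filtered, and the function names are hypothetical.

```python
import math

def zero_crossings(samples):
    """Count sign changes between consecutive samples (rising and falling)."""
    count = 0
    for prev, cur in zip(samples, samples[1:]):
        if (prev < 0 <= cur) or (prev >= 0 > cur):
            count += 1
    return count

def mean_zcr_by_thirds(samples, sample_rate):
    """Mean zero-crossing rate (crossings per second) over three equal parts.

    Assumption: `samples` is the already band-pass filtered waveform of one
    target consonant. Crossings that straddle a boundary between two thirds
    are not counted in this simplified sketch.
    """
    n = len(samples)
    bounds = [0, n // 3, 2 * n // 3, n]
    rates = []
    for start, end in zip(bounds, bounds[1:]):
        chunk = samples[start:end]
        duration = len(chunk) / sample_rate  # seconds
        rates.append(zero_crossings(chunk) / duration)
    return sum(rates) / len(rates)

# Synthetic check: a 1000 Hz sine crosses zero about 2000 times per second.
sr = 48000
sine = [math.sin(2 * math.pi * 1000 * k / sr + 0.1) for k in range(2880)]
rate = mean_zcr_by_thirds(sine, sr)
```

A noisy fricative would yield a much higher rate than a periodic, approximant-like signal, which is why higher values are read as less lenited.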
In addition to measuring various acoustic properties within the target consonants themselves, we also measured F1 and F2 (using the same parameters as for the consonants) within the vowels preceding and following /z/ and /zʕ/ since, as mentioned above, vowel quality is a robust and reliable perceptual cue of retraction, or at least phonological pharyngealization, in Tŝilhqot’in. We split the vowels into three equal parts (beginning, middle, end), and measured mean F1 and F2 in each third. We referred to formants in the first third when describing vowels following target consonants, and to the last third when describing vowels preceding target consonants.
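The choice of vowel third can be sketched as follows. This Python snippet is a hedged illustration rather than the authors' script: it assumes a formant track is available as a list of per-frame values in Hz (at least three frames), and the function names and example values are hypothetical.

```python
def means_by_thirds(track):
    """Mean of a per-frame formant track (Hz) within each of three equal parts."""
    n = len(track)
    bounds = [0, n // 3, 2 * n // 3, n]
    return [sum(track[a:b]) / (b - a) for a, b in zip(bounds, bounds[1:])]

def adjacent_third_mean(track, vowel_position):
    """Return the mean for the third of the vowel nearest the target consonant.

    vowel_position: 'preceding' if the vowel comes before the target consonant
    (use the last third), 'following' if it comes after (use the first third).
    """
    first, _, last = means_by_thirds(track)
    if vowel_position == "preceding":
        return last
    if vowel_position == "following":
        return first
    raise ValueError("vowel_position must be 'preceding' or 'following'")

# Hypothetical F2 track (Hz) for a vowel whose F2 falls towards a following consonant.
f2_track = [1500, 1480, 1460, 1400, 1350, 1300]
print(means_by_thirds(f2_track))                   # [1490.0, 1430.0, 1325.0]
print(adjacent_third_mean(f2_track, "preceding"))  # 1325.0
```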
2.2.3 Statistical analysis
All statistical analyses of acoustic features and distributional properties were conducted in R (R Core Team 2020) running in RStudio (RStudio Team 2020) and used several sub-components of the tidyverse R package library (Wickham et al. 2019) as well as the stats package from the base R library for specific statistical functions. Statistical analyses were carried out using the following formulas. Chi-square tests of distributions: chisq.test(table(variable1, variable2)). ANOVAs: aov(dependent.variable ~ independent.variable (*ind.var2 *ind.var3)). Post-hoc testing for significant interactions within multivariate ANOVAs was carried out using Tukey’s Honest Significant Difference (Tukey 1953): TukeyHSD(anova.test.result). Statistical significance was determined for all tests at p < .05; we omit specific p-values when reporting results generally, except where discussing findings which fail to meet this threshold. Qualitative or categorical variables which were tested include the following factors: phoneme (two levels: /z zʕ/), preceding or following vowel (four levels: /a e i u/), preceding or following consonant (four levels: /ʁ l z zʕ/), manner (two levels: lenited, non-lenited), retraction (two levels: retracted, non-retracted), laterality (two levels: lateral, non-lateral). Quantitative or continuous variables which were tested included the following factors: duration, intensity, band-passed zero-crossing rate (bp-zcr), F1, F2, centre of gravity, standard deviation, skewness, and kurtosis.
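For readers unfamiliar with the chi-square test of a 2 × 2 contingency table, the following Python sketch shows the underlying computation. It is an illustration only: the analyses reported here were run in R, and the counts below are hypothetical rather than drawn from our data. The `yates` option mirrors the continuity correction that R's chisq.test applies to 2 × 2 tables by default; the function name is our own.

```python
def chi_square_2x2(table, yates=True):
    """Pearson chi-square statistic for a 2 x 2 table of counts (df = 1).

    table: [[a, b], [c, d]]. With yates=True, each |observed - expected|
    is reduced by 0.5 (but not below zero) before squaring.
    """
    (a, b), (c, d) = table
    n = a + b + c + d
    row_totals = [a + b, c + d]
    col_totals = [a + c, b + d]
    stat = 0.0
    for i, row in enumerate(table):
        for j, observed in enumerate(row):
            expected = row_totals[i] * col_totals[j] / n
            diff = abs(observed - expected)
            if yates:
                diff = max(0.0, diff - 0.5)
            stat += diff ** 2 / expected
    return stat

# Hypothetical counts, e.g. rows = two categories of one variable,
# columns = two categories of another.
example = [[10, 20], [20, 10]]
uncorrected = chi_square_2x2(example, yates=False)
corrected = chi_square_2x2(example, yates=True)

CRITICAL_05_DF1 = 3.841  # chi-square critical value for p < .05 with df = 1
significant = corrected > CRITICAL_05_DF1
```

With these hypothetical counts, both the corrected and uncorrected statistics exceed the critical value, so the two variables would be judged non-independent at p < .05.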
3 Results
In general, /z/ and /zʕ/ vary substantially from token to token and even from repetition to repetition within a given token, giving the impression of a somewhat underspecified articulatory and/or acoustic target. Nonetheless, certain patterns do emerge, pointing to syllabic (onset vs. coda) and segmental (adjacent segment) effects on phonetic realization. Results are presented in two parts: first, we describe the different phonetic realizations of /z/ and /zʕ/ with accompanying acoustic analysis (Section 3.1); second, we analyze the distribution of the different phonetic realizations, according to syllabic position and segmental environment (Section 3.2).
3.1 Phonetic realizations of /z/ and /zʕ/
In this section, we describe the acoustic and auditory variants of /z/ and /zʕ/. In Section 3.1.1, we compare the two phonemes to each other, to get a general sense of how /z/ and /zʕ/ differ phonetically. In Section 3.1.2, we explore surface realizations of both phonemes in more detail, in terms of variation in lenition, retraction, and lateralization as well as ‘buzziness’ (likely corresponding to dentalization; see below).
3.1.1 Acoustic correlates of /z/ vs. /zʕ/
The phonetic realizations of /z/ and /zʕ/ included relatively similar ranges of variation in manner, retraction, and laterality. Before exploring this variation in more detail, it is useful to compare the two phonemes in terms of their overall phonetic features.
As we shall see in Section 3.2, /z/ vs. /zʕ/ differ somewhat in the frequency of different realizations (see Tables 3 and 4). Nonetheless, we hypothesized that, overall, /zʕ/ realizations would exhibit acoustic measures reflective of pharyngealization, in particular raised F1 and lowered F2 both within /zʕ/ and in adjacent vowels (Shahin 2002, Namdaran 2006). A set of one-way ANOVAs (phoneme) was conducted across our suite of acoustic parameters to look for differences by phoneme (appendix Table B1). Acoustic correlates which distinguish between /z/ vs. /zʕ/ vary depending on the realization, and often do not reach the level of statistical significance, most likely due to low token counts in our data for certain realizations. Nevertheless, the trends are consistent: realizations deriving from underlying /zʕ/ tend to have longer duration, lower intensity, lower F2, and greater skewness, in comparison to realizations deriving from underlying /z/. Figure 2 plots kernel density estimate distributions (Rosenblatt 1956, Parzen 1962) of the various acoustic measures by phoneme. Those correlates which meet the level of statistical significance across all realizations are intensity (–1.6 dB for /zʕ/), F2 (–303 Hz for /zʕ/), centre of gravity (+228 for /z/), and standard deviation (+225 for /z/).
Figure 3 provides kernel density estimate plots of /z/ and /zʕ/ distributions by mean F1 (y-axis) and F2 (x-axis), as measured within the consonants. Overall, Figure 3 shows that the two consonants differ substantially along F2 but much less so along F1; this is also the case for the vowels adjacent to /z/ and /zʕ/ (Figure 4), similar to Zawaydeh & de Jong’s (2011) findings in Ammani-Jordanian Arabic. This indicates that, unlike in other languages (see Al-Tamimi 2017 on Arabic, and Shahin 1997 on St’át’imcets), what is termed ‘pharyngealization’ in Tŝilhqot’in is manifested primarily as backing, without substantial lowering.
As predicted based on Cook (2013) and also described for Arabic (Laver 1994, Embarki et al. 2011), adjacent vowels also reliably correlate with phonemic pharyngealization. For example, the final /a/ of teẑilhchaz /tezʕiɬt͡ʃaz/ ‘I started to fry it’ is realized as [æ] (with a relatively high F2) whereas in telhant’aẑ /teɬant’azʕ/ ‘crowberry’ it is realized as [ɑ] (with a relatively low F2). [Footnote 10] Figure 4 provides the acoustic vowel spaces before /z/ vs. /zʕ/ (a) and after /z/ vs. /zʕ/ (b). This plot is based on F1 and F2 measurements averaged over the third of the vowel that is adjacent to the target consonant, i.e. the last third for vowels preceding /z/ and /zʕ/ and the first third for vowels occurring after /z/ and /zʕ/. Although there are many more vowel tokens preceding than following /z/ and /zʕ/, the general pattern is the same: the vowel space is substantially further back (lower F2) and slightly lower down (higher F1) adjacent to pharyngealized /zʕ/ compared to its non-pharyngealized counterpart /z/. These results mirror those represented for the consonants themselves, in Figure 3.
To determine statistical significance of formant differences, two-way ANOVAs (phoneme, retraction) were conducted for each formant per vowel according to the preceding (appendix Table B2) or following (appendix Table B3) phoneme (for /u/, its scarcity in our dataset meant that we were only able to make comparisons when preceding but not following /z/ vs. /zʕ/). For vowels following the target consonant, the main differences occur in F2: /a/, /e/, and /i/ all have higher F2 values following /z/ than /zʕ/. The only significant F1 difference is for /e/, with F1 being lower following /z/ than /zʕ/. For vowels preceding the target consonant, only /a/ has a significantly higher F2 before /z/ than /zʕ/; only /i/ shows a significant F1 difference, but in the opposite direction from that expected: F1 is higher before /z/ than /zʕ/. Although statistical analysis is limited in reliability because of small sample sizes, the trends for F2 are consistent with the consonant-internal measurements (Figure 3). They also suggest that retraction effects are stronger on vowels which follow consonants compared to vowels preceding consonants, which is somewhat surprising given documented phonetic effects in other languages (Nolan 2017) and the phonological effects of retraction described for Tŝilhqot’in and more generally (Cook 1993, Zawaydeh & de Jong 2011).
As we shall see below, retraction is one component of the phonetic variation observed in both /z/ and /zʕ/, in addition to being contrastive at the phonological level in the form of pharyngealization. One of the most interesting things about the Tŝilhqot’in patterns described here is this blurred role of retraction in the sound system.
3.1.2 Phonetic variation across /z/ and /zʕ/
In analyzing /z/ and /zʕ/ auditorily, we noted variation along three dimensions: degree of lenition, degree of retraction, and lateralization. In this section, we explore the acoustic correlates of each of these dimensions in turn. The results are complex as a result of many interacting factors, and not all tendencies reach statistical significance. In this section, we report on clear tendencies; full statistical analyses are provided in appendix Tables B4–B8.
Considering lenition first, both /z/ and /zʕ/ vary in how lenited their realizations are, from clear fricatives to barely present approximants. We hypothesized that lenited forms should differ from non-lenited forms acoustically in terms of intensity and bp-zcr in particular (see Tables B6–B8). Figure 5 provides a density comparison of bp-zcr, which showed the larger effect size, in /z/ and /zʕ/ tokens coded as lenited vs. non-lenited. The distribution of bp-zcr values is much flatter for non-lenited (dashed lines) than for lenited tokens and, crucially, the mean is much higher, as expected: 3787 for non-lenited vs. 1337 for lenited.
Turning to retraction, we observed during auditory coding that it tended to coincide with lenition. We tested the correlation between the two, finding that they are indeed highly correlated ($\chi^{2}$ = 130.85, df = 1), such that lenited realizations also (strongly) tend to be retracted (of the 167 tokens coded as retracted, 153 or 92% were also coded as lenited). We expected retraction to be reflected in both consonantal and vocalic measures of place (spectral moments and formants). No significant independent effect of retraction was found on the consonantal measures (see Table B6). Retraction did tend to have an effect on the quality of adjacent vowels, although small token numbers per vowel made it difficult to reliably test this effect statistically (see Tables B2, B3). Figure 6 shows vowel formants plotted according to the following phoneme (/z/ or /zʕ/, the condition for which we have the most data) and for retraction (as coded auditorily). In general, F2 values largely match our expectations, tending to be lower adjacent to retracted versus non-retracted tokens (this effect was only significant in the case of /i/ occurring after a retracted consonant). For F1, values are lower for /e/ and higher for /i/ following retracted consonants, and lower for /u/ preceding retracted consonants. These preliminary F1 results indicate that the retraction we hear on both /z/ and /zʕ/ is possibly more accurately described as uvularization (see Zawaydeh & de Jong 2011 on Ammani-Jordanian Arabic), involving backing and slight raising rather than lowering. This would explain both lowering of F1 in /e/ vs. raising in /i/, as well as lowering of F1 in /u/, which is realized as [o] in retracted environments (Cook 2013).
The observed variation in retraction and lenition was such that, in some cases, both /z/ and /zʕ/ were realized as retracted and lenited [ʁ̞], which is also a common realization of underlying /ʁ/. In Figure 7, we compare what was transcribed phonetically as [ʁ̞] from intervocalic /zʕ/ vs. underlying /ʁ/, in the word teẑighin /tezʕiʁin/ ‘I started to pack or haul it’. The realization of both /zʕ/ and /ʁ/ is a short approximant [ʁ̞] (/zʕ/: 55 ms; /ʁ/: 48 ms), although underlying /zʕ/ is somewhat more lenited than underlying /ʁ/ in terms of intensity (69 dB vs. 65 dB) and formant structure (clearer in /zʕ/ than in /ʁ/). The vowel between /zʕ/ and /ʁ/ is underlyingly /i/; its transitional nature both out of the preceding /zʕ/ and into the following /ʁ/ indicates retraction of both flanking consonants.
Given the apparent neutralization of the phonemic contrast between /zʕ/ and /ʁ/ (Figure 7), and also /z/, it is worth examining the phonetic realization of these three phonemes, to see whether they are still distinguishable from one another acoustically. Figure 8 plots F2 (x-axis, as the most reliable correlate of place) and bp-zcr (y-axis, as the most reliable correlate of manner) by underlying phoneme for surface realizations of [ʁ̞]. Phonemic /ʁ/ mainly varies in bp-zcr over a limited range of F2 values, meaning it varies a fair amount in manner (degree of lenition), but is relatively consistent in place. Conversely, [ʁ̞] realizations of /z/ and /zʕ/ vary mainly in F2 over a limited range of bp-zcr values, meaning they vary in place (degree of retraction), with /z/ being somewhat more forward than /zʕ/ overall, but not much in manner (degree of lenition). We used one-way ANOVAs (phoneme) to test for significant differences among [ʁ̞] realizations corresponding to different phonemes (see Table B4). F2 was found to differ significantly by phoneme, but only between /z/ and /ʁ/, with a mean difference of +209 Hz for /z/. For bp-zcr, /ʁ/ differed significantly from both /z/ (+782) and /zʕ/ (+682), /ʁ/ exhibiting less lenition than /z/ and /zʕ/; /z/ and /zʕ/ did not differ from each other in bp-zcr. Other significantly different measures among [ʁ̞] realizations included intensity (lower for /ʁ/, no significant difference between /z/ vs. /zʕ/), centre of gravity (higher only for /ʁ/ vs. /z/), and standard deviation (higher only for /ʁ/ vs. /z/).
Laterality is the third dimension of variation observed in the dataset. Especially in coda position (see Section 3.2), many tokens of both /z/ and /zʕ/ were coded as dark [ɫ]. Figure 9 provides a comparison of /l/ and /z/ in séla ninq’ez /séla ninq’ez/ ‘my hands are cold’, both of which are realized as lateral approximants. Lateralized realizations of /z/ and /zʕ/ typically sounded more retracted than underlying /l/. In Figure 9, this is reflected by the formant values of /l/ vs. /z/, especially F2 (1296 Hz for /z/ vs. 1800 Hz for /l/). In addition, /z/ is preceded by the retracted allophone of /e/: [ʌ]. Statistical analysis shows that tokens coded as lateral generally have longer durations than ones coded as non-lateral, in addition to having lower intensity and higher skewness values (Table B7).
Similar to the potential neutralization of /z/, /zʕ/ and /ʁ/ resulting from combined effects of lenition and retraction (Figures 7 and 8), lateralization potentially leads to neutralization of /z/, /zʕ/, and underlying /l/. Figure 10 plots F2 (x-axis; as the most reliable correlate of place) and bp-zcr (y-axis; as the most reliable correlate of manner), by underlying phoneme. One-way ANOVAs (phoneme) show that surface lateral realizations are distinguished by several acoustic parameters (Table B5), including F2 (higher in /z/ vs. /l/ and /zʕ/, no significant difference between /l/ and /zʕ/) and bp-zcr (lower in /l/ vs. /z/ and /zʕ/, no significant difference between /z/ and /zʕ/). The F2 results are surprising, since our perception was that /l/ is realized as a lighter lateral than both /z/ and /zʕ/ (see Figure 9). Bp-zcr results reflect the fact that /l/ is a true (and consistent) approximant, whereas /z/ and /zʕ/ are more variable in manner, even when coded as [ɫ].
In addition to the three main dimensions of lenition, retraction, and lateralization, we observed what we called ‘buzziness’ in some of the /z/ and /zʕ/ tokens (39/88 and 43/129, respectively). Unfortunately, it was not possible to video record the elicitation session, but in discussing the articulatory details of /z/ and /zʕ/ with the speaker, she confirmed that she clenched her jaw during these sounds, as her mother had taught her. She also mentioned that, when she taught these sounds to children, she showed them her teeth and told them to keep their teeth closed. This articulatory tension sometimes leads to secondary dentalization superimposed on the primary articulation, resulting in the auditory impression of buzziness (see also Zhou & Wu Reference Zhou and Wu1963 and others on the Chinese apical vowel). Acoustically, the buzzy nature of some [z] and [zʕ] tokens is reflected in their spectral composition: it is not quite as ‘clean’ as that of canonical /z/, with more noise throughout the frequency ranges, and especially below 4500 Hz.
In our data, buzziness was strongly correlated with manner ( $ \chi^{2} $ = 62.809, df = 1), with 61 $\%$ of buzzy tokens also coded as fricatives and 89 $\%$ of non-buzzy tokens coded as approximants. Given the speaker’s description of /z/ and /zʕ/, it is not surprising that buzziness occurred primarily as a secondary effect of jaw clenching specifically in the more closed (fricative) realizations of the phonemes. Statistically, we did not find a reliable independent effect of buzziness on bp-zcr, as expected based on previous literature (Gordeeva & Scobbie Reference Gordeeva, Scobbie, Fuchs, Toda and Żygis2010, Westerberg Reference Westerberg2019). We did, however, find interactions between buzziness and other factors (Table B8). In terms of manner, buzzy tokens had lower cog values than non-buzzy tokens within non-lenited tokens specifically.
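The chi-square test of independence behind this kind of statement can be sketched as follows. The cell counts below are an assumption: they are back-calculated from the rounded percentages above and from the buzzy token totals given earlier (82 buzzy tokens out of 217), on the further assumption that all tokens entered the test. They are therefore approximations, not the study's actual contingency table, and the function name `pearson_chi_square` is illustrative.

```python
def pearson_chi_square(table):
    """Pearson chi-square statistic (no continuity correction) for a 2-D count table."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    n = sum(row_totals)
    chi2 = 0.0
    for i, row in enumerate(table):
        for j, observed in enumerate(row):
            # Expected count under independence of the row and column factors.
            expected = row_totals[i] * col_totals[j] / n
            chi2 += (observed - expected) ** 2 / expected
    df = (len(row_totals) - 1) * (len(col_totals) - 1)
    return chi2, df

# ASSUMED counts, reconstructed from rounded percentages (not the published table).
# Rows: buzzy, non-buzzy. Columns: fricative, approximant.
approx_table = [
    [50, 32],   # about 61% of 82 buzzy tokens coded as fricatives
    [15, 120],  # about 89% of 135 non-buzzy tokens coded as approximants
]

chi2, df = pearson_chi_square(approx_table)
print(f"chi-square = {chi2:.2f}, df = {df}")
```

Because the counts are reconstructed from rounded figures, the statistic lands near, but not exactly on, the reported value of 62.809.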
We end this section by noting that we observed a number of tokens that were ambiguous and/or transitional in their realization. With respect to manner, several word-final tokens transitioned from a relatively open (approximant) to a relatively closed (fricative) sound, e.g. the final /z/ in jíz /d͡ʒíz/ ‘inside’ was realized as [ʁ̞͡ʁ]. Note that this pattern of realization is opposite to what has been described for Chinese apical vowels (discussed in Section 4), which go from more to less constricted (Shao & Ridouane Reference Shao and Ridouane2018, Reference Shao and Ridouane2019). With respect to place, realizations were also especially variable in coda position, with token(s) coded as [ð], [z∼ð], [l] (light), [ʁ̞∼ɮ], [ɬ∼ɮ], [ʁ̞∼ɬ∼ɮ], and [ɮ∼ʁ̞]. Such realizations point to the fact that /z/ and /zʕ/ varied not only along three dimensions, but also along continua within these dimensions, and especially in coda position.
Summarizing so far, /z/ and /zʕ/ differ from one another in intensity (likely because of lenition patterns) as well as in acoustic features associated with place, with /zʕ/ showing lower F2, cog, and standard deviation values in particular, indicating tongue body/root backing. For both consonants, lenited tokens were associated with higher intensity and lower bp-zcr (correlates of manner), as well as lower F2, centre of gravity & standard deviation, and higher skewness & kurtosis (correlates of place, reflecting the strong correlation between retraction and lenition). No significant effects of retraction were found within the consonants themselves, but adjacent vowel formants (F2 in particular) showed that retraction was associated with tongue body/root backing (lowered F2). This preliminary finding supports the fluent speakers’ perceptions (see Section 2) that Tŝilhqot’in retraction is carried on the vowels rather than the consonants. Lateralization of /z/ and /zʕ/ was associated with longer duration, lower intensity, and higher skewness. Finally, buzzy tokens were significantly correlated with non-lenited forms, and were associated with lower cog values than non-buzzy tokens.
3.2 Distribution of /z/ and /zʕ/ variants
Now that the phonetic variants of /z/ and /zʕ/ have been described, we consider their distribution across prosodic positions (Section 3.2.1) and segmental environments (Section 3.2.2).Footnote 11 The analysis is based on our auditory classification of /z/ and /zʕ/ realizations (via transcription), as supported by the acoustic measures summarized in Section 3.1.
3.2.1 Phonetic realizations across prosodic positions
Based on Cook’s (Reference Cook1993, Reference Cook2013) descriptions of /z/ and /zʕ/ as well as on more general effects of sonority (Clements Reference Clements1990), we expect the clearest predictor of phonetic realization to be prosodic position, with more lenited and lateral realizations in coda position than elsewhere. Tables 3 and 4 summarize the number of tokens of each phonetic realization by underlying consonant, in intervocalic position (Table 3) and in final coda position (Table 4). We focus on these positions because they are the most common ones in our dataset (see Table 2 above).
Intervocalically (Table 3), the most common realization of /z/ is a retracted, lenited approximant (coded as [ʁ̞]) and the most common realization of /zʕ/ is a non-retracted, non-lenited fricative (coded as [z]). Crucially, unlike in coda position, intervocalic /z/ and /zʕ/ have no lateral component, with the exception of a single token coded as ambiguous [ɮ∼ʁ̞].
In coda position (Table 4), the most common realization is [ɫ], especially for /zʕ/. Note that in addition to the major realizations illustrated above, a few realizations were observed only once or twice: [ð], [z∼ð], [l] (light), and hybrid realizations [ʁ̞∼ɮ], [ɬ∼ɮ], and [ʁ̞∼ɬ∼ɮ]. These reflect the fact that /z/ and /zʕ/ are particularly variable and ambiguous in coda position, much more so than in onset position.
Considering prosodic position as a whole (including medial onsets and codas as well; see Table 2), degree of lenition is significantly correlated with prosodic position for /zʕ/ ( $ \chi^{2} $ = 50.64, df = 3); for /z/, the relationship is weaker, falling slightly above the level of significance ( $ \chi^{2} $ = 6.79, df = 3, p = .079). In coda position, both medial and final, there is an overwhelming tendency (between 75 $\%$ and 100 $\%$ ) for both phonemes to be lenited. Conversely, in medial onset position there is equally strong resistance to lenition (67–100 $\%$ ). These findings support cross-linguistic tendencies related to sonority and syllable structure, whereby preference is for low-sonority onsets and high-sonority codas (Clements Reference Clements1990). Intervocalically, the two phonemes differ, with /z/ leniting as it does in coda position and /zʕ/ resisting lenition as it does in onset position. However, for both phonemes, intervocalic position shows the weakest tendency towards categorical behaviour, i.e. there is more variability in lenition intervocalically than in any other prosodic position.
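The p-value reported for /z/ can be recovered from the test statistic and its degrees of freedom. The sketch below uses the standard closed-form upper-tail probability of the chi-square distribution for odd degrees of freedom, which covers the df = 1, 3, and 5 tests reported in this section. The helper name `chi2_sf_odd_df` is illustrative, and nothing here should be read as the software actually used for the analysis.

```python
import math

def chi2_sf_odd_df(x, df):
    """Upper-tail probability P(X >= x) for a chi-square variate with odd df."""
    if df < 1 or df % 2 == 0:
        raise ValueError("this closed form only covers odd degrees of freedom")
    # Finite series 1 + x/3 + x^2/(3*5) + ... with (df - 1) / 2 terms.
    series, term = 0.0, 1.0
    for j in range((df - 1) // 2):
        if j > 0:
            term *= x / (2 * j + 1)
        series += term
    tail = math.erfc(math.sqrt(x / 2))
    return tail + math.sqrt(2 * x / math.pi) * math.exp(-x / 2) * series

# Statistics as reported in the text for lenition by prosodic position.
p_z = chi2_sf_odd_df(6.79, 3)     # /z/
p_zq = chi2_sf_odd_df(50.64, 3)   # /zʕ/
print(f"/z/: p = {p_z:.3f}; /zʕ/: p = {p_zq:.2e}")
```

With df = 3 the conventional .05 threshold corresponds to a statistic of about 7.815, which is why 6.79 falls just short of significance while 50.64 clears it comfortably.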
Based on cross-linguistic findings that, in articulatorily complex segments, tongue body articulations are more dominant in coda than in onset position (Krakow Reference Krakow, Huffman and Krakow1993, Gick et al. Reference Gick, Campbell, Oh and Tamburri-Watt2006), we predicted that coda position would also lead to higher numbers of tokens perceived as retracted. Although this is generally the case, the effect of prosodic position on retraction differs between /z/ and /zʕ/, the correlation being significant only for /zʕ/ ( $ \chi^{2} $ = 39.833, df = 3). Both /z/ and /zʕ/ are categorically non-retracted in medial onset position. In final coda position, both phonemes tend to be produced with retracted realizations, although this tendency is relatively weak for /z/. In intervocalic and medial coda positions, the two phonemes behave with opposite tendencies: /z/ tends to retract intervocalically and shows a weak tendency to non-retraction in medial codas; in other words, intervocalic and final coda positions behave similarly for /z/, and contrast with medial onsets and codas (which pattern together). In contrast, /zʕ/ tends not to retract intervocalically but is nearly categorically retracted in medial codas; in other words, intervocalic and onset positions behave similarly for /zʕ/, and contrast with medial and final codas (which pattern together). Note that the relatively weak effects of prosody on retraction reflect Cook’s (Reference Cook1993, Reference Cook2013) descriptions of /z/ and /zʕ/ variation, in which he refers to lenition and lateralization, but not retraction.
In support of Cook’s (Reference Cook1993) observations, there is a significant correlation between prosodic position and laterality with the effect being generally quite strong and consistent for both /z/ and /zʕ/ (/z/: $ \chi^{2} $ = 36.577, df = 3; /zʕ/: $ \chi^{2} $ = 89.101, df = 3): non-lateral realizations occur almost categorically in intervocalic and medial onset positions (with the tendency in the latter position being slightly weaker for /z/). In medial and final coda position, lateral realizations are nearly categorical, except for /z/ in final codas where there is only a weak tendency.
Finally, the relationship between buzziness and prosodic position is statistically significant for both /z/ ( $ \chi^{2} $ = 9.306, df = 3) and /zʕ/ ( $ \chi^{2} $ = 42.988, df = 3), reflecting the correlation between buzziness and manner (see 3.2.1). For /zʕ/, the relationship between manner and position is very clear (see above) and therefore predictive of buzziness in a straightforward way: non-lenited intervocalic /zʕ/ realizations are by and large buzzy; lenited coda realizations are non-buzzy. The only exception to this involves intervocalic lenited realizations, half of which are buzzy. For /z/, the relationship between manner and position is not as clear, and therefore neither is the relationship between buzziness and position: lenited intervocalic /z/ realizations are almost categorically non-buzzy and non-lenited coda realizations are entirely buzzy, but other manner–position combinations also occur and are more variable. Overall, the observed patterns of buzziness support the idea that intervocalic /zʕ/ is syllabified as an onset, with lenition and hence non-buzziness occurring only in coda position, whereas this is not so clearly the case for /z/.
The findings reported in Section 3.2.1 suggest that the prosodic affiliation of intervocalic consonants is worth investigating further. Across dimensions of variation, /zʕ/ shows relatively consistent syllabic effects, with intervocalic /zʕ/ behaving like onset /zʕ/, and in opposition to medial and final coda /zʕ/. In contrast, /z/ does not show such consistent syllabic effects: with respect to retraction, intervocalic /z/ patterns with medial and final coda /z/, and in opposition to onset /z/; with respect to lenition, intervocalic /z/ patterns with final coda /z/, and in opposition to medial coda or onset /z/; with respect to laterality, intervocalic /z/ patterns with onset /z/, in opposition to coda /z/ (medial and final); with respect to buzziness, /z/ final codas stand apart from other positions as the locus for buzzy realizations. This difference in syllabic affiliations between intervocalic /z/ and /zʕ/ is interesting, and worth delving into in more detail, especially given the complexity of syllabification of intervocalic consonants in other Dene languages (Bird Reference Bird2002).
3.2.2 Phonetic realizations across segmental environments
In addition to prosodic position, a number of studies have shown that segmental environment plays a role in lenitionFootnote 12 patterns in particular (Kirchner Reference Kirchner2001, Reference Kirchner, Hayes, Kirchner and Steriade2004; Kingston Reference Kingston2008; Ennever et al. Reference Ennever, Meakins and Round2017). In the dataset we are working with, /z/ and /zʕ/ occur either adjacent to vowels or (in a small number of cases) to resonants. We focus here on the potential role of preceding vowels in predicting the phonetic realization of /z/ and /zʕ/, since this is the condition for which we have the most data.
There is ongoing debate about whether the quality of adjacent vowels affects degree of consonant lenition, and existing findings are conflicting in terms of the direction of possible effects (Ennever et al. Reference Ennever, Meakins and Round2017).Footnote 13 In Tŝilhqot’in, /z/ and /zʕ/ show a strong tendency to lenite following all oralFootnote 14 vowels (intervocalic and coda position), compared to following another consonant (medial onset position). Variation between vowels (significant only for /z/: $ \chi^{2} $ = 11.45, df = 3) suggests a prohibitive effect of vowel proximity on lenition. Ranking vowels according to lenition of /z/ (least to most lenited), we get the following scale: i (61 $\%$ ) > e (88 $\%$ ) > a, u (100 $\%$ i.e. categorical lenition). For lenition of /zʕ/ the rankings are: e (62 $\%$ ) > i (75 $\%$ ) > u (80 $\%$ ) > a (88 $\%$ ). In both cases, distal (articulatorily incompatible) [a] leads to the most cases of lenition, and the more proximal (articulatorily compatible) [e] and [i] lead to fewer cases of lenition. This supports Kirchner (Reference Kirchner2001, Reference Kirchner, Hayes, Kirchner and Steriade2004), and is also compatible with Iskarous et al.’s (Reference Iskarous, Christine Mooshammer, Daniel Recasens, Shadle and Whalen2013) model of coarticulatory resistance, which predicts that high and front vowels will resist coarticulatory effects more than low and back vowels (Recasens & Rodriguez Reference Recasens and Rodríguez2016).Footnote 15 To the extent that [u]’s behaviour is reliable (very few tokens exist, especially preceding /z/), it seems to reflect the articulatory specification of /z/ vs. /zʕ/: [u] patterns with distal [a] before /z/, but closer to proximal [e] and [i] before /zʕ/.
Recall from Section 3.1.2 that buzziness was strongly correlated with manner. The effect of vowel quality on lenition is reflected in its effect on buzziness as well (significant only for /zʕ/: $ \chi^{2} $ = 15.785, df = 5). For /z/, when ordered by buzziness (least to most buzzy), vowels are ranked as follows: u (0 $\%$ ) > a (22 $\%$ ) > e (38 $\%$ ) > i (61 $\%$ ); only /i/ is likely to produce buzzy realizations, which makes sense articulatorily if buzziness results from a high/fronted tongue body. For /zʕ/, the ranking is: a (17 $\%$ ) > u (20 $\%$ ) > i (28 $\%$ ) > e (45 $\%$ ). The ranking of vowels preceding /zʕ/ is exactly opposite for buzziness vs. lenition, reflecting the fact that less lenited realizations are consistently more buzzy. The rankings for vowels preceding /z/ by buzziness vs. lenition are more complex, reflecting the fact that the relationship between buzziness and lenition is also more complex, and partly dependent on prosodic position.
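The mirror-image relationship between the two /zʕ/ rankings can be checked directly from the percentages reported in the text. The short Python sketch below does only that; the variable and function names are illustrative, and no data beyond the reported percentages are involved.

```python
# Percentages reported in the text for vowels preceding /zʕ/.
lenition_pct = {"e": 62, "i": 75, "u": 80, "a": 88}
buzziness_pct = {"a": 17, "u": 20, "i": 28, "e": 45}

def rank_ascending(pct_by_vowel):
    """Return the vowels ordered from lowest to highest percentage."""
    return sorted(pct_by_vowel, key=pct_by_vowel.get)

lenition_order = rank_ascending(lenition_pct)
buzziness_order = rank_ascending(buzziness_pct)

# The rankings are mirror images if one is the exact reverse of the other.
is_mirror_image = lenition_order == buzziness_order[::-1]
print(lenition_order, buzziness_order, is_mirror_image)
```

The same comparison is less clean for /z/, where [a] and [u] tie at categorical lenition, so a simple reversal check of this kind would depend on how the tie is broken.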
There is also a correlation between preceding vowel and retraction for both phonemes (/z/: $ \chi^{2} $ = 15.908, df = 5; /zʕ/: $ \chi^{2} $ = 16.925, df = 5). If we rank vowels according to retraction (least to most) for /z/, we get the following ranking: a (44 $\%$ ) > i (61 $\%$ ) > e (76 $\%$ ) > u (100 $\%$ ). In comparison to /z/, /zʕ/ shows a much greater incidence of retraction overall, reflecting its inherently retracted nature. When ranked according to retraction for /zʕ/, the vowels are ordered as follows: e (62 $\%$ ) > i (72 $\%$ ) > a (88 $\%$ ) > u (100 $\%$ ). For both phonemes, [u] categorically favours retraction, which supports the idea that what we coded as phonetically retracted corresponds to raised backing, or uvularization (compatible with [u]) rather than pharyngealization (Saltzman & Munhall Reference Saltzman and Munhall1989). The fact that [a] does not favour retraction of /z/ further supports this view, since one would expect [a] to favour pharyngealization, but not uvularization.
Finally, in terms of laterality, the patterns differ somewhat by phoneme, and both are statistically significant (/z/: $ \chi^{2} $ = 21.342, df = 5; /zʕ/: $ \chi^{2} $ = 22.461, df = 5). The order of vowels with respect to laterality (least to most lateral) of /z/ is as follows: u (0 $\%$ lateral) > e (29 $\%$ ) > i (42 $\%$ ) > a (100 $\%$ ); only [a] is followed categorically by lateral realizations. For /zʕ/, the order is: e (52 $\%$ ) > u (60 $\%$ ) > a (75 $\%$ ) > i (81 $\%$ ); all vowels favour lateralization of following /zʕ/, which reflects the near categorical tendency for /zʕ/ to be lateralized in coda position.
Overall, the findings presented in Sections 3.1 and 3.2 paint a relatively consistent pattern with respect to /z/ and /zʕ/ variation, even though not all results are statistically significant. Tokens coded as [ʁ̞], corresponding primarily to /z/ in intervocalic position (Table 3), are acoustically lenited, retracted, and non-buzzy. Tokens coded as [z], often corresponding to /zʕ/ in intervocalic position (Table 3), are acoustically non-lenited, non-retracted, and buzzy. Thus, lenition, retraction, and buzziness pattern together in where they occur. Tokens coded as [ɫ] are observed strictly in final coda position (Table 4) and this is reflected in the very clear results of laterality by position, across both /z/ and /zʕ/. In terms of segmental environment, distal vowels trigger lenition more so than proximal vowels; other effects vary by phoneme.
4 Discussion
Both the phonetic features of Tŝilhqot’in /z/ and /zʕ/ and their distributional properties across syllabic and segmental environments are reminiscent of patterns observed in other languages, locally and further afield. The discussion that follows considers how /z/ and /zʕ/ should be characterized phonetically and phonologically (Section 4.1), and whether the observed patterns can be described as lenition (Section 4.2).
4.1 Phonetic and phonological features of Tŝilhqot’in /z/ and /zʕ/
Phonetically, the defining features of Tŝilhqot’in /z/ and /zʕ/ include a characteristic ‘buzziness’ (accompanying coronal articulations in particular) and acoustic features compatible with engagement of both the tongue tip (TT; non-retracted/non-lateralized articulations) and the tongue body (TB; retracted/lateralized articulations), or what Laver (Reference Laver1994: 314) refers to as ‘double articulations’. Although phonologically, Tŝilhqot’in /z/ and /zʕ/ are clearly consonants, their phonetic characteristics are reminiscent of two other sounds described in the literature: the Chinese front apical vowel and the Swedish ‘Viby-i’ vowel (see Laver Reference Laver1994: Chapter 11/Section 11.3).
The Chinese sound that has traditionally been called the ‘front apical vowel’ has been the topic of much debate concerning its precise nature and whether it is best characterized (phonetically) as a vowel, an approximant, or a fricative (Karlgren Reference Karlgren1915, Ladefoged & Maddieson Reference Ladefoged and Maddieson1996, Yu Reference Yu1999, Duanmu Reference Duanmu2007, Lee-Kim Reference Lee-Kim2014). X-ray images in Zhou & Wu (Reference Zhou and Wu1963) and ultrasound images in Lee-Kim (Reference Lee-Kim2014) both show that the sound (transcribed here as [ɹ̪] following Lee-Kim) has a double articulation, with TT raising and TB raising/backing (see also Laver (Reference Laver1994) on double-articulations). In fact, Lee-Kim (Reference Lee-Kim2014) notes that there seems to be an inherent compatibility between dental TT articulations and TB retraction, citing Stevens, Keyser & Kawasaki (Reference Stevens, Jay Keyser, Kawasaki, Perkell and Klatt1986), who ‘conjecture that the dental constriction, which requires a flat tongue front, can be achieved more easily when the tongue back is retracted’ (p. 271). Lee-Kim notes that [ɹ̪] is ‘presumably unattested in any other language’ (p. 279). While the phonological distribution of [ɹ̪] is certainly a signature feature of Chinese languages, Tŝilhqot’in /z/ and /zʕ/ show that its phonetic realization is perhaps not so unique.
If Tŝilhqot’in /z/ and /zʕ/ indeed show the same kind of double articulation (TT and TB) as Chinese [ɹ̪], as our acoustic and auditory analysis suggests, this might explain why all of these sounds can exhibit what is referred to here as ‘buzziness’. Shao & Ridouane (Reference Shao and Ridouane2018, Reference Shao and Ridouane2019) describe the front apical vowel in Jixi-Hui Chinese, which they transcribe /z/. According to their 2019 articulatory investigation, Jixi-Hui /z/ has a high TB and a raised TT. They hypothesize that the high TT in particular might explain the presence of ‘abundant frication noise’ during the sound, which has also been described by Trubetzkoy (Reference Trubetzkoy1969) as ‘frication-like noise resembling a humming’ (p. 171) and by Chao (Reference Chao1961) as ‘a buzzing quality’ (p. 22).Footnote 16 Note that Jixi-Hui Chinese /z/ transitions from a fricative to a vowel. In cases where Tŝilhqot’in /z/ and /zʕ/ are transitional, they follow the opposite pattern, transitioning from a more lenited, sonorant sound to a less lenited one. This is not surprising, given the differences in their phonological status, Jixi-Hui Chinese /z/ acting as a syllable nucleus and Tŝilhqot’in /z/ and /zʕ/ acting as consonants.
Another language which exhibits a distinctly ‘buzzy’ sound is Swedish. Westerberg (Reference Westerberg2018, Reference Westerberg2019) has conducted the most recent work on the vowel termed ‘Viby-i’, which she describes as ‘an /i:/ variant with an unusual “thick”, “buzzing”, and “damped” quality (Engstrand et al. Reference Engstrand, Björnsten, Lindblom, Bruce and Eriksson1998)’ (Westerberg Reference Westerberg2019: 3696). Westerberg cites similar descriptions by previous authors: Björsten & Engstrand (Reference Björsten and Engstrand1999) ‘suggest that viby-i is a high central unrounded [ɨ], which may be produced with a raised tongue tip to amplify its “damped” quality’ (Westerberg Reference Westerberg2019: 3696–7). Frid et al. (Reference Frid, Schötz, Gustafsson and Löfqvist2015) report that ‘Viby-i is produced with a lower and backer tongue body, and different tongue tip behaviour, than [i:]’ (Westerberg Reference Westerberg2019: 3697). Based on acoustic evidence, Westerberg hypothesizes that Viby-i is a centralized vowel (similar to our auditory impressions of the most lenited versions of Tŝilhqot’in /z/ and /zʕ/) and links the low F2 of Viby-i to a complex tongue shape. Of course, the articulatory properties of Tŝilhqot’in /z/ and /zʕ/ can only be inferred from the acoustic signal here. Our hope is that, in the future, we might also have the possibility of conducting an articulatory study of these sounds. Particularly intriguing to us is the potential role of jaw clenching in producing /z/ and /zʕ/. Neither the literature on the Chinese front apical vowel nor that on the Swedish Viby-i mentions the jaw as a primary articulator, but according to the Tŝilhqot’in speaker we worked with, the jaw is tightly shut for /z/ and /zʕ/. Her descriptions provide support for Esling’s (Reference Esling2005) laryngeal articulator model of vowel production, which specifically includes the jaw as a primary articulator.
Although phonetically Tŝilhqot’in /z/ and /zʕ/ appear similar to the Chinese front apical vowel and to the Swedish Viby-i, these sounds differ in their phonological status. Prosodically, unlike the Chinese and Swedish sounds, Tŝilhqot’in /z/ and /zʕ/ are clearly consonants, acting as syllable onsets and codas, but never as nuclei. Segmentally, their precise phonological categorization is less clear. Although they are transcribed as fricatives by Cook (Reference Cook1993, Reference Cook2013), /z/ and /zʕ/ show prosodically-based variation that is typical of sonorants cross-linguistically. Gick et al. (Reference Gick, Campbell, Oh and Tamburri-Watt2006) and Krakow (Reference Krakow, Huffman and Krakow1993, Reference Krakow1999) discuss articulatory timing in liquids and nasals, respectively. Liquids and nasals are articulatorily complex, consisting of (at least) two gestures: TT + TB for liquids, and tongue (TT or TB) + velum for nasals. In such segments, both the magnitude of the anterior and posterior gestures and their relative timing can vary as a function of syllabic position (Gick et al. Reference Gick, Campbell, Oh and Tamburri-Watt2006): in onset position, if the posterior gesture occurs at all, it occurs relatively synchronously with the anterior gesture; in coda position, the posterior gesture precedes the anterior gesture and the anterior gesture is often lenited, sometimes to the point of disappearing altogether. For liquids (/l/ in particular), asynchronous timing in coda position leads to perceived dominance of the TB gesture, as in dark [ɫ]. For nasals, asynchronous timing in coda position leads to perceived nasal co-articulation with the preceding vowel. In Tŝilhqot’in, retracted (TB-dominant) variants of /z/ and /zʕ/ (especially lateral ones) tend to occur in coda position, whereas non-retracted tokens are more common in onset position (TT-dominant), especially for /zʕ/. 
This pattern matches that found among other sonorants across the world’s languages.
In the literature on inter-gestural timing, the anterior and posterior gestures are sometimes conceptualized instead as ‘consonantal’ and ‘vocalic’, where more anterior gestures (TT) are consonantal and more posterior gestures (TB) are vocalic. Inter-gestural timing is then described in terms of peripherality within the syllable: consonantal (anterior) gestures occur more peripherally, whereas vocalic (posterior) gestures occur more centrally (Sproat & Fujimura Reference Sproat and Fujimura1993). If we think of lenited variants in our data as more ‘vocalic’ and non-lenited variants as more ‘consonantal’, then the co-variation ( $ \chi^{2} $ = 73.263, df = 1) observed between lenition and retraction (see Section 3.1.2) makes good sense: as with sonorants in other languages, the more vocalic variants of /z/ and /zʕ/ are also retracted (dominance of TB gesture; 91 $\%$ of lenited forms), whereas the more consonantal variants are also non-retracted (dominance of TT gesture; 90 $\%$ of non-lenited forms).
The idea of /z/ and /zʕ/ patterning with sonorants is familiar from Interior Salish languages that neighbour Tŝilhqot’in. For example, van Eijk (Reference van Eijk1997: 4) notes that in St’át’imcets (Lillooet Salish; see Figure 1), /z/ behaves as a resonant, phonologically. Phonetically, van Eijk characterizes /z/ as a ‘lax’ fricative, implying that it is on the sonorant end of the fricative–sonorant continuum. Phonologically, it has the same distributional restrictions as other resonants in the language and, like more typical resonants, it also has a glottalized counterpart. Interestingly, van Eijk notes that, in the Mount Currie dialect, /z’/ allows free variation between [z’] and [l’] in coda position, e.g. /χez’p/ ‘ember(s)’ can be realized as [χez’p] or [χel’p]. This suggests that [z]–[l] allophonic variation is an areal phenomenon, and/or that [z] and [l] are compatible in some way that makes them likely allophones, not just in Tŝilhqot’in, but more generally.
Summarizing so far, we have seen that the acoustics of Tŝilhqot’in /z/ and /zʕ/ appear to indicate both TT and TB gestures, which vary in their magnitude and specific realization, and which sometimes lead to the perception of buzziness within the sound. Phonologically, /z/ and /zʕ/ behave similarly to /z/ in Interior Salish languages, which also has (phonological) properties of sonorants (rather than fricatives). We return to the phonological specification of /z/ and /zʕ/ in Section 4.2, after a discussion of whether their variation falls under the umbrella of lenition.
4.2 Tŝilhqot’in /z/ and /zʕ/ variation as lenition
Given the variation in manner observed across tokens of Tŝilhqot’in /z/ and /zʕ/, it is worth considering to what extent this variation can be described in terms of lenition (Kirchner Reference Kirchner, Hayes, Kirchner and Steriade2004, Kingston Reference Kingston2008, Warner & Tucker Reference Warner and Tucker2011, Ennever et al. Reference Ennever, Meakins and Round2017, Katz & Pitzanti Reference Katz and Pitzanti2019, Broś et al. Reference Broś, Żygis, Sikorski and Wołłejko2021, and others). Ennever et al. (Reference Ennever, Meakins and Round2017) summarize the factors that have been shown to play a role in lenition, including duration, prosodic position, and segmental environment (as well as consonantal place of articulation, which we will not consider here).Footnote 17
In terms of duration, cross-linguistic patterns show that lenited forms also tend to be shorter in duration, not surprisingly. A common analysis of this correlation is ‘undershoot’, where shorter durations (e.g. due to faster speech rates) lead to articulatory undershoot of targets. As Ennever et al. (Reference Ennever, Meakins and Round2017: 4) state, ‘the shorter the duration afforded to a constriction, the less likely full constriction will be achieved’. Interestingly, the lenited and non-lenited variants of Tŝilhqot’in /z/ and /zʕ/ are not significantly different in duration, and in fact the trend is in the opposite direction from that predicted: the average durations for lenited vs. non-lenited tokens are 182 ms and 170 ms, respectively. Warner & Tucker (Reference Warner and Tucker2011) report that, in American English, lenition occurs more often at faster speech rates and in less formal registers. Recall that the Tŝilhqot’in data were elicited in a word list task, and hence presumably reflect a relatively slow speech rate and formal register. Perhaps a correlation between duration and lenition would emerge in faster, more spontaneous speech. Given the observed variation more broadly though, across three dimensions (manner, retraction, laterality), it seems unlikely that shortened duration is a determining factor in the observed variation in /z/ and /zʕ/ realization.
In terms of prosodic position, intervocalic position is ‘widely accepted as the segmental environment most favourable for consonantal lenition’ (Ennever et al. Reference Ennever, Meakins and Round2017: 5; see also Kirchner Reference Kirchner, Hayes, Kirchner and Steriade2004). Different scholars propose different motivations for lenition in this position. According to Kirchner (Reference Kirchner2001, Reference Kirchner, Hayes, Kirchner and Steriade2004), intervocalic consonant lenition occurs for articulatory reasons, to reduce the effort required to move between vocalic (relatively open) and consonantal (relatively closed) articulatory targets. Kingston (Reference Kingston2008) argues that lenition occurs for perceptual reasons, so as not to interrupt the speech stream word-internally; this helps to focus the listener’s attention on morpheme and word boundaries (see also Katz & Pitzanti Reference Katz and Pitzanti2019). In Tŝilhqot’in, the most favoured site for lenition, particularly so for /zʕ/, is coda position, a position that does not require quick transitions between the target consonant and adjacent vowels (Kirchner Reference Kirchner2001, Reference Kirchner, Hayes, Kirchner and Steriade2004) and that would not otherwise interrupt the speech stream word-internally (Kingston Reference Kingston2008). This particular pattern of /zʕ/ variation goes directly against Kingston (Reference Kingston2008), who argues that while lenition is common within prosodic constituents (including the word), it is often disfavoured at their edges. The pattern for /z/ is less clear, and possibly more compatible with a lenition analysis, in that its most common realization in intervocalic position is lenited [ʁ̞]. One pattern that clearly does match cross-linguistic tendencies is that lenition is disfavoured in medial onset position, for both /z/ and /zʕ/ (Escure Reference Escure1977).
Recall that Tŝilhqot’in /z/ and /zʕ/ do not occur in word-initial onset position; it would be interesting to further investigate the potential relationship between the distribution of these sounds and their lenition patterns.
Although we were unable to test the effect of the adjacent consonant on lenition (Kingston Reference Kingston2008), we did test the effect of the adjacent (preceding) vowel (Ennever et al. Reference Ennever, Meakins and Round2017) and found that, for both /z/ and /zʕ/, lenition was favoured adjacent to distal ([a]) vowels and disfavoured adjacent to proximal ([e], [i]) vowels. To the extent that the vowel-environment effect is reliable, this supports Kirchner’s (Reference Kirchner2001, Reference Kirchner, Hayes, Kirchner and Steriade2004) effort-based model of lenition.
To summarize, while lenition is an important aspect of Tŝilhqot’in /z/ and /zʕ/ variation, the specifics of the observed patterns do not consistently reflect those found cross-linguistically: although vocalic environment does seem to play a role consistent with articulatory (effort-based) motivations for lenition, there is no correlation between (shorter) duration and lenition, and the syllabic position that favours lenition most consistently is (final) coda position rather than intervocalic position, especially for /zʕ/. Given that lenition is only one dimension of /z/ and /zʕ/ variation, this result is perhaps not surprising. Indeed, even if articulatory considerations could explain the lenition aspect of /z/ and /zʕ/ variation, they could not explain the full variation observed, which also includes laterality and retraction, both of which entail increased (rather than reduced) articulatory complexity.
That the variation observed in Tŝilhqot’in /z/ and /zʕ/ occurs across three dimensions (manner, retraction, and laterality) raises the question: What is the phonological specification of these sounds, and what IPA symbol most appropriately represents them? Without direct articulatory data, it is difficult to answer these questions. We provide some preliminary thoughts here, as a starting point for future work. Ennever et al. (Reference Ennever, Meakins and Round2017: 1–2) refer to Keating’s (Reference Keating1990) Window Model to explain observed variation in manner: segments are specified for ‘windows’ or ‘ranges’ of targets rather than single points. This model potentially works well for explaining variation along a single articulatory continuum, like manner (e.g. degree of lenition). It is not clear how well it would work for variation in features like retraction and laterality.Footnote 18 To answer this question, detailed articulatory work is needed to determine precisely which gestures create the auditory and acoustic characteristics of retraction and laterality in Tŝilhqot’in, and whether these also vary continuously in strength. Whether target gestures are single points or ranges, it seems clear that /z/ and /zʕ/ need to include retraction and laterality as part of their phonetic and phonological specification; otherwise, there would be no way of explaining why these properties turn up in their production.
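The idea of a target ‘window’ can be made concrete with a small sketch. The Python code below is an illustration only, under strong simplifying assumptions: constriction degree is reduced to a single normalized scale (0 = fully open, 1 = full closure), the window boundaries are hypothetical values rather than measurements from this study, and a simple clamping rule stands in for the interpolation between windows that Keating’s model actually proposes. The names `Window`, `WIDE`, and `NARROW` are introduced here for illustration.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Window:
    """Permitted range of values on one articulatory dimension.

    Assumed scale: 0 = fully open vocal tract, 1 = full closure.
    """
    low: float
    high: float

    def realize(self, context_value: float) -> float:
        """Return the value inside the window closest to the contextual value."""
        return min(max(context_value, self.low), self.high)


# Hypothetical windows (not measured values).
WIDE = Window(low=0.4, high=0.9)    # allows approximant-like and fricative-like outputs
NARROW = Window(low=0.8, high=0.9)  # allows little contextual variation

# Contexts ranging from open (vowel-like) to tightly constricted.
for context in (0.2, 0.6, 1.0):
    print(context, WIDE.realize(context), NARROW.realize(context))
```

With a wide window, the realized value tracks the context over much of its range, which is one way of picturing gradient lenition; with a narrow window, it barely moves. The sketch also shows the limitation noted above: a single continuous scale of this kind has no obvious counterpart for retraction or laterality unless those properties are also shown to vary continuously.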
5 Conclusion
The Tŝilhqot’in consonant inventory is incredibly rich and has yet to be documented phonetically in a thorough way. This study takes a first step in that direction, focusing on the acoustic characteristics of the voiced plain and pharyngealized coronal fricatives /z/ and /zʕ/. Findings show that these sounds vary along three main dimensions: manner (fricative–approximant), retraction (non-retracted–retracted), and laterality (non-lateral–lateral), in addition to buzziness, which is highly correlated with manner. Variation is based in part on syllabic position and segmental environment. A fuller description of these sounds requires studying their articulatory properties directly. We hope to be able to continue documenting the Tŝilhqot’in sound system, and that this work will advance our understanding of phonetic typology, while also contributing to pedagogical resources for teaching and learning Tŝilhqot’in pronunciation.
Acknowledgements
This research was conducted on the lands of the Tŝilhqot’in Nation (data collection), the Esquimalt, Songhees, and WSÁNEĆ Nations, and in Treaty One territory, original lands of the Anishinaabeg, Cree, Oji-Cree, Dakota, and Dene peoples, and the homeland of the Métis Nation. We are grateful to be able to live and work on these lands. We would like to thank the Tŝilhqot’in speakers who welcomed us into their homes and shared their language with us. We are also grateful to the audience members at the 2014 Phonetic Building Blocks of Speech conference, and to the two anonymous reviewers who provided us with such thorough and insightful feedback on our work. This work was funded by the Social Sciences and Humanities Research Council of Canada, grant # 410-2011-224.
Appendix A. Elicitation list
The word list was used to elicit the target phonemes for this study. To request access to the audio recordings that accompany the word list, please contact co-author Bird.
Appendix B. Statistical analyses
Shaded cells in the tables indicate results where no significant difference was found.
aSignificance: *** p < .001; ** p < .01; * p < .05
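For readers re-creating the tables from their own model output, the significance codes in the table note can be expressed as a small helper. This is a minimal sketch: the function name `significance_stars` is hypothetical, and the thresholds are taken directly from the note (strict inequalities, so a p-value of exactly .05 receives no star).

```python
def significance_stars(p: float) -> str:
    """Map a p-value to the star codes used in the table note."""
    if p < 0.001:
        return "***"
    if p < 0.01:
        return "**"
    if p < 0.05:
        return "*"
    return ""  # not significant (shown as a shaded cell in the tables)


# Example p-values chosen for illustration only.
for p in (0.0005, 0.004, 0.03, 0.2):
    print(p, repr(significance_stars(p)))
```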
Supplementary material
To view supplementary material for this article (including audio files to accompany the language examples), please visit https://doi.org/10.1017/S0025100322000093.