1 Introduction
The acoustic characterisation of vowel length contrasts in Australian English (AusE) is well documented. Vowel length contrasts in this variety are realised through temporal (Bernard 1967, Cochrane 1970, Fletcher & McVeigh 1993, Cox 2006, Cox & Palethorpe 2011, Cox, Palethorpe & Miles 2015), spectral (Bernard 1970, Cox 2006, Elvin, Williams & Escudero 2016), and dynamic characteristics (Harrington & Cassidy 1994, Watson & Harrington 1999, Cox 2006, Elvin et al. 2016). Less is known about the articulatory characteristics of vowel length contrasts in AusE. Fletcher, Harrington & Hajek (1994) compared jaw displacement in the long–short vowel pair /ɐː–ɐ/ (barb – bub) in /bVb/ syllables for three speakers, and found that /ɐː/ was consistently characterised by a lower target jaw position than /ɐ/. Blackwood Ximenes, Shaw & Carignan (2017) examined the articulation of a subset of AusE vowels produced in /sVd/ context by four speakers, and found that the average tongue dorsum position of /ɪ/ was lower and more retracted than that of its long equivalent /iː/. The present study expands on previous articulographic work by examining the lingual articulation of multiple long–short vowel pairs in AusE, allowing us to characterise the kinematic properties that underlie the realisation of this contrast.
1.1 Phonetic correlates of vowel length contrast
The primary cue to vowel length contrasts in languages such as AusE is vowel duration. Long vowels are prototypically produced with a greater acoustic duration than short vowels (Lehiste 1970, Lindau 1978). The acoustic duration of vowels is commonly measured from the onset to the offset of vowel voicing (House 1961, Lehiste & Peterson 1961, Bell-Berti & Harris 1981, Hertrich & Ackermann 1997). This measure is dependent upon the duration of laryngeal activity associated with vowel articulation (Bell-Berti & Harris 1981, Hertrich & Ackermann 1997). However, the durations of the supralaryngeal articulatory movements of the lips, jaw and tongue have been relatively understudied. Hertrich & Ackermann (1997) examined the duration of lip-opening gestures associated with German vowels, finding that, on average, the lip-opening movement of short vowels was approximately 80% the duration of that of long vowels, while the acoustic duration of short vowels was 60% that of long vowels. These results demonstrate that while vowel length-related durational contrast is specified across multiple articulators (e.g. lips, larynx), it appears to be specified differently across these articulators. However, it remains an open question whether differences between acoustic and articulatory characteristics of vowel duration occur in other languages.
In Dutch, English, German, and Swedish, long/tense and short/lax vowels often also differ with regard to their position in the vowel space, with the acoustic and articulatory targets of short vowels produced closer to the centre of the vowel space compared to their long equivalents (Lindblom 1963, Hadding-Koch & Abramson 1964, Lindau 1978, Nooteboom & Doodeman 1980, Jessen 1993, Hoole & Mooshammer 2002, Cox 2006, Harrington, Hoole & Reubold 2012, Elvin et al. 2016). Early accounts of vowel quality differences between long and short vowels proposed a physiological explanation, whereby the centralisation of short vowel targets was said to be due to biomechanical limitations on achieving the same phonological target as their long equivalents in a shorter time span (Lindblom 1963). In this undershoot account, the primary determinant of centralisation is vowel duration: the shorter the vowel, the more centralised its target (Lindblom 1963). However, vowel quality may be manipulated independently of vowel duration in the realisation of vowel length contrasts. In German unstressed syllables, short (lax) vowels are centralised but not shorter in duration than long vowels (Mooshammer & Fuchs 2002, Mooshammer & Geng 2008).
Furthermore, listeners appear to use both vowel quality differences and durational differences as cues to vowel length contrasts (Delattre 1962, Hadding-Koch & Abramson 1964, Mooshammer & Fuchs 2002, Gussenhoven 2007, Mády & Reichel 2007, Mooshammer & Geng 2008, Lehnert-LeHouillier 2010, Meister, Werner & Meister 2011, Tomaschek, Truckenbrodt & Hertrich 2015). Vowel quality differences and durational differences appear to be in a trading relationship in some languages; listeners rely less on durational cues to vowel length when presented with stimuli in which long–short vowel quality differences are exaggerated, and rely more on durational cues when quality differences are minimised (Delattre 1962, Hadding-Koch & Abramson 1964, Lehnert-LeHouillier 2010).
Vowel length contrasts are also characterised by differences in formant dynamics. The proportionate durations of three acoustic components (the acoustic onglide, the acoustic steady-state or target, and the acoustic offglide) have been shown to differ between long and short vowels. In American English (Lehiste & Peterson 1961), Canadian English (Nearey & Assmann 1986), German (Strange & Bohn 1998), and AusE (Bernard 1967, Watson & Harrington 1999, Cox 2006), short vowels have proportionately shorter acoustic steady-states and proportionately longer acoustic offglides than their long counterparts. These differences have also been observed in articulation, with short vowels in German and Slovak exhibiting proportionately shorter articulatory steady states than their long equivalents (Kroos et al. 1997, Hoole & Mooshammer 2002, Beňuš 2011). In German, short vowels also exhibit proportionately longer release intervals (the articulatory transition to following tautosyllabic consonants) than long vowels (Kroos et al. 1997, Hoole & Mooshammer 2002). Little is known about the dynamic articulatory properties of AusE vowels, or whether AusE exhibits vowel-length dependent patterns of articulation similar to those found in German.
1.2 Australian English
The AusE vowel inventory consists of 18 stressable vowels (Cox 2006, Cox & Fletcher 2017), including six long /iː eː ɐː oː ʉː ɜː/ and six short /ɪ e æ ɐ ɔ ʊ/ monophthongs (Figure 1; Cox 2006, Cox & Fletcher 2017).
Australian English uses a vowel length contrast (Cox & Palethorpe 2007), rather than the tense–lax contrast characteristic of other English varieties such as American English (Peterson & Lehiste 1960, House 1961, Lehiste & Peterson 1961). This is due to the contrastive status of duration in signalling the distinction between some vowel pairs; in particular, /ɐː–ɐ/, /eː–e/ and (for some speakers) /iː–ɪ/ (Watson & Harrington 1999, Cox 2006, Cox & Palethorpe 2007).
1.2.1 Duration
On average, AusE short vowels are 60% the duration of their long equivalents in voiced coda contexts (Cox 2006, Elvin et al. 2016). This is a larger durational difference than in the tense/lax contrast of General American English, in which lax vowels are approximately 75% the duration of their tense equivalents (Peterson & Lehiste 1960, House 1961). Relative durational differences are consistent across various phonetic contexts (Elvin et al. 2016). However, the absolute duration of AusE vowels is affected by vowel height, as in other English dialects (House 1961, Chen 1970, Cochrane 1970, Klatt 1976, Cox 2006, Elvin et al. 2016). In citation form /hVd/ context, /ɪ/ has a shorter absolute duration (∼140 ms) than both the short open vowel /ɐ/ (∼160 ms) and the short mid-open vowel /ɔ/ (∼170 ms) (Cox 2006). No previous studies of AusE have examined the duration of lingual activity associated with vowels, so it is not known how these durational contrasts manifest in the articulatory domain.
1.2.2 Target quality
Although duration is the primary cue to vowel length in AusE (Harrington & Cassidy 1994, Watson & Harrington 1999, Cox 2006), long and short vowel pairs also differ in target quality, with some vowel pairs intrinsically more spectrally and spatially differentiated than others.
/ɐː/ and /ɐ/ have largely overlapping vowel targets, and thus differ primarily in duration (Bernard 1970, Cochrane 1970, Harrington & Cassidy 1994, Watson & Harrington 1999, Cox 2006, Elvin et al. 2016). Cox (2006) examined 960 /hVd/ tokens from 120 female adolescent speakers of AusE. The mean F1 and F2 values of /ɐː/ (F1 = 856 Hz, F2 = 1451 Hz) did not differ significantly from those of /ɐ/ (F1 = 842 Hz, F2 = 1469 Hz). Likewise, Bernard (1970), in his cineradiographic study of AusE vowels, found a high degree of similarity between the lingual articulatory targets of /ɐː–ɐ/ in /hVd/ syllables. Conversely, Fletcher et al. (1994) found small but significant differences in jaw displacement between /ɐː/ and /ɐ/ in /bVb/ syllables, with /ɐ/ showing a more centralised jaw trajectory.
/iː–ɪ/ also share similar acoustic vowel targets. Cox (2006) found no significant difference in the mean F1 and F2 values of /iː/ (F1 = 391 Hz, F2 = 2729 Hz) and /ɪ/ (F1 = 402 Hz, F2 = 2697 Hz) produced by adolescent females in the 1990s. However, studies based on more recent acoustic data have suggested that in young AusE speakers /ɪ/ is marginally lower and more retracted than /iː/ (Cox, Palethorpe & Bentink 2014). These acoustic results are supported by recent articulatory studies in which /ɪ/ is produced with a significantly more retracted and lowered tongue dorsum than /iː/ (Blackwood Ximenes et al. 2017).
Unlike /ɐː–ɐ/ and /iː–ɪ/, /oː/ and /ɔ/ can be differentiated through their target formant values alone, independent of durational information (Bernard 1970, Watson & Harrington 1999, Cox 2006, Elvin et al. 2016, Cox & Fletcher 2017). The primary difference between /oː/ and /ɔ/ is in target F1 (/oː/ = 494 Hz, /ɔ/ = 708 Hz), although the pair also differs in F2 (/oː/ = 954 Hz, /ɔ/ = 1182 Hz; Cox 2006). Early articulatory analysis of /oː/ and /ɔ/ shows a clear differentiation of target tongue position for this pair (Bernard 1970). However, recent articulatory analyses highlight that the tongue dorsum positions at the target of /oː/ and /ɔ/ have a much larger degree of articulatory overlap than is reflected in the target F1 and F2 values of these vowels: /oː/ is articulated with a similar tongue dorsum height and a slightly more retracted posture than /ɔ/ (Blackwood Ximenes, Shaw & Carignan 2016, 2017; Ratko et al. 2016). Differences in lip rounding may also contribute to the F1 and F2 differences between /oː/ and /ɔ/. Blackwood Ximenes et al. (2017) observed that the long /oː/ had a greater degree of lip protrusion than the short /ɔ/ in three out of four recorded participants. More research is needed to determine whether differences in lip rounding are also present in other samples of AusE speakers.
1.2.3 Dynamic formant structure
Finally, long and short vowels differ in their dynamic formant structure in AusE (Harrington & Cassidy 1994, Watson & Harrington 1999, Cox 2006, Cox et al. 2014). On average, short vowels are produced with proportionately shorter acoustic steady-states and proportionately longer acoustic offglides than phonologically long vowels (Cox 2006). However, /iː/ and /ɪ/ differ further in dynamic formant structure, with /iː/ characterised by a prolonged acoustic onglide for some AusE speakers, giving it a semi-diphthongal quality [əi] (Harrington & Cassidy 1994, Harrington, Cox & Evans 1997, Watson & Harrington 1999, Cox 2006, Cox et al. 2014, Cox et al. 2015).
Collectively, this work suggests that different long–short vowel pairs may vary in the extent to which vowel length contrast is expressed by temporal (duration) or spectral/spatial (target formant values or target tongue position) information (Watson & Harrington 1999, Cox 2006).
This study will focus on the articulation of three long–short vowel pairs: /iː–ɪ/, /ɐː–ɐ/ and /oː–ɔ/. These pairs are distributed across three peripheral areas of the AusE vowel space (Figure 1). /iː–ɪ/ beat – bit are considered to contrast primarily in vowel length, although this pair also has an additional onglide contrast present in /iː/ (Cox 2006, Cox & Palethorpe 2007). /ɐː–ɐ/ cart – cut contrast primarily in length. The third pair, /oː–ɔ/ port – pot, are distinguishable by acoustic height in addition to length (Cox 2006), but have a high degree of lingual articulatory similarity (Blackwood Ximenes et al. 2016, 2017; Ratko et al. 2016).
1.3 Aims and predictions
The aim of this paper is to provide an empirical investigation of the lingual articulation of vowel length contrasts in AusE. The present study builds upon a largely acoustic description of AusE vowels. The few prior articulatory studies of AusE vowels have not focused on length contrasts (Bernard 1967, Blackwood Ximenes et al. 2017) or have included examination of only a single long–short vowel pair (Fletcher et al. 1994). We make the following predictions:
1. Durational differences in the lingual gestures (gesture onset to gesture offset) of long and short vowels should follow similar patterns as acoustic duration differences, with short vowel gestures having a shorter duration than long vowel gestures (Cox 2006, Elvin et al. 2016), but the magnitude of durational differences between long and short vowels should be reduced in the articulatory domain, as has been found in German (Hertrich & Ackermann 1997).
2. Although all long–short vowel pairs should exhibit similar articulatory targets, the degree of similarity is predicted to differ by vowel pair. The low vowel pair /ɐː–ɐ/ will have the most similar articulatory targets, whereas /iː–ɪ/ and /oː–ɔ/ will have less similar pairwise articulatory targets (Bernard 1970, Cox 2006, Elvin et al. 2016).
3. There will be a trading relationship between acoustic duration and spatial and kinematic differences in the realisation of vowel length contrast (Delattre 1962, Hadding-Koch & Abramson 1964, Lehnert-LeHouillier 2010). That is, the long–short pair with the most similar articulatory targets will exhibit the largest pairwise difference in acoustic duration, whereas the long–short pair with the least similar articulatory targets will exhibit the smallest difference in acoustic duration. This is in opposition to Lindblom’s (1963) target undershoot account, which predicts that the vowel pair with the least similar articulatory targets would exhibit the largest durational differences.
4. /oː/ will be produced with more lip rounding than /ɔ/.
5. In line with acoustic studies (Watson & Harrington 1999, Cox 2006, Elvin et al. 2016), long and short vowel gestures will be characterised by different dynamic articulatory patterns. Short vowels should exhibit a proportionately shorter period of articulatory stability around their mid-points and proportionately longer articulatory transitions to following consonants. However, the long vowel /iː/ should be characterised by a lengthy phonological onglide as is characteristic of AusE (Cox 2006).
2 Method
2.1 Participants
Participants were seven monolingual speakers of AusE (four females). Average age was 20.4 years (s.d. = 2.82). All participants were born in Australia and had at least one Australian-born parent. All reported no history of speech or hearing disorders. All received primary and secondary education within New South Wales, and were residents of the Greater Sydney region at time of recording.
2.2 Experiment materials
Vowel pairs /iː–ɪ/, /ɐː–ɐ/, /oː–ɔ/ were elicited in two symmetrical consonant contexts: /pVp/ and /tVt/ (Table 1). Consonant context conditions the duration, quality and formant dynamics of vowels (Stevens & House 1963, Klatt 1976, Jenkins, Strange & Edman 1983, Strange, Jenkins & Johnson 1983, Sussman et al. 1997, Strange & Bohn 1998, Hillenbrand, Clark & Nearey 2001, Strange et al. 2007, Pycha & Dahan 2016). As such, we included two consonant contexts to better determine which intrinsic differences between long and short vowels are maintained across consonant contexts, and which are contingent upon surrounding consonant identity.
The experiment was designed to carefully control for the effects of phonetic context on vowel articulation, which necessitated the use of a combination of both words and non-words. Studies have shown that participants may hyperarticulate novel or unfamiliar words (Umeda 1975, Klatt 1976, Fowler & Housom 1987). To minimise the potential influences on articulation due to lexical status and familiarity, all participants in this task undertook two practice sessions prior to recording.
A carrier phrase was used to create an antagonistic tongue dorsum position prior to and following the target item. /iː–ɪ/ were presented within the carrier phrase Star CVC heart /stɐː CVC hɐːt/. /ɐː–ɐ/ and /oː–ɔ/ were presented within the carrier phrase See CVC heat /siː CVC hiːt/, with focus on the target word. Prior to recording, all participants were familiarised with elicitation materials and instructed to read them aloud. If a participant pronounced the target word incorrectly or was unsure how to pronounce it, they were shown a written word that rhymed with the desired pronunciation. Participants then read each phrase from orthographic presentation on a computer screen in a sound-attenuated room. Presentation was self-paced.
The 12 target words (Table 1) were divided into two blocks: block one consisted of target words containing /iː/ and /ɪ/, and block two containing /ɐː ɐ oː ɔ/. Target words were randomised within blocks. Ten repetitions of each word within its carrier phrase (120 items) were elicited from each participant. Participant W1 terminated the experiment early, resulting in only eight repetitions for that participant.
2.3 Data acquisition
Articulatory data were recorded using a Northern Digital Inc. Wave Electromagnetic Articulography (EMA) system (Northern Digital Inc. 2016) at a sampling rate of 100 Hz. The placement of sensors is shown in Figure 2. Three lingual sensors were placed at the (1) tongue tip (∼6 mm from anatomical tongue tip), (2) tongue body (∼22 mm from tongue tip) and (3) tongue dorsum (∼40 mm from tongue tip). Sensors were also placed on the (4) upper lip, (5) lower lip and (6) lower gum line, to track jaw height. Reference sensors were placed on the (7) nasion and the protrusion of the (8) left mastoid and (9) right mastoid processes. Speech audio was recorded using a RØDE NT1-A shotgun microphone at a sampling rate of 22050 Hz.
2.4 Data processing
Articulatory sensor signals were corrected for head movement and rotated to a common coordinate system defined with respect to the rear of the upper incisors using the three reference sensors. For the analysis presented in this study we used data from the tongue dorsum (TD) sensor (Sensor 3 in Figure 2). The TD sensor was chosen as it exhibited the greatest displacement during vowel gesture production for all participants and vowel pairs (see Appendix Table A1). Articulatory signals were low-pass filtered and conditioned using a DCT-based discretised smoothing spline (Garcia 2010) and synchronised with the audio data.
2.5 Acoustic segmentation
Two acoustic landmarks were identified for each vowel: acoustic onset (Figure 3: (A)) and acoustic offset (Figure 3: (B)). In each recording, RMS energy was calculated in 20 ms Hamming-windowed intervals with 75% overlap over a 1.5-second interval centred on the target vowel. Working outwards from the peak RMS energy, the first and last points in time were located at which signal energy fell below 0.5% of maximum RMS energy (Tiede 2005). These acoustic estimates were superimposed on time-aligned waveforms and short-time spectrograms plotted up to 10000 Hz, and were inspected and manually adjusted by a trained phonetician when necessary (approximately 5% of tokens). Vowel acoustic duration (AcDur) was calculated as the difference between acoustic limits (B–A).
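The energy-based estimate described above can be illustrated with a minimal sketch. It is not the procedure implemented in the software used for this study: the function names `frame_rms` and `acoustic_limits` are hypothetical, the window and overlap follow the values given in the text, and the sketch assumes that the acoustic limits are the centres of the outermost frames whose energy remains at or above the threshold.

```python
import math

def frame_rms(signal, sr, win_s=0.020, overlap=0.75):
    """Return (frame-centre times, RMS energy) for Hamming-windowed frames."""
    n = int(round(win_s * sr))                      # samples per window
    hop = max(1, int(round(n * (1.0 - overlap))))   # step between frames
    window = [0.54 - 0.46 * math.cos(2 * math.pi * k / (n - 1)) for k in range(n)]
    times, rms = [], []
    for start in range(0, len(signal) - n + 1, hop):
        frame = [signal[start + k] * window[k] for k in range(n)]
        rms.append(math.sqrt(sum(x * x for x in frame) / n))
        times.append((start + n / 2) / sr)
    return times, rms

def acoustic_limits(times, rms, threshold=0.005):
    """Work outwards from the RMS peak; stop before energy falls below
    threshold * peak (0.005 corresponds to 0.5% of maximum RMS energy)."""
    peak = max(range(len(rms)), key=rms.__getitem__)
    cutoff = threshold * rms[peak]
    on = peak
    while on > 0 and rms[on - 1] >= cutoff:
        on -= 1
    off = peak
    while off < len(rms) - 1 and rms[off + 1] >= cutoff:
        off += 1
    return times[on], times[off]
```

Acoustic duration would then be the difference between the two returned times. Such estimates would still need visual checking against the waveform and spectrogram.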
2.6 Articulatory analysis
Acoustic and articulatory landmarks are illustrated for tokens of parp and pup in Figure 3. The topmost panel is the acoustic waveform, the middle panel is the velocity of the tongue dorsum (TD) sensor and the lower panel is the TD trajectory. For simplicity, velocity and displacement are shown only in the vertical dimension; however, measurements were based on the tangential velocity of the TD sensor in both horizontal (TD$_x$) and vertical (TD$_y$) dimensions. A trained phonetician located a lingual vowel gesture in each target word using the findgest algorithm in the MATLAB-based software package mview (Tiede 2005). The findgest algorithm uses the tangential velocity of a given sensor to locate several gesture landmarks (Figure 3).
Gestural onset (GONS) was the point before P1 where velocity dropped to 20% of P1 velocity; nucleus onset (NONS) was the point after P1 where velocity dropped to 20% of P1 velocity; nucleus offset (NOFFS) was the point before P2 where velocity dropped to 20% of P2 velocity; and gestural offset (GOFFS) was the point after P2 where velocity dropped to 20% of P2 velocity. Vowel gesture duration (GDur) spanned from vowel gesture onset (GONS) to vowel gesture offset (GOFFS).
Three intervals were demarcated in each vowel gesture (Chitoran, Goldstein & Byrd 2002, Gafos 2002): (i) Formation interval (FI) = vowel gesture onset (GONS) to gesture nucleus onset (NONS), (ii) Gesture nucleus (GN) = gesture nucleus onset (NONS) to gesture nucleus offset (NOFFS), (iii) Release interval (RI) = gesture nucleus offset (NOFFS) to vowel gesture offset (GOFFS). The durations of these three intervals were represented as proportions of the entire vowel gesture duration (FI%, GN% and RI%).
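The landmark definitions and the three proportionate intervals can be sketched as follows. This is a simplified illustration rather than the findgest implementation in mview: the function names are hypothetical, the indices of the two velocity peaks (P1 and P2) are assumed to be known already, and the threshold is passed in as `frac` rather than fixed.

```python
import math

def tangential_velocity(x, y, dt):
    """Central-difference speed of a sensor moving in the x-y plane."""
    v = []
    for i in range(len(x)):
        a, b = max(i - 1, 0), min(i + 1, len(x) - 1)
        v.append(math.hypot(x[b] - x[a], y[b] - y[a]) / ((b - a) * dt))
    return v

def crossing(v, start, step, level):
    """Walk from index `start` in direction `step` (+1 or -1) until the
    velocity is at or below `level`, or the end of the signal is reached."""
    i = start
    while 0 <= i + step < len(v) and v[i] > level:
        i += step
    return i

def gesture_landmarks(v, p1, p2, frac):
    """Return sample indices (GONS, NONS, NOFFS, GOFFS) given the indices
    of the two velocity peaks, p1 < p2, and a threshold fraction."""
    gons = crossing(v, p1, -1, frac * v[p1])
    nons = crossing(v, p1, +1, frac * v[p1])
    noffs = crossing(v, p2, -1, frac * v[p2])
    goffs = crossing(v, p2, +1, frac * v[p2])
    return gons, nons, noffs, goffs

def interval_proportions(gons, nons, noffs, goffs):
    """FI%, GN% and RI% as percentages of total gesture duration."""
    total = goffs - gons
    return (100 * (nons - gons) / total,
            100 * (noffs - nons) / total,
            100 * (goffs - noffs) / total)
```

Because the three intervals partition the gesture, the three percentages sum to 100 for every token.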
The choice of the current tripartite division of vowel gestures is informed by theories of gestural grammar (Browman & Goldstein 1990, Chitoran et al. 2002, Gafos 2002, Davidson 2004). Several studies have shown that linguistic grammars have access to and utilise the internal temporal structures of vowel and consonant gestures (see Gafos 2002 for review). Acoustic studies of vowels also utilise a tripartite division, particularly in reference to vowel length contrasts, where the durations of these three sub-vocalic intervals are important to differentiating vowel length in many languages, including AusE (Cochrane 1970, Watson & Harrington 1999, Cox 2006).
We also report Euclidean distances between the articulatory targets (TargDiff) of the three long–short vowel pairs. Euclidean distances were calculated for each participant between the centroid of the articulatory targets of each of the three long vowels (MAXC in Figure 3; $(\mathrm{CentroidTD}_{xl}, \mathrm{CentroidTD}_{yl})$) and the individual tokens of their short equivalents $(\mathrm{TD}_{xs_i}, \mathrm{TD}_{ys_i})$:
$$\mathrm{TargDiff}_i = \sqrt{(\mathrm{CentroidTD}_{xl} - \mathrm{TD}_{xs_i})^2 + (\mathrm{CentroidTD}_{yl} - \mathrm{TD}_{ys_i})^2}$$
A challenge of analysing articulatory data across participants is that differences in tongue shape, vocal tract size and sensor placement lead to cross-participant differences in constriction location that may not be linguistically meaningful. For example, a retraction of the TD sensor to 30 mm behind the front teeth (maxillary occlusal plane) may result in the production of a front vowel for one participant, or a back vowel for another participant, depending on the size and shape of each participant’s vocal tract (Blackwood Ximenes et al. 2017). To compare across participants, Euclidean distance measures were normalised through z-scoring (TargDiffz), as outlined by Lobanov (1971). Lobanov’s (1971) method was originally applied to vowel formants; however, it has recently been applied to the normalisation of EMA sensor positions (Shaw et al. 2016, Blackwood Ximenes et al. 2017).
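As a rough illustration of the distance measure and the normalisation step, the sketch below computes the distance from a long-vowel centroid to each short-vowel token and z-scores a set of values. The function names are hypothetical, and two details are assumptions rather than statements about this study's pipeline: the population standard deviation is used, and z-scoring is assumed to be applied separately to each participant's values.

```python
import math
from statistics import mean, pstdev

def lobanov(values):
    """Z-score one participant's values: (value - mean) / standard deviation."""
    m, s = mean(values), pstdev(values)
    return [(v - m) / s for v in values]

def targ_diff(long_tokens, short_tokens):
    """Euclidean distance from the centroid of the long-vowel targets to
    each short-vowel token; tokens are (x, y) sensor positions."""
    cx = mean(x for x, _ in long_tokens)
    cy = mean(y for _, y in long_tokens)
    return [math.hypot(cx - x, cy - y) for x, y in short_tokens]

# Long-vowel centroid is (1, 1); the short token lies 3 units away in x and 4 in y.
print(targ_diff([(0, 0), (2, 0), (1, 3)], [(4, 5)]))  # [5.0]
```

Z-scored values have a mean of 0 and a standard deviation of 1 within each participant, which removes differences in overall scale across vocal tracts before values are pooled.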
Lindblom’s (1963) target undershoot account would predict that long–short pairs with larger durational differences would also exhibit larger vowel quality differences. We therefore also included the difference between the time to target attainment of long and short vowels as a variable in our models examining vowel quality across the three long–short pairs. Difference in time to target attainment (ms; TimeTargDiff) was calculated for each participant between the average time to target of each of the three long vowels ($\mathrm{AverageTimetoTarg}_{l}$; GONS to MAXC in Figure 3) and the time to target of individual tokens of their short equivalents ($\mathrm{TimetoTarg}_{xs_i}$).
Lip protrusion was used as a measure of lip rounding in the present study, in line with previous work by Blackwood Ximenes et al. (2017). Degree of lip protrusion of /oː/ and /ɔ/ was calculated based on the average horizontal position of the UL and LL sensors (Figure 2 above) measured at the target of the lingual gesture of the vowel (MAXC; Figure 3). This average horizontal position was z-transformed by participant.
2.6.1 Data exclusion
A total of 816 target words were elicited (12 target words × 10 repetitions × 6 participants) + (12 items × 8 repetitions × 1 participant). In six coronal context tokens there were more than two velocity peaks on the TD sensor trajectory between the maximum constrictions of the onset and coda consonant: these tokens were excluded from further analysis. Seven further items were excluded due to mispronunciation and sensor tracking errors, leaving a total of 803 analysed target words.
2.7 Statistical analysis
Statistical tests were applied in R using the lme4 (Bates et al. 2015) and lmerTest (Kuznetsova, Brockhoff & Christensen 2017) packages.
For the dependent variables of (i) acoustic duration (AcDur), (ii) gesture duration (GDur), (iii) distance between articulatory targets (TargDiffz), (iv) lip protrusion of /oː/ and /ɔ/ (LPz), (v) proportionate formation interval duration (FI%), (vi) proportionate gesture nucleus duration (GN%) and (vii) proportionate release interval duration (RI%), we constructed linear mixed effects regression models with independent variables of vowel length (long, short), vowel pair (/iː–ɪ/, /ɐː–ɐ/, /oː–ɔ/) and consonant context (labial – /pVp/, coronal – /tVt/) with a three-way interaction. When exploring TargDiffz, the difference in time to vowel target was also included as a potential predictor.
To find the optimal model for each dependent variable, we used a top-down, step-wise model building strategy, in which each model was compared with a model one order less complex using log-likelihood ratios. Final models only included main effects and interactions that significantly improved model fit (p < .050). Participant differences were modelled using random intercepts for participant and repetition. In cases where a full random-effects structure resulted in model convergence issues or a singular fit, the random effect with the lowest variance was removed; this is in line with recommendations by Barr et al. (2013) and Bates et al. (2015). The random components of models were not of further interest and are not reported.
P-values for main effects were obtained through maximum likelihood tests with Satterthwaite approximations to degrees of freedom (Kuznetsova et al. 2017). Because the variable vowel pair had three levels (/iː–ɪ/, /ɐː–ɐ/ and /oː–ɔ/), we also conducted individual pairwise least-mean squares regression analysis (with Holm-Bonferroni corrections) using the emmeans package (Lenth 2019). This facilitated the comparison of the main effect of vowel pair, and interactions between vowel length and vowel pair and consonant context and vowel pair. For pairwise analysis, factors were coded as: vowel length: LONG = 0 and consonant context: LABIAL = 0. For vowel pair analysis: /iː–ɪ/ = 0. For the comparison between /ɐː–ɐ/ and /oː–ɔ/, /ɐː–ɐ/ = 0. Full summaries of all linear mixed effects models are provided in Appendix Tables A2–A8.
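For readers unfamiliar with the Holm-Bonferroni correction applied to the pairwise comparisons, the adjustment can be sketched as below. This is a generic illustration of the step-down procedure, not the emmeans implementation, and the function name `holm_bonferroni` is hypothetical.

```python
def holm_bonferroni(p_values):
    """Return Holm-adjusted p-values in the original order."""
    m = len(p_values)
    order = sorted(range(m), key=lambda i: p_values[i])  # smallest p first
    adjusted = [0.0] * m
    running_max = 0.0
    for rank, i in enumerate(order):
        # The k-th smallest p-value is multiplied by (m - k + 1), capped at 1.
        candidate = min(1.0, (m - rank) * p_values[i])
        # Enforce monotonicity: an adjusted p never falls below an earlier one.
        running_max = max(running_max, candidate)
        adjusted[i] = running_max
    return adjusted
```

With three pairwise comparisons, as for the three vowel pairs here, the smallest raw p-value is multiplied by 3, the next by 2 and the largest by 1.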
Euclidean distance measures are an incomplete measure of vowel target similarity as they fail to take into account distribution differences across the different vowels. Two vowel pairs may exhibit a similar distance between their centroids but due to different overall distributions of individual token values may have vastly different degrees of overlap (Warren Reference Warren2017). To overcome issues of different distributions across different vowels, Pillai-Bartlett scores have been used to examine spectral overlap in ongoing vowel mergers in acoustic literature (Hay, Warren & Drager Reference Hay, Warren and Drager2006, Hall-Lew Reference Hall-Lew2010, Nance Reference Nance2011, Wong Reference Wong2012, Havenhill Reference Havenhill2015). The Pillai-Bartlett score is one of the test statistics of MANOVAs. The higher the value of the Pillai-Bartlett score, the greater the difference between the two analysed distributions with respect to the dependent variables of the MANOVA (Hay et al. Reference Hay, Warren and Drager2006, Hall-Lew Reference Hall-Lew2010). Three MANOVA models were constructed (one for each vowel pair), with dependent variables of z-transformed TD fronting (TD xz ) and TD height (TD yz ) with the following equation:
Finally, because speech rate was not actively controlled during this experiment, we wished to determine whether differences in speech rate contributed to the observed differences in measured variables. We measured the onset of the target word to the onset of the target word in the following trial (token-to-token duration) as an approximation for speech rate. Token-to-token duration was a poor predictor in all the models analysed in this study, and as such was removed from all models during the model selection process. Appendix Figure A1 shows the correlation between token-to-token duration and the dependent variables analysed in this study.
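The token-to-token measure amounts to differencing successive target-word onsets. A minimal sketch, assuming onsets are available as a list of times in seconds (the values and the function name are hypothetical):

```python
def token_to_token_durations(onsets_s):
    """Interval from each target-word onset to the onset in the following trial."""
    return [later - earlier for earlier, later in zip(onsets_s, onsets_s[1:])]


# Hypothetical onsets (s) of the target word in four consecutive trials
onsets = [0.0, 2.1, 4.0, 6.3]
print(token_to_token_durations(onsets))
```

Note that under this definition the final token in a sequence has no following trial and therefore receives no value.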
3 Results
3.1 Vowel durations
3.1.1 Acoustic duration
We first wished to confirm that participants in the present study produced short vowels with a shorter acoustic duration than their long equivalents, in line with previous studies of vowel length in AusE (Cox Reference Cox2006, Elvin et al. Reference Elvin, Williams and Escudero2016). A linear mixed effects model was constructed using the method described in Section 2.7 with the following equation:
A full model summary is provided in Appendix Table A2.
Mean acoustic duration of short vowels was 62 $\%$ of the mean acoustic duration of long vowels (F(796) = 1720.9, p < .001; Table 2, Figure 4).
On average, /ɪ/ was 70 $\%$ the acoustic duration of /iː/, /ɐ/ was 57 $\%$ the acoustic duration of /ɐː/ and /ɔ/ was 61 $\%$ the acoustic duration of /oː/. There was a vowel length × vowel pair interaction (F(795) = 67.3, p < .001), indicating that the magnitude of difference between long and short vowels differed by vowel pair. The acoustic duration difference between /iː/ and /ɪ/ was smaller than the acoustic duration difference between /ɐː/ and /ɐ/ ( $\beta$ = −35 ms, t(802) = −11.0, p < .001) and the acoustic duration difference between /oː/ and /ɔ/ ( $\beta$ = −27 ms, t(796) = −8.5, p < .001). The acoustic duration difference between /ɐː/ and /ɐ/ was also greater than the acoustic duration difference between /oː/ and /ɔ/ ( $\beta$ = 8 ms, t(796) = 2.5, p = .012).
Overall, coronal context vowels exhibited longer acoustic durations than labial context vowels (F(796) = 30.2, p < .001). However, the magnitude of the effect of consonant context differed across the three vowel pairs (F(796) = 7.2, p < .001). The acoustic duration of /iː–ɪ/ did not differ between labial and coronal contexts (p = .765). The acoustic duration of /ɐː–ɐ/ was longer in the coronal than in the labial context ( $\beta$ = 8.3 ms, t(802) = 3.7, p = .001); this was also the case for /oː–ɔ/ ( $\beta$ = 12.6 ms, t(802) = 5.6, p < .001). The acoustic duration of /ɐː–ɐ/ and /oː–ɔ/ increased to a similar extent in the coronal context (p = .345).
Our results regarding vowel length are therefore congruent with prior acoustic studies of vowel length in AusE.
3.1.2 Gesture duration
Our first prediction was that lingual gestures of short vowels should be shorter than those of long vowels (Cox Reference Cox2006, Elvin et al. Reference Elvin, Williams and Escudero2016). To test this, a linear mixed effects model was constructed using the method described in Section 2.7 with the following equation:
A full model summary is provided in Appendix Table A3.
Mean duration of short vowel gestures was 90 $\%$ of the mean duration of long vowel gestures (F(786) = 59.3, p < .001; Table 2, Figure 4). The magnitude of difference between long and short vowels differed across labial and coronal contexts (F(787) = 7.2, p = .007). The difference in gesture duration between long and short vowels was smaller in coronal than in the labial context ( $\beta$ = 28 ms, t(787) = 2.7, p = .048).
Vowel gesture durations were shorter in the coronal than the labial context (F(787) = 405.7, p < .001). The gesture durations of all vowel pairs were shorter in the coronal than in the labial context, but the magnitude of the effect of consonant context on vowel gesture duration differed across vowel pairs (F(787) = 8.7, p < .001). Consonant context had the largest effect on the gesture duration of /oː–ɔ/. The gesture duration of /oː–ɔ/ shortened to a greater extent in the coronal context than the gesture duration of both /iː–ɪ/ ( $\beta$ = −48.2 ms, t(787) = 3.7, p = .001) and /ɐː–ɐ/ ( $\beta$ = −38.2 ms, t(787) = 3.0, p = .019). /iː–ɪ/ and /ɐː–ɐ/ shortened to a similar extent in the coronal (compared to the labial) context (p = .544).
3.2 Articulatory targets
Our second prediction posited that /ɐː–ɐ/ will exhibit the most similar articulatory targets of the three vowel pairs, whereas /iː–ɪ/ and /oː–ɔ/ will have less similar pairwise articulatory targets. Our third prediction posited that vowel duration and vowel quality would exhibit a trading relationship in AusE, such that vowel pairs with the largest acoustic duration difference would exhibit the smallest difference in target quality and vice versa. To determine this, the similarity in articulatory target tongue dorsum positions was compared for the three long–short vowel pairs produced in labial (/pVp/) and coronal (/tVt/) contexts (Table 3, Figure 5), using the method illustrated in Section 2.6. The z-transformed absolute Euclidean distance between the targets of long and short vowel pairs (TargDiffz) was modelled using the method described in Section 2.7, with the following equation:
The duration difference between time to long and short vowel target (TimeTargDiff) did not improve model fit (p = .536), so was not included in the present model. A full model summary is provided in Appendix Table A4.
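To make the TargDiffz measure more concrete, the sketch below z-transforms tongue dorsum coordinates and then takes the Euclidean distance between a long and a short vowel target. It is only an illustration: the coordinate values are hypothetical, the pooling used for the z-transform is an assumption (Section 2.6 gives the actual procedure), and the function names are not taken from the study.

```python
import math
from statistics import mean, stdev


def z_transform(values):
    """Centre on the mean and scale by the sample standard deviation."""
    mu, sd = mean(values), stdev(values)
    return [(v - mu) / sd for v in values]


def target_distance(long_xy, short_xy):
    """Absolute Euclidean distance between two (x, y) articulatory targets."""
    return math.dist(long_xy, short_xy)


# Hypothetical TD sensor positions (mm) at vowel target for four tokens;
# tokens 0 and 2 are long vowels, tokens 1 and 3 their short counterparts.
td_x = [1.0, 2.0, 3.0, 4.0]
td_y = [10.0, 9.0, 12.0, 11.0]
zx, zy = z_transform(td_x), z_transform(td_y)

# Distance between the first long token and the first short token
targ_diff_z = target_distance((zx[0], zy[0]), (zx[1], zy[1]))
print(round(targ_diff_z, 3))
```

Because the distance is computed on z-scores, it is expressed in standard deviation units rather than millimetres, which allows comparison across speakers with different vocal tract sizes.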
TargDiffz differed by vowel pair (F(376) = 31.8, p < .001). TargDiffz was greater between /iː/ and /ɪ/ than between /ɐː/ and /ɐ/ ( $\beta$ = −0.7, t(372) = −3, p < .001) but smaller than TargDiffz between /oː/ and /ɔ/ ( $\beta$ = 0.3, t(372) = 2.4, p = .017; Table 3, Figure 5). The TargDiffz between /ɐː/ and /ɐ/ was also smaller than the TargDiffz between /oː/ and /ɔ/ ( $\beta$ = 1.4, t(372) = 7.9, p < .001). Vowels produced in the coronal context had larger TargDiffz than vowels produced in the labial context (F(376) = 5.9, p = .016).
Overlap between the distributions of long and short vowel targets was also compared using Pillai-Bartlett scores. Pillai-Bartlett scores are shown in Table 3. Lower Pillai scores indicate more overlap between two distributions. /ɐː–ɐ/ exhibited the lowest Pillai-Bartlett scores (0.24) of the three vowel pairs, while /iː–ɪ/ (0.47) and /oː–ɔ/ (0.48) exhibited similar scores.
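For readers unfamiliar with the statistic, the following sketch shows how a Pillai-Bartlett trace can be computed for two groups of two-dimensional points, such as long and short vowel targets described by TD fronting and TD height. The data are hypothetical and the function name `pillai_score` is an assumption; this is a hand-rolled illustration of the MANOVA statistic rather than the software used in the study.

```python
def pillai_score(group_a, group_b):
    """Pillai-Bartlett trace, trace(H (H + E)^-1), for two groups of (x, y) points."""

    def mean_xy(points):
        n = len(points)
        return (sum(p[0] for p in points) / n, sum(p[1] for p in points) / n)

    def add_outer(matrix, dx, dy, weight=1.0):
        # Accumulate weight * [dx, dy]^T [dx, dy] into a 2x2 matrix
        matrix[0][0] += weight * dx * dx
        matrix[0][1] += weight * dx * dy
        matrix[1][0] += weight * dx * dy
        matrix[1][1] += weight * dy * dy

    grand = mean_xy(group_a + group_b)
    h = [[0.0, 0.0], [0.0, 0.0]]  # between-group (hypothesis) SSCP matrix
    e = [[0.0, 0.0], [0.0, 0.0]]  # within-group (error) SSCP matrix
    for group in (group_a, group_b):
        gm = mean_xy(group)
        add_outer(h, gm[0] - grand[0], gm[1] - grand[1], weight=len(group))
        for x, y in group:
            add_outer(e, x - gm[0], y - gm[1])

    t = [[h[i][j] + e[i][j] for j in range(2)] for i in range(2)]
    det = t[0][0] * t[1][1] - t[0][1] * t[1][0]
    # trace(H T^-1), using the closed-form inverse of a 2x2 matrix
    return (h[0][0] * t[1][1] - h[0][1] * t[1][0]
            - h[1][0] * t[0][1] + h[1][1] * t[0][0]) / det


# Hypothetical z-scored (fronting, height) targets
long_targets = [(0.0, 0.0), (1.0, 0.0), (0.0, 1.0), (1.0, 1.0)]
overlapping = [(0.0, 0.0), (1.0, 0.0), (0.0, 1.0), (1.0, 1.0)]
separated = [(10.0, 10.0), (11.0, 10.0), (10.0, 11.0), (11.0, 11.0)]

print(pillai_score(long_targets, overlapping))  # identical distributions
print(pillai_score(long_targets, separated))    # well-separated distributions
```

Identical distributions yield a score of zero, while well-separated distributions approach one, which is why lower scores are read as greater overlap.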
3.3 Lip rounding differences between /oː/ and /ɔ/
Our fourth prediction posited that lip rounding should be greater for /oː/ than /ɔ/. To investigate this, we compared lip protrusion of /oː/ and /ɔ/. Lip protrusion was calculated as the average horizontal position of the UL and LL sensors at the lingual target of the two vowels, z-transformed across participants. Differences in lip protrusion between /oː/ and /ɔ/ were modelled using the method described in Section 2.7, with the following equation:
A full model summary is provided in Appendix Table A5.
Overall, /oː/ was produced with more lip protrusion than /ɔ/ (F(198) = 143.4, p < .001; Figure 6), suggesting greater lip rounding for /oː/ than /ɔ/. Z-transformed lip protrusion was also greater for coronal than labial tokens (F(198) = 10.3, p = .002).
3.4 Interval durations
Our final prediction was that, in line with acoustic studies (Watson & Harrington Reference Watson and Harrington1999, Cox Reference Cox2006, Elvin et al. Reference Elvin, Williams and Escudero2016), long and short vowel gestures will be characterised by different dynamic articulatory patterns. Short vowels should exhibit a proportionately shorter period of articulatory stability around their mid-points and proportionately longer articulatory transitions to following consonants. However, the long vowel /iː/ should be characterised by a lengthy phonological onglide as is characteristic of AusE (Cox Reference Cox2006). To determine whether this was the case, proportionate durations of three intervals within each vowel gesture were compared across the three long–short vowel pairs using a linear mixed effects model constructed using the method described in Section 2.7. Full model summaries are provided in Appendix Tables A6–A8. Absolute and proportionate durations of the three sub-gestural intervals are presented in Table 2 and Figures 7 and 8. However, statistical analyses were undertaken only for the proportionate formation interval, gesture nucleus and release interval durations (FI $\%$ , GN $\%$ and RI $\%$ , respectively).
Figure 7 compares tongue dorsum displacement throughout the vowel gesture for each pair of vowels. For each vowel, TD displacement with respect to dorsal location at the (i) vowel gesture onset (GONS) is tracked at three gesture landmarks: (ii) Nucleus onset (NONS), (iii) Nucleus offset (NOFFS), and (iv) Gesture offset (GOFFS). At each landmark, displacement is calculated as mean Euclidean distance in the midsagittal plane between TD xy and TD xy at GONS. Timing of landmarks is expressed as a proportion of total vowel gesture duration (GDur).
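The landmark-based measures described above can be sketched as follows. This is an illustration under stated assumptions: the landmark times and sensor positions are hypothetical, and the mapping of intervals to landmarks (formation interval from GONS to NONS, gesture nucleus from NONS to NOFFS, release interval from NOFFS to GOFFS) is inferred from the landmark names rather than quoted from the study's definitions.

```python
import math


def interval_proportions(gons, nons, noffs, goffs):
    """Express each sub-gestural interval as a percentage of gesture duration."""
    gdur = goffs - gons  # total vowel gesture duration (GDur)
    return {
        "FI%": 100 * (nons - gons) / gdur,    # formation interval
        "GN%": 100 * (noffs - nons) / gdur,   # gesture nucleus
        "RI%": 100 * (goffs - noffs) / gdur,  # release interval
    }


def displacement_from_onset(td_xy, td_xy_at_gons):
    """Midsagittal Euclidean distance of TD from its position at gesture onset."""
    return math.dist(td_xy, td_xy_at_gons)


# Hypothetical landmark times (s) for a single vowel gesture
props = interval_proportions(gons=0.00, nons=0.08, noffs=0.14, goffs=0.20)
print(props)

# Hypothetical TD positions (mm) at gesture onset and at nucleus onset
print(displacement_from_onset((13.0, 6.0), (10.0, 2.0)))
```

By construction the three proportions sum to 100 $\%$, so a lengthening of one interval must be offset by a shortening of at least one other.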
We discuss results for each of the three intervals separately below.
3.4.1 Formation interval
A linear mixed effects model was constructed using the method described in Section 2.7 with the following equation:
A full model summary is provided in Appendix Table A6.
As shown in Figure 8, vowel length conditioned FI $\%$ of the three vowel pairs differently across the two consonant contexts (F(796) = 6.0, p = .003).
In the labial context, /iː/ had a longer FI $\%$ than /ɪ/ ( $\beta$ = −4 $\%$ , t(796) = −2.6, p = .030). In the coronal context, however, FI $\%$ of /iː/ did not differ from FI $\%$ of /ɪ/ (p = .300). FI $\%$ of /ɐː/ was shorter than FI $\%$ of /ɐ/ in both labial ( $\beta$ = 5 $\%$ , t(807) = 3.2, p = .008) and coronal contexts ( $\beta$ = 5 $\%$ , t(807) = 3.1, p = .009). The magnitude of difference between /ɐː/ and /ɐ/ did not differ across labial and coronal context (p = .999). FI $\%$ of /oː/ was shorter than FI $\%$ of /ɔ/ in the labial context ( $\beta$ = 5 $\%$ , t(807) = 3.4, p = .004), but did not differ in the coronal context (p = .898). In the labial context, the magnitude of difference in FI $\%$ between /ɐː/ and /ɐ/ did not differ from the magnitude of difference between /oː/ and /ɔ/ (p = .858).
3.4.2 Gesture nucleus
A linear mixed effects model was constructed using the method described in Section 2.7 with the following equation:
A full model summary is provided in Appendix Table A7.
Short vowels had shorter GN $\%$ than long vowels (F(796) = 236.7, p < .001; Figure 8, Table 2). There was also a vowel length × vowel pair interaction, indicating that the magnitude of difference between long and short vowel GN $\%$ differed across the three vowel pairs (F(796) = 8.9, p < .001). /iː–ɪ/ had the smallest pairwise difference in GN $\%$ of the three pairs. The difference in GN $\%$ between /iː/ and /ɪ/ was smaller than the difference between /ɐː/ and /ɐ/ ( $\beta$ = −4 $\%$ , t(796) = −3.2, p = .007), and also smaller than the difference in GN $\%$ between /oː/ and /ɔ/ ( $\beta$ = −5 $\%$ , t(796) = −3.9, p = .001). The difference in GN $\%$ between /ɐː/ and /ɐ/ was equivalent to the difference between /oː/ and /ɔ/ (p = .693). GN $\%$ was also shorter for coronal context than labial context vowels (F(796) = 125.3, p < .001). However, there was also a consonant context × vowel pair interaction, indicating that consonant context conditioned GN $\%$ of the three vowel pairs to different extents (F(796) = 8.9, p < .001; Figure 8, Table 2). While the GN $\%$ of all three vowel pairs decreased in the coronal compared to labial contexts, GN $\%$ of /iː–ɪ/ differed less across labial and coronal contexts than /ɐː–ɐ/ ( $\beta$ = −4 $\%$ , t(804) = −3.6, p = .001) and /oː–ɔ/ ( $\beta$ = −6 $\%$ , t(804) = −5.1, p < .001). GN $\%$ of /ɐː–ɐ/ and /oː–ɔ/ decreased by a similar extent in the coronal (compared to the labial) context (p = .140).
3.4.3 Release interval
A linear mixed effects model was constructed using the method described in Section 2.7 with the following equation:
A full model summary is provided in Appendix Table A8.
Short vowels had greater RI $\%$ than long vowels (F(796) = 70.8, p < .001; Figure 8, Table 2). Overall, there was no difference in RI $\%$ across labial and coronal contexts (p = .237). However, there was a consonant context × vowel pair interaction, indicating that the effect of consonant context on RI $\%$ differed across vowel pairs (F(796) = 19.5, p < .001; Figure 8). RI $\%$ of /iː–ɪ/ was longer in the coronal context than labial context ( $\beta$ = 5 $\%$ , t(802) = 4.3, p < .001), while RI $\%$ of /ɐː–ɐ/ tended to be shorter in the coronal context, although this difference did not reach significance ( $\beta$ = −2 $\%$ , t(802) = −2.2, p = .083). RI $\%$ of /oː–ɔ/ was also shorter in the coronal than the labial context ( $\beta$ = −4 $\%$ , t(802) = −4.1, p < .001). Both /ɐː–ɐ/ and /oː–ɔ/ decreased to a similar extent in the coronal (compared to the labial) context (p = .346).
3.5 Summary of main findings
This study compared the lingual articulatory properties of three long–short vowel pairs /iː–ɪ/, /ɐː–ɐ/ and /oː–ɔ/ in two symmetrical consonant contexts /pVp/ and /tVt/. The main findings of this study are:
acoustic duration (Footnote 4) of short vowels was 62 $\%$ the acoustic duration of long vowels
gesture duration (measured as GDur) of short vowels was 90 $\%$ that of long vowels
/ɐː–ɐ/ had the greatest pairwise difference in acoustic duration, while /iː–ɪ/ had the smallest pairwise difference in acoustic duration
the difference between long and short vowel gesture durations was larger in the labial than in the coronal context
/ɐː/ and /ɐ/ were produced with the most similar mean articulatory targets and the most overlapping long–short distributions; /oː/ and /ɔ/ were produced with the least similar articulatory targets
contrasts between long and short vowel FI $\%$ and GN $\%$ were reduced in coronal compared to labial contexts
FI $\%$ was longer for /iː/ than for /ɪ/, but longer for /ɐ/ and /ɔ/ than /ɐː/ and /oː/ in the labial context
GN $\%$ was shorter for short vowels, compared to their long equivalents
pairwise difference in GN $\%$ was smallest between /iː–ɪ/, and equivalent between /ɐː–ɐ/ and /oː–ɔ/
RI $\%$ was longer for short vowels, compared to their long equivalents
4 Discussion
The aim of this study was to investigate lingual articulation of vowel length contrasts in AusE, building on previous, largely acoustic description of AusE vowel contrasts. These data provide an articulatory characterisation of some key aspects of vowel length contrasts in AusE, revealing new insights into AusE production kinematics.
4.1 Gesture duration
We first explored the impact of contrastive vowel length on vowel gesture durations. Our first prediction was that in line with acoustic durations, the duration of short vowel gestures would be shorter than those of long vowel gestures, but that the duration difference between long and short vowels should be reduced in the articulatory domain. Our results confirmed this prediction. On average short vowels were 62 $\%$ the acoustic duration of long vowels, in line with previous acoustic studies of AusE vowel length (Cox Reference Cox2006, Elvin et al. Reference Elvin, Williams and Escudero2016), while short vowel gestures were 90 $\%$ the duration of long vowel gestures. Hertrich & Ackermann (Reference Hertrich and Ackermann1997) have speculated that the discrepancy between acoustic duration and gesture durations indicates that phonological vowel length contrast is not produced as a difference in either the duration of laryngeal activity (vowel voicing) or supralaryngeal (lips, tongue, jaw) movement alone, but rather reflects a vowel length dependent difference in the coordination of laryngeal and supralaryngeal gestures.
The gesture durations of coronal context vowels were 77 $\%$ the duration of labial context vowels. This result is congruent with findings that show coronal consonants constrain the production of following vowels to a greater degree than labial consonants (Recasens, Pallarès & Fontdevila Reference Recasens, Pallarès and Fontdevila1997, Sussman et al. Reference Sussman, Bessell, Dalston and Majors1997, Löfqvist Reference Löfqvist1999, Fowler & Brancazio Reference Fowler and Brancazio2000, Recasens Reference Recasens2002, Harrington et al. Reference Harrington, Hoole, Kleber and Reubold2011, Harrington, Kleber & Reubold Reference Harrington, Kleber and Reubold2011). Due to the relative independence of the lips and tongue dorsum, vowel gestures in labial contexts can begin earlier than those in coronal contexts, which can result in a longer duration, as has been observed in this study. There was also a smaller difference between the duration of long and short vowel gestures in the coronal than in the labial context. Across the two consonant contexts, the duration of short vowel gestures was less conditioned by consonant context than the duration of long vowel gestures, resulting in a reduction of contrast in the coronal context. This finding is consistent with general observations that the duration of short vowels is more stable than the duration of long vowels across different speech rates, prominences and phonetic contexts (Klatt Reference Klatt1973, Port Reference Port1981, Gopal Reference Gopal1990, Fletcher et al. Reference Fletcher, Harrington and Hajek1994, Hoole, Mooshammer & Tillmann Reference Hoole, Mooshammer and Tillmann1994, Hoole & Mooshammer Reference Hoole and Mooshammer2002, Jong & Zawaydeh Reference Jong and Zawaydeh2002, Mooshammer & Fuchs Reference Mooshammer and Fuchs2002, Hirata Reference Hirata2004, White & Mády Reference White and Mády2008, Nakai et al. Reference Nakai, Kunnari, Turk, Suomi and Ylitalo2009, Beňuš Reference Beňuš2011, Cox & Palethorpe Reference Cox and Palethorpe2011, Cox et al. Reference Cox, Palethorpe and Miles2015, Peters Reference Peters2015, Penney et al. Reference Penney, Cox, Miles and Palethorpe2018).
However, the shorter gesture duration of coronal context vowels contradicts our acoustic duration results, where coronal context vowels exhibited a longer acoustic duration than labial context vowels. While this has also been observed in other acoustic studies of English (House & Fairbanks Reference House and Fairbanks1953, Lehiste & Peterson Reference Lehiste and Peterson1961, Port Reference Port1981), the discrepancy between acoustic and gesture durations once again suggests that the relationship between acoustic and articulatory landmarks in vowel production is sensitive to factors such as vowel length and consonant context.
4.2 Articulatory target similarity
In Section 3.2, we compared articulatory targets of long and short vowel pairs in AusE. Acoustic studies of AusE have shown that long–short vowel pairs differ in the degree of spectral similarity, with /ɐː–ɐ/ the least spectrally differentiated and /oː–ɔ/ the most spectrally differentiated (Bernard Reference Bernard1970, Harrington & Cassidy Reference Harrington and Cassidy1994, Watson & Harrington Reference Watson and Harrington1999, Cox Reference Cox2006, Cox et al. Reference Cox, Palethorpe and Bentink2014, Elvin et al. Reference Elvin, Williams and Escudero2016, Cox & Fletcher Reference Cox and Fletcher2017). Our second prediction was that although all long–short vowel pairs should be produced with similar articulatory targets, the degree of similarity should differ by vowel pair. Of the three long–short vowel pairs, /ɐː–ɐ/ was predicted to exhibit the most similar articulatory targets, while /oː–ɔ/ was predicted to be realised with the least similar articulatory targets. This was indeed the case: /ɐː–ɐ/ had the shortest Euclidean distance between vowel targets and the most overlapping distributions of the three vowel pairs (Figure 5), while /oː–ɔ/ had the largest Euclidean distance between long and short targets and the least overlapping distributions. However, Euclidean distance values were also highly variable for /oː–ɔ/ (Figure 5), which may indicate participant-specific strategies for the production of this pair, with some participants producing the pair with more articulatorily distinct targets than others (Appendix Figure A2).
There was a larger difference in long and short vowel target quality in coronal than in the labial context (Figure 7). Prior studies have suggested that short vowels may be more coarticulated with following consonants than their long equivalents (Hoole & Mooshammer Reference Hoole and Mooshammer2002), resulting in short vowels exhibiting more target quality variation across consonant contexts than their long equivalents. Future studies should examine interactions between consonant context and articulation of AusE vowels in more detail.
We also observed that /oː/ exhibited greater lip protrusion than /ɔ/, suggesting that /oː/ is more rounded than /ɔ/. This is congruent with Blackwood Ximenes et al.’s (Reference Blackwood Ximenes and Carignan2017) observations of lip rounding differences between /oː/ and /ɔ/ in three speakers of AusE. Blackwood Ximenes et al. (Reference Blackwood Ximenes and Carignan2017) have suggested that differences in lip rounding between /oː–ɔ/ may also contribute to F1 and F2 differences between the pair, raising and retracting /oː/ in the acoustic space relative to /ɔ/ independent of lingual adjustments. This may also be the case here; however, the tongue dorsum position of /oː/ was still higher and more retracted compared to /ɔ/. There was also variation in the degree of lip protrusion differences across participants: M3 and W3 produced /oː–ɔ/ with overlapping lip protrusion values. This once again highlights potential speaker-specific strategies in the production of these vowels. However, more research is needed to determine whether overlapping lip protrusion and/or tongue dorsum postures between /oː/ and /ɔ/ are reflected in overlapping F2 values in these speakers.
4.3 Trade-offs between acoustic duration and articulatory target
In languages such as Japanese, Swedish and Thai, acoustic vowel duration and spectral quality have a trading relationship as cues to vowel length (Delattre Reference Delattre1962, Hadding-Koch & Abramson Reference Hadding-Koch and Abramson1964, Lehnert-LeHouillier Reference Lehnert-LeHouillier2010); that is, the more differentiated the acoustic targets of a long–short vowel pair, the less listeners rely on durational cues and vice versa (Hadding-Koch & Abramson Reference Hadding-Koch and Abramson1964). In line with these studies, our third prediction was that the vowel pair with the largest pairwise difference in duration would have the most similar articulatory targets and vice versa. Our results partially support this prediction. /ɐː–ɐ/ had the largest pairwise difference in acoustic duration, and the most similar articulatory targets of the three vowel pairs. This is consistent with prior studies that have shown that vowels in this pair differ primarily in acoustic duration, and have largely overlapping acoustic targets (Bernard Reference Bernard1970, Cochrane Reference Cochrane1970, Harrington & Cassidy Reference Harrington and Cassidy1994, Watson & Harrington Reference Watson and Harrington1999, Cox Reference Cox2006, Elvin et al. Reference Elvin, Williams and Escudero2016). However, /iː–ɪ/ had the smallest pairwise difference in acoustic duration, but /oː–ɔ/ had the least similar articulatory targets. This result is not unexpected for two reasons. First, /oː–ɔ/ can be differentiated by acoustic target quality alone, independent of durational information (Watson & Harrington Reference Watson and Harrington1999). Second, while duration is important for differentiating /iː/ and /ɪ/ in AusE, there are also dynamic formant differences (namely /iː/’s prolonged acoustic onglide) that serve to further differentiate /iː/ from /ɪ/ (Harrington & Cassidy Reference Harrington and Cassidy1994, Harrington et al. Reference Harrington, Cox and Evans1997, Watson & Harrington Reference Watson and Harrington1999, Cox Reference Cox2006, Cox et al. Reference Cox, Palethorpe and Bentink2014, Cox et al. Reference Cox, Palethorpe and Miles2015). The previously observed trade-off between acoustic duration and spectral quality as cues to vowel length (Delattre Reference Delattre1962, Hadding-Koch & Abramson Reference Hadding-Koch and Abramson1964, Lehnert-LeHouillier Reference Lehnert-LeHouillier2010) may rather be a trade-off between durational and non-durational cues to vowel length contrast, with the dynamic differences between /iː/ and /ɪ/ contributing to this trading relationship.
Our findings also challenge a purely physiological account of vowel quality differences between long and short vowels, such as that proposed in Lindblom’s (Reference Lindblom1963) target undershoot model. First, in a target undershoot account, we would expect the vowel pair with the largest durational differences to exhibit the largest vowel quality differences. However, this was not the case for either acoustic duration or gesture duration. As mentioned above, /ɐː–ɐ/ had the largest difference in acoustic duration, and the smallest pairwise difference in vowel quality. In terms of gesture duration, the difference between long and short vowels was similar across the three vowel pairs. Furthermore, in a target undershoot account, we would predict that the difference in duration to the time of gestural target would be a predictor of differences in target quality (TargDiffz). Our results do not support this account. Difference in time to target was not a significant predictor of vowel quality differences across our three vowel pairs.
4.4 Kinematic differences
Finally, we examined dynamic kinematic differences between long and short vowel gestures. Previous acoustic production studies have found differences in the formant dynamics of long and short AusE vowels, suggesting differences in articulatory kinematics (Bernard Reference Bernard1970, Cochrane Reference Cochrane1970, Watson & Harrington Reference Watson and Harrington1999, Cox Reference Cox2006). We predicted that short vowel gestures would have a proportionately shorter period of articulatory stability around their mid-points and proportionately longer articulatory transitions to following consonants than their long equivalents. In our comparison of these intervals in long and short vowel gestures, short vowel gestures indeed had proportionately shorter gesture nuclei and proportionately longer release intervals than long vowel gestures (Figures 7 and 8). These data provide the first articulatory evidence supporting acoustic studies that have found AusE short vowels to have proportionately shorter acoustic steady-states and proportionately longer acoustic offglides than long vowels (Cox Reference Cox2006).
We also posited that /iː/ would exhibit a prolonged phonological onglide as is characteristic of AusE (Harrington et al. Reference Harrington, Cox and Evans1997, Cox Reference Cox2006, Cox et al. Reference Cox, Palethorpe and Bentink2014). Our results generally confirmed this, with /iː/ exhibiting the longest proportionate formation interval of the long vowels. However, the proportionate formation interval of /iː/ was only significantly longer than that of /ɪ/ in the labial context. The shortening of the formation interval of /iː/ in the coronal context may be due to the articulatory requirements of coda /t/ on the /iː/ gesture (Recasens et al. Reference Recasens, Pallarès and Fontdevila1997, Sussman et al. Reference Sussman, Bessell, Dalston and Majors1997, Recasens Reference Recasens2002). Several studies have noted that high front vowels in syllables containing coronal consonants exhibit more retracted acoustic and articulatory targets than those produced in other consonantal contexts (Stevens & House Reference Stevens and House1963; Schouten & Pols Reference Schouten and Pols1979a, b; Sussman et al. Reference Sussman, Bessell, Dalston and Majors1997; Strange & Bohn Reference Strange and Bohn1998; Hoole Reference Hoole1999; Nearey Reference Nearey, Stewart Morrison and Assman2013). In English, coronal consonants exhibit higher coarticulatory resistance than surrounding vowels, with the targets of surrounding vowels compromised to reach the desired articulatory goal of the coronal consonant (Sussman et al. Reference Sussman, Bessell, Dalston and Majors1997, Hoole Reference Hoole1999). In the production of coda /t/, the tongue dorsum must be sufficiently retracted for the tongue tip to be raised for alveolar closure (Hoole Reference Hoole1999).
As shown in Figure 5, /iː/ and /ɪ/ are sometimes produced with a more retracted TD posture in the coronal context, supporting these observations. In the production of /iː/ in the coronal context, the proportionately later achievement of vowel target, due to prolonged onglide, is antagonistic to the required retracted position necessary for production of the coronal coda. Therefore, speakers may shorten the duration of onglide in /t/ final syllables to allow earlier tongue dorsum retraction for coda /t/ closure. As no such constraint is placed on /iː/ in labial final syllables, the phonological onglide is present. More research into production of /iː/ in non-symmetrical consonant contexts may further illuminate the relative contribution of onset–vowel and vowel–coda organisation in the production of onglide in AusE.
/ɐː/ had a shorter proportionate formation interval than /ɐ/ in both the labial and coronal context, while /oː/ had a shorter proportionate formation interval than /ɔ/ in the labial context (Figures 6 and 7). This result is largely inconsistent with previous acoustic studies of vowel length in not only AusE but also American English and German, which have found no significant difference in proportionate acoustic onglide between long and short vowels (Lehiste & Peterson Reference Lehiste and Peterson1961, Strange & Bohn Reference Strange and Bohn1998, Cox Reference Cox2006). However, Lehiste & Peterson (Reference Lehiste and Peterson1961) descriptively reported that short/lax vowels in American English (excluding /ɪ/) had proportionately longer acoustic onglides than their long equivalents. The proportionately longer formation intervals of short /ɐ/ and /ɔ/ reported here are consistent with general observations that vowels of shorter durations exhibit proportionately longer transitions from and to surrounding phonemes (Gay Reference Gay1981, Soli Reference Soli1982, Van Summers Reference Van Summers1987). The discrepancy between prior acoustic and current articulatory results may arise, as discussed above, from potential differences in laryngeal–supralaryngeal coordination in long and short vowel gestures (Hertrich & Ackermann Reference Hertrich and Ackermann1997). If a larger proportion of short vowels than of long vowels is concealed by preceding consonant aspiration, this may mask articulatory differences in onglide in the acoustic domain.
We also found that the magnitude of the long–short difference in proportionate gesture nucleus duration differed by vowel pair, with /iː–ɪ/ exhibiting the smallest difference of the three vowel pairs. This appears to be the result of the shorter proportionate gesture nucleus duration of /iː/ compared to the other two long vowels (Table 2), driven by the presence of onglide in /iː/. Coronal context vowels also exhibited a proportionately shorter gesture nucleus duration than labial context vowels. The shortened gesture nucleus duration in the coronal context may again be due to the relatively greater coarticulatory influence of /t/ on vowels (Recasens et al. Reference Recasens, Pallarès and Fontdevila1997, Recasens Reference Recasens2002).
The effect of consonant context on proportionate release interval duration also differed across vowel pairs. Release intervals were longer in the coronal context than in the labial context for /iː–ɪ/ but were shorter in the coronal than in the labial context for /ɐː, ɐ, oː, ɔ/. These patterns appear to be in a trading relationship with formation interval duration, although the exact mechanism behind this requires further investigation.
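The proportionate measures discussed above can be illustrated with a minimal sketch. It assumes that each vowel gesture is segmented by four landmarks (gesture onset, nucleus onset, nucleus offset, gesture offset) and that the formation interval, gesture nucleus and release interval are the three consecutive spans between them; the class name, function name and landmark times are hypothetical and are not taken from this study.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class GestureLandmarks:
    """Hypothetical landmark times (ms) for a single vowel gesture."""
    gesture_onset_ms: float
    nucleus_onset_ms: float    # assumed: articulatory target achieved
    nucleus_offset_ms: float   # assumed: movement away from target begins
    gesture_offset_ms: float


def proportionate_intervals(lm: GestureLandmarks) -> dict:
    """Express each interval as a percentage of total gesture duration."""
    times = [
        lm.gesture_onset_ms,
        lm.nucleus_onset_ms,
        lm.nucleus_offset_ms,
        lm.gesture_offset_ms,
    ]
    # Landmarks must be in temporal order and span a non-zero duration.
    if times != sorted(times) or times[0] == times[-1]:
        raise ValueError("landmarks must be ordered and span a non-zero duration")
    total = times[-1] - times[0]
    return {
        "FI%": 100 * (times[1] - times[0]) / total,  # formation interval
        "GN%": 100 * (times[2] - times[1]) / total,  # gesture nucleus
        "RI%": 100 * (times[3] - times[2]) / total,  # release interval
    }


# Illustrative values only (not measured data).
example = proportionate_intervals(GestureLandmarks(0.0, 100.0, 300.0, 400.0))
print(example)  # {'FI%': 25.0, 'GN%': 50.0, 'RI%': 25.0}
```

Under this assumed segmentation the three percentages sum to 100, so a proportionately longer interval in one part of the gesture must be offset elsewhere. This is one arithmetic reason why formation and release intervals could appear to trade off.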
4.5 Future directions
There are some limitations to this study. First, we did not investigate articulatory control mechanisms that may underlie durational and vowel quality differences such as stiffness and velocity. This is primarily because speech rate was not actively controlled in this study. Speech rate also conditions the durational, spatial and kinematic properties of vowels (Ostry & Munhall Reference Ostry and Munhall1985, Kroos et al. Reference Kroos, Hoole, Kühnert and Tillmann1997, Hoole & Mooshammer Reference Hoole and Mooshammer2002, Beňuš Reference Beňuš2011). In particular, changes in duration due to variation in speech rate may be implemented through adjustments in gestural stiffness (the ratio of velocity to displacement) or adjustments in only velocity (Gay Reference Gay1981, Byrd & Tan Reference Byrd and Tan1996, Shaiman Reference Shaiman2001). These mechanisms are not mutually exclusive and may also interact with the implementation of vowel length (Kroos et al. Reference Kroos, Hoole, Kühnert and Tillmann1997, Hoole & Mooshammer Reference Hoole and Mooshammer2002). Future speech-rate controlled studies should investigate differences in stiffness, velocity and intergestural overlap between long and short vowels and how these can be understood within mass-spring implementations of Task-Dynamics (Saltzman & Kelso Reference Saltzman and Scott Kelso1987, Saltzman & Munhall Reference Saltzman and Munhall1989, Hawkins Reference Hawkins, Docherty and Robert Ladd1992, Turk & Shattuck-Hufnagel Reference Turk and Shattuck-Hufnagel2020).
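As a rough illustration of the stiffness measure mentioned above, the sketch below estimates it from a sampled one-dimensional sensor trajectory. It assumes that stiffness is operationalised as peak velocity divided by movement displacement, and that velocity is approximated by first differences; the function names, sampling rate and position values are hypothetical and do not come from this study.

```python
def peak_velocity(positions_mm, sample_rate_hz):
    """Peak absolute velocity (mm/s), approximated by first differences."""
    if len(positions_mm) < 2:
        raise ValueError("need at least two samples")
    velocities = [
        (later - earlier) * sample_rate_hz
        for earlier, later in zip(positions_mm, positions_mm[1:])
    ]
    return max(abs(v) for v in velocities)


def stiffness(positions_mm, sample_rate_hz):
    """Assumed operationalisation: peak velocity / displacement (1/s)."""
    displacement = abs(positions_mm[-1] - positions_mm[0])
    if displacement == 0:
        raise ValueError("displacement is zero; stiffness is undefined")
    return peak_velocity(positions_mm, sample_rate_hz) / displacement


# Illustrative trajectory only (not measured data): 9 mm movement at 100 Hz.
trajectory_mm = [0.0, 1.0, 3.0, 6.0, 8.0, 9.0]
print(round(stiffness(trajectory_mm, 100.0), 2))  # 33.33
```

On this definition, a movement can be shortened either by raising stiffness (higher peak velocity for the same displacement) or by changing velocity and displacement together, which is why the two mechanisms need to be separated under controlled speech rate.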
We also did not investigate differences in the intergestural organisation of syllables containing long vs. short vowels in AusE. In German, research suggests that short vowels are more overlapped with following coda consonants than long vowels (Hertrich & Ackermann Reference Hertrich and Ackermann1997, Kroos et al. Reference Kroos, Hoole, Kühnert and Tillmann1997, Hoole & Mooshammer Reference Hoole and Mooshammer2002). This may also be the case in AusE, but requires further investigation to confirm.
Acoustic target and dynamic acoustic data were not directly compared to articulatory target and articulatory kinematic data in this study. We found discrepancies between relative acoustic and relative gestural duration measures, with short vowels ∼ 62 $\%$ the acoustic duration of long vowels, but ∼ 90 $\%$ the gestural duration of long vowels. This is similar to the findings of prior work investigating this relationship in the German vowel length contrast (Hertrich & Ackermann Reference Hertrich and Ackermann1997). It suggests that the timing relationship between the larynx and the supralaryngeal articulators in vowel production may differ between long and short vowels; however, this requires further empirical examination.
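The relative duration measures referred to above are simple ratios of mean short-vowel duration to mean long-vowel duration. A minimal sketch follows; the function name is hypothetical, and the token durations are invented placeholders chosen only so that the ratios match the approximate percentages reported in the text. They are not data from this study.

```python
from statistics import mean


def short_to_long_ratio(short_durations_ms, long_durations_ms):
    """Mean short-vowel duration as a proportion of mean long-vowel duration."""
    return mean(short_durations_ms) / mean(long_durations_ms)


# Placeholder token durations (ms), invented for illustration only.
acoustic_ratio = short_to_long_ratio([120.0, 128.0], [190.0, 210.0])
gestural_ratio = short_to_long_ratio([260.0, 280.0], [290.0, 310.0])

print(f"acoustic: {acoustic_ratio:.2f}, gestural: {gestural_ratio:.2f}")
# acoustic: 0.62, gestural: 0.90
```

Computing both ratios over the same tokens would make the acoustic and gestural measures directly comparable, which is the comparison identified here as a topic for further work.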
We examined only the lingual articulation of vowels, and only using data from a single lingual sensor. Differences in vowel identity arise due to differences in overall vocal tract shape (Stevens & House Reference Stevens and House1955, Chiba & Kajiyama Reference Chiba and Kajiyama1958, Lindblom & Sundberg Reference Lindblom and Sundberg1971, Fant Reference Fant1980), which is dependent on the coordinated placement of the tongue with respect to the jaw and lips (Lindblom & Sundberg Reference Lindblom and Sundberg1971, Hoole & Mooshammer Reference Hoole and Mooshammer2002). More detailed articulatory characterisation of vowels should examine the entire vocal tract. Future studies would benefit from real-time MRI (rtMRI) technologies, which offer high spatial and temporal resolution imaging of the vocal tract (Zhu et al. Reference Zhu, Kim, Proctor, Narayanan and Nayak2013, Lingala et al. Reference Lingala, Zhu, Kim, Toutios, Narayanan and Nayak2017).
Finally, perceptual studies are also needed to examine how duration and target quality are used by listeners to cue long versus short vowels. Investigation of participant-specific trading relationships between duration and vowel quality in the production of vowel length contrasts may also provide further insight into the representation and implementation of vowel length.
4.6 Conclusions
This study has systematically examined articulatory differences between long and short vowels in AusE. Long vowels were characterised by different temporal, spatial and dynamic kinematic properties compared to their short equivalents. Our results suggest that vowel duration and vowel quality may be actively and independently controlled to realise vowel length contrasts in AusE. Our results also highlight discrepancies between acoustic and articulatory measures of vowel duration, raising questions about the relationship between these two ways of measuring durational contrast. These data reveal the importance of studying vowel production in both the acoustic and articulatory domains to more fully understand the representation and implementation of vowel contrasts.
Acknowledgements
This research was supported in part by Australian Research Council Award DE150100318 and Australian Research Council Award FT180100462. Parts of this study were presented at the 16th Conference on Laboratory Phonology (LabPhon16), Universidade de Lisboa, Lisbon, 19–22 June 2018, and the 17th Australasian International Conference on Speech Science and Technology (SST2018), University of New South Wales, Sydney, 4–7 December 2018.
Appendix. Additional materials
Formula = AcDur ∼ Vlength + Vpair + Conscontext + (Vlength × Vpair) + (Conscontext × Vpair) + (Vlength | participant)
Formula = GDur ∼ Vlength + Vpair + Conscontext + (Vlength × Conscontext) + (Conscontext × Vpair) + (1 | participant)
Formula = targdiffz ∼ Vpair + Conscontext + (1 | participant)
Formula = LPz ∼ Vlength + Conscontext + (1 | participant) + (1 | repetition)
Formula = FI $\%$ ∼ Vlength + Vpair + Conscontext + (Vlength × Vpair) + (Vlength × Conscontext) + (Conscontext × Vpair) + (Vlength × Conscontext × Vpair) + (1 | participant)
Formula = GN $\%$ ∼ Vlength + Vpair + Conscontext + (Vlength × Vpair) + (Conscontext × Vpair) + (1 | participant)
Formula = RI $\%$ ∼ Vlength + Vpair + Conscontext + Conscontext × Vpair + (1 | participant)
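For readers less familiar with this notation, the simplest of the formulas above (targdiffz) can be written out as a model equation. This assumes the conventional mixed-effects reading, in which (1 | participant) denotes a by-participant random intercept, and assumes indicator coding of the categorical predictors. The symbols below are illustrative and are not taken from the original analysis.

```latex
\[
\mathrm{targdiffz}_{ij}
  = \beta_0
  + \boldsymbol{\beta}_{\mathrm{Vpair}}^{\top}\,\mathbf{x}^{\mathrm{Vpair}}_{ij}
  + \beta_{\mathrm{Cons}}\,x^{\mathrm{Cons}}_{ij}
  + u_j + \varepsilon_{ij},
\qquad
u_j \sim \mathcal{N}(0, \sigma_u^{2}),
\quad
\varepsilon_{ij} \sim \mathcal{N}(0, \sigma^{2})
\]
```

Here $i$ indexes observations and $j$ indexes participants. $\mathbf{x}^{\mathrm{Vpair}}_{ij}$ is a vector of indicators for vowel pair (two indicators for three pairs under the assumed coding), $x^{\mathrm{Cons}}_{ij}$ is an indicator for consonant context, and $u_j$ is the participant-specific intercept. The other formulas extend this form with interaction terms and, for AcDur, a by-participant random slope for vowel length.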