1 Introduction
One of the most innovative contributions of Articulatory Phonology to our knowledge of speech production mechanisms has been the concept of ‘gestural blending’ (Browman & Goldstein 1989, 1990, 1992). According to a first formulation, gestural blending is predicted to take place in sequences of two phonetic segments produced with articulatory gestures implemented through the same articulator or through contiguous articulators (e.g., the tongue predorsum together with either the tongue dorsum or the tongue blade), and it results in a single phonetic realization whose closure or constriction location is intermediate between those for the two original segments. This blending type is exemplified by front velars, i.e., velar consonants followed by a front vowel or palatal glide, which are articulated either exclusively at the back of the hard palate or at an articulatory zone embracing the postpalatal and front velar zones (Frisch & Wodzinski 2016). An analogous blending strategy may also yield laminopredorso-alveoloprepalatal articulations out of sequences of consonants produced with the semi-independent tongue blade and tongue dorsum articulators, as exemplified by a [c]-like realization of the sequences /tk/ and /kt/ (Catalan data reported in Recasens et al. 1993) and by the outcomes [ɲ] and [ʃ] of /nj/ and /sj/, which may occur in fast productions of onion and bless you in English (Zsiga 1995). We have shown elsewhere that there is another gestural blending mechanism, involving the addition or summation of the closure or central contact areas of two consecutive consonants. 
Electropalatographic (EPG) data for Catalan reveal that this blending type applies in sequences of heterosyllabic (dento)alveolar and alveolopalatal stops, nasals and laterals such as /nʎ/, /ɲn/, /ʎn/ and /ɲt/ (where /ʎ/ and /ɲ/ are alveolopalatal), while consonants subject to higher manner requirements such as front lingual fricatives and rhotics fail to take part in this blending process (Recasens 2006).
Two more remarks about the types of gestural blending just mentioned are in order.
On the one hand, the blended articulatory outcome may be highly stable or else change continuously, and thus be dynamic, over time. The first option applies to sequences of (dento)alveolar and alveolopalatal consonants (and also to front velars); thus, once the closure location for the blended articulation emerges from sequences like /ʎn/ and /ɲt/, it stays unmodified until the C2 release. The dynamic option has been reported for /sj/, i.e., a sequence of a lingual fricative and a palatal glide, both of which are implemented through a temporally variable constriction. Indeed, acoustic centroid and EPG data for the sequence /s#j/ in American English, as in bless you, show a realization which changes continuously from more /s/-like at onset to more /ʃ/-like at later points in time (Zsiga 1995), and an analogous dynamic articulatory pattern is likely to be at work in instances of the blended outcome [ɲ] of /nj/. Once these two blended outcomes are phonologized (as has occurred historically in the Romance languages), they become allophones of a single phoneme. On the other hand, and in contrast with regressive assimilatory processes, gestural blending in two-consonant sequences such as those just mentioned often does not set in until about the midpoint of the temporal portion allocated to C1. This condition appears to hold for instances of static blending (i.e., in sequences like /ʎn/ and /ɲt/) as well as for cases of dynamic blending (i.e., /sj/).
It may be objected that the term ‘dynamic blending’ is not appropriate for describing the sort of mutual adaptation process which occurs during the production of segmental sequences like /s#j/. Zsiga (1995) found, in both the synchronized articulatory and acoustic data, that the palatalization of /s/ induced by the palatal glide proceeded gradiently and variably, i.e., the linguopalatal contact pattern and the spectral centroid evolved progressively from more /s/-like to more /j/-like throughout the consonantal sequence (gradience), and speakers could differ as to how closely /s/ was approached at onset and /j/ at offset (variability). We believe that the term ‘dynamic blending’ captures this series of events better than the term ‘coarticulation’, since the latter implies that segments essentially achieve their targets, whereas this is rarely the case for either /s/ or /j/ in the sequence /s#j/ under consideration.
This paper deals with another case of dynamic gestural blending already reported in previous studies. EPG and spectral center of gravity (COG) data for /s#ʃ/ and /ʃ#s/ and the control sequences /s#s/ and /ʃ#ʃ/ in Central Catalan reported in Recasens & Mira (2013) showed different adaptation strategies for /s/ followed by /ʃ/ (as in dos xais ‘two lambs’) and /ʃ/ followed by /s/ (as in peix salat ‘salted fish’). In agreement with impressionistic descriptions, the linguopalatal contact data revealed that the sequence /s#ʃ/ is realized as a palatoalveolar fricative throughout, regardless of speaker and of other factors such as stress placement and speaking rate, and consequently may be said to undergo regressive assimilation. The sequence /ʃ#s/, on the other hand, was implemented through a blending mechanism which resulted in a phonetic realization with a lingual configuration intermediate between those for /s/ and /ʃ/, which could be considered appropriate for the palatalized alveolar fricative [sj]. What turned out to be special about /ʃ#s/ was that this intermediate lingual configuration proceeded from more /ʃ/-like at the onset of the two-fricative sequence to more /s/-like at offset and was thus not stable but dynamic. Moreover, the COG data for /s#ʃ/ and /ʃ#s/ agreed with the EPG data just described in showing regressive assimilation for /s#ʃ/, and thus stable COG values appropriate for C2 throughout, and a progressive increase from more [ʃ]-like to more [s]-like COG frequencies for /ʃ#s/. There was also a difference between the articulatory and acoustic data for the latter consonantal sequence, namely, the dynamic trajectory for the EPG signal was closer to /ʃ/ than to /s/ (not only at onset but later in time as well), whereas the reverse held for the COG signal. 
This difference, which was not referred to explicitly in our previous study and will be commented upon in some detail in the Discussion section of the present paper, is probably due to the fact that, while the EPG values were associated with the overall linguopalatal contact pattern (they were obtained using the EPG similarity index method developed by Guzik & Harrington 2007), the COG values of the frication noise depend exclusively on the size of the oral cavity located in front of the lingual constriction (Fant 1960). Interestingly enough, the acoustic durations of /s#ʃ/ and /ʃ#s/ were consistent with the assimilated (/s#ʃ/) and temporally blended (/ʃ#s/) realizations in that, while there was no significant difference in duration between /s#ʃ/ and the controls /s#s/ and /ʃ#ʃ/, the frication noise was clearly longer for /ʃ#s/ than for the three other sequences. It should be noted in this respect that the long assimilated outcome [ʃ:] of /s#ʃ/ is often simplified into [ʃ] (dos xais is generally realized as [do ˈʃajs]), as occurs with any sequence of two identical fricatives, rhotics or approximants separated by a word boundary in spoken Catalan (e.g., /s#s/, /ʃ#ʃ/, /r#r/ and /j#j/). This shortening mechanism, however, is expected to operate on /ʃ#s/ only if the sequence exhibits a temporally stable blending trajectory, which, as pointed out above, does not appear to be the case. In other words, one would expect the long fricative outcome of /ʃ#s/ to shorten if it showed a [sj:]-like realization throughout the entire consonantal sequence, but less so or not at all if, as the actual production data show, a dynamic blending mechanism is at work and articulatory and acoustic changes consequently occur from a more /ʃ/-like to a more /s/-like configuration.
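The dependence of frication-noise frequency on front-cavity size invoked here can be illustrated with the textbook quarter-wavelength approximation, in which the cavity in front of the constriction is idealized as a uniform tube closed at the constriction and open at the lips, resonating at F = c / (4L). The sketch below is only a rough illustration under that idealization: the speed of sound (35,000 cm/s) and the cavity lengths are assumed values, and the model ignores cavity shape and the sublingual cavity.

```python
# Assumed speed of sound in warm, humid vocal-tract air, in cm/s
# (a round textbook value, not taken from the study).
SPEED_OF_SOUND_CM_S = 35_000.0


def front_cavity_resonance_hz(front_cavity_length_cm: float) -> float:
    """Quarter-wavelength resonance F = c / (4 * L) of an idealized uniform
    tube closed at the constriction and open at the lips."""
    if front_cavity_length_cm <= 0:
        raise ValueError("front cavity length must be positive")
    return SPEED_OF_SOUND_CM_S / (4.0 * front_cavity_length_cm)


# Hypothetical front-cavity lengths, chosen only for illustration.
for length_cm in (2.0, 2.5, 3.0):
    print(f"L = {length_cm} cm -> F = {front_cavity_resonance_hz(length_cm):.0f} Hz")
```

Under these assumptions, lengthening the front cavity from 2 to 3 cm lowers the resonance from about 4375 Hz to about 2917 Hz, which is the direction of change expected as the constriction moves from /s/ towards /ʃ/.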
To summarize, the temporally dynamic blended realization of /ʃ#s/ differs from the temporally stable blending types described above, which yield front velar stops and alveolopalatal realizations, while paralleling other instances of dynamic blending such as that occurring during the production of /s/ followed by /j/. Moreover, insofar as the articulatory configuration for /ʃ#s/ turns out not to be entirely appropriate for either /ʃ/ at onset or /s/ at offset, it can be suggested that two articulatory targets are not separately programmed in this case, but that speakers instead aim at a compromise phonetic realization between the two fricatives.
The production asymmetry between /s#ʃ/ and /ʃ#s/ just discussed appears to have some universal validity, though languages may differ in important respects with regard to the precise articulatory implementation of the two consonantal sequences. According to EPG and spectral data for English, /s#ʃ/ often has a canonical /ʃ/ quality throughout, while /ʃ#s/ may show either two separate targets appropriate for /ʃ/ and /s/ or else an intermediate trajectory proceeding gradually or abruptly from more /ʃ/-like to more /s/-like (Holst & Nolan 1995; Nolan et al. 1996; Pouplier et al. 2011). On the other hand, EPG and acoustic data for German reveal that /s#ʃ/ may undergo not only regressive assimilation but also blending, while /ʃ#s/ is invariably implemented through a blended realization that is more /ʃ/-like than /s/-like (Pouplier & Hoole 2016). Finally, spectral data for the frication noises of /s#ʃ/ and /ʃ#s/ in French reported by Niebuhr et al. (2011) show that /ʃ/ prevails over /s/ both leftwards and rightwards, and therefore that there is regressive assimilation in the case of /s#ʃ/, as in Catalan and English, and progressive assimilation in the case of /ʃ#s/. Progressive assimilation of /ʃ#s/ may also apply, albeit less often, in other languages and dialects such as Western Catalan (see the EPG and acoustic data for this dialect reported in Recasens & Mira 2013). 
In sum, data for the languages reviewed so far show regressive assimilation for /s#ʃ/ most of the time and, for /ʃ#s/, either two separate targets (usually with some carryover coarticulation yielding a more /ʃ/-like realization than expected at offset), a temporally dynamic blended realization falling somewhere between /s/ and /ʃ/, or progressive assimilation. This summary leads us to conclude that the articulation of /ʃ#s/ is far more variable and complex (and thus less controlled) than that of /s#ʃ/, and also that there is a robust trend for /ʃ/ to prevail over /s/.
In searching for an explanation of the production data for /s#ʃ/ and /ʃ#s/ in Catalan and the other languages just described, it is reasonable to argue that /ʃ/ prevails over /s/ consistently in /s#ʃ/ and in specific cases in /ʃ#s/ because the greater involvement of the tongue dorsum in the production of /ʃ/ than of /s/ renders the palatoalveolar fricative more constrained articulatorily than the alveolar fricative. In accordance with this difference in degree of articulatory constraint, /ʃ/ has been reported to be less prone than /s/ to coarticulate with adjacent vowels while exerting larger coarticulatory effects on them (Recasens & Espinosa 2009). This, however, cannot by itself explain why phonological assimilation is so much more likely to operate on /s#ʃ/ (i.e., regressively) than on /ʃ#s/ (i.e., progressively). A plausible interpretation of this asymmetry, which also holds for other sequence pairs such as /tk/-/kt/, may be sought in the need for tongue repositioning in /ʃ#s/ but not in /s#ʃ/. On the one hand, the anticipation of /ʃ/ during preceding /s/ involves a single articulatory action during C1: little interarticulatory coupling between the primary tongue tip articulator and the back of the tongue for apicoalveolar /s/ leaves the tongue body quite free to anticipate the tongue dorsum raising/tongue front lowering gesture for following /ʃ/, which results in some constriction retraction co-occurring with an increase in predorsal contact immediately behind the alveolar constriction. On the other hand, greater interarticulatory coupling between the primary laminopredorsal articulator and the back of the tongue, and thus a stringent tongue body positioning for /ʃ/, renders the transition from /ʃ/ to /s/ less straightforward. 
In this case the anticipation of the lingual gesture for /s/ requires the execution of two non-complementary actions, i.e., some tongue dorsum lowering followed by the raising of the tongue tip and blade. We believe that this specific articulatory requirement, rather than other factors such as a higher frequency of occurrence of /s/ vs. /ʃ/ in word-final position, accounts for why, when adjacent to /ʃ/, /s/ undergoes regressive rather than progressive assimilation. A similar but not identical explanation has been proposed by Perkell et al. (1979, 2013): once the lingual groove for the initial fricative has been created, the production of /ʃ/ in the sequence /sʃ/ requires just pushing the tip-blade upward and forward, whereas that of /s/ in the sequence /ʃs/ requires a more precise tongue front placement along the front–back dimension. This explanatory account assumes that there are differences in articulatory control during the formation of the front lingual constrictions for /ʃ/ (longer and shaped more precisely, also to ensure a sufficient sublingual cavity volume) and /s/ (shorter and more ballistic). Differences in articulatory complexity between the two sequences of interest appear to be in accordance with the differences in duration reported above: while assimilated /s#ʃ/ exhibits a duration comparable to that of /s#s/ and /ʃ#ʃ/, non-assimilated /ʃ#s/ is often longer than /s#s/, /ʃ#ʃ/ and /s#ʃ/.
Language-dependent and even speaker-dependent differences in the articulatory and acoustic implementation of the two consonant sequences of interest may be associated with an aspect that has been largely neglected by previous studies, namely, the precise constriction location for /s/, i.e., whether this consonant is laminopredorsal, more anterior and less grooved, or else apical, more retracted and more grooved. In contrast with the former variant, the latter is more /ʃ/-like and, like the palatoalveolar fricative, exhibits a sublingual cavity, albeit a smaller one than that of its palatoalveolar cognate. While English may exhibit both /s/ types (Ladefoged & Maddieson 1996: 146; Dart 1998), French and Italian typically have a predorsal /s/, whereas Catalan, European Portuguese and Castilian Spanish favor the apical type (see Navarro Tomás 1918: 81–82 for Spanish).
Within this framework, the purpose of the present investigation is to analyze the realization of the two symmetrical sequences /s#ʃ/ and /ʃ#s/ in Central Catalan using a technique different from those used in previous studies, namely, ultrasound, which provides data on front and back lingual configuration over time. Ultrasound should make it possible to ascertain whether /s#ʃ/ is implemented through complete regressive adaptation and /ʃ#s/ through dynamic blending or other production mechanisms, and whether evidence for these production strategies may be detected not only at about the constriction location (i.e., at the alveolar zone or prepalate) but also at more posterior areas of the vocal tract (i.e., at the palatal, velar and pharyngeal regions). In contrast with EPG, which provides tongue-to-palate contact patterns but no actual lingual configuration data, ultrasound allows collecting information about the placement of the tongue dorsum surface at the palatal/velar zones and of the tongue back at the pharyngeal zone. Gathering lingual configuration contours at different regions of the tongue should make it possible to determine the relative timing of specific articulatory events such as anticipatory coarticulation or gestural blending at different portions of the tongue, and thus whether those events occur at one tongue region before they do at another. Detecting the two tongue edges with ultrasound is, however, problematic, since the mandible and the hyoid bone obstruct the ultrasound beam before it reaches the tongue surface, thus creating a black region at both margins of the image where the tongue tip and the tongue root are located (Stone 2005). In order to ascertain the precise articulatory strategies used by speakers for the production of /s#ʃ/ and /ʃ#s/, their lingual configurations will be compared with those for the control sequences /s#s/ and /ʃ#ʃ/ at several consecutive points in time. 
Moreover, in parallel with our previous study (Recasens & Mira 2013), the extent to which articulatory changes are correlated with variations in the acoustic spectrum will be explored through inspection of lingual contour and spectral data gathered at the same temporal points. Duration values for the fricative sequences will also be evaluated in the light of the following hypotheses: the assimilated palatoalveolar outcome of /s#ʃ/ is expected not to differ in duration from /s#s/ and /ʃ#ʃ/, while /ʃ#s/ should be longer if it exhibits two clear-cut targets or if it is executed through a dynamic blending mechanism.
2 Method
2.1 Lingual spline data
Ultrasound and acoustic recordings were made of the following meaningful Catalan sentences containing /s#ʃ/, /ʃ#s/, /s#s/ and /ʃ#ʃ/ preceded by a mid front vowel or schwa and followed by [e]: tu compres xeixa ‘you buy candeal wheat’ ([əs#ʃe]); en coneix cent ‘(s)he knows one hundred of them’ ([ɛʃ#se]); en compres cent ‘you buy one hundred of them’ ([əs#se]); allí hi neix xeixa ‘candeal wheat grows up there’ ([eʃ#ʃe]). Sentence stress always fell on the initial syllable of the second word, i.e., on the CV portion of the VC#CV sequences of interest. The word xeixa was chosen because, apart from the name of the letter ‘x’ (xeix), it is the only meaningful Catalan word beginning with stressed [ʃe]. The sentence material was recorded six times by five Central Catalan speakers, i.e., two men (DR, the paper’s author; RO) and three women (ES, JU, IM) between forty and sixty years of age who speak Catalan on a regular basis in their everyday life.
Ultrasound recordings were performed with an Echo Blaster unit type EB128CEXT from TELEMED and a microconvex Echo Blaster 128 CEXT transducer with a 2–4 MHz frequency range and a central curvature of 20 mm. The ultrasound images were acquired with the probe set to 100% of its 104° field of view and to a frequency of 2 MHz; the probe was attached to a transducer holder positioned under the subject’s chin in an Articulate Instruments Stabilization Headset. The recording sampling rate was fifty-four frames per second, yielding one image every 18.5 ms. Image streams were recorded synchronously with the audio signal, which was sampled at 22,050 Hz with an AKG-D70 microphone, using the sync hardware provided with the Articulate Assistant Advanced (AAA) software by Articulate Instruments Ltd. Contours of the back of the alveolar zone and the hard palate were also recorded by asking speakers to press the tongue against their hard palate. Tongue contours were tracked automatically at all temporal frames along each C#C sequence token for each speaker using AAA and adjusted manually. Data points for all tongue contours were exported to an ASCII file as x-y coordinates, with their origin located at the bottom-left corner of the ultrasound image, towards the rear of the vocal tract. Acoustic files were also exported in .wav format for taking segmental duration measures.
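As a check on the timing figures just given, the sketch below computes the interval between ultrasound images at 54 frames per second and maps an audio sample index (at 22,050 Hz) onto the nearest ultrasound frame. The zero-offset alignment between audio and image streams is a hypothetical simplification; in the recordings themselves synchronization was handled by the AAA sync hardware.

```python
ULTRASOUND_FPS = 54.0   # ultrasound frame rate reported in the text
AUDIO_RATE_HZ = 22_050  # audio sampling rate reported in the text


def frame_interval_ms(fps: float = ULTRASOUND_FPS) -> float:
    """Time between consecutive ultrasound images, in milliseconds."""
    return 1000.0 / fps


def audio_sample_to_frame(sample_index: int,
                          audio_rate: int = AUDIO_RATE_HZ,
                          fps: float = ULTRASOUND_FPS) -> int:
    """Index of the ultrasound frame nearest in time to an audio sample.

    Assumes (hypothetically) that audio sample 0 and frame 0 coincide;
    in practice the offset comes from the synchronization hardware.
    """
    time_s = sample_index / audio_rate
    return round(time_s * fps)


print(f"{frame_interval_ms():.1f} ms per frame")
print(f"worst-case nearest-frame error: {frame_interval_ms() / 2:.2f} ms")
```

At this frame rate the interval is about 18.5 ms, so a measurement point defined on the audio signal can be matched to an ultrasound image to within roughly half a frame (about 9.3 ms).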
Several temporal points for measurement were identified on waveform and spectrographic displays. The onset and offset of the fricative sequences of interest were identified at the onset and offset of the frication noise, which coincided with the end of the preceding vowel and the beginning of the following vowel, respectively. Lingual spline data were processed at those two temporal points (referred to as P1 and P5 from now on), at the consonant sequence midpoint (P3), and at the midpoints of the resulting P1–P3 and P3–P5 intervals (P2 and P4, respectively). Tongue spline data points were converted from Cartesian to polar coordinates by shifting the origin of the ultrasound image to approximately the center of the ultrasound probe, which was located at X = 86.7 mm and Y = 0 mm (Mielke 2015). SSANOVA smoothed splines consisting of strings of points separated by 0.01 radians, with their associated standard errors (SE), were computed across the splines for all tokens of each C#C sequence using the R package gss to find a best-fit curve (Davidson 2006). The rightmost and leftmost edges of the smoothed splines were determined by entering into the SSANOVA computation procedure the corresponding mean angular values (in radians) across all tokens of the consonant sequences of interest. Inspection of the SE values for all sequences and speakers revealed small deviations from the mean (between 0.25% and 0.5%, depending mostly on subject), indicating that speakers’ productions were highly consistent across sequence tokens.
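The placement of the five measurement points and the Cartesian-to-polar conversion described above can be sketched as follows. This is a minimal illustration assuming onset and offset times in seconds and spline points in millimetres; the function names are hypothetical, and the SSANOVA fitting itself was carried out with the R package gss and is not reproduced here.

```python
import math

# Approximate centre of the probe in the exported image coordinates (mm),
# as reported in the text.
PROBE_ORIGIN_MM = (86.7, 0.0)


def measurement_points(onset_s: float, offset_s: float) -> list[float]:
    """P1..P5: frication onset, midpoint of P1-P3, sequence midpoint,
    midpoint of P3-P5 and frication offset (equally spaced quarters)."""
    if offset_s <= onset_s:
        raise ValueError("offset must follow onset")
    duration = offset_s - onset_s
    return [onset_s + k * duration / 4.0 for k in range(5)]


def to_polar(x_mm: float, y_mm: float,
             origin: tuple[float, float] = PROBE_ORIGIN_MM) -> tuple[float, float]:
    """Convert one Cartesian spline point to (angle in radians, radius in mm)
    relative to the probe origin."""
    dx = x_mm - origin[0]
    dy = y_mm - origin[1]
    return math.atan2(dy, dx), math.hypot(dx, dy)


points = measurement_points(0.500, 0.740)  # hypothetical onset/offset (s)
print([round(p, 3) for p in points])  # [0.5, 0.56, 0.62, 0.68, 0.74]
```

Because P2 and P4 are midpoints of the two halves, the five points simply divide the frication noise into quarters.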
In order to evaluate the tongue configuration data at several tongue regions, the spatial length of the SSANOVA splines displayed in Cartesian coordinates was divided into four portions corresponding to different articulatory zones, namely, alveolar (ALV), palatal (PAL), velar (VEL) and pharyngeal (PHAR), separately for each subject, applying the same subdivision procedure as in a previous publication reporting data for other consonant clusters recorded in the same session as the fricative sequences analyzed in the present investigation (Recasens & Rodríguez 2017). The four articulatory zones were determined as follows: the boundary between the alveolar and the dental zone was identified at an inflection point occurring at the spline front edge during dental /t/ in the sequence /pt/, and that between the alveolar and palatal zones at another inflection point located at the back alveolar area during the trill /r/ in the sequence /pr/ (/r/ is postalveolar in Catalan); the boundary between the palatal and velar zones was placed at the closure location for the velar stop in the sequence /iki/, which, according to EPG data, is articulated at the postpalatal zone, just in front of the soft palate, in Catalan; the length of the velar zone was taken to be 1.25 and 1.51 times that of the palatal zone for the male and female speakers, respectively, as reported by Fitch & Giedd (1999); finally, the pharyngeal zone extended from the left edge of the velar zone to the bottom edge of the lingual splines.
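A minimal sketch of the zone subdivision, assuming that the three landmark positions and the spline end are expressed as arc lengths (in mm) measured backwards from the front edge of the spline; the function and field names are hypothetical, and only the 1.25 and 1.51 ratios come from the text.

```python
from dataclasses import dataclass

# Velar-to-palatal length ratios cited in the text (Fitch & Giedd 1999).
VELAR_TO_PALATAL_RATIO = {"male": 1.25, "female": 1.51}


@dataclass
class TongueZones:
    """Arc-length intervals (mm), measured backwards from the front edge
    of the spline, for each articulatory zone."""
    alv: tuple[float, float]
    pal: tuple[float, float]
    vel: tuple[float, float]
    phar: tuple[float, float]


def tongue_zones(dental_alv: float, alv_pal: float, pal_vel: float,
                 spline_end: float, sex: str) -> TongueZones:
    """Derive the four zones from three landmark positions and the spline end.

    dental_alv: inflection point for dental /t/ in /pt/
    alv_pal:    inflection point for the trill /r/ in /pr/
    pal_vel:    closure location for /k/ in /iki/
    spline_end: arc length of the bottom edge of the spline
    """
    if not dental_alv < alv_pal < pal_vel < spline_end:
        raise ValueError("landmarks must increase from front to back")
    palatal_len = pal_vel - alv_pal
    velar_end = pal_vel + VELAR_TO_PALATAL_RATIO[sex] * palatal_len
    if velar_end >= spline_end:
        raise ValueError("velar zone would extend past the end of the spline")
    return TongueZones(alv=(dental_alv, alv_pal), pal=(alv_pal, pal_vel),
                       vel=(pal_vel, velar_end), phar=(velar_end, spline_end))


# Hypothetical landmark positions (mm along the spline) for a male speaker.
print(tongue_zones(5.0, 20.0, 40.0, 100.0, "male"))
```

With a 20-mm palatal zone, the velar zone spans 25 mm for a male speaker and about 30.2 mm for a female speaker; whatever remains of the spline is assigned to the pharyngeal zone.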
Spatial distances between each of the four lingual regions and the origin of the ultrasound field of view were measured at P1 through P5 on the SSANOVA smoothed splines. The distance values for the palatal and velar zones were obtained by averaging the distances between the origin and the five central points of each zone, in order to avoid possible consistency problems that could have arisen had a single distance from the central point been computed instead. Given that the splines for the C#C sequences under analysis could differ somewhat in length, the distance values for the two extreme zones, alveolar and pharyngeal, were obtained by averaging the distances between the origin and five points located not at the zone midpoint but in the upper third of the pharyngeal zone and in the leftmost third of the alveolar zone.
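The averaging step can be sketched as below, assuming each zone is available as an ordered list of (x, y) spline points in millimetres. Which third counts as the ‘upper’ (pharyngeal) or ‘leftmost’ (alveolar) one depends on how the points are ordered, so the index passed to `third` is left to the caller; the helper names and the example arc are hypothetical.

```python
import math

Point = tuple[float, float]


def five_central(points: list[Point]) -> list[Point]:
    """Five consecutive points centred on the middle of an ordered point list."""
    if len(points) < 5:
        raise ValueError("need at least five points")
    mid = len(points) // 2
    return points[mid - 2: mid + 3]


def third(points: list[Point], which: int) -> list[Point]:
    """Return the first (0), middle (1) or last (2) third of an ordered list."""
    n = len(points)
    bounds = [0, n // 3, 2 * n // 3, n]
    return points[bounds[which]: bounds[which + 1]]


def zone_distance(points: list[Point], origin: Point) -> float:
    """Mean Euclidean distance between the origin and five central points."""
    chosen = five_central(points)
    return sum(math.dist(p, origin) for p in chosen) / len(chosen)


# Hypothetical zone: 30 points on an arc of radius 60 mm around the probe origin.
origin = (86.7, 0.0)
arc = [(origin[0] + 60 * math.cos(a), 60 * math.sin(a))
       for a in [0.5 + 0.01 * i for i in range(30)]]
print(round(zone_distance(arc, origin), 6))            # 60.0
print(round(zone_distance(third(arc, 0), origin), 6))  # 60.0
```

For the palatal and velar zones `zone_distance` is applied to the whole zone; for the alveolar and pharyngeal zones it is applied to the relevant third only.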
2.2 Spectral center of gravity
A frequency measure of the frication noise spectrum, i.e., center of gravity (COG), was computed for all tokens of /s#ʃ/, /ʃ#s/, /s#s/ and /ʃ#ʃ/ using a MATLAB script written by Leonardo Lancia (see also Recasens & Mira 2013). COG was calculated by multiplying each frequency in Hertz by the amplitude in decibels at that frequency, summing the products across all frequencies in the relevant range, and dividing the resulting value by the sum of all the amplitude values. COG measures reflect the mean central frequency of the entire spectrum and should be inversely dependent on the dimensions of the cavity in front of the lingual constriction (Cho et al. 2002). This COG computation procedure was applied instead of the COG of the power spectrum (Forrest et al. 1988), the difference between the two methods being that the former assigns more importance to the main spectral peak than the latter (i.e., the latter method divides the spectrum into two chunks of equal power and emphasizes secondary spectral peaks relative to the main spectral peak, while also raising the COG values for fricatives with a great deal of high-frequency energy). It should be noted in this respect that the noise spectrum of Catalan apicoalveolar /s/ has a main peak at about 4000–5000 Hz, which is definitely lower than that for more anterior, laminodental varieties of /s/ (Recasens 2014). COG measures were obtained from FFT spectra over the 1–11 kHz frequency range using a 25-ms Hamming window, which was shifted through the fricative sequence in steps of 10 ms. Hence the number of spectral slices varied according to the overall duration of the frication noise. The lower frequency limit of the COG range is intended to avoid artifacts due to possible residues of voicing.
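The COG formula described above, together with the arithmetic that determines how many 25-ms slices fit into a frication noise, can be sketched as follows. This is not the MATLAB script used in the study: the FFT and Hamming-window stages are omitted (the function takes a spectrum that has already been computed), and both the truncation of the window and step lengths to whole samples and the requirement that the dB amplitudes be non-negative are assumptions of the sketch.

```python
def cog_db_weighted(freqs_hz: list[float], amps_db: list[float],
                    fmin_hz: float = 1000.0, fmax_hz: float = 11000.0) -> float:
    """Amplitude-weighted mean frequency: sum(f * A_dB) / sum(A_dB),
    restricted to the fmin..fmax analysis range."""
    pairs = [(f, a) for f, a in zip(freqs_hz, amps_db) if fmin_hz <= f <= fmax_hz]
    if not pairs:
        raise ValueError("no spectral bins in the analysis range")
    if any(a < 0 for _, a in pairs):
        # Negative dB weights would distort the mean; this sketch assumes
        # amplitudes expressed relative to a reference that keeps them >= 0.
        raise ValueError("dB amplitudes must be non-negative for this weighting")
    total = sum(a for _, a in pairs)
    if total == 0:
        raise ValueError("all amplitudes in the analysis range are zero")
    return sum(f * a for f, a in pairs) / total


def n_slices(n_samples: int, rate_hz: int,
             win_ms: float = 25.0, step_ms: float = 10.0) -> int:
    """Number of 25-ms analysis windows, advanced in 10-ms steps, that fit
    entirely inside a frication noise of n_samples samples."""
    win = int(rate_hz * win_ms / 1000)    # 551 samples at 22,050 Hz
    step = int(rate_hz * step_ms / 1000)  # 220 samples at 22,050 Hz
    return 0 if n_samples < win else 1 + (n_samples - win) // step


# Toy spectrum (hypothetical values): the 500-Hz bin falls outside 1-11 kHz.
print(cog_db_weighted([500, 1000, 2000, 3000], [40, 10, 20, 10]))  # 2000.0
print(n_slices(n_samples=4410, rate_hz=22_050))  # 200 ms of noise -> 18
```

At 22,050 Hz a 10-ms step corresponds to 220.5 samples, so some rounding convention is unavoidable; how the original script handled this is not stated.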
The COG values occurring at approximately the same time points as the lingual splines, i.e., at P1 through P5 as defined above, were selected for statistical analysis. These COG temporal trajectories made it possible to ascertain how much the fricative spectrum changed throughout the frication noise and at what points in time spectral changes, if any, took place.
2.3 Evaluation of assimilation and blending
Specific criteria were used for deciding whether the fricative sequences /s#ʃ/ and /ʃ#s/ were implemented through assimilation or gestural blending. The sequences /s#s/ and /ʃ#ʃ/ served as controls for determining which of the two articulatory adaptation processes was at work. Whenever assimilation applies, C1 should equal C2 (regressive assimilation) or C2 should equal C1 (progressive assimilation), and there should be few, if any, articulatory and/or acoustic changes over time throughout the entire fricative sequence. For gestural blending to occur, on the other hand, the lingual configuration and/or frication noise spectrum ought to fall somewhere between /s/ and /ʃ/ throughout the entire fricative sequence, and either exhibit no obvious changes over time (static blending condition) or change continuously from a more /ʃ/-like to a more /s/-like target in the case of /ʃ#s/, and vice versa for /s#ʃ/ (dynamic blending condition). For the evaluation of the production mechanisms for /s#ʃ/ and /ʃ#s/, it will be assumed that the articulatory and acoustic data at the temporal points P1 and P2 correspond to C1 and those at P4 and P5 to C2, while data at point P3 are not strictly associated with either consonant.
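The criteria above can be made concrete with a toy decision rule. The sketch expresses each value of the target sequence at P1 through P5 as a relative position between the /ʃ#ʃ/ control (0) and the /s#s/ control (1), and then checks for assimilation, static blending or monotonic (dynamic) change. The tolerance `tol`, the function names and all input values are hypothetical, and the study itself relied on LMM tests rather than on any such fixed threshold.

```python
def relative_position(value: float, sh_ctrl: float, s_ctrl: float) -> float:
    """0 = value of the /ʃ#ʃ/ control, 1 = value of the /s#s/ control."""
    if s_ctrl == sh_ctrl:
        raise ValueError("control values must differ")
    return (value - sh_ctrl) / (s_ctrl - sh_ctrl)


def classify(target: list[float], sh_ctrl: list[float], s_ctrl: list[float],
             tol: float = 0.2) -> str:
    """Toy classification of a P1..P5 trajectory; tol is a hypothetical threshold."""
    r = [relative_position(v, a, b) for v, a, b in zip(target, sh_ctrl, s_ctrl)]
    if all(abs(x) <= tol for x in r):
        return "assimilation to /ʃ/"
    if all(abs(x - 1.0) <= tol for x in r):
        return "assimilation to /s/"
    if not all(-tol <= x <= 1.0 + tol for x in r):
        return "unclassified (outside the control range)"
    if max(r) - min(r) <= tol:
        return "static blending"
    steps = [b - a for a, b in zip(r, r[1:])]
    if all(d >= 0 for d in steps):
        return "dynamic blending (/ʃ/-like to /s/-like)"
    if all(d <= 0 for d in steps):
        return "dynamic blending (/s/-like to /ʃ/-like)"
    return "unclassified (non-monotonic)"


# Hypothetical COG values (Hz) at P1..P5, for illustration only.
sh_sh = [3000, 3000, 3000, 3000, 3000]
s_s = [5000, 5000, 5000, 5000, 5000]
print(classify([3100, 3000, 3050, 3000, 3000], sh_sh, s_s))  # assimilation to /ʃ/
print(classify([3200, 3400, 4000, 4500, 4700], sh_sh, s_s))  # dynamic blending (/ʃ/-like to /s/-like)
```

Normalizing against the two controls at each time point lets the same rule apply to COG values and to tongue-to-origin distances, whichever direction the /s/ versus /ʃ/ difference runs in.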
2.4 Statistical analysis
Articulatory distance, COG and consonant sequence duration measures were submitted to linear mixed model (LMM) analysis with subject as a random factor using the mixlm package of R version 3.1.2 (R Development Core Team 2014).
Linear mixed model tests were run on the distances between tongue position and the origin of the ultrasound field of view, and on the COG values, with the following fixed factors: sequence (levels /s#ʃ/, /ʃ#s/, /s#s/, /ʃ#ʃ/) and temporal point (P1, P2, P3, P4, P5) for the articulatory and acoustic data, and also articulatory zone (ALV, PAL, VEL, PHAR) for the articulatory data. Additional LMM tests were performed on the tongue distance and COG data for /s#ʃ/, /s#s/ and /ʃ#ʃ/, on the one hand, and for /ʃ#s/, /s#s/ and /ʃ#ʃ/, on the other, since the main research goal was to find out whether /s#ʃ/ and /ʃ#s/ differed from the control sequences /s#s/ and /ʃ#ʃ/. Only the statistical results from the latter LMM tests will be reported, since they did not differ substantially from those obtained from the tests performed on the data for all four clusters. Results will also be provided for an LMM analysis performed on the duration values for /s#ʃ/, /ʃ#s/, /s#s/ and /ʃ#ʃ/.
Least Significant Difference (LSD) post-hoc tests were run on all main effects and, where applicable, on significant interactions in order to find out whether numerical differences between pairs of levels of a given statistical variable reached significance. In view of the large number of tests involved in the LMM analyses, the Benjamini-Hochberg (BH) procedure for controlling the false discovery rate was applied to those comparisons which were relevant to the present investigation. In all statistical tests the significance level was set at p < 0.05.
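The text does not specify how the Benjamini-Hochberg correction was computed; the sketch below is a generic implementation of the standard BH step-up procedure (the same adjustment that R provides through `p.adjust(..., method = "BH")`), with significance judged as adjusted p < 0.05 to match the criterion stated above. The function name and the p-values are hypothetical.

```python
def benjamini_hochberg(p_values: list[float], alpha: float = 0.05):
    """Benjamini-Hochberg step-up procedure.

    Returns the adjusted p-values (in the input order) and, for each
    comparison, whether it remains significant at adjusted p < alpha.
    """
    m = len(p_values)
    order = sorted(range(m), key=lambda i: p_values[i])
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest p-value down, so that adjusted values never
    # decrease with rank and never exceed 1.
    for rank in range(m, 0, -1):
        i = order[rank - 1]
        running_min = min(running_min, p_values[i] * m / rank)
        adjusted[i] = running_min
    return adjusted, [q < alpha for q in adjusted]


# Hypothetical p-values from four post-hoc comparisons.
adj, significant = benjamini_hochberg([0.01, 0.04, 0.03, 0.20])
print([round(q, 3) for q in adj], significant)
```

In this example the raw p-values 0.03 and 0.04 would be significant on their own, but after adjustment (about 0.053 each) they no longer fall below 0.05.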
3 Results
3.1 /ʃ#s/
3.1.1 Articulation
The statistical results for the tongue distance data for /ʃ#s/, /s#s/ and /ʃ#ʃ/ are presented in Table 1. According to the top panel, there is a highly significant main effect of zone, which turned out to be associated with all possible differences among zones, but no main effect of temporal point or sequence, and there are significant interactions for all two- and three-factor combinations. The failure of the temporal point and sequence factors to reach significance indicates that /ʃ#s/ does not differ from either /s#s/ or /ʃ#ʃ/ throughout the entire fricative sequence, and thus that it is implemented not through regressive or progressive assimilation but possibly through gestural blending. More detailed information about the mechanism used by speakers for the production of /ʃ#s/ should come from the results of the post-hoc tests run on the significant three-factor interaction, which are provided in the middle and bottom panels of the table.
According to the middle panel, significant differences between /ʃ#s/ and /s#s/ (left) were found to hold at essentially all five temporal points and at the three articulatory zones ALV, PAL and PHAR. On the other hand, differences between /ʃ#s/ and /ʃ#ʃ/ (right) turned out to be significant at all five temporal points P1 through P5 at PHAR, i.e., at the back of the tongue body, while at the articulatory zones anterior to the pharyngeal zone they were significant essentially at P3, P4 and P5, i.e., at the sequence midpoint and later, but not towards the onset of the two-fricative sequence. Thus, even though the tongue configuration for /ʃ#s/ differs from that for /s#s/ and /ʃ#ʃ/ at all temporal points, it is somewhat more /ʃ/-like at onset than it is /s/-like at offset. In agreement with this finding, variations in tongue configuration between temporal points during the /ʃ#s/ sequence (see bottom panel) were significant for P1 vs. P3/P4/P5 and for P2 vs. P3/P4/P5, as well as for P3 vs. P4/P5, at the three articulatory zones ALV, PAL and PHAR, while those between P1 and P2 and between P4 and P5 did not achieve significance. Consequently, the relevant changes in tongue position took place after P2, i.e., towards the middle of the fricative sequence.
To gain better insight into the production strategies for /ʃ#s/ used by the individual speakers, the actual distances in mm between the tongue surface and the origin of the ultrasound field of view at temporal points P1 through P5 for /ʃ#s/ and for the controls /s#s/ and /ʃ#ʃ/ are displayed separately for the five subjects in Figure 1 (upper and middle row graphs). Distance values are provided for two articulatory zones: PHAR (i.e., distances between the tongue postdorsum and root and the origin) and PAL (i.e., distances between the tongue dorsum and the origin). An increase in tongue surface-to-origin distance at the pharyngeal zone from one temporal point to the next means that the tongue back is being retracted, while an analogous increase at the palatal zone means that the tongue dorsum is being raised.
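The sign convention just described can be made explicit with a small helper. The Python sketch below is illustrative only: the function names, the 0.5 mm stability threshold and the example distances are hypothetical and do not come from the study; it merely assumes, as stated above, that a larger distance at PHAR means a more retracted tongue back and a larger distance at PAL a higher tongue dorsum.

```python
from typing import List, Sequence

# Hypothetical mapping from articulatory zone to (label for a distance increase,
# label for a distance decrease), following the sign convention stated in the text.
MOTION_LABELS = {
    "PHAR": ("retraction", "fronting"),  # tongue back / root
    "PAL": ("raising", "lowering"),      # tongue dorsum
}


def label_motion(zone: str, d_prev: float, d_next: float, tol_mm: float = 0.5) -> str:
    """Label tongue motion between two time points from distances (mm) to the probe origin.

    Changes smaller than tol_mm (an assumed, illustrative threshold) count as 'stable'.
    """
    increase_label, decrease_label = MOTION_LABELS[zone]
    delta = d_next - d_prev
    if abs(delta) < tol_mm:
        return "stable"
    return increase_label if delta > 0 else decrease_label


def label_trajectory(zone: str, distances: Sequence[float], tol_mm: float = 0.5) -> List[str]:
    """Label each interval P1-P2, P2-P3, ... of a trajectory sampled at consecutive time points."""
    return [label_motion(zone, a, b, tol_mm) for a, b in zip(distances, distances[1:])]


if __name__ == "__main__":
    # Made-up distances shaped like the /ʃ#s/ pattern described in the text.
    print(label_trajectory("PHAR", [60.0, 60.2, 61.5, 62.8, 63.0]))
    # ['stable', 'retraction', 'retraction', 'stable']
    print(label_trajectory("PAL", [50.0, 50.1, 48.8, 47.9, 47.8]))
    # ['stable', 'lowering', 'lowering', 'stable']
```

The threshold only separates small fluctuations from genuine movement for display purposes; it plays no role in the statistical tests reported in Table 1.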
According to the data presented in Figure 1, the tongue distance trajectories at PHAR indicate that at time points P1 and P2 the back of the tongue body for /ʃ#s/ (red discontinuous line) remains stable at a location close to or practically identical to that for /ʃ/ (blue continuous line). After P2, the trajectory for /ʃ#s/ moves backwards through a position intermediate between the trajectories for /ʃ#ʃ/ and /s#s/ until about P4. Finally, at time points P4 and P5, i.e., towards the end of /ʃ#s/, the trajectory becomes stable again at a tongue body back position which is either intermediate between those for /ʃ/ and /s/ or practically identical to that for /s/. A mirror image occurs at the palatal zone (PAL): /ʃ#s/ starts out very much like /ʃ/, lowers gradually towards /s/ after P2, and stabilizes at P4 somewhere between the tongue dorsum positions for /ʃ/ and /s/, often somewhat closer to the alveolar than to the palatoalveolar fricative. Moreover, for most speakers the PHAR and PAL trajectories for /ʃ#s/ tend to run closer to those for /ʃ#ʃ/ than to those for /s#s/ towards the middle of the sequence (around P3), in agreement with the EPG data reported in Recasens & Mira (2013).
Considering the distance trajectories at PAL and PHAR jointly reveals, at least for subjects IM and RO, a tongue position intermediate between those for /ʃ/ and /s/ at sequence onset and offset (at PHAR, at PAL or at both zones), which, together with the COG data reported in Section 3.1.2, may be taken to reflect a blending strategy. Moreover, this intermediate lingual placement is found mostly at PHAR at onset and at PAL, or at both PHAR and PAL, at offset, suggesting that the blending mechanism is implemented at the tongue back before it reaches the tongue dorsum. This sequence of events may also be observed in the overall tongue configuration data for subjects IM and RO plotted at time points P1 through P5 in Figure 2b. The other three speakers, DR, ES and JU (see also Figure 2a for DR and ES and Figure 2b for JU), exhibit a continuous change from a lingual position appropriate for /ʃ/ at sequence onset to one which, at about time point P4, is not quite appropriate for /s/ and lies closer to /ʃ/ than expected, which denotes some carryover coarticulation. As noted above for the lingual trajectories, the tongue configurations for /ʃ#s/ displayed in Figures 2a and 2b are also closer to those for /ʃ/ than to those for /s/ for most speakers.
In sum, the lingual profile trajectories for /ʃ#s/ either proceed from /ʃ/-like to /s/-like through an articulatory space which is intermediate between the two fricatives and biased towards /ʃ/ at the middle of the sequence, or else exhibit a tongue configuration much like that for /ʃ/ at onset which approaches /s/ at offset. Moreover, this intermediate tongue configuration is found at the pharyngeal rather than at the palatal zone, i.e., at all or most temporal points at the former and only after C1 at the latter.
3.1.2 Spectral center of gravity
Results for the statistical tests run on the COG data for /ʃ#s/, /s#s/ and /ʃ#ʃ/ are presented in Table 2. The top panel reveals significant main effects of temporal point and sequence and a significant temporal point x sequence interaction, all at p < 0.001. Post-hoc tests on the sequence main effect yielded significant differences /s#s/ > /ʃ#s/ > /ʃ#ʃ/, i.e., higher values for /s#s/ than for /ʃ#ʃ/ and intermediate ones for /ʃ#s/, which is consistent with the blending account; post-hoc tests on the temporal point main effect yielded significant differences for all pairs of time points except P3–P4. Results for the post-hoc tests on the temporal point x sequence interaction (middle and bottom panels of Table 2) differed in some respects from those for the articulatory data reported in Section 3.1.1: COG differences between /ʃ#s/ and /s#s/ and between /ʃ#s/ and /ʃ#ʃ/ were significant at all five temporal points, except at P1 for the latter pair (middle panel); COG differences between pairs of temporal points were all significant except for P1–P5 and P3–P4 (bottom panel). Thus, in contrast with the ultrasound results in Table 1, the COG values at time points P1 and P2, which are allocated to C1=/ʃ/ in /ʃ#s/, could differ significantly from those for /ʃ/ and /s/, which is consistent with the blending account.
The COG trajectories for /ʃ#s/ for the individual speakers (Figure 1, bottom row graphs) reveal a continuous change from a low frequency at onset (P1), which is either intermediate between /ʃ/ and /s/ (speakers DR, ES, RO) or close or equal to the COG value for /ʃ/ (speakers IM, JU), to a higher frequency at P3 and P4, i.e., about the cluster midpoint and during C2, which approaches the COG frequency for /s/. After P4, the COG frequency for /ʃ#s/, as well as those for /s#s/ and /ʃ#ʃ/, decreases as the jaw lowers and the constriction widens in preparation for the following vowel (see Jongman et al. 2000 and Iskarous et al. 2011 for the same anticipatory vowel effect at the offset of the frication period for single fricatives in CV sequences). Insofar as they reflect a lingual trajectory that runs continuously from more /ʃ/-like to more /s/-like through a space intermediate between the two fricatives already at sequence onset, the spectral data for /ʃ#s/ appear to be consistent with a gestural blending strategy proceeding dynamically. Moreover, in parallel with results reported in Recasens & Mira (2013), the graphs also reveal a trend for the COG trajectory for /ʃ#s/ to run closer to /s#s/ than to /ʃ#ʃ/ for speakers DR, ES and RO (Figure 1). An analysis of the relationship between these COG data and the articulatory data reported in Section 3.1.1 is left for the Discussion section.
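For readers less familiar with the measure, the spectral centre of gravity is the amplitude-weighted mean frequency of the spectrum. The analysis settings used in this study (window type and length, frequency band, power vs. magnitude weighting) are not restated in this section, so the Python sketch below simply assumes a Hann-windowed frame, a power spectrum and an optional band limit; the function names and the 20 ms default window are illustrative choices, not the study's settings.

```python
from typing import List, Optional, Sequence

import numpy as np


def spectral_cog(frame: np.ndarray, fs: float, fmin: float = 0.0,
                 fmax: Optional[float] = None) -> float:
    """Power-weighted spectral centre of gravity (Hz) of one analysis frame."""
    windowed = frame * np.hanning(len(frame))        # Hann window to limit spectral leakage
    power = np.abs(np.fft.rfft(windowed)) ** 2       # one-sided power spectrum
    freqs = np.fft.rfftfreq(len(frame), d=1.0 / fs)  # centre frequency of each bin (Hz)
    if fmax is None:
        fmax = fs / 2.0
    band = (freqs >= fmin) & (freqs <= fmax)         # optional band limit (assumed)
    return float(np.sum(freqs[band] * power[band]) / np.sum(power[band]))


def cog_trajectory(signal: np.ndarray, fs: float, times_s: Sequence[float],
                   win_s: float = 0.02, **band) -> List[float]:
    """COG at each requested time (s), using a window of win_s seconds centred on it.

    Assumes every window lies fully inside the signal.
    """
    half = int(round(win_s * fs / 2.0))
    cogs = []
    for t in times_s:
        centre = int(round(t * fs))
        if centre - half < 0 or centre + half > len(signal):
            raise ValueError(f"window at {t} s extends beyond the signal")
        cogs.append(spectral_cog(signal[centre - half:centre + half], fs, **band))
    return cogs
```

As a sanity check, a synthetic frame made of two equal-amplitude tones at 3000 and 7000 Hz has its COG midway, at about 5000 Hz; applied to a real fricative with P1 through P5 placed along the frication interval, `cog_trajectory` would return one value per time point.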
3.2 /s#ʃ/
3.2.1 Articulation
The results of the statistical analysis of the distance values between tongue position and the origin of the ultrasound field of view for /s#ʃ/, /s#s/ and /ʃ#ʃ/ (Table 3) show a highly significant main effect of zone (significant for each zone against all the other zones) but no significant main effect of temporal point or sequence, and several significant interactions, including the three-factor zone x temporal point x sequence interaction. In principle, the absence of a significant sequence effect is not consistent with the hypothesis that /s#ʃ/ assimilates into [ʃʃ], since in that case it ought to differ from /s#s/ but not from /ʃ#ʃ/.
The post-hoc tests performed on the three-factor interaction (middle panel of Table 3) show that the distance values for /s#ʃ/ differ significantly from those for /s#s/ throughout the fricative sequence at ALV, PAL and PHAR, and from those for /ʃ#ʃ/ at all temporal points at PHAR and at P1 and P2, i.e., at the onset of the fricative sequence, at ALV and PAL. The finding that /s#ʃ/ differs from /ʃ#ʃ/ mostly during the time allocated to C1 is not in agreement with the regressive assimilation hypothesis; indeed, this difference in tongue position ought not to hold if regressive assimilation were at work. According to the bottom panel of Table 3, on the other hand, significant differences between temporal points occur mostly towards the onset of /s#ʃ/, meaning that the tongue moves substantially during this span of time. In particular, distances at the tongue dorsum (PAL) and tongue body back (PHAR) differ significantly between P1 or P2 and any of the other temporal points, but essentially not for pairs involving only P3, P4 and P5.
As with the ultrasound data for /ʃ#s/ (see Section 3.1.1), some of these statistical results are best interpreted against the actual distance values between the tongue surface at PHAR and PAL and the origin of the ultrasound field of view at all consecutive time points (see Figure 3, top and middle row graphs, for the individual subjects). At PHAR (top graphs) there is a progressive fronting of the tongue body mostly through time points P1, P2 and P3, i.e., during the first half of the consonant sequence. This fronting motion proceeds from a more retracted /s/-like position at onset to a more anterior configuration at P4/P5, which either approaches closely or equals that for /ʃ/ (DR, ES, JU) or is intermediate between those for /s/ and /ʃ/ (IM, RO). At PAL (middle graphs), /s#ʃ/ becomes identical or nearly identical to /ʃ/ but not before time point P2 or P3, i.e., the tongue dorsum trajectory for /s#ʃ/ starts out somewhere between the trajectories for /ʃ#ʃ/ and /s#s/ and rises until reaching a tongue dorsum position appropriate for /ʃ/ at P3 or P4 (all subjects, most clearly DR, ES and IM). Taking the PHAR and PAL trajectories jointly into consideration, practically all speakers show some tongue dorsum raising and tongue back fronting during the first half of /s#ʃ/, which is not consistent with the regressive assimilation hypothesis. The same sequence of events may also be observed in the overall tongue configurations for the individual subjects plotted at time points P1 through P5 in Figures 4a and 4b.
3.2.2 Spectral center of gravity
As for the statistical results for the COG values for /s#ʃ/, /s#s/ and /ʃ#ʃ/, the two main effects (temporal point, sequence) and the temporal point x sequence interaction turned out to be highly significant (Table 4, top panel). The post-hoc tests for the main effects yielded significant differences between all consecutive time points with a single exception (the P2–P3 pair) and, more importantly, between /s#ʃ/ and /s#s/ but not between /s#ʃ/ and /ʃ#ʃ/, which is consistent with the assumption that /s#ʃ/ assimilates into [ʃʃ] in Catalan. Also in accordance with the assimilation hypothesis, the pairwise comparisons for the two-factor interaction were significant for /s#ʃ/ vs. /s#s/ at P1 through P5 but not for /s#ʃ/ vs. /ʃ#ʃ/ at any temporal point (Table 4, middle panel). On the other hand, COG differences between time points did not achieve significance for pairs involving only P1, P2 and P3, but did for pairs involving any of those three time points and P4 or P5, as well as for the P4–P5 pair. Thus, COG did not change much during the first half of the consonant sequence but changed a great deal towards its end (Table 4, bottom panel).
Moving now to the COG trajectories over time (Figure 3, bottom row graphs), the COG values for /s#ʃ/ are practically identical to those for /ʃ#ʃ/ throughout the entire frication noise for four subjects (DR, ES, IM and JU), which clearly supports the regressive assimilation account. Moreover, the COG trajectories for all three fricative sequences fall after P3 towards the end of the sequence, which accounts for the significant differences between P1/P2/P3 and P4/P5 referred to above and ought to be attributed to the gradual jaw lowering and constriction opening occurring at the offset of the frication period in anticipation of the following vowel (see also Section 3.1.2).
Judging from the articulatory and acoustic results for /s#ʃ/ presented in this section and in Section 3.2.1, the regressive assimilation account appears to be more consistent with the spectral data than with the articulatory data: while the COG values for this fricative sequence do not differ from those for /ʃ#ʃ/ from its very onset, the corresponding tongue configurations do not become /ʃ/-like until about the sequence midpoint. A tentative interpretation of this discrepancy is proposed in Section 4. For the time being, it suffices to say that gestural blending is not at work here, since /s#ʃ/ becomes the same as /ʃ#ʃ/ either throughout the entire sequence (COG) or during its second half (articulation).
3.3 Articulatory data summary
To draw a cross-speaker comparison between the articulatory and acoustic trajectories for the two asymmetrical sequences /ʃ#s/ and /s#ʃ/, the data were normalized by setting the value for /s#s/ to 1 and that for /ʃ#ʃ/ to 0 for the PHAR and COG signals (and the reverse for PAL) at each of the five time points, and rescaling the values for /ʃ#s/ and /s#ʃ/ at each time point accordingly. The three graphs of Figure 5 overlay the normalized PHAR, PAL and COG trajectories for /ʃ#s/ and /s#ʃ/ for comparison.
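The normalization just described amounts to a linear rescaling, computed separately at each time point, between the two control sequences. A minimal Python sketch follows, assuming one value per time point for each sequence (e.g., values averaged over tokens; the averaging step is an assumption, since it is not detailed here). The function name and the example COG values are hypothetical, not data from the study.

```python
import numpy as np


def normalize_to_controls(x, s_s, sh_sh, reverse=False):
    """Rescale x point by point so that /s#s/ -> 1 and /ʃ#ʃ/ -> 0.

    With reverse=True (the PAL case in the text), /s#s/ -> 0 and /ʃ#ʃ/ -> 1.
    Time points at which the two controls coincide are returned as NaN.
    """
    x, s_s, sh_sh = (np.asarray(a, dtype=float) for a in (x, s_s, sh_sh))
    zero_ref, one_ref = (s_s, sh_sh) if reverse else (sh_sh, s_s)
    span = one_ref - zero_ref
    out = np.full_like(x, np.nan)
    np.divide(x - zero_ref, span, out=out, where=span != 0)
    return out


if __name__ == "__main__":
    # Hypothetical COG values (Hz) at P1..P5, for illustration only.
    s_s = [6000, 6000, 6000, 6000, 5500]
    sh_sh = [4000, 4000, 4000, 4000, 3500]
    sh_s = [4500, 5000, 5600, 5800, 5000]
    print(normalize_to_controls(sh_s, s_s, sh_sh))  # [0.25 0.5  0.8  0.9  0.75]
```

Because the rescaling is done per time point, a normalized value above 0.5 simply means that, at that time point, the asymmetrical sequence lies closer to the /s#s/ control than to the /ʃ#ʃ/ control (or the reverse for PAL).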
All trajectories for /ʃ#s/ (continuous line) proceed from a C1-like to a C2-like target, the consonant target being fully achieved only for /ʃ/ in the PAL trajectory. Moreover, the articulatory and acoustic values at sequence midpoint (P3) show that tongue position lies closer to the /ʃ/ space than to the /s/ space, whereas the reverse holds for the spectral values. As for /s#ʃ/ (discontinuous line), full regressive assimilation may be inferred from the COG trajectory insofar as it is essentially identical to that for /ʃ/ throughout. The PAL distance trajectory (though not the PHAR trajectory, which shows considerable speaker-dependent variability after P3), on the other hand, achieves the lingual target for /ʃ/ at about sequence midpoint but not at time points P1 and P2 during C1. An explanatory hypothesis for why regressive assimilation during C1 is apparent in the COG data but not in the articulatory data is provided in Section 4.
3.4 Duration
Sequence had a significant effect on fricative sequence duration according to the LMM test carried out on /ʃ#s/, /s#s/ and /ʃ#ʃ/ (F(2, 75) = 7.35, p < 0.05), but not according to the test performed on /s#ʃ/, /s#s/ and /ʃ#ʃ/. In agreement with data reported in our previous study (Recasens & Mira 2013), /ʃ#s/ turned out to be significantly longer than /s#s/ and /ʃ#ʃ/, which is consistent with the production of the former sequence involving changes in articulatory state over time. The bar graph of Figure 6 shows that all subjects indeed produced a longer frication noise for /ʃ#s/ than for /s#ʃ/, /s#s/ and /ʃ#ʃ/. On the other hand, the fact that /s#ʃ/ had a duration comparable to that of /s#s/ and /ʃ#ʃ/ suggests that it was articulated with a single target, and therefore that the alveolar fricative assimilates to the following palatoalveolar fricative in constriction location in this case.
Judging from differences in sequence duration among speakers, the reason why speaker RO did not assimilate /s/ to the following /ʃ/ (see Figures 3 and 4b) may well be that he produced all four fricative sequences more slowly than the other subjects. Nevertheless, he showed signs of a blending strategy for the sequence /ʃ#s/ (see Figures 1 and 2b).
4 Discussion
The articulatory data for Catalan reported in the present study reveal that the production of /ʃ#s/ involves a change in tongue position from onset to offset, and thus a temporally dynamic trajectory, which in many respects conforms to the production characteristics of single lingual fricatives (Iskarous et al. 2011; Reidy 2015). The longer duration of /ʃ#s/ compared with the controls /s#s/ and /ʃ#ʃ/ is in line with the dynamic nature of this articulatory trajectory.
For some subjects this articulatory change proceeds through configurations intermediate between those for /ʃ/ and /s/, while for others it proceeds from a lingual configuration appropriate for /ʃ/ to one intermediate between /ʃ/ and /s/; in both cases it tends to be more /ʃ/-like than /s/-like not only at onset but also towards the sequence midpoint. Towards the onset of the fricative sequence, this intermediate configuration is more apparent at the pharynx than at the palatal zone, where the tongue dorsum position for C1 is practically identical to that for /ʃ/. On the other hand, for most subjects, the COG trajectories proceed from more /ʃ/-like to more /s/-like through intermediate frequencies at all time points and tend to be more similar to /s/ than to /ʃ/.
It may be argued that, even though the articulatory outcome for /ʃ#s/ is not as temporally stable as that for front velar stops or for two-stop consonant sequences involving the superposition of the closure areas for C1 and C2 (see Introduction), we are dealing, at least for some subjects, with a blending strategy yielding an articulatory configuration intermediate between those for the alveolar and palatoalveolar fricatives. EPG data for /ʃ#s/ in Catalan reported in Recasens & Mira (2013) also show a dynamic realization proceeding continuously from a more /ʃ/-like to a more /s/-like configuration, which may be implemented through a decrease in dorsopalatal contact and/or a progressive fronting of the lingual constriction over time. Strictly speaking, this is not necessarily a two-target movement, since the lingual targets for the two consonants of /ʃ#s/ (palatoalveolar and alveolar, respectively) may be approached but not fully achieved. The discrepancy between the articulatory and COG data reported in Sections 3.1.1 and 3.1.2 suggests that the tongue front (and, for some speakers, also the tongue root) is being fronted towards the /s/ target already at the onset of the fricative sequence, while the tongue dorsum lags behind. This tongue tip/blade fronting motion, which cannot be detected in the ultrasound signal (see Introduction), may account for why the COG frequency increases between P1 and P2, i.e., during the portion assigned to the first fricative of the sequence.
The scenario for /s#ʃ/ was more complex than initially expected, insofar as a clear assimilatory mechanism was at work in the COG but not in the articulatory data, the former being consistent with the absence of significant duration differences between /s#ʃ/, on the one hand, and /s#s/ and /ʃ#ʃ/, on the other. The reason why the lingual configuration for /s#ʃ/ was /s/-like at onset and did not reach the /ʃ/ target until about the sequence midpoint could lie in the fact that the preceding vowel was [ə] instead of [e] ([əs#ʃe] tu compres xeixa ‘you buy candeal wheat’). It may be that, in contrast with /s#s/, which was also preceded by schwa in our database ([əs#se] en compres cent ‘you buy one hundred of them’), at the onset of the /s#ʃ/ frication noise regressive assimilation occurs at the constriction location while the tongue body behind the constriction is still being raised and fronted from the lingual configuration appropriate for schwa to that for /ʃ/. Moreover, the reason why this change in tongue body position did not affect the acoustic spectrum must be that the COG frequency of lingual fricatives depends almost exclusively on the size of the cavity located in front of the constriction (Fant 1960). Thus, while /s#ʃ/ assimilates into [ʃ#ʃ] in Catalan, articulatory accommodation to the preceding vowel is allowed to occur at the back of the vocal tract as long as it does not jeopardize the frequency characteristics of the [ʃ] frication noise. This interpretation finds support in the EPG data reported in Recasens & Mira (2013), which show the predicted assimilated outcome for /s#ʃ/ in sentences where the preceding vowel was mid front ([ɛs#ʃə] es prenia tres xarops ‘(s)he was taking three kinds of syrup’, [es#ʃa] li digué que matés xais ‘he asked him/her to kill some lambs’).
The experimental findings reported in the present investigation are also relevant to more theoretical issues. Differences in articulatory adaptation between the two lingual fricatives reported in this and previous studies for various languages support the notion that /ʃ/ should be specified for a higher degree of articulatory constraint than /s/: /s/ assimilates to following /ʃ/ while /ʃ/ does not assimilate to following /s/, and there may be carryover /ʃ/-to-/s/ effects which may become phonologized and thus lead to the assimilation of /s/ to preceding /ʃ/ (see Introduction). The higher degree of articulatory constraint for /ʃ/ than for /s/ appears to be due to more precise tongue body positioning associated with the tongue raising gesture itself, the shaping of a relatively long laminal or lamino-predorsal constriction channel and the formation of a front sublingual cavity. Moreover, the reason why /s/ assimilates to following /ʃ/ rather than to preceding /ʃ/, and why /ʃ#s/ is far more variable than /s#ʃ/ (the articulatory motion for /ʃ#s/ may show blending, progressive assimilation or two targets), should be sought in the different production strategies involved, i.e., free adaptation vs. repositioning (see Introduction), and also in the fact that regressive but not progressive assimilation conforms to phonemic preprogramming in speech, i.e., phonemes are programmed in advance of their actual articulatory manifestation. On the other hand, carryover coarticulation and progressive assimilation for /ʃ#s/ are likely to be directly related to the mechanico-inertial effects associated with the tongue dorsum raising/fronting gesture for the palatoalveolar fricative as well as for other dorsal consonants, as suggested by the realizations [ɲʃ] and [ʎʃ] of the Catalan sequences /ɲs/ and /ʎs/ in words like anys ‘year, pl.’ and alls ‘garlic, pl.’.
A cross-language comparison of the adaptation strategies in the fricative sequences under analysis reveals a general tendency for /ʃ/ to prevail over /s/ at the regressive level rather than at the progressive level, where the three adaptation strategies referred to above may be found and languages/dialects and speakers may differ as to which of these strategies is at work. Thus, acoustic data for four English subjects reported in Niebuhr et al. (2011) turned out to be very similar to those for the Eastern Catalan speakers in the present study, with regressive assimilation for /sʃ/ and a /ʃ/-to-/s/ trajectory for /ʃs/, which may proceed gradually or abruptly. EPG and acoustic data for English subjects provided in Pouplier et al. (2011) show a similar scenario, with some instances of /ʃs/ in which the [ʃ] realization extends into C2. Descriptive studies also refer to regressive assimilation for /sʃ/ in English (this shop; Cruttenden 2008: 302), and the EPG data presented in Nolan et al. (1996) are consistent with this segmental adaptation mechanism. In other languages/dialects, /ʃ/ is favored over /s/ both leftwards and rightwards. This appears to be the case in Southern French (Niebuhr et al. 2011), Western Catalan and also European Portuguese, where /s#ʃ/ is unavailable since word-final /s/ is realized as [ʃ], while /ʃ#s/ is implemented as [ʃ] ([dojˈʃaʃ] dois chás ‘two teas’, [dojʃɐˈpatuʃ] dois sapatos ‘two shoes’; Leite de Vasconcelos 1901: 120; Mateus & d’Andrade 2000: 145).
It therefore seems plausible to characterize the adaptation scenario occurring in most instances of /sʃ/ as categorical regressive assimilation, and to allow for several production strategies for /ʃs/ that may depend on multiple factors, including not only speaker and dialect but presumably also speech rate and stress.
An issue open for further research is whether language-dependent differences in the extent to which /s/ assimilates to following /ʃ/ parallel a trend for categorical regressive place assimilation to be favored in certain languages over others, where C1 may not assimilate to C2 or may adapt to it only partially in constriction location (i.e., gradient assimilation). This has been found to be the case for /n/ + velar and /n/ + labial sequences in English and German, which may exhibit assimilation, no assimilation or reduction of the alveolar gesture, as opposed to several Romance languages such as Italian, Spanish and Catalan, where the nasal assimilates to the following heterorganic consonant practically all the time and traces of the C2 lingual gesture are not likely to occur during C1 (see Kochetov et al. 2021 for a summary and Bergmann 2012 for articulatory data for /n#g/ in German). Regarding the fricative sequences of interest, the production of /s#ʃ/ has been shown to involve not only regressive assimilation but also blending in German (Pouplier & Hoole 2016), which clearly violates the prediction that /s/ should assimilate to /ʃ/ in these circumstances. Following a suggestion put forth in the Introduction, the reluctance of /s/ to assimilate to the following palatoalveolar fricative may well be due to speakers having laminodental productions of this consonant (see in this respect Howson & Redford 2022 on the high degree of coarticulatory resistance of dental fricatives).
As also pointed out in the Introduction, we believe that the term ‘dynamic blending’ is suitable for cases of mutual adaptation in sequences of two consonants articulated with a central channel for the passage of airflow, i.e., lingual fricatives and lingual approximants (/ʃ#s/, /s#j/), since their articulatory trajectories are also dynamic when produced in VCV sequences. An issue for further research is how these blended realizations are perceived by listeners in running speech. Regarding Catalan, the blended fricative outcome of /ʃ#s/ is often perceived as intermediate between /ʃ/ and /s/, i.e., as palatalized /s/ or depalatalized /ʃ/ (Badia 1951: 101), which is very much in accordance with the gestural blending account.
Generally speaking, a good match was obtained between the lingual and COG trajectories, in agreement with the match between EPG and acoustic data reported in earlier studies (Nolan et al. 1996; Pouplier et al. 2011). A relevant issue is why the lingual profiles for /ʃ#s/ were more similar to those for the palatoalveolar fricative, while the COG values for the same sequence approached those for the alveolar fricative instead. As pointed out in the Introduction, this may be related to the fact that, while the lingual profiles reflect changes in overall tongue configuration, the COG values for fricative consonants depend on the size of the cavity located in front of the lingual constriction. Regarding front cavity size, it should be recalled that /ʃ/ is produced not only with a more retracted constriction than /s/ but also with a larger sublingual cavity. In Catalan, the apical or laminal alveolar constriction for /s/ is rather back, which means that it ought to be articulated with a larger sublingual space than the more anterior, predorsal variety of the alveolar fricative found in languages like French. This difference in constriction location causes the spectral peak of the frication noise to be relatively low (ca. 5000 Hz) compared with that for /s/ in other languages (Jongman et al. 2000; Zygis 2003). In these circumstances, we speculate that a progressive reduction in front cavity size during the production of /ʃ#s/, as the lingual constriction is being fronted, causes a fast COG increase towards a value more appropriate for /s/.
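The link between front cavity length and the spectral peak can be illustrated with the textbook quarter-wavelength approximation, in which the cavity in front of the constriction is treated as a uniform tube closed at the constriction and open at the lips, so that its lowest resonance is f = c/(4L). This Python sketch is a deliberate simplification, not the model used in the study: it ignores the shape of the sublingual cavity, lip configuration and coupling to the back cavity. The speed of sound (35,000 cm/s) is an assumed round value and the function names are illustrative.

```python
SPEED_OF_SOUND_CM_S = 35000.0  # assumed round value for warm, humid air in the vocal tract


def front_cavity_resonance_hz(length_cm: float, c: float = SPEED_OF_SOUND_CM_S) -> float:
    """Lowest resonance of a uniform tube closed at one end and open at the other: f = c / (4 L)."""
    if length_cm <= 0:
        raise ValueError("cavity length must be positive")
    return c / (4.0 * length_cm)


def front_cavity_length_cm(freq_hz: float, c: float = SPEED_OF_SOUND_CM_S) -> float:
    """Inverse relation: tube length (cm) whose quarter-wavelength resonance is freq_hz."""
    if freq_hz <= 0:
        raise ValueError("frequency must be positive")
    return c / (4.0 * freq_hz)


if __name__ == "__main__":
    # A peak of about 5000 Hz corresponds to a front cavity of about 1.75 cm under this model.
    print(front_cavity_length_cm(5000.0))  # 1.75
    # Shortening that cavity to 1.4 cm raises the predicted resonance to about 6250 Hz.
    print(round(front_cavity_resonance_hz(1.4)))  # 6250
```

Under these assumptions, the inverse proportionality between L and f captures the direction of the effect speculated above: fronting the constriction shortens the front cavity and raises its resonance, and with it the COG.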
This change in constriction location could be related to a change in formant-cavity affiliation, i.e., the front cavity would be affiliated with F3 for /ʃ/ and with F3 or F4 for apicoalveolar /s/, instead of with F4 or F5, which would be the case if /s/ were predorso-dentoalveolar (Stevens 1989; Dart 1991; Tabain 2001).
In agreement with previous EPG and acoustic data for Catalan, the present study has confirmed a clear trend for /s/ to assimilate to following /ʃ/ and for the sequence /ʃ#s/ to exhibit a /ʃ/-to-/s/ trajectory, which, at least for some speakers, appears to result from a dynamic blending production strategy. The results may be accounted for by assuming that the two consonants differ in degree of articulatory constraint and articulatory precision, as also revealed by a higher degree of coarticulatory resistance for /ʃ/ than for /s/, and that the C1-to-C2 transition for /ʃ#s/ involves articulatory repositioning and thus an asynchronous activation of different portions of the tongue. Moreover, the ultrasound data reported in the present investigation indicate that dynamic blending may be observed at specific regions of the tongue before others, i.e., at the tongue back before more anterior tongue regions. Future research could look into the language- and speaker-dependent factors which may intervene in the various realizations of /ʃs/, into how these realizations are perceived by listeners, and into deviations from the regressive assimilation pattern for /sʃ/ which may occur in languages other than German.
Acknowledgments
I would like to thank Clara Rodríguez for help with the data processing and statistical analysis, the Servei d’Estadística Aplicada of the Universitat Autònoma de Barcelona for statistics advice, and three reviewers for comments on previous versions of the manuscript. This research was supported by project PGC2018-096877-B-I00 of the Spanish Ministry of Science, Innovation and Universities and by the research group 2017 SGR 34 of the Catalan Government.