AN UNSOLVED PROBLEM IN CHAIN SHIFTING
At first glance, chain shifting seems to be a phenomenon that is fairly well understood. It reflects the systematic interrelationships in a phonological system, in which the tendency towards maximal dispersion produces symmetrical patterns, somewhat offset by the asymmetrical character of articulatory space (Martinet, 1955). Furthermore, chain shifts reflect the hierarchical nature of phonological organization, since most of them are initiated by the shift of a phoneme from one subsystem to another, disturbing the symmetry of the original subsystem in a way that requires restoration. Martinet's initial view of chain shifting as a teleological process, in which speakers made efforts to maintain contrast and avoided mergers, can be replaced by a perspective based on the automatic and mechanical process of probability matching (Labov, 1994:580–588). In this view, margins of security are preserved as a result of misunderstanding, rather than by efforts to avoid misunderstanding. Outliers from the normal distribution of realizations of a phoneme that overlap the normal distribution of a neighboring phoneme are less likely to be understood as tokens of the intended phoneme, and are thus less likely to participate in the calculation of the mean target of that phoneme by the language learner. But when the neighboring phoneme has shifted away, increasing the margin of security, the same outlier is more likely to be recognized as a member of its intended phoneme, and thereby to shift the calculation of the mean in its direction.
In the most common examples of chain shifting, the leaving element is at one end of the series, as in (1).
This is the familiar case of the monophthongization of /ay/ in the Southern Shift (Feagin, 2003; Labov, 1991), where /ay/ leaves the set of upgliding vowels for the set of long and ingliding vowels, and /ey/ and /iy/ readjust their positions downward to obtain maximal dispersion; or the diphthongization of /i:/ in the Great Vowel Shift to /iy/, followed by the upward adjustment of /e:/ and /æ:/. This article will consider the consequences of removing an element from the middle of a series, as in (2).
B is removed from the A/B/C subsystem. How can we predict whether A or C will shift to fill the hole in the pattern? In this report, we will discuss a case in which A and C follow a collision course: They both move into the vacated area in phonological space.
Our interest in this problem stemmed from the work of the Atlas of North American English (Labov, Ash, & Boberg, 2006, henceforth ANAE), which involved the measurement of 134,000 vowels of 439 speakers, an average of 305 vowels per speaker. To maximize both the number of speakers studied and the number of vowels per speaker, it was decided to take measurements only of the first and second formants, F1 and F2, following the finding that these two parameters served to distinguish vowels in a manner generally in accord with acoustic impressions (Cooper et al., 1952; Labov, Yaeger, & Steiner, 1972). In the general procedure, no measurements were taken of the fundamental frequency F0, the third formant F3, bandwidths, or duration. As expected, we found a sizeable amount of overlap of measurements for vowels in the same subsystem, and in many cases it seemed that the auditory impressions of the vowels differed as well. The particular overlap we study here is commonly produced in the course of the Northern Cities Shift, the rotation of six vowels found throughout the Inland North, from Syracuse and Binghamton to Madison and Milwaukee (ANAE, Ch. 14). It is shown as Figure 1.
The first step in this chain shift is the general tensing of all short-a words, with a consequent fronting and raising of the phoneme as a whole along the front diagonal. The leaving element here is (1) the fronting and raising of /æ/ along the front peripheral path, exiting from the middle of a series of short vowels, with /e/ on one side and /o/ on the other. This is followed by a shift of both of the neighboring vowels, (2) the fronting of /o/, and (4) the lowering and backing of /e/ along the nonperipheral path, into the position formerly occupied by /æ/ (Eckert, 1999; Labov, 1994; Labov, Yaeger, & Steiner, 1972).
1 The third step, the lowering of /oh/ in caught, etc., is not so uniformly ordered in relation to the others, and is often found in mid-back position.
The designation of word classes in Figure 1 is the standard ANAE binary notation for North American English.
2 Chapter 2 of ANAE discusses the relation between the ANAE notation and the IPA-based phonetic notation frequently used by dialectologists, as well as the J. C. Wells system for labeling word classes with typical words. The classes /æ/, /e/, /o/, and /oh/ that are central to this article are
in the notation of Kurath and McDavid 1961, and TRAP, DRESS, LOT, and THOUGHT in the Wells terminology.
and the long and ingliding vowels /ah, oh/ in father, spa and caught, law, etc.
3 The designation “long and ingliding” should be amplified by the specification that the degree of ingliding is proportionate to the height of the vowel, with zero as the limiting case for the lowest vowel, /ah/. The upgliding subsystems /iy, ey, ay/ and /uw, iw, ow, aw/ will not be involved, since the margins of security between /ey/ and /e/, /o/ and /ay/, /ah/ and /ay/, etc. are very large in the Inland North dialects considered here.
No such overlap is suggested in Figure 2, which displays the means of the short vowels of Martha F., a 28-year-old resident of Kenosha, WI, interviewed in 1992. The /æ/ mean has moved to an upper-mid front position, higher and fronter than /e/. The backing of /e/ and the fronting of /o/ have brought these two vowels into alignment on the F2 dimension. At the same time, it seems that /e/ and /o/ are quite distinct: The means are 206 Hz apart on the F1 dimension. It would seem that /e/ has followed the upper path on Figure 1, backing but not lowering.
When we examine the full set of tokens from the same speaker in Figure 3, it becomes evident that the lowering of /e/ is very much a part of the overall development. At least 20 /e/ tokens are low enough to overlap with /o/ tokens, though none are located as low as the 20 lowest exemplars of /o/.
Figure 4 is an expanded view of the area of /e/-/o/ overlap, with the overlapping tokens labeled. There are no minimal pairs since the initiating Telsur project had no motivation to inquire into /e ∼ o/ contrast. Listening to /e/∼/o/ pairs in close conjunction in this area, the vowel qualities seem quite close, but it is not immediately evident whether they are distinct or not, since the phonetic environments do not match exactly. There is however no phonetic conditioning apparent that would account for the low position of /e/ or the high position of /o/ tokens. The vowels of not4 and get are in close approximation and are not clearly different in their phonetic environments. Nor is there any obvious reason for pen to be lower than on2 if these vowels are phonemically distinct. In general, all of the characteristic environments of /e/ and /o/ are distributed evenly in this area of overlap.
Given this degree of overlap, it seems clear that either (1) /e/ and /o/ are nondistinct in this region, or (2) the F1 and F2 measurements of /e/ and /o/ are not sufficient to register their distinctive qualities, and other phonetic features must be called on to distinguish them. The most likely candidate for such a feature is duration.
PHONEMIC AND PHONETIC LENGTH
We have been using the notation /o/ for the class of cot, socks, college, etc., which is largely composed of short-o words. However, in the Inland North (as in most areas of North America), short-o has merged with the /ah/ class of father, pa, pajamas, etc., so that bother rhymes with father, and bomb with balm.4
When of course the /l/ is not pronounced, which it frequently is.
It follows that the preconditions for the Northern Cities Shift include the two phonological movements shown in Figure 5. Both /æ/ and /o/ migrate from the subset of short vowels to the subset of long and ingliding vowels. In the case of /æ/, phonetic indications of this phonological shift appear as the vowel moves higher and fronter, in the form of increasing length of the nucleus and the development of an inglide. The question remains as to whether there are phonetic indications of the merger of /o/ and /ah/, that is, whether /o/ acquires phonemic length. It is well known that vowel duration is inversely correlated with vowel height, that is, there are purely phonetic differences in vowel duration, with lower vowels being longer. Peterson and Lehiste (1960) showed mean values for duration of English vowels, as reorganized in Table 1. There are sizeable differences in length associated with long versus short vowels: 60–70 msec. (The value for /ow/ is the sole anomaly in this picture.) Each degree of opening also corresponds to an increase of duration of about 30 msec. These duration differences are redundant, since vowel quality fully differentiates the high, mid, and low vowels. It can be argued that the duration differences associated with height are purely mechanical effects. Beckman (1986) attributed the length difference to the differential time required for the transition from the consonantal position to the maximum tongue and jaw opening required for the vowel.
However, if vowels of the same height differ systematically in duration, no such mechanical effect can be appealed to. If the /e/ and /o/ tokens in the overlapping area of Northern Cities Shift speakers are differentiated by duration instead of formant locations, it will be a clear indication of phonemic length, that is, duration associated with membership in an abstract category.
Phonemic length in English generally plays a marginal role. As noted earlier, the length distinction in balm versus bomb (Trager & Smith, 1957) has all but disappeared in North American English. However, a clear example of phonemic length is found in the Pittsburgh Chain Shift (ANAE, Ch. 19). In this shift
falls to low central position in response to the merger of /o/ and /oh/ in lower back mid position. In so doing, the
phoneme largely overlaps with the area of monophthongal /aw/, characteristic of the Pittsburgh dialect. The two vowels are, however, distinctly differentiated by length. The Pittsburgh monophthongization of /aw/ is accompanied by compensatory lengthening. For one 61-year-old speaker, monophthongal /aw/ has a mean duration of 207 msec, ranging from 170 to 270 msec, while
has a mean of 98 msec, ranging from 70 to 120 msec. There is no overlap, and the means are more than 5 standard deviations apart.
To measure the duration differences for /e/ and /ah/ in the Inland North, the areas of overlap for 48 Inland North speakers were examined. In each such area, we identified /e/∼/ah/ pairs that were close in terms of F1/F2 measurements, for a total of 350 pairs involving 456 different words. Table 2 shows the means and standard deviations of the durations of /e/ and /ah/ in the overlapping area of the Inland North speakers.
The mean difference of 54 msec is considerably less than in the Pittsburgh case, but greater than the intrinsic differences of height found by Peterson and Lehiste. Since these pairs are not matched for segmental environment or number of syllables, it is possible that the duration difference is a product of differences in the syllable structures of the /e/ and /ah/ word pairs. A regression analysis was therefore carried out for the 456 words (201 /e/ and 255 /ah/), considering all features of the preceding and following segmental environment.5
This included not only the immediately preceding and following segments, but also the presence of initial obstruents/liquid clusters, complex codas, and preceding and following syllables.
In Table 3, it appears that F1 position has no significant effect on duration within this phonetic range, and F2 has only a slight effect—3 msec shorter for every 100 Hz advance to the front. Each extra syllable beyond 1 shortens the vowel nucleus by 35 msec, and vowels before voiceless stops are expected to be shorter by 24 msec. Beyond this, none of the several dozen environmental features have a significant effect on duration. The word class identification emerges at 51 msec, very close to the value of Table 2. We can therefore conclude that the 50 msec difference between word classes is not a phonetic product of environmental factors but a phonological reflex of the word classes themselves.
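The arithmetic behind this conclusion can be sketched in a few lines. The following is a minimal illustration, assuming a purely additive model built from the coefficients quoted in the text; the intercept is not reported here, so only differences between tokens are computed. The function names and the example tokens are hypothetical and are not part of the original analysis.

```python
# Coefficients in msec, as quoted in the text for Table 3.
# The intercept is not given, so only differences between tokens are modeled.
WORD_CLASS_AH = 51.0     # /ah/ relative to /e/
EXTRA_SYLLABLE = -35.0   # per syllable beyond the first
VOICELESS_STOP = -24.0   # vowel before a voiceless stop
F2_PER_100_HZ = -3.0     # per 100 Hz of fronting (higher F2)


def duration_offset(is_ah, syllables, before_voiceless_stop, f2_hz):
    """Return the modeled duration contribution (msec) of one token's features."""
    offset = WORD_CLASS_AH if is_ah else 0.0
    offset += EXTRA_SYLLABLE * (syllables - 1)
    offset += VOICELESS_STOP if before_voiceless_stop else 0.0
    offset += F2_PER_100_HZ * (f2_hz / 100.0)
    return offset


def predicted_difference(token_a, token_b):
    """Modeled duration of token_a minus token_b in msec (the intercept cancels)."""
    return duration_offset(**token_a) - duration_offset(**token_b)


# Hypothetical tokens with identical environments and F2: only word class differs.
ah_token = dict(is_ah=True, syllables=1, before_voiceless_stop=False, f2_hz=1500)
e_token = dict(is_ah=False, syllables=1, before_voiceless_stop=False, f2_hz=1500)
matched = predicted_difference(ah_token, e_token)

# A disyllabic /ah/ word before a voiceless stop against the same /e/ token.
ah_long_word = dict(is_ah=True, syllables=2, before_voiceless_stop=True, f2_hz=1500)
unmatched = predicted_difference(ah_long_word, e_token)
```

Under these assumptions the matched pair differs by the word class coefficient alone, while an unmatched pair can show a smaller or even reversed difference. This is why the raw pairwise means needed to be checked against environmental factors.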
The question remains as to how consistent this differentiation of /e/ and /ah/ is across the 48 Inland North speakers. The scattergram of Figure 6 displays on the horizontal axis the mean differences in seconds between /ah/ and /e/ for each individual speaker. The accuracy of this difference value clearly depends on the number of pairs measured for each speaker, displayed on the vertical axis. The overall view is that almost all speakers differentiate /e/ and /ah/ by duration. The only speakers who show a difference less than 0 are those for whom fewer than 5 pairs were measured, and the only ones who showed a mean difference less than 20 msec had 10 or fewer pairs measured. We conclude that the differentiation of /ah/ and /e/ by duration is a general characteristic of Inland North speakers.
EXPERIMENT 1: ARE THERE OTHER CUES TO /e∼ah/ IDENTITY BESIDES DURATION?
Duration may not be the only factor influencing the identification of vowels; it might be F0, F3, amplitude, tenseness, or some other quality of spectral shape. Examination of F0, F3, or VQI (Voice Quality Index; Di Paolo & Faber, 1990) did not show any tendency to differentiate /e/ and /ah/ in the overlapping area. To test this possibility more thoroughly, we extracted a section of approximately 120 msec duration from the nuclei of 37 different words, both /e/ and /ah/, with vowels located in the overlapping area, excluding all information from transitions, so that the vowels could be identified by the nucleus alone.
The vowels were played individually to a class of 31 undergraduates, repeated once with a 250-msec interval. Subjects were given a forced choice of four words from which the vowel might have been taken, as in (3).
The largest number of tokens were identified with /æ/ in had in accordance with their location in the overlapping area. Some, however, were identified as head and hod, in some cases in agreement with the word that the extract was taken from. Table 4 shows the percent identifications by the original phoneme category. The /e/ tokens were evenly split between head and had, while the /ah/ tokens were predominantly heard as had.
Table 4 indicates the confusion that would be expected from the F1/F2 overlap, but does not indicate whether there is some feature present that would separate the two phonemes in a way that could lead to their successful use in connected speech to distinguish their respective word classes. Such a tendency would be shown by any response that indicated the /e/ extract was higher and/or fronter than the /ah/ extract. For example:
We selected 26 pairs of extracts that were close in F1/F2 values and compared responses to see if there was any such tendency to distinguish the /e/ and /ah/ members. The results are shown in Figure 7. The vertical axis in Figure 7 is the proportion of the 31 students distinguishing short /e/ from /ah/ in the appropriate direction for each of the 26 pairs, arranged in order of increasing success. A success rate beyond chance would be above .67 by the binomial theorem. This was found for only four of the 26 pairs.
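The .67 criterion can be checked with a short calculation. The sketch below assumes a one-sided binomial test against a chance rate of .5 at the .05 level; the text does not state the significance level or the sidedness, so both are assumptions, and the function names are hypothetical.

```python
from math import comb


def upper_tail(n, k, p=0.5):
    """P(X >= k) for X ~ Binomial(n, p)."""
    return sum(comb(n, i) * p**i * (1 - p) ** (n - i) for i in range(k, n + 1))


def min_successes(n, alpha=0.05, p=0.5):
    """Smallest k whose upper-tail probability under chance is <= alpha."""
    for k in range(n + 1):
        if upper_tail(n, k, p) <= alpha:
            return k
    return None


k = min_successes(31)
print(k, round(k / 31, 3))  # prints: 21 0.677
```

With 31 listeners, at least 21 responses in the appropriate direction are needed under these assumptions, a proportion of about .677, which agrees with the threshold stated above.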
Experiment 1 therefore led to the conclusion that if one controlled for duration, there was no other phonetic feature that listeners could use to distinguish /e/ from /ah/ in the overlapping area.
EXPERIMENT 2: CAN A 50-MSEC DIFFERENCE IN DURATION FUNCTION AS PHONEMIC LENGTH?
We now turn to the question of whether the mean difference in duration of 50 msec, less than one standard deviation, is sufficient to maintain a difference between these two word classes when they overlap in two-formant space. If /ah/ and /e/ did in fact follow a collision course, we would expect that one or the other would recede from low front position.
If the effect of duration is sufficient to distinguish the overlapping /e/ and /ah/ words, we project that the frequency of lowered /e/ would persist as a stable and developing trend. If not, we would expect a shift of /e/ to some other position in the vowel space. The question to be addressed is whether the mean difference in duration of 50 msec is enough to distinguish vowels in the overlapping F1/F2 space. In general, duration differences without quality differences have been found to be marginal in North American English, as in the vanishing distinction of balm and bomb. In the Pittsburgh Chain Shift, a difference in length separates
from monophthongal /aw/ as in dahn vs. done, but this difference is much greater than the 50-msec differential found here. Experiment 2 addressed the question of whether such a 50-msec difference in length can effectively distinguish the two phonemes.
In this experiment we selected nine short-e words and six /ah/ words in the overlapping F1/F2 area. Five short-e words in the 100–140 msec range were lengthened in a series of steps; two short-e words in the 140–240 msec range were shortened and lengthened; and two short-e words in the 240–300 msec range were shortened. For /ah/ words, we selected three tokens of socks and shortened them in four progressive steps. In addition, three tokens of sod were shortened.
The perceptual effect in some cases is dramatic. When a fronted token of socks is heard at its original duration of 275 msec it is uniformly heard as sacks. There is a slight shift towards
at 250 msec, and a sudden jump to identification as sex at 175 msec.
These stimuli were played to a group of 31 undergraduate students, who were asked to make a forced choice as in (4).
Results are presented in Figures 8, 9, and 10. In these diagrams, the horizontal axis is the length of the nucleus in milliseconds, the vertical axis is the number of subjects, and the numbers identify responses to each token with the shortened versions connected by straight lines.
At lower right of Figure 8, 7 subjects identified the full version of socks3 as /ah/, the word socks. As the word was shortened to 220 msec this dropped to 5, and then to none as it was shortened to 160 msec. The solid lines indicate the additional numbers who identified the word as /æ/ in sacks: 23, for a total of 30 who chose either socks or sacks; the residual number, in this case 1, is the total of those who heard /e/ as in sex. Thus the lower right of Figure 8 is the socks/sacks area, and the upper left is the sex area. The downward slope of the solid lines indicates the declining number of those who heard the word as /ah/ or /æ/ as the word was shortened, and the increasing number who heard it as /e/. The crossover point, where a majority switched from /æ/ to /e/, is in the 160–220 msec range.
The mean durations of the /e/ and /ah/ words in the overlapping area are indicated in Figure 8 by vertical dashed lines. The crossover point for one word, socks2, lies within this range, and for the other two, it is somewhat higher, that is, shortening socks changes it to sex some 20 msec before the mean value of socks in production is reached. In all three cases, the crossover is abrupt or categorical: A change of majority identification takes place within the 50-msec differential that is characteristic of production.
Figure 9 presents the corresponding data for two shortened forms of sod. Again, the dotted lines at the lower right indicate the number of subjects who identified this extract with the vowel /ah/ as in sod. For sod4 this is maintained at 12–14 of the 31 subjects until the duration is cut to 120 msec, when only 7 subjects heard it as the original sod. In the case of sod5, the comparable decline occurs at 160 msec.
The majority of subjects heard the original form of sod as /æ/. A decline in favor of /e/ begins at 190 msec. In the 140–160 msec area, about a third of the subjects hear the shortened forms as /e/. Though the shift of category perception is less than that for socks in Figure 8, a radical change takes place within a 60-msec window. For sod4, 15 of the 31 subjects changed their categorization of the stimulus when the extract was shortened from 180 msec to 120 msec.
Figure 10 shows the experimental results for the original short-e words. The solid lines represent the number of subjects who identified the /e/ extract as sad, so that the residual area above represents the proportion who heard the token as /e/. At the lower left are five short tokens that were lengthened; in the middle are two tokens of intermediate length, hem and said, which were both lengthened and shortened, and at upper right are two long tokens of short-e words, said and met, which were only shortened. The effect of lengthening shorter tokens produces a more abrupt change of identification than the shortening of longer tokens.
The dashed lines again indicate the mean durations of /e/ and /ah/ words in production. Three of the experimental tokens switch their majority identifications within that range, one shortened and two lengthened. Three others show a more abrupt crossover in the 180–220 msec range.
We conclude that in the majority of our experimental alternations of length, a difference of 50-msec duration of the vowel extract can effectively alter the majority perception of the phoneme it is derived from.
F1 AND F2 VALUES WITH CHANGE OF LENGTH
There are at least two different interpretations that may be made of the effect of shortening the /ah/ and /e/ tokens. The implication of the argument thus far is that listeners perceive the durations appropriate for one or the other phonemic category. Another possibility is that the shortening leads to the calculation of new F1 and F2 values, taking into account that less time is available to reach the target tongue (and formant) positions. Such a calculation would lead to results opposite to those of Figures 8, 9, and 10, since listeners would infer that the intended target of the shorter vowel was lower than that achieved, and so switch from /e/ to /æ/ rather than from /æ/ to /e/. It is a matter of some interest that LPC calculations of the maximum and minimum formant positions are generally invariant with shortening. Figure 11 shows the formant values calculated by Praat (Boersma & Weenink, 2006) for the original (empty squares) and fully shortened (solid squares) tokens. Most of the tokens remain unchanged. The only major change is for socks2, where the shift of formant measurements reflects the shift in judgments. Four cases show identical values, and there is no overall direction of change.
THE PATH OF /e/ IN THE NORTHERN CITIES SHIFT
If the overlapping tokens of /e/ and /ah/ were not effectively distinguished by duration, the logic of chain shifting would lead us to expect either a shift of the means of /e/ upward or a shift of the means of /ah/ further to the back. Figure 12 illustrates the situation where an outlier of /e/ in the main /ah/ distribution is either recognized consistently as /e/ or not. The diagram indicates only F1 and F2 values, but not duration. If the outlier is consistently distinguished from /ah/ by duration (or some other feature), it will be consistently entered into the calculation of the F1 mean for /e/ by the language learner, with the resultant mean1. If no such feature is available, we cannot expect such consistent recognition. Though the context in which the outlier is heard may lead to correct identification in many cases, it will be less consistently recognized as /e/ than the tokens in the main distribution, and so contributes less consistently to the calculation of the /e/ mean by the language learner. The end result will be a different mean for the incoming generation of language learners, mean2, and a gradual shift of /e/ to a higher position.
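The contrast between mean1 and mean2 can be made concrete with a toy calculation. This is a minimal sketch, assuming that each token contributes to the learner's mean in proportion to the probability that it is recognized as /e/; the F1 values, the recognition probabilities, and the function name are all hypothetical and chosen only for illustration.

```python
def learner_mean(tokens):
    """Expected F1 mean when each token counts in proportion to its
    probability of being recognized as the intended phoneme.

    tokens: list of (f1_hz, recognition_probability) pairs.
    """
    total_weight = sum(p for _, p in tokens)
    return sum(f1 * p for f1, p in tokens) / total_weight


# Hypothetical /e/ tokens: five in the main distribution, one lowered outlier
# that falls inside the main /ah/ distribution (higher F1 means a lower vowel).
main = [(600, 1.0), (620, 1.0), (640, 1.0), (660, 1.0), (680, 1.0)]
outlier_f1 = 800

# Outlier consistently recognized as /e/ (for example, by its duration).
mean1 = learner_mean(main + [(outlier_f1, 1.0)])

# Outlier recognized as /e/ only half the time.
mean2 = learner_mean(main + [(outlier_f1, 0.5)])
```

With these illustrative numbers mean2 has a lower F1 than mean1, so the learner's target for /e/ is a higher vowel when the outlier is recognized less consistently.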
If the 50-msec difference between /ah/ and /e/ is not sufficient to distinguish these two phonemes, we can expect that in the course of time, the overlap of /e/ and /ah/ will lead to a reversal of the lowering of /e/ in the Northern Cities Shift, so that /e/ is shifted more consistently back towards
in mid center position. Indeed, this was the first interpretation of the situation in response to the discovery of the backing of /e/ in Detroit (Eckert, 1989), compared to the earlier observations of /e/ lowering in Chicago (Labov, Yaeger, & Steiner, 1972). The completed ANAE now gives us a larger perspective on this question, and allows us to determine whether, on the whole, the lowering of /e/ and its overlap with /ah/ is an ongoing process, or whether the process is being reversed.
Figure 13 shows normalized mean values of F1 and F2 for 2,909 short-e tokens from 211 speakers in the Northern dialect region, divided by gender and four age groups. For men (solid squares), there is little difference among the older groups, but a sharp increase in lowering for those under 20. For women, the picture is dramatically different. The oldest women, over 60, are the most conservative in their treatment of /e/. With each successive age cohort, /e/ moves back and downward. The lowering movement is comparable in magnitude to the backing movement, since a difference of 50 Hz in F1 corresponds in perception to roughly 100 Hz in F2.
The increasing gender differentiation with younger speakers confirms the findings of Eckert's study of Detroit high schools (1999), where variables appeared to develop stronger gender differentiation as the sound changes progressed through the Northern Cities Shift.
SUMMARY
In the development of the Northern Cities Shift, the removal of /æ/ from the short vowel subsystem led both /e/ and /ah/ to shift into the same region of phonological space, drawn into the vacated area. However, it appears that the /e/ and /ah/ vowels in the overlapping area are differentiated by a duration difference of about 50 msec.
Experiment 1 showed that when duration differences are controlled, vowels in the overlapping area are not distinguished from each other.
Experiment 2 showed that alterations in the duration of a vowel comparable in size to the 50 msec production differential of /e/ and /ah/ in the overlapping area produced a change of category identification.
The F1 mean of /e/ continues to increase in the Inland North in spite of the overlap with /ah/, indicating that a duration difference of 50 msec may play an active role in differentiating the subset of short vowels from the subset of long and ingliding vowels in North American English.