
The Physiological Interpretation of Sound Spectrograms

Published online by Cambridge University Press: 02 December 2020

Extract

In the physiological study of speech articulations our most objective information has come, until recently, from radiograms. Now we have, in addition, spectrograms, which, if we learn to interpret them, can also give us very objective information. For the typical phase (portion) of a speech sound, the interest of a spectrogram may be about equal to that of a radiogram; but for the transitional phases, the interest of a spectrogram will probably be much superior; and from the practical viewpoint of availability, the spectrogram will have a marked advantage for it can be had in a few minutes and at low cost. But to the linguist, the usefulness of a spectrogram depends on his ability to interpret it in articulatory terms. We need not stress, therefore, the importance of investigating the relation between formant positions and speech organ positions at this stage of the still young science of sound spectrography.

Type
Research Article
Information
PMLA, Volume 66, Issue 5, September 1951, pp. 864–875
Copyright
Copyright © Modern Language Association of America, 1951

References

Note 1 in page 864 X-ray pictures of the organs of speech during articulation.

Note 2 in page 864 Spectrographic pictures of the acoustic resonances produced by the speech organs during articulation, in three acoustic dimensions: time, frequency, intensity. The first extensive presentation of such pictures is to be found in Visible Speech by Potter, Kopp, and Green (New York: Van Nostrand, 1947). Briefly, a sound spectrogram shows the energy distribution on a time-frequency scale where time is read from left to right, frequency from bottom to top, energy by the degree of darkness.

Note 3 in page 864 For those who are not yet familiar with spectrography, we shall define the essential term formant as it is used here. Linguistically the color of a vowel is determined by the frequency position of its formants, mainly its two lowest formants. Let us look at Fig. 1 or Fig. 3. There, formants appear as dark horizontal bands on a linear frequency scale (range: 3500 cycles from bottom to top). For instance, for [œ], the lowest band is formant 1 (frequency: about 500 cycles), the one above is formant 2 (frequency: about 1400 cycles), and the next one above is formant 3 (frequency: about 2400 cycles). Thus, on our spectrograms formants appear as the darkest areas. Acoustically, formants are the frequency regions of greatest intensity. For voiced vowels, the number of harmonics that cross such regions (in other words, that are comprised in formants) usually varies from one for a high female voice to two or three for a male voice. The frequency of a formant can satisfactorily be given by the frequency of its center.

Note 4 in page 864 Acoustic Phonetics, Linguistic Society of America (Baltimore, 1948), pp. 49–59.

Note 5 Ibid., p. 57.

Note 6 in page 866 The Pattern Playback, developed by Franklin S. Cooper at the Haskins Laboratories, New York, is a speech synthesizer that permits us to transform hand-drawn spectrograms into sound, using modulated light that is reflected from hand-drawn white lines. The relative intensity of each harmonic or of each formant depends on the width of the lines drawn in the harmonic channels. For our use of the machine, the harmonic channels were set 120 cycles apart and there were 50 channels for a total frequency range of 6000 cycles. Among its many uses, this machine makes it possible to study the effects upon speech obtained by omitting or adding some resonances, or by modifying either their intensity, their frequency or their type.

Note 7 in page 867 Daniel Jones, An Outline of English Phonetics (Cambridge: Heffer, 1936), p. 63.

Note 8 in page 867 John S. Kenyon, American Pronunciation (Ann Arbor: Wahr, 1937), p. 66.

Note 9 in page 867 C. E. Parmenter and C. A. Bevans, “Analysis of Speech Radiographs,” American Speech, viii, iii, 51.

Note 10 in page 867 Richard T. Holbrook and Francis J. Carmody, “X-ray Studies of Speech Articulations,” Univ. of Calif. Publ. in Mod. Philol. XX, iv, 230.

Note 11 in page 867 L'articulation des voyelles (Paris: Vrin, 1937), p. 7.

Note 12 in page 867 The Vowel (Columbus: Ohio State Univ. Press, 1928), pp. 110–111.

Note 13 in page 871 Pierre Delattre, “Un triangle acoustique des voyelles orales du français,” French Review, xxi (May 1948), 477–484.

Note 14 in page 872 See note 5.

Note 15 in page 873 Op. cit., p. 93.

Note 16 in page 873 The result, a nasalized [e], is not to be confused with the real French nasal [e], which does not have the same articulatory positions and hence does not have the same formant 1 and 2 frequencies (apart from the velum lowering).

Note 17 in page 873 The result of such denasalizing does not give French oral vowels [e], [œ], [ɔ], [a], but some strange vowels that do not exist in French (nor probably in any language), for the organic positions of the four French nasals (and their formants 1 and 2) are not the same as those of any French orals. This can be shown by synthetic speech as well as by human speech.

Note 18 in page 875 This research was made possible by a University of Pennsylvania Faculty Research Grant to the author and by a Carnegie Corporation of New York Grant to the Haskins Laboratories, New York. We also wish to express our deepest appreciation to the staff members of the Haskins Laboratories and Bell Telephone Laboratories.
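
The figures in Notes 3 and 6 can be checked with a little arithmetic. The sketch below is only an illustration, not the author's procedure: it lists the harmonic channels of a Pattern Playback set as in Note 6 (120 cycles apart, 50 channels) and counts how many fall inside each formant of the example in Note 3 (centers at about 500, 1400, and 2400 cycles). Two things are assumptions made for the example and are not given in the notes: that the first channel sits at 120 cycles, and that a formant spans a band 300 cycles wide around its center. The function names are hypothetical.

```python
# Illustrative sketch only. The channel spacing and count come from Note 6;
# the formant centers come from Note 3. The 300-cycle band width and the
# placement of the first channel at 120 cycles are assumptions.

CHANNEL_SPACING = 120   # cycles between adjacent harmonic channels (Note 6)
CHANNEL_COUNT = 50      # number of harmonic channels (Note 6)

FORMANT_CENTERS = {1: 500, 2: 1400, 3: 2400}  # approximate centers (Note 3)
ASSUMED_WIDTH = 300     # assumed formant band width in cycles (not in the source)


def channel_frequencies(spacing=CHANNEL_SPACING, count=CHANNEL_COUNT):
    """Return the channel frequencies: spacing, 2 * spacing, ..., count * spacing."""
    return [spacing * k for k in range(1, count + 1)]


def harmonics_in_formant(center, width, spacing=CHANNEL_SPACING, count=CHANNEL_COUNT):
    """Return the channel frequencies lying within width / 2 of the formant center."""
    half = width / 2
    return [f for f in channel_frequencies(spacing, count) if abs(f - center) <= half]


if __name__ == "__main__":
    freqs = channel_frequencies()
    print("channels:", len(freqs), "top channel:", freqs[-1], "cycles")
    for number, center in FORMANT_CENTERS.items():
        inside = harmonics_in_formant(center, ASSUMED_WIDTH)
        print(f"formant {number} (about {center} cycles): harmonics {inside}")
```

Under these assumptions the top channel lands at 6000 cycles, matching the total range stated in Note 6, and each formant contains two or three harmonics, which agrees with the figure Note 3 gives for a male voice. A different assumed band width would change the counts.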
