Fundamental Frequency and Pitch

doi:10.1017/9781108644198.014

13 - Fundamental Frequency and Pitch

from Section III - Measuring Speech

Published online by Cambridge University Press: 11 November 2021

Daniel Hirst and Céline De Looze

Edited by Rachael-Anne Knight and Jane Setter

Rachael-Anne Knight (City, University of London)
Jane Setter (University of Reading)

Summary

Pitch, the subjective impression of whether individual speech sounds are perceived as relatively high or low, is an important characteristic of spoken language, contributing in some languages to the lexical identity of words and in all languages to the perception of the intonation pattern of utterances. Pitch corresponds to the physiological parameter of the frequency of vibration of the vocal folds, the fundamental frequency, which can be measured in cycles per second or hertz. Estimating and measuring fundamental frequency and modelling pitch are not easy. After presenting some automatic models of pitch, we address issues related to the detection and measurement of fundamental frequency, including tracking/detection errors, and explain how many of these errors can be avoided by an appropriate choice of pitch ceiling and floor settings. Finally, we discuss the use of acoustic scales (linear, logarithmic, psychoacoustic) for the measurement of pitch. Based on evidence from recent findings in neuroanatomy, neurophysiology, behavioural studies and speech production, we suggest that a new scale, the Octave-Median (OMe) scale, appears to be more natural for the study of speech prosody.
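To make the contrast between a linear scale (hertz) and a logarithmic scale concrete, the following is a minimal sketch rather than the chapter's own implementation. It assumes that an OMe value is the base-2 logarithm of f0 divided by the speaker's median f0, that is, octaves above or below the median. The function names, the 100 Hz semitone reference and the convention that 0 marks an unvoiced frame are all illustrative assumptions.

```python
import math
from statistics import median


def hz_to_semitones(f0_hz, ref_hz=100.0):
    """Logarithmic scale: semitones relative to a reference frequency.

    The 100 Hz reference is an arbitrary choice made for this sketch.
    """
    return 12.0 * math.log2(f0_hz / ref_hz)


def hz_to_ome(f0_values_hz):
    """Octaves relative to the median of the voiced f0 values.

    Assumed definition: OMe = log2(f0 / median f0).
    Frames with f0 <= 0 are treated as unvoiced (an assumption of this
    sketch) and are returned as None.
    """
    voiced = [f for f in f0_values_hz if f > 0]
    if not voiced:
        raise ValueError("no voiced frames in the f0 track")
    speaker_median = median(voiced)
    return [math.log2(f / speaker_median) if f > 0 else None
            for f in f0_values_hz]


# Hypothetical f0 track in hertz; 0.0 marks unvoiced frames.
f0_track = [0.0, 110.0, 220.0, 440.0, 0.0]
ome_track = hz_to_ome(f0_track)
```

On a scale of this kind, the speaker's median maps to 0 and a doubling or halving of f0 maps to +1 or -1, so values from speakers with different median pitch become directly comparable.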

Keywords

pitch; fundamental frequency (f0); automatic pitch detection; pitch measurement; pitch scales; micromelodic effects; Momel algorithm; chromatic repetition; Octave Median scale (OMe)

Type: Chapter
Information: The Cambridge Handbook of Phonetics, pp. 336–361

DOI: https://doi.org/10.1017/9781108644198.014

Publisher: Cambridge University Press

Print publication year: 2021
