Where is Female Synthetic Speech?

Caroline Henton

doi:10.1017/S0025100300006411

Published online by Cambridge University Press: 27 April 2009

Affiliation: fonix corporation, 180 West Election Road, Draper, UT 84020

Abstract

There is widespread, immediate and enduring demand for high quality, natural, intelligible synthetic female voices in the expanding speech technology industry. Yet synthetic female voices are scarce, both in parametric text-to-speech (TTS) systems and in concatenative ones. Current female synthetic speech largely lacks naturalness, pleasantness and tolerability. Some acoustic specifications of female voices that are relevant to synthesis are discussed in detail. Recent research pertaining to female voice quality is reported and a ranking of these various considerations is proposed. This paper reviews the present situation and considers why there is a paucity of female voice synthesis.

Type: Articles
Information: Journal of the International Phonetic Association, Volume 29, Issue 1, June 1999, pp. 51–61

DOI: https://doi.org/10.1017/S0025100300006411
Copyright: Copyright © Journal of the International Phonetic Association 1999
