Segmenting words from natural speech: subsegmental variation in segmental cues*

Published online by Cambridge University Press: 22 March 2010

C. ANTON RYTTING*
Affiliation:
University of Maryland Center for Advanced Study of Language (CASL) and Department of Linguistics, the Ohio State University
CHRIS BREW
Affiliation:
Department of Computer Science and Engineering and Department of Linguistics, the Ohio State University
ERIC FOSLER-LUSSIER
Affiliation:
Department of Computer Science and Engineering and Department of Linguistics, the Ohio State University
* Address for correspondence: C. A. Rytting, 7005 52nd Avenue, College Park, MD 20742, USA. tel: +1 (301) 226-8883. e-mail: [email protected]

Abstract

Most computational models of word segmentation are trained and tested on transcripts of speech, rather than the speech itself, and assume that speech is converted into a sequence of symbols prior to word segmentation. We present a way of representing speech corpora that avoids this assumption, and preserves acoustic variation present in speech. We use this new representation to re-evaluate a key computational model of word segmentation. One finding is that high levels of phonetic variability degrade the model's performance. While robustness to phonetic variability may be intrinsically valuable, this finding needs to be complemented by parallel studies of the actual abilities of children to segment phonetically variable speech.

Type
Articles
Copyright
Copyright © Cambridge University Press 2010

Footnotes

[*]

Portions of this research were conducted with the monetary support of a National Science Foundation Graduate Research Fellowship awarded to the primary author while he was at the Ohio State University, and of NSF-ITR grant #0427413, granted to Chin-Hui Lee, Mark Clements, Keith Johnson, Lawrence Rabiner and Eric Fosler-Lussier for the multi-university Automatic Speech Attribute Transcription (ASAT) project. Preliminary versions of parts of this work, in particular Simulation 1, appear in the primary author's (unpublished) dissertation.

References

Aslin, R. N., Woodward, J. Z., LaMendola, N. P. & Bever, T. G. (1996). Models of word segmentation in fluent maternal speech to infants. In Demuth, K. & Morgan, J. L. (eds), Signal to syntax: Bootstrapping from speech to grammar in early acquisition, 117–34. Mahwah, NJ: Lawrence Erlbaum Associates.
Batchelder, E. O. (2002). Bootstrapping the lexicon: A computational model of infant speech segmentation. Cognition 83, 167–206.
Blank, D., Kumar, D., Meeden, L. & Yanco, H. (2003). Pyro: A Python-based versatile programming environment for teaching robotics. Journal of Educational Resources in Computing 3, 1–15.
Brent, M. R. (1999). An efficient, probabilistically sound algorithm for segmentation and word discovery. Machine Learning 34, 71–105.
Brent, M. R. & Siskind, J. M. (2001). The role of exposure to isolated words in early vocabulary development. Cognition 81, 31–44.
Cairns, P., Shillcock, R., Chater, N. & Levy, J. (1997). Bootstrapping word boundaries: A bottom-up corpus based approach to speech segmentation. Cognitive Psychology 33, 111–53.
Carterette, E. C. & Jones, M. H. (1974). Informal speech: Alphabetic and phonemic texts with statistical analyses and tables. Berkeley, CA: University of California Press.
Cho, T. & Keating, P. A. (2007). Effects of initial position versus prominence in English. UCLA Working Papers in Phonetics 106, 1–33.
Christiansen, M. H. & Allen, J. (1997). Coping with variation in speech segmentation. In Sorace, A., Heycock, C. & Shillcock, R. (eds), Proceedings of the GALA ’97 conference on language acquisition: Knowledge representation and processing, 327–32. Edinburgh: Edinburgh University Press.
Christiansen, M. H., Allen, J. & Seidenberg, M. (1998). Learning to segment speech using multiple cues: A connectionist model. Language and Cognitive Processes 13(2/3), 221–68.
Christiansen, M. H., Conway, C. M. & Curtin, S. (2005). Multiple-cue integration in language acquisition: A connectionist model of speech segmentation and rule-like behavior. In Minett, J. W. & Wang, W. S. (eds), Language acquisition, change and emergence: Essays in evolutionary linguistics, 205–249. Hong Kong: City University of Hong Kong Press.
CMU (1993). The Carnegie Mellon pronouncing dictionary, version 0.6. Pittsburgh, PA: Carnegie Mellon University. Retrieved from www.speech.cs.cmu.edu/cgi-bin/cmudict.
Fernald, A. (1985). Four-month-old infants prefer to listen to motherese. Infant Behavior and Development 8, 181–95.
Fleck, M. M. (2008). Lexicalized phonotactic word segmentation. In Proceedings of the 46th annual meeting of the Association for Computational Linguistics: Human language technologies, 130–38. Presented at the ACL’08, Columbus, OH: ACL.
Fosler-Lussier, E., Greenberg, S. & Morgan, N. (1999). Incorporating contextual phonetics into automatic speech recognition. In Ohala, John J., Hasegawa, Yoko, Ohala, Manjari, Granville, Daniel & Bailey, Ashlee C. (eds), Proceedings of the International Congress of Phonetic Sciences, 611–14, San Francisco. Berkeley, CA: University of California, Berkeley.
Fougeron, C. & Keating, P. A. (1997). Articulatory strengthening at edges of prosodic domains. The Journal of the Acoustical Society of America 101, 3728–40.
Frank, M. C., Goldwater, S., Mansinghka, V., Griffiths, T. L. & Tenenbaum, J. (2007). Modeling human performance in statistical word segmentation. In McNamara, D. S. & Trafton, J. G. (eds), Proceedings of the 29th Annual Meeting of the Cognitive Science Society, 281–86. Austin, TX: Cognitive Science Society.
Garofolo, J. S., Lamel, L. F., Fisher, W. M., Fiscus, J. G., Pallett, D. S. & Dahlgren, N. L. (1993). DARPA TIMIT acoustic phonetic continuous speech corpus CD-ROM. Available from www.ldc.upenn.edu/Catalog/CatalogEntry.jsp?catalogId=LDC93S1
Goldwater, S., Griffiths, T. L. & Johnson, M. (2009). A Bayesian framework for word segmentation: Exploring the effects of context. Cognition 112(1), 21–54.
Halberstadt, A. K. & Glass, J. R. (1997). Heterogeneous acoustic measurements for phonetic classification. In Proceedings of Eurospeech ’97, 401–404. Rhodes: European Speech Communication Association.
Johnson, E. K. & Jusczyk, P. W. (2001). Word segmentation by 8-month-olds: When speech cues count more than statistics. Journal of Memory and Language 44(4), 548–67.
de Jong, K. (1995). The supraglottal articulation of prominence in English: Linguistic stress as localized hyperarticulation. Journal of the Acoustical Society of America 91, 491–504.
Korman, M. (1984). Adaptive aspects of maternal vocalizations in differing contexts at ten weeks. First Language 5, 44–45.
Krull, D. (1990). Relating acoustic properties to perceptual responses: A study of Swedish voiced stops. The Journal of the Acoustical Society of America 88, 2557–70.
Liu, Y. (2004). Structural event detection for rich transcription of speech. Unpublished doctoral dissertation, Purdue University.
MacWhinney, B. (2000). The CHILDES project: Tools for analyzing talk. Mahwah, NJ: Erlbaum.
de Marcken, C. G. (1996). Unsupervised language acquisition. Unpublished doctoral dissertation, Massachusetts Institute of Technology.
McMurray, B. & Aslin, R. N. (2005). Infants are sensitive to within-category variation in speech perception. Cognition 95(2), B15–B26.
Narayan, C. R., Werker, J. F. & Beddor, P. S. (in press). The interaction between acoustic salience and language experience in developmental speech perception: Evidence from nasal place discrimination. Developmental Science.
Newman, R. S. (2005). The cocktail party effect in infants revisited: Listening to one's name in noise. Developmental Psychology 41, 352–62.
Newman, R. S., Bernstein Ratner, N., Jusczyk, A. M., Jusczyk, P. W. & Dow, K. A. (2006). Infants’ early ability to segment the conversational speech signal predicts later language development: A retrospective analysis. Developmental Psychology 42, 643–55.
Pitt, M. A., Johnson, K., Hume, E., Kiesling, S. & Raymond, W. (2005). The Buckeye corpus of conversational speech: Labeling conventions and a test of transcriber reliability. Speech Communication 45, 89–95.
Polka, L. & Rvachew, S. (2005). The impact of otitis media with effusion on infant phonetic perception. Infancy 8, 101–117.
Redford, M. A. & Diehl, R. L. (1999). The relative perceptual distinctiveness of initial and final consonants in CVC syllables. The Journal of the Acoustical Society of America 106, 1555.
Roy, D. & Pentland, A. (2002). Learning words from sights and sounds: A computational model. Cognitive Science 26(1), 113–46.
Rytting, C. A. (2007). Preserving Subsegmental Variation in Modeling Word Segmentation, or The Raising of Baby Mondegreen. Unpublished doctoral dissertation, The Ohio State University.
Scharenborg, O., Norris, D., ten Bosch, L. & McQueen, J. M. (2005). How should a speech recognizer work? Cognitive Science 29(6), 867–918.
Thiessen, E. D. & Saffran, J. R. (2004). Spectral tilt as a cue to word segmentation in infancy and adulthood. Perception and Psychophysics 66, 779–91.
Werker, J. & Tees, R. C. (1984). Cross-language speech perception: Evidence for perceptual reorganization during the first year of life. Infant Behavior and Development 7, 49–63.
Young, S., Evermann, G., Kershaw, D., Moore, G., Odell, J., Ollason, D., et al. (2002). The HTK Book. Cambridge: Cambridge University Engineering Department.