Segmenting words from natural speech: subsegmental variation in segmental cues*

C. ANTON RYTTING; CHRIS BREW; ERIC FOSLER-LUSSIER

doi:10.1017/S0305000910000085

Published online by Cambridge University Press: 22 March 2010

C. ANTON RYTTING*: University of Maryland Center for Advanced Study of Language (CASL) and Department of Linguistics, the Ohio State University
CHRIS BREW: Department of Computer Science and Engineering and Department of Linguistics, the Ohio State University
ERIC FOSLER-LUSSIER: Department of Computer Science and Engineering and Department of Linguistics, the Ohio State University
*Address for correspondence: C. A. Rytting, 7005 52nd Avenue, College Park, MD 20742, USA. tel: +1 (301) 226-8883.

Abstract

Most computational models of word segmentation are trained and tested on transcripts of speech, rather than the speech itself, and assume that speech is converted into a sequence of symbols prior to word segmentation. We present a way of representing speech corpora that avoids this assumption and preserves the acoustic variation present in speech. We use this new representation to re-evaluate a key computational model of word segmentation. One finding is that high levels of phonetic variability degrade the model's performance. While robustness to phonetic variability may be intrinsically valuable, this finding needs to be complemented by parallel studies of children's actual ability to segment phonetically variable speech.
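The contrast drawn above, between a transcript (a sequence of discrete symbols) and a representation that keeps acoustic variation, can be illustrated with a toy Python sketch. It is not the authors' representation or model: the eight-phone inventory is hypothetical, "variability" is modelled simply as a mixing weight between a one-hot vector and a random distribution over phones, and the function names `symbolic`, `graded`, `mean_entropy` and `best_guess` are illustrative only.

```python
import math
import random

# Hypothetical toy phone inventory; a real corpus would use a full phone set.
PHONES = ["b", "d", "g", "p", "t", "k", "a", "i"]


def symbolic(transcript):
    """Transcript-style input: each phone becomes a one-hot vector,
    so any acoustic variation in the token is discarded."""
    return [[1.0 if p == phone else 0.0 for p in PHONES] for phone in transcript]


def graded(transcript, variability, rng):
    """Hypothetical variation-preserving input: each segment is a probability
    distribution over PHONES. `variability` in [0, 1] sets how much mass is
    moved from the canonical phone onto a random distribution (0 = one-hot)."""
    rows = []
    for one_hot in symbolic(transcript):
        # Uniform Dirichlet draw: unit-rate exponential samples, normalized.
        draws = [rng.expovariate(1.0) for _ in PHONES]
        total = sum(draws)
        rows.append([(1 - variability) * h + variability * d / total
                     for h, d in zip(one_hot, draws)])
    return rows


def mean_entropy(rows):
    """Average per-segment entropy in bits: 0 for one-hot input,
    higher when segments are more phonetically ambiguous."""
    return sum(-sum(q * math.log2(q) for q in row if q > 0) for row in rows) / len(rows)


def best_guess(rows):
    """Collapse each distribution back to its most probable phone symbol."""
    return [PHONES[max(range(len(row)), key=row.__getitem__)] for row in rows]


if __name__ == "__main__":
    rng = random.Random(0)
    word = ["b", "a", "d", "i"]
    for v in (0.0, 0.3, 0.6):
        rows = graded(word, v, rng)
        print(v, round(mean_entropy(rows), 2), best_guess(rows))
```

Raising `variability` tends to raise the average per-segment entropy. While the weight stays below 0.5, the most probable phone always matches the transcript, so a symbol-only encoding of that input would look identical even though the graded input has become more ambiguous; above that weight, `best_guess` can start returning a different phone.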

Type: Articles
Information: Journal of Child Language, Volume 37, Special Issue 3: Computational models of child language learning, June 2010, pp. 513–543

Copyright: Copyright © Cambridge University Press 2010

Footnotes

[*]

Portions of this research were supported by a National Science Foundation Graduate Research Fellowship awarded to the primary author while he was at the Ohio State University, and by NSF-ITR grant #0427413, awarded to Chin-Hui Lee, Mark Clements, Keith Johnson, Lawrence Rabiner and Eric Fosler-Lussier for the multi-university Automatic Speech Attribute Transcription (ASAT) project. Preliminary versions of parts of this work, in particular Simulation 1, appear in the primary author's unpublished dissertation (Rytting, 2007).

References

Aslin, R. N., Woodward, J. Z., LaMendola, N. P. & Bever, T. G. (1996). Models of word segmentation in fluent maternal speech to infants. In Demuth, K. & Morgan, J. L. (eds), Signal to syntax: Bootstrapping from speech to grammar in early acquisition, 117–34. Mahwah, NJ: Lawrence Erlbaum Associates.

Batchelder, E. O. (2002). Bootstrapping the lexicon: A computational model of infant speech segmentation. Cognition 83, 167–206.

Blank, D., Kumar, D., Meeden, L. & Yanco, H. (2003). Pyro: A Python-based versatile programming environment for teaching robotics. Journal of Educational Resources in Computing 3, 1–15.

Brent, M. R. (1999). An efficient, probabilistically sound algorithm for segmentation and word discovery. Machine Learning 34, 71–105.

Brent, M. R. & Siskind, J. M. (2001). The role of exposure to isolated words in early vocabulary development. Cognition 81, 31–44.

Cairns, P., Shillcock, R., Chater, N. & Levy, J. (1997). Bootstrapping word boundaries: A bottom-up corpus based approach to speech segmentation. Cognitive Psychology 33, 111–53.

Carterette, E. C. & Jones, M. H. (1974). Informal speech: Alphabetic and phonemic texts with statistical analyses and tables. Berkeley, CA: University of California Press.

Cho, T. & Keating, P. A. (2007). Effects of initial position versus prominence in English. UCLA Working Papers in Phonetics 106, 1–33.

Christiansen, M. H. & Allen, J. (1997). Coping with variation in speech segmentation. In Sorace, A., Heycock, C. & Shillcock, R. (eds), Proceedings of the GALA ’97 conference on language acquisition: Knowledge representation and processing, 327–32. Edinburgh: Edinburgh University Press.

Christiansen, M. H., Allen, J. & Seidenberg, M. (1998). Learning to segment speech using multiple cues: A connectionist model. Language and Cognitive Processes 13(2/3), 221–68.

Christiansen, M. H., Conway, C. M. & Curtin, S. (2005). Multiple-cue integration in language acquisition: A connectionist model of speech segmentation and rule-like behavior. In Minett, J. W. & Wang, W. S. (eds), Language acquisition, change and emergence: Essays in evolutionary linguistics, 205–49. Hong Kong: City University of Hong Kong Press.

CMU (1993). The Carnegie Mellon pronouncing dictionary, version 0.6. Pittsburgh, PA: Carnegie Mellon University. Retrieved from www.speech.cs.cmu.edu/cgi-bin/cmudict.

Fernald, A. (1985). Four-month-old infants prefer to listen to motherese. Infant Behavior and Development 8, 181–95.

Fleck, M. M. (2008). Lexicalized phonotactic word segmentation. In Proceedings of the 46th annual meeting of the Association for Computational Linguistics: Human language technologies, 130–38. Presented at ACL’08, Columbus, OH: ACL.

Fosler-Lussier, E., Greenberg, S. & Morgan, N. (1999). Incorporating contextual phonetics into automatic speech recognition. In Ohala, J. J., Hasegawa, Y., Ohala, M., Granville, D. & Bailey, A. C. (eds), Proceedings of the International Congress of Phonetic Sciences, San Francisco, 611–14. Berkeley, CA: University of California, Berkeley.

Fougeron, C. & Keating, P. A. (1997). Articulatory strengthening at edges of prosodic domains. Journal of the Acoustical Society of America 101, 3728–40.

Frank, M. C., Goldwater, S., Mansinghka, V., Griffiths, T. L. & Tenenbaum, J. (2007). Modeling human performance in statistical word segmentation. In McNamara, D. S. & Trafton, J. G. (eds), Proceedings of the 29th Annual Meeting of the Cognitive Science Society, 281–86. Austin, TX: Cognitive Science Society.

Garofolo, J. S., Lamel, L. F., Fisher, W. M., Fiscus, J. G., Pallett, D. S. & Dahlgren, N. L. (1993). DARPA TIMIT acoustic phonetic continuous speech corpus CD-ROM. Available from www.ldc.upenn.edu/Catalog/CatalogEntry.jsp?catalogId=LDC93S1.

Goldwater, S., Griffiths, T. L. & Johnson, M. (2009). A Bayesian framework for word segmentation: Exploring the effects of context. Cognition 112(1), 21–54.

Halberstadt, A. K. & Glass, J. R. (1997). Heterogeneous acoustic measurements for phonetic classification. In Proceedings of Eurospeech ’97, 401–4. Rhodes: European Speech Communication Association.

Johnson, E. K. & Jusczyk, P. W. (2001). Word segmentation by 8-month-olds: When speech cues count more than statistics. Journal of Memory and Language 44(4), 548–67.

de Jong, K. (1995). The supraglottal articulation of prominence in English: Linguistic stress as localized hyperarticulation. Journal of the Acoustical Society of America 91, 491–504.

Korman, M. (1984). Adaptive aspects of maternal vocalizations in differing contexts at ten weeks. First Language 5, 44–45.

Krull, D. (1990). Relating acoustic properties to perceptual responses: A study of Swedish voiced stops. Journal of the Acoustical Society of America 88, 2557–70.

Liu, Y. (2004). Structural event detection for rich transcription of speech. Unpublished doctoral dissertation, Purdue University.

MacWhinney, B. (2000). The CHILDES project: Tools for analyzing talk. Mahwah, NJ: Erlbaum.

de Marcken, C. G. (1996). Unsupervised language acquisition. Unpublished doctoral dissertation, Massachusetts Institute of Technology.

McMurray, B. & Aslin, R. N. (2005). Infants are sensitive to within-category variation in speech perception. Cognition 95(2), B15–B26.

Narayan, C. R., Werker, J. F. & Beddor, P. S. (in press). The interaction between acoustic salience and language experience in developmental speech perception: Evidence from nasal place discrimination. Developmental Science.

Newman, R. S. (2005). The cocktail party effect in infants revisited: Listening to one's name in noise. Developmental Psychology 41, 352–62.

Newman, R. S., Bernstein Ratner, N., Jusczyk, A. M., Jusczyk, P. W. & Dow, K. A. (2006). Infants’ early ability to segment the conversational speech signal predicts later language development: A retrospective analysis. Developmental Psychology 42, 643–55.

Pitt, M. A., Johnson, K., Hume, E., Kiesling, S. & Raymond, W. (2005). The Buckeye corpus of conversational speech: Labeling conventions and a test of transcriber reliability. Speech Communication 45, 89–95.

Polka, L. & Rvachew, S. (2005). The impact of otitis media with effusion on infant phonetic perception. Infancy 8, 101–17.

Redford, M. A. & Diehl, R. L. (1999). The relative perceptual distinctiveness of initial and final consonants in CVC syllables. Journal of the Acoustical Society of America 106, 1555.

Roy, D. & Pentland, A. (2002). Learning words from sights and sounds: A computational model. Cognitive Science 26(1), 113–46.

Rytting, C. A. (2007). Preserving Subsegmental Variation in Modeling Word Segmentation, or The Raising of Baby Mondegreen. Unpublished doctoral dissertation, The Ohio State University.

Scharenborg, O., Norris, D., ten Bosch, L. & McQueen, J. M. (2005). How should a speech recognizer work? Cognitive Science 29(6), 867–918.

Thiessen, E. D. & Saffran, J. R. (2004). Spectral tilt as a cue to word segmentation in infancy and adulthood. Perception and Psychophysics 66, 779–91.

Werker, J. & Tees, R. C. (1984). Cross-language speech perception: Evidence for perceptual reorganization during the first year of life. Infant Behavior and Development 7, 49–63.

Young, S., Evermann, G., Kershaw, D., Moore, G., Odell, J., Ollason, D., et al. (2002). The HTK Book. Cambridge: Cambridge University Engineering Department.
