Published online by Cambridge University Press: 22 March 2010
Most computational models of word segmentation are trained and tested on transcripts of speech rather than on the speech itself, and thus assume that speech has been converted into a sequence of symbols prior to word segmentation. We present a way of representing speech corpora that avoids this assumption and preserves the acoustic variation present in speech. We use this new representation to re-evaluate a key computational model of word segmentation. One finding is that high levels of phonetic variability degrade the model's performance. While robustness to phonetic variability may be intrinsically valuable, this finding needs to be complemented by parallel studies of the actual abilities of children to segment phonetically variable speech.
Portions of this research were supported by a National Science Foundation Graduate Research Fellowship awarded to the primary author while he was at the Ohio State University, and by NSF-ITR grant #0427413, awarded to Chin-Hui Lee, Mark Clements, Keith Johnson, Lawrence Rabiner and Eric Fosler-Lussier for the multi-university Automatic Speech Attribute Transcription (ASAT) project. Preliminary versions of parts of this work, in particular Simulation 1, appear in the primary author's (unpublished) dissertation.