Phonetics and Eye-Tracking

doi:10.1017/9781108644198.019

18 - Phonetics and Eye-Tracking

from Section IV - Audition and Perception

Published online by Cambridge University Press: 11 November 2021

Eva Reinisch and Holger Mitterer

Edited by Rachael-Anne Knight (City, University of London) and Jane Setter (University of Reading)

Summary

Eye-tracking has proven to be a fruitful method to investigate how listeners process spoken language in real time. This chapter highlights the contribution of eye-tracking to our understanding of various classical issues in phonetics about the uptake of segmental and suprasegmental information during speech processing, as well as the role of gaze during speech perception. The review introduces the visual-world paradigm and shows how variations of this paradigm can be used to investigate the timing of cue uptake, how speech processing is influenced by phonetic context, how word recognition is affected by connected-speech processes, the use of word-level prosody such as lexical stress, and the role of intonation for reference resolution and sentence comprehension. Importantly, since the eye-tracking record is continuous, it allows us to distinguish early perceptual processes from post-perceptual processes. The chapter also provides a brief note on the most important issues to be considered in teaching and using eye-tracking, including comments on data processing, data analysis and interpretation, as well as suggestions for how to implement eye-tracking experiments.

Keywords

eye-tracking; language comprehension; spoken word recognition; visual world paradigm; continuous information uptake; prosody; design issues

Type: Chapter
Information: The Cambridge Handbook of Phonetics, pp. 457–479

Publisher: Cambridge University Press

Print publication year: 2021
