Book contents
- The Cambridge Handbook of Phonetics
- Cambridge Handbooks in Language and Linguistics
- The Cambridge Handbook of Phonetics
- Copyright page
- Contents
- Figures
- Tables
- Contributors
- Introduction
- Section I Segmental Production
- Section II Prosodic Production
- Section III Measuring Speech
- Section IV Audition and Perception
- 16 Neurological Foundations of Phonetic Sciences
- 17 Psycholinguistic Aspects
- 18 Phonetics and Eye-Tracking
- 19 Automatic Speech Recognition by Machines
- Section V Applications of Phonetics
- Index
- References
19 - Automatic Speech Recognition by Machines
from Section IV - Audition and Perception
Published online by Cambridge University Press: 11 November 2021
Summary
Building machines that converse with humans through automatic speech recognition (ASR) and automatic speech understanding (ASU) has long been of great interest to scientists and engineers, and the field has recently seen rapid technological advances. Here, we first cast the ASR problem in a pattern-matching and channel-decoding paradigm. We then discuss the Hidden Markov Model (HMM), the most successful technique for modelling fundamental speech units such as phones and words, which allows ASR to be solved as a search through a top-down decoding network. Recent advances that use deep neural networks as components of an ASR system are also highlighted. We then compare the conventional top-down decoding approach with the recently proposed automatic speech attribute transcription (ASAT) paradigm, which can better leverage knowledge sources in speech production, auditory perception and language theory through bottom-up integration. Finally, we discuss how the processing-based speech engineering and knowledge-based speech science communities can collaborate to improve our understanding of speech and enhance ASR capabilities.
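As a rough illustration of what "ASR as a search" over an HMM can look like, the sketch below runs Viterbi decoding on a toy two-state model. In the channel-decoding view, a recognizer is usually described as picking the hypothesis that maximizes the product of an acoustic likelihood and a prior; here that reduces to finding the single most likely state path. Everything specific in it is an assumption made for illustration and is not taken from the chapter: the state names (`sil`, `speech`), the observation symbols (`low`, `high`), and all probabilities.

```python
import math

def viterbi(obs, states, log_start, log_trans, log_emit):
    """Return the most likely state path and its log-probability."""
    # best[t][s]: log-probability of the best path that ends in state s at time t
    best = [{s: log_start[s] + log_emit[s][obs[0]] for s in states}]
    back = []  # back[t-1][s]: best predecessor of state s at time t
    for o in obs[1:]:
        scores, pointers = {}, {}
        for s in states:
            prev = max(states, key=lambda p: best[-1][p] + log_trans[p][s])
            scores[s] = best[-1][prev] + log_trans[prev][s] + log_emit[s][o]
            pointers[s] = prev
        best.append(scores)
        back.append(pointers)
    # Trace back from the best final state to recover the full path.
    last = max(states, key=lambda s: best[-1][s])
    path = [last]
    for pointers in reversed(back):
        path.append(pointers[path[-1]])
    path.reverse()
    return path, best[-1][last]

def to_log(table):
    """Convert a (possibly nested) dict of probabilities to log-probabilities."""
    return {k: to_log(v) if isinstance(v, dict) else math.log(v)
            for k, v in table.items()}

# Hypothetical toy model: two states emitting coarse frame-energy symbols.
states = ["sil", "speech"]
log_start = to_log({"sil": 0.8, "speech": 0.2})
log_trans = to_log({"sil": {"sil": 0.7, "speech": 0.3},
                    "speech": {"sil": 0.2, "speech": 0.8}})
log_emit = to_log({"sil": {"low": 0.9, "high": 0.1},
                   "speech": {"low": 0.2, "high": 0.8}})

path, log_prob = viterbi(["low", "high", "high", "low"],
                         states, log_start, log_trans, log_emit)
print(path, math.exp(log_prob))
```

Working in log-probabilities avoids numerical underflow on long utterances. A real recognizer would replace the discrete emission table with acoustic model scores and search a much larger network of phone and word models, but the dynamic-programming recursion has the same shape.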
Type: Chapter
Information: The Cambridge Handbook of Phonetics, pp. 480-500
Publisher: Cambridge University Press
Print publication year: 2021