Book contents
- Frontmatter
- Contents
- Foreword: Out of Sight, Out of Mind
- Preface
- Part one: New Interfaces and Novel Applications
- Part two: Tracking Human Action
- Part three: Gesture Recognition and Interpretation
- 11 A Framework for Gesture Generation and Interpretation
- 12 Model-Based Interpretation of Faces and Hand Gestures
- 13 Recognition of Hand Signs from Complex Backgrounds
- 14 Probabilistic Models of Verbal and Body Gestures
- 15 Looking at Human Gestures
- Acknowledgements
- Bibliography
- List of contributors
14 - Probabilistic Models of Verbal and Body Gestures
from Part three - Gesture Recognition and Interpretation
Published online by Cambridge University Press: 06 July 2010
Abstract
This chapter describes several probabilistic techniques for representing, recognizing, and generating spatiotemporal configuration sequences. We first describe how such techniques can be used to visually track and recognize lip movements in order to augment a speech recognition system. We then demonstrate additional techniques that can be used to animate video footage of talking faces and to synchronize it to different sentences of an audio track. Finally, we outline alternative low-level representations that are needed to apply these techniques to articulated body gestures.
Introduction
Gestures can be described as characteristic configurations over time. While uttering a sentence, we express fine-grained verbal gestures as complex lip configurations over time, and while performing bodily actions, we generate articulated configuration sequences of jointed arm and leg segments. Such configurations lie in constrained subspaces, and different gestures are embodied as different characteristic trajectories in these constrained subspaces.
We present a general technique called manifold learning, which estimates such constrained subspaces from example data. We apply this technique to tracking, recognition, and interpolation. Characteristic trajectories through such spaces are estimated using Hidden Markov Models. We demonstrate the utility of these techniques in the domain of visual-acoustic recognition of continuously spelled letters.
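The core idea can be illustrated with a minimal sketch: estimating a constrained linear subspace from example configuration vectors via PCA. This is only a linear special case of manifold learning, and the function names and toy data below are illustrative assumptions, not the chapter's actual implementation.

```python
import numpy as np

def fit_subspace(X, k):
    """Estimate the mean and top-k principal directions of the rows of X.

    A linear stand-in for manifold learning: the learned basis spans the
    constrained subspace in which the example configurations lie.
    """
    mean = X.mean(axis=0)
    # SVD of the centered data yields the principal directions in vt's rows.
    _, _, vt = np.linalg.svd(X - mean, full_matrices=False)
    return mean, vt[:k]

def project(x, mean, basis):
    """Project a configuration onto the learned subspace and reconstruct it."""
    coords = (x - mean) @ basis.T   # low-dimensional subspace coordinates
    return mean + coords @ basis    # reconstruction in the original space

# Toy example (hypothetical data): 2D "configurations" lying noisily on a line.
rng = np.random.default_rng(0)
t = rng.normal(size=200)
X = np.column_stack([t, 2.0 * t]) + 0.01 * rng.normal(size=(200, 2))

mean, basis = fit_subspace(X, k=1)
x_hat = project(X[0], mean, basis)  # close to X[0], since X is nearly 1D
```

A recognizer built on top of this would then model each gesture as a characteristic trajectory of such subspace coordinates over time, e.g. with one Hidden Markov Model per gesture class.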
We also show how visual-acoustic lip and facial feature models can be used for the inverse task: facial animation. For this domain we developed a modified tracking technique and a different lip interpolation technique, as well as a more general decomposition of visual speech units based on visemes.
- Type: Chapter
- Information: Computer Vision for Human-Machine Interaction, pp. 267-290
- Publisher: Cambridge University Press
- Print publication year: 1998
- Cited by: 4