Book contents
- Frontmatter
- Contents
- Foreword: Out of Sight, Out of Mind
- Preface
- Part one: New Interfaces and Novel Applications
- Part two: Tracking Human Action
- Part three: Gesture Recognition and Interpretation
- 11 A Framework for Gesture Generation and Interpretation
- 12 Model-Based Interpretation of Faces and Hand Gestures
- 13 Recognition of Hand Signs from Complex Backgrounds
- 14 Probabilistic Models of Verbal and Body Gestures
- 15 Looking at Human Gestures
- Acknowledgements
- Bibliography
- List of contributors
14 - Probabilistic Models of Verbal and Body Gestures
from Part three - Gesture Recognition and Interpretation
Published online by Cambridge University Press: 06 July 2010
Abstract
This chapter describes several probabilistic techniques for representing, recognizing, and generating spatiotemporal configuration sequences. We first describe how such techniques can be used to visually track and recognize lip movements in order to augment a speech recognition system. We then demonstrate additional techniques that can be used to animate video footage of talking faces and to synchronize it to different sentences of an audio track. Finally, we outline alternative low-level representations that are needed to apply these techniques to articulated body gestures.
Introduction
Gestures can be described as characteristic configurations over time. While uttering a sentence, we express fine-grained verbal gestures as complex lip configurations over time, and while performing bodily actions, we generate articulated configuration sequences of jointed arm and leg segments. Such configurations lie in constrained subspaces, and different gestures are embodied as different characteristic trajectories in these constrained subspaces.
We present a general technique called manifold learning, which estimates such constrained subspaces from example data. We apply this technique to tracking, recognition, and interpolation. Characteristic trajectories through such spaces are estimated using Hidden Markov Models. We demonstrate the utility of these techniques in the domain of visual-acoustic recognition of continuously spelled letters.
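The core idea can be illustrated with a minimal sketch: estimating a constrained linear subspace from example configuration vectors via PCA. This is only a linear special case of manifold learning, and the function names and toy data below are illustrative assumptions, not the chapter's actual implementation.

```python
import numpy as np

def fit_subspace(X, k):
    """Estimate the mean and top-k principal directions of the rows of X.

    A linear stand-in for manifold learning: the learned basis spans the
    constrained subspace in which the example configurations lie.
    """
    mean = X.mean(axis=0)
    # SVD of the centered data yields the principal directions in vt's rows.
    _, _, vt = np.linalg.svd(X - mean, full_matrices=False)
    return mean, vt[:k]

def project(x, mean, basis):
    """Project a configuration onto the learned subspace and reconstruct it."""
    coords = (x - mean) @ basis.T   # low-dimensional subspace coordinates
    return mean + coords @ basis    # reconstruction in the original space

# Toy example (hypothetical data): 2D "configurations" lying noisily on a line.
rng = np.random.default_rng(0)
t = rng.normal(size=200)
X = np.column_stack([t, 2.0 * t]) + 0.01 * rng.normal(size=(200, 2))

mean, basis = fit_subspace(X, k=1)
x_hat = project(X[0], mean, basis)  # close to X[0], since X is nearly 1D
```

A recognizer built on top of this would then model each gesture as a characteristic trajectory of such subspace coordinates over time, e.g. with one Hidden Markov Model per gesture class.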
We also show how visual-acoustic lip and facial feature models can be used for the inverse task: facial animation. For this domain we developed a modified tracking technique and a different lip interpolation technique, as well as a more general decomposition of visual speech units based on visemes.
- Type: Chapter
- Information: Computer Vision for Human-Machine Interaction, pp. 267-290
- Publisher: Cambridge University Press
- Print publication year: 1998
- Cited by: 4