Book contents
- Frontmatter
- Contents
- List of contributors
- 1 Multimodal signal processing for meetings: an introduction
- 2 Data collection
- 3 Microphone arrays and beamforming
- 4 Speaker diarization
- 5 Speech recognition
- 6 Sampling techniques for audio-visual tracking and head pose estimation
- 7 Video processing and recognition
- 8 Language structure
- 9 Multimodal analysis of small-group conversational dynamics
- 10 Summarization
- 11 User requirements for meeting support technology
- 12 Meeting browsers and meeting assistants
- 13 Evaluation of meeting support technology
- 14 Conclusion and perspectives
- References
- Index
9 - Multimodal analysis of small-group conversational dynamics
Published online by Cambridge University Press: 05 July 2012
Summary
Introduction
The analysis of conversational dynamics in small groups, like the one illustrated in Figure 9.1, is a fundamental area in social psychology and nonverbal communication (Goodwin, 1981; Clark and Carlson, 1982). Conversational patterns exist at multiple time scales, ranging from knowing how and when to address or interrupt somebody, to gaining or holding the floor of a conversation, to making transitions in discussions. Most of these mechanisms are multimodal, involving multiple verbal and nonverbal cues for their display and interpretation (Knapp and Hall, 2006), and they have an important effect on how people are socially perceived, e.g., whether they are dominant, competent, or extraverted (Knapp and Hall, 2006; Pentland, 2008).
This chapter introduces some of the basic problems related to the automatic understanding of conversational group dynamics. Using low-level cues produced by audio, visual, and audio-visual perceptual processing components like the ones discussed in previous chapters, we present techniques that aim to answer questions such as: Who are the people being addressed or looked at? Are the people involved attentive? What conversational state is a group conversation currently in? Is a particular person likely to be perceived as dominant based on how they interact? As shown later in the book, answering these questions is very useful for inferring, through further analysis, even higher-level aspects of a group conversation and its participants.
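To illustrate how such questions connect to the low-level cues produced by the components of earlier chapters, the sketch below computes simple speaking-activity statistics from a diarization-style output and applies the common total-speaking-time heuristic as a rough proxy for perceived dominance. This is a minimal, hypothetical example, not the method presented in this chapter; the input format and the heuristic are assumptions made for illustration only.

```python
# Minimal sketch (illustrative, not the chapter's method): derive simple
# speaking-activity cues from a speaker-diarization output and use the
# classic "total speaking time" heuristic as a crude dominance proxy.
from collections import defaultdict

# Assumed input format: (speaker_id, start_seconds, end_seconds) segments,
# e.g. as produced by a diarization component (see Chapter 4).
segments = [
    ("A", 0.0, 12.5), ("B", 12.5, 14.0), ("A", 14.0, 30.0),
    ("C", 30.0, 45.0), ("B", 45.0, 47.5), ("C", 47.5, 60.0),
]

speaking_time = defaultdict(float)  # total seconds each person holds the floor
turn_count = defaultdict(int)       # number of speaking turns per person

for speaker, start, end in segments:
    speaking_time[speaker] += end - start
    turn_count[speaker] += 1

total = sum(speaking_time.values())
for speaker in sorted(speaking_time):
    share = speaking_time[speaker] / total
    print(f"{speaker}: {speaking_time[speaker]:.1f}s ({share:.0%}), "
          f"{turn_count[speaker]} turns")

# Crude proxy: the participant with the largest speaking-time share is
# hypothesized to be perceived as the most dominant.
most_dominant = max(speaking_time, key=speaking_time.get)
print("Most-speaking participant (dominance proxy):", most_dominant)
```

In practice, speaking-time and turn-count features of this kind are typically combined with visual cues such as gaze and attention rather than used in isolation.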
The chapter is organized as follows. Section 9.2 provides the basic definitions of three conversational phenomena discussed in this chapter: attention, turn-taking, and addressing.
- Type: Chapter
- In: Multimodal Signal Processing: Human Interactions in Meetings, pp. 155-169
- Publisher: Cambridge University Press
- Print publication year: 2012