Book contents
- Frontmatter
- Contents
- Acknowledgements
- 1 Introduction
- 2 Corner extraction and tracking
- 3 The affine camera and affine structure
- 4 Clustering using maximum affinity spanning trees
- 5 Affine epipolar geometry
- 6 Outlier rejection in an orthogonal regression framework
- 7 Rigid motion from affine epipolar geometry
- 8 Affine transfer
- 9 Conclusions
- A Clustering proofs
- B Proofs for epipolar geometry minimisation
- C Proofs for outlier rejection
- D Rotation matrices
- E KvD motion equations
- Bibliography
- Index
1 - Introduction
Published online by Cambridge University Press: 18 December 2009
Summary
Motivation
Sight is the sense that provides the highest information content – in engineering terms, the highest bandwidth – to the human brain. A computer vision system, essentially a “TV camera connected to a computer”, aims to perform on a machine the tasks which our own visual system seems to perform so effortlessly. Since the world is constantly in motion, it comes as no surprise that time-varying imagery reveals valuable information about the environment. Indeed, some information is easier to obtain from an image sequence than from a single image [62]. Thus, as noted by Murray and Buxton, “understanding motion is a principal requirement for a machine or organism to interact meaningfully with its environment” [100] (page 1). For this reason, the analysis of image sequences to extract 3D motion and structure has been at the heart of computer vision research for the past decade [172].
The problem involves two key difficulties. First, the useful content of an image sequence is intricately coded and implicit in an enormous volume of sensory data. Making this information explicit entails significant data reduction, to decode the spatio-temporal correlations of the intensity values and eliminate redundancy. Second, information is lost in projecting the three spatial dimensions of the world onto the two dimensions of the image. Assumptions about the camera model and imaging geometry are therefore required.
This thesis develops new algorithms to interpret visual motion using a single camera, and demonstrates the practical feasibility of recovering scene structure and motion in a data-driven (or “bottom-up”) fashion. Section 1.2 outlines the basic themes and describes the system architecture.
- Type: Chapter
- Information: Affine Analysis of Image Sequences, pp. 1-8
- Publisher: Cambridge University Press
- Print publication year: 1995