Large-Scale Learning for Vision with GPUs

doi:10.1017/CBO9781139042918.019

18 - Large-Scale Learning for Vision with GPUs

from Part Four - Applications

Published online by Cambridge University Press: 05 February 2012

By Adam Coates (Stanford University), Rajat Raina (Facebook Inc., Palo Alto, CA, USA), and Andrew Y. Ng (Stanford University)

Edited by Ron Bekkerman (LinkedIn Corporation, Mountain View, California), Mikhail Bilenko (Microsoft Research, Redmond, Washington), and John Langford (Yahoo! Research, New York)

Summary

Computer vision is a challenging application area for learning algorithms. For instance, the task of object detection is a critical problem for many systems, like mobile robots, that remains largely unsolved. In order to interact with the world, robots must be able to locate and recognize large numbers of objects accurately and at reasonable speeds. Unfortunately, off-the-shelf computer vision algorithms do not yet achieve sufficiently high detection performance for these applications. A key difficulty with many existing algorithms is that they are unable to take advantage of large numbers of examples. As a result, they must rely heavily on prior knowledge and hand-engineered features that account for the many kinds of errors that can occur. In this chapter, we present two methods for improving performance by scaling up learning algorithms to large datasets: (1) using graphics processing units (GPUs) and distributed systems to scale up the standard components of computer vision algorithms and (2) using GPUs to automatically learn high-quality feature representations using deep belief networks (DBNs). These methods are capable of not only achieving high performance but also removing much of the need for hand-engineering common in computer vision algorithms.

The fragility of many vision algorithms comes from their lack of knowledge about the multitude of visual phenomena that occur in the real world. Whereas humans can intuit information about depth, occlusion, lighting, and even motion from still images, computer vision algorithms generally lack the ability to deal with these phenomena without being engineered to account for them in advance.

Type: Chapter
Information: Scaling up Machine Learning: Parallel and Distributed Approaches, pp. 373–398

DOI: https://doi.org/10.1017/CBO9781139042918.019

Publisher: Cambridge University Press

Print publication year: 2011
