7 - Binary Classification with Support Vector Machines
Published online by Cambridge University Press: 05 December 2012
Introduction
Support vector machines (SVMs) are currently among the most popular and accurate methods for binary data classification and prediction. They have been applied to a wide variety of data and settings, including cyber-security, bioinformatics, web search, medical risk assessment, and financial analysis, among other areas [1]. This type of machine learning has been shown to be accurate and to generalize predictions from previously learned patterns. However, current implementations are limited: they can be trained accurately only on examples numbering in the tens of thousands, and they usually run only on serial computers. There are exceptions, a prime example being the annual machine learning and classification competitions, such as those held at the International Conference on Artificial Neural Networks (ICANN), which present problems with more than 100,000 elements to be classified. To treat such large test cases, however, the formalism of support vector machines must be modified.
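As a concrete illustration of binary classification with an SVM (this sketch is not part of the original chapter; the library, synthetic dataset, and parameters are illustrative assumptions, not the authors' setup), the following minimal Python example trains a linear SVM with scikit-learn and evaluates it on held-out data:

from sklearn.datasets import make_classification
from sklearn.model_selection import train_test_split
from sklearn.svm import SVC

# Generate a small synthetic binary classification problem (illustrative only).
X, y = make_classification(n_samples=1000, n_features=20, random_state=0)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

# Train a support vector classifier and score it on previously unseen data,
# illustrating the generalization property discussed above.
clf = SVC(kernel="linear")
clf.fit(X_train, y_train)
print("test accuracy:", clf.score(X_test, y_test))

Training such a classifier on a few hundred or thousand examples is routine; it is at the scale of hundreds of thousands of examples, as in the competitions above, that the standard formulation becomes impractical.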
SVMs were first developed by Vapnik and collaborators [2] as an extension to neural networks. Assume that the data values associated with an entity can be converted into numerical values that form a vector in the mathematical sense; these vectors form a space. Assume further that this space of vectors can be separated by a hyperplane into the vectors belonging to one class and those belonging to the opposing class, as formalized in the sketch below.
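In standard SVM notation (a sketch consistent with, but not quoted from, the chapter), the separating hyperplane and the separability assumption can be written as

\[
w \cdot x + b = 0,
\qquad
y_i \,(w \cdot x_i + b) \ge 1 \quad \text{for all training pairs } (x_i, y_i),\ y_i \in \{-1, +1\},
\]

where \(w\) is the normal vector of the hyperplane and \(b\) its offset. The maximum-margin formulation of Vapnik then chooses \(w\) and \(b\) to maximize the margin \(2/\lVert w \rVert\) between the two classes, or equivalently to minimize \(\tfrac{1}{2}\lVert w \rVert^2\) subject to the constraints above.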
From: Data-Intensive Computing: Architectures, Algorithms, and Applications, pp. 157-179. Publisher: Cambridge University Press. Print publication year: 2012.