Binary separation and training support vector machines*

Roger Fletcher; Gaetano Zanghirati

doi:10.1017/S0962492910000024

Binary separation and training support vector machines*

Published online by Cambridge University Press: 10 May 2010

Roger Fletcher and

Gaetano Zanghirati

Show author details

Roger Fletcher: Affiliation:
Department of Mathematics, University of Dundee, Dundee DD1 4HN, UK, E-mail: [email protected]
Gaetano Zanghirati: Affiliation:
Department of Mathematics and Math4Tech Center, University of Ferrara, 44100 Ferrara, Italy, E-mail: [email protected]

Article contents

Extract
References

Get access

Rights & Permissions

Extract

We introduce basic ideas of binary separation by a linear hyperplane, which is a technique exploited in the support vector machine (SVM) concept. This is a decision-making tool for pattern recognition and related problems. We describe a fundamental standard problem (SP) and show how this is used in most existing research to develop a dual-based algorithm for its solution. This algorithm is shown to be deficient in certain aspects, and we develop a new primal-based SQP-like algorithm, which has some interesting features. Most practical SVM problems are not adequately handled by a linear hyperplane. We describe the nonlinear SVM technique, which enables a nonlinear separating surface to be computed, and we propose a new primal algorithm based on the use of low-rank Cholesky factors.

It may be, however, that exact separation is not desirable due to the presence of uncertain or mislabelled data. Dealing with this situation is the main challenge in developing suitable algorithms. Existing dual-based algorithms use the idea of L1 penalties, which has merit. We suggest how penalties can be incorporated into a primal-based algorithm. Another aspect of practical SVM problems is often the huge size of the data set, which poses severe challenges both for software package development and for control of ill-conditioning. We illustrate some of these issues with numerical experiments on a range of problems.

Type: Research Article
Information: Acta Numerica , Volume 19 , May 2010 , pp. 121 - 158

DOI: https://doi.org/10.1017/S0962492910000024 [Opens in a new window]
Copyright: Copyright © Cambridge University Press 2010

Access options

Get access to the full version of this content by using one of the access options below. (Log in options will check for institutional or personal access. Content may require purchase if you do not have access.)

References

REFERENCES

Agarwal, S. and Niyogi, P. (2009), ‘Generalization bounds for ranking algorithms via algorithmic stability’, J. Mach. Learn. Res. 10, 441–474.Google Scholar

Bennett, K. P. and Parrado-Hernández, E. (2006), ‘The interplay of optimization and machine learning research’, J. Mach. Learn. Res. 7, 1265–1281.Google Scholar

Bordes, A., Ertekin, S., Weston, J. and Bottou, L. (2005), ‘Fast kernel classifiers with online and active learning’, J. Mach. Learn. Res. 6, 1579–1619.Google Scholar

Boser, B. E., Guyon, I. and Vapnik, V. N. (1992), A training algorithm for optimal margin classifiers. In Proc. 5th Annual ACM Workshop on Computational Learning Theory (Haussler, D., ed.), ACM Press, Pittsburgh, pp. 144–152.Google Scholar

Bottou, L. and Lin, C.-J. (2007), Support vector machine solvers. In Large Scale Kernel Machines (Bottou, L., Chapelle, O., DeCoste, D. and Weston, J., eds), The MIT Press, pp. 301–320.CrossRef Google Scholar

Burges, C. J. and Schölkopf, B. (1997), Improving the accuracy and speed of support vector machines. In Advances in Neural Information Processing Systems, Vol. 9, The MIT Press, pp. 375–381.Google Scholar

Burges, C. J. C. (1998), ‘A tutorial on support vector machines for pattern recognition’, Data Min. Knowl. Discovery 2, 121–167.Google Scholar

Caponnetto, A. and Rosasco, L. (2004), Non standard support vector machines and regularization networks. Technical report DISI-TR-04–03, University of Genoa, Italy.Google Scholar

Catanzaro, B., Sundaram, N. and Keutzer, K. (2008), Fast support vector machine training and classification on graphics processors. In Proc. 25th International Conference on Machine Learning, Helsinki, Finland, pp. 104–111.Google Scholar

Chang, C.-C. and Lin, C.-J. (2001), LIBSVM: A library for support vector machines. www.csie.ntu.edu.tw/~cjlin/libsvm Google Scholar

Chang, C.-C., Hsu, C.-W. and Lin, C.-J. (2000), ‘The analysis of decomposition methods for support vector machines’, IEEE Trans. Neural Networks 11, 1003–1008.CrossRef Google Scholar PubMed

Chang, E., Zhu, K., Wang, H., Bai, H., Li, J., Qiu, Z. and Cui, H. (2008), Parallelizing support vector machines on distributed computers. In Advances in Neural Information Processing Systems, Vol. 20, The MIT Press, pp. 257–264.Google Scholar

Chapelle, O. (2007), ‘Training a support vector machine in the primal’, Neural Comput. 19, 1155–1178.CrossRef Google Scholar PubMed

Chen, P.-H., Fan, R.-E. and Lin, C.-J. (2006), ‘A study on SMO-type decomposition methods for support vector machines’, IEEE Trans. Neural Networks 17, 893–908.Google Scholar

Collobert, R. and Bengio, S. (2001), ‘SVMTorch: Support vector machines for large-scale regression problems’, J. Mach. Learn. Res. 1, 143–160.Google Scholar

Cristianini, N. and Shawe-Taylor, J. (2000), An Introduction to Support Vector Machines and other Kernel-Based Learning Methods, Cambridge University Press.Google Scholar

Cucker, F. and Smale, S. (2001), ‘On the mathematical foundations of learning’, Bull. Amer. Math. Soc. 39, 1–49.CrossRef Google Scholar

Cucker, F. and Smale, S. (2002), ‘Best choices for regularization parameter in learning theory: On the bias-variance problem’, Found. Comput. Math. 2, 413–428.CrossRef Google Scholar

Cucker, F. and Zhou, D. X. (2007), Learning Theory: An Approximation Theory Viewpoint, Cambridge University Press.Google Scholar

De Vito, E., Rosasco, L., Caponnetto, A., De Giovannini, U. and Odone, F. (2005), ‘Learning from examples as an inverse problem’, J. Mach. Learn. Res. 6, 883–904.Google Scholar

De Vito, E., Rosasco, L., Caponnetto, A., Piana, M. and Verri, A. (2004), ‘Some properties of regularized kernel methods’, J. Mach. Learn. Res. 5, 1363–1390.Google Scholar

Dong, J.-X., Krzyzak, A. and Suen, C. Y. (2003), A fast parallel optimization for training support vector machine. In Proc. 3rd International Conference on Machine Learning and Data Mining (Perner, P. and Rosenfeld, A., eds), Vol. 2734 of Lecture Notes in Artificial Intelligence, Springer, pp. 96–105.Google Scholar

Dong, J.-X., Krzyzak, A. and Suen, C. Y. (2005), ‘Fast SVM training algorithm with decomposition on very large data sets’, IEEE Trans. Pattern Anal. Mach. Intelligence 27, 603–618.Google Scholar

Drineas, P. and Mahoney, M. W. (2005), ‘On the Nyström method for approximating a Gram matrix for improved kernel-based learning’, J. Mach. Learn. Res. 6, 2153–2175.Google Scholar

Durdanovic, I., Cosatto, E. and Graf, H.-P. (2007), Large-scale parallel SVM implementation. In Large Scale Kernel Machines (Bottou, L., Chapelle, O., DeCoste, D. and Weston, J., eds), The MIT Press, pp. 105–138.Google Scholar

Evgeniou, T., Pontil, M. and Poggio, T. (2000), ‘Regularization networks and support vector machines’, Adv. Comput. Math. 13, 1–50.Google Scholar

Fan, R.-E., Chang, K.-W., Hsieh, C.-J., Wang, X.-R. and Lin, C.-J. (2008), ‘LIB-LINEAR: A library for large linear classification’, J. Mach. Learn. Res. 9, 1871–1874.Google Scholar

Ferris, M. C. and Munson, T. S. (2002), ‘Interior-point methods for massive support vector machines’, SIAM J. Optim. 13, 783–804.Google Scholar

Fine, S. and Scheinberg, K. (2001), ‘Efficient SVM training using low-rank kernel representations’, J. Mach. Learn. Res. 2, 243–264.Google Scholar

Fine, S. and Scheinberg, K. (2002), INCAS: An incremental active set method for SVM. Technical report, IBM Research Labs, Haifa, Israel.Google Scholar

Fletcher, R. (1987), Practical Methods of Optimization, 2nd edn, Wiley, Chichester.Google Scholar

Fletcher, R. (1996–2007), BQPD: Linear and quadratic programming solver. www-new.mcs.anl.gov/otc/Guide/SoftwareGuide/Blurbs/bqpd.html Google Scholar

Franc, V. and Sonnenburg, S. (2008 a), LIBOCAS: Library implementing OCAS solver for training linear SVM classifiers from large-scale data. cmp.felk.cvut.cz/~xfrancv/ocas/html Google Scholar

Franc, V. and Sonnenburg, S. (2008 b), Optimized cutting plane algorithm for support vector machines. In Proc. 25th International Conference on Machine Learning, Helsinki, Finland, Vol. 307, ACM Press, New York, pp. 320–327.Google Scholar

Gertz, E. M. and Griffin, J. D. (2005), Support vector machine classifiers for large data sets. Technical report, Mathematics and Computer Science Division, Argonne National Laboratory, USA.CrossRef Google Scholar

Goldfarb, D. and Scheinberg, K. (2004), ‘A product-form cholesky factorization method for handling dense columns in interior point methods for linear programming’, Math. Program. Ser. A 99, 1–34.CrossRef Google Scholar

Graf, H. P., Cosatto, E., Bottou, L., Dourdanovic, I. and Vapnik, V. N. (2005), Parallel support vector machines: The Cascade SVM. In Advances in Neural Information Processing Systems (Saul, L., Weiss, Y. and Bottou, L., eds), Vol. 17, The MIT Press, pp. 521–528.Google Scholar

Groenen, P. J. F., Nalbantov, G. and Bioch, J. C. (2007), Nonlinear support vector machines through iterative majorization and I-splines. In Advances in Data Analysis, Studies in Classification, Data Analysis, and Knowledge Organization, Springer, pp. 149–161.CrossRef Google Scholar

Groenen, P. J. F., Nalbantov, G. and Bioch, J. C. (2008), ‘SVM-Maj: A majorization approach to linear support vector machines with different hinge errors’, Adv. Data Analysis and Classification 2, 17–43.CrossRef Google Scholar

Hastie, T., Tibshirani, R. and Friedman, J. (2001), The Elements of Statistical Learning: Data Mining, Inference, and Prediction, Springer.Google Scholar

Herbrich, R. (2002), Learning Kernel Classifiers. Theory and Algorithms, The MIT Press.Google Scholar

Hush, D. and Scovel, C. (2003), ‘Polynomial-time decomposition algorithms for support vector machines’, Machine Learning 51, 51–71.Google Scholar

Hush, D., Kelly, P., Scovel, C. and Steinwart, I. (2006), ‘QP algorithms with guaranteed accuracy and run time for support vector machines’, J. Mach. Learn. Res. 7, 733–769.Google Scholar

Joachims, T. (1999), Making large-scale SVM learning practical. In Advances in Kernel Methods: Support Vector Learning (Schölkopf, B., Burges, C. J. C. and Smola, A., eds), The MIT Press, pp. 169–184.Google Scholar

Joachims, T. (2006), Training linear SVMs in linear time. In Proc. 12th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, Philadelphia, ACM Press, New York, pp. 217–226.Google Scholar

Keerthi, S. S. and DeCoste, D. M. (2005), ‘A modified finite Newton method for fast solution of large-scale linear SVMs’, J. Mach. Learn. Res. 6, 341–361.Google Scholar

Keerthi, S. S. and Gilbert, E. G. (2002), ‘Convergence of a generalized SMO algorithm for SVM classifier design’, Machine Learning 46, 351–360.Google Scholar

Keerthi, S. S., Chapelle, O. and DeCoste, D. M. (2006), ‘Building support vector machines with reduced classifier complexity’, J. Mach. Learn. Res. 7, 1493–1515.Google Scholar

Keerthi, S. S., Shevade, S. K., Bhattacharyya, C. and Murthy, K. R.K. (2001), ‘Improvements to Platt‘s SMO algorithm for SVM classifier design’, Neural Comput. 13, 637–649.Google Scholar

Kulis, B., Sustik, M. and Dhillon, I. (2006), Learning low-rank kernel matrices. In Proc. 23rd International Conference on Machine Learning: ICML, pp. 505–512.Google Scholar

LeCun, Y. and Cortes, C. (1998), The MNIST database of handwritten digits. www.research.att.com/~yann/ocr/mnist Google Scholar

LeCun, Y., Bottou, L., Bengio, Y. and Haffner, P. (1998), ‘Gradient-based learning applied to document recognition’, 86, 2278–2324.Google Scholar

Lee, Y.-J. and Mangasarian, O. L. (2001 a), RSVM: Reduced support vector machines. In Proc. 1st SIAM International Conference on Data Mining, Chicago, April 5–7, 2001, SIAM, Philadelphia, pp. 1–16.Google Scholar

Lee, Y.-J. and Mangasarian, O. L. (2001 b), ‘SSVM: A smooth support vector machine for classification’, Comput. Optim. Appl. 20, 5–22.Google Scholar

Lin, C.-J. (2001 a), Linear convergence of a decomposition method for support vector machines. Technical report, Department of Computer Science and Information Engineering, National Taiwan University, Taipei, Taiwan.Google Scholar

Lin, C.-J. (2001 b), ‘On the convergence of the decomposition method for support vector machines’, IEEE Trans. Neural Networks 12, 1288–1298.Google Scholar PubMed

Lin, C.-J. (2002), ‘Asymptotic convergence of an SMO algorithm without any assumptions’, IEEE Trans. Neural Networks 13, 248–250.Google Scholar PubMed

Mangasarian, O. L. (2000), Generalized support vector machines. In Advances in Large Margin Classifiers, The MIT Press, pp. 135–146.Google Scholar

Mangasarian, O. L. (2002), ‘A finite Newton method for classification’, Optim. Methods Software 17, 913–939.Google Scholar

Mangasarian, O. L. (2006), ‘Exact 1-norm support vector machines via unconstrained convex differentiable minimization’, J. Mach. Learn. Res. 7, 1517–1530.Google Scholar

Mangasarian, O. L. and Musicant, D. R. (2001), ‘Lagrangian support vector machines’, J. Mach. Learn. Res. 1, 161–177.Google Scholar

Osuna, E., Freund, R. and Girosi, F. (1997), Training support vector machines: An application to face detection. In Proc. IEEE Conference on Computer Vision and Pattern Recognition: CVPR97, IEEE Computer Society, New York, pp. 130–136.CrossRef Google Scholar

Platt, J. C. (1998), Fast training of support vector machines using sequential minimal optimization. In Advances in Kernel Methods: Support Vector Learning (Schölkopf, B., Burges, C. and Smola, A., eds), The MIT Press, pp. 185–210.Google Scholar

Platt, J. C. (1999), Using analytic QP and sparseness to speed training of support vector machines. In Advances in Neural Information Processing Systems (Kearns, M.et al., eds), Vol. 11, The MIT Press, pp. 557–563.Google Scholar

Prato, M., Zanni, L. and Zanghirati, G. (2007) ‘On recent machine learning algorithms for brain activity interpretation’, Applied Computational Electromagnetics Society Journal 22, 1939–1946.Google Scholar

Scheinberg, K. (2006), ‘An efficient implementation of an active set method for SVMs’, J. Mach. Learn. Res. 7, 2237–2257.Google Scholar

Schölkopf, B. and Smola, A. J. (2002), Learning with Kernels, The MIT Press.Google Scholar

Serafini, T. and Zanni, L. (2005), ‘On the working set selection in gradient projection-based decomposition techniques for support vector machines’, Optim. Methods Software 20, 583–596.Google Scholar

Serafini, T., Zanghirati, G. and Zanni, L. (2005), ‘Gradient projection methods for quadratic programs and applications in training support vector machines’, Optim. Methods Software 20, 353–378.Google Scholar

Shawe-Taylor, J. and Cristianini, N. (2004), Kernel Methods for Pattern Analysis, Cambridge University Press.Google Scholar

Suykens, J. A. K., Van Gestel, T., De Brabanter, J., De Moor, B. and Vandewalle, J. (2002), Least Squares Support Vector Machines, World Scientific, Singapore.Google Scholar

Teo, C. H., Le, Q. V., Smola, A. and Vishwanathan, S. (2009), BMRM: Bundle methods for regularized risk minimization. users.rsise.anu.edu.au/~chteo/BMRM.html Google Scholar

Tsang, I. W., Kwok, J. T. and Cheung, P.-M. (2005), ‘Core vector machines: Fast SVM training on very large data sets’, J. Mach. Learn. Res. 6, 363–392.Google Scholar

Vapnik, V. N. (1998), Statistical Learning Theory, Wiley, New York.Google Scholar

Vapnik, V. N. (1999), The Nature of Statistical Learning Theory, Information Science and Statistics, Springer.Google Scholar

Williams, C. K. and Seeger, M. (2001), Using the Nyström method to speed up kernel machines. In Advances in Neural Information Processing Systems, Vol. 13, The MIT Press, pp. 682–688.Google Scholar

Woodsend, K. and Gondzio, J. (2007 a), Exploiting separability in large scale support vector machine training. Technical report MS-07–002, The University of Edinburgh.Google Scholar

Woodsend, K. and Gondzio, J. (2007 b), Parallel support vector machine training with nonlinear kernels. Technical report MS-07–007, The University of Edinburgh.Google Scholar

Woodsend, K. and Gondzio, J. (2009), ‘Hybrid MPI/OpenMP parallel linear support vector machine training’, J. Mach. Learn. Res. 20, 1937–1953.Google Scholar

Wyganowski, M. (2008), Classification algorithms on the cell processor. PhD thesis, Kate Gleason College of Engineering, Rochester Institute of Technology, Rochester, NY, USA. http://hdl.handle.net/1850/7767 Google Scholar

Zanni, L. (2006), ‘An improved gradient projection-based decomposition technique for support vector machines’, Comput. Management Sci. 3, 131–145.CrossRef Google Scholar

Zanni, L., Serafini, T. and Zanghirati, G. (2006), ‘Parallel software for training large-scale support vector machines on multiprocessors systems’, J. Mach. Learn. Res. 7, 1467–1492. http://dm.unife.it/gpdt Google Scholar

Article contents

Binary separation and training support vector machines*

Extract

Access options

References

REFERENCES

Save article to Kindle

Save article to Dropbox

Save article to Google Drive

Reply to: Submit a response

Your details

You have entered the maximum number of contributors

Conflicting interests