Hostname: page-component-cd9895bd7-p9bg8 Total loading time: 0 Render date: 2024-12-23T08:15:07.440Z Has data issue: false hasContentIssue false

A new analytical approach to consistency and overfitting in regularized empirical risk minimization

Published online by Cambridge University Press:  20 July 2017

NICOLÁS GARCÍA TRILLOS
Affiliation:
Division of Applied Mathematics, Brown University, Providence, RI, 02912, USA email: [email protected]
RYAN MURRAY
Affiliation:
Mathematics Department, The Pennsylvania State University, University Park, PA 16802, USA email: [email protected]
Rights & Permissions [Opens in a new window]

Abstract

Core share and HTML view are not available for this content. However, as you have access to this content, a full PDF is available via the ‘Save PDF’ action button.

This work considers the problem of binary classification: given training data x1, . . ., xn from a certain population, together with associated labels y1,. . ., yn ∈ {0,1}, determine the best label for an element x not among the training data. More specifically, this work considers a variant of the regularized empirical risk functional which is defined intrinsically to the observed data and does not depend on the underlying population. Tools from modern analysis are used to obtain a concise proof of asymptotic consistency as regularization parameters are taken to zero at rates related to the size of the sample. These analytical tools give a new framework for understanding overfitting and underfitting, and rigorously connect the notion of overfitting with a loss of compactness.

Type
Papers
Creative Commons
This is a work of the U.S. Government and is not subject to copyright protection in the United States.
Copyright
Copyright © Cambridge University Press 2017

References

[1] Agapiou, S., Larsson, S. & Stuart, A. M. (2013) Posterior contraction rates for the Bayesian approach to linear ill-posed inverse problems. Stoch. Process. Appl. 123, 38283860.Google Scholar
[2] Ambrosio, L., Fusco, N. & Pallara, D. (2000) Functions of Bounded Variation and Free Discontinuity Problems, Oxford Mathematical Monographs, The Clarendon Press Oxford University Press, New York.Google Scholar
[3] Ambrosio, L., Gigli, N. & Savaré, G. (2008) Gradient Flows: In Metric Spaces and in the Space of Probability Measures, Lectures in Mathematics, Birkhäuser, Basel.Google Scholar
[4] Billingsley, P. (2012) Probability and Measure, Wiley Series in Probability and Statistics, John Wiley & Sons, Inc., Hoboken, NJ.Google Scholar
[5] Chambolle, A., Caselles, V., Cremers, D., Novaga, M. & Pock, T. (2010) An introduction to total variation for image analysis. In: Fornasier, Massimo (editor), Theoretical Foundations and Numerical Methods for Sparse Recovery, Radon Ser. Comput. Appl. Math., vol. 9, Walter de Gruyter, Berlin, pp. 263340.Google Scholar
[6] Dal Maso, G. (1993) An Introduction to Γ-Convergence, Springer, Birkhäuser Boston.Google Scholar
[7] Di Nezza, E., Palatucci, G. & Valdinoci, E. (2012) Hitchhiker's guide to the fractional Sobolev spaces. Bull. Sci. Math. 136, 521573.Google Scholar
[8] Esser, E. (2009) Applications of Lagrangian Based Alternating Direction Methods and Connections to Split Bregman, CAM Report 09-31, UCLA.Google Scholar
[9] Evans, L. C. (1990) Weak Convergence Methods for Nonlinear Partial Differential Equations, vol. 74, American Mathematical Soc, Providence, RI.Google Scholar
[10] Fonseca, I. & Leoni, G. (2007) Modern Methods in the Calculus of Variations: Lp Spaces, Springer Monographs in Mathematics, Springer, New York.Google Scholar
[11] García Trillos, N., Gerlach, M., Hein, M. & Slepčev, D. (2017) Spectral convergence of emprical graph laplacians. In preparation.Google Scholar
[12] García Trillos, N. & Slepčev, D. (2016) Continuum limit of total variation on point clouds. Arch. Ration. Mech. Anal. 220 (1), 193241.Google Scholar
[13] Garcia Trillos, N. & Slepcev, D. (2015) On the rate of convergence of empirical measures in ∞-transportation distance. Canad. J. Math. 67, 13581383.Google Scholar
[14] García Trillos, N., Slepčev, D., von Brecht, J., Laurent, T. & Bresson, X. (2016) Consistency of Cheeger and ratio graph cuts. to appear in J. Mach. Learning Res. 17, Paper No. 181, 46 pages.Google Scholar
[15] Ghosal, S., Ghosh, J. K. & van der Vaart, A. W. (2000) Convergence rates of posterior distributions. Ann. Statist. 28, 500531.Google Scholar
[16] Grisvard, P. (1985) Elliptic Problems in Nonsmooth Domains, Monographs and Studies in Mathematics, vol. 24, Pitman (Advanced Publishing Program), Boston, MA.Google Scholar
[17] Leoni, G. (2009) A First Course in Sobolev Spaces, Graduate Studies in Mathematics, vol. 24, American Mathematical Society, Providence, RI.Google Scholar
[18] Nikolova, M. (2004) A variational approach to remove outliers and impulse noise. J. Math. Imaging Vision 20, 99120.Google Scholar
[19] Pedregal, P. (1997) Parametrized Measures and Variational Principles, Progress in Nonlinear Differential Equations and their Applications, vol. 30, Birkhäuser Verlag, Basel.CrossRefGoogle Scholar
[20] Rockafellar, R. T. (1970) Convex Analysis, Princeton Mathematical Series, vol. 28, Princeton University Press, Princeton, NJ.Google Scholar
[21] Vapnik, V. N. (1998) Statistical Learning Theory, vol. 1, John Wiley & Sons, Inc., New York.Google Scholar
[22] Villani, C. (2003) Topics in Optimal Transportation, Graduate Studies in Mathematics, vol. 58, American Mathematical Society, Providence, RI.Google Scholar
[23] Vogel, C. R. & Oman, M. E. (1996) Iterative methods for total variation denoising. SIAM J. Sci. Comput. 17, 227238.Google Scholar
[24] von Luxburg, U. & Schölkopf, B. (2011) Statistical learning theory: Models, concepts, and results. In: Gabbay, Dov M., Hartmann, Stephan and Woods, John (editors), Handbook of the History of Logic, Vol. 10: Inductive Logic Elsevier, North Holland, pp. 651706.Google Scholar