FAST RATES FOR ESTIMATION ERROR AND ORACLE INEQUALITIES FOR MODEL SELECTION

Peter L. Bartlett

doi:10.1017/S0266466608080225


Published online by Cambridge University Press: 15 January 2008


Affiliation: University of California, Berkeley


Abstract

We consider complexity penalization methods for model selection. These methods aim to choose a model to optimally trade off estimation and approximation errors by minimizing the sum of an empirical risk term and a complexity penalty. It is well known that if we use a bound on the maximal deviation between empirical and true risks as a complexity penalty, then the risk of our choice is no more than the approximation error plus twice the complexity penalty. There are many cases, however, where complexity penalties like this give loose upper bounds on the estimation error. In particular, if we choose a function from a suitably simple convex function class with a strictly convex loss function, then the estimation error (the difference between the risk of the empirical risk minimizer and the minimal risk in the class) approaches zero at a faster rate than the maximal deviation between empirical and true risks. In this paper, we address the question of whether it is possible to design a complexity-penalized model selection method for these situations. We show that, provided the sequence of models is ordered by inclusion, in these cases we can use tight upper bounds on estimation error as a complexity penalty. Surprisingly, this is the case even in situations when the difference between the empirical risk and true risk (and indeed the error of any estimate of the approximation error) decreases much more slowly than the complexity penalty. We give an oracle inequality showing that the resulting model selection method chooses a function with risk no more than the approximation error plus a constant times the complexity penalty.

We gratefully acknowledge the support of the NSF under award DMS-0434383. Thanks also to three anonymous reviewers for useful comments that improved the presentation.
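The selection rule described in the abstract (minimize empirical risk plus a complexity penalty over a sequence of models ordered by inclusion) can be illustrated with a small sketch. Everything specific here is an assumption for illustration and is not taken from the paper: the models are piecewise-constant regression functions on 2**k equal bins of [0, 1) (linear, hence convex, classes that are nested as k grows), the loss is squared error (strictly convex), and the penalty c * 2**k / n is a stand-in for a tight estimation error bound. The function names and the constant c are hypothetical.

```python
import random


def fit_histogram(xs, ys, k):
    """Empirical risk minimizer (squared loss) over functions that are
    constant on each of 2**k equal-width bins of [0, 1): the bin means."""
    bins = 2 ** k
    sums = [0.0] * bins
    counts = [0] * bins
    for x, y in zip(xs, ys):
        j = min(int(x * bins), bins - 1)  # bin index of x
        sums[j] += y
        counts[j] += 1
    # Empty bins get the value 0.0; they do not affect the empirical risk.
    return [s / c if c else 0.0 for s, c in zip(sums, counts)]


def empirical_risk(xs, ys, means):
    """Average squared error of the piecewise-constant function `means`."""
    bins = len(means)
    total = 0.0
    for x, y in zip(xs, ys):
        j = min(int(x * bins), bins - 1)
        total += (y - means[j]) ** 2
    return total / len(xs)


def select_model(xs, ys, max_k, c=1.0):
    """Pick k in 0..max_k minimizing empirical risk + penalty.

    The penalty c * 2**k / n is an assumed illustrative choice (of the
    order dimension / sample size), not the penalty from the paper.
    """
    n = len(xs)
    best = None
    for k in range(max_k + 1):
        means = fit_histogram(xs, ys, k)
        score = empirical_risk(xs, ys, means) + c * (2 ** k) / n
        if best is None or score < best[0]:
            best = (score, k, means)
    return best[1], best[2]
```

Because the classes are nested, the empirical risk of the fitted function can only decrease as k grows, so the penalty alone is what stops the rule from always choosing the largest model. A penalty based on the maximal deviation between empirical and true risks would typically be of order sqrt(2**k / n) in this setting, which is the looser alternative the abstract contrasts with.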

Type: Research Article
Information: Econometric Theory, Volume 24, Issue 2, April 2008, pp. 545–552

DOI: https://doi.org/10.1017/S0266466608080225
Copyright: © 2008 Cambridge University Press


