
An ℓ1-oracle inequality for the Lasso in finite mixture Gaussian regression models

Published online by Cambridge University Press:  04 November 2013

Caroline Meynet*
Affiliation:
Laboratoire de Mathématiques, Faculté des Sciences d’Orsay, Université Paris-Sud, 91405 Orsay, France. [email protected]

Abstract

We consider a finite mixture of Gaussian regression models for high-dimensional heterogeneous data where the number of covariates may be much larger than the sample size. We propose to estimate the unknown conditional mixture density by an ℓ1-penalized maximum likelihood estimator. We shall provide an ℓ1-oracle inequality satisfied by this Lasso estimator with the Kullback–Leibler loss. In particular, we give a condition on the regularization parameter of the Lasso to obtain such an oracle inequality. Our aim is twofold: to extend the ℓ1-oracle inequality established by Massart and Meynet [12] in the homogeneous Gaussian linear regression case, and to present a complementary result to Städler et al. [18], by studying the Lasso for its ℓ1-regularization properties rather than considering it as a variable selection procedure. Our oracle inequality shall be deduced from a finite mixture Gaussian regression model selection theorem for ℓ1-penalized maximum likelihood conditional density estimation, which is inspired by Vapnik's method of structural risk minimization [23] and by the theory on model selection for maximum likelihood estimators developed by Massart in [11].
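For orientation, the ℓ1-penalized maximum likelihood estimator described in the abstract can be sketched in generic notation; the symbols below (sample (xi, yi), candidate conditional densities s_θ, tuning parameter λ) are illustrative and not taken from the paper itself.

```latex
% Hedged sketch of an l1-penalized maximum likelihood estimator
% for conditional density estimation; notation is generic, not the
% paper's exact definitions.
\[
  \hat{s}^{\mathrm{Lasso}}(\lambda)
  \in \operatorname*{arg\,min}_{s_{\theta}}
  \left\{ -\frac{1}{n} \sum_{i=1}^{n} \ln s_{\theta}(y_i \mid x_i)
          \;+\; \lambda \,\lVert \theta \rVert_{1} \right\},
\]
where $s_{\theta}$ ranges over conditional densities of finite mixtures of
Gaussian regressions and $\lVert \theta \rVert_{1}$ denotes the
$\ell_1$-norm of the regression coefficients. An $\ell_1$-oracle
inequality of the kind announced then bounds the Kullback--Leibler risk of
$\hat{s}^{\mathrm{Lasso}}(\lambda)$ by an oracle-type trade-off between
approximation error and the $\ell_1$-penalty, provided $\lambda$ exceeds a
stated threshold.
```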

Type
Research Article
Copyright
© EDP Sciences, SMAI, 2013


References

Bartlett, P.L., Mendelson, S. and Neeman, J., ℓ1-regularized linear regression: persistence and oracle inequalities. Probab. Theory Related Fields. Springer (2011).
Baudry, J.-P., Sélection de Modèle pour la Classification Non Supervisée. Choix du Nombre de Classes. Ph.D. thesis, Université Paris-Sud 11, France (2009).
Bickel, P.J., Ritov, Y. and Tsybakov, A.B., Simultaneous analysis of Lasso and Dantzig selector. Ann. Stat. 37 (2009) 1705–1732.
Boucheron, S., Lugosi, G. and Massart, P., Concentration Inequalities: A Nonasymptotic Theory of Independence. Oxford University Press (2013).
Bühlmann, P. and van de Geer, S., On the conditions used to prove oracle results for the Lasso. Electron. J. Stat. 3 (2009) 1360–1392.
Candès, E. and Tao, T., The Dantzig selector: statistical estimation when p is much larger than n. Ann. Stat. 35 (2007) 2313–2351.
Cohen, S. and Le Pennec, E., Conditional Density Estimation by Penalized Likelihood Model Selection and Applications. RR-7596, INRIA (2011).
Efron, B., Hastie, T., Johnstone, I. and Tibshirani, R., Least angle regression. Ann. Stat. 32 (2004) 407–499.
Hebiri, M., Quelques questions de sélection de variables autour de l'estimateur Lasso. Ph.D. thesis, Université Paris Diderot, Paris 7, France (2009).
Huang, C., Cheang, G.H.L. and Barron, A.R., Risk of penalized least squares, greedy selection and ℓ1-penalization for flexible function libraries. Submitted to Ann. Stat. (2008).
Massart, P., Concentration Inequalities and Model Selection. École d'été de Probabilités de Saint-Flour 2003. Lect. Notes Math. Springer, Berlin-Heidelberg (2007).
Massart, P. and Meynet, C., The Lasso as an ℓ1-ball model selection procedure. Electron. J. Stat. 5 (2011) 669–687.
Maugis, C. and Michel, B., A non-asymptotic penalized criterion for Gaussian mixture model selection. ESAIM: PS 15 (2011) 41–68.
McLachlan, G. and Peel, D., Finite Mixture Models. Wiley, New York (2000).
Meinshausen, N. and Yu, B., Lasso-type recovery of sparse representations for high-dimensional data. Ann. Stat. 37 (2009) 246–270.
Redner, R.A. and Walker, H.F., Mixture densities, maximum likelihood and the EM algorithm. SIAM Rev. 26 (1984) 195–239.
Rigollet, P. and Tsybakov, A., Exponential screening and optimal rates of sparse estimation. Ann. Stat. 39 (2011) 731–771.
Städler, N., Bühlmann, P. and van de Geer, S., ℓ1-penalization for mixture regression models. Test 19 (2010) 209–256.
Tibshirani, R., Regression shrinkage and selection via the Lasso. J. Roy. Stat. Soc. Ser. B 58 (1996) 267–288.
Osborne, M.R., Presnell, B. and Turlach, B.A., On the Lasso and its dual. J. Comput. Graph. Stat. 9 (2000) 319–337.
Osborne, M.R., Presnell, B. and Turlach, B.A., A new approach to variable selection in least squares problems. IMA J. Numer. Anal. 20 (2000) 389–404.
van der Vaart, A. and Wellner, J., Weak Convergence and Empirical Processes. Springer, Berlin (1996).
Vapnik, V.N., Estimation of Dependences Based on Empirical Data. Springer, New York (1982).
Vapnik, V.N., Statistical Learning Theory. Wiley, New York (1998).
Zhao, P. and Yu, B., On model selection consistency of Lasso. J. Mach. Learn. Res. 7 (2006) 2541–2563.