Fundamental Limits in Model Selection for Modern Data Analysis

doi:10.1017/9781108616799.013

Chapter 12

Published online by Cambridge University Press: 22 March 2021

Jie Ding, Yuhong Yang, and Vahid Tarokh

Edited by Miguel R. D. Rodrigues and Yonina C. Eldar


Miguel R. D. Rodrigues: University College London
Yonina C. Eldar: Weizmann Institute of Science, Israel


Summary

With rapid developments in hardware storage, precision instrument manufacturing, and economic globalization, data in various forms have become ubiquitous in human life. This enormous amount of data is a double-edged sword. While it makes it possible to model the world with higher fidelity and greater flexibility, improper modeling choices can lead to false discoveries, misleading conclusions, and poor predictions. Typical data-mining, machine-learning, and statistical-inference procedures learn from and make predictions on data by fitting parametric or nonparametric models. However, no model is universally suitable for all datasets and goals. A crucial step in data analysis is therefore to consider a set of postulated candidate models and learning methods (the model class) and to select the most appropriate one. We provide an integrated discussion of the fundamental limits of inference and prediction based on model-selection principles from modern data analysis. In particular, we introduce two recent advances in model selection: one concerning a new information criterion and the other concerning modeling procedure selection.
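To make "select the most appropriate candidate model" concrete, here is a minimal sketch (not the chapter's own code) that compares the Akaike (AIC) and Bayesian (BIC) information criteria when the candidates are nested polynomial regressions of degree 0 to 6 with Gaussian noise. The data-generating curve, sample size, and helper names (`solve`, `rss_poly`, `select_degree`) are hypothetical choices for illustration. It relies on the standard fact that, for Gaussian errors with unknown variance, minus twice the maximized log-likelihood equals n log(RSS/n) up to a constant, so AIC adds 2k and BIC adds k log n, with k the number of regression coefficients.

```python
import math
import random


def solve(a, b):
    """Solve the linear system a x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [b[i]] for i, row in enumerate(a)]  # augmented matrix [a | b]
    for col in range(n):
        piv = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[piv] = m[piv], m[col]
        for r in range(col + 1, n):
            f = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= f * m[col][c]
    x = [0.0] * n
    for i in range(n - 1, -1, -1):  # back substitution
        x[i] = (m[i][n] - sum(m[i][j] * x[j] for j in range(i + 1, n))) / m[i][i]
    return x


def rss_poly(xs, ys, degree):
    """Residual sum of squares of the least-squares polynomial fit of the given degree."""
    k = degree + 1
    rows = [[x ** p for p in range(k)] for x in xs]
    xtx = [[sum(r[i] * r[j] for r in rows) for j in range(k)] for i in range(k)]
    xty = [sum(r[i] * y for r, y in zip(rows, ys)) for i in range(k)]
    beta = solve(xtx, xty)  # normal equations
    return sum((y - sum(b * v for b, v in zip(beta, r))) ** 2 for r, y in zip(rows, ys))


def select_degree(xs, ys, max_degree, penalty):
    """Minimize n*log(RSS/n) + penalty*k over degrees 0..max_degree, where k = degree + 1."""
    n = len(xs)

    def criterion(d):
        return n * math.log(rss_poly(xs, ys, d) / n) + penalty * (d + 1)

    return min(range(max_degree + 1), key=criterion)


# Hypothetical data: a quadratic signal plus Gaussian noise.
rng = random.Random(0)
xs = [rng.uniform(-1.0, 1.0) for _ in range(200)]
ys = [1.0 + 2.0 * x - 3.0 * x ** 2 + rng.gauss(0.0, 0.3) for x in xs]

aic_deg = select_degree(xs, ys, 6, penalty=2.0)                # AIC: 2 per coefficient
bic_deg = select_degree(xs, ys, 6, penalty=math.log(len(xs)))  # BIC: log(n) per coefficient
print("degree chosen by AIC:", aic_deg, "| by BIC:", bic_deg)
```

Because log n exceeds 2 once n is at least 8, BIC charges more per extra coefficient, so on nested candidates it never selects a larger model than AIC. This tension between the two criteria is what the Bridge criterion listed in the keywords aims to reconcile.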

Keywords

model selection; modeling procedure selection; Akaike information criterion; Bayesian information criterion; Bridge criterion; cross-validation
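Modeling procedure selection, paired here with cross-validation, compares whole pipelines rather than individual fitted models. The following is a hypothetical sketch of that idea, not the chapter's method or experimental setup: each "procedure" first picks a polynomial degree by AIC or by BIC and then fits it by least squares, and 5-fold cross-validation estimates each procedure's out-of-sample squared error. The procedure with the smaller estimate is chosen. The data, the fold count, and all function names are assumptions made for illustration.

```python
import math
import random


def solve(a, b):
    """Solve a x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [b[i]] for i, row in enumerate(a)]
    for col in range(n):
        piv = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[piv] = m[piv], m[col]
        for r in range(col + 1, n):
            f = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= f * m[col][c]
    x = [0.0] * n
    for i in range(n - 1, -1, -1):
        x[i] = (m[i][n] - sum(m[i][j] * x[j] for j in range(i + 1, n))) / m[i][i]
    return x


def fit_poly(xs, ys, degree):
    """Least-squares polynomial coefficients (constant term first) via the normal equations."""
    k = degree + 1
    rows = [[x ** p for p in range(k)] for x in xs]
    xtx = [[sum(r[i] * r[j] for r in rows) for j in range(k)] for i in range(k)]
    xty = [sum(r[i] * y for r, y in zip(rows, ys)) for i in range(k)]
    return solve(xtx, xty)


def predict(beta, x):
    return sum(b * x ** p for p, b in enumerate(beta))


def ic_procedure(penalty_fn, max_degree=6):
    """A modeling procedure: choose the degree by an information criterion, then return that fit."""
    def procedure(xs, ys):
        n = len(xs)
        best_score, best_beta = math.inf, None
        for d in range(max_degree + 1):
            beta = fit_poly(xs, ys, d)
            rss = sum((y - predict(beta, x)) ** 2 for x, y in zip(xs, ys))
            score = n * math.log(rss / n) + penalty_fn(n) * (d + 1)
            if score < best_score:
                best_score, best_beta = score, beta
        return best_beta
    return procedure


def cv_error(procedure, xs, ys, folds=5, seed=0):
    """Mean squared prediction error of a whole procedure under k-fold cross-validation."""
    idx = list(range(len(xs)))
    random.Random(seed).shuffle(idx)
    total = 0.0
    for f in range(folds):
        test = set(idx[f::folds])
        train = [i for i in idx if i not in test]
        # Rerun the entire procedure (selection + fit) on the training folds only.
        beta = procedure([xs[i] for i in train], [ys[i] for i in train])
        total += sum((ys[i] - predict(beta, xs[i])) ** 2 for i in test)
    return total / len(xs)


# Hypothetical data: a quadratic signal plus Gaussian noise.
rng = random.Random(1)
xs = [rng.uniform(-1.0, 1.0) for _ in range(200)]
ys = [1.0 + 2.0 * x - 3.0 * x ** 2 + rng.gauss(0.0, 0.3) for x in xs]

procedures = {
    "AIC-then-fit": ic_procedure(lambda n: 2.0),
    "BIC-then-fit": ic_procedure(math.log),
}
errors = {name: cv_error(proc, xs, ys) for name, proc in procedures.items()}
chosen = min(errors, key=errors.get)
print(errors, "->", chosen)
```

Each fold reruns the degree selection as well as the fit, so the estimate reflects the whole procedure; cross-validating only the final selected model would ignore the selection step. Which procedure wins depends on the data, and this toy run says nothing about the general comparison.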

Type: Chapter
In: Information-Theoretic Methods in Data Science, pp. 359–382

DOI: https://doi.org/10.1017/9781108616799.013

Publisher: Cambridge University Press

Print publication year: 2021

