
Model Selection and Akaike's Information Criterion (AIC): The General Theory and its Analytical Extensions

Published online by Cambridge University Press: 01 January 2025

Hamparsum Bozdogan*
Affiliation:
University of Virginia
* Requests for reprints should be sent to the author at the Department of Mathematics, Math-Astronomy Building, University of Virginia, Charlottesville, VA 22903.

Abstract

During the last fifteen years, Akaike's entropy-based Information Criterion (AIC) has had a fundamental impact in statistical model evaluation problems. This paper studies the general theory of the AIC procedure and provides its analytical extensions in two ways without violating Akaike's main principles. These extensions make AIC asymptotically consistent and penalize overparameterization more stringently to pick only the simplest of the “true” models. These selection criteria are called CAIC and CAICF. Asymptotic properties of AIC and its extensions are investigated, and empirical performances of these criteria are studied in choosing the correct degree of a polynomial model in two different Monte Carlo experiments under different conditions.
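As an illustration of the kind of comparison the abstract describes, the following is a minimal sketch of choosing a polynomial degree by AIC and CAIC. The abstract does not state the formulas; the sketch assumes the commonly quoted forms AIC = -2 log L + 2k and CAIC = -2 log L + k(ln n + 1), with Gaussian errors and k counting the polynomial coefficients plus the error variance. CAICF is omitted because it additionally requires the estimated Fisher information matrix. The function names, the simulated cubic, and the noise level are hypothetical choices for illustration and are not taken from the paper's Monte Carlo experiments.

```python
import math
import random


def fit_polynomial(x, y, degree):
    """Least-squares polynomial fit via the normal equations.

    Returns the coefficients ordered from the constant term upward.
    """
    m = degree + 1
    # Normal equations A c = b with A[i][j] = sum x^(i+j), b[i] = sum y * x^i.
    A = [[sum(xi ** (i + j) for xi in x) for j in range(m)] for i in range(m)]
    b = [sum(yi * xi ** i for xi, yi in zip(x, y)) for i in range(m)]
    # Gaussian elimination with partial pivoting.
    for col in range(m):
        pivot = max(range(col, m), key=lambda r: abs(A[r][col]))
        A[col], A[pivot] = A[pivot], A[col]
        b[col], b[pivot] = b[pivot], b[col]
        for r in range(col + 1, m):
            factor = A[r][col] / A[col][col]
            for c in range(col, m):
                A[r][c] -= factor * A[col][c]
            b[r] -= factor * b[col]
    # Back substitution.
    coef = [0.0] * m
    for i in range(m - 1, -1, -1):
        tail = sum(A[i][j] * coef[j] for j in range(i + 1, m))
        coef[i] = (b[i] - tail) / A[i][i]
    return coef


def criteria(x, y, degree):
    """Return (AIC, CAIC) for a Gaussian polynomial regression of given degree.

    Assumed formulas (not stated in the abstract):
    AIC = -2 log L + 2k, CAIC = -2 log L + k (ln n + 1).
    """
    n = len(x)
    coef = fit_polynomial(x, y, degree)
    rss = sum(
        (yi - sum(c * xi ** p for p, c in enumerate(coef))) ** 2
        for xi, yi in zip(x, y)
    )
    sigma2 = rss / n  # maximum likelihood estimate of the error variance
    neg2_loglik = n * (math.log(2 * math.pi * sigma2) + 1)
    k = degree + 2  # polynomial coefficients plus the error variance
    aic = neg2_loglik + 2 * k
    caic = neg2_loglik + k * (math.log(n) + 1)
    return aic, caic


def select_degree(x, y, max_degree):
    """Return the degrees minimizing AIC and CAIC over 0..max_degree."""
    scores = [criteria(x, y, d) for d in range(max_degree + 1)]
    best_aic = min(range(max_degree + 1), key=lambda d: scores[d][0])
    best_caic = min(range(max_degree + 1), key=lambda d: scores[d][1])
    return best_aic, best_caic


if __name__ == "__main__":
    rng = random.Random(0)  # fixed seed so the run is repeatable
    x = [-1 + 2 * i / 49 for i in range(50)]
    # Hypothetical true model: a cubic with small Gaussian noise.
    y = [1 - 2 * xi + 0.5 * xi ** 2 + 3 * xi ** 3 + rng.gauss(0, 0.1) for xi in x]
    print(select_degree(x, y, max_degree=6))
```

Under these assumed formulas the CAIC penalty per parameter, ln n + 1, exceeds the AIC penalty of 2 whenever n is at least 3, so the degree chosen by CAIC is never larger than the degree chosen by AIC on the same data. This is the sense in which the extension penalizes overparameterization more stringently.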

Type
Special Section
Copyright
Copyright © 1987 The Psychometric Society


Footnotes

The author extends his deep appreciation to many people. These include Hirotugu Akaike, Donald E. Ramirez, Marvin Rosenblum, and S. James Taylor for reading and commenting on some parts of this manuscript through various stages of its development. I especially wish to thank Yoshio Takane, Jim Ramsay, and Stanley L. Sclove for critically reading the paper and making many helpful suggestions. I also wish to thank Julie Riddleberger for her excellent typing of this manuscript.

This research was partially supported by NIH Biomedical Research Support Grant (BRSG) No. 5-24867 at the University of Virginia.
