On the Quantification of Model Uncertainty: A Bayesian Perspective

David Kaplan

doi:10.1007/s11336-021-09754-5

Published online by Cambridge University Press: 01 January 2025

David Kaplan*, University of Wisconsin–Madison
*Correspondence should be made to David Kaplan, University of Wisconsin–Madison, Madison, USA. Email: [email protected]

Abstract

Issues of model selection have dominated the theoretical and applied statistical literature for decades. Model selection methods such as ridge regression, the lasso, and the elastic net have replaced ad hoc methods such as stepwise regression as a means of model selection. In the end, however, these methods lead to a single final model that is often taken to be the model considered ahead of time, thus ignoring the uncertainty inherent in the search for a final model. One method that has enjoyed a long history of theoretical developments and substantive applications, and that accounts directly for uncertainty in model selection, is Bayesian model averaging (BMA). BMA addresses the problem of model selection by not selecting a final model, but rather by averaging over a space of possible models that could have generated the data. The purpose of this paper is to provide a detailed and up-to-date review of BMA with a focus on its foundations in Bayesian decision theory and Bayesian predictive modeling. We consider the selection of parameter and model priors as well as methods for evaluating predictions based on BMA. We also consider important assumptions regarding BMA and extensions of model averaging methods to address these assumptions, particularly the method of Bayesian stacking. Simple empirical examples are provided and directions for future research relevant to psychometrics are discussed.
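The averaging idea described above can be illustrated with a minimal sketch. It is not the paper's implementation: it assumes linear regression models over all subsets of the predictors, a uniform prior over models, and the common BIC approximation to posterior model probabilities. The function names (`fit_ols`, `bma_predict`) and the simulated data are hypothetical and chosen only for illustration.

```python
import itertools
import numpy as np

def fit_ols(X, y):
    """Least-squares fit; returns coefficients and the Gaussian BIC (up to a constant)."""
    n, k = X.shape
    beta, *_ = np.linalg.lstsq(X, y, rcond=None)
    rss = float(np.sum((y - X @ beta) ** 2))
    bic = n * np.log(rss / n) + k * np.log(n)
    return beta, bic

def bma_predict(X, y, X_new):
    """Average predictions over all predictor subsets.

    Assumption: posterior model probabilities are approximated by
    exp(-BIC/2) under a uniform model prior, then normalized.
    """
    n, p = X.shape
    ones = np.ones((n, 1))
    ones_new = np.ones((X_new.shape[0], 1))
    subsets, bics, preds = [], [], []
    for size in range(p + 1):
        for cols in itertools.combinations(range(p), size):
            idx = list(cols)
            beta, bic = fit_ols(np.hstack([ones, X[:, idx]]), y)
            subsets.append(cols)
            bics.append(bic)
            preds.append(np.hstack([ones_new, X_new[:, idx]]) @ beta)
    bics = np.array(bics)
    # Subtract the minimum BIC before exponentiating for numerical stability.
    w = np.exp(-0.5 * (bics - bics.min()))
    w /= w.sum()
    # Model-averaged prediction: weighted sum of each model's prediction.
    return subsets, w, w @ np.array(preds)

# Simulated data (hypothetical): only the first of three predictors matters.
rng = np.random.default_rng(0)
X = rng.normal(size=(200, 3))
y = 1.0 + 2.0 * X[:, 0] + rng.normal(size=200)
X_new = rng.normal(size=(5, 3))

subsets, w, y_bma = bma_predict(X, y, X_new)

# Posterior inclusion probability of each predictor: total weight of
# the models that contain it.
incl = np.array([sum(wk for s, wk in zip(subsets, w) if j in s)
                 for j in range(X.shape[1])])
```

Rather than reporting the single highest-weight subset, the sketch keeps every model and lets the approximate posterior probabilities determine how much each contributes to the prediction. Enumerating all subsets is feasible only for a small number of predictors; larger model spaces require a search or sampling strategy.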

Keywords

Bayesian model averaging; Bayesian stacking; prediction

Type: Application Reviews and Case Studies
Information: Psychometrika, Volume 86, Issue 1, March 2021, pp. 215–238

DOI: https://doi.org/10.1007/s11336-021-09754-5
Copyright: Copyright © 2021 The Psychometric Society
