
Bayesian Comparison of Latent Variable Models: Conditional Versus Marginal Likelihoods

Published online by Cambridge University Press:  01 January 2025

Edgar C. Merkle*
Affiliation:
University of Missouri
Daniel Furr
Affiliation:
University of California, Berkeley
Sophia Rabe-Hesketh
Affiliation:
University of California, Berkeley
*Correspondence should be made to Edgar C. Merkle, University of Missouri, Columbia, MO, USA. Email: [email protected]

Abstract

Typical Bayesian methods for models with latent variables (or random effects) involve directly sampling the latent variables along with the model parameters. In high-level software code for model definitions (using, e.g., BUGS, JAGS, Stan), the likelihood is therefore specified as conditional on the latent variables. This can lead researchers to perform model comparisons via conditional likelihoods, where the latent variables are considered model parameters. In other settings, however, typical model comparisons involve marginal likelihoods where the latent variables are integrated out. This distinction is often overlooked despite the fact that it can have a large impact on the comparisons of interest. In this paper, we clarify and illustrate these issues, focusing on the comparison of conditional and marginal Deviance Information Criteria (DICs) and Watanabe–Akaike Information Criteria (WAICs) in psychometric modeling. The conditional/marginal distinction corresponds to whether the model should be predictive for the clusters that are in the data or for new clusters (where “clusters” typically correspond to higher-level units like people or schools). Correspondingly, we show that marginal WAIC corresponds to leave-one-cluster out cross-validation, whereas conditional WAIC corresponds to leave-one-unit out. These results lead to recommendations on the general application of the criteria to models with latent variables.
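The conditional/marginal distinction can be made concrete with a small simulation. The sketch below is illustrative only (not the paper's code): simulated draws stand in for MCMC output from a toy random-intercept model, and WAIC is computed twice, once from the conditional pointwise log-likelihood (one term per unit, given the sampled latent variables) and once from the marginal version (one term per cluster, with the latent variable integrated out by Monte Carlo).

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy random-intercept model: y_ij ~ N(theta_j, sigma), theta_j ~ N(mu, tau).
# Hypothetical "posterior" draws; in practice these would come from MCMC.
S, J, n_per = 200, 20, 5
mu_draws = rng.normal(0.0, 0.1, S)
tau_draws = np.abs(rng.normal(1.0, 0.05, S))
sigma_draws = np.abs(rng.normal(1.0, 0.05, S))
theta_draws = rng.normal(mu_draws[:, None], tau_draws[:, None], (S, J))

# Simulated data: J clusters with n_per units each.
theta_true = rng.normal(0.0, 1.0, J)
y = rng.normal(theta_true[:, None], 1.0, (J, n_per))

def norm_logpdf(x, m, s):
    return -0.5 * np.log(2 * np.pi * s**2) - (x - m) ** 2 / (2 * s**2)

# Conditional pointwise log-likelihood: one entry per *unit* (J * n_per),
# conditioning on the sampled latent theta_j.
ll_cond = norm_logpdf(y[None, :, :], theta_draws[:, :, None],
                      sigma_draws[:, None, None]).reshape(S, -1)

# Marginal pointwise log-likelihood: one entry per *cluster*, with theta_j
# integrated out via fresh Monte Carlo draws from p(theta | mu, tau).
M = 500
ll_marg = np.empty((S, J))
for s in range(S):
    th = rng.normal(mu_draws[s], tau_draws[s], M)               # (M,)
    lp = norm_logpdf(y[None, :, :], th[:, None, None],
                     sigma_draws[s]).sum(axis=2)                # (M, J)
    ll_marg[s] = np.log(np.exp(lp).mean(axis=0))

def waic(ll):
    """WAIC from a (draws x points) log-likelihood matrix."""
    lppd = np.log(np.exp(ll).mean(axis=0)).sum()
    p_waic = ll.var(axis=0, ddof=1).sum()
    return -2 * (lppd - p_waic)

print("conditional WAIC:", waic(ll_cond))
print("marginal WAIC:", waic(ll_marg))
```

Note that the two criteria are defined over different prediction units (J × n_per observations versus J clusters), which is exactly why they answer different questions: predictive accuracy for new units within the sampled clusters versus for entirely new clusters.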

Type
Original Paper
Copyright
Copyright © 2019 The Psychometric Society

Footnotes

Electronic supplementary material The online version of this article (https://doi.org/10.1007/s11336-019-09679-0) contains supplementary material, which is available to authorized users.

The R code and the real MGRM item parameters used in this paper are available online.

References

Celeux, G., Forbes, F., Robert, C. P., & Titterington, D. M. (2006). Deviance information criteria for missing data models. Bayesian Analysis, 1(4), 651–673.
daSilva, M. A., Bazán, J. L., & Huggins-Manley, A. C. (2019). Sensitivity analysis and choosing between alternative polytomous IRT models using Bayesian model comparison criteria. Communications in Statistics - Simulation and Computation, 48(2), 601–620.
De Boeck, P. (2008). Random item IRT models. Psychometrika, 73, 533–559.
Denwood, M. J. (2016). runjags: An R package providing interface utilities, model templates, parallel computing methods and additional distributions for MCMC models in JAGS. Journal of Statistical Software, 71(9), 1–25. https://doi.org/10.18637/jss.v071.i09
Efron, B. (1986). How biased is the apparent error rate of a prediction rule? Journal of the American Statistical Association, 81, 461–470.
Fox, J. P. (2010). Bayesian item response modeling: Theory and applications. New York, NY: Springer.
Furr, D. C. (2017). Bayesian and frequentist cross-validation methods for explanatory item response models (Unpublished doctoral dissertation). University of California, Berkeley, CA.
Gelfand, A. E., Sahu, S. K., & Carlin, B. P. (1995). Efficient parametrisations for normal linear mixed models. Biometrika, 82, 479–488.
Gelman, A., Carlin, J. B., Stern, H. S., Dunson, D. B., Vehtari, A., & Rubin, D. B. (2013). Bayesian data analysis (3rd ed.). New York, NY: Chapman & Hall/CRC.
Gelman, A., Hwang, J., & Vehtari, A. (2014). Understanding predictive information criteria for Bayesian models. Statistics and Computing, 24, 997–1016.
Gelman, A., Jakulin, A., Pittau, M. G., & Su, Y. S. (2008). A weakly informative default prior distribution for logistic and other regression models. The Annals of Applied Statistics, 2, 1360–1383.
Gelman, A., Meng, X. L., & Stern, H. (1996). Posterior predictive assessment of model fitness via realized discrepancies. Statistica Sinica, 6, 733–807.
Gelman, A., & Rubin, D. B. (1992). Inference from iterative simulation using multiple sequences (with discussion). Statistical Science, 7, 457–511.
Gronau, Q. F., & Wagenmakers, E. J. (2018). Limitations of Bayesian leave-one-out cross-validation for model selection. Computational Brain & Behavior, 2(1), 1–11.
Hoeting, J. A., Madigan, D., Raftery, A. E., & Volinsky, C. T. (1999). Bayesian model averaging: A tutorial. Statistical Science, 14, 382–417.
Kang, T., Cohen, A. S., & Sung, H. J. (2009). Model selection indices for polytomous items. Applied Psychological Measurement, 35, 499–518.
Kaplan, D. (2014). Bayesian statistics for the social sciences. New York, NY: The Guilford Press.
Lancaster, T. (2000). The incidental parameter problem since 1948. Journal of Econometrics, 95, 391–413.
Levy, R., & Mislevy, R. J. (2016). Bayesian psychometric modeling. Boca Raton, FL: Chapman & Hall.
Li, F., Cohen, A. S., Kim, S. H., & Cho, S. J. (2009). Model selection methods for mixture dichotomous IRT models. Applied Psychological Measurement, 33, 353–373.
Li, L., Qiu, S., & Feng, C. X. (2016). Approximating cross-validatory predictive evaluation in Bayesian latent variable models with integrated IS and WAIC. Statistics and Computing, 26, 881–897.
Lu, Z. H., Chow, S. M., & Loken, E. (2017). A comparison of Bayesian and frequentist model selection methods for factor analysis models. Psychological Methods, 22(2), 361–381.
Lunn, D., Jackson, C., Best, N., Thomas, A., & Spiegelhalter, D. (2012). The BUGS book: A practical introduction to Bayesian analysis. New York, NY: Chapman & Hall/CRC.
Lunn, D., Thomas, A., Best, N., & Spiegelhalter, D. (2000). WinBUGS—a Bayesian modelling framework: Concepts, structure, and extensibility. Statistics and Computing, 10, 325–337.
Luo, Y., & Al-Harbi, K. (2017). Performances of LOO and WAIC as IRT model selection methods. Psychological Test and Assessment Modeling, 59, 183–205.
Marshall, E. C., & Spiegelhalter, D. J. (2007). Identifying outliers in Bayesian hierarchical models: A simulation-based approach. Bayesian Analysis, 2(2), 409–444.
McElreath, R. (2015). Statistical rethinking: A Bayesian course with examples in R and Stan. New York, NY: Chapman & Hall/CRC.
Merkle, E. C., & Rosseel, Y. (2018). blavaan: Bayesian structural equation models via parameter expansion. Journal of Statistical Software, 85(4), 1–30.
Millar, R. B. (2009). Comparison of hierarchical Bayesian models for overdispersed count data using DIC and Bayes' factors. Biometrics, 65, 962–969.
Millar, R. B. (2018). Conditional vs. marginal estimation of predictive loss of hierarchical models using WAIC and cross-validation. Statistics and Computing, 28, 375–385.
Mislevy, R. J. (1986). Bayes modal estimation in item response models. Psychometrika, 51, 177–195.
Muthén, B., & Asparouhov, T. (2012). Bayesian structural equation modeling: A more flexible representation of substantive theory. Psychological Methods, 17, 313–335.
Navarro, D. (2018). Between the devil and the deep blue sea: Tensions between scientific judgement and statistical model selection. Computational Brain & Behavior, 2(1), 28–34.
Naylor, J. C., & Smith, A. F. (1982). Applications of a method for the efficient computation of posterior distributions. Journal of the Royal Statistical Society C (Applied Statistics), 31, 214–225.
Neyman, J., & Scott, E. L. (1948). Consistent estimates based on partially consistent observations. Econometrica, 16, 1–32.
O'Hagan, A. (1976). On posterior joint and marginal modes. Biometrika, 63, 329–333.
Piironen, J., & Vehtari, A. (2017). Comparison of Bayesian predictive methods for model selection. Statistics and Computing, 27, 711–735.
Pinheiro, J. C., & Bates, D. M. (1995). Approximations to the log-likelihood function in the nonlinear mixed-effects model. Journal of Computational and Graphical Statistics, 4, 12–35.
Plummer, M. (2003). JAGS: A program for analysis of Bayesian graphical models using Gibbs sampling. In K. Hornik, F. Leisch, & A. Zeileis (Eds.), Proceedings of the 3rd international workshop on distributed statistical computing.
Plummer, M. (2008). Penalized loss functions for Bayesian model comparison. Biostatistics, 9(3), 523–539.
Rabe-Hesketh, S., Skrondal, A., & Pickles, A. (2005). Maximum likelihood estimation of limited and discrete dependent variable models with nested random effects. Journal of Econometrics, 128(2), 301–323.
Raftery, A. E., & Lewis, S. M. (1995). The number of iterations, convergence diagnostics, and generic Metropolis algorithms. London: Chapman and Hall.
Raudenbush, S. W., & Bryk, A. S. (2002). Hierarchical linear models: Applications and data analysis methods (2nd ed.). Thousand Oaks, CA: Sage.
Rosseel, Y. (2012). lavaan: An R package for structural equation modeling. Journal of Statistical Software, 48(2), 1–36.
Song, X. Y., & Lee, S. Y. (2012). Basic and advanced Bayesian structural equation modeling: With applications in the medical and behavioral sciences. Chichester, UK: Wiley.
Spiegelhalter, D. J., Best, N. G., Carlin, B. P., & van der Linde, A. (2002). Bayesian measures of model complexity and fit. Journal of the Royal Statistical Society: Series B, 64, 583–639.
Spielberger, C. (1988). State-trait anger expression inventory research edition [Computer software manual]. Odessa, FL.
Stan Development Team. (2014). Stan modeling language users guide and reference manual, version 2.5.0 [Computer software manual]. http://mc-stan.org/.
Trevisani, M., & Gelfand, A. E. (2003). Inequalities between expected marginal log-likelihoods, with implications for likelihood-based model complexity and comparison measures. The Canadian Journal of Statistics, 31, 239–250.
Vansteelandt, K. (2000). Formal models for contextualized personality psychology (Unpublished doctoral dissertation). University of Leuven, Leuven, Belgium.
Vehtari, A., Gelman, A., & Gabry, J. (2016). loo: Efficient leave-one-out cross-validation and WAIC for Bayesian models. R package version 0.1.6. https://github.com/stan-dev/loo.
Vehtari, A., Gelman, A., & Gabry, J. (2017). Practical Bayesian model evaluation using leave-one-out cross-validation and WAIC. Statistics and Computing, 27, 1413–1432.
Vehtari, A., Mononen, T., Tolvanen, V., Sivula, T., & Winther, O. (2016). Bayesian leave-one-out cross-validation approximations for Gaussian latent variable models. Journal of Machine Learning Research, 17, 1–38.
Vehtari, A., Simpson, D. P., Yao, Y., & Gelman, A. (2018). Limitations of "Limitations of Bayesian leave-one-out cross-validation for model selection". Computational Brain & Behavior, 2(1), 22–27.
Watanabe, S. (2010). Asymptotic equivalence of Bayes cross validation and widely applicable information criterion in singular learning theory. Journal of Machine Learning Research, 11, 3571–3594.
White, I. R. (2010). simsum: Analyses of simulation studies including Monte Carlo error. The Stata Journal, 10, 369–385.
Wicherts, J. M., Dolan, C. V., & Hessen, D. J. (2005). Stereotype threat and group differences in test performance: A question of measurement invariance. Journal of Personality and Social Psychology, 89(5), 696–716.
Yao, Y., Vehtari, A., Simpson, D., & Gelman, A. (2018). Using stacking to average Bayesian predictive distributions (with discussion). Bayesian Analysis, 13, 917–1007. https://doi.org/10.1214/17-BA1091
Zhang, X., Tao, J., Wang, C., & Shi, N. Z. (2019). Bayesian model selection methods for multilevel IRT models: A comparison of five DIC-based indices. Journal of Educational Measurement, 56, 3–27.
Zhao, Z., & Severini, T. A. (2017). Integrated likelihood computation methods. Computational Statistics, 32, 281–313.
Zhu, X., & Stone, C. A. (2012). Bayesian comparison of alternative graded response models for performance assessment applications. Educational and Psychological Measurement, 72(5), 774–799.
Supplementary material

Merkle et al. supplementary material (File, 7.4 KB)