A Markov Chain Monte Carlo Approach to Confirmatory Item Factor Analysis

Published online by Cambridge University Press: 01 January 2025

Michael C. Edwards*
Affiliation:
The Ohio State University
*Requests for reprints should be sent to Michael C. Edwards, 1827 Neil Avenue, Columbus, OH 43210, USA.

Abstract

Item factor analysis has a rich tradition in both the structural equation modeling and item response theory frameworks. The goal of this paper is to demonstrate a novel combination of various Markov chain Monte Carlo (MCMC) estimation routines to estimate parameters of a wide variety of confirmatory item factor analysis models. Further, I show that these methods can be implemented in a flexible way that requires minimal technical sophistication on the part of the end user. After an overview of item factor analysis and MCMC, results from several examples (simulated and real) are discussed. The bulk of these examples focus on models that are problematic for current “gold-standard” estimators. The results demonstrate that it is possible to obtain accurate parameter estimates using MCMC in a relatively user-friendly package.

Type
Original Paper
Copyright
Copyright © 2010 The Psychometric Society

Footnotes

I would like to thank Li Cai, David Thissen, and R.J. Wirth for comments on earlier versions of this draft. I would like to thank Roger Millsap and the reviewers for their guidance on revisions. The resulting paper is better for all of your efforts. Any remaining faults are my own.
