Additive Multilevel Item Structure Models with Random Residuals: Item Modeling for Explanation and Item Generation

Published online by Cambridge University Press: 01 January 2025

Sun-Joo Cho*
Affiliation:
Vanderbilt University
Paul De Boeck
Affiliation:
Ohio State University and KU Leuven
Susan Embretson
Affiliation:
Georgia Institute of Technology
Sophia Rabe-Hesketh
Affiliation:
University of California, Berkeley and Institute of Education, University of London
* Requests for reprints should be sent to Sun-Joo Cho, Vanderbilt University, Nashville, USA.

Abstract

An additive multilevel item structure (AMIS) model with random residuals is proposed. The model includes multilevel latent regressions of the item discrimination and item difficulty parameters on covariates at both the item level and the item category level, with random residuals at both levels. The AMIS model is useful both for explanation and for prediction, as in an item generation context. The parameters can be estimated with an alternating imputation posterior algorithm that makes use of adaptive quadrature, and the performance of this algorithm is evaluated in a simulation study.
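As a rough illustration of the structure the abstract describes, the sketch below generates one item parameter (say, difficulty) from covariates at the item category level and at the item level, adding a random residual at each level. It is not the authors' specification: the function names, the single scalar covariate per level, the additive form, and the two-parameter logistic response function are all assumptions made for illustration, and the sketch does not constrain discriminations to be positive.

```python
import math
import random


def simulate_item_parameters(category_covs, item_covs, gamma, beta,
                             sd_category, sd_item, rng):
    """Hypothetical two-level regression for one item parameter.

    category_covs: one covariate value per item category
    item_covs:     item_covs[c] lists the covariate values of the items in category c
    gamma, beta:   assumed regression weights at category and item level
    sd_category, sd_item: standard deviations of the random residuals
    """
    params = []
    for c, z in enumerate(category_covs):
        # One residual shared by every item in this category.
        category_resid = rng.gauss(0.0, sd_category)
        for x in item_covs[c]:
            # One residual specific to this item.
            item_resid = rng.gauss(0.0, sd_item)
            params.append(gamma * z + category_resid + beta * x + item_resid)
    return params


def prob_correct(theta, discrimination, difficulty):
    """Assumed two-parameter logistic response function."""
    return 1.0 / (1.0 + math.exp(-discrimination * (theta - difficulty)))


if __name__ == "__main__":
    rng = random.Random(2013)
    # Two categories with covariates 1.0 and 2.0; three items in total.
    difficulties = simulate_item_parameters(
        [1.0, 2.0], [[0.0, 1.0], [1.0]],
        gamma=0.5, beta=0.2, sd_category=0.3, sd_item=0.1, rng=rng)
    for b in difficulties:
        print(round(b, 3), round(prob_correct(0.0, 1.0, b), 3))
```

In an item generation setting, the same regression could in principle predict the parameters of a new item from its covariates alone, with the residual standard deviations indicating how uncertain that prediction is.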

Type
Original Paper
Copyright
Copyright © 2013 The Psychometric Society
