
Modeling Rule-Based Item Generation

Published online by Cambridge University Press: 01 January 2025

Hanneke Geerlings*, University of Twente
Cees A. W. Glas, University of Twente
Wim J. van der Linden, CTB/McGraw-Hill

* Requests for reprints should be sent to Hanneke Geerlings, Department of Research Methodology, Measurement, and Data Analysis, University of Twente, P.O. Box 217, 7500 AE Enschede, The Netherlands. E-mail: [email protected]

Abstract

This paper discusses an application of a hierarchical IRT model for items in families, where the families are generated by applying different combinations of design rules. Within a family, the items are assumed to differ only in surface features. The model parameters are estimated in a Bayesian framework using a data-augmented Gibbs sampler. An obvious application of the model is computerized algorithmic item generation, which has the potential to increase both the cost-effectiveness of item generation and the flexibility of item administration. The model is applied to data from a non-verbal intelligence test created using design rules. In addition, results from a simulation study evaluating parameter recovery are presented.
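To make the estimation idea concrete, here is a minimal Python sketch of a data-augmented Gibbs sampler for a deliberately simplified version of such a model. It assumes a one-parameter normal ogive, P(X = 1) = Φ(θ − b), with abilities θ ~ N(0, 1), item difficulties that vary within a family around a family mean, and family means that are exactly linear in 0/1 design-rule indicators (matrix `Q`, weights `beta`, within-family variance `tau2`). These modeling choices, the priors (flat on `beta`, inverse-gamma on `tau2`), and every function name (`draw_truncated`, `gibbs`, `simulate`) are illustrative assumptions, not the authors' specification or code.

```python
"""Sketch of a data-augmented Gibbs sampler for a simplified hierarchical
normal-ogive model with rule-based item families (illustrative only).

Assumed model, a simplification (not the paper's exact specification):
    P(X[p][i] = 1) = Phi(theta[p] - b[i])          one-parameter normal ogive
    theta[p] ~ N(0, 1)                            fixed for identification
    b[i] ~ N(sum_k Q[f(i)][k] * beta[k], tau2)    items vary within family f(i)
Q[f] holds an intercept plus hypothetical 0/1 design-rule indicators of family f.
"""
import random
from statistics import NormalDist

STD = NormalDist()


def draw_truncated(mean, positive, rng):
    """Inverse-CDF draw from N(mean, 1) truncated to (0, inf) or (-inf, 0)."""
    cut = STD.cdf(-mean)  # P(Z < 0) when Z ~ N(mean, 1)
    u = rng.uniform(cut, 1.0) if positive else rng.uniform(0.0, cut)
    z = mean + STD.inv_cdf(min(max(u, 1e-12), 1.0 - 1e-12))
    # Guard against round-off in extreme tails so the sign matches X.
    return max(z, 1e-9) if positive else min(z, -1e-9)


def gibbs(X, family, Q, n_iter=400, a0=1.0, b0=1.0, seed=1):
    """Return posterior means of b, beta and tau2 (second half of the chain)."""
    rng = random.Random(seed)
    P, I, K = len(X), len(X[0]), len(Q[0])
    q = [Q[family[i]] for i in range(I)]  # rule indicators of each item's family
    theta, b, beta, tau2 = [0.0] * P, [0.0] * I, [0.0] * K, 1.0
    sum_b, sum_beta, sum_tau2, n_kept = [0.0] * I, [0.0] * K, 0.0, 0

    def family_mean(i):
        return sum(q[i][k] * beta[k] for k in range(K))

    for it in range(n_iter):
        # 1. Latent responses Z ~ N(theta - b, 1), truncated by the observed X.
        Z = [[draw_truncated(theta[p] - b[i], X[p][i] == 1, rng)
              for i in range(I)] for p in range(P)]
        # 2. theta[p]: Z + b = theta + e, combined with the N(0, 1) prior.
        for p in range(P):
            s = sum(Z[p][i] + b[i] for i in range(I))
            theta[p] = rng.gauss(s / (I + 1), (1.0 / (I + 1)) ** 0.5)
        # 3. b[i]: theta - Z = b + e, combined with the N(family mean, tau2) prior.
        for i in range(I):
            prec = P + 1.0 / tau2
            s = sum(theta[p] - Z[p][i] for p in range(P)) + family_mean(i) / tau2
            b[i] = rng.gauss(s / prec, (1.0 / prec) ** 0.5)
        # 4. beta[k]: flat prior, updated one coordinate at a time.
        for k in range(K):
            ssq = sum(q[i][k] ** 2 for i in range(I))
            if ssq == 0:
                continue  # rule k never used, so beta[k] is not identified
            r = sum(q[i][k] * (b[i] - family_mean(i) + q[i][k] * beta[k])
                    for i in range(I))
            beta[k] = rng.gauss(r / ssq, (tau2 / ssq) ** 0.5)
        # 5. tau2: conjugate inverse-gamma(a0, b0) update (gammavariate takes a scale).
        ss = sum((b[i] - family_mean(i)) ** 2 for i in range(I))
        tau2 = 1.0 / rng.gammavariate(a0 + I / 2, 1.0 / (b0 + ss / 2))
        if it >= n_iter // 2:  # discard the first half as burn-in
            sum_b = [x + y for x, y in zip(sum_b, b)]
            sum_beta = [x + y for x, y in zip(sum_beta, beta)]
            sum_tau2 += tau2
            n_kept += 1
    return {"b": [x / n_kept for x in sum_b],
            "beta": [x / n_kept for x in sum_beta],
            "tau2": sum_tau2 / n_kept}


def simulate(Q, true_beta, items_per_family, n_persons, tau=0.3, seed=7):
    """Generate 0/1 data from the assumed model, for a parameter-recovery check."""
    rng = random.Random(seed)
    family = [f for f in range(len(Q)) for _ in range(items_per_family)]
    b = [sum(x * y for x, y in zip(Q[f], true_beta)) + rng.gauss(0.0, tau)
         for f in family]
    thetas = [rng.gauss(0.0, 1.0) for _ in range(n_persons)]
    X = [[int(rng.random() < STD.cdf(t - bi)) for bi in b] for t in thetas]
    return X, family, b
```

A small recovery check in the spirit of the abstract's simulation study could use a hypothetical four-family design `Q = [[1, 0, 0], [1, 1, 0], [1, 0, 1], [1, 1, 1]]` (an intercept plus two rules), run `gibbs(*simulate(Q, [-0.5, 1.0, 0.5], 3, 150)[:2], Q)`, and compare the posterior means of `beta` with the generating values. The appeal of data augmentation shows in the structure: once the latent Z is drawn, every remaining update is a conjugate normal or inverse-gamma draw from a linear-normal regression.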

Type: Original Paper
Copyright © 2011 The Psychometric Society

