
A New Online Calibration Method for Multidimensional Computerized Adaptive Testing

Published online by Cambridge University Press:  01 January 2025

Ping Chen*
Affiliation:
Beijing Normal University
Chun Wang
Affiliation:
University of Minnesota
* Correspondence should be made to Ping Chen, National Innovation Center for Assessment of Basic Education Quality, Beijing Normal University, No. 19, Xin Jie Kou Wai Street, Hai Dian District, Beijing 100875, China. Email: [email protected]

Abstract

Multidimensional-Method A (M-Method A) has been proposed as an efficient and effective online calibration method for multidimensional computerized adaptive testing (MCAT) (Chen & Xin, Paper presented at the 78th Meeting of the Psychometric Society, Arnhem, The Netherlands, 2013). However, M-Method A treats person parameter estimates as their true values, so it may yield erroneous item calibration when those estimates contain non-ignorable measurement error. To improve the performance of M-Method A, this paper proposes a new MCAT online calibration method, the full functional MLE-M-Method A (FFMLE-M-Method A). The new method combines the full functional MLE (Jones & Jin in Psychometrika 59:59–75, 1994; Stefanski & Carroll in Annals of Statistics 13:1335–1351, 1985) with the original M-Method A to correct for estimation error in the ability vector that might otherwise degrade the precision of item calibration. Two correction schemes are also proposed for implementing the new method. A simulation study showed that the new method yielded more accurate item parameter estimates than the original M-Method A in almost all conditions.
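The concern raised in the abstract, that plugging in error-prone ability estimates as if they were true values can distort item calibration, can be illustrated with a small simulation. The sketch below is not the authors' procedure and does not implement FFMLE-M-Method A. It assumes a two-dimensional compensatory logistic model, P(u = 1) = logistic(a · θ + d), and classical additive noise on the ability vector; both are assumptions made here for illustration, as are all function names and numeric settings. It fits the item parameters once with the true abilities and once with the noisy ones.

```python
import math
import random

def prob(a, d, theta):
    """Response probability under the assumed model: logistic(a . theta + d)."""
    z = sum(ak * tk for ak, tk in zip(a, theta)) + d
    return 1.0 / (1.0 + math.exp(-z))

def solve(A, b):
    """Solve A x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    M = [row[:] + [bi] for row, bi in zip(A, b)]
    for c in range(n):
        p = max(range(c, n), key=lambda r: abs(M[r][c]))
        M[c], M[p] = M[p], M[c]
        for r in range(c + 1, n):
            f = M[r][c] / M[c][c]
            for k in range(c, n + 1):
                M[r][k] -= f * M[c][k]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        x[r] = (M[r][n] - sum(M[r][k] * x[k] for k in range(r + 1, n))) / M[r][r]
    return x

def calibrate(thetas, responses, iters=25):
    """Newton-Raphson MLE of (a_1, ..., a_K, d), treating thetas as known."""
    k = len(thetas[0])
    beta = [1.0] * k + [0.0]  # hypothetical starting values
    for _ in range(iters):
        grad = [0.0] * (k + 1)
        hess = [[0.0] * (k + 1) for _ in range(k + 1)]
        for th, u in zip(thetas, responses):
            x = list(th) + [1.0]          # last column carries the intercept d
            p = prob(beta[:k], beta[k], th)
            w = p * (1.0 - p)
            for i in range(k + 1):
                grad[i] += (u - p) * x[i]
                for j in range(k + 1):
                    hess[i][j] += w * x[i] * x[j]
        step = solve(hess, grad)
        beta = [b + s for b, s in zip(beta, step)]
    return beta

# Illustrative settings only; none of these values come from the paper.
random.seed(1)
a_true, d_true = [1.2, 0.8], -0.3
n = 3000
true_thetas = [[random.gauss(0, 1), random.gauss(0, 1)] for _ in range(n)]
responses = [1 if random.random() < prob(a_true, d_true, th) else 0
             for th in true_thetas]
# Assumed classical measurement error: estimate = truth + independent noise.
noisy_thetas = [[t + random.gauss(0, 0.5) for t in th] for th in true_thetas]

est_true = calibrate(true_thetas, responses)
est_noisy = calibrate(noisy_thetas, responses)
print("fit with true abilities: ", [round(v, 3) for v in est_true])
print("fit with noisy abilities:", [round(v, 3) for v in est_noisy])
```

Under these assumptions the discrimination estimates obtained from the noisy abilities tend to be pulled toward zero relative to those obtained from the true abilities, which is the kind of distortion a measurement-error correction aims to reduce. Ability estimation error in an actual adaptive test need not follow this simple additive form.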

Type
Original Paper
Copyright
Copyright © 2015 The Psychometric Society


Footnotes

Both authors made equal contributions to the paper, and the order of authorship is alphabetical.

References

Adams, R., Wilson, M., & Wang, W.-C. (1997). The multidimensional random coefficients multinomial logit model. Applied Psychological Measurement, 21, 1–23.
Baker, F. B., & Kim, S. H. (2004). Item response theory: Parameter estimation techniques (2nd edn). New York: Dekker.
Ban, J.-C., Hanson, B. H., Wang, T. Y., Yi, Q., & Harris, D. J. (2001). A comparative study of on-line pretest item-calibration/scaling methods in computerized adaptive testing. Journal of Educational Measurement, 38, 191–212.
Ban, J.-C., Hanson, B. H., Yi, Q., & Harris, D. J. (2002). Data sparseness and online pretest item calibration/scaling methods in CAT (ACT Research Report 02-01). Iowa City, IA: ACT, Inc. Available at http://www.eric.ed.gov/ERICDocs/data/ericdocs2sql/content_storage_01/0000019b/80/19/da/e9.pdf
Birnbaum, A. (1968). Some latent trait models and their use in inferring an examinee’s ability. In F. M. Lord & M. R. Novick (Eds.), Statistical theories of mental test scores (pp. 379–479). Reading, MA: Addison-Wesley.
Carroll, R. J., Ruppert, D., Stefanski, L. A., & Crainiceanu, C. M. (2006). Measurement error in nonlinear models: A modern perspective (2nd edn). London: Chapman and Hall.
Chang, H. H., & Stout, W. (1993). The asymptotic posterior normality of the latent trait in an IRT model. Psychometrika, 58, 37–52.
Chen, P., Xin, T., Wang, C., & Chang, H. H. (2012). Online calibration methods for the DINA model with independent attributes in CD-CAT. Psychometrika, 77, 201–222.
Chen, P., & Xin, T. (2013). Developing online calibration methods for multidimensional computerized adaptive testing. Paper presented at the 78th Meeting of the Psychometric Society, Arnhem, the Netherlands, July.
Cheng, Y., & Yuan, K. (2010). The impact of fallible item parameter estimates on latent trait recovery. Psychometrika, 75, 280–291.
Debeer, D., Buchholz, J., Hartig, J., & Janssen, R. (2014). Student, school, and country differences in sustained test-taking effort in the 2009 PISA reading assessment. Journal of Educational and Behavioral Statistics, 39, 502–523.
Debeer, D., & Janssen, R. (2013). Modeling item-position effects within an IRT framework. Journal of Educational Measurement, 50, 164–185.
Eggen, T. J. H. M., & Verhelst, N. D. (2011). Item calibration in incomplete testing designs. Psicologica, 32, 107–132.
Folk, V. G., & Golub-Smith, M. (1996). Calibration of on-line pretest data using BILOG. Paper presented at the annual meeting of the National Council on Measurement in Education, New York, April.
Haberman, S. J., & von Davier, A. A. (2014). Considerations on parameter estimation, scoring, and linking in multistage testing. In D. L. Yan, A. A. von Davier, & C. Lewis (Eds.), Computerized multistage testing: Theory and applications (pp. 229–248). Boca Raton, FL: CRC Press.
Haberman, S. J. (2009). Linking parameter estimates derived from an item response model through separate calibrations (Research Report RR-09-40). Princeton, NJ: Educational Testing Service.
Hartig, J., & Höhler, J. (2008). Representation of competencies in multidimensional IRT models with within-item and between-item multidimensionality. Journal of Psychology, 216, 89–101.
Hecht, M., Weirich, S., Siegle, T., & Frey, A. (2015). Modeling booklet effects for nonequivalent group designs in large-scale assessment. Educational and Psychological Measurement.
Hsu, Y., Thompson, T. D., & Chen, W. (1998). CAT item calibration. Paper presented at the annual meeting of the National Council on Measurement in Education, San Diego, CA, April.
Jones, D. H., & Jin, Z. Y. (1994). Optimal sequential designs for on-line item estimation. Psychometrika, 59, 59–75.
Lehmann, E. L., & Casella, G. C. (1998). Theory of point estimation (2nd edn). New York: Springer.
Lien, D.-H. D. (1985). Moments of truncated bivariate log-normal distributions. Economics Letters, 19, 243–247.
Lord, F. M. (1971). Tailored testing, an application of stochastic approximation. Journal of the American Statistical Association, 66, 707–711.
Mislevy, R. J. (1986). Bayes modal estimation in item response models. Psychometrika, 51, 177–195.
Mislevy, R. J., & Chang, H. (2000). Does adaptive testing violate local independence? Psychometrika, 65, 149–156.
Mulder, J., & van der Linden, W. J. (2009). Multidimensional adaptive testing with optimal design criteria for item selection. Psychometrika, 74, 273–296.
Newman, M. E. J., & Barkema, G. T. (1999). Monte Carlo methods in statistical physics. Oxford: Clarendon Press.
Parshall, C. G. (1998). Item development and pretesting in a computer-based testing environment. Paper presented at the colloquium Computer-Based Testing: Building the Foundation for Future Assessments, Philadelphia, PA, September.
Press, W. H., Teukolsky, S. A., Vetterling, W. T., & Flannery, B. P. (2007). Numerical recipes: The art of scientific computing (3rd edn). New York: Cambridge University Press.
Reckase, M. D. (2009). Multidimensional item response theory. New York: Springer.
Segall, D. O. (1996). Multidimensional adaptive testing. Psychometrika, 61, 331–354.
Segall, D. O. (2001). General ability measurement: An application of multidimensional item response theory. Psychometrika, 66, 79–97.
Segall, D. O. (2003). Calibrating CAT pools and online pretest items using MCMC methods. Paper presented at the annual meeting of the National Council on Measurement in Education, Chicago, IL, April.
Stefanski, L. A., & Carroll, R. J. (1985). Covariate measurement error in logistic regression. Annals of Statistics, 13, 1335–1351.
Stocking, M. L. (1988). Scale drift in on-line calibration (Research Rep. 88-28). Princeton, NJ: ETS.
van der Linden, W. J., & Ren, H. (2014). Optimal Bayesian adaptive design for test-item calibration. Psychometrika. doi:10.1007/s11336-013-9391-8.
Wainer, H., & Mislevy, R. J. (1990). Item response theory, item calibration, and proficiency estimation. In H. Wainer (Ed.), Computerized adaptive testing: A primer (pp. 65–102). Hillsdale, NJ: Erlbaum.
Wang, C. (2014a). On latent trait estimation in multidimensional compensatory item response models. Psychometrika. doi:10.1007/s11336-013-9399-0.
Wang, C. (2014b). Improving measurement precision of hierarchical latent traits using adaptive testing. Journal of Educational and Behavioral Statistics, 39, 452–477.
Wang, C., & Chang, H. H. (2011). Item selection in multidimensional computerized adaptive testing: Gaining information from different angles. Psychometrika, 76, 363–384.
Wang, C., & Chang, H. H. (2012). Reducing bias in MIRT trait estimation. Paper presented at the annual meeting of the National Council on Measurement in Education, Vancouver, Canada, April.
Wang, C., Chang, H. H., & Boughton, K. A. (2011). Kullback-Leibler information and its applications in multi-dimensional adaptive testing. Psychometrika, 76, 13–39.
Wang, C., Chang, H. H., & Boughton, K. A. (2013). Deriving stopping rules for multidimensional computerized adaptive testing. Applied Psychological Measurement, 37, 99–122.
Yao, L. H. (2013). Comparing the performance of five multidimensional CAT selection procedures with different stopping rules. Applied Psychological Measurement, 37, 3–23.
Yao, L. H., Pommerich, M., & Segall, D. O. (2014). Using multidimensional CAT to administer a short, yet precise, screening test. Applied Psychological Measurement, 38, 614–631.