Variable-Length Stopping Rules for Multidimensional Computerized Adaptive Testing

Chun Wang; David J. Weiss; Zhuoran Shang

doi:10.1007/s11336-018-9644-7

Published online by Cambridge University Press: 01 January 2025

Chun Wang*: University of Washington
David J. Weiss: University of Minnesota
Zhuoran Shang: University of Minnesota
*: Correspondence should be made to Chun Wang, Measurement and Statistics, College of Education, University of Washington, 312E Miller Hall, Box 353600, Seattle, WA 98195-3600, USA. Email: [email protected]

Abstract

In computerized adaptive testing (CAT), a variable-length stopping rule ends item administration once a pre-specified measurement precision standard has been satisfied. The goal is to provide equal measurement precision for all examinees regardless of their true latent trait level. Several stopping rules have been proposed for unidimensional CAT, such as the minimum information rule and the maximum standard error rule. These rules have also been extended to multidimensional CAT and cognitive diagnostic CAT, and they all share the same idea of monitoring measurement error. Recently, Babcock and Weiss (J Comput Adapt Test 2012. https://doi.org/10.7333/1212-0101001) proposed an “absolute change in theta” (CT) rule, which is useful when an item bank has run out of good items for one or more ranges of the trait continuum. Choi, Grady and Dodd (Educ Psychol Meas 70:1–17, 2010) also argued that a CAT should stop when the standard error no longer changes, which implies that the item bank is likely exhausted. Although these stopping rules have been evaluated and compared in different simulation studies, the relationships among them remain unclear, and there is therefore no clear guideline on when to use which rule. This paper presents analytic results that show the connections among the stopping rules in both unidimensional and multidimensional CAT. In particular, it is argued that the CT-rule alone can be unstable and can end the test prematurely. However, the CT-rule can be a useful secondary rule for monitoring the point of diminishing returns. To provide further empirical evidence, three simulation studies are reported using both the 2PL model and the multidimensional graded response model.
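To make the rules named in the abstract concrete, the following is a minimal Python sketch for the unidimensional 2PL case. It is not the authors' R code. The function names, the thresholds (`se_max`, `ct_min`), and the `min_items`/`max_items` guards are all hypothetical choices made for illustration. The sketch relies on the standard result that the asymptotic standard error of the maximum likelihood estimate is the reciprocal square root of the test information, which is why a minimum information rule and a maximum standard error rule monitor the same quantity.

```python
import math


def item_information_2pl(theta, a, b):
    """Fisher information of one 2PL item (discrimination a, difficulty b) at theta."""
    p = 1.0 / (1.0 + math.exp(-a * (theta - b)))
    return a * a * p * (1.0 - p)


def se_from_items(theta, items):
    """Asymptotic SE of the ML estimate: 1 / sqrt(sum of item information)."""
    info = sum(item_information_2pl(theta, a, b) for a, b in items)
    return 1.0 / math.sqrt(info)


def should_stop(theta_history, se_history, se_max=0.30, ct_min=0.02,
                min_items=5, max_items=40):
    """Decide whether to stop after the latest item; returns (stop, reason).

    All thresholds are illustrative assumptions, not values from the paper.
    Primary rule: stop when the current SE is at or below se_max.
    Secondary rule: stop when the absolute change in theta (CT) falls below
    ct_min, but only after min_items, so that a small early change does not
    end the test prematurely.
    """
    n = len(theta_history)
    if se_history[-1] <= se_max:
        return True, "se"
    if n >= max_items:
        return True, "max_items"
    # The CT rule needs at least two estimates to compare.
    if n >= max(min_items, 2) and abs(theta_history[-1] - theta_history[-2]) < ct_min:
        return True, "ct"
    return False, "continue"
```

For example, `should_stop([0.1, 0.1], [0.9, 0.8])` returns `(False, "continue")` even though the estimate did not move, because the CT check is held back until `min_items` items have been given. Ordering the checks this way treats the CT rule as a secondary signal of diminishing returns, in line with the abstract's argument that it should not be used alone.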

Keywords

computerized adaptive testing; stopping rules; variable-length adaptive testing; standard error; information; multidimensional models

Type: Original Paper
Information: Psychometrika, Volume 84, Issue 3, September 2019, pp. 749–771

DOI: https://doi.org/10.1007/s11336-018-9644-7
Copyright: Copyright © 2018 The Psychometric Society

Footnotes

Electronic supplementary material: The online version of this article (https://doi.org/10.1007/s11336-018-9644-7) contains supplementary material, which is available to authorized users.

The R code and the real MGRM item parameters used in this paper are available online.

References

Anderson, T. W. (1984). An introduction to multivariate statistical analysis (2nd ed.). New York: Wiley.

Babcock, B., & Weiss, D. (2012). Termination criteria in computerized adaptive tests: Do variable-length CATs provide efficient and effective measurement? Journal of Computerized Adaptive Testing. https://doi.org/10.7333/1212-0101001

Boyd, A. M., Dodd, B. G., & Choi, S. W. (2010). Polytomous models in computerized adaptive testing. In M. L. Nering & R. Ostini (Eds.), Handbook of polytomous item response theory models (pp. 229–255). New York, NY: Routledge.

Cai, L. (2015). flexMIRT version 3: Flexible multilevel multidimensional item analysis and test scoring [Computer software]. Chapel Hill, NC: Vector Psychometric Group.

Chang, H. H., & Ying, Z. L. (2008). To weight or not to weight? Balancing influence of initial items in adaptive testing. Psychometrika, 73(3), 441–450.

Cheng, Y., Guo, F., Chang, H., & Douglas, J. (2009). Constraint weighted a-stratification for computerized adaptive testing with nonstatistical constraints: Balancing measurement efficiency and exposure control. Educational and Psychological Measurement, 69, 35–49.

Choi, S. W., Grady, M. W., & Dodd, B. G. (2010). A new stopping rule for computerized adaptive testing. Educational and Psychological Measurement, 70, 1–17.

Daniel, M. H. (1999). Behind the scenes: Using new measurement methods on DAS and KAIT. In S. E. Embretson & S. L. Hershberger (Eds.), The new rules of measurement (pp. 37–63). Mahwah, NJ: Lawrence Erlbaum Associates.

Dodd, B. G., Koch, W. R., & De Ayala, R. J. (1989). Operational characteristics of adaptive testing procedures using the graded response model. Applied Psychological Measurement, 13, 129–143.

Dodd, B. G., Koch, W. R., & De Ayala, R. J. (1993). Computerized adaptive testing using the partial credit model: Effects of item pool characteristics and different stopping rules. Educational and Psychological Measurement, 53, 61–77.

Fayers, P. M. (2007). Applying item response theory and computer adaptive testing: The challenges for health outcomes assessment. Quality of Life Research, 16, 187–194.

Gardner, W., Shear, K., Kelleher, K., Pajer, K., Mammen, O., Buysse, D., et al. (2004). Computerized adaptive measurement of depression: A simulation study. BMC Psychiatry, 4(13), 1–11.

Gershon, R. C. (2017). FastCAT—Customizing CAT administration rules to increase response efficiency. Paper presented at the 6th international conference on computerized adaptive testing, Niigata, Japan.

Gibbons, R. D., Weiss, D. J., Kupfer, D. J., Frank, E., Fagiolini, A., Grochocinski, V. J., et al. (2008). Using computerized adaptive testing to reduce the burden of mental health assessment. Psychiatric Services, 59, 49–58.

Hart, D. L., Cook, K. F., Mioduski, J. E., Teal, C. R., & Crane, P. K. (2006). Simulated computerized adaptive test for patients with shoulder impairments was efficient and produced valid measures of function. Journal of Clinical Epidemiology, 59, 290–298.

Hart, D. L., Mioduski, J. E., & Stratford, P. W. (2005). Simulated computerized adaptive tests for measuring functional status were efficient with good discriminant validity in patients with hip, knee, or foot/ankle impairments. Journal of Clinical Epidemiology, 58, 629–638.

Hsieh, C.-A., von Eye, A. A., & Maier, K. S. (2010). Using a multivariate multilevel polytomous item response theory model to study parallel processes of change: The dynamic association between adolescents’ social isolation and engagement with delinquent peers in the National Youth Survey. Multivariate Behavioral Research, 45(3), 508–552.

Jiang, S., Wang, C., & Weiss, D. J. (2016). Sample size requirements for estimation of item parameters in the multidimensional graded response model. Frontiers in Psychology (Quantitative Psychology and Measurement). https://doi.org/10.3389/fpsyg.2016.00109

Lord, F. M., & Novick, M. R. (1968). Statistical theories of mental test scores. Reading, MA: Addison-Wesley.

Makransky, G., & Glas, C. A. W. (2013). The applicability of multidimensional computerized adaptive testing for cognitive ability measurement in organizational assessment. International Journal of Testing, 13, 123–139.

Maurelli, V., & Weiss, D. J. (1981). Factors influencing the psychometric characteristics of an adaptive testing strategy for test batteries (Research Rep. No. 81-4). Minneapolis: University of Minnesota, Department of Psychology, Psychometric Methods Program, Computerized Adaptive Testing Laboratory. Retrieved from https://eric.ed.gov/?id=ED212676

Michel, P., Baumstarck, K., Ghattas, B., Pelletier, J., Loundou, A., Boucekine, M., et al. (2016). A Multidimensional Computerized Adaptive Short-Form Quality of Life Questionnaire developed and validated for multiple sclerosis: The MusiQoL-MCAT. Medicine, 95(14), Article e3068.

Mulder, J., & van der Linden, W. J. (2009). Multidimensional adaptive testing with optimal design criteria for item selection. Psychometrika, 74(2), 273–296.

Nering, M. L., & Ostini, R. (2010). Handbook of polytomous item response theory models. New York: Taylor and Francis.

Nikolaus, S., Bode, C., Taal, E., Vonkeman, H. E., Glas, C. A. W., & van der Laar, M. A. F. J. (2015). Working mechanism of a multidimensional computerized adaptive test for fatigue in rheumatoid arthritis. Health and Quality of Life Outcomes, 13, 23.

Samejima, F. (1969). Estimation of latent trait ability using a response pattern of graded scores. Psychometrika Monograph, No. 17.

Segall, D. O. (1996). Multidimensional adaptive testing. Psychometrika, 61(2), 331–354.

Thissen, D., & Mislevy, R. J. (2000). Testing algorithms. In H. Wainer (Ed.), Computerized adaptive testing: A primer (2nd ed., pp. 101–133). Hillsdale, NJ: Lawrence Erlbaum.

Veldkamp, B. P., & van der Linden, W. J. (2002). Multidimensional adaptive testing with constraints on test content. Psychometrika, 67(4), 575–588.

Wang, C. (2014). Improving measurement precision of hierarchical latent traits using adaptive testing. Journal of Educational and Behavioral Statistics, 39, 452–477.

Wang, C. (2015). On latent trait estimation in multidimensional compensatory item response models. Psychometrika, 80, 428–449.

Wang, C., & Chang, H. (2011). Item selection in multidimensional computerized adaptive tests: Gaining information from different angles. Psychometrika, 76, 363–384.

Wang, C., Chang, H., & Boughton, K. (2011). Kullback–Leibler information and its applications in multidimensional adaptive tests. Psychometrika, 76, 13–39.

Wang, C., Chang, H., & Boughton, K. (2013). Deriving stopping rules for multidimensional computerized adaptive testing. Applied Psychological Measurement, 37, 99–122.

Wang, C., Chang, H., & Douglas, J. (2012). Combining CAT with cognitive diagnosis: A weighted item selection approach. Behavior Research Methods, 44, 95–109.

Wang, C., Su, S., & Weiss, D. J. (2018). Robustness of parameter estimation to assumptions of normality in the multidimensional graded response model. Multivariate Behavioral Research, 53(3), 403–418.

Weiss, D. J., & Kingsbury, G. G. (1984). Application of computerized adaptive testing to educational problems. Journal of Educational Measurement, 21, 361–375.

Weiss, D. J. (2011). Better data from better measurements using computerized adaptive testing. Journal of Methods and Measurement in the Social Sciences, 2, 1–27.
