
Computerized Adaptive Testing: The Capitalization on Chance Problem

Published online by Cambridge University Press: 10 January 2013

Julio Olea* (Universidad Autónoma de Madrid, Spain)
Juan Ramón Barrada (Universidad de Zaragoza, Spain)
Francisco J. Abad (Universidad Autónoma de Madrid, Spain)
Vicente Ponsoda (Universidad Autónoma de Madrid, Spain)
Lara Cuevas (Universidad Complutense de Madrid, Spain)

*Correspondence concerning this article should be addressed to Julio Olea, Facultad de Psicología, Universidad Autónoma de Madrid, 28049 Madrid (Spain). Phone: +34 914975204. E-mail: [email protected]

Abstract

This paper describes several simulation studies that examine the effects of capitalization on chance on item selection and ability estimation in computerized adaptive testing (CAT) under the 3-parameter logistic (3PL) model. To generate different levels of estimation error in the item parameters, the calibration sample size was manipulated (N = 500, 1,000, and 2,000 subjects), as was the ratio of item bank size to test length (banks of 197 and 788 items, test lengths of 20 and 40 items), both for a CAT and for a randomly assembled test. Results show that capitalization on chance is particularly serious in CAT, as revealed by the large positive bias found in the small-sample calibration conditions. For broad ranges of θ, the overestimation of precision (the asymptotic standard error, Se) reaches levels of 40%, something that does not occur with RMSE(θ). The problem worsens as the ratio of item bank size to test length increases. Potential solutions were tested in a second study, in which two exposure control methods were incorporated into the item selection algorithm. Some alternative solutions are discussed.
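To make the mechanism concrete, the sketch below shows the core CAT step where capitalization on chance arises: at each stage, the not-yet-administered item with maximum Fisher information at the current ability estimate is selected, with the information computed from estimated 3PL parameters. Items whose discrimination (a) parameters happen to be overestimated by calibration error tend to win this comparison, which inflates the asymptotic information and hence the apparent precision of the θ estimate. This is an illustrative sketch only, not the authors' simulation code; the function names, the array layout, and the NumPy implementation are assumptions, and the D = 1.7 scaling constant is omitted.

import numpy as np

def p_3pl(theta, a, b, c):
    """Probability of a correct response under the 3PL model
    (logistic metric, no D = 1.7 scaling constant)."""
    return c + (1.0 - c) / (1.0 + np.exp(-a * (theta - b)))

def fisher_info(theta, a, b, c):
    """Fisher information of each 3PL item at ability level theta."""
    p = p_3pl(theta, a, b, c)
    return a ** 2 * ((1.0 - p) / p) * ((p - c) / (1.0 - c)) ** 2

def select_next_item(theta_hat, a, b, c, administered):
    """Maximum-information selection: the step that capitalizes on chance.

    a, b, c are the *estimated* parameters for the whole bank; estimation
    errors in a (discrimination) systematically favor items whose
    information is overestimated rather than genuinely high.
    """
    info = fisher_info(theta_hat, a, b, c)
    info[list(administered)] = -np.inf  # never re-administer an item
    return int(np.argmax(info))

An exposure control method of the kind tested in the second study would intervene at exactly this point, for example by probabilistically filtering the winning item (a Sympson-Hetter-style procedure) or by perturbing the information criterion with randomness, so that calibration errors in a are exploited less systematically.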


Type: Research Article
Copyright: © Cambridge University Press 2012

