
Fitting Logistic IRT Models: Small Wonder

Published online by Cambridge University Press:  10 April 2014

Miguel A. García-Pérez*
Affiliation: Complutense University of Madrid
*Correspondence concerning this article should be addressed to Dr. Miguel A. García-Pérez, Departamento de Metodología, Facultad de Psicología, Universidad Complutense, Campus de Somosaguas, 28223 Madrid (Spain). Phone: (+34) 91 394 3061. Fax: (+34) 91 394 3189. E-mail: [email protected]

Abstract

State-of-the-art item response theory (IRT) models use logistic functions exclusively as their item response functions (IRFs). Logistic functions meet the requirements that their range is the unit interval and that they are monotonically increasing, but they impose a parameter space whose dimensions can only be assigned a metaphorical interpretation in the context of testing. Applications of IRT models require obtaining the set of values for logistic function parameters that best fit an empirical data set. However, success in obtaining such a set of values does not guarantee that the constructs they represent actually exist, for the adequacy of a model is not established merely by the possibility of estimating its parameters. This article illustrates how mechanical adoption of off-the-shelf logistic functions as IRFs for IRT models can result in off-the-shelf parameter estimates and fits to data. The results of a simulation study are presented, which show that logistic IRT models can fit a set of data generated by IRFs other than logistic functions just as well as they fit logistic data, even though the response processes and parameter spaces involved in each case are substantially different. An explanation of why logistic functions behave as they do is offered, the theoretical and practical consequences of this behavior are discussed, and a testable alternative to logistic IRFs is described.
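The two formal requirements the abstract names for an IRF — range within the unit interval and monotone increase in the latent trait — can be illustrated with the three-parameter logistic function commonly used in IRT. The sketch below is illustrative only; the parameter values (discrimination a, difficulty b, pseudo-guessing c) are hypothetical and not taken from the article's simulation study.

```python
import math

def logistic_irf(theta, a, b, c=0.0):
    """Three-parameter logistic IRF: probability of a correct response
    given latent trait theta. a = discrimination, b = difficulty,
    c = pseudo-guessing lower asymptote (values below are illustrative)."""
    return c + (1.0 - c) / (1.0 + math.exp(-a * (theta - b)))

# Check the two requirements over a grid of trait values:
# the curve stays inside (c, 1) and increases monotonically.
thetas = [t / 10.0 for t in range(-30, 31)]
probs = [logistic_irf(t, a=1.2, b=0.0, c=0.2) for t in thetas]
assert all(0.2 < p < 1.0 for p in probs)
assert all(p2 > p1 for p1, p2 in zip(probs, probs[1:]))
```

Note that these two requirements are weak: many non-logistic functions satisfy them as well, which is part of the article's point about the fit of logistic models to non-logistic data.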


Type: Spanish research trends
Copyright: © Cambridge University Press 1999

