
Combining Item Response Theory and Diagnostic Classification Models: A Psychometric Model for Scaling Ability and Diagnosing Misconceptions

Published online by Cambridge University Press:  01 January 2025

Laine Bradshaw*
Affiliation: Department of Educational Psychology, The University of Georgia
Jonathan Templin
Affiliation: Department of Educational Psychology, The University of Georgia
* Requests for reprints should be sent to Laine Bradshaw, Department of Educational Psychology, The University of Georgia, 323 Aderhold Hall, Athens, GA 30602, USA. E-mail: [email protected]

Abstract

Traditional testing procedures typically utilize unidimensional item response theory (IRT) models to provide a single, continuous estimate of a student’s overall ability. Advances in psychometrics have focused on measuring multiple dimensions of ability to provide more detailed feedback for students, teachers, and other stakeholders. Diagnostic classification models (DCMs) provide multidimensional feedback by using categorical latent variables that represent distinct skills underlying a test that students may or may not have mastered. The Scaling Individuals and Classifying Misconceptions (SICM) model is presented as a combination of a unidimensional IRT model and a DCM where the categorical latent variables represent misconceptions instead of skills. In addition to an estimate of ability along a latent continuum, the SICM model provides multidimensional, diagnostic feedback in the form of statistical estimates of the probabilities that students hold certain misconceptions. Through an empirical data analysis, we show how this additional feedback can be used by stakeholders to tailor instruction to students’ needs. We also provide results from a simulation study demonstrating that the SICM MCMC estimation algorithm yields reasonably accurate estimates under large-scale testing conditions.
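To make the combination described in the abstract concrete, the display below is a minimal sketch of the general form such a hybrid model could take for a single dichotomously scored item. The logistic link, the 2PL-style slope on ability, and the LCDM-style main effects for misconceptions are assumptions for illustration, and the symbols (theta, alpha, q, lambda) are illustrative notation rather than the paper's own parameterization (the SICM model itself is developed for nominal, multiple-choice responses).

\[
P(X_{ei}=1 \mid \theta_e, \boldsymbol{\alpha}_e)
= \frac{\exp\!\bigl(\lambda_{i,0} + \lambda_{i,\theta}\,\theta_e + \sum_{m=1}^{M}\lambda_{i,m}\,q_{im}\,\alpha_{em}\bigr)}
       {1 + \exp\!\bigl(\lambda_{i,0} + \lambda_{i,\theta}\,\theta_e + \sum_{m=1}^{M}\lambda_{i,m}\,q_{im}\,\alpha_{em}\bigr)},
\]

where \(\theta_e\) is examinee \(e\)'s continuous ability, \(\alpha_{em}\in\{0,1\}\) indicates whether the examinee holds misconception \(m\), \(q_{im}\) indicates whether item \(i\) elicits misconception \(m\), and the \(\lambda\) terms are item intercept and slope parameters. Under this sketch, holding a misconception that an item targets shifts the log-odds of a correct response, so the corresponding \(\lambda_{i,m}\) would typically be estimated as negative.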

Type
Original Paper
Copyright
Copyright © 2013 The Psychometric Society


Footnotes

Electronic Supplementary Material: The online version of this article (doi:10.1007/s11336-013-9350-4) contains supplementary material, which is available to authorized users.

Supplementary material
Bradshaw and Templin supplementary material (File, 26.8 KB)