
Combining Item Response Theory and Diagnostic Classification Models: A Psychometric Model for Scaling Ability and Diagnosing Misconceptions

Published online by Cambridge University Press:  01 January 2025

Laine Bradshaw*
Affiliation: Department of Educational Psychology, The University of Georgia
Jonathan Templin
Affiliation: Department of Educational Psychology, The University of Georgia
* Requests for reprints should be sent to Laine Bradshaw, Department of Educational Psychology, The University of Georgia, 323 Aderhold Hall, Athens, GA 30602, USA. E-mail: [email protected]

Abstract

Traditional testing procedures typically utilize unidimensional item response theory (IRT) models to provide a single, continuous estimate of a student’s overall ability. Advances in psychometrics have focused on measuring multiple dimensions of ability to provide more detailed feedback for students, teachers, and other stakeholders. Diagnostic classification models (DCMs) provide multidimensional feedback by using categorical latent variables that represent distinct skills underlying a test that students may or may not have mastered. The Scaling Individuals and Classifying Misconceptions (SICM) model is presented as a combination of a unidimensional IRT model and a DCM where the categorical latent variables represent misconceptions instead of skills. In addition to an estimate of ability along a latent continuum, the SICM model provides multidimensional, diagnostic feedback in the form of statistical estimates of the probabilities that students hold certain misconceptions. Through an empirical data analysis, we show how this additional feedback can be used by stakeholders to tailor instruction to students’ needs. We also provide results from a simulation study demonstrating that the SICM MCMC estimation algorithm yields reasonably accurate estimates under large-scale testing conditions.
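To make the combination described in the abstract concrete, the display below is a minimal sketch of the general form such a hybrid model could take for a single dichotomously scored item. The logistic link, the 2PL-style slope on ability, and the LCDM-style main effects for misconceptions are assumptions for illustration, and the symbols (theta, alpha, q, lambda) are illustrative notation rather than the paper's own parameterization (the SICM model itself is developed for nominal, multiple-choice responses).

\[
P(X_{ei}=1 \mid \theta_e, \boldsymbol{\alpha}_e)
= \frac{\exp\!\bigl(\lambda_{i,0} + \lambda_{i,\theta}\,\theta_e + \sum_{m=1}^{M}\lambda_{i,m}\,q_{im}\,\alpha_{em}\bigr)}
       {1 + \exp\!\bigl(\lambda_{i,0} + \lambda_{i,\theta}\,\theta_e + \sum_{m=1}^{M}\lambda_{i,m}\,q_{im}\,\alpha_{em}\bigr)},
\]

where \(\theta_e\) is examinee \(e\)'s continuous ability, \(\alpha_{em}\in\{0,1\}\) indicates whether the examinee holds misconception \(m\), \(q_{im}\) indicates whether item \(i\) elicits misconception \(m\), and the \(\lambda\) terms are item intercept and slope parameters. Under this sketch, holding a misconception that an item targets shifts the log-odds of a correct response, so the corresponding \(\lambda_{i,m}\) would typically be estimated as negative.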

Type
Original Paper
Copyright
Copyright © 2013 The Psychometric Society


Footnotes

Electronic Supplementary Material: The online version of this article (doi:10.1007/s11336-013-9350-4) contains supplementary material, which is available to authorized users.

Supplementary material
Bradshaw and Templin supplementary material (File, 26.8 KB)