
Markov Chain Estimation for Test Theory Without An Answer Key

Published online by Cambridge University Press:  01 January 2025

George Karabatsos*
Affiliation:
University of Illinois, Chicago
William H. Batchelder
Affiliation:
University of California, Irvine
*
Requests for reprints should be sent to George Karabatsos, University of Illinois-Chicago, College of Education, 1040 W. Harrison Street (MC 147), Chicago, IL 60607, E-Mail: [email protected]

Abstract

This study develops Markov Chain Monte Carlo (MCMC) estimation theory for the General Condorcet Model (GCM), an item response model for dichotomous response data that does not presume the analyst knows the correct answers to the test (the answer key) a priori. In addition to the answer key, the respondent ability, guessing bias, and item difficulty parameters are estimated. With respect to data fit, the study compares the possible GCM formulations, using MCMC-based methods for model assessment and model selection. Real data applications and a simulation study show that the GCM can accurately reconstruct the answer key from a small number of respondents.

Type
Article
Copyright
Copyright © 2003 The Psychometric Society

Footnotes

This study was supported in part by Spencer Foundation grant SG2001000020, George Karabatsos, Principal Investigator, and also in part by NSF Renewal Grant SES-0001550 to A.K. Romney and W.H. Batchelder, Co-Principal Investigators. The second author acknowledges the kind support of the Santa Fe Institute, where he worked on aspects of this paper as a Visiting Professor in the fall of 2001. Both authors appreciate the detailed comments offered by the Editor and two referees on an earlier version of the manuscript.
