An Approximation of the K out N Reliability of a Test, and a Scoring Procedure for Determining which Items an Examinee Knows

Published online by Cambridge University Press:  01 January 2025

Rand R. Wilcox*
Affiliation:
University of Southern California
* Reprint requests should be addressed to Rand R. Wilcox, Department of Psychology, SGM621, University of Southern California, Los Angeles, California 90089.

Abstract

Consider any scoring procedure for determining whether an examinee knows the answer to a test item. Let $x_i = 1$ if a correct decision is made about whether the examinee knows the $i$th item; otherwise $x_i = 0$. The $k$ out of $n$ reliability of a test is $\rho_k = \Pr(\sum x_i \geq k)$. That is, $\rho_k$ is the probability of making at least $k$ correct decisions for a typical (randomly sampled) examinee. This paper proposes an approximation of $\rho_k$ that can be estimated with an answer-until-correct test. The paper also suggests a scoring procedure that might be used when $\rho_k$ is judged to be too small under a conventional scoring rule, where it is decided that an examinee knows an item if and only if the correct response is given.
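
To make the definition concrete, the sketch below computes the probability of at least k correct decisions out of n in one simplified special case. It is not the approximation proposed in the paper. It assumes, purely for illustration, that the per-item decisions are statistically independent and that the probability of a correct decision on each item is known; the function name `k_out_of_n_reliability` and the argument `p` are hypothetical.

```python
from typing import Sequence


def k_out_of_n_reliability(p: Sequence[float], k: int) -> float:
    """Return Pr(sum of x_i >= k) for independent Bernoulli decisions.

    ASSUMPTION (not from the paper): the x_i are independent and
    p[i] is the known probability of a correct decision on item i.
    """
    n = len(p)
    # dist[j] holds the probability of exactly j correct decisions
    # among the items processed so far.
    dist = [1.0] + [0.0] * n
    for p_i in p:
        # Update from high j to low j so each item is counted once.
        for j in range(n, 0, -1):
            dist[j] = dist[j] * (1.0 - p_i) + dist[j - 1] * p_i
        dist[0] *= 1.0 - p_i
    # Sum the upper tail: k, k + 1, ..., n correct decisions.
    return float(sum(dist[max(k, 0):]))


if __name__ == "__main__":
    # Three items with hypothetical correct-decision probabilities.
    print(k_out_of_n_reliability([0.9, 0.8, 0.95], k=2))
```

When decisions on different items are dependent, as they may well be for real examinees, this calculation no longer applies, which is one reason an approximation that can be estimated from test data is of interest.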

Type
Original Paper
Copyright
Copyright © 1983 The Psychometric Society
