The Many Null Distributions of Person Fit Indices

Published online by Cambridge University Press:  01 January 2025

Ivo W. Molenaar*
University of Groningen
Herbert Hoijtink
University of Groningen
* Requests for reprints should be sent to Ivo W. Molenaar, Vakgroep Statistiek & Meettheorie FPPSW, Oude Boteringestr. 23, 9712 GC Groningen, THE NETHERLANDS.

Abstract

This paper deals with the situation of an investigator who has collected the scores of n persons on a set of k dichotomous items, and wants to investigate whether the answers of all respondents are compatible with the one-parameter logistic test model of Rasch. Contrary to the standard analysis of the Rasch model, where all persons are kept in the analysis and badly fitting items may be removed, this paper studies the alternative model in which a small minority of persons has an answer strategy not described by the Rasch model. Such persons are called anomalous or aberrant. From the response vectors, each consisting of k symbols equal to 0 or 1, it is desired to classify each respondent either as anomalous or as conforming to the model. As this model is probabilistic, such a classification may involve false positives and false negatives. Both for the Rasch model and for other item response models, the literature contains several proposals for a person fit index, which expresses for each individual the plausibility that his/her behavior follows the model. The present paper argues that such indices can only provide a satisfactory solution to the classification problem if their statistical distribution is known under the null hypothesis that all persons answer according to the model. This distribution, however, turns out to be rather different for different values of the person's latent trait value. This value will be called the “ability parameter”, although our results are equally valid for Rasch scales measuring other attributes.
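As a minimal sketch of what a likelihood-based person fit index measures, the Python fragment below evaluates the log-likelihood of a 0/1 response vector under the Rasch model. The function names, the item difficulties `deltas`, and the ability value `theta` are hypothetical and chosen only for illustration; the abstract does not give the exact form of the statistic, so this should be read as one plausible version rather than the authors' definition.

```python
import math

def rasch_prob(theta, delta):
    """P(X = 1) under the Rasch model for ability theta and item difficulty delta."""
    return 1.0 / (1.0 + math.exp(-(theta - delta)))

def log_likelihood(pattern, theta, deltas):
    """Log-likelihood of a 0/1 response pattern, treating theta as known."""
    total = 0.0
    for x, d in zip(pattern, deltas):
        p = rasch_prob(theta, d)
        total += math.log(p) if x == 1 else math.log(1.0 - p)
    return total

# Hypothetical values, for illustration only.
deltas = [-1.5, -0.5, 0.0, 0.5, 1.5]   # item difficulties, easiest first
theta = 0.0                            # ability parameter

conforming = [1, 1, 1, 0, 0]   # easy items correct, hard items wrong
reversed_ = [0, 0, 1, 1, 1]    # same total score, opposite pattern

ll_conforming = log_likelihood(conforming, theta, deltas)
ll_reversed = log_likelihood(reversed_, theta, deltas)
print(ll_conforming, ll_reversed)
```

Both patterns have the same total score, yet the reversed pattern has the lower log-likelihood. Deciding whether a value is low enough to call a respondent aberrant requires the null distribution of the statistic.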

As the true ability parameter is unknown, one can only use its estimate in order to obtain an estimated person fit value and an estimated null hypothesis distribution. The paper describes three specifications for the latter: assuming that the true ability equals its estimate, integrating across the ability distribution assumed for the population, and conditioning on the total score, which is in the Rasch model the sufficient statistic for the ability parameter.
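The third specification, conditioning on the total score, can be sketched by brute-force enumeration for a short test. The sketch below assumes known item difficulties (the hypothetical list `deltas`) and uses the quantity M = -sum(delta_i * x_i), to which the log-likelihood reduces, up to terms that are constant once the total score is fixed. The function names are illustrative and not taken from the paper.

```python
import math
from itertools import combinations

def conditional_null(deltas, r):
    """Exact null distribution of M = -sum(delta_i * x_i) given total score r.

    Under the Rasch model, P(pattern | total score r) is proportional to
    exp(M), so the ability parameter drops out of the conditional distribution.
    Returns a list of (M, probability) pairs, one per pattern, sorted by M.
    """
    k = len(deltas)
    stats = [-sum(deltas[i] for i in ones) for ones in combinations(range(k), r)]
    weights = [math.exp(m) for m in stats]
    norm = sum(weights)  # elementary symmetric function of order r
    return sorted((m, w / norm) for m, w in zip(stats, weights))

def lower_tail_p(deltas, pattern):
    """P(M <= observed M | total score), small values suggesting aberrance."""
    r = sum(pattern)
    m_obs = -sum(d for d, x in zip(deltas, pattern) if x == 1)
    return sum(p for m, p in conditional_null(deltas, r) if m <= m_obs + 1e-12)

# Hypothetical item difficulties, for illustration only.
deltas = [-1.5, -0.5, 0.0, 0.5, 1.5]
p_conforming = lower_tail_p(deltas, [1, 1, 1, 0, 0])
p_reversed = lower_tail_p(deltas, [0, 0, 1, 1, 1])
print(p_conforming, p_reversed)
```

Enumeration visits every pattern with the given total score, so it is only practical for short tests. For longer tests the abstract points to Monte Carlo estimates or a moment-based approximation.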

Classification rules for aberrance will be worked out for each of the three specifications. Depending on test length, item parameters and desired accuracy, they are based on the exact distribution, its Monte Carlo estimate, or a new and promising approximation based on the moments of the person fit statistic. Results for the likelihood person fit statistic are given in detail; the methods could also be applied to other fit statistics. A comparison of the three specifications results in the recommendation to condition on the total score, as this avoids some problems of interpretation that affect the other two specifications.
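A Monte Carlo classification rule under the first specification (treating the ability estimate as if it were the true value) might look like the following sketch. The ability estimate `theta_hat`, the item difficulties `deltas`, the number of replications, and the significance level `alpha` are all hypothetical choices, and the add-one p-value is a common convention rather than something stated in the abstract.

```python
import math
import random

def log_likelihood(pattern, theta, deltas):
    """Log-likelihood of a 0/1 response pattern under the Rasch model."""
    total = 0.0
    for x, d in zip(pattern, deltas):
        p = 1.0 / (1.0 + math.exp(-(theta - d)))
        total += math.log(p if x == 1 else 1.0 - p)
    return total

def simulate_pattern(theta, deltas, rng):
    """Draw one response pattern from the Rasch model at ability theta."""
    return [1 if rng.random() < 1.0 / (1.0 + math.exp(-(theta - d))) else 0
            for d in deltas]

def monte_carlo_p(pattern, theta_hat, deltas, n_sim=2000, seed=0):
    """Estimated P(log-likelihood <= observed) with theta fixed at theta_hat."""
    rng = random.Random(seed)
    observed = log_likelihood(pattern, theta_hat, deltas)
    hits = sum(
        log_likelihood(simulate_pattern(theta_hat, deltas, rng), theta_hat, deltas)
        <= observed
        for _ in range(n_sim)
    )
    return (hits + 1) / (n_sim + 1)  # add-one estimate, never exactly zero

def classify(pattern, theta_hat, deltas, alpha=0.05):
    """Flag a respondent as aberrant when the estimated p-value is at most alpha."""
    p = monte_carlo_p(pattern, theta_hat, deltas)
    return "aberrant" if p <= alpha else "conforming"

# Hypothetical ten-item test, easiest item first.
deltas = [-2.0, -1.5, -1.0, -0.5, -0.25, 0.25, 0.5, 1.0, 1.5, 2.0]
theta_hat = 0.0
print(classify([1, 1, 1, 1, 1, 0, 0, 0, 0, 0], theta_hat, deltas))
print(classify([0, 0, 0, 0, 0, 1, 1, 1, 1, 1], theta_hat, deltas))
```

Because `theta_hat` is itself estimated from the same responses, the simulated distribution is only an approximation to the true null distribution. This is one of the interpretive difficulties that conditioning on the total score avoids.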

Type
Original Paper
Copyright
Copyright © 1990 The Psychometric Society

Footnotes

The authors express their gratitude to the reviewers and to many colleagues for comments on an earlier version.
