Person-fit statistics are used to identify individuals who are displaying aberrant (or unusual) behavior. Many of the most popular person-fit statistics, including $l_z$ (Drasgow et al., 1985) as well as $\zeta_1$ and $\zeta_2$ (Tatsuoka, 1984), belong to the class of standardized person-fit statistics, T, and are assumed to have a standard normal null distribution. However, this distribution only holds when both of the following conditions are satisfied: (a) the true ability is known and is used to compute T and (b) an infinite number of items are available and are used to compute T. Numerous researchers have shown that when one or both of these conditions are not satisfied, the null distribution of T deviates from the standard normal distribution (e.g., Li & Olejnik, 1997; Molenaar & Hoijtink, 1990; Noonan et al., 1992; Reise, 1995; Sinharay, 2016b; Snijders, 2001; van Krimpen-Stoop & Meijer, 1999).
Thus, in practical settings where neither condition is satisfied (because the ability parameter is estimated and only a finite number of items are available), the assumption of a standard normal null distribution is incorrect and may lead to an inaccurate assessment of person fit. The person-fit assessment may be too liberal (resulting in an inflated Type I error rate), too conservative (resulting in an unnecessary sacrifice in power), or some combination of both.
Several corrections have been suggested to improve the accuracy of person-fit assessment when one or both of the above-mentioned conditions are not satisfied. Researchers such as de la Torre & Deng (2008), Glas & Meijer (2003), Sinharay (2016a), and van Krimpen-Stoop & Meijer (1999) proposed resampling-based methods that simultaneously account for the use of an estimated ability parameter and the use of a finite number of items. However, resampling-based methods require the simulation and analysis of several large data sets and are therefore computationally intensive. More efficient methods have been proposed by Magis et al. (2014), Sinharay (2016b), and Snijders (2001), who developed mean and variance corrections to account for the use of an estimated ability parameter, as well as Bedrick (1997) and Molenaar & Hoijtink (1990), who developed skewness corrections to account for the use of a finite number of items. These methods are efficient in that they only require the analysis of the original data set and do not require the simulation or analysis of any additional data sets. Notably, however, no efficient methods have been developed that simultaneously account for the use of an estimated ability parameter and the use of a finite number of items. The purpose of this paper is to fill this void in the literature.
In Sect. 1, we review the class of standardized person-fit statistics, T, as well as the existing corrections for T that account for either the use of an estimated ability parameter or the use of a finite number of items. In Sect. 2, we introduce three new corrections for T that simultaneously account for the use of an estimated ability parameter and the use of a finite number of items. All three corrections are computationally efficient. In Sect. 3, detailed simulations are conducted to (a) examine the null distributions and (b) compare the Type I error rates and power of the new and existing statistics. In Sect. 4, a real data example is provided. Finally, in Sect. 5, we conclude with a brief discussion and suggest directions for future research.
1. Background
Consider a test comprised of n items. Let $X_i$ denote the score on item i, and let $p_i(\theta)=P(X_i=1 \mid \theta)$ denote the probability that item i is answered correctly given the ability parameter $\theta$. For example, for the three-parameter logistic model (3PLM),
where $a_i$, $b_i$, and $c_i$ are the discrimination, difficulty, and pseudo-guessing parameters, respectively, of item i.
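For concreteness, the 3PLM success probability can be sketched in Python as follows. The sketch assumes the common logistic form $p_i(\theta) = c_i + (1 - c_i)/\{1 + \exp[-a_i(\theta - b_i)]\}$ with no scaling constant; the function name `p_3pl` and the parameter values are illustrative only.

```python
import math


def p_3pl(theta, a, b, c):
    """Success probability under the 3PLM.

    Assumes the plain logistic metric (no 1.7 scaling constant).
    """
    return c + (1.0 - c) / (1.0 + math.exp(-a * (theta - b)))


# When ability equals difficulty, the probability lies halfway between c and 1.
p_mid = p_3pl(theta=0.5, a=1.3, b=0.5, c=0.2)

# As ability decreases, the probability approaches the guessing floor c.
p_low = p_3pl(theta=-10.0, a=1.3, b=0.5, c=0.2)
```

Setting `c=0` reduces the function to the two-parameter logistic model.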
1.1. Standardized Person-Fit Statistics
Consider the class of standardized person-fit statistics that was introduced by Snijders (2001) and takes the form
where
for some suitable weight function $w_i(\theta)$. For the standardized log-likelihood statistic $l_z$ (Drasgow et al., 1985), the weight function is given by
where $q_i(\theta)=1-p_i(\theta)$. For the standardized extended caution indices $\zeta_1$ and $\zeta_2$ (Tatsuoka, 1984), the weight functions are given by
respectively, for
where $\theta_v$ is the ability parameter of examinee v, and N is the total number of examinees.
Equation 3 implies that
where
Therefore, the standardized person-fit statistic of Eq. 2 can be expressed as
To classify a score pattern as aberrant, the significance probability p is computed as the probability that under the null distribution, the value of the test statistic T is equal to or exceeds the observed value t. That is,
For the $l_z$ statistic, extreme negative values indicate misfit. For the $\zeta_1$ and $\zeta_2$ statistics, extreme positive values indicate misfit.
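As an illustration, the sketch below evaluates $l_z$ at a known ability value and converts it to a lower-tail significance probability under the standard normal assumption. It assumes the usual form in which W is the sum of $(X_i - p_i(\theta))\,w_i(\theta)$ with $w_i(\theta) = \log[p_i(\theta)/q_i(\theta)]$, so that W has mean 0 and variance $\sum_i p_i(\theta) q_i(\theta) w_i^2(\theta)$ under the model. The function names, item parameters, and score patterns are hypothetical.

```python
import math
from statistics import NormalDist


def p_3pl(theta, a, b, c):
    """3PLM success probability (logistic metric assumed)."""
    return c + (1.0 - c) / (1.0 + math.exp(-a * (theta - b)))


def lz_known_theta(x, theta, items):
    """Standardized log-likelihood statistic evaluated at a known theta.

    x is a list of 0/1 item scores; items is a list of (a, b, c) tuples.
    """
    w_sum = 0.0  # centered weighted sum W
    var = 0.0    # variance of W under local independence
    for xi, (a, b, c) in zip(x, items):
        p = p_3pl(theta, a, b, c)
        q = 1.0 - p
        w = math.log(p / q)  # l_z weight
        w_sum += (xi - p) * w
        var += p * q * w * w
    return w_sum / math.sqrt(var)


# Hypothetical five-item test, ordered from easiest to hardest.
items = [(1.0, -1.5, 0.2), (1.2, -0.5, 0.2), (0.8, 0.0, 0.2),
         (1.1, 0.7, 0.2), (0.9, 1.5, 0.2)]
consistent = [1, 1, 1, 0, 0]  # easy items correct, hard items incorrect
reversed_ = [0, 0, 0, 1, 1]   # easy items incorrect, hard items correct

t_fit = lz_known_theta(consistent, 0.0, items)
t_misfit = lz_known_theta(reversed_, 0.0, items)

# For l_z, misfit is flagged in the lower tail.
p_value = NormalDist().cdf(t_misfit)
```

The reversed pattern yields a strongly negative statistic, whereas the consistent pattern yields a positive one.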
Standardized person-fit statistics are assumed to have a standard normal null distribution. However, this distribution only holds when both of the following conditions are satisfied: (a) the true ability is known and is used to compute T and (b) an infinite number of items are available and are used to compute T. Figure 1 shows the null distribution of the $l_z$ statistic when both conditions are (almost) satisfied, that is, when the true ability and 500 items are used to compute $l_z$. Observe that the null distribution (dashed black line) is very close to the theorized standard normal distribution (solid black line).
However, when condition (a) is not satisfied (that is, when the true ability is unknown and an ability estimate is used instead), the null distribution of T has a variance smaller than 1, as indicated by the dashed gray line in Fig. 1. Thus, the assumption of a standard normal null distribution (that has a variance of 1) leads to a conservative assessment of person fit. When condition (b) is not satisfied (that is, when a finite number of items are used to compute T), the null distribution of T is skewed, as indicated by the dotted black line in Fig. 1, which represents a 12-item test. The distribution is negatively skewed if extreme negative values of the statistic indicate misfit (e.g., $l_z$) or positively skewed if extreme positive values of the statistic indicate misfit (e.g., $\zeta_1$, $\zeta_2$). Thus, the assumption of a standard normal null distribution (that is not skewed) leads to a liberal assessment of person fit. When neither (a) nor (b) is satisfied, the null distribution of T is skewed, has a variance smaller than 1, and has a mean that differs slightly from 0, as indicated by the dotted gray line in Fig. 1.
Thus, the assumption of a standard normal null distribution leads to either a liberal or a conservative assessment of person fit, depending on the chosen significance level: the use of smaller significance levels leads to a more liberal assessment of person fit, while the use of larger significance levels leads to a more conservative assessment.
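The effect of test length on skewness can be checked directly when the true ability is known. The sketch below computes the exact skewness of W for the $l_z$ weights, using the fact that the third central moment of a Bernoulli score is $p_i q_i (q_i - p_i)$. The evenly spaced item difficulties and all function names are assumptions chosen for illustration; they are not the conditions behind Fig. 1.

```python
import math


def p_3pl(theta, a, b, c):
    """3PLM success probability (logistic metric assumed)."""
    return c + (1.0 - c) / (1.0 + math.exp(-a * (theta - b)))


def lz_null_skewness(theta, items):
    """Skewness of W at a known theta, using the l_z weights."""
    m2 = 0.0  # variance of W
    m3 = 0.0  # third central moment of W
    for a, b, c in items:
        p = p_3pl(theta, a, b, c)
        q = 1.0 - p
        w = math.log(p / q)
        m2 += p * q * w ** 2
        m3 += p * q * (q - p) * w ** 3  # E[(X - p)^3] = p q (q - p)
    return m3 / m2 ** 1.5


def evenly_spaced_test(n):
    """Hypothetical test: a = 1, c = 0.2, difficulties spread over [-2, 2]."""
    return [(1.0, -2.0 + 4.0 * k / (n - 1), 0.2) for k in range(n)]


skew_12 = lz_null_skewness(0.0, evenly_spaced_test(12))
skew_500 = lz_null_skewness(0.0, evenly_spaced_test(500))
```

Every term of the third moment is non-positive for the $l_z$ weights, because $q_i - p_i$ and $\log(p_i/q_i)$ always have opposite signs. The skewness is therefore negative and shrinks toward 0 roughly at the rate $1/\sqrt{n}$ as items are added.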
In an effort to obtain a more accurate assessment of person fit, several researchers have proposed corrections for T. Mean and variance corrections have been suggested to account for the use of an estimated ability parameter. Skewness corrections have been suggested to account for the use of a finite number of items. The following subsections contain reviews of each of these corrections.
1.2. Mean and Variance Corrections
When the true ability is unknown and an ability estimate is used instead, a naïve approximation of $T(\theta)$ can be obtained by inserting $\hat{\theta}$ into Eq. 10. That is,
However, Snijders (2001) proved that replacing $\theta$ with $\hat{\theta}$ has a non-negligible effect on the variance of W (and therefore of T) even when an infinite number of items are used. Thus, the assumption that $T(\hat{\theta})$ has a standard normal null distribution, even asymptotically, is incorrect. Snijders further showed that if $\hat{\theta}$ satisfies the condition
for some functions $r_0(\hat{\theta})$ and $r_i(\hat{\theta})$, then the mean and variance of $W(\hat{\theta})$ can be approximated using
respectively, where
and
for
where $p_i^\prime(\hat{\theta})$ is the first derivative of $p_i(\hat{\theta})$ with respect to $\hat{\theta}$, and the modified weight function is given by
Therefore, the asymptotically correct statistic $T^*(\hat{\theta})$ can be derived from $T(\hat{\theta})$ by adjusting both the mean and variance of the statistic. In other words,
has an asymptotic standard normal null distribution.
The corrected statistic $T^*$ can be computed using any ability estimate $\hat{\theta}$ that has functions $r_0(\hat{\theta})$ and $r_i(\hat{\theta})$ that satisfy Eq. 13. Magis et al. (2012) showed that for the weighted likelihood (WL) estimate (Warm, 1989), maximum likelihood (ML) estimate, and maximum a posteriori (MAP) estimate, Eq. 13 is satisfied for
and
where
and $f(\cdot)$ is the prior distribution on $\theta$.
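The following sketch applies a mean and variance correction of this kind to $l_z$ for the ML estimate. It assumes the forms commonly attributed to Snijders (2001): $c_n = \sum_i p_i^\prime w_i / \sum_i p_i^\prime r_i$, modified weights $\tilde{w}_i = w_i - c_n r_i$, approximate mean $-c_n r_0$, and approximate variance $\sum_i p_i q_i \tilde{w}_i^2$, with $r_0 = 0$ and $r_i = p_i^\prime/(p_i q_i)$ for ML. The helper names, the bisection solver, and the example (which sets $c_i = 0$ so that the ML estimate is unique) are assumptions made for illustration.

```python
import math


def p_3pl(theta, a, b, c):
    """3PLM success probability (logistic metric assumed)."""
    return c + (1.0 - c) / (1.0 + math.exp(-a * (theta - b)))


def dp_3pl(theta, a, b, c):
    """First derivative of the 3PLM success probability with respect to theta."""
    p = p_3pl(theta, a, b, c)
    return a * (p - c) * (1.0 - p) / (1.0 - c)


def ml_theta(x, items, lo=-8.0, hi=8.0, iters=100):
    """Bisection on the ML score equation sum_i (x_i - p_i) r_i = 0."""
    def score(theta):
        total = 0.0
        for xi, it in zip(x, items):
            p = p_3pl(theta, *it)
            total += (xi - p) * dp_3pl(theta, *it) / (p * (1.0 - p))
        return total

    if not (score(lo) > 0.0 > score(hi)):
        raise ValueError("no sign change: the ML estimate is not bracketed")
    for _ in range(iters):
        mid = 0.5 * (lo + hi)
        if score(mid) > 0.0:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)


def lz_naive_and_corrected(x, items, theta_hat, r0=0.0):
    """Return (naive T, corrected T*) for the l_z weights; r0 = 0 is the ML case."""
    w_sum = var_naive = num = den = 0.0
    terms = []
    for xi, it in zip(x, items):
        p = p_3pl(theta_hat, *it)
        q = 1.0 - p
        dp = dp_3pl(theta_hat, *it)
        w = math.log(p / q)  # l_z weight
        r = dp / (p * q)     # r_i for the ML estimate
        w_sum += (xi - p) * w
        var_naive += p * q * w * w
        num += dp * w
        den += dp * r
        terms.append((p, q, w, r))
    c_n = num / den
    # Variance of W based on the modified weights w - c_n * r.
    var_corr = sum(p * q * (w - c_n * r) ** 2 for p, q, w, r in terms)
    t_naive = w_sum / math.sqrt(var_naive)
    t_star = (w_sum + c_n * r0) / math.sqrt(var_corr)
    return t_naive, t_star


# Hypothetical five-item test with c = 0 (a 2PL special case).
items = [(1.0, -1.5, 0.0), (1.4, -0.5, 0.0), (0.8, 0.0, 0.0),
         (1.2, 0.8, 0.0), (0.9, 1.6, 0.0)]
x = [1, 0, 1, 1, 0]

theta_hat = ml_theta(x, items)
t_naive, t_star = lz_naive_and_corrected(x, items, theta_hat)
```

With these ML forms, the corrected variance equals the naive variance minus $(\sum_i p_i^\prime w_i)^2 / \sum_i p_i^\prime r_i$, so it cannot exceed the naive variance. Since $r_0 = 0$ leaves the numerator unchanged, the corrected statistic is at least as extreme as the naive one, in line with the conservative behavior of $T(\hat{\theta})$ described above.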
1.3. Skewness Corrections
Researchers such as Molenaar & Hoijtink (1990), Noonan et al. (1992), and van Krimpen-Stoop & Meijer (1999) have shown that the null distribution of T becomes more skewed as test length decreases. Therefore, the standard normal distribution (that is not skewed) provides a poor approximation for tests with fewer items. To obtain a more accurate approximation of the null distribution of T, several methods have been developed that take this skewness into account. These methods use naïve approximations of the mean, variance, and skewness of $W(\hat{\theta})$ that are obtained by inserting $\hat{\theta}$ into Eqs. 6, 7, and 8, respectively. For example, the naïve approximation of skewness is given by
Molenaar & Hoijtink (1990) suggested two methods to approximate the null distribution of T. The first method is based on the Cornish–Fisher expansion. The skewness-corrected statistic is given by
which is equivalent to approximating the significance probability of Eq. 11 as
where $\Phi(\cdot)$ denotes the cumulative distribution function (CDF) of the standard normal distribution.
The second method of Molenaar & Hoijtink (1990) employs a higher-order approximation of the significance probability that is based on a $\chi^2$ distribution with $\nu(\hat{\theta})=\frac{8}{\gamma^2(\hat{\theta})}$ degrees of freedom. Thus, the significance probability is approximated as
where
A third method was suggested by Bedrick (1997), who used the Edgeworth expansion to approximate the significance probability as
where $\phi(\cdot)$ denotes the probability density function of the standard normal distribution. The Edgeworth expansion occasionally yields estimates of p that are smaller than 0 or larger than 1. In this paper, we replace such values with (traditional) estimates of p that are obtained by applying $T(\hat{\theta})$ under the assumption of a standard normal null distribution.
We note that it may be desirable to transform the significance probability approximations in Eqs. 22 and 23 to person-fit statistics that are approximately standard normally distributed. If extreme negative values of t indicate misfit, the significance probability approximations can be transformed using the inverse CDF of the standard normal distribution, $\Phi^{-1}(p)$. If extreme positive values of t indicate misfit, the significance probability approximations can be transformed using $\Phi^{-1}(1-p)$.
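A minimal sketch of two of these skewness corrections is given below for a statistic whose lower tail indicates misfit. It assumes the standard first-order forms of the expansions: the Cornish–Fisher statistic $t - \gamma(t^2 - 1)/6$ and the Edgeworth probability $\Phi(t) - \gamma(t^2 - 1)\phi(t)/6$. These may differ in detail from the expressions used by the cited authors. The $\chi^2$ variant is left out because it needs an incomplete-gamma routine that the Python standard library does not provide. All function names and the values of t and $\gamma$ are hypothetical.

```python
from statistics import NormalDist

STD_NORMAL = NormalDist()


def p_normal(t):
    """Lower-tail significance probability under the standard normal assumption."""
    return STD_NORMAL.cdf(t)


def p_cornish_fisher(t, gamma):
    """Lower-tail probability after a first-order Cornish-Fisher adjustment."""
    return STD_NORMAL.cdf(t - gamma * (t * t - 1.0) / 6.0)


def p_edgeworth(t, gamma):
    """Lower-tail probability from a first-order Edgeworth expansion.

    Values outside the open interval (0, 1) are replaced by the standard
    normal probability, following the fallback described in the text.
    """
    p = STD_NORMAL.cdf(t) - gamma * (t * t - 1.0) * STD_NORMAL.pdf(t) / 6.0
    if not 0.0 < p < 1.0:
        p = STD_NORMAL.cdf(t)
    return p


def to_standard_normal(p, misfit_is_negative=True):
    """Map a significance probability back to the standard normal scale."""
    return STD_NORMAL.inv_cdf(p if misfit_is_negative else 1.0 - p)


# Hypothetical observed statistic and naive skewness for a short test.
t_obs, gamma = -2.0, -0.5
p_n = p_normal(t_obs)
p_cf = p_cornish_fisher(t_obs, gamma)
p_ew = p_edgeworth(t_obs, gamma)
z_cf = to_standard_normal(p_cf)
```

For a negatively skewed null distribution, both corrected probabilities in this example exceed the standard normal probability. This is the direction expected when the uncorrected assessment is too liberal in the lower tail.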
2. Method
In the previous section, we reviewed (a) the class of standardized person-fit statistics T, (b) mean and variance corrections for T that account for the use of an estimated ability parameter, and (c) skewness corrections for T that account for the use of a finite number of items. In this section, we apply mean, variance, and skewness corrections to simultaneously account for the use of an estimated ability parameter and the use of a finite number of items.
We start by considering the class of mean and variance-corrected statistics $T^*$ that is given by Eq. 18. As was the case for the uncorrected statistic T, the corrected statistic $T^*$ is assumed to have a standard normal null distribution. However, this distribution only holds asymptotically, that is, when an infinite number of items are available. Figure 2 shows the null distribution of the $l_z^*$ statistic when 500 items are used to compute $l_z^*$. Observe that the null distribution (dashed gray line) is very close to the theorized standard normal distribution (solid black line).
Yet, when a finite number of items are used to compute $T^*$, the null distribution of $T^*$ is skewed, as indicated by the dotted gray line in Fig. 2. Thus, the assumption of a standard normal distribution (that is not skewed) leads to a liberal assessment of person fit, especially at smaller significance levels such as $\alpha=.01$ or $\alpha=.02$ (e.g., de la Torre & Deng, 2008; Sinharay, 2016b; Snijders, 2001).
In an effort to obtain a more accurate assessment of person fit, we introduce three new methods for approximating the null distribution of $T^*$ that take this skewness into account. These methods use asymptotic approximations of the mean, variance, and skewness of $W(\hat{\theta})$ that are given by Eqs. 14, 15, and
respectively.
The three new methods for approximating the null distribution of $T^*$ are heuristics that parallel the methods in Eqs. 21, 22, and 23 for approximating the null distribution of T. R code to implement the new methods is included in Appendix A.
The first of the three new methods is based on the Cornish–Fisher expansion. The skewness-corrected statistic is given by
which is equivalent to approximating the significance probability of Eq. 11 as
The second method employs a higher-order approximation of the significance probability that is based on a $\chi^2$ distribution with $\tilde{\nu}(\hat{\theta})=\frac{8}{\tilde{\gamma}^2(\hat{\theta})}$ degrees of freedom. Thus, the significance probability is approximated as
where
The third method uses the Edgeworth expansion to approximate the significance probability as
If the Edgeworth expansion yields estimates of p that are smaller than 0 or larger than 1, we replace such values with estimates of p that are obtained by applying $T^*(\hat{\theta})$ under the assumption of a standard normal null distribution.
The inverse CDF method can be used to transform the significance probability approximations in Eqs. 27 and 28 to person-fit statistics that are approximately standard normally distributed. If extreme negative values of $t^*$ indicate misfit, the significance probability approximations can be transformed using $\Phi^{-1}(p^*)$. If extreme positive values of $t^*$ indicate misfit, the significance probability approximations can be transformed using $\Phi^{-1}(1-p^*)$.
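The first of these methods can be sketched as follows. The sketch assumes that the asymptotic skewness has the same structure as the naive skewness, with the original weights replaced by the modified weights $\tilde{w}_i$, that is, $\tilde{\gamma} = \sum_i p_i q_i (q_i - p_i)\tilde{w}_i^3 / (\sum_i p_i q_i \tilde{w}_i^2)^{3/2}$. It also assumes the first-order Cornish–Fisher adjustment $t^* - \tilde{\gamma}(t^{*2} - 1)/6$. The success probabilities, modified weights, and observed $t^*$ are hypothetical inputs that would normally come from a mean and variance correction step. This is not the R code of Appendix A.

```python
from statistics import NormalDist


def corrected_skewness(p, w_tilde):
    """Skewness of W computed from success probabilities and modified weights."""
    m2 = sum(pi * (1.0 - pi) * wi ** 2 for pi, wi in zip(p, w_tilde))
    m3 = sum(pi * (1.0 - pi) * (1.0 - 2.0 * pi) * wi ** 3
             for pi, wi in zip(p, w_tilde))
    return m3 / m2 ** 1.5


def cornish_fisher_statistic(t_star, gamma_tilde):
    """Skewness-corrected version of t* (first-order adjustment)."""
    return t_star - gamma_tilde * (t_star * t_star - 1.0) / 6.0


# Hypothetical per-item quantities evaluated at the ability estimate.
p_hat = [0.85, 0.72, 0.60, 0.45, 0.36]
w_tilde = [1.2, 0.5, 0.1, -0.6, -1.0]
t_star = -2.2  # hypothetical observed value of the corrected statistic

gamma_tilde = corrected_skewness(p_hat, w_tilde)
t_cf = cornish_fisher_statistic(t_star, gamma_tilde)

# Lower-tail significance probabilities with and without the skewness correction.
p_uncorrected = NormalDist().cdf(t_star)
p_star = NormalDist().cdf(t_cf)
```

Because the adjusted value is already on the standard normal scale, it can be compared with the usual normal critical values without a further inverse CDF transformation.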
3. Simulation Study
3.1. Design and Analysis
We conducted a detailed simulation study to (a) examine the null distributions and (b) compare the Type I error rates and power of the new and existing statistics. The simulations were designed to mimic realistic testing conditions—therefore, an estimated ability parameter and a finite number of items were used to compute the person-fit statistics.
Three test lengths (12, 36, 72) were studied to represent short, medium, and long tests. For each test length, 1 million score patterns (10,000 examinees $\times$ 100 replications) were simulated. For each replication, new sets of person and item parameters were generated. As in Glas & Meijer (2003), 90% of the examinees were non-aberrant (that is, they fit the model) and were used to study the null distributions and Type I error rates of the statistics. The remaining 10% of the examinees were divided equally into four groups of aberrant examinees and were used to study power. The four groups of aberrant examinees were characterized by the type of aberrant behavior (lack of motivation, item disclosure) and by the proportion of contaminated items ($\frac{1}{6}$, $\frac{1}{3}$).
Uncontaminated item scores were generated using the 3PLM. For each replication, the item parameters were sampled such that $a_i \sim \mathrm{Lognormal}(0, 0.25^2)$, $b_i \sim \mathcal{N}(0,1)$, and $c_i \sim \mathcal{U}(0.05, 0.30)$, as in Sinharay (2016b), and the person parameters were sampled such that $\theta_v \sim \mathcal{N}(0,1)$. Contaminated item scores were generated after manipulating the item success probabilities. As in Glas & Meijer (2003), lack of motivation was simulated as random guessing on the easiest items, with a success probability equal to 0.2. Item disclosure was simulated as preknowledge of the most difficult items, with a success probability equal to 0.9. By simulating lack of motivation on the easiest items and item disclosure on the most difficult items, we studied conditions in which ability estimates would be severely affected and in which aberrance is therefore important to detect.
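The data-generating process described above can be sketched as follows. This is an illustration under stated assumptions rather than the simulation code used in the study: the function names are hypothetical, the 3PLM is written without a scaling constant, item easiness is ranked by the difficulty parameter $b_i$, and the number of contaminated items is obtained by rounding the stated proportion of the test length.

```python
import math
import random


def draw_items(n_items, rng):
    """Sample (a, b, c) for each item from the distributions stated in the text."""
    return [(rng.lognormvariate(0.0, 0.25),  # a_i: log-scale SD of 0.25
             rng.gauss(0.0, 1.0),            # b_i
             rng.uniform(0.05, 0.30))        # c_i
            for _ in range(n_items)]


def success_probs(items, theta, aberrance=None, proportion=0.0):
    """3PLM success probabilities, overwritten on the contaminated items."""
    probs = [c + (1.0 - c) / (1.0 + math.exp(-a * (theta - b)))
             for a, b, c in items]
    n_items = len(items)
    n_bad = round(proportion * n_items)  # assumption: rounded item count
    easiest_first = sorted(range(n_items), key=lambda i: items[i][1])
    if aberrance == "lack_of_motivation":
        targets, value = easiest_first[:n_bad], 0.2            # easiest items
    elif aberrance == "item_disclosure":
        targets, value = easiest_first[n_items - n_bad:], 0.9  # hardest items
    else:
        targets, value = [], None
    for i in targets:
        probs[i] = value
    return probs


def simulate_scores(probs, rng):
    """Draw one dichotomous score per item from its success probability."""
    return [1 if rng.random() < p else 0 for p in probs]
```

For example, with a 12-item test and a contamination proportion of $\frac{1}{6}$, the two easiest items receive a success probability of 0.2 under lack of motivation.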
After simulating the data, each of the score patterns was analyzed 72 times: once for each combination of two classes of person-fit statistics (T, $T^*$), three weight functions ($l_z$, $\zeta_1$, $\zeta_2$), four skewness corrections (none, Cornish–Fisher expansion, $\chi^2$ approximation, Edgeworth expansion), and three ability estimates (WL, ML, MAP). In all conditions, the item parameters were treated as known. This assumption is common in person-fit research, as it prevents the null distribution of the statistics from being affected by any uncertainty in the item parameter estimates (e.g., Molenaar & Hoijtink 1990; Snijders 2001; van Krimpen-Stoop & Meijer 1999).
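The count of 72 analyses per score pattern follows from fully crossing the four factors ($2 \times 3 \times 4 \times 3$). The short sketch below enumerates the crossing; the string labels are illustrative only.

```python
from itertools import product

classes = ["T", "T*"]
weights = ["l_z", "zeta_1", "zeta_2"]
corrections = ["none", "Cornish-Fisher", "chi-square", "Edgeworth"]
abilities = ["WL", "ML", "MAP"]

# One tuple per analysis condition applied to each score pattern.
conditions = list(product(classes, weights, corrections, abilities))
```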
The ML and MAP estimates of ability were bounded between $-4$ and 4. The standard normal distribution was used as the prior distribution for the MAP estimates. The standardized extended caution indices were computed after reversing the sign of the weights given in Eq. 5; thus, extreme negative values of the statistics indicated misfit. This adjustment was made to facilitate comparisons against the standardized log-likelihood statistics, for which extreme negative values also indicate misfit.
3.2. Results
The choice of ability estimate was found not to affect the relative performance of the person-fit statistics. Therefore, we focus on the WL estimates of ability (since they were computed without any bounds or prior distributions) and include results for the other ability estimates in “Appendix B”.
3.2.1. The Null Distributions of the Person-Fit Statistics
Figure 3 displays the first four moments of the null distributions of the person-fit statistics. Each row corresponds to a different moment (mean, variance, skewness, excess kurtosis), and each column corresponds to a different test length (12, 36, 72). Note that excess kurtosis is defined as the kurtosis minus 3. Horizontal dotted lines are used to indicate the values that are expected under the theoretical null distribution. It is desirable for the moments of the empirical null distributions to be as close to these values as possible.
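As an illustration of the four summaries plotted in Figure 3, the sketch below computes the mean, variance, skewness, and excess kurtosis of a set of statistic values. It assumes population-style moments (dividing by the number of values), which may differ from the estimators used in the study; the function name `four_moments` is hypothetical. Under a standard normal null distribution, the reference values are 0, 1, 0, and 0, respectively.

```python
def four_moments(values):
    """Return (mean, variance, skewness, excess kurtosis) using central moments."""
    n = len(values)
    mean = sum(values) / n
    centered = [v - mean for v in values]
    m2 = sum(d ** 2 for d in centered) / n  # variance
    m3 = sum(d ** 3 for d in centered) / n
    m4 = sum(d ** 4 for d in centered) / n
    skewness = m3 / m2 ** 1.5
    excess_kurtosis = m4 / m2 ** 2 - 3.0    # kurtosis minus 3
    return mean, m2, skewness, excess_kurtosis
```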
Figure 3 reveals that the null distribution of T (i.e., the uncorrected statistic given by Eq. 12) is negatively skewed, has a variance smaller than 1, and has a mean that is slightly larger than 0. Similar results are shown in Table 1 of Li & Olejnik (1997), Table 3 of Reise (1995), and Table 1 of van Krimpen-Stoop & Meijer (1999). The skewness-corrected statistics $T_\text{CF}$, $T_{\chi^2}$, and $T_\text{EW}$ return values of skewness that are closer to 0, but do not offer much improvement in terms of the mean or variance. Conversely, the mean and variance-corrected statistic $T^*$ (given by Eq. 18) has a variance that is closer to 1, but does not offer much improvement in terms of skewness.
Interestingly, although $l_z^*$ has a mean that is closer to 0, $\zeta_1^*$ and $\zeta_2^*$ have means that are farther from 0. Similar results are shown in Tables 1 and 2 of van Krimpen-Stoop & Meijer (1999) for $l_z^*$, and in Table 1 of Sinharay (2016b) for $\zeta_1^*$ and $\zeta_2^*$.
The newly proposed statistics $T_\text{CF}^*$, $T_{\chi^2}^*$, and $T_\text{EW}^*$ are the only statistics to incorporate mean, variance, and skewness corrections. Therefore, it is not surprising to see that these are the only statistics that improve both the skewness and the variance. Figure 3 also reveals that although these statistics have similar means and variances, $T_{\chi^2}^*$ is the most effective at reducing skewness, followed by $T_\text{EW}^*$ and then $T_\text{CF}^*$.
This finding parallels the results for the class of T statistics, as shown in Fig. 3 and in previous research (Bedrick 1997; Molenaar & Hoijtink 1990; Santos et al. 2020; von Davier & Molenaar 2003).
3.2.2. Type I Error Rates
Figure 4 displays the Type I error rates of the person-fit statistics. Each row corresponds to a different significance level (.01, .02, .05, .10), and each column corresponds to a different test length (12, 36, 72). Horizontal dotted lines are used to indicate the significance levels. It is desirable for the Type I error rates to be at or below these lines.
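The empirical rates summarized in Figure 4 can be thought of as rejection proportions. The sketch below assumes a one-sided test in which a pattern is flagged when its statistic falls below the lower-tail standard normal critical value, consistent with extreme negative values indicating misfit; the function name `rejection_rate` is hypothetical. Applied to non-aberrant examinees the proportion estimates the Type I error rate, and applied to aberrant examinees it estimates power.

```python
from statistics import NormalDist


def rejection_rate(stat_values, alpha):
    """Proportion of statistics below the lower-tail standard normal critical value."""
    critical = NormalDist().inv_cdf(alpha)  # e.g., about -1.645 for alpha = .05
    flagged = sum(1 for t in stat_values if t < critical)
    return flagged / len(stat_values)
```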
Figure 4 reveals that the Type I error rates of T (i.e., the uncorrected statistic) vary depending on the weight function that is used. For $l_z$, the Type I error rate is consistently smaller than the nominal level. This result can largely be attributed to the reduced variance of the statistic (see Fig. 3). In contrast, for $\zeta_1$ and $\zeta_2$, the Type I error rates are slightly larger than the nominal level when $\alpha$ is small and the test is short, but are close to or smaller than the nominal level in all other instances.
The Type I error rates of the skewness-corrected statistics $T_\text{CF}$, $T_{\chi^2}$, and $T_\text{EW}$ are always smaller than the Type I error rates of T. While this result is desirable in instances where the Type I error rates of T are inflated (e.g., when $\zeta_1$ and $\zeta_2$ are used with a small $\alpha$ and a short test), it is undesirable in instances where T is already conservative, which seems to be the more common case.
Therefore, $T_\text{CF}$, $T_{\chi^2}$, and $T_\text{EW}$ are of limited utility in the present context.
The Type I error rates of the mean and variance-corrected statistic $T^*$ are always larger than the Type I error rates of T. When $\alpha=.10$, this result is useful, since the Type I error rates of $T^*$ are closer to, but still do not exceed, the nominal level. In contrast, when $\alpha=.01$, .02, or .05, the Type I error rates of $T^*$ often exceed the nominal level, which is undesirable in practice.
The Type I error rates of the newly proposed statistics $T_\text{CF}^*$, $T_{\chi^2}^*$, and $T_\text{EW}^*$ are generally quite favorable and seem to overcome the limitations of the other corrected statistics. That is, the Type I error rates tend to be close to the nominal level (unlike $T_\text{CF}$, $T_{\chi^2}$, and $T_\text{EW}$), while still not exceeding it (unlike $T^*$). Of the three new statistics, $T_\text{CF}^*$ produces Type I error rates that are closest to the nominal level, followed by $T_\text{EW}^*$ when $\alpha=.01$, or $T_{\chi^2}^*$ when $\alpha=.02$, .05, or .10. These conclusions are the same regardless of test length or the choice of weight function.
Figure 5 displays the Type I error rates by quintile of the $l_z$ and $l_z^*$ statistics. (The results for $\zeta_1$, $\zeta_1^*$, $\zeta_2$, and $\zeta_2^*$ are similar and are available upon request from the first author.)
Quintiles were formed by separating examinees based on $\theta$, such that low-ability examinees were placed in Quintile 1, high-ability examinees were placed in Quintile 5, and the examinees in between were sorted accordingly. Notably, the Type I error rates of the newly proposed statistics are similar across ability levels.
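The quintile breakdown can be sketched as follows. The sketch assumes that quintiles are formed by ranking examinees on $\theta$ and cutting the ranks into five groups of (nearly) equal size, and that flagging uses a lower-tail critical value; the function names are hypothetical, and at least five examinees are required so that no quintile is empty.

```python
def quintile_labels(thetas):
    """Assign each examinee a quintile (1 = lowest ability, 5 = highest) by rank."""
    n = len(thetas)
    order = sorted(range(n), key=lambda i: thetas[i])
    labels = [0] * n
    for rank, i in enumerate(order):
        labels[i] = rank * 5 // n + 1
    return labels


def flag_rate_by_quintile(stat_values, thetas, critical):
    """Proportion of statistics below `critical` within each ability quintile."""
    labels = quintile_labels(thetas)
    rates = {}
    for q in range(1, 6):
        group = [t for t, lab in zip(stat_values, labels) if lab == q]
        rates[q] = sum(1 for t in group if t < critical) / len(group)
    return rates
```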
3.2.3. Power
Table 1 displays the power of $l_z$ and $l_z^*$ at the $\alpha=.01$ significance level. (The results for $\zeta_1$, $\zeta_1^*$, $\zeta_2$, and $\zeta_2^*$ are similar and are available upon request.)
Each row corresponds to a different combination of aberrance and test length, and each column corresponds to a different combination of person-fit statistic and skewness correction. As expected, power increases as test length increases and as the proportion of contaminated items increases. Furthermore, across all conditions, the $l_z^*$ statistic with no skewness correction is the most powerful, followed closely by the $l_z^*$ statistic corrected using the Cornish–Fisher expansion, the $l_z^*$ statistic corrected using the Edgeworth expansion, and then the $l_z^*$ statistic corrected using the $\chi^2$ approximation. Thus, it appears that the new statistics can be used without a substantial loss in power.
Similar results are shown in Table 2 at the $\alpha=.05$ significance level.
Note. CF = Cornish–Fisher expansion; $\chi^2$ = $\chi^2$ approximation; EW = Edgeworth expansion.
4. Real Data Example
4.1. Data and Analysis
The data in this example originate from a single form of a licensure examination. The data have been studied in the context of person-fit assessment by researchers such as Sinharay (2016b), and in the context of preknowledge detection in several chapters of Cizek & Wollack (2017). Item scores are available for 1644 examinees on 170 scored items. Following a statistical analysis and careful investigative process, the testing program flagged 61 items as being compromised, and 48 examinees as likely having engaged in fraudulent behavior. The 48 flagged examinees can be considered truly aberrant for purposes of the present analysis. However, it is important to note that other types of aberrance may be present among some of the non-flagged examinees, as well.
The Rasch model parameter estimates provided by the testing program were treated as the true item parameters. Then, using the WL estimates of ability, each score pattern was analyzed 24 times: once for each combination of two classes of person-fit statistics (T, $T^*$), three weight functions ($l_z$, $\zeta_1$, $\zeta_2$), and four skewness corrections (none, Cornish–Fisher expansion, $\chi^2$ approximation, Edgeworth expansion).
4.2. Results
Tables 3 and 4 display the proportions of examinees classified as aberrant and the agreement rates for $l_z$ and $l_z^*$ at the $\alpha = .01$ and $\alpha = .05$ significance levels, respectively.
(The results for $\zeta_1$, $\zeta_1^*$, $\zeta_2$, and $\zeta_2^*$ are similar and are available upon request.) In each table, the proportions of examinees classified as aberrant are displayed in bold text along the diagonal, and the agreement rates are displayed in non-bold text in the off-diagonal. Agreement rate is defined as the proportion of times two statistics make the same classification decision (aberrant or non-aberrant).
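The table layout described above (flag proportions on the diagonal, pairwise agreement rates off the diagonal) can be sketched as follows. This is a minimal illustration, assuming the classification decisions are stored as a boolean array with one row per examinee and one column per statistic; the function name and the toy decisions are made up for the example and are not taken from the study.

```python
import numpy as np

def flag_and_agreement_matrix(decisions):
    """Build a square summary matrix from classification decisions.

    decisions: (n_examinees, n_statistics) boolean array, True = aberrant.
    Diagonal entries hold the proportion classified as aberrant by each
    statistic; off-diagonal entries hold the proportion of examinees on
    whom two statistics make the same decision.
    """
    d = np.asarray(decisions, dtype=bool)
    n_stats = d.shape[1]
    out = np.empty((n_stats, n_stats))
    for j in range(n_stats):
        for k in range(n_stats):
            if j == k:
                out[j, k] = d[:, j].mean()
            else:
                out[j, k] = (d[:, j] == d[:, k]).mean()
    return out

# Made-up decisions for 5 examinees and 2 statistics (not real data).
toy = np.array([
    [True, True],
    [True, False],
    [False, False],
    [False, False],
    [False, False],
])
matrix = flag_and_agreement_matrix(toy)
```

In the toy data the two statistics disagree on one of five examinees, so the off-diagonal agreement rate is 4/5.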
Note for Tables 3 and 4: CF = Cornish–Fisher expansion; $\chi^2$ = $\chi^2$ approximation; EW = Edgeworth expansion.
Across all person-fit statistics and skewness corrections, the proportions of flagged examinees classified as aberrant are much larger than the proportions of non-flagged examinees classified as aberrant. This result provides favorable evidence regarding the performance of the statistics. In addition, the proportions of non-flagged examinees classified as aberrant consistently exceed the significance levels. This result is notable because it suggests that aberrance is also present among some of the non-flagged examinees. For example, some of the non-flagged examinees may have engaged in fraudulent behavior, but were mistakenly not flagged by the testing program. It is also possible that some of the non-flagged examinees engaged in a different type of aberrant behavior altogether.
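The comparison between flagged and non-flagged examinees amounts to computing the classification rate separately within each group and setting the non-flagged rate against the nominal significance level. A minimal Python sketch, assuming two boolean vectors of equal length; the function name, variable names, and toy values are hypothetical and do not reproduce the study's data.

```python
import numpy as np

def flag_rates_by_group(classified_aberrant, flagged_by_program):
    """Proportion classified as aberrant within each examinee group.

    classified_aberrant: boolean vector, True if the person-fit
        statistic classified the examinee as aberrant.
    flagged_by_program: boolean vector, True if the testing program
        flagged the examinee.
    """
    c = np.asarray(classified_aberrant, dtype=bool)
    g = np.asarray(flagged_by_program, dtype=bool)
    return {"flagged": c[g].mean(), "non_flagged": c[~g].mean()}

# Made-up example with 10 examinees, 3 of them flagged by the program.
classified = [True, True, False, True, False, False, False, False, False, False]
program = [True, True, True, False, False, False, False, False, False, False]
rates = flag_rates_by_group(classified, program)

alpha = 0.05  # nominal significance level used for comparison
non_flagged_exceeds_alpha = rates["non_flagged"] > alpha
```

A non-flagged rate above the nominal level, as in this toy case, is the pattern the paragraph above interprets as possible undetected aberrance.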
Consistent with the simulation results, the $l_z^*$ statistic with no skewness correction classified the most examinees as aberrant, followed by the $l_z^*$ statistic corrected using the Cornish–Fisher expansion. Notably, all four variants of the $l_z^*$ statistic classified the same sets of flagged examinees as aberrant; however, the skewness-corrected statistics classified fewer non-flagged examinees as aberrant. This result shows that the new statistics may lead to noticeable differences in tests having as many as 170 items.
5. Discussion
Many popular person-fit statistics, including $l_z$, $\zeta_1$, and $\zeta_2$, belong to the class of standardized person-fit statistics, T, and are assumed to have a standard normal null distribution. However, in practice, this assumption is incorrect since T is computed using (a) an estimated ability parameter and (b) a finite number of items. In this paper, we proposed three new corrections for T that simultaneously account for the use of an estimated ability parameter and the use of a finite number of items. The new corrections are efficient in that they only require the analysis of the original data set and do not require the simulation or analysis of any additional data sets (as is the case for resampling-based methods). Detailed simulations further revealed that the new corrections are able to control the Type I error rate while also maintaining reasonable levels of power. They therefore outperform the existing corrections for T that were suggested by Bedrick (1997), Molenaar & Hoijtink (1990), and Snijders (2001).
Based on the results of the simulation study, we created the following set of guidelines for users to follow while selecting an appropriate person-fit statistic:
• When $\alpha \ge .10$, it is recommended that users apply the existing $T^*$ statistic of Snijders (2001).
• When $\alpha < .10$, it is recommended that users apply the newly proposed $T_\text{CF}^*$ statistic.
Note that the recommended statistics are those that were shown to display the largest power while still controlling the Type I error rate.
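The two guidelines above reduce to a single threshold rule on the significance level. The Python sketch below encodes that rule; the function name and the returned string labels are hypothetical conveniences, and the rule reflects only the recommendation stated in this section.

```python
def recommended_statistic(alpha):
    """Return a label for the statistic recommended at significance level alpha.

    "T*" stands for the existing statistic of Snijders (2001) and
    "T*_CF" for the Cornish-Fisher-corrected statistic proposed in the
    paper. Both labels are placeholders chosen for this sketch.
    """
    if not 0.0 < alpha < 1.0:
        raise ValueError("alpha must lie strictly between 0 and 1")
    return "T*" if alpha >= 0.10 else "T*_CF"

choice_at_05 = recommended_statistic(0.05)
choice_at_10 = recommended_statistic(0.10)
```

The boundary case of exactly .10 falls under the first guideline, so it maps to the existing statistic.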
We would also like to remind readers that person-fit statistics, by definition, are most appropriate when the goal is to detect general misfit at the person level. If the goal is to detect a specific type of misfit, such as item preknowledge or test speededness, or if the goal is to detect misfit at the person-by-item level, then alternative methods may be more suitable.
There are several limitations to this work, providing many opportunities for future research. First, shorter tests could be studied to explore the boundaries of the proposed methods and to determine whether there is a point at which they break down. It is also possible to study additional simulation conditions. For example, we simulated lack of motivation on the easiest items and item disclosure on the most difficult items, thereby considering extreme conditions in which ability estimates are severely impacted. However, in practice, such behaviors could happen on more than just the easiest and most difficult items. Therefore, future researchers could simulate less extreme conditions to compare the statistics under a more realistic setting. In our simulations, we also assumed that the item parameters were known. However, researchers such as Cheng & Yuan (2010) have shown that the error associated with item parameter estimation affects the distribution of the person parameter estimates. It would be interesting to study the extent to which this error affects the distributions of the person-fit statistics, as well.
Second, the efficient corrections that are described in this paper could be compared to the resampling-based methods that are described in Sinharay (2016a). The methods should be compared in terms of the false-positive rate, true-positive rate, and computation time. Third, the new corrections could be applied to other standardized person-fit statistics, such as the standardized infit and outfit statistics (Magis et al. 2014). Fourth, the new corrections could be applied using other ability estimates, such as the biweight estimate and the Huber estimate. Sinharay (2016d) found that the use of both estimates with $T^*$ produces inflated Type I error rates, suggesting that the new corrections may be particularly useful in these settings.
Finally, in this study, we developed corrections for the class of standardized person-fit statistics within a very narrow context: non-adaptive, unidimensional tests with only dichotomous items. Standardized person-fit statistics have been applied in other contexts, including adaptive tests (e.g., Nering 1997; van Krimpen-Stoop & Meijer 1999), multidimensional tests with simple structure (e.g., Albers et al. 2016; Hong et al. 2021), tests with polytomous items (e.g., Gorney & Wollack 2023; Hong et al. 2021; Sinharay 2016c; van Krimpen-Stoop & Meijer 2002; von Davier & Molenaar 2003), and tests with response times (e.g., Gorney et al. 2024). Standardized person-fit statistics have also been applied in cognitive diagnosis modeling (e.g., Santos et al. 2020). In each of these contexts, the assumption of the standard normal null distribution has been shown to be inappropriate when realistic testing conditions are simulated, suggesting that corrections may be beneficial.
Funding
This work was completed while the first author was an Educational Testing Service (ETS) Harold Gulliksen Psychometric Research Fellow.
Conflict of interest
The authors declare that they have no conflict of interest.
Data Availability
The data that support the findings of this study are available from Dr. James Wollack upon reasonable request.