
Efficient Corrections for Standardized Person-Fit Statistics

Published online by Cambridge University Press:  27 December 2024

Kylie Gorney*
Affiliation:
Michigan State University
Sandip Sinharay
Affiliation:
Educational Testing Service
Carol Eckerly
Affiliation:
Educational Testing Service
*
Correspondence should be made to Kylie Gorney, Department of Counseling, Educational Psychology, and Special Education, Michigan State University, 460 Erickson Hall, 620 Farm Lane, East Lansing, MI 48824, USA. Email: [email protected]

Abstract

Many popular person-fit statistics belong to the class of standardized person-fit statistics, T, and are assumed to have a standard normal null distribution. However, in practice, this assumption is incorrect since T is computed using (a) an estimated ability parameter and (b) a finite number of items. Snijders (Psychometrika 66(3):331–342, 2001) developed mean and variance corrections for T to account for the use of an estimated ability parameter. Bedrick (Psychometrika 62(2):191–199, 1997) and Molenaar and Hoijtink (Psychometrika 55(1):75–106, 1990) developed skewness corrections for T to account for the use of a finite number of items. In this paper, we combine these two lines of research and propose three new corrections for T that simultaneously account for the use of an estimated ability parameter and the use of a finite number of items. The new corrections are efficient in that they only require the analysis of the original data set and do not require the simulation or analysis of any additional data sets. We conducted a detailed simulation study and found that the new corrections are able to control the Type I error rate while also maintaining reasonable levels of power. A real data example is also included.

Type
Theory & Methods
Copyright
Copyright © 2024 The Author(s), under exclusive licence to The Psychometric Society

Person-fit statistics are used to identify individuals who are displaying aberrant (or unusual) behavior. Many of the most popular person-fit statistics, including $l_z$ (Drasgow et al. 1985) and $\zeta_1$ and $\zeta_2$ (Tatsuoka 1984), belong to the class of standardized person-fit statistics, T, and are assumed to have a standard normal null distribution. However, this distribution only holds when both of the following conditions are satisfied: (a) the true ability is known and is used to compute T and (b) an infinite number of items are available and are used to compute T. Numerous researchers have shown that when one or both of these conditions are not satisfied, the null distribution of T deviates from the standard normal distribution (e.g., Li & Olejnik 1997; Molenaar & Hoijtink 1990; Noonan et al. 1992; Reise 1995; Sinharay 2016b; Snijders 2001; van Krimpen-Stoop & Meijer 1999).
Thus, in practical settings where neither condition is satisfied (because the ability parameter is estimated and only a finite number of items are available), the assumption of a standard normal null distribution is incorrect and may lead to an inaccurate assessment of person fit. The person-fit assessment may be too liberal (resulting in an inflated Type I error rate), too conservative (resulting in an unnecessary sacrifice in power), or liberal at some significance levels and conservative at others.

Several corrections have been suggested to improve the accuracy of person-fit assessment when one or both of the above-mentioned conditions are not satisfied. Researchers such as de la Torre & Deng (2008), Glas & Meijer (2003), Sinharay (2016a), and van Krimpen-Stoop & Meijer (1999) proposed resampling-based methods that simultaneously account for the use of an estimated ability parameter and the use of a finite number of items. However, resampling-based methods require the simulation and analysis of several large data sets and are therefore computationally intensive. More efficient methods have been proposed by Magis et al. (2014), Sinharay (2016b), and Snijders (2001), who developed mean and variance corrections to account for the use of an estimated ability parameter, as well as Bedrick (1997) and Molenaar & Hoijtink (1990), who developed skewness corrections to account for the use of a finite number of items. These methods are efficient in that they only require the analysis of the original data set and do not require the simulation or analysis of any additional data sets. Notably, however, no efficient methods have been developed that simultaneously account for the use of an estimated ability parameter and the use of a finite number of items. The purpose of this paper is to fill this void in the literature.

In Sect. 1, we review the class of standardized person-fit statistics, T, as well as the existing corrections for T that account for either the use of an estimated ability parameter or the use of a finite number of items. In Sect. 2, we introduce three new corrections for T that simultaneously account for the use of an estimated ability parameter and the use of a finite number of items. All three corrections are computationally efficient. In Sect. 3, detailed simulations are conducted to (a) examine the null distributions and (b) compare the Type I error rates and power of the new and existing statistics. In Sect. 4, a real data example is provided. Finally, in Sect. 5, we conclude with a brief discussion and suggest directions for future research.

1. Background

Consider a test comprised of n items. Let $X_i$ denote the score on item i, and let $p_i(\theta) = P(X_i = 1 \mid \theta)$ denote the probability that item i is answered correctly given the ability parameter $\theta$. For example, for the three-parameter logistic model (3PLM),

$$p_i(\theta) = c_i + (1 - c_i) \frac{\exp[a_i(\theta - b_i)]}{1 + \exp[a_i(\theta - b_i)]}, \tag{1}$$

where $a_i$, $b_i$, and $c_i$ are the discrimination, difficulty, and pseudo-guessing parameters, respectively, of item i.
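As a minimal sketch of Eq. 1, the 3PLM response probability could be evaluated as follows. The function name `p_3pl` and the example parameter values are illustrative assumptions and do not come from the paper.

```python
import math


def p_3pl(theta, a, b, c):
    """Probability of a correct response under the 3PLM (Eq. 1).

    theta: ability; a: discrimination; b: difficulty; c: pseudo-guessing.
    """
    # exp(z) / (1 + exp(z)) is rewritten as 1 / (1 + exp(-z)).
    z = a * (theta - b)
    return c + (1.0 - c) / (1.0 + math.exp(-z))


# Hypothetical item with a = 1.2, b = 0.0, c = 0.2, evaluated at theta = b.
prob_at_difficulty = p_3pl(0.0, 1.2, 0.0, 0.2)
```

At $\theta = b_i$ the logistic term equals 1/2, so the probability reduces to $c_i + (1 - c_i)/2$, which is 0.6 for the hypothetical item above.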

1.1. Standardized Person-Fit Statistics

Consider the class of standardized person-fit statistics that was introduced by Snijders (2001) and takes the form

$$T(\theta) = \frac{W(\theta)}{\sqrt{\text{Var}(W(\theta))}}, \tag{2}$$

where

$$W(\theta) = \sum_{i=1}^n (X_i - p_i(\theta)) w_i(\theta) \tag{3}$$

for some suitable weight function $w_i(\theta)$. For the standardized log-likelihood statistic $l_z$ (Drasgow et al. 1985), the weight function is given by

$$w_i(\theta) = \log \frac{p_i(\theta)}{q_i(\theta)}, \tag{4}$$

where $q_i(\theta) = 1 - p_i(\theta)$. For the standardized extended caution indices $\zeta_1$ and $\zeta_2$ (Tatsuoka 1984), the weight functions are given by

$$w_i(\theta) = g - g_i \text{ and } w_i(\theta) = h(\theta) - p_i(\theta), \tag{5}$$

respectively, for

$$\begin{aligned} g_i &= \frac{1}{N} \sum_{v=1}^N p_i(\theta_v), \\ h(\theta_v) &= \frac{1}{n} \sum_{i=1}^n p_i(\theta_v), \text{ and} \\ g &= \frac{1}{n} \sum_{i=1}^n g_i = \frac{1}{N \times n} \sum_{v=1}^N \sum_{i=1}^n p_i(\theta_v) = \frac{1}{N} \sum_{v=1}^N h(\theta_v), \end{aligned}$$

where $\theta_v$ is the ability parameter of examinee v, and N is the total number of examinees.
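The quantities $g_i$, $h(\theta_v)$, and $g$ are simple row and column averages of the matrix of response probabilities. The sketch below illustrates this under the assumption that the probabilities $p_i(\theta_v)$ are already available as a nested list; the function name `caution_index_weights` and the example values are hypothetical.

```python
def caution_index_weights(probs):
    """Weights of Eq. 5 from probs[v][i] = p_i(theta_v).

    Returns (zeta1, zeta2, g, h_v), where zeta1[i] = g - g_i and
    zeta2[v][i] = h(theta_v) - p_i(theta_v).
    """
    n_examinees = len(probs)
    n_items = len(probs[0])
    # g_i: average probability of a correct response to item i over examinees.
    g_i = [sum(row[i] for row in probs) / n_examinees for i in range(n_items)]
    # h(theta_v): average probability over items for examinee v.
    h_v = [sum(row) / n_items for row in probs]
    # g: grand mean over examinees and items.
    g = sum(g_i) / n_items
    zeta1 = [g - gi for gi in g_i]
    zeta2 = [[h - p for p in row] for row, h in zip(probs, h_v)]
    return zeta1, zeta2, g, h_v


# Hypothetical probabilities for N = 2 examinees and n = 3 items.
example_probs = [[0.9, 0.6, 0.3],
                 [0.7, 0.4, 0.2]]
zeta1, zeta2, g, h_v = caution_index_weights(example_probs)
```

Note that the $\zeta_1$ weights are the same for every examinee, whereas the $\zeta_2$ weights depend on the examinee's ability through $h(\theta)$ and $p_i(\theta)$.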

Equation 3 implies that

$$\begin{aligned} E(W(\theta)) &= \mu(\theta), \\ \text{Var}(W(\theta)) &= E\left[ (W(\theta) - \mu(\theta))^2 \right] = \sigma^2(\theta), \\ \text{Skew}(W(\theta)) &= E\left[ \left( \frac{W(\theta) - \mu(\theta)}{\sigma(\theta)} \right)^3 \right] = \gamma(\theta), \text{ and} \\ \text{Kurt}(W(\theta)) &= E\left[ \left( \frac{W(\theta) - \mu(\theta)}{\sigma(\theta)} \right)^4 \right] = \kappa(\theta), \end{aligned}$$

where

$$\mu(\theta) = 0, \tag{6}$$
$$\sigma^2(\theta) = \sum_{i=1}^n p_i(\theta) q_i(\theta) w_i^2(\theta), \tag{7}$$
$$\gamma(\theta) = \frac{\sum_{i=1}^n p_i(\theta) q_i(\theta) (q_i(\theta) - p_i(\theta)) w_i^3(\theta)}{\sigma^3(\theta)}, \text{ and} \tag{8}$$
$$\kappa(\theta) = \frac{\sum_{i=1}^n p_i(\theta) q_i(\theta) (1 - 3 p_i(\theta) q_i(\theta)) w_i^4(\theta)}{\sigma^4(\theta)}. \tag{9}$$

Therefore, the standardized person-fit statistic of Eq. 2 can be expressed as

$$T(\theta) = \frac{W(\theta) - \mu(\theta)}{\sigma(\theta)}. \tag{10}$$
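To make Eqs. 3, 4, 6, 7, 8, and 10 concrete, the following sketch computes $l_z$ at a known ability and checks the closed-form mean, variance, and skewness of $W(\theta)$ against a brute-force sum over all $2^n$ response patterns. The function names, item parameters, and ability value are hypothetical; the sketch assumes the 3PLM of Eq. 1 and independent item responses.

```python
import math
from itertools import product


def p_3pl(theta, a, b, c):
    """3PLM response probability (Eq. 1)."""
    return c + (1.0 - c) / (1.0 + math.exp(-a * (theta - b)))


def lz_terms(theta, items):
    """Return the lists of p_i(theta) and l_z weights w_i(theta) (Eq. 4)."""
    ps = [p_3pl(theta, a, b, c) for a, b, c in items]
    ws = [math.log(p / (1.0 - p)) for p in ps]
    return ps, ws


def lz_moments(theta, items):
    """Variance (Eq. 7) and skewness (Eq. 8) of W(theta) for l_z."""
    ps, ws = lz_terms(theta, items)
    sigma2 = sum(p * (1 - p) * w ** 2 for p, w in zip(ps, ws))
    third = sum(p * (1 - p) * ((1 - p) - p) * w ** 3 for p, w in zip(ps, ws))
    return sigma2, third / sigma2 ** 1.5


def lz_statistic(x, theta, items):
    """T(theta) of Eq. 10, using mu(theta) = 0 from Eq. 6."""
    ps, ws = lz_terms(theta, items)
    sigma2, _ = lz_moments(theta, items)
    w_val = sum((xi - p) * w for xi, p, w in zip(x, ps, ws))
    return w_val / math.sqrt(sigma2)


def enumerated_moments(theta, items):
    """Mean, variance, and skewness of W(theta) over all 2^n patterns."""
    ps, ws = lz_terms(theta, items)
    m1 = m2 = m3 = 0.0
    for x in product((0, 1), repeat=len(items)):
        prob = math.prod(p if xi else 1.0 - p for xi, p in zip(x, ps))
        w_val = sum((xi - p) * w for xi, p, w in zip(x, ps, ws))
        m1 += prob * w_val
        m2 += prob * w_val ** 2
        m3 += prob * w_val ** 3
    # The mean is zero, so raw and central moments coincide.
    return m1, m2, m3 / m2 ** 1.5


# Hypothetical (a, b, c) parameters for a four-item test.
items = [(1.0, -1.0, 0.2), (1.5, 0.0, 0.2), (0.8, 0.5, 0.2), (1.2, 1.0, 0.2)]
theta = 0.3
sigma2, gamma = lz_moments(theta, items)
mean_enum, var_enum, skew_enum = enumerated_moments(theta, items)
t_consistent = lz_statistic([1, 1, 0, 0], theta, items)  # easy items correct
t_reversed = lz_statistic([0, 0, 1, 1], theta, items)    # hard items correct
```

For the $l_z$ weights, every term of the numerator in Eq. 8 is non-positive, so the skewness is negative for any finite test. The reversed pattern also yields a smaller value of the statistic than the consistent pattern, as expected for a statistic whose extreme negative values signal misfit.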

To classify a score pattern as aberrant, the significance probability p is computed as the probability that, under the null distribution, the test statistic T is at least as extreme as the observed value t in the direction that indicates misfit. That is,

$$p = \begin{cases} P(T \le t) & \text{if extreme negative values of } t \text{ indicate misfit,} \\ P(T \ge t) & \text{if extreme positive values of } t \text{ indicate misfit.} \end{cases} \tag{11}$$

For the $l_z$ statistic, extreme negative values indicate misfit. For the $\zeta_1$ and $\zeta_2$ statistics, extreme positive values indicate misfit.
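As an illustration of Eq. 11, the sketch below evaluates the significance probability when the null distribution is taken to be standard normal. The function names and the `misfit_tail` argument are hypothetical, and the standard normal reference is only the conventional assumption, not an exact null distribution.

```python
import math


def standard_normal_cdf(z):
    """Phi(z), computed from the error function."""
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))


def significance_probability(t, misfit_tail):
    """Eq. 11 under a standard normal reference distribution.

    misfit_tail: "lower" if extreme negative values indicate misfit
    (e.g., l_z), "upper" if extreme positive values indicate misfit
    (e.g., zeta_1 and zeta_2).
    """
    if misfit_tail == "lower":
        return standard_normal_cdf(t)
    if misfit_tail == "upper":
        return 1.0 - standard_normal_cdf(t)
    raise ValueError("misfit_tail must be 'lower' or 'upper'")


# Hypothetical observed values of the statistics.
p_lz = significance_probability(-1.645, "lower")
p_zeta = significance_probability(1.645, "upper")
```

Both calls return a value close to 0.05, because the two observed values sit at mirror-image positions in their respective misfit tails.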

Standardized person-fit statistics are assumed to have a standard normal null distribution. However, this distribution only holds when both of the following conditions are satisfied: (a) the true ability is known and is used to compute T and (b) an infinite number of items are available and are used to compute T. Figure 1 shows the null distribution of the $l_z$ statistic when both conditions are (almost) satisfied, that is, when the true ability and 500 items are used to compute $l_z$. Observe that the null distribution (dashed black line) is very close to the theorized standard normal distribution (solid black line).

Figure 1. The null distributions of $l_z$.

However, when condition (a) is not satisfied, that is, when the true ability is unknown and an ability estimate is used instead, the null distribution of T has a variance smaller than 1, as indicated by the dashed gray line in Fig. 1. Thus, the assumption of a standard normal null distribution (that has a variance of 1) leads to a conservative assessment of person fit. When condition (b) is not satisfied, that is, when a finite number of items are used to compute T, the null distribution of T is skewed, as indicated by the dotted black line in Fig. 1, which represents a 12-item test. The distribution is negatively skewed if extreme negative values of the statistic indicate misfit (e.g., $l_z$) or positively skewed if extreme positive values of the statistic indicate misfit (e.g., $\zeta_1$, $\zeta_2$). Thus, the assumption of a standard normal null distribution (that is not skewed) leads to a liberal assessment of person fit. When both (a) and (b) are not satisfied, the null distribution of T is skewed, has a variance smaller than 1, and has a mean that differs slightly from 0, as indicated by the dotted gray line in Fig. 1.
Thus, the assumption of a standard normal null distribution leads to either a liberal or a conservative assessment of person fit, depending on the chosen significance level: the use of smaller significance levels leads to a more liberal assessment of person fit, while the use of larger significance levels leads to a more conservative assessment.

In an effort to obtain a more accurate assessment of person fit, several researchers have proposed corrections for T. Mean and variance corrections have been suggested to account for the use of an estimated ability parameter. Skewness corrections have been suggested to account for the use of a finite number of items. The following subsections contain reviews of each of these corrections.

1.2. Mean and Variance Corrections

When the true ability is unknown and an ability estimate is used instead, a naïve approximation of $T(\theta)$ can be obtained by inserting $\hat{\theta}$ into Eq. 10. That is,

$$T(\hat{\theta}) = \frac{W(\hat{\theta}) - \mu(\hat{\theta})}{\sigma(\hat{\theta})}. \tag{12}$$

However, Snijders (2001) proved that replacing $\theta$ with $\hat{\theta}$ has a non-negligible effect on the variance of W, and therefore of T, even when an infinite number of items are used. Thus, the assumption that $T(\hat{\theta})$ has a standard normal null distribution, even asymptotically, is incorrect. Snijders further showed that if $\hat{\theta}$ satisfies the condition

$$r_0(\hat{\theta}) + \sum_{i=1}^n (X_i - p_i(\hat{\theta})) r_i(\hat{\theta}) = 0 \tag{13}$$

for some functions $r_0(\hat{\theta})$ and $r_i(\hat{\theta})$, then the mean and variance of $W(\hat{\theta})$ can be approximated using

$$\begin{aligned} E(W(\hat{\theta})) &\approx \tilde{\mu}(\hat{\theta}) \text{ and} \\ \text{Var}(W(\hat{\theta})) &\approx \tilde{\sigma}^2(\hat{\theta}), \end{aligned}$$

respectively, where

$$\tilde{\mu}(\hat{\theta}) = -c(\hat{\theta}) r_0(\hat{\theta}) \tag{14}$$

and

$$\tilde{\sigma}^2(\hat{\theta}) = \sum_{i=1}^n p_i(\hat{\theta}) q_i(\hat{\theta}) \tilde{w}_i^2(\hat{\theta}) \tag{15}$$

for

$$c(\hat{\theta}) = \frac{\sum_{i=1}^n p_i^\prime(\hat{\theta}) w_i(\hat{\theta})}{\sum_{i=1}^n p_i^\prime(\hat{\theta}) r_i(\hat{\theta})}, \tag{16}$$

where $p_i^\prime(\hat{\theta})$ is the first derivative of $p_i(\hat{\theta})$ with respect to $\hat{\theta}$, and the modified weight function is given by

$$\tilde{w}_i(\hat{\theta}) = w_i(\hat{\theta}) - c(\hat{\theta}) r_i(\hat{\theta}). \tag{17}$$

Therefore, the asymptotically correct statistic $T^*(\hat{\theta})$ can be derived from $T(\hat{\theta})$ by adjusting both the mean and variance of the statistic. In other words,

$$T^*(\hat{\theta}) = \frac{W(\hat{\theta}) - \tilde{\mu}(\hat{\theta})}{\tilde{\sigma}(\hat{\theta})} \tag{18}$$

has an asymptotic standard normal null distribution.
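The correction in Eqs. 14 through 18 amounts to a few weighted sums. The sketch below is one possible arrangement, assuming that the quantities $p_i$, $p_i^\prime$, $w_i$, $r_i$, and $r_0$ have already been evaluated at $\hat{\theta}$ and that $\hat{\theta}$ satisfies Eq. 13. The function name, the dictionary keys, and the three-item example are hypothetical.

```python
import math


def snijders_correction(x, p, dp, w, r, r0):
    """Naive and corrected statistics (Eqs. 12 and 18) at a given theta_hat.

    x: item scores; p: p_i(theta_hat); dp: p_i'(theta_hat);
    w: w_i(theta_hat); r: r_i(theta_hat); r0: r_0(theta_hat).
    The caller is responsible for supplying values that satisfy Eq. 13.
    """
    pq = [pi * (1.0 - pi) for pi in p]
    c_hat = (sum(d * wi for d, wi in zip(dp, w))
             / sum(d * ri for d, ri in zip(dp, r)))                 # Eq. 16
    w_tilde = [wi - c_hat * ri for wi, ri in zip(w, r)]            # Eq. 17
    w_val = sum((xi - pi) * wi for xi, pi, wi in zip(x, p, w))     # Eq. 3
    sigma2 = sum(v * wi ** 2 for v, wi in zip(pq, w))              # Eq. 7
    sigma2_tilde = sum(v * wt ** 2 for v, wt in zip(pq, w_tilde))  # Eq. 15
    mu_tilde = -c_hat * r0                                          # Eq. 14
    return {
        "c": c_hat,
        "w_tilde": w_tilde,
        "sigma2": sigma2,
        "sigma2_tilde": sigma2_tilde,
        "T": w_val / math.sqrt(sigma2),                             # Eq. 12
        "T_star": (w_val - mu_tilde) / math.sqrt(sigma2_tilde),     # Eq. 18
    }


# Hypothetical three-item example with r_i = 1 and r_0 = 0. The probabilities
# sum to the raw score (0.9 + 0.6 + 0.5 = 2), so Eq. 13 holds.
x = [1, 1, 0]
p = [0.9, 0.6, 0.5]
dp = [pi * (1.0 - pi) for pi in p]           # unit-slope logistic: p' = p q
w = [math.log(pi / (1.0 - pi)) for pi in p]  # l_z weights (Eq. 4)
result = snijders_correction(x, p, dp, w, r=[1.0, 1.0, 1.0], r0=0.0)
```

In this example the corrected variance is smaller than the naïve variance, so $T^*$ is larger in magnitude than $T$. This matches the earlier observation that using an ability estimate shrinks the variance of the naïve statistic below 1.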

The corrected statistic T \documentclass[12pt]{minimal} \usepackage{amsmath} \usepackage{wasysym} \usepackage{amsfonts} \usepackage{amssymb} \usepackage{amsbsy} \usepackage{mathrsfs} \usepackage{upgreek} \setlength{\oddsidemargin}{-69pt} \begin{document}$$T^*$$\end{document} can be computed using any ability estimate θ^ \documentclass[12pt]{minimal} \usepackage{amsmath} \usepackage{wasysym} \usepackage{amsfonts} \usepackage{amssymb} \usepackage{amsbsy} \usepackage{mathrsfs} \usepackage{upgreek} \setlength{\oddsidemargin}{-69pt} \begin{document}$$\hat{\theta }$$\end{document} that has functions r0(θ^) \documentclass[12pt]{minimal} \usepackage{amsmath} \usepackage{wasysym} \usepackage{amsfonts} \usepackage{amssymb} \usepackage{amsbsy} \usepackage{mathrsfs} \usepackage{upgreek} \setlength{\oddsidemargin}{-69pt} \begin{document}$$r_0(\hat{\theta })$$\end{document} and ri(θ^) \documentclass[12pt]{minimal} \usepackage{amsmath} \usepackage{wasysym} \usepackage{amsfonts} \usepackage{amssymb} \usepackage{amsbsy} \usepackage{mathrsfs} \usepackage{upgreek} \setlength{\oddsidemargin}{-69pt} \begin{document}$$r_i(\hat{\theta })$$\end{document} which satisfy Eq. 13. Magis et al. (Reference Magis, Raîche and Béland2012) showed that for the weighted likelihood (WL) estimate (Warm Reference Warm1989), maximum likelihood (ML) estimate, and maximum a posteriori (MAP) estimate, Eq. 13 is satisfied for

ri(θ^)=pi(θ^)pi(θ^)qi(θ^) \documentclass[12pt]{minimal} \usepackage{amsmath} \usepackage{wasysym} \usepackage{amsfonts} \usepackage{amssymb} \usepackage{amsbsy} \usepackage{mathrsfs} \usepackage{upgreek} \setlength{\oddsidemargin}{-69pt} \begin{document}$$\begin{aligned} r_i(\hat{\theta }) = \frac{p_i^\prime (\hat{\theta })}{p_i(\hat{\theta }) q_i(\hat{\theta })} \end{aligned}$$\end{document}

and

r0(θ^)=J(θ^)2I(θ^)ifθ^is the WL estimate,0ifθ^is the ML estimate,dlogf(θ^)dθ^ifθ^is the MAP estimate, \documentclass[12pt]{minimal} \usepackage{amsmath} \usepackage{wasysym} \usepackage{amsfonts} \usepackage{amssymb} \usepackage{amsbsy} \usepackage{mathrsfs} \usepackage{upgreek} \setlength{\oddsidemargin}{-69pt} \begin{document}$$\begin{aligned} r_0(\hat{\theta }) = {\left\{ \begin{array}{ll} \frac{J(\hat{\theta })}{2 I(\hat{\theta })} &{}\text {if }\hat{\theta }\hbox { is the WL estimate,} \\ 0 &{}\text {if }\hat{\theta }\text { is the ML estimate,} \\ \frac{d \log f(\hat{\theta })}{d \hat{\theta }} &{}\text {if } \hat{\theta }\hbox { is the MAP estimate,} \end{array}\right. } \end{aligned}$$\end{document}

where

$$J(\hat{\theta}) = \sum_{i=1}^n \frac{p_i^\prime(\hat{\theta})\, p_i^{\prime\prime}(\hat{\theta})}{p_i(\hat{\theta})\, q_i(\hat{\theta})}, \qquad I(\hat{\theta}) = \sum_{i=1}^n \frac{[p_i^\prime(\hat{\theta})]^2}{p_i(\hat{\theta})\, q_i(\hat{\theta})},$$

and $f(\cdot)$ is the prior distribution on $\theta$.
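As an illustration of the expressions above, the following Python sketch evaluates $r_i(\hat{\theta})$ and $r_0(\hat{\theta})$ for the three ability estimates. It is a minimal sketch under stated assumptions, not the authors' implementation: it assumes a two-parameter logistic item response function (chosen only because its derivatives have simple closed forms), a standard normal prior for the MAP case, and hypothetical function names `two_pl`, `r_i`, and `r_0`.

```python
import math


def two_pl(theta, a, b):
    """Return (p, p', p'') for a 2PL item (an assumption of this sketch)."""
    p = 1.0 / (1.0 + math.exp(-a * (theta - b)))
    q = 1.0 - p
    d1 = a * p * q                # first derivative of p with respect to theta
    d2 = a * a * p * q * (q - p)  # second derivative of p with respect to theta
    return p, d1, d2


def r_i(theta, a, b):
    """r_i = p' / (p q); for the 2PL this reduces to the discrimination a."""
    p, d1, _ = two_pl(theta, a, b)
    return d1 / (p * (1.0 - p))


def r_0(theta, items, estimator):
    """r_0 for the WL, ML, or MAP estimate; items is a list of (a, b) pairs."""
    if estimator == "ML":
        return 0.0
    if estimator == "MAP":
        # Derivative of the log standard normal density (assumed prior).
        return -theta
    if estimator == "WL":
        j_sum = 0.0
        i_sum = 0.0
        for a, b in items:
            p, d1, d2 = two_pl(theta, a, b)
            pq = p * (1.0 - p)
            j_sum += d1 * d2 / pq
            i_sum += d1 * d1 / pq
        return j_sum / (2.0 * i_sum)
    raise ValueError("estimator must be 'WL', 'ML', or 'MAP'")
```

For other item response models, only `two_pl` would need to be replaced by a function returning the corresponding probability and its first two derivatives.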

1.3. Skewness Corrections

Researchers such as Molenaar & Hoijtink (1990), Noonan et al. (1992), and van Krimpen-Stoop & Meijer (1999) have shown that the null distribution of T becomes more skewed as test length decreases. Therefore, the standard normal distribution (which is not skewed) provides a poor approximation for tests with fewer items. To obtain a more accurate approximation of the null distribution of T, several methods have been developed that take this skewness into account. These methods use naïve approximations of the mean, variance, and skewness of $W(\hat{\theta})$ that are obtained by inserting $\hat{\theta}$ into Eqs. 6, 7, and 8, respectively. For example, the naïve approximation of skewness is given by

$$\gamma(\hat{\theta}) = \frac{\sum_{i=1}^n p_i(\hat{\theta})\, q_i(\hat{\theta})\, (q_i(\hat{\theta}) - p_i(\hat{\theta}))\, w_i^3(\hat{\theta})}{\sigma^3(\hat{\theta})}. \tag{19}$$
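The naïve skewness in Eq. 19 can be sketched in a few lines of Python. This is an illustration under assumptions rather than the paper's code: the naïve variance is taken to be $\sigma^2(\hat{\theta}) = \sum_i p_i q_i w_i^2$ (the usual variance of a weighted sum of independent Bernoulli scores), the example weights $w_i = \log(p_i / q_i)$ are the log-odds weights commonly associated with $l_z$, and the function name `naive_skewness` is hypothetical.

```python
import math


def naive_skewness(p, w):
    """Eq. 19: sum of p q (q - p) w^3 divided by sigma^3.

    p: item success probabilities evaluated at the ability estimate.
    w: item weights evaluated at the ability estimate.
    The variance sigma^2 = sum p q w^2 is an assumed form.
    """
    q = [1.0 - pi for pi in p]
    variance = sum(pi * qi * wi ** 2 for pi, qi, wi in zip(p, q, w))
    third_moment = sum(pi * qi * (qi - pi) * wi ** 3 for pi, qi, wi in zip(p, q, w))
    return third_moment / variance ** 1.5


# Four fairly easy items with log-odds weights (assumed, for illustration).
p = [0.9, 0.8, 0.7, 0.6]
w = [math.log(pi / (1.0 - pi)) for pi in p]
gamma = naive_skewness(p, w)  # negative here, because every q - p is negative
```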

Molenaar & Hoijtink (1990) suggested two methods to approximate the null distribution of T. The first method is based on the Cornish–Fisher expansion. The skewness-corrected statistic is given by

$$T_\text{CF}(\hat{\theta}) = T(\hat{\theta}) - \frac{\gamma(\hat{\theta})[(T(\hat{\theta}))^2 - 1]}{12}, \tag{20}$$

which is equivalent to approximating the significance probability of Eq. 11 as

$$p_\text{CF} = \begin{cases} \Phi\left(T_\text{CF}(\hat{\theta})\right) & \text{if extreme negative values of } t \text{ indicate misfit,} \\ 1 - \Phi\left(T_\text{CF}(\hat{\theta})\right) & \text{if extreme positive values of } t \text{ indicate misfit,} \end{cases} \tag{21}$$

where $\Phi(\cdot)$ denotes the cumulative distribution function (CDF) of the standard normal distribution.

The second method of Molenaar & Hoijtink (1990) employs a higher-order approximation of the significance probability that is based on a $\chi^2$ distribution with $\nu(\hat{\theta}) = \frac{8}{\gamma^2(\hat{\theta})}$ degrees of freedom. Thus, the significance probability is approximated as

$$p_{\chi^2} = \begin{cases} P\left(\chi^2(\nu(\hat{\theta})) \ge \dfrac{|W(\hat{\theta}) - \mu(\hat{\theta}) - a(\hat{\theta})|}{b(\hat{\theta})}\right) & \text{if extreme negative values of } t \text{ indicate misfit,} \\ P\left(\chi^2(\nu(\hat{\theta})) \ge \dfrac{|W(\hat{\theta}) - \mu(\hat{\theta}) + a(\hat{\theta})|}{b(\hat{\theta})}\right) & \text{if extreme positive values of } t \text{ indicate misfit,} \end{cases} \tag{22}$$

where

$$a(\hat{\theta}) = b(\hat{\theta})\, \nu(\hat{\theta}) \quad \text{and} \quad b(\hat{\theta}) = \sqrt{\frac{\sigma^2(\hat{\theta})}{2 \nu(\hat{\theta})}}.$$
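One way to read these constants is as a moment-matching step. The following is a sketch of that reasoning, not a reproduction of the original derivation, and it suppresses the argument $\hat{\theta}$ for brevity. A scaled chi-square variable $b\chi^2_\nu$ has

$$\mathrm{E}\left[b\chi^2_\nu\right] = b\nu, \qquad \mathrm{Var}\left[b\chi^2_\nu\right] = 2 b^2 \nu, \qquad \mathrm{Skew}\left[b\chi^2_\nu\right] = \sqrt{\frac{8}{\nu}}.$$

Setting the skewness equal to $|\gamma|$ gives $\nu = 8 / \gamma^2$, setting the variance equal to $\sigma^2$ gives $b = \sqrt{\sigma^2 / (2\nu)}$, and $a = b\nu$ is the mean of the scaled variable, which is removed so that the approximating distribution is centered in the same way as $W - \mu$.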

A third method was suggested by Bedrick (1997), who used the Edgeworth expansion to approximate the significance probability as

$$p_\text{EW} = \begin{cases} \Phi(T(\hat{\theta})) - \dfrac{\phi(T(\hat{\theta}))\, \gamma(\hat{\theta})[(T(\hat{\theta}))^2 - 1]}{6} & \text{if extreme negative values of } t \text{ indicate misfit,} \\ 1 - \left(\Phi(T(\hat{\theta})) - \dfrac{\phi(T(\hat{\theta}))\, \gamma(\hat{\theta})[(T(\hat{\theta}))^2 - 1]}{6}\right) & \text{if extreme positive values of } t \text{ indicate misfit,} \end{cases} \tag{23}$$

where $\phi(\cdot)$ denotes the probability density function of the standard normal distribution. The Edgeworth expansion occasionally yields estimates of p that are smaller than 0 or larger than 1. In this paper, we replace such values with (traditional) estimates of p that are obtained by applying $T(\hat{\theta})$ under the assumption of a standard normal null distribution.
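The Edgeworth approximation in Eq. 23, together with the fallback to the standard normal approximation for out-of-range values, can be sketched as follows. This is an illustrative Python sketch using only the standard library; the function names `p_normal` and `p_edgeworth` and the `tail` argument are hypothetical and are not taken from the paper's code.

```python
from statistics import NormalDist

_STD_NORMAL = NormalDist()  # standard normal distribution


def p_normal(t, tail="negative"):
    """Traditional p-value under a standard normal null distribution."""
    lower = _STD_NORMAL.cdf(t)
    return lower if tail == "negative" else 1.0 - lower


def p_edgeworth(t, gamma, tail="negative"):
    """Eq. 23, with values outside [0, 1] replaced by the normal p-value.

    tail="negative": extreme negative values of the statistic indicate misfit.
    tail="positive": extreme positive values of the statistic indicate misfit.
    """
    lower = _STD_NORMAL.cdf(t) - _STD_NORMAL.pdf(t) * gamma * (t * t - 1.0) / 6.0
    p = lower if tail == "negative" else 1.0 - lower
    if p < 0.0 or p > 1.0:
        return p_normal(t, tail)
    return p
```

With a negative skewness and a negative statistic beyond $-1$, the corrected lower-tail probability is larger than the normal one, which reflects the heavier lower tail of a negatively skewed null distribution.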

We note that it may be desirable to transform the significance probability approximations in Eqs. 22 and 23 to person-fit statistics that are approximately standard normally distributed. If extreme negative values of t indicate misfit, the significance probability approximations can be transformed using the inverse CDF of the standard normal distribution, $\Phi^{-1}(p)$. If extreme positive values of t indicate misfit, the significance probability approximations can be transformed using $\Phi^{-1}(1-p)$.
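The inverse CDF transformation can be illustrated with a short Python sketch. The function name `p_to_statistic` and the `tail` argument are hypothetical; note that `NormalDist.inv_cdf` requires a probability strictly between 0 and 1, so boundary values would need separate handling.

```python
from statistics import NormalDist


def p_to_statistic(p, tail="negative"):
    """Map a significance probability to an approximately standard normal statistic.

    tail="negative": use the inverse CDF at p.
    tail="positive": use the inverse CDF at 1 - p.
    """
    std_normal = NormalDist()
    if tail == "negative":
        return std_normal.inv_cdf(p)
    return std_normal.inv_cdf(1.0 - p)
```

For example, a significance probability of 0.025 maps to roughly $-1.96$ when extreme negative values indicate misfit and to roughly $1.96$ when extreme positive values indicate misfit.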

2. Method

In the previous section, we reviewed (a) the class of standardized person-fit statistics T, (b) mean and variance corrections for T that account for the use of an estimated ability parameter, and (c) skewness corrections for T that account for the use of a finite number of items. In this section, we apply mean, variance, and skewness corrections to simultaneously account for the use of an estimated ability parameter and the use of a finite number of items.

We start by considering the class of mean and variance-corrected statistics $T^*$ that is given by Eq. 18. As was the case for the uncorrected statistic T, the corrected statistic $T^*$ is assumed to have a standard normal null distribution. However, this distribution only holds asymptotically, that is, when an infinite number of items are available. Figure 2 shows the null distribution of the $l_z^*$ statistic when 500 items are used to compute $l_z^*$. Observe that the null distribution (dashed gray line) is very close to the theorized standard normal distribution (solid black line).
Yet, when a finite number of items are used to compute $T^*$, the null distribution of $T^*$ is skewed, as indicated by the dotted gray line in Fig. 2. Thus, the assumption of a standard normal distribution (which is not skewed) leads to a liberal assessment of person fit, especially at smaller significance levels such as $\alpha = .01$ or $\alpha = .02$ (e.g., de la Torre & Deng 2008; Sinharay 2016b; Snijders 2001).

Figure 2 The null distributions of $l_z^*$.

In an effort to obtain a more accurate assessment of person fit, we introduce three new methods for approximating the null distribution of $T^*$ that take this skewness into account. These methods use asymptotic approximations of the mean, variance, and skewness of $W(\hat{\theta})$ that are given by Eqs. 14, 15, and

$$\tilde{\gamma}(\hat{\theta}) = \frac{\sum_{i=1}^n p_i(\hat{\theta})\, q_i(\hat{\theta})\, (q_i(\hat{\theta}) - p_i(\hat{\theta}))\, \tilde{w}_i^3(\hat{\theta})}{\tilde{\sigma}^3(\hat{\theta})}, \tag{24}$$

respectively.

The three new methods for approximating the null distribution of $T^*$ are heuristics that parallel the methods in Eqs. 21, 22, and 23 for approximating the null distribution of T. R code to implement the new methods is included in “Appendix A”.

The first of the three new methods is based on the Cornish–Fisher expansion. The skewness-corrected statistic is given by

$$T_\text{CF}^*(\hat{\theta}) = T^*(\hat{\theta}) - \frac{\tilde{\gamma}(\hat{\theta})[(T^*(\hat{\theta}))^2 - 1]}{12}, \tag{25}$$

which is equivalent to approximating the significance probability of Eq. 11 as

$$p_\text{CF}^* = \begin{cases} \Phi\left(T_\text{CF}^*(\hat{\theta})\right) & \text{if extreme negative values of } t^* \text{ indicate misfit,} \\ 1 - \Phi\left(T_\text{CF}^*(\hat{\theta})\right) & \text{if extreme positive values of } t^* \text{ indicate misfit.} \end{cases} \tag{26}$$
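Eqs. 25 and 26 can be sketched in Python as follows. This is an illustration only and is separate from the R code that accompanies the paper; it assumes that $T^*(\hat{\theta})$ and $\tilde{\gamma}(\hat{\theta})$ have already been computed, it uses the divisor exactly as printed in Eq. 25, and the function name `cornish_fisher` is hypothetical.

```python
from statistics import NormalDist


def cornish_fisher(t_star, gamma_tilde, tail="negative"):
    """Return (T*_CF, p*_CF) following Eqs. 25 and 26.

    t_star: mean and variance-corrected statistic T*.
    gamma_tilde: asymptotic skewness approximation from Eq. 24.
    tail: "negative" if extreme negative values indicate misfit, else "positive".
    """
    # Divisor 12 is taken as printed in Eq. 25.
    t_cf = t_star - gamma_tilde * (t_star ** 2 - 1.0) / 12.0
    lower = NormalDist().cdf(t_cf)
    p_cf = lower if tail == "negative" else 1.0 - lower
    return t_cf, p_cf


t_cf, p_cf = cornish_fisher(-2.0, -0.6)  # t_cf is -1.85 for these inputs
```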

The second method employs a higher-order approximation of the significance probability that is based on a $\chi^2$ distribution with $\tilde{\nu}(\hat{\theta}) = \frac{8}{\tilde{\gamma}^2(\hat{\theta})}$ degrees of freedom. Thus, the significance probability is approximated as

$$p_{\chi^2}^* = \begin{cases} P\left(\chi^2(\tilde{\nu}(\hat{\theta})) \ge \dfrac{|W(\hat{\theta}) - \tilde{\mu}(\hat{\theta}) - \tilde{a}(\hat{\theta})|}{\tilde{b}(\hat{\theta})}\right) & \text{if extreme negative values of } t^* \text{ indicate misfit,} \\ P\left(\chi^2(\tilde{\nu}(\hat{\theta})) \ge \dfrac{|W(\hat{\theta}) - \tilde{\mu}(\hat{\theta}) + \tilde{a}(\hat{\theta})|}{\tilde{b}(\hat{\theta})}\right) & \text{if extreme positive values of } t^* \text{ indicate misfit,} \end{cases} \tag{27}$$

where

$$\tilde{a}(\hat{\theta}) = \tilde{b}(\hat{\theta})\, \tilde{\nu}(\hat{\theta}) \quad \text{and} \quad \tilde{b}(\hat{\theta}) = \sqrt{\frac{\tilde{\sigma}^2(\hat{\theta})}{2 \tilde{\nu}(\hat{\theta})}}.$$

The third method uses the Edgeworth expansion to approximate the significance probability as

$$p_\text{EW}^* = \begin{cases} \Phi(T^*(\hat{\theta})) - \dfrac{\phi(T^*(\hat{\theta}))\, \tilde{\gamma}(\hat{\theta})[(T^*(\hat{\theta}))^2 - 1]}{6} & \text{if extreme negative values of } t^* \text{ indicate misfit,} \\ 1 - \left(\Phi(T^*(\hat{\theta})) - \dfrac{\phi(T^*(\hat{\theta}))\, \tilde{\gamma}(\hat{\theta})[(T^*(\hat{\theta}))^2 - 1]}{6}\right) & \text{if extreme positive values of } t^* \text{ indicate misfit.} \end{cases} \tag{28}$$

If the Edgeworth expansion yields estimates of p that are smaller than 0 or larger than 1, we replace such values with estimates of p that are obtained by applying $T^*(\hat{\theta})$ under the assumption of a standard normal null distribution.

The inverse CDF method can be used to transform the significance probability approximations in Eqs. 27 and 28 to person-fit statistics that are approximately standard normally distributed. If extreme negative values of $t^*$ indicate misfit, the significance probability approximations can be transformed using $\Phi^{-1}(p^*)$. If extreme positive values of $t^*$ indicate misfit, the significance probability approximations can be transformed using $\Phi^{-1}(1-p^*)$.

3. Simulation Study

3.1. Design and Analysis

We conducted a detailed simulation study to (a) examine the null distributions and (b) compare the Type I error rates and power of the new and existing statistics. The simulations were designed to mimic realistic testing conditions; therefore, an estimated ability parameter and a finite number of items were used to compute the person-fit statistics.

Three test lengths (12, 36, 72) were studied to represent short, medium, and long tests. For each test length, 1 million (10,000 examinees $\times$ 100 replications) score patterns were simulated. For each replication, new sets of person and item parameters were generated. As in Glas & Meijer (2003), 90% of the examinees were non-aberrant (that is, they fit the model) and were used to study the null distributions and Type I error rates of the statistics. The remaining 10% of the examinees were divided equally into four groups of aberrant examinees and were used to study power. The four groups of aberrant examinees were characterized by the type of aberrant behavior (lack of motivation, item disclosure) and by the proportion of contaminated items ($\frac{1}{6}$, $\frac{1}{3}$).

Uncontaminated item scores were generated using the 3PLM. For each replication, the item parameters were sampled such that $a_i \sim \text{Lognormal}(0, 0.25^2)$, $b_i \sim \mathcal{N}(0,1)$, and $c_i \sim \mathcal{U}(0.05, 0.30)$, as in Sinharay (2016b), and the person parameters were sampled such that $\theta_v \sim \mathcal{N}(0,1)$. Contaminated item scores were generated after manipulating the item success probabilities. As in Glas & Meijer (2003), lack of motivation was simulated as random guessing on the easiest items, with a success probability equal to 0.2. Item disclosure was simulated as preknowledge of the most difficult items, with a success probability equal to 0.9. By simulating lack of motivation on the easiest items and item disclosure on the most difficult items, we studied conditions in which ability estimates would be severely affected and in which aberrant behavior is therefore important to detect.
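The data-generating process described above can be sketched in Python as follows. This is a minimal sketch under several assumptions that are not stated in the text: the 3PLM is written in its logistic form without a scaling constant, item easiness is ranked by the difficulty parameter $b_i$ alone, the number of contaminated items is obtained by rounding, and all function names are hypothetical.

```python
import math
import random


def simulate_items(n_items, rng):
    """Sample (a, b, c) for each item from the distributions given in the text."""
    return [
        (rng.lognormvariate(0.0, 0.25), rng.gauss(0.0, 1.0), rng.uniform(0.05, 0.30))
        for _ in range(n_items)
    ]


def p_3pl(theta, a, b, c):
    """3PLM success probability (logistic form without a scaling constant; assumed)."""
    return c + (1.0 - c) / (1.0 + math.exp(-a * (theta - b)))


def simulate_scores(theta, items, rng, behavior=None, proportion=0.0):
    """Simulate one score pattern, optionally with contaminated items.

    behavior: None, "lack_of_motivation", or "item_disclosure".
    proportion: share of items that are contaminated (e.g., 1/6 or 1/3).
    """
    n = len(items)
    k = round(proportion * n) if behavior else 0
    order = sorted(range(n), key=lambda i: items[i][1])  # easiest items first
    if behavior == "lack_of_motivation":
        contaminated, p_fixed = set(order[:k]), 0.2      # guessing on easiest items
    elif behavior == "item_disclosure":
        contaminated, p_fixed = set(order[n - k:]), 0.9  # preknowledge of hardest items
    else:
        contaminated, p_fixed = set(), None
    scores = []
    for i, (a, b, c) in enumerate(items):
        p = p_fixed if i in contaminated else p_3pl(theta, a, b, c)
        scores.append(1 if rng.random() < p else 0)
    return scores


rng = random.Random(2024)               # seeded for reproducibility
items = simulate_items(12, rng)         # short test
theta = rng.gauss(0.0, 1.0)             # person parameter
clean = simulate_scores(theta, items, rng)
aberrant = simulate_scores(theta, items, rng, "item_disclosure", 1 / 3)
```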

After simulating the data, each of the score patterns was analyzed 72 times: once for each combination of two classes of person-fit statistics (T, $T^*$), three weight functions ($l_z$, $\zeta_1$, $\zeta_2$), four skewness corrections (none, Cornish–Fisher expansion, $\chi^2$ approximation, Edgeworth expansion), and three ability estimates (WL, ML, MAP). In all conditions, the item parameters were treated as known.
This assumption is common in person-fit research, as it prevents the null distribution of the statistics from being affected by any uncertainty in the item parameter estimates (e.g., Molenaar & Hoijtink 1990; Snijders 2001; van Krimpen-Stoop & Meijer 1999).
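The count of 72 analyses per score pattern follows from crossing the four design factors, as the following small Python sketch shows. The labels are shorthand chosen for this illustration.

```python
from itertools import product

statistic_classes = ["T", "T*"]
weight_functions = ["l_z", "zeta_1", "zeta_2"]
skewness_corrections = ["none", "Cornish-Fisher", "chi-square", "Edgeworth"]
ability_estimates = ["WL", "ML", "MAP"]

# Every combination of the four factors: 2 x 3 x 4 x 3 analysis conditions.
conditions = list(
    product(statistic_classes, weight_functions, skewness_corrections, ability_estimates)
)
print(len(conditions))  # 72
```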

The ML and MAP estimates of ability were bounded between $-4$ and 4. The standard normal distribution was used as the prior distribution for the MAP estimates. The standardized extended caution indices were computed after reversing the sign of the weights given in Eq. 5; thus, extreme negative values of the statistics indicated misfit. This adjustment was made to facilitate comparisons against the standardized log-likelihood statistics, for which extreme negative values also indicate misfit.

3.2. Results

The choice of ability estimate was found not to affect the relative performance of the person-fit statistics. Therefore, we focus on the WL estimates of ability (since they were computed without any bounds or prior distributions) and include results for the other ability estimates in “Appendix B”.

3.2.1. The Null Distributions of the Person-Fit Statistics

Figure 3 displays the first four moments of the null distributions of the person-fit statistics. Each row corresponds to a different moment (mean, variance, skewness, excess kurtosis), and each column corresponds to a different test length (12, 36, 72). Note that excess kurtosis is defined as the kurtosis minus 3. Horizontal dotted lines are used to indicate the values that are expected under the theoretical null distribution. It is desirable for the moments of the empirical null distributions to be as close to these values as possible.
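The four summaries described above can be computed from a vector of simulated statistic values as in the following Python sketch. It uses population (divide-by-$n$) central moments, which is an assumption of this illustration because the text does not state which estimators were used; the function name `first_four_moments` is hypothetical.

```python
def first_four_moments(values):
    """Return (mean, variance, skewness, excess kurtosis) of a sample.

    Central moments are divided by n (population form; assumed).
    Excess kurtosis is the kurtosis minus 3, so it is 0 for a normal distribution.
    """
    n = len(values)
    mean = sum(values) / n
    m2 = sum((v - mean) ** 2 for v in values) / n
    m3 = sum((v - mean) ** 3 for v in values) / n
    m4 = sum((v - mean) ** 4 for v in values) / n
    skewness = m3 / m2 ** 1.5
    excess_kurtosis = m4 / m2 ** 2 - 3.0
    return mean, m2, skewness, excess_kurtosis
```

Under a standard normal null distribution, the target values are 0, 1, 0, and 0, respectively.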

Figure 3 reveals that the null distribution of T (i.e., the uncorrected statistic given by Eq. 12) is negatively skewed, has a variance smaller than 1, and has a mean that is slightly larger than 0. Similar results are shown in Table 1 of Li & Olejnik (1997), Table 3 of Reise (1995), and Table 1 of van Krimpen-Stoop & Meijer (1999). The skewness-corrected statistics $T_\text{CF}$, $T_{\chi^2}$, and $T_\text{EW}$ return values of skewness that are closer to 0, but do not offer much improvement in terms of the mean or variance. Conversely, the mean and variance-corrected statistic $T^*$ (that is given by Eq. 18) has a variance that is closer to 1, but does not offer much improvement in terms of skewness.
Interestingly, although $l_z^*$ has a mean that is closer to 0, $\zeta_1^*$ and $\zeta_2^*$ have means that are farther from 0. Similar results are shown in Tables 1 and 2 of van Krimpen-Stoop & Meijer (1999) for $l_z^*$, and in Table 1 of Sinharay (2016b) for $\zeta_1^*$ and $\zeta_2^*$.

The newly proposed statistics $T_{\text{CF}}^*$, $T_{\chi^2}^*$, and $T_{\text{EW}}^*$ are the only statistics to incorporate mean, variance, and skewness corrections. Therefore, it is not surprising to see that these are the only statistics that improve both the skewness and the variance. Figure 3 also reveals that although these statistics have similar means and variances, $T_{\chi^2}^*$ is the most effective at reducing skewness, followed by $T_{\text{EW}}^*$ and then $T_{\text{CF}}^*$.
This finding parallels the results for the class of T statistics, as shown in Fig. 3 and in previous research (Bedrick 1997; Molenaar & Hoijtink 1990; Santos et al. 2020; von Davier & Molenaar 2003).
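The three descriptive statistics discussed above (mean, variance, and skewness) can be computed from any simulated null sample. The following is a minimal sketch, not the authors' code: the function name `describe_null` is hypothetical, and the sample is a standard normal stand-in rather than actual person-fit statistics.

```python
import numpy as np

def describe_null(values):
    """Return the mean, variance, and skewness of a sample of null statistics."""
    x = np.asarray(values, dtype=float)
    mean = x.mean()
    centered = x - mean
    variance = np.mean(centered**2)                   # population variance (ddof=0)
    skewness = np.mean(centered**3) / variance**1.5   # standardized third moment
    return mean, variance, skewness

# Hypothetical stand-in for a simulated null distribution (assumption: standard normal)
rng = np.random.default_rng(0)
null_sample = rng.standard_normal(100_000)
mean, variance, skewness = describe_null(null_sample)
print(f"mean={mean:.3f}, variance={variance:.3f}, skewness={skewness:.3f}")
```

For a statistic whose null distribution is well approximated by the standard normal, the three values should be close to 0, 1, and 0, respectively.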

Figure 3. Descriptive statistics of the null distributions of the person-fit statistics. CF: Cornish–Fisher expansion; $\chi^2$: $\chi^2$ approximation; EW: Edgeworth expansion.

3.2.2. Type I Error Rates

Figure 4 displays the Type I error rates of the person-fit statistics. Each row corresponds to a different significance level (.01, .02, .05, .10), and each column corresponds to a different test length (12, 36, 72). Horizontal dotted lines are used to indicate the significance levels. It is desirable for the Type I error rates to be at or below these lines.
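An empirical Type I error rate of this kind is the proportion of null (non-aberrant) replications that are flagged at a given significance level. The sketch below assumes that small values of a statistic indicate misfit and that the statistic is referred to the lower tail of the standard normal distribution; both are assumptions made for illustration, and the function name and the simulated sample are hypothetical.

```python
import numpy as np
from statistics import NormalDist

def type1_error_rate(null_stats, alpha):
    """Proportion of null statistics falling below the lower-tail critical value."""
    critical = NormalDist().inv_cdf(alpha)  # about -1.645 when alpha = .05
    return float(np.mean(np.asarray(null_stats, dtype=float) < critical))

# Hypothetical null sample (assumption: exactly standard normal)
rng = np.random.default_rng(1)
null_stats = rng.standard_normal(200_000)
for alpha in (0.01, 0.02, 0.05, 0.10):
    print(alpha, round(type1_error_rate(null_stats, alpha), 4))
```

When the reference distribution is accurate, each printed rate should sit close to its nominal level; departures in either direction correspond to the inflated or conservative patterns described for Figure 4.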

Figure 4. Type I error rates of the person-fit statistics. CF: Cornish–Fisher expansion; $\chi^2$: $\chi^2$ approximation; EW: Edgeworth expansion.

Figure 4 reveals that the Type I error rates of T (i.e., the uncorrected statistic) vary depending on the weight function that is used. For $l_z$, the Type I error rate is consistently smaller than the nominal level. This result can largely be attributed to the reduced variance of the statistic (see Fig. 3). In contrast, for $\zeta_1$ and $\zeta_2$, the Type I error rates are slightly larger than the nominal level when $\alpha$ is small and the test is short, but are close to or smaller than the nominal level in all other instances.
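The link between a reduced variance and a conservative test can be seen in a rough worked example. Suppose, as a simplifying assumption that ignores the skewness and the small mean shift visible in Fig. 3, that the null distribution of T is normal with mean 0 and standard deviation $\sigma < 1$, and that an examinee is flagged when T falls below the lower-tail standard normal critical value $z_\alpha = \Phi^{-1}(\alpha)$. Then, for $\alpha < .5$,

$$P(T < z_\alpha) = \Phi\!\left(\frac{z_\alpha}{\sigma}\right) < \Phi(z_\alpha) = \alpha ,$$

because $z_\alpha < 0$ and dividing by $\sigma < 1$ moves the argument farther into the lower tail. With the hypothetical values $\sigma = 0.9$ and $\alpha = .05$, the actual rate would be $\Phi(-1.645/0.9) \approx \Phi(-1.828) \approx .034$, which is below the nominal level.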

The Type I error rates of the skewness-corrected statistics $T_{\text{CF}}$, $T_{\chi^2}$, and $T_{\text{EW}}$ are always smaller than the Type I error rates of T. While this result is desirable in instances where the Type I error rates of T are inflated (e.g., when $\zeta_1$ and $\zeta_2$ are used with a small $\alpha$ and a short test), it is undesirable in instances where T is already conservative, which seems to be the more common case.
Therefore, $T_{\text{CF}}$, $T_{\chi^2}$, and $T_{\text{EW}}$ are of limited utility in the present context.

The Type I error rates of the mean and variance-corrected statistic $T^*$ are always larger than the Type I error rates of T. When $\alpha = .10$, this result is useful, since the Type I error rates of $T^*$ are closer to, but still do not exceed, the nominal level. In contrast, when $\alpha = .01$, .02, or .05, the Type I error rates of $T^*$ often exceed the nominal level, which is undesirable in practice.

The Type I error rates of the newly proposed statistics $T_{\text{CF}}^*$, $T_{\chi^2}^*$, and $T_{\text{EW}}^*$ are generally quite favorable and seem to overcome the limitations of the other corrected statistics. That is, the Type I error rates tend to be close to the nominal level (unlike $T_{\text{CF}}$, $T_{\chi^2}$, and $T_{\text{EW}}$), while still not exceeding it (unlike $T^*$). Of the three new statistics, $T_{\text{CF}}^*$ produces Type I error rates that are closest to the nominal level, followed by $T_{\text{EW}}^*$ when $\alpha = .01$, or $T_{\chi^2}^*$ when $\alpha = .02$, .05, or .10. These conclusions are the same regardless of test length or the choice of weight function.

Figure 5 displays the Type I error rates by quintile of the $l_z$ and $l_z^*$ statistics. (The results for $\zeta_1$, $\zeta_1^*$, $\zeta_2$, and $\zeta_2^*$ are similar and are available upon request from the first author.)
Quintiles were formed by separating examinees based on $\theta$, such that low-ability examinees were placed in Quintile 1, high-ability examinees were placed in Quintile 5, and the examinees in between were sorted accordingly. Notably, the Type I error rates of the newly proposed statistics are similar across ability levels.
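The quintile breakdown can be sketched as follows. This is an illustrative reconstruction under stated assumptions, not the authors' procedure: quintile boundaries are taken as the empirical 20th, 40th, 60th, and 80th percentiles of $\theta$, flagging uses a lower-tail standard normal critical value, and the function name and simulated inputs are hypothetical.

```python
import numpy as np
from statistics import NormalDist

def type1_by_quintile(theta, null_stats, alpha):
    """Type I error rate within each ability quintile (1 = lowest, 5 = highest)."""
    theta = np.asarray(theta, dtype=float)
    flagged = np.asarray(null_stats, dtype=float) < NormalDist().inv_cdf(alpha)
    # Interior cut points at the 20th, 40th, 60th, and 80th percentiles of theta
    cuts = np.quantile(theta, [0.2, 0.4, 0.6, 0.8])
    # searchsorted yields 0..4 for the five intervals; shift to 1..5
    quintile = np.searchsorted(cuts, theta, side="right") + 1
    return {q: float(flagged[quintile == q].mean()) for q in range(1, 6)}

# Hypothetical inputs: abilities and null statistics drawn independently
rng = np.random.default_rng(2)
theta = rng.standard_normal(50_000)
null_stats = rng.standard_normal(50_000)
rates = type1_by_quintile(theta, null_stats, alpha=0.05)
print(rates)
```

Roughly equal rates across the five keys would correspond to the pattern reported for the newly proposed statistics.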

3.2.3. Power

Table 1 displays the power of $l_z$ and $l_z^*$ at the $\alpha = .01$ significance level. (The results for $\zeta_1$, $\zeta_1^*$, $\zeta_2$, and $\zeta_2^*$ are similar and are available upon request.)
Each row corresponds to a different combination of aberrance and test length, and each column corresponds to a different combination of person-fit statistic and skewness correction. As expected, power increases as test length increases and as the proportion of contaminated items increases. Furthermore, across all conditions, the $l_z^*$ statistic with no skewness correction is the most powerful, followed closely by the $l_z^*$ statistic corrected using the Cornish–Fisher expansion, the $l_z^*$ statistic corrected using the Edgeworth expansion, and then the $l_z^*$ statistic corrected using the $\chi^2$ approximation. Thus, it appears that the new statistics can be used without a significant loss in power.
Similar results are shown in Table 2 at the $\alpha = .05$ significance level.

Figure 5. Type I error rates by quintile of $l_z$ and $l_z^*$. CF: Cornish–Fisher expansion; $\chi^2$: $\chi^2$ approximation; EW: Edgeworth expansion.

Table 1. Power of $l_z$ and $l_z^*$ ($\alpha = .01$).

CF: Cornish–Fisher expansion; $\chi^2$: $\chi^2$ approximation; EW: Edgeworth expansion.

Table 2. Power of $l_z$ and $l_z^*$ ($\alpha = .05$).

CF: Cornish–Fisher expansion; $\chi^2$: $\chi^2$ approximation; EW: Edgeworth expansion.

4. Real Data Example

4.1. Data and Analysis

The data in this example originate from a single form of a licensure examination. The data have been studied in the context of person-fit assessment by researchers such as Sinharay (2016b), and in the context of preknowledge detection in several chapters of Cizek & Wollack (2017). Item scores are available for 1644 examinees on 170 scored items. Following a statistical analysis and careful investigative process, the testing program flagged 61 items as being compromised, and 48 examinees as likely having engaged in fraudulent behavior. The 48 flagged examinees can be considered truly aberrant for purposes of the present analysis. However, it is important to note that other types of aberrance may be present among some of the non-flagged examinees, as well.

The Rasch model parameter estimates provided by the testing program were treated as the true item parameters. Then, using the WL estimates of ability, each score pattern was analyzed 24 times: once for each combination of two classes of person-fit statistics (T, $T^*$), three weight functions ($l_z$, $\zeta_1$, $\zeta_2$), and four skewness corrections (none, Cornish–Fisher expansion, $\chi^2$ approximation, Edgeworth expansion).
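The count of 24 analyses per score pattern follows from crossing the three factors (2 × 3 × 4). A small sketch of that design grid is given below; the string labels are illustrative placeholders, not identifiers from any software used by the authors.

```python
from itertools import product

# Illustrative labels for the three crossed factors (hypothetical names)
classes = ["T", "T*"]
weights = ["lz", "zeta1", "zeta2"]
corrections = ["none", "Cornish-Fisher", "chi-square", "Edgeworth"]

# Every (class, weight function, skewness correction) combination
conditions = list(product(classes, weights, corrections))
print(len(conditions))
for condition in conditions[:4]:
    print(condition)
```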

4.2. Results

Tables 3 and 4 display the proportions of examinees classified as aberrant and the agreement rates for $l_z$ and $l_z^*$ at the $\alpha = .01$ and $\alpha = .05$ significance levels, respectively.
(The results for $\zeta_1$, $\zeta_1^*$, $\zeta_2$, and $\zeta_2^*$ are similar and are available upon request.) In each table, the proportions of examinees classified as aberrant are displayed in bold text along the diagonal, and the agreement rates are displayed in non-bold text in the off-diagonal. Agreement rate is defined as the proportion of times two statistics make the same classification decision (aberrant or non-aberrant).
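The table layout described above can be sketched in a few lines. This is a minimal illustration of the stated definitions, assuming the classification decisions are stored as a boolean array with one row per examinee and one column per statistic; the function name and the toy data are hypothetical.

```python
import numpy as np

def agreement_table(flags):
    """Diagonal: proportion classified as aberrant; off-diagonal: agreement rate.

    flags is an (n_examinees, n_statistics) boolean array of decisions.
    """
    flags = np.asarray(flags, dtype=bool)
    n_stats = flags.shape[1]
    table = np.empty((n_stats, n_stats))
    for i in range(n_stats):
        for j in range(n_stats):
            if i == j:
                # Proportion of examinees that statistic i classifies as aberrant
                table[i, j] = flags[:, i].mean()
            else:
                # Proportion of examinees on which statistics i and j agree
                table[i, j] = np.mean(flags[:, i] == flags[:, j])
    return table

# Toy data: four examinees, two statistics
toy_flags = np.array([[1, 1], [0, 1], [0, 0], [0, 0]], dtype=bool)
table = agreement_table(toy_flags)
print(table)
```

In the toy data, the two statistics disagree on only the second examinee, so the off-diagonal agreement rate is 3 out of 4.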

Table 3. Proportions of examinees classified as aberrant (diagonal) and agreement rates (off-diagonal) ($\alpha = .01$).

CF: Cornish–Fisher expansion; $\chi^2$: $\chi^2$ approximation; EW: Edgeworth expansion.

Table 4. Proportions of examinees classified as aberrant (diagonal) and agreement rates (off-diagonal) ($\alpha = .05$).

CF: Cornish–Fisher expansion; $\chi^2$: $\chi^2$ approximation; EW: Edgeworth expansion.

Across all person-fit statistics and skewness corrections, the proportions of flagged examinees classified as aberrant are much larger than the proportions of non-flagged examinees classified as aberrant. This result provides favorable evidence regarding the performance of the statistics. In addition, the proportions of non-flagged examinees classified as aberrant consistently exceed the significance levels. This result is interesting, as it implies that aberrance is present among some of the non-flagged examinees, as well. For example, some of the non-flagged examinees may have engaged in fraudulent behavior, but were mistakenly not flagged by the testing program. It is also possible that some of the non-flagged examinees had engaged in a different type of aberrant behavior altogether.

Consistent with the simulation results, the $l_z^*$ statistic with no skewness correction classified the most examinees as aberrant, followed by the $l_z^*$ statistic corrected using the Cornish–Fisher expansion. Notably, all four variants of the $l_z^*$ statistic classified the same sets of flagged examinees as aberrant; however, the skewness-corrected statistics classified fewer non-flagged examinees as aberrant. This result shows that the new statistics may lead to noticeable differences in tests having as many as 170 items.

5. Discussion

Many popular person-fit statistics, including $l_z$, $\zeta_1$, and $\zeta_2$, belong to the class of standardized person-fit statistics, $T$, and are assumed to have a standard normal null distribution. However, in practice, this assumption is incorrect since $T$ is computed using (a) an estimated ability parameter and (b) a finite number of items. In this paper, we proposed three new corrections for $T$ that simultaneously account for the use of an estimated ability parameter and the use of a finite number of items. The new corrections are efficient in that they only require the analysis of the original data set and do not require the simulation or analysis of any additional data sets (as is the case for resampling-based methods). Detailed simulations further revealed that the new corrections are able to control the Type I error rate while also maintaining reasonable levels of power. They therefore outperform the existing corrections for $T$ that were suggested by Bedrick (1997), Molenaar & Hoijtink (1990), and Snijders (2001).
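As a concrete instance of a standardized person-fit statistic, the sketch below computes the uncorrected $l_z$ statistic in its usual form (the log-likelihood of the response pattern, centered by its expectation and scaled by its standard deviation) under a two-parameter logistic model. The function names, the choice of the 2PL model, and the example parameters are assumptions made for illustration. The ability and item parameters are treated as known, and none of the corrections discussed in this paper are applied.

```python
import math


def prob_correct(theta, a, b):
    """2PL probability of a correct response (illustrative model choice)."""
    return 1.0 / (1.0 + math.exp(-a * (theta - b)))


def lz(responses, theta, a_params, b_params):
    """Uncorrected l_z: standardized log-likelihood of a 0/1 response pattern."""
    loglik = expected = variance = 0.0
    for u, a, b in zip(responses, a_params, b_params):
        p = prob_correct(theta, a, b)
        q = 1.0 - p
        loglik += u * math.log(p) + (1 - u) * math.log(q)
        expected += p * math.log(p) + q * math.log(q)
        variance += p * q * math.log(p / q) ** 2
    # The variance is zero if every item has p = 0.5; this sketch does not guard against that.
    return (loglik - expected) / math.sqrt(variance)


# Hypothetical five-item test, with items ordered from easiest to hardest.
a_params = [1.0] * 5
b_params = [-2.0, -1.0, 0.0, 1.0, 2.0]
lz_consistent = lz([1, 1, 1, 0, 0], 0.0, a_params, b_params)
lz_reversed = lz([0, 0, 1, 1, 1], 0.0, a_params, b_params)
```

The pattern that misses the easy items and answers the hard ones correctly yields a large negative value, which is the direction in which misfit is usually flagged.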

Based on the results of the simulation study, we created the following set of guidelines for users to follow while selecting an appropriate person-fit statistic:

  • When $\alpha \ge .10$, it is recommended that users apply the existing $T^*$ statistic of Snijders (2001).

  • When $\alpha < .10$, it is recommended that users apply the newly proposed $T_\text{CF}^*$ statistic.

Note that the recommended statistics are those that were shown to display the largest power while still controlling the Type I error rate.
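The two guidelines above amount to a simple decision rule on the significance level. A minimal sketch follows; the function name and the string labels are hypothetical and serve only to make the rule explicit.

```python
def recommended_statistic(alpha):
    """Return a label for the statistic recommended at significance level alpha."""
    if not 0.0 < alpha < 1.0:
        raise ValueError("alpha must lie strictly between 0 and 1")
    # alpha >= .10: existing T* of Snijders (2001); otherwise the Cornish-Fisher variant.
    return "T*" if alpha >= 0.10 else "T*_CF"


choice_at_05 = recommended_statistic(0.05)
choice_at_10 = recommended_statistic(0.10)
```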

We would also like to remind readers that person-fit statistics, by definition, are most appropriate when the goal is to detect general misfit at the person level. If the goal is to detect a specific type of misfit, such as item preknowledge or test speededness, or if the goal is to detect misfit at the person-by-item level, then alternative methods may be more suitable.

There are several limitations to this work, providing many opportunities for future research. First, it is possible to study shorter tests, to explore the boundaries of the proposed methods to see if there is a point at which they fail or break down. It is also possible to study additional simulation conditions. For example, we simulated lack of motivation on the easiest items and item disclosure on the most difficult items, thereby considering extreme conditions in which ability estimates are severely impacted. However, in practice, such behaviors could happen on more than just the easiest and most difficult items. Therefore, future researchers could simulate less extreme conditions to compare the statistics under a more realistic setting. In our simulations, we also assumed that the item parameters were known. However, researchers such as Cheng & Yuan (2010) have shown that the error associated with item parameter estimation affects the distribution of the person parameter estimates. It would be interesting to study the extent to which this error affects the distributions of the person-fit statistics, as well.

Second, the efficient corrections that are described in this paper could be compared to the resampling-based methods that are described in Sinharay (2016a). The methods should be compared in terms of the false-positive rate, true-positive rate, and computation time. Third, the new corrections could be applied to other standardized person-fit statistics, such as the standardized infit and outfit statistics (Magis et al. 2014). Fourth, the new corrections could be applied using other ability estimates, such as the biweight estimate and the Huber estimate. Sinharay (2016d) found that the use of both estimates with $T^*$ produces inflated Type I error rates, suggesting that the new corrections may be particularly useful in these settings.
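The comparison criteria named above (false-positive rate, true-positive rate, and computation time) could be tabulated along the following lines. This is a minimal sketch: the helper names and the toy flags are assumptions, and in a simulation study the `aberrant` labels would come from the data-generating design.

```python
import time


def classification_rates(flagged, aberrant):
    """Return (false_positive_rate, true_positive_rate) for paired boolean lists."""
    false_pos = sum(1 for f, a in zip(flagged, aberrant) if f and not a)
    true_pos = sum(1 for f, a in zip(flagged, aberrant) if f and a)
    n_normal = sum(1 for a in aberrant if not a)
    n_aberrant = sum(1 for a in aberrant if a)
    fpr = false_pos / n_normal if n_normal else float("nan")
    tpr = true_pos / n_aberrant if n_aberrant else float("nan")
    return fpr, tpr


def timed(method, *args):
    """Run method(*args) and return (result, elapsed seconds)."""
    start = time.perf_counter()
    result = method(*args)
    return result, time.perf_counter() - start


# Toy data: five simulated examinees, two of whom are truly aberrant.
flagged = [True, False, True, False, False]
aberrant = [True, True, False, False, False]
(fpr, tpr), seconds = timed(classification_rates, flagged, aberrant)
```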

Finally, in this study, we developed corrections for the class of standardized person-fit statistics within a very narrow context: non-adaptive, unidimensional tests with only dichotomous items. Standardized person-fit statistics have been applied in other contexts, including adaptive tests (e.g., Nering 1997; van Krimpen-Stoop & Meijer 1999), multidimensional tests with simple structure (e.g., Albers et al. 2016; Hong et al. 2021), tests with polytomous items (e.g., Gorney & Wollack 2023; Hong et al. 2021; Sinharay 2016c; van Krimpen-Stoop & Meijer 2002; von Davier & Molenaar 2003), and tests with response times (e.g., Gorney et al. 2024). Standardized person-fit statistics have also been applied in cognitive diagnosis modeling (e.g., Santos et al. 2020). In each of these contexts, the assumption of the standard normal null distribution has been shown to be inappropriate when realistic testing conditions are simulated, suggesting that corrections may be beneficial.

Funding

This work was completed while the first author was an Educational Testing Service (ETS) Harold Gulliksen Psychometric Research Fellow.

Conflict of interest

The authors declare that they have no conflict of interest.

Data Availability

The data that support the findings of this study are available from Dr. James Wollack upon reasonable request.

Appendix A

R Code to Compute the Person-Fit Statistics

Appendix B

Ability Estimates

See Figs. 6 and 7.

Figure 6 Descriptive statistics of the null distributions of $l_z$ and $l_z^*$.

Figure 7 Type I error rates of $l_z$ and $l_z^*$.


References

Albers, C. J., Meijer, R. R., & Tendeiro, J. N. (2016). Derivation and applicability of asymptotic results for multiple subtests person-fit statistics. Applied Psychological Measurement, 40(4), 274–288.
Bedrick, E. J. (1997). Approximating the conditional distribution of person fit indexes for checking the Rasch model. Psychometrika, 62(2), 191–199.
Cheng, Y., & Yuan, K.-H. (2010). The impact of fallible item parameter estimates on latent trait recovery. Psychometrika, 75(2), 280–291.
Cizek, G. J., & Wollack, J. A. (Eds.). (2017). Handbook of quantitative methods for detecting cheating on tests. Routledge.
de la Torre, J., & Deng, W. (2008). Improving person-fit assessment by correcting the ability estimate and its reference distribution. Journal of Educational Measurement, 45(2), 159–177.
Drasgow, F., Levine, M. V., & Williams, E. A. (1985). Appropriateness measurement with polychotomous item response models and standardized indices. British Journal of Mathematical and Statistical Psychology, 38(1), 67–86.
Glas, C. A. W., & Meijer, R. R. (2003). A Bayesian approach to person fit analysis in item response theory models. Applied Psychological Measurement, 27(3), 217–233.
Gorney, K., Sinharay, S., & Liu, X. (2024). Using item scores and response times in person-fit assessment. British Journal of Mathematical and Statistical Psychology, 77(1), 151–168.
Gorney, K., & Wollack, J. A. (2023). Using item scores and distractors in person-fit assessment. Journal of Educational Measurement, 60(1), 3–27.
Hong, M., Lin, L., & Cheng, Y. (2021). Asymptotically corrected person fit statistics for multidimensional constructs with simple structure and mixed item types. Psychometrika, 86(2), 464–488.
Li, M. F., & Olejnik, S. (1997). The power of Rasch person-fit statistics in detecting unusual response patterns. Applied Psychological Measurement, 21(3), 215–231.
Magis, D., Béland, S., & Raîche, G. (2014). Snijders’s correction of the infit and outfit indices with estimated ability level: An analysis with the Rasch model. Journal of Applied Measurement, 15(1), 82–93.
Magis, D., Raîche, G., & Béland, S. (2012). A didactic presentation of Snijders’s $l_z^*$ index of person fit with emphasis on response model selection and ability estimation. Journal of Educational and Behavioral Statistics, 37(1), 57–81.
Molenaar, I. W., & Hoijtink, H. (1990). The many null distributions of person fit indices. Psychometrika, 55(1), 75–106.
Nering, M. L. (1997). The distribution of indexes of person fit within the computerized adaptive testing environment. Applied Psychological Measurement, 21(2), 115–127.
Noonan, B. W., Boss, M. W., & Gessaroli, M. E. (1992). The effect of test length and IRT model on the distribution and stability of three appropriateness indexes. Applied Psychological Measurement, 16(4), 345–352.
Reise, S. P. (1995). Scoring method and the detection of person misfit in a personality assessment context. Applied Psychological Measurement, 19(3), 213–229.
Santos, K. C. P., de la Torre, J., & von Davier, M. (2020). Adjusting person fit index for skewness in cognitive diagnosis modeling. Journal of Classification, 37(2), 399–420.
Sinharay, S. (2016a). Assessment of person fit using resampling-based approaches. Journal of Educational Measurement, 53(1), 63–85.
Sinharay, S. (2016b). Asymptotic corrections of standardized extended caution indices. Applied Psychological Measurement, 40(6), 418–433.
Sinharay, S. (2016c). Asymptotically correct standardization of person-fit statistics beyond dichotomous items. Psychometrika, 81(4), 992–1013.
Sinharay, S. (2016d). The choice of the ability estimate with asymptotically correct standardized person-fit statistics. British Journal of Mathematical and Statistical Psychology, 69(2), 175–193.
Snijders, T. A. B. (2001). Asymptotic null distribution of person fit statistics with estimated person parameter. Psychometrika, 66(3), 331–342.
Tatsuoka, K. K. (1984). Caution indices based on item response theory. Psychometrika, 49(1), 95–110.
van Krimpen-Stoop, E. M. L. A., & Meijer, R. R. (1999). The null distribution of person-fit statistics for conventional and adaptive tests. Applied Psychological Measurement, 23(4), 327–345.
van Krimpen-Stoop, E. M. L. A., & Meijer, R. R. (2002). Detection of person misfit in computerized adaptive testing with polytomous items. Applied Psychological Measurement, 26(2), 164–180.
von Davier, M., & Molenaar, I. W. (2003). A person-fit index for polytomous Rasch models, latent class models, and their mixture generalizations. Psychometrika, 68(2), 213–228.
Warm, T. A. (1989). Weighted likelihood estimation of ability in item response theory. Psychometrika, 54(3), 427–450.