Modeling and Testing Differential Item Functioning in Unidimensional Binary Item Response Models with a Single Continuous Covariate: A Functional Data Analysis Approach

Published online by Cambridge University Press: 01 January 2025

Yang Liu*
Affiliation:
University of California, Merced
Brooke E. Magnus
Affiliation:
The University of North Carolina at Chapel Hill
David Thissen
Affiliation:
The University of North Carolina at Chapel Hill
* Correspondence should be made to Yang Liu, School of Social Sciences, Humanities and Arts, University of California, Merced, 5200 North Lake Rd, Merced, CA 95343, USA. Email: [email protected]

Abstract

Differential item functioning (DIF), referring to between-group variation in item characteristics above and beyond the group-level disparity in the latent variable of interest, has long been regarded as an important item-level diagnostic. The presence of DIF impairs the fit of the single-group item response model being used, and calls for either model modification or item deletion in practice, depending on the mode of analysis. Methods for testing DIF with continuous covariates, rather than categorical grouping variables, have been developed; however, they are restrictive in their parametric forms, and thus are not sufficiently flexible to describe complex interactions among latent variables and covariates. In the current study, we formulate the probability of endorsing each test item as a general bivariate function of a unidimensional latent trait and a single covariate, which is then approximated by a two-dimensional smoothing spline. The accuracy and precision of the proposed procedure are evaluated via Monte Carlo simulations. If anchor items are available, we propose an extended model that simultaneously estimates item characteristic functions (ICFs) for anchor items, ICFs conditional on the covariate for non-anchor items, and the latent variable density conditional on the covariate, all using regression splines. A permutation DIF test is developed, and its performance is compared to the conventional parametric approach in a simulation study. We also illustrate the proposed semiparametric DIF testing procedure with an empirical example.
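
The general idea of a permutation test for DIF with a continuous covariate can be illustrated with a minimal sketch. This is not the procedure developed in the article: it assumes an observed rest score (the sum of the other items) as a crude stand-in for the latent trait, uses a simple within-stratum covariance as the test statistic rather than a spline-based one, and the function name `permutation_dif_test` and its parameters are hypothetical.

```python
import numpy as np


def permutation_dif_test(item, rest_score, covariate, n_perm=999, seed=0):
    """Illustrative permutation test for DIF on one binary item.

    Assumption: rest_score (a discrete proxy for the latent trait) defines
    strata, and the covariate is permuted only within strata, so that the
    covariate/trait relationship is roughly preserved under the null.
    """
    rng = np.random.default_rng(seed)
    item = np.asarray(item, dtype=float)
    covariate = np.asarray(covariate, dtype=float)
    rest_score = np.asarray(rest_score)

    # Index sets of respondents sharing the same rest score.
    strata = [np.flatnonzero(rest_score == s) for s in np.unique(rest_score)]

    def statistic(x):
        # Absolute pooled within-stratum covariance between item and covariate.
        total = 0.0
        for idx in strata:
            y_centered = item[idx] - item[idx].mean()
            x_centered = x[idx] - x[idx].mean()
            total += np.dot(y_centered, x_centered)
        return abs(total) / len(item)

    observed = statistic(covariate)

    exceed = 0
    for _ in range(n_perm):
        permuted = covariate.copy()
        for idx in strata:
            permuted[idx] = rng.permutation(covariate[idx])
        if statistic(permuted) >= observed:
            exceed += 1

    # Add-one correction keeps the p-value strictly positive.
    p_value = (exceed + 1) / (n_perm + 1)
    return observed, p_value
```

Permuting within strata, rather than across the whole sample, is the design choice that separates item-level DIF from a group-level difference in the trait itself; a fuller treatment would replace the rest-score strata with a model-based estimate of the latent variable.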

Type
Article
Copyright
Copyright © 2015 The Psychometric Society
