Algorithms for Measurement Invariance Testing: Contrasts and Connections

Veronica Cole; Conor H. Lacey

doi:10.1017/9781009303408

Series: Elements in Research Methods for Developmental Science

Published online by Cambridge University Press: 02 December 2023

Veronica Cole: Wake Forest University, North Carolina
Conor H. Lacey: Wake Forest University, North Carolina

Summary

Latent variable models are a powerful tool for measuring many of the phenomena in which developmental psychologists are interested. If these phenomena are not measured equally well among all participants, inferences about how they unfold throughout development will be biased. When no such bias is present, measurement invariance holds; when it is present, differential item functioning (DIF) occurs. This Element introduces the testing of measurement invariance and DIF through nonlinear factor analysis. After introducing the models used to study these questions, the Element uses them to formulate different definitions of measurement invariance and DIF. It also focuses on different procedures for locating and quantifying these effects. Finally, the Element provides recommendations for researchers on how to navigate these options and make valid inferences about measurement in their own data.

Keywords

differential item functioning; measurement invariance; measurement bias; latent variable models; item response theory

Type: Element

DOI: https://doi.org/10.1017/9781009303408

Online ISBN: 9781009303408

Publisher: Cambridge University Press

Print publication: 21 December 2023
