
References

Craig S. Wells, University of Massachusetts, Amherst

Book: Assessing Measurement Invariance for Applied Research
Publisher: Cambridge University Press
Print publication year: 2021
Published online by Cambridge University Press: 13 May 2021
Chapter DOI: https://doi.org/10.1017/9781108750561.009

Ackerman, T. A. (1992). A didactic explanation of item bias, item impact, and item validity from a multidimensional perspective. Journal of Educational Measurement, 29, 67–91.
Agresti, A. (2013). Categorical data analysis (3rd ed.). New York: Wiley-Interscience.
Allison, P. D. (2002). Missing data. Thousand Oaks, CA: Sage.
Allison, P. D. (2003). Missing data techniques for structural equation modeling. Journal of Abnormal Psychology, 112, 545–557.
American Educational Research Association, American Psychological Association, & National Council on Measurement in Education (2014). The standards for educational and psychological testing (3rd ed.). Washington, DC: American Educational Research Association.
Andrich, D. (1982). An extension of the Rasch model for ratings providing both location and dispersion parameters. Psychometrika, 47, 105–113.
Angoff, W. H. (1972, September). A technique for the investigation of cultural differences. Paper presented at the annual meeting of the American Psychological Association, Honolulu, HI.
Angoff, W. H. (1993). Perspective on differential item functioning methodology. In Holland, P. W. & Wainer, H. (eds.), Differential item functioning (pp. 3–24). Baltimore: Johns Hopkins University Press.
Baker, F. B. (1992). Item response theory: Parameter estimation techniques. New York: Marcel Dekker.
Baker, F. B., & Kim, S.-H. (2004). Item response theory: Parameter estimation techniques (2nd ed.). New York: Marcel Dekker.
Bandalos, D. L. (2014). Relative performance of categorical diagonally weighted least squares and robust maximum likelihood estimation. Structural Equation Modeling: A Multidisciplinary Journal, 21, 102–116.
Bandalos, D. L. (2018). Measurement theory and applications for the social sciences. New York: Guilford Press.
Bejar, I. I. (1983). Introduction to item response models and their assumptions. In Hambleton, R. K. (ed.), Applications of item response theory (pp. 1–23). Vancouver, BC: Educational Research Institute of British Columbia.
Benjamini, Y., & Hochberg, Y. (1995). Controlling the false discovery rate: A practical and powerful approach to multiple testing. Journal of the Royal Statistical Society, Series B, 57, 289–300.
Bentler, P. M. (2004). EQS 6 structural equation modeling manual. Encino, CA: Multivariate Software, Inc.
Birnbaum, A. (1968). Some latent trait models and their use in inferring an examinee’s ability. In Lord, F. M. & Novick, M. R. (eds.), Statistical theories of mental test scores (pp. 397–479). Reading, MA: Addison-Wesley.
Bock, R. D. (1972). Estimating item parameters and latent ability when responses are scored in two or more nominal categories. Psychometrika, 37, 29–51.
Bock, R. D., & Aitkin, M. (1981). Marginal maximum likelihood estimation of item parameters: An application of the EM algorithm. Psychometrika, 46, 443–459.
Bock, R. D., & Lieberman, M. (1970). Fitting a response model for n dichotomously scored items. Psychometrika, 35, 179–197.
Bollen, K. A. (1989). Structural equations with latent variables. New York: John Wiley & Sons.
Bolt, D. M. (2002). A Monte Carlo comparison of parametric and nonparametric polytomous DIF detection methods. Applied Measurement in Education, 15, 113–141.
Box, G. E. P., & Draper, N. R. (1987). Empirical model building and response surfaces. New York: John Wiley & Sons.
Bradlow, E. T., Wainer, H., & Wang, X. (1999). A Bayesian random effects model for testlets. Psychometrika, 64, 153–168.
Brown, T. A. (2015). Confirmatory factor analysis for applied research (2nd ed.). New York: Guilford Press.
Browne, M. W. (1984). Asymptotically distribution-free methods for the analysis of covariance structures. British Journal of Mathematical and Statistical Psychology, 37, 62–83.
Browne, M. W., & Cudeck, R. (1993). Alternative ways of assessing model fit. In Bollen, K. A. & Long, J. S. (eds.), Testing structural equation models (pp. 136–162). Newbury Park, CA: Sage.
Byrne, B. M. (1998). Structural equation modeling with LISREL, PRELIS, and SIMPLIS: Basic concepts, applications, and programming. New York: Routledge.
Byrne, B. M. (2012). Structural equation modeling with Mplus. New York: Routledge.
Cai, L. (2017). flexMIRT 3.5: Flexible multilevel multidimensional item analysis and test scoring [computer software]. Chapel Hill, NC: Vector Psychometric Group.
Camilli, G., & Shepard, L. A. (1994). Methods for identifying biased test items. Thousand Oaks, CA: Sage Publications.
Candell, G. L., & Drasgow, F. (1988). An iterative procedure for linking metrics and assessing item bias in item response theory. Applied Psychological Measurement, 12, 253–260.
Cervantes, V. H. (2017). DFIT: An R package for Raju’s differential functioning of items and tests framework. Journal of Statistical Software, 76, 1–24.
Chalmers, R. P. (2012). mirt: A multidimensional item response theory package for the R environment. Journal of Statistical Software, 48(6), 1–29.
Chalmers, R. P. (2018). Improving the crossing-SIBTEST statistic for detecting non-uniform DIF. Psychometrika, 83, 376–386.
Chen, F. F., Hayes, A., Carver, C. S., Laurenceau, J.-P., & Zhang, Z. (2012). Modeling general and specific variance in multifaceted constructs: A comparison of the bifactor model to other approaches. Journal of Personality, 80(1), 219–251.
Chen, F. F., West, S. G., & Sousa, K. H. (2006). A comparison of bifactor and second-order models of quality of life. Multivariate Behavioral Research, 41, 189–225.
Chen, W.-H., & Thissen, D. (1997). Local dependence indexes for item pairs using item response theory. Journal of Educational and Behavioral Statistics, 22(3), 265–289.
Cheung, G. W., & Rensvold, R. B. (2002). Evaluating goodness-of-fit indexes for testing measurement invariance. Structural Equation Modeling, 9(2), 233–255.
Chou, C. P., & Bentler, P. M. (1995). Estimates and tests in structural equation modeling. In Hoyle, R. H. (ed.), Structural equation modeling: Concepts, issues, and applications (pp. 37–55). Thousand Oaks, CA: Sage Publications.
Chou, C. P., Bentler, P. M., & Satorra, A. (1991). Scaled test statistics and robust standard errors for non-normal data in covariance structure analysis: A Monte Carlo study. British Journal of Mathematical and Statistical Psychology, 44, 347–357.
Clauser, B. E., Mazor, K., & Hambleton, R. K. (1993). The effects of purification of the matching criterion on the identification of DIF using the Mantel-Haenszel procedure. Applied Measurement in Education, 6, 269–279.
Cohen, J. (1994). The earth is round (p < .05). American Psychologist, 49, 997–1003.
Cole, N. S. (1978). Approaches to examining bias in achievement test items. Paper presented at the national meeting of the American Personnel and Guidance Association, Washington, DC.
Cook, L. L., & Eignor, D. R. (1991). An NCME instructional module on IRT equating methods. Educational Measurement: Issues and Practice, 10, 37–45.
Cotton, R. (2013). Learning R. Sebastopol, CA: O’Reilly.
Curran, P. J., West, S. G., & Finch, J. F. (1996). The robustness of test statistics to nonnormality and specification error in confirmatory factor analysis. Psychological Methods, 1, 16–29.
de Ayala, R. J. (2009). Theory and practice of item response theory. New York: Guilford Publications.
DeMars, C. (2010). Item response theory. New York: Oxford University Press.
Dolan, C. V. (1994). Factor analysis of variables with 2, 3, 5, and 7 response categories: A comparison of categorical variable estimators using simulated data. British Journal of Mathematical and Statistical Psychology, 47, 309–326.
Donoghue, J. R., & Allen, N. L. (1993). Thin versus thick matching in the Mantel-Haenszel procedure for detecting DIF. Journal of Educational Statistics, 18, 131–154.
Donoghue, J. R., Holland, P. W., & Thayer, D. T. (1993). A Monte Carlo study of factors that affect the Mantel-Haenszel and standardization measures of differential item functioning. In Holland, P. & Wainer, H. (eds.), Differential item functioning (pp. 137–166). Hillsdale, NJ: Erlbaum.
Dorans, N. J., & Holland, P. W. (1993). DIF detection and description: Mantel-Haenszel and standardization. In Holland, P. W. & Wainer, H. (eds.), Differential item functioning (pp. 35–66). Hillsdale, NJ: Erlbaum.
Dorans, N. J., & Kulick, E. (1986). Demonstrating the utility of the standardization approach to assessing unexpected differential item performance on the Scholastic Aptitude Test. Journal of Educational Measurement, 23, 355–368.
Dorans, N. J., & Schmitt, A. P. (1993). Constructed response and differential item functioning: A pragmatic approach. In Bennett, R. E. & Ward, W. C. (eds.), Construction versus choice in cognitive measurement: Issues in constructed response, performance testing, and portfolio assessment (pp. 135–165). Hillsdale, NJ: Erlbaum.
Dorans, N. J., Schmitt, A. P., & Bleistein, C. A. (1988). The standardization approach to assessing differential speededness (Research Report No. 88-31). Princeton, NJ: Educational Testing Service.
Douglas, J., & Cohen, A. S. (2001). Nonparametric item response function estimation for assessing parametric model fit. Applied Psychological Measurement, 25, 234–243.
Edgington, E. S. (1987). Randomization tests. New York: Marcel Dekker.
Embretson, S. E., & Reise, S. P. (2000). Item response theory for psychologists. Mahwah, NJ: Lawrence Erlbaum Associates.
Enders, C. K. (2010). Applied missing data analysis. New York: Guilford Press.
Engelhard, G. (2012). Invariant measurement: Using Rasch models in the social, behavioral, and health sciences. New York: Routledge.
Field, A., Miles, J., & Field, Z. (2012). Discovering statistics using R. Los Angeles: Sage.
Finch, H. (2005). The MIMIC model as a method for detecting DIF: Comparison with Mantel-Haenszel, SIBTEST, and the IRT likelihood-ratio. Applied Psychological Measurement, 29, 278–295.
Finney, S. J., & DiStefano, C. (2013). Nonnormal and categorical data in structural equation modeling. In Hancock, G. R. & Mueller, R. O. (eds.), Structural equation modeling: A second course (pp. 439–492). Charlotte, NC: Information Age Publishing.
Flora, D. B., & Curran, P. J. (2004). An empirical evaluation of alternative methods of estimation for confirmatory factor analysis with ordinal data. Psychological Methods, 9, 466–491.
Foss, T., Jöreskog, K. G., & Olsson, U. H. (2011). Testing structural equation models: The effect of kurtosis. Computational Statistics and Data Analysis, 55, 2263–2275.
French, B. F., & Maller, S. J. (2007). Iterative purification and effect size use with logistic regression for differential item functioning detection. Educational and Psychological Measurement, 67, 373–393.
Gierl, M. J., Gotzmann, A., & Boughton, K. A. (2004). Performance of SIBTEST when the percentage of DIF items is large. Applied Measurement in Education, 17, 241–264.
Goldstein, H. (1983). Measuring changes in educational attainment over time: Problems and possibilities. Journal of Educational Measurement, 20, 369–377.
Gómez-Benito, J., Hidalgo, M. D., & Zumbo, B. D. (2013). Effectiveness of combining statistical tests and effect sizes when using logistic discriminant function regression to detect differential item functioning for polytomous items. Educational and Psychological Measurement, 73, 875–897.
Gotzmann, A. J. (2001). The effect of large ability differences on Type I error and power rates using SIBTEST and TESTGRAF DIF detection procedures. Unpublished master’s thesis, University of Alberta, Edmonton, Alberta, Canada.
Gotzmann, A. J., & Boughton, K. A. (2004, April). A comparison of Type I error and power rates for the Mantel-Haenszel and SIBTEST procedures when group differences are large and unbalanced. Paper presented at the annual meeting of the American Educational Research Association, San Diego, CA.
Hambleton, R. K., Swaminathan, H., & Rogers, H. J. (1991). Fundamentals of item response theory. Newbury Park, CA: Sage Publications.
Harrell, F. E. (2020). rms: Regression modeling strategies. R package version 6.0-1.
Harrison, D. (1986). Robustness of IRT parameter estimation to violations of the unidimensionality assumption. Journal of Educational Statistics, 11, 91–115.
Holland, P. W. (1985). On the study of differential item performance without IRT. In Proceedings of the 27th annual conference of the Military Testing Association (Vol. 1, pp. 282–287). San Diego, CA.
Holland, P. W. (1989). A note on the covariance of the Mantel-Haenszel log-odds estimator and the sample marginal rates. Biometrics, 45, 1009–1015.
Holland, P. W., & Thayer, D. T. (1988). Differential item performance and the Mantel-Haenszel procedure. In Wainer, H. & Braun, H. (eds.), Test validity (pp. 129–145). Hillsdale, NJ: Erlbaum.
Holland, P. W., & Wainer, H. (1993). Differential item functioning. Hillsdale, NJ: Erlbaum.
Holm, S. (1979). A simple sequentially rejective multiple test procedure. Scandinavian Journal of Statistics, 6, 65–70.
Hu, L., & Bentler, P. M. (1999). Cutoff criteria for fit indexes in covariance structure analysis: Conventional criteria versus new alternatives. Structural Equation Modeling, 6(1), 1–55.
Hu, L., Bentler, P. M., & Kano, Y. (1992). Can test statistics in covariance structure analysis be trusted? Psychological Bulletin, 112, 351–362.
Hunter, J. E. (1975). A critical analysis of the use of item means and item-test correlations to determine the presence or absence of content bias in achievement test items. Paper presented at the National Institute of Education conference on test bias, Annapolis, MD.
Jodoin, M. G., & Gierl, M. J. (2001). Evaluating Type I error and power rates using an effect size measure with the logistic regression procedure for DIF detection. Applied Measurement in Education, 14, 329–349.
Kabacoff, R. I. (2011). R in action. Shelter Island, NY: Manning.
Kim, S.-H., & Cohen, A. S. (1993). A comparison of Lord’s χ2 and Raju’s area measures in detection of DIF. Applied Psychological Measurement, 17, 39–52.
Kim, S.-H., & Cohen, A. S. (1995). A comparison of Lord’s chi-square, Raju’s area measures, and the likelihood-ratio test on detection of differential item functioning. Applied Measurement in Education, 8, 291–312.
Kirk, R. E. (1995). Experimental design: Procedures for the behavioral sciences. Pacific Grove, CA: Brooks/Cole.
Kirk, R. E. (2012). Experimental design: Procedures for the behavioral sciences (4th ed.). Thousand Oaks, CA: Sage.
Klein, A., & Moosbrugger, H. (2000). Maximum likelihood estimation of latent interaction effects with the LMS method. Psychometrika, 65, 457–474.
Kline, R. B. (2015). Principles and practice of structural equation modeling (4th ed.). New York: Guilford Press.
Li, H., & Stout, W. (1996). A new procedure for detection of crossing DIF. Psychometrika, 61, 647–677.
Li, Y. H., & Lissitz, R. W. (2004). Applications of the analytically derived asymptotic standard errors of item response theory item parameter estimates. Journal of Educational Measurement, 41, 85–117.
Linn, R. L., Levine, M. V., Hastings, C. N., & Wardrop, J. L. (1981). An investigation of item bias in a test of reading comprehension. Applied Psychological Measurement, 5, 159–173.
Little, R. J. A., & Rubin, D. B. (2002). Statistical analysis with missing data (2nd ed.). Hoboken, NJ: Wiley.
Lord, F. M. (1977). A study of item bias using item characteristic curve theory. In Poortinga, Y. H. (ed.), Basic problems in cross-cultural psychology (pp. 19–29). Amsterdam: Swets & Zeitlinger.
Lord, F. M. (1980). Applications of item response theory to practical testing problems. Hillsdale, NJ: Lawrence Erlbaum Associates.
Lord, F. M. (1983). Small N justifies the Rasch model. In Weiss, D. J. (ed.), New horizons in testing: Latent trait test theory and computerized adaptive testing (pp. 51–61). New York: Academic Press.
MacCallum, R. C., Browne, M. W., & Sugawara, H. M. (1996). Power analysis and determination of sample size for covariance structure modeling. Psychological Methods, 1, 130–149.
Magis, D., & Facon, B. (2012). Angoff’s delta method revisited: Improving DIF detection under small samples. British Journal of Mathematical and Statistical Psychology, 65, 302–321.
Magis, D., & Facon, B. (2013). Item purification does not always improve DIF detection: A counter-example with Angoff’s Delta plot. Educational and Psychological Measurement, 73, 293–311.
Magis, D., & Facon, B. (2014). deltaPlotR: An R package for differential item functioning analysis with Angoff’s delta plot. Journal of Statistical Software, 59, 1–19.
Magis, D., Béland, S., Tuerlinckx, F., & De Boeck, P. (2010). A general framework and an R package for the detection of dichotomous differential item functioning. Behavior Research Methods, 42, 847–862.
Mantel, N. (1963). Chi-square tests with one degree of freedom: Extensions of the Mantel-Haenszel procedure. Journal of the American Statistical Association, 58, 690–700.
Mantel, N., & Haenszel, W. (1959). Statistical aspects of the analysis of data from retrospective studies of disease. Journal of the National Cancer Institute, 22, 719–748.
Marco, G. L. (1977). Item characteristic curve solutions to three intractable testing problems. Journal of Educational Measurement, 14, 139–160.
Masters, G. N. (1982). A Rasch model for partial credit scoring. Psychometrika, 47, 149–174.
McKinley, R. L. (1983, April). A multidimensional extension of the two-parameter logistic latent trait model. Paper presented at the annual meeting of the National Council on Measurement in Education, Montreal, Canada.
McKinley, R. L., & Reckase, M. D. (1983, August). An extension of the two-parameter logistic model to the multidimensional latent space (Research Report No. ONR83-2). Iowa City, IA: American College Testing Program.
Meade, A. W., Johnson, E. C., & Braddy, P. W. (2008). Power and sensitivity of alternative fit indices in tests of measurement invariance. Journal of Applied Psychology, 93(3), 568–592.
Mellenbergh, G. J. (1982). Contingency table methods for assessing item bias. Journal of Educational Statistics, 7, 105–118.
Micceri, T. (1989). The unicorn, the normal curve, and other improbable creatures. Psychological Bulletin, 105, 156–166.
Miller, T., Spray, J., & Wilson, A. (1992). A comparison of three methods for identifying nonuniform DIF in polytomously scored test items. Paper presented at the annual meeting of the Psychometric Society, Columbus, OH.
Millsap, R. E. (2011). Statistical approaches to measurement invariance. New York: Routledge.
Millsap, R. E., & Cham, H. (2012). Investigating factorial invariance in longitudinal data. In Laursen, B., Little, T. D., & Card, N. A. (eds.), Handbook of developmental research methods (pp. 109–126). New York: Guilford Press.
Millsap, R. E., & Olivera-Aguilar, M. (2012). Investigating measurement invariance using confirmatory factor analysis. In Hoyle, R. H. (ed.), Handbook of structural equation modeling (pp. 380–392). New York: Guilford Press.
Muñiz, J., Hambleton, R. K., & Xing, D. (2001). Small sample studies to detect flaws in item translations. International Journal of Testing, 1, 115–135.
Muraki, E. (1992). A generalized partial credit model: Application of an EM algorithm. Applied Psychological Measurement, 16, 159–176.
Muraki, E., & Bock, R. D. (2003). PARSCALE 4.1: IRT item analysis and test scoring for rating-scale data. Chicago: Scientific Software International.
Muthén, B. O., & Kaplan, D. (1985). A comparison of some methodologies for the factor analysis of nonnormal Likert variables. British Journal of Mathematical and Statistical Psychology, 38, 171–189.
Muthén, L. K., & Muthén, B. O. (2008–2017). Mplus user’s guide (5th ed.). Los Angeles, CA: Muthén & Muthén.
Nagelkerke, N. J. D. (1991). A note on a general definition of the coefficient of determination. Biometrika, 78, 691–692.
Nandakumar, R., & Roussos, L. (2001). CATSIB: A modified SIBTEST procedure to detect differential item functioning in computerized adaptive tests (Research Report). Newtown, PA: Law School Admission Council.
Narayanan, P., & Swaminathan, H. (1994). Performance of the Mantel-Haenszel and simultaneous item bias procedures for detecting differential item functioning. Applied Psychological Measurement, 20, 315–338.
Narayanan, P., & Swaminathan, H. (1996). Identification of items that show nonuniform DIF. Applied Psychological Measurement, 20, 257–274.
Nering, M. L., & Ostini, R. (2010). Handbook of polytomous item response theory models. New York: Routledge.
Nicewander, W. A. (2018). Conditional reliability coefficients for test scores. Psychological Methods, 23, 351–362.
Olsson, U. H., Foss, T., & Troye, S. V. (2003). Does the ADF fit function decrease when kurtosis increases? British Journal of Mathematical and Statistical Psychology, 56, 289–303.
Orlando, M., & Thissen, D. (2000). Likelihood-based item-fit indices for dichotomous item response theory models. Applied Psychological Measurement, 24, 50–64.
Oshima, T. C., Raju, N. S., & Nanda, A. O. (2006). A new method for assessing the statistical significance in the differential functioning of items and tests (DFIT) framework. Journal of Educational Measurement, 43(1), 1–17.
Osterlind, S. J., & Everson, H. T. (2009). Differential item functioning. Thousand Oaks, CA: Sage.
Patz, R., & Junker, B. (1999). A straightforward approach to Markov chain Monte Carlo methods for item response models. Journal of Educational and Behavioral Statistics, 24, 146–178.
Phillips, A., & Holland, P. W. (1987). Estimation of the variance of the Mantel-Haenszel log-odds-ratio estimate. Biometrics, 43, 425–431.
R Core Team. (2020). R: A language and environment for statistical computing. Vienna, Austria: R Foundation for Statistical Computing. www.R-project.org/
Raju, N. S. (1988). The area between two item characteristic curves. Psychometrika, 53, 495–502.
Raju, N. S. (1990). Determining the significance of estimated signed and unsigned areas between two item response functions. Applied Psychological Measurement, 14, 197–207.
Raju, N. S., van der Linden, W. J., & Fleer, P. F. (1995). IRT-based internal measures of differential functioning of items and tests. Applied Psychological Measurement, 19, 353–368.
Ramsay, J. O. (1991). Kernel smoothing approaches to nonparametric item characteristic curve estimation. Psychometrika, 56, 611–630.
Ramsay, J. O., & Silverman, B. W. (2005). Functional data analysis. New York: Springer.
Rasch, G. (1960). Probabilistic models for some intelligence and attainment tests. Copenhagen, Denmark: Danish Institute for Educational Research.
Reckase, M. D. (2009). Multidimensional item response theory. New York: Springer.
Reise, S. P. (2012). The rediscovery of bifactor measurement models. Multivariate Behavioral Research, 47(5), 667–696.
Reise, S. P., Moore, T. M., & Haviland, M. G. (2010). Bifactor models and rotations: Exploring the extent to which multidimensional data yield univocal scale scores. Journal of Personality Assessment, 92(6), 544–559.
Reise, S. P., Scheines, R., Widaman, K. F., & Haviland, M. G. (2013). Multidimensionality and structural coefficient bias in structural equation modeling: A bifactor perspective. Educational and Psychological Measurement, 73(1), 5–26.
Rhemtulla, M., Brosseau-Liard, P., & Savalei, V. (2012). When can categorical variables be treated as continuous? A comparison of robust continuous and categorical SEM estimation methods under suboptimal conditions. Psychological Methods, 17, 354–373.
Robin, F., Sireci, S. G., & Hambleton, R. K. (2003). Evaluating the equivalence of different language versions of a credentialing exam. International Journal of Testing, 3, 1–20.
Robins, J., Breslow, N., & Greenland, S. (1986). Estimators of the Mantel-Haenszel variance consistent in both sparse data and large-strata limiting models. Biometrics, 42, 311–323.
Rogers, H. J., & Swaminathan, H. (1993). A comparison of the logistic regression and Mantel-Haenszel procedures for detecting differential item functioning. Applied Psychological Measurement, 17, 105–116.
Rudner, L. M. (1977, April). An approach to biased item identification using latent trait measurement theory. Paper presented at the annual meeting of the American Educational Research Association, New York.
Rudner, L. M. (1978). Using standard tests with the hearing impaired: The problems of item bias. Volta Review, 80, 31–40.
Samejima, F. (1969). Estimation of latent trait ability using a response pattern of graded scores. Psychometrika Monograph, No. 17.
Samejima, F. (2010). The general graded response model. In Nering, M. L. & Ostini, R. (eds.), Handbook of polytomous item response theory models (pp. 77–107). New York: Routledge/Taylor & Francis Group.
Satorra, A., & Bentler, P. M. (1988). Scaling corrections for chi-square statistics in covariance structure analysis. Proceedings of the Business and Economic Statistics Section of the American Statistical Association, 36, 308–313.
Satorra, A., & Bentler, P. M. (1994). Corrections to test statistics and standard errors in covariance structure analysis. In von Eye, A. & Clogg, C. C. (eds.), Latent variables analysis (pp. 399–419). Thousand Oaks, CA: Sage Publications.
Satorra, A., & Bentler, P. M. (2010). Ensuring positiveness of the scaled chi-square test statistic. Psychometrika, 75, 243–248.
Serlin, R. C., & Lapsley, D. K. (1985). Rationality in psychological research: The good-enough principle. American Psychologist, 40, 73–83.
Serlin, R. C., & Lapsley, D. K. (1993). Rational appraisal of psychological research and the good-enough principle. In Keren, G. & Lewis, C. (eds.), A handbook for data analysis in the behavioral sciences (pp. 199–228). Hillsdale, NJ: Erlbaum.
Shealy, R., & Stout, W. (1993a). An item response theory model for test bias and differential item functioning. In Holland, P. & Wainer, H. (eds.), Differential item functioning (pp. 197–240). Hillsdale, NJ: Erlbaum.
Shealy, R., & Stout, W. (1993b). A model-based standardization approach that separates true bias/DIF. Psychometrika, 58, 159–194.
Shepard, L. A., Camilli, G., & Williams, D. M. (1984). Accounting for statistical artifacts in item bias research. Journal of Educational Statistics, 9, 93–128.
Shepard, L. A., Camilli, G., & Williams, D. M. (1985). Validity of approximation techniques for detecting item bias. Journal of Educational Measurement, 22, 77–106.
Sijtsma, K., & Molenaar, I. W. (2002). Introduction to nonparametric item response theory. Thousand Oaks, CA: Sage Publications.
Sinharay, S., Johnson, M. S., & Stern, H. S. (2006). Posterior predictive assessment of item response theory models. Applied Psychological Measurement, 30, 298–321.
Somes, G. W. (1986). The generalized Mantel-Haenszel statistic. American Statistician, 40, 106–108.
Spearman, C. (1904). The proof and measurement of association between two things. American Journal of Psychology, 15, 72–101.
Spearman, C. (1927). The abilities of man. New York: Macmillan.
Stark, S., Chernyshenko, O. S., & Drasgow, F. (2006). Detecting differential item functioning with confirmatory factor analysis and item response theory: Toward a unified strategy. Journal of Applied Psychology, 91, 1291–1306.
Steiger, J. H. (1990). Structural model evaluation and modification: An interval estimation approach. Multivariate Behavioral Research, 25, 173–180.
Steinberg, L., & Thissen, D. (2006). Using effect sizes for research reporting: Examples using item response theory to analyze differential item functioning. Psychological Methods, 11, 402–415.
Stocking, M. L., & Lord, F. M. (1983). Developing a common metric in item response theory. Applied Psychological Measurement, 7, 201–210.
Swaminathan, H., & Rogers, H. J. (1990). Detecting differential item functioning using logistic regression procedures. Journal of Educational Measurement, 27, 361–370.
Thissen, D. (2001). IRTLRDIF v2.05: Software for the computation of the statistics involved in item response theory likelihood-ratio tests for differential item functioning.
Thissen, D., & Orlando, M. (2001). Item response theory for items scored in two categories. In Thissen, D. & Wainer, H. (eds.), Test scoring (pp. 73–140). Mahwah, NJ: Lawrence Erlbaum Associates.
Thissen, D., & Steinberg, L. (1984). A response model for multiple choice items. Psychometrika, 49, 501–519.
Thissen, D., & Steinberg, L. (1986). Taxonomy of item response models. Psychometrika, 51, 567–578.
Thissen, D., Steinberg, L., & Wainer, H. (1988). Use of item response theory in the study of group differences in trace lines. In Wainer, H. & Braun, H. (eds.), Test validity (pp. 147–169). Hillsdale, NJ: Erlbaum.
Thissen, D., Steinberg, L., & Wainer, H. (1993). Detection of differential item functioning using the parameters of item response models. In Holland, P. & Wainer, H. (eds.), Differential item functioning (pp. 67–113). Hillsdale, NJ: Erlbaum.
Thomas, M. L. (2012). Rewards of bridging the divide between measurement and clinical theory: Demonstration of a bifactor model for the Brief Symptom Inventory. Psychological Assessment, 24(1), 101–113.
Thurstone, L. L. (1947). Multiple factor analysis. Chicago: University of Chicago Press.
Ullman, J. (2006). Structural equation modeling: Reviewing the basics and moving forward. Journal of Personality Assessment, 87, 35–50.
Uttaro, T., & Millsap, R. E. (1994). Factors influencing the Mantel-Haenszel procedure in the detection of differential item functioning. Applied Psychological Measurement, 18, 15–25.
van der Linden, W. J. (2016). Handbook of item response theory, volume one: Models. Monterey, CA: Taylor & Francis Group.
Wainer, H. (1993). Model-based standardized measurement of an item’s differential impact. In Holland, P. W. & Wainer, H. (eds.), Differential item functioning (pp. 123–135). Hillsdale, NJ: Erlbaum.
Wainer, H., & Wang, X. (2001). Using a new statistical model for testlets to score TOEFL (ETS TOEFL Technical Report TR-16). Princeton, NJ: Educational Testing Service.
Wainer, H., Bradlow, E. T., & Wang, X. (2007). Testlet response theory and its applications. New York: Cambridge University Press.
Wald, A. (1943). Tests of statistical hypotheses concerning several parameters when the number of observations is large. Transactions of the American Mathematical Society, 54, 426–482.
Wang, W. C. (2004). Effects of anchor item methods on detection of differential item functioning within the family of Rasch models. Journal of Experimental Education, 72, 221–261.
Wang, W. C., & Yeh, Y. L. (2004). Effects of anchor item methods on differential item functioning detection with the likelihood-ratio test. Applied Psychological Measurement, 27, 479–498.
Wingersky, M. S., Barton, M. A., & Lord, F. M. (1982). LOGIST user’s guide. Princeton, NJ: Educational Testing Service.
Wollack, J. A. (1997). A nominal response model approach for detecting answer copying. Applied Psychological Measurement, 21, 307–320.
Woods, C. M. (2009). Empirical selection of anchors for tests of differential item functioning. Applied Psychological Measurement, 33, 42–57.
Yen, W. M. (1981). Using simulation results to choose a latent trait model. Applied Psychological Measurement, 5, 245–262.
Yen, W. M. (1984). Effects of local item dependence on the fit and equating performance of the three-parameter logistic model. Applied Psychological Measurement, 8, 125–145.
Yen, W. M. (1993). Scaling performance assessments: Strategies for managing local item dependence. Journal of Educational Measurement, 30, 187–213.
Yu, C., & Muthén, B. (2002, April). Evaluation of model fit indices for latent variable models with categorical and continuous outcomes. Paper presented at the annual meeting of the American Educational Research Association, New Orleans, LA.
Zenisky, A. L., Hambleton, R. K., & Robin, F. (2003). Detection of differential item functioning in large-scale state assessments: A study evaluating a two-stage approach. Educational and Psychological Measurement, 63, 51–64.
Zumbo, B. D. (1999). A handbook on the theory and methods of differential item functioning (DIF): Logistic regression modeling as a unitary framework for binary and Likert-type (ordinal) item scores. Ottawa, ON: Directorate of Human Resources Research and Evaluation, Department of National Defense.
Zwick, R., & Thayer, D. T. (1996). Evaluating the magnitude of differential item functioning in polytomous items. Journal of Educational and Behavioral Statistics, 21, 187–201.
Zwick, R., Donoghue, J. R., & Grima, A. (1993). Assessing differential item functioning in performance tasks. Journal of Educational Measurement, 30, 233–251.
Zwick, R., Thayer, D. T., & Wingersky, M. (1993). A simulation study of methods for assessing differential item functioning in computer-adaptive tests (Research Report No. 93-11). Princeton, NJ: Educational Testing Service.
