Evaluating Test and Survey Items for Bias Across Languages and Cultures

doi:10.1017/CBO9780511779381.011

8 - Evaluating Test and Survey Items for Bias Across Languages and Cultures

Published online by Cambridge University Press: 05 June 2012

Stephen G. Sireci

Edited by David Matsumoto and Fons J. R. van de Vijver

David Matsumoto, San Francisco State University
Fons J. R. van de Vijver, Universiteit van Tilburg, The Netherlands

Summary

Introduction

The world is growing smaller at a rapid rate in the 21st century, so it is little wonder that interest and activity in cross-cultural research are at their peak. Examples include international comparisons of educational achievement, the exploration of personality constructs across cultures, and investigations of employees’ opinions, attitudes, and skills by multinational companies. In many, if not all, of these instances, the research involves measuring psychological attributes across people who have very different cultural backgrounds and who often speak different languages. This cultural and linguistic diversity poses significant challenges for researchers who strive to standardize measures across research participants. In fact, standardization of measures, the backbone of scientific research in psychology, may itself lead to significant biases in the interpretation of results if the measuring instruments do not take linguistic and cultural differences into account.

The International Test Commission (ITC) has long pointed out problems in measuring educational and psychological constructs across languages and cultures. Such problems are also well documented in the Standards for Educational and Psychological Testing (American Educational Research Association [AERA], American Psychological Association, & National Council on Measurement in Education, 1999). For example, the ITC's Guidelines for Adapting Educational and Psychological Tests (Hambleton, 2005; ITC, 2001) recommend numerous checks on the quality of measurement instruments that are adapted for use across languages. These checks include careful evaluation of the translation process and statistical analysis of test and item response data to evaluate test and item comparability. Many of them are echoed by the Standards. Table 8.1 presents brief excerpts from the Guidelines and Standards that pertain to maximizing measurement equivalence across languages and cultures while ruling out measurement bias. As Table 8.1 shows, both qualitative and quantitative procedures are recommended for a comprehensive evaluation of test comparability across languages. The qualitative procedures involve careful translation and adaptation designs and comprehensive evaluation of the different language versions of a test. The quantitative procedures include dimensionality analyses to evaluate construct equivalence, differential predictive validity analyses to evaluate the consistency of test-criterion relationships across test versions, and differential item functioning procedures to evaluate potential item bias.
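As a rough illustration of what a differential item functioning (DIF) check can look like, the sketch below computes the Mantel–Haenszel common odds ratio for a single dichotomous item, matching examinees on total test score, and rescales it with the commonly used ETS delta transformation (-2.35 times the natural log of the odds ratio). It is a minimal sketch, not the procedure prescribed by the Guidelines or the Standards: the function names, the "reference"/"focal" labels, and the toy counts are all assumptions made for the example, and a real analysis would add a significance test and effect-size classification.

```python
import math
from collections import defaultdict


def mantel_haenszel_dif(records):
    """Mantel-Haenszel DIF statistics for one dichotomous item.

    records: iterable of (group, total_score, correct), where group is
    "reference" or "focal" (hypothetical labels) and correct is 0 or 1.
    Returns (common odds ratio, delta-scale value).
    """
    # Per score stratum: [A, B, C, D] =
    # [reference correct, reference incorrect, focal correct, focal incorrect]
    counts = defaultdict(lambda: [0, 0, 0, 0])
    for group, score, correct in records:
        column = (0 if group == "reference" else 2) + (0 if correct else 1)
        counts[score][column] += 1

    numerator = 0.0    # sum over strata of A * D / N
    denominator = 0.0  # sum over strata of B * C / N
    for a, b, c, d in counts.values():
        n = a + b + c + d
        numerator += a * d / n
        denominator += b * c / n

    if numerator == 0 or denominator == 0:
        raise ValueError("odds ratio is undefined for these counts")

    alpha = numerator / denominator
    # Delta-scale transformation; negative values mean the item is
    # relatively harder for the focal group at equal total scores.
    return alpha, -2.35 * math.log(alpha)


def expand(stratum_counts):
    """Turn {score: (A, B, C, D)} toy counts into individual records."""
    records = []
    for score, (a, b, c, d) in stratum_counts.items():
        records += [("reference", score, 1)] * a
        records += [("reference", score, 0)] * b
        records += [("focal", score, 1)] * c
        records += [("focal", score, 0)] * d
    return records


# Invented data: at each matched score level the reference group answers
# the item correctly more often than the focal group.
toy_counts = {3: (3, 1, 1, 3), 7: (3, 1, 1, 3)}
alpha, delta = mantel_haenszel_dif(expand(toy_counts))
print(f"odds ratio = {alpha:.2f}, delta = {delta:.2f}")
```

An odds ratio near 1 (delta near 0) is what one would expect if the original and adapted versions of an item behave comparably for examinees of equal overall proficiency; large departures flag the item for the kind of qualitative translation review described above.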

Type: Chapter
Information: Cross-Cultural Research Methods in Psychology, pp. 216-240

DOI: https://doi.org/10.1017/CBO9780511779381.011

Publisher: Cambridge University Press

Print publication year: 2010

References

Allalouf, A., Hambleton, R. K., & Sireci, S. G. (1999). Identifying the sources of differential item functioning in translated verbal items. Journal of Educational Measurement, 36, 185.

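American Educational Research Association, American Psychological Association, & National Council on Measurement in Education. (1999). Standards for educational and psychological testing.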

Angoff, W. H. (1972). Use of difficulty and discrimination indices for detecting item bias. In R. A. Berk (Ed.), Handbook of methods for detecting test bias, 96. Baltimore: Johns Hopkins University Press.

Angoff, W. H., & Cook, L. L. (1988). Report No. 88-2. New York: College Entrance Examination Board.

Angoff, W. H., & Modu, C. C. (1973). Equating the scores of the Prueba de Aptitud Academica and the Scholastic Aptitude Test. New York: College Entrance Examination Board.

Brislin, R. W. (1970). Back-translation for cross-cultural research. Journal of Cross-Cultural Psychology, 1, 185.

Budgell, G., Raju, N., & Quartetti, D. (1995). Analysis of differential item functioning in translated assessment instruments. Applied Psychological Measurement, 19, 309.

Camilli, G., & Shepard, L. A. (1994). Methods for identifying biased test items. Thousand Oaks, CA: Sage.

Clauser, B. E., & Mazor, K. M. (1998). Using statistical procedures to identify differentially functioning test items. Educational Measurement: Issues and Practice, 17, 31.

Cronbach, L. J., & Meehl, P. E. (1955). Construct validity in psychological tests. Psychological Bulletin, 52, 281.

Day, S. X., & Rounds, J. (1998). Universality of vocational interest structure among racial and ethnic minorities. American Psychologist, 53, 728.

Dorans, N. J., & Holland, P. W. (1993). DIF detection and description: Mantel–Haenszel and standardization. In P. W. Holland & H. Wainer (Eds.), Differential item functioning, 35. Hillsdale, NJ: Erlbaum.

Dorans, N. J., & Kulick, E. (1986). Demonstrating the utility of the standardization approach to assessing unexpected differential item performance on the Scholastic Aptitude Test. Journal of Educational Measurement, 23, 355.

Fischer, G. (1993). Notes on the Mantel–Haenszel procedure and another chi-squared test for the assessment of DIF. Methodika, 7, 88.

Geisinger, K. F. (1994). Cross-cultural normative assessment: Translation and adaptation issues influencing the normative interpretation of assessment instruments. Psychological Assessment, 6, 304.

Gierl, M. J., & Khaliq, S. N. (2001). Identifying sources of differential item and bundle functioning on translated achievement tests: A confirmatory analysis. Journal of Educational Measurement, 38, 164.

Hambleton, R. K. (1993). Translating achievement tests for use in cross-national studies. European Journal of Psychological Assessment, 9, 57.

Hambleton, R. K. (1994). Guidelines for adapting educational and psychological tests: A progress report. European Journal of Psychological Assessment, 10, 229.

Hambleton, R. K. (2005). Issues, designs, and technical guidelines for adapting tests into multiple languages and cultures. In R. K. Hambleton, P. Merenda, & C. Spielberger (Eds.), Adapting educational and psychological tests for cross-cultural assessment, 3. Hillsdale, NJ: Erlbaum.

Hambleton, R. K., Sireci, S. G., & Robin, F. (1999). Adapting credentialing exams for use in multiple languages. CLEAR Exam Review, 10, 24.

Hauger, J. B., & Sireci, S. G. (2008). Detecting differential item functioning across examinees tested in their dominant language and examinees tested in a second language. International Journal of Testing, 8, 237.

Holland, P. W., & Thayer, D. T. (1988). Differential item functioning and the Mantel–Haenszel procedure. In H. Wainer & H. I. Braun (Eds.), Test validity, 129. Hillsdale, NJ: Erlbaum.

Holland, P. W., & Wainer, H. (Eds.). (1993). Differential item functioning. Hillsdale, NJ: Erlbaum.

International Test Commission. (2001). International Test Commission guidelines for test adaptation. London: Author.

Jodoin, M. G., & Gierl, M. J. (2001). Evaluating power and Type I error rates using an effect size with the logistic regression procedure for DIF. Applied Measurement in Education, 14, 329.

Lord, F. M. (1980). Applications of item response theory to practical testing problems. Hillsdale, NJ: Erlbaum.

Mantel, N., & Haenszel, W. (1959). Statistical aspects of the analysis of data from retrospective studies of disease. Journal of the National Cancer Institute, 22, 19.

Millsap, R. E., & Everson, H. T. (1993). Methodology review: Statistical approaches for assessing measurement bias. Applied Psychological Measurement, 17, 297.

Muniz, J., Hambleton, R. K., & Xing, D. (2001). Small sample studies to detect flaws in test translation. International Journal of Testing, 1, 115.

Penfield, R. D. (2005). DIFAS: Differential item functioning analysis system. Applied Psychological Measurement, 29, 150.

Potenza, M. T., & Dorans, N. J. (1995). DIF assessment for polytomously scored items: A framework for classification and evaluation. Applied Psychological Measurement, 19, 23.

Raju, N. S. (1988). The area between two item characteristic curves. Psychometrika, 53, 495.

Raju, N. S. (1990). Determining the significance of estimated signed and unsigned areas between two item response functions. Applied Psychological Measurement, 14, 197.

Reise, S. P., Widaman, K. F., & Pugh, R. H. (1993). Confirmatory factor analysis and item response theory: Two approaches for exploring measurement invariance. Psychological Bulletin, 114, 552.

Robin, F. (1999). SDDIF: Standardization and delta DIF analyses. Amherst: University of Massachusetts, Laboratory of Psychometric and Evaluative Research.

Robin, F., Sireci, S. G., & Hambleton, R. K. (2003). Evaluating the equivalence of different language versions of a credentialing exam. International Journal of Testing, 3, 1.

Rogers, H. J., & Swaminathan, H. (1993). A comparison of logistic regression and Mantel–Haenszel procedures for detecting differential item functioning. Applied Psychological Measurement, 17, 105.

Shealy, R., & Stout, W. (1993). A model-based standardization approach that separates true bias/DIF from group ability differences and detects test bias/DTF as well as item bias/DIF. Psychometrika, 58, 159.

Sireci, S. G. (1997). Problems and issues in linking tests across languages. Educational Measurement: Issues and Practice, 16, 12.

Sireci, S. G. (2005). Using bilinguals to evaluate the comparability of different language versions of a test. In R. K. Hambleton, P. Merenda, & C. Spielberger (Eds.), Adapting educational and psychological tests for cross-cultural assessment, 117. Hillsdale, NJ: Erlbaum.

Sireci, S. G., Bastari, B., & Allalouf, A. (1998). Evaluating construct equivalence across adapted tests. San Francisco, CA.

Sireci, S. G., & Berberoglu, G. (2000). Evaluating translation DIF using bilinguals. Applied Measurement in Education, 13, 229.

Sireci, S. G., Fitzgerald, C., & Xing, D. (1998). Adapting credentialing examinations for international uses (Laboratory of Psychometric and Evaluative Research Report No. 329). Amherst, MA: University of Massachusetts, School of Education.

Sireci, S. G., Harter, J., Yang, Y., & Bhola, D. (2003). Evaluating the equivalence of an employee attitude survey across languages, cultures, and administration formats. International Journal of Testing, 3, 129.

Sireci, S. G., Patsula, L., & Hambleton, R. K. (2005). Statistical methods for identifying flawed items in the test adaptations process. In R. K. Hambleton, P. Merenda, & C. Spielberger (Eds.), Adapting educational and psychological tests for cross-cultural assessment, 93. Hillsdale, NJ: Erlbaum.

Swaminathan, H., & Rogers, H. J. (1990). Detecting differential item functioning using logistic regression procedures. Journal of Educational Measurement, 27, 361.

Thissen, D. (2001). http://www.unc.edu/~dthissen/dl.html

Thissen, D., Steinberg, L., & Wainer, H. (1988). Use of item response theory in the study of group differences in trace lines. In H. Wainer & H. I. Braun (Eds.), Test validity, 147. Hillsdale, NJ: Erlbaum.

Thissen, D., Steinberg, L., & Wainer, H. (1993). Detection of differential item functioning using the parameters of item response models. In P. W. Holland & H. Wainer (Eds.), Differential item functioning, 67. Mahwah, NJ: Erlbaum.

Van de Vijver, F. J. R., & Poortinga, Y. H. (1997). Towards an integrated analysis of bias in cross-cultural assessment. European Journal of Psychological Assessment, 13, 29.

Van de Vijver, F. J. R., & Poortinga, Y. H. (2005). Conceptual and methodological issues in adapting tests. In R. K. Hambleton, P. Merenda, & C. Spielberger (Eds.), Adapting educational and psychological tests for cross-cultural assessment, 39. Hillsdale, NJ: Erlbaum.

Van de Vijver, F., & Tanzer, N. K. (1997). Bias and equivalence in cross-cultural assessment. European Review of Applied Psychology, 47, 263.

Wainer, H., & Sireci, S. G. (2005). Item and test bias. In Encyclopedia of social measurement, 365. San Diego, CA: Elsevier.

Wainer, H., Sireci, S. G., & Thissen, D. (1991). Differential testlet functioning: Definitions and detection. Journal of Educational Measurement, 28, 197.

Waller, N. G. (1998). EZDIF: The detection of uniform and non-uniform differential item functioning with the Mantel–Haenszel and logistic regression procedures. Applied Psychological Measurement, 22, 391.

Zumbo, B. D. (1999). A handbook on the theory and methods of differential item functioning (DIF): Logistic regression modeling as a unitary framework for binary and Likert-type (ordinal) item scores. Ottawa, Canada: Directorate of Human Resources Research and Evaluation, Department of National Defense.

Zwick, R., Donoghue, J. R., & Grima, A. (1993). Assessment of differential item functioning for performance tasks. Journal of Educational Measurement, 30, 233.