
Cohen’s Linearly Weighted Kappa is a Weighted Average of 2×2 Kappas

Published online by Cambridge University Press: 01 January 2025

Matthijs J. Warrens*
Affiliation:
Tilburg University
*
Requests for reprints should be sent to Matthijs J. Warrens, Department of Methodology and Statistics, Tilburg University, P.O. Box 90153, 5000 LE Tilburg, The Netherlands. E-mail: [email protected]

Abstract

An agreement table with n∈ℕ≥3 ordered categories can be collapsed into n−1 distinct 2×2 tables by combining adjacent categories. Vanbelle and Albert (Stat. Methodol. 6:157–163, 2009c) showed that the components of Cohen’s weighted kappa with linear weights can be obtained from these n−1 collapsed 2×2 tables. In this paper we consider several consequences of this result. One is that the weighted kappa with linear weights can be interpreted as a weighted arithmetic mean of the kappas corresponding to the 2×2 tables, where the weights are the denominators of the 2×2 kappas. In addition, it is shown that similar results and interpretations hold for linearly weighted kappas for multiple raters.
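As a numerical illustration of the result summarized above, the following is a minimal NumPy sketch (not code from the paper): for a small, invented 4×4 agreement table it computes the linearly weighted kappa directly, and again as the weighted arithmetic mean of the three collapsed 2×2 kappas with their denominators 1 − p_e as weights. The table entries and function names are hypothetical and chosen only for illustration; the two printed values coincide.

```python
import numpy as np

def linear_weighted_kappa(table):
    """Cohen's weighted kappa with linear weights for an n x n agreement table."""
    p = np.asarray(table, dtype=float)
    p = p / p.sum()
    n = p.shape[0]
    i, j = np.indices((n, n))
    w = 1.0 - np.abs(i - j) / (n - 1)           # linear agreement weights
    row, col = p.sum(axis=1), p.sum(axis=0)
    po = (w * p).sum()                          # weighted observed agreement
    pe = (w * np.outer(row, col)).sum()         # weighted chance-expected agreement
    return (po - pe) / (1.0 - pe)

def cohen_kappa_2x2(table):
    """Cohen's kappa for a 2 x 2 table, plus its denominator 1 - pe."""
    p = np.asarray(table, dtype=float)
    p = p / p.sum()
    po = np.trace(p)
    pe = (p.sum(axis=1) * p.sum(axis=0)).sum()
    return (po - pe) / (1.0 - pe), 1.0 - pe

def collapsed_2x2(table, c):
    """Collapse categories 1..c versus c+1..n into a single 2 x 2 table."""
    p = np.asarray(table, dtype=float)
    return np.array([[p[:c, :c].sum(), p[:c, c:].sum()],
                     [p[c:, :c].sum(), p[c:, c:].sum()]])

# Hypothetical 4 x 4 agreement table (illustrative counts only).
t = np.array([[20,  5,  1,  0],
              [ 4, 15,  6,  1],
              [ 2,  4, 18,  5],
              [ 0,  1,  3, 15]])

kappas, denominators = [], []
for c in range(1, t.shape[0]):                  # the n - 1 cut points between adjacent categories
    k, denom = cohen_kappa_2x2(collapsed_2x2(t, c))
    kappas.append(k)
    denominators.append(denom)

print(linear_weighted_kappa(t))                       # direct computation
print(np.average(kappas, weights=denominators))       # weighted mean of 2 x 2 kappas: same value
```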

Type
Original Paper
Copyright
Copyright © 2011 The Psychometric Society


References

Agresti, A. (1990). Categorical data analysis. New York: Wiley.
Artstein, R., Poesio, M. (2005). Kappa3 = alpha (or beta). Colchester: University of Essex.
Berry, K.J., Mielke, P.W. (1988). A generalization of Cohen’s kappa agreement measure to interval measurement and multiple raters. Educational and Psychological Measurement, 48, 921–933.
Brennan, R.L., Prediger, D.J. (1981). Coefficient kappa: some uses, misuses, and alternatives. Educational and Psychological Measurement, 41, 687–699.
Brenner, H., Kliebsch, U. (1996). Dependence of weighted kappa coefficients on the number of categories. Epidemiology, 7, 199–202.
Cicchetti, D., Allison, T. (1971). A new procedure for assessing reliability of scoring EEG sleep recordings. The American Journal of EEG Technology, 11, 101–109.
Cohen, J. (1960). A coefficient of agreement for nominal scales. Educational and Psychological Measurement, 20, 37–46.
Cohen, J. (1968). Weighted kappa: nominal scale agreement with provision for scaled disagreement or partial credit. Psychological Bulletin, 70, 213–220.
Conger, A.J. (1980). Integration and generalization of kappas for multiple raters. Psychological Bulletin, 88, 322–328.
Davies, M., Fleiss, J.L. (1982). Measuring agreement for multinomial data. Biometrics, 38, 1047–1051.
Fleiss, J.L. (1971). Measuring nominal scale agreement among many raters. Psychological Bulletin, 76, 378–382.
Fleiss, J.L. (1981). Statistical methods for rates and proportions. New York: Wiley.
Fleiss, J.L., Cohen, J. (1973). The equivalence of weighted kappa and the intraclass correlation coefficient as measures of reliability. Educational and Psychological Measurement, 33, 613–619.
Fleiss, J.L., Cohen, J., Everitt, B.S. (1969). Large sample standard errors of kappa and weighted kappa. Psychological Bulletin, 72, 323–327.
Heuvelmans, A.P.J.M., Sanders, P.F. (1993). Beoordelaarsovereenstemming. In Eggen, T.J.H.M., Sanders, P.F. (Eds.), Psychometrie in de Praktijk (pp. 443–470). Arnhem: Cito Instituut voor Toetsontwikkeling.
Holmquist, N.S., McMahon, C.A., Williams, E.O. (1968). Variability in classification of carcinoma in situ of the uterine cervix. Obstetrical & Gynecological Survey, 23, 580–585.
Hsu, L.M., Field, R. (2003). Interrater agreement measures: comments on kappan, Cohen’s kappa, Scott’s π and Aickin’s α. Understanding Statistics, 2, 205–219.
Hubert, L. (1977). Kappa revisited. Psychological Bulletin, 84, 289–297.
Jakobsson, U., Westergren, A. (2005). Statistical methods for assessing agreement for ordinal data. Scandinavian Journal of Caring Sciences, 19, 427–431.
Janson, H., Olsson, U. (2001). A measure of agreement for interval or nominal multivariate observations. Educational and Psychological Measurement, 61, 277–289.
Kraemer, H.C. (1979). Ramifications of a population model for κ as a coefficient of reliability. Psychometrika, 44, 461–472.
Kraemer, H.C., Periyakoil, V.S., Noda, A. (2004). Tutorial in biostatistics: kappa coefficients in medical research. Statistics in Medicine, 21, 2109–2129.
Krippendorff, K. (2004). Reliability in content analysis: some common misconceptions and recommendations. Human Communication Research, 30, 411–433.
Kundel, H.L., Polansky, M. (2003). Measurement of observer agreement. Radiology, 228, 303–308.
Landis, J.R., Koch, G.G. (1977). An application of hierarchical kappa-type statistics in the assessment of majority agreement among multiple observers. Biometrics, 33, 363–374.
Mielke, P.W., Berry, K.J. (2009). A note on Cohen’s weighted kappa coefficient of agreement with linear weights. Statistical Methodology, 6, 439–446.
Mielke, P.W., Berry, K.J., Johnston, J.E. (2007). The exact variance of weighted kappa with multiple raters. Psychological Reports, 101, 655–660.
Mielke, P.W., Berry, K.J., Johnston, J.E. (2008). Resampling probability values for weighted kappa with multiple raters. Psychological Reports, 102, 606–613.
Nelson, J.C., Pepe, M.S. (2000). Statistical description of interrater variability in ordinal ratings. Statistical Methods in Medical Research, 9, 475–496.
Popping, R. (1983). Overeenstemmingsmaten voor Nominale Data. Unpublished doctoral dissertation, Rijksuniversiteit Groningen, Groningen.
Popping, R. (2010). Some views on agreement to be used in content analysis studies. Quality & Quantity, 44, 1067–1078.
Schouten, H.J.A. (1986). Nominal scale agreement among observers. Psychometrika, 51, 453–466.
Schuster, C. (2004). A note on the interpretation of weighted kappa and its relations to other rater agreement statistics for metric scales. Educational and Psychological Measurement, 64, 243–253.
Scott, W.A. (1955). Reliability of content analysis: the case of nominal scale coding. Public Opinion Quarterly, 19, 321–325.
Vanbelle, S., Albert, A. (2009a). Agreement between two independent groups of raters. Psychometrika, 74, 477–491.
Vanbelle, S., Albert, A. (2009b). Agreement between an isolated rater and a group of raters. Statistica Neerlandica, 63, 82–100.
Vanbelle, S., Albert, A. (2009c). A note on the linearly weighted kappa coefficient for ordinal scales. Statistical Methodology, 6, 157–163.
Visser, H., de Nijs, T. (2006). The map comparison kit. Environmental Modelling & Software, 21, 346–358.
Warrens, M.J. (2008). On similarity coefficients for 2×2 tables and correction for chance. Psychometrika, 73, 487–502.
Warrens, M.J. (2008). On the equivalence of Cohen’s kappa and the Hubert–Arabie adjusted Rand index. Journal of Classification, 25, 177–183.
Warrens, M.J. (2009). k-adic similarity coefficients for binary (presence/absence) data. Journal of Classification, 26, 227–245.
Warrens, M.J. (2010). Inequalities between kappa and kappa-like statistics for k×k tables. Psychometrika, 75, 176–185.
Warrens, M.J. (2010). Cohen’s kappa can always be increased and decreased by combining categories. Statistical Methodology, 7, 673–677.
Warrens, M.J. (2010). A Kraemer-type rescaling that transforms the odds ratio into the weighted kappa coefficient. Psychometrika, 75, 328–330.
Warrens, M.J. (2010). A formal proof of a paradox associated with Cohen’s kappa. Journal of Classification, 27, 322–332.
Warrens, M.J. (2010). Inequalities between multi-rater kappas. Advances in Data Analysis and Classification, 4, 271–286.
Warrens, M.J. (2011). Weighted kappa is higher than Cohen’s kappa for tridiagonal agreement tables. Statistical Methodology, 8, 268–272.
Zwick, R. (1988). Another look at interrater agreement. Psychological Bulletin, 103, 374–378.