
Measures of Agreement with Multiple Raters: Fréchet Variances and Inference

Published online by Cambridge University Press:  27 December 2024

Jonas Moss*
Affiliation:
BI Norwegian Business School
*
Correspondence should be made to Jonas Moss, Department of Data Science and Analytics, BI Norwegian Business School, Oslo, Norway. Email: [email protected]

Abstract

Most measures of agreement are chance-corrected. They differ in three dimensions: their definition of chance agreement, their choice of disagreement function, and how they handle multiple raters. Chance agreement is usually defined in a pairwise manner, following either Cohen’s kappa or Fleiss’s kappa. The disagreement function is usually a nominal, quadratic, or absolute value function. But how to handle multiple raters is contentious, with the main contenders being Fleiss’s kappa, Conger’s kappa, and Hubert’s kappa, the variant of Fleiss’s kappa where agreement is said to occur only if every rater agrees. More generally, multi-rater agreement coefficients can be defined in a g-wise way, where the disagreement weighting function uses g raters instead of two. This paper contains two main contributions. (a) We propose using Fréchet variances to handle the case of multiple raters. The Fréchet variances are intuitive disagreement measures and turn out to generalize the nominal, quadratic, and absolute value functions to the case of more than two raters. (b) We derive the limit theory of g-wise weighted agreement coefficients, with chance agreement of the Cohen-type or Fleiss-type, for the case where every item is rated by the same number of raters. Trying out three confidence interval constructions, we end up recommending calculating confidence intervals using the arcsine transform or the Fisher transform.

Type
Original Research
Creative Commons
This article is licensed under a Creative Commons Attribution 4.0 International License, which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons licence, and indicate if changes were made. The images or other third party material in this article are included in the article’s Creative Commons licence, unless indicated otherwise in a credit line to the material. If material is not included in the article’s Creative Commons licence and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this licence, visit http://creativecommons.org/licenses/by/4.0/.
Copyright
Copyright © 2024 The Author(s)

1. Introduction

The most popular measures of inter-rater agreement involve correction for chance agreement. These can be written in the form

(1.1) $$\frac{p_{a}-p_{ca}}{1-p_{ca}}=1-\frac{p_{d}}{p_{cd}},$$

where $p_{a}$ ($p_{d}$) is the percentage agreement (disagreement) between the raters and $p_{ca}$ ($p_{cd}$) is the chance agreement (disagreement) between the raters. Such measures are frequently called chance-corrected measures of agreement. Well-known examples of coefficients in this class are Cohen’s (1960) kappa and its weighted variant (1968), its multi-rater variant Conger’s kappa (Conger, 1980; Light, 1971), Krippendorff’s (1970) alpha, Scott’s (1955) pi, and Fleiss’ (1971) kappa. Some of these coefficients are defined only for two raters. The rest are defined in a pairwise manner, in the sense that they measure agreement between two raters at a time. However, not every proposed measure of agreement is defined on pairs of raters. The most famous exception is Hubert’s kappa (1977), which was recently studied in detail by Martín Andrés and Álvarez Hernández (2020). Other agreement coefficients include the $AC_{1}$ coefficient (Gwet, 2008), the recent coefficient of van Oest (2019), and a multitude of intraclass correlation coefficients (Gwet, 2014).
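The equality of the two forms in (1.1) can be checked in one line. The sketch below assumes, as the notation suggests but the text does not state explicitly, that agreement and disagreement are complementary, i.e., $p_{d}=1-p_{a}$ and $p_{cd}=1-p_{ca}$.

```latex
\begin{aligned}
\frac{p_{a}-p_{ca}}{1-p_{ca}}
  &= \frac{(1-p_{ca})-(1-p_{a})}{1-p_{ca}}
   = 1-\frac{1-p_{a}}{1-p_{ca}}
   = 1-\frac{p_{d}}{p_{cd}}.
\end{aligned}
```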

There is no consensus on how multi-rater agreement coefficients should be defined. Broadly speaking, two options are considered: pairwise coefficients and consensus coefficients. The pairwise coefficients measure the agreement between pairs of raters (Conger, 1980), while the consensus coefficients measure the simultaneous agreement between all raters. In particular, consensus coefficients support the notion that “agreement occurs if and only if all raters agree on the categorization of an object” (Hubert, 1977). Both pairwise and consensus-based definitions of agreement are variants of g-wise measures of agreement (Conger, 1980), where agreement is measured among g-tuples of raters. The case where $2<g<R$ has received little attention in the literature (Warrens, 2012), and non-trivial ways to measure agreement are hard to invent in this case. However, we introduce a promising and general framework for handling g-wise measures of agreement based on the concept of Fréchet variances (Dubey and Müller, 2019). The Fréchet variances generalize the variance, and the measures of agreement based on them generalize the nominal, linearly weighted, and quadratically weighted pairwise measures of agreement in a natural way. They are easily interpretable, as you measure how much the raters disagree with the generalized mean rater and then adjust for chance. For nominal data in particular, they measure how many raters disagree with the modal rater, with a resulting agreement measure less extreme than Hubert’s kappa.

We need inferential theory for the g-wise agreement coefficients to make them useful. Much work has been done on inference for agreement coefficients, but, to our knowledge, inference for g-wise agreement coefficients has yet to be studied. Assuming multivariate normality of the ratings, Lin (1989, Section 3) derived the asymptotic distribution of Cohen’s kappa with quadratic weights. Fleiss (1971) introduced a formula for the standard error of Fleiss’s kappa, but later showed that it was incorrect. Using the properties of the multinomial distribution and the delta method, Schouten (1980) found the asymptotic variance of the weighted Fleiss’s kappa in the case when the number of categories is finite. Almost forty years later, Gwet (2021) found a consistent estimator of the variance for the unweighted Fleiss’s kappa. We extend these results to the weighted g-wise Fleiss’s kappa for any number of categories below. In addition, we mention that bootstrap inference for Fleiss’s kappa and Krippendorff’s alpha was studied by Zapf et al. (2016).

We begin the paper by providing the definitions of two kinds of chance-corrected agreement coefficients. Then, in Sect. 2, we establish connections between the multi-rater Cohen’s kappa, Fleiss’s kappa, Conger’s kappa, Krippendorff’s alpha, and Hubert’s kappa. We restrict ourselves to the context where every rater rates every item. In Sect. 3, we discuss the Fréchet variances mentioned above. Then we spell out the basic limit theory for this class of agreement coefficients in Sect. 4, extending the results of Schouten (1980), Schouten (1982), and O’Connell and Dobson (1984) to vector-valued items and g-wise coefficients. We do this using the theory of U-statistics (Lee, 2019), but there are other ways to arrive at the same results. Then, in Sect. 5, we provide practical recommendations regarding the choice of confidence interval, obtained by comparing three confidence interval constructions: basic, arcsine transformed, and Fisher transformed. Using a simulation study, we find that the arcsine and Fisher intervals outperform the basic interval when n is small.

2. Measures of Agreement

Let $d(x_{1},\ldots ,x_{g})$ be a disagreement function, a positive and symmetric function of g arguments that equals 0 when all $x_{i}$s are equal, i.e., $d(x,\ldots ,x)=0$. The disagreement function quantifies the disagreement between the ratings $x_{1},\ldots ,x_{g}$, where 0 is understood as complete agreement.

Most disagreement functions take two arguments. While there are infinitely many disagreement functions, the best-known belong to the class of $l_{p}$ quasi-norms, $p=0,1,2$, potentially raised to the pth power. The $l_{p}$ quasi-norms, $p\in [0,\infty ]$, in $\mathbb{R}^{k}$ are defined as

(2.1) $$\Vert x\Vert _{p}=\left( \sum _{i=1}^{k}|x_{i}|^{p}\right) ^{1/p}.$$

Here $\Vert x\Vert _{0}=\sum _{i=1}^{k}1[x_{i}\ne 0]$ and $\Vert x\Vert _{\infty }=\sup _{i}|x_{i}|$, as can be verified by taking the limit of $\Vert x\Vert _{p}$ as $p\rightarrow 0$ and $p\rightarrow \infty $, respectively. It is well known that $\Vert x\Vert _{p}$ are proper norms if and only if $p\ge 1$, as the triangle inequality is violated when $1>p\ge 0$.
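A small numerical illustration may help here. The following Python sketch evaluates (2.1) together with the two boundary conventions stated above, and then exhibits one violation of the triangle inequality at $p=1/2$. The function name `lp_quasi_norm` and the example vectors are illustrative choices, not part of the paper.

```python
import math


def lp_quasi_norm(x, p):
    """l_p quasi-norm of a sequence x, following (2.1).

    The cases p = 0 and p = infinity use the conventions stated in the text.
    """
    if p == 0:
        # Number of non-zero components.
        return sum(1 for xi in x if xi != 0)
    if math.isinf(p):
        # Largest absolute component.
        return max(abs(xi) for xi in x)
    return sum(abs(xi) ** p for xi in x) ** (1.0 / p)


# Two unit vectors and their sum.
x = [1.0, 0.0]
y = [0.0, 1.0]
s = [a + b for a, b in zip(x, y)]

# For p = 1/2 the norm of the sum exceeds the sum of the norms,
# so the triangle inequality fails.
lhs = lp_quasi_norm(s, 0.5)
rhs = lp_quasi_norm(x, 0.5) + lp_quasi_norm(y, 0.5)
print(lhs, rhs, lhs > rhs)
```

The same pair of vectors satisfies the triangle inequality for $p=1$ and $p=2$, which is consistent with the statement that proper norms arise only for $p\ge 1$.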

Now define the disagreement functions $d_{p}$ as the $l_{p}$ quasi-norm evaluated at $x_{1}-x_{2}$, i.e.,

(2.2) $$d_{p}(x_{1},x_{2})=\Vert x_{1}-x_{2}\Vert _{p}.$$

In the case of scalar values, $d_{0}(x_{1},x_{2})=1[x_{1}\ne x_{2}]$ is known as the nominal disagreement function. For $p=1$, the $l_{p}$ norm equals $d_{1}(x_{1},x_{2})=|x_{1}-x_{2}|$, which is known as the absolute value disagreement function (and sometimes the linear disagreement function). The quadratic disagreement function is $d_{2}^{2}(x_{1},x_{2})=(x_{1}-x_{2})^{2}$. Vector-valued variants of $d_{p}$ and $d_{p}^{p}$ are much less common, but have been used by, e.g., Berry et al. (2008).
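For scalar ratings, the three pairwise disagreement functions just described are short enough to write out directly. The sketch below is only an illustration; the function names are hypothetical and the ratings are assumed to be plain Python numbers.

```python
def d_nominal(x1, x2):
    """Nominal disagreement d_0: 1 if the two ratings differ, 0 otherwise."""
    return 1.0 if x1 != x2 else 0.0


def d_absolute(x1, x2):
    """Absolute value (linear) disagreement d_1."""
    return abs(x1 - x2)


def d_quadratic(x1, x2):
    """Quadratic disagreement d_2 squared."""
    return (x1 - x2) ** 2


# Ratings 1 and 3 on an ordinal scale: the nominal function only records
# that they differ, while the other two also record by how much.
for d in (d_nominal, d_absolute, d_quadratic):
    print(d.__name__, d(1, 3), d(3, 1), d(2, 2))
```

All three are symmetric and vanish when the ratings coincide, which is what the definition of a disagreement function requires.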

When the dimension of the disagreement function d is not equal to 2, we are mostly interested in the case where its dimension equals the number of raters R. In this case, the disagreement functions often measure the degree of consensus among the raters, with 0 reflecting complete consensus. The most obvious choice is the Hubert disagreement function,

(2.3) $$d(x_{1},\ldots ,x_{g})=1-1[x_{1}=\cdots =x_{g}]$$

which equals 0 if and only if every rater agrees on a rating. This disagreement function is employed in Hubert’s kappa (Hubert, 1977).
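As a minimal sketch, the Hubert disagreement function in (2.3) can be written as follows. The name `d_hubert` is hypothetical, and the ratings are assumed to be hashable values (for example integers or strings) so that they can be placed in a set.

```python
def d_hubert(*ratings):
    """Hubert disagreement (2.3): 0 if all ratings coincide, 1 otherwise.

    Assumes at least one rating is passed and that ratings are hashable.
    """
    return 0.0 if len(set(ratings)) == 1 else 1.0


print(d_hubert("a", "a", "a"))  # full consensus
print(d_hubert("a", "a", "b"))  # a single dissenting rater breaks consensus
```

The second call shows why this function is sometimes regarded as extreme: one dissenting rater out of three yields the same value as three raters who all differ.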

We present our results in terms of disagreement functions instead of the more popular agreement functions (i.e., positive symmetric functions bounded by 1 where 1 signifies maximal agreement, sometimes with the additional assumption that $a\ge 0$). We do this mainly for mathematical convenience. Agreement functions and disagreement functions are closely related, for if a is an agreement function, then $d=1-a$ is a disagreement function. Our results could have been framed in terms of agreement functions instead, though with some loss of generality. See Appendix (Sect. 6) for a short discussion.

Our results and definitions are framed in the following setup. Let R be the number of raters and n be the number of items rated. Moreover, let F be a fixed multivariate distribution function such that all rating vectors $\boldsymbol{X}_{i}$ are sampled independently from F. In symbols,

(2.4) $$\boldsymbol{X}_{1},\boldsymbol{X}_{2},\ldots ,\boldsymbol{X}_{n}\overset{\text{iid}}{\sim }F.$$

There are no restrictions on the rating vector components $\boldsymbol{X}_{ir}$. They can be, e.g., categorical, real numbers, or vectors.

Equation (2.4) implies that every item is rated by exactly the same number of raters, which we refer to as the rectangular design assumption. The assumption is common in the literature (see Footnote 1), but far from universal. It can be relaxed, but it is strictly required for the limit results. We sketch how to loosen it in Appendix (Sect. 6), but we have made no attempts at an inferential theory for non-rectangular designs.

There are two important special cases covered by equation (2.4). First, in the case of fixed raters, the same set of ordered raters rate every item. Having fixed raters is common in applications of Cohen’s kappa, Conger’s kappa, and the concordance correlation coefficient (see Footnote 2). Having fixed raters ensures that F does not vary across different rating vectors, but F could potentially vary with the ratings when the raters are not fixed, provided we do not make further assumptions. That leads us to the second case, that of exchangeable ratings given the item. Here, the rater identities do not affect the ratings given. The raters may be different for each item, but the distribution F will still be fixed. Exchangeable ratings occur when the ratings are identically distributed conditional on the item rated. Exchangeability of the ratings is an implicit assumption underlying most applications of Fleiss’ kappa, e.g., that of Fleiss (1971). In this case, the marginal distributions for all raters will be equal, which implies that the population value of the generalized Fleiss kappa equals the population value of the generalized Cohen’s kappa, both defined below. However, the sample Fleiss’s kappa is the preferred sample estimator, as it is invariant under changes of the raters’ identities.

We intend to collect the kappas of Cohen, Fleiss, Conger, Hubert, and so on, into a coherent framework of g-wise agreement coefficients. To do this, we will have to define some quantities. Let $\boldsymbol{x}_{i}=(x_{i1},x_{i2},\ldots ,x_{iR})$ be an R-dimensional vector of observed ratings, and recall that g is the dimension of our disagreement function d. The following definitions are natural population counterparts of sample definitions prevalent in the agreement literature.

  1. (i) The disagreement at $\boldsymbol{x}_{1}$, as measured by d. The purpose of this quantity is to translate an arbitrary g-dimensional disagreement function d into a disagreement function taking an R-dimensional vector $\boldsymbol{x}_{1}$ as input. It is defined as

    (2.5) $$D_{d}(\boldsymbol{x}_{1})=\binom{R}{g}^{-1}\sum _{r_{1},\ldots ,r_{g}}d(\boldsymbol{x}_{1r_{1}},\ldots ,\boldsymbol{x}_{1r_{g}}),$$
    where the sum runs over all g-dimensional subsets of $\{1,\ldots ,R\}$ with order ignored, i.e., the g-combinations of R. The expression is simplified when $g=R$, as $D_{d}(\boldsymbol{x}_{1})=d(\boldsymbol{x}_{11},\ldots ,\boldsymbol{x}_{1R})$ in this case. To gain some intuition about this quantity, suppose that $g=2$, that $x_{1},x_{2}$ are scalars, and consider the nominal disagreement function $d_{0}(x_{1},x_{2})=1[x_{1}\ne x_{2}]$. Then $D_{d}(\boldsymbol{x}_{1})=2R^{-1}(R-1)^{-1}\sum _{r_{1}>r_{2}}1[x_{1r_{1}}\ne x_{1r_{2}}]$ is the percentage of times two distinct raters disagree on their rating.
  2. (ii) The Cohen-type chance disagreement at $\boldsymbol{x}_{1},\ldots ,\boldsymbol{x}_{g}$, so called to differentiate it from the Fleiss-type chance disagreement. It is similar to the disagreement at $\boldsymbol{x}_{1}$, but this time the raters do not necessarily rate the same item, as one rater rates the first item (from $\boldsymbol{x}_{1}$), another rater rates the second item (from $\boldsymbol{x}_{2}$), and so on. We do not allow a rater to rate the same item more than once in a pass. Hence, we need to choose g raters from a set of R raters, and the chance disagreement is

    (2.6) $$C_{d}(\boldsymbol{x}_{1},\ldots ,\boldsymbol{x}_{g})=\binom{R}{g}^{-1}\sum _{r_{1},\ldots ,r_{g}}d(\boldsymbol{x}_{1r_{1}},\ldots ,\boldsymbol{x}_{gr_{g}}),$$
    where the sum runs over all g-dimensional subsets of $\{1,\ldots ,R\}$, i.e., the g-combinations of R. Observe that $D_{d}(\boldsymbol{x})=C_{d}(\boldsymbol{x},\ldots ,\boldsymbol{x})$. Since d is assumed to be symmetric, the expression is simplified to $d(\boldsymbol{x}_{1r_{1}},\ldots ,\boldsymbol{x}_{Rr_{R}})$ when $g=R$. When $g=2$, $C_{d}(\boldsymbol{x}_{1},\boldsymbol{x}_{2})=R^{-1}(R-1)^{-1}\sum _{r_{1}\ne r_{2}}d(\boldsymbol{x}_{1r_{1}},\boldsymbol{x}_{2r_{2}})$.
  3. (iii) The Fleiss-type chance disagreement at x1,,xg \documentclass[12pt]{minimal} \usepackage{amsmath} \usepackage{wasysym} \usepackage{amsfonts} \usepackage{amssymb} \usepackage{amsbsy} \usepackage{mathrsfs} \usepackage{upgreek} \setlength{\oddsidemargin}{-69pt} \begin{document}$$\varvec{x}_{1},\ldots ,\varvec{x}_{g}$$\end{document} is similar, but allows the same rater to rate an item multiple times. Its definition is

    (2.7) Fd(x1,,xg)=R-gr1,,rgd(x1r1,,xgrg), \documentclass[12pt]{minimal} \usepackage{amsmath} \usepackage{wasysym} \usepackage{amsfonts} \usepackage{amssymb} \usepackage{amsbsy} \usepackage{mathrsfs} \usepackage{upgreek} \setlength{\oddsidemargin}{-69pt} \begin{document}$$\begin{aligned} F_{d}(\varvec{x}_{1},\ldots ,\varvec{x}_{g}) =R^{-g}\sum _{r_{1},\ldots ,r_{g}}d(\varvec{x}_{1r_{1}}, \ldots ,\varvec{x}_{gr_{g}}), \end{aligned}$$\end{document}
    where the sum runs over the product set Rg \documentclass[12pt]{minimal} \usepackage{amsmath} \usepackage{wasysym} \usepackage{amsfonts} \usepackage{amssymb} \usepackage{amsbsy} \usepackage{mathrsfs} \usepackage{upgreek} \setlength{\oddsidemargin}{-69pt} \begin{document}$$R^{g}$$\end{document} . The expression for Fd(x1,,xg) \documentclass[12pt]{minimal} \usepackage{amsmath} \usepackage{wasysym} \usepackage{amsfonts} \usepackage{amssymb} \usepackage{amsbsy} \usepackage{mathrsfs} \usepackage{upgreek} \setlength{\oddsidemargin}{-69pt} \begin{document}$$F_{d}(\varvec{x}_{1},\ldots ,\varvec{x}_{g})$$\end{document} is not dramatically simplified when g=R \documentclass[12pt]{minimal} \usepackage{amsmath} \usepackage{wasysym} \usepackage{amsfonts} \usepackage{amssymb} \usepackage{amsbsy} \usepackage{mathrsfs} \usepackage{upgreek} \setlength{\oddsidemargin}{-69pt} \begin{document}$$g=R$$\end{document} . When g=2 \documentclass[12pt]{minimal} \usepackage{amsmath} \usepackage{wasysym} \usepackage{amsfonts} \usepackage{amssymb} \usepackage{amsbsy} \usepackage{mathrsfs} \usepackage{upgreek} \setlength{\oddsidemargin}{-69pt} \begin{document}$$g=2$$\end{document} , Fd(x1,x2)=R-2r1,r2d(x1r1,x2r2) \documentclass[12pt]{minimal} \usepackage{amsmath} \usepackage{wasysym} \usepackage{amsfonts} \usepackage{amssymb} \usepackage{amsbsy} \usepackage{mathrsfs} \usepackage{upgreek} \setlength{\oddsidemargin}{-69pt} \begin{document}$$F_{d}(\varvec{x}_{1},\varvec{x}_{2})=R^{-2} \sum _{r_{1},r_{2}}d(\varvec{x}_{1r_{1}},\varvec{x}_{2r_{2}})$$\end{document} .
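
To make the difference between the two chance disagreements concrete, the following minimal Python sketch evaluates the $g=2$ formulas stated above for a pair of rating vectors. The function names (`nominal`, `cohen_chance`, `fleiss_chance`) and the two example items are hypothetical and chosen only for illustration; the nominal disagreement is assumed as the default d.

```python
from itertools import product


def nominal(a, b):
    """Nominal disagreement: 1.0 if the two ratings differ, 0.0 otherwise."""
    return float(a != b)


def cohen_chance(x1, x2, d=nominal):
    """Cohen-type chance disagreement for g = 2.

    Averages d over all ordered pairs of distinct raters (r1 != r2).
    """
    R = len(x1)
    total = sum(
        d(x1[r1], x2[r2])
        for r1, r2 in product(range(R), repeat=2)
        if r1 != r2
    )
    return total / (R * (R - 1))


def fleiss_chance(x1, x2, d=nominal):
    """Fleiss-type chance disagreement for g = 2.

    Averages d over all R^2 rater pairs, so r1 == r2 is allowed.
    """
    R = len(x1)
    total = sum(d(x1[r1], x2[r2]) for r1, r2 in product(range(R), repeat=2))
    return total / R**2


# Two hypothetical items, each rated by the same R = 3 raters.
item_a = [1, 1, 2]
item_b = [1, 2, 2]
print(cohen_chance(item_a, item_b))   # 4 of the 6 distinct-rater pairs disagree
print(fleiss_chance(item_a, item_b))  # 5 of the 9 rater pairs disagree
```

The only difference between the two functions is whether the diagonal pairs $r_{1}=r_{2}$ enter the average, which mirrors the statement that the Fleiss-type version allows the same rater to rate an item multiple times.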

We will call the expected values of these quantities the mean disagreement, the mean Cohen-type chance disagreement, and the mean Fleiss-type chance disagreement. Slightly abusing notation, we denote them as

(2.8) $$D_{d}=E[D_{d}(\boldsymbol{X}_{1})],\quad C_{d}=E[C_{d}(\boldsymbol{X}_{1},\ldots,\boldsymbol{X}_{g})],\quad F_{d}=E[F_{d}(\boldsymbol{X}_{1},\ldots,\boldsymbol{X}_{g})],$$

where $\boldsymbol{X}_{1},\ldots,\boldsymbol{X}_{g}$ are independently sampled from the same distribution F. Discussions about the difference between $E[C_{d}(\boldsymbol{X}_{1},\ldots,\boldsymbol{X}_{g})]$ and $E[F_{d}(\boldsymbol{X}_{1},\ldots,\boldsymbol{X}_{g})]$, and why to prefer one over the other, are abundant in the literature, often in the context of the so-called paradox of kappa (Cicchetti and Feinstein, 1990).

Definition 1

Let $X\sim F$ be a vector of R ratings and d be a disagreement function with dimension g. Define the population values of the generalized Cohen’s kappa $(\kappa_{d})$ and Fleiss’s kappa $(\pi_{d})$ as

(2.9) $$\kappa_{d}=1-\frac{D_{d}}{C_{d}},\quad \pi_{d}=1-\frac{D_{d}}{F_{d}}.$$

The generalized Fleiss’s kappa, denoted as $\pi_{d}$ since it generalizes Scott’s pi (Scott, 1955), is a straightforward generalization of Fleiss’s kappa (Fleiss, 1971) to hold for $2<g\le R$. When $g=R$ and d is the nominal disagreement, it equals Hubert’s kappa. Likewise, the generalized Cohen’s kappa is an extension of weighted Conger’s kappa to hold for $2\le g\le R$. When $g=R$, it equals the Schuster–Smith coefficient (Schuster & Smith, 2005, eq. 1; see Footnote 3). It generalizes several other agreement coefficients as well. 
For instance, Berry and Mielke (1988) discussed what we call $\kappa_{d}$ for Euclidean weights between vector-valued ratings, while Janson and Olsson (2001) extended it to squared Euclidean and nominal weights. The relationship between most of the mentioned agreement coefficients is summarized in Table 1.

Table 1 Weighted agreement coefficients.

*Lin’s concordance coefficient and the concordance correlation coefficient (CC) are defined for quadratic weights only.

† Originally defined for nominal weights only.

3. Sample Estimates

Let $\boldsymbol{X}_{1},\ldots,\boldsymbol{X}_{n}\sim F$ be n iid vectors of ratings. Then there is a single natural sample estimator of $D_{d}$, namely

(2.10) $$\hat{D}_{d}=n^{-1}\sum_{i=1}^{n}D_{d}(\boldsymbol{x}_{i}).$$

There are, however, two natural estimators of the Cohen-type chance disagreement: one of them a V-statistic (Lee, 2019, Chapter 4.2) and the other a U-statistic (Lee, 2019, Chapter 1),

(2.11) $$\hat{C}_{d}=n^{-g}\sum_{i_{1},\ldots,i_{g}}C_{d}(\boldsymbol{x}_{i_{1}},\ldots,\boldsymbol{x}_{i_{g}})\quad\text{and}\quad\hat{C}_{d}^{u}=\binom{n}{g}^{-1}\sum_{i_{1},\ldots,i_{g}}C_{d}(\boldsymbol{x}_{i_{1}},\ldots,\boldsymbol{x}_{i_{g}}),$$

where the first estimator runs over all combinations with repetitions of $i_{1},i_{2},\ldots,i_{g}$ and the second estimator runs over the unordered combinations $i_{1}<i_{2}<\ldots<i_{g}$. Using the basic results of U-statistics (Lee, 2019, Chapter 1), we see that $\hat{C}_{d}^{u}$ is the unique minimum-variance unbiased estimator of $C_{d}$, which makes it attractive from a theoretical point of view. 
However, from a well-known correspondence between U-statistics and V-statistics, the asymptotic distribution of $\hat{C}_{d}$ coincides with that of $\hat{C}_{d}^{u}$ (Lee, 2019, Chapter 4, Theorem 1), so the choice between $\hat{C}_{d}$ and $\hat{C}_{d}^{u}$ barely matters when n is sufficiently large.

Likewise, there are two natural estimators of the Fleiss-type chance disagreement,

(2.12) $$\hat{F}_{d}=n^{-g}\sum_{i_{1},\ldots,i_{g}}F_{d}(\boldsymbol{x}_{i_{1}},\ldots,\boldsymbol{x}_{i_{g}})\quad\text{and}\quad\hat{F}_{d}^{u}=\binom{n}{g}^{-1}\sum_{i_{1},\ldots,i_{g}}F_{d}(\boldsymbol{x}_{i_{1}},\ldots,\boldsymbol{x}_{i_{g}}),$$

where the index sets are described above.

Now, we can define two sample variants of Cohen’s kappa (Fleiss’s kappa), depending on which one of $\hat{C}_{d}$ ($\hat{F}_{d}$) and $\hat{C}_{d}^{u}$ ($\hat{F}_{d}^{u}$) we choose to use. 
These are $\hat{\kappa}_{d}=1-\hat{D}_{d}/\hat{C}_{d}$ and $\hat{\kappa}_{d}^{u}=1-\hat{D}_{d}/\hat{C}_{d}^{u}$ for Cohen’s kappa and $\hat{\pi}_{d}=1-\hat{D}_{d}/\hat{F}_{d}$ and $\hat{\pi}_{d}^{u}=1-\hat{D}_{d}/\hat{F}_{d}^{u}$ for Fleiss’s kappa. The definition of the sample Cohen’s kappa (Cohen, 1968) agrees with $\hat{\kappa}_{d}$, not with $\hat{\kappa}_{d}^{u}$. 
Likewise, the sample Fleiss’s kappa has a definition agreeing with $\hat{\pi}_{d}$ (Fleiss, 1971). Moreover, due to the possibility of binning data, $\hat{\pi}_{d}$ and $\hat{\kappa}_{d}$ are faster to compute when the data is not continuous. Since the estimators are asymptotically equivalent in any case, we will stick to the V-statistics $\hat{\kappa}_{d}$ and $\hat{\pi}_{d}$ for estimation, but use the U-statistic form when deriving limit distributions. 
We note that, since we need to compute strictly fewer combinations, $\hat{\kappa}_{d}^{u}$ and $\hat{\pi}_{d}^{u}$ are faster to compute when the data is continuous, which may be useful in some settings.
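
The V-statistic estimators can be illustrated with a brute-force Python sketch for the pairwise case $g=2$. It is a minimal illustration under stated assumptions, not an efficient or definitive implementation: the helper names and the toy data set are hypothetical, d is taken to be the nominal disagreement, and the observed disagreement uses the identity $D_{d}(\boldsymbol{x})=C_{d}(\boldsymbol{x},\ldots,\boldsymbol{x})$ from Section 2.

```python
from itertools import product


def nominal(a, b):
    """Nominal disagreement: 1.0 if the two ratings differ, 0.0 otherwise."""
    return float(a != b)


def cohen_chance(x1, x2, d):
    """C_d(x1, x2) for g = 2: average of d over rater pairs with r1 != r2."""
    R = len(x1)
    pairs = [(r1, r2) for r1, r2 in product(range(R), repeat=2) if r1 != r2]
    return sum(d(x1[r1], x2[r2]) for r1, r2 in pairs) / len(pairs)


def fleiss_chance(x1, x2, d):
    """F_d(x1, x2) for g = 2: average of d over all R^2 rater pairs."""
    R = len(x1)
    return sum(d(a, b) for a, b in product(x1, x2)) / R**2


def sample_kappas(data, d=nominal):
    """Return (kappa_hat, pi_hat) using the V-statistic chance estimates.

    data is a list of n rating vectors, each holding the R ratings of one item.
    """
    n = len(data)
    # Observed disagreement: D_d(x) = C_d(x, x), averaged over the n items.
    D_hat = sum(cohen_chance(x, x, d) for x in data) / n
    # V-statistics: average over all n^2 ordered pairs of items, repeats included.
    C_hat = sum(cohen_chance(xi, xj, d) for xi, xj in product(data, repeat=2)) / n**2
    F_hat = sum(fleiss_chance(xi, xj, d) for xi, xj in product(data, repeat=2)) / n**2
    return 1 - D_hat / C_hat, 1 - D_hat / F_hat


# Hypothetical toy data: n = 4 items, R = 2 raters, two categories.
data = [[1, 1], [1, 2], [2, 2], [2, 2]]
kappa_hat, pi_hat = sample_kappas(data)
print(kappa_hat, pi_hat)
```

For this toy data the observed agreement is 3/4 and the product of the two raters' marginals gives a chance agreement of 1/2, so the classical two-rater Cohen's kappa is 1/2, and the pooled marginals give Scott's pi of 7/15; the sketch returns the same values up to floating-point error.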

3. Fréchet Variances for g-Wise Agreement Coefficients

The most popular measures of agreement are defined only for $g=2$. It is easy to find reasonable disagreement measures in this case, as one can draw on the extensive literature on norms and distances. The $l_{p}$ distances are the obvious choices, but there are many unexplored options, such as the Huber loss (Huber, 1964) and the LINEX loss (Varian, 1975).

In the setting of Hubert’s kappa and the Schuster–Smith coefficient, we have $g=R$, and it is not that easy to find reasonable disagreement functions anymore. The disagreement function used in Hubert’s kappa, $d(x_{1},\ldots,x_{R})=1-1[x_{1}=\cdots=x_{R}]$, will penalize any number of discordant ratings equally, yielding the often undesirable outcome that most sets of ratings will be in complete disagreement. But there are less sensitive ways to count nominal disagreements. Consider the case of 10 raters rating an item on an ordinal scale from 1 to 3, with 7 raters giving rating 1, 2 giving rating 2, and 1 giving rating 3. Then Hubert’s disagreement is 1, as the rating vector is not constant, and the pairwise disagreement is 46/100. But it sounds reasonable to pick the modal rating (in this case 1) and then report the number of raters that disagree with it, divided by the number of raters. In this case, the number of raters disagreeing with the modal rating is 3, and the “modal” disagreement equals 3/10.
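
The three numbers in this example can be checked with a few lines of Python. This is only an arithmetic sketch; the variable names are ours. One assumption is worth flagging: the value 46/100 is reproduced when the pairwise disagreement is averaged over all $R^{2}=100$ ordered rater pairs, self-pairs included, which is how the sketch computes it.

```python
from collections import Counter
from fractions import Fraction

# 10 raters: seven give rating 1, two give rating 2, one gives rating 3.
ratings = [1] * 7 + [2] * 2 + [3] * 1
R = len(ratings)

# Hubert's disagreement: 1 unless every rater gives the same rating.
hubert = int(len(set(ratings)) > 1)

# Pairwise nominal disagreement, averaged over all R^2 ordered pairs
# (assumption: self-pairs are included, which yields 46/100).
pairwise = Fraction(sum(a != b for a in ratings for b in ratings), R * R)

# "Modal" disagreement: share of raters who differ from the modal rating.
modal_count = Counter(ratings).most_common(1)[0][1]
modal = Fraction(R - modal_count, R)

print(hubert, pairwise, modal)
```

Exact fractions are used so that the results can be compared with 46/100 and 3/10 without rounding.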

Sometimes we wish to aggregate numerical ratings instead of categorical ratings. Consider the above case again but with the median (which is 1) instead of the mode. It is well known that the median of a vector x is equal to $\operatorname{argmin}_{\mu}\frac{1}{R}\sum_{r=1}^{R}|x_{r}-\mu|$, so $\min_{\mu}\frac{1}{R}\sum_{r=1}^{R}|x_{r}-\mu|$ (mean absolute deviation from the median) appears to be a reasonable measure of the mean disagreement when we use the median as the aggregation method. The resulting mean disagreement of the previous example is $\min_{\mu}\frac{1}{R}\sum_{r=1}^{R}|x_{r}-\mu|=\frac{1}{10}\sum_{r=1}^{10}|x_{r}-1|=4/10$.

The “modal” and “median” disagreement measures are instances of an intuitive generalization of the variance called the Fréchet variance (Dubey and Müller, 2019). Let l be a distance function satisfying $l(x,y)\ge 0$ and $l(x,x)=0$, and let $A=\{x_{1},x_{2},\ldots,x_{R}\}$ be a set of points. The sample Fréchet mean of A is defined as the (not necessarily unique) point $\mu_{l}$ that minimizes the sum of distances to all points in A, that is (see Footnote 4),

(3.1) $$\mu_{l}[A]=\operatorname{argmin}_{\mu}\sum_{r=1}^{R}l(\mu,x_{r}).$$

Similarly, the sample Fréchet variance on A with distance function l is

(3.2) $$V(l)[A]=\min_{\mu}\sum_{r=1}^{R}\frac{1}{R}l(\mu,x_{r})=\sum_{r=1}^{R}\frac{1}{R}l(\mu_{l}[A],x_{r}).$$

The Fréchet mean (Fréchet, 1948) is a generalization of centroids to arbitrary distance functions l; likewise, the Fréchet variance is a generalization of dispersion to any such distance function. They are best understood through a decision-theoretic lens: The Fréchet mean of A represents your best guess of the true classification or value of an item according to the distance l; the Fréchet variance V(l) is the decision-theoretic risk associated with the choice. See Cooil and Rust (1994) for an investigation of a closely related idea in the context of agreement measures.

Define the g-dimensional disagreement based on l as

(3.3) $$d(\boldsymbol{x}_{1},\ldots,\boldsymbol{x}_{g})=V(l)[\{\boldsymbol{x}_{1},\ldots,\boldsymbol{x}_{g}\}].$$

The most important distance functions are:

  1. (i) $d_{0}(x,y)=1[x\ne y]$. Generalizes the nominal distance. If the data are categorical, the Fréchet mean $\mu_{d}$ equals the mode, and the Fréchet variance equals the proportion of observations different from the mode. If we are dealing with vector-valued data with I elements each, it might be preferable to use $I^{-1}\sum_{i=1}^{I}1[x_{i}\ne y_{i}]$ instead, as it counts each dimension of the nominal data separately.

  2. (ii) $d_{1}(x,y)=||x-y||_{1}$. For scalar ratings, the Fréchet mean is equal to the sample median. The Fréchet variance equals the sample mean absolute deviation from the median, i.e., $\frac{1}{R}\sum_{r=1}^{R}|x_{r}-\mu_{d}|$, where $\mu_{d}$ is the sample median.

  3. (iii) $d_{2}^{2}(x,y)=||x-y||_{2}^{2}$. For scalar ratings, the Fréchet mean is equal to the sample mean $\mu_{d}=\frac{1}{R}\sum_{r=1}^{R}x_{r}$, and the Fréchet variance is equal to the biased sample variance of $\{x_{1},x_{2},\ldots,x_{R}\}$, that is, $\frac{1}{R}\sum_{r=1}^{R}(x_{r}-\mu_{d})^{2}$.

  4. (iv) $d_{2}(x,y)=||x-y||_{2}$. For vector-valued data, the Fréchet mean has no simple formula, but is known as the geometric median. If the data are scalar, $d_{2}=d_{1}$, which implies that the Fréchet mean equals the median, hence the name. There is an extensive literature on the geometric median; see, e.g., Drezner et al. (2002) for an overview and Cohen et al. (2016) for how to compute it. When the ratings are vector-valued, the geometric median is far more computationally expensive than the Fréchet mean based on $||x-y||_{2}^{2}$.
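
For scalar ratings, the closed forms listed in items (i) to (iii) translate directly into code. The sketch below is a minimal Python illustration using only the standard library; the function names are hypothetical, and it is applied to the example of 10 raters (seven ratings of 1, two of 2, one of 3) used earlier in this section.

```python
from collections import Counter
from statistics import mean, median


def frechet_nominal(x):
    """V(d_0): proportion of ratings that differ from the mode."""
    mode_count = Counter(x).most_common(1)[0][1]
    return 1 - mode_count / len(x)


def frechet_absolute(x):
    """V(d_1), scalar case: mean absolute deviation from the median."""
    m = median(x)
    return mean(abs(v - m) for v in x)


def frechet_quadratic(x):
    """V(d_2^2), scalar case: biased sample variance (divides by R)."""
    m = mean(x)
    return mean((v - m) ** 2 for v in x)


ratings = [1] * 7 + [2] * 2 + [3]
print(frechet_nominal(ratings))    # "modal" disagreement, 3/10
print(frechet_absolute(ratings))   # "median" disagreement, 4/10
print(frechet_quadratic(ratings))  # biased variance around the mean 1.4
```

The first two values reproduce the “modal” and “median” disagreements 3/10 and 4/10 computed in the text; the third is the biased variance, which works out to 0.44 for these ratings.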

For any $p\in[0,\infty]$ and pair of vectors $x_{1},x_{2}$, we have the following (proved in Appendix, Sect. 6):

$$V(d_{p})[x_{1},x_{2}]=\frac{1}{2}d_{p}(x_{1},x_{2}),\quad V(d_{p}^{p})[x_{1},x_{2}]=\frac{1}{2^{p}}d_{p}^{p}(x_{1},x_{2}). \tag{3.4}$$

It follows that $\kappa _{d_{p}}=\kappa _{V(d_{p})}$ and $\kappa _{d_{p}^{p}}=\kappa _{V(d_{p}^{p})}$ when we are dealing with pairwise agreement. Thus, the Fréchet variances generalize the pairwise agreement for these distances to g-wise coefficients. But be aware that the particular case of $V(d_{2}^{2})$ constitutes a trivial generalization, as it can be shown that the kappas do not vary with g when using the quadratic Fréchet variance $V(d_{2}^{2})$. It follows that $\kappa _{V(d_{2}^{2})}$ equals the concordance coefficient for every g.
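Identity (3.4) can be checked numerically in the scalar case. The sketch below assumes that the Fréchet variance is the minimum, over candidate centres $m$, of the mean disagreement between the points and $m$; the function name `frechet_variance`, the grid search, and the example pair are illustrative choices, not part of the paper.

```python
import numpy as np

def frechet_variance(points, power, grid):
    """Brute-force Fréchet variance of scalar points for the disagreement |x - m|**power.

    Assumption: the Fréchet variance is the minimum over centres m of the
    mean disagreement between the points and m.
    """
    points = np.asarray(points, dtype=float)
    # Rows index points, columns index candidate centres m.
    costs = np.mean(np.abs(points[:, None] - grid[None, :]) ** power, axis=0)
    return float(costs.min())

x1, x2 = 1.0, 4.0                   # illustrative pair of scalar ratings
grid = np.linspace(0.0, 5.0, 501)   # candidate centres; the midpoint 2.5 is on the grid
dist = abs(x1 - x2)                 # for scalars every d_p reduces to |x1 - x2|

# V(d_p)[x1, x2] should equal d_p(x1, x2) / 2.
v_plain = frechet_variance([x1, x2], 1, grid)

# V(d_p^p)[x1, x2] should equal d_p^p(x1, x2) / 2^p; checked for p = 1, 2, 3.
v_power = {p: frechet_variance([x1, x2], p, grid) for p in (1, 2, 3)}
expected = {p: dist ** p / 2 ** p for p in (1, 2, 3)}
```

A grid search is used only to keep the check transparent; for two points the minimizer of the power disagreement is the midpoint when $p>1$, and any point between the two ratings when $p=1$.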

Example 1

Suppose you have $R=5$ raters and 4 items, with ratings (1, 1, 2, 1, 1), (1, 2, 3, 2, 2), (2, 1, 1, 1, 1), (2, 3, 4, 4, 5). The Fréchet means under the distance $|x-y|$ equal the sample medians 1, 2, 1, 4. The Fréchet variances are $V(d_{1})=(0.2,0.4,0.2,0.8)$. To calculate the sample Cohen’s kappa with $d=V(d_{1})$, we first find the mean disagreement $\overline{V(d_{1})}=0.4$ (2.10), then the mean Cohen disagreement, which is $\approx 0.73$ (2.11).
Thus, Cohen’s kappa is $1-0.4/0.73=0.45$.
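The first steps of Example 1 can be reproduced with a few lines of NumPy. This is a minimal sketch: the variable names are illustrative, and the chance disagreement 0.73 is taken from the text rather than recomputed, since its definition (2.11) lies outside this example.

```python
import numpy as np

# Rows are items, columns are the R = 5 raters (data from Example 1).
ratings = np.array([
    [1, 1, 2, 1, 1],
    [1, 2, 3, 2, 2],
    [2, 1, 1, 1, 1],
    [2, 3, 4, 4, 5],
])

# Fréchet means under |x - y| are the per-item sample medians.
medians = np.median(ratings, axis=1)

# Fréchet variances V(d_1): mean absolute deviation from the median, per item.
v_d1 = np.mean(np.abs(ratings - medians[:, None]), axis=1)

# Mean observed disagreement, as in (2.10).
mean_disagreement = v_d1.mean()

# Chance disagreement of Cohen type, copied from the text (not recomputed here).
chance_disagreement = 0.73

kappa = 1 - mean_disagreement / chance_disagreement
print(medians, v_d1, round(kappa, 2))
```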

We believe the most useful distance measures will typically be $d_{0}$ for categorical data and $d_{1}$ for ordinal data, both using $g=R$. The quadratic distance $d_{2}^{2}$ could be used for ordinal data as well, but is harder to justify, as it is usually not obvious why we would be interested in the squared distance between two observations rather than just the distance itself.
The distances $d_{p}$, $p\in (1,\infty ]$, with $d_{2}$ included, are even harder to recommend, as they do not work in a coordinatewise manner for vector data. In any case, it seems most reasonable to go with the R-wise variants of these distance measures, as they make use of all the available information, whereas the g-wise agreement coefficients ($g<R$) do not.

Example 2

In the paper introducing what is now called Fleiss’s kappa, Fleiss (1971) discussed an example involving 5 different types of diagnoses, $n=30$ patients, and $R=6$ psychiatrists. The data were originally from Sandifer et al. (1968), but Fleiss removed some ratings to make the design rectangular. We use this data to illustrate the difference between Hubert’s kappa and the Fréchet variances when applied to nominal data with $g=R$.

Hubert’s kappa is $\pi =0.166$ while Fleiss’s kappa using $V(d_{0})$ is $\pi =0.486$. The substantial difference suggests that a sizeable number of rating vectors contain at least one rating that disagrees with the others. Table 2 summarizes the relevant aspects of the data. The maximal agreement row could potentially go from 1 to 6, but the smallest number of raters agreeing on the classification of an item in this data set is 3. The count row counts the number of rows with the corresponding maximal agreements and distances. According to the Hubert distance, the raters disagree a lot, since only 5 items have a disagreement of 0 and the rest a disagreement of 1. On the other hand, $V(d_{0})$ results in a much smaller overall disagreement, with all disagreements smaller than the maximum of 1.

Table 2. Maximal agreement for the data of Fleiss (1971).

*The largest number of raters that agree on the classification of an item. Both $V(d_{0})$ and Hubert’s distance depend only on this when $g=R$.
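The dependence of both disagreement measures on the maximal agreement can be sketched as follows. The sketch assumes that, with the nominal distance $d_{0}$ and $g=R$, the Fréchet mean is a modal category, so that $V(d_{0})$ is the share of raters deviating from it, and that Hubert’s distance is 0 only under unanimity. The function names and the rating vectors are hypothetical; they are not taken from the data of Fleiss (1971).

```python
from collections import Counter

def maximal_agreement(item_ratings):
    """Largest number of raters choosing the same category for one item."""
    return max(Counter(item_ratings).values())

def v_d0(item_ratings):
    """Nominal Fréchet variance with g = R: share of raters deviating from a modal category."""
    return 1 - maximal_agreement(item_ratings) / len(item_ratings)

def hubert_distance(item_ratings):
    """Hubert-type disagreement: 0 if every rater agrees, 1 otherwise."""
    return 0 if maximal_agreement(item_ratings) == len(item_ratings) else 1

# Hypothetical rating vectors with R = 6 raters (not the Fleiss data).
unanimous = ["a", "a", "a", "a", "a", "a"]
one_dissent = ["a", "a", "a", "a", "a", "b"]
split = ["a", "a", "a", "b", "b", "c"]

for item in (unanimous, one_dissent, split):
    print(maximal_agreement(item), hubert_distance(item), round(v_d0(item), 3))
```

A single dissenting rater already pushes the Hubert distance to its maximum, while $V(d_{0})$ moves only to $1/6$, which is the pattern the example describes.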

4. Inference

4.1. Limit Theory Using U-Statistics

Let $\boldsymbol{X}_{1},\ldots ,\boldsymbol{X}_{n}$ be independently and identically distributed and $\psi (x_{1},\ldots ,x_{k})$ be a symmetric function. A U-statistic of order k with kernel $\psi$ is

$$U_{n}=\binom{n}{k}^{-1}\sum _{i_{1},\ldots ,i_{k}}\psi (\boldsymbol{X}_{i_{1}},\ldots ,\boldsymbol{X}_{i_{k}}), \tag{4.1}$$

where the sum extends over all k-dimensional tuples satisfying $1\le i_{1}<i_{2}<\cdots <i_{k}\le n$.
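Definition (4.1) translates directly into code. The following sketch evaluates a U-statistic by enumerating all strictly increasing index tuples; the helper `u_statistic` and the data are illustrative. As a sanity check it uses the standard order-2 kernel $(a-b)^{2}/2$, whose U-statistic is the unbiased sample variance.

```python
from itertools import combinations
from math import comb

import numpy as np

def u_statistic(data, kernel, k):
    """U-statistic of order k: average of the kernel over all index tuples i_1 < ... < i_k."""
    n = len(data)
    total = sum(
        kernel(*(data[i] for i in idx))
        for idx in combinations(range(n), k)
    )
    return total / comb(n, k)

x = np.array([2.0, 4.0, 4.0, 5.0, 7.0])  # illustrative scalar observations

# Order-1 kernel: the U-statistic is the sample mean.
u_mean = u_statistic(x, lambda a: a, 1)

# Order-2 kernel (a - b)^2 / 2: the U-statistic is the unbiased sample variance.
u_var = u_statistic(x, lambda a, b: 0.5 * (a - b) ** 2, 2)
```

The enumeration visits $\binom{n}{k}$ tuples, so this direct form is only practical for small n and k.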

The theory of U-statistics was established by Hoeffding (1992); for an introduction, see, e.g., Chapter 6.1 of Lehmann (2004), Chapter 5 of Serfling (1980), or the textbook of Lee (2019). These references handle U-statistics where the $X_{i}$s are real-valued, but their results, including the simple results below, hold for vector-valued $X_{i}$s as well (Korolyuk and Borovskich, 2013).

The weighted chance agreement of Fleiss-type (Cohen-type) is a U-statistic with kernel $F_{d}$ ($C_{d}$), of order g. The disagreement is a U-statistic with kernel $D_{d}$, which has order 1. To find the asymptotic variance of the kappas, we will use formulas for the asymptotic covariance of U-statistics.
Let $U_{1n}$ and $U_{2n}$ be two U-statistics of n observations with symmetric kernel functions $\psi _{1}$, $\psi _{2}$ of orders $k_{1}$ and $k_{2}$. Define

$$\begin{aligned} \sigma _{1}^{2} &= \operatorname{Var}\bigl(E[\psi _{1}(\boldsymbol{X}_{1},\ldots ,\boldsymbol{X}_{k_{1}})\mid \boldsymbol{X}_{1}]\bigr),\\ \sigma _{12} &= \operatorname{Cov}\bigl(E[\psi _{1}(\boldsymbol{X}_{1},\ldots ,\boldsymbol{X}_{k_{1}})\mid \boldsymbol{X}_{1}],\,E[\psi _{2}(\boldsymbol{X}_{1},\ldots ,\boldsymbol{X}_{k_{2}})\mid \boldsymbol{X}_{1}]\bigr). \end{aligned}$$

Then we have $n\operatorname{Cov}(U_{1n},U_{2n})\rightarrow k_{1}k_{2}\sigma _{12}$ and $n\operatorname{Var}(U_{1n})\rightarrow k_{1}^{2}\sigma _{1}^{2}$ (Lee, 2019, Theorem 2, p. 76). It is also possible to calculate the exact covariances, which could potentially make the asymptotic variances for the kappas perform better. See Appendix, Sect. 6 for the formula for the exact covariance (Lee, 2019, Theorem 2, p. 17).

Lemma 1

Define the parameter vectors $\boldsymbol{p}=(D_{d},C_{d},F_{d})$ and $\hat{\boldsymbol{p}}=(\hat{D}_{d},\hat{C}_{d},\hat{F}_{d})$. Then $\sqrt{n}(\hat{\boldsymbol{p}}-\boldsymbol{p})\xrightarrow{d}N(0,\Sigma )$, where $\Sigma$ is the covariance matrix with elements

$$\begin{aligned} \sigma _{11} &= \sigma _{D}^{2} = \operatorname{Var}D_{d}(\boldsymbol{X}_{1}), &\quad \sigma _{12} &= \sigma _{CD} = g\operatorname{Cov}(\mu _{dC}(\boldsymbol{X}_{1}),D_{d}(\boldsymbol{X}_{1})),\\ \sigma _{22} &= \sigma _{C}^{2} = g^{2}\operatorname{Var}\mu _{dC}(\boldsymbol{X}_{1}), &\quad \sigma _{13} &= \sigma _{FD} = g\operatorname{Cov}(\mu _{dF}(\boldsymbol{X}_{1}),D_{d}(\boldsymbol{X}_{1})),\\ \sigma _{33} &= \sigma _{F}^{2} = g^{2}\operatorname{Var}\mu _{dF}(\boldsymbol{X}_{1}), &\quad \sigma _{23} &= \sigma _{CF} = g\operatorname{Cov}(\mu _{dC}(\boldsymbol{X}_{1}),\mu _{dF}(\boldsymbol{X}_{1})). \end{aligned}$$

Here the variables $\mu _{dC}(\boldsymbol{X}_{1})$ and $\mu _{dF}(\boldsymbol{X}_{1})$ are defined as

$$\begin{aligned} \mu _{dC}(\boldsymbol{X}_{1}) &= E[C_{d}(\boldsymbol{X}_{1},\ldots ,\boldsymbol{X}_{g})\mid \boldsymbol{X}_{1}],\\ \mu _{dF}(\boldsymbol{X}_{1}) &= E[F_{d}(\boldsymbol{X}_{1},\ldots ,\boldsymbol{X}_{g})\mid \boldsymbol{X}_{1}]. \end{aligned}$$

The form of the covariance matrix follows from the remarks preceding the lemma. Asymptotic normality follows from a general theorem about asymptotic normality of U-statistics; see, e.g., Theorem 2 of Lee (2019, p. 76).

We want to use Lemma 1 to find the limit distribution of the generalized Cohen’s kappa and Fleiss’s kappa. To this end, recall the multivariate delta method (see, e.g., Lehmann, 2004, Theorem 5.2.3). Let $f:\mathbb {R}^{k}\rightarrow \mathbb {R}$ be continuously differentiable at $\theta$ and suppose that $\sqrt{n}(\hat{\theta }-\theta )\xrightarrow{d}N(0,\Sigma )$. Then

$$\sqrt{n}[f(\hat{\theta })-f(\theta )]\xrightarrow{d}N(0,\nabla f(\theta )^{T}\Sigma \nabla f(\theta )), \tag{4.2}$$

where $\nabla f(\theta )$ denotes the gradient of f at $\theta$.

In the case of Cohen’s kappa and Fleiss’s kappa, we find that

$$\nabla \kappa _{d}=\frac{1}{C_{d}}\left( -1,\frac{D_{d}}{C_{d}}\right) ,\quad \nabla \pi _{d}=\frac{1}{F_{d}}\left( -1,\frac{D_{d}}{F_{d}}\right) . \tag{4.3}$$

Using some algebra, the expressions for the asymptotic variances follow from Lemma 1 and the above gradients.

Proposition 1

Cohen’s kappa $\hat{\kappa }_{d}$ and Fleiss’s kappa $\hat{\pi }_{d}$ are asymptotically normal, and their asymptotic variances are

$$\begin{aligned} \sigma _{\kappa }^{2} &= \sigma _{D}^{2}\frac{1}{C_{d}^{2}}-2\sigma _{CD}\frac{D_{d}}{C_{d}^{3}}+\sigma _{C}^{2}\frac{D_{d}^{2}}{C_{d}^{4}},\\ \sigma _{\pi }^{2} &= \sigma _{D}^{2}\frac{1}{F_{d}^{2}}-2\sigma _{FD}\frac{D_{d}}{F_{d}^{3}}+\sigma _{F}^{2}\frac{D_{d}^{2}}{F_{d}^{4}}. \end{aligned} \tag{4.4}$$

These results are also valid for $\hat{\kappa }_{d}^{u}$ and $\hat{\pi }_{d}^{u}$. Since the sample Krippendorff’s alpha (see note below) is equal to $\hat{\alpha }_{d}=\hat{\pi }_{d}+\frac{1}{2Rn}(1-\hat{\pi }_{d})$, it is also asymptotically normal with asymptotic variance $\sigma _{\pi }^{2}$.
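The step from the delta method to the closed-form variance can be verified numerically. The sketch below compares the quadratic form $\nabla \kappa _{d}^{T}\Sigma \nabla \kappa _{d}$, restricted to the $(D_{d},C_{d})$ block of $\Sigma$, with the closed-form expression for $\sigma _{\kappa }^{2}$. The function names and the input values are hypothetical and chosen only for illustration; the same code applies to Fleiss’s kappa after replacing the $C$ quantities with their $F$ counterparts.

```python
import numpy as np

def kappa_avar_delta(D, C, var_D, cov_CD, var_C):
    """Asymptotic variance of kappa = 1 - D / C via the delta-method quadratic form."""
    grad = np.array([-1.0, D / C]) / C            # gradient with respect to (D, C)
    sigma = np.array([[var_D, cov_CD],
                      [cov_CD, var_C]])           # (D, C) block of the limit covariance
    return float(grad @ sigma @ grad)

def kappa_avar_closed_form(D, C, var_D, cov_CD, var_C):
    """The same variance written out term by term."""
    return var_D / C**2 - 2 * cov_CD * D / C**3 + var_C * D**2 / C**4

# Hypothetical values; var_C and cov_CD already include the factors g^2 and g.
args = dict(D=0.4, C=0.73, var_D=0.09, cov_CD=0.01, var_C=0.04)

avar_delta = kappa_avar_delta(**args)
avar_closed = kappa_avar_closed_form(**args)
```

Dividing either value by n and taking the square root gives an approximate standard error for the sample kappa.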

With $g=2$ and a finite number of categories, Schouten (1980) derived $\sigma _{\pi }^{2}$, while Schouten (1982) and O’Connell and Dobson (1984) derived $\sigma _{\kappa }^{2}$. The result for Krippendorff’s alpha is, to our knowledge, new.

A brief aside on Krippendorff’s alpha. Krippendorff’s alpha (Krippendorff, 1970) is an agreement coefficient especially popular in content analysis (Krippendorff, 2018). It has no population definition, but its sample definition equals $\hat{\alpha }_{d}=\hat{\pi }_{d}+\frac{1}{N}(1-\hat{\pi }_{d})$ (the total sample size N equals 2Rn in the case of a rectangular design); see Proposition 3 in Appendix for a justification. For this reason, all of the results about the limit of $\hat{\pi }_{d}^{u}$ apply to Krippendorff’s alpha as well, as it is an asymptotically equivalent estimator of $\pi _{d}$. Note, however, that Krippendorff (2018) emphasizes the use of non-rectangular designs, and the limit results in the preceding section do not hold for such study designs.

4.2. Estimating the Variances

The unknown quantities $D_{d}$, $C_{d}$, and $F_{d}$ can be estimated using their sample counterparts $\hat{D}_{d}$, $\hat{C}_{d}$, and $\hat{F}_{d}$. The variances and covariances can be estimated using the empirical (co)variances of the estimated $\hat{\mu }$s, which are given by

$$\begin{aligned} \hat{\mu }_{d}(\boldsymbol{x}_{i}) &= D_{d}(\boldsymbol{x}_{i}),\\ \hat{\mu }_{dC}(\boldsymbol{x}_{i}) &= n^{-(g-1)}\sum _{i_{1},\ldots ,i_{g-1}}C_{d}(\boldsymbol{x}_{i},\boldsymbol{x}_{i_{1}},\ldots ,\boldsymbol{x}_{i_{g-1}}),\\ \hat{\mu }_{dF}(\boldsymbol{x}_{i}) &= n^{-(g-1)}\sum _{i_{1},\ldots ,i_{g-1}}F_{d}(\boldsymbol{x}_{i},\boldsymbol{x}_{i_{1}},\ldots ,\boldsymbol{x}_{i_{g-1}}), \end{aligned} \tag{4.5}$$

where the index sets run over all combinations with repetitions of $(i_{1},i_{2},\ldots ,i_{g-1})$.
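A direct implementation of these estimators might look as follows. This is a sketch under two assumptions: the sums are read as running over all $n^{g-1}$ index tuples, which is the reading that matches the normalization $n^{-(g-1)}$, and the kernel is passed in as a function. The kernel `toy_kernel` is a hypothetical symmetric placeholder for $g=2$, not the paper’s $C_{d}$ or $F_{d}$, whose definitions lie outside this section.

```python
from itertools import product

import numpy as np

def mu_hat(kernel, ratings, g):
    """Estimate mu(x_i) for every item i by averaging the kernel over all (g-1)-tuples of items."""
    n = len(ratings)
    out = np.empty(n)
    for i in range(n):
        total = 0.0
        # All n**(g-1) index tuples, repetitions allowed.
        for idx in product(range(n), repeat=g - 1):
            total += kernel(ratings[i], *(ratings[j] for j in idx))
        out[i] = total / n ** (g - 1)
    return out

def toy_kernel(x, y):
    """Hypothetical symmetric pairwise kernel: cross-item disagreement between raters 1 and 2."""
    return 0.5 * (abs(x[0] - y[1]) + abs(y[0] - x[1]))

# Hypothetical data: 3 items rated by 2 raters.
ratings = np.array([[1, 2], [4, 3], [5, 1]])
g = 2

mu = mu_hat(toy_kernel, ratings, g)

# Plug-in estimate of g^2 Var(mu(X_1)), the form of the diagonal entries in Lemma 1.
sigma2_hat = g**2 * np.var(mu, ddof=1)
```

The nested enumeration makes the cost of this direct approach explicit: each item requires $n^{g-1}$ kernel evaluations.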

Observe that computing $\hat{\mu}_{dC}$ and $\hat{\mu}_{dF}$ directly is computationally very expensive, especially without binning, which is not available for continuous data. The obvious computation of all $\hat{\mu}_{dC}$ requires a number of operations on the order of $n^{g-1}$, which is prohibitively expensive for large $n$ and $g$. However, there are few applications of agreement measures with very large $n$ and $g$, so this should not be a serious problem in practice. We note that less computationally demanding procedures are possible for the quadratic Fréchet variance $V(d_2^2)$, as it can be shown that its associated kappas are invariant under $g$. Thus, we may use the computationally efficient methods for the concordance coefficient outlined by, e.g., Carrasco and Jover (2003).
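To make the cost of the direct computation concrete, the following is a minimal sketch of the brute-force average in (4.5). The function name `mu_hat_bruteforce` and the argument `kernel` are hypothetical: `kernel` stands in for $C_{d}$ or $F_{d}$, whose definitions lie outside this section, and is assumed to accept $g$ rating vectors.

```python
from itertools import product


def mu_hat_bruteforce(items, kernel, g):
    """Brute-force version of the averages in (4.5).

    items  : list of rating vectors x_1, ..., x_n
    kernel : hypothetical stand-in for C_d or F_d; takes g rating vectors
    g      : number of arguments of the kernel
    """
    n = len(items)
    out = []
    for x_i in items:
        total = 0.0
        # All index tuples (i_1, ..., i_{g-1}), repetitions allowed.
        for idx in product(range(n), repeat=g - 1):
            total += kernel(x_i, *(items[j] for j in idx))
        out.append(total / n ** (g - 1))
    return out


# Toy usage with an illustrative kernel (not the paper's C_d).
toy_items = [(0,), (0,), (1,)]
toy_kernel = lambda a, b: float(a[0] != b[0])
toy_mu = mu_hat_bruteforce(toy_items, toy_kernel, g=2)
```

In this sketch the inner loop visits $n^{g-1}$ index tuples for each item, which is where the rapid growth in $n$ and $g$ comes from.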

From the definitions of $\hat{D}_{d}$, $\hat{C}_{d}$, and $\hat{F}_{d}$ (4), we quickly deduce that $\overline{\hat{\mu}_{d}}=\hat{D}_{d}$, $\overline{\hat{\mu}_{dC}}=\hat{C}_{d}$, and $\overline{\hat{\mu}_{dF}}=\hat{F}_{d}$. Using this fact, we can define the estimators

$$\hat{\sigma}_{C}^{2}=\frac{g^{2}}{n-1}\sum_{i=1}^{n}(\hat{\mu}_{dC}(\boldsymbol{x}_{i})-\hat{C}_{d})^{2},\quad \hat{\sigma}_{CD}^{2}=\frac{g}{n-1}\sum_{i=1}^{n}(\hat{\mu}_{dC}(\boldsymbol{x}_{i})-\hat{C}_{d})(\hat{\mu}_{d}(\boldsymbol{x}_{i})-\hat{D}_{d}),$$

and $\hat{\sigma}_{D}^{2}=\frac{1}{n-1}\sum_{i=1}^{n}(\hat{\mu}_{d}(\boldsymbol{x}_{i})-\hat{D}_{d})^{2}$. Moreover, we can estimate $\hat{\sigma}_{F}^{2}$ and $\hat{\sigma}_{FD}^{2}$ in the same way, substituting $\hat{\mu}_{dF}$ for $\hat{\mu}_{dC}$. Using the formulas for the theoretical variances (4.4), we find the estimators

$$\hat{\sigma}_{\kappa}^{2}=\hat{\sigma}_{D}^{2}\frac{1}{\hat{C}_{d}^{2}}-2\hat{\sigma}_{CD}\frac{\hat{D}_{d}}{\hat{C}_{d}^{3}}+\hat{\sigma}_{C}^{2}\frac{\hat{D}_{d}^{2}}{\hat{C}_{d}^{4}}, \tag{4.6}$$
$$\hat{\sigma}_{\pi}^{2}=\hat{\sigma}_{D}^{2}\frac{1}{\hat{F}_{d}^{2}}-2\hat{\sigma}_{FD}\frac{\hat{D}_{d}}{\hat{F}_{d}^{3}}+\hat{\sigma}_{F}^{2}\frac{\hat{D}_{d}^{2}}{\hat{F}_{d}^{4}}. \tag{4.7}$$

The variance estimator $\hat{\sigma}_{\pi}^{2}$ coincides with that of Gwet (2021, equation 4) in the case of nominal weights; see Appendix (Sect. 6) for a proof sketch.
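The plug-in step from the per-item values to the variance estimator in (4.6) can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the function name `kappa_variance` and the argument names `mu_d` and `mu_dc` are hypothetical, and the per-item values $\hat{\mu}_{d}(\boldsymbol{x}_{i})$ and $\hat{\mu}_{dC}(\boldsymbol{x}_{i})$ are assumed to have been computed beforehand according to (4.5).

```python
def kappa_variance(mu_d, mu_dc, g):
    """Plug-in variance estimator following the display of (4.6).

    mu_d  : per-item values of mu-hat_d   (assumed precomputed)
    mu_dc : per-item values of mu-hat_dC  (assumed precomputed)
    g     : number of raters entering the disagreement function
    """
    n = len(mu_d)
    d_hat = sum(mu_d) / n   # mean of mu-hat_d equals D-hat_d
    c_hat = sum(mu_dc) / n  # mean of mu-hat_dC equals C-hat_d

    var_d = sum((m - d_hat) ** 2 for m in mu_d) / (n - 1)
    var_c = g ** 2 * sum((m - c_hat) ** 2 for m in mu_dc) / (n - 1)
    cov_cd = g * sum(
        (c - c_hat) * (d - d_hat) for c, d in zip(mu_dc, mu_d)
    ) / (n - 1)

    return (
        var_d / c_hat ** 2
        - 2 * cov_cd * d_hat / c_hat ** 3
        + var_c * d_hat ** 2 / c_hat ** 4
    )
```

Passing the per-item values of $\hat{\mu}_{dF}$ in place of `mu_dc` gives the analogous sketch for (4.7).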

4.3. Improving Approximate Normality with the Arcsine and Fisher Transforms

It is well known that the Fisher transform (Fisher, 1915) improves the inference for the correlation coefficient. If $r$ is the sample correlation, $\operatorname{artanh}(r)=\frac{1}{2}\log[(1+r)/(1-r)]$ has approximately the same variance for most $r$, and its distribution is closer to normal than that of the untransformed $r$, especially when the population correlation is close to $\pm 1$. This transform makes sense outside the world of correlations; for instance, Lin (1989) used the Fisher transform to improve the normality of the quadratically weighted Cohen’s kappa.

The arcsine is another reasonable transformation of $\hat{\kappa}_{d}$ and $\hat{\pi}_{d}$. The arcsine is the inverse of the sine function and is defined as $\arcsin x=\int 1/\sqrt{1-x^{2}}\,\mathrm{d}x$. In ecology (Warton and Hui, 2011), the arcsine transformation denotes $\arcsin\sqrt{p}$, where $p$ is a probability. We do not take the square root, however, as $\hat{\kappa}_{d}$ and $\hat{\pi}_{d}$ can be negative.

Calculating the limiting variance of $\arcsin\hat{\kappa}_{d}$ and $\arcsin\hat{\pi}_{d}$ requires an additional application of the delta method (4.2). Using that $\frac{\mathrm{d}}{\mathrm{d}x}\arcsin(x)=1/\sqrt{1-x^{2}}$ and $\frac{\mathrm{d}}{\mathrm{d}x}\operatorname{artanh}(x)=1/(1-x^{2})$, we find

$$\sqrt{n}(\arcsin\hat{\kappa}_{d}-\arcsin\kappa_{d})\rightarrow N(0,(1-\kappa_{d}^{2})^{-1}\sigma_{\kappa}^{2}), \tag{4.8}$$
$$\sqrt{n}(\operatorname{artanh}\hat{\kappa}_{d}-\operatorname{artanh}\kappa_{d})\rightarrow N(0,(1-\kappa_{d}^{2})^{-2}\sigma_{\kappa}^{2}). \tag{4.9}$$

Expressions for $\hat{\pi}_{d}$ can be found by swapping $\kappa_{d}$ for $\pi_{d}$ and $\sigma_{\kappa}^{2}$ for $\sigma_{\pi}^{2}$.
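For readers who want the intermediate step, the limits (4.8) and (4.9) can be read as two instances of one generic delta-method statement. The sketch below only restates them; the symbol $h$ is a placeholder for the transform, and $|\kappa_{d}|<1$ is assumed so that both derivatives exist.

```latex
\begin{aligned}
\sqrt{n}\,(\hat{\kappa}_{d}-\kappa_{d}) \rightarrow N(0,\sigma_{\kappa}^{2})
  \quad &\Longrightarrow \quad
  \sqrt{n}\,\bigl(h(\hat{\kappa}_{d})-h(\kappa_{d})\bigr)
  \rightarrow N\bigl(0,\,h'(\kappa_{d})^{2}\sigma_{\kappa}^{2}\bigr), \\
h=\arcsin: \quad & h'(\kappa_{d})^{2}=(1-\kappa_{d}^{2})^{-1}, \\
h=\operatorname{artanh}: \quad & h'(\kappa_{d})^{2}=(1-\kappa_{d}^{2})^{-2}.
\end{aligned}
```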

Example 3

This example illustrates that the arcsine and Fisher transforms may make the sampling distribution closer to the normal distribution. Let the number of raters be $R=3$, the disagreement function be quadratic (with $g=2$), and the number of items be $n=20$. There are five categories, and the true classification of an item is one of $\{1,2,3,4,5\}$ with probability $1/5$ each. Every rater knows the true classification of an item with probability $0.9$. If they do not know the correct classification, they will guess a classification from $\{1,2,3,4,5\}$ uniformly at random. One can show that the population value of the quadratically weighted Cohen’s kappa is $0.816$ under these circumstances, following the arguments of Perreault and Leigh (1989). We simulate the value of $\hat{\kappa}_{d}$ a total of $N=50{,}000$ times and transform them using the identity transform, the arcsine transform, and the Fisher transform. The results are shown in Fig. 1. The arcsine transform appears to bring the sampling distribution of $\hat{\kappa}_{d}$ closer to the normal distribution, with the Fisher transform also improving normality quite a bit.

Figure 1 Simulated sampling distribution of $\hat{\kappa}_{d}$ for quadratic weights using three transformations, $n=20$, $R=3$. The simulation setup is described in Example 3. The arcsine transform makes the sampling distribution closest to the normal distribution.

5. Confidence Intervals

Using the methodology we have developed, we can easily construct confidence intervals for the agreement coefficients.

We describe our three confidence interval constructions only for Cohen’s kappa, as the intervals using Fleiss’s kappa can be found by replacing every instance of $\hat{\kappa}_{d}$ with $\hat{\pi}_{d}$ and $\hat{\sigma}_{\kappa}^{2}$ with $\hat{\sigma}_{\pi}^{2}$. We use the two-sided $t$-distribution-based confidence intervals with nominal level $1-\alpha=0.95$. Let $c$ be the $(1-\alpha/2)$-quantile of the $t$ distribution with $n-1$ degrees of freedom. The basic interval is

$$[\hat{\kappa}_{d}-c\hat{\sigma}_{\kappa}/\sqrt{n-1},\ \hat{\kappa}_{d}+c\hat{\sigma}_{\kappa}/\sqrt{n-1}], \tag{5.1}$$

where $\hat{\sigma}_{\kappa}$ is the square root of the estimated variance $\hat{\sigma}_{\kappa}^{2}$ described in equation (4.6).

The arcsine interval replaces the basic limits with

$$\sin\left(\arcsin\hat{\kappa}_{d}\pm c(1-\hat{\kappa}_{d}^{2})^{-1/2}\hat{\sigma}_{\kappa}/\sqrt{n-1}\right), \tag{5.2}$$

where $(1-\hat{\kappa}_{d}^{2})^{-1}\hat{\sigma}_{\kappa}^{2}$ is the asymptotic variance of $\arcsin\hat{\kappa}_{d}$ (4.8). The Fisher interval uses the area hyperbolic tangent,

$$\tanh\left(\operatorname{artanh}\hat{\kappa}_{d}\pm c(1-\hat{\kappa}_{d}^{2})^{-1}\hat{\sigma}_{\kappa}/\sqrt{n-1}\right), \tag{5.3}$$

where $(1-\hat{\kappa}_{d}^{2})^{-2}\hat{\sigma}_{\kappa}^{2}$ is the asymptotic variance of $\operatorname{artanh}\hat{\kappa}_{d}$ (4.9).
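The three constructions (5.1) to (5.3) differ only in the scale on which the limits are formed, which the following Python sketch tries to make explicit. The function name `kappa_intervals` and its argument names are hypothetical. The quantile `c` is supplied by the caller; it could, for instance, be obtained from `scipy.stats.t.ppf(1 - alpha / 2, n - 1)`, which is not imported here so that the sketch depends only on the standard library. The estimate is assumed to satisfy $|\hat{\kappa}_{d}|<1$.

```python
import math


def kappa_intervals(kappa_hat, sigma_hat, n, c):
    """Basic, arcsine, and Fisher intervals following (5.1)-(5.3).

    kappa_hat : point estimate, assumed strictly between -1 and 1
    sigma_hat : square root of the variance estimate in (4.6)
    n         : number of items
    c         : t quantile with n - 1 degrees of freedom (caller-supplied)
    """
    se = sigma_hat / math.sqrt(n - 1)

    # (5.1): limits formed directly on the kappa scale.
    basic = (kappa_hat - c * se, kappa_hat + c * se)

    # (5.2): limits formed on the arcsine scale, then mapped back with sin.
    se_arcsin = se / math.sqrt(1 - kappa_hat ** 2)
    a = math.asin(kappa_hat)
    arcsine = (math.sin(a - c * se_arcsin), math.sin(a + c * se_arcsin))

    # (5.3): limits formed on the artanh scale, then mapped back with tanh.
    se_fisher = se / (1 - kappa_hat ** 2)
    f = math.atanh(kappa_hat)
    fisher = (math.tanh(f - c * se_fisher), math.tanh(f + c * se_fisher))

    return {"basic": basic, "arcsine": arcsine, "fisher": fisher}


# Illustrative inputs only; c = 2.0 is a rough stand-in for the t quantile.
example = kappa_intervals(kappa_hat=0.8, sigma_hat=0.5, n=40, c=2.0)
```

Because $\tanh$ maps the real line into $(-1,1)$, the Fisher limits in this sketch always stay inside the parameter space. The sine is monotone only on $[-\pi/2,\pi/2]$, and the sketch, like (5.2) as written, does not guard against arguments outside that range.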

Using the methodology just described, we can calculate confidence intervals for the Fleiss (1971) data of Example 2.

Example 4

(Ex. 2 cont.) Using the data of Fleiss (1971), we calculate arcsine confidence intervals for the $g$-wise Fleiss’s kappa. The raters are not the same for all items, but it seems plausible to assume that the ratings are exchangeable given the item. The diagnoses are essentially categorical in nature; hence, we will only consider $V(d_{0})$ and Hubert’s disagreement function. The results are shown in Table 3. We see that the agreement coefficients coincide when $g=2$, as both $V(d_{0})$ and Hubert’s disagreement function equal the nominal disagreement in this case. But the coefficients differ substantially as $g$ increases. This is to be expected, as Hubert’s disagreement function measures consensus while $V(d_{0})$ measures the number of observations different from the mode. Observe that $V(d_{0})$ is not invariant with respect to $g$; hence, it is a proper alternative to the classical Fleiss’s kappa. Moreover, all confidence intervals are of comparable length.

Table 3 Confidence intervals for the data of Fleiss (1971) using the arcsine method.

*This is Hubert’s kappa when the Hubert disagreement is used.

$^{\dagger}$ Hubert disagreement equals the nominal disagreement $V(d_{0})$ when $g=2$.

The preceding example fits best into the context of Fleiss’s kappa, as the identities of the raters are unknown. Moreover, there is no ordinal structure in the data, making the $V(d_{1})$ and $V(d_2^2)$ distances unnatural to employ. Our next example concerns the Fréchet variances applied to a case of ordinal data when the identities of the raters are known.

Example 5

Zapf et al. (2016) studied bootstrap intervals for Fleiss’s kappa and Krippendorff’s alpha using simulations and a case study. Their case study concerned the histopathological assessment of breast cancer and involved ratings of $n=50$ breast cancer biopsies performed by $R=4$ senior pathologists. We apply the arcsine method to calculate confidence intervals and point estimates, displayed in Table 4. We focus on Cohen’s kappa since the same four pathologists rate each cancer biopsy, but we include a column for Fleiss’s kappa when $g=4$ for comparison’s sake. When $g=4$, Cohen’s kappa and Fleiss’s kappa are as good as indistinguishable. As can be verified by using the code in the supplementary material, this happens for the other values of $g$ as well. It is not generally the case that Fleiss’s kappa and Cohen’s kappa nearly coincide, but it is likely to happen if the marginal ratings are approximately the same for all raters, as is the case in this data set. There is a sizable difference between the disagreement functions, but there is typically not a big difference when changing $g$, provided we keep the disagreement function constant. It remains to be seen whether this is common or not. The exception is Hubert’s disagreement function, which decreases quite a bit. (As in the Fleiss (1971) example, this is expected, as Hubert’s disagreement function is a consensus measure.) Observe that the kappas under the quadratic Fréchet variance $V(d_2^2)$ do not change with $g$, which is always the case.

Table 4 Confidence intervals for Zapf et al. (2016) using the arcsine method.

$^{\dagger}$ Hubert disagreement equals the nominal disagreement $V(d_{0})$ when $g=2$.

5.1. Simulation of Confidence Sets When $g=2$

We include a small simulation study on the performance of confidence sets using two models: a Perreault–Leigh model for discrete rating data and a normal model for continuous rating data. For both models, we investigate the following parameters:

  1. (i) Number of raters R. We use 2, 5, and 20, which correspond to a small, medium, and large selection of raters.

  2. (ii) Sample sizes n. We use $n=10,40,100$, corresponding to small, medium, and large agreement studies.

  3. (iii) Disagreement functions. Nominal disagreement $1[x\ne y]$, quadratic disagreement $(x-y)^{2}$, and absolute value disagreement $|x-y|$.

  4. (iv) Methods. A basic interval without transformations, an arcsine-transformed interval, and a Fisher transformed interval.
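As a rough illustration of the three interval constructions in item (iv), the sketch below builds each interval from a point estimate and a standard error using the delta method. The function names, the argument `se` (an estimated standard error of the coefficient), and the exact form of the transformed intervals are assumptions made for illustration; the definitions used in the paper's inference section may differ in detail. The estimate is assumed to lie strictly between -1 and 1.

```python
import math
from statistics import NormalDist


def basic_interval(est, se, level=0.95):
    """Untransformed interval: est +/- z * se."""
    z = NormalDist().inv_cdf(0.5 + level / 2)
    return est - z * se, est + z * se


def arcsine_interval(est, se, level=0.95):
    """Interval built on the arcsin scale and mapped back with sin.

    Delta method: the derivative of arcsin(x) is 1 / sqrt(1 - x^2).
    """
    z = NormalDist().inv_cdf(0.5 + level / 2)
    centre = math.asin(est)
    half = z * se / math.sqrt(1 - est ** 2)
    # Clip to [-pi/2, pi/2] so that sin stays monotone on the interval.
    lower = max(centre - half, -math.pi / 2)
    upper = min(centre + half, math.pi / 2)
    return math.sin(lower), math.sin(upper)


def fisher_interval(est, se, level=0.95):
    """Interval built on the atanh (Fisher) scale and mapped back with tanh.

    Delta method: the derivative of atanh(x) is 1 / (1 - x^2).
    """
    z = NormalDist().inv_cdf(0.5 + level / 2)
    centre = math.atanh(est)
    half = z * se / (1 - est ** 2)
    return math.tanh(centre - half), math.tanh(centre + half)


if __name__ == "__main__":
    for build in (basic_interval, arcsine_interval, fisher_interval):
        print(build.__name__, build(0.95, 0.05))
```

Unlike the basic interval, the two transformed intervals cannot extend beyond 1, which matters when the estimate is close to the boundary.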

5.1.1. A Perreault–Leigh Model

Perreault and Leigh (1989) discussed a particular model for ratings in which each rater either knows the correct answer or guesses uniformly at random. Similar models have been used by Gwet (2008) and Maxwell (1977), among others; see Moss (2023) for a thorough discussion of such models. We assume there are five categories encoded as $C=\{-2,-1,0,1,2\}$, and the distribution of the true classification is uniform. For each item rated, the rth rater knows the correct classification with probability $\sqrt{0.8}$. If not, the rater guesses, picking a number from C uniformly at random. Then $\kappa_{d}=\pi_{d}=0.8$ for all weights and any number of raters, as can be verified by following the arguments of Perreault and Leigh (1989). We run each simulation $N=10{,}000$ times.
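The claim that the true value is 0.8 for every weight can be checked by brute force for a pair of raters. The sketch below enumerates the generative model exactly (uniform truth, each rater knowing the truth with probability $\sqrt{0.8}$, uniform guesses otherwise) and computes a chance-corrected coefficient of the form $1 - D/C$, where $D$ is the expected disagreement under the joint distribution and $C$ the expected disagreement under independence with the same marginals. The function names and the restriction to two raters are assumptions made for illustration.

```python
import math
from itertools import product

CATEGORIES = (-2, -1, 0, 1, 2)
P_KNOW = math.sqrt(0.8)  # probability that a rater knows the true category


def joint_pmf():
    """Exact joint distribution of the ratings (x1, x2) of two raters."""
    k = len(CATEGORIES)
    pmf = {pair: 0.0 for pair in product(CATEGORIES, repeat=2)}
    for truth, guess1, guess2 in product(CATEGORIES, repeat=3):
        for knows1, knows2 in product((True, False), repeat=2):
            prob = 1 / k ** 3  # uniform truth and two uniform guesses
            prob *= P_KNOW if knows1 else 1 - P_KNOW
            prob *= P_KNOW if knows2 else 1 - P_KNOW
            x1 = truth if knows1 else guess1
            x2 = truth if knows2 else guess2
            pmf[(x1, x2)] += prob
    return pmf


def kappa(d):
    """Chance-corrected agreement 1 - D / C for a pairwise disagreement d."""
    pmf = joint_pmf()
    marg1 = {c: sum(p for (x1, _), p in pmf.items() if x1 == c)
             for c in CATEGORIES}
    marg2 = {c: sum(p for (_, x2), p in pmf.items() if x2 == c)
             for c in CATEGORIES}
    observed = sum(p * d(x1, x2) for (x1, x2), p in pmf.items())
    chance = sum(marg1[x1] * marg2[x2] * d(x1, x2)
                 for x1, x2 in product(CATEGORIES, repeat=2))
    return 1 - observed / chance


def nominal(x, y):
    return float(x != y)


def quadratic(x, y):
    return float((x - y) ** 2)


def absolute(x, y):
    return float(abs(x - y))


if __name__ == "__main__":
    for d in (nominal, quadratic, absolute):
        print(d.__name__, round(kappa(d), 6))
```

The value does not depend on the weight because, whenever at least one rater guesses, the two ratings are independent and uniform, so the observed disagreement is the chance disagreement scaled by $1-0.8$.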

The simulated lengths and coverages for Cohen’s kappa are given in Table 5. Two features stand out in Table 5. First, the confidence intervals have almost indistinguishable lengths and coverages when either R or n is large. Second, the basic interval has worse coverage than the arcsine and Fisher intervals when n is small, with the Fisher interval having coverage slightly closer to nominal than the arcsine interval. However, the better nominal coverage comes at the expense of greater lengths. In particular, for the absolute value weight, the coverage of the arcsine interval is greater than the coverage of the Fisher interval, but its length is shorter! The table for Fleiss’s kappa is similar and can be found in Appendix, Table 8.

Table 5 Coverage (first entry) and lengths (second entry) of confidence intervals: Perreault–Leigh model, Cohen’s kappa.

Coverages greater than 0.95 are in bold.

5.1.2. Normal Model

In this study, the rating data is distributed according to the multivariate normal $N(0,\Sigma)$, where $\Sigma$ is the $R\times R$ correlation matrix with off-diagonal elements $\Sigma_{r_{i}r_{j}}=\rho$. Since the data is continuous, we study the absolute value disagreement $d_{1}$ and the quadratic disagreement $d_{2}^{2}$ only. The true values are $\kappa_{d_{2}^{2}}=\pi_{d_{2}^{2}}=\rho$ and $\kappa_{d_{1}}=\pi_{d_{1}}=1-\sqrt{1-\rho}$. See Appendix (Sect. 6) for details on the computation of these true values. We use $\rho=0.7$, and hence, $\kappa_{d_{2}^{2}}=0.7$ and $\kappa_{d_{1}}\approx 0.45$. We run each simulation $N=1{,}000$ times (Footnote 5). We note that agreement coefficients are often called concordance coefficients when dealing with continuous data, especially when the quadratic distance is used. Lin’s concordance coefficient (Lin, 1989, 1992) is a prominent example.
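The stated true values can be sanity-checked numerically. The sketch below evaluates the closed forms $\rho$ and $1-\sqrt{1-\rho}$ and compares them with a Monte Carlo estimate of $1 - D/C$ for a pair of raters, where $D$ uses a correlated pair and $C$ uses an independent pair with the same standard normal marginals. The shared-factor construction of the correlated pair (which requires $0\le\rho\le 1$), the function names, and the number of draws are assumptions made for illustration, not the paper's simulation code.

```python
import math
import random


def true_kappas(rho):
    """Closed-form true values (quadratic, absolute value) from the text."""
    return rho, 1 - math.sqrt(1 - rho)


def monte_carlo_kappas(rho, n_draws=100_000, seed=0):
    """Monte Carlo estimate of 1 - D / C for both disagreement functions."""
    rng = random.Random(seed)
    a, b = math.sqrt(rho), math.sqrt(1 - rho)
    obs_sq = obs_abs = chance_sq = chance_abs = 0.0
    for _ in range(n_draws):
        shared = rng.gauss(0, 1)
        x1 = a * shared + b * rng.gauss(0, 1)
        x2 = a * shared + b * rng.gauss(0, 1)  # correlation rho with x1
        x2_indep = rng.gauss(0, 1)  # same marginal, independent of x1
        obs_sq += (x1 - x2) ** 2
        obs_abs += abs(x1 - x2)
        chance_sq += (x1 - x2_indep) ** 2
        chance_abs += abs(x1 - x2_indep)
    return 1 - obs_sq / chance_sq, 1 - obs_abs / chance_abs


if __name__ == "__main__":
    print("closed form:", true_kappas(0.7))
    print("Monte Carlo:", monte_carlo_kappas(0.7))
```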

The simulated lengths and coverages for Cohen’s kappa are given in Table 6. There is barely any difference between the three confidence interval constructions. Taken together with the results for the Perreault–Leigh model, where the basic interval performs worse than the other two, we recommend using either the arcsine or the Fisher interval. Again, the table for Fleiss’s kappa is very similar and can be found in Appendix (Table 9).

Table 6 Coverage (first entry) and lengths (second entry) of confidence intervals: normal model, Cohen’s kappa.

Coverages greater than 0.95 are in bold.

5.2. Simulation of Confidence Sets When $g\ne 2$

Table 7 contains simulations from the Perreault–Leigh model (Sect. 5.1.1) with $N=1{,}000$ repetitions and $R=5$ raters using the Fréchet variances $V(d_{0})$, $V(d_{1})$, and Hubert’s disagreement function. We drop $V(d_{2}^{2})$ since it does not vary with g. To save space, we drop the basic confidence interval in the simulation. As before, we show the results only for the Cohen-type disagreement, with the Fleiss-type disagreement relegated to Appendix (Table 10). All coverages are decent, and the coverages and lengths are similar across the board.

Table 7 Coverage (first entry) and lengths (second entry) of confidence intervals for g-wise coefficients: Perreault–Leigh model, Cohen’s kappa.

Coverages greater than 0.95 are in bold.

6. Concluding Remarks

When choosing an agreement coefficient, one has to think carefully about exactly what one wishes to measure. The Fréchet variances are attractive because of their interpretation: we measure how much the raters disagree with the generalized mean rater, and then adjust for chance. In the case of nominal data, we measure the disagreement with the modal rater. When dealing with numerical data, we may measure disagreement with the median rater (using the absolute value distance), or the mean rater (using the quadratic distance), or use any other Fréchet variance defined on numeric data.
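The three interpretations above translate directly into code for a single item rated by g raters. The sketch below assumes the Fréchet variance is the average distance to the minimizing centre (mode, median, or mean), which matches the two-rater formula used in the Appendix; the function names are hypothetical.

```python
from collections import Counter
from statistics import fmean, median


def frechet_nominal(ratings):
    """Share of ratings that differ from the modal rating."""
    modal_count = Counter(ratings).most_common(1)[0][1]
    return 1 - modal_count / len(ratings)


def frechet_absolute(ratings):
    """Mean absolute deviation from the median rating."""
    centre = median(ratings)
    return fmean([abs(x - centre) for x in ratings])


def frechet_quadratic(ratings):
    """Mean squared deviation from the mean rating."""
    centre = fmean(ratings)
    return fmean([(x - centre) ** 2 for x in ratings])


if __name__ == "__main__":
    item = [1, 1, 2, 5]  # four raters scoring one item
    print(frechet_nominal(item))
    print(frechet_absolute(item))
    print(frechet_quadratic(item))
```

With two raters these reduce to half the nominal disagreement, half the absolute difference, and a quarter of the squared difference, respectively.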

When dealing with nominal data, we believe that using the Fréchet variance, which measures the distance from the mode, is a reasonable choice. But other options are certainly possible, even when dealing with g-wise agreement measures. For example, one could use the entropy instead, with distance measure $d(x_{1},x_{2},\ldots,x_{g})=-\sum_{i=1}^{g}\frac{\#i}{g}\log\frac{\#i}{g}$, where $\#i$ counts the number of elements in $(x_{1},x_{2},\ldots,x_{g})$ classified as i, which could be useful when the number of raters is finite but large. The topic of how to choose reasonable distance measures for g-wise agreement studies has not been thoroughly studied, and there might be options preferable to the Fréchet variances that have not yet been found.
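A minimal sketch of the entropy-based disagreement follows. It sums over the categories that actually occur among the g ratings, which amounts to the usual convention $0\log 0=0$; that convention, the use of the natural logarithm, and the function name are assumptions made for illustration.

```python
import math
from collections import Counter


def entropy_disagreement(ratings):
    """Shannon entropy of the empirical distribution of g ratings."""
    g = len(ratings)
    # (c / g) * log(g / c) equals -(c / g) * log(c / g), written this way
    # so that full agreement returns 0.0 rather than -0.0.
    return sum((c / g) * math.log(g / c) for c in Counter(ratings).values())


if __name__ == "__main__":
    print(entropy_disagreement([1, 1, 1, 1]))  # full agreement
    print(entropy_disagreement([1, 1, 2, 2]))  # even split over two categories
    print(entropy_disagreement([1, 2, 3, 4]))  # every rater differs
```

The measure is zero under full agreement and largest, at $\log g$, when all g raters choose different categories.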

We have only covered rectangular designs, where every item is rated by the same number of raters. It is quite easy to generalize the definitions of $\kappa_{d}$ and $\pi_{d}$ to non-rectangular designs, as we have done in Appendix, Sect. 6. But inference appears to be quite difficult, probably requiring additional assumptions for the case of non-exchangeable ratings.

In Sect. 4, we introduced the U-statistic-based estimators of $C_d$ and $F_d$, but only used them for theoretical purposes. The U-statistic-based estimators may plausibly outperform the classical V-statistic-based estimators since they are minimum variance unbiased estimators. It would be interesting to see whether the U-statistic-based estimators could outperform the traditional V-statistic-based estimators when n is small, for example in terms of mean squared error or confidence interval coverage.
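The difference between the two families of estimators can be seen in a generic one-sample setting. The sketch below is not the paper's estimator of $C_d$ or $F_d$; it only contrasts a V-statistic, which averages a kernel over all ordered pairs including each observation paired with itself, with the corresponding U-statistic, which excludes those diagonal pairs. The kernel $(x-y)^2/2$ is chosen as an assumption because the two statistics then reduce to the familiar biased and unbiased variance estimators.

```python
from itertools import product
from statistics import pvariance, variance


def v_statistic(xs, kernel):
    """Average of kernel(x_i, x_j) over all n^2 ordered pairs."""
    n = len(xs)
    return sum(kernel(x, y) for x, y in product(xs, repeat=2)) / n ** 2


def u_statistic(xs, kernel):
    """Average of kernel(x_i, x_j) over the n(n-1) pairs with i != j."""
    n = len(xs)
    total = sum(kernel(xs[i], xs[j])
                for i in range(n) for j in range(n) if i != j)
    return total / (n * (n - 1))


def half_squared_difference(x, y):
    return (x - y) ** 2 / 2


if __name__ == "__main__":
    data = [1.0, 2.0, 4.0, 7.0]
    print(v_statistic(data, half_squared_difference), pvariance(data))
    print(u_statistic(data, half_squared_difference), variance(data))
```

The two statistics differ by the factor $(n-1)/n$ here, so the gap matters mainly when n is small.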

The confidence intervals based on the arcsine and Fisher transforms perform better than the basic, untransformed interval. It is unclear which one of these intervals to prefer, but it barely matters when the sample size is sufficiently large. It might be possible to improve all of these intervals. Small-sample corrections to the variance appear feasible, with potential openings in the application of the delta method and in the calculation of $\Sigma$ of Lemma 1. We have used the arcsine and Fisher transforms to improve approximate normality of $\hat{\kappa}_{d}$ and $\hat{\pi}_{d}$, but this choice is semi-arbitrary. Better variance-stabilizing transformations might be found by inspecting the formula for the variances of $\hat{\kappa}_{d}$ and $\hat{\pi}_{d}$ in Proposition 1. The confidence intervals used in the simulation are only known to be first-order accurate. To obtain second-order accurate confidence intervals, one could use the explicit formula for the variances to construct studentized confidence intervals, i.e., bootstrap-t intervals (Efron, 1987).

None of these approaches is guaranteed to help when n is small, especially when dealing with categorical data, as the sampling distributions of $\hat{\kappa}_{d}$ and $\hat{\pi}_{d}$ are discrete and highly irregular. For example, consider the sampling distribution under the Perreault–Leigh model (Sect. 5.1) when $n=20$ and $R=20$, displayed in Fig. 2. (We omit a dominating spike at 1.) As there are $C=5<\infty$ categories, there is a finite number of possible values for $\hat{\kappa}_{d}$ to take, which is strongly reflected in the plots, especially for the nominal weight.

Figure 2 Sample distribution of $\hat{\kappa}_{d}$ for nominal (left) and absolute value (right) weights. Both plots omit a dominating spike at 1. Here $n=20$ and $j=5$, and we use the Perreault–Leigh model (same parameters as in Sect. 5.1) to simulate the data. There were 2573 unique values for the nominal weight and 8790 unique values for the absolute value weight after $N=200{,}000$ simulations.

The superior performance of methods such as the bootstrap-t depends on the quantity $\frac{\hat{\theta}-\theta}{\mathrm{se}}$ being approximately pivotal, that is, having approximately the same distribution for all parameter values, possibly after applying a transformation. Judging from the plots in Fig. 2, there is no such transformation.

Funding

Open access funding provided by Norwegian Business School.

Appendix

Agreement Versus Disagreement

Agreement weighting functions are frequently standardized to guarantee that $w(x_{1},x_{2})\ge 0$, e.g., $w(x_{1},x_{2})=1-|x_{1}-x_{2}|/\max(|x_{1}-x_{2}|)$ for the absolute value weights. Standardization is not necessary: it does not change the values of $\kappa_{d}$ and $\pi_{d}$ when it is possible (i.e., when $\max(|x_{1}-x_{2}|)<\infty$), and it is not defined otherwise. We choose not to use this operation, as it does not change the value of the agreement coefficients in this paper and is impossible to do when the range of $x_{1},x_{2}$ is unbounded.
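The invariance can be made explicit with a short calculation. Write $D$ for the expected observed disagreement and $C$ for the expected chance disagreement under the unstandardized disagreement $d$, and let $m=\max(|x_{1}-x_{2}|)<\infty$; these symbols are shorthand introduced here for illustration and assume a coefficient of the form $1-D/C$. With agreement weights $w=1-d/m$, the observed agreement is $1-D/m$ and the chance agreement is $1-C/m$, so the usual chance-corrected ratio becomes

$$\begin{aligned}
\frac{(1-D/m)-(1-C/m)}{1-(1-C/m)}=\frac{(C-D)/m}{C/m}=1-\frac{D}{C}.
\end{aligned}$$

The constant $m$ cancels, so the standardized agreement form and the unstandardized disagreement form give the same coefficient.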

Proof of Equivalence Between $V(d_{p})(\boldsymbol{x}_{1},\boldsymbol{x}_{2})$ and $\|\boldsymbol{x}_{1}-\boldsymbol{x}_{2}\|$

Proof

We will show that

$$\begin{aligned} V(d_{p})[\boldsymbol{x}_{1},\boldsymbol{x}_{2}]=\frac{1}{2}\|\boldsymbol{x}_{1}-\boldsymbol{x}_{2}\|_{p},\quad V(d_{p}^{p})[\boldsymbol{x}_{1},\boldsymbol{x}_{2}]=\frac{1}{2^{p}}\|\boldsymbol{x}_{1}-\boldsymbol{x}_{2}\|_{p}^{p}. \end{aligned}$$

First, consider the case when $p\ge 1$. Using translation invariance and homogeneity of the norm,

$$\begin{aligned}
&\|x_{1}-\mu\|_{p}+\|x_{2}-\mu\|_{p}\\
&\quad=\left\|x_{1}-\frac{x_{1}+x_{2}}{2}-\mu+\frac{x_{1}+x_{2}}{2}\right\|_{p}+\left\|x_{2}-\frac{x_{1}+x_{2}}{2}-\mu+\frac{x_{1}+x_{2}}{2}\right\|_{p}\\
&\quad=\left\|\frac{x_{1}-x_{2}}{2}-\nu\right\|_{p}+\left\|-\frac{x_{1}-x_{2}}{2}-\nu\right\|_{p}\\
&\quad=\|a-\nu\|_{p}+\|a+\nu\|_{p},
\end{aligned}$$

where $a=\frac{x_{1}-x_{2}}{2}$ and $\nu=\mu-\frac{x_{1}+x_{2}}{2}$.

Observe that

$$\begin{aligned} \operatorname{argmin}_{\nu}\bigl(\|a+\nu\|_{p}+\|a-\nu\|_{p}\bigr)=0,\quad \text{for all }a \end{aligned}$$

implies $\mu=\frac{x_{1}+x_{2}}{2}$.

By the Minkowski inequality,

$$\begin{aligned} 2^{p}\|a\|_{p}^{p}=\|a+\nu+a-\nu\|_{p}^{p}\le (\|a-\nu\|_{p}+\|a+\nu\|_{p})^{p}. \end{aligned}$$

This is an equality if $\|a-\nu\|_{p}=\|a+\nu\|_{p}=\|a\|_{p}$, i.e., when $\nu=0$, as the right-hand side then equals $(\|a-\nu\|_{p}+\|a+\nu\|_{p})^{p}=2^{p}\|a\|_{p}^{p}$. Now it is easy to verify that $V(d_{p})$ and $V(d_{p}^{p})$ have the claimed form; just substitute the value $\mu=\frac{x_{1}+x_{2}}{2}$ into the formula for the Fréchet variance, $\frac{1}{2}(\|x_{1}-\mu\|_{p}+\|x_{2}-\mu\|_{p})$.

When $0<p<1$, the function $\mu\mapsto\|x_{1}-\mu\|_{p}+\|x_{2}-\mu\|_{p}$ is piecewise concave on $(-\infty,x_{1}]$, $[x_{1},x_{2}]$, and $[x_{2},\infty)$; hence, its minimum is attained at $x_{1}$, at $x_{2}$, or at both. It is clear that both $x_{1}$ and $x_{2}$ map to $\|x_{1}-x_{2}\|_{p}$; hence, both are Fréchet means. The case $p=0$ is obvious and omitted. $\square$
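The two closed forms for $p\ge 1$ can be checked numerically. The sketch below evaluates the two-point objective at the midpoint, compares it with $\frac{1}{2}\|x_{1}-x_{2}\|_{p}$ and $\frac{1}{2^{p}}\|x_{1}-x_{2}\|_{p}^{p}$, and confirms that randomly perturbed centres never give a smaller value. The helper names, the perturbation range, and the test vectors are assumptions made for illustration; a random search supports the identity but does not replace the proof.

```python
import random


def p_norm(v, p):
    """The l_p norm of a vector given as a list of floats."""
    return sum(abs(c) ** p for c in v) ** (1 / p)


def objective(mu, x1, x2, p, power):
    """Average of ||x_i - mu||_p ** power over the two points."""
    d1 = p_norm([a - m for a, m in zip(x1, mu)], p) ** power
    d2 = p_norm([a - m for a, m in zip(x2, mu)], p) ** power
    return (d1 + d2) / 2


def check_midpoint(x1, x2, p, n_trials=2000, seed=0):
    """Return midpoint values, closed forms, and a no-improvement flag."""
    rng = random.Random(seed)
    mid = [(a + b) / 2 for a, b in zip(x1, x2)]
    dist = p_norm([a - b for a, b in zip(x1, x2)], p)
    at_mid_plain = objective(mid, x1, x2, p, 1)
    at_mid_power = objective(mid, x1, x2, p, p)
    no_better = True
    for _ in range(n_trials):
        mu = [m + rng.uniform(-3, 3) for m in mid]
        # Small tolerance guards against floating-point rounding.
        if objective(mu, x1, x2, p, 1) < at_mid_plain - 1e-12:
            no_better = False
        if objective(mu, x1, x2, p, p) < at_mid_power - 1e-12:
            no_better = False
    return at_mid_plain, dist / 2, at_mid_power, dist ** p / 2 ** p, no_better


if __name__ == "__main__":
    for p in (1, 2, 3):
        print(p, check_midpoint([0.0, 1.0, -2.0], [3.0, -1.0, 4.0], p))
```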

True Values in the Normal Simulation

We give a brief explanation of why the true values of $\kappa _{d}$ and $\pi _{d}$ are $0.8$ for the quadratic weights and $1-\sqrt{0.2}$ for the absolute value weights.

First notice that, since the marginals of $X_{r_{1}}$ and $X_{r_{2}}$ are equal for all $r_{1},r_{2}$, we have that $\kappa _{d}=\pi _{d}$. Moreover, we can ignore the number of raters, since the pairwise distributions do not depend on it. Then, from standard theory about the multivariate and folded normal, we find that

$$E(|X_{r_{1}}-X_{r_{2}}|)=2\sqrt{\frac{1-\rho }{\pi }},\quad E(|X_{r_{1}}-X_{r_{2}}|^{2})=2(1-\rho ).$$

Let $X'_{r_{1}}$ be a copy of $X_{r_{1}}$ that is independent of $X_{r_{2}}$. Then $E(|X'_{r_{1}}-X_{r_{2}}|)=2/\sqrt{\pi }$ and $E(|X'_{r_{1}}-X_{r_{2}}|^{2})=2$. Now rewrite the kappas using disagreement instead of agreement.
Use the fact that $(p_{wa}-p_{fa})/(1-p_{fa})=1-d_{wa}/d_{fa}$, where $d_{wa}=1-E(w(X_{r_{1}},X_{r_{2}}))$ and $d_{fa}=1-E(w(X'_{r_{1}},X_{r_{2}}))$, with $X'_{r_{1}}$ being a copy of $X_{r_{1}}$ that is independent of $X_{r_{2}}$.

Thus, $\kappa _{d}=\pi _{d}=1-E(|X_{r_{1}}-X_{r_{2}}|)/E(|X'_{r_{1}}-X_{r_{2}}|)=1-\sqrt{1-\rho }$ for the absolute value weights and $\kappa _{d}=\pi _{d}=1-E(|X_{r_{1}}-X_{r_{2}}|^{2})/E(|X'_{r_{1}}-X_{r_{2}}|^{2})=\rho $ for the quadratic weights. These equal $1-\sqrt{0.2}$ and $0.8$, respectively, when $\rho =0.8$.
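
The derivation can be checked numerically. The following is a minimal sketch, assuming standard normal marginals with pairwise correlation $\rho$ (as implied by the moments quoted in this section); the function names `true_kappas` and `simulated_kappas`, the sample size, and the seed are illustrative choices, not taken from the paper's simulation code.

```python
import numpy as np

def true_kappas(rho):
    """Closed-form (quadratic, absolute) values from the moments given in this section."""
    d_quad, c_quad = 2.0 * (1.0 - rho), 2.0       # E|X1 - X2|^2 and E|X1' - X2|^2
    d_abs = 2.0 * np.sqrt((1.0 - rho) / np.pi)    # E|X1 - X2|
    c_abs = 2.0 / np.sqrt(np.pi)                  # E|X1' - X2|
    return 1.0 - d_quad / c_quad, 1.0 - d_abs / c_abs

def simulated_kappas(rho, n=200_000, seed=0):
    """Monte Carlo approximation of the same two quantities."""
    rng = np.random.default_rng(seed)
    cov = np.array([[1.0, rho], [rho, 1.0]])
    x = rng.multivariate_normal(np.zeros(2), cov, size=n)
    x1_indep = rng.standard_normal(n)             # independent copy of the first rating
    diff = x[:, 0] - x[:, 1]                      # observed disagreement
    diff_chance = x1_indep - x[:, 1]              # chance disagreement
    kappa_quad = 1.0 - np.mean(diff**2) / np.mean(diff_chance**2)
    kappa_abs = 1.0 - np.mean(np.abs(diff)) / np.mean(np.abs(diff_chance))
    return kappa_quad, kappa_abs
```

Calling `true_kappas(0.8)` returns approximately $0.8$ and $1-\sqrt{0.2}\approx 0.553$, and `simulated_kappas(0.8)` should land close to both values up to Monte Carlo error.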

Variance of U-Statistics

Let $U_{n}^{1}$ and $U_{n}^{2}$ be two U-statistics of $n$ observations with symmetric kernels $\psi _{1}$, $\psi _{2}$ of dimension $k_{1}$ and $k_{2}$. Define

$$\sigma _{cc}^{2} = \operatorname{Cov}(E[\psi _{1}(X_{1},\ldots ,X_{k_{1}})\mid X_{1},\ldots ,X_{c}],E[\psi _{2}(X_{1},\ldots ,X_{k_{2}})\mid X_{1},\ldots ,X_{c}]). \tag{6.1}$$

Proposition 2

The exact covariance of $U_{n}^{1}$ and $U_{n}^{2}$ is

$$\operatorname{Cov}(U_{n}^{1},U_{n}^{2})=\binom{n}{k_{1}}^{-1}\sum _{c=1}^{k_{1}}\binom{k_{2}}{c}\binom{n-k_{2}}{k_{1}-c}\sigma _{cc}^{2}.$$

If $k_{1}$ and $k_{2}$ are fixed, the asymptotic covariance satisfies $n\operatorname{Cov}(U_{n}^{1},U_{n}^{2})\rightarrow k_{1}k_{2}\sigma _{11}^{2}$ as $n\rightarrow \infty $, where $\sigma _{11}^{2}$ is (6.1) with $c=1$.

Proof

See Lee (2019, Theorem 2, p. 17) and Lee (2019, Theorem 2, p. 76). $\square$
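
The exact covariance formula is easy to evaluate once the $\sigma _{cc}^{2}$ terms are known. The sketch below is illustrative only: the function name `ustat_covariance` and the list layout of its `sigma2` argument are assumptions, and it presumes $k_{1}\le k_{2}$.

```python
from math import comb

def ustat_covariance(n, k1, k2, sigma2):
    """Exact covariance of two U-statistics.

    sigma2[c - 1] holds sigma_cc^2 for c = 1, ..., k1 (assumed layout).
    """
    total = sum(
        comb(k2, c) * comb(n - k2, k1 - c) * sigma2[c - 1]
        for c in range(1, k1 + 1)
    )
    return total / comb(n, k1)

# Example: the unbiased sample variance has kernel (x1 - x2)^2 / 2 with k1 = k2 = 2.
# For standard normal data, sigma_11^2 = 0.5 and sigma_22^2 = 2.
variance_of_sample_variance = ustat_covariance(10, 2, 2, [0.5, 2.0])
```

For this example the formula reduces to $2/(n-1)$, the familiar variance of the sample variance under normality, so $n$ times the covariance tends to 2 as $n$ grows.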

Expanding the Definitions

Here is a sketch of how we could expand the definitions in Sect. 2 to encompass more complicated scenarios. We restrict ourselves to $g=2$, but the analysis can be expanded to arbitrary $g$. Suppose that any finite number of raters $R$ is possible, the raters are not exchangeable, and that not every item is rated by every rater.

Let $X$ denote a rating, $R$ be the raters, and $I$ be the items rated. Suppose we sample pairs $(X_{1},R_{1},I_{1}),(X_{2},R_{2},I_{2})$ independently from the same distribution $F$. Then we may define

$$\begin{aligned} D_{d}&= E[d(X_{1},X_{2})\mid I_{1}=I_{2},R_{1}\ne R_{2}],\\ C_{d}&= E[d(X_{1},X_{2})\mid R_{1}\ne R_{2}],\\ F_{d}&= E[d(X_{1},X_{2})]. \end{aligned} \tag{6.2}$$

These quantities have natural sample analogues; e.g.,

$$\hat{D}_{d}=N^{-1}\sum _{i=1}^{n}\sum _{r_{1}\ne r_{2}}d(x_{ir_{1}},x_{ir_{2}}),$$

where $N$ is the total number of paired observations and the rater indices run over the raters who rated the $i$th item. Population and sample definitions of Cohen’s kappa and Fleiss’ kappa follow as laid out in the main text, e.g., $\kappa _{d}=1-D_{d}/C_{d}$.
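
The sample analogue $\hat{D}_{d}$ can be computed directly even when items are rated by different subsets of raters. The following is a minimal sketch under assumed conventions: each item is a dictionary from a rater label to a rating, and the function name `pairwise_disagreement`, the rater labels, and the example data are hypothetical.

```python
from itertools import permutations

def pairwise_disagreement(ratings, d):
    """Return (D_hat, N) for a list of items, each a dict rater -> rating.

    Assumes at least one item has two or more raters, so that N > 0.
    """
    total, n_pairs = 0.0, 0
    for item in ratings:
        for r1, r2 in permutations(item, 2):  # ordered rater pairs with r1 != r2
            total += d(item[r1], item[r2])
            n_pairs += 1
    return total / n_pairs, n_pairs

def nominal(x, y):
    """Nominal disagreement: 1 if the ratings differ, 0 otherwise."""
    return float(x != y)

# Three items, not all rated by the same raters (illustrative data).
ratings = [{"a": 1, "b": 1, "c": 2}, {"a": 3, "b": 3}, {"b": 2, "c": 4}]
D_hat, N = pairwise_disagreement(ratings, nominal)
```

Here the first item contributes six ordered pairs and the other two contribute two each, so $N=10$; six of those pairs disagree, giving $\hat{D}_{d}=0.6$.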

Table 8 Coverage (first entry) and lengths (second entry) of confidence intervals: Perreault–Leigh model, Fleiss’s kappa.

Table 9 Coverage (first entry) and lengths (second entry) of confidence intervals: Normal model, Fleiss’s kappa.

Table 10 Coverage (first entry) and lengths (second entry) of confidence intervals: Perreault–Leigh model, Fleiss’ kappa ($R=5$).

Krippendorff’s Alpha

Now suppose that the ratings can take on only a finite number $C$ of distinct values. Define $o_{ck}$ as the number of times a pair of raters has classified an item into $c$ and $k$, i.e.,

$$o_{ck}=\sum _{i=1}^{n}\sum _{r_{1}\ne r_{2}}1[x_{ir_{1}}=c,x_{ir_{2}}=k].$$

Then $N=\sum _{c,k}o_{ck}$ and $\hat{D}_{d}=N^{-1}\sum _{c,k}o_{ck}d(c,k)$. Moreover, define $n_{c}$ as the number of items classified as $c$. Then $n_{c}=\sum _{k}o_{ck}$, $\sum _{c}n_{c}=N$, and $\sum _{c,k}n_{c}n_{k}d(c,k)=N^{2}\hat{F}_{d}$.

Proposition 3

Using the above definitions, $\hat{\alpha }_{d}=\hat{\pi }_{d}+\frac{1}{N}(1-\hat{\pi }_{d})$. Since there are $N=2Rn$ rating pairs in the rectangular setup used in Sect. 2, $\hat{\alpha }_{d}=\hat{\pi }_{d}+\frac{1}{2Rn}(1-\hat{\pi }_{d})$ in that case.

Proof

The definition of $\hat{\alpha }_{d}$ can be found in Krippendorff (2018, p. 235),

$$\hat{\alpha }_{d}=1-(N-1)\frac{\sum _{c\ne k}o_{ck}d(c,k)}{\sum _{c\ne k}n_{c}n_{k}d(c,k)}.$$

From the above definitions, and the fact that $d(c,k)=0$ when $c=k$, we find that

$$\sum _{c\ne k}o_{ck}d(c,k)=\sum _{c,k}o_{ck}d(c,k)=N\hat{D}_{d}.$$

In the same way,

$$\sum _{c\ne k}n_{c}n_{k}d(c,k)=\sum _{c,k}n_{c}n_{k}d(c,k)=N^{2}\hat{F}_{d}.$$

Thus,

$$\hat{\alpha }_{d}=1-\frac{(N-1)}{N}\frac{\hat{D}_{d}}{\hat{F}_{d}}=1-\frac{\hat{D}_{d}}{\hat{F}_{d}}+\frac{1}{N}\frac{\hat{D}_{d}}{\hat{F}_{d}},$$

and using that $\hat{\pi }_{d}=1-\frac{\hat{D}_{d}}{\hat{F}_{d}}$, we are done. $\square$
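
The identity in Proposition 3 can be confirmed numerically. The sketch below is a hedged illustration: it assumes a rectangular array of integer category codes, builds $o_{ck}$ exactly as defined above (unnormalized ordered rater pairs), and uses the nominal disagreement; the function names and example data are hypothetical.

```python
import numpy as np

def coincidence_matrix(ratings, n_categories):
    """o[c, k]: number of ordered rater pairs (r1 != r2) rating an item as c and k."""
    o = np.zeros((n_categories, n_categories))
    for row in ratings:
        for r1, c in enumerate(row):
            for r2, k in enumerate(row):
                if r1 != r2:
                    o[c, k] += 1
    return o

def alpha_and_pi(o, d):
    """Return (alpha_hat, pi_hat, N) from the coincidence matrix o and disagreement d."""
    N = o.sum()
    n_c = o.sum(axis=1)                               # row totals n_c
    observed = (o * d).sum()                          # equals N * D_hat
    expected = (np.outer(n_c, n_c) * d).sum()         # equals N^2 * F_hat
    alpha = 1.0 - (N - 1.0) * observed / expected
    pi = 1.0 - (observed / N) / (expected / N**2)
    return alpha, pi, N

# Four items rated by three raters into categories 0, 1, 2 (illustrative data).
ratings = np.array([[0, 0, 1], [1, 1, 1], [2, 1, 2], [0, 2, 2]])
d = 1.0 - np.eye(3)                                   # nominal disagreement, zero on the diagonal
alpha, pi, N = alpha_and_pi(coincidence_matrix(ratings, 3), d)
```

With these inputs, `alpha` agrees with `pi + (1 - pi) / N` up to floating-point error, as the proposition states.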

Proof of Correspondence with Gwet (2021)

Using the nominal disagreement function, Gwet (2021) uses the following estimator for the asymptotic variance of the pairwise Fleiss’ kappa:

$$\hat{\sigma }^{2}=\frac{1}{n-1}\sum _{i=1}^{n}(\kappa _{i}^{\star }-\hat{\kappa })^{2}.$$

Translating into our notation (dropping the dependence on the disagreement $d$), we have that $\hat{\kappa }=1-\hat{D}/\hat{F}$. Moreover, one can verify that $\kappa _{i}^{\star }$ equals

$$\kappa _{i}^{\star }=1-\frac{\hat{\mu }(x_{i})}{\hat{F}}-2\frac{\hat{D}}{\hat{F}}\left( 1-\frac{\hat{\mu }_{F}(x_{i})}{\hat{F}}\right) ,$$

where $\hat{\mu }(x_{i})$ and $\hat{\mu }_{F}(x_{i})$ were defined in Sect. 4.

Following a small reorganization of the terms, we find that

$$\frac{1}{n-1}\sum _{i=1}^{n}(\kappa _{i}^{\star }-\hat{\kappa })^{2}=\frac{1}{\hat{F}^{2}}\frac{1}{n-1}\sum _{i=1}^{n}\left( 2\frac{\hat{D}}{\hat{F}}\left[ \hat{\mu }_{F}(x_{i})-\hat{F}\right] -\left[ \hat{\mu }(x_{i})-\hat{D}\right] \right) ^{2}.$$

Using the definitions of $\hat{\sigma }_{D}^{2}$, $\hat{\sigma }_{FD}$, and $\hat{\sigma }_{F}^{2}$ (cf. Sect. 4.2), one can verify using simple algebraic manipulations that

$$\frac{1}{n-1}\sum _{i=1}^{n}\left( \kappa _{i}^{\star }-\hat{\kappa }\right) ^{2}=\frac{1}{\hat{F}^{2}}\left( \hat{\sigma }_{D}^{2}-2\hat{\sigma }_{FD}\frac{\hat{D}}{\hat{F}}+\hat{\sigma }_{F}^{2}\frac{\hat{D}^{2}}{\hat{F}^{2}}\right) ;$$

hence, the estimator of Gwet (2021) is a special case of the proposed estimator in Sect. 4.2.

Simulation of Fleiss’s Kappa

Here are the results of the simulation study in Sect. 5.1 for Fleiss’s kappa (Tables 8, 9, 10).

Footnotes

Supplementary Information The online version contains supplementary material available at https://doi.org/10.1007/s11336-023-09945-2.

1 For instance, Fleiss (1971), in his paper introducing Fleiss’ kappa, removed several ratings from this data to make sure the total number of ratings was 6 for each item.

2 Note that the concordance correlation coefficient is an intraclass correlation coefficient; see Carrasco and Jover (2003, p. 850).

3 The Schuster–Smith coefficient also encompasses the case of $2<g<R$ provided their weight function $v(s)$ is appropriately defined; see the discussion on dispersion weights in Schuster and Smith (2005).

4 The Fréchet mean and variances are usually defined slightly differently, using $l^{2}(x,x_{k})$ instead of $l(x,x_{k})$, with $l$ being a metric. Our definition of the Fréchet mean is sometimes called the generalized Fréchet mean or the $\alpha $-Fréchet mean.

5 We use fewer simulations (1,000) than in the previous simulation (10,000) since estimation is far more computationally expensive when dealing with continuous data, as it does not allow for binning.

Publisher's Note Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

References

Berry, K. J., Johnston, J. E., & Mielke, P. W., Jr. (2008). Weighted kappa for multiple raters. Perceptual and Motor Skills, 107(3), 837–848.
Berry, K. J., & Mielke, P. W. (1988). A generalization of Cohen’s kappa agreement measure to interval measurement and multiple raters. Educational and Psychological Measurement, 48(4), 921–933.
Carrasco, J. L., & Jover, L. (2003). Estimating the generalized concordance correlation coefficient through variance components. Biometrics, 59(4), 849–858.
Cicchetti, D. V., & Feinstein, A. R. (1990). High agreement but low kappa: II. Resolving the paradoxes. Journal of Clinical Epidemiology, 43(6), 551–558.
Cohen, J. (1960). A coefficient of agreement for nominal scales. Educational and Psychological Measurement, 20(1), 37–46.
Cohen, J. (1968). Weighted kappa: Nominal scale agreement with provision for scaled disagreement or partial credit. Psychological Bulletin, 70(4), 213–220.
Cohen, M. B., Lee, Y. T., Miller, G., Pachocki, J., & Sidford, A. (2016). Geometric median in nearly linear time. In Proceedings of the forty-eighth annual ACM symposium on theory of computing (pp. 9–21). Association for Computing Machinery.
Conger, A. J. (1980). Integration and generalization of kappas for multiple raters. Psychological Bulletin, 88(2), 322–328.
Cooil, B., & Rust, R. T. (1994). Reliability and expected loss: A unifying principle. Psychometrika, 59(2), 203–216.
Drezner, Z., Klamroth, K., Schöbel, A., & Wesolowsky, G. O. (2002). The Weber problem. In Z. Drezner & H. Horst (Eds.), Facility location: Applications and theory (pp. 1–36). Springer.
Dubey, P., & Müller, H. G. (2019). Fréchet analysis of variance for random objects. Biometrika, 106(4), 803–821.
Efron, B. (1987). Better bootstrap confidence intervals. Journal of the American Statistical Association, 82(397), 171–185.
Fisher, R. A. (1915). Frequency distribution of the values of the correlation coefficient in samples from an indefinitely large population. Biometrika, 10(4), 507–521.
Fleiss, J. L. (1971). Measuring nominal scale agreement among many raters. Psychological Bulletin, 76(5), 378–382.
Fréchet, M. (1948). Les éléments aléatoires de nature quelconque dans un espace distancié. Annales de l’institut Henri Poincaré, 10(4), 215–230.
Gwet, K. L. (2008). Computing inter-rater reliability and its variance in the presence of high agreement. The British Journal of Mathematical and Statistical Psychology, 61, 29–48.
Gwet, K. L. (2014). Handbook of inter-rater reliability. Advanced Analytics, LLC.
Gwet, K. L. (2021). Large-sample variance of Fleiss generalized kappa. Educational and Psychological Measurement.
Hoeffding, W. (1992). A class of statistics with asymptotically normal distribution. In S. Kotz & N. L. Johnson (Eds.), Breakthroughs in statistics: Foundations and basic theory (pp. 308–334). Springer.
Huber, P. J. (1964). Robust estimation of a location parameter. Annals of Mathematical Statistics, 35(1), 73–101.
Hubert, L. (1977). Kappa revisited. Psychological Bulletin, 84(2), 289–297.
Janson, H., & Olsson, U. (2001). A measure of agreement for interval or nominal multivariate observations. Educational and Psychological Measurement, 61(2), 277–289.
Korolyuk, V. S., & Borovskich, Y. V. (2013). Theory of U-statistics (1994th ed.). Springer.
Krippendorff, K. (1970). Bivariate agreement coefficients for reliability of data. Sociological Methodology, 2, 139–150.
Krippendorff, K. (2018). Content analysis: An introduction to its methodology.
Lee, A. J. (2019). U-statistics: Theory and practice. Routledge.
Lehmann, E. L. (2004). Elements of large-sample theory. Springer.Google Scholar
Light, R. J.. (1971). Measures of response agreement for qualitative data: Some generalizations and alternatives. Psychological Bulletin, 76(5), 365377.CrossRefGoogle Scholar
Lin, L. I.. (1989). A concordance correlation coefficient to evaluate reproducibility. Biometrics, 45(1), 255268.CrossRefGoogle ScholarPubMed
Lin, L. I. (1992). Assay validation using the concordance correlation coefficient. Biometrics, 48(2), 599–604.CrossRefGoogle Scholar
Martín Andrés, A., Álvarez Hernández, M.. (2020). Hubert’s multi-rater kappa revisited. The British Journal of Mathematical and Statistical Psychology, 73(1), 122.CrossRefGoogle ScholarPubMed
Maxwell, A. E.. (1977). Coefficients of agreement between observers and their interpretation. The British Journal of Psychiatry, 130, 7983.CrossRefGoogle ScholarPubMed
Moss, J.. (2023). Measuring agreement using guessing models and knowledge coefficients. Psychometrika,.CrossRefGoogle ScholarPubMed
O’Connell, D. L., & Dobson, A. J. (1984). General observer-agreement measures on individual subjects and groups of subjects. Biometrics, 40(4), 973–983.
Perreault, W. D., & Leigh, L. E. (1989). Reliability of nominal data based on qualitative judgments. Journal of Marketing Research, 26(2), 135–148.
Sandifer, M. G., Hordern, A., Timbury, G. C., & Green, L. M. (1968). Psychiatric diagnosis: A comparative study in North Carolina, London and Glasgow. The British Journal of Psychiatry, 114(506), 1–9.
Schouten, H. J. A. (1980). Measuring pairwise agreement among many observers. Biometrical Journal, 22(6), 497–504.
Schouten, H. J. A. (1982). Measuring pairwise agreement among many observers. II. Some improvements and additions. Biometrical Journal, 24(5), 431–435.
Schuster, C., & Smith, D. A. (2005). Dispersion-weighted kappa: An integrative framework for metric and nominal scale agreement coefficients. Psychometrika.
Scott, W. A. (1955). Reliability of content analysis: The case of nominal scale coding. Public Opinion Quarterly, 19(3), 321–325.
Serfling, R. J. (1980). Approximation theorems of mathematical statistics. Wiley.
van Oest, R. (2019). A new coefficient of interrater agreement: The challenge of highly unequal category proportions. Psychological Methods, 24(4), 439–451.
Varian, H. R. (1975). A Bayesian approach to real estate assessment. In: A. Z. Stephen & E. Fienberg (Eds.), Studies in Bayesian econometric and statistics in honor of Leonard J. Savage (pp. 195–208). North Holland.
Warrens, M. J. (2012). Equivalences of weighted kappas for multiple raters. Statistical Methodology, 9(3), 407–422.
Warton, D. I., & Hui, F. K. C. (2011). The arcsine is asinine: The analysis of proportions in ecology. Ecology, 92(1), 3–10.
Zapf, A., Castell, S., Morawietz, L., & Karch, A. (2016). Measuring inter-rater reliability for nominal data—Which coefficients and confidence intervals are appropriate? BMC Medical Research Methodology, 16, 93.
Table 1 Weighted agreement coefficients.

Table 2 Maximal agreement for the data of Fleiss (1971).

Figure 1 Simulated sampling distribution of $\hat{\kappa}_{d}$ for quadratic weights using three transformations, $n=20, R=3$. The simulation setup is described in Example 3. The arcsine transform makes the sampling distribution closest to the normal distribution.

Table 3 Confidence intervals for the data of Fleiss (1971) using the arcsine method.

Table 4 Confidence intervals for Zapf et al. (2016) using the arcsine method.

Table 5 Coverage (first entry) and lengths (second entry) of confidence intervals: Perreault–Leigh model, Cohen’s kappa.

Table 6 Coverage (first entry) and lengths (second entry) of confidence intervals: normal model, Cohen’s kappa.

Table 7 Coverage (first entry) and lengths (second entry) of confidence intervals for g-wise coefficients: Perreault–Leigh model, Cohen’s kappa.

Figure 2 Sample distribution of $\hat{\kappa}_{d}$ for nominal (left) and absolute value (right) weights. Both plots omit a dominating spike at 1. Here $n=20$ and $j=5$, and we use the Perreault–Leigh model (same parameters as in Sect. 5.1) to simulate the data. There were 2573 unique values for the nominal weight and 8790 unique values for the absolute value weight after $N=200{,}000$ simulations.

Table 8 Coverage (first entry) and lengths (second entry) of confidence intervals: Perreault–Leigh model, Fleiss’s kappa.

Table 9 Coverage (first entry) and lengths (second entry) of confidence intervals: normal model, Fleiss’s kappa.

Table 10 Coverage (first entry) and lengths (second entry) of confidence intervals: Perreault–Leigh model, Fleiss’s kappa ($R=5$).

Supplementary material: File

Moss supplementary material
