Assessing the Reliability of Blind Wine Tasting: Differentiating Levels of Clinical and Statistical Meaningfulness*
Published online by Cambridge University Press: 08 June 2012
Abstract
The author distinguishes between the clinical and statistical meaning of varying levels of intertaster reliability for the 11 judges who evaluated 10 Chardonnays (6 American and 4 French) in the heralded 1976 Paris wine competition. Four wines showed weighted kappa (Kw) values below 0.40, a level considered poor by established biostatistical criteria. These ranged from 0.10 for the French Beaune Clos des Mouches 1973 Chardonnay to 0.33 for the U.S. Veedercrest 1972 Chardonnay. However, when the statistical significance of the Kw values was assessed, only the Clos des Mouches failed to reach significance at the .05 level. The other three wines (the U.S. Chateau Montelena 1973, with a Kw of 0.20; the U.S. David Bruce 1973 regular, with a Kw of 0.27; and the U.S. Veedercrest, with a Kw of 0.33) reached statistical significance at p values of <.05, <.001, and <.0001, respectively. These findings are not specific to weighted kappa; they show that when sample sizes are large enough, even the most trivial results will be statistically significant while often being devoid of practical or clinical meaningfulness. A level of Kw that is clinically meaningful will most likely be statistically significant, but high levels of statistical significance are no guarantee of clinical significance. Methods for resolving this “big N phenomenon” are presented and discussed. (JEL Classification: C12, C49)
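The "big N phenomenon" described in the abstract can be illustrated with a minimal sketch. This is not the author's procedure for the 11-judge Paris data: it assumes a simpler two-rater setting, linear agreement weights, and the large-sample null variance of weighted kappa usually attributed to Fleiss, Cohen, and Everitt. The 3 x 3 table of counts and the function name `weighted_kappa_test` are hypothetical, chosen only to show that multiplying every cell by the same factor leaves Kw unchanged while the p value shrinks.

```python
from math import sqrt, erfc


def weighted_kappa_test(table):
    """Return (Kw, z, two-sided p) for a square table of two-rater counts.

    Assumptions (not taken from the article): linear agreement weights and
    the large-sample variance of Kw under the null hypothesis Kw = 0.
    """
    k = len(table)
    n = sum(sum(row) for row in table)
    p = [[cell / n for cell in row] for row in table]
    row_m = [sum(p[i]) for i in range(k)]                        # rater A marginals
    col_m = [sum(p[i][j] for i in range(k)) for j in range(k)]   # rater B marginals

    # Linear agreement weights: 1 on the diagonal, 0 at maximum disagreement.
    w = [[1 - abs(i - j) / (k - 1) for j in range(k)] for i in range(k)]

    cells = [(i, j) for i in range(k) for j in range(k)]
    p_obs = sum(w[i][j] * p[i][j] for i, j in cells)
    p_exp = sum(w[i][j] * row_m[i] * col_m[j] for i, j in cells)
    kw = (p_obs - p_exp) / (1 - p_exp)

    # Marginal-weighted mean weights used in the null variance.
    w_row = [sum(col_m[j] * w[i][j] for j in range(k)) for i in range(k)]
    w_col = [sum(row_m[i] * w[i][j] for i in range(k)) for j in range(k)]
    spread = sum(
        row_m[i] * col_m[j] * (w[i][j] - (w_row[i] + w_col[j])) ** 2
        for i, j in cells
    )
    var0 = (spread - p_exp ** 2) / (n * (1 - p_exp) ** 2)

    z = kw / sqrt(var0)
    p_value = erfc(abs(z) / sqrt(2))  # two-sided normal tail probability
    return kw, z, p_value


# Hypothetical ratings on a 3-point scale: 30 paired judgments.
small = [[5, 3, 2],
         [3, 4, 3],
         [2, 3, 5]]
# Same proportions, 100 times as many judgments.
big = [[cell * 100 for cell in row] for row in small]

kw_small, z_small, p_small = weighted_kappa_test(small)
kw_big, z_big, p_big = weighted_kappa_test(big)

print(f"N=30:   Kw={kw_small:.2f}  z={z_small:.2f}  p={p_small:.3g}")
print(f"N=3000: Kw={kw_big:.2f}  z={z_big:.2f}  p={p_big:.3g}")
```

In this hypothetical table Kw is 0.25 at both sample sizes, which the criteria cited in the abstract would class as poor, yet it fails the .05 threshold with 30 judgments and passes the .0001 threshold with 3000. The z statistic grows with the square root of N, so statistical significance alone says nothing about whether the agreement is practically meaningful.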
Type: Article
Copyright © American Association of Wine Economists 2007