I. Introduction
The ratings that critics and judges assign to wines in newsletters, blogs, magazines, and in local to international competitions affect consumers’ decisions and the economics of the wine industry. While many of those ratings are assigned by tasters who are “blind” (to price, label, capsule, and closure), several sources of potential error and bias remain. Even when a wine is assessed blind, the rating assigned may be influenced by factors that are not in the glass.
Section II of this short article is a review of the findings that blind wine ratings are uncertain and subject to stochastic error. Section III is a review of findings that anchoring, expectation, serial position, commercial, and non-taste non-smell sensory (sight, sound, touch) factors may induce cognitive and omitted-variable biases in blind ratings. Section IV is a review of findings that differences in physical preparation (decanting, filtering, aeration, temperature) can also be omitted variables that affect ratings. Conclusions and implications follow in Section V.
II. Stochastic errors
The uncertainty surrounding the ratings that judges assign, even when assigned blind, is old news. Without specific reference to wine ratings, Saal, Downey, and Lahey (1980) reviewed the history of variability in judgment-related ratings dating from 1909. Focusing on wine ratings, Filipello (1955, 1956, 1957), Filipello and Berg (1958), Tish (2004), Hodgson (2008a, 2008b), Ashton (2012, 2013), and Bodington (2012, 2017) published results showing that there is variance in the rating that a judge assigns to a wine. Ough and Baker (1961), Goldberg (1991), Castriota, Curzi, and Delmastro (2012), Goode (2014), Shepherd (2018), Bodington (2020), and Glancy (2020) published results showing that variance in ratings can change from wine to wine (for the same judge) and from judge to judge (for the same wine).
None of the literature cited previously means that wine ratings are merely random, but it does show that such ratings are both uncertain and heteroscedastic. Those findings are not unique to wine. Kahneman, Sibony, and Sunstein (2021, pp. 80–86, 215–258) describe heteroscedasticity in other areas of judgment, including diagnoses by physicians, fingerprint identifications, and sentencings of criminals by judges.
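As a rough illustration of what heteroscedastic ratings look like in data, the following sketch computes the sample variance of each judge's scores on blind replicates of the same wine. The judge names and scores are hypothetical and are not taken from any study cited here; the point is only that the spread of replicate scores can differ from judge to judge.

```python
from statistics import variance

# Hypothetical data: each judge scored the same wine three times, blind.
# These numbers are invented for illustration only.
replicates = {
    "judge_a": [88, 89, 88],
    "judge_b": [82, 90, 86],
    "judge_c": [91, 84, 95],
}


def replicate_variances(scores_by_judge):
    """Return the sample variance of each judge's replicate scores."""
    return {judge: variance(scores) for judge, scores in scores_by_judge.items()}


variances = replicate_variances(replicates)
for judge, var in sorted(variances.items()):
    print(f"{judge}: sample variance = {var:.2f}")
```

If the variances were roughly equal across judges, a single common error variance might be a tolerable simplification; unequal variances, as in this invented example, are what the cited literature describes as heteroscedasticity.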
III. Cognitive and omitted-variable biases
Although judges and critics focus on the wine in the glass, evidence shows that blind ratings are affected by factors that are not in the glass. Some of those factors are indicated by literature that is cited next. Other factors described next are reported as anecdotal and thus hypotheses that remain to be tested.
A. Anchoring
Many score-based rating systems assign categories of quality or award to score thresholds and ranges. De Long (2006) describes ten well-known, score-based wine rating systems that have score ranges for different categories of quality. For example, the International Organization of Vine and Wine (OIV, 2021) prescribes scoring between 0 and 100 along with thresholds for Bronze, Silver, and Gold medals at scores of 80, 85, and 90, respectively. For 8,400 ratings according to the OIV system, Bodington and Malfeito-Ferreira (2017) showed spikes in the frequencies of scores assigned just below those thresholds.
Even without quality categories, evidence shows that critics and judges favor and avoid certain scores. For a competition that prescribed scores between 50 and 100, Bodington (2017) showed spikes in frequency at nearly every 5-point interval. Chaudhary and Siegel (2016) reported a sharp increase in scores of 90 and higher published in anonymous “major wine magazines.” Hunt (2013) found spikes in the frequencies of the scores assigned by Jancis Robinson, Robert Parker, and others. While some of the results reported by Chaudhary and Siegel (2016) and Hunt (2013) may be due to sample bias, the combination of research cited here shows that some judges appear to anchor scores about categorical or psychological thresholds. Scores that appear to be cardinal may be more accurately interpreted as ordinal.
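A simple way to look for the kind of frequency spikes described above is to tabulate how often each score occurs and what share of scores fall exactly on a multiple of five. The sketch below does that for a small, hypothetical list of scores on a 50 to 100 scale; the data, the function name `heaping_share`, and the choice of a 5-point step are illustrative assumptions, not the method of any study cited here.

```python
from collections import Counter

# Hypothetical scores on a 50-100 scale; invented for illustration only.
scores = [85, 90, 88, 90, 85, 87, 90, 80, 85, 92, 90, 85, 89, 95, 90, 86]


def heaping_share(values, step=5):
    """Return the share of scores that fall exactly on a multiple of `step`."""
    on_step = sum(1 for v in values if v % step == 0)
    return on_step / len(values)


frequencies = Counter(scores)  # how often each individual score was assigned
share = heaping_share(scores)  # share of scores on multiples of five

for score, count in sorted(frequencies.items()):
    print(f"{score}: {'#' * count}")
print(f"share on multiples of 5: {share:.2f}")
```

As a crude benchmark, about one integer score in five is a multiple of five, so a share far above 0.2 would hint at anchoring on round numbers. A formal test would need a defensible model of the score distribution absent anchoring, which this sketch does not attempt.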
B. Expectations
Much research shows that judges’ expectations affect the ratings that they assign. Ashton (2014) showed that judges assigned higher ratings to wines from New Jersey when told the wines were from California and lower ratings to wines from California when told the wines were from New Jersey.
Several aspects of expectations remain to be tested. The pre-printed forms provided to California State Fair (CSF) judges list the grape variety, vintage, alcohol by volume, and residual sugar of a wine next to spaces where the judge writes in a comment and then a rating [Footnote 1]. Whether or not such judgments should be represented as “blind” is open to debate, and that information may affect ratings. Furthermore, even when blind to all information about a wine, a judge's expectation of overall good quality based on the assumption that vintners enter their good and not so good wines in competitions may lead to a central tendency in ratings within whatever range of scores or categories indicates good quality.
C. Sequential position
In contrast to flights in which wines can be reassessed, sequential or taste-then-rate protocols are common. The Judgment of Paris, the CSF, and many publishing critics employ sequential protocols.
Serial position bias may occur in sequential tastings due to carryover, palate fatigue, rest breaks, meal breaks, and physiological and psychological factors [Footnote 2]. There are anecdotal reports from judges who say there is a temptation to assign a high rating to a dry and high-acid wine because it is refreshing in a sequence just after several off-dry and alcoholic wines. UC Davis’ class for potential wine judges warns of position bias due to the sequence of wines, breaks, and lunch [Footnote 3]. Filipello (1955, 1956, 1957) and Filipello and Berg (1958) conducted various tests using sequential protocols and found evidence of primacy bias. Mantonakis et al. (2009, p. 1311) found that “high knowledge” wine tasters are more prone than “low knowledge” wine tasters to primacy and recency bias. The sequence of wines tasted at the 1976 Judgment of Paris has never been disclosed, so what effect position bias may have had on the results remains unknown [Footnote 4].
Other forms of position bias are possible. There is anecdotal evidence that, in a taste-and-score sequential protocol, a judge may assign a rating to the first wine and then rate the remaining wines “around” that anchor. A lag structure may also exist in which a judge rates around some composite of the most recent wines. In addition to the effects of stochastic error discussed in Section II, that lag structure could cause a resulting set of scores to violate transitive axioms of equality and inequality.
D. Commercial
Accusations that money and favors affect critics’ writings and ratings are common in the wine-trade press.
Even when wines are assessed blind, some assert that commercial considerations affect judges’ ratings. Gregutt (2022) wrote that some critics inflate scores to get publicity for themselves. Gray (2013) reports that some competitions encourage judges to assign high scores and medals, and judges have told this author that competition officials asked them to assign more gold medals to increase current-entrant satisfaction and future submissions. Although those reports indicate a potential for commercial bias in some cases, to date, a documented analysis of such bias does not appear to have been published.
E. Other senses: Sight, sound and touch
Wine assessment involves more senses than smell and taste. Sight, sound, and touch may affect ratings too. Chaudhary and Siegel (2016) showed that red wines tend to get higher ratings than white wines. Seeing the color of wine alone may affect expectations and thus ratings. Spence, Velasco, and Knoeferle (2014) showed that the color of the light in the tasting room and the type of music played affected the ratings assigned by over 3,000 novice tasters. North (2012) and Wang and Spence (2017) showed that background music can affect novices’ and wine professionals’ wine descriptors, purchases, and ratings. Campo, Reinoso-Carvalho, and Rosato (2021) reviewed the literature concerning how tasting wine is an experience in which taste, smell, vision, sound, and touch interact. Regarding touch, the quality of glassware was shown to affect a taster's perceptions of wine quality.
IV. Physical preparation: Decanting, filtering, aeration, and temperature
The physical preparation of wine, not yet in a glass, can alter what is in the glass. While physical preparation may not affect the relative ratings assigned by judges on a panel tasting from the same bottle at about the same time, it may affect the ratings assigned by judges at different times and/or after different preparations.
Although it is obvious that decanting can remove sediment and critics including Rosenthal (2008) argue the merits of filtering, no trials appear to have yet examined their potential effects on ratings. Wollan, Pham, and Wilkinson (2016) showed that exposure due to active aeration, or merely time in an open glass, enables evaporation of ethanol and other volatiles, including hydrogen sulfide, that “significantly influence the perception of wine attributes.” Wollan, Pham, and Wilkinson did not examine the effects of aeration on short-term oxidation. Master of Wine Canterbury (2014) and Fox (2016) conducted informal blind trials with aerators and found no differences between aerated wines and wines poured into glasses and left to stand for a few minutes. Much is also written in the trade press about the best serving temperatures for various wines. Campo, Reinoso-Carvalho, and Rosato (2021) cite literature showing that temperature does affect mouthfeel and tasters’ perceptions of aromas.
V. Conclusion and implications
Even when wines are assessed blind (to price, label, capsule, and closure), published research shows that the ratings that critics and judges assign to wines may be influenced by noise and biases, as shown in Figure 1.
Functional forms that treat ratings as if they are deterministic, or uncertain but identically distributed, are misspecifications of the uncertain and heteroscedastic nature of ratings. The ratings that a judge assigns to a series of wines may not comply with the transitive axioms of equality and inequality. Analyses of scores as if they are cardinal may miss anchoring and cognitive biases that make them ordinal. Expectation, serial position, commercial, and sight-sound-touch sensory factors may induce cognitive and omitted-variable biases that cause a judge's rating to differ from a rating of only what is in the glass. In addition to stochastic error, ratings on blind replicates may be differentiated by carryover, palate fatigue, aeration, and temperature. Further, a central tendency in expectations and other factors may alter the null hypothesis that ought to be employed in tests of statistical significance.
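The distinction between identically distributed and heteroscedastic errors can be written out in a minimal form. The notation below is an illustrative assumption and does not appear in the literature cited here: $r_{ij}$ is the rating that judge $i$ assigns to wine $j$, $\mu_j$ is a systematic component for wine $j$, and $\varepsilon_{ij}$ is a stochastic error.

```latex
% Illustrative notation only; the symbols are assumptions, not the author's model.
\begin{align}
  r_{ij} &= \mu_j + \varepsilon_{ij}, \qquad \mathrm{E}[\varepsilon_{ij}] = 0, \\
  \text{identically distributed errors:} \quad \mathrm{Var}(\varepsilon_{ij}) &= \sigma^{2}
      \quad \text{for all } i, j, \\
  \text{heteroscedastic errors:} \quad \mathrm{Var}(\varepsilon_{ij}) &= \sigma_{ij}^{2},
      \quad \text{which may differ by judge } i \text{ and wine } j.
\end{align}
```

A deterministic form corresponds to $\sigma^{2} = 0$. The biases discussed in Sections III and IV would enter as additional terms that are not written here, which is why omitting them can bias estimates of $\mu_j$.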
While stochastic error and biases make an analysis of ratings difficult, ratings data are not merely random or impenetrable. For example, using 2019 CSF data from Bodington (2020), Figure 2 shows that the correlations between vectors of judges’ ratings on the same wines concentrate between 0.3 and 0.7.
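For readers who want to examine their own data in a similar way, the sketch below computes Pearson correlations between every pair of judges' rating vectors on a common set of wines. The ratings are hypothetical, not the CSF data, and the use of the Pearson coefficient is an assumption; the source does not state which correlation measure underlies Figure 2.

```python
from itertools import combinations
from math import sqrt


def pearson(x, y):
    """Pearson correlation between two equal-length lists of ratings."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    return cov / sqrt(var_x * var_y)


# Hypothetical ratings: each list holds one judge's scores on the same five wines.
ratings = {
    "judge_a": [90, 85, 88, 92, 80],
    "judge_b": [88, 86, 84, 91, 83],
    "judge_c": [85, 90, 87, 86, 82],
}

# One correlation per unordered pair of judges.
pairwise = {
    (a, b): pearson(ratings[a], ratings[b])
    for a, b in combinations(sorted(ratings), 2)
}

for (a, b), r in pairwise.items():
    print(f"{a} vs {b}: r = {r:.2f}")
```

Because scores may be better read as ordinal than cardinal, a rank-based measure could be substituted for `pearson` without changing the rest of the sketch.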
Acknowledgments
The author thanks an anonymous reviewer for insightful and constructive comments.