1 Introduction
Over 40 years ago, Kahneman and Tversky (1972) proposed that judgments of likelihood are sometimes based on representativeness, or the similarity of an event to some class of events. This representativeness heuristic, although often useful, can lead to departures from normative probability rules. Relatively little research has investigated whether manipulations of similarity affect likelihood judgments, as the representativeness heuristic would predict. In a classic set of studies, Bar-Hillel (1982) examined the relationship between similarity and likelihood judgments. She used stimuli in which similarity and likelihood did not coincide and found that judgments of likelihood were strongly related to judgments of similarity. Read and Grushka-Cockayne (2011) built on her work and showed that similarity can be used to make accurate judgments of likelihood.
Here we take a somewhat different route and investigate whether manipulating similarity has similar effects on both similarity and likelihood judgments. We use Tversky’s (1977) seminal contrast model of similarity to determine factors that affect similarity judgments, and test whether they also influence judgments of likelihood.
1.1 Contrast model of similarity
Tversky (1977) noted that many items, such as faces, countries, or personalities, are better described in terms of qualitative features than in terms of a small number of quantitative dimensions. Rather than as points on continuous dimensions, such items might be better described in terms of the presence or absence of specific features. Accordingly, he proposed a model in which objects are represented by sets of features, and similarity judgments depend on how the features match.
Features are usually discrete, often binary variables, but any dimension can also be represented as nested or overlapping sets of discrete features. The observed similarity of object a to object b, S(a,b), is a function of their common features, those that are shared by both a and b, and their distinctive features, those that belong to one but not to the other. The theory then expresses similarity as a linear combination, or a contrast, of the measures of their common and distinctive features:
S(a,b) = θf(A ∩ B) − αf(A − B) − βf(B − A),
where θ, α, β ≥ 0, and A and B denote the feature sets of objects a and b. This model allows for a variety of similarity relations over the same set of objects, depending on the values of the parameters θ, α, and β. If θ = 1 and α = β = 0, the similarity of the objects is entirely determined by their common features. If, on the other hand, θ = 0 and α = β = 1, the similarity of the objects is determined entirely by their distinctive features. The scale f reflects the prominence or salience of different features, determining the contribution of each individual feature to the similarity between objects.
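As an illustration only, the linear contrast of common and distinctive features can be sketched in a few lines of Python. The sketch assumes that f simply counts features (every feature equally salient), which is a simplification rather than part of the model. The trait adjectives attached to the two citizens are hypothetical and are not taken from the study materials.

```python
from typing import Callable, FrozenSet

FeatureSet = FrozenSet[str]


def contrast_similarity(
    a: FeatureSet,
    b: FeatureSet,
    theta: float = 1.0,
    alpha: float = 1.0,
    beta: float = 1.0,
    f: Callable[[FeatureSet], float] = len,  # assumption: f counts features
) -> float:
    """S(a, b) = theta*f(A & B) - alpha*f(A - B) - beta*f(B - A)."""
    if min(theta, alpha, beta) < 0:
        raise ValueError("theta, alpha and beta must be non-negative")
    common = a & b   # features shared by both objects
    only_a = a - b   # features of a that b lacks
    only_b = b - a   # features of b that a lacks
    return theta * f(common) - alpha * f(only_a) - beta * f(only_b)


# Hypothetical descriptions: two shared adjectives, one distinctive each.
fiorenza = frozenset({"kind", "generous", "curious"})
amadora = frozenset({"kind", "generous", "shallow"})

# theta = 1, alpha = beta = 0: only common features matter.
common_only = contrast_similarity(fiorenza, amadora, theta=1, alpha=0, beta=0)

# theta = 0, alpha = beta = 1: only distinctive features matter.
distinctive_only = contrast_similarity(fiorenza, amadora, theta=0, alpha=1, beta=1)
```

Passing a different function for f (for example, one that sums per-feature salience weights) would be one way to represent features of unequal prominence.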
By means of the parameters θ, α, and β, the contrast model allows people to pay more attention to the objects’ common features when assessing their similarity, and more attention to their distinctive features when assessing their dissimilarity. As a result, a pair of richly described objects, which are likely to have many common and many distinctive features, could be judged to be both more similar and more different than a pair of less richly described objects, which are likely to have fewer common and distinctive features. As an example of such effects of the nature of comparison, Tversky (1977; Tversky & Gati, 1978, Study 1) showed that countries which their respondents described as “more prominent” (such as West and East Germany at the time, or England and Ireland) were judged to be both more similar and more different than countries they described as “less prominent” (such as Ceylon and Nepal, or Pakistan and Mongolia). The results confirm the hypothesis that the relative weight of common and distinctive features varies with the nature of the comparison. Shafir (1993), who extended this paradigm to choices between different options, showed that more richly described (“enriched”) options tend to be both more often chosen and more often rejected than less richly described (“impoverished”) options.
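A small numerical sketch may help show why a shift in weights can make an enriched pair both more similar and more different. All numbers below are hypothetical: the feature counts, the two sets of weights, and the choice to score difference as the negative of the contrast under difference-task weights are assumptions made purely for illustration.

```python
def contrast(common: int, distinct_a: int, distinct_b: int,
             theta: float, alpha: float, beta: float) -> float:
    """Contrast score with f taken to be a simple feature count."""
    return theta * common - alpha * distinct_a - beta * distinct_b


# Hypothetical (common, distinctive-in-a, distinctive-in-b) feature counts.
ENRICHED = (3, 3, 3)
IMPOVERISHED = (1, 1, 1)

# Assumed weights: common features dominate when judging similarity,
# distinctive features dominate when judging difference.
SIMILARITY_WEIGHTS = dict(theta=1.0, alpha=0.25, beta=0.25)
DIFFERENCE_WEIGHTS = dict(theta=0.25, alpha=1.0, beta=1.0)

sim_enriched = contrast(*ENRICHED, **SIMILARITY_WEIGHTS)
sim_impoverished = contrast(*IMPOVERISHED, **SIMILARITY_WEIGHTS)

# Difference is scored here as the negated contrast (an assumption).
diff_enriched = -contrast(*ENRICHED, **DIFFERENCE_WEIGHTS)
diff_impoverished = -contrast(*IMPOVERISHED, **DIFFERENCE_WEIGHTS)

enriched_more_similar = sim_enriched > sim_impoverished
enriched_more_different = diff_enriched > diff_impoverished
```

With these assumed values both comparisons come out in favor of the enriched pair. If the same weights were used for both tasks, the difference score would simply be the negated similarity score, and the pair judged more similar would necessarily be judged less different.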
The contrast model is also consistent with context effects on the judged similarity of objects, because the function f is sensitive to the context of comparison. Some features have greater diagnostic value, and affect the judgments of similarity more in certain contexts than in others. In one of the many demonstrations of these context effects, respondents were presented with four countries that naturally formed two clusters (Tversky, 1977; Tversky & Gati, 1978, Study 4). For example, consider the two sets of four countries, (England, Israel, Syria, Iran) and (England, Israel, Syria, France), which differ only in the fourth country in the list. A natural grouping in the first case is Syria and Iran as Muslim countries and England and Israel as non-Muslim countries. A natural grouping in the second case is Israel and Syria as Middle-Eastern countries and England and France as European countries. Accordingly, England and Israel are judged as more similar to each other in the first case than in the second. Note that somewhat similar effects have been found in the choice-set effects literature (e.g., Hsee, 1996), where the presence of an additional option changes the evaluation of the existing options.
If the representativeness heuristic for judging likelihood is based on similarity (Kahneman & Tversky, 1972; Kahneman & Frederick, 2002), then it should be sensitive to factors that influence similarity, such as the nature of the comparison and context effects. To test this idea, we designed two experiments aimed at exploring these factors. In designing the experiments, we closely followed the methodology of the studies described in Tversky (1977) and Tversky and Gati (1978). In what follows, we first describe the methodology shared by both experiments, and then address them separately.
2 General methodology
Materials. Both experiments used fictitious citizens of 15th century Florence, identified by Italian names selected from a list of names online. An average respondent knows little or nothing about the inhabitants of 15th century Florence, yet the objects (citizens) sound plausible and are suitable for formulating different tasks (see also Koehler, Brenner, Liberman & Tversky, 1996).
As features, we used trait adjectives associated with two broad factors derived from the Big Five Factor Model of Personality. These factors, Agreeableness and Openness to Experience, have high loadings on different higher-order factors (Digman, 1997). We aimed to add or remove traits from the descriptions of citizens without influencing the meaning or connotations of the remaining traits.
Each citizen was described by a number of trait adjectives, taken from Goldberg’s (1990) list of 100 clusters of adjectives, which he derived by a factor analysis of 339 adjectives describing different personality traits (Table 3 in Goldberg, 1990). Average reliability of his clusters was α = .66, and average pair-wise correlation of trait adjectives within the clusters was r = .40. We chose eight positive and eight negative clusters of trait adjectives from each of the two factors mentioned above. Each citizen was described by one or more trait adjectives belonging to the same cluster, for each of the two factors. The Appendix provides a complete list of trait adjectives we used.
Procedure. The questionnaire started with a consent form, and a few questions on respondents’ demographic characteristics and English language skills. The respondents then received a short introduction to the study (Figure 1). The respondents proceeded to answer questions about the experimental items, presented in random order. Respondents had to answer each item before they could continue to the next one.
3 Experiment 1: Nature of comparison
3.1 Hypothesis
According to Tversky’s (1977) contrast model, when judging similarity among objects, people tend to weigh their common features more heavily than their distinctive features. This relative weighting is reversed when judging differences among objects. As a result, “enriched” pairs of objects, i.e., those with both more common and more distinctive features, will be judged as both more similar to and more different from each other than will pairs of “impoverished” objects, i.e., those with fewer common and distinctive features. Consequently, if the representativeness heuristic is used to judge likelihoods, pairs of enriched objects should be judged as both more and less likely to belong to the same class than pairs of impoverished objects.
3.2 Method
Respondents. The respondents were recruited either from the pool of the University of Maryland undergraduate psychology students (n = 54, 76% female), or through online advertising to the general public (n = 130, 70% female). To reduce the burden for the latter group, they were asked to complete only a random half of all experimental items. At the end of the study, the students were rewarded with course credit, and the Web respondents were offered a list of potentially interesting links related to judgment, decision-making, and perception. There were no significant differences between the two samples, so we report pooled results.
Materials. In this experiment, an item consisted of two pairs of citizens, each described by adjectives associated with the traits of Agreeableness and Openness to Experience. In one pair six adjectives described each member, three focusing on one trait and three on the other. In the other pair only two adjectives described each member, one on each trait. An example of a typical item is shown in Figure 2. Because we used clusters of adjectives describing the same trait, respondents in the enriched conditions arguably received little new information (according to Goldberg’s 1990 analysis described above), but conveyed through a larger number of adjectives.
Pretest. We generated 160 items consisting of two pairs of citizens. In order to equate richness of description with what Tversky called “prominence”, we asked the respondents to select the pair in each item that “stands out more.” The sample included 28 undergraduate students of psychology, tested on computers in our lab, and 57 respondents tested online, recruited through word of mouth. Across items, the average percentage of respondents choosing the enriched pair was 79%. For the main study, we selected 20 items for which agreement was 100%.
Procedure. Respondents were randomized to four experimental groups (see Table 1, rows). Two groups made similarity judgments regarding the pairs of citizens: one group was asked to assess their similarity (“Choose the pair whose members are more similar”), while the other assessed their dissimilarity (“Choose the pair whose members are less similar”). The other two groups assessed the likelihood that the citizens from each of the pairs belong to the same family. Analogously to the similarity groups, one of these groups was told to “Choose the pair whose members are more likely to belong to the same family”, while the other was told to “Choose the pair whose members are less likely to belong to the same family”. Position of the pairs on the screen was counterbalanced: the enriched pair was put above the impoverished pair for half of the items. Items were randomized for each respondent.
3.3 Results
The percentage of respondents choosing the enriched pair in each of the four conditions is shown in Table 1. If the nature of comparison did not play any role, the percentages choosing the enriched pair under the two complementary framings of each type of judgment (“more” and “less”) should sum to 100. However, the average sums were larger than 100 (t(67)=5.55, p=.001).
Of particular interest to our study are the judgments of likelihood. The percentage of respondents who chose the enriched pair as “more likely to belong to the same family” and those who chose that same pair as “less likely to belong to the same family” summed to 117.8, higher than 100 (t(50)=7.37, p=.001).
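The complementarity logic behind this comparison can be made concrete with a minimal sketch. The percentages below are invented for illustration and are not the values from Table 1. If framing played no role, the share choosing the enriched pair as "more likely" and the share choosing it as "less likely" would sum to 100 for each item.

```python
from statistics import mean
from typing import List, Sequence


def complementarity_sums(pct_more: Sequence[float],
                         pct_less: Sequence[float]) -> List[float]:
    """Per-item sum of the percentages choosing the enriched pair
    under the 'more' framing and under the 'less' framing."""
    if len(pct_more) != len(pct_less):
        raise ValueError("both framings need one percentage per item")
    return [more + less for more, less in zip(pct_more, pct_less)]


# Hypothetical percentages for four items (NOT data from the study).
pct_more_likely = [60.0, 70.0, 55.0, 65.0]
pct_less_likely = [50.0, 50.0, 50.0, 50.0]

sums = complementarity_sums(pct_more_likely, pct_less_likely)
mean_sum = mean(sums)
excess_over_100 = mean_sum - 100.0  # > 0: enriched pair favored under both framings
```

A positive excess is the pattern the contrast model predicts; an excess near zero would be expected if respondents treated "less likely" as the plain complement of "more likely".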
4 Experiment 2: Context effects
4.1 Hypothesis
According to the contrast model, objects can be judged to be more or less similar to each other based on the context in which they appear. In particular, context can alter the salience of certain features by changing the natural clustering of the objects. The same effects should hence be observed in judgments of likelihood if based on similarity.
4.2 Method
Respondents. We used Amazon Mechanical Turk to recruit 63 respondents for the pretest (29% female) and 118 respondents for the main study (43% female). They were all native English speakers and most of them were between 25 and 40 years of age.
Materials. The objects and features used to form the items in this experiment were the same as in Experiment 1. An item consisted of a quadruple of citizens. Each citizen was described with four features. Each quadruple had two versions. Three of the citizens were the same in both versions (a, b, and c), while the fourth differed (p or q). Their features were chosen in such a way that the natural groupings within the quadruple changed when the fourth citizen was changed. In one version, the natural groupings were a with b and c with p, while in the other the natural groupings were a with c and b with q. Hence, we expected that a would be perceived as more similar to b than to c in the presence of p, but would be perceived as more similar to c than to b in the presence of q. An example is shown in Figure 4. Here, Fiorenza is citizen a, Amadora b, Rosa c, Gianina p, and Ottavia q.
Pretest. We generated 20 pairs of items, or 40 quadruples in total. In the pretest (equivalent to the one described in Tversky, 1977; and Tversky & Gati, 1978, Study 4), we checked whether the natural groupings of each quadruple were in accord with our expectations. The respondents were asked to divide each quadruple into the two most natural pairs. All quadruples were divided into the expected pairs. The average percentage of the respondents who grouped the quadruples as expected was 69% (minimum 57%, maximum 83%).
Procedure. Respondents were randomly divided into four groups and asked questions about the 20 experimental items. Two of the groups assessed the similarity of the citizens, and two the likelihood that the citizens were cousins. For each quadruple, the respondents had to say which of the three citizens (b, c, or p/q) is the most similar to, or the most likely to belong to the same family as, citizen a. Within each type of judgment, one group of respondents got quadruples containing citizen p, while the other group got those same quadruples but with citizen q instead. Citizen a was always positioned at the top of the page, while the order of the other three citizens on the page was counterbalanced. Items were randomized.
4.3 Results
When a quadruple contained citizens a, b, c and p, the expected groupings were a with b and c with p; when it contained a, b, c and q, the expected groupings were a with c and b with q. Consequently, in line with Tversky & Gati (1978), we expected that b would be chosen as the most similar to a more often in the presence of p than of q, and c would be chosen as the most similar to a more often in the presence of q than of p. Accordingly, for likelihood judgments, we expected that b would be chosen as the most likely to belong to the same family as a more often in the presence of p than in the presence of q, and that c would be chosen as the most likely to belong to the same family as a more often in the presence of q than in the presence of p.
As shown in Table 2, the context manipulation affected both the similarity and the likelihood judgments. The average difference in the percentage of respondents who chose person b in the presence of p vs. q differed significantly from zero for both types of judgments (for similarity, t(19)=3.42, p=.003; for likelihood, t(19)=5.46, p=.001). The same was the case for the average difference in the percentage of respondents who chose person c in the presence of q vs. p (for similarity, t(19)=12.03, p=.001; for likelihood, t(19)=7.98, p=.001). Average differences were also reliably different from zero (for similarity, t(19)=10.42, p=.001; for likelihood, t(19)=8.13, p=.001).
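The item-level tests reported here can be sketched as a one-sample t-test on per-item differences, which gives 19 degrees of freedom when there are 20 items. The sketch below uses only the Python standard library and five invented items; the percentages are hypothetical and are not the values in Table 2. It computes the t statistic and degrees of freedom but not the p-value, which would require a t distribution from a statistics package.

```python
from math import sqrt
from statistics import mean, stdev
from typing import Sequence, Tuple


def one_sample_t(values: Sequence[float], mu: float = 0.0) -> Tuple[float, int]:
    """Return (t statistic, degrees of freedom) for H0: mean(values) == mu.

    Assumes the values are not all identical (non-zero standard deviation).
    """
    n = len(values)
    if n < 2:
        raise ValueError("need at least two items")
    standard_error = stdev(values) / sqrt(n)  # stdev uses the n - 1 denominator
    return (mean(values) - mu) / standard_error, n - 1


# Hypothetical per-item percentages choosing b (NOT data from the study).
pct_b_with_p = [50.0, 45.0, 55.0, 48.0, 52.0]  # fourth citizen is p
pct_b_with_q = [40.0, 41.0, 43.0, 42.0, 44.0]  # fourth citizen is q

# Context effect per item: how much more often b is chosen alongside p.
differences = [with_p - with_q for with_p, with_q in zip(pct_b_with_p, pct_b_with_q)]

t_stat, df = one_sample_t(differences)
```

Treating items rather than respondents as the unit of analysis is what ties the degrees of freedom to the number of items instead of the sample size.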
5 Discussion
We used Tversky’s (1977) contrast model to develop manipulations that are known to affect similarity judgments and tested whether they also influence likelihood judgments. Results of both experiments were in accord with the representativeness heuristic, which holds that judgments of likelihood are affected by the similarity of objects. One experiment showed that pairs of enriched objects, those defined with more features, were judged to be both more and less similar, as well as both more and less likely to belong to the same class. The other experiment showed that context affects similarity and likelihood judgments in similar ways.
These results provide support for the assumption that judgments of likelihood are based on similarity of objects, in accord with the studies of Bar-Hillel (1982). Our study is novel in that it is the first to test whether manipulating similarity has comparable effects on both judgments of similarity and judgments of likelihood.
Our results are also in line with those of Nilsson, Olsson & Juslin (2005; see also Nilsson, Juslin & Olsson, 2008). They compared three cognitive mechanisms that could underlie probability judgments: (1) the representativeness heuristic, modeled as prototype similarity, relative likelihood, or evidential support accumulation; (2) cue-based relative frequency; and (3) exemplar memory accounts. They found that the mechanism based on exemplar memory outperformed other accounts of probability judgments in a range of tasks. The exemplar-based mechanism differed from the other accounts in that it responded both to the similarity of an event to exemplars from other categories and to the relative frequency of exemplars from other categories.
The idea that manipulating features of objects can affect both similarity and likelihood judgments has been investigated in the feature-based categorization literature (e.g., Sloman, 1993; Smith & Osherson, 1989). Further studies could investigate whether feature-based models of induction could be applied to the types of tasks investigated in this paper, and enable a more precise understanding of the underlying processes. Furthermore, it would be useful to investigate the relationship between similarity and likelihood judgments using different sets of stimuli, going beyond persons and families, and beyond categorization tasks.
Tenenbaum & Griffiths (2001) analyzed the rational basis of representativeness using a Bayesian approach. One of their findings was that similarity-based models can approximate rational Bayesian models with reasonable accuracy but require much simpler computations. This result is in line with the idea that similarity might be used as a heuristic for probability judgments. Our results, showing that similarity and likelihood judgments track each other, seem to be in accord with these suggestions.
Hertwig & Gigerenzer (1999) warned that the way respondents interpret tasks involving probability judgments depends on the overall context. When tasks involve cues that are irrelevant to judgments of likelihood (such as the description of Linda in Tversky & Kahneman, 1983), respondents may use standard conversational norms (Grice, 1989) and infer that the task involves more than simply evaluating the mathematical probability of an event. A related issue might be relevant in our study. In our tasks, the information that could have been used to assess the likelihood that any two guests belong to a particular family (i.e., the number of families and their members present at the party, given at the beginning) was less salient than the information about the similarity of personality traits (given on every page). This is because we tried to emulate the similarity tasks Tversky used as closely as possible. Had we created a context that involved more likelihood cues, we might have obtained different results.
Nilsson et al. (2008) showed that subjective probability depends not only on similarity but also on other factors. Thus, we close by emphasizing that, while we have shown that similarity is an important contributor to likelihood judgments, we are not claiming that the two types of judgments are identical. Clearly, likelihood judgments depend on other factors in addition to similarity, or, we might say, in addition to representativeness.
Appendix: Trait adjectives used as features
1 Trait adjectives marked in italics were the ones used in impoverished conditions in Experiment 1.
2 Because Goldberg’s list (1990; in his Table 3) included more positive than negative trait adjectives related to the factor Openness to Experience, we added several negative trait adjectives by finding antonyms for the positive trait adjectives within this factor. We used Merriam-Webster’s dictionary available at www.m-w.com.