1 Introduction
If preferences are transitive, then whenever X is preferred to Y and Y is preferred to Z, X is preferred to Z. Many people consider transitivity to be both rational and descriptive of risky decision making. But some argue that transitivity is neither rational nor descriptive (e.g., Fishburn, 1991; Butler & Blavatskyy, 2020; McNamara, et al., 2014).
Butler and Pogrebna (2018) theorized that if a person follows a binary decision rule of choosing the alternative that has a higher probability of yielding a better outcome, then that person would show systematic violations of transitivity of preference in specially constructed choice problems. This decision strategy is known as the most probable winner (MPW) model, and is sometimes called majority rule.
For example, consider gambles with three equally likely consequences, X = ($x_1$, $x_2$, $x_3$), denoting equal chances to win $x_1$, $x_2$, or $x_3$ dollars. Suppose X = (15, 15, 3), Y = (10, 10, 10), and Z = (27, 5, 5). If the prizes received under X, Y, Z are statistically independent, then the probability that X gives a higher outcome than Y is 2/3; the probability that Y gives a higher outcome than Z is 2/3; and the probability that Z gives a higher outcome than X is 5/9. So, if people choose by MPW, their choices would be intransitive with this triple.
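The three probabilities in this example can be checked by enumerating the nine equally likely pairs of outcomes for each pair of gambles. The following is a minimal sketch under the independence assumption stated above; the function name `prob_beats` is hypothetical and used only for illustration.

```python
from fractions import Fraction
from itertools import product

def prob_beats(a, b):
    """Probability that gamble a yields a strictly higher prize than gamble b,
    assuming independence and equally likely outcomes within each gamble."""
    wins = sum(1 for x, y in product(a, b) if x > y)
    return Fraction(wins, len(a) * len(b))

X = (15, 15, 3)
Y = (10, 10, 10)
Z = (27, 5, 5)

p_xy = prob_beats(X, Y)  # X beats Y
p_yz = prob_beats(Y, Z)  # Y beats Z
p_zx = prob_beats(Z, X)  # Z beats X

# Each probability exceeds 1/2, so a most-probable-winner rule cycles:
# X over Y, Y over Z, and Z over X.
print(p_xy, p_yz, p_zx)
```

Because no pair of outcomes is tied in this triple, each probability of winning exceeds 1/2 exactly when the probability of losing is below 1/2.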
The MPW model is an extreme special case of the additive difference models, a class of models investigated by Birnbaum and Diecidue (2015) that also includes regret theory (Loomes & Sugden, 1982) as a special case. Regret theory implies the opposite pattern of intransitivity (see Appendix) from that of MPW. Birnbaum and Diecidue (2015) found very few individuals whose data showed either type of intransitive response pattern; they also found that the data showed systematic violations of restricted branch independence, a property implied by the additive difference models.
Butler and Pogrebna (2018) conjectured that intransitive behavior might be observed when there are no more than two distinct outcome values in each gamble, and when riskier gambles have slightly higher expected values (EV) than safer ones. In this example, the riskiest gamble, Z, has an EV of 12.33; the medium-risk gamble, X, has an EV of 11; and the safest gamble, Y, has the lowest EV, 10. There were a total of 11 triples with similar values constructed from this recipe.
For each triple of gambles, X, Y, and Z, there are three binary choice problems, XY, YZ, and ZX. In XY the response can be coded as 1 if X is chosen, and 2 if Y; in YZ, code 2 or 3 if Y or Z was chosen; in ZX, code 1 or 3 if X or Z was preferred, respectively. There are 8 possible preference patterns: 121, 123, 131, 133, 221, 223, 231, or 233, two of which are intransitive, 123 and 231. For these X, Y, and Z, the MPW model implies 123. Other theories are discussed in the Appendix.
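The coding scheme can be illustrated with a short enumeration. This is only a sketch: the names `PATTERNS` and `is_transitive` are hypothetical. It relies on the fact that a cycle arises exactly when each of the three gambles is chosen once across the three choices, so that all three digits of the pattern differ.

```python
from itertools import product

# XY is coded 1 (X) or 2 (Y); YZ is coded 2 (Y) or 3 (Z); ZX is coded 1 (X) or 3 (Z).
PATTERNS = ["".join(p) for p in product("12", "23", "13")]

def is_transitive(pattern):
    """A pattern is intransitive only if X, Y, and Z are each chosen exactly once."""
    return len(set(pattern)) < 3

intransitive = [p for p in PATTERNS if not is_transitive(p)]
print(PATTERNS)
print(intransitive)
```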
Butler and Pogrebna used these three gambles (in pounds rather than dollars) and 10 other, similar triples of gambles; they obtained three binary choices in each triple, which were presented twice to each of 100 participants. In between the two repetitions of the binary choice task, participants were asked to rank the three gambles in each triple, which forces a transitive order. Because the intervening task of ranking may have affected preferences, I use the term “repetitions” rather than “replications” for the two presentations of the binary choices. [Footnote 1]
1.1 Problems with Previous Analyses
In previous research, there has been a long-standing problem of how to test transitivity when there might be random errors in the data, when the error rates in different choice problems might be unequal, and when data might include a mixture of true preference patterns, either because different people have different true preferences or because the same person may change preferences over time. Methods of analysis used in the past cannot be relied upon in such cases to evaluate transitivity; those methods can easily fail to diagnose whether data arose from a transitive or intransitive true preference structure.
Sopher and Gigliotti (1993) noted that if different choice problems have different rates of error, then tests of transitivity based on inequality of response patterns (once considered evidence of intransitivity) might easily signal “intransitivity” when the data are actually perfectly transitive. Birnbaum and Schmidt (2008) gave clear examples to illustrate that one cannot properly address the issue of transitivity without measuring error rates in the choice problems used to measure preferences and test transitivity. They noted that with proper experimental designs, the true and error model can be used to estimate error rates and can resolve this problem.
Regenwetter, Dana, and Davis-Stober (2011) developed statistical tests of weak stochastic transitivity and the triangle inequality based on the assumption that choice responses are independent and identically distributed (iid). However, Birnbaum (2012), Birnbaum and Bahra (2012), and others have reported evidence that iid is systematically violated by empirical data, so these new statistical tests are dubious.
But there are problems with testing these properties of binary choice probabilities that are even more fundamental than the issues with the statistical assumptions: Tests of the triangle inequality and weak stochastic transitivity cannot properly diagnose the issue of transitivity when the data might arise from a mixture of true preferences. The triangle inequality and weak stochastic transitivity can both be perfectly satisfied in cases where the vast majority of participants systematically violated transitivity, and weak stochastic transitivity can be significantly violated in a group of data in which every individual was perfectly transitive but different participants had different true preferences. [Footnote 2]
Furthermore, statistical tests (of properties like weak stochastic transitivity) give only a “reject” or “retain” answer. They do not provide estimates of the incidences of different transitive and intransitive response patterns, which one needs in order to evaluate models of decision making. Birnbaum and Wan (2020) simulated data from transitive or intransitive processes and illustrated that those older methods of analysis simply cannot correctly identify whether a transitive or intransitive model had been used to create the data. They also showed that the true and error model correctly identified whether a transitive or intransitive model had been used to generate the data in these cases where the older methods fail.
Because the data analyses presented in Butler and Pogrebna (2018) were based on these older methods, a skeptic could remain unconvinced that their data actually contained any real evidence against transitivity, once errors and differences in true preferences are allowed. Fortunately, because their study included two presentations of each choice problem, it is possible to apply the true and error (TE) model to reanalyze their data in order to estimate error rates and the incidence of intransitive and transitive behavior (Birnbaum, 2013; Birnbaum, Navarro-Martinez, Ungemach, Stewart, & Quispe-Torreblanca, 2016; Birnbaum & Wan, 2020).
1.2 True and Error Fitting Model
The numbers of participants who showed each of the response patterns on first and second repetitions for Triple #4 of Butler and Pogrebna (2018) are shown in Table 1. This triple appeared to have the strongest evidence of intransitivity. Rows represent the response pattern on the first repetition of the binary choice problems, and the columns show the response pattern on the second repetition. There were 26 people out of 100 participants who showed the intransitive cycle (231) on both repetitions. There were 33 others who showed the 231 response pattern on the first repetition and switched to other patterns on the second repetition, including 12 who switched to 223 and 8 who reversed all three preferences to 123. However, not even one participant responded on both occasions with the intransitive pattern (123) implied by the MPW model. Although Table 1 did not appear in the published version of Butler and Pogrebna (2018), this table and those for the other triples are included in the journal’s online supplement to their paper.
Note to Table 1: Total n = 100. The most probable winner model implies the intransitive pattern 123; the opposite pattern, 231, is also intransitive.
The TE model can be fit to Table 1 and the corresponding tables for the other triples. There are two sets of parameters in a group TE model: the probabilities of making errors in each of the three choice problems, denoted $e_1$, $e_2$, and $e_3$; and the probabilities of the 8 possible true preference patterns, $p_{121}$, $p_{123}$, $p_{131}$, $p_{133}$, $p_{221}$, $p_{223}$, $p_{231}$, and $p_{233}$, which represent the distribution in the mixture of true preference patterns among the individuals. According to the TE fitting model used here, the predicted frequency that people will show the response pattern 123 (implied by MPW) on two replications (of three choice problems) is given as follows:

$$
\begin{aligned}
P_{123,123} = n\,\bigl[\, & p_{121}(1-e_1)^2(1-e_2)^2 e_3^2 + p_{123}(1-e_1)^2(1-e_2)^2(1-e_3)^2 \\
& + p_{131}(1-e_1)^2 e_2^2 e_3^2 + p_{133}(1-e_1)^2 e_2^2 (1-e_3)^2 \\
& + p_{221}\, e_1^2 (1-e_2)^2 e_3^2 + p_{223}\, e_1^2 (1-e_2)^2 (1-e_3)^2 \\
& + p_{231}\, e_1^2 e_2^2 e_3^2 + p_{233}\, e_1^2 e_2^2 (1-e_3)^2 \,\bigr]
\end{aligned}
$$

where $P_{123,123}$ is the predicted frequency (count) of the 123 response pattern in both repetitions of the task (i.e., six separate binary choice responses on different trials), and $n$ is the number of participants. Note that if a person has the true preference pattern of 123, then she or he would have to make no errors on six separate choice problems to exhibit this response pattern, and if the true pattern were 121, then she or he made an error on the third choice problem twice. This expression is one of the 64 equations for the predicted frequencies of the 64 possible response patterns, as in Table 1. The “predicted” (or “fitted”) frequencies corresponding to the data in Table 1 are simply $n$ times the theoretical probabilities, using parameter estimates best-fit to the data.
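The same logic extends to all 64 cells. The sketch below is not the spreadsheet or R program cited in this paper; it only illustrates the structure of the model, assuming mutually independent errors and a fixed true pattern across both repetitions. The function name `te_cell_probability` and the parameter values are hypothetical and are not estimates from these data.

```python
from itertools import product

PATTERNS = ["".join(p) for p in product("12", "23", "13")]

def te_cell_probability(resp1, resp2, p_true, errors):
    """Probability of showing resp1 on the first repetition and resp2 on the second.

    p_true maps each true pattern to its probability; errors = (e1, e2, e3).
    A response that differs from the true preference on choice k has probability
    errors[k]; one that agrees has probability 1 - errors[k].
    """
    total = 0.0
    for true, p in p_true.items():
        term = p
        for resp in (resp1, resp2):
            for k in range(3):
                e = errors[k]
                term *= e if resp[k] != true[k] else 1.0 - e
        total += term
    return total

# Hypothetical illustrative values (NOT estimates from the data).
p_true = {"121": 0.05, "123": 0.05, "131": 0.05, "133": 0.10,
          "221": 0.30, "223": 0.15, "231": 0.25, "233": 0.05}
errors = (0.10, 0.20, 0.15)
n = 100

predicted_123_123 = n * te_cell_probability("123", "123", p_true, errors)
total_probability = sum(te_cell_probability(a, b, p_true, errors)
                        for a in PATTERNS for b in PATTERNS)
print(predicted_123_123, total_probability)
```

Two properties follow from this structure: the 64 cell probabilities sum to 1, and the predicted matrix is symmetric, since the two repetitions are treated alike.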
Birnbaum’s (2013) Excel spreadsheet, TE8x8_fit.xlsx [available from the journal’s website supplement to Birnbaum and Wan (2020)], can be used to select parameters to minimize either $\chi^2$ or G indices of fit. Minimizing G is equivalent to a maximum likelihood solution. The index G, sometimes denoted $G^2$, is defined as follows:

$$
G = 2 \sum_{i} \sum_{j} O_{ij} \ln\!\left(\frac{O_{ij}}{P_{ij}}\right)
$$

where the summation is over the 64 cells (8 rows by 8 columns), $O_{ij}$ is the observed frequency in the cell (as in Table 1), and $P_{ij}$ is the “predicted”, or “fitted”, frequency. The indices, $i$ and $j$, represent the 8 response patterns for the rows and columns of tables (as in Table 1), respectively; i.e., $i$ = 1, 2, 3, …, 8 correspond to 121, 123, 131, …, 233, respectively. The $\chi^2$ index is similar:

$$
\chi^2 = \sum_{i} \sum_{j} \frac{(O_{ij} - P_{ij})^2}{P_{ij}}
$$

These indices usually take on similar values, and both are asymptotically Chi-Square distributed.
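For readers who wish to compute these indices outside the spreadsheet, the following sketch assumes the standard likelihood-ratio and Pearson forms, with cells having an observed frequency of zero contributing zero to G. The small 2 by 2 tables are hypothetical and serve only to show the arithmetic.

```python
import math

def g_statistic(observed, fitted):
    """G = 2 * sum of O * ln(O / P) over all cells; cells with O = 0 contribute 0."""
    total = 0.0
    for o_row, p_row in zip(observed, fitted):
        for o, p in zip(o_row, p_row):
            if o > 0:
                total += o * math.log(o / p)
    return 2.0 * total

def chi_square(observed, fitted):
    """Pearson index: sum of (O - P)^2 / P over all cells."""
    return sum((o - p) ** 2 / p
               for o_row, p_row in zip(observed, fitted)
               for o, p in zip(o_row, p_row))

# Hypothetical observed and fitted frequencies (not from Table 1).
observed = [[10, 0], [5, 5]]
fitted = [[8.0, 2.0], [5.0, 5.0]]

g_value = g_statistic(observed, fitted)
chi_value = chi_square(observed, fitted)
print(g_value, chi_value)
```

The same functions apply unchanged to an 8 by 8 table of observed counts and a matching table of fitted frequencies.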
There are 64 data values in the 8 by 8 tables, which sum to the number of participants, and thus have 63 degrees of freedom (df). The 11 parameters to be estimated use 10 degrees of freedom because the eight probabilities of true preference patterns sum to 1, so the Chi-Squares have 53 df.
The transitive model is a special case of the TE model in which $p_{123}$ and $p_{231}$ are both fixed to zero. One can therefore test transitivity by computing the difference between the fits of the TE model and the transitive special case, which is also Chi-Square distributed with 2 df.
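The counting of degrees of freedom and the nested test can be sketched as follows. The tail probabilities use two exact facts about the Chi-Square distribution: with 2 df the upper tail is exp(−x/2), and with 1 df it is erfc(√(x/2)). The function names and the G values passed to `transitivity_test` are hypothetical.

```python
import math

# Degrees of freedom for the 8 by 8 table, following the counting in the text.
cells = 8 * 8
df_data = cells - 1                # the 64 counts sum to n
free_parameters = 3 + (8 - 1)      # 3 error rates plus 8 probabilities that sum to 1
df_te = df_data - free_parameters  # df for the fit of the TE model

def chi2_sf_1df(x):
    """Upper-tail probability of a Chi-Square variable with 1 df."""
    return math.erfc(math.sqrt(x / 2.0))

def chi2_sf_2df(x):
    """Upper-tail probability of a Chi-Square variable with 2 df."""
    return math.exp(-x / 2.0)

def transitivity_test(g_general, g_transitive):
    """Difference in G between the transitive special case and the general TE
    model, with its p-value on 2 df."""
    diff = g_transitive - g_general
    return diff, chi2_sf_2df(diff)

# Hypothetical G values, for illustration only.
diff, p_value = transitivity_test(60.0, 75.0)
print(df_te, diff, p_value)
```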
A program in R (Birnbaum, et al., 2016), TE8x2_fit.R, is available from this journal’s website supplement to Birnbaum and Wan (2020). This program can be used to analyze the TE model when sample sizes are relatively small. This program is applied to an 8 by 2 simplification of Table 1 (which partitions the data into the 8 diagonal entries and the 8 column sums minus the diagonal entries). It uses Monte Carlo simulations to estimate distributions of the test statistic and it employs bootstrapping to estimate sampling distributions of parameter estimates. [Footnote 3]
The TE fitting model allows that different participants may have different true preference patterns, but it assumes that each person maintains the same true preferences in both replications, and it allows that a person might make different responses on two replications due to random errors. The errors are assumed to be mutually independent. The assumption that errors are mutually independent does not imply that responses are independent, except in special cases such as when all persons have the same true preferences (Birnbaum, 2013).
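The 8 by 2 partition described above can be sketched in a few lines. This is not the TE8x2_fit.R program; the function name `partition_diag_offdiag` is hypothetical, and the small 3 by 3 table of counts is invented to keep the example short. The function applies to any square table, including the 8 by 8 case.

```python
def partition_diag_offdiag(table):
    """For each response pattern j, return (diagonal count, column sum minus diagonal).

    The diagonal entry counts people who repeated pattern j on both repetitions;
    the remainder counts people who showed pattern j on the second repetition only.
    """
    size = len(table)
    result = []
    for j in range(size):
        diagonal = table[j][j]
        column_sum = sum(table[i][j] for i in range(size))
        result.append((diagonal, column_sum - diagonal))
    return result

# Hypothetical 3 by 3 table of counts (rows: first repetition; columns: second).
table = [[5, 1, 0],
         [2, 3, 1],
         [0, 4, 4]]

partition = partition_diag_offdiag(table)
print(partition)
```

Every participant falls in exactly one of the resulting cells, so the partition preserves the total sample size.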
2 Results
Table 2 shows estimated parameters of the TE fitting model applied to the data of Butler and Pogrebna (2018). To save space, numbers are presented as percentages (e.g., 04 indicates 0.04). The ranges in parentheses represent bootstrapped 95% confidence intervals, based on 10,000 bootstrapped samples. Three of 11 triples (#4, 7, and 10) have convincing evidence of the 231 intransitive pattern. In these three triples, 26, 12, and 17 people (out of 100) showed this same pattern on both repetitions, and the TE fitting model gave estimated probabilities of $p_{231}$ = 0.51, 0.34, and 0.38, respectively. The lower bounds of the confidence intervals are 0.40, 0.19, and 0.24, respectively, giving confidence that the incidence of intransitive behavior is more than trivial in these three triples.
Note to Table 2: The patterns 123 and 231 are intransitive. Values are expressed as percentages, so 01 indicates 0.01. Numbers in parentheses show 95% bootstrapped confidence intervals. The most probable winner model allows only the 123 pattern in all triples except #5, 8, and 9; in Triple #8, it implies only 121, and it allows either 121 or 123 in #5 and 9.
In Triple #4, X = (15, 15, 3), Y = (10, 10, 10), and Z = (27, 5, 5); in Triple #7, X = (9, 9, 3), Y = (6, 6, 6), Z = (16, 4, 4); in Triple #10, X = (14, 14, 2), Y = (8, 8, 8), Z = (21, 6, 6). According to the MPW model, these three triples should have shown only the 123 true preference pattern. Instead, there is systematic evidence of the opposite pattern of intransitive behavior and of transitive patterns not allowed by MPW. [Footnote 4]
There is also evidence of a small incidence of the 231 pattern in Triples #1, 5, and 6. Averaged over all triples, the estimated incidence of the 231 cycle was 0.18, meaning that on average, 18% of participants appear to manifest this intransitive true preference pattern with the triples generated with this recipe. Although 18% does not represent the majority of participants, this rate of intransitive behavior is higher than that reported in previous studies with similar choice tasks (Birnbaum & Diecidue, 2015).
Evidence of the intransitive pattern implied by the MPW model, however, appears to be weaker and less convincing than evidence of the opposite. The best case for the 123 pattern implied by MPW is in Triple #3, where only 7 of 100 participants repeated the 123 pattern in both repetitions; the lower limit of the confidence interval for pattern 123 is only 0.06.
The argument that the 123 pattern may result from use of the MPW is thus weak and is made even less compelling because MPW implies that these same people should have only the 123 pattern, not only for Triple #3 but also for all other triples in the study except #5, 8, and 9, including Triples #4, #7, and #10, where no one repeated the 123 pattern. Using the TE model, one can reject the hypothesis that the incidence of pattern 123 exceeds 0.05 in Triples #7 and #10, G = 51.51 and 8.85, p < 0.01.
The fit of the TE model can be tested by conventional maximum likelihood tests of the 8 by 8 matrices (as in Table 1). The G tests for each triple are shown in Table 3; these are theoretically Chi-Square distributed with 53 df. Two cases (Triples #4 and 11) have large G values (156.33 and 128.44, respectively). Because sample size is relatively small (n = 100), Monte Carlo simulations were applied to the $\chi^2$ index in the 8 by 2 partition, using TE8x2_fit.R. The same two triples (#4 and 11) were found to have significant deviations by both conservative and refit Monte Carlo methods (see Birnbaum, et al., 2016), but Triple #5 (which had the third largest G in Table 3) was not significant by either conservative or refit methods.
Note to Table 3: “TE fit” is the G value for the true and error model; “Test 123” is a test of the increase in G when $p_{123}$ is fixed to zero; “Test Trans” is the increase when both intransitive patterns, $p_{123}$ and $p_{231}$, are fixed to zero. Critical values with α = 0.01 for 53, 1, and 2 df are 79.8, 6.63, and 9.21, respectively.
Table 1 shows the nature of the violations of the TE model in Triple #4. The model implies that the matrix should be symmetric; however, 33 people who displayed the 231 response pattern on the first repetition switched to other response patterns on the second repetition (Row 231), but only 13 changed from other patterns to 231 in the second repetition (Column 231). Thus, this intransitive 231 pattern occurs less often in the second repetition, following the ranking task. Because Triple #4 had the strongest evidence for violations of transitivity, one might be concerned that evidence of intransitivity might somehow be an artifact of violations of the TE model. Nevertheless, Triples #3, 7, and 10, which showed evidence of intransitive behavior, all had G tests of fit less than 70 (not significant) and the TE model appeared to approximate their data fairly well.
The column labeled “Test 123” in Table 3 shows G tests of the hypothesis that $p_{123}$ = 0. To construct these tests, one computes the fit of TE with all parameters free and the fit to the same data with $p_{123}$ fixed to zero. The difference in G between these fits is then theoretically Chi-Square distributed with one degree of freedom. Only two cases show significant evidence to reject $p_{123}$ = 0: Triples #3 and 8, which are also the only cases where bootstrapped confidence intervals for $p_{123}$ have lower limits that exceed zero (Table 2).
In Triple #3, X = (12, 12, 2), Y = (8, 8, 8), and Z = (20, 4, 4), whereas in Triple #8, X = (15, 15, 5), Y = (10, 10, 10), and Z = (30, 3, 3). The MPW implies the intransitive preference pattern 123 for Triple #3, but it implies the transitive preference pattern 121 in Triple #8, so evidence of the 123 pattern is evidence against MPW in this case. The frequency of the 121 pattern implied by MPW in Triple #8 is not significantly different from zero, whereas the 221 pattern appears to be most common for Triple #8, contrary to the MPW model.
There are two ways to examine the predictions of a model: One way is to look for what a model predicts and ask if there is any significant trace of evidence “for” (consistent with) what that model predicts; the other is to look for what the model cannot predict and ask if the deviations are significant. Although there is a significant trace of evidence “for” the intransitive pattern predicted by MPW in Triple #3, the MPW model cannot account for any other response patterns, including transitive ones, in this triple. In fact, MPW implies only the 123 preference pattern in all triples except #5, 8, and 9. [Footnote 5] As one can see in Table 2, there is substantial evidence of transitive patterns, especially 221, that cannot be reconciled with the MPW model. Therefore, if we treat MPW as a candidate descriptive model, we must reject it because of significant violations of MPW that occur in all 11 triples.
The third column in Table 3, labeled “Test Trans”, shows G tests of the transitive special case of the TE model; that is, tests of the hypothesis that $p_{123}$ = $p_{231}$ = 0. These have 2 degrees of freedom. All except Triples #2, 9, and 11 show significant violations of transitivity, and these cases all correspond to cases where the bootstrapped confidence intervals in Table 2 exclude zero for at least one intransitive pattern.
3 Discussion
In summary, reanalysis of Butler and Pogrebna (2018) via the TE model reveals evidence of small but systematic violations of transitivity of preference. However, the majority of violations of transitivity in Table 2 occur in the opposite direction from that predicted by MPW. Those violations of transitivity are more consistent with the concept of regret (Loomes & Sugden, 1982; see also Appendix). Further, the data show significant violations of the MPW model and one can reject the hypothesis that more than a very tiny fraction of people might have used an MPW strategy. Of two cases where significant evidence of nonzero 123 violations was found, one case was consistent with MPW and the other was a violation of MPW, so it seems hard to argue that the small trace of evidence in the one case actually resulted from use of this strategy.
To describe these data, one could argue they are compatible with a mixture of transitive and intransitive true patterns generated by individual differences and parameters that change over time within a person, as in the MARTER models of Birnbaum and Wan (2020).
Other recent studies using TE models to evaluate violations of transitivity reported significant but small incidences (Birnbaum & Gutierrez, 2007; Birnbaum & Bahra, 2012; Birnbaum & Diecidue, 2015; Birnbaum, et al., 2016; Birnbaum & Schmidt, 2008). The rates of violation of transitivity estimated here in the Butler and Pogrebna (2018) data are small, but they are higher than those reported in previous studies, so the recipe for constructing triples presented by Butler and Pogrebna (2018) may indeed show promise, even if the MPW theory that motivated this design can be rejected.
X = (15, 15, 3), Y = (10, 10, 10), Z = (27, 5, 5); 1, 2, 3 denote preference for X, Y, or Z in choices XY, YZ, and ZX, respectively. Patterns 123 and 231 are intransitive.
Appendix: Theoretical Analysis
Table 4 lists certain theoretical decision rules or models that are compatible with different response patterns in Triple #4 of Butler and Pogrebna (2018). The codings 1, 2, and 3 indicate preferences for X, Y, or Z in choice problems, XY, YZ, and ZX, respectively. For Triple #4 of Butler and Pogrebna (2018), X = (15, 15, 3), Y = (10, 10, 10), and Z = (27, 5, 5). Table 4 notes, for example, that the preference pattern 121 would be consistent with a decision rule to choose the gamble with the higher median value (X has a median of 15, higher than the median of Y, 10, which is higher than the median of Z, 5).
If a person chose the gamble with the larger highest outcome, the pattern would be 133, which in this case is also the pattern implied by mean outcome; i.e., expected value (EV). Choosing the gamble with the better lowest outcome would lead to the preference pattern 223.
The response pattern, 123, is intransitive, and is consistent with the most probable winner (MPW) model under either the assumption that the gambles are treated as independent, or under the assumption that the outcomes are completely dependent events (e.g., if 27 were the outcome of Z, then the outcome of X must be 15).
If a person chose by higher expected utility (EU), and if utility is a power function of money, then the preference pattern depends on the exponent of the utility function, $u(x) = x^{\alpha}$. For three equally likely consequences, $EU(X) = [u(x_1) + u(x_2) + u(x_3)]/3$. For α < 0.4, the pattern is 223; for 0.4 ≤ α ≤ 0.5, the pattern is 233; and it is 133 for α > 0.5.
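The dependence of the EU pattern on the exponent can be checked numerically for Triple #4. The sketch below assumes power utility and equally likely consequences, as stated above; the function names are hypothetical, and the three exponents are chosen well inside the three ranges rather than at their boundaries.

```python
def expected_utility(gamble, alpha):
    """EU of a gamble with equally likely consequences and utility u(x) = x ** alpha."""
    return sum(x ** alpha for x in gamble) / len(gamble)

def eu_pattern(X, Y, Z, alpha):
    """Code the choices XY, YZ, and ZX as 1, 2, or 3 for the gamble with higher EU."""
    ux, uy, uz = (expected_utility(g, alpha) for g in (X, Y, Z))
    first = "1" if ux > uy else "2"
    second = "2" if uy > uz else "3"
    third = "3" if uz > ux else "1"
    return first + second + third

X = (15, 15, 3)
Y = (10, 10, 10)
Z = (27, 5, 5)

patterns_by_alpha = {alpha: eu_pattern(X, Y, Z, alpha) for alpha in (0.3, 0.45, 0.6)}
print(patterns_by_alpha)
```

All three resulting patterns are transitive, as must be the case for any EU model.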
Birnbaum’s (2008) TAX model with its “prior” parameters (Birnbaum & Bailey, 1998) implies the pattern 221, but like EU, which is a special case of TAX, it can also imply other patterns. But the TAX model and EU are transitive, so they cannot imply true preference patterns of 123 or 231.
The additive difference model (ADM), with power functions (Birnbaum & Diecidue, 2015, Equations 10 and 13), for gambles $X = (x_1, p_1; x_2, p_2; x_3, p_3)$ and $Y = (y_1, q_1; y_2, q_2; y_3, q_3)$, can be written:

$$
\psi(X, Y) = \sum_{i} \sum_{j} p_i q_j \, \sigma(x_i, y_j) \, \left| x_i^{\alpha} - y_j^{\alpha} \right|^{\beta}
$$

where $\psi(X, Y)$ is the decision function such that if it is positive, prefer X; if it is negative, prefer Y; $\alpha$ and $\beta$ are parameters. $\sigma(x_i, y_j)$ is the sign function (−1, 0, 1) that retains the sign of $x_i - y_j$. The summation in the dependent case includes only corresponding event-consequence branches (only where $i = j$), and the product of probabilities is replaced by the branch probabilities. In the independent case, one must sum over all possible contrasts and weight them by the appropriate product of the branch probabilities. This model is fairly general (Birnbaum & Diecidue, 2015) and can be used to represent regret theory (Loomes & Sugden, 1982) as well as advantage-seeking models, like most probable winner.
The ADM model, assuming independence, can imply the intransitive pattern, 123, as well as the patterns 133, 233, and 223. To test between the independent and dependent interpretations of these regret-type models experimentally, one could manipulate permutations of the consequences over events: the dependent model implies intransitive cycles should be reversible, an implication called “recycling” by Birnbaum and Diecidue (2015), whereas the independent model implies no effects of permuting the consequences over equally likely events or positions. If I understand their method, Butler and Pogrebna (2018) always presented the consequences in the same positions, so no tests of permutation or recycling were provided in their study.
As shown in Figure 1, the ADM model for dependent gambles can handle six preference patterns (123, 133, 233, 231, 221, and 223). The intransitive pattern of 123 occurs, for example, when α = 0.4, β = 0.7, and the opposite intransitive cycle, 231, is implied by parameters with a “regret” interpretation (Loomes & Sugden, 1982); that is, when β > 1; e.g., α = 0.4, β = 1.3. This model can also handle the transitive 221 pattern that is the most probable preference pattern overall in the study (Table 2). This model is quite flexible, allowing so many possible patterns; it rules out only the 121 and 131 response patterns. The strongest evidence against it in these data appears in Triple #9, where 8 people showed the 121 pattern on both repetitions, and 17% are estimated (Table 2) to have this true preference pattern.
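The two intransitive cases mentioned above can be illustrated numerically for Triple #4. This sketch rests on two assumptions that should be kept in mind: that the decision function sums, over corresponding branches, the branch probability times the sign of the difference times $|x_i^{\alpha} - y_i^{\alpha}|^{\beta}$, and that the branches are aligned in the order in which the consequences are listed. The function names are hypothetical.

```python
def sign(d):
    """Return -1, 0, or 1 according to the sign of d."""
    return (d > 0) - (d < 0)

def adm_dependent(A, B, alpha, beta):
    """Decision function for dependent gambles: sum over corresponding branches only.
    ASSUMED form: branch probability * sign(a - b) * |a**alpha - b**alpha|**beta."""
    p = 1.0 / len(A)  # equally likely branches
    return sum(p * sign(a - b) * abs(a ** alpha - b ** alpha) ** beta
               for a, b in zip(A, B))

def adm_pattern(X, Y, Z, alpha, beta):
    """Code the choices XY, YZ, and ZX; a positive value favors the first gamble."""
    first = "1" if adm_dependent(X, Y, alpha, beta) > 0 else "2"
    second = "2" if adm_dependent(Y, Z, alpha, beta) > 0 else "3"
    third = "3" if adm_dependent(Z, X, alpha, beta) > 0 else "1"
    return first + second + third

X = (15, 15, 3)
Y = (10, 10, 10)
Z = (27, 5, 5)

pattern_advantage = adm_pattern(X, Y, Z, alpha=0.4, beta=0.7)
pattern_regret = adm_pattern(X, Y, Z, alpha=0.4, beta=1.3)
print(pattern_advantage, pattern_regret)
```

With β < 1, many small advantages outweigh one large disadvantage, which yields the MPW-like cycle; with β > 1, the large differences dominate, which yields the opposite, regret-like cycle.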
The additive difference model implies the property of restricted branch independence, which was significantly violated in Birnbaum and Diecidue (2015) and other studies, but was not tested in Butler and Pogrebna (2018). Thus, a skeptic might not be impressed by the fact that this model, which can handle so many different preference orders, remains compatible with these data.
Many other theories could be (or have been) devised to make predictions for this study, besides those listed in Table 4. Therefore, I prefer to say that evidence of any subset of preference patterns is “consistent with” rather than “supports” a theory. Some theories are more flexible (allow more possible preference patterns) than others. Although some people like to compare models by statistical computations of fit adjusted for a model’s complexity in a single study, I do not favor that approach. I prefer to compare theories by their success in predicting the results of new tests of diagnostic properties, where the rival theories make qualitatively different predictions. Further comments on testing and comparison of models like these are in Birnbaum (2019).