1. Introduction
At the core of Bayesian epistemology lies a small number of fundamental credal principles. Probabilism tells you how your credences in logically related propositions should relate to one another. Conditionalization tells you how to update your credences in response to a specific sort of new evidence, namely, evidence that makes you certain of a proposition. The Principal Principle tells you how your credence about the objective chance of an event should relate to your credence in that event. And, for objective Bayesians, there are further principles that tell you how to set your prior credences, that is, those you have before you incorporate your data.
The accuracy-first programme in epistemology seeks new foundations for these central credal principles (Joyce Reference Joyce1998; Greaves and Wallace Reference Greaves and David2006; Pettigrew Reference Pettigrew2016a). The idea is straightforward. We adopt the orthodox Bayesian assumption that your uncertain doxastic states can be represented by assigning precise numerical credences to each proposition you consider. By convention, your credence in a proposition is represented by a single real number at least 0 and at most 1, which we take to measure how strongly you believe that proposition. We then represent the whole doxastic state by a credence function, which takes each proposition about which you have a credal opinion and assigns to it your credence in that proposition. So far, that’s just the representational claim of orthodox Bayesianism. Now for the distinctive claim of accuracy-first epistemology. It is a claim about what makes a credal state, represented by a credence function, better or worse from the epistemic point of view. That is, it says what determines the epistemic or cognitive value of a credal state. It says: A credal state is better the more accurate it is; it is worse the more inaccurate it is. And we might intuitively think of the inaccuracy of a credence function as how far it lies from the ideal credence function, which is the one that assigns maximal credence to each truth and minimal credence to each falsehood. In accuracy-first epistemology, we formulate mathematically precise ways of measuring this epistemic good. We then ask what principles a credence function should have if it is to serve the goal of attaining that good; or, perhaps better, what properties should it not have if it is to avoid being suboptimal from the point of view of that good. And indeed we can find arguments of exactly this sort for the various fundamental credal principles of Bayesianism that we listed above.
Each of these arguments has the same form. Its first premise specifies properties that an inaccuracy measure must have.Footnote 1 Its second premise provides a bridge principle connecting inaccuracy with rationality. And its third and final premise is a mathematical theorem that shows that, if we apply the bridge principle from the second premise using an inaccuracy measure that has the properties demanded by the first premise, it follows that any credence function that violates the credal principle we seek to establish is irrational. We thus conclude that principle.
So, for instance, Jim Joyce lays down a series of properties that legitimate measures of inaccuracy must have: Structure, extensionality, normality, dominance, weak convexity, and symmetry (Joyce Reference Joyce1998). He then formulates his bridge principle that connects inaccuracy and rationality: It says that a credence function is irrational if it is accuracy dominated; that is, if there is an alternative that is guaranteed to be more accurate. Then he proves a mathematical theorem to show that any credence function that is not probabilistic is accuracy dominated. And he concludes Probabilism.
Similarly, Hilary Greaves and David Wallace lay down a single property they take to be necessary for a measure of inaccuracy (Greaves and Wallace Reference Greaves and David2006): It is strict propriety, and it will play a central role in what follows. Then they say that your updating plan is irrational if there is an alternative that your prior credence function expects to be more accurate. And finally, they prove that updating rules that proceed by conditioning your prior on your evidence, and only such rules, minimize expected inaccuracy from the point of view of your prior. And they conclude Conditionalization.
In contrast, R. A. Briggs and I demanded not only strict propriety, but also continuity and additivity. While Greaves and Wallace’s bridge principle talks of the irrationality of your updating rule from the point of view of your prior, the principle that Briggs and I used talks of the irrationality of the combination of your prior and updating rule. It says that a prior and updating rule taken together are irrational if there is an alternative prior and an alternative updating rule that, when taken together, are more accurate than your prior and your updating rule taken together. And we show that any combination in which the updating rule does not proceed by applying Bayes’ rule to the prior is rendered irrational by this bridge principle (Briggs and Pettigrew Reference Briggs and Richard2020).
My argument for the Principal Principle demanded the same three properties: Additivity, continuity, and strict propriety. The bridge principle: A credence function is irrational if there is an alternative that every possible objective chance function expects to be better. And the mathematical theorem says that there is such an alternative for every credence function that violates the Principal Principle (Pettigrew Reference Pettigrew2013).
Finally, my argument for the Principle of Indifference assumed two different properties: Egalitarianism and rendering indifference immodest. The bridge principle said that a credence function is irrational if there is an alternative whose worst-case inaccuracy is lower than the worst-case inaccuracy of the original one. And I showed that the only credence function that is not irrational by these lights is the uniform distribution, which is exactly what the Principle of Indifference demands (Pettigrew Reference Pettigrew2016b).
Now, an argument is only as strong as its premises are plausible. In this paper, I’d like to consider the first premise in each of these arguments. In these first premises, we lay down what we will demand of an inaccuracy measure. My aim is to take the current best version of this premise and improve it by making it less demanding. There are really eight sets of conditions offered in the literature:
-
(I) In his 1998 argument for Probabilism, Joyce imposes six conditions on measures of inaccuracy: Structure, extensionality, normality, dominance, weak convexity, and symmetry (Joyce Reference Joyce1998).
-
(II) In his 2009 argument for a restricted version of Probabilism, he imposes just two: Truth-directedness and coherent admissibility (Joyce Reference Joyce, Huber and Schmidt-Petri2009).
-
(III) In their 2006 argument for Conditionalization, Greaves and Wallace impose just one: Strict propriety (Greaves and Wallace Reference Greaves and David2006).
-
(IV) In their 2009 argument for Probabilism, Predd et al. impose three: Continuity, additivity, and strict propriety. These are also the three conditions imposed in my argument for the Principal Principle and the argument for Conditionalization that Briggs and I offer (Predd et al. Reference Predd, Robert, Lieb, Osherson, Vincent Poor and Kulkarni2009; Pettigrew Reference Pettigrew2013; Briggs and Pettigrew Reference Briggs and Richard2020).
-
(V) In our 2010 arguments for Probabilism and Conditionalization, Leitgeb and I considered three different sets of conditions: For our purposes, the important condition is global normality and dominance, which entails additivity, the condition we seek to excise here (Leitgeb and Pettigrew Reference Pettigrew2010).
-
(VI) In my 2014 argument for the Principle of Indifference, I imposed two conditions: The inaccuracy measures must be egalitarian and they must render indifference immodest (Pettigrew Reference Pettigrew2016b).
-
(VII) In their 2016 argument, D’Agostino and Sinigaglia impose five: One-dimensional value sensitivity, subvector consistency, monotonic order sensitivity, permutation invariance, and replication invariance (D’Agostino and Dardanoni Reference D’Agostino and Valentino2009; D’Agostino and Sinigaglia Reference D’Agostino, Corrado, Suárez, Dorato and Rédei2010).
-
(VIII) In his 1982 paper, Lindley argued not for probabilism itself, but for a weaker principle. He did not assume, as we have, that credences are measured on a scale from 0 to 1, nor that 0 is the minimum and 1 is the maximum. Instead, he made few assumptions about the scale on which credences are measured; he imposed some reasonably weak conditions on measures of the inaccuracy of credences and then showed that those credal assignments that are not accuracy-dominated are precisely those that can be transformed into probabilistic credence functions using a particular transformation. However, while Lindley’s conditions are weaker than some of the others listed here, they nonetheless include additivity (Lindley Reference Lindley1982).
There is a lot to be said about the relationships between these different sets of necessary conditions for inaccuracy measures, but that’s not my purpose here. Here, I want to take what I think are the best accuracy-first arguments for Probabilism, Conditionalization, and the Principal Principle and improve them by weakening the demands they make of inaccuracy measures in their first premises. That is, I want to show that those arguments go through for a wider range of inaccuracy measures than we’ve previously allowed. As I will explain below, I take those best arguments to be the ones based on Predd et al.’s set of conditions: Strict propriety, continuity, and additivity. I will strengthen those arguments by showing that they go through if we impose only strict propriety and continuity. We do not need to impose the condition of additivity, which says roughly that the inaccuracy of a whole credence function should be the sum of the inaccuracies of the credences that it assigns. That is, we can strengthen those arguments by weakening their first premise.
Why should this interest us? After all, Joyce, as well as D’Agostino and Sinigaglia, have offered arguments for probabilism that don’t assume additivity. True, but Patrick Maher (Reference Maher2002) has raised serious worries about Joyce’s 1999 characterization, and I have built on those (Pettigrew Reference Pettigrew2016a, section 3.1); and Joyce’s Reference Joyce, Huber and Schmidt-Petri2009 characterization applies only to credence functions defined over a partition and not those defined on a full algebra, so while its premises are weak, so is its conclusion. D’Agostino and Sinigaglia do not assume additivity, but their characterization does entail it, and the subvector consistency requirement they impose is implausibly strong for the same reason that additivity is implausibly strong. And, similarly, the global normality and dominance condition that Leitgeb and I assumed entails additivity and so is implausibly strong for the same reason. And, as noted above, Lindley explicitly assumes additivity. This suggests that the best existing accuracy-first argument for probabilism is the one based on Predd et al.’s results, which assumes additivity, strict propriety, and continuity. So there is reason to show that probabilism follows from strict propriety and continuity alone.
What about the Principal Principle? Well, the only existing accuracy-first argument for that is my 2013 argument, and that assumed additivity, strict propriety, and continuity. So, again, there is reason to show that the principle follows from strict propriety and continuity alone. What about Conditionalization? Here, Greaves and Wallace have offered an argument based on strict propriety alone—it does not assume additivity, nor even continuity. True, but their result applies to a very specific case, namely, one in which (i) you know ahead of time the partition from which your evidence will come, (ii) you know that you will learn a proposition iff it is true, and (iii) you form a plan for how you will respond should you learn a particular proposition from that partition. In contrast, the result that Briggs and I offered can be generalized to cover many more cases than just this. As we will see, it can be generalized to establish what I will call the Weak Reflection Principle, which entails the restricted version of Conditionalization that Greaves and Wallace consider. So, there is reason to excise additivity from the assumptions that Briggs and I made. However, as we will see, our argument assumes additivity in two guises: First, it demands that we measure the inaccuracy of an individual credence function using an inaccuracy measure that satisfies additivity; second, it assumes we obtain the inaccuracy of a combination of prior and updating rule by adding the inaccuracy of the prior to the inaccuracy of the updating rule. We will see how to remove the first assumption of additivity, but not the second. That must await future work.
2. Predd et al.’s conditions
In this section, I describe Predd et al.’s set of conditions—the ones we numbered (IV) in our list above. This will furnish us with statements of strict propriety and continuity, the assumptions we’ll use in our new arguments for Probabilism, the Principal Principle, and Conditionalization; and it will also introduce us to additivity, the assumption that we’re dropping from the existing best arguments for these conclusions. We will explain the problems with additivity in section 3 below.
First, let’s lay out the framework in which we’re working:
-
We write ${\cal{W}}$ for the set of possible worlds. We assume ${\cal{W}}$ is finite. Footnote 2 So ${\cal{W}} = \{w_1, \ldots, w_n\}$ .
-
We write ${\cal{F}}$ for the full algebra of propositions built over ${\cal{W}}$ . That is, ${\cal{F}}$ is the set of all subsets of ${\cal{W}}$ .
-
We write ${\cal{C}}$ for the set of credence functions defined on ${\cal{F}}$ . That is, ${\cal{C}}$ is the set of functions $c: {\cal{F}} \rightarrow [0, 1]$ .
-
We write ${\cal{P}}$ for the set of probabilistic credence functions defined on ${\cal{F}}$ . That is, p is in ${\cal{P}}$ iff p is in ${\cal{C}}$ and (i) $p(\top) = 1$ , and (ii) $p(A \vee B) = p(A) + p(B)$ when A and B are mutually exclusive, that is, when there is no possible world at which A and B are both true.
-
Given a credence function c, we write $c_i$ for the credence that c assigns to world $w_i$ .
-
We write $w^i$ for the ideal credence function on ${\cal{F}}$ at world $w_i$ . That is, for X in ${\cal{F}}$ ,
$$w^i(X) = \begin{cases} 1 & \text{if } X \text{ is true at } w_i \\ 0 & \text{if } X \text{ is false at } w_i \end{cases}$$ So, in particular, $w^i(w_j) = w^i_j = 1$ if $i = j$ and 0 if $i \neq j$ . -
An inaccuracy measure is a function ${\mathfrak{I}}: {\cal{C}} \times {\cal{W}} \rightarrow [0, \infty]$ . For c in ${\cal{C}}$ and $w_i$ in ${\cal{W}}$ , ${\mathfrak{I}}(c, i)$ is the inaccuracy of c at world $w_i$ .
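To fix ideas, here is a minimal computational sketch of this framework, assuming three worlds and using the Brier score (defined formally below) as a stand-in inaccuracy measure; the names and the example credence function are illustrative only, not part of the paper's formal apparatus.

```python
# A minimal sketch of the framework: worlds, the full algebra F, credence
# functions, ideal credence functions, and one illustrative inaccuracy measure.
from itertools import chain, combinations

worlds = ["w1", "w2", "w3"]                      # a finite set of possible worlds

def powerset(ws):
    """The full algebra F: every subset of the set of worlds."""
    return [frozenset(s) for s in chain.from_iterable(
        combinations(ws, r) for r in range(len(ws) + 1))]

F = powerset(worlds)

def ideal_credence(w):
    """The ideal credence function w^i: 1 on truths at w, 0 on falsehoods."""
    return {X: 1.0 if w in X else 0.0 for X in F}

def brier(c, w):
    """Inaccuracy of credence function c (a dict from F to [0,1]) at world w."""
    v = ideal_credence(w)
    return sum((v[X] - c[X]) ** 2 for X in F)

# An arbitrary credence function, for illustration:
c = {X: 0.5 for X in F}
print({w: round(brier(c, w), 3) for w in worlds})
```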
Here are the three properties that Predd et al. demand of inaccuracy measures.
Continuity For each world $w_i$ , ${\mathfrak{I}}(c, i)$ is a continuous function of c.
Additivity For each X in ${\cal{F}}$ , there is a scoring rule ${\mathfrak{s}}_X: \{0, 1\} \times [0, 1] \rightarrow [0, \infty]$ such that, for all c in ${\cal{C}}$ and $w_i$ in ${\cal{W}}$ ,
$${\mathfrak{I}}(c, i) = \sum_{X \in {\cal{F}}} {\mathfrak{s}}_X(w^i(X), c(X))$$
We say that the scoring rules ${\mathfrak{s}}_X$ for each X in ${\cal{F}}$ generate ${\mathfrak{I}}$ .
Additivity says that the inaccuracy of a credence function is the sum of the inaccuracies of the individual credences it assigns.
Strict Propriety For all p in ${\cal{P}}$ , $\sum^n_{i=1} p_i {\mathfrak{I}}(c, i)$ is minimized uniquely, as a function of c, at $c = p$ .
That is, for all p in ${\cal{P}}$ and $c \neq p$ in ${\cal{C}}$ ,
$$\sum^n_{i=1} p_i {\mathfrak{I}}(p, i) \lt \sum^n_{i=1} p_i {\mathfrak{I}}(c, i)$$
Strict propriety says that each probabilistic credence function should expect itself to be most accurate. Footnote 3
A few examples of inaccuracy measures:
-
Brier score ${\mathfrak{B}}(c, i) = \sum_{X \in {\cal{F}}} {(w^i(X) - c(X))}^2$
-
Log score ${\mathfrak{L}}(c, i) = -\log c_i$
-
Enhanced log score ${\mathfrak{L}}^\star(c, i) = \sum_{X \in {\cal{F}}} \left(-w^i(X)\log c(X) + c(X) \right)$
-
Absolute value score ${\mathfrak{A}}(c, i) = \sum_{X \in {\cal{F}}} |w^i(X) - c(X)|$
-
Logsumexp score
${\mathfrak{LSE}}(c, i) = -\log \left(1+\sum\limits_{X \in {\cal{F}}} e^{c(X)}\right) + \frac{\sum_{X \in {\cal{F}}}(w^i(X) - c(X))e^{c(X)}}{1+ \sum_{X \in {\cal{F}}} e^{c(X)}}$
Then:
| | Continuity | Additivity | Strict Propriety |
|---|---|---|---|
| ${\mathfrak{B}}$ | $\checkmark$ | $\checkmark$ | $\checkmark$ |
| ${\mathfrak{L}}$ | $\checkmark$ | $\times$ | $\times$ |
| ${\mathfrak{L}}^\star$ | $\checkmark$ | $\checkmark$ | $\checkmark$ |
| ${\mathfrak{A}}$ | $\checkmark$ | $\checkmark$ | $\times$ |
| ${\mathfrak{LSE}}$ | $\checkmark$ | $\times$ | $\checkmark$ |
Some notes:
-
The Brier score is additive. It is generated by using the quadratic scoring rule ${\mathfrak{q}}$ for every proposition, where
-
- ${\mathfrak{q}}(1,x) = {(1-x)}^2$ ;
-
- ${\mathfrak{q}}(0, x) = x^2$ .
Since ${\mathfrak{q}}$ is continuous and strictly proper, so is ${\mathfrak{B}}$ .
-
-
The log score is not additive, and it is not strictly proper. A credence function that assigns credence 1 to each world dominates any credence function that assigns less than credence 1 to each world. The log score is, however, strictly ${\cal{P}}$ -proper: That is, for all p in ${\cal{P}}$ , $\sum_i p_i{\mathfrak{L}}(q, i)$ is minimized uniquely, among credence functions q in ${\cal{P}}$ , at $q = p$ .
-
The enhanced log score is additive. It is generated using the enhanced logarithmic scoring rule ${\mathfrak{l}}^\star$ for every proposition, where
-
- ${\mathfrak{l}}^\star(1, x) = -\log x + x$ ;
-
- ${\mathfrak{l}}^\star(0, x) = x$ .
Since ${\mathfrak{l}}^\star$ is continuous and strictly proper, so is ${\mathfrak{L}}^\star$ .
-
-
The absolute value score is additive. It is generated by using the absolute scoring rule ${\mathfrak{a}}$ for every proposition, where
-
- ${\mathfrak{a}}(1, x) = 1-x$ ;
-
- ${\mathfrak{a}}(0, x) = x$ .
But ${\mathfrak{a}}$ is not strictly proper. If $$p \lt {1 \over 2}$$ , then $p{\mathfrak{a}}(1, x) + (1-p){\mathfrak{a}}(0, x) = p(1-x) + (1-p)x$ is minimized at $x = 0$ ; if $$p \gt {1 \over 2}$$ , it is minimized at $x = 1$ ; if $$p = {1 \over 2}$$ , it is minimized at any $0 \leq x \leq 1$ . So ${\mathfrak{A}}$ is not strictly proper either (see the numerical check after this list).
-
-
The logsumexp score is strictly proper and continuous, but it is not additive. This shows that assuming strict propriety and continuity, as we do below, is strictly weaker than assuming strict propriety, additivity, and continuity. Lewis and Fallis (Reference Lewis and Fallis2019) give another example of an inaccuracy measure that satisfies strict propriety and continuity, but not additivity: They call it the asymmetric spherical rule.
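The propriety claims in the notes above can be spot-checked numerically. The following sketch, which assumes only the quadratic and absolute scoring rules just defined, searches a grid of credences for the minimizer of expected score in a single proposition with probability p.

```python
# Numerical check: the quadratic score is strictly proper, the absolute score is not.
import numpy as np

xs = np.linspace(0, 1, 1001)

def expected_quadratic(p):
    return p * (1 - xs) ** 2 + (1 - p) * xs ** 2   # expected q-score of credence x

def expected_absolute(p):
    return p * (1 - xs) + (1 - p) * xs             # expected a-score of credence x

for p in [0.2, 0.5, 0.7]:
    x_q = xs[np.argmin(expected_quadratic(p))]
    x_a = xs[np.argmin(expected_absolute(p))]
    print(f"p = {p}: quadratic minimized at x = {x_q:.2f}, absolute at x = {x_a:.2f}")

# The quadratic score is minimized at x = p, as strict propriety requires, while
# the absolute score is minimized at 0 or 1 (and is flat when p = 0.5).
```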
My concern in this paper is to strengthen the accuracy-first arguments for Probabilism, the Principal Principle, and Conditionalization based on continuity + additivity + strict propriety by showing that they go through if we assume only continuity + strict propriety. Thus, for instance, they go through for the logsumexp score defined above and Lewis and Fallis’ asymmetric spherical rule as well as the Brier and enhanced log scores. Of course, weakening the premises of an argument always strengthens it. So there seems good reason to note this fact regardless of your view of additivity. Nonetheless, in the next section, I explain why you might be suspicious of additivity. Then, in section 4, I give the arguments for Probabilism, the Principal Principle, and Conditionalization without appealing to it. In section 5, I conclude. The appendix contains the proofs of all the theorems on which these arguments are based.
3. Why additivity?
I should begin by pointing out that, while I appealed to Predd et al.’s mathematical results in my presentation of the accuracy dominance argument for probabilism in Accuracy and the Laws of Credence, those authors weren’t themselves working in the accuracy-first framework (Pettigrew Reference Pettigrew2016a). What we are calling inaccuracy measures are for them loss functions; and in the context of loss functions, the additivity assumption is perfectly natural—providing the loss is stated in units of some commodity in which your utility is linear, the total loss to which your credence function is subject is the sum of the individual losses to which your credences are subject. But what about the accuracy-first framework? Is additivity still so natural there? Footnote 4 Some claim it is not: Kotzen (Reference Kotzen2018, 778) claims that, by assuming additivity, we rule out certain plausible ways of measuring inaccuracy “more or less by fiat”; and, as I mentioned above, Lewis and Fallis (Reference Lewis and Fallis2019) describe a continuous and strictly proper inaccuracy measure that seems reasonable, but note that it is not additive.
On the other hand, I wrote:
[S]umming the inaccuracy of individual credences to give the total inaccuracy of a credence function is the natural thing to do (Pettigrew Reference Pettigrew2016a, 49).
My reason? Your credence function is not a single, unified doxastic state, but rather merely the motley agglomeration of all of your individual credal doxastic states. We might mistakenly think of a credence function as unified because we represent it by a single mathematical object, but mathematical functions are anyway just collections of assignments of values to arguments. Or so I argued.
However, while this may be so, it does not entail that there are not features of a credal state that partly determine its accuracy but which cannot be captured by looking only at the credences individually—we might call these global features of the credence function. For instance, we might think that, at a world at which two propositions A and B have the same truth value, it is more accurate to have equal credence in A and in B than to have different credences in them. After all, the ideal credence function will have the same credence in them, namely, 1 in both or 0 in both. And perhaps resembling the ideal credence function in this respect gets you closer to it and therefore more accurate. But having the same credence in A and in B is a global feature of a credence function. To determine whether or not a credence function has it, you can’t just look at the credences it assigns individually.
However, interestingly, while this is indeed a global feature of a credence function, some additive inaccuracy measures will in fact reward it. Take the Brier score, for instance. If A and B are both true, then the inaccuracy of assigning credences a and b to them, respectively, is ${(1-a)}^2 + {(1-b)}^2$ . But it’s easy to see that, if $a \neq b$ , then assigning their average, $${1 \over 2}a + {1 \over 2}b$$ , to both is more accurate. Since ${(1-x)}^2$ is a strictly convex function of x,
$$\left(1 - {{a+b} \over 2}\right)^2 + \left(1 - {{a+b} \over 2}\right)^2 \lt {(1-a)}^2 + {(1-b)}^2$$
Similarly, if A and B are both false,
$$\left({{a+b} \over 2}\right)^2 + \left({{a+b} \over 2}\right)^2 \lt a^2 + b^2$$
So, if we measure inaccuracy using the Brier score, which is additive, having the global property in question—that is, assigning equal credences to A and B when A and B are either both true or both false—improves accuracy.
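Here is the arithmetic for one illustrative pair of credences, assuming the Brier penalty ${(1-x)}^2$ for credence x in a truth; the particular numbers are mine, chosen only to make the convexity point vivid.

```python
# If A and B are both true, moving two unequal credences to their average
# lowers the summed Brier penalty, by strict convexity of (1 - x)^2.
a, b = 0.9, 0.3
avg = (a + b) / 2

unequal = (1 - a) ** 2 + (1 - b) ** 2   # 0.01 + 0.49 = 0.50
equal   = 2 * (1 - avg) ** 2            # 2 * 0.16  = 0.32

print(unequal, equal)   # the equal credences are more accurate
```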
In Accuracy and the Laws of Credence, I argued that we should simply assume that, for any global feature of a credence function that we consider relevant to accuracy, it must be possible to capture it using additive measures along the lines just sketched.
Just as some hold that risk aversion phenomena in practical decision theory are best understood as the result of doing something other than maximizing expected utility—minimizing regret, for instance, or maximizing the quantity favoured by one of the many nonexpected utility theories—and not as having a concave utility function, so any sensitivity to global features of credence functions ought to be understood either as following from their local features or as following from the adoption of an alternative decision principle and not as having a nonadditive inaccuracy measure. (Pettigrew Reference Pettigrew2016a, 51)
But why? Why think that this is true? I imagined that this follows from the fact that credence functions are not unified entities. I assumed that this warrants a defeasible assumption in favour of additivity. That assumption could be defeated if we were to find a global feature we wished to reward but which could not be rewarded by additive measures. But no such feature has presented itself.
Perhaps. But even if we grant the move from the disunified nature of credence functions to this defeasible assumption in favour of additivity, such a defeasible assumption is a flimsy basis for an argument, particularly since we have not systematically investigated the sorts of global features we might consider relevant to accuracy and so have rather sparse evidence that there is no defeater to be found. As we showed above, we can explain why one particular global feature is conducive to accuracy, namely, having the same credence in two propositions that have the same truth value. And indeed you can view the accuracy dominance argument for probabilism as furnishing us with another example. After all, probabilism makes two demands of a credence function. The first is local: The credence function should assign maximal credence to a tautology and minimal credence to a contradiction. The second is global: The credence it assigns to a disjunction of mutually exclusive propositions should be the sum of the credences it assigns to the disjuncts. To tell whether a credence function satisfies this latter condition, you must look at the relationships between the credences it assigns. However, the fact that you can run the accuracy dominance argument for probabilism using additive inaccuracy measures like the Brier score shows that you can show that this global feature of a credence function is conducive to accuracy without building into your inaccuracy measure that it should be rewarded explicitly. Indeed, that is one of the remarkable features of de Finetti’s original proof, which is the ultimate source of Predd et al.’s theorem. But that’s about it. Those are the only two global features of credence functions we’ve succeeded in capturing using additive inaccuracy measures.
So, in the end, to the extent that there still exists doubt that we have considered all global features of credence functions that are relevant to their accuracy and showed how their relevance can be captured by additive inaccuracy measures, there still exists doubt over additivity. And while there still exists doubt over additivity, removing that assumption from the arguments for Probabilism, the Principal Principle, and Conditionalization strengthens them.
4. Arguments without additivity
Let me begin this section by spelling out the arguments we wish to give. Then I’ll move on to explaining and proving the theorems to which they appeal. As I noted above, each argument has the same form:
(NC) Necessary conditions on being a legitimate inaccuracy measure.
(BP) Accuracy–rationality bridge principle.
(MT) Mathematical theorem.
Therefore,
(PR) Principle of rationality.
The first component will be the same in each argument that we give. We will assume only that inaccuracy measures satisfy continuity and strict propriety. But the accuracy–rationality principles will differ from case to case. In the remainder of this section, I’ll state each principle for which we’re arguing and then the bridge principle to which we’ll appeal in that argument.
To state some of the principles of credal rationality, we need some moderately technical notions. I’ll lay them out here. Suppose ${\cal{X}}$ is a set of credence functions. Then:
-
${\cal{X}}$ is convex if it is closed under taking mixtures.
That is, for any c, c’ in ${\cal{X}}$ and any $0 \leq \lambda \leq 1$ , $\lambda c + (1-\lambda) c'$ is also in ${\cal{X}}$ .
-
${\cal{X}}$ is closed if it is closed under taking limits.
That is, for any infinite sequence $c_1, c_2, \ldots$ of credence functions in ${\cal{X}}$ that tends to c in the limit, c is in ${\cal{X}}$ . Footnote 5
-
The convex hull of ${\cal{X}}$ is the smallest convex set that contains ${\cal{X}}$ . We write it ${\cal{X}}^+$ .
That is, if ${\cal{Z}}$ is convex and ${\cal{X}} \subseteq {\cal{Z}}$ , then ${\cal{X}}^+ \subseteq {\cal{Z}}$ .
-
The closed convex hull of ${\cal{X}}$ is the smallest closed set that contains the convex hull of ${\cal{X}}$ . We write it ${\textrm{cl}}({\cal{X}}^+)$ .
Thus, if ${\cal{Z}}$ is closed and ${\cal{X}}^+ \subseteq {\cal{Z}}$ , then ${\textrm{cl}}({\cal{X}}^+) \subseteq {\cal{Z}}$ .
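Since several of the principles below are stated in terms of convex hulls, it may help to see how membership in the convex hull of a finite set of credence functions can be checked concretely. The following sketch is one way to do it, assuming credence functions represented as vectors of world-credences and using a small linear program; the particular functions are illustrative only.

```python
# Check whether cred is a mixture of the credence functions in X by solving
# a feasibility linear program: find lambda >= 0 with sum(lambda) = 1 and
# sum_j lambda_j X_j = cred.
import numpy as np
from scipy.optimize import linprog

X = [np.array([0.7, 0.2, 0.1]),      # some credence functions over three worlds
     np.array([0.1, 0.6, 0.3]),
     np.array([0.2, 0.2, 0.6])]
cred = np.array([0.3, 0.4, 0.3])

A_eq = np.vstack([np.column_stack(X),        # sum_j lambda_j X_j = cred
                  np.ones((1, len(X)))])     # sum_j lambda_j = 1
b_eq = np.concatenate([cred, [1.0]])

res = linprog(c=np.zeros(len(X)), A_eq=A_eq, b_eq=b_eq,
              bounds=[(0, None)] * len(X))
print("in convex hull:", res.success, "weights:", res.x if res.success else None)
```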
As well as clarifying the technical notions just presented, I also want to flag that these arguments appeal to a notion of possibility at various points: In setting up the framework, we have already talked of possible worlds and algebras built on those worlds; in accuracy dominance arguments for Probabilism, we will quantify over possible worlds; in the accuracy dominance argument for Conditionalization we quantify over possible worlds, but also over possible future credence functions that you currently endorse conditional on you receiving certain evidence; in expected inaccuracy arguments like Greaves and Wallace’s argument for Conditionalization, we will sum over inaccuracies at possible worlds weighted by credences in those worlds to give expectations; and in chance dominance arguments for the Principal Principle, we’ll quantify over possible objective chance functions. What notion of possibility is at play here? Joyce (Reference Joyce1998) takes it to be logical possibility; in a recent paper (Reference Pettigrew2020), I take it to be something like epistemic possibility. It is beyond the scope of this essay to argue for one or the other in detail, so I will leave the notion as a placeholder. But my own preference is for epistemic possibility. What makes it irrational to violate Probabilism, for instance, is that there is another credence function that is more accurate than yours at all epistemically possible worlds—that is, you can tell from the inside, so to speak, that this alternative is better than yours because the only worlds you need to consider are those you consider possible.
4.1 Argument for Probabilism
Here’s probabilism:
Probabilism Your credence function at each time during your epistemic life should be probabilistic.
That is, if c is your credence function at a given time, then we should have:
-
(i) $c(\top) = 1$ ;
-
(ii) $c(A \vee B) = c(A) + c(B)$ for mutually exclusive A and B.
And here’s the bridge principle: A credence function is irrational if there’s another that is guaranteed to be more accurate; that is, if it is accuracy dominated. That is:
Worldly dominance. If there is $c^\star$ such that, for all possible worlds $w_i$ , ${\mathfrak{I}}(c^\star, i) \lt {\mathfrak{I}}(c, i)$ , then c is irrational.
Thus, to move from the claim that an inaccuracy measure must satisfy continuity and strict propriety, together with worldly dominance, to probabilism, we need the following theorem:
Theorem 1 Suppose ${\mathfrak{I}}$ is continuous and strictly proper. If c is not in ${\cal{P}}$ , there is $c^\star$ in ${\cal{P}}$ , such that ${\mathfrak{I}}(c^\star, i) \lt {\mathfrak{I}}(c, i)$ for all $w_i$ in ${\cal{W}}$ .
We prove this in the appendix.
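For a concrete feel for what Theorem 1 delivers, here is a small illustration that assumes the Brier score, for which a dominating probabilistic credence function can be found by Euclidean projection; for an arbitrary continuous strictly proper measure, the construction needed is the one given in the appendix, not this shortcut.

```python
# Brier-dominate a non-probabilistic credence function over the algebra on two worlds.
import numpy as np

# Coordinates: (bottom, {w1}, {w2}, top)
def prob(t):                          # the probabilistic credence functions
    return np.array([0.0, t, 1 - t, 1.0])

def brier(vec, i):                    # Brier inaccuracy at world w_i (i = 1 or 2)
    ideal = prob(1.0) if i == 1 else prob(0.0)
    return float(np.sum((ideal - vec) ** 2))

c = np.array([0.1, 0.4, 0.8, 0.9])    # non-probabilistic: c(top) < 1, c(w1) + c(w2) > 1

ts = np.linspace(0, 1, 10001)
dists = [np.sum((prob(t) - c) ** 2) for t in ts]
c_star = prob(ts[int(np.argmin(dists))])   # closest probabilistic function to c

for i in (1, 2):
    print(f"world {i}: B(c) = {brier(c, i):.3f}  >  B(c*) = {brier(c_star, i):.3f}")
```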
4.2 Argument for the Principal Principle
The principal principle is usually stated as follows, where, for any probability function ch, $C_{ch}$ is the proposition that says that ch is the objective chance function: Footnote 6
Principal principle. If $C_{ch}$ is in ${\cal{F}}$ and $c(C_{ch}) \gt 0$ , then, for all A in ${\cal{F}}$ , we should have
$$c(A | C_{ch}) = ch(A)$$
The version we’ll consider here is more general than this version. And indeed, the more general version entails the usual version.
Generalized Principal Principle. Suppose ${\cal{A}}$ is the set of possible objective chance functions. Then your credence function should be in the closed convex hull of ${\cal{A}}$ .
Thus, for instance, if all the possible objective chance functions consider A and B equally likely, your credence function should consider them equally likely; and if all the possible objective chance functions consider A more likely than B, then you should consider A at least as likely as B; and if all the possible objective chance functions consider A to be between 30% and 70% likely, then you should not assign credence 0.8 to A; and so on.
Suppose ch is a possible objective chance function and $C_{ch}$ is in ${\cal{F}}$ . And suppose further that ch is not a self-undermining chance function; that is, ch is certain that it itself gives the chances, so that $ch(C_{ch}) = 1$ . Then, if you satisfy the Generalized Principal Principle, $c(A | C_{ch}) = ch(A)$ . Footnote 7
So that’s the version of the Principal Principle that we’ll consider. And here’s the bridge principle: A credence function is irrational if there is another that is guaranteed to have greater expected accuracy from the point of view of the objective chance function. That is:
Chance Dominance. If there is $c^\star$ such that, for all possible chance functions ch, $\sum_i ch_i {\mathfrak{I}}(c^\star, i) \lt \sum_i ch_i {\mathfrak{I}}(c, i)$ , then c is irrational.
Thus, to move from the claim that an inaccuracy measure must satisfy continuity and strict propriety, together with Chance Dominance, to the Generalized Principal Principle, we need the following theorem:
Theorem 2 Suppose ${\mathfrak{I}}$ is continuous and strictly proper. If c is not in ${\textrm{cl}}({\cal{A}}^+)$ , there is $c^\star$ in ${\textrm{cl}}({\cal{A}}^+)$ , such that $\sum_i ch_i{\mathfrak{I}}(c^\star, i) \lt \sum_i ch_i{\mathfrak{I}}(c, i)$ , for all ch in ${\cal{A}}$ .
Again, we prove this in the appendix.
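A parallel illustration for Theorem 2, again assuming the Brier score, two worlds, and a toy set of two possible chance functions; the projection step below is a Brier-specific shortcut that stands in for the general construction in the appendix.

```python
# Chance dominance with the Brier score: a credence function outside the closed
# convex hull of the possible chances is chance dominated by its projection onto it.
import numpy as np

def full_vec(x):                      # (bottom, {w1}, {w2}, top) for probability x of w1
    return np.array([0.0, x, 1 - x, 1.0])

chances = [0.3, 0.7]                  # the possible chances of w1
c = full_vec(0.9)                     # probabilistic, but outside the hull [0.3, 0.7]

def expected_brier(ch, vec):
    ideal = [full_vec(1.0), full_vec(0.0)]
    return ch * np.sum((ideal[0] - vec) ** 2) + (1 - ch) * np.sum((ideal[1] - vec) ** 2)

# Project c onto the hull: the nearest point has its chance of w1 clipped into [0.3, 0.7].
c_star = full_vec(float(np.clip(0.9, min(chances), max(chances))))

for ch in chances:
    print(f"ch = {ch}: exp. Brier of c = {expected_brier(ch, c):.3f}, "
          f"of c* = {expected_brier(ch, c_star):.3f}")
```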
What’s more, Theorem 2 allows us to improve not only my argument for the Principal Principle, but also my argument for linear pooling as the correct method of probabilistic judgment aggregation (Pettigrew Reference Pettigrew2019a). In that argument, I appealed to the following bridge principle: If we have a group of individuals with probabilistic credence functions $c_1, \ldots, c_m$ , and we wish to find a credence function that aggregates them, then we should not pick an aggregate c if there is an alternative $c^\star$ such that each member of the group expects $c^\star$ to be more accurate than they expect c to be. I assumed that inaccuracy measures satisfy additivity, continuity, and strict propriety. And I showed that, if you use an aggregation method other than linear pooling, then there will be an alternative aggregate that everyone expects to be better; but if you use linear pooling, there won’t be. Armed with Theorem 2, we obtain the same result assuming only continuity and strict propriety and thereby strengthen my original argument.
4.3 Argument for Conditionalization and the Weak Reflection Principle
The version of conditionalization for which Greaves and Wallace as well as Briggs and I argue is this:
Plan Conditionalization Suppose your prior is $c_0$ and you know that your evidence will come from a partition $E_1, \ldots, E_m$ . And suppose you will learn a particular cell of this partition iff it is true. And suppose you plan to adopt credence function $c_k$ in response to evidence $E_k$ . Then, for all $E_k$ with $c_0(E_k) \gt 0$ and for all X in ${\cal{F}}$ , we should have
$$c_k(X) = c_0(X | E_k)$$
That is, you should plan to update by applying Bayes’ rule to your prior.
As we mentioned above, we won’t argue for Plan Conditionalization directly. Rather, we’ll argue for a more general principle of rationality called the Weak Reflection Principle; and we’ll show that it entails Plan Conditionalization (Pettigrew Reference Pettigrew2020).
Weak Reflection Principle. Suppose ${\cal{R}} = \{c_1, \ldots, c_m\}$ is the set of possible future credence functions you endorse. Then your current credence function should be in the convex hull of ${\cal{R}}$ .
Here’s the idea behind this principle. A lot might happen between now and tomorrow. You might see new sights, think new thoughts; you might forget things you know today, take mind-altering drugs that enhance or impair your thinking; and so on. So perhaps there is a set of credence functions, any one of which you think you might have tomorrow. Some of those you’ll endorse—perhaps those you’d adopt if you saw certain new things, or enhanced your cognition in various ways. And some of them you’ll disavow—perhaps those that you’d adopt if you were to forget certain things, or were to impair your cognition in some way. The Weak Reflection Principle asks you to separate out the wheat from the chaff, and once you’ve identified the ones you endorse, it tells you that your current credence function should lie within the convex hull of those future ones.
Now, suppose that you are in the situation that Plan Conditionalization covers. That is, (i) you know that you will receive evidence from the partition $E_1, \ldots, E_m$ , (ii) you will learn $E_k$ iff $E_k$ is true, and (iii) you form a plan for how to respond to these different possible pieces of evidence—you will adopt $c_1$ if you learn $E_1$ , $c_2$ if you learn $E_2$ , and so on. Thus, the possible future credence functions that you endorse are $c_1, \ldots, c_m$ , for they are the possible outcomes of a plan that you have made and which you know covers all the bases. Then, by the Weak Reflection Principle, $c_0$ should be in the convex hull of $c_1, \ldots, c_m$ . Then, if $c_k(E_k) = 1$ and $c_0(E_k) \gt 0$ , then $c_k(X) = c_0(X |E_k)$ , as Plan Conditionalization requires. After all, by the Weak Reflection Principle, there are $0 \leq \lambda_1, \ldots, \lambda_m \leq 1$ with $\sum^m_{i=1} \lambda_i = 1$ such that $c_0(X) = \sum^m_{i=1} \lambda_i c_i(X)$ for all X in ${\cal{F}}$ . What’s more, by assumption, $c_i(E_k) = 1$ if $i = k$ and $c_i(E_k) =0$ if $i \neq k$ . So, $c_i(XE_k) = c_i(X)$ if $i = k$ and $c_i(XE_k) =0$ if $i \neq k$ . So,
$$c_0(X | E_k) = {{c_0(XE_k)} \over {c_0(E_k)}} = {{\sum^m_{i=1} \lambda_i c_i(XE_k)} \over {\sum^m_{i=1} \lambda_i c_i(E_k)}} = {{\lambda_k c_k(X)} \over {\lambda_k}} = c_k(X)$$
as required.
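The mixture claim just derived can also be checked numerically. The following sketch assumes a toy prior over three worlds and a two-cell partition of my own choosing; it verifies that the conditionalized posteriors mix back to the prior with weights $\lambda_k = c_0(E_k)$.

```python
# Conditionalized posteriors are a convex decomposition of the prior.
import numpy as np

c0 = np.array([0.2, 0.5, 0.3])                   # prior credences in w1, w2, w3
E = [np.array([1, 0, 0]), np.array([0, 1, 1])]   # indicator vectors of E1 = {w1}, E2 = {w2, w3}

posteriors, weights = [], []
for e in E:
    lam = float(c0 @ e)                          # c0(E_k)
    posteriors.append(c0 * e / lam)              # c0(. | E_k)
    weights.append(lam)

mixture = sum(l * p for l, p in zip(weights, posteriors))
print(mixture, np.allclose(mixture, c0))         # recovers the prior
```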
However, the Weak Reflection Principle applies in many other situations beyond those imagined by Plan Conditionalization. For instance, suppose you know you’ll receive evidence from $E_1, \ldots, E_m$ , but, while these propositions are mutually exclusive, they do not exhaust the logical space. Then again, if you plan to adopt $c_k$ upon learning $E_k$ , and $c_k(E_k) = 1$ and $c_0(E_k) \gt 0$ , then the Weak Reflection Principle says that you should plan so that $c_k(X) = c_0(X|E_k)$ . That follows from the working at the end of the previous paragraph. Or suppose your evidence is not perfectly reliable. So it’s not certain that you’ll learn $E_k$ iff $E_k$ is true. Then again, if you plan to adopt $c_k$ upon learning $E_k$ , and $c_k(E_k) = 1$ and $c_0(E_k) \gt 0$ , then the Weak Reflection Principle says that you should plan so that $c_k(X) = c_0(X|E_k)$ . Again, this follows from the working above. What’s more, suppose you know you’ll receive evidence from $E_1, \ldots, E_m$ , but you don’t make a deterministic plan. Perhaps you plan as follows: If I learn $E_1$ , then I’ll adopt $c_1$ or I’ll adopt $c'_1$ ; if I learn $E_2$ , then I’ll adopt $c_2$ or I’ll adopt $c'_2$ ; and so on. Then Plan Conditionalization no longer applies. But the Weak Reflection Principle does. It constrains your plans. It says that your current credence function should be in the convex hull of $c_1, c'_1, c_2, c'_2, \ldots, c_m, c'_m$ (Pettigrew Reference Pettigrew2019b).
So that’s the Weak Reflection Principle. Now for the bridge principle we use to establish it: It is irrational for you to have a prior credence function and a set of possible future credence functions you endorse if there is some alternative prior and, for each possible future credence function you endorse, an alternative to that, such that the sum of the inaccuracy of your prior and the inaccuracy of one of the possible future credence functions you endorse is always greater than the sum of the inaccuracies of the alternative prior and the corresponding alternative possible future credence function. That is:
Diachronic Worldly Dominance. Suppose $c_0$ is your current credence function and $c_1, \ldots, c_m$ are the possible future credence functions you endorse. Then if there are $c^\star_0, c^\star_1, \ldots, c^\star_m$ such that, for each $1 \leq k \leq m$ and all $w_i$ in ${\cal{W}}$ ,
$${\mathfrak{I}}(c^\star_0, i) + {\mathfrak{I}}(c^\star_k, i) \lt {\mathfrak{I}}(c_0, i) + {\mathfrak{I}}(c_k, i)$$
then you are irrational.
Thus, to move from the claim that an inaccuracy measure must satisfy continuity and strict propriety, together with diachronic worldly dominance, to the Weak Reflection Principle, we need the following theorem:
Theorem 3 Suppose ${\mathfrak{I}}$ is continuous and strictly proper. If $c_0$ is not in ${\{c_1, \ldots, c_m\}}^+$ , then there are $c^\star_0, c^\star_1, \ldots, c^\star_m$ such that, for each $1 \leq k \leq m$ and all $w_i$ in ${\cal{W}}$ ,
$${\mathfrak{I}}(c^\star_0, i) + {\mathfrak{I}}(c^\star_k, i) \lt {\mathfrak{I}}(c_0, i) + {\mathfrak{I}}(c_k, i)$$
We prove this in the appendix. Note, however, as we mentioned above: While our new proof removes one way in which additivity enters into the argument that Briggs and I gave, it doesn’t remove another. We still take the inaccuracy of your prior and your posterior, taken together, to be the sum of the inaccuracy of your prior and the inaccuracy of your posterior. It must await future work to explore whether that assumption might also be removed.
5. Conclusion
Accuracy arguments for the core Bayesian tenets differ mainly in the conditions they place on the legitimate inaccuracy measures. The best existing arguments rely on Predd et al.’s conditions: Continuity, additivity, and strict propriety. In this paper, I showed how to strengthen each argument based on these by showing that the central mathematical theorem on which it depends goes through without assuming additivity.
Acknowledgments
I am very grateful to Catrin Campbell-Moore and Jason Konek for very valuable discussions around this topic and to the two referees for this journal for extremely useful feedback.
6. Appendix: The proofs
6.1 Theorems 1 and 2
As will be obvious to anyone familiar with the proof strategy in (Predd et al. Reference Predd, Robert, Lieb, Osherson, Vincent Poor and Kulkarni2009), many of the ideas used here are adapted from that paper, and those in turn are adapted from insights in (Savage Reference Savage1971). Predd et al. proceed by proving a connection between additive and continuous strictly proper inaccuracy measures, on the one hand, and a sort of divergence between credence functions, on the other. A divergence from one credence function defined on ${\cal{F}}$ to another is a function ${\mathfrak{D}}: {\cal{C}} \times {\cal{C}} \rightarrow [0, \infty]$ such that
-
(i) if $c \neq c'$ , then ${\mathfrak{D}}(c, c') \gt 0$ ,
-
(ii) if $c = c'$ , then ${\mathfrak{D}}(c, c') = 0$ .
That is, the divergence from one credence function to another is always positive, while the divergence from a credence function to itself is always zero.
Given a continuous scoring rule ${\mathfrak{s}}_X$ , Predd et al. define the following function for $0 \leq x \leq 1$ :
And then, given an additive and continuous strictly proper inaccuracy measure ${\mathfrak{I}}$ that is generated by continuous strictly proper scoring rules ${\mathfrak{s}}_X$ for X in ${\cal{F}}$ , they define a divergence as follows:
They show that this is a species of divergence known as a Bregman divergence. What’s more, using a representation theorem due to Savage (Reference Savage1971), they show that, for any $w_i$ in ${\cal{W}}$ and c in ${\cal{C}}$ , ${\mathfrak{D}}(w^i, c) = {\mathfrak{I}}(c, i)$ . That is, the divergence from the ideal credence function at a world to a given credence function is the inaccuracy of the given credence function at that world. Having established this, Predd et al. can then appeal to various properties of Bregman divergences to establish their dominance result. In our proofs, since we do not assume additivity, it is not so straightforward to construct a Bregman divergence from our inaccuracy measures. Instead, we construct a restricted divergence, which gives the divergence from a probabilistic credence function to any credence function. We say that it is restricted because it doesn’t say how to measure the divergence from a nonprobabilistic credence function to another credence function. That is, our divergence is defined on ${\cal{P}} \times {\cal{C}}$ , not on ${\cal{C}} \times {\cal{C}}$ . As a result, the function we define is not a Bregman divergence. And, as a result of that, we must prove for ourselves that it has the properties needed to establish the various theorems that have previously been proved using the Bregman divergences that Predd et al. construct from additive inaccuracy measures. The following lemma defines this restricted divergence and describes four properties it boasts.
Lemma 4 Suppose ${\mathfrak{I}}$ is a strictly proper inaccuracy measure. Then define ${\mathfrak{D}}_{\mathfrak{I}}: {\cal{P}} \times {\cal{C}} \rightarrow [0, \infty]$ as follows:
$${\mathfrak{D}}_{\mathfrak{I}}(p, c) = \sum^n_{i=1} p_i {\mathfrak{I}}(c, i) - \sum^n_{i=1} p_i {\mathfrak{I}}(p, i)$$
Then:
-
(i) ${\mathfrak{D}}_{\mathfrak{I}}$ is a divergence. That is, ${\mathfrak{D}}_{\mathfrak{I}}(p, c) \geq 0$ for all p in ${\cal{P}}$ and c in ${\cal{C}}$ with equality iff $p = c$ .
-
(ii) ${\mathfrak{D}}_{\mathfrak{I}}(w^i, c) = {\mathfrak{I}}(c, i)$ , for all $1 \leq i \leq n$ .
-
(iii) ${\mathfrak{D}}_{\mathfrak{I}}$ is strictly convex in its first argument. That is, for all $p \neq q$ in ${\cal{P}}$ and c in ${\cal{C}}$ and for all $0 \lt \lambda \lt 1$ ,
${\mathfrak{D}}_{\mathfrak{I}}(\lambda p + (1-\lambda) q, c) \lt \lambda {\mathfrak{D}}_{\mathfrak{I}}(p, c) + (1-\lambda) {\mathfrak{D}}_{\mathfrak{I}}(q, c)$ -
(iv) ${\mathfrak{D}}_{\mathfrak{I}}(p, c) \geq {\mathfrak{D}}_{\mathfrak{I}}(p, q) + {\mathfrak{D}}_{\mathfrak{I}}(q, c)$ iff $\sum^n_{i=1} (p_i - q_i)({\mathfrak{I}}(c, i) - {\mathfrak{I}}(q, i)) \geq 0$
Proof of Lemma 4.
-
(i) ${\mathfrak{I}}$ is strictly proper. So:
-
(a) if $c \neq p$ , $\sum_i p_i {\mathfrak{I}}(p, i) \lt \sum_i p_i {\mathfrak{I}}(c, i)$ ; and,
-
(b) if $c = p$ , $\sum_i p_i {\mathfrak{I}}(p, i) = \sum_i p_i {\mathfrak{I}}(c, i)$ .
-
So, ${\mathfrak{D}}_{\mathfrak{I}}(p, c) = \sum_i p_i {\mathfrak{I}}(c, i) - \sum_i p_i {\mathfrak{I}}(p, i) \geq 0$ with equality iff $c = p$ .
-
(ii)
$${\mathfrak{D}}_{\mathfrak{I}}(w^i, c) = \sum^n_{j=1} w^i_j\, {\mathfrak{I}}(c, j) - \sum^n_{j=1} w^i_j\, {\mathfrak{I}}(w^i, j) = {\mathfrak{I}}(c, i) - {\mathfrak{I}}(w^i, i) = {\mathfrak{I}}(c, i)$$
since ${\mathfrak{I}}(w^i, i) = 0$ .
-
(iii) Suppose p and q are in ${\cal{P}}$ with $p \neq q$ , and suppose $0 \lt \lambda \lt 1$ . Then let $r = \lambda p + (1-\lambda) q$ , so that $r \neq p$ and $r \neq q$ . Then, since $\sum_i p_i{\mathfrak{I}}(c, i)$ is uniquely minimized, as a function of c, at $c = p$ , and $\sum_i q_i{\mathfrak{I}}(c, i)$ is uniquely minimized, as a function of c, at $c = q$ , we have
$$\sum\limits_i p_i {\mathfrak{I}}(p, i) \lt \sum\limits_i p_i {\mathfrak{I}}(r, i) \qquad \text{and} \qquad \sum\limits_i q_i {\mathfrak{I}}(q, i) \lt \sum\limits_i q_i {\mathfrak{I}}(r, i)$$
Thus
$$\lambda \left[-\sum_i p_i {\mathfrak{I}}(p, i)\right] + (1-\lambda) \left[-\sum_i q_i {\mathfrak{I}}(q, i)\right] \gt \lambda \left[-\sum_i p_i {\mathfrak{I}}(r, i)\right] + (1-\lambda) \left[-\sum_i q_i {\mathfrak{I}}(r, i)\right] = -\sum_i r_i {\mathfrak{I}}(r, i)$$
Now, adding
$$\lambda \sum_i p_i {\mathfrak{I}}(c, i) + (1-\lambda)\sum_i q_i{\mathfrak{I}}(c, i) = \sum_i (\lambda p_i + (1-\lambda)q_i) {\mathfrak{I}}(c, i) = \sum_i r_i {\mathfrak{I}}(c, i)$$
to both sides gives
$$\lambda \left[\sum_i p_i {\mathfrak{I}}(c, i)-\sum_i p_i {\mathfrak{I}}(p, i)\right]+ (1-\lambda) \left[\sum_i q_i{\mathfrak{I}}(c, i)-\sum_i q_i {\mathfrak{I}}(q, i)\right] \gt \sum_i r_i {\mathfrak{I}}(c, i)-\sum_i r_i {\mathfrak{I}}(r, i)$$
That is,
$$\lambda {\mathfrak{D}}_{\mathfrak{I}}(p, c) + (1-\lambda) {\mathfrak{D}}_{\mathfrak{I}}(q, c) \gt {\mathfrak{D}}_{\mathfrak{I}}(r, c) = {\mathfrak{D}}_{\mathfrak{I}}(\lambda p + (1-\lambda)q, c)$$
as required. -
(iv)
$${\mathfrak{D}}_{\mathfrak{I}}(p, c) - {\mathfrak{D}}_{\mathfrak{I}}(p, q) - {\mathfrak{D}}_{\mathfrak{I}}(q, c) = \left[\sum_i p_i {\mathfrak{I}}(c, i) - \sum_i p_i {\mathfrak{I}}(p, i)\right] - \left[\sum_i p_i {\mathfrak{I}}(q, i) - \sum_i p_i {\mathfrak{I}}(p, i)\right] - \left[\sum_i q_i {\mathfrak{I}}(c, i) - \sum_i q_i {\mathfrak{I}}(q, i)\right] = \sum_i (p_i - q_i)({\mathfrak{I}}(c, i) - {\mathfrak{I}}(q, i))$$
as required.
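Before moving to Lemma 5, here is a numerical spot check of Lemma 4's properties for the divergence induced by the Brier score on two worlds; the choice of the Brier score, and the particular credence functions, are assumptions made purely for illustration.

```python
# Spot check of Lemma 4 for the Brier-induced divergence D_I.
import numpy as np

def full_vec(x):                        # probabilistic function: (bottom, {w1}, {w2}, top)
    return np.array([0.0, x, 1 - x, 1.0])

ideal = [full_vec(1.0), full_vec(0.0)]  # the ideal credence functions w^1 and w^2

def brier(vec, i):                      # Brier inaccuracy at world i (0-indexed here)
    return float(np.sum((ideal[i] - vec) ** 2))

def D(p, cvec):                         # D_I(p, c) = sum_i p_i I(c, i) - sum_i p_i I(p, i)
    pv = [p, 1 - p]                     # p is the probability the function assigns to w1
    return sum(pv[i] * brier(cvec, i) for i in (0, 1)) - \
           sum(pv[i] * brier(full_vec(p), i) for i in (0, 1))

cvec = np.array([0.2, 0.6, 0.7, 0.8])   # an arbitrary (non-probabilistic) credence function

print(D(0.4, cvec) >= 0)                                   # (i) non-negativity
print(np.isclose(D(1.0, cvec), brier(cvec, 0)))            # (ii) D(w^1, c) = I(c, 1)
lhs = D(0.5 * 0.2 + 0.5 * 0.8, cvec)                       # (iii) strict convexity
rhs = 0.5 * D(0.2, cvec) + 0.5 * D(0.8, cvec)
print(lhs < rhs)
```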
Lemma 5 Suppose ${\mathfrak{I}}$ is a continuous strictly proper inaccuracy measure. Suppose ${\cal{X}}$ is a closed convex subset of ${\cal{P}}$ . And suppose c is not in ${\cal{X}}$ . Then there is q in ${\cal{X}}$ such that
-
(i) ${\mathfrak{D}}_{\mathfrak{I}}(q, c) \lt {\mathfrak{D}}_{\mathfrak{I}}(p, c)$ for all $p \neq q$ in ${\cal{X}}$ .
-
(ii) For all p in ${\cal{X}}$ ,
$\sum^n_{i=1} (p_i - q_i)({\mathfrak{I}}(c, i) - {\mathfrak{I}}(q, i)) \geq 0$ -
(iii) For all p in ${\cal{X}}$ ,
${\mathfrak{D}}_{\mathfrak{I}}(p, c) \geq {\mathfrak{D}}_{\mathfrak{I}}(p, q) + {\mathfrak{D}}_{\mathfrak{I}}(q, c)$
Proof of Lemma 5. Suppose c is not in ${\cal{X}}$ . Then, since ${\cal{X}}$ is a closed convex set and since Lemma 4(iii) shows that ${\mathfrak{D}}_{\mathfrak{I}}$ is strictly convex in its first argument, there is a unique q in ${\cal{X}}$ that minimizes ${\mathfrak{D}}_{\mathfrak{I}}(x, c)$ as a function of x. So, as (i) requires, ${\mathfrak{D}}_{\mathfrak{I}}(q, c) \lt {\mathfrak{D}}_{\mathfrak{I}}(p, c)$ for all $p \neq q$ in ${\cal{X}}$ .
We now turn to proving (ii). We begin by observing that, since p, q are in ${\cal{P}}$ , since ${\cal{P}}$ is convex, and since ${\mathfrak{D}}_{\mathfrak{I}}(x, c)$ is minimized uniquely at $x = q$ , if $0 \lt \varepsilon \lt 1$ , then
Expanding that, we get
So
So
Now, since ${\mathfrak{I}}$ is strictly proper,
So, for all $\varepsilon \gt 0$ ,
So, since ${\mathfrak{I}}$ is continuous
which is what we wanted to show.
We now briefly sketch an alternative proof of (ii) that is available if ${\mathfrak{D}}_{\mathfrak{I}}$ is not only continuous in its first argument, but also differentiable in that argument. As before, we begin by observing that, since p, q are in ${\cal{P}}$ , since ${\cal{P}}$ is convex, and since ${\mathfrak{D}}_{\mathfrak{I}}(x, c)$ is minimized uniquely at $x = q$ , then, for all $0 \lt \varepsilon \lt 1$ ,
And so
And so, if it exists,
But, if it exists, the left-hand side is just the directional derivative of ${\mathfrak{D}}_{\mathfrak{I}}(q, c)$ with respect to q and relative to the vector $p-q$ . Footnote 8 And we know from a foundational result about directional derivatives that
(See (Rudin Reference Rudin1976, 217).) But we can also show that
After all,
since ${\mathfrak{I}}$ is strictly proper and therefore
for all k. So
as required, giving us (ii).
And finally, (iii) follows immediately from (ii) and Lemma 4(iv).
Finally, this allows us to prove Theorems 1 and 2.
Proof of Theorem 1. Suppose c is not in ${\cal{P}}$ . Then, by Lemma 5, there is $c^\star$ in ${\cal{P}}$ such that, for all p in ${\cal{P}}$ ,
$${\mathfrak{D}}_{\mathfrak{I}}(p, c) \geq {\mathfrak{D}}_{\mathfrak{I}}(p, c^\star) + {\mathfrak{D}}_{\mathfrak{I}}(c^\star, c)$$
So, in particular, since each $w^i$ is in ${\cal{P}}$ ,
$${\mathfrak{D}}_{\mathfrak{I}}(w^i, c) \geq {\mathfrak{D}}_{\mathfrak{I}}(w^i, c^\star) + {\mathfrak{D}}_{\mathfrak{I}}(c^\star, c)$$
But, since $c^\star$ is in ${\cal{P}}$ and c is not, $c^\star \neq c$ , and since Lemma 4(i) shows that ${\mathfrak{D}}_{\mathfrak{I}}$ is a divergence, ${\mathfrak{D}}_{\mathfrak{I}}(c^\star, c) \gt 0$ . So
$${\mathfrak{D}}_{\mathfrak{I}}(w^i, c) \gt {\mathfrak{D}}_{\mathfrak{I}}(w^i, c^\star)$$
So, by Lemma 4(ii), for all $w_i$ in ${\cal{W}}$ ,
$${\mathfrak{I}}(c^\star, i) \lt {\mathfrak{I}}(c, i)$$
as required.
Proof of Theorem 2. Suppose c is not in ${\textrm{cl}}({\cal{A}}^+)$ . Then, by Lemma 5, there is $c^\star$ in ${\textrm{cl}}({\cal{A}}^+)$ such that, for all p in ${\textrm{cl}}({\cal{A}}^+)$ ,
$${\mathfrak{D}}_{\mathfrak{I}}(p, c) \geq {\mathfrak{D}}_{\mathfrak{I}}(p, c^\star) + {\mathfrak{D}}_{\mathfrak{I}}(c^\star, c)$$
So, in particular, since each possible chance function ch is in ${\textrm{cl}}({\cal{A}}^+)$ ,
$${\mathfrak{D}}_{\mathfrak{I}}(ch, c) \geq {\mathfrak{D}}_{\mathfrak{I}}(ch, c^\star) + {\mathfrak{D}}_{\mathfrak{I}}(c^\star, c)$$
But, since $c^\star$ is in ${\textrm{cl}}({\cal{A}}^+)$ and c is not, $c^\star \neq c$ , and since Lemma 4(i) shows that ${\mathfrak{D}}_{\mathfrak{I}}$ is a divergence, ${\mathfrak{D}}_{\mathfrak{I}}(c^\star, c) \gt 0$ . So,
$${\mathfrak{D}}_{\mathfrak{I}}(ch, c) \gt {\mathfrak{D}}_{\mathfrak{I}}(ch, c^\star)$$
Now,
-
${\mathfrak{D}}_{\mathfrak{I}}(ch, c) = \sum_i ch_i {\mathfrak{I}}(c, i) - \sum_i ch_i {\mathfrak{I}}(ch, i)$
-
${\mathfrak{D}}_{\mathfrak{I}}(ch, c^\star) = \sum_i ch_i {\mathfrak{I}}(c^\star, i) - \sum_i ch_i {\mathfrak{I}}(ch, i)$ ,
so
$$\sum_i ch_i {\mathfrak{I}}(c^\star, i) \lt \sum_i ch_i {\mathfrak{I}}(c, i)$$
as required.
6.2. Proof of Theorem 3
To prove Theorem 3, we need a divergence not between one credence function and another, but between a sequence of $m+1$ credence functions and another sequence of $m+1$ credence functions. We create that in the natural way. That is, given $p^0, p^1, \ldots, p^m$ in ${\cal{P}}$ and $c^0, c^1, \ldots, c^m$ in ${\cal{C}}$ , the divergence from the former sequence to the latter is just the sum of the divergences from $p^0$ to $c^0$ , $p^1$ to $c^1$ , and so on. Thus:
Corollary 6 Suppose ${\mathfrak{I}}$ is a strictly proper inaccuracy measure. Then define ${\mathfrak{D}}_{\mathfrak{I}}: {\cal{P}}^{m+1} \times {\cal{C}}^{m+1} \rightarrow [0, \infty]$ as follows:
$${\mathfrak{D}}_{\mathfrak{I}}((p^0, p^1, \ldots, p^m), (c^0, c^1, \ldots, c^m)) = \sum^m_{k=0} {\mathfrak{D}}_{\mathfrak{I}}(p^k, c^k)$$
Then:
-
(i) ${\mathfrak{D}}_{\mathfrak{I}}$ is a divergence.
-
(ii) ${\mathfrak{D}}_{\mathfrak{I}}((w^i, c^1, \ldots, c^{k-1}, w^i, c^{k+1}, \ldots, c^m), (c^0, c^1, \ldots, c^m)) = {\mathfrak{I}}(c^0, i) + {\mathfrak{I}}(c^k, i)$ , for all $1 \leq k \leq m$ and $1 \leq i \leq n$ .
-
(iii) ${\mathfrak{D}}_{\mathfrak{I}}$ is strictly convex in its first argument.
Proof of Corollary 6. These follow immediately from lemma 4.
Corollary 7 Suppose ${\mathfrak{I}}$ is a continuous strictly proper inaccuracy measure. Suppose ${\cal{X}}$ is a closed convex subset of ${\cal{P}}^{m+1}$ . And suppose $(c^0, c^1, \ldots, c^m)$ is not in ${\cal{X}}$ . Then there is $(q^0, q^1, \ldots, q^m)$ in ${\cal{X}}$ such that
-
(i) $\sum^m_{k=0} {\mathfrak{D}}_{\mathfrak{I}}(q^k, c^k) \lt \sum^m_{k=0} {\mathfrak{D}}_{\mathfrak{I}}(p^k, c^k)$ for all $(p^0, p^1, \ldots, p^m) \neq (q^0, q^1, \ldots, q^m)$ in ${\cal{X}}$ .
-
(ii) For all $(p^0, p^1, \ldots, p^m)$ in ${\cal{X}}$ ,
$\sum\limits^m_{k=0} \left(\sum\limits^n_{i=1} (p^k_i - q^k_i)({\mathfrak{I}}(c^k, i) - {\mathfrak{I}}(q^k, i)) \right) \geq 0$ -
(iii) For all $(p^0, p^1, \ldots, p^m)$ in ${\cal{X}}$ ,
$\sum\limits^m_{k=0} {\mathfrak{D}}_{\mathfrak{I}}(p^k, c^k) \geq \sum\limits^m_{k=0}{\mathfrak{D}}_{\mathfrak{I}}(p^k, q^k) + \sum\limits^m_{k=0}{\mathfrak{D}}_{\mathfrak{I}}(q^k, c^k)$
Proof of Corollary 7. The proof strategy is exactly as for Lemma 5.
To prove theorem 3, we now need just one more result:
Lemma 8 Given $c^0, c^1, \ldots, c^m$ in ${\cal{P}}$ , let
$${\cal{X}} = \left\{(w^i, c^1, \ldots, c^{k-1}, w^i, c^{k+1}, \ldots, c^m) : 1 \leq i \leq n,\ 1 \leq k \leq m\right\}$$
Then,
-
(i) ${\cal{X}}^+ \subseteq {\cal{P}}^{m+1}$ .
-
(ii) If $c^0$ is not in the convex hull of $c^1, \ldots, c^m$ , then $(c^0, c^1, \ldots, c^m)$ is not in ${\cal{X}}^+$ .
Proof of Lemma 8. ${\cal{P}}$ is closed under taking mixtures, which gives us (i). We prove (ii) by proving the contrapositive. Suppose $(c^0, c^1, \ldots, c^m)$ is in ${\cal{X}}^+$ . Then there are $0 \leq \lambda_{i, k} \leq 1$ such that $\sum^n_{i=1}\sum^m_{k=1} \lambda_{i, k} = 1$ and
Thus,
and
So
So let $\lambda_k = \sum^n_{i=1} \lambda_{i, k}$ . Then, for $1 \leq k \leq m$ ,
And thus
as required.
Now we can turn to the proof of Theorem 3.
Proof of Theorem 3. If $c^0$ is not in the convex hull of $c^1, \ldots, c^m$ , then $(c^0, c^1, \ldots, c^m)$ is not in ${\cal{X}}^+$ . Thus, by Lemma 7, there is $(q^0, q^1, \ldots, q^m)$ such that, for all $(p^0, p^1, \ldots, p^m)$ in ${\cal{X}}^+$ ,
In particular, for $w_i$ in ${\cal{W}}$ and $1 \leq k \leq m$ ,
But
as required.