
Indifference to Anti-Humean Chances

Published online by Cambridge University Press:  10 March 2023

J. Dmitri Gallow*
Affiliation:
Dianoia Institute of Philosophy, Australian Catholic University, Fitzroy, Victoria, Australia

Abstract

An indifference principle says that your credences should be distributed uniformly over each of the possibilities you recognise. A chance deference principle says that your credences should be aligned with the chances. My thesis is that if we are anti-Humeans about chance, then these two principles are incompatible. Anti-Humeans think that it is possible for the actual frequencies to depart from the chances. As long as you recognise possibilities like this, you cannot both spread your credences evenly and defer to the chances. I discuss some weaker forms of indifference which will allow anti-Humeans to defer to the chances.

Type
Article
Creative Commons
This is an Open Access article, distributed under the terms of the Creative Commons Attribution licence (http://creativecommons.org/licenses/by/4.0), which permits unrestricted re-use, distribution and reproduction, provided the original article is properly cited
Copyright
© The Author(s), 2023. Published by Cambridge University Press on behalf of Canadian Journal of Philosophy

An indifference principle says that your initial, or ‘ur-prior,’ credences should be distributed uniformly over each of the possibilities you recognise. A chance deference principle says that your ur-prior credences should be aligned with the chances in the following sense: your ur-prior credence in P, given that the chance of P is x, should be x. My thesis is that if we are anti-Humeans about chance, then these two principles are incompatible with each other. Anti-Humeans think that it is possible—though unlikely—for the actual frequencies to depart from the chances. As long as you recognise possibilities like this, you cannot both invest equal credence in every possibility and defer to the chances. If your ur-prior credences are spread evenly over every possibility, then they will not defer to the chances; and if your ur-prior credences defer to the chances, they will not be spread evenly over every possibility.

In sections 1–3 below, I’ll introduce anti-Humeanism (AH), a principle of chance deference (CD), and an indifference principle (IP). The principles CD and IP both say something about what your credences should be like in the absence of any evidence; that is to say, they both impose constraints on your initial, or ‘ur-prior,’ credence function. In section 4, I’ll explain why, if we’re anti-Humeans, we cannot satisfy both of these constraints. In section 5, I’ll consider some anti-Humean responses to this incompatibility. In brief: anti-Humeans may retreat to a weaker indifference principle according to which you should invest equal credence in each categorical possibility, where a categorical possibility describes what happens but does not say what the objective chances are. Alternatively, anti-Humeans could move to an even weaker indifference principle which says only that you should give equal credence to evidentially symmetric propositions but does not say that, in the absence of evidence, each possibility you recognise is evidentially symmetric with every other. As I’ll explain in section 5, this weaker indifference principle has prominent advocates, for instance Keynes (1921) and White (2009). However, the indifference principle which I will call ‘IP’ is not some straw man. Authors like Pettigrew (2016) argue for exactly this principle.

1. Anti-Humeanism

I’ll suppose that you have degrees of belief, or credences, defined over the sentences in some language. In the simplest case, this will be a truth-functional language with a finite number of atomic sentences, $ {A}_1,{A}_2,\dots, {A}_N $ . In a language like this, a ‘state description’ is a conjunction of the form $ \pm {A}_1\wedge \pm {A}_2\wedge \dots \wedge \pm {A}_N $ , where each $ \pm {A}_i $ is either the atomic sentence $ {A}_i $ or its negation. Let ‘ $ \Omega $ ’ be the set of state descriptions, and let $ \omega \in \Omega $ be some particular state description. If the language is truth functional, a state description will settle the truth-value of every other sentence in the language and every sentence in the language will be equivalent to some disjunction of state descriptions.

Instead of taking your credences to be defined over sentences in a language, we could start with a space of ‘possible worlds,’ $ \mathcal{W} $ , and take your credences to be defined over ‘propositions’ (sets of those possible worlds). Some defenders of indifference principles, such as Williamson (2010), use the sentential framework, while others, such as Pettigrew (2016), use the propositional framework. But the distinctions between these two frameworks won’t make a difference to my discussion here. For it is straightforward to translate between them; I’ll leave the details to a footnote (Footnote 1). Here, I’ll stick to the sentential framework, but the translation scheme allows everything I say about it to carry over to the propositional framework.

In the simplest case, your credences are defined over a simple truth-functional language. However, if you wish to entertain sentences about the chances, such as ‘ $ \mathcal{C}h(P)=x $ ’ (‘the objective chance of P is x’) or ‘ $ \mathcal{C}h= ch $ ’ (‘ch is the objective chance function’), then we will need to consider a slightly more complicated language. We can generate an appropriately rich language by distinguishing two different kinds of atomic sentences, which I’ll call the ‘atoms’ and ‘chance hypotheses.’ The atoms are just the atomic sentences which aren’t chance hypotheses. Let the atoms be $ {A}_1,{A}_2,\dots, {A}_N $ . The chance hypotheses say what the objective chance function is; that is, they are sentences of the form ‘ $ \mathcal{C}h= ch $ ’, for some probability function ch (Footnote 2). We then get the full language by taking the union of the chance hypotheses and the atoms, and closing the resulting set under negation and disjunction. We may recover sentences like ‘ $ \mathcal{C}h(P)=x $ ’, since they are equivalent to the disjunction ‘ $ {\bigvee}_{ch:ch(P)=x}\mathcal{C}h=ch $ ’.

With this richer language, we should change the way we think about a ‘state description.’ If $ ch $ and $ c{h}^{\prime } $ are two distinct probability functions, then both $ \mathcal{C}h= ch $ and $ \mathcal{C}h=c{h}^{\prime } $ will be atomic sentences. If we continued with our old definition of a state description, there would be a state description which included both of these sentences as conjuncts. These sentences may be known a priori to be incompatible with each other. This creates two problems. In the first place, a state description is meant to represent an epistemic possibility. But if the state description is a priori false, then it cannot be an epistemic possibility. In the second place, there is a problem for anyone who wishes to endorse an indifference principle like the one I’ll introduce in section 3. That principle says that every state description should receive the same credence; so, if we used this understanding of ‘state description,’ then this indifference principle would require you to have a positive credence in something you may know a priori to be false.

Once we’ve got credences defined over chance hypotheses, we must change the way we think about state descriptions. Which change is appropriate may depend upon our metaphysical commitments. Consider the following example: we’re going to flip a coin N times, and then destroy it. No other coins have ever or will ever be flipped throughout the history of the universe. Now, suppose that you have credences defined over the atoms $ {H}_1,{H}_2,\dots, {H}_N $ , where $ {H}_i $ says that the ith flip landed heads. Now, the question is whether, in this application, we should count $ {H}_1\wedge {H}_2\wedge \dots \wedge {H}_N\wedge \mathcal{C}h= ch $ as a state description if $ ch\left({H}_i\right)=50\% $ , for each i. That is: should we have a state description which tells us that the coin landed heads every flip, and that the coin was fair? It’s quite natural to think yes—after all, it appears to be possible for a fair coin to land heads N times in a row. Sure, it’s unlikely, but that doesn’t make it impossible. This is what anti-Humeans think. But Humeans will disagree.

As I’m using the term here, Humeans think that the chance laws supervene upon the distribution of local matters of particular fact. According to Humeans, chances are something like executive summaries of what actually happens in the world. For instance, actual frequentism is a form of Humeanism. The actual frequentist thinks that the chances are just the actual frequencies. In our example, the actual frequentist will say that the chance of the coin landing heads on the ith flip is just the proportion of flips which actually land heads. Thus, if every flip lands heads, the chance of the ith flip landing heads would have to be 100%, and not 50%. For this reason, $ {H}_1\wedge {H}_2\wedge \dots \wedge {H}_N\wedge \mathcal{C}h= ch $ will be impossible unless $ ch\left({H}_i\right)=100\% $ .

Anti-Humeans disagree. According to them, the chance laws have an independent existence. Chance laws govern outcomes, but they are not reducible to outcomes. A crude analogy: for the anti-Humean, when outcomes are objectively chancy, God is rolling dice. Statements about the chances tell you something about the bias of God’s dice, but the chance laws don’t necessitate that those dice land any particular way, not even in the long run. So, for instance, anti-Humeans say that there’s no reason that a fair coin couldn’t land heads on every flip. More generally, they will say that there’s no reason some conjunction of (negations of) atoms $ \pm {A}_1\wedge \pm {A}_2\wedge \dots \wedge \pm {A}_N $ couldn’t be true even while the chance of it being true is minuscule. So, if we are anti-Humeans, then we should think that the conjunction $ \pm {A}_1\wedge \pm {A}_2\wedge \dots \wedge \pm {A}_N\wedge \mathcal{C}h= ch $ is possible (and we should count it as a state description) even if $ ch\left(\pm {A}_1\wedge \pm {A}_2\wedge \dots \wedge \pm {A}_N\right) $ is minuscule.

In general, given a language which consists of a set of atoms $ {A}_1,{A}_2,\dots, {A}_N $ and potential chance hypotheses $ \mathcal{C}h={ch}_1,\hskip0.3em \mathcal{C}h={ch}_2,\dots, \mathcal{C}h={ch}_M $ , an anti-Humean should be happy to say that a ‘state description’ is any conjunction of the form

$$ \pm {A}_1\wedge \pm {A}_2\wedge \dots \wedge \pm {A}_N\wedge \pm \mathcal{C}h $$

where each $ \pm {A}_i $ is either the atom $ {A}_i $ or its negation, and $ \pm \mathcal{C}h $ is either one of the chance hypotheses $ \mathcal{C}h={ch}_i $ or else the negation of every chance hypothesis, $ {\bigwedge}_{i=1}^M\mathcal{C}h\ne {ch}_i $ . In the possible worlds framework, this means that for every chance hypothesis and every assignment of truth-values to the atoms, there is a possible world at which that chance hypothesis is true and that assignment of truth-values is realised.
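The counting behind this definition can be sketched in a few lines of Python. This is only an illustration: the atom and hypothesis labels are placeholders, and `NO_LISTED_CHANCE` is a hypothetical name for the conjunct which negates every listed chance hypothesis.

```python
from itertools import product

# Hypothetical label for the conjunct that negates every listed chance hypothesis.
NO_LISTED_CHANCE = "none of the listed chance hypotheses"

def anti_humean_state_descriptions(atoms, chance_hypotheses):
    """Pair every truth-value assignment to the atoms with every chance conjunct."""
    chance_conjuncts = list(chance_hypotheses) + [NO_LISTED_CHANCE]
    assignments = product([True, False], repeat=len(atoms))
    return [
        (dict(zip(atoms, values)), chance)
        for values in assignments
        for chance in chance_conjuncts
    ]

atoms = ["A1", "A2", "A3"]      # N = 3 atoms (placeholder names)
hypotheses = ["ch1", "ch2"]     # M = 2 chance hypotheses (placeholder names)
descriptions = anti_humean_state_descriptions(atoms, hypotheses)
print(len(descriptions))        # 2**3 * (2 + 1) = 24
```

On this sketch, no pairing of a truth-value assignment with a chance conjunct is excluded, which is the anti-Humean feature: an assignment counts as a state description however improbable the paired chance function says it is.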

For my purposes, I won’t need a thorough-going anti-Humeanism. Instead, I will need only the following consequence of it, which I will from here on out refer to as ‘anti-Humeanism,’ or ‘AH’:

Anti-Humeanism (AH): There is a pair of state descriptions which have the form $ \phi \wedge \mathcal{C}h= ch $ and $ \psi \wedge \mathcal{C}h= ch $ , where $ ch\left(\phi \right)\ne ch\left(\psi \right) $ .

Actual frequentists will want to deny this assumption in the case of Bernoulli processes like coin flips since any state description with the same chance function will have to have the same frequency of heads landings (or whatever), and the chance function will assign the same chance to state descriptions with the same frequencies. Nonetheless, AH is a very minimal form of anti-Humeanism. Even ‘best system’ Humeans like Lewis (1994) will be happy to accept AH in many contexts.

2. Chance deference

The most prominent principle of chance deference is David Lewis’s (1980) ‘principal principle’ (Footnote 3). The principal principle says something about a rational initial, or ur-prior, credence function $ C $ , that is, the credence function it would be rational to have in the absence of any evidence (Footnote 4). In particular, it says: if $ P $ is any sentence (Footnote 5), $ t $ is some future time, $ {Ch}_t(P)=x $ says that the time $ t $ chance of $ P $ is $ x $ , for some real number $ x\in \left[0,1\right] $ , and $ E $ is any time $ t $ admissible evidence which is compatible with $ {Ch}_t(P)=x $ , then

$$ C\left(P|{Ch}_t(P)=x\wedge E\right)=x $$

The time $ t $ won’t be important in my discussion, so I’ll fix $ t $ to be some future time and omit explicit mention of $ t $ in the remainder. Likewise, the admissible evidence $ E $ won’t play any important role. I’ll assume only that information about the chances is itself admissible at $ t $ . If $ \mathcal{C}h= ch $ is compatible with $ \mathcal{C}h(P)=x $ , then $ ch(P)=x $ . And if $ ch(P)=x $ , then the conjunction $ \mathcal{C}h(P)=x\wedge \mathcal{C}h= ch $ is equivalent to $ \mathcal{C}h= ch $ . So if we set $ E=\mathcal{C}h= ch $ in the principal principle, we get the following:

Chance Deference (CD): Your ur-prior credence in $ P $ , given that the objective chance function is $ ch $ , should be $ ch(P) $ .

$$ C\left(P|\mathcal{C}h= ch\right)= ch(P) $$

CD governs your conditional credences; but I’ll suppose that these conditional credences place a constraint on your unconditional credences via the product rule, which says that for any sentences $ P $ and $ Q $ , your credence in $ P\wedge Q $ is equal to the product of your credence that $ P $ given $ Q $ and your credence that $ Q $ . So, if $ C\left(\mathcal{C}h= ch\right)>0 $ , CD implies that

(1) $$ \frac{C\left(P\wedge \mathcal{C}h= ch\right)}{C\left(\mathcal{C}h= ch\right)}= ch(P) $$
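As a minimal sketch of how CD and the product rule fit together, the Python below builds a joint credence over coin-flip sequences and chance hypotheses and then checks (1). Everything specific here is an assumption made for illustration: two flips, two chance hypotheses labelled `fair` and `biased`, independent flips, and an arbitrary prior over the hypotheses.

```python
from fractions import Fraction
from itertools import product

N = 2  # number of coin flips (illustrative)

def iid_chance(p):
    """Chance function over heads (1) / tails (0) sequences for independent flips with bias p."""
    return {
        seq: p ** sum(seq) * (1 - p) ** (N - sum(seq))
        for seq in product([1, 0], repeat=N)
    }

# Two illustrative chance hypotheses and an arbitrary prior over them.
chance_functions = {"fair": iid_chance(Fraction(1, 2)),
                    "biased": iid_chance(Fraction(4, 5))}
prior_over_chances = {"fair": Fraction(1, 3), "biased": Fraction(2, 3)}

# Product rule plus CD: C(seq & Ch=ch) = C(Ch=ch) * ch(seq).
joint = {
    (seq, name): prior_over_chances[name] * ch[seq]
    for name, ch in chance_functions.items()
    for seq in ch
}

def credence_given_chance(seq, name):
    """Left-hand side of (1): C(seq & Ch=ch) / C(Ch=ch)."""
    marginal = sum(c for (_, n), c in joint.items() if n == name)
    return joint[(seq, name)] / marginal

print(credence_given_chance((1, 1), "biased"))  # 16/25, which is ch(HH) for the biased hypothesis
```

Exact fractions are used so that the equality in (1) can be checked without rounding error.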

CD works well when there are at most a finite number of potential values for $ x $ . However, you may want your credences to be defined over uncountably many sentences of the form $ \mathcal{C}h(P)=x $ —one for each of the uncountably many real numbers, $ x $ , between $ 0 $ and $ 1 $ . In that case, CD may have to be generalised. I discuss this generalisation in the appendix, section B.2.

3. Indifference

The indifference principle I’ll be interested in here says that in the absence of evidence, you should give every state description precisely the same probability.

Indifference Principle (IP): For any two state descriptions, $ \omega $ and $ {\omega}^{\ast } $ , your ur-prior credence in $ \omega $ should be equal to your ur-prior credence in $ {\omega}^{\ast } $ , $ C\left(\omega \right)=C\left({\omega}^{\ast}\right) $ .

In the propositional framework, IP says that for any two possible worlds, $ w $ and $ {w}^{\ast } $ , your ur-prior credence in $ w $ must equal your ur-prior credence in $ {w}^{\ast } $ , $ C(w)=C\left({w}^{\ast}\right) $ .

Let me separate out two different theses which together imply IP:

Symmetry to Indifference (STI): If ‘ $ P $ ’ and ‘ $ Q $ ’ are evidentially symmetric, then your credence in ‘ $ P $ ’ should equal your credence in ‘ $ Q $ ’.

State Description Symmetry (SDS): Any two state descriptions are evidentially symmetric.

As I will discuss in section 5 below, not every defender of indifference principles has endorsed SDS. So not every defender of indifference principles has endorsed the principle I am here calling ‘IP.’ However, as I will also discuss in section 5, some prominent defenders of indifference principles have endorsed the stronger IP. Moreover, prominent arguments for indifference imply the stronger thesis IP, not just the weaker STI.

If there are a finite number of atomic sentences in your language, then there will be finitely many state descriptions in $ \Omega $ . However, if there are a countable infinity of atomic sentences, there will be uncountably many state descriptions. In this case, IP will be trivially satisfied as long as every state description is given a probability of zero. Nonetheless, there is another form of indifference which we may want to impose in this case. I discuss this stronger indifference principle in the appendix, section B.1.

4. The incompatibility

In this section, I’ll show that AH, CD, and IP are incompatible by assuming all three and deriving a contradiction. This will show that if we are anti-Humeans, we must choose between indifference and showing deference to the chances.

Assume AH, CD, and IP. By AH, there is a pair of state descriptions which have the form $ \phi \wedge \mathcal{C}h= ch $ and $ \psi \wedge \mathcal{C}h= ch $ , where $ ch\left(\phi \right)\ne ch\left(\psi \right) $ . Then,

$$ \frac{C\left(\phi \wedge \mathcal{C}h= ch\right)}{C\left(\psi \wedge \mathcal{C}h= ch\right)}=\frac{C\left(\phi \wedge \mathcal{C}h= ch\right)/C\left(\mathcal{C}h= ch\right)}{C\left(\psi \wedge \mathcal{C}h= ch\right)/C\left(\mathcal{C}h= ch\right)}=\frac{ch\left(\phi \right)}{ch\left(\psi \right)}\ne 1 $$

The final equality follows from (1), with ‘ $ \phi $ ’ and ‘ $ \psi $ ’ substituted in for ‘ $ P $ ’. As we saw in section 2, (1) follows from CD. It therefore follows from CD that

(2) $$ C\left(\phi \wedge \mathcal{C}h= ch\right)\ne C\left(\psi \wedge \mathcal{C}h= ch\right) $$

On the other hand, since IP requires that every state description get the same credence, it implies that

(3) $$ C\left(\phi \wedge \mathcal{C}h= ch\right)=C\left(\psi \wedge \mathcal{C}h= ch\right) $$

Contradiction.

So, if we assume AH, CD, and IP, we arrive at a contradiction. Assuming we are anti-Humeans, then, we face a choice between CD and IP. We cannot both be indifferent and show deference to the chances.
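The clash can be seen numerically in a small Python sketch. The setup is assumed for illustration only: three independent coin flips, two chance hypotheses labelled `fair` and `biased`, and, to keep things short, no state descriptions containing the negation of every listed chance hypothesis.

```python
from fractions import Fraction
from itertools import product

N = 3  # illustrative number of flips
sequences = list(product([1, 0], repeat=N))  # 1 = heads, 0 = tails

def iid_chance(p):
    """Chance of each sequence for independent flips with bias p."""
    return {seq: p ** sum(seq) * (1 - p) ** (N - sum(seq)) for seq in sequences}

chance_functions = {"fair": iid_chance(Fraction(1, 2)),
                    "biased": iid_chance(Fraction(4, 5))}

# IP: the same credence for every state description (seq & Ch=ch).
state_descriptions = [(seq, name) for seq in sequences for name in chance_functions]
uniform = {sd: Fraction(1, len(state_descriptions)) for sd in state_descriptions}

def conditional(credence, seq, name):
    """C(seq | Ch=ch), computed from a joint credence function."""
    marginal = sum(c for (_, n), c in credence.items() if n == name)
    return credence[(seq, name)] / marginal

all_heads = (1,) * N
all_tails = (0,) * N

# Under IP, the conditional credence ignores the chance hypothesis.
ip_value = conditional(uniform, all_heads, "biased")
cd_value = chance_functions["biased"][all_heads]
print(ip_value, cd_value)  # 1/8 64/125

# Under CD (with an even prior over the two hypotheses), credences are uneven.
cd_prior = {(seq, name): Fraction(1, 2) * ch[seq]
            for name, ch in chance_functions.items() for seq in sequences}
print(cd_prior[(all_heads, "biased")] == cd_prior[(all_tails, "biased")])  # False
```

The uniform credence function gives all-heads a conditional credence of 1/8 whatever the chances say, so it violates CD; the credence function built from CD gives unequal credence to two state descriptions, so it violates IP.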

Perhaps this incompatibility only arises because we considered a finite number of chance hypotheses? In appendix B, I show that a similar incompatibility arises even if there are uncountably many chance hypotheses.

5. Further discussion

One kind of reaction to this incompatibility is to reject one of the principles and leave it at that. For instance, Humeans may say: so much the worse for anti-Humeanism! (As an anti-Humean myself, I am more inclined to see the foregoing as a reason to reject IP, though I won’t insist upon that here.) I won’t have anything further to say about this kind of reaction. However, there is a more moderate reaction which is worth discussing: anti-Humeans may wish to defer to the chances and still endorse some form of indifference principle. They may achieve this by weakening the principle IP.

As I formulated IP, it says that your credence in any state description must be equal to your credence in any other state description. And as I’ve understood it, a ‘state description’ specifies all the things your language is able to tell you about the world. It describes matters in as precise a detail as your language will permit. If $ \omega $ is a state description, then any other description of the world (in your language) is either entailed by $ \omega $ or incompatible with $ \omega $ . However, we might weaken IP by having it say that your credence in any categorical state description is the same as your credence in any other—where a categorical state description is logically weaker than a full state description. It specifies what happens, but fails to say what the objective chances of those happenings are. Such a weakening of IP need not conflict with CD and AH.

Again, suppose you have some collection of atomic sentences in your language: the ‘atoms’ $ {A}_1,{A}_2,\dots, {A}_N $ , and the ‘chance hypotheses’ $ \mathcal{C}h={ch}_1,\mathcal{C}h={ch}_2,\dots, \mathcal{C}h={ch}_M $ . Say that a ‘categorical state description’ describes the world in as rich a detail as the atoms (excluding the chance hypotheses) permit. That is, it is a conjunction of the form $ \pm {A}_1\wedge \pm {A}_2\wedge \dots \wedge \pm {A}_N $ , where each $ \pm {A}_i $ is either $ {A}_i $ or $ \neg {A}_i $ . Then consider the weakened principle WIP.

Weak Indifference Principle (WIP): For any two categorical state descriptions, $ \omega $ and $ {\omega}^{\ast } $ , your ur-prior credence in $ \omega $ should be equal to your ur-prior credence in $ {\omega}^{\ast } $ , $ C\left(\omega \right)=C\left({\omega}^{\ast}\right) $ .

We have not shown any conflict between AH, CD, and WIP. So why not simply restrict IP in this way so as to make it consistent with AH and CD?
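For a toy case in which AH, CD, and WIP hold together, consider the following Python sketch. It is an illustration under assumed values, not part of the argument: a single atom $ H $ , two chance hypotheses labelled `low` and `high` which are mirror images of each other, equal credence in each, and zero credence in the negation of both.

```python
from fractions import Fraction

# One atom H; two chance hypotheses that are mirror images of each other (assumed values).
chance_of_H = {"low": Fraction(3, 10), "high": Fraction(7, 10)}
prior_over_chances = {"low": Fraction(1, 2), "high": Fraction(1, 2)}

# CD fixes the joint credences via the product rule.
joint = {}
for name, p in chance_of_H.items():
    joint[(True, name)] = prior_over_chances[name] * p
    joint[(False, name)] = prior_over_chances[name] * (1 - p)

# Credence in each categorical state description (H and not-H).
credence_H = sum(c for (h, _), c in joint.items() if h)
credence_not_H = sum(c for (h, _), c in joint.items() if not h)

print(credence_H, credence_not_H)                    # 1/2 1/2
print(joint[(True, "low")], joint[(False, "low")])   # 3/20 7/20
```

The two categorical state descriptions get equal credence, as WIP requires, while the full state descriptions $ H\wedge \mathcal{C}h= ch $ and $ \neg H\wedge \mathcal{C}h= ch $ get unequal credence, so IP fails. With more than one atom, a construction like this would need more care.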

For all I’ve shown, we could do so. However, I personally have a hard time seeing the philosophical motivation for accepting WIP while rejecting IP. By way of explanation, let me say something about what kind of constraint IP imposes on an ur-prior credence, and why its defenders have thought you should satisfy this constraint when you lack evidence. In general, a credence function will encode relations of evidential relevance. If your credence in $ P $ given $ Q $ is greater than your credence in $ P $ , this encodes the fact that you take $ Q $ to be evidence for $ P $ . IP imposes a rather demanding constraint on what kinds of evidential relevance relations you’re permitted to recognise in the absence of evidence. It forbids taking any atomic sentence of your language to be evidence for any other atomic sentence of your language in the absence of evidence.

Williamson (2010) justifies the ur-prior recommended by IP on the grounds that it leads to maximally cautious actions: it “is on average the more cautious policy when it comes to risky decisions,” in the sense that it “minimises worst-case expected loss” (Footnote 6). Similarly, Pettigrew (2016) argues for IP on the grounds that it is epistemically cautious: it minimises the worst case with respect to the accuracy of your beliefs. According to Pettigrew, “what is wrong with assigning greater credence to one possibility over another in the absence of evidence is that by doing so you risk greater inaccuracy than you need to risk. [If you violate IP, then] there is an alternative [ur-prior] credence function, namely the uniform distribution … that has lower inaccuracy in its worst-case scenario than you have in yours” (Footnote 7). Neither of these arguments depends in any way upon assumptions about the content of the atomic sentences in your language, nor upon whether they are about chance hypotheses. So I have a hard time seeing why we should find those arguments any less compelling when some of the atomic sentences in your language are chance hypotheses. Moreover, if we grant an exemption for the chance hypotheses, one wants to know why a similar exemption cannot be granted for other atomic sentences.

Suppose your language contains only the atomic sentences $ {B}_1, $ $ {B}_2, $ $ \dots, $ $ {B}_N $ , where $ {B}_i $ says that the $ i $ th raven is black. Some of us think that, even before receiving evidence, you should take the first $ k $ ravens being black to be evidence for the $ k+1 $ st raven being black. Some of us say that—even with this simple language, and even in the absence of evidence—your credence in $ {B}_N $ given $ {B}_1\wedge {B}_2\wedge \dots \wedge {B}_{N-1} $ should be greater than your unconditional credence in $ {B}_N $ . Both IP and WIP disagree. They say that with this simple language, before you have any evidence, you must not take the fact that the first $ N-1 $ ravens are black to be evidentially relevant to whether the $ N $ th raven is black. They say that your credence in the state description $ {B}_1\wedge {B}_2\wedge \dots \wedge {B}_{N-1}\wedge {B}_N $ (every raven is black) must be the same as your credence in the state description $ {B}_1\wedge {B}_2\wedge \dots \wedge {B}_{N-1}\wedge \neg {B}_N $ (every raven is black except for the last one). And if that’s so, then your credence that the $ N $ th raven is black given that the first $ N-1 $ ravens are black will be 1/2, which will be the same as your unconditional credence that the $ N $ th raven is black. (Exactly half of the state descriptions include ‘ $ {B}_N $ ’, and exactly half contain its negation.) So, if you satisfy either IP or WIP, then you won’t see the blackness of the first $ N-1 $ ravens as evidence for the $ N $ th raven being black.
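The arithmetic of the raven example can be checked with a short Python sketch. The number of ravens is an arbitrary illustrative choice, and the credence function is simply the uniform one over the state descriptions of this simple language.

```python
from fractions import Fraction
from itertools import product

N = 5  # illustrative number of ravens
worlds = list(product([True, False], repeat=N))        # one tuple per state description
uniform = {w: Fraction(1, len(worlds)) for w in worlds}

def credence(event):
    """Uniform credence in the set of state descriptions satisfying `event`."""
    return sum(c for w, c in uniform.items() if event(w))

def first_all_black(w):
    return all(w[:N - 1])   # the first N-1 ravens are black

def last_black(w):
    return w[N - 1]         # the Nth raven is black

unconditional = credence(last_black)
conditional = (credence(lambda w: first_all_black(w) and last_black(w))
               / credence(first_all_black))
print(unconditional, conditional)  # 1/2 1/2
```

The conditional and unconditional credences coincide, so on the uniform credence function the first $ N-1 $ black ravens are not evidence about the $ N $ th.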

More generally, IP requires that—in the absence of evidence—every atomic sentence is given a credence of 1/2, and every atomic sentence is probabilistically independent of every other. So it forbids recognising evidential relations between atomic sentences, unless you have evidence supporting those evidential relations. This imposes a kind of a priori inductive skepticism. It forbids an ur-prior from recognising many evidential relations typically recognised by inductive methods. It says that in the absence of evidence, it is irrational to take ‘John testifies that $ P $ ’ or ‘It appears that $ P $ ’ to be evidence for ‘ $ P $ ’.

Weakening IP to WIP makes an exception to the general rule of not recognising evidential relations between atomic sentences. Such an exception could, of course, be granted. But the reasons provided for IP by defenders like Jaynes (1957), Williamson (2010), and Pettigrew (2016) do not seem to motivate such an exemption. Take an anti-Humean ur-prior which satisfies CD by being more confident in state descriptions in which $ A\wedge \mathcal{C}h(A)=0.6 $ than it is in state descriptions in which $ \neg A\wedge \mathcal{C}h(A)=0.6 $ . This ur-prior builds in more information and so has lower entropy than one which satisfies IP by spreading its credence equally over all state descriptions. If we should minimise prior information about whether nature is uniform, whether testifiers are trustworthy, and whether appearances are deceiving, then why shouldn’t we also minimise information about whether the chances are accurate? If the outcome of a risky action depends upon whether $ A\wedge \mathcal{C}h(A)=0.6 $ , the ur-prior which satisfies CD will lead to less cautious actions than the one which satisfies IP. If we shouldn’t take incautious actions when it comes to whether nature is uniform, whether testifiers are trustworthy, and whether appearances are deceiving, then why should we take incautious actions when it comes to whether the chances are accurate? And, if we should minimise worst-case epistemic risk when it comes to whether nature is uniform, testifiers are trustworthy, and appearances are deceiving, why shouldn’t we also minimise worst-case epistemic risk when it comes to whether chance is accurate? I am not contending that there is no reason for a selective a priori inductive skepticism, according to which we have a priori grounds to trust in chance, but no a priori grounds to trust in regularities, testifiers, or our senses. I am contending that, to my knowledge, no such reason has been given.

Of course, we could weaken WIP further by allowing an ur-prior to build in assumptions about the uniformity of nature, as well as the reliability of testifiers and appearances. More generally, we could allow in any number of a priori rationality constraints, and say only that you should spread your ur-prior credences as evenly as possible subject to these constraints. That is: your ur-prior credences should be spread evenly, except when this conflicts with some other a priori norm of rationality.

Let me make four observations about a principle like this. Firstly, some authors who have defended indifference principles have a principle like this in mind. For instance, White (2009) defends the following, which he calls “the principle of indifference”:

If ‘ $ P $ ’ and ‘ $ Q $ ’ are evidentially symmetrical, then your credence in ‘ $ P $ ’ should equal your credence in ‘ $ Q $ .’

In section 3, I called this principle ‘Symmetry to Indifference’ (STI). If we combine STI with the assumption that in the absence of evidence, any two state descriptions are evidentially symmetrical (SDS), then we get back the principle IP. However, White is not committed to the evidential symmetry of state descriptions. When explaining what it takes for ‘ $ P $ ’ and ‘ $ Q $ ’ to be evidentially symmetrical, he makes it clear that this can include a priori reasons to think ‘ $ P $ ’ is more likely than ‘ $ Q $ .’ He writes: “I mean to understand evidence very broadly here to encompass whatever we have to go on in forming an opinion about the matter. This can include non-empirical evidence or reasons, if there are such” (2009, 161–62).

Secondly, several other authors who have defended indifference principles have the stronger thesis IP in mind. For instance, Pettigrew explicitly rejects White’s thesis, and, in its place, advocates the following stronger formulation:

Suppose that $ \mathrm{\mathcal{F}} $ is a finite, rank-complete set of propositions. If an agent has an initial credence function $ {c}_0 $ defined on $ \mathrm{\mathcal{F}} $ , then rationality requires that $ {c}_0 $ is the uniform distribution on $ \mathrm{\mathcal{F}} $ … [where the uniform distribution] assigns to each proposition the proportion of the possible worlds at which it is true. (2016, 164)

Pettigrew formulates this principle in a framework where the arguments of your credence function are sets of possible worlds. But, as I explained in section 1 above, we may translate between a framework like this and a framework where the arguments of your credence function are sentences. Translating between the two frameworks, his requirement that $ \mathrm{\mathcal{F}} $ be finite is analogous to requiring that there are finitely many atomic sentences. (The notion of a ‘rank-complete’ set is a slightly technical notion which is needed for Pettigrew’s theorem, but which isn’t relevant to our discussion here. Just note that given our translation scheme, this condition will be satisfied as long as your language is closed under negation and conjunction and you have a credence in every sentence in your language; see Footnote 8.) Given the translation, Pettigrew’s principle says exactly what IP does: your ur-prior should give every state description the same credence (Footnote 9).

Thirdly, while there may be good reason for an anti-Humean to endorse STI while rejecting stronger principles like WIP and IP, the arguments of Williamson (2010) and Pettigrew (2016) do not support this more moderate position. An anti-Humean ur-prior which satisfies CD will lead to less cautious actions than one which satisfies IP. So adopting the weaker principles does not minimise worst-case expected loss. Since Williamson’s justification of IP appeals to a principle of minimising worst-case expected loss, that justification cannot be used to support the moderate position. Similarly, an anti-Humean ur-prior which satisfies CD will lead to less epistemic caution than one which satisfies IP. As Pettigrew taught us, if you satisfy CD, then there is an alternative ur-prior, namely the uniform ur-prior, which has a lower inaccuracy in its worst-case scenario than you have in yours (assuming AH). Since Pettigrew’s justification of IP appeals to a principle which says that in the absence of evidence, you must minimise your worst-case inaccuracy, that justification cannot be used to support accepting STI while rejecting SDS either.

Finally, depending upon how exacting the other a priori norms of rationality are, there may be little to no work left over for STI to do. For instance, suppose that the other a priori norms of rationality pin down a precise rational credence in every state description. Then, STI would be vacuously satisfied—which is to say, it would impose no constraint at all. There would be no difference between it and a norm which says to spread your credence as unevenly as possible, given the (other) a priori rational norms.

In closing, it’s worth noting that there is another, less conservative, reaction to the incompatibility from section 4. Anti-Humeans may decide to abandon the framework which represents your degrees of confidence with a precise real-valued credence function $ C $ . In its place, they may wish to move to a framework in which your degrees of confidence are represented with a comparative confidence ordering, or a framework in which they are represented with an imprecise probability distribution. For the interested reader, I discuss these alternative reactions in appendix A. In brief: a similar incompatibility arises in both of these alternative frameworks.

6. In summation

In sum: anti-Humeans cannot accept both CD and IP. If they wish to spread their ur-prior credences evenly over each possibility they recognise, then they must not defer to the chances; if they wish to defer to the chances, they cannot spread their ur-prior credences evenly over each of the possibilities they recognise. We could slightly weaken IP to render it compatible with AH and CD, though I personally have a hard time seeing the philosophical motivation for this weakening. There is also an even weaker indifference principle anti-Humeans could satisfy while deferring to the chances. This principle allows an ur-prior credence distribution to be uneven as long as this unevenness is required by some other a priori requirement of rationality. It says merely that your ur-prior credences should be as even as the other requirements of rationality allow them to be. This principle does not conflict with AH and CD. While there may be good reason to endorse this weaker indifference principle, it is not supported by the arguments of Williamson and Pettigrew.

Acknowledgments

Thanks to Kevin Dorst and two anonymous reviewers at this journal for helpful feedback on this material.

J. Dmitri Gallow is senior research fellow at the Dianoia Institute of Philosophy. Before coming to Dianoia in 2020, he was assistant professor at the University of Pittsburgh and a Bersoff Faculty Fellow at New York University. He received his PhD in philosophy from the University of Michigan, Ann Arbor.

Appendix A. Indifference with comparative confidence and imprecision

I have been taking for granted a traditional Bayesian framework in which your degrees of confidence get represented with a precise probability function. However, there are other frameworks available, and these other frameworks afford us different ways of thinking about what’s involved in being ‘indifferent’ between state descriptions, and what’s involved in deferring to the chances. In this appendix, I’ll look at two alternative approaches: an approach which represents rational degrees of confidence with a comparative confidence ordering (section A.1), and an approach which represents rational degrees of confidence with an imprecise probability function (section A.2).

A.1 Comparative confidence

In this section, I’ll introduce ‘comparative confidence orderings.’ Representing rational doxastic states with these orderings allows us to formulate indifference principles which avoid the familiar objections to IP $ {}^{\infty } $ from section B.1.1. However, even with these orderings, we face an analogue of the incompatibility from section 4.

The most general kind of comparative confidence ordering is a conditional comparative confidence ordering. This is a binary relation between pairs of sentences, which we may write ‘ $ \left[A|E\right]\succcurlyeq \left[B|F\right] $ ,’ and give the interpretation that you are at least as confident in $ A $ given $ E $ as you are in $ B $ given $ F $ . From this ordering, we may recover an unconditional comparative confidence ordering by setting the ‘conditioning’ sentences equal to a tautology. That is, we assume that you think $ A $ is not less likely than $ B $ exactly when $ \left[A|\mathrm{T}\right]\succcurlyeq \left[B|\mathrm{T}\right] $ , which I will abbreviate with ‘ $ A\succcurlyeq B $ .’Footnote 10 As usual, we may stipulate that $ \left[A|E\right]\succ \left[B|F\right] $ iff $ \left[A|E\right]\succcurlyeq \left[B|F\right] $ and it’s not the case that $ \left[B|F\right]\succcurlyeq \left[A|E\right] $ . And we may stipulate that $ \left[A|E\right]\approx \left[B|F\right] $ iff $ \left[A|E\right]\succcurlyeq \left[B|F\right] $ and $ \left[B|F\right]\succcurlyeq \left[A|E\right] $ . For my purposes, I’ll only need to assume that $ \succcurlyeq $ is reflexive and transitive, and that $ A\wedge E\succcurlyeq B\wedge E $ whenever $ \left[A|E\right]\succcurlyeq \left[B|E\right] $ .Footnote 11

In this framework, the natural analogue of CD is this:

Comparative Chance Deference (CCD): In the absence of any evidence, for any sentences $ P $ , $ Q $ , and any $ ch $ : if $ ch(P)> ch(Q) $ , then, given that the chance function is $ ch $ , you should be more confident in $ P $ than $ Q $ ,

$$ \left[P|\mathcal{C}h= ch\right]\hskip0.3em \succ \hskip0.3em \left[Q|\mathcal{C}h= ch\right] $$

There are multiple ways we might try to formulate an indifference principle in this framework. Adapting a proposal from Norton (2008), we could say that you should be as confident in any one state description as you are in any other.

Comparative Indifference Principle (CIP): For any two state descriptions $ \omega $ and $ {\omega}^{\ast } $ , in the absence of evidence, you should be as confident in $ \omega $ as you are in $ {\omega}^{\ast } $ ,

$$ \omega \hskip0.3em \approx \hskip0.3em {\omega}^{\ast } $$

CIP is incompatible with CCD whenever there is a pair of state descriptions $ \phi \wedge \mathcal{C}h= ch $ and $ \psi \wedge \mathcal{C}h= ch $ such that $ ch\left(\phi \right)> ch\left(\psi \right) $ . For CCD tells us that

(4) $$ \left[\phi |\mathcal{C}h= ch\right]\hskip0.54em \succ \hskip0.54em \left[\psi |\mathcal{C}h= ch\right] $$

Given our assumptions, it follows from (4) that

$$ \phi \wedge \mathcal{C}h= ch\hskip0.54em \succ \hskip0.42em \psi \wedge \mathcal{C}h= ch $$

But, by CIP, we have that

$$ \phi \wedge \mathcal{C}h= ch\hskip0.3em \approx \hskip0.3em \psi \wedge \mathcal{C}h= ch $$

Contradiction.
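As an informal check on the argument just given, the following Python sketch enumerates every binary relation on two state descriptions and confirms that none of them delivers both the verdict CCD requires (strict preference) and the verdict CIP requires (equal confidence). The labels `phi_and_ch` and `psi_and_ch` and all function names are hypothetical, introduced only for illustration.

```python
from itertools import product

# Two hypothetical state descriptions that share the chance hypothesis Ch = ch.
ITEMS = ("phi_and_ch", "psi_and_ch")
PAIRS = [(a, b) for a in ITEMS for b in ITEMS]


def weakly(rel, a, b):
    """a is weakly preferred to b: you are at least as confident in a as in b."""
    return (a, b) in rel


def strictly(rel, a, b):
    """Strict preference: a weakly beats b, and b does not weakly beat a."""
    return weakly(rel, a, b) and not weakly(rel, b, a)


def equally(rel, a, b):
    """Equal confidence: each weakly beats the other."""
    return weakly(rel, a, b) and weakly(rel, b, a)


def all_relations():
    """Yield every binary relation on ITEMS as a set of ordered pairs."""
    for bits in product([False, True], repeat=len(PAIRS)):
        yield {pair for pair, keep in zip(PAIRS, bits) if keep}


# Relations giving both the CCD verdict (strict) and the CIP verdict (equal).
satisfying_both = [
    rel for rel in all_relations()
    if strictly(rel, *ITEMS) and equally(rel, *ITEMS)
]
```

The list `satisfying_both` comes out empty, which is just the contradiction above restated: strict preference and equal confidence are mutually exclusive by definition.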

Eva (2019) proposes another way of showing indifference. His suggestion is that in the absence of evidence, you shouldn’t have any comparative judgements about one state description being more or less likely than another. That is: rather than saying that any two state descriptions are equally likely, you should say that any two state descriptions are incomparable. Abbreviate ‘neither $ A\succcurlyeq B $ nor $ B\succcurlyeq A $ ’ with ‘ $ A\odot B $ ’. Then, Eva’s proposal is this:

Comparative Indifference Principle′ (CIP′): For any two state descriptions, $ \omega $ and $ {\omega}^{\ast } $ , in the absence of evidence, you should not make any comparative confidence judgments about $ \omega $ and $ {\omega}^{\ast } $ ,

$$ \omega \odot {\omega}^{\ast } $$

CIP′ is also incompatible with CCD for the same reason that CIP is. Again, consider a pair of state descriptions, $ \phi \wedge \mathcal{C}h= ch $ and $ \psi \wedge \mathcal{C}h= ch $ , where $ ch\left(\phi \right)> ch\left(\psi \right) $ . Again, CCD implies that

$$ \phi \wedge \mathcal{C}h= ch\hskip0.54em \succ \hskip0.54em \psi \wedge \mathcal{C}h= ch $$

whereas CIP′ implies that

$$ \phi \wedge \mathcal{C}h= ch\hskip0.54em \odot \hskip0.54em \psi \wedge \mathcal{C}h= ch $$

Contradiction.

A.2 Imprecision

In this section, I’ll introduce imprecise credence functions. Representing rational doxastic states with imprecise credences allows us to formulate indifference principles that avoid the familiar objections to IP $ {}^{\infty } $ from section B.1.1. But imprecise credences will nonetheless give rise to an analogue of the incompatibility from section 4.

An imprecise credence function, $ \mathrm{\mathbb{C}} $ , is just a set of precise credence functions, with the interpretation that your doxastic state has all and only the features shared by every credence function in $ \mathrm{\mathbb{C}} $ . A helpful metaphor: each credence function $ C\in \mathrm{\mathbb{C}} $ is a ‘committee member’ who gets a vote in determining your doxastic state. Your doxastic state has a property iff the committee passes a motion saying that it has the property. Each committee member $ C\in \mathrm{\mathbb{C}} $ votes in favour of a motion saying your doxastic state has a certain property exactly when $ C $ has that property. And the committee only passes a motion when the vote is unanimous. For instance, suppose that, for every real number $ x $ between 1/3 and 2/3, there is a committee member whose credence in $ P $ is $ x $ . Then, the committee unanimously agrees that your confidence in $ P $ is between 1/3 and 2/3, though, when it comes to your confidence in $ P $ , it does not agree on anything stronger than this.Footnote 12

In this framework, it’s natural to impose a principle of chance deference by demanding that every committee member defers to the chances.

Imprecise Chance Deference (ICD): For every sentence $ P $ and every committee member $ C\in \mathrm{\mathbb{C}} $ , $ C $ ’s credence in $ P $ , conditional on the chance function being $ ch $ , is $ ch(P) $ .

$$ C\left(P|\mathcal{C}h= ch\right)= ch(P) $$

Corresponding to the set $ \mathrm{\mathbb{C}} $ is a set-valued function which we can write ‘ $ \mathrm{\mathbb{C}}(P) $ ,’ and which is defined to be the set of all real numbers $ x $ such that for some $ C\in \mathrm{\mathbb{C}} $ , $ C(P)=x $ . Likewise, we can let $ \mathrm{\mathbb{C}}\left(P|Q\right) $ be the set of real numbers $ x $ such that for some $ C\in \mathrm{\mathbb{C}} $ , $ C\left(P\hskip0.2em |\hskip0.2em Q\right)=x $ . In these terms, the principle ICD requires that $ \mathrm{\mathbb{C}}\left(P\hskip0.2em |\hskip0.2em \mathcal{C}h= ch\right)=\left\{ ch(P)\right\} $ .
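To make the committee picture concrete, here is a minimal Python sketch under assumptions of my own choosing: a toy language with one sentence $ A $ , two hypothetical chance hypotheses `ch` and `ch2` (with assumed chances 3/4 and 1/4 for $ A $ ), and a three-member committee whose members differ only in how much credence they give to `ch`. None of these names or numbers come from the text above.

```python
from fractions import Fraction as F

# Hypothetical toy language: one sentence A and two chance hypotheses.
# A state description is a pair (truth value of A, chance hypothesis).
STATES = [(True, "ch"), (False, "ch"), (True, "ch2"), (False, "ch2")]
CHANCE_OF_A = {"ch": F(3, 4), "ch2": F(1, 4)}  # assumed values of ch(A), ch2(A)


def deferring_member(weight_on_ch):
    """Build a precise credence function that defers to both chance hypotheses."""
    weight = {"ch": weight_on_ch, "ch2": 1 - weight_on_ch}
    member = {}
    for a_true, hyp in STATES:
        chance = CHANCE_OF_A[hyp] if a_true else 1 - CHANCE_OF_A[hyp]
        member[(a_true, hyp)] = chance * weight[hyp]
    return member


def credence(member, sentence):
    """C(P), where the sentence P is given as a set of state descriptions."""
    return sum(member[w] for w in sentence)


def conditional(member, sentence, given):
    """C(P | Q) by the ratio formula; undefined when C(Q) = 0."""
    return credence(member, sentence & given) / credence(member, given)


def imprecise(committee, sentence):
    """The set of credences the committee members give to the sentence."""
    return {credence(m, sentence) for m in committee}


def imprecise_conditional(committee, sentence, given):
    """The set of conditional credences across the committee."""
    return {conditional(m, sentence, given) for m in committee}


A = {w for w in STATES if w[0]}
CH = {w for w in STATES if w[1] == "ch"}
committee = [deferring_member(q) for q in (F(1, 5), F(1, 2), F(4, 5))]

spread_on_A = imprecise(committee, A)                        # {7/20, 1/2, 13/20}
spread_on_A_given_ch = imprecise_conditional(committee, A, CH)  # {3/4}
```

In this toy case the committee disagrees about $ A $ unconditionally but agrees unanimously about $ A $ conditional on `ch`, which is the pattern ICD demands.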

Within this framework, your attitudes are maximally undecided exactly when they are maximally imprecise. That is: what it is for you to assume nothing at all about whether $ P $ is for your committee members to agree on nothing at all about your attitude towards $ P $ other than that it lies somewhere between $ 0 $ and $ 1 $ . That is: what it is for you to be maximally undecided about $ P $ is for $ \mathrm{\mathbb{C}}(P) $ to be the unit interval.

In general, indifference principles say that in the absence of evidence, your doxastic state should build in as little information as possible about which state description is true. So, in the imprecise framework, it is natural to formulate an indifference principle by saying that in the absence of evidence, you should be maximally undecided about every state description.

Imprecise Indifference Principle (IIP): For every state description $ \omega $ , in the absence of evidence, your credence in $ \omega $ should be maximally imprecise,

$$ \mathrm{\mathbb{C}}\left(\omega \right)=\left[0,1\right] $$

Assuming there is at least one state description $ \phi \wedge \mathcal{C}h= ch $ such that $ ch\left(\phi \right)=x<1 $ , ICD and IIP are incompatible. For ICD requires every committee member $ C\in \mathrm{\mathbb{C}} $ to give a credence of $ x $ to $ \phi $ , conditional on $ \mathcal{C}h= ch $ . This means that the greatest credence any committee member could give to $ \phi \wedge \mathcal{C}h= ch $ is $ x $ —any greater, and its credence in $ \phi $ , conditional on $ \mathcal{C}h= ch $ , would be greater than $ x $ , in violation of ICD.Footnote 13 So, if you satisfy ICD, you’ll have

$$ \mathrm{\mathbb{C}}\left(\phi \wedge \mathcal{C}h= ch\right)\subseteq \left[0,x\right] $$

However, IIP requires that

$$ \mathrm{\mathbb{C}}\left(\phi \wedge \mathcal{C}h= ch\right)=\left[0,1\right] $$

Since $ x<1 $ , there’s no way to satisfy both of these principles at once.
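The bound used in this argument can be checked numerically. The Python sketch below assumes a hypothetical value $ x=3/4 $ for $ ch\left(\phi \right) $ and lets a committee member's credence in $ \mathcal{C}h= ch $ range over a grid from 0 to 1; the function name and the grid are illustrative assumptions, not part of the argument.

```python
from fractions import Fraction as F


def credence_in_conjunction(x, weight_on_ch):
    """C(phi & Ch=ch) for a member satisfying ICD: x times C(Ch=ch)."""
    return x * weight_on_ch


x = F(3, 4)  # assumed value of ch(phi), chosen only for illustration

# Let C(Ch=ch) range over 0, 1/100, ..., 1. (At weight 0 the conditional
# credence is undefined on the ratio formula, but the conjunction still gets 0.)
weights = [F(k, 100) for k in range(101)]
values = [credence_in_conjunction(x, q) for q in weights]

largest = max(values)                      # reached when C(Ch=ch) = 1
stays_below_x = all(v <= x for v in values)
```

Whatever weight a deferring member gives to the chance hypothesis, its credence in the conjunction never exceeds $ x $ , so the set of committee credences cannot fill out the whole unit interval.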

Joyce (2010, 289–90) proposes another, weaker, way of understanding ‘indifference’ in the imprecise framework. He puts forward an imprecise analogue of the principle I called ‘Symmetry to Indifference’ (STI) in section 3. According to this principle, given any partitionFootnote 14 of evidentially symmetric sentences, your attitude towards these sentences should be symmetric, in the sense that any committee member who deviates from a uniform distribution is ‘balanced out’ by committee members who deviate from the uniform distribution to the same degree, but in different ways. More carefully:

Imprecise Symmetry to Indifference: If $ \mathrm{\mathcal{E}}=\left\{{E}_1,{E}_2,\dots, {E}_N\right\} $ is a partition such that for every $ {E}_i,{E}_j\in \mathrm{\mathcal{E}} $ , $ {E}_i $ and $ {E}_j $ are evidentially symmetric, then for any $ C\in \mathrm{\mathbb{C}} $ and any permutation $ p $ of $ \mathrm{\mathcal{E}} $ , there is some $ {C}^{\ast}\in \mathrm{\mathbb{C}} $ such that for each $ E\in \mathrm{\mathcal{E}} $ , $ {C}^{\ast}\left(p(E)\right)=C(E) $ .

If we combine this principle with the assumption that any two state descriptions are evidentially symmetric (SDS), we get the following.

Imprecise Indifference Principle′ (IIP′): For any $ C\in \mathrm{\mathbb{C}} $ and any permutation of state descriptions, $ p $ , there is some $ {C}^{\ast}\in \mathrm{\mathbb{C}} $ such that for every state description $ \omega $ , $ {C}^{\ast}\left(p\left(\omega \right)\right)=C\left(\omega \right) $ .

Assume that there is a pair of state descriptions, $ {\omega}_{\phi}\overset{\mathrm{def}}{=}\phi \wedge \mathcal{C}h= ch $ and $ {\omega}_{\psi}\overset{\mathrm{def}}{=}\psi \wedge \mathcal{C}h= ch $ such that $ ch\left(\phi \right)=z\cdot ch\left(\psi \right) $ for some $ z\ne 1 $ . Then, IIP′ will be incompatible with ICD. Without loss of generality, suppose that $ z>1 $ , so that $ ch\left(\phi \right)> ch\left(\psi \right) $ . By ICD, for every $ C\in \mathrm{\mathbb{C}} $ , $ C\left({\omega}_{\phi}\right)=z\cdot C\left({\omega}_{\psi}\right) $ . So every committee member gives a higher credence to $ {\omega}_{\phi } $ than they do to $ {\omega}_{\psi } $ .

(5) $$ \forall C\in \mathrm{\mathbb{C}}\hskip2em C\left({\omega}_{\phi}\right)>C\left({\omega}_{\psi}\right) $$

Now, consider a permutation $ p $ which swaps $ {\omega}_{\phi } $ with $ {\omega}_{\psi } $ but maps every other state description to itself. Since there’s some $ C\in \mathrm{\mathbb{C}} $ such that $ C\left({\omega}_{\phi}\right)>C\left({\omega}_{\psi}\right) $ (by 5), IIP′ requires that there’s another $ {C}^{\ast}\in \mathrm{\mathbb{C}} $ such that $ {C}^{\ast}\left({\omega}_{\psi}\right)={C}^{\ast}\left(p\left({\omega}_{\phi}\right)\right)=C\left({\omega}_{\phi}\right) $ and $ {C}^{\ast}\left({\omega}_{\phi}\right)={C}^{\ast}\left(p\left({\omega}_{\psi}\right)\right)=C\left({\omega}_{\psi}\right) $ . So IIP′ requires that

(6) $$ \exists {C}^{\ast}\in \mathrm{\mathbb{C}}\hskip2em {C}^{\ast}\left({\omega}_{\psi}\right)>{C}^{\ast}\left({\omega}_{\phi}\right) $$

But (5) and (6) contradict each other.
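The permutation step can be illustrated with a small Python sketch. It assumes a hypothetical ratio $ z=2 $ and a single committee member whose credences are chosen by me for illustration; the key `other` lumps together every state description besides the two being swapped.

```python
from fractions import Fraction as F

Z = F(2)  # assumed ratio ch(phi) / ch(psi), e.g. ch(phi) = 2/3 and ch(psi) = 1/3


def respects_icd_ratio(member):
    """ICD implies C(omega_phi) = z * C(omega_psi) for every committee member."""
    return member["omega_phi"] == Z * member["omega_psi"]


def permuted(member, a, b):
    """Return C* with C*(p(w)) = C(w), where p swaps a and b and fixes the rest."""
    swapped = dict(member)
    swapped[a], swapped[b] = member[b], member[a]
    return swapped


# A hypothetical member satisfying the ICD ratio; 'other' stands in for
# all remaining state descriptions taken together.
member = {"omega_phi": F(2, 5), "omega_psi": F(1, 5), "other": F(2, 5)}

# The committee member that the permutation requirement demands alongside `member`.
twin = permuted(member, "omega_phi", "omega_psi")
```

Here `member` respects the ratio that ICD imposes, while its permuted twin gives more credence to $ {\omega}_{\psi } $ than to $ {\omega}_{\phi } $ and so cannot respect it. A committee containing both cannot satisfy ICD.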

B. Infinite indifference to anti-Humean chances

In this appendix, I will discuss whether the incompatibility between deference to the chances, indifference, and anti-Humeanism extends to contexts in which there are infinitely many state descriptions (or possible worlds). In section B.1, I introduce an infinitary analogue of IP, IP $ {}^{\infty } $ . In section B.2, I introduce an infinitary analogue of CD, CD $ {}^{\infty } $ . Then, in section B.3, I introduce an infinitary analogue of AH, AH $ {}^{\infty } $ , and I argue that accepting IP $ {}^{\infty } $ , CD $ {}^{\infty } $ , and AH $ {}^{\infty } $ leads to a contradiction.

B.1 Infinite indifference

If there are countably many atomic sentences, then a state description will be an infinitary conjunction, and there will be continuum-many state descriptions. If there are uncountably many state descriptions, then any nontrivial indifference principle will require us to impose additional structure on the set $ \Omega $ . We find some random variable, $ V $ , which maps every state description $ \omega \in \Omega $ to some real number, $ V\left(\omega \right)\in \mathrm{\mathbb{R}} $ . Then, we can assign to each value $ v $ in the range of the variable $ V $ a ‘credence density,’ $ {\rho}_V(v) $ . This density function doesn’t say what your credence that $ V=v $ is.Footnote 15 If you abide by IP, your credence that $ V $ takes on any particular value, $ v $ , will have to be zero. Instead, $ {\rho}_V(v) $ says how dense your credence is at $ V=v $ . Think about it like this: for any narrow interval $ \left[v,v+\varepsilon \right] $ , the ratio $ C\left(V\in \left[v,v+\varepsilon \right]\right)/\varepsilon $ is the density of your credence over the interval $ \left[v,v+\varepsilon \right] $ . By taking the limit of this ratio as $ \varepsilon $ goes to zero, we get the density of your credence at the point $ V=v $ , $ {\rho}_V(v) $ .

With a credence density function $ {\rho}_V $ , we can determine your credence distribution by integrating over $ {\rho}_V $ . For instance, your credence that $ V $ is between $ a $ and $ b $ will be given by $ {\int}_a^b{\rho}_V(v)\hskip0.2em \mathrm{d}v $ . And, in general, for any measurable set of values $ \mathbf{v} $ , your credence that $ V $ is within $ \mathbf{v} $ is given by $ {\int}_{\mathbf{v}}{\rho}_V(v)\hskip0.2em \mathrm{d}v $ .Footnote 16 Then, indifference may be implemented by saying that your credences should have a uniform density. That is: every value of $ v $ should have exactly the same credence density.

Infinitary Indifference Principle (IP $ {}^{\infty } $ ): Your credence density should be uniform.

For instance: consider a random variable $ U $ that tells us what percentage of space is unoccupied. $ U $ can take on values between $ 0 $ and $ 1 $ . Then, indifference requires that the density of your credence should be uniform over these values. This uniform credence density is shown in figure 1.

Figure 1. The uniform credence density over $ U $ . Your credence that $ U $ lies in the set $ \mathbf{u}=\left[1/4,1/2\right]\cup \left[3/4,1\right] $ is given by the integral $ {\int}_{\mathbf{u}}{\rho}_U(u)\hskip0.3em \mathrm{d}u $ , which is the area under the curve $ {\rho}_U(u) $ shown in grey.
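For readers who want to see the integral in the caption evaluated, here is a rough Python sketch using a simple midpoint rule. The function names and the number of integration steps are my own assumptions; any standard numerical integrator would do as well.

```python
def uniform_density(u):
    """The uniform credence density over U, which takes values in [0, 1]."""
    return 1.0 if 0.0 <= u <= 1.0 else 0.0


def integrate(density, a, b, steps=10_000):
    """Midpoint-rule approximation of the integral of density from a to b."""
    width = (b - a) / steps
    return sum(density(a + (i + 0.5) * width) for i in range(steps)) * width


def credence_in_union(density, intervals):
    """Credence that the variable lies in a union of disjoint intervals."""
    return sum(integrate(density, a, b) for a, b in intervals)


# Credence that U lies in [1/4, 1/2] or [3/4, 1] under the uniform density.
credence = credence_in_union(uniform_density, [(0.25, 0.5), (0.75, 1.0)])
```

Under the uniform density, the credence is simply the total length of the two intervals, namely 1/2, up to floating-point error.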

B.1.1 Familiar objections to IP $ {}^{\infty } $

In this subsection, I’ll briefly review some familiar objections to IP $ {}^{\infty } $ that are orthogonal to my interests here. The uninterested reader should skip ahead to section B.2.

If $ \Omega $ is infinite, then there will inevitably be more than one way of parametrising the state descriptions in $ \Omega $ . For instance, consider the variable $ R $ , which gives the ratio of unoccupied space to occupied space. Like $ U $ , $ R $ maps each state description to some real number.Footnote 17 However, unlike $ U $ , $ R $ raises two pressing issues for IP $ {}^{\infty } $ . The first issue is that unlike $ U $ , the potential values of $ R $ are unbounded: $ R $ could take on any value from $ 0 $ to $ \infty $ . So, if we demand that the density of your credence is uniform over $ R $ , then we will run into a conflict with normalisation. For the uniform density over $ R $ will either be positive for each $ r $ or else it will be zero for each $ r $ . If positive, $ {\rho}_R(r)=\alpha >0 $ , then your credence in $ \Omega $ will be $ {\int}_0^{\infty}\alpha \hskip0.2em \mathrm{d}r=\infty $ . If zero, then your credence in $ \Omega $ will be $ {\int}_0^{\infty }0\hskip0.2em \mathrm{d}r=0 $ . Either way, you will violate normalisation. In response, defenders of IP $ {}^{\infty } $ could allow that even if a perfectly uniform credence density is impossible, the density of your credences should still be sufficiently uniform.Footnote 18

The second issue: once it has pronounced on your credence density over $ U $ , IP $ {}^{\infty } $ has already pronounced on your credence density over $ R $ . For there is a logical relationship between the values of $ U $ and $ R $ : necessarily, $ R=U/\left(1-U\right) $ . But this means that a uniform credence density over $ U $ induces the following credence density over $ R $ : for each $ r\geqslant 0 $ , $ {\rho}_R(r)={\left(1+r\right)}^{-2} $ (if $ r<0 $ , then $ {\rho}_R(r)=0 $ ). This credence density is shown in figure 2. The second problem is just that this density is far from uniform. With this density function, your credence that $ R $ is between 0 and $ n $ is given by $ n/\left(n+1\right) $ . So, you will be 90% confident that $ R $ is between 0 and 9 and 99% confident that $ R $ is between 0 and 99.Footnote 19

Figure 2. A uniform credence density over $ U $ induces a nonuniform credence density over $ R=U/\left(1-U\right) $ .
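The induced density and the two confidence figures quoted above can be checked with a short Python sketch. It integrates $ {\left(1+r\right)}^{-2} $ numerically with a midpoint rule and compares the result with the closed form $ n/\left(n+1\right) $ ; the function names and step count are illustrative assumptions.

```python
def induced_density(r):
    """Density over R = U / (1 - U) induced by a uniform density over U."""
    return (1.0 + r) ** -2 if r >= 0 else 0.0


def credence_below(n, steps=100_000):
    """Midpoint-rule approximation of the credence that R lies between 0 and n."""
    width = n / steps
    return sum(induced_density((i + 0.5) * width) for i in range(steps)) * width


def closed_form(n):
    """The credence that R is between 0 and n, as stated in the text."""
    return n / (n + 1)


# Compare the numerical integral with the closed form for n = 9 and n = 99.
checks = {n: (credence_below(n), closed_form(n)) for n in (9, 99)}
```

The numerical values agree with 0.9 and 0.99 to within the integration error, which matches the claim that most of your credence is concentrated on small values of $ R $ .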

Either IP $ {}^{\infty } $ should be applied to multiple parametrisations or else there is one privileged parametrisation to which it should be applied. In the first case, IP $ {}^{\infty } $ is outright inconsistent. In the second case, the principle is either language-dependent or arbitrary. Arbitrariness and language-dependence are better than contradiction, so I’ll suppose that defenders of IP $ {}^{\infty } $ think that there is some privileged parametrisation,Footnote 20 or that the requirements of rationality are language-dependent.Footnote 21

B.2 Infinite chances

Suppose you want your credences to be defined over uncountably many sentences of the form $ \mathcal{C}h(P)=x $ —one for each of the uncountably many real numbers $ x $ between 0 and 1. Then, as long as your credences are real valued, you’ll have to assign a credence of zero to uncountably many of the sentences $ \mathcal{C}h(P)=x $ . If your credence in $ \mathcal{C}h(P)=x $ is zero, then the product rule will not impose any constraint on the relationship between $ C\left(P\hskip0.2em |\hskip0.2em \mathcal{C}h(P)=x\right) $ and $ C\left(P\wedge \mathcal{C}h(P)=x\right) $ . Lewis was not concerned with this, because he allowed rational credences to take on infinitesimal values.Footnote 22 So he thought that even when you’re spreading your credences over uncountably many state descriptions, you needn’t give a credence of zero to any of them. If we agree with him about this, then perhaps CD is already general enough. But I’ve been persuaded that Lewis was wrong to rely upon infinitesimals.Footnote 23 If, like me, you want your credences to be real-valued, then you should be looking for a natural generalisation of CD for the case where you have credences over uncountably many chance sentences.

Even if your credence that the chance of $ P $ is $ x $ will be zero for any particular choice of $ x $ , your credence that the chance of $ P $ lies within an interval of values $ \left[x,x+\varepsilon \right] $ (with $ \varepsilon >0 $ ) can be nonzero, no matter how small the interval $ \left[x,x+\varepsilon \right] $ . So a natural generalisation of CD says that a rational ur-prior credence in $ P $ , given that the chance of $ P $ lies in some interval $ \left[x,x+\varepsilon \right] $ , is within the interval $ \left[x,x+\varepsilon \right] $ :

Infinitary Chance Deference (CD $ {}^{\infty } $ ): Your credence that $ P $ , given that the chance of $ P $ is between $ x $ and $ x+\varepsilon $ , should be between $ x $ and $ x+\varepsilon $ (for any $ \varepsilon >0 $ ).

$$ x\leqslant C\left(P|\mathcal{C}h(P)\in \left[x,x+\varepsilon \right]\right)\leqslant x+\varepsilon $$

If your credence in $ P $ , given $ \mathcal{C}h(P)\in \left[x,x+\varepsilon \right] $ , is in the interval $ \left[x,x+\varepsilon \right] $ , then your credence in $ \neg P $ , given $ \mathcal{C}h(P)\in \left[x,x+\varepsilon \right] $ , is within the interval $ \left[1-x-\varepsilon, 1-x\right] $ :

$$ 1-x-\varepsilon \leqslant C\left(\neg P|\mathcal{C}h(P)\in \left[x,x+\varepsilon \right]\right)\leqslant 1-x $$

As long as $ C\left(\mathcal{C}h(P)\in \left[x,x+\varepsilon \right]\right)>0 $ for any positive $ \varepsilon $ no matter how small, it then follows from the product rule that for any $ \varepsilon >0 $ ,

$$ {\displaystyle \begin{array}{l}\hskip1.24em C\left(P\wedge \mathcal{C}h(P)\in \left[x,x+\varepsilon \right]\right)\leqslant \frac{x+\varepsilon }{1-x-\varepsilon}\cdot C\left(\neg P\wedge \mathcal{C}h(P)\in \left[x,x+\varepsilon \right]\right)\\ {}\mathrm{and}\hskip0.45em C\left(P\wedge \mathcal{C}h(P)\in \left[x,x+\varepsilon \right]\right)\geqslant \frac{x}{1-x}\cdot C\left(\neg P\wedge \mathcal{C}h(P)\in \left[x,x+\varepsilon \right]\right)\end{array}} $$

Divide both sides of these inequalities by $ \varepsilon $ and take the limit as $ \varepsilon $ goes to zero. Thereby, we get that the density of your credence in the conjunction $ P\wedge \mathcal{C}h(P)=x $ must be $ x/\left(1-x\right) $ times the density of your credence in the conjunction $ \neg P\wedge \mathcal{C}h(P)=x $ ,

(7) $$ \rho \left(P\wedge \mathcal{C}h(P)=x\right)=\frac{x}{1-x}\cdot \rho \left(\neg P\wedge \mathcal{C}h(P)=x\right) $$

Equation 7 follows from CD $ {}^{\infty } $ . It will be important in section B.3 below.
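The limiting step can be illustrated with exact arithmetic. The Python sketch below assumes, purely for illustration, that your credence density over the chance of $ P $ is $ f(t)=2t $ and that you defer to the chances, so that the credence in $ P $ together with the chance lying in an interval is the integral of $ t\cdot f(t) $ over that interval. The choice of $ f $ , the point $ x=1/4 $ , and the function names are all assumptions of mine.

```python
from fractions import Fraction as F


def mass_P(x, eps):
    """C(P & Ch(P) in [x, x+eps]): integral of t * f(t), with assumed f(t) = 2t."""
    hi = x + eps
    return F(2, 3) * (hi**3 - x**3)


def mass_not_P(x, eps):
    """C(not-P & Ch(P) in [x, x+eps]): integral of (1 - t) * f(t), f(t) = 2t."""
    hi = x + eps
    return (hi**2 - x**2) - F(2, 3) * (hi**3 - x**3)


x = F(1, 4)
ratios, within_bounds = [], []
for k in range(1, 6):
    eps = F(1, 10**k)
    ratio = mass_P(x, eps) / mass_not_P(x, eps)
    lower = x / (1 - x)                    # bound from the second inequality
    upper = (x + eps) / (1 - x - eps)      # bound from the first inequality
    ratios.append(ratio)
    within_bounds.append(lower <= ratio <= upper)

limit = x / (1 - x)  # the factor x / (1 - x) appearing in equation 7
```

For each $ \varepsilon $ the ratio of the two credences sits between the two bounds, and as $ \varepsilon $ shrinks the ratio closes in on $ x/\left(1-x\right)=1/3 $ , as equation 7 says it should.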

B.3 Incompatibility

To keep matters simple, let’s suppose that there is just a single atom, $ A $ . Then, we may have for each $ x\in \left[0,1\right] $ a chance hypothesis $ \mathcal{C}h={ch}_x $ , where $ {ch}_x $ is a probability function defined over the sentences we get by taking the set $ \left\{A\right\} $ and closing it under negation and disjunction. Every such sentence will be equivalent to one of the following four: (1) $ A\wedge \neg A $ , (2) $ A\vee \neg A $ , (3) $ \neg A $ , and (4) $ A $ . Since chance is a probability function, we must have $ {ch}_x\left(A\wedge \neg A\right)=0 $ , $ {ch}_x\left(A\vee \neg A\right)=1 $ , and $ {ch}_x\left(\neg A\right)=1-{ch}_x(A) $ . So we may characterise each potential chance function $ {ch}_x $ with a single parameter, $ x $ , which is the probability $ {ch}_x $ assigns to the atom $ A $ .

In this context, I will take ‘anti-Humeanism’ to be the following thesis:

Infinitary Anti-Humeanism (AH $ {}^{\infty } $ ): For each $ x\in \left[0,1\right] $ , there are two corresponding state descriptions: $ A\wedge \mathcal{C}h={ch}_x $ and $ \neg A\wedge \mathcal{C}h={ch}_x $ .

In that case, there are uncountably many state descriptions in $ \Omega $ . To apply IP then, we must first parametrise these state descriptions by using an appropriate random variable from $ \Omega $ to $ \mathrm{\mathbb{R}} $ . We can encode the information of which chance hypothesis is true with a variable $ \mathcal{C}{h}_A $ , which maps a state description $ \omega \in \Omega $ to $ x $ iff the chance hypothesis $ \mathcal{C}h={ch}_x $ is included in $ \omega $ . But this variable on its own doesn’t tell us everything. Besides the chance of $ A $ , we also need to know whether $ A $ is true or false. I will encode this information with a variable $ {\mathbf{2}}_A $ , which maps a state description $ \omega \in \Omega $ to the value $ 2 $ iff $ A $ is included in $ \omega $ , and maps $ \omega $ to $ 0 $ if $ \neg A $ is included in $ \omega $ . We can then put these two pieces of information together with a variable $ V=\mathcal{C}{h}_A+{\mathbf{2}}_A $ . $ V $ tells us everything there is to tell about both what the chance of $ A $ is and whether $ A $ is true or false. If $ V $ is between 0 and 1, then $ A $ is false and the chance of $ A $ is the value of $ V $ . If $ V $ is between 2 and 3, then $ A $ is true and the chance of $ A $ is $ V-2 $ .
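The parametrisation by $ V $ amounts to a simple encoding, which the following Python sketch spells out. The function names `encode` and `decode` are hypothetical; the sketch only restates the mapping described above.

```python
def encode(a_is_true, chance_of_a):
    """Map a state description (truth value of A, chance of A) to its value of V."""
    if not 0 <= chance_of_a <= 1:
        raise ValueError("the chance of A must lie in [0, 1]")
    return chance_of_a + (2 if a_is_true else 0)


def decode(v):
    """Recover (truth value of A, chance of A) from a value of V."""
    if 0 <= v <= 1:
        return (False, v)       # A is false and the chance of A is v
    if 2 <= v <= 3:
        return (True, v - 2)    # A is true and the chance of A is v - 2
    raise ValueError("V only takes values in [0, 1] or [2, 3]")


# Round trip for two sample state descriptions.
first = decode(encode(True, 0.25))
second = decode(encode(False, 0.75))
```

Because the two ranges $ \left[0,1\right] $ and $ \left[2,3\right] $ do not overlap, no information is lost: each value of $ V $ picks out exactly one state description.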

What IP $ {}^{\infty } $ says will depend upon how we parametrise the state descriptions. (See the discussion from section B.1.1.) The parametrisation I’ve chosen here in terms of $ V $ is meant to be as natural as possible. It cleanly gives us exactly the information of whether $ A $ is true and what $ A $ ’s chance is, and the uniform distribution over that chance corresponds to the standard Lebesgue measure. Applying IP $ {}^{\infty } $ to this very natural parametrisation, it tells you to have the uniform credence density shown in figure 3.

Figure 3. The uniform density over $ V=\mathcal{C}{h}_A+{\mathbf{2}}_A $

But this is incompatible with CD $ {}^{\infty } $ . For CD $ {}^{\infty } $ requires that, for any $ v $ between 0 and 1,

(8) $$ {\rho}_V\left(v+2\right)=\frac{v}{1-v}\cdot {\rho}_V(v) $$

(Equation 8 follows from equation 7, which itself follows from CD $ {}^{\infty } $ , as we saw in section B.2.) But the uniform credence density shown in figure 3 sets $ {\rho}_V\left(v+2\right)={\rho}_V(v)=1/2 $ for every value of $ v $ between 0 and 1. So the uniform credence density will violate equation 8 for every value of $ v $ other than $ v=1/2 $ . So the uniform credence density violates CD $ {}^{\infty } $ .
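The violation can be confirmed point by point with exact arithmetic. The Python sketch below tests equation 8 on a grid of values of $ v $ strictly between 0 and 1, first for the uniform density and then, as a contrast, for one hypothetical density of my own construction that does satisfy equation 8 (it gives density $ 1-v $ to $ V=v $ and density $ v $ to $ V=v+2 $ ). The grid and the contrasting density are illustrative assumptions.

```python
from fractions import Fraction as F


def uniform_density_V(v):
    """The uniform density over V: 1/2 on [0, 1] and on [2, 3], zero elsewhere."""
    return F(1, 2) if (0 <= v <= 1 or 2 <= v <= 3) else F(0)


def deferring_density_V(v):
    """A hypothetical density that satisfies equation 8 (assumed for contrast)."""
    if 0 <= v <= 1:
        return 1 - v            # density of not-A together with chance v
    if 2 <= v <= 3:
        return v - 2            # density of A together with chance v - 2
    return F(0)


def satisfies_equation_8(density, v):
    """Check rho(v + 2) = v / (1 - v) * rho(v) for a given v in (0, 1)."""
    return density(v + 2) == v / (1 - v) * density(v)


grid = [F(k, 20) for k in range(1, 20)]  # 1/20, 2/20, ..., 19/20

uniform_ok = [v for v in grid if satisfies_equation_8(uniform_density_V, v)]
deferring_ok = [v for v in grid if satisfies_equation_8(deferring_density_V, v)]
```

On this grid the uniform density satisfies equation 8 only at $ v=1/2 $ , whereas the contrasting density satisfies it everywhere, at the cost of being far from uniform.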

Footnotes

1 We may take each state description $ \omega \in \Omega $ to correspond to a possible world $ w\in \mathcal{W} $ . Each sentence is equivalent to some disjunction of state descriptions. So we may associate each sentence with the set of state descriptions in this disjunction, which we may in turn associate with a set of possible worlds, or a proposition. While this translation scheme gives us a surjective function from sentences to propositions (sets of possible worlds), the function is not a bijection. For there will be multiple sentences translated to the same proposition. Even so, any two sentences translated to the same proposition are equivalent. Since your ur-prior is a probability, it assigns equivalent sentences the same probability. Consider the equivalence classes of equivalent sentences. The proposed translation establishes a bijection between propositions and these equivalence classes. So the probability which an ur-prior gives to a proposition (a set of possible worlds) will correspond to the probability which an ur-prior gives to any sentence in the corresponding equivalence class. So we may go back and forth between the two frameworks.

2 Here we face a choice point. We could either take the potential chance functions ch to be defined only over the sentences in the language generated from the atoms $ {A}_1,{A}_2,\dots, {A}_N $ , or we could take them to be defined over every sentence in the language. If the objective chances may be uncertain about what the objective chances are, then this second option leads to cardinality worries—since, in general, the space of possible probability distributions over $ \Omega $ is larger than $ \Omega $ . However, anti-Humeans should be happy to assume that the objective chances are certain of what the objective chances are, so that, for each potential chance function ch, $ ch\left(\mathcal{C}h= ch\right)=1 $ . (This follows from Lewis’s ‘principal principle.’) This means that, even if we take the objective chances to be defined over sentences like ‘ $ \mathcal{C}h= ch $ ’, there will be exactly one such distribution for each function ch, and we avoid cardinality concerns.

3 For alternative chance deference principles, see Hall (1994), Ismael (2008), Levinstein (forthcoming), and Dorst et al. (2021), for instance.

4 I will assume that $ C $ is a probability, by which I mean: (1) $ C(P)\geqslant 0 $ for every $ P $ ; (2) if $ P $ is a priori knowable, then $ C(P)=1 $ ; and (3) if it is a priori knowable that no two of $ {P}_1,{P}_2,\dots $ are true at once, then $ C\left({P}_1\vee {P}_2\vee \dots \right)=C\left({P}_1\right)+C\left({P}_2\right)+\dots $ .

5 Lewis assumes that the arguments of your credence function are propositions. Since I’m assuming here that the arguments of your credence function are sentences, I’ve slightly emended his principal principle. Given the translation scheme from section 3, the formulation is equivalent.

6 Williamson (2010, 62, 65). See Williamson (2010, sec. 3.4.4) for more.

7 Pettigrew (2016, 164). See Pettigrew (2016, part III) for more.

8 For the curious: this is what it is for $ \mathrm{\mathcal{F}} $ to be rank complete: if there is a proposition $ P\in \mathrm{\mathcal{F}} $ that contains $ N $ possible worlds, then every other set of $ N $ worlds is also included in $ \mathrm{\mathcal{F}} $ .

9 I believe that Williamson (2010) also endorses the principle I’ve called ‘IP,’ though this is more difficult to establish exegetically since it hinges upon whether Williamson understands ‘evidence’ to include a priori knowledge, and the text says very little about evidence. In any case, Williamson rejects CD, accepting instead a diachronic norm which says that, upon learning that the chance of $ P $ is $ x $, your credence in $ P $ should be $ x $. This means, by the way, that Williamson rejects the diachronic norm of conditionalisation.

10 See Fine (1973, chap. 2), and the references contained therein.

11 This follows from Fine’s qcc7 (1973, 30).

12 For more comprehensive and thorough introductions to imprecise credences, see van Fraassen (1990, 2006), Walley (1991), Seidenfeld and Wasserman (1993), Joyce (2010), Schoenfield (2017), and Moss (2020), among others.

13 More carefully: $ C\left(\phi |\mathcal{C}h= ch\right) $ is the ratio $ C\left(\phi \wedge \mathcal{C}h= ch\right)/C\left(\mathcal{C}h= ch\right) $ . Since ICD says this ratio must be equal to $ x $ , we have that $ C\left(\phi \wedge \mathcal{C}h= ch\right)=x\cdot C\left(\mathcal{C}h= ch\right) $ . By setting $ C\left(\mathcal{C}h= ch\right) $ equal to 1, we may set $ C\left(\phi \wedge \mathcal{C}h= ch\right) $ equal to $ x $ , but if $ C\left(\mathcal{C}h= ch\right) $ is any lower than 1, $ C\left(\phi \wedge \mathcal{C}h= ch\right) $ will be less than $ x $ .

14 For our purposes, we can take a partition to be a set of sentences such that no sentence in the set is knowable a priori to be false, and such that it is knowable a priori that the set contains exactly one truth.

15 Notation: ‘ $ V=v $ ’ is the disjunction of state descriptions which $ V $ maps to $ v $ , $ V=v:= {\vee}_{\omega \in \Omega :V\left(\omega \right)=v}\omega $ .

16 In general, we could characterise the possibilities in $ \Omega $ with any finite number of real-valued variables, $ {V}_1,{V}_2,\dots, {V}_N $ . Then, instead of having a density function on $ \mathrm{\mathbb{R}} $ , we’d have a density function on $ {\mathrm{\mathbb{R}}}^N $ . However, we won’t require these additional complications here.

17 More carefully, it maps each possibility to an extended real number—if no space is occupied, then we will stipulate that $ R=\infty $ .

18 Cf. Williamson (2010).

19 Versions of this problem appear in Bertrand (1889). For more recent philosophical discussion, see van Fraassen (1989).

20 This is the route taken by White (2009), though White does not endorse IP$ {}^{\infty } $ (see section 5 above).

21 This is the route taken by Williamson (2010).

22 See Lewis (1980, 267–68).

23 See Williamson (2007), Easwaran (2014), and Hájek (Ms., section 7).

References

Bertrand, Joseph. 1889. Calcul des Probabilités. Paris: Gauthier-Villars.
Dorst, Kevin, Levinstein, Benjamin A., Salow, Bernhard, Husic, Brooke E., and Fitelson, Branden. 2021. “Deference Done Better.” Philosophical Perspectives 35 (1): 99–150.
Easwaran, Kenny. 2014. “Regularity and Hyperreal Credences.” The Philosophical Review 123 (1): 1–41.
Eva, Benjamin. 2019. “Principles of Indifference.” The Journal of Philosophy 116 (7): 390–411.
Fine, Terrence L. 1973. Theories of Probability: An Examination of Foundations. New York: Academic Press.
Hájek, Alan. Ms. “Staying Regular?” http://hplms.berkeley.edu/HajekStayingRegular.pdf.
Hall, Ned. 1994. “Correcting the Guide to Objective Chance.” Mind 103 (412): 505–17.
Ismael, Jenann. 2008. “Raid! Dissolving the Big, Bad Bug.” Noûs 42 (2): 292–307.
Jaynes, E. T. 1957. “Information Theory and Statistical Mechanics.” Physical Review 106 (4): 620–30.
Joyce, James M. 2010. “A Defense of Imprecise Credences in Inference and Decision Making.” Philosophical Perspectives 24 (1): 281–323.
Keynes, John Maynard. 1921. A Treatise on Probability. New York: Harper & Row.
Levinstein, Ben. Forthcoming. “Accuracy, Deference, and Chance.” The Philosophical Review.
Lewis, David K. 1980. “A Subjectivist’s Guide to Objective Chance.” In Studies in Inductive Logic and Probability, Volume II, edited by Jeffrey, Richard C., 263–93. Berkeley: University of California Press.
Lewis, David K. 1994. “Humean Supervenience Debugged.” Mind 103 (412): 473–90.
Moss, Sarah. 2020. “Global Constraints on Imprecise Credences: Solving Reflection Violations, Belief Inertia, and Other Puzzles.” Philosophy and Phenomenological Research 103 (3): 620–38.
Norton, John D. 2008. “Ignorance and Indifference.” Philosophy of Science 75 (1): 45–68.
Pettigrew, Richard. 2016. Accuracy and the Laws of Credence. Oxford: Oxford University Press.
Schoenfield, Miriam. 2017. “The Accuracy and Rationality of Imprecise Credences.” Noûs 51 (4): 667–85.
Seidenfeld, Teddy, and Wasserman, Larry. 1993. “Dilation for Sets of Probabilities.” Annals of Statistics 21: 1139–54.
van Fraassen, Bas C. 1989. Laws and Symmetry. Oxford: Oxford University Press.
van Fraassen, Bas C. 1990. “Figures in a Probability Landscape.” In Truth or Consequences: Essays in Honor of Nuel Belnap, edited by Dunn, J. Michael and Gupta, Anil, 345–56. Dordrecht, Nether.: Kluwer Academic Publishers.
van Fraassen, Bas C. 2006. “Vague Expectation Value Loss.” Philosophical Studies 127: 483–91.
Walley, Peter. 1991. Statistical Reasoning with Imprecise Probabilities. London: Chapman & Hall.
White, Roger. 2009. “Evidential Symmetry and Mushy Credence.” In Oxford Studies in Epistemology, edited by Gendler, Tamar Szabo and Hawthorne, John, 161–86. Oxford: Oxford University Press.
Williamson, Jon. 2010. In Defense of Objective Bayesianism. Oxford: Oxford University Press.
Williamson, Timothy. 2007. “How Probable Is an Infinite Sequence of Heads?” Analysis 67 (3): 173–80.

Figure 1. The uniform credence density over $ U $. Your credence that $ U $ lies in the set $ \mathbf{u}=\left[1/4,1/2\right]\cup \left[3/4,1\right] $ is given by the integral $ {\int}_{\mathbf{u}}{\rho}_U(u)\hskip0.3em \mathrm{d}u $, which is the area under the curve $ {\rho}_U(u) $ shown in grey.
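
As a worked check of this caption, suppose (as an assumption made here for illustration) that $ U $ takes values in $ \left[0,1\right] $ and that the uniform density is $ {\rho}_U(u)=1 $ on that interval. The grey area can then be computed directly:

```latex
\begin{align*}
\int_{\mathbf{u}} \rho_U(u)\,\mathrm{d}u
  &= \int_{1/4}^{1/2} 1\,\mathrm{d}u + \int_{3/4}^{1} 1\,\mathrm{d}u \\
  &= \left(\tfrac{1}{2}-\tfrac{1}{4}\right) + \left(1-\tfrac{3}{4}\right) \\
  &= \tfrac{1}{2}.
\end{align*}
```

On that assumption, your credence that $ U $ lies in $ \mathbf{u} $ is $ 1/2 $, which is just the total length of the two intervals.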

Figure 2. A uniform credence density over $ U $ induces a nonuniform credence density over $ R=U/\left(1-U\right) $.
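
One way to see why the induced density is nonuniform is the standard change-of-variables formula for densities. The sketch below assumes that $ U $ is uniform on $ \left[0,1\right) $ with $ {\rho}_U(u)=1 $, and writes $ {\rho}_R $ for the induced density over $ R $; both the domain and the name $ {\rho}_R $ are introduced here for illustration.

```latex
\begin{align*}
R = \frac{U}{1-U} \quad&\Longrightarrow\quad U = \frac{R}{1+R},
  \qquad \frac{\mathrm{d}u}{\mathrm{d}r} = \frac{1}{(1+r)^2}, \\
\rho_R(r) &= \rho_U\!\left(\frac{r}{1+r}\right)\left|\frac{\mathrm{d}u}{\mathrm{d}r}\right|
  = \frac{1}{(1+r)^2}, \qquad r \geqslant 0, \\
\int_0^{1} \rho_R(r)\,\mathrm{d}r &= \left[-\frac{1}{1+r}\right]_0^{1} = \frac{1}{2}.
\end{align*}
```

Under these assumptions, the credence that $ R\leqslant 1 $ is $ 1/2 $, which matches the credence that $ U\leqslant 1/2 $, even though $ {\rho}_R(r) $ is not constant and falls off as $ r $ grows.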

Figure 3. The uniform density over $ V=\mathcal{C}{h}_A+{\mathbf{2}}_A $