
NON-FACTIVE KOLMOGOROV CONDITIONALIZATION

Published online by Cambridge University Press:  31 October 2023

MICHAEL RESCORLA*
Affiliation:
DEPARTMENT OF PHILOSOPHY UNIVERSITY OF CALIFORNIA, LOS ANGELES 390 PORTOLA PLAZA LOS ANGELES, CA 90095, USA

Abstract

Kolmogorov conditionalization is a strategy for updating credences based on propositions that have initial probability 0. I explore the connection between Kolmogorov conditionalization and Dutch books. Previous discussions of the connection rely crucially upon a factivity assumption: they assume that the agent updates credences based on true propositions. The factivity assumption discounts cases of misplaced certainty, i.e., cases where the agent invests credence 1 in a falsehood. Yet misplaced certainty arises routinely in scientific and philosophical applications of Bayesian decision theory. I prove a non-factive Dutch book theorem and converse Dutch book theorem for Kolmogorov conditionalization. The theorems do not rely upon the factivity assumption, so they establish that Kolmogorov conditionalization has unique pragmatic virtues that persist even in cases of misplaced certainty.

Type
Research Article
Copyright
© The Author(s), 2023. Published by Cambridge University Press on behalf of The Association for Symbolic Logic

1. Conditioning on a probability zero proposition

Bayesian decision theory models how the credences of an idealized rational agent evolve over time. Suppose that the agent has initial credences P and subsequently transitions to new credences ${P}_{new}$ . The agent has conditionalized on E when

  (1) ${P}_{new}(.)=P(.|E)$,

where $P(H|E)$ is the initial conditional probability of H given E. A conditioning proposition is a proposition on which the agent has conditionalized.

Philosophers usually confine attention to scenarios where the conditioning proposition E has non-zero initial probability. Conditional probabilities then satisfy the ratio formula

$$\begin{align*}P(H|E)=\frac{P(H\&E)}{P(E)},\end{align*}$$

which yields the following recipe for conditionalizing on E:

  (2) ${P}_{new}(H)=\dfrac{P(H\&E)}{P(E)}.$

When an agent updates her credences through (2), I will say that she engages in ratio conditionalization. Using ratio conditionalization, we can model a wide range of credal reallocation scenarios.
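As a minimal sketch of recipe (2), assume a finite outcome space on which every probability is an exact fraction. The die model and the names `prob` and `ratio_conditionalize` are hypothetical illustrations rather than anything taken from the text.

```python
from fractions import Fraction

# Hypothetical finite model: a fair six-sided die (an assumption for illustration).
P = {w: Fraction(1, 6) for w in range(1, 7)}

def prob(event, measure):
    """Probability of an event (a set of outcomes) under a finite measure."""
    return sum((measure[w] for w in event), Fraction(0))

def ratio_conditionalize(measure, E):
    """Return P_new(.) = P(. & E) / P(E); the recipe is undefined when P(E) = 0."""
    p_E = prob(E, measure)
    if p_E == 0:
        raise ZeroDivisionError("ratio formula is ill-defined when P(E) = 0")
    return {w: (measure[w] / p_E if w in E else Fraction(0)) for w in measure}

E = {2, 4, 6}          # conditioning proposition: the roll is even
H = {4, 5, 6}          # hypothesis: the roll is at least 4
P_new = ratio_conditionalize(P, E)
print(prob(H, P_new))  # 2/3
```

The explicit error for $P(E)=0$ marks the point at which the ratio recipe gives out.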

The ratio formula is ill-defined when $P(E)=0$ . Yet cases where $P(E)=0$ arise frequently within scientific applications. To illustrate, let X be a random variable with continuum many values. Orthodox probability theory demands that $P(X=x)=0$ for all but countably many values x. Conditionalization on propositions of the form $X=x$ occurs routinely in statistics [Reference Ghosal and van der Vaart14], physics [Reference Trotta47], robotics [Reference Thrun, Burgard and Fox46], economics [Reference Kiefer, Nyarko, Kirman and Salmon23], and virtually every other scientific discipline that uses Bayesian inference. For example, a navigator might estimate her position based on her distance from a familiar landmark; an astronomer might estimate when a comet will reach Earth given its current velocity; and so on.

Say that an agent engages in null updating when she updates her credences based on a proposition E such that $P(E)=0$ . Since null updating plays an important role in scientific practice, any complete foundation for Bayesian decision theory should address it. We must look beyond the ratio formula for a more general analysis of conditional probability.

The good news is that Kolmogorov [Reference Kolmogorov24] already offered a satisfying analysis in the same treatise where he laid measure-theoretic foundations for probability. Kolmogorov’s analysis hinges upon the notion of a regular conditional distribution (rcd). In the decades since Kolmogorov’s discussion, rcds have come to figure indispensably in probability theory [Reference Billingsley2, Reference Rao34] and Bayesian statistics [Reference Florens, Mouchart and Rolin11, Reference Ghosal and van der Vaart14]. The bad news is that philosophical inquiry into rcds has lagged behind mathematical and scientific practice. For many decades, rcds received almost no philosophical attention save for the occasional hostile dismissal (e.g., [Reference Hájek16, Reference Hájek, Bandyopadhyay and Forster18, Reference Howson19, Reference Myrvold30, Reference Seidenfeld, Hendricks, Pedersen and Jørgensen41, Reference Seidenfeld, Schervish and Kadane42]).

Lately, some philosophers have explored rcds in a more sympathetic vein (e.g., [Reference Easwaran8–Reference Easwaran, Pettigrew and Weisberg10, Reference Gyenis, Hofer-Szabó and Rédei15, Reference Huttegger20–Reference Huttegger and Nielsen22, Reference Meehan and Zhang27, Reference Meehan and Zhang28, Reference Nielsen31, Reference Rescorla35, Reference Rescorla36, Reference Rescorla40]). A running theme is that rcds support a kind of conditionalization, sometimes called Kolmogorov conditionalization. Recent work has explored how Kolmogorov conditionalization works and why one might find it an attractive credal update strategy. The present paper contributes to this initiative by probing the connection between Kolmogorov conditionalization and Dutch books.

A Dutch book is a set of acceptable bets with a guaranteed net loss. You are Dutch bookable if it is possible to rig a Dutch book against you. Dutch bookability is a very undesirable property. When you are Dutch bookable, a sufficiently clever bookie can pump you for money by offering you a series of bets that you accept. Thus, it is somehow bad to be Dutch bookable and somehow good to render yourself immune to Dutch books.
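To make the definition concrete, here is a minimal sketch under a simple betting model that the text does not spell out: I assume the agent regards credence times stake as a fair price for a bet that pays the stake if the event obtains. The credences and the function name `net_payoff` are hypothetical.

```python
from fractions import Fraction

# Hypothetical incoherent credences: they sum to more than 1 over A and not-A.
credence = {"A": Fraction(3, 5), "not-A": Fraction(3, 5)}
stake = Fraction(1)

def net_payoff(true_event):
    """Agent buys a unit bet on each event at the price she deems fair."""
    cost = sum((credence[e] * stake for e in credence), Fraction(0))
    # Only the bet on the event that actually obtains pays out.
    winnings = sum((stake for e in credence if e == true_event), Fraction(0))
    return winnings - cost

payoffs = {e: net_payoff(e) for e in credence}
print(payoffs)  # the agent loses 1/5 whichever event obtains
```

Each bet is acceptable by the agent's own lights, yet the pair guarantees a net loss.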

Dutch books have long played a foundational role in Bayesian decision theory. Ramsey [Reference Ramsey and Braithwaite33] and de Finetti [Reference de Finetti, Kyburg and Smokler6] independently proved that, under ancillary assumptions, agents whose credences violate the probability calculus axioms are Dutch bookable. Lewis proved a Dutch book theorem for conditionalization. He showed that one can rig a diachronic Dutch book (a Dutch book with bets at different times) against an agent who follows any update rule other than conditionalization. Although Lewis did not initially publish his proof, it became well-known due to Teller’s [Reference Teller45] exposition, and Lewis [Reference Lewis26] eventually published his own treatment. Skyrms [Reference Skyrms43] proved a converse theorem: conditionalizers who obey the probability calculus axioms are not Dutch bookable.

Lewis and Skyrms confined attention to learning scenarios where the conditioning proposition has non-zero initial probability. For that reason, their theorems have fairly limited scope when compared with scientific applications of the Bayesian framework. [Reference Rescorla36] extends the Lewis–Skyrms theorems to numerous learning scenarios that feature null updating. The extended theorems show that, in those learning scenarios, Kolmogorov conditionalization is the unique update strategy that immunizes the agent from Dutch books.

Unfortunately, the aforementioned diachronic Dutch book theorems rely heavily upon a questionable factivity assumption: they assume that the agent updates her credences based only on true propositions. The factivity assumption discounts cases of misplaced certainty, i.e., cases where the agent invests credence 1 in a falsehood. Misplaced certainty arises routinely in scientific and philosophical applications of the Bayesian framework. A navigator who estimates her position based on her distance from a landmark may mismeasure the distance; an astronomer estimating a comet’s arrival time based upon its velocity may miscalculate the comet’s velocity; and so on. Mistakes may arise through carelessness, deception, faulty measurement, or various other factors. Given the omnipresent potential for misplaced certainty, a complete treatment should not presuppose the factivity assumption [Reference Rescorla37]. A complete treatment should be non-factive.

In this paper, I will generalize the results from [Reference Rescorla36] to a non-factive setting. I will consider a class of learning scenarios where the conditioning proposition may be false and may have initial probability 0. I will show that, in those learning scenarios, Kolmogorov conditionalization is the unique update strategy that guards against Dutch books. Thus, conditionalization’s pragmatic virtues persist when we allow non-factive null updating.

Section 2 reviews the basics of rcds. Section 3 discusses how rcds support Kolmogorov conditionalization. Section 4 states and proves the non-factive Dutch book theorem for Kolmogorov conditionalization. Section 5 concludes that Kolmogorov conditionalization has desirable pragmatic features distinguishing it from rival credal reallocation strategies.

2. Regular conditional distributions

Probability theory as formalized by Kolmogorov studies a probability space $(\Omega, \mathcal{F},P)$ , where $\Omega$ is a set, $\mathcal{F}$ is a $\sigma$ -field over $\Omega$ , and P is a probability measure on $\mathcal{F}$ . Elements of $\Omega$ are outcomes. Elements of $\mathcal{F}$ are events. For purposes of this paper, we use $(\Omega, \mathcal{F},P)$ to model an agent’s credences. Events serve as the objects of credence. The real number assigned by P to event $A\in \mathcal{F}$ measures the agent’s degree of belief in A. Since events serve as the objects of credence, they play one role traditionally assigned by philosophers to propositions. However, we need not assume that events have all the features traditionally ascribed to propositions.

In a measure-theoretic setting, the ratio formula mentions intersection of events rather than conjunction of propositions:

$$\begin{align*}P(A|B)=\frac{P(A\cap B)}{P(B)},\end{align*}$$

which is not well-defined when $P(B)=0$ . Kolmogorov offers a generalized treatment designed to accommodate cases where $P(B)=0$ . The basic idea is to associate B with a collection of suitably related events (some of which may also have probability 0). Rather than conditionalize on B taken by itself, one conditionalizes upon the associated collection.

Formally speaking, Kolmogorov’s theory centers on a subset $\mathcal{G}\subseteq \mathcal{F}$, where $\mathcal{G}$ is itself a $\sigma$-field. The idea is to define probabilities conditional on information (possibly including misinformation) regarding whether the true outcome belongs to each $G\in \mathcal{G}$. Kolmogorov’s theory can be glossed in different ways depending on how we interpret the rather vague term “information.” In what follows, I will present my own preferred interpretation and elucidate Kolmogorov’s approach accordingly. [Footnote 1]

2.1. Kolmogorov certainty acquisition scenarios

Suppose the agent gains new certainties regarding sub- $\sigma$ -field $\mathcal{G}$ . She assigns probability 0 or 1 to each event $G\in \mathcal{G}$ , and on that basis she reallocates credence over the rest of $\mathcal{F}$ . For $\omega \in \Omega$ , let ${\delta}_{\omega }:\mathcal{G}\to \mathbf{\mathbb{R}}$ be the function defined by

${\delta}_{\omega }(G)=\begin{cases}1 & \textit{if}\;\omega \in G,\\ 0 & \textit{if}\;\omega \notin G,\end{cases}$ for each $G\in \mathcal{G}$.

Call ${\delta}_{\omega }$ a certainty profile over $\mathcal{G}$ , and call $\omega$ an index of ${\delta}_{\omega }$ . Note that ${\delta}_{\omega }$ is a probability measure. ${\delta}_{\omega }$ models a scenario where, for every $G\in \mathcal{G}$ , the agent becomes certain that G does or does not obtain. A notable special case arises when the agent’s certainty profile ${\delta}_{\omega }$ tracks the truth:

  (3) For each $G\in \mathcal{G}$, ${\delta}_{\omega }(G)=1$ iff the true outcome belongs to G.

Since I am studying non-factive conditionalization, I will not assume that (3) prevails. I will consider learning scenarios where the agent acquires a new certainty profile ${\delta}_{\omega }$ that may or may not track the truth. I call these Kolmogorov certainty acquisition scenarios. In a Kolmogorov certainty acquisition scenario, the agent becomes certain as to whether the true outcome belongs to each $G\in \mathcal{G}$, but her new certainties may be misplaced. [Footnote 2]

A single certainty profile can have many different indices. If outcomes $\omega$ and $\nu$ belong to precisely the same members of $\mathcal{G}$ , then they index the same certainty profile:

$$\begin{align*}{\delta}_{\omega}={\delta}_{\nu }.\end{align*}$$

In practice, it is often more convenient to deal with indices rather than certainty profiles. But our main concern is certainty profiles, not indices.
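The relation between indices and certainty profiles can be sketched on a finite outcome space, where a sub-$\sigma$-field generated by a partition consists of all unions of cells. The six-outcome model and the names `generated_field` and `certainty_profile` are hypothetical choices made only for illustration.

```python
# Hypothetical finite model: six outcomes, G generated by a three-cell partition.
OMEGA = frozenset(range(6))
PARTITION = [frozenset({0, 1}), frozenset({2, 3}), frozenset({4, 5})]

def generated_field(partition):
    """All unions of partition cells: the sigma-field generated by a finite partition."""
    events = {frozenset()}
    for cell in partition:
        # Add, for every event found so far, its union with the current cell.
        events |= {e | cell for e in events}
    return events

G = generated_field(PARTITION)

def certainty_profile(omega):
    """delta_omega: assigns 1 to members of G containing omega, and 0 otherwise."""
    return {g: (1 if omega in g else 0) for g in G}

print(len(G))                                        # 8
print(certainty_profile(0) == certainty_profile(1))  # True
print(certainty_profile(0) == certainty_profile(2))  # False
```

Outcomes 0 and 1 lie in the same cell, so they index one and the same certainty profile, whereas outcome 2 indexes a different one.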

Many scientifically important situations are naturally modeled as Kolmogorov certainty acquisition scenarios. To illustrate, let $X:\Omega \to \mathrm{\mathbb{R}}$ be a random variable. Suppose the agent becomes certain that X has value r. Thus, she assigns credence 1 to the event $\left\{\omega :X(\omega)=r\right\}$ . If the expression “r” is suitably informative, she is positioned to become certain of numerous additional events. Assuming that “r” belongs to any standard notational scheme for the real numbers, she should be willing to affirm or deny that X’s value falls between a and b, for any $a,b\in \mathrm{\mathbb{Q}}$ . So she should be willing to assign probability 1 or 0 to each event

$$\begin{align*}{X}^{-1}(a,b)\qquad\qquad \textit{for}\ \textit{any}\ a,b\in \mathrm{\mathbb{Q}}.\end{align*}$$

Let $\sigma (X)$ be the $\sigma$ -field generated by these events, i.e., $\sigma (X)$ results from starting with the events ${X}^{-1}(a,b)$ and closing under complementation and countable union. The agent’s new certainties over events ${X}^{-1}(a,b)$ determine a unique certainty profile over $\sigma (X)$ .

There are uncountably many events in $\sigma (X)$ . Most of those events a typical agent will never consider. Even when an agent explicitly represents an event $G\in \sigma (X)$ , she may not compute which probability she should assign to G given her new certainties regarding the events ${X}^{-1}(a,b)$ . So ${\delta}_{\omega }$ does not plausibly model the explicit certainties of any agent who remotely resembles a normal human. We should instead regard ${\delta}_{\omega }$ as modeling an agent’s implicit certainties. The agent does not extrapolate from her certainty that $X=r$ to new certainties over each event in $\sigma (X)$ , but she is positioned in principle to do so. ${\delta}_{\omega }$ captures a large collection of certainties that the agent is positioned to acquire, given her newfound certainty that $X=r$ .

2.2. Update rules

In a Kolmogorov certainty acquisition scenario, circumstances influence the agent’s credences by fixing new certainties over $\mathcal{G}\subseteq \mathcal{F}$ . Her new certainties over $\mathcal{G}$ provide her sole basis for reallocating credence over the rest of $\mathcal{F}$ . How should we capture these intuitive ideas in mathematical terms? Initially, one might suggest a function that takes certainty profiles as inputs and yields new credal reallocations as outputs. This suggestion is not very mathematically tractable, so Kolmogorov employs a more roundabout procedure. Rather than treat credal reallocation as a function of the certainty profile, he treats it as a function of an index for the certainty profile. He considers a function $C:\mathcal{F}\times \Omega \to \mathrm{\mathbb{R}}$ , where $C(A,\omega )$ is the new credence to be assigned to $A\in \mathcal{F}$ in light of certainty profile ${\delta}_{\omega }$ . I notate $C(A,\omega)$ as $C(A|\;\omega)$ . I will sometimes notate $C(.|\;\omega )$ as ${C}_{\omega }$ .

What constraints should we place upon C? Since we are using probability measures to model credences, we demand that

  (4) ${C}_{\omega }:\mathcal{F}\to \mathrm{\mathbb{R}}$ is a probability measure for each $\omega \in \Omega$.

Kolmogorov imposes a crucial additional constraint. He demands that, for each $A\in \mathcal{F}$ , the one-place function $C(A|.):\Omega \to \mathrm{\mathbb{R}}$ be $\mathcal{G}$ -measurable:

  (5) $C{(A|.)}^{-1}(-\infty, a]\in \mathcal{G}$ for each $a\in \mathrm{\mathbb{R}}$.

Call any function C that satisfies conditions (4) and (5) an update rule for $(\Omega, \mathcal{F})$ and $\mathcal{G}$ . As I will now explain, condition (5) ensures that the agent’s newfound implicit certainties over $\mathcal{G}$ dictate the new credences to be allocated over $\mathcal{F}$ .

$C(A|\;\omega)$ is the credence to be assigned to A when the agent newly acquires certainty profile ${\delta}_{\omega }$ . If an index $\omega$ for the agent’s new certainty profile belongs to the event

$$\begin{align*}C{(A|.)}^{-1}(-\infty, a],\end{align*}$$

then the new credence to be assigned to A is $\le a$ . If an index $\omega$ for the agent’s new certainty profile belongs to event

$$\begin{align*}C{(A|.)}^{-1}(a,\infty ),\end{align*}$$

then the new credence to be assigned to A is $>a$. Assuming C satisfies the $\mathcal{G}$-measurability condition (5), both events belong to $\mathcal{G}$. The two events are complementary, and ${\delta}_{\omega }$ assigns each member of $\mathcal{G}$ either 0 or 1, so the agent’s certainty profile ${\delta}_{\omega }$ must assign exactly one of the events probability 1. We consider each case:

  • If ${\delta}_{\omega }$ assigns probability 1 to $C{(A|.)}^{-1}(-\infty, a]$ , then $\omega \in C{(A|.)}^{-1}(-\infty, a]$ . Thus, the new credence to be assigned to A is $\le a$ .

  • If ${\delta}_{\omega }$ assigns probability 1 to $C{(A|.)}^{-1}(a,\infty )$ , then $\omega \in C{(A|.)}^{-1}(a,\infty )$ . Thus, the new credence to be assigned to A is $>a$ .

So the agent’s new certainty profile dictates whether the new credence to be assigned to A is $\le \mathrm{or}>a$ . By contrast, suppose C violates (5). Then the agent’s certainty profile does not dictate whether the new credence to be assigned to A is $\le \mathrm{or}>a$ . Her implicit certainties do not speak to whether the new credence should be $\le \mathrm{or}>a$ . Hence, $\mathcal{G}$ -measurability formalizes the thought that the agent’s new credal assignment over $\mathcal{F}$ is embedded in her implicit certainties over $\mathcal{G}$ .

As an added bonus, $\mathcal{G}$ -measurability entails that C induces a well-defined mapping from certainty profiles to credences. When C is $\mathcal{G}$ -measurable, its output does not depend on the particular index through which we identify a certainty profile:

$$\begin{align*}{\delta}_{\omega}={\delta}_{\nu}\to {C}_{\omega}={C}_{\nu }.\end{align*}$$

To see this, let us prove the contrapositive. Assume that

$$\begin{align*}{C}_{\omega}\ne {C}_{\nu }.\end{align*}$$

Pick $A\in \mathcal{F}$ such that $C(A|\;\omega)\ne C(A|\;\nu )$ and let

$$\begin{align*}H\ {=}_{df}\left\{\rho\in\Omega :C(A|\omega)=C\!\left(A|\rho \right)\right\}.\end{align*}$$

$C(A|.):\Omega \to \mathrm{\mathbb{R}}$ is $\mathcal{G}$ -measurable, and H is the inverse image of $\left\{C(A|\omega)\right\}$ under $C(A|.)$ , so $H\in \mathcal{G}$ . Since $\omega \in H$ and $\nu \notin H$ , it follows that

$$\begin{align*}{\delta}_{\omega }(H)=1\ \&\ {\delta}_{\nu }(H)=0\end{align*}$$

and hence that ${\delta}_{\omega}\ne {\delta}_{\nu }$ . Thus, ${C}_{\omega }$ and ${C}_{\nu }$ are identical whenever $\omega$ and $\nu$ index identical certainty profiles. This observation vindicates Kolmogorov’s decision to study update rules that take indices rather than certainty profiles as arguments. We want to model credal reallocation in light of new certainty profiles, so the particular index through which we identify a certainty profile should not matter.
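The role of condition (5) can be checked by brute force on a finite space. The sketch below assumes a four-outcome space and a sub-$\sigma$-field generated by a two-cell partition; the candidate functions `good` and `bad`, which stand in for $C(A|.)$ at some fixed A, are hypothetical.

```python
from fractions import Fraction

OMEGA = frozenset(range(4))
# Hypothetical sub-sigma-field generated by the partition {0,1}, {2,3}.
G = {frozenset(), frozenset({0, 1}), frozenset({2, 3}), OMEGA}

def is_G_measurable(f):
    """Condition (5) on a finite space: each set {w : f(w) <= a} lies in G.
    It suffices to check the finitely many thresholds a that f actually attains."""
    return all(
        frozenset(w for w in OMEGA if f[w] <= a) in G
        for a in set(f.values())
    )

# Candidate values of C(A | .) for one fixed event A.
good = {0: Fraction(1, 2), 1: Fraction(1, 2), 2: Fraction(1, 4), 3: Fraction(1, 4)}
bad = {0: Fraction(1, 2), 1: Fraction(1, 4), 2: Fraction(1, 4), 3: Fraction(1, 4)}

print(is_G_measurable(good))  # True
print(is_G_measurable(bad))   # False
```

Outcomes 0 and 1 index the same certainty profile over this sub-$\sigma$-field, and `bad` assigns them different values, which is exactly what measurability rules out.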

2.3. The integral formula

Kolmogorov supplements (4) and (5) with an additional constraint upon C. The constraint is now usually called the integral formula:

  (6) $P(A\cap G)={\int}_G C(A|\omega)\, dP(\omega)$ for each $A\in \mathcal{F}$ and $G\in \mathcal{G}$.

The integral formula generalizes the law of total probability.

When $C(A|.)$ satisfies (5) and (6), it is a conditional probability for A given $\mathcal{G}$ . A function $C:\mathcal{F}\times \Omega \to \mathrm{\mathbb{R}}$ is a regular conditional distribution (rcd) for P given $\mathcal{G}$ iff it satisfies (4)–(6). I will often use the notation ${P}_{\mathcal{G}}:\mathcal{F}\times \Omega \to \mathrm{\mathbb{R}}$ to denote an rcd for P given $\mathcal{G}$ .

As an important special case, suppose that $\mathcal{G}$ is generated by a countable partition ${E}_1,{E}_2,\dots$ of $\Omega$, where $P({E}_i)>0$ for each i. Then it is not hard to show that there exists a unique rcd for P given $\mathcal{G}$, defined by

${P}_{\mathcal{G}}(A|\omega)=P(A|{E}_i)=\dfrac{P(A\cap {E}_i)}{P({E}_i)}$ if $\omega \in {E}_i$ .

In this way, Kolmogorov’s theory subsumes the special case where the ratio formula dictates conditional probabilities.
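In the finite case the integral in (6) reduces to a weighted sum, so the claim can be verified exhaustively. The following sketch assumes a uniform prior on six outcomes and a three-cell partition, both hypothetical, and takes $\mathcal{F}$ to be the full power set.

```python
from fractions import Fraction
from itertools import chain, combinations

OMEGA = list(range(6))
P = {w: Fraction(1, 6) for w in OMEGA}  # hypothetical uniform prior
PARTITION = [frozenset({0, 1, 2}), frozenset({3, 4}), frozenset({5})]

def prob(event):
    return sum((P[w] for w in event), Fraction(0))

def cell_of(omega):
    return next(c for c in PARTITION if omega in c)

def rcd(A, omega):
    """P_G(A | omega) = P(A & E_i) / P(E_i), where E_i is the cell containing omega."""
    E = cell_of(omega)
    return prob(A & E) / prob(E)

def powerset(xs):
    return (frozenset(s) for s in chain.from_iterable(
        combinations(xs, r) for r in range(len(xs) + 1)))

# G consists of all unions of cells; F is the full power set of OMEGA.
G = [frozenset().union(*cells) for cells in powerset(PARTITION)]

# Integral formula (6): P(A & G) equals the P-weighted sum of P_G(A | w) over w in G.
holds = all(
    prob(A & g) == sum((rcd(A, w) * P[w] for w in g), Fraction(0))
    for A in powerset(OMEGA)
    for g in G
)
print(holds)  # True
```

The check covers all 64 events in $\mathcal{F}$ against all 8 events in $\mathcal{G}$.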

There always exists a conditional probability for A given $\mathcal{G}$ [Reference Billingsley2, p. 430]. This conditional probability is unique up to measure 0: any two conditional probabilities for A given $\mathcal{G}$ must agree everywhere except possibly on a set of P-measure 0. Existence of rcds is less straightforward. If the probability space is pathological, then an rcd for P given $\mathcal{G}$ may not exist [Reference Billingsley2, p. 443]. When the probability space is nice enough, there exists an rcd for P given $\mathcal{G}$ . For any Borel set $A\subseteq \left[0,1\right]$ , let $\mathcal{B}(A)$ consist of the Borel subsets of A. A measurable space $(\Omega, \mathcal{F})$ is Borel iff there exists a Borel set $A\subseteq \left[0,1\right]$ and a bijection f from $(\Omega, \mathcal{F})$ to $(A,\mathcal{B}(A))$ such that f and ${f}^{-1}$ are measurable. The following theorem is basic to the discipline [Reference Fristedt and Gray12, p. 418]:

Theorem: Let $(\Omega, \mathcal{F})$ be a Borel space, P a probability measure on $(\Omega, \mathcal{F})$ , and $\mathcal{G}\subseteq \mathcal{F}$ a sub- $\sigma$ -field. Then there exists an rcd for P given $\mathcal{G}$ .

Virtually all probability spaces used in Bayesian decision theory are Borel. For example, every Polish space equipped with its Borel $\sigma$-field is a Borel space. For further discussion of rcd existence, see [Reference Rao34].

3. Kolmogorov conditionalization

Imagine an agent who begins with credences given by $(\Omega, \mathcal{F},P)$ . Suppose there exists ${P}_{\mathcal{G}}$ , an rcd for P given $\mathcal{G}$ . Suppose the agent adopts the following credal reallocation policy:

  (7) Respond to new certainty profile ${\delta}_{\omega }$ over $\mathcal{G}$ by adopting new credences ${P}_{\mathcal{G}}(.|\omega )$ over $\mathcal{F}$.

Suppose the agent subsequently acquires certainty profile ${\delta}_{\omega }$ and, on that basis, adopts new credences ${P}_{\mathcal{G}}(.|\omega )$ over $\mathcal{F}$ . Then I say that the agent uses ${P}_{\mathcal{G}}$ to conditionalize on ${\delta}_{\omega }$ . When an agent uses an rcd to conditionalize, I say that she engages in Kolmogorov conditionalization.

Kolmogorov conditionalization plays a foundational role in Bayesian statistics [Reference Florens, Mouchart and Rolin11, Reference Ghosal and van der Vaart14]. It also figures prominently in many applications of the Bayesian framework, such as within cognitive science [Reference Bennett, Hoffman, Prakash, Knill and Richards1] and economics [Reference Kiefer, Nyarko, Kirman and Salmon23, Reference Mertens and Zamir29].

My main thesis in the paper is that Kolmogorov conditionalization is an appealing update strategy with distinctive virtues. However, an honest appraisal must grant that it also faces significant challenges.

The first challenge concerns existence. If there does not exist an rcd for P given $\mathcal{G}$ , then an agent with initial credences given by P cannot Kolmogorov conditionalize. Luckily, rcds exist in a wide range of cases, including virtually all cases likely to arise in scientific practice.

The second challenge concerns uniqueness. Setting aside the special case given by the ratio formula, unconditional probabilities do not determine unique conditional probabilities: when an rcd exists, there exist infinitely many rcds. Then there are infinitely many credal reallocation policies (7) to choose from. A Kolmogorov conditionalizer must select one such policy. Whereas ratio conditionalization mandates a unique credal update rule, Kolmogorov conditionalization allows infinitely many distinct credal update rules. This is not a fatal problem, but it suggests that we may want to supplement Kolmogorov conditionalization with additional constraints upon credal updates.

The third challenge, which requires more extended discussion, centers upon a phenomenon called impropriety. Section 3.1 explains the phenomenon. Sections 3.2–3.4 explore the challenge that it poses to Kolmogorov conditionalizers.

3.1. Propriety

Intuitively speaking, the probability of an event conditional on itself should be 1. And it is indeed an easy theorem that $P(E|E)=1$ when $P(E)>0$ . What about cases where $P(E)=0$ ? Say that an update rule C for $(\Omega, \mathcal{F})$ and $\mathcal{G}$ is proper at $\omega$ iff

  (8) If $\omega \in G$, then $C(G|\omega)=1$ for all $G\in \mathcal{G}$.

Equivalently, C is proper at $\omega$ iff

$$\begin{align*}{C}_{\omega}\mid \mathcal{G}={\delta}_{\omega }.\end{align*}$$

Here $f\mid d$ is the restriction of function f to domain d. C is proper iff it is proper at all $\omega \in \Omega$ . Thus, an rcd ${P}_{\mathcal{G}}$ is proper iff

${P}_{\mathcal{G}}(.|\omega )\mid \mathcal{G}={\delta}_{\omega }$ for each $\omega \in \Omega$ .

Propriety is the natural extension of the formula $P(E|E)=1$ to the rcd formalism. Given Section 2’s interpretation of the formalism, it is a highly intuitive desideratum: ${P}_{\mathcal{G}}$ encodes a rule for updating credences based upon new certainties over $\mathcal{G}$ , so ${P}_{\mathcal{G}}$ should preserve those certainties.
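Condition (8) is easy to test on a finite space. The sketch below assumes a uniform prior on four outcomes and a two-cell partition, both hypothetical. It compares the ratio-based rule with a constant rule that ignores the new certainties; the constant rule satisfies (4) and (5), so it counts as an update rule, though I do not claim it is an rcd in this finite setting.

```python
from fractions import Fraction

OMEGA = frozenset(range(4))
P = {w: Fraction(1, 4) for w in OMEGA}  # hypothetical uniform prior
PARTITION = [frozenset({0, 1}), frozenset({2, 3})]
G = [frozenset(), PARTITION[0], PARTITION[1], OMEGA]

def prob(event):
    return sum((P[w] for w in event), Fraction(0))

def ratio_rule(A, omega):
    """C(A | omega) = P(A & E) / P(E), where E is the cell containing omega."""
    E = next(c for c in PARTITION if omega in c)
    return prob(A & E) / prob(E)

def constant_rule(A, omega):
    """C(A | omega) = P(A): the new certainties make no difference."""
    return prob(A)

def is_proper_at(C, omega):
    """Condition (8): C(G | omega) = 1 for every member of G that contains omega."""
    return all(C(g, omega) == 1 for g in G if omega in g)

print(all(is_proper_at(ratio_rule, w) for w in OMEGA))     # True
print(any(is_proper_at(constant_rule, w) for w in OMEGA))  # False
```

The constant rule fails (8) at every outcome because it assigns credence 1/2 to the very cell the agent has become certain of.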

Unfortunately, the desideratum is not always satisfiable. Blackwell and Dubins [Reference Blackwell and Dubins3] prove the following:

Theorem: If $\mathcal{F}$ is a countably generated $\sigma$-field, and $\mathcal{G}\subseteq \mathcal{F}$ is a sub-$\sigma$-field that is not countably generated, and P is a probability measure on $\mathcal{F}$, then no rcd for P given $\mathcal{G}$ is proper.

As a simple illustration, let $\lambda$ be Lebesgue measure on the Borel subsets $\mathcal{B}$ of $\left[0,1\right]$. Let $\mathcal{C}$ be the sub-$\sigma$-field generated by the countable subsets of $\left[0,1\right]$. Thus, $S\in \mathcal{C}$ iff S is countable or S’s complement is countable. Note that $\mathcal{B}$ is countably generated but that $\mathcal{C}$ is not. As Billingsley [Reference Billingsley2, p. 437] observes, the function ${\lambda}_{\mathcal{C}}:\mathcal{B}\times \left[0,1\right]\to \mathrm{\mathbb{R}}$ defined by

  (9) ${\lambda}_{\mathcal{C}}(A|\omega)=\lambda (A)$

is an rcd for $\lambda$ given $\mathcal{C}$ . This function massively violates propriety. In particular, (9) entails

$$\begin{align*}{\lambda}_{\mathcal{C}}(\left\{\omega \right\}|\omega )=0.\end{align*}$$

Intuitively: the probability of an event conditional on itself should be 1, but in this case it is 0! Worse, Seidenfeld, Schervish, and Kadane [Reference Seidenfeld, Schervish and Kadane42] show that any rcd for $\lambda$ given $\mathcal{C}$ must satisfy (9) for $\lambda$ -almost all $\omega$ . Each rcd for $\lambda$ given $\mathcal{C}$ is improper almost everywhere. For further theorems and examples along these lines, see [Reference Seidenfeld, Schervish and Kadane42].

Propriety becomes more achievable when $\mathcal{G}$ is countably generated, due to the following theorem [Reference Seidenfeld, Schervish and Kadane42, p. 1614]:

Theorem: If $\mathcal{G}$ is countably generated, and ${P}_{\mathcal{G}}$ is an rcd for P given $\mathcal{G}$ , then ${P}_{\mathcal{G}}$ is proper at $\omega$ for P-almost all $\omega$ .

This theorem covers cases of conditioning on the value of random variable X, because $\sigma (X)$ is countably generated. Even in the countably generated case, though, one cannot always eliminate the exceptional null event $\left\{\omega :{P}_{\mathcal{G}}\;\mathrm{is}\kern0.17em \mathrm{improper}\;\mathrm{at}\;\omega \right\}$ . Only under special assumptions does there exist an rcd that is proper at all outcomes $\omega$ , as shown by a theorem due to Blackwell and Ryll-Nardzewski [Reference Blackwell and Ryll-Nardzewski4]:

Theorem: Let $(\Omega, \mathcal{F})$ be a Borel space, P a probability measure on $(\Omega, \mathcal{F})$ , and X a random variable on $(\Omega, \mathcal{F})$ . There exists a proper rcd for P given $\sigma (X)$ only if $X(\Omega)$ is a Borel set.

There are many random variables X for which $X(\Omega)$ is not a Borel set.

More generally, say that $\Phi :\mathcal{F}\to \mathcal{G}$ is a selection homomorphism for $\mathcal{G}$ with respect to $\mathcal{F}$ iff (a) $\Phi$ respects complementation and countable union, and (b) $\Phi (G)=G$ for every $G\in \mathcal{G}$ . As Sokal [Reference Sokal44] notes, the proof techniques from [Reference Blackwell and Ryll-Nardzewski4] easily extend to establish the following theorem:

Theorem: Let $(\Omega, \mathcal{F})$ be a Borel space, P a probability measure on $(\Omega, \mathcal{F})$ , and $\mathcal{G}\subseteq \mathcal{F}$ a sub- $\sigma$ -field. Then there exists a proper rcd for P given $\mathcal{G}$ iff there exists a selection homomorphism for $\mathcal{G}$ with respect to $\mathcal{F}$ .

Selection homomorphisms may not exist even when $\mathcal{G}$ is countably generated.

Some authors, disturbed by impropriety, deny that rcds furnish a good mathematical elucidation of conditional probability [Reference Blackwell and Dubins3, Reference Hájek, Bandyopadhyay and Forster18, Reference Seidenfeld, Schervish and Kadane42]. These authors maintain that conditional probabilities should always be proper. They argue on that basis that rcds cannot serve as conditional probabilities.

In this paper, I am not addressing how well rcds elucidate our pre-theoretic notion of conditional probability. I am addressing a slightly different question: do rcds help us model null updating? So I will not explore the challenge that improper rcds pose to the conceptual analysis of conditional probability. Instead, I will explore the challenge that they pose to Kolmogorov conditionalizers.

3.2. Implications of impropriety

In a Kolmogorov certainty acquisition scenario, the agent acquires new certainties over $\mathcal{G}$ and on that basis reallocates credences over the rest of $\mathcal{F}$. Improper update rules do not provide a good basis for the reallocation. For suppose that update rule C is improper at $\omega$:

  (10) ${C}_{\omega}\mid \mathcal{G}\ne {\delta}_{\omega }.$

Then it is impossible to maintain certainty profile ${\delta}_{\omega }$ while also maintaining credences ${C}_{\omega }$ . The agent cannot update her credences using C, because she cannot simultaneously assign different credences to the same event. Since proper rcds do not always exist even when rcds exist, there are situations where the agent cannot Kolmogorov conditionalize even though an rcd ${P}_{\mathcal{G}}$ exists. If ${P}_{\mathcal{G}}$ is improper, then the agent cannot follow the credal reallocation policy (7).

Ideally, we would have liked a credal reallocation strategy that works for all possible Kolmogorov certainty acquisition scenarios. It is disappointing that Kolmogorov conditionalization does not give us everything we want. But just how disappointing?

Begin with cases where $\mathcal{G}$ is not countably generated. A thinker who mentally represents putative membership information for $\mathcal{G}$ must be able to entertain uncountably many propositions simultaneously. This is a highly infinitary mental capacity. On that basis, Easwaran [Reference Easwaran, Bandyopadhyay and Forster9, p. 143] argues that $\mathcal{G}$ is “irrelevant for probability as degree of belief” because it “can only be grasped by minds that are far more complicated than the ones that we normally attribute subjective probabilities to.” I agree. Cases where $\mathcal{G}$ is not countably generated may have mathematical interest, but they are not well-suited to model agents remotely like us. We should not fret that some of them fall outside the scope of Kolmogorov conditionalization.

Relatedly, scientific and philosophical applications of Bayesian decision theory make serious use only of conditioning sub- $\sigma$ -fields that are countably generated. Scientific applications emphasize learning scenarios that can be modeled with countably generated conditioning sub- $\sigma$ -fields. To the best of my knowledge, these are the only conditioning sub- $\sigma$ -fields that play a significant role in Bayesian statistics, probabilistic robotics, economics, Bayesian cognitive science, or any other field that studies credal reallocation.Footnote 3

Let us turn to cases where $\mathcal{G}$ is countably generated. In these cases, each rcd ${P}_{\mathcal{G}}$ is proper almost everywhere. More precisely, if we define

$$\begin{align*}\mathit{Prop}\ {=}_{df}\left\{\omega :{P}_{\mathcal{G}}\;\mathrm{is}\kern0.17em \mathrm{proper}\;\mathrm{at}\;\omega \right\}\!,\end{align*}$$

then $\mathit{Prop}^c$ has probability 0. At first blush, it might seem that the challenge posed by impropriety is not so worrisome here. If you have credence 0 that impropriety will occur, then you can rest assured that conflicts between your new certainties and your update rule will almost never arise. You can adopt credal reallocation policy (7) and be certain that it will not break down due to a conflict between certainty profile ${\delta}_{\omega }$ and update rule ${P}_{\mathcal{G}}$ .

On further examination, this reassuring line of reasoning is mistaken. Since misplaced certainty is possible, you can acquire new certainties ${\delta}_{\omega }$ even though no index for ${\delta}_{\omega }$ is the true outcome. You may therefore assign a second-order credence to the proposition

  1. (11) $\exists \omega$ (I will acquire new certainty profile ${\delta}_{\omega }$ & ${P}_{\mathcal{G}}$ is improper at $\omega$ )

that differs from your first-order credence in $\mathit{Prop}^c$ . Probability space $(\Omega, \mathcal{F},P)$ may not formally model second-order credences of this kind, but you may nevertheless assign them.Footnote 4 In particular, you may assign positive credence to (11) while assigning credence zero to $\mathit{Prop}^c$ . Even if you are certain that Prop will occur, you may think there is a serious chance that you will acquire future certainties as if $\mathit{Prop}^c$ occurs. For example, you may suspect that an evil demon will manipulate you into acquiring future certainties ${\delta}_{\omega }$ , with $\omega \in \mathit{Prop}^c$ . Thus, you may assign positive second-order credence to the possibility that your update rule conflicts with your new certainty profile.

Overall, then, impropriety seems harmless enough when $\mathcal{G}$ is not countably generated but rather disappointing when $\mathcal{G}$ is countably generated. Still, we should keep this setback in perspective. Although Kolmogorov conditionalization is not always usable, it is a very general strategy that can be implemented in numerous Kolmogorov certainty acquisition scenarios.

To see just how far its reach extends, consider the following setup ([Reference Florens, Mouchart and Rolin11, pp. 26–28], [Reference Ghosal and van der Vaart14, pp. 5–7]), which is general enough to accommodate virtually every scientific application of Bayesian inference. A “parameter space” $\left(X,\mathcal{X}\right)$ models possible states of the world. A “sample space” $\left(E,\mathcal{E}\right)$ models possible information the agent will receive. $\mathcal{X}$ is a $\sigma$ -field over X. $\mathcal{E}$ is a $\sigma$ -field over E. The parameter space and sample space combine into a joint space $\left(X\times E,\mathcal{X}\otimes \mathcal{E}\right)$ , where

$$\begin{align*}\mathcal{X}\otimes \mathcal{E}\ {=}_{\textit{df}}\ \sigma (\mathcal{X}\times \mathcal{E})\end{align*}$$

is the $\sigma$ -field generated by all measurable rectangles of the form

$$\begin{align*}A\times B\kern4.04em A\in \mathcal{X},B\in \mathcal{E}.\end{align*}$$

The agent’s credences are given by a probability measure P over $\mathcal{X}\otimes \mathcal{E}$ . Let $\mathcal{G}$ be the sub- $\sigma$ -field containing all events

$$\begin{align*}X\times B\kern4.04em B\in \mathcal{E}.\end{align*}$$

$\mathcal{G}$ is the canonical embedding of $\mathcal{E}$ within $\mathcal{X}\otimes \mathcal{E}$ . It is easy to show that there exists a selection homomorphism for $\mathcal{G}$ with respect to $\mathcal{X}\otimes \mathcal{E}$ [Reference Sokal44]. Thus, assuming $\left(X\times E,\mathcal{X}\otimes \mathcal{E}\right)$ is Borel, a proper rcd ${P}_{\mathcal{G}}$ exists. The agent can use ${P}_{\mathcal{G}}$ to conditionalize. Upon gaining new certainties over $\mathcal{G}$ , she can use ${P}_{\mathcal{G}}$ to reallocate credence over the rest of $\mathcal{X}\otimes \mathcal{E}$ . This analysis indicates that, although impropriety may arise in principle, it seldom if ever arises in scientific practice.
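The setup can be sketched for a finite parameter space and sample space. This is a toy illustration under stated assumptions, not the general Borel construction: the spaces `PARAMS` and `SAMPLES`, the joint distribution `JOINT`, and the function names are all hypothetical, and the sample point `e3` is given probability 0 so that null updating occurs. For null evidence the sketch simply picks a measure concentrated on $X\times \{e\}$ ; in the finite case any such choice keeps the rule proper and leaves the integral formula intact.

```python
from fractions import Fraction as Fr

PARAMS = ("x1", "x2")          # toy parameter space X
SAMPLES = ("e1", "e2", "e3")   # toy sample space E; e3 is a null outcome
JOINT = {
    ("x1", "e1"): Fr(1, 4), ("x2", "e1"): Fr(1, 4),
    ("x1", "e2"): Fr(1, 2), ("x2", "e2"): Fr(0),
    ("x1", "e3"): Fr(0),    ("x2", "e3"): Fr(0),
}


def prob(event):
    """P(event) for an event given as a set of (x, e) pairs."""
    return sum((JOINT[pt] for pt in event), Fr(0))


def slice_of(e):
    """The event X x {e}, an atom of the embedded sub-sigma-field G."""
    return frozenset((x, e) for x in PARAMS)


def rcd(e):
    """Credences over the joint space after becoming certain of X x {e}."""
    atom = slice_of(e)
    mass = prob(atom)
    if mass > 0:
        return {pt: (JOINT[pt] / mass if pt in atom else Fr(0)) for pt in JOINT}
    # Null evidence: the ratio formula is undefined, so pick (arbitrarily)
    # the uniform measure concentrated on the atom.
    return {pt: (Fr(1, len(PARAMS)) if pt in atom else Fr(0)) for pt in JOINT}


def subsets(items):
    """All subsets of a finite collection."""
    items = list(items)
    for mask in range(1 << len(items)):
        yield frozenset(it for i, it in enumerate(items) if mask >> i & 1)


def is_proper():
    """Each posterior gives its own atom credence 1 (enough in the finite case)."""
    return all(sum((rcd(e)[pt] for pt in slice_of(e)), Fr(0)) == 1 for e in SAMPLES)


def satisfies_integral_formula():
    """Check that the integral of rcd(.)(A) over X x B equals P(A & (X x B))."""
    for A in subsets(JOINT):
        for B in subsets(SAMPLES):
            G = frozenset(pt for pt in JOINT if pt[1] in B)
            lhs = sum((prob(slice_of(e)) * sum((rcd(e)[pt] for pt in A), Fr(0))
                       for e in B), Fr(0))
            if lhs != prob(A & G):
                return False
    return True


print(is_proper(), satisfies_integral_formula())
```

The null atom contributes nothing to the integral formula, which is why the arbitrary choice made for `e3` is harmless in this toy case.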

3.3. Non-credal interpretations

Given the challenge posed by impropriety, it is worth briefly examining whether the challenge persists if we reinterpret the “information” based on which the agent conditionalizes. I have adopted a credal interpretation: I glossed “information” in terms of a change in the agent’s credences over $\mathcal{G}$ . Specifically, I interpreted the agent as gaining new certainties regarding $\mathcal{G}$ , codified through a certainty profile ${\delta}_{\omega }$ , and I asked how the agent should reallocate credences over the rest of $\mathcal{F}$ in light of her newfound certainties over $\mathcal{G}$ . But we might instead interpret “information” in non-credal terms. We might posit an epistemic state that impacts the agent’s credences from outside the credal system.

In principle, there are many possible non-credal interpretations one might explore. Here are two examples from the recent literature:

  • In [Reference Rescorla36], I interpreted the agent as receiving membership knowledge regarding $\mathcal{G}$ . On this interpretation, ${\delta}_{\omega }(A)=1$ means that the agent knows that the true outcome belongs to A, while ${\delta}_{\omega }(A)=0$ means that the agent knows that the true outcome does not belong to A. The question then becomes how the agent should reallocate her credences over $\mathcal{F}$ in light of her new knowledge regarding $\mathcal{G}$ .

  • Meehan and Zhang [Reference Meehan and Zhang28] interpret the agent as receiving evidence regarding $\mathcal{G}$ , where evidence is guaranteed to be veridical. The question then becomes how the agent should reallocate her credences over $\mathcal{F}$ in light of her new evidence regarding $\mathcal{G}$ .

Both interpretations are factive. In particular, knowledge is factive: if an agent knows that the true outcome belongs to A, then the true outcome belongs to A.

Suppose we regard the agent as conditionalizing based on membership knowledge regarding $\mathcal{G}$ . Knowledge is not certainty: an agent may know that E without being certain that E. It is possible for an agent to know that the true outcome belongs to ${G\in \mathcal{G}}$ without assigning credence 1 to G. Hence, impropriety no longer poses the challenge raised in Section 3.2. Yet one might worry that impropriety now poses a slightly different challenge. Maybe it is possible to know that E while setting $P(E)<1$ , but is it rational to do so? An improper update rule may look unsuited for conditionalizing on knowledge, given that it leads the agent to assign credence $<1$ to what she knows. A similar dialectic arises on Meehan and Zhang’s [Reference Meehan and Zhang28, p. 729] evidentiary interpretation: it is possible to receive evidence E and set $P(E)<1$ , but is it rational to do so? Meehan and Zhang claim that it is not. They endorse a norm that they call Evidential Certainty: “one should be certain of one’s total evidence.”

I am skeptical of Evidential Certainty and other similar norms. I also see no reason to suspect that knowledge rationally mandates certainty. For present purposes, though, I set these matters aside. The key point is that, however they play out, impropriety poses a less serious challenge to non-credal factive interpretations than to Section 2’s credal interpretation.

Here is why. On a factive interpretation, the agent receives veridical information ${\delta}_{\omega }$ indexed by the true outcome $\omega$ . I argued in Section 3.2 that we may restrict attention to countably generated $\mathcal{G}$ , so that C is proper at $\omega$ for P-almost all $\omega$ . Given this restriction, the agent is certain that her update rule will be proper on whatever outcome $\omega$ indexes her new veridical information. On the knowledge interpretation, for example, she is certain that she will receive new knowledge encapsulated by a ${\delta}_{\omega }$ such that C is proper at $\omega$ . Since the agent regards impropriety as vanishingly unlikely to occur, she need not find it particularly worrisome. For that reason, non-credal factive interpretations look much better positioned than Section 2’s credal interpretation with respect to impropriety.

However, my goal in this paper is to explore null updating from a non-factive perspective that allows misplaced certainty in conditioning propositions. No treatment that interprets “information” in factive terms can capture all the credal updates we would like to capture: factive treatments neglect numerous credal updates that occur routinely in scientific practice. So interpretations centered around knowledge or non-credal factive evidence are too limited in scope for my purposes.

One might pursue a non-credal, non-factive interpretation of the agent’s “information” regarding $\mathcal{G}$ . For example, one might articulate a non-factive conception of “evidence” and treat the agent as gaining evidence regarding $\mathcal{G}$ . I am not sure what this conception would look like or how it would adjudicate the challenge posed by impropriety. Pursuing such questions would inevitably embroil us in many epistemological complexities.

In contrast, the credal interpretation is straightforward and requires only notions that we already need when doing Bayesian epistemology. The credal interpretation also lets us study credal updates on their own terms, ignoring whatever non-credal states prompt the updates. So I think we do well to explore the credal interpretation, which is the task I have undertaken in the present paper. Once we adopt the credal interpretation, the challenge raised in Section 3.2 arises.

3.4. Alternative update rules

To gain a better perspective on the challenge, let us compare Kolmogorov conditionalization with alternative credal reallocation strategies one might pursue.

I confine attention to strategies based on update rules. Imagine an agent who adopts the following credal reallocation policy:

  1. (12) Respond to new certainty profile ${\delta}_{\omega }$ over $\mathcal{G}$ by adopting new credences ${C}_{\omega }$ over $\mathcal{F}$ ,

where C is an update rule but not an rcd, i.e., C satisfies (4) and (5) but does not satisfy (6). (12) faces a problem parallel to that faced by (7): C may be improper at $\omega$ . The theorems proved by Blackwell and Ryll-Nardzewski [Reference Blackwell and Ryll-Nardzewski4] are more general than stated in Section 3.2. The general theorems state:

Theorem: Let $(\Omega, \mathcal{F})$ be a Borel space and X a random variable on $(\Omega, \mathcal{F})$ . There exists a proper update rule for $(\Omega, \mathcal{F})$ and $\sigma (X)$ only if $X(\Omega)$ is a Borel set.

Theorem: Let $(\Omega, \mathcal{F})$ be a Borel space and $\mathcal{G}\subseteq \mathcal{F}$ a sub- $\sigma$ -field. There exists a proper update rule for $(\Omega, \mathcal{F})$ and $\mathcal{G}$ iff there exists a selection homomorphism for $\mathcal{G}$ with respect to $\mathcal{F}$ .

When C is improper at $\omega$ , it is impossible to maintain certainty profile ${\delta}_{\omega }$ while simultaneously maintaining credences ${C}_{\omega }$ . Thus, discarding the integral formula does not eliminate the challenge posed by impropriety.

A theorem proved by Ramachandran [Reference Ramachandran32] shows that, under very natural assumptions, discarding the integral formula does not ameliorate impropriety at all:

Theorem: Let $(\Omega, \mathcal{F},P)$ be a probability space and $\mathcal{G}\subseteq \mathcal{F}$ a sub- $\sigma$ -field. There exists a proper rcd for P given $\mathcal{G}$ iff: (i) there exists an rcd for P given $\mathcal{G}$ that is proper at $\omega$ for P-almost all $\omega$ ; and (ii) there exists a proper update rule for $(\Omega, \mathcal{F})$ and $\mathcal{G}$ .

I argued in Section 3.2 that we may confine attention to cases where $\mathcal{G}$ is countably generated, so that condition (i) of the theorem is satisfied. In such cases, a proper rcd exists iff a proper update rule exists. Abandoning the integral formula yields no progress whatsoever regarding impropriety.

As this dialectic reveals, the disappointment engendered by impropriety is not specific to rcds. It arises for all update rules, whether or not one adopts the integral formula as a constraint on conditional probabilities. Using an update rule other than an rcd is no remedy.

Some readers may suspect that we should look beyond update rules. Perhaps we should lift the requirement (4) that new credences be codified by a probability measure? Or lift the measurability requirement (5)? Neither option seems promising to me:

  • The only controversial element in (4) is the requirement that credences be countably additive. Some authors contend that we should abandon countable additivity, requiring only that credences be finitely additive (e.g., [Reference de Finetti7, Reference Howson19, Reference Seidenfeld, Schervish and Kadane42]). However, countable additivity figures crucially in probability theory and in many scientific applications of Bayesian decision theory. Abandoning it would require massive revisions to scientific practice. I doubt that a finitely additive approach can replicate the explanatory and pragmatic success achieved by the standard countably additive framework.

  • The measurability requirement (5) is highly plausible, as argued in Section 2.2. Measurability formalizes the intuitive thought that new credal assignments are embedded in the agent’s implicit certainties regarding $\mathcal{G}$ .

For these reasons, I restrict attention to update rules. Given the restriction to update rules, impropriety poses no special problems for rcds.

4. Dutch book theorems

I will now show that rcds have distinctive pragmatic benefits that privilege them over other update rules. I will prove a Dutch book theorem and converse Dutch book theorem tailored to Kolmogorov conditionalization.

As mentioned in Section 1, Lewis [Reference Lewis26] proves a Dutch book theorem for ratio conditionalization, and Skyrms [Reference Skyrms43] proves a converse theorem. Lewis and Skyrms assume that the conditioning proposition is true and that it has non-zero initial probability. They do not address scenarios where misplaced certainty is possible or where null updating occurs. Subsequent work generalizes the Lewis–Skyrms theorems as follows:

  • [Reference Rescorla38] extends the Lewis–Skyrms theorems to a non-factive setting. This paper proves a Dutch book theorem and converse Dutch book theorem concerning a class of learning scenarios where misplaced certainty is possible. The theorems do not cover null updating scenarios.

  • [Reference Rescorla36] extends the Lewis–Skyrms theorems to a setting where null updating can occur. This paper proves a Dutch book theorem and converse Dutch book theorem tailored to Kolmogorov knowledge acquisition scenarios (i.e., scenarios where the agent gains membership knowledge for a sub- $\sigma$ -field $\mathcal{G}$ and updates her credences on that basis).Footnote 5 Since knowledge is factive, the theorems do not cover scenarios where the conditioning proposition is false.

[Reference Rescorla38] lifts the restriction to true conditioning propositions. [Reference Rescorla36] lifts the restriction to propositions with positive initial probability. In what follows, I lift both restrictions simultaneously. I prove a Dutch book theorem and converse Dutch book theorem tailored to Kolmogorov certainty acquisition scenarios.

In Section 4.1, I formalize key notions such as bet, bookie strategy, and Dutch book. Section 4.2 reviews literature that applies these formalized notions within a factive setting. Section 4.3 proves a non-factive Dutch book theorem and converse Dutch book theorem.

4.1. Dutch books formalized

Consider an agent who at time ${t}_1$ has credences modeled by probability space $(\Omega, \mathcal{F},P)$ and at time ${t}_2$ gains new certainties ${\delta}_{\omega }$ over $\mathcal{G}\subseteq \mathcal{F}$ . We want to compare possible strategies for reallocating credence based on the new certainties. We assume that the agent uses a proper update rule $C:\mathcal{F}\times \Omega \to \mathrm{\mathbb{R}}$ , satisfying constraints (4), (5), and (8). She responds to new certainties ${\delta}_{\omega }$ by forming new credences ${C}_{\omega }$ . Thus, her credal reallocation policy is given by (12). Assumption (4) guarantees that ${C}_{\omega }$ is a probability measure. If $C(A|.)$ satisfies the integral formula for all $A\in \mathcal{F}$ , then C is an rcd for P given $\mathcal{G}$ . I do not assume that $C(A|.)$ satisfies the integral formula.

The agent faces a bookie who can offer bets at both ${t}_1$ and ${t}_2$ . Following standard practice in probability theory, I formalize a bet as a random variable. It is convenient to allow random variables that take values in the extended real line $\overline{\mathrm{\mathbb{R}}}=\left[{-}\infty, \infty \right]$ . Thus, a bet is an $\mathcal{F}$ -measurable function $X:\Omega \to \overline{\mathrm{\mathbb{R}}}$ . Here $X(\omega)$ is the net gain for outcome $\omega$ . Write

$$\begin{align*}{E}_{\mu}[X]\ \kern1pt{=}_{df}\underset{\Omega}{\int } Xd\mu\end{align*}$$

for the expectation of random variable X with respect to probability measure $\mu$ . A bet X is acceptable relative to $\mu$ iff its expected value with respect to $\mu$ is non-negative:

$$\begin{align*}{E}_{\mu}[X]=\underset{\Omega}{\int } Xd\mu \ge 0.\end{align*}$$

At ${t}_2$ , the agent acquires updated credences ${C}_{\omega }$ based on new certainties ${\delta}_{\omega }$ . Say that X is acceptable given ${\delta}_{\omega }$ iff its expected value with respect to ${C}_{\omega }$ is non-negative:

$$\begin{align*}{E}_{C_{\omega }}[X]=\int {XdC}_{\omega}\ge 0.\end{align*}$$

This definition is well-formed, since $\mathcal{G}$ -measurability ensures that

$$\begin{align*}{\delta}_{\omega}={\delta}_{\nu}\to {C}_{\omega}={C}_{\nu }.\end{align*}$$

I assume that our agent adopts a policy of accepting all acceptable bets that are offered and rejecting all unacceptable bets.
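The acceptability conditions can be illustrated on a finite space, where the integral reduces to a weighted sum. The sketch below assumes real-valued bets on a four-point $\Omega$ ; the measures `P` and `C_0` and the bet `BET` are hypothetical numbers chosen for illustration, and extended real values are not handled.

```python
from fractions import Fraction as Fr


def expectation(bet, mu):
    """E_mu[X] for a bet and a probability measure given as dicts over Omega."""
    return sum((bet[w] * mu[w] for w in mu), Fr(0))


def acceptable(bet, mu):
    """A bet is acceptable relative to mu iff its expectation is non-negative."""
    return expectation(bet, mu) >= 0


# Hypothetical initial credences: uniform on four outcomes.
P = {0: Fr(1, 4), 1: Fr(1, 4), 2: Fr(1, 4), 3: Fr(1, 4)}
# Hypothetical updated credences C_omega for a profile certain of {0, 1}.
C_0 = {0: Fr(4, 5), 1: Fr(1, 5), 2: Fr(0), 3: Fr(0)}
# Hypothetical bet: lose 1 on outcome 0, win 2 on outcome 1, nothing otherwise.
BET = {0: Fr(-1), 1: Fr(2), 2: Fr(0), 3: Fr(0)}

print(expectation(BET, P), acceptable(BET, P))
print(expectation(BET, C_0), acceptable(BET, C_0))
```

In this toy case the same bet is acceptable relative to the initial credences but not given the new certainty profile, so an agent following the stated policy would accept it at ${t}_1$ and reject it at ${t}_2$ .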

Lewis and Skyrms assume that the bookie learns at ${t}_2$ which conditioning proposition is true. In [Reference Rescorla36], I made a comparable assumption when extending the Dutch book results to Kolmogorov knowledge acquisition scenarios: I assumed that the bookie at ${t}_2$ learns membership facts for $\mathcal{G}$ . These assumptions no longer seem appropriate once we broaden the allowed learning scenarios to include misplaced certainty. After all, if the bookie learns membership facts for $\mathcal{G}$ but the agent has no such luck, then the bookie’s ability to exploit his superior knowledge for sure gain hardly indicates that the agent’s credal reallocation policy is problematic. So a non-factive setting requires us to rethink our assumptions regarding the bookie’s epistemic state at ${t}_2$ . What now seems most appropriate is to assume that the agent and the bookie acquire the same certainty profile ${\delta}_{\omega }$ . Whereas I assumed in [Reference Rescorla36] that agent and bookie acquire the same membership knowledge regarding $\mathcal{G}$ , I now assume that they acquire the same certainties regarding $\mathcal{G}$ . Thus, the bookie is no better and no worse off than the agent when it comes to membership facts about $\mathcal{G}$ .

Based on his new certainty profile over $\mathcal{G}$ , the bookie decides what bet (if any) to offer at ${t}_2$ . The bookie does this using a bookie strategy, which maps certainty profiles ${\delta}_{\omega }$ into bets. We can formalize these ideas through the measurable space $\left(\Omega \times \Omega, \mathcal{G}\otimes \mathcal{F}\right)$ . A bookie strategy is a $\mathcal{G}\otimes \mathcal{F}$ -measurable function $Y:\Omega \times \Omega \to \overline{\mathrm{\mathbb{R}}}$ . For fixed $\omega$ , $Y(\omega, .):\Omega \to \overline{\mathrm{\mathbb{R}}}$ is the bet that the bookie offers upon acquiring new certainties ${\delta}_{\omega }$ over $\mathcal{G}$ . Since Y is $\mathcal{G}\otimes \mathcal{F}$ -measurable, one can easily show that

$Y(\omega, .):\Omega \to \overline{\mathrm{\mathbb{R}}}$ is $\mathcal{F}$ -measurable for every $\omega \in \Omega$ ,

so that $Y(\omega, .)$ is indeed a bet according to our official definition. One can also show that

$Y(.,\nu ):\Omega \to \overline{\mathrm{\mathbb{R}}}$ is $\mathcal{G}$ -measurable for every $\nu \in \Omega$ ,

which ensures that

$$\begin{align*}{\delta}_{\omega}={\delta}_{\rho}\to Y(\omega, .)=Y(\rho, .).\end{align*}$$

Thus, a bookie strategy carries each certainty profile ${\delta}_{\omega }$ to a well-defined bet, independent of the index $\omega$ . I will often abbreviate $Y(\omega, .)$ as ${Y}_{\omega }$ . To model scenarios where the bookie offers no bet in light of new certainty profile ${\delta}_{\omega }$ , I set ${Y}_{\omega}(\nu)=0$ for all inputs $\nu$ .

The $\mathcal{G}\otimes \mathcal{F}$ -measurability requirement ensures that the event ${Y}^{-1}(-\infty, a]=\left\{(\omega, \nu):{Y}_{\omega}(\nu)\le a\right\}$ belongs to $\mathcal{G}\otimes \mathcal{F}$ . This event is notable, for the following reason:

  1. (13) The bet offered by the bookie has net gain $\le a$ iff $(\omega, \nu)\in {Y}^{-1}(-\infty, a]$ , where $\omega$ is any index for the bookie’s newly acquired certainty profile and $\nu$ is the true outcome.

We may employ ${Y}^{-1}(-\infty, a]$ as a formal proxy for the proposition The bet offered by the bookie has net gain $\le a$ . $\mathcal{G}\otimes \mathcal{F}$ -measurability ensures that the formal proxy belongs to $\mathcal{G}\otimes \mathcal{F}$ .

We may use $\mathcal{G}\otimes \mathcal{F}$ to model the implicit knowledge of an observer who learns which new certainty profile over $\mathcal{G}$ the agent and bookie acquire at time ${t}_2$ and who learns which events in $\mathcal{F}$ occurred. Such an observer learns whether

$$\begin{align*}{\delta}_{\omega }(G)=1,\qquad\qquad \text{for each}\ G\in \mathcal{G},\end{align*}$$

where $\omega$ is any index for the agent’s new certainty profile. Equivalently, the observer learns whether $\omega \in G$ , for each $G\in \mathcal{G}$ . The observer also learns whether the true outcome $\nu$ belongs to F, for each $F\in \mathcal{F}$ . In principle, the observer can extrapolate from his knowledge regarding $\mathcal{G}$ and $\mathcal{F}$ to knowledge regarding $\mathcal{G}\otimes \mathcal{F}$ . For each $D\in \mathcal{G}\otimes \mathcal{F}$ , he gains implicit knowledge whether

$$\begin{align*}(\omega, \nu)\in D,\end{align*}$$

where $\omega$ is any index for the agent’s new certainty profile and $\nu$ is the true outcome. Assuming $\mathcal{G}\otimes \mathcal{F}$ -measurability, our observer gains implicit knowledge whether $(\omega, \nu)\in {Y}^{-1}(-\infty, a]$ , where $\omega$ is any index for the agent’s new certainty profile and $\nu$ is the true outcome. In other words, he gains implicit knowledge whether the bet offered by the bookie has net gain $\le a$ . On the other hand, if ${Y}^{-1}(-\infty, a]$ does not belong to $\mathcal{G}\otimes \mathcal{F}$ , then the observer does not gain implicit knowledge whether the selected bet has net gain $\le a$ . Hence, $\mathcal{G}\otimes \mathcal{F}$ -measurability codifies the following intuitive thought: an observer who knows which new certainty profile the agent and bookie acquire and which events in $\mathcal{F}$ occurred thereby gains implicit knowledge whether the bet selected by the bookie has net gain $\le a$ .

We may now formalize Dutch bookability. I consider two formalizations, corresponding respectively to the intuitive notions sure loss in Kolmogorov knowledge acquisition scenarios and sure loss in Kolmogorov certainty acquisition scenarios. A factive Dutch book for $\left(\boldsymbol{\Omega}, \boldsymbol{\mathcal{F}},\boldsymbol{P}\right)$ , sub- $\boldsymbol{\sigma}$ -field $\boldsymbol{\mathcal{G}}\,{\mathbf{\subseteq}}\,\boldsymbol{\mathcal{F}}$ , and update rule C is a pair $\left(X,Y\right)$ such that, for all $\omega \in \Omega$ :

(i) X is a bet that is acceptable relative to P.

(ii) Y is a bookie strategy.

(iii) ${Y}_{\omega }$ is acceptable given ${\delta}_{\omega }$ .

(iv) $X(\omega)+{Y}_{\omega}(\omega)<0$ .

Condition (i) requires that X have nonnegative expectation at ${t}_1$ . This condition ensures that the agent will accept bet X at ${t}_1$ . Conditions (iii) and (iv) jointly ensure that a bookie who offers bet X and pursues bookie strategy Y will inflict a sure loss assuming that the agent and the bookie learn true membership facts for $\mathcal{G}$ . A non-factive Dutch book for $\left(\boldsymbol{\Omega}, \boldsymbol{\mathcal{F}},\boldsymbol{P}\right)$ , sub- $\boldsymbol{\sigma}$ -field $\boldsymbol{\mathcal{G}}\,{\mathbf{\subseteq}}\,\boldsymbol{\mathcal{F}}$ , and update rule C is a pair $\left(X,Y\right)$ such that for all $\omega, \nu \in \Omega$ :

(i) X is a bet that is acceptable relative to P.

(ii) Y is a bookie strategy.

(iii) ${Y}_{\omega }$ is acceptable given ${\delta}_{\omega }$ .

(iv*) $X(\nu )+{Y}_{\omega}(\nu)<0$ .

Conditions (iii) and (iv*) jointly ensure that a bookie who offers bet X and pursues bookie strategy Y will inflict a sure loss whether or not the agent and bookie become certain of true membership facts for $\mathcal{G}$ . A non-factive Dutch book ensures a net loss even in situations where agent and bookie acquire misplaced certainties regarding $\mathcal{G}$ . See Figure 1.

Figure 1. A visualization of $\Omega \times \Omega$ . In general, there need not be any natural linear ordering of $\Omega$ . Nevertheless, the visualization is a useful heuristic. The horizontal axis corresponds to $\left(\Omega, \mathcal{G}\right)$ . Points on this axis index the agent’s new certainty profile, which determines the bet selected by the bookie. When the agent acquires certainty profile ${\delta}_{\omega }$ , the bookie offers bet ${Y}_{\omega }$ . The vertical axis corresponds to $(\Omega, \mathcal{F})$ . Points on this axis reflect the true outcome. We specify how a gambling scenario unfolds by specifying an ordered pair $(\omega, \nu)$ , where $\omega$ is an index for the agent’s new certainty profile and $\nu$ is the true outcome. Each ordered pair $(\omega, \nu)$ determines a well-defined net gain ${Y}_{\omega}(\nu)$ . A factive Dutch book inflicts a net loss along the diagonal. A non-factive Dutch book inflicts a net loss on all points.
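The two definitions can be checked mechanically in a finite toy case. The sketch below is an illustration under assumptions not found in the text: $\Omega$ has four points, $\mathcal{G}$ is generated by the cells `g1` $=\{0,1\}$ and `g2` $=\{2,3\}$ , P is uniform, and the rule `RULE`, the bet `X`, and the strategy `STRATEGY` are hypothetical numbers. `RULE` is proper but is not an rcd, since on `g1` it puts credence 4/5 on outcome 0 where $P(.|\{0,1\})$ puts 1/2. The strategy is keyed by cell, so it assigns the same bet to any two indices of the same certainty profile.

```python
from fractions import Fraction as Fr

OMEGA = (0, 1, 2, 3)
CELL = {0: "g1", 1: "g1", 2: "g2", 3: "g2"}   # cell of G containing each outcome
P = {w: Fr(1, 4) for w in OMEGA}              # hypothetical initial credences

# Hypothetical proper update rule that is not an rcd for P given G.
RULE = {"g1": {0: Fr(4, 5), 1: Fr(1, 5), 2: Fr(0), 3: Fr(0)},
        "g2": {0: Fr(0), 1: Fr(0), 2: Fr(1, 2), 3: Fr(1, 2)}}
# The rcd for comparison: P conditioned on each cell.
RCD = {"g1": {0: Fr(1, 2), 1: Fr(1, 2), 2: Fr(0), 3: Fr(0)},
       "g2": {0: Fr(0), 1: Fr(0), 2: Fr(1, 2), 3: Fr(1, 2)}}

X = {0: Fr(-1, 4), 1: Fr(3, 4), 2: Fr(-1, 5), 3: Fr(-1, 5)}   # bet offered at t1
BIG = Fr(10)  # stake lost if the true outcome lies outside the profile's cell
STRATEGY = {"g1": {0: Fr(1, 5), 1: Fr(-4, 5), 2: -BIG, 3: -BIG},
            "g2": {0: -BIG, 1: -BIG, 2: Fr(1, 10), 3: Fr(1, 10)}}


def expectation(bet, mu):
    return sum((bet[w] * mu[w] for w in OMEGA), Fr(0))


def common_conditions(x, strategy, rule):
    """Conditions (i) and (iii); (ii) is automatic on a finite space."""
    return (expectation(x, P) >= 0
            and all(expectation(strategy[g], rule[g]) >= 0 for g in rule))


def is_factive_dutch_book(x, strategy, rule):
    """Condition (iv): net loss on the diagonal, where omega is the true outcome."""
    return (common_conditions(x, strategy, rule)
            and all(x[w] + strategy[CELL[w]][w] < 0 for w in OMEGA))


def is_non_factive_dutch_book(x, strategy, rule):
    """Condition (iv*): net loss at every pair (omega, nu)."""
    return (common_conditions(x, strategy, rule)
            and all(x[nu] + strategy[CELL[w]][nu] < 0
                    for w in OMEGA for nu in OMEGA))


print(is_factive_dutch_book(X, STRATEGY, RULE))
print(is_non_factive_dutch_book(X, STRATEGY, RULE))
print(is_non_factive_dutch_book(X, STRATEGY, RCD))
```

Because `RULE` is proper, the agent is certain that the true outcome lies in the cell matching her certainty profile, so she regards the large off-cell losses in `STRATEGY` as having probability 0 and accepts the bets anyway. The same pair fails condition (iii) against `RCD` . That shows only that this particular pair is not a Dutch book against the rcd, not that none exists.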

4.2. Factive Dutch books

The Dutch book theorem and converse Dutch book theorem from [Reference Rescorla36] are tailored to Kolmogorov knowledge acquisition scenarios:

Factive Dutch Book Theorem for Kolmogorov Conditionalization : Let $(\Omega, \mathcal{F},P)$ be a probability space, let $\mathcal{G}\subseteq \mathcal{F}$ be a sub- $\sigma$ -field, and let C be an update rule for $(\Omega, \mathcal{F})$ and $\mathcal{G}$ . If C is not an rcd for P given $\mathcal{G}$ , then there exists a factive Dutch book for $(\Omega, \mathcal{F},P)$ , $\mathcal{G}$ , and C.

Factive Converse Dutch Book Theorem for Kolmogorov Conditionalization : Let $(\Omega, \mathcal{F},P)$ be a probability space, let $\mathcal{G}\subseteq \mathcal{F}$ be a sub- $\sigma$ -field, and let C be an rcd for P given $\mathcal{G}$ . Then there does not exist a factive Dutch book for $(\Omega, \mathcal{F},P)$ , $\mathcal{G}$ , and C.

The two theorems show that rcds offer unique pragmatic advantages in Kolmogorov knowledge acquisition scenarios.

In [Reference Rescorla36], I motivated the $\mathcal{G}\otimes \mathcal{F}$ -measurability requirement on bookie strategies along lines similar to Section 4.1. I spoke of “membership information” rather than “certainty profiles,” but otherwise my reasoning was essentially the same. I offered the event ${Y}^{-1}(-\infty, a]$ as a formal proxy for the proposition The bet offered by the bookie has net gain $\le a$ . $\mathcal{G}\otimes \mathcal{F}$ -measurability then codifies the following intuitive thought: an observer who knows which new membership information the agent and bookie acquire and which events in $\mathcal{F}$ occurred thereby gains implicit knowledge whether the bet offered by the bookie has net gain $\le a$ .

Meehan and Zhang [Reference Meehan and Zhang28] argue that, assuming a factive conception of credal updates, the $\mathcal{G}\otimes \mathcal{F}$ -measurability requirement on bookie strategies is misguided. Specifically, they deny that ${Y}^{-1}(-\infty, a]$ is the correct formal proxy for the proposition The bet offered by the bookie has net gain $\le a$ . In a factive setting, the bookie will offer bet ${Y}_{\omega }$ where $\omega$ is the true outcome. The bet’s net gain is ${Y}_{\omega}(\omega)$ . Accordingly, Meehan and Zhang urge that the correct formal proxy is

$$\begin{align*}\left\{\omega :{Y}_{\omega}(\omega)\le a\right\}.\end{align*}$$

They offer a revised notion of Dutch book, which requires

  1. (14) $\left\{\omega :{Y}_{\omega}(\omega)\le a\right\}\in \mathcal{F}$

rather than

$$\begin{align*}{Y}^{-1}(-\infty, a]=\left\{(\omega, \nu):{Y}_{\omega}(\nu)\le a\right\}\in \mathcal{G}\otimes \mathcal{F}\end{align*}$$

in order for Y to count as a bookie strategy. They prove a Dutch book theorem and converse Dutch book theorem using the revised notion of Dutch book. The theorems establish that a Dutch book (in the sense favored by Meehan and Zhang) does not exist for update rule C iff C is an rcd that is proper almost everywhere. Meehan and Zhang conclude that rcds per se do not offer unique pragmatic advantages over alternative update rules; only rcds that are proper almost everywhere offer unique pragmatic advantages.

In my opinion, $\left\{\omega :{Y}_{\omega}(\omega)\le a\right\}$ is not a good formal proxy for the proposition The bet offered by the bookie has net gain $\le a$ . Assuming a factive setting,

  1. (15) The bookie will acquire membership information ${\delta}_{\omega }$ indexed by the true outcome $\omega$ .

It then follows that

  1. (16) The bet offered by the bookie has net gain $\le a$ iff the true outcome belongs to $\left\{\omega :{Y}_{\omega}(\omega)\le a\right\}$ .

However, the bookie may not know (15) even if the agent knows it. If the bookie does not know (15), then he is not positioned to know (16) either. But then the bookie, in learning that the true outcome belongs to $\left\{\omega :{Y}_{\omega}(\omega)\le a\right\}$ , will not thereby learn that his bet has net gain $\le a$ . Furthermore, even if the agent and the bookie both know (15), the agent may not know that the bookie knows (15). But then the agent does not know that the bookie, by learning that the true outcome belongs to $\left\{\omega :{Y}_{\omega}(\omega)\le a\right\}$ , thereby learns that his bet has net gain $\le a$ . For that reason, the agent does not know that the bookie can assess the outcome of his own bet. For all the agent knows, the bookie cannot detect whether he won the bet!

Dutch books are supposed to inflict a sure loss. A sure loss means that, from the agent’s viewpoint, the agent is guaranteed to lose [Reference Rescorla39]. If the bookie does not know that the agent lost the bet, then the agent will not actually lose money because the bookie will never collect his winnings. If the agent does not expect the bookie to know that the agent lost money, then the agent does not regard a loss as guaranteed. A loss that is undetectable by the bookie or that the agent thinks is undetectable by the bookie does not constitute a “sure loss.”

(14) allows a “loss” that the bookie never detects. It also allows a “loss” that the bookie detects but that the agent does not expect the bookie to detect. In neither case does the book inflict a sure loss. So (14) does not secure genuine Dutch bookability. In contrast, suppose we impose the $\mathcal{G}\otimes \mathcal{F}$-measurability requirement. Then, as argued in Section 4.1, the bookie can detect whether he won the bet provided that he learns which new information was acquired and learns which events in $\mathcal{F}$ occurred. I conclude that we should impose the $\mathcal{G}\otimes \mathcal{F}$-measurability requirement on bookie strategies even for the special case of factive Dutch books. [Footnote 6]

However things stand with factive Dutch books, $\mathcal{G}\otimes \mathcal{F}$ -measurability is surely needed for a suitable conception of non-factive Dutch books. Once we lift all factivity assumptions, the bookie may acquire a certainty profile ${\delta}_{\omega }$ such that

$$\begin{align*}{\delta}_{\omega}\ne {\delta}_{\nu },\end{align*}$$

where $\nu$ is the true outcome. In that context, ${Y}^{-1}(-\infty, a]$ is clearly a much better formal proxy than $\left\{\omega :{Y}_{\omega}(\omega)\le a\right\}$ for the proposition The bet offered by the bookie has net gain $\le a$ . Even if we agree with Meehan and Zhang that $\mathcal{G}\otimes \mathcal{F}$ -measurability is a misguided constraint on bookie strategies in the factive case, we should embrace it in the non-factive case that I am addressing in the present paper.
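The difference between the two proxies can be seen in a small finite example. The following is only a sketch: the two-point outcome space, the payoff table `Y`, and the function names are hypothetical and chosen for illustration. It compares, at each scenario $(\omega, \nu)$, the actual truth value of "net gain $\le a$" with the verdict of the diagonal set $\left\{\omega :{Y}_{\omega}(\omega)\le a\right\}$ and the verdict of the preimage ${Y}^{-1}(-\infty, a]$.

```python
# Hypothetical two-point outcome space; all payoffs are made up for illustration.
OMEGA = (0, 1)

# Y[(omega, nu)]: net gain of the bet Y_omega when the true outcome is nu.
Y = {(0, 0): -1, (0, 1): 2, (1, 0): 3, (1, 1): -1}


def diagonal_proxy(a):
    """The set {omega : Y_omega(omega) <= a}, a subset of OMEGA."""
    return {w for w in OMEGA if Y[(w, w)] <= a}


def preimage_proxy(a):
    """The set Y^{-1}(-inf, a], a subset of OMEGA x OMEGA."""
    return {pair for pair, gain in Y.items() if gain <= a}


def proxies_at(omega, nu, a):
    """Return (actual, diagonal verdict, preimage verdict) at scenario (omega, nu)."""
    actual = Y[(omega, nu)] <= a
    return actual, nu in diagonal_proxy(a), (omega, nu) in preimage_proxy(a)


if __name__ == "__main__":
    for omega in OMEGA:
        for nu in OMEGA:
            print((omega, nu), proxies_at(omega, nu, a=0))
```

In this toy table the diagonal verdict agrees with the actual net gain only when $\omega =\nu$, whereas the preimage verdict agrees at every scenario, which is the point made in the text about the non-factive case.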

4.3. Non-factive Dutch books

I now prove a Dutch book theorem and converse Dutch book theorem tailored to Kolmogorov certainty acquisition scenarios.

Non-Factive Dutch Book Theorem for Kolmogorov Conditionalization : Let $(\Omega, \mathcal{F},P)$ be a probability space, and let C be a proper update rule for $(\Omega, \mathcal{F})$ and $\mathcal{G}$ . If C is not an rcd for P given $\mathcal{G}$ , then there exists a non-factive Dutch book for $(\Omega, \mathcal{F},P)$ , $\mathcal{G}$ , and C.

Proof: Since C is not an rcd for P given $\mathcal{G}$ , $C(A|.)$ must violate the integral formula for some $A\in \mathcal{F}$ . It follows that

$$\begin{align*}P\left\{\omega :C(A|\omega)\ne {P}_{\mathcal{G}}(A|\omega )\right\}>0,\end{align*}$$

where ${P}_{\mathcal{G}}(A|.)$ is some conditional probability for A given $\mathcal{G}$ . Note that

$$\begin{align*}\left\{\omega :C(A|\omega)\ne {P}_{\mathcal{G}}(A|\omega )\right\}=\left\{\omega :C(A|\omega)<{P}_{\mathcal{G}}(A|\omega )\right\}\cup \left\{\omega :C(A|\omega)>{P}_{\mathcal{G}}(A|\omega )\right\}.\end{align*}$$

Both sets on the right-hand side belong to $\mathcal{G}$ . At least one of these two sets must have non-zero P-measure. Without loss of generality, suppose the first does. Call this set G:

$$\begin{align*}G\ {=}_{df}\left\{\omega :C(A|\omega)<{P}_{\mathcal{G}}(A|\omega )\right\}.\end{align*}$$

We will use G to rig the Dutch book. Informally, we proceed as follows. At ${t}_1$, the bookie offers a conditional bet given by Table 1. At ${t}_2$, the bookie offers a new conditional bet given by Table 2. Table 3 summarizes net gain for the two bets, given any outcome $\nu \in \Omega$. Net gain is 0 when $\nu \notin G$ and negative when $\nu \in G$. To ensure a net loss for all outcomes $\nu$, we add a side bet on G at ${t}_1$. We thereby achieve a non-factive Dutch book. In the case where $\left\{\omega :C(A|\omega)>{P}_{\mathcal{G}}(A|\omega )\right\}$ has non-zero P-measure, we merely reverse payoffs from all the aforementioned bets.

Table 1 The bet has payoff 1 if A occurs, but it is cancelled if G does not occur. Furthermore, the price of the bet ${P}_{\mathcal{G}}(A|\nu)$ depends on the true outcome $\nu$ . At time ${t}_1$ , the bookie offers to sell this conditional bet to the agent. The final column gives the agent’s net gain if she accepts the bookie’s offer

Table 2 At time ${t}_2$ , the bookie offers to buy this conditional bet from the agent. The final column gives the agent’s net gain if she accepts the bookie’s offer. Net gain is computed by subtracting the payoff from the price, since the bookie is offering to buy rather than sell the bet

Table 3 Net gain for the overall gambling scenario

Let us now formalize these ideas. As always, we codify bets as random variables. First, define random variable X by

$$\begin{align*}X(\nu)=\left\{\begin{array}{@{}cl}1-{P}_{\mathcal{G}}(A|\nu) &\mathit{if}\;\nu\in A\cap G,\\ {}-{P}_{\mathcal{G}}(A|\nu) &\mathit{if}\;\nu\in {A}^c\cap G,\\ {}0 &\mathit{if}\;\nu\notin G.\end{array}\right.\end{align*}$$

Intuitively, this is the bet offered at ${t}_1$ . Define bookie strategy $Y(\omega, \nu)$ by

$$\begin{align*}Y\!\left(\omega, \nu\right)=\left\{\begin{array}{@{}cl}C(A|\nu)-1 &\mathit{if}\;\nu\in A\cap G,\\ {}C(A|\nu) &\mathit{if}\;\nu\in {A}^c\cap G,\\ {}0 &\mathit{if}\;\nu\notin G.\end{array}\right.\end{align*}$$

Note that $Y(\omega, \nu)$ does not depend upon $\omega$ . Intuitively: the bookie offers the bet from Table 2 no matter which new certainties over $\mathcal{G}$ the agent gains at ${t}_2$ . Let $L\ {=}_{df}\ \underset{G}{\int}\left[C(A|\nu)-{P}_{\mathcal{G}}(A|\nu )\right] dP(\nu )$ if this integral is finite. If the integral is infinite, then let L be any finite negative number. Either way, we have

$$\begin{align*}\underset{G}{\int}\left[C(A|\nu)-{P}_{\mathcal{G}}(A|\nu )\right] dP(\nu )\le L.\end{align*}$$

Define random variable Z:

$$\begin{align*}Z(\nu)=\left\{\begin{array}{@{}ll}(P(G)-1)\left[C(A|\nu)-{P}_{\mathcal{G}}(A|\nu)\right] &\mathit{if}\;\nu\in G,\\ {}L &\mathit{if}\;\nu\notin G.\end{array}\right.\end{align*}$$

This is the side bet on G advertised above. We show that $(X+Z,Y)$ is a non-factive Dutch book for $(\Omega, \mathcal{F},P)$ , $\mathcal{G}$ , and C.
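As a sketch of why the combined bets lose everywhere, the net gain can be computed case by case from the definitions of X, Y, and Z. The computation assumes, following the caption to Figure 1, that clause (iv*) requires a negative net gain at every pair $(\omega, \nu)$. For $\nu \in G$, the cases $\nu \in A$ and $\nu \notin A$ both give

$$\begin{align*}X(\nu)+Y(\omega, \nu)=C(A|\nu)-{P}_{\mathcal{G}}(A|\nu),\end{align*}$$

so that

$$\begin{align*}X(\nu)+Z(\nu)+Y(\omega, \nu)=P(G)\left[C(A|\nu)-{P}_{\mathcal{G}}(A|\nu)\right]<0,\end{align*}$$

since $P(G)>0$ and $C(A|\nu)<{P}_{\mathcal{G}}(A|\nu)$ on G. For $\nu \notin G$,

$$\begin{align*}X(\nu)+Z(\nu)+Y(\omega, \nu)=0+L+0=L<0.\end{align*}$$

Here $L<0$ because the integrand $C(A|\nu)-{P}_{\mathcal{G}}(A|\nu)$ is strictly negative on G and $P(G)>0$.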

One can easily check that Y is a bookie strategy and that $(X+Z,Y)$ satisfies clause (iv*) in the definition of non-factive Dutch book. To prove that X is acceptable relative to P, we write

$$\begin{align*}X(\nu )={I}_G(\nu )\left[{I}_A(\nu )-{P}_{\mathcal{G}}(A|\nu )\right],\end{align*}$$

where, for any set S, ${I}_S$ is the indicator function for S:

$$\begin{align*}{I}_S(\omega)=\left\{\begin{array}{@{}cl}1 &\mathit{if}\;\omega \in S,\\ {}0& \mathit{if}\;\omega \notin S.\end{array}\right.\end{align*}$$

Routine computation using the integral formula confirms that

$$\begin{align*}{E}_P[X]=\underset{\Omega}{\int }{I}_G(\nu )\left[{I}_A(\nu )-{P}_{\mathcal{G}}(A|\nu )\right] dP(\nu )=0.\end{align*}$$

Similarly, we write

$$\begin{align*}Z(\nu)={I}_G(\nu)(P(G)-1)\left[C(A|\nu)-{P}_{\mathcal{G}}(A|\nu)\right]+{I}_{G^c}(\nu)L,\end{align*}$$

and routine computation confirms that

$$\begin{align*}{E}_P\left[Z\right]=\underset{\Omega}{\int}\left[{I}_G(\nu)(P(G)-1)\left[C(A|\nu)-{P}_{\mathcal{G}}(A|\nu)\right]+{I}_{G^c}(\nu)L\right] dP(\nu)\ge 0.\end{align*}$$
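The two expectations can be unpacked as follows. This is a sketch that assumes the integral formula in its standard form, $\underset{G}{\int }{P}_{\mathcal{G}}(A|\nu ) dP(\nu )=P(A\cap G)$ for $G\in \mathcal{G}$, and uses the inequality relating the integral over G to L established above. For X:

$$\begin{align*}{E}_P[X]=\underset{G}{\int }{I}_A(\nu ) dP(\nu )-\underset{G}{\int }{P}_{\mathcal{G}}(A|\nu ) dP(\nu )=P(A\cap G)-P(A\cap G)=0.\end{align*}$$

For Z:

$$\begin{align*}{E}_P\left[Z\right]&=(P(G)-1)\underset{G}{\int}\left[C(A|\nu)-{P}_{\mathcal{G}}(A|\nu )\right] dP(\nu )+(1-P(G))L\\&=(1-P(G))\left(L-\underset{G}{\int}\left[C(A|\nu)-{P}_{\mathcal{G}}(A|\nu )\right] dP(\nu )\right)\ge 0.\end{align*}$$

Both factors in the last line are non-negative: $1-P(G)\ge 0$ because P is a probability measure, and the second factor is non-negative because the integral is at most L.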

See [Reference Rescorla36] for the details of these computations. Thus, $X+Z$ is acceptable relative to P. To show that Y satisfies clause (iii), pick any $\omega$ . We have assumed that C is proper at $\omega$ :

  1. (17) ${C}_{\omega }(B)=1\;\mathrm{if}\;\omega \in B$ ,

  2. (18) ${C}_{\omega }(B)=0\;\mathrm{if}\;\omega \notin B$ ,

for all $B\in \mathcal{G}$ . We will show that ${Y}_{\omega }$ ’s expectation is 0 relative to ${C}_{\omega }$ :

$$\begin{align*}{E}_{C_{\omega }}\left[{Y}_{\omega}\right]=\int {Y}_{\omega }{dC}_{\omega}=0.\end{align*}$$

Define

$$\begin{align*}J\ {=}_{df}\left\{\pi :C(A|\omega)=C(A|\pi )\right\}.\end{align*}$$

Since $\omega \in J\in \mathcal{G}$ , (17) and (18) entail that

$$\begin{align*}{C}_{\omega }(J)=1,\end{align*}$$
$$\begin{align*}{C}_{\omega}({J}^c)=0.\end{align*}$$

We may write

$$\begin{align*}{Y}_{\omega}(\nu)={I}_G(\nu )\left[C(A|\nu)-{I}_A(\nu )\right],\end{align*}$$

and then compute

$$\begin{align*}&{E}_{C_{\omega }}\left[{Y}_{\omega}\right]=\underset{\Omega}{\int }{I}_G\left[C(A|\nu)-{I}_A\right]{dC}_{\omega}=\underset{\Omega}{\int }{I}_GC(A|\nu){dC}_{\omega }-\underset{\Omega}{\int }{I}_G{I}_A{dC}_{\omega }\\& \quad =\underset{G}{\int }C(A|\nu){dC}_{\omega }-\underset{G\cap A}{\int }{dC}_{\omega}=\underset{G\cap J}{\int }C(A|\nu){dC}_{\omega }+\underset{G\cap {J}^c}{\int }C(A|\nu){dC}_{\omega }-\underset{G\cap A}{\int }{dC}_{\omega }\\& \quad =\underset{G\cap J}{\int }C(A|\nu){dC}_{\omega }+0-{C}_{\omega}(G\cap A)=\underset{G\cap J}{\int }C(A|\omega){dC}_{\omega }-C(A\cap G|\omega)\\& \quad =C(A|\omega)\underset{G\cap J}{\int }{dC}_{\omega }-C(A\cap G|\omega)=C(A|\omega)C(G\cap J|\omega )-C(A\cap G|\omega),\end{align*}$$

where $\omega$ in these equations is held fixed and is not an integration variable. Now, either $\omega \in G$ or $\omega \notin G$ . In the first case, we have $\omega \in G\cap J\in \mathcal{G}$ , so (17) ensures that

$$\begin{align*}C(G\cap J|\omega )=1.\end{align*}$$

(17) also ensures that $C(G|\omega)=1$ , so that

$$\begin{align*}C(A\cap G|\omega)=C(A|\omega).\end{align*}$$

Thus, our expression for ${Y}_{\omega }$ ’s expectation reduces to

$$\begin{align*}{E}_{C_{\omega }}\left[{Y}_{\omega}\right]=C(A|\omega)C(G\cap J|\omega )-C(A\cap G|\omega)=C(A|\omega)-C(A|\omega)=0.\end{align*}$$

If on the other hand $\omega \notin G$ , then (18) ensures that

$$\begin{align*}C(G\cap J|\omega )=0\end{align*}$$

and

$$\begin{align*}C(A\cap G|\omega)=0,\end{align*}$$

so our expression for ${Y}_{\omega }$ ’s expectation again reduces to 0. Either way, ${Y}_{\omega }$ ’s expectation is 0. Hence, $(X+Z,Y)$ is a non-factive Dutch book for $(\Omega, \mathcal{F},P)$ , $\mathcal{G}$ , and C.
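The construction can be checked numerically on a toy model. The following is a minimal sketch under strong simplifying assumptions: a four-point $\Omega$, a uniform P, a sub-$\sigma$-field generated by a two-cell partition, and a hypothetical proper update rule that departs from the ratio formula on one cell. All numbers and names are invented for illustration. Because every cell has positive probability, the example illustrates only the betting arithmetic of the proof, not conditioning on a probability zero proposition.

```python
from fractions import Fraction as F

# Toy finite model. Every number below is an illustrative assumption.
OMEGA = (0, 1, 2, 3)
P = {w: F(1, 4) for w in OMEGA}                  # initial credences
CELLS = (frozenset({0, 1}), frozenset({2, 3}))   # partition generating G
A = frozenset({0, 2})                            # event being bet on

# Proper update rule: each C_w is concentrated on the cell containing w.
# On the cell {0, 1} it departs from the ratio formula, so C is not an rcd.
C_ON_CELL = {
    CELLS[0]: {0: F(1, 4), 1: F(3, 4)},
    CELLS[1]: {2: F(1, 2), 3: F(1, 2)},
}


def cell_of(w):
    return next(c for c in CELLS if w in c)


def measure_of(event, m):
    return sum((m.get(v, F(0)) for v in event), F(0))


def p_g(w):
    """P_G(A|w): conditional probability of A given the cell of w."""
    c = cell_of(w)
    return measure_of(A & c, P) / measure_of(c, P)


def c_measure(w):
    """The new credence function C_w, as a dict over its support."""
    return C_ON_CELL[cell_of(w)]


def c_of_a(w):
    """C(A|w)."""
    return measure_of(A, c_measure(w))


# G is the set where C(A|.) falls below P_G(A|.); L is the integral over G.
G = frozenset(w for w in OMEGA if c_of_a(w) < p_g(w))
P_OF_G = measure_of(G, P)
L = sum(((c_of_a(v) - p_g(v)) * P[v] for v in G), F(0))


def X(nu):
    """Bet sold to the agent at t1."""
    return (int(nu in A) - p_g(nu)) if nu in G else F(0)


def Y(omega, nu):
    """Bet bought from the agent at t2; it ignores omega, as in the proof."""
    return (c_of_a(nu) - int(nu in A)) if nu in G else F(0)


def Z(nu):
    """Side bet on G."""
    return (P_OF_G - 1) * (c_of_a(nu) - p_g(nu)) if nu in G else L


def expectation(f, m):
    return sum((f(v) * m.get(v, F(0)) for v in OMEGA), F(0))


def net_gain(omega, nu):
    return X(nu) + Z(nu) + Y(omega, nu)


if __name__ == "__main__":
    print("G =", sorted(G), " L =", L)
    print("E_P[X] =", expectation(X, P), " E_P[Z] =", expectation(Z, P))
    for w in OMEGA:
        print("E_C[Y] at", w, "=", expectation(lambda v: Y(w, v), c_measure(w)))
    print({(w, v): str(net_gain(w, v)) for w in OMEGA for v in OMEGA})
```

Exact rational arithmetic is used so that the checks on expectations are equalities rather than floating-point approximations.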

We can strengthen the theorem so that all bets have positive expected net payoff. To accomplish this, we add a small “sweetener” that is large enough to boost expected net payoffs above zero but small enough not to cancel out the net loss inflicted by the original book. See [Reference Rescorla36] for details regarding the factive case; those details require only slight modification to fit the non-factive case.

Non-Factive Converse Dutch Book Theorem for Kolmogorov Conditionalization : Let $(\Omega, \mathcal{F},P)$ be a probability space, and let C be an rcd for P given $\mathcal{G}$ . Then there does not exist a non-factive Dutch book for $(\Omega, \mathcal{F},P)$ , $\mathcal{G}$ , and C.

Proof: Clause (iv*) in the definition of non-factive Dutch book entails clause (iv) in the definition of factive Dutch book. So a non-factive Dutch book for $(\Omega, \mathcal{F},P)$ , $\mathcal{G}$ , and C is a factive Dutch book for $(\Omega, \mathcal{F},P)$ , $\mathcal{G}$ , and C. The Factive Converse Dutch Book Theorem entails that a factive Dutch book does not exist. It follows that a non-factive Dutch book does not exist.
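The entailment used in this proof can be displayed schematically. The rendering below is a sketch: it assumes, following the caption to Figure 1, that clause (iv) demands a net loss along the diagonal of $\Omega \times \Omega$ while clause (iv*) demands a net loss at every point, and the letter W for the bet offered at ${t}_1$ is introduced here only for illustration. If

$$\begin{align*}W(\nu)+{Y}_{\omega}(\nu)<0\quad \mathrm{for}\ \mathrm{all}\;\omega, \nu \in \Omega,\end{align*}$$

then setting $\omega =\nu$ yields

$$\begin{align*}W(\nu)+{Y}_{\nu}(\nu)<0\quad \mathrm{for}\ \mathrm{all}\;\nu \in \Omega,\end{align*}$$

which is the diagonal condition. A book that loses at every point of $\Omega \times \Omega$ therefore loses at every point of the diagonal.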

The non-factive Dutch book theorem confines attention to proper update rules. It states that, if a proper update rule violates the integral formula, then a non-factive Dutch book exists. Note the sharp contrast with the factive Dutch book theorem proved in [Reference Rescorla36], which does not assume propriety. The contrast reflects a crucial difference between Kolmogorov certainty acquisition scenarios and Kolmogorov knowledge acquisition scenarios. In a Kolmogorov certainty acquisition scenario, the agent gains new certainties regarding $\mathcal{G}$ and updates her credences on that basis. The only update rules that can deliver such an update are proper update rules; hence the restriction to proper update rules in the non-factive Dutch book theorem. In a Kolmogorov knowledge acquisition scenario, the agent gains knowledge regarding $\mathcal{G}$ and updates her credences on that basis. As emphasized in Section 3.4, knowledge is not certainty. An agent who comes to know E does not necessarily become certain that E. It is possible in principle for the agent to use an improper update rule; hence the attention to all update rules, and not just proper update rules, in the factive Dutch book theorem.

There is also a sharp contrast between the role that propriety plays in my discussion and the role that it plays in Meehan and Zhang’s discussion. Meehan and Zhang attempt to show that impropriety induces Dutch bookability. Section 4.2 raised doubts about whether the attempt succeeds. Be that as it may, the non-factive Dutch book theorem does not even purport to show that impropriety induces Dutch bookability. Propriety figures in Meehan and Zhang’s discussion as a putative rational norm that an update policy may or may not violate. It figures in my discussion as a usability condition: only proper update rules can be used to update credences in light of new certainties, because impropriety induces a conflict (10) between the update rule and the new certainties. My setup treats propriety as a necessary condition on usable update rules rather than a putative rational norm that requires justification.

5. Significance of the theorems

Taken together, the non-factive Dutch book theorem and non-factive converse Dutch book theorem show that Kolmogorov conditionalization has highly desirable pragmatic properties. Any alternative credal reallocation policy leaves you vulnerable to a guaranteed net loss in Kolmogorov certainty acquisition scenarios. If you are a Kolmogorov conditionalizer, then you are immunized from that unsavory prospect. Kolmogorov conditionalization is the unique credal reallocation strategy that guards against Dutch bookability in Kolmogorov certainty acquisition scenarios.

The theorems also clarify the status of impropriety. Upon learning that rcds are not always proper, one naturally asks whether some alternative theory of conditional probability might prove superior to Kolmogorov’s. Section 4’s theorems show that, whatever benefits an alternative theory might offer, it would carry serious costs. Any proper update rule besides an rcd will run afoul of Section 4’s Dutch book theorem. If the specter of impropriety impels you to replace Kolmogorov conditionalization with some alternative update strategy, then you face a sure loss no matter the true outcome and no matter your new certainty profile.

We must distinguish between Dutch book theorems and Dutch book arguments. Dutch book theorems are uncontroversial mathematical results. Dutch book arguments use Dutch book theorems to argue that one should conform to certain credal norms. For example, Lewis [Reference Lewis26] and Skyrms [Reference Skyrms43] cite diachronic Dutch book theorems to argue that epistemic rationality favors conditionalization over rival credal reallocation policies. Critics have raised various objections to Dutch book arguments [Reference Hájek, Anand, Pattanaik and Puppe17], especially diachronic Dutch book arguments [Reference Christensen5]. In this paper, I have not mounted a Dutch book argument. I have not argued that epistemic rationality requires you to be a Kolmogorov conditionalizer, or anything along those lines.

To my mind, Section 4’s theorems are important primarily for elucidatory and diagnostic purposes, not for their potential to sustain a Dutch book argument. The theorems help us understand the benefits that accrue when your update rule abides by the integral formula. Someone who flouts the integral formula leaves herself vulnerable to a guaranteed net loss in a way that someone who conforms to it does not. Whether or not the theorems show that Kolmogorov conditionalization is an epistemically rational credal reallocation policy, they show that it has uniquely attractive pragmatic virtues. In that respect, the theorems depict Kolmogorov conditionalization as a natural extension of ratio conditionalization to null updating scenarios.

Admittedly, the extension does not go quite as smoothly as one might hope. We have encountered several concerning features of rcds:

  1. (a) Rcds may not exist, in which case Kolmogorov conditionalization is impossible.

  2. (b) Outside the special case where the ratio formula prevails, there are infinitely many rcds for P given $\mathcal{G}$ .

  3. (c) Rcds may be improper, in which case Kolmogorov conditionalization is impossible in Kolmogorov certainty acquisition scenarios.

(a) and (c) show that Kolmogorov conditionalization is not always possible. (b) shows that there are sometimes infinitely many distinct ways to be a Kolmogorov conditionalizer; in contrast, ratio conditionalization dictates a single unique credal update.

Despite challenges regarding existence, uniqueness, and impropriety, Kolmogorov conditionalization is a highly general credal reallocation strategy that applies across a huge range of null updating scenarios. Most likely, it applies to all null updating scenarios that might arise in scientific and philosophical applications of Bayesian decision theory. Section 4’s theorems establish that it offers distinctive benefits. All the more fitting, then, that it figures so centrally in contemporary scientific applications of Bayesian decision theory. The time is long past for philosophers to join the broader scientific community by assigning rcds a central role within the foundations of Bayesian inference.

Acknowledgments

I thank Alexander Meehan, Stephen Ge, Snow Zhang, and two anonymous referees for this journal for discussion and feedback that helped me improve the paper.

Footnotes

1 Kolmogorov himself favored a frequentist rather than subjectivist interpretation of probability. So he would not have elucidated his mathematical framework in the subjectivist way that I will elucidate it.

2 Lee [Reference Lee25] also discusses how sub- $\sigma$ -fields can model non-factive doxastic states.

3 One of the main examples discussed by Seidenfeld, Schervish, and Kadane [Reference Seidenfeld, Schervish and Kadane42] is the tail $\sigma$ -field $\mathcal{A}$ for infinitely many flips of a fair coin. $\mathcal{A}$ contains all events that do not depend upon any finite initial sequence of coin tosses. For example, the event the 100th coin toss is heads does not belong to $\mathcal{A}$ , while the event all but finitely many coin tosses are heads does belong. $\mathcal{A}$ , which is not countably generated, generates extreme impropriety when used as a conditioning sub- $\sigma$ -field. Since tail $\sigma$ -fields arise naturally in probability theory, one might argue that $\mathcal{A}$ is a conditioning sub- $\sigma$ -field that occurs in scientific practice yet that is not countably generated. However, while $\mathcal{A}$ figures in probability theory, it does not figure in scientific applications that use probability theory to model credence. Thus, I stand by my assertion in the main text: the only conditioning sub- $\sigma$ -fields that figure in the scientific study of credal reallocation are countably generated sub- $\sigma$ -fields.

4 See [Reference Gaifman, Skyrms and Harper13, Reference Rescorla39] for discussion of how to model higher-order credence.

5 In [Reference Rescorla36], I called these Kolmogorov learning scenarios. My terminology here is more informative albeit more cumbersome.

6 $\left(\Omega \times \Omega, \mathcal{G}\otimes \mathcal{F}\right)$ extends the measurable space $(\Omega, \mathcal{F})$ we are using to model the agent’s credal states at ${t}_1$ and ${t}_2$ . We may imagine the agent and the bookie, once all bets have been offered, reflecting on the past gambling interaction and transitioning to knowledge modeled by the larger space $\left(\Omega \times \Omega, \mathcal{G}\otimes \mathcal{F}\right)$ . After the transition, the agent and the bookie both gain at least implicit knowledge regarding the outcome of the bet offered at ${t}_2$ .

BIBLIOGRAPHY

Bennett, B., Hoffman, D., & Prakash, C. (1996). Observer theory, Bayes theory, and psychophysics. In Knill, D. & Richards, W., editors. Perception as Bayesian Inference. Cambridge: Cambridge University Press, pp. 229–235.
Billingsley, P. (1995). Probability and Measure (third edition). New York: Wiley.
Blackwell, D., & Dubins, L. (1975). On existence and non-existence of proper, regular conditional distributions. The Annals of Probability, 3, 741–752.
Blackwell, D., & Ryll-Nardzewski, C. (1963). Non-existence of everywhere proper conditional distributions. The Annals of Mathematical Statistics, 34, 223–225.
Christensen, D. (1991). Clever bookies and coherent beliefs. Philosophical Review, 100, 229–247.
de Finetti, B. (1937/1980). Foresight: Its logical laws, its subjective sources. In Kyburg Jr., H. E. & Smokler, H. E., editors. Rpt. in Studies in Subjective Probability. Huntington: Robert E. Krieger, pp. 53–118.
de Finetti, B. (1972). Probability, Induction, and Statistics. New York: Wiley.
Easwaran, K. (2008). The Foundations of Conditional Probability. Ph.D. Thesis, University of California, Berkeley; Ann Arbor: ProQuest/UMI (Publication No. 3331592).
Easwaran, K. (2011). Varieties of conditional probability. In Bandyopadhyay, P. & Forster, M., editors. Philosophy of Statistics. Burlington: Elsevier, pp. 137–148.
Easwaran, K. (2019). Conditional probabilities. In Pettigrew, R. & Weisberg, J., editors. The Open Handbook of Formal Epistemology. PhilPapers, pp. 131–198.
Florens, J.-P., Mouchart, M., & Rolin, J.-M. (1990). Elements of Bayesian Statistics. New York: Marcel Dekker.
Fristedt, B., & Gray, L. (1997). A Modern Approach to Probability Theory. Boston: Birkhäuser.
Gaifman, H. (1988). A theory of higher-order probabilities. In Skyrms, B. & Harper, W., editors. Causation, Chance, and Credence: Proceedings of the Irvine Conference on Probability and Causation. Boston: Kluwer.
Ghosal, S., & van der Vaart, A. (2017). Fundamentals of Nonparametric Bayesian Inference. Cambridge: Cambridge University Press.
Gyenis, Z., Hofer-Szabó, G., & Rédei, M. (2017). Conditioning using conditional expectations: The Borel–Kolmogorov paradox. Synthese, 194, 2595–2630.
Hájek, A. (2003). What conditional probability could not be. Synthese, 137, 273–323.
Hájek, A. (2008). Dutch book arguments. In Anand, P., Pattanaik, P., & Puppe, C., editors. The Handbook of Rationality and Social Choice. Oxford: Oxford University Press, pp. 173–195.
Hájek, A. (2011). Conditional probability. In Bandyopadhyay, P. & Forster, M., editors. Philosophy of Statistics. Burlington: Elsevier.
Howson, C. (2014). Finite additivity, another lottery paradox, and conditionalization. Synthese, 191, 989–1012.
Huttegger, S. (2015). Merging of opinions and probability kinematics. The Review of Symbolic Logic, 8, 611–648.
Huttegger, S. (2017). The Probabilistic Foundations of Rational Learning. Cambridge: Cambridge University Press.
Huttegger, S., & Nielsen, M. (2020). Generalized learning and conditional expectation. Philosophy of Science, 87, 868–883.
Kiefer, N., & Nyarko, Y. (1995). Savage–Bayesian models of economics. In Kirman, A. & Salmon, M., editors. Learning and Rationality in Economics. Oxford: Blackwell, pp. 40–62.
Kolmogorov, A. N. (1933/1956). Foundations of the Theory of Probability (second English edition). N. Morrison, translator. New York: Chelsea.
Lee, J. J. (2018). Formalization of information: Knowledge and belief. Economic Theory, 66, 1007–1022.
Lewis, D. (1999). Why conditionalize? In Papers in Metaphysics and Epistemology. Cambridge: Cambridge University Press, pp. 403–407.
Meehan, A., & Zhang, S. (2020). Jeffrey meets Kolmogorov: A general theory of conditioning. Journal of Philosophical Logic, 49, 941–979.
Meehan, A., & Zhang, S. (2022). Kolmogorov conditionalizers can be Dutch booked (if and only if they are evidentially uncertain). The Review of Symbolic Logic, 15, 722–757.
Mertens, J.-F., & Zamir, S. (1985). Formulation of Bayesian analysis for games of incomplete information. International Journal of Game Theory, 14, 1–29.
Myrvold, W. (2015). You can’t always get what you want: Some considerations regarding conditional probabilities. Erkenntnis, 80, 573–603.
Nielsen, M. (2021). A new argument for Kolmogorov conditionalization. The Review of Symbolic Logic, 14, 930–945.
Ramachandran, D. (1979). Existence of independent complements in regular conditional probability spaces. Annals of Probability, 7, 433–443.
Ramsey, F. P. (1931). Truth and probability. In Braithwaite, R. B., editor. The Foundations of Mathematics and Other Logical Essays. London: Routledge and Kegan, pp. 156–199.
Rao, M. M. (2005). Conditional Measures and Applications (second edition). Boca Raton: CRC Press.
Rescorla, M. (2015). Some epistemological ramifications of the Borel–Kolmogorov paradox. Synthese, 192, 735–767.
Rescorla, M. (2018). A Dutch book theorem and converse Dutch book theorem for Kolmogorov Conditionalization. The Review of Symbolic Logic, 11, 705–735.
Rescorla, M. (2021). On the proper formulation of Conditionalization. Synthese, 198, 1935–1965.
Rescorla, M. (2022). An improved Dutch book theorem for Conditionalization. Erkenntnis, 87, 1013–1041.
Rescorla, M. (2023). Reflecting on diachronic Dutch books. Noûs, 57, 511–538.
Rescorla, M. (2024). Bayesian defeat of certainties. Synthese, 203, 50.
Seidenfeld, T. (2001). Remarks on the theory of conditional probability: Some issues of finite versus countable additivity. In Hendricks, V., Pedersen, S., & Jørgensen, K., editors. Probability Theory: Philosophy, Recent History, and Relations to Science. Dordrecht: Kluwer, pp. 167–178.
Seidenfeld, T., Schervish, M., & Kadane, J. (2001). Improper regular conditional distributions. Annals of Probability, 29, 1612–1624.
Skyrms, B. (1987). Dynamic coherence and probability kinematics. Philosophy of Science, 54, 1–20.
Sokal, A. (1981). Existence of compatible families of proper regular conditional probabilities. Zeitschrift für Wahrscheinlichkeitstheorie und verwandte Gebiete, 56, 537–548.
Teller, P. (1973). Conditionalization and observation. Synthese, 26, 218–258.
Thrun, S., Burgard, W., & Fox, D. (2005). Probabilistic Robotics. Cambridge: MIT Press.
Trotta, R. (2008). Bayes in the sky: Bayesian inference and model selection in cosmology. Contemporary Physics, 49, 71–104.
Figure 1 A visualization of $\Omega \times \Omega$. In general, there need not be any natural linear ordering of $\Omega$. Nevertheless, the visualization is a useful heuristic. The horizontal axis corresponds to $\left(\Omega, \mathcal{G}\right)$. Points on this axis index the agent’s new certainty profile, which determines the bet selected by the bookie. When the agent acquires certainty profile ${\delta}_{\omega }$, the bookie offers bet ${Y}_{\omega }$. The vertical axis corresponds to $(\Omega, \mathcal{F})$. Points on this axis reflect the true outcome. We specify how a gambling scenario unfolds by specifying an ordered pair $(\omega, \nu)$, where $\omega$ is an index for the agent’s new certainty profile and $\nu$ is the true outcome. Each ordered pair $(\omega, \nu)$ determines a well-defined net gain ${Y}_{\omega}(\nu)$. A factive Dutch book inflicts a net loss along the diagonal. A non-factive Dutch book inflicts a net loss on all points.
