1. Introduction
Consider the following discrete optimal stopping problem as first described in [Reference Shepp32] by Shepp. An urn initially contains m balls worth $-\$1$ each and p balls worth $+\$1$ each, where m and p are positive integers known a priori. Balls are randomly sampled (one at a time and without replacement) and their value is added to a running total. Before any draw, the optimal stopper can choose to stop sampling and receive the cumulative sum up to that point. The goal is to find the stopping rule which maximizes the expected payout from a given (m, p)-urn.
The urn scheme described above was originally formulated in relation to the classical optimal stopping problem of maximizing the average value of a sequence of independent and identically distributed random variables (see [Reference Breiman4, Reference Chow and Robbins8, Reference Dvoretzky14], among others). The scheme has also been considered in relation to numerous other problems in the subsequent literature. For example, in [Reference Chen, Grigorescu and Kang6], the authors considered an extension in which the stopper exhibits risk aversion (modeled as the limited ability to endure negative fluctuations in the running total). An extension in which the stopper is able to draw more than one ball at a time was considered in [Reference Dehghanian and Kharoufeh11]. Related to the current note, [Reference Chen, Zame, Lin and Wu7] (and subsequently [Reference Hung25]) considered the urn problem where the composition of balls in the urn is not known with certainty (i.e. where $p+m$ is known but p is not).
The aim of the present note is to introduce a variant of Shepp’s urn problem in which the sampling procedure used is not known with certainty. Specifically, while the result of each draw is observable, we assume that the optimal stopper is uncertain about whether or not the balls are removed from the urn after sampling, i.e. whether sampling is done with or without replacement. Since the probability of sampling a given ball type is different under the two sampling procedures, sequentially observing the random draws will reveal statistical information about the true procedure being used. Hence, we adopt a Bayesian approach and assume the optimal stopper has a prior belief of $\pi$ that the samples are not being replaced; this belief is then updated sequentially (via Bayes' rule) after each random draw. Since the goal is to maximize the expected payout upon stopping, any stopping rule must account for the expected learning that will occur over time.
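To make this sequential updating concrete in the discrete setting, the following minimal sketch (with illustrative function and variable names, not taken from any of the cited references) performs one Bayes update of the belief that sampling is without replacement. The two likelihoods are simply the conditional probabilities of the observed ball type: determined by the remaining composition under the without-replacement hypothesis, and by the initial composition under the with-replacement hypothesis.

```python
def update_belief(belief, draw, m_left, p_left, m0, p0):
    """One Bayes update of the probability that sampling is without replacement.

    belief         : current probability that balls are not being replaced
    draw           : +1 if a p-ball was just observed, -1 if an m-ball was observed
    m_left, p_left : composition just before the draw under the without-replacement hypothesis
    m0, p0         : initial composition (the relevant one under the with-replacement hypothesis)
    """
    if draw == 1:
        lik_without, lik_with = p_left / (m_left + p_left), p0 / (m0 + p0)
    else:
        lik_without, lik_with = m_left / (m_left + p_left), m0 / (m0 + p0)
    num = belief * lik_without
    return num / (num + (1.0 - belief) * lik_with)

# Example: a (3, 3)-urn, prior belief 0.5, observed draws +1, +1, -1.
belief, m_left, p_left = 0.5, 3, 3
for draw in (1, 1, -1):
    belief = update_belief(belief, draw, m_left, p_left, 3, 3)
    if draw == 1:
        p_left -= 1   # only the without-replacement bookkeeping changes
    else:
        m_left -= 1
    print(round(belief, 4))
```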
Shepp demonstrated that the optimal rule for the original problem is of a threshold type. In particular, if $\mathcal{C}$ denotes the set of all urns with a positive expected value (upon stopping optimally), then $\mathcal{C}=\{(m,p) \mid m\leq\beta(p)\}$, where $\beta(p)$ is a sequence of uniquely determined constants depending on p (which must be computed via recursive methods; cf. [Reference Boyce3]). It is thus optimal to draw a ball if there are sufficiently many p balls relative to m balls (or sufficiently few m balls relative to p balls). Intuitively, $\beta(p)>p$, and hence a ball should not be sampled when the current state of the urn satisfies $p-m\leq p-\beta(p)<0$. Put differently, the optimal stopper should stop sampling when the running total exceeds some critical (positive) threshold, dependent on the current state of the urn.
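For concreteness, here is a minimal sketch (the function names E and beta are illustrative and not taken from [Reference Shepp32] or [Reference Boyce3]) of the backward recursion behind this threshold: E(m, p) is the optimal expected payout of an (m, p)-urn, with the convention that the stopper takes 0 rather than a non-positive continuation value, and beta(p) is read off as the largest m for which the urn retains a strictly positive value.

```python
from functools import lru_cache

@lru_cache(maxsize=None)
def E(m: int, p: int) -> float:
    """Optimal expected payout of an (m, p)-urn under Shepp's stopping problem."""
    if p == 0:
        return 0.0        # only -$1 balls remain, so stop immediately
    if m == 0:
        return float(p)   # only +$1 balls remain, so draw them all
    total = m + p
    cont = (p / total) * (1.0 + E(m, p - 1)) + (m / total) * (-1.0 + E(m - 1, p))
    return max(0.0, cont)

def beta(p: int, m_max: int = 1000, tol: float = 1e-12) -> int:
    """Largest m (up to m_max) for which the (m, p)-urn has strictly positive value."""
    m = 0
    while m + 1 <= m_max and E(m + 1, p) > tol:
        m += 1
    return m

print(E(5, 5))                          # value of a balanced (5, 5)-urn
print([beta(p) for p in range(1, 6)])   # numerical approximations of beta(p)
```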
Of particular importance to the current note, Shepp [Reference Shepp32, p. 1001] also connected the urn problem (when the sampling method was known) with the continuous-time problem of optimally stopping a Brownian bridge. Specifically, via an appropriate scaling, the running total (cumulative sum) process was shown to converge to a Brownian bridge which starts at zero (at $t=0$) and pins to some location a (at $t=1$). Importantly, the known constant a depends on the initial values of m and p, with $a = ({p-m})/{\sqrt{m+p}}$. Hence, the sign of the pinning location depends on the relative abundance of m- and p-balls in the urn. The continuous-time problem was shown by Shepp [Reference Shepp32] to admit a closed-form solution, and the optimal stopping strategy was found, once more, to be of threshold type: the first time that the Brownian bridge exceeds the time-dependent boundary $a+\alpha\sqrt{1-t}$, where $\alpha\approx 0.839\,92$.
Given the success and closed-form nature of such continuous-time approximations, we choose not to tackle the discrete version of our problem directly, instead formulating and solving the continuous-time analog. In such a setting, uncertainty about the true sampling procedure manifests itself in uncertainty about the drift of the underlying (cumulative sum) process. In particular, the process is believed to be either a Brownian bridge pinning to a (if sampling is done without replacement) or a Brownian motion with drift a (if sampling is done with replacement), and the optimal stopper must learn about which it is over time. Despite this additional uncertainty, we find that the problem has a closed-form solution when $a=0$ and, remarkably, the optimal strategy is found to coincide with the optimal strategy of the classical problem (where the sampling procedure/drift is known with certainty). The expected payout, however, is lower due to the additional uncertainty present. When $a\neq 0$ , the problem is more complicated and a richer solution structure emerges (with multiple optimal stopping boundaries possible).
This note therefore contributes to the literature on both optimally stopping a Brownian bridge (e.g. [Reference Baurdoux, Chen, Surya and Yamazaki2, Reference De Angelis and Milazzo10, Reference Ekström and Vaicenavicius17, Reference Ekström and Wanntorp19–Reference Föllmer21, Reference Shepp32]) and optimal stopping in the presence of incomplete information (e.g. [Reference Ekström and Lu15, Reference Ekström and Vaicenavicius16, Reference Ekström and Vannestål18, Reference Gapeev22, Reference Glover23, Reference Johnson and Peskir26, Reference Johnson and Peskir27]). We also note that Brownian bridges have found many applications in the field of finance. For example, they have been used to model the so-called stock pinning effect [Reference Avellandeda and Lipkin1], and the dynamics of certain arbitrage opportunities [Reference Brennan and Schwartz5, Reference Liu and Longstaff29]. In both settings, the existence of the underlying economic force (creating the pinning) is more often than not uncertain. Hence, the additional uncertainty considered in this note may find application in more realistic modeling of these market dynamics.
The rest of this note is structured as follows. We start in Section 2 by commenting further on the connection between the discrete urn problem and the continuous-time analog. In Section 3, we formulate the continuous-time problem, making clear our informational assumptions. Upper and lower bounds on the value function are presented in Section 4, along with the explicit solution to the problem in the case where $a=0$ . We conclude in Section 5 with a brief discussion of the case where a is nonzero.
2. Connecting the urn problem to Brownian bridges/motion
Let $\varepsilon_i$, for $i=1,\ldots, m+p$, denote the results of sampling from a given (m, p)-urn, with $\varepsilon_i=-1$ for an m-ball and $\varepsilon_i=1$ for a p-ball. The partial sum after n draws is thus $X_n=\sum_{i=1}^{n}\varepsilon_i$, with $X_0=0$. It is well known that the discrete process $\{X_n\}_{n=0}^{m+p}$ can be approximated as a continuous-time diffusion process if we let m and p tend to infinity in an appropriate way. The resulting diffusion, however, will depend on whether sampling is done with or without replacement. Fixing m and p, we define, for $0\leq n\leq m+p$ and $n<(m+p)t\leq n+1$,

$X_{m,p}(t) \;:\!=\; \frac{1}{\sqrt{m+p}}\sum_{i=1}^{n}\varepsilon_i = \frac{X_n}{\sqrt{m+p}}. \qquad (2.1)$
If sampling is done without replacement, then for $n=m+p$ (after all balls have been sampled) we have

$X_{m,p}(1) = \frac{1}{\sqrt{m+p}}\sum_{i=1}^{m+p}\varepsilon_i = \frac{p-m}{\sqrt{m+p}} = a.$
Hence, the final value (at $t=1$ ) is known with certainty to be the constant a. In this case, it is also clear that the samples $\varepsilon_i$ are not independent and identically distributed (i.i.d.). However, Shepp demonstrated that, if a is fixed, the process $X_{m,p}(t)$ converges in distribution as $p\to\infty$ to a Brownian bridge process pinning to the point a at $t=1$ (see [Reference Shepp32, p. 1001]).
On the other hand, if sampling is done with replacement, then the samples $\varepsilon_i$ are i.i.d., and the process $X_{m,p}(t)$ in (2.1) can be seen to converge in distribution to a Brownian motion (with drift), via Donsker’s theorem. We also note that, with replacement, more than $m+p$ balls can be sampled. Indeed, sampling could continue indefinitely if each sampled ball were replaced. However, after $m+p$ balls have been sampled the true nature of the sampling procedure will be revealed, since either there will be no balls left to sample or a further sample will be produced. In our modified urn problem we therefore make the natural assumption that stopping must occur before more than $m+p$ balls have been sampled.
Remark 2.1. If the optimal stopper were allowed to continue beyond $m+p$ samples, then the stopper would never stop for $p>m$ ( $a>0$ ) in the sampling-with-replacement scenario (since the cumulative sum is expected to increase indefinitely). It would then also be optimal never to stop before $m+p$ balls are sampled (for $\pi<1$ at least) due to the unbounded payoff expected after $m+p$ balls. On the other hand, if $p\leq m$ ( $a\leq 0$ ), the stopper would stop at $m+p$ balls in all scenarios since the cumulative sum process after $m+p$ balls is a supermartingale regardless of whether sampling with or without replacement had been revealed. Thus, the solution to the stopping problems with restricted and unrestricted stopping times would coincide for the case where $a\leq 0$ . To avoid the degeneracy of the problem for $a>0$ , i.e. to guarantee a finite value, we chose to restrict the set of admissible stopping times to $n\leq m+p$ .
To apply Donsker’s theorem, we note that the probability of drawing a given ball type (with replacement) is constant and given by $p/(m+p)$ for a positive ball and $m/(m+p)$ for a negative ball. Therefore, $\mathbb{E}[\varepsilon_i]=(p-m)/(m+p)=a/\sqrt{m+p}$ and $\textrm{Var}(\varepsilon_i)=1-a^2/(m+p)$. This allows us to rewrite (2.1) as

$X_{m,p}(t) = \frac{na}{m+p} + \sqrt{1-\frac{a^{2}}{m+p}}\;\frac{1}{\sqrt{m+p}}\sum_{i=1}^{n}\widehat{\varepsilon}_i, \qquad (2.2)$
where $\widehat{\varepsilon}_i$ are now standardized random variables (with zero mean and unit variance). Since we are restricting our attention to $n\leq m+p$, we can once more fix m and p and let t satisfy $n<(m+p)t\leq n+1$. Letting $p\to\infty$, the process $X_{m,p}(t)$ in (2.2) thus converges in distribution to $X_t=at+B_t$, $0\leq t\leq 1$, where $(B_t)_{0\leq t\leq 1}$ is a standard Brownian motion (cf. [Reference Donsker12]). Note that the drift in this equation coincides with the pinning point of the Brownian bridge in the case without replacement.
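As a quick illustration of the two limits (a simulation sketch only; it takes the scaling in (2.1) to be $X_n/\sqrt{m+p}$, and the seed and urn sizes are arbitrary), the following code generates one scaled path under each sampling procedure: the without-replacement path necessarily terminates at a, whereas the with-replacement path terminates at a random value with mean a and standard deviation approximately one.

```python
import numpy as np

rng = np.random.default_rng(1)

def scaled_path(m, p, with_replacement):
    """Scaled running total X_n / sqrt(m + p), for n = 0, ..., m + p, as in (2.1)."""
    balls = np.array([-1] * m + [1] * p)
    if with_replacement:
        draws = rng.choice(balls, size=m + p, replace=True)
    else:
        draws = rng.permutation(balls)
    return np.concatenate(([0.0], np.cumsum(draws))) / np.sqrt(m + p)

m, p = 5000, 5100
a = (p - m) / np.sqrt(m + p)                               # pinning point / drift, roughly 0.995
print(scaled_path(m, p, with_replacement=False)[-1], a)    # equal by construction
print(scaled_path(m, p, with_replacement=True)[-1])        # random, centred at a
```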
With this necessary connection in place, we now proceed to formulate the continuous-time stopping problem corresponding to our variant of Shepp’s urn scheme.
3. Problem formulation and learning assumptions
Let $X=(X_t)_{t\geq 0}$ denote an observable stochastic process that is believed by an optimal stopper to be either a Brownian motion with known drift a, or a Brownian bridge that pins to a at $t=1$ . Adopting a Bayesian approach, we also assume that the optimal stopper has an initial belief of $\pi$ that the true process is a Brownian bridge (and hence a belief of $1-\pi$ that it is a Brownian motion).
This information structure can be realized on a probability space $(\Omega,\mathcal{F},\mathbb{P}_{\pi})$ where the probability measure $\mathbb{P}_{\pi}$ has the following structure:

$\mathbb{P}_{\pi} = \pi\,\mathbb{P}_1 + (1-\pi)\,\mathbb{P}_0, \qquad (3.1)$
where $\mathbb{P}_0$ is the probability measure under which the process X is the Brownian motion and $\mathbb{P}_1$ is the probability measure under which the process X is the Brownian bridge (cf. [Reference Peskir and Shiryaev31, Chapter VI, Section 21]). More formally, we can introduce an unobservable random variable $\theta$ taking values 0 or 1 with probability $1-\pi$ and $\pi$ under $\mathbb{P}_\pi$, respectively. Thus, the process X solves the stochastic differential equation

$dX_t = \Big(\theta\,\frac{a-X_t}{1-t}+(1-\theta)\,a\Big)\,dt + dB_t, \qquad X_0=0, \qquad (3.2)$
where $B=(B_t)_{t\geq 0}$ is a standard Brownian motion, independent of $\theta$ under $\mathbb{P}_{\pi}$ .
The problem under investigation is to find the optimal stopping strategy that maximizes the expected value of X upon stopping, i.e.

$V \;:\!=\; \sup_{0\leq\tau\leq 1}\mathbb{E}_{\pi}[X_{\tau}]. \qquad (3.3)$
Recall that the time horizon of the optimal stopping problem in (3.3) is set to one, since the uncertainty about the nature of the process is fully revealed at $t=1$ (it either pins to a or it does not).
If the process was known to be a Brownian bridge then it would be evident from (3.3) that $V\geq a$ , since simply waiting until $t=1$ would yield a value of a with certainty. However, uncertainty about $\theta$ introduces additional uncertainty in the terminal payoff, since the value received at $t=1$ could be less than a if the true process was actually a Brownian motion.
Remark 3.1. The problem described above is related to the problem studied in [Reference Ekström and Vaicenavicius17], in which the underlying process is known to be a Brownian bridge, but for which the location of the pinning point is unknown. Specifically, if the process defined in (3.2) were a Brownian motion with drift a, then the distribution of its location at $t=1$ would be normal, i.e. $X_1\sim\mathcal{N}(a,1)$. On the other hand, if the process were a Brownian bridge pinning to a at $t=1$, then the distribution of its location at $t=1$ would be a point mass, i.e. $X_1\sim\delta_a$ (where $\delta_a$ denotes the Dirac delta). Hence, setting the prior on the location of the pinning point in [Reference Ekström and Vaicenavicius17] to $\mu=\pi\delta_a+(1-\pi)\mathcal{N}(a,1)$ is equivalent to the problem formulated in this note.
To account for the uncertainty about $\theta$ in (3.2), we define the posterior probability process $\Pi_t \;:\!=\; \mathbb{P}_{\pi}\big(\theta=1\mid\mathcal{F}_t^X\big)$ for $t\geq 0$, which represents the belief that the process will pin at $t=1$ and captures how this belief is continually updated through observations of the process X. To determine the dynamics of the process $\Pi=(\Pi_t)_{t\geq 0}$, we appeal to well-known results from stochastic filtering theory (see [Reference Liptser and Shiryaev28, Theorem 9.1] or [Reference Johnson and Peskir27, Section 2]), namely that, for $t\geq 0$,
where $\bar{B}=(\bar{B}_t)_{t\geq 0}$ is a $\mathbb{P}_{\pi}$ -Brownian motion called the innovation process, and $\rho$ denotes the signal-to-noise ratio, defined as $\rho(t,X_t) \;:\!=\; ({a-X_t})/({1-t})-a$ . While the payoff in (3.3) is only dependent on X (not $\Pi$ ), the drift of X in (3.4) contains $\Pi$ . Therefore, at first blush, it would appear that the optimal stopping problem is two-dimensional (in X and $\Pi$ ). However, since both X and $\Pi$ are driven by the same Brownian motion ( $\bar{B}$ ), the problem can, in fact, be reduced to only one spatial variable (either X or $\Pi$ ) by identifying a (time-dependent) mapping between $X_t$ and $\Pi_t$ . In what follows, we formulate the problem in terms of the original process X, since this facilitates a more transparent comparison with the case where the process is known to pin with certainty.
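For intuition, the following short calculation (a restatement of standard filtering facts for the model above, rather than of the displayed dynamics) shows where $\rho$ comes from. The drift of X is $({a-X_t})/({1-t})$ under the bridge hypothesis and a under the Brownian-motion hypothesis, so projecting the drift in (3.2) onto the observation filtration replaces $\theta$ by $\Pi_t$:

$\mathbb{E}_{\pi}\Big[\theta\,\frac{a-X_t}{1-t}+(1-\theta)\,a\;\Big|\;\mathcal{F}_t^X\Big]=a+\Pi_t\Big(\frac{a-X_t}{1-t}-a\Big)=a+\Pi_t\,\rho(t,X_t).$

That is, the observable drift of X is a $\Pi_t$-weighted average of the two candidate drifts, which is precisely why the problem initially appears two-dimensional in $(X,\Pi)$.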
To establish the mapping between $X_t$ and $\Pi_t$ we have the following result.
Proposition 3.1. Given the processes $X=(X_t)_{t\geq 0}$ and $\Pi=(\Pi_t)_{t\geq 0}$ defined by (3.4) and (3.5), respectively, the following identity holds for $t\in[0,1)$:

$\Pi_t = \frac{\pi L^a(t,X_t)}{1-\pi+\pi L^a(t,X_t)}, \qquad \text{where } L^a(t,x) \;:\!=\; \frac{1}{\sqrt{1-t}}\exp\Big({-}\frac{(x-at)^2}{2(1-t)}\Big). \qquad (3.6)$
Proof. To establish the mapping we take advantage of the fact that both processes are driven by the same Brownian motion and define the process (cf. [Reference Johnson and Peskir26, Proposition 4])
which, after applying Itô’s formula, is seen to be of bounded variation with dynamics
Thus, $U_t$ can be solved explicitly as
and, after combining (3.7) and (3.8), we obtain the desired result.
To solve the optimal stopping problem in (3.3), we will exploit various changes of measure: in particular, from $\mathbb{P}_{\pi}$ to $\mathbb{P}_0$ (under which the process X is a Brownian motion with drift a) and then from $\mathbb{P}_0$ to $\mathbb{P}_1$ (under which X is a Brownian bridge pinning to a). In order to perform these measure changes, we have the following result that establishes the necessary Radon–Nikodym derivatives (cf. [Reference Johnson and Peskir27, Lemma 1]).
Proposition 3.2. Let $\mathbb{P}_{\pi,\tau}$ be the restriction of the measure $\mathbb{P}_\pi$ to $\mathcal{F}_{\tau}^X$ for $\pi\in[0,1]$ . We thus have the following:
for all stopping times $\tau$ of X, where $L^a$ is given in (3.6). The process in (3.10) is often referred to as the likelihood ratio process.
Proof. A standard rule for Radon–Nikodym derivatives under (3.1) gives
for any $\tau$ and $\pi$ , yielding identity (i). Similar arguments show that
yielding (ii). Using (3.11) and (3.12) together, and noting (3.6), we get (iii).
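As a consistency check (not part of the formal proofs above), the explicit form of $L^a$ can also be obtained by a direct conditioning argument: under $\mathbb{P}_0$ the process X is a Brownian motion with drift a, and conditioning it to take the value a at time 1 yields precisely the $\mathbb{P}_1$-bridge, so that, writing $\varphi_s$ for the $\mathcal{N}(0,s)$ density,

$\frac{d\mathbb{P}_{1,t}}{d\mathbb{P}_{0,t}}=\frac{\mathbb{P}_0\big(X_1\in da\mid\mathcal{F}_t^X\big)}{\mathbb{P}_0\big(X_1\in da\big)}=\frac{\varphi_{1-t}\big(a-X_t-a(1-t)\big)}{\varphi_1(0)}=\frac{1}{\sqrt{1-t}}\exp\Big({-}\frac{(X_t-at)^2}{2(1-t)}\Big)=L^a(t,X_t).$

In particular, $L^a(0,0)=1$, so that the mapping in (3.6) correctly returns $\Pi_0=\pi$.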
Next, we embed (3.3) into a Markovian framework where the process X starts at time t with value x. However, in doing so, we cannot forget that the optimal stopper’s learning about the true nature of the underlying process started at time 0 with an initial belief of $\pi$ and with $X_0=0$ . To incorporate this information, we exploit the mapping in (3.6) to calculate the stopper’s updated belief should the process reach x at time t. In other words, in our Markovian embedding, we must assume that the ‘initial’ belief at time t is not $\pi$ but $\Pi_t$ (which depends on t and x). More formally, the embedded optimal stopping problem becomes
where the processes $X=X^{t,x}$ and $\Pi$ are defined by
Note that the function $L^a$ is defined as in (3.6) and, with a slight abuse of notation, we have defined the function $\Pi(t,x,\pi)$ to be the ‘initial’ value of $\Pi$ in the embedding (dependent on t, x, and $\pi$ ). Note further that, since we are able to replace any dependence on $\Pi_{t+s}$ (for $s>0$ ) via the mapping in (3.6), we no longer need to consider the dynamics for $\Pi$ in what follows (only the initial point $\Pi_t$ ).
Since its value will be used in our subsequent analysis, we conclude this section by reviewing the solution to the classical problem in which the Brownian bridge is known to pin to a (at $t=1$) with certainty (i.e. when $\pi=1$). In this case, the stopping problem in (3.13) has an explicit solution (cf. [Reference Ekström and Wanntorp19, p. 175]) given by

$V^a_1(t,x) = \begin{cases} a+(1-\alpha^2)\sqrt{2\pi(1-t)}\,\exp\!\Big(\frac{(x-a)^2}{2(1-t)}\Big)\,\Phi\!\Big(\frac{x-a}{\sqrt{1-t}}\Big), & x< b(t),\\ x, & x\geq b(t),\end{cases} \qquad (3.16)$
for $t<1$ and $V^a_1(1,a)=a$. The function $\Phi(y)$ denotes the standard normal cumulative distribution function, and $b(t)\;:\!=\;a+\alpha\sqrt{1-t}$ with $\alpha$ being the unique positive solution to

$\alpha = (1-\alpha^2)\,\sqrt{2\pi}\,\mathrm{e}^{\alpha^2/2}\,\Phi(\alpha), \qquad (3.17)$
which is approximately 0.839 924. (Note that $\pi$ in (3.16) and (3.17) denotes the universal constant and not the initial belief.) Further, the optimal stopping strategy in this case is given by

$\tau_b \;:\!=\; \inf\{s\geq 0 \mid X_{t+s}\geq b(t+s)\}. \qquad (3.18)$
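As a quick numerical sanity check (illustrative only; the variable names are not from the references), one can verify the quoted root of (3.17) and the resulting value of optimally stopping a standard Brownian bridge (pinning at $a=0$) started at the origin:

```python
import numpy as np
from scipy.optimize import brentq
from scipy.stats import norm

# Smooth-fit equation (3.17): alpha = (1 - alpha^2) * sqrt(2*pi) * exp(alpha^2/2) * Phi(alpha).
f = lambda al: (1.0 - al**2) * np.sqrt(2.0 * np.pi) * np.exp(al**2 / 2.0) * norm.cdf(al) - al
alpha = brentq(f, 0.5, 0.99)
print(alpha)        # approximately 0.839924

# Value V_1^0(0, 0) of the known-pinning problem started at the origin ((3.16) with a = 0):
print((1.0 - alpha**2) * np.sqrt(2.0 * np.pi) * norm.cdf(0.0))   # approximately 0.3691
```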
4. Bounds on the value function and solution when $\boldsymbol{a}=0$
As may be expected, the solution to (3.13) depends crucially on the value of a. In fact, we find below that the problem is completely solvable in closed form when $a=0$ (corresponding to $m=p$ ). For a nonzero value of a, the problem is more complicated and a richer solution structure emerges. However, we are able to provide the following useful bounds on the value function in (3.13) for an arbitrary a. Moreover, these bounds can be seen to coincide when $a=0$ , yielding the explicit solution in this case.
Proposition 4.1. (Upper bound.) The value function defined in (3.13) satisfies

$V(t,x,\pi)\;\leq\;(1-\Pi(t,x,\pi))\,(x+\max\!(a,0))+\Pi(t,x,\pi)\,V^a_1(t,x), \qquad (4.1)$
where $V_1^a$ is as given in (3.16), and the function $\Pi$ is the updated belief conditional on the process reaching x at time t, defined in (3.15).
Proof. To establish the upper bound, we consider a situation in which the true nature of the process (i.e. $\theta$ ) was revealed to the optimal stopper immediately after starting, i.e. at time $t+$ . In this situation, the optimal stopper would subsequently be able to employ the optimal stopping strategy for the problem given full knowledge of the nature of the underlying process. Specifically, if the process was revealed as a Brownian bridge, then using $\tau_b$ , as defined in (3.18), would be optimal, generating an expected value (at $t=t+$ ) of $V_1^a(t,x)$ . On the other hand, if the process was revealed as a Brownian motion with drift a, then the optimal strategy would be different. In the case where $a<0$ , it would be optimal to stop immediately and receive the value x, and in the case where $a>0$ it would be optimal to wait until $t=1$ and receive the expected value $\mathbb{E}_0[X_1]=x+a$ . When $a=0$ , however, any stopping rule would yield an expected value of x, due to the martingality of the process X in this case.
Consider now the value function at $t=t-$. Since the true nature of the process will be imminently revealed, the expected payout is given by $(1-\Pi_t)(x+\max\!(a,0))+\Pi_t V_1^a(t,x)$, upon noting that $\Pi_t=\Pi(t,x,\pi)$ represents the current belief about the true value of $\theta$. Finally, recognizing that the set of stopping times in (3.13) is a subset of the stopping times used in the situation described above (where $\theta$ is revealed at $t+$), the stated inequality is clear.
Proposition 4.2. (Lower bound.) The value function defined in (3.13) satisfies

$V(t,x,\pi)\;\geq\;(1-\Pi(t,x,\pi))\,\mathbb{E}_0\big[X_{(t+\tau_b)\wedge 1}\big]+\Pi(t,x,\pi)\,V^a_1(t,x), \qquad (4.2)$
where $V_1^a$ is as given in (3.16), and $\tau_b$ denotes the optimal strategy for the known pinning case described in (3.18). Moreover, the function $\Pi$ is the updated belief conditional on the process reaching x at time t, defined in (3.15).
Proof. The desired bound can be established by employing the optimal strategy for the known pinning case, defined in (3.18), in the stopping problem in (3.13), for $\pi<1$ . In detail, letting $X=X^{t,x}$ for ease of notation, we have
where we have applied the measure change from $\mathbb{P}_{\pi}$ to $\mathbb{P}_{0}$ , via (3.9), in the second equality, and the measure change from $\mathbb{P}_0$ to $\mathbb{P}_1$ , via (3.10), in the last equality. Furthermore, employing the stopping rule $\tau_b$ from (3.18) (which may or may not be optimal) yields
upon noting the definition of $V_1^a$ , and where we have ensured that stopping under $\mathbb{P}_0$ happens at or before $t=1$ (since the boundary b is not guaranteed to be hit by a Brownian motion with drift, unlike the Brownian bridge).
Computation of $\mathbb{E}_0[X_{(t+\tau_b)\wedge 1}]$ is difficult in general, since it involves the expected level at which a Brownian motion with drift first hits a square-root boundary. Alternatively, we have $\mathbb{E}_0[X_{(t+\tau_b)\wedge 1}]=x+a\mathbb{E}_0[\tau_b\wedge(1-t)]+\mathbb{E}_0[B_{(t+\tau_b)\wedge 1}]=x+a\mathbb{E}_0[\tau_b\wedge(1-t)]$, with the first-passage time $\tau_b=\inf\{s\geq 0\mid B_s\geq c(s)\}$, where $c(s)\;:\!=\;a(1-s)-x+\alpha\sqrt{1-t-s}$. Hence, the computation reduces to the problem of finding the mean first-passage time of a driftless Brownian motion (started at zero) to a time-dependent boundary (which is a mixture of a linear and a square-root function). While no explicit expression for $\mathbb{E}_0[\tau_b\wedge(1-t)]$ exists, there are numerous numerical approximations available; see, for example, [Reference Durbin and Williams13], or more recently [Reference Herrmann and Tanré24]. When $a=0$, it is clear that $\mathbb{E}_0[X_{(t+\tau_b)\wedge 1}]=x$, a result which we will exploit below.
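For completeness, a crude Monte Carlo sketch of $\mathbb{E}_0[\tau_b\wedge(1-t)]$ is given below (an Euler-type discretization of the first passage of a driftless Brownian motion to the boundary c(s) defined above; the function name, sample sizes, and parameter values are illustrative only):

```python
import numpy as np

def mean_capped_passage_time(a, alpha, t, x, n_paths=20_000, n_steps=2_000, seed=0):
    """Monte Carlo estimate of E_0[tau_b ^ (1 - t)], where
    tau_b = inf{s >= 0 : B_s >= c(s)} and c(s) = a(1 - s) - x + alpha*sqrt(1 - t - s)."""
    rng = np.random.default_rng(seed)
    horizon = 1.0 - t
    dt = horizon / n_steps
    times = np.full(n_paths, horizon)            # default: boundary not hit before 1 - t
    B = np.zeros(n_paths)
    alive = np.ones(n_paths, dtype=bool)
    for k in range(n_steps):
        s = k * dt
        c = a * (1.0 - s) - x + alpha * np.sqrt(horizon - s)
        hit = alive & (B >= c)
        times[hit] = s
        alive &= ~hit
        B[alive] += np.sqrt(dt) * rng.standard_normal(alive.sum())
    return times.mean()

# Example with a = 0.5, started at (t, x) = (0, 0); for a = 0 the quantity is not needed,
# since then E_0[X_{(t + tau_b) ^ 1}] = x regardless.
print(mean_capped_passage_time(a=0.5, alpha=0.839924, t=0.0, x=0.0))
```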
Given Propositions 4.1 and 4.2, the following result is evident, and constitutes the main result of this note.
Theorem 4.1. When $a=0$, the value function in (3.13) is given by

$V(t,x,\pi)=(1-\Pi(t,x,\pi))\,x+\Pi(t,x,\pi)\,V^0_1(t,x), \qquad (4.3)$
where $\Pi$ is defined in (3.15) and $V^0_1$ is defined in (3.16) (upon setting $a=0$ ). Further, the optimal stopping strategy in (3.13) is given by $\tau^*=\tau_b\wedge(1-t)$ . This stopping strategy is the same for all $\pi\in[0,1]$ .
Proof. The result is evident given the fact that the upper bound defined in (4.1) and the lower bound defined in (4.2) coincide when $a=0$ . Specifically, we observe that $\mathbb{E}_0[X_{(t+\tau_b)\wedge 1}]=x$ in (4.2) since X is a $\mathbb{P}_0$ -martingale when $a=0$ . Moreover, since the process is not guaranteed to pin at $t=1$ , we specify explicitly that the stopper must stop at $t=1$ should the boundary b not be hit.
Note that the optimality of the solution presented in Theorem 4.1 does not need to be verified separately, since it follows directly from the proven identity in (4.3) and the existing verification arguments establishing the optimality of $V_1^0$ (provided in [Reference Ekström and Wanntorp19], for example).
The equality found in (4.3) also demonstrates that (when $a=0$) there is no loss in value due to the optimal stopper using a sub-optimal stopping strategy for the ‘true’ drift. The optimal stopping strategy for a Brownian bridge also achieves the maximum possible value for the Brownian motion (due to martingality), and hence using $\tau^*$ will achieve the maximum possible value regardless of the true nature of the underlying process. For $a\neq 0$, it would not be possible to achieve the maximum value in both drift scenarios simultaneously through a single optimal stopping rule, and hence there would be a loss in value, as indicated by the inequality in Proposition 4.1.
Remark 4.1. It is also worth noting that the arguments in the proof of Theorem 4.1 would carry over to a more general setting in which the process is believed to be either a martingale M or a diffusion X (with an initial probability $\pi$ of being X). In this case, similar arguments to Proposition 4.1 will show that $V(t,x,\pi)\leq(1-\Pi_t)x+\Pi_tV_1(t,x)$, where $V_1$ denotes the solution to the associated stopping problem for the diffusion X. Under $\mathbb{P}_0$, all stopping rules generate the expected value of x, due to M being a $\mathbb{P}_0$-martingale. Moreover, similar arguments to Proposition 4.2 will show that $V(t,x,\pi)\geq(1-\Pi_t)x+\Pi_t V_1(t,x)$, upon using the optimal strategy for the optimal stopping problem under $\mathbb{P}_1$, and noting again that $\mathbb{E}_0[X_{t+\tau}]=x$, for any stopping rule. Finally, we must note that the function $\Pi_t$ would need to be found on a case-by-case basis via a mapping similar to (3.6). In general, however, this mapping could also include path-dependent functionals of the process over [0, t], in addition to the values of t and x (cf. [Reference Johnson and Peskir27, Proposition 4]).
Next, Theorem 4.1 also implies the following result.
Corollary 4.1. When $a=0$ , we have $x\leq V(t,x,\pi)\leq V_1^0(t,x)$ and $\pi\mapsto V(t,x,\pi)$ is increasing, with $V(t,x,0)=x$ and $V(t,x,1)=V_1^0(t,x)$ .
Proof. From (4.3) we have that $V-V_1^0=(1-\Pi)(x-V_1^0)\leq 0$ , where the inequality is due to the fact that $\Pi\leq 1$ and $V_1^0\geq x$ , from (3.16). Direct differentiation of (4.3), upon noting (3.15), also shows that
proving the second claim.
Corollary 4.1 reveals that, while the optimal stopping strategy is the same with pinning certainty or uncertainty when $a=0$, the value function with uncertainty is lower than if the pinning were certain/known. In other words, when sampling from a balanced urn with uncertainty about replacement, the optimal stopping strategy is the same as when sampling is known to be without replacement, but the expected payout is lower. To illustrate this, Figure 1 plots the value function V in (4.3) in comparison with $V_1^0$ as defined in (3.16). We confirm that a larger $\pi$ (hence a stronger belief that the process is indeed a Brownian bridge) corresponds to a larger value of V.
Figure 1 also highlights the fact that the value function in (3.13) can be negative, since pinning to zero is not guaranteed (and hence stopping at $t=1$ does not guarantee a minimum payoff of zero). For example, if $\pi=0.5$ (i.e. sampling with and without replacement were initially thought to be equally likely), then the value function in (4.3) would be negative (at $t=0$) for all $x<-0.286$. This does not mean, however, that it would be optimal to stop once the running payoff drops below this value, since an immediate negative payoff would be received, compared to the zero expected payoff from continuing and stopping according to $\tau^*$.
5. The case where a is nonzero
If the urn is not balanced, meaning that $m\neq p$ , then a nonzero drift and a nonzero pinning point are introduced into the process X. This asymmetry complicates the problem considerably and, while the bounds in (4.1) and (4.2) are still valid, a closed-form solution to (3.13) is no longer available. Attempting to provide a detailed analytical investigation of this case is beyond the scope of this note. However, numerical investigation of the variational inequality associated with (3.13) suggests that a rich solution structure emerges, particularly in the case where $a>0$ , when multiple stopping boundaries can arise. We therefore conclude this note by exposing some of this structure to pique the reader’s interest.
Remark 5.1. It should be noted that if the drift of the Brownian motion was zero, but the Brownian bridge had a nonzero pinning level, then the results of Theorem 4.1 would still hold (due to the martingality of X under $\mathbb{P}_0$ ). However, this situation does not correspond to the urn problem described in Section 2, in which both the drift and the pinning point must be the same.
To shed some light on the optimal stopping strategy for a nonzero a, it is useful to reformulate the problem in (3.13) under the measure $\mathbb{P}_0$ as follows:
where we have used (3.9) in the second equality (to change measure) and the mapping from (3.6) in the third equality (to eliminate $\Pi_{t+\tau}$ ). We have also defined the auxiliary optimal stopping problem
and the payoff function
where $L^a$ is given in (3.6), which importantly is dependent on the parameter a.
Next, defining the infinitesimal generator associated with X as
then Itô’s formula and an application of the optional sampling theorem for any given $\tau$ yield
where
Hence, from (5.2) it is clear that it would never be optimal to stop at a point (t, x) for which $H(t,x)>0$ . For $a=0$ , this region corresponds to $x<0$ . However, the shape of this region is qualitatively different for nonzero a. To illustrate this, Figure 2 plots the behavior of H for both positive and negative values of a.
Considering the case where $a<0$, Figure 2 reveals that H is strictly negative for all x before some critical time (calculated to be 0.536 for the $a=-1$ example). Furthermore, when the function does become positive, it only does so in a rather narrow interval (below a). This suggests that the incentive to stop is rather strong when $a<0$, as one might expect. However, little more can be gleaned from the function H in this case. For $a>0$, however, the function H is more informative about the optimal stopping strategy. Here, we find that H is strictly positive for all x before some critical time (again found to be 0.536 for $a=1$). This indicates that when $a>0$, it would never be optimal to stop before this critical time. Moreover, since $\lim_{x\to\infty}H(t,x)=a$, we also observe that any stopping region must be contained in a finite interval (above a). This suggests that the continuation region is disconnected, with two separate optimal stopping boundaries. Indeed, these predictions are confirmed numerically below. This richer structure is also consistent with the results in [Reference Ekström and Vaicenavicius17], whose authors found similar disjoint continuation regions in a situation where the location of the pinning point of a Brownian bridge was uncertain.
To investigate the solution to (5.1), and hence (3.13), numerically, we employ finite difference techniques applied to an associated variational inequality. The connection between optimal stopping problems and variational inequalities has long been established (see, for example, [Reference Øksendal30, Section 10.4]). Specifically, it can be seen that a candidate solution to (5.1) can be obtained by solving the following variational inequality, expressed as a linear complementarity problem (see [Reference Zhu, Wu, Chern and Sun33, Section 2.5.5] for a general formulation):
We have stated the problem as a linear complementarity problem (as opposed to a free-boundary problem with smooth-pasting applied at the unknown boundaries), since the structure of the continuation and stopping regions in (t, x) is not known a priori for (5.1). As can be seen from (5.3), the locations of the optimal stopping boundaries do not appear explicitly in the problem formulation, instead being determined implicitly by where the constraint $\widetilde{V}^{\pi,a}\geq G^{\pi,a}(t,x)$ binds. Once problem (5.3) has been solved (or numerically approximated), the optimal stopping boundaries can simply be read off from the solution by identifying where the function $\widetilde{V}^{\pi,a}-G^{\pi,a}$ switches from being positive to zero. The implicit treatment of the optimal stopping boundaries in (5.3) means that any complex structure of the continuation and stopping regions will be revealed as part of the solution. Indeed, this is exactly what we see below when $a>0$, where multiple stopping boundaries are identified.
To approximate the solution to (5.3) numerically, we discretize the problem using the Crank–Nicolson differencing scheme and then solve the resulting system of equations using the projected successive over-relaxation (PSOR) algorithm. A more detailed description of the PSOR method, along with proofs of convergence, can be found in [Reference Cryer9]. More details about the discretization and implementation of the algorithm are also available from the author upon request. Figure 3 shows the optimal stopping boundaries obtained from our numerical procedure for various values of a (both negative and positive).
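For readers unfamiliar with the method, the following is a generic sketch of a projected SOR sweep for a discretized linear complementarity problem of this type (illustrative only and not the author's implementation; A stands for the Crank–Nicolson coefficient matrix at a given time level, rhs for the corresponding right-hand side, and g for the discretized obstacle, which here would be the payoff $G^{\pi,a}$):

```python
import numpy as np

def psor_solve(A, rhs, g, v0, omega=1.5, tol=1e-8, max_iter=10_000):
    """Projected SOR for the linear complementarity problem
       A v >= rhs,  v >= g,  (v - g)^T (A v - rhs) = 0,
    as arises at each time level of a Crank-Nicolson discretization."""
    v = np.maximum(np.asarray(v0, dtype=float).copy(), g)
    n = len(v)
    for _ in range(max_iter):
        v_old = v.copy()
        for i in range(n):
            # Gauss-Seidel value for node i, then relax and project onto {v_i >= g_i}.
            gs = (rhs[i] - A[i, :].dot(v) + A[i, i] * v[i]) / A[i, i]
            v[i] = max(g[i], (1.0 - omega) * v[i] + omega * gs)
        if np.max(np.abs(v - v_old)) < tol:
            break
    return v
```

Stepping backwards in time from the terminal condition and applying such a sweep at each time level would produce the approximation of $\widetilde{V}^{\pi,a}$ from which the stopping boundaries in Figure 3 can be read off.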
Let us first discuss the case where $a>0$. As predicted, Figure 3 reveals that it would never be optimal to stop before some critical time (for large enough a or small enough $\pi$ at least). Recalling that the optimal strategy for a Brownian motion with positive drift is to wait until $t=1$, it would appear that waiting to learn more about the true nature of the process is optimal (at least initially). In addition, beyond some critical time, we observe two disjoint continuation regions, indicating that, depending on the sample path experienced, it can be optimal to stop either after an increase in X (i.e. after a p-ball has been drawn) or after a decrease in X (i.e. after an m-ball has been drawn). Based on the terminology introduced in [Reference Ekström and Vaicenavicius17], we can interpret the former boundary as a too-good-to-persist boundary and the latter as a stop-loss boundary. The emergence of an endogenous stop-loss boundary in the optimal stopping strategy is a unique feature of the problem with uncertain pinning. Finally, we also observe that both stopping boundaries lie above the corresponding boundary when pinning is certain (given by $a+\alpha\sqrt{1-t}$), indicating that, when $a>0$, stopping will happen later in the presence of pinning uncertainty.
For the case where $a<0$ , we have the following remarks. Firstly, numerical investigations suggest that it is never optimal to stop when $x<0$ , despite the negative drift. Secondly, the optimal stopping strategy appears to be of the form $\tau=\inf\{s\geq 0\mid X_{t+s}\geq \widehat{b}(t+s)\}\wedge(1-t)$ for some time-dependent boundary $\widehat{b}$ . Further, $\widehat{b}(t)$ appears to converge to zero at $t=1$ , although it does not do so monotonically for all parameters. Moreover, the boundary itself is not monotone in the parameter a, i.e. $a\mapsto\widehat{b}(t)$ is not monotone. This behavior is most likely due to the differing effects of a on the linear drift of the Brownian motion and the pinning behavior of the Brownian bridge.
Due to the existence of multiple stopping boundaries, and their observed non-monotonic behavior, further analytical investigation of the problem for $a\neq 0$ would be challenging and is left as a subject for future research.
Acknowledgements
I wish to thank Erik Ekström for many fruitful conversations about this work, and two anonymous referees for their detailed comments, all of which helped improve both the results and the presentation of the paper.
Funding information
I am grateful for funding received from the UTS Business School Research Grant Scheme.
Competing interests
There were no competing interests to declare which arose during the preparation or publication process of this article.