
Optimal proportional reinsurance to maximize an insurer’s exponential utility under unobservable drift

Published online by Cambridge University Press:  21 February 2023

Xiaoqing Liang*
Affiliation:
Hebei University of Technology
Virginia R. Young**
Affiliation:
University of Michigan
*Postal address: Department of Statistics, School of Sciences, Hebei University of Technology, Tianjin 300401, P. R. China. Email: [email protected]
**Postal address: Department of Mathematics, University of Michigan, Ann Arbor, Michigan, 48109. Email: [email protected]

Abstract

We study an optimal reinsurance problem for a diffusion model, in which the drift of the claim follows an Ornstein–Uhlenbeck process. The aim of the insurer is to maximize the expected exponential utility of its terminal wealth. We consider two cases: full information and partial information. Full information occurs when the insurer directly observes the drift; partial information occurs when the insurer observes only its claims. By applying stochastic control and by solving the corresponding Hamilton–Jacobi–Bellman equations, we find the value function and the optimal reinsurance strategy under both full and partial information. We also relate the value function and reinsurance strategy under full information to the value function and reinsurance strategy under partial information.

Type
Original Article
Copyright
© The Author(s), 2023. Published by Cambridge University Press on behalf of Applied Probability Trust

1. Introduction

Research on applying stochastic control theory to analyze insurance problems has long attracted a great deal of interest among actuaries because optimal control provides both theoretical and practical solutions to optimization problems in insurance. There are many research papers studying optimal insurance problems involving reinsurance and investment, which help the insurer to increase profits and to reduce the claim risk. For example, [Reference Browne8] considered two optimal investment problems for an insurer under a diffusion risk model, namely, maximizing expected exponential utility of terminal wealth and minimizing the probability of ruin. The probability of ruin for a diffusion risk model was minimized via reinsurance and investment in [Reference Promislow and Young26]; [Reference Schmidli28] studied a similar problem for a compound Poisson risk model; [Reference Irgens and Paulsen17] incorporated proportional and excess-of-loss reinsurance into a jump-diffusion model with investment to maximize the expected utility of terminal wealth; [Reference Liang, Yuen and Guo22] explored an optimal proportional reinsurance and investment model in a stock market driven by an Ornstein–Uhlenbeck process; and [Reference Li, Li and Young20] studied an optimal investment and reinsurance problem under the mean-variance criterion. See also [Reference Bi, Liang and Yuen3, Reference Hipp and Taksar15, Reference Sun, Zhang and Yuen29, Reference Zhang and Siu32], to name just a few.

In most of the above-referenced research, the insurer has complete knowledge of the model and of the values of the processes in that model, i.e. the problems are considered under full information. In reality, however, an insurance company typically has only partial information: it knows the model but not the values of all the processes in that model. Portfolio optimization problems with unobservable information have been an active topic in mathematical finance; however, results related to insurance models are relatively few. Research on portfolio optimization generally assumes that the drift of the traded stock is unobservable; see, for example, [Reference Brendle6, Reference Brendle7]. Also, [Reference Lakner18, Reference Lakner19] investigated a similar problem by using the martingale-duality approach; [Reference Bäuerle and Rieder2] assumed that the stock price follows a geometric Brownian motion with Poisson jumps, in which the jump intensity is unobservable; and [Reference Liang and Bayraktar21] considered an optimal reinsurance and investment problem by maximizing the expected exponential utility of the insurer’s terminal wealth in a Black–Scholes financial market, with a compound Poisson claim process in which the claim intensity and the jump-size distribution depend on the state of a non-observable Markov chain. See also [Reference Bäuerle and Leimcke1, Reference Björk, Davis and Landén4, Reference Brachetta and Ceci5, Reference Gennotte14, Reference Honda16, Reference Peng and Hu25, Reference Xiong, Xu and Zheng30].

In this paper we find the optimal reinsurance strategy for an insurer to maximize the expected exponential utility of its terminal wealth. We use a diffusion model to describe the dynamics of the claim, in which the drift of the claims follows a mean-reverting Ornstein–Uhlenbeck process. We consider two cases: full information and partial information. Full information occurs when the insurer directly observes the drift; partial information occurs when the insurer observes only its claims. We use the filtering technique to transform the unobservable problem into an observable one; then, by applying the dynamic programming approach, we derive explicit expressions for the value function and the corresponding optimal reinsurance strategy. We also relate the value function and reinsurance strategy under full information to the value function and reinsurance strategy under partial information. Finally, we present numerical examples to illustrate possible outcomes of the model.

The rest of this paper is organized as follows. In Section 2, we describe the insurance model in both full and partial information frameworks. In Section 3, we use a dynamic Hamilton–Jacobi–Bellman (HJB) equation approach to find the value function and optimal reinsurance strategy under complete information. In Section 4, we consider the problem with unobservable information. Moreover, we discuss how to handle a constraint on the proportion reinsured, and we provide numerical simulation results to compute the probability that the reinsurance proportion lies outside of the interval [0, 1] for some values of the parameters. Finally, in Section 5 we compare the value function and the optimal reinsurance strategy under partial observation with the ones under full observation. We also provide numerical examples to show the difference between the two optimal reinsurance proportions at the end of this section.

2. Problem formulation

In this section we describe the claim process of the insurer, and we formulate the problem of maximizing the insurer’s expected exponential utility of terminal wealth. Let $(\Omega, \mathcal F, \mathbb P)$ be a probability space that supports two correlated standard Brownian motions $W_1$ and $W_2$ , with constant coefficient of correlation $\rho \in [{-}1, 1]$ . As in [Reference Promislow and Young26], assume that the claim process $S = \{S_t\}_{0 \le t \le T}$ follows Brownian motion with drift, in which $S_t$ equals the cumulative claims paid by the insurer during the time interval [0, t]. However, unlike [Reference Promislow and Young26], the drift follows a random process. Specifically, $\text{d} S_t = \mu_t \text{d} t - \sigma \text{d} W_{1,t}$ , in which $S_0 = 0$ and $\sigma$ is a positive constant, and

(2.1) \begin{equation}\text{d} \mu_t = -a(\mu_t - \bar{\mu}) \text{d} t + b \, \text{d} W_{2,t},\end{equation}

in which a, b, and $\bar{\mu}$ are positive constants, i.e. $\{\mu_t\}_{0 \le t \le T}$ follows a mean-reverting Ornstein–Uhlenbeck process. We loosely interpret $\bar{\mu}$ as a long-run value of $\mu_t$ , and a measures the ‘speed’ at which $\mu_t$ moves towards $\bar{\mu}$ . This model was explored in [Reference Dassios and Jang12] as the diffusion approximation of a Cox process with a shot-noise process as the claim intensity.
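As a rough illustration of the claim model and the drift dynamics (2.1), the following Python sketch simulates one path of $(\mu_t, S_t)$ with an Euler–Maruyama scheme. The function name `simulate_claims`, the step count, and the construction of $W_2$ from $W_1$ and an independent Gaussian are illustrative assumptions, not something the paper specifies.

```python
import math
import random

def simulate_claims(mu0, mu_bar, a, b, sigma, rho, T, n_steps, rng):
    """Euler-Maruyama sketch of dS = mu dt - sigma dW1 and the OU drift (2.1)."""
    dt = T / n_steps
    sqrt_dt = math.sqrt(dt)
    mu, S = [mu0], [0.0]  # S_0 = 0 as in the text
    for _ in range(n_steps):
        z1 = rng.gauss(0.0, 1.0)
        z2 = rng.gauss(0.0, 1.0)
        dW1 = sqrt_dt * z1
        # Assumed construction: W2 has correlation rho with W1.
        dW2 = sqrt_dt * (rho * z1 + math.sqrt(1.0 - rho * rho) * z2)
        S.append(S[-1] + mu[-1] * dt - sigma * dW1)
        mu.append(mu[-1] - a * (mu[-1] - mu_bar) * dt + b * dW2)
    return mu, S

# Example call with the parameter values used later in the numerical examples.
mu_path, S_path = simulate_claims(0.0, 2.0, 1.0, 2.5, 3.0, 0.4, 1.0, 1000,
                                  random.Random(0))
```

Setting `b = 0` turns the drift into a deterministic relaxation towards $\bar{\mu}$ at rate a, which gives a simple sanity check of the scheme.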

The insurer collects premium at the constant rate $c = (1 + \theta)\bar{\mu}$ , in which $\theta \ge 0$ is the constant proportional risk loading. The insurer is able to purchase proportional reinsurance with constant proportional risk loading $\eta \ge \theta$ . Let $q_t$ denote the proportional amount of business retained at time $t \in [0, T]$ ; thus, the controlled surplus $X = \{X_t\}_{0 \le t \le T}$ follows the stochastic differential equation (SDE)

(2.2) \begin{align} \text{d} X_t &= (c \text{d} t - \text{d} S_t ) - (1 - q_t) (1 + \eta) \bar{\mu} \, \text{d} t + (1 - q_t) \text{d} S_t \notag \\ &= ( q_t ( (1 + \eta) \bar{\mu} - \mu_t ) - (\eta - \theta) \bar{\mu} ) \text{d} t + q_t \sigma \text{d} W_{1,t},\end{align}

with $X_0 = x$ . If $0 \le q_t \le 1$ , then $q_t$ is the usual proportional reinsurance. In Examples 4.1 and 4.2 we calculate the probability that the optimal $q_t$ lies outside the interval [0, 1].
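The algebra behind the two lines of (2.2) can be checked numerically. The sketch below computes the drift of $\text{d} X_t$ once from the first line (premium income, reinsurance premium, retained claims) and once from the simplified second line; the function names are hypothetical and serve only this illustration.

```python
def surplus_drift_long(q, mu, mu_bar, theta, eta):
    """Drift of dX_t read off the first line of (2.2)."""
    c = (1.0 + theta) * mu_bar                       # premium income rate
    reinsurance_premium = (1.0 - q) * (1.0 + eta) * mu_bar
    retained_claim_drift = q * mu                     # drift of dS minus the ceded part
    return c - reinsurance_premium - retained_claim_drift

def surplus_drift_short(q, mu, mu_bar, theta, eta):
    """Drift of dX_t as written in the second line of (2.2)."""
    return q * ((1.0 + eta) * mu_bar - mu) - (eta - theta) * mu_bar

def surplus_diffusion(q, sigma):
    """Diffusion coefficient of dX_t with respect to W_1."""
    return q * sigma
```

With $q = 1$ (no reinsurance) both drifts reduce to $c - \mu_t$, and with $q = 0$ (full reinsurance) both reduce to $-(\eta - \theta)\bar{\mu}$.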

The insurer chooses a retention strategy $q = \{ q_t \}_{0 \le t \le T}$ based on the available information, and we consider two cases in this paper:

  • Full information. In this case, the insurer observes both $\{S_t\}$ and $\{ \mu_t \}$ . A retention strategy q is admissible in this case if (i) q is adapted to the filtration $\mathbb F = \{\mathcal F_t\}_{0 \le t \le T}$ , in which $\mathcal F_t = \sigma (S_s, \mu_s\colon 0 \le s \le t)$ for all $t \in [0, T]$ ; (ii) q is conditionally ${L}^2$ -integrable, i.e. $\mathbb E \big[\int^T_t q^2_u \, \text{d} u \mid \mathcal{F}_t \big] < \infty$ for any $0 \le t \le T$ ; and (iii) the SDE (2.2) has a pathwise unique solution $\{X^q_t\}_{t\in [0,T]}$ . Let $\mathcal A^{f}$ denote the set of admissible strategies in the full information case.

  • Partial information. In this case, the insurer observes only $\{S_t\}$ and does not know the drift of its claim process, although the insurer knows the conditional expectation and variance of $\mu_0$ . A retention strategy q is admissible in this case if (i) q is adapted to the filtration $\mathbb G = \{ \mathcal G_t \}_{0 \le t \le T}$ , in which $\mathcal G_t = \sigma (S_s\colon 0 \le s \le t)$ for all $t \in [0, T]$ ; (ii) q is conditionally ${L}^2$ -integrable, i.e. $\mathbb E \big[\int^T_t q^2_u \, \text{d} u \mid \mathcal{G}_t \big] < \infty$ for any $0 \le t \le T$ ; and (iii) the SDE of the controlled surplus under $\mathbb G$ has a pathwise unique solution. Let $\mathcal A^{p}$ denote the set of admissible strategies in the partial information case. Note that $\mathcal G_t \subset \mathcal F_t$ for all $t \in [0, T]$ , i.e. $\mathbb G$ is a subfiltration of $\mathbb F$ ; we also assume that $\mathbb F$ and $\mathbb G$ are augmented to satisfy the usual conditions of completeness and right continuity.

In both cases, the insurer chooses q to maximize the expectation of exponential utility of wealth at time T. Let ${V^f}$ denote the maximum expected exponential utility of terminal wealth under full information, i.e.

(2.3) \begin{equation}{V^f}(t, x, \mu) = \sup_{q \in \mathcal A^f} \mathbb E\big({-} {\text{e}}^{-\gamma X_T} \mid X_t = x,\, \mu_t = \mu \big),\end{equation}

in which $\gamma > 0$ is the (constant) coefficient of absolute risk aversion.

For the partial information case, as in [Reference Björk, Davis and Landén4, Reference Brendle6, Reference Brendle7], we first project the drift process $\mu_t$ onto the observable filtration $\mathbb G$ , in order to reduce the partially observable problem to an equivalent problem with full information. Define $m_t = \mathbb E (\mu_t \mid \mathcal G_t)$ , $0 \le t \le T$ ; then, [Reference Liptser and Shiryaev23, Theorem 10.3] shows us that $\{m_t\}_{0 \le t \le T}$ follows the SDE

(2.4) \begin{equation} \text{d} m_t = - a(m_t - \bar{\mu}) \text{d} t + \bigg( \rho b - \dfrac{{\text{Var}}(\mu_t \mid \mathcal G_t)}{\sigma} \bigg) \text{d} \bar{W}_{1,t},\end{equation}

in which $\bar W_1 = \{\bar W_{1,t}\}_{0 \le t \le T}$ is the so-called innovations process given by

(2.5) \begin{equation}\bar W_{1,t} = W_{1,t} + \dfrac{1}{\sigma} \int_0^t (m_s - \mu_s) \, \text{d} s,\end{equation}

and $\bar W_1$ is a $(\mathbb P, \mathbb G)$ -standard Brownian motion. In other words, $\{m_t\}_{0 \le t \le T}$ follows a $\mathbb G$ -Ornstein–Uhlenbeck process with non-constant volatility.

If we define $v(t) = {\text{Var}}(\mu_t \mid \mathcal G_t)$ for $t \in [0, T]$ , then $v = v(t)$ satisfies the Riccati equation

(2.6) \begin{equation} \dfrac{\text{d} v(t)}{\text{d} t} = b^2 - 2 a v(t) - \bigg( \dfrac{\sigma \rho b - v(t)}{\sigma} \bigg)^2,\end{equation}

with initial value $v(0) = {\text{Var}}(\mu_0 \mid \mathcal G_0)$ . See Appendix A for a derivation of (2.6) and the following solution:

\begin{equation*} v(t) = \sigma^2 \Delta \cdot \dfrac{R \text{e}^{2 \Delta \cdot t} - 1}{R \text{e}^{2 \Delta \cdot t} + 1} - \sigma (\sigma a - \rho b),\end{equation*}

in which

(2.7) \begin{equation} \Delta = \dfrac{1}{\sigma} \, \sqrt{(\sigma a - \rho b)^2 + b^2(1 - \rho^2)},\end{equation}

and

(2.8) \begin{equation} R = \dfrac{\sigma^2 \Delta + \sigma (\sigma a - \rho b) + v(0)}{\sigma^2 \Delta - \sigma (\sigma a - \rho b) - v(0)}.\end{equation}
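The closed form for v(t) can be checked against the Riccati equation (2.6) numerically. The sketch below evaluates $\Delta$, R, and v(t) from (2.7), (2.8), and the displayed solution, and exposes the right-hand side of (2.6) so that a finite-difference derivative can be compared with it. The function names are illustrative, and the code assumes the denominator in (2.8) is nonzero.

```python
import math

def riccati_constants(a, b, sigma, rho, v0):
    """Delta from (2.7) and R from (2.8)."""
    delta = math.sqrt((sigma * a - rho * b) ** 2 + b * b * (1.0 - rho * rho)) / sigma
    k = sigma * (sigma * a - rho * b)
    R = (sigma ** 2 * delta + k + v0) / (sigma ** 2 * delta - k - v0)
    return delta, R

def v_closed_form(t, a, b, sigma, rho, v0):
    """Conditional variance v(t) from the closed-form solution of (2.6)."""
    delta, R = riccati_constants(a, b, sigma, rho, v0)
    r_t = R * math.exp(2.0 * delta * t)
    return sigma ** 2 * delta * (r_t - 1.0) / (r_t + 1.0) - sigma * (sigma * a - rho * b)

def riccati_rhs(v, a, b, sigma, rho):
    """Right-hand side of the Riccati equation (2.6)."""
    return b * b - 2.0 * a * v - ((sigma * rho * b - v) / sigma) ** 2
```

For example, a central difference of `v_closed_form` around some t in (0, T) should agree with `riccati_rhs` evaluated at v(t), up to discretization error.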

Moreover, by substituting for $W_1$ in terms of $\bar W_1$ in (2.2), we obtain that X follows the dynamics

\begin{equation*} \text{d} X_t = ( q_t ( (1 + \eta) \bar{\mu} - m_t ) - (\eta - \theta) \bar{\mu} ) \text{d} t + q_t \sigma \text{d}\bar W_{1,t},\end{equation*}

which is $\mathbb G$ -adapted in the partial information case because q is $\mathbb G$ -adapted in that case. Let ${V^p}$ denote the maximum expected exponential utility of terminal wealth under partial information, i.e.

(2.9) \begin{equation} {V^p}(t, x, m) = \sup_{q \in \mathcal A^p} \mathbb E\big({-} \text{e}^{-\gamma X_T} \mid X_t = x, \, m_t = m \big).\end{equation}

3. Full information case

We begin by stating a relevant verification theorem without proof because the proof is standard in the actuarial and financial mathematics literature; see, for example, [Reference Promislow and Young26, Theorem 2.1].

Theorem 3.1. Suppose ${v^f} \in \mathcal C^{1, 2, 2}([0, T] \times \mathbb R \times \mathbb R)$ takes values in $\mathbb R^{-}$ , is non-decreasing and concave in x, and satisfies the HJB equation

(3.1) \begin{align} 0 & = v^f_t - (\eta - \theta)\bar{\mu} v^f_x - a(\mu - \bar{\mu}) v^f_\mu + \frac{1}{2} b^2 v^f_{\mu\mu} \notag \\ & \quad + \sup_q \bigg[ q((1 + \eta)\bar{\mu} - \mu) v^f_x + q \sigma \rho b v^f_{x\mu} + \frac{1}{2} q^2 \sigma^2 v^f_{xx} \bigg],\end{align}

with terminal condition ${v^f}(T, x, \mu) = - \text{e}^{-\gamma x}$ . The maximizer of (3.1) is

\begin{equation*} q^*(t, x, \mu) = - \dfrac{( (1 + \eta) \bar{\mu} - \mu ) v^f_x(t, x, \mu) + \sigma \rho b v^f_{x \mu}(t, x, \mu)} {\sigma^2 v^f_{xx}(t, x, \mu)}. \end{equation*}

If the retention strategy ${q^f}$ given in feedback form by $q^f_t = q^*(t, X^*_t, \mu_t)$ for all $0 \le t \le T$ is admissible, then the value function ${V^f}$ defined by (2.3) equals ${v^f}$ . Here, $X^*_t$ is the optimally controlled surplus at time t.

In the following theorem, we solve the HJB equation in (3.1) with boundary condition ${v^f}(T, x, \mu) = - \text{e}^{-\gamma x}$ . Theorem 3.1 then allows us to deduce that the solution equals the value function ${V^f}$ in (2.3).

Theorem 3.2. The maximum expected exponential utility of terminal wealth under full information is ${V^f}(t, x, \mu) = - \text{e}^{-\gamma x} \exp \{{A}(t)\mu^2 + {B} (t)\mu + {C} (t)\}$ , in which

(3.2) \begin{equation} A(t) = - \dfrac{\text{e}^{2 \Delta \cdot (T-t)} - 1}{\alpha_1 \text{e}^{2 \Delta \cdot (T-t)} + \alpha_2}, \end{equation}
(3.3) \begin{equation} B(t) = \dfrac{2\bar{\mu}}{\Delta} \cdot \dfrac{\text{e}^{\Delta \cdot (T - t)} - 1}{\alpha_1 \text{e}^{2 \Delta \cdot (T-t)} + \alpha_2} \big\{(1 + \eta)\Delta(\text{e}^{\Delta \cdot (T - t)} + 1) + \eta a (\text{e}^{\Delta \cdot (T - t)} - 1) \big\}, \end{equation}
(3.4) \begin{align} C(t) & = \int^T_t \bigg\{\frac{b^2(1-\rho^2)}{2}B^2(s) + \bigg(a-\dfrac{(1+\eta)b\rho}{\sigma}\bigg)\bar{\mu} B(s)+b^2A(s)\bigg\} \, \text{d} s \notag \\ & \quad + \bigg\{(\eta - \theta)\bar{\mu}\gamma-\dfrac{(1+\eta)^2\bar{\mu}^2}{2 \sigma^2}\bigg\}(T - t), \end{align}

with $\Delta$ given in (2.7), and we define $\alpha_1 > 0$ and $\alpha_2 > 0$ as $\alpha_1 = 2 \sigma \big( \sigma \Delta + (\sigma a - \rho b) \big)$ , $\alpha_2 = 2 \sigma \big( \sigma \Delta - (\sigma a - \rho b) \big)$ . Moreover, the optimal retention strategy ${q^f}$ is given in feedback form by

(3.5) \begin{equation} q^f_t = \dfrac{1}{\sigma^2 \gamma} \{ (2 \sigma \rho b A(t) - 1)\mu_t + \sigma \rho b B(t) + (1+\eta)\bar{\mu} \}. \end{equation}

Proof. See Appendix B for a proof of this theorem.
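To make the feedback rule (3.5) concrete, the following sketch evaluates A(t) and B(t) from (3.2) and (3.3) and then the retention level $q^f_t$ for a given drift value. The function names and argument order are assumptions made for this illustration; C(t) is not needed for the strategy and is left out.

```python
import math

def full_info_coeffs(t, T, a, b, sigma, rho, eta, mu_bar):
    """A(t) and B(t) from (3.2) and (3.3)."""
    delta = math.sqrt((sigma * a - rho * b) ** 2 + b * b * (1.0 - rho * rho)) / sigma
    alpha1 = 2.0 * sigma * (sigma * delta + (sigma * a - rho * b))
    alpha2 = 2.0 * sigma * (sigma * delta - (sigma * a - rho * b))
    e1 = math.exp(delta * (T - t))
    e2 = e1 * e1
    denom = alpha1 * e2 + alpha2
    A = -(e2 - 1.0) / denom
    B = (2.0 * mu_bar / delta) * (e1 - 1.0) / denom * (
        (1.0 + eta) * delta * (e1 + 1.0) + eta * a * (e1 - 1.0))
    return A, B

def q_full(t, mu, T, a, b, sigma, rho, eta, mu_bar, gamma):
    """Optimal retention q^f_t from (3.5), given the observed drift mu."""
    A, B = full_info_coeffs(t, T, a, b, sigma, rho, eta, mu_bar)
    return ((2.0 * sigma * rho * b * A - 1.0) * mu
            + sigma * rho * b * B + (1.0 + eta) * mu_bar) / (sigma ** 2 * gamma)
```

At $t = T$ both A and B vanish, so the rule collapses to $((1+\eta)\bar{\mu} - \mu_T)/(\sigma^2 \gamma)$, which is a convenient consistency check.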

Remark 3.1. We derive explicit expressions for A and B in (3.2) and (3.3), respectively, because ${q^f}$ in (3.5) relies on A and B. However, for the sake of space, we do not present an explicit expression for C.

Remark 3.2. Because $A(t) \le 0$ for all $0 \le t \le T$ , as $\mu_t$ increases the proportion retained, namely $q^f_t$ , decreases. This monotonicity makes sense because, as the drift of claims increases, we expect the insurance company to retain less of its risk, especially given a fixed premium rate $(1 + \eta)\bar{\mu}$ . Also, note that, when $\mu_t = 0$ , $q^f_t > 0$ because $B(t) \ge 0$ for all $0 \le t \le T$ .

In the following corollary we show how A and B change with time. We omit the calculations because they are straightforward from (3.2) and (3.3).

Corollary 3.1. The derivative of A(t) is

\begin{equation*} A^{\prime}(t) = \dfrac{8 \sigma^2 \Delta^2 \text{e}^{2 \Delta \cdot (T-t)}}{(\alpha_1 \text{e}^{2 \Delta \cdot (T-t)} + \alpha_2)^2}, \end{equation*}

which is positive for $0 \le t \le T$ , and the derivative of B(t) is

\begin{align*} B^{\prime}(t) & = - \dfrac{4 \bar{\mu} \text{e}^{\Delta \cdot (T - t)}}{(\alpha_1 \text{e}^{2 \Delta \cdot (T-t)} + \alpha_2)^2} \\[5pt] & \quad \times \big\{(1 + \eta) 4 \sigma^2 \Delta^2 \text{e}^{\Delta \cdot (T - t)} + \eta a (\text{e}^{\Delta \cdot (T - t)} - 1)(\alpha_1 \text{e}^{\Delta \cdot (T-t)} + \alpha_2) \big\}, \end{align*}

which is negative for $0 \le t \le T$ . Thus, the slope of $q^f_t$ as a linear function of $\mu_t$ , namely $({1}/({\sigma^2 \gamma})) (2 \sigma \rho b A(t) - 1)$ , becomes less negative over time (specifically, it increases to $-1/(\sigma^2 \gamma)$ as t increases to T), and the intercept, namely $({1}/({\sigma^2 \gamma})) (\sigma \rho b B(t) + (1+\eta)\bar{\mu})$ , becomes less positive over time (specifically, it decreases to $(1 + \eta)\bar{\mu}/(\sigma^2 \gamma)$ as t increases to T).
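The expression for $A^{\prime}(t)$ in Corollary 3.1 can be compared with a finite-difference derivative of (3.2), and the slope of $q^f_t$ can be tabulated over time. The sketch below does both; all function names are hypothetical, and the monotonicity of the slope that it illustrates relies on $\rho > 0$, as in the numerical examples of the paper.

```python
import math

def _delta_alphas(a, b, sigma, rho):
    """Delta from (2.7) and alpha_1, alpha_2 from Theorem 3.2."""
    delta = math.sqrt((sigma * a - rho * b) ** 2 + b * b * (1.0 - rho * rho)) / sigma
    alpha1 = 2.0 * sigma * (sigma * delta + (sigma * a - rho * b))
    alpha2 = 2.0 * sigma * (sigma * delta - (sigma * a - rho * b))
    return delta, alpha1, alpha2

def A_full(t, T, a, b, sigma, rho):
    """A(t) from (3.2)."""
    delta, alpha1, alpha2 = _delta_alphas(a, b, sigma, rho)
    e2 = math.exp(2.0 * delta * (T - t))
    return -(e2 - 1.0) / (alpha1 * e2 + alpha2)

def A_prime(t, T, a, b, sigma, rho):
    """Closed-form derivative of A(t) as stated in Corollary 3.1."""
    delta, alpha1, alpha2 = _delta_alphas(a, b, sigma, rho)
    e2 = math.exp(2.0 * delta * (T - t))
    return 8.0 * sigma ** 2 * delta ** 2 * e2 / (alpha1 * e2 + alpha2) ** 2

def slope_full(t, T, a, b, sigma, rho, gamma):
    """Slope of q^f_t as a linear function of mu_t."""
    return (2.0 * sigma * rho * b * A_full(t, T, a, b, sigma, rho) - 1.0) / (sigma ** 2 * gamma)
```

Evaluating `slope_full` on an increasing time grid shows the slope rising towards $-1/(\sigma^2 \gamma)$ at the horizon.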

Remark 3.3. Corollary 3.1 shows us that, over time, the proportion of retained risk, as a linear function of $\mu_t$ , flattens. This flattening is consistent with the risk aversion of the insurer. As time approaches the horizon T, the insurer will not wish to change its retention as much (as a function of $\mu_t$ ) as when further from the horizon. Intuitively, the closer time is to the horizon, the less time the insurer has to maximize its expected utility and, therefore, the insurer reacts less strongly to changes in the drift of the surplus. We see a similar phenomenon in the optimal investment strategy of [Reference Brendle6]; namely, the closer time is to T, the less the investor changes their investment in reaction to changes in the drift of the risky asset.

4. Partial information case

In this section we analyze the problem under partial information. The corresponding verification theorem and its solution parallel the results in Section 3.

Theorem 4.1. Suppose ${v^p} \in \mathcal C^{1, 2, 2}([0, T] \times \mathbb R \times \mathbb R)$ takes values in $\mathbb R^{-}$ , is non-decreasing and concave in x, and satisfies the HJB equation

(4.1) \begin{align} 0 & = v^p_t - (\eta - \theta)\bar{\mu} v^p_x - a(m - \bar{\mu}) v^p_m + \frac{1}{2} \bigg( \rho b - \frac{v(t)}{\sigma} \bigg)^2 v^p_{mm} \notag \\ & \quad + \sup_q \bigg[ q((1 + \eta)\bar{\mu} - m) v^p_x + q \sigma \bigg( \rho b - \frac{v(t)}{\sigma} \bigg) v^p_{xm} + \frac{1}{2} q^2 \sigma^2 v^p_{xx} \bigg],\end{align}

with terminal condition ${v^p}(T, x, m) = - \text{e}^{-\gamma x}$ . The maximizer of (4.1) is

\begin{equation*} q^*(t, x, m) = - \dfrac{( (1 + \eta) \bar{\mu} - m ) v^p_x(t, x, m) + (\sigma \rho b - v(t)) v^p_{x m}(t, x, m)}{\sigma^2 v^p_{xx}(t, x, m)}. \end{equation*}

If the retention strategy ${q^p}$ given in feedback form by $q^p_t = q^*(t, X^*_t, m_t)$ for all $0 \le t \le T$ is admissible, then the value function ${V^p}$ defined by (2.9) equals ${v^p}$ . Here, $X^*_t$ is the optimally controlled surplus at time t.

In the following theorem we solve the HJB equation in (4.1) with boundary condition ${v^p}(T, x, m) = - \text{e}^{-\gamma x}$ . Theorem 4.1 then allows us to deduce that the solution equals the value function ${V^p}$ in (2.9).

Theorem 4.2. The maximum expected exponential utility of terminal wealth under partial information is ${V^p}(t,x,m) = - \text{e}^{-\gamma x} \exp\{\widehat A(t) m^2 + \widehat B(t) m + \widehat C(t)\}$ , in which

(4.2) \begin{equation} \widehat A(t) = - \dfrac{1}{4 \sigma^2 \Delta} \cdot \dfrac{(R \text{e}^{2 \Delta \cdot t} + 1)(\text{e}^{2\Delta \cdot (T - t)} - 1)} {R \text{e}^{2\Delta \cdot T} + 1}, \end{equation}
(4.3) \begin{align} \widehat B(t) & = \dfrac{\bar{\mu}}{2 \sigma^2 \Delta^2} \cdot \dfrac{ (R \text{e}^{2 \Delta \cdot t} + 1)(\text{e}^{\Delta \cdot (T - t)} - 1)} {R \text{e}^{2 \Delta \cdot T} + 1} \notag \\ & \quad \times \big\{(1 + \eta)\Delta (\text{e}^{\Delta \cdot (T - t)} + 1) + \eta a(\text{e}^{\Delta \cdot (T - t)} - 1) \big\}, \end{align}
(4.4) \begin{align} \widehat C(t) & = \int^T_t \bigg \{ \bigg(a - (1 + \eta) \dfrac{\sigma \rho b - v(s)}{\sigma^2} \bigg) \bar{\mu}\widehat B(s) + \bigg( \rho b - \frac{v(s)}{\sigma}\bigg)^2 \widehat A(s)\bigg\} \, \text{d} s \notag\\ & \quad + \bigg\{(\eta - \theta)\bar{\mu}\gamma-\dfrac{(1+\eta)^2\bar{\mu}^2}{2\sigma^2}\bigg\}(T - t), \end{align}

in which R is given in (2.8).

Moreover, the optimal retention strategy ${q^p}$ is given in feedback form by

(4.5) \begin{equation} q^p_t = \dfrac{1}{\sigma^2 \gamma} \{(2(\sigma \rho b - v(t)) \widehat A(t) - 1 ) m_t + (\sigma \rho b - v(t) ) \widehat B(t) + (1+\eta) \bar{\mu} \}. \end{equation}

Proof. See Appendix C for a proof of this theorem.
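The partial information rule (4.5) can be evaluated in the same way as (3.5). The sketch below computes v(t), $\widehat A(t)$, and $\widehat B(t)$ from the closed forms above and returns the retention level together with its slope and intercept in $m_t$. The function name and return convention are assumptions made for this illustration.

```python
import math

def partial_info_strategy(t, m, T, v0, a, b, sigma, rho, eta, mu_bar, gamma):
    """Return (q^p_t, slope, intercept) from (4.5) for filtered drift m."""
    k = sigma * (sigma * a - rho * b)
    delta = math.sqrt((sigma * a - rho * b) ** 2 + b * b * (1.0 - rho * rho)) / sigma
    R = (sigma ** 2 * delta + k + v0) / (sigma ** 2 * delta - k - v0)   # (2.8)
    r_t = R * math.exp(2.0 * delta * t)
    r_T = R * math.exp(2.0 * delta * T)
    v = sigma ** 2 * delta * (r_t - 1.0) / (r_t + 1.0) - k              # v(t)
    e1 = math.exp(delta * (T - t))
    A_hat = -(r_t + 1.0) * (e1 * e1 - 1.0) / (4.0 * sigma ** 2 * delta * (r_T + 1.0))
    B_hat = (mu_bar / (2.0 * sigma ** 2 * delta ** 2)) * (r_t + 1.0) * (e1 - 1.0) / (r_T + 1.0) * (
        (1.0 + eta) * delta * (e1 + 1.0) + eta * a * (e1 - 1.0))
    beta = sigma * rho * b - v          # replaces sigma*rho*b of the full information case
    slope = (2.0 * beta * A_hat - 1.0) / (sigma ** 2 * gamma)
    intercept = (beta * B_hat + (1.0 + eta) * mu_bar) / (sigma ** 2 * gamma)
    return slope * m + intercept, slope, intercept
```

The only structural change relative to the full information rule is that $\sigma \rho b$ is replaced by $\sigma \rho b - v(t)$ and that the coefficients carry the time-dependent factor involving R.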

Remark 4.1. As in Section 3, we derive explicit expressions for $\widehat A$ and $\widehat B$ in (4.2) and (4.3), respectively, because ${q^p}$ in (4.5) relies on $\widehat A$ and $\widehat B$ . However, for the sake of space, we do not present an explicit expression for $\widehat C$ .

Because $\sigma \rho b - v(t)$ can be negative, it is not clear whether the slope of $q^p_t$ as a function of $m_t$ is negative, as is the case for $q^f_t$ as a function of $\mu_t$ . The following corollary tells us that the slope of $q^p_t$ is, indeed, negative.

Corollary 4.1. The slope of $q^p_t$ as a linear function of $m_t$ is negative, i.e.

(4.6) \begin{equation} \frac{1}{\sigma^2 \gamma} ( 2 (\sigma \rho b - v(t)) \widehat A(t) - 1 ) < 0 , \end{equation}

for all $0 \le t \le T$ .

Proof. By substituting for $\widehat A$ and v from (4.2) and (2.6), respectively, and by simplifying the result, we can show that inequality (4.6) is equivalent to

\begin{equation*} \dfrac{1}{2 \Delta} \cdot \dfrac{( (\Delta + a) \text{e}^{2 \Delta \cdot (T - t)} + (\Delta - a) )(R \text{e}^{2\Delta \cdot t} + 1)} {R \text{e}^{2 \Delta \cdot T} + 1} > 0, \end{equation*}

or

(4.7) \begin{equation} \dfrac{R \text{e}^{2\Delta \cdot t} + 1}{R \text{e}^{2 \Delta \cdot T} + 1} > 0. \end{equation}

By substituting for R from (2.8), we find that inequality (4.7) is equivalent to

\begin{equation*} \dfrac{\sigma^2 \Delta(\text{e}^{2\Delta \cdot t} + 1) + \sigma(\sigma a - \rho b)(\text{e}^{2\Delta \cdot t} - 1) + v(0)(\text{e}^{2\Delta \cdot t} - 1)} {\sigma^2 \Delta(\text{e}^{2\Delta \cdot T} + 1) + \sigma(\sigma a - \rho b)(\text{e}^{2\Delta \cdot T} - 1) + v(0)(\text{e}^{2\Delta \cdot T} - 1)} > 0, \end{equation*}

which is true because $\sigma \Delta > \pm (\sigma a - \rho b)$ .

As in Corollary 3.1, we show how $\widehat A$ and $\widehat B$ change with time in the following corollary.

Corollary 4.2. The derivative of $\widehat A(t)$ is

\begin{align*} \widehat A^{\prime}(t) = \frac{1}{2 \sigma^2} \cdot \dfrac{R \text{e}^{2 \Delta \cdot t} + \text{e}^{2 \Delta \cdot (T-t)}} {R \text{e}^{2 \Delta \cdot T} + 1} \end{align*}

for $0 \le t \le T$ , and the derivative of $\widehat B(t)$ is

\begin{align*} \widehat B^{\prime}(t) = - \dfrac{\bar{\mu}}{\sigma^2 \Delta} \cdot \dfrac{(1 + \eta) \Delta (R \text{e}^{2 \Delta \cdot t} + \text{e}^{2\Delta \cdot (T - t)})+ \eta a (R \text{e}^{2 \Delta \cdot t} + \text{e}^{\Delta \cdot (T - t)})(\text{e}^{\Delta \cdot (T-t)} - 1)} {R \text{e}^{2 \Delta \cdot T} +1} , \end{align*}

for $0 \le t \le T$ .

Remark 4.2. Unlike Corollary 3.1, we cannot assert that $\widehat A^{\prime}(t) \ge 0$ and $\widehat B^{\prime}(t) \le 0$ because these inequalities might not hold if R is negative enough, which occurs, for example, when v(0) is relatively large. On the other hand, if $R > 0$ , then it is clear that $\widehat A^{\prime}(t) \ge 0$ and $\widehat B^{\prime}(t) \le 0$ .

In the following, we present two numerical examples to further explore the reinsurance strategies in both full and partial information cases. For each example, we set $a=1$ , $b=2.5$ , $\theta=0.4$ , $\eta=0.8$ , $\rho=0.4$ , $\sigma=3$ , $\gamma=1.2$ , $\bar{\mu}=2$ , and $\mu_0\sim N(0,1)$ .

Example 4.1. In Figure 1 we set $T=1$ and used a Monte Carlo approach to simulate the sample paths of $\mu_t$ and $q^f_t$ under full information. We present three sample paths in these two figures.

Figure 1. Left: Sample paths of $\mu_t$ when T is small. Right: Sample paths of $q^f_t$ when T is small.

To compute the probability that $q^f_t < 0$ , $q^f_t > 1$ , or $\mu_t < 0$ , we worked with 3000 sample paths, each discretized into 2000 time intervals. Let i denote a sample path, and let j denote a time instance. For $i =1, 2, \dots, 3000$ and $j = 1, 2, \dots, 2000$ , we counted the number of points for which ${q^f}(i, j) < 0$ , and computed the proportion of that number divided by the total number of observations, namely, $3000 \times 2000$ . That proportion is our estimate of the probability of $q^f_t < 0$ . Similarly, we estimated the probabilities of $q^f_t > 1$ and $\mu_t < 0$ . For our parameter values, we computed that the probability of $q^f_t < 0$ equals $0.0682$ , the probability of $q^f_t > 1$ equals 0, and the probability of $\mu_t<0$ equals $0.1081$ .
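The counting procedure described above can be sketched as follows. Because $q^f_t$ in (3.5) depends on the path only through $\mu_t$, the sketch simulates the drift alone with an Euler–Maruyama step; the discretization scheme, the random number generator, and the function name are assumptions, since the paper does not state how its paths were generated, and no attempt is made here to reproduce the reported probabilities.

```python
import math
import random

def estimate_probabilities(n_paths, n_steps, T, rng, a=1.0, b=2.5, eta=0.8,
                           rho=0.4, sigma=3.0, gamma=1.2, mu_bar=2.0):
    """Fractions of grid points with q^f < 0, q^f > 1, and mu < 0."""
    delta = math.sqrt((sigma * a - rho * b) ** 2 + b * b * (1.0 - rho * rho)) / sigma
    alpha1 = 2.0 * sigma * (sigma * delta + (sigma * a - rho * b))
    alpha2 = 2.0 * sigma * (sigma * delta - (sigma * a - rho * b))
    dt = T / n_steps
    # Slope and intercept of q^f in mu at the grid times t_j = j*dt, j = 1..n_steps.
    lines = []
    for j in range(1, n_steps + 1):
        e1 = math.exp(delta * (T - j * dt))
        denom = alpha1 * e1 * e1 + alpha2
        A = -(e1 * e1 - 1.0) / denom
        B = (2.0 * mu_bar / delta) * (e1 - 1.0) / denom * (
            (1.0 + eta) * delta * (e1 + 1.0) + eta * a * (e1 - 1.0))
        lines.append(((2.0 * sigma * rho * b * A - 1.0) / (sigma ** 2 * gamma),
                      (sigma * rho * b * B + (1.0 + eta) * mu_bar) / (sigma ** 2 * gamma)))
    q_neg = q_big = mu_neg = 0
    for _ in range(n_paths):
        mu = rng.gauss(0.0, 1.0)                      # mu_0 ~ N(0, 1)
        for slope, intercept in lines:
            mu += -a * (mu - mu_bar) * dt + b * math.sqrt(dt) * rng.gauss(0.0, 1.0)
            q = slope * mu + intercept
            q_neg += q < 0.0
            q_big += q > 1.0
            mu_neg += mu < 0.0
    total = n_paths * n_steps
    return q_neg / total, q_big / total, mu_neg / total

# A small run; the text uses 3000 paths with 2000 time steps each.
p_q_neg, p_q_big, p_mu_neg = estimate_probabilities(200, 200, 1.0, random.Random(1))
```

The estimates vary with the seed and with the grid, so they should be read as indicative only.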

In Figure 2 we set $T = 50$ , and we present one sample path of the mean-reverting process $\mu_t$ and the reinsurance proportion $q^f_t$ . In this case, we estimated, via 3000 discretized sample paths, that the probability of $q^f_t < 0$ equals $0.1461$ , and the probability of $\mu_t < 0$ equals $0.1323$ . Thus, as the terminal time T increases, the probabilities of $q^f_t < 0$ and $\mu_t < 0$ also increase.

Figure 2. Left: Sample path of $\mu_t$ when T is large. Right: Sample path of $q^f_t$ when T is large.

As an aside, if b is relatively small and if T is large, say, 50, then the value of $\mu_t$ is close to $\bar{\mu}$ for a good portion of the interval [0, T]; thus, the probability of $\mu_t < 0$ is small. If $T = 1$ , then because $\mu_0$ might be very different from $\bar{\mu}$ , we cannot say that the probability of $\mu_t < 0$ is small.

We also observed (in work not shown here) that, as $\bar{\mu}$ decreases, the probability of $q^f_t < 0$ increases, and as $\bar{\mu}$ increases, the probability of $q^f_t > 1$ increases. Moreover, as $\bar{\mu}$ increases, the probability of $\mu_t < 0$ decreases.

Example 4.2. In Figure 3 we simulate the sample paths of $m_t$ and $q^p_t$ under partial information. We set $T=1$ , $m_0=1.6$ , and $v(0) = 0.5$ . As in Example 4.1, we plot three sample paths of $q^p_t$ and $m_t$ . In this example, among 3000 discretized sample paths, we find that the optimal reinsurance proportion $q^p_t$ always lies in [0, 1], and the filtered drift process $m_t$ is always positive.

Figure 3. Left: Sample paths of $m_t$ when T is small. Right: Sample paths of $q^p_t$ when T is small.

In Figure 4 we set $T=50$ and plot one sample path. As in the case for $T = 1$ , among 3000 discretized sample paths, the optimal reinsurance proportion $q^p_t$ always lies in [0, 1], and the filtered drift process $m_t$ is always positive.

Figure 4. Left: Sample path of $m_t$ when T is large. Right: Sample path of $q^p_t$ when T is large.
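A path of the filtered drift and the corresponding retention level, as in Example 4.2, could be generated along the following lines. The sketch discretizes the filter equation (2.4) with an Euler–Maruyama step, driving it with simulated Gaussian increments in place of the innovations process (which is a Brownian motion under $\mathbb G$), and evaluates (4.5) along the path. The scheme and the function name are assumptions; the paper does not describe its simulation method.

```python
import math
import random

def simulate_partial_info(T, n_steps, m0, v0, rng, a=1.0, b=2.5, eta=0.8,
                          rho=0.4, sigma=3.0, gamma=1.2, mu_bar=2.0):
    """Euler-Maruyama sketch of m_t from (2.4) with q^p_t from (4.5)."""
    k = sigma * (sigma * a - rho * b)
    delta = math.sqrt((sigma * a - rho * b) ** 2 + b * b * (1.0 - rho * rho)) / sigma
    R = (sigma ** 2 * delta + k + v0) / (sigma ** 2 * delta - k - v0)
    r_T = R * math.exp(2.0 * delta * T)
    dt = T / n_steps
    m, m_path, q_path = m0, [], []
    for j in range(n_steps + 1):
        t = j * dt
        r_t = R * math.exp(2.0 * delta * t)
        v = sigma ** 2 * delta * (r_t - 1.0) / (r_t + 1.0) - k
        e1 = math.exp(delta * (T - t))
        A_hat = -(r_t + 1.0) * (e1 * e1 - 1.0) / (4.0 * sigma ** 2 * delta * (r_T + 1.0))
        B_hat = (mu_bar / (2.0 * sigma ** 2 * delta ** 2)) * (r_t + 1.0) * (e1 - 1.0) / (r_T + 1.0) * (
            (1.0 + eta) * delta * (e1 + 1.0) + eta * a * (e1 - 1.0))
        beta = sigma * rho * b - v
        q = ((2.0 * beta * A_hat - 1.0) * m + beta * B_hat + (1.0 + eta) * mu_bar) / (sigma ** 2 * gamma)
        m_path.append(m)
        q_path.append(q)
        # Filter volatility rho*b - v(t)/sigma equals beta/sigma.
        m += -a * (m - mu_bar) * dt + (beta / sigma) * math.sqrt(dt) * rng.gauss(0.0, 1.0)
    return m_path, q_path

m_path, q_path = simulate_partial_info(1.0, 500, 1.6, 0.5, random.Random(0))
```

Since v(t) grows from v(0) towards its long-run level, the filter volatility shrinks over time in this parameter setting, which is one reason the simulated $m_t$ paths tend to be smoother than the $\mu_t$ paths of Example 4.1.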

For the optimization problem with the reinsurance constraint, that is, $0 \le q_t \le 1$ for all $0 \le t \le T$ , as in the discussion before, the value function under full information (or partial information) satisfies the HJB equation in (3.1) (or (4.1)) for ${V^f}$ (or ${V^p}$ ), except that q is constrained to lie in [0, 1]. Indeed, from [Reference Crandall, Ishii and Lions10, Reference Yong and Zhou31], we know that a value function for a constrained problem is the unique viscosity solution of its HJB equation subject to the constraint. However, due to the reinsurance constraint, we cannot derive further explicit results by following the classical HJB equation approach. Motivated by the convex-duality approach, as in [Reference Cvitanić and Karatzas11, Reference Putschögl and Sass27], we need to introduce an auxiliary unconstrained optimization problem by modifying the original problem with an auxiliary stochastic parameter, then find the relationship between the value function of the auxiliary problem and that of the original problem. Due to the randomness of the auxiliary parameter process, the corresponding value function satisfies a stochastic HJB equation, which is a special backward stochastic partial differential equation or infinite-dimensional BSDE. The solvability of the stochastic HJB equation was studied in [Reference Peng24], which provided an existence and uniqueness theorem for the case in which the volatility coefficient of the state process does not contain the control variable. In the finance and insurance literature, the BSDE approach is becoming an efficient technique to solve utility maximization problems with stochastic coefficients. A stochastic Stackelberg differential game between an insurer and a reinsurer was considered in [Reference Chen and Shen9] by applying the BSDE approach. See also [Reference Delong13] for more details about the applications in actuarial and financial models with the BSDE approach. 
Because the convex-duality-plus-BSDE approach differs greatly from the HJB equation approach in this paper, we will leave that work for future research.

5. The relationship between ${V^f}$ and ${V^p}$

In this section we adapt the technique in [Reference Brendle7] to show the link between the value function ${V^f}$ under full observation and the value function ${V^p}$ under partial observation.

Theorem 5.1. The following relationships hold among A, B, C and $\widehat A, \widehat B, \widehat C$ :

(5.1) \begin{align} \widehat A(t) & = \dfrac{A(t)}{1 - 2 v(t)A(t)} ,\end{align}
(5.2) \begin{align} \widehat B(t) & = \dfrac{B(t)}{1 - 2 v(t)A(t)} ,\end{align}
(5.3) \begin{align} \widehat C(t) & = {C(t)} + \dfrac{B^2(t)v(t)}{2(1 - 2 v(t)A(t))} - \frac{1}{2} \ln (1 - 2 v(t)A(t)) + \dfrac{N(t)v(t)}{2(1 - 2 v(t)A(t))} + Q(t) , \end{align}

in which N and Q satisfy the following differential equations:

\begin{align*} \dfrac{\text{d} N(t)}{\text{d} t} & = - \frac{1}{\sigma^2} (1-2\sigma \rho b A(t))^2 - 4 b^2(1 - \rho^2) A(t)N(t) + \dfrac{2}{\sigma} (\sigma a - \rho b) N(t), \\ \frac{\text{d} Q(t)}{\text{d} t} & = - \frac{1}{2} b^2(1 - \rho^2) N(t), \end{align*}

with boundary conditions $N(T) = Q(T) = 0$ . Moreover, the value functions ${V^f}$ and ${V^p}$ satisfy

(5.4) \begin{align} \mathbb E \big[{V^f} (t, X_t, \mu_t) \mid \mathcal G_t \big] = {V^p}(t, X_t, m_t) \text{e}^{-\hat{P}(t)}, \end{align}

in which

\begin{equation*} \hat{P}(t) = \dfrac{N(t)v(t)}{2(1 - 2 v(t)A(t))} + Q(t). \end{equation*}

Finally, the optimal retention strategies ${q^f}$ and ${q^p}$ satisfy

\begin{equation*} \mathbb E\big[q^f_t \mid \mathcal G_t \big] = (1 - 2v(t) A(t)) q^p_t + \frac{v(t)}{\sigma^2 \gamma} \{2(1+\eta)\bar{\mu} A(t) + B(t)\}. \end{equation*}

Proof. The validity of (5.1)–(5.3) can be checked directly by differentiating terms on the right-hand sides and comparing them with the existing differential equations satisfied by the left-hand sides.

We next prove (5.4). Because the distribution of $\mu_t$ conditional on $\mathcal G_t$ is Gaussian with mean $m_t$ and variance v(t), we have

\begin{align*} \mathbb E \big[\exp( A(t) \mu^2_t + B(t) \mu_t ) \mid \mathcal G_t \big] & = \frac{1}{\sqrt{2 \pi v(t)}\,} \int_{\mathbb R} \text{e}^{A(t) x^2 + B(t) x} \exp\bigg\{{-} \frac{(x-m_t)^2}{2v(t)}\bigg\} \, \text{d} x \\ & = \frac{1}{\sqrt{2 \pi v(t)}\,} \int_{\mathbb R} \exp \bigg\{{-} \dfrac{1 - 2 v(t) A(t)}{2v(t)}\bigg(x-\dfrac{m_t + v(t)B(t)}{{1-2v(t)A(t)}} \bigg)^2 \\ & \qquad \qquad \qquad \qquad \quad + \dfrac{(B(t)v(t)+m_t)^2 - m^2_t(1-2v(t)A(t))}{2(1 - 2 v(t) A(t))v(t)}\bigg\} \, \text{d} x \\ & = \frac{1}{\sqrt{1 - 2 v(t) A(t)}\,} \exp \bigg\{\dfrac{B^2(t)v(t)+2B(t)m_t}{2(1 - 2 v(t) A(t))} + \dfrac{A(t)m^2_t}{1 - 2 v(t) A(t)} \bigg\} \\ & = \exp \bigg\{\frac{A(t)}{{1 - 2 v(t) A(t)}}m^2_t + \frac{B(t)}{{1 - 2 v(t) A(t)}}m_t + \dfrac{B^2(t)v(t)}{2(1 - 2 v(t)A(t))} \\ & \qquad \qquad - \frac{1}{2} \ln (1 - 2 v(t)A(t))\bigg\}. \end{align*}

Hence, we obtain

\begin{equation*} \mathbb E \big[\exp( A(t) \mu^2_t + B(t) \mu_t + C(t)) \mid \mathcal G_t \big] = \exp( \widehat A(t) m^2_t + \widehat B(t) m_t + \widehat C(t)) \text{e}^{-\hat{P}(t)}. \end{equation*}
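As a sanity check on the Gaussian integral above, the following Python sketch compares a numerical quadrature of $\mathbb E[\exp(A\mu^2 + B\mu)]$ for $\mu \sim N(m, v)$ with the closed form on the last line of the computation. The sample values of A, B, m, and v are illustrative assumptions, not values computed from (3.2) and (3.3); the only requirement is $1 - 2vA > 0$ . The function names are hypothetical.

```python
import math


def simpson(f, lo, hi, n):
    """Composite Simpson rule with n (even) subintervals."""
    h = (hi - lo) / n
    total = f(lo) + f(hi)
    for i in range(1, n):
        total += (4 if i % 2 else 2) * f(lo + i * h)
    return total * h / 3


def expectation_numeric(A, B, m, v):
    """Quadrature of exp(A x^2 + B x) against the N(m, v) density."""
    d = 1 - 2 * v * A
    # The integration window is centred on the completed square and spans
    # ten standard deviations of the resulting Gaussian on each side.
    center, sd = (m + v * B) / d, math.sqrt(v / d)

    def integrand(x):
        return math.exp(A * x * x + B * x - (x - m) ** 2 / (2 * v)) / math.sqrt(2 * math.pi * v)

    return simpson(integrand, center - 10 * sd, center + 10 * sd, 2000)


def expectation_closed(A, B, m, v):
    """Closed form: exp{(A m^2 + B m + B^2 v / 2) / (1 - 2vA)} / sqrt(1 - 2vA)."""
    d = 1 - 2 * v * A
    return math.exp((A * m * m + B * m + B * B * v / 2) / d) / math.sqrt(d)


# Illustrative inputs (assumptions): A < 0 as in the paper, v > 0.
A, B, m, v = -0.03, 0.4, 1.5, 0.5
print(expectation_numeric(A, B, m, v), expectation_closed(A, B, m, v))
```

The two printed numbers should agree to many digits; multiplying both by $\text{e}^{C(t)}$ and using (5.1)–(5.3) then gives the displayed relation between the two exponentials.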

Finally, from the expressions of $q^f_t$ and $q^p_t$ in (3.5) and (4.5), respectively, we derive

\begin{align*} (1 - & 2v(t) A(t)) q^p_t + \frac{v(t)}{\sigma^2 \gamma} \{2(1+\eta)\bar{\mu} A(t) + B(t) \} \\ & = \dfrac{1}{\sigma^2 \gamma}\{(2(\sigma \rho b - v(t)) A(t) - (1 - 2 v(t) A(t))) m_t + (\sigma \rho b - v(t)) B(t)\} \\ & \quad + \dfrac{1}{\sigma^2 \gamma} (1+\eta)(1 - 2 v(t) A(t)) \bar{\mu} + \frac{v(t)}{\sigma^2 \gamma}\{2(1+\eta)\bar{\mu} A(t) + B(t)\} \\ & = \dfrac{1}{\sigma^2 \gamma}\{ (2 \sigma \rho b A(t) - 1)m_t + \sigma \rho b B(t) + (1+\eta)\bar{\mu}\} = \mathbb E\big[q^f_t \mid \mathcal G_t \big], \end{align*}

which completes our proof.
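The last step of the proof is pure algebra once (5.1) and (5.2) are granted, so it can be checked with arbitrary numbers. The sketch below is a minimal illustration: the model parameters are those of the numerical examples in this section, whereas the values of m, v, A, and B are hypothetical placeholders rather than solutions of the differential equations.

```python
# Model parameters from the numerical examples of this section.
b, eta, rho, sigma, gamma, mu_bar = 2.5, 0.8, 0.4, 3.0, 1.2, 2.0
scale = 1 / (sigma**2 * gamma)


def q_full_mean(m, A, B):
    """E[q^f_t | G_t]: (B.2) is linear in mu_t, so mu_t is replaced by its conditional mean m."""
    return scale * ((2 * sigma * rho * b * A - 1) * m + sigma * rho * b * B + (1 + eta) * mu_bar)


def q_partial(m, v, A, B):
    """q^p_t from (C.2), with A-hat and B-hat obtained from (5.1) and (5.2)."""
    d = 1 - 2 * v * A
    A_hat, B_hat = A / d, B / d
    return scale * ((2 * (sigma * rho * b - v) * A_hat - 1) * m
                    + (sigma * rho * b - v) * B_hat + (1 + eta) * mu_bar)


def rhs_of_identity(m, v, A, B):
    """(1 - 2vA) q^p_t + v / (sigma^2 gamma) * {2 (1 + eta) mu_bar A + B}."""
    return (1 - 2 * v * A) * q_partial(m, v, A, B) + v * scale * (2 * (1 + eta) * mu_bar * A + B)


# Hypothetical values at a fixed time t (assumptions, for illustration only).
m, v, A, B = 1.5, 0.5, -0.02, 0.3
difference = q_full_mean(m, A, B) - rhs_of_identity(m, v, A, B)
print(abs(difference) < 1e-12)
```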

Note that the slopes of both $q^f_t$ and $q^p_t$ as functions of $\mu_t$ and $m_t$ , respectively, are negative for $0 \le t \le T$ . Also, the vertical intercept of $q^f_t$ thought of as a function of $\mu_t$ is positive, although the corresponding statement for the vertical intercept of $q^p_t$ is not necessarily true; see, for example, the right panel of Figure 5. A natural question is how these slopes and intercepts compare to each other, and the following corollary answers this query.

Figure 5. Left: Optimal proportional retention under full and partial information when v(0) is small. Right: Optimal proportional retention under full and partial information when v(0) is large.

Corollary 5.1. For all $0 \le t \le T$ ,

(5.5) \begin{equation} \dfrac{1}{\sigma^2 \gamma}\{ 2 \sigma \rho b A(t) - 1\} \le \dfrac{1}{\sigma^2 \gamma}\{ 2(\sigma \rho b - v(t)) \widehat A(t) - 1\} < 0 \end{equation}

and

(5.6) \begin{equation} \dfrac{1}{\sigma^2 \gamma}\{ \sigma \rho b B(t) + (1 + \eta) \bar{\mu}\} \ge \dfrac{1}{\sigma^2 \gamma}\{ (\sigma \rho b - v(t)) \widehat B(t) + (1 + \eta) \bar{\mu}\}, \end{equation}

with strict inequalities when $0 \le t < T$ .

Proof. If $t = T$ , then the first inequality in (5.5) is an equality because $A(T) = \widehat A(T) = 0$ . For $0 \le t < T$ , the first inequality in (5.5) holds strictly if and only if the last inequality in the following chain of equivalences holds:

\begin{align*} 2 \sigma \rho b A(t) < 2(\sigma \rho b - v(t)) \widehat A(t) & \iff \sigma \rho b A(t) < (\sigma \rho b - v(t)) \, \dfrac{A(t)}{1 - 2 v(t) A(t)} \\ & \overset{A(t) < 0}{\iff} \sigma \rho b > \dfrac{\sigma \rho b - v(t)}{1 - 2 v(t) A(t)}, \end{align*}

which is true because $A(t) < 0$ and $v(t) > 0$ for all $0 \le t < T$ . The proof of (5.6) is similar (using $B(t) > 0$ when $0 \le t < T$ ), so we omit it.

It is intuitively pleasing that $q^p_t$ reacts less strongly to changes in $m_t$ than $q^f_t$ reacts to $\mu_t$ . Indeed, in the partial information case, the risk-averse insurer has less information and, therefore, is more cautious in changing the proportion of retained risk. Similarly, inequality (5.6) implies that the insurer retains less risk when $m_t = 0$ in the partial information case than when $\mu_t = 0$ in the full information case. See Figure 5 for an illustration of Corollary 5.1.
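Relation (5.1) and inequality (5.5) can also be checked numerically. The sketch below is one possible check, not the authors' computation: it integrates the Riccati equation (2.6) for v forward in time and equations (B.3) and (C.3) for A and $\widehat A$ backward from T with a classical Runge–Kutta scheme, using the parameter values of the numerical examples with $a = 1$ . The helper names, the shorthand `kappa`, and the grid size are assumptions made for illustration.

```python
# Parameters from the numerical examples in this section (a = 1 as in Example 5.1).
a, b, rho, sigma, T = 1.0, 2.5, 0.4, 3.0, 2.0
kappa = (sigma * a - rho * b) / sigma  # shorthand introduced here, not the paper's notation


def rk4_step(f, t, y, h):
    """One classical Runge-Kutta step for dy/dt = f(t, y); h may be negative."""
    k1 = f(t, y)
    k2 = f(t + h / 2, y + h * k1 / 2)
    k3 = f(t + h / 2, y + h * k2 / 2)
    k4 = f(t + h, y + h * k3)
    return y + h * (k1 + 2 * k2 + 2 * k3 + k4) / 6


def solve(v0, n=2000):
    """Return v, A, and A-hat on the grid t_i = i * T / n, i = 0, ..., n."""
    h = T / n

    # Riccati equation (2.6), integrated forward with half steps so that v is
    # also tabulated at the midpoints that the backward RK4 steps evaluate.
    def dv(t, v):
        return b**2 - 2 * a * v - ((sigma * rho * b - v) / sigma) ** 2

    v_half = [v0]
    for j in range(2 * n):
        v_half.append(rk4_step(dv, j * h / 2, v_half[-1], h / 2))

    def dA(t, A):  # equation (B.3) solved for A'
        return -2 * b**2 * (1 - rho**2) * A**2 + 2 * kappa * A + 1 / (2 * sigma**2)

    def dA_hat(t, A_hat):  # equation (C.3) solved for A-hat'
        v = v_half[round(2 * t / h)]  # t is always a multiple of h / 2
        return 2 * (kappa + v / sigma**2) * A_hat + 1 / (2 * sigma**2)

    A, A_hat = [0.0] * (n + 1), [0.0] * (n + 1)  # terminal values A(T) = A-hat(T) = 0
    for i in range(n, 0, -1):  # step backward from T to 0
        A[i - 1] = rk4_step(dA, i * h, A[i], -h)
        A_hat[i - 1] = rk4_step(dA_hat, i * h, A_hat[i], -h)
    return v_half[::2], A, A_hat


def check(v0):
    """Return the largest deviation from (5.1) and whether (5.5) holds on the grid."""
    v, A, A_hat = solve(v0)
    gap = max(abs(Ah - Ai / (1 - 2 * vi * Ai)) for vi, Ai, Ah in zip(v, A, A_hat))
    # Slopes of q^f and q^p multiplied by the positive constant sigma^2 * gamma.
    slope_f = [2 * sigma * rho * b * Ai - 1 for Ai in A]
    slope_p = [2 * (sigma * rho * b - vi) * Ah - 1 for vi, Ah in zip(v, A_hat)]
    ordered = all(sf <= sp < 0 for sf, sp in zip(slope_f, slope_p))
    return gap, ordered


for v0 in (0.5, 100.0):
    gap, ordered = check(v0)
    print(f"v(0) = {v0}: max |A-hat - A/(1 - 2vA)| = {gap:.2e}, (5.5) holds on grid: {ordered}")
```

Because $\widehat A$ is computed from its own differential equation rather than from (5.1), a small reported deviation is independent evidence for that relation at the chosen parameter values.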

In the following, we present three numerical examples to further explore the difference between $q^f_t$ and $q^p_t$ . For each example, we choose $b=2.5$ , $\theta=0.4$ , $\eta=0.8$ , $\rho=0.4$ , $\sigma=3$ , $\gamma=1.2$ , $\bar{\mu}=2$ , and $T=2$ .

Example 5.1. In the left panel of Figure 5, we set $a=1$ and $v(0) = 0.5$ . We plot the graphs of $q^f_t$ and $q^p_t$ at time $t=0.5$ as functions of $\mu_t$ and $m_t$ by assuming that $m_t = \mu_t$ at this time. We observe that both $q^f_t$ and $q^p_t$ are linear functions of $\mu_t = m_t$ , as we expect from Theorems 3.2 and 4.2, but the slope of $q^f_t$ is steeper than that of $q^p_t$ , as we expect from Corollary 5.1. From our algebraic work, we also note that $\lim_{t \to T^-} (q^f_t - q^p_t) = 0$ when $\mu_t = m_t$ , which our numerical work (not shown here) confirms.

Next, in the right panel of Figure 5 we enlarge the value of v(0) by setting $v(0) = 100$ , and we plot the graphs of $q^f_t$ and $q^p_t$ at time $t=0$ . We observe that, unlike the full information case, for which the vertical intercept of $q^f_t$ is positive (see Remark 3.2), the intercept of $q^p_t$ as a function of $m_t$ can be negative.

Example 5.2. From Corollary 3.1, we know that the slope of $q^f_t$ increases with t, and the intercept of $q^f_t$ decreases with t. Motivated by this corollary, in this example, we investigate the changes of the slope and the intercept of $q^p_t$ with t. In the left panel of Figure 6 we set $a=1$ and $v(0)=0.5$ . We plot the graph of the slope of $q^p_t$ divided by the ratio $1/(\sigma^2 \gamma)$ as a function of $t \in [0, T]$ . In the right panel of Figure 6 we plot the graph of the intercept of $q^p_t$ divided by the ratio $1/(\sigma^2 \gamma)$ as a function of $t \in [0, T]$ . In these graphs, when v(0) is relatively small, we see that the slope and intercept of $q^p_t$ increase and decrease with t, respectively, as is true for $q^f_t$ .

Figure 6. Left: The slope of $q^p_t$ divided by the ratio $1/(\sigma^2 \gamma)$ when v(0) is small. Right: The intercept of $q^p_t$ divided by the ratio $1/(\sigma^2 \gamma)$ when v(0) is small.

In Figure 7 we plot the slope and intercept of $q^p_t$ divided by the ratio $1/(\sigma^2 \gamma)$ when v(0) is relatively large, that is, $v(0) = 100$ . We find that both the slopes and intercepts of $q^p_t$ are monotonic with time t, but the monotonicity is the opposite of that when v(0) is relatively small, that is, $v(0) = 0.5$ .

Figure 7. Left: The slope of $q^p_t$ divided by the ratio $1/(\sigma^2 \gamma)$ when v(0) is large. Right: The intercept of $q^p_t$ divided by the ratio $1/(\sigma^2 \gamma)$ when v(0) is large.

Example 5.3. In Figure 8 we plot the graph of the optimal proportional retention $q^f_t$ and $q^p_t$ when the parameter a is large. We set $a=3$ and $v(0) = 0.5$ . This graph shows that the values of $q^f_t$ and $q^p_t$ are close to each other when the parameter a is large.

Figure 8. Optimal proportional retention under full and partial information when a is large.

In Figure 9 we plot the slope and intercept of $q^p_t$ divided by the ratio $1/(\sigma^2 \gamma)$ in this case; we see that they are monotonic with respect to t, with the same monotonicity that the slope and intercept of $q^f_t$ possess. Furthermore, when we take a larger value of v(0), such as $v(0) = 5$ , Figure 10 shows that they both lose monotonicity with t.

Figure 9. Left: The slope of $q^p_t$ divided by the ratio $1/(\sigma^2 \gamma)$ when a is large. Right: The intercept of $q^p_t$ divided by the ratio $1/(\sigma^2 \gamma)$ when a is large.

Figure 10. Left: The slope of $q^p_t$ divided by the ratio $1/(\sigma^2 \gamma)$ when both a and v(0) are large. Right: The intercept of $q^p_t$ divided by the ratio $1/(\sigma^2 \gamma)$ when both a and v(0) are large.

Appendix A. Derivation and solution of the Riccati equation (2.6)

First, [Reference Liptser and Shiryaev23, Section 10.2.1] proved that $v(t) = \mathbb E ( (\mu_t - m_t)^2 )$ , i.e. the conditional variance of $\mu_t$ equals the unconditional variance.

Define the process $\{\delta_t\}_{0 \le t \le T}$ by $\delta_t = \mu_t - m_t$ . Then, by (2.1) and (2.4), we have

\begin{equation*} \text{d} \delta_t = - a \delta_t \text{d} t + b \text{d} W_{2, t} - \dfrac{\sigma \rho b - v(t)}{\sigma} \text{d}\bar W_{1,t}.\end{equation*}

Use (2.5) to replace $\text{d}\bar W_{1,t}$ with $\text{d} W_{1,t} - ({\delta_t}/{\sigma}) \text{d} t$ ,

\begin{equation*} \text{d} \delta_t = \delta_t \bigg( {-} a + \dfrac{\sigma \rho b - v(t)}{\sigma^2} \bigg) \text{d} t + b \text{d} W_{2,t} - \dfrac{\sigma \rho b - v(t)}{\sigma} \text{d} W_{1,t} , \end{equation*}

and Itô’s formula gives us

(A.1) \begin{align} \delta^2_t & = \delta^2_0 + 2 \int_0^t \delta_s \bigg( \bigg( - a + \dfrac{\sigma \rho b - v(s)}{\sigma^2} \bigg) \delta_s \, \text{d} s + b \, \text{d} W_{2,s} - \dfrac{\sigma \rho b - v(s)}{\sigma} \, \text{d} W_{1,s} \bigg) \notag \\ & \quad + \int_0^t \bigg(b^2 - 2 \rho b \dfrac{\sigma \rho b - v(s)}{\sigma} + \bigg( \dfrac{\sigma \rho b - v(s)}{\sigma} \bigg)^2 \bigg) \, \text{d} s.\end{align}

Because $W_1$ and $W_2$ are $\mathbb F$ -standard Brownian motions, and because v(t) and $\mathbb E(\delta^2_t)$ are continuous with respect to t and, thus, are bounded on [0, T], we have

\begin{equation*} \mathbb E \bigg( \int_0^t \delta_s \dfrac{\sigma \rho b - v(s)}{\sigma} \, \text{d} W_{1,s} \bigg) = 0 = \mathbb E \bigg( \int_0^t \delta_s b \, \text{d} W_{2,s} \bigg) .\end{equation*}

By taking (unconditional) expectation of both sides of (A.1), we obtain

\begin{align*} v(t) & = v(0) + \int_0^t \bigg(2 \bigg({-} a + \dfrac{\sigma \rho b - v(s)}{\sigma^2} \bigg) v(s) + b^2 - 2 \rho b \dfrac{\sigma \rho b - v(s)}{\sigma} \\ & \qquad \qquad \qquad \quad + \bigg( \dfrac{\sigma \rho b - v(s)}{\sigma} \bigg)^2 \bigg) \, \text{d} s \\ & = v(0) + \int_0^t \bigg( b^2 - 2a v(s) - \bigg( \dfrac{\sigma \rho b - v(s)}{\sigma} \bigg)^2 \bigg) \, \text{d} s,\end{align*}

which gives us the Riccati equation (2.6).

To solve this equation (see [Reference Lakner19, Remark 4.2], which also provides an explicit solution for the constant-coefficient, one-dimensional Riccati equation), first define the function y by

(A.2) \begin{equation} y(t) = \dfrac{\sigma \rho b - v(t)}{\sigma^2};\end{equation}

(2.6) gives us the following Riccati equation for y:

(A.3) \begin{equation} \dfrac{\text{d} y(t)}{\text{d} t} = \dfrac{2 \sigma a \rho b - b^2}{\sigma^2} - 2ay(t) + y^2(t).\end{equation}

Next, define the function u by

\begin{equation*} y(t) = - \dfrac{u^{\prime}(t)}{u(t)},\end{equation*}

or equivalently,

(A.4) \begin{equation} u(t) = \exp\bigg\{{-}\int y(t) \, \text{d} t\bigg\};\end{equation}

then, (A.3) gives us the following second-order ordinary differential equation (ODE) for u:

(A.5) \begin{equation} \dfrac{2 \sigma a \rho b - b^2}{\sigma^2} u(t) + 2a u^{\prime}(t) + u^{\prime\prime}(t) = 0.\end{equation}

The ODE in (A.5) has the general solution $u(t) = A_1 \text{e}^{r_1 t} + A_2 \text{e}^{r_2 t}$ , in which $A_1$ and $A_2$ are constants to be determined, $r_1 = - a + \Delta$ , and $r_2 = - a - \Delta$ , with $\Delta$ given in (2.7). By reversing (A.2) and (A.4), we obtain the following general expression for v:

\begin{equation*} v(t) = \sigma^2 \Delta \cdot \dfrac{A_1 \text{e}^{2\Delta \cdot t} - A_2}{A_1 \text{e}^{2\Delta \cdot t} + A_2} - \sigma(\sigma a - \rho b),\end{equation*}

or equivalently,

\begin{equation*} v(t) = \sigma^2 \Delta \cdot \dfrac{R \text{e}^{2\Delta \cdot t} - 1}{R \text{e}^{2\Delta \cdot t} + 1} - \sigma(\sigma a - \rho b),\end{equation*}

in which $R = A_1/A_2$ . By using the given initial condition v(0), we determine R:

\begin{equation*} v(0) = \sigma^2 \Delta \cdot \dfrac{R - 1}{R + 1} - \sigma(\sigma a - \rho b),\end{equation*}

which gives us R as in (2.8).
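The closed-form solution can be compared with a direct numerical integration of the Riccati equation (2.6). In the sketch below, $\Delta$ is computed from the characteristic roots $r = -a \pm \Delta$ of (A.5), which is assumed to agree with (2.7), and R is obtained by solving the initial condition displayed above; the parameter values are those of the numerical examples in Section 5, and the function names are hypothetical.

```python
import math

# Parameters from the numerical examples in Section 5 (a = 1).
a, b, rho, sigma, T = 1.0, 2.5, 0.4, 3.0, 2.0


def v_closed(t, v0):
    """Closed-form v(t) from Appendix A."""
    c = (2 * sigma * a * rho * b - b**2) / sigma**2   # constant term in (A.3) and (A.5)
    delta = math.sqrt(a**2 - c)                        # from r = -a +/- Delta
    shift = sigma * (sigma * a - rho * b)
    k = (v0 + shift) / (sigma**2 * delta)              # equals (R - 1) / (R + 1)
    R = (1 + k) / (1 - k)                              # undefined if v0 is the stationary value (k = 1)
    e = R * math.exp(2 * delta * t)
    return sigma**2 * delta * (e - 1) / (e + 1) - shift


def v_numeric(t_end, v0, n=4000):
    """Runge-Kutta integration of v' = b^2 - 2 a v - ((sigma rho b - v) / sigma)^2."""
    def rhs(v):
        return b**2 - 2 * a * v - ((sigma * rho * b - v) / sigma) ** 2

    h, v = t_end / n, v0
    for _ in range(n):
        k1 = rhs(v)
        k2 = rhs(v + h * k1 / 2)
        k3 = rhs(v + h * k2 / 2)
        k4 = rhs(v + h * k3)
        v += h * (k1 + 2 * k2 + 2 * k3 + k4) / 6
    return v


for v0 in (0.5, 100.0):
    print(v0, v_closed(T, v0), v_numeric(T, v0))
```

For both initial values used in Section 5, the two columns should agree closely, and `v_closed(0, v0)` returns the prescribed initial variance.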

Appendix B. Proof of Theorem 3.2

From related work with exponential utility (e.g. [Reference Brendle6]), we hypothesize that the value function is of the form

(B.1) \begin{equation} {V^f}(t, x, \mu)= - \text{e}^{-\gamma x} \exp\{A(t)\mu^2 + B(t)\mu + C(t)\}\end{equation}

for some functions of time A, B, and C, with $A(T) = B(T) = C(T) = 0$ . The terminal conditions follow from ${V^f}(T, x, \mu) = - \text{e}^{-\gamma x}$ .

Because ${V^f}$ in (B.1) is concave with respect to x, the first-order necessary condition in (3.1) is sufficient, and we obtain the optimal retention in feedback form as

(B.2) \begin{align} {q^f}(t, x, \mu) & = - \dfrac{( (1 + \eta) \bar{\mu} - \mu_t ) V^f_x(t, x, \mu) + \sigma \rho b V^f_{x \mu}(t, x, \mu)}{\sigma^2 V^f_{xx}(t, x, \mu)} \notag \\ & = \dfrac{1}{\sigma^2 \gamma} \{(2 \sigma b \rho A(t) - 1)\mu_t + (1+\eta)\bar{\mu} + \sigma b \rho B(t)\},\end{align}

in which we abuse notation slightly by using ${q^f}$ to refer both to the optimal retention strategy (as in ${q^f} = \{q^f_t\}_{0 \le t \le T}$ ) and to the deterministic function ${q^f}$ in (B.2). Note that ${q^f}$ in (B.2) is independent of the surplus.
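To illustrate how the second line of (B.2) follows from the first, the sketch below evaluates the ratio of partial derivatives by central finite differences for the ansatz (B.1) and compares it with the closed-form expression. The values of A, B, C, x, and $\mu$ are arbitrary placeholders (assumptions), not solutions of the differential equations derived below; the model parameters are those of the numerical examples in Section 5.

```python
import math

# Model parameters from the numerical examples in Section 5.
sigma, rho, b, eta, mu_bar, gamma = 3.0, 0.4, 2.5, 0.8, 2.0, 1.2
# Placeholder coefficients at a fixed time t (assumptions, for illustration only).
A, B, C = -0.02, 0.3, 0.1


def V(x, mu):
    """Ansatz (B.1) at a fixed time."""
    return -math.exp(-gamma * x) * math.exp(A * mu**2 + B * mu + C)


def q_from_derivatives(x, mu, h=1e-3):
    """First line of (B.2), with derivatives replaced by central differences."""
    V_x = (V(x + h, mu) - V(x - h, mu)) / (2 * h)
    V_xx = (V(x + h, mu) - 2 * V(x, mu) + V(x - h, mu)) / h**2
    V_xmu = (V(x + h, mu + h) - V(x + h, mu - h)
             - V(x - h, mu + h) + V(x - h, mu - h)) / (4 * h**2)
    return -(((1 + eta) * mu_bar - mu) * V_x + sigma * rho * b * V_xmu) / (sigma**2 * V_xx)


def q_closed(mu):
    """Second line of (B.2)."""
    return ((2 * sigma * b * rho * A - 1) * mu + (1 + eta) * mu_bar
            + sigma * b * rho * B) / (sigma**2 * gamma)


print(q_from_derivatives(0.7, 1.5), q_closed(1.5))
```

Changing the surplus argument x in `q_from_derivatives` leaves the result unchanged up to discretization error, in line with the remark that ${q^f}$ is independent of the surplus.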

By substituting (B.1) and (B.2) into (3.1) and rearranging terms, we obtain

\begin{align*} & \bigg\{A^{\prime}(t) + 2b^2 (1 - \rho^2) A^2(t) - \dfrac{2}{\sigma} (\sigma a - \rho b)A(t) - \frac{1}{2\sigma^2}\bigg\} \mu^2 \\& \quad + \bigg\{B^{\prime}(t) + \bigg(2b^2(1-\rho^2)A(t) - \dfrac{1}{\sigma} (\sigma a - \rho b) \bigg) B(t) + 2 \bar{\mu}\bigg(a - \frac{\rho b (1+\eta)}{\sigma}\bigg)A(t) \\ & \qquad + \dfrac{ (1+\eta) \bar{\mu}}{\sigma^2}\bigg\} \mu + \bigg \{C^{\prime}(t) + \dfrac{1}{2} b^2(1-\rho^2) B^2(t) + \bigg(a - \dfrac{\rho b(1+\eta)}{\sigma}\bigg)\bar{\mu} B(t) \\ & \qquad \qquad \qquad + b^2A(t) + (\eta - \theta)\bar{\mu}\gamma-\dfrac{(1+\eta)^2\bar{\mu}^2}{2 \sigma^2}\bigg\} = 0.\end{align*}

Thus, we have the following three differential equations for A, B, and C:

(B.3) \begin{align} A^{\prime}(t) + 2 b^2 (1 - \rho^2) A^2(t) - \dfrac{2}{\sigma} (\sigma a - \rho b)A(t) - \frac{1}{2 \sigma^2} = 0, \end{align}
(B.4) \begin{align} B^{\prime}(t) + \bigg(2b^2(1-\rho^2)A(t) - \dfrac{1}{\sigma} \, (\sigma a - \rho b) \bigg) B(t) + 2 \bar{\mu}\bigg(a - \frac{\rho b (1+\eta)}{\sigma}\bigg)A(t) + \dfrac{ (1+\eta) \bar{\mu}}{\sigma^2} = 0, \end{align}
(B.5) \begin{align} C^{\prime}(t) + \dfrac{1}{2} b^2(1-\rho^2) B^2(t) + \bigg(a - \dfrac{\rho b(1+\eta)}{\sigma}\bigg)\bar{\mu} B(t) + b^2A(t) + (\eta - \theta)\bar{\mu}\gamma-\dfrac{(1+\eta)^2\bar{\mu}^2}{2 \sigma^2} & = 0, \end{align}

with terminal conditions $A(T) = 0$ , $B(T) = 0$ , and $C(T) = 0$ .

Equation (B.3) is a constant-coefficient Riccati equation, which we can solve explicitly by using the same method as in Appendix A, although we are given $A(T) = 0$ instead of v(0); by doing so, we obtain the expression for A in (3.2).

Equation (B.4) is a linear differential equation, which, by substituting for A from (3.2), we can rewrite as

\begin{align*}0 & = \dfrac{\text{d}}{\text{d} t} \Big[ \Big\{ \alpha_1 \text{e}^{\Delta \cdot (T - t)} + \alpha_2 \text{e}^{-\Delta \cdot (T - t)} \Big\} B(t) \Big] \nonumber\\[4pt] & \quad + 2 \bar{\mu} \Big\{ ((1 + \eta) \Delta + \eta a ) \text{e}^{\Delta \cdot (T - t)} + ((1 + \eta) \Delta - \eta a ) \text{e}^{-\Delta \cdot (T - t)} \Big\}.\end{align*}

By integrating this from t to T and using $B(T) = 0$ , we obtain the expression for B in (3.3).

Finally, by integrating (B.5) from t to T and by using $C(T) = 0$ , we obtain the integral representation for C in (3.4).

It remains to show that ${V^f}$ and ${q^f}$ satisfy the conditions of Theorem 3.1. By construction, ${V^f}$ satisfies the HJB equation (3.1) with boundary condition ${V^f}(T, x, \mu) = -\text{e}^{-\gamma x}$ . To show that the retention strategy ${q^f}$ in (3.5) is admissible, we check the three conditions in the definition of admissibility. First, note that ${q^f}$ is adapted to the filtration $\mathbb{F}$ by its definition. Second, ${q^f}$ is conditional $L^2$ -integrable because A(t) and B(t) are bounded functions on [0, T], and $\mu = \{\mu_t \}$ is conditional $L^2$ -integrable. Third, the SDE (2.2) has a pathwise unique solution, which is easy to see because ${q^f}$ is independent of the surplus X. Thus, ${q^f}$ is admissible.

From Theorem 3.1, we deduce that ${V^f}$ and ${q^f}$ as stated in Theorem 3.2 equal the value function and optimal retention strategy, respectively.

Appendix C. Proof of Theorem 4.2

As in Appendix B, we hypothesize that the value function is of the form

(C.1) \begin{equation} {V^p}(t, x, m)= - \text{e}^{-\gamma x} \exp\big\{\widehat A(t)m^2 + \widehat B(t)m + \widehat C(t) \big\} \end{equation}

for some functions of time $\widehat A$ , $\widehat B$ , and $\widehat C$ , with $\widehat A(T) = \widehat B(T) = \widehat C(T) = 0$ . The terminal conditions follow from ${V^p}(T, x, m) = - \text{e}^{-\gamma x}$ .

Because ${V^p}$ in (C.1) is concave with respect to x, the first-order necessary condition in (4.1) is sufficient, and we obtain the optimal retention in feedback form as

(C.2) \begin{align} {q^p}(t, x, m) & = - \dfrac{( (1 + \eta) \bar{\mu} - m ) V^p_x(t, x, m) + (\rho \sigma b - v(t)) V^p_{x m}(t, x, m)} {\sigma^2 V^p_{xx}(t, x, m)} \notag \\ & = \dfrac{1}{\sigma^2 \gamma} \{(2(\sigma \rho b - v(t)) \widehat A(t) - 1 ) m + (\sigma \rho b - v(t) ) \widehat B(t) + (1+\eta) \bar{\mu}\},\end{align}

in which we abuse notation slightly by using ${q^p}$ to refer both to the optimal retention strategy and to the deterministic function ${q^p}$ in (C.2).

By substituting (C.1) and (C.2) into (4.1) and rearranging terms, we obtain

\begin{align*} & \bigg\{\widehat A^{\prime}(t) - \dfrac{2}{\sigma} \bigg((\sigma a - \rho b) + \frac{v(t)}{\sigma}\bigg)\widehat A(t) - \frac{1}{2 \sigma^2}\bigg\} m^2 \\ & + \bigg\{\widehat B^{\prime}(t) - \frac{1}{\sigma} \bigg( (\sigma a - \rho b) + \frac{v(t)}{\sigma} \bigg) \widehat B(t) + 2 \bar{\mu}\bigg(a - (1+\eta) \dfrac{\sigma \rho b - v(t)}{\sigma^2} \bigg)\widehat A(t) \\ & \qquad + \dfrac{(1+\eta) \bar{\mu}}{\sigma^2}\bigg\} m + \bigg\{\widehat C^{\prime}(t) + \bigg(a - (1+\eta) \dfrac{\sigma \rho b - v(t)}{\sigma^2} \bigg) \bar{\mu} \widehat B(t) + \bigg( \rho b - \frac{v(t)}{\sigma}\bigg)^2\widehat A(t) \\ & \qquad \qquad \qquad \qquad \qquad \quad + (\eta - \theta)\bar{\mu} \gamma-\dfrac{(1+\eta)^2\bar{\mu}^2}{2 \sigma^2}\bigg\} = 0 .\end{align*}

Thus, we have the following three differential equations for $\widehat A(t)$ , $\widehat B(t)$ , and $\widehat C(t)$ :

(C.3) \begin{equation} \widehat A^{\prime}(t) - \dfrac{2}{\sigma} \bigg((\sigma a - \rho b) + \frac{v(t)}{\sigma}\bigg)\widehat A(t) - \frac{1}{2 \sigma^2} = 0, \end{equation}
(C.4) \begin{equation} \widehat B^{\prime}(t) - \frac{1}{\sigma} \bigg( (\sigma a - \rho b) + \frac{v(t)}{\sigma} \bigg) \widehat B(t) + 2 \bar{\mu}\bigg(a - (1+\eta) \dfrac{\sigma \rho b - v(t)}{\sigma^2} \bigg)\widehat A(t) + \dfrac{(1+\eta) \bar{\mu}}{\sigma^2} = 0, \end{equation}
(C.5) \begin{align} & \widehat C^{\prime}(t) + \bigg(a - (1+\eta) \dfrac{\sigma \rho b - v(t)}{\sigma^2} \bigg) \bar{\mu} \widehat B(t) + \bigg( \rho b - \frac{v(t)}{\sigma}\bigg)^2\widehat A(t) + (\eta - \theta)\bar{\mu}\gamma\notag \\& \quad - \dfrac{(1+\eta)^2\bar{\mu}^2}{2 \sigma^2} = 0, \end{align}

with terminal conditions $\widehat A(T) = 0$ , $\widehat B(T) = 0$ , and $\widehat C(T) = 0$ .

We can rewrite (C.3) as

\begin{equation*} 0 = \dfrac{\text{d}}{\text{d} t} \bigg[ \dfrac{\widehat A(t)}{(R \text{e}^{\Delta \cdot t} + \text{e}^{-\Delta \cdot t})^2} \bigg] - \dfrac{1}{2 \sigma^2} \cdot \dfrac{1}{(R \text{e}^{\Delta \cdot t} + \text{e}^{-\Delta \cdot t})^2},\end{equation*}

and by integrating this from t to T and using $\widehat A(T) = 0$ , we obtain the expression for $\widehat A$ in (4.2).

Equation (C.4) is a linear differential equation, which, by substituting for $\widehat A$ from (4.2), we rewrite as

\begin{align*} 0 = \dfrac{\text{d}}{\text{d} t} \bigg[ \dfrac{\widehat B(t)}{R \text{e}^{\Delta \cdot t} + \text{e}^{-\Delta \cdot t}} \bigg] + \dfrac{\bar{\mu}}{2 \sigma^2 \Delta} \cdot \dfrac{\text{e}^{\Delta \cdot t}}{R \text{e}^{2\Delta \cdot T} + 1} \big\{ (1 + \eta) \Delta (\text{e}^{2\Delta \cdot (T - t)} + 1) + \eta a (\text{e}^{2\Delta \cdot (T - t)} - 1) \big\} ,\end{align*}

and by integrating this from t to T and using $\widehat B(T) = 0$ , we obtain the expression for $\widehat B$ in (4.3).

Finally, by integrating (C.5) from t to T and using $\widehat C(T) = 0$ , we obtain the integral representation of $\widehat C$ in (4.4).

It remains to show that ${V^p}$ and ${q^p}$ satisfy the conditions of Theorem 4.1. The argument is similar to the one in the proof of Theorem 3.2, so we omit it. Thus, ${V^p}$ and ${q^p}$ as stated in Theorem 4.2 equal the value function and optimal retention strategy, respectively.

Funding information

X. Liang thanks the Research Foundation for Returned Scholars of Hebei Province (C20200102), the Natural Science Foundation of Tianjin (19JCYBJC30400), and the Natural Science Foundation of Hebei Province (A2020202033) for financial support, and V. R. Young thanks the Cecil J. and Ethel M. Nesbitt Professorship of Actuarial Mathematics for financial support.

Competing interests

There were no competing interests to declare which arose during the preparation or publication process of this article.

References

[1] Bäuerle, N. and Leimcke, G. (2021). Robust optimal investment and reinsurance problems with learning. Scand. Actuar. J. 2021, 82–109.
[2] Bäuerle, N. and Rieder, U. (2007). Portfolio optimization with jumps and unobservable intensity process. Math. Finance 17, 205–224.
[3] Bi, J., Liang, Z. and Yuen, K. C. (2019). Optimal mean-variance investment/reinsurance with common shock in a regime-switching market. Math. Meth. Operat. Res. 90, 109–135.
[4] Björk, T., Davis, M. H. A. and Landén, C. (2010). Optimal investment under partial information. Math. Meth. Operat. Res. 71, 371–399. doi:10.1007/s00186-010-0301-x
[5] Brachetta, M. and Ceci, C. (2020). A BSDE-based approach for the optimal reinsurance problem under partial information. Insurance Math. Econom. 95, 1–16.
[6] Brendle, S. (2004). Portfolio selection under partial observation and constant absolute risk aversion. Available at https://ssrn.com/abstract=590362.
[7] Brendle, S. (2006). Portfolio selection under incomplete information. Stoch. Process. Appl. 116, 701–723.
[8] Browne, S. (1995). Optimal investment policies for a firm with a random risk process: Exponential utility and minimizing the probability of ruin. Math. Operat. Res. 20, 937–958.
[9] Chen, L. and Shen, Y. (2018). On a new paradigm of optimal reinsurance: A stochastic Stackelberg differential game between an insurer and a reinsurer. ASTIN Bull. 48, 905–960. doi:10.1017/asb.2018.3
[10] Crandall, M. G., Ishii, H. and Lions, P. L. (1992). User’s guide to viscosity solutions of second-order partial differential equations. Bull. Amer. Math. Soc. 27, 1–67.
[11] Cvitanić, J. and Karatzas, I. (1992). Convex duality in constrained portfolio optimization. Ann. Appl. Prob. 2, 767–818.
[12] Dassios, A. and Jang, J.-W. (2005). Kalman–Bucy filtering for linear systems driven by the Cox process with shot noise intensity and its application to the pricing of reinsurance contracts. J. Appl. Prob. 42, 93–107.
[13] Delong, Ł. (2013). Backward Stochastic Differential Equations with Jumps and Their Actuarial and Financial Applications. Springer, New York.
[14] Gennotte, G. (1986). Optimal portfolio choice under incomplete information. J. Finance 41, 733–749.
[15] Hipp, C. and Taksar, M. (2010). Optimal non-proportional reinsurance control. Insurance Math. Econom. 47, 246–254.
[16] Honda, T. (2003). Optimal portfolio choice for unobservable and regime-switching mean returns. J. Econom. Dynam. Control 28, 45–78.
[17] Irgens, C. and Paulsen, J. (2004). Optimal control of risk exposure, reinsurance and investments for insurance portfolios. Insurance Math. Econom. 35, 21–51.
[18] Lakner, P. (1995). Utility maximization with partial information. Stoch. Process. Appl. 56, 247–273.
[19] Lakner, P. (1998). Optimal trading strategy for an investor: The case of partial information. Stoch. Process. Appl. 76, 77–97.
[20] Li, D., Li, D. and Young, V. R. (2017). Optimality of excess-loss reinsurance under a mean-variance criterion. Insurance Math. Econom. 75, 82–89.
[21] Liang, Z. and Bayraktar, E. (2014). Optimal reinsurance and investment with unobservable claim size and intensity. Insurance Math. Econom. 55, 156–166.
[22] Liang, Z., Yuen, K. C. and Guo, J. (2011). Optimal proportional reinsurance and investment in a stock market with Ornstein–Uhlenbeck process. Insurance Math. Econom. 49, 207–215. doi:10.1016/j.insmatheco.2011.04.005
[23] Liptser, R. S. and Shiryaev, A. N. (1997). Statistics of Random Processes Vol. I (Stoch. Model. Appl. Prob. 5). Springer, New York.
[24] Peng, S. (1992). Stochastic Hamilton–Jacobi–Bellman equations. SIAM J. Control Optimization 30, 284–304.
[25] Peng, X. and Hu, Y. (2013). Optimal proportional reinsurance and investment under partial information. Insurance Math. Econom. 53, 416–428.
[26] Promislow, S. D. and Young, V. R. (2005). Minimizing the probability of ruin when claims follow Brownian motion with drift. N. Amer. Actuar. J. 9, 110–128.
[27] Putschögl, W. and Sass, J. (2011). Optimal investment under dynamic risk constraints and partial information. Quant. Finance 11, 1547–1564.
[28] Schmidli, H. (2002). On minimizing the ruin probability by investment and reinsurance. Ann. Appl. Prob. 12, 890–907.
[29] Sun, Z., Zhang, X. and Yuen, K. C. (2020). Mean-variance asset-liability management with affine diffusion factor process and a reinsurance option. Scand. Actuar. J. 2020, 218–244.
[30] Xiong, J., Xu, Z. and Zheng, J. (2021). Mean-variance portfolio selection under partial information with drift uncertainty. Quant. Finance 21, 1–13.
[31] Yong, J. and Zhou, X. (1999). Stochastic Controls: Hamiltonian Systems and HJB Equations (Stoch. Model. Appl. Prob. 43). Springer, New York.
[32] Zhang, X. and Siu, T. K. (2009). Optimal investment and reinsurance of an insurer with model uncertainty. Insurance Math. Econom. 45, 81–88.