1. Introduction
The embedding problem of Markov chains is a long-standing challenge. It involves investigating whether a given stochastic matrix $ \mathbf{P}$ can be considered as the matrix of transition probabilities from time 0 to time 1 of a continuous-time Markov chain. The problem was first proposed by Elfving [Reference Elfving12] with a first in-depth study by Kingman [Reference Kingman24]. If the Markov chain is homogeneous, this problem boils down to checking whether $ \mathbf{P}$ is the exponential of some matrix $ \mathbf{Q}$ having all non-negative off-diagonal entries and zero row-sums, called a rate matrix. This matrix $ \mathbf{Q}$ then represents the transition rates of the underlying continuous-time homogeneous Markov chain (CTHMC). If $ \exp{\mathbf{Q}} = \mathbf{P}$ for some rate matrix $ \mathbf{Q}$ , it is said that $ \mathbf{P}$ is embeddable and that $ \mathbf{Q}$ is a Markov generator of $ \mathbf{P}$ . For the non-homogeneous case, it has been proved that if $ \mathbf{P}$ belongs to the interior of the set of embeddable matrices, then $ \mathbf{P}$ is equal to a finite product of exponentials of extreme rate matrices, i.e. rate matrices with only one off-diagonal element strictly positive; see [Reference Johansen20]. This characterization is known as a Bang-Bang representation.
The characterizations described above are general and theoretically significant, yet they are difficult to verify in practice. A complete solution to the embedding problem in terms of the matrix elements has yet to be found. Embeddability criteria have been established in the homogeneous case for chains with up to four states; see [Reference Baake and Sumner4], [Reference Carette5], [Reference Casanellas, Fernández-Sánchez and Roca-Lacostena7], and [Reference Johansen21]. In the non-homogeneous case, research has focused on the Bang-Bang representation of embeddable stochastic matrices, leading to more concrete results for the $ 3\times 3$ case; see [Reference Frydman13] and [Reference Johansen and Ramsey22]. In this paper we focus on the homogeneous embedding problem and will therefore omit the word ‘homogeneous’.
Embeddability raises several practical challenges. Even if $ \mathbf{P}$ is confirmed to be embeddable, there may be more than one Markov generator associated with it. Using a CTHMC as a foundational model therefore necessitates decisions on how to handle non-embeddable transition matrices or multiple Markov generators. In the case of non-embeddability, regularization algorithms have been proposed to obtain a rate matrix $ \mathbf{Q}$ that minimizes $ \|\mathbf{P} - \exp{\mathbf{Q}}\|$ ; see [Reference Davies10], [Reference Israel, Rosenthal and Wei17], and [Reference Kreinin and Sidelnikova25]. When there are multiple Markov generators, the practitioner can try to identify and select the rate matrix reflecting the nature of the system under study; see [Reference Singer and Spilerman28].
Rather than accounting for the system’s nature when determining an appropriate Markov generator for $ \mathbf{P}$ , model-specific embedding problems have recently been considered for certain classes of stochastic matrices, leading to further results. For example, the elements of the transition matrix can be limited to certain values reflecting the characteristics of the system under study; see [Reference Ardiyansyah, Kosta and Kubjas1], [Reference Baake and Sumner2], [Reference Baake and Sumner3], [Reference Jia19], and [Reference Roca-Lacostena and Fernández-Sánchez27].
An alternative train of thought is to derive a rate matrix from the empirical transition matrix based on the assumption that there is a maximum of one transition per unit of time. Considering that a discrete-time Markov chain allows only one transition or jump per time step, it seems reasonable to view the underlying continuous-time process through a similar perspective. Furthermore, this one-jump hypothesis has not only theoretical but also practical grounds. In financial risk management, credit rating migration models evaluate how borrowers move between credit rating categories over time. These changes in credit ratings usually occur gradually and infrequently. Similarly, in healthcare, transitions between health states occur slowly over extended periods, resulting in infrequent state changes within short time intervals. In their study of credit risk migration, Jarrow, Lando, and Turnbull (JLT) assume that each firm has made zero or one transition per year; see [Reference Jarrow, Lando and Turnbull18]. Fundamentally, their approach designates the empirical transition probability $ p_{ij}$ to represent the probability of jumping from state i to state j by or before $ t=1$ . This facilitates the formulation of an approximate Markov generator for the empirical transition matrix in closed form.
In this study, we explore the one-jump hypothesis through a data-centric approach. Our only assumption is that the sampled entities or individuals adhere to the one-jump hypothesis; we do not presume any specific number of jumps for the entire population under study. Considering that a CTHMC model allows for several jumps in any given period, this approach appears to be more fitting if the goal is to employ the CTHMC to explore population dynamics. Hence we study a conditional version of the embedding problem by aligning the empirical transition matrix with the conditional transition matrix over a single unit of time of the CTHMC given the event that at most one jump has occurred during a unit length time interval. The main result of this paper is that, regardless of the number of states, precisely one rate matrix solves our conditional embedding problem in the case where the empirical transition matrix has non-zero diagonal elements; see Theorem 3.2. We also describe how to obtain this rate matrix from the empirical transition matrix. In summary, the conditional embedding problem, when considered in a one-jump context, yields a solution that avoids the regularization and identification challenges associated with the original embedding problem.
Our paper is structured as follows. In Section 2 we formalize and define the notion of a conditional one-step probability matrix given an event of interest. We derive an explicit expression for the conditional transition probabilities given the event that the number of jumps between t and $ t+1$ is zero or one. In Section 3 we dive into the conditional one-jump version of the embedding problem, introducing and focusing on the concepts of ${\unicode{x1D7D9}}$ -embeddability and ${\unicode{x1D7D9}}$ -generator. Our key finding in this section asserts that any stochastic matrix of order n with non-zero diagonal elements is ${\unicode{x1D7D9}}$ -embeddable and possesses a unique ${\unicode{x1D7D9}}$ -generator. Section 4 highlights the difference between the two one-jump approaches, leading respectively to the ${\unicode{x1D7D9}}$ -generator and to the rate matrix $ \mathbf{Q}^{\text{JLT}}$ derived by Jarrow et al. [Reference Jarrow, Lando and Turnbull18]. We also show which of these two matrices serves as the better approximate Markov generator of the empirical transition matrix for a number of specific cases. We conclude with a discussion in Section 5. The supporting lemmas and their proofs are gathered in the Appendix.
2. Conditional transition probabilities
To state the conditional embedding problem, we first introduce the concept of conditional transition probability.
Consider a continuous-time homogeneous Markov chain (‘CTHMC’) $ (X_t)_{t\geq 0}$ on a probability triple $ (\Omega,\mathscr{F},\mathbb{P})$ with state space $ \mathcal{S}=\{1,2,\ldots,n\}$ .
Definition 2.1. Let $ t\geq 0$ and $ \mathcal{E}_t\in\mathscr{F}$ . We call the matrix $ \mathbf{P}^{\mathcal{E}_t}(t)$ with elements
$$p^{\mathcal{E}_t}_{ij}(t)\,:\!=\,\mathbb{P}(X_{t+1}=j \mid X_t=i,\ \mathcal{E}_t),\qquad i,j\in\mathcal{S},$$
the conditional transition probability matrix from time t to time $ t+1$ given the event $ \mathcal{E}_t$ of the CTHMC.
Remark 2.1. The (classical) transition probabilities $ p_{ij}(t)=\mathbb{P}(X_{t+1}=j \mid X_t=i)$ can be obtained using $ \mathcal{E}_t \,:\!=\, \Omega$ , i.e. $ p_{ij}(t)=p^{\Omega}_{ij}(t)$ .
In this paper, for given t, the event of interest is the occurrence of at most one jump of the CTHMC in the interval $ (t,t+1]$ , i.e. $ \mathcal{E}_t=\{N_{t+1}-N_{t}\leq 1\}$ , where $ N_s$ represents the number of state changes (‘jumps’) of the CTHMC up to and including time $ s\geq 0$ . For simplicity, we denote $ \mathbf{P}^{\unicode{x1D7D9}}(t)\,:\!=\, \mathbf{P}^{\mathcal{E}_t}(t)$ .
For a CTHMC with rate matrix $ \mathbf{Q}$ , the connection between $ \mathbf{P}^{\unicode{x1D7D9}}(t)=({p}^{\unicode{x1D7D9}}_{ij}(t))$ and $ \mathbf{Q}$ is established by the following proposition.
Proposition 2.1. For a CTHMC with rate matrix $ \mathbf{Q}=(q_{ij})$ , it holds that
$${p}^{\unicode{x1D7D9}}_{ij}(t)=\frac{p^{*}_{ij}}{\sum_{k\in\mathcal{S}} p^{*}_{ik}},\qquad i,j\in\mathcal{S},$$
where
$$p^{*}_{ij}=\begin{cases} q_{ij}\,\tau(q_{ii},q_{jj}), & i\neq j,\\ \tau(q_{ii},q_{ii}), & i=j, \end{cases} \qquad\text{(2.1)}$$
and where the function $ \tau \colon \mathbb{R}^2\to\mathbb{R}$ is defined as
$$\tau(x,y)=\begin{cases} \dfrac{\mathrm{e}^{x}-\mathrm{e}^{y}}{x-y}, & x\neq y,\\ \mathrm{e}^{x}, & x=y. \end{cases} \qquad\text{(2.2)}$$
Proof. Using the definition of conditional probability, we have
whence
Let us denote
By the sum rule for disjoint events, we have
so it remains to be shown that $ p_{ij}^{*}(t)=p^{*}_{ij}$ for all $ i,j\in\mathcal{S}$ , where $ p^{*}_{ij}$ is given by (2.1).
Write $ \mathcal{S}$ as the disjoint union of $ \mathcal{S}^{\text{T}}\,:\!=\, \{i\in\mathcal{S} \colon q_{ii} \lt 0\}$ (the subset of transient states) and $ \mathcal{S}^{\text{A}}\,:\!=\, \{i\in\mathcal{S} \colon q_{ii}=0\}$ (the subset of absorbing states). Let $ Y_t\,:\!=\, \inf\{s \gt 0 \colon X_{t+s}\neq X_t\}$ be the residual time in state $ X_t$ until the first jump after t. In a CTHMC, the holding times are memoryless, hence the conditional distribution of $ Y_t$ , given state $ X_t$ , does not depend on time t. For $ k\in\mathcal{S}^{\text{T}}$ , let $ f_k$ denote the conditional probability density function of $ Y_t$ given $ X_t=k$ and $ F_k$ its associated cumulative distribution function. Lastly, we shall write $ s_{ij}$ to denote the transition probability from state i to state j conditional on transitioning out of state i.
We start with the case $ i,j\in\mathcal{S}^{\text{T}}$ , $ i\neq j$ . Then (2.3) entails
and hence, by marginalizing on $ Y_t$ , we obtain
Since both residual and holding times in a transient state k have an exponential distribution with rate parameter $ -q_{kk}$ and since $ s_{ij}={{q_{ij}}/{-q_{ii}}}$ , we arrive at
where $ p^\ast_{ij}$ and $ \tau$ are given by (2.1) and (2.2).
We now turn to the case $ i\in\mathcal{S}^{\text{T}}$ and $ j\in\mathcal{S}^{\text{A}}$ . Then, using (2.3) and marginalizing on $ Y_t$ , we get
Consequently, $ p^\ast_{ij}(t)=q_{ij}\tau(q_{ii},q_{jj})=p^\ast_{ij}$ , as $ q_{jj}=0$ .
When $ i\in\mathcal{S}^{\text{A}}$ and $ j\in\mathcal{S}$ , $ j\neq i$ , we obviously have $ p^\ast_{ij}(t)=0$ . Hence
as $ q_{ij}=0$ under these conditions.
Finally, if $ i\in\mathcal{S}^{\text{T}}$ , (2.3) yields
and if $ i\in\mathcal{S}^{\text{A}}$ , then
We note that Minin and Suchard [Reference Minin and Suchard26] arrive at the same formula for $ p^\ast_{ij}$ , using a matrix differential equation involving the matrix with elements $ q_{ij}(k,t)$ representing the joint probability of transitioning in k jumps from state i to state j within a time-interval of length t.
According to Proposition 2.1, the matrix $ \mathbf{P}^{\unicode{x1D7D9}}(t)$ does not depend on t. We will therefore write $ \mathbf{P}^{\unicode{x1D7D9}}\,:\!=\, \mathbf{P}^{\unicode{x1D7D9}}(t)$ and refer to it as a conditional one-jump transition matrix.
Corollary 2.1. For all $ i\in\mathcal{S}$ , we have $ {p}^{\unicode{x1D7D9}}_{ii} \gt 0$ .
Proof. This follows from the fact that $ p^{*}_{ii}=\tau(q_{ii},q_{ii})=\mathrm{e}^{q_{ii}} \gt 0$ .
Remark 2.2. Corollary 2.1 can also be justified as follows. The only way to go from state i at time t to state i at time $ t+1$ , given the event $ \{N_{t+1}-N_t\leq 1\}$ , is to stay in that state for the entire time interval. The probability of this event is non-zero due to the exponential distribution governing the holding time in a state.
According to Proposition 2.1, the conditional one-jump transition matrix $ \mathbf{P}^{\unicode{x1D7D9}}$ is fully specified by the rate matrix $ \mathbf{Q}$ of the CTHMC. In what follows, and when needed, we explicitly indicate this dependence using the notation $ \mathbf{P}^{\unicode{x1D7D9}}(\mathbf{Q})$ .
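For illustration, the following Python sketch evaluates $ \mathbf{P}^{\unicode{x1D7D9}}(\mathbf{Q})$ numerically from Proposition 2.1. The code and its function names are ours, not part of the original development; it simply computes (2.1) and (2.2) and normalizes the rows.

```python
import numpy as np

def tau(x, y):
    # Divided difference of exp from (2.2): (e^x - e^y)/(x - y), with tau(x, x) = e^x.
    if np.isclose(x, y):
        return np.exp(x)
    return (np.exp(x) - np.exp(y)) / (x - y)

def conditional_one_jump_matrix(Q):
    # P^1(Q) of Proposition 2.1: p*_ij = q_ij * tau(q_ii, q_jj) for i != j and
    # p*_ii = e^{q_ii}; the conditional matrix is the row-normalization of (p*_ij).
    n = Q.shape[0]
    p_star = np.empty((n, n))
    for i in range(n):
        for j in range(n):
            p_star[i, j] = np.exp(Q[i, i]) if i == j else Q[i, j] * tau(Q[i, i], Q[j, j])
    return p_star / p_star.sum(axis=1, keepdims=True)
```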
3. Conditional embedding problem
The following question then naturally arises. Given a stochastic matrix $ \mathbf{P}$ , is there a rate matrix $ \mathbf{Q}$ such that $ \mathbf{P}^{\unicode{x1D7D9}}(\mathbf{Q})=\mathbf{P}$ ? And if so, is this rate matrix $ \mathbf{Q}$ unique?
It will be helpful to introduce some terminology before proceeding.
Definition 3.1. A stochastic matrix $ \mathbf{P}$ is called ${\unicode{x1D7D9}}$ -embeddable if and only if there exists a CTHMC with rate matrix $ \mathbf{Q}$ satisfying $ \mathbf{P}=\mathbf{P}^{\unicode{x1D7D9}}(\mathbf{Q})$ . This rate matrix is called a ${\unicode{x1D7D9}}$ -generator of $ \mathbf{P}$ .
For a transition matrix $ \mathbf{P}$ that is not embeddable, a ${\unicode{x1D7D9}}$ -generator can be seen as a solution to a conditional version of the embedding problem where the rate matrix $ \mathbf{Q}$ satisfies $ \mathbf{P}^{\unicode{x1D7D9}}(\mathbf{Q})=\mathbf{P}$ . Corollary 2.1 establishes that a ${\unicode{x1D7D9}}$ -embeddable stochastic matrix cannot contain a zero on its main diagonal. Consequently, our subsequent analysis will focus on stochastic matrices $ \mathbf{P}=(p_{ij})$ where $ p_{ii} \gt 0$ for all i. Note that this necessary condition for ${\unicode{x1D7D9}}$ -embeddability is not more restrictive than the necessary embedding condition $ \prod_{i=1}^{n} p_{ii} \geq \det{\mathbf{P}} \gt 0$ as formulated by Goodman in [Reference Goodman16].
Interestingly, the off-diagonal elements of a ${\unicode{x1D7D9}}$ -generator of $ \mathbf{P}$ are uniquely determined by its diagonal elements and the elements of $ \mathbf{P}$ . To express this relationship, we introduce the function $ \rho \colon \mathbb{R}_+^2\to\mathbb{R}_+$ , defined as follows:
$$\rho(x,y)\,:\!=\,\begin{cases} \dfrac{xy(\ln{x}-\ln{y})}{x-y}, & x\neq y,\\[4pt] x, & x=y. \end{cases} \qquad\text{(3.1)}$$
Proposition 3.1. Suppose $ \mathbf{P}=(p_{ij})$ is an $ n\times n$ stochastic matrix that satisfies $ p_{ii} \gt 0$ for all i. If $ \mathbf{Q}=(q_{ij})$ is a ${\unicode{x1D7D9}}$ -generator of $ \mathbf{P}$ , then
$$q_{ij}=\frac{p_{ij}\,\rho(\theta_i,\theta_j)}{p_{ii}\,\theta_i}\qquad (i\neq j),$$
where $ \theta_i=\mathrm{e}^{1-q_{ii}}$ for all i and the function $ \rho \colon \mathbb{R}_+^2\to\mathbb{R}_+$ is given by (3.1).
Proof. Suppose $ \mathbf{Q}$ is a ${\unicode{x1D7D9}}$ -generator of $ \mathbf{P}=(p_{ij})$ and let $ i\neq j$ . Then, according to Proposition 2.1, we have
$$\frac{p_{ij}}{p_{ii}}=\frac{p^{*}_{ij}}{p^{*}_{ii}}=\frac{q_{ij}\,\tau(q_{ii},q_{jj})}{\tau(q_{ii},q_{ii})}.$$
Consequently, since $ \tau(q_{ii},q_{ii})=\mathrm{e}^{q_{ii}}=\mathrm{e}/\theta_i$ and $ \tau(q_{ii},q_{jj})=\tau(1-\ln{\theta_i},1-\ln{\theta_j})=\mathrm{e}/\rho(\theta_i,\theta_j)$ , we get
$$q_{ij}=\frac{p_{ij}}{p_{ii}}\cdot\frac{\mathrm{e}/\theta_i}{\mathrm{e}/\rho(\theta_i,\theta_j)}=\frac{p_{ij}\,\rho(\theta_i,\theta_j)}{p_{ii}\,\theta_i}.$$
The following proposition provides a requirement concerning the diagonal elements of any ${\unicode{x1D7D9}}$ -generator of $ \mathbf{P}$ .
Proposition 3.2. Let $ \mathbf{P}=(p_{ij})$ be an $ n\times n$ stochastic matrix that satisfies $ p_{ii} \gt 0$ for all i. If $ \mathbf{Q}=(q_{ij})$ is a ${\unicode{x1D7D9}}$ -generator of $ \mathbf{P}$ , then the n-tuple $ (\mathrm{e}^{1-q_{11}},\ldots,\mathrm{e}^{1-q_{nn}})$ is a fixed point of the vector function $ \mathbf{T}=(T_1,\ldots,T_n) \colon \mathbb{R}_+^{n}\to\mathbb{R}_+^{n}$ , which is defined as
$$T_i(\mathbf{x})\,:\!=\,\exp\Bigg(W_0\Bigg(\frac{1}{p_{ii}}\sum_{j\in\mathcal{S}}p_{ij}\,\rho(x_i,x_j)\Bigg)\Bigg),\qquad i\in\mathcal{S}, \qquad\text{(3.2)}$$
where $ W_0$ represents the principal branch of the Lambert W function, and where the function $ \rho \colon \mathbb{R}_+^2\to\mathbb{R}_+$ is defined as in (3.1).
Proof. Denote $ \theta_i=\mathrm{e}^{1-q_{ii}}$ for all $ i\in\mathcal{S}$ . Then $ \theta_i \gt 0$ and $ q_{ii}=1-\ln{\theta_i}$ for all i. Using Proposition 3.1 and the fact that $ \mathbf{Q}$ is a rate matrix, we then have
$$\ln{\theta_i}-1=-q_{ii}=\sum_{j\in\mathcal{S}\setminus\{i\}}q_{ij}=\frac{1}{p_{ii}\,\theta_i}\sum_{j\in\mathcal{S}\setminus\{i\}}p_{ij}\,\rho(\theta_i,\theta_j),$$
which can be rewritten, using the fact that $ \rho(\theta_i,\theta_i)=\theta_i$ , as
$$\theta_i\ln{\theta_i}=\frac{1}{p_{ii}}\sum_{j\in\mathcal{S}}p_{ij}\,\rho(\theta_i,\theta_j). \qquad\text{(3.3)}$$
Using the principal branch $ W_0$ of the Lambert W function (which is the multi-valued inverse of the function $ w\mapsto w\mathrm{e}^{w}$ ( $ w\in\mathbb{C}$ ); see [Reference Corless, Gonnet, Hare, Jeffrey and Knuth9]), we find that
$$\ln{\theta_i}=W_0\Bigg(\frac{1}{p_{ii}}\sum_{j\in\mathcal{S}}p_{ij}\,\rho(\theta_i,\theta_j)\Bigg),\qquad\text{i.e.}\qquad \theta_i=T_i(\theta_1,\ldots,\theta_n),$$
which proves the result.
Propositions 3.1 and 3.2 imply that a ${\unicode{x1D7D9}}$ -generator of $ \mathbf{P}$ defines a fixed point of the vector function $ \mathbf{T}$ . The converse is also true, as stated in the following proposition.
Proposition 3.3. Suppose $ \mathbf{P}=(p_{ij})$ is a stochastic matrix that satisfies $ p_{ii} \gt 0$ for all $ i\in\mathcal{S}$ . Let $ \boldsymbol{\theta}=(\theta_1,\ldots,\theta_n)$ be a fixed point of the vector function $ \mathbf{T} \colon \mathbb{R}_+^n\to\mathbb{R}_+^n$ , defined in (3.2). Then the matrix $ \mathbf{Q}=(q_{ij})$ with elements
$$q_{ii}=1-\ln{\theta_i},\qquad q_{ij}=\frac{p_{ij}\,\rho(\theta_i,\theta_j)}{p_{ii}\,\theta_i}\ (i\neq j), \qquad\text{(3.4)}$$
where $ \rho$ is defined by (3.1), is a ${\unicode{x1D7D9}}$ -generator of $ \mathbf{P}$ .
Proof. Let $ \boldsymbol{\theta}=(\theta_1,\theta_2,\ldots,\theta_n)\in\mathbb{R}_+^n$ be a fixed point of $ \mathbf{T}$ and let the matrix $ \mathbf{Q}$ be defined by (3.4). We first show that $ \mathbf{Q}$ is a rate matrix. By (3.4), all off-diagonal elements of $ \mathbf{Q}$ are non-negative. Since $ T_i(\boldsymbol{\theta})=\theta_i$ , we have
$$\theta_i=\exp\Bigg(W_0\Bigg(\frac{1}{p_{ii}}\sum_{j\in\mathcal{S}}p_{ij}\,\rho(\theta_i,\theta_j)\Bigg)\Bigg),$$
yielding
$$\theta_i\ln{\theta_i}=\frac{1}{p_{ii}}\sum_{j\in\mathcal{S}}p_{ij}\,\rho(\theta_i,\theta_j)$$
by definition of the Lambert $ W_0$ function. Using (3.4) and since $ \rho(\theta_i,\theta_i)=\theta_i$ , we can rewrite this equation as
$$\theta_i\ln{\theta_i}=\theta_i\Bigg(1+\sum_{j\in\mathcal{S}\setminus\{i\}}q_{ij}\Bigg).$$
After simplification, we get $ q_{ii}=-\sum_{j\in\mathcal{S}\setminus\{i\}}q_{ij}$ . Thus $ \mathbf{Q}$ has zero row sums. Consequently, $ \mathbf{Q}$ is a rate matrix.
It remains to be proved that $ {p}^{\unicode{x1D7D9}}_{ij}(\mathbf{Q})=p_{ij}$ for all i and j. Let $ p^\ast_{ij}$ be defined by (2.1) and (2.2). Using (3.4) and (3.1), we have
$$p^\ast_{ii}=\mathrm{e}^{q_{ii}}=\mathrm{e}^{1-\ln{\theta_i}}=\frac{\mathrm{e}\,p_{ii}}{p_{ii}\,\theta_i}$$
and, for $ j\neq i$ ,
$$p^\ast_{ij}=q_{ij}\,\tau(q_{ii},q_{jj})=\frac{p_{ij}\,\rho(\theta_i,\theta_j)}{p_{ii}\,\theta_i}\cdot\frac{\mathrm{e}}{\rho(\theta_i,\theta_j)}=\frac{\mathrm{e}\,p_{ij}}{p_{ii}\,\theta_i},$$
hence
$$\sum_{k\in\mathcal{S}}p^\ast_{ik}=\frac{\mathrm{e}}{p_{ii}\,\theta_i}\sum_{k\in\mathcal{S}}p_{ik}=\frac{\mathrm{e}}{p_{ii}\,\theta_i}.$$
Consequently, by Proposition 2.1 and since $ \mathbf{P}$ is a stochastic matrix, we now have
$${p}^{\unicode{x1D7D9}}_{ij}(\mathbf{Q})=\frac{p^\ast_{ij}}{\sum_{k\in\mathcal{S}}p^\ast_{ik}}=p_{ij},$$
which concludes the proof.
By combining Propositions 3.1, 3.2, and 3.3, we establish a one-to-one relationship between the ${\unicode{x1D7D9}}$ -generators of $ \mathbf{P}$ and the fixed points of the vector function $ \mathbf{T}$ . The subsequent lemma describes the properties of the vector function $ \mathbf{T}$ , which are crucial to determining its number of fixed points.
Lemma 3.1. Let $ \mathbf{P}=(p_{ij})$ be an $ n\times n$ stochastic matrix. Let $ \delta=\min\{p_{ii},\,i\in\mathcal{S}\}$ and $ \Delta=\max\{p_{ii},\,i\in\mathcal{S}\}$ . Suppose $ p_{ii} \gt 0$ for all i. Consider the vector function $ \mathbf{T} \colon \mathbb{R}_+^n\to\mathbb{R}_+^n$ , defined as in (3.2), and the set
$$\mathcal{X}\,:\!=\,\{(x_1,\ldots,x_n)\in\mathbb{R}_+^n \colon \mathrm{e}^{1/\Delta}\leq x_i\leq \mathrm{e}^{1/\delta}\ \text{for all } i\in\mathcal{S}\}. \qquad\text{(3.5)}$$
Then
(i) every fixed point of $ \mathbf{T}$ belongs to $ \mathcal{X}$ .
(ii) $ \mathbf{T}$ maps $ \mathcal{X}$ into $ \mathcal{X}$ .
Proof. (i) Let $ \boldsymbol{\theta}=(\theta_1,\ldots,\theta_n)\in\mathbb{R}_+^n$ be a fixed point of $ \mathbf{T}$ . Let $ m=\min\{\theta_1,\ldots,\theta_n\}$ and $ M=\max\{\theta_1,\ldots,\theta_n\}$ . We shall prove that $ m\geq\mathrm{e}^{1/\Delta}$ and that $ M\leq\mathrm{e}^{1/\delta}$ .
Let r be an index such that $ \theta_r=m$ . Then, by Lemma A.2(iv), we have $ \rho(\theta_r,\theta_j)\geq m$ for all j. Since $ T_r(\boldsymbol{\theta})=\theta_r$ , we have
$$\theta_r=\exp\Bigg(W_0\Bigg(\frac{1}{p_{rr}}\sum_{j\in\mathcal{S}}p_{rj}\,\rho(\theta_r,\theta_j)\Bigg)\Bigg),$$
yielding
$$m\ln{m}=\frac{1}{p_{rr}}\sum_{j\in\mathcal{S}}p_{rj}\,\rho(\theta_r,\theta_j)$$
by definition of the Lambert $ W_0$ function. Using the fact that $ \sum_{j\in\mathcal{S}} p_{rj}=1$ , we then obtain
$$m\ln{m}\geq \frac{m}{p_{rr}}\geq\frac{m}{\Delta},$$
which implies $ \ln{m}\geq{{1}/{\Delta}}$ and thus $ m\geq \mathrm{e}^{1/\Delta}$ . To prove the second inequality, let s be an index such that $ \theta_s=M$ . Then, by Lemma A.2(iv), we have $ \rho(\theta_s,\theta_j)\leq M$ for all j. Hence, by $ T_s(\boldsymbol{\theta})=\theta_s$ and the unit row sum property of $ \mathbf{P}$ ,
$$M\ln{M}\leq \frac{M}{p_{ss}}\leq\frac{M}{\delta},$$
which yields $ \ln{M}\leq {{1}/{\delta}}$ and thus $ M\leq\mathrm{e}^{1/\delta}$ .
(ii) Let $ (x_1,\ldots,x_n)\in \mathcal{X}$ . By Lemma A.2(iv),
$$\mathrm{e}^{1/\Delta}\leq\rho(x_i,x_j)\leq \mathrm{e}^{1/\delta}\qquad\text{for all } i,j\in\mathcal{S}.$$
Then, since $ \mathbf{P}$ has unit row sums, we have for all i
$$\frac{1}{\Delta}\,\mathrm{e}^{1/\Delta}\leq \frac{1}{p_{ii}}\sum_{j\in\mathcal{S}}p_{ij}\,\rho(x_i,x_j)\leq \frac{1}{\delta}\,\mathrm{e}^{1/\delta}.$$
Now, $ W_0$ and $ \exp$ are increasing functions, and therefore
$$\exp\big(W_0\big(\tfrac{1}{\Delta}\mathrm{e}^{1/\Delta}\big)\big)\leq T_i(x_1,\ldots,x_n)\leq \exp\big(W_0\big(\tfrac{1}{\delta}\mathrm{e}^{1/\delta}\big)\big).$$
The result now follows by applying the property $ W_0(x\mathrm{e}^x)=x$ for $ x \gt 0$ .
Lemma 3.1 implies that the maximal and minimal diagonal elements of $ \mathbf{P}$ impose lower and upper bounds on the diagonal elements of the ${\unicode{x1D7D9}}$ -generators of $ \mathbf{P}$ .
Proposition 3.4. Let $ \mathbf{P}=(p_{ij})$ be an $ n\times n$ stochastic matrix. Let $ \Delta=\max\{p_{ii},\,i\in\mathcal{S}\}$ and $ \delta=\min\{p_{ii},\,i\in\mathcal{S}\}$ . Suppose $ p_{ii} \gt 0$ for all i. Then, if $ \mathbf{Q}=(q_{ij})$ is a ${\unicode{x1D7D9}}$ -generator of $ \mathbf{P}$ , we have
$$1-\frac{1}{\delta}\leq q_{ii}\leq 1-\frac{1}{\Delta}\qquad\text{for all } i\in\mathcal{S}.$$
Proof. If $ \mathbf{Q}=(q_{ij})$ is a ${\unicode{x1D7D9}}$ -generator of $ \mathbf{P}$ , we have by Proposition 3.2 that the vector $ (\theta_1,\ldots,\theta_n)$ , where $ \theta_i=\mathrm{e}^{1-q_{ii}}$ for all i, is a fixed point of $ \mathbf{T}$ . Applying Lemma 3.1(i), we have $ \mathrm{e}^{1/\Delta}\leq\theta_i\leq\mathrm{e}^{1/\delta}$ for all i, from which the result now follows.
With regard to the number of fixed points of $ \mathbf{T}$ , we now prove the following important result.
Theorem 3.1. Suppose $ \mathbf{P}=(p_{ij})$ is an $ n\times n$ stochastic matrix that satisfies $ p_{ii} \gt 0$ for all i. Then the vector function $ \mathbf{T} \colon \mathbb{R}_+^n\to \mathbb{R}_+^n$ , defined as in (3.2), has a unique fixed point.
Proof. According to Lemma 3.1(ii), $ \mathbf{T}$ maps the compact convex set $ \mathcal{X}\subset\mathbb{R}_+^n$ , defined by (3.5), into itself. In addition, $ \mathbf{T}$ is continuous as the function $ \rho$ , defined by (3.1), is continuous (Lemma A.2(ii)) and continuity is preserved by linear combination and composition of continuous functions. Hence, according to the Brouwer fixed point theorem, $ \mathbf{T}$ has a fixed point. By the definition of $ \mathbf{T}$ , this fixed point must have all positive components. We now show that the function $ \mathbf{g} \colon \mathbb{R}_+^n\to\mathbb{R}^n$ defined as $ \mathbf{g}=\mathbf{T}-\mathbf{Id}$ , where $ \mathbf{Id} \colon \mathbb{R}_+^n\to\mathbb{R}_+^n$ is the identity mapping, satisfies all conditions of Theorem 3.1 in [Reference Kennan23]. This theorem presents sufficient conditions for the function $ \mathbf{g}$ to have at most one vector $ \mathbf{x}\in\mathbb{R}_+^n$ with $ \mathbf{g}(\mathbf{x})=(0,\ldots,0)$ . These conditions are that (a) $ \mathbf{g}$ is quasi-increasing and (b) $ \mathbf{g}$ is strictly R-concave. Both (a) and (b) are proved in the Appendix; see Lemma A.4. So, we have established that $ \mathbf{T}$ has precisely one fixed point.
We are now ready to state and prove our main result in the following theorem.
Theorem 3.2. Suppose $ \mathbf{P}=(p_{ij})$ is an $ n\times n$ stochastic matrix that satisfies $ p_{ii} \gt 0$ for all i. Then $ \mathbf{P}$ has precisely one ${\unicode{x1D7D9}}$ -generator. Moreover, this ${\unicode{x1D7D9}}$ -generator $ \mathbf{Q}=(q_{ij})$ is given by
$$q_{ii}=1-\ln{\theta_i},\qquad q_{ij}=\frac{p_{ij}\,\rho(\theta_i,\theta_j)}{p_{ii}\,\theta_i}\ (i\neq j), \qquad\text{(3.6)}$$
where the scalar function $ \rho \colon \mathbb{R}_+^2\to\mathbb{R}_+$ is defined by (3.1), and $ (\theta_1,\ldots,\theta_n)$ is the unique fixed point of the vector function $ \mathbf{T} \colon \mathbb{R}_+^n\to\mathbb{R}_+^n$ defined by (3.2).
Proof. We first prove that $ \mathbf{P}$ has a ${\unicode{x1D7D9}}$ -generator. By Theorem 3.1, the vector function $ \mathbf{T}$ has a unique fixed point $ \boldsymbol{\theta}=(\theta_1,\theta_2,\ldots,\theta_n)\in\mathbb{R}_+^n$ . Starting from $ \boldsymbol{\theta}$ , construct the matrix $ \mathbf{Q}=(q_{ij})$ according to (3.6). Then $ \mathbf{Q}$ is a ${\unicode{x1D7D9}}$ -generator of $ \mathbf{P}$ , by Proposition 3.3.
To prove the uniqueness of the ${\unicode{x1D7D9}}$ -generator, suppose that $ \mathbf{P}$ has ${\unicode{x1D7D9}}$ -generators $ \mathbf{R}=(r_{ij})$ and $ \mathbf{S}=(s_{ij})$ . Then, by Proposition 3.2, the vectors $ {\theta}_{\mathbf{R}}=(\mathrm{e}^{1-r_{11}},\ldots,\mathrm{e}^{1-r_{nn}})$ and $ {\theta}_{\mathbf{S}} = ({\mathrm{e}}^{1-s_{11}},\ldots,{\mathrm{e}}^{1-s_{nn}})$ are fixed points of the vector function $ \mathbf{T}$ . By Theorem 3.1, $ {\theta}_{\mathbf{R}}={\theta}_{\mathbf{S}}$ . Hence, by Proposition 3.1, we must have $ \mathbf{R}=\mathbf{S}$ .
Finally, the requirement that the elements of a ${\unicode{x1D7D9}}$ -generator comply with (3.6) is inferred from Propositions 3.1 and 3.2. The proof is now complete.
Since $ p_{ii} \gt 0$ for all i is a necessary condition for embeddability (see [Reference Goodman16]), Theorem 3.2 immediately establishes a link between embeddability and ${\unicode{x1D7D9}}$ -embeddability.
Corollary 3.1. An embeddable stochastic matrix is ${\unicode{x1D7D9}}$ -embeddable.
According to Theorem 3.2, the unique ${\unicode{x1D7D9}}$ -generator is entirely characterized by the fixed point of the function $ \mathbf{T} \colon \mathcal{X}\to\mathcal{X}$ . When $ \mathbf{T}$ acts as a contraction on $ \mathcal{X}$ , the Banach fixed point theorem ensures the convergence of the fixed-point iteration method, which can then be used to approximate $ (\theta_1,\ldots,\theta_n)$ . In the case when not all diagonal elements of $ \mathbf{P}$ are equal, Proposition 3.5 provides a Lipschitz constant for $ \mathbf{T}$ with respect to the infinity vector norm; when this constant is less than 1, $ \mathbf{T}$ is a contraction. The constant is based on the largest and smallest diagonal elements of the matrix $ \mathbf{P}$ .
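As an illustrative sketch (ours, not part of the original paper), the fixed-point iteration together with the construction (3.6) can be coded as follows; the routine assumes the iteration converges, which the contraction condition of Proposition 3.5 guarantees only when the Lipschitz constant is below one.

```python
import numpy as np
from scipy.special import lambertw

def rho(x, y):
    # The function rho of (3.1): xy(ln x - ln y)/(x - y), with rho(x, x) = x.
    if np.isclose(x, y):
        return x
    return x * y * (np.log(x) - np.log(y)) / (x - y)

def T(theta, P):
    # The vector function T of (3.2), using the principal branch W_0.
    n = len(theta)
    F = np.array([sum(P[i, j] * rho(theta[i], theta[j]) for j in range(n)) / P[i, i]
                  for i in range(n)])
    return np.exp(lambertw(F).real)

def one_generator(P, tol=1e-12, max_iter=10_000):
    # Unique 1-generator of P (Theorem 3.2) via fixed-point iteration on T.
    n = P.shape[0]
    theta = np.full(n, np.exp(1.0 / P.diagonal().max()))  # a point of the set X in (3.5)
    for _ in range(max_iter):
        theta_new = T(theta, P)
        if np.max(np.abs(theta_new - theta)) < tol:
            theta = theta_new
            break
        theta = theta_new
    # Assemble Q from theta according to (3.6).
    return np.array([[1.0 - np.log(theta[i]) if i == j
                      else P[i, j] * rho(theta[i], theta[j]) / (P[i, i] * theta[i])
                      for j in range(n)] for i in range(n)])
```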
Proposition 3.5. Let $ \mathbf{P}=(p_{ij})$ be an $ n\times n$ stochastic matrix such that $ 0 \lt \delta \lt \Delta$ , where $ \delta=\min\{p_{ii},\,i\in\mathcal{S}\}$ and $ \Delta=\max\{p_{ii},\,i\in\mathcal{S}\}$ . Let the set $ \mathcal{X}$ be defined by (3.5) and the function $ \mathbf{T}$ as in (3.2). Then, with respect to the infinity norm, the function $ \mathbf{T}$ is Lipschitz-continuous on $ \mathcal{X}$ with Lipschitz constant
where
Proof. Using Lemmas A.6 and A.7, we can prove that
where
For more details we refer the reader to the Appendix.
If all diagonal elements of $ \mathbf{P}$ are identical, there is a closed-form expression for the unique ${\unicode{x1D7D9}}$ -generator, which is detailed in the following proposition. Such matrices occur as substitution matrices in Markov models for DNA evolution, among others. See [Reference Casanellas, Fernández-Sánchez and Roca-Lacostena6] for a discussion of the embeddability of such matrices in the context of the so-called JC69, K80, and K81 models.
Proposition 3.6. Suppose $ \mathbf{P}=(p_{ij})$ is an $ n\times n$ stochastic matrix satisfying $ p_{ii}=p \gt 0$ for all i. Then its unique ${\unicode{x1D7D9}}$ -generator $ \mathbf{Q}$ is given by $ \mathbf{Q}=\tfrac{1}{p}(\mathbf{P}-\mathbf{I})$ , where $ \mathbf{I}$ is the $ n\times n$ identity matrix.
Proof. Let $ \mathbf{Q}=(q_{ij})$ be a ${\unicode{x1D7D9}}$ -generator of $ \mathbf{P}$ . It follows from Proposition 3.4 that $ q_{ii}=1-1/p$ for all i, and hence $ \theta_i=\mathrm{e}^{1-q_{ii}}=\mathrm{e}^{1/p}$ for all i. Moreover, if $ i\neq j$ , Theorem 3.2 and equation (3.1) imply that
$$q_{ij}=\frac{p_{ij}\,\rho(\theta_i,\theta_j)}{p\,\theta_i}=\frac{p_{ij}\,\theta_i}{p\,\theta_i}=\frac{p_{ij}}{p}.$$
In summary, we have
$$q_{ii}=\frac{p-1}{p},\qquad q_{ij}=\frac{p_{ij}}{p}\ (i\neq j),\qquad\text{i.e.}\qquad \mathbf{Q}=\frac{1}{p}(\mathbf{P}-\mathbf{I}),$$
concluding the proof.
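Combining the two sketches above, Proposition 3.6 can be checked numerically; the $ 4\times 4$ matrix and the value $ p=0.7$ below are illustrative choices of ours.

```python
import numpy as np

p, n = 0.7, 4
P = np.full((n, n), (1 - p) / (n - 1))   # equal off-diagonal entries
np.fill_diagonal(P, p)                   # constant diagonal p

Q = (P - np.eye(n)) / p                  # closed form of Proposition 3.6
assert np.allclose(conditional_one_jump_matrix(Q), P)  # Q is the 1-generator of P
assert np.allclose(one_generator(P), Q)                # fixed-point iteration agrees
```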
4. Comparison of alternative one-jump approaches and illustrations
Consider a continuous-time Markov chain that is discretely observed at times $ t=0, 1, 2, \ldots , T$ ( $ T\geq 2$ ). Let $ \mathbf{P}$ be the empirical transition matrix estimated from these observations. If $ \mathbf{P}$ is not embeddable, there is no Markov generator of $ \mathbf{P}$ , making it a challenge to find an approximate generator. However, if one assumes that the likelihood of more than one transition occurring between two consecutive observations is very low – a scenario referred to as a one-jump setting – it is possible to find a unique rate matrix serving as an approximate generator of $ \mathbf{P}$ .
Let us adopt the notation $ \mathbf{Q}^{\text{JLT}}$ , employed in [Reference Israel, Rosenthal and Wei17], to denote the one-jump rate matrix of Jarrow et al. [Reference Jarrow, Lando and Turnbull18]. Furthermore, we write $ \mathbf{Q}^{\unicode{x1D7D9}}$ for the ${\unicode{x1D7D9}}$ -generator of $ \mathbf{P}$ . In this section we discuss the specifics of both approaches and compare the resulting rate matrices $ \mathbf{Q}^{\text{JLT}}$ and $ \mathbf{Q}^{\unicode{x1D7D9}}$ as approximate generators of $ \mathbf{P}$ . To effectively represent the distinction between the two approaches, we use the recursively defined jump times $ \tau_0\,:\!=\, 0$ and $ \tau_k=\inf\{t\geq \tau_{k-1} \colon X_t\neq X_{\tau_{k-1}}\}$ ( $ k=1,2,\ldots$ ) of the associated CTHMC $ (X_t)_{t\geq 0}$ .
In their interpretation of the one-jump scenario, Jarrow et al. [Reference Jarrow, Lando and Turnbull18] deduce the rate matrix $ \mathbf{Q}^{\text{JLT}}=(q_{ij}^{\text{JLT}})$ directly from $ \mathbf{P}=(p_{ij})$ through
$$q_{ii}^{\text{JLT}}=\ln{p_{ii}},\qquad q_{ij}^{\text{JLT}}=\frac{p_{ij}\ln{p_{ii}}}{p_{ii}-1}\ (i\neq j), \qquad\text{(4.1)}$$
assuming $ 0 \lt p_{ii}$ for all i. From a probabilistic perspective considering jump times, the elements $ q_{ij}^{\text{JLT}}$ are defined as the solutions to the equations
$$\mathbb{P}(\tau_1 \gt 1\mid X_0=i)=p_{ii},\qquad \mathbb{P}(\tau_1\leq 1,\ X_{\tau_1}=j\mid X_0=i)=p_{ij}\ (i\neq j). \qquad\text{(4.2)}$$
Essentially, their one-jump approach designates $ p_{ij}$ to represent the probability of jumping from state i to state j at or before time $ t=1$ .
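In code, (4.1) can be written as follows (our sketch; it assumes $ 0 \lt p_{ii} \lt 1$ and treats an absorbing row with $ p_{ii}=1$ as the zero-rate limit):

```python
import numpy as np

def jlt_generator(P):
    # Q^JLT of (4.1): q_ii = ln(p_ii) and q_ij = p_ij * ln(p_ii)/(p_ii - 1) for i != j.
    d = P.diagonal()
    with np.errstate(divide="ignore", invalid="ignore"):
        scale = np.where(d < 1.0, np.log(d) / (d - 1.0), 0.0)  # zero rates if p_ii = 1
    Q = P * scale[:, None]
    np.fill_diagonal(Q, np.log(d))
    return Q
```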
In this paper we derive the ${\unicode{x1D7D9}}$ -generator via an alternative approach, as expressed by
$$\mathbb{P}(X_{1}=j\mid X_0=i,\ \tau_2 \gt 1)=p_{ij},\qquad i,j\in\mathcal{S}, \qquad\text{(4.3)}$$
which implies that $ p_{ij}$ represents the likelihood of going from state i at $ t=0$ to state j at $ t=1$ via at most one jump before or at $ t=1$ . Hence the difference between the methods (4.2) and (4.3) depends on how the empirical transition probability matrix $ \mathbf{P}$ is viewed in the scenario of a single jump. Both approaches have the merit of bypassing the identification and regularization stages (see [Reference Davies10], [Reference Israel, Rosenthal and Wei17], and [Reference Kreinin and Sidelnikova25]) associated with the embedding problem.
In regularization problems where the transition matrix $ \mathbf{P}$ is not embeddable, the goal is to find an approximate Markov generator $ \mathbf{Q}$ for $ \mathbf{P}$ by minimizing the discrepancy between $ \exp{\mathbf{Q}}$ and $ \mathbf{P}$ . The magnitude of this difference $ \|\mathbf{P} - \exp{\mathbf{Q}}\|$ depends on the norm used. Davies [Reference Davies10] explored the use of the infinity norm in addressing the regularization problem. For an $ n\times n$ matrix $ \mathbf{M}=(m_{ij})$ , the infinity norm is given by $ \|\mathbf{M}\|_{\infty} = \max_{1 \leq i \leq n} \sum_{j=1}^{n} |m_{ij}|$ . This norm proves useful in comparing transition matrices, since both $ \mathbf{P}$ and $ \exp{\mathbf{Q}}$ consist of rows that are stochastic vectors. The Manhattan or Taxicab distance between the corresponding rows, the ith rows in particular, i.e. $ \sum_{j=1}^{n} |\mathbf{P}_{ij} - (\exp{\mathbf{Q}})_{ij}|$ , is a simple indicator of the discrepancy between the transition probability distributions from state i. Consequently, the maximum absolute row sum of $ \mathbf{P} - \exp{\mathbf{Q}}$ offers a distinct measure to evaluate the difference between $ \mathbf{P}$ and $ \exp{\mathbf{Q}}$ . The infinity norm will be used consistently in the remainder of this paper.
An interesting question is whether $ \exp{\mathbf{Q}^{\text{JLT}}}$ or $ \exp{\mathbf{Q}^{\unicode{x1D7D9}}}$ systematically provides the better approximation to $ \mathbf{P}$ , or whether the answer depends on the transition matrix $ \mathbf{P}$ . In the following subsections, we analyse the discrepancies $ \|\mathbf{P} - \exp{\mathbf{Q}^{\text{JLT}}}\|_{\infty}$ and $ \|\mathbf{P} - \exp{\mathbf{Q}^{\unicode{x1D7D9}}}\|_{\infty}$ for specific categories of the transition matrix $ \mathbf{P}$ .
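Such comparisons are easy to run numerically; the following sketch (ours) reuses one_generator and jlt_generator from the earlier snippets:

```python
import numpy as np
from scipy.linalg import expm

def discrepancy(P, Q):
    # Infinity-norm discrepancy ||P - exp(Q)||_inf, i.e. the maximum absolute row sum.
    return np.linalg.norm(P - expm(Q), np.inf)

# For an empirical transition matrix P, compare the two candidate generators:
#   discrepancy(P, one_generator(P))  versus  discrepancy(P, jlt_generator(P))
```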
4.1. Credit rating transition matrix
Consider the empirical transition matrix
which is derived from Table 3 in Jarrow et al. [Reference Jarrow, Lando and Turnbull18, p. 506]. We have made minor adjustments to five entries on the main diagonal to ensure that all row sums are equal to one, as some rows in Table 3 did not add up to one because of rounding. For this matrix, the vector function $ \mathbf{T} \colon \mathcal{X}\to\mathcal{X}$ , defined by (3.2), is a contraction mapping with Lipschitz constant $ K \lt 1$ ( $ K\approx 0.7833$ ); see Proposition 3.5. Using fixed-point iteration and (3.6), we find that the unique ${\unicode{x1D7D9}}$ -generator, truncated to four decimal places, is given by
In contrast, the rate matrix $ \mathbf{Q}^{\text{JLT}}$ , published in Jarrow et al. [Reference Jarrow, Lando and Turnbull18] and defined by equation (4.1), is
We find $ \|\mathbf{P}-\exp{\mathbf{Q}^{\unicode{x1D7D9}}}\|_{\infty}\approx 0.0247$ and $ \|\mathbf{P}-\exp{\mathbf{Q}^{\text{JLT}}}\|_{\infty}\approx 0.0277$ (accurate to four decimal places). Hence it appears that $ \mathbf{Q}^{\unicode{x1D7D9}}$ is a better approximate generator of $ \mathbf{P}$ than $ \mathbf{Q}^{\text{JLT}}$ when using the infinity norm.
Observe that the magnitudes of the diagonal entries in $ \mathbf{Q}^{\text{JLT}}$ are smaller than those in the ${\unicode{x1D7D9}}$ -generator. This observation persists across all matrices $ \mathbf{P}$ with non-zero diagonal entries, as demonstrated by the following proposition.
Proposition 4.1. Suppose $ \mathbf{P}=(p_{ij})$ is an $ n\times n$ stochastic matrix that satisfies $ p_{ii} \gt 0$ for all i. Then $ q_{ii}^{\unicode{x1D7D9}}\leq q_{ii}^{\mathrm{JLT}}$ for all i.
Proof. By Theorem 3.2, $ q_{ii}^{\unicode{x1D7D9}}=1-\ln{\theta_i}$ for all i, where $ \theta=(\theta_1,\ldots,\theta_n)$ is the unique fixed point of the vector function $ \mathbf{T}=(T_1,\ldots,T_n)$ , defined by (3.2). For $ i\in\mathcal{S}$ , we have by (3.3) that
Using the definition of the function $ \rho$ in (3.1), the above equation can be rewritten as
Now, since $ 1-\ln{\theta_j}=q_{jj}^{\unicode{x1D7D9}}\leq 0$ , we have $ \theta_j\geq\mathrm{e}$ for all j. By Lemma A.2(iii), $ \rho(\theta_i,\theta_j)\geq \rho(\theta_i,\mathrm{e})$ for all j. Hence, from (4.4), it follows that
which simplifies to $ \theta_i\geq \mathrm{e}/p_{ii}$ . Taking logarithms from both sides of this inequality and using $ q_{ii}^{\unicode{x1D7D9}}=1-\ln{\theta_i}$ , we arrive at $ q_{ii}^{\unicode{x1D7D9}}\leq\ln{p_{ii}}=q_{ii}^{\mathrm{JLT}}$ .
4.2. Transition matrices with identical diagonal entries
When the stochastic matrix $ \mathbf{P}$ has identical diagonal entries $ p_{ii}=p \gt 0$ , for all i, Proposition 3.6 indicates that the ${\unicode{x1D7D9}}$ -generator can be explicitly represented as
$$\mathbf{Q}^{\unicode{x1D7D9}}=\frac{1}{p}(\mathbf{P}-\mathbf{I}),$$
with $ \mathbf{I}$ denoting the $ n\times n$ identity matrix. Furthermore, by (4.1), we have
$$\mathbf{Q}^{\text{JLT}}=\frac{\ln{p}}{p-1}(\mathbf{P}-\mathbf{I}).$$
Both $ \mathbf{Q}^{\unicode{x1D7D9}}$ and $ \mathbf{Q}^{\text{JLT}}$ share a similar structure; indeed, they can be expressed as
$$\mathbf{Q}(k)=k(\mathbf{P}-\mathbf{I}), \qquad\text{(4.5)}$$
where
$$k=\frac{1}{p}\quad\text{for}\quad \mathbf{Q}^{\unicode{x1D7D9}}\qquad\text{and}\qquad k=\frac{\ln{p}}{p-1}\quad\text{for}\quad \mathbf{Q}^{\text{JLT}}.$$
In this section, we explore which matrix, either $ \mathbf{Q}^{\unicode{x1D7D9}}$ or $ \mathbf{Q}^{\text{JLT}}$ , is the better approximate generator of $ \mathbf{P}$ according to the infinity norm.
For the $ 2 \times 2$ case,
$$\mathbf{P}=\begin{pmatrix} p & 1-p\\ 1-p & p\end{pmatrix},\qquad 0 \lt p \lt 1,$$
it is known from [Reference Kingman24] that $ \mathbf{P}$ is embeddable if and only if its trace exceeds 1, i.e. $ p \gt \tfrac{1}{2}$ . The unique Markov generator of $ \mathbf{P}$ is then given by
$$\mathbf{Q}=\frac{\ln{(2p-1)}}{2(p-1)}(\mathbf{P}-\mathbf{I});$$
see [Reference Singer and Spilerman28, p. 392]. If $ \mathbf{P}$ is not embeddable ( $ p \leq \tfrac{1}{2}$ ), it turns out that $ \|\mathbf{P}-\exp{\mathbf{Q}^{\unicode{x1D7D9}}}\|_{\infty} \lt \|\mathbf{P}-\exp{\mathbf{Q}^{\text{JLT}}}\|_{\infty}$ , an inequality that even holds for all $ p\in(0,1)$ . This is proved in Lemma A.8 in the Appendix.
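A quick numerical sweep (our illustration, reusing the helper functions defined in the earlier sketches) is consistent with Lemma A.8 over the whole range $ p\in(0,1)$ :

```python
import numpy as np

for p in np.linspace(0.05, 0.95, 19):
    P2 = np.array([[p, 1 - p], [1 - p, p]])
    assert discrepancy(P2, one_generator(P2)) < discrepancy(P2, jlt_generator(P2))
```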
For the $ n \times n$ case where $ n\geq 3$ , the answer to our question seems no longer clear-cut. For instance, if
we have $ \|\mathbf{P}-\exp{\mathbf{Q}^{\unicode{x1D7D9}}}\|_{\infty} \lt \|\mathbf{P}-\exp{\mathbf{Q}^{\text{JLT}}}\|_{\infty}$ ; see Lemma A.9. However, if
we have the reversed inequality, i.e. $ \|\mathbf{P}-\exp{\mathbf{Q}^{\unicode{x1D7D9}}}\|_{\infty} \gt \|\mathbf{P}-\exp{\mathbf{Q}^{\text{JLT}}}\|_{\infty}$ ; see Lemma A.10. We note that the transition matrices of (4.6) and (4.7) are not embeddable, since for some $ i\neq j$ we have $ \mathbf{P}_{ij}=0$ but $ (\mathbf{P}^2)_{ij} \gt 0$ ; see [Reference Chung8, Theorem 5, p. 126], highlighting the importance of investigating $ \mathbf{Q}^{\unicode{x1D7D9}}$ and $ \mathbf{Q}^{\text{JLT}}$ as approximate Markov generators.
5. Discussion and future research avenues
From a one-jump perspective, we obtain a modified version of the Markov embedding problem, producing for each $ n\times n$ transition matrix $ \mathbf{P}$ having non-zero diagonal elements a unique rate matrix $ \mathbf{Q}^{\unicode{x1D7D9}}$ . If $ \mathbf{P}$ is embeddable, its Markov generator does not necessarily equal $ \mathbf{Q}^{\unicode{x1D7D9}}$ . However, if the one-jump assumption is acceptable, we might expect the Markov generator to be close to $ \mathbf{Q}^{\unicode{x1D7D9}}$ in some sense.
The existence and uniqueness of a ${\unicode{x1D7D9}}$ -generator is an advantage of our conditional embedding approach because neither regularization nor identification is needed. For some types of transition matrix (as in Proposition 3.6) there exists a closed form for the ${\unicode{x1D7D9}}$ -generator; in other situations the ${\unicode{x1D7D9}}$ -generator can be determined based on Theorem 3.2.
Comparing our conditional one-jump approach with that of Jarrow et al. in [Reference Jarrow, Lando and Turnbull18], we observe reduced discrepancies $ \|\mathbf{P}-\exp{\mathbf{Q}}\|_{\infty}$ for certain transition matrices $ \mathbf{P}$ when using $ \mathbf{Q}=\mathbf{Q}^{\unicode{x1D7D9}}$ over $ \mathbf{Q}=\mathbf{Q}^{\text{JLT}}$ . Future research could focus on quantifying the improvements achieved by using $ \mathbf{Q}^{\unicode{x1D7D9}}$ over $ \mathbf{Q}^{\text{JLT}}$ . It would also be valuable to determine which transition matrices $ \mathbf{P}$ are better approximately generated by $ \mathbf{Q}^{\unicode{x1D7D9}}$ compared to $ \mathbf{Q}^{\text{JLT}}$ . Furthermore, exploring how the deviations of $ \exp(\mathbf{Q}^{\unicode{x1D7D9}})$ and $ \exp(\mathbf{Q}^{\text{JLT}})$ from $ \mathbf{P}$ vary with different matrix norms beyond the infinity norm used in this study could provide deeper insights. In addition to the comparison based on the discrepancies $ \|\mathbf{P}-\exp{\mathbf{Q}}\|_{\infty}$ , it would also be interesting to consider other criteria, such as the extent to which $ \mathbf{Q}^{\unicode{x1D7D9}}$ and $ \mathbf{Q}^{\text{JLT}}$ are in line with the specificity of the model. Moreover, for an embeddable transition matrix $ \mathbf{P}$ , a comparison of $ \mathbf{Q}^{\unicode{x1D7D9}}$ and $ \mathbf{Q}^{\text{JLT}}$ with the Markov generators of $ \mathbf{P}$ could be of interest.
To study the dynamics of CTHMCs with Markov generators $ \mathbf{Q}^{\unicode{x1D7D9}}$ and $ \mathbf{Q}^{\text{JLT}}$ , we can evaluate the mobility index $ M(\mathbf{Q}) = -\tfrac{1}{n}\mathrm{Tr}{\mathbf{Q}}$ (see [Reference Geweke, Marshall and Zarkin15]), where $ \mathbf{Q}$ is an $ n\times n$ rate matrix and $ \mathrm{Tr}\mathbf{Q}$ its trace. Proposition 4.1 implies $ M(\mathbf{Q}^{\unicode{x1D7D9}}) \geq M(\mathbf{Q}^{\text{JLT}})$ , suggesting more dynamic transitions in the CTHMC governed by $ \mathbf{Q}^{\unicode{x1D7D9}}$ . More comparative research on $ \mathbf{Q}^{\unicode{x1D7D9}}$ and $ \mathbf{Q}^{\text{JLT}}$ using other mobility indices would be valuable.
Appendix. Lemmas and proofs
Lemma A.1. The function f with $ f(t)=\mathrm{e}^{W_0(t)}$ , $ t\geq0$ , is strictly concave.
Proof. By taking second-order derivatives and since
$$W_0^{\prime}(t)=\frac{W_0(t)}{t(1+W_0(t))}\qquad (t \gt 0)$$
(see e.g. [Reference Dence11]), we find
$$f^{\prime}(t)=\frac{1}{1+W_0(t)},\qquad f^{\prime\prime}(t)=-\frac{W_0(t)}{t\,(1+W_0(t))^{3}},$$
which is negative for all $ t \gt 0$ since $ W_0(t) \gt 0$ if $ t \gt 0$ .
Let $ \mathbf{o}=(0,\ldots,0)\in\mathbb{R}^n$ . In what follows, we consider the partial ordering of $ \mathbb{R}^n$ induced by componentwise ordering. That is, if $ \mathbf{x}=(x_1,\ldots,x_n)\in\mathbb{R}^n,$ and $ \mathbf{y}=(y_1,\ldots,y_n)\in\mathbb{R}^n$ , we write $ \mathbf{x}\preceq\mathbf{y}$ if and only if $ x_i\leq y_i$ for all i. Likewise, we write $ \mathbf{x}\succ\mathbf{o}$ if and only if $ x_i \gt 0$ for all i.
Lemma A.2. The function $ \rho \colon \mathbb{R}_+^2\to\mathbb{R}_+$ , defined as in (3.1), satisfies the following properties:
(i) $ \rho$ is linearly homogeneous, i.e. $ \rho(\lambda\mathbf{x})=\lambda\rho(\mathbf{x})$ for all $ \mathbf{x}\in\mathbb{R}_+^2$ and $ \lambda \gt 0$ ;
(ii) $ \rho$ is continuous on $ \mathbb{R}_+^2$ ;
(iii) $ \rho$ is increasing, i.e. $ \rho(\mathbf{x})\leq\rho(\mathbf{y})$ for all $ \mathbf{x},\mathbf{y}\in\mathbb{R}_+^2$ with $ \mathbf{x}\preceq\mathbf{y}$ ;
(iv) $ \min\{x,y\}\leq \rho(x,y)\leq\max\{x,y\}$ for all $ (x,y)\in\mathbb{R}_+^2$ ;
(v) $ \rho$ is concave.
Proof. Let $ \mathbf{u}=(u_1,u_2)\in\mathbb{R}_+^2$ . It is easy to see that $ \rho(\mathbf{u})=u_2f(u_1/u_2)=u_1f(u_2/u_1)$ , where f is the continuous function defined by
$$f(t)=\begin{cases}\dfrac{t\ln{t}}{t-1}, & t\neq 1,\\[4pt] 1, & t=1.\end{cases}$$
Property (i) follows directly from the above. Property (ii) is a direct consequence of the above. For property (iii), by standard calculus, we have
$$f^{\prime}(t)=\frac{t-1-\ln{t}}{(t-1)^{2}},$$
hence f is (strictly) increasing on the positive real half-line since $ \ln{t} \lt t-1$ for all $ t \gt 0$ with $ t\neq 1$ . Now, take $ \mathbf{x}=(x_1,x_2)$ and $ \mathbf{y}=(y_1,y_2)$ , so that $ \mathbf{o}\prec\mathbf{x}\preceq\mathbf{y}$ . Then, as f is increasing, $ \rho(\mathbf{x})=x_2f(x_1/x_2)\leq x_2f(y_1/x_2)=y_1f(x_2/y_1)\leq y_1f(y_2/y_1)=\rho(\mathbf{y})$ .
(iv) Consider the case $ 0 \lt x\leq y$ . Since f is increasing on $ \mathbb{R}_+$ , we have
$$\rho(x,y)=x f(y/x)\geq x f(1)=x=\min\{x,y\}\qquad\text{and}\qquad \rho(x,y)=y f(x/y)\leq y f(1)=y=\max\{x,y\}.$$
The result for the case $ 0 \lt y\leq x$ is proved analogously.
(v) Since $ \rho(x,y)=x\rho(1,y/x)$ by the homogeneity property (i) from Lemma A.2, $ \rho$ is the perspective of the continuous function $ t\mapsto f(t)=\rho(1,t)$ . A straightforward computation reveals
$$f^{\prime\prime}(t)=\frac{g(t)}{(t-1)^{3}},$$
where $ g(t) = {{1}/{t}}-t+2 \ln t$ . Since
$$g^{\prime}(t)=-\frac{1}{t^{2}}-1+\frac{2}{t}=-\bigg(\frac{1}{t}-1\bigg)^{2}\leq 0,$$
we have $ g(t) \gt g(1) = 0$ for all $ 0 \lt t \lt 1$ and $ g(t) \lt g(1)=0$ for all $ t \gt 1$ . Hence $ f^{\prime\prime}(t) \lt 0$ for all $ t \gt 0$ , $ t\neq 1$ . Therefore f is concave, so $ \rho$ is concave as the perspective of a concave function.
Lemma A.3. The vector function $ \mathbf{T} \colon \mathbb{R}_+^n\to\mathbb{R}_+^n$ from Proposition 3.2 is increasing, i.e. $ \mathbf{T}(\mathbf{x})\preceq\mathbf{T}(\mathbf{y})$ for all $ \mathbf{x},\mathbf{y}\in\mathbb{R}_+^n$ with $ \mathbf{x}\preceq\mathbf{y}$ .
Proof. Let $ i\in\{1,\ldots,n\}$ and take $ \mathbf{x},\mathbf{y}\in\mathbb{R}_+^n$ so that $ \mathbf{x}\preceq\mathbf{y}$ . Denote
$$F_i(\mathbf{u})=\frac{1}{p_{ii}}\sum_{j\in\mathcal{S}}p_{ij}\,\rho(u_i,u_j),\qquad \mathbf{u}=(u_1,\ldots,u_n)\in\mathbb{R}_+^n.$$
By Lemma A.2(iii) we have $ F_i(\mathbf{x})\leq F_i(\mathbf{y})$ , which yields $ T_i(\mathbf{x})=\mathrm{e}^{W_0(F_i(\mathbf{x}))}\leq \mathrm{e}^{W_0(F_i(\mathbf{y}))}=T_i(\mathbf{y})$ since the principal branch $ W_0$ of the Lambert W function is increasing (see e.g. [Reference Corless, Gonnet, Hare, Jeffrey and Knuth9]).
Lemma A.4. Let the vector function $ \mathbf{g} \colon \mathbb{R}_+^n\to\mathbb{R}^n$ be given by $ \mathbf{g}(\mathbf{x})=\mathbf{T}(\mathbf{x})-\mathbf{x}$ for all $ \mathbf{x}\succ\mathbf{o}$ , where the vector function $ \mathbf{T} \colon \mathbb{R}_+^n\to\mathbb{R}_+^n$ is defined as in (3.2). Then $ \mathbf{g}=(g_1,\ldots,g_n)$ is
(i) quasi-increasing, i.e. for all i it holds that $ \mathbf{o}\prec\mathbf{x}\preceq\mathbf{y}$ and $ x_i=y_i$ imply $ g_i(\mathbf{x})\leq g_i(\mathbf{y})$ ,
(ii) strictly R-concave, i.e. if $ \mathbf{x}\succ\mathbf{o}$ , $ \mathbf{g}(\mathbf{x})=\mathbf{o}$ and $ 0 \lt \lambda \lt 1$ , then $ \mathbf{g}(\lambda\mathbf{x})\succ\mathbf{o}$ .
Proof. (i) Take $ i\in\{1,\ldots,n\}$ and suppose that $ \mathbf{o}\prec\mathbf{x}\preceq\mathbf{y}$ with $ x_i=y_i$ . Then $ g_i(\mathbf{x})\leq g_i(\mathbf{y})$ , since $ x_i=y_i$ and $ T_i(\mathbf{x})\leq T_i(\mathbf{y})$ (Lemma A.3).
(ii) Let $ \mathbf{x}=(x_1,\ldots,x_n)\succ\mathbf{o}$ be such that $ \mathbf{g}(\mathbf{x})=\mathbf{o}$ . Let $ 0 \lt \lambda \lt 1$ and take $ i\in\{1,\ldots,n\}$ . Denote
$$F_i(\mathbf{x})=\frac{1}{p_{ii}}\sum_{j\in\mathcal{S}}p_{ij}\,\rho(x_i,x_j).$$
By Lemma A.2(i), $ F_i(\lambda\mathbf{x})=\lambda F_i(\mathbf{x})$ . Hence
$$T_i(\lambda\mathbf{x})=\mathrm{e}^{W_0(\lambda F_i(\mathbf{x}))} \gt \lambda\,\mathrm{e}^{W_0(F_i(\mathbf{x}))}=\lambda T_i(\mathbf{x}),$$
where the inequality follows from the fact that the function $ t\mapsto\mathrm{e}^{W_0(t)}$ is strictly concave (Lemma A.1) and $ W_0(0)=0$ . Consequently, for all i,
$$g_i(\lambda\mathbf{x})=T_i(\lambda\mathbf{x})-\lambda x_i \gt \lambda T_i(\mathbf{x})-\lambda x_i=\lambda g_i(\mathbf{x})=0,$$
i.e. $ \mathbf{g}(\lambda\mathbf{x})\succ\mathbf{o}$ .
Lemma A.5. For all $ 0 \lt p \lt 1$ , it holds that
(i) $ 1+\mathrm{e}^{2-2/p}-2p \gt 0$ ,
(ii) $ 1-\mathrm{e}^{1-1/p} \lt \tfrac{4}{3}(1-p)$ .
Proof. (i) Let $ f(t)=1+\mathrm{e}^{2-2/t}-2t$ , which is continuous on the half-open interval (0, 1]. A straightforward calculation reveals $ f^{\prime}(t)=2t^{-2}(\mathrm{e}^{2-2/t}-t^2)$ and $ f^{\prime\prime}(t)=4t^{-4}(1-t)\mathrm{e}^{2-2/t}$ . So, $ f^{\prime\prime}(t) \gt 0$ for all $ t\in(0,1)$ . Consequently, $ f^{\prime}(t) \lt 0$ for all $ t\in(0,1)$ because $ f^{\prime}$ increases monotonically on (0,1) and $ f^{\prime}(1)=0$ . Hence f is monotonically decreasing on (0,1). The result now follows from the fact that $ f(1)=0$ .
(ii) Consider the function $ f(t)=\tfrac{4}{3}(1-t)-1+\mathrm{e}^{1-1/t}$ , which is differentiable on $ \mathbb{R}_+\setminus\{0\}$ . Let $ p_0$ be a critical point of f; then $ f^{\prime}(p_0)=-\tfrac{4}{3}+\mathrm{e}^{1-1/p_0}p_0^{-2}=0$ , yielding $ \mathrm{e}^{1-1/p_0}=\tfrac{4}{3}p_0^2$ . Clearly $ p_0\neq\tfrac{1}{2}$ , and hence
$$f(p_0)=\tfrac{4}{3}(1-p_0)-1+\tfrac{4}{3}p_0^2=\tfrac{1}{3}(2p_0-1)^2 \gt 0.$$
So, all critical points of f have positive function values. In addition, we have $ f(1)=0$ and $ \lim_{t\to0^+}f(t)=1/3 \gt 0$ . Therefore $ f(t) \gt 0$ for all $ t\in(0,1)$ .
Lemma A.6. For all $ x,y\geq 0$ , $ x\neq y$ , it holds that
$$\big|\mathrm{e}^{W_0(x)}-\mathrm{e}^{W_0(y)}\big|\leq\frac{|x-y|}{1+W_0(\min\{x,y\})}.$$
Proof. Assume $ x, y \geq 0$ and $ x \neq y$ . To simplify, consider the function $ f(t) = {\mathrm{e}}^{W_0(t)}$ where $ t \geq 0$ . Given that $ W_0$ is a monotonically increasing function, it follows that f is also increasing. Hence, according to the mean value theorem, there is a value c in the interval between x and y such that
$$\mathrm{e}^{W_0(x)}-\mathrm{e}^{W_0(y)}=f^{\prime}(c)\,(x-y).$$
It remains to be proved that
$$f^{\prime}(c)\leq\frac{1}{1+W_0(\min\{x,y\})}.$$
Since
$$W_0^{\prime}(t)=\frac{W_0(t)}{t(1+W_0(t))}\qquad (t \gt 0)$$
(see [Reference Corless, Gonnet, Hare, Jeffrey and Knuth9]), we have
$$f^{\prime}(t)=W_0^{\prime}(t)\,\mathrm{e}^{W_0(t)}=\frac{W_0(t)\,\mathrm{e}^{W_0(t)}}{t(1+W_0(t))}=\frac{1}{1+W_0(t)},$$
where the last equality follows from the identity $ W_0(t)\mathrm{e}^{W_0(t)}=t$ . Hence, because $ W_0$ is an increasing function, we have
$$f^{\prime}(c)=\frac{1}{1+W_0(c)}\leq\frac{1}{1+W_0(\min\{x,y\})}.$$
Lemma A.7. Let $ \mathbf{P}=(p_{ij})$ be an $ n\times n$ stochastic matrix such that $ p_{ii} \gt 0$ for all i. Let $ \delta=\min\{p_{ii},\,i\in\mathcal{S}\}$ . For all i and $ \mathbf{x}=(x_1,\ldots,x_n)\in\mathbb{R}_+^n$ , denote
$$F_i(\mathbf{x})=\frac{1}{p_{ii}}\sum_{j\in\mathcal{S}}p_{ij}\,\rho(x_i,x_j). \qquad\text{(A.1)}$$
For $ m \gt 0$ and $ \alpha \gt 1$ , define the set
$$\mathcal{X}_{m,\alpha}\,:\!=\,\{(x_1,\ldots,x_n)\in\mathbb{R}_+^n \colon m\leq x_i\leq m\alpha\ \text{for all } i\in\mathcal{S}\}. \qquad\text{(A.2)}$$
Then
where
Proof. Take $ \mathbf{x},\mathbf{y}\in \mathcal{X}_{m,\alpha}$ and take $ i,j\in\{1,\ldots,n\}$ . If $ \rho(x_i,x_j)\leq \rho(y_i,y_j)$ , then, by concavity of the function $ \rho(x,y)$ and the fact that its partial derivatives are non-negative (as stated in Lemma A.2, items (v) and (iii)), it follows that
where
Similarly, if $ \rho(x_i,x_j) \gt \rho(y_i,y_j)$ , the concave nature of the function $ \rho$ along with the non-negativity of its partial derivatives imply
Hence, since $ \mathbf{P}$ is a stochastic matrix,
It remains to be shown that
For $ u \gt 0$ and $ v \gt 0$ , the homogeneity of the function $ \rho$ (see Lemma A.2(i)) and a simple calculation of its partial derivatives entail
Denoting $ t={{u}/{v}}$ and $ f(t)=(1+\tfrac{1}{t})\rho(t,1)$ , it then follows from (A.3) that
Using (3.1), we obtain
and by standard calculus, we find that $ \max\{f(t) \colon {{1}/{\alpha}}\leq t \leq \alpha\}=f(\alpha)$ . The proof is complete.
Proof of Proposition 3.5. Let $ i\in\{1,\ldots,n\}$ and $ \mathbf{x},\mathbf{y}\in \mathcal{X}$ . Using (3.2) and (A.1), we have $ T_i(\mathbf{u})=\exp{W_0(F_i(\mathbf{u}))}$ for all $ \mathbf{u}\in\mathbb{R}_+^n$ . Since $ \mathbf{T}$ maps $ \mathcal{X}$ into itself (see Lemma 3.1(ii)), we have $ \min\{T_i(\mathbf{x}),T_i(\mathbf{y})\}\geq \mathrm{e}^{1/\Delta}$ , hence $ \min\{W_0(F_i(\mathbf{x})),W_0(F_i(\mathbf{y}))\}\geq {{1}/{\Delta}}$ . Therefore, by Lemma A.6,
$$|T_i(\mathbf{x})-T_i(\mathbf{y})|\leq\frac{|F_i(\mathbf{x})-F_i(\mathbf{y})|}{1+1/\Delta}.$$
Denote $ m=\mathrm{e}^{1/\Delta}$ and $ \alpha=\mathrm{e}^{{{1}/{\delta}}-{{1}/{\Delta}}}$ , then by (A.2), $ \mathcal{X}=\mathcal{X}_{m,\alpha}$ . The proposition now follows from Lemma A.7.
Lemma A.8. Let
$$\mathbf{P}=\begin{pmatrix} p & 1-p\\ 1-p & p\end{pmatrix},\qquad 0 \lt p \lt 1,$$
and let $ \mathbf{Q}(k)$ be defined as in (4.5). Then
$$\|\mathbf{P}-\exp{\mathbf{Q}(1/p)}\|_{\infty} \lt \|\mathbf{P}-\exp{\mathbf{Q}(\ln{p}/(p-1))}\|_{\infty}.$$
Proof. It can be shown (e.g. by using Lagrange–Sylvester interpolation to compute functions of matrices; see [Reference Gantmacher14]), that
$$\exp{\mathbf{Q}(k)}=\frac{1}{2}\begin{pmatrix} 1+\mathrm{e}^{2k(p-1)} & 1-\mathrm{e}^{2k(p-1)}\\ 1-\mathrm{e}^{2k(p-1)} & 1+\mathrm{e}^{2k(p-1)}\end{pmatrix},$$
so that
$$\|\mathbf{P}-\exp{\mathbf{Q}(k)}\|_{\infty}=|1+\mathrm{e}^{2k(p-1)}-2p|=|f(k)|,$$
where $ f(k)=1+\mathrm{e}^{2k(p-1)}-2p$ . Since $ p \lt 1$ , the function f is strictly decreasing, hence
$$f(1/p) \lt f(\ln{p}/(p-1)),$$
because $ 1/p \gt \ln{p}/(p-1)$ for all $ p\in(0,1)$ . Furthermore, $ f({{1}/{p}}) \gt 0$ by Lemma A.5(i). Consequently,
$$\|\mathbf{P}-\exp{\mathbf{Q}(1/p)}\|_{\infty}=f(1/p) \lt f(\ln{p}/(p-1))=\|\mathbf{P}-\exp{\mathbf{Q}(\ln{p}/(p-1))}\|_{\infty}.$$
Lemma A.9. For
and $ \mathbf{Q}(k)$ defined as in (4.5), it holds that
$$\|\mathbf{P}-\exp{\mathbf{Q}(1/p)}\|_{\infty} \lt \|\mathbf{P}-\exp{\mathbf{Q}(\ln{p}/(p-1))}\|_{\infty}.$$
Proof. It can be shown (e.g. by applying Sylvester’s theorem for computing functions of a matrix) that
where
A simple calculation reveals
hence $ \alpha(k)-\beta(k)\leq 0$ . Therefore, by the triangle inequality,
Since
we have
so that (A.6) leads to
Furthermore, Lemma A.5(i) entails that $ \beta({{1}/{p}}) \gt 0$ . We now need to consider two cases related to $ \alpha({{1}/{p}})$ . If $ \alpha({{1}/{p}})\geq 0$ , we have by (A.6), (A.7) and the fact that $ \beta$ is strictly decreasing (because $ p \lt 1$ ) that
When $ \alpha({{1}/{p}}) \lt 0$ , we have by (A.6) and (A.5)
By Lemma A.5(ii) and the assumption $ 0 \lt p \lt 1$ , we have $ 0 \lt 1-\gamma({{1}/{p}}) \lt \tfrac{4}{3}(1-p)$ . Hence
In either case, we have proved the result.
Lemma A.10. For
and $ \mathbf{Q}(k)$ defined as in (4.5), it holds that
$$\|\mathbf{P}-\exp{\mathbf{Q}(1/p)}\|_{\infty} \gt \|\mathbf{P}-\exp{\mathbf{Q}(\ln{p}/(p-1))}\|_{\infty}.$$
Proof. It can be shown (e.g. by applying Sylvester’s theorem for computing functions of a matrix) that
where $ \beta(k)=\tfrac{1}{2}\,\gamma(k)^2+\tfrac{1}{2}-p$ and $ \gamma(k)=\mathrm{e}^{k(p-1)}$ . Therefore, by the triangle inequality,
Let $ \delta(k)=(\gamma(k)-1)^2$ . Since
we have
so that (A.8) leads to
Moreover, Lemma A.5(i) shows that $ \beta({{1}/{p}}) \gt 0$ . Also,
because $ \gamma$ is a strictly decreasing function of k. Thus it follows from (A.8) that
The proof is now complete because $ \delta$ is a strictly increasing function of k, leading to
Funding information
There are no funding bodies to thank relating to the creation of this article.
Competing interests
There were no competing interests to declare which arose during the preparation or publication process of this article.