
A note on continuity and asymptotic consistency of measures of risk and variability

Published online by Cambridge University Press:  16 December 2024

Niushan Gao
Affiliation:
Department of Mathematics, Toronto Metropolitan University, Toronto, M5B 2K3, Canada
Foivos Xanthos*
Affiliation:
Department of Mathematics, Toronto Metropolitan University, Toronto, M5B 2K3, Canada
*Corresponding author: Foivos Xanthos; Email: [email protected]

Abstract

In this short note, we show that every convex, order-bounded above functional on a Fréchet lattice is automatically continuous. This improves a result in Ruszczyński and Shapiro (2006, Mathematics of Operations Research 31(3), 433–452) and applies to many deviation and variability measures. We also show that an order-continuous, law-invariant functional on an Orlicz space is strongly consistent everywhere, extending a result in Krätschmer et al. (2014, Finance and Stochastics 18(2), 271–295).

Type
Research Article
Copyright
© The Author(s), 2024. Published by Cambridge University Press on behalf of The International Actuarial Association

1. Automatic continuity

Since its introduction in the landmark paper (Artzner et al., 1999), the axiomatic theory of risk measures has been a fruitful area of research. Among many topics, one particular direction is to investigate automatic continuity of risk measures. In general, automatic continuity has long been an interesting research topic in mathematics and probably originates from the fact that a real-valued convex function on an open interval is continuous. This well-known fact was later extended to the following theorem for real-valued convex functionals on general Banach lattices.

Theorem (Ruszczyński and Shapiro, 2006). A real-valued, convex, monotone functional on a Banach lattice is norm continuous.

Recall that a functional $\rho$ on a vector space $\mathcal{X}$ is said to be convex if $\rho(\lambda X+(1-\lambda )Y)\leq \lambda \rho(X)+(1-\lambda )\rho(Y) $ for any $X,Y\in \mathcal{X}$ and any $\lambda\in [0,1]$ . Recall also that a vector lattice $\mathcal{X}$ is a real vector space with a partial order $\leq$ such that (1) if $X\leq Y$ in $\mathcal{X}$ , then $X+Z\leq Y+Z$ for any $Z\in\mathcal{X}$ and $\alpha X\leq \alpha Y$ for any $\alpha\in\mathbb{R}$ with $\alpha\geq 0$ ; (2) the supremum of any two elements exists in $\mathcal{X}$ . We also denote $X\leq Y$ by $Y\geq X$ . In the case of a vector lattice of random variables, the order $X\leq Y$ is always defined by $X\leq Y$ a.s. See Aliprantis and Burkinshaw (2006) for standard terminology and facts regarding vector lattices. A Banach lattice $\mathcal{X}$ is a vector lattice with a complete norm such that $\lvert X\rvert\leq \lvert Y\rvert$ in $\mathcal{X}$ implies $\lVert X\rVert\leq \lVert Y\rVert$ . A functional $\rho$ on a Banach lattice $\mathcal{X}$ is said to be increasing if $\rho(X)\leq \rho(Y)$ whenever $X\leq Y$ in $\mathcal{X}$ . $\rho$ is said to be decreasing if $-\rho$ is increasing. $\rho$ is said to be monotone if it is either increasing or decreasing.

The above celebrated result of Ruszczyński and Shapiro has drawn extensive attention in optimization, operations research, and risk management. We refer the reader to Biagini and Frittelli (2009) for a version on Fréchet lattices and Farkas et al. (2014) for further literature on automatic norm continuity properties.

With law invariance, other types of continuity properties beyond norm continuity can be established. The theorem below is striking. Recall first that a functional $\rho$ defined on a set $\mathcal{X}$ of random variables is said to be law invariant if $\rho(X)=\rho(Y)$ whenever $X,Y\in \mathcal{X}$ have the same distribution. Recall also that a functional $\rho$ defined on a set $\mathcal{X}$ of random variables is said to have the Fatou property if $\rho(X)\leq\liminf_n\rho(X_n)$ whenever $X_n\xrightarrow{o}X$ in $\mathcal{X}$ . Here, $X_n\xrightarrow{o}X$ in $\mathcal{X}$ , termed order convergence in $\mathcal{X}$ (see footnote 1), is used in the literature to denote dominated a.s. convergence in $\mathcal{X}$ , that is, $X_n\stackrel{a.s.}{\longrightarrow}X$ and there exists $Y\in \mathcal{X}$ such that $\lvert X_n\rvert\leq Y$ a.s. for any $n\in\mathbb{N}$ . The Fatou property is therefore just order lower semicontinuity.

Theorem (Jouini et al., 2006). A real-valued, convex, monotone, law-invariant functional on $L^\infty$ over a nonatomic probability space has the Fatou property. Consequently, it is $\sigma(L^\infty, L^1)$ lower semicontinuous and admits a dual representation via $L^1$ .

This theorem was recently extended by Chen et al. (2022) to general rearrangement-invariant spaces. See Chen et al. (2022, Theorems 2.2, 4.3, 4.7) for details. We also refer the reader to Shapiro (2013) for further interesting continuity properties of law-invariant risk measures.

In this section, we aim at extending the above theorem of Ruszczyński and Shapiro on norm continuity of convex functionals. Specifically, we show that the monotonicity assumption can be significantly relaxed to the following notion on order boundedness.

Definition 1.1. Let $\mathcal{X}$ be a vector lattice. For $U, V\in \mathcal{X}$ with $U\leq V$ , the order interval $[U, V]$ is defined by:

\begin{align*}[U, V]=\{X\in \mathcal{X}:U\leq X\leq V\}.\end{align*}

A functional $\rho:\mathcal{X}\rightarrow \mathbb{R}$ is said to be order bounded above if it is bounded above on all order intervals.

Monotone functionals are easily seen to be order bounded above. While risk measures are usually assumed to be monotone, many important functionals used in finance, insurance, and other disciplines are not necessarily monotone.

Example 1.2. General deviation measures were introduced in Rockafellar et al. (2006) as functionals $\rho:L^2 \rightarrow [0,+\infty]$ satisfying subadditivity, positive homogeneity, $\rho(X+c)=\rho(X)$ for every $X\in L^2$ and $c \in \mathbb{R}$ , and $\rho(X) \gt 0$ for nonconstant X. They are usually not monotone but may be order bounded above. Specific examples are the standard deviation and the semideviations. Recall that for a random variable $X\in L^2$ , its standard deviation and upper and lower semideviations are given by:

\begin{align*}\sigma(X)=\lVert X-\mathbb{E}[X]\rVert_{L^2},\;\;\sigma_+(X)=\lVert(X-\mathbb{E}[X])^+\rVert_{L^2},\;\;\sigma_-(X)=\lVert(X-\mathbb{E}[X])^-\rVert_{L^2},\end{align*}

respectively. They are well known to be convex. It is also easy to see that they are neither increasing nor decreasing. We show that they are all order bounded above on $L^2$ . Indeed, take any order interval $[U, V]\subset L^2$ and any $X\in[U, V]$ . The desired order boundedness property is immediate by the following inequalities:

\begin{align*}U-\mathbb{E}[V]\leq\, &X-\mathbb{E}[X]\leq V-\mathbb{E}[U]\\0\leq\, & (X-\mathbb{E}[X])^+\leq (V-\mathbb{E}[U])^+\\ 0\leq\, &(X-\mathbb{E}[X])^-\leq (U-\mathbb{E}[V])^-\end{align*}
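The bounds in Example 1.2 can be checked numerically. The following is a minimal sketch, assuming a finite probability space with equally likely outcomes, so that random variables are plain lists. The helper names (`mean`, `l2_norm`, `upper_semidev`, `lower_semidev`, `interval_bounds`, `check_order_bounded`) and the random test data are illustrative assumptions and do not come from the original note.

```python
import math
import random


def mean(x):
    return sum(x) / len(x)


def l2_norm(x):
    # L^2 norm on a finite probability space with equally likely outcomes.
    return math.sqrt(sum(v * v for v in x) / len(x))


def upper_semidev(x):
    m = mean(x)
    return l2_norm([max(v - m, 0.0) for v in x])


def lower_semidev(x):
    m = mean(x)
    return l2_norm([max(m - v, 0.0) for v in x])


def interval_bounds(u, v):
    # Bounds read off from the displayed inequalities:
    # sigma_+(X) <= ||(V - E[U])^+||  and  sigma_-(X) <= ||(U - E[V])^-||.
    eu, ev = mean(u), mean(v)
    plus_bound = l2_norm([max(b - eu, 0.0) for b in v])
    minus_bound = l2_norm([max(ev - a, 0.0) for a in u])
    return plus_bound, minus_bound


def check_order_bounded(trials=1000, size=8, seed=0):
    # Draw random order intervals [U, V] and random X in [U, V],
    # and verify both semideviation bounds (up to rounding).
    rng = random.Random(seed)
    for _ in range(trials):
        u = [rng.uniform(-5.0, 5.0) for _ in range(size)]
        v = [a + rng.uniform(0.0, 3.0) for a in u]      # ensures U <= V
        x = [rng.uniform(a, b) for a, b in zip(u, v)]   # X lies in [U, V]
        plus_bound, minus_bound = interval_bounds(u, v)
        if upper_semidev(x) > plus_bound + 1e-12:
            return False
        if lower_semidev(x) > minus_bound + 1e-12:
            return False
    return True
```

On the two-point space, the chain `[0, 0] <= [0, 1] <= [1, 1]` also illustrates the failure of monotonicity: `upper_semidev` vanishes at both constant endpoints but is strictly positive at `[0, 1]`.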

Example 1.3. General variability measures were introduced in Bellini et al. (2022) as law-invariant, positive-homogeneous functionals that vanish on constants. Many of them are also order-bounded above, although usually not monotone. In fact, all three one-parameter families of variability measures in Bellini et al. (2022, Section 2.3) are easily seen to be order bounded above but not monotone. Let us discuss in detail the class of inter-ES differences $\Delta_p^\mathrm{ES}$ , $p\in(0,1)$ , on $L^1$ . Let $X\in L^1$ and $p\in (0,1)$ . Recall the right and left expected shortfalls of X:

\begin{align*}\mathrm{ES}_p(X)=\frac{1}{1-p}\int_p^1F_X^{-1}(t)\,\mathrm{d}t,\quad \mathrm{ES}_p^-(X)=\frac{1}{p}\int_0^pF_X^{-1}(t)\,\mathrm{d}t,\end{align*}

where $F_X^{-1}(t)=\inf\{x\in\mathbb{R}:\mathbb{P}(X\leq x)\geq t\}$ is the left quantile function of X. The inter-ES difference $\Delta_p^\mathrm{ES}$ is defined by:

\begin{align*}\Delta_p^\mathrm{ES}(X)\,:\!=\,\mathrm{ES}_p(X)-\mathrm{ES}^-_{1-p}(X)=\mathrm{ES}_p(X)+\mathrm{ES}_p({-}X).\end{align*}

$\Delta_p^\mathrm{ES}$ is clearly convex. Take any order interval $[U, V]\subset L^1$ and any $X\in[U, V]$ . By the monotonicity of expected shortfall, we get

\begin{align*}\Delta_p^\mathrm{ES}(X)\leq \mathrm{ES}_p(V)+\mathrm{ES}_p({-}U) \lt \infty.\end{align*}

This proves that $\Delta_p^\mathrm{ES}$ is order-bounded above on $L^1$ . Now suppose that $U\in L^1$ follows a uniform distribution on $[-1,1]$ . Then

\begin{align*}U \leq 1,\quad \Delta_p^\mathrm{ES}(U)=2\mathrm{ES}_p(U) \gt 0=\Delta_p^\mathrm{ES}(1),\end{align*}
\begin{align*}U \geq -1,\quad \Delta_p^\mathrm{ES}(U)=2\mathrm{ES}_p(U) \gt 0=\Delta_p^\mathrm{ES}({-}1).\end{align*}

Thus, $\Delta_p^\mathrm{ES}$ is not monotone.
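The computation in Example 1.3 can be reproduced on an empirical distribution. The sketch below is an illustration only: the function names `expected_shortfall` and `inter_es_difference`, the level `p = 0.9`, and the midpoint grid used as a discrete stand-in for the uniform distribution on $[-1,1]$ are all assumptions made here. For the uniform distribution one has $\mathrm{ES}_p(U)=p$ , so the inter-ES difference should be close to $2p$ .

```python
def expected_shortfall(sample, p):
    """Right expected shortfall ES_p of the empirical distribution of `sample`."""
    xs = sorted(sample)
    n = len(xs)
    total = 0.0
    for k, x in enumerate(xs, start=1):
        # The empirical left quantile function equals x on ((k-1)/n, k/n];
        # only the part of that interval above p contributes to ES_p.
        lo, hi = (k - 1) / n, k / n
        total += x * max(0.0, hi - max(lo, p))
    return total / (1.0 - p)


def inter_es_difference(sample, p):
    """Delta_p^ES(X) = ES_p(X) + ES_p(-X) for the empirical distribution."""
    return expected_shortfall(sample, p) + expected_shortfall([-x for x in sample], p)


# Midpoint grid on [-1, 1]: a discrete stand-in for the uniform distribution.
n = 1000
grid = [-1.0 + (2 * i - 1) / n for i in range(1, n + 1)]

delta_uniform = inter_es_difference(grid, 0.9)         # close to 2p = 1.8
delta_constant = inter_es_difference([1.0] * n, 0.9)   # vanishes on constants
```

Every grid point is at most 1, yet `delta_uniform` exceeds `delta_constant`, which mirrors the non-monotonicity argument above.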

Our main result in this section is as follows. Recall first that a topological vector space $(\mathcal{X},\tau)$ is a Fréchet lattice if $\mathcal{X}$ is a vector lattice and $\tau$ is induced by a complete metric such that 0 has a fundamental system of convex solid neighborhoods. A neighborhood $\mathcal{V}$ of $0 \in \mathcal{X}$ is solid if $Y \in \mathcal{V}$ whenever $X \in \mathcal{V}$ and $Y \in \mathcal{X}$ satisfy that $|Y| \leq |X|$ . All $L^p$ spaces and Orlicz spaces equipped with their natural norm are Banach lattices and, in particular, are Fréchet lattices.

Theorem 1.4. Let $(\mathcal{X},\tau)$ be a Fréchet lattice. Let $\rho:\mathcal{X} \rightarrow \mathbb{R}$ be a convex, order-bounded above functional. Then $\rho $ is $\tau$ -continuous.

Recall that monotone functionals are order bounded above and Banach lattices are Fréchet. Thus, Theorem 1.4 includes the preceding theorem of Ruszczyński and Shapiro as a special case.

Proof of Theorem 1.4. Let $(X_n)$ and X be such that $X_n\xrightarrow{{\tau}}X $ in $\mathcal{X}$ . We want to show that $\rho(X_n)\rightarrow \rho(X)$ . Suppose otherwise that $\rho(X_n)\not\rightarrow \rho(X)$ . By passing to a subsequence, we may assume that

(1.1) \begin{align}\lvert\rho(X_n)-\rho(X)\rvert \gt \varepsilon_0\quad \text{for some }\varepsilon_0 \gt 0 \text{ and any }n\in\mathbb{N}.\end{align}

Let $(\mathcal{V}_n)$ be a basis of 0 for $\tau$ consisting of solid neighborhoods such that $\mathcal{V}_{n+1}+\mathcal{V}_{n+1} \subseteq \mathcal{V}_n$ for each $n\in\mathbb{N}$ . By passing to a further subsequence of $(X_n)$ , we may assume that $n\lvert X_{n}-X\rvert \in \mathcal{V}_n$ for any $n\geq 1$ . Put $W_n=\sum_{i=1}^n i\lvert X_{i}-X\rvert $ . For any $n,m \in \mathbb{N}$ , we have

\begin{align*}W_{n+m}-W_n= \sum_{i=n+1}^{n+m} i\lvert X_{i}-X\rvert \in \mathcal{V}_{n+1}+\mathcal{V}_{n+2}+\cdots+\mathcal{V}_{n+m} \subseteq \mathcal{V}_{n}.\end{align*}
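The last inclusion in the display above can be unpacked as follows; this short verification is added for the reader and uses only the stated property $\mathcal{V}_{k+1}+\mathcal{V}_{k+1} \subseteq \mathcal{V}_k$ . Since $0\in\mathcal{V}_{k+1}$ , we have $\mathcal{V}_{k+1}\subseteq \mathcal{V}_{k+1}+\mathcal{V}_{k+1}\subseteq \mathcal{V}_k$ , so the neighborhoods are decreasing. Absorbing the summands from the right, one at a time, gives

\begin{align*}\mathcal{V}_{n+1}+\cdots+\mathcal{V}_{n+m-1}+\mathcal{V}_{n+m}&\subseteq \mathcal{V}_{n+1}+\cdots+\mathcal{V}_{n+m-2}+\big(\mathcal{V}_{n+m-1}+\mathcal{V}_{n+m-1}\big)\\&\subseteq \mathcal{V}_{n+1}+\cdots+\mathcal{V}_{n+m-3}+\big(\mathcal{V}_{n+m-2}+\mathcal{V}_{n+m-2}\big)\\&\subseteq\cdots\subseteq \mathcal{V}_{n+1}+\mathcal{V}_{n+1}\subseteq \mathcal{V}_n.\end{align*}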

Thus, $(W_n)$ is a $\tau$ -Cauchy sequence. By the completeness of $\tau$ , there exists $Y \in \mathcal{X}$ such that $W_n \xrightarrow{{\tau}} Y$ . Since $(W_n)$ is an increasing sequence, $W_n\uparrow Y$ (Aliprantis and Burkinshaw, 2006, Theorem 3.46). In particular, it follows that $W_n\leq Y$ so that

(1.2) \begin{align}\lvert X_n-X\rvert\leq \frac{1}{n}Y\quad\text{ for any }n\in\mathbb{N}.\end{align}

Moreover, since $\rho$ is order bounded above on $[X-Y,X+Y]=X+[-Y,Y]$ , there exists a real number $M \gt 0$ such that

(1.3) \begin{align}\rho(X+Z) \leq M \quad\text{ for any }Z \in [-Y,Y],\text{ that is, whenever }\lvert Z\rvert\leq Y.\end{align}

Now fix any $\varepsilon\in(0,1)$ , so that the convex combinations below are well defined. Put $N=\lfloor \frac{1}{\varepsilon}\rfloor+1$ . By (1.2),

(1.4) \begin{align}\frac{1}{\varepsilon}\lvert X_n-X\rvert\leq Y\quad\text{ for any }n\geq N.\end{align}

On the one hand, by the convexity of $\rho$ and the representation

\begin{align*}X_n=&(1-\varepsilon)X+\varepsilon\left(X+\frac{1}{\varepsilon}(X_n-X)\right),\end{align*}

we have

\begin{align*} \rho(X_n) \leq& (1-\varepsilon) \rho(X)+\varepsilon \rho\left(X+\frac{1}{\varepsilon}(X_n-X)\right),\end{align*}

implying that

\begin{align*}\rho(X_n)-\rho(X) \leq \varepsilon\left( \rho\Big(X+\frac{1}{\varepsilon}(X_n-X)\Big)-\rho(X) \right).\end{align*}

This together with (1.4) and (1.3) implies that

(1.5) \begin{align}\rho(X_n)-\rho(X) \leq \varepsilon\big(M-\rho(X) \big) \quad\text{ for any }n\geq N.\end{align}

On the other hand, by the convexity of $\rho$ and

\begin{align*}2X-X_n=(1-\varepsilon)X+\varepsilon\left(X+\frac{1}{\varepsilon}(X-X_n)\right),\end{align*}

we have as before that

\begin{align*}\rho(2X-X_n)& \leq (1-\varepsilon) \rho(X)+\varepsilon \rho\left(X+\frac{1}{\varepsilon}(X-X_n)\right)\\&\leq (1-\varepsilon) \rho(X)+\varepsilon M,\end{align*}

for any $n\geq N$ . In particular,

\begin{align*}\rho(2X-X_n)-\rho(X)\leq \varepsilon(M-\rho(X)) \quad\text{ for any }n\geq N.\end{align*}

By $X=\frac{1}{2} X_n+\frac{1}{2}(2X-X_n)$ and the convexity of $\rho$ , we also get

\begin{align*}\rho(X) \leq \frac{1}{2} \rho(X_n)+\frac{1}{2} \rho(2X-X_n)\end{align*}

so that

\begin{align*}\rho(X)-\rho(X_n) \leq \rho(2X-X_n)-\rho(X).\end{align*}

It follows that

(1.6) \begin{align}\rho(X)-\rho(X_n)\leq \varepsilon(M-\rho(X)) \quad\text{ for any }n\geq N.\end{align}

Combining (1.5) and (1.6), we have

\begin{align*}\lvert\rho(X)-\rho(X_n)\rvert\leq \varepsilon(M-\rho(X)) \quad\text{ for any }n\geq N.\end{align*}

Hence, $\rho(X_n)\rightarrow \rho(X)$ . This contradicts (1.1) and completes the proof.

2. Strong consistency

In this section, we discuss the strong consistency of estimating the risk $\rho(X)$ using estimates drawn from the empirical distributions. This problem has been studied for general convex risk measures on $L^p$ spaces and Orlicz hearts in Shapiro (2013) and Krätschmer et al. (2014), respectively. We are motivated to study it on general Orlicz spaces.

Throughout this section, fix a nonatomic probability space $(\Omega,\mathcal{F},\mathbb{P})$ . Let $L^0$ be the space of all random variables on $\Omega$ , with a.s. equal random variables identified as the same. Let $\mathcal{X}$ be a subset of $L^0$ . Denote the set of distributions of all random variables in $\mathcal{X}$ by:

\begin{align*}\mathcal{M}(\mathcal{X})=\{ \mathbb{P} \circ X^{-1} : X \in \mathcal{X}\}.\end{align*}

Recall that a law-invariant functional $\rho$ on $\mathcal{X}$ induces a natural mapping $\mathcal{R}_\rho$ on $\mathcal{M}(\mathcal{X})$ by:

\begin{align*}\mathcal{R}_\rho(\mathbb{P} \circ X^{-1})=\rho(X),\quad\text{ for any }X\in\mathcal{X}.\end{align*}

Recall that a sequence $(X_n)$ of random variables is said to be stationary if for any $k, n \in \mathbb{N}$ and any $x_1,\ldots,x_n \in \mathbb{R}$ , it holds that $\mathbb{P}(X_1 \leq x_1,\cdots, X_n \leq x_n)=\mathbb{P}(X_{k+1}\leq x_1,\cdots, X_{k+n}\leq x_n)$ . Let $\mathcal{B}$ be the Borel $\sigma$ -algebra on $\mathbb{R}^{\mathbb{N}}$ . A set $A \in \mathcal{F}$ is said to be invariant if there exists $B \in \mathcal{B}$ such that $A=\{(X_n)_{n \geq k} \in B\}$ for every $k\in \mathbb{N}$ . A stationary sequence is said to be ergodic if every invariant set has probability 0 or 1. Birkhoff’s ergodic theorem states that the arithmetic averages of a stationary ergodic sequence $(X_n)$ converge a.s. to $\mathbb{E}[X_1]$ whenever $\mathbb{E}[\lvert X_1\rvert] \lt \infty$ . See Breiman (1991, Section 6.7) for more facts regarding stationary and ergodic processes.

Let $\mathcal{X}$ be a subset of $L^0$ containing $L^\infty$ . Take any $X\in \mathcal{X}$ . Let $(X_n)$ be a stationary ergodic sequence of random variables with the same distribution as X. We denote the empirical distribution of X arising from $X_1,\ldots,X_n$ by:

\begin{align*}\widehat{m}_n=\frac{1}{n} \sum_{i=1}^n \delta_{X_i};\end{align*}

here $\delta_x$ is the Dirac measure on $\mathbb{R}$ at x. Since $L^\infty\subset \mathcal{X}$ , $\widehat{m}_n\in \mathcal{M}(L^\infty)\subset \mathcal{M}(\mathcal{X})$ . This allows us to consider the corresponding empirical estimate for $\rho(X)$ :

\begin{align*}\widehat{\rho}_n\,:\!=\,\mathcal{R}_{\rho} (\widehat{m}_n).\end{align*}

We say that $\rho$ is strongly consistent at X if for any stationary ergodic sequence of random variables with the same distribution as X:

\begin{align*}\widehat{\rho}_n=\mathcal{R}_{\rho} (\widehat{m}_n) \xrightarrow{{a.s.}} \rho(X).\end{align*}
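To make the plug-in estimator $\widehat{\rho}_n=\mathcal{R}_\rho(\widehat{m}_n)$ concrete, the following sketch evaluates a law-invariant functional directly on the empirical distribution of the first n observations of a stationary ergodic sequence. The choices here are assumptions made for illustration: the functional is the upper semideviation $\sigma_+$ from Example 1.2, and the sequence is a Gaussian AR(1) process started in its stationary law $N(0,1)$ , for which $\sigma_+=\sqrt{1/2}$ . The names `upper_semideviation` and `stationary_ar1`, the coefficient `a = 0.5`, and the seed are hypothetical.

```python
import math
import random


def upper_semideviation(sample):
    """Plug-in estimate: sigma_+ evaluated at the empirical distribution."""
    m = sum(sample) / len(sample)
    return math.sqrt(sum(max(x - m, 0.0) ** 2 for x in sample) / len(sample))


def stationary_ar1(n, a=0.5, seed=0):
    """Gaussian AR(1) path X_{k+1} = a X_k + sqrt(1 - a^2) e_k, with X_1 ~ N(0, 1)."""
    rng = random.Random(seed)
    x = rng.gauss(0.0, 1.0)
    path = [x]
    scale = math.sqrt(1.0 - a * a)  # keeps every marginal equal to N(0, 1)
    for _ in range(n - 1):
        x = a * x + scale * rng.gauss(0.0, 1.0)
        path.append(x)
    return path


path = stationary_ar1(100_000)
# Estimates along one sample path, for increasing n.
estimates = {n: upper_semideviation(path[:n]) for n in (100, 10_000, 100_000)}
target = math.sqrt(0.5)  # sigma_+ of a standard normal random variable
```

The observations are dependent, so this goes slightly beyond the i.i.d. setting; strong consistency at X asks that such estimates converge to `target` along almost every sample path.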

We refer to Shapiro (2013), Krätschmer et al. (2014, 2017), and the references therein for literature on strong (and weak) consistency of risk measures. In particular, we are interested in the following result.

Theorem (Krätschmer et al., 2014, Theorem 2.6). A norm-continuous, law-invariant functional on an Orlicz heart is strongly consistent everywhere.

We remark that this theorem was originally stated for real-valued, law-invariant, convex risk measures. In their definition in Krätschmer et al. (2014), convex risk measures are assumed to be monotone and thus are norm continuous by the aforementioned theorem of Ruszczyński and Shapiro. A quick examination of the proof of Krätschmer et al. (2014, Theorem 2.6) shows that norm continuity and law invariance are the only ingredients of the functional used.

Let us recall the definitions of Orlicz spaces and hearts. A function $\Phi:[0,\infty)\to[0,\infty)$ is called an Orlicz function if it is nonconstant, convex, increasing, and $\Phi(0)=0$ . The Orlicz space $L^\Phi$ is the space of all $X\in L^0$ such that the Luxemburg norm is finite:

\[\lVert X\rVert_\Phi\,:\!=\,\inf\Big\{\frac{1}{\lambda}:\lambda \gt 0 \text{ and } \mathbb{E}\left[\Phi\big(\lambda\lvert X\rvert\big)\right]\leq 1 \Big\} \lt \infty.\]

The Orlicz heart $H^\Phi$ is a subspace of $L^\Phi$ defined by:

\[H^\Phi \,:\!=\, \left\{X\in L^0 : \mathbb{E}\left[\Phi\big(\lambda|X|\big)\right] \lt \infty \text{ for any } \lambda \gt 0\right\}.\]
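The Luxemburg norm defined above is law invariant, so for a random variable with finitely many equally likely values it can be computed from the list of values. The sketch below is a minimal illustration under that assumption: it writes $c=1/\lambda$ and finds the smallest $c \gt 0$ with $\mathbb{E}[\Phi(\lvert X\rvert/c)]\leq 1$ by bisection, using that this expectation is decreasing in c. The function name `luxemburg_norm` and the iteration count are hypothetical choices.

```python
import math


def luxemburg_norm(sample, phi, iterations=100):
    """Luxemburg norm of the empirical distribution of `sample`.

    Finds the smallest c > 0 with mean(phi(|x| / c)) <= 1 by bisection.
    """
    abs_values = [abs(x) for x in sample]
    if max(abs_values) == 0.0:
        return 0.0

    def modular(c):
        return sum(phi(v / c) for v in abs_values) / len(abs_values)

    # Grow the upper end until the constraint holds there.
    hi = max(abs_values)
    while modular(hi) > 1.0:
        hi *= 2.0
    lo = 0.0
    for _ in range(iterations):
        mid = 0.5 * (lo + hi)
        if modular(mid) > 1.0:
            lo = mid   # constraint violated: the norm is larger than mid
        else:
            hi = mid   # constraint satisfied: the norm is at most mid
    return hi


# Phi(x) = x^2 recovers the L^2 norm; Phi(x) = e^x - 1 is an exponential Orlicz function.
norm_square = luxemburg_norm([3.0, 4.0], lambda t: t * t)
norm_exp = luxemburg_norm([1.0], lambda t: math.exp(t) - 1.0)
```

For $\Phi(x)=x^2$ the result agrees with $\sqrt{\mathbb{E}[X^2]}$ , and for the constant 1 with $\Phi(x)=e^x-1$ it agrees with $1/\ln 2$ , which gives two simple checks of the routine.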

We refer to Edgar and Sucheston (1992) for standard terminology and facts on Orlicz spaces. Risk measures on Orlicz spaces have been studied extensively; see, for example, Bellini et al. (2021), Bellini and Rosazza Gianin (2012), Biagini et al. (2011), Gao et al. (2018, 2019, 2020), Gao and Xanthos (2018), and the references therein.

The above theorem in conjunction with Theorem 1.4 immediately yields the following result, which improves Krätschmer et al. (2014, Theorem 2.6).

Corollary 2.1. A convex, law-invariant, order-bounded-above functional on an Orlicz heart is strongly consistent everywhere.

The above theorem of Krätschmer et al. is essentially due to the fact that for any $X\in H^\Phi$ , and for a.e. $\omega\in \Omega$ , there exist a random variable $X^\omega$ on $\Omega$ with the same distribution as X and a sequence of random variables $(X_n^\omega)$ on $\Omega$ with distributions $\widehat{m}_n(\omega)$ such that

(2.1) \begin{align}\lVert X_n^\omega-X^\omega\rVert_\Phi\rightarrow 0.\end{align}

This, however, does not hold for arbitrary random variables in a general Orlicz space $L^\Phi$ . Specifically, when $\Phi$ fails the $\Delta_2$ -condition, there exists $X\in L^\Phi\backslash H^\Phi$ . For this X, (2.1) must fail: $X_n^\omega$ takes at most n values and thus is a simple random variable lying in $H^\Phi$ ; therefore, since $H^\Phi$ is norm closed in $L^\Phi$ , (2.1) would imply $X^\omega\in H^\Phi$ and hence $X\in H^\Phi$ as well.

We extend the theorem of Krätschmer et al. as follows. Recall first that on a set $\mathcal{X}\subset L^0$ , a functional $\rho:\mathcal{X}\rightarrow \mathbb{R}$ is said to be order continuous at $X\in\mathcal{X}$ if $\rho(X_n)\rightarrow \rho(X)$ whenever $X_n\xrightarrow{o}X $ in $\mathcal{X}$ . In the literature, order continuity is also termed as the Lebesgue property.

Theorem 2.2. An order-continuous, law-invariant functional on an Orlicz space is strongly consistent everywhere.

For the proof of Theorem 2.2, we need to establish a few technical lemmas, which along the way also reveal why order continuity is the most natural condition for general Orlicz spaces. For an Orlicz function $\Phi$ , the Young class is defined by:

\[Y^\Phi \,:\!=\, \left\{X\in L^0 : \mathbb{E}\left[\Phi\left(|X|\right)\right] \lt \infty\right\}.\]

It is easy to see that $H^\Phi\subset Y^\Phi\subset L^\Phi$ . As in Krätschmer et al. (2014), we use the term $\Phi$ -weak topology in place of the $\Phi(\lvert\cdot\rvert)$ -weak topology on $\mathcal{M}(Y^{\Phi})$ for brevity. This topology is metrizable (see e.g., Föllmer and Schied, 2011, Corollary A.45). For the special case where $\Phi(x)=\frac{x^p}{p}$ for some $1 \leq p \lt \infty$ , the $\Phi$ -weak topology is generated by the Wasserstein metric of order p (see e.g., Villani, 2021, Theorem 7.12). Moreover, for a sequence $(\mu_n)\subset \mathcal{M}(Y^{\Phi})$ and $ \mu_0 \in \mathcal{M}(Y^{\Phi})$ , $(\mu_n)$ converges $\Phi$ -weakly to $\mu_0$ , written as $\mu_n\xrightarrow{\Phi\text{-weakly}}\mu_0$ , iff

\[\mu_n\xrightarrow{\text{weakly}}\mu_0 \ \ \text{and} \ \ \int \Phi(|x|) \mu_{n}(dx) \rightarrow \int \Phi(|x|) \mu_{0}(dx).\]

The following Skorohod representation for $\Phi$ -weak convergence is a general order version of Krätschmer et al. (2014, Theorem 3.5) and Krätschmer et al. (2017, Theorem 6.1) beyond the Orlicz heart and without any restrictions on $\Phi$ .

Lemma 2.3.

  (i) Let $(\mu_n)$ be a sequence in $\mathcal{M}(Y^{\Phi})$ that converges $\Phi$ -weakly to some $\mu_0 \in \mathcal{M}(Y^{\Phi})$ . Then there exist a subsequence $(\mu_{n_k})$ of $(\mu_n)$ , a sequence $(X_k)$ in $Y^{\Phi}$ and $X\in Y^{\Phi}$ such that $X_k$ has distribution $\mu_{n_k}$ for each $k \in \mathbb{N}$ , X has distribution $\mu_0$ , and $X_k \xrightarrow{o} X$ in $Y^{\Phi}$ .

  (ii) Let $(X_n)$ in $Y^{\Phi}$ and $X \in Y^{\Phi}$ be such that $X_n\xrightarrow{o}X \text{ in } Y^{\Phi}$ . Then $\mu_n\xrightarrow{\Phi\text{-weakly}}\mu_0$ , where $\mu_n$ ’s are the distributions of $X_n$ ’s and $\mu_0$ is the distribution of X, respectively.

Proof. We start with the following observation. Since $\Phi$ is continuous and increasing, for any sequence $(X_n)$ in $Y^{\Phi}$ , we have

(2.2) \begin{align}\mathbb{E}\left[\Phi \Big(\sup_{n \in \mathbb{N}} |X_n|\Big)\right]=\mathbb{E}\left[\sup_{n \in \mathbb{N}} \Phi(|X_n|)\right]\end{align}
  (i) Take $(\mu_n)$ in $\mathcal{M}(Y^{\Phi})$ that converges $\Phi$ -weakly to $\mu_0 \in \mathcal{M}(Y^{\Phi})$ . Since the probability space is nonatomic, the classical Skorohod representation yields $(X_n)\subset Y^\Phi$ and $X\in Y^\Phi$ such that $X_n \sim \mu_n$ for every $n\in\mathbb{N}$ , $X \sim \mu_0$ , and $X_n\xrightarrow{a.s.}X$ . Clearly,

    (2.3) \begin{equation}\mathbb{E}[\Phi(|X|)]= \int \Phi(|x|) \mu_{0}(dx)= \lim_n\int \Phi(|x|) \mu_{n}(dx)=\lim_n\mathbb{E}[\Phi(|X_n|)] \lt \infty.\end{equation}
    Since $\Phi$ is continuous, we also have that $\Phi(\lvert X_n\rvert) \xrightarrow{{a.s.}} \Phi(\lvert X\rvert) $ . This combined with (2.3) yields (see Aliprantis and Burkinshaw, 2006, Theorem 31.7) that
    \begin{align*}\bigl\lVert\Phi(\lvert X_n\rvert) -\Phi(\lvert X\rvert)\bigr\rVert_{L^1}\rightarrow 0.\end{align*}
    Passing to a subsequence we may assume that
    \begin{align*}\sum_{n=1}^\infty\bigl\lVert \Phi(|X_n|)-\Phi(\lvert X\rvert)\bigr\rVert_{L^1} \lt \infty\end{align*}
    so that
    \begin{align*}\sum_{n=1}^\infty \bigl\lvert\Phi(|X_n|)-\Phi(\lvert X\rvert)\bigr\rvert\in L^1.\end{align*}
    In particular,
    \begin{align*}\sup_{n\in\mathbb{N}}\bigl\lvert\Phi(|X_n|)-\Phi(\lvert X\rvert)\bigr\rvert\in L^1.\end{align*}
    It follows from $\Phi(\lvert X_n\rvert)\leq \bigl\lvert\Phi(|X_n|)-\Phi(\lvert X\rvert)\bigr\rvert+\Phi(\lvert X\rvert)$ that $\sup_{n\in\mathbb{N} }\Phi(\lvert X_n\rvert) \in L^1 $ . Hence, by (2.2), $\mathbb{E}[\Phi (\sup_{n \in \mathbb{N}} |X_n|)]=\mathbb{E}[\sup_{n \in \mathbb{N}} \Phi(|X_n|)] \lt \infty$ . That is, $\sup_{n \in \mathbb{N}} \lvert X_n\rvert\in Y^\Phi$ ; equivalently, $(X_n)$ is dominated in $Y^\Phi$ . In particular, we have $X_n \xrightarrow{o}X$ in $Y^{\Phi}$ .
  (ii) Let $(X_n)$ be such that $X_n\xrightarrow{o}X$ in $Y^{\Phi}$ , let $\mu_n$ be the distribution of $X_n$ for each n, and let $\mu_0$ be the distribution of X. We clearly have $ \mu_n\xrightarrow{\text{weakly}}\mu_0 $ and, by the continuity of $\Phi$ , we get $\Phi(|X_n|) \xrightarrow{{a.s.}} \Phi(|X|)$ . Since $(X_n)$ is dominated in $Y^\Phi$ , $\sup_{n\in\mathbb{N}}\lvert X_n\rvert\in Y^\Phi$ . Thus in view of (2.2), we get

    \begin{align*}\mathbb{E}\left[\sup_{n \in \mathbb{N}} \Phi(|X_n|)\right]=\mathbb{E}\left[\Phi \Big(\sup_{n \in \mathbb{N}} |X_n|\Big)\right] \lt \infty,\end{align*}
    that is, $\sup_{n \in \mathbb{N}} \Phi(|X_n|)\in L^1$ . By the dominated convergence theorem, we get
    \begin{align*}\int \Phi(|x|) \mu_{0}(dx)=\mathbb{E}[\Phi(|X|)]= \lim_n\mathbb{E}[\Phi(|X_n|)]=\lim_n\int \Phi(|x|) \mu_{n}(dx). \end{align*}
    This proves $\mu_n\xrightarrow{\Phi\text{-weakly}}\mu_0$ .

The lemma below reveals the essential and natural role of order continuity.

Lemma 2.4. Let $\rho:Y^\Phi\rightarrow \mathbb{R}$ be law invariant. The following are equivalent.

  (i) $\mathcal{R}_\rho$ is continuous on $\mathcal{M}(Y^{\Phi})$ with the $\Phi$ -weak topology.

  (ii) $\rho$ is order continuous on $Y^\Phi$ .

Proof. (ii) $\implies$ (i). Suppose that (ii) holds but (i) fails. Recall that the $\Phi$ -weak topology is metrizable. Thus, we can find a sequence $(\mu_n)$ and $\mu_0$ in $\mathcal{M}(Y^{\Phi})$ such that $\mu_n\xrightarrow{\Phi\text{-weakly}}\mu_0$ but $\mathcal{R}_\rho (\mu_n) \not\rightarrow \mathcal{R}_{\rho}(\mu_0)$ . Passing to a subsequence, we may assume that

(2.4) \begin{equation} |\mathcal{R}_{\rho} (\mu_n)-\mathcal{R}_{\rho}(\mu_0)| \geq \varepsilon_0,\end{equation}

for some $\varepsilon_0 \gt 0$ and all $n \in \mathbb{N}$ . By Lemma 2.3(i), there exist a subsequence $(\mu_{n_k})$ of $(\mu_n)$ , a sequence $(X_k)$ in $Y^{\Phi}$ and $X\in Y^{\Phi}$ such that $X_k$ has distribution $\mu_{n_k}$ for each $k \in \mathbb{N}$ , X has distribution $\mu_0$ , and $X_k \xrightarrow{o} X$ in $Y^{\Phi}$ . Then (ii) implies that

\begin{align*}\mathcal{R}_{\rho}(\mu_{n_k}) =\rho(X_k)\rightarrow \rho(X)=\mathcal{R}_{\rho}(\mu_0).\end{align*}

This contradicts (2.4) and proves (ii) $\implies$ (i). The reverse implication (i) $\implies$ (ii) is immediate by Lemma 2.3(ii).

We now present the proof of Theorem 2.2.

Proof of Theorem 2.2. Suppose that $\rho:L^\Phi\rightarrow \mathbb{R}$ is law-invariant and order-continuous. Take any $X\in L^\Phi$ and any stationary ergodic sequence of random variables with the same distribution as X. Denote by $\mu_0$ their distribution. Let $\lambda \gt 0$ be such that $\mathbb{E}[\Phi(\lambda\lvert X\rvert)] \lt \infty$ . Put $\Phi_{\lambda}(\cdot)\,:\!=\,\Phi(\lambda {\cdot})$ . Arguing similarly as in the proof of Krätschmer et al. (2014, Theorem 2.6), by applying Birkhoff’s ergodic theorem, one obtains a measurable subset $\Omega_0$ of $\Omega $ such that $\mathbb{P}(\Omega_0)=1$ and for every $\omega\in \Omega_0$ ,

\begin{align*}\widehat{m}_n(\omega)\xrightarrow{\Phi_\lambda\text{-weakly}}\mu_0.\end{align*}

Since $\rho$ is order-continuous on $L^{\Phi}$ and $Y^{\Phi_\lambda}\subset L^\Phi$ , $\rho$ is also order-continuous on $Y^{\Phi_\lambda}$ . By Lemma 2.4, $\mathcal{R}_\rho$ is continuous on $\mathcal{M}(Y^{\Phi_\lambda})$ with the $\Phi_\lambda$ -weak topology. Thus, $\widehat{\rho}_n(\omega)=\mathcal{R}_\rho(\widehat{m}_n(\omega)) \rightarrow\mathcal{R}_\rho(\mu_0)= \rho(X)$ for every $\omega\in \Omega_0$ . This proves that $\rho$ is strongly consistent at X.

Remark 2.5. In our definition of Orlicz spaces, we do not allow the Orlicz function to take the $\infty$ value, which excludes $L^\infty$ from the above considerations. However, Theorem 2.2 remains true for $L^\infty$ . Let $\rho:L^\infty \rightarrow \mathbb{R}$ be an order-continuous, law-invariant functional. Take any $X\in L^\infty$ and any stationary ergodic sequence $(X_n)$ of random variables with the same distribution as X. Denote by $\mu_0$ their distribution. By Birkhoff’s ergodic theorem and an application of Theorem 6.6 in Parthasarathy (1967, Chapter 1), there exists a measurable subset $\Omega_0$ of $\Omega $ such that $\mathbb{P}(\Omega_0)=1$ and $\widehat{m}_n(\omega)\xrightarrow{\text{weakly}}\mu_0$ for any $\omega \in \Omega_0$ . One may assume further that $\lvert X(\omega)\rvert \leq \lVert X\rVert_{\infty}$ and $\lvert X_n(\omega)\rvert \leq \lVert X\rVert_{\infty} $ for any $\omega \in \Omega_0$ and any $n\geq 1$ . Fix any $\omega \in \Omega_0$ . The classical Skorohod representation yields $(X_n^{\omega})\subset L^{\infty}$ and $X^\omega\in L^{\infty}$ such that $X_n^{\omega} \sim \widehat{m}_n(\omega)$ for every $n\in\mathbb{N}$ , $X^\omega \sim \mu_0$ , and $X_n^{\omega}\xrightarrow{a.s.}X^\omega$ . We may assume that $\lvert X_n^{\omega}\rvert \leq \lVert X\rVert_{\infty}$ on $\Omega$ for every $n\geq 1$ . It follows that $X_n^{\omega} \xrightarrow{{o}} X^\omega$ in $L^\infty$ . Hence, by order continuity of $\rho$ , we get $\widehat{\rho}_n(\omega)=\mathcal{R}_\rho(\widehat{m}_n(\omega))=\rho(X_n^{\omega}) \rightarrow \rho(X^\omega)=\rho(X)$ . This proves that $\rho$ is strongly consistent on $L^\infty$ .

Order continuity of law-invariant functionals on Orlicz spaces is generally stronger than norm continuity. In the following, we show that it is satisfied by a large class of risk measures, namely, spectral risk measures. Spectral risk measures were introduced in Acerbi (2002) and include many important risk measures such as the expected shortfall. Let $\phi$ be a nonnegative and nondecreasing function such that $\int_0^1 \phi(t)dt=1$ ( $\phi$ is called a spectrum). The associated spectral risk measure is defined by:

\begin{align*}\rho_\phi(X)=\int_{0}^1 \phi(t)F^{-1}_X(t)\,\mathrm{d}t,\quad X\in L^1\end{align*}

where $F_X^{-1}(t)=\inf\{x \in \mathbb{R}: \mathbb{P}(X\leq x) \geq t\}$ is the left quantile function of X. It is known that $\rho_\phi$ takes values in $({-}\infty,\infty]$ and is convex, monotone, and lower semicontinuous with respect to the $L^1$ norm (see e.g., Amarante and Liebrich, 2024, Lemma C.1). For spectral risk measures, the empirical estimator $\widehat{\rho}_n$ has the form of an L-statistic and the strong consistency can be studied using tools from the theory of L-statistics (see e.g., Tsukahara, 2014). Below we give a simple proof of the strong consistency of $\rho_{\phi}$ in the Orlicz space framework based on Theorem 2.2.

Corollary 2.6. Let $\phi$ be a spectrum function such that $\rho_{\phi}$ is real-valued on $L^\Phi$ . Then $\rho_{\phi}$ is order continuous on $L^{\Phi}$ and is thus strongly consistent everywhere on $L^\Phi$ .

Proof. Suppose that $X_n \xrightarrow{{a.s.}} X$ and there exists $Y \in L^\Phi$ such that $\lvert X_n\rvert\leq Y$ for every $n\in\mathbb{N}$ . It is well known that $F_{X_n}^{-1} \xrightarrow{{a.s.}} F_X^{-1} $ on (0,1) (with the Lebesgue measure). Hence,

\begin{align*}\phi F_{X_n}^{-1} \xrightarrow{{a.s.}}\phi F_X^{-1}.\end{align*}

Next, note that since $-Y\leq X_n\leq Y$, $F_{-Y}^{-1}\leq F_{X_n}^{-1}\leq F_{Y}^{-1}$ on (0,1). Thus,

\begin{align*}\phi F_{-Y}^{-1}\leq \phi F_{X_n}^{-1}\leq \phi F_{Y}^{-1} \quad\text{for every }n\in\mathbb{N}.\end{align*}

Since $\rho_\phi$ is real-valued on $L^\Phi$, $\phi F_{Y}^{-1}\in L^1$ and $\phi F_{-Y}^{-1}\in L^1$. Thus, by the dominated convergence theorem,

\begin{align*}\rho_{\phi}(X_n)=\int_{0}^1 \phi(t)F^{-1}_{X_n}(t)\,\mathrm{d}t \rightarrow\int_{0}^1 \phi(t)F^{-1}_X(t)\,\mathrm{d}t= \rho_{\phi}(X).\end{align*}

This proves that $\rho_\phi$ is order continuous on $L^\Phi$. The strong consistency follows from Theorem 2.2.
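As an illustration of the domination step, which is not part of the original argument, consider the expected shortfall at a level $\alpha\in(0,1)$. Its spectrum is $\phi_\alpha=\frac{1}{1-\alpha}\mathbf{1}_{(\alpha,1]}$, which is nonnegative, nondecreasing, and integrates to 1. Let $X_n$, X, and Y be as in the proof, and assume, as is standard on a probability space, that $L^\Phi\subset L^1$. Since $F_{Y}^{-1}$ and $F_{-Y}^{-1}$ have the laws of Y and $-Y$ under the Lebesgue measure on (0,1), and $Y\geq 0$, the sandwich $F_{-Y}^{-1}\leq F_{X_n}^{-1}\leq F_{Y}^{-1}$ gives

\begin{align*}\lvert \phi_\alpha F_{X_n}^{-1}\rvert \leq \frac{1}{1-\alpha}\big(\lvert F_{Y}^{-1}\rvert+\lvert F_{-Y}^{-1}\rvert\big),\qquad \int_0^1\frac{1}{1-\alpha}\big(\lvert F_{Y}^{-1}(t)\rvert+\lvert F_{-Y}^{-1}(t)\rvert\big)\,\mathrm{d}t=\frac{2}{1-\alpha}\,\mathbb{E}[Y]<\infty.\end{align*}

In this special case the dominating function is therefore explicit, and the dominated convergence theorem yields

\begin{align*}\rho_{\phi_\alpha}(X_n)=\frac{1}{1-\alpha}\int_{\alpha}^1 F^{-1}_{X_n}(t)\,\mathrm{d}t \rightarrow \frac{1}{1-\alpha}\int_{\alpha}^1 F^{-1}_{X}(t)\,\mathrm{d}t=\rho_{\phi_\alpha}(X).\end{align*}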

We end this note with the following remark that improves the implication (b) $\implies$ (a) in Krätschmer et al. (2014, Theorem 2.8) due to our Theorem 1.4.

Corollary 2.7. Suppose that $\Phi$ satisfies the $\Delta_2$-condition. Let $\rho$ be any convex, law-invariant, order-bounded-above functional on $L^\Phi$. Then $\mathcal{R}_\rho$ is continuous on $\mathcal{M}(L^\Phi)$ for the $\Phi$-weak topology.

Proof. By Theorem 1.4, $\rho$ is norm-continuous. When $\Phi$ satisfies the $\Delta_2$-condition, order convergence in $L^\Phi$ implies norm convergence. Thus, $\rho$ is also order-continuous. Under the $\Delta_2$-condition, we also have $H^\Phi=Y^\Phi=L^\Phi$. Now apply Lemma 2.4.

Acknowledgement

The authors acknowledge the support of NSERC Discovery Grants.

Footnotes

1 The term order convergence originates from the theory of vector lattices (see e.g., Aliprantis and Burkinshaw, 2006). We note here that in our definition, we do not require that $\mathcal{X}$ is a vector lattice.

References

Acerbi, C. (2002) Spectral measures of risk: A coherent representation of subjective risk aversion. Journal of Banking and Finance, 26(7), 1505–1518.
Aliprantis, C.D. and Burkinshaw, O. (1998) Principles of Real Analysis. California: Academic Press.
Aliprantis, C.D. and Burkinshaw, O. (2006) Positive Operators. Dordrecht: Springer.
Amarante, M. and Liebrich, F.B. (2024) Distortion risk measures: Prudence, coherence, and the expected shortfall. Mathematical Finance, 34(4), 1291–1327.
Artzner, Ph., Delbaen, F., Eber, J.-M. and Heath, D. (1999) Coherent measures of risk. Mathematical Finance, 9, 203–228.
Bellini, F., Fadina, T., Wang, R. and Wei, Y. (2022) Parametric measures of variability induced by risk measures. Insurance: Mathematics and Economics, 106, 270–284.
Bellini, F., Laeven, R.J.A. and Rosazza Gianin, E. (2021) Dynamic robust Orlicz premia and Haezendonck–Goovaerts risk measures. European Journal of Operational Research, 291(2), 438–446.
Bellini, F. and Rosazza Gianin, E. (2012) Haezendonck-Goovaerts risk measures and Orlicz quantiles. Insurance: Mathematics and Economics, 51, 107–114.
Biagini, S. and Frittelli, M. (2009) On the extension of the Namioka-Klee theorem and on the Fatou Property for risk measures. In Optimality and Risk: Modern Trends in Mathematical Finance (eds. Delbaen, F., Rasonyi, M. and Stricker, C.), pp. 1–28. Berlin: Springer.
Biagini, S., Frittelli, M. and Grasselli, M. (2011) Indifference price with general semimartingales. Mathematical Finance, 21(3), 423–446.
Breiman, L. (1991) Probability. Classics in Applied Mathematics, vol. 7. Philadelphia: SIAM (corrected reprint of the 1968 original).
Chen, S., Gao, N., Leung, D. and Li, L. (2022) Automatic Fatou property of law-invariant risk measures. Insurance: Mathematics and Economics, 105, 41–53.
Edgar, G.A. and Sucheston, L. (1992) Stopping Times and Directed Processes. Cambridge: Cambridge University Press.
Farkas, W., Koch-Medina, P. and Munari, C. (2014) Beyond cash-additive risk measures: When changing the numéraire fails. Finance and Stochastics, 18, 145–173.
Föllmer, H. and Schied, A. (2011) Stochastic Finance. An Introduction in Discrete Time, 3rd edn. Berlin: de Gruyter.
Gao, N., Leung, D., Munari, C. and Xanthos, F. (2018) Fatou property, representations, and extensions of law-invariant risk measures on general Orlicz spaces. Finance and Stochastics, 22, 395–415.
Gao, N., Leung, D. and Xanthos, F. (2019) Closedness of convex sets in Orlicz spaces with applications to dual representation of risk measures. Studia Mathematica, 249, 329–347.
Gao, N., Munari, C. and Xanthos, F. (2020) Stability properties of Haezendonck-Goovaerts premium principles. Insurance: Mathematics and Economics, 94, 94–99.
Gao, N. and Xanthos, F. (2018) On the C-property and $w^*$-representations of risk measures. Mathematical Finance, 28(2), 748–754.
Jouini, E., Schachermayer, W. and Touzi, N. (2006) Law invariant risk measures have the Fatou Property. In Advances in Mathematical Economics, vol. 9, pp. 49–71.
Krätschmer, V., Schied, A. and Zähle, H. (2014) Comparative and qualitative robustness for law-invariant risk measures. Finance and Stochastics, 18(2), 271–295.
Krätschmer, V., Schied, A. and Zähle, H. (2017) Domains of weak continuity of statistical functionals with a view toward robust statistics. Journal of Multivariate Analysis, 158, 1–19.
Parthasarathy, K.R. (1967) Probability Measures on Metric Spaces. Probability and Mathematical Statistics, vol. 3. New York: Academic Press.
Rockafellar, R.T., Uryasev, S. and Zabarankin, M. (2006) Generalized deviation in risk analysis. Finance and Stochastics, 10, 51–74.
Ruszczyński, A. and Shapiro, A. (2006) Optimization of convex risk functions. Mathematics of Operations Research, 31(3), 433–452.
Shapiro, A. (2013) Consistency of sample estimates of risk averse stochastic programs. Journal of Applied Probability, 50(2), 533–541.
Tsukahara, H. (2014) Estimation of distortion risk measures. Journal of Financial Econometrics, 12(1), 213–235.
Villani, C. (2021) Topics in Optimal Transportation, vol. 58. American Mathematical Society.