1. Automatic continuity
Since its introduction in the landmark paper of Artzner et al. (1999), the axiomatic theory of risk measures has been a fruitful area of research. One particular direction is the investigation of automatic continuity of risk measures. Automatic continuity has long been a topic of interest in mathematics and probably originates from the fact that a real-valued convex function on an open interval is continuous. This well-known fact was later extended to the following theorem for real-valued convex functionals on general Banach lattices.
Theorem (Ruszczyński and Shapiro, 2006). A real-valued, convex, monotone functional on a Banach lattice is norm continuous.
Recall that a functional $\rho$ on a vector space $\mathcal{X}$ is said to be convex if $\rho(\lambda X+(1-\lambda )Y)\leq \lambda \rho(X)+(1-\lambda )\rho(Y)$ for any $X,Y\in \mathcal{X}$ and any $\lambda\in [0,1]$. Recall also that a vector lattice $\mathcal{X}$ is a real vector space with a partial order $\leq$ such that (1) if $X\leq Y$ in $\mathcal{X}$, then $X+Z\leq Y+Z$ for any $Z\in\mathcal{X}$ and $\alpha X\leq \alpha Y$ for any $\alpha\in\mathbb{R}$ with $\alpha\geq 0$; (2) the supremum of any two elements exists in $\mathcal{X}$. We also denote $X\leq Y$ by $Y\geq X$. In the case of a vector lattice of random variables, the order $X\leq Y$ is always defined by $X\leq Y$ a.s. See Aliprantis and Burkinshaw (2006) for standard terminology and facts regarding vector lattices. A Banach lattice $\mathcal{X}$ is a vector lattice with a complete norm such that $\lvert X\rvert\leq \lvert Y\rvert$ in $\mathcal{X}$ implies $\lVert X\rVert\leq \lVert Y\rVert$. A functional $\rho$ on a Banach lattice $\mathcal{X}$ is said to be increasing if $\rho(X)\leq \rho(Y)$ whenever $X\leq Y$ in $\mathcal{X}$. It is said to be decreasing if $-\rho$ is increasing, and monotone if it is either increasing or decreasing.
The above celebrated result of Ruszczyński and Shapiro has drawn extensive attention in optimization, operations research, and risk management. We refer the reader to Biagini and Frittelli (2009) for a version on Fréchet lattices and Farkas et al. (2014) for further literature on automatic norm continuity properties.
With law invariance, other types of continuity properties beyond norm continuity can be established. The theorem below is striking. Recall first that a functional $\rho$ defined on a set $\mathcal{X}$ of random variables is said to be law invariant if $\rho(X)=\rho(Y)$ whenever $X,Y\in \mathcal{X}$ have the same distribution. Recall also that a functional $\rho$ defined on a set $\mathcal{X}$ of random variables is said to have the Fatou property if $\rho(X)\leq\liminf_n\rho(X_n)$ whenever $X_n\xrightarrow{o}X$ in $\mathcal{X}$. Here, $X_n\xrightarrow{o}X$ in $\mathcal{X}$, called order convergence in $\mathcal{X}$, is used in the literature to denote dominated a.s. convergence in $\mathcal{X}$, that is, $X_n\stackrel{a.s.}{\longrightarrow}X$ and there exists $Y\in \mathcal{X}$ such that $\lvert X_n\rvert\leq Y$ a.s. for any $n\in\mathbb{N}$. The Fatou property is therefore just order lower semicontinuity.
Theorem (Jouini et al., 2006). A real-valued, convex, monotone, law-invariant functional on $L^\infty$ over a nonatomic probability space has the Fatou property. Consequently, it is $\sigma(L^\infty, L^1)$ lower semicontinuous and admits a dual representation via $L^1$.
This theorem was recently extended by Chen et al. (2022) to general rearrangement-invariant spaces; see Chen et al. (2022, Theorems 2.2, 4.3, 4.7) for details. We also refer the reader to Shapiro (2013) for further interesting continuity properties of law-invariant risk measures.
In this section, we aim to extend the above theorem of Ruszczyński and Shapiro on norm continuity of convex functionals. Specifically, we show that the monotonicity assumption can be significantly relaxed to the following notion of order boundedness.
Definition 1.1. Let $\mathcal{X}$ be a vector lattice. For $U, V\in \mathcal{X}$ with $U\leq V$, the order interval $[U, V]$ is defined by:
\begin{align*}[U,V]=\{X\in\mathcal{X}: U\leq X\leq V\}.\end{align*}
A functional $\rho:\mathcal{X}\rightarrow \mathbb{R}$ is said to be order bounded above if it is bounded above on all order intervals.
Monotone functionals are easily seen to be order bounded above. While risk measures are usually assumed to be monotone, many important functionals used in finance, insurance, and other disciplines are not necessarily monotone.
Example 1.2. General deviation measures were introduced in Rockafellar et al. (2006) as functionals $\rho:L^2 \rightarrow [0,+\infty]$ satisfying subadditivity, positive homogeneity, $\rho(X+c)=\rho(X)$ for every $X\in L^2$ and $c \in \mathbb{R}$, and $\rho(X) \gt 0$ for nonconstant $X$. They are usually not monotone but may be order bounded above. Specific examples are the standard deviation and the semideviations. Recall that for a random variable $X\in L^2$, its standard deviation and upper and lower semideviations are given by:
respectively. They are well known to be convex. It is also easy to see that they are neither increasing nor decreasing. We show that they are all order bounded above on $L^2$ . Indeed, take any order interval $[U, V]\subset L^2$ and any $X\in[U, V]$ . The desired order boundedness property is immediate by the following inequalities:
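For concreteness, the definitions and the bound can be sketched as follows. The symbols $\sigma$, $\sigma_+$, $\sigma_-$ are notation assumed here, and the chain of inequalities is one possible route to the order boundedness, not necessarily the one intended in the example. For $X\in L^2$, put
\begin{align*}\sigma(X)=\bigl\lVert X-\mathbb{E}[X]\bigr\rVert_{L^2},\qquad \sigma_+(X)=\bigl\lVert (X-\mathbb{E}[X])^+\bigr\rVert_{L^2},\qquad \sigma_-(X)=\bigl\lVert (X-\mathbb{E}[X])^-\bigr\rVert_{L^2}.\end{align*}
If $U\leq X\leq V$, then $\lvert X\rvert\leq \lvert U\rvert+\lvert V\rvert$. Since the variance never exceeds the second moment,
\begin{align*}\sigma_\pm(X)\leq \sigma(X)\leq \lVert X\rVert_{L^2}\leq \bigl\lVert \lvert U\rvert+\lvert V\rvert\bigr\rVert_{L^2}\leq \lVert U\rVert_{L^2}+\lVert V\rVert_{L^2},\end{align*}
and the right-hand side does not depend on the choice of $X\in[U,V]$.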
Example 1.3. General variability measures were introduced in Bellini et al. (2022) as law-invariant, positive-homogeneous functionals that vanish on constants. Many of them are also order bounded above, although usually not monotone. In fact, all three one-parameter families of variability measures in Bellini et al. (2022, Section 2.3) are easily seen to be order bounded above but not monotone. Let us discuss in detail the class of inter-ES differences $\Delta_p^\mathrm{ES}$, $p\in(0,1)$, on $L^1$. Let $X\in L^1$ and $p\in (0,1)$. Recall the right and left expected shortfalls of $X$:
where $F_X^{-1}(t)=\inf\{x\in\mathbb{R}:\mathbb{P}(X\leq x)\geq t\}$ is the left quantile function of X. The inter-ES difference $\Delta_p^\mathrm{ES}$ is defined by:
$\Delta_p^\mathrm{ES}$ is clearly convex. Take any order interval $[U, V]\subset L^1$ and any $X\in[U, V]$ . By the monotonicity of expected shortfall, we get
This proves that $\Delta_p^\mathrm{ES}$ is order bounded above on $L^1$. Now suppose that $U\in L^1$ follows a uniform distribution on $[-1,1]$. Then
Thus, $\Delta_p^\mathrm{ES}$ is not monotone.
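The computations behind this example can be sketched as follows. The definitions of the right and left expected shortfalls and of the inter-ES difference used here are assumed to follow the usual convention in the variability-measure literature and should be checked against Bellini et al. (2022):
\begin{align*}\mathrm{ES}_p(X)=\frac{1}{1-p}\int_p^1F_X^{-1}(t)\,dt,\qquad \mathrm{ES}_p^-(X)=\frac{1}{p}\int_0^pF_X^{-1}(t)\,dt,\qquad \Delta_p^\mathrm{ES}(X)=\mathrm{ES}_p(X)-\mathrm{ES}_{1-p}^-(X).\end{align*}
Under these definitions, both expected shortfalls are increasing, so for $U\leq X\leq V$ in $L^1$,
\begin{align*}\Delta_p^\mathrm{ES}(X)\leq \mathrm{ES}_p(V)-\mathrm{ES}_{1-p}^-(U).\end{align*}
If $U$ is uniformly distributed on $[-1,1]$, then $F_U^{-1}(t)=2t-1$ and
\begin{align*}\mathrm{ES}_p(U)=\frac{1}{1-p}\int_p^1(2t-1)\,dt=p,\qquad \mathrm{ES}_{1-p}^-(U)=\frac{1}{1-p}\int_0^{1-p}(2t-1)\,dt=-p,\qquad \Delta_p^\mathrm{ES}(U)=2p \gt 0.\end{align*}
Since $-1\leq U\leq 1$ and $\Delta_p^\mathrm{ES}(-1)=\Delta_p^\mathrm{ES}(1)=0$, the functional is neither increasing nor decreasing.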
Our main result in this section is as follows. Recall first that a topological vector space $(\mathcal{X},\tau)$ is a Fréchet lattice if $\mathcal{X}$ is a vector lattice and $\tau$ is induced by a complete metric such that 0 has a fundamental system of convex solid neighborhoods. A neighborhood $\mathcal{V}$ of $0 \in \mathcal{X}$ is solid if $Y \in \mathcal{V}$ whenever $X \in \mathcal{V}$ and $Y \in \mathcal{X}$ satisfy $|Y| \leq |X|$. All $L^p$ spaces and Orlicz spaces equipped with their natural norms are Banach lattices and, in particular, are Fréchet lattices.
Theorem 1.4. Let $(\mathcal{X},\tau)$ be a Fréchet lattice. Let $\rho:\mathcal{X} \rightarrow \mathbb{R}$ be a convex, order-bounded above functional. Then $\rho $ is $\tau$ -continuous.
Recall that monotone functionals are order bounded above and Banach lattices are Fréchet. Thus, Theorem 1.4 includes the preceding theorem of Ruszczyński and Shapiro as a special case.
Proof of Theorem 1.4. Let $(X_n)$ and X be such that $X_n\xrightarrow{{\tau}}X $ in $\mathcal{X}$ . We want to show that $\rho(X_n)\rightarrow \rho(X)$ . Suppose otherwise that $\rho(X_n)\not\rightarrow \rho(X)$ . By passing to a subsequence, we may assume that
Let $(\mathcal{V}_n)$ be a basis of 0 for $\tau$ consisting of solid neighborhoods such that $\mathcal{V}_{n+1}+\mathcal{V}_{n+1} \subseteq \mathcal{V}_n$ for each $n\in\mathbb{N}$ . By passing to a further subsequence of $(X_n)$ , we may assume that $n\lvert X_{n}-X\rvert \in \mathcal{V}_n$ for any $n\geq 1$ . Put $W_n=\sum_{i=1}^n i\lvert X_{i}-X\rvert $ . For any $n,m \in \mathbb{N}$ , we have
Thus, $(W_n)$ is a $\tau$-Cauchy sequence. By the completeness of $\tau$, there exists $Y \in \mathcal{X}$ such that $W_n \xrightarrow{{\tau}} Y$. Since $(W_n)$ is an increasing sequence, $W_n\uparrow Y$ (Aliprantis and Burkinshaw, 2006, Theorem 3.46). In particular, it follows that $W_n\leq Y$ so that
Moreover, since $\rho$ is order bounded above on $[X-Y,X+Y]=X+[-Y,Y]$ , there exists a real number $M \gt 0$ such that
Now fix any $\varepsilon \gt 0$ . Put $N=\lfloor \frac{1}{\varepsilon}\rfloor+1$ . By (1.2),
On the one hand, by the convexity of $\rho$ and the representation
we have
implying that
This together with (1.4) and (1.3) implies that
On the other hand, by the convexity of $\rho$ and
we have as before that
for any $n\geq N$ . In particular,
By $X=\frac{1}{2} X_n+\frac{1}{2}(2X-X_n)$ and the convexity of $\rho$ , we also get
so that
It follows that
Combining (1.5) and (1.6), we have
Hence, $\rho(X_n)\rightarrow \rho(X)$ . This contradicts (1.1) and completes the proof.
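One way to write out the displayed steps of this argument is sketched below. It is a reconstruction consistent with the surrounding text, with labels matched to the references (1.1) to (1.6) in the proof; the constant $\varepsilon_0$ and the restriction $0 \lt \varepsilon \lt 1$, which is needed for the convex combinations, are assumptions made here.
\begin{align}
&\lvert\rho(X_n)-\rho(X)\rvert\geq\varepsilon_0\ \text{ for some }\varepsilon_0 \gt 0\text{ and all }n\in\mathbb{N}, \tag{1.1}\\
&W_{n+m}-W_n=\sum_{i=n+1}^{n+m}i\lvert X_i-X\rvert\in\mathcal{V}_{n+1}+\cdots+\mathcal{V}_{n+m}\subseteq\mathcal{V}_n, \notag\\
&n\lvert X_n-X\rvert\leq W_n\leq Y,\ \text{ that is, }\ \lvert X_n-X\rvert\leq\tfrac{1}{n}Y\ \text{ for all }n\in\mathbb{N}, \tag{1.2}\\
&\rho(Z)\leq M\ \text{ for all }Z\in[X-Y,X+Y], \tag{1.3}\\
&X+\tfrac{1}{\varepsilon}(X_n-X)\in[X-Y,X+Y]\ \text{ and }\ X+\tfrac{1}{\varepsilon}(X-X_n)\in[X-Y,X+Y]\ \text{ for all }n\geq N, \tag{1.4}\\
&X_n=(1-\varepsilon)X+\varepsilon\bigl(X+\tfrac{1}{\varepsilon}(X_n-X)\bigr), \notag\\
&\rho(X_n)\leq(1-\varepsilon)\rho(X)+\varepsilon\rho\bigl(X+\tfrac{1}{\varepsilon}(X_n-X)\bigr), \notag\\
&\rho(X_n)-\rho(X)\leq\varepsilon\bigl(M-\rho(X)\bigr)\ \text{ for all }n\geq N, \tag{1.5}\\
&2X-X_n=(1-\varepsilon)X+\varepsilon\bigl(X+\tfrac{1}{\varepsilon}(X-X_n)\bigr), \notag\\
&\rho(2X-X_n)-\rho(X)\leq\varepsilon\bigl(M-\rho(X)\bigr), \notag\\
&\rho(X)\leq\tfrac{1}{2}\rho(X_n)+\tfrac{1}{2}\rho(2X-X_n), \notag\\
&\rho(X)-\rho(X_n)\leq\rho(2X-X_n)-\rho(X)\leq\varepsilon\bigl(M-\rho(X)\bigr)\ \text{ for all }n\geq N, \tag{1.6}\\
&\lvert\rho(X_n)-\rho(X)\rvert\leq\varepsilon\bigl(M-\rho(X)\bigr)\ \text{ for all }n\geq N. \notag
\end{align}
In this sketch, (1.4) uses that $n\geq N \gt 1/\varepsilon$ gives $\tfrac{1}{\varepsilon}\lvert X_n-X\rvert\leq\tfrac{1}{n\varepsilon}Y\leq Y$, and $M-\rho(X)\geq 0$ because $X$ itself lies in $[X-Y,X+Y]$.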
2. Strong consistency
In this section, we discuss the strong consistency of estimating the risk $\rho(X)$ using estimates drawn from the empirical distributions. This problem has been studied for general convex risk measures on $L^p$ spaces and Orlicz hearts in Shapiro (2013) and Krätschmer et al. (2014), respectively. We are motivated to study it on general Orlicz spaces.
Throughout this section, fix a nonatomic probability space $(\Omega,\mathcal{F},\mathbb{P})$. Let $L^0$ be the space of all random variables on $\Omega$, with a.s. equal random variables identified as the same. Let $\mathcal{X}$ be a subset of $L^0$. Denote the set of distributions of all random variables in $\mathcal{X}$ by:
\begin{align*}\mathcal{M}(\mathcal{X})=\{\mathbb{P}\circ X^{-1}: X\in\mathcal{X}\}.\end{align*}
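Recall that a law-invariant functional $\rho$ on $\mathcal{X}$ induces a natural mapping $\mathcal{R}_\rho$ on $\mathcal{M}(\mathcal{X})$ by:
\begin{align*}\mathcal{R}_\rho(\mu)=\rho(X),\quad\text{where } X\in\mathcal{X}\text{ has distribution }\mu.\end{align*}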
Recall that a sequence $(X_n)$ of random variables is said to be stationary if for any $k, n \in \mathbb{N}$ and any $x_1,\ldots,x_n \in \mathbb{R}$, it holds that $\mathbb{P}(X_1 \leq x_1,\ldots, X_n \leq x_n)=\mathbb{P}(X_{k+1}\leq x_1,\ldots, X_{k+n}\leq x_n)$. Let $\mathcal{B}$ be the Borel $\sigma$-algebra on $\mathbb{R}^{\mathbb{N}}$. A set $A \in \mathcal{F}$ is said to be invariant if there exists $B \in \mathcal{B}$ such that $A=\{(X_n)_{n \geq k} \in B\}$ for every $k\in \mathbb{N}$. A stationary sequence is said to be ergodic if every invariant set has probability 0 or 1. Birkhoff’s ergodic theorem states that the arithmetic averages of a stationary ergodic sequence $(X_n)$ converge a.s. to $\mathbb{E}[X_1]$ whenever $\mathbb{E}[\lvert X_1\rvert] \lt \infty$. See Breiman (1991, Section 6.7) for more facts regarding stationary and ergodic processes.
Let $\mathcal{X}$ be a subset of $L^0$ containing $L^\infty$. Take any $X\in \mathcal{X}$. Let $(X_n)$ be a stationary ergodic sequence of random variables with the same distribution as $X$. We denote the empirical distribution of $X$ arising from $X_1,\ldots,X_n$ by:
\begin{align*}\widehat{m}_n=\frac{1}{n}\sum_{i=1}^n\delta_{X_i},\end{align*}
where $\delta_x$ is the Dirac measure on $\mathbb{R}$ at $x$. Since $L^\infty\subset \mathcal{X}$, $\widehat{m}_n\in \mathcal{M}(L^\infty)\subset \mathcal{M}(\mathcal{X})$. This allows us to consider the corresponding empirical estimate for $\rho(X)$:
\begin{align*}\widehat{\rho}_n=\mathcal{R}_\rho(\widehat{m}_n).\end{align*}
We say that $\rho$ is strongly consistent at $X$ if for any stationary ergodic sequence $(X_n)$ of random variables with the same distribution as $X$:
\begin{align*}\widehat{\rho}_n\xrightarrow{a.s.}\rho(X).\end{align*}
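As a numerical illustration of the plug-in construction, the sketch below evaluates a law-invariant functional on the empirical distribution and tracks the estimation error as the sample grows. The choice of the standard deviation as the functional, the Uniform[-1, 1] data, and the helper names `empirical_std` and `uniform_sample` are assumptions made for this example only; an i.i.d. sample is used as the simplest instance of a stationary ergodic sequence.

```python
import math
import random


def empirical_std(sample):
    """Standard deviation of the empirical distribution of `sample`.

    The empirical distribution puts mass 1/n on each observation, so the
    plug-in estimate uses the divisor n (not n - 1).
    """
    n = len(sample)
    mean = sum(sample) / n
    return math.sqrt(sum((x - mean) ** 2 for x in sample) / n)


def uniform_sample(n, seed=0):
    """Draw n i.i.d. Uniform[-1, 1] observations with a fixed seed."""
    rng = random.Random(seed)
    return [rng.uniform(-1.0, 1.0) for _ in range(n)]


if __name__ == "__main__":
    target = 1.0 / math.sqrt(3.0)  # standard deviation of Uniform[-1, 1]
    for n in (100, 10_000, 100_000):
        estimate = empirical_std(uniform_sample(n))
        # Absolute estimation error; it should shrink as n grows.
        print(n, abs(estimate - target))
```

A simulation of this kind only suggests convergence along one sample path; it does not replace the almost sure statement in the definition.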
We refer to Shapiro (2013), Krätschmer et al. (2014, 2017), and the references therein for literature on strong (and weak) consistency of risk measures. In particular, we are interested in the following result.
Theorem (Krätschmer et al., 2014, Theorem 2.6). A norm-continuous, law-invariant functional on an Orlicz heart is strongly consistent everywhere.
We remark that this theorem was originally stated for real-valued, law-invariant, convex risk measures. In the definition used in Krätschmer et al. (2014), convex risk measures are assumed to be monotone and thus are norm continuous by the aforementioned theorem of Ruszczyński and Shapiro. A quick examination of the proof of Krätschmer et al. (2014, Theorem 2.6) shows that norm continuity and law invariance are the only properties of the functional that are used.
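Let us recall the definitions of Orlicz spaces and hearts. A function $\Phi:[0,\infty)\to[0,\infty)$ is called an Orlicz function if it is nonconstant, convex, increasing, and $\Phi(0)=0$. The Orlicz space $L^\Phi$ is the space of all $X\in L^0$ such that the Luxemburg norm is finite:
\begin{align*}\lVert X\rVert_\Phi=\inf\Bigl\{\lambda \gt 0:\mathbb{E}\Bigl[\Phi\Bigl(\frac{\lvert X\rvert}{\lambda}\Bigr)\Bigr]\leq 1\Bigr\} \lt \infty.\end{align*}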
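The Orlicz heart $H^\Phi$ is a subspace of $L^\Phi$ defined by:
\begin{align*}H^\Phi=\bigl\{X\in L^\Phi:\mathbb{E}[\Phi(\lambda\lvert X\rvert)] \lt \infty\text{ for all }\lambda \gt 0\bigr\}.\end{align*}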
We refer to Edgar and Sucheston (1992) for standard terminology and facts on Orlicz spaces. Risk measures on Orlicz spaces have been studied extensively; see, for example, Bellini et al. (2021), Bellini and Rosazza Gianin (2012), Biagini et al. (2011), Gao et al. (2018, 2019, 2020), Gao and Xanthos (2018), and the references therein.
The above theorem in conjunction with Theorem 1.4 immediately yields the following result, which improves Krätschmer et al. (2014, Theorem 2.6).
Corollary 2.1. A convex, law-invariant, order-bounded-above functional on an Orlicz heart is strongly consistent everywhere.
The above theorem of Krätschmer et al. is essentially due to the fact that for any $X\in H^\Phi$ and for a.e. $\omega\in \Omega$, there exist a random variable $X^\omega$ on $\Omega$ with the same distribution as $X$ and a sequence of random variables $(X_n^\omega)$ on $\Omega$ with distributions $\widehat{m}_n(\omega)$ such that
\begin{equation}\tag{2.1}\lVert X_n^\omega-X^\omega\rVert_\Phi\rightarrow 0.\end{equation}
This, however, does not hold for arbitrary random variables in a general Orlicz space $L^\Phi$. Specifically, when $\Phi$ fails the $\Delta_2$-condition, there exists $X\in L^\Phi\backslash H^\Phi$. For this $X$, (2.1) must fail: $X_n^\omega$ takes at most $n$ values and thus is a simple random variable lying in $H^\Phi$; since $H^\Phi$ is norm closed in $L^\Phi$ and membership in $H^\Phi$ depends only on the distribution, (2.1) would imply $X\in H^\Phi$ as well.
We extend the theorem of Krätschmer et al. as follows. Recall first that on a set $\mathcal{X}\subset L^0$, a functional $\rho:\mathcal{X}\rightarrow \mathbb{R}$ is said to be order continuous at $X\in\mathcal{X}$ if $\rho(X_n)\rightarrow \rho(X)$ whenever $X_n\xrightarrow{o}X$ in $\mathcal{X}$. In the literature, order continuity is also called the Lebesgue property.
Theorem 2.2. An order-continuous, law-invariant functional on an Orlicz space is strongly consistent everywhere.
For the proof of Theorem 2.2, we need to establish a few technical lemmas, which along the way also reveal why order continuity is the most natural condition for general Orlicz spaces. For an Orlicz function $\Phi$, the Young class is defined by:
\begin{align*}Y^\Phi=\bigl\{X\in L^0:\mathbb{E}[\Phi(\lvert X\rvert)] \lt \infty\bigr\}.\end{align*}
It is easy to see that $H^\Phi\subset Y^\Phi\subset L^\Phi$. As in Krätschmer et al. (2014), we use the term $\Phi$-weak topology in place of the $\Phi(\lvert\cdot\rvert)$-weak topology on $\mathcal{M}(Y^{\Phi})$ for brevity. This topology is metrizable (see e.g., Föllmer and Schied, 2011, Corollary A.45). For the special case where $\Phi(x)=\frac{x^p}{p}$ for some $1 \leq p \lt \infty$, the $\Phi$-weak topology is generated by the Wasserstein metric of order $p$ (see e.g., Villani, 2021, Theorem 7.12). Moreover, for a sequence $(\mu_n)\subset \mathcal{M}(Y^{\Phi})$ and $\mu_0 \in \mathcal{M}(Y^{\Phi})$, $(\mu_n)$ converges $\Phi$-weakly to $\mu_0$, written as $\mu_n\xrightarrow{\Phi\text{-weakly}}\mu_0$, iff
\begin{align*}\mu_n\xrightarrow{\text{weakly}}\mu_0\quad\text{and}\quad\int \Phi(|x|)\,\mu_n(dx)\rightarrow\int \Phi(|x|)\,\mu_0(dx).\end{align*}
The following Skorohod representation for $\Phi$-weak convergence is a general order version of Krätschmer et al. (2014, Theorem 3.5) and Krätschmer et al. (2017, Theorem 6.1) beyond the Orlicz heart and without any restrictions on $\Phi$.
Lemma 2.3.
(i) Let $(\mu_n)$ be a sequence in $\mathcal{M}(Y^{\Phi})$ that converges $\Phi$ -weakly to some $\mu_0 \in \mathcal{M}(Y^{\Phi})$ . Then there exist a subsequence $(\mu_{n_k})$ of $(\mu_n)$ , a sequence $(X_k)$ in $Y^{\Phi}$ and $X\in Y^{\Phi}$ such that $X_k$ has distribution $\mu_{n_k}$ for each $k \in \mathbb{N}$ , X has distribution $\mu_0$ , and $X_k \xrightarrow{o} X$ in $Y^{\Phi}$ .
(ii) Let $(X_n)$ in $Y^{\Phi}$ and $X \in Y^{\Phi}$ be such that $X_n\xrightarrow{o}X \text{ in } Y^{\Phi}$ . Then $\mu_n\xrightarrow{\Phi\text{-weakly}}\mu_0$ , where $\mu_n$ ’s are the distributions of $X_n$ ’s and $\mu_0$ is the distribution of X, respectively.
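Proof. We start with the following observation. Since $\Phi$ is continuous and increasing, for any sequence $(X_n)$ in $Y^{\Phi}$, we have
\begin{equation}\tag{2.2}\Phi\Bigl(\sup_{n\in\mathbb{N}}\lvert X_n\rvert\Bigr)=\sup_{n\in\mathbb{N}}\Phi(\lvert X_n\rvert).\end{equation}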
(i). Take $(\mu_n)$ in $\mathcal{M}(Y^{\Phi})$ that converges $\Phi$-weakly to $\mu_0 \in \mathcal{M}(Y^{\Phi})$. Since the probability space is nonatomic, the classical Skorohod representation yields $(X_n)\subset Y^\Phi$ and $X\in Y^\Phi$ such that $X_n \sim \mu_n$ for every $n\in\mathbb{N}$, $X \sim \mu_0$, and $X_n\xrightarrow{a.s.}X$. Clearly,
\begin{equation}\tag{2.3}\mathbb{E}[\Phi(|X|)]= \int \Phi(|x|) \mu_{0}(dx)= \lim_n\int \Phi(|x|) \mu_{n}(dx)=\lim_n\mathbb{E}[\Phi(|X_n|)] \lt \infty.\end{equation}
Since $\Phi$ is continuous, we also have that $\Phi(\lvert X_n\rvert) \xrightarrow{{a.s.}} \Phi(\lvert X\rvert)$. This combined with (2.3) yields (see Aliprantis and Burkinshaw, 2006, Theorem 31.7) that
\begin{align*}\bigl\lVert\Phi(\lvert X_n\rvert) -\Phi(\lvert X\rvert)\bigr\rVert_{L^1}\rightarrow 0.\end{align*}
Passing to a subsequence, we may assume that
\begin{align*}\sum_{n=1}^\infty\bigl\lVert \Phi(|X_n|)-\Phi(\lvert X\rvert)\bigr\rVert_{L^1} \lt \infty\end{align*}
so that
\begin{align*}\sum_{n=1}^\infty \bigl\lvert\Phi(|X_n|)-\Phi(\lvert X\rvert)\bigr\rvert\in L^1.\end{align*}
In particular,
\begin{align*}\sup_{n\in\mathbb{N}}\bigl\lvert\Phi(|X_n|)-\Phi(\lvert X\rvert)\bigr\rvert\in L^1.\end{align*}
It follows from $\Phi(\lvert X_n\rvert)\leq \bigl\lvert\Phi(|X_n|)-\Phi(\lvert X\rvert)\bigr\rvert+\Phi(\lvert X\rvert)$ that $\sup_{n\in\mathbb{N} }\Phi(\lvert X_n\rvert) \in L^1$. Hence, by (2.2), $\mathbb{E}[\Phi (\sup_{n \in \mathbb{N}} |X_n|)]=\mathbb{E}[\sup_{n \in \mathbb{N}} \Phi(|X_n|)] \lt \infty$. That is, $\sup_{n \in \mathbb{N}} \lvert X_n\rvert\in Y^\Phi$; equivalently, $(X_n)$ is dominated in $Y^\Phi$. In particular, we have $X_n \xrightarrow{o}X$ in $Y^{\Phi}$.
(ii). Let $(X_n)$ be such that $X_n\xrightarrow{o}X$ in $Y^{\Phi}$, let $\mu_n$ be the distribution of $X_n$ for each $n$, and let $\mu_0$ be the distribution of $X$. We clearly have $\mu_n\xrightarrow{\text{weakly}}\mu_0$, and by the continuity of $\Phi$, we get $\Phi(|X_n|) \xrightarrow{{a.s.}} \Phi(|X|)$. Since $(X_n)$ is dominated in $Y^\Phi$, $\sup_{n\in\mathbb{N}}\lvert X_n\rvert\in Y^\Phi$. Thus, in view of (2.2), we get
\begin{align*}\mathbb{E}\left[\sup_{n \in \mathbb{N}} \Phi(|X_n|)\right]=\mathbb{E}\left[\Phi \Big(\sup_{n \in \mathbb{N}} |X_n|\Big)\right] \lt \infty,\end{align*}
that is, $\sup_{n \in \mathbb{N}} \Phi(|X_n|)\in L^1$. By the dominated convergence theorem, we get
\begin{align*}\int \Phi(|x|) \mu_{0}(dx)=\mathbb{E}[\Phi(|X|)]= \lim_n\mathbb{E}[\Phi(|X_n|)]=\lim_n\int \Phi(|x|) \mu_{n}(dx). \end{align*}
This proves $\mu_n\xrightarrow{\Phi\text{-weakly}}\mu_0$.
The lemma below reveals the essential and natural role of order continuity.
Lemma 2.4. Let $\rho:Y^\Phi\rightarrow \mathbb{R}$ be law invariant. The following are equivalent.
(i) $\mathcal{R}_\rho$ is continuous on $\mathcal{M}(Y^{\Phi})$ with the $\Phi$ -weak topology.
(ii) $\rho$ is order continuous on $Y^\Phi$ .
Proof. (ii) $\implies$ (i). Suppose that (ii) holds but (i) fails. Recall that the $\Phi$-weak topology is metrizable. Thus, we can find a sequence $(\mu_n)$ and $\mu_0$ in $\mathcal{M}(Y^{\Phi})$ such that $\mu_n\xrightarrow{\Phi\text{-weakly}}\mu_0$ but $\mathcal{R}_\rho (\mu_n) \not\rightarrow \mathcal{R}_{\rho}(\mu_0)$. Passing to a subsequence, we may assume that
\begin{equation}\tag{2.4}\bigl\lvert\mathcal{R}_\rho(\mu_n)-\mathcal{R}_\rho(\mu_0)\bigr\rvert\geq\varepsilon_0\end{equation}
for some $\varepsilon_0 \gt 0$ and all $n \in \mathbb{N}$. By Lemma 2.3(i), there exist a subsequence $(\mu_{n_k})$ of $(\mu_n)$, a sequence $(X_k)$ in $Y^{\Phi}$ and $X\in Y^{\Phi}$ such that $X_k$ has distribution $\mu_{n_k}$ for each $k \in \mathbb{N}$, $X$ has distribution $\mu_0$, and $X_k \xrightarrow{o} X$ in $Y^{\Phi}$. Then (ii) implies that
\begin{align*}\mathcal{R}_\rho(\mu_{n_k})=\rho(X_k)\rightarrow\rho(X)=\mathcal{R}_\rho(\mu_0).\end{align*}
This contradicts (2.4) and proves (ii) $\implies$ (i). The reverse implication (i) $\implies$ (ii) is immediate by Lemma 2.3(ii).
We now present the proof of Theorem 2.2.
Proof of Theorem 2.2. Suppose that $\rho:L^\Phi\rightarrow \mathbb{R}$ is law-invariant and order-continuous. Take any $X\in L^\Phi$ and any stationary ergodic sequence $(X_n)$ of random variables with the same distribution as $X$. Denote by $\mu_0$ their distribution. Let $\lambda \gt 0$ be such that $\mathbb{E}[\Phi(\lambda\lvert X\rvert)] \lt \infty$. Put $\Phi_{\lambda}(\cdot)\,:\!=\,\Phi(\lambda {\cdot})$. Arguing similarly as in the proof of Krätschmer et al. (2014, Theorem 2.6), by applying Birkhoff’s ergodic theorem, one obtains a measurable subset $\Omega_0$ of $\Omega$ such that $\mathbb{P}(\Omega_0)=1$ and for every $\omega\in \Omega_0$,
\begin{align*}\widehat{m}_n(\omega)\xrightarrow{\Phi_\lambda\text{-weakly}}\mu_0.\end{align*}
Since $\rho$ is order-continuous on $L^{\Phi}$ and $Y^{\Phi_\lambda}\subset L^\Phi$ , $\rho$ is also order-continuous on $Y^{\Phi_\lambda}$ . By Lemma 2.4, $\mathcal{R}_\rho$ is continuous on $\mathcal{M}(Y^{\Phi_\lambda})$ with the $\Phi_\lambda$ -weak topology. Thus, $\widehat{\rho}_n(\omega)=\mathcal{R}_\rho(\widehat{m}_n(\omega)) \rightarrow\mathcal{R}_\rho(\mu_0)= \rho(X)$ for every $\omega\in \Omega_0$ . This proves that $\rho$ is strongly consistent at X.
Remark 2.5. In our definition of Orlicz spaces, we do not allow the Orlicz function to take the $\infty$ value, which excludes $L^\infty$ from the above considerations. However, Theorem 2.2 remains true for $L^\infty$. Let $\rho:L^\infty \rightarrow \mathbb{R}$ be an order-continuous, law-invariant functional. Take any $X\in L^\infty$ and any stationary ergodic sequence $(X_n)$ of random variables with the same distribution as $X$. Denote by $\mu_0$ their distribution. By Birkhoff’s ergodic theorem and an application of Theorem 6.6 in Parthasarathy (1967, Chapter 1), there exists a measurable subset $\Omega_0$ of $\Omega$ such that $\mathbb{P}(\Omega_0)=1$ and $\widehat{m}_n(\omega)\xrightarrow{\text{weakly}}\mu_0$ for any $\omega \in \Omega_0$. One may assume further that $\lvert X(\omega)\rvert \leq \lVert X\rVert_{\infty}$ and $\lvert X_n(\omega)\rvert \leq \lVert X\rVert_{\infty}$ for any $\omega \in \Omega_0$ and any $n\geq 1$. Fix any $\omega \in \Omega_0$. The classical Skorohod representation yields $(X_n^{\omega})\subset L^{\infty}$ and $X^\omega\in L^{\infty}$ such that $X_n^{\omega} \sim \widehat{m}_n(\omega)$ for every $n\in\mathbb{N}$, $X^\omega \sim \mu_0$, and $X_n^{\omega}\xrightarrow{a.s.}X^\omega$. We may assume that $\lvert X_n^{\omega}\rvert \leq \lVert X\rVert_{\infty}$ on $\Omega$ for every $n\geq 1$. It follows that $X_n^{\omega} \xrightarrow{{o}} X^\omega$ in $L^\infty$. Hence, by order continuity of $\rho$, we get $\widehat{\rho}_n(\omega)=\mathcal{R}_\rho(\widehat{m}_n(\omega))=\rho(X_n^{\omega}) \rightarrow \rho(X^\omega)=\rho(X)$. This proves that $\rho$ is strongly consistent on $L^\infty$.
Order continuity of law-invariant functionals on Orlicz spaces is generally stronger than norm continuity. In the following, we show that it is satisfied by a large class of risk measures, namely, spectral risk measures. Spectral risk measures were introduced in Acerbi (2002) and include many important risk measures such as the expected shortfall. Let $\phi$ be a nonnegative and nondecreasing function such that $\int_0^1 \phi(t)dt=1$ ($\phi$ is called a spectrum). The associated spectral risk measure is defined by:
\begin{align*}\rho_\phi(X)=\int_0^1\phi(t)F_X^{-1}(t)\,dt,\end{align*}
where $F_X^{-1}(t)=\inf\{x \in \mathbb{R}: \mathbb{P}(X\leq x) \geq t\}$ is the left quantile function of $X$. It is known that $\rho_\phi$ takes values in $({-}\infty,\infty]$ and is convex, monotone, and lower semicontinuous with respect to the $L^1$ norm (see e.g., Amarante and Liebrich, 2024, Lemma C.1). For spectral risk measures, the empirical estimator $\widehat{\rho}_n$ has the form of an L-statistic and the strong consistency can be studied using tools from the theory of L-statistics (see e.g., Tsukahara, 2014). Below we give a simple proof of the strong consistency of $\rho_{\phi}$ in the Orlicz space framework based on Theorem 2.2.
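To make the L-statistic form concrete, here is a minimal sketch. It assumes that the spectral risk measure is the integral of $\phi(t)F_X^{-1}(t)$ over $(0,1)$, and that the caller supplies the antiderivative of the spectrum; the function names `empirical_spectral`, `es_spectrum_integral`, and `empirical_es` are hypothetical. Because the left quantile function of the empirical distribution equals the $i$-th order statistic on $((i-1)/n, i/n]$, the plug-in estimate reduces to a weighted sum of order statistics.

```python
def empirical_spectral(sample, spectrum_integral):
    """Plug-in estimate of a spectral risk measure as an L-statistic.

    `spectrum_integral(t)` is assumed to return the integral of the
    spectrum over [0, t]. The weight of the i-th order statistic is the
    integral of the spectrum over ((i - 1)/n, i/n].
    """
    ordered = sorted(sample)
    n = len(ordered)
    total = 0.0
    for i, x in enumerate(ordered, start=1):
        weight = spectrum_integral(i / n) - spectrum_integral((i - 1) / n)
        total += weight * x
    return total


def es_spectrum_integral(p):
    """Antiderivative of the spectrum that equals 1/(1-p) on [p, 1] and 0
    elsewhere, which corresponds to expected shortfall at level p."""
    def integral(t):
        return max(t - p, 0.0) / (1.0 - p)
    return integral


def empirical_es(sample, p):
    """Empirical expected shortfall at level p (0 < p < 1)."""
    return empirical_spectral(sample, es_spectrum_integral(p))


if __name__ == "__main__":
    # With four observations and p = 0.5, the estimate averages the two
    # largest values.
    print(empirical_es([4.0, 1.0, 3.0, 2.0], 0.5))
```

Passing the antiderivative rather than the spectrum itself avoids numerical integration; for other spectra a different `spectrum_integral` would be supplied.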
Corollary 2.6. Let $\phi$ be a spectrum function such that $\rho_{\phi}$ is real-valued on $L^\Phi$ . Then $\rho_{\phi}$ is order continuous on $L^{\Phi}$ and is thus strongly consistent everywhere on $L^\Phi$ .
Proof. Suppose that $X_n \xrightarrow{{a.s.}} X$ and there exists $Y \in L^\Phi$ such that $\lvert X_n\rvert\leq Y$ for every $n\in\mathbb{N}$. It is well known that $F_{X_n}^{-1} \xrightarrow{{a.s.}} F_X^{-1}$ on $(0,1)$ (with the Lebesgue measure). Hence,
Next, note that since $-Y\leq X_n\leq Y$ , $F_{-Y}^{-1}\leq F_{X_n}^{-1}\leq F_{Y}^{-1}$ on (0,1). Thus,
Since $\rho_\phi$ is real-valued on $L^\Phi$, $\phi F_{Y}^{-1}\in L^1$ and $\phi F_{-Y}^{-1}\in L^1$. Thus, by the dominated convergence theorem,
This proves that $\rho_\phi$ is order continuous on $L^\Phi$. The strong consistency follows from Theorem 2.2.
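The three displayed steps of this argument can be sketched as follows, under the assumption that $\rho_\phi(X)=\int_0^1\phi(t)F_X^{-1}(t)\,dt$; the exact form of the dominating function is a choice made here. First, almost everywhere on $(0,1)$,
\begin{align*}\phi F_{X_n}^{-1}\rightarrow \phi F_X^{-1}.\end{align*}
Second, since $\phi\geq 0$ and $F_{-Y}^{-1}\leq F_{X_n}^{-1}\leq F_{Y}^{-1}$,
\begin{align*}\bigl\lvert \phi F_{X_n}^{-1}\bigr\rvert\leq \phi\bigl(\lvert F_{Y}^{-1}\rvert+\lvert F_{-Y}^{-1}\rvert\bigr)\quad\text{on }(0,1).\end{align*}
Third, once the right-hand side is known to be integrable, dominated convergence gives
\begin{align*}\rho_\phi(X_n)=\int_0^1\phi(t)F_{X_n}^{-1}(t)\,dt\rightarrow\int_0^1\phi(t)F_{X}^{-1}(t)\,dt=\rho_\phi(X).\end{align*}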
We end this note with the following remark that improves the implication (b) $\implies$ (a) in Krätschmer et al. (2014, Theorem 2.8) due to our Theorem 1.4.
Corollary 2.7. Suppose that $\Phi$ satisfies the $\Delta_2$ -condition. Let $\rho$ be any convex, law-invariant, order-bounded-above functional on $L^\Phi$ . Then $\mathcal{R}_\rho$ is continuous on $\mathcal{M}(L^\Phi)$ for the $\Phi$ -weak topology.
Acknowledgement
The authors acknowledge the support of NSERC Discovery Grants.