1 Introduction
One of the classical results in ergodic theory is the Birkhoff ergodic theorem.
Theorem 1.1. (Birkhoff)
Let $(X,\mathfrak {X},m)$ be a probability space, $\tau :X\to X$ an ergodic map, and $f\in L^1(X,\mathbb {C})$ . Then, for m-almost every $x\in X$ ,
$$\lim_{n\to\infty}\frac{1}{n}\sum_{k=0}^{n-1}f(\tau^k(x))=\int_X f\,dm.$$
The aim of this work is to study this result in the context of CAT(0) spaces. The traditional extensions of the Birkhoff theorem to this setting replace the arithmetic means by the so-called barycenters. In particular, the barycenters are used to average the function along the ergodic orbit. Our motivation to consider variants of these traditional extensions is that usually the barycenters cannot be computed explicitly. Moreover, in many important cases, the usual ways to approximate the barycenters using convex optimization methods are not useful for applications (see Example 2.3). A similar situation can be found in the extensions of other well-known theorems to CAT(0) spaces. This is the case for the law of large numbers. In [Reference Sturm, Auscher, Coulhon and Grigor’yan30] Sturm introduced the so-called inductive means, which can be computed easily in those spaces where the geodesics are known. Using these means to average the independent copies of the random variable, he obtained in [Reference Sturm, Auscher, Coulhon and Grigor’yan30] a new version of the law of large numbers in CAT(0) spaces. Motivated by this result, we study a version of the classical Birkhoff ergodic theorem using the inductive means defined by Sturm to average the functions along the ergodic orbit. As a consequence, we get new ways to approximate the barycenter of an integrable function with values in a CAT(0) space.
1.1 Framework and related results
Recall that a CAT(0) space, also called a Hadamard space, is a complete metric space $(M,\delta )$ whose metric satisfies the following semiparallelogram law: given $x, y \in M$ , there exists $m \in M$ satisfying
$$\delta^2(m,z)\leq \tfrac12\,\delta^2(x,z)+\tfrac12\,\delta^2(y,z)-\tfrac14\,\delta^2(x,y)$$
for all $z \in M$ . The point m is unique, and it is called the midpoint between x and y, because, taking $z=x$ and $z=y$ , the following identities hold:
$$\delta(x,m)=\delta(m,y)=\tfrac12\,\delta(x,y).$$
The existence and uniqueness of midpoints give rise to a unique (continuous) geodesic which we will denote by $\gamma :[0,1]\to M$ (see §2 for more details). We will denote this curve by $x \#_t y$ instead of $\gamma (t)$ . Typical examples of CAT(0) spaces are the Riemannian manifolds with non-positive sectional curvature, and certain types of graphs such as trees or spiders. The systematic study of these spaces started with work by Alexandrov [Reference Alexandrov1] and Reshetnyak [Reference Reshetnyak28], and the subject was strongly influenced by the works of Gromov [Reference Gromov13, Reference Gromov and Gerten14]. Nowadays there exists a huge bibliography on the subject. The interested reader is referred to the monographs [Reference Ballmann4, Reference Ballmann, Gromov and Schroeder5, Reference Barbaresco and Nielsen10, Reference Jost15].
The convexity properties of the metric allow us to define a notion of barycenter in CAT(0) spaces. Endowed with this barycenter, Hadamard spaces play an important role in the theory of integration (random variables, expectations and variances), the law of large numbers, ergodic theory, Jensen’s inequality (see [Reference Bridson and Haefliger9, Reference Es-Sahib and Heinich12, Reference Lawson and Lim18, Reference Navas24, Reference Sturm, Auscher, Coulhon and Grigor’yan30]), stochastic generalization of Lipschitz retractions and extension problems of Lipschitz and Hölder maps (see [Reference Lee and Naor19, Reference Mendel and Naor22, Reference Ohta25]), optimal transport theory on Riemannian manifolds (see [Reference Pass26, Reference Pass27]), and so on.
Roughly speaking, the barycenter constitutes a way to average points in M, taking into account the metric properties of the space. More precisely, the barycenter is defined for some measures with separable support (see §2 for the formal definition). Given n points in the space M, let $\beta (x_1,\ldots ,x_n)$ denote the barycenter of the points $x_1,\ldots ,x_n$ (more precisely, the barycenter of the point measure $\mu =\delta _{x_1}+\cdots +\delta _{x_n}$ ). On the other hand, if $(X,\mathfrak {X},\mu )$ is a measure space and $f:X\to M$ is a measurable function such that for some $y\in M$ (and therefore for any $y\in M$ )
$$\int_X \delta(f(x),y)\,d\mu(x)<\infty, \tag{1.3}$$
then $\beta _f$ denotes the barycenter of the pushforward measure $f_*(\mu )$ .
Any Hilbert space $\mathcal {H}$ , and in particular $\mathbb {C}$ , is a CAT(0) space with the metric induced by the norm, and the barycenter in $\mathcal {H}$ is precisely the arithmetic mean. Therefore, the natural extension of the Birkhoff theorem for a function f satisfying (1.3) is obtained by replacing the arithmetic means by the barycenters
$$\lim_{n\to\infty}\beta\big(f(x),f(\tau(x)),\ldots,f(\tau^{n-1}(x))\big)=\beta_f\quad\text{for almost every } x\in X. \tag{1.4}$$
This result was proved by Austin in [Reference Austin2] for functions satisfying the integrability condition
$$\int_X \delta^2(f(x),y)\,d\mu(x)<\infty \tag{1.5}$$
instead of (1.3). Later on, in [Reference Navas24] Navas proved it for functions satisfying (1.3). In both cases, the authors considered not only $\mathbb {Z}$ -actions but also much more general actions given by amenable groups. Moreover, Navas’s theorem holds not only in CAT(0) spaces but also in metric spaces of non-positive curvature in the sense of Busemann.
However, the barycenters $\beta (f(x),\ldots ,f(\tau ^{n-1}(x)))$ in (1.4) may be very difficult to compute for $n\geq 4$ . Hence, it is natural to look for an alternative way to average the points $f(x),\ldots ,f(\tau ^{n-1}(x))$ . This leads to the definition of inductive means. To motivate their definition, note that, given a sequence $\{a_n\}_{n\in \mathbb {N}}$ of complex numbers,
$$\frac{a_1+\cdots+a_n}{n}=\frac{n-1}{n}\left(\frac{a_1+\cdots+a_{n-1}}{n-1}\right)+\frac{1}{n}\,a_n.$$
Let $\gamma _{a,b}(t)=t\,b+(1-t)a$ , and for a moment let us use the notation $a\,\sharp _t\, b=\gamma _{a,b}(t)$ . Then
$$\frac{a_1+a_2}{2}=a_1\,\sharp_{1/2}\,a_2,\qquad \frac{a_1+a_2+a_3}{3}=(a_1\,\sharp_{1/2}\,a_2)\,\sharp_{1/3}\,a_3,$$
and so on. Segments are the geodesics of Euclidean space. Thus, in our setting, we can replace the segments by the geodesics of the Hadamard space. This is the idea that leads to the definition of the inductive means. Given a sequence $\{a_n\}_{n\in \mathbb {N}}$ whose elements belong to a CAT(0) space M, the inductive means are defined as follows:
$$S_1(a)=a_1,\qquad S_n(a)=S_{n-1}(a)\,\#_{1/n}\,a_n\quad (n\geq 2).$$
These means were introduced by Sturm in [Reference Sturm, Auscher, Coulhon and Grigor’yan30], where he proved the following version of the law of large numbers.
Theorem 1.2. (Sturm)
Let $(X,\mathfrak {X},\mu )$ be a probability space, and let $A=\{A_j\}_{j\in \mathbb {N}}$ be a sequence of independent and identically distributed bounded random variables satisfying (1.5). Then, almost surely,
$$\lim_{n\to\infty}S_n(A)=\beta_{A_1}.$$
This result suggests the possibility of finding extensions of the Birkhoff ergodic theorem using inductive means to average the values of the function on the ergodic orbit. Note that, if we want to use inductive means, then we are compelled to consider only $\mathbb {Z}$ -actions.
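The recursive averaging behind the inductive means can be sketched in a few lines. The following is a minimal illustration, assuming the recursion $S_1(a)=a_1$ , $S_n(a)=S_{n-1}(a)\,\#_{1/n}\,a_n$ and a user-supplied geodesic function; the names `inductive_mean` and `segment` and the sample points are hypothetical. In the complex plane, where the geodesics are segments, the inductive mean should agree with the arithmetic mean.

```python
from typing import Callable, Sequence, TypeVar

P = TypeVar("P")


def inductive_mean(points: Sequence[P], geodesic: Callable[[P, P, float], P]) -> P:
    """Return S_n(a), with S_1 = a_1 and S_k = S_{k-1} #_{1/k} a_k."""
    if not points:
        raise ValueError("need at least one point")
    mean = points[0]
    for k, point in enumerate(points[1:], start=2):
        # Move from the current mean towards the new point by the fraction 1/k.
        mean = geodesic(mean, point, 1.0 / k)
    return mean


def segment(a: complex, b: complex, t: float) -> complex:
    """Euclidean geodesic a #_t b = (1 - t) a + t b in the complex plane."""
    return (1.0 - t) * a + t * b


# Hypothetical sample points, chosen only for illustration.
samples = [1 + 2j, -3 + 0.5j, 4 - 1j, 0.25 + 0.75j]
s_n = inductive_mean(samples, segment)
arithmetic = sum(samples) / len(samples)
```

Only the `geodesic` argument depends on the space, so the same routine applies to any Hadamard space whose geodesics are known.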
1.2 Main results
Let $(G,+)$ be a compact and metrizable topological group. In this group we fix a Haar measure m, a shift-invariant metric ${d}_{G}$ , and we take an ergodic automorphism $\tau (h)=h+g$ for some $g\in G$ . Note that the existence of such an ergodic automorphism implies that the group must be abelian (see [Reference Walters32, Theorem 1.9]).
On the other hand, let $(M,\delta )$ be a fixed CAT(0) space. Given a function $A: G \rightarrow M$ , we define $a^{\tau } : G \rightarrow M^{\mathbb {N}}$ by
$$a^{\tau}(g)=\{a^{\tau}_j(g)\}_{j\in\mathbb{N}},\qquad a^{\tau}_j(g)=A(\tau^{j}(g)). \tag{1.6}$$
Our first main theorem is the following continuous version of the ergodic theorem.
Theorem 1.3. Let M be a Hadamard space and $A:G\to M$ a continuous function. Then
$$\lim_{n\to\infty}S_n(a^{\tau}(g))={\beta }_{\mbox{A}}$$
uniformly in $g\in G$ .
To extend this result to $L^1(G,M)$ functions, we need to find ‘good $L^1$ -approximations by continuous functions’. These approximations are obtained in §3.3, where we study mollifiers in general Hadamard spaces. The results on mollifiers obtained in this subsection are of interest in their own right, since they generalize some results proved by Karcher in [Reference Karcher16] for Riemannian manifolds. Using this $L^1$ -approximation we get the following $L^1$ version of the ergodic theorem.
Theorem 1.4. Given $A\in L^1(G, M)$ , for almost every $g\in G$ ,
$$\lim_{n\to\infty}S_n(a^{\tau}(g))={\beta }_{\mbox{A}}.$$
From this result, using standard techniques we get the following $L^p$ versions.
Theorem 1.5. Let $1 \leq p < \infty $ and $A\in L^p(G, M)$ . Then
$$\lim_{n\to\infty}\int_G\delta^p\big(S_n(a^{\tau}(g)),{\beta }_{\mbox{A}}\big)\,dm(g)=0.$$
Recall that a topological dynamical system $(\Omega ,\tau )$ is called a Kronecker system if it is isomorphic to a group dynamical system $(G,\tau )$ like the one described above. Also recall that any equicontinuous dynamical system becomes an isometric system by changing the metric, and any minimal isometric dynamical system is a Kronecker system (see, for example, [Reference Sturm31, §2.6]). Therefore, using standard arguments, all the main results of this work can be extended to equicontinuous systems. In order to go further and consider more general dynamical systems we think that a different approach is required.
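As a numerical illustration of the kind of convergence discussed in this section, consider the simplest Kronecker system: an irrational rotation of the circle $\mathbb{R}/\mathbb{Z}$ . The sketch below is an assumption-laden toy example, not the general setting: it takes $M=(0,\infty )$ with the metric $|\log x-\log y|$ (the $1\times 1$ positive matrices), where $x\,\#_t\,y=x^{1-t}y^{t}$ and the barycenter of A is $\exp (\int \log A\,dm)$ . The observable `A`, the angle `theta`, the orbit indexing and the number of steps are hypothetical choices.

```python
import math


def geodesic_pos(x: float, y: float, t: float) -> float:
    """Geodesic x #_t y = x^(1-t) y^t in (0, inf) with metric |log x - log y|."""
    return math.exp((1.0 - t) * math.log(x) + t * math.log(y))


def A(g: float) -> float:
    """Hypothetical continuous observable on the circle R/Z."""
    return math.exp(math.cos(2.0 * math.pi * g))


def inductive_mean_along_orbit(g: float, theta: float, n: int) -> float:
    """Inductive mean S_n of the orbit values A(g + theta), ..., A(g + n*theta)."""
    mean = A((g + theta) % 1.0)
    for j in range(2, n + 1):
        mean = geodesic_pos(mean, A((g + j * theta) % 1.0), 1.0 / j)
    return mean


theta = (math.sqrt(5.0) - 1.0) / 2.0  # irrational rotation angle (assumption)
barycenter = 1.0  # exp of the integral of log A = cos(2*pi*g), which is exp(0)

# Distance in M between S_n and the barycenter for a few starting points.
errors = [
    abs(math.log(inductive_mean_along_orbit(g, theta, 2000)) - math.log(barycenter))
    for g in (0.0, 0.3, 0.77)
]
```

In this commutative example the inductive mean is just the exponential of a Birkhoff average of $\log A$ , so the small values in `errors` reflect the classical theorem; the interest of the results above lies in spaces where no such reduction is available.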
1.3 Organization of the paper
The paper is organized as follows. Section 2 is devoted to gathering together some preliminaries on Hadamard spaces, barycenters and inductive means that will be used throughout the paper. Section 3 is devoted to the proofs of our main results. In this section we also prove those results related to approximation by continuous functions in general Hadamard spaces.
2 Preliminaries
In this section we recall some results on CAT(0) spaces and barycenters, and we prove some results on inductive means that we will need later. The interested reader is referred to the monographs [Reference Bačák3–Reference Ballmann, Gromov and Schroeder5, Reference Barbaresco and Nielsen10, Reference Jost15] for more information.
2.1 CAT(0) spaces
Recall that a CAT(0) space, also known as a Hadamard space, is a complete metric space $(M,\delta )$ that satisfies the following semiparallelogram law: given $x, y \in M$ , there exists $m \in M$ satisfying
$$\delta^2(m,z)\leq \tfrac12\,\delta^2(x,z)+\tfrac12\,\delta^2(y,z)-\tfrac14\,\delta^2(x,y)$$
for all $z \in M$ . The point m is unique, and it is called the midpoint between x and y, since $ \delta (x,m)=\delta (m,y)=\tfrac 12 \delta (x,y)$ . Recall that, given a continuous curve $\gamma :[a,b]\to M$ , its length is computed as
$$L(\gamma)=\sup\sum_{i=1}^{N-1}\delta\big(\gamma(t_i),\gamma(t_{i+1})\big),$$
where the supremum is taken over all the partitions $\{t_1,\ldots ,t_N\}$ of the interval $[a,b]$ . The existence and uniqueness of midpoints give rise to a unique (continuous) geodesic $\gamma _{x,y} : [0, 1] \rightarrow M$ connecting any given two points x and y. Indeed, we first define $\gamma _{x,y}(1/2)$ to be the midpoint of x and y. Then, using an inductive argument, we define the geodesic for all dyadic rational numbers in $[0, 1]$ . Finally, by completeness, it can be extended to all $t \in [0, 1]$ . It can be proved that this curve is the shortest path connecting x and y. As we mentioned in the introduction, we will use the notation $x \#_t y$ instead of $\gamma _{x,y}(t)$ . It is not difficult to see that the points of this geodesic satisfy the following generalized semiparallelogram inequality:
$$\delta^2(x\,\#_t\,y,z)\leq (1-t)\,\delta^2(x,z)+t\,\delta^2(y,z)-t(1-t)\,\delta^2(x,y). \tag{2.2}$$
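As a consequence of this inequality the next result on the convexity of the metric is obtained (see, for example, [Reference Sturm, Auscher, Coulhon and Grigor’yan30, Corollary 2.5]).
Proposition 2.1. Given four points $a, a^{\prime }, b, b^{\prime } \in M$ , let
$$f(t)=\delta\big(a\,\#_t\,a^{\prime},\,b\,\#_t\,b^{\prime}\big).$$
Then f is convex on $[0, 1]$ ; that is,
$$\delta\big(a\,\#_t\,a^{\prime},\,b\,\#_t\,b^{\prime}\big)\leq (1-t)\,\delta(a,b)+t\,\delta(a^{\prime},b^{\prime}). \tag{2.3}$$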
Another very important result in CAT(0) spaces is the so-called Reshetnyak quadruple comparison theorem (see, for example, [Reference Sturm, Auscher, Coulhon and Grigor’yan30, Proposition 2.4]).
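Theorem 2.2. Let $(M, \delta )$ be a Hadamard space. For all $x_1, x_2, x_3, x_4 \in M$ ,
$$\delta^2(x_1,x_3)+\delta^2(x_2,x_4)\leq \delta^2(x_1,x_2)+\delta^2(x_2,x_3)+\delta^2(x_3,x_4)+\delta^2(x_4,x_1).$$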
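The dyadic construction of the geodesic from midpoints, described earlier in this subsection, can be sketched as a bisection. This is a minimal illustration under the assumption that a midpoint map is available; the function name `geodesic_from_midpoints` and the default `depth` are hypothetical, and the result is only an approximation of $\gamma _{x,y}(t)$ at a dyadic parameter within $2^{-\text{depth}}$ of t.

```python
import math
from typing import Callable, TypeVar

P = TypeVar("P")


def geodesic_from_midpoints(
    x: P, y: P, t: float, midpoint: Callable[[P, P], P], depth: int = 50
) -> P:
    """Approximate gamma_{x,y}(t) by bisecting [0, 1] with repeated midpoints."""
    if not 0.0 <= t <= 1.0:
        raise ValueError("t must lie in [0, 1]")
    lo, hi = 0.0, 1.0   # dyadic parameters bracketing t
    left, right = x, y  # the points gamma(lo) and gamma(hi)
    for _ in range(depth):
        mid_t = 0.5 * (lo + hi)
        mid_point = midpoint(left, right)  # gamma(mid_t)
        if t < mid_t:
            hi, right = mid_t, mid_point
        else:
            lo, left = mid_t, mid_point
    return left  # gamma(lo), with 0 <= t - lo <= 2**(-depth)


# Euclidean line: midpoints are arithmetic means.
euclidean_point = geodesic_from_midpoints(0.0, 10.0, 0.3, lambda p, q: 0.5 * (p + q))

# (0, inf) with metric |log p - log q|: midpoints are geometric means.
log_point = geodesic_from_midpoints(1.0, 16.0, 0.25, lambda p, q: math.sqrt(p * q))
```

The extension from dyadic parameters to all of $[0,1]$ relies on completeness; in floating-point arithmetic the loop simply stops once the bracket is shorter than the working precision.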
2.2 Barycenters in CAT(0) spaces
Let $\mathcal {B}(M)$ be the $\sigma $ -algebra of Borel sets (that is, the smallest $\sigma $ -algebra that contains the open sets). Denote by $\mathcal {P}(M)$ the set of all probability measures on $\mathcal {B}(M)$ with separable support, and for $1 \leq \theta < \infty $ , let $\mathcal {P}^{\theta }(M)$ be the set of those measures $\mu \in \mathcal {P}(M)$ such that
$$\int_M\delta^{\theta}(x,y)\,d\mu(y)<\infty$$
for some (and hence for all) $x \in M$ . By means of $\mathcal {P}^{\infty }(M)$ we will denote the set of all measures in $\mathcal {P}(M)$ with bounded support. Finally, given a measure space $(X,\mathfrak {X},\mu )$ and a measurable function $f:X\to M$ , we say that f belongs to $L^p(X,M)$ if the pushforward of $\mu $ by f belongs to $\mathcal {P}^p(M)$ ( $1\leq p\leq \infty $ ).
If $\mu \in \mathcal {P}^2(M)$ , then the usual Cartan definition of barycenter $\beta _\mu $ can be extrapolated to this setting:
$$\beta_\mu=\operatorname*{arg\,min}_{z\in M}\int_M\delta^2(z,x)\,d\mu(x).$$
The existence of a unique minimizer is guaranteed by the convexity properties of the metric. This definition can be extended to measures in $\mathcal {P}^1(M)$ . Following the ideas of Sturm in [Reference Sturm, Auscher, Coulhon and Grigor’yan30], given any point $y\in M$ , the barycenter of a measure $\mu \in \mathcal {P}^1(M)$ is defined as the unique minimizer of the functional
$$z\mapsto\int_M\big[\delta^2(z,x)-\delta^2(y,x)\big]\,d\mu(x).$$
Although the functional depends on the point y, it is easy to see that the minimizer is independent of it. Hence, the barycenter is well defined. Moreover, if $\mu \in \mathcal {P}^2(M)$ this definition coincides with Cartan’s definition. Note that in this case the quantity
$$\int_M\delta^2(\beta_\mu,x)\,d\mu(x)$$
can be thought of as a variance. Moreover, the barycenters in this case also satisfy the following inequality known as the variance inequality:
$$\delta^2(z,\beta_\mu)\leq\int_M\big[\delta^2(z,x)-\delta^2(\beta_\mu,x)\big]\,d\mu(x)\quad\text{for all } z\in M. \tag{2.5}$$
Therefore, sometimes the barycenter is considered as a nonlinear version of the expectation. For instance, this idea was used by Sturm to extend different results from probability theory to this nonlinear setting (see [Reference Sturm, Auscher, Coulhon and Grigor’yan30, Reference Tao29] and the references therein).
Special notation
As we mentioned in the introduction, we will use a special notation in the following two cases. On the one hand, let $(X,\mathfrak {X},\mu )$ be a measure space and let $f:X\to M$ be a measurable function in $L^1(X,M)$ . By means of $\beta _f$ we will denote the barycenter of the pushforward measure $f_*(\mu )$ . On the other hand, given n points $x_1,\ldots ,x_n\in M$ , by means of $\beta (x_1,\ldots ,x_n)$ we will denote the barycenter of the point measure $\mu =\delta _{x_1}+\cdots +\delta _{x_n}$ .
The main issue in dealing with barycenters is that they are difficult to compute; this may already be the case for three points. Although there exists a very rich convex theory in Hadamard spaces (see, for instance, [Reference Bačák3]), sometimes the approximation of the barycenter using convex optimization is not satisfactory. A good example of this situation is as follows.
Example 2.3. (Positive matrices)
Recall that the set of positive invertible matrices $\mathcal {M}_n(\mathbb {C})^+$ is an open cone in the real vector space of self-adjoint matrices $\mathcal {H}(n)$ . In particular, it is a differentiable manifold and the tangent spaces can be identified for simplicity with $\mathcal {H}(n)$ . The manifold $\mathcal {M}_n(\mathbb {C})^+$ can be endowed with a natural Riemannian structure. With respect to this metric structure, if $\alpha :[a,b]\to \mathcal {M}_n(\mathbb {C})^+$ is a piecewise smooth path, its length is defined by
$$L(\alpha)=\int_a^b\big\|\alpha(t)^{-1/2}\,\alpha^{\prime}(t)\,\alpha(t)^{-1/2}\big\|_2\,dt,$$
where $\|\cdot \|_2$ denotes the Frobenius or Hilbert–Schmidt norm. In this way, $\mathcal {M}_n(\mathbb {C})^+$ becomes a Riemannian manifold with non-positive curvature, and in particular a CAT(0) space. The geodesic connecting two positive matrices A and B has the following simple expression:
$$\gamma_{AB}(t)=A^{1/2}\big(A^{-1/2}BA^{-1/2}\big)^{t}A^{1/2}.$$
So, the barycenter of the measure $\mu =\tfrac 12(\delta _{A}+\delta _{B})$ is given by
$$\beta_\mu=A\,\#_{1/2}\,B=A^{1/2}\big(A^{-1/2}BA^{-1/2}\big)^{1/2}A^{1/2}.$$
However, if we add an atom to $\mu $ , there is no longer a closed formula for the barycenter (also called the geometric mean in this setting). As a consequence, simple questions such as the monotonicity of the barycenter with respect to the usual order of matrices become difficult. Using convex optimization, it is possible to construct a sequence that approximates the barycenter of a measure. However, that sequence does not contain enough information in order to prove that the barycenter is monotone. This issue, for instance, motivated intensive research with the aim of finding good ways to approximate the barycenters of more than two matrices [Reference Bini and Iannazzo7, Reference Lawson and Lim17, Reference Lim and Pálfia20]. The barycenters in this setting have attracted much attention in recent years because of their interesting applications in signal processing (see [Reference Bhatia and Karandikar6] and the references therein), and gradient or Newton-like optimization methods (see [Reference Bochi and Navas8, Reference Moakher and Zerai23]).
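Since the geodesics of $\mathcal {M}_n(\mathbb {C})^+$ are explicit, inductive means of positive matrices can be computed directly, in contrast with the barycenter of three or more matrices. The following is a minimal NumPy sketch, assuming the standard geodesic $A\,\#_t\,B=A^{1/2}(A^{-1/2}BA^{-1/2})^{t}A^{1/2}$ ; the helper names `spd_power`, `spd_geodesic` and `spd_inductive_mean` and the sample matrices are hypothetical.

```python
import numpy as np


def spd_power(a: np.ndarray, p: float) -> np.ndarray:
    """Real power of a positive definite Hermitian matrix via eigendecomposition."""
    eigenvalues, eigenvectors = np.linalg.eigh(a)
    return (eigenvectors * eigenvalues**p) @ eigenvectors.conj().T


def spd_geodesic(a: np.ndarray, b: np.ndarray, t: float) -> np.ndarray:
    """A #_t B = A^{1/2} (A^{-1/2} B A^{-1/2})^t A^{1/2}."""
    a_half = spd_power(a, 0.5)
    a_inv_half = spd_power(a, -0.5)
    middle = a_inv_half @ b @ a_inv_half
    middle = 0.5 * (middle + middle.conj().T)  # symmetrize against round-off
    return a_half @ spd_power(middle, t) @ a_half


def spd_inductive_mean(matrices: list) -> np.ndarray:
    """S_n with S_1 = A_1 and S_k = S_{k-1} #_{1/k} A_k."""
    mean = matrices[0]
    for k, matrix in enumerate(matrices[1:], start=2):
        mean = spd_geodesic(mean, matrix, 1.0 / k)
        mean = 0.5 * (mean + mean.conj().T)  # keep the iterate Hermitian
    return mean


# Hypothetical sample matrices.
A = np.array([[2.0, 1.0], [1.0, 3.0]])
B = np.array([[1.0, 0.0], [0.0, 4.0]])
G = spd_geodesic(A, B, 0.5)  # barycenter of (delta_A + delta_B) / 2
S3 = spd_inductive_mean([A, B, np.eye(2)])
```

For two matrices the midpoint `G` solves the Riccati equation $GA^{-1}G=B$ , which gives a quick consistency check. For three or more non-commuting matrices the inductive mean generally depends on the order of the matrices and need not coincide with the barycenter.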
2.3 The inductive means
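Recall that, given $a \in M^{\mathbb {N}}$ , the inductive means are defined as:
$$S_1(a)=a_1,\qquad S_n(a)=S_{n-1}(a)\,\#_{1/n}\,a_n\quad (n\geq 2).$$
As a consequence of (2.3), we directly get the following result.
Corollary 2.4. Given $a,b \in M^{\mathbb {N}}$ , then
$$\delta\big(S_n(a),S_n(b)\big)\leq\frac{1}{n}\sum_{i=1}^{n}\delta(a_i,b_i).$$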
The next lemma follows from (2.2), and it is a special case of a weighted inequality considered by Lim and Pálfia in [Reference Lim and Pálfia21].
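Lemma 2.5. Given $a \in M^{\mathbb {N}}$ and $z\in M$ , for every $k,m \in \mathbb {N}$ ,
$$\delta^2(S_{k+m}(a),z)\leq\frac{k}{k+m}\,\delta^2(S_k(a),z)+\frac{1}{k+m}\sum_{j=0}^{m-1}\Big[\delta^2(a_{k+j+1},z)-\frac{k}{k+m}\,\delta^2(S_{k+j}(a),a_{k+j+1})\Big].$$
Proof. By the inequality (2.2) applied to $S_{n+1}(a)=S_n(a)\,\#_{1/(n+1)}\,a_{n+1}$ we obtain
$$(n+1)\,\delta^2(S_{n+1}(a),z)-n\,\delta^2(S_n(a),z)\leq\delta^2(a_{n+1},z)-\frac{n}{n+1}\,\delta^2(S_n(a),a_{n+1}).$$
Summing these inequalities from $n=k$ to $n=k+m-1$ , we get that the difference
$$(k+m)\,\delta^2(S_{k+m}(a),z)-k\,\delta^2(S_k(a),z),$$
obtained from the telescopic sum of the left-hand side, is less than or equal to
$$\sum_{j=0}^{m-1}\Big[\delta^2(a_{k+j+1},z)-\frac{k+j}{k+j+1}\,\delta^2(S_{k+j}(a),a_{k+j+1})\Big].$$
Finally, using that $({k+j})/({k+j+1})\geq {k}/({k+m})$ for every $j\in \{0,\ldots ,m-1\}$ , this sum is bounded from above by
$$\sum_{j=0}^{m-1}\Big[\delta^2(a_{k+j+1},z)-\frac{k}{k+m}\,\delta^2(S_{k+j}(a),a_{k+j+1})\Big],$$
which completes the proof.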
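The weighted inequality obtained from (2.2) by telescoping can be checked numerically in the Euclidean plane, which is a CAT(0) space. The sketch below assumes the inequality in the form $\delta ^2(S_{k+m}(a),z)\leq \frac{k}{k+m}\delta ^2(S_k(a),z)+\frac{1}{k+m}\sum _{j=0}^{m-1}[\delta ^2(a_{k+j+1},z)-\frac{k}{k+m}\delta ^2(S_{k+j}(a),a_{k+j+1})]$ ; the helper names, the random sample and the reference point `z` are hypothetical.

```python
import random


def dist2(p, q):
    """Squared Euclidean distance between two points of the plane."""
    return (p[0] - q[0]) ** 2 + (p[1] - q[1]) ** 2


def inductive_means(a):
    """Return [S_1(a), ..., S_n(a)] for a list of points of the plane."""
    means = [a[0]]
    for n, point in enumerate(a[1:], start=2):
        prev, t = means[-1], 1.0 / n
        means.append(((1 - t) * prev[0] + t * point[0],
                      (1 - t) * prev[1] + t * point[1]))
    return means


def lemma_gap(a, z, k, m):
    """Right-hand side minus left-hand side of the weighted inequality."""
    s = inductive_means(a)  # s[n - 1] is S_n(a) and a[n - 1] is a_n
    weight = k / (k + m)
    lhs = dist2(s[k + m - 1], z)
    tail = sum(dist2(a[k + j], z) - weight * dist2(s[k + j - 1], a[k + j])
               for j in range(m))
    return weight * dist2(s[k - 1], z) + tail / (k + m) - lhs


rng = random.Random(0)
points = [(rng.uniform(-1, 1), rng.uniform(-1, 1)) for _ in range(30)]
z = (0.3, -0.2)
gaps = [lemma_gap(points, z, k, m) for k in range(1, 11) for m in range(1, 11)]
```

In Euclidean space (2.2) is an identity, so the only slack in `gaps` comes from the final estimate $({k+j})/({k+j+1})\geq {k}/({k+m})$ ; all entries should be non-negative up to round-off.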
Given a sequence $a \in M^{\mathbb {N}}$ , let $\Delta (a)$ denote the diameter of its image, that is,
$$\Delta(a)=\sup_{n,m\in\mathbb{N}}\delta(a_n,a_m).$$
Note that, also by (2.2), $\delta (S_n(a), a_k) \leq \Delta (a)$ for all $n, k \in \mathbb {N}$ .
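Lemma 2.6. Given $a\in M^{\mathbb {N}}$ such that $\Delta (a) < \infty $ , we have for all $k,m \in \mathbb {N}$ that
$$\frac{1}{m}\sum_{j=0}^{m-1}\delta^2(S_{k+j}(a),a_{k+j+1})\geq\frac{1}{m}\sum_{j=0}^{m-1}\delta^2(S_k(a),a_{k+j+1})-\tilde{R}_{m,k},$$
where $\tilde {R}_{m,k}=(({m^2}/{(k+1)^2}) + 2({m}/({k+1}))) \Delta ^2(a)$ .
Proof. Note that by (2.6) and for all k,
$$\delta(S_{k+1}(a),S_k(a))=\frac{1}{k+1}\,\delta(S_k(a),a_{k+1})\leq\frac{\Delta(a)}{k+1}.$$
Hence
$$\delta(S_{k+j}(a),S_k(a))\leq\sum_{i=0}^{j-1}\frac{\Delta(a)}{k+i+1}\leq\frac{j}{k+1}\,\Delta(a).$$
Therefore, for every $j\leq m$ ,
$$\delta^2(S_k(a),a_{k+j+1})\leq\big(\delta(S_k(a),S_{k+j}(a))+\delta(S_{k+j}(a),a_{k+j+1})\big)^2\leq\delta^2(S_{k+j}(a),a_{k+j+1})+\tilde{R}_{m,k},$$
where we have used that $\delta (S_{k+j}(a), a_{k+j+1}) \leq \Delta (a)$ for every $k,j\in \mathbb {N}$ . Summing up these inequalities and dividing by m, we get the desired result.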
3 Proof of the main results
3.1 Continuous case
In this section we will prove Theorem 1.3. Recall that, given a function $A: G \rightarrow M$ , we define $a^{\tau } : G \rightarrow M^{\mathbb {N}}$ by
$$a^{\tau}(g)=\{a^{\tau}_j(g)\}_{j\in\mathbb{N}},\qquad a^{\tau}_j(g)=A(\tau^{j}(g)).$$
The proof is rather long and technical, so we split it into several lemmas and a technical result, which will be combined at the end of the section to provide the proof of Theorem 1.3.
Lemma 3.1. Let $A: G\rightarrow M$ be a continuous function, and let K be any compact subset of M. For each $n\in \mathbb {N}$ , define $F_n :G \times K \rightarrow \mathbb {R}$ by
Then the family $\{F_n\}_{n \in \mathbb {N}}$ is equicontinuous.
Proof. By the triangle inequality, the map $y\mapsto \delta ^2(A(\cdot ),y)$ is continuous from $(K,\delta )$ into the set of real-valued continuous functions defined on G endowed with the uniform norm. Since K is compact, the family $\{\delta ^2(A(\cdot ),x)\}_{x\in K}$ is (uniformly) equicontinuous. Hence, given $\varepsilon>0$ , there exists $\delta>0$ such that if $d_G(g_1,g_2) < \delta $ then
for every $x\in K$ . Since $\tau $ is isometric and $d_G(g_1,g_2) < \delta $ , we get that
Let $\Delta $ be the diameter of the set $\,(\mbox {Image}(A)\times K)$ in $M^2$ . Since both sets are compact, $\Delta <\infty $ . So, take $(g_1,x_1)$ and $(g_2,x_2)$ such that $d_G(g_1,g_2)<\delta $ and $\delta (x_1,x_2)<{\varepsilon }/{4\Delta }$ . Then
Now, as a consequence of the Arzelà–Ascoli and Birkhoff theorems, we get the following proposition.
Proposition 3.2. Let $A:G\to M$ be a continuous function, and K a compact subset of M. Then
and the convergence is uniform in $(g, x) \in G \times K$ .
From now on we will fix the continuous function $A:G\to M$ . Let
$$\alpha=\min_{x\in M}\int_G\delta^2(A(g),x)\,dm(g),$$
and let ${\beta }_{\mbox{A}}$ be the point where this minimum is attained, that is, ${\beta }_{\mbox{A}}$ is the barycenter of the pushforward by A of the Haar measure in G. Then we obtain the following upper estimate.
Lemma 3.3. For every $\varepsilon> 0$ , there exists $m_0 \in \mathbb {N}$ such that, for all $m \geq m_0$ and for all $k \in \mathbb {N}$ ,
Proof. For every $\varepsilon> 0$ , there exists $m_0 \in \mathbb {N}$ such that for all $m \geq m_0$ ,
Note that $m_0$ is independent of k by Proposition 3.2. Now, by Lemma 2.5,
Since $A:G\to M$ is continuous, note that
$$C_a:=\sup_{g\in G}\Delta(a^{\tau}(g))<\infty,$$
where, as we have defined before Lemma 2.6, $\Delta (a^{\tau }(g))$ denotes the diameter of the image of the sequence $a^{\tau }(g)$ .
Lemma 3.4. For every $\varepsilon> 0$ , there exists $m_0 \in \mathbb {N}$ such that for all $m \geq m_0$ and for all $k \in \mathbb {N}$ ,
where $\displaystyle R_{m,k}= (({m^2}/{(k+1)^2}) + 2({m}/({k+1}))) C_a^2$ .
Proof. Consider the compact set
where the convex hull is in the geodesic sense. For every $\varepsilon> 0$ , there exists $m_0 \in \mathbb {N}$ such that, for all $m \geq m_0$ , by the variance inequality (2.5) and Proposition 3.2,
Finally, by Lemma 2.6,
where $\displaystyle R_{m,k}=(({m^2}/{(k+1)^2}) + 2({m}/({k+1}))) C_a^2$ .
Lemma 3.5. Given $\varepsilon> 0$ , there exists $m_0\geq 1$ such that for every $\ell \in \mathbb {N}$ ,
uniformly in $g\in G$ , where $L = \alpha + 3 C_a^2$ .
Proof. Fix $\varepsilon> 0$ . By Lemmas 3.3 and 3.4, there exists $m_0 \geq 1$ such that for all $k \in \mathbb {N}$ ,
and
Therefore, combining these two inequalities, we obtain
Consider now the particular case where $k=\ell m_0$ . Since $\displaystyle R_{m_0,\ell m_0} \leq ({3}/{\ell }) C_a^2$ , we get
Using this recursive inequality, the result follows by induction on $\ell $ . Indeed, if $\ell =1$ then
On the other hand, if we assume that the result holds for some $\ell \geq 1$ , that is,
then, combining this inequality with (3.1), we have that
Now we are ready to prove the ergodic formula for continuous functions.
Proof of Theorem 1.3
Given $\varepsilon> 0$ , by Lemma 3.5, there exists $m_0 \in \mathbb {N}$ such that
for every $\ell \in \mathbb {N}$ . Take $\ell _0 \in \mathbb {N}$ such that for all $\ell \geq \ell _0$ ,
Let $n=\ell m_0+d$ such that $\ell \geq \ell _0$ and $d\in \{1,\ldots , m_0-1\}$ . Since $x \#_{t} x = x$ for all $x \in M$ , using Corollary 2.4 with the sequences
we get
Now, taking into account that $\delta(S_{\ell m_0}(a^{\tau}(g)),a^\tau_{\ell m_0+j}(g))\leq C_a$ for every $j\in\{1,\ldots, m_0-1\}$ , we obtain that
Combining this with (3.2) we conclude that, for n big enough, $\delta(S_{n}(a^{\tau}(g)), {\beta }_{\mbox{A}})<\varepsilon$ .
3.2 Preparation for the $L^{1}$ case
The natural framework for the ergodic theorem is $L^1$ . In this section we prepare the proof of Theorem 1.4, which is the main result of this paper.
The strategy of the proof involves constructing good approximations by continuous functions, and obtaining the result for $L^1$ functions as a consequence of the theorem for continuous functions (Theorem 1.3 above). So, the first questions that arise are: what does good approximation mean and what should we require of the approximation in order to get the $L^1$ case as a limit of the continuous case? The next two lemmas contain the clues to answer these two questions. The first lemma can be found as [Reference Sturm, Auscher, Coulhon and Grigor’yan30, Theorem 6.3], and it is often called the fundamental contraction property. For the sake of completeness we include a simple proof of this fact.
Lemma 3.6. Let $(\Omega ,\mathcal {B}, P)$ be a probability space, and $A, B \in L^1(\Omega , M)$ . If
then
Remark 3.7. Recall that the definition of ${\beta }_{\mbox{A}}$ (respectively, ${\beta }_{\mbox {B}}$ ) does not depend on the chosen $y \in M$ .
Proof. By the variance inequality (2.5) we get
and the combination of these two inequalities leads to
Finally, using the Reshetnyak quadruple comparison theorem (Theorem 2.2), we obtain
which is, after some algebraic simplification, the desired result.
Lemma 3.8. Let $A, B \in L^1(G, M)$ . Given $\varepsilon>0$ , for almost every $g\in G$ there exists $n_0$ , which may depend on g, such that
provided $n\geq n_0$ .
Proof. Indeed, by Corollary 2.4,
and therefore, the lemma follows from the Birkhoff ergodic theorem.
3.3 Good approximation by continuous functions
The previous two lemmas indicate that we need a kind of $L^1$ approximation. More precisely, given $A\in L^1(G,M)$ and $\varepsilon>0$ , we are looking for a continuous function $A_\varepsilon :G\to M$ such that
$$\int_G\delta\big(A(g),A_\varepsilon(g)\big)\,dm(g)<\varepsilon.$$
In some cases there exists an underlying finite-dimensional vector space. This is the case, for instance, when M is the set of (strictly) positive matrices, or more generally, when M is a Riemannian manifold with non-positive curvature. In these cases, the function $A_\varepsilon $ can be constructed by using mollifiers. This idea was used by Karcher in [Reference Karcher16]. In the general case, we can use a similar idea.
Given $\eta>0$ , let $U_\eta $ be a neighborhood of the identity of G so that $m(U_\eta )<\eta $ , whose diameter is also less than $\eta $ . Fix any $y\in M$ , and define
$$A_\eta(g_0)=\operatorname*{arg\,min}_{z\in M}\int_{g_0+U_\eta}\big[\delta^2(A(g),z)-\delta^2(A(g),y)\big]\,dm(g). \tag{3.5}$$
Equivalently, $A_\eta (g_0)$ is the barycenter of the pushforward by A of the Haar measure restricted to $g_0+U_\eta $ . This definition follows the idea of mollifiers, replacing the arithmetic mean by the average induced by barycenters. We will prove that, as in the case of usual mollifiers, these continuous functions provide good approximation in $L^1$ (Proposition 3.12 below). With this aim, we first prove the following lemma.
Lemma 3.9. Let $A \in L^p(G, M)$ where $1 \leq p < \infty $ . The function $\varphi : G \rightarrow [0,+\infty )$ defined by
$$\varphi(h)=\int_G\delta^p\big(A(g),A(g+h)\big)\,dm(g)$$
is a continuous function.
Proof. Fix $z_0 \in M$ , and define the measure
on the Borel sets of G. By definition, $\nu $ is absolutely continuous with respect to the Haar measure m. In consequence, given $\varepsilon> 0$ , there exists $\eta> 0$ , such that, whenever a Borel set B satisfies
the inequality
holds. By the Lusin theorem [Reference Dudley11, Theorem 7.5.2], there is a compact set $C_{\eta } \subset G$ such that $m(C_{\eta }) \geq 1 - \eta /2$ and the restriction of A to $C_{\eta }$ is (uniformly) continuous.
Since m is a Haar measure, it is enough to prove the continuity of $\varphi $ at the identity. With this aim in mind, take a neighborhood of the identity U so that whenever $g_1,g_2\in C_\eta $ satisfy that $g_1-g_2\in U$ , we have
Given $h\in U$ , define $\Omega := C_{\eta } \cap (C_{\eta } + h)$ , and $\Omega ^c :=G \setminus \Omega $ . Then
where in the last identity we have used that m is shift invariant. Since $|\Omega ^c|<\eta $ we obtain that
Corollary 3.10. For every $\eta>0$ , the functions $A_\eta $ are continuous.
Proof. Indeed, by Lemma 3.6
So the continuity of $A_\eta $ is a consequence of the continuity of $\varphi $ at the identity.
The map $A\mapsto A_\eta $ has the following useful continuity property.
Lemma 3.11. Let $A, B \in L^1(G, M)$ , and $\eta> 0$ . For every $\varepsilon>0$ , there exists $\rho>0$ such that if
then the corresponding continuous functions $A_\eta $ and $B_\eta $ satisfy that
Proof. Indeed, given $\varepsilon>0$ , take $\rho =m(U_{\eta })\varepsilon $ . Then, by Lemma 3.6,
for all $g \in G$ .
We arrive at the main result on approximation.
Proposition 3.12. Given a function $A\in L^1(G,M)$ , if $A_\eta $ are the continuous functions defined by (3.5) then
$$\lim_{\eta\to 0^+}\int_G\delta\big(A(g),A_\eta(g)\big)\,dm(g)=0.$$
Proof. First, assume that $A\in L^2(G,M)$ . In this case, by the variance inequality, the inequality
holds. So, using Fubini’s theorem, we obtain
By Lemma 3.9, the function $\varphi $ is continuous. In consequence, if e denotes the identity of G, then
This proves the result for functions in $L^2(G,M)$ since, by Jensen’s inequality,
Now consider a general $A \in L^1(G, M)$ . Fix $z_0\in M$ , and for each natural number N define the truncations
For each N we have that $A^{(N)}\in L^1(G,M)\cap L^\infty (G,M)$ , and therefore it also belongs to $L^2(G,M)$ . On the other hand, since the function defined on G by $g\mapsto \delta (A(g),z_0)$ is integrable, we have that
So, if $A_\eta $ and $A^{(N)}_\eta $ are the continuous functions associated to A and $A^{(N)}$ respectively, then
Note that each term of the right-hand side tends to zero: the first by (3.7), the second by the $L^2$ case done in the first part, and the last by Lemma 3.11.
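In the simplest case $M=\mathbb {R}$ the barycenter is the arithmetic mean, so the functions $A_\eta $ reduce to moving averages and the $L^1$ approximation can be observed numerically. The following is a minimal sketch under several assumptions: $G=\mathbb {R}/\mathbb {Z}$ is replaced by a uniform grid, $U_\eta $ by a symmetric window of grid points, and A by a hypothetical step function; the names `mollify` and `l1_error` are illustrative only.

```python
import numpy as np


def mollify(values: np.ndarray, half_width: int) -> np.ndarray:
    """Average `values` (samples on a uniform grid of R/Z) over the window of
    2 * half_width + 1 neighbouring grid points, with wrap-around."""
    total = np.zeros_like(values, dtype=float)
    for shift in range(-half_width, half_width + 1):
        total += np.roll(values, shift)
    return total / (2 * half_width + 1)


n_grid = 1000
grid = np.arange(n_grid) / n_grid
step = np.where(grid < 0.5, 0.0, 1.0)  # a discontinuous A: G -> R


def l1_error(half_width: int) -> float:
    """Discrete version of the integral of |A - A_eta| over G."""
    return float(np.mean(np.abs(step - mollify(step, half_width))))


# Shrinking windows play the role of eta -> 0.
errors = [l1_error(w) for w in (100, 25, 5)]
```

The averaged function differs from the step function only near the two jumps, so the discrete $L^1$ error shrinks proportionally to the window width.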
3.4 The $L^{1}$ case and almost everywhere convergence
Let $\varepsilon>0$ . For each $k\in \mathbb {N}$ , let $A_k$ be a continuous function such that
By Lemma 3.8, we can take a set of measure zero $N\subseteq G$ such that if we take $g\in G\setminus N$ and $k\in \mathbb {N}$ , there exists $n_0$ , which may depend on g and k, such that
provided $n\geq n_0$ . In this expression, $a_{(k)}^{\tau }$ is the sequence defined in terms of $A_k$ and $\tau $ as in (1.6). Fix $g\in G\setminus N$ . Taking k so that $1/k<\varepsilon /4$ , we get that
for every $n\geq n_0$ . By Lemma 3.6, we also have that $ \delta ({\beta }_{\mbox{A}}, {\beta }_{\mbox {A_k}}) \leq {\varepsilon }/{4}, $ where
Finally, by Theorem 1.3, there exists $n_1\geq 1$ such that, for every $n\geq n_1$ ,
Combining all these inequalities, we obtain that
which concludes the proof.
3.5 The $L^p$ results
We conclude this section by proving the $L^p$ ergodic theorems.
Theorem 3.13. Let $1 \leq p < \infty $ and $A\in L^p(G, M)$ . Then
$$\lim_{n\to\infty}\int_G\delta^p\big(S_n(a^{\tau}(g)),{\beta }_{\mbox{A}}\big)\,dm(g)=0.$$
Proof. Let us define the following measure on the Borel sets of G:
By definition, $\nu $ is absolutely continuous with respect to the Haar measure m. In consequence, given $\varepsilon> 0$ , there exists $\eta> 0$ such that, whenever a Borel set B satisfies
we have that
By Egoroff’s theorem, as
converge almost everywhere on a finite measure space, there exists a set $C_{\eta } \subset G$ with $m(C_{\eta }) < \eta $ such that
uniformly on $G \setminus C_\eta .$
On the other hand,
Therefore, by Jensen’s inequality,
Now, there exists $n_0 \in \mathbb {N}$ such that, for all $n \geq n_0$ ,
as a consequence of (3.10). Therefore
On the other hand, taking integral over $C_\eta $ in (3.11), we obtain
Since $\displaystyle \int _{C_{\eta }} \,d{\kern-0.6pt}m(g) < \eta , $ by (3.9),
So, for all $n \in \mathbb {N}$ ,
Finally, combining these two bounds, given $\varepsilon $ , there exists $n_0 \in \mathbb {N}$ such that, for all $n \geq n_0$ ,
which concludes the proof.
Acknowledgements
This work was supported by the Consejo Nacional de Investigaciones Científicas y Técnicas, Argentina (PIP-152), the Agencia Nacional de Promoción de Ciencia y Tecnología, Argentina (PICT 2015-1505), the Universidad Nacional de La Plata, Argentina (UNLP-11X585), and the Ministerio de Economía y Competitividad, Spain (MTM2016-75196-P). The authors would like to thank Enrique Pujals for fruitful conversations on an earlier version of this paper. We would also like to thank the referee for his/her careful reading of the manuscript and for the suggestions and corrections, which have helped us to improve the paper.