1. Introduction
We denote by $T:M \to M$ a transformation acting on the metric space M, which is either the shift σ acting on $M=\{1,2, \ldots,d\}^{\mathbb{N}}$, or, T is the action of a d to 1 expanding transformation $T:S^1 \to S^1$, of class $C^{1+\alpha}$, where $M=S^1$ is the unit circle.
For a fixed α > 0, we denote by Hol the set of α-Hölder functions on M.
For a Hölder potential $A: M \to \mathbb{R}$, we define the Ruelle operator (sometimes called transfer operator) – which acts on Hölder functions $f: M \to \mathbb{R}$ – by
It is known (see for instance [Reference Parry and Pollicott18] or [Reference Baladi2]) that $\mathscr{L}_A$ has a positive, simple leading eigenvalue λA with a positive Hölder eigenfunction hA. Moreover, the dual operator acting on measures $\mathscr{L}_A^\ast$ has a unique eigenprobability νA which is associated to the same eigenvalue λA.
Given a Hölder potential A, we say that the probability µA – defined on the Borel sigma-algebra of M – is the equilibrium probability for A, if µA maximizes the values
among Borel T-invariant probabilities µ and where $h(\mu)$ is the Kolmogorov–Sinai entropy of µ.
The theory of thermodynamic formalism shows that the probability µA is unique and is given by the expression $\mu_A = h_A \nu_A$.
In some particular cases, the equilibrium probability (also called Gibbs probability) µA is the one observed on the thermodynamical equilibrium in the Statistical Mechanics of the one-dimensional lattice $\mathbb{N}$ (under an interaction described by the potential A). As an example (where the spin in each site of the lattice $\mathbb{N}$ could be + or −) one can take $M=\{+,-\}^{\mathbb{N}}$, $A: M \to \mathbb{R}$ and T is the shift.
Taking into account the above definitions, we say that a Hölder potential A is normalized if $\mathscr{L}_A 1 =1.$ In this case, $\lambda_A=1$ and $\mu_A=\nu_A$.
Two potentials $A, B$ in Hol will be called cohomologous to each other (up to a constant), if there exists a continuous function $g:M \to \mathbb{R}$ and a constant c, such that,
Note that the equilibrium probability for A, respectively B, is the same if A and B are coboundaries to each other. In each coboundary class (an equivalence relation), there exists a unique normalized potential A (see [Reference Parry and Pollicott18]). Therefore, the set of equilibrium probabilities for Hölder potentials $\mathcal{N}$ can be indexed by Hölder potentials A which are normalized. We will use this point of view here: $A \leftrightarrow \mu_A$.
The infinite-dimensional manifold $\mathcal{N}$ of Hölder equilibrium probabilities µA is an analytic manifold (see [Reference Ruelle22], [Reference da Silva, da Silva and Souza8], [Reference Parry and Pollicott18], [Reference Chae6]) and it was shown in [Reference Giulietti, Kloeckner, Lopes and Marcon10] that it carries a natural Riemannian structure. In order to provide a context for our main result, let us review first some of the main properties of this infinite-dimensional manifold and some definitions described on [Reference Giulietti, Kloeckner, Lopes and Marcon10].
The set of tangent vectors X (a function $X: M \to \mathbb{R}$) to $\mathcal{N}$ at the point µA coincides with the kernel of $\mathscr{L}_A $. The Riemannian norm $|X|=|X|_{\mu_A}$ of the vector X, which is tangent to $\mathcal{N}$ at the point µA, is described (see Theorem D in [Reference Giulietti, Kloeckner, Lopes and Marcon10]) via the asymptotic variance, that is, satisfies
The associated bilinear form on the tangent space at the point µA can be described (see Theorem D in [Reference Giulietti, Kloeckner, Lopes and Marcon10]) by
This bilinear form is positive semi-definite and in order to make it definite one can consider equivalence classes (cohomologous up to a constant) as described by Definition 5.4 in [Reference Giulietti, Kloeckner, Lopes and Marcon10]. In this way, we finally get a Riemannian structure on $\mathcal{N}$ (as anticipated in some paragraphs above). Elements X on the tangent space at µA have the property $\int X \,\mathrm{d} \mu_A=0.$ The tangent space to $\mathcal{N}$ at µA is denoted by $T_{A}\mathcal{N}$.
Given a normalized potential A let $\{X_i \}$ be an orthonormal basis of $T_{A}\mathcal{N}$, $i \in \mathbb{N}$.
Our main result is:
Theorem 1.1. Let A be a normalized potential, and let $\{X_i \}$ be an orthonormal basis of $T_{A}\mathcal{N}$. Let $X=X_{1}, Y= X_{2}$, then the sectional curvature $K(X,Y)$ is given by
The expression of $K(X,Y)$ applies of course to any pair of vectors in the basis $\{X_{i}\}$, and we can always change the enumeration of the vectors in the basis without changing the basis. The work consists of two distinct parts: the first part, from § 2 to 5, has a more geometric nature and deals with the calculation of the Levi-Civita connection and the curvature tensor. This estimate becomes quite complex because we are dealing with an infinitely dimensional Riemannian manifold. Our goal was to express the sectional curvature for sections on the tangent space at µA in terms of integrals of functions with respect to µA. An important tool which will be used here is item (iv) on Theorem 5.1 in [Reference Giulietti, Kloeckner, Lopes and Marcon10]: for all normalized $A\in\mathcal{N}$, $X \in T_{A}\mathcal{N}$ and φ a continuous function it holds:
In § 4.3, we describe the expression of sectional curvature $K(X,Y)$ in terms of the calculus of thermodynamic formalism.
The nature of the second part of the paper, from § 6 to 9, is more dynamic, analytical and considers $M=\{0,1\}^{\mathbb{N}}$. We denote by $\mathcal{K}$ the set of stationary Markov probabilities taking values in $\{0,1\}$. The set of shift invariant probabilities $\mu\in \mathcal{K}$ is contained in $ \mathcal{N}$. The probabilities µ are defined on the space $\{0,1\}^{\mathbb{N}}$. The two-dimensional manifold $\mathcal{K}$ is the set of equilibrium probabilities for potentials A depending on the two first coordinates (see [Reference Parry and Pollicott18]), that is, when $A(x_1,x_2,x_3, \ldots,x_n, \ldots)= A(x_1,x_2).$
For each point µA in $\mathcal{K}$, we are able to exhibit a special orthonormal basis $\{\hat{a}_y\}$ for the tangent space $T_{A}\mathcal{N}$, indexed by finite words y on the alphabet $\{0,1\}$ (see expression (28)). This orthonormal family will be denoted by $\mathcal{F}.$ We focus, for each point in $\mathcal{K}$, on the sectional curvatures for pairs of vectors on $\mathcal{F}$. We get explicit results in this case. This second part of the article is perhaps the more technical and subtle part; after some computations, we will get the explicit expression for sectional curvature $K(\hat{a}_x,\hat{a}_z)$ (see expression (45) in Theorem 7.7 and Propositions 7.9 and 7.12).
A remarkable fact appearing in the proof of Theorem 1.1 is that the expression (5) of the sectional curvature $K(\hat{a}_x,\hat{a}_z)$ is actually a sum of a finite number of parcels (see expression (45) in Theorem 7.7 and Remark 7.11).
We highlight some properties that will be demonstrated in the future and that describe the eventual values of the sectional curvature $K(\hat{a}_x,\hat{a}_z)$ depending on the pair of vectors $\hat{a}_x,\hat{a}_z$ and the point in $\mathcal{K}$ under consideration.
1. Each vector $ \hat{a}_y$ is a function which is constant in cylinders of finite size (see expressions (28) and (25)). More precisely, given a finite word $y=(y_1,y_2, \ldots,y_n)$, $n \geq 1$, we denote by $[y]=[y_1,y_2, \ldots,y_n]$ the associated cylinder set in $\{0,1\}^{\mathbb{N}}$. The function $\hat{a}_y$ is constant in each of the cylinder sets $[a,y_1,y_2, \ldots,y_n,b]$, where $a,b=0,1$. The support of $\hat{a}_y$ is the union of these cylinder sets. In this way if the word y has large length, then the support of $\hat{a}_y$ is contained on very small sets. We will have to consider the empty word which will give rise to two tangent vectors $\hat{a}_{\emptyset}^0$ and $\hat{a}_{\emptyset}^1$, which are functions with support on cylinders of size two.
2. The values $K(\hat{a}_x,\hat{a}_z)$ can be positive or negative depending on the point in $\mathcal{K}$ and the words x and z (see Example 7.19).
3. We say that z is a subprefix of x, if x and z satisfy
where $n \geq k$. If x and z do not begin with the same letter (do not share a common subprefix), then $K(\hat{a}_x,\hat{a}_z)=0$ (see Proposition 7.10). As an example take $x=(0,1,1,0)$ and $z=(1,1,0)$.
4. Words x and z with large length can eventually produce extremely negative curvature $K(\hat{a}_x,\hat{a}_z)$. This may happen when x and z have several common subprefixes. This is due to expression (45). As an example take $x=(0,1,1,0,0,1)$ and $z=(0,1,1,0,0,0,1)$. But even in this case, it is possible to get positive curvature depending on the point in $\mathcal{K}$ (see Example 7.19 for a discussion in a particular case).
5. We also show that if µA (a point in $\mathcal{K}$) corresponds to the measure of maximal entropy on $\{0,1\}^{\mathbb{N}}$, most of the sectional curvatures $K(\hat{a}_x,\hat{a}_z)$ are equal to $-1/2$ (see Proposition 7.16). Proposition 7.18 shows, in this case, an example where the sectional curvature $K(\hat{a}_{[\emptyset]}^0,\hat{a}_0)=1/2$. The different possibilities also include the case $K(\hat{a}_{\emptyset}^0,\hat{a}_{\emptyset}^1)=0$.
6. Considering the two-dimensional manifold $\mathcal{K}$ (of the Markov invariant probabilities), it is natural to consider that vectors on $T M$ should be functions depending on two coordinates. In our setting, the corresponding elements on the basis $\mathcal{F}$ are $\hat{a}_{\emptyset}^0$ and $\hat{a}_{\emptyset}^1$. We show that for any points in $\mathcal{K}$ the sectional curvature $K(\hat{a}_{\emptyset}^0,\hat{a}_{\emptyset}^1)=0$ (see Theorem 7.14). In this way, considering $\mathcal{K}$ as a surface in itself, we get that $\mathcal{K}$ is a flat surface (see Remark 7.15).
In [Reference McMullen17] , [Reference Bridgeman, Canary and Sambarino5] and [Reference Pollicott and Sharp21], the authors consider a similar kind of Riemannian structure. The bilinear form considered in [Reference McMullen17] is the one we consider here divided by the entropy of µA. As mentioned in Section 8 in [Reference Giulietti, Kloeckner, Lopes and Marcon10] in that case, the curvature can be positive and also negative in some parts.
The main motivation for the results obtained on [Reference McMullen17] (and also [Reference Bridgeman, Canary and Sambarino5]) is related to the study of a particular norm on the Teichmüller space.
The results presented in [Reference Giulietti, Kloeckner, Lopes and Marcon10] and here are related to the topic of Information Geometry (see [Reference Amari1] for general results on the subject) and this is described in Section 5 in [Reference Lopes and Ruggiero14]. We point out that in the setting of thermodynamic formalism the asymptotic variance is the Fisher information (see Definition 4.3 and Proposition 4.4 in [Reference Ji11]). Results about Kullback–Leibler divergence on thermodynamic formalism appeared recently in [Reference Lopes and Mengue13].
General references for analyticity (and inverse function theorems and implicit function theorems) in Banach spaces are [Reference Chae6] and [Reference Whittlesey23].
A reference for general results in infinite-dimensional Riemannian manifolds is [Reference Biliotti and Mercuri3].
In Section 6 in [Reference Giulietti, Kloeckner, Lopes and Marcon10], it is explained that the Riemannian metric considered here is not compatible with the 2-Wasserstein Riemannian structure on the space of probabilities.
We would like thanks to Paulo Varandas, Miguel Paternain and Gonzalo Contreras for helpful conversations on questions related to the topics considered in this paper.
We thank the referee for extremely careful reading and criticism of previous versions of our paper. Related results appear in [Reference Lopes and Ruggiero15].
2. Preliminaries of Riemannian geometry
Let us introduce some basic notions of Riemannian geometry. Given an infinite-dimensional $C^{\infty}$ manifold (M, g) equipped with a smooth Riemannian metric g, let $T M$ be the tangent bundle and $T_{1} M$ be the set of unit norm tangent vectors of (M, g), the unit tangent bundle. Let $\chi(M)$ be the set of $C^{\infty}$ vector fields of M.
In [Reference Biliotti and Mercuri3], several results for Riemannian metrics on infinite-dimensional manifolds are presented. We will not use any of the results of that paper.
The only infinite-dimensional manifold we will be interested in here is $\mathcal{N}$ which is the set of Hölder equilibrium probabilities (which was initially defined in [Reference Giulietti, Kloeckner, Lopes and Marcon10]). Tangent vectors, differentiability, analyticity, etc., should be always considered in the sense of the setting described in Sections 2.3 and 5.1 in [Reference Giulietti, Kloeckner, Lopes and Marcon10] (see also [Reference Bomfim, Castro and Varandas4] and [Reference da Silva, da Silva and Souza8]). We will elaborate on this later.
So in our case, $M= \mathcal{N}$, and g is the L 2 metric, $g_{A}(X,Y) = \int X Y\,\mathrm{d}\mu_{A}$.
For practical purposes, we shall call Energy the function $E(v) = g(v,v)$, $v \in T \mathcal{N}$, although in mechanics the energy is rather defined by $\frac{1}{2}g(v,v)$.
Given a smooth function $f :\mathcal{N} \longrightarrow \mathbb{R}$, the derivative of f with respect to a vector field $X \in \chi (\mathcal{N} )$ will be denoted by X(f). The Lie bracket of two vector fields $X, Y \in \chi(\mathcal{N} )$ is the vector field whose action on the set of functions $f: \mathcal{N} \longrightarrow \mathbb{R}$ is given by $[X,Y](f) = X(Y(f)) - Y(X(f))$.
The Levi-Civita connection of $(\mathcal{N} ,g)$, $\nabla : \chi(\mathcal{N} )\times \chi(\mathcal{N} ) \longrightarrow \chi(\mathcal{N} )$, with notation $\nabla(X,Y) = \nabla_{X}Y$, is the affine operator characterized by the following properties:
(1) Compatibility with the metric g:
\begin{equation*} Xg(Y,Z) = g(\nabla_{X}Y, Z) + g(Y, \nabla_{X}Z) \end{equation*}for every triple of vector fields $X, Y, Z$.
(2) Absence of torsion:
\begin{equation*} \nabla_{X}Y - \nabla_{Y}X = [X,Y].\end{equation*}(3) For every smooth scalar function f and vector fields $X,Y \in \chi(\mathcal{N} )$, we have
• $ \nabla_{fX}Y = f\nabla_{X}Y$,
• Leibniz rule: $ \nabla_{X}(fY) = X(f)Y + f\nabla_{X}Y$.
The expression of $\nabla_{X}Y$ can be obtained explicitly from the expression of the Riemannian metric, in dual form. Namely, given two vector fields $X, Y \in \chi(\mathcal{N} )$ and $Z \in \chi(\mathcal{N} )$, we have
2.1. Curvature tensor and sectional curvatures
We follow [Reference do Carmo9] for the definitions in the subsection. To simplify the notation, from now on, we shall adopt the convention $g(X,Y) = \langle X, Y \rangle$. The curvature tensor
is defined in terms of the Levi-Civita connection as follows
The sectional curvature of the plane generated by two vector fields $X, Y$ at the point $A \in \mathcal{N}$, which are orthonormal at A, is given by
Let A be a normalized Hölder potential. Let us consider a local smooth surface $S(t,s)$, for $\mid t \mid, \mid s \mid \leq \epsilon$ small, tangent to the plane $ \{A + tX + sY\} $ generated by $X, Y$ at the point $A = S(0,0)$. Let $\bar{X}$, $\bar{Y}$ be the coordinate vector fields of the surface and suppose that $\bar{X}_{A} =X$, $\bar{Y}_{A} =Y$. In § 4.2, we shall exhibit such local surfaces.
Lemma 2.1. The expression of the sectional curvature of the plane generated by the two orthonormal vectors $X, Y$ is
Proof. The fact that $\bar{X}$ and $\bar{Y}$ commute implies that $\nabla_{\bar{X}}\bar{Y} = \nabla_{\bar{Y}}\bar{X}$ and
The first term of $\langle \mathcal{R}(\bar{X},\bar{Y})\bar{X},\bar{Y} \rangle$ gives
The second term of the formula gives
Subtracting the second term from the first one we obtain the lemma.
3. The analytic structure of the set of normalized potentials
Definition 3.1. Let $ (X, | .|)$ and $(Y, |.|)$ Banach spaces and V an open subset of $ X.$ Given $k\in \mathbb{N}$, a function $F : V\to Y$ is called k-differentiable in x, if for each $j=1, \ldots, k$, there exists a j-linear bounded transformation
such that,
where
By definition F has derivatives of all orders in V, if for any $x\in V$ and any $k\in \mathbb{N}$, the function F is k-differentiable in x.
Definition 3.2. Let $X, Y$ be Banach spaces and V an open subset of X. A function $F : V \to X$ is called analytic on V, when F has derivatives of all orders in V, and for each $x \in V$ there exists an open neighbourhood Vx of x in V, such that, for all $v\in V_x$, we have that
where $D^j F(x)v^j = D^j F(x)(v, \ldots, v) $ and $ D_j F(x) $ is the j-th derivative of F in x.
Above we use the notation of Section 3.2 in [Reference da Silva, da Silva and Souza8].
$\mathcal{N}$ can be expressed locally in coordinates via analytic charts (see [Reference Giulietti, Kloeckner, Lopes and Marcon10]).
3.1. Some more estimates from thermodynamic formalism
Given a potential $B \in \text{Hol} $, we consider the associated Ruelle operator $\mathscr{L}_B$ and the corresponding main eigenvalue λB and eigenfunction hB.
The function
describes the projection of the space of potentials B on Hol onto the analytic manifold of normalized potentials $\mathcal{N}$.
We identify below $T_A \mathcal{N}$ with the affine subspace $\{A + X : X \in T_A \mathcal{N}\}.$
The function Π is analytic (see [Reference Giulietti, Kloeckner, Lopes and Marcon10]) and therefore has first and second derivatives. Given the potential B, then the map $D_B \Pi : T_{B}\mathcal{N} \longrightarrow T_{\Pi(B)}\mathcal{N} $ given by
should be considered as a linear map from Hol to itself (with the Hölder norm on Hol). Moreover, the second derivative $D^2_B \Pi$ should be interpreted as a bilinear form from Hol × Hol to Hol and is given by
We denote by $||A||_\alpha$ the α-Hölder norm of an α-Hölder function A.
When B is normalized the eigenvalue is 1 and the eigenfunction is equal to 1. We would like to study the geometry of the projection Π restricted to the tangent space $T_{A}\mathcal{N}$ into the manifold $\mathcal{N}$ (namely, to get bounds for its first and second derivatives with respect to the potential viewed as a variable) for a given normalized potential A.
The space $T_{A}\mathcal{N}$ is a linear subspace of functions and the derivative map $D \Pi$ is analytic when restricted to it.
We denote by $E_0 =E_0^A$ the set of Hölder functions g, such that, $\int g \,\mathrm{d} \mu_A =0,$ where µA is the equilibrium probability for the normalized potential A. Note that $E_0^A$ is contained in $T_{A} (\mathcal{N}).$
Most of the claims of the next Lemma are based mainly on results of [Reference Giulietti, Kloeckner, Lopes and Marcon10] (see also [Reference da Silva, da Silva and Souza8], [Reference Bomfim, Castro and Varandas4]).
Lemma 3.3. Let $\Lambda : \text{Hol} \longrightarrow \mathbb{R}$, $H : \text{Hol} \longrightarrow \text{Hol}$ be given, respectively, by $ \Lambda (B) = \lambda_{B},$ $ H(B) = h_{B}$. Then we have
(1) The maps Λ, H and $A \longrightarrow \mu_{A}$ are analytic.
(2) For a normalized B, we get that $D_{B}\log(\Lambda) (\psi) = \int \psi \,\mathrm{d}\mu_{B}.$
(3) $ D^{2}_{B}\log(\Lambda) (\eta, \psi) = \int \eta \psi \,\mathrm{d}\mu_{B},$ where $\psi , \eta $ are at $T_{B}\mathcal{N}$, and B is normalized.
(4) If A is a normalized potential, then for every function $X \in T_{A}\mathcal{N}$, we have
• $\int X \,\mathrm{d}\mu_{A} =0$.
• $D_{A}\Pi(X) = X$.
In order to simplify the notation, from now on, unless is necessary for the understanding, we will denote $(I - \mathscr{L}_{T,A}|_{E_0^A})^{-1}$ by $ (I - \mathscr{L}_{T,A})^{-1}.$
Items (2) and (3) are taken from Theorem D in [Reference Giulietti, Kloeckner, Lopes and Marcon10]. Item $\int X \,\mathrm{d}\mu_{A} =0$ in (3) follows from Theorem A and Corollary B in [Reference Giulietti, Kloeckner, Lopes and Marcon10], and the other item in (4) is trivial.
The analyticity of Λ and H of the item (1) are well-known facts (see Chapter 4 in [Reference Parry and Pollicott18] or Corollary B in [Reference Giulietti, Kloeckner, Lopes and Marcon10]) which was also proved in [Reference Bomfim, Castro and Varandas4].
The law that takes a Hölder potential B to its normalization A is differentiable according to Section 2.2 in [Reference Giulietti, Kloeckner, Lopes and Marcon10].
Note that the derivative linear operator $X \to D_{A}H(X)$ is zero when A is normalized.
Remark 1: Item (1) above means that for a fixed Hölder function f the map $A \to \int f \,\mathrm{d} \mu_A$ is differentiable on A (see Theorem B in [Reference Bomfim, Castro and Varandas4]).
Questions related to second derivatives on thermodynamic formalism are considered in [Reference Ma and Pollicott16], [Reference Petkov and Stoyanov19] and [Reference Pollicott and Sharp21].
4. Evaluating the sectional curvatures of the Riemannian metric
The goal of the section is to calculate the sectional curvature $K(X,Y)$ of the plane generated by two orthogonal vector fields tangent to $A \in \mathcal{N}$ applying the calculus of thermodynamic formalism. We start with a technical result that is a consequence of formula 6. This lemma will be extensively used in the article.
4.1. Leibniz rule of differentiation
Lemma 4.1. Let $A \in \mathcal{N}$ and let $\gamma: (-\epsilon, \epsilon) \longrightarrow \mathcal{N}$ be a smooth curve such that $\gamma(0) =A$. Let $X(t) = \gamma'(t)$, and let Y be a smooth vector field tangent to $\mathcal{N}$ defined in an open neighbourhood of A. Denote by $Y(t)= Y(\gamma(t))$. Then the derivative of $\int Y(t) \,\mathrm{d}\mu_{\gamma(t)}$ with respect to the parameter t is
for every $ t\in (-\epsilon, \epsilon)$.
Proof. The idea of the proof is very simple and based on the fact that the function $Q: \chi(\mathcal{N})\times m_{T} \longrightarrow \mathbb{R}$ given by
is a bilinear form, where $\chi(\mathcal{N})$ is the set of C 1 vector fields tangent to $\mathcal{N}$ and mT is the set of invariant measures of the map T. So the derivative of a function of the type $Q(X(t), \mu(t))$ satisfies a sort of Leibniz rule. Let us check.
Let us calculate the derivative at t = 0, for every other $t \in (-\epsilon, \epsilon)$, the calculation is analogous. We have
where in the last step we use the fact that the derivative with respect to t only depends on the vector X(0) and not on the curve through A tangent to X(0). By Equation (6), the second term in the above equality is just $\frac{\mathrm{d}}{\mathrm{d}t} \int Y(0) \,\mathrm{d}\mu_{A+tX(0)} \mid_{t=0}$, which equals $\int X(0) Y(0) \,\mathrm{d}\mu_{A }$. This finishes the proof of the lemma.
From now on, we shall adopt the notation $\frac{\partial Y}{\partial t}= Y' = Y_{t}$; the second one applies when there is only one parameter involved in the calculations, and the third one will be used otherwise.
4.2. Auxiliary local surfaces in $\mathcal{N}$
Next, given a normalized potential A and $X, Y$ orthonormal vector in the tangent space of A, we proceed to construct a local surface $S(t,s)$, $\mid t \mid, \mid s \mid \lt \epsilon$ small, such that $S(0,0) =A$, and the tangent space of $S(t,s)$ at A is the plane generated by $X, Y$. Let us consider the plane
where $t, s, \in \mathbb{R} $, that is a subset of $T_{A} \mathcal{N}$, and let Π be the projection into $\mathcal{N}$ defined in Equation (10). The vector fields $X_{P(t,s) } = \frac{\partial}{\partial t} P(t,s) = X$, $Y_{P(t,s)} = \frac{\partial}{\partial s} P(t,s) = Y$ are tangent to the plane P of course.
Let $S(t,s) = \Pi(P(t,s))$. By Lemma 3.3 item (5), the restriction of the map Π to the plane $P(t,s)$ is a local diffeomorphism onto its image, so there exists ϵ > 0 small such that $S(t,s)$ is an analytic embedding of the rectangle $\{\mid t \mid \lt \epsilon\} \times \{\mid s \mid \lt \epsilon \}$.
The coordinate vector fields of $S(t,s)$ are $\bar{X}_{S(t,s)} = \frac{\partial }{\partial t}( \Pi(P(t,s))) = D_{P(t,s)} \Pi(X)$, $\bar{Y}_{S(t,s)} = \frac{\partial }{\partial s}( \Pi (P(t,s)) = D_{P(t,s)} \Pi(Y)$, so $\bar{X}, \bar{Y}$ are extensions of $X, Y$.
Moreover, we have the following result from thermodynamic formalism (for derivatives of high order see (3.4) in [Reference Ma and Pollicott16]):
Lemma 4.2. Suppose $\psi:\{1,2, \ldots,d\}^{\mathbb{N}} \to \mathbb{R}$ is Hölder, normalized and µ denotes the associated equilibrium probability. Assume also that the Hölder function ϕ satisfies $\mathscr{L}_\psi (\phi)=0$. Denote by λt and wt, $t\in \mathbb{R}$, respectively, the eigenvalue and the eigenfunction for the Ruelle operator $\mathscr{L}_{\psi + t \phi}$. Then, we have
(1) The derivative of wt satisfies
(11)\begin{equation} \frac{\mathrm{d}}{\mathrm{d}t} w_t(x)|_{t=0}=c, \text{for all}\, x, \end{equation}for some constant c.
(2) Moreover, as ψ is normalized
(12)\begin{equation} \frac{\mathrm{d}}{\mathrm{d}t} \log (w_t(x)|_{t=0})=c, \text{for all}\, x. \end{equation}(3) Suppose $\overline{X}$ is an analytic vector field, extending the tangent vector X, defined in a neighbourhood of ϕ. Let $\gamma : (-\epsilon, \epsilon)\to \mathcal{N}$ be an integral curve of $\overline{X}$, with $\gamma(0)=\phi$, and let wt be the curve of eigen functions for the Ruelle operator of $\gamma(t)$. Then,
(13)\begin{equation} \frac{\mathrm{d}}{\mathrm{d}t} w_t(x)=c_t, \text{for all}\, x, \end{equation}is a curve of constant functions which is analytic on t.
(4) For any tangent vector X (in the kernel of the Ruelle operator), the directional derivative
(14)\begin{equation} D_{\psi}H (X) =c_X = D_{\psi}\log H (X), \end{equation}where cX depends on X and ψ.
(5) From Equation (11), we get
(15)\begin{equation} \frac{\mathrm{d}}{\mathrm{d}t} w_t(T(x))|_{t=0}=c, \text{for all}\, x, \end{equation}and for the same constant c of Equation (11).
Proof. We are going to take derivative on the Hölder direction ϕ. Assume that ϕ satisfies $\mathscr{L}_\psi (\phi)=0,$ which implies that $\int \phi \,\mathrm{d} \mu=0.$ This is so because iterates of a function under the Ruelle–Perron–Frobenius operator converge to the integral of that function against the eigenmeasure.
Denote by $w(t,x)=w_t(x)$, the normalized eigenfunction for $\mathscr{L}_{\psi + t \phi}$ associated with the eigenvalue λt. That is
Taking derivative on t:
Therefore, for all x, when t = 0, we get
On the other hand, for all x and t
Then, taking t = 0,
Denote $g(x) = \frac{\mathrm{d}}{\mathrm{d}t} w(t,.)|_{t=0} (x) .$
Then, $\forall x$, we get from the above and Equation (16)
for the normalized potential ψ. But, the only continuous eigenfunctions for $\mathscr{L}_{\psi }$, which are associated to the eigenvalue 1 are the constant functions.
Therefore, there exists c such that $ \frac{\mathrm{d}}{\mathrm{d}t} w_t(x)|_{t=0}=c$, for all x.
As, for all t and x
we get Equation (12).
Equation (14) follows at once from the above.
Expression (13) is obtained in the same way as it was derived Equation (11) (applying the argument for each value t), and, finally, Equation (15) follows trivially from Equation (11).
We will use the above result on the next lemma.
Lemma 4.3. The derivatives with respect to $t , s$ of the coordinate vector fields $\bar{X}$, $\bar{Y}$ at the point A (a normalized potential) are
(1) $\frac{\partial }{\partial t} \bar{X} =\frac{\partial }{\partial s} \bar{Y} = - 1 $
(2) $\frac{\partial }{\partial s} \bar{X} = \frac{\partial }{\partial t}\bar{Y} = 0$.
Proof. We assume that the tangent vector is Hölder and in the kernel of the Ruelle operator $\mathscr{L}_A$. The proof of the lemma will be a direct consequence of Lemma 4.2 taking $\psi=A$ and $\phi=X$. We will prove first the item (1) above.
The local surface $S(t,s)$ is contained in the manifold of normalized potentials, and we denote, respectively, the corresponding eigenvalue by $\lambda _{S(t,s)}$ and the associated eigenfunction by $h_{S(t,s)}$ (of the Ruelle operator associated with $S(t,s)$).
Let I be the identity map. The expression of the projection Π (Equation (10)) is
By definition, we have
Lemma 3.3 grants that all the functions involved in the expression of Π are differentiable, so we get at the point t = 0,
The first term gives at t = 0,
since X does not depend on t.
Claim 1:
The second and third term cancel due to Equations (11) and (15).
Indeed, the curves
coincide by Equations (11) and (15) with the expression
for each t, where cX is given in Lemma 4.2. These curves are analytic and therefore differentiable, so their derivatives with respect to t coincide. Since derivatives $\alpha'(t)$, $\beta '(t)$ appear with opposite signs in Equation (18), they add up to zero in this formula. This proves the Claim.
Finally, the fourth line of Equation (18) gives by Lemma 3.3 item (3),
since X has L 2 norm equal to 1. The same argument applies replacing X by Y in the above proof, so this finishes the proof of item (1).
Item (2) follows the same type of reasoning and using Equation (13). By definition, we have
This expression, according to Equation (18), is
The first term gives at t = 0,
since X does not depend on $t,s$. The fourth term is, by Lemma 3.3 item (3),
As for the second and third terms, we have
Claim 2:
The proof goes as in Claim 1, letting
we have by Lemma 4.2 items (3) and (5) that $\alpha_{s}(t) = \beta_{s}(t)$ is an analytic curve of constant functions for each given s. Therefore, the function
is an analytic function of the parameters $t,s$ and therefore, the derivatives of $\alpha_{s}(t)$ and $\beta_{s}(t)$ with respect to s coincide and give a family of constant functions in the local surface $S(t,s)$. This finishes the proof of Claim 2.
Claim 2 yields that the sum of the second and third terms of the expression of $\frac{\partial }{\partial s}( D_{P(t,s)}\Pi(X_{P(t,s)}))_{t=s=0}$ vanishes, just finishing the proof of item (2).
4.3. The expression of $K(X,Y)$ in terms of the calculus of thermodynamic formalism
Let us first state some notation. Let $\bar{X}_{t}$ be the derivative of the vector field $\bar{X}$ with respect to the parameter t and $\bar{X}_{s}$ be the derivative of the vector field $\bar{X}$ with respect to the parameter s. The same convention applies to $\bar{Y}_{t}$, $\bar{Y}_{s}$. The notation $\bar{X}(Y) = \frac{\partial}{\partial t}\bar{Y} = \bar{Y}_{t}$ will always represent derivatives with respect to the vector field $\bar{X}$, while $\bar{X}\bar{Y}$ or $\bar{X}\times \bar{Y}$ will represent the product of the functions $\bar{X}$ and $\bar{Y}$. Through the section, this double character of the vectors tangent to the manifold $\mathcal{N}$ which are also functions will show up in all statements and proofs.
Theorem 4.4. Let A be a normalized potential, let $X, Y \in T_{A} \mathcal{N}$ be a pair of orthonormal vector fields, and let $S: (-\epsilon, \epsilon)\times (-\delta, \delta) \longrightarrow \mathcal{N}$ be the local surface defined in the previous subsection with $S(0,0)= A$, $\bar{X} $, whose coordinate vector fields are $\bar{X}$, $\bar{Y}$, with $\bar{X}(A) =X$, $\bar{Y}(A) = Y$. Then the sectional curvature $K(X,Y)$ at A of the plane generated by $X, Y$ is given by the expression
We shall subdivide the proof into several steps.
Lemma 4.5. We have that $\bar{X}_{s} = \bar{Y}_{t}$ in the local surface S.
This is a straightforward consequence of the fact that the vector fields $\bar{X}, \bar{Y}$ commute.
Next, let us evaluate the terms of the sectional curvature in Lemma 2.1,
Lemma 4.6. At every point $p \in S(t,s)$, we have
(1) $ \bar{X}(\bar{X}(\parallel \bar{Y} \parallel^{2}) )= 2\int \bar{Y}\bar{Y}_{tt}\,\mathrm{d}\mu_{p} - \int \bar{Y}^{2} \,\mathrm{d}\mu_{p} +\int \bar{X}^{2}\bar{Y}^{2} \,\mathrm{d}\mu_{p} .$
(2) $ \bar{Y}(\bar{Y}(\parallel \bar{X} \parallel^{2} ))= 2\int \bar{X}\bar{X}_{ss}\,\mathrm{d}\mu_{p} - \int \bar{X}^{2} \,\mathrm{d}\mu_{p} +\int \bar{X}^{2}\bar{Y}^{2} \,\mathrm{d}\mu_{p} .$
In particular, if p = A, we have
(1) $ \bar{X}(\bar{X}(\parallel \bar{Y} \parallel^{2}) )= 2\int \bar{Y}\bar{Y}_{tt}\,\mathrm{d}\mu_{A} -1 +\int \bar{X}^{2}\bar{Y}^{2} \,\mathrm{d}\mu_{A} .$
(2) $ \bar{Y}(\bar{Y}(\parallel \bar{X} \parallel^{2} ))= 2\int \bar{X}\bar{X}_{ss}\,\mathrm{d}\mu_{A} - 1 +\int \bar{X}^{2}\bar{Y}^{2} \,\mathrm{d}\mu_{A} .$
Proof. The expression follows from the application of the Leibniz rule to differentiate $\parallel \bar{Y} \parallel^{2}= \int \bar{Y}^{2} \,\mathrm{d}\mu_{p}$ (we shall omit for convenience the p in the notation of the measure $\mathrm{d}\mu_{p}$ ):
Since by Lemma 4.3 we have that $\bar{X}_{s} = \bar{Y}_{t}=0$, $\bar{X}_{t}= \bar{Y}_{s} = -1$, we get item (1) just by replacing this values in the integral expressions above.
Interchanging $\bar{X}$ and $\bar{Y}$, t and s, in the above formula, we get item (2). At the point p = A, we have that $\int \bar{X}^{2} \,\mathrm{d}\mu_{A} = \int \bar{T}^{2} \,\mathrm{d}\mu_{A} = 1$, so replacing these values in the formula we finish the proof of the lemma.
Lemma 4.7. The expression of $\bar{Y}(\bar{X}\langle \bar{X}, \bar{Y} \rangle ) = \bar{Y}(\bar{X}\int \bar{X}\bar{Y} \,\mathrm{d}\mu_{p}) $ is
at every point $p \in S(t,s)$. In particular, at p = A, we have
Proof. We apply the Leibniz rule,
Since by Lemma 4.5 we have that $\bar{X}_{s} = \bar{Y}_{t}$ we get the following formula just adding the terms in the above formula:
By Lemma 4.3, $\bar{X}_{s}= \bar{Y}_{t}=0$, $\bar{X}_{t}= \bar{Y}_{s}=-1$, and replacing these values in the integral expression above we obtain the formula in the statement. Moreover, if p = A, we know that $\int \bar{X}^{2}\,\mathrm{d}\mu_{A} = \int ^{2}\,\mathrm{d}\mu_{A} = 1$, as well as $\int \bar{Y}^{2}\,\mathrm{d}\mu_{A} = \int Y^{2}\,\mathrm{d}\mu_{A} = 1$, thus concluding the proof of the Lemma.
Corollary 4.8. The term $-\frac{1}{2}(\bar{X}(\bar{X}(\parallel \bar{Y} \parallel^{2}) )+ \bar{Y}(\bar{Y}(\parallel \bar{X} \parallel^{2}))) + \bar{Y}(\bar{X}\langle \bar{X}, \bar{Y} \rangle )$ in the expression of $K(X,Y)$ at the point A vanishes.
Proof. To shorten notation, we shall omit the dependence of A in the expressions. According to Lemma 4.5, we have that
(1) $ \int \bar{X}\bar{X}_{ss}\,\mathrm{d}\mu = \int \bar{X} \bar{Y}_{ts}\,\mathrm{d}\mu$.
(2) $\int \bar{Y} \bar{X}_{st}\,\mathrm{d}\mu = \int \bar{Y} \bar{Y}_{tt}\,\mathrm{d}\mu$.
Replacing the above equalities in the expressions of Lemmas 4.6 and 4.7, and adding the resulting formulae we get Corollary 4.8.
5. Cristoffel coefficients at the expression of $K(X,Y)$
We denote by $\{X_{i} \}$, $i \in \mathbb{N}$, a complete orthonormal base of the vector space $T_{A} \mathcal{N} \subset L^2(\mu)$ (for the Gibbs probability µ associated with the normalized potential A).
The main goal of the section is to obtain the expression for the sectional curvature in Theorem 1.1.
Namely, let $A \in \mathcal{N}$ be a point in the manifold of normalized potentials, let $X,Y \in T_{A} \mathcal{N}$ be two orthonormal tangent vectors. Then the expression of the curvature of the plane generated by $X, Y$ is
In Proposition 5.2, we will show that the above sum is well-defined.
The proof is a direct calculation of the terms $ \parallel \nabla_{\bar{Y}}\bar{X} \parallel^{2}, \langle \nabla_{\bar{X}}\bar{X}, \nabla_{\bar{Y}}\bar{Y} \rangle$ that appear in the expression of the curvature in Theorem 4.4. We shall subdivide the calculation in several lemmas.
We follow the notation of the previous section. Let $S(t,s)$ be the local surface given in § 4 tangent to the plane generated by the vectors $X, Y$, satisfying $S(0,0) = A$, let $\bar{X}, \bar{Y}$ be the local extensions of the vectors $X, Y$ obtained by projecting by the map Π the plane generated by $X, Y$ at $T_{A} \mathcal{N}$ into the tangent space of $\mathcal{N}$.
Let us define local extensions $\bar{X}_{i}$ of the vector fields Xi in an analogous way we defined the extensions of $X, Y$: let Sk be the plane generated by $X_{1}, X_{2}, \ldots,X_{k}$ and let us project by Π the tangent space of Sk into $T\mathcal{N}$ by the differential of the projection into $\mathcal{N}$.
The terms $ \parallel \nabla_{\bar{Y}}\bar{X} \parallel^{2}, \langle \nabla_{\bar{X}}\bar{X}, \nabla_{\bar{Y}}\bar{Y}\rangle$ involve the Cristoffel symbols of the vector fields $\bar{X}, \bar{Y}$, at the point A we have:
where $\Gamma_{kl}^{i} = \langle \nabla_{\bar{X}_{k}}\bar{X}_{l}, \bar{X}_{i} \rangle $ is the Cristoffel coefficient. We follow [Reference do Carmo9] for the definitions and basic properties of Cristoffel coefficients.
The coefficient $\Gamma_{ij}^{k}$ can be calculated in terms of the coefficients of the first fundamental form of the metric at A, the inner products $g_{ij} = \langle X_{i}, X_{j} \rangle $ by the following formula:
where g im is the coefficient of the inverse of the first fundamental form of index im, $g_{mk,l}$ is the derivative with respect to $\bar{X}_{l}$ of the coefficient gmk and the above notation is Einstein’s convention for the sum on the index m.
The expression ‘inverse of the first fundamental form’ requires some explanation since we are dealing with an infinite-dimensional Riemannian manifold. One natural rigorous approach is to evaluate the series $\sum_{i=1}^{\infty} \Gamma_{kl}^{i}X_{i}$ as the limit of its partial sums $\sum_{i=1}^{n} \Gamma_{kl}^{i}X_{i}$ that includes the Cristoffel coefficients in the subspace of $T_{A}\mathcal{N}$ generated by $\{X_{1},X_{2}, \ldots,X_{n}\}$. The first fundamental form restricted to this subspace is a n × n matrix that, under our assumptions, is the identity. Its inverse is of course the identity. This allows us to define all the terms in the partial sum, then we take the limit as $n \rightarrow \infty$ to get the series. We shall prove that the series converges absolutely, so the above procedure provides the expression of $\nabla_{\bar{X}_{k}}\bar{X}_{l}$ as an infinite series.
In particular, since the basis $\{X_{1},X_{2}, \ldots,X_{n}, \ldots\}$ is orthonormal, the indices in the sum of the expression of $\nabla_{\bar{X}_{k}}\bar{X}_{l}$ according to Einstein’s convention just reduce to ii,kk, ll, depending on the case, and $g_{kl} = g^{kl} = \delta_{kl}$. So at the point A we get the formula
Lemma 5.1. The term $g_{ik,l}$ at A, for any permutation of the indices, is
Then,
Proof. We have that $g_{ik,l} = \bar{X}_{l} \langle \bar{X}_{i}, \bar{X}_{k} \rangle = \bar{X}_{l} \int \bar{X}_{i}\bar{X}_{k}\,\mathrm{d}\mu$. By the Leibniz rule, we have
where $\frac{\partial}{\partial \bar{X}_{l}} (\bar{X}_{i})$ is the derivative of the vector field $\bar{X}_{i}$ in the direction of $\bar{X}_{l}$.
Notice that Lemma 4.3 extends to the submanifolds Sk for every $k \in \mathbb{N}$. So we have
(1) $\frac{\partial}{\partial \bar{X}_{l}} (\bar{X}_{i}) = 0 $ if l ≠ i,
(2) $\frac{\partial}{\partial \bar{X}_{l}} (\bar{X}_{i}) = -1$ if l = i.
In both cases, since $\int \bar{X}_{i}\,\mathrm{d}\mu =0$ for every i, we get $g_{ik,l} = \int X_{i}X_{k}X_{l}\,\mathrm{d}\mu$ as claimed.
The expression for $\nabla_{\bar{X}_{k}}\bar{X}_{l}$ is straightforward from this formula.
Corollary 5.2. Let us assume that $X=X_{1}$ and $ Y= X_{2}$ are the first two vectors of the orthonormal base $\{X_{i} \}$. For the normalized potential $A= S(0,0)$, we get the following expressions
Moreover, for any pair $X,Y \in T_{A} \mathcal{N}$ the sums
are both finite.
Proof. We consider an extension of the family Xr, $r \in \mathbb{N}$, to all $ L^2(\mu)$ and we get a complete orthonormal base of the vector space $ L^2(\mu)$, given by $X_r, Y_s$, $r,s \in \mathbb{N}$. The first three expressions in the statement are straightforward from Lemma 5.1.
Given two elements $X,Y \in T_{A} \mathcal{N}$ consider $f=X Y = \sum_r a_r^f X_r + \sum_s b_s^f Y_s \in L^2(\mu)$, then,
It follows that $\sum_{i=1}^\infty ( \int X Y X_i \,\mathrm{d} \mu)^2 = \sum_{i=1}^\infty |a_i^f|^2\leq \parallel f\parallel^2$ is finite.
Denote $g= X^2 =\sum_r a_r^g X_r + \sum_s b_s^g Y_s $ and $h=Y^2= \sum_r a_r^h X_r + \sum_s b_s^h Y_s$. Therefore,
Form this follows that $\sum_{i=1}^\infty a_i^g a_i^h $ converges. Note that $ \int X^2 X_i \,\mathrm{d} \mu = a_i^g $ and $ \int Y^2 X_i \,\mathrm{d} \mu = a_i^h.$ Then,
converges.
Theorem 1.1 follows from direct calculation applying Corollary 5.2 to the expression of $K(X,Y)$.
6. A worked example in the Markov case: an orthonormal basis for the kernel of the Ruelle operator
From now on $M=\{0,1\}^{\mathbb{N}}$ and we denote by $\mathcal{K}$ the set of stationary Markov probabilities taking values in $\{0,1\}$.
In this section, given a probability $\mu_A\in K$, we will exhibit an orthonormal basis for the tangent space to $\mathcal{N}$ (the kernel of the Ruelle operator) at µA.
Given a finite word $x =(x_1,x_2, \ldots,x_k)\in \{0,1\}^k$, $k \in \mathbb{N}$, we denote by $[x]$ the associated cylinder set in $M=\{0,1\}^{\mathbb{N}}$.
Consider an invariant Markov probability µ obtained from a row stochastic matrix $(P_{i,j})_{i,j=0,1}$ and an initial left invariant vector of probability $\pi=(\pi_0,\pi_1)\in \mathbb{R}^2$.
Given $r \in (0,1)$ and $s\in (0,1)$, we denote
In this way $(r,s)\in (0,1) \times (0,1)$ parameterize all row stochastic matrices.
The explicit expression is
Definition 6.1. Denote by $J:\{0,1\}^{\mathbb{N}} \to \mathbb{R}$ the Jacobian associated to P. This function J is such that is constant equal
on the cylinder $[i,j]$, $i,j=0,1$.
According to our previous notation $\mu_A=\mu_{\log J}$ (which in this section will be called just µ).
Definition 6.2. The Ruelle operator for $\log J$ acts on continuous functions φ and is given by: for each $\varphi:M \to \mathbb{R}$, we get that
It is known that $\mathscr{L}_{\log J}^* (\mu)=\mu$ (see [Reference Parry and Pollicott18]).
We also consider the action of $\mathscr{L}_{\log J}$ on $L^2(\mu)$ and we are interested in the kernel of this operator when acting on Holder functions.
Given a finite word $x=(x_1,x_2, \ldots,x_n)$, depending of the context $[x]$ will either denote the word or the corresponding cylinder set in $ \{0,1\}^{\mathbb{N}}.$ The empty word is also considered a finite word.
We start by recalling that, given a Markov probability µ on $\{0,1\}^{\mathbb{N}}$, the family of Holder functions
where $x=(x_1,x_2, \ldots,x_n)$ is a finite word on the symbols $\{0,1\}$, is an orthonormal set for $\mathscr{L}^2 (\mu)$ (see [Reference Kessebohmer and Samuel12] for a general expression and [Reference Cioletti, Hataishi, Lopes and Stadlbauer7] for the specific expression we are using here). In order to get a (Haar) basis, we should add $e_{[\emptyset]}^0=\frac{1}{\sqrt{\mu ([0])}} \mathfrak{1}_{[0]}$ and $ e_{[\emptyset]}^1=\frac{1}{\sqrt{\mu ([1])}} \mathfrak{1}_{[1]}$ to this family.
Definition 6.3. Given a finite word $x=(x_1,x_2, \ldots,x_n)$, we denote
It will follow from Equations (33) and (34) that the terms $| a_x |$ are uniformly bounded away from zero (the minimum value is 2). Moreover, they depend just on the first letter of the word $[x]$.
Definition 6.4. We denote by
the normalization of ax.
In order to get a complete orthonormal set for the kernel of the Ruelle operator, we will have to add to the functions of the form (25) two more functions: $\hat{a}_{[\emptyset]}^0$ and $ \hat{a}_{[\emptyset]}^0$ to be set in Definition 6.8. To show this result is our main goal in this section. This family will be later denoted by $\mathcal{F}$ according to Definition 6.9.
In this direction, we first consider the problem of exhibiting an orthogonal family which is a basis for the kernel of the Ruelle operator, and later via normalization, we will get a complete orthonormal family which is a basis for the kernel of the Ruelle operator.
Following this line of reasoning, one of our main tasks in this section is to show the following:
Theorem 6.5. The family ax, indexed by all words $x=(x_1,x_2, \ldots,x_n)$, plus the two functions $e_{[\emptyset]}^0$ and $e_{[\emptyset]}^1$, determine an orthogonal set on the kernel of the Ruelle operator $\mathscr{L}_{\log J}$.
We will address first the issue related to the functions ax, and later to questions regarding the functions $e_{[\emptyset]}^0$ and $e_{[\emptyset]}^1$.
First note that as the family $e_{[x]}$, where x is a finite word, is orthonormal, then, ax, where x is a finite word with size bigger or equal to 1, is an orthogonal family.
Indeed, it follows from the fact that the family $e_{[x]}$ defined by Equation (23) is orthogonal, and the bilinearity of the inner product, that
for all $x=(x_1,x_2, \ldots,x_n)\neq z= (z_1,z_2, \ldots,z_k)$.
We shall subdivide the proof of Theorem 6.5 into several steps. First of all, we have that:
Proposition 6.6. Given $x=[x_1,x_2, \ldots,x_n]$ with a size larger or equal to 1,
From this follows that all elements in the orthogonal family
indexed by words $x=(x_1,x_2, \ldots,x_n)$, are in the kernel of the Ruelle operator $\mathscr{L}_{\log J}$.
Proof. We consider finite words x with size larger or equal to 1.
Indeed, given the word $x=(x_1,x_2, \ldots,x_n)$, let $L = \mathscr{L}_{\log J} ( e_{[x_1,x_2, \ldots,x_n]} )$, then we get
This is equal to
which is equivalent to
that yields
Then,
and therefore,
For each finite word $(x_1,x_2, \ldots,x_n)$ denote
From the above reasoning, it follows that the family ax is in the kernel of the Ruelle operator.
For words, x of size greater or equal to 1 the function ax is constant in cylinder sets of size equal to the length of x plus 2.
As an example, we get that
is constant on cylinders of size 3.
Note that if x and z are different words, then, 1x, 0x, 0z and 1z are four different words.
Note that
Therefore,
From the above, it follows that
Using the notation in the variables $r,s$ for the matrix P, when $x_1=0$ we get
and when $x_1=1$ we get
Definition 6.7. We denote by $\tilde{\mathcal{F}}$ the orthonormal set of normalized functions $ \hat{a}_x$, where $x= (x_1,x_2, \ldots,x_k)$ is a finite word with size equal or larger than 1.
As we mentioned before, we will have to add two more functions in order to get a basis (a completely orthogonal set in the Hilbert space) for the kernel of the Ruelle operator $\mathscr{L}_{\log J}$.
We claim that the orthogonal pair (constant in cylinders of size 2)
is in the kernel of the Ruelle operator (see Proposition 6.11). The functions V 1 and V 2 are orthogonal to all $\hat{a}_x \in \tilde{\mathcal{F}}$ and they depend on the first two coordinates $x_1,x_2$ of x.
The vectors $\hat{V}_1 = \frac{V_1}{|V_1| }$ and $\hat{V}_2 = \frac{V_2}{|V_2| }$ are normalized and orthogonal to all $\hat{a}_x$. This claim will be proved in Proposition 6.11.
One can show that
and
Definition 6.8. As a matter of notation, we denote $\hat{a}_{[\emptyset]}^0= \hat{V_1}$ and $\hat{a}_{[\emptyset]}^1= \hat{V_2}$.
These two functions are constant in cylinders of size 2.
Definition 6.9. We add $\hat{a}_{[\emptyset]}^0$ and $\hat{a}_{[\emptyset]}^1$ to the family $\tilde{\mathcal{F}}$ in order to get the family $\mathcal{F}$.
Remark 6.10. The elements in $\mathcal{F}$ range in all possible words of size larger or equal to zero. A generic element in $\mathcal{F}$ is denoted by $\hat{a}_x$, and by this we mean that $\hat{a}_x$ can eventually represent $\hat{a}_{[\emptyset]}^0$ or $\hat{a}_{[\emptyset]}^1.$
Proposition 6.11. The orthogonal pair
is such that, each one of them is orthogonal to the other elements $\hat{a}_x$, where x ranges in all finite words with size bigger or equal to 1. V1 and V2 are on the kernel of the Ruelle operator $\mathscr{L}_{\log J}$.
Proof. Note first that $\mathfrak{1}_{[00]} $ is orthogonal to all ax, where $x=(x_1,x_2, \ldots,x_n)$ is a word with size equal or greater than 1. This claim follows from (28). Indeed, if $x_1=0$, we get that
If $x_1=1$ the claim follows at once.
Using the same reasoning one can show that $\mathfrak{1}_{[01]},\mathfrak{1}_{[10]},\mathfrak{1}_{[11]} $ are orthogonal to all ax, where length of x is bigger than zero. It follows that linear combinations of this functions are also orthogonal to all ax. It follows that V 1 and V 2 are orthogonal to all ax, where the length of x is bigger than zero.
We will show that V 1 is in the kernel of the Ruelle operator (for V 2 the proof is similar). Given $y=(y_1,y_2, \ldots,y_n, \ldots)\in M$, suppose first that $y_1=0$, then, we get
In the case $y_1=1$, we get
Remark 6.12. A function of the form $w=r_1 \mathfrak{1}_{[0]} + r_2 \mathfrak{1}_{[1]}$ is in the kernel of $\mathscr{L}_{\log J}$ only in the case where $P_{01}=(1-r)= s=P_{11}$. In this case
is such that $\mathscr{L}_{\log J} (w)=0.$
We do not have to take into account in our future reasoning this function because
Proposition 6.13. The family of elements in $ \mathcal{F}$ (see Definition 6.9 and Remark 6.10) is an orthonormal basis for the kernel of the Ruelle operator $\mathscr{L}_{\log J}$.
Proof. From Proposition 6.6, we know that given $x=[x_1,x_2, \ldots,x_n]$
Suppose φ is in the kernel of the Ruelle operator. We will show that φ can be expressed as an infinite linear combination of the normalized functions $\hat{a}_x \in \mathcal{F}$.
We can express φ as
When applying $\mathscr{L}_{\log J}$ on φ, we separate the infinite sum in subsums of the form
Assuming that φ is in the kernel of $\mathscr{L}_{\log J}$, we get from Equation (40) that
Then, for fixed n and $(\alpha_2,\alpha_3, \ldots,\alpha_n)$
which means
Then, the sum
is equal to
Multiplying the above expression by $\frac{\sqrt{\pi_{\alpha_2}}}{\sqrt{\pi_{1} }} \frac{1}{\sqrt{P_{1,\alpha_2}}} $, we get
which is equal to
Then, $( c_{0,\alpha_2, \ldots,\alpha_n } e_{[0,\alpha_2, \ldots,\alpha_n]} + c_{1,\alpha_2, \ldots,\alpha_n } e_{[1,\alpha_2, \ldots,\alpha_n]} )$ is a multiple of the function $\hat{a}_{[\alpha_2, \ldots,\alpha_n]}.$ Since the above reasoning was done for a generic choice of $(\alpha_2,\alpha_3, \ldots,\alpha_n),$ we conclude that for each n the sum $\sum_{\text{words} \,y \,\text{of length} \,n} c_y e_{[y]}$ can be expressed as a linear combination of elements $\hat{a}_x$, using words of length n − 1, n > 1.
From this follows that each element in the kernel of $\mathscr{L}_{\log J} $ can be expressed as an infinite linear combination of the functions $\hat{a}_x$.
Theorem 6.5 follows from the combination of Propositions 6.6 and 6.13.
The above shows that the set $\mathcal{F}$ is a complete orthonormal set for the kernel of the Ruelle operator acting on $\mathscr{L}^2(\mu).$
7. A worked example in the Markov case: preliminary calculations of the terms in $K(X,Y)$
In this section, we shall devote ourselves to the calculation of the sectional curvatures in the case of Markov stationary probabilities on $M=\{0,1\}^{\mathbb{N}}$.
We denote by $K\subset \mathcal{N},$ the set of Markov invariant probabilities. We will consider this section the sectional curvature for points in $\mathcal{K}$ for general orthogonal pairs of tangent vectors to $\mathcal{N}.$
We can also consider $\mathcal{K}$ as a two-dimensional manifold carrying the Riemannian structure induced by $\mathcal{N}.$ From this point of view, there exists just one orthonormal pair to be considered. One of our main results (see Theorem 7.14) claims that for the two-dimensional manifold $K,$ for any point in $\mathcal{K}$, the sectional curvature for the pair of tangent vectors to $\mathcal{K}$ is always zero.
We will consider in our reasoning the empty word as a regular word. $\hat{a}_\emptyset^0 $ and $\hat{a}_\emptyset^1 $ are two elements in $\mathcal{F}$ associated with the empty word.
Definition 7.1. We say that z is a subprefix of x, if x and z satisfy
where $n \geq k$.
Note that, even when z is not a subprefix of x and x is not a subprefix of z, they can share some common subprefix. Note also that if x and z do not share a common subprefix, then z is not a subprefix of x and x is not a subprefix of z.
If $[x]=[z]$, then, x is a subprefix of $z.$
Definition 7.2. We say that z is a strict subprefix of x, if x and z satisfy
where n > k.
Two different words with the same length cannot be subprefix of each other. If the length of z is strictly larger than the length of x, then, z cannot be a subprefix of x.
Definition 7.3. Given the finite words $x,z$ we denote by $D[x,z]$ the set of all finite words y such that are subprefix of x and z.
If for example $ x=(0,0,0)$ and $z=(0,0,0,1)$, then
In the case $x=(0,1,0,0,1) $ and $z=(0,1,1)$ we get that $D[x,z]= \{\hat{a}_\emptyset^0(0),(0,1),\}.$
Another example: $D[a_{0,0}, \hat{a}_\emptyset^0]=\{\hat{a}_\emptyset^0 \}$ and $D[a_{0,0}, \hat{a}_\emptyset^1]=\emptyset.$
Note that in the case $z=(z_1,z_2, \ldots,z_k)$ is a subprefix of $x=(x_1,x_2, \ldots,x_n)$, n > k, then, $z_1=x_1$. Then, it follows from (32) that $|a_x|=|a_z|$.
Proposition 7.4. Assume that x is not a subprefix of z and z is not a subprefix of x. Then,
Proof. Note that az is a linear combination of $\mathfrak{1}_{[0z0]}, \mathfrak{1}_{[0z1]}, \mathfrak{1}_{[1z0]}$ and $\mathfrak{1}_{[1z1]}$. As ax is a linear combination of $\mathfrak{1}_{[0x0]}, \mathfrak{1}_{[0x1]}, \mathfrak{1}_{[1x0]}$ and $\mathfrak{1}_{[1x1]}$, the result follows.
Note that the hypothesis of the last proposition is equivalent to saying that the cylinders $[x]$ and $[z]$ are disjoint.
Corollary 7.5. Given a word x assume that x is not a subprefix of y and y is not a subprefix of x. Then,
Proof. This follows from at once from Proposition 7.4.
Note that if x and y have the same length, but they are different, then $\int \hat{a}_x^2 \hat{a}_y \,\mathrm{d} \mu=0.$
From Proposition 7.4, it follows:
Corollary 7.6. Assume that x is not a subprefix of z and z is not a subprefix of x. Then, we get that the products (part of the first sum contribution in Equation (44)) satisfy
for all word y.
Remember that $\mathcal{F}$ (defined in last section) is the set of all functions of the form
where $x= (x_1,x_2, \ldots,x_k)$ is a general finite word, plus the functions $\hat{a}_{[\emptyset]}^0 $ and $\hat{a}_{[\emptyset]}^1$.
Remember that Proposition 6.13 of last section claims that the family of functions $\mathcal{F}$ determines an orthonormal basis for the Kernel of the Ruelle operator.
We want to estimate for $X=\hat{a}_x,$ $Y= \hat{a}_z \in \mathcal{F}$ and the orthogonal basis $X_i= \hat{a}_y\in \mathcal{F}$ the explicit expression of the curvature which was described in Theorem 1.1
We will not present the explicit expression of the sectional curvature $K(X,Y)$ for any pair of vectors $X,Y$ in the kernel, but just for the case where the functions $X,Y$ are part of the family $\hat{a}_x\in \mathcal{F}$.
An important issue is: $0=\langle \hat{a}_z^2, \hat{a}_y\rangle = \int \hat{a}_z^2 \hat{a}_y \,\mathrm{d} \mu$, when the length of y is strictly larger than the length of z (as will be proved in § 8 and 9). We mention this point to stress the point that the last sum in expression (44) is a sum of a finite number of terms.
Our main result in this section concerns the Markov case:
Theorem 7.7. For a fixed pair $\hat{a}_x, \hat{a}_z\in \mathcal{F}$ (with z different from x) the value
In the case the length of x is strictly larger than the length of z we get that Equation (44) can be expressed in a more simplified form as:
In this case, the above expression is a sum of a finite number of terms.
In the general case, the value $\int \hat{a}_x^2 \hat{a}_z \,\mathrm{d} \mu$ is zero if z is not a subprefix of x. If z is a strict subprefix of x and y is a strict subprefix of z, then the term
is non-positive. Moreover, by Equation (75), we get $\int \hat{a}_z^2 \hat{a}_x \,\mathrm{d} \mu=0$. Then, it follows that Equation (44) is a sum of a finite number of terms, for any given x and z, with z ≠ x.
The proof of this result will take several sections and subsections. Proposition 7.12 will summarize several explicit computations that are necessary in our reasoning.
We will also provide an explicit expression for the curvature (45) in terms of the words $x,z$ and the probability µ (which is indexed by (r, s) of expression (20)). This will follow from explicit expressions for $(\int a_x a_z a_y \,\mathrm{d} \mu)^2 $, $ \int a_z^2 a_y \,\mathrm{d} \mu $ and $ \int a_x^2 a_y \,\mathrm{d} \mu$, for all finite words $x,z,y$, that will be presented the Propositions 7.9 and 7.12 (which will be proved in § 8 and 9).
It will also follow that when x and z do not share a common subprefix y, then the curvature $K(\hat{a}_z,\hat{a}_x)$ is equal to 0 (see Proposition 7.10).
There are examples (for instance, the case $ x=(0,1,0)$ and $z=(0,1,0,0)$) where the curvature $K(\hat{a}_z,\hat{a}_x) $ is positive for some values of the parameters (r, s) and negative for others (see Example 7.19). We can show from the explicit expressions we obtain that for fixed values of the parameters (r, s) the curvature $K(\hat{a}_z,\hat{a}_x)$ can be very negative if both words $x,z$ have large lengths and share common subprefix with large length (see Remark 7.17). In Example 7.20, we show that $K( \hat{a}_{(0)}, \hat{a}_{(0,0)} )= -0.205714 \cdots $, when $r=0.1,s=0.3$. In Proposition 7.18, we show the curvature $K(\hat{a}_{[\emptyset]}^0 ,\hat{a}_0)$ can be positive for some pairs $r,s\in(0,1)$. It follows from the expressions of Proposition 7.12 that all sectional curvatures $K( \hat{a}_{z}, \hat{a}_{x} )$ are equal to $-1/2$, when $r=1/2=s$, the size of z is bigger than 1 and z is a strict subprefix of x. See also Proposition 7.18, when $r=1/2=s$, for the computation of $ K(\hat{a}_{[\emptyset]}^0,\hat{a}_0)=1/2$.
Remark 7.8. Expression (73) in § 8.3 shows that when the length of x is larger than the length of z, then $(\int \hat{a}_z^2 \hat{a}_x \,\mathrm{d} \mu)^2=0.$
Proposition 7.9. Assume that the length of x is larger than the length of z. The first sum in expression (44) is given by
For a proof of this claim, see expression (78) in § 9. This term in the sum (44) is the part that pushes the curvature toward positive values; the second term in the sum (44) will contribute to making the curvature more negative (see Proposition 7.12).
Note that Equation (47) does not depend on y. Note also that from expression (21) one can get explicitly the values (47) as a function of (r, s).
In Proposition 7.4, we show that if x is not a subprefix of z and z is not a subprefix of x, then $\int \hat{a}_x^2 \hat{a}_z \,\mathrm{d} \mu=0$. In this case, the contribution of Equation (47) to the curvature will be null.
Proposition 7.10. When z and x do not share a common subprefix, the curvature satisfies $K(\hat{a}_z,\hat{a}_x)=0$.
Proof. When z and x do not share a common subprefix, it follows that x is not a subprefix of z and z is not a subprefix of x.
We will show that in this case $K(\hat{a}_z,\hat{a}_x)=0$. Indeed, from Proposition 7.4, we get that $(\int \hat{a}_x^2 \hat{a}_z \,\mathrm{d} \mu)^2 + (\int \hat{a}_z^2 \hat{a}_x \,\mathrm{d} \mu)^2=0.$ Fix the words z, x and consider a variable word y. In order to estimate the second sum in expression (45), we would have to consider all possible words y that are subprefixes of both x and z; but no such y exists.
Therefore, $K(\hat{a}_z,\hat{a}_x)=0$.
See also Proposition 7.18, when $r=1/2=s$, for the computation of other sectional curvatures.
Remark 7.11. It follows from Remark 7.8 that $ \sum_{\text{word}\, y} \int \hat{a}_z^2 \hat{a}_y \,\mathrm{d} \mu \int \hat{a}_x^2 \hat{a}_y \,\mathrm{d} \mu $ is a sum of a finite number of terms: when estimating $\int \hat{a}_z^2 \hat{a}_y \,\mathrm{d} \mu \int \hat{a}_x^2 \hat{a}_y \,\mathrm{d} \mu$, we do not have to take into account words y with length strictly larger than the minimum of the lengths of x and z. It also follows from Proposition 7.5 that if x is not a subprefix of y and y is not a subprefix of x, we get that $ \int \hat{a}_x^2 \hat{a}_y \,\mathrm{d} \mu=0.$
Note that the above makes clear that in expression (44), the second sum has non-zero terms only when $y \in D[x,z]$. This justifies the simplified expression (45).
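To make the finiteness bookkeeping concrete, the following minimal Python sketch (ours, not part of the original computations; the function names and the tuple encoding of words are merely illustrative) enumerates the candidate words y, namely the common subprefixes of x and z. Per Example 7.20, the set D[x,z] may in addition contain the empty-word elements such as $\hat{a}_{[\emptyset]}^0$.

```python
# A minimal sketch (not from the paper) of the subprefix bookkeeping used above.
# Words are tuples over {0, 1}; "y is a subprefix of x" means y is a prefix of x.

def is_subprefix(y, x):
    """True if the word y is a (not necessarily strict) prefix of the word x."""
    return len(y) <= len(x) and tuple(x[:len(y)]) == tuple(y)

def common_subprefixes(x, z):
    """All non-empty words y that are subprefixes of both x and z.

    By the vanishing criteria above, these are the only non-empty words y that
    can contribute to the second sum in expression (44); the paper's set
    D[x, z] may also contain empty-word elements, as in Example 7.20.
    """
    m = min(len(x), len(z))
    return [tuple(x[:k]) for k in range(1, m + 1) if tuple(x[:k]) == tuple(z[:k])]

# Usage: for x = (0,1,0,0) and z = (0,1,0), the candidates y are (0,), (0,1),
# (0,1,0), so the second sum in (44) has finitely many terms, as claimed.
print(common_subprefixes((0, 1, 0, 0), (0, 1, 0)))  # [(0,), (0, 1), (0, 1, 0)]
print(common_subprefixes((0,), (0, 0)))  # [(0,)] -- plus the empty-word element, cf. Example 7.20
print(common_subprefixes((0, 1, 0), (1, 0)))        # [] -- no common subprefix
```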
With all this in mind, in order to have explicit expressions, the next proposition deals just with the words y with lengths smaller than or equal to the length of a given word x.
Proposition 7.12. Assume that the length of x is larger than or equal to the length of y. Then we have:
(a) $\int \hat{a}_x^2 \hat{a}_y \,\mathrm{d} \mu=0$, if y is not a subprefix of x. This also includes the case where x ≠ y and the length of x is equal to the length of y.
(b.0) Assume that $[x]=[x_1,x_2, \ldots,x_k,x_{k+1}, \ldots, x_n] \subset [y]=[x_1,x_2, \ldots,x_k]$, where n > k, and $x_{k+1}=0$. Note that from Equation (32) we get that $|a_x| =|a_y|$. Then,
(b.1) Assume that $[x]=[x_1,x_2, \ldots,x_k,x_{k+1}, \ldots, x_n] \subset [y]=[x_1,x_2, \ldots,x_k]$, where n > k, and $x_{k+1}=1$. Then,
(b.2) Assume, $[x]=[x_1,x_2, \ldots,x_n] = [y].$
Then,
(b.3) If $x_1=0$, then
and
When $r=1/2=s$, we get that for any word x (with size bigger than or equal to 1) such that $x_1=0$,
• We point out that Equations (48) and (49) do not depend on $x_{k+2}, \ldots,x_{n-1},x_n$.
• If $y\in D[x,z]-\{z\}$, then the product $\int \hat{a}_x^2 \hat{a}_y \,\mathrm{d} \mu \int \hat{a}_z^2 \hat{a}_y \,\mathrm{d} \mu$ is non-negative for any choice of (r, s) (the product will not depend on x and z). This follows from the expressions in (b.0) and (b.1). This shows Equation (46).
• The term $\int \hat{a}_x^2 \hat{a}_z \,\mathrm{d} \mu \int \hat{a}_z^2 \hat{a}_z \,\mathrm{d} \mu$ may sometimes be negative.
We previously denoted by $\mathcal{K}$ the two-dimensional manifold of Markov invariant probabilities (the set of equilibrium probabilities for potentials depending on two coordinates and parametrized by $r,s$).
Given the Markov invariant probability µ associated with the parameters $r,s$, the set of vectors which are tangent to $\mathcal{K}$ at this point is the set of functions that depend on two coordinates $(x_1,x_2)$. The ones that are on the kernel of $\mathscr{L}_{\log J}$ are $\hat{a}_{[\emptyset]}^0 $ and $\hat{a}_{[\emptyset]}^1.$
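Before turning to Theorem 7.14, a short computational aside may help fix the conventions. The Python sketch below is ours (the function names are illustrative); it assumes the convention $P = \begin{pmatrix} r & 1-r \\ 1-s & s \end{pmatrix}$, an assumption on our part, chosen because it reproduces the stationary values $\pi_0=0.4375$ and $\pi_1=0.5625$ quoted in Example 7.19 for r = 0.1, s = 0.3.

```python
# A minimal computational sketch (ours, not from the paper) of the two-state
# Markov chain behind the manifold K, under the convention
#     P = [[r, 1 - r], [1 - s, s]].

def markov_data(r, s):
    """Transition matrix P and stationary vector pi for the parameters (r, s)."""
    P = [[r, 1.0 - r], [1.0 - s, s]]
    Z = 2.0 - r - s                       # normalizing constant
    pi = [(1.0 - s) / Z, (1.0 - r) / Z]   # solves pi P = pi with pi_0 + pi_1 = 1
    return P, pi

def cylinder_measure(word, r, s):
    """mu([x_1, ..., x_n]) = pi_{x_1} * P_{x_1, x_2} * ... * P_{x_{n-1}, x_n}."""
    P, pi = markov_data(r, s)
    m = pi[word[0]]
    for a, b in zip(word, word[1:]):
        m *= P[a][b]
    return m

P, pi = markov_data(0.1, 0.3)
print(pi)                                        # [0.4375, 0.5625]
print(cylinder_measure((0, 1, 0, 0), 0.1, 0.3))  # mu([0100])
```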
Theorem 7.14. Given the two-dimensional manifold of Markov invariant probabilities $\mathcal{K}$, for any point in $\mathcal{K}$, the sectional curvature for the pair of tangent vectors to $\mathcal{K}$ is always zero.
Proof. Remember that $ V_1 = \pi_1 P_{1,0} \mathfrak{1}_{[00]} - \pi_0 P_{0,0} \mathfrak{1}_{[10]}$ and $ V_2 = \pi_0 P_{0,1} \mathfrak{1}_{[11]} - \pi_1 P_{1,1} \mathfrak{1}_{[01]}$ determine an orthogonal basis for the tangent space to $\mathcal{K}$ at µ.
We claim that the curvature $K(\hat{a}_{[\emptyset]}^0,\hat{a}_{[\emptyset]}^1)= 0$.
Indeed, take $X_i= \hat{a}_z$, for some finite word $z=(x_1,x_2, \ldots,x_k)$. If we assume that $x_1=1$, then
Above, we used the fact that $\mathfrak{1}_{[0 1 x_2 \ldots x_k 0]} \mathfrak{1}_{[00]}=0$, etc.
Therefore, it follows that:
If we assume that $x_1=0$, then in a similar way $\int \hat{a}_z V_2^2 \,\mathrm{d} \mu =0$, and therefore, $\int \hat{a}_z \hat{V}_1^2 \,\mathrm{d} \mu \int \hat{a}_z \hat{V}_2^2 \,\mathrm{d} \mu =0.$
Note that $V_1 V_2=0.$
Then, for any word y, we get $\int \hat{V}_1 \hat{V}_2 \hat{a}_y \,\mathrm{d} \mu=0.$ In the same way, $\int \hat{V}_1^2 \hat{V}_2 \,\mathrm{d} \mu=0$ and $\int \hat{V}_2^2 \hat{V}_1 \,\mathrm{d} \mu=0$.
Finally, we get
Remark 7.15. Recall that the Gauss sectional curvature $K_{M}(X,Y)$ of an isometric immersion $(M,g_{M})$, a submanifold of the Riemannian manifold (N, g), at the plane generated by two orthogonal vector fields $X, Y$ tangent to M, is given by
$K_{M}(X,Y) = K_{N}(X,Y) + \frac{\langle \nabla^{\perp}_{X}X, \nabla^{\perp}_{Y}Y\rangle - |\nabla^{\perp}_{X}Y|^{2}}{|X|^{2}|Y|^{2}-\langle X,Y\rangle^{2}},$
according to Gauss formula (see for instance [Reference do Carmo9]). Here, the operator $\nabla^{\perp}_{X}Y$ is the component of the covariant derivative $\nabla_{X}Y$ of the Riemannian manifold (N, g) that is normal to $(M,g_{M})$.
Notice that the sectional curvature
includes all the terms of the normal component of the covariant derivative of $X,Y$. By Theorem 7.14, all the components of the covariant derivative of a certain pair of orthogonal vector fields tangent to the surface of Markov probabilities vanish. In particular, all the terms of the normal covariant derivative of $X,Y$ vanish. Therefore, Theorem 7.14 yields that the Gaussian curvature of the surface of Markov probabilities vanishes: its intrinsic curvature as an isometric immersion in the manifold of normalized potentials is zero. This is a remarkable fact, which implies, for instance, that the surface would be totally geodesic in the manifold of normalized potentials, provided that geodesics exist. We will not consider the problem of the existence of geodesics in this article; we shall study this problem in future papers.
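The vanishing products used in the proof of Theorem 7.14 can also be checked numerically. The sketch below is ours (function names illustrative) and assumes the same chain convention P = [[r, 1−r], [1−s, s]] as above; it verifies that $V_1 V_2 = 0$ pointwise and that the integrals of $V_1$ and $V_2$ against µ vanish (consistent with $V_1, V_2$ lying in the kernel of the Ruelle operator).

```python
# A numerical sanity check (our sketch) of the identities used in the proof of
# Theorem 7.14. V1 and V2 depend only on the first two symbols, so all the
# integrals reduce to finite sums over the four 2-cylinders.

def check_theorem_714_identities(r, s):
    P = [[r, 1.0 - r], [1.0 - s, s]]
    Z = 2.0 - r - s
    pi = [(1.0 - s) / Z, (1.0 - r) / Z]   # stationary vector: pi P = pi

    # Values of V1 = pi_1 P_10 1_[00] - pi_0 P_00 1_[10] and
    #           V2 = pi_0 P_01 1_[11] - pi_1 P_11 1_[01] on the 2-cylinders:
    V1 = {(0, 0): pi[1] * P[1][0], (1, 0): -pi[0] * P[0][0],
          (0, 1): 0.0, (1, 1): 0.0}
    V2 = {(1, 1): pi[0] * P[0][1], (0, 1): -pi[1] * P[1][1],
          (0, 0): 0.0, (1, 0): 0.0}
    mu = {(a, b): pi[a] * P[a][b] for a in (0, 1) for b in (0, 1)}  # mu([ab])

    assert all(V1[w] * V2[w] == 0.0 for w in mu)        # V1 V2 = 0 pointwise
    assert abs(sum(V1[w] * mu[w] for w in mu)) < 1e-12  # integral of V1 is 0
    assert abs(sum(V2[w] * mu[w] for w in mu)) < 1e-12  # integral of V2 is 0

for (r, s) in [(0.1, 0.3), (0.8, 0.5), (0.5, 0.5)]:
    check_theorem_714_identities(r, s)
print("identities verified")
```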
Proposition 7.16. When r = 0.5 and s = 0.5, we get that
for words $x,y$ with size bigger than or equal to 1.
Proof. It follows from the above proposition that, due to symmetry, when r = 0.5 and s = 0.5, we get $\int \hat{a}_x^2 \hat{a}_y \,\mathrm{d} \mu \int \hat{a}_z^2 \hat{a}_y \,\mathrm{d} \mu=0$, for words with size bigger than or equal to 1. Moreover, $\int \hat{a}_x^2 w \,\mathrm{d} \mu=0$, for any $a_x$. In this case, if $x_1=0$ and x is a subprefix of y, we get that for words with size bigger than or equal to 1 (see Equation (52)),
Remark 7.17. From the explicit expressions we obtained, for fixed values of the parameters (r, s), the curvature $K(\hat{a}_z,\hat{a}_x)$ can be very negative if both words $x,z$ have large lengths and share a common subprefix y with large length. Indeed, for fixed $\hat{a}_z,\hat{a}_x$, as $\int \hat{a}_x^2 \hat{a}_y\,\mathrm{d} \mu \int \hat{a}_z^2 \hat{a}_y\,\mathrm{d} \mu$ is non-negative for any common word y, in the computation of the curvature $K(\hat{a}_z,\hat{a}_x)$ we get a sum of several expressions $\int \hat{a}_x^2 \hat{a}_y \,\mathrm{d} \mu \int \hat{a}_z^2 \hat{a}_y \,\mathrm{d} \mu$. Note that $\int \hat{a}_x^2 \hat{a}_y \,\mathrm{d} \mu \int \hat{a}_z^2 \hat{a}_y \,\mathrm{d} \mu$ does depend on y (but not on x and z). Note also that for fixed x the expression (48) can be very large if the length of y is very large (and so $\mu([a y b])$, $a,b\in\{0,1\}$, is very small).
Proposition 7.18. The curvature $K(\hat{a}_{[\emptyset]}^0,\hat{a}_0)=1/2$, when $r=1/2=s$.
Proof. Note that Equation (45) can be expressed as
For any $r,s$, it is known from Equation (51) that
Note that
Then,
which is equal to $ (\tfrac{1}{2})^6 (\tfrac{1}{2})^2 - (\tfrac{1}{2})^6 (\tfrac{1}{2})^2 =0$, in the case $r=1/2=s$.
Therefore,
In the other examples, we used the software Mathematica to carry out explicit computations.
Example 7.19. Consider the case where $ z=(0,1,0)$ and $x=(0,1,0,0).$
$\int \hat{a}_x^2 \hat{a}_y \,\mathrm{d} \mu \int \hat{a}_z^2 \hat{a}_y \,\mathrm{d} \mu =0$, unless $\hat{a}_y$ is such that
Note that $\int \hat{a}_z^2 \hat{a}_{[\emptyset]}^1 \,\mathrm{d} \mu=\int \hat{a}_x^2 \hat{a}_{[\emptyset]}^1 \,\mathrm{d} \mu=0.$
Using Mathematica and the formulas of Proposition 7.12, we made computations for r = 0.1 and s = 0.3. In this case, $\pi_0=0.4375$ and $\pi_1=0.5625$, and from Equation (25) we get $|a_{(0,1,0)}|=|a_{(0,1,0,0)}|=3.33 \cdots $ and $|V_1|= 0.086 \cdots.$ Finally, $\frac{1}{|a_x|^2 |a_z|}= \frac{1}{|a_z|^3}=\frac{1}{|a_x|^2 |a_y|}=\frac{1}{|a_z|^2 |a_y|}=0.027 \cdots. $
We will show that $K( \hat{a}_{(0,1,0)}, \hat{a}_{(0,1,0,0)} )=35.9142 \cdots.$
We get the following values:
and finally, using Equation (51)
Using Equations (51) and (36) (note that $x_1=0$), we get that the expression (45) can be written in this case as
Taking $r=0.8,s=0.5$, we get $K( \hat{a}_{(0,1,0)}, \hat{a}_{(0,1,0,0)} )=-3.17713 \cdots.$ When $r=1/2=s$, we get $K( \hat{a}_{(0,1,0)}, \hat{a}_{(0,1,0,0)} )=-1/2$.
$ \diamondsuit$
Example 7.20. Consider the case where $ z=(0)$ and $x=(0,0).$ Then, $D[(0),(0,0)] =\{\hat{a}_0, \hat{a}_{[\emptyset]}^0\}.$ Therefore,
In this case, using Mathematica, one can show that $K( \hat{a}_{(0)}, \hat{a}_{(0,0)} )\leq 0$, for all values $r,s\in(0,1)$. For r = 0.1, s = 0.3, we will show that $K( \hat{a}_{(0)}, \hat{a}_{(0,0)} )= -0.205714 \cdots.$
When $r=0.1, s=0.3$, we get
Finally, when r = 0.1, s = 0.3, we get $K( \hat{a}_{(0)}, \hat{a}_{(0,0)} )= -0.205714 \cdots.$
$ \diamondsuit$
8. Computations for the integral $\int X^2 Y $
Our purpose in this section is to evaluate the integral
for any given pair of words $x,z$. This corresponds to the second term in the sum given by expression (44).
We assume that x is different from z.
From Proposition 7.5, if x is not a subprefix of y and y is not a subprefix of x, and x ≠ y, then:
In the same way, if z is not a subprefix of y and y is not a subprefix of z, and z ≠ y, then:
If y has the same length as x but y ≠ x, then $\hat{a}_x^2 \hat{a}_y=0.$
In this way, for a fixed pair of words $x,z$, several words y do not contribute to the sum (74). For instance, for $x=(0,1,0)$ and $y=(1,0)$, neither word is a subprefix of the other, so the corresponding term is null.
8.1. The value of $\langle \hat{a}_x^2, \hat{a}_y\rangle $ when the length of x is larger than or equal to the length of y
We want to compute $\langle \hat{a}_x^2, \hat{a}_y\rangle =\int \hat{a}_x^2 \hat{a}_y \,\mathrm{d} \mu$ in the case where the length of x is larger than or equal to the length of y.
Our computation is in fact for $\langle a_x^2, a_y\rangle $; after that, to get $\langle \hat{a}_x^2, \hat{a}_y\rangle $, it will be necessary to divide by $|a_x|^2 |a_y|.$
We assume that $[x]=[x_1,x_2, \ldots,x_k,x_{k+1}, \ldots, x_n] \subset [y]=[x_1,x_2, \ldots,x_k]$, where $n \geq k$ (otherwise we get zero).
Note that these assumptions include the integral $\int \hat{a}_x^3 \,\mathrm{d} \mu$, that is, the case x = y (see (iii) below).
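For the case analysis below, cylinder measures enter through the standard Markov product formula; under our reading of the convention of expression (20) (the same one that reproduces the stationary values of Example 7.19), it reads
$\mu([x_1,x_2, \ldots,x_n]) = \pi_{x_1}\, P_{x_1,x_2}\, P_{x_2,x_3} \cdots P_{x_{n-1},x_n}.$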
(i) Case n > k – We will assume first that $x_{k+1}=0$ in the word $[x]$.
Given the words $z=(v_1, \ldots,v_t)$ and $v=(v_1,v_2, \ldots,v_t, v_{t+1}, \ldots,v_m)$, assume $v_{t+1}=0$; then, from Equations (23) and (58)
Note that in the above reasoning, when going from the second to the third line, the term multiplying $\mathfrak{1}_{[v_1,\ldots,v_t , 1]} $ disappears because we assume that $v_{t+1}=0.$
We are going to apply the above when $z=[0 y], z=[1 y], v=[0 x], v=[1 x], m=n$ and $t=k+1$.
Then, from Equations (27), (31) and (58) and using the fact that
we get
Finally, as the matrix P is row stochastic (that is, $P_{0,0}+P_{0,1}=1=P_{1,0}+P_{1,1}$)
(ii) Case n > k – If we assume $x_{k+1}=1$ in the word $[x]$, then we get in a similar way as before
Indeed, given the words $z=(v_1, \ldots,v_t)$ and $v=(v_1,v_2, \ldots,v_t, v_{t+1}, \ldots,v_m)$, assume $v_{t+1}=1$; then, from Equations (23) and (58)
We are going to apply the above when $z=[0 y], z=[1 y], v=[0 x], v=[1 x], m=n$ and $t=k+1$.
Then, from Equations (27), (31) and (61) and using the fact that
we get
Finally, as the matrix P is row stochastic
(iii) Case n = k – We assume $[x]=[x_1,x_2, \ldots,x_n] = [y]$, otherwise $ \int \hat{a}_x^2 \hat{a}_y \,\mathrm{d} \mu=0. $ Then, one can show that
Indeed, note first that, from Equation (58), for $v=[v_1,v_2, \ldots,v_m]$,
Then, from Equations (27), (31) and (61)
Therefore,
The above reasoning shows (iii).
Given the word $[x]=[x_1,x_2, \ldots,x_k,x_{k+1}, \ldots, x_n] $, we get n words y, such that the cylinder $[x]\subset [y]=[x_1,x_2, \ldots,x_k]$, where $n\geq k$.
Given x and z with lengths larger than that of y, the product $\int \hat{a}_{x}^2 \hat{a}_y \,\mathrm{d} \mu \int \hat{a}_{z}^2 \hat{a}_y \,\mathrm{d} \mu$ will be non-zero only for the subprefixes y which are common to both x and z (see Proposition 7.5). If there are no common subprefixes for x and z, then the contribution to the sum (45) of the terms $\int \hat{a}_{x}^2 \hat{a}_y \,\mathrm{d} \mu \int \hat{a}_{z}^2 \hat{a}_y \,\mathrm{d} \mu$, for words y of length strictly smaller than the lengths of x and z, is null.
8.2. The values of $\langle \hat{a}_x^2, \hat{a}_{[\emptyset]}^0\rangle $ and $\langle \hat{a}_x^2, \hat{a}_{[\emptyset]}^1\rangle $ when x is a finite word
Denote $[x]=[x_1,x_2, \ldots, x_n]$. We assume that $n\geq 2.$
In fact, we will compute $\langle a_x^2, V_1\rangle $ and $\langle a_x^2, V_2\rangle $. In order to compute $\langle \hat{a}_x^2, \hat{a}_{[\emptyset]}^0\rangle $ and $\langle \hat{a}_x^2, \hat{a}_{[\emptyset]}^1\rangle $, it will be necessary to normalize.
(i) Case $\langle a_x^2, V_1\rangle $
We will consider first the case $x_1=0$.
Denote $y=(y_1,y_2, \ldots,y_k)$. If we assume $y_1=0,y_2=0$, then, from Equation (58)
If we assume $y_1=1,y_2=0$, then, from Equation (58)
As we assumed that $x_1=0$, from Equations (27), (31) and (61), we get
Therefore,
As we assumed that $x_1=0$, we get
(ii) Case $\langle a_x^2, V_2\rangle $
Now we will compute $ \int a_x^2 V_2\,\mathrm{d}\mu.$
Denote $y=(y_1,y_2, \ldots,y_k)$. If we assume $y_1=0,y_2=0$, then, from Equation (58)
If we assume $y_1=1,y_2=0$, then, from Equation (58)
As we assumed that $x_1=0$, from Equations (27), (31) and (61), we get
Therefore, if $x_1=0$, we get
The case $x_1=1$ is left for the reader.
8.3. The value of $\langle \hat{a}_z^2, \hat{a}_y\rangle $ when the length of y is strictly larger than the length of z
Now we want to estimate $\langle \hat{a}_z^2, \hat{a}_y\rangle =\int \hat{a}_z^2 \hat{a}_y \,\mathrm{d} \mu$ in the case that the length of y is strictly larger than the length of z. We will show that $\int \hat{a}_z^2 \hat{a}_y \,\mathrm{d} \mu=0.$
We assume that $[y]=[x_1,x_2, \ldots,x_k,x_{k+1}, \ldots, x_n] \subset [z]=[x_1,x_2, \ldots,x_k]$, where n > k (otherwise, $\int \hat{a}_z^2 \hat{a}_y \,\mathrm{d} \mu$ is zero by Proposition 7.5).
In fact, we will show that $\int a_z^2 a_y \,\mathrm{d} \mu=0.$
(i) If we assume $x_{k+1}=0$ in the word $[y]$, then, from Equation (58)
Note that above, from the second to the third line, we use the fact that
and
Then, from Equations (27), (70) and (31)
Finally,
(ii) If we assume $x_{k+1}=1$ in the word $[y]$, then, from Equation (58)
Then, from Equations (27), (72) and (31)
Finally,
9. Computations for the integral $\int X Y Z $
Our purpose in this section is the following: given x and z, we want to compute, for all y,
which corresponds to the first term in the sum given by expression (44).
Remember that, from Corollary 7.13, if x is not a subprefix of z and z is not a subprefix of x, we get that, for any y,
Without loss of generality, we assume that z is a subprefix of x (see Proposition 7.4). The only possible non-zero value for Equation (74) is $\int \hat{a}_x^2 \hat{a}_z \,\mathrm{d} \mu.$ This justifies the first term in the sum (45).
We assume first that:
$[y]=[x_1,x_2, \ldots,x_k,x_{k+1}, \ldots, x_n,x_{n+1}, \ldots,x_j] \subset [x]=[x_1,x_2, \ldots,x_k,x_{k+1}, \ldots, x_n] \subset [z]=[x_1,x_2, \ldots,x_k]$, where $j \gt n\geq k$.
We will show in all cases that $ \int a_x a_y a_z \,\mathrm{d} \mu=0.$ This includes the case
(i) First we assume that $x_{k+1} =0= x_{n+1}.$
Then,
Note that for all j
and
Finally, from Equations (76) and (77)
(ii) Now we assume that $x_{k+1} =1= x_{n+1}.$ In a similar way as before
(iii) Now, if we assume that $x_{k+1} =0 $ and $ x_{n+1}=1$, or that $x_{k+1} =1 $ and $ x_{n+1}=0$, we get in a similar way that
After all these computations, for fixed $\hat{a}_x$ and $\hat{a}_z$, we want to compute $K(\hat{a}_z,\hat{a}_x)$. In this direction, we have to consider Equation (74), which is the first sum in expression (44).
We ask for which y we have $ (\int \hat{a}_x \hat{a}_z \hat{a}_y \,\mathrm{d} \mu)^2\neq 0.$ We assumed, without loss of generality, that z is a subprefix of x. In this case, the length of x is strictly larger than the length of z.
Considering first the case where the length of y is larger than the lengths of z and x, it follows from the above that
Now we consider the case where the length of y is strictly smaller than the lengths of z and x. In this case, we need to assume that y is a subprefix of z (otherwise $\hat{a}_y \hat{a}_z=0$ and we get $ (\int \hat{a}_x \hat{a}_z \hat{a}_y \,\mathrm{d} \mu)^2=0$). If y is a strict subprefix of z and z is a strict subprefix of x, we get from the above that $ (\int \hat{a}_x \hat{a}_z \hat{a}_y \,\mathrm{d} \mu)^2=0$.
Finally, we assume that the length of y is strictly smaller than that of x and strictly larger than that of z. In this case, we have to assume that y is a subprefix of x and z is a subprefix of y (otherwise, by Proposition 7.4, we have $ (\int \hat{a}_x \hat{a}_z \hat{a}_y \,\mathrm{d} \mu)^2=0$). It follows from the above that also in this case $ (\int \hat{a}_x \hat{a}_z \hat{a}_y \,\mathrm{d} \mu)^2=0$.
Therefore, in the estimation of expression (74), it follows from our reasoning that all elements in this sum are zero except for the expressions $(\int \hat{a}_x^2 \hat{a}_z \,\mathrm{d} \mu)^2$ and $(\int \hat{a}_z^2 \hat{a}_x \,\mathrm{d} \mu)^2$, that is, the cases where y = x or y = z. From Proposition 7.5, we have to assume that x is a subprefix of z or vice versa. The explicit expressions for these two cases were analysed in § 8.1 and 8.3.
If the length of x is larger than the length of z, then, from Equation (73), we get $(\int \hat{a}_z^2 \hat{a}_x \,\mathrm{d} \mu)^2 =0.$
The final conclusion is that