1 Introduction
1.1 Weighted topological entropy and pressure
The purpose of this paper is to introduce a new approach to weighted topological entropy and pressure introduced by Feng and Huang [FH16]. In this subsection, we describe their original theory. We explain our new approach in the next subsection.
We first quickly review the classical theory of entropy and pressure of dynamical systems. See the book of Walters [Wal82] for the details. A pair $(X, T)$ is called a dynamical system if X is a compact metrizable space and $T:X\to X$ is a continuous map. We denote its topological entropy by $h_{\mathrm {top}}(X,T)$. This is a topological invariant of dynamical systems, which counts the number of bits per iterate for describing the orbits of $(X,T)$.
One of the most basic theorems about topological entropy is the variational principle. We define $\mathscr {M}^T(X)$ as the set of T-invariant Borel probability measures on X. For each measure $\mu \in \mathscr {M}^T(X)$, we denote its Kolmogorov–Sinai entropy by $h_\mu (T)$. Then the variational principle states that [Din70, Goodm71, Goodw69]
$$ \begin{align*} h_{\mathrm{top}}(X,T) = \sup_{\mu\in \mathscr{M}^T(X)} h_\mu(T). \tag{1.1} \end{align*} $$
This theory can be generalized to pressure. Let $(X, T)$ be a dynamical system with a continuous function $f:X\to \mathbb {R}$. Motivated by statistical mechanics, Ruelle [Rue73] (in some special cases) and Walters [Wal75] (for general systems) introduced the topological pressure $P(T, f)$ and proved the variational principle
$$ \begin{align*} P(T, f) = \sup_{\mu\in \mathscr{M}^T(X)} \left(h_\mu(T) + \int_X f\, d\mu\right). \tag{1.2} \end{align*} $$
The above (1.1) and (1.2) are classical and standard in ergodic theory. Recently, Feng and Huang [FH16] found an ingenious generalization of this classical theory. Motivated by fractal geometry of self-affine carpets and sponges [Bed84, KP96a, Mc84], they introduced weighted versions of entropy and pressure.
Let $(X, T)$ and $(Y, S)$ be dynamical systems. A map $\pi :X\to Y$ is called a factor map if $\pi $ is a continuous surjection with $\pi \circ T = S\circ \pi $. We sometimes write $\pi :(X, T)\to (Y, S)$ for clarifying the maps T and S. For an invariant probability measure $\mu \in \mathscr {M}^T(X)$, we denote by $\pi _*\mu \in \mathscr {M}^S(Y)$ the push-forward of $\mu $ by $\pi $ (this is defined by $\pi _*\mu (A) = \mu (\pi ^{-1}A)$ for $A\subset Y$). Let $f:X\to \mathbb {R}$ be a continuous function, and let $a_1, a_2$ be two real numbers with $a_1>0$ and $a_2\geq 0$. Feng and Huang [FH16, Question 1.1] asked (and then solved) the following question.
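Question 1.1. How can one define a meaningful term $P^{(a_1, a_2)}(T, f)$ such that the following variational principle holds?
$$ \begin{align*} P^{(a_1, a_2)}(T, f) = \sup_{\mu\in \mathscr{M}^T(X)} \left(a_1 h_\mu(T) + a_2 h_{\pi_*\mu}(S) + \int_X f\, d\mu\right). \end{align*} $$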
We describe their approach below. It is a modification of the definition of topological entropy given by Bowen [Bow73], which is in turn a modification of the standard definition of the Hausdorff dimension.
Here we explain only the case of $f\equiv 0$ for simplicity of the exposition. For the case of $f\not \equiv 0$, see their paper [FH16, §3.1]. (They also studied the case where a sequence of factor maps $\pi _i:X_i\to X_{i+1}$ ($i=1, 2, \ldots , k$) is given. We think that our new approach can also be generalized to this setting. However, we concentrate on the simplest case in this paper.)
Let d and $d^\prime $ be metrics on X and Y respectively. For $x\in X$, a natural number n and $\varepsilon >0$, we define $B_n^{(a_1, a_2)}(x,\varepsilon ) \subset X$ as the set of $y\in X$ satisfying the following two conditions:
Here $\lceil u\rceil $ denotes the least integer not less than u. We call $B_n^{(a_1, a_2)}(x,\varepsilon )$ an $(a_1,a_2)$ -weighted Bowen ball.
Let N be a natural number. We consider families of $(a_1,a_2)$ -weighted Bowen balls $\{B^{(a_1, a_2)}_{n_j}(x_j, \varepsilon )\}_{j=1}^\infty $ satisfying
Let $s\geq 0$ . We define $\Lambda ^{(a_1, a_2), s}_{N,\varepsilon }(X)$ as the infimum of
where the infimum is taken over all families $\{B^{(a_1, a_2)}_{n_j}(x_j, \varepsilon )\}_{j=1}^\infty $ satisfying (1.3).
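The quantity $\Lambda ^{(a_1, a_2), s}_{N,\varepsilon }(X)$ is monotone in N. So we define
$$ \begin{align*} \Lambda^{(a_1, a_2), s}_{\varepsilon}(X) = \lim_{N\to \infty} \Lambda^{(a_1, a_2), s}_{N,\varepsilon}(X). \end{align*} $$
We vary the parameter s from $0$ to $\infty $. There exists a unique value of s, which we denote by $h_{\mathrm {top}}^{(a_1, a_2)}(T, \varepsilon )$, where the value of $\Lambda ^{(a_1, a_2), s}_\varepsilon (X)$ jumps from $\infty $ to $0$:
$$ \begin{align*} \Lambda^{(a_1, a_2), s}_{\varepsilon}(X) = \begin{cases} \infty & \text{if } s < h_{\mathrm{top}}^{(a_1, a_2)}(T, \varepsilon), \\ 0 & \text{if } s > h_{\mathrm{top}}^{(a_1, a_2)}(T, \varepsilon). \end{cases} \end{align*} $$
Here, $h_{\mathrm {top}}^{(a_1, a_2)}(T, \varepsilon )$ is monotone in $\varepsilon $. So we define the $(a_1, a_2)$-weighted topological entropy of $\pi : X\to Y$ by
$$ \begin{align*} h_{\mathrm{top}}^{(a_1, a_2)}(\pi, T) = \lim_{\varepsilon\to 0} h_{\mathrm{top}}^{(a_1, a_2)}(T, \varepsilon). \end{align*} $$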
Feng and Huang [FH16, Theorem 1.4 and Corollary 1.5] solved Question 1.1 by this quantity.
Theorem 1.2. (Feng and Huang, 2016)
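In the above setting,
$$ \begin{align*} h_{\mathrm{top}}^{(a_1, a_2)}(\pi, T) = \sup_{\mu\in \mathscr{M}^T(X)} \left(a_1 h_\mu(T) + a_2 h_{\pi_*\mu}(S)\right). \end{align*} $$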
1.2 New approach
In the previous subsection, we have described the original definition of weighted topological entropy introduced by Feng and Huang [FH16]. In this subsection, we explain our new approach. Our approach is a modification of the familiar definition of topological entropy (not the Hausdorff-dimension-like definition of [Bow73]).
First of all, notice that we can assume $a_1+a_2 = 1$ in Question 1.1 because we can reduce the general case to this special case by a simple rescaling. So we study only this case. As in the previous subsection, here we explain the entropy case (that is, the case of $f\equiv 0$ ) for simplicity. We will explain the pressure case in §2.
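The rescaling can be made explicit at the level of the variational formula in Question 1.1. The following is only a sketch; the symbols $c = a_1+a_2$ and $w = a_1/c$ are notation introduced here for illustration.
$$ \begin{align*} \sup_{\mu\in \mathscr{M}^T(X)} \left(a_1 h_\mu(T) + a_2 h_{\pi_*\mu}(S) + \int_X f\, d\mu\right) = c \sup_{\mu\in \mathscr{M}^T(X)} \left(w\, h_\mu(T) + (1-w)\, h_{\pi_*\mu}(S) + \int_X \frac{f}{c}\, d\mu\right). \end{align*} $$
Since $a_1>0$ and $a_2\geq 0$, we have $c>0$ and $0<w\leq 1$, so the right-hand side is the same problem for the weights $(w, 1-w)$ and the function $f/c$.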
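Let $(X,T)$ and $(Y, S)$ be dynamical systems, and let $\pi :X\to Y$ be a factor map. Let d and $d^\prime $ be metrics on X and Y respectively. For a natural number N we define metrics $d_N$ and $d^\prime _N$ on X and Y respectively by
$$ \begin{align*} d_N(x_1, x_2) = \max_{0\leq n < N} d(T^n x_1, T^n x_2), \quad d^\prime_N(y_1, y_2) = \max_{0\leq n < N} d^\prime(S^n y_1, S^n y_2). \tag{1.4} \end{align*} $$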
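For $\varepsilon>0$ and a non-empty subset $\Omega \subset X$, we define
$$ \begin{align*} \#(\Omega, N, \varepsilon) = \min\left\{ n\geq 1 \,\middle|\, \begin{array}{l} \text{there exist open subsets } U_1, \ldots, U_n \text{ of } X \text{ with } \Omega \subset U_1\cup \cdots \cup U_n \\ \text{and } {\mathrm{diam}}(U_k, d_N) < \varepsilon \text{ for all } 1\leq k\leq n \end{array} \right\}. \tag{1.5} \end{align*} $$
Here, ${\mathrm {diam}}(U_k, d_N) = \sup _{x_1, x_2\in U_k} d_N(x_1, x_2)$ is the diameter of $U_k$ with respect to the metric $d_N$. When $\Omega $ is the empty set, we define $\#(\Omega , N, \varepsilon )=0$. As is well known, the topological entropy of $(X, T)$ is defined by
$$ \begin{align*} h_{\mathrm{top}}(X, T) = \lim_{\varepsilon\to 0} \left(\lim_{N\to \infty} \frac{\log \#(X, N, \varepsilon)}{N}\right). \end{align*} $$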
We will modify this definition.
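Let $0\leq w \leq 1$ be a real number. We set
$$ \begin{align*} \#^w(\pi, N, \varepsilon) = \min\left\{ \sum_{k=1}^n \left(\#(\pi^{-1}(V_k), N, \varepsilon)\right)^w \,\middle|\, \begin{array}{l} Y = V_1\cup \cdots \cup V_n \text{ is an open cover with} \\ {\mathrm{diam}}(V_k, d^\prime_N) < \varepsilon \text{ for all } 1\leq k\leq n \end{array} \right\}. \tag{1.6} \end{align*} $$
It is easy to check that this quantity is sub-multiplicative in N and monotone in $\varepsilon $. So we define the w-weighted topological entropy of $\pi :X\to Y$ by
$$ \begin{align*} h_{\mathrm{top}}^w(\pi, T) = \lim_{\varepsilon\to 0} \left(\lim_{N\to \infty} \frac{\log \#^w(\pi, N, \varepsilon)}{N}\right). \end{align*} $$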
This definition uses the metrics d and $d^\prime $ , but the value of $h_{\mathrm {top}}^w(\pi , T)$ is a topological invariant (that is, independent of the choice of metrics).
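The existence of the limit in N is an instance of Fekete's lemma. As a sketch, write $a_N$ for the logarithm of the sub-multiplicative quantity above at a fixed $\varepsilon $ (the symbol $a_N$ is introduced here only for illustration). Sub-multiplicativity means that $a_N$ is sub-additive, and then
$$ \begin{align*} a_{N+M} \leq a_N + a_M \quad (N, M\geq 1) \quad \Longrightarrow \quad \lim_{N\to \infty} \frac{a_N}{N} = \inf_{N\geq 1} \frac{a_N}{N}. \end{align*} $$
The resulting limit is monotone in $\varepsilon $, so the limit as $\varepsilon \to 0$ also exists (possibly equal to $+\infty $).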
The quantity $h^w_{\mathrm {top}}(\pi , T)$ provides another solution to Question 1.1 for the case of $f\equiv 0$ and $(a_1, a_2) = (w, 1-w)$ . This is our main result for the weighted topological entropy.
Theorem 1.3. (Variational principle for w-weighted topological entropy)
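For $0\leq w\leq 1$,
$$ \begin{align*} h_{\mathrm{top}}^w(\pi, T) = \sup_{\mu\in \mathscr{M}^T(X)} \left(w\, h_\mu(T) + (1-w)\, h_{\pi_*\mu}(S)\right). \end{align*} $$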
As the above definition of $h_{\mathrm {top}}^w(\pi , T)$ is close to the standard definition of topological entropy, the proof of this theorem is also close to a well-known proof of the standard variational principle. The basic structure of the proof is borrowed from the famous argument of Misiurewicz [Mis76]. At some technical points, we use the theory of principal extensions [DH13, Dow11].
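Three special cases may serve as a sanity check. The sketch below assumes that Theorem 1.3 reads $h_{\mathrm{top}}^w(\pi, T) = \sup_\mu (w\, h_\mu(T) + (1-w)\, h_{\pi_*\mu}(S))$, as suggested by Question 1.1 with $(a_1, a_2) = (w, 1-w)$ and $f\equiv 0$. It uses the classical variational principle and the standard fact that every $\nu \in \mathscr{M}^S(Y)$ is of the form $\pi_*\mu$ for some $\mu \in \mathscr{M}^T(X)$.
$$ \begin{align*} w = 1: &\quad \sup_{\mu} h_\mu(T) = h_{\mathrm{top}}(X, T), \\ w = 0: &\quad \sup_{\mu} h_{\pi_*\mu}(S) = \sup_{\nu\in \mathscr{M}^S(Y)} h_\nu(S) = h_{\mathrm{top}}(Y, S), \\ Y = \{\mathrm{pt}\}: &\quad \sup_{\mu} w\, h_\mu(T) = w\, h_{\mathrm{top}}(X, T). \end{align*} $$
So the weighted entropy interpolates between the entropy of the factor ($w=0$) and the entropy of the extension ($w=1$).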
By combining Theorems 1.2 and 1.3, we get a corollary.
Corollary 1.4. For $0 \leq w \leq 1$,
$$ \begin{align*} h_{\mathrm{top}}^{(w, 1-w)}(\pi, T) = h_{\mathrm{top}}^w(\pi, T). \end{align*} $$
Here the left-hand side is the weighted topological entropy $h_{\mathrm {top}}^{(a_1, a_2)}(\pi , T)$ for $(a_1, a_2) = (w, 1-w)$ defined in the previous subsection.
This corollary seems to be a very interesting statement. The author cannot see any direct way to prove it (without using the variational principles).
Problem 1.5. Can one prove the equality $h_{\mathrm {top}}^{(w, 1-w)}(\pi , T) = h_{\mathrm {top}}^w(\pi , T)$ without using measure theory?
The following example illustrates the importance of the equality $h_{\mathrm {top}}^{(w, 1-w)}(\pi , T) = h_{\mathrm {top}}^w(\pi , T)$ .
Example 1.6. (Bedford–McMullen carpets)
Let $\mathbb {T} = \mathbb {R}/\mathbb {Z}$ be the circle, and let $\mathbb {T}^2 = \mathbb {R}^2/\mathbb {Z}^2$ be the torus. Let a and b be two natural numbers with $a\geq b\geq 2$ . Set $A = \{0,1,2,\ldots , a-1\}$ and $B = \{0,1,2,\ldots , b-1\}$ . Let $R\subset A\times B$ be a non-empty subset, and define
We define $X\subset \mathbb {T}^2$ and $Y\subset \mathbb {T}$ by
The space X is the famous Bedford–McMullen carpet [Bed84, Mc84]. We are going to explain that we can calculate the Hausdorff dimension of X (with respect to the natural metric on $\mathbb {T}^2$) by using Corollary 1.4.
We define continuous maps $T:X\to X$ and $S:Y\to Y$ by
Here, $(X, T)$ and $(Y, S)$ are dynamical systems. Let $\pi :X\to Y$ be the natural projection. Then $\pi $ is a factor map between $(X, T)$ and $(Y, S)$. We are interested in its weighted topological entropy. Set
$$ \begin{align*} w = \log_a b. \end{align*} $$
We have $0\leq w \leq 1$. For $a_1 = \log _a b$ and $a_2=1-\log _a b$, the $(a_1, a_2)$-weighted Bowen ball $B_n^{(a_1, a_2)}(x,\varepsilon )$ is approximately a square of side length $\varepsilon b^{-n}$. Hence it follows directly from the definitions in §1.1 that the Hausdorff dimension of X is given by
$$ \begin{align*} \dim_H X = \frac{h_{\mathrm{top}}^{(w, 1-w)}(\pi, T)}{\log b}. \end{align*} $$
From the equality $h_{\mathrm {top}}^{(w, 1-w)}(\pi , T) = h_{\mathrm {top}}^w(\pi , T)$ in Corollary 1.4, we also have
$$ \begin{align*} \dim_H X = \frac{h_{\mathrm{top}}^{w}(\pi, T)}{\log b}. \end{align*} $$
Now we calculate the w-weighted topological entropy $h^{w}_{\mathrm {top}}(\pi , T)$.
Claim 1.7. For each $y\in B$, we define $t(y)$ as the number of $x\in A$ satisfying $(x, y)\in R$. Then
$$ \begin{align*} h_{\mathrm{top}}^w(\pi, T) = \log \sum_{y=0}^{b-1} t(y)^w. \end{align*} $$
Proof. First notice that in the definitions (1.5) and (1.6), we can use closed covers instead of open covers; this does not change their values. Here we will consider closed covers.
We define a metric $d^\prime $ on $\mathbb {T}$ by
We define a metric d on $\mathbb {T}^2$ by
Let $\varepsilon>0$ and take a natural number m with $b^{-m} < \varepsilon $ . Let N be a natural number. For each $v\in (R^\prime )^{N+m}$ , set
These form a closed covering of Y with ${\mathrm {diam}}(V_v, d^\prime _N) < \varepsilon $ . For each $(u, v)\in R^{N+m} \subset A^{N+m}\times B^{N+m}$ (where $u\in A^{N+m}$ and $v\in (R^\prime )^{N+m}$ ), we set
These are closed subsets of X with ${\mathrm {diam}}(U_{(u,v)}, d_N) < \varepsilon $ and
Hence, for $v=(v_1, \ldots , v_{N+m}) \in (R^\prime )^{N+m}$ ,
Therefore,
Thus,
Next, let $0<\varepsilon <{1}/{a}$ . Fix $(p, q) \in R$ . For a natural number N, we consider the following points in Y:
These points form an $\varepsilon $ -separated set in Y with respect to the metric $d^\prime _N$ . We also consider the following points in X:
These points form an $\varepsilon $ -separated set in X with respect to the metric $d_N$ .
Suppose $Y=V_1\cup \cdots \cup V_n$ is a covering with ${\mathrm {diam}}(V_k, d^\prime _N) < \varepsilon $ . Then each $V_k$ contains at most one point of (1.8). If $V_k$ contains a point $\sum _{n=1}^N ({v_n}/{b^n}) + \sum _{n=N+1}^\infty ({q}/{b^n})$ , then $\pi ^{-1}(V_k)$ contains $t(v_1)\cdots t(v_N)$ points of the form (1.9) and hence
So
This shows
Notice that this proof of the claim is completely elementary. We have not used any sophisticated technique (in particular, measure theory).
Combining Claim 1.7 with the above expression for the Hausdorff dimension of X, we obtain
$$ \begin{align*} \dim_H X = \log_b \sum_{y=0}^{b-1} t(y)^{\log_a b}. \tag{1.10} \end{align*} $$
This is a famous formula for the Hausdorff dimension of the Bedford–McMullen carpet [Bed84, Mc84]. Therefore, we conclude that the equality $h_{\mathrm {top}}^{(w,1-w)}(\pi , T) = h_{\mathrm {top}}^w(\pi , T)$ provides this famous formula fairly easily. This suggests that the equality $h_{\mathrm {top}}^{(w, 1-w)}(\pi , T) = h_{\mathrm {top}}^w(\pi , T)$ is a rather deep statement. We can say that it is a topological generalization of the dimension formula for the Bedford–McMullen carpet.
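For a concrete instance, take $a=3$, $b=2$ and $R = \{(0,0), (1,1), (2,0)\}$, so that $t(0)=2$, $t(1)=1$ and $w = \log_3 2$. These parameters are an illustrative choice, not taken from the text above, and the computation assumes the dimension formula in the form $\dim_H X = \log_b \sum_{y\in B} t(y)^{\log_a b}$.
$$ \begin{align*} \dim_H X = \log_2\left(2^{\log_3 2} + 1\right) \approx 1.3497. \end{align*} $$
Two consistency checks under the same assumption: if $R = A\times B$, then $t(y) = a$ for all y and $\sum_{y} a^{\log_a b} = b\cdot b$, so $\dim_H X = 2$; if $a=b$, then $w=1$ and $\dim_H X = \log_b |R|$, the usual self-similar value.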
Kenyon–Peres [KP96b, Theorems 1.1 and 3.2] generalized the formula (1.10) to closed T-invariant subsets of $\mathbb {T}^2$ which correspond to subshifts of finite type or sofic subshifts under the natural Markov partition. We can also prove their results from the equality $h_{\mathrm {top}}^{(w,1-w)}(\pi , T) = h_{\mathrm {top}}^w(\pi , T)$ as in the above.
The above example also illustrates that the two notions $h_{\mathrm {top}}^{(a_1, a_2)}(\pi , T)$ and $h_{\mathrm {top}}^w(\pi , T)$ have their own advantages. One of the great advantages of $h_{\mathrm {top}}^{(a_1, a_2)}(\pi , T)$ is that its definition is intrinsically related to the Hausdorff dimension. So it can be directly applied to the study of geometric measure theory. The advantage of $h_{\mathrm {top}}^w(\pi , T)$ is that its definition is elementary and hence (sometimes) easy to calculate.
In [FH16, p. 441], Feng and Huang asked how to generalize their result to $\mathbb {Z}^d$-actions. It seems rather straightforward to generalize our new approach to $\mathbb {Z}^d$-actions and, possibly, actions of amenable groups.
Problem 1.8. Suppose that both $h_{\mathrm {top}}^{(a_1, a_2)}(\pi , T)$ and $h_{\mathrm {top}}^w(\pi , T)$ are generalized to group actions. Can one deduce any interesting consequence of their coincidence (like the above calculation of the Hausdorff dimension of the Bedford–McMullen carpet)?
We would like to mention the papers of Barral and Feng [BF09, BF12] and Feng [Fen11] (see also Yayama [Ya11a, Ya11b]). These papers studied Question 1.1 and related questions when $(X, T)$ and $(Y, S)$ are subshifts over finite alphabets. When $(X, T)$ and $(Y, S)$ are subshifts, the above definition of $h_{\mathrm {top}}^w(\pi , T)$ (and its pressure version in §2) is essentially the same as that given in [BF09, Theorem 1.1] (see also [BF12, Theorem 3.1]). So we can say that the above definition generalizes the approach in [BF09, Theorem 1.1] from subshifts to general dynamical systems.
This paper studies only the abstract theory of $h_{\mathrm {top}}^w(\pi , T)$ and its pressure version. However, the main motivation for the author to introduce these quantities is not to develop an abstract theory. The author naturally came up with the above definition of $h_{\mathrm {top}}^w(\pi , T)$ when he studied the mean Hausdorff dimension of certain infinite dimensional fractals. (The mean Hausdorff dimension is a dynamical version of the Hausdorff dimension introduced in [LT19].) We plan to describe this connection in a separate paper.
2 Weighted topological pressure
In this section, we introduce our new definition of weighted topological pressure. For the original approach, see [FH16, §3.1].
Let $\pi :X\to Y$ be a factor map from a dynamical system $(X, T)$ to a dynamical system $(Y, S)$. Let $f:X\to \mathbb {R}$ be a continuous function.
Let d and $d^\prime $ be metrics on X and Y respectively. For a natural number N, we define new metrics $d_N$ and $d^\prime _N$ on X and Y respectively by (1.4). We also define a continuous function $\mathbb {S}_N f:X\to \mathbb {R}$ by
$$ \begin{align*} \mathbb{S}_N f(x) = \sum_{n=0}^{N-1} f(T^n x). \end{align*} $$
The metrics $d_N, d^\prime _N$ , and function $\mathbb {S}_N f$ are sometimes denoted by $d^T_N, (d^\prime )^S_N$ , and $\mathbb {S}^T_N f$ respectively for clarifying the underlying dynamics.
For $\varepsilon>0$ and a non-empty subset $\Omega \subset X$ , we define
(When $U_k$ is the empty set, we assume that the term $\exp (\sup _{U_k} \mathbb {S}_N f)$ is zero.) We sometimes denote $P(\Omega , f, N, \varepsilon )$ by $P_T(\Omega , f, N, \varepsilon )$ for clarifying the map T. When $\Omega $ is the empty set, we define $P(\Omega , f, N, \varepsilon ) = 0$ . It is well known that the topological pressure of $(X, T, f)$ is given by
We will modify this definition. Let $0\leq w \leq 1$ be a real number. We set
We sometimes denote this by $P^w_T(\pi , f, N, \varepsilon )$ .
The quantity $P^w(\pi , f, N,\varepsilon )$ is sub-multiplicative in N and monotone in $\varepsilon $ . So we define the w-weighted topological pressure by
The value of $P^w(\pi , T, f)$ is independent of the choices of the metrics d and $d^\prime $ . So it provides a topological invariant. We sometimes use the notation $P^w(\pi , X, T, Y, S, f)$ instead of $P^w(\pi , T, f)$ for clarifying all the data involved.
Now we state the main result of the paper.
Theorem 2.1. (Variational principle for w-weighted topological pressure)
For any $0\leq w\leq 1$ ,
When $f\equiv 0$ , we have $P^w(\pi , T, f) = h_{\mathrm {top}}^w(\pi , T)$ . So Theorem 1.3 in §1.2 follows from Theorem 2.1. The proof of Theorem 2.1 occupies the rest of the paper.
For simplicity of notation, we write
(Here var is the abbreviation of variational.) Then our main purpose is to prove the equality
In the rest of this section, we gather some elementary properties of w-weighted topological pressure. Here we always assume that $\pi :(X, T)\to (Y, S)$ is a factor map between dynamical systems with a continuous function $f:X\to \mathbb {R}$ . We take $0\leq w\leq 1$ . Let d and $d^\prime $ be metrics on X and Y respectively.
Lemma 2.2. Let m be a natural number.
Here the left-hand side is $P^w(\pi , X,T^m,Y,S^m, \mathbb {S}^T_m f)$ .
Proof. Let $\varepsilon $ be a positive number. There exists $0<\delta <\varepsilon $ such that
Then for any natural number N,
Because $\mathbb {S}^{T^m}_N (\mathbb {S}^T_m f) = \mathbb {S}^T_{mN} f$ , for any subset $\Omega \subset X$ ,
Then,
Thus,
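The identity $\mathbb {S}^{T^m}_N (\mathbb {S}^T_m f) = \mathbb {S}^T_{mN} f$ used in the proof above can be checked directly. The computation below assumes the usual definition $\mathbb{S}^T_N f = \sum_{n=0}^{N-1} f\circ T^n$.
$$ \begin{align*} \mathbb{S}^{T^m}_N (\mathbb{S}^T_m f) = \sum_{k=0}^{N-1} (\mathbb{S}^T_m f)\circ T^{mk} = \sum_{k=0}^{N-1} \sum_{j=0}^{m-1} f\circ T^{mk+j} = \sum_{n=0}^{mN-1} f\circ T^n = \mathbb{S}^T_{mN} f. \end{align*} $$
The third equality holds because every integer $0\leq n < mN$ can be written uniquely as $n = mk+j$ with $0\leq k<N$ and $0\leq j<m$.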
Lemma 2.3. Let $(X^\prime , T^\prime )$ be a dynamical system, and let $\varphi : (X^\prime , T^\prime ) \to (X, T)$ be a factor map.
Then
Here the right-hand side is $P^w(\pi \circ \varphi , X^\prime , T^\prime , Y, S, f\circ \varphi )$ .
Proof. Let $\tilde {d}$ be a metric on $X^\prime $ . For any $\varepsilon>0$ there exists $0<\delta <\varepsilon $ satisfying
Then for any $N>0$
From this, we have for any $\Omega \subset X^\prime $
For any $V\subset Y$
So
Then
Therefore
The next lemma is a bit complicated. It might be better for some readers to look at Remark 2.5 below before reading the lemma. It will provide a clearer perspective.
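Lemma 2.4. Let $(Y^\prime , S^\prime )$ be a dynamical system and let $\phi :(Y^\prime , S^\prime )\to (Y,S)$ be a factor map. Define the fiber product,
$$ \begin{align*} X\times_Y Y^\prime = \{(x, y)\in X\times Y^\prime \mid \pi(x) = \phi(y)\}. \end{align*} $$
Now, $(X\times _Y Y^\prime , T\times S^\prime )$ becomes a dynamical system. We define factor maps $\varphi : X\times _Y Y^\prime \to X$ and $\Pi :X\times _Y Y^\prime \to Y^\prime $ by
$$ \begin{align*} \varphi(x, y) = x, \quad \Pi(x, y) = y. \end{align*} $$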
The diagram is as follows:
Then,
Here the right-hand side is $P^w(\Pi , X\times _Y Y^\prime , T\times S^\prime , Y^\prime , S^\prime , f\circ \varphi )$ .
Proof. The point of the proof is that for any subset $A\subset Y^\prime $ , we have
Let $\tilde {d}$ be a metric on $Y^\prime $ and we define a metric $\rho $ on $X\times _Y Y^\prime $ by
Let $\varepsilon $ be a positive number. We have
Then for any natural number N and any subset $\Omega \subset X\times _Y Y^\prime $ ,
In particular, for any subset $A \subset Y^\prime $ ,
There exists $0<\delta <\varepsilon $ such that
Now we claim that
Indeed, take any positive number C with
Then there exists an open covering $Y^\prime = V_1\cup \cdots \cup V_n$ such that ${\mathrm {diam}}(V_k, \tilde {d}_N) < \delta $ for all $1\leq k \leq n$ and
We can find compact subsets $A_k \subset V_k$ satisfying $Y^\prime = A_1\cup \cdots \cup A_n$ . We have
Each $\phi (A_k)$ is a closed subset of Y with ${\mathrm {diam}}(\phi (A_k), d^\prime _N) < \varepsilon $ . By the definition (2.1), there exist open subsets $W_k \supset \phi (A_k)$ of Y for $1\leq k\leq n$ such that ${\mathrm {diam}}(W_k, d^\prime _N) < \varepsilon $ and
Noticing $Y = W_1\cup \cdots \cup W_n$ , we have
Because C is an arbitrary number larger than $P^w_{T\times S^\prime }(\Pi , f\circ \varphi , N,\delta )$ , this shows
Thus, we conclude
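The set-theoretic identity underlying the proof above can be verified by hand. The sketch below assumes the usual fiber product $X\times_Y Y^\prime = \{(x, y)\in X\times Y^\prime \mid \pi(x) = \phi(y)\}$ with the projections $\varphi(x, y) = x$ and $\Pi(x, y) = y$; it is presumably the identity meant by "the point of the proof". For any subset $A\subset Y^\prime$,
$$ \begin{align*} \varphi\left(\Pi^{-1}(A)\right) = \{x\in X \mid \text{there exists } y\in A \text{ with } \pi(x) = \phi(y)\} = \{x\in X \mid \pi(x)\in \phi(A)\} = \pi^{-1}\left(\phi(A)\right). \end{align*} $$
The same definitions give $\pi \circ \varphi = \phi \circ \Pi$, since both sides send $(x, y)$ to $\pi(x) = \phi(y)$.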
Remark 2.5. Let $(X^\prime , T^\prime )$ and $(Y^\prime , S^\prime )$ be dynamical systems, and let $\pi ^\prime :X^\prime \to Y^\prime $ be a factor map. Suppose there exist factor maps $\varphi :(X^\prime , T^\prime ) \to (X, T)$ and $\phi :(Y^\prime , S^\prime )\to (Y, S)$ satisfying $\pi \circ \varphi = \phi \circ \pi ^\prime $ .
Then,
Here the right-hand side is $P^w(\pi ^\prime , X^\prime , T^\prime , Y^\prime , S^\prime , f\circ \varphi )$ . Lemmas 2.3 and 2.4 are special cases of this statement. We can prove (2.5) by using the variational principle (Theorem 2.1). However, it seems difficult to prove it in an elementary way. We will not use (2.5) in the paper.
Finally, we mention two basic results on calculus, which underpin many arguments of this paper.
Lemma 2.6.
(1) For $0\leq w \leq 1$ and non-negative numbers $x, y$,
$$ \begin{align*} (x+y)^w \leq x^w + y^w. \end{align*} $$
(2) Let $p_1, \ldots , p_n$ be non-negative numbers with $p_1+\cdots +p_n = 1$. For any real numbers $x_1, \ldots , x_n$,
$$ \begin{align*} \sum_{i=1}^n (-p_i \log p_i + p_i x_i) \leq \log \sum_{i=1}^n e^{x_i}. \end{align*} $$
In particular (letting $x_1 = \cdots = x_n = 0$),
$$ \begin{align*} - \sum_{i=1}^n p_i \log p_i \leq \log n. \end{align*} $$
Proof. (1) is completely elementary. (2) is proved in [Wal82, §9.3, Lemma 9.9].
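For the reader's convenience, here is one way to see (1), together with the equality case of (2). This is a sketch for $x+y>0$ and $0<w\leq 1$ (the remaining boundary cases are immediate), and the symbols s and Z are introduced here for illustration. Put $s = x/(x+y)\in [0,1]$. Since $s^w\geq s$ and $(1-s)^w\geq 1-s$,
$$ \begin{align*} x^w + y^w = (x+y)^w\left(s^w + (1-s)^w\right) \geq (x+y)^w\left(s + (1-s)\right) = (x+y)^w. \end{align*} $$
For (2), equality is attained by the weights $p_i = e^{x_i}/Z$ with $Z = \sum_{j=1}^n e^{x_j}$:
$$ \begin{align*} \sum_{i=1}^n (-p_i\log p_i + p_i x_i) = \sum_{i=1}^n p_i\left(-x_i + \log Z + x_i\right) = \log Z. \end{align*} $$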
3 Kolmogorov–Sinai entropy
In this section, we review basic definitions on Kolmogorov–Sinai entropy. For the details, see the book of Walters [Wal82].
Let $(X, \mu )$ be a probability measure space, namely X is a set equipped with a $\sigma $ -algebra and $\mu $ is a probability measure defined on it. In our later applications, X is always a compact metrizable space with the standard Borel $\sigma $ -algebra.
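Let $\mathscr {A} = \{A_1, A_2, \ldots , A_n\}$ be a finite measurable partition of X, namely each $A_i$ is a measurable subset of X and
$$ \begin{align*} X = A_1\cup \cdots \cup A_n, \quad A_i\cap A_j = \emptyset \quad (i\neq j). \end{align*} $$
We define the Shannon entropy of $\mathscr {A}$ by
$$ \begin{align*} H_\mu(\mathscr{A}) = -\sum_{i=1}^n \mu(A_i)\log \mu(A_i), \end{align*} $$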
where we assume $0 \log 0 = 0$ .
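As a small illustration (the numbers are chosen here for the example, and the usual formula $H_\mu(\mathscr{A}) = -\sum_i \mu(A_i)\log \mu(A_i)$ is assumed), suppose $\mathscr{A}$ has three atoms of measure $1/2$, $1/4$ and $1/4$. Then
$$ \begin{align*} H_\mu(\mathscr{A}) = \frac{1}{2}\log 2 + \frac{1}{4}\log 4 + \frac{1}{4}\log 4 = \frac{3}{2}\log 2. \end{align*} $$
This is smaller than $\log 3$, in agreement with the bound $-\sum_i p_i\log p_i \leq \log n$ of Lemma 2.6(2).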
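For another finite measurable partition $\mathscr{A}^\prime = \{A^\prime _1, A^\prime _2, \ldots , A^\prime _m\}$, we set
$$ \begin{align*} \mathscr{A}\vee \mathscr{A}^\prime = \{A_i\cap A^\prime_j \mid 1\leq i\leq n,\ 1\leq j\leq m\}. \end{align*} $$
This is a finite measurable partition of X. We define the conditional entropy by
$$ \begin{align*} H_\mu(\mathscr{A}\mid \mathscr{A}^\prime) = -\sum_{\substack{1\leq j\leq m \\ \mu(A^\prime_j)>0}} \mu(A^\prime_j) \sum_{i=1}^n \frac{\mu(A_i\cap A^\prime_j)}{\mu(A^\prime_j)} \log \frac{\mu(A_i\cap A^\prime_j)}{\mu(A^\prime_j)}. \end{align*} $$
Here, in the first summation, we have considered only the index j satisfying $\mu (A^\prime _j) > 0$. We have [Wal82, Theorem 4.3(i)]
$$ \begin{align*} H_\mu(\mathscr{A}\vee \mathscr{A}^\prime) = H_\mu(\mathscr{A}^\prime) + H_\mu(\mathscr{A}\mid \mathscr{A}^\prime). \end{align*} $$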
We write $\mathscr{A}^\prime \prec \mathscr {A}$ if $\mathscr {A}\vee \mathscr{A}^\prime = \mathscr {A}$. This is equivalent to the condition that for every $A \in \mathscr {A}$, there exists $A^\prime \in \mathscr{A}^\prime $ containing A. If $\mathscr{A}^\prime \prec \mathscr {A}$, then
$$ \begin{align*} H_\mu(\mathscr{A}\mid \mathscr{A}^\prime) = H_\mu(\mathscr{A}) - H_\mu(\mathscr{A}^\prime) \end{align*} $$
and $H_\mu (\mathscr{A}^\prime ) \leq H_\mu (\mathscr{A})$.
Lemma 3.1.
(1) $H_\mu (\mathscr{A})$ is subadditive in $\mathscr {A}$. Namely, for two finite measurable partitions $\mathscr {A}$ and $\mathscr{A}^\prime $ of X,
$$ \begin{align*} H_\mu(\mathscr{A}\vee \mathscr{A}^\prime) \leq H_\mu(\mathscr{A}) + H_\mu(\mathscr{A}^\prime). \end{align*} $$
(2) $H_\mu (\mathscr{A})$ is concave in $\mu $. Namely, for $0\leq t\leq 1$ and two probability measures $\mu $ and $\mu ^\prime $ on X,
$$ \begin{align*} H_{(1-t)\mu + t\mu^\prime}(\mathscr{A}) \geq (1-t)H_\mu(\mathscr{A}) + t H_{\mu^\prime}(\mathscr{A}). \end{align*} $$
Proof. See [Wal82, Theorem 4.3(viii)] and [Wal82, §8.1 Remark] for the proofs of (1) and (2) respectively.
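Let $T:X\to X$ be a measurable map satisfying $T_*\mu = \mu $. Let $\mathscr {A}$ be a finite measurable partition of X. For a natural number N, we define a new measurable partition $\mathscr {A}^{N}$ of X by
$$ \begin{align*} \mathscr{A}^{N} = \mathscr{A}\vee T^{-1}\mathscr{A}\vee \cdots \vee T^{-(N-1)}\mathscr{A}. \end{align*} $$
We define the entropy $h_\mu (T, \mathscr{A})$ by
$$ \begin{align*} h_\mu(T, \mathscr{A}) = \lim_{N\to \infty} \frac{H_\mu(\mathscr{A}^{N})}{N}. \end{align*} $$
Finally, we define the Kolmogorov–Sinai entropy of the measure-preserving transformation T by
$$ \begin{align*} h_\mu(T) = \sup\{h_\mu(T, \mathscr{A}) \mid \mathscr{A} \text{ is a finite measurable partition of } X\}. \end{align*} $$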
We will need the following lemma later. See Theorem 4.12(iv) of the book [Wal82, §4.5] for the proof.
Lemma 3.2. If $\mathscr {A}$ and $\mathscr{A}^\prime $ are two finite measurable partitions of X, then
$$ \begin{align*} h_\mu(T, \mathscr{A}) \leq h_\mu(T, \mathscr{A}^\prime) + H_\mu(\mathscr{A}\mid \mathscr{A}^\prime). \end{align*} $$
4 Proof of $P^w_{\mathrm {var}}(\pi , T, f) \leq P^w(\pi , T, f)$
Let $\pi :(X, T)\to (Y, S)$ be a factor map between dynamical systems and let $f:X\to \mathbb {R}$ be a continuous function. The purpose of this section is to prove one half of the variational principle.
Proposition 4.1. For any $0\leq w\leq 1$ and $\mu \in \mathscr {M}^T(X)$ ,
Therefore, $P^w_{\mathrm {var}}(\pi , T, f) \leq P^w(\pi , T, f)$ .
Proof. Set $\nu = \pi _*\mu $ . This is an invariant probability measure on Y. We will prove
If this is proved, then we will get the above statement by the standard amplification trick. Namely, for each natural number m, we apply (4.1) to $\pi :(X, T^m)\to (Y, S^m)$ with a continuous function $\mathbb {S}_m f:X\to \mathbb {R}$ :
We have $h_\mu (T^m) = m h_\mu (T)$ , $h_{\nu }(S^m) = m h_{\nu }(S)$ , $\int _X \mathbb {S}_m f\, d\mu = m\int _X f\, d\mu $ and
Hence,
Letting $m\to \infty $ , we get the statement. So it is enough to prove (4.1).
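The amplification trick can be summarized schematically. In the sketch below, $\Phi(T, f)$ denotes the left-hand side of the desired inequality and C denotes an additive error term on the right-hand side of (4.1) that does not depend on the system; both symbols are introduced here for illustration. The sketch also assumes that Lemma 2.2 gives at least $P^w(\pi, T^m, \mathbb{S}_m f)\leq m P^w(\pi, T, f)$. Since each term of $\Phi$ is multiplied by m when $(T, S, f)$ is replaced by $(T^m, S^m, \mathbb{S}_m f)$,
$$ \begin{align*} m\,\Phi(T, f) = \Phi(T^m, \mathbb{S}_m f) \leq P^w(\pi, T^m, \mathbb{S}_m f) + C \leq m P^w(\pi, T, f) + C. \end{align*} $$
Dividing by m gives $\Phi(T, f)\leq P^w(\pi, T, f) + C/m$, and the error term disappears as $m\to \infty $.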
Let $\mathscr {A} = \{A_1, \ldots , A_\alpha \}$ be a finite measurable partition of Y and let $\mathscr {B}$ be a finite measurable partition of X. We will prove that
For each $A_a$ in $\mathscr {A}$ ($1\leq a \leq \alpha $), we take a compact subset $C_a\subset A_a$ satisfying
We set $C_0 = Y\setminus (C_1\cup \cdots \cup C_\alpha )$ and $\mathscr {C} = \{C_0, C_1, C_2, \ldots , C_\alpha \}$ .
Claim 4.2. $\mathscr {C}$ is a finite measurable partition of Y satisfying
Proof. From Lemma 3.2,
Because $C_a\subset A_a$ for $1\leq a\leq \alpha $ ,
The last term is smaller than one by (4.3).
We consider $\mathscr {B}\vee \pi ^{-1}(\mathscr {C})$, which has the form
For each $B_{ab} \ (0\leq a\leq \alpha , 1\leq b\leq \beta _a)$ , we take a compact subset $D_{ab}\subset B_{ab}$ such that
We set
We define
Claim 4.3. $\mathscr {D}$ is a finite measurable partition of X with $\pi ^{-1}(\mathscr {C}) \prec \mathscr {D}$ and
Proof. $\pi ^{-1}(\mathscr {C}) \prec \mathscr {D}$ is obvious by the construction.
Because $D_{ab}\subset B_{ab}$ for $0\leq a\leq \alpha $ and $1\leq b\leq \beta _a$ ,
We will prove that
If this is proved, then (4.2) will follow from Claims 4.2 and 4.3.
From the definition of the entropy,
Because $\nu = \pi _*\mu $ , we have $H_\nu (\mathscr {C}^N) = H_\mu (\pi ^{-1}(\mathscr {C}^N))$ . Because $\pi ^{-1}(\mathscr {C}^N) \prec \mathscr {D}^N$ ,
So,
We have
Therefore,
For $C\in \mathscr {C}^N$ , we define
Then,
For $C\in \mathscr {C}^N$ with $\nu (C)>0$ and $D\in \mathscr {D}^N_C$ , we set
For each $C\in \mathscr {C}^N$ with $\nu (C)>0$ , we have
Claim 4.4. We have the following inequality:
Proof. We have
Hence,
By Lemma 2.6(2),
So
Therefore,
We take metrics d and $d^\prime $ on X and Y respectively. Recall that $C_a \ (1\leq a\leq \alpha )$ are mutually disjoint compact subsets of Y and that $D_{ab} \ (0\leq a \leq \alpha , 1\leq b \leq \beta _a)$ are mutually disjoint compact subsets of X. Hence, we can take $\varepsilon>0$ such that:
(a) for any $y\in C_a$ and $y^\prime \in C_{a^\prime }$ with distinct $1\leq a, a^\prime \leq \alpha $,
$$ \begin{align*} \varepsilon < d^\prime(y, y^\prime); \end{align*} $$
(b) for any $x\in D_{ab}$ and $x^\prime \in D_{a b^\prime }$ with $0\leq a\leq \alpha $ and distinct $1\leq b, b^\prime \leq \beta _a$,
$$ \begin{align*} \varepsilon < d(x, x^\prime). \end{align*} $$
Claim 4.5. Let N be a natural number.
(1) If a subset $V\subset Y$ has ${\mathrm {diam}}(V,d^\prime _N) < \varepsilon $, then the number of $C\in \mathscr {C}^N$ having non-empty intersection with V is at most $2^N$:
$$ \begin{align*} |\{C\in \mathscr{C}^N\mid C\cap V\neq \emptyset\}| \leq 2^N. \end{align*} $$
(2) If a subset $U\subset X$ has ${\mathrm {diam}}(U, d_N) < \varepsilon $, then for each $C\in \mathscr {C}^N$, the number of $D\in \mathscr {D}^N_C$ having non-empty intersection with U is at most $2^N$:
$$ \begin{align*} |\{D\in \mathscr{D}^N_C\mid D\cap U \neq \emptyset\}| \leq 2^N. \end{align*} $$
Proof. (1) For each $0\leq k < N$ , the set $S^k V$ may have non-empty intersection with $C_0$ and at most one set in $\{C_1, C_2, \ldots , C_\alpha \}$ . The above statement follows from this.
(2) Suppose $C\in \mathscr {C}^N$ has the form
with $0\leq a_0, \ldots , a_{N-1} \leq \alpha $ . Recall that $\{D_{a_k 0}, D_{a_k 1}, D_{a_k 2}, \ldots , D_{a_k \beta _{a_k}}\}$ is a partition of $\pi ^{-1}(C_{a_k})$ . Then any set $D\in \mathscr {D}^N_C$ has the form
with $0 \leq b_k \leq \beta _{a_k}$ for $0\leq k \leq N-1$ .
For each $0\leq k <N$ , the set $T^k U$ may have non-empty intersection with $D_{a_k 0}$ and, at most, one set in $\{D_{a_k 1}, D_{a_k 2}, \ldots , D_{a_k \beta _{a_k}}\}$ . Now the above statement follows from this.
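The counting in part (1) can be spelled out as follows. This is a sketch; the index sets $I_k$ are notation introduced here, and it assumes that $d^\prime_N$ is the maximum of $d^\prime(S^n\cdot, S^n\cdot)$ over $0\leq n<N$ and that the elements of $\mathscr{C}^N$ have the form $C_{a_0}\cap S^{-1}C_{a_1}\cap \cdots \cap S^{-(N-1)}C_{a_{N-1}}$. For $0\leq k<N$, let $I_k\subset \{0, 1, \ldots, \alpha\}$ be the set of indices a with $S^k V\cap C_a\neq \emptyset$. Since ${\mathrm{diam}}(S^k V, d^\prime)\leq {\mathrm{diam}}(V, d^\prime_N) < \varepsilon$ and condition (a) separates $C_1, \ldots, C_\alpha$ by more than $\varepsilon$, we have $|I_k|\leq 2$. If C as above meets V, then $a_k\in I_k$ for every k, and hence
$$ \begin{align*} |\{C\in \mathscr{C}^N\mid C\cap V\neq \emptyset\}| \leq \prod_{k=0}^{N-1} |I_k| \leq 2^N. \end{align*} $$
Part (2) follows in the same way, with the sets $D_{a_k b}$ and condition (b) in place of $C_a$ and condition (a).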
Let N be a natural number. Suppose we are given an open cover $Y= V_1\cup \cdots \cup V_n$ with ${\mathrm {diam}}(V_i, d^\prime _N) < \varepsilon $ for all $1\leq i\leq n$ . Moreover, suppose that for each $1\leq i\leq n$ , we are given an open cover $\pi ^{-1}(V_i) = U_{i1}\cup U_{i2}\cup \cdots \cup U_{i m_i}$ with ${\mathrm {diam}}(U_{ij}, d_N) < \varepsilon $ for all $1\leq j\leq m_i$ . We are going to prove
Suppose this is proved. Then by Claim 4.4,
Taking the infimum over $\{V_i\}$ and $\{U_{ij}\}$ satisfying the above assumptions, we have
Divide this by N and let $N\to \infty $ . Recalling (4.5), we get
Letting $\varepsilon \to 0$ , we get the desired result:
So the rest of the work is to prove (4.6).
For $D\in \mathscr {D}^N$ , we have
Here the sum is taken over the index $(i, j)$ such that $U_{ij}$ has non-empty intersection with D.
Let $C\in \mathscr {C}^N$ . We define $\mathscr {V}_C$ as the set of $1\leq i \leq n$ such that $V_i \cap C \neq \emptyset $ . By Claim 4.5(2),
Then (recall $0\leq w\leq 1$ ),
Hence,
By Claim 4.5(1), for each $1\leq i \leq n$ , the number of $C\in \mathscr {C}^N$ satisfying $i \in \mathscr {V}_C$ is at most $2^N$ . So the right-hand side is bounded from above by
Therefore,
Taking the logarithm,
This is the estimate (4.6). So we have finished the proof of the proposition.
5 Zero-dimensional principal extension
In this section, we prepare some definitions and results on principal extensions. The main reference is the book of Downarowicz [Dow11].
Let $\pi :(X, T)\to (Y, S)$ be a factor map between dynamical systems. Let d be a metric on X. We define the topological conditional entropy of $\pi $ by
$$ \begin{align*} h_{\mathrm{top}}(X, T\mid Y, S) = \lim_{\varepsilon\to 0}\left(\lim_{N\to \infty} \frac{\sup_{y\in Y}\log \#(\pi^{-1}(y), N, \varepsilon)}{N}\right). \end{align*} $$
Here, $\#(\pi ^{-1}(y), N, \varepsilon )$ is the number defined by (1.5). It is easy to check that the quantity
$$ \begin{align*} \sup_{y\in Y}\log \#(\pi^{-1}(y), N, \varepsilon) \end{align*} $$
is sub-additive in N and monotone in $\varepsilon $. So the above limits exist. This definition of the topological conditional entropy arises from [Dow11, Lemma 6.8.2].
The factor map $\pi $ is said to be principal if $h_{\mathrm {top}}(X,T\mid Y,S)=0$ . In the case that this condition holds, the dynamical system $(X, T)$ is called a principal extension of $(Y, S)$ .
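A simple sufficient condition illustrates the definition. The sketch below assumes that the topological conditional entropy is $\lim_{\varepsilon\to 0}\lim_{N\to \infty} \frac{1}{N}\sup_{y\in Y}\log \#(\pi^{-1}(y), N, \varepsilon)$, and K denotes a hypothetical uniform bound on the cardinality of the fibers. If $|\pi^{-1}(y)|\leq K$ for all $y\in Y$, then each fiber is covered by at most K open sets of $d_N$-diameter less than $\varepsilon$ (one small ball around each point), so
$$ \begin{align*} \frac{1}{N}\sup_{y\in Y}\log \#(\pi^{-1}(y), N, \varepsilon) \leq \frac{\log K}{N} \to 0 \quad (N\to \infty). \end{align*} $$
Hence such a factor map is principal. In particular, every isomorphism of dynamical systems is principal.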
The next theorem shows an important consequence of this condition. This is proved in [Dow11, Corollary 6.8.9]. (See also the paper of Ledrappier and Walters [LW77].)
Theorem 5.1. A principal factor map preserves Kolmogorov–Sinai entropy. Namely, if $\pi :(X, T)\to (Y, S)$ is a principal factor map between dynamical systems, then for any invariant probability measure $\mu \in \mathscr {M}^T(X)$,
$$ \begin{align*} h_\mu(T) = h_{\pi_*\mu}(S). \end{align*} $$
Remark 5.2. Indeed, [Dow11, Corollary 6.8.9] proves the following more precise result. Let $\pi :(X, T)\to (Y, S)$ be a factor map with ${h_{\mathrm {top}}}(Y, S) < \infty $. Then $\pi $ is a principal factor map if and only if $h_\mu (T) = h_{\pi _*\mu }(S)$ for all $\mu \in \mathscr {M}^T(X)$.
Lemma 5.3. Let $(X, T), (Y, S), (Y^\prime , S^\prime )$ be dynamical systems. Let $\pi :X\to Y$ be a factor map and let $\phi :Y^\prime \to Y$ be a principal factor map. We define the fiber product (see Lemma 2.4)
$$ \begin{align*} X\times_Y Y^\prime = \{(x, y)\in X\times Y^\prime \mid \pi(x) = \phi(y)\}. \end{align*} $$
So, $(X\times _Y Y^\prime , T\times S^\prime )$ becomes a dynamical system. We define factor maps $\varphi : X\times _Y Y^\prime \to X$ and $\Pi :X\times _Y Y^\prime \to Y^\prime $ by
$$ \begin{align*} \varphi(x, y) = x, \quad \Pi(x, y) = y. \end{align*} $$
Then $\varphi $ is a principal factor map. (The map $\Pi $ is not used in this statement, but we have introduced it for convenience in what follows.)
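The routine checks behind this statement can be sketched as follows. We assume here that the fiber product is the set of pairs $(x, y)\in X\times Y^\prime $ with $\pi (x) = \phi (y)$ and that $T\times S^\prime $ acts by $(x, y)\mapsto (Tx, S^\prime y)$ (the standard conventions, which we take to be those of Lemma 2.4). For such a pair $(x, y)$ ,
$$ \begin{align*} \pi(Tx) &= S(\pi(x)) = S(\phi(y)) = \phi(S^\prime y), \\ \pi(\varphi(x, y)) &= \pi(x) = \phi(y) = \phi(\Pi(x, y)). \end{align*} $$
The first line shows that $X\times _Y Y^\prime $ is invariant under $T\times S^\prime $ , and the second shows $\pi \circ \varphi = \phi \circ \Pi $ . The fiber product is closed in $X\times Y^\prime $ because $\pi $ and $\phi $ are continuous, so it is compact and metrizable. The projection $\varphi $ is surjective because $\phi $ is surjective, and $\Pi $ is surjective because $\pi $ is surjective.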
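Proof. Let d and $d^\prime $ be metrics on X and $Y^\prime $ respectively. We define a metric $\rho $ on $X\times _Y Y^\prime $ by
$$ \begin{align*} \rho\left((x_1, y_1), (x_2, y_2)\right) = \max\left(d(x_1, x_2), d^\prime(y_1, y_2)\right). \end{align*} $$
For any natural number N and $x\in X$ , the metric space
$$ \begin{align*} \left(\varphi^{-1}(x), \rho_N\right) \end{align*} $$
is isometric to $(\phi ^{-1}(\pi (x)), d^\prime _N)$ . Therefore, for any $\varepsilon>0$ ,
$$ \begin{align*} \#\left(\varphi^{-1}(x), N, \varepsilon\right) = \#\left(\phi^{-1}(\pi(x)), N, \varepsilon\right). \end{align*} $$
So (recall that a factor map is always surjective),
$$ \begin{align*} \sup_{x\in X} \#\left(\varphi^{-1}(x), N, \varepsilon\right) = \sup_{y\in Y} \#\left(\phi^{-1}(y), N, \varepsilon\right). \end{align*} $$
Thus,
$$ \begin{align*} h_{\mathrm{top}}(X\times_Y Y^\prime, T\times S^\prime \mid X, T) = h_{\mathrm{top}}(Y^\prime, S^\prime \mid Y, S) = 0. \end{align*} $$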
The next theorem is a key technical result. This is proved in [Reference DownarowiczDow11, Theorem 7.6.1]. (See also [Reference Downarowicz and HuczekDH13].) Recall here that a compact metrizable space is said to be zero-dimensional if clopen subsets form an open basis of the topology (a subset of a topological space is called clopen if it is both closed and open). For example, the Cantor set $\{0, 1\}^{\mathbb {N}}$ is zero-dimensional. A dynamical system $(X, T)$ is said to be zero-dimensional if X is a zero-dimensional compact metrizable space.
Theorem 5.4. Every dynamical system has a zero-dimensional principal extension. Namely, for any dynamical system $(X, T)$ , there exist a dynamical system $(X^\prime , T^\prime )$ and a factor map $\phi :X^\prime \to X$ such that $X^\prime $ is zero-dimensional and $\phi $ is principal.
Recall that we have defined the two quantities $P^w(\pi ,T, f)$ and $P^w_{\mathrm {var}}(\pi , T, f)$ in §2.
Corollary 5.5. Let $\pi :(X, T)\to (Y, S)$ be a factor map between dynamical systems with a continuous function $f:X\to \mathbb {R}$ . There exists a factor map $\pi ^\prime :(X^\prime , T^\prime )\to (Y^\prime , S^\prime )$ with a continuous function $f^\prime :X^\prime \to \mathbb {R}$ satisfying the following two conditions.
(1) $X^\prime $ and $Y^\prime $ are zero-dimensional.
(2) For any $0\leq w\leq 1$ , we have
$$ \begin{align*} P^w(\pi,T, f) \leq P^w(\pi^\prime, T^\prime, f^\prime), \quad P^w_{\mathrm{var}}(\pi^\prime, T^\prime, f^\prime) \leq P^w_{\mathrm{var}}(\pi,T, f). \end{align*} $$
Proof. By Theorem 5.4, there exists a zero-dimensional principal extension $\phi :(Y^\prime ,S^\prime )\to (Y, S)$ . We consider the fiber product $(X\times _Y Y^\prime , T\times S^\prime )$ and the projections $\varphi : X\times _Y Y^\prime \to X$ and $\Pi : X\times _Y Y^\prime \to Y^\prime $ as in Lemma 5.3. Then $\varphi $ is a principal factor map.
By Lemma 2.4, for any $0\leq w\leq 1$ ,
$$ \begin{align*} P^w(\pi, T, f) \leq P^w(\Pi, T\times S^\prime, f\circ \varphi). \end{align*} $$
Here, the right-hand side is $P^w(\Pi , X\times _Y Y^\prime , T\times S^\prime , Y^\prime , S^\prime , f\circ \varphi )$ . By Theorem 5.1, for any invariant probability measure $\mu \in \mathscr {M}^{T\times S^\prime }(X\times _Y Y^\prime )$ ,
Then,
(Here, we prove $P^w_{\mathrm {var}}(\Pi , T\times S^\prime , f\circ \varphi ) \leq P^w_{\mathrm {var}}(\pi , T, f)$ . In fact, we can prove the equality $P^w_{\mathrm {var}}(\Pi , T\times S^\prime , f\circ \varphi ) = P^w_{\mathrm {var}}(\pi , T, f)$ because the map $\varphi _*:\mathscr {M}^{T\times S^\prime }(X\times _Y Y^\prime )\to \mathscr {M}^{T}(X)$ is surjective. However, we do not need this.)
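The computation behind this inequality can be sketched as follows. We assume here that $P^w_{\mathrm {var}}(\pi , T, f)$ is the supremum of $w h_\nu (T) + (1-w) h_{\pi _*\nu }(S) + w\int _X f\, d\nu $ over $\nu \in \mathscr {M}^T(X)$ , which is our reading of the definition in §2. Let $\mu \in \mathscr {M}^{T\times S^\prime }(X\times _Y Y^\prime )$ and set $\nu = \varphi _*\mu \in \mathscr {M}^T(X)$ . Since $\varphi $ and $\phi $ are principal and $\phi \circ \Pi = \pi \circ \varphi $ , Theorem 5.1 gives
$$ \begin{align*} h_\mu(T\times S^\prime) &= h_{\varphi_*\mu}(T) = h_\nu(T), \\ h_{\Pi_*\mu}(S^\prime) &= h_{\phi_*\Pi_*\mu}(S) = h_{\pi_*\varphi_*\mu}(S) = h_{\pi_*\nu}(S), \\ \int_{X\times_Y Y^\prime} f\circ\varphi \, d\mu &= \int_X f\, d\nu. \end{align*} $$
Hence,
$$ \begin{align*} w h_\mu(T\times S^\prime) + (1-w) h_{\Pi_*\mu}(S^\prime) + w\int f\circ\varphi\, d\mu = w h_\nu(T) + (1-w) h_{\pi_*\nu}(S) + w\int f\, d\nu \leq P^w_{\mathrm{var}}(\pi, T, f). \end{align*} $$
Taking the supremum over $\mu $ yields the stated inequality under this assumption.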
By applying Theorem 5.4 to the system $(X\times _Y Y^\prime , T\times S^\prime )$ , there exists a zero-dimensional principal extension $\psi : (X^\prime , T^\prime ) \to (X\times _Y Y^\prime , T\times S^\prime )$ .
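By Lemma 2.3,
$$ \begin{align*} P^w(\Pi, T\times S^\prime, f\circ \varphi) \leq P^w(\Pi\circ \psi, T^\prime, f\circ \varphi\circ \psi). \end{align*} $$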
Here, the right-hand side is $P^w(\Pi \circ \psi , X^\prime , T^\prime , Y^\prime , S^\prime , f \circ \varphi \circ \psi )$ . As in the above (5.1), by Theorem 5.1,
So we conclude
Set $\pi ^\prime := \Pi \circ \psi : (X^\prime , T^\prime ) \to (Y^\prime , S^\prime )$ and $f^\prime := f\circ \varphi \circ \psi : X^\prime \to \mathbb {R}$ . These satisfy the required conditions.
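In summary, the proof combines two chains of inequalities. As a sketch, assuming that Lemmas 2.3 and 2.4 supply the pressure inequalities in the direction used here and that Theorem 5.1 is applied to the principal factor maps $\phi $ , $\varphi $ and $\psi $ , we have for any $0\leq w\leq 1$ ,
$$ \begin{align*} P^w(\pi, T, f) &\leq P^w(\Pi, T\times S^\prime, f\circ\varphi) \leq P^w(\pi^\prime, T^\prime, f^\prime), \\ P^w_{\mathrm{var}}(\pi^\prime, T^\prime, f^\prime) &\leq P^w_{\mathrm{var}}(\Pi, T\times S^\prime, f\circ\varphi) \leq P^w_{\mathrm{var}}(\pi, T, f). \end{align*} $$
The first line moves the topological quantity up to the zero-dimensional system, and the second line moves the variational quantity back down.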
6 Completion of the proof of the variational principle
In this section, we prove $P^w(\pi , T, f) \leq P^w_{\mathrm {var}}(\pi , T, f)$ and complete the proof of the variational principle. First, we consider the case of zero-dimensional dynamical systems. Later, we will reduce the general case to this zero-dimensional case.
Proposition 6.1. Let $\pi :(X,T)\to (Y, S)$ be a factor map between zero-dimensional dynamical systems. Then, for any $0\leq w\leq 1$ and any continuous function $f:X\to \mathbb {R}$ ,
$$ \begin{align*} P^w(\pi, T, f) \leq P^w_{\mathrm{var}}(\pi, T, f). \end{align*} $$
Proof. Let $\varepsilon>0$ . We will prove that there exists $\mu \in \mathscr {M}^T(X)$ satisfying
We take metrics d and $d^\prime $ on X and Y respectively. Let $Y = A_1\cup \cdots \cup A_\alpha $ be a clopen partition (that is, $A_a$ are mutually disjoint clopen subsets of Y) with ${\mathrm {diam}} (A_a, d^\prime ) < \varepsilon $ for all $1\leq a\leq \alpha $ . Here we have used $\dim Y=0$ .
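From $\dim X = 0$ , for each $1\leq a\leq \alpha $ , we can also take a clopen partition
$$ \begin{align*} \pi^{-1}(A_a) = B_{a1}\cup B_{a2}\cup \cdots \cup B_{a\beta_a} \end{align*} $$
with ${\mathrm {diam}} (B_{ab}, d) < \varepsilon $ for all $1\leq b\leq \beta _a$ .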
Set $\mathscr {A} = \{A_1, \ldots , A_\alpha \}$ and $\mathscr {B} = \{B_{ab}\mid 1\leq a\leq \alpha , 1\leq b\leq \beta _a\}$ . These are clopen partitions of Y and X respectively. We have $\pi ^{-1}(\mathscr {A}) \prec \mathscr {B}$ .
Let N be a natural number. We have $\pi ^{-1}(\mathscr {A}^N) \prec \mathscr {B}^N$ . For each non-empty $A\in \mathscr {A}^N$ , we define
We have
We set
Define
Here, the sum is taken over only non-empty $A \in \mathscr {A}^N$ . When we consider below a sum over $A\in \mathscr {A}^N$ (or $B\in \mathscr {B}^N$ ), we always assume that A (or B) is not empty.
We have
So it is enough to prove that there exists $\mu \in \mathscr {M}^T(X)$ satisfying
where the limit on the right-hand side exists because $Z_N$ is sub-multiplicative in N.
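The existence of this limit is an instance of Fekete's lemma. As a sketch, assuming that the limit in question is that of $({1}/{N})\log Z_N$ and that $Z_N>0$ and $Z_{N+M}\leq Z_N Z_M$ for all natural numbers N and M, the sequence $\log Z_N$ is sub-additive, and hence
$$ \begin{align*} \log Z_{N+M} \leq \log Z_N + \log Z_M, \quad \lim_{N\to\infty}\frac{\log Z_N}{N} = \inf_{N\geq 1}\frac{\log Z_N}{N}. \end{align*} $$
In particular, under these assumptions each value $({1}/{N})\log Z_N$ bounds the limit from above.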
Let N be a natural number. For non-empty $B\in \mathscr {B}^N$ , we denote by $\mathscr {A}^N(B)$ the unique element of $\mathscr {A}^N$ containing $\pi (B)$ . For non-empty $A\in \mathscr {A}^N$ , we have $\mathscr {A}^N(B) = A$ for all $B\in \mathscr {B}^N_A$ .
For each non-empty set B in $\mathscr {B}^N$ , we take a point $x_B\in B$ satisfying $\mathbb {S}_N f(x_B) = \sup _{B} \mathbb {S}_N f$ . (Such a point exists because B is closed, hence compact, and $\mathbb {S}_N f$ is continuous.) We define a probability measure on X by
Here, $\delta _{x_B}$ is the delta probability measure at the point $x_B$ . The measure $\sigma _N$ is not invariant in general. We set
$$ \begin{align*} \mu_N = \frac{1}{N}\sum_{n=0}^{N-1} T^n_*\sigma_N. \end{align*} $$
We can take a subsequence $\{\mu _{N_k}\}$ converging to an invariant probability measure $\mu $ on X in the weak $^*$ topology. We will prove that this measure $\mu $ satisfies
Claim 6.2. For every natural number N,
Proof. We have
For each non-empty $A\in \mathscr {A}^N$ ,
Then,
For non-empty $B\in \mathscr {B}^N$ ,
Then,
We calculate the term (I) by
The term (II) is calculated by
For the term (III), we consider
Thus,
Combining this with (6.1),
Claim 6.3. Let M and N be natural numbers. We have
Here, $|\mathscr {A}|$ and $|\mathscr {B}|$ are the cardinalities of $\mathscr {A}$ and $\mathscr {B}$ respectively.
Proof. This is rather standard. (See the proof of the standard variational principle in [Reference WaltersWal82, §8.2].) Here we provide the proof for $\mathscr {B}^M$ . The case of $\mathscr {A}^M$ is the same.
From the concavity of the entropy function (Lemma 3.1(2)), for $\mu _N = ({1}/{N})\sum _{n=0}^{N-1} T^n_*\sigma _N$ ,
Let $N = qM + r$ with $0\leq r <M$ ,
We estimate $\sum _{s=0}^q H_{\sigma _N}(T^{-sM-t}\mathscr {B}^M)$ from below for each t. We have
When we fix $0\leq t\leq M-1$ and move $0\leq s\leq q$ and $0\leq m \leq M-1$ , the number $sM+t+m$ moves over
Hence,
Therefore,
Thus,
So by (6.3),
From (6.2), we conclude that
We have
Claim 6.3 implies
Because $\mu _{N_k}\to \mu $ as $k\to \infty $ , letting $N=N_k\to \infty $ ,
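Here, we have used the clopenness of the elements of $\mathscr {A}^M$ and $\mathscr {B}^M$ : the indicator function of a clopen set is continuous, so the measure of each element converges along the subsequence, and hence so does the entropy of each of these finite partitions. Finally, letting $M\to \infty $ , we get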
Now we can prove the main result (Theorem 2.1). We repeat the statement for the convenience of readers.
Theorem 6.4. (= Theorem 2.1)
Let $\pi :(X, T)\to (Y, S)$ be a factor map between dynamical systems. Then for any $0\leq w\leq 1$ and any continuous function $f:X\to \mathbb {R}$ ,
$$ \begin{align*} P^w(\pi, T, f) = P^w_{\mathrm{var}}(\pi, T, f). \end{align*} $$
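Proof. We already proved in Proposition 4.1 that
$$ \begin{align*} P^w_{\mathrm{var}}(\pi, T, f) \leq P^w(\pi, T, f). \end{align*} $$
By Corollary 5.5, there exists a factor map $\pi ^\prime :(X^\prime , T^\prime )\to (Y^\prime , S^\prime )$ between zero-dimensional dynamical systems with a continuous function $f^\prime :X^\prime \to \mathbb {R}$ such that
$$ \begin{align*} P^w(\pi,T, f) \leq P^w(\pi^\prime, T^\prime, f^\prime), \quad P^w_{\mathrm{var}}(\pi^\prime, T^\prime, f^\prime) \leq P^w_{\mathrm{var}}(\pi,T, f). \end{align*} $$
By Proposition 6.1,
$$ \begin{align*} P^w(\pi^\prime, T^\prime, f^\prime) \leq P^w_{\mathrm{var}}(\pi^\prime, T^\prime, f^\prime). \end{align*} $$
Therefore,
$$ \begin{align*} P^w(\pi,T, f) \leq P^w(\pi^\prime, T^\prime, f^\prime) \leq P^w_{\mathrm{var}}(\pi^\prime, T^\prime, f^\prime) \leq P^w_{\mathrm{var}}(\pi,T, f). \end{align*} $$
So we conclude that
$$ \begin{align*} P^w(\pi, T, f) = P^w_{\mathrm{var}}(\pi, T, f). \end{align*} $$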
Remark 6.5. The book of Downarowicz [Reference DownarowiczDow11] systematically develops the idea of using zero-dimensional dynamical systems in the study of entropy theory. The above proof is influenced by this idea. We also note that it seems difficult to use this zero-dimensional trick in the proof of Proposition 4.1 in §4, because it is difficult to prove that principal extensions preserve weighted topological pressure without using the variational principle. A similar remark is given in [Reference DownarowiczDow11, Remark 7.6.12] about the proof of the standard variational principle.
Acknowledgements
M. Tsukamoto was supported by JSPS KAKENHI JP21K03227.