
New approach to weighted topological entropy and pressure

Published online by Cambridge University Press:  28 January 2022

MASAKI TSUKAMOTO*
Affiliation:
Department of Mathematics, Kyushu University, Moto-oka 744, Nishi-ku, Fukuoka 819-0395, Japan

Abstract

Motivated by fractal geometry of self-affine carpets and sponges, Feng and Huang [J. Math. Pures Appl. 106(9) (2016), 411–452] introduced weighted topological entropy and pressure for factor maps between dynamical systems, and proved variational principles for them. We introduce a new approach to this theory. Our new definitions of weighted topological entropy and pressure are very different from the original definitions of Feng and Huang. The equivalence of the two definitions seems highly non-trivial. Their equivalence can be seen as a generalization of the dimension formula for the Bedford–McMullen carpet in purely topological terms.

Type
Original Article
Copyright
© The Author(s), 2022. Published by Cambridge University Press

1 Introduction

1.1 Weighted topological entropy and pressure

The purpose of this paper is to present a new approach to the weighted topological entropy and pressure introduced by Feng and Huang [FH16]. In this subsection, we describe their original theory. We explain our new approach in the next subsection.

We first quickly review the classical theory of entropy and pressure of dynamical systems. See the book of Walters [Wal82] for the details. A pair $(X, T)$ is called a dynamical system if X is a compact metrizable space and $T:X\to X$ is a continuous map. We denote its topological entropy by $h_{\mathrm {top}}(X,T)$ . This is a topological invariant of dynamical systems, which counts the number of bits per iterate for describing the orbits of $(X,T)$ .

One of the most basic theorems about topological entropy is the variational principle. We define $\mathscr {M}^T(X)$ as the set of invariant Borel probability measures on X. For each measure $\mu \in \mathscr {M}^T(X)$ , we denote its Kolmogorov–Sinai entropy by $h_\mu (T)$ . Then the variational principle states that [Din70, Goodm71, Goodw69]

(1.1) $$ \begin{align} h_{\mathrm{top}}(X, T) = \sup_{\mu\in \mathscr{M}^T(X)} h_\mu(T). \end{align} $$

This theory can be generalized to pressure. Let $(X, T)$ be a dynamical system with a continuous function $f:X\to \mathbb {R}$ . Motivated by statistical mechanics, Ruelle [Rue73] (in some special cases) and Walters [Wal75] (for general systems) introduced the topological pressure $P(T, f)$ and proved the variational principle

(1.2) $$ \begin{align} P(T, f) = \sup_{\mu\in \mathscr{M}^T(X)} \bigg(h_\mu(T) + \int_X f\, d\mu\bigg). \end{align} $$

The above (1.1) and (1.2) are classical and standard in ergodic theory. Recently, Feng and Huang [FH16] found an ingenious generalization of this classical theory. Motivated by fractal geometry of self-affine carpets and sponges [Bed84, KP96a, Mc84], they introduced weighted versions of entropy and pressure.

Let $(X, T)$ and $(Y, S)$ be dynamical systems. A map $\pi :X\to Y$ is called a factor map if $\pi $ is a continuous surjection with $\pi \circ T = S\circ \pi $ . We sometimes write $\pi :(X, T)\to (Y, S)$ to make the maps T and S explicit. For an invariant probability measure $\mu \in \mathscr {M}^T(X)$ , we denote by $\pi _*\mu \in \mathscr {M}^S(Y)$ the push-forward of $\mu $ by $\pi $ (this is defined by $\pi _*\mu (A) = \mu (\pi ^{-1}A)$ for Borel subsets $A\subset Y$ ). Let $f:X\to \mathbb {R}$ be a continuous function, and let $a_1, a_2$ be two real numbers with $a_1>0$ and $a_2\geq 0$ . Feng and Huang [FH16, Question 1.1] asked (and then solved) the following question.

Question 1.1. How can one define a meaningful term $P^{(a_1, a_2)}(T, f)$ such that the following variational principle holds?

$$ \begin{align*} P^{(a_1, a_2)}(T, f) = \sup_{\mu\in \mathscr{M}^T(X)} \bigg(a_1 h_\mu(T) + a_2 h_{\pi_*\mu}(S) + \int_X f\, d\mu\bigg). \end{align*} $$

We describe their approach below. It is a modification of the definition of topological entropy given by Bowen [Bow73], which is in turn a modification of the standard definition of the Hausdorff dimension.

Here we explain only the case of $f\equiv 0$ for simplicity of exposition. For the case of $f\not \equiv 0$ , see their paper [FH16, §3.1]. (They also studied the case where a sequence of factor maps $\pi _i:X_i\to X_{i+1}$ ( $i=1, 2, \ldots , k$ ) is given. We think that our new approach can also be generalized to this setting. However, we concentrate on the simplest case in this paper.)

Let d and $d^\prime $ be metrics on X and Y respectively. For $x\in X$ , a natural number n and $\varepsilon >0$ , we define $B_n^{(a_1, a_2)}(x,\varepsilon ) \subset X$ as the set of $y\in X$ satisfying the following two conditions:

$$ \begin{align*} d(T^j x, T^j y) < \varepsilon\quad (0\leq j < \lceil a_1 n \rceil); \end{align*} $$
$$ \begin{align*} d^\prime(S^j \pi(x), S^j \pi(y)) < \varepsilon\quad (0\leq j < \lceil (a_1+a_2)n \rceil). \end{align*} $$

Here $\lceil u\rceil $ denotes the least integer not less than u. We call $B_n^{(a_1, a_2)}(x,\varepsilon )$ an $(a_1,a_2)$ -weighted Bowen ball.

Let N be a natural number. We consider families of $(a_1,a_2)$ -weighted Bowen balls $\{B^{(a_1, a_2)}_{n_j}(x_j, \varepsilon )\}_{j=1}^\infty $ satisfying

(1.3) $$ \begin{align} X = \bigcup_j B^{(a_1,a_2)}_{n_j}(x_j, \varepsilon), \quad n_j \geq N \> (\text{for all } j\geq 1). \end{align} $$

Let $s\geq 0$ . We define $\Lambda ^{(a_1, a_2), s}_{N,\varepsilon }(X)$ as the infimum of

$$ \begin{align*} \sum_j \exp(-sn_j), \end{align*} $$

where the infimum is taken over all families $\{B^{(a_1, a_2)}_{n_j}(x_j, \varepsilon )\}_{j=1}^\infty $ satisfying (1.3).

The quantity $\Lambda ^{(a_1, a_2), s}_{N,\varepsilon }(X)$ is monotone in N. So we define

$$ \begin{align*} \Lambda^{(a_1, a_2), s}_\varepsilon(X) = \lim_{N\to \infty} \Lambda^{(a_1, a_2), s}_{N,\varepsilon}(X). \end{align*} $$

We vary the parameter s from $0$ to $\infty $ . There exists a unique value of s, which we denote by $h_{\mathrm {top}}^{(a_1, a_2)}(T, \varepsilon )$ , where the value of $\Lambda ^{(a_1, a_2), s}_\varepsilon (X)$ jumps from $\infty $ to $0$ :

$$ \begin{align*} \Lambda^{(a_1, a_2),s}_\varepsilon(X) = \begin{cases} 0 \quad &(s>h_{\mathrm{top}}^{(a_1, a_2)}(T, \varepsilon)), \\[3pt] \infty \quad &(s< h_{\mathrm{top}}^{(a_1, a_2)}(T, \varepsilon)). \end{cases} \end{align*} $$

Here, $h_{\mathrm {top}}^{(a_1, a_2)}(T, \varepsilon )$ is monotone in $\varepsilon $ . So we define the $(a_1, a_2)$ -weighted topological entropy of $\pi : X\to Y$ by

$$ \begin{align*} h^{(a_1, a_2)}_{\mathrm{top}}(\pi, T) = \lim_{\varepsilon\to 0} h_{\mathrm{top}}^{(a_1, a_2)}(T, \varepsilon). \end{align*} $$

Feng and Huang [FH16, Theorem 1.4 and Corollary 1.5] solved Question 1.1 by this quantity.

Theorem 1.2. (Feng and Huang, 2016)

In the above setting,

$$ \begin{align*} h^{(a_1, a_2)}_{\mathrm{top}}(\pi, T) = \sup_{\mu\in \mathscr{M}^T(X)} (a_1 h_\mu(T) + a_2 h_{\pi_*\mu}(S)). \end{align*} $$

1.2 New approach

In the previous subsection, we described the original definition of weighted topological entropy introduced by Feng and Huang [FH16]. In this subsection, we explain our new approach. Our approach is a modification of the familiar definition of topological entropy (not the Hausdorff-dimension-like definition of [Bow73]).

First of all, notice that we can assume $a_1+a_2 = 1$ in Question 1.1 because we can reduce the general case to this special case by a simple rescaling. So we study only this case. As in the previous subsection, here we explain the entropy case (that is, the case of $f\equiv 0$ ) for simplicity. We will explain the pressure case in §2.
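The rescaling can be made explicit on the right-hand side of Question 1.1. The following is only a short sketch; the symbols $c$ and $w$ are introduced here for illustration. Set $c = a_1+a_2>0$ and $w = a_1/c$ , so that $0<w\leq 1$ and $a_2/c = 1-w$ . Then for every $\mu \in \mathscr {M}^T(X)$ ,

$$ \begin{align*} a_1 h_\mu(T) + a_2 h_{\pi_*\mu}(S) + \int_X f\, d\mu = c\bigg(w h_\mu(T) + (1-w) h_{\pi_*\mu}(S) + \int_X \frac{f}{c}\, d\mu\bigg). \end{align*} $$

Taking the supremum over $\mu $ shows that the variational quantity for the weights $(a_1, a_2)$ and the function f is c times the variational quantity for the weights $(w, 1-w)$ and the function $f/c$ .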

Let $(X,T)$ and $(Y, S)$ be dynamical systems, and let $\pi :X\to Y$ be a factor map. Let d and $d^\prime $ be metrics on X and Y respectively. For a natural number N we define metrics $d_N$ and $d^\prime _N$ on X and Y respectively by

(1.4) $$ \begin{align} d_N(x_1, x_2) = \max_{0\leq n<N} d(T^n x_1, T^n x_2), \quad d^\prime_N(y_1,y_2) = \max_{0\leq n<N} d^\prime(S^n y_1, S^n y_2). \end{align} $$

For $\varepsilon>0$ and a non-empty subset $\Omega \subset X$ , we define

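(1.5) $$ \begin{align} \#(\Omega, N, \varepsilon) = \min\bigg\{n\geq 1 \,\bigg|\, \begin{array}{l} \text{there exist open subsets } U_1, \ldots, U_n \text{ of } X \text{ with } \Omega \subset U_1\cup \cdots \cup U_n \\ \text{and } {\mathrm{diam}}(U_k, d_N) < \varepsilon \text{ for all } 1\leq k \leq n \end{array}\bigg\}. \end{align} $$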

Here, ${\mathrm {diam}}(U_k, d_N) = \sup _{x_1, x_2\in U_k} d_N(x_1, x_2)$ is the diameter of $U_k$ with respect to the metric $d_N$ . When $\Omega $ is the empty set, we define $\#(\Omega , N, \varepsilon )=0$ . As is well known, the topological entropy of $(X, T)$ is defined by

$$ \begin{align*} h_{\mathrm{top}}(X,T) = \lim_{\varepsilon \to 0} \bigg(\lim_{N\to \infty} \frac{\log \#(X, N, \varepsilon)}{N}\bigg). \end{align*} $$
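To make the metric $d_N$ in (1.4) concrete, here is a minimal Python sketch. The doubling map on the circle and the helper names `circle_dist`, `doubling` and `bowen_metric` are assumptions chosen for illustration; they do not appear in the paper. Exact rational arithmetic avoids rounding problems with the doubling map.

```python
from fractions import Fraction

def circle_dist(x, y):
    """Distance on R/Z: the minimum over integers n of |x - y - n|."""
    t = (x - y) % 1
    return min(t, 1 - t)

def doubling(x):
    """Hypothetical example map T(x) = 2x mod 1 on the circle."""
    return (2 * x) % 1

def bowen_metric(x1, x2, N, T=doubling, d=circle_dist):
    """d_N(x1, x2) = max over 0 <= n < N of d(T^n x1, T^n x2), as in (1.4)."""
    best = d(x1, x2)
    for _ in range(N - 1):
        x1, x2 = T(x1), T(x2)
        best = max(best, d(x1, x2))
    return best

# Two nearby points separate under iteration, so d_N grows with N.
x, y = Fraction(0), Fraction(1, 64)
print([bowen_metric(x, y, N) for N in range(1, 7)])
```

In this example the distance doubles at each step until it reaches 1/2, which is why sets of small $d_N$ -diameter must shrink exponentially in N.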

We will modify this definition.

Let $0\leq w \leq 1$ be a real number. We set

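(1.6) $$ \begin{align} \#^w(\pi, N, \varepsilon) = \min\bigg\{\sum_{k=1}^n \big(\#(\pi^{-1}(V_k), N, \varepsilon)\big)^w \,\bigg|\, \begin{array}{l} Y = V_1\cup \cdots \cup V_n \text{ is an open cover with} \\ {\mathrm{diam}}(V_k, d^\prime_N) < \varepsilon \text{ for all } 1\leq k \leq n \end{array}\bigg\}. \end{align} $$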

It is easy to check that this quantity is sub-multiplicative in N and monotone in $\varepsilon $ . So we define the w-weighted topological entropy of $\pi :X\to Y$ by

$$ \begin{align*} h_{\mathrm{top}}^w(\pi, T) = \lim_{\varepsilon \to 0} \bigg(\lim_{N\to \infty} \frac{\log \#^w(\pi, N, \varepsilon)}{N}\bigg). \end{align*} $$

This definition uses the metrics d and $d^\prime $ , but the value of $h_{\mathrm {top}}^w(\pi , T)$ is a topological invariant (that is, independent of the choice of metrics).

The quantity $h^w_{\mathrm {top}}(\pi , T)$ provides another solution to Question 1.1 for the case of $f\equiv 0$ and $(a_1, a_2) = (w, 1-w)$ . This is our main result for the weighted topological entropy.

Theorem 1.3. (Variational principle for w-weighted topological entropy)

For $0\leq w\leq 1$ ,

$$ \begin{align*} h_{\mathrm{top}}^w(\pi, T) = \sup_{\mu\in \mathscr{M}^T(X)} \{w h_\mu(T) + (1-w)h_{\pi_*\mu}(S)\}. \end{align*} $$

As the above definition of $h_{\mathrm {top}}^w(\pi , T)$ is close to the standard definition of topological entropy, the proof of this theorem is also close to a well-known proof of the standard variational principle. The basic structure of the proof is borrowed from the famous argument of Misiurewicz [Mis76]. At some technical points, we use the theory of principal extensions [DH13, Dow11].
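As a quick consistency check of Theorem 1.3 (a remark added here for illustration), consider the endpoint weights. For $w=1$ the right-hand side is $\sup _\mu h_\mu (T)$ , which equals $h_{\mathrm {top}}(X,T)$ by (1.1). For $w=0$ it is $\sup _\mu h_{\pi _*\mu }(S)$ ; since every measure in $\mathscr {M}^S(Y)$ is the push-forward of some measure in $\mathscr {M}^T(X)$ (a standard fact for factor maps), this equals $h_{\mathrm {top}}(Y,S)$ by (1.1) applied to $(Y,S)$ . Hence

$$ \begin{align*} h_{\mathrm{top}}^1(\pi, T) = h_{\mathrm{top}}(X, T), \quad h_{\mathrm{top}}^0(\pi, T) = h_{\mathrm{top}}(Y, S), \end{align*} $$

so the w-weighted entropy interpolates between the entropy of the extension and the entropy of the factor.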

By combining Theorems 1.2 and 1.3, we get a corollary.

Corollary 1.4. For $0 \leq w \leq 1$ ,

$$ \begin{align*} h_{\mathrm{top}}^{(w, 1-w)}(\pi, T) = h_{\mathrm{top}}^w(\pi, T). \end{align*} $$

Here the left-hand side is the weighted topological entropy $h_{\mathrm {top}}^{(a_1, a_2)}(\pi , T)$ for $(a_1, a_2) = (w, 1-w)$ defined in the previous subsection.

This corollary seems to be a very interesting statement. The author cannot see any direct way to prove it (without using the variational principles).

Problem 1.5. Can one prove the equality $h_{\mathrm {top}}^{(w, 1-w)}(\pi , T) = h_{\mathrm {top}}^w(\pi , T)$ without using measure theory?

The following example illustrates the importance of the equality $h_{\mathrm {top}}^{(w, 1-w)}(\pi , T) = h_{\mathrm {top}}^w(\pi , T)$ .

Example 1.6. (Bedford–McMullen carpets)

Let $\mathbb {T} = \mathbb {R}/\mathbb {Z}$ be the circle, and let $\mathbb {T}^2 = \mathbb {R}^2/\mathbb {Z}^2$ be the torus. Let a and b be two natural numbers with $a\geq b\geq 2$ . Set $A = \{0,1,2,\ldots , a-1\}$ and $B = \{0,1,2,\ldots , b-1\}$ . Let $R\subset A\times B$ be a non-empty subset, and define

$$ \begin{align*} R^\prime = \{y\in B\mid (x, y) \in R \text{ for some } x\in A\}. \end{align*} $$

We define $X\subset \mathbb {T}^2$ and $Y\subset \mathbb {T}$ by

$$ \begin{gather*} X := \bigg\{\bigg(\sum_{n=1}^\infty \frac{x_n}{a^n}, \sum_{n=1}^\infty \frac{y_n}{b^n}\bigg) \in \mathbb{T}^2 \,\bigg|\, (x_n, y_n)\in R \text{ for all } \ n\geq 1\bigg\}, \\[2pt] Y := \bigg\{\sum_{n=1}^\infty \frac{y_n}{b^n} \in \mathbb{T}\,\bigg|\, y_n \in R^\prime \text{ for all } \ n\geq 1 \bigg\}. \end{gather*} $$

The space X is the famous Bedford–McMullen carpet [Bed84, Mc84]. We are going to explain that we can calculate the Hausdorff dimension of X (with respect to the natural metric on $\mathbb {T}^2$ ) by using Corollary 1.4.

We define continuous maps $T:X\to X$ and $S:Y\to Y$ by

$$ \begin{align*} T(x, y) = (ax, by), \quad S(y) = by. \end{align*} $$

Here, $(X, T)$ and $(Y, S)$ are dynamical systems. Let $\pi :X\to Y$ be the natural projection. Then $\pi $ is a factor map between $(X, T)$ and $(Y, S)$ . We are interested in its weighted topological entropy. Set

$$ \begin{align*} w = \frac{\log b}{\log a} = \log_a b. \end{align*} $$

We have $0\leq w \leq 1$ . It directly follows from the definitions (the $(a_1, a_2)$ -weighted Bowen ball $B_n^{(a_1, a_2)}(x,\varepsilon )$ for $a_1 = \log _a b$ and $a_2=1-\log _a b$ is approximately a square of side length $\varepsilon b^{-n}$ ) in §1.1 that the Hausdorff dimension of X is given by

$$ \begin{align*} \dim_H X = \frac{h_{\mathrm{top}}^{(w, 1-w)}(\pi, T)}{\log b}. \end{align*} $$
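The claim that the weighted Bowen ball is approximately a square can be checked numerically. The sketch below is illustrative only: the function name `weighted_ball_sides` is hypothetical, and the factor $\varepsilon $ is dropped. With $a_1 = w = \log _a b$ and $a_1+a_2 = 1$ , the horizontal scale is $a^{-\lceil wn\rceil }$ and the vertical scale is $b^{-n}$ ; since $wn \leq \lceil wn\rceil < wn+1$ , their ratio lies in $(1/a, 1]$ .

```python
import math

def weighted_ball_sides(a, b, n):
    """Approximate side lengths (up to the factor epsilon) of the
    (w, 1-w)-weighted Bowen ball of order n, where w = log b / log a."""
    w = math.log(b) / math.log(a)
    steps_x = math.ceil(w * n)  # ceil(a_1 n) iterates of T contract x by a each
    steps_y = n                 # ceil((a_1 + a_2) n) = n iterates of S
    return a ** (-steps_x), b ** (-steps_y)

# Illustrative choice a = 3, b = 2: the ratio of the two sides stays bounded.
for n in (1, 5, 10, 20):
    sx, sy = weighted_ball_sides(3, 2, n)
    print(n, sx / sy)
```

The bounded ratio is what allows covers by weighted Bowen balls to be compared with covers by squares of side length about $b^{-n}$ in the definition of the Hausdorff dimension.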

From the equality $h_{\mathrm {top}}^{(w, 1-w)}(\pi , T) = h_{\mathrm {top}}^w(\pi , T)$ in Corollary 1.4, we also have

(1.7) $$ \begin{align} \dim_H X = \frac{h_{\mathrm{top}}^w(\pi, T)}{\log b}. \end{align} $$

Now we calculate the w-weighted topological entropy $h_{\mathrm {top}}^{w}(\pi , T)$ .

Claim 1.7. For each $y\in B$ , we define $t(y)$ as the number of $x\in A$ satisfying $(x, y)\in R$ . Then

$$ \begin{align*} h_{\mathrm{top}}^{w}(\pi, T) = \log\bigg(\sum_{y\in R^\prime} t(y)^w\bigg). \end{align*} $$

Proof. First notice that in the definitions (1.5) and (1.6), we can use closed covers instead of open covers; this does not change their values. Here we will consider closed covers.

We define a metric $d^\prime $ on $\mathbb {T}$ by

$$ \begin{align*} d^\prime(x_1, x_2) = \min_{n\in \mathbb{Z}} |x_1-x_2-n|. \end{align*} $$

We define a metric d on $\mathbb {T}^2$ by

$$ \begin{align*} d((x_1, y_1), (x_2, y_2)) = \max(d^\prime(x_1, x_2), d^\prime(y_1, y_2)). \end{align*} $$

Let $\varepsilon>0$ and take a natural number m with $b^{-m} < \varepsilon $ . Let N be a natural number. For each $v\in (R^\prime )^{N+m}$ , set

$$ \begin{align*} V_v = \bigg\{\sum_{n=1}^\infty \frac{y_n}{b^n} \in Y\bigg| (y_1, \ldots, y_{N+m}) = v \bigg\}. \end{align*} $$

These form a closed covering of Y with ${\mathrm {diam}}(V_v, d^\prime _N) < \varepsilon $ . For each $(u, v)\in R^{N+m} \subset A^{N+m}\times B^{N+m}$ (where $u\in A^{N+m}$ and $v\in (R^\prime )^{N+m}$ ), we set

$$ \begin{align*} U_{(u, v)} = \bigg\{\bigg(\sum_{n=1}^\infty \frac{x_n}{a^n}, \sum_{n=1}^\infty \frac{y_n}{b^n}\bigg) \in X\bigg| (x_1, \ldots, x_{N+m}) = u, \, (y_1, \ldots, y_{N+m}) = v\bigg\}. \end{align*} $$

These are closed subsets of X with ${\mathrm {diam}}(U_{(u,v)}, d_N) < \varepsilon $ and

$$ \begin{align*} \pi^{-1}(V_v) = \bigcup_{\substack{u\in A^{N+m} \\[2pt] \text{with } {(u,v)\in R^{N+m}}}} U_{(u, v)}. \end{align*} $$

Hence, for $v=(v_1, \ldots , v_{N+m}) \in (R^\prime )^{N+m}$ ,

$$ \begin{align*} \#(\pi^{-1}(V_v), N, \varepsilon) \leq t(v_1)\cdots t(v_{N+m}). \end{align*} $$

Therefore,

$$ \begin{align*} \#^w(\pi, N, \varepsilon) \leq \sum_{v_1, \ldots, v_{N+m}\in R^\prime} (t(v_1)\cdots t(v_{N+m}))^{w} = \bigg(\sum_{v\in R^\prime} t(v)^w\bigg)^{N+m}. \end{align*} $$

Thus,

$$ \begin{align*} h_{\mathrm{top}}^w(\pi, T) = \lim_{\varepsilon \to 0} \bigg(\lim_{N\to \infty} \frac{\log \#^w(\pi, N, \varepsilon)}{N}\bigg) \leq \log\bigg(\sum_{y\in R^\prime} t(y)^w\bigg). \end{align*} $$

Next, let $0<\varepsilon <{1}/{a}$ . Fix $(p, q) \in R$ . For a natural number N, we consider the following points in Y:

(1.8) $$ \begin{align} \sum_{n=1}^N \frac{v_n}{b^n} + \sum_{n=N+1}^\infty \frac{q}{b^n} \quad (v_1, \ldots, v_N\in R^\prime). \end{align} $$

These points form an $\varepsilon $ -separated set in Y with respect to the metric $d^\prime _N$ . We also consider the following points in X:

(1.9) $$ \begin{align} \bigg(\sum_{n=1}^N \frac{u_n}{a^n}, \sum_{n=1}^N \frac{v_n}{b^n} \bigg) + \sum_{n=N+1}^\infty \bigg(\frac{p}{a^n}, \frac{q}{b^n}\bigg) \quad ((u_1, v_1), \ldots, (u_N, v_N)\in R). \end{align} $$

These points form an $\varepsilon $ -separated set in X with respect to the metric $d_N$ .

Suppose $Y=V_1\cup \cdots \cup V_n$ is a covering with ${\mathrm {diam}}(V_k, d^\prime _N) < \varepsilon $ . Then each $V_k$ contains at most one point of (1.8). If $V_k$ contains a point $\sum _{n=1}^N ({v_n}/{b^n}) + \sum _{n=N+1}^\infty ({q}/{b^n})$ , then $\pi ^{-1}(V_k)$ contains $t(v_1)\cdots t(v_N)$ points of the form (1.9) and hence

$$ \begin{align*} \#(\pi^{-1}(V_k), N, \varepsilon) \geq t(v_1) \cdots t(v_N). \end{align*} $$

So

$$ \begin{align*} \#^w(\pi, N, \varepsilon) \geq \sum_{v_1, \ldots, v_{N}\in R^\prime} (t(v_1)\cdots t(v_{N}))^{w} = \bigg(\sum_{v\in R^\prime} t(v)^w\bigg)^{N}. \end{align*} $$

This shows

$$ \begin{align*} h_{\mathrm{top}}^w(\pi, T) = \lim_{\varepsilon \to 0} \bigg(\lim_{N\to \infty} \frac{\log \#^w(\pi, N, \varepsilon)}{N}\bigg) \geq \log\bigg(\sum_{y\in R^\prime} t(y)^w\bigg). \end{align*} $$

Notice that this proof of the claim is completely elementary. We have not used any sophisticated technique (in particular, measure theory).
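Both bounds in the proof rest on the factorization of the sum over words $v\in (R^\prime )^N$ into an N-th power. A small brute-force check of that identity is sketched below; the function names and the sample set `R` are hypothetical choices for illustration, with `R` given as a set of pairs (x, y).

```python
import itertools
import math

def fiber_counts(R):
    """t(y) = number of x with (x, y) in R, for each y in R'."""
    t = {}
    for _x, y in set(R):
        t[y] = t.get(y, 0) + 1
    return t

def word_sum(R, w, N):
    """Sum over words (v_1, ..., v_N) in (R')^N of (t(v_1) ... t(v_N))^w."""
    t = fiber_counts(R)
    return sum(math.prod(t[y] for y in word) ** w
               for word in itertools.product(sorted(t), repeat=N))

def closed_form(R, w, N):
    """(sum over y in R' of t(y)^w)^N."""
    t = fiber_counts(R)
    return sum(c ** w for c in t.values()) ** N

# Illustrative data: a = 3, b = 2, so w = log 2 / log 3.
R = {(0, 0), (1, 0), (2, 1)}
w = math.log(2) / math.log(3)
print(word_sum(R, w, 4), closed_form(R, w, 4))
```

The two printed values agree up to floating-point rounding, reflecting that $(t(v_1)\cdots t(v_N))^w = t(v_1)^w\cdots t(v_N)^w$ .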

From (1.7) and Claim 1.7,

(1.10) $$ \begin{align} \dim_H X = \frac{\log(\sum_{y\in R^\prime} t(y)^w)}{\log b} = \log_b \bigg(\sum_{y\in R^\prime} t(y)^{\log_a b}\bigg). \end{align} $$

This is a famous formula for the Hausdorff dimension of the Bedford–McMullen carpet [Bed84, Mc84]. Therefore, we conclude that the equality $h_{\mathrm {top}}^{(w,1-w)}(\pi , T) = h_{\mathrm {top}}^w(\pi , T)$ provides this famous formula fairly easily. This suggests that the equality $h_{\mathrm {top}}^{(w, 1-w)}(\pi , T) = h_{\mathrm {top}}^w(\pi , T)$ is a rather deep statement. We can say that it is a topological generalization of the dimension formula for the Bedford–McMullen carpet.
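Formula (1.10) is easy to evaluate numerically. The following Python sketch is an illustration under the conventions of Example 1.6; the function name `carpet_dimension` and the sample digit sets are assumptions, not taken from the paper.

```python
import math

def carpet_dimension(a, b, R):
    """Evaluate (1.10): log_b( sum over y in R' of t(y)^(log_a b) ),
    where R is a set of pairs (x, y) with 0 <= x < a and 0 <= y < b."""
    w = math.log(b) / math.log(a)
    t = {}
    for _x, y in set(R):
        t[y] = t.get(y, 0) + 1  # t(y) = number of x with (x, y) in R
    return math.log(sum(c ** w for c in t.values())) / math.log(b)

a, b = 3, 2
full = {(x, y) for x in range(a) for y in range(b)}   # X is the whole torus
row = {(x, 0) for x in range(a)}                      # X is a horizontal circle
sample = {(0, 0), (2, 0), (1, 1)}                     # a genuine carpet

for name, R in (("full", full), ("row", row), ("sample", sample)):
    print(name, carpet_dimension(a, b, R))
```

The first two cases are sanity checks: the formula should return the dimension 2 for the full torus and 1 for a single row, up to rounding. For the third set the value is $\log _2(1 + 2^{\log _3 2})$ , roughly 1.35.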

Kenyon–Peres [KP96b, Theorems 1.1 and 3.2] generalized the formula (1.10) to closed T-invariant subsets of $\mathbb {T}^2$ which correspond to subshifts of finite type or sofic subshifts under the natural Markov partition. We can also prove their results from the equality $h_{\mathrm {top}}^{(w,1-w)}(\pi , T) = h_{\mathrm {top}}^w(\pi , T)$ as in the above.

The above example also illustrates that the two notions $h_{\mathrm {top}}^{(a_1, a_2)}(\pi , T)$ and $h_{\mathrm {top}}^w(\pi , T)$ have their own advantages. One of the great advantages of $h_{\mathrm {top}}^{(a_1, a_2)}(\pi , T)$ is that its definition is intrinsically related to the Hausdorff dimension. So it can be directly applied to the study of geometric measure theory. The advantage of $h_{\mathrm {top}}^w(\pi , T)$ is that its definition is elementary and hence (sometimes) easy to calculate.

In [FH16, p. 441], Feng and Huang asked how to generalize their result to $\mathbb {Z}^d$ -actions. It seems rather straightforward to generalize our new approach to $\mathbb {Z}^d$ -actions and, possibly, actions of amenable groups.

Problem 1.8. Suppose that both $h_{\mathrm {top}}^{(a_1, a_2)}(\pi , T)$ and $h_{\mathrm {top}}^w(\pi , T)$ are generalized to group actions. Can one deduce any interesting consequence of their coincidence (like the above calculation of the Hausdorff dimension of the Bedford–McMullen carpet)?

We would like to mention the papers of Barral and Feng [BF09, BF12] and Feng [Fen11] (see also Yayama [Ya11a, Ya11b]). These papers studied Question 1.1 and related questions when $(X, T)$ and $(Y, S)$ are subshifts over finite alphabets. When $(X, T)$ and $(Y, S)$ are subshifts, the above definition of $h_{\mathrm {top}}^w(\pi , T)$ (and its pressure version in §2) is essentially the same as that given in [BF09, Theorem 1.1] (see also [BF12, Theorem 3.1]). So we can say that the above definition generalizes the approach in [BF09, Theorem 1.1] from subshifts to general dynamical systems.

This paper studies only the abstract theory of $h_{\mathrm {top}}^w(\pi , T)$ and its pressure version. However, the main motivation for the author to introduce these quantities is not to develop an abstract theory. The author naturally came up with the above definition of $h_{\mathrm {top}}^w(\pi , T)$ when he studied the mean Hausdorff dimension of certain infinite dimensional fractals. (The mean Hausdorff dimension is a dynamical version of the Hausdorff dimension introduced in [LT19].) We plan to describe this connection in a separate paper.

2 Weighted topological pressure

In this section, we introduce our new definition of weighted topological pressure. For the original approach, see [FH16, §3.1].

Let $\pi :X\to Y$ be a factor map from a dynamical system $(X, T)$ to a dynamical system $(Y, S)$ . Let $f:X\to \mathbb {R}$ be a continuous function.

Let d and $d^\prime $ be metrics on X and Y respectively. For a natural number N, we define new metrics $d_N$ and $d^\prime _N$ on X and Y respectively by (1.4). We also define a continuous function $\mathbb {S}_N f:X\to \mathbb {R}$ by

$$ \begin{align*} \mathbb{S}_N f(x) = f(x) + f(Tx) + f(T^2 x) + \cdots + f(T^{N-1}x). \end{align*} $$

The metrics $d_N, d^\prime _N$ , and function $\mathbb {S}_N f$ are sometimes denoted by $d^T_N, (d^\prime )^S_N$ , and $\mathbb {S}^T_N f$ respectively for clarifying the underlying dynamics.
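The Birkhoff sum $\mathbb {S}_N f$ and its behaviour under passing from T to $T^m$ can be sketched in a few lines of Python. The doubling map, the test function and the helper names below are illustrative assumptions. The script checks the identity $\mathbb {S}^{T^m}_N (\mathbb {S}^T_m f) = \mathbb {S}^T_{mN} f$ , which is used in the proof of Lemma 2.2, at one sample point with exact rational arithmetic.

```python
from fractions import Fraction

def birkhoff_sum(f, T, N, x):
    """S_N f(x) = f(x) + f(Tx) + ... + f(T^(N-1) x)."""
    total = 0
    for _ in range(N):
        total += f(x)
        x = T(x)
    return total

def iterate(T, m):
    """Return the m-fold composition T^m as a function."""
    def Tm(x):
        for _ in range(m):
            x = T(x)
        return x
    return Tm

# Hypothetical example data: doubling map on the circle and f(x) = x^2.
T = lambda x: (2 * x) % 1
f = lambda x: x * x

m, N, x0 = 3, 4, Fraction(1, 7)
block = lambda x: birkhoff_sum(f, T, m, x)        # the function S^T_m f
lhs = birkhoff_sum(block, iterate(T, m), N, x0)   # S^{T^m}_N (S^T_m f)(x0)
rhs = birkhoff_sum(f, T, m * N, x0)               # S^T_{mN} f(x0)
print(lhs == rhs)
```

The identity simply regroups the $mN$ terms of the sum into N consecutive blocks of length m.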

For $\varepsilon>0$ and a non-empty subset $\Omega \subset X$ , we define

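(2.1) $$ \begin{align} P(\Omega, f, N, \varepsilon) = \inf\bigg\{\sum_{k=1}^n \exp\Big(\sup_{U_k} \mathbb{S}_N f\Big) \,\bigg|\, \begin{array}{l} U_1, \ldots, U_n \text{ are open subsets of } X \text{ with } \Omega \subset U_1\cup \cdots \cup U_n \\ \text{and } {\mathrm{diam}}(U_k, d_N) < \varepsilon \text{ for all } 1\leq k \leq n \end{array}\bigg\}. \end{align} $$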

(When $U_k$ is the empty set, we assume that the term $\exp (\sup _{U_k} \mathbb {S}_N f)$ is zero.) We sometimes denote $P(\Omega , f, N, \varepsilon )$ by $P_T(\Omega , f, N, \varepsilon )$ for clarifying the map T. When $\Omega $ is the empty set, we define $P(\Omega , f, N, \varepsilon ) = 0$ . It is well known that the topological pressure of $(X, T, f)$ is given by

$$ \begin{align*} P(T, f) = \lim_{\varepsilon\to 0} \bigg(\lim_{N\to \infty} \frac{\log P(X, f, N, \varepsilon)}{N}\bigg). \end{align*} $$

We will modify this definition. Let $0\leq w \leq 1$ be a real number. We set

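(2.2) $$ \begin{align} P^w(\pi, f, N, \varepsilon) = \inf\bigg\{\sum_{k=1}^n \big(P(\pi^{-1}(V_k), f, N, \varepsilon)\big)^w \,\bigg|\, \begin{array}{l} Y = V_1\cup \cdots \cup V_n \text{ is an open cover with} \\ {\mathrm{diam}}(V_k, d^\prime_N) < \varepsilon \text{ for all } 1\leq k \leq n \end{array}\bigg\}. \end{align} $$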

We sometimes denote this by $P^w_T(\pi , f, N, \varepsilon )$ .

The quantity $P^w(\pi , f, N,\varepsilon )$ is sub-multiplicative in N and monotone in $\varepsilon $ . So we define the w-weighted topological pressure by

$$ \begin{align*} P^w(\pi, T, f) = \lim_{\varepsilon\to 0} \bigg(\lim_{N\to \infty} \frac{\log P^w(\pi, f, N,\varepsilon)}{N}\bigg). \end{align*} $$

The value of $P^w(\pi , T, f)$ is independent of the choices of the metrics d and $d^\prime $ . So it provides a topological invariant. We sometimes use the notation $P^w(\pi , X, T, Y, S, f)$ instead of $P^w(\pi , T, f)$ for clarifying all the data involved.

Now we state our main result of the paper.

Theorem 2.1. (Variational principle for w-weighted topological pressure)

For any $0\leq w\leq 1$ ,

$$ \begin{align*} P^w(\pi, T, f) = \sup_{\mu\in \mathscr{M}^T(X)} \bigg(w h_\mu(T) + (1-w)h_{\pi_*\mu}(S) + w\int_X f\, d\mu\bigg). \end{align*} $$

When $f\equiv 0$ , we have $P^w(\pi , T, f) = h_{\mathrm {top}}^w(\pi , T)$ . So Theorem 1.3 in §1.2 follows from Theorem 2.1. The proof of Theorem 2.1 occupies the rest of the paper.

For the simplicity of the notation, we write

(2.3) $$ \begin{align} P^w_{\mathrm{var}}(\pi, T, f) := \sup_{\mu\in \mathscr{M}^T(X)} \bigg(w h_\mu(T) + (1-w)h_{\pi_*\mu}(S) + w\int_X f\, d\mu\bigg). \end{align} $$

(Here var is the abbreviation of variational.) Then our main purpose is to prove the equality

$$ \begin{align*} P^w(\pi, T, f) = P^w_{\mathrm{var}}(\pi, T, f). \end{align*} $$

In the rest of this section, we gather some elementary properties of w-weighted topological pressure. Here we always assume that $\pi :(X, T)\to (Y, S)$ is a factor map between dynamical systems with a continuous function $f:X\to \mathbb {R}$ . We take $0\leq w\leq 1$ . Let d and $d^\prime $ be metrics on X and Y respectively.

Lemma 2.2. Let m be a natural number.

$$ \begin{align*} P^w(\pi, T^m, \mathbb{S}_m f) = m P^w(\pi, T, f). \end{align*} $$

Here the left-hand side is $P^w(\pi , X,T^m,Y,S^m, \mathbb {S}^T_m f)$ .

Proof. Let $\varepsilon $ be a positive number. There exists $0<\delta <\varepsilon $ such that

$$ \begin{align*} d(x_1, x_2) < \delta \Longrightarrow d^T_m(x_1, x_2) < \varepsilon \quad (x_1, x_2\in X), \end{align*} $$
$$ \begin{align*} d^\prime(y_1, y_2) < \delta \Longrightarrow (d^\prime)^S_m(y_1,y_2) < \varepsilon \quad (y_1, y_2\in Y). \end{align*} $$

Then for any natural number N,

$$ \begin{align*} d^{T^m}_N(x_1, x_2) < \delta \Longrightarrow d^T_{mN}(x_1, x_2) < \varepsilon \quad (x_1, x_2\in X), \end{align*} $$
$$ \begin{align*} (d^\prime)^{S^m}_N(y_1,y_2) < \delta \Longrightarrow (d^\prime)^S_{mN}(y_1,y_2) < \varepsilon \quad (y_1, y_2\in Y). \end{align*} $$

Because $\mathbb {S}^{T^m}_N (\mathbb {S}^T_m f) = \mathbb {S}^T_{mN} f$ , for any subset $\Omega \subset X$ ,

$$ \begin{align*} P_{T^m}(\Omega, \mathbb{S}^T_m f, N,\varepsilon) \leq P_T(\Omega, f, mN, \varepsilon) \leq P_{T^m}(\Omega, \mathbb{S}^T_m f, N, \delta). \end{align*} $$

Then,

$$ \begin{align*} P^w_{T^m}(\pi, \mathbb{S}^T_m f, N,\varepsilon) \leq P^w_T(\pi, f, mN, \varepsilon) \leq P^w_{T^m}(\pi, \mathbb{S}^T_m f, N, \delta ). \end{align*} $$

Thus,

$$ \begin{align*} P^w(\pi, T^m, \mathbb{S}^T_m f) = m P^w(\pi, T, f). \end{align*} $$

Lemma 2.3. Let $(X^\prime , T^\prime )$ be a dynamical system, and let $\varphi : (X^\prime , T^\prime ) \to (X, T)$ be a factor map.

Then

$$ \begin{align*} P^w(\pi, T, f) \leq P^w(\pi\circ \varphi, T^\prime, f\circ \varphi). \end{align*} $$

Here the right-hand side is $P^w(\pi \circ \varphi , X^\prime , T^\prime , Y, S, f\circ \varphi )$ .

Proof. Let $\tilde {d}$ be a metric on $X^\prime $ . For any $\varepsilon>0$ there exists $0<\delta <\varepsilon $ satisfying

$$ \begin{align*} \tilde{d}(x_1, x_2) < \delta \Longrightarrow d(\varphi(x_1), \varphi(x_2)) < \varepsilon. \end{align*} $$

Then for any $N>0$

$$ \begin{align*} \tilde{d}_N(x_1, x_2) < \delta \Longrightarrow d_N(\varphi(x_1), \varphi(x_2)) < \varepsilon. \end{align*} $$

From this, we have for any $\Omega \subset X^\prime $

$$ \begin{align*} P_T(\varphi(\Omega), f, N, \varepsilon) \leq P_{T^\prime}(\Omega, f\circ \varphi, N, \delta). \end{align*} $$

For any $V\subset Y$

$$ \begin{align*} \varphi((\pi\circ \varphi)^{-1}(V)) = \pi^{-1}(V). \end{align*} $$

So

$$ \begin{align*} P_T(\pi^{-1}(V), f, N, \varepsilon) \leq P_{T^\prime}((\pi\circ \varphi)^{-1}(V), f \circ \varphi, N, \delta). \end{align*} $$

Then

$$ \begin{align*} P^w_T(\pi, f, N,\varepsilon) \leq P^w_{T^\prime}(\pi\circ \varphi, f\circ \varphi, N, \delta ). \end{align*} $$

Therefore

$$ \begin{align*} P^w(\pi, T, f) \leq P^w(\pi\circ \varphi, T^\prime, f\circ \varphi). \end{align*} $$

The next lemma is a bit complicated. It might be better for some readers to look at Remark 2.5 below before reading the lemma. It will provide a clearer perspective.

Lemma 2.4. Let $(Y^\prime , S^\prime )$ be a dynamical system and let $\phi :(Y^\prime , S^\prime )\to (Y,S)$ be a factor map. Define the fiber product,

$$ \begin{align*} X\times_Y Y^\prime = \{(x, y)\in X\times Y^\prime\mid \pi(x) = \phi(y)\}. \end{align*} $$

Now, $(X\times _Y Y^\prime , T\times S^\prime )$ becomes a dynamical system. We define factor maps $\varphi : X\times _Y Y^\prime \to X$ and $\Pi :X\times _Y Y^\prime \to Y^\prime $ by

$$ \begin{align*} \varphi(x, y) = x, \quad \Pi(x,y) = y. \end{align*} $$

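The diagram is as follows:

$$ \begin{array}{ccc} X\times_Y Y^\prime & \xrightarrow{\ \varphi\ } & X \\ {\scriptstyle \Pi}\downarrow & & \downarrow{\scriptstyle \pi} \\ Y^\prime & \xrightarrow{\ \phi\ } & Y \end{array} $$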

Then,

$$ \begin{align*} P^w(\pi, T, f) \leq P^w(\Pi, T\times S^\prime, f\circ \varphi). \end{align*} $$

Here the right-hand side is $P^w(\Pi , X\times _Y Y^\prime , T\times S^\prime , Y^\prime , S^\prime , f\circ \varphi )$ .

Proof. The point of the proof is that for any subset $A\subset Y^\prime $ , we have

$$ \begin{align*} \pi^{-1}(\phi(A)) = \varphi(\Pi^{-1}(A)). \end{align*} $$

Let $\tilde {d}$ be a metric on $Y^\prime $ and we define a metric $\rho $ on $X\times _Y Y^\prime $ by

$$ \begin{align*} \rho((x_1, y_1), (x_2, y_2)) = \max(d(x_1,x_2), \tilde{d}(y_1,y_2)). \end{align*} $$

Let $\varepsilon $ be a positive number. We have

$$ \begin{align*} \rho((x_1, y_1), (x_2, y_2)) < \varepsilon \Longrightarrow d(x_1, x_2) < \varepsilon. \end{align*} $$

Then for any natural number N and any subset $\Omega \subset X\times _Y Y^\prime $ ,

$$ \begin{align*} P_T(\varphi(\Omega), f, N, \varepsilon) \leq P_{T\times S^\prime}(\Omega, f\circ \varphi, N, \varepsilon). \end{align*} $$

In particular, for any subset $A \subset Y^\prime $ ,

(2.4) $$ \begin{align} P_T(\pi^{-1}(\phi(A)), f, N, \varepsilon) & = P_T(\varphi(\Pi^{-1}(A)), f, N, \varepsilon) \nonumber\\[2pt] & \leq P_{T\times S^\prime}(\Pi^{-1}(A), f\circ \varphi, N, \varepsilon). \end{align} $$

There exists $0<\delta <\varepsilon $ such that

$$ \begin{align*} \tilde{d}(y_1, y_2) < \delta \Longrightarrow d^\prime(\phi(y_1), \phi(y_2)) < \varepsilon. \end{align*} $$

Now we claim that

$$ \begin{align*} P^w_T(\pi, f, N, \varepsilon) \leq P^w_{T\times S^\prime}(\Pi, f\circ \varphi, N,\delta). \end{align*} $$

Indeed, take any positive number C with

$$ \begin{align*} P^w_{T\times S^\prime}(\Pi, f\circ \varphi, N,\delta) < C. \end{align*} $$

Then there exists an open covering $Y^\prime = V_1\cup \cdots \cup V_n$ such that ${\mathrm {diam}}(V_k, \tilde {d}_N) < \delta $ for all $1\leq k \leq n$ and

$$ \begin{align*} \sum_{k=1}^n (P_{T\times S^\prime}(\Pi^{-1}(V_k), f\circ \varphi, N, \delta ))^w < C. \end{align*} $$

We can find compact subsets $A_k \subset V_k$ satisfying $Y^\prime = A_1\cup \cdots \cup A_n$ . We have

$$ \begin{align*} \begin{split} \sum_{k=1}^n (P_T(\pi^{-1}(\phi(A_k)), f, N, \varepsilon))^w & \leq \sum_{k=1}^n (P_{T\times S^\prime}(\Pi^{-1}(A_k), f\circ \varphi, N, \varepsilon ))^w \quad \text{by } (2.4) \\[2pt] & \leq \sum_{k=1}^n (P_{T\times S^\prime}(\Pi^{-1}(A_k), f\circ \varphi, N, \delta ))^w \quad \text{by } \delta<\varepsilon \\[2pt] &\leq \sum_{k=1}^n (P_{T\times S^\prime}(\Pi^{-1}(V_k), f\circ \varphi, N, \delta ))^w \quad \text{by } A_k\subset V_k \\[2pt] & < C. \end{split} \end{align*} $$

Each $\phi (A_k)$ is a closed subset of Y with ${\mathrm {diam}}(\phi (A_k), d^\prime _N) < \varepsilon $ . By the definition (2.1), there exist open subsets $W_k \supset \phi (A_k)$ of Y for $1\leq k\leq n$ such that ${\mathrm {diam}}(W_k, d^\prime _N) < \varepsilon $ and

$$ \begin{align*} \sum_{k=1}^n (P(\pi^{-1}(W_k), f, N, \varepsilon))^w < C. \end{align*} $$

Noticing $Y = W_1\cup \cdots \cup W_n$ , we have

$$ \begin{align*} P^w_T(\pi, f, N, \varepsilon) < C. \end{align*} $$

Because C is an arbitrary number larger than $P^w_{T\times S^\prime }(\Pi , f\circ \varphi , N,\delta )$ , this shows

$$ \begin{align*} P^w_T(\pi, f, N, \varepsilon) \leq P^w_{T\times S^\prime}(\Pi, f\circ \varphi, N,\delta). \end{align*} $$

Thus, we conclude

$$ \begin{align*} P^w(\pi, T, f) \leq P^w(\Pi, T\times S^\prime, f\circ \varphi). \end{align*} $$

Remark 2.5. Let $(X^\prime , T^\prime )$ and $(Y^\prime , S^\prime )$ be dynamical systems, and let $\pi ^\prime :X^\prime \to Y^\prime $ be a factor map. Suppose there exist factor maps $\varphi :(X^\prime , T^\prime ) \to (X, T)$ and $\phi :(Y^\prime , S^\prime )\to (Y, S)$ satisfying $\pi \circ \varphi = \phi \circ \pi ^\prime $ .

Then,

(2.5) $$ \begin{align} P^w(\pi, T, f) \leq P^w(\pi^\prime, T^\prime, f\circ \varphi). \end{align} $$

Here the right-hand side is $P^w(\pi ^\prime , X^\prime , T^\prime , Y^\prime , S^\prime , f\circ \varphi )$ . Lemmas 2.3 and 2.4 are special cases of this statement. We can prove (2.5) by using the variational principle (Theorem 2.1). However, it seems difficult to prove it in an elementary way. We will not use (2.5) in the paper.

Finally, we mention two basic results from calculus, which underpin many arguments in this paper.

Lemma 2.6.

  1. (1) For $0\leq w \leq 1$ and non-negative numbers $x, y$ ,

    $$ \begin{align*} (x+y)^w \leq x^w + y^w. \end{align*} $$
  2. (2) Let $p_1, \ldots , p_n$ be non-negative numbers with $p_1+\cdots +p_n = 1$ . For any real numbers $x_1, \ldots , x_n$ ,

    $$ \begin{align*} \sum_{i=1}^n (-p_i \log p_i + p_i x_i) \leq \log \sum_{i=1}^n e^{x_i}. \end{align*} $$
    In particular (letting $x_1 = \cdots = x_n = 0$ ),
    $$ \begin{align*} - \sum_{i=1}^n p_i \log p_i \leq \log n. \end{align*} $$

Proof. (1) is completely elementary. (2) is proved in [Wal82, §9.3, Lemma 9.9].
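Both inequalities of Lemma 2.6 are easy to test numerically. The following is a minimal sketch, not part of the original argument; the function names are hypothetical. It also illustrates the standard fact that equality holds in (2) when $p_i$ is proportional to $e^{x_i}$.

```python
import math


def power_subadditive_gap(x, y, w):
    """Return x**w + y**w - (x + y)**w, which is >= 0 by Lemma 2.6(1)."""
    return x**w + y**w - (x + y) ** w


def gibbs_gap(p, x):
    """Return log(sum_i e^{x_i}) - sum_i (-p_i log p_i + p_i x_i).

    This is >= 0 by Lemma 2.6(2). We use the convention 0 log 0 = 0.
    """
    lhs = sum((-pi * math.log(pi) if pi > 0 else 0.0) + pi * xi
              for pi, xi in zip(p, x))
    rhs = math.log(sum(math.exp(xi) for xi in x))
    return rhs - lhs


def gibbs_weights(x):
    """Weights p_i proportional to e^{x_i}; for these the gap vanishes."""
    z = sum(math.exp(xi) for xi in x)
    return [math.exp(xi) / z for xi in x]
```

For instance, `gibbs_gap([0.2, 0.3, 0.5], [1.0, -2.0, 0.5])` is positive, whereas `gibbs_gap(gibbs_weights(x), x)` vanishes up to rounding for any list `x`.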

3 Kolmogorov–Sinai entropy

In this section, we review basic definitions concerning Kolmogorov–Sinai entropy. For the details, see the book of Walters [Wal82].

Let $(X, \mu )$ be a probability measure space, namely X is a set equipped with a $\sigma $ -algebra and $\mu $ is a probability measure defined on it. In our later applications, X is always a compact metrizable space with the standard Borel $\sigma $ -algebra.

Let $\mathscr {A} = \{A_1, A_2, \ldots , A_n\}$ be a finite measurable partition of X, namely each $A_i$ is a measurable subset of X and

$$ \begin{align*} X = \bigcup_{i=1}^n A_i, \quad A_i \cap A_j = \emptyset \quad (i\neq j). \end{align*} $$

We define the Shannon entropy of $\mathscr {A}\ {}$ by

$$ \begin{align*} H_\mu(\mathscr{A}{\kern2pt}) = -\sum_{i=1}^n \mu(A_i) \log \mu(A_i), \end{align*} $$

where we assume $0 \log 0 = 0$ .

For another finite measurable partition $\mathscr{A}{\kern2pt}^\prime = \{A^\prime _1, A^\prime _2, \ldots , A^\prime _m\}$ , we set

$$ \begin{align*} \mathscr{A} \vee \mathscr{A}^\prime = \{A_i \cap A^\prime_j \mid 1\leq i \leq n, 1\leq j\leq m\}. \end{align*} $$

This is a finite measurable partition of X. We define the conditional entropy by

$$ \begin{align*} H_\mu(\mathscr{A}\mid \mathscr{A}{\kern2pt}^\prime) = -\sum_{\substack{1\leq j\leq m \\[2pt] \text{with } {\mu(A^\prime_j)> 0}}} \mu(A_j^\prime) \bigg\{\sum_{i=1}^n \frac{\mu(A_i \cap A^\prime_j)}{\mu(A^\prime_j)} \log \frac{\mu(A_i \cap A^\prime_j)}{\mu(A^\prime_j)}\bigg\}. \end{align*} $$

Here the outer sum runs only over the indices j satisfying $\mu (A^\prime _j) > 0$ . We have [Wal82, Theorem 4.3(i)]

$$ \begin{align*} H_\mu(\mathscr{A}\vee \mathscr{A}{\kern2pt}^\prime) = H_\mu(\mathscr{A}{\kern2pt}^\prime) + H_\mu(\mathscr{A}\mid \mathscr{A}{\kern2pt}^\prime). \end{align*} $$

We write $\mathscr{A}{\kern2pt}^\prime \prec \mathscr {A}$ if $\mathscr {A}\vee \mathscr{A}{\kern2pt}^\prime = \mathscr {A}$ . This is equivalent to the condition that for every $A \in \mathscr {A}$ , there exists $A^\prime \in \mathscr{A}{\kern2pt}^\prime $ containing A. If $\mathscr{A}{\kern2pt}^\prime \prec \mathscr {A}$ , then

$$ \begin{align*} H_\mu(\mathscr{A}\mid\mathscr{A}{\kern2pt}^\prime) = H_\mu(\mathscr{A}{\kern2pt}) - H_\mu(\mathscr{A}{\kern2pt}^\prime) \end{align*} $$

and $H_\mu (\mathscr{A}{\kern2pt}^\prime ) \leq H_\mu (\mathscr{A}{\kern2pt})$ .
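The definitions above can be made concrete for a finite model. The following sketch is an illustration only; the matrix `joint`, with `joint[i][j]` standing for $\mu(A_i\cap A^\prime_j)$, and all function names are hypothetical. It computes the Shannon entropy and the conditional entropy, so that the identity $H_\mu(\mathscr{A}\vee \mathscr{A}^\prime) = H_\mu(\mathscr{A}^\prime) + H_\mu(\mathscr{A}\mid \mathscr{A}^\prime)$ can be checked numerically.

```python
import math


def shannon_entropy(masses):
    """Shannon entropy of a partition given the masses of its atoms (0 log 0 = 0)."""
    return -sum(m * math.log(m) for m in masses if m > 0)


def join_masses(joint):
    """Masses of the atoms of the join: joint[i][j] = mu(A_i ∩ A'_j)."""
    return [m for row in joint for m in row]


def marginals(joint):
    """Return the masses (mu(A_i))_i and (mu(A'_j))_j."""
    a = [sum(row) for row in joint]
    a_prime = [sum(col) for col in zip(*joint)]
    return a, a_prime


def conditional_entropy(joint):
    """H(A | A') from the displayed formula; indices j with mu(A'_j) = 0 are skipped."""
    _, a_prime = marginals(joint)
    total = 0.0
    for j, mj in enumerate(a_prime):
        if mj > 0:
            for row in joint:
                q = row[j] / mj
                if q > 0:
                    total -= mj * q * math.log(q)
    return total
```

With `joint = [[0.1, 0.2, 0.0], [0.3, 0.0, 0.4]]`, the value `shannon_entropy(join_masses(joint))` agrees with `shannon_entropy(marginals(joint)[1]) + conditional_entropy(joint)` up to rounding.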

Lemma 3.1.

  1. (1) $H_\mu (\mathscr{A}{\kern2pt})$ is subadditive in $\mathscr {A}$ . Namely, for two finite measurable partitions $\mathscr {A}\ {}$ and $\mathscr{A}{\kern2pt}^\prime $ of X,

    $$ \begin{align*} H_\mu(\mathscr{A}\vee \mathscr{A}{\kern2pt}^\prime) \leq H_\mu(\mathscr{A}{\kern2pt}) + H_\mu(\mathscr{A}{\kern2pt}^\prime). \end{align*} $$
  2. (2) $H_\mu (\mathscr{A}{\kern2pt})$ is concave in $\mu $ . Namely, for $0\leq t\leq 1$ and two probability measures $\mu $ and $\mu ^\prime $ on X,

    $$ \begin{align*} H_{(1-t)\mu + t\mu^\prime}(\mathscr{A}{\kern2pt}) \geq (1-t)H_\mu(\mathscr{A}{\kern2pt}) + t H_{\mu^\prime}(\mathscr{A}{\kern2pt}). \end{align*} $$

Proof. See [Wal82, Theorem 4.3(viii)] and [Wal82, §8.1 Remark] for the proofs of (1) and (2), respectively.
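Both parts of Lemma 3.1 can be tested on finite data. In the following sketch (an illustration under our own conventions, with hypothetical names), a measure is represented by the list of masses it assigns to the atoms of a fixed partition, and a pair of partitions is represented by the matrix `joint[i][j]` $= \mu(A_i\cap A^\prime_j)$.

```python
import math


def shannon_entropy(masses):
    """Shannon entropy of a list of atom masses (0 log 0 = 0)."""
    return -sum(m * math.log(m) for m in masses if m > 0)


def mix(mu, mu_prime, t):
    """Atom masses of the convex combination (1 - t) mu + t mu'."""
    return [(1 - t) * a + t * b for a, b in zip(mu, mu_prime)]


def concavity_gap(mu, mu_prime, t):
    """H of the mixture minus the mixture of the H's; >= 0 by Lemma 3.1(2)."""
    mixed = shannon_entropy(mix(mu, mu_prime, t))
    return mixed - ((1 - t) * shannon_entropy(mu) + t * shannon_entropy(mu_prime))


def subadditivity_gap(joint):
    """H(A) + H(A') - H(A v A'); >= 0 by Lemma 3.1(1)."""
    a = [sum(row) for row in joint]
    a_prime = [sum(col) for col in zip(*joint)]
    joined = [m for row in joint for m in row]
    return shannon_entropy(a) + shannon_entropy(a_prime) - shannon_entropy(joined)
```

The subadditivity gap vanishes when `joint` is a product, that is, when `joint[i][j]` equals the product of the two marginal masses.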

Let $T:X\to X$ be a measurable map satisfying $T_*\mu = \mu $ . Let $\mathscr {A}\ {}$ be a finite measurable partition of X. For a natural number N, we define a new measurable partition $\mathscr {A}^{\kern3pt N}$ of X by

$$ \begin{align*} \mathscr{A}^N = \mathscr{A}\vee T^{-1}\mathscr{A} \vee T^{-2}\mathscr{A} \vee \cdots \vee T^{-(N-1)}\mathscr{A}. \end{align*} $$

We define the entropy $h_\mu (T, \mathscr{A}{\kern2pt})$ by

$$ \begin{align*} h_\mu(T, \mathscr{A}{\kern2pt}) = \lim_{N\to \infty} \frac{H_\mu(\mathscr{A}^N)}{N}. \end{align*} $$

Finally, we define the Kolmogorov–Sinai entropy of the measure-preserving transformation T by

$$ \begin{align*} h_\mu(T) = \sup\{h_\mu(T,\mathscr{A}{\kern2pt})\mid \mathscr{A} \text{ is a finite measurable partition of } X\}. \end{align*} $$
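To illustrate the limit defining $h_\mu(T, \mathscr{A})$, consider an example that is not taken from this paper: the shift on $\{0,1\}^{\mathbb{N}}$ with a stationary two-state Markov measure, and the partition $\mathscr{A}$ by the first coordinate. Then $\mathscr{A}^N$ is the partition into cylinders of length N. The sketch below (the parameters `a`, `b` and all function names are hypothetical) enumerates these cylinders and computes $H_\mu(\mathscr{A}^N)$, which for such a measure equals $H_\mu(\mathscr{A}) + (N-1)h$ with $h$ the entropy rate of the chain.

```python
import math
from itertools import product


def transition_matrix(a, b):
    """Two-state chain: 0 -> 1 with probability a, 1 -> 0 with probability b."""
    return [[1 - a, a], [b, 1 - b]]


def stationary_two_state(a, b):
    """Stationary distribution of the chain above (assumes a + b > 0)."""
    return [b / (a + b), a / (a + b)]


def cylinder_mass(word, pi, P):
    """Measure of the cylinder set determined by a finite word."""
    m = pi[word[0]]
    for s, t in zip(word, word[1:]):
        m *= P[s][t]
    return m


def block_entropy(N, pi, P):
    """H_mu(A^N): Shannon entropy of the partition into length-N cylinders."""
    total = 0.0
    for word in product(range(2), repeat=N):
        m = cylinder_mass(word, pi, P)
        if m > 0:
            total -= m * math.log(m)
    return total


def entropy_rate(pi, P):
    """h = -sum_{i,j} pi_i P_ij log P_ij, the limit of H_mu(A^N) / N."""
    return -sum(pi[i] * P[i][j] * math.log(P[i][j])
                for i in range(2) for j in range(2) if P[i][j] > 0)
```

In this example, `block_entropy(N, pi, P) / N` is non-increasing in N and converges to `entropy_rate(pi, P)`.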

We will need the following lemma later. See [Wal82, §4.5, Theorem 4.12(iv)] for the proof.

Lemma 3.2. If $\mathscr {A}\ {}$ and $\mathscr{A}{\kern2pt}^\prime $ are two finite measurable partitions of X, then

$$ \begin{align*} h_\mu(T, \mathscr{A}{\kern2pt}) \leq h_\mu(T, \mathscr{A}{\kern2pt}^\prime) + H_\mu(\mathscr{A} \mid \mathscr{A}{\kern2pt}^\prime). \end{align*} $$

4 Proof of $P^w_{\mathrm {var}}(\pi , T, f) \leq P^w(\pi , T, f)$

Let $\pi :(X, T)\to (Y, S)$ be a factor map between dynamical systems and let $f:X\to \mathbb {R}$ be a continuous function. The purpose of this section is to prove one half of the variational principle.

Proposition 4.1. For any $0\leq w\leq 1$ and $\mu \in \mathscr {M}^T(X)$ ,

$$ \begin{align*} w h_\mu(T) + (1-w)h_{\pi_*\mu}(S) + w\int_X f\, d{\hspace{-0.8pt}\mu} \leq P^w(\pi, T, f). \end{align*} $$

Therefore, $P^w_{\mathrm {var}}(\pi , T, f) \leq P^w(\pi , T, f)$ .

Proof. Set $\nu = \pi _*\mu $ . This is an invariant probability measure on Y. We will prove

(4.1) $$ \begin{align} w h_\mu(T) + (1-w)h_{\nu}(S) + w\int_X f\, d\mu \leq P^w(\pi, T, f) + 1+ 2\log 2. \end{align} $$

If this is proved, then we will get the above statement by the standard amplification trick. Namely, for each natural number m, we apply (4.1) to $\pi :(X, T^m)\to (Y, S^m)$ with a continuous function $\mathbb {S}_m f:X\to \mathbb {R}$ :

$$ \begin{align*} w h_\mu(T^m) + (1-w)h_{\nu}(S^m) + w\int_X \mathbb{S}_m f\, d\mu \leq P^w(\pi, T^m, \mathbb{S}_m f) + 1+ 2\log 2. \end{align*} $$

We have $h_\mu (T^m) = m h_\mu (T)$ , $h_{\nu }(S^m) = m h_{\nu }(S)$ , $\int _X \mathbb {S}_m f\, d\mu = m\int _X f\, d\mu $ and

$$ \begin{align*} P^w(\pi, T^m, \mathbb{S}_m f) = m P^w(\pi, T, f) \quad (\text{Lemma } 2.2). \end{align*} $$

Hence,

$$ \begin{align*} w h_\mu(T) + (1-w)h_{\nu}(S) + w\int_X f\, d\mu \leq P^w(\pi, T, f) + \frac{1+ 2\log 2}{m}. \end{align*} $$

Letting $m\to \infty $ , we get the statement. So it is enough to prove (4.1).

Let $\mathscr {A} = \{A_1, \ldots , A_\alpha \}$ be a finite measurable partition of Y and let $\mathscr {B}$ be a finite measurable partition of X. We will prove that

(4.2) $$ \begin{align} w h_\mu(T,\mathscr{B}\kern1.6pt) + (1-w) h_\nu(S,\mathscr{A}{\kern2pt}) + w \int_X f\, d\mu \leq P^w(\pi, T, f) + 1 + 2\log 2. \end{align} $$

For each $A_a$ in $\mathscr {A}\ {}$ ( $1\leq a \leq \alpha $ ), we take a compact subset $C_a\subset A_a$ satisfying

(4.3) $$ \begin{align} \sum_{a=1}^\alpha \nu(A_a\setminus C_a) < \frac{1}{\log \alpha}. \end{align} $$

We set $C_0 = Y\setminus (C_1\cup \cdots \cup C_\alpha )$ and $\mathscr {C} = \{C_0, C_1, C_2, \ldots , C_\alpha \}$ .

Claim 4.2. $\mathscr {C}$ is a finite measurable partition of Y satisfying

$$ \begin{align*} h_\nu(S, \mathscr{A}{\kern2pt}) < h_\nu(S,\mathscr{C}\kern1.7pt) + 1. \end{align*} $$

Proof. From Lemma 3.2,

$$ \begin{align*} h_\nu(S, \mathscr{A}{\kern2pt}) \leq h_\nu(S,\mathscr{C}\kern1.7pt) + H_\nu(\mathscr{A}\mid \mathscr{C}\kern1.7pt). \end{align*} $$

Because $C_a\subset A_a$ for $1\leq a\leq \alpha $ ,

$$ \begin{align*} H_\nu(\mathscr{A}\mid \mathscr{C}\kern1.7pt) = \nu(C_0) \sum_{a=1}^\alpha\bigg(\!{-}\,\frac{\nu(A_a\cap C_0)}{\nu(C_0)} \log \frac{\nu(A_a\cap C_0)}{\nu(C_0)}\bigg) \leq \nu(C_0) \log \alpha. \end{align*} $$

The last term is smaller than one by (4.3).
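The estimate in this proof depends only on the masses $\nu(A_a\setminus C_a)$. The following sketch (with hypothetical names; the list `removed` stands for these masses) computes $H_\nu(\mathscr{A}\mid \mathscr{C})$ under the assumption $C_a\subset A_a$ and compares it with the bound $\nu(C_0)\log \alpha$.

```python
import math


def conditional_entropy_given_c(removed):
    """H_nu(A | C) when C_a is contained in A_a and removed[a-1] = nu(A_a minus C_a).

    Only the atom C_0, of mass sum(removed), contributes to the conditional entropy.
    """
    c0 = sum(removed)
    if c0 == 0:
        return 0.0
    return -sum(r * math.log(r / c0) for r in removed if r > 0)


def upper_bound(removed):
    """The bound nu(C_0) * log(alpha), where alpha = len(removed)."""
    return sum(removed) * math.log(len(removed))
```

For `removed = [0.01, 0.02, 0.005]`, condition (4.3) holds and both quantities are smaller than one. Equality between the two occurs when all entries of `removed` coincide.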

We consider $\mathscr {B}\vee \pi ^{-1}(\mathscr {C}\kern1.7pt)$ , which has the form

$$ \begin{align*} \mathscr{B}\vee \pi^{-1}(\mathscr{C}\kern1.7pt) = \{B_{ab}\mid 0\leq a \leq \alpha, 1\leq b\leq \beta_a\}, \quad \pi^{-1}(C_a) = \bigcup_{b=1}^{\beta_a} B_{ab} \quad (0\leq a\leq \alpha). \end{align*} $$

For each $B_{ab} \ (0\leq a\leq \alpha , 1\leq b\leq \beta _a)$ , we take a compact subset $D_{ab}\subset B_{ab}$ such that

(4.4) $$ \begin{align} \sum_{a=0}^\alpha \log \beta_a \bigg(\sum_{b=1}^{\beta_a} \mu(B_{ab}\setminus D_{ab})\bigg) < 1. \end{align} $$

We set

$$ \begin{align*} D_{a0} = \pi^{-1}(C_a) \setminus \bigcup_{b=1}^{\beta_a} D_{ab} \quad (0\leq a \leq \alpha). \end{align*} $$

We define

$$ \begin{align*} \mathscr{D} = \{D_{ab}\mid 0\leq a\leq \alpha, 0\leq b\leq \beta_a \}. \end{align*} $$

Claim 4.3. $\mathscr {D}$ is a finite measurable partition of X with $\pi ^{-1}(\mathscr {C}\kern1.7pt) \prec \mathscr {D}$ and

$$ \begin{align*} h_\mu(T,\mathscr{B}\kern1.6pt) \leq h_\mu(T, \mathscr{D}) + 1. \end{align*} $$

Proof. $\pi ^{-1}(\mathscr {C}\kern1.7pt) \prec \mathscr {D}$ is obvious by the construction.

$$ \begin{align*} \begin{split} h_\mu(T,\mathscr{B}\kern1.6pt) & \leq h_\mu(T, \mathscr{B}\vee \pi^{-1}(\mathscr{C}\kern1.7pt)) \\[2pt] &\leq h_\mu(T, \mathscr{D}) + H_\mu(\mathscr{B}\vee \pi^{-1}(\mathscr{C}\kern1.7pt)\mid \mathscr{D}) \quad \text{by Lemma } 3.2. \end{split} \end{align*} $$

Because $D_{ab}\subset B_{ab}$ for $0\leq a\leq \alpha $ and $1\leq b\leq \beta _a$ ,

$$ \begin{align*} \begin{split} H_\mu(\mathscr{B}\vee \pi^{-1}(\mathscr{C}\kern1.7pt)\mid \mathscr{D}) & = \sum_{a=0}^\alpha \mu(D_{a0}) \sum_{b=1}^{\beta_a}\bigg(\!{-}\,\frac{\mu(D_{a0}\cap B_{ab})}{\mu(D_{a0})} \log \frac{\mu(D_{a0}\cap B_{ab})}{\mu(D_{a0})}\bigg) \\[2pt] &\leq \sum_{a=0}^\alpha \mu(D_{a0}) \log \beta_a \\[2pt] & < 1 \quad \text{by } (4.4). \end{split} \end{align*} $$

We will prove that

$$ \begin{align*} w h_\mu(T, \mathscr{D}) + (1-w) h_\nu(S,\mathscr{C}\kern1.7pt) + w \int_X f\, d\mu \leq P^w(\pi, T, f) + 2\log 2. \end{align*} $$

If this is proved, then (4.2) will follow from Claims 4.2 and 4.3.

From the definition of the entropy,

$$ \begin{align*} \begin{split} w h_\mu(T, \mathscr{D}) + (1-w) h_\nu(S,\mathscr{C}\kern1.7pt) & = \lim_{N\to \infty} \bigg(w \cdot \frac{H_\mu(\mathscr{D}^N)}{N} + (1-w)\cdot \frac{H_\nu(\mathscr{C}^N)}{N}\bigg) \\[2pt] & = \lim_{N\to \infty}\frac{1}{N} \{H_\nu(\mathscr{C}^N) + w (H_\mu(\mathscr{D}^N) - H_\nu(\mathscr{C}^N))\}. \end{split} \end{align*} $$

Because $\nu = \pi _*\mu $ , we have $H_\nu (\mathscr {C}^N) = H_\mu (\pi ^{-1}(\mathscr {C}^N))$ . Because $\pi ^{-1}(\mathscr {C}^N) \prec \mathscr {D}^N$ ,

$$ \begin{align*} H_\mu(\mathscr{D}^N) - H_\mu(\pi^{-1}(\mathscr{C}^N)) = H_\mu(\mathscr{D}^N\mid \pi^{-1}(\mathscr{C}^N)). \end{align*} $$

So,

$$ \begin{align*} w h_\mu(T, \mathscr{D}) + (1-w) h_\nu(S,\mathscr{C}\kern1.7pt) = \lim_{N\to \infty}\frac{1}{N} \{H_\nu(\mathscr{C}^N) + w \cdot H_\mu(\mathscr{D}^N\mid \pi^{-1}(\mathscr{C}^N))\}. \end{align*} $$

We have

$$ \begin{align*} \int_X f\, d\mu = \frac{1}{N} \int_X \mathbb{S}_N f\, d\mu. \end{align*} $$

Therefore,

(4.5) $$ \begin{align} & w h_\mu(T, \mathscr{D}) + (1-w) h_\nu(S,\mathscr{C}\kern1.7pt) + w\int_X f\, d\mu \nonumber\\[2pt] &\quad = \lim_{N\to \infty}\frac{1}{N} \{H_\nu(\mathscr{C}^N) + w \cdot H_\mu(\mathscr{D}^N\mid \pi^{-1}(\mathscr{C}^N)) + w \int_X \mathbb{S}_N f\, d\mu \}. \end{align} $$

For $C\in \mathscr {C}^N$ , we define

$$ \begin{align*} \mathscr{D}^N_C = \{D\in \mathscr{D}^N\mid D\cap \pi^{-1}(C) \neq \emptyset\} = \{D\in \mathscr{D}^N\mid D\subset\pi^{-1}(C), \, D\neq \emptyset \}. \end{align*} $$

Then,

$$ \begin{align*} \pi^{-1}(C) = \bigcup_{D\in \mathscr{D}^N_C} D. \end{align*} $$

For $C\in \mathscr {C}^N$ with $\nu (C)>0$ and $D\in \mathscr {D}^N_C$ , we set

$$ \begin{align*} \mu(D\mid C) = \frac{\mu(D)}{\nu(C)} = \frac{\mu(D)}{\mu(\pi^{-1}(C))}. \end{align*} $$

For each $C\in \mathscr {C}^N$ with $\nu (C)>0$ , we have

$$ \begin{align*} \sum_{D\in \mathscr{D}^N_C} \mu(D\mid C) = 1. \end{align*} $$

Claim 4.4. We have the following inequality:

$$ \begin{align*} H_\nu(\mathscr{C}^N) + w \cdot H_\mu(\mathscr{D}^N\mid \pi^{-1}(\mathscr{C}^N)) + w \int_X \mathbb{S}_N f\, d\mu \leq \log \sum_{C\in \mathscr{C}^N} \bigg(\sum_{D\in \mathscr{D}^N_C} e^{\sup_{D} \mathbb{S}_N f}\bigg)^w. \end{align*} $$

Proof. We have

$$ \begin{align*} \begin{split} \int_X \mathbb{S}_N f\, d\mu & = \sum_{D\in \mathscr{D}^N} \int_D \mathbb{S}_N f\, d\mu \leq \sum_{D\in \mathscr{D}^N} \mu(D) \sup_{D} \mathbb{S}_N f \\[2pt] & = \sum_{\substack{C\in \mathscr{C}^N \\[2pt] \text{with } {\nu(C)>0}}} \nu(C) \bigg(\sum_{D\in \mathscr{D}^N_C} \mu(D\mid C) \sup_D \mathbb{S}_N f\bigg). \end{split} \end{align*} $$

Hence,

$$ \begin{align*} & H_\mu(\mathscr{D}^N\mid \pi^{-1}(\mathscr{C}^N)) + \int_X \mathbb{S}_N f\, d\mu \\[2pt] &\quad \leq \sum_{\substack{C\in \mathscr{C}^N \\[2pt] \text{with } {\nu(C)>0}}} \nu(C)\bigg\{ \sum_{D\in \mathscr{D}^N_C} \Big( -\mu(D\mid C)\log \mu(D\mid C) + \mu(D\mid C) \sup_D \mathbb{S}_N f\Big)\bigg\}. \end{align*} $$

By Lemma 2.6(2),

$$ \begin{align*} \sum_{D\in \mathscr{D}^N_C} \Big( -\mu(D\mid C)\log \mu(D\mid C) + \mu(D\mid C) \sup_D \mathbb{S}_N f\Big) \leq \log \sum_{D\in \mathscr{D}^N_C} e^{\sup_D \mathbb{S}_N f}. \end{align*} $$

So

$$ \begin{align*} H_\mu(\mathscr{D}^N\mid \pi^{-1}(\mathscr{C}^N)) + \int_X \mathbb{S}_N f\, d\mu \leq \sum_{C\in \mathscr{C}^N} \nu(C) \bigg(\log \sum_{D\in \mathscr{D}^N_C} e^{\sup_D \mathbb{S}_N f}\bigg). \end{align*} $$

Therefore,

$$ \begin{align*} \begin{split} & H_\nu(\mathscr{C}^N) + w \cdot H_\mu(\mathscr{D}^N\mid \pi^{-1}(\mathscr{C}^N)) + w \int_X \mathbb{S}_N f\, d\mu \\[2pt] & \leq \sum_{C\in \mathscr{C}^N} \bigg\{-\nu(C) \log \nu(C) + \nu(C) \log \bigg(\sum_{D\in \mathscr{D}^N_C} e^{\sup_D \mathbb{S}_N f}\bigg)^w\bigg\} \\[2pt] & \leq \log \sum_{C\in \mathscr{C}^N} \bigg(\sum_{D\in \mathscr{D}^N_C} e^{\sup_{D} \mathbb{S}_N f}\bigg)^w \quad \text{by Lemma } {2.6}(2). \end{split} \end{align*} $$
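The inequality of Claim 4.4 is a purely finite statement, so it can be tested on random data. In the sketch below (an illustration only, with hypothetical names), `nu[c]` plays the role of $\nu(C)$, `cond[c][d]` the role of $\mu(D\mid C)$ and `s[c][d]` the role of $\sup_D \mathbb{S}_N f$. The integral is replaced by its upper bound $\sum_D \mu(D)\sup_D \mathbb{S}_N f$, exactly as in the first step of the proof.

```python
import math
import random


def normalize(values):
    """Rescale non-negative numbers so that they sum to one."""
    total = sum(values)
    return [v / total for v in values]


def random_model(rng, n_c, n_d):
    """Random nu(C), mu(D | C) and sup_D S_N f for n_c atoms C with n_d atoms D each."""
    nu = normalize([rng.random() for _ in range(n_c)])
    cond = [normalize([rng.random() for _ in range(n_d)]) for _ in range(n_c)]
    s = [[rng.uniform(-3.0, 3.0) for _ in range(n_d)] for _ in range(n_c)]
    return nu, cond, s


def claim_4_4_lhs(nu, cond, s, w):
    """H_nu(C^N) + w H_mu(D^N | pi^{-1} C^N) + w * (upper bound of the integral)."""
    h_c = -sum(v * math.log(v) for v in nu if v > 0)
    h_d_given_c = -sum(v * q * math.log(q)
                       for v, qs in zip(nu, cond) for q in qs if v > 0 and q > 0)
    integral_bound = sum(v * q * x
                         for v, qs, xs in zip(nu, cond, s) for q, x in zip(qs, xs))
    return h_c + w * h_d_given_c + w * integral_bound


def claim_4_4_rhs(s, w):
    """log sum_C (sum_{D in D^N_C} exp(sup_D S_N f))^w."""
    return math.log(sum(sum(math.exp(x) for x in xs) ** w for xs in s))
```

For every model produced by `random_model` and every $0\leq w\leq 1$, the value of `claim_4_4_lhs` should not exceed `claim_4_4_rhs`, up to rounding errors.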

We take metrics d and $d^\prime $ on X and Y respectively. Recall that $C_a \ (1\leq a\leq \alpha )$ are mutually disjoint compact subsets of Y and that $D_{ab} \ (0\leq a \leq \alpha , 1\leq b \leq \beta _a)$ are mutually disjoint compact subsets of X. Hence, we can take $\varepsilon>0$ such that:

  1. (a) for any $y\in C_a$ and $y^\prime \in C_{a^\prime }$ with distinct $1\leq a, a^\prime \leq \alpha $ ,

    $$ \begin{align*} \varepsilon < d^\prime(y, y^\prime); \end{align*} $$
  2. (b) for any $x\in D_{ab}$ and $x^\prime \in D_{a b^\prime }$ with $0\leq a\leq \alpha $ and distinct $1\leq b, b^\prime \leq \beta _a$ ,

    $$ \begin{align*} \varepsilon < d(x, x^\prime). \end{align*} $$

Claim 4.5. Let N be a natural number.

  1. (1) If a subset $V\subset Y$ has ${\mathrm {diam}}(V,d^\prime _N) < \varepsilon $ , then the number of $C\in \mathscr {C}^N$ having non-empty intersection with V is at most $2^N$ :

    $$ \begin{align*} |\{C\in \mathscr{C}^N\mid C\cap V\neq \emptyset\}| \leq 2^N. \end{align*} $$
  2. (2) If a subset $U\subset X$ has ${\mathrm {diam}}(U, d_N) < \varepsilon $ , then for each $C\in \mathscr {C}^N$ , the number of $D\in \mathscr {D}^N_C$ having non-empty intersection with U is at most $2^N$ :

    $$ \begin{align*} |\{D\in \mathscr{D}^N_C\mid D\cap U \neq \emptyset\}| \leq 2^N. \end{align*} $$

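Proof. (1) For each $0\leq k < N$ , we have ${\mathrm {diam}}(S^k V, d^\prime ) < \varepsilon $ . So, by condition (a), the set $S^k V$ may have non-empty intersection with $C_0$ and with at most one set in $\{C_1, C_2, \ldots , C_\alpha \}$ . Hence, there are at most two choices for each of the N indices of an element of $\mathscr {C}^N$ meeting V, and the above statement follows.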

(2) Suppose $C\in \mathscr {C}^N$ has the form

$$ \begin{align*} C = C_{a_0}\cap S^{-1}C_{a_1}\cap S^{-2}C_{a_2} \cap \cdots \cap S^{-(N-1)}C_{a_{N-1}}, \end{align*} $$

with $0\leq a_0, \ldots , a_{N-1} \leq \alpha $ . Recall that $\{D_{a_k 0}, D_{a_k 1}, D_{a_k 2}, \ldots , D_{a_k \beta _{a_k}}\}$ is a partition of $\pi ^{-1}(C_{a_k})$ . Then any set $D\in \mathscr {D}^N_C$ has the form

$$ \begin{align*} D = D_{a_0 b_0} \cap T^{-1}D_{a_1 b_1} \cap T^{-2} D_{a_2 b_2} \cap \cdots \cap T^{-(N-1)} D_{a_{N-1} b_{N-1}}, \end{align*} $$

with $0 \leq b_k \leq \beta _{a_k}$ for $0\leq k \leq N-1$ .

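For each $0\leq k <N$ , we have ${\mathrm {diam}}(T^k U, d) < \varepsilon $ . So, by condition (b), the set $T^k U$ may have non-empty intersection with $D_{a_k 0}$ and with at most one set in $\{D_{a_k 1}, D_{a_k 2}, \ldots , D_{a_k \beta _{a_k}}\}$ . Hence, there are at most two choices for each index $b_k$ , and the above statement follows.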

Let N be a natural number. Suppose we are given an open cover $Y= V_1\cup \cdots \cup V_n$ with ${\mathrm {diam}}(V_i, d^\prime _N) < \varepsilon $ for all $1\leq i\leq n$ . Moreover, suppose that for each $1\leq i\leq n$ , we are given an open cover $\pi ^{-1}(V_i) = U_{i1}\cup U_{i2}\cup \cdots \cup U_{i m_i}$ with ${\mathrm {diam}}(U_{ij}, d_N) < \varepsilon $ for all $1\leq j\leq m_i$ . We are going to prove

(4.6) $$ \begin{align} \log \sum_{C\in \mathscr{C}^N} \bigg(\sum_{D\in \mathscr{D}^N_C} e^{\sup_{D} \mathbb{S}_N f}\bigg)^w \leq 2N\log 2 + \log\sum_{i=1}^n\bigg(\sum_{j=1}^{m_i} e^{\sup_{U_{ij}} \mathbb{S}_N f}\bigg)^w\!\!\!. \end{align} $$

Suppose this is proved. Then by Claim 4.4,

$$ \begin{align*} &H_\nu(\mathscr{C}^N) + w \cdot H_\mu(\mathscr{D}^N\mid \pi^{-1}(\mathscr{C}^N)) + w \int_X \mathbb{S}_N f\, d\mu\nonumber\\[2pt] &\quad \leq 2N\log 2 + \log\sum_{i=1}^n\bigg(\sum_{j=1}^{m_i} e^{\sup_{U_{ij}} \mathbb{S}_N f}\bigg)^w\!\!\!. \end{align*} $$

Taking the infimum over $\{V_i\}$ and $\{U_{ij}\}$ satisfying the above assumptions, we have

$$ \begin{align*} H_\nu(\mathscr{C}^N) + w \cdot H_\mu(\mathscr{D}^N\mid \pi^{-1}(\mathscr{C}^N)) + w \int_X \mathbb{S}_N f\, d\mu \leq 2N\log 2 + \log P^w(\pi, f, N, \varepsilon). \end{align*} $$

Divide this by N and let $N\to \infty $ . Recalling (4.5), we get

$$ \begin{align*} w h_\mu(T,\mathscr{D}) + (1-w) h_\nu(S,\mathscr{C}\kern1.7pt) + w\int_X f\, d\mu \leq 2 \log 2 + \lim_{N\to \infty} \frac{\log P^w(\pi, f, N, \varepsilon)}{N}. \end{align*} $$

Letting $\varepsilon \to 0$ , we get the desired result:

$$ \begin{align*} w h_\mu(T,\mathscr{D}) + (1-w) h_\nu(S,\mathscr{C}\kern1.7pt) + w\int_X f\, d\mu \leq 2 \log 2 + P^w(\pi, T, f). \end{align*} $$

So the rest of the work is to prove (4.6).

For $D\in \mathscr {D}^N$ , we have

$$ \begin{align*} e^{\sup_D \mathbb{S}_N f} \leq \sum_{U_{ij}\cap D \neq \emptyset} e^{\sup_{U_{ij}} \mathbb{S}_N f}. \end{align*} $$

Here the sum is taken over the index $(i, j)$ such that $U_{ij}$ has non-empty intersection with D.

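Let $C\in \mathscr {C}^N$ . We define $\mathscr {V}_C$ as the set of $1\leq i \leq n$ such that $V_i \cap C \neq \emptyset $ . If $U_{ij}$ has non-empty intersection with some $D\in \mathscr {D}^N_C$ , then $i\in \mathscr {V}_C$ because $U_{ij}\subset \pi ^{-1}(V_i)$ and $D\subset \pi ^{-1}(C)$ . Moreover, by Claim 4.5(2), each $U_{ij}$ has non-empty intersection with at most $2^N$ elements of $\mathscr {D}^N_C$ . Hence,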

$$ \begin{align*} \sum_{D\in \mathscr{D}_C^N} e^{\sup_{D} \mathbb{S}_N f} \leq 2^N \sum_{i\in \mathscr{V}_C} \sum_{j=1}^{m_i} e^{\sup_{U_{ij}} \mathbb{S}_N f}. \end{align*} $$

Then (recall $0\leq w\leq 1$ ),

$$ \begin{align*} \begin{split} \bigg(\sum_{D\in \mathscr{D}_C^N} e^{\sup_{D} \mathbb{S}_N f}\bigg)^w & \leq 2^{Nw} \bigg(\sum_{i\in \mathscr{V}_C}\sum_{j=1}^{m_i} e^{\sup_{U_{ij}} \mathbb{S}_N f}\bigg)^w \\[2pt] & \leq 2^{Nw} \sum_{i\in \mathscr{V}_C} \bigg(\sum_{j=1}^{m_i} e^{\sup_{U_{ij}} \mathbb{S}_N f}\bigg)^w \quad \text{by Lemma } 2.6(1). \end{split} \end{align*} $$

Hence,

$$ \begin{align*} \sum_{C\in \mathscr{C}^N} \bigg(\sum_{D\in \mathscr{D}_C^N} e^{\sup_{D} \mathbb{S}_N f}\bigg)^w \leq 2^{Nw} \sum_{C\in \mathscr{C}^N} \bigg\{\sum_{i\in \mathscr{V}_C} \bigg(\sum_{j=1}^{m_i} e^{\sup_{U_{ij}} \mathbb{S}_N f}\bigg)^w\bigg\}. \end{align*} $$

By Claim 4.5(1), for each $1\leq i \leq n$ , the number of $C\in \mathscr {C}^N$ satisfying $i \in \mathscr {V}_C$ is at most $2^N$ . So the right-hand side is bounded from above by

$$ \begin{align*} 2^{Nw} \cdot 2^N \sum_{i=1}^n \bigg(\sum_{j=1}^{m_i} e^{\sup_{U_{ij}} \mathbb{S}_N f}\bigg)^w\!\!\!. \end{align*} $$

Therefore,

$$ \begin{align*} \sum_{C\in \mathscr{C}^N} \bigg(\sum_{D\in \mathscr{D}_C^N} e^{\sup_{D} \mathbb{S}_N f}\bigg)^w \leq 2^{Nw} \cdot 2^N \sum_{i=1}^n \bigg(\sum_{j=1}^{m_i} e^{\sup_{U_{ij}} \mathbb{S}_N f}\bigg)^w\!\!\!. \end{align*} $$

Taking the logarithm,

$$ \begin{align*} \begin{split} \log \sum_{C\in \mathscr{C}^N} \bigg(\sum_{D\in \mathscr{D}_C^N} e^{\sup_{D} \mathbb{S}_N f}\bigg)^w & \leq (N+Nw) \log 2 + \log \sum_{i=1}^n \bigg(\sum_{j=1}^{m_i} e^{\sup_{U_{ij}} \mathbb{S}_N f}\bigg)^w \\[2pt] & \leq 2N\log 2 + \log \sum_{i=1}^n \bigg(\sum_{j=1}^{m_i} e^{\sup_{U_{ij}} \mathbb{S}_N f}\bigg)^w\!\!\!. \end{split} \end{align*} $$

This is the estimate (4.6). So we have finished the proof of the proposition.

5 Zero-dimensional principal extension

In this section, we prepare some definitions and results on principal extensions. The main reference is the book of Downarowicz [Dow11].

Let $\pi :(X, T)\to (Y, S)$ be a factor map between dynamical systems. Let d be a metric on X. We define the topological conditional entropy of $\pi $ by

$$ \begin{align*} h_{\mathrm{top}}(X, T\mid Y, S) = \lim_{\varepsilon\to 0} \bigg(\lim_{N\to \infty} \frac{\sup_{y\in Y} \log \#(\pi^{-1}(y), N, \varepsilon)}{N}\bigg). \end{align*} $$

Here, $\#(\pi ^{-1}(y), N, \varepsilon )$ is the number defined by (1.5). It is easy to check that the quantity

$$ \begin{align*} \sup_{y\in Y} \log \#(\pi^{-1}(y), N, \varepsilon) \end{align*} $$

is sub-additive in N and monotone in $\varepsilon $ . So the above limits exist. This definition of the topological conditional entropy comes from [Dow11, Lemma 6.8.2].

The factor map $\pi $ is said to be principal if $h_{\mathrm {top}}(X,T\mid Y,S)=0$ . In the case that this condition holds, the dynamical system $(X, T)$ is called a principal extension of $(Y, S)$ .

The next theorem shows an important consequence of this condition. This is proved in [Dow11, Corollary 6.8.9]. (See also the paper of Ledrappier and Walters [LW77].)

Theorem 5.1. A principal factor map preserves Kolmogorov–Sinai entropy. Namely, if $\pi :(X, T)\to (Y, S)$ is a principal factor map between dynamical systems, then for any invariant probability measure $\mu \in \mathscr {M}^T(X)$ ,

$$ \begin{align*} h_\mu(T) = h_{\pi_*\mu}(S). \end{align*} $$

Remark 5.2. Indeed, [Dow11, Corollary 6.8.9] proves the following more precise result. Let $\pi :(X, T)\to (Y, S)$ be a factor map with ${h_{\mathrm {top}}}(Y, S) < \infty $ . Then $\pi $ is a principal factor map if and only if $h_\mu (T) = h_{\pi _*\mu }(S)$ for all $\mu \in \mathscr {M}^T(X)$ .

Lemma 5.3. Let $(X, T), (Y, S), (Y^\prime , S^\prime )$ be dynamical systems. Let $\pi :X\to Y$ be a factor map and let $\phi :Y^\prime \to Y$ be a principal factor map. We define the fiber product (see Lemma 2.4)

$$ \begin{align*} X\times_Y Y^\prime = \{(x, y)\in X\times Y^\prime\mid \pi(x) = \phi(y)\}. \end{align*} $$

This is a closed subset of $X\times Y^\prime $ invariant under $T\times S^\prime $ , so $(X\times _Y Y^\prime , T\times S^\prime )$ becomes a dynamical system. We define factor maps $\varphi : X\times _Y Y^\prime \to X$ and $\Pi :X\times _Y Y^\prime \to Y^\prime $ by

$$ \begin{align*} \varphi(x, y) = x, \quad \Pi(x,y) = y. \end{align*} $$

Then $\varphi $ is a principal factor map. (The map $\Pi $ is not used in this statement, but we have introduced it for convenience in what follows.)

Proof. Let d and $d^\prime $ be metrics on X and $Y^\prime $ respectively. We define a metric $\rho $ on $X\times _Y Y^\prime $ by

$$ \begin{align*} \rho((x_1, y_1), (x_2, y_2)) = \max(d(x_1, x_2), d^\prime(y_1, y_2)). \end{align*} $$

For any natural number N and $x\in X$ , the metric space

$$ \begin{align*} (\varphi^{-1}(x), \rho_N) \end{align*} $$

is isometric to $(\phi ^{-1}(\pi (x)), d^\prime _N)$ . Therefore, for any $\varepsilon>0$ ,

$$ \begin{align*} \#(\varphi^{-1}(x), N, \varepsilon) = \#(\phi^{-1}(\pi(x)), N, \varepsilon). \end{align*} $$

So (recall that a factor map is always surjective),

$$ \begin{align*} \sup_{x\in X} \#(\varphi^{-1}(x), N, \varepsilon) = \sup_{x\in X} \#(\phi^{-1}(\pi(x)), N, \varepsilon) = \sup_{y\in Y} \#(\phi^{-1}(y), N, \varepsilon). \end{align*} $$

Thus,

$$ \begin{align*} {h_{\mathrm{top}}}(X\times_Y Y^\prime, T\times S^\prime\mid X, T) = {h_{\mathrm{top}}}(Y^\prime, S^\prime\mid Y, S) = 0. \end{align*} $$
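The key point of the above proof is that the fiber $\varphi^{-1}(x)$ is a copy of the fiber $\phi^{-1}(\pi(x))$. This can be seen at the level of sets in a toy model. The following sketch ignores the dynamics and the metrics, uses finite sets, and represents the maps $\pi$ and $\phi$ by dictionaries; all names are hypothetical.

```python
def fiber_product(X, Y_prime, pi, phi):
    """All pairs (x, y) in X x Y' with pi(x) = phi(y)."""
    return [(x, y) for x in X for y in Y_prime if pi[x] == phi[y]]


def fiber_of_projection(x, pairs):
    """The fiber over x of the projection (x, y) -> x of the fiber product."""
    return [(a, y) for (a, y) in pairs if a == x]


def fiber_of_phi(y0, Y_prime, phi):
    """The fiber of phi over a point y0 of Y."""
    return [y for y in Y_prime if phi[y] == y0]
```

For every `x`, the second coordinates of `fiber_of_projection(x, pairs)` form exactly the list `fiber_of_phi(pi[x], Y_prime, phi)`. In the proof, this bijection is moreover an isometry with respect to $\rho_N$ and $d^\prime_N$.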

The next theorem is a key technical result. This is proved in [Dow11, Theorem 7.6.1]. (See also [DH13].) Here recall that a compact metrizable space is said to be zero-dimensional if clopen subsets form an open basis of the topology (a subset of a topological space is called clopen if it is closed and open). For example, the Cantor set $\{0, 1\}^{\mathbb {N}}$ is zero-dimensional. A dynamical system $(X, T)$ is said to be zero-dimensional if X is a zero-dimensional compact metrizable space.

Theorem 5.4. Every dynamical system has a zero-dimensional principal extension. Namely, for any dynamical system $(X, T)$ , there exist a dynamical system $(X^\prime , T^\prime )$ and a factor map $\phi :X^\prime \to X$ such that $X^\prime $ is zero-dimensional and $\phi $ is principal.

Recall that we have defined two terms $P^w(\pi ,T, f)$ and $P^w_{\mathrm {var}}(\pi , T, f)$ in §2.

Corollary 5.5. Let $\pi :(X, T)\to (Y, S)$ be a factor map between dynamical systems with a continuous function $f:X\to \mathbb {R}$ . There exists a factor map $\pi ^\prime :(X^\prime , T^\prime )\to (Y^\prime , S^\prime )$ with a continuous function $f^\prime :X^\prime \to \mathbb {R}$ satisfying the following two conditions.

  1. (1) $X^\prime $ and $Y^\prime $ are zero-dimensional.

  2. (2) For any $0\leq w\leq 1$ , we have

    $$ \begin{align*} P^w(\pi,T, f) \leq P^w(\pi^\prime, T^\prime, f^\prime), \quad P^w_{\mathrm{var}}(\pi^\prime, T^\prime, f^\prime) \leq P^w_{\mathrm{var}}(\pi,T, f). \end{align*} $$

Proof. By Theorem 5.4, there exists a zero-dimensional principal extension $\phi :(Y^\prime ,S^\prime )\to (Y, S)$ . We consider the fiber product $(X\times _Y Y^\prime , T\times S^\prime )$ and the projections $\varphi : X\times _Y Y^\prime \to X$ and $\Pi : X\times _Y Y^\prime \to Y^\prime $ as in Lemma 5.3. Then $\varphi $ is a principal factor map.

By Lemma 2.4, for any $0\leq w\leq 1$ ,

$$ \begin{align*} P^w(\pi, T, f) \leq P^w(\Pi, T\times S^\prime, f\circ\varphi). \end{align*} $$

Here, the right-hand side is $P^w(\Pi , X\times _Y Y^\prime , T\times S^\prime , Y^\prime , S^\prime , f\circ \varphi )$ . By Theorem 5.1, for any invariant probability measure $\mu \in \mathscr {M}^{T\times S^\prime }(X\times _Y Y^\prime )$ ,

$$ \begin{align*} h_\mu(T\times S^\prime) = h_{\varphi_*\mu}(T), \quad h_{\Pi_*\mu}(S^\prime) = h_{\phi_*\Pi_*\mu}(S) = h_{\pi_*\varphi_*\mu}(S). \end{align*} $$

Then,

(5.1) $$ \begin{align} & P^w_{\mathrm{var}}(\Pi, T\times S^\prime, f\circ\varphi) \nonumber\\[2pt] & \quad=\sup_{\mu\in \mathscr{M}^{T\times S^\prime}(X\times_Y Y^\prime)} \bigg\{w h_\mu(T\times S^\prime) + (1-w)h_{\Pi_*\mu}(S^\prime) + w\int_{X\times_Y Y^\prime} f\circ \varphi\, d\mu\bigg\} \nonumber\\[2pt] & \quad= \sup_{\mu\in \mathscr{M}^{T\times S^\prime}(X\times_Y Y^\prime)} \bigg\{w h_{\varphi_*\mu}(T) + (1-w) h_{\pi_*\varphi_*\mu}(S) + w \int_X f \, d(\varphi_*\mu)\bigg\} \nonumber\\[2pt] & \quad\leq P^w_{\mathrm{var}}(\pi, T, f). \end{align} $$

(Here, we prove $P^w_{\mathrm {var}}(\Pi , T\times S^\prime , f\circ \varphi ) \leq P^w_{\mathrm {var}}(\pi , T, f)$ . Indeed we can prove the equality $P^w_{\mathrm {var}}(\Pi , T\times S^\prime , f\circ \varphi ) = P^w_{\mathrm {var}}(\pi , T, f)$ because the map $\varphi _*:\mathscr {M}^{T\times S^\prime }(X\times _Y Y^\prime )\to \mathscr {M}^{T}(X)$ is surjective. However, we do not need this.)

By applying Theorem 5.4 to the system $(X\times _Y Y^\prime , T\times S^\prime )$ , there exists a zero-dimensional principal extension $\psi : (X^\prime , T^\prime ) \to (X\times _Y Y^\prime , T\times S^\prime )$ .

By Lemma 2.3,

$$ \begin{align*} P^w(\Pi, T\times S^\prime, f\circ\varphi) \leq P^w(\Pi\circ \psi, T^\prime, f\circ \varphi\circ \psi). \end{align*} $$

Here, the right-hand side is $P^w(\Pi \circ \psi , X^\prime , T^\prime , Y^\prime , S^\prime , f \circ \varphi \circ \psi )$ . As in the above (5.1), by Theorem 5.1,

$$ \begin{align*} P^w_{\mathrm{var}}(\Pi\circ \psi, T^\prime, f\circ \varphi\circ \psi) \leq P^w_{\mathrm{var}}(\Pi, T\times S^\prime, f\circ\varphi). \end{align*} $$

So we conclude

$$ \begin{align*} P^w(\pi, T, f) \leq P^w(\Pi\circ \psi, T^\prime, f\circ \varphi\circ \psi), \quad P^w_{\mathrm{var}}(\Pi\circ \psi, T^\prime, f\circ \varphi\circ \psi) \leq P^w_{\mathrm{var}}(\pi, T, f). \end{align*} $$

Set $\pi ^\prime := \Pi \circ \psi : (X^\prime , T^\prime ) \to (Y^\prime , S^\prime )$ and $f^\prime := f\circ \varphi \circ \psi : X^\prime \to \mathbb {R}$ . These satisfy the required conditions.

6 Completion of the proof of the variational principle

In this section, we prove $P^w(\pi , T, f) \leq P^w_{\mathrm {var}}(\pi , T, f)$ and complete the proof of the variational principle. First, we consider the case of zero-dimensional dynamical systems. Later, we will reduce the general case to this zero-dimensional case.

Proposition 6.1. Let $\pi :(X,T)\to (Y, S)$ be a factor map between zero-dimensional dynamical systems. Then, for any $0\leq w\leq 1$ and a continuous function $f:X\to \mathbb {R}$ ,

$$ \begin{align*} P^w(\pi, T, f) \leq P^w_{\mathrm{var}}(\pi, T, f). \end{align*} $$

Proof. Let $\varepsilon>0$ . We will prove that there exists $\mu \in \mathscr {M}^T(X)$ satisfying

$$ \begin{align*} w h_\mu(T) + (1-w)h_{\pi_*\mu}(S) + w \int_X f\, d\mu \geq \lim_{N\to \infty} \frac{\log P^w(\pi, f, N, \varepsilon)}{N}. \end{align*} $$

We take metrics d and $d^\prime $ on X and Y respectively. Let $Y = A_1\cup \cdots \cup A_\alpha $ be a clopen partition (that is, $A_a$ are mutually disjoint clopen subsets of Y) with ${\mathrm {diam}} (A_a, d^\prime ) < \varepsilon $ for all $1\leq a\leq \alpha $ . Here we have used $\dim Y=0$ .

From $\dim X = 0$ , for each $1\leq a\leq \alpha $ , we can also take a clopen partition

$$ \begin{align*} \pi^{-1}(A_a) = \bigcup_{b=1}^{\beta_a} B_{ab} \quad \text{with } {{\mathrm{diam}}(B_{ab}, d) < \varepsilon} \text{ for all } {1\leq b\leq \beta_a}. \end{align*} $$

Set $\mathscr {A} = \{A_1, \ldots , A_\alpha \}$ and $\mathscr {B} = \{B_{ab}\mid 1\leq a\leq \alpha , 1\leq b\leq \beta _a\}$ . These are clopen partitions of Y and X respectively. We have $\pi ^{-1}(\mathscr{A}{\kern2pt}) \prec \mathscr {B}$ .

Let N be a natural number. We have $\pi ^{-1}(\mathscr {A}^{\kern3pt N}) \prec \mathscr {B}^N$ . For each non-empty $A\in \mathscr {A}^{\kern3pt N}$ , we define

$$ \begin{align*} \mathscr{B}^N_A = \{B\in \mathscr{B}^N\mid B\cap \pi^{-1}(A) \neq \emptyset\} = \{B\in \mathscr{B}^N\mid B\subset \pi^{-1}(A), B \neq \emptyset\}. \end{align*} $$

We have

$$ \begin{align*} \pi^{-1}(A) = \bigcup_{B\in \mathscr{B}^N_A} B. \end{align*} $$

We set

$$ \begin{align*} Z_{N,A} = \sum_{B\in \mathscr{B}^N_A} e^{\sup_{B} \mathbb{S}_N f}. \end{align*} $$

Define

$$ \begin{align*} Z_N = \sum_{A\in \mathscr{A}^N} (Z_{N,A})^w. \end{align*} $$

Here, the sum is taken over only non-empty $A \in \mathscr {A}^{\kern3pt N}$ . When we consider below a sum over $A\in \mathscr {A}^{\kern3pt N}$ (or $B\in \mathscr {B}^N$ ), we always assume that A (or B) is not empty.

We have

$$ \begin{align*} P^w(\pi, f, N, \varepsilon) \leq Z_N. \end{align*} $$

So it is enough to prove that there exists $\mu \in \mathscr {M}^T(X)$ satisfying

$$ \begin{align*} w h_\mu(T,\mathscr{B}\kern1.6pt) + (1-w)h_{\pi_*\mu}(S,\mathscr{A}{\kern2pt}) + w \int_X f\, d\mu \geq \lim_{N\to \infty} \frac{\log Z_N}{N}, \end{align*} $$

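where the limit on the right-hand side exists because $Z_N$ is sub-multiplicative in N. (This is enough because $h_\mu(T) \geq h_\mu(T,\mathscr{B})$ and $h_{\pi_*\mu}(S) \geq h_{\pi_*\mu}(S,\mathscr{A})$.)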
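For the reader's convenience, the existence of the limit can be recorded as follows. This is the standard sub-additivity argument (often called Fekete's lemma, a name not used in the text above), applied to the sequence $\log Z_N$:

$$ \begin{align*} Z_{N+M} \leq Z_N \cdot Z_M \ \text{ for all } N, M\geq 1 \quad \Longrightarrow \quad \lim_{N\to \infty} \frac{\log Z_N}{N} = \inf_{N\geq 1} \frac{\log Z_N}{N}. \end{align*} $$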

Let N be a natural number. For non-empty $B\in \mathscr {B}^N$ , we denote by $\mathscr {A}^{\kern3pt N}(B)$ the unique element of $\mathscr {A}^{\kern3pt N}$ containing $\pi (B)$ . For non-empty $A\in \mathscr {A}^{\kern3pt N}$ , we have $\mathscr {A}^{\kern3pt N}(B) = A$ for all $B\in \mathscr {B}^N_A$ .

For each non-empty set B in $\mathscr {B}^N$ , we take a point $x_B\in B$ satisfying $\mathbb {S}_N f(x_B) = \sup _{B} \mathbb {S}_N f$ . (Such a point exists because B is closed, hence compact, and $\mathbb {S}_N f$ is continuous.) We define a probability measure on X by

$$ \begin{align*} \begin{split} \sigma_N & = \frac{1}{Z_N} \sum_{B\in \mathscr{B}^N} (Z_{N,\mathscr{A}^N(B)})^{w-1} e^{\mathbb{S}_N f(x_B)} \cdot \delta_{x_B} \\[2pt] & = \frac{1}{Z_N} \sum_{A\in \mathscr{A}^N} \sum_{B\in \mathscr{B}^N_A} (Z_{N,A})^{w-1} e^{\mathbb{S}_N f(x_B)} \cdot \delta_{x_B}. \end{split} \end{align*} $$

Here, $\delta _{x_B}$ is the delta probability measure at the point $x_B$. The measure $\sigma _N$ is not invariant in general. We set

$$ \begin{align*} \mu_N = \frac{1}{N} \sum_{n=0}^{N-1} T^n_*\sigma_N. \end{align*} $$

We can take a subsequence $\{\mu _{N_k}\}$ converging to an invariant probability measure $\mu $ on X in the weak$^*$ topology. We will prove that this measure $\mu $ satisfies

$$ \begin{align*} w h_\mu(T) + (1-w)h_{\pi_*\mu}(S) + w \int_X f\, d\mu \geq \lim_{N\to \infty} \frac{\log Z_N}{N}. \end{align*} $$

Claim 6.2. For every natural number N,

$$ \begin{align*} w H_{\sigma_N}(\mathscr{B}^N) + (1-w) H_{\pi_* \sigma_N}(\mathscr{A}^N) + w \int_X \mathbb{S}_N f\, d\sigma_N = \log Z_N. \end{align*} $$

Proof. We have

$$ \begin{align*} \pi_*\sigma_N = \frac{1}{Z_N} \sum_{B\in \mathscr{B}^N} (Z_{N,\mathscr{A}^N(B)})^{w-1} e^{\mathbb{S}_N f(x_B)}\cdot \delta_{\pi(x_B)}. \end{align*} $$

For each non-empty $A\in \mathscr {A}^{\kern3pt N}$ ,

$$ \begin{align*} \begin{split} \pi_*\sigma_N(A) & = \frac{1}{Z_N} \sum_{B\in \mathscr{B}^N_A} (Z_{N,\mathscr{A}^N(B)})^{w-1} e^{\mathbb{S}_N f(x_B)} \\[2pt] &= \frac{1}{Z_N} (Z_{N,A})^w \quad \text{by } {\mathscr{A}^N(B) = A} \text{ for } {B\in \mathscr{B}^N_A}. \end{split} \end{align*} $$

Then,

(6.1) $$ \begin{align} H_{\pi_*\sigma_N} (\mathscr{A}^N) = \log Z_N - w\sum_{A\in \mathscr{A}^N}\frac{(Z_{N, A})^w}{Z_N} \log Z_{N,A}. \end{align} $$
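The step leading to (6.1) can be written out explicitly. It uses only the formula $\pi_*\sigma_N(A) = (Z_{N,A})^w/Z_N$ obtained above and the definition $Z_N = \sum_{A\in \mathscr{A}^N} (Z_{N,A})^w$:

$$ \begin{align*} \begin{split} H_{\pi_*\sigma_N}(\mathscr{A}^N) &= -\sum_{A\in \mathscr{A}^N} \frac{(Z_{N,A})^w}{Z_N} \log \frac{(Z_{N,A})^w}{Z_N} \\[2pt] &= \frac{\log Z_N}{Z_N} \sum_{A\in \mathscr{A}^N} (Z_{N,A})^w - w \sum_{A\in \mathscr{A}^N} \frac{(Z_{N,A})^w}{Z_N} \log Z_{N,A} \\[2pt] &= \log Z_N - w \sum_{A\in \mathscr{A}^N} \frac{(Z_{N,A})^w}{Z_N} \log Z_{N,A}. \end{split} \end{align*} $$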

For non-empty $B\in \mathscr {B}^N$ ,

$$ \begin{align*} \sigma_N(B) = \frac{(Z_{N, \mathscr{A}^N(B)})^{w-1}}{Z_N} e^{\mathbb{S}_N f(x_B)}. \end{align*} $$

Then,

$$ \begin{align*} \begin{split} H_{\sigma_N}(\mathscr{B}^N) &= - \sum_{B\in \mathscr{B}^N} \frac{(Z_{N, \mathscr{A}^N(B)})^{w-1}}{Z_N} e^{\mathbb{S}_N f(x_B)} \log \bigg(\frac{(Z_{N, \mathscr{A}^N(B)})^{w-1}}{Z_N} e^{\mathbb{S}_N f(x_B)}\bigg) \\[2pt] &= \frac{\log Z_N}{Z_N} \underbrace{\sum_{B\in \mathscr{B}^N} (Z_{N, \mathscr{A}^N(B)})^{w-1} e^{\mathbb{S}_N f(x_B)}}_{\text{(I)}} \\[2pt] &\quad - \frac{w-1}{Z_N} \underbrace{\sum_{B\in \mathscr{B}^N} (Z_{N,\mathscr{A}^N(B)})^{w-1} e^{\mathbb{S}_N f(x_B)} \log Z_{N, \mathscr{A}^N(B)}}_{\text{(II)}} \\[2pt] &\quad- \underbrace{\sum_{B\in \mathscr{B}^N} \frac{(Z_{N, \mathscr{A}^N(B)})^{w-1}}{Z_N} e^{\mathbb{S}_N f(x_B)} \mathbb{S}_N f (x_B)}_{\text{(III)}}. \end{split} \end{align*} $$

We calculate the term (I) by

$$ \begin{align*} \text{(I)} = \sum_{A\in \mathscr{A}^N} \sum_{B\in \mathscr{B}^N_A} (Z_{N, A})^{w-1} e^{\mathbb{S}_N f (x_B)} = \sum_{A\in \mathscr{A}^N} (Z_{N, A})^{w-1} \cdot Z_{N,A} = Z_N. \end{align*} $$

The term (II) is calculated by

$$ \begin{align*} \text{(II)} = \sum_{A\in \mathscr{A}^N} \sum_{B\in \mathscr{B}_A^N} (Z_{N,A})^{w-1} e^{\mathbb{S}_N f(x_B)} \log Z_{N,A} = \sum_{A\in \mathscr{A}^N} (Z_{N,A})^w \log Z_{N,A}. \end{align*} $$

For the term (III), we consider

$$ \begin{align*} \int_X \mathbb{S}_N f\, d\sigma_N = \frac{1}{Z_N} \sum_{B\in \mathscr{B}^N} (Z_{N, \mathscr{A}^N(B)})^{w-1} e^{\mathbb{S}_N f(x_B)} \mathbb{S}_N f(x_B) = \text{(III)}. \end{align*} $$

Thus,

$$ \begin{align*} H_{\sigma_N}(\mathscr{B}^N) + \int_X \mathbb{S}_N f\, d\sigma_N = \log Z_N - \frac{w-1}{Z_N} \sum_{A\in \mathscr{A}^N} (Z_{N,A})^{w} \log Z_{N, A}. \end{align*} $$

Combining this with (6.1),

$$ \begin{align*} w H_{\sigma_N}(\mathscr{B}^N) + (1-w) H_{\pi_* \sigma_N}(\mathscr{A}^N) + w \int_X \mathbb{S}_N f\, d\sigma_N = \log Z_N. \end{align*} $$
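The identity of Claim 6.2 can be checked by hand on a toy configuration. The following numbers are hypothetical and chosen only for illustration: take $N=1$, $f\equiv 0$, $w=1/2$, and suppose that $\mathscr{A}$ has two non-empty elements $A_1, A_2$ whose fibers contain one and four non-empty elements of $\mathscr{B}$ respectively. Then $Z_{1,A_1} = 1$, $Z_{1,A_2} = 4$ and $Z_1 = 1^{1/2} + 4^{1/2} = 3$. The measure $\sigma_1$ gives mass $1/3$ to the single cell over $A_1$ and mass $4^{-1/2}/3 = 1/6$ to each of the four cells over $A_2$, so that

$$ \begin{align*} \begin{split} H_{\sigma_1}(\mathscr{B}) &= \tfrac{1}{3}\log 3 + 4\cdot \tfrac{1}{6} \log 6 = \log 3 + \tfrac{2}{3}\log 2, \\[2pt] H_{\pi_*\sigma_1}(\mathscr{A}) &= \tfrac{1}{3}\log 3 + \tfrac{2}{3}\log \tfrac{3}{2} = \log 3 - \tfrac{2}{3}\log 2, \\[2pt] \tfrac{1}{2} H_{\sigma_1}(\mathscr{B}) + \tfrac{1}{2} H_{\pi_*\sigma_1}(\mathscr{A}) &= \log 3 = \log Z_1. \end{split} \end{align*} $$

The integral term vanishes here because $f\equiv 0$.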

Claim 6.3. Let M and N be natural numbers. We have

$$ \begin{align*} \begin{split} \frac{1}{M} H_{\mu_N}(\mathscr{B}^M) & \geq \frac{1}{N} H_{\sigma_N}(\mathscr{B}^N) -\frac{2M \log |\mathscr{B}|}{N}, \\[2pt] \frac{1}{M} H_{\pi_* \mu_N}(\mathscr{A}^M) & \geq \frac{1}{N} H_{\pi_*\sigma_N}(\mathscr{A}^N) -\frac{2M \log |\mathscr{A}|}{N}. \end{split} \end{align*} $$

Here, $|\mathscr {A}|$ and $|\mathscr {B}|$ are the cardinalities of $\mathscr {A}$ and $\mathscr {B}$ respectively.

Proof. This is rather standard. (See the proof of the standard variational principle in [Wal82, §8.2].) Here we provide the proof for $\mathscr {B}^M$ . The case of $\mathscr {A}^M$ is the same.

From the concavity of the entropy function (Lemma 3.1(2)), for $\mu _N = ({1}/{N})\sum _{n=0}^{N-1} T^n_*\sigma _N$ ,

(6.2) $$ \begin{align} H_{\mu_N}(\mathscr{B}^M) \geq \frac{1}{N}\sum_{n=0}^{N-1}H_{T^n_*\sigma_N}(\mathscr{B}^M) = \frac{1}{N}\sum_{n=0}^{N-1}H_{\sigma_N}(T^{-n}\mathscr{B}^M). \end{align} $$
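The second equality in (6.2) can be spelled out as follows. It uses only the definition of the push-forward measure, $T^n_*\sigma_N(B) = \sigma_N(T^{-n}B)$:

$$ \begin{align*} H_{T^n_*\sigma_N}(\mathscr{B}^M) = -\sum_{B\in \mathscr{B}^M} \sigma_N(T^{-n}B) \log \sigma_N(T^{-n}B) = H_{\sigma_N}(T^{-n}\mathscr{B}^M). \end{align*} $$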

Write $N = qM + r$ with $0\leq r <M$. Then

(6.3) $$ \begin{align} \sum_{n=0}^{N-1} H_{\sigma_N} (T^{-n}\mathscr{B}^M) & = \sum_{t=0}^{M-1}\sum_{s=0}^q H_{\sigma_N}(T^{-sM-t}\mathscr{B}^M) - \sum_{n=qM+r}^{qM+M-1} H_{\sigma_N}(T^{-n}\mathscr{B}^M) \nonumber\\[2pt] & \geq \sum_{t=0}^{M-1}\sum_{s=0}^q H_{\sigma_N}(T^{-sM-t}\mathscr{B}^M) - M \log |\mathscr{B}^M| \nonumber\\[2pt] & \geq \sum_{t=0}^{M-1}\sum_{s=0}^q H_{\sigma_N}(T^{-sM-t}\mathscr{B}^M) - M^2 \log |\mathscr{B}|. \end{align} $$
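To illustrate the index bookkeeping in the first line of (6.3), consider the hypothetical values $N=7$ and $M=3$, so that $q=2$ and $r=1$. The numbers $sM+t$ with $0\leq t\leq 2$ and $0\leq s\leq 2$ run over $0,1,\ldots ,8$, whereas the left-hand side only involves $n=0,1,\ldots ,6$. The surplus terms are $n=7,8$, that is, $n=qM+r,\ldots ,qM+M-1$:

$$ \begin{align*} \sum_{n=0}^{6} H_{\sigma_7}(T^{-n}\mathscr{B}^3) = \sum_{t=0}^{2}\sum_{s=0}^{2} H_{\sigma_7}(T^{-3s-t}\mathscr{B}^3) - \sum_{n=7}^{8} H_{\sigma_7}(T^{-n}\mathscr{B}^3). \end{align*} $$

In general there are $M-r\leq M$ surplus terms, each bounded by $\log |\mathscr{B}^M| \leq M\log |\mathscr{B}|$, which gives the remaining two lines of (6.3).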

We estimate $\sum _{s=0}^q H_{\sigma _N}(T^{-sM-t}\mathscr {B}^M)$ from below for each t. We have

$$ \begin{align*} T^{-sM-t}\mathscr{B}^M = \bigvee_{m=0}^{M-1} T^{-(sM+t+m)}\mathscr{B}. \end{align*} $$

When we fix $0\leq t\leq M-1$ and let s and m vary over $0\leq s\leq q$ and $0\leq m \leq M-1$ , the number $sM+t+m$ runs over

$$ \begin{align*} t, t+1, t+2, \ldots, t+(q+1)M-1 \quad \text{without multiplicity}. \end{align*} $$
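For instance, with the illustrative values $M=3$, $q=2$ and $t=1$, the numbers $sM+t+m$ are

$$ \begin{align*} \underbrace{1,2,3}_{s=0}, \quad \underbrace{4,5,6}_{s=1}, \quad \underbrace{7,8,9}_{s=2}, \end{align*} $$

which is exactly the range from $t=1$ to $t+(q+1)M-1 = 9$, each value occurring once.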

Hence,

$$ \begin{align*} &\sum_{s=0}^q H_{\sigma_N}(T^{-sM-t}\mathscr{B}^M) + \sum_{n=0}^{t-1} H_{\sigma_N}(T^{-n}\mathscr{B}\kern1.6pt) \\[2pt] &\quad \geq H_{\sigma_N}\bigg(\bigvee_{n=0}^{t+(q+1)M-1} T^{-n}\mathscr{B}\bigg) \quad \text{by Lemma } {3.1}(1) \\[2pt] &\quad\geq H_{\sigma_N}(\mathscr{B}^N) \quad \text{by } {t+(q+1)M \geq (q+1)M> N}. \end{align*} $$

Therefore,

$$ \begin{align*} \begin{split} \sum_{s=0}^q H_{\sigma_N}(T^{-sM-t}\mathscr{B}^M) & \geq H_{\sigma_N}(\mathscr{B}^N) - \sum_{n=0}^{t-1} H_{\sigma_N}(T^{-n}\mathscr{B}\kern1.6pt) \\[2pt] & \geq H_{\sigma_N}(\mathscr{B}^N) - t \log |\mathscr{B}| \\[2pt] & \geq H_{\sigma_N}(\mathscr{B}^N) - M \log |\mathscr{B}| \quad \text{by } t<M. \end{split} \end{align*} $$

Thus,

$$ \begin{align*} \sum_{t=0}^{M-1}\sum_{s=0}^q H_{\sigma_N}(T^{-sM-t}\mathscr{B}^M) \geq M\cdot H_{\sigma_N}(\mathscr{B}^N) - M^2 \log |\mathscr{B}|. \end{align*} $$

So by (6.3),

$$ \begin{align*} \begin{split} \sum_{n=0}^{N-1} H_{\sigma_N}(T^{-n}\mathscr{B}^M) & \geq \sum_{t=0}^{M-1}\sum_{s=0}^q H_{\sigma_N}(T^{-sM-t}\mathscr{B}^M) - M^2 \log |\mathscr{B}| \\[2pt] & \geq M\cdot H_{\sigma_N}(\mathscr{B}^N) - 2 M^2 \log |\mathscr{B}|. \end{split} \end{align*} $$

From (6.2), we conclude that

$$ \begin{align*} \frac{1}{M}H_{\mu_N}(\mathscr{B}^M) \geq \frac{1}{N M}\sum_{n=0}^{N-1} H_{\sigma_N}(T^{-n}\mathscr{B}^M) \geq \frac{1}{N} H_{\sigma_N}(\mathscr{B}^N) - \frac{2M\log |\mathscr{B}|}{N}. \end{align*} $$

We have

$$ \begin{align*} \int_X f\, d\mu_N = \frac{1}{N} \int_X \sum_{n=0}^{N-1} f\circ T^n \, d\sigma_N = \frac{1}{N} \int_X \mathbb{S}_N f\, d\sigma_N. \end{align*} $$

Claim 6.3 implies

$$ \begin{align*} \begin{split} & \frac{w}{M} H_{\mu_N}(\mathscr{B}^M) + \frac{1-w}{M}H_{\pi_* \mu_N}(\mathscr{A}^M) + w \int_X f\, d\mu_N \\[2pt] &\quad \geq \frac{w}{N} H_{\sigma_N}(\mathscr{B}^N) + \frac{1-w}{N} H_{\pi_*\sigma_N}(\mathscr{A}^N) + \frac{w}{N} \int_X \mathbb{S}_N f \, d\sigma_N - \frac{2M(\log |\mathscr{A}| +\log |\mathscr{B}|)}{N} \\[2pt] &\quad = \frac{\log Z_N}{N} - \frac{2M(\log |\mathscr{A}| +\log |\mathscr{B}|)}{N} \quad \text{by Claim } {6.2}. \end{split} \end{align*} $$
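The error term in the last display comes from weighting the two estimates of Claim 6.3 by $w$ and $1-w$. Since $0\leq w\leq 1$ and both logarithms are non-negative, the weighted error is bounded as follows:

$$ \begin{align*} w\cdot \frac{2M\log |\mathscr{B}|}{N} + (1-w)\cdot \frac{2M \log |\mathscr{A}|}{N} \leq \frac{2M(\log |\mathscr{A}| + \log |\mathscr{B}|)}{N}. \end{align*} $$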

Because $\mu _{N_k}\to \mu $ as $k\to \infty $ , letting $N=N_k\to \infty $ ,

$$ \begin{align*} \frac{w}{M} H_{\mu}(\mathscr{B}^M) + \frac{1-w}{M}H_{\pi_* \mu}(\mathscr{A}^M) + w \int_X f\, d\mu \geq \lim_{N\to \infty} \frac{\log Z_N}{N}. \end{align*} $$

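Here, we have used the clopenness of the elements of $\mathscr {A}^M$ and $\mathscr {B}^M$: their indicator functions are continuous, so $H_{\mu_{N_k}}(\mathscr{B}^M)\to H_{\mu}(\mathscr{B}^M)$ and $H_{\pi_*\mu_{N_k}}(\mathscr{A}^M)\to H_{\pi_*\mu}(\mathscr{A}^M)$ as $k\to \infty$. Finally, letting $M\to \infty $ , we get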

$$ \begin{align*} w h_\mu(T, \mathscr{B}\kern1.6pt) + (1-w) h_{\pi_*\mu}(S, \mathscr{A}{\kern2pt}) + w \int_X f\, d\mu \geq \lim_{N\to \infty} \frac{\log Z_N}{N}. \end{align*} $$

Now we can prove the main result (Theorem 2.1). We repeat the statement for the convenience of readers.

Theorem 6.4 ($=$ Theorem 2.1).

Let $\pi :(X, T)\to (Y, S)$ be a factor map between dynamical systems. Then for any $0\leq w\leq 1$ and a continuous function $f:X\to \mathbb {R}$ ,

$$ \begin{align*} P^w(\pi, T, f) = P^w_{\mathrm{var}}(\pi, T, f). \end{align*} $$

Proof. We already proved in Proposition 4.1 that

$$ \begin{align*} P^w_{\mathrm{var}}(\pi, T, f) \leq P^w(\pi,T,f). \end{align*} $$

By Corollary 5.5, there exists a factor map $\pi ^\prime :(X^\prime , T^\prime )\to (Y^\prime , S^\prime )$ between zero-dimensional dynamical systems with a continuous function $f^\prime :X^\prime \to \mathbb {R}$ such that

$$ \begin{align*} P^w(\pi,T, f) \leq P^w(\pi^\prime, T^\prime, f^\prime), \quad P^w_{\mathrm{var}}(\pi^\prime, T^\prime, f^\prime) \leq P^w_{\mathrm{var}}(\pi,T, f). \end{align*} $$

By Proposition 6.1,

$$ \begin{align*} P^w(\pi^\prime, T^\prime, f^\prime) \leq P^w_{\mathrm{var}}(\pi^\prime, T^\prime, f^\prime). \end{align*} $$

Therefore,

$$ \begin{align*} P^w(\pi,T, f) \leq P^w_{\mathrm{var}}(\pi,T, f). \end{align*} $$
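Explicitly, the inequalities from Corollary 5.5 and Proposition 6.1 combine into the single chain

$$ \begin{align*} P^w(\pi,T, f) \leq P^w(\pi^\prime, T^\prime, f^\prime) \leq P^w_{\mathrm{var}}(\pi^\prime, T^\prime, f^\prime) \leq P^w_{\mathrm{var}}(\pi,T, f). \end{align*} $$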

So we conclude that

$$ \begin{align*} P^w(\pi, T, f) = P^w_{\mathrm{var}}(\pi, T, f). \end{align*} $$

Remark 6.5. The book of Downarowicz [Dow11] systematically develops the idea of using zero-dimensional dynamical systems in the study of entropy theory. The above proof is influenced by this idea. We also note that it seems difficult to use this zero-dimensional trick in the proof of Proposition 4.1 in §4, because it is difficult to prove that principal extensions preserve weighted topological pressure without using the variational principle. A similar remark is given in [Dow11, Remark 7.6.12] about the proof of the standard variational principle.

Acknowledgements

M. Tsukamoto was supported by JSPS KAKENHI JP21K03227.

References

Bedford, T. Crinkly curves, Markov partitions and box dimension in self-similar sets. PhD Thesis, University of Warwick, 1984.
Barral, J. and Feng, D.-J. Weighted thermodynamic formalism and applications. Preprint, 2009, arXiv:0909.4247.
Barral, J. and Feng, D.-J. Weighted thermodynamic formalism on subshifts and applications. Asian J. Math. 16 (2012), 319–352.
Bowen, R. Topological entropy for noncompact subsets. Trans. Amer. Math. Soc. 184 (1973), 125–136.
Downarowicz, T. and Huczek, D. Zero-dimensional principal extensions. Acta Appl. Math. 126 (2013), 117–129.
Dinaburg, E. I. A correlation between topological entropy and metric entropy. Dokl. Akad. Nauk SSSR 190 (1970), 19–22.
Downarowicz, T. Entropy in Dynamical Systems. Cambridge University Press, Cambridge, 2011.
Feng, D.-J. Equilibrium states for factor maps between subshifts. Adv. Math. 226 (2011), 2470–2502.
Feng, D.-J. and Huang, W. Variational principle for weighted topological pressure. J. Math. Pures Appl. (9) 106 (2016), 411–452.
Goodman, T. N. T. Relating topological entropy and measure entropy. Bull. Lond. Math. Soc. 3 (1971), 176–180.
Goodwyn, L. W. Topological entropy bounds measure-theoretic entropy. Proc. Amer. Math. Soc. 23 (1969), 679–688.
Kenyon, R. and Peres, Y. Measures of full dimension on affine-invariant sets. Ergod. Th. & Dynam. Sys. 16 (1996), 307–323.
Kenyon, R. and Peres, Y. Hausdorff dimensions of sofic affine-invariant sets. Israel J. Math. 94 (1996), 157–178.
Lindenstrauss, E. and Tsukamoto, M. Double variational principle for mean dimension. Geom. Funct. Anal. 29 (2019), 1048–1109.
Ledrappier, F. and Walters, P. A relativised variational principle for continuous transformations. J. Lond. Math. Soc. (2) 16 (1977), 568–576.
McMullen, C. The Hausdorff dimension of general Sierpinski carpets. Nagoya Math. J. 96 (1984), 1–9.
Misiurewicz, M. A short proof of the variational principle for ${\mathbb{Z}}_{+}^N$ actions on a compact space. Int. Conf. on Dyn. Syst. Math. Phys. (Rennes, 1975) (Astérisque, 40). Société Mathématique de France, Paris, 1976, pp. 145–157.
Ruelle, D. Statistical mechanics on a compact set with ${Z}^{\nu}$ action satisfying expansiveness and specification. Trans. Amer. Math. Soc. 185 (1973), 237–251.
Walters, P. A variational principle for the pressure of continuous transformations. Amer. J. Math. 17 (1975), 937–971.
Walters, P. An Introduction to Ergodic Theory. Springer-Verlag, New York, 1982.
Yayama, Y. Existence of a measurable saturated compensation function between subshifts and its applications. Ergod. Th. & Dynam. Sys. 31 (2011), 1563–1589.
Yayama, Y. Application of a relative variational principle to dimension of nonconformal expanding maps. Stoch. Dyn. 11 (2011), 643–679.