1. Introduction
The distribution of various random variables associated with trees is widely studied in the literature. Typically, the tree parameters that behave additively exhibit normal distribution, which was observed by Drmota [Reference Drmota7, Chapter 3], Janson [Reference Janson16], and Wagner [Reference Wagner27]. For example, the number of leaves or, more generally, the number of vertices of a given degree satisfies a central limit theorem (CLT) for many random models: labelled trees, unlabelled trees, plane trees, forests; see Drmota and Gitteberger [Reference Drmota and Gittenberger8] and references therein for more details.
The classical limit theorems of probability theory are impractical for random trees due to the dependency of adjacencies. Instead, one employs more elaborate tools such as the analysis of generating functions [Reference Bender and Richmond2], the conditional limit theorems [Reference Holst12], and Hwang’s quasi-power theorem [Reference Hwang13]. These methods are particularly efficient for parameters that admit a recurrence relation, which is often the case for trees.
The martingale CLT [Reference Brown4] is a powerful tool that has been extensively used to study random structures. Nevertheless, it is surprisingly overlooked in the context of the distribution of tree parameters and the vast majority of known results rely on the methods mentioned in the paragraph above. We are aware of only a few applications of the martingale CLT: Smythe [Reference Smythe25] and Mahmoud [Reference Mahmoud18] analysed growth of leaves in the random trees related to urn models; Móri [Reference Móri22] examined the max degree for Barabási–Albert random trees; Fen and Hu [Reference Feng and Hu9] considered the Zagreb index for random recursive trees; Sulzbach [Reference Sulzbach26] studied the path length in a random model encapsulating binary search trees, recursive trees and plane-oriented recursive trees.
We prove a CLT for an arbitrary tree parameter using the martingale approach. Unlike other methods, the parameter is not required to be of a specific form or to satisfy a recurrence relation. Our only assumption is that the parameter is stable with respect to small perturbations in the sense that precisely specified below. We also bound the rate of convergence to the normal distribution. In this paper, we restrict our attention to unrooted labelled trees even though martingales appear naturally in many other random settings. This is sufficient to demonstrate the power of the new approach and cover several important applications that go beyond the toolkit of existing methods.
Let $\mathcal{T}_n$ be the set of trees whose vertices are labelled by $[n]\,:\!=\,\{1,\ldots, n\}$ and ${\textbf{T}}$ be a uniform random element of $\mathcal{T}_n$ . By Cayley’s formula, we have $|\mathcal{T}_n| = n^{n-2}$ . For a tree $T \in \mathcal{T}_n$ and two vertices $i,j\in[n]$ , let $d_T(i,j)$ denote the distance between i and j that is the number of edges in the unique path from i to j in T. For $A,B \subseteq [n]$ , let
Throughout the paper, we identify graphs and their edge sets. Consider an operation defined $\textrm{S}_{i}^{jk}$ as follows. If $ij \in T$ and $ik \notin T$ , let $\textrm{S}_{i}^{jk} T$ be the graph obtained from T by deleting the edge ij and inserting the edge ik; see Figure 1 below.
Observe that $\textrm{S}_{i}^{jk}T$ is a tree if and only if the path from j to k in T does not contain the vertex i. We refer the operation $\textrm{S}_{i}^{jk}$ as a tree perturbation.
Let ${\mathbb{R}}^+$ denote the set of non-negative real numbers. For $\alpha \in {\mathbb{R}}^+$ , we say a tree parameter $F\,:\, \mathcal T_n \rightarrow {\mathbb{R}}$ is $\alpha$ -Lipschitz if
for all $T \in \mathcal{T}_n$ and triples (i, j, k) that $\textrm{S}_{i}^{jk} T$ is a tree. We also require that the effects on the parameter F of sufficiently distant perturbations $\textrm{S}_{i}^{jk}$ and $\textrm{S}_{a}^{bc}$ superpose; that is
For $\rho \in {\mathbb{R}}^+$ , we say F is $\rho$ -superposable if the above equation holds for all $T \in \mathcal{T}_n$ and triples (i, j, k), (a, b, c) such that $\textrm{S}_{i}^{jk}T$ , $\textrm{S}_{a}^{bc}T$ , $\textrm{S}_{i}^{jk}\textrm{S}_{a}^{bc}T$ are trees and $d_T(\{j,k\}, \{b,c\})\geqslant \rho$ . Note that the sets $\{j,k\}$ and $\{b,c\}$ are at the same distance in all four trees T, $\textrm{S}_{i}^{jk}T$ , $\textrm{S}_{a}^{bc}T$ , and $\textrm{S}_{i}^{jk}\textrm{S}_{a}^{bc}T$ . Thus, $d_T(\{j,k\}, \{b,c\})$ is an appropriate measure for the distance between the two tree perturbations $\textrm{S}_{i}^{jk}$ and $\textrm{S}_{a}^{bc}$ .
For a random variable X let
where $\Phi(t) = (2\pi)^{-1/2} \int_{-\infty}^t e^{-x^2/2} dx$ . In other words, $\delta_{\textrm{K}} [X]$ is the Kolmogorov distance between the scaled random variable X and the standard normal distribution. We say $X=X_n$ is asymptotically normal if $\delta_{\textrm{K}} [X] \rightarrow 0$ as $n \rightarrow \infty$ .
In the following theorem, F, $\alpha$ , and $\rho$ stand for sequences parametrised by a positive integer n that is $(F,\alpha, \rho) = (F_n, \alpha_n, \rho_n)$ . We omit the subscripts for notation simplicity. All asymptotics in the paper refer to $n\rightarrow \infty$ and the notations $o(\cdot)$ , $O(\cdot)$ , $\Theta(\cdot)$ have the standard meaning.
Theorem 1.1. Let a tree parameter $F\,:\, \mathcal T_n \rightarrow {\mathbb{R}}$ be $\alpha$ -Lipschitz and $\rho$ -superposable for some $\alpha> 0$ and $\rho\geqslant 1$ . Assume also that, for a fixed constant $\varepsilon>0$ ,
Then, $F({\textbf{T}})$ is asymptotically normal. Moreover, $\delta_{\rm{K}}[F({\textbf{T}})] = O(n^{-\varepsilon'})$ for any $\varepsilon' \in (0,\varepsilon)$ .
To clarify the assumptions Theorem 1.1, we consider a simple application to the aforementioned parameter L(T), the number of leaves in a tree T. The distribution of $L({\textbf{T}})$ was derived for the first time by Kolchin [Reference Kolchin17], using generating functions and the connection to the Galton–Watson branching process. Theorem 1.1 immediately leads to the following result:
Corollary 1.2. $L({\textbf{T}})$ is asymptotically normal and $\delta_{\rm{K}}[L({\textbf{T}})] = O(n^{-1/4+\epsilon})$ for any $\epsilon>0$ .
Proof. For any tree $T\in \mathcal{T}_n$ and a triple (i, j, k) that $\textrm{S}_{i}^{jk}T$ is a tree, the numbers of leaves of T and $\textrm{S}_{i}^{jk}T$ differ by at most one. Thus, L is $\alpha$ -Lipchitz on $\mathcal{T}_n$ with $\alpha=1$ .
Next, observe that if T, $\textrm{S}_{i}^{jk} T$ , $\textrm{S}_{a}^{bc} T$ , and $\textrm{S}_{i}^{jk} \textrm{S}_{a}^{bc} T$ are trees and $\{j,k\}\cap \{b,c\} = \emptyset$ , then
Indeed, the trees T, $\textrm{S}_{i}^{jk} T$ , $\textrm{S}_{a}^{bc} T$ , $\textrm{S}_{i}^{jk} \textrm{S}_{a}^{bc} T$ have the same sets of leaves except possibly vertices $\{j,k,b,c\}$ . However, any vertex from $\{j,k,b,c\}$ contributes to the same number of negative and positive terms in the left-hand side of the above. This implies that L is $\rho$ -superposable with $\rho=1$ .
It is well known that $\textrm{Var} [L({\textbf{T}})] =(1+o(1))n/e$ ; see, for example, [Reference Moon21, Theorem 7.7]. Then, all the assumptions of Theorem 1.1 are satisfied with $ \alpha = \rho = 1$ and $\varepsilon = 1/4$ . This completes the proof.
Remark 1.3. The rates of convergence $\delta_K[F({\textbf{T}})]= O(n^{-1/4+\epsilon})$ are typical in applications of Theorem 1.1 because, for many examples, $\textrm{Var} [F({\textbf{T}})]$ is linear and $\alpha$ , $\rho$ are bounded by some power of ${\textrm{log}}\, n$ . Wagner [Reference Wagner29] pointed out that Hwang’s quasi-power theorem [Reference Hwang13] leads to a better estimate $\delta_K[L({\textbf{T}})]= O(n^{-1/2+\epsilon})$ for the number of leaves. This matches the rates of convergence in the classical Berry–Esseen result (for a sum of i.i.d. variables) and, thus, is likely optimal. It remains an open question whether the bound $\delta_K[F({\textbf{T}})]= O(n^{-1/2+\epsilon})$ always hold for an arbitrary $\alpha$ -Lipschitz and $\rho$ -superposable tree parameter F (assuming the variance is linear and $\alpha$ and $\rho$ are not too large).
The asymptotic normality of the number of vertices in ${\textbf{T}}$ with a given degree is proved identically to Corollary 1.2. However, for many other applications, a tree parameter F might behave badly on a small set of trees. Then, Theorem 1.1 does not work directly since $\alpha$ and $\rho$ are too large. For example, a single perturbation $\textrm{S}_{i}^{jk}$ can destroy a lot of paths on three vertices in a tree with large degrees. To overcome this difficulty, one can apply Theorem 1.1 to a parameter $\tilde{F}$ , which is related to F, but ignores the vertices with degrees larger ${\textrm{log}}\, n$ . This trick does not change the limiting distribution because the trees with large degrees are rare: Moon [Reference Moon21, formula (7.3)] showed that, for any $d \in [n]$ ,
Similarly, one can restrict attention to the trees for which the neighbourhoods of vertices do not grow very fast. Let
In this paper, we prove the following result, which might be of independent interest.
Theorem 1.4. $ \displaystyle \mathbb{P} \left( \beta({\textbf{T}}) \geqslant {\rm{log}}^4 n \right) \leqslant e^{-\omega({\rm{log}}\, n)}. $
Remark 1.5. The distribution of the height profiles in branching processes is a well-studied topic. In particular, the number of vertices in ${\textbf{T}}$ at distance at most d from a given vertex was already considered by Kolchin [Reference Kolchin17]. However, we could not find a suitable large deviation bound for $\beta({\textbf{T}})$ in the literature. In fact, the constant 4 in the exponent of the logarithm in the bound above is not optimal, but sufficient for our purposes.
The structure of the paper is as follows. In Section 2, we analyse the number of occurrences of an arbitrary tree pattern. For various interpretations of the notion “occurrence,” the asymptotic normality in this problem was established by Chysak, Drmota, Klausner, Kok [Reference Chysak, Drmota, Klausner and Kok5] and Janson [Reference Janson16]. Applying Theorem 1.1, we not only confirm these results but also allow much more general types of occurrences. In particular, we prove the asymptotical normality for the number of induced subgraphs isomorphic to a given tree of fixed size and for the number of paths of length up to $n^{1/8 -\varepsilon}$ . Both of these applications go beyond the setup of [Reference Chysak, Drmota, Klausner and Kok5, Reference Janson16]. In Section 3, we derive the distribution of the number of automorphisms of ${\textbf{T}}$ and confirm the conjecture by Yu [Reference Yu30]. To our knowledge, this application of Theorem 1.1 is also not covered by any of the previous results.
We prove Theorem 1.1 in Section 5, using a martingale construction based on the Aldous–Broder algorithm [Reference Aldous1] for generating random labelled spanning trees of a given graph. Section 4 contains the necessary background on the theory of martingales. We also use martingales to prove Theorem 1.4 in Section 6. This proof is independent of Section 5 and, in fact, Theorem 1.4 is one of the ingredients that we need for our main result, Theorem 1.1. We also use Theorem 1.4 in the application to long induced paths to bound the number of the paths affected by one perturbation; see Theorem 2.9.
Tedious technical calculations of the variance for the pattern and automorphism counts are given in Appendices A and B.
2. Pattern counts
In this section, we apply Theorem 1.1 to analyse the distribution of the number of occurrences of a tree pattern H as an induced subtree in uniform random labelled tree ${\textbf{T}}$ . To our knowledge, the strongest results for this problem were obtained by Chysak et al. [Reference Chysak, Drmota, Klausner and Kok5] and Janson [Reference Janson16].
Chysak et al. [Reference Chysak, Drmota, Klausner and Kok5] consider occurrences of a pattern H as an induced subgraph of a tree T with the additional restriction that the internal vertices in the pattern match the degrees the corresponding vertices in T. That is, the other edges of T can only be adjacent to leaves of H. For example, the tree T on Figure 2 contains only three paths on three vertices in this sense, namely $T[\{1,5,8\}]$ , $T[\{1,3,6\}]$ , and $T[\{3,6,13\}]$ . In particular, the induced path on vertices 1, 2, 7 is not counted since the internal vertex 2 is adjacent to 4. The result by Chysak, Drmota, Klausner, Kok is given below.
Theorem 2.1 ([Reference Chysak, Drmota, Klausner and Kok5, Theorem 1]). Let H to be a given finite tree. Then the limiting distribution of the number of occurrences of H (in the sense described above) in ${\textbf{T}}$ is asymptotically normal with mean and variance asymptotically equivalent to $\mu n$ and $\sigma^2 n$ , where $\mu>0$ and $\sigma^2\geqslant 0$ depend on the pattern H and can be computed explicitly and algorithmically and can be represented as polynomials (with rational coefficients) in $1/e$ .
Janson [Reference Janson16] considers the subtree counts $\eta_H(T)$ defined differently. Set vertex 1 to be a root of $T\in \mathcal{T}_n$ . For any other vertex v, let $T_v$ be the subtree consisting of v an all its descendants. Such subtrees are called fringe subtrees. The parameter $\eta_H(T)$ equals the number of fringe subtrees isomorphic to H (with a root). For example, the tree T on Figure 2 contains only one path with three vertices (rooted at end vertex), namely $T[\{3,6,13\}]$ . In particular, the induced paths $T[\{1,5,8\}]$ and $T[\{1,3,6\}]$ are not counted since they are not fringe subtrees. Janson [Reference Janson16] proved the following result about joint asymptotic normality for several such subtree counts.
Theorem 2.2 ([Reference Janson16, Corollary 1.8]) Let ${\textbf{T}}^{\rm{GW}}_n$ be a conditioned Galton–Watson tree of order n with offspring distribution $\xi$ , where ${\mathbb{E}}[\xi] = 1$ and $0 < \sigma^2 \,:\!=\, \rm{Var} [\xi] <\infty$ . Then, the subtree counts $\eta_H({\textbf{T}}^{\rm{GW}}_n)$ (for all H from a given set of patterns) are asymptotically jointly normal.
Janson [Reference Janson16, Corollary 1.8] also gives expressions for the covariances of the limiting distribution in terms of the distribution of the corresponding unconditioned Galton–Watson tree. To relate this model to uniform random labelled tree ${\textbf{T}}$ , one need to take the conditioned Galton–Watson tree of order n with the Poisson offspring distribution.
We consider a more general type of tree counts which encapsulates both counts from above. In fact, it was suggested by Chysak et al. [Reference Chysak, Drmota, Klausner and Kok5]: “…we could also consider pattern-matching problems for patterns in which some degrees of certain possibly external “filled” nodes must match exactly while the degrees of the other, possibly internal “empty” nodes might be different. But then the situation is more involved.” Then, in [Reference Chysak, Drmota, Klausner and Kok5, Section 5.3] they explain that having an internal “empty” node leads to serious complications in their approach.
We define our tree parameter formally. Let H be a tree with $\ell$ vertices $v_1,\ldots,v_{\ell}$ . Let $\boldsymbol{\theta} = (\theta_1,\ldots,\theta_\ell) \in \{0,1\}^{\ell}$ . We say the pattern $(H,\boldsymbol{\theta})$ occurs in a tree $T \in \mathcal{T}_n$ if there exists a pair of sets (U, W) such that $W\subset U \subset [n]$ and
-
• the induced subgraph T[U] is isomorphic to H,
-
• the set W corresponds to all vertices $v_i$ with $\theta_i=1$ (“empty” nodes),
-
• there is no edge in T between $U-W$ and $[n] - U$ .
Denote by $N_{H,\boldsymbol{\theta}}(T)$ the number of occurrences of the pattern $(H,\boldsymbol{\theta})$ in T that is the number of different pairs (U, W) satisfying the above. It equals the number of ways to choose suitable identities for $v_1,\ldots, v_\ell$ in [n] divided by $|\rm{A}\small{UT}\left({H,\boldsymbol{\theta}}\right)|$ , the number of automorphisms of H that preserve $\boldsymbol{\theta}$ . In particular, if $\theta_i = 1$ for all $i\in [\ell]$ then $N_{H,\boldsymbol{\theta}}(T)$ is the number of induced subgraphs in T isomorphic H. If $\theta_i = 1$ whenever i is a leaf of H, then $N_{H,\boldsymbol{\theta}}(T)$ is the tree count considered in Theorem 2.1. If $\theta_i=1$ for exactly one vertex $i\in [\ell]$ which is a leaf in H, then $N_{H,\boldsymbol{\theta}}(T)$ counts fringe subtrees.
In Section 2.2, we prove that $ N_{H,\boldsymbol{\theta}}({\textbf{T}})$ is asymptotically normal for any fixed H and $\boldsymbol{\theta}\in \{0,1\}^{\ell}$ with at least one non-zero component (where $\ell$ is the number of vertices in H). Note that if $\theta_i=0$ for all $i \in [\ell]$ and $n>\ell$ , then $N_{H,\boldsymbol{\theta}}({\textbf{T}}) = 0$ since at least one vertex corresponding to H must be adjacent to other vertices in T. Our approach also works for growing patterns. We demonstrate it for the case when H is a path.
2.1. Moments calculation
To apply Theorem 1.1, we need a lower bound for $\textrm{Var}(N_{H,\boldsymbol{\theta}}({\textbf{T}}))$ . One can compute the moments of $N_{H,\boldsymbol{\theta}}({\textbf{T}})$ using the following formula for the number of trees containing a given spanning forest. Lemma 2.3 is a straightforward generalisation of [Reference Moon21, Theorem 6.1] with almost identical proof, which we include here for the sake of completeness.
Lemma 2.3. Let $S = H_1 \sqcup\ldots\sqcup H_k$ be a forest on [n] and $B_i$ be non-empty subsets (not necessarily proper) of $V(H_i)$ for all $i \in [k]$ . Then, the number of trees T on [n] containing all edges of H such that $\rm{deg}_T(v) = \rm{deg}_S(v)$ for every v outside $B_1 \cup \ldots \cup B_{k}$ equals $b_1\cdots b_k (b_1+\cdots + b_k)^{k-2}$ , where $b_i$ is the number of vertices in $B_i$ .
Proof. Any desired tree T corresponds to a tree $T_H$ on k vertices labelled by $H_1,\ldots, H_k$ for which the vertices $H_i$ and $H_j$ are adjacent if and only if there is an edge between $H_i$ and $H_j$ in T. If $d_1,\ldots, d_k$ are degrees of $T_H$ , then the number of trees T corresponding to $T_H$ equals $b_1^{d_1} \ldots b_{k}^{d_{k}}$ since we can only use vertices from $B_1 \cup \ldots \cup B_{k}$ for edges of T. From [Reference Moon21, Theorem 3.1], we know that the number of trees on k vertices with degrees $d_1,\ldots, d_k$ is $\binom{k-2}{d_1-1, \ldots ,d_k-1}.$ Thus, the total number of such trees T is
where the sum is over all positive integers sequences that $d_1+\cdots+ d_k = 2k-2$ .
For an $\ell$ -tuple ${\textbf{u}}= (u_1,\ldots,u_\ell) \in [n]^{\ell}$ with distinct coordinates, let $\mathbb{1}_{{\textbf{u}}}({\textbf{T}})$ be the indicator of the event that a pattern $(H,\boldsymbol{\theta})$ occurs in ${\textbf{T}}$ with $u_1, \ldots, u_\ell$ corresponding to the vertices of H. Let $s\,:\! =\, \sum_{i=1}^{\ell} \theta_i$ . Applying Lemma 2.3 to a forest consisting of one nontrivial component isomorphic to H and dividing by $|\mathcal{T}_n| = n^{n-2}$ , we find that
Summing over all choices for ${\textbf{u}}$ and dividing by $|\rm{A}\small{UT}\left({H,\boldsymbol{\theta}}\right)|$ to adjust overcounting, we get
In particular, this formula agrees with Theorem 2.1 that $\mu$ is a polynomial with rational coefficients in $1/e$ . Similarly, for the variance, we have
where the sum over all $\ell$ -tuples ${\textbf{u}},{\textbf{u}}' \in [n]^{\ell}$ with distinct coordinates. Then, we can also use Lemma 2.3 (with one or two nontrivial components) to compute $ \textrm{Cov} ( \mathbb{1}_{{\textbf{u}}}({\textbf{T}}), \mathbb{1}_{{\textbf{u}}'}({\textbf{T}}))$ . However, this computation is much more involved: one needs to consider all possible ways the pattern $(H,\boldsymbol{\theta})$ intersects with itself. Nevertheless, for a fixed pattern, it is not difficult to see that ${\mathbb{E}} \left[\mathbb{1}_{{\textbf{u}}}({\textbf{T}})\right]$ and ${\mathbb{E}} \left[\mathbb{1}_{{\textbf{u}}}({\textbf{T}}) \mathbb{1}_{{\textbf{u}}'}({\textbf{T}})\right]$ are polynomials with integer coefficients in $1/e$ divided by some power of n. This observation is already sufficient to establish the bound $\rm{Var}\left[{N_{H,\boldsymbol{\theta}}({\textbf{T}})}\right]= \Omega(n)$ for the case when $\sum_{i=1}^{\ell} \theta_i <\ell$ .
Lemma 2.4. Let $(H,\boldsymbol{\theta})$ be a fixed pattern, $\ell$ be the number of vertices in tree H, and $s\,:\!=\,\sum_{i=1}^{\ell} \theta_i$ . Then, there exist a polynomial $p_{H,\boldsymbol{\theta}}$ of degree at most $2\ell-2s$ with integer coefficients that
Moreover, if $s <\ell$ then $p_{H,\boldsymbol{\theta}} (1/e) > 0$ .
Proof. Consider any $\ell$ -tuples ${\textbf{u}},{\textbf{u}}' \in [n]^{\ell}$ with distinct coordinates. If the coordinates of ${\textbf{u}}$ and ${\textbf{u}}'$ form disjoint sets, then applying Lemma 2.3 to a forest consisting of two nontrivial component isomorphic to H, we find that
Using (3), we get that
Then, the contribution of such ${\textbf{u}},{\textbf{u}}'$ to the sum $\sum_{{\textbf{u}},{\textbf{u}}'} \textrm{Cov} (\mathbb{1}_{{\textbf{u}}}({\textbf{T}}), \mathbb{1}_{{\textbf{u}}'}({\textbf{T}}) )$ in (4) equals
Next, we proceed to the case when the sets formed by the coordinates of ${\textbf{u}}$ and ${\textbf{u}}'$ intersect. Let a be the size of the union of these two sets and
Note that $\ell - s \leqslant a-b \leqslant 2\ell -2 s$ . Then, using Lemma 2.3 (and also (3)), we find that
We say a pair $({\textbf{u}},{\textbf{u}}')$ is equivalent to $({\textbf{w}},{\textbf{w}}')$ if there is a permutation $\sigma$ of the set [n] that $w_i = \sigma(u_i)$ and $w^{\prime}_i = \sigma(u^{\prime}_i)$ for all $i \in [\ell]$ . Note that the number of pairs equivalent to $({\textbf{u}},{\textbf{u}}')$ is exactly $(n)_{a}$ . Then, the contribution of the equivalence class to the sum $\sum_{{\textbf{u}},{\textbf{u}}'} \textrm{Cov} (\mathbb{1}_{{\textbf{u}}}({\textbf{T}}), \mathbb{1}_{{\textbf{u}}'}({\textbf{T}}) )$ in (4) is $n be^{-a+b} +O(1)$ or $nbe^{-a+b} - s^2 e^{-2\ell +2s}+ O(1)$ . Summing over all equivalence classes, we complete the proof of the first part.
For the second part, observe in the above that $a-b = \ell -s$ if and only if the sets of coordinates of ${\textbf{u}}$ and ${\textbf{u}}'$ coincide and $\{u_i \mathrel{:} \theta_i=1\}=\{u^{\prime}_i \mathrel{:} \theta_i=1\}$ . In particular, we have $a<2\ell -1$ so $ \textrm{Cov}(\mathbb{1}_{{\textbf{u}}}({\textbf{T}}),\mathbb{1}_{{\textbf{u}}'}({\textbf{T}})) >0$ . Then, the coefficient corresponding to $x^{-\ell + s}$ in $p_{H,\theta}(x)$ is strictly positive so the polynomial $p_{H,\theta}(x)$ is not trivial. Since the number $1/e$ is transcendental, we conclude that $p_{H,\theta}(1/e)$ is not zero. Also, $p_{H,\theta}(1/e)$ cannot be negative since $\rm{Var}\left[{N_{H,\boldsymbol{\theta}}({\textbf{T}})}\right] \geqslant 0$ so it can only be positive. This completes the proof.
For a tree $T\in \mathcal{T}_n$ , let $N_{H}(T) \,:\!=\, N_{H, \boldsymbol{\theta}}(T)$ if $\theta_i=1$ for all $i \in [\ell]$ that is $N_{H}(T)$ is the number of induced subgraphs of T isomorphic to H. Unfortunately, the lemma above cannot guarantee that $\rm{Var}\left[{N_{H}({\textbf{T}})}\right] = \Omega(n)$ . In this case, the polynomial $p_{H,\theta}$ is a non-negative constant, but an additional argument is required to show that it is not zero.
Lemma 2.5. For any fixed tree H with degrees $h_1,\ldots,h_\ell$ , we have
where $c_j = \sum_{i=1}^\ell \left( \binom{h_i}{j} + (\ell-1) \binom{h_i-1}{j} \right)$ . In particular, $c_2>0$ if $\ell \geqslant 3$ .
The proof of Lemma 2.5 is given in Appendix A of the ArXiv version [Reference Isaev, Southwell and Zhukovskii15] of the current paper. The key idea of this proof is to estimate the variance of the conditional expection value of $N_{H}({\textbf{T}})$ given the degree sequence of ${\textbf{T}}$ .
Remark 2.6. There is a different way to show $\rm{Var}\left[{N_{H,\boldsymbol{\theta}}({\textbf{T}})}\right] =\Omega(n)$ for any fixed H and $\boldsymbol{\theta}$ (including the case $\theta_i=1$ for all $i \in [\ell]$ ). First, one establishes that $\mathbb{P}(N_{H,\boldsymbol{\theta}}({\textbf{T}}) = x_n) = o(1)$ for any sequence $x_n$ . Reducing/incrementing the number of fringe copies of H in a clever way shows that $\mathbb{P}(N_{H,\boldsymbol{\theta}}({\textbf{T}}) = x_n)$ is not much larger than $\mathbb{P}(N_{H,\boldsymbol{\theta}}({\textbf{T}}) = x_n -k) + \mathbb{P}(N_{H,\boldsymbol{\theta}}({\textbf{T}}) = x_n +k)$ for all k from a sufficiently large set. This implies that $\rm{Var}\left[{N_{H,\boldsymbol{\theta}}({\textbf{T}})}\right] \rightarrow \infty$ . Therefore, $p_{H,\theta}>0$ so $\rm{Var}\left[{N_{H,\boldsymbol{\theta}}({\textbf{T}})}\right] = \Omega(n)$ . In fact, the proof of Lemma 2.5 given in [Reference Isaev, Southwell and Zhukovskii15, Appendix A] is more technically involved than this idea, but it extends better to growing substructures.
Using formula (4), we also obtain a precise estimate of $\rm{Var}\left[{N_{H}({\textbf{T}})}\right]$ for the case when H is a path. With slight abuse of notations, let $P_\ell(T)\,:\!=\, N_{P_\ell}(T)$ that is the number of paths on $\ell$ vertices in a tree $T\in \mathcal{T}_n$ .
Lemma 2.7. Let $\ell>2$ and $\ell = O(n^{1/2})$ , then
Proof. For the induced path counts formula (4) simplifies as follows:
For $i \in [\ell]$ , let $\Sigma_i$ be the set of pairs $({\textbf{u}},{\textbf{u}}')$ that the sets formed by its coordinates have exactly i elements in common. From (3), we have that $\mathbb{E}\left[{ \mathbb{1}_{{\textbf{u}}}({\textbf{T}})}\right] = \ell n^{1-\ell}$ . Using Lemma 2.3, we get $\mathbb{E}\left[{\mathbb{1}_{{\textbf{u}}}({\textbf{T}})\mathbb{1}_{{\textbf{u}}'}({\textbf{T}})} \right]= \ell^2 n^{2-2\ell}$ for $({\textbf{u}},{\textbf{u}}') \in \Sigma_0$ , so
Applying Lemma 2.3, it is a routine to check that
Similarly, for $2 \leqslant i\leqslant \ell$ , we get
Summing the above bounds for $\Sigma_0,\ldots, \Sigma_{\ell}$ and using
we get the stated formula for $\textrm{Var} P_{\ell}({\textbf{T}}) $ .
2.2. Asymptotic normality of pattern counts
Here we apply Theorem 1.1 to derive the limiting distribution of the pattern counts $N_{H,\boldsymbol{\theta}}({\textbf{T}})$ . In fact, all applications of Theorem 1.1 typically have short proofs leaving the lower bound for the variance to be the most technically involved part.
Theorem 2.8. Let H be a tree on $\ell$ vertices and $\boldsymbol{\theta}\in\{0,1\}^{\ell}$ be a non-zero vector. Then $N_{H,\boldsymbol{\theta}}({\textbf{T}})$ is asymptotically normal and $\delta_{\rm{K}}\left[N_{H,\boldsymbol{\theta}}({\textbf{T}})\right]=O(n^{-1/4 + \varepsilon})$ for any $\varepsilon>0$ .
Proof. For a tree $T\in\mathcal{T}_n$ , let F(T) be the number of occurrences of $(H,\boldsymbol{\theta})$ in the induced subforest of T for the set of vertices with degrees at most ${\textrm{log}}\, n$ in T.
Removing one edge from T can only destroy at most $ {\textrm{log}}^{\ell} n$ patterns $(H,\boldsymbol{\theta})$ counted in F(T). Thus, F is $\alpha$ -Lipschitz with $\alpha = 2\, {\textrm{log}}^{\ell} n$ . If two perturbations $\textrm{S}_{i}^{jk}$ and $\textrm{S}_{a}^{bc}$ are at distance at least $3\ell$ in T, then every pattern $(H,\boldsymbol{\theta})$ counted in $F(\textrm{S}_{i}^{jk} \textrm{S}_{a}^{bc} T) - F(T)$ (with positive or negative sign) is present in exactly one of the terms $F(\textrm{S}_{i}^{jk}T) - F(T)$ and $F(\textrm{S}_{a}^{bc} T) - F(T)$ (with the same sign). Thus, F is $\rho$ -superposable with $\rho= 3 \ell$ .
From (1), we know that
Since the values of these random variables are not bigger than $n^{\ell}$ , we get
Combining Lemmas 2.4 and 2.5, we get that $\rm{Var}\left[{F({\textbf{T}})}\right] = \Omega(n)$ . Applying Theorem 1.1, we complete the proof.
In the next result, we allow the pattern to grow, but restricted to the case when H is a path and all $\theta_i$ equal 1 (all vertices are “empty”).
Theorem 2.9. Let $\ell = O(n^{1/8-\delta})$ for some fixed $\delta \in (0,1/8)$ . Then $P_\ell({\textbf{T}})$ is asymptotically normal and $\delta_{\rm{K}}\left[P_{\ell}({\textbf{T}})\right]=O(n^{-\varepsilon'})$ for any $\varepsilon' \in (0, 2\delta)$ .
Proof. For a tree $T\in \mathcal{T}_n$ , let
Define F(T) to be the number of induced paths on $\ell$ vertices in the forest $T[V_{\rm{good}}(T)]$ .
The number of $\ell$ -paths counted in F(T) containing any fixed edge is at most
Arguing similarly to the proof of Theorem 2.8, we conclude that F is $\alpha$ -Lipschitz with $\alpha= \ell^3\ {\textrm{log}}^8 n$ and $\rho$ -superposable with $\rho=3\ell$ . From Theorem 1.4, we also get
Next, for a tree $T\in \mathcal{T}_n$ , observe that $F(T)\leqslant P_{\ell}(T) \leqslant n^2$ since any path in T is uniquely determined by the choice of its end vertices. The rest of the argument is identical to the proof of Theorem 2.8.
3. Number of automorphisms
An automorphism of a graph G is a bijection $\sigma\,:\, V(G) \to V(G)$ such that the edge set of G is preserved under $\sigma$ . Bona and Flajolet [Reference Bóna and Flajolet3] studied this parameter for random unlabelled rooted non-plane trees and random phylogenetic trees (rooted non-plane binary trees with labelled leaves). They showed that in both cases the distribution is asymptotically lognormal; that is, the logarithm of the number of automorphisms in a random tree is asymptotically normal. McKeon [Reference McKeon20] proved asymptotic formulas for the number automorphisms in related random models of unlabelled locally restricted trees.
In her PhD thesis, Yu [Reference Yu30] determined the asymptotics of $\mathbb{E}\left[{{\textrm{log}} \left|\rm{A}\small{UT}\left({{\textbf{T}}}\right)\right|}\right]$ for uniform random labelled tree ${\textbf{T}}$ . She also made the following conjecture:
Conjecture 3.1. [Reference Yu30] The distribution of $\left|\rm{A}\small{UT}\left({{\textbf{T}}}\right)\right|$ is asymptotically lognormal.
In this section, we prove this conjecture. Unfortunately, we cannot immediately apply Theorem 1.1 to derive the distribution of the number of automorphisms since the logarithm of this parameter is not $\rho$ -superposable for a sufficiently small $\rho$ . This happens because some trees have automorphisms affected by both perturbations $\textrm{S}_{i}^{jk}$ and $\textrm{S}_{a}^{bc}$ even if $d_T(\{j,k\}, \{b,c\})$ is large. Instead, we start by looking at $\rm{A}\small{UT}_r\left({T}\right)$ , the subgroup of $\rm{A}\small{UT}\left({T}\right)$ consisting of automorphisms $\sigma \in \rm{A}\small{UT}\left({T}\right)$ such that $\sigma(r) = r$ , where r is some fixed vertex from [n]. In other words, $\rm{A}\small{UT}_r\left({T}\right)$ is the number of rooted automorphisms of a tree T with root r, or equivalently the stabilizer of r.
The parameter $ \rm{A}\small{UT}_r\left({{\textbf{T}}}\right)$ is easier to work with while also remaining asymptotically very similar to $\rm{A}\small{UT}\left({{\textbf{T}}}\right)$ . The ease of analysis comes from the product representation of $\left|\rm{A}\small{UT}_r\left({{T}}\right)\right|$ given by Yu [Reference Yu30, Corollary 2.1.3].
The product over B represents a product over isomorphism classes of rooted unlabelled trees. Define a branch of T at v to be a subtree rooted at an immediate descendent (with respect to r) of v. That is the branch is a fringe subtree of T at this descendent. The term $N_i(B,T,r)$ denotes the number of branches isomorphic to B at vertex i. Factorisation (5) also follows from the result of Stacey and Holton that says every rooted automorphism is a product of branch transpositions [Reference Stacey and Holton24, Lemma 2.4].
We give an example of (5) in Figure 3 for a tree on 9 vertices. There are only three types of branches in this tree with respect to the root $r=1$ , namely $B_1$ , $B_2$ , and $B_3$ . Vertex 1 has two branches isomorphic to $B_2$ , and thus $N_1(B_2,T,r)! = 2! = 2$ . It also has one branch isomorphic to $B_1$ , and thus $N_1(B_1,T,r)! = 1$ . Vertex 2 has three branches isomorphic to $B_3$ , and thus $N_2(B_3, T, r)! = 3! = 6$ . Vertices 3 and 4 each have one branch isomorphic to $B_3$ , and thus $N_3(B_3,T, r)! = N_4(B_3, T,r)! = 1$ . Applying (5) shows that $|\rm{A}\small{UT}_r(T)| = 3! \cdot 2! = 12$ .
To define our tree parameter F(T), we look at a subgroup of $\rm{A}\small{UT}_r\left({T}\right)$ based on small automorphisms. We define a small branch to be a branch with at most $4\,{\textrm{log}}\, n$ vertices, any branch that is not small is large. A small automorphism is an automorphism where any vertex that is the root of a large branch is fixed. For a given tree T, let $\rm{A}\small{UT}_{\rm{small}} \subseteq \rm{A}\small{UT}_r\left({T}\right)$ be the set of small automorphisms.
Lemma 3.2. $\rm{A}\small{UT}_{\rm{small}}$ is a subgroup of $\rm{A}\small{UT}_r\left({T}\right)$ .
Proof. Observe that any automorphism in $\rm{A}\small{UT}_{\rm{small}}$ must also have an inverse in $\rm{A}\small{UT}_{\rm{small}}$ , since they move the same vertices. Furthermore, to prove closure under composition, suppose that $a,b \in \rm{A}\small{UT}_{\rm{small}}$ but $ab \notin \rm{A}\small{UT}_{\rm{small}}$ . Let B be a large branch that is mapped by ab onto $B^\prime$ . Then all of the vertices in B are moved by either a or b. Since $a \in \rm{A}\small{UT}_{\rm{small}}$ , there are some vertices in B not moved by a; denote this set by X. Since B is connected, there exists an edge between X and $V(B)\backslash X$ in the edge set of B. Thus, there exists an edge between aX and aV(B) in T; however, this creates a cycle and thus a contradiction. Thus, ab must also only move small branches, and thus $ab \in \rm{A}\small{UT}_{\rm{small}}$ . Thus, $\rm{A}\small{UT}_{\rm{small}}$ is a subgroup.
The parameter F(T) is obtained by writing $|\rm{A}\small{UT}_{\rm{small}}|$ in the same product representation as $\left|\rm{A}\small{UT}_r\left({{T}}\right)\right|$ and taking the logarithm:
Here $\mathcal{B}_{\rm{small}}$ is the set of small branches.
Remark 3.3. In fact, the parameter F defined above belongs to a larger class of additive functionals considered by Janson [Reference Janson16] and Wagner [Reference Wagner27]. They established a general CLT for this type of parameters. [Reference Janson16, Theorem 1.3] and [Reference Wagner27, Theorem 2] do not cover the number of automorphisms in ${\textbf{T}}$ because $ {\mathbb{E}} \left[ \left(\sum_{B} {\textrm{log}} (N_i(B, {\textbf{T}}, r)!)\right)^{2} \right] $ is not vanishing. In fact, it is bounded below by the second moment of the number of leaves attached to a given vertex which tends to a positive constant; see also the estimates given in [Reference Isaev, Southwell and Zhukovskii15, Appendix B].
Next, we show that F(T) satisfies assumptions of Theorem 1.1 while also being very close to ${\textrm{log}} \left|\rm{A}\small{UT}_r\left({{T}}\right)\right|$ .
Lemma 3.4. Let $\alpha = 3{\rm{log}}\, n$ and $\rho = 10\ {\rm{log}}\, n$ . Then $F({\textbf{T}})$ as defined in (6) is $\alpha$ -Lipschitz and $\rho$ -superposable.
Proof. To prove the Lipschitz property, we show that for any two trees T and $T^\prime$ differing by a perturbation $\textrm{S}_i^{jk}$ , the order of $\rm{A}\small{UT}_{\rm{small}}$ for each tree can differ by at most a factor of $n^3$ . Any automorphism of T fixing $\left\{i,j,k \right\}$ is an automorphism of $T^\prime$ , since all other edges remained static so their orbits are unaffected. Let $G_{ijk}$ be the subgroup of $\rm{A}\small{UT}_{\rm{small}}$ that fixes $\left\{ i,j,k \right\}$ . Then the cosets of this subgroup are defined by where they send each of these vertices. Since there are at most n such options for each element in the set, we get at most $n^3$ cosets. By Lagrange’s theorem, we get that
and vice versa by swapping the roles of T and $T^\prime$ . Taking the logarithm of both sides gives the desired bound.
Next, we show that F is $\rho$ -superposable. Suppose $d = d_T \left( \left\{ j,k \right\}, \left\{ b,c\right\} \right) > 10\,{\textrm{log}}\, n$ . Then suppose an automorphism $\sigma \in \rm{A}\small{UT}_{\rm small}(T)$ is created or destroyed by $\textrm{S}^{jk}_i$ . Then $\sigma$ must not fix $\left\{ i, j, k \right\}$ . Any path between one of $\left\{j,k\right\}$ and one of $\left\{b,c\right\}$ must be longer than $10\,{\textrm{log}}\, n$ . Therefore, any parent vertex in the tree is strictly more than $5\,{\textrm{log}}\, n$ distance from at least one vertex in each pair. So $\sigma$ must fix $\left\{a,b,c\right\}$ and all lower branches, since each branch moved by the automorphism is at most $4\,{\textrm{log}}\, n$ . So $\textrm{S}_a^{bc}$ cannot affect the presence or absence of $\sigma$ in $\rm{A}\small{UT}\left({T}\right)$ . Similarly, any automorphism created or destroyed by $\textrm{S}_a^{bc}$ cannot be affected by $\textrm{S}_{i}^{jk}$ . Thus,
This completes the proof.
In the next lemma, we derive bounds needed to compare $\rm{A}\small{UT}({\textbf{T}})$ and $F({\textbf{T}})$ .
Lemma 3.5. The following statements hold.
-
a. $\Big| {\rm{log}} \left|\rm{A}\small{UT}_r\left({{T}}\right)\right| - {\rm{log}} \left|\rm{A}\small{UT}\left({T}\right)\right|\Big| \leqslant {\rm{log}}\, n$ for all $T\in \mathcal{T}_n$ ,
-
b. $\mathbb{P}\left({F({\textbf{T}}) \neq {\rm{log}}\left|\rm{A}\small{UT}_r\left({{{\textbf{T}}}}\right)\right|}\right) = O\left( \frac{1}{n^3} \right)$ ,
-
c. $\mathbb{E}\left[{{\rm{log}} \left|\rm{A}\small{UT}\left({{\textbf{T}}}\right)\right|}\right] - \mathbb{E}\left[{F({\textbf{T}})}\right] = O({\rm{log}}\, n)$ ,
-
d. $\rm{Var}\left[{ {\rm{log}} \left|\rm{A}\small{UT}\left({{\textbf{T}}}\right)\right|}\right] - \rm{Var}\left[{F({\textbf{T}})}\right] = O\left(\sqrt{n\ {\rm{log}}\, n}\right)$ .
Proof. Each automorphism in $\rm{A}\small{UT}_r\left({T}\right)$ is an automorphism in $\rm{A}\small{UT}\left({T}\right)$ . The group $\rm{A}\small{UT}\left({T}\right)$ operates on [n] such that $\rm{A}\small{UT}_r\left({T}\right)$ is the stabilizer of r. Hence,
Thus, we get (a). Parts (b) follows almost immediately from results by Yu [Reference Yu30, Corollary 2.2.2]. To show part (c), we use parts (a) and (b) and observe $F(T) \leqslant \left|\rm{A}\small{UT}_r\left({{T}}\right)\right| \leqslant {\textrm{log}}\, n! \leqslant n\, {\textrm{log}}\, n$ to get that
Finally, we proceed to part (d). Let $W = F({\textbf{T}}) - {\textrm{log}}\left|\rm{A}\small{UT}_r\left({{{\textbf{T}}}}\right)\right|$ and $Z = {\textrm{log}} \left|\rm{A}\small{UT}_r\left({{{\textbf{T}}}}\right)\right| - {\textrm{log}}\left|\rm{A}\small{UT}\left({{\textbf{T}}}\right)\right|$ . From Lemma 3.5(a,b,c), we get that
Then, we have
The final ingredient needed to apply Theorem 1.1 is a bound on the variance of $F({\textbf{T}})$ , given in the lemma below.
Lemma 3.6. For sufficiently large n, we have $\rm{Var}\left[{F({\textbf{T}})}\right] \geqslant 0.002\,n$ .
The proof of Lemma 3.6 is lengthy and quite technical. We include it in Appendix B of the ArXiv version [Reference Isaev, Southwell and Zhukovskii15] of the current paper.
Now, we are ready to prove the following result.
Theorem 3.7. Conjecture 3.1 is true. Furthermore, $\delta_K\left[ {\rm{log}} \left|\rm{A}\small{UT}\left({{\textbf{T}}}\right)\right| \right] = O\left(n^{-\frac{1}{4} \,+\, \epsilon}\right)$ and $\delta_K\left[ {\rm{log}} \left|\rm{A}\small{UT}_r\left({{{\textbf{T}}}}\right)\right| \right]= O\left(n^{-\frac{1}{4} \,+\, \epsilon}\right)$ for any $\epsilon > 0$ .
Proof of Theorem 3.7. Combining Lemmas 3.4, 3.5, and 3.6, we get that the parameter F defined in (6) satisfies all the assumptions of Theorem 1.1 and $\delta_k[F({\textbf{T}})] = O(n^{-1/4 + \epsilon})$ for any $\epsilon >0$ . Using Lemma 3.5 and recalling that $F(T) \leqslant {\textrm{log}}\left|\rm{A}\small{UT}_r\left({{{\textbf{T}}}}\right)\right| \leqslant {\textrm{log}}\rm{A}\small{UT}\left({{\textbf{T}}}\right) $ , we get that ${\textrm{log}}\left|\rm{A}\small{UT}_r\left({{{\textbf{T}}}}\right)\right|$ and ${\textrm{log}}|\rm{A}\small{UT}\left({{\textbf{T}}}\right)\!|$ have the same limiting distribution as F(T) (with the same bound for the Kolmogorov distance).
Remark 3.8. Recently, Stufler and Wagner [Reference Wagner and Stufler28] have also announced progress in showing that the distributions of $\left|\rm{A}\small{UT}\left({T}\right)\right|$ and $\left|\rm{A}\small{UT}_r\left({{T}}\right)\right|$ are asymptotically lognormal; however, it has not yet appear in any published or arXiv paper. Their method is based on the analysis of the generating function and is different from our approach. Stufler and Wagner gave much more accurate values for the mean and variance in their talk [Reference Wagner and Stufler28], specifically $\mathbb{E}\left[{{\textrm{log}}\left|\rm{A}\small{UT}\left({T}\right)\right|}\right] \approx 0.052290n$ and $\rm{Var}\left[{{\textrm{log}}\left|\rm{A}\small{UT}\left({T}\right)\right|}\right] = 0.039498n$ .
4. Tools from the theory of martingales
Let $\mathcal{P}=(\varOmega,\mathcal{F}, \mathbb{P})$ be a probability space. A sequence $\mathcal{F}_0,\ldots,\mathcal{F}_n$ of sub- $\sigma$ -fields of $\mathcal{F}$ is a filtration if $\mathcal{F}_0\subseteq\cdots\subseteq\mathcal{F}_n$ . A sequence $Y_0,\ldots,Y_n$ of random variables on $\mathcal{P}$ is a martingale with respect to $\mathcal{F}_0,\ldots,\mathcal{F}_n$ if
-
i. $Y_i$ is $\mathcal{F}_i$ -measurable and $\left| Y_i \right|$ has finite expectation, for $0\le i\le n$ ;
-
ii. $\mathbb{E}\left[{ Y_i \mid \mathcal{F}_i}\right] = Y_{i-1}$ for $1\le i\le n$ .
In the following we will always assume that $\mathcal{F}_0 = \{\emptyset, \varOmega \}$ and so $Y_0 = {\mathbb{E}} [ Y_n]$ .
In this section, we state some general results on concentration and limiting distribution for martingales. In fact, we only need these results for discrete uniform probability spaces, where the concept of martingale reduces to average values over increasing set systems. In this case, $\varOmega$ is a finite set and each $\sigma$ -field $\mathcal{F}_{i}$ is generated by unions of blocks of a partion of $\varOmega$ . Following McDiarmid [Reference McDiarmid19], for $i=0,\ldots, n$ we define the conditional range of a random variable X on $\mathcal{P}$ as
Here, $\sup[X\mid \mathcal{F}_{i}] $ is the $\mathcal{F}_{i}$ -measurable random variable which takes the value at $\omega \in \varOmega$ equal to the maximum value of X over the block of $\mathcal{F}_{i}$ containing $\omega$ (and similarly for $-X$ ). More generally, “supremum” can be replaced by “essential supremum”. For more information about conditional range and diameter, see, for example, [Reference Isaev and McKay14, Section 2.1] and references therein. We will use that the conditional range is a seminorm and, in particular, it is subadditive.
Our first tool is the following result of McDiarmid [Reference McDiarmid19]. Further in this section, the notation $\textrm{ran}_i [\cdot]$ stands for $\textrm{ran} [\cdot \mid \mathcal{F}_{i}]$ .
Theorem 4.1 ([Reference McDiarmid19, Theorem 3.14]) Let $Y_0,Y_1,\ldots, Y_n$ be a real-valued martingale with respect to the filtration $\{\emptyset,\varOmega\}=\mathcal{F}_0,\mathcal{F}_1\ldots,\mathcal{F}_n$ . Denote
Then, for any $r,t > 0$
The normalized quadratic variation of a martingale sequence ${\textbf{Y}}= (Y_0, \ldots, Y_n)$ is defined by
Observe that
Thus,
A classical result by Brown [Reference Brown4] states that if the increments $Y_{i}-Y_{i-1}$ have finite variances, $ Q[{\textbf{Y}}] \xrightarrow{prob.} 1$ as $n \rightarrow \infty$ and a certain Lindeberg-type condition is satisfied then the limiting distribution of $Y_n$ is normal, i.e. $\delta_{\textrm{K}}[Y_n] \rightarrow 0$ . For a more restricted class of martingales with bounded differences, these conditions can be slightly simplified and will be sufficient for our purposes. Our second tool is the following result of Mourrat [Reference Mourrat23] which gives an explicit bound on the rate of convergence in the CLT under a strengthened condition that the normalized quadratic variation $Q[{\textbf{Y}}]$ converges to 1 in $L^p$ .
Theorem 4.2 ([Reference Mourrat23, Theorem 1.5.]) Let $p \in [1,+\infty)$ and $\gamma \in (0,+\infty)$ . There exists a constant $C_{p,\gamma}>0$ such that, for any real martingale sequence ${\textbf{Y}}= (Y_0, \ldots, Y_n)$ satisfying $|Y_i - Y_{i-1}| \leqslant \gamma$ for all $i=1,\ldots, n$ ,
One way to bound the term $\mathbb{E}\left[{ |Q[{\textbf{Y}}] -1|^p}\right]$ in the above is by applying Theorem 4.1 to the martingale for $Q[{\textbf{Y}}]$ with respect to the same filtration, as which gives the following lemma.
Lemma 4.3. Let $Y_0,\ldots, Y_n$ be a real-valued martingale with respect to the filtration $\{\emptyset,\varOmega\}=\mathcal{F}_0,\ldots,\mathcal{F}_n$ . For $\hat{q}>0$ , let $\mathcal{A}_{\hat{q}}$ denote the event
Then, for any $p \in [1,+\infty)$ , we have
where $ c_p = 2p \int_{0}^{+\infty} e^{-2x^2} x^{p-1} dx$ .
Proof. By definition, we have that $|Y_i - Y_{i\,-\,1}| \leqslant \textrm{ran}_{i\,-\,1} [Y_i]$ for all $i \in [n]$ . Therefore,
Observe also $\textrm{ran}_{i-1} \left[ (Y_j - Y_{j-1})^2\right] = 0$ for any $j<i$ . Then, using (8) and the subadditivity of the conditional range, we get that
Applying Theorem 4.1 to the martingale $\{\mathbb{E}\left[{Q[{\textbf{Y}}] \mid \mathcal{F}_i }\right]\}_{i=0,\ldots, n}$ , we find that
Substituting this bound into
and changing the variable $t = \hat{q} x$ , we complete the proof. Here, $t_{\max}= \sup |Q({\textbf{Y}}) -1|$ .
Using the formulas for $\mathbb{E}\left[{ (Y_j-Y_{j-1})^2 \mid \mathcal{F}_{i}}\right] $ similar to (8), we find that
Then, by the subadditivity of the conditional range, we get the next bound, which will be useful in applying Lemma 4.3.
The Doob martingale construction is another important tool in our argument. Suppose ${\textbf{X}} = (X_1,\ldots,X_n)$ is a random vector on $\mathcal{P}$ taking values in S and $f\,:\,S\rightarrow {\mathbb{R}}$ is such that $f({\textbf{X}})$ has bounded expectation. Consider the filtration $\mathcal{F}_0,\ldots \mathcal{F}_n$ defined by $\mathcal{F}_i = \sigma(X_1,\ldots,X_i)$ which is the $\sigma$ -field generated by random variables $X_1,\ldots X_i$ . Then, the Doob martingale ${\textbf{Y}}^{\rm Doob} = {\textbf{Y}}^{\rm Doob} (f,{\textbf{X}})$ is defined by, for all $i=0,\ldots,n$ ,
In case of finite S, the random variables $Y_i^{\rm Doob} $ , $\rm{Var}\left[{Y_n^{\rm Doob} \mid \mathcal{F}_j}\right] $ and $\textrm{ran}_{i} [Y_n^{\rm Doob}] $ can be seen as functions $f_i, v_i, r_i \,:\, S \rightarrow {\mathbb{R}}$ of the random vector ${\textbf{X}}$ defined as follows: for ${\textbf{x}} \in S$ ,
where $x_1,\ldots,x_i$ are fixed and $X_{i+1},\ldots, X_n$ are random and both $\max$ and $\min$ are over ${\textbf{y}} \in S$ such that $y_j=x_j$ for $j=1,\ldots,i$ . If, in addition, random variables $X_1, \ldots, X_n$ are independent then
where the maximum is over ${\textbf{x}}, {\textbf{x}}' \in S$ that differ only in the ith coordinate.
In particular, the Doob martingale process is applicable for functions of random permutations since we can represent them as vectors. Let $S_n$ be the set of permutations of [n]. We write $\omega =(\omega_1,\ldots,\omega_n) \in S_n$ if $\omega$ maps j to $\omega_j$ . The product of two permutuations $\omega,\sigma \in S_n$ is defined by
which corresponds to the composition of $\omega$ and $\sigma$ if we treat them as functions on [n]. For a function $f\,:\, S_n \rightarrow {\mathbb{R}}$ and $1\leqslant i\neq j \leqslant n-1$ , define
Let ${\textbf{X}} = (X_1,\ldots,X_n)$ be a uniform random element of $S_n$ and ${\textbf{Y}}^{\rm Doob}(f, {\textbf{X}})$ be the Doob martingale sequence for $f({\textbf{X}})$ . Note that $Y^{\rm Doob}_n = Y^{\rm Doob}_{n-1} = f({\textbf{X}})$ since the first $n-1$ coordinates $X_i$ determine the permutation ${\textbf{X}}$ uniquely.
Lemma 4.4. If $Y^{\rm Doob} = Y^{\rm Doob}(f,{\textbf{X}})$ where $f\,:\, S_n \rightarrow {\mathbb{R}}$ and ${\textbf{X}}$ is a uniform random element of $S_n$ , then
-
a. $\displaystyle |Y^{\rm Doob}_i - Y^{\rm Doob}_{i-1}|\leqslant \rm{ran}_{i-1} \left[ Y^{\rm Doob}_i \right] \leqslant \alpha_i[f],$ for all $1\leqslant i \leqslant n-1$ .
-
b. $ \displaystyle \rm{ran}_{i-1} \left[ \mathbb{E}\left[{(Y^{\rm Doob}_j-Y^{\rm Doob}_{j-1})^2 \mid \mathcal{F}_i}\right] \right]\leqslant 2 \alpha_j[f] \varDelta_{ij}[f]$ , for all $1\leqslant i< j \leqslant n-1$ .
Proof. To show the first inequality in part (a), we observe that
by definition. The other bounds is a special case of [Reference Greenhill, Isaev and McKay11, Lemma 2.1.] for real-valued random variables, where the conditional range is the same as the conditional diameter.
5. Martingales for tree parameters
To prove Theorem 1.1, we use the martingale based on the Aldous–Broder algorithm, which generates a random spanning tree of a given graph G. Here is a quick summary: (1) consider the random walk starting from any vertex; (2) every time we traverse an edge which takes us to a vertex we have not yet explored, add this edge to the tree; (3) stop when we visited all vertices. The resulting random graph has uniform distribution over the set of spanning trees of G, for more details see [Reference Aldous1]. If G is the complete graph $K_n$ , $n\geqslant 2$ , this construction can be rephrased as the following two-stage procedure [Reference Aldous1, Algorithm 2]:
-
I. For $1\leqslant i \leqslant n-1$ connect vertex $i+1$ to vertex $V_{i} = \min\{i, U_{i}\}$ , where ${\textbf{U}} = (U_1,\ldots, U_{n-1})$ is uniformly distributed on $[n]^{n-1}$ .
-
II. Relabel vertices $1,\ldots,n$ as $X_1,\ldots,X_n$ , where ${\textbf{X}} = (X_1,\ldots,X_n)$ is a uniform random permutation from $S_n$ .
Let $T({\textbf{u}})$ is the tree produced at stage I given that ${\textbf{U}}= {\textbf{u}}$ . For a permutation $\omega\in S_n$ and a tree $T\in \mathcal{T}_n$ , let
From [Reference Aldous1] we know that $T({\textbf{U}})^{{\textbf{X}}}$ has uniform distribution on the set $\mathcal{T}_n$ . Now, a tree parameter $F\,:\, \mathcal{T}_n \rightarrow {\mathbb{R}} $ can be seen as a function with domain $[n]^{n-1} \times S_n$ . Consider the functions $\hat{F}\,:\, \mathcal{T}_n \rightarrow {\mathbb{R}} $ and $F_T\,:\, S_n \rightarrow {\mathbb{R}}$ defined by
Let ${\textbf{Y}} = (Y_0, \ldots, Y_{n-1})$ and ${\textbf{Z}}(T) = (Z_0(T), \ldots, Z_{n-1}(T))$ be the Doob martingale sequences for $ \hat{F}(T({\textbf{U}}))$ and $F_T({\textbf{X}})$ , respectively: for $i = 0,\ldots ,n-1$ ,
where the filtrations are $\mathcal{F}_i = \sigma(U_1,\ldots,U_i)$ and $\mathcal{G}_i = \sigma(X_1,\ldots, X_{i})$ . We construct the martingale for $F({\textbf{T}})$ by combining the above two sequences together. Further in this section, we will use the following notations for conditional statistics of a random variable W with respect to $\mathcal{F}_i$ and $\mathcal{G}_i$ :
5.1. Properties of $F_T$ and $\hat{F}$
First, we study properties of functions $F_T$ and $\hat{F}$ from (13) given that the parameter F is $\alpha$ -Lipschitz and $\rho$ -superposable.
Lemma 5.1. Let a tree parameter $F\,:\, \mathcal{T}_n\rightarrow {\mathbb{R}}$ be $\alpha$ -Lipschitz and $\rho$ -superposable for some $\alpha\geqslant 0$ and $\rho\geqslant 1$ , then
-
a. $\hat{F}$ is $\alpha$ -Lipschitz and $\rho$ -superposable.
Furthermore, the following holds for all trees $T \in \mathcal{T}_n$ and permutations $\omega \in S_n$ .
-
b. If (ia) is a transposition from $S_n$ , then
\begin{equation*} |F_T(\omega) - F_T(\omega \circ (ia))| \leqslant \alpha (\rm{deg}_T (i)+\rm{deg}_T (a)), \end{equation*}where $\rm{deg}_T (i)$ , $\rm{deg}_T (a)$ are degrees of i, a in the tree T. -
c. Let $T' = \rm{S}_{q}^{rs} T$ be a tree for some triple (q, r, s). If (ia) is a transposition from $S_n$ that $d_T(\{i,a\}, \{r,s\}) \geqslant \rho+1$ , then
\begin{equation*} F_T(\omega) - F_T(\omega \circ (ia) ) - F_{T'}(\omega) + F_{T'} (\omega \circ (ia)) = 0. \end{equation*} -
d. If (ia), (jb) are transpositions from $S_n$ such that $d_T(\{i,a\}, \{j,b\}) \geqslant \rho+2$ , then
\begin{equation*} F_T(\omega) - F_T(\omega \circ (ia) ) - F_T(\omega \circ (jb) ) + F_T(\omega \circ (jb) \circ (ia)) = 0. \end{equation*}
Proof. For any permutation $\omega = (\omega_1,\ldots,\omega_n)\in S_n$ define the function $F_\omega\,:\, \mathcal{T}_n \rightarrow {\mathbb{R}}$ by $F_\omega(T)\,:\!=\, F(T^\omega)$ . If $\textrm{S}_{i}^{jk}T$ is a tree then $(\textrm{S}_{i}^{jk}T)^{\omega} = \textrm{S}_{\omega_i}^{\omega_j \omega_k} T^\omega$ . Relabelling also does not change the distances, that is, $d_T(a,b) = d_{T^{\omega}} (\omega_a,\omega_b)$ for all $a,b\in [n]$ . Thus, $F_\omega$ is $\alpha$ -Lipschitz and $\rho$ -superposable. Averaging over all $\omega$ proves part (a).
For part (b), we show that the tree $T^{(ia)}$ can be obtained from T by performing at most $\textrm{deg}_T(i)+\textrm{deg}_T(a)$ tree perturbations $\textrm{S}_{x}^{yz}$ . We denote the set of these perturbations by $\mathcal{P}^{ia}_T$ . Let u and v be the vertices on the path from i to a in T adjacent to i and a, respectively. Consider $\textrm{deg}_T(i)-1$ perturbations $\textrm{S}_{x}^{ia}$ for all vertices $x \neq u$ adjacent to i and $\textrm{deg}_T(a)-1$ perturbations $\textrm{S}_{x}^{ai}$ for all for all vertices $x \neq v$ adjacent to a. If $d_T(a,i) \leqslant 2$ then performing these $\textrm{deg}_T(i) + \textrm{deg}_T(a)-2$ perturbations in any order turns T into $T^{(ia)}$ . Otherwise, all vertices i, a, u, v are distinct and we need two more perturbations $\textrm{S}_{i}^{uv}$ and $\textrm{S}_{a}^{vu}$ to obtain $T^{(ia)}$ . This defines the set $\mathcal{P}^{ia}_T$ . Now, since F is $\alpha$ -Lipschitz, the value of the function changes by at most $\alpha$ after each perturbation so
The above holds for any $T\in \mathcal{T}_n$ . Substituting $T^{\omega}$ and observing $\textrm{deg}_{T^{\omega}}(\omega_i) =\textrm{deg}_T(i)$ , we prove part (b).
Before proving parts (c) and (d), we outline some important properties of the set $\mathcal{P}^{ia}_T $ of the tree perturbations that turn T into $T^{(ia)}$ .
-
i. The perturbations of $\mathcal{P}^{ia}_T$ can be performed in any order, that is, all intermediate graphs are trees.
-
ii. $d_T(\{x,y,z\}, \{i,a\}) =0$ for any $\textrm{S}_{x}^{yz} \in \mathcal{P}^{ia}_T$ , that is, $\{x,y,z\} \cap \{i,a\} \neq \emptyset$ .
-
iii. the distance from any $w\in[n]$ to $\{i,a\}$ is unchanged by perturbations $\textrm{S}_{x}^{ia}$ or $\textrm{S}_{x}^{ai}$ .
-
iv. the distance from any $w\in[n]$ to $\{i,a\}$ can increase after performing one of the perturbations $\textrm{S}_{i}^{uv}$ or $\textrm{S}_{a}^{vu}$ but then it decreases back to the initial value after performing the second (so it never gets smaller than the initial distance $d_T(w,\{i,a\})$ ).
For (c), observe first that $d_T(\{i,a\},\{r,s\}) \geqslant 2$ implies that i and a are adjacent to the same sets of vertices in T and T ′. Consider first the case when both u and v belong to the path from i to a in the tree T ′. For example, this is always the case when the path from i to a is not affected by removing the edge qr. Then, by definition, $\mathcal{P}^{ia}_T = \mathcal{P}^{ia}_{T'}$ that is we can use the same sets of perturbations to change labels i and a in both trees. We order them arbitrary to form a sequence $(\textrm{S}_1, \ldots, \textrm{S}_k)$ . Note also that for any perturbation $\textrm{S}_x^{yz} \in \mathcal{P}^{ia}_T $ we have $d_T(\{y,z\},\{r,s\}) \geqslant \rho$ due to the property (ii) and $d_T(\{i,a\},\{r,s\}) \geqslant \rho+1$ . Since F is $\rho$ -superposable and using properties (iii) and (iv), we get that
Summing up these equalities for all $t =0, \ldots k-1$ , we get that
We still need to consider the case when removing qr changes the path from i to a such that u or v do not lie on the path anymore. In this case, one have to be slightly more careful with the order of perturbations $(\textrm{S}_1, \ldots, \textrm{S}_k)$ to avoid the appearance of cycles in $\textrm{S}_t \cdots \textrm{S}_1 T'$ . Without loss of generality, we may assume that $d_T(i,q)<d_T(i,r)$ (otherwise, swap the roles of i and a). Let v ′ be the vertex adjacent to a that lies on the path from i to a in T ′. In notations of part (b), we define $\textrm{S}_1 = \textrm{S}_i^{uv}$ and $\textrm{S}_2 = \textrm{S}_{v'}^{ai}$ , then put the remaining perturbations in any order. A sequence $(\textrm{S}_1, \ldots, \textrm{S}_k)$ defined in this way ensures that all intermediate steps from T ′ to $T'^{(ia)}$ are trees. Repeating the same argument as above, we prove (15). To complete the proof of part (c), we just need to substitute T by $T^{\omega}$ similarly to part (b).
Finally, we prove (d) by repeatedly using part (c) for a sequence of perturbations $\textrm{S}_{q}^{rs} \in \mathcal{P}_T^{jb}$ that turn T into $T^{(jb)}$ . We can apply part (c) for all intermediate trees T ′ because the assumption $d_T(\{i,a\}, \{j,b\}) \geqslant \rho +2$ together with properties (ii), (iii), (iv) implies that $d_{T'}(\{i,a\}, \{r,s\}) \geqslant$ $\rho +1$ .
5.2. Martingale properties
Here, we establish the properties of martingales ${\textbf{Y}}$ and ${\textbf{Z}}(T)$ from (14) needed to apply the results of Section 4. For a tree $T\in \mathcal{T}_n$ and $A,B\subset [n]$ , define
We will repeatedly use the fact that for any $T\in \mathcal{T}_n$ and $i \in [n]$ , we have
where $\beta(T)$ is the parameter defined in (2). In the following, for simplicity of notations, we write $ \mathbb{1}_T^\rho(i,B)$ , or $ \mathbb{1}_T^\rho(A,j)$ , or $ \mathbb{1}_T^\rho(i,j)$ when A, or B, or both are one-element sets. Let $\mathcal{T}_n^d\subset \mathcal{T}_n$ be the set of trees with degrees at most d. We denote by $a\wedge b$ the minimum of two real numbers a, b.
Lemma 5.2. Let $F\,:\, \mathcal{T}_n\rightarrow {\mathbb{R}}$ be $\alpha$ -Lipschitz and $\rho$ -superposable for some $\alpha\geqslant 0$ and $\rho\geqslant 1$ . Then, the following holds for all $i\in[n-1]$ , $d \in {\mathbb{R}}^+$ and $T \in \mathcal{T}_n^d$ .
-
a. $|Y_i-Y_{i-1}| \leqslant \rm{ran}_{\mathcal{F}_{i-1}} [Y_i] \leqslant \alpha$ .
-
b. $ \displaystyle \rm{ran}_{\mathcal{F}_{i-1}} \left[\rm{Var}_{\mathcal{F}_i} [Y_{n-1}] \right] \leqslant 32\alpha^2 \rho^2 \rm{sup}_{\mathcal{F}_{i-1}} \left[ {\mathbb{E}}_{\mathcal{F}_{i}} [\beta(T({\textbf{U}}))] \right]. $
-
c. $\displaystyle |Z_i(T)-Z_{i-1}(T)|\leqslant \rm{ran}_{\mathcal{G}_{i-1}} [Z_i(T)] \leqslant \max_{\omega,(ia)\in S_n} |F(T^{\omega}) - F(T^{ \omega \circ (ia) })| \leqslant 2\alpha d. $
-
d. $\displaystyle \rm{ran}_{\mathcal{G}_{i-1}} \left[\rm{Var}_{\mathcal{G}_i} [Z_{n-1}(T)]\right] \leqslant 64\alpha^2 d^2 (\rho+2)^2 \beta(T) {\rm{log}}\, n. $
-
e. Let $V({\textbf{u}}) \,:\!=\, \rm{Var}\left[{Z_{n-1}(T({\textbf{u}}))}\right] = \rm{Var}\left[{F_{T({\textbf{u}})} ({\textbf{X}})}\right] $ . Then, $0\leqslant V({\textbf{U}}) \leqslant 4 \alpha^2 n^2$ and
\begin{align*} \rm{ran}_{\mathcal{F}_{i-1}} \Big[ {\mathbb{E}}_{\mathcal{F}_i} &\left[V({\textbf{U}}) \mathbb{1}_{T({\textbf{U}}) \in \mathcal{T}_n^d}\right] \Big] \\[5pt] &\leqslant \alpha^2 \rm{sup}_{\mathcal{F}_{i-1}} \left[ {\mathbb{E}}_{\mathcal{F}_{i}} \left[4n^2 \mathbb{1}_{T({\textbf{U}}) \notin \mathcal{T}_n^d} + 8d^2 (\rho+1)^2 \beta(T({\textbf{U}})) \right]\right]. \end{align*}
Proof. Using bound (12), we find that
where ${\textbf{u}},{\textbf{u}}' \in [n]^{n-1}$ differ in ith coordinate. Observe that
From Lemma 5.1(a), we know that $\hat{F}(T)$ is $\alpha$ -Lipschitz. Part (a) follows.
As explained in $(11)$ , we have $Y_i = f_i({\textbf{U}})$ , where
and ${\mathbb{E}} (\cdot \mid u_{\leqslant i})$ stands for ${\mathbb{E}} (\cdot \mid U_1=u_1,\ldots,U_i=u_i)$ . Let $0\leqslant i<j \leqslant n-1$ . Using formula (17), we find that
Consider ${\textbf{u}}'\in [n]^{n-1}$ that differs from ${\textbf{u}}$ only in ith coordinate. Then, we have
From part (a), we have $0\leqslant |f_j - f_{j-1}| \leqslant \alpha$ . Observe also that if $U_1=u_1, \ldots,U_{j-1}=u_{j-1} $ and $v \in [i]$ then
That is, the distance between v and $\{i \wedge u_i,i \wedge u^{\prime}_i\}$ is completely determined by $u_1, \ldots, u_{j-1}$ and v. From Lemma 5.1(a), we know that $\hat{F}(T)$ is $\rho$ -superposable. Thus, we find that
Using (16), we can bound
Similarly to (11), let $\textrm{ran}_{\mathcal{F}_{i-1}} \left[\textrm{Var}_{\mathcal{F}_i} [Y_{n-1}]\right] = r(U_1,\ldots,U_{i-1})$ . Using (9), (12) and taking the conditional expectation given $U_1= u_1,\ldots,U_{i-1}=u_{i-1}$ for the bounds above, we obtain that
This completes the proof part (b).
Part (c) immediately follows from Lemma 4.4(a) and Lemma 5.1(b). Indeed,
For (d), recall from (10) that
We will apply Lemma 4.4(b) to estimate the right-hand side of (19). From Lemma 5.1(d) and the bound (18), we get that
Bounding
and using (16), we find that, for $1\leqslant i <j \leqslant n-1$ ,
Combining (16), (18), Lemma 4.4(b) and the inequality
we obtain that
Finally, we proceed to part (e). Since F is $\alpha$ -Lipschitz, we have $|F(T) - F(T')| \leqslant 2 \alpha n$ for any two trees $T,T'\in \mathcal{T}_n$ . Indeed, applying at most n perturbations of type $\textrm{S}_{x}^{y 1}$ , where x is a leaf, we can turn any tree into a star centered at vertex 1. Thus, we can bound
Then, for any $\mathcal{A} \subset [n]^{n-1}$ and $u_1, \ldots, u_{i-1} \in [n]$ ,
where ${\textbf{U}}'$ differs from ${\textbf{U}}$ in ith coordinate only. For the following we put $\mathcal{A} = \{{\textbf{u}} \in [n]^{n-1} \mathrel{:} T({\textbf{u}}) \in \mathcal{T}_n^d\}$ . It remains to bound $V({\textbf{U}})- V({\textbf{U}}')$ when $T({\textbf{U}}),T({\textbf{U}}') \in \mathcal{T}_n^d$ .
Consider any ${\textbf{u}},{\textbf{u}}' \in [n]^{n-1}$ that differ in ith coordinate only and $T({\textbf{u}}), T({\textbf{u}}') \in \mathcal{T}_n^d$ . If $T({\textbf{u}}) = T({\textbf{u}}')$ , then $V({\textbf{u}}) = V({\textbf{u}}')$ . Otherwise, recalling (17), we can find some relabelling $\sigma \in S_n$ that the trees $T = T({\textbf{u}})^{\sigma}$ , $T'=T({\textbf{u}}')^{\sigma}$ satisfy $T' = \textrm{S}_{3}^{12} T$ and
Note that $\textrm{Var} [F_T({\textbf{X}})] = V({\textbf{u}})$ and $\textrm{Var} [F_{T'}({\textbf{X}})] = V({\textbf{u}}')$ . Using Lemma 5.1(c) and (18), we find that, for any $1\leqslant i <a\leqslant n$ ,
Applying Lemma 4.4(a) to the function $F_T - F_{T'}$ , we obtain
We have already proved in part (b) that $|Z_i(T) - Z_{i-1}(T)| \leqslant 2 \alpha d$ . Using (8) and (16), we bound
Part (e) follows.
5.3. Proof of Theorem 1.1
Before proving of Theorem 1.1, we need one more lemma. Let
Lemma 5.3. The following asymptotics bounds hold for any ${\textbf{u}} \in \mathcal{U}_{\rm small}$ , $u \in [n]$ :
Proof. The first bound follows immediately from (1) and Theorem 1.4. For the second, observe that, for any $u^{\prime}_1,\ldots, u^{\prime}_{i-1} \in [n]$ ,
Indeed, let ${\textbf{U}}$ , ${\textbf{U}}'$ are such that $U_j = u_j$ and $U^{\prime}_j = u^{\prime}_j$ for $j \in [i-1]$ and $U_j = U^{\prime}_j$ for $j \geqslant i$ . Then, $T({\textbf{U}}) \subset T(U') \cup T({\textbf{u}})$ because the edges corresponding from $i-1$ steps of the Aldous–Broder algorithm for $T({\textbf{U}})$ lie in $T({\textbf{u}})$ , while the remaining edges are covered by T(U ′)). We know that ${\textbf{u}} \in \mathcal{U}_{\rm small}$ . Therefore, if ${\textbf{U}}' \in \mathcal{U}_{\rm small}$ , then ${\textbf{U}} \in \mathcal{U}_{\rm big}$ .
Next, averaging over all $u^{\prime}_1,\ldots, u^{\prime}_{i-1} \in [n]$ , we conclude that
Note that, for any $u \in [n]$ ,
Recalling $ \mathbb{P} \left({\textbf{U}} \notin \mathcal{U}_{\rm small}\right) = e^{-\omega({\textrm{log}}\, n)}$ , we complete the proof.
Now we are ready to prove Theorem 1.1, our main result. Let ${\textbf{Y}}$ and ${\textbf{Z}}(T)$ be the martingales from (14). Consider the sequence ${\textbf{W}} = (W_0, \ldots, W_{2n-2})$ defined by
Note that ${\textbf{W}}$ is a martingale with the respect to the filtration $\mathcal{F}^{\prime}_0, \ldots, \mathcal{F}^{\prime}_{2n-2}$ , where $\mathcal{F}^{\prime}_i = \mathcal{F}_i,$ for $i\leqslant n-1$ and $ \mathcal{F}^{\prime}_i = \mathcal{F}_{n-1} \times \mathcal{G}_{i-n+1},$ for $i \geqslant n$ . Using (1), (8), and Lemma 5.2(e), we get that
Then, by assumptions of Theorem 1.1, we get $\rm{Var}\left[{W_{2n-2}}\right] = \left(1+ e^{-\omega({\textrm{log}}\, n)}\right)\rm{Var}\left[{F({\textbf{T}})}\right] $ and
Using Lemma 5.2(a,c), we obtain that, for all $i\in [2n-2]$ ,
Let ${\textbf{u}} \in [n]^{n-1} \in U_{\rm small}$ . Combining Lemmas 5.2(b,d,e) and 5.3 and observing $\beta(T)\leqslant n^2$ for all $T \in \mathcal{T}_n$ , we get that, for all $i\in [n-1]$
Note that, in the case of the event ${\textbf{U}} \in \mathcal{U}_{\rm small}$ , we have $W_i = Z_i(T({\textbf{U}}))$ and
Then, we obtain that if ${\textbf{U}} \in \mathcal{U}_{\rm small}$ then, for all $i \in [2n-2]$ ,
Using (20), we conclude that, with probability $1-e^{-\omega({\textrm{log}}\, n)}$ ,
Let $\tilde{\varepsilon} \in (0,\varepsilon)$ . Setting $\hat{q}= n^{-2\tilde{\varepsilon}}$ and applying Lemma 4.3, we get that, for any $p \in[1,+\infty)$ ,
Using (21) and (20), we can bound
Applying Theorem 4.2 to the scaled martingale sequence ${\textbf{W}} / (\alpha\, {\textrm{log}}\, n)$ , we get that
We can make $2p \tilde{\varepsilon}/(2p+1) \geqslant \varepsilon'$ for any $\varepsilon'\in (0,\varepsilon)$ by taking $\tilde{\varepsilon}$ to be sufficiently close to $\varepsilon$ and p to be sufficiently large. Recalling that $W_{2n-2} = F(T({\textbf{U}})^{{\textbf{X}}})$ with probability $1 - e^{\omega({\textrm{log}}\, n)}$ (that is for the event $T({\textbf{U}}) \in \mathcal{T}_n^{{\textrm{log}}\, n}$ ) and $\textrm{Var} [W_{2n-2}] = \left(1+ e^{-\omega({\textrm{log}}\, n)}\right) \rm{Var}\left[{F({\textbf{T}})}\right]$ , the required bound for $\delta_{K}[F({\textbf{T}})]$ follows.
Remark 5.4. The proof of Theorem 1.1 can be significantly simplified under additional assumption that the tree parameter F is symmetric. Namely, we would not need the martingale sequence ${\textbf{Z}}(T)$ , the bounds of Section 5.1, and we would only use parts (a), (b) from Lemma 5.2. In fact, a symmetric version of Theorem 1.1 would be sufficient to cover all applications given in Sections 2 and 3. Our decision to consider arbitrary tree parameters serves two purposes. First, the result is significantly stronger. Second, the analysis of martingales based on functions with dependent random variables is essential for extensions to more sophisticated tree models.
Remark 5.5. Combining Lemma 5.2(a,c) and Theorem 4.1 one can easily derive fast decreasing bounds for the tail of the distribution of $F({\textbf{T}})$ , provided a tree parameter F is $\alpha$ -Lipschitz. Cooper, McGrae and Zito [Reference Cooper, McGrae and Zito6, Section 4] used a different martingale construction for trees to establish the concentration of $F({\textbf{T}})$ around its expectation; however, they needed more restrictive assumptions about the tree parameter F.
6. The balls in random trees are not too large
In this section, we prove Theorem 1.4 using martingales. For a tree $T\in \mathcal{T}_n$ , let $\Gamma^k_{T}(v)$ be the set of all vertices at distance exactly k from v. Theorem 1.4 follows immediately from Lemmas 6.2 and 6.4 (stated below) by summing over all $|\Gamma_{{\textbf{T}}}^k(v)|$ for $k=1,\ldots, d$ and using the union bound over all vertices $v \in [n]$ .
Let $a>b$ be positive integers. Let A be an arbitrary set of a vertices from [n], and B be its subset on b vertices. Consider event $\mathcal{E}_{A,B}$ that A induces a tree and vertices of $A{\setminus}B$ have neighbours only in A. For $T\in\mathcal{T}_n$ , let $\xi_{A,B}(T)$ be the number of neighbours of B in T outside A. Below, we denote the random variable $\xi_{A,B}({\textbf{T}})$ simply $\xi_{A,B}$ .
Lemma 6.1. The conditional distribution of $\xi_{A,B}-1$ subject to $\mathcal{E}_{A,B}$ is binomial with parameters $(n-a-1,\frac{b}{n-a+b})$ .
Proof. Let $T_0$ be a tree on A. Consider event $\mathcal{E}_{A,B,T_0}$ that A induces exactly the given subtree $T_0$ and vertices of $A{\setminus}B$ have neighbours only in A. By Lemma 2.3,
Let $k\in\mathbb{N}$ . By Lemma 2.3,
Therefore,
which is the required distribution.
Fix a vertex $v\in [n]$ . Define the sequence of random variables $X_0,\ldots, X_n$ by
From Lemma 6.1, we have $X_1-1\sim$ Bin $(n-2,\frac{1}{n})$ . Notice that, for $k\geqslant 1$ , the vertices of $\Gamma^{k+1}_{{\textbf{T}}}(v)$ are adjacent only to the vertices of $\Gamma^k_{{\textbf{T}}}(v)$ in $\bigsqcup_{j\leqslant k+1}\Gamma^j_{{\textbf{T}}}(v)$ . Let $(x_1,\ldots,x_k)$ be a sequence of positive integers such that $1+x_1+\ldots+x_k\leqslant n$ . By Lemma 6.1, if $x_1+\ldots+x_k\leqslant n-3$ , then the conditional distribution of $X_{k+1}-1$ subject to $(X_1=x_1,\ldots,X_k=x_k)$ is binomial with parameters $n-x_1-\ldots-x_k-2$ and $\frac{x_k}{n-x_1-\ldots-x_{k-1}-1}$ . If $x_1+\ldots+x_k=n-2$ , then $X_{k+1}=1$ . Finally, if $x_1+\ldots+x_k=n-1$ , then $X_{k+1}=0$ .
Lemma 6.2. There exists a sequence $X_0=X^{\prime}_0,X^{\prime}_1,\ldots,X^{\prime}_n$ such that
-
• $X^{\prime}_k\geqslant X_k$ ,
-
• for $k\geqslant 0$ , the distribution of $X^{\prime}_{k+1}-1$ subject to $X_j=x_j,X^{\prime}_j =x^{\prime}_j $ , $j\in[k]$ , is
\begin{equation*}\begin{cases}\rm{Bin}\left(n-\sum_{j=0}^{k-1} x_j,\frac{x^{\prime}_k}{n-\sum_{j=0}^{k-1} x_j}\right), &\text{if }n-\sum_{j=0}^{k-1} x_j\geqslant x^{\prime}_k,\\[10pt]x^{\prime}_k \text{ with probability } 1, &\text{otherwise}.\end{cases}\end{equation*}
Proof. It is straightforward since, for every k, we preserve the denominator of the second parameter of the binomial distribution but make the first one larger.
Note that $(X^{\prime}_k-k)_{k\in[n]}$ is a martingale sequence. Unfortunately, we can not apply Theorem 4.1 directly because every $X^{\prime}_k$ ranges in a large interval (mostly for small k). Instead, we cut the tails of these random variables and construct a new martingale. To do that we need the following property of binomial distributions.
Lemma 6.3. Let N and $a\leqslant N$ be positive integers, $\xi\sim\rm{Bin}(N,\frac{a}{N})$ . Then, for every $b\in\mathbb{N}$ , there exists an interval $\mathcal{I}=\mathcal{I}(N,a,b)\subset[a-b,a+b]$ such that
-
• $\mathbb{P}(\xi\notin\mathcal{I})\leqslant N^2\mathbb{P}(\xi\notin[a-b,a+b])$ ,
-
• $\exists c\in[a-b,a+b]$ such that the function $f\,:\,\mathbb{R}\to\mathbb{R}$ defined by
\begin{equation*}f(x)\,:\!=\,\left\{\begin{array}{cc}x, & x\in\mathcal{I} \\ c, & x\notin\mathcal{I}\end{array}\right.\end{equation*}satisfies $\mathbb{E}\left[{f(\xi)}\right] =a$ .
Proof. For $a=N/2$ , we get the result by setting $\mathcal{I}=[a-b,a+b]$ and $c=a$ . For the following, without loss of the generality, we may assume $a<N/2$ since the proof for $a>N/2$ is symmetric.
Let us consider the set $\mathcal{S}$ of all integers s such that
It is clear that $0\in\mathcal{S}$ . However, for every $x\in\mathbb{N}$ , $\mathbb{P}(\xi=a-x)>\mathbb{P}(\xi=a+x)$ . Indeed,
Therefore, $b\notin\mathcal{S}$ . Let $s^*$ be the maximum integer from $\mathcal{S}$ . Then, $s^*\in[1,b-1]$ and
Let us prove that $\mathcal{I}=[a-s^*,a+b]$ is the desired interval. From (23), we get
Moreover, since (22) holds for $s=s^*$ ,
Therefore, there exists $c\in(a-s^*-1,a]$ such that ${\mathbb{E}}[cI(\xi\notin\mathcal{I})+\xi I(\xi\in\mathcal{I})]=a$ .
It remains to estimate $\mathbb{P}(\xi\notin\mathcal{I})$ from above. Notice that, from (23),
Therefore, $s^*\mathbb{P}(\xi<a-s^*-1)<N\mathbb{P}(\xi>a+b)$ . Since $2a\mathbb{P}(\xi=a-s^*-2)>\mathbb{P}(\xi=a-s^*-1)$ , we get
and this immediately implies that $\mathbb{P}(\xi\notin\mathcal{I})\leqslant N^2\mathbb{P}(\xi\notin[a-b,a+b])$ .
Now, we are ready to construct a martingale sequence that coincides with $X^{\prime}_k-k$ with probability very close to 1, but is more suitable for applying Theorem 4.1. For every $k\geqslant 2$ , consider the event
For $\omega\in\mathcal{B}_k$ , denote
Let
Define the sequence $(Y_k)_{k\in[n]}$ as follows. Let $Y_0\,:\!=\,X^{\prime}_0=1$ . For $k\geqslant 1$ , set
Using Lemmas 6.2 and 6.3, we find that $(Y_0,Y_1,\ldots,Y_n)$ is a martingale sequence with respect to the filtration $\mathcal{F}_i = \sigma(X_j,X^{\prime}_j \mathrel{:} \,0\leqslant j\leqslant i)$ for all $i\in\{0,1,\ldots,n\}$ .
Lemma 6.4. Let $c>0$ be a fixed constant. Then, the following bounds hold:
-
a. $\mathbb{P}\left(\exists k\in[n]:\,\,Y_k>k\,{\rm{log}}^4 n\right)\leqslant e^{-\omega({\rm{log}}\, n)}$ ,
-
b. $\mathbb{P}\left(\exists k\in[n]:\,\,Y_k\neq X^{\prime}_k -k\right)\leqslant e^{-\omega({\rm{log}}\, n)}$ .
Proof. For (a), we apply Theorem 4.1. First, we estimate the conditional ranges. From Lemma 6.3, we get that, for all $k\in[n]$
We prove by induction on k that ${\textsf{P}}(Y_k> k\,{\textrm{log}}^4\, n)\leqslant\exp[-c\,{\textrm{log}}^2 n]$ , where $c>0$ does not depend on k and n. For $k=1$ , we have $\mathbb{P}(Y_1>{\textrm{log}}^4\, n)\leqslant \mathbb{P}(Y_1>{\textrm{log}}\, n)=0$ .
Assume that $\mathbb{P}(Y_j> j\,{\textrm{log}}^4\, n-j)\leqslant\exp[-{\textrm{log}}^2 n(1+o(1))]$ for all $j\leqslant k$ . Then, with a probability at least $1-n\exp[{-}{\textrm{log}}^2 n(1+o(1))]=1-\exp[{-}{\textrm{log}}^2 n(1+o(1))]$ ,
Therefore, by Theorem 4.1,
This proves (a).
For (b), observe that, by the definition of $Y_k$ ,
Each term in the sum above is $e^{-\omega({\textrm{log}}\, n)}$ by Lemma 6.5 and the difinition of $X^{\prime}_k$ given in Lemma 6.2. Part (b) follows.
Lemma 6.5. For n large enough and all positive integers $a\leqslant N$ , a random variable $\xi \sim \rm{Bin}(N,a/N)$ satisfies the following:
Proof. By the Chernoff bounds,
It is straightforward to check that the stated bound holds for all possible values of a.