
Community detection and percolation of information in a geometric setting

Published online by Cambridge University Press:  31 May 2022

Ronen Eldan
Affiliation:
Weizmann Institute, Rehovot, Israel
Dan Mikulincer*
Affiliation:
Weizmann Institute, Rehovot, Israel
Hester Pieters
Affiliation:
Weizmann Institute, Rehovot, Israel
*
*Corresponding author. Email: [email protected]

Abstract

We make the first steps towards generalising the theory of stochastic block models, in the sparse regime, towards a model where the discrete community structure is replaced by an underlying geometry. We consider a geometric random graph over a homogeneous metric space where the probability of two vertices to be connected is an arbitrary function of the distance. We give sufficient conditions under which the locations can be recovered (up to an isomorphism of the space) in the sparse regime. Moreover, we define a geometric counterpart of the model of flow of information on trees, due to Mossel and Peres, in which one considers a branching random walk on a sphere and the goal is to recover the location of the root based on the locations of leaves. We give some sufficient conditions for percolation and for non-percolation of information in this model.

Type
Paper
Creative Commons
This is an Open Access article, distributed under the terms of the Creative Commons Attribution licence (https://creativecommons.org/licenses/by/4.0/), which permits unrestricted re-use, distribution, and reproduction in any medium, provided the original work is properly cited.
Copyright
© The Author(s), 2022. Published by Cambridge University Press

1. Introduction

Community detection in large networks is a central task in data science. One often observes a large network whose links depend on some unknown, underlying community structure, and a natural task is to detect and recover this structure as accurately as possible. Perhaps the most well-studied model in this context is the stochastic block model (SBM), where a random graph is generated over a vertex set partitioned into communities $\{c_1,\ldots,c_k\}$: every pair of nodes $u,v$ belonging to communities $c(u), c(v)$ is connected with probability $p = p(c(u),c(v))$, that is, with a probability that depends only on the respective communities, independently across pairs. The task is to recover the community assignment $c(\!\cdot\!)$ from the graph. The (unknown) association of nodes with communities is usually assumed to be random and independent between different nodes. See [Reference Abbe1] for an extensive review of this model. A natural extension of the SBM is the geometric random graph, where the discrete set of communities is replaced by a metric space. More formally, given a metric space $(X,d)$, a function $f\;:\; V \to X$ from a vertex set $V$ to the metric space and a function $\varphi \;:\;\mathbb R_+ \to [0,1]$, a graph is formed by connecting each pair of vertices $u,v$ independently, with probability

\begin{equation*} p(u,v) \;:\!=\; \varphi \left ( d(f(u), f(v)) \right ). \end{equation*}

This model can sometimes mimic the behaviour of real-world networks more accurately than the SBM. For example, a user in a social network may be represented as a point in some linear space whose coordinates correspond to attributes of her personality and her geographic location. The likelihood of two persons being associated with each other in the network will then depend on the proximity of several of these attributes. A flat community structure may therefore be too simplistic to reflect these underlying attributes. A natural extension of the theory of SBMs is therefore to understand under what conditions the geometric representation can be recovered by looking at the graph. Our focus is on the case where the metric is defined over a symmetric space, such as the Euclidean sphere in $d$ dimensions. By symmetry, we mean that the probability that two vertices are connected, given their locations, is invariant under a natural group acting on the space. We are interested in the sparse regime, where the expected degrees of the vertices do not tend to infinity with the size of the graph. This is, arguably, the natural and most challenging regime for the SBM.

1.1 Inference in geometric random graphs

For the sake of simplicity, in what follows we will assume that the metric space is the Euclidean sphere, and our main theorems will be formulated in this setting; it would be straightforward, yet technical, to generalise our results to more general symmetric spaces (see [Reference De Castro, Lacour and Ngoc6] for further discussion on this point). We begin by introducing a model for a random geometric graph. Let $\sigma$ be the uniform probability measure on $\mathbb{S}^{d-1}$ and let $\varphi \;:\; \mathbb{S}^{d-1} \times \mathbb{S}^{d-1} \to \mathbb R_+$ be of the form $\varphi (x,y) = f(\langle x,y\rangle )$ for some $f\;:\; [-1,1] \to \mathbb R_+$. Consider $\{X_i\}_{i=1}^n\sim \sigma$, a sequence of independently sampled vectors. Let $G\left (n, \frac{1}{n} \varphi (X_i, X_j)\right )$ be the inhomogeneous Erdős–Rényi graph model where edges are formed independently with probability $\frac{1}{n} \varphi (X_i, X_j)$, and let $A$ be the adjacency matrix of a random graph drawn from $G\left (n, \frac{1}{n}\varphi (X_i,X_j)\right )$. We may freely assume that $n$ is large enough so that, necessarily, $\frac{1}{n} \varphi (X_i, X_j) \leq 1$. We now define the task of recovering the latent positions of the sample $\{X_i\}_{i=1}^n$ from an observation of the random geometric graph.
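As a concrete illustration, here is a minimal sketch of sampling from this model in Python (assuming numpy; the kernel $f$ below is an arbitrary illustrative choice, not one analysed in the paper):

import numpy as np

def sample_geometric_graph(n, d, f, rng=np.random.default_rng(0)):
    """Sample latent points on S^{d-1} and a sparse graph G(n, phi(X_i, X_j)/n)."""
    X = rng.standard_normal((n, d))
    X /= np.linalg.norm(X, axis=1, keepdims=True)   # uniform points on the sphere
    P = f(X @ X.T) / n                              # connection probabilities phi/n
    np.fill_diagonal(P, 0.0)
    U = rng.random((n, n))
    A = np.triu(U < P, k=1)                         # independent edges above the diagonal
    A = (A | A.T).astype(float)                     # symmetric 0/1 adjacency matrix
    return X, A

# illustrative bounded kernel f(t) = 2 + t
X, A = sample_geometric_graph(n=2000, d=3, f=lambda t: 2.0 + t)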

Definition 1. We say that the model is $\varepsilon$-reconstructible if, for all $n$ large enough, there is an algorithm which returns an $n \times n$ matrix $\mathcal{X}$ such that

\begin{equation*} \frac {1}{n^2} \sum _{i,j} |\mathcal {X}_{i,j} - X_i \cdot X_j|^2 \leq \varepsilon . \end{equation*}

Remark that, due to the symmetry of the model, the locations can only be reconstructed up to an orthogonal transformation, which is equivalent to reconstruction of the Gram matrix.

In order to state the conditions under which we prove that reconstruction is possible, we need some notation. For $\varphi$ as above, define the integral operator $A_\varphi \;:\; L^2\left (\mathbb{S}^{d-1}\right )\to L^2\left (\mathbb{S}^{d-1}\right )$ by

\begin{equation*}A_\varphi (g)(x) = \int \limits _{\mathbb {S}^{d-1}} \varphi (x,y)g(y)d\sigma (y).\end{equation*}

It is standard to show that $A_\varphi$ is a self-adjoint compact operator (see [Reference Hirsch and Lacombe10], for example) and so has a discrete spectrum, except possibly at $0$. By definition, $\varphi$ is invariant under rotations, and so $A_\varphi$ commutes with the Laplacian. It follows that the eigenfunctions of $A_\varphi$ are precisely the spherical harmonics, which we denote by $\{\psi _i\}_{i=0}^\infty$. Thus, if $\lambda _i(A_\varphi )$ denotes the eigenvalue of $A_\varphi$ corresponding to $\psi _i$, we have the following identity,

(1) \begin{equation} \varphi = \sum \limits _{i=0}^\infty \lambda _i \psi _i \otimes \psi _i. \end{equation}

In particular, $\psi _0 = 1$ and for $i = 1,\ldots,d$ , $\psi _i$ are linear functionals such that, for $x,y \in \mathbb{S}^{d-1}$ ,

(2) \begin{equation} d \cdot \langle x,y\rangle = \sum _{l=1}^d \psi _l(x)\psi _l(y). \end{equation}
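Concretely, the degree-one spherical harmonics may be taken to be $\psi _l(x) = \sqrt{d}\, x_l$ for $l = 1,\ldots,d$ (normalised so that $\int \psi _l^2 \, d\sigma = 1$, the convention used in this sketch), in which case

\begin{equation*} \sum _{l=1}^d \psi _l(x)\psi _l(y) = d \sum _{l=1}^d x_l y_l = d \cdot \langle x,y\rangle, \end{equation*}

in agreement with (2).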

Note that in our notation the eigenvalues are indexed by the spherical harmonics, and are therefore not necessarily in decreasing order. By rotational invariance it must hold that

(3) \begin{equation} \lambda (\varphi ) \;:\!=\; \lambda _1=\ldots =\lambda _d. \end{equation}

Define $\|\varphi \|_\infty = \sup _{x,y} \varphi (x,y)$ . We make the following, arguably natural, assumptions on the function $\varphi$ :

  • $\rm A1.$ There exists $\delta \gt 0$ such that $\min _{i\neq 1,\ldots, d}|\lambda (\varphi ) - \lambda _i|\gt \delta$ .

  • $\rm A2.$ Reordering the eigenvalues in decreasing order $\lambda _{l_0}\geq \lambda _{l_1}\geq \lambda _{l_2}\geq \ldots$ there exists $C \gt 0$ such that for every $i\geq 0, |\lambda _{l_i}|\leq \frac{C}{(i+1)^2}$ .

Remark 2. For the spectral algorithm we propose, the eigenspace associated to $\lambda (\varphi )$ plays a special role, since it may encode the positions of the latent vectors in $\mathbb R^d$. Therefore, to extract meaningful information from the algorithm, it is crucial that this eigenspace is $d$-dimensional; a property that is enforced by Assumption A1. However, it is not clear what would happen if, say, $\lambda _{d+1}$ is very close to $\lambda _d$. Our analysis, which is based on the stability of the eigenspaces under perturbations of the kernel, fails in this case, but it is plausible that the algorithm (or another one) may still succeed.

Regarding Assumption A2: while it is usually the case that some integrability of the eigenvalues is necessary for the problems we consider, we do not know whether the quadratic decay rate is tight. We have made no effort to optimise this dependence and are willing to conjecture that improvements are possible.

Theorem 3. For every $\varepsilon \gt 0$ there exists a constant $C=C(\varepsilon,d)$ , such that the model is $\varepsilon$ -reconstructible whenever

(4) \begin{equation} \min _{i\neq 0,\ldots,d}|\lambda _{i}-\lambda (\varphi )|^2 \geq C \| \varphi \|_{\infty }. \end{equation}

Remark 4. Observe that, since the left-hand side of condition (4) is $2$-homogeneous in $\varphi$, whereas its right-hand side is $1$-homogeneous, as long as the left-hand side is nonzero the condition can be made to hold by multiplying the function $\varphi$ by a large enough constant.

Theorem 3 can be compared to the known bounds for recovery in the SBM (see [Reference Abbe1]). In particular, it is illuminating to compare the SBM with two communities to linear kernels in our model. In this case, both models are parameterised by two numbers: in the SBM these are the inter- and intra-community probabilities, while in our model they are the coefficients of the linear function. In the SBM, denoting the parameters by $a$ and $b$, recovery of the communities depends on the ratio $\frac{(a-b)^2}{a+b}$. The example below gives a similar result for linear kernels, with a dimensional effect, which typically makes reconstruction easier.

Example 5. Consider the linear kernel, $\varphi (x,y) =\gamma + \zeta \langle x,y\rangle$ , with $|\zeta |\leq \gamma$ . A calculation shows that

\begin{align*} \lambda _0 &= \gamma \\[5pt] \lambda (\varphi ) &= \frac{\zeta }{d}. \end{align*}

Applying our theorem shows that the model is reconstructible whenever

\begin{equation*} \left | \gamma -\frac {\zeta }{d}\right |^2 \geq C \cdot (\gamma +\zeta ). \end{equation*}
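For completeness, here is a short verification of the eigenvalues in Example 5, using the explicit harmonics $\psi _0 \equiv 1$ and $\psi _1(x) = \sqrt{d}\, x_1$ together with $\int \langle y,e_1\rangle\, d\sigma (y) = 0$ and $\int \langle y,e_1\rangle ^2\, d\sigma (y) = \frac{1}{d}$:

\begin{align*} \lambda _0 &= (A_\varphi \psi _0)(e_1) = \int _{\mathbb{S}^{d-1}} \left (\gamma + \zeta \langle e_1,y\rangle \right ) d\sigma (y) = \gamma, \\[5pt] \lambda (\varphi ) &= \frac{(A_\varphi \psi _1)(e_1)}{\psi _1(e_1)} = \int _{\mathbb{S}^{d-1}} \left (\gamma + \zeta \langle e_1,y\rangle \right )\langle y,e_1\rangle \, d\sigma (y) = \frac{\zeta }{d}. \end{align*}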

Methods and related works

Our reconstruction theorem is based on a spectral method, via the following steps (a schematic code sketch is given after the list):

  1. We observe that, by the symmetry of our kernel, linear functions are among its eigenfunctions. We show that the kernel matrix (that is, the matrix obtained by evaluating the kernel at pairs of the points $(X_i)$ ) has eigenvalues and eigenvectors which approximate those of the continuous kernel.

  2. Observing that the kernel matrix is the expectation of the adjacency matrix, we rely on a matrix concentration inequality due to Le-Levina-Vershynin [Reference Le, Levina and Vershynin12] to show that the eigenvalues of the former are close to the ones of the latter.

  3. We use the Davis-Kahan theorem to show that the corresponding eigenvectors are also close to each other.
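For concreteness, the following Python sketch (assuming numpy, and assuming $d$, $\lambda (\varphi )$ and $\|\varphi \|_\infty$ are known) illustrates the resulting pipeline. The degree-trimming rule and the selection of the $d$ eigenvectors whose eigenvalues are closest to $\lambda (\varphi )$ are simplifications of the procedure analysed in the proofs, not the exact algorithm.

import numpy as np

def reconstruct_gram(A, d, lam_phi, phi_max):
    """Estimate the Gram matrix (X_i . X_j) from the observed adjacency matrix A."""
    n = A.shape[0]
    A = A.astype(float).copy()
    heavy = A.sum(axis=1) > phi_max            # trim vertices of unusually high degree
    A[heavy, :] = 0.0
    A[:, heavy] = 0.0
    vals, vecs = np.linalg.eigh(A)             # spectrum of the trimmed adjacency matrix
    idx = np.argsort(np.abs(vals - lam_phi))[:d]
    Y = vecs[:, idx]                           # n x d matrix of selected unit eigenvectors
    return n * (Y @ Y.T)                       # estimate of <X_i, X_j>, cf. the proof of Theorem 3

def reconstruction_error(G_hat, X):
    """The error appearing in Definition 1."""
    n = X.shape[0]
    return np.sum((G_hat - X @ X.T) ** 2) / n ** 2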

The idea in Steps 2 and 3 is not new, and rather standard (see [Reference Le, Levina and Vershynin12] and references therein). Thus, the main technical contribution in proving our upper bound is in Step 1, where we prove a bound for the convergence of eigenvectors of kernel matrices. So far, similar results have only been obtained in the special case where the kernel is positive definite, see for instance [Reference Braun4]. The paper [Reference Valdivia20] considers kernels satisfying some Sobolev-type hypotheses similar to our assumptions on $\varphi$ (but gives results on the spectrum rather than the eigenvectors). Reconstruction of the eigenspaces has been considered in [Reference Tang, Sussman and Priebe18], for positive definite kernels in the dense regime; in [Reference Sussman, Tang and Priebe17], for random dot product graphs; and in [Reference Valdivia and De Castro2] in the dense and relatively sparse regimes, again for kernels satisfying some Sobolev-type hypotheses. Let us also mention other works which augmented the SBM with geometric information. The paper [Reference Deshpande, Sen, Montanari, Mossel, Bengio, Wallach, Larochelle, Grauman, Cesa-Bianchi and Garnett7] considers the problem of discrete community recovery when presented with informative node covariates, inspired by the spiked covariance model. The authors derived a sharp threshold for recovery by introducing a spectral-like algorithm. However, the model is rather different from the one we propose, in which the community structure is continuous. A model which is slightly closer to ours appears in [Reference Galhotra, Mazumdar, Pal and Saha9]. In this model, communities are still discrete, but the edge connectivity depends continuously on the latent positions of nodes on some $d$-dimensional sphere. In such a model, recovery of the communities may be reduced to more combinatorial arguments. Indeed, the number of common neighbours can serve as an indicator for checking whether two nodes come from the same community. A similar idea was previously explored in [Reference Bubeck, Ding, Eldan and Rácz5], where a triangle count was used to establish a threshold for detecting latent geometry in random geometric graphs.

1.2 Percolation of geometric information in trees

The above theorem gives an upper bound for the threshold for reconstruction. The question of finding respective lower bounds, in the SBM, is usually reduced to a related but somewhat simpler model of percolation of information on trees. The idea is that in the sparse regime, the neighbourhood of each node in the graph is usually a tree, and it can be shown that recovering the community of a specific node based on observation of the entire graph is more difficult than recovering its location based on knowledge of the community association of the leaves of a tree rooted at this node. For a formal derivation of this reduction (in the case of the SBM), we refer to [Reference Abbe1]. This gives rise to the following model, first described by Mossel and Peres [Reference Mossel and Peres15] (see also [Reference Mossel14]): Consider a $q$-ary tree $T=(V,E)$ of depth $k$, rooted at $r \in V$. Suppose that each node in $V$ is associated with a label $\ell \;:\; V \to \{1,\ldots,k\}$ in the following way: the root $r$ is assigned some label and then, iteratively, each node is assigned its direct ancestor’s label with probability $p$ and a uniformly picked label with probability $1-p$ (independently between the nodes at each level). The goal is then to detect the assignment of the root based on observation of the leaves. Let us now suggest an extension of this model to the geometric setting. We fix a Markov kernel $\varphi (x,y) = f(\langle x,y \rangle )$, normalised such that $\int _{\mathbb{S}^{d-1}} \varphi (x,y) d \sigma (y) = 1$ for all $x \in \mathbb{S}^{d-1}$. We define $g\;:\; T \to \mathbb{S}^{d-1}$ in the following way. For the root $r$, $g(r)$ is picked according to the uniform measure. Iteratively, given that $g(v)$ is already set for all nodes $v$ at the $\ell$-th level, we pick the values $g(u)$ for nodes $u$ at the $(\ell +1)$-th level independently, so that if $u$ is a direct descendant of $v$, the label $g(u)$ is distributed according to the law $\varphi (g(v), \cdot ) d \sigma$.
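A minimal simulation sketch of this broadcast process (in Python with numpy): sampling a child's label from the density $f(\langle \cdot, g(v)\rangle )$ is done here by rejection sampling against the uniform measure, which assumes a known bound $M \geq \sup f$; this sampler, as well as the particular normalised kernel $f(t) = 1 + 0.8t$, are illustrative choices rather than part of the model.

import numpy as np

rng = np.random.default_rng(0)

def uniform_sphere(d):
    x = rng.standard_normal(d)
    return x / np.linalg.norm(x)

def sample_child(parent, f, f_max):
    """Rejection sampling from the density f(<x, parent>) with respect to sigma."""
    while True:
        x = uniform_sphere(parent.shape[0])
        if rng.random() * f_max <= f(np.dot(x, parent)):
            return x

def broadcast_leaves(q, k, d, f, f_max):
    """Run the broadcast on the q-ary tree of depth k; return the labels of the leaves."""
    level = [uniform_sphere(d)]                      # the root, g(r) ~ sigma
    for _ in range(k):
        level = [sample_child(v, f, f_max) for v in level for _ in range(q)]
    return np.array(level)

leaves = broadcast_leaves(q=3, k=6, d=3, f=lambda t: 1 + 0.8 * t, f_max=1.8)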

Denote by $T_k \subset V$ the set of nodes at depth $k$ , and define by $\mu _k$ the conditional distribution of $g(r)$ given $(g(v))_{v \in T_k}$ . We say that the model has positive information flow if

\begin{equation*} \lim _{k \to \infty } \mathbb E \left [ \mathrm {TV} (\mu _k, \sigma ) \right ] \gt 0. \end{equation*}

Remark that by symmetry, we have

\begin{equation*} \mathbb E \left [ \mathrm {TV} (\mu _k, \sigma ) \right ] = \mathbb E \left [ \mathrm {TV} (\mu _k, \sigma ) | g(r) = e_1 \right ] \end{equation*}

where $r$ is the root and $e_1$ is the north pole.

Our second objective in this work is to make the first steps towards understanding under which conditions the model has positive information flow, and in particular, our focus is on providing nontrivial sufficient conditions on $q, \varphi$ for the above limit to be equal to zero.

Let us first outline a natural sufficient condition for the information flow to be positive which, as we later show, turns out to be sharp in the case of Gaussian kernels. Consider the following simple observable,

\begin{equation*} Z_k \;:\!=\; \frac {1}{|T_k|} \sum _{v \in T_k} g(v). \end{equation*}

By Bayes’ rule, we clearly have that the model has positive information flow if (but not only if)

(5) \begin{equation} \liminf _{k \to \infty } \frac{ \mathbb E[ \langle Z_k, e_1 \rangle \mid g(r)=e_1] }{\sqrt{\mathrm{Var} \left [\langle Z_k, e_1 \rangle \mid g(r) = e_1 \right ]} } \gt 0. \end{equation}

This gives rise to the parameter

\begin{equation*} \lambda (\varphi ) \;:\!=\; \int _{\mathbb {S}^{d-1}} \langle x, e_1 \rangle \varphi (e_1, x) d\sigma (x), \end{equation*}

which is the eigenvalue corresponding to the linear harmonics. As $\langle x, e_1 \rangle$ is an eigenfunction of the Markov kernel, each application of the kernel multiplies it by $\lambda (\varphi )$. Coupling this observation with the linearity of expectation, we have

(6) \begin{equation} \mathbb E[ \langle Z_k, e_1 \rangle | g(r)=e_1] = \lambda (\varphi )^k. \end{equation}

Moreover, by symmetry if $j \neq 1$ ,

\begin{equation*} 0 = \int _{\mathbb {S}^{d-1}} \langle x, e_j \rangle \varphi (e_1, x) d\sigma (x), \end{equation*}

from which we may conclude for any $v \in \mathbb{S}^{d-1}$ ,

(7) \begin{equation} \lambda (\varphi )\langle v, e_1 \rangle = \int _{\mathbb{S}^{d-1}} \langle x, v \rangle \varphi (e_1, x) d\sigma (x). \end{equation}

For two nodes $u,v \in T$ define by $c(u,v)$ the deepest common ancestor of $u,v$ and by $\ell (u,v)$ its level. A calculation gives

\begin{align*} \mathrm{Var} \left [\langle Z_k, e_1 \rangle \right | g(r) = e_1 ] & = \frac{1}{q^{2k}} \sum _{u,v \in T_k} \mathbb E \left [ g(v)_1 g(u)_1 | g(r) = e_1 \right ] - \lambda (\varphi )^{2k} \\[5pt] & = \frac{1}{q^{2k}} \sum _{u,v \in T_k} \mathbb E \left [ \mathbb E [g(v)_1 | g(c(u,v))] \mathbb E[g(u)_1 | g(c(u,v))] | g(r) = e_1 \right ] - \lambda (\varphi )^{2k} \\[5pt] & = \frac{1}{q^{2k}} \sum _{u,v \in T_k} \mathbb E \left [g(c(u,v))_1^2 | g(r) = e_1 \right ] \lambda (\varphi )^{2(k-\ell (u,v))} - \lambda (\varphi )^{2k} \\[5pt] & \leq \frac{1}{q^{2k}} \sum _{u,v \in T_k} \lambda (\varphi )^{2(k-\ell (u,v))} - \lambda (\varphi )^{2k} \\[5pt] & = \frac{\lambda (\varphi )^{2k}}{q^{2k}} \sum _{u,v \in T_k} \lambda (\varphi )^{-2\ell (u,v)} - \lambda (\varphi )^{2k} \\[5pt] & \leq \frac{\lambda (\varphi )^{2k}}{q^{2k}} \sum _{\ell =0}^k q^\ell q^{2(k-\ell )} \lambda (\varphi )^{-2\ell } - \lambda (\varphi )^{2k} = \lambda (\varphi )^{2k} \sum _{\ell =1}^k \left (q \lambda (\varphi )^2 \right )^{-\ell }, \end{align*}

where in the third equality we have applied (6) and (7) to $\mathbb E [g(v)_1 | g(c(u,v))]$ and $\mathbb E[g(u)_1 | g(c(u,v))]$ . In particular, when $q \lambda (\varphi )^2 \gt 1$ the sum on the right-hand side remains bounded as $k \to \infty$, which gives a sufficient condition for (5) to hold true, concluding:

Claim 6. The condition $q \lambda (\varphi )^2 \gt 1$ is sufficient for the model to have positive percolation of information.

We will refer to this as the Kesten-Stigum (KS) bound.

We now turn to describe our lower bounds. For the Gaussian kernel, we give a lower bound which falls short of matching the KS bound by a factor of $2$ in the exponent of $\lambda (\varphi )$. To describe the Gaussian kernel, fix $\beta \gt 0$, let $X$ be a normal random variable with law $\mathcal{N}(0,\beta )$ and suppose that $\varphi \;:\; \mathbb{S}^1 \times \mathbb{S}^1 \to \mathbb R$ is such that

(8) \begin{equation} \varphi (x, \cdot )\text{ is the density of }(x + X) \;\mathrm{mod}\; 2 \pi, \end{equation}

where we identify $\mathbb{S}^1$ with the interval $[0,2 \pi )$ . We have the following result.

Theorem 7. For the Gaussian kernel defined above, there is zero information flow whenever $q \lambda (\varphi ) \lt 1$ .
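Specialising the two conditions to the Gaussian kernel, for which the proof below gives $\lambda (\varphi ) = e^{-\beta/2}$, the comparison reads

\begin{equation*} \text{KS bound (positive flow):}\;\; q\lambda (\varphi )^2 = q e^{-\beta } \gt 1, \qquad \text{Theorem 7 (zero flow):}\;\; q\lambda (\varphi ) = q e^{-\beta/2} \lt 1, \end{equation*}

so the regime $e^{\beta/2} \leq q \leq e^{\beta }$ is left open; this is the factor-of-$2$ gap mentioned above.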

Related results in the Gaussian setting

In the discrete setting, an analogous result was obtained in [Reference Mossel13]. In fact, our method of proof is closely related. We use the inherent symmetries of the spherical kernels to decouple the values of $g(r)$ and $g(v)$, where $v$ is some vertex which is sufficiently distant from $r$. This is the same idea as in Proposition 10 of [Reference Mossel13], which uses a decomposition of the transition matrix to deduce a similar conclusion. Another related result appears in [Reference Mossel, Roch and Sly16]. In that paper, the authors consider a broadcast model on a binary tree with a Gaussian Markov random field and obtain a result in the same spirit as the one above. However, since the random field they consider is real valued, as opposed to our process which is constrained to the circle, the method of proof is quite different.

In the general case, we were unable to give a corresponding bound. Nevertheless, using the same ideas, we are able to give a nontrivial sufficient condition for zero flow of information for some $q\gt 1$, formulated in terms of the eigenvalues of the kernel. To our knowledge, this is the first known result in this direction. In order to formulate our result, we need some definitions.

We begin with a slightly generalised notion of a $q$ -ary tree.

Definition 8. Let $q \gt 1$. We say that $T$ is a tree of growth at most $q$ if, for every $k \in \mathbb N$,

\begin{equation*}\left |T_k\right | \leq \left \lceil q^k \right \rceil .\end{equation*}

Now, recall that $\varphi (x,y) = f(\langle x,y\rangle )$ . Our bound is proven under the following assumptions on the kernel.

  • $f$ is monotone.

  • $f$ is continuous.

  • $\lambda (\varphi ) \gt 0$ and for every $i \geq 1$ , $ |\lambda _i| \leq \lambda (\varphi )$ .

We obtain the following result.

Theorem 9. Let $\varphi$ satisfy the assumptions above and let $T$ be a tree of growth at most $q$ . There exists a universal constant $c \gt 0$ , such that if

\begin{equation*} q \leq \left (1-c\frac {\ln (\lambda (\varphi ))(1-\lambda (\varphi ))^2}{\ln \left (\frac {\lambda (\varphi )(1-\lambda (\varphi ))}{f(1)}\right )}\right )^{-1} \end{equation*}

then the model has zero percolation of information.

Possible connection to reconstruction on random geometric graphs

One might naively expect that the machinery developed in [Reference Mossel and Peres15] could be used, in conjunction with Theorems 7 and 9, to deduce lower bounds for reconstructability in random geometric graphs. However, in contrast to the SBM, if $v$ is a vertex and we condition on the latent positions of all vertices in the boundary of some ball around $v$, the position of $v$ still retains some dependence on the positions of vertices outside the ball.

Because of this fact, reducing a lower bound for the problem of recovering latent positions in random geometric graphs to the continuous broadcast model cannot be as straightforward as in the SBM. We leave the existence of a possible reduction as an interesting question for future research.

2. The upper bound: Proof of Theorem 3

Recall the following identity in $L^2(\mathbb{S}^{d-1})$ ,

\begin{align*} \varphi = \sum \limits _{k=0}^\infty \lambda _k \psi _k \otimes \psi _k, \end{align*}

with the eigenvalues $\lambda _k$ indexed by the spherical harmonics. Define the random matrices $M_n, \Psi _n$ by

\begin{equation*}(M_n)_{i,j} = \frac {1}{n}\varphi (X_i,X_j), \;\;\;\;\; (\Psi _n)_{i,k} = \frac {1}{\sqrt {n}}\psi _k(X_i).\end{equation*}

Note that $M_n$ is an $n \times n$ matrix, while $\Psi _n$ , has infinitely many columns. Furthermore, denote by $\Lambda$ the diagonal matrix $\mathrm{diag}\{\lambda _i\}_{i=0}^\infty$ . Then, $\sigma$ -almost surely, for every $i,j \in [n],$

\begin{equation*} (M_n)_{i,j} = \frac {1}{n}\sum _{k=0}^\infty \lambda _k\psi _k(X_i)\psi _k(X_j) =(\Psi _n\Lambda \Psi _n^T)_{i,j}. \end{equation*}

For $r \in \mathbb N$ we also denote

\begin{equation*}\varphi ^r \;:\!=\; \sum \limits _{k=0}^r \lambda _k \psi _k \otimes \psi _k,\end{equation*}

the finite rank approximation of $\varphi$ , $\Lambda ^r = \mathrm{diag}\{\lambda _k\}_{k=0}^r$ , and $\Psi _n^r$ the sub-matrix of $\Psi _n$ composed of its first $r$ columns. Finally, denote

\begin{equation*}M_n^r =\Psi _n^r\Lambda ^r(\Psi _n^r)^T.\end{equation*}

As before, let $A$ be an adjacency matrix drawn from $G\left (n, \frac{1}{n}\varphi (X_i,X_j)\right )$ so that $\mathbb E \left [A|X_1,\ldots,X_n\right ] = M_n$ . Our goal is to recover $\Psi _n^{d+1}$ from the observed $A$ . The first step is to recover $\Psi _n^{d+1}$ from $M_n$ . We begin by showing that the columns of $\Psi _n^r$ are, up to a small additive error, eigenvectors of $M_n^r$ . To this end, denote

\begin{equation*}E_{n,r} \;:\!=\; (\Psi _n^r)^T\Psi _n^r - \mathrm {Id}_r,\end{equation*}

$C(n,r) = \left \lVert E_{n,r}\right \rVert ^2_{op}$ , and $K = \max _i{\lambda _i}$ .

Lemma 10. Let $u_i$ be the $i$ ’th column of $\Psi _n^r$ and let $\eta \gt 0$ . Then

\begin{equation*} \left \lVert M_n^ru_i - \lambda _iu_i\right \rVert _2^2\leq K^2(\sqrt {C(n,r)}+1)C(n,r). \end{equation*}

Moreover, whenever $n\geq l(r) \log (2r/\eta )$ , we have with probability larger than $1-\eta$ ,

\begin{equation*} C(n,r) \leq \frac {4 l(r)\log (2r/\eta )}{n}, \end{equation*}

where $l(r)$ only depends on $r$ and on the dimension.

Proof. Let $e_i\in \mathbb R^r$ be the $i$ ’th standard unit vector so that $u_i = \Psi _n^re_i$ . So,

\begin{equation*}(\Psi _n^r)^Tu_i = (\Psi _n^r)^T\Psi _n^re_i =(\mathrm {Id}_r + (\Psi _n^r)^T\Psi _n^r - \mathrm {Id}_r)e_i=e_i + E_{n,r}e_i.\end{equation*}

We then have

\begin{align*} M_n^ru_i &= \Psi _n^r\Lambda ^r(\Psi _n^r)^Tu_i = \Psi _n^r\Lambda ^re_i + \Psi _n^r\Lambda ^rE_{n,r}e_i \\[5pt] &= \lambda _i\Psi _n^re_i + \Psi _n^r\Lambda ^rE_{n,r}e_i = \lambda _iu_i + \Psi _n^r\Lambda ^rE_{n,r}e_i. \end{align*}

To bound the error, we estimate $\left \lVert M_n^ru_i - \lambda _iu_i\right \rVert _2^2=\left \lVert \Psi _n^r\Lambda ^rE_{n,r}e_i\right \rVert _2^2$ as

\begin{align*} \langle \Lambda ^rE_{n,r}e_i,(\Psi _n^r)^T\Psi _n^r\Lambda ^rE_{n,r}e_i\rangle &= \langle \Lambda ^rE_{n,r}e_i,E_{n,r}\Lambda ^rE_{n,r}e_i\rangle + \left \lVert \Lambda ^rE_{n,r}e_i\right \rVert _2^2 \\[5pt] &\leq \left (\sqrt{C(n,r)}+1\right )\left \lVert \Lambda ^rE_{n,r}e_i\right \rVert _2^2 \leq K^2\left (\sqrt{C(n,r)}+1\right )C(n,r). \end{align*}

It remains to bound $C(n,r)$ . Let $X_i^r = ( \psi _0(X_i),\ldots, \psi _{r-1}(X_i))$ , which is $\sqrt{n}$ times the $i$ ’th row of $\Psi _n^r$ . Then $E_{n,r} = \frac{1}{n}\sum _{i=1}^n\left ((X_i^r)^T X_i^r - \mathrm{Id}_r\right )$ is a sum of independent, centred random matrices. We have

\begin{align*} \sigma _{n,r}^2 &\;:\!=\; \left \lVert \mathbb E\left ( \frac{1}{n}\sum _{i=1}^n\left ((X_i^r)^T X_i^r - \mathrm{Id}_r\right )\right )^2\right \rVert _{op} \\[5pt] &= \frac{1}{n} \left \lVert \mathbb E \left ( (X_1^r)^T X_1^r - \mathrm{Id}_r\right )^2\right \rVert _{op} \end{align*}

Furthermore, the norm of the matrices can be bounded by

\begin{align*} \left \lVert \frac{1}{n}\left ((X_1^r)^T X_1^r - \mathrm{Id}_r\right )\right \rVert _{op} &=\frac{1}{n} \max (1, \left \lVert X_1^r\right \rVert _2^2 -1) \\[5pt] &\leq \frac{1}{n}\max \left (1, \left \lVert \sum _{i=0}^r \psi _i^2\right \rVert _\infty - 1\right ). \end{align*}

Note that the right-hand sides of the last two displays are of the form $\frac{1}{n} l(r)$ , where $l(r)$ depends only on $r$ and $d$ (not on $n$ ). Applying the matrix Bernstein inequality ([Reference Tropp19, Theorem 6.1]) then gives

\begin{equation*} \mathbb P\left ( \left \lVert E_{n,r}\right \rVert _{op}\geq t \right ) \leq 2r \exp \left (-\frac {n}{2l(r)}\frac {t^2}{1+t/3} \right ), \end{equation*}

where $l(r)$ depends only on $r$ and $d$ . Choose now $t_0 = \frac{4 l(r)\log (2r/\eta )}{n}$ . As long as $n\geq l(r) \log (2r/\eta )$ , $t_0 \leq 4$ , and the above bound may be refined to

\begin{equation*} \mathbb P\left ( \left \lVert E_{n,r}\right \rVert _{op}\geq t_0 \right ) \leq 2r \exp \left (-\frac {n}{l(r)}\frac {t_0^2}{7} \right ). \end{equation*}

With the above conditions, it may now be verified that $2r \exp \left (-\frac{n}{l(r)}\frac{t_0^2}{7} \right ) \leq \eta$ , and the proof is complete.
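As a numerical illustration of the quantity $C(n,r)$ for the harmonics that matter most in what follows, here is a short sketch (assuming numpy, and using the explicit degree-zero and degree-one harmonics $\psi _0 \equiv 1$ , $\psi _l(x) = \sqrt{d}\,x_l$ , so that $r = d+1$):

import numpy as np

def C_nr_first_harmonics(n, d, rng=np.random.default_rng(0)):
    """||(Psi_n^r)^T Psi_n^r - Id||_op^2 for r = d + 1 (constant and linear harmonics)."""
    X = rng.standard_normal((n, d))
    X /= np.linalg.norm(X, axis=1, keepdims=True)    # n uniform points on S^{d-1}
    Psi = np.hstack([np.ones((n, 1)), np.sqrt(d) * X]) / np.sqrt(n)
    E = Psi.T @ Psi - np.eye(d + 1)
    return np.linalg.norm(E, ord=2) ** 2             # squared operator norm

print(C_nr_first_harmonics(n=5000, d=3))             # small for large n, consistent with Lemma 10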

We now show that, as $r$ increases, the subset of eigenvectors of $M_n^r$ that encode the latent positions of $\{X_i\}_{i=1}^n$ converges to the corresponding eigenvectors of $M_n$ . Order the eigenvalues in decreasing order $\lambda _{l_0}\geq \lambda _{l_1} \geq \lambda _{l_2} \geq \ldots$ and let $\Lambda _{\gt r} = \sum _{i=r}^\infty \lambda _{l_i}^2$ . Note that it follows from Assumption A2 that $\Lambda _{\gt r} = O(r^{-3})$ . We will denote by $\lambda _i(M_n), \lambda _i(M_n^r)$ the respective eigenvalues of $M_n$ and $M_n^r$ , ordered in a decreasing way, and by $v_i(M_n),v_i(M_n^r)$ their corresponding unit eigenvectors. Suppose that $s$ is such that

(9) \begin{equation} \lambda (\varphi ) = \lambda _{l_{s+1}} = \ldots = \lambda _{l_{s+d}}. \end{equation}

Moreover, define

\begin{equation*} V_n \;:\!=\; \mathrm {span}(v_{l_{s+1}}(M_n),\ldots, v_{l_{s+d}}(M_n)), \;\;\; V_n^r \;:\!=\; \mathrm {span}\left (v_{l_{s+1}}(M_n^r),\ldots, v_{l_{s+d}}(M_n^r)\right ). \end{equation*}

The next lemma shows that $V_n$ is close to $V_n^r$ whenever both $n$ and $r$ are large enough.

Lemma 11. Let $\delta$ and $C$ be the constants from Assumption A1 and Assumption A2. For all $n,r$ , let $P_{n,r}$ be the orthogonal projection onto $V_n^r$ . Then, for all $\eta \gt 0$ there exist constants $n_0,r_0$ , which may depend on $\delta$ , $C$ , and $\eta$ , such that for all $n\gt n_0$ and $r\gt r_0$ , we have with probability at least $1-\eta$ that, for all $w\in V_n$ ,

\begin{equation*}\left \lVert w - P_{n,r}w\right \rVert _2 \leq \frac {4C}{\eta \delta ^2 r^3 }.\end{equation*}

Proof. We have

\begin{align*} \mathbb E \left \lVert M_n - M_n^r\right \rVert ^2_{F} &= \sum _{i,j} \mathbb E (M_n -M_n^r)^2_{i,j} \\[5pt] & = \sum _{i,j} \frac{1}{n^2} \mathbb E \left (\sum _{k=r}^\infty \lambda _k\psi _k(X_i)\psi _k(X_j)\right )^2 \\[5pt] & = \mathbb E_{x,y \sim \sigma } \left ( \sum _{k=r}^\infty \lambda _k \psi _k(x)\psi _k(y)\right )^2 \\[5pt] & = \sum _{k=r}^\infty \lambda _k^2 = \Lambda _{\gt r}. \end{align*}

Applying Markov’s inequality gives that, with probability at least $1-\eta$ ,

(10) \begin{equation} \left \lVert M_n - M_n^r\right \rVert ^2_{F} \leq \frac{\Lambda _{\gt r}}{\eta }\leq \frac{C}{\eta r^3}. \end{equation}

Theorem 1 in [Reference Valdivia20] shows that there exists $n$ large enough such that with probability larger than $1-\eta$ , one has

\begin{equation*} |\lambda _i(M_n) - \lambda _{l_{s+i}}| \leq \delta/ 4, \end{equation*}

with $\delta$ being the constant from Assumption A1. It follows that

(11) \begin{equation} \lambda _{l_{s+1}}(M_n),\ldots,\lambda _{l_{s+d}}(M_n) \in \left [\lambda (\varphi )-\frac{\delta }{4}, \lambda (\varphi )+\frac{\delta }{4}\right ], \end{equation}

while by (10) and Weyl’s perturbation theorem (e.g., [Reference Bhatia3, Corollary III.2.6]), for $r$ large enough with probability $1-\eta$ ,

(12) \begin{equation} \lambda _i(M_n^r) \notin \left [\lambda (\varphi )-\frac{3\delta }{4}, \lambda (\varphi )+\frac{3\delta }{4}\right ], \text{ for } i \neq l_{s+1},\ldots l_{s+d}. \end{equation}

Combining (10), (11) and (12), it follows from the classical Davis-Kahan theorem (see e.g. [Reference Bhatia3, Section VII.3]) that with probability at least $1-2\eta$ , for every $w \in V_n$ ,

\begin{equation*}\left \lVert w - P_{n,r}w\right \rVert _2^2 \leq \frac {4C}{\eta \delta ^2 r^3}.\end{equation*}

Denote

\begin{equation*} G_n \;:\!=\; \frac {1}{d} \sum _{k=1}^d v_{l_{s+k}}(M_n) v_{l_{s+k}}(M_n)^T, \;\;\; (G'_{\!\!n})_{i,j} = \frac {1}{n}\langle X_i, X_j \rangle . \end{equation*}

A combination of the last two lemmas produces the following:

Theorem 12. One has

\begin{equation*} \| G_n - G'_{\!\!n} \|_F \to 0 \end{equation*}

in probability, as $n \to \infty$ .

Proof. Denote

\begin{equation*} G_n^r\;:\!=\;\frac {1}{d}\sum _{k=1}^d v_{l_{s+k}}(M_n^r)(v_{l_{s+k}}(M_n^r))^T. \end{equation*}

Then

\begin{equation*} \| G_n - G'_{\!\!n} \|_F \leq \|G_n^r - G'_{\!\!n}\|_F + \|G_n-G_n^r\|_F \end{equation*}

We will show that both terms on the right-hand side converge to zero. Let $r(n)$ be a function converging to infinity slowly enough so that $C(n,r)\to 0$ , where $C(n,r)$ is the quantity defined before Lemma 10. Taking $\eta = \eta (n)$ to converge to zero slowly enough and applying Lemma 10 gives, for all $1 \leq i \leq d$ ,

\begin{equation*} \|(M_n^r - \lambda (\varphi )\mathrm {Id}_n) u_i \|_2^2 \leq \varepsilon _{n} \end{equation*}

with $u_i$ the $i$ ’th column of $\Psi _n^{r}$ and where $\varepsilon _n \to 0$ as $n\to \infty$ . Now, if we write

\begin{equation*} u_i = \sum _{j=0}^\infty \alpha _{i,j} v_j(M_n^r), \end{equation*}

the last inequality becomes

\begin{equation*} \sum _j |\lambda _j(M_n^r) -\lambda (\varphi )|^2 \alpha _{i,j}^2 = \sum _j |(M_n^r - \lambda (\varphi )\mathrm {Id}_n) v_j(M_n^r) |^2 \alpha _{i,j}^2 \leq \varepsilon _n, \;\; \forall 1 \leq i \leq d. \end{equation*}

Recall that, by (12), for $j \notin \{l_{s+1},\ldots,l_{s+d}\}$ , we have $\frac{3\delta }{4}\leq |\lambda _j(M_n^r) -\lambda (\varphi )|$ . Hence,

(13) \begin{equation} \sum _{j \notin \{l_{s+1},..,l_{s+d}\}} \alpha _{i,j}^2 \leq \frac{16 \varepsilon _n}{9\delta ^2} \to 0, \end{equation}

and thus

\begin{equation*} \left \|u_i - \sum _{k=1}^d \alpha _{i, l_{s+k}} v_{l_{s+k}}(M_n^r)\right \|_2^2\to 0, \;\; \forall 1 \leq i \leq d. \end{equation*}

Define a $d\times d$ -matrix $B$ by $B_{i,j}=\alpha _{i, l_{s+j}}$ . Then we can rewrite the above as

\begin{equation*} \left \|(u_1,\ldots,u_d) - (v_{l_{s+1}}(M_n^r),\ldots, v_{l_{s+d}}(M_n^r))\cdot B\right \|_F^2 \to 0. \end{equation*}

Now, for two $n\times d$ matrices $R,S$ we have $\|RR^T - SS^T\|_F\leq (\|R\|_{op} +\|S\|_{op})\|R-S\|_F$ . It follows that

(14) \begin{equation} \|G'_{\!\!n} - (v_{l_{s+1}}(M_n^r), \ldots, v_{l_{s+d}}(M_n^r))BB^T(v_{l_{s+1}}(M_n^r), \ldots, v_{l_{s+d}}(M_n^r))^T\|_F \to 0. \end{equation}

Observe that

\begin{equation*} (BB^T)_{i,j} = \langle u_i, u_j \rangle - \sum _{k \notin \{l_{s+1},\ldots,l_{s+d}\}} \alpha _{i,k} \alpha _{j,k}, \end{equation*}

implying that

\begin{equation*} \left |(B B^T)_{i,j} - (E + \mathrm {Id}_d)_{i,j}\right | \leq \sqrt { \sum _{k \notin \{l_{s+1},\ldots,l_{s+d}\}} \alpha _{i,k}^2 \sum _{k \notin \{l_{s+1},\ldots,l_{s+d}\}} \alpha _{j,k}^2 } \stackrel {(13)}{\longrightarrow } 0, \end{equation*}

where $E = E_{n,r}$ . Consequently we have

\begin{equation*} \|B B^T - \mathrm {Id}_d \|_{op} \leq \sqrt { \sum _{k \notin \{l_{s+1},..,l_{s+d}\}} \alpha _{i,k}^2 \sum _{k \notin \{l_{s+1},..,l_{s+d}\}} \alpha _{j,k}^2 } + \sqrt {C(n,r)} \to 0, \end{equation*}

which implies that

\begin{align*} \|(v_{l_{s+1}}(M_n^r), \ldots, v_{l_{s+d}}(M_n^r))(BB^T-\mathrm{Id}_d)(v_{l_{s+1}}(M_n^r), \ldots, v_{l_{s+d}}(M_n^r))^T\|_F^2\to 0. \end{align*}

Combining with (14) finally yields

\begin{equation*} \|G'_{\!\!n} - G_n^r\|_F \to 0 \end{equation*}

in probability, as $n\to \infty$ . If $P$ is the orthogonal projection onto $V^r_n=\mathrm{span}(v_{l_{s+1}}(M_n^r),\ldots,v_{l_{s+d}}(M_n^r))$ , and $Q$ is the orthogonal projection onto $\mathrm{span}(v_{l_{s+1}}(M_n),\ldots,v_{l_{s+d}}(M_n))$ , then Lemma 11 shows that for all $\eta \gt 0$ , with probability at least $1-\eta$ , as $n\to \infty$ (and $r=n^{\frac{1}{2d}}\to \infty$ ), we have for every unit vector $v$

(15) \begin{equation} |(P - \mathrm{Id}_n) Q v | \leq \varepsilon _n \end{equation}

with some $\varepsilon _n \to 0$ . By symmetry, we also have for every unit vector $v$ that

\begin{equation*} |(Q - \mathrm {Id}_n) P v | \leq \varepsilon _n \end{equation*}

(this uses the fact that both $P$ and $Q$ are projections onto subspaces of the same dimension). The last two inequalities easily yield that $\|P - Q\|_{op} \leq \varepsilon _n$ . Since this is true for every $\eta \gt 0$ , it follows that

\begin{equation*} \|G_n-G_n^r\|_F\to 0, \end{equation*}

in probability, as $n\to \infty$ .

Now, having related the eigenvectors of $M_n$ to $\Psi _n^{d+1}$ , the second step is to recover them from the observed matrix $A$ . For the proof we will need the following instance of the Davis-Kahan theorem.

Theorem 13. ([Reference Yu, Wang and Samworth21, Theorem 2]). Let $X,Y$ be symmetric matrices with eigenvalues $\lambda _0\geq \ldots \geq \lambda _p$ resp. $\hat \lambda _0\geq \ldots \geq \hat{\lambda }_p$ with corresponding orthonormal eigenvectors $v_0,\ldots,v_p$ resp. $\hat v_0,\ldots, \hat v_p$ . Let $V=(v_{s+1},\ldots, v_{s+d})$ and $\hat V = (\hat v_{s+1},\ldots, \hat v_{s+d})$ . Then there exists an orthogonal $ d \times d$ matrix $R$ such that

\begin{equation*} \| \hat V R - V \|_{F} \leq \frac {2^{3/2} \min (d^{1/2}\| Y-X\|_{op}, \|Y-X\|_F)}{\min (\lambda _{s}-\lambda _{s+1}, \lambda _{s+d}-\lambda _{s+d+1})}. \end{equation*}
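In practice, a minimiser $R$ of $\| \hat V R - V \|_F$ over orthogonal matrices can be computed via the orthogonal Procrustes problem; this is a standard linear-algebra fact and not part of Theorem 13 itself. A short numpy sketch:

import numpy as np

def align_bases(V_hat, V):
    """Orthogonal Procrustes: the orthogonal R minimising ||V_hat @ R - V||_F (via an SVD)."""
    U, _, Wt = np.linalg.svd(V_hat.T @ V)
    return U @ Wt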

Our main tool to pass from the expectation of the adjacency matrix to the matrix itself is the following result regarding concentration of random matrices, which follows from [Reference Le, Levina and Vershynin11, Theorem 1.1] (see also [Reference Le, Levina and Vershynin12, Theorem 5.1]).

Theorem 14. Let $A$ be the adjacency matrix of a random graph drawn from $G\left (n, \frac{1}{n} \varphi (X_i, X_j)\right )$ . For every vertex of degree larger than $\max _{ij}\varphi (X_i,X_j)$ , remove all incident edges and let $A'$ denote the new adjacency matrix. Then, with probability at least $1-n^{-1}$ ,

\begin{equation*} \|A'-M_n\| \leq C \sqrt {\|\varphi \|_\infty }. \end{equation*}

We can now prove the main reconstruction theorem.

Proof of Theorem 3. Let $A$ be the adjacency matrix of a random graph drawn from the model $G\left (n, \frac{1}{n} \varphi (X_i, X_j)\right )$ and let $A'$ be the adjacency matrix defined in Theorem 14.

Denote by $\lambda'_{\!\!0}\geq \lambda'_{\!\!1}\geq \ldots$ the eigenvalues of $A'$ and by $v'_{\!\!0}, v'_{\!\!1},\ldots$ the corresponding orthonormal eigenvectors. Let $Y = (v'_{\!\!l_{s+1}},\ldots,v'_{\!\!l_{s+d}})$ . By Theorem 13 there exists an $R\in \mathcal{O}(d)$ such that

\begin{align*} \left \lVert \left (v_{l_{s+1}}(M_n),\ldots,v_{l_{s+d}}(M_n)\right ) - YR \right \rVert _{F} &\leq \frac{2^{3/2} d^{1/2}\| M_n - A'\|_{op}}{\min _{i\neq 1,\ldots,d}|\lambda _{i}-\lambda (\varphi )|}. \end{align*}

Hence by Theorem 14 we have

\begin{equation*} \left \lVert \left (v_{l_{s+1}}(M_n),\ldots,v_{l_{s+d}}(M_n)\right ) - YR \right \rVert _{F} \leq C \sqrt { d} \cdot \frac {\sqrt {\|\varphi \|_\infty }}{ \min _{i\neq 1,\ldots,d}|\lambda _{i}-\lambda (\varphi )|}, \end{equation*}

with probability $1-n^{-1}$ . It follows that

\begin{equation*} \left \lVert G_n - Y Y^T \right \rVert _{F} \leq C \sqrt { d} \cdot \frac {\sqrt {\|\varphi \|_\infty }}{ \min _{i\neq 1,\ldots,d}|\lambda _{i}-\lambda (\varphi )|} \end{equation*}

Combining this with Theorem 12 yields

\begin{align*} \left \lVert G'_{\!\!n} - Y Y^T \ \right \rVert _{F} \leq C \sqrt{d} \cdot \frac{\sqrt{\|\varphi \|_\infty }}{\min _{i\neq 1,\ldots,d}|\lambda _{i}-\lambda (\varphi )|}, \end{align*}

So,

\begin{equation*} \frac {1}{n^2} \sum _{i,j} \left |X_i \cdot X_j - (n Y Y^T)_{i,j} \right |^2 \leq C d \frac {\|\varphi \|_\infty }{\min _{i\neq 1,\ldots,d}|\lambda _{i}-\lambda (\varphi )|^2} \end{equation*}

which gives the desired reconstruction bound.

3. Lower bounds

Our approach to proving lower bounds will be to exploit some symmetries which are inherent to well-behaved kernel functions. We thus make the following definition:

Definition 15. (DPS property). Let $\mu$ be a probability measure on $\mathbb{S}^{d-1}$ , and let $w \in \mathbb{S}^{d-1}$ . We say that $\mu$ has the diminishing post-translation symmetry (DPS) property around $w$ with constant $\varepsilon$ if there exists a decomposition,

\begin{equation*}\mu = (1-\varepsilon )\mu _w + \varepsilon \mu _w^s.\end{equation*}

Here $\mu _w^s$ is a probability measure, invariant under reflections with respect to $w^\perp$ . In other words, if $R = \mathrm{Id}_d - 2ww^T$ , then $R_*\mu _w^s = \mu _w^s$ . For such a measure we denote $\mu \in \mathrm{DPS}_w(\varepsilon )$ .

If instead $\mu$ is a measure on $(\mathbb{S}^{d-1})^{|T_k|}$ , we say that $\mu \in \mathrm{DPS}^k_w(\varepsilon )$ if a similar decomposition exists but now the reflections should be applied to each coordinate separately.

We now explain the connection between the DPS property and percolation of information in trees. For this, let us recall the random function $g:T \to \mathbb{S}^{d-1}$ , introduced in Section 1.2, which assigns to the root, $r$ , a uniformly random value and for any other $u \in T$ , the label $g(u)$ is distributed according to $\varphi (g(\mathrm{parent}(u)),\cdot )d\sigma$ .

Lemma 16. Suppose that there exists a sequence $(p_k)_k$ with $\lim _{k\to \infty } p_k= 1$ such that for every $w,x_0 \in \mathbb{S}^{d-1}$ and every $k \gt 0$ ,

\begin{equation*}\mathrm {Law}((g(v))_{v\in T_k}|g(r) = x_0) \in \mathrm {DPS}^k_w(p_k).\end{equation*}

Then there is zero percolation of information along $T$ .

Proof. Denote $X = g(r)$ and $Y = (g(v))_{v\in T_k}$ and let $\rho _{X|Y}$ be the density of $X|Y$ with respect to $\sigma$ . Our aim is to show that $\mathbb E_Y\left [\mathrm{TV}(X|Y,\sigma )\right ] = o(1)$ . We first claim that it is enough to show that for all $x,x' \in \mathbb{S}^{d-1}$ and all $\delta \gt 0$ one has,

(16) \begin{equation} \mathbb P \left ( \left . \frac{\rho _{X|Y}(x) }{ \rho _{X|Y}(x') } - 1 \leq \delta \;\right | X = x \right ) = 1-o(1). \end{equation}

Indeed, let $H = \left \{\frac{\rho _{X|Y}(x) }{ \rho _{X|Y}(x') } - 1 \leq \delta \right \}$ . Note that, by Bayes’ rule,

\begin{equation*} \frac {\rho _{X|Y}(x) }{ \rho _{X|Y}(x') } = \frac {\rho _{Y|X=x}(Y) }{ \rho _{Y|X=x'}(Y) }. \end{equation*}

Let $x \in \mathbb{S}^{d-1}$ be some fixed point and let $x'$ be uniformly distributed in $\mathbb{S}^{d-1}$ and independent of $Y$ . We will use $\mathbb E_{x'}$ to denote integration with respect to $x'$ , which has density $\rho _X$ . Similarly, $\mathbb E_{Y}$ stands for integration with respect to the density $\rho _Y$ of $Y$ and $\mathbb{E}_{x', Y}$ is with respect to the product $\rho _X\cdot \rho _Y$ . The symbol $\mathbb P$ will always mean probability over $x'$ and $Y$ .

As a consequence of the assumption in (16), as long as $k$ is large enough, we have,

\begin{equation*}\mathbb P\left (H|X=x\right ) \geq 1-\delta .\end{equation*}

For any Borel measurable set $A$ , we denote the projection:

\begin{align*} \Pi _YA &= \{y \in (\mathbb{S}^{d-1})^{|T_k|}\;:\; (\tilde{x}, y) \in A \text{ for some } \tilde{x} \in \mathbb{S}^{d-1}\}. \end{align*}

Define now the set

\begin{equation*}H' \;:\!=\; \{(\tilde {x}, y)| (\tilde {x},y)\in H \text { and } \mathbb {E}_{x'}[\textbf {1}_H(\cdot, y)]\geq 1 - 4\delta \}.\end{equation*}

In particular, for every $y \in \Pi _YH'$ ,

\begin{equation*}\mathbb {E}_{x'}[\textbf {1}_H(\cdot, y)]\geq 1 - 4\delta,\end{equation*}

and by Fubini’s theorem, $\mathbb P\left (H'|X = x\right ) \geq 1 - 4\delta$ .

Consider the random variables $\alpha \;:\!=\; \alpha (Y) = \frac{\rho _{Y|X=x}(Y)}{ \rho _{Y}(Y) }$ and $\beta \;:\!=\;\beta (x',Y) = \frac{\rho _{Y|X=x'}(Y)}{\rho _Y(Y)}$ . By definition of $H$ , we have

(17) \begin{equation} (1-\delta )\textbf{1}_H\alpha \leq \textbf{1}_H\beta . \end{equation}

Moreover, for almost every $Y$ ,

\begin{equation*}\mathbb E_{x'}\left [\beta \right ] = \frac {1}{\rho _Y(Y)}\int \rho _{Y|X=x'}(Y)\rho _X(x') = \frac {1}{\rho _Y(Y)}\rho _Y(Y) = 1.\end{equation*}

So,

\begin{equation*} (1-\delta )(1-4\delta )\mathbb E_Y\left [\alpha \textbf {1}_{ \Pi _Y{H'}}\right ]\leq (1-\delta )\mathbb E_{Y}\left [\alpha \mathbb E_{x'}\left [\textbf {1}_{H'}\right ]\right ] \leq \mathbb E_{x',Y}\left [\beta \textbf {1}_{H'}\right ]\leq \mathbb E_{x',Y}\left [\beta \right ]=1. \end{equation*}

Observe that $\mathbb E_Y\left [\alpha \textbf{1}_{ \Pi _Y{H'}}\right ] = \mathbb P\left (\Pi _Y{H'}|X=x\right ) = 1- o(1)$ , where the second equality is a consequence of Fubini’s theorem. Hence, let us write $h_1(\delta )$ , so that

\begin{equation*}1 - h_1(\delta ) = (1-\delta )(1-4\delta )\mathbb E_Y\big[\alpha \textbf {1}_{ \Pi _Y{H'}}\big] \leq (1-\delta )\mathbb E_{x', Y}\left [\alpha \textbf {1}_{H'}\right ].\end{equation*}

Markov’s inequality then implies,

(18) \begin{equation} \mathbb P\left (\beta \textbf{1}_{H'}\geq (1-\delta )\alpha \textbf{1}_{H'}+\sqrt{h_1(\delta )}\right )\leq \frac{\mathbb E_{x', Y}\left [\beta \textbf{1}_{H'}\right ]-(1-\delta )\mathbb E_{x', Y}\left [\alpha \textbf{1}_{H'}\right ]}{\sqrt{h_1(\delta )}}\leq \sqrt{h_1(\delta )}. \end{equation}

Now, we integrate over $x'$ to obtain,

\begin{equation*}(1-\delta )(1-4\delta )\alpha \textbf {1}_{\Pi _Y{H'}}\leq (1-\delta )\alpha \mathbb E_{x'}\left [\textbf {1}_{H'}\right ] \leq \mathbb E_{x'}\left [\beta \textbf {1}_{H'}\right ] \leq 1,\end{equation*}

which implies

(19) \begin{equation} \alpha \textbf{1}_{H'} \leq \frac{1}{(1-\delta )(1-4\delta )}. \end{equation}

Keeping in mind the previous displays, we may choose $h_2(\delta )$ , which satisfies $\lim \limits _{\delta \to 0} h_2(\delta ) = 0$ , $\alpha \textbf{1}_{H'} \leq 1 + h_2(\delta )$ and $\mathbb E_{x', Y}\left [\alpha \textbf{1}_{H'}\right ] \geq 1 - h_2(\delta )$ .

So, an application of the reverse Markov’s inequality for bounded and positive random variables shows,

(20) \begin{align} \mathbb P\left (\alpha \textbf{1}_{H'}\geq 1-\sqrt{h_2(\delta )}\right ) \geq \frac{\mathbb E_{x', Y}\left [\alpha \textbf{1}_{H'}\right ] - (1-\sqrt{h_2(\delta )})}{1+h_2(\delta ) - (1-\sqrt{h_2(\delta )})} &\geq \frac{1-h_2(\delta ) - (1-\sqrt{h_2(\delta )})}{1+h_2(\delta ) - (1-\sqrt{h_2(\delta )})}\nonumber \\[5pt] &= \frac{\sqrt{h_2(\delta )} - h_2(\delta )}{\sqrt{h_2(\delta )} + h_2(\delta )}. \end{align}

Note that as $\delta \to 0$ the RHS goes to $1$ . Thus, by combining the above displays, there exist a function $h$ , which satisfies $\lim \limits _{\delta \to 0}h(\delta ) = 0$ , and some $H'' \subset H'$ with $\mathbb P\left (H''\right ) \geq 1 -h(\delta )$ , such that, by (19) and (20),

\begin{equation*} \textbf {1}_{H''}|\alpha -1|\leq h(\delta ), \end{equation*}

which implies, together with (17) and (18)

\begin{equation*} \textbf {1}_{H''}|\alpha -\beta |\leq h(\delta ). \end{equation*}

This then gives

\begin{equation*} \textbf {1}_{H''}|1 -\beta |\leq 2h(\delta ). \end{equation*}

We can thus conclude,

\begin{align*} \mathbb E_Y \mathrm{TV} \left ( X|Y, X \right ) &= \mathbb E_{Y,x'} \left [ |\beta - 1| \right ] = 2\mathbb E_{Y,x'} \left [(1-\beta )\textbf{1}_{\beta \leq 1} \right ] \\[5pt] &\leq 2 \mathbb E_{Y,x'} \left [(1-\beta )\textbf{1}_{\beta \leq 1}\textbf{1}_{H''}\right ] +1-\mathbb P\left (H''\right )\\[5pt] &= 2 \mathbb E_{Y,x'} \left [|1-\beta |\textbf{1}_{H''}\right ] +h(\delta )\\[5pt] &\leq 5h(\delta ).\\[5pt] \end{align*}

Take now $\delta \to 0$ to get $\mathbb E_Y \mathrm{TV} \left ( X|Y, X \right ) \to 0$ .

It remains to prove (16). Assume, towards a contradiction, that there exist $x, x' \in \mathbb{S}^{d-1}$ and a set $F\subset (\mathbb{S}^{d-1})^{|T_k|}$ such that

(21) \begin{equation} \mathbb P\left (Y \in F | X = x \right ) \geq \delta, \end{equation}

and under $\{Y \in F \}$ ,

(22) \begin{equation} \frac{\rho _{X|Y}(x)}{\rho _{X|Y}(x')} \geq 1+ \delta, \end{equation}

for some constant $\delta \gt 0$ .

Let $w\in \mathbb{S}^{d-1}$ be such that the reflection $R\;:\!=\;\mathrm{Id}_d - 2ww^T$ satisfies $Rx = x'$ . Under our assumption, there exists an event $A_k$ , which satisfies

\begin{equation*} \mathbb P(A_k|X=x) = 1 - o(1) \end{equation*}

and such that $Y|(X=x,A_k)$ is $R$ -invariant. By (21), we also have

\begin{equation*}\mathbb P(A_k|X=x,Y \in F) = 1 - o(1),\end{equation*}

and thus there exists $y \in F$ such that

(23) \begin{equation} \mathbb P(A_k | X=x,Y = y ) = 1-o(1). \end{equation}

By continuity, we can make sense of conditioning on the zero probability event $E \;:\!=\; \bigl \{X = x, \; Y \in \{y, R y\} \bigr \}$ , in the sense of regular conditional probabilities. Note that, by symmetry and since $y \in F$ , we have

(24) \begin{equation} Y = y \;\; \Rightarrow \;\; 1+\delta \leq \frac{\rho _{X|Y}(x)}{\rho _{X|Y}(x')} = \frac{\mathbb P( Y = y | E )}{ \mathbb P( Y = R y | E )}. \end{equation}

On the other hand, we have by definition of $A_k$ ,

\begin{equation*} \mathbb P( Y = R y | E, A_k ) = \mathbb P( Y = y | E, A_k ), \end{equation*}

which implies that

\begin{align*} \mathbb P( Y = R y | E ) & \geq \mathbb P \bigl ( \{Y = R y \} \cap A_k | E \bigr ) \\[5pt] & = \mathbb P \bigl ( \{Y = y \} \cap A_k | E \bigr ) \\[5pt] & \geq \mathbb P( Y = y | E ) (1-o(1)) \end{align*}

which contradicts (24). The proof is complete.

3.1 The Gaussian case

Our aim is to show that certain classes of kernel functions satisfy the DPS condition. We begin by considering the case where the kernel $\varphi$ is Gaussian, as in (8). In this case, the function $g$ may be defined as follows. Let $T$ be a $q$-ary tree. To each edge $e \in E(T)$ we associate an independent Brownian motion $(B_e(t))_{t \in [0,1]}$ of rate $\beta$ such that for every node $v \in V$ we have

\begin{equation*} g(v) = g(r) + \sum _{e \in P(v)} B_e(1) \; \mathrm {mod}\; 2 \pi \end{equation*}

where $P(v)$ denotes the shortest path from the root to $v$ .

For every node $v \in T_k$ let us now consider the Brownian motion $(B_v(t))_{t\in [0,k]}$ defined by concatenating the Brownian motions $B_e$ along the edges $e \in P(v)$ . Define by $E_v$ the event that the image of $B_v \;\mathrm{mod}\; \pi$ contains the entire interval $[0, \pi )$ , and define $E_k = \bigcap _{v \in T_k} E_v$ . Our lower bound relies on the following observation.

Claim 17. Set $p_k \;:\!=\; \mathbb P(E_k)$ . Then, for every $\theta ,x_0 \in \mathbb{S}^1$ , $\mathrm{Law}((g(v))_{v\in T_k}|g(r)=x_0)\in \mathrm{DPS}^k_\theta (p_k).$

Proof. Fix $\theta \in [0, \pi )$ . Given the event $E_v$ , we have almost surely that there exists a time $t_v \leq k$ such that $B_v(t_v) \in \{\theta - \pi, \theta \}$ . By symmetry and by the Markov property of Brownian motion, the distribution of $g(v)$ conditioned on the event $\{B_v(t_v) = \theta, \;\; \forall v\}$ is symmetric around $\theta$ . Thus, by considering $(B_v(t))_{v\in T_k}$ under the event $\{t_v \leq k \text{ for all } v \in T_k \}$ , we get that for any $x_0$ , $\mathrm{Law}((g(v))_{v\in T_k}|g(r)=x_0)$ is symmetric around $\theta$ . So,

\begin{equation*}\mathrm {Law}((g(v))_{v\in T_k}|g(r)=x_0)\in \mathrm {DPS}^k_\theta (p_k).\end{equation*}

We will also need the following bound, shown for example in [Reference Ernst and Shepp8].

Lemma 18. Let $B(t)$ be a Brownian motion of rate $\beta$ on the unit circle. Then,

\begin{equation*} \mathbb P( \mathrm {Image} (B(s) \mathrm {mod} \pi )_{0 \leq s \leq t} = [0, \pi ) ) \geq 1- C e^{- t \beta/2}. \end{equation*}
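The coverage probability in Lemma 18 can be checked numerically; here is a small discretised Monte Carlo sketch (assuming numpy; the discretisation of the path and the parameter choices are illustrative only):

import numpy as np

rng = np.random.default_rng(0)

def covers_mod_pi(beta, t, steps_per_unit=2000, bins=200):
    """Does a discretised Brownian path of rate beta on [0, t] cover [0, pi) modulo pi?"""
    n_steps = int(t * steps_per_unit)
    increments = rng.normal(0.0, np.sqrt(beta / steps_per_unit), size=n_steps)
    path = np.concatenate(([0.0], np.cumsum(increments))) % np.pi
    hit = np.zeros(bins, dtype=bool)
    hit[np.minimum((path / np.pi * bins).astype(int), bins - 1)] = True
    return hit.all()

# Monte Carlo estimate of the coverage probability; Lemma 18 predicts >= 1 - C * exp(-t * beta / 2)
print(np.mean([covers_mod_pi(beta=1.0, t=8.0) for _ in range(200)]))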

We are now in a position to prove Theorem 7.

Proof of Theorem 7. Lemma 18 immediately implies that for all $v \in T_k$ ,

\begin{equation*} \mathbb P(E_v) \geq 1 - C e^{-k \beta/ 2}. \end{equation*}

On the other hand, a calculation (using the characteristic function of a Gaussian) gives $\lambda (\varphi ) = \mathbb E[\cos (B_1)] = e^{-\beta/2}$ . Thus, applying a union bound implies that for some constant $C \gt 0$ , $\mathbb P(E_k)\geq 1-Cq^k\lambda (\varphi )^k$ . Hence, by Claim 17,

\begin{equation*}\mathrm {Law}((g(v))_{v\in T_k}|g(r)=x_0) \in \mathrm {DPS}^k_w(1-Cq^k\lambda (\varphi )^k) \quad \text{for every } w \in \mathbb{S}^1.\end{equation*}

The result is now a direct consequence of Lemma 16.

In the next section, we generalise the above ideas and obtain a corresponding bound which holds for distributions other than the Gaussian one.

3.2 The general case

3.2.1 On symmetric functions

We begin this section with a simple criterion to determine whether a measure belongs to some $\mathrm{DPS}$ class. In the sequel, for $w \in \mathbb{S}^{d-1}$ , we denote,

\begin{equation*} H_w^+ =\{x \in \mathbb {S}^{d-1}| \langle x,w\rangle \geq 0\}\text { and }H_w^- =\{x \in \mathbb {S}^{d-1}| \langle x,w\rangle \lt 0\}. \end{equation*}

Lemma 19. Let $f\;:\; [-1,1] \to \mathbb R^+$ satisfy the assumptions of Theorem 9 and let $y \in \mathbb{S}^{d-1}$ . If $\mu = f(\langle \cdot,y\rangle )d\sigma$ , then for any $w \in \mathbb{S}^{d-1}$ , $\mu \in \mathrm{DPS}_w(2\cdot p_w)$ , where $p_w = \min \left (\mu (H_w^+),\mu (H_w^-)\right )$ .

Proof. Without loss of generality let us assume that $y \in H_w^+$ . Monotonicity of $f$ implies $\mu (H_w^+)\geq \mu (H_w^-)$ . Now, if $R = \mathrm{Id}_d - 2ww^T$ is the reflection matrix with respect to $w^\perp$ , then we have for any $x \in H^-_w$ ,

\begin{equation*} f(\langle x,y\rangle ) \leq f(\langle Rx,y\rangle ). \end{equation*}

This follows since $\langle x,y\rangle \leq \langle Rx,y\rangle$ .

Let us now define the measure $\tilde{\mu }_w^s$ such that

\begin{equation*} \frac {d\tilde {\mu }_w^s}{d\sigma }(x) = \left \{\begin {array}{ll} f(\langle x,y\rangle ) & \text { if }x \in H^-_w\\[5pt] f(\langle Rx,y\rangle )& \text { if }x \in H^+_w. \end {array}\right . \end{equation*}

$\tilde{\mu }_w^s$ is clearly $R$ -invariant and the above observation shows that $\tilde{\mu }_w^s(\mathbb{S}^{d-1}) \leq 1$ . We can thus define $\tilde{\mu }_w = \mu - \tilde{\mu }_w^s$ . To obtain a decomposition, define $\mu _w^s = \frac{\tilde{\mu }^s_w}{\tilde{\mu }_w^s(\mathbb{S}^{d-1})}$ and $\mu _w = \frac{\tilde{\mu }_w}{\tilde{\mu }_w(\mathbb{S}^{d-1})}$ , for which

\begin{equation*}\mu = (1-\tilde {\mu }^s_w(\mathbb {S}^{d-1}))\mu _w + \tilde {\mu }_w^s(\mathbb {S}^{d-1})\mu _w^s.\end{equation*}

The proof is concluded by noting $\tilde{\mu }_w^s(\mathbb{S}^{d-1}) = 2\mu (H_w^-)$ .

Our main object of interest will be the measure $\mu _\varphi (y)$ which, for a fixed $y$ , is defined by

(25) \begin{equation} \mu _\varphi (y) \;:\!=\; \varphi (x,y)d\sigma (x) = f (\langle x,y\rangle )d\sigma (x). \end{equation}

Let us now denote,

\begin{equation*} \beta _d(t) \;:\!=\; \frac {\Gamma (\frac {d}{2})}{\Gamma (\frac {d-1}{2})}(1-t^2)^{(d-3)/2}, \end{equation*}

which is the density of the marginal of a single coordinate under the uniform distribution on the sphere. We now show that the spectral gap of $\varphi$ may determine the $\mathrm{DPS}$ properties of $\mu _\varphi (y)$ .

Lemma 20. Let $w \in \mathbb{S}^{d-1}$ and suppose that $f$ is monotone. If $|\langle w,y\rangle |\leq \frac{1-\lambda (\varphi )}{16\sqrt{d}}$ , then

\begin{equation*}\mu _\varphi (y) \in \mathrm {DPS}_w\left (\frac {1-\lambda (\varphi )}{35}\right ).\end{equation*}

Proof. Assume W.L.O.G. that $\langle y,w\rangle \gt 0$ . By Lemma 19 it will be enough to bound $\int \limits _{H_w^-}\mu _{\varphi }(y)$ from below. Let $X \sim \mu _{\varphi }(y)$ and define $Z = \langle X,y\rangle$ . We have $\mathbb E\left [Z\right ] = \lambda (\varphi )$ and by Markov’s inequality,

\begin{equation*}\mathbb P\left (Z \leq \frac {1+\lambda (\varphi )}{2}\right ) = \mathbb P\left (Z + 1 \leq \frac {1+\lambda (\varphi )}{2} + 1\right ) \geq 1 - \frac {2(\lambda (\varphi ) +1)}{3 + \lambda (\varphi )}\geq \frac {1-\lambda (\varphi )}{4}.\end{equation*}

For $t \in [-1,1]$ , set $S_t = \{x \in \mathbb{S}^{d-1}| \langle x,y\rangle = t\}$ and let $\mathcal{H}^{d-2}$ stand for the $(d-2)$-dimensional Hausdorff measure. To bound $\int \limits _{H_w^-}\mu _{\varphi }(y)$ we would first like to estimate $\frac{\mathcal{H}^{d-2}(S_t\cap H_w^-)}{\mathcal{H}^{d-2}(S_t)}$ .

We know that $0 \leq \langle w,y\rangle \leq \frac{1-\lambda (\varphi )}{16\sqrt{d}}$ . Denote $t_y \;:\!=\; \langle w,y\rangle$ and fix $t \leq t_0 \;:\!=\; \frac{1+\lambda (\varphi )}{2}$ . With no loss of generality, let us write $w = e_1$ and $y = t_ye_1 + \sqrt{1-t_y^2}e_2$ . Define now $z = -\sqrt{1-t_y^2}e_1+t_ye_2$ . We claim that

(26) \begin{equation} \left \{v\in S_t\Big | \frac{\langle v,z\rangle }{\sqrt{1-t^2}} \geq \frac{1}{2\sqrt{d}}\right \}\subseteq S_t\cap H_w^-. \end{equation}

If $v\in S_t$, its projection onto the plane $\mathrm{span}(y,w)$ can be written as $t\cdot y + \sqrt{1-t^2}c\cdot z$, for some $c\in [-1,1]$. So,

\begin{equation*}\langle v,w\rangle = t\cdot t_y - \sqrt {1-t^2}\sqrt {1-t_y^2}c.\end{equation*}

Now, whenever

\begin{equation*}c \gt \frac {t\cdot t_y}{\sqrt {1-t^2}\sqrt {1-t_y^2}},\end{equation*}

we get $\langle v,w\rangle \lt 0$ . Also,

\begin{equation*} \frac {t\cdot t_y}{\sqrt {1-t^2}\sqrt {1-t_y^2}}\leq \frac {1}{\sqrt {3}}\frac {t_y}{\sqrt {1-t_y^2}}\leq \frac {1}{2\sqrt {d}},\end{equation*}

where we have used $ t \leq \frac{1}{2}$ for the first inequality and $t_y \leq \frac{1}{2\sqrt{d}}$ for the second. By combining the above displays with $\frac{\langle v,z\rangle }{\sqrt{1-t^2}} = c$, (26) is established. Thus, by taking the marginal of $S_t$ in the direction of $-z$, we see

\begin{align*} \frac{\mathcal{H}^{d-2}(S_t\cap H_w^-)}{\mathcal{H}^{d-2}(S_t)} &\geq \int \limits _{-1}^{-\frac{1}{2\sqrt{d}}}\beta _{d-1}(s)ds\geq \int \limits _{-\frac{1}{\sqrt{d}}}^{-\frac{1}{2\sqrt{d}}}\beta _{d-1}(s)ds\geq \frac{1}{2\sqrt{d}}\beta _{d-1}\left (\frac{1}{\sqrt{d}}\right )\\[5pt] &\geq \frac{1}{2\sqrt{d}}\frac{\Gamma (\frac{d-1}{2})}{\Gamma (\frac{d-2}{2})}\left (1-\frac{1}{d}\right )^{(d-4)/2}\geq \frac{1}{10\sqrt{e}}, \end{align*}

where we used $\frac{\Gamma (\frac{d-1}{2})}{\Gamma (\frac{d-2}{2})} \geq \frac{\sqrt{d}}{5}$, which is valid for any $d \geq 3$. Combining the above estimates with Fubini’s theorem, we obtain:

\begin{align*} \mathbb P\left (X \in H_w^-\right ) &= \int \limits _{-1}^1f(t)\mathcal{H}^{d-2}(S_t\cap H_w^-)dt \geq \int \limits _{-1}^{t_0}f(t)\mathcal{H}^{d-2}(S_t\cap H_w^-)dt\\[5pt] &\geq \frac{1}{10\sqrt{e}}\int \limits _{-1}^{t_0}f(t)\mathcal{H}^{d-2}(S_t)dt = \frac{1}{10\sqrt{e}}\mathbb P\left (Z \leq \frac{1+\lambda (\varphi )}{2}\right )\geq \frac{1-\lambda (\varphi )}{70}. \end{align*}

Together with Lemma 19, this shows that $\mu _\varphi (y) \in \mathrm{DPS}_w\left (\frac{1-\lambda (\varphi )}{35}\right )$, as claimed.
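The bound just proved can also be checked by simulation. The sketch below (not part of the paper) again uses the illustrative kernel $f(t) \propto e^{ct}$: it draws $X \sim \mu _\varphi (y)$ by rejection sampling, estimates $\lambda (\varphi ) = \mathbb E[Z]$ empirically as in the proof, picks a direction $w$ with $|\langle w,y\rangle |$ below the required threshold, and compares $\mu _\varphi (y)(H_w^-)$ with $\frac{1-\lambda (\varphi )}{70}$.

```python
import numpy as np

rng = np.random.default_rng(2)
d, c, n = 6, 1.5, 400_000          # dimension, illustrative parameter, proposal count

y = np.zeros(d); y[0] = 1.0

# rejection sampling from the density f(<x,y>) d sigma with f(t) ∝ exp(c t):
# propose uniformly on the sphere, accept with probability f(t)/f(1) = exp(c(t-1))
prop = rng.normal(size=(n, d))
prop /= np.linalg.norm(prop, axis=1, keepdims=True)
accept = rng.random(n) < np.exp(c * (prop @ y - 1.0))
X = prop[accept]

lam = (X @ y).mean()                                   # empirical E[Z] = lambda(phi)
eps = (1 - lam) / (16 * np.sqrt(d))                    # threshold for |<w,y>|
w = np.zeros(d); w[0], w[1] = eps / 2, np.sqrt(1 - (eps / 2) ** 2)

p_minus = (X @ w <= 0).mean()                          # estimate of mu_phi(y)(H_w^-)
print(p_minus, (1 - lam) / 70)                         # the first should dominate the second
```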

3.2.2 Mixing

Recall the random function $g:T \to \mathbb{S}^{d-1}$, introduced in Section 1.2, which assigns to the root, $r$, a uniformly random value, and for any other $u \in T$, the label $g(u)$ is distributed according to $\varphi (g(\mathrm{parent}(u)),\cdot )d\sigma = \mu _\varphi (g(\mathrm{parent}(u)))$. Suppose that $v \in T_k$ and let $\{v_i\}_{i=0}^k$ denote the simple path from $r$ to $v$ in $T$. Fix $x_0 \in \mathbb{S}^{d-1}$. For $i = 0,\ldots,k$, we now regard,

\begin{equation*} X_i \;:\!=\;g(v_i)|g(r) = x_0, \end{equation*}

as a random walk on $\mathbb{S}^{d-1}$ . Observe that given $X_{i-1}$ , $X_i\sim \mu _\varphi (X_{i-1}).$ The following lemma shows that this random walk is rapidly mixing.
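Before stating that lemma, we note that once a concrete kernel is fixed, the chain is straightforward to simulate; the sketch below (ours, with the illustrative choice $f(t) \propto e^{ct}$) performs each step by rejection sampling and tracks the inner product with the starting point, which decays as the walk mixes.

```python
import numpy as np

rng = np.random.default_rng(3)
d, c = 5, 2.0                      # dimension and illustrative kernel parameter

def step(x_prev):
    """Draw one point from the density f(<x, x_prev>) d sigma(x), f(t) ∝ exp(c t)."""
    while True:
        x = rng.normal(size=d)
        x /= np.linalg.norm(x)
        if rng.random() < np.exp(c * (x @ x_prev - 1.0)):   # accept with prob. f(t)/f(1)
            return x

x0 = np.zeros(d); x0[0] = 1.0      # the conditioning g(r) = x_0
walk = [x0]
for _ in range(30):
    walk.append(step(walk[-1]))

# correlation with the starting point along the walk (every fifth step)
print([round(float(x @ x0), 3) for x in walk[::5]])
```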

Lemma 21. For $w \in \mathbb{S}^{d-1}$ , let

\begin{equation*}S(w) = \left \{u \in \mathbb {S}^{d-1}\;:\; |\langle u,w\rangle | \leq \frac {1-\lambda (\varphi )}{16\sqrt {d}}\right \},\end{equation*}

and set $k_0 = \frac{\ln \left (\frac{\lambda (\varphi )(1-\lambda (\varphi ))}{32 f(1)}\right )}{\ln (\lambda (\varphi ))}$ . Then, if $f$ satisfies the assumptions of Theorem 9 ,

\begin{equation*}\mathbb P(X_{k_0} \in S(w)) \geq \frac {1-\lambda (\varphi )}{32}.\end{equation*}

Proof. Note that if $U \sim \mathrm{Uniform}(\mathbb{S}^{d-1})$ , then

(27) \begin{equation} \mathbb P(U \in S(w)) = \int \limits _{|t| \leq \frac{1-\lambda (\varphi )}{16\sqrt{d}}}\beta _d(t)dt \geq 2\beta _d\left (\frac{1}{16\sqrt{d}}\right )\frac{1-\lambda (\varphi )}{16\sqrt{d}} \geq \frac{1-\lambda (\varphi )}{16}. \end{equation}

It will then suffice to show that $\mathbb P(X_{k_0} \in S(w))$ can be well approximated by $\mathbb P(U \in S(w))$ . Since $X_{k_0}$ has density $A^{k_0-1}_\varphi f(\langle x,x_0\rangle )$ , the following holds true,

\begin{align*} \left (\mathbb P(U \in S(w)) - \mathbb P(X_{k_0} \in S(w))\right )^2 &= \left (\int \limits _{S(w)}\left (A_\varphi ^{k_0-1}f(\langle x,x_0\rangle ) - 1\right )d\sigma (x)\right )^2\\[5pt] &\leq \int \limits _{S(w)}\left (A_\varphi ^{k_0-1}f(\langle x,x_0\rangle ) - 1\right )^2d\sigma (x). \end{align*}

We now decompose the density as $f(\langle x,x_0\rangle ) = \sum \limits _{i = 0}^\infty \lambda _i \psi _i(x)\psi _i(x_0)$ . So that

\begin{equation*}A_\varphi ^{k_0-1}f(\langle x,x_0\rangle ) = \sum \limits _{i = 0}^\infty \lambda _i^{k_0} \psi _i(x)\psi _i(x_0).\end{equation*}

We know that $\psi _0 \equiv 1$ with eigenvalue $\lambda _0 = 1$ , and we’ve assumed that $|\lambda _i|\leq \lambda _1 = \lambda (\varphi )$ for every $i \geq 1$ . Thus,

\begin{align*} \int \limits _{S(w)}\left (A_\varphi ^{k_0-1}f(\langle x,x_0\rangle ) - 1\right )^2d\sigma (x) &= \int \limits _{S(w)}\left (\sum \limits _{i =1}^\infty \lambda _i^{k_0}\psi _i(x)\psi _i(x_0)\right )^2d\sigma (x)\\[5pt] &\leq (\lambda (\varphi ))^{2k_0-2}\int \limits _{S(w)}\sum \limits _{i =1}^\infty (\lambda _i\psi _i(x))^2\sum \limits _{i =1}^\infty (\lambda _i\psi _i(x_0))^2d\sigma (x) \\[5pt] &\leq \lambda (\varphi )^{2k_0-2}f(1)^2. \end{align*}

Here, in the last inequality, we used $f(1) = \sum \lambda _i\psi _i(y)\psi _i(y)$, which is valid for any $y \in \mathbb{S}^{d-1}$. Thus, since $k_0 = \frac{\ln \left (\frac{\lambda (\varphi )(1-\lambda (\varphi ))}{32 f(1)}\right )}{\ln (\lambda (\varphi ))}$, by (27), we get,

\begin{equation*}\mathbb P(X_{k_0} \in S(w)) \geq \mathbb P(U \in S(w)) - \frac {1-\lambda (\varphi )}{32} \geq \frac {1-\lambda (\varphi )}{32}.\end{equation*}
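To make the choice of $k_0$ concrete, the short computation below uses purely hypothetical values of $\lambda (\varphi )$ and $f(1)$; it evaluates the formula for $k_0$ (rounded up to an integer number of steps, which only strengthens the bound) and verifies the inequality $\lambda (\varphi )^{k_0-1}f(1) \leq \frac{1-\lambda (\varphi )}{32}$ that drives the last display.

```python
import numpy as np

# the (lambda(phi), f(1)) pairs below are hypothetical, chosen only for illustration
for lam, f1 in [(0.3, 2.0), (0.6, 5.0), (0.9, 10.0)]:
    k0 = np.log(lam * (1 - lam) / (32 * f1)) / np.log(lam)
    k0 = int(np.ceil(k0))                              # an integer number of steps
    ok = lam ** (k0 - 1) * f1 <= (1 - lam) / 32
    print(f"lambda={lam}, f(1)={f1}: k0={k0}, bound holds: {ok}")
```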

Since the random walk $X_k$ mixes well, we now use Lemma 20 to show that after enough steps, $X_k$ will be approximately invariant to a given reflection.

Lemma 22. Let $w,x_0 \in \mathbb{S}^{d-1}$ . Then,

\begin{equation*}\mathrm {Law}((g(v))_{v \in T_k}|g(r) = x_0) \in \mathrm {DPS}^k_w\left (1- q^kp^{k}\right ), \end{equation*}

where $p = \left (1-\frac{\ln (\lambda (\varphi ))}{\ln \left (\frac{\lambda (\varphi )(1-\lambda (\varphi ))}{32 f(1)}\right )}\frac{(1-\lambda (\varphi ))^2}{600}\right )$ .

Proof. Let $R =\mathrm{Id}_d - 2ww^T$ denote the linear reflection with respect to $w^\perp$ . Then, the claim is equivalent to the decomposition,

\begin{equation*}X_k = P\tilde {X}_k + (1-P)X_k^R,\end{equation*}

where $X_k^R$ is invariant under the reflection $R$ and $P \sim \mathrm{Bernoulli}(s_k)$ is independent of $\{\tilde{X}_k,X_k^R\}$ with $s_k \leq \left (1-\frac{\ln (\lambda (\varphi ))}{\ln \left (\frac{\lambda (\varphi )(1-\lambda (\varphi ))}{32 f(1)}\right )}\frac{(1-\lambda (\varphi ))^2}{600}\right )^{k}$.

Consider the case that for some $i = 0,\ldots,k$ , $|\langle X_i, w \rangle | \leq \frac{1-\lambda (\varphi )}{16\sqrt{d}}$ . In this case, from Lemma 20, given $X_i$ , we have the decomposition,

\begin{equation*}\mu _\varphi (X_i) = \left (1 - \frac {(1-\lambda (\varphi ))}{35}\right )\mu _{\varphi,w}(X_i) + \frac {(1-\lambda (\varphi ))}{35}\mu _{\varphi,w}^s(X_i),\end{equation*}

where $\mu _{\varphi,w}^s(X_i)$ is $R$ -invariant.

We now generate the random walk in the following way. For $i = 0,\ldots,k$ , let

(28) \begin{equation} P_i \sim \mathrm{Bernoulli}\left (\frac{(1-\lambda (\varphi ))}{35}\right ), \end{equation}

be an i.i.d. sequence. Given $X_i$, if $|\langle X_i, w \rangle | \gt \frac{1-\lambda (\varphi )}{16\sqrt{d}}$ then $X_{i+1} \sim \mu _\varphi (X_i)$. Otherwise, $|\langle X_i, w \rangle | \leq \frac{1-\lambda (\varphi )}{16\sqrt{d}}$. To decide on the position of $X_{i+1}$ we consider $P_i$. If $P_i = 0$ then $X_{i+1} \sim \mu _{\varphi,w}(X_i)$. If $P_i = 1$, we generate $X_{i+1} \sim \mu _{\varphi,w}^s(X_i)$. We denote the latter event by $A_i$ and set $A = \cup _{i=0}^{k-1}A_i$.

It is clear that, conditional on $A$ , $RX_k \stackrel{law}{=} X_k$ . Thus, to finish the proof, if $\bar{A}$ is the complement of $A$ , we will need to show

\begin{equation*}\mathbb P(\bar {A}) \leq \left (1-\frac {\ln (\lambda (\varphi ))}{\ln \left (\frac {\lambda (\varphi )(1-\lambda (\varphi ))}{32 f(1)}\right )}\frac {(1-\lambda (\varphi ))^2}{600}\right )^{k}.\end{equation*}

Towards this, let $S(w)$ and $k_0$ be as in Lemma 21. Coupled with (28), the lemma tells us that

\begin{equation*}\mathbb P\left (A_{k_0}\right ) \geq \frac {(1-\lambda (\varphi ))^2}{600}.\end{equation*}

Now, by restarting the random walk from $X_{k_0}$ if needed, we may show,

\begin{equation*}\mathbb P\left (\bar {A}\right )\leq \mathbb P\left (\bigcap \limits _{m\leq \frac {k}{k_0}}\bar {A}_{m\cdot k_0}\right ) \leq \left (1- \frac {(1-\lambda (\varphi ))^2}{600}\right )^{\frac {k}{k_0}} \leq \left (1- \frac {(1-\lambda (\varphi ))^2}{600k_0}\right )^{k}.\end{equation*}

The claim now follows by taking a union bound over all paths. Indeed, for each $v \in T_k$ define the corresponding event $A_v$ . Then, with a union bound,

\begin{equation*}\mathbb P\left (\bigcup _{v \in T_k}\bar {A_v}\right ) \leq |T_k|p^k \leq q^kp^k.\end{equation*}

By definition, conditioned on $\bigcap _{v\in T_k}{A_v}$, the law of $\{g(v)\}_{v\in T_k}$ is invariant under the reflection $R$, which completes the proof.

3.2.3 Proving Theorem 9

Proof. By Lemma 22, for every $w,x_0 \in \mathbb{S}^{d-1}$,

\begin{equation*} \mathrm {Law}((g(v))_{v\in T_k}|g(r) = x_0) \in \mathrm {DPS}^k_w(1-q^kp^k), \end{equation*}

where $p = \left (1-\frac{\ln (\lambda (\varphi ))}{\ln \left (\frac{\lambda (\varphi )(1-\lambda (\varphi ))}{32 f(1)}\right )}\frac{(1-\lambda (\varphi ))^2}{600}\right )$ . By assumption

\begin{equation*} q \leq p^{-1}, \end{equation*}

and Lemma 16 gives the result.
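For a sense of scale, the following computation (with hypothetical parameter values, not taken from the paper) evaluates $p$ from Lemma 22, the threshold $p^{-1}$ appearing in the assumption on $q$, and the resulting bound $q^kp^k$ at a moderate depth.

```python
import numpy as np

def p_of(lam, f1):
    """The quantity p from Lemma 22 as a function of lambda(phi) and f(1)."""
    k0 = np.log(lam * (1 - lam) / (32 * f1)) / np.log(lam)
    return 1 - (1 - lam) ** 2 / (600 * k0)

k = 50                                                  # an illustrative depth
for lam, f1 in [(0.3, 2.0), (0.6, 5.0), (0.9, 10.0)]:   # hypothetical values
    p = p_of(lam, f1)
    q = 0.9 / p                                         # a branching number with q <= 1/p
    print(f"lambda={lam}: p={p:.5f}, 1/p={1/p:.5f}, (q*p)^k={(q * p) ** k:.3e}")
```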

Acknowledgments

We are grateful to two anonymous referees for carefully reading the paper and providing many insightful comments.

Footnotes

Supported by a European Research Council Starting Grant (ERC StG) and by an Israel Science Foundation Grant no. 715/16.

Supported by an Azrieli foundation fellowship.


Supported by an Israel Science Foundation Grant no. 715/16.
