Hostname: page-component-586b7cd67f-l7hp2 Total loading time: 0 Render date: 2024-11-21T23:28:20.405Z Has data issue: false hasContentIssue false

Exact recovery of community detection in k-community Gaussian mixture models

Published online by Cambridge University Press:  18 September 2024

Zhongyang Li*
Affiliation:
Department of Mathematics, University of Connecticut, Storrs, Connecticut, USA
Rights & Permissions [Opens in a new window]

Abstract

We study the community detection problem on a Gaussian mixture model, in which vertices are divided into $k\geq 2$ distinct communities. The major difference in our model is that the intensities for Gaussian perturbations are different for different entries in the observation matrix, and we do not assume that every community has the same number of vertices. We explicitly find the necessary and sufficient conditions for the exact recovery of the maximum likelihood estimation, which can give a sharp phase transition for the exact recovery even though the Gaussian perturbations are not identically distributed; see Section 7. Applications include the community detection on hypergraphs.

Type
Papers
Creative Commons
Creative Common License - CCCreative Common License - BY
This is an Open Access article, distributed under the terms of the Creative Commons Attribution licence (http://creativecommons.org/licenses/by/4.0/), which permits unrestricted re-use, distribution and reproduction, provided the original article is properly cited.
Copyright
© The Author(s), 2024. Published by Cambridge University Press

1. Introduction

Community structures are ubiquitous in graphs modelling natural and social phenomena. In natural sciences, atoms form molecules so that atoms in the same molecule have stronger connections compared to those in different molecules. In social sciences, individuals form groups in such a way that individuals in the same group have more communications compared to individuals in different groups. The main aim for community detection is to determine the specific groups that specific individuals belong to based on observations of (random) connections between individuals. Identifying different communities in the stochastic block model is a central topic in many fields of science and technology; see [Reference Abbe1] for a summary.

In this paper, we study the community detection problem for the Gaussian mixture model, in which there are $n$ vertices belonging to $k$ ( $k\geq 2$ ) different communities. We observe a $p\times 1$ vector for each one of the $n$ vertices, perturbed by a $p\times 1$ Gaussian vector with independent (but not necessarily identically distributed), mean-0 entries. More precisely, each entry of the $p\times n$ perturbation matrix is obtained by a multiple of a standard Gaussian random variable, while the intensities of different entries are different. Given such an observation, we find the maximum likelihood estimation (MLE) for the community assignment and study the probability that the MLE equals the true community assignment as the number of vertices $n\rightarrow \infty$ . If this probability tends to $1$ as $n\rightarrow \infty$ , we say exact recovery occurs. Heuristically, it is natural to conjecture that exact recovery may occur when the intensities of the perturbations are small but does not occur when these intensities are large. The major theme of the paper is to investigate how small the intensities of the perturbations are needed in order to ensure the exact recovery and how large the intensities are required to stop the occurrences of the exact recovery.

Clustering problems in the Gaussian mixture model have been studied extensively; see [Reference Dempster, Laird and Rubin5, Reference Fraley and Raftery7, Reference McQueen15, Reference Pollard18] for an incomplete list. We mention some recent related work here.

The Gaussian mixture model when all the entries of the perturbation matrix are i.i.d was studied in [Reference Chen and Yang4], in which a condition for the exact recovery of the semi-definite programming (SDP) is proved. When all the communities have the same size, a condition that exact recovery does not occur was also proved in [Reference Chen and Yang4] when the number of communities $k\leq \log n$ . The case of unbalanced communities was investigated in [Reference Giraud and Verzelen8]. In this paper, we obtain conditions when the exact recovery happens and does not happen for the more general Gaussian mixture model when the entries of the perturbation matrix are not necessarily identically distributed. Our result can be applied to the special case when intensities of the Gaussian perturbations are all equal, and in particular, we obtain a condition that the exact recovery of MLE does not occur when the number of communities $k$ is $e^{o(\log n)}$ in the hypergraph model; see the explanations at the end of Example 4.5. We can also see from the sufficient condition (17) for the exact recovery and the sufficient condition (22) that the exact recovery does not occur that when $k$ is $e^{o(\log n)}$ , these two conditions match and there is a sharp phase transition. In Section 7, we see an example in which these necessary and sufficient conditions of the exact recovery give a sharp phase transition when the Gaussian perturbations are not i.i.d.

When $p=n$ in our model, we may consider the rows and columns of the observation matrix are indexed by the $n$ vertices, and each entry represents an edge. In this case, we obtain the community detection problem on a graph. When $p=n^s$ with $s\geq 2$ , we may consider the rows of the observation matrix are indexed by ordered $s$ -tuples of vertices, and each entry of the observation matrix represents a $(s+1)$ -hyperedge. In this case, we obtain the community detection problem on a hypergraph. Community detections on hypergraphs with Gaussian perturbations were studied in [Reference Kim, Bandeira and Goemans11], where the vertices are divided into two equal-sized communities, and a weight-1 $(d+1)$ -hyperedge exists if and only if all the vertices are in the same group. The results proved in this paper can be applied to the community detection problems on hypergraphs with Gaussian perturbation to obtain necessary and sufficient conditions for the exact recovery, in which the number of communities is arbitrary and communities are not necessarily equal-sized; moreover, the hyperedges have general weights as represented in the (unperturbed) observation matrix. Community detection problems on random graphs were also studied in [Reference Abbe and Sandon2, Reference Abbe, Bandeira and Hall3, Reference Dyer and Frieze6, Reference Holland, Laskey and Leinhardt9, Reference Massoulié14, Reference Mossel, Neeman and Sly16]. Algorithms to efficiently implement the community recovery in mixture models include the Expectation-Maximization (EM) algorithm ([Reference Dempster, Laird and Rubin5]) and the spectral method ([Reference Kannan, Salmasian and Vempala10, Reference Löffler, Zhang and Zhou12, Reference Vempalaa and Wang19]). The EM algorithm is a local search heuristic that can fail. The spectral method is based on the singular value decomposition of the dataset and then only use data related to a fixed finite number of largest singular values to achieve dimension reduction. The performance of the spectral method when the Gaussian perturbations consist of i.i.d. Gaussian random variables was discussed in [Reference Löffler, Zhang and Zhou12]. The major differences between results in this paper and those in [Reference Löffler, Zhang and Zhou12] are (1) this paper focuses on exact recovery, that is, the probability that the estimation is exactly equal to the true community assignment and no mislabel is allowed, while [Reference Löffler, Zhang and Zhou12] allows a small number of mislabelled vertices (almost exact recovery) and (2) [Reference Löffler, Zhang and Zhou12] focuses on Gaussian mixture model with isotropic covariance matrix; while in this paper, the covariances matrices for the model can be more general. Partial recovery was also discussed in [Reference Giraud and Verzelen8].

The implement of the MLE is usually very slow; in some cases, relaxing some constraints in MLE leads to the more efficient SDP. However, most SDPs require the partition to be balanced or that, at least, the size of each group is known in advance. The MLE optimisation discussed here has the advantage to attack the case of unbalanced sizes of groups, and the case when the size of each group is unknown. When the perturbations for the entries of the observation matrix are assumed to be i.i.d. Gaussian, and using the average value of the observations associated with each group to approximated the expectation of the observation to the group, in this case the MLE becomes the K-means estimation. The SDP relaxation for K-means have been studied extensively, see for example [Reference Peng and Wei17]. We expect a similar SDP relaxation for K-means in the dependent case in which the inner product in the objective function should be defined with respect to the inverse of the covariance matrix; yet in this paper, we shall focus on the more fundamental MLE and try to investigate its statistical limit.

The organisation of the paper is as follows. In Section 2, we review the definition of the Gaussian mixture models and hypergraphs and state the main results proved in this paper. In Section 3, we prove conditions for the exact recovery of the Gaussian mixture model when the number of vertices in each community is unknown. In Section 4, we apply the results proved in Section 3 to the exact recovery of the community detection in hypergraphs and also prove conditions when exact recovery does not occur in hypergraphs under the assumption that the number of vertices in each community is unknown. In Section 5, we prove conditions for the exact recovery of the Gaussian mixture model when the number of vertices in each community is known and fixed. In Section 6, we prove conditions when exact recovery does not occur in hypergraphs under the assumption that the number of vertices in each community is known and fixed. In Section 7, we give an example in which these necessary and sufficient conditions of the exact recovery give a sharp phase transition when the Gaussian perturbations are not i.i.d.; results of numerical experiments are also included to show the performance of MLE with difference parameters to illustrate the sharp phase transition. In Section A, we prove a lemma used to obtain the main results of the paper.

2. Backgrounds and main results

In this section, we review the definition of the Gaussian mixture models and hypergraphs and state the main results proved in this paper.

2.1. Gaussian mixture model

Let $n\geq k\geq 2$ be positive integers. Let

\begin{eqnarray*} [n]=\{1,2,\ldots,n\} \end{eqnarray*}

be a set of $n$ vertices divided into $k$ different communities. Let

\begin{eqnarray*} [k]\;:\!=\;\{1,\ldots,k\} \end{eqnarray*}

be the set of communities. A mapping $x\; :\; [n]\rightarrow [k]$ which assigns a unique community represented by an integer in $[k]$ to each one of the $n$ vertices in $[n]$ is called a community assignment mapping. Let $\Omega$ be the set consisting of all the possible mappings from $[n]$ to $[k]$ , that is,

\begin{eqnarray*} \Omega \;:\!=\;\{x\; :\; [n]\rightarrow [k]\}. \end{eqnarray*}

Each mapping in $\Omega$ is a community assignment mapping.

We shall define a function $\theta$ , which assigns to each vertex a $p\times 1$ vector, depending on the community of this vertex. Indeed, we require that all the vertices in the same community correspond to the same $p\times 1$ vector under $\theta$ . Observing $\theta$ perturbed by some Gaussian noise, our goal is to identify the community assignment and determine when exactly recovery occurs. It is natural to believe that those vertices have corresponding observations close to each other are in the same community, while those vertices whose corresponding observations are far away from each other are in different communities, but this intuition may be complicated by the existence of the Gaussian noise. See Sections 4.1, 4.2 and 4.3 for examples with specific $\theta$ .

More precisely, let $p\geq 1$ be a positive integer. Let

\begin{eqnarray*} \theta \; :\;\Omega \times [p]\times [k]\rightarrow{\mathbb R} \end{eqnarray*}

be a function on the set $\Omega \times [p]\times [k]$ taking real values.

For a community assignment mapping $x\in \Omega$ , let $\mathbf{A}_x$ be a $p\times n$ matrix whose entries are given by:

(1) \begin{eqnarray} (\mathbf{A}_x)_{i,j}=\theta (x,i,x(j)),\ \forall i\in [p], j\in [n]. \end{eqnarray}

Let $\Sigma$ be a $p\times n$ matrix with positive real entries defined by:

\begin{eqnarray*} \Sigma \;:\!=\;(\sigma _{i,j})_{i\in [p],j\in [n]}\in ({\mathbb R}^{+})^{p\times n} \end{eqnarray*}

Let $P,Q$ be two $p\times n$ matrices. Define the inner product of $P,Q$ by

\begin{eqnarray*} \langle P,Q\rangle =\sum _{i\in [p]}\sum _{j\in [n]} P_{i,j}Q_{i,j}. \end{eqnarray*}

The norm $\|P\|$ for a matrix $P$ is defined by:

\begin{eqnarray*} \|P\|=\sqrt{\langle P, P\rangle }. \end{eqnarray*}

Let $P*Q$ be a $p\times n$ matrix defined by:

\begin{eqnarray*} P*Q\;:\!=\;(P_{i,j}Q_{i,j})_{i\in [p],j\in [n]} \end{eqnarray*}

Define a random observation matrix $\mathbf{K}_x$ by:

(2) \begin{eqnarray} \mathbf{K}_x=\mathbf{A}_x+ \Sigma *\mathbf{W}; \end{eqnarray}

where $\mathbf{W}$ is a random $p\times n$ matrix with i.i.d. standard Gaussian entries. Note that if the entries of $\Sigma$ are not all equal, the perturbation matrix $\Sigma *W$ has independent but not identically distributed entries.

Let $y\in \Omega$ be the true community assignment mapping. Given the observation $\mathbf{K}_y$ , the goal is to determine the true community assignment mapping $y$ . We shall apply the MLE.

Let $n_1,\ldots,n_k$ be positive integers satisfying

\begin{eqnarray*} \sum _{i=1}^{k}n_i=n. \end{eqnarray*}

and

\begin{eqnarray*} |y^{-1}(i)|=n_i,\ \forall i\in [k]; \end{eqnarray*}

that is, $n_i$ is the number of vertices in community $i$ for each $i\in [k]$ under the mapping $y$ .

Let

\begin{eqnarray*} \Omega _{n_1,\ldots,n_k}\;:\!=\;\{x\in \Omega \; :\; |x^{-1}(i)|=n_i,\ \forall i\in [k]\} \end{eqnarray*}

be the set of all the community assignment mappings such that there are exactly $n_i$ vertices in the community $i$ , for each $i\in [k]$ .

For each real number $c\in (0,1)$ , let

\begin{eqnarray*} \Omega _c\;:\!=\;\left \{x\in \Omega \; :\; \frac{|x^{-1}(i)|}{\sum _{j\in [k]}|x^{-1}(j)|}\geq c,\ \forall i\in [k]\right \}, \end{eqnarray*}

that is, $\Omega _c$ consists of all the community assignment mappings such that the percentage of the numbers of vertices in each community is at least $c$ .

Assume the true community assignment mapping $y\in \Omega _c$ for some $c\in (0,1)$ . Let $\Phi$ be an $p\times n$ matrix whose entries are given by:

\begin{eqnarray*} (\Phi )_{i,j}=\frac{1}{\sigma _{i,j}},\ \forall i\in [p], j\in [n]; \end{eqnarray*}

in other words, the $(i,j)$ -entry of $\Phi$ is the reciprocal of the $(i,j)$ -entry of $\Sigma$ . Define

(3) \begin{eqnarray} \hat{y}\;:\!=\;\mathrm{argmin}_{x\in \Omega _{\frac{2c}{3}}}\|\Phi *(\mathbf{K}_y-\mathbf{A}_x)\|^2 \end{eqnarray}

and

(4) \begin{eqnarray} \check{y}\;:\!=\;\mathrm{argmin}_{x\in \Omega _{n_1,\ldots,n_k}}\|\Phi *(\mathbf{K}_y-\mathbf{A}_x)\|^2 \end{eqnarray}

Then we have the following lemma

Lemma 2.1. $\hat{y}$ is the MLE with respect to the observation $\mathbf{K}_y$ in $\Omega _{\frac{2c}{3}}$ . $\check{y}$ is the MLE with respect to the observation $\mathbf{K}_y$ in $\Omega _{n_1,\ldots,n_k}$ .

Proof. By definition, the MLE with respect to the observation $\mathbf{K}_y$ in $\Omega _{\frac{2c}{3}}$ (resp. $\Omega _{n_1,\ldots,n_k}$ ) should maximise the probability density of the observation $\mathbf{K}_y$ among all $x\in \Omega _{\frac{2c}{3}}$ (resp. $x\in \Omega _{n_1,\ldots,n_k}$ ). If the true community assignment mapping $y=x$ , we may consider $\mathbf{K}_{y}$ as a random matrix with mean value $\mathbf{A}_x$ and independent entries, such that variance of its $(i,j)$ -entry is $\sigma _{i,j}^2$ . Therefore, the probability density of $\mathbf{K}_y$ is given by:

\begin{eqnarray*} \left (\prod _{i\in [p],j\in [n]}\frac{1}{\sqrt{2\pi }\sigma _{i,j}}\right )e^{-\sum _{i\in [p],j\in [n]}\frac{(\mathbf{K}_y-\mathbf{A}_x)_{i,j}^2}{2\sigma _{i,j}^2}}, \end{eqnarray*}

where the exponent is exactly

\begin{eqnarray*} -\frac{1}{2}\|\Phi *(\mathbf{K}_y-\mathbf{A}_x)\|^2. \end{eqnarray*}

It is straightforward to check that the minimiser of $\|\Phi *(\mathbf{K}_y-\mathbf{A}_x)\|^2$ is exactly the maximiser of the probability density. Then the lemma follows.

We shall investigate under which conditions we have $\check{y}=y$ and $\hat{y}=y$ with high probability.

To state the main theorems proved in this paper, we first introduce a few assumptions.

For $x,y\in \Omega$ , let

(5) \begin{eqnarray} L_{\Phi }(x,y)\;:\!=\;\|\Phi *(\mathbf{A}_x-\mathbf{A}_y)\|^2. \end{eqnarray}

For $x\in \Omega$ , let

\begin{eqnarray*} n_i(x)=|x^{-1}(i)|,\ \forall \ i\in [k]; \end{eqnarray*}

then $n_i(x)$ is the number of vertices in community $i$ under the community assignment mapping $x$ . It is straightforward to check that

\begin{eqnarray*} \sum _{i=1}^k n_i(x)=n. \end{eqnarray*}

For $i,j\in [k]$ and $x,z\in \Omega$ , let $t_{i,j}(x,z)$ be a non-negative integer given by:

\begin{eqnarray*} t_{i,j}(x,z)=|x^{-1}(i)\cap z^{-1}(j)|. \end{eqnarray*}

That is, $t_{i,j}(x,z)$ is the number of vertices in $[n]$ which are in community $i$ under the mapping $x$ and in community $j$ under the mapping $z$ . Then

(6) \begin{eqnarray} \sum _{j\in [k]}t_{i,j}(x,z)=n_i(x);\qquad \sum _{i\in [k]}t_{i,j}(x,z)=n_j(z); \end{eqnarray}

We now introduce an equivalence condition on $\Omega$ .

Definition 1. For $x\in \Omega$ , let $C(x)$ consist of all the $x'\in \Omega$ such that $x'$ can be obtained from $x$ by a $\theta$ -preserving bijection of communities. More precisely, $x'\in C(x)\subset \Omega$ if and only if the following condition holds

  • for $i\in [p]$ and $j\in [n]$ , $\theta (x,i,x(j))=\theta (x',i,x'(j))$ .

We define an equivalence relation on $\Omega$ as follows: we say $x,z\in \Omega$ are equivalent if and only if $x\in C(z)$ . Let $\overline{\Omega }$ be the set of all the equivalence classes in $\Omega$ .

We see that if $x$ and $z$ are equivalent in the sense of Definition 1, the non-random parts of the observation matrices corresponding to $x$ and $z$ satisfy $\mathbf{A}_x=\mathbf{A}_z$ . Hence, any algorithm based on this observation will not distinguish equivalent community assignments; and the best we expect is to exactly recover the equivalent class of the community assignment. We want to assume this $\theta$ function to be able to distinguish community assignments up to the composition with a bijection of communities more precisely.

Assumption 2.2. Let $x,z\in \Omega$ . If for any $i\in [p]$ and $j\in [n]$ ,

(7) \begin{eqnarray} \theta (x,i,x(j))=\theta (z,i,z(j)); \end{eqnarray}

then there is a bijection $\eta\; :\;[k]\rightarrow [k]$ , such that

(8) \begin{eqnarray} x=\eta \circ z \end{eqnarray}

where $\circ$ denotes the composition of two mappings.

Note that (8) is equivalent of saying that for $i,j\in [n]$ , $x(i)=x(j)$ if and only if $x'(i)=x'(j)$ . See Section 4.1 for examples.

Define a set

(9) \begin{eqnarray} \mathscr{B}\;:\!=\;\left \{(t_{1,1},t_{1,2},\ldots,t_{k,k})\in \{0,1,2,\ldots,n\}^{k^2}:\sum _{i=1}^{k}t_{i,j}=n_j\right \}. \end{eqnarray}

For $\varepsilon \gt 0$ , define a set ${\mathscr B}_{\varepsilon }$ consisting of all the $(t_{1,1},t_{1,2},\ldots,t_{k,k})\in{\mathscr B}$ satisfying all the following conditions:

  1. 1. $\forall \ i\in [k],\ \max _{j\in [k]}t_{j,i}\geq n_i-n\varepsilon$ .

  2. 2. For $i\in [k]$ , let $t_{w(i),i}=\max _{j\in [k]}t_{j,i}$ . Then $w$ is a bijection from $[k]$ to $[k]$ .

  3. 3. $w$ is $\theta$ -preserving, that is, for any $x\in \Omega$ , $i\in [p]$ and $a\in [k]$ , we have

    \begin{eqnarray*} \theta (x,i,a)=\theta (w\circ x,i,w(a)). \end{eqnarray*}

We may assume $\theta$ and $\Sigma$ satisfy the following assumptions.

Assumption 2.3. Suppose $\varepsilon \in (0,\frac{2c}{3k})$ , $x\in \Omega _{\frac{2c}{3}}$ and $y\in \Omega _c$ . Then for all $x,y\in \Omega$ , and

(10) \begin{eqnarray} (t_{1,1}(x,y),t_{1,2}(x,y),\ldots,t_{k,k}(x,y))\in \mathscr{B}\setminus \mathscr{B}_{\varepsilon }, \end{eqnarray}

we have

(11) \begin{eqnarray} L_{\Phi }(x,y)\geq R(n)\gt 0. \end{eqnarray}

Here $R(n)$ is some function of $n$ , such that ( 11 ) combined with ( 15 ) leads to the desired exact recovery result in Theorem 2.6 .

Assumption 2.4. Assume $\varepsilon \in (0,\frac{2c}{3k})$ , $x\in \Omega _{\frac{2c}{3}}$ and $y\in \Omega _c$ . Assume there exists $\Delta \gt 0$ such that the following holds:

Let $y_1,y_2\in \Omega _{\frac{2c}{3}}$ and $a,b\in [k]$ and $a\neq b$ . Let $i,j\in [n]$ such that $i\in y_1^{-1}(a)\cap x^{-1}(b)$ . Let $y_2\;:\;[n]\rightarrow [k]$ be defined as follows:

\begin{eqnarray*} y_2(j)\;:\!=\;\begin{cases}b&\mathrm{if}\ j=i\\ y_1(j)&\mathrm{if}\ j\in [n]\setminus \{i\} \end{cases}. \end{eqnarray*}

When

(12) \begin{eqnarray} \left (t_{1,1}(x,y_1),t_{1,2}(x,y_1),\ldots,t_{k,k}(x,y_1)\right )\in \mathscr{B}_{\varepsilon } \end{eqnarray}

such that for all $i\in [k]$

\begin{eqnarray*} t_{i,i}=\max _{j\in [k]}t_{j,i}(x,y_1); \end{eqnarray*}

$\varepsilon \in \left (0,\frac{2c}{3k}\right )$ , and

\begin{eqnarray*} y_1\notin C(x);\ \end{eqnarray*}

we have

\begin{eqnarray*} L_{\Phi }(x,y_1)-L_{\Phi }(x,y_2)\geq \Delta (1+o(1)). \end{eqnarray*}

where $o(1)\rightarrow 0$ , as $n\rightarrow \infty$ .

Assumptions 2.3 and 2.4 are technical. One may interpret Assumption 2.4 as follows: when a community assignment mapping $y_1\in \Omega$ is sufficiently close to the community assignment mapping $x$ in the sense of (12), change the community of exactly one vertex $j$ from $y_1(j)$ to $x(j)$ and obtain a new community assignment mapping $y_2(j)$ , then the distance between $y_2$ and $x$ , in the sense of $L_{\Phi }$ , approximately decreases by $\Delta$ from the distance between $y_1$ and $x$ . In other words, as $y_1$ is close to $x$ and approaching $x$ , $L_{\Phi }(y_1,x)$ decreases linearly with rate $\Delta$ . As we see later, this will create a convergent geometric series when computing the lower bound for the probability of exact recovery.

Note that Assumption 2.3 can be guaranteed by the following stronger assumption with $R(n)=\frac{T(n)}{B_1^2}$ ; see Lemma 3.5.

Assumption 2.5.

  1. 1. There exists $B_1\gt 0$ , such that for all $i,j\in [p]\times n$ , we have

    \begin{eqnarray*} |\sigma _{i,j}|\leq B_1. \end{eqnarray*}
  2. 2. Assume $\varepsilon \in (0,\frac{2c}{3k})$ , $x\in \Omega _{\frac{2c}{3}}$ and $y\in \Omega _c$ . Then for all $x,y\in \Omega$ , and

    (13) \begin{eqnarray} (t_{1,1}(x,y),t_{1,2}(x,y),\ldots,t_{k,k}(x,y))\in \mathscr{B}\setminus \mathscr{B}_{\varepsilon }, \end{eqnarray}
    we have
    (14) \begin{eqnarray} \sum _{i\in [p],j\in [n]}(\theta (x,i,x(j))-\theta (y,i,y(j)))^2\geq T(n)\gt 0. \end{eqnarray}

Here $T(n)$ is some function of $n$ , such that ( 14 ) combined with ( 16 ) leads to the desired exact recovery result in Theorem 2.6 .

Assumption 2.5(1) gives an upper bound on the standard deviation of the Gaussian perturbations (which may depend on $n$ ); Assumption 2.5(2) may be interpreted as when $x,y\in \Omega$ are “far away” from each other in the sense of (13), the corresponding $\theta$ functions are “far away” from each other in the sense of (11).

Theorem 2.6. Assume $y\in \Omega _c$ is the true community assignment mapping. Suppose that Assumptions 2.5 and 2.4 hold. Let $\varepsilon \in (0,\frac{2c}{3k})$ . Suppose one of the following cases occurs

  1. 1. Assumption 2.3 holds and

    (15) \begin{eqnarray} \lim _{n\rightarrow \infty } n\log k-\frac{R(n)}{8}=-\infty, \end{eqnarray}
  2. 2. Assumption 2.5 holds and

    (16) \begin{eqnarray} \lim _{n\rightarrow \infty } n\log k-\frac{T(n)}{8B_1^2}=-\infty . \end{eqnarray}

Moreover, suppose that for any constant $\delta \gt 0$ independent of $n$ ,

(17) \begin{eqnarray} \lim _{n\rightarrow \infty }\log k+\log n-\frac{\Delta (1-\delta )}{8}=-\infty, \end{eqnarray}

then $\lim _{n\rightarrow \infty }\Pr (\hat{y}\in C(y))=1$ .

Theorem 2.6 gives a sufficient condition for the exact recovery of MLE in the Gaussian mixture model. It is proved in Section 3. An application of Theorem 2.6 on the exact recovery of community detection on hypergraphs is discussed in Section 4.3.

We also obtain a condition for the exact recovery when the sample space of the MLE is restricted to $\Omega _{n_1,\ldots,n_k}$ , that is, the number of vertices in each community is known and fixed.

Assumption 2.7. Assume $x,y_m,y_h\in \Omega$ such that

  1. 1. $D_{\Omega }(y_m,y_h)=j$ , where $j\geq 2$ is a positive integer; and

  2. 2. There exist $u_1,\ldots,u_j\in [n]$ , such that

    1. (a) $y_m(v)=y_h(v)$ , for all $v\in [n]\setminus \{u_1,\ldots,u_j\}$ ; and

    2. (b) $y_m(u_i)\neq y_h(u_i)=x(u_i)=y_m(u_{i-1})$ for all $i\in [j]$ .

    3. (c) $(t_{1,1}(x,y_m),t_{1,2}(x,y_m),\ldots,t_{k,k}(x,y_m))\in \mathscr{B}_{\varepsilon }$ with $\varepsilon \in \left (0,\frac{2c}{3k}\right )$ and $w(i)=i$ .

Then

(18) \begin{eqnarray} L_{\Phi }(x,y_m)- L_{\Phi }(x,y_h)\geq j\Delta (1+o(1)) \end{eqnarray}

Theorem 2.8. Suppose that Assumptions 2.5, 2.7 , ( 15 ) and ( 17 ) hold. Then $\lim _{n\rightarrow \infty }\Pr (\check{y}\in C(y))=1$ .

Indeed, Assumption 2.4 implies Assumption 2.7; see Lemma 5.5. Theorem 2.8 is proved in Section (5).

2.2. Hypergraphs

A special case for the Gaussian mixture model is the hypergraph model. Let $s,s_1,s_2$ be positive integers satisfying

\begin{eqnarray*} 2\leq s_1\leq s\leq s_2. \end{eqnarray*}

A hypergraph $H=(V,E)$ has vertex set $V\;:\!=\;[n]$ and hyperedge set $E$ defined as follows:

\begin{eqnarray*} E\;:\!=\;\{(a_1,\ldots,a_s)\;:\; a_1,\ldots,a_s\in [n], s\in \{s_1,s_1+1,\ldots,s_2\}\} \end{eqnarray*}

Let $\phi \;:\;\cup _{s=s_1}^{s_2}[k]^{s}\rightarrow [0,\infty )$ be a function which assigns a unique real number $\phi (c_1,\ldots,c_s)$ to each $s$ -tuple of communities $(c_1,\ldots,c_s)\in [k]^s$ , and $s\in [s_1,s_2]$ .

For a community assignment mapping $x$ , the weighted adjacency tensor $\mathbf{A}_x$ is defined by:

(19) \begin{eqnarray} (\mathbf{A}_x)_{a_1,\ldots,a_s}=\begin{cases}\phi (x(a_1),\ldots,x(a_s)),&\mathrm{if}\ (a_1,\ldots,a_s)\in E\\ 0&\mathrm{otherwise}.\end{cases} \end{eqnarray}

and

\begin{eqnarray*} \Sigma _{(a_1,\ldots,a_s)}\;:\!=\;\sigma _{(a_1,\ldots,a_s)} \end{eqnarray*}

Define a random tensor $\mathbf{K}_x$ as in (2). Recall that $y\in \Omega _c$ is the true community assignment mapping. Define $\hat{y}$ and $\check{y}$ as in (3) and (4).

Recall that $y\in \Omega$ is the true community assignment mapping satisfying $|y^{-1}(i)|=n_i$ , for all $i\in [k]$ . Let $a\in [n]$ . Let $y^{(a)}\in \Omega$ be defined by:

(20) \begin{eqnarray} y^{(a)}(i)=\begin{cases} y(i)&\mathrm{if}\ i\in [n],\ \mathrm{and}\ i\neq a\\ y^{(a)}(a)&\mathrm{if}\ i=a.\end{cases} \end{eqnarray}

such that

\begin{eqnarray*} y(a)\neq y^{(a)}(a)\in [k]. \end{eqnarray*}

Theorem 2.9. Assume

(21) \begin{eqnarray} \lim _{n\rightarrow \infty }\min _{i\in [k]}n_i=\infty . \end{eqnarray}

Suppose that there exists a subset $H\subset [n]$ satisfying all the following conditions:

  1. 1. $|H|=h=o(n)$ ;

  2. 2. $\lim _{n\rightarrow \infty }\frac{\log h}{\log n}=1$ ;

  3. 3. For each $g\in H$ ,

    \begin{eqnarray*} &&\sum _{s=s_1}^{s_2}\sum _{j=1}^{s}\sum _{(i_1,\ldots,\widehat{i}_j,\ldots,i_s)\in ([n]\setminus H)^{s-1}}\frac{1}{\sigma _{(i_1,\ldots,i_{j-1},g,i_{j+1},\ldots,i_s)}^2}\\ &&\times (\phi (y(i_1),\ldots,y^{(g)}(g),\ldots,y(i_s))-\phi (y(i_1),\ldots,y(g),\ldots,y(i_s)))^2\\ &=&(1+o(1))L_{\Phi }(y^{(g)},y) \end{eqnarray*}
  4. 4. there exists a constant $\beta \gt 0$ independent of $n$ , such that

    \begin{eqnarray*} \frac{\max _{a\in H}L_{\Phi }(y^{(a)},y)}{\min _{a\in H} L_{\Phi }(y^{(a)},y)}\leq \beta ^2,\ \forall n. \end{eqnarray*}

If there exists a constant $\delta \gt 0$ independent of $n$ , such that

(22) \begin{eqnarray} \max _{a\in H}L_{\Phi }(y^{(a)},y)\lt 8(1-\delta )\log n \end{eqnarray}

Then $\lim _{n\rightarrow \infty }\Pr (\hat{y}\in C(y))=0$ .

Theorem 2.9 is proved in Section 4. An example is given in Section 4.2.

Let $a,b\in [n]$ such that $y(a)\neq y(b)$ . Let $y^{(ab)}\in \Omega _{n_1,\ldots,n_k}$ be the community assignment mapping defined by:

(23) \begin{eqnarray} y^{(ab)}(i)=\begin{cases}y(i)& \mathrm{if}\ i\in [n]\setminus \{a,b\}\\ y(b)&\mathrm{if}\ i=a\\ y(a)& \mathrm{if}\ i=b \end{cases} \end{eqnarray}

In other words, $y^{(ab)}$ is obtained from $y$ by exchanging $y(a)$ and $y(b)$ .

We also prove a condition when the exact recovery does not occur if the sample space of the MLE is restricted in $\Omega _{n_1,\ldots,n_k}$ .

Theorem 2.10. Assume

(24) \begin{eqnarray} \lim _{n\rightarrow \infty }\min _{i\in [k]}n_i=\infty . \end{eqnarray}

Suppose that there exist two subsets $H_1,H_2\subset [n]$ satisfying all the following conditions:

  1. 1. $|H_1|=|H_2|=h=o(n)$ ;

  2. 2. $\lim _{n\rightarrow \infty }\frac{\log h}{\log n}=1$ ;

  3. 3. For any $u_1,u_2\in H_1$ and $v_1,v_2\in H_2$ ,

    \begin{eqnarray*} y(u_1)=y(u_2)\neq y(v_1)=y(v_2); \end{eqnarray*}
  4. 4. For any $u\in H_1$ and $v\in H_2$

    \begin{eqnarray*} &&\sum _{s=s_1}^{s_2}\sum _{j=1}^{s}\sum _{(i_1,\ldots,\widehat{i}_j,\ldots,i_s)\in ([n]\setminus (H_1\cup H_2))^{s-1}}\left (\frac{1}{\sigma _{(i_1,\ldots,i_{j-1},u,i_{j+1},\ldots,i_s)}^2}+\frac{1}{\sigma _{(i_1,\ldots,i_{j-1},v,i_{j+1},\ldots,i_s)}^2}\right )\\ &&(\phi (y(i_1),\ldots,y(v),\ldots,y(i_s))-\phi (y(i_1),\ldots,y(u),\ldots,y(i_s)))^2\\ &=&(1+o(1))L_{\Phi }(y^{(uv)},y) \end{eqnarray*}
  5. 5. For any $g\in H_1\cup H_2$ , the quantity

    \begin{eqnarray*} &&\sum _{s=s_1}^{s_2}\sum _{j=1}^{s}\sum _{(i_1,\ldots,\widehat{i}_j,\ldots,i_s)\in ([n]\setminus (H_1\cup H_2))^{s-1}}\frac{1}{\sigma _{(i_1,\ldots,i_{j-1},g,i_{j+1},\ldots,i_s)}^2}\\ &&(\phi (y(i_1),\ldots,y(b),\ldots,y(i_s))-\phi (y(i_1),\ldots,y(a),\ldots,y(i_s)))^2 \end{eqnarray*}
    is a constant and is independent of $g$ .

If there exists a constant $\delta \gt 0$ independent of $n$ , such that

(25) \begin{eqnarray} \max _{u\in H_1,v\in H_2}L_{\Phi }(y^{(uv)},y)\lt 16(1-\delta )\log n, \end{eqnarray}

$\lim _{n\rightarrow \infty }\Pr (\check{y}\in C(y))=0$ .

Theorem 2.10 is proved in Section 6.

3. Community detection on K-community Gaussian mixture models

In this section, we consider the MLE when the number of vertices in each community is unknown. We shall obtain a sufficient condition for the occurrence of the exact recovery. The main goal is to prove Theorem 2.6.

Recall that we defined an equivalence relation on $\Omega$ in Definition 1. It is straightforward to check that

\begin{eqnarray*} \mathbf{K}_y=\mathbf{K}_{y^{\prime}},\ \mathrm{and}\ \mathbf{A}_y=\mathbf{A}_{y^{\prime}},\qquad \mathrm{if}\ y'\in C(y). \end{eqnarray*}

Therefore, the MLE based on the observation $\mathbf{K}_y$ can only recover the community assignment mapping up to equivalence.

The main idea to prove Theorem 2.6 is as follows:

  • We find an upper bound for the probability that the MLE is not in the equivalence class of the true community assignment; see (32).

  • We show this upper bound converges to zero as $n\rightarrow \infty$ under the assumptions of Theorem 2.6. This is achieved by splitting the upper bounds into two parts: one part $I_2$ is the sum over all the community assignment functions that differs from the true community assignment functions by at most $n\varepsilon$ and other part $I_1$ is the sum over all the community assignment functions that differs from the true community assignment functions by at least $n\varepsilon$ . The fact that $I_2$ converges to 0 as $n\rightarrow \infty$ follows directly from Assumption 2.5. To prove that $I_2$ converges to 0 as $n\rightarrow \infty$ , we bound $I_1$ by a geometric series that converges to 0 as $n\rightarrow \infty$ .

Note that

(26) \begin{eqnarray} \langle \Phi *\mathbf{A}_x, \Phi *\mathbf{A}_z\rangle &=&\sum _{i\in [p],j\in [n]}\frac{(\mathbf{A}_x)_{i,j}(\mathbf{A}_z)_{i,j}}{\sigma _{i,j}^2}\\ &=&\sum _{i\in [p],j\in [n]}\frac{\theta (x,i,x(j))\theta (z,i,z(j))}{\sigma _{i,j}^2} \notag \end{eqnarray}

In particular for each $x\in \Omega$ , we have

(27) \begin{eqnarray} \|\Phi *\mathbf{A}_x\|^2=\sum _{i\in [p];j\in [n]}\frac{(\theta (x,i,x(j)))^2}{\sigma _{i,j}^2} \end{eqnarray}

Recall that $y\in \Omega _{n_1,\ldots,n_k}$ is the true community assignment mapping. Note that

(28) \begin{eqnarray} \|\Phi *(\mathbf{K}_y-\mathbf{A}_x)\|^2&=&\|\Phi *\mathbf{K}_y\|^2-2\langle \Phi *\mathbf{K}_y, \Phi *\mathbf{A}_x\rangle +\|\Phi *\mathbf{A}_x\|^2 \end{eqnarray}

For each fixed observation $\mathbf{K}_y$ , $\|\Phi *\mathbf{K}_y\|^2$ is fixed and independent of $x\in \Omega$ . Therefore,

\begin{align*} \hat{y}&\;:\!=\;\mathrm{argmin}_{x\in \Omega _{\frac{2c}{3}}}\|\Phi *(\mathbf{K}_y-\mathbf{A}_x)\|^2\\ &=\mathrm{argmin}_{x\in \Omega _{\frac{2c}{3}}}\left (-2\langle \Phi *\mathbf{K}_y, \Phi *\mathbf{A}_x\rangle +\|\Phi *\mathbf{A}_x\|^2\right ) \end{align*}

For $x\in \Omega$ , define

(29) \begin{eqnarray} f(x)\;:\!=\;-2\langle \Phi *\mathbf{K}_y, \Phi *\mathbf{A}_x\rangle +\|\Phi *\mathbf{A}_x\|^2 \end{eqnarray}

Then

(30) \begin{eqnarray} f(x)-f(y)&=&\|\Phi *\mathbf{A}_x\|^2-\|\Phi *\mathbf{A}_y\|^2\\ &&-2\langle \Phi *\mathbf{A}_y, \Phi *(\mathbf{A}_x-\mathbf{A}_y)\rangle -2\langle \mathbf{W}, \Phi *(\mathbf{A}_x-\mathbf{A}_y) \rangle \notag \\ &=&\|\Phi *(\mathbf{A}_x-\mathbf{A}_y)\|^2-2\langle \mathbf{W}, \Phi *(\mathbf{A}_x-\mathbf{A}_y)\notag, \end{eqnarray}

where we use the identity

\begin{eqnarray*} \Phi *(\Sigma *\mathbf{W})=\mathbf{W}. \end{eqnarray*}

Then $f(x)-f(y)$ is a Gaussian random variable with mean value

\begin{eqnarray*} \mathbb{E}\left (f(x)-f(y)\right )=L_{\Phi }(x,y); \end{eqnarray*}

and variance

\begin{eqnarray*} \mathrm{Var}(f(x)-f(y)) &=&4L_{\Phi }(x,y). \end{eqnarray*}

Lemma 3.1. For $x,z\in \Omega$ . If $x\in C(z)$ , then

\begin{eqnarray*} f(x)=f(z). \end{eqnarray*}

Proof. By Definition 1, if $x\in C(z)$ , then for any $i\in [p]$ and $j,h\in [n]$ , $x(j)=x(h)$ if and only if $z(j)=z(h)$ and $\theta (x,i,x(j))=\theta (z,i,z(j))$ , then $\mathbf{A}_x=\mathbf{A}_z$ by (19). Moreover, since for any $i\in [p]$ and $j\in [n]$ ,

\begin{eqnarray*} (\mathbf{A}_x)_{i,j}=(\mathbf{A}_z)_{i,j}; \end{eqnarray*}

we have

\begin{eqnarray*} \frac{(\mathbf{A}_x)_{i,j}}{\sigma _{i,j}}=\frac{(\mathbf{A}_z)_{i,j}}{\sigma _{i,j}}; \end{eqnarray*}

this implies

(31) \begin{eqnarray} \Phi *\mathbf{A}_x=\Phi *\mathbf{A}_z. \end{eqnarray}

Then the lemma follows from (29).

Define

\begin{align*} p(\hat{y}\; ;\;\sigma )\;:\!=\;\mathrm{Pr}\left (\hat{y}\in C(y)\right )=\mathrm{Pr}\left (f(y)\lt \min _{C(x)\in \overline{\Omega }_{\frac{2c}{3}},C(x)\neq C(y)}f(x)\right ) \end{align*}

Then

(32) \begin{align} 1-p(\hat{y}\; ;\;\sigma )\leq & \sum _{C(x)\in \overline{\Omega }_{\frac{2c}{3}}\;:\;C(x)\neq C(y)}\mathrm{Pr}(f(x)-f(y)\leq 0)\\ \notag =& \sum _{C(x)\in \overline{\Omega }_{\frac{2c}{3}}\;:\;C(x)\neq C(y)}\mathrm{Pr}_{\xi \in \mathscr{N}(0,1)}\left (\xi \leq \frac{-\sqrt{L_{\Phi }(x,y)}}{2 }\right )\\ \notag \leq & \sum _{C(x)\in \overline{\Omega }_{\frac{2c}{3}}\;:\;C(x)\neq C(y)} e^{-\frac{(L_{\Phi }(x,y))^2}{8}}. \end{align}

Lemma 3.2. Let $x,y,x',y'\in \Omega$ , such that $x'\in C(x)$ and $y'\in C(y)$ , then

\begin{eqnarray*} L_{\Phi }(x,y)=L_{\Phi }(x',y'). \end{eqnarray*}

Proof. By (31), we obtain that when $x'\in C(x)$ and $y'\in C(y)$ ,

\begin{eqnarray*} \Phi *\mathbf{A}_x=\Phi *\mathbf{A}_{x^{\prime}};\qquad \Phi *\mathbf{A}_y=\Phi *\mathbf{A}_{y^{\prime}}. \end{eqnarray*}

Then the lemma follows from (5).

Lemma 3.3. For $x,y\in \Omega$ , $L_{\Phi }(x,y)\geq 0$ . Moreover,

  1. 1. If $x\in C(y)$ , then $L_{\Phi }(x,y)=0$ .

  2. 2. If $\theta$ satisfies Assumption 2.2 and $L_{\Phi }(x,y)=0$ , then $x\in C(y)$ .

Proof. From (5), it is straightforward to check that $L_{\Phi }(x,y)\geq 0$ for any $x,y\in \Omega$ . Moreover, from (5), we obtain

\begin{eqnarray*} L_{\Phi }(x,y)=\sum _{i\in [p],j\in [n]}\sigma _{i,j}^2(\theta (x,i,x(j))-\theta (y,i,y(j)))^2 \end{eqnarray*}

By the fact that $\sigma _{i,j}\gt 0$ for all $i\in [p],j\in [n]$ , we obtain that $L_{\Phi }(x,y)=0$ if and only if

(33) \begin{eqnarray} \theta (x,i,x(j))=\theta (y,i,y(j)),\ \qquad \forall i\in [p], j\in [n]. \end{eqnarray}

If $x\in C(y)$ , then there exists a $\theta$ -preserving bijection $\eta \;:\;[k]\rightarrow [k]$ , such that $x=\eta \circ y$ . Then (33) holds by the $\theta$ -preserving property of $\eta$ , then we obtain Part(1).

On the other hand, if $L_{\Phi }(x,y)=0$ , we have (33) holds. Then $x\in C(y)$ follows from Assumption 2.2.

Lemma 3.4. Assume that $y\in \Omega _c$ and $x\in \Omega _{\frac{2c}{3}}$ . For $i\in [k]$ , let

(34) \begin{eqnarray} t_{w(i),i}(x,y)=\max _{j\in [k]}t_{j,i}(x,y), \end{eqnarray}

where $w(i)\in [k]$ . When $\varepsilon \in \left (0,\frac{2c}{3k}\right )$ and $(t_{1,1}(x,y),t_{1,2}(x,y),\ldots, t_{k,k}(x,y))\in{\mathbb R}^{k^2}$ satisfies

\begin{eqnarray*} \max _{j\in [k]}t_{j,i}(x,y)\geq n_i-n\varepsilon,\ \forall i\in [k] \end{eqnarray*}

$w$ is a bijection from $[k]$ to $[k]$ .

Proof. See Lemma 5.6 of [Reference Li13].

Definition 2. Define the distance function $D_{\Omega }\;:\;\Omega \times \Omega \rightarrow [n]$ as follows:

\begin{eqnarray*} D_{\Omega }(x,y)=\sum _{i,j\in [k],i\neq j}t_{i,j}(x,y). \end{eqnarray*}

for $x,y\in \Omega$ .

From Definition 2, it is straightforward to check that

\begin{eqnarray*} D_{\Omega }(x,y)=n-\sum _{i\in [k]}t_{i,i}(x,y) \end{eqnarray*}

Lemma 3.5. Assume that $\theta,\Sigma$ satisfies Assumptions 2.5 . Then for all the $x,y\in \Omega$ such that ( 10 ) holds, we have

\begin{eqnarray*} L_{\Phi }(x,y)\geq \frac{T(n)}{B_1^2}. \end{eqnarray*}

Proof. Note that

\begin{eqnarray*} L_{\Phi }(x,y)=\left (\sum _{i\in [p],j\in [n]}\left (\theta (x,i,x(j))-\theta (y,i,y(j))\right )^2\right )\left (\frac{\sum _{i\in [p],j\in [n]}\frac{1}{\sigma _{i,j}^2}\left (\theta (x,i,x(j))-\theta (y,i,y(j))\right )^2}{\sum _{i\in [p],j\in [n]}\left (\theta (x,i,x(j))-\theta (y,i,y(j))\right )^2}\right ) \end{eqnarray*}

By Assumption 2.5(1), we have

\begin{eqnarray*} \frac{\sum _{i\in [p],j\in [n]}\frac{1}{\sigma _{i,j}^2} \left (\theta (x,i,x(j))-\theta (y,i,y(j))\right )^2}{\sum _{i\in [p],j\in [n]}\left (\theta (x,i,x(j))-\theta (y,i,y(j))\right )^2}\geq \frac{1}{B_1^2} \end{eqnarray*}

Then the lemma follows from Assumption 2.5(2).

Proof of Theorem 2.6 . Note that

\begin{eqnarray*} \sum _{C(x)\in \overline{\Omega }\setminus C(y)}e^{-\frac{L_{\Phi }(x,y)}{8}}\leq I_1+I_2 \end{eqnarray*}

where

\begin{eqnarray*} I_1&=&\sum _{C(x)\in \overline{\Omega }_{\frac{2c}{3}}:\left (t_{1,1}(x,y),\ldots,t_{k,k}(x,y)\right )\in [\mathscr{B}\setminus \mathscr{B}_{\varepsilon }], C(x)\neq C(y)}e^{\frac{-(L_{\Phi }(x,y))^2}{8}} \end{eqnarray*}

and

\begin{eqnarray*} I_2=\sum _{C(x)\in \overline{\Omega }_{\frac{2c}{3}}:\left (t_{1,1}(x,y),\ldots, t_{k,k}(x,y)\right )\in \mathscr{B}_{\varepsilon }, C(x)\neq C(y)}e^{\frac{-(L_{\Phi }(x,y))^2}{8}}. \end{eqnarray*}

and $\varepsilon \in \left (0,\frac{2c}{3k}\right )$ .

By Lemma 3.5, when Assumption 2.5 holds, we have

\begin{eqnarray*} I_1&\leq & k^n e^{-\frac{T(n)}{8B_1^2}} \end{eqnarray*}

When (15) holds, we obtain

(35) \begin{eqnarray} \lim _{n\rightarrow \infty }I_1=0. \end{eqnarray}

Now let us consider $I_2$ . Let $w$ be the bijection from $[k]$ to $[k]$ as defined in (34). Let $y^*\in \Omega$ be defined by:

\begin{eqnarray*} y^*(z)=w(y(z)),\ \forall z\in [n]. \end{eqnarray*}

Then $y^*\in C(y)$ since $w$ is $\theta$ -preserving by the definition of $\mathscr{B}_{\varepsilon }$ . Moreover, $x$ and $y^*$ satisfies

(36) \begin{eqnarray} t_{i,i}(x,y^*)\geq n_i(y^*)-n\varepsilon,\ \forall i\in [k]. \end{eqnarray}

We consider the following community changing process to obtain $x$ from $y^*$ :

  1. 1. If for all $(j,i)\in [k]^2$ , and $j\neq i$ , $t_{j,i}(x,y^*)=0$ , then $x=y^*$ .

  2. 2. If (1) does not hold, find the least $(j,i)\in [k]^2$ in lexicographic order such that $j\neq i$ and $t_{j,i}(x,y^*)\gt 0$ . Choose an arbitrary vertex $u\in \left \{ x^{-1}(j)\cap (y^*)^{-1}(i)\right \}$ . Define $y_1\in \Omega$ as follows:

    \begin{eqnarray*} y_1(z)= \begin{cases}j&\mathrm{if}\ z=u\\ y^*(z)&\mathrm{if}\ z\in [n]\setminus \{u\} \end{cases} \end{eqnarray*}

Then we have

(37) \begin{align}& t_{j,i}(x,y_1)= t_{j,i}(x,y^*)-1 \end{align}
(38) \begin{align}& t_{j,j}(x,y_1)= t_{j,j}(x,y^*)+1\\ &t_{a,b}(x,y_1)=t_{a,b}(x,y^*)\ \forall (a,b)\in \left ([k]^2\setminus \{(j,i),(j,j)\}\right ).\notag \\ &n_i(y_1)=n_i(y^*)-1\notag \\ &n_j(y_1)=n_j(y^*)+1\notag \\ &n_{l}(y_1)=n_l(y^*)\ \forall l\in [k]\setminus \{i,j\}.\notag \end{align}

Therefore, $x$ , $y_1$ and $y^*$ satisfy

\begin{eqnarray*} &&t_{l,l}(x,y_1)\geq n_l(y_1)-n\varepsilon ;\\ &&t_{l,l}(x,y_1)\geq n_l(y^*)-n\varepsilon ;\\ &&n_l(y_1)\geq n_l(y^*)-n\varepsilon ; \end{eqnarray*}

for all $l\in [k]$ .

From Assumption 2.4 and Lemma 3.2, we obtain

\begin{eqnarray*} L_{\Phi }(x,y_1)-L_{\Phi }(x,y)&=&L_{\Phi }(x,y_1)-L_{\Phi }(x,y^*)\leq -\Delta (1+o(1)). \end{eqnarray*}

Therefore,

(39) \begin{eqnarray} e^{-\frac{L_{\Phi }(x,y)}{8}}\leq e^{-\frac{L_{\Phi }(x,y_1)}{8}}e^{-\frac{\Delta (1+o(1))}{8}} \end{eqnarray}

In general, if we have constructed $y_l\in \Omega$ ( $r\geq 1$ ) satisfying all the following conditions:

(40) \begin{eqnarray} &&t_{l,l}(x,y_r)\geq n_l(y_r)-n\varepsilon ;\notag \\ &&t_{l,l}(x,y_r)\geq n_l(y^*)-n\varepsilon ;\notag \\ &&n_l(y_r)\geq n_l(y^*)-n\varepsilon ; \end{eqnarray}

for all $l\in [k]$ . We now construct $y_{r+1}\in \Omega$ as follows:

  1. (1) If for all $(j,i)\in [k]^2$ , and $j\neq i$ , $t_{j,i}(x,y_r)=0$ , then $x=y_r$ ; then the construction process stops at this step.

  2. (2) If (a) does not hold, find the least $(j,i)\in [k]^2$ in lexicographic order such that $j\neq i$ and $t_{j,i}(x,y_r)\gt 0$ . Choose an arbitrary vertex $u\in \left \{x^{-1}(j)\cap y_r^{-1}(i)\right \}$ . Define $y_{r+1}\in \Omega$ as follows:

    \begin{eqnarray*} y_{r+1}(z)= \begin{cases}j&\mathrm{if}\ z=u\\ y_r(z)&\mathrm{if}\ z\in [n]\setminus \{u\} \end{cases} \end{eqnarray*}

Then it is straightforward to check that

\begin{eqnarray*} &&t_{l,l}(x,y_{r+1})\geq n_l(y_{r+1})-n\varepsilon ;\\ &&t_{l,l}(x,y_{r+1})\geq n_l(y^*)-n\varepsilon ;\\ &&n_l(y_{r+1})\geq n_l(y^*)-n\varepsilon ; \end{eqnarray*}

for all $l\in [k]$ .

Then if (36) holds with $y^*$ replaced by $y_r$ , then (36) holds with $y^*$ replaced by $y_{r+1}$ . By Assumption 2.4, we obtain

\begin{eqnarray*} e^{-\frac{L_{\Phi }(x,y_r)}{8}}&\leq & e^{-\frac{L_{\Phi }(x,y_{r+1})}{8}} e^{-\frac{\Delta (1+o(1))}{8}}. \end{eqnarray*}

Recall that the distance $D_{\Omega }$ in $\Omega$ is defined in Definition 2. From the constructions of $y_{r+1}$ , we have

\begin{eqnarray*} D_{\Omega }(x,y_{r+1})=D_{\Omega }(x,y_r)-1. \end{eqnarray*}

Therefore, there exists $h\in [n]$ , such that $y_h=x$ . By (39) and Assumption 2.4, we obtain

\begin{eqnarray*} e^{-\frac{L_{\Phi }(x,y)}{8}}\leq e^{-\frac{h\Delta (1+o(1))}{8}}. \end{eqnarray*}

Since any $x$ in $\mathscr{B}_{\varepsilon }$ can be obtained from $y$ by the community changing process described above, we have

(41) \begin{eqnarray} I_2\leq \sum _{l=1}^{\infty } (nk)^{l} e^{-\frac{l\Delta (1+o(1))}{8}}; \end{eqnarray}

The right-hand side of (41) is the sum of geometric series with both initial term and common ratio equal to

(42) \begin{eqnarray} V\;:\!=\;e^{\log k+\log n-\frac{\Delta (1+o(1))}{8}} \end{eqnarray}

When (17) holds, we obtain

(43) \begin{eqnarray} \lim _{n\rightarrow \infty } I_2=0 \end{eqnarray}

Then the proposition follows from (35) and (43).

4. Community detection on k-community hypergraphs

In this section, we apply the results proved in Section 3 to the exact recovery of the community detection in hypergraphs and also prove conditions when exact recovery does not occur in hypergraphs under the assumption that the number of vertices in each community is unknown.

The main idea to prove Theorem 2.9 is as follows:

  • We obtain an lower bound for the probability that the MLE is not in the equivalent class of the true community assignment function; see (50).

  • We show this lower bound converges to 1 as $n\rightarrow \infty$ . The proof utilises the inequalities regarding the distribution of the maximum of a number of Gaussian random variables as discussed in Section A.

In the case of a hypergraph, from (26), when

\begin{eqnarray*} &&i=(i_1,i_2,\ldots,i_{s-1})\in [n]^{s-1};\\ &&\theta (x,i,a)=\phi (x(i_1),\ldots,x(i_{s-1}),a); \end{eqnarray*}

we obtain for $x,z\in \Omega$

(44) \begin{eqnarray} &&\langle \Phi *\mathbf{A}_x, \Phi *\mathbf{A}_z\rangle \\ &=&\sum _{s=s_1}^{s_2}\sum _{(i_1,\ldots,i_s)\in [n]^s} \frac{(\mathbf{A}_x)_{(i_1,\ldots,i_s)}(\mathbf{A}_z)_{(i_1,\ldots,i_s)}}{\sigma _{(i_1,\ldots,i_s)}^2}\notag \\ &=&\sum _{s=s_1}^{s_2}\sum _{(i_1,\ldots,i_s)\in [n]^s} \frac{\phi (x(i_1),\ldots,x(i_s))\phi (z(i_1),\ldots,z(i_s))}{\sigma ^2_{(i_1,\ldots,i_s)}}\notag \end{eqnarray}

In particular,

\begin{eqnarray*} \|\Phi *\mathbf{A}_x\|^2&=&\langle \Phi * \mathbf{A}_x,\Phi *\mathbf{A}_x \rangle \\ &=&\sum _{s=s_1}^{s_2}\sum _{(i_1,\ldots,i_s)\in [n]^s} \frac{(\phi (x(i_1),\ldots,x(i_s)))^2}{\sigma ^2_{(i_1,\ldots,i_s)}} \end{eqnarray*}

Recall that $y\in \Omega _c$ is the true community assignment mapping. Then

\begin{eqnarray*} \hat{y}&=&\mathrm{argmin}_{x\in \Omega _{\frac{2c}{3}}}\|\Phi *(\mathbf{K}_y-\mathbf{A}_x)\|^2=\mathrm{argmin}_{x\in \Omega _{\frac{2c}{3}}}f(x) \end{eqnarray*}

where $f(x)$ is given by (29).

By (30), we obtain that in the hypergraph case

(45) \begin{eqnarray} &&f(x)-f(y)\\ &=&\|\Phi *(\mathbf{A}_x-\mathbf{A}_y)\|^2-2\langle \mathbf{W}, \Phi *(\mathbf{A}_x-\mathbf{A}_y)\rangle \notag \\ &=&\sum _{s=s_1}^{s_2}\sum _{(i_1,\ldots,i_s)\in [n]^s}\frac{(\phi (x(i_1),\ldots,x(i_s))-\phi (y(i_1),\ldots,y(i_s)))^2}{\sigma _{(i_1,\ldots,i_s)}^2}\notag \\ &&-2\langle \mathbf{W}, \Phi *(\mathbf{A}_x-\mathbf{A}_y)\rangle \notag \end{eqnarray}

Then $f(x)-f(y)$ is a Gaussian random variable with mean value $L_{\Phi }(x,y)$ and variance $4L_{\Phi }(x,y)$ , where $L_{\Phi }(x,y)$ is defined by (5).

Proof of Theorem 2.9. When $y^{(a)}\in \Omega$ is defined by (20),

(46) \begin{align} &t_{y^{(a)}(a),y(a)}(y^{(a)},y)=1; \end{align}
(47) \begin{align}&t_{y(a),y(a)}(y^{(a)},y)=n_{y(a)}-1; \end{align}
(48) \begin{align}&t_{i,i}(y^{(a)},y)=n_i;\ \forall \ i\in [k]\setminus \{y(a)\}; \end{align}
(49) \begin{align}&t_{i,j}(y^{(a)},y)=0;\ \forall (i,j)\in [k]^2\setminus \{(y^{(a)}(a),y(a))\},\ \mathrm{and}\ i\neq j. \end{align}

and

\begin{eqnarray*} &&n_{y^{(a)}(a)}(y^{(a)})=n_{y^{(a)}(a)}+1;\\ &&n_{y(a)}(y^{(a)})=n_{y(a)}-1;\\ &&n_i(y^{(a)})=n_i;\ \forall \ i\in [k]\setminus \{y^{(a)}(a),y(a)\}. \end{eqnarray*}

Moreover,

(50) \begin{eqnarray} 1-p(\hat{y};\ \sigma )\geq \mathrm{Pr}\left (\cup _{a\in [n]}\{f(y^{(a)})-f(y)\lt 0\}\right ) \end{eqnarray}

Since any of the event $\{f(y^{(a)})-f(y)\lt 0\}$ implies $\hat{y}\neq y$ .

Let $H\subset [n]$ be given as in the assumptions of the proposition. Under Assumption (3) of the proposition when $a\in H$ , we have

\begin{eqnarray*} &&\|\Phi *(\mathbf{A}_{y^{(a)}}-\mathbf{A}_y)\|^2\\ &=&\sum _{s=s_1}^{s_2}\sum _{(i_1,\ldots,i_s)\in [n]^s}\frac{\left (\phi (y^{(a)}(i_1)),\ldots,y^{(a)}(i_s))-\phi (y(i_1),\ldots,y(i_s))\right )^2}{\sigma ^2_{(i_1,\ldots,i_s)}}\\ &=&L_{\Phi }(y^{(a)},y)\\ &=&(1+o(1))\left \{\sum _{s=s_1}^{s_2}\sum _{j=1}^{s}\sum _{(i_1,\ldots,\widehat{i}_j,\ldots,i_s)\in ([n]\setminus H)^{s-1}}\frac{1}{\sigma _{(i_1,\ldots,i_{j-1},a,i_{j+1},\ldots,i_s)}^2}\right .\\ &&\left .\times (\phi (y(i_1),\ldots,y^{(a)}(a),\ldots,y(i_s))-\phi (y(i_1),\ldots,y(a),\ldots,y(i_s)))^2\right \} \end{eqnarray*}

Then from (45), we have

\begin{eqnarray*} &&f(y^{(a)})-f(y)\\ &=&-2\langle \mathbf{W},\Phi *\mathbf{A}_{y^{(a)}}-\mathbf{A}_{y} \rangle +(1+o(1))\left \{\sum _{s=s_1}^{s_2}\sum _{j=1}^{s}\sum _{(i_1,\ldots,\widehat{i}_j,\ldots,i_s)\in ([n]\setminus H)^{s-1}}\frac{1}{\sigma _{(i_1,\ldots,i_{j-1},a,i_{j+1},\ldots,i_s)}^2}\right .\\ &&\left .\times (\phi (y(i_1),\ldots,y^{(a)}(a),\ldots,y(i_s))-\phi (y(i_1),\ldots,y(a),\ldots,y(i_s)))^2\right \}. \end{eqnarray*}

Then $1-p(\hat{y};\ \sigma )$ is at least

\begin{eqnarray*} &&\mathrm{Pr}\left (\cup _{a\in [n]}\left \{f(y^{(a)})-f(y)\lt 0\right \}\right )\\ &\geq &\mathrm{Pr}\left (\mathrm{max}_{a\in [n]}\frac{2\langle \mathbf{W},\Phi *(\mathbf{A}_{y^{(a)}}-\mathbf{A}_{y}) \rangle }{\|\Phi *(\mathbf{A}_{y^{(a)}}-\mathbf{A}_y)\|^2}\gt 1\right )\\ &\geq &\mathrm{Pr}\left (\mathrm{max}_{a\in H}\frac{2\langle \mathbf{W},\Phi *(\mathbf{A}_{y^{(a)}}-\mathbf{A}_{y}) \rangle }{\|\Phi *(\mathbf{A}_{y^{(a)}}-\mathbf{A}_y)\|^2}\gt 1\right ) \end{eqnarray*}

Let $(\mathscr{X},\mathscr{Y},\mathscr{Z})$ be a partition of $\cup _{s=s_1}^{s_2}[n]^s$ defined by:

\begin{eqnarray*} &&\mathscr{X}=\{\alpha =(\alpha _1,\alpha _2,\ldots,\alpha _s)\in \cup _{s=s_1}^{s_2} [n]^s, \{\alpha _1,\ldots,\alpha _s\}\cap H=\emptyset \}\\ &&\mathscr{Y}=\{\alpha =(\alpha _1,\alpha _2,\ldots,\alpha _s)\in \cup _{s=s_1}^{s_2} [n]^s, |\{i\in [s]\;:\; \alpha _i\in H\}|=1\}\\ &&\mathscr{Z}=\{\alpha =(\alpha _1,\alpha _2,\ldots,\alpha _s)\in \cup _{s=s_1}^{s_2}[n]^s, |\{i\in [s]\;:\; \alpha _i\in H\}|\geq 2\} \end{eqnarray*}

For $\eta \in \{\mathscr{X},\mathscr{Y},\mathscr{Z}\}$ , define the random tensor $\mathbf{W}_{\eta }$ from the entries of $\mathbf{W}$ as follows:

\begin{eqnarray*} (\mathbf{W}_{\eta })_{(i_1,i_2,\ldots,i_s)}=\begin{cases}0&\mathrm{if}\ (i_1,\ldots,i_s)\notin \eta \\ (\mathbf{W})_{(i_1,\ldots,i_s)},&\mathrm{if}\ (i_1,\ldots,i_s)\in \eta \end{cases} \end{eqnarray*}

For each $a\in H$ , let

\begin{eqnarray*} &&\mathscr{X}_{a}=\langle \mathbf{W}_{\mathscr{X}},\Phi *(\mathbf{A}_{y^{(a)}}-\mathbf{A}_{y}) \rangle \\ &&\mathscr{Y}_{a}=\langle \mathbf{W}_{\mathscr{Y}},\Phi *(\mathbf{A}_{y^{(a)}}-\mathbf{A}_{y}) \rangle \\ &&\mathscr{Z}_{a}=\langle \mathbf{W}_{\mathscr{Z}},\Phi *(\mathbf{A}_{y^{(a)}}-\mathbf{A}_{y}) \rangle \end{eqnarray*}

For $s\in \{s_1,s_1+1,\ldots,s_2\}$ , let

\begin{eqnarray*} J_s\;:\!=\;(j_1,\ldots,j_s)\subset [n]^s \end{eqnarray*}

Explicit computations show that

(51) \begin{eqnarray} &&(\mathbf{A}_{y^{(a)}})_{J_s}-(\mathbf{A}_{y})_{J_s}\\ &=&\begin{cases}\phi (y^{(a)}(j_1),\ldots,y^{(a)}(j_s))-\phi (y(j_1),\ldots,y(j_s))&\mathrm{if}\ a\in \{j_1,\ldots,j_s\}\\0 &\mathrm{otherwise}.\end{cases}\notag \end{eqnarray}

Claim 4.1. The followings are true:

  1. 1. $\mathscr{X}_{a}=0$ for $a\in H$ .

  2. 2. For each $a\in H$ , the variables $\mathscr{Y}_{a}$ and $\mathscr{Z}_{a}$ are independent.

Proof. It is straightforward to check (1). (2) holds because $\mathscr{Y}\cap \mathscr{Z}=\emptyset$ .

For $g\in H$ , let $\mathscr{Y}^{\,g}\subseteq \mathscr{Y}$ be defined by

\begin{eqnarray*} \mathscr{Y}^{\,g}=\{\alpha =(\alpha _1,\alpha _2,\ldots,\alpha _s)\in \mathscr{Y}\;:\; s\in \{s_1,s_1+1,\ldots,s_2\},\ \exists l\in [s],\ \mathrm{s.t.}\ \alpha _l=g\}. \end{eqnarray*}

Note that for $g_1,g_2\in H$ and $g_1\neq g_2$ , $\mathscr{Y}^{\,g_1}\cap \mathscr{Y}^{\,g_2}=\emptyset$ . Moreover, $\mathscr{Y}=\cup _{g\in H}\mathscr{Y}^{\,g}$ . Therefore,

\begin{eqnarray*} \mathscr{Y}_{a}=\sum _{g\in H}\langle \mathbf{W}_{\mathscr{Y}^{\,g}},\Phi *(\mathbf{A}_{y^{(a)}}-\mathbf{A}_{y}) \rangle \end{eqnarray*}

Note also that $\langle \mathbf{W}_{\mathscr{Y}^{\,g}},\Phi *(\mathbf{A}_{y^{(a)}}-\mathbf{A}_{y}) \rangle =0$ , if $g\neq a$ . Hence,

\begin{eqnarray*} \mathscr{Y}_{a}=\sum _{\alpha \in \mathscr{Y}^{\,a}}\frac{(\mathbf{W})_{\alpha }\cdot \{(\mathbf{A}_{y^{(a)}}-\mathbf{A}_{y})_{\alpha }\}}{\sigma _{\alpha }} \end{eqnarray*}

So by (51), we obtain

\begin{eqnarray*} &&\sum _{\alpha \in \mathscr{Y}^{\,a}}\frac{(\mathbf{W})_{\alpha }\cdot \{(\mathbf{A}_{y^{(a)}}-\mathbf{A}_{y})_{\alpha }\}}{\sigma _{\alpha }} =\sum _{s=s_1}^{s_2}\sum _{j=1}^{s}\sum _{(i_1,\ldots,\widehat{i}_j,\ldots,i_s)\in ([n]\setminus H)^{s-1}}\frac{1}{\sigma _{(i_1,\ldots,i_{j-1},a,i_{j+1},\ldots,i_s)}}\\ &&\left \{(\phi (y(i_1),\ldots,y^{(a)}(a),\ldots,y(i_s))-\phi (y(i_1),\ldots,y(a),\ldots,y(i_s)))(\mathbf{W})_{(i_1,\ldots,i_{j-1},a,i_{j+1},\ldots,i_s)}\right \} \end{eqnarray*}

Then $\{\mathscr{Y}_g\}_{g\in H}$ is a collection of independent centred Gaussian random variables. Moreover, the variance of $\mathscr{Y}_g$ is equal to

(52) \begin{eqnarray} &&\sum _{s=s_1}^{s_2}\sum _{j=1}^{s}\sum _{(i_1,\ldots,\widehat{i}_j,\ldots,i_s)\in ([n]\setminus H)^{s-1}}\frac{1}{\sigma _{(i_1,\ldots,i_{j-1},g,i_{j+1},\ldots,i_s)}^2}\\ &&(\phi (y(i_1),\ldots,y^{(g)}(g),\ldots,y(i_s))-\phi (y(i_1),\ldots,y(g),\ldots,y(i_s)))^2\notag \\ &=&(1+o(1))L_{\Phi }(y^{(g)},y)\notag \end{eqnarray}

by Assumption (3) of the proposition.

By Claim 4.1, we obtain

\begin{eqnarray*} \frac{2\langle \mathbf{W},\Phi *(\mathbf{A}_{y^{(a)}}-\mathbf{A}_{y}) \rangle }{\|\Phi *(\mathbf{A}_y^{(a)}-\mathbf{A}_y)\|^2}=\frac{2\mathscr{Y}_a}{\|\Phi *(\mathbf{A}_{y^{(a)}}-\mathbf{A}_y)\|^2}+\frac{2\mathscr{Z}_{a}}{\|\Phi *(\mathbf{A}_{y^{(a)}}-\mathbf{A}_y)\|^2} \end{eqnarray*}

Moreover,

\begin{eqnarray*} \max _{a\in H}\frac{2(\mathscr{Y}_a+\mathscr{Z}_{a})}{\|\Phi *(\mathbf{A}_{y^{(a)}}-\mathbf{A}_{y})\|^2}&\geq & \max _{a\in H}\frac{ 2\mathscr{Y}_a}{\|\Phi *(\mathbf{A}_{y^{(a)}}-\mathbf{A}_{y})\|^2}-\max _{a\in H}\frac{-2\mathscr{Z}_{a}}{\|\Phi *(\mathbf{A}_{y^{(a)}}-\mathbf{A}_{y})\|^2} \end{eqnarray*}

Recall that

\begin{eqnarray*} \|\Phi *(\mathbf{A}_{y^{(a)}}-\mathbf{A}_y)\|^2=L_{\Phi }(y^{(a)},y) \end{eqnarray*}

By Lemma A.1 about the tail bound result of the maximum of Gaussian random variables, if (A1) holds with $N$ replaced by $h$ , the event

\begin{eqnarray*} E_1\;:\!=\;\left \{\max _{a\in H}\frac{2\mathscr{Y}_a}{\|\Phi *(\mathbf{A}_{y^{(a)}}-\mathbf{A}_{y})\|^2}\geq (1-\varepsilon )\sqrt{2\min _{a\in H}\frac{4}{L_{\Phi }(y^{(a)},y)}\log h}\right \} \end{eqnarray*}

has probability at least $1-e^{-h^{\varepsilon }}$ , and the event

\begin{eqnarray*} E_2\;:\!=\;\left \{\max _{a\in H}\frac{2\mathscr{Z}_{a}}{\|\Phi *(\mathbf{A}_{y^{(a)}}-\mathbf{A}_y)\|^2}\leq (1+\varepsilon )\sqrt{2\log h\cdot \max _{a\in H} \frac{4\mathrm{Var}(\mathscr{Z}_{a})}{(L_{\Phi }(y^{(a)},y))^2}}\right \} \end{eqnarray*}

has probability $1-h^{-\varepsilon }$ .

Moreover, by Assumption (3) of the Proposition and (52),

\begin{eqnarray*} \mathrm{Var} \mathscr{Z}_{a}&=&\|\Phi *(\mathbf{A}_{y^{(a)}}-\mathbf{A}_{y})\|^2-\mathrm{Var}(\mathscr{Y}_a)=o(1)L_{\Phi }(y^{(a)},y) \end{eqnarray*}

Define an event $E$ by

\begin{eqnarray*} &&E\;:\!=\;\left \{\max _{a\in H}\frac{2\mathscr{Y}_a+2\mathscr{Z}_{a}}{\|\Phi *(\mathbf{A}_{y^{(a)}}-\mathbf{A}_{y})\|^2}\geq \left (1-\varepsilon -(1+\varepsilon )o(1)\sqrt{\frac{\max _{a\in H}L_{\Phi }(y^{(a)},y)}{\min _{a\in H}L_{\Phi }(y^{(a)},y)}}\right )\right .\\ &&\left .\times \sqrt{8\log h\min _{a\in H}\frac{1}{L_{\Phi }(y^{(a)},y)}}\right \} \end{eqnarray*}

Then $E_1\cap E_2\subseteq E$ .

When $n$ is large, and (22) holds

\begin{eqnarray*} &&\mathrm{Pr}\left (\mathrm{max}_{a\in H}\frac{2\langle \mathbf{W},\Phi *(\mathbf{A}_{y^{(a)}}-\mathbf{A}_{y}) \rangle }{\|\Phi *(\mathbf{A}_{y^{(a)}}-\mathbf{A}_y)\|^2}\gt 1\right )\\ &&\geq \mathrm{Pr}(E)\geq \Pr (E_1\cap E_2)\geq 1-\Pr (E_1^c)-\Pr (E_2^c)\rightarrow 1, \end{eqnarray*}

as $n\rightarrow \infty$ . Then the proposition follows.

4.1. Examples of Assumption 2.2

We shall see some examples of the function $\theta \; :\; \Omega \times [k]\times [k]\rightarrow{\mathbb R}$ satisfying Assumption 2.2. We first see an example when $\theta$ can uniquely determine the community assignment mapping in $\Omega$ .

Example 4.2. Assume $p=k$ . For $a,b\in [k]$ , $x\in \Omega$

\begin{eqnarray*} \theta (x,a,b)=\begin{cases}1&\mathrm{if}\ a=b\\ 0&\mathrm{otherwise}.\end{cases} \end{eqnarray*}

Then if for all $a\in [p]$ and $j\in [n]$ , ( 7 ) holds, we have $x(j)=a$ if and only if $z(j)=a$ , then $x=z$ .

We now see an example when $\theta$ cannot uniquely determine the community assignment mapping but determines the community assignment mappings up to the equivalent class as defined in Definition 1.

Example 4.3. Assume $p=n$ .

\begin{eqnarray*} \theta (x,i,a)=\begin{cases}1&\mathrm{if}\ x(i)=a;\\ 0&\mathrm{otherwise}.\end{cases} \end{eqnarray*}

Then if for all $i,j\in [n]$ , ( 7 ) holds, we have $x(j)=x(i)$ if and only if $z(j)=z(i)$ , then $x\in C(z)$ .

Example 4.4. Assume $p=n$ , $a\in [k]$ and $i\in [n]$ .

\begin{eqnarray*} \theta (x,i,a)=x(i)-a; \end{eqnarray*}

Then if for all $i,j\in [n]$ , ( 7 ) holds, we have $x(i)-x(j)=z(i)-z(j)$ . This implies that $x(i)=x(j)$ if and only if $z(i)=z(j)$ , therefore $x\in C(z)$ . If both $x$ and $z$ are surjective onto $[k]$ , then $x=z$ .

4.2. Example of Theorem 2.9

Example 4.5. Here we see an example about how to apply Theorem 2.9 to the exact recovery of community detection on hypergraphs. Let $y\in \Omega _{n_1,\ldots,n_k}$ be the true community assignment mapping. Assume that for any $s\in \{s_1,\ldots,s_2\}$ , $(i_1,i_2,\ldots,i_s), (j_1,j_2,\ldots,j_s)\in [n]^s$ , we have

\begin{eqnarray*} \sigma _{(i_1,i_2,\ldots,i_s)}=\sigma _{(j_1,j_2,\ldots,j_s)} \end{eqnarray*}

whenever

\begin{eqnarray*} y(i_r)=y(j_r),\ \forall r\in [s]; \end{eqnarray*}

that is, $\sigma _{(i_1,\ldots,i_s)}$ depends only on the communities of $(i_1,\ldots,i_s)$ under the mapping $y$ . In this case, we can define $\overline{\sigma }: \cup _{s=s_1}^{s_2}[k]^s\rightarrow (0,\infty )$ , such that

(53) \begin{eqnarray} \sigma _{(i_1,\ldots,i_s)}=\overline{\sigma }(y(i_1),\ldots,y(i_s)),\ \forall (i_1,\ldots,i_s)\in [n]^s \end{eqnarray}

Then for any $a\in [n]$ ,

(54) \begin{eqnarray} &&L_{\Phi }(y^{(a)},y)=\|\Phi *(\mathbf{A}_{y^{(a)}}-\mathbf{A}_y)\|^2\\ &=&\sum _{s=s_1}^{s_2}\sum _{(i_1,\ldots,i_s)\in [n]^s}\frac{(\phi (y^{(a)}(i_1),\ldots,y^{(a)}(i_s))-\phi (y(i_1),\ldots,y(i_s)))^2}{(\overline{\sigma }(y(i_1),\ldots,y(i_s)))^2}\notag \end{eqnarray}

Moreover, for any $a,b\in [n]$ such that

\begin{eqnarray*} y(a)=y(b);\ y^{(a)}(a)=y^{(b)}(b) \end{eqnarray*}

we have

\begin{eqnarray*} L_{\Phi }(y^{(a)},y)=L_{\Phi }(y^{(b)},y). \end{eqnarray*}

We consider

(55) \begin{eqnarray} \min _{(y^{(a)}(a),y(a))\in [k]^2,y^{(a)}(a)\neq y(a)}L_{\Phi }(y^{(a)},y) \end{eqnarray}

Assume that when $y(a)=r_0$ , $y^{(a)}(a)=r_1$ , $L_{\Phi }(y^{(a)},y)$ achieves its minimum. Let $H\subset y^{-1}(r_0)$ , then $h=|H|\leq n_{r_0}$ . Assume

\begin{eqnarray*} \lim _{n\rightarrow \infty }\frac{\log n_{r_0}}{\log n}=1. \end{eqnarray*}

Then we may choose $h=\frac{n_{r_0}}{\log n}$ such that Assumptions (1)(2) in Theorem 2.9 hold. Moreover, Assumption (4) in Theorem 2.9 holds because if for all $a\in H$ , let $y^{(a)}(a)=r_1$ , then $L_{\Phi }(y^{(a)},y)$ takes the same value for all $a\in H$ . There are many mappings $\phi\; : \; \cup _{s=s_1}^{s_1}[k]^s\rightarrow{\mathbb R}$ to guarantee Assumption (3) in Theorem 2.9 . For example, one may choose

(56) \begin{eqnarray} \phi (b_1,\ldots,b_s)=\begin{cases}2^s&\mathrm{if}\ b_1=\ldots =b_s\\0&\mathrm{otherwise}.\end{cases} \end{eqnarray}

for $s\in \{s_1,s_1+1,\ldots,s_2\}$ and $b_1,\ldots,b_s\in [k]$ . Then from ( 54 ), we obtain

(57) \begin{eqnarray} &&L_{\Phi }(y^{(a)},y)=\sum _{s=s_1}^{s_2}\sum _{(b_1,\ldots,b_s)\in [k]^s}\sum _{(d_1,\ldots,d_s)\in [k]^s}\\ &&\frac{(\phi (d_1,\ldots,d_s)-\phi (b_1,\ldots,b_s))^2}{(\overline{\sigma }(b_1,\ldots,b_s))^2}\left (\prod _{j=1}^{s}t_{d_j,b_j}(y^{(a)},y)\right )\notag \end{eqnarray}

From ( 46 )–( 49 ) and ( 56 ), we obtain that the terms actually contributing to the sum must satisfy

\begin{eqnarray*} \{(d_1,b_1),\dots,(d_s,b_s)\}\subseteq \{(r_1,r_1),(r_1,r_0)\} \end{eqnarray*}

or

\begin{eqnarray*} \{(d_1,b_1),\dots,(d_s,b_s)\}\subseteq \{(r_0,r_0),(r_1,r_0)\} \end{eqnarray*}

Then we obtain

\begin{eqnarray*} L_{\Phi }(y^{(a)},y)&=&\sum _{s=s_1}^{s_2}2^{2s}(L_{0,s}+L_{1,s}) \end{eqnarray*}

where

\begin{eqnarray*} L_{0,s}&=&\sum _{(b_1,\ldots,b_s)\in [k]^s, (d_1,\ldots,d_s)\in [k]^s, (d_1,b_1),\dots,(d_s,b_s)\subseteq \{(r_0,r_0),(r_1,r_0)\}}\frac{\left (\prod _{j=1}^{s}t_{d_j,b_j}(y^{(a)},y)\right )}{(\overline{\sigma }(r_0,\ldots,r_0))^2}\\ L_{1,s}&=&\sum _{(b_1,\ldots,b_s)\in [k]^s, (d_1,\ldots,d_s)\in [k]^s, (d_1,b_1),\dots,(d_s,b_s)\subseteq \{(r_1,r_1),(r_1,r_0)\}}\frac{\left (\prod _{j=1}^{s}t_{d_j,b_j}(y^{(a)},y)\right )}{(\overline{\sigma }(b_1,\ldots,b_s))^2} \end{eqnarray*}

Assume

\begin{eqnarray*} \lim _{n\rightarrow \infty }\min \{n_{r_0},n_{r_1}\}=\infty . \end{eqnarray*}

If in $\{(d_1,b_1),\ldots,(d_s,b_s)\}$ , there exist more than one $g\in [s]$ , such that $(d_g,b_g)=(r_1,r_0)$ , then the sum of such terms will be of order $o((n_{r_0})^{s-1})$ (resp. $o((n_{r_1})^{s-1})$ ) in $L_{0,s}$ (resp. $L_{1,s}$ ). Therefore, we obtain

\begin{eqnarray*} L_{0,s}&=&\frac{s\left (n_{r_0}\right )^{s-1}(1+o(1))}{(\overline{\sigma }(r_0,\ldots,r_0))^2} \end{eqnarray*}

To analyse $L_{1,s}$ , assume that there exists a positive constant $C\gt 0$ independent of $n$ , such that

(58) \begin{eqnarray} 0\lt C\lt \frac{\min _{b_1,\ldots,b_s\in \{r_0,r_1\}}\overline{\sigma }(b_1,\ldots,b_s)}{\max _{b_1,\ldots,b_s\in \{r_0,r_1\}}\overline{\sigma }(b_1,\ldots,b_s)},\ \forall n\in{\mathbb N}, s\in \{s_1,s_1+1,\ldots s_2\}. \end{eqnarray}

Then we obtain

\begin{eqnarray*} L_{1,s}&=&\sum _{j=1}^{s}\frac{\left (n_{r_1}\right )^{s-1}(1+o(1))}{(\overline{\sigma }(r_1,\ldots,r_1,r_0,r_1,\ldots,r_1))^2} \end{eqnarray*}

Then Assumption (3) of Theorem 2.9 follows from the fact that $|H|=\frac{n_{r_0}}{\log n}=o(n_{r_0})$ .

In the special case when all the communities have equal size, we may obtain a sufficient condition that the exact recovery of MLE does not occur in the hypergraph case when the number of communites $k=e^{o(\log n)}$ . Since in this case we have $n_1=n_2=\ldots =n_k\geq e^{\log n-o(\log n)}$ , then ( 21 ) holds. Choose $h=\frac{n_1}{\log n}$ , then Assumptions (1) and (2) of Theorem 2.9 hold.

4.3. Example of Theorem 2.6

Example 4.6. We can also apply Theorem 2.6 to the case of exact recovery of community detection on hypergraphs. Again we consider the case when $\sigma _{(i_1,\ldots,i_s)}$ depends only on $(y(i_1),\ldots,y(i_s))$ . Hence, we may define $\overline{\sigma }$ as in ( 53 ).

To check Assumption 2.4 , let $y_m,y_{m+1},x\in \Omega$ be given as in the proof of Proposition 2.6 . For the simplicity of notation, we use $y$ instead of $y^*$ . By ( 57 ), we obtain

\begin{eqnarray*} &&L_{\Phi }(x,y_m)-L_{\Phi }(x,y_{m+1})\\ &=&\sum _{s=s_1}^{s_2}\sum _{(i_1,\ldots,i_s)\in [n]^s}\frac{1}{(\overline{\sigma }(y(i_1),\ldots,y(i_s)))^2}\\ &&\left [(\phi (x(i_1),\ldots,x(i_s))-\phi (y_m(i_1),\ldots,y_m(i_s)))^2-(\phi (x(i_1),\ldots,x(i_s))-\phi (y_{m+1}(i_1),\ldots,y_{m+1}(i_s)))^2\right ]\\ &=&\sum _{s=s_1}^{s_2}\sum _{(i_1,\ldots,i_s)\in [n]^s}\frac{1}{(\overline{\sigma }(y(i_1),\ldots,y(i_s)))^2}\left \{(\phi (y_m(i_1),\ldots,y_m(i_s))^2-(\phi (y_{m+1}(i_1),\ldots,y_{m+1}(i_s))^2\right .\\ &&\left .-2\phi (x(i_1),\ldots,x(i_s))\left [\phi (y_m(i_1),\ldots,y_m(i_s))-\phi (y_{m+1}(i_1),\ldots,y_{m+1}(i_s))\right ]\right \} \end{eqnarray*}

For $j,p,q\in [k]$ and $x,y,z\in \Omega$ , we define

\begin{eqnarray*} t_{j,p,q}(x,y,z)=|\{i\in [n]\;:\; x(i)=j,y(i)=p,z(i=q)\}|=|x^{-1}(j)\cap y^{-1}(p)\cap z^{-1}(q)|. \end{eqnarray*}

Then

(59) \begin{eqnarray} &&L_{\Phi }(x,y_m)-L_{\Phi }(x,y_{m+1})=\sum _{s=s_1}^{s_2}\sum _{(b_1,\ldots,b_s)\in [k]^s}\frac{1}{(\overline{\sigma }(b_1,\ldots,b_s))^2}\sum _{(d_1,\ldots,d_s)\in [k]^s}\\ &&\left \{(\phi (d_1,\ldots,d_s))^2 \left (\prod _{r=1}^{s}t_{b_r,d_r}(y,y_m)-\prod _{r=1}^{s}t_{b_r,d_r}(y,y_{m+1})\right )-2\sum _{(l_1,\ldots,l_s)\in [k]^s}\right .\notag \\ &&\left .\phi (l_1,\ldots,l_s)\phi (d_1,\ldots,d_s) \left (\prod _{r=1}^{s}t_{b_r,d_r,l_r}(y,y_m,x)-\prod _{r=1}^{s}t_{b_r,d_r,l_r}(y,y_{m+1},x)\right )\right \}\notag \end{eqnarray}

Recall that $D_{\Omega }(y_m,y_{m+1})=1$ , and there exists $u\in [n]$ such that

\begin{eqnarray*} x(u)=j=y_{m+1}(u)\neq y_m (u)=i=y(u). \end{eqnarray*}

where $i,j\in [k]$ and $i\neq j$ , while $y_m(v)=y_{m+1}(v)$ for all the $v\in [n]\setminus \{u\}$ . This implies that if $\{d_1,\ldots,d_s\}\cap \{i,j\}=\emptyset$ , then the corresponding summand in ( 59 ) is 0 and does not contribute to the sum. Under the assumption that

  1. 1. $(t_{1,1}(x,y),t_{1,2}(x,y),\ldots,t_{k,k}(x,y))\in \mathscr{B}_{\varepsilon }$ with $w\;:\;[k]\rightarrow [k]$ the identity map; and

  2. 2. $n_1\geq n_2\geq \ldots \geq n_k$ ; and

  3. 3. $\min _{(b_1,\ldots,b_s)\in [k]^s}|\overline{\sigma }(b_1,\ldots,b_s)|\geq B_3\gt 0$ ; and

  4. 4. $\lim _{n\rightarrow \infty }\frac{n\varepsilon }{n_1}=0$ .

we obtain

\begin{eqnarray*} &&L_{\Phi }(x,y_m)-L_{\Phi }(x,y_{m+1})=\sum _{s=s_1}^{s_2}\sum _{g=1}^{s}\sum _{(b_1,\ldots,\widehat{b}_g,\ldots,b_s)\in [k]^s}\frac{1}{(\overline{\sigma }(b_1,\ldots,i,\ldots,b_s))^2}\sum _{(d_1,\ldots,\widehat{d}_g,\ldots,d_s)\in [k]^s}\\ &&\left \{(\phi (d_1,\ldots,i,\ldots,d_s))^2-(\phi (d_1,\ldots,j,\ldots,d_s))^2) \prod _{r\in [s]\setminus \{g\}}t_{b_r,d_r}(y,y_m)-2\sum _{(l_1,\ldots,\widehat{l}_g,\ldots,l_s)\in [k]^s}\right .\\ &&\phi (l_1,\ldots,j,\ldots,l_s)\left (\phi (d_1,\ldots,i,\ldots,d_s)-\phi (d_1,\ldots,j,\ldots,d_s) \right )\\ &&\left .\left (\prod _{r\in [s]\setminus \{g\}}t_{b_r,d_r,l_r}(y,y_m,x)\right )\right \}+O\left (\frac{n_1^{k-2}}{B_3^2}\right ) \end{eqnarray*}

The identity above can be interpreted as follows. We can classify the terms satisfying $\{d_1,\ldots,d_s\}\cap \{i,j\}\neq \emptyset$ by the number

\begin{eqnarray*} N_{i,j}=\{l\in [s]\;:\; d_l\in \{i,j\}\}, \end{eqnarray*}

and obtain that the leading term of $L_{\Phi }(x,y_m)-L_{\Phi }(x,y_{m+1})$ is given by the terms when $N_{i,j}=1$ . Moreover, by Assumption (1) we have

\begin{eqnarray*} &&L_{\Phi }(x,y_m)-L_{\Phi }(x,y_{m+1})=\sum _{s=s_1}^{s_2}\sum _{g=1}^{s}\sum _{(b_1,\ldots,\widehat{b}_g,\ldots,b_s)\in [k]^s}\frac{1}{(\overline{\sigma }(b_1,\ldots,i,\ldots,b_s))^2}\\ &&\left \{(\phi (b_1,\ldots,i,\ldots,b_s))^2-(\phi (b_1,\ldots,j,\ldots,b_s))^2) \prod _{r\in [s]\setminus \{g\}}n_r-2\sum _{(l_1,\ldots,\widehat{l}_g,\ldots,l_s)\in [k]^s}\right .\\ &&\phi (b_1,\ldots,j,\ldots,b_s)\left (\phi (b_1,\ldots,i,\ldots,b_s)-\phi (b_1,\ldots,j,\ldots,b_s) \right )\\ &&\left .\left (\prod _{r\in [s]\setminus \{g\}}n_r\right )\right \}+O\left (\frac{n_1^{s-2}}{B_3^2}\right )+O\left (\frac{\varepsilon n n_1^{s-2}}{B_3^2}\right )\\ &&=\sum _{s=s_1}^{s_2}\sum _{g=1}^{s}\sum _{(b_1,\ldots,\widehat{b}_g,\ldots,b_s)\in [k]^s}\frac{1}{(\overline{\sigma }(b_1,\ldots,i,\ldots,b_s))^2}\\ &&(\phi (b_1,\ldots,i,\ldots,b_s))-(\phi (b_1,\ldots,j,\ldots,b_s)))^2 \prod _{r\in [s]\setminus \{g\}}n_{b_r}\\ &&+O\left (\frac{n_1^{s-2}}{B_3^2}\right )+O\left (\frac{\varepsilon n n_1^{s-2}}{B_3^2}\right ) \end{eqnarray*}

Define

\begin{align*} \Delta & \;:\!=\;\min _{i,j\in [k],i\neq j}\sum _{s=s_1}^{s_2}\sum _{g=1}^{s}\sum _{(b_1,\ldots,\widehat{b}_g,\ldots,b_s)\in [k]^s}\frac{1}{(\overline{\sigma }(b_1,\ldots,i,\ldots,b_s))^2}\\ &\times (\phi (b_1,\ldots,i,\ldots,b_s))-(\phi (b_1,\ldots,j,\ldots,b_s)))^2 \prod _{r\in [s]\setminus \{g\}}n_{b_r} \end{align*}

We further make the assumptions below:

(60) \begin{eqnarray} \lim _{n\rightarrow \infty }\frac{n_1^{s-2}+\varepsilon n n_1^{s-2}}{B_3^2\Delta }=0. \end{eqnarray}

Then

\begin{eqnarray*} L_{\Phi }(x,y_m)-L_{\Phi }(x,y_{m+1}) &\geq &\Delta (1+o(1)) \end{eqnarray*}

Then by Assumption 2.5(1), the exact recovery occurs with probability 1 when $n\rightarrow \infty$ if ( 15 ) and ( 17 ) hold, and

(61) \begin{eqnarray} &&\sum _{s=s_1}^{s_2}\sum _{(i_1,\ldots,i_s)\in [k]^s}\sum _{(j_1,\ldots,j_s\in [k]^s)}\left (\phi (i_1,\ldots,i_s)-\phi (j_1,\ldots,j_s)\right )^2\\ &&\times \left (\prod _{r=1}^s t_{i_r,j_r}(x,y)\right )\geq T(n)\notag \end{eqnarray}

when ( 10 ) holds.

There are a lot of functions $\phi \;:\;\cup _{s=s_1}^{s_2}[k]^s\rightarrow{\mathbb R}$ satisfying ( 60 ) and ( 61 ) and Assumption 2.2 . For example, we may consider the function $\phi$ as defined in ( 56 ). Assume

\begin{eqnarray*} \Delta &=&\sum _{s=s_1}^{s_2}\sum _{g=1}^{s}\sum _{(b_1,\ldots,\widehat{b}_g,\ldots,b_s)\in [k]^s}\frac{1}{(\overline{\sigma }(b_1,\ldots,r_0,\ldots,b_s))^2}\\ &&\times\, (\phi (b_1,\ldots,r_0,\ldots,b_s))-(\phi (b_1,\ldots,r_1,\ldots,b_s)))^2 \prod _{r\in [s]\setminus \{g\}}n_{b_r} \end{eqnarray*}

where $r_0,r_1\in [k]$ and $r_0\neq r_1$ . As in Example 4.5, we obtain that

\begin{eqnarray*} \Delta =\sum _{s=s_1}^{s_2}2^{2s}(\Delta _{0,s}+\Delta _{1,s}), \end{eqnarray*}

where

\begin{eqnarray*} \Delta _{0,s}&=&\frac{s(n_{r_0})^{s-1}}{(\overline{\sigma }(r_0,\ldots,r_0))^2}\geq \frac{s((n_{r_0}))^{s-1}}{B_1^2}\\ \Delta _{1,s}&=&\sum _{j=1}^s\frac{(n_{r_1})^{s-1}}{(\overline{\sigma }(r_1,\ldots,r_1,r_0,r_1,\ldots,r_1))^2} \geq \frac{s((n_{r_1}))^{s-1}}{B_1^2} \end{eqnarray*}

where the inequality follows from Assumption 2.5(2). It is straightforward to check that when $\phi$ is given by ( 56 ) and ( 60 ) hold if ( 58 ) holds.

To check ( 61 ), note that

\begin{eqnarray*} &&\sum _{s=s_1}^{s_2}\sum _{(i_1,\ldots,i_s)\in [k]^s}\sum _{(j_1,\ldots,j_s\in [k]^s)}\left (\phi (i_1,\ldots,i_s)-\phi (j_1,\ldots,j_s)\right )^2\times \left (\prod _{r=1}^s t_{i_s,j_s}(x,y)\right )\\ &\geq & \sum _{s=s_1}^{s_2} \sum _{g\in [s]}\sum _{j\in [s]}\sum _{i\in [k],i\neq j} (\phi (w(i),\ldots,w(i))-\phi (i,\ldots,i,j,i,\ldots,i))^2\\ &&\times t_{w(i),j}(x,y)\prod _{r=[s]\setminus \{g\}} t_{w(i),i}(x,y) \end{eqnarray*}

When ( 10 ) holds, the following cases might occur

  • $w$ is not a bijection from $[k]$ to $[k]$ . In this case, there exists $i,j\in [k]$ , such that $w(i)=w(j)$ , then when ( 10 ) holds, we obtain

    \begin{eqnarray*} t_{w(i),j}=t_{w(j),j}\geq \frac{n_{j}}{k} \end{eqnarray*}
  • $w$ is a bijection from $[k]$ to $[k]$ . However, there exists $i\in [k]^2$ , such that

    \begin{eqnarray*} t_{w(j),j}\leq n_i-\varepsilon n. \end{eqnarray*}
    Let
    \begin{eqnarray*} i\;:\!=\;w^{-1}(\mathrm{argmax}_{l\in [k]\setminus \{w(j)\}}t_{l,j}), \end{eqnarray*}
    then $i\neq j$ and
    \begin{eqnarray*} t_{w(i),j}(x,y)\geq \frac{\varepsilon n}{k-1} \end{eqnarray*}

When ( 10 ) holds and $y\in \Omega _c$ , we have

\begin{eqnarray*} &&\sum _{s=s_1}^{s_2}\sum _{(i_1,\ldots,i_s)\in [k]^s}\sum _{(j_1,\ldots,j_s\in [k]^s)}\left (\phi (i_1,\ldots,i_s)-\phi (j_1,\ldots,j_s)\right )^2\times \left (\prod _{r=1}^s t_{i_s,j_s}(x,y)\right )\\ &\geq &\sum _{s=s_1}^{s_2} s 2^{2s}\left (\frac{n_i}{k}\right )^{s-1}\min \left \{\frac{n_j}{k},\frac{\varepsilon n}{k-1}\right \} \geq \sum _{s=s_1}^{s_2} \frac{2^{2s}sn^s}{(ck)^{s-1}\max \left \{ck,\frac{k-1}{\varepsilon }\right \}} \end{eqnarray*}

Let

\begin{eqnarray*} T(n)\;:\!=\;\sum _{s=s_1}^{s_2} \frac{2^{2s}sn^s}{(ck)^{s-1}\max \left \{ck,\frac{k-1}{\varepsilon }\right \}} \end{eqnarray*}

Then we obtain ( 61 ).

5. Community detection on Gaussian mixture models with fixed number of vertices in each community

In this section, we consider the MLE restricted to the sample space consisting of all the mappings satisfying the condition that the number of vertices in each community is the same as that of the true community assignment mapping $y\in \Omega _{n_1,\ldots,n_k}$ . Again we shall prove a sufficient condition for the occurrences of exact recovery.

Let $x\in \Omega _{n_1,\ldots,n_k}$ . By (28),

\begin{eqnarray*} \check{y}\;:\!=\;\mathrm{argmin}_{x\in \Omega _{n_1,\ldots,n_k}}\|\Phi *(\mathbf{K}_y-\mathbf{A}_x)\|^2=\mathrm{argmin}_{x\in \Omega _{n_1,\ldots,n_k}}f(x) \end{eqnarray*}

Recall that $f(x)$ is defined as in (29). Recall also that $f(x)-f(y)$ is a Gaussian random variable with mean value $L_{\Phi }(x,y)$ and variance $4L_{\Phi }(x,y)$ .

For each $x\in \Omega _{n_1,\ldots,n_k}$ , let

\begin{eqnarray*} C^*(x)\;:\!=\;C(x)\cap \Omega _{n_1,\ldots,n_k}; \end{eqnarray*}

that is, $C^*(x)$ consists of all the community assignment mappings in $\Omega _{n_1,\ldots,n_k}$ that are equivalent to $x$ in the sense of Definition 1. Let

\begin{eqnarray*} \overline{\Omega }_{n_1,\ldots,n_k}\;:\!=\;\{C^*(x):x\in \Omega _{n_1,\ldots,n_k}\}; \end{eqnarray*}

that is, $\overline{\Omega }_{n_1,\ldots,n_k}$ consists of all the equivalence classes in $\Omega _{n_1,\ldots,n_k}$ .

Lemma 5.1. For $x,z\in \Omega _{n_1,\ldots,n_k}$ . If $x\in C^*(z)$ , then

\begin{eqnarray*} f(x)=f(z). \end{eqnarray*}

Proof. The lemma follows from Lemma 3.1.

Define

\begin{eqnarray*} p(\check{y};\ \sigma )\;:\!=\;\Pr (\check{y}\in C(y))=\Pr \left (f(\check{y})\lt \min _{C^*(x)\in \left (\overline{\Omega }_{n_1,\ldots,n_k}\setminus \{C^*(y)\}\right )}f(x)\right ) \end{eqnarray*}

Then

(62) \begin{eqnarray} 1-p(\check{y};\ \sigma )&\leq & \sum _{C^*(x)\in \left (\overline{\Omega }_{n_1,\ldots,n_k}\setminus \{C^*(y)\}\right )}\Pr (f(x)-f(y)\leq 0)\\ &=&\sum _{C^*(x)\in \left (\overline{\Omega }_{n_1,\ldots,n_k}\setminus \{C^*(y)\}\right )}\Pr _{\xi \in{\mathscr{N}}(0,1)}\left (\xi \geq \frac{L_{\Phi }(x,y)}{2}\right )\notag \\ &\leq &\sum _{C^*(x)\in \left (\overline{\Omega }_{n_1,\ldots,n_k}\setminus \{C^*(y)\}\right )} e^{-\frac{L_{\Phi }(x,y)}{8}}\notag \end{eqnarray}

Lemma 5.2. Let $y\in \Omega _{n_1,\ldots,n_k}\cap \Omega _c$ be the true community assignment mapping. Let $x\in \Omega _{n_1,\ldots,n_k}$ . For $i\in [k]$ , let $w(i)\in [k]$ be defined as in ( 34 ). Then

  1. 1. When $\varepsilon \in \left (0,\frac{c}{k}\right )$ and $(t_{1,1}(x,y),\ldots, t_{k,k}(x,y))\in \mathscr{B}_{\varepsilon }$ , $w$ is a bijection from $[k]$ to $[k]$ .

  2. 2. Assume there exist $i,j\in [k]$ , such that $n_i\neq n_j$ . If

    (63) \begin{eqnarray} \varepsilon \lt \min _{i,j\in [k]:n_i\neq n_j}\left |\frac{n_i-n_j}{n}\right | \end{eqnarray}
    Then for any $i\in [k]$ ,
    (64) \begin{eqnarray} n_i=|y^{-1}(i)|=|y^{-1}(w(i))|=n_{w(i)}. \end{eqnarray}

Proof. See Lemma 6.6 of [Reference Li13].

Definition 3. Let $l\geq 2$ be a positive integer. Let $x,y\in \Omega _{n_1,\ldots,n_k}$ . We say $l$ distinct communities $(i_1,\ldots,i_{l})\in [k]^l$ is an $l$ -cycle for $(x,y)$ , if $t_{i_{s-1},i_s}(x,y)\gt 0$ for all $2\leq s\leq l+1$ , where $i_{l+1}\;:\!=\;i_1$ .

Lemma 5.3. Let $x,y\in \Omega _{n_1,\ldots,n_k}$ and $x\neq y$ . Then there exists an $l$ -cycle for $(x,y)$ with $2\leq l\leq k$ .

Proof. See Lemma 3.3 of [Reference Li13].

Lemma 5.4. For any $x,y\in \Omega _{n_1,\ldots,n_k}$ , $L_{\Phi }(x,y)\geq 0$ , where the equality holds if and only if $x\in C^*(y)$ .

Proof. The lemma follows from Lemma 3.3.

Lemma 5.5. Suppose that Assumption 2.4 holds. Then Assumption 2.7 holds.

Proof. Let $z_0=y_m$ and $z_j=y_h$ . For $i\in [j-1]$ , define $z_i\in \Omega$ by

\begin{eqnarray*} z_i(v)=\begin{cases}z_{i-1}(v)&\mathrm{if}\ v\in [n]\setminus \{u_i\}\\ x(u_i)&\mathrm{if}\ v=u_i \end{cases} \end{eqnarray*}

Then for any $i\in [j]$ ,

\begin{eqnarray*} D_{\Omega }(z_i,z_{i-1})=1. \end{eqnarray*}

by Assumption 2.4, we obtain

(65) \begin{eqnarray} L_{\Phi }(x,z_{i-1})- L_{\Phi }(x,z_i)\geq \Delta (1+o(1)),\ \forall i\in [j] \end{eqnarray}

summing over all the $i\in [j]$ , we obtain (18).

Proof of Theorem 2.8. Let

\begin{eqnarray*} \Gamma \;:\!=\;\sum _{C^*(x)\in \left (\overline{\Omega }_{n_1,\ldots,n_k}\setminus \{C^*(y)\}\right )}e^{-\frac{L_{\Phi }(x,y)}{8}}. \end{eqnarray*}

By (62), it suffices to show that $\lim _{n\rightarrow \infty }\Gamma =0$ .

Let

\begin{eqnarray*} 0\lt \varepsilon \lt \min \left (\frac{2c}{3k},\min _{i,j\in [k],n_i\neq n_j}\left |\frac{n_i-n_j}{n}\right |\right ) \end{eqnarray*}

Note that

\begin{eqnarray*} \Gamma \leq \Gamma _1+\Gamma _2; \end{eqnarray*}

where

\begin{eqnarray*} \Gamma _1&=&\sum _{C^*(x)\in \overline{\Omega }_{n_1,\ldots,n_k}\;:\;\left (t_{1,1}(x,y),\ldots,t_{k,k}(x,y)\right )\in \left (\mathscr{B}\setminus \mathscr{B}_{\varepsilon }\right ), C^*(x)\neq C^*(y)}e^{\frac{-L_{\Phi }(x,y)}{8}} \end{eqnarray*}

and

(66) \begin{eqnarray} \Gamma _2=\sum _{C^*(x)\in \overline{\Omega }_{n_1,\ldots,n_k}\;:\;\left (t_{1,1}(x,y),\ldots, t_{k,k}(x,y)\right )\in \mathscr{B}_{\varepsilon }, C(x)\neq C(y)}e^{\frac{-L_{\Phi }(x,y)}{8}}. \end{eqnarray}

Under Assumption 2.5, by Lemma 3.5 we have

\begin{eqnarray*} 0\leq \Gamma _1&\leq & k^n e^{-\frac{T(n)}{B_1^2}} \end{eqnarray*}

By (15), we have

(67) \begin{eqnarray} \lim _{n\rightarrow \infty }\Gamma _1=0. \end{eqnarray}

Now let us consider $\Gamma _2$ . Recall that $y\in \Omega _{n_1,\ldots,n_k}\cap \Omega _c$ is the true community assignment mapping. Let $w$ be the bijection from $[k]$ to $[k]$ as defined in (34). Let $y^*\in \Omega$ be defined by:

\begin{eqnarray*} y^*(z)=w(y(z)),\ \forall z\in [n]. \end{eqnarray*}

Then $y^*\in C(y)$ . By Part (2) of Lemma 5.2, we obtain that for $i\in [k]$

\begin{eqnarray*} \left |(y^*)^{-1}(i)\right |=\left |y^{-1}(w^{-1}(i))\right |=\left |y^{-1}(i)\right |; \end{eqnarray*}

therefore $y^*\in \Omega _{n_1,\ldots,n_k}$ . Moreover, $x$ and $y^*$ satisfies

(68) \begin{eqnarray} t_{i,i}(x,y^*)\geq n_i(y^*)-n\varepsilon,\ \forall i\in [k]. \end{eqnarray}

If $x\neq y^*$ , by Lemma 5.3, there exists an $l$ -cycle $(i_1,\ldots,i_{l})$ for $(x,y^*)$ with $2\leq l\leq k$ . Then for each $2\leq a\leq (l+1)$ , choose an arbitrary vertex $u_m$ in $S_{i_{m-1},i_m}(x,y^*)$ , and let $y_1(u_m)=i_{m-1}$ , where $i_{l+1}\;:\!=\;i_1$ . For any vertex $z\in [n]\setminus \{u_{2},\ldots,u_{l+1}\}$ , let $y_1(z)=y^*(z)$ .

Note that $y_1\in \Omega _{n_1,\ldots,n_k}$ . Moreover, for $1\leq m\leq l$ , we have

\begin{eqnarray*} t_{i_m,i_m}(x,y^*)+1=t_{i_m,i_m}(x,y_1);\\ t_{i_m,i_{m+1}}(x,y^*)-1=t_{i_m,i_{m+1}}(x,y_1) \end{eqnarray*}

and

\begin{eqnarray*} t_{a,b}(x,y^*)=t_{a,b}(x,y_1),\ \forall (a,b)\notin \{(i_m,i_m),(i_m,i_{m+1})\}_{s=1}^{l}. \end{eqnarray*}

When $\left (t_{1,1}(x,y),\ldots, t_{k,k}(x,y)\right )\in \mathscr{B}_{\varepsilon }$ , From Assumption 2.4 and Lemma 5.5, we obtain

\begin{eqnarray*} L_{\Phi }(x,y_1)-L_{\Phi }(x,y)&=&L_{\Phi }(x,y_1)-L_{\Phi }(x,y^*)\leq -l\Delta (1+o(1)) \end{eqnarray*}

Therefore,

(69) \begin{eqnarray} e^{-\frac{L_{\Phi }(x,y)}{8}}\leq e^{-\frac{L_{\Phi }(x,y_1)}{8}}e^{-\frac{l\Delta (1+o(1))}{8}} \end{eqnarray}

If $y_1\neq x$ , we find an $l_2$ -cycle ( $2\leq l_2\leq k$ ) for $(x,y_1)$ , change community assignments along the $l_2$ -cycle as above, and obtain another community assignment mapping $y_2\in \Omega _{n_1,\ldots,n_k}$ , and so on. Let $y_0\;:\!=\;y$ , and note that for each $r\geq 1$ , if $y_r$ is obtained from $y_{r-1}$ by changing colours along an $l_r$ cycle for $(x,y_{r-1})$ , we have

\begin{eqnarray*} D_{\Omega }(x,y_r)= D_{\Omega }(x,y_{r-1})-l_r \end{eqnarray*}

Therefore, finally we can obtain $x$ from $y$ by changing colours along at most $\left \lfloor \frac{n}{2} \right \rfloor$ cycles. By similar arguments as those used to derive (69), we obtain that for each $r$

\begin{eqnarray*} e^{-\frac{L_{\Phi }(x,y_{r-1})}{8}}\leq e^{-\frac{L_{\Phi }(x,y_r)}{8}}e^{-\frac{l_r\Delta (1+o(1))}{8}} \end{eqnarray*}

Therefore, if $y_h=x$ for some $1\leq h\leq \left \lfloor \frac{n}{2} \right \rfloor$ , we have

\begin{eqnarray*} e^{-\frac{L_{\Phi }(x,y)}{8}}\leq e^{-\frac{L_{\Phi }(x,y_{h-1})}{8}}e^{-\frac{\left (\sum _{r=1}^{h-1}l_r\right )\Delta (1+o(1)))}{8}}. \end{eqnarray*}

By Assumption 2.4 and Lemma 5.5, we obtain

\begin{eqnarray*} L_{\Phi }(x,y_{h-1}))^2\geq l_h\Delta (1+o(1)) \end{eqnarray*}

Therefore,

\begin{eqnarray*} e^{-\frac{L_{\Phi }(x,y)}{8}}\leq \prod _{i\in [h]}e^{-\frac{l_i\Delta (1+o(1))}{8}}. \end{eqnarray*}

Note also that for any $r_1\neq r_2$ , in the process of obtaining $y_{r_1}$ from $y_{r_1-1}$ and the process of obtaining $y_{r_2}$ from $y_{r_2-1}$ , we change community assignments on disjoint sets of vertices. Hence, the order of these steps of changing community assignments along cycles does not affect the final community assignment mapping we obtain. From (66), we have

(70) \begin{eqnarray} \Gamma _2\leq \prod _{l=2}^k \left (\sum _{m_l=0}^{\infty } (nk)^{m_ll} e^{-\frac{(1+o(1))\Delta l m_l}{8}}\right )-1. \end{eqnarray}

On the right-hand side of (70), when expanding the product, each summand has the form:

\begin{eqnarray*} \left [(nk)^{2m_2}e^{-\frac{(1+o(1))\Delta 2 m_2}{8}}\right ]\cdot \left [ (nk)^{3m_3}e^{-\frac{(1+o(1))\Delta 3 m_3}{8}}\right ]\cdot \ldots \cdot \left [(nk)^{km_k} e^{-\frac{(1+o(1))\Delta k m_k}{8}}\right ] \end{eqnarray*}

where the factor $\left [(nk)^{2m_2}e^{-\frac{(1+o(1))\Delta 2 m_2}{8}}\right ]$ represents that we changed along 2-cycles $m_2$ times, the factor $\left [ (nk)^{3m_3}e^{-\frac{(1+o(1))\Delta 3 m_3}{8}}\right ]$ represents that we changed along 3-cycles $m_3$ times, and so on. Moreover, each time we changed along an $l$ -cycle, we need to first determine the $l$ different colours involved in the $l$ -cycle, and there are at most $k^l$ different $l$ -cycles; we then need to choose $l$ vertices to change colours, and there are at most $n^{l}$ choices. It is straightforward to check that if $\sigma$ satisfies (17), then

\begin{eqnarray*} \lim _{n\rightarrow \infty }nke^{-\frac{(1+o(1))\Delta }{8}}=0. \end{eqnarray*}

Therefore, we have

\begin{eqnarray*} \sum _{m_l=0}^{\infty } (nk)^{m_ll} e^{-\frac{(1+o(1))\Delta l m_l}{8}}\leq \frac{1}{1-e^{\log k+\log n-\frac{(1+o(1))\Delta }{8}}}; \end{eqnarray*}

when $n$ is sufficiently large and $\varepsilon$ is sufficiently small. Let

\begin{eqnarray*} \Psi \;:\!=\; \prod _{l=2}^k \left (\sum _{m_l=0}^{\infty } (nk)^{m_ll} e^{-\frac{(1+o(1))m_l l\Delta }{8}}\right ). \end{eqnarray*}

Since $\log (1+x)\leq x$ for $x\geq 0$ , we have

\begin{eqnarray*} 0\leq \log \Psi &=&\sum _{l=2}^{k}\log \left (1+\sum _{m_l=1}^{\infty } (nk)^{m_ll} e^{-\frac{(1+o(1))\Delta l m_l}{8}}\right )\\ &\leq &\sum _{l=2}^{k}\sum _{m_l=1}^{\infty } (nk)^{m_ll} e^{-\frac{(1-\delta )\Delta l m_l}{8}}\\ &\leq &\sum _{l=2}^{k}\frac{\left (nke^{-\frac{(1-\delta )\Delta }{8}}\right )^{l}}{1-\left (nke^{-\frac{(1-\delta )\Delta }{8}}\right )^l}\\ &\leq &\frac{\left (nke^{-\frac{(1-\delta )\Delta }{8}}\right )^{2}}{\left [1-\left (nke^{-\frac{(1-\delta )\Delta }{8}}\right )^{2}\right ]\left [1-\left (nke^{-\frac{(1-\delta )\Delta }{8}}\right )\right ]}\rightarrow 0, \end{eqnarray*}

as $n\rightarrow \infty$ . Then

(71) \begin{eqnarray} 0\leq \lim _{n\rightarrow \infty }\Gamma _2\leq \lim _{n\rightarrow \infty }e^{\log \Sigma }-1=0. \end{eqnarray}

Then the proposition follows from (67) and (71).

6. Community detection on hypergraphs with fixed number of vertices in each community

In this section, we study community detection on hypergraphs under the assumption that the number of vertices in each community is known and fixed. We shall prove a condition when exact recovery does not occur.

Recall that $y\in \Omega _{n_1,\ldots,n_k}$ is the true community assignment mapping.

Proof of Theorem 2.10 . When $y^{(ab)}\in \Omega _{n_1,\ldots,n_k}$ is defined by (23):

\begin{eqnarray*} &&t_{y^{(ab)}(a),y(a)}(y^{(ab)},y)-1=t_{y^{(ab)}(a),y(a)}(y,y)=t_{y(b),y(a)}(y,y)=0\\ &&t_{y(b),y(b)}(y^{(ab)},y)+1=t_{y(b),y(b)}(y,y)=n_{y(b)}\\ &&t_{y^{(ab)}(b),y(b)}(y^{(ab)},y)-1=t_{y(a),y(b)}(y^{(ab)},y)-1=t_{y(a),y(b)}(y,y)=0\\ &&t_{y(a),y(a)}(y^{(ab)},y)+1=t_{y(a),y(a)}(y,y)=n_{y(a)}. \end{eqnarray*}

and

\begin{eqnarray*} t_{i,j}(y^{(ab)},y)=t_{i,j}(y),\ \forall \ (i,j)\in \left ([k]^2\setminus \{(y(a),y(a)),(y(a),y(b)),(y(b),y(a)),(y(b),y(b))\}\right ) \end{eqnarray*}

Note that

\begin{eqnarray*} 1-p(\check{y};\ \sigma )\geq \mathrm{Pr}\left (\cup _{a,b\in [n],y(a)\neq y(b)}(f(y^{(ab)})-f(y)\lt 0)\right ), \end{eqnarray*}

since any of the event $(f(y^{(ab)})-f(y)\lt 0)$ implies $\check{y}\neq y$ . By (45), we obtain that $f(y^{(ab)})-f(y)$ is a Gaussian random variable with mean value $\|\Phi *(\mathbf{A}_{y^{(ab)}}-\mathbf{A}_y)\|^2$ and variance $4\|\Phi *(\mathbf{A}_{y^{(ab)}}-\mathbf{A}_y)\|^2$ . So $1-p(\check{y};\ \sigma )$ is at least

\begin{eqnarray*} &&\mathrm{Pr}\left (\cup _{a,b\in [n],y(a)\neq y(b)}(f(y^{(ab)})-f(y)\lt 0)\right )\\ &\geq &\mathrm{Pr}\left (\mathrm{max}_{a,b\in [n], y(a)\neq y(b)}\frac{2\langle \mathbf{W},\Phi *(\mathbf{A}_{y^{(ab)}}-\mathbf{A}_{y}) \rangle }{\|\Phi *(\mathbf{A}_{y^{(ab)}}-\mathbf{A}_y)\|^2}\gt 1\right ) \end{eqnarray*}

Let $H_1$ and $H_2$ be given as in the assumptions of the proposition. Then

\begin{eqnarray*} 1-p(\check{y};\ \sigma )\geq \mathrm{Pr}\left (\mathrm{max}_{a\in H_1,b\in H_2}\frac{2\langle \mathbf{W},\Phi *(\mathbf{A}_{y^{(ab)}}-\mathbf{A}_{y})\rangle }{\|\Phi *(\mathbf{A}_{y^{(ab)}}-\mathbf{A}_y)\|^2}\gt 1\right ) \end{eqnarray*}

Let $(\mathscr{X},\mathscr{Y},\mathscr{Z})$ be a partition of $[n]^s$ defined by:

\begin{eqnarray*} &&\mathscr{X}=\{\alpha =(\alpha _1,\alpha _2,\ldots,\alpha _s)\in [n]^s\;:\; s\in \{s_1,s_1+1,\ldots,s_2\}, \{\alpha _1,\ldots,\alpha _s\}\cap (H_1\cup H_2)=\emptyset \}\\ &&\mathscr{Y}=\{\alpha =(\alpha _1,\alpha _2,\ldots,\alpha _s)\in [n]^s\;:\; s\in \{s_1,s_1+1,\ldots,s_2\}, |r\in [s]\;:\;\alpha _r\in (H_1\cup H_2)|=1\}\\ &&\mathscr{Z}=\{\alpha =(\alpha _1,\alpha _2,\ldots,\alpha _s)\in [n]^s\;:\; s\in \{s_1,s_1+1,\ldots,s_2\}, |r\in [s]\;:\;\alpha _r\in (H_1\cup H_2)|\geq 2\} \end{eqnarray*}

For $\eta \in \{\mathscr{X},\mathscr{Y},\mathscr{Z}\}$ , define a random tensor $\mathbf{W}_{\eta }$ from the entries of $\mathbf{W}$ as follows:

\begin{eqnarray*} (\mathbf{W}_{\eta })_{(a_1,\ldots,a_s)}=\begin{cases}0&\mathrm{if}\ (a_1,\ldots,a_s)\notin \eta \\ \mathbf{W}_{(a_1,\ldots,a_s)},&\mathrm{if}\ (a_1,\ldots,a_s)\in \eta \end{cases} \end{eqnarray*}

For each $u\in H_{1}$ and $v\in H_{2}$ , let

\begin{eqnarray*} \mathscr{X}_{uv}=\langle \mathbf{W}_{\mathscr{X}},\Phi *(\mathbf{A}_{y^{(uv)}}-\mathbf{A}_{y}) \rangle \\ \mathscr{Y}_{uv}=\langle \mathbf{W}_{\mathscr{Y}},\Phi *(\mathbf{A}_{y^{(uv)}}-\mathbf{A}_{y}) \rangle \\ \mathscr{Z}_{uv}=\langle \mathbf{W}_{\mathscr{Z}},\Phi *(\mathbf{A}_{y^{(uv)}}-\mathbf{A}_{y}) \rangle \end{eqnarray*}

Lemma 6.1. The followings are true:

  1. 1. $\mathscr{X}_{uv}=0$ for $u\in H_{1}$ and $v\in H_{2}$ .

  2. 2. For each $u\in H_{1}$ and $v\in H_{2}$ , the variables $\mathscr{Y}_{uv}$ and $\mathscr{Z}_{uv}$ are independent.

  3. 3. Each $\mathscr{Y}_{uv}$ can be decomposed into $Y_u+Y_v$ where $\{Y_u\}_{u\in H_{1}}\cup \{Y_v\}_{v\in H_{2}}$ is a collection of i.i.d. Gaussian random variables.

Proof. Note that for $J_s=(j_1,j_2,\ldots,j_s)\in [n]^s$ ,

(72) \begin{eqnarray} &&(\mathbf{A}_{y^{(uv)}}-\mathbf{A}_{y})_{J_s}\\ &=&\begin{cases}\phi (y^{(uv)}(j_1),y^{(uv)}(j_2),\ldots,y^{(uv)}(j_s))-\phi (y(j_1),y(j_2),\ldots,y(j_s))& \mathrm{if}\ \{a,b\}\cap \{j_1,\ldots,j_s\}\neq \emptyset \\0&\mathrm{otherwise}.\end{cases}\notag \end{eqnarray}

It is straightforward to check (1). (2) holds because $\mathscr{Y}\cap \mathscr{Z}=\emptyset$ .

For $g\in H_{1}\cup H_{2}$ , let $\mathscr{Y}_g\subseteq \mathscr{Y}$ be defined by:

\begin{eqnarray*} \mathscr{Y}_g=\{\alpha =(\alpha _1,\alpha _2,\ldots,\alpha _s)\in \mathscr{Y}: g\in \{\alpha _1,\ldots,\alpha _s\}\}. \end{eqnarray*}

Note that for $g_1,g_2\in H_{1}\cup H_{2}$ and $g_1\neq g_2$ , $\mathscr{Y}_{g_1}\cap \mathscr{Y}_{g_2}=\emptyset$ . Moreover, $\mathscr{Y}=\cup _{g\in H_{1}\cup H_{2}}\mathscr{Y}_g$ . Therefore,

\begin{eqnarray*} \mathscr{Y}_{ab}=\sum _{g\in H_{1}\cup H_{2}}\langle \mathbf{W}_{\mathscr{Y}_g},\Phi *(\mathbf{A}_{y^{(ab)}}-\mathbf{A}_{y}) \rangle \end{eqnarray*}

Note also that $\langle \mathbf{W}_{\mathscr{Y}_g},\Phi *(\mathbf{A}_{y^{(ab)}}-\mathbf{A}_{y}) \rangle =0$ , if $g\notin \{a,b\}$ . Hence,

\begin{eqnarray*} \mathscr{Y}_{ab}=\sum _{\alpha \in \mathscr{Y}_a\cup \mathscr{Y}_b}\frac{(\mathbf{W})_{\alpha }\cdot \{(\mathbf{A}_{y^{(ab)}}-\mathbf{A}_{y})_{\alpha }\}}{\sigma _{\alpha }} \end{eqnarray*}

So, we can define

\begin{eqnarray*} Y_a\;:\!=\;&&\sum _{\alpha \in \mathscr{Y}_a}\frac{(\mathbf{W})_{\alpha }\cdot \{(\mathbf{A}_{y^{(ab)}}-\mathbf{A}_{y})_{\alpha }\}}{\sigma _{\alpha }} \end{eqnarray*}

By (72), we obtain

\begin{eqnarray*} Y_a&=&\sum _{\alpha \in \mathscr{Y}_a}\frac{(\mathbf{W})_{\alpha }\cdot \{(\mathbf{A}_{y^{(ab)}}-\mathbf{A}_{y})_{\alpha }\}}{\sigma _{\alpha }}\\ &=&\sum _{s=s_1}^{s_2}\sum _{j=1}^{s}\sum _{(i_1,\ldots,\widehat{i}_j,\ldots,i_s)\in ([n]\setminus (H_{1}\cup H_2))^{s-1}}\frac{1}{\sigma _{(i_1,\ldots,i_{j-1},a,i_{j+1},\ldots,i_s)}}\\ &&\left \{(\phi (y(i_1),\ldots,y^{(ab)}(a),\ldots,y(i_s))-\phi (y(i_1),\ldots,y(a),\ldots,y(i_s)))(\mathbf{W})_{(i_1,\ldots,i_{j-1},a,i_{j+1},\ldots,i_s)}\right \} \end{eqnarray*}

Similarly, define

\begin{eqnarray*} Y_b:&=&\sum _{\alpha \in \mathscr{Y}_b}\frac{(\mathbf{W})_{\alpha }\cdot \{(\mathbf{A}_{y^{(ab)}}-\mathbf{A}_{y})_{\alpha }\}}{\sigma _{\alpha }}\\ &=&\sum _{s=s_1}^{s_2}\sum _{j=1}^{s}\sum _{(i_1,\ldots,\widehat{i}_j,\ldots,i_s)\in ([n]\setminus (H_{1}\cup H_{2}))^{s-1}}\frac{1}{\sigma _{(i_1,\ldots,i_{j-1},b,i_{j+1},\ldots,i_s)}}\\ &&\left \{(\phi (y(i_1),\ldots,y^{(ab)}(b),\ldots,y(i_s))-\phi (y(i_1),\ldots,y(b),\ldots,y(i_s)))(\mathbf{W})_{(i_1,\ldots,i_{j-1},b,i_{j+1},\ldots,i_s)}\right \} \end{eqnarray*}

Then $\mathscr{Y}_{ab}=Y_a+Y_b$ and $\{Y_g\}_{g\in H_{1}\cup H_{2}}$ is a collection of independent Gaussian random variables. Moreover, the variance of $Y_g$ is

\begin{eqnarray*} &&\sum _{s=s_1}^{s_2}\sum _{j=1}^{s}\sum _{(i_1,\ldots,\widehat{i}_j,\ldots,i_s)\in ([n]\setminus (H_1\cup H_2))^{s-1}}\frac{1}{\sigma _{(i_1,\ldots,i_{j-1},g,i_{j+1},\ldots,i_s)}^2}\\ &&(\phi (y(i_1),\ldots,y(b),\ldots,y(i_s))-\phi (y(i_1),\ldots,y(a),\ldots,y(i_s)))^2\\ \end{eqnarray*}

By Assumption (6) of the proposition, this is independent of $g$ .

By Lemma 6.1, we obtain

\begin{eqnarray*} \langle \mathbf{W},\Phi *(\mathbf{A}_{y^{(ab)}}-\mathbf{A}_{y}) \rangle =Y_a+Y_b+\mathscr{Z}_{ab} \end{eqnarray*}

Moreover,

\begin{eqnarray*} \max _{u\in H_{1},v\in H_{2}}Y_u+Y_v+\mathscr{Z}_{uv}&\geq & \max _{u\in H_{1},v\in H_{2}}(Y_u+Y_v)-\max _{u\in H_{1},v\in H_{2}}(-\mathscr{Z}_{uv})\\ &=&\max _{u\in H_{1}} Y_u+\max _{v\in H_{2}}Y_v-\max _{u\in H_{1},v\in H_{2}}(-\mathscr{Z}_{uv}) \end{eqnarray*}

By Lemma A.1, we obtain that when $\varepsilon, h$ satisfy (A1) with $N$ replaced by $h$ , each one of the following two events

\begin{eqnarray*} F_1\;:\!=\;\left \{\max _{u\in H_{1}}\frac{Y_u}{\|\Phi *(\mathbf{A}_{y^{(uv)}}-\mathbf{A}_y)\|^2}\geq (1-\varepsilon )\sqrt{2\log h\cdot \min _{u\in H_{1}}\frac{\mathrm{Var}(Y_u)}{(L_{\Phi }(y^{(u,v)},y))^2}}\right \}\\ F_2\;:\!=\;\left \{\max _{v\in H_{2}}\frac{Y_v}{\|\Phi *(\mathbf{A}_{y^{(uv)}}-\mathbf{A}_y)\|^2}\geq (1-\varepsilon )\sqrt{{2\log h\cdot \min _{v\in H_{2}}\frac{\mathrm{Var}(Y_v)}{(L_{\Phi }(y^{(u,v)},y))^2}}}\right \} \end{eqnarray*}

has probability at least $1-e^{-h^{\varepsilon }}$ . Moreover, the event

\begin{eqnarray*} F_3\;:\!=\;\left \{\max _{u\in H_{1},v\in H_{2}}\frac{\mathscr{Z}_{uv}}{{\|\Phi *(\mathbf{A}_{y^{(uv)}}-\mathbf{A}_y)\|^2}}\leq (1+\varepsilon )\sqrt{2\log ( 2h)\cdot \max _{u\in H_{1},v\in H_{2}} \frac{\mathrm{Var}(Z_{uv})}{(L_{\Phi }(y^{(uv)},y))^2}}\right \} \end{eqnarray*}

occurs with probability at least $1-h^{-2\varepsilon }$ . Then by Assumption (4) of the proposition, we have

\begin{eqnarray*} \mathrm{Var} \mathscr{Z}_{uv} &=&\|\Phi *(\mathbf{A}_{y^{(uv)}}-\mathbf{A}_{y})\|^2-\mathrm{Var}(Y_u)-\mathrm{Var}(Y_v)\\ &=&L_{\Phi }(y^{(uv)},y)-(1+o(1))L_{\Phi }(y^{(uv)},y)\\ &=&o(1)L_{\Phi }(y^{(uv)},y). \end{eqnarray*}

By Assumption (5) of the proposition, for any $u\in H_1$ and $v\in H_2$ , we have

\begin{eqnarray*} \mathrm{Var}(Y_u)=\mathrm{Var}(Y_v). \end{eqnarray*}

Moreover, by Assumption (4) of the proposition,

\begin{eqnarray*} \mathrm{Var}(Y_u)+\mathrm{Var}(Y_v)=(1+o(1))L_{\Phi }(y^{(uv)},y). \end{eqnarray*}

Hence, the probability of the event

\begin{eqnarray*} &&F\;:\!=\;\left \{\max _{a\in H_1,b\in H_{2}}\frac{\langle \mathbf{W},\Phi *(\mathbf{A}_{y^{(ab)}}-\mathbf{A}_{y}) \rangle }{\|\Phi *(\mathbf{A}_{y^{(ab)}}-\mathbf{A}_y)\|^2}\geq \frac{(1-\varepsilon )\sqrt{2\log h}}{\max _{u\in H_1,v\in H_2}L_{\Phi }(y^{(u,v)},y)} \right .\\ &&\left .\left (\sqrt{\min _{u\in H_1}\mathrm{Var}(Y_u)}+\sqrt{\min _{v\in H_2}\mathrm{Var}(Y_v)}-(1+o(1))\sqrt{\max _{u\in H_1,v\in H_2}\mathrm{Var}(Z_{uv})}\right )\right \}\\ &=&\left \{\max _{a\in H_1,b\in H_{2}}\frac{\langle \mathbf{W},\Phi *(\mathbf{A}_{y^{(ab)}}-\mathbf{A}_{y}) \rangle }{\|\Phi *(\mathbf{A}_{y^{(ab)}}-\mathbf{A}_y)\|^2}\geq \frac{2(1-\varepsilon )\sqrt{\log h }}{\sqrt{\max _{u\in H_1,v\in H_2}L_{\Phi }(y^{(u,v)},y)}} (1+o(1))\right \} \end{eqnarray*}

is at least

\begin{eqnarray*} \mathrm{Pr}(F_1\cap F_2\cap F_3)&=&1-\mathrm{Pr}((F_1)^c\cup (F_2)^c\cup (F_3)^c)\\ &\geq &1- \mathrm{Pr}((F_1)^c)-\mathrm{Pr}((F_2)^c)-\mathrm{Pr}((F_3)^c)\\ &\geq &1-2e^{-h^{\varepsilon }}-h^{-2\varepsilon }. \end{eqnarray*}

When (25) holds, we have

\begin{eqnarray*} &&\mathrm{Pr}\left (\mathrm{max}_{a,b\in [n],y(a)\neq y(b)}\frac{2\langle \mathbf{W},\Phi *(\mathbf{A}_{y^{(ab)}}-\mathbf{A}_{y} )\rangle }{\|\Phi *(\mathbf{A}_{y^{(ab)}}-\mathbf{A}_y)\|^2}\gt 1)\right ) \geq \mathrm{Pr}(F)\rightarrow 1, \end{eqnarray*}

as $n\rightarrow \infty$ . Then the proposition follows.

7. An example

Assume $n$ vertices are divided into group I and group II such that group I contains $\lfloor \alpha n\rfloor$ vertices and group II contains $n-\lfloor \alpha n \rfloor$ vertices, where $\alpha \in (0,1)$ . Let $y\;:\;[n]\rightarrow [2]$ be the true community assignment mapping. For any $x\in \Omega$ , define

\begin{align*} (\mathbf{A}_x)_{i,j}\;:\!=\;\begin{cases}1&\mathrm{if}\ x(i)=x(j)=1\\ 0&\mathrm{if}\ x(i)\neq x(j)\\ 2&\mathrm{if}\ x(i)=x(j)=2\end{cases} \end{align*}

and

\begin{align*} \sigma _{i,j}=\begin{cases}\delta _{1}&\mathrm{if}\ y(i)=y(j)=1\\ \delta _0&\mathrm{if}\ y(i)\neq y(j)\\\delta _2&\mathrm{if}\ y(i)=y(j)=2. \end{cases} \end{align*}

Then

\begin{align*} L_{\Phi }(x,y)&=\sum _{i\in [n]}\sum _{j\in [n]}\frac{1}{\sigma _{ij}^2}[(\mathbf{A}_x)_{i,j}-(\mathbf{A}_y)_{i,j}]^2\\ &=\frac{[t_{1,2}(x,y)]^2}{\delta _2^2}+\frac{[t_{2,1}(x,y)]^2}{\delta _1^2}+\frac{2t_{1,1}(x,y)t_{1,2}(x,y)}{\delta _0^2}+\frac{2t_{1,1}(x,y)t_{2,1}(x,y)}{\delta _1^2}\\ &+\frac{8t_{1,2}(x,y)t_{2,2}(x,y)}{\delta _2^2}+\frac{8t_{2,1}(x,y)t_{2,2}(x,y)}{\delta _0^2} \end{align*}

Assume

(73) \begin{align} \frac{\alpha }{\delta _0^2}+\frac{4(1-\alpha )}{\delta _2^2}=\frac{\alpha }{\delta _1^2}+\frac{4(1-\alpha )}{\delta _0^2}\;:\!=\;Q \end{align}

Note that (73) holds whenever $\delta _0=\delta _1=\delta _2$ , which corresponds to the case when the Gaussian perturbations for each pair of vertices are i.i.d.; however, (73) may also holds even when $\delta _0,\delta _1,\delta _2$ are not all equal.

When (73) holds, one may check that Assumption 2.4 holds with

\begin{align*} \Delta =2nQ \end{align*}

and Assumption 2.3 holds with

\begin{align*} R(n)=2Q\varepsilon n^2 \end{align*}

We obtain that if

\begin{align*} Q\gt \frac{4(1+\delta )\log n}{n} \end{align*}

then a.s. exact recovery occurs; if

\begin{align*} Q\lt \frac{4(1-\delta )\log n}{n} \end{align*}

Then a.s. exact recovery does not occur by Theorem 2.9.

In the following tables, we let $(n_1,n_2)=(10,6)$ , $n=16$ and $\alpha =0.625$ . In this case, we have

\begin{align*} \frac{4\log n}{n}=\log (2)\approx 0.6931 \end{align*}

For each triple $(\delta _0,\delta _1,\delta _2)$ satisfying (73), we perform the MLE for $m=200$ times to compute $\hat{y}$ , and let $m_{+}$ be the number of experiments with $\hat{y}=y$ . The values $\frac{m_+}{m}$ are shown in the following tables.

Financial support

ZL’s research is supported by National Science Foundation grant 1608896 and Simons Foundation grant 638143.

Competing interests

None.

Appendix

A. Maximum of Gaussian Random Variables

Lemma A.1. Let $G_1,\ldots, G_N$ be Gaussian random variables with mean $0$ . Let $\varepsilon \in (0,1)$ . Then

\begin{eqnarray*} \mathrm{Pr}\left (\max _{i=1,\ldots,N}G_i\gt (1+\varepsilon )\sqrt{2\max _{i\in [N]}\mathrm{Var}(G_i)\log N}\right )\leq N^{-\varepsilon } \end{eqnarray*}

and moreover, if $G_i$ ’s are independent, and $\varepsilon, N$ satisfy

(A1) \begin{eqnarray} \frac{N^{\varepsilon -\varepsilon ^2}(1-\varepsilon )\sqrt{2\log N}}{\sqrt{2\pi }(1+2(1-\varepsilon )^2\log N)}\gt 1 \end{eqnarray}

Then

\begin{eqnarray*} \mathrm{Pr}\left (\max _{i=1,\ldots,N}G_i\lt (1-\varepsilon )\sqrt{2\min _{j\in [N]}\mathrm{Var}(G_j)\log N}\right )\leq \exp (-N^{\varepsilon }) \end{eqnarray*}

Proof. It is known that for a Gaussian random variable $G_i$ and $x\gt 0$ ,

(A2) \begin{eqnarray} \frac{xe^{-\frac{x^2}{2}}}{\sqrt{2\pi }(1+x^2)}\leq \mathbf{Pr}\left (\frac{G_i}{\sqrt{\mathrm{Var}(G_i)}}\gt x\right )\leq \frac{e^{-\frac{x^2}{2}}}{x\sqrt{2\pi }} \end{eqnarray}

Let $G_1,\ldots, G_N$ be $N$ Gaussian random variables. Then by (A2), we have

\begin{eqnarray*} &&\mathrm{Pr}\left (\max _{i\in [N]}G_i\geq (1+\varepsilon )\sqrt{2\max _{i\in [N]}\mathrm{Var}(G_i)\log N}\right )\\ &\leq & \sum _{i\in [N]}\mathrm{Pr}\left (\frac{G_i}{\sqrt{\mathrm{Var}(G_i)}}\geq (1+\varepsilon )\sqrt{2\log N}\right )\\ &\leq &\frac{N e^{-(1+\varepsilon )^2\log N}}{2(1+\varepsilon )\sqrt{\pi \log N}}\\ &\leq & N^{-\varepsilon } \end{eqnarray*}

If we further assume that $G_i$ ’s are independent, then

\begin{align*} &\mathrm{Pr}\left (\max _{i\in [N]}G_i\lt (1-\varepsilon )\sqrt{2\min _{j\in [N]}\mathrm{Var}(G_j)\log N}\right )\\=&\prod _{i\in [N]}\mathrm{Pr}\left (G_i\lt (1-\varepsilon )\sqrt{2\min _{j\in [N]}\mathrm{Var}(G_j)\log N}\right )\\=&\prod _{i\in [N]}\left [1-\mathrm{Pr}\left (G_i \gt (1-\varepsilon )\sqrt{2\min _{j\in [N]}\mathrm{Var}(G_j)\log N}\right )\right ]\\\leq &\prod _{i\in [N]}\left [1-\mathrm{Pr}\left (\frac{G_i}{\sqrt{\mathrm{Var}(G_i)}}\gt (1-\varepsilon )\sqrt{2\log N}\right )\right ] \end{align*}

By (A2), we obtain

\begin{eqnarray*} \mathrm{Pr}\left (\max _{i\in [N]}G_i\lt (1-\varepsilon )\sqrt{2\min _{j\in [N]}\mathrm{Var}(G_j)\log N}\right )&\leq & \left (1-\frac{(1-\varepsilon )\sqrt{2\log N}}{\sqrt{2\pi }(1+2(1-\varepsilon )^2\log N)}\frac{1}{N^{(1-\varepsilon )^2}}\right )^N \end{eqnarray*}

When (A1) holds, we have

\begin{eqnarray*} \mathrm{Pr}\left (\max _{i\in [N]}G_i\lt (1-\varepsilon )\sqrt{2\min _{j\in [N]}\mathrm{Var}(G_j)\log N}\right )\leq \left (1-\frac{1}{N^{1-\varepsilon }}\right )^{N^{1-\varepsilon }\cdot N^{\varepsilon }}\leq e^{-N^{\varepsilon }} \end{eqnarray*}

Then the lemma follows.

References

Abbe, E. (2018) Community detection and stochastic block models: Recent developments. J Mach Learn Res 18, 186.Google Scholar
Abbe, E. & Sandon, C. (2015). Community detection in general stochastic block models: fundamental limits and efficient recovery algorithms. In: 2015 IEEE 56th Annual Symposium on Foundations of Computer Science, pp. 670688.Google Scholar
Abbe, E., Bandeira, A. S. & Hall, G. (2016) Exact recovery in the stochastic block model. IEEE Trans Info Theory 62, 471487.Google Scholar
Chen, X. & Yang, Y. (2020) Cutoff for exact recovery of gaussian mixture models, arXiv: 2001.01194.Google Scholar
Dempster, A. P., Laird, N. M. & Rubin, D. B. (1977) Maximum likelihood from incomplete data via the em algorithm. J Royal Stat. Soci. Ser. B 39(1), 138.Google Scholar
Dyer, M. E. & Frieze, A. M. (1989) The solution of some random np-hard problems in polynomial expected time. J Algorithm 10, 451489.Google Scholar
Fraley, C. & Raftery, A. (2002) Model-based clustering, discriminant analysis, and density estimation. J. American Stat. Assoc. 97, 611631.Google Scholar
Giraud, C. & Verzelen, N. (2019) Partial recovery bounds for clustering with the relaxed k means, arXiv: 1807.07547.Google Scholar
Holland, P. W., Laskey, K. B. & Leinhardt, S. (1983) Stochastic blockmodels: First steps. Soc. Networks 5, 109137.Google Scholar
Kannan, R., Salmasian, H. & Vempala, S. (2008) The spectral method for general mixture models. SIAM J. Comput. 38, 11411156.Google Scholar
Kim, C., Bandeira, A. & Goemans, M. (2017). Community detection in hypergraphs, spiked tensor models, and sum-of-squares. In:12th International Conference on Sampling Theory and Applications, pp. 124128.Google Scholar
Löffler, M., Zhang, A. Y. & Zhou, H. (2020). Optimality of spectral clustering in thegaussian mixture model.Google Scholar
Li, Z. (2020) Exact recovery of community detection in k-partite graph models, arXiv:1910.04320.Google Scholar
Massoulié, L. (2014). Community detection thresholds and the weak Ramanujan property. In: Proceedings of the 46th Annual ACM Symposium on Theory of Computing, pp. 694703.Google Scholar
McQueen, J. B. (1967) Some methods for classification and analysis of multivariate observations. In: Proceedings of the Fifth Berkeley Symposium on Mathematical Statistics and Probability, Statistics, University of California Press, Berkeley, pp. 281297.Google Scholar
Mossel, El., Neeman, J. & Sly, A. (2018) A proof of the blockmodel threshold conjecture. Combinatorica 38, 665708.Google Scholar
Peng, J. & Wei, Y. (2015) Approximating k-means-type clustering via semidefinite programming. SIAM J. Optim 18, 186205.Google Scholar
Pollard, D. (1981) Strong consistency of k-means clustering. Ann. Statist 9, 135140.Google Scholar
Vempalaa, G. & Wang, S. (2004) A spectral algorithm for learning mixture models. J. Comput. Syst. Sci 68, 841860.Google Scholar