1. Introduction
Community structures are ubiquitous in graphs modelling natural and social phenomena. In natural sciences, atoms form molecules so that atoms in the same molecule have stronger connections compared to those in different molecules. In social sciences, individuals form groups in such a way that individuals in the same group have more communications compared to individuals in different groups. The main aim for community detection is to determine the specific groups that specific individuals belong to based on observations of (random) connections between individuals. Identifying different communities in the stochastic block model is a central topic in many fields of science and technology; see [Reference Abbe1] for a summary.
In this paper, we study the community detection problem for the Gaussian mixture model, in which there are $n$ vertices belonging to $k$ ( $k\geq 2$ ) different communities. We observe a $p\times 1$ vector for each one of the $n$ vertices, perturbed by a $p\times 1$ Gaussian vector with independent (but not necessarily identically distributed), mean-0 entries. More precisely, each entry of the $p\times n$ perturbation matrix is obtained by a multiple of a standard Gaussian random variable, while the intensities of different entries are different. Given such an observation, we find the maximum likelihood estimation (MLE) for the community assignment and study the probability that the MLE equals the true community assignment as the number of vertices $n\rightarrow \infty$ . If this probability tends to $1$ as $n\rightarrow \infty$ , we say exact recovery occurs. Heuristically, it is natural to conjecture that exact recovery may occur when the intensities of the perturbations are small but does not occur when these intensities are large. The major theme of the paper is to investigate how small the intensities of the perturbations are needed in order to ensure the exact recovery and how large the intensities are required to stop the occurrences of the exact recovery.
Clustering problems in the Gaussian mixture model have been studied extensively; see [Reference Dempster, Laird and Rubin5, Reference Fraley and Raftery7, Reference McQueen15, Reference Pollard18] for an incomplete list. We mention some recent related work here.
The Gaussian mixture model when all the entries of the perturbation matrix are i.i.d was studied in [Reference Chen and Yang4], in which a condition for the exact recovery of the semi-definite programming (SDP) is proved. When all the communities have the same size, a condition that exact recovery does not occur was also proved in [Reference Chen and Yang4] when the number of communities $k\leq \log n$ . The case of unbalanced communities was investigated in [Reference Giraud and Verzelen8]. In this paper, we obtain conditions when the exact recovery happens and does not happen for the more general Gaussian mixture model when the entries of the perturbation matrix are not necessarily identically distributed. Our result can be applied to the special case when intensities of the Gaussian perturbations are all equal, and in particular, we obtain a condition that the exact recovery of MLE does not occur when the number of communities $k$ is $e^{o(\log n)}$ in the hypergraph model; see the explanations at the end of Example 4.5. We can also see from the sufficient condition (17) for the exact recovery and the sufficient condition (22) that the exact recovery does not occur that when $k$ is $e^{o(\log n)}$ , these two conditions match and there is a sharp phase transition. In Section 7, we see an example in which these necessary and sufficient conditions of the exact recovery give a sharp phase transition when the Gaussian perturbations are not i.i.d.
When $p=n$ in our model, we may consider the rows and columns of the observation matrix are indexed by the $n$ vertices, and each entry represents an edge. In this case, we obtain the community detection problem on a graph. When $p=n^s$ with $s\geq 2$ , we may consider the rows of the observation matrix are indexed by ordered $s$ -tuples of vertices, and each entry of the observation matrix represents a $(s+1)$ -hyperedge. In this case, we obtain the community detection problem on a hypergraph. Community detections on hypergraphs with Gaussian perturbations were studied in [Reference Kim, Bandeira and Goemans11], where the vertices are divided into two equal-sized communities, and a weight-1 $(d+1)$ -hyperedge exists if and only if all the vertices are in the same group. The results proved in this paper can be applied to the community detection problems on hypergraphs with Gaussian perturbation to obtain necessary and sufficient conditions for the exact recovery, in which the number of communities is arbitrary and communities are not necessarily equal-sized; moreover, the hyperedges have general weights as represented in the (unperturbed) observation matrix. Community detection problems on random graphs were also studied in [Reference Abbe and Sandon2, Reference Abbe, Bandeira and Hall3, Reference Dyer and Frieze6, Reference Holland, Laskey and Leinhardt9, Reference Massoulié14, Reference Mossel, Neeman and Sly16]. Algorithms to efficiently implement the community recovery in mixture models include the Expectation-Maximization (EM) algorithm ([Reference Dempster, Laird and Rubin5]) and the spectral method ([Reference Kannan, Salmasian and Vempala10, Reference Löffler, Zhang and Zhou12, Reference Vempalaa and Wang19]). The EM algorithm is a local search heuristic that can fail. The spectral method is based on the singular value decomposition of the dataset and then only use data related to a fixed finite number of largest singular values to achieve dimension reduction. The performance of the spectral method when the Gaussian perturbations consist of i.i.d. Gaussian random variables was discussed in [Reference Löffler, Zhang and Zhou12]. The major differences between results in this paper and those in [Reference Löffler, Zhang and Zhou12] are (1) this paper focuses on exact recovery, that is, the probability that the estimation is exactly equal to the true community assignment and no mislabel is allowed, while [Reference Löffler, Zhang and Zhou12] allows a small number of mislabelled vertices (almost exact recovery) and (2) [Reference Löffler, Zhang and Zhou12] focuses on Gaussian mixture model with isotropic covariance matrix; while in this paper, the covariances matrices for the model can be more general. Partial recovery was also discussed in [Reference Giraud and Verzelen8].
The implement of the MLE is usually very slow; in some cases, relaxing some constraints in MLE leads to the more efficient SDP. However, most SDPs require the partition to be balanced or that, at least, the size of each group is known in advance. The MLE optimisation discussed here has the advantage to attack the case of unbalanced sizes of groups, and the case when the size of each group is unknown. When the perturbations for the entries of the observation matrix are assumed to be i.i.d. Gaussian, and using the average value of the observations associated with each group to approximated the expectation of the observation to the group, in this case the MLE becomes the K-means estimation. The SDP relaxation for K-means have been studied extensively, see for example [Reference Peng and Wei17]. We expect a similar SDP relaxation for K-means in the dependent case in which the inner product in the objective function should be defined with respect to the inverse of the covariance matrix; yet in this paper, we shall focus on the more fundamental MLE and try to investigate its statistical limit.
The organisation of the paper is as follows. In Section 2, we review the definition of the Gaussian mixture models and hypergraphs and state the main results proved in this paper. In Section 3, we prove conditions for the exact recovery of the Gaussian mixture model when the number of vertices in each community is unknown. In Section 4, we apply the results proved in Section 3 to the exact recovery of the community detection in hypergraphs and also prove conditions when exact recovery does not occur in hypergraphs under the assumption that the number of vertices in each community is unknown. In Section 5, we prove conditions for the exact recovery of the Gaussian mixture model when the number of vertices in each community is known and fixed. In Section 6, we prove conditions when exact recovery does not occur in hypergraphs under the assumption that the number of vertices in each community is known and fixed. In Section 7, we give an example in which these necessary and sufficient conditions of the exact recovery give a sharp phase transition when the Gaussian perturbations are not i.i.d.; results of numerical experiments are also included to show the performance of MLE with difference parameters to illustrate the sharp phase transition. In Section A, we prove a lemma used to obtain the main results of the paper.
2. Backgrounds and main results
In this section, we review the definition of the Gaussian mixture models and hypergraphs and state the main results proved in this paper.
2.1. Gaussian mixture model
Let $n\geq k\geq 2$ be positive integers. Let
be a set of $n$ vertices divided into $k$ different communities. Let
be the set of communities. A mapping $x\; :\; [n]\rightarrow [k]$ which assigns a unique community represented by an integer in $[k]$ to each one of the $n$ vertices in $[n]$ is called a community assignment mapping. Let $\Omega$ be the set consisting of all the possible mappings from $[n]$ to $[k]$ , that is,
Each mapping in $\Omega$ is a community assignment mapping.
We shall define a function $\theta$ , which assigns to each vertex a $p\times 1$ vector, depending on the community of this vertex. Indeed, we require that all the vertices in the same community correspond to the same $p\times 1$ vector under $\theta$ . Observing $\theta$ perturbed by some Gaussian noise, our goal is to identify the community assignment and determine when exactly recovery occurs. It is natural to believe that those vertices have corresponding observations close to each other are in the same community, while those vertices whose corresponding observations are far away from each other are in different communities, but this intuition may be complicated by the existence of the Gaussian noise. See Sections 4.1, 4.2 and 4.3 for examples with specific $\theta$ .
More precisely, let $p\geq 1$ be a positive integer. Let
be a function on the set $\Omega \times [p]\times [k]$ taking real values.
For a community assignment mapping $x\in \Omega$ , let $\mathbf{A}_x$ be a $p\times n$ matrix whose entries are given by:
Let $\Sigma$ be a $p\times n$ matrix with positive real entries defined by:
Let $P,Q$ be two $p\times n$ matrices. Define the inner product of $P,Q$ by
The norm $\|P\|$ for a matrix $P$ is defined by:
Let $P*Q$ be a $p\times n$ matrix defined by:
Define a random observation matrix $\mathbf{K}_x$ by:
where $\mathbf{W}$ is a random $p\times n$ matrix with i.i.d. standard Gaussian entries. Note that if the entries of $\Sigma$ are not all equal, the perturbation matrix $\Sigma *W$ has independent but not identically distributed entries.
Let $y\in \Omega$ be the true community assignment mapping. Given the observation $\mathbf{K}_y$ , the goal is to determine the true community assignment mapping $y$ . We shall apply the MLE.
Let $n_1,\ldots,n_k$ be positive integers satisfying
and
that is, $n_i$ is the number of vertices in community $i$ for each $i\in [k]$ under the mapping $y$ .
Let
be the set of all the community assignment mappings such that there are exactly $n_i$ vertices in the community $i$ , for each $i\in [k]$ .
For each real number $c\in (0,1)$ , let
that is, $\Omega _c$ consists of all the community assignment mappings such that the percentage of the numbers of vertices in each community is at least $c$ .
Assume the true community assignment mapping $y\in \Omega _c$ for some $c\in (0,1)$ . Let $\Phi$ be an $p\times n$ matrix whose entries are given by:
in other words, the $(i,j)$ -entry of $\Phi$ is the reciprocal of the $(i,j)$ -entry of $\Sigma$ . Define
and
Then we have the following lemma
Lemma 2.1. $\hat{y}$ is the MLE with respect to the observation $\mathbf{K}_y$ in $\Omega _{\frac{2c}{3}}$ . $\check{y}$ is the MLE with respect to the observation $\mathbf{K}_y$ in $\Omega _{n_1,\ldots,n_k}$ .
Proof. By definition, the MLE with respect to the observation $\mathbf{K}_y$ in $\Omega _{\frac{2c}{3}}$ (resp. $\Omega _{n_1,\ldots,n_k}$ ) should maximise the probability density of the observation $\mathbf{K}_y$ among all $x\in \Omega _{\frac{2c}{3}}$ (resp. $x\in \Omega _{n_1,\ldots,n_k}$ ). If the true community assignment mapping $y=x$ , we may consider $\mathbf{K}_{y}$ as a random matrix with mean value $\mathbf{A}_x$ and independent entries, such that variance of its $(i,j)$ -entry is $\sigma _{i,j}^2$ . Therefore, the probability density of $\mathbf{K}_y$ is given by:
where the exponent is exactly
It is straightforward to check that the minimiser of $\|\Phi *(\mathbf{K}_y-\mathbf{A}_x)\|^2$ is exactly the maximiser of the probability density. Then the lemma follows.
We shall investigate under which conditions we have $\check{y}=y$ and $\hat{y}=y$ with high probability.
To state the main theorems proved in this paper, we first introduce a few assumptions.
For $x,y\in \Omega$ , let
For $x\in \Omega$ , let
then $n_i(x)$ is the number of vertices in community $i$ under the community assignment mapping $x$ . It is straightforward to check that
For $i,j\in [k]$ and $x,z\in \Omega$ , let $t_{i,j}(x,z)$ be a non-negative integer given by:
That is, $t_{i,j}(x,z)$ is the number of vertices in $[n]$ which are in community $i$ under the mapping $x$ and in community $j$ under the mapping $z$ . Then
We now introduce an equivalence condition on $\Omega$ .
Definition 1. For $x\in \Omega$ , let $C(x)$ consist of all the $x'\in \Omega$ such that $x'$ can be obtained from $x$ by a $\theta$ -preserving bijection of communities. More precisely, $x'\in C(x)\subset \Omega$ if and only if the following condition holds
-
• for $i\in [p]$ and $j\in [n]$ , $\theta (x,i,x(j))=\theta (x',i,x'(j))$ .
We define an equivalence relation on $\Omega$ as follows: we say $x,z\in \Omega$ are equivalent if and only if $x\in C(z)$ . Let $\overline{\Omega }$ be the set of all the equivalence classes in $\Omega$ .
We see that if $x$ and $z$ are equivalent in the sense of Definition 1, the non-random parts of the observation matrices corresponding to $x$ and $z$ satisfy $\mathbf{A}_x=\mathbf{A}_z$ . Hence, any algorithm based on this observation will not distinguish equivalent community assignments; and the best we expect is to exactly recover the equivalent class of the community assignment. We want to assume this $\theta$ function to be able to distinguish community assignments up to the composition with a bijection of communities more precisely.
Assumption 2.2. Let $x,z\in \Omega$ . If for any $i\in [p]$ and $j\in [n]$ ,
then there is a bijection $\eta\; :\;[k]\rightarrow [k]$ , such that
where $\circ$ denotes the composition of two mappings.
Note that (8) is equivalent of saying that for $i,j\in [n]$ , $x(i)=x(j)$ if and only if $x'(i)=x'(j)$ . See Section 4.1 for examples.
Define a set
For $\varepsilon \gt 0$ , define a set ${\mathscr B}_{\varepsilon }$ consisting of all the $(t_{1,1},t_{1,2},\ldots,t_{k,k})\in{\mathscr B}$ satisfying all the following conditions:
-
1. $\forall \ i\in [k],\ \max _{j\in [k]}t_{j,i}\geq n_i-n\varepsilon$ .
-
2. For $i\in [k]$ , let $t_{w(i),i}=\max _{j\in [k]}t_{j,i}$ . Then $w$ is a bijection from $[k]$ to $[k]$ .
-
3. $w$ is $\theta$ -preserving, that is, for any $x\in \Omega$ , $i\in [p]$ and $a\in [k]$ , we have
\begin{eqnarray*} \theta (x,i,a)=\theta (w\circ x,i,w(a)). \end{eqnarray*}
We may assume $\theta$ and $\Sigma$ satisfy the following assumptions.
Assumption 2.3. Suppose $\varepsilon \in (0,\frac{2c}{3k})$ , $x\in \Omega _{\frac{2c}{3}}$ and $y\in \Omega _c$ . Then for all $x,y\in \Omega$ , and
we have
Here $R(n)$ is some function of $n$ , such that ( 11 ) combined with ( 15 ) leads to the desired exact recovery result in Theorem 2.6 .
Assumption 2.4. Assume $\varepsilon \in (0,\frac{2c}{3k})$ , $x\in \Omega _{\frac{2c}{3}}$ and $y\in \Omega _c$ . Assume there exists $\Delta \gt 0$ such that the following holds:
Let $y_1,y_2\in \Omega _{\frac{2c}{3}}$ and $a,b\in [k]$ and $a\neq b$ . Let $i,j\in [n]$ such that $i\in y_1^{-1}(a)\cap x^{-1}(b)$ . Let $y_2\;:\;[n]\rightarrow [k]$ be defined as follows:
When
such that for all $i\in [k]$
$\varepsilon \in \left (0,\frac{2c}{3k}\right )$ , and
we have
where $o(1)\rightarrow 0$ , as $n\rightarrow \infty$ .
Assumptions 2.3 and 2.4 are technical. One may interpret Assumption 2.4 as follows: when a community assignment mapping $y_1\in \Omega$ is sufficiently close to the community assignment mapping $x$ in the sense of (12), change the community of exactly one vertex $j$ from $y_1(j)$ to $x(j)$ and obtain a new community assignment mapping $y_2(j)$ , then the distance between $y_2$ and $x$ , in the sense of $L_{\Phi }$ , approximately decreases by $\Delta$ from the distance between $y_1$ and $x$ . In other words, as $y_1$ is close to $x$ and approaching $x$ , $L_{\Phi }(y_1,x)$ decreases linearly with rate $\Delta$ . As we see later, this will create a convergent geometric series when computing the lower bound for the probability of exact recovery.
Note that Assumption 2.3 can be guaranteed by the following stronger assumption with $R(n)=\frac{T(n)}{B_1^2}$ ; see Lemma 3.5.
Assumption 2.5.
-
1. There exists $B_1\gt 0$ , such that for all $i,j\in [p]\times n$ , we have
\begin{eqnarray*} |\sigma _{i,j}|\leq B_1. \end{eqnarray*} -
2. Assume $\varepsilon \in (0,\frac{2c}{3k})$ , $x\in \Omega _{\frac{2c}{3}}$ and $y\in \Omega _c$ . Then for all $x,y\in \Omega$ , and
(13) \begin{eqnarray} (t_{1,1}(x,y),t_{1,2}(x,y),\ldots,t_{k,k}(x,y))\in \mathscr{B}\setminus \mathscr{B}_{\varepsilon }, \end{eqnarray}we have(14) \begin{eqnarray} \sum _{i\in [p],j\in [n]}(\theta (x,i,x(j))-\theta (y,i,y(j)))^2\geq T(n)\gt 0. \end{eqnarray}
Here $T(n)$ is some function of $n$ , such that ( 14 ) combined with ( 16 ) leads to the desired exact recovery result in Theorem 2.6 .
Assumption 2.5(1) gives an upper bound on the standard deviation of the Gaussian perturbations (which may depend on $n$ ); Assumption 2.5(2) may be interpreted as when $x,y\in \Omega$ are “far away” from each other in the sense of (13), the corresponding $\theta$ functions are “far away” from each other in the sense of (11).
Theorem 2.6. Assume $y\in \Omega _c$ is the true community assignment mapping. Suppose that Assumptions 2.5 and 2.4 hold. Let $\varepsilon \in (0,\frac{2c}{3k})$ . Suppose one of the following cases occurs
-
1. Assumption 2.3 holds and
(15) \begin{eqnarray} \lim _{n\rightarrow \infty } n\log k-\frac{R(n)}{8}=-\infty, \end{eqnarray} -
2. Assumption 2.5 holds and
(16) \begin{eqnarray} \lim _{n\rightarrow \infty } n\log k-\frac{T(n)}{8B_1^2}=-\infty . \end{eqnarray}
Moreover, suppose that for any constant $\delta \gt 0$ independent of $n$ ,
then $\lim _{n\rightarrow \infty }\Pr (\hat{y}\in C(y))=1$ .
Theorem 2.6 gives a sufficient condition for the exact recovery of MLE in the Gaussian mixture model. It is proved in Section 3. An application of Theorem 2.6 on the exact recovery of community detection on hypergraphs is discussed in Section 4.3.
We also obtain a condition for the exact recovery when the sample space of the MLE is restricted to $\Omega _{n_1,\ldots,n_k}$ , that is, the number of vertices in each community is known and fixed.
Assumption 2.7. Assume $x,y_m,y_h\in \Omega$ such that
-
1. $D_{\Omega }(y_m,y_h)=j$ , where $j\geq 2$ is a positive integer; and
-
2. There exist $u_1,\ldots,u_j\in [n]$ , such that
-
(a) $y_m(v)=y_h(v)$ , for all $v\in [n]\setminus \{u_1,\ldots,u_j\}$ ; and
-
(b) $y_m(u_i)\neq y_h(u_i)=x(u_i)=y_m(u_{i-1})$ for all $i\in [j]$ .
-
(c) $(t_{1,1}(x,y_m),t_{1,2}(x,y_m),\ldots,t_{k,k}(x,y_m))\in \mathscr{B}_{\varepsilon }$ with $\varepsilon \in \left (0,\frac{2c}{3k}\right )$ and $w(i)=i$ .
-
Then
Theorem 2.8. Suppose that Assumptions 2.5, 2.7 , ( 15 ) and ( 17 ) hold. Then $\lim _{n\rightarrow \infty }\Pr (\check{y}\in C(y))=1$ .
Indeed, Assumption 2.4 implies Assumption 2.7; see Lemma 5.5. Theorem 2.8 is proved in Section (5).
2.2. Hypergraphs
A special case for the Gaussian mixture model is the hypergraph model. Let $s,s_1,s_2$ be positive integers satisfying
A hypergraph $H=(V,E)$ has vertex set $V\;:\!=\;[n]$ and hyperedge set $E$ defined as follows:
Let $\phi \;:\;\cup _{s=s_1}^{s_2}[k]^{s}\rightarrow [0,\infty )$ be a function which assigns a unique real number $\phi (c_1,\ldots,c_s)$ to each $s$ -tuple of communities $(c_1,\ldots,c_s)\in [k]^s$ , and $s\in [s_1,s_2]$ .
For a community assignment mapping $x$ , the weighted adjacency tensor $\mathbf{A}_x$ is defined by:
and
Define a random tensor $\mathbf{K}_x$ as in (2). Recall that $y\in \Omega _c$ is the true community assignment mapping. Define $\hat{y}$ and $\check{y}$ as in (3) and (4).
Recall that $y\in \Omega$ is the true community assignment mapping satisfying $|y^{-1}(i)|=n_i$ , for all $i\in [k]$ . Let $a\in [n]$ . Let $y^{(a)}\in \Omega$ be defined by:
such that
Theorem 2.9. Assume
Suppose that there exists a subset $H\subset [n]$ satisfying all the following conditions:
-
1. $|H|=h=o(n)$ ;
-
2. $\lim _{n\rightarrow \infty }\frac{\log h}{\log n}=1$ ;
-
3. For each $g\in H$ ,
\begin{eqnarray*} &&\sum _{s=s_1}^{s_2}\sum _{j=1}^{s}\sum _{(i_1,\ldots,\widehat{i}_j,\ldots,i_s)\in ([n]\setminus H)^{s-1}}\frac{1}{\sigma _{(i_1,\ldots,i_{j-1},g,i_{j+1},\ldots,i_s)}^2}\\ &&\times (\phi (y(i_1),\ldots,y^{(g)}(g),\ldots,y(i_s))-\phi (y(i_1),\ldots,y(g),\ldots,y(i_s)))^2\\ &=&(1+o(1))L_{\Phi }(y^{(g)},y) \end{eqnarray*} -
4. there exists a constant $\beta \gt 0$ independent of $n$ , such that
\begin{eqnarray*} \frac{\max _{a\in H}L_{\Phi }(y^{(a)},y)}{\min _{a\in H} L_{\Phi }(y^{(a)},y)}\leq \beta ^2,\ \forall n. \end{eqnarray*}
If there exists a constant $\delta \gt 0$ independent of $n$ , such that
Then $\lim _{n\rightarrow \infty }\Pr (\hat{y}\in C(y))=0$ .
Theorem 2.9 is proved in Section 4. An example is given in Section 4.2.
Let $a,b\in [n]$ such that $y(a)\neq y(b)$ . Let $y^{(ab)}\in \Omega _{n_1,\ldots,n_k}$ be the community assignment mapping defined by:
In other words, $y^{(ab)}$ is obtained from $y$ by exchanging $y(a)$ and $y(b)$ .
We also prove a condition when the exact recovery does not occur if the sample space of the MLE is restricted in $\Omega _{n_1,\ldots,n_k}$ .
Theorem 2.10. Assume
Suppose that there exist two subsets $H_1,H_2\subset [n]$ satisfying all the following conditions:
-
1. $|H_1|=|H_2|=h=o(n)$ ;
-
2. $\lim _{n\rightarrow \infty }\frac{\log h}{\log n}=1$ ;
-
3. For any $u_1,u_2\in H_1$ and $v_1,v_2\in H_2$ ,
\begin{eqnarray*} y(u_1)=y(u_2)\neq y(v_1)=y(v_2); \end{eqnarray*} -
4. For any $u\in H_1$ and $v\in H_2$
\begin{eqnarray*} &&\sum _{s=s_1}^{s_2}\sum _{j=1}^{s}\sum _{(i_1,\ldots,\widehat{i}_j,\ldots,i_s)\in ([n]\setminus (H_1\cup H_2))^{s-1}}\left (\frac{1}{\sigma _{(i_1,\ldots,i_{j-1},u,i_{j+1},\ldots,i_s)}^2}+\frac{1}{\sigma _{(i_1,\ldots,i_{j-1},v,i_{j+1},\ldots,i_s)}^2}\right )\\ &&(\phi (y(i_1),\ldots,y(v),\ldots,y(i_s))-\phi (y(i_1),\ldots,y(u),\ldots,y(i_s)))^2\\ &=&(1+o(1))L_{\Phi }(y^{(uv)},y) \end{eqnarray*} -
5. For any $g\in H_1\cup H_2$ , the quantity
\begin{eqnarray*} &&\sum _{s=s_1}^{s_2}\sum _{j=1}^{s}\sum _{(i_1,\ldots,\widehat{i}_j,\ldots,i_s)\in ([n]\setminus (H_1\cup H_2))^{s-1}}\frac{1}{\sigma _{(i_1,\ldots,i_{j-1},g,i_{j+1},\ldots,i_s)}^2}\\ &&(\phi (y(i_1),\ldots,y(b),\ldots,y(i_s))-\phi (y(i_1),\ldots,y(a),\ldots,y(i_s)))^2 \end{eqnarray*}is a constant and is independent of $g$ .
If there exists a constant $\delta \gt 0$ independent of $n$ , such that
$\lim _{n\rightarrow \infty }\Pr (\check{y}\in C(y))=0$ .
3. Community detection on K-community Gaussian mixture models
In this section, we consider the MLE when the number of vertices in each community is unknown. We shall obtain a sufficient condition for the occurrence of the exact recovery. The main goal is to prove Theorem 2.6.
Recall that we defined an equivalence relation on $\Omega$ in Definition 1. It is straightforward to check that
Therefore, the MLE based on the observation $\mathbf{K}_y$ can only recover the community assignment mapping up to equivalence.
The main idea to prove Theorem 2.6 is as follows:
-
We find an upper bound for the probability that the MLE is not in the equivalence class of the true community assignment; see (32).
-
We show this upper bound converges to zero as $n\rightarrow \infty$ under the assumptions of Theorem 2.6. This is achieved by splitting the upper bounds into two parts: one part $I_2$ is the sum over all the community assignment functions that differs from the true community assignment functions by at most $n\varepsilon$ and other part $I_1$ is the sum over all the community assignment functions that differs from the true community assignment functions by at least $n\varepsilon$ . The fact that $I_2$ converges to 0 as $n\rightarrow \infty$ follows directly from Assumption 2.5. To prove that $I_2$ converges to 0 as $n\rightarrow \infty$ , we bound $I_1$ by a geometric series that converges to 0 as $n\rightarrow \infty$ .
Note that
In particular for each $x\in \Omega$ , we have
Recall that $y\in \Omega _{n_1,\ldots,n_k}$ is the true community assignment mapping. Note that
For each fixed observation $\mathbf{K}_y$ , $\|\Phi *\mathbf{K}_y\|^2$ is fixed and independent of $x\in \Omega$ . Therefore,
For $x\in \Omega$ , define
Then
where we use the identity
Then $f(x)-f(y)$ is a Gaussian random variable with mean value
and variance
Lemma 3.1. For $x,z\in \Omega$ . If $x\in C(z)$ , then
Proof. By Definition 1, if $x\in C(z)$ , then for any $i\in [p]$ and $j,h\in [n]$ , $x(j)=x(h)$ if and only if $z(j)=z(h)$ and $\theta (x,i,x(j))=\theta (z,i,z(j))$ , then $\mathbf{A}_x=\mathbf{A}_z$ by (19). Moreover, since for any $i\in [p]$ and $j\in [n]$ ,
we have
this implies
Then the lemma follows from (29).
Define
Then
Lemma 3.2. Let $x,y,x',y'\in \Omega$ , such that $x'\in C(x)$ and $y'\in C(y)$ , then
Proof. By (31), we obtain that when $x'\in C(x)$ and $y'\in C(y)$ ,
Then the lemma follows from (5).
Lemma 3.3. For $x,y\in \Omega$ , $L_{\Phi }(x,y)\geq 0$ . Moreover,
-
1. If $x\in C(y)$ , then $L_{\Phi }(x,y)=0$ .
-
2. If $\theta$ satisfies Assumption 2.2 and $L_{\Phi }(x,y)=0$ , then $x\in C(y)$ .
Proof. From (5), it is straightforward to check that $L_{\Phi }(x,y)\geq 0$ for any $x,y\in \Omega$ . Moreover, from (5), we obtain
By the fact that $\sigma _{i,j}\gt 0$ for all $i\in [p],j\in [n]$ , we obtain that $L_{\Phi }(x,y)=0$ if and only if
If $x\in C(y)$ , then there exists a $\theta$ -preserving bijection $\eta \;:\;[k]\rightarrow [k]$ , such that $x=\eta \circ y$ . Then (33) holds by the $\theta$ -preserving property of $\eta$ , then we obtain Part(1).
On the other hand, if $L_{\Phi }(x,y)=0$ , we have (33) holds. Then $x\in C(y)$ follows from Assumption 2.2.
Lemma 3.4. Assume that $y\in \Omega _c$ and $x\in \Omega _{\frac{2c}{3}}$ . For $i\in [k]$ , let
where $w(i)\in [k]$ . When $\varepsilon \in \left (0,\frac{2c}{3k}\right )$ and $(t_{1,1}(x,y),t_{1,2}(x,y),\ldots, t_{k,k}(x,y))\in{\mathbb R}^{k^2}$ satisfies
$w$ is a bijection from $[k]$ to $[k]$ .
Proof. See Lemma 5.6 of [Reference Li13].
Definition 2. Define the distance function $D_{\Omega }\;:\;\Omega \times \Omega \rightarrow [n]$ as follows:
for $x,y\in \Omega$ .
From Definition 2, it is straightforward to check that
Lemma 3.5. Assume that $\theta,\Sigma$ satisfies Assumptions 2.5 . Then for all the $x,y\in \Omega$ such that ( 10 ) holds, we have
Proof. Note that
By Assumption 2.5(1), we have
Then the lemma follows from Assumption 2.5(2).
Proof of Theorem 2.6 . Note that
where
and
and $\varepsilon \in \left (0,\frac{2c}{3k}\right )$ .
By Lemma 3.5, when Assumption 2.5 holds, we have
When (15) holds, we obtain
Now let us consider $I_2$ . Let $w$ be the bijection from $[k]$ to $[k]$ as defined in (34). Let $y^*\in \Omega$ be defined by:
Then $y^*\in C(y)$ since $w$ is $\theta$ -preserving by the definition of $\mathscr{B}_{\varepsilon }$ . Moreover, $x$ and $y^*$ satisfies
We consider the following community changing process to obtain $x$ from $y^*$ :
-
1. If for all $(j,i)\in [k]^2$ , and $j\neq i$ , $t_{j,i}(x,y^*)=0$ , then $x=y^*$ .
-
2. If (1) does not hold, find the least $(j,i)\in [k]^2$ in lexicographic order such that $j\neq i$ and $t_{j,i}(x,y^*)\gt 0$ . Choose an arbitrary vertex $u\in \left \{ x^{-1}(j)\cap (y^*)^{-1}(i)\right \}$ . Define $y_1\in \Omega$ as follows:
\begin{eqnarray*} y_1(z)= \begin{cases}j&\mathrm{if}\ z=u\\ y^*(z)&\mathrm{if}\ z\in [n]\setminus \{u\} \end{cases} \end{eqnarray*}
Then we have
Therefore, $x$ , $y_1$ and $y^*$ satisfy
for all $l\in [k]$ .
From Assumption 2.4 and Lemma 3.2, we obtain
Therefore,
In general, if we have constructed $y_l\in \Omega$ ( $r\geq 1$ ) satisfying all the following conditions:
for all $l\in [k]$ . We now construct $y_{r+1}\in \Omega$ as follows:
-
(1) If for all $(j,i)\in [k]^2$ , and $j\neq i$ , $t_{j,i}(x,y_r)=0$ , then $x=y_r$ ; then the construction process stops at this step.
-
(2) If (a) does not hold, find the least $(j,i)\in [k]^2$ in lexicographic order such that $j\neq i$ and $t_{j,i}(x,y_r)\gt 0$ . Choose an arbitrary vertex $u\in \left \{x^{-1}(j)\cap y_r^{-1}(i)\right \}$ . Define $y_{r+1}\in \Omega$ as follows:
\begin{eqnarray*} y_{r+1}(z)= \begin{cases}j&\mathrm{if}\ z=u\\ y_r(z)&\mathrm{if}\ z\in [n]\setminus \{u\} \end{cases} \end{eqnarray*}
Then it is straightforward to check that
for all $l\in [k]$ .
Then if (36) holds with $y^*$ replaced by $y_r$ , then (36) holds with $y^*$ replaced by $y_{r+1}$ . By Assumption 2.4, we obtain
Recall that the distance $D_{\Omega }$ in $\Omega$ is defined in Definition 2. From the constructions of $y_{r+1}$ , we have
Therefore, there exists $h\in [n]$ , such that $y_h=x$ . By (39) and Assumption 2.4, we obtain
Since any $x$ in $\mathscr{B}_{\varepsilon }$ can be obtained from $y$ by the community changing process described above, we have
The right-hand side of (41) is the sum of geometric series with both initial term and common ratio equal to
When (17) holds, we obtain
4. Community detection on k-community hypergraphs
In this section, we apply the results proved in Section 3 to the exact recovery of the community detection in hypergraphs and also prove conditions when exact recovery does not occur in hypergraphs under the assumption that the number of vertices in each community is unknown.
The main idea to prove Theorem 2.9 is as follows:
-
We obtain an lower bound for the probability that the MLE is not in the equivalent class of the true community assignment function; see (50).
-
We show this lower bound converges to 1 as $n\rightarrow \infty$ . The proof utilises the inequalities regarding the distribution of the maximum of a number of Gaussian random variables as discussed in Section A.
In the case of a hypergraph, from (26), when
we obtain for $x,z\in \Omega$
In particular,
Recall that $y\in \Omega _c$ is the true community assignment mapping. Then
where $f(x)$ is given by (29).
By (30), we obtain that in the hypergraph case
Then $f(x)-f(y)$ is a Gaussian random variable with mean value $L_{\Phi }(x,y)$ and variance $4L_{\Phi }(x,y)$ , where $L_{\Phi }(x,y)$ is defined by (5).
Proof of Theorem 2.9. When $y^{(a)}\in \Omega$ is defined by (20),
and
Moreover,
Since any of the event $\{f(y^{(a)})-f(y)\lt 0\}$ implies $\hat{y}\neq y$ .
Let $H\subset [n]$ be given as in the assumptions of the proposition. Under Assumption (3) of the proposition when $a\in H$ , we have
Then from (45), we have
Then $1-p(\hat{y};\ \sigma )$ is at least
Let $(\mathscr{X},\mathscr{Y},\mathscr{Z})$ be a partition of $\cup _{s=s_1}^{s_2}[n]^s$ defined by:
For $\eta \in \{\mathscr{X},\mathscr{Y},\mathscr{Z}\}$ , define the random tensor $\mathbf{W}_{\eta }$ from the entries of $\mathbf{W}$ as follows:
For each $a\in H$ , let
For $s\in \{s_1,s_1+1,\ldots,s_2\}$ , let
Explicit computations show that
Claim 4.1. The followings are true:
-
1. $\mathscr{X}_{a}=0$ for $a\in H$ .
-
2. For each $a\in H$ , the variables $\mathscr{Y}_{a}$ and $\mathscr{Z}_{a}$ are independent.
Proof. It is straightforward to check (1). (2) holds because $\mathscr{Y}\cap \mathscr{Z}=\emptyset$ .
For $g\in H$ , let $\mathscr{Y}^{\,g}\subseteq \mathscr{Y}$ be defined by
Note that for $g_1,g_2\in H$ and $g_1\neq g_2$ , $\mathscr{Y}^{\,g_1}\cap \mathscr{Y}^{\,g_2}=\emptyset$ . Moreover, $\mathscr{Y}=\cup _{g\in H}\mathscr{Y}^{\,g}$ . Therefore,
Note also that $\langle \mathbf{W}_{\mathscr{Y}^{\,g}},\Phi *(\mathbf{A}_{y^{(a)}}-\mathbf{A}_{y}) \rangle =0$ , if $g\neq a$ . Hence,
So by (51), we obtain
Then $\{\mathscr{Y}_g\}_{g\in H}$ is a collection of independent centred Gaussian random variables. Moreover, the variance of $\mathscr{Y}_g$ is equal to
by Assumption (3) of the proposition.
By Claim 4.1, we obtain
Moreover,
Recall that
By Lemma A.1 about the tail bound result of the maximum of Gaussian random variables, if (A1) holds with $N$ replaced by $h$ , the event
has probability at least $1-e^{-h^{\varepsilon }}$ , and the event
has probability $1-h^{-\varepsilon }$ .
Moreover, by Assumption (3) of the Proposition and (52),
Define an event $E$ by
Then $E_1\cap E_2\subseteq E$ .
When $n$ is large, and (22) holds
as $n\rightarrow \infty$ . Then the proposition follows.
4.1. Examples of Assumption 2.2
We shall see some examples of the function $\theta \; :\; \Omega \times [k]\times [k]\rightarrow{\mathbb R}$ satisfying Assumption 2.2. We first see an example when $\theta$ can uniquely determine the community assignment mapping in $\Omega$ .
Example 4.2. Assume $p=k$ . For $a,b\in [k]$ , $x\in \Omega$
Then if for all $a\in [p]$ and $j\in [n]$ , ( 7 ) holds, we have $x(j)=a$ if and only if $z(j)=a$ , then $x=z$ .
We now see an example when $\theta$ cannot uniquely determine the community assignment mapping but determines the community assignment mappings up to the equivalent class as defined in Definition 1.
Example 4.3. Assume $p=n$ .
Then if for all $i,j\in [n]$ , ( 7 ) holds, we have $x(j)=x(i)$ if and only if $z(j)=z(i)$ , then $x\in C(z)$ .
Example 4.4. Assume $p=n$ , $a\in [k]$ and $i\in [n]$ .
Then if for all $i,j\in [n]$ , ( 7 ) holds, we have $x(i)-x(j)=z(i)-z(j)$ . This implies that $x(i)=x(j)$ if and only if $z(i)=z(j)$ , therefore $x\in C(z)$ . If both $x$ and $z$ are surjective onto $[k]$ , then $x=z$ .
4.2. Example of Theorem 2.9
Example 4.5. Here we see an example about how to apply Theorem 2.9 to the exact recovery of community detection on hypergraphs. Let $y\in \Omega _{n_1,\ldots,n_k}$ be the true community assignment mapping. Assume that for any $s\in \{s_1,\ldots,s_2\}$ , $(i_1,i_2,\ldots,i_s), (j_1,j_2,\ldots,j_s)\in [n]^s$ , we have
whenever
that is, $\sigma _{(i_1,\ldots,i_s)}$ depends only on the communities of $(i_1,\ldots,i_s)$ under the mapping $y$ . In this case, we can define $\overline{\sigma }: \cup _{s=s_1}^{s_2}[k]^s\rightarrow (0,\infty )$ , such that
Then for any $a\in [n]$ ,
Moreover, for any $a,b\in [n]$ such that
we have
We consider
Assume that when $y(a)=r_0$ , $y^{(a)}(a)=r_1$ , $L_{\Phi }(y^{(a)},y)$ achieves its minimum. Let $H\subset y^{-1}(r_0)$ , then $h=|H|\leq n_{r_0}$ . Assume
Then we may choose $h=\frac{n_{r_0}}{\log n}$ such that Assumptions (1)(2) in Theorem 2.9 hold. Moreover, Assumption (4) in Theorem 2.9 holds because if for all $a\in H$ , let $y^{(a)}(a)=r_1$ , then $L_{\Phi }(y^{(a)},y)$ takes the same value for all $a\in H$ . There are many mappings $\phi\; : \; \cup _{s=s_1}^{s_1}[k]^s\rightarrow{\mathbb R}$ to guarantee Assumption (3) in Theorem 2.9 . For example, one may choose
for $s\in \{s_1,s_1+1,\ldots,s_2\}$ and $b_1,\ldots,b_s\in [k]$ . Then from ( 54 ), we obtain
From ( 46 )–( 49 ) and ( 56 ), we obtain that the terms actually contributing to the sum must satisfy
or
Then we obtain
where
Assume
If in $\{(d_1,b_1),\ldots,(d_s,b_s)\}$ , there exist more than one $g\in [s]$ , such that $(d_g,b_g)=(r_1,r_0)$ , then the sum of such terms will be of order $o((n_{r_0})^{s-1})$ (resp. $o((n_{r_1})^{s-1})$ ) in $L_{0,s}$ (resp. $L_{1,s}$ ). Therefore, we obtain
To analyse $L_{1,s}$ , assume that there exists a positive constant $C\gt 0$ independent of $n$ , such that
Then we obtain
Then Assumption (3) of Theorem 2.9 follows from the fact that $|H|=\frac{n_{r_0}}{\log n}=o(n_{r_0})$ .
In the special case when all the communities have equal size, we may obtain a sufficient condition that the exact recovery of MLE does not occur in the hypergraph case when the number of communites $k=e^{o(\log n)}$ . Since in this case we have $n_1=n_2=\ldots =n_k\geq e^{\log n-o(\log n)}$ , then ( 21 ) holds. Choose $h=\frac{n_1}{\log n}$ , then Assumptions (1) and (2) of Theorem 2.9 hold.
4.3. Example of Theorem 2.6
Example 4.6. We can also apply Theorem 2.6 to the case of exact recovery of community detection on hypergraphs. Again we consider the case when $\sigma _{(i_1,\ldots,i_s)}$ depends only on $(y(i_1),\ldots,y(i_s))$ . Hence, we may define $\overline{\sigma }$ as in ( 53 ).
To check Assumption 2.4 , let $y_m,y_{m+1},x\in \Omega$ be given as in the proof of Proposition 2.6 . For the simplicity of notation, we use $y$ instead of $y^*$ . By ( 57 ), we obtain
For $j,p,q\in [k]$ and $x,y,z\in \Omega$ , we define
Then
Recall that $D_{\Omega }(y_m,y_{m+1})=1$ , and there exists $u\in [n]$ such that
where $i,j\in [k]$ and $i\neq j$ , while $y_m(v)=y_{m+1}(v)$ for all the $v\in [n]\setminus \{u\}$ . This implies that if $\{d_1,\ldots,d_s\}\cap \{i,j\}=\emptyset$ , then the corresponding summand in ( 59 ) is 0 and does not contribute to the sum. Under the assumption that
-
1. $(t_{1,1}(x,y),t_{1,2}(x,y),\ldots,t_{k,k}(x,y))\in \mathscr{B}_{\varepsilon }$ with $w\;:\;[k]\rightarrow [k]$ the identity map; and
-
2. $n_1\geq n_2\geq \ldots \geq n_k$ ; and
-
3. $\min _{(b_1,\ldots,b_s)\in [k]^s}|\overline{\sigma }(b_1,\ldots,b_s)|\geq B_3\gt 0$ ; and
-
4. $\lim _{n\rightarrow \infty }\frac{n\varepsilon }{n_1}=0$ .
we obtain
The identity above can be interpreted as follows. We can classify the terms satisfying $\{d_1,\ldots,d_s\}\cap \{i,j\}\neq \emptyset$ by the number
and obtain that the leading term of $L_{\Phi }(x,y_m)-L_{\Phi }(x,y_{m+1})$ is given by the terms when $N_{i,j}=1$ . Moreover, by Assumption (1) we have
Define
We further make the assumptions below:
Then
Then by Assumption 2.5(1), the exact recovery occurs with probability 1 when $n\rightarrow \infty$ if ( 15 ) and ( 17 ) hold, and
when ( 10 ) holds.
There are a lot of functions $\phi \;:\;\cup _{s=s_1}^{s_2}[k]^s\rightarrow{\mathbb R}$ satisfying ( 60 ) and ( 61 ) and Assumption 2.2 . For example, we may consider the function $\phi$ as defined in ( 56 ). Assume
where $r_0,r_1\in [k]$ and $r_0\neq r_1$ . As in Example 4.5, we obtain that
where
where the inequality follows from Assumption 2.5(2). It is straightforward to check that when $\phi$ is given by ( 56 ) and ( 60 ) hold if ( 58 ) holds.
To check ( 61 ), note that
When ( 10 ) holds, the following cases might occur
-
• $w$ is not a bijection from $[k]$ to $[k]$ . In this case, there exists $i,j\in [k]$ , such that $w(i)=w(j)$ , then when ( 10 ) holds, we obtain
\begin{eqnarray*} t_{w(i),j}=t_{w(j),j}\geq \frac{n_{j}}{k} \end{eqnarray*} -
• $w$ is a bijection from $[k]$ to $[k]$ . However, there exists $i\in [k]^2$ , such that
\begin{eqnarray*} t_{w(j),j}\leq n_i-\varepsilon n. \end{eqnarray*}Let\begin{eqnarray*} i\;:\!=\;w^{-1}(\mathrm{argmax}_{l\in [k]\setminus \{w(j)\}}t_{l,j}), \end{eqnarray*}then $i\neq j$ and\begin{eqnarray*} t_{w(i),j}(x,y)\geq \frac{\varepsilon n}{k-1} \end{eqnarray*}
When ( 10 ) holds and $y\in \Omega _c$ , we have
Let
Then we obtain ( 61 ).
5. Community detection on Gaussian mixture models with fixed number of vertices in each community
In this section, we consider the MLE restricted to the sample space consisting of all the mappings satisfying the condition that the number of vertices in each community is the same as that of the true community assignment mapping $y\in \Omega _{n_1,\ldots,n_k}$ . Again we shall prove a sufficient condition for the occurrences of exact recovery.
Let $x\in \Omega _{n_1,\ldots,n_k}$ . By (28),
Recall that $f(x)$ is defined as in (29). Recall also that $f(x)-f(y)$ is a Gaussian random variable with mean value $L_{\Phi }(x,y)$ and variance $4L_{\Phi }(x,y)$ .
For each $x\in \Omega _{n_1,\ldots,n_k}$ , let
that is, $C^*(x)$ consists of all the community assignment mappings in $\Omega _{n_1,\ldots,n_k}$ that are equivalent to $x$ in the sense of Definition 1. Let
that is, $\overline{\Omega }_{n_1,\ldots,n_k}$ consists of all the equivalence classes in $\Omega _{n_1,\ldots,n_k}$ .
Lemma 5.1. For $x,z\in \Omega _{n_1,\ldots,n_k}$ . If $x\in C^*(z)$ , then
Proof. The lemma follows from Lemma 3.1.
Define
Then
Lemma 5.2. Let $y\in \Omega _{n_1,\ldots,n_k}\cap \Omega _c$ be the true community assignment mapping. Let $x\in \Omega _{n_1,\ldots,n_k}$ . For $i\in [k]$ , let $w(i)\in [k]$ be defined as in ( 34 ). Then
-
1. When $\varepsilon \in \left (0,\frac{c}{k}\right )$ and $(t_{1,1}(x,y),\ldots, t_{k,k}(x,y))\in \mathscr{B}_{\varepsilon }$ , $w$ is a bijection from $[k]$ to $[k]$ .
-
2. Assume there exist $i,j\in [k]$ , such that $n_i\neq n_j$ . If
(63) \begin{eqnarray} \varepsilon \lt \min _{i,j\in [k]:n_i\neq n_j}\left |\frac{n_i-n_j}{n}\right | \end{eqnarray}Then for any $i\in [k]$ ,(64) \begin{eqnarray} n_i=|y^{-1}(i)|=|y^{-1}(w(i))|=n_{w(i)}. \end{eqnarray}
Proof. See Lemma 6.6 of [Reference Li13].
Definition 3. Let $l\geq 2$ be a positive integer. Let $x,y\in \Omega _{n_1,\ldots,n_k}$ . We say $l$ distinct communities $(i_1,\ldots,i_{l})\in [k]^l$ is an $l$ -cycle for $(x,y)$ , if $t_{i_{s-1},i_s}(x,y)\gt 0$ for all $2\leq s\leq l+1$ , where $i_{l+1}\;:\!=\;i_1$ .
Lemma 5.3. Let $x,y\in \Omega _{n_1,\ldots,n_k}$ and $x\neq y$ . Then there exists an $l$ -cycle for $(x,y)$ with $2\leq l\leq k$ .
Proof. See Lemma 3.3 of [Reference Li13].
Lemma 5.4. For any $x,y\in \Omega _{n_1,\ldots,n_k}$ , $L_{\Phi }(x,y)\geq 0$ , where the equality holds if and only if $x\in C^*(y)$ .
Proof. The lemma follows from Lemma 3.3.
Proof. Let $z_0=y_m$ and $z_j=y_h$ . For $i\in [j-1]$ , define $z_i\in \Omega$ by
Then for any $i\in [j]$ ,
by Assumption 2.4, we obtain
summing over all the $i\in [j]$ , we obtain (18).
Proof of Theorem 2.8. Let
By (62), it suffices to show that $\lim _{n\rightarrow \infty }\Gamma =0$ .
Let
Note that
where
and
Under Assumption 2.5, by Lemma 3.5 we have
By (15), we have
Now let us consider $\Gamma _2$ . Recall that $y\in \Omega _{n_1,\ldots,n_k}\cap \Omega _c$ is the true community assignment mapping. Let $w$ be the bijection from $[k]$ to $[k]$ as defined in (34). Let $y^*\in \Omega$ be defined by:
Then $y^*\in C(y)$ . By Part (2) of Lemma 5.2, we obtain that for $i\in [k]$
therefore $y^*\in \Omega _{n_1,\ldots,n_k}$ . Moreover, $x$ and $y^*$ satisfies
If $x\neq y^*$ , by Lemma 5.3, there exists an $l$ -cycle $(i_1,\ldots,i_{l})$ for $(x,y^*)$ with $2\leq l\leq k$ . Then for each $2\leq a\leq (l+1)$ , choose an arbitrary vertex $u_m$ in $S_{i_{m-1},i_m}(x,y^*)$ , and let $y_1(u_m)=i_{m-1}$ , where $i_{l+1}\;:\!=\;i_1$ . For any vertex $z\in [n]\setminus \{u_{2},\ldots,u_{l+1}\}$ , let $y_1(z)=y^*(z)$ .
Note that $y_1\in \Omega _{n_1,\ldots,n_k}$ . Moreover, for $1\leq m\leq l$ , we have
and
When $\left (t_{1,1}(x,y),\ldots, t_{k,k}(x,y)\right )\in \mathscr{B}_{\varepsilon }$ , From Assumption 2.4 and Lemma 5.5, we obtain
Therefore,
If $y_1\neq x$ , we find an $l_2$ -cycle ( $2\leq l_2\leq k$ ) for $(x,y_1)$ , change community assignments along the $l_2$ -cycle as above, and obtain another community assignment mapping $y_2\in \Omega _{n_1,\ldots,n_k}$ , and so on. Let $y_0\;:\!=\;y$ , and note that for each $r\geq 1$ , if $y_r$ is obtained from $y_{r-1}$ by changing colours along an $l_r$ cycle for $(x,y_{r-1})$ , we have
Therefore, finally we can obtain $x$ from $y$ by changing colours along at most $\left \lfloor \frac{n}{2} \right \rfloor$ cycles. By similar arguments as those used to derive (69), we obtain that for each $r$
Therefore, if $y_h=x$ for some $1\leq h\leq \left \lfloor \frac{n}{2} \right \rfloor$ , we have
By Assumption 2.4 and Lemma 5.5, we obtain
Therefore,
Note also that for any $r_1\neq r_2$ , in the process of obtaining $y_{r_1}$ from $y_{r_1-1}$ and the process of obtaining $y_{r_2}$ from $y_{r_2-1}$ , we change community assignments on disjoint sets of vertices. Hence, the order of these steps of changing community assignments along cycles does not affect the final community assignment mapping we obtain. From (66), we have
On the right-hand side of (70), when expanding the product, each summand has the form:
where the factor $\left [(nk)^{2m_2}e^{-\frac{(1+o(1))\Delta 2 m_2}{8}}\right ]$ represents that we changed along 2-cycles $m_2$ times, the factor $\left [ (nk)^{3m_3}e^{-\frac{(1+o(1))\Delta 3 m_3}{8}}\right ]$ represents that we changed along 3-cycles $m_3$ times, and so on. Moreover, each time we changed along an $l$ -cycle, we need to first determine the $l$ different colours involved in the $l$ -cycle, and there are at most $k^l$ different $l$ -cycles; we then need to choose $l$ vertices to change colours, and there are at most $n^{l}$ choices. It is straightforward to check that if $\sigma$ satisfies (17), then
Therefore, we have
when $n$ is sufficiently large and $\varepsilon$ is sufficiently small. Let
Since $\log (1+x)\leq x$ for $x\geq 0$ , we have
as $n\rightarrow \infty$ . Then
6. Community detection on hypergraphs with fixed number of vertices in each community
In this section, we study community detection on hypergraphs under the assumption that the number of vertices in each community is known and fixed. We shall prove a condition when exact recovery does not occur.
Recall that $y\in \Omega _{n_1,\ldots,n_k}$ is the true community assignment mapping.
Proof of Theorem 2.10 . When $y^{(ab)}\in \Omega _{n_1,\ldots,n_k}$ is defined by (23):
and
Note that
since any of the event $(f(y^{(ab)})-f(y)\lt 0)$ implies $\check{y}\neq y$ . By (45), we obtain that $f(y^{(ab)})-f(y)$ is a Gaussian random variable with mean value $\|\Phi *(\mathbf{A}_{y^{(ab)}}-\mathbf{A}_y)\|^2$ and variance $4\|\Phi *(\mathbf{A}_{y^{(ab)}}-\mathbf{A}_y)\|^2$ . So $1-p(\check{y};\ \sigma )$ is at least
Let $H_1$ and $H_2$ be given as in the assumptions of the proposition. Then
Let $(\mathscr{X},\mathscr{Y},\mathscr{Z})$ be a partition of $[n]^s$ defined by:
For $\eta \in \{\mathscr{X},\mathscr{Y},\mathscr{Z}\}$ , define a random tensor $\mathbf{W}_{\eta }$ from the entries of $\mathbf{W}$ as follows:
For each $u\in H_{1}$ and $v\in H_{2}$ , let
Lemma 6.1. The followings are true:
-
1. $\mathscr{X}_{uv}=0$ for $u\in H_{1}$ and $v\in H_{2}$ .
-
2. For each $u\in H_{1}$ and $v\in H_{2}$ , the variables $\mathscr{Y}_{uv}$ and $\mathscr{Z}_{uv}$ are independent.
-
3. Each $\mathscr{Y}_{uv}$ can be decomposed into $Y_u+Y_v$ where $\{Y_u\}_{u\in H_{1}}\cup \{Y_v\}_{v\in H_{2}}$ is a collection of i.i.d. Gaussian random variables.
Proof. Note that for $J_s=(j_1,j_2,\ldots,j_s)\in [n]^s$ ,
It is straightforward to check (1). (2) holds because $\mathscr{Y}\cap \mathscr{Z}=\emptyset$ .
For $g\in H_{1}\cup H_{2}$ , let $\mathscr{Y}_g\subseteq \mathscr{Y}$ be defined by:
Note that for $g_1,g_2\in H_{1}\cup H_{2}$ and $g_1\neq g_2$ , $\mathscr{Y}_{g_1}\cap \mathscr{Y}_{g_2}=\emptyset$ . Moreover, $\mathscr{Y}=\cup _{g\in H_{1}\cup H_{2}}\mathscr{Y}_g$ . Therefore,
Note also that $\langle \mathbf{W}_{\mathscr{Y}_g},\Phi *(\mathbf{A}_{y^{(ab)}}-\mathbf{A}_{y}) \rangle =0$ , if $g\notin \{a,b\}$ . Hence,
So, we can define
By (72), we obtain
Similarly, define
Then $\mathscr{Y}_{ab}=Y_a+Y_b$ and $\{Y_g\}_{g\in H_{1}\cup H_{2}}$ is a collection of independent Gaussian random variables. Moreover, the variance of $Y_g$ is
By Assumption (6) of the proposition, this is independent of $g$ .
By Lemma 6.1, we obtain
Moreover,
By Lemma A.1, we obtain that when $\varepsilon, h$ satisfy (A1) with $N$ replaced by $h$ , each one of the following two events
has probability at least $1-e^{-h^{\varepsilon }}$ . Moreover, the event
occurs with probability at least $1-h^{-2\varepsilon }$ . Then by Assumption (4) of the proposition, we have
By Assumption (5) of the proposition, for any $u\in H_1$ and $v\in H_2$ , we have
Moreover, by Assumption (4) of the proposition,
Hence, the probability of the event
is at least
When (25) holds, we have
as $n\rightarrow \infty$ . Then the proposition follows.
7. An example
Assume $n$ vertices are divided into group I and group II such that group I contains $\lfloor \alpha n\rfloor$ vertices and group II contains $n-\lfloor \alpha n \rfloor$ vertices, where $\alpha \in (0,1)$ . Let $y\;:\;[n]\rightarrow [2]$ be the true community assignment mapping. For any $x\in \Omega$ , define
and
Then
Assume
Note that (73) holds whenever $\delta _0=\delta _1=\delta _2$ , which corresponds to the case when the Gaussian perturbations for each pair of vertices are i.i.d.; however, (73) may also holds even when $\delta _0,\delta _1,\delta _2$ are not all equal.
When (73) holds, one may check that Assumption 2.4 holds with
and Assumption 2.3 holds with
We obtain that if
then a.s. exact recovery occurs; if
Then a.s. exact recovery does not occur by Theorem 2.9.
In the following tables, we let $(n_1,n_2)=(10,6)$ , $n=16$ and $\alpha =0.625$ . In this case, we have
For each triple $(\delta _0,\delta _1,\delta _2)$ satisfying (73), we perform the MLE for $m=200$ times to compute $\hat{y}$ , and let $m_{+}$ be the number of experiments with $\hat{y}=y$ . The values $\frac{m_+}{m}$ are shown in the following tables.
Financial support
ZL’s research is supported by National Science Foundation grant 1608896 and Simons Foundation grant 638143.
Competing interests
None.
Appendix
A. Maximum of Gaussian Random Variables
Lemma A.1. Let $G_1,\ldots, G_N$ be Gaussian random variables with mean $0$ . Let $\varepsilon \in (0,1)$ . Then
and moreover, if $G_i$ ’s are independent, and $\varepsilon, N$ satisfy
Then
Proof. It is known that for a Gaussian random variable $G_i$ and $x\gt 0$ ,
Let $G_1,\ldots, G_N$ be $N$ Gaussian random variables. Then by (A2), we have
If we further assume that $G_i$ ’s are independent, then
By (A2), we obtain
When (A1) holds, we have
Then the lemma follows.