1. Introduction
One of the most fundamental and natural problems in data science is understanding an underlying structure from data sets that can be viewed as networks. The problem is known as clustering or community detection, and it appears in diverse study fields involving real-world networks.
The stochastic block model (SBM) is one of the most fundamental mathematical models for understanding the community structure in networks. An SBM is a random graph with N nodes, partitioned into K disjoint subsets, called the communities, $C_1, C_2, \dots, C_K$ . An SBM can be characterized via its adjacency matrix consisting of 0 and 1, which is a symmetric (random) matrix $\widetilde M$ , whose (i, j)-entry $\widetilde M_{ij}$ is an independent Bernoulli random variable with parameters depending only on the communities to which the nodes i and j belong. For the clustering of an SBM, it is often useful to analyze the eigenvalues of the adjacency matrix and their associated eigenvectors, known as a spectral method.
One of the most prominent examples of spectral methods is principal component analysis (PCA), in which the behavior of the eigenvectors associated with the extremal eigenvalues is considered to obtain the community structure of the SBM. For an SBM with two communities, the expectation of its adjacency matrix $\widetilde M$ has a block structure:
In the simplest case of a balanced SBM with $P_{11} = P_{22} = p$ , $P_{12} = P_{21} = q$ $(p\neq q)$ , with the two communities of equal size, it can be easily checked that ${\mathbb{E}}[\widetilde M]$ has at most two non-zero eigenvalues, $N(p+q)/2$ and $N(p-q)/2$ . Thus, if $N(p-q)/2$ is sufficiently large, the perturbation $\widetilde M - {\mathbb{E}}[\widetilde M]$ is negligible for the two largest eigenvalues $\widetilde M$ , and it is possible to determine the community structure from the eigenvector associated with the second-largest eigenvalue of $\widetilde M$ . After subtracting $(p+q)/2$ from each entry, the shifted adjacency matrix becomes the sum of a rank-1 deterministic matrix and a random matrix with centered entries, and we can use the eigenvector associated with the largest eigenvalue of the shifted adjacency matrix for clustering. The subtraction can work effectively as $(p+q)/2$ can be estimated by the overall density of the matrix for sufficiently large N.
The sum of a deterministic matrix and a random matrix has been extensively studied in random matrix theory. When the deterministic matrix is rank-1, and the random matrix is a Wigner matrix, it is called a (rank-1) spiked Wigner matrix. The behavior of the largest eigenvalue of a spiked Wigner matrix is known to exhibit a sharp phase transition depending on the ratio between the spectral norms of the deterministic part and the random part. This type of phase transition is called the BBP transition after the seminal work of Baik, Ben Arous, and Péché [Reference Baik, Ben Arous and Péché6] for spiked (complex) Wishart matrices. From the BBP transition, we can immediately see that detection of the signal is possible via PCA when the signal-to-noise ratio (SNR) is above a certain threshold.
While the BBP transition has been proved for spiked Wigner matrices under various assumptions [Reference Benaych-Georges and Nadakuditi9, Reference Capitaine, Donati-Martin and Féral10, Reference Féral and Péché17, Reference Péché24], it is not directly applicable to the SBM, since the entries in a Wigner matrix are independent and identically distributed (up to a symmetry constraint) whereas those in the adjacency matrix of an SBM are not. The proof of the BBP transition with an SBM is substantially harder. For example, unless the SBM is balanced, the empirical spectral distribution (ESD) of $\widetilde M$ does not even converge to the semi-circle distribution, which is the limiting ESD of a Wigner matrix; the limiting ESD, in this case, is not given by a simple formula as the semi-circle distribution but by an implicit formula via its Stieltjes transform.
1.1. Main contribution
In this paper we consider a model that generalizes the SBM, called the generalized two-community stochastic block model (G2-SBM), with two communities. In this model, the mean of the matrix has the same block structure as that of the SBM in (1.1), but the entries are not necessarily Bernoulli random variables. With the structure in (1.1), we consider a model with three parameters $p_1$ , $p_2$ , and q. See Definition 2.1 for the precise definition of the G2-SBM.
For the G2-SBM, We prove the BBP-type transition for its largest eigenvalue (Theorem 2.1). The proof is based on analysis of the Stiejtes transform of the ESD, which involves the resolvent of the random part of the G2-SBM. Due to the community structure, the random part is not a Wigner matrix but a generalization of a Wigner matrix, known as a Wigner-type matrix. The local properties of eigenvalues of Wigner-type matrices are now well established by recent developments in random matrix theory; see, e.g., [Reference Ajanki, Erdős and Krüger4, Reference Ajanki, Erdős and Krüger5, Reference Dumitriu and Zhu13].
In our main result, Theorem 2.1, we only state the existence of the critical values and the limiting gap between the two largest eigenvalues but refrain from writing precise formulas for them. We instead apply our results to specific examples naturally arising in applications, the hidden community model and unbalanced stochastic block model, and present the results from numerical experiments. (In terms of the edge probability, the former corresponds to the case $P_{11} = p$ and $P_{12} = P_{21} = P_{22} = q$ , while the latter $P_{11} = P_{22} = p$ and $P_{12} = P_{21} = q$ $(p\neq q)$ , but the sizes of the submatrices are different.)
1.2. Related works
The local law for Wigner-type matrices and the behavior of quadratic vector equations, which are crucial in the analysis for Wigner-type matrices, were thoroughly investigated in [Reference Ajanki, Erdős and Krüger4, Reference Ajanki, Erdős and Krüger5]. A related result on the local law at the cusp for a Wigner-type matrix has also been proved [Reference Erdös, Krüger and Schröder15]. For more results on general Wigner-type matrices, we refer to [Reference Dumitriu and Zhu13, Reference Erdös and Mühlbacher16, Reference Zhu27] and references therein.
The phase transition of the largest eigenvalue was first proved by Baik, Ben Arous, and Péché [Reference Baik, Ben Arous and Péché6] for spiked Wishart matrices and later extended to other models, including the spiked Wigner matrix under various assumptions [Reference Benaych-Georges and Nadakuditi9, Reference Capitaine, Donati-Martin and Féral10, Reference Féral and Péché17, Reference Péché24]. If the SNR is below the threshold given by the BBP transition, the largest eigenvalue has no information on the signal and we cannot use the PCA to detect the signal. For this case, the PCA can be improved by an entrywise transformation that effectively increases the SNR [Reference Barbier, Dia, Macris, Krzakala, Lesieur and Zdeborová8, Reference Perry, Wein, Bandeira and Moitra25]. Reliable detection is impossible below a certain threshold [Reference Perry, Wein, Bandeira and Moitra25], and it is only possible to consider a weak detection, which is a hypothesis testing between the null model (without spike) and the alternative (with spike). We refer to [Reference Chung and Lee12, Reference El Alaoui, Krzakala and Jordan14, Reference Jung, Chung and Lee21] for more detail about weak detection.
The problem of recovering a hidden community from a symmetric matrix for two important cases, Bernoulli and Gaussian entries, was discussed in [Reference Hajek, Wu and Xu19]. A threshold for exact recovery in the SBM was discussed in [Reference Abbe1, Reference Abbe, Bandeira and Hall2, Reference Chen and Hero11, Reference Hajek, Wu and Xu18]. The Kesten–Stigum threshold for the SBM was considered in [Reference Abbe1, Reference Hajek, Wu and Xu20, Reference Mossel, Neeman and Sly22, Reference Mossel and Xu23]. Similarly, the information-theoretic threshold for community detection in the SBM was considered in [Reference Abbe1, Reference Abbe and Sandon3, Reference Banks, Moore, Neeman and Netrapalli7]; it can be achieved for a large number of communities. For more results and applications relating to the SBM, we refer to [Reference Stanley, Bonacci, Kwitt, Niethammer and Mucha26] and references therein.
1.3. Organization of the paper
The rest of the paper is organized as follows: In Section 2 we define the model and state the main result. In Sections 2.1 and 2.2 we introduce the hidden community model and unbalanced stochastic block model to provide results from numerical experiments around the transition threshold. In Section 3 we prove the main theorem. A summary of our results and future research directions is discussed in Section 5. Appendix A contains the definition of the Wigner-type matrices and preliminary results on this model. The detailed analysis for the specific models can be found in Section 4.
2. Main results
In this section, we precisely define the matrix model we consider in this paper and state our main theorem. We introduce a shifted, rescaled matrix for a generalized stochastic block model with two communities.
Definition 2.1. (Generalized two-community stochastic block model.) An $N \times N$ matrix M is a generalized two-community stochastic block model if $M = H + \lambda {uu}^\text{T}$ , where $\lambda \geq 0$ is a constant, $u = (u_1, u_2, \dots, u_N) \in {\mathbb{R}}^N$ with $\|u\| = 1$ , and $H=[H_{ij}]$ is an $N\times N$ real symmetric random matrix satisfying the following:
There exist $S \subset [N] \;:\!=\; \{1, 2, \dots, N \}$ and constants $\theta_1, \theta_2$ such that
\[ u_i = \begin{cases} \theta_1 & \text{if }{\ } i \in S , \\ \theta_2 & \text{if }{\ } i \notin S . \end{cases} \]We further assume that ${|S|}/{N},\,(1-({|S|}/{N})) \gt c \gt 0$ for some (N-independent) constant c.The upper diagonal entries $H_{ij}$ $(i\le j)$ are centered independent random variables such that:
there exist (N-independent) constants $\alpha_1$ and $\alpha_2$ such that
\[ {\mathbb{E}}\big[H_{ij}^2\big] = \begin{cases} \alpha_1 N^{-1} & \text{if }{\ } i, j \in S , \\ \alpha_2 N^{-1} & \text{if }{\ } i, j \notin S , \\ N^{-1} & \text{otherwise}; \end{cases} \]for any (N-independent) $D\gt 0$ , there exists a constant $C_D$ such that, for all $i \leq j$ , ${\mathbb{E}}\big[|H_{ij}|^D\big] \leq C_D N^{-{D}/{2}}.$
For an adjacency matrix $\widetilde M$ as in (1.1), if $P_{11} = p_1$ , $P_{22} = p_2$ , and $P_{12} = P_{21} = q$ , then after shifting and rescaling, we find that
(See Section 4 for more detail.)
We remark that the $H_{ij}$ are not necessarily Bernoulli random variables. The assumption of a finite moment means that the model is in the dense regime. The most typical balanced stochastic block model with two communities corresponds to the choice of parameters $|S| = N/2$ and $\alpha_1 = \alpha_2 \gt 1$ .
Our main theorem is the following result on the phase transition for the spectral gap of the G2-SBM.
Theorem 2.1. Let M be a generalized two-community stochastic block model as defined in Definition 2.1. Denote by $\lambda_1$ and $\lambda_2$ the largest and the second-largest eigenvalues of M. Assume that $\gamma \;:\!=\; N_1 / N$ is fixed. Then, there exists a constant $\lambda_\textrm{c}$ , depending only on $\theta_1$ , $\theta_2$ , $\alpha_1$ , $\alpha_2$ , and $\gamma$ , such that:
(subcritical case) if $\lambda \lt \lambda_\textrm{c}$ , then $\lambda_1 - \lambda_2 \to 0$ as $N \to \infty$ , almost surely;
(supercritical case) if $\lambda \gt \lambda_\textrm{c}$ , then $\lambda_1 - \lambda_2 \to g$ as $N \to \infty$ , almost surely, for some (N-independent) positive constant $g \equiv g(\lambda)$ .
We do not include precise formulas for the critical value $\lambda_{\textrm{c}}$ and the gap g in the statement of Theorem 2.1 for the general cases since they are lengthy but not particularly informative. In the rest of the section, we focus on two specific models and check how the main result, Theorem 2.1, applies to them. Since two models defined in Sections 2.1 and 2.2 are symmetric stochastic block models with two communities, the transition occurs above the Kesten–Stigum threshold discussed in [Reference Abbe1, Reference Mossel and Xu23].
Conjecture 2.1. (Kesten–Stigum threshold.) Let (X, G) be drawn from an $N\times N$ symmetric SBM with k communities with probability p inside the communities and q across. Define $\rm{\small{SNR}}={N(p-q)^2}/{k(p+(1-k)q)}$ . Then, for any $k\ge 2$ , it is possible to solve weak recovery effectively if and only if $\rm{\small{SNR}} \gt 1$ .
Thus, we obtain the regime $p = q + {w}/{\sqrt{N}}$ , where $w = \Theta(1)$ and $p = p_1$ in the following two models.
2.1. Hidden community model
In the hidden community model, only one of the intra-community connection probabilities is larger than the inter-community connection probability, and the other intra-connection probability coincides with the inter-community connection. The precise definition for such a model is as follows.
Definition 2.2. (Hidden community model.) Let $C\subset[n]$ such that $|C|=K$ . Let O be an $N\times N$ symmetric matrix with $O_{ii} = 0$ , where the $O_{ij}$ are independent for $1 \leq i \leq j \leq N$ and
for given probability measures P and Q.
We consider the BBP-type transition of the hidden community model with Bernoulli entries, i.e. $P=\textrm{Bernoulli}(p)$ and $Q=\textrm{Bernoulli}(q)$ with $p\neq q$ , which also corresponds to the case $\alpha_2 = 1$ or $p_2 = q$ in (2.1). It is not hard to find that the transition occurs in the regime $p_1 \;:\!=\; p = {w}/{\sqrt{N}}+q$ for some (possibly N-dependent) $w = \Theta(1)$ . After shifting and rescaling, we find that $\lambda_2 \to 2$ and
See Section 4.1 for the detail.
We perform a numerical simulation for the hidden community model. We set $N=2500$ , $\gamma = 1/4$ , and $q=0.2$ for the sparse model and $q=0.7$ for the dense model. Following the analysis in Section 4.1, we find that an outlier eigenvalue occurs if
Thus, we can show an outlier if $p\gt 0.232$ for the sparse model and $p\ge0.737$ for the dense model. In Figure 1, we compare histograms of the eigenvalues of the shifted, rescaled adjacency matrices with $p = 0.21$ and $p = 0.25$ for the sparse model, and $p = 0.71$ and $p = 0.75$ for the dense model. For each case, we show a histogram of the two largest eigenvalues, $\lambda_1$ and $\lambda_2$ , after 100 iterations. As predicted by the analysis, the outlier appears only for the cases $p = 0.25$ and $p=0.75$ . We can check the gap between $\lambda_1$ and $\lambda_2$ in Figure 1.
2.2. Unbalanced stochastic block model
We next consider the case $p_1=p_2$ or $\alpha_1=\alpha_2$ with $\gamma \neq 1/2$ to refer to an unbalanced stochastic block model. As in the hidden community model, the transition occurs in the regime $p_1 = p_2 \;:\!=\; p = {w}/{\sqrt{N}}+q$ . After shifting and rescaling, we find that $\lambda_2 \to 2$ and
See Section 4.2 for the detail. Note that the transition does not depend on $\gamma$ .
We perform a numerical simulation for the unbalanced stochastic block model. As in the hidden community model, we set $N=2500$ , $\gamma = 1/4$ , and $q=0.2$ for the sparse model and $q = 0.7$ for the dense model. An outlier eigenvalue occurs if
Thus, we can show an outlier if $p\gt 0.216$ for the sparse model and $p\ge0.719$ for the dense model. In Figure 2, we compare histograms of the eigenvalues of the shifted, rescaled adjacency matrices with $p = 0.21$ and $p = 0.23$ for the sparse model, and $p = 0.71$ and $p = 0.73$ for the dense model. Again, for each case, we show histograms of the two largest eigenvalues, $\lambda_1$ and $\lambda_2$ , after 100 iterations. As predicted by the analysis, the outlier appears only for the cases $p = 0.23$ and $p=0.73$ . We can check the gap between $\lambda_1$ and $\lambda_2$ in Figure 2.
3. Proof of Theorem 2.1
Proof of Theorem 2.1. Recall that we denote by $\lambda_1$ and $\lambda_2$ the two largest eigenvalues of M. Let $\mu_1$ and $\mu_2$ be the two largest eigenvalues of H. From Corollary 1.11 and its proof in [Reference Ajanki, Erdős and Krüger4] on Wigner-type matrices, we find that $\mu_1$ and $\mu_2$ converge to $L_+$ , the rightmost edge of the limiting ESD of H. More precisely, there exists an N-independent constant $\delta\gt 0$ such that $|\mu_1 - L_+|,\,|\mu_2 - L_+| \leq N^{-\delta}$ with overwhelming probability. (See Definition A.2 for the definition of the overwhelming probability.) By the Cauchy interlacing formula, we have the inequality $\mu_2 \leq \lambda_2 \leq \mu_1 \leq \lambda_1$ , which shows that $\lambda_2$ also converges to the rightmost edge of the limiting ESD of H.
To prove the limit of $\lambda_1 - \lambda_2$ , we first notice that $\lambda_1$ is an increasing function of $\lambda$ due to the following equation:
Choose $\varepsilon \in (0, 1)$ , which is not necessarily independent of N. (At the end of the proof, we will let $\varepsilon \to 0$ .) Since $\lambda_1 \geq \mu_1$ and $\lambda - \mu_1 \leq \lambda_1 \leq \lambda + \mu_1$ , we find that $\lambda_1 - \mu_1 \leq \varepsilon$ whenever $\lambda \lt \varepsilon$ . On the other hand, $\lambda_1 \gt \mu_1 + 1$ if $\lambda \gt 2\mu_1 + 1 \gt 2\mu_1 + \varepsilon$ . Thus, since $\lambda_1$ is a continuous function of $\lambda$ and $\lambda_2 \leq \mu_1$ , we find that there exists a unique random variable $\widetilde \lambda_{\textrm{c}} \equiv \widetilde \lambda_{\textrm{c}} (N, H, \varepsilon)$ such that
if $\lambda \lt \widetilde \lambda_{\textrm{c}}$ , then $\lambda_1 - \lambda_2 \leq \varepsilon$ ;
if $\lambda \gt \widetilde \lambda_{\textrm{c}}$ , then $\lambda_1 - \lambda_2 \geq \varepsilon$ .
We now consider $\lambda_1$ by applying the Stieltjes transform method in random matrix theory, for which we use the following definition.
Definition 3.1. (Stieltjes transform.) Let $\mu$ be a probability measure on the real line. The Stieltjes transform of $\mu$ is defined by
for $z\in \mathbb{C} \setminus \textrm{supp}(\mu)$ .
For the noise H, we consider its resolvent G(z) defined by $G(z) \;:\!=\; (H-zI)^{-1}$ for $z \in \mathbb{C} \setminus \textrm{spec}(H)$ . Note that the normalized trace $m\;:\!=\; N^{-1} \textrm{Tr}\,G$ is equal to the Stieltjes transform of the ESD of H.
Suppose that z is an eigenvalue of M such that $z \gt L_+ + \varepsilon$ . (In particular, z is not an eigenvalue of H with overwhelming probability.) By definition, $\det\big(H+\lambda {uu}^{\sf T}-zI\big)=0$ , which can be further decomposed into
Thus, if z is not an eigenvalue of H, we find that $\det\!(H-zI)\neq0$ , and hence
Since ${uu}^{\sf T}$ is a rank-1 matrix, $(H-zI)^{-1}{uu}^{\sf T}$ also has rank one. Therefore, it has only one non-zero eigenvalue, which we call $\lambda_0$ . Then, $\lambda_0$ is $-1$ , for otherwise every eigenvalue of $I+(H-zI)^{-1}\lambda {uu}^{\sf T}$ is non-zero, contradicting (3.1). Furthermore, it is also obvious that $(H-zI)^{-1}\lambda u$ is an eigenvector associated with the eigenvalue $-1$ . Thus,
which leads us to the equation
For the noise matrix H, which is a Wigner-type matrix as considered in [Reference Ajanki, Erdős and Krüger4], we have
for any z outside the support of the limiting ESD of H, where $\textbf{m}\;:\!=\; (m_1, m_2, \dots, m_N)$ is the solution to the quadratic vector equation (QVE)
for $i,j=1,2,\ldots,N$ . (See Appendix A for a precise statement of (3.3).) We remark that the uniqueness of the solution m for (3.4) is also known [Reference Ajanki, Erdős and Krüger5].
To solve (3.2) using (3.4), we need to estimate m(z) from the assumption on the community structure in Definition 2.1. From the symmetry, we have an ansatz:
Then, we can rewrite (3.4) as
which can be further simplified to
We can thus conclude that, if there exists a real $\widehat z$ that solves (3.5) under the assumption
obtained from (3.2) and (3.3), then $|\lambda_1 - \widehat z| \leq N^{-\delta}$ for some (N-independent) $\delta \gt 0$ with overwhelming probability. On the other hand, if (3.5) has no real solution under the assumption in (3.6), then $|\lambda_1 - L_+| \leq N^{-\delta}$ for some (N-independent) $\delta \gt 0$ with overwhelming probability.
We now consider the behavior of the random critical value $\widetilde \lambda_{\textrm{c}} (\varepsilon)$ via the deterministic equations (3.5) and (3.6). While it is possible to find the deterministic critical value $\lambda_{\textrm{c}}$ for the existence of a real solution $\widehat z$ for (3.5) and (3.6), we take an indirect approach as follows. First, we notice from the existence of the critical value $\widetilde \lambda_{\textrm{c}}$ and the local law (3.3) that there also exists a deterministic constant $\lambda_{\textrm{c}}(\varepsilon)$ such that, for any $\lambda \gt \lambda_{\textrm{c}}(\varepsilon)$ , there exists real $\widehat z$ that solves (3.5) and (3.6). Note that such a solution $\widehat z$ also satisfies a bound such as $\widehat z - L_+ \gt \varepsilon - N^{-\delta}$ for some $\delta$ from the definition of $\widetilde \lambda_{\textrm{c}}$ . Considering the limit $\varepsilon \to 0$ , we find that $\lambda_{\textrm{c}}$ is determined as the largest number such that when $\lambda = \lambda_{\textrm{c}}$ , $z=L_+$ solves (3.5) and (3.6).
So far, we have seen that $\lambda_1$ and $\lambda_2$ can be approximated by deterministic numbers $\widehat z$ and $L_+$ with overwhelming probability, depending on whether $\lambda$ is larger than another deterministic number $\lambda_{\textrm{c}}$ , again with overwhelming probability. In order to show that these deterministic numbers are N-independent, we notice that the vector $\| u \| = 1$ and hence there exist $\widehat \theta_1$ and $\widehat \theta_2$ , independent of N, such that $N \theta_i^2 = \widehat \theta_i^2$ for $i=1, 2$ . This, in particular, implies that (3.6) is independent of N as well. Since (3.5) is also N-independent, we can conclude that $\widehat z$ , $L_+$ , and $\lambda_{\textrm{c}}$ are N-independent. This completes the proof of Theorem 2.1.
As a remark, we explain how to find $\lambda_{\textrm{c}}$ . First, we change (3.5) to a single equation involving z and $\overline m \equiv \overline m(z) \;:\!=\; \sum_{i=1}^N m_i(z) u_i^2$ only, i.e. $f(z, \overline m)=0$ . We have that the upper edge $L_+$ of the ESD of H is the largest real number such that $f(L_+, \overline m) = 0$ has a double root when considered as an equation for $\overline m$ , which is a consequence of the square-root decay of the limiting ESD at the edge. (For more detail about the behavior of the limiting ESD of Wigner-type matrices, see [Reference Ajanki, Erdős and Krüger4, Theorem 4.1].) The condition can be technically checked easily by solving $f(L_+, \bar m) = 0$ and $({\partial}/{\partial m}) f(L_+, \bar m) = 0$ simultaneously.
4. Examples from stochastic block models
This section considers stochastic block models corresponding to the G2-SBMs with Bernoulli distribution in our setting. Suppose that $\widehat{H}=[\widehat{H}_{ij}]_{i, j=1}^N$ is an SBM such that
In what follows, we will say that the (i, j)-entry is in the diagonal block if $1\le i, j\le N_1$ or $N_1+1\le i, j\le N$ , and otherwise it is in the off-diagonal block. In the block matrix form, it can also be expressed as follows:
Our goal is to shift and rescale $\widehat H$ to convert it into a G2-SBM $M = H + \lambda {uu}^{\sf T}$ as in Definition 2.1. We first notice that the variances of the entries of $\widehat H$ are $p_1 (1-p_1)$ and $p_2 (1-p_2)$ for the diagonal block and $q(1-q)$ for the off-diagonal block. Since we assume that the variance of entry $H_{ij}$ in the off-diagonal block is $N^{-1}$ , we find that the matrix must be divided by $\sqrt{Nq(1-q)}$ . It is then immediate that
as in (2.1).
The mean matrix
is a rank-2 matrix, and thus we need to subtract from each entry a deterministic number that depends on the parameters $p_1$ , $p_2$ , and q. We continue the calculation in Sections 4.1 and 4.2 with two specific cases to obtain the G2-SBM form.
4.1. Hidden community model
Suppose that $p_1 = p$ and $p_2 = q$ . It is then easy to find that ${\mathbb{E}}[\widehat H]$ becomes a rank-1 matrix after subtracting q from each entry, i.e. if we let $E_0$ be the $N \times N$ matrix all of whose entries are q, then
Thus, we find that
Recall that $N_1 = \gamma N$ and $p = {w}/{\sqrt{N}} + q$ . Since $\lambda {uu}^{\sf T} = {\mathbb{E}}[M]$ , we get
i.e. $\theta_1 = 1/\sqrt{\gamma N}$ and $\theta_2 = 0$ . We also find that
Following the proof of Theorem 2.1 in Section 3, we solve the system of equations in (3.5),
Since $p-q = O\big(N^{-1/2}\big)$ , we consider the ansatz $m_N = m_1 + O\big(N^{-1/2}\big)$ , which shows, for $m = \gamma m_1 + (1-\gamma)m_N$ , that $1 + zm + m^2 = O\big(N^{-1/2}\big)$ . Following the analysis in the last paragraph of Section 3, we find that the upper edge $L_+ = 2 + O\big(N^{-1/2}\big)$ . By Theorem 2.1, it also implies that $\lambda_2 \to 2$ as $N \to \infty$ .
In order to determine the location of the largest eigenvalue $\lambda_1$ , we consider (4.1) under the assumption in (3.6),
We remark that the ansatz $m_N = m_1 + O\big(N^{-1/2}\big)$ can be directly checked in this case by plugging (4.2) into (4.1) and eliminating z,
whose solution is
To find the location of the largest eigenvalue, we need to check whether the assumption in (4.2) is valid. However, we can instead find the value of z by first assuming that the solution exists. Then,
At the critical $\lambda_{\textrm{c}}$ for the phase transition in Theorem 2.1, the location of the largest eigenvalue coincides with the location of the upper edge $L_+$ in the limit $N \to \infty$ , or, equivalently, ${\gamma w}/{\sqrt{q(1-q)}} = 1$ . Thus, we conclude that
4.2. Unbalanced stochastic block model
Suppose that $p_1 = p_2 = p$ . Following the strategy in Section 4.1, we let $E_1$ be the $N \times N$ matrix all of whose entries are $(p+q)/2$ . Then,
Thus, we find that
From $\lambda {uu}^{\sf T} = {\mathbb{E}}[M]$ , we get
i.e., $\theta_1 = 1/\sqrt{N}$ and $\theta_2 = -1/\sqrt{N}$ . Also,
With $\alpha_1 = \alpha_2 = {p(1-p)}/{q(1-q)}$ , we solve the system of equations in (3.5),
Again, we consider the ansatz $m_N = m_1 + O\big(N^{-1/2}\big)$ , which leads us to the result that the upper edge $L_+ = 2 + O\big(N^{-1/2}\big)$ and $\lambda_2 \to 2$ as $N \to \infty$ . The assumption in (3.6) becomes
If the solution to (4.3) exists, it would be
At the critical $\lambda_{\textrm{c}}$ , ${w}/{2\sqrt{q(1-q)}} = 1$ , and thus we conclude that
5. Conclusion and future work
We have considered the generalized stochastic block model with two communities. We showed the phase transition for the G2-SBM where the random part is a Wigner-type matrix, which extends the BBP transition. For the precise formulas, we discussed a hidden community model and an unbalanced stochastic block model with Bernoulli distribution and Gaussian distribution at the Kesten–Stigum threshold. Moreover, referring to [Reference Perry, Wein, Bandeira and Moitra25], along with suitable assumptions on the signal, our theorem can be adapted to both models with a non-Gaussian case.
We believe it is possible to prove the phase transition for the sparse matrix in which the data matrix is not necessarily symmetric, and most elements are composed of zeros. We also hope to extend our result to the G2-SBM with more than two communities.
Appendix A. Local law for Wigner-type matrices
In this section we provide a precise statement of the local law for Wigner-type matrices that was used in (3.3) in Section 3. Wigner-type matrices are defined as follows.
Definition A.1. (Wigner-type matrix.) We say an $N\times N$ random matrix $H=(H_{ij})$ is a Wigner-type matrix if the entries of H are independent real symmetric variables satisfying the following conditions:
${\mathbb{E}}(H_{ij})=0$ for all i, j.
The variance matrix $\boldsymbol{{s}}=(s_{ij})$ , where $s_{ij}={\mathbb{E}}|H_{ij}|^2$ , satisfies $(s^L)_{ij}\ge {\rho}/{N}$ and $s_{ij}\le {s_*}/{N}$ , $1\le i, j\le N$ , for finite parameters $\rho$ , $s_*$ , and L.
For the precise statement of the local law, we use the following definitions, which are frequently used in analysis involving rare events in random matrix theory.
Definition A.2. (Overwhelming probability.) An event $\Omega$ holds with overwhelming probability if, for any big enough $D \gt 0$ , ${\mathbb{P}}(\Omega)\le N^{-D}$ for any sufficient large N.
Definition A.3. (Stochastic domination.) Let $\psi^{(N)}(v)$ and $\phi^{(N)}(v)$ be non-negative random variables parametrized by elements v. Consider two families of non-negative random variables,
where $U^{(N)}$ is an N-dependent parameter set. Suppose $N_0\colon(0,\infty)^2\to\mathbb{N}$ is a given function depending on the model parameters $C_D$ , $\gamma$ , $\alpha_1$ , and $\alpha_2$ . If, for $\varepsilon\gt 0$ small enough and $D\gt 0$ big enough, we have ${\mathbb{P}}\big(\phi^{(N)}\gt N^\varepsilon\psi^{(N)}\big) \le N^{-D}$ for $N\ge N_0(\varepsilon, D)$ , then $\phi$ is stochastically dominated by $\psi$ , which is denoted by $\phi\prec\psi$ .
The following definition of a QVE and its solution uniqueness is discussed on the complex upper half plane $\mathcal{H}$ , where $\mathcal{H}=\{z\in {\mathbb{C}} \mid \textrm{Im}\,z\gt 0\}$ , in [Reference Ajanki, Erdős and Krüger5].
Definition A.4. (Quadratic vector equation.) Consider a Banach space $\mathcal{R} \;:\!=\; \{w\colon\mathscr{X}\to{\mathbb{C}}\mid\sup_{x\in\Lambda}|w_x|\lt \infty\}$ and its subset $\mathcal{R_+} \;:\!=\; \{w\in\mathcal{R}\mid\textrm{Im}\,w_x\gt 0\text{ for all }x\in\mathscr{X}\}$ , where $\mathscr{X}$ is an abstract set of labels. Let $S\colon\mathcal{R}\to\mathcal{R}$ be a non-zero bounded linear operator, and $a\in\mathcal{R}$ a real bounded function. For all $z\in\mathcal{H}$ , $-{1}/{m(z)}=z+a+S[m(z)]$ , and its solution is $m\colon\mathcal{H}\to\mathcal{R_+}$ .
Theorem A.1. (Solution of QVE (5).) Let $\textbf m\;:\!=\; (m_1, m_2, \dots, m_N)\colon\mathcal{H}\to\mathcal{H}^N$ be a function on $\mathcal{H}$ . If a matrix s satisfies the conditions in Definition A.1 , the QVE $-{1}/{m_i(z)}=z+ \sum^N_{j=1} s_{ij} m_j(z)$ for $i=1,2,\ldots,N$ and $z\in \mathcal{H}$ has a unique solution.
We are now ready to state the local law. Let $\rho$ be the density defined as
(See also [Reference Ajanki, Erdős and Krüger4, Corollary 1.3] for more detail.)
Theorem A.2. (Local law [Reference Ajanki, Erdős and Krüger4].) Let H be a Wigner-type matrix and fix an arbitrary $\gamma\in(0,1)$ . Then, uniformly for all $z=a+bi$ with $b\ge N^{\gamma-1}$ , the resolvent $G(z) = (H-zI)^{-1}$ satisfies
Furthermore, for any deterministic vector $w\in{\mathbb{C}}^N$ with $\max_i|w_i|\ge 1$ ,
The local law can be generalized to the anisotropic local law as follows.
Theorem A.3. (Anisotropic law [Reference Ajanki, Erdős and Krüger4].) Suppose the assumptions in Theorem A.2 hold. Then, uniformly for all $z=a+bi$ with $b\ge N^{\gamma-1}$ , and for any two deterministic $\ell^2$ -normalized vectors $w, v \in {\mathbb{C}}^N$ ,
Funding information
The authors were supported in part by the National Research Foundation of Korea (NRF) grant funded by the Korean government (MSIT) (No. 2019R1A5A1028324).
Competing interests
There were no competing interests to declare which arose during the preparation or publication process of this article.