Statistical Problems with Planted Structures: Information-Theoretical and Computational Limits

doi:10.1017/9781108616799.014

13 - Statistical Problems with Planted Structures: Information-Theoretical and Computational Limits

Published online by Cambridge University Press: 22 March 2021

Yihong Wu and

Jiaming Xu

Edited by

Miguel R. D. Rodrigues and

Yonina C. Eldar

Show author details

Miguel R. D. Rodrigues: Affiliation:
University College London
Yonina C. Eldar: Affiliation:
Weizmann Institute of Science, Israel

Book contents

Get access

Summary

This chapter provides a survey of the common techniques for determining the sharp statistical and computational limits in high-dimensional statistical problems with planted structures, using community detection and submatrix detection problems as illustrative examples. We discuss tools including the first- and second-moment methods for analyzing the maximum-likelihood estimator, information-theoretic methods for proving impossibility results using mutual information and rate-distortion theory, and methods originating from statistical physics such as the interpolation method. To investigate computational limits, we describe a common recipe to construct a randomized polynomial-time reduction scheme that approximately maps instances of the planted clique problem to the problem of interest in total variation distance.

Keywords

statistical limits computational limits statistical computational trade-offs phase transition community detection community recovery submatrix detection submatrix recovery

Type: Chapter
Information: Information-Theoretic Methods in Data Science , pp. 383 - 424

DOI: https://doi.org/10.1017/9781108616799.014 [Opens in a new window]

Publisher: Cambridge University Press

Print publication year: 2021

Access options

Get access to the full version of this content by using one of the access options below. (Log in options will check for institutional or personal access. Content may require purchase if you do not have access.)

Book purchase

Temporarily unavailable

References

Lehmann, E. L. and Casella, G., Theory of point estimation, 2nd edn. Springer, 1998.Google Scholar

Brown, L. D. and Low, M. G., “Information inequality bounds on the minimax risk (with an application to nonparametric regression),” Annals Statist., vol. 19, no. 1, pp. 329–337, 1991.Google Scholar

Van, A. W. der Vaart, Asymptotic statistics. Cambridge University Press, 2000.Google Scholar

Le Cam, L., Asymptotic methods in statistical decision theory. Springer, 1986.Google Scholar

Ibragimov, I. A. and Khas’minskĭ, R. Z., Statistical estimation: Asymptotic theory. Springer, 1981.Google Scholar

Birgé, L., “Approximation dans les espaces métriques et théorie de l’estimation,” Z. Wahrscheinlichkeitstheorie verwandte Gebiete, vol. 65, no. 2, pp. 181–237, 1983.Google Scholar

Yang, Y. and Barron, A. R., “Information-theoretic determination of minimax rates of convergence,” Annals Statist., vol. 27, no. 5, pp. 1564–1599, 1999.Google Scholar

Berthet, Q. and Rigollet, P., “Complexity theoretic lower bounds for sparse principal component detection,” Journal of Machine Learning Research: Workshop and Conference Proceedings, vol. 30, pp. 1046–1066, 2013.Google Scholar

Ma, Z. and Wu, Y., “Computational barriers in minimax submatrix detection,” Annals Statist., vol. 43, no. 3, pp. 1089–1116, 2015.Google Scholar

Hajek, B., Wu, Y., and Xu, J., “Computational lower bounds for community detection on random graphs,” in Proc. COLT 2015, 2015, pp. 899–928.Google Scholar

Wang, T., Berthet, Q., and Samworth, R. J., “Statistical and computational trade-offs in estimation of sparse principal components,” Annals Statist., vol. 44, no. 5, pp. 1896–1930, 2016.CrossRef Google Scholar

Gao, C., Ma, Z., and Zhou, H. H., “Sparse CCA: Adaptive estimation and computational barriers,” Annals Statist., vol. 45, no. 5, pp. 2074–2101, 2017.Google Scholar

Brennan, M., Bresler, G., and Huleihel, W., “Reducibility and computational lower bounds for problems with planted sparse structure,” in Proc. COLT 2018, 2018, pp. 48–166.Google Scholar

McSherry, F., “Spectral partitioning of random graphs,” in 42nd IEEE Symposium on Foundations of Computer Science, 2001, pp. 529–537.Google Scholar

Arias-Castro, E. and Verzelen, N., “Community detection in dense random networks,” Annals Statist., vol. 42, no. 3, pp. 940–969, 2014.Google Scholar

Chen, Y. and Xu, J., “Statistical –computational tradeoffs in planted problems and submatrix localization with a growing number of clusters and submatrices,” in Proc. ICML 2014, 2014, arXiv:1402.1267.Google Scholar

Montanari, A., “Finding one community in a sparse random graph,” J. Statist. Phys., vol. 161, no. 2, pp. 273–299, arXiv:1502.05680, 2015.Google Scholar

Holland, P. W., Laskey, K. B., and Leinhardt, S., “Stochastic blockmodels: First steps,” Social Networks, vol. 5, no. 2, pp. 109–137, 1983.Google Scholar

Shabalin, A. A., Weigman, V. J., Perou, C. M., and Nobel, A. B., “Finding large average submatrices in high dimensional data,” Annals Appl. Statist., vol. 3, no. 3, pp. 985–1012, 2009.Google Scholar

Kolar, M., Balakrishnan, S., Rinaldo, A., and Singh, A., “Minimax localization of structural information in large noisy matrices,” in Advances in Neural Information Processing Systems, 2011.Google Scholar

Butucea, C. and Ingster, Y. I., “Detection of a sparse submatrix of a high-dimensional noisy matrix,” Bernoulli, vol. 19, no. 5B, pp. 2652–2688, 2013.Google Scholar

Butucea, C., Ingster, Y., and Suslina, I., “Sharp variable selection of a sparse submatrix in a high-dimensional noisy matrix,” ESAIM: Probability and Statistics, vol. 19, pp. 115–134, 2015.Google Scholar

Cai, T. T., Liang, T., and Rakhlin, A., “Computational and statistical boundaries for submatrix localization in a large noisy matrix,” Annals Statist., vol. 45, no. 4, pp. 1403–1430, 2017.Google Scholar

Perry, A., Wein, A. S., and Bandeira, A. S., “Statistical limits of spiked tensor models,” arXiv:1612.07728, 2016.Google Scholar

Alaoui, A. E., Krzakala, F., and Jordan, M. I., “Finite size corrections and likelihood ratio fluctuations in the spiked Wigner model,” arXiv:1710.02903, 2017.Google Scholar

Mossel, E., Neeman, J., and Sly, A., “A proof of the block model threshold conjecture,” Combinatorica, vol. 38, no. 3, pp. 665–708, 2013.Google Scholar

Baik, J., Ben Arous, G., and Péché, S., “Phase transition of the largest eigenvalue for nonnull complex sample covariance matrices,” Annals Probability, vol. 33, no. 5, pp. 1643–1697, 2005.Google Scholar

Péché, S., “The largest eigenvalue of small rank perturbations of hermitian random matrices,” Probability Theory Related Fields, vol. 134, no. 1, pp. 127–173, 2006.Google Scholar

Benaych-Georges, F. and Nadakuditi, R. R., “The eigenvalues and eigenvectors of finite, low rank perturbations of large random matrices,” Adv. Math., vol. 227, no. 1, pp. 494–521, 2011.Google Scholar

Krzakala, F., Moore, C., Mossel, E., Neeman, J., Sly, A., Zdeborová, L., and Zhang, P., “Spectral redemption in clustering sparse networks,” Proc. Natl. Acad. Sci. USA., vol. 110, no. 52, pp. 20 935–20 940, 2013.CrossRef Google Scholar PubMed

Massoulié, L., “Community detection thresholds and the weak Ramanujan property,” in Proceedings of the Forty-Sixth Annual ACM Symposium on Theory of Computing, 2014, pp. 694–703, arXiv:1109.3318.CrossRef Google Scholar

Bordenave, C., Lelarge, M., and Massoulié, L., “Non -backtracking spectrum of random graphs: Community detection and non-regular Ramanujan graphs,” in 2015 IEEE 56th Annual Symposium on Foundations of Computer Science (FOCS), 2015, pp. 1347–1357, arXiv: 1501.06087.Google Scholar

Banks, J., Moore, C., Vershynin, R., Verzelen, N., and Xu, J., “Information-theoretic bounds and phase transitions in clustering, sparse PCA, and submatrix localization,” IEEE Trans. Information Theory, vol. 64, no. 7, pp. 4872–4894, 2018.Google Scholar

Ingster, Y. I. and Suslina, I. A., Nonparametric goodness-of-fit testing under Gaussian models. Springer, 2003.Google Scholar

Wu, Y., “Lecture notes on information-theoretic methods for high-dimensional statistics,” 2017, www.stat.yale.edu/∼yw562/teaching/598/it-stats.pdf.Google Scholar

Vajda, I., “On metric divergences of probability measures,” Kybernetika, vol. 45, no. 6, pp. 885–900, 2009.Google Scholar

Feller, W., An introduction to probability theory and its applications, 3rd edn. Wiley, 1970, vol. I.Google Scholar

Kozakiewicz, W., “On the convergence of sequences of moment generating functions,” Annals Math. Statist., vol. 18, no. 1, pp. 61–69, 1947.Google Scholar

Mossel, E., Neeman, J., and Sly, A., “Reconstruction and estimation in the planted partition model,” Probability Theory Related Fields, vol. 162, nos. 3–4, pp. 431–461, 2015.Google Scholar

Polyanskiy, Y. and Wu, Y., “Application of information-percolation method to reconstruction problems on graphs,” arXiv:1804.05436, 2018.Google Scholar

Abbe, E. and Boix, E., “An information-percolation bound for spin synchronization on general graphs,” arXiv:1806.03227, 2018.Google Scholar

Banks, J., Moore, C., Neeman, J., and Netrapalli, P., “Information -theoretic thresholds for community detection in sparse networks,” in Proc. 29th Conference on Learning Theory, COLT 2016, 2016, pp. 383–416.Google Scholar

Guo, D., Shamai, S., and Verdú, S., “Mutual information and minimum mean-square error in Gaussian channels,” IEEE Trans. Information Theory, vol. 51, no. 4, pp. 1261–1282, 2005.Google Scholar

Deshpande, Y. and Montanari, A., “Information -theoretically optimal sparse PCA,” in IEEE International Symposium on Information Theory, 2014, pp. 2197–2201.Google Scholar

Deshpande, Y., Abbe, E., and Montanari, A., “Asymptotic mutual information for the twogroups stochastic block model,” arXiv:1507.08685, 2015.Google Scholar

Krzakala, F., Xu, J., and Zdeborová, L., “Mutual information in rank-one matrix estimation,” in 2016 IEEE Information Theory Workshop (ITW), 2016, pp. 71–75, arXiv: 1603.08447.Google Scholar

Csiszár, I., “Information-type measures of difference of probability distributions and indirect observations,” Studia Scientiarum Mathematicarum Hungarica, vol. 2, pp. 299–318, 1967.Google Scholar

Perry, A., Wein, A. S., Bandeira, A. S., and Moitra, A., “Optimality and sub-optimality of PCA for spiked random matrices and synchronization,” arXiv:1609.05573, 2016.Google Scholar

Barbier, J., Dia, M., Macris, N., Krzakala, F., Lesieur, T., and Zdeborová, L., “Mutual information for symmetric rank-one matrix estimation: A proof of the replica formula,” in Advances in Neural Information Processing Systems, 2016, pp. 424–432, arXiv: 1606.04142.Google Scholar

Lelarge, M. and Miolane, L., “Fundamental limits of symmetric low-rank matrix estimation,” in Proceedings of the 2017 Conference on Learning Theory, 2017, pp. 1297–1301.Google Scholar

Hajek, B., Wu, Y., and Xu, J., “Information limits for recovering a hidden community,” IEEE Trans. Information Theory, vol. 63, no. 8, pp. 4729–4745, 2017.Google Scholar

Polyanskiy, Y. and Wu, Y., “Lecture notes on information theory,” 2015, http://people.lids.mit.edu/yp/homepage/data/itlectures_v4.pdf.Google Scholar

Tsybakov, A. B., Introduction to nonparametric estimation. Springer, 2009.Google Scholar

Hajek, B., Wu, Y., and Xu, J., “Recovering a hidden community beyond the Kesten–Stigum threshold in O(|E|log*|V|) time,” J. Appl. Probability, vol 55, no. 2, pp. 325–352, 2018.Google Scholar

Hajek, B., Wu, Y., and Xu, J. “Submatrix localization via message passing,” J. Machine Learning Res., vol. 18, no. 186, pp. 1–52, 2018.Google Scholar

Hajek, B., Wu, Y., and J. Xu “Semidefinite programs for exact recovery of a hidden community,” in Proc. Conference on Learning Theory (COLT), 2016, pp. 1051–1095, arXiv:1602.06410.Google Scholar

Yun, S.Y. and Proutiere, A., “Community detection via random and adaptive sampling,” in Proc. 27th Conference on Learning Theory, 2014, pp. 138–175.Google Scholar

Mossel, E., Neeman, J., and Sly, A., “Consistency thresholds for the planted bisection model,” in Proc. Forty-Seventh Annual ACM on Symposium on Theory of Computing, 2015, pp. 69–75.Google Scholar

Abbe, E., Bandeira, A. S., and Hall, G., “Exact recovery in the stochastic block model,” IEEE Trans. Information Theory, vol. 62, no. 1, pp. 471–487, 2016.Google Scholar

Jog, V. and Loh, P.-L., “Information-theoretic bounds for exact recovery in weighted stochastic block models using the Rényi divergence,” arXiv:1509.06418, 2015.Google Scholar

Abbe, E. and Sandon, C., “Community detection in general stochastic block models: Fundamental limits and efficient recovery algorithms,” in 2015 IEEE 56th Annual Symposium on Foundations of Computer Science (FOCS), 2015, pp. 670–688, arXiv:1503.00609.Google Scholar

Alon, N., Krivelevich, M., and Sudakov, B., “Finding a large hidden clique in a random graph,” Random Structures and Algorithms, vol. 13, nos. 3–4, pp. 457–466, 1998.Google Scholar

Hazan, E. and Krauthgamer, R., “How hard is it to approximate the best Nash equilibrium?,” SIAM J. Computing, vol. 40, no. 1, pp. 79–91, 2011.Google Scholar

Alon, N., Andoni, A., Kaufman, T., Matulef, K., Rubinfeld, R., and Xie, N., “Testing kwise and almost k-wise independence,” in Proc. Thirty-Ninth Annual ACM Symposium on Theory of Computing, 2007, pp. 496–505.Google Scholar

Koiran, P. and Zouzias, A., “On the certification of the restricted isometry property,” arXiv:1103.4984, 2011.Google Scholar

Kučera, L., “A generalized encryption scheme based on random graphs,” in Graph-Theoretic Concepts in Computer Science, 1992, pp. 180–186.Google Scholar

Juels, A. and Peinado, M., “Hiding cliques for cryptographic security,” Designs, Codes and Cryptography, vol. 20, no. 3, pp. 269–280, 2000.Google Scholar

Applebaum, B., Barak, B., and Wigderson, A., “Public -key cryptography from different assumptions,” in Proc. 42nd ACM Symposium on Theory of Computing, 2010, pp. 171–180.Google Scholar

Rossman, B., “Average -case complexity of detecting cliques,” Ph.D. dissertation, Massachusetts Institute of Technology, 2010.Google Scholar

Feldman, V., Grigorescu, E., Reyzin, L., Vempala, S., and Xiao, Y., “Statistical algorithms and a lower bound for detecting planted cliques,” in Proc. 45th Annual ACM Symposium on Theory of Computing, 2013, pp. 655–664.Google Scholar

Meka, R., Potechin, A., and Wigderson, A., “Sum -of-squares lower bounds for planted clique,” in Proc. Forty-Seventh Annual ACM Symposium on Theory of Computing, 2015, pp. 87–96.Google Scholar

Deshpande, Y. and Montanari, A., “Improved sum-of-squares lower bounds for hidden clique and hidden submatrix problems,” in Proc. COLT 2015, 2015, pp. 523–562.Google Scholar

Barak, B., Hopkins, S. B., Kelner, J. A., Kothari, P., Moitra, A., and Potechin, A., “A nearly tight sum-of-squares lower bound for the planted clique problem,” in IEEE 57th Annual Symposium on Foundations of Computer Science, FOCS, 2016, pp. 428–437.Google Scholar

Feige, U. and Krauthgamer, R., “Finding and certifying a large hidden clique in a semirandom graph,” Random Structures and Algorithms, vol. 16, no. 2, pp. 195–208, 2000.Google Scholar

Feige, U. and Ron, D., “Finding hidden cliques in linear time,” in Proc. DMTCS, 2010, pp. 189–204.Google Scholar

Dekel, Y., Gurel-Gurevich, O., and Peres, Y., “Finding hidden cliques in linear time with high probability,” Combinatorics, Probability and Computing, vol. 23, no. 01, pp. 29–49, 2014.Google Scholar

Ames, B. P. and Vavasis, S. A., “Nuclear norm minimization for the planted clique and biclique problems,” Mathematical Programming, vol. 129, no. 1, pp. 69–89, 2011.Google Scholar

Deshpande, Y. and Montanari, A., “Finding hidden cliques of size √N/e in nearly linear time,” Foundations Comput. Math., vol. 15, no. 4, pp. 1069–1128, 2015.CrossRef Google Scholar

Jerrum, M., “Large cliques elude the Metropolis process,” Random Structures and Algorithms, vol. 3, no. 4, pp. 347–359, 1992.Google Scholar

Verzelen, N. and Arias-Castro, E., “Community detection in sparse random networks,” Annals Appl. Probability, vol. 25, no. 6, pp. 3465–3510, 2015.Google Scholar

Alon, N., Arora, S., Manokaran, R., Moshkovitz, D., and Weinstein, O.., “Inapproximability of densest k-subgraph from average case hardness,,” 2011, www.nada.kth.se/∼rajsekar/papers/dks.pdf.Google Scholar

Hopkins, S. B., Kothari, P. K., and Potechin, A., “SoS and planted clique: Tight analysis of MPW moments at all degrees and an optimal lower bound at degree four,” arXiv: 1507.05230, 2015.Google Scholar

Raghavendra, P. and Schramm, T., “Tight lower bounds for planted clique in the degree-4 SOS program,” arXiv:1507.05136, 2015.Google Scholar

Ames, B., “Robust convex relaxation for the planted clique and densest k-subgraph problems,” arXiv:1305.4891, 2013.Google Scholar

Lovász, L., Large networks and graph limits. American Mathematical Society, 2012.Google Scholar

Gao, C., Lu, Y., and Zhou, H. H., “Rate-optimal graphon estimation,” Annals Statist., vol. 43, no. 6, pp. 2624–2652, 2015.Google Scholar

Klopp, O. and Verzelen, N., “Optimal graphon estimation in cut distance,” arXiv:1703.05101, 2017.Google Scholar

Xu, J., “Rates of convergence of spectral methods for graphon estimation,” in Proc. 35th International Conference on Machine Learning, 2018, arXiv:1709.03183.Google Scholar

Lesieur, T., Krzakala, F., and Zdeborová, L., “Phase transitions in sparse PCA,” in IEEE International Symposium on Information Theory, 2015, pp. 1635–1639.Google Scholar

Alaoui, A. E. and Krzakala, F., “Estimation in the spiked Wigner model: A short proof of the replica formula,” arXiv:1801.01593, 2018.Google Scholar

Montanari, A., Reichman, D., and Zeitouni, O., “On the limitation of spectral methods: From the Gaussian hidden clique problem to rank one perturbations of Gaussian tensors,” in Advances in Neural Information Processing Systems, 2015, pp. 217–225, arXiv: 1411.6149.Google Scholar

Hillar, C. J. and Lim, L.-H., “Most tensor problems are NP-hard,” J. ACM, vol. 60, no. 6, pp. 45:1–45:39, 2013.Google Scholar

Montanari, A. and Richard, E., “A statistical model for tensor PCA,” in Proc. 27th International Conference on Neural Information Processing Systems, 2014, pp. 2897–2905.Google Scholar

Lesieur, T., Miolane, L., Lelarge, M., Krzakala, F., and Zdeborová, L., “Statistical and computational phase transitions in spiked tensor estimation,” arXiv:1701.08010, 2017.Google Scholar

Hopkins, S. B., Shi, J., and Steurer, D., “Tensor principal component analysis via sum-ofsquare proofs,” in COLT, 2015, pp. 956–1006.Google Scholar

Zhang, A. and Xia, D., “Tensor SVD: Statistical and computational limits,” arXiv:1703.02724, 2017.Google Scholar

Paul, D., “Asymptotics of sample eigenstruture for a large dimensional spiked covariance model,” Statistica Sinica, vol. 17, no. 4, pp. 1617–1642, 2007.Google Scholar

Lesieur, T., Bacco, C. D., Banks, J., Krzakala, F., Moore, C., and Zdeborová, L., “Phase transitions and optimal algorithms in high-dimensional Gaussian mixture clustering,” arXiv:1610.02918, 2016.Google Scholar

Csiszár, I., “Almost independence and secrecy capacity,” Problemy peredachi informatsii, vol. 32, no. 1, pp. 48–57, 1996.Google Scholar

Polyanskiy, Y. and Wu, Y., “Dissipation of information in channels with input constraints,” IEEE Trans. Information Theory, vol. 62, no. 1, pp. 35–55, 2016.Google Scholar