
7 - Understanding Phase Transitions via Mutual Information and MMSE

Published online by Cambridge University Press: 22 March 2021

Miguel R. D. Rodrigues, University College London
Yonina C. Eldar, Weizmann Institute of Science, Israel

Summary

The ability to understand and solve high-dimensional inference problems is essential for modern data science. This chapter examines high-dimensional inference problems through the lens of information theory and focuses on the standard linear model as a canonical example that is both rich enough to be practically useful and simple enough to be studied rigorously. In particular, this model can exhibit phase transitions where an arbitrarily small change in the model parameters can induce large changes in the quality of estimates. For this model, the performance of optimal inference can be studied using the replica method from statistical physics but, until recently, it was not known whether the resulting formulas were actually correct. In this chapter, we present a tutorial description of the standard linear model and its connection to information theory. We also describe the replica prediction for this model and outline the authors’ recent proof that it is exact.
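
To make the summary concrete: the standard linear model observes y = Ax + w, where the signal x is drawn from a known prior, A is a (typically random) measurement matrix, and w is additive Gaussian noise; the two quantities the chapter tracks are the mutual information I(x; y) and the minimum mean-squared error (MMSE) of estimating x from y. Below is a minimal numerical sketch, in Python, of the scalar special case Y = sqrt(snr) X + N with a Bernoulli-Gaussian (sparse) prior, illustrating the I-MMSE identity of Guo, Shamai, and Verdú, dI/dsnr = mmse(snr)/2: integrating a Monte Carlo estimate of the MMSE curve recovers the mutual information. The sparsity level, SNR grid, and sample count are illustrative assumptions, not values taken from the chapter.

import numpy as np

# Scalar toy model Y = sqrt(snr) * X + N with X = B * G,
# B ~ Bernoulli(p), G ~ N(0, 1), N ~ N(0, 1). This is a sketch of the
# I-MMSE relationship only; the chapter treats the vector/matrix case.
rng = np.random.default_rng(0)
p = 0.1                # P(X != 0): assumed sparsity level
num_samples = 200_000  # Monte Carlo sample count (illustrative)

def gauss_pdf(y, var):
    return np.exp(-0.5 * y ** 2 / var) / np.sqrt(2.0 * np.pi * var)

def mmse(snr):
    # Monte Carlo estimate of E[(X - E[X | Y])^2] at a given SNR.
    b = rng.random(num_samples) < p            # support indicator B
    x = b * rng.standard_normal(num_samples)   # X = B * G
    y = np.sqrt(snr) * x + rng.standard_normal(num_samples)
    # For this prior the posterior mean is available in closed form:
    # E[X | y] = P(B = 1 | y) * sqrt(snr) * y / (1 + snr).
    like1 = p * gauss_pdf(y, 1.0 + snr)        # density of Y given B = 1
    like0 = (1.0 - p) * gauss_pdf(y, 1.0)      # density of Y given B = 0
    post_mean = like1 / (like1 + like0) * np.sqrt(snr) * y / (1.0 + snr)
    return np.mean((x - post_mean) ** 2)

# I-MMSE identity: dI/dsnr = mmse(snr) / 2 (in nats), so the mutual
# information is half the integral of the MMSE curve over SNR.
snrs = np.linspace(0.0, 20.0, 81)
mmse_curve = np.array([mmse(s) for s in snrs])
increments = 0.5 * (mmse_curve[:-1] + mmse_curve[1:]) * np.diff(snrs)  # trapezoid rule
mutual_info = np.concatenate(([0.0], 0.5 * np.cumsum(increments)))
for s, m, i in zip(snrs[::20], mmse_curve[::20], mutual_info[::20]):
    print(f"snr = {s:5.1f}   mmse = {m:.4f}   I(X;Y) = {i:.4f} nats")

In the vector model treated in the chapter, the replica method predicts single-letter formulas for these same two curves, and a phase transition shows up as a jump in the MMSE curve as the SNR or measurement rate crosses a critical value.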

Type: Chapter
Publisher: Cambridge University Press
Print publication year: 2021


