Introduction to Information Theory and Data Science.

doi:10.1017/9781108616799.002

1 - Introduction to Information Theory and Data Science.

Published online by Cambridge University Press: 22 March 2021

Miguel R. D. Rodrigues, Stark C. Draper, Waheed U. Bajwa and Yonina C. Eldar

Edited by Miguel R. D. Rodrigues and Yonina C. Eldar

Miguel R. D. Rodrigues: University College London
Yonina C. Eldar: Weizmann Institute of Science, Israel

Summary

The purpose of this chapter is to set the stage for the book and for the upcoming chapters. We first overview classical information-theoretic problems and solutions. We then discuss emerging applications of information-theoretic methods in various data-science problems and, where applicable, refer the reader to related chapters in the book. Throughout this chapter, we highlight the perspectives, tools, and methods that play important roles in classic information-theoretic paradigms and in emerging areas of data science. Table 1.1 provides a summary of the different topics covered in this chapter and highlights the different chapters that can be read as a follow-up to these topics.

Keywords

information theory, source coding, channel coding, communication, compression, data representation, data acquisition, data analysis, data processing, statistics, machine learning

Type: Chapter
Information: Information-Theoretic Methods in Data Science, pp. 1–43

DOI: https://doi.org/10.1017/9781108616799.002

Publisher: Cambridge University Press

Print publication year: 2021
