
5 - Sample Complexity Bounds for Dictionary Learning from Vector- and Tensor-Valued Data

Published online by Cambridge University Press: 22 March 2021

Miguel R. D. Rodrigues, University College London
Yonina C. Eldar, Weizmann Institute of Science, Israel

Summary

Dictionary learning has emerged as a powerful method for the data-driven extraction of features from data. The initial focus of the literature was algorithmic, but there has recently been increasing interest in the theoretical underpinnings of dictionary learning. These rely on information-theoretic analytic tools and help us understand the fundamental limitations of dictionary-learning algorithms. We focus on theoretical aspects and summarize results on dictionary learning from vector- and tensor-valued data. The results are stated in terms of lower and upper bounds on the sample complexity of dictionary learning, defined as the number of samples needed to identify or reconstruct the true dictionary underlying the data from noiseless or noisy samples, respectively. Many of the analytic tools that yield these results come from information theory, including a restatement of the dictionary-learning problem as a channel-coding problem and a connection between the minimax risk in statistical estimation and Fano's inequality. In addition to highlighting the effects of different parameters on the sample complexity of dictionary learning, we show the potential advantages of dictionary learning from tensor data and present open problems.
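To make the channel-coding connection concrete, the following is a minimal sketch of the standard Fano-based minimax argument in generic notation; the symbols used here (the dictionary $\mathbf{D}$, packing radius $\delta$, packing size $L$) are illustrative assumptions rather than the chapter's exact theorem statements. One observes $N$ samples generated by a fixed but unknown dictionary,
\[
  \mathbf{y}_n = \mathbf{D}\,\mathbf{x}_n + \mathbf{w}_n, \qquad n = 1, \dots, N,
\]
where $\mathbf{D} \in \mathbb{R}^{m \times p}$ has unit-norm columns, $\mathbf{x}_n$ is a sparse coefficient vector, and $\mathbf{w}_n$ is noise. The minimax risk over a class $\mathcal{D}$ of dictionaries is
\[
  \varepsilon^{*} \;=\; \inf_{\widehat{\mathbf{D}}} \, \sup_{\mathbf{D} \in \mathcal{D}} \, \mathbb{E}\bigl[\,\|\widehat{\mathbf{D}}(\mathbf{Y}) - \mathbf{D}\|_F^2\,\bigr],
\]
where $\mathbf{Y} = (\mathbf{y}_1, \dots, \mathbf{y}_N)$ collects the samples. Restricting attention to a $2\delta$-separated packing $\{\mathbf{D}_1, \dots, \mathbf{D}_L\} \subset \mathcal{D}$ turns estimation into the problem of decoding which of $L$ "codewords" generated the observations; this is the channel-coding restatement. If $T$ is uniform on $\{1, \dots, L\}$ and the data are generated by $\mathbf{D}_T$, Fano's inequality yields
\[
  \varepsilon^{*} \;\ge\; \delta^2 \left( 1 - \frac{I(\mathbf{Y}; T) + \log 2}{\log L} \right),
\]
so any upper bound on the mutual information $I(\mathbf{Y}; T)$ that grows with $N$ translates into a lower bound on the number of samples needed to drive the risk below $\delta^2$.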

Type: Chapter
Publisher: Cambridge University Press
Print publication year: 2021

