Book contents
- Frontmatter
- Contents
- Contributors
- Preface
- 1 The Modern Mathematics of Deep Learning
- 2 Generalization in Deep Learning
- 3 Expressivity of Deep Neural Networks
- 4 Optimization Landscape of Neural Networks
- 5 Explaining the Decisions of Convolutional and Recurrent Neural Networks
- 6 Stochastic Feedforward Neural Networks: Universal Approximation
- 7 Deep Learning as Sparsity-Enforcing Algorithms
- 8 The Scattering Transform
- 9 Deep Generative Models and Inverse Problems
- 10 Dynamical Systems and Optimal Control Approach to Deep Learning
- 11 Bridging Many-Body Quantum Physics and Deep Learning via Tensor Networks
4 - Optimization Landscape of Neural Networks
Published online by Cambridge University Press: 29 November 2022
Summary
This chapter summarizes recent advances in the analysis of the optimization landscape of neural network training. We first review classical results for linear networks trained with a squared loss and without regularization. Such results show that, under certain conditions on the input-output data, spurious local minima are guaranteed not to exist, i.e., critical points are either saddle points or global minima. Moreover, the globally optimal weights can be found by factorizing certain matrices obtained from the input-output covariance matrices. We then review recent results for deep networks with a parallel structure, positively homogeneous network mapping and regularization, trained with a convex loss. Such results show that the non-convex objective on the weights can be lower-bounded by a convex objective on the network mapping. Moreover, when the network is sufficiently wide, local minima of the non-convex objective that satisfy a certain condition yield global minima of both the non-convex and convex objectives, and there is always a non-increasing path to a global minimizer from any initialization.
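To make the classical linear-network statement concrete, the sketch below writes out the one-hidden-layer case in illustrative notation (the symbols X, Y, W_1, W_2, Sigma_XX, Sigma_YX and the width r are chosen here and need not match the chapter's); it records the covariance matrices referred to in the summary and the closed form of the globally optimal end-to-end map.

```latex
% Minimal sketch of the one-hidden-layer linear-network setting
% (notation chosen here for illustration; the chapter's may differ).
% Squared loss over data X \in \mathbb{R}^{d\times n}, Y \in \mathbb{R}^{m\times n},
% hidden width r:
\min_{W_1 \in \mathbb{R}^{r\times d},\; W_2 \in \mathbb{R}^{m\times r}}
\;\ell(W_1,W_2) \;=\; \bigl\| Y - W_2 W_1 X \bigr\|_F^2 .

% Matrices built from the input-output covariances:
\Sigma_{XX} = XX^{\top}, \qquad
\Sigma_{YX} = YX^{\top}, \qquad
\Sigma \;=\; \Sigma_{YX}\,\Sigma_{XX}^{-1}\,\Sigma_{YX}^{\top}.

% If \Sigma_{XX} is invertible and \Sigma has distinct eigenvalues, then every
% local minimum of \ell is global, all other critical points are saddle points,
% and, with U_r the matrix of top-r eigenvectors of \Sigma, the optimal
% end-to-end map is
W_2 W_1 \;=\; U_r U_r^{\top}\,\Sigma_{YX}\,\Sigma_{XX}^{-1}.
```

In this setting the factorization mentioned in the summary amounts to an eigendecomposition of Sigma, from which the optimal product W_2 W_1 follows in closed form.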
Type: Chapter
Information: Mathematical Aspects of Deep Learning, pp. 200-228
Publisher: Cambridge University Press
Print publication year: 2022