Generalized Normalizing Flows via Markov Chains

Paul Lyonel Hagemann; Johannes Hertrich; Gabriele Steidl

doi:10.1017/9781009331012

Series: Elements in Non-local Data Interactions: Foundations and Applications

Published online by Cambridge University Press: 19 January 2023

Affiliations:
Paul Lyonel Hagemann: Technische Universität Berlin
Johannes Hertrich: Technische Universität Berlin
Gabriele Steidl: Technische Universität Berlin

Summary

Normalizing flows, diffusion normalizing flows and variational autoencoders are powerful generative models. This Element provides a unified framework for handling these approaches via Markov chains. The authors consider stochastic normalizing flows as a pair of Markov chains fulfilling certain properties, and show how many state-of-the-art models for data generation fit into this framework. Numerical simulations show that including stochastic layers improves the expressivity of the network and allows multimodal distributions to be generated from unimodal ones. The Markov chain point of view makes it possible to couple deterministic layers, such as invertible neural networks, with stochastic layers, such as Metropolis-Hastings layers, Langevin layers, variational autoencoders and diffusion normalizing flows, in a mathematically sound way. The authors' framework thus provides a useful mathematical tool for combining the various approaches.
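The idea that a generative model can be written as a chain of Markov kernels, mixing deterministic and stochastic layers, can be illustrated with a small NumPy sketch. It is not the authors' implementation and involves no training: a fixed affine map stands in for an invertible (deterministic, Dirac-kernel) layer, and a Metropolis-adjusted Langevin (MALA) layer acts as the stochastic layer. The bimodal target (an equal-weight mixture of N(-2, 0.5²) and N(2, 0.5²)), the step size and the function names `affine_layer`, `mala_layer` and `sample_flow` are hypothetical choices made only for illustration.

```python
import numpy as np

# Hypothetical bimodal target: equal-weight mixture of N(-MU, SIGMA^2) and N(MU, SIGMA^2).
MU, SIGMA = 2.0, 0.5


def log_target(x):
    """Unnormalized log-density of the assumed two-component Gaussian mixture."""
    a = -((x - MU) ** 2) / (2 * SIGMA**2)
    b = -((x + MU) ** 2) / (2 * SIGMA**2)
    return np.logaddexp(a, b)


def grad_log_target(x):
    """Gradient of log_target, written via the mixture responsibilities."""
    a = -((x - MU) ** 2) / (2 * SIGMA**2)
    b = -((x + MU) ** 2) / (2 * SIGMA**2)
    w = np.exp(a - np.logaddexp(a, b))  # weight of the +MU component
    return (w * (MU - x) + (1 - w) * (-MU - x)) / SIGMA**2


def affine_layer(x, scale=2.0, shift=0.0):
    """Deterministic, invertible layer: a fixed affine map (a Dirac Markov kernel)."""
    return scale * x + shift


def mala_layer(x, rng, step=0.05, n_steps=200):
    """Stochastic layer: Metropolis-adjusted Langevin steps targeting log_target."""
    for _ in range(n_steps):
        g_x = grad_log_target(x)
        prop = x + step * g_x + np.sqrt(2 * step) * rng.standard_normal(x.shape)
        g_p = grad_log_target(prop)
        # Log proposal densities q(prop | x) and q(x | prop) for the MH correction.
        log_fwd = -((prop - x - step * g_x) ** 2) / (4 * step)
        log_bwd = -((x - prop - step * g_p) ** 2) / (4 * step)
        log_alpha = log_target(prop) - log_target(x) + log_bwd - log_fwd
        accept = np.log(rng.uniform(size=x.shape)) < log_alpha
        x = np.where(accept, prop, x)
    return x


def sample_flow(n, seed=0):
    """Push unimodal latent samples through a deterministic, then a stochastic layer."""
    rng = np.random.default_rng(seed)
    z = rng.standard_normal(n)  # unimodal latent distribution N(0, 1)
    return mala_layer(affine_layer(z), rng)


if __name__ == "__main__":
    x = sample_flow(5000)
    # Roughly half the samples should sit in each mode, with |x| concentrated near MU.
    print(np.mean(x > 0), np.mean(np.abs(x)))
```

The affine layer alone can only rescale the unimodal latent distribution; it is the stochastic layer that splits the mass into two modes, which mirrors, in a toy setting, the claim that stochastic layers allow multimodal distributions to be generated from unimodal ones.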

Keywords

normalizing flows; variational autoencoders; invertible neural networks; Markov kernels; Markov chain Monte Carlo methods

Type: Element

DOI: https://doi.org/10.1017/9781009331012

Online ISBN: 9781009331012

Publisher: Cambridge University Press

Print publication: 02 February 2023
