
Generalized Normalizing Flows via Markov Chains

Published online by Cambridge University Press: 19 January 2023

Paul Lyonel Hagemann, Technische Universität Berlin
Johannes Hertrich, Technische Universität Berlin
Gabriele Steidl, Technische Universität Berlin

Summary

Normalizing flows, diffusion normalizing flows and variational autoencoders are powerful generative models. This Element provides a unified framework for handling these approaches via Markov chains. The authors consider stochastic normalizing flows as a pair of Markov chains fulfilling certain properties, and show how many state-of-the-art models for data generation fit into this framework. Numerical simulations show that including stochastic layers improves the expressivity of the network and allows multimodal distributions to be generated from unimodal ones. The Markov chain point of view makes it possible to couple deterministic layers, such as invertible neural networks, with stochastic layers, such as Metropolis-Hastings layers, Langevin layers, variational autoencoders and diffusion normalizing flows, in a mathematically sound way. The framework thus provides a useful mathematical tool for combining the various approaches.
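The following is a minimal sketch of the Markov chain view summarized above, assuming a one-dimensional toy problem. It is not the authors' implementation, and nothing in it is trained.

- **Forward chain.** The chain starts from a unimodal latent N(0, 1). It passes through one deterministic invertible layer (an affine map) and then through several stochastic random-walk Metropolis-Hastings layers. These layers target a bimodal density.
- **Path weight.** Alongside the samples, the sketch accumulates a log path weight in the style of the stochastic normalizing flows of Wu, Köhler and Noé (2020).
  - For the deterministic layer, the increment is log|det DT|.
  - For each MH layer, the increment is the log ratio of the backward to the forward transition density. This reduces to log p(x) - log p(x') because the MH kernel is reversible with respect to its target.
- **Assumptions.** All names are hypothetical, including `log_target`, `affine_layer`, `mh_layer` and `run_flow`. The same holds for the parameter values and the mixture target.

```python
import numpy as np

# Hypothetical 1D toy setup: unimodal latent N(0, 1) and a bimodal target,
# an equal mixture of N(-2, 0.5^2) and N(2, 0.5^2). Densities are only
# known up to additive constants in log space, so weights are unnormalized.
MU, SIGMA = 2.0, 0.5


def log_latent(z):
    # Log density of the standard normal latent (constant dropped).
    return -0.5 * z**2


def log_target(x):
    # Unnormalized log density of the bimodal mixture target.
    a = -0.5 * ((x - MU) / SIGMA) ** 2
    b = -0.5 * ((x + MU) / SIGMA) ** 2
    return np.logaddexp(a, b)


def affine_layer(x, scale=2.0, shift=0.0):
    # Deterministic invertible layer T(x) = scale * x + shift.
    # Log-weight increment: log|det DT| = log|scale|.
    return scale * x + shift, np.full_like(x, np.log(abs(scale)))


def mh_layer(x, rng, step=1.0):
    # One random-walk Metropolis-Hastings step targeting log_target.
    prop = x + step * rng.standard_normal(x.shape)
    log_alpha = log_target(prop) - log_target(x)
    accept = np.log(rng.uniform(size=x.shape)) < log_alpha
    x_new = np.where(accept, prop, x)
    # The MH kernel is reversible w.r.t. the target, so the ratio of the
    # backward to the forward transition density is p(x) / p(x_new).
    return x_new, log_target(x) - log_target(x_new)


def run_flow(n, n_mh=20, seed=0):
    # Forward chain: latent sample -> affine layer -> n_mh MH layers.
    rng = np.random.default_rng(seed)
    z = rng.standard_normal(n)
    log_w = -log_latent(z)
    x, inc = affine_layer(z)
    log_w += inc
    for _ in range(n_mh):
        x, inc = mh_layer(x, rng)
        log_w += inc
    # Close the path weight with the target density at the final state.
    log_w += log_target(x)
    return z, x, log_w


z, x, log_w = run_flow(5000)
print("fraction of samples with x > 0:", np.mean(x > 0))
print("mean |x| (target modes at +-2):", np.mean(np.abs(x)))
```

On its own, the affine layer only stretches the unimodal latent. The stochastic MH layers are what move the samples into the two modes, which illustrates the expressivity gain described in the summary. A Langevin proposal could replace the random walk without changing the weight bookkeeping, provided the Metropolis-Hastings correction is kept, since the kernel then remains reversible with respect to the target.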
Type: Element
Online ISBN: 9781009331012
Publisher: Cambridge University Press
Print publication: 02 February 2023


