Exact convergence analysis for Metropolis–Hastings independence samplers in Wasserstein distances

Austin Brown; Galin L. Jones

doi:10.1017/jpr.2023.21



Published online by Cambridge University Press: 05 June 2023


Austin Brown*, University of Warwick
Galin L. Jones**, University of Minnesota
*Postal address: Department of Statistics, University of Warwick, Coventry, UK. Email: [email protected]
**Postal address: School of Statistics, University of Minnesota, Minneapolis, MN, USA. Email: [email protected]


Abstract

Under mild assumptions, we show that the exact convergence rate in total variation is also exact in weaker Wasserstein distances for the Metropolis–Hastings independence sampler. We develop new upper and lower bounds on the worst-case Wasserstein distance when the sampler is initialized from points. For an arbitrary point initialization, we show that the convergence rate is the same and matches the convergence rate in total variation. We derive exact convergence expressions for more general Wasserstein distances when initialization is at a specific point. Using optimization, we construct a novel centered independent proposal to develop exact convergence rates in Bayesian quantile regression and many generalized linear model settings. We show that the exact convergence rate can be upper bounded in Bayesian binary response regression (e.g. logistic and probit) when the sample size and dimension grow together.
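
The claim that the total variation rate carries over to Wasserstein distances can be illustrated on a toy problem. The sketch below is not taken from the article: it assumes a hypothetical four-state target and a uniform independent proposal, with the states placed at the points 0, 1, 2, 3 of the real line so that the Wasserstein-1 distance is the sum of absolute differences between cumulative distribution functions. Starting from the state with the largest weight w = π/q, it compares the successive ratios of both distances with the classical candidate rate 1 − 1/w*, where w* = max π/q.

```python
# Toy illustration only: a finite state space stands in for the general
# setting so that every quantity can be computed exactly.
# The target and proposal below are hypothetical choices.

def independence_sampler_kernel(target, proposal):
    """Transition matrix of the Metropolis-Hastings independence sampler."""
    n = len(target)
    weights = [target[i] / proposal[i] for i in range(n)]  # w = pi / q
    kernel = [[0.0] * n for _ in range(n)]
    for x in range(n):
        for y in range(n):
            if y != x:
                accept = min(1.0, weights[y] / weights[x])
                kernel[x][y] = proposal[y] * accept
        kernel[x][x] = 1.0 - sum(kernel[x])  # rejected mass stays at x
    return kernel, weights


def step(dist, kernel):
    """One step forward: (dist P)(y) = sum_x dist(x) P(x, y)."""
    n = len(dist)
    return [sum(dist[x] * kernel[x][y] for x in range(n)) for y in range(n)]


def total_variation(p, q):
    return 0.5 * sum(abs(a - b) for a, b in zip(p, q))


def wasserstein_1(p, q):
    """W1 for distributions on the unit-spaced points 0, 1, 2, ... of the line."""
    total, cdf_gap = 0.0, 0.0
    for a, b in zip(p, q):
        cdf_gap += a - b       # running difference of the two CDFs
        total += abs(cdf_gap)
    return total


target = [0.5, 0.3, 0.15, 0.05]      # hypothetical target pi
proposal = [0.25, 0.25, 0.25, 0.25]  # hypothetical independent proposal q

kernel, weights = independence_sampler_kernel(target, proposal)
w_star = max(weights)
rate = 1.0 - 1.0 / w_star            # candidate geometric rate

# Start at the state with the largest weight, the hardest point to leave.
start = weights.index(w_star)
dist = [1.0 if i == start else 0.0 for i in range(len(target))]

tv, w1 = [], []
for _ in range(10):
    dist = step(dist, kernel)
    tv.append(total_variation(dist, target))
    w1.append(wasserstein_1(dist, target))

tv_ratios = [tv[k + 1] / tv[k] for k in range(len(tv) - 1)]
w1_ratios = [w1[k + 1] / w1[k] for k in range(len(w1) - 1)]

print("candidate rate 1 - 1/w*:", rate)
print("successive TV ratios:", tv_ratios)
print("successive W1 ratios:", w1_ratios)
```

In this toy case both sequences of ratios agree with 1 − 1/w* up to floating-point error, because the chain started at the maximal-weight state is an exact mixture of the target and a point mass at the starting state. The article's results concern general state spaces and a wider class of Wasserstein distances, which this finite example does not cover.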

Keywords

Bayesian statistics; convergence analysis; convergence rate lower bounds; computational complexity; Markov chain Monte Carlo; Metropolis–Hastings

MSC classification

Primary: 60J05: Discrete-time Markov processes on general state spaces; 60J22: Computational methods in Markov chains

Secondary: 60J20: Applications of Markov chains and discrete-time Markov processes on general state spaces (social mobility, learning theory, industrial processes, etc.)

Type: Original Article
Information: Journal of Applied Probability, Volume 61, Issue 1, March 2024, pp. 33–54

DOI: https://doi.org/10.1017/jpr.2023.21
Copyright: © The Author(s), 2023. Published by Cambridge University Press on behalf of Applied Probability Trust


