
Is Partial-Dimension Convergence a Problem for Inferences from MCMC Algorithms?

Published online by Cambridge University Press: 19 August 2007

Jeff Gill*
Affiliation: Center for Applied Statistics, Department of Political Science, Washington University, One Brookings Drive, St. Louis, MO 63130-4899. E-mail: [email protected]

Abstract

Increasingly, political science researchers are turning to Markov chain Monte Carlo methods to solve inferential problems with complex models and problematic data. This is an enormously powerful set of tools based on replacing difficult or impossible analytical work with simulated empirical draws from the distributions of interest. Although practitioners are generally aware of the importance of convergence of the Markov chain, many are not fully aware of the difficulties of assessing convergence across multiple dimensions. In most applied circumstances, every parameter dimension must converge for the others to converge. The usual culprit is slow mixing of the Markov chain and therefore slow convergence toward the target distribution. This work demonstrates the partial convergence problem for the two dominant algorithms, the Gibbs sampler and the Metropolis-Hastings algorithm, and illustrates these issues with empirical examples.
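
The phenomenon described in the abstract is easy to reproduce. A minimal sketch follows (illustrative only, not code from the article): a Gibbs sampler targets a three-dimensional normal distribution in which the first two coordinates are highly correlated (rho = 0.995, an arbitrary choice) and the third is independent, with the chain started far from the target. Per-dimension checks, here a lag-one autocorrelation and the helper geweke_z, a simplified stand-in for the usual Geweke diagnostic that ignores within-segment autocorrelation, will typically look fine for the independent coordinate long before the correlated pair has converged, which is exactly the partial-dimension problem.

```python
# Illustrative sketch of partial-dimension convergence (assumed example,
# not drawn from the article): Gibbs sampler on a 3-dimensional normal
# target where theta1 and theta2 have correlation rho and theta3 is
# independent of both.
import numpy as np

rng = np.random.default_rng(42)
rho = 0.995                          # strong correlation between theta1, theta2
n_iter = 5000
theta = np.array([8.0, 8.0, 8.0])    # deliberately remote starting point
draws = np.empty((n_iter, 3))

for t in range(n_iter):
    # Full conditionals of the trivariate normal with unit variances,
    # corr(theta1, theta2) = rho, theta3 independent:
    theta[0] = rng.normal(rho * theta[1], np.sqrt(1 - rho**2))
    theta[1] = rng.normal(rho * theta[0], np.sqrt(1 - rho**2))
    theta[2] = rng.normal(0.0, 1.0)  # independent dimension mixes immediately
    draws[t] = theta

def geweke_z(x, first=0.1, last=0.5):
    """Crude Geweke-style z-score comparing early- and late-segment means.

    Unlike the actual Geweke diagnostic, this ignores autocorrelation
    within each segment; it is only meant to flag gross non-convergence.
    """
    a = x[: int(first * len(x))]
    b = x[int((1 - last) * len(x)):]
    return (a.mean() - b.mean()) / np.sqrt(
        a.var(ddof=1) / len(a) + b.var(ddof=1) / len(b)
    )

for j in range(3):
    lag1 = np.corrcoef(draws[:-1, j], draws[1:, j])[0, 1]
    print(f"theta{j + 1}: lag-1 autocorr = {lag1:.3f}, "
          f"Geweke-style z = {geweke_z(draws[:, j]):.2f}")
```

Run as written, theta3 shows near-zero autocorrelation and an unremarkable z-score, while theta1 and theta2 retain heavy autocorrelation and large early-versus-late discrepancies from the remote start, so a diagnostic applied to theta3 alone would wrongly suggest the whole chain had converged.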

Type: Research Article

Copyright © The Author 2007. Published by Oxford University Press on behalf of the Society for Political Methodology.

