Variational Approximations for Categorical Causal Modeling With Latent Variables

Published online by Cambridge University Press: 01 January 2025

K. Humphreys*
Affiliation:
Department of Medical Epidemiology and Biostatistics, Karolinska Institutet
D. M. Titterington
Affiliation:
Department of Statistics, University of Glasgow
* Requests for reprints should be sent to Keith Humphreys, Department of Medical Epidemiology and Biostatistics, P.O. Box 281, Karolinska Institutet, 171 77 Stockholm, SWEDEN. E-Mail: [email protected]

Abstract

Latent class models in the social and behavioral sciences have remained structurally simple. One reason for this is that inference in statistical models can be computationally difficult. Methods for approximate inference, known as variational approximations, which have been developed in the machine learning, graphical modeling and statistical physics literatures, can be used to alleviate the computational difficulties of inference for latent variable models. The aim of the present article is to set these methods alongside some social and behavioral science literature to which they are relevant, and in particular to consider their potential for “categorical causal modeling”, using latent class analysis. We have collated a number of popular categorical-data models with latent variables and causal structure, typically incorporating a Markovian structure. The efficacy of the approximation methods has been demonstrated through simulations related to an important behavioral science model.

Type
Article
Copyright
Copyright © 2003 The Psychometric Society

Footnotes

Research was supported by a grant from the UK Engineering and Physical Sciences Research Council. The authors would like to thank anonymous reviewers and the Associate Editor for their very helpful comments on earlier versions of the manuscript.
