Variational Approximations for Categorical Causal Modeling With Latent Variables

K. Humphreys; D. M. Titterington

doi:10.1007/BF02294734


Published online by Cambridge University Press: 01 January 2025


K. Humphreys*: Department of Medical Epidemiology and Biostatistics, Karolinska Institutet
D. M. Titterington: Department of Statistics, University of Glasgow
*: Requests for reprints should be sent to Keith Humphreys, Department of Medical Epidemiology and Biostatistics, P.O. Box 281, Karolinska Institutet, 171 77 Stockholm, SWEDEN. E-Mail: [email protected]


Abstract

Latent class models in the social and behavioral sciences have remained structurally simple. One reason for this is that inference in statistical models can be computationally difficult. Methods for approximate inference, known as variational approximations, which have been developed in the machine learning, graphical modeling and statistical physics literatures, can be used to alleviate the computational difficulties of inference for latent variable models. The aim of the present article is to set these methods alongside some social and behavioral science literature to which they are relevant, and in particular to consider their potential for “categorical causal modeling”, using latent class analysis. We have collated a number of popular categorical-data models with latent variables and causal structure, typically incorporating a Markovian structure. The efficacy of the approximation methods has been demonstrated through simulations related to an important behavioral science model.
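To make the idea of a variational approximation concrete, the following is a minimal sketch and not the authors' implementation. It assumes a toy model with two binary latent variables and a single observed response pattern; the table `joint` and all function names are hypothetical and chosen only for illustration. A factorized (mean field) distribution q(z1)q(z2) is fitted by coordinate ascent, and the resulting lower bound is compared with the exact log-likelihood, which it can never exceed.

```python
import math

def exact_log_lik(joint):
    """Exact log p(x): sum the joint p(x, z1, z2) over both latent variables."""
    return math.log(sum(sum(row) for row in joint))

def elbo(joint, q1, q2):
    """Variational lower bound E_q[log p(x, z)] - E_q[log q(z)] for q = q1 * q2."""
    total = 0.0
    for a, qa in enumerate(q1):
        for b, qb in enumerate(q2):
            w = qa * qb
            if w > 0.0:
                total += w * (math.log(joint[a][b]) - math.log(w))
    return total

def normalize_exp(scores):
    """Exponentiate and normalize log-scale scores in a numerically stable way."""
    m = max(scores)
    e = [math.exp(s - m) for s in scores]
    z = sum(e)
    return [v / z for v in e]

def mean_field(joint, q1, q2, n_iter=50):
    """Coordinate ascent: each factor is set proportional to
    exp(expected log joint under the other factor)."""
    q1, q2 = list(q1), list(q2)
    for _ in range(n_iter):
        q1 = normalize_exp([
            sum(q2[b] * math.log(joint[a][b]) for b in range(len(q2)))
            for a in range(len(q1))
        ])
        q2 = normalize_exp([
            sum(q1[a] * math.log(joint[a][b]) for a in range(len(q1)))
            for b in range(len(q2))
        ])
    return q1, q2

# Hypothetical values of p(x, z1=a, z2=b) for one observed pattern x.
joint = [[0.20, 0.05],
         [0.02, 0.10]]
q1_init, q2_init = [0.5, 0.5], [0.6, 0.4]
q1, q2 = mean_field(joint, q1_init, q2_init)

print("exact log-likelihood:", exact_log_lik(joint))
print("mean field lower bound:", elbo(joint, q1, q2))
```

With only four latent configurations the exact sum is trivial here; the approximation becomes useful when the number of latent configurations grows too large to enumerate, which is the situation the abstract refers to.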

Keywords

EM algorithm; causal model; latent class; variational approximation

Type: Article
Information: Psychometrika, Volume 68, Issue 3, September 2003, pp. 391–412

DOI: https://doi.org/10.1007/BF02294734
Copyright: Copyright © 2003 The Psychometric Society


Footnotes

Research was supported by a grant from the UK Engineering and Physical Sciences Research Council. The authors would like to thank anonymous reviewers and the Associate Editor for their very helpful comments on earlier versions of the manuscript.

