
A Unified Neural Network Framework for Extended Redundancy Analysis

Published online by Cambridge University Press:  01 January 2025

Ranjith Vijayakumar
Affiliation:
National University of Singapore
Ji Yeh Choi*
Affiliation:
York University
Eun Hwa Jung
Affiliation:
Kookmin University
* Correspondence should be made to Ji Yeh Choi, Department of Psychology, York University, 4700 Keele St., Toronto, ON, Canada. Email: [email protected]

Abstract

Component-based approaches have been regarded as a tool for dimension reduction to predict outcomes from observed variables in regression applications. Extended redundancy analysis (ERA) is one such component-based approach, which reduces predictors to components that explain maximum variance in the outcome variables. In many instances, ERA can be extended to capture nonlinearity and interactions between observed variables and components, but only by specifying the functional form a priori. Meanwhile, machine learning methods such as neural networks are typically used in a data-driven manner to capture nonlinearity without specifying the exact functional form. In this paper, we introduce a new method, called NN-ERA, that integrates neural network algorithms into the framework of ERA to capture any non-specified nonlinear relationships among multiple sets of observed variables for constructing components. Simulations and empirical datasets are used to demonstrate the usefulness of NN-ERA. We conclude that in social science datasets with unstructured data, where we expect nonlinear relationships that cannot be specified a priori, NN-ERA with its neural network algorithmic structure can serve as a useful tool to specify and test models not otherwise captured by conventional component-based models.
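To make the idea of "reducing predictors to components" concrete, the following is a minimal sketch of conventional linear ERA, not of NN-ERA itself. It assumes the usual formulation in which each block of predictors X_k is collapsed into one component f_k = X_k w_k and the outcomes are regressed on the components, Y ≈ XWA, with each component scaled to unit length. The function name `fit_era`, the alternating least squares updates, and the simulated data are illustrative assumptions, not the authors' implementation.

```python
import numpy as np

def fit_era(blocks, Y, n_iter=200, tol=1e-10, seed=0):
    """Illustrative linear ERA fit by alternating least squares (hypothetical helper).

    blocks : list of (n x p_k) predictor matrices, one component per block
    Y      : (n,) or (n x q) outcome matrix
    Returns W (p x K, zero outside each block) and A (K x q).
    """
    rng = np.random.default_rng(seed)
    X = np.hstack(blocks)
    n, p = X.shape
    K = len(blocks)
    Y = np.asarray(Y, dtype=float).reshape(n, -1)

    # Structural mask: column k of W is free only on the rows of block k.
    mask = np.zeros((p, K), dtype=bool)
    start = 0
    for k, Xk in enumerate(blocks):
        mask[start:start + Xk.shape[1], k] = True
        start += Xk.shape[1]
    free = mask.flatten(order="F")

    W = np.where(mask, rng.standard_normal((p, K)), 0.0)
    prev = np.inf
    for _ in range(n_iter):
        # Scale each component to unit length, then regress Y on the components.
        W = W / np.linalg.norm(X @ W, axis=0)
        F = X @ W
        A, *_ = np.linalg.lstsq(F, Y, rcond=None)

        # Update the free weights: vec(X W A) = (A' kron X) vec(W), column-major vec.
        M = np.kron(A.T, X)[:, free]
        w, *_ = np.linalg.lstsq(M, Y.flatten(order="F"), rcond=None)
        w_full = np.zeros(p * K)
        w_full[free] = w
        W = w_full.reshape((p, K), order="F")

        loss = np.sum((Y - X @ W @ A) ** 2)
        if prev - loss < tol:
            break
        prev = loss

    # Final rescaling so components have unit length; the fitted values are unchanged.
    scale = np.linalg.norm(X @ W, axis=0)
    return W / scale, A * scale[:, None]

# Usage on simulated data (values are arbitrary and for illustration only).
rng = np.random.default_rng(42)
X1, X2 = rng.standard_normal((200, 3)), rng.standard_normal((200, 2))
F_true = np.column_stack([X1 @ [0.6, 0.3, 0.1], X2 @ [0.5, 0.5]])
Y_sim = F_true @ np.array([[1.0, 0.5], [-0.5, 1.0]]) + 0.1 * rng.standard_normal((200, 2))
W_hat, A_hat = fit_era([X1, X2], Y_sim)
print(W_hat.shape, A_hat.shape)
```

In this linear version the mapping from components to outcomes is the fixed matrix A. The abstract's point is that NN-ERA replaces such pre-specified functional forms with a neural network, so nonlinearities and interactions need not be written down in advance.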

Type
Application Reviews and Case Studies
Copyright
Copyright © 2022 The Author(s) under exclusive licence to The Psychometric Society


Footnotes

Supplementary Information The online version contains supplementary material available at https://doi.org/10.1007/s11336-022-09853-x.

Supplementary material: File

Vijayakumar et al. supplementary material

Appendix A and B