Multinomial Logistic Factor Regression for Multi-source Functional Block-wise Missing Data

Published online by Cambridge University Press:  01 January 2025

Xiuli Du*
Affiliation: Nanjing Normal University
Xiaohu Jiang
Affiliation: Nanjing Normal University
Jinguan Lin
Affiliation: Nanjing Audit University

* Correspondence should be made to Xiuli Du, College of Mathematical Sciences, Nanjing Normal University, Nanjing 210023, China. Email: [email protected]

Abstract

Multi-source functional block-wise missing data are increasingly common in medical care owing to the rapid development of big data and medical technology, so there is an urgent need for efficient dimension reduction methods that extract the information relevant to classification from such data. However, most existing classification methods take high-dimensional data as covariates. In this paper, we propose a novel multinomial imputed-factor logistic regression model with multi-source functional block-wise missing data as covariates. Our main contribution is to establish two multinomial factor regression models that use the imputed multi-source functional principal component scores and the imputed canonical scores as covariates, respectively, where the missing factors are imputed by both the conditional mean imputation and the multiple block-wise imputation approaches. Specifically, univariate functional principal component analysis (FPCA) is first carried out on the observed data of each source to obtain the univariate principal component scores and the eigenfunctions. Then the block-wise missing univariate principal component scores, rather than the block-wise missing functional data themselves, are imputed by the conditional mean imputation method and the multiple block-wise imputation method, respectively. Next, based on the imputed univariate factors, the multi-source principal component scores are constructed from the relationship between the multi-source and the univariate principal component scores; at the same time, the canonical scores are obtained by multiple-set canonical correlation analysis. Finally, the multinomial imputed-factor logistic regression model is established with the multi-source principal component scores or the canonical scores as factors. Numerical simulations and a real data analysis of ADNI data show that the proposed method works well.
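
The pipeline described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: it assumes two sources observed on a common grid, replaces univariate FPCA with ordinary PCA of the discretized curves, imputes missing scores by a conditional mean under a Gaussian working model, takes the multi-source scores to be an eigen-decomposition of the stacked univariate scores, and fits the multinomial logistic model by plain gradient descent. The simulated data and all function names (pca_scores, conditional_mean_impute, fit_multinomial_logit) are hypothetical. The multiple block-wise imputation and the canonical-score variant are not shown.

```python
import numpy as np

rng = np.random.default_rng(0)

def pca_scores(curves, n_comp):
    """Grid-based stand-in for univariate FPCA: PCA of discretized curves."""
    centered = curves - curves.mean(axis=0)
    _, _, vt = np.linalg.svd(centered, full_matrices=False)
    return centered @ vt[:n_comp].T              # (n_subjects, n_comp) scores

def conditional_mean_impute(s_full, s_part):
    """Fill NaN rows of s_part with E[s_part | s_full] (Gaussian working model)."""
    seen = ~np.isnan(s_part).any(axis=1)
    mu_f, mu_p = s_full[seen].mean(axis=0), s_part[seen].mean(axis=0)
    cf, cp = s_full[seen] - mu_f, s_part[seen] - mu_p
    cov_ff = cf.T @ cf / seen.sum()
    cov_pf = cp.T @ cf / seen.sum()
    coef = cov_pf @ np.linalg.pinv(cov_ff)       # regression of s_part on s_full
    out = s_part.copy()
    out[~seen] = mu_p + (s_full[~seen] - mu_f) @ coef.T
    return out

def softmax(z):
    z = z - z.max(axis=1, keepdims=True)         # stabilize the exponentials
    e = np.exp(z)
    return e / e.sum(axis=1, keepdims=True)

def fit_multinomial_logit(x, y, n_classes, lr=0.1, n_iter=500):
    """Gradient descent on the mean multinomial cross-entropy."""
    x1 = np.column_stack([np.ones(len(x)), x])   # add intercept column
    onehot = np.eye(n_classes)[y]
    w = np.zeros((x1.shape[1], n_classes))
    for _ in range(n_iter):
        w -= lr * x1.T @ (softmax(x1 @ w) - onehot) / len(x)
    return w

# Simulated data (assumption): three classes, two sources driven by one latent factor.
n, grid = 150, np.linspace(0, 1, 30)
y = rng.integers(0, 3, size=n)
latent = (y - 1) + 0.5 * rng.standard_normal(n)
src1 = np.outer(latent, np.sin(2 * np.pi * grid)) + 0.2 * rng.standard_normal((n, 30))
src2 = np.outer(latent, np.cos(2 * np.pi * grid)) + 0.2 * rng.standard_normal((n, 30))
has2 = np.arange(n) < 100                        # source 2 missing for the last 50 subjects

# Step 1: univariate scores from the observed data of each source.
s1 = pca_scores(src1, 2)
s2 = np.full((n, 2), np.nan)
s2[has2] = pca_scores(src2[has2], 2)

# Step 2: impute the missing scores, not the missing curves.
s2_imp = conditional_mean_impute(s1, s2)

# Step 3: multi-source scores from the stacked univariate scores.
stacked = np.column_stack([s1, s2_imp])
stacked_c = stacked - stacked.mean(axis=0)
_, vecs = np.linalg.eigh(stacked_c.T @ stacked_c / n)
multi_scores = stacked_c @ vecs[:, ::-1][:, :2]  # two leading eigenvectors

# Step 4: multinomial logistic regression on the standardized factors.
features = multi_scores / multi_scores.std(axis=0)
w = fit_multinomial_logit(features, y, 3)
probs = softmax(np.column_stack([np.ones(n), features]) @ w)
pred = probs.argmax(axis=1)
```

Imputing the low-dimensional scores instead of the curves keeps the imputation step to a small regression problem, which is the design choice the abstract emphasizes.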

Type
Theory & Methods
Copyright
Copyright © 2023 The Author(s) under exclusive licence to The Psychometric Society

Footnotes

Supplementary Information The online version contains supplementary material available at https://doi.org/10.1007/s11336-023-09918-5.
