Hostname: page-component-586b7cd67f-g8jcs Total loading time: 0 Render date: 2024-11-26T11:13:12.489Z Has data issue: false hasContentIssue false

Inferring structure in bipartite networks using the latent blockmodel and exact ICL

Published online by Cambridge University Press:  01 February 2017

JASON WYSE
Affiliation:
Discipline of Statistics, School of Computer Science and Statistics, Trinity College Dublin, College Green, Dublin 2, Ireland (e-mail: [email protected])
NIAL FRIEL
Affiliation:
School of Mathematical Sciences and Insight: The National Centre for Big Data Analytics, University College Dublin, Belfield, Dublin 4, Ireland (e-mail: [email protected])
PIERRE LATOUCHE
Affiliation:
Laboratoire SAMM, Université Paris 1 Panthéon-Sorbonne, 90 rue de Tolbiac, F-75634 Paris Cedex 13, France (e-mail: [email protected])

Abstract

We consider the task of simultaneous clustering of the two node sets involved in a bipartite network. The approach we adopt is based on use of the exact integrated complete likelihood for the latent blockmodel. Using this allows one to infer the number of clusters as well as cluster memberships using a greedy search. This gives a model-based clustering of the node sets. Experiments on simulated bipartite network data show that the greedy search approach is vastly more scalable than competing Markov chain Monte Carlo-based methods. Application to a number of real observed bipartite networks demonstrate the algorithms discussed.

Type
Research Article
Copyright
Copyright © Cambridge University Press 2017 

Access options

Get access to the full version of this content by using one of the access options below. (Log in options will check for institutional or personal access. Content may require purchase if you do not have access.)

References

Besag, J. (1986). On the statistical analysis of dirty pictures (with discussion). Journal of the Royal Statistical Society, Series B, 48 (3), 259302.Google Scholar
Biernacki, C., Celeux, G., & Govaert, G. (2000). Assessing a mixture model for clustering with the integrated completed likelihood. IEEE Transactions on Pattern Analysis and Machine Intelligence, 22 (7), 719725.Google Scholar
Borgatti, S. P., & Everett, M. G. (1997). Network analysis of 2-mode data. Social Networks, 19, 243269.Google Scholar
Bozdogan, H. (1994). Mixture-model cluster analysis using model selection criteria and a new information measure of complexity. In Bozdogan, H., Sclove, S. L., Gupta, A. K., Haughton, D., Kitagawa, G., Ozaki, T. & Tanabe, K. (Eds.), Proceedings of the 1st US/Japan Conference on the Frontiers of Statistical Modeling: An Informational Approach, vol. 2, Multivariate Statistical Modelling', Springer Netherlands, Dordrecht, pp. 69113.Google Scholar
Brusco, M., & Steinley, D. (2011). A tabu search heuristic for deterministic two-mode blockmodelling of binary network matrices. Psychometrika, 76 (4), 612633.CrossRefGoogle Scholar
Brusco, M., Doreian, P., Lloyd, P., & Steinley, D. (2013). A variable neighbourhood search method for a two-mode blockmodelling problem in social network analysis. Network Science, 1 (2), 191212.Google Scholar
Brusco, M. J., & Steinley, D. (2006). Inducing a blockmodel structure on two-mode data using seriation procedures. Journal of Mathematical Psychology, 50 (5), 468477.Google Scholar
Côme, E., & Latouche, P. (2015). Model selection and clustering in stochastic block models based on the exact integrated complete data likelihood. Statistical Modelling, 15 (6), 564589.CrossRefGoogle Scholar
Doreian, P., Batagelj, V., & Ferligoj, A. (2004). Generalized blockmodeling of two-mode network data. Social Networks, 26 (4), 2953.CrossRefGoogle Scholar
Doreian, P., Lloyd, P., & Mrvar, A. (2013). Partitioning large signed two-mode networks: Problems and prospects. Social Networks, 35 (2), 121.CrossRefGoogle Scholar
Flynn, C. J., & Perry, P. O. (2016). Consistent Biclustering. e-print. arXiv:1206.6927.Google Scholar
Govaert, G. (1995). Simultaneous clustering of rows and columns. Control Cybernet, 24 (4), 437458.Google Scholar
Govaert, G., & Nadif, M. (1996). Comparison of the mixture and the classification maximum likelihood in cluster analysis when data are binary. Computational Statistics and Data Analysis, 23 (1), 6581.Google Scholar
Govaert, G., & Nadif, M. (2003). Clustering with block mixture models. Pattern Recognition, 36 (2), 463473.Google Scholar
Govaert, G., & Nadif, M. (2005). An em algorithm for the block mixture model. IEEE Transactions on Pattern Analysis and Machine Intelligence, 27 (4), 643647.Google Scholar
Govaert, G., & Nadif, M. (2008). Block clustering with Bernoulli mixture models: Comparison of different approaches. Computational Statistics and Data Analysis, 52 (6), 32333245.Google Scholar
Hartigan, J. A. (1972). Direct clustering of a data matrix. Journal of the American Statistical Association, 67 (337), 123129.Google Scholar
Keribin, C., Brault, V., Celeux, G., & Govaert, G. (2012). Model selection for the binary latent block model. 20th International Conference on Computational Statistics (COMPSTAT 2012), 379–390.Google Scholar
Keribin, C., Govaert, G., & Celeux, G. (2010). Estimation d'un modèle à blocs latents par l'algorithme SEM. 42émes Journées de Statistique. Marseille, France. Retrieved from https://hal.inria.fr/inria-00494796.Google Scholar
Keribin, C., Brault, V., Celeux, G., & Govaert, G. (2015). Estimation and Selection for the Latent Block Model on Categorical Data. Rapport de recherche RR-8264. INRIA.CrossRefGoogle Scholar
Nobile, A., & Fearnside, A. T. (2007). Bayesian finite mixtures with an unknown number of components: The allocation sampler. Statistics and Computing, 17 (2), 147162.CrossRefGoogle Scholar
Nowicki, K., & Snijders, T. A. B. (2001). Estimation and prediction for stochastic blockstructures. Journal of the American Statistical Association, 96 (455), 10771087.Google Scholar
Opsahl, T. (2013). Triadic closure in two-mode networks: Redefining the global and local clustering coefficients. Social Networks, 35 (2), 159167.Google Scholar
Rohe, K., Qin, T., & Yu, B. (2016). Co-clustering directed graphs to discover asymmetries and directional communities. Proceedings of the National Academy of Sciences, 113 (45), 1267912684.CrossRefGoogle ScholarPubMed
van Dijk, B., van Rosmalen, J., & Paap, R. (2009). A Bayesian approach to two-mode clustering (No. EI 2009-06). Report/Econometric Institute, Erasmus University Rotterdam (pp. 1–26). Erasmus School of Economics. Retrieved from http://hdl.handle.net/1765/15112.Google Scholar
Vinh, N. X., Epps, J., & Bailey, J. (2010). Information theoretic measures for clusterings comparison: Variants, properties, normalization and correction for chance. Journal of Machine Learning Research, 11, 28372854.Google Scholar
Wasserman, S., & Faust, K. (1994). Social network analysis: Methods and applications. Cambridge, United Kingdom: Cambridge University Press.Google Scholar
Wyse, J., & Friel, N. (2012). Block clustering with collapsed latent block models. Statistics and Computing, 22 (2), 415428.Google Scholar