Hostname: page-component-745bb68f8f-v2bm5 Total loading time: 0 Render date: 2025-01-09T13:59:49.899Z Has data issue: false hasContentIssue false

A network community detection method with integration of data from multiple layers and node attributes

Published online by Cambridge University Press:  07 March 2023

Hannu Reittu*
Affiliation:
VTT Technical Research Centre of Finland, Espoo, Finland
Lasse Leskelä
Affiliation:
School of Science, Department of Mathematics and System Analysis, Aalto University, Espoo, Finland
Tomi Räty
Affiliation:
Microsoft, One Microsoft Way, Redmond, WA, USA
*
*Corresponding author. Email: [email protected]

Abstract

Multilayer networks are in the focus of the current complex network study. In such networks, multiple types of links may exist as well as many attributes for nodes. To fully use multilayer—and other types of complex networks in applications, the merging of various data with topological information renders a powerful analysis. First, we suggest a simple way of representing network data in a data matrix where rows correspond to the nodes and columns correspond to the data items. The number of columns is allowed to be arbitrary, so that the data matrix can be easily expanded by adding columns. The data matrix can be chosen according to targets of the analysis and may vary a lot from case to case. Next, we partition the rows of the data matrix into communities using a method which allows maximal compression of the data matrix. For compressing a data matrix, we suggest to extend so-called regular decomposition method for non-square matrices. We illustrate our method for several types of data matrices, in particular, distance matrices, and matrices obtained by augmenting a distance matrix by a column of node degrees, or by concatenating several distance matrices corresponding to layers of a multilayer network. We illustrate our method with synthetic power-law graphs and two real networks: an Internet autonomous systems graph and a world airline graph. We compare the outputs of different community recovery methods on these graphs and discuss how incorporating node degrees as a separate column to the data matrix leads our method to identify community structures well-aligned with tiered hierarchical structures commonly encountered in complex scale-free networks.

Type
Research Article
Copyright
© The Author(s), 2023. Published by Cambridge University Press

Access options

Get access to the full version of this content by using one of the access options below. (Log in options will check for institutional or personal access. Content may require purchase if you do not have access.)

Footnotes

Action Editor: Matteo Magnani

References

Abbe, E. (2017). Community detection and stochastic block models: Recent developments. Journal of Machine Learning Research, 18(1), 64466531.Google Scholar
Avrachenkov, K., Dreveton, M., & Leskelä, L. (2022). Community recovery in non-binary and temporal stochastic block models. Retrieved from https://arxiv.org/abs/2008.04790Google Scholar
Bang-Jensen, J., & Gutin, G. Z. (2009). Digraphs: Theory, algorithms and applications. Springer.10.1007/978-1-84800-998-1CrossRefGoogle Scholar
Barabási, A.-L., & Albert, R. (1999). Emergence of scaling in random networks. Science, 286(5439), 509512.10.1126/science.286.5439.509CrossRefGoogle ScholarPubMed
Bhamidi, S., van der Hofstad, R., & Hooghiemstra, G. (2017). Universality for first passage percolation on sparse random graphs. Annals of Probability, 45(4), 25682630.10.1214/16-AOP1120CrossRefGoogle Scholar
Bhattacharyya, S., & Bickel, P. (2014). Community detection in networks using graph distance. In Networks with community structure workshop, Eurandom 2014 (pp. 4046). Eurandom.Google Scholar
Bolla, M. (2013). Spectral clustering and biclustering: Learning large graphs and contingency tables. West Sussex, UK: John Wiley and Sons, 1268.10.1002/9781118650684CrossRefGoogle Scholar
Chen, Q., Chang, H., Govindan, R., Jamin, S., Shenker, S., & Willinger, W. (2002). The origin of power laws in internet topologies revisited. In INFOCOM 2002. 22st annual joint conference of the IEEE computer and communications societies, IEEE (pp. 608617).Google Scholar
Contisciani, M., Power, E. A., & Bacco, C. D. (2020). Community detection with node attributes in multilayer networks. Scientific Reports, 10(15736), 1–16.10.1038/s41598-020-72626-yCrossRefGoogle ScholarPubMed
Cover, T. M., & Thomas, J. A. (2006). Elements of information theory (2nd ed.). New York: John Wiley and Sons Inc.Google Scholar
De Domenico, M., Lancichinetti, A., Arenas, A., & Rosvall, M. (2015). Identifying modular flows on multilayer networks reveals highly overlapping organization in interconnected systems. Physical Review X, 5(1), 111.10.1103/PhysRevX.5.011027CrossRefGoogle Scholar
Fajardo-Fontiveros, O., Guimerà, R., & Sales-Pardo, M. (2022). Node metadata can produce predictability crossovers in network inference problems. Physical Review X, 12(1), 011010.10.1103/PhysRevX.12.011010CrossRefGoogle Scholar
Faloutsos, M., Faloutsos, P., & Faloutsos, C. (1999). On power-law relationships of the internet topology. In Proceedings of the conference on applications, technologies, architectures, and protocols for computer communication, SIGCOMM ’99, New York, NY, USA: Association for Computing Machinery, (pp. 251262).Google Scholar
Fortunato, S. (2010). Community detection in graphs. Physics Reports, 486(3-5), 75174.10.1016/j.physrep.2009.11.002CrossRefGoogle Scholar
Gastner, M. T., & Newman, M. E. J. (2006). The spatial structure of networks. European Physical Journal B, 49(2), 247252.10.1140/epjb/e2006-00046-8CrossRefGoogle Scholar
Girvan, M., & Newman, M. E. J. (2002). Community structure in social and biological networks. Proceedings of the National Academy of Sciences, 99(12), 78217826.10.1073/pnas.122653799CrossRefGoogle ScholarPubMed
Grünwald, P. D. (2007). The minimum description length principle. Cambridge, MA: MIT Press.10.7551/mitpress/4643.001.0001CrossRefGoogle Scholar
Haryo, L., & Pulungan, R. (2022). Performance evaluation of regular decomposition and benchmark clustering methods. In International conference on future data and security engineering, Springer, (pp. 176191).10.1007/978-981-19-8069-5_12CrossRefGoogle Scholar
Holland, P. W., Laskey, K. B., & Leinhardt, S. (1983). Stochastic blockmodels: First steps. Social Networks, 5(2), 109137.10.1016/0378-8733(83)90021-7CrossRefGoogle Scholar
Hric, D., Peixoto, T. P., & Fortunato, S. (2016). Network structure, metadata, and the prediction of missing nodes and annotations. Physical Review X, 6(3), 031038.10.1103/PhysRevX.6.031038CrossRefGoogle Scholar
Interdonato, R., Atzmueller, M., Gaito, S., Kanawati, R., & Largeron, C. (2019). Feature-rich networks: Going beyond complex network topologies. Applied Network Science, 4(1), 4.10.1007/s41109-019-0111-xCrossRefGoogle Scholar
Jorritsma, J., & Komjáthy, J. (2020). Weighted distances in scale-free preferential attachment models. Random Structures & Algorithms, 57(3), 823859.10.1002/rsa.20947CrossRefGoogle Scholar
Karrer, B., & Newman, M. E. J. (2011). Stochastic blockmodels and community structure in networks. Physical Review E, 83(1), 016107.10.1103/PhysRevE.83.016107CrossRefGoogle ScholarPubMed
Ketchen, D. J., & Shook, C. L. (1996). The application of cluster analysis in strategic management research: An analysis and critique. Strategic Management Journal, 17(6), 441458.10.1002/(SICI)1097-0266(199606)17:6<441::AID-SMJ819>3.0.CO;2-G3.0.CO;2-G>CrossRefGoogle Scholar
Kivelä, M., Arenas, A., Barthelemy, M., Gleeson, J., Moreno, Y., & Moreno, M. (2014). Multilayer networks. Journal of Complex Networks, 2(3), 203271.10.1093/comnet/cnu016CrossRefGoogle Scholar
Lei, J., & Rinaldo, A. (2015). Consistency of spectral clustering in stochastic block models. Annals of Statistics, 43(1), 215237.10.1214/14-AOS1274CrossRefGoogle Scholar
Magnani, M., Hanteer, O., Interdonato, R., Rossi, L., & Tagarelli, A. (2021). Community detection in multiplex networks. ACM Computing Surveys, 54(3), 135.10.1145/3444688CrossRefGoogle Scholar
Negre, C. F. A., Ushijima-Mwesigwa, H., & Mniszewski, S. M. (2020). Detecting multiple communities using quantum annealing on the D-wave system. PLOS ONE, 15(2), 114.10.1371/journal.pone.0227538CrossRefGoogle ScholarPubMed
Nepusz, T., Négyessy, L., Tusnády, G., & Bazsó, F. (2008). Reconstructing cortical networks: Case of directed graphs with high level of reciprocity. In: Bollobás, B., Kozma, R., & Miklós, D., eds. Handbook of large-scale random networks, Vol. 18, (pp. 325368). Berlin, Heidelberg: Springer.10.1007/978-3-540-69395-6_8CrossRefGoogle Scholar
Newman, M. E. J. (2006). Modularity and community structure in networks. Proceedings of the National Academy of Sciences, 103(23), 85778696.10.1073/pnas.0601602103CrossRefGoogle ScholarPubMed
Newman, M. E. J., & Clauset, A. (2016). Structure and inference in annotated networks. Nature Communications, 7(1), 111.10.1038/ncomms11863CrossRefGoogle ScholarPubMed
Norros, I., & Reittu, H. (2006). On a conditionally Poissonian graph process. Advances in Applied Probability, 38(1), 5975.10.1239/aap/1143936140CrossRefGoogle Scholar
Norros, I., & Reittu, H. (2008a). Attack resistance of power-law random graphs in the finite-mean, infinite-variance region. Internet Mathematics, 5(3), 251266.10.1080/15427951.2008.10129162CrossRefGoogle Scholar
Norros, I., & Reittu, H. (2008b). Network models with a ’soft hierarchy’: A random graph construction with loglog scalability. IEEE Network, 22(2), 4046.10.1109/MNET.2008.4476070CrossRefGoogle Scholar
Norros, I., Reittu, H., & Bazsó, F. (2022). On model selection for dense stochastic block models. Advances in Applied Probability, 54(1), 202226.10.1017/apr.2021.29CrossRefGoogle Scholar
Pehkonen, V., & Reittu, H. (2011). Szemerédi-type clustering of peer-to-peer streaming system. In Proceedings of the international workshop on modeling, analysis, and control of complex networks, Cnet 2011, San Francisco, USA, pp. 2330, ITC23Google Scholar
Peixoto, T. P. (2012). Parsimonious module inference in large networks. Physical Review Letters, 110, 148701.Google Scholar
Peixoto, T. P. (2015). Inferring the mesoscale structure of layered, edge-valued, and time-varying networks. Physical Review E, 92(4), 04280710.1103/PhysRevE.92.042807CrossRefGoogle ScholarPubMed
Reittu, H., Bazsó, F., & Norros, I. (2017a). Regular decomposition: An information and graph theoretic approach to stochastic block models, arXiv: 1704.07114[cs.IT].Google Scholar
Reittu, H., Bazsó, F., & Weiss, R. (2014). Regular decomposition of multivariate time series and other matrices. In: Fränti, P., Brown, G., Loog, M., Escolano, F., & Pelillo, M., eds. Proceedins of S+SSPR 2014, LNCS, (pp. 424433). Springer.Google Scholar
Reittu, H., Kotovirta, V., Leskelä, L., Rummukainen, H., & Räty, T. (2020). Towards analyzing large graphs with quantum annealing. In Baru, C., Huan, J., Khan, L., Hu, X., Ak, R., Tian, Y., Barga, R., Zaniolo, C., Lee, K., & Ye, Y., eds. Proceedings - 2019 IEEE International Conference on Big Data, Big Data 2019 (pp. 24572464). USA: IEEE, 09-12-2019 to 12-12-2019,Google Scholar
Reittu, H., Leskelä, L., Räty, T., & Fiorucci, M. (2018). Analysis of large sparse graphs using regular decomposition of graph distance matrices. In IEEE international conference on big data (big data) (pp. 37843792). Seattle, WA: IEEE.Google Scholar
Reittu, H., & Norros, I. (2002). On the effect of very large nodes in internet graphs. In Proceedins of global telecommunications conference, 2002. GLOBECOM’02, Vol. 3, (pp. 26242628). IEEE.10.1109/GLOCOM.2002.1189105CrossRefGoogle Scholar
Reittu, H., & Norros, I. (2004). On the power-law random graph model of massive data networks. Performance Evaluation, 55(1-2), 323.10.1016/S0166-5316(03)00097-XCrossRefGoogle Scholar
Reittu, H., Norros, I., & Bazsó, F. (2017b). Regular decomposition of large graphs and other structures: Scalability and robustness towards missing data. In Al Hasan, M., & Madduri, K., eds. Fourth international workshop on high performance big graph data management, analysis, and mining (BigGraphs 2017) (pp. 1627). Boston, USA: IEEE BigData.Google Scholar
Reittu, H., Norros, I., Räty, T., Bolla, M., & Bazsó, F. (2019). Regular decomposition of large graphs: Foundation of a sampling approach to stochastic block model fitting. Data Science and Engineering, 4(1), 4460.10.1007/s41019-019-0084-xCrossRefGoogle Scholar
Rissanen, J. (1983). A universal prior for integers and estimation by minimum description length. Annals of Statistics, 11(2), 416431.10.1214/aos/1176346150CrossRefGoogle Scholar
Szemerédi, E. (1978). Regular partitions of graphs. Problemés Combinatories et Téorie des Graphes, 260, 399401. in Colloq. Intern. C.N.R.S. Orsay.Google Scholar
Tao, T. (2006). Szemerédi’s regularity lemma revisited. Contributions to Discrete Mathematics, 1, 8–28.Google Scholar
van der Hofstad, R. (2017). Random graphs and complex networks, Vol. 1, Cambridge University Press.Google Scholar
van der Hofstad, R., & Hooghiemstra, G. (2008). Universality of distances in power-law random graphs. Journal of Mathematical Physics, 49(12), 125209.10.1063/1.2982927CrossRefGoogle Scholar
van der Hofstad, R., Hooghiemstra, G., & Znamenski, D. (2007). Distances in random graphs with finite mean and infinite variance degrees. Electronic Journal of Probability, 12, 703766.10.1214/EJP.v12-420CrossRefGoogle Scholar
van der Hoorn, P., & Olvera-Cravioto, M. (2018). Typical distances in the directed configuration model. Annals of Applied Probability, 28(3), 17391792.10.1214/17-AAP1342CrossRefGoogle Scholar
Wilson, J. D., Palowitch, J., Bhamidi, S., & Nobel, A. B. (2017). Community extraction in multilayer networks with heterogeneous community structure. Journal of Machine Learning Research, 18, 149.Google ScholarPubMed
Xu, M., Jog, V., & Loh, P.-L. (2020). Optimal rates for community estimation in the weighted stochastic block model. Annals of Statistics, 48(1), 183204.10.1214/18-AOS1797CrossRefGoogle Scholar
Zhang, A. Y., & Zhou, H. H. (2016). Minimax rates of community detection in stochastic block models. Annals of Statistics, 44(5), 22522280.10.1214/15-AOS1428CrossRefGoogle Scholar
Zhao, Y., Levina, E., & Zhu, J. (2012). Consistency of community detection in networks under degree-corrected stochastic block models. Annals of Statistics, 40(4), 22662292.10.1214/12-AOS1036CrossRefGoogle Scholar