Learning to count: A deep learning framework for graphlet count estimation

Xutong Liu; Yu-Zhen Janice Chen; John C. S. Lui; Konstantin Avrachenkov

doi:10.1017/nws.2020.35

Published online by Cambridge University Press: 11 September 2020

Xutong Liu*: Affiliation:
The Chinese University of Hong Kong, Shatin, NT, Hong Kong (e-mail: [email protected])
Yu-Zhen Janice Chen: Affiliation:
University of Massachusetts Amherst, MA 01002, USA (e-mail: [email protected])
John C. S. Lui: Affiliation:
The Chinese University of Hong Kong, Shatin, NT, Hong Kong (e-mail: [email protected])
Konstantin Avrachenkov: Affiliation:
INRIA Sophia Antipolis, 06902 Valbonne, France (e-mail: [email protected])
*Corresponding author. Email: [email protected]

Abstract

Graphlet counting is a widely explored problem in network analysis and has been successfully applied in many domains, most notably bioinformatics, social science, and infrastructure network studies. Efficiently computing graphlet counts remains challenging due to combinatorial explosion: a naive enumeration algorithm needs O(N^k) time for k-node graphlets in a network of size N. Recently, many works have introduced carefully designed combinatorial and sampling methods with encouraging results. However, the existing methods ignore the fact that graphlet counts and the graph structural information are correlated. They always treat a graph as a new input and repeat the tedious counting procedure even if the graph is similar or exactly isomorphic to previously studied graphs. This provides an opportunity to speed up graphlet count estimation by exploiting this correlation via learning methods. In this paper, we raise a novel graphlet count learning (GCL) problem: given a set of historical graphs with known graphlet counts, how can one learn to estimate or predict graphlet counts for unseen graphs coming from the same (or a similar) underlying distribution? We develop a deep learning framework that contains two convolutional neural network models and a series of data preprocessing techniques to solve the GCL problem. Extensive experiments are conducted on three types of synthetic random graphs and three types of real-world graphs for all 3-, 4-, and 5-node graphlets to demonstrate the accuracy, efficiency, and generalizability of our framework. Compared with state-of-the-art exact and sampling methods, our framework shows great potential: it offers up to two orders of magnitude speedup on synthetic graphs and achieves on-par speed on real-world graphs with competitive accuracy.
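To make the cost of naive enumeration concrete, the following is a minimal sketch for the k = 3 case only. It is an illustration and not the method proposed in the paper: the function name `count_3node_graphlets`, the edge-list input format, and the labels "wedge" and "triangle" are assumptions chosen for this example. The loop visits every node triple, so the work grows on the order of N^3, which is the k = 3 instance of the O(N^k) bound mentioned in the abstract.

```python
from itertools import combinations


def count_3node_graphlets(n, edges):
    """Count connected induced 3-node subgraphs by checking every node triple.

    Hypothetical helper for illustration: nodes are 0..n-1 and edges is a
    list of undirected (u, v) pairs.
    """
    # Build an adjacency-set representation for O(1) edge lookups.
    adj = {v: set() for v in range(n)}
    for u, v in edges:
        adj[u].add(v)
        adj[v].add(u)

    counts = {"wedge": 0, "triangle": 0}
    # C(n, 3) triples are inspected, hence cubic growth in n.
    for a, b, c in combinations(range(n), 3):
        present = (b in adj[a]) + (c in adj[a]) + (c in adj[b])
        if present == 3:
            counts["triangle"] += 1   # all three edges present
        elif present == 2:
            counts["wedge"] += 1      # a path on three nodes
        # 0 or 1 edges: the triple is not connected, so it is not a graphlet
    return counts


# A 4-cycle 0-1-2-3-0 with the chord 0-2.
edges = [(0, 1), (1, 2), (2, 3), (3, 0), (0, 2)]
result = count_3node_graphlets(4, edges)
print(result)
```

Extending the same brute-force idea to 4- and 5-node graphlets would require looping over all 4- and 5-node subsets, which is the blow-up that motivates the combinatorial, sampling, and learning-based approaches discussed in the abstract.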

Keywords

Graphlet count estimation; Convolutional neural networks; Deep learning on graph; Network analysis

Type: Research Article
Information: Network Science, Volume 9, Special Issue S1: Complex Networks 2019, October 2021, pp. S23–S60

DOI: https://doi.org/10.1017/nws.2020.35
Copyright: © The Author(s), 2020. Published by Cambridge University Press

Footnotes

Special Issue Editor: Hocine Cherifi
