Average Jaccard index of random graphs

Qunqiang Feng; Shuai Guo; Zhishui Hu

doi:10.1017/jpr.2023.112

Average Jaccard index of random graphs

Part of: Limit theorems Graph theory Combinatorial probability

Published online by Cambridge University Press: 26 February 2024

Qunqiang Feng

Shuai Guo and

Zhishui Hu

Show author details

Qunqiang Feng*: Affiliation:
University of Science and Technology of China
Shuai Guo*: Affiliation:
University of Science and Technology of China
Zhishui Hu*: Affiliation:
University of Science and Technology of China
*: *Postal address: Department of Statistics and Finance, School of Management, University of Science and Technology of China, Hefei 230026, China.
*Postal address: Department of Statistics and Finance, School of Management, University of Science and Technology of China, Hefei 230026, China.
*Postal address: Department of Statistics and Finance, School of Management, University of Science and Technology of China, Hefei 230026, China.

Article contents

Abstract
References

Get access

Rights & Permissions

Abstract

The asymptotic behavior of the Jaccard index in G(n, p), the classical Erdös–Rényi random graph model, is studied as n goes to infinity. We first derive the asymptotic distribution of the Jaccard index of any pair of distinct vertices, as well as the first two moments of this index. Then the average of the Jaccard indices over all vertex pairs in G(n, p) is shown to be asymptotically normal under an additional mild condition that $np\to\infty$ and $n^2(1-p)\to\infty$.

Keywords

Erdös–Rényi Random graph Jaccard similarity asymptotic distribution inverse moment

MSC classification

Primary: 05C80: Random graphs 60C05: Combinatorial probability

Secondary: 60F05: Central limit and other weak theorems

Type: Original Article
Information: Journal of Applied Probability , Volume 61 , Issue 4 , December 2024 , pp. 1139 - 1152

DOI: https://doi.org/10.1017/jpr.2023.112 [Opens in a new window]
Copyright: © The Author(s), 2024. Published by Cambridge University Press on behalf of Applied Probability Trust

Access options

Get access to the full version of this content by using one of the access options below. (Log in options will check for institutional or personal access. Content may require purchase if you do not have access.)

Article purchase

Temporarily unavailable

References

Ali, M. et al. (2021). Machine learning – a novel approach of well logs similarity based on synchronization measures to predict shear sonic logs. J. Petroleum Sci, Eng. 203, 108602.CrossRef Google Scholar

Arias-Castro, E. and Verzelen, N. (2014). Community detection in dense random networks. Ann. Statist. 42, 940–969.CrossRef Google Scholar

Bag, S., Kumar, S. K. and Tiwari, M. K. (2019). An efficient recommendation generation using relevant Jaccard similarity. Inf. Sci. 483, 53–64.CrossRef Google Scholar

Berahmand, K., Bouyer, A. and Vasighi, M. (2018). Community detection in complex networks by detecting and expanding core nodes through extended local similarity of nodes. IEEE Trans. Comput. Soc. Syst. 5, 1021–1033.CrossRef Google Scholar

Bollobás, B. (2001). Random Graphs, 2nd edn. Cambridge University Press.CrossRef Google Scholar

Chung, N. C., Miasojedow, B., Startek, M. and Gambin, A. (2019). Jaccard/Tanimoto similarity test and estimation methods for biological presence–absence data. BMC Bioinform. 20, 1–11.CrossRef Google Scholar PubMed

da Fontoura Costa, L. (2021). Further generalizations of the Jaccard index. Preprint, arXiv:2110.09619.Google Scholar

Eelbode, T. et al. (2020). Optimization for medical image segmentation: Theory and practice when evaluating with dice score or Jaccard index. IEEE Trans. Med. Imag. 39, 3679–3690.CrossRef Google Scholar PubMed

Fan, X. et al. (2019). Similarity and heterogeneity of price dynamics across China’s regional carbon markets: A visibility graph network approach. Appl. Energy 235, 739–746.CrossRef Google Scholar

Feng, Q., Hu, Z. and Su, C. (2013). The Zagreb indices of random graphs. Prob. Eng. Inf. Sci. 27, 247–260.CrossRef Google Scholar

Gilbert, G. (1972). Distance between sets. Nature 239, 174.CrossRef Google Scholar

Hennig, C. (2007). Cluster-wise assessment of cluster stability. Comput. Statist. Data Anal. 52, 258–271.CrossRef Google Scholar

Jaccard, P. (1901). Distribution de la flore alpine dans le bassin des dranses et dans quelques régions voisines. Bulletin de la Société Vaudoise des Sciences Naturelles 37, 241–272.Google Scholar

Janson, S., Luczak, T. and Rucinski, A. (2000). Random Graphs. John Wiley, New York.CrossRef Google Scholar

Koeneman, S. H. and Cavanaugh, J. E. (2022). An improved asymptotic test for the Jaccard similarity index for binary data. Statist. Prob. Lett. 184, 109375.CrossRef Google Scholar

Kogge, P. M. (2016). Jaccard coefficients as a potential graph benchmark. In Proc. 2016 IEEE Int. Parallel and Distributed Processing Symposium Workshops (IPDPSW), pp. 921–928. IEEE, Piscataway, NJ.CrossRef Google Scholar

Kosub, S. (2019). A note on the triangle inequality for the Jaccard distance. Pattern Recognition Lett. 120, 36–38.CrossRef Google Scholar

Lu, H. and Uddin, S. (2023). Embedding-based link predictions to explore latent comorbidity of chronic diseases. Health Inf. Sci. Syst. 11, 2.CrossRef Google Scholar PubMed

Mammone, N. et al. (2018). Permutation Jaccard distance-based hierarchical clustering to estimate EEG network density modifications in MCI subjects. IEEE Trans. Neural Netw. Learn. Syst. 29, 5122–5135.CrossRef Google Scholar

Miasnikof, P., Shestopaloff, A. Y., Pitsoulis, L. and Ponomarenko, A. (2022). An empirical comparison of connectivity-based distances on a graph and their computational scalability. J. Complex Netw. 10, cnac003.CrossRef Google Scholar

Sathre, P., Gondhalekar, A. and Feng, W. C. (2022). Edge-connected Jaccard similarity for graph link prediction on FPGA. In Proc. 2022 IEEE High Performance Extreme Computing Conf. (HPEC), pp. 1–10. IEEE, Piscataway, NY.CrossRef Google Scholar

Shestopaloff, P. M., Alexander, Y., Bravo, C. and Lawryshyn, Y. (2023). Statistical network isomorphism. In Complex Networks and Their Applications XI, eds H. Cherifi, R. N. Mantegna, L. M. Rocha, C. Cherifi, and S. Micciche. Springer, New York, pp. 325–336.Google Scholar

Shi, X., Wu, Y. and Liu, Y. (2010). A note on asymptotic approximations of inverse moments of nonnegative random variables. Statist. Prob. Lett. 80, 1260–1264.CrossRef Google Scholar

Singh, M. D., Krishna, P. R. and Saxena, A. (2009). A privacy preserving Jaccard similarity function for mining encrypted data. In Proc. TENCON 2009–2009 IEEE Region 10 Conf., pp. 1–4.CrossRef Google Scholar

van der Hofstad, R. (2016). Random Graphs and Complex Networks. Cambridge University Press.CrossRef Google Scholar

Verzelen, N. and Arias-Castro, E. (2015). Community detection in sparse random networks. Ann. Appl. Prob. 25, 3465–3510.CrossRef Google Scholar

Wu, C. and Wang, B. (2017). Extracting topics based on word2vec and improved Jaccard similarity coefficient. In Proc. 2017 IEEE 2nd Int. Conf. Data Science in Cyberspace (DSC), pp. 389–397.CrossRef Google Scholar

Wuyungaowa, and Wang, T., (2008). Asymptotic expansions for inverse moments of binomial and negative binomial. Statist. Prob. Lett. 78, 3018–3022.CrossRef Google Scholar

Yin, Y. and Yasuda, K. (2006). Similarity coefficient methods applied to the cell formation problem: A taxonomy and review. Int. J. Prod. Econ. 101, 329–352.CrossRef Google Scholar

Zhang, P. et al. (2016). Measuring the robustness of link prediction algorithms under noisy environment. Sci. Rep. 6, 18881.CrossRef Google Scholar PubMed

Article contents

Average Jaccard index of random graphs

Abstract

Keywords

MSC classification

Access options

Article purchase

Temporarily unavailable

References

Save article to Kindle

Save article to Dropbox

Save article to Google Drive

Reply to: Submit a response

Your details

You have entered the maximum number of contributors

Conflicting interests