The state of the art in semantic relatedness: a framework for comparison

Yue Feng; Ebrahim Bagheri; Faezeh Ensan; Jelena Jovanovic

doi:10.1017/S0269888917000029

Published online by Cambridge University Press: 27 March 2017

Yue Feng: Laboratory for Systems, Software and Semantics (LS3), Ryerson University, Toronto, M5B 2K3 ON, Canada
Ebrahim Bagheri: Laboratory for Systems, Software and Semantics (LS3), Ryerson University, Toronto, M5B 2K3 ON, Canada
Faezeh Ensan: Department of Computer Engineering, Ferdowsi University of Mashhad, Azadi Square, Mashhad, Razavi Khorasan
Jelena Jovanovic: Department of Software Engineering, School of Business Administration, University of Belgrade, Jove Ilica 154, 11000 Belgrade, Serbia

Abstract

Semantic relatedness (SR) is a form of measurement that quantitatively identifies the relationship between two words or concepts based on the similarity or closeness of their meaning. In recent years, there have been noteworthy efforts to compute SR between pairs of words or concepts by exploiting various knowledge resources, such as linguistically structured knowledge bases (e.g. WordNet) and collaboratively developed ones (e.g. Wikipedia). Existing approaches use these knowledge resources in different ways: some methods rely on the path between two words in the resource, while others rely on a vector representation of the words' descriptions. The purpose of this paper is to review and present the state of the art in SR research through a hierarchical framework. The dimensions of the proposed framework cover three main aspects of SR approaches: the resources they rely on, the computational methods applied to these resources to develop a relatedness metric, and the evaluation models used to measure their effectiveness. We have selected 14 representative SR approaches to be analyzed using our framework. We compare and critically review each of them along the dimensions of our framework, thereby identifying the strengths and weaknesses of each approach. In addition, we provide guidelines for researchers and practitioners on how to select the most relevant SR method for their purpose. Finally, based on the comparative analysis of the reviewed relatedness measures, we identify existing challenges and potentially valuable future research directions in this domain.
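To make the three framework dimensions (resource, computational method, evaluation model) concrete, here is a minimal Python sketch. It is not one of the 14 surveyed approaches. It assumes a hypothetical toy is-a taxonomy standing in for WordNet, hypothetical one-line descriptions standing in for glosses or Wikipedia article text, and hypothetical human judgments. The scoring rule 1/(1 + path length), bag-of-words cosine similarity and Spearman rank correlation are common illustrative choices, not definitions taken from the paper; all names (`TAXONOMY`, `DESCRIPTIONS`, `path_relatedness`, `vector_relatedness`, `spearman`) are hypothetical.

```python
import math
from collections import deque

# Hypothetical toy is-a taxonomy standing in for a resource such as WordNet.
TAXONOMY = {
    "entity": ["animal", "artifact"],
    "animal": ["dog", "cat"],
    "artifact": ["car", "bicycle"],
}

# Hypothetical one-line descriptions standing in for glosses or article text.
DESCRIPTIONS = {
    "dog": "domesticated animal kept as a pet that barks",
    "cat": "small domesticated animal kept as a pet",
    "car": "motor vehicle with four wheels",
    "bicycle": "vehicle with two wheels powered by pedals",
}

def build_graph(taxonomy):
    """Turn parent -> children lists into an undirected adjacency map."""
    graph = {}
    for parent, children in taxonomy.items():
        for child in children:
            graph.setdefault(parent, set()).add(child)
            graph.setdefault(child, set()).add(parent)
    return graph

def path_length(graph, a, b):
    """Breadth-first search: edges on the shortest a-b path, or None."""
    if a == b:
        return 0
    seen, queue = {a}, deque([(a, 0)])
    while queue:
        node, dist = queue.popleft()
        for nxt in graph.get(node, ()):
            if nxt == b:
                return dist + 1
            if nxt not in seen:
                seen.add(nxt)
                queue.append((nxt, dist + 1))
    return None

def path_relatedness(graph, a, b):
    """Path-based score: 1 / (1 + shortest path length); 0 if unconnected."""
    length = path_length(graph, a, b)
    return 0.0 if length is None else 1.0 / (1.0 + length)

def vector_relatedness(text_a, text_b):
    """Cosine similarity of bag-of-words count vectors of two descriptions."""
    u, v = {}, {}
    for vec, text in ((u, text_a), (v, text_b)):
        for token in text.lower().split():
            vec[token] = vec.get(token, 0) + 1
    dot = sum(count * v.get(token, 0) for token, count in u.items())
    norm_u = math.sqrt(sum(c * c for c in u.values()))
    norm_v = math.sqrt(sum(c * c for c in v.values()))
    return dot / (norm_u * norm_v) if norm_u and norm_v else 0.0

def average_ranks(values):
    """1-based ranks; tied values share the mean of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        for k in range(i, j + 1):
            ranks[order[k]] = (i + j) / 2 + 1
        i = j + 1
    return ranks

def spearman(xs, ys):
    """Spearman's rho computed as the Pearson correlation of average ranks."""
    rx, ry = average_ranks(xs), average_ranks(ys)
    n = len(rx)
    mx, my = sum(rx) / n, sum(ry) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(rx, ry))
    sx = math.sqrt(sum((a - mx) ** 2 for a in rx))
    sy = math.sqrt(sum((b - my) ** 2 for b in ry))
    return cov / (sx * sy) if sx and sy else 0.0

if __name__ == "__main__":
    graph = build_graph(TAXONOMY)
    pairs = [("dog", "cat"), ("car", "bicycle"), ("dog", "car")]
    human = [8.0, 7.0, 2.0]  # hypothetical gold-standard judgments
    by_path = [path_relatedness(graph, a, b) for a, b in pairs]
    by_vector = [vector_relatedness(DESCRIPTIONS[a], DESCRIPTIONS[b])
                 for a, b in pairs]
    print("path-based:", [round(s, 3) for s in by_path],
          "rho =", round(spearman(by_path, human), 3))
    print("vector-based:", [round(s, 3) for s in by_vector],
          "rho =", round(spearman(by_vector, human), 3))
```

On these toy pairs the path score gives dog/cat and car/bicycle the same value, because both pairs are two edges apart. That tie lowers its rank correlation with the hypothetical judgments. Ties of this kind are a known limitation of plain path counting.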

Type: Survey Article
Information: The Knowledge Engineering Review, Volume 32, 2017, e10

DOI: https://doi.org/10.1017/S0269888917000029
Copyright: © Cambridge University Press, 2017
