
The state of the art in semantic relatedness: a framework for comparison

Published online by Cambridge University Press:  27 March 2017

Yue Feng
Affiliation:
Laboratory for Systems, Software and Semantics (LS3), Ryerson University, Toronto, M5B 2K3 ON, Canada
Ebrahim Bagheri
Affiliation:
Laboratory for Systems, Software and Semantics (LS3), Ryerson University, Toronto, M5B 2K3 ON, Canada
Faezeh Ensan
Affiliation:
Department of Computer Engineering, Ferdowsi University of Mashhad, Azadi Square, Mashhad, Razavi Khorasan
Jelena Jovanovic
Affiliation:
Department of Software Engineering, School of Business Administration, University of Belgrade, Jove Ilica 154, 11000 Belgrade, Serbia

Abstract

Semantic relatedness (SR) is a form of measurement that quantitatively identifies the relationship between two words or concepts based on the similarity or closeness of their meaning. In recent years, there have been noteworthy efforts to compute SR between pairs of words or concepts by exploiting various knowledge resources, such as linguistically structured knowledge bases (e.g. WordNet) and collaboratively developed ones (e.g. Wikipedia), among others. The existing approaches rely on different methods for utilizing these knowledge resources, for instance, methods that depend on the path between two words or on a vector representation of the word descriptions. The purpose of this paper is to review and present the state of the art in SR research through a hierarchical framework. The dimensions of the proposed framework cover three main aspects of SR approaches: the resources they rely on, the computational methods applied to the resources for developing a relatedness metric, and the evaluation models that are used for measuring their effectiveness. We have selected 14 representative SR approaches to be analyzed using our framework. We compare and critically review each of them through the dimensions of our framework, thereby identifying the strengths and weaknesses of each approach. In addition, we provide guidelines for researchers and practitioners on how to select the most relevant SR method for their purpose. Finally, based on the comparative analysis of the reviewed relatedness measures, we identify existing challenges and potentially valuable future research directions in this domain.
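
The two method families named in the abstract (path-based and vector-based) can be illustrated with a minimal Python sketch. Everything in it is an assumption made for illustration: the toy taxonomy, the toy word descriptions, the function names, and the 1 / (1 + distance) scaling are hypothetical and are not taken from the paper or from any of the 14 approaches it reviews.

```python
import math
from collections import Counter, deque

# Hypothetical toy taxonomy (child -> parent); invented here, not WordNet data.
TOY_PARENT = {
    "car": "vehicle",
    "bicycle": "vehicle",
    "vehicle": "artifact",
    "fork": "artifact",
    "artifact": "entity",
}

# Hypothetical one-line word descriptions, invented for this sketch.
TOY_GLOSS = {
    "car": "a motor vehicle with four wheels",
    "bicycle": "a vehicle with two wheels moved by pedals",
    "fork": "a tool with prongs used for eating",
}


def path_length(a, b):
    """Return the number of edges on the shortest path between a and b."""
    # Treat the taxonomy as an undirected graph.
    neighbours = {}
    for child, parent in TOY_PARENT.items():
        neighbours.setdefault(child, set()).add(parent)
        neighbours.setdefault(parent, set()).add(child)
    # Breadth-first search from a until b is reached.
    queue = deque([(a, 0)])
    seen = {a}
    while queue:
        node, dist = queue.popleft()
        if node == b:
            return dist
        for nxt in neighbours.get(node, ()):
            if nxt not in seen:
                seen.add(nxt)
                queue.append((nxt, dist + 1))
    return None  # the two words are not connected


def path_relatedness(a, b):
    """Path-based score in (0, 1]: a shorter path gives a higher score."""
    dist = path_length(a, b)
    return 0.0 if dist is None else 1.0 / (1.0 + dist)


def gloss_relatedness(a, b):
    """Vector-based score: cosine of bag-of-words vectors of the descriptions."""
    va = Counter(TOY_GLOSS[a].split())
    vb = Counter(TOY_GLOSS[b].split())
    dot = sum(va[w] * vb[w] for w in va)
    norm_a = math.sqrt(sum(c * c for c in va.values()))
    norm_b = math.sqrt(sum(c * c for c in vb.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


if __name__ == "__main__":
    for pair in [("car", "bicycle"), ("car", "fork")]:
        print(pair, path_relatedness(*pair), gloss_relatedness(*pair))
```

On this toy data both scores rank "car" and "bicycle" as more related than "car" and "fork", but the two families draw on different evidence: the first uses only the structure of the resource, the second only the text attached to each word.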

Type
Survey Article
Copyright
© Cambridge University Press, 2017 

