
Recent advances in methods of lexical semantic relatedness – a survey

Published online by Cambridge University Press: 04 May 2012

ZIQI ZHANG, ANNA LISA GENTILE and FABIO CIRAVEGNA
Affiliation:
Department of Computer Science, University of Sheffield, 211 Portobello, Regent Court, Sheffield, UK, S1 4DP

Abstract

Measuring lexical semantic relatedness is an important task in Natural Language Processing (NLP) and is often a prerequisite for more complex NLP tasks. Despite the extensive work dedicated to this area of research, no up-to-date survey of the field exists. This paper addresses that gap with a study focused on four perspectives: (i) a comparative analysis of the background information resources that are essential for measuring lexical semantic relatedness; (ii) a review of the literature, focusing on recent methods not covered by previous surveys; (iii) a discussion of studies in the biomedical domain, where novel methods have been introduced but inadequately communicated across domain boundaries; and (iv) an evaluation of lexical semantic relatedness methods, with a discussion of useful lessons for their development and application. In addition, we discuss a number of issues in this field and suggest future research directions. We believe this work will be a valuable reference for researchers of lexical semantic relatedness and will substantially support research activities in this field.

Type: Articles
Copyright © Cambridge University Press 2012

References

Agirre, E., Alfonseca, E., Hall, K., Kravalova, J., Paşca, M., and Soroa, A. 2009. A study on similarity and relatedness using distributional and WordNet-based approaches. In Proceedings of Human Language Technologies: The 2009 Annual Conference of the North American Chapter of the Association for Computational Linguistics (NAACL'09), pp. 1927. Stroudsburg, PA, USA: Association for Computational Linguistics.Google Scholar
Al-Mubaid, H. and Nguyen, H. 2006. A cluster-based approach for semantic similarity in the biomedical domain. In Proceedings of the 28th International Conference of IEEE Engineering in Medicine and Biology Society, New York, USA, August 30–September 3, pp. 2713–7.Google Scholar
Altschul, S., Madden, T., Schäffer, A., Zhang, J., Zhang, Z., Miller, W., and Lipman, D. 1997. Gapped BLAST and PSI-BLAST: a new generation of protein database search programs. Nucleic Acids Research 25 (17): 3389–402.CrossRefGoogle ScholarPubMed
Alvarez, M. and Liam, S. 2007. A graph modeling of semantic similarity between words. In Proceedings of the International Conference on Semantic Computing (ICSC'07), pp. 355–62. Washington, DC, USA: IEEE Computer Society.Google Scholar
Banerjee, S. and Pedersen, T. 2003. Extended gloss overlaps as a measure of semantic relatedness. In Proceedings of the 18th International Joint Conference on Artificial Intelligence, pp. 805–10. San Francisco, CA, USA: Morgan Kaufmann.Google Scholar
Bär, D., Zesch, T., and Gurevych, I. 2011. A reflective view on text similarity. In Proceedings of the International Conference on Recent Advances in Natural Language Processing 2011 (RANLP 2011), Hissar, Bulgaria, pp. 515–20.Google Scholar
Batet, M., Sánchez, D. and Valls, A. 2011. An ontology-based measure to compute semantic similarity in biomedicine. Journal of Biomedical Informatics 44 (1), 118–25.CrossRefGoogle ScholarPubMed
Bhattacharya, A., Bhowmick, A. and Singh, A. 2010. Finding top-k similar pairs of objects annotated with terms from an ontology. In Proceedings of the 22nd International Conference on Scientific and Statistical Database Management (SSDBM'10), pp. 214–32. Berlin, Germany: Springer-Verlag.Google Scholar
Bizer, C., Lehmann, J., Kobilarov, G., Auer, S., Becker, C., Cyganiak, R., and Hellmann, S. 2009. DBpedia – a crystallization point for the web of data. Journal of Web Semantics 7 (3), 154–65.CrossRefGoogle Scholar
Bollegala, D., Matsuo, Y. and Ishizuka, M. 2007. An integrated approach to measuring semantic similarity between words using information available on the web. In Proceedings of the Annual Conference of the North American Chapter of the Association for Computational Linguistics, pp. 340–7. Stroudsburg, PA, USA: Association for Computational Linguistics.Google Scholar
Boutet, E., Lieberherr, D., Tognolli, M., Schneider, M., and Bairoch, A. 2007. UniProtKB/Swiss-Prot. Methods in Molecular Biology 406, 89112.Google ScholarPubMed
Budanitsky, A. and Hirst, G. 2006. Evaluating WordNet-based measures of lexical semantic relatedness. Journal of Computational Linguistics 32 (1), 1347.CrossRefGoogle Scholar
Camon, E., Magrane, M., Barrell, D., Lee, V., Dimmer, E., Maslen, J., Binns, D., Harte, N., Lopez, R., and Apweiler, R. 2004. The gene ontology annotation (GOA) database: sharing knowledge in Uniprot with gene ontology. Nucleic Acids Research 32(Database), D262–6.CrossRefGoogle ScholarPubMed
Chen, H., Lin, M. and Wei, Y. 2006. Novel association measures using web search with double checking. In Proceedings of the 21st International Conference on Computational Linguistics and 44th Annual Meeting of the Association for Computational Linguistics, pp. 1009–16. Stroudsburg, PA, USA: Association for Computational Linguistics.Google Scholar
Cherry, J., Adler, C., Ball, C., Chervitz, S., Dwight, S., Hester, E., Jia, Y., Juvik, G., Roe, T., Schroeder, M., Weng, S., and Botstein, D. 1998. SGD: saccharomyces genome database. Nucleic Acids Research 26 (1), 73–9.CrossRefGoogle ScholarPubMed
Chinchor, N. 2001. Message Understanding Conference (MUC) 7. LDC2001T02, Philadelphia, Penn: Linguistic Data Consortium.Google Scholar
Chinchor, N. and Sundheim, B. 2003. Message Understanding Conference (MUC) 6. LDC Catalog No.: LDC2003T13. Philadelphia, PA: Linguistic Data Consortium.Google Scholar
Cilibrasi, R. and Vitanyi, P. 2007. The google similarity distance. IEEE Transactions on Knowledge and Data Engineering 19 (3), 370–83.CrossRefGoogle Scholar
Collins, A. and Loftus, E. 1975. A spreading-activation theory of semantic processing. Psychological Review 82 (6), 407–28.CrossRefGoogle Scholar
Couto, F., Silva, M. and Coutinho, P. 2005. Semantic similarity over the Gene Ontology: family correlation and selecting disjunctive ancestors. In Proceedings of the 14th ACM International Conference on Information and Knowledge Management (CIKM'05), pp. 343–4. New York, NY, USA: ACM.CrossRefGoogle Scholar
Cramer, I. and Finthammer, M. 2008. An evaluation procedure for WordNet-based lexical chaining: methods and issues. In Proceedings of the 4th Global WordNet Meeting, pp. 120–46. Szeged, Hungary: University of Szeged.Google Scholar
Cucerzan, S. 2007. Large-scale named entity disambiguation based on Wikipedia data. In Proceedings of the 2007 Joint Conference on Empirical Methods in Natural Language Processing and Computational Natural Language Learning (EMNLP-CoNLL), pp. 708–16. Stroudsburg, PA, USA: Association for Computational Linguistics.Google Scholar
Curran, J. and Moens, M. 2002. Improvements in automatic thesaurus extraction. In Proceedings of the ACL 2002 Workshop on Unsupervised Lexical Acquisition (ULA'02), pp. 5966. Stroudsburg, PA, USA: Association for Computational Linguistics.CrossRefGoogle Scholar
Degtyarenko, K., Matos, P., Ennis, M., Hastings, J., Zbinden, M., McNaught, A., Alcntara, R., Darsow, M., Guedj, M., and Ashburner, M. 2007. ChEBI: a database and ontology for chemical entities of biological interest. Nucleic Acids Research 36(Database), D344–50.CrossRefGoogle ScholarPubMed
Dolan, B., Quirk, C. and Brockett, C. 2004. Unsupervised construction of large paraphrase corpora: exploiting massively parallel news sources. In Proceedings of the 20th International Conference on Computational Linguistics (COLING'04), pp. 350–6. Stroudsburg, PA, USA: Association for Computational Linguistics.Google Scholar
Egozi, O., Markovitch, S. and Gabrilovich, E. 2011. Concept-based information retrieval using explicit semantic analysis. ACM Transactions of Information Systems 29 (2), 8:1–8: 34.CrossRefGoogle Scholar
Fellbaum, C. 1998. WordNet: An Electronic Lexical Database. Cambridge, MA, USA: MIT Press.CrossRefGoogle Scholar
Finkelstein, F., Gabrilovich, E., Matias, Y., Rivlin, E., Solan, Z., Wolfman, G., and Ruppin, E. 2002. Placing search in context: the concept revisited. ACM Transactions of Information Systems 20 (1), 116–31.Google Scholar
Firth, J. R. 1957. A synopsis of linguistic theory, 1930–1955. In Studies in Linguistic Analysis (special volume of the Philological Society), pp. 132. Harlow, UK: Longman.Google Scholar
Gabrilovich, E. 2007. Wikipedia preprocessor (WikiPrep). http://www.cs.technion.ac.il/~gabr/resources/code/wikiprep/#references. Accessed March 16, 2012).Google Scholar
Gabrilovich, E. and Markovitch, S. 2007. Computing semantic relatedness using Wikipedia-based explicit semantic analysis. In Proceedings of the 20th International Joint Conference on Artifical Intelligence (IJCAI'07), pp. 1606–11. San Francisco, CA, USA: Morgan Kaufmann.Google Scholar
Gangemi, A. and Presutti, V. 2010. Towards a pattern science for the semantic web. Emantic Web Journal 1 (1–2), 61–8.Google Scholar
Gentleman, R. (2005). Visualizing and distances using GO. http://bioconductor.org/packages/2.0/bioc/vignettes/GOstats/inst/doc/GOvis.pdf. Accessed March 16, 2012.Google Scholar
Gouws, S., van Rooyen, G-J, and Engelbrecht, H. A. 2010. Measuring conceptual similarity by spreading activation over Wikipedia's hyperlink structure. In Proceedings of the COLING 2010, 2nd Workshop on the People's Web Meets NLP: Collaboratively Constructed Semantic Resources, Beijing, China, pp. 4654.Google Scholar
Gracia, J. and Mena, E. 2008. Web-based measure of semantic relatedness. In Proceedings of the 9th International Conference on Web Information Systems Engineering (WISE'08), pp. 136150. Berlin, Germany: Springer-Verlag.Google Scholar
Gurevych, I. 2005. Using the structure of a conceptual network in computing semantic relatedness. In Proceedings of the 2nd International Joint Conference on Natural Language Processing, pp. 767–78. Berlin, Germany: Springer-Verlag.Google Scholar
Gurevych, I. and Niederlich, H. 2005. Computing semantic relatedness in German with revised information content metrics. In Proceedings of ÖntoLex 2005 – Ontologies and Lexical Resources (IJCNLP'05) Workshop, pp. 2833. Berlin, Germany: Springer-Verlag.Google Scholar
Halavais, A. and Lackaff, D. 2008. An analysis of topical coverage of Wikipedia. Journal of Computer-Mediated Communication 13 (2), 429–40.CrossRefGoogle Scholar
Han, X. and Zhao, J. 2010. Structural semantic relatedness: a knowledge-based method to named entity disambiguation. In Proceedings of the 48th Annual Meeting of the Association for Computational Linguistics, pp. 50–9. Stroudsburg, PA, USA: Association for Computational Linguistics.Google Scholar
Harman, D. and Liberman, M. 1993. TIPSTER vol. 1. Philadelphia, PA, USA: Linguistic Data Consortium.Google Scholar
Harrington, B. 2010. A semantic network approach to measuring relatedness. In Proceedings of the 23rd International Conference on Computational Linguistics, pp. 356–64. Stroudsburg, PA, USA: Association for Computational Linguistics.Google Scholar
Hassan, S. and Mihalcea, R. 2009. Cross-lingual semantic relatedness using encyclopedic knowledge. In Proceedings of the 2009 Conference on Empirical Methods in Natural Language Processing, pp. 1192–201. Stroudsburg, PA, USA: Association for Computational Linguistics.Google Scholar
Haveliwala, T. 2002. Topic-sensitive PageRank. In Proceedings of the 11th International Conference on World Wide Web (WWW'02), pp. 517–26. New York, NY, USA: ACM.CrossRefGoogle Scholar
Hirst, G. and St-Onge, D. 1998. Lexical chains as representation of context for the detection and correction malapropisms. In FellBaum, C. (ed.), WordNet: An Electronic Lexical Database (Language, Speech, and Communication), pp. 305–32. Cambridge, MA, USA: MIT Press.Google Scholar
Holloway, T., Bozicevic, M. and Börner, K. 2007. Analyzing and visualizing the semantic coverage of Wikipedia and its authors. Journal of Complexity, Special issue on Understanding Complex Systems 12 (3), 3040.Google Scholar
Hope, D. 2008. Java WordNet::Similarity (beta). http://www.sussex.ac.uk/Users/drh21/. Accessed March 16, 2012.Google Scholar
Hughes, T. and Ramage, D. 2007. Lexical semantic relatedness with random graph walks. In Proceedings of the 2007 Joint Conference on Empirical Methods in Natural Language Processing and Computational Natural Language Learning (EMNLP-CoNLL), pp. 581–9. Stroudsburg, PA, USA: Association for Computational Linguistics.Google Scholar
Hunter, S., Apweiler, R., Attwood, K., Bairoch, A., Bateman, A., Binns, D., Bork, P., and Das, U. 2009. InterPro: the integrative protein signature database. Nucleic Acids Research 37(Database), D211–5.CrossRefGoogle ScholarPubMed
Jarmasz, M. and Szpakowicz, S. 2003. Roget's thesaurus and semantic similarity. In Proceedings of Conference on Recent Advances in Natural Language Processing (RANLP 2003), Borovets, Bulgaria, September 10–12, pp. 212–9.Google Scholar
Jiang, J. and Conrath, D. 1997. Semantic similarity based on corpus statistics and lexical taxonomy. Proceedings of the International Conference on Research on Computational Linguistics, Taiwan, pp. 1933.Google Scholar
Jones, K. 1973. Index term weighting. Information Storage and Retrieval 9 (11), 619–33.CrossRefGoogle Scholar
Kanehisa, M. and Goto, S. 2006. KEGG: Kyoto encyclopedia of genes and genomes. Artificial Intelligence 28 (1), 2730.Google Scholar
Kilgarriff, A. 2007. Googleology is bad science. Journal of Computational Linguistics 33 (1), 147–51.CrossRefGoogle Scholar
Kliegr, T., Chandramouli, K., Nemrava, J., Svatek, V., and Izquierdo, E. 2008. Combining image captions and visual analysis for image concept classification. In Proceedings of the 9th International Workshop on Multimedia Data Mining Held in Conjunction with the ACM SIGKDD 2008 (MDM'08), pp. 817. New York, NY, USA: ACM.CrossRefGoogle Scholar
Kohler, S., Schulz, M., Krawitz, P., Bauer, S., Dolken, S., Ott, C., Mundlos, C., Horn, C., Horn, D., Mundlos, S., and Robinson, P. 2009. Clinical diagnostics in human genetics with semantic similarity searches in ontologies. American Journal of Human Genetics 85 (4), 457–64.CrossRefGoogle ScholarPubMed
Kozima, H. and Furugori, T. 1993. Similarity between words computed by spreading activation on an English dictionary. In Proceedings of the 6th Conference on European Chapter of the Association for Computational Linguistics (EACL '93), pp. 232–9. Stroudsburg, PA, USA: Association for Computational Linguistics.Google Scholar
Kucera, H. and Francis, W. 1967. Computational Analysis of Present-Day American English. Providence, RI, USA: Brown University Press.Google Scholar
Kunze, C. and Lemnitzer, L. 2002. GermaNet – representation, visualization, application. In Proceedings of the International Conference on Language Resources and Evaluation (LREC'02), Las Palmas, Spain, pp. 1485–91. Paris, France: ELRA.Google Scholar
Leacock, C. and Chodorow, M. 1998. Combining local context and WordNet similarity for word sense identification. In FellBaum, C. (ed.), WordNet: An Electronic Lexical Database, pp. 305–32. Cambridge, MA, USA: MIT Press.Google Scholar
Lee, L. 1999. Measures of distributional similarity. In Proceedings of the 37th Annual Meeting of the Association for Computational Linguistics on Computational Linguistics (ACL'99), pp. 2532. Stroudsburg, PA, USA: Association for Computational Linguistics.CrossRefGoogle Scholar
Lee, J., Kim, M. and Lee, Y. 1993. Information retrieval based on conceptual distance in IS-A hierarchies. Journal of Documentation 49 (2), 188207.CrossRefGoogle Scholar
Lee, H., Peirsman, Y., Chang, A., Chambers, N., Surdeanu, M., and Jurafsky, D. 2011. Stanford's multi-pass sieve coreference resolution system at the CoNLL-2011 shared task. In Proceedings of the 15th Conference on Computational Natural Language Learning: Shared Task (CONLL Shared Task '11), pp. 2834. Stroudsburg, PA, USA: Association for Computational Linguistics.Google Scholar
Lee, M., Pincombe, B. and Welsh, M. 2005. An empirical evaluation of models of text document similarity. In Proceedings of the 27th Annual Conference of the Cognitive Science Society, pp. 1254–9. Chicago, USA: Lawrence Erlbaum.Google Scholar
Lei, Z. and Dai, Y. 2006. Assessing protein similarity with Gene Ontology and its use in subnuclear localization prediction. BMC Bioinformatics 7, 491.CrossRefGoogle ScholarPubMed
Lesk, M. 1986. Automatic sense disambiguation using machine readable dictionaries: how to tell a pine cone from an ice cream cone. In Proceedings of the 5th Annual International Conference on Systems Documentation (SIGDOC '86), pp. 24–6. New York, NY, USA: ACM.Google Scholar
Li, Y., Bandar, Z. and McLean, D. 2003. An approach for measuring semantic similarity between words using multiple information sources. IEEE Transactions on Knowledge and Data Engineering 15 (4), 871–82.Google Scholar
Li, J., Gong, B., Chen, X., Liu, T., Wu, C., Zhang, F., Li, C., Li, X., Rao, S., and Li, X. 2011. DOSim: an R package for similarity between diseases based on disease ontology. BMC Bioinformatics 12, 266.CrossRefGoogle Scholar
Li, L., Hu, X., Hu, B., Wang, J., and Zhou, Y. 2009. Measuring sentence similarity from different aspects. In Proceedings of the 8th International Conference on Machine Learning and Cybernetics (ICMLC 2009), Baoding, China, pp. 2244–9.Google Scholar
Li, Y., McLean, D., Bandar, Z., O'Shea, J., and Crockett, K. 2006. Sentence similarity based on semantic nets and corpus statistics. IEEE Transactions on Knowledge and Data Engineering 18 (8), 1138–50.CrossRefGoogle Scholar
Li, B., Wang, J., Feltus, F., Zhou, J., and Luo, F. 2010. Effectively integrating information content and structural relationship to improve the GO-based similarity measure between proteins. In Proceedings of the 11th International Conference on Bioinformatics and Computational Biology, pp. 166–72. Las Vegas, NV, USA: CSREA Press.Google Scholar
Lin, D. 1998a. Automatic retrieval and clustering of similar words. In Proceedings of the 17th International Conference on Computational Linguistics (COLING '98), pp. 768–74. Stroudsburg, PA, USA: Association for Computational Linguistics.Google Scholar
Lin, D. 1998b. An information-theoretic definition of similarity. In Proceedings of the 5th International Conference on Machine Learning, (ICML '98), pp. 296304. San Francisco, CA, USA: Morgan Kaufmann.Google Scholar
Liu, H. and Chen, Y. 2010. Computing semantic relatedness between named entities using Wikipedia. In Proceedings of the 2010 International Conference on Artificial Intelligence and Computational Intelligence (AICI '10), pp. 388–92. Washington, DC, USA: IEEE Computer Society.Google Scholar
Liu, X., Zhou, Y. and Zheng, R. 2007. Measuring semantic similarity in Wordnet. In Proceedings of the 6th International Conference on Machine Learning and Cybernetics, pp. 3431–5. New York, NY, USA: IEEE.Google Scholar
Lord, P., Stevens, R., Brass, A. and Goble, C. 2003a. Investigating semantic similarity measures across the Gene Ontology: the relationship between sequence and annotation. Bioinformatics 19 (10), 1275–83.CrossRefGoogle ScholarPubMed
Lord, P., Stevens, R., Brass, A. and Goble, C. 2003b. Semantic similarity measures as tools for exploring the Gene Ontology. In Proceedings of Pacific Symposium on Biocomputing, Lihue, HI, USA, January 3–7, pp. 601–12.Google Scholar
Maguitman, A., Menczer, F., Roinestad, H. and Vespignani, A. 2005. Algorithmic detection of semantic similarity. In Proceedings of the 14th International Conference on World Wide Web (WWW '05), pp. 107116. New York, NY, USA: ACM.CrossRefGoogle Scholar
Marcus, M., Marcinkiewicz, M. and Santorini, B. 1993. Building a large annotated corpus of English: the Penn treebank. Journal of Computational Linguistics 19 (2), 313–30.Google Scholar
Matsuo, Y., Sakaki, T., Uchiyama, K. and Ishizuka, M. 2006. Graph-based word clustering using a web search engine. In Proceedings of the 2006 Conference on Empirical Methods in Natural Language Processing (EMNLP '06), pp. 542–50. Stroudsburg, PA, USA: Association for Computational Linguistics.Google Scholar
McInnes, B., Pedersen, T. and Pakhomov, S. 2009. UMLS-interface and UMLS-similarity: open source software for measuring paths and semantic similarity. In Proceedings of AMIA Annual Symposium, San Francisco, CA, USA, November 4–18, pp. 431–5.Google Scholar
McKusick, V. 1998. Mendelian Inheritance in Man: A Catalog of Human Genes and Genetic Disorders, 12th ed.Baltimore, MD: The Johns Hopkins University Press.CrossRefGoogle Scholar
McQuilton, P., St.Pierre, S., Thurmond, J., and the FlyBase Consortium. 2011. FlyBase 101 – the basics of navigating flyBase. Nucleic Acids Research 39, 19.Google Scholar
Meyer, C. and Gurevych, I. 2010. How web communities analyze human language: word senses in Wiktionary. In Proceedings of the 2nd Web Science Conference, Raleigh, NC, April 26–27.Google Scholar
Mihalcea, R., Corley, C. and Strapparava, C. 2006. Corpus-based and knowledge-based measures of text semantic similarity. In Proceedings of the 21st National Conference on Artificial Intelligence (AAAI'06), pp. 775–80. Palo Alto, CA,USA: AAAI Press.Google Scholar
Mihalcea, R. and Moldovan, D. 1999. A method for word sense disambiguation of unrestricted text. In Proceedings of the 37th Annual Meeting of the Association for Computational Linguistics on Computational Linguistics, (ACL '99), pp. 152–8. Stroudsburg, PA, USA: Association for Computational Linguistics.CrossRefGoogle Scholar
Miller, G. and Charles, W. 1991. Contextual correlates of semantic similarity. Language and Cognitive Processes 6 (1), 128.CrossRefGoogle Scholar
Milne, D., Medelyan, O. and Witten, I. 2006. Mining domain-specific thesauri from Wikipedia: a case study. In Proceedings of the 2006 IEEE/WIC/ACM International Conference on Web Intelligence, (WI'06), pp. 442–8. Washington, DC, USA: IEEE Computer Society.Google Scholar
Milne, D. and Witten, I. 2008. An effective, low-cost measure of semantic relatedness obtained from Wikipedia links. In Proceedings of the AAAI 2008 Workshop on Wikipedia and Artificial Intelligence, pp. 2530. Palo Alto, CA, USA: AAAI Press.Google Scholar
Mitchell, A., Strassel, S., Przybocki, M., Davis, J., Doddington, D., Grishman, R., Meyers, A., Brunstain, A., Ferro, L., and Sundheim, B. 2003. TIDES Extraction (ACE) 2003 Multilingual Training Data. LDC Catalog Number: LDC2004T09, pp. 2530. Philadelphia, PA: Linguistic Data Consortium.Google Scholar
Mohler, M. and Mihalcea, R. 2009. Text-to-text semantic similarity for automatic short answer grading. In Proceedings of the 12th Conference of the European Chapter of the Association for Computational Linguistics (EACL '09), pp. 567–75. Stroudsburg, PA, USA: Association for Computational Linguistics.Google Scholar
Morris, J. and Hirst, G. 1991. Lexical cohesion computed by thesaural relations as an indicator of the structure of text. Journal of Computational Linguistics, 17 (1), 2148.Google Scholar
Morris, J. and Hirst, G. 2004. Non-classical lexical semantic relations. In Proceedings of the HLT-NAACL Workshop on Computational Lexical Semantics (CLS '04), pp. 4651. Stroudsburg, PA, USA: Association for Computational Linguistics.CrossRefGoogle Scholar
Navarro, E., Sajous, F., Gaume, B., Prévot, L., ShuKai, H., Tzu-Yi, K., Magistry, P., and Chu-Ren, H. 2009. Wiktionary and NLP: improving synonymy networks. In Proceedings of the 2009 Workshop on the People's Web Meets NLP: Collaboratively Constructed Semantic Resources (People's Web '09), pp. 1927. Stroudsburg, PA, USA: Association for Computational Linguistics.CrossRefGoogle Scholar
Navigli, R. 2006. Meaningful clustering of senses helps boost word sense disambiguation performance. In Proceedings of the 21st International Conference on Computational Linguistics and the 44th Annual Meeting of the Association for Computational Linguistics (ACL-44), pp. 105–12. Stroudsburg, PA, USA: Association for Computational Linguistics.Google Scholar
Navigli, R. 2009. Word sense disambiguation: a survey. ACM Computing Survey 41 (2), 10:1–10:69.CrossRefGoogle Scholar
Othman, R., Deris, S. and Illias, R. 2007. A genetic similarity algorithm for searching the Gene Ontology terms and annotating anonymous protein sequences. Journal of Biomedical Informatics 41 (1), 529–38.Google ScholarPubMed
Pakhomov, S., Coden, A. and Chute, C. 2004. Creating a test corpus of clinical notes manually tagged for part-of-speech information. In Proceedings of the International Joint Workshop on Natural Language Processing in Biomedicine and its Applications (JNLPBA '04), pp. 62–5. Geneva, Switzerland: Association for Computational Linguistics.Google Scholar
Pakhomov, S., Mcinnes, B., Adam, T., Liu, Y., Pedersen, T., and Melton, G. 2010. Semantic similarity and relatedness between clinical terms: an experimental study. Proceedings of AMIA 2010 Symposium, 572–6. Washington, DC, USA: American Medical Informatics.Google Scholar
Pantel, P., Crestan, E., Borkovsky, A., Popescu, A., and Vyas, V. 2009. Web-scale distributional similarity and entity set expansion. In Proceedings of the 2009 Conference on Empirical Methods in Natural Language Processing (EMNLP '09), pp. 938–47. Berlin, Germany: Association for Computational Linguistics.Google Scholar
Patwardhan, S. and Pedersen, T. 2006. Using WordNet-based context vectors to estimate the semantic relatedness of concepts. Proceedings of the EACL 2006 Workshop Making Sense of Sense – Bringing Computational Linguistics and Psycholinguistics Together, pp. 18. Stroudsburg, PA, USA: Association for Computational Linguistics.Google Scholar
Pedersen, T., Pakhomov, S., Patwardhan, S. and Chute, C. 2007. Measures of semantic similarity and relatedness in the biomedical domain. Journal of Biomedical Informatics 40 (3), 288–99.CrossRefGoogle ScholarPubMed
Pedersen, T., Patwardhan, S. and Michelizzi, J. 2004. WordNet::Similarity: measuring the relatedness of concepts. In Demonstration Papers at HLT-NAACL 2004 (HLT-NAACL–Demonstrations '04), pp. 3841. Stroudsburg, PA, USA: Association for Computational Linguistics.CrossRefGoogle Scholar
Pekar, V. and Staab, S. 2002. Taxonomy learning: factoring the structure of a taxonomy into a semantic classification decision. In Proceedings of the 19th International Conference on Computational Linguistics – vol. 1, (COLING'02), pp. 17. Stroudsburg, PA, USA: Association for Computational Linguistics.Google Scholar
Pesquita, C., Faria, D., Falcão, A., Lord, P., and Couto, F. 2009. Semantic similarity in biomedical ontologies. PLoS Computational Biology 5 (7): e1000443. 112.CrossRefGoogle ScholarPubMed
Petrakis, E., Varelas, G., Hliaoutakis, A. and Raftopoulou, P. 2006. Design and evaluation of semantic similarity measures for concepts stemming from the same or different ontologies. In Proceedings of the 4th Workshop on Multimedia Semantics (WMS'06), Chania, Crete, June 19–21, pp. 4452.Google Scholar
Pirrò, G. 2009. A semantic similarity metric combining features and intrinsic information content. Data Knowledge Engineering 68 (11), 1289–308.CrossRefGoogle Scholar
Pirrò, G., and Seco, N. 2008. Design, implementation and evaluation of a new semantic similarity metric combining features and intrinsic information content. In Proceedings of the OTM 2008 Confederated International Conferences, CoopIS, DOA, GADA, IS, and ODBASE 2008. Part II: On the Move to Meaningful Internet Systems (OTM '08), pp. 1271–88. Berlin, Germany: Springer-Verlag.Google Scholar
Ponzetto, S. and Strube, M. 2006. Exploiting semantic role labeling, WordNet and Wikipedia for coreference resolution. In Proceedings of the Main Conference on Human Language Technology Conference of the North American Chapter of the Association of Computational Linguistics (HLT-NAACL '06), pp. 192–9. Stroudsburg, PA, USA: Association for Computational Linguistics.Google Scholar
Ponzetto, S. and Strube, M. 2007. An API for measuring the relatedness of words in Wikipedia. In Proceedings of the 45th Annual Meeting of the ACL on Interactive Poster and Demonstration Sessions (ACL '07), pp. 4952. Stroudsburg, PA, USA: Association for Computational Linguistics.CrossRefGoogle Scholar
Ponzetto, S. and Strube, M. 2011. Taxonomy induction based on a collaboratively built knowledge repository. Journal of Artificial Intelligence 175 (9–10), 17371756.CrossRefGoogle Scholar
Pozo, A., Pazos, F. and Valencia, A. 2008. Defining functional distances over Gene Ontology. BMC Bioinformatics 9, 50.CrossRefGoogle ScholarPubMed
Rada, R., Mili, H., Bicknell, E. and Blettner, M. 1989. Development and application of a metric on semantic nets. IEEE Transactions on Systems Management and Cybernetics 19 (1), 1730.CrossRefGoogle Scholar
Radinsky, K., Agichtein, E., Gabrilovich, E. and Markovitch, S. 2011. A word at a time: computing word relatedness using temporal semantic analysis. In Proceedings of the 20th International Conference on World Wide Web (WWW '11), pp. 337–46. New York, NY, USA: ACM.CrossRefGoogle Scholar
Resnik, P. 1995. Using information content to evaluate semantic similarity in a taxonomy. In Proceedings of the 14th International Joint Conference on Artificial Intelligence (IJCAI'95), pp. 448–53. San Francisco, CA, USA: Morgan Kaufmann.Google Scholar
Richardson, R. and Smeaton, A. 1995. Using WordNet in a knowledge-based approach to information retrieval. Technical Report CA-0196, School of Computer Applications, Dublin City University.Google Scholar
Riensche, R., Baddeley, B., Sanfilippo, A., Posse, C., and Gopalan, B. 2007. XOA: web-enabled cross-ontological analytics. In Proceedings of the 1st International Workshop on Service-Oriented Technologies for Biological Databases and Toolsat in the ICWS/SCC Conference, pp. 99105. Washington, DC, USA: IEEE Computer Society.Google Scholar
Rodrìguez, M., and Egenhofer, M. 2003. Determining semantic similarity among entity classes from different ontologies. IEEE Transactions on Knowledge and Data Engineering 15 (2), 442–56.CrossRefGoogle Scholar
Rose, T., Stevenson, M. and Whitehead, M. 2002. The Reuters corpus volume 1-from yesterdays news to tomorrows language resources. In Proceedings of the 3rd International Conference on Language Resources and Evaluation, pp. 2931. Paris, France: ELRA.Google Scholar
Rubenstein, H. and Goodenough, J. 1965. Contextual correlates of synonymy. Communications of the ACM 8 (10), 627–33.CrossRefGoogle Scholar
Ruiz-Casado, M., Alfonseca, E. and Castells, P. 2005. Using context-window overlapping in synonym discovery and ontology extension. Proceedings of the International Conference on Recent Advances in Natural Language Processing.Google Scholar
Sahami, M. and Heilman, T. 2006. A web-based kernel function for measuring the similarity of short text snippets. In Proceedings of the 15th International Conference on World Wide Web (WWW '06), pp. 377–86. New York, NY, USA: ACM.CrossRefGoogle Scholar
Schickel-Zuber, V. and Faltings, B. 2007. OSS: a semantic similarity function based on hierarchical ontologies. In Proceedings of the 20th International Joint Conference on Artifical Intelligence (IJCAI'07), pp. 551–6. San Francisco, CA, USA: Morgan Kaufmann.Google Scholar
Schlicker, A., Domingues, F., Rahnenführer, J. and Lengauer, T. 2006. A new measure for functional similarity of gene products based on Gene Ontology. BMC Bioinformatics 7, 302.
Seco, N., Veale, T. and Hayes, J. 2004. An intrinsic information content metric for semantic similarity in WordNet. In Proceedings of the 16th European Conference on Artificial Intelligence (ECAI), Valencia, Spain, August 22–27, pp. 1089–90.
Sevilla, J., Segura, V., Podhorski, A., Guruceaga, E., Mato, J., Martinez-Cruz, L., Corrales, F., and Rubio, A. 2005. Correlation between gene expression and GO semantic similarity. IEEE/ACM Transactions on Computational Biology and Bioinformatics 2 (4), 330–8.
Sheng, H., Chen, H., Yu, T. and Feng, Y. 2010. Linked data-based semantic similarity and data mining. In Proceedings of the IEEE International Conference on Information Reuse and Integration (IRI 2010), pp. 104–8. New York, NY: IEEE Systems, Man, and Cybernetics Society.
Shima, H. 2011. WS4J. http://code.google.com/p/ws4j/. Accessed March 16, 2012.
Speer, N., Spieth, C. and Zell, A. 2004. A memetic clustering algorithm for the functional partition of genes based on the Gene Ontology. In Proceedings of the 2004 IEEE Symposium on Computational Intelligence in Bioinformatics and Computational Biology, October 7–8, pp. 252–9. New York, NY, USA: IEEE.
Staab, S., Braun, C., Bruder, I., Düsterhöft, A., Heuer, A., Klettke, M., Neumann, G., Prager, B., Pretzel, J., Schnurr, H., Studer, R., Uszkoreit, H., and Wrenger, B. 1999. GETESS: searching the web exploiting German texts. In Proceedings of the 3rd International Conference on Cooperative Information Agents III (CIA'99), pp. 113–24. Berlin, Germany: Springer-Verlag.
Strube, M. and Ponzetto, S. 2006. WikiRelate! computing semantic relatedness using Wikipedia. In Proceedings of the 21st National Conference on Artificial Intelligence (AAAI'06), pp. 1419–24. Palo Alto, CA, USA: AAAI Press.
Sussna, M. 1993. Word sense disambiguation for free-text indexing using a massive semantic network. In Proceedings of the Second International Conference on Information and Knowledge Management (CIKM '93), pp. 67–74. New York, NY, USA: ACM.
Szarvas, G., Zesch, T. and Gurevych, I. 2011. Combining heterogeneous knowledge resources for improved distributional semantic models. In Proceedings of the 12th International Conference on Computational Linguistics and Intelligent Text Processing (CICLing'11), pp. 289–303. Tokyo, Japan: Springer-Verlag.
The BNC Consortium. 2007. The British National Corpus, Version 3 (BNC XML edition). http://www.natcorp.ox.ac.uk/. Accessed March 16, 2012. Distributed by Oxford University Computing Services on behalf of the BNC Consortium.
The Gene Ontology Consortium. 2005. Gene Ontology: tool for the unification of biology. Nature Genetics 25 (1), 25–9.
Tsatsaronis, G., Varlamis, I. and Vazirgiannis, M. 2010. Text relatedness based on a word thesaurus. Journal of Artificial Intelligence Research 37 (1), 1–40.
Turdakov, D. and Velikhov, P. 2008. Semantic relatedness metric for Wikipedia concepts based on link analysis and its application to word sense disambiguation. In Proceedings of the Spring Young Researcher's Colloquium on Database and Information Systems (CEUR workshop proceedings), St. Petersburg, Russia. Available at CEUR-WS.org.
Turney, P. and Pantel, P. 2010. From frequency to meaning: vector space models of semantics. Journal of Artificial Intelligence Research 37, 141–88.
Tversky, A. 1977. Features of similarity. Psychological Review 84 (4), 327–52.
Vapnik, V. 1998. Statistical Learning Theory. Chichester, UK: Wiley.
Wang, J., Du, Z., Payattakool, R., Yu, P., and Chen, C. 2007. A new method to measure the semantic similarity of GO terms. Bioinformatics 23 (10), 1274–81.
Wang, T. and Hirst, G. 2011. Refining the notions of depth and density in WordNet-based semantic similarity measures. In Proceedings of the 2011 Conference on Empirical Methods in Natural Language Processing, pp. 1003–11. Stroudsburg, PA, USA: Association for Computational Linguistics.
Weeds, E. 2003. Measures and Applications of Lexical Distributional Similarity. PhD thesis, University of Sussex.
Wojtinnek, P. and Pulman, S. 2011. Semantic relatedness from automatically generated semantic networks. In Proceedings of the 9th International Conference on Computational Semantics (IWCS '11), pp. 390–4. Oxford, UK: Association for Computational Linguistics.
Wu, Z. and Palmer, M. 1994. Verb semantics and lexical selection. In Proceedings of the 32nd Annual Meeting on Association for Computational Linguistics (ACL '94), pp. 133–8. Stroudsburg, PA, USA: Association for Computational Linguistics.
Wu, H., Su, Z., Mao, F., Olman, V., and Xu, Y. 2005. Prediction of functional modules based on comparative genome analysis and Gene Ontology application. Nucleic Acids Research 33 (9), 2822–37.
Wu, X., Zhu, L., Guo, J., Zhang, D., and Lin, K. 2006. Prediction of yeast protein–protein interaction network: insights from the Gene Ontology and annotations. Nucleic Acids Research 34 (7), 2137–50.
Yang, D. and Powers, D. 2005. Measuring semantic similarity in the taxonomy of WordNet. In Proceedings of the 28th Australasian Conference on Computer Science (ACSC '05), pp. 315–22. Darlinghurst, Australia: Australian Computer Society.
Yang, D. and Powers, D. 2006. Verb similarity on the taxonomy of WordNet. In Proceedings of the 3rd International WordNet Conference (GWC-06). Brno, Czech Republic: Masaryk University.
Yang, X. and Su, J. 2007. Coreference resolution using semantic relatedness information from automatically discovered patterns. In Proceedings of the 45th Annual Meeting of the Association for Computational Linguistics, pp. 528–35. Stroudsburg, PA, USA: Association for Computational Linguistics.
Yazdani, M. and Popescu-Belis, A. 2010. A random walk framework to compute textual semantic similarity: a unified model for three benchmark tasks. In Proceedings of the 2010 IEEE 4th International Conference on Semantic Computing (ICSC '10), pp. 424–9. Washington, DC, USA: IEEE Computer Society.
Ye, P., Peyser, B., Pan, X., Boeke, J., Spencer, F., and Bader, J. 2005. Gene function prediction from congruent synthetic lethal interactions in yeast. Molecular Systems Biology 1, 2005.0026, pp. 1–12.
Yeh, E., Ramage, D., Manning, C., Agirre, E., and Soroa, A. 2009. WikiWalk: random walks on Wikipedia for semantic relatedness. In Proceedings of the ACL 2009 Workshop on Graph-Based Methods for Natural Language Processing (TextGraphs-4), pp. 41–9. Stroudsburg, PA, USA: Association for Computational Linguistics.
Yu, H., Gao, L., Tu, K. and Guo, Z. 2005. Broadly predicting specific gene functions with expression similarity and taxonomy similarity. Gene 352, 75–81.
Zesch, T. and Gurevych, I. 2006. Automatically creating datasets for measures of semantic relatedness. In Proceedings of the Workshop on Linguistic Distances (LD '06), pp. 16–24. Stroudsburg, PA, USA: Association for Computational Linguistics.
Zesch, T. and Gurevych, I. 2007. Analysis of the Wikipedia category graph for NLP applications. In Proceedings of the TextGraphs-2 Workshop (NAACL-HLT), pp. 1–8. Stroudsburg, PA, USA: Association for Computational Linguistics.
Zesch, T. and Gurevych, I. 2010a. Wisdom of crowds versus wisdom of linguists – measuring the semantic relatedness of words. Natural Language Engineering 16 (1), 25–59.
Zesch, T. and Gurevych, I. 2010b. The more the better? Assessing the influence of Wikipedia's growth on semantic relatedness measures. In Proceedings of the 7th International Conference on Language Resources and Evaluation (LREC'10). Paris, France: European Language Resources Association (ELRA).
Zesch, T., Müller, C. and Gurevych, I. 2008a. Extracting lexical semantic knowledge from Wikipedia and Wiktionary. In Proceedings of the Conference on Language Resources and Evaluation (LREC), pp. 1646–52. Paris, France: European Language Resources Association (ELRA).
Zesch, T., Müller, C. and Gurevych, I. 2008b. Using Wiktionary for computing semantic relatedness. In Proceedings of the 23rd National Conference on Artificial Intelligence (AAAI'08), pp. 861–6. Palo Alto, CA, USA: AAAI Press.
Zhang, Z., Gentile, A. and Ciravegna, F. 2011. Harnessing different knowledge sources to measure semantic relatedness under a uniform model. In Proceedings of the Conference on Empirical Methods in Natural Language Processing (EMNLP '11), pp. 991–1002. Stroudsburg, PA, USA: Association for Computational Linguistics.
Ziegler, C., Simon, K. and Lausen, G. 2006. Automatic computation of semantic proximity using taxonomic knowledge. In Proceedings of the 15th ACM International Conference on Information and Knowledge Management (CIKM '06), pp. 465–74. New York, NY, USA: ACM.