
Mapping Arabic WordNet synsets to Wikipedia articles using monolingual and bilingual features

Published online by Cambridge University Press: 21 October 2015

ABDULGABBAR SAIF
MOHD JUZAIDDIN AB AZIZ
NAZLIA OMAR
Affiliation (all authors):
Center for Artificial Intelligence Technology, Faculty of Information Science and Technology, Universiti Kebangsaan Malaysia, 43600 Bangi, Selangor, Malaysia

Abstract

The alignment of WordNet and Wikipedia has received wide attention from researchers in computational linguistics who are building new lexical knowledge sources or enriching the semantic information of WordNet entities. The main challenge of this alignment is handling synonymy and ambiguity in the contents of two units from different sources. This paper therefore introduces a mapping method that links an Arabic WordNet synset to its corresponding article in Wikipedia. The method uses monolingual and bilingual features to overcome the lack of semantic information in Arabic WordNet. To evaluate the method, an Arabic mapping data set containing 1,291 synset–article pairs was compiled. The experimental analysis shows that the proposed method achieves promising results and outperforms state-of-the-art methods that depend only on monolingual features. The mapping method has also been used to increase the coverage of Arabic WordNet by inserting new synsets from Wikipedia.

Type
Articles
Copyright
Copyright © Cambridge University Press 2015 

