Lemaza: An Arabic why-question answering system*

Published online by Cambridge University Press:  24 August 2017

AQIL M. AZMI
Affiliation:
Department of Computer Science, King Saud University, Riyadh 11543, Saudi Arabia
NOUF A. ALSHENAIFI
Affiliation:
Department of Computer Science, King Saud University, Riyadh 11543, Saudi Arabia

Abstract

Question answering systems retrieve information from documents in response to queries. Most questions are who- and what-type questions that deal with named entities. A less common and more challenging type is the why-question. In this paper, we introduce Lemaza (Arabic for why), a system for automatically answering why-questions for Arabic texts. The system is composed of four main components that make use of Rhetorical Structure Theory. To evaluate Lemaza, we prepared a set of why-question–answer pairs whose answers can be found in a corpus that we compiled from the Open Source Arabic Corpora. Lemaza performed best when stop-words were not removed, achieving 72.7%, 79.2% and 78.7% for recall, precision and c@1, respectively.
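For readers unfamiliar with c@1, the following is a minimal sketch of the measure as it is commonly defined in the question answering evaluation literature: c@1 = (n_R + n_U · n_R / n) / n, where n is the number of questions, n_R the number answered correctly and n_U the number left unanswered. The function name and the counts in the usage lines are hypothetical illustrations; they are not taken from the Lemaza experiments, and this page does not state how the authors computed the measure.

```python
def c_at_1(n_correct: int, n_unanswered: int, n_total: int) -> float:
    """Return c@1 = (n_R + n_U * n_R / n) / n.

    n_correct    (n_R): questions answered correctly
    n_unanswered (n_U): questions the system chose not to answer
    n_total      (n):   all questions in the test set
    """
    if n_total <= 0:
        raise ValueError("n_total must be positive")
    if n_correct < 0 or n_unanswered < 0:
        raise ValueError("counts must be non-negative")
    if n_correct + n_unanswered > n_total:
        raise ValueError("n_correct + n_unanswered cannot exceed n_total")
    # Unanswered questions are credited at the system's overall accuracy,
    # so abstaining is scored above answering wrongly.
    return (n_correct + n_unanswered * n_correct / n_total) / n_total


if __name__ == "__main__":
    # Hypothetical counts: 10 questions, 6 correct, 2 unanswered, 2 wrong.
    print(round(c_at_1(6, 2, 10), 4))
    # With no unanswered questions, c@1 reduces to plain accuracy.
    print(round(c_at_1(7, 0, 10), 4))
```

When every question receives an answer, c@1 equals accuracy; the measure differs from accuracy only in crediting abstentions.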

Type
Articles
Copyright
Copyright © Cambridge University Press 2017 

Footnotes

*

We would like to thank W. Al-Sanie for sharing his RST implementation; and the language specialist for helping us with why-question–answer pairs. The first author would like to thank Miss Maryam for her assistance in proof-reading the manuscript. Special thanks to all three anonymous reviewers for their constructive comments, which helped in further improvement of the manuscript. This work was supported by a special fund in the Research Center of College of Computer & Information Sciences (CCIS) at King Saud University for which the authors are thankful.
