Arabic Corpus Linguistics and Related Tools

doi:10.1017/9781108277327.020

19 - Arabic Corpus Linguistics and Related Tools

An Overview and Some Critical Observations

from Part IV - Arabic Computational and Corpus Linguistics

Published online by Cambridge University Press: 23 September 2021

Mark Van Mol

Edited by Karin Ryding and David Wilmsen

Karin Ryding: Georgetown University, Washington DC
David Wilmsen: American University of Beirut

Summary

Mark Van Mol provides a critical review of the issues involved in constructing usable Arabic corpora and of the solutions that programmers have attempted. One such issue is whether a corpus is made freely available or placed behind a paywall. This distinction often correlates with corpus size as well: freely available corpora are generally larger and untagged for parts of speech (POS), whereas those behind paywalls are smaller and POS-tagged. The reason is clear: POS tagging requires large amounts of painstaking labour, whereas collecting large amounts of text from the Internet with web-scraping applications can be done in seconds. Corpus sizes are themselves difficult to compare, because they are reported in different units: the number of articles, hours, tokens, kilobytes, megabytes, sentences, words, and sometimes paragraphs that the corpus encompasses. One reason for this variety is that defining the searchable units of Arabic texts presents complications. Such considerations pertain directly to questions of corpus representativeness, and with these arises the question of what the corpora are intended to represent: Classical Arabic, modern written Arabic, or Arabic dialects.
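
The difficulty of comparing corpus sizes reported in different units can be illustrated with a minimal Python sketch. It is not taken from the chapter: the function name `corpus_size`, the sample sentence, and the choice of whitespace splitting and punctuation-based sentence splitting are all assumptions made here for illustration, and a real corpus tool would likely define tokens and sentences differently.

```python
import re


def corpus_size(text: str) -> dict:
    """Report the size of one text in several commonly used units.

    Hypothetical helper for illustration only; the unit definitions
    below are simplifying assumptions, not the chapter's method.
    """
    # Whitespace-delimited tokens: a rough stand-in for "words".
    tokens = text.split()
    # Sentences: split on Latin and Arabic end punctuation (. ! ? and the Arabic question mark).
    sentences = [s for s in re.split(r"[.!?؟]+", text) if s.strip()]
    return {
        "bytes": len(text.encode("utf-8")),  # storage size, as in kilobytes or megabytes
        "characters": len(text),             # Unicode code points
        "tokens": len(tokens),
        "sentences": len(sentences),
    }


# Two short Arabic sentences, invented for this example.
sample = "ذهب الولد إلى المدرسة. هل قرأ الكتاب؟"
sizes = corpus_size(sample)
for unit, count in sizes.items():
    print(f"{unit}: {count}")
```

Because Arabic letters occupy more than one byte in UTF-8, the byte count exceeds the character count, and neither converts directly into a token or sentence count. A size quoted in megabytes therefore cannot simply be compared with one quoted in words.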

Keywords

corpus linguistics; Arabic corpora; size of corpora; POS tagging; morphological analysers; annotation of corpora

Type: Chapter
Information: The Cambridge Handbook of Arabic Linguistics, pp. 446–472

DOI: https://doi.org/10.1017/9781108277327.020

Publisher: Cambridge University Press

Print publication year: 2021
