Morphologically rich Urdu grammar parsing using Earley algorithm

QAISER ABBAS

doi:10.1017/S1351324915000133

Published online by Cambridge University Press: 16 April 2015

Affiliation: Fachbereich Sprachwissenschaft, Universität Konstanz, 78457 Konstanz, Germany

Abstract

This work presents the development and evaluation of an extended Urdu parser. It focuses on issues related to this parser and describes the changes made to the Earley algorithm to obtain accurate and relevant results. The parser uses a morphologically rich context-free grammar extracted from a linguistically rich Urdu treebank. With sufficient encoded information, this grammar is comparable with state-of-the-art parsing requirements for the morphologically rich Urdu language. Together, the extended parsing model and the linguistically rich extracted grammar yield better evaluation results in the Urdu/Hindi parsing domain. The parser achieves an f-score of 87%, outperforming existing treebank-based parsing work on Urdu/Hindi.

Type: Articles
Information: Natural Language Engineering, Volume 22, Issue 5, September 2016, pp. 775–810

DOI: https://doi.org/10.1017/S1351324915000133
Copyright: Copyright © Cambridge University Press 2015
