A method based on rules and machine learning for logic form identification in Spanish

Published online by Cambridge University Press:  24 August 2015

F. MARTÍNEZ-SANTIAGO, M. C. DÍAZ-GALIANO, M. Á. GARCÍA-CUMBRERAS and A. MONTEJO-RÁEZ
Affiliation:
Department of Computer Science, Universidad de Jaén, Paraje Las Lagunillas, s/n, 23071, Jaén, Spain

Abstract

Logic Forms (LF) are simple, first-order logic knowledge representations of natural language sentences. Each noun, verb, adjective, adverb, pronoun, preposition and conjunction generates a predicate. LF systems usually identify syntactic function by means of syntactic rules, but this approach is difficult to apply to languages with highly flexible and ambiguous syntax, such as Spanish. In this study, we present a mixed method for deriving the LF of Spanish sentences that combines hard-coded rules with a classifier inspired by semantic role labeling. The main novelty of our proposal is the way the classifier is applied to generate the predicates of the verbs, while rules are used to translate the remaining predicates, which are more straightforward and less ambiguous than the verbal ones. The mixed system uses a supervised classifier to integrate syntactic and semantic information in order to help overcome the inherent ambiguity of Spanish syntax, a task accomplished in a similar way to semantic role labeling. We train the classifier on properties extracted from the AnCora-ES corpus. A rule-based system obtains the LF of the rest of the phrase; its rules are obtained by exploring the syntactic tree of the phrase and encoding the syntactic production rules. The LF algorithm has been evaluated using shallow parsing on some straightforward Spanish phrases. The verb argument labeling task achieves 84% precision, and the proposed mixed logic form identification (LFi) method outperforms a system based only on rules by 11%.
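
As a rough illustration of the division of labour described in the abstract, the following Python sketch routes verbs through a pluggable argument labeler and every other predicate-bearing word through a direct rule. It is a toy under stated assumptions, not the authors' implementation: the `Token` type, the `lemma:pos(args)` output notation, the variable names (`x1`, `e1`) and the lookup-table stand-in for the trained classifier are all hypothetical.

```python
from dataclasses import dataclass, field
from typing import Callable, List

# Parts of speech that generate a predicate, following the abstract's list.
PREDICATE_POS = {"noun", "verb", "adj", "adv", "pron", "prep", "conj"}


@dataclass
class Token:
    lemma: str
    pos: str
    # Hypothetical argument variables; for non-verbs they are assumed given.
    args: List[str] = field(default_factory=list)


def rule_predicate(tok: Token) -> str:
    """Rule path: format a token as lemma:pos(arg1, arg2, ...)."""
    return f"{tok.lemma}:{tok.pos}({', '.join(tok.args)})"


def logic_form(
    tokens: List[Token],
    label_verb_args: Callable[[Token, List[Token]], List[str]],
) -> str:
    """Build a logic form string; verb arguments come from the labeler."""
    predicates = []
    for tok in tokens:
        if tok.pos not in PREDICATE_POS:
            continue  # e.g. determiners generate no predicate
        if tok.pos == "verb":
            # Classifier path: the labeler decides the verb's arguments.
            tok = Token(tok.lemma, tok.pos, label_verb_args(tok, tokens))
        predicates.append(rule_predicate(tok))
    return " ".join(predicates)


def lookup_labeler(tok: Token, sentence: List[Token]) -> List[str]:
    """Stand-in for a trained classifier: a fixed lookup for the example."""
    return {"comer": ["e1", "x1", "x2"]}[tok.lemma]


# "El niño come manzanas" with hand-assigned lemmas, tags and variables.
tokens = [
    Token("el", "det"),
    Token("niño", "noun", ["x1"]),
    Token("comer", "verb"),
    Token("manzana", "noun", ["x2"]),
]
result = logic_form(tokens, lookup_labeler)
print(result)
```

Keeping the verb labeler as a function argument means the rule path stays unchanged if the lookup table is swapped for a real supervised model.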

Type
Articles
Copyright
Copyright © Cambridge University Press 2015 

Footnotes

This work has been partially funded by the ATTOS project (TIN2012-38536-C03-01) from the Spanish Government and the AORESCU project (TIC 07684) from the Andalucía Government.
