Interpreting compound nouns with kernel methods

Published online by Cambridge University Press: 12 March 2013

DIARMUID Ó SÉAGHDHA
Affiliation:
Computer Laboratory, University of Cambridge, Cambridge, UK
ANN COPESTAKE
Affiliation:
Computer Laboratory, University of Cambridge, Cambridge, UK

Abstract

This paper presents a classification-based approach to noun–noun compound interpretation within the statistical learning framework of kernel methods. In this framework, the primary modelling task is to define measures of similarity between data items, formalised as kernel functions. We consider the different sources of information that are useful for understanding compounds and proceed to define kernels that compute similarity between compounds in terms of these sources. In particular, these kernels implement intuitive notions of lexical and relational similarity and can be computed using distributional information extracted from text corpora. We report performance on classification experiments with three semantic relation inventories at different levels of granularity, demonstrating in each case that combining lexical and relational information sources is beneficial and gives better performance than either source taken alone. The data used in our experiments are taken from general English text, but our methods are also applicable to other domains and potentially to other languages where noun–noun compounding is frequent and productive.
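As a rough illustration of the idea of combining lexical and relational similarity, the following minimal sketch defines one kernel over the constituent nouns of two compounds, one kernel over the contexts in which each noun pair co-occurs, and their sum. Everything specific here is an assumption made for illustration and is not taken from the paper: the cosine (normalised linear) kernel, the function names, and the toy co-occurrence counts are all hypothetical. The sketch relies only on the standard fact that a sum of valid kernels is itself a valid kernel.

```python
import math

def dot(u, v):
    """Inner product of two sparse vectors stored as dicts."""
    if len(u) > len(v):
        u, v = v, u
    return sum(w * v.get(k, 0.0) for k, w in u.items())

def cosine_kernel(u, v):
    """Normalised linear kernel; returns 0.0 if either vector is all zeros."""
    denom = math.sqrt(dot(u, u) * dot(v, v))
    return dot(u, v) / denom if denom > 0 else 0.0

def lexical_kernel(c1, c2, word_vectors):
    """Compare modifier with modifier and head with head (assumed design)."""
    (m1, h1), (m2, h2) = c1, c2
    return (cosine_kernel(word_vectors[m1], word_vectors[m2])
            + cosine_kernel(word_vectors[h1], word_vectors[h2]))

def relational_kernel(c1, c2, pair_vectors):
    """Compare the contexts in which each compound's two nouns co-occur."""
    return cosine_kernel(pair_vectors[c1], pair_vectors[c2])

def combined_kernel(c1, c2, word_vectors, pair_vectors):
    """Sum of the two kernels; a sum of valid kernels is a valid kernel."""
    return (lexical_kernel(c1, c2, word_vectors)
            + relational_kernel(c1, c2, pair_vectors))

# Toy counts invented purely for illustration (not corpus data).
word_vectors = {
    "steel":   {"made_of": 3, "hard": 2},
    "plastic": {"made_of": 2, "cheap": 1},
    "kitchen": {"cook": 5, "in": 1},
    "knife":   {"cut": 4, "sharp": 1},
    "fork":    {"eat": 3, "sharp": 1},
}
pair_vectors = {
    ("steel", "knife"):   {"made of": 2, "forged from": 1},
    ("plastic", "fork"):  {"made of": 3},
    ("kitchen", "knife"): {"used in": 2, "kept in": 1},
}

steel_knife = ("steel", "knife")
plastic_fork = ("plastic", "fork")
kitchen_knife = ("kitchen", "knife")

sim_material = combined_kernel(steel_knife, plastic_fork, word_vectors, pair_vectors)
sim_location = combined_kernel(steel_knife, kitchen_knife, word_vectors, pair_vectors)
print(sim_material, sim_location)
```

With these toy counts, "steel knife" comes out closer to "plastic fork" than to "kitchen knife", even though the latter shares a head noun, because the relational term rewards the shared "made of" contexts. A kernel classifier such as a support vector machine could then be trained on the resulting matrix of pairwise kernel values.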

Type
Articles
Copyright
Copyright © Cambridge University Press 2013 
