A non-negative tensor factorization model for selectional preference induction

Published online by Cambridge University Press: 11 October 2010

TIM VAN DE CRUYS*
Affiliation:
INRIA & Université Paris 7, Rocquencourt, France

Abstract

Distributional similarity methods have proven to be a valuable tool for the induction of semantic similarity. Until now, most algorithms have used two-way co-occurrence data to compute the meaning of words. Co-occurrence frequencies, however, need not be pairwise. One can easily imagine situations where it is desirable to investigate co-occurrence frequencies of three modes and beyond. This paper investigates tensor factorization methods to build a model of three-way co-occurrences. The approach is applied to the problem of selectional preference induction and evaluated automatically in a pseudo-disambiguation task. The results show that tensor factorization, and non-negative tensor factorization in particular, is a promising tool for Natural Language Processing (NLP).
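
To make the idea of factorizing three-way co-occurrences concrete, the following is a minimal sketch of a non-negative tensor factorization in NumPy. It is not taken from the paper: it assumes a rank-R PARAFAC-style model fitted with Lee-and-Seung-style multiplicative updates under a squared-error objective, and the names `khatri_rao`, `ntf` and `reconstruct`, the rank and the iteration count are all illustrative choices. The three modes of `X` could be read as, for example, verbs × subjects × objects.

```python
import numpy as np

def khatri_rao(B, C):
    """Column-wise Kronecker product of B (J x R) and C (K x R) -> (J*K x R)."""
    J, R = B.shape
    K, _ = C.shape
    return (B[:, None, :] * C[None, :, :]).reshape(J * K, R)

def reconstruct(A, B, C):
    """Rebuild the three-way tensor from its factor matrices."""
    return np.einsum('ir,jr,kr->ijk', A, B, C)

def ntf(X, rank, n_iter=200, eps=1e-9, seed=0):
    """Illustrative non-negative factorization of a three-way tensor X.

    Assumption: multiplicative updates for the squared-error objective;
    the paper's own algorithm and settings may differ.
    """
    rng = np.random.default_rng(seed)
    # Non-negative random initialization, one factor matrix per mode.
    factors = [rng.random((dim, rank)) for dim in X.shape]
    for _ in range(n_iter):
        for mode in range(3):
            # Unfold X so that the current mode indexes the rows.
            Xn = np.moveaxis(X, mode, 0).reshape(X.shape[mode], -1)
            # Khatri-Rao product of the two remaining factors, in the
            # same order as the flattened columns of the unfolding.
            others = [factors[m] for m in range(3) if m != mode]
            KR = khatri_rao(others[0], others[1])
            numer = Xn @ KR
            denom = factors[mode] @ (KR.T @ KR) + eps
            # Multiplying by a non-negative ratio keeps entries non-negative.
            factors[mode] *= numer / denom
    return factors

# Toy usage: a small synthetic non-negative tensor of co-occurrence scores.
rng = np.random.default_rng(1)
X = reconstruct(rng.random((4, 2)), rng.random((5, 2)), rng.random((6, 2)))
A, B, C = ntf(X, rank=2)
score = reconstruct(A, B, C)[0, 1, 2]  # model score for one triple
```

Under this sketch, the reconstructed entry for a (verb, subject, object) triple serves as its plausibility score, so a pseudo-disambiguation test could compare the score of an attested triple with that of a corrupted one.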

Type
Papers
Copyright
Copyright © Cambridge University Press 2010
