
Distributional lexical semantics: Toward uniform representation paradigms for advanced acquisition and processing tasks

Published online by Cambridge University Press:  11 October 2010

R. BASILI
Affiliation:
University of Roma, Tor Vergata, Via della Ricerca Scientifica, 00133 Roma, Italy
M. PENNACCHIOTTI
Affiliation:
Yahoo! Inc., Santa Clara, CA, USA

Extract

The distributional hypothesis states that words with similar distributional properties have similar semantic properties (Harris 1968). This perspective on word semantics was discussed early in linguistics (Firth 1957; Harris 1968) and then successfully applied to Information Retrieval (Salton, Wong and Yang 1975). In Information Retrieval, distributional notions (e.g. document frequency and word co-occurrence counts) have proved a key factor of success, as opposed to early logic-based approaches to relevance modeling (van Rijsbergen 1986; Chiaramella and Chevallet 1992; van Rijsbergen and Lalmas 1996).

Type
Papers
Copyright
Copyright © Cambridge University Press 2010

References

Basili, R., De Cao, D., Marocco, P., and Pennacchiotti, M. 2007. Learning selectional preferences for entailment or paraphrasing rules. In Proceedings of Recent Advances in Natural Language Processing '07, Borovets, Bulgaria.
Belkin, M., Niyogi, P., and Sindhwani, V. 2006. Manifold regularization: a geometric framework for learning from labeled and unlabeled examples. Journal of Machine Learning Research 7: 2399–2434.
Berry, M. W., Dumais, S. T., and O'Brien, G. W. 1995. Using linear algebra for intelligent information retrieval. SIAM Review 37 (4): 573–595.
Birkhoff, G., and von Neumann, J. 1936. The logic of quantum mechanics. Annals of Mathematics 37: 823–843.
Bloehdorn, S., Basili, R., Cammisa, M., and Moschitti, A. 2006. Semantic kernels for text classification based on topological measures of feature similarity. In Proceedings of the 6th IEEE International Conference on Data Mining (ICDM 06), 18–22 December 2006, pp. 808–812, Hong Kong.
Bloehdorn, S., and Moschitti, A. 2007. Combined syntactic and semantic kernels for text classification. In Advances in Information Retrieval – Proceedings of the 29th European Conference on Information Retrieval (ECIR 2007), 2–5 April 2007, pp. 307–318, vol. 4425, Lecture Notes in Computer Science. Rome: Springer.
Budanitsky, A., and Hirst, G. 2006. Evaluating WordNet-based measures of semantic distance. Computational Linguistics 32 (1): 13–47.
Bullinaria, J., and Levy, J. P. 2007. Extracting semantic representations from word co-occurrence statistics: a computational study. Behaviour Research Methods 39: 510–526.
Chiaramella, Y., and Chevallet, J. P. 1992. About retrieval models and logic. The Computer Journal 35: 233–242.
Church, K. W., and Mercer, R. L. 1993. Introduction to the special issue on computational linguistics using large corpora. Computational Linguistics 19 (1): 1–24.
Collins, M., and Duffy, N. 2002. New ranking algorithms for parsing and tagging: kernels over discrete structures, and the voted perceptron. In ACL '02: Proceedings of the 40th Annual Meeting on Association for Computational Linguistics, pp. 263–270. Association for Computational Linguistics, Morristown, NJ.
Cristianini, N., and Shawe-Taylor, J. 2000. An Introduction to Support Vector Machines and Other Kernel-Based Learning Methods. Cambridge University Press.
Cristianini, N., Shawe-Taylor, J., and Lodhi, H. 2002. Latent semantic kernels. Journal of Intelligent Information Systems 18 (2–3): 127–152.
Erk, K., and Padó, S. 2008. A structured vector space model for word meaning in context. In EMNLP '08: Proceedings of conference on Empirical Methods in Natural Language Processing, Morristown, NJ, USA: Association for Computational Linguistics, pp. 897–906.
Fenstad, J. E. 1999. Why grammar needs geometry more than lambda-terms. In Gerbrandy, J., Marx, M., de Rijke, M., and Venema, Y. (eds.), Collection of papers for the 50th Birthday of Johan van Benthem. Vossiuspers, Amsterdam University Press.
Firth, J. R. 1957. A synopsis of linguistic theory 1930–55. In Studies in Linguistic Analysis, pp. 1–32.
Furnas, G. W., Deerwester, S., Dumais, S. T., Landauer, T. K., Harshman, R. A., Streeter, L. A., and Lochbaum, K. E. 1988. Information retrieval using a singular value decomposition model of latent semantic structure. In Proceedings of SIGIR '88, New York.
Gärdenfors, P. 2004. Conceptual Spaces: The Geometry of Thought. The MIT Press.
Globerson, A., Chechik, G., Pereira, F., and Tishby, N. 2007. Euclidean embedding of co-occurrence data. Journal of Machine Learning Research 8: 2265–2295.
Harris, Z. 1968. Mathematical Structures of Language. New York: Interscience Publishers.
Haussler, D. 1999. Convolution kernels on discrete structures. Technical Report UCSC-CRL-99-10, University of California, Santa Cruz, CA, July.
He, X., and Niyogi, P. 2003. Locality preserving projections. In Proceedings of Advances in Neural Information Processing Systems, Vancouver, Canada.
Hofmann, T. 1999. Probabilistic latent semantic analysis. In Proceedings of Uncertainty in Artificial Intelligence, UAI'99, Stockholm, Sweden.
Kintsch, W. 2000. Metaphor comprehension: a computational theory. Psychonomic Bulletin and Review 7: 257–266.
Landauer, T., and Dumais, S. 1997. A solution to Plato's problem: the latent semantic analysis theory of acquisition, induction and representation of knowledge. Psychological Review 104: 211–240.
Lin, D. 1998. Automatic retrieval and clustering of similar words. In Proceedings of the Joint International Conference on Computational Linguistics and Annual Meeting of the Association for Computational Linguistics (COLING-ACL), Montreal, Canada.
Lin, D., and Pantel, P. 2001. DIRT – discovery of inference rules from text. In Proceedings of the ACM Conference on Knowledge Discovery and Data Mining (KDD-01), San Francisco, CA.
Lowe, W., and McDonald, S. 2000. The direct route: mediated priming in semantic space. In COGSCI 2000, Lawrence Erlbaum Associates, pp. 675–680.
Lund, K., and Burgess, C. 1997. Producing high-dimensional semantic spaces from lexical co-occurrence. Behavior Research Methods, Instruments & Computers 28: 203–208.
Mitchell, T. M., Shinkareva, S. V., Carlson, A., Chang, K.-M., Malave, V. L., Mason, R. A., and Just, M. A. 2008. Predicting human brain activity associated with the meanings of nouns. Science 320: 1191–1195.
Moschitti, A. 2004. A study on convolution kernels for shallow semantic parsing. In Proceedings of the Conference of the Association of Computational Linguistics, Barcelona, Spain, pp. 335–342.
Moschitti, A., Pighin, D., and Basili, R. 2008. Tree kernels for semantic role labeling. Computational Linguistics 34.
Ó Séaghdha, D., and Copestake, A. 2008. Semantic classification with distributional kernels. In Proceedings of the 22nd International Conference on Computational Linguistics (Coling 2008), Manchester, UK, pp. 649–656. Coling 2008 Organizing Committee.
Padó, S., and Lapata, M. 2007. Dependency-based construction of semantic space models. Computational Linguistics 33 (2): 161–199.
Pantel, P., Bhagat, R., Coppola, B., Chklovski, T., and Hovy, E. 2007. ISP: Learning inferential selectional preferences. In Proceedings of HLT/NAACL, Rochester, NY, USA.
Pantel, P., and Lin, D. 2003. Automatically discovering word senses. In Proceedings of Human Language Technology / North American Association for Computational Linguistics, Edmonton, Canada.
Pennacchiotti, M., De Cao, D., Basili, R., Croce, D., and Roth, M. 2008. Automatic induction of FrameNet lexical units. In Proceedings of the Empirical Methods in Natural Language Processing (EMNLP 2008), Waikiki, Honolulu, Hawaii.
Pereira, F. C. N., Tishby, N., and Lee, L. 1993. Distributional clustering of English words. In Proceedings of the 30th Annual Meeting of the Association for Computational Linguistics, Ohio State University, Columbus, OH, USA, pp. 183–190.
Pollard, C., and Sag, I. 1994. Head-driven Phrase Structure Grammar. Chicago: University of Chicago Press.
Roweis, S., and Saul, L. 2000. Nonlinear dimensionality reduction by locally linear embedding. Science 290 (5500): 2323–2326.
Sahlgren, M. 2006. The Word-Space Model. PhD thesis, Stockholm University.
Salton, G., Wong, A., and Yang, C. 1975. A vector space model for automatic indexing. Communications of the ACM 18: 613–620.
Saul, L. K., and Roweis, S. T. 2003. Think globally, fit locally: unsupervised learning of low dimensional manifolds. Journal of Machine Learning Research 4: 119–155.
Schütze, H. 1998. Automatic word sense discrimination. Computational Linguistics 24 (1): 97–124.
Sindhwani, V., Belkin, M., and Niyogi, P. 2006. The geometric basis of semi-supervised learning. In Chapelle, O., Schölkopf, B., and Zien, A. (eds.), Semi-supervised Learning, pp. 217–235. MIT Press.
Sindhwani, V., and Melville, P. 2008. Document-word coregularization for semi-supervised sentiment analysis. In Proceedings of IEEE ICDM.
Tenenbaum, J., de Silva, V., and Langford, J. C. 2000. A global geometric framework for nonlinear dimensionality reduction. Science 290: 2319–2323.
Van der Plas, L. 2008. Automatic Lexico-Semantic Acquisition for Question Answering. PhD Thesis, University of Groningen, Groningen, The Netherlands.
van Rijsbergen, C. J. 1986. A non-classical logic for information retrieval. The Computer Journal 29: 481–485.
van Rijsbergen, K. 2004. The Geometry of Information Retrieval. Cambridge University Press.
van Rijsbergen, C., and Lalmas, M. 1996. An information calculus for information retrieval. Journal of the American Society for Information Science 47: 385–398.
Vapnik, V. 1995. The Nature of Statistical Learning Theory. Berlin: Springer-Verlag.
Varadarajan, V. S. 1985. Geometry of Quantum Theory. Springer-Verlag.
Weinberger, K. Q., and Chapelle, O. 2008. Large margin taxonomy embedding for document categorization. In Koller, D., Schuurmans, D., Bengio, Y., and Bottou, L. (eds.), NIPS, pp. 1737–1744. MIT Press.
Widdows, D. 2004. Geometry and Meaning. Center for the Study of Language and Information/SRI.
Wittgenstein, L. 1953. Philosophical Investigations. Oxford: Blackwell.