An analysis of property inference methods

Published online by Cambridge University Press: 14 January 2022

Alex Rosenfeld*
Affiliation:
Intelligent Automation, Inc., Rockville, MD 20855, USA
Katrin Erk
Affiliation:
Department of Linguistics, The University of Texas at Austin, Austin, TX 78705, USA
*Corresponding author.

Abstract

Property inference involves predicting properties for a word from its distributional representation. We focus on human-generated resources that link words to their properties and on the task of predicting these properties for unseen words. We introduce the use of label propagation, a semi-supervised machine learning approach, for this task and, in the first systematic study of models for this task, find that label propagation achieves state-of-the-art results. For more variety in the kinds of properties tested, we introduce two new property datasets.
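To make the idea of label propagation concrete, the following is a minimal sketch of a generic clamped variant on a word-similarity graph. It is not the authors' implementation and not necessarily the variant evaluated in the article. The two-dimensional vectors, the `is_animal` property, and the function names `cosine` and `propagate` are all hypothetical, chosen only for illustration.

```python
import math


def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0


def propagate(vectors, seeds, iterations=50):
    """Spread property scores from seed words to unlabeled words.

    vectors: word -> distributional vector (list of floats)
    seeds:   word -> known property score in [0, 1]
    """
    words = list(vectors)
    # Edge weights: non-negative cosine similarity between distinct words.
    weights = {
        w: {o: max(0.0, cosine(vectors[w], vectors[o])) for o in words if o != w}
        for w in words
    }
    scores = {w: seeds.get(w, 0.0) for w in words}
    for _ in range(iterations):
        updated = {}
        for w in words:
            if w in seeds:
                # Seed words stay clamped to their known score.
                updated[w] = seeds[w]
                continue
            total = sum(weights[w].values())
            # Unlabeled words take the similarity-weighted mean of their neighbors.
            updated[w] = (
                sum(wt * scores[o] for o, wt in weights[w].items()) / total
                if total
                else 0.0
            )
        scores = updated
    return scores


# Toy, made-up vectors; a real setup would use trained word embeddings.
vectors = {
    "dog": [0.9, 0.1],
    "cat": [0.8, 0.2],
    "car": [0.1, 0.9],
    "truck": [0.2, 0.8],
    "wolf": [0.85, 0.15],  # unseen word
    "van": [0.15, 0.85],   # unseen word
}
# Hypothetical property "is_animal", known only for the seed words.
seeds = {"dog": 1.0, "cat": 1.0, "car": 0.0, "truck": 0.0}

scores = propagate(vectors, seeds)
```

In this toy graph the unseen word `wolf` ends up with a higher `is_animal` score than `van`, because its strongest edges lead to the positively labeled seeds. Because the method is semi-supervised, the unlabeled words themselves are nodes in the graph and influence one another's scores.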

Type
Article
Copyright
© The Author(s), 2022. Published by Cambridge University Press

Footnotes

Research performed while attending The University of Texas at Austin.

Supplementary material: Rosenfeld and Erk supplementary material (File, 128 KB)