On generalization of the sense retrofitting model

Yang-Yin Lee; Ting-Yu Yen; Hen-Hsen Huang; Yow-Ting Shiue; Hsin-Hsi Chen

doi:10.1017/S1351324922000523

Published online by Cambridge University Press: 31 March 2023

Yang-Yin Lee*: Department of Computer Science and Information Engineering, National Taiwan University, Taipei, Taiwan
Ting-Yu Yen: Department of Computer Science and Information Engineering, National Taiwan University, Taipei, Taiwan
Hen-Hsen Huang: Institute of Information Science, Academia Sinica, Taipei, Taiwan
Yow-Ting Shiue: Department of Computer Science and Information Engineering, National Taiwan University, Taipei, Taiwan
Hsin-Hsi Chen: Department of Computer Science and Information Engineering, National Taiwan University, Taipei, Taiwan
*Corresponding author. E-mail: [email protected]

Abstract

With the aid of recently proposed word embedding algorithms, the study of semantic relatedness has progressed rapidly. However, word-level representations are still lacking for many natural language processing tasks. Various sense-level embedding learning algorithms have been proposed to address this issue. In this paper, we present a generalized model derived from existing sense retrofitting models. In this generalization, we take into account semantic relations between the senses, relation strength, and semantic strength. Experimental results show that the generalized model outperforms previous approaches on four tasks: semantic relatedness, contextual word similarity, semantic difference, and synonym selection. Based on the generalized sense retrofitting model, we also propose a standardization process on the dimensions with four settings, a neighbor expansion process from the nearest neighbors, and combinations of these two approaches. Finally, we propose a Procrustes analysis approach, inspired by bilingual mapping models, for learning representations that are outside of the ontology. The experimental results show the advantages of these approaches on semantic relatedness tasks.
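
The abstract does not state the model's objective, so the following is only a point of reference: a minimal sketch of the generic retrofitting iteration that sense retrofitting models build on, in which each vector is repeatedly pulled toward a weighted average of its ontology neighbors while staying anchored to its original embedding. The function name `retrofit`, the anchor weight `alpha`, the per-edge weights (standing in loosely for relation strength), and the toy vectors are all illustrative assumptions, not the authors' formulation.

```python
def retrofit(original, neighbors, alpha=1.0, iterations=50):
    """Generic retrofitting sketch (hypothetical helper, not the paper's model).

    original:  dict mapping a sense id to its initial vector (list of floats)
    neighbors: dict mapping a sense id to {neighbor id: edge weight}
    alpha:     weight that anchors each vector to its original embedding
    """
    # Work on copies so the original embeddings stay available as anchors.
    vectors = {key: list(vec) for key, vec in original.items()}
    for _ in range(iterations):
        for node, edges in neighbors.items():
            if node not in vectors:
                continue
            # Ignore neighbors that have no embedding.
            usable = {n: w for n, w in edges.items() if n in vectors}
            if not usable:
                continue
            total = alpha + sum(usable.values())
            # Start from the anchored original vector ...
            updated = [alpha * x for x in original[node]]
            # ... then add each neighbor's current vector, scaled by its edge weight.
            for n, w in usable.items():
                for d, x in enumerate(vectors[n]):
                    updated[d] += w * x
            # Normalize so the result is a weighted average.
            vectors[node] = [x / total for x in updated]
    return vectors


# Toy example: two related senses and one isolated sense (made-up ids and values).
toy_vectors = {"s1": [1.0, 0.0], "s2": [0.0, 1.0], "s3": [5.0, 5.0]}
toy_edges = {"s1": {"s2": 1.0}, "s2": {"s1": 1.0}}
fitted = retrofit(toy_vectors, toy_edges)
```

In this toy setup the two linked senses move toward each other without collapsing, because the `alpha` term keeps each one tied to its starting point, and the isolated sense is left unchanged.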

Keywords

Sense embedding; Retrofitting; Generalization; Semantic relatedness

Type: Article
Information: Natural Language Engineering, Volume 29, Issue 4, July 2023, pp. 1097–1125

DOI: https://doi.org/10.1017/S1351324922000523
Copyright: © The Author(s), 2023. Published by Cambridge University Press

Footnotes

† These authors contributed equally to this work.
