
Sentence embeddings in NLI with iterative refinement encoders

Published online by Cambridge University Press: 31 July 2019

Aarne Talman*, Anssi Yli-Jyrä and Jörg Tiedemann

Affiliation: Department of Digital Humanities, University of Helsinki, Finland

*Corresponding author. Email: [email protected]

Abstract

Sentence-level representations are necessary for various natural language processing tasks. Recurrent neural networks have proven to be very effective in learning distributed representations and can be trained efficiently on natural language inference tasks. We build on top of one such model and propose a hierarchy of bidirectional LSTM and max-pooling layers that implements an iterative refinement strategy and yields state-of-the-art results on the SciTail dataset as well as strong results on the Stanford Natural Language Inference (SNLI) and Multi-Genre Natural Language Inference (MultiNLI) datasets. We show that the sentence embeddings learned in this way can be utilized in a wide variety of transfer learning tasks, outperforming InferSent on 7 out of 10 and SkipThought on 8 out of 9 SentEval sentence embedding evaluation tasks. Furthermore, our model beats the InferSent model in 8 out of 10 recently published SentEval probing tasks designed to evaluate a sentence embedding's ability to capture some of the important linguistic properties of sentences.
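To make the architecture described above concrete, the following is a minimal PyTorch sketch of a hierarchy of bidirectional LSTM and max-pooling layers. It is an illustration under assumptions rather than the authors' published implementation: the class name, the hyperparameter values, and the specific refinement mechanism (each layer re-reads the word embeddings while being initialized with the final states of the layer below) are illustrative choices.

```python
import torch
import torch.nn as nn

class HierarchicalBiLSTMMaxPool(nn.Module):
    """Illustrative sentence encoder: a stack of BiLSTMs, each initialized
    with the final hidden and cell states of the layer below, with max
    pooling over time applied to every layer's output sequence."""

    def __init__(self, embed_dim=300, hidden_dim=600, num_layers=3):
        super().__init__()
        self.layers = nn.ModuleList([
            nn.LSTM(embed_dim, hidden_dim, bidirectional=True, batch_first=True)
            for _ in range(num_layers)
        ])

    def forward(self, embedded):
        # embedded: (batch, seq_len, embed_dim), e.g. pre-trained GloVe vectors
        state = None      # the first layer starts from zero states
        pooled = []
        for lstm in self.layers:
            # Each layer re-reads the same embeddings; passing the previous
            # layer's (h_n, c_n) as the initial state is the refinement step.
            out, state = lstm(embedded, state)
            pooled.append(out.max(dim=1).values)   # (batch, 2 * hidden_dim)
        # Sentence embedding: concatenation of all max-pooled layer outputs.
        return torch.cat(pooled, dim=1)            # (batch, num_layers * 2 * hidden_dim)

encoder = HierarchicalBiLSTMMaxPool()
dummy = torch.randn(8, 20, 300)    # batch of 8 sentences, 20 tokens each
print(encoder(dummy).shape)        # torch.Size([8, 3600])
```

For a sentence-pair task such as NLI, the premise and hypothesis embeddings u and v produced by such an encoder would typically be combined, for instance as [u; v; |u − v|; u ∗ v] in the InferSent style, and fed to a classifier; the encoder alone is what transfer-learning evaluations such as SentEval reuse.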

Type: Article

Copyright: © Cambridge University Press 2019


References

Balazs, J., Marrese-Taylor, E., Loyola, P. and Matsuo, Y. (2017). Refining raw sentence representations for textual entailment recognition via attention. In Proceedings of the 2nd Workshop on Evaluating Vector Space Representations for NLP. Association for Computational Linguistics, pp. 51–55.
Bowman, S.R., Angeli, G., Potts, C. and Manning, C.D. (2015). A large annotated corpus for learning natural language inference. In Proceedings of the 2015 Conference on Empirical Methods in Natural Language Processing (EMNLP). Association for Computational Linguistics, pp. 632–642.
Bowman, S.R., Gauthier, J., Rastogi, A., Gupta, R., Manning, C.D. and Potts, C. (2016). A fast unified model for parsing and sentence understanding. In Proceedings of the 54th Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers). Association for Computational Linguistics, pp. 1466–1477.
Chatzikyriakidis, S., Cooper, R., Dobnik, S. and Larsson, S. (2017). An overview of natural language inference data collection: The way forward? In Proceedings of the Computing Natural Language Inference Workshop.
Chen, Q., Ling, Z.-H. and Zhu, X. (2018). Enhancing sentence embedding with generalized pooling. In Proceedings of the 27th International Conference on Computational Linguistics. Association for Computational Linguistics, pp. 1815–1826.
Chen, Q., Zhu, X., Ling, Z.-H., Wei, S., Jiang, H. and Inkpen, D. (2017a). Enhanced LSTM for natural language inference. In Proceedings of the 55th Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers). Association for Computational Linguistics, pp. 1657–1668.
Chen, Q., Zhu, X., Ling, Z.-H., Wei, S., Jiang, H. and Inkpen, D. (2017b). Recurrent neural network-based sentence encoder with gated attention for natural language inference. In Proceedings of the 2nd Workshop on Evaluating Vector Space Representations for NLP. Association for Computational Linguistics, pp. 36–40.
Conneau, A. and Kiela, D. (2018). SentEval: An evaluation toolkit for universal sentence representations. In Proceedings of the 11th Language Resources and Evaluation Conference. Miyazaki, Japan: European Language Resources Association, pp. 1699–1704.
Conneau, A., Kiela, D., Schwenk, H., Barrault, L. and Bordes, A. (2017). Supervised learning of universal sentence representations from natural language inference data. In Proceedings of the 2017 Conference on Empirical Methods in Natural Language Processing. Association for Computational Linguistics, pp. 670–680.
Conneau, A., Kruszewski, G., Lample, G., Barrault, L. and Baroni, M. (2018). What you can cram into a single vector: Probing sentence embeddings for linguistic properties. In Proceedings of the 56th Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers). Association for Computational Linguistics, pp. 2126–2136.
Glockner, M., Shwartz, V. and Goldberg, Y. (2018). Breaking NLI systems with sentences that require simple lexical inferences. In Proceedings of the 56th Annual Meeting of the Association for Computational Linguistics (Volume 2: Short Papers). Association for Computational Linguistics, pp. 650–655.
Gururangan, S., Swayamdipta, S., Levy, O., Schwartz, R., Bowman, S. and Smith, N.A. (2018). Annotation artifacts in natural language inference data. In Proceedings of the 2018 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, Volume 2 (Short Papers). Association for Computational Linguistics, pp. 107–112.
Hill, F., Cho, K. and Korhonen, A. (2016). Learning distributed representations of sentences from unlabelled data. In Proceedings of the 2016 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies. Association for Computational Linguistics, pp. 1367–1377.
Khot, T., Sabharwal, A. and Clark, P. (2018). SciTail: A textual entailment dataset from science question answering. In AAAI Conference on Artificial Intelligence.
Kingma, D.P. and Ba, J. (2015). Adam: A method for stochastic optimization. In International Conference on Learning Representations (ICLR).
Kiros, R., Zhu, Y., Salakhutdinov, R., Zemel, R.S., Urtasun, R., Torralba, A. and Fidler, S. (2015). Skip-thought vectors. In Advances in Neural Information Processing Systems 28. Curran Associates, Inc., pp. 3294–3302.
Maas, A.L., Hannun, A.Y. and Ng, A.Y. (2013). Rectifier nonlinearities improve neural network acoustic models. In International Conference on Machine Learning.
Mikolov, T., Sutskever, I., Chen, K., Corrado, G. and Dean, J. (2013). Distributed representations of words and phrases and their compositionality. In Advances in Neural Information Processing Systems 26. Curran Associates, Inc., pp. 3111–3119.
Mou, L., Men, R., Li, G., Xu, Y., Zhang, L., Yan, R. and Jin, Z. (2016). Natural language inference by tree-based convolution and heuristic matching. In Proceedings of the 54th Annual Meeting of the Association for Computational Linguistics (Volume 2: Short Papers). Association for Computational Linguistics, pp. 130–136.
Nie, Y. and Bansal, M. (2017). Shortcut-stacked sentence encoders for multi-domain inference. In Proceedings of the 2nd Workshop on Evaluating Vector Space Representations for NLP. Association for Computational Linguistics, pp. 41–45.
Parikh, A.P., Täckström, O., Das, D. and Uszkoreit, J. (2016). A decomposable attention model for natural language inference. In Proceedings of the 2016 Conference on Empirical Methods in Natural Language Processing. Association for Computational Linguistics, pp. 2249–2255.
Pennington, J., Socher, R. and Manning, C.D. (2014). GloVe: Global vectors for word representation. In Proceedings of the 2014 Conference on Empirical Methods in Natural Language Processing (EMNLP). Association for Computational Linguistics, pp. 1532–1543.
Poliak, A., Naradowsky, J., Haldar, A., Rudinger, R. and Van Durme, B. (2018). Hypothesis only baselines in natural language inference. In Proceedings of the Seventh Joint Conference on Lexical and Computational Semantics. Association for Computational Linguistics, pp. 180–191.
Talman, A. and Chatzikyriakidis, S. (2019). Testing the generalization power of neural network models across NLI benchmarks. In Proceedings of the 2019 ACL Workshop BlackboxNLP: Analyzing and Interpreting Neural Networks for NLP. arXiv:1810.09774.
Tay, Y., Tuan, L.A. and Hui, S.C. (2018). Compare, compress and propagate: Enhancing neural architectures with alignment factorization for natural language inference. In Proceedings of the 2018 Conference on Empirical Methods in Natural Language Processing. Association for Computational Linguistics, pp. 1565–1575.
Vendrov, I., Kiros, R., Fidler, S. and Urtasun, R. (2016). Order-embeddings of images and language. In International Conference on Learning Representations (ICLR).
Vu, H. (2017). LCT-MALTA's submission to RepEval 2017 shared task. In Proceedings of the 2nd Workshop on Evaluating Vector Space Representations for NLP. Association for Computational Linguistics, pp. 56–60.
Williams, A., Nangia, N. and Bowman, S.R. (2018). A broad-coverage challenge corpus for sentence understanding through inference. In Proceedings of the 2018 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, Volume 1 (Long Papers). Association for Computational Linguistics, pp. 1112–1122.
Yoon, D., Lee, D. and Lee, S. (2018). Dynamic Self-Attention: Computing Attention over Words Dynamically for Sentence Embedding. arXiv:1808.07383.
Young, P., Lai, A., Hodosh, M. and Hockenmaier, J. (2014). From image descriptions to visual denotations: New similarity metrics for semantic inference over event descriptions. Transactions of the Association for Computational Linguistics 2, pp. 67–78.