Linguistic knowledge-based vocabularies for Neural Machine Translation

Published online by Cambridge University Press:  02 July 2020

Noe Casas*
Affiliation:
TALP Research Center, Universitat Politècnica de Catalunya
Marta R. Costa-jussà
Affiliation:
TALP Research Center, Universitat Politècnica de Catalunya
José A. R. Fonollosa
Affiliation:
TALP Research Center, Universitat Politècnica de Catalunya
Juan A. Alonso
Affiliation:
Lucy Software, United Language Group
Ramón Fanlo
Affiliation:
Lucy Software, United Language Group
*Corresponding author. E-mail: [email protected]

Abstract

Neural networks applied to machine translation need a finite vocabulary to express textual information as a sequence of discrete tokens. The currently dominant subword vocabularies exploit statistically discovered common parts of words to achieve the flexibility of character-based vocabularies without delegating the whole learning of word formation to the neural network. However, they trade this for the inability to apply word-level token associations, which limits their use in semantically rich areas, prevents some transfer learning approaches (e.g., cross-lingual pretrained embeddings), and reduces their interpretability. In this work, we propose new hybrid, linguistically grounded vocabulary definition strategies that keep both the advantages of subword vocabularies and the word-level associations, enabling neural networks to profit from the derived benefits. We test the proposed approaches on both morphologically rich and morphologically poor languages, showing that, for the former, translation quality on out-of-domain texts improves with respect to a strong subword baseline.

Type
Article
Copyright
© The Author(s), 2020. Published by Cambridge University Press
