Comparing example-based and statistical machine translation

ANDY WAY; NANO GOUGH

doi:10.1017/S1351324905003888

Abstract

In previous work (Gough and Way 2004), we showed that our Example-Based Machine Translation (EBMT) system improved with respect to both coverage and quality when seeded with increasing amounts of training data, so that it significantly outperformed the on-line MT system Logomedia according to a wide variety of automatic evaluation metrics. While it is perhaps unsurprising that system performance is correlated with the amount of training data, we address in this paper the question of whether a large-scale, robust EBMT system such as ours can outperform a Statistical Machine Translation (SMT) system. We obtained a large English-French translation memory from Sun Microsystems, from which we randomly extracted a test set of nearly 4K sentence pairs. The remaining data was split into three training sets of roughly 50K, 100K and 200K sentence pairs, in order to measure the effect of increasing the size of the training data on the performance of the two systems. Our main observation is that, contrary to received wisdom in the field, there appears to be little substance to the claim that SMT systems are guaranteed to outperform EBMT systems when confronted with ‘enough’ training data. Our tests on a 4.8 million word bitext indicate that, while SMT appears to outperform our system for French-English on a number of metrics, for English-French the performance of our EBMT system is superior to the baseline SMT model on all but one automatic evaluation metric.

