Find the errors, get the better: Enhancing machine translation via word confidence estimation

NGOC-QUANG LUONG; LAURENT BESACIER; BENJAMIN LECOUTEUX

doi:10.1017/S1351324917000080

Published online by Cambridge University Press: 07 March 2017

Affiliation (all authors): Laboratoire d’Informatique de Grenoble, Campus de Grenoble, 41 Rue des Mathématiques, BP53, F-38041 Grenoble Cedex 9, France

Abstract

This paper presents two novel approaches to improving machine translation (MT) quality by applying word-level quality prediction in a second pass of decoding. The word scores estimated by word confidence estimation systems are used to reconsider the MT hypotheses and select a better candidate, rather than accepting the current sub-optimal one. In the first approach, the selection scope is limited to the MT N-best list, where our proposed re-ranking features are combined with those of the decoder for re-scoring. In the second, the search space is enlarged to the entire search graph, which stores many more of the hypotheses generated during the first pass of decoding. For all paths containing words of the N-best list, we propose an algorithm that strengthens or weakens them depending on the estimated word quality. In both methods, the highest-scoring candidate after the search becomes the final translation. The results show that both approaches improve MT quality over the one-pass baseline, and that search graph re-decoding achieves larger gains (in BLEU score) than N-best list re-ranking.
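
The two second-pass strategies in the abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation: the abstract does not define the re-ranking features, the score-update rule, or any weights, so everything below is an assumption made for illustration. That includes the mean-confidence feature, the `wce_weight`, `reward`, `penalty` and `threshold` parameters, and the toy graph encoding (integer node ids in topological order, start node 0).

```python
from dataclasses import dataclass
from typing import Dict, List, Tuple


@dataclass
class Hypothesis:
    words: List[str]
    decoder_score: float          # first-pass model score (higher is better)
    word_confidence: List[float]  # assumed P(word is correct), one per word


def rerank_nbest(nbest: List[Hypothesis], wce_weight: float = 1.0) -> Hypothesis:
    """Pick the N-best entry maximising decoder score plus a confidence feature.

    Assumption: a single feature (mean word confidence) with a hand-set weight.
    """
    def combined(h: Hypothesis) -> float:
        mean_conf = sum(h.word_confidence) / len(h.word_confidence)
        return h.decoder_score + wce_weight * mean_conf

    return max(nbest, key=combined)


# An edge is (from_node, to_node, word, first-pass score).
Edge = Tuple[int, int, str, float]


def redecode_graph(edges: List[Edge], confidence: Dict[str, float],
                   reward: float = 1.0, penalty: float = 1.0,
                   threshold: float = 0.5) -> Tuple[List[str], float]:
    """Strengthen or weaken edges by word confidence, then find the best path.

    Assumptions: node ids are topologically ordered (from_node < to_node),
    node 0 is the start, and the highest reachable id is the final node.
    Only words that have a confidence estimate are rescored.
    """
    rescored = []
    for src, dst, word, score in edges:
        if word in confidence:
            score += reward if confidence[word] >= threshold else -penalty
        rescored.append((src, dst, word, score))

    # Dynamic programming over edges sorted by source node.
    best: Dict[int, Tuple[float, List[str]]] = {0: (0.0, [])}
    for src, dst, word, score in sorted(rescored):
        if src not in best:
            continue  # source node is unreachable from the start
        candidate = best[src][0] + score
        if dst not in best or candidate > best[dst][0]:
            best[dst] = (candidate, best[src][1] + [word])

    final_node = max(best)
    total, path = best[final_node]
    return path, total


nbest = [
    Hypothesis(["the", "house", "is", "green"], -2.0, [0.9, 0.9, 0.9, 0.2]),
    Hypothesis(["the", "house", "is", "small"], -2.3, [0.9, 0.9, 0.9, 0.9]),
]
chosen = rerank_nbest(nbest, wce_weight=2.0)

graph = [(0, 1, "the", -0.5), (1, 2, "house", -0.5),
         (2, 3, "green", -0.4), (2, 3, "small", -0.6)]
path, total = redecode_graph(graph, {"green": 0.2, "small": 0.9})
```

In this toy data the first pass prefers "green" in both settings. The low confidence assigned to that word is what moves the second pass to the alternative, which is the effect the abstract describes.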

Type: Articles
Information: Natural Language Engineering, Volume 23, Issue 4, July 2017, pp. 617–639

DOI: https://doi.org/10.1017/S1351324917000080
Copyright: © Cambridge University Press 2017
