Advances in Speech-to-Speech Translation Technologies

doi:10.1017/9781108525695.012

12 - Advances in Speech-to-Speech Translation Technologies

Published online by Cambridge University Press: 10 June 2019

Mark Seligman and Alex Waibel

Edited by Meng Ji and Michael Oakes

Meng Ji: University of Sydney
Michael Oakes: University of Wolverhampton

Summary

Automated speech translation, long a dream, has come into widespread use, as enterprises, application developers, and government agencies have become aware. Real-world speech-to-speech translation (S2ST) applications have been tested locally over the past decade in consumer, healthcare, military, and humanitarian missions, and several projects aim to enable automatic cross-language communications at the 2020 Olympic Games to be held in Tokyo. Accordingly, this chapter provides a survey of the field’s technologies, approaches, companies, projects, and target use cases. (It is based on an industry report sponsored by the Translation Automation Users Society, released in 2017.) Sections examine the Past, Present, and Future of speech-to-speech translation. The first provides an orientation concerning issues in speech translation and a capsule history; the second snapshots technical achievements and representative participants in the burgeoning current scene; and the third speculates about future directions, with emphasis on platforms and form factors, big data, knowledge source integration, and the roles of human and automatic translators.

Keywords

speech-to-speech translation; speech recognition; machine translation; big data

Type: Chapter
Information: Advances in Empirical Translation Studies: Developing Translation Resources and Technologies, pp. 217–251

DOI: https://doi.org/10.1017/9781108525695.012

Publisher: Cambridge University Press

Print publication year: 2019

References

Allen, Jonathan, Hunnicutt, Sharon, Carlson, Rolf, and Granstrom, Bjorn (1979). MITalk: The 1979 MIT Text-to-Speech system. The Journal of the Acoustical Society of America 65(S1).

Alshawi, Hayan, Carter, David, Pulman, Steve, Rayner, Manny, and Gambäck, Björn (1992). English-Swedish translation dialogue software. In Translating and the Computer, 14. Aslib, London, November, pp. 10–11.

Brown, Peter F., Della Pietra, Stephen A., Della Pietra, Vincent J., and Mercer, Robert L. (1993). The mathematics of Statistical Machine Translation: Parameter estimation. Computational Linguistics 19(2) (June), 263–311.

Cohen, Jordan (2007). The GALE project: A description and an update. In Institute of Electrical and Electronics Engineers (IEEE) Workshop on Automatic Speech Recognition and Understanding (ASRU). Kyoto, Japan, December 9–13, p. 237.

Eck, Matthias, Lane, Ian, Zhang, Y., and Waibel, Alex (2010). Jibbigo: Speech-to-Speech translation on mobile devices. In Spoken Technology Workshop (SLT), Institute of Electrical and Electronics Engineers (IEEE) 2010. Berkeley, CA, December 12–15, pp. 165–166.

Ehsani, Farzad, Kimzey, Jim, Zuber, Elaine, Master, Demitrios, and Sudre, Karen (2008). Speech to speech translation for nurse patient interaction. In COLING 2008: Proceedings of the Workshop on Speech Processing for Safety Critical Translation and Pervasive Applications. International Committee on Computational Linguistics (COLING) and the Association for Computational Linguistics (ACL). Manchester, England, August, pp. 54–59.

Frandsen, Michael W., Riehemann, Susanne Z., and Precoda, Kristin (2008). IraqComm and FlexTrans: A speech translation system and flexible framework. In Innovations and Advances in Computer Sciences and Engineering. Dordrecht, Heidelberg, London, New York: Springer, pp. 527–532.

Frederking, Robert, Rudnicky, Alexander, Hogan, Christopher, and Lenzo, Kevin (2000). Interactive speech translation in the DIPLOMAT project. Machine Translation 15(1–2), 27–42.

Fügen, Christian, Waibel, Alex, and Kolss, Muntsin (2007). Simultaneous translation of lectures and speeches. Machine Translation 21(4), 209–252.

Gao, Jiang, Yang, Jie, Zhang, Ying, and Waibel, Alex (2004). Automatic detection and translation of text from natural scenes. Institute of Electrical and Electronics Engineers (IEEE) Transactions on Image Processing 13(1) (January), 87–91.

Gao, Yuqing, Gu, Liang, Zhou, Bowen, Sarikaya, Ruhi, Afify, Mohamed, Kuo, Hong-kwang, Zhu, Wei-zhong, Deng, Yonggang, Prosser, Charles, Zhang, Wei, and Besacier, Laurent (2006). IBM MASTOR SYSTEM: Multilingual Automatic Speech-to-speech Translator. In Proceedings of the First International Workshop on Medical Speech Translation, in conjunction with the North American Chapter of the Association for Computational Linguistics, Human Language Technology (NAACL/HLT). New York City, NY, June 9, pp. 57–60.

Kumar, Rohit, Hewavitharana, Sanjika, Zinovieva, Nina, Roy, Matthew E., and Pattison-Gordon, Edward (2015). Error-tolerant speech-to-speech translation. In Proceedings of Machine Translation (MT) Summit XV, Volume 1: MT Researchers’ Track, MT Summit XV. Miami, FL, October 30–November 3, pp. 229–239.

Levin, Lori, Gates, Donna, Lavie, Alon, and Waibel, Alex (1998). An interlingua based on domain actions for machine translation of task-oriented dialogues. In Proceedings of the Fifth International Conference on Spoken Language Processing, ICSLP-98. Sydney, Australia, November 30–December 4, pp. 1155–1158.

Maier-Hein, Lena, Metze, Florian, Schultz, Tanja, and Waibel, Alex (2005). Session independent non-audible speech recognition using surface electromyography. In Proceedings of the 2005 Institute of Electrical and Electronics Engineers (IEEE) Workshop on Automatic Speech Recognition and Understanding, ASRU 2005. Cancun, Mexico, November 27–December 1, pp. 331–336.

Morimoto, Tsuyoshi, Takezawa, Toshiyuki, Yato, Fumihiro, Sagayama, Shigeki, Tashiro, Toshihisa, Nagata, Masaaki, and Kurematsu, Akira (1993). ATR’s speech translation system: ASURA. In EUROSPEECH-1993, the Third European Conference on Speech Communication and Technology. Berlin, September 21–23, pp. 1291–1294.

Och, Franz Josef, and Ney, Hermann (2002). Discriminative training and maximum entropy models for statistical machine translation. In Proceedings of the 40th Annual Meeting of the Association for Computational Linguistics (ACL). Philadelphia, PA, July, pp. 295–302.

Olive, Joseph, Christianson, Caitlin, and McCary, John (eds.) (2011). Handbook of Natural Language Processing and Machine Translation: DARPA Global Autonomous Language Exploitation. New York City, NY: Springer Science and Business Media.

Roe, David B., Moreno, Pedro J., Sproat, Richard, Pereira, Fernando C. N., Riley, Michael D., and Macaron, Alejandro (1992). A spoken language translator for restricted-domain context-free languages. Speech Communication 11(2–3) (June), 311–319.

Seligman, Mark (2000). Nine issues in speech translation. Machine Translation 15(1–2) Special Issue on Spoken Language Translation (June), 149–186.

Seligman, Mark, and Dillinger, Mike (2011). Real-time multi-media translation for healthcare: A usability study. In Proceedings of the 13th Machine Translation (MT) Summit. Xiamen, China, September 19–23, pp. 595–602.

Seligman, Mark, and Dillinger, Mike (2015). Evaluation and revision of a speech translation system for healthcare. In Proceedings of International Workshop for Spoken Language Translation (IWSLT) 2015. Da Nang, Vietnam, December 3–4, pp. 209–216.

Seligman, Mark, Waibel, Alex, and Joscelyne, Andrew (2017). TAUS Speech-to-Speech Translation Technology Report. Available via www.taus.net/think-tank/reports/translate-reports/taus-speech-to-speech-translation-technology-report#download-purchase.

Shimizu, Hiroaki, Neubig, Graham, Sakti, Sakriani, Toda, Tomoki, and Nakamura, Satoshi (2013). Constructing a speech translation system using simultaneous interpretation data. In Proceedings of the International Workshop on Spoken Language Translation (IWSLT) 2013. Heidelberg, Germany, December 5–6, pp. 212–218.

Stallard, David, Prasad, Rohit, Natarajan, Prem, Choi, Fred, Saleem, Shirin, Meermeier, Ralf, Krstovski, Kriste, Ananthakrishnan, Shankar, and Devlin, Jacob (2011). The BBN TransTalk speech-to-speech translation system. In Ipsic, Ivo (ed.), Speech and Language Technologies. InTech, DOI: 10.5772/19405. Available from: www.intechopen.com/books/speech-and-language-technologies/the-bbn-transtalk-speech-to-speech-translation-system.

Suhm, Bernhard, Myers, Brad, and Waibel, Alex (1996a). Interactive recovery from speech recognition errors in speech user interfaces. In Proceedings of the Fourth International Conference on Spoken Language Processing (ICSLP) 1996. Philadelphia, PA, October 3–6, pp. 865–868.

Suhm, Bernhard, Myers, Brad, and Waibel, Alex (1996b). Designing interactive error recovery methods for speech interfaces. In Proceedings of the ACM Conference on Human Factors in Computing Systems (CHI) 1996, Workshop on Designing the User Interface for Speech Recognition Applications. Vancouver, Canada, April 13–18.

Wahlster, Wolfgang (ed.) (2000). Verbmobil: Foundations of Speech-to-Speech Translation. Berlin: Springer.

Waibel, Alex (1987). Phoneme recognition using time-delay neural networks. In Meeting of the Institute of Electrical, Information, and Communication Engineers (IEICE), SP87-100. Tokyo, Japan, December.

Waibel, Alex (1996). Interactive translation of conversational speech. Computer 29(7), July, 41–48.

Waibel, Alex (2002). Portable Object Identification and Translation System. US Patent 20030164819.

Waibel, Alex, Aoki, Naomi, Fügen, Christian, and Rottman, Kay (2016). Hybrid, Offline/Online Speech Translation System. US Patent 9,430,465.

Waibel, Alex, Badran, Ahmed, Black, Alan W., Frederking, Robert, Gates, Donna, Lavie, Alon, Levin, Lori, Lenzo, Kevin, Tomokiyo, Laura Mayfield, Reichert, Jurgen, Schultz, Tanja, Wallace, Dorcas, Woszczyna, Monika, and Zhang, Jing (2003). Speechalator: Two-way speech-to-speech translation on a consumer PDA. In EUROSPEECH-2003, the Eighth European Conference on Speech Communication and Technology. Geneva, Switzerland, September 1–4, pp. 369–372.

Waibel, Alex, and Fügen, Christian (2013). Simultaneous Translation of Open Domain Lectures and Speeches. US Patent 8,504,351.

Waibel, Alex, Hanazawa, Toshiyuki, Hinton, Geoffrey, and Shikano, Kiyohiro (1987). Phoneme recognition using time-delay neural networks. Advanced Telecommunications Research (ATR) Interpreting Telephony Research Laboratories Technical Report. October 30.

Waibel, Alex, Hanazawa, Toshiyuki, Hinton, Geoffrey, Shikano, Kiyohiro, and Lang, Kevin (1989). Phoneme recognition using time-delay neural networks. Institute of Electrical and Electronics Engineers (IEEE) Transactions on Acoustics, Speech and Signal Processing 37(3) (March), 328–339.

Waibel, Alex, Jain, Ajay N., McNair, Arthur E., Saito, Hiroaki, Hauptmann, Alexander G., and Tebelskis, Joe (1991). JANUS: A speech-to-speech translation system using connectionist and symbolic processing strategies. In Proceedings of the International Conference on Acoustics, Speech and Signal Processing (ICASSP) 1991. Toronto, Canada, May 14–17, pp. 793–796.

Waibel, Alex, and Lane, Ian R. (2012a). System and Methods for Maintaining Speech-to-Speech Translation in the Field. US Patent 8,204,739.

Waibel, Alex, and Lane, Ian R. (2012b). Enhanced Speech-to-Speech Translation System and Method for Adding a New Word. US Patent 8,972,268.

Waibel, Alex, and Lane, Ian R. (2015). Speech Translation with Back-Channeling Cues. US Patent 9,070,363 B2.

Waibel, Alex, Lavie, Alon, and Levin, Lori S. (1997). JANUS: A system for translation of conversational speech. Künstliche Intelligenz 11, 51–55.

Yang, Jie, Yang, Weiyi, Denecke, Matthias, and Waibel, Alex (1999). Smart Sight: A tourist assistant system. In The Third International Symposium on Wearable Computers (ISWC) 1999, Digest of Papers. San Francisco, CA, October 18–19, pp. 73–78.

Yang, Jie, Gao, Jiang, Zhang, Ying, and Waibel, Alex (2001a). Towards automatic sign translation. In Proceedings of the First Human Language Technology Conference (HLT) 2001. San Diego, CA, March 18–21.

Yang, Jie, Gao, Jiang, Zhang, Ying, and Waibel, Alex (2001b). An automatic sign recognition and translation system. In Proceedings of the Workshop on Perceptual User Interfaces (PUI) 2001. Orlando, FL, November 15–16.

Zhang, Jing, Chen, Xilin, Yang, Jie, and Waibel, Alex (2002a). A PDA-based sign translator. In Proceedings of the Fourth IEEE International Conference on Multimodal Interfaces (ICMI) 2002. Pittsburgh, PA, October 14–16, pp. 217–222.

Zhang, Ying, Zhao, Bing, Yang, Jie, and Waibel, Alex (2002b). Automatic sign translation. In Proceedings of the Seventh International Conference on Spoken Language Processing (ICSLP) 2002, Second INTERSPEECH Event. Denver, CO, September 16–20.

Zhou, Bowen, Cui, Xiaodong, Huang, Songfang, Cmejrek, Martin, Zhang, Wei, Xue, Jian, Cui, Jia, Xiang, Bing, Daggett, Gregg, Chaudhari, Upendra, Maskey, Sameer, and Marcheret, Etienne (2013). The IBM speech-to-speech translation system for smartphone: Improvements for resource-constrained tasks. Computer Speech and Language 27(2) (February), 592–618.