12 - Advances in Speech-to-Speech Translation Technologies

Published online by Cambridge University Press: 10 June 2019

Meng Ji
Affiliation:
University of Sydney
Michael Oakes
Affiliation:
University of Wolverhampton

Summary

Automated speech translation, long a dream, has come into widespread use, as enterprises, application developers, and government agencies have become aware. Real-world speech-to-speech translation (S2ST) applications have been tested locally over the past decade in consumer, healthcare, military, and humanitarian missions, and several projects aim to enable automatic cross-language communications at the 2020 Olympic Games to be held in Tokyo. Accordingly, this chapter provides a survey of the field’s technologies, approaches, companies, projects, and target use cases. (It is based on an industry report sponsored by the Translation Automation Users Society, released in 2017.) Sections examine the Past, Present, and Future of speech-to-speech translation. The first provides an orientation concerning issues in speech translation and a capsule history; the second gives a snapshot of technical achievements and representative participants in the burgeoning current scene; and the third speculates about future directions, with emphasis on platforms and form factors, big data, knowledge source integration, and the roles of human and automatic translators.

Type: Chapter
Information: Advances in Empirical Translation Studies: Developing Translation Resources and Technologies, pp. 217–251
Publisher: Cambridge University Press
Print publication year: 2019

References

Allen, Jonathan, Hunnicutt, Sharon, Carlson, Rolf, and Granstrom, Bjorn (1979). MITalk: The 1979 MIT Text-to-Speech system. The Journal of the Acoustical Society of America 65(S1).
Alshawi, Hayan, Carter, David, Pulman, Steve, Rayner, Manny, and Gambäck, Björn (1992). English-Swedish translation dialogue software. In Translating and the Computer, 14. Aslib, London, November, pp. 10–11.
Brown, Peter F., Della Pietra, Stephen A., Della Pietra, Vincent J., and Mercer, Robert L. (1993). The mathematics of Statistical Machine Translation: Parameter estimation. Computational Linguistics 19(2) (June), 263–311.
Cohen, Jordan (2007). The GALE project: A description and an update. In Institute of Electrical and Electronics Engineers (IEEE) Workshop on Automatic Speech Recognition and Understanding (ASRU). Kyoto, Japan, December 9–13, p. 237.
Eck, Matthias, Lane, Ian, Zhang, Y., and Waibel, Alex (2010). Jibbigo: Speech-to-Speech translation on mobile devices. In Spoken Technology Workshop (SLT), Institute of Electrical and Electronics Engineers (IEEE) 2010. Berkeley, CA, December 12–15, pp. 165–166.
Ehsani, Farzad, Kimzey, Jim, Zuber, Elaine, Master, Demitrios, and Sudre, Karen (2008). Speech to speech translation for nurse patient interaction. In COLING 2008: Proceedings of the Workshop on Speech Processing for Safety Critical Translation and Pervasive Applications. International Committee on Computational Linguistics (COLING) and the Association for Computational Linguistics (ACL). Manchester, England, August, pp. 54–59.
Frandsen, Michael W., Riehemann, Susanne Z., and Precoda, Kristin (2008). IraqComm and FlexTrans: A speech translation system and flexible framework. In Innovations and Advances in Computer Sciences and Engineering. Dordrecht, Heidelberg, London, New York: Springer, pp. 527–532.
Frederking, Robert, Rudnicky, Alexander, Hogan, Christopher, and Lenzo, Kevin (2000). Interactive speech translation in the DIPLOMAT project. Machine Translation 15(1–2), 27–42.
Fügen, Christian, Waibel, Alex, and Kolss, Muntsin (2007). Simultaneous translation of lectures and speeches. Machine Translation 21(4), 209–252.
Gao, Jiang, Yang, Jie, Zhang, Ying, and Waibel, Alex (2004). Automatic detection and translation of text from natural scenes. Institute of Electrical and Electronics Engineers (IEEE) Transactions on Image Processing 13(1) (January), 87–91.
Gao, Yuqing, Gu, Liang, Zhou, Bowen, Sarikaya, Ruhi, Afify, Mohamed, Kuo, Hong-kwang, Zhu, Wei-zhong, Deng, Yonggang, Prosser, Charles, Zhang, Wei, and Besacier, Laurent (2006). IBM MASTOR SYSTEM: Multilingual Automatic Speech-to-speech Translator. In Proceedings of the First International Workshop on Medical Speech Translation, in conjunction with the North American Chapter of the Association for Computational Linguistics, Human Language Technology (NAACL/HLT). New York City, NY, June 9, pp. 57–60.
Kumar, Rohit, Hewavitharana, Sanjika, Zinovieva, Nina, Roy, Matthew E., and Pattison-Gordon, Edward (2015). Error-tolerant speech-to-speech translation. In Proceedings of Machine Translation (MT) Summit XV, Volume 1: MT Researchers’ Track, MT Summit XV. Miami, FL, October 30–November 3, pp. 229–239.
Levin, Lori, Gates, Donna, Lavie, Alon, and Waibel, Alex (1998). An interlingua based on domain actions for machine translation of task-oriented dialogues. In Proceedings of the Fifth International Conference on Spoken Language Processing, ICSLP-98. Sydney, Australia, November 30–December 4, pp. 1155–1158.
Maier-Hein, Lena, Metze, Florian, Schultz, Tanja, and Waibel, Alex (2005). Session independent non-audible speech recognition using surface electromyography. In Proceedings of the 2005 Institute of Electrical and Electronics Engineers (IEEE) Workshop on Automatic Speech Recognition and Understanding, ASRU 2005. Cancun, Mexico, November 27–December 1, pp. 331–336.
Morimoto, Tsuyoshi, Takezawa, Toshiyuki, Yato, Fumihiro, Sagayama, Shigeki, Tashiro, Toshihisa, Nagata, Masaaki, and Kurematsu, Akira (1993). ATR’s speech translation system: ASURA. In EUROSPEECH-1993, the Third European Conference on Speech Communication and Technology. Berlin, September 21–23, pp. 1291–1294.
Och, Franz Josef, and Ney, Hermann (2002). Discriminative training and maximum entropy models for statistical machine translation. In Proceedings of the 40th Annual Meeting of the Association for Computational Linguistics (ACL). Philadelphia, PA, July, pp. 295–302.
Olive, Joseph, Christianson, Caitlin, and McCary, John (eds.) (2011). Handbook of Natural Language Processing and Machine Translation: DARPA Global Autonomous Language Exploitation. New York City, NY: Springer Science and Business Media.
Roe, David B., Moreno, Pedro J., Sproat, Richard, Pereira, Fernando C. N., Riley, Michael D., and Macaron, Alejandro (1992). A spoken language translator for restricted-domain context-free languages. Speech Communication 11(2–3) (June), 311–319.
Seligman, Mark (2000). Nine issues in speech translation. Machine Translation 15(1–2), Special Issue on Spoken Language Translation (June), 149–186.
Seligman, Mark, and Dillinger, Mike (2011). Real-time multi-media translation for healthcare: A usability study. In Proceedings of the 13th Machine Translation (MT) Summit. Xiamen, China, September 19–23, pp. 595–602.
Seligman, Mark, and Dillinger, Mike (2015). Evaluation and revision of a speech translation system for healthcare. In Proceedings of International Workshop for Spoken Language Translation (IWSLT) 2015. Da Nang, Vietnam, December 3–4, pp. 209–216.
Seligman, Mark, Waibel, Alex, and Joscelyne, Andrew (2017). TAUS Speech-to-Speech Translation Technology Report. Available via www.taus.net/think-tank/reports/translate-reports/taus-speech-to-speech-translation-technology-report#download-purchase.
Shimizu, Hiroaki, Neubig, Graham, Sakti, Sakriani, Toda, Tomoki, and Nakamura, Satoshi (2013). Constructing a speech translation system using simultaneous interpretation data. In Proceedings of the International Workshop on Spoken Language Translation (IWSLT) 2013. Heidelberg, Germany, December 5–6, pp. 212–218.
Stallard, David, Prasad, Rohit, Natarajan, Prem, Choi, Fred, Saleem, Shirin, Meermeier, Ralf, Krstovski, Kriste, Ananthakrishnan, Shankar, and Devlin, Jacob (2011). The BBN TransTalk speech-to-speech translation system. In Ipsic, Ivo (ed.), Speech and Language Technologies. InTech, DOI:10.5772/19405. Available from: www.intechopen.com/books/speech-and-language-technologies/the-bbn-transtalk-speech-to-speech-translation-system.
Suhm, Bernhard, Myers, Brad, and Waibel, Alex (1996a). Interactive recovery from speech recognition errors in speech user interfaces. In Proceedings of the Fourth International Conference on Spoken Language Processing (ICSLP) 1996. Philadelphia, PA, October 3–6, pp. 865–868.
Suhm, Bernhard, Myers, Brad, and Waibel, Alex (1996b). Designing interactive error recovery methods for speech interfaces. In Proceedings of the ACM Conference on Human Factors in Computing Systems (CHI) 1996, Workshop on Designing the User Interface for Speech Recognition Applications. Vancouver, Canada, April 13–18.
Wahlster, Wolfgang (ed.) (2000). Verbmobil: Foundations of Speech-to-Speech Translation. Berlin: Springer.
Waibel, Alex (1987). Phoneme recognition using time-delay neural networks. In Meeting of the Institute of Electrical, Information, and Communication Engineers (IEICE), SP87-100. Tokyo, Japan, December.
Waibel, Alex (1996). Interactive translation of conversational speech. Computer 29(7) (July), 41–48.
Waibel, Alex (2002). Portable Object Identification and Translation System. US Patent 20030164819.
Waibel, Alex, Aoki, Naomi, Fügen, Christian, and Rottman, Kay (2016). Hybrid, Offline/Online Speech Translation System. US Patent 9,430,465.
Waibel, Alex, Badran, Ahmed, Black, Alan W., Frederking, Robert, Gates, Donna, Lavie, Alon, Levin, Lori, Lenzo, Kevin, Tomokiyo, Laura Mayfield, Reichert, Jurgen, Schultz, Tanja, Wallace, Dorcas, Woszczyna, Monika, and Zhang, Jing (2003). Speechalator: Two-way speech-to-speech translation on a consumer PDA. In EUROSPEECH-2003, the Eighth European Conference on Speech Communication and Technology. Geneva, Switzerland, September 1–4, pp. 369–372.
Waibel, Alex, and Fügen, Christian (2013). Simultaneous Translation of Open Domain Lectures and Speeches. US Patent 8,504,351.
Waibel, Alex, Hanazawa, Toshiyuki, Hinton, Geoffrey, and Shikano, Kiyohiro (1987). Phoneme recognition using time-delay neural networks. Advanced Telecommunications Research (ATR) Interpreting Telephony Research Laboratories Technical Report. October 30.
Waibel, Alex, Hanazawa, Toshiyuki, Hinton, Geoffrey, Shikano, Kiyohiro, and Lang, Kevin (1989). Phoneme recognition using time-delay neural networks. Institute of Electrical and Electronics Engineers (IEEE) Transactions on Acoustics, Speech and Signal Processing 37(3) (March), 328–339.
Waibel, Alex, Jain, Ajay N., McNair, Arthur E., Saito, Hiroaki, Hauptmann, Alexander G., and Tebelskis, Joe (1991). JANUS: A speech-to-speech translation system using connectionist and symbolic processing strategies. In Proceedings of the International Conference on Acoustics, Speech and Signal Processing (ICASSP) 1991. Toronto, Canada, May 14–17, pp. 793–796.
Waibel, Alex, and Lane, Ian R. (2012a). System and Methods for Maintaining Speech-to-Speech Translation in the Field. US Patent 8,204,739.
Waibel, Alex, and Lane, Ian R. (2012b). Enhanced Speech-to-Speech Translation System and Method for Adding a New Word. US Patent 8,972,268.
Waibel, Alex, and Lane, Ian R. (2015). Speech Translation with Back-Channeling Cues. US Patent 9,070,363 B2.
Waibel, Alex, Lavie, Alon, and Levin, Lori S. (1997). JANUS: A system for translation of conversational speech. Künstliche Intelligenz 11, 51–55.
Yang, Jie, Yang, Weiyi, Denecke, Matthias, and Waibel, Alex (1999). Smart Sight: A tourist assistant system. In The Third International Symposium on Wearable Computers (ISWC) 1999, Digest of Papers. San Francisco, CA, October 18–19, pp. 73–78.
Yang, Jie, Gao, Jiang, Zhang, Ying, and Waibel, Alex (2001a). Towards automatic sign translation. In Proceedings of the First Human Language Technology Conference (HLT) 2001. San Diego, CA, March 18–21.
Yang, Jie, Gao, Jiang, Zhang, Ying, and Waibel, Alex (2001b). An automatic sign recognition and translation system. In Proceedings of the Workshop on Perceptual User Interfaces (PUI) 2001. Orlando, FL, November 15–16.
Zhang, Jing, Chen, Xilin, Yang, Jie, and Waibel, Alex (2002a). A PDA-based sign translator. In Proceedings of the Fourth IEEE International Conference on Multimodal Interfaces (ICMI) 2002. Pittsburgh, PA, October 14–16, pp. 217–222.
Zhang, Ying, Zhao, Bing, Yang, Jie, and Waibel, Alex (2002b). Automatic sign translation. In Proceedings of the Seventh International Conference on Spoken Language Processing (ICSLP) 2002, Second INTERSPEECH Event. Denver, CO, September 16–20.
Zhou, Bowen, Cui, Xiaodong, Huang, Songfang, Cmejrek, Martin, Zhang, Wei, Xue, Jian, Cui, Jia, Xiang, Bing, Daggett, Gregg, Chaudhari, Upendra, Maskey, Sameer, and Marcheret, Etienne (2013). The IBM speech-to-speech translation system for smartphone: Improvements for resource-constrained tasks. Computer Speech and Language 27(2) (February), 592–618.