Bootstrapping spoken dialogue systems by exploiting reusable libraries

GIUSEPPE DI FABBRIZIO; GOKHAN TUR; DILEK HAKKANI-TÜR; MAZIN GILBERT; BERNARD RENGER; DAVID GIBBON; ZHU LIU; BEHZAD SHAHRARAY

doi:10.1017/S1351324907004561

Published online by Cambridge University Press: 01 July 2008

Affiliations:
GIUSEPPE DI FABBRIZIO, GOKHAN TUR, DILEK HAKKANI-TÜR, MAZIN GILBERT, BERNARD RENGER: AT&T Labs - Research, 180 Park Avenue, Florham Park, NJ 07932, USA
DAVID GIBBON, ZHU LIU, BEHZAD SHAHRARAY: AT&T Labs - Research, 200 Laurel Avenue South, Middletown, NJ 07748, USA

Abstract

Building natural language spoken dialogue systems requires large amounts of human-transcribed and labeled speech utterances to reach useful operational performance. Furthermore, the design of such complex systems involves several manual steps. The User Experience (UE) expert analyzes and defines by hand the system's core functionalities: the semantic scope (call-types) and the dialogue manager strategy that will drive the human–machine interaction. This approach is labor-intensive and error-prone, since it involves several nontrivial design decisions that can be evaluated only after the system is actually deployed. Moreover, scalability is compromised by the time, cost, and high level of UE know-how needed to reach a consistent design. We propose a novel approach for bootstrapping spoken dialogue systems based on the reuse of existing transcribed and labeled data, common reusable dialogue templates, generic language and understanding models, and a consistent design process. We demonstrate that our approach reduces design and development time while providing an effective system without any application-specific data.

Type: Papers
Information: Natural Language Engineering, Volume 14, Issue 3, July 2008, pp. 313–335

DOI: https://doi.org/10.1017/S1351324907004561
Copyright: Copyright © Cambridge University Press 2007
