Bootstrapping spoken dialogue systems by exploiting reusable libraries

Published online by Cambridge University Press: 01 July 2008

GIUSEPPE DI FABBRIZIO
Affiliation:
AT&T Labs—Research, 180 Park Avenue, Florham Park, NJ 07932, USA
GOKHAN TUR
Affiliation:
AT&T Labs—Research, 180 Park Avenue, Florham Park, NJ 07932, USA
DILEK HAKKANI-TÜR
Affiliation:
AT&T Labs—Research, 180 Park Avenue, Florham Park, NJ 07932, USA
MAZIN GILBERT
Affiliation:
AT&T Labs—Research, 180 Park Avenue, Florham Park, NJ 07932, USA
BERNARD RENGER
Affiliation:
AT&T Labs—Research, 180 Park Avenue, Florham Park, NJ 07932, USA
DAVID GIBBON
Affiliation:
AT&T Labs—Research, 200 Laurel Avenue South, Middletown, NJ 07748, USA
ZHU LIU
Affiliation:
AT&T Labs—Research, 200 Laurel Avenue South, Middletown, NJ 07748, USA
BEHZAD SHAHRARAY
Affiliation:
AT&T Labs—Research, 200 Laurel Avenue South, Middletown, NJ 07748, USA

Abstract

Building natural language spoken dialogue systems requires large amounts of human-transcribed and labeled speech utterances to reach useful operational performance. Furthermore, the design of such complex systems consists of several manual steps. The User Experience (UE) expert analyzes and defines by hand the system's core functionalities: the semantic scope of the system (its call-types) and the dialogue manager strategy that will drive the human–machine interaction. This approach is labor-intensive and error-prone, since it involves several nontrivial design decisions that can be evaluated only after the system has actually been deployed. Moreover, scalability is compromised by the time, cost, and high level of UE know-how needed to reach a consistent design. We propose a novel approach for bootstrapping spoken dialogue systems based on the reuse of existing transcribed and labeled data, common reusable dialogue templates, generic language and understanding models, and a consistent design process. We demonstrate that our approach reduces design and development time while providing an effective system without any application-specific data.

Type
Papers
Copyright
Copyright © Cambridge University Press 2007
