Recent research advances in Reinforcement Learning in Spoken Dialogue Systems

Matthew Frampton; Oliver Lemon

doi:10.1017/S0269888909990166

Published online by Cambridge University Press: 01 December 2009

Matthew Frampton: Center for the Study of Language and Information, Stanford University, Stanford, CA 94305-4101, USA
Oliver Lemon: School of Mathematical and Computer Sciences, Heriot Watt University, Edinburgh EH14 4AS, UK

Abstract

This paper will summarize and analyze the work of the different research groups who have recently made significant contributions in using Reinforcement Learning techniques to learn dialogue strategies for Spoken Dialogue Systems (SDSs). This use of stochastic planning and learning has become an important research area in the past 10 years, since it promises automatic data-driven optimization of the behavior of SDSs that were previously hand-coded by expert developers. We survey the most important developments in the field, compare and contrast the different approaches, and describe current open problems.

Type: Articles
Information: The Knowledge Engineering Review, Volume 24, Issue 4, December 2009, pp. 375–408

DOI: https://doi.org/10.1017/S0269888909990166
Copyright: Copyright © Cambridge University Press 2009
