13 - Comparative evaluation and shared tasks for NLG in interactive systems

from Part V - Evaluation and shared tasks

Published online by Cambridge University Press: 05 July 2014

Anja Belz, University of Brighton
Helen Hastie, Heriot-Watt University
Amanda Stent, AT&T Research, Florham Park, New Jersey
Srinivas Bangalore, AT&T Research, Florham Park, New Jersey

Summary

Introduction

Natural Language Generation (NLG) has strong evaluation traditions, in particular in the area of user evaluation of NLG-based application systems, as conducted for example in the M-PIRO (Isard et al., 2003), COMIC (Foster and White, 2005), and SumTime (Reiter and Belz, 2009) projects. There are also examples of embedded evaluation of NLG components compared to non-NLG baselines, including the DIAG (Di Eugenio et al., 2002), STOP (Reiter et al., 2003a), and SkillSum (Williams and Reiter, 2008) evaluations, and of different versions of the same component compared to one another, e.g., in the ILEX (Cox et al., 1999), SPoT (Rambow et al., 2001), and CLASSiC (Janarthanam et al., 2011) projects. Starting with Langkilde and Knight's work (Knight and Langkilde, 2000), automatic evaluation against reference texts also began to be used, especially in surface realization. What was missing, until 2006, were comparative evaluation results for directly comparable but independently developed NLG systems.

In 1981, Spärck Jones wrote that information retrieval (IR) lacked consolidation and the ability to progress collectively, and that this was substantially because there was no commonly agreed framework for describing and evaluating systems (Spärck Jones, 1981, p. 245). Since then, various sub-disciplines of natural language processing (NLP) and speech technology have consolidated results and progressed collectively through developing common task definitions and evaluation frameworks, in particular in the context of shared-task evaluation campaigns (STECs), and have achieved successful commercial deployment of a range of technologies (e.g., speech recognition software, document retrieval, and dialogue systems).

References

Ai, H., Raux, A., Bohus, D., Eskenazi, M., and Litman, D. (2007). Comparing spoken dialog corpora collected with recruited subjects versus real users. In Proceedings of the SIGdial Workshop on Discourse and Dialogue (SIGDIAL), pages 124–131, Antwerp, Belgium. Association for Computational Linguistics.
Androutsopoulos, I., Kallonis, S., and Karkaletsis, V. (2005). Exploiting OWL ontologies in the multilingual generation of object descriptions. In Proceedings of the European Workshop on Natural Language Generation (ENLG), pages 150–155, Aberdeen, Scotland. Association for Computational Linguistics.
Angeli, G., Liang, P., and Klein, D. (2010). A simple domain-independent probabilistic approach to generation. In Proceedings of the Conference on Empirical Methods in Natural Language Processing (EMNLP), pages 502–512, Boston, MA. Association for Computational Linguistics.
Bagga, A. and Baldwin, B. (1998). Algorithms for scoring coreference chains. In Proceedings of the International Conference on Language Resources and Evaluation (LREC), pages 563–566, Granada, Spain. European Language Resources Association.
Basile, V. and Bos, J. (2011). Towards generating text from discourse representation structures. In Proceedings of the European Workshop on Natural Language Generation (ENLG), pages 145–150, Nancy, France. Association for Computational Linguistics.
Belz, A. (2007). Probabilistic generation of weather forecast texts. In Proceedings of Human Language Technologies: The Conference of the North American Chapter of the Association for Computational Linguistics (HLT-NAACL), pages 164–171, Rochester, NY. Association for Computational Linguistics.
Belz, A. (2009). Prodigy-METEO: Pre-alpha release notes. Technical Report NLTG-09-01, Natural Language Technology Group, CMIS, University of Brighton.
Belz, A. (2010). GREC named entity recognition and GREC named entity regeneration challenges 2010: Participants' pack. Technical Report NLTG-10-01, Natural Language Technology Group, University of Brighton.
Belz, A. and Kow, E. (2009). System building cost vs. output quality in data-to-text generation. In Proceedings of the European Workshop on Natural Language Generation (ENLG), pages 16–24, Athens, Greece. Association for Computational Linguistics.
Belz, A., Kow, E., Viethen, J., and Gatt, A. (2009). The GREC main subject reference generation challenge 2009: Overview and evaluation results. In Proceedings of the Workshop on Language Generation and Summarisation, pages 79–87, Suntec, Singapore. Association for Computational Linguistics.
Belz, A. and Reiter, E. (2006). Comparing automatic and human evaluation of NLG systems. In Proceedings of the Conference of the European Chapter of the Association for Computational Linguistics (EACL), pages 313–320, Trento, Italy. Association for Computational Linguistics.
Belz, A., White, M., Espinosa, D., Kow, E., Hogan, D., and Stent, A. (2011). The first surface realisation shared task: Overview and evaluation results. In Proceedings of the Generation Challenges Session at the European Workshop on Natural Language Generation, pages 217–226, Nancy, France. Association for Computational Linguistics.
Black, A. W., Burger, S., Conkie, A., Hastie, H., Keizer, S., Lemon, O., Merigaud, N., Parent, G., Schubiner, G., Thomson, B., Williams, J. D., Yu, K., Young, S., and Eskenazi, M. (2011). Spoken dialog challenge 2010: Comparison of live and control test results. In Proceedings of the SIGdial Conference on Discourse and Dialogue (SIGDIAL), pages 2–7, Portland, OR. Association for Computational Linguistics.
Bohnet, B. and Dale, R. (2005). Viewing referring expression generation as search. In Proceedings of the International Joint Conference on Artificial Intelligence (IJCAI), pages 1004–1009, Edinburgh, Scotland. International Joint Conference on Artificial Intelligence.
Bohnet, B., Wanner, L., Mille, S., and Burga, A. (2010). Broad coverage multilingual deep sentence generation with a stochastic multi-level realizer. In Proceedings of the International Conference on Computational Linguistics (COLING), pages 98–106, Beijing, China. International Committee on Computational Linguistics.
Bonneau-Maynard, H., Devillers, L., and Rosset, S. (2000). Predictive performance of dialog systems. In Proceedings of the International Conference on Language Resources and Evaluation (LREC), Athens, Greece. European Language Resources Association.
Boyd, A. and Meurers, D. (2011). Data-driven correction of function words in non-native English. In Proceedings of the European Workshop on Natural Language Generation (ENLG), pages 267–269, Nancy, France. Association for Computational Linguistics.
Cahill, A. and van Genabith, J. (2006). Robust PCFG-based generation using automatically acquired LFG approximations. In Proceedings of the International Conference on Computational Linguistics and the Annual Meeting of the Association for Computational Linguistics (COLING-ACL), pages 1033–1040, Sydney, Australia. Association for Computational Linguistics.
Callaway, C. B. (2003). Do we need deep generation of disfluent dialogue? In Working Papers of the AAAI Spring Symposium on Natural Language Generation in Spoken and Written Dialogue, pages 6–11, Stanford, CA. AAAI Press.
Cox, R., O'Donnell, M., and Oberlander, J. (1999). Dynamic versus static hypermedia in museum education: An evaluation of ILEX, the intelligent labelling explorer. In Proceedings of the Conference on Artificial Intelligence in Education, pages 181–188, Le Mans, France. International Artificial Intelligence in Education Society.
Dale, R. (1989). Cooking up referring expressions. In Proceedings of the Annual Meeting of the Association for Computational Linguistics (ACL), pages 68–75, Vancouver, Canada. Association for Computational Linguistics.
Dale, R. (1990). Generating recipes: An overview of Epicure. In Dale, R., Mellish, C., and Zock, M., editors, Current Research in Natural Language Generation, pages 229–255. Academic Press, San Diego, CA.
Dale, R. and Kilgarriff, A. (2011). Helping our own: The HOO 2011 pilot shared task. In Proceedings of the European Workshop on Natural Language Generation (ENLG), pages 242–249, Nancy, France. Association for Computational Linguistics.
Dale, R. and Reiter, E. (1995). Computational interpretation of the Gricean maxims in the generation of referring expressions. Cognitive Science, 19(2):233–263.
Dethlefs, N. and Cuayáhuitl, H. (2011). Combining hierarchical reinforcement learning and Bayesian networks for natural language generation in situated dialogue. In Proceedings of the European Workshop on Natural Language Generation (ENLG), pages 110–120, Nancy, France. Association for Computational Linguistics.
Di Eugenio, B. (2007). Shared tasks and comparative evaluation for NLG: To go ahead, or not to go ahead? In Proceedings of the NSF Workshop on Shared Tasks and Comparative Evaluation in Natural Language Generation, Arlington, VA. National Science Foundation.
Di Eugenio, B., Glass, M., and Trolio, M. J. (2002). The DIAG experiments: Natural language generation for intelligent tutoring systems. In Proceedings of the International Conference on Natural Language Generation (INLG), pages 120–127, Arden Conference Center, NY. Association for Computational Linguistics.
Doddington, G. (2002). Automatic evaluation of machine translation quality using n-gram co-occurrence statistics. In Proceedings of the Human Language Technology Conference (HLT), pages 138–145, San Diego, CA. Morgan Kaufmann.
Eckert, W., Levin, E., and Pieraccini, R. (1997). User modeling for spoken dialogue system evaluation. In Proceedings of the IEEE Workshop on Automatic Speech Recognition and Understanding (ASRU), pages 80–87, Santa Barbara, CA. Institute of Electrical and Electronics Engineers.
Forbes-Riley, K. and Litman, D. (2009). Adapting to student uncertainty improves tutoring dialogues. In Proceedings of the Artificial Intelligence in Education Conference (AIED), pages 33–40, Brighton, UK. IOS Press.
Foster, M. E. and White, M. (2005). Assessing the impact of adaptive generation in the COMIC multimodal dialogue system. In Proceedings of the Workshop on Knowledge and Reasoning in Practical Dialogue Systems (KRPDS), pages 24–31, Edinburgh, Scotland. International Joint Conference on Artificial Intelligence.
Gatt, A. and Belz, A. (2010). Introducing shared tasks to NLG: The TUNA shared task evaluation challenges. In Krahmer, E. and Theune, M., editors, Empirical Methods in Natural Language Generation, pages 264–293. Springer, Berlin, Heidelberg.
Gatt, A., van der Sluis, I., and van Deemter, K. (2007). Evaluating algorithms for the generation of referring expressions using a balanced corpus. In Proceedings of the European Workshop on Natural Language Generation (ENLG), pages 49–56, Saarbrücken, Germany. Association for Computational Linguistics.
Goldberg, E., Driedger, N., and Kittredge, R. I. (1994). Using natural-language processing to produce weather forecasts. IEEE Expert: Intelligent Systems and Their Applications, 9(2):45–53.
Gupta, S. and Stent, A. (2005). Automatic evaluation of referring expression generation using corpora. In Proceedings of the Workshop on Using Corpora for Natural Language Generation (UCNLG), Birmingham, UK. ITRI, University of Brighton.
Hartikainen, M., Salonen, E.-P., and Turunen, M. (2004). Subjective evaluation of spoken dialogue systems using SERVQUAL method. In Proceedings of the International Conference on Spoken Language Processing (INTERSPEECH), pages 2273–2276, Jeju Island, Korea. International Speech Communication Association.
Henderson, J., Lemon, O., and Georgila, K. (2008). Hybrid reinforcement/supervised learning of dialogue policies from fixed datasets. Computational Linguistics, 34(4):487–513.
Isard, A., Oberlander, J., Androutsopoulos, I., and Matheson, C. (2003). Speaking the users' languages. IEEE Intelligent Systems Magazine: Special Issue “Advances in Natural Language Processing”, 18(1):40–45.
Janarthanam, S., Hastie, H., Lemon, O., and Liu, X. (2011). “The day after the day after tomorrow?”: A machine learning approach to adaptive temporal expression generation: Training and evaluation with real users. In Proceedings of the SIGdial Conference on Discourse and Dialogue (SIGDIAL), pages 142–151, Portland, OR. Association for Computational Linguistics.
Janarthanam, S. and Lemon, O. (2009). A Wizard of Oz environment to study referring expression generation in a situated spoken dialogue task. In Proceedings of the European Workshop on Natural Language Generation (ENLG), pages 94–97, Athens, Greece. Association for Computational Linguistics.
Janarthanam, S. and Lemon, O. (2010). Learning to adapt to unknown users: Referring expression generation in spoken dialogue systems. In Proceedings of the Annual Meeting of the Association for Computational Linguistics (ACL), pages 69–78, Uppsala, Sweden. Association for Computational Linguistics.
Janarthanam, S. and Lemon, O. (2011). The GRUVE challenge: Generating routes under uncertainty in virtual environments. In Proceedings of the European Workshop on Natural Language Generation (ENLG), pages 208–211, Nancy, France. Association for Computational Linguistics.
Johansson, R. and Nugues, P. (2007). Extended constituent-to-dependency conversion for English. In Proceedings of the 16th Nordic Conference on Computational Linguistics, pages 105–112, Tartu, Estonia. Northern European Association for Language Technology.
Jordan, P. W. (2000). Can nominal expressions achieve multiple goals? An empirical study. In Proceedings of the Annual Meeting of the Association for Computational Linguistics (ACL), pages 142–149, Hong Kong. Association for Computational Linguistics.
Jordan, P. W. and Walker, M. A. (2005). Learning content selection rules for generating object descriptions in dialogue. Journal of Artificial Intelligence Research, 24(1):157–194.
Jurcicek, F., Keizer, S., Gasic, M., Mairesse, F., Thomson, B., Yu, K., and Young, S. (2011). Real user evaluation of spoken dialogue systems using Amazon Mechanical Turk. In Proceedings of the International Conference on Spoken Language Processing (INTERSPEECH), pages 3061–3064, Florence, Italy. International Speech Communication Association.
Karasimos, A. and Isard, A. (2004). Multi-lingual evaluation of a natural language generation system. In Proceedings of the International Conference on Language Resources and Evaluation (LREC), Lisbon, Portugal. European Language Resources Association.
Keeney, R. L. and Raiffa, H. (1976). Decisions with Multiple Objectives: Preferences and Value Tradeoffs. John Wiley & Sons, New York, NY.
Knight, K. and Langkilde, I. (2000). Preserving ambiguities in generation via automata intersection. In Proceedings of the National Conference on Artificial Intelligence and the Conference on Innovative Applications of Artificial Intelligence (AAAI/IAAI), pages 697–702, Austin, TX. AAAI Press.
Koller, A., Striegnitz, K., Gargett, A., Byron, D., Cassell, J., Dale, R., Moore, J., and Oberlander, J. (2010). Report on the second NLG challenge on generating instructions in virtual environments (GIVE-2). In Proceedings of the International Conference on Natural Language Generation (INLG), pages 243–250, Trim, Ireland. Association for Computational Linguistics.
Lamel, L., Rosset, S., Gauvain, J.-L., Bennacef, S., Garnier-Rizet, M., and Prouts, B. (2000). The LIMSI ARISE system. Speech Communication, 31(4):339–354.
Langkilde-Geary, I. (2002). An empirical verification of coverage and correctness for a general-purpose sentence generator. In Proceedings of the International Conference on Natural Language Generation (INLG), pages 17–24, Arden Conference Center, NY. Association for Computational Linguistics.
Langner, B. (2010). Data-driven Natural Language Generation: Making Machines Talk Like Humans Using Natural Corpora. PhD thesis, Language Technologies Institute, School of Computer Science, Carnegie Mellon University.
Lavie, A. and Agarwal, A. (2007). METEOR: An automatic metric for MT evaluation with high levels of correlation with human judgments. In Proceedings of the ACL Workshop on Statistical Machine Translation, pages 228–231, Prague, Czech Republic. Association for Computational Linguistics.
Lemon, O. (2008). Adaptive natural language generation in dialogue using reinforcement learning. In Proceedings of the Workshop on the Semantics and Pragmatics of Dialogue (SEMDIAL), pages 141–148, London, UK. SemDial.
Liu, X., Rieser, V., and Lemon, O. (2009). A Wizard of Oz interface to study information presentation strategies for spoken dialogue systems. In Proceedings of the Europe-Asia Spoken Dialogue Systems Technology Workshop, Kloster Irsee, Germany.
Luo, X. (2005). On coreference resolution performance metrics. In Proceedings of the Joint Human Language Technology Conference and Conference on Empirical Methods in Natural Language Processing (HLT-EMNLP), pages 25–32, Vancouver, Canada. Association for Computational Linguistics.
Marcus, M., Kim, G., Marcinkiewicz, M. A., MacIntyre, R., Bies, A., Ferguson, M., Katz, K., and Schasberger, B. (1994). The Penn Treebank: Annotating predicate argument structure. In Proceedings of the Human Language Technology Conference (HLT), pages 114–119, Plainsboro, NJ. Association for Computational Linguistics.
Meyers, A., Reeves, R., Macleod, C., Szekely, R., Zielinska, V., Young, B., and Grishman, R. (2004). The NomBank project: An interim report. In Proceedings of the NAACL/HLT Workshop on Frontiers in Corpus Annotation, pages 24–31, Boston, MA. Association for Computational Linguistics.
Möller, S. and Ward, N. G. (2008). A framework for model-based evaluation of spoken dialog systems. In Proceedings of the SIGdial Workshop on Discourse and Dialogue (SIGDIAL), pages 182–189, Columbus, OH. Association for Computational Linguistics.
Nakanishi, H., Miyao, Y., and Tsujii, J. (2005). Probabilistic models for disambiguation of an HPSG-based chart generator. In Proceedings of the International Workshop on Parsing Technologies, pages 93–102, Vancouver, Canada. Association for Computational Linguistics.
Palmer, M., Gildea, D., and Kingsbury, P. (2005). The Proposition Bank: An annotated corpus of semantic roles. Computational Linguistics, 31(1):71–105.
Papineni, K., Roukos, S., Ward, T., and Zhu, W.-J. (2002). BLEU: A method for automatic evaluation of machine translation. In Proceedings of the Annual Meeting of the Association for Computational Linguistics (ACL), pages 311–318, Philadelphia, PA. Association for Computational Linguistics.
Passonneau, R. (2006). Measuring agreement on set-valued items (MASI) for semantic and pragmatic annotation. In Proceedings of the International Conference on Language Resources and Evaluation (LREC), Genoa, Italy. European Language Resources Association.
Rahim, M., Di Fabbrizio, G., Kamm, C., Walker, M. A., Pokrovsky, A., Ruscitti, P., Levin, E., Lee, S., Syrdal, A., and Schlosser, K. (2001). Voice-IF: A mixed-initiative spoken dialogue system for AT&T conference services. In Proceedings of the European Conference on Speech Communication and Technology (EUROSPEECH), pages 1339–1342, Aalborg, Denmark. International Speech Communication Association.
Rambow, O., Rogati, M., and Walker, M. A. (2001). Evaluating a trainable sentence planner for a spoken dialogue system. In Proceedings of the Annual Meeting of the Association for Computational Linguistics (ACL), pages 426–433, Toulouse, France. Association for Computational Linguistics.
Reiter, E. and Belz, A. (2006). GENEVAL: A proposal for shared-task evaluation in NLG. In Proceedings of the International Workshop on Natural Language Generation (INLG), pages 136–138, Sydney, Australia. Association for Computational Linguistics.
Reiter, E. and Belz, A. (2009). An investigation into the validity of some metrics for automatically evaluating NLG systems. Computational Linguistics, 35(4):529–558.
Reiter, E., Robertson, R., and Osman, L. M. (2003a). Lessons from a failure: Generating tailored smoking cessation letters. Artificial Intelligence, 144(1–2):41–58.
Reiter, E., Sripada, S., Hunter, J., and Davy, I. (2005). Choosing words in computer-generated weather forecasts. Artificial Intelligence, 167:137–169.
Reiter, E., Sripada, S., and Robertson, R. (2003b). Acquiring correct knowledge for natural language generation. Journal of Artificial Intelligence Research, 18:491–516.
Rieser, V. and Lemon, O. (2010). Natural language generation as planning under uncertainty for spoken dialogue systems. In Krahmer, E. and Theune, M., editors, Empirical Methods in Natural Language Generation, pages 105–120. Springer, Berlin, Heidelberg.
Sambaraju, R., Reiter, E., Logie, R., McKinlay, A., McVittie, C., Gatt, A., and Sykes, C. (2011). What is in a text and what does it do: Qualitative evaluations of an NLG system – the BT-Nurse – using content analysis and discourse analysis. In Proceedings of the European Workshop on Natural Language Generation (ENLG), pages 22–31, Nancy, France. Association for Computational Linguistics.
Schatzmann, J., Georgila, K., and Young, S. (2005). Quantitative evaluation of user simulation techniques for spoken dialogue systems. In Proceedings of the SIGdial Workshop on Discourse and Dialogue (SIGDIAL), pages 45–54, Lisbon, Portugal. Association for Computational Linguistics.
Schmitt, A., Schatz, B., and Minker, W. (2011). Modeling and predicting quality in spoken human-computer interaction. In Proceedings of the SIGdial Conference on Discourse and Dialogue (SIGDIAL), pages 173–184, Portland, OR. Association for Computational Linguistics.
Scott, D. and Moore, J. (2007). An NLG evaluation competition? Eight reasons to be cautious. In Proceedings of the NSF Workshop on Shared Tasks and Comparative Evaluation in Natural Language Generation, Arlington, VA. National Science Foundation.
Snover, M., Dorr, B., Schwartz, R., Micciulla, L., and Makhoul, J. (2006). A study of translation edit rate with targeted human annotation. In Proceedings of the Association for Machine Translation in the Americas (AMTA), pages 223–231, Boston, MA. Association for Machine Translation in the Americas.
Spärck Jones, K. (1981). Retrieval system tests 1958–1978. In Spärck Jones, K., editor, Information Retrieval Experiment, pages 213–255. Butterworths, London.
Spärck Jones, K. (1994). Towards better NLP system evaluation. In Proceedings of the Human Language Technology Conference (HLT), pages 102–107, Plainsboro, NJ. Association for Computational Linguistics.
Sripada, S. G., Reiter, E., Hunter, J., and Yu, J. (2002). SUMTIME-METEO: A parallel corpus of naturally occurring forecast texts and weather data. Technical Report AUCS/TR0201, Computing Science Department, University of Aberdeen.
Stent, A., Prasad, R., and Walker, M. A. (2004). Trainable sentence planning for complex information presentation in spoken dialog systems. In Proceedings of the Annual Meeting of the Association for Computational Linguistics (ACL), pages 79–86, Barcelona, Spain. Association for Computational Linguistics.
Suendermann, D., Liscombe, J., and Pieraccini, R. (2010). Contender. In Proceedings of the Spoken Language Technology Conference (SLT), pages 330–335, Berkeley, CA. Institute of Electrical and Electronics Engineers.
Surdeanu, M., Johansson, R., Meyers, A., Màrquez, L., and Nivre, J. (2008). The CoNLL-2008 shared task on joint parsing of syntactic and semantic dependencies. In Proceedings of the Conference on Computational Natural Language Learning (CoNLL), pages 159–177, Manchester, UK. Association for Computational Linguistics.
Tokunaga, T., Iida, R., Yasuhara, M., Terai, A., Morris, D., and Belz, A. (2010). Construction of bilingual multimodal corpora of referring expressions in collaborative problem solving. In Proceedings of the Workshop on Asian Language Resources, pages 38–46, Beijing, China. Chinese Information Processing Society of China.
van Deemter, K., Gatt, A., van der Sluis, I., and Power, R. (2012). Generation of referring expressions: Assessing the Incremental Algorithm. Cognitive Science, 36(5):799–836.
van der Sluis, I., Gatt, A., and van Deemter, K. (2007). Evaluating algorithms for the generation of referring expressions: Going beyond toy domains. In Proceedings of the International Conference on Recent Advances in Natural Language Processing (RANLP), Borovets, Bulgaria. Recent Advances in Natural Language Processing.
Viethen, J. and Dale, R. (2006). Algorithms for generating referring expressions: Do they do what people do? In Proceedings of the International Conference on Natural Language Generation (INLG), pages 63–72, Sydney, Australia. Association for Computational Linguistics.
Vilain, M., Burger, J., Aberdeen, J., Connolly, D., and Hirschman, L. (1995). A model-theoretic coreference scoring scheme. In Proceedings of the Message Understanding Conference, pages 45–52, Columbia, MD. Defense Advanced Research Projects Agency.
Walker, M., Rudnicky, A. I., Aberdeen, J., Bratt, E. O., Garofolo, J., Hastie, H., Le, A., Pellom, B., Potamianos, A., Passonneau, R., Prasad, R., Roukos, S., Sanders, G., Seneff, S., and Stallard, D. (2002a). DARPA Communicator evaluation: Progress from 2000 to 2001. In Proceedings of the International Conference on Spoken Language Processing (INTERSPEECH), pages 273–276, Denver, CO. International Speech Communication Association.
Walker, M. A., Aberdeen, J., Boland, J., Bratt, E. O., Garofolo, J., Hirschman, L., Le, A., Lee, S., Narayanan, S., Papineni, K., Pellom, B., Polifroni, J., Potamianos, A., Prabhu, P., Rudnicky, A. I., Sanders, G., Seneff, S., Stallard, D., and Whittaker, S. (2001a). DARPA Communicator dialog travel planning systems: The June 2000 data collection. In Proceedings of the European Conference on Speech Communication and Technology (EUROSPEECH), pages 1371–1374, Aalborg, Denmark. International Speech Communication Association.
Walker, M. A., Kamm, C., and Litman, D. (2000). Towards developing general models of usability with PARADISE. Natural Language Engineering, 6(3–4):363–377.
Walker, M. A., Passonneau, R., and Boland, J. (2001b). Quantitative and qualitative evaluation of DARPA Communicator spoken dialogue systems. In Proceedings of the Annual Meeting of the Association for Computational Linguistics (ACL), pages 515–522, Toulouse, France. Association for Computational Linguistics.
Walker, M. A., Rudnicky, A. I., Prasad, R., Aberdeen, J., Bratt, E. O., Garofolo, J., Hastie, H., Le, A., Pellom, B., Potamianos, A., Passonneau, R., Roukos, S., Sanders, G., Seneff, S., and Stallard, D. (2002b). DARPA Communicator: Cross-system results for the 2001 evaluation. In Proceedings of the International Conference on Spoken Language Processing (INTERSPEECH), pages 269–272, Denver, CO. International Speech Communication Association.
White, M. and Rajkumar, R. (2009). Perceptron reranking for CCG realization. In Proceedings of the Conference on Empirical Methods in Natural Language Processing (EMNLP), pages 410–419, Singapore. Association for Computational Linguistics.
White, M., Rajkumar, R., and Martin, S. (2007). Towards broad coverage surface realization with CCG. In Proceedings of the Workshop on Using Corpora for NLG: Language Generation and Machine Translation. Association for Computational Linguistics.
Williams, J. D. and Young, S. (2007). Partially observable Markov decision processes for spoken dialog systems. Computer Speech and Language, 21(2):393–422.
Williams, S. and Reiter, E. (2008). Generating basic skills reports for low-skilled readers. Natural Language Engineering, 14(4):495–525.
Yang, Z., Li, B., Zhu, Y., King, I., Levow, G., and Meng, H. (2010). Collection of user judgments on spoken dialog system with crowdsourcing. In Proceedings of the Spoken Language Technology Conference (SLT), pages 277–282, Berkeley, CA. Institute of Electrical and Electronics Engineers.
Zhong, H. and Stent, A. (2005). Building surface realizers automatically from corpora using general-purpose tools. In Proceedings of the Workshop on Using Corpora for Natural Language Generation (UCNLG), Birmingham, UK. ITRI, University of Brighton.
