Automatic summarisation of discussion fora

ALMER S. TIGELAAR; RIEKS OP DEN AKKER; DJOERD HIEMSTRA

doi:10.1017/S135132491000001X

Published online by Cambridge University Press: 24 March 2010

Affiliation (all authors): Database and Human Media Interaction Groups, University of Twente, The Netherlands
e-mail: [email protected], [email protected], [email protected]

Abstract

Web-based discussion fora proliferate on the Internet. These fora consist of threads about specific matters. Existing forum search facilities provide an easy way to find threads of interest. However, understanding the content of a thread is not always trivial, and the problem becomes more pressing as threads grow longer. It frustrates users who are looking for specific information and makes it harder to contribute something valuable to a discussion. We postulate that a concise summary of a thread would greatly help forum users. But how can such summaries best be created? In this paper, we present an automated method of summarising threads in discussion fora. Compared with the summarisation of unstructured texts and spoken dialogues, the structural characteristics of threads offer important advantages, and we studied how best to exploit them. Messages in a thread are structured and contain both explicit and implicit references to each other. We therefore term threads hierarchical dialogues. Our proposed summarisation algorithm produces a single summary of a hierarchical dialogue by 'cherry-picking' sentences from the original messages that make up the thread. We try to select sentences that are useful for obtaining an overview of the discussion. Our method is built around a set of heuristics based on observations of real forum discussions. The data used for this research were in Dutch, but the method applies equally to other languages. We evaluated our approach using a prototype. Users judged our summariser to be very useful, with half of them indicating that they would use it regularly or always when visiting fora.

Type: Papers
Information: Natural Language Engineering, Volume 16, Issue 2, April 2010, pp. 161–192

DOI: https://doi.org/10.1017/S135132491000001X
Copyright: © Cambridge University Press 2010
