Using patterns of thematic progression for building a table of contents of a text

MARIE-FRANCINE MOENS

doi:10.1017/S135132490600430X

Published online by Cambridge University Press: 01 April 2008

Affiliation: Interdisciplinary Centre for Law and Information Technology, Katholieke Universiteit Leuven, Tienstraat 41, B-3000 Leuven, Belgium

Abstract

A text usually contains one or a few main topics, which are split into subtopics, which in turn can be described by more detailed topics. In this article we describe a system that segments a text into topics and subtopics. Each segment is characterized by the important key terms extracted from it and by its start and end positions in the text. A table of contents is built from the hierarchical and sequential relationships between the topical segments identified in a text. The table of contents generator relies on universal linguistic theories of the topic and comment of a sentence and on patterns of thematic progression in text. The linguistic theories of topic and comment are modeled both deterministically and probabilistically. The system is applied to English texts (news, World Wide Web and encyclopedia texts) and is evaluated.
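To make the output described in the abstract concrete, the following is a minimal sketch of how topical segments and the resulting table of contents might be represented. It is not the article's implementation: the `Segment` class, the `render_toc` function, the character-offset convention, and the toy data are all hypothetical names and values chosen for illustration, and the segmentation step itself is not shown.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class Segment:
    """A topical segment: key terms plus character offsets (hypothetical layout)."""
    key_terms: List[str]
    start: int  # offset of the first character of the segment
    end: int    # offset just past the last character of the segment
    children: List["Segment"] = field(default_factory=list)


def render_toc(segments: List[Segment], depth: int = 0) -> List[str]:
    """Return one indented line per segment, visiting subtopics depth-first."""
    lines = []
    # Sequential relationship: siblings are listed in text order.
    for seg in sorted(segments, key=lambda s: s.start):
        label = ", ".join(seg.key_terms)
        lines.append(f"{'  ' * depth}{label} [{seg.start}-{seg.end}]")
        # Hierarchical relationship: subtopics are nested one level deeper.
        lines.extend(render_toc(seg.children, depth + 1))
    return lines


# Toy, hand-built example; a real system would derive these segments from text.
toc = render_toc([
    Segment(["energy"], 0, 900, [
        Segment(["solar", "panels"], 0, 400),
        Segment(["wind", "turbines"], 400, 900),
    ]),
])
print("\n".join(toc))
```

Nesting subtopics as children and sorting siblings by start offset is one simple way to capture both the hierarchical and the sequential relationships the abstract mentions; other representations would serve equally well.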

Type: Papers
Information: Natural Language Engineering, Volume 14, Issue 2, April 2008, pp. 145–172

DOI: https://doi.org/10.1017/S135132490600430X
Copyright: Copyright © Cambridge University Press 2007
