A Strategy for Compressed Storage and Retrieval of Documents
Published online by Cambridge University Press: 05 May 2010
ABSTRACT
Document storage and retrieval systems should possess fast string search capabilities. The access paths needed to reduce search times require substantial amounts of storage, in addition to the very large storage requirements of the documents themselves. In this paper we investigate a technique that supports access paths on compressed documents, so that the total storage required for the access paths and the compressed documents is less than that required for the original documents.
Introduction
Advances in hardware technology are unlikely to keep pace with the growth of on-line document storage. As the trend is towards local and wide area networks (there is the promise of an interconnected society around the corner), large numbers of documents will be transmitted between nodes. Providing a satisfactory service at reasonable cost, both for document storage and for communication along network paths and between peripherals and processors, requires that documents be held more compactly than at present. Because natural language is highly redundant, a suitable encoding scheme can compress documents, and the resulting compression reduces both storage and communication costs. In an on-line environment the compression and decompression schemes must not involve excessive overheads in either time or space. Since a document needs to be compressed only once, when it is stored, but is decompressed (retrieved) many times, higher overheads can be tolerated during compression.
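The redundancy argument and the asymmetry between compressing once and decompressing often can be illustrated with a general-purpose compressor. The sketch below assumes Python's standard `zlib` module (DEFLATE) purely as a stand-in; it is not the encoding scheme investigated in this paper. It compresses the abstract above at the slowest, most thorough level and checks that decompression restores the original exactly.

```python
import zlib

# Sample English text (the abstract of this paper); any natural-language
# passage of similar length shows the same kind of redundancy.
sample = (
    "Document storage and retrieval systems should possess fast string "
    "search capabilities. The access paths needed to reduce search times "
    "require substantial amounts of storage, in addition to the very large "
    "storage requirements of the documents themselves. In this paper we "
    "investigate a technique that supports access paths on compressed "
    "documents, so that the total storage required for the access paths and "
    "the compressed documents is less than that required for the original "
    "documents."
)
raw = sample.encode("utf-8")

# Level 9 spends extra time searching for matches. That cost is paid once,
# when the document is stored.
packed = zlib.compress(raw, level=9)

# Decompression runs on every retrieval; its speed is largely independent
# of the compression level chosen above.
restored = zlib.decompress(packed)

assert restored == raw  # lossless round trip
print(f"{len(raw)} bytes -> {len(packed)} bytes "
      f"({len(packed) / len(raw):.1%} of original)")
```

Note that a byte-oriented compressor like this offers no way to search the stored form directly, which is the gap that access paths on compressed documents are meant to close.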
Document retrieval requires fast string search capabilities, and it is usual to provide additional access paths to reduce search times, for example inverted lists on words. In [Goyal83] a scheme was proposed that made use of inverted indexes associated with compressed documents.
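Neither the scheme of [Goyal83] nor the technique of this paper is described in this section. The following is therefore a hypothetical sketch of the general idea, not either author's method. Each distinct word is replaced by an integer code (a simple word-based encoding), each document is stored as a compact array of codes, and the inverted lists are keyed by code. A query word is translated to its code once and matched without decoding the documents. The class and method names (`CodedCollection`, `find`, `find_phrase`, `decode`) are illustrative assumptions.

```python
import re
from array import array
from collections import defaultdict


def tokenize(text: str) -> list[str]:
    # Lower-cased alphanumeric words; punctuation is dropped for simplicity.
    return re.findall(r"[a-z0-9]+", text.lower())


class CodedCollection:
    """Hypothetical sketch: documents held as arrays of integer word codes,
    with inverted lists (postings) keyed by code rather than by word."""

    def __init__(self) -> None:
        self.lexicon: dict[str, int] = {}  # word -> code
        self.words: list[str] = []         # code -> word
        self.docs: list[array] = []        # each document as an array of codes
        self.postings = defaultdict(list)  # code -> [(doc_id, position), ...]

    def add(self, text: str) -> int:
        doc_id = len(self.docs)
        codes = array("I")
        for pos, word in enumerate(tokenize(text)):
            code = self.lexicon.get(word)
            if code is None:  # first occurrence: assign the next free code
                code = len(self.words)
                self.lexicon[word] = code
                self.words.append(word)
            codes.append(code)
            self.postings[code].append((doc_id, pos))
        self.docs.append(codes)
        return doc_id

    def find(self, word: str) -> list[tuple[int, int]]:
        # One lexicon lookup; the stored documents are never decoded.
        code = self.lexicon.get(word.lower())
        return list(self.postings.get(code, [])) if code is not None else []

    def find_phrase(self, phrase: str) -> list[tuple[int, int]]:
        # Use the postings of the first word, then compare code sequences.
        codes = [self.lexicon.get(w) for w in tokenize(phrase)]
        if not codes or None in codes:
            return []
        hits = []
        for doc_id, pos in self.postings[codes[0]]:
            if self.docs[doc_id][pos:pos + len(codes)].tolist() == codes:
                hits.append((doc_id, pos))
        return hits

    def decode(self, doc_id: int) -> str:
        # Decompression: map codes back to words (case/punctuation are lost).
        return " ".join(self.words[c] for c in self.docs[doc_id])


coll = CodedCollection()
coll.add("Document retrieval requires fast string search capabilities.")
coll.add("Fast search over compressed documents needs access paths.")
print(coll.find("search"))                    # [(0, 5), (1, 1)]
print(coll.find_phrase("fast string search"))  # [(0, 3)]
```

With fixed 4-byte codes this sketch does not by itself save space over short words. A practical scheme would use variable-length codes, giving frequent words the shortest ones, and would compress the posting lists as well, so that the index and the coded text together stay smaller than the original.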
Text Processing and Document Manipulation: Proceedings of the International Conference, University of Nottingham, 14-16 April 1986, pp. 224-232. Publisher: Cambridge University Press. Print publication year: 1986.