A Strategy for Compressed Storage and Retrieval of Documents

doi:10.1017/CBO9780511663130.019

ABSTRACT

Document storage and retrieval systems should possess fast string search capabilities. The access paths needed to reduce search times require substantial amounts of storage, in addition to the very large storage requirements of the documents themselves. In this paper we investigate a technique that supports access paths on compressed documents, so that the total storage required for the access paths and the compressed documents together is less than that required for the original documents.

Introduction

Advances in hardware technology are unlikely to keep pace with the growth of on-line document storage. In an environment where the trend is towards local and wide area networks (with the promise of an interconnected society around the corner), large numbers of documents will be transmitted between nodes. To provide a satisfactory service at reasonable cost, documents must be held more compactly than at present, both for storage and for communication along network paths and between peripherals and processors. Because natural language is highly redundant, a suitable encoding scheme can exploit that redundancy, and the resulting compression reduces both storage and communication costs. In an on-line environment the compression and decompression schemes must not involve excessive overheads in either time or space; however, since a document is compressed only once for storage but decompressed (retrieved) many times, higher levels of overhead can be tolerated during the compression stage.

Document retrieval requires fast string search capabilities, and it is usual to provide additional access paths to reduce search times, e.g. inverted lists on words. In [Goyal83] a scheme was proposed that made use of inverted indexes associated with compressed documents.
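
The general idea of an inverted list on words kept alongside encoded documents can be illustrated with a minimal sketch. This is not the scheme of [Goyal83], whose details are not given here. It assumes a simple word-level dictionary code in which each distinct word is replaced by an integer, and the names CompressedStore, add, search and retrieve are hypothetical, introduced only for illustration.

```python
from collections import defaultdict


class CompressedStore:
    """Toy store: documents are held as word codes, with inverted lists on codes.

    Assumption: a plain word-level dictionary code, chosen for illustration only.
    """

    def __init__(self):
        self.code_of = {}                  # word -> integer code
        self.word_of = []                  # integer code -> word
        self.docs = {}                     # document id -> list of word codes
        self.postings = defaultdict(set)   # word code -> ids of documents containing it

    def add(self, doc_id, text):
        """Encode a document once, updating the dictionary and inverted lists."""
        codes = []
        for word in text.split():
            code = self.code_of.get(word)
            if code is None:
                # First occurrence of this word: assign the next free code.
                code = len(self.word_of)
                self.code_of[word] = code
                self.word_of.append(word)
            codes.append(code)
            self.postings[code].add(doc_id)
        self.docs[doc_id] = codes

    def search(self, word):
        """Return sorted ids of documents containing the word, without decoding any."""
        code = self.code_of.get(word)
        if code is None:
            return []
        return sorted(self.postings[code])

    def retrieve(self, doc_id):
        """Decode a stored document back to text (whitespace is normalised)."""
        return " ".join(self.word_of[code] for code in self.docs[doc_id])


store = CompressedStore()
store.add(1, "compressed documents need fast search")
store.add(2, "inverted lists reduce search times")
print(store.search("search"))   # ids of documents containing "search"
print(store.retrieve(1))        # decoded text of document 1
```

In this sketch a search consults only the dictionary and the inverted lists, so no document is decoded until it is actually retrieved. This reflects the asymmetry noted above: encoding is done once per document, while retrieval is the frequent operation.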
