13 - Recent Trends
Published online by Cambridge University Press: 05 September 2016
Summary
Throughout the book we have covered a number of compact data structures that are well established and tested, and their basic aspects can be considered stable. This chapter is conceived as an epilogue, where we take a look to the future. We describe some recent trends that, while not mature or general enough to deserve a thorough treatment in the book, are certainly promising and likely to be the focus of intense research in the upcoming years. Therefore, the chapter may also serve to guide the readers looking for hot research topics. Its writing style is different from the other chapters, as we discuss the bibliography together with the main material.
First, we consider encoding data structures. These ensure only that a certain set of queries of interest can be answered; the actual data cannot be recovered. Encodings can offer significant space reductions with respect to the data entropy and are interesting because they have the potential to reach the minimum space needed just to answer the desired queries; nothing superfluous is stored. They also define a new concept of entropy, in terms not only of a universe of objects, but also of a set of queries posed on them. Encodings for a few problems exist already, but they are very recent and mostly in a theoretical stage.
Second, we consider repetitive document collections. Many of the largest text collections arising in applications these days are highly repetitive, and thus very compressible if one applies the right compression methods. The compact data structures we have seen, however, focus on statistical compression, which is insensitive to this kind of compressibility. New text indexes building on other compression principles are being designed, although they are generally slower than the classical ones and still do not reach the exact entropy bounds as current statistical methods do.
Finally, we consider secondary memory. Compact data structures use less space than classical ones and thus may fit in smaller and faster memories. However, they have generally less locality of reference, thus they are slower than classical structures when competing in the same level of the memory hierarchy. This is particularly relevant when the datasets are huge and the structures must operate on disk, where the space usage is not so important and locality of reference is crucial.
- Type
- Chapter
- Information
- Compact Data StructuresA Practical Approach, pp. 501 - 548Publisher: Cambridge University PressPrint publication year: 2016