Book contents
- Frontmatter
- Contents
- Preface
- Prologue
- 1 Introduction
- 2 On sets and kinds for IR
- 3 Vector and Hilbert spaces
- 4 Linear transformations, operators and matrices
- 5 Conditional logic in IR
- 6 The geometry of IR
- Appendix I Linear algebra
- Appendix II Quantum mechanics
- Appendix III Probability
- Bibliography
- Author index
- Index
2 - On sets and kinds for IR
Published online by Cambridge University Press: 14 January 2010
Summary
In this chapter an elementary introduction to simple information retrieval is given using set theory. We show how the set-theoretic approach leads naturally to a Boolean algebra which formally captures Boolean retrieval (Blair, 1990). We then move on to assume a slightly more elaborate class structure, which naturally leads to an algebra that is non-Boolean and hence reflects a non-Boolean logic (see Aerts et al., 1993, for a concrete example). The chapter finishes with a simple example, in Hilbert space, of the failure of the distributive law in logic.
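To make the failure of the distributive law concrete, here is a minimal sketch in the real plane R^2, which is a two-dimensional Hilbert space. It assumes the usual reading in which propositions are subspaces, the meet (∧) is intersection and the join (∨) is the span of the union. The three lines used (the x-axis, the y-axis and the diagonal) are an illustrative choice and need not match the chapter's own example. The names `Subspace`, `line`, `meet`, `join` and `same` are hypothetical.

```python
from dataclasses import dataclass
from typing import Optional, Tuple


@dataclass(frozen=True)
class Subspace:
    """A subspace of R^2: the zero space (dim 0), a line through the origin (dim 1),
    or the whole plane (dim 2). Integer direction vectors keep the arithmetic exact."""
    dim: int
    direction: Optional[Tuple[int, int]] = None  # spanning vector, used only when dim == 1


ZERO = Subspace(0)
PLANE = Subspace(2)


def line(x: int, y: int) -> Subspace:
    """The line through the origin spanned by (x, y)."""
    if (x, y) == (0, 0):
        raise ValueError("a line needs a non-zero direction vector")
    return Subspace(1, (x, y))


def parallel(u: Subspace, v: Subspace) -> bool:
    """Two lines are the same subspace iff their direction vectors are parallel."""
    (a, b), (c, d) = u.direction, v.direction
    return a * d - b * c == 0


def same(u: Subspace, v: Subspace) -> bool:
    """Subspace equality (a line may be given by different spanning vectors)."""
    if u.dim != v.dim:
        return False
    return parallel(u, v) if u.dim == 1 else True


def join(u: Subspace, v: Subspace) -> Subspace:
    """u ∨ v: the smallest subspace containing both, i.e. the span of their union."""
    if u.dim == 0:
        return v
    if v.dim == 0:
        return u
    if u.dim == 2 or v.dim == 2:
        return PLANE
    return u if parallel(u, v) else PLANE  # two distinct lines span the plane


def meet(u: Subspace, v: Subspace) -> Subspace:
    """u ∧ v: the intersection of the two subspaces."""
    if u.dim == 0 or v.dim == 0:
        return ZERO
    if u.dim == 2:
        return v
    if v.dim == 2:
        return u
    return u if parallel(u, v) else ZERO  # two distinct lines meet only at the origin


a, b, c = line(1, 0), line(0, 1), line(1, 1)
lhs = meet(a, join(b, c))           # a ∧ (b ∨ c) = a ∧ R^2 = a
rhs = join(meet(a, b), meet(a, c))  # (a ∧ b) ∨ (a ∧ c) = {0} ∨ {0} = {0}
print(lhs.dim, rhs.dim)  # 1 0
print(same(lhs, rhs))    # False: distributivity fails
```

The left-hand side is the whole line `a`, while the right-hand side collapses to the zero subspace. Any three distinct lines through the origin would give the same result, because any two of them already span the plane.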
Elementary IR
We will begin with a set of objects; these objects are usually documents. A document may have a finer-grained structure, that is, it may contain some structured text, some images and some speech. For the moment we will not be concerned with that internal structure. We will only make the assumption that for each document it is possible to decide whether a particular attribute or property applies to it. For example, for a text we can decide whether or not it is about ‘politics’; for images we might be able to decide that an image is about ‘churches’. For human beings such decisions are relatively easy to make; for machines, unfortunately, they are very much harder. Traditionally in IR this process of deciding is known as indexing, or the assigning of index terms, or keywords. We will assume that the process is unproblematic until later in the book, where we discuss it in more detail.
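Under the assumption above that indexing has already decided which index terms apply to each document, each term picks out a set of documents, and Boolean retrieval reduces to set operations on those sets. The following is a minimal sketch: the four-document collection and its terms are invented for illustration, and `docs_with`, `and_`, `or_` and `not_` are hypothetical helper names. AND is read as intersection, OR as union and NOT as complement relative to the whole collection.

```python
# Hypothetical toy collection: document id -> index terms assigned to it.
# Indexing is taken as already done, as the text assumes.
collection = {
    1: {"politics", "economics"},
    2: {"politics", "churches"},
    3: {"churches", "architecture"},
    4: {"economics"},
}
universe = set(collection)  # the set of all documents


def docs_with(term):
    """The set of documents to which the index term applies."""
    return {doc for doc, terms in collection.items() if term in terms}


def and_(x, y):
    return x & y          # intersection


def or_(x, y):
    return x | y          # union


def not_(x):
    return universe - x   # complement within the collection


politics = docs_with("politics")    # {1, 2}
churches = docs_with("churches")    # {2, 3}
economics = docs_with("economics")  # {1, 4}

print(sorted(and_(politics, not_(churches))))  # [1]
print(sorted(or_(churches, economics)))        # [1, 2, 3, 4]

# The subsets of a fixed collection form a Boolean algebra,
# so the distributive law always holds here.
assert and_(politics, or_(churches, economics)) == or_(
    and_(politics, churches), and_(politics, economics)
)
```

Because these are ordinary subsets of one collection, every law of Boolean algebra, distributivity included, holds automatically; the non-Boolean behaviour described in the summary only appears once this set structure is replaced by something richer.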
- Type: Chapter
- Information: The Geometry of Information Retrieval, pp. 28-40. Publisher: Cambridge University Press. Print publication year: 2004