Book contents
- Frontmatter
- Dedication
- Contents
- Figures and tables: acknowledgements
- Contributors
- Foreword
- Preface
- 1 Interactive information retrieval: history and background
- 2 Information behavior and seeking
- 3 Task-based information searching and retrieval
- 4 Approaches to investigating information interaction and behaviour
- 5 Information representation
- 6 Access models
- 7 Evaluation
- 8 Interfaces for information retrieval
- 9 Interactive techniques
- 10 Web retrieval, ranking and personalization
- 11 Recommendation, collaboration and social search
- 12 Multimedia: behaviour, interfaces and interaction
- 13 Multimedia: information representation and access
- References
- Index
5 - Information representation
Published online by Cambridge University Press: 08 June 2018
Summary
Introduction
We build information retrieval systems to help people satisfy their information needs. Although new computer interfaces have increased the ways that users can satisfy their information needs, the predominant interfaces require users to describe their needs with words. Similarly, most retrieval systems represent the items in their collection using words. The words provided by the user are then compared to the words attached to the items in the collection. If the user's words match those of an item, then that item might be what the user wants. If the words do not match, then the item is not returned by the retrieval system. While this explanation is a simplification, the way we represent documents is the first step towards obtaining a high-quality retrieval system. We begin by discussing issues in representing text items that pertain to both manual and automatic representation techniques.
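The word-matching idea described above can be illustrated with a minimal sketch. It assumes a toy in-memory collection and a naive rule that an item is returned if it shares at least one word with the user's words; the names `tokenize`, `match` and `collection` are hypothetical and chosen only for illustration, and real systems are considerably more elaborate.

```python
import re


def tokenize(text):
    """Lowercase the text and split it into word tokens."""
    return re.findall(r"[a-z0-9]+", text.lower())


def match(query, collection):
    """Return the ids of items that share at least one word with the query.

    Assumption: an item's representation is simply the set of words it contains.
    """
    query_words = set(tokenize(query))
    return [
        doc_id
        for doc_id, text in collection.items()
        if query_words & set(tokenize(text))
    ]


# Hypothetical two-item collection used only for illustration.
collection = {
    "d1": "Interfaces for information retrieval",
    "d2": "Cooking with seasonal vegetables",
}

# Only d1 shares a word ("retrieval") with the user's words.
results = match("retrieval systems", collection)
```

An item whose words do not overlap with the user's words is never returned in this sketch, even if it would have satisfied the need, which is why the choice of representation matters so much.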
Text representation
Textual items in our collections could be books, journal articles, web pages, XML documents, emails, word processing files and so forth. These items vary in length and structure and may contain non-text items such as images. In all cases, we will refer to the text items in a collection as documents or items, but it is important to remember that there is a large variety of possible text items.
Although there are many possible representations of documents, our focus will be on representations that use words or tokens derived from words. The process of deciding which words to use to describe a document is called indexing and the chosen words are called index terms. Sometimes we want to represent documents with more than words and then it makes sense to talk about the use of features, which are more generic than index terms. An example of a non-word feature could be the number of words in a document.
When we, as humans, manually index documents, we look at the item, read it, and then make decisions ourselves about what index terms to use. When we automatically index, we write computer algorithms to process digital forms of the documents and make the decisions about index terms.
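As a rough sketch of automatic indexing, the snippet below makes the simplest possible decision about index terms: every distinct lowercased word in the document becomes an index term. It also records one non-word feature, the number of words in the document. The function name `index_document` and the sample sentence are assumptions made for illustration, not a method prescribed by the chapter.

```python
import re
from collections import Counter


def index_document(text):
    """Return (index_terms, features) for one document.

    Assumption: every distinct lowercased word is kept as an index term,
    with its frequency; a real indexer would make more selective decisions.
    """
    tokens = re.findall(r"[a-z0-9]+", text.lower())
    index_terms = Counter(tokens)           # index term -> occurrences
    features = {"num_words": len(tokens)}   # a non-word feature
    return index_terms, features


# Hypothetical sample document.
terms, features = index_document(
    "The history of information retrieval and the retrieval of information"
)
```

Here `terms` maps each index term to how often it occurs, while `features` holds the more generic, non-word description of the document.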
- Type: Chapter
- Information: Interactive Information Seeking, Behaviour and Retrieval, pp. 77-94
- Publisher: Facet
- Print publication year: 2011