Web retrieval, ranking and personalization

doi:10.29085/9781856049740.012

Web search is unique

Web search is different from other types of information retrieval. The scale and diversity of web content are several orders of magnitude greater than those found in traditional information retrieval corpora. Web corpora contain billions of web pages with content ranging from images to blogs to technical articles. Similarly, the scale and variety of the people who issue web queries and the tasks that underlie those queries are immense. Web search tools are used for everything from simple navigational queries to complex research tasks that extend over time. Although there are some common motivations, strategies, tasks and information targets, there is a long tail of uncommon behavior that accounts for a significant portion of all web search interactions. We begin this chapter by looking more closely at what makes web search unique. We then show how these unique aspects have led to interesting approaches to ranking, including the use of machine learning and rich features. We describe how people interact with web search results, how rich behavioral data can be used to personalize the search experience, and the challenges and opportunities that these capabilities pose for evaluation. We conclude by looking at emerging trends in web search to give a glimpse of what web search tools of the future might look like.

Large-scale, diverse content

The web is very large; the number of documents it contains is estimated in the tens of billions, with many pages generated on the fly based on underlying databases or user interaction. Before the rise of the web, search engines typically dealt only with static corpora of, at most, millions of documents. The billions of web documents that search tools must contend with come in a greater variety of genres and formats than traditional information retrieval typically handles, including blogs, online stores, government pages, forums and news sites. Text can be extracted from many of these document types (e.g. HTML, PDF, Word), but not necessarily from all (e.g. images, video). The content people search for within this large and diverse corpus is equally broad. In addition to documents, people look for particular images, answers to their questions, entities, templates and applications.

10 - Web retrieval, ranking and personalization