Book contents
- Frontmatter
- Contents
- List of Panels
- Preface
- PART I A LONG-PONDERED OUTFIT
- PART II THE EVALUATION DISCORDANCE
- PART III THE ALGORITHMIC CONFLUENCE
- 7 Intelligence and Algorithmic Information Theory
- 8 Cognitive Tasks and Difficulty
- 9 From Tasks to Tests
- 10 The Arrangement of Abilities
- 11 General Intelligence
- PART IV THE SOCIETY OF MINDS
- PART V THE KINGDOM OF ENDS
- References
- Index
- Plate section
9 - From Tasks to Tests
from PART III - THE ALGORITHMIC CONFLUENCE
Published online by Cambridge University Press: 19 January 2017
Summary
Altitude … is the degree of difficulty at which a given percentage of success is attained. …Width [is] the percent of successes at any given altitude or the average percent of successes at any given series of altitudes. … Area is the total number of tasks done correctly, or the percentage which this total is of the number of tasks in the entire list.
– Edward L. Thorndike, The Measurement of Intelligence (1927)
The precise definition and understanding of a task is crucial for its use as a measurement tool. In the previous chapter we saw how tasks can be characterised in terms of their policies, with their difficulty viewed as a search problem. Taking the perspective of the policies is helpful, but we need to pursue this direction further. A test is much more than just a task. In what follows we will introduce tools to examine how the performance of an agent changes over a range of difficulties. When building a test, we will focus on how to sample from the whole set of tasks and instances to obtain an appropriate pool of items, one that maximises the information the items provide about the subject. Items can be chosen by diversity, by difficulty and also by discriminating power, a challenging issue that is crucial for the design of adaptive tests.
AGENT CHARACTERISTIC CURVES
In our eagerness for quantification, we usually summarise a phenomenon by a single number. For instance, when we want to evaluate an agent for a given task, we sample from the task and record its aggregate response. This value can then be compared with those of other agents. Should not this be enough for a cognitive test and the comparison of agents? Unfortunately, much information is lost whenever we summarise the results of many task instances into a single number. The study of the responses for single instances can provide valuable information about how the agent performs but, most importantly, it can give us ways of designing better tests.
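The loss of information described above can be illustrated with a small sketch. It assumes that each task instance comes with a known integer difficulty and a binary outcome; the data, the function names and the threshold-based reading of Thorndike's "altitude" are illustrative assumptions, not the chapter's formal definitions. Two hypothetical agents obtain the same aggregate score, yet their success rates per difficulty level differ markedly.

```python
from collections import defaultdict


def aggregate_score(results):
    """Single-number summary: overall fraction of successful instances."""
    results = list(results)
    return sum(success for _, success in results) / len(results)


def characteristic_curve(results):
    """Success rate per difficulty level.

    `results` is an iterable of (difficulty, success) pairs with success
    in {0, 1}. Returns a dict mapping each difficulty to the fraction of
    successes at that difficulty (Thorndike's "width" at each "altitude").
    """
    totals = defaultdict(int)
    wins = defaultdict(int)
    for difficulty, success in results:
        totals[difficulty] += 1
        wins[difficulty] += success
    return {d: wins[d] / totals[d] for d in sorted(totals)}


def altitude(curve, threshold=0.5):
    """Highest difficulty whose success rate is at least `threshold`.

    This is one possible reading of "altitude" (an assumption made here
    for illustration). Returns None if no level reaches the threshold.
    """
    reached = [d for d, rate in curve.items() if rate >= threshold]
    return max(reached) if reached else None


# Hypothetical results: (difficulty, success) for eight instances each.
agent_a = [(1, 1), (1, 1), (2, 1), (2, 1), (3, 0), (3, 0), (4, 0), (4, 0)]
agent_b = [(1, 1), (1, 0), (2, 1), (2, 0), (3, 1), (3, 0), (4, 1), (4, 0)]

curve_a = characteristic_curve(agent_a)
curve_b = characteristic_curve(agent_b)

print(aggregate_score(agent_a), aggregate_score(agent_b))
print(curve_a)
print(curve_b)
print(altitude(curve_a), altitude(curve_b))
```

In this toy data, agent A is reliable on easy instances and fails on hard ones, whereas agent B succeeds half of the time regardless of difficulty. The aggregate score cannot tell them apart; the per-difficulty curve can.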
- Type: Chapter
- Information: The Measure of All Minds: Evaluating Natural and Artificial Intelligence, pp. 234-258
- Publisher: Cambridge University Press
- Print publication year: 2017