From Tasks to Tests

José Hernández-Orallo

doi:10.1017/9781316594179.010

Altitude … is the degree of difficulty at which a given percentage of success is attained. …Width [is] the percent of successes at any given altitude or the average percent of successes at any given series of altitudes. … Area is the total number of tasks done correctly, or the percentage which this total is of the number of tasks in the entire list.

– Edward L. Thorndike, The Measurement of Intelligence (1927)

THE PRECISE definition and understanding of a task is crucial for its use as a measurement tool. In the previous chapter we saw how tasks can be characterised in terms of their policies, and their difficulty viewed as a search problem. Setting the perspective on the policies is helpful, but we need to pursue this direction further. A test is much more than just a task. In what follows we will introduce tools to examine how the performance of an agent changes over a range of difficulties. When building a test, we will focus on how to sample from the whole set of tasks and instances to obtain an appropriate pool of items, one that maximises the information the items provide about the subject. Items can be chosen by diversity, by difficulty and also by discriminating power, a challenging issue that is crucial for the design of adaptive tests.

AGENT CHARACTERISTIC CURVES

In our eagerness for quantification, we usually summarise a phenomenon by a single number. For instance, when we want to evaluate an agent for a given task, we sample instances from the task and record the agent's aggregate response. This value can then be compared with those of other agents. Should this not be enough for a cognitive test and the comparison of agents? Unfortunately, much information is lost whenever we summarise the results of many task instances into a single number. The study of the responses for single instances can provide valuable information about how the agent performs but, most importantly, it can give us ways of designing better tests.
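
To make the loss of information concrete, the following is a minimal sketch rather than the chapter's own formalisation. It assumes each task instance comes with a known integer difficulty and a score in [0, 1]; the function names, the toy data and the 0.5 threshold are all hypothetical. The curve is simply the mean score per difficulty level, and `altitude` and `area` are one possible reading of the terms in Thorndike's quotation above.

```python
from collections import defaultdict


def characteristic_curve(results):
    """Mean score per difficulty level (Thorndike's 'width' at each altitude).

    results: list of (difficulty, score) pairs, score in [0, 1].
    Returns a list of (difficulty, mean_score) sorted by difficulty.
    """
    by_difficulty = defaultdict(list)
    for difficulty, score in results:
        by_difficulty[difficulty].append(score)
    return [(d, sum(s) / len(s)) for d, s in sorted(by_difficulty.items())]


def altitude(curve, threshold=0.5):
    """Highest difficulty whose mean score reaches the threshold.

    This is an assumed interpretation of 'altitude'; returns None if the
    threshold is never reached.
    """
    reached = [d for d, width in curve if width >= threshold]
    return max(reached) if reached else None


def area(results):
    """Fraction of all instances solved: the single aggregate number."""
    scores = [score for _, score in results]
    return sum(scores) / len(scores)


# Two hypothetical agents with the same aggregate score.
agent_a = [(1, 1), (1, 1), (2, 1), (2, 1), (3, 0), (3, 0), (4, 0), (4, 0)]
agent_b = [(1, 1), (1, 0), (2, 1), (2, 0), (3, 1), (3, 0), (4, 1), (4, 0)]

for name, results in (("A", agent_a), ("B", agent_b)):
    curve = characteristic_curve(results)
    print(name, "area:", area(results), "curve:", curve,
          "altitude:", altitude(curve))
```

Both toy agents solve half of the instances, so the aggregate cannot tell them apart, yet agent A succeeds on every easy instance and fails every hard one, while agent B succeeds half of the time at every difficulty. Only the per-difficulty view exposes that difference.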

9 - From Tasks to Tests