Book contents
- Frontmatter
- Contents
- List of Panels
- Preface
- PART I A LONG-PONDERED OUTFIT
- PART II THE EVALUATION DISCORDANCE
- 3 The Evaluation of Human Behaviour
- 4 The Evaluation of Non-human Natural Behaviour
- 5 The Evaluation of Artificial Intelligence
- 6 The Boundaries against a Unified Evaluation
- PART III THE ALGORITHMIC CONFLUENCE
- PART IV THE SOCIETY OF MINDS
- PART V THE KINGDOM OF ENDS
- References
- Index
- Plate section
6 - The Boundaries against a Unified Evaluation
from PART II - THE EVALUATION DISCORDANCE
Published online by Cambridge University Press: 19 January 2017
Summary
[Edsger Dijkstra] asked me what I was working on. Perhaps just to provoke a memorable exchange I said, “AI”. To that he immediately responded, “Why don't you work on I?”
He was right, of course, that if “I” is more general than “AI”, one should work on the more general problem, especially if it is the one that is the natural phenomenon, which in this case it is.
– Leslie Valiant, Probably Approximately Correct: Nature's Algorithms for Learning and Prospering in a Complex World (2013)

IN THE PREVIOUS three chapters, we have seen three very different approaches to the evaluation of behaviour. Psychometrics uses well-defined test batteries, usually composed of abstract culture-fair problems or questionnaires, different from everyday tasks. Comparative psychology also presents tasks to animals, not necessarily so abstract, but careful attention is paid to interfaces and motivation, with rewards being key. Artificial intelligence evaluation is significantly different, relying on benchmarks and competitions. What happens if we use definitions, tools and tests from one discipline to evaluate subjects in the others? How often has this been done, or advocated? Why has it not worked so far?
THE FRAGMENTED EVALUATION OF BEHAVIOUR
There was a time when a certain fragmentation existed between psychology, evolutionary biology and artificial intelligence. However, with the increasing relevance of evolutionary psychology and cognitive science, the boundaries between these disciplines have been repeatedly crossed, and new areas have appeared in between, such as artificial life, evolutionary computing, evolutionary robotics, human-machine interfaces, developmental robotics and swarm computing. Unfortunately, we cannot say the same for the evaluation of behavioural features. The preceding three chapters presented different terminologies, principles, tools and, ultimately, tests.
Table 6.1 shows a simplified picture of some of the distinctions between psychometrics, comparative psychology and AI evaluation.
Each discipline is extremely diverse. It makes a great difference whether we evaluate a small child or an adult, a chimpanzee or a bacterium, an ‘intelligent’ vacuum cleaner or a reinforcement learning system playing games. Extending the simplification to cover further differences would turn it into distortion, so the table omits dimensions such as what is evaluated (an individual, a group or a species), whether the measurement is quantitative or qualitative, how difficulty is inferred, and the relevance of physical traits (e.g., sensorimotor abilities) to the measurement.
The Measure of All Minds: Evaluating Natural and Artificial Intelligence, pp. 152–172. Publisher: Cambridge University Press. Print publication year: 2017.