Automatic language and information processing: rethinking evaluation

Published online by Cambridge University Press: 26 April 2001

KAREN SPARCK JONES
Affiliation:
Computer Laboratory, University of Cambridge, New Museums Site, Pembroke Street, Cambridge CB2 3QG, UK

Abstract

System evaluation has mattered since research on automatic language and information processing began. However, the (D)ARPA conferences have raised the stakes substantially in requiring and delivering systematic evaluations and in sustaining these through long-term programmes; and it has been claimed that this has both significantly raised task performance, as defined by appropriate effectiveness measures, and promoted relevant engineering development. These controlled laboratory evaluations have made very strong assumptions about the task context. The paper examines these assumptions for six task areas, considers their impact on evaluation and performance results, and argues that for current tasks of interest, e.g. summarising, it is now essential to play down the present narrowly defined performance measures in order to address the task context, and specifically the role of the human participant in the task, so that new measures, of larger value, can be developed and applied.

Type
Research Article
Copyright
© 2001 Cambridge University Press

Footnotes

I am grateful to James Allan and Fred Jelinek for inviting me to give the talk on which this paper is based, and to two referees for their comments.