Common errors in data analysis: the apparent error rate of classification rules

Published online by Cambridge University Press: 09 July 2009

D. J. Hand*
Affiliation:
Biometrics Unit, Institute of Psychiatry, London
*Address for correspondence: Dr D. J. Hand, Biometrics Unit, Institute of Psychiatry, De Crespigny Park, Denmark Hill, London SE5 8AF.

Synopsis

Classification and diagnosis are concepts of fundamental importance in medicine. Yet all too often the only measure of a classification rule's performance reported in published papers is the apparent error rate, which is optimistically biased. The paper defines this rate, gives real examples illustrating how poorly it estimates true future performance, and suggests alternative measures.
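
The apparent (resubstitution) error rate is the proportion of cases misclassified when a rule is applied to the same data used to construct it. The sketch below contrasts it with a leave-one-out estimate, in which each case is classified by a rule built from the remaining n - 1 cases. It is only an illustration under stated assumptions: the synopsis does not say which alternatives the paper recommends, so leave-one-out is used here merely as one widely used example. The 1-nearest-neighbour rule and the synthetic "pure noise" data (labels assigned independently of the features) are hypothetical choices made for this sketch and do not come from the paper's examples.

```python
import random


def nearest_neighbour_predict(train, x):
    """Hypothetical rule: give x the label of its closest training case (1-NN)."""
    best_label, best_dist = None, float("inf")
    for features, label in train:
        dist = sum((a - b) ** 2 for a, b in zip(features, x))
        if dist < best_dist:
            best_dist, best_label = dist, label
    return best_label


def apparent_error_rate(data):
    """Resubstitution: build the rule on all cases, then classify those same cases."""
    errors = sum(nearest_neighbour_predict(data, x) != y for x, y in data)
    return errors / len(data)


def leave_one_out_error_rate(data):
    """Classify each case with a rule built from the other n - 1 cases."""
    errors = 0
    for i, (x, y) in enumerate(data):
        rest = data[:i] + data[i + 1:]
        errors += nearest_neighbour_predict(rest, x) != y
    return errors / len(data)


def make_noise_data(n, dims, seed):
    """Hypothetical data: Gaussian features, labels drawn independently of them."""
    rng = random.Random(seed)
    return [([rng.gauss(0, 1) for _ in range(dims)], rng.randint(0, 1))
            for _ in range(n)]


data = make_noise_data(n=60, dims=5, seed=1)
print("apparent error rate:", apparent_error_rate(data))
print("leave-one-out error rate:", leave_one_out_error_rate(data))
```

Because the labels carry no information about the features, no rule can do better than chance (a true error rate of about 0.5). Yet the apparent error rate printed here is 0, since every case is its own nearest neighbour, whereas the leave-one-out estimate lands near the chance level.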

Type: Brief Communications
Copyright © Cambridge University Press 1983
