7 - Classification accuracy
Published online by Cambridge University Press: 07 January 2010
Summary
Background
Probably the most important aim for any classifier is that it should make accurate predictions. However, it is surprisingly difficult to arrive at an adequate definition and measurement of accuracy (see Fielding and Bell, 1997; Fielding, 1999b, 2002). It may also be better to think in terms of costs rather than accuracy, and to use a minimum cost criterion to evaluate the effectiveness of a classifier. Unfortunately, costs have rarely been used explicitly in biological studies, and it is reasonable to expect some opposition to their future use. However, Section 7.8 shows that all classifiers apply costs, even if those costs have gone unrecognised.
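The contrast between accuracy and a minimum cost criterion can be illustrated with a minimal sketch. The two confusion matrices and the cost values below are invented purely for illustration (they are not taken from the chapter), and the function names are hypothetical. The sketch assumes a two-class problem in which a false negative is taken to be five times as costly as a false positive.

```python
# Hypothetical confusion-matrix counts for two classifiers tested on the
# same 200 cases (illustrative numbers only, not data from the chapter).
classifier_a = {"tp": 30, "fn": 20, "fp": 5, "tn": 145}
classifier_b = {"tp": 45, "fn": 5, "fp": 30, "tn": 120}


def accuracy(cm):
    """Proportion of cases classified correctly."""
    total = cm["tp"] + cm["fn"] + cm["fp"] + cm["tn"]
    return (cm["tp"] + cm["tn"]) / total


def total_cost(cm, cost_fn, cost_fp):
    """Total misclassification cost; correct predictions cost nothing."""
    return cm["fn"] * cost_fn + cm["fp"] * cost_fp


# Assumed costs: a false negative is five times as costly as a false positive.
for name, cm in (("A", classifier_a), ("B", classifier_b)):
    print(name,
          "accuracy:", accuracy(cm),
          "cost (5:1):", total_cost(cm, cost_fn=5, cost_fp=1),
          "cost (1:1):", total_cost(cm, cost_fn=1, cost_fp=1))
```

With these assumed numbers, classifier A is the more accurate of the two, but classifier B incurs the lower total cost once false negatives are weighted more heavily. With equal unit costs the total cost is simply the number of misclassified cases, so ranking classifiers by accuracy amounts to assuming that all errors are equally costly.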
Ultimately the only test of any classifier is its future performance, i.e. its ability to classify novel cases correctly. This is known as generalisation and it is linked to both classifier design and testing. In general, complex classifiers are fine-tuned to re-classify the cases used in developmental testing. As such they are likely to incorporate too much of the ‘noise’ from the original cases, leading to a decline in accuracy when presented with novel cases. It is sometimes necessary to accept reduced accuracy on the training data if it leads to increased accuracy with novel cases. This was illustrated in the decision tree and artificial neural network sections in Chapter 6. Focusing on the generalisation of a classifier differs from traditional statistical approaches, which are usually judged by the coefficient p-values or some overall goodness of fit such as R².
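The gap between re-classifying training cases and classifying novel cases can be sketched with simulated data. Everything in the example below is an assumption made for illustration: the single predictor, the 20% label noise, the nearest-neighbour classifier standing in for a "complex" classifier, and the fixed threshold rule standing in for a simple one. It is not the method used in the chapter.

```python
import random

random.seed(0)  # fixed seed so the simulated cases are reproducible


def make_cases(n, noise=0.2):
    """Simulate n cases: the class depends on x > 0.5, but a proportion
    `noise` of the labels is flipped at random."""
    cases = []
    for _ in range(n):
        x = random.random()
        label = int(x > 0.5)
        if random.random() < noise:
            label = 1 - label
        cases.append((x, label))
    return cases


def accuracy(predict, cases):
    """Proportion of cases for which predict(x) matches the recorded label."""
    return sum(predict(x) == y for x, y in cases) / len(cases)


train = make_cases(200)   # cases used to build the classifier
novel = make_cases(200)   # cases the classifier has never seen


def nearest_neighbour(x):
    """Complex classifier: copy the label of the closest training case,
    which memorises the training data, noise included."""
    return min(train, key=lambda case: abs(case[0] - x))[1]


def threshold_rule(x):
    """Simple classifier: a single cut-off that ignores individual cases."""
    return int(x > 0.5)


nn_train, nn_novel = accuracy(nearest_neighbour, train), accuracy(nearest_neighbour, novel)
simple_train, simple_novel = accuracy(threshold_rule, train), accuracy(threshold_rule, novel)

print("nearest neighbour: train", nn_train, "novel", nn_novel)
print("threshold rule:    train", simple_train, "novel", simple_novel)
```

The nearest-neighbour classifier re-classifies its own training cases perfectly because each case is its own nearest neighbour, yet it cannot maintain that accuracy on novel cases since part of what it memorised was noise. The threshold rule gives up accuracy on the training data, but its accuracy should typically change little between the training and novel cases.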
- Type: Chapter
- Information: Cluster and Classification Techniques for the Biosciences, pp. 179-199
- Publisher: Cambridge University Press
- Print publication year: 2006