Published online by Cambridge University Press: 04 August 2010
“Pruning is done to prevent over crowding, for the health of the plant, to open up the lower branches to the light and to create space.” – Ashley Stephenson, The Garden Planner (1981).
Abstract: Discrimination or Classification Trees are a popular form of knowledge representation, and have even been used as the basis for expert systems. One reason for their popularity is that efficient algorithms exist for inducing such trees automatically from sample data (Brieman et al., 1984; Quinlan, 1986). However, it is widely recognized among machine-learning researchers that trees derived from noisy or inconclusive data sets tend to be over-complex. This unnecessary complexity renders them hard to interpret and typically degrades their performance on unseen test cases. The present paper introduces a measure of tree quality, and an associated tree pruning technique, based on the minimum-message-length (MML) criterion (Wallace & Freeman, 1987; Wolff, 1991). Empirical trials with a variety of data sets indicate that it achieves greater than 80% reduction in tree size, coupled with a slight improvement in accuracy in classifying unseen test cases, thus comparing favourably with alternative simplification strategies. Moreover, it is simpler that previously published pruning techniques, even those based on the MML principle such as that of Quinlan & Rivest (1989).
Keywords: Machine Learning, Data Compression, Inductive Inference, Information Theory, Entropy Minimax, Classification Algorithms, Discrimination Trees.
INTRODUCTION
One reason for the popularity of discrimination trees (also known as decision trees) for representing knowledge is that they are relatively easy to understand.
To save this book to your Kindle, first ensure [email protected] is added to your Approved Personal Document E-mail List under your Personal Document Settings on the Manage Your Content and Devices page of your Amazon account. Then enter the ‘name’ part of your Kindle email address below. Find out more about saving to your Kindle.
Note you can select to save to either the @free.kindle.com or @kindle.com variations. ‘@free.kindle.com’ emails are free but can only be saved to your device when it is connected to wi-fi. ‘@kindle.com’ emails can be delivered even when you are not connected to wi-fi, but note that service fees apply.
Find out more about the Kindle Personal Document Service.
To save content items to your account, please confirm that you agree to abide by our usage policies. If this is the first time you use this feature, you will be asked to authorise Cambridge Core to connect with your account. Find out more about saving content to Dropbox.
To save content items to your account, please confirm that you agree to abide by our usage policies. If this is the first time you use this feature, you will be asked to authorise Cambridge Core to connect with your account. Find out more about saving content to Google Drive.