
References

Published online by Cambridge University Press: 05 November 2012

Peter Flach, University of Bristol

Type: Chapter
Book: Machine Learning: The Art and Science of Algorithms that Make Sense of Data, pp. 367–382
Publisher: Cambridge University Press
Print publication year: 2012
Chapter DOI: https://doi.org/10.1017/CBO9780511973000.017



Abudawood, T. (2011). Multi-class subgroup discovery: Heuristics, algorithms and predictiveness. Ph.D. thesis, University of Bristol, Department of Computer Science, Faculty of Engineering.
Abudawood, T. and Flach, P.A. (2009). Evaluation measures for multi-class subgroup discovery. In W.L. Buntine, M. Grobelnik, D. Mladenić and J. Shawe-Taylor (eds.), Proceedings of the European Conference on Machine Learning and Knowledge Discovery in Databases (ECML-PKDD 2009), Part I, LNCS, volume 5781, pp. 35–50. Springer.
Agrawal, R., Imielinski, T. and Swami, A.N. (1993). Mining association rules between sets of items in large databases. In P. Buneman and S. Jajodia (eds.), Proceedings of the ACM International Conference on Management of Data (SIGMOD 1993), pp. 207–216. ACM Press.
Agrawal, R., Mannila, H., Srikant, R., Toivonen, H. and Verkamo, A.I. (1996). Fast discovery of association rules. In Advances in Knowledge Discovery and Data Mining, pp. 307–328. AAAI/MIT Press.
Allwein, E.L., Schapire, R.E. and Singer, Y. (2000). Reducing multiclass to binary: A unifying approach for margin classifiers. In P. Langley (ed.), Proceedings of the Seventeenth International Conference on Machine Learning (ICML 2000), pp. 9–16. Morgan Kaufmann.
Amit, Y. and Geman, D. (1997). Shape quantization and recognition with randomized trees. Neural Computation 9(7):1545–1588.
Angluin, D., Frazier, M. and Pitt, L. (1992). Learning conjunctions of Horn clauses. Machine Learning 9:147–164.
Bakir, G., Hofmann, T., Schölkopf, B., Smola, A.J., Taskar, B. and Vishwanathan, S.V.N. (2007). Predicting Structured Data. MIT Press.
Banerji, R.B. (1980). Artificial Intelligence: A Theoretical Approach. Elsevier Science.
Bengio, Y. (2009). Learning deep architectures for AI. Foundations and Trends in Machine Learning 2(1):1–127.
Best, M.J. and Chakravarti, N. (1990). Active set algorithms for isotonic regression; a unifying framework. Mathematical Programming 47(1):425–439.
Blockeel, H. (2010a). Hypothesis language. In C. Sammut and G.I. Webb (eds.), Encyclopedia of Machine Learning, pp. 507–511. Springer.
Blockeel, H. (2010b). Hypothesis space. In C. Sammut and G.I. Webb (eds.), Encyclopedia of Machine Learning, pp. 511–513. Springer.
Blockeel, H., De Raedt, L. and Ramon, J. (1998). Top-down induction of clustering trees. In J.W. Shavlik (ed.), Proceedings of the Fifteenth International Conference on Machine Learning (ICML 1998), pp. 55–63. Morgan Kaufmann.
Blumer, A., Ehrenfeucht, A., Haussler, D. and Warmuth, M.K. (1989). Learnability and the Vapnik-Chervonenkis dimension. Journal of the ACM 36(4):929–965.
Boser, B.E., Guyon, I. and Vapnik, V. (1992). A training algorithm for optimal margin classifiers. In Proceedings of the International Conference on Computational Learning Theory (COLT 1992), pp. 144–152.
Bouckaert, R. and Frank, E. (2004). Evaluating the replicability of significance tests for comparing learning algorithms. In H. Dai, R. Srikant and C. Zhang (eds.), Advances in Knowledge Discovery and Data Mining, LNCS, volume 3056, pp. 3–12. Springer.
Boullé, M. (2004). Khiops: A statistical discretization method of continuous attributes. Machine Learning 55(1):53–69.
Boullé, M. (2006). MODL: A Bayes optimal discretization method for continuous attributes. Machine Learning 65(1):131–165.
Bourke, C., Deng, K., Scott, S.D., Schapire, R.E. and Vinodchandran, N.V. (2008). On reoptimizing multi-class classifiers. Machine Learning 71(2-3):219–242.
Brazdil, P., Giraud-Carrier, C.G., Soares, C. and Vilalta, R. (2009). Metalearning – Applications to Data Mining. Springer.
Brazdil, P., Vilalta, R., Giraud-Carrier, C.G. and Soares, C. (2010). Metalearning. In C. Sammut and G.I. Webb (eds.), Encyclopedia of Machine Learning, pp. 662–666. Springer.
Breiman, L. (1996a). Bagging predictors. Machine Learning 24(2):123–140.
Breiman, L. (1996b). Stacked regressions. Machine Learning 24(1):49–64.
Breiman, L. (2001). Random forests. Machine Learning 45(1):5–32.
Breiman, L., Friedman, J.H., Olshen, R.A. and Stone, C.J. (1984). Classification and Regression Trees. Wadsworth.
Brier, G.W. (1950). Verification of forecasts expressed in terms of probability. Monthly Weather Review 78(1):1–3.
Brown, G. (2010). Ensemble learning. In C. Sammut and G.I. Webb (eds.), Encyclopedia of Machine Learning, pp. 312–320. Springer.
Bruner, J.S., Goodnow, J.J. and Austin, G.A. (1956). A Study of Thinking. Science Editions. 2nd edn 1986.
Cesa-Bianchi, N. and Lugosi, G. (2006). Prediction, Learning, and Games. Cambridge University Press.
Cestnik, B. (1990). Estimating probabilities: A crucial task in machine learning. In Proceedings of the European Conference on Artificial Intelligence (ECAI 1990), pp. 147–149.
Clark, P. and Boswell, R. (1991). Rule induction with CN2: Some recent improvements. In Y. Kodratoff (ed.), Proceedings of the European Working Session on Learning (EWSL 1991), LNCS, volume 482, pp. 151–163. Springer.
Clark, P. and Niblett, T. (1989). The CN2 induction algorithm. Machine Learning 3:261–283.
Cohen, W.W. (1995). Fast effective rule induction. In A. Prieditis and S.J. Russell (eds.), Proceedings of the Twelfth International Conference on Machine Learning (ICML 1995), pp. 115–123. Morgan Kaufmann.
Cohen, W.W. and Singer, Y. (1999). A simple, fast, and effective rule learner. In J. Hendler and D. Subramanian (eds.), Proceedings of the Sixteenth National Conference on Artificial Intelligence (AAAI 1999), pp. 335–342. AAAI Press / MIT Press.
Cohn, D. (2010). Active learning. In C. Sammut and G.I. Webb (eds.), Encyclopedia of Machine Learning, pp. 10–14. Springer.
Cortes, C. and Vapnik, V. (1995). Support-vector networks. Machine Learning 20(3):273–297.
Cover, T. and Hart, P. (1967). Nearest neighbor pattern classification. IEEE Transactions on Information Theory 13(1):21–27.
Cristianini, N. and Shawe-Taylor, J. (2000). An Introduction to Support Vector Machines. Cambridge University Press.
Dasgupta, S. (2010). Active learning theory. In C. Sammut and G.I. Webb (eds.), Encyclopedia of Machine Learning, pp. 14–19. Springer.
Davis, J. and Goadrich, M. (2006). The relationship between precision-recall and ROC curves. In W.W. Cohen and A. Moore (eds.), Proceedings of the Twenty-Third International Conference on Machine Learning (ICML 2006), pp. 233–240. ACM Press.
De Raedt, L. (1997). Logical settings for concept-learning. Artificial Intelligence 95(1):187–201.
De Raedt, L. (2008). Logical and Relational Learning. Springer.
De Raedt, L. (2010). Logic of generality. In C. Sammut and G.I. Webb (eds.), Encyclopedia of Machine Learning, pp. 624–631. Springer.
De Raedt, L. and Kersting, K. (2010). Statistical relational learning. In C. Sammut and G.I. Webb (eds.), Encyclopedia of Machine Learning, pp. 916–924. Springer.
Dempster, A.P., Laird, N.M. and Rubin, D.B. (1977). Maximum likelihood from incomplete data via the EM algorithm. Journal of the Royal Statistical Society, Series B (Methodological) pp. 1–38.
Demšar, J. (2006). Statistical comparisons of classifiers over multiple data sets. Journal of Machine Learning Research 7:1–30.
Demšar, J. (2008). On the appropriateness of statistical tests in machine learning. In Proceedings of the ICML'08 Workshop on Evaluation Methods for Machine Learning.
Dietterich, T.G. (1998). Approximate statistical tests for comparing supervised classification learning algorithms. Neural Computation 10(7):1895–1923.
Dietterich, T.G. and Bakiri, G. (1995). Solving multiclass learning problems via error-correcting output codes. Journal of Artificial Intelligence Research 2:263–286.
Dietterich, T.G., Kearns, M.J. and Mansour, Y. (1996). Applying the weak learning framework to understand and improve C4.5. In Proceedings of the Thirteenth International Conference on Machine Learning, pp. 96–104.
Ding, C.H.Q. and He, X. (2004). K-means clustering via principal component analysis. In C.E. Brodley (ed.), Proceedings of the Twenty-First International Conference on Machine Learning (ICML 2004). ACM Press.
Domingos, P. and Pazzani, M. (1997). On the optimality of the simple Bayesian classifier under zero-one loss. Machine Learning 29(2):103–130.
Donoho, S.K. and Rendell, L.A. (1995). Rerepresenting and restructuring domain theories: A constructive induction approach. Journal of Artificial Intelligence Research 2:411–446.
Drummond, C. (2006). Machine learning as an experimental science (revisited). In Proceedings of the AAAI'06 Workshop on Evaluation Methods for Machine Learning.
Drummond, C. and Holte, R.C. (2000). Exploiting the cost (in)sensitivity of decision tree splitting criteria. In P. Langley (ed.), Proceedings of the Seventeenth International Conference on Machine Learning (ICML 2000), pp. 239–246. Morgan Kaufmann.
Egan, J.P. (1975). Signal Detection Theory and ROC Analysis. Academic Press.
Fawcett, T. (2006). An introduction to ROC analysis. Pattern Recognition Letters 27(8):861–874.
Fawcett, T. and Niculescu-Mizil, A. (2007). PAV and the ROC convex hull. Machine Learning 68(1):97–106.
Fayyad, U.M. and Irani, K.B. (1993). Multi-interval discretization of continuous-valued attributes for classification learning. In Proceedings of the International Joint Conference on Artificial Intelligence (IJCAI 1993), pp. 1022–1029.
Ferri, C., Flach, P.A. and Hernández-Orallo, J. (2002). Learning decision trees using the area under the ROC curve. In C. Sammut and A.G. Hoffmann (eds.), Proceedings of the Nineteenth International Conference on Machine Learning (ICML 2002), pp. 139–146. Morgan Kaufmann.
Ferri, C., Flach, P.A. and Hernández-Orallo, J. (2003). Improving the AUC of probabilistic estimation trees. In N. Lavrač, D. Gamberger, L. Todorovski and H. Blockeel (eds.), Proceedings of the European Conference on Machine Learning (ECML 2003), LNCS, volume 2837, pp. 121–132. Springer.
Fix, E. and Hodges, J.L. (1951). Discriminatory analysis. Nonparametric discrimination: Consistency properties. Technical report, USAF School of Aviation Medicine, Randolph Field, Texas. Report Number 4, Project Number 21-49-004.
Flach, P.A. (1994). Simply Logical – Intelligent Reasoning by Example. Wiley.
Flach, P.A. (2003). The geometry of ROC space: Understanding machine learning metrics through ROC isometrics. In T. Fawcett and N. Mishra (eds.), Proceedings of the Twentieth International Conference on Machine Learning (ICML 2003), pp. 194–201. AAAI Press.
Flach, P.A. (2010a). First-order logic. In C. Sammut and G.I. Webb (eds.), Encyclopedia of Machine Learning, pp. 410–415. Springer.
Flach, P.A. (2010b). ROC analysis. In C. Sammut and G.I. Webb (eds.), Encyclopedia of Machine Learning, pp. 869–875. Springer.
Flach, P.A. and Lachiche, N. (2001). Confirmation-guided discovery of first-order rules with Tertius. Machine Learning 42(1/2):61–95.
Flach, P.A. and Matsubara, E.T. (2007). A simple lexicographic ranker and probability estimator. In J.N. Kok, J. Koronacki, R.L. de Mántaras, S. Matwin, D. Mladenic and A. Skowron (eds.), Proceedings of the Eighteenth European Conference on Machine Learning (ECML 2007), LNCS, volume 4701, pp. 575–582. Springer.
Freund, Y., Iyer, R.D., Schapire, R.E. and Singer, Y. (2003). An efficient boosting algorithm for combining preferences. Journal of Machine Learning Research 4:933–969.
Freund, Y. and Schapire, R.E. (1997). A decision-theoretic generalization of on-line learning and an application to boosting. Journal of Computer and System Sciences 55(1):119–139.
Fürnkranz, J. (1999). Separate-and-conquer rule learning. Artificial Intelligence Review 13(1):3–54.
Fürnkranz, J. (2010). Rule learning. In C. Sammut and G.I. Webb (eds.), Encyclopedia of Machine Learning, pp. 875–879. Springer.
Fürnkranz, J. and Flach, P.A. (2003). An analysis of rule evaluation metrics. In T. Fawcett and N. Mishra (eds.), Proceedings of the Twentieth International Conference on Machine Learning (ICML 2003), pp. 202–209. AAAI Press.
Fürnkranz, J. and Flach, P.A. (2005). ROC ‘n’ rule learning – towards a better understanding of covering algorithms. Machine Learning 58(1):39–77.
Fürnkranz, J., Gamberger, D. and Lavrač, N. (2012). Foundations of Rule Learning. Springer.
Fürnkranz, J. and Hüllermeier, E. (eds.) (2010). Preference Learning. Springer.
Fürnkranz, J. and Widmer, G. (1994). Incremental reduced error pruning. In Proceedings of the Eleventh International Conference on Machine Learning (ICML 1994), pp. 70–77.
Gama, J. and Gaber, M.M. (eds.) (2007). Learning from Data Streams: Processing Techniques in Sensor Networks. Springer.
Ganter, B. and Wille, R. (1999). Formal Concept Analysis: Mathematical Foundations. Springer.
Garriga, G.C., Kralj, P. and Lavrač, N. (2008). Closed sets for labeled data. Journal of Machine Learning Research 9:559–580.
Gärtner, T. (2009). Kernels for Structured Data. World Scientific.
Grünwald, P.D. (2007). The Minimum Description Length Principle. MIT Press.
Guyon, I. and Elisseeff, A. (2003). An introduction to variable and feature selection. Journal of Machine Learning Research 3:1157–1182.
Hall, M.A. (1999). Correlation-based feature selection for machine learning. Ph.D. thesis, University of Waikato.
Han, J., Cheng, H., Xin, D. and Yan, X. (2007). Frequent pattern mining: Current status and future directions. Data Mining and Knowledge Discovery 15(1):55–86.
Hand, D.J. and Till, R.J. (2001). A simple generalisation of the area under the ROC curve for multiple class classification problems. Machine Learning 45(2):171–186.
Haussler, D. (1988). Quantifying inductive bias: AI learning algorithms and Valiant's learning framework. Artificial Intelligence 36(2):177–221.
Hernández-Orallo, J., Flach, P.A. and Ferri, C. (2011). Threshold choice methods: The missing link. Available online at http://arxiv.org/abs/1112.2640.
Ho, T.K. (1995). Random decision forests. In Proceedings of the International Conference on Document Analysis and Recognition, p. 278. IEEE Computer Society, Los Alamitos, CA, USA.
Hoerl, A.E. and Kennard, R.W. (1970). Ridge regression: Biased estimation for nonorthogonal problems. Technometrics pp. 55–67.
Hofmann, T. (1999). Probabilistic latent semantic indexing. In Proceedings of the Twenty-Second Annual International ACM Conference on Research and Development in Information Retrieval (SIGIR 1999), pp. 50–57. ACM Press.
Hunt, E.B., Marin, J. and Stone, P.J. (1966). Experiments in Induction. Academic Press.
Jain, A.K., Murty, M.N. and Flynn, P.J. (1999). Data clustering: A review. ACM Computing Surveys 31(3):264–323.
Japkowicz, N. and Shah, M. (2011). Evaluating Learning Algorithms: A Classification Perspective. Cambridge University Press.
Jebara, T. (2004). Machine Learning: Discriminative and Generative. Springer.
John, G.H. and Langley, P. (1995). Estimating continuous distributions in Bayesian classifiers. In Proceedings of the Eleventh Conference on Uncertainty in Artificial Intelligence (UAI 1995), pp. 338–345. Morgan Kaufmann.
Kaufman, L. and Rousseeuw, P.J. (1990). Finding Groups in Data: An Introduction to Cluster Analysis. John Wiley.
Kearns, M.J. and Valiant, L.G. (1989). Cryptographic limitations on learning Boolean formulae and finite automata. In D.S. Johnson (ed.), Proceedings of the Twenty-First Annual ACM Symposium on Theory of Computing (STOC 1989), pp. 433–444. ACM Press.
Kearns, M.J. and Valiant, L.G. (1994). Cryptographic limitations on learning Boolean formulae and finite automata. Journal of the ACM 41(1):67–95.
Kerber, R. (1992). ChiMerge: Discretization of numeric attributes. In Proceedings of the Tenth National Conference on Artificial Intelligence (AAAI 1992), pp. 123–128. AAAI Press.
Kibler, D.F. and Langley, P. (1988). Machine learning as an experimental science. In Proceedings of the European Working Session on Learning (EWSL 1988), pp. 81–92.
King, R.D., Srinivasan, A. and Dehaspe, L. (2001). Warmr: A data mining tool for chemical data. Journal of Computer-Aided Molecular Design 15(2):173–181.
Kira, K. and Rendell, L.A. (1992). The feature selection problem: Traditional methods and a new algorithm. In W.R. Swartout (ed.), Proceedings of the Tenth National Conference on Artificial Intelligence (AAAI 1992), pp. 129–134. AAAI Press / MIT Press.
Klösgen, W. (1996). Explora: A multipattern and multistrategy discovery assistant. In Advances in Knowledge Discovery and Data Mining, pp. 249–271. MIT Press.
Kohavi, R. and John, G.H. (1997). Wrappers for feature subset selection. Artificial Intelligence 97(1-2):273–324.
Koren, Y., Bell, R. and Volinsky, C. (2009). Matrix factorization techniques for recommender systems. IEEE Computer 42(8):30–37.
Kramer, S. (1996). Structural regression trees. In Proceedings of the National Conference on Artificial Intelligence (AAAI 1996), pp. 812–819.
Kramer, S., Lavrač, N. and Flach, P.A. (2000). Propositionalization approaches to relational data mining. In S. Džeroski and N. Lavrač (eds.), Relational Data Mining, pp. 262–286. Springer.
Krogel, M.A., Rawles, S., Zelezný, F., Flach, P.A., Lavrač, N. and Wrobel, S. (2003). Comparative evaluation of approaches to propositionalization. In T. Horváth (ed.), Proceedings of the Thirteenth International Conference on Inductive Logic Programming (ILP 2003), LNCS, volume 2835, pp. 197–214. Springer.
Kuncheva, L.I. (2004). Combining Pattern Classifiers: Methods and Algorithms. John Wiley and Sons.
Lachiche, N. (2010). Propositionalization. In C. Sammut and G.I. Webb (eds.), Encyclopedia of Machine Learning, pp. 812–817. Springer.
Lachiche, N. and Flach, P.A. (2003). Improving accuracy and cost of two-class and multi-class probabilistic classifiers using ROC curves. In T. Fawcett and N. Mishra (eds.), Proceedings of the Twentieth International Conference on Machine Learning (ICML 2003), pp. 416–423. AAAI Press.
Lafferty, J.D., McCallum, A. and Pereira, F.C.N. (2001). Conditional random fields: Probabilistic models for segmenting and labeling sequence data. In C.E. Brodley and A.P. Danyluk (eds.), Proceedings of the Eighteenth International Conference on Machine Learning (ICML 2001), pp. 282–289. Morgan Kaufmann.
Langley, P. (1988). Machine learning as an experimental science. Machine Learning 3:5–8.
Langley, P. (1994). Elements of Machine Learning. Morgan Kaufmann.
Langley, P. (2011). The changing science of machine learning. Machine Learning 82(3):275–279.
Lavrač, N., Kavšek, B., Flach, P.A. and Todorovski, L. (2004). Subgroup discovery with CN2-SD. Journal of Machine Learning Research 5:153–188.
Lee, D.D. and Seung, H.S. (1999). Learning the parts of objects by non-negative matrix factorization. Nature 401(6755):788–791.
Leman, D., Feelders, A. and Knobbe, A.J. (2008). Exceptional model mining. In W. Daelemans, B. Goethals and K. Morik (eds.), Proceedings of the European Conference on Machine Learning and Knowledge Discovery in Databases (ECML-PKDD 2008), Part II, LNCS, volume 5212, pp. 1–16. Springer.
Lewis, D. (1998). Naive Bayes at forty: The independence assumption in information retrieval. In Proceedings of the Tenth European Conference on Machine Learning (ECML 1998), pp. 4–15. Springer.
Li, W., Han, J. and Pei, J. (2001). CMAR: Accurate and efficient classification based on multiple class-association rules. In N. Cercone, T.Y. Lin and X. Wu (eds.), Proceedings of the IEEE International Conference on Data Mining (ICDM 2001), pp. 369–376. IEEE Computer Society.
Little, R.J.A. and Rubin, D.B. (1987). Statistical Analysis with Missing Data. Wiley.
Liu, B., Hsu, W. and Ma, Y. (1998). Integrating classification and association rule mining. In Proceedings of the Fourth International Conference on Knowledge Discovery and Data Mining (KDD 1998), pp. 80–86. AAAI Press.
Lloyd, J.W. (2003). Logic for Learning – Learning Comprehensible Theories from Structured Data. Springer.
Lloyd, S. (1982). Least squares quantization in PCM. IEEE Transactions on Information Theory 28(2):129–137.
Mahalanobis, P.C. (1936). On the generalised distance in statistics. Proceedings of the National Institute of Science, India 2(1):49–55.
Mahoney, M.W. and Drineas, P. (2009). CUR matrix decompositions for improved data analysis. Proceedings of the National Academy of Sciences 106(3):697.
McCallum, A. and Nigam, K. (1998). A comparison of event models for naive Bayes text classification. In Proceedings of the AAAI-98 Workshop on Learning for Text Categorization, pp. 41–48.
Michalski, R.S. (1973). Discovering classification rules using variable-valued logic system VL1. In Proceedings of the Third International Joint Conference on Artificial Intelligence, pp. 162–172. Morgan Kaufmann.
Michalski, R.S. (1975). Synthesis of optimal and quasi-optimal variable-valued logic formulas. In Proceedings of the 1975 International Symposium on Multiple-Valued Logic, pp. 76–87.
Michie, D., Spiegelhalter, D.J. and Taylor, C.C. (1994). Machine Learning, Neural and Statistical Classification. Ellis Horwood.
Miettinen, P. (2009). Matrix decomposition methods for data mining: Computational complexity and algorithms. Ph.D. thesis, University of Helsinki.
Minsky, M. and Papert, S. (1969). Perceptrons: An Introduction to Computational Geometry. MIT Press.
Mitchell, T.M. (1977). Version spaces: A candidate elimination approach to rule learning. In Proceedings of the Fifth International Joint Conference on Artificial Intelligence, pp. 305–310. Morgan Kaufmann.
Mitchell, T.M. (1997). Machine Learning. McGraw-Hill.
Muggleton, S. (1995). Inverse entailment and Progol. New Generation Computing 13(3&4):245–286.
Muggleton, S., De Raedt, L., Poole, D., Bratko, I., Flach, P.A., Inoue, K. and Srinivasan, A. (2012). ILP turns 20 – biography and future challenges. Machine Learning 86(1):3–23.
Muggleton, S. and Feng, C. (1990). Efficient induction of logic programs. In Proceedings of the International Conference on Algorithmic Learning Theory (ALT 1990), pp. 368–381.
Murphy, A.H. and Winkler, R.L. (1984). Probability forecasting in meteorology. Journal of the American Statistical Association pp. 489–500.
Nelder, J.A. and Wedderburn, R.W.M. (1972). Generalized linear models. Journal of the Royal Statistical Society, Series A (General) pp. 370–384.
Novikoff, A.B. (1962). On convergence proofs on perceptrons. In Proceedings of the Symposium on the Mathematical Theory of Automata, volume 12, pp. 615–622. Polytechnic Institute of Brooklyn, New York.
Pasquier, N., Bastide, Y., Taouil, R. and Lakhal, L. (1999). Discovering frequent closed itemsets for association rules. In Proceedings of the International Conference on Database Theory (ICDT 1999), pp. 398–416. Springer.
Peng, Y., Flach, P.A., Soares, C. and Brazdil, P. (2002). Improved dataset characterisation for meta-learning. In S. Lange, K. Satoh and C.H. Smith (eds.), Proceedings of the Fifth International Conference on Discovery Science (DS 2002), LNCS, volume 2534, pp. 141–152. Springer.
Pfahringer, B., Bensusan, H. and Giraud-Carrier, C.G. (2000). Meta-learning by landmarking various learning algorithms. In P. Langley (ed.), Proceedings of the Seventeenth International Conference on Machine Learning (ICML 2000), pp. 743–750. Morgan Kaufmann.
Platt, J.C. (1998). Using analytic QP and sparseness to speed training of support vector machines. In M.J. Kearns, S.A. Solla and D.A. Cohn (eds.), Advances in Neural Information Processing Systems 11 (NIPS 1998), pp. 557–563. MIT Press.
Plotkin, G.D. (1971). Automatic methods of inductive inference. Ph.D. thesis, University of Edinburgh.
Provost, F.J. and Domingos, P. (2003). Tree induction for probability-based ranking. Machine Learning 52(3):199–215.
Provost, F.J. and Fawcett, T. (2001). Robust classification for imprecise environments. Machine Learning 42(3):203–231.
Quinlan, J.R. (1986). Induction of decision trees. Machine Learning 1(1):81–106.
Quinlan, J.R. (1990). Learning logical definitions from relations. Machine Learning 5:239–266.
Quinlan, J.R. (1993). C4.5: Programs for Machine Learning. Morgan Kaufmann.
Ragavan, H. and Rendell, L.A. (1993). Lookahead feature construction for learning hard concepts. In Proceedings of the Tenth International Conference on Machine Learning (ICML 1993), pp. 252–259. Morgan Kaufmann.
Rajnarayan, D.G. and Wolpert, D. (2010). Bias-variance trade-offs: Novel applications. In C. Sammut and G.I. Webb (eds.), Encyclopedia of Machine Learning, pp. 101–110. Springer.
Rissanen, J. (1978). Modeling by shortest data description. Automatica 14(5):465–471.
Rivest, R.L. (1987). Learning decision lists. Machine Learning 2(3):229–246.
Robnik-Šikonja, M. and Kononenko, I. (2003). Theoretical and empirical analysis of ReliefF and RReliefF. Machine Learning 53(1-2):23–69.
Rosenblatt, F. (1958). The perceptron: A probabilistic model for information storage and organization in the brain. Psychological Review 65(6):386.
Rousseeuw, P.J. (1987). Silhouettes: A graphical aid to the interpretation and validation of cluster analysis. Journal of Computational and Applied Mathematics 20:53–65.
Rumelhart, D.E., Hinton, G.E. and Williams, R.J. (1986). Learning representations by back-propagating errors. Nature 323(6088):533–536.
Schapire, R.E. (1990). The strength of weak learnability. Machine Learning 5:197–227.
Schapire, R.E. (2003). The boosting approach to machine learning: An overview. In Nonlinear Estimation and Classification, pp. 149–172. Springer.
Schapire, R.E., Freund, Y., Bartlett, P. and Lee, W.S. (1998). Boosting the margin: A new explanation for the effectiveness of voting methods. Annals of Statistics 26(5):1651–1686.
Schapire, R.E. and Singer, Y. (1999). Improved boosting algorithms using confidence-rated predictions. Machine Learning 37(3):297–336.
Settles, B. (2011). Active Learning. Morgan & Claypool.
Shawe-Taylor, J. and Cristianini, N. (2004). Kernel Methods for Pattern Analysis. Cambridge University Press.
Shotton, J., Fitzgibbon, A.W., Cook, M., Sharp, T., Finocchio, M., Moore, R., Kipman, A. and Blake, A. (2011). Real-time human pose recognition in parts from single depth images. In Proceedings of the Twenty-Fourth IEEE Conference on Computer Vision and Pattern Recognition (CVPR 2011), pp. 1297–1304.
Silver, D. and Bennett, K. (2008). Guest editor's introduction: special issue on inductive transfer learning. Machine Learning 73(3):215–220.
Solomonoff, R.J. (1964a). A formal theory of inductive inference: Part I. Information and Control 7(1):1–22.
Solomonoff, R.J. (1964b). A formal theory of inductive inference: Part II. Information and Control 7(2):224–254.
Srinivasan, A. (2007). The Aleph manual, version 4 and above. Available online at www.cs.ox.ac.uk/activities/machlearn/Aleph/.
Stevens, S.S. (1946). On the theory of scales of measurement. Science 103(2684):677–680.
Sutton, R.S. and Barto, A.G. (1998). Reinforcement Learning: An Introduction. MIT Press.
Tibshirani, R. (1996). Regression shrinkage and selection via the lasso. Journal of the Royal Statistical Society, Series B (Methodological) pp. 267–288.
Todorovski, L. and Džeroski, S. (2003). Combining classifiers with meta decision trees. Machine Learning 50(3):223–249.
Tsoumakas, G., Zhang, M.L. and Zhou, Z.H. (2012). Introduction to the special issue on learning from multi-label data. Machine Learning 88(1-2):1–4.
Tukey, J.W. (1977). Exploratory Data Analysis. Addison-Wesley.
Valiant, L.G. (1984). A theory of the learnable. Communications of the ACM 27(11):1134–1142.
Vapnik, V.N. and Chervonenkis, A.Y. (1971). On uniform convergence of the frequencies of events to their probabilities. Teoriya Veroyatnostei i Ee Primeneniya 16(2):264–279.
Vere, S.A. (1975). Induction of concepts in the predicate calculus. In Proceedings of the Fourth International Joint Conference on Artificial Intelligence, pp. 281–287.
von Hippel, P.T. (2005). Mean, median, and skew: Correcting a textbook rule. Journal of Statistics Education 13(2).
Wallace, C.S. and Boulton, D.M. (1968). An information measure for classification. Computer Journal 11(2):185–194.
Webb, G.I. (1995). OPUS: An efficient admissible algorithm for unordered search. Journal of Artificial Intelligence Research 3:431–465.
Webb, G.I., Boughton, J.R. and Wang, Z. (2005). Not so naive Bayes: Aggregating one-dependence estimators. Machine Learning 58(1):5–24.
Winston, P.H. (1970). Learning structural descriptions from examples. Technical report, MIT Artificial Intelligence Lab. AITR-231.
Wojtusiak, J., Michalski, R.S., Kaufman, K.A. and Pietrzykowski, J. (2006). The AQ21 natural induction program for pattern discovery: Initial version and its novel features. In Proceedings of the Eighteenth IEEE International Conference on Tools with Artificial Intelligence (ICTAI 2006), pp. 523–526.
Wolpert, D.H. (1992). Stacked generalization. Neural Networks 5(2):241–259.
Zadrozny, B. and Elkan, C. (2002). Transforming classifier scores into accurate multiclass probability estimates. In Proceedings of the Eighth ACM International Conference on Knowledge Discovery and Data Mining (SIGKDD 2002), pp. 694–699. ACM Press.
Zeugmann, T. (2010). PAC learning. In C. Sammut and G.I. Webb (eds.), Encyclopedia of Machine Learning, pp. 745–753. Springer.
Zhou, Z.H. (2012). Ensemble Methods: Foundations and Algorithms. Taylor & Francis.
