Prediction and Classification in Nonlinear Data Analysis: Something Old, Something New, Something Borrowed, Something Blue

Published online by Cambridge University Press: 01 January 2025

Jacqueline J. Meulman*
Affiliation: Leiden University
* Requests for reprints should be sent to Jacqueline J. Meulman, Data Theory Group, Department of Education, Leiden University, P.O. Box 9555, 2300 RB Leiden, THE NETHERLANDS. E-Mail: [email protected]

Abstract

Prediction and classification are two very active areas in modern data analysis. In this paper, prediction with nonlinear optimal scaling transformations of the variables is reviewed, and extended to the use of multiple additive components, much in the spirit of statistical learning techniques that are currently popular, among other areas, in data mining. Also, a classification/clustering method is described that is particularly suitable for analyzing attribute-value data from systems biology (genomics, proteomics, and metabolomics), and which is able to detect groups of objects that have similar values on small subsets of the attributes.

Type
2003 Presidential Address
Copyright
Copyright © 2003 The Psychometric Society

Footnotes

Special thanks are due to Brian Junker who gave me very helpful comments, and to Tim Null who made the printed version look as good as it does. Both waited patiently for me to finish, for which I'm forever grateful.

This article is based on the Presidential Address Jacqueline Meulman gave on July 9, 2003 at the 68th Annual Meeting of the Psychometric Society held near Cagliari, Italy on the island of Sardinia.—Editor
