
A Proposal for Handling Missing Data

Published online by Cambridge University Press:  01 January 2025

Terry C. Gleason
Affiliation:
Carnegie-Mellon University
Richard Staelin
Affiliation:
Carnegie-Mellon University

Abstract

A method for dealing with the problem of missing observations in multivariate data is developed and evaluated. The method uses a transformation of the principal components of the data to estimate missing entries. The properties of this method and four alternative methods are investigated by means of a Monte Carlo study of 42 computer-generated data matrices. The methods are compared with respect to their ability to predict correlation matrices as well as missing entries.

The results indicate that whenever there exist modest intercorrelations among the variables (i.e., an average off-diagonal correlation above .2), the proposed method is at least as good as the best alternative (a regression method) while being considerably faster and simpler computationally. Models for determining the best alternative based on easily calculated characteristics of the matrix are given. The generality of these models is demonstrated using the previously published results of Timm.
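
The threshold quoted above (an average off-diagonal correlation above .2) is one example of an easily calculated characteristic of a data matrix. The following is a minimal sketch of how such a check might be computed on incomplete data. It is not the authors' procedure: the function names, the use of `None` to mark missing entries, the pairwise-complete estimate of each correlation, and the use of signed rather than absolute correlations are all assumptions made for illustration.

```python
from itertools import combinations
from math import sqrt


def pairwise_corr(x, y):
    """Pearson correlation using only rows where both values are present.

    Missing entries are represented by None (an assumption of this sketch).
    """
    pairs = [(a, b) for a, b in zip(x, y) if a is not None and b is not None]
    n = len(pairs)
    if n < 2:
        raise ValueError("need at least two complete pairs")
    mean_x = sum(a for a, _ in pairs) / n
    mean_y = sum(b for _, b in pairs) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in pairs)
    sxx = sum((a - mean_x) ** 2 for a, _ in pairs)
    syy = sum((b - mean_y) ** 2 for _, b in pairs)
    if sxx == 0 or syy == 0:
        raise ValueError("correlation undefined for a constant column")
    return sxy / sqrt(sxx * syy)


def mean_offdiag_corr(columns):
    """Average of the correlations over all distinct pairs of variables.

    `columns` is a list of variables, each a list of observations.
    Signed correlations are averaged; averaging absolute values instead
    is an equally plausible reading of the abstract.
    """
    rs = [
        pairwise_corr(columns[i], columns[j])
        for i, j in combinations(range(len(columns)), 2)
    ]
    return sum(rs) / len(rs)


def exceeds_threshold(columns, threshold=0.2):
    """True when the average off-diagonal correlation is above the threshold."""
    return mean_offdiag_corr(columns) > threshold
```

For example, `exceeds_threshold([[1, 2, 3, 4], [2, 4, 6, 8], [1, 2, 3, None]])` evaluates each pair of columns on the rows they share and compares the average with .2.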

Type
Original Paper
Copyright
Copyright © 1975 The Psychometric Society

Footnotes

* This is an extension and elaboration of a paper read at the Spring 1973 meetings of the Psychometric Society. We wish to express our appreciation to Timothy McGuire for his helpful comments.

References

Anderson, T. W. Maximum likelihood estimates for a multivariate normal distribution when some observations are missing. Journal of the American Statistical Association, 1957, 52, 200-203.
Buck, S. F. A method of estimation of missing values in multivariate data suitable for use with an electronic computer. Journal of the Royal Statistical Society, Series B, 1960, 22, 302-307.
Christofferson, A. A method for component analysis when the data are incomplete. Seminar communication, 1965, Uppsala: University Institute of Statistics.
Dear, R. E. A principal-component missing data method for multiple regression models. System Development Corporation, Technical Report SP-86, 1959.
Eckart, C. and Young, G. The approximation of one matrix by another of lower rank. Psychometrika, 1936, 1, 211-218.
Edgett, G. L. Multiple regression with missing observations among the independent variables. Journal of the American Statistical Association, 1956, 51, 122-132.
Glasser, M. Linear regression analysis with missing observations among the independent variables. Journal of the American Statistical Association, 1964, 59, 834-844.
Gleason, T. C. and Staelin, R. Improving the metric quality of questionnaire data. Psychometrika, 1973, 393-410.
Haitovsky, Y. Missing data in regression analysis. Journal of the Royal Statistical Society, Series B, 1968, 30, 67-82.
Horn, J. L. A rationale and test for the number of factors in factor analysis. Psychometrika, 1965, 30, 179-185.
Johnson, R. M. On a theorem stated by Eckart and Young. Psychometrika, 1963, 28, 259-264.
Srivastava, J. N. and McDonald, L. On a large class of incomplete multivariate models which can be transformed to make MANOVA applicable. Metron, 1970, 28, 241-252.
Staelin, R. and Gleason, T. C. On the quality of principle components. American Marketing Association Combined Conference Proceedings Spring and Fall 1972, Becker, B. W. and Becker, H. (Eds.), 34, 484-488.
Timm, N. H. The estimation of variance-covariance and correlation matrices from incomplete data. Psychometrika, 1970, 35, 417-438.
Trawinski, I. M. and Bargmann, R. E. Maximum likelihood estimation with incomplete multivariate data. Annals of Mathematical Statistics, 1964, 35, 647-657.
Walsh, J. E. Computer-feasible method for handling incomplete data in regression analysis. Journal of the Association for Computing Machinery, 1961, 18, 201-211.
Wilks, S. S. Moments and distributions of estimates of population parameters from fragmentary samples. Annals of Mathematical Statistics, 1932, 3, 163-195.
Wold, H. Nonlinear estimation by iterative least squares procedures. In David, F. N. (Ed.), Festschrift Jerzy Neyman, 1966, New York: Wiley.