Listwise Deletion in High Dimensions

J. Sophia Wang; P. M. Aronow

doi:10.1017/pan.2022.5

Published online by Cambridge University Press: 02 March 2022

J. Sophia Wang: Graduate Student, Department of Political Science, Yale University, New Haven, CT, USA.
P. M. Aronow*: Associate Professor, Departments of Political Science, Biostatistics, and Statistics and Data Science, Yale University, New Haven, CT, USA.
* Corresponding author: P. M. Aronow

Abstract

We consider the properties of listwise deletion when both n and the number of variables grow large. We show that when (i) all data have some idiosyncratic missingness and (ii) the number of variables grows superlogarithmically in n, then, for large n, listwise deletion will drop all rows with probability 1. Using two canonical datasets from the study of comparative politics and international relations, we provide numerical illustration that these problems may emerge in real-world settings. These results suggest that, in practice, using listwise deletion may mean using few of the variables available to the researcher.
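To give a sense of the scale involved, the following is a minimal sketch under a much simpler assumption than the abstract states: every cell of an n-by-k data matrix is missing independently with the same probability p. Under that assumption a row survives listwise deletion with probability (1 - p)^k, so the expected number of surviving rows is n(1 - p)^k, which tends to zero once k grows faster than log n. The function names, the choice p = 0.2, and the growth rate k = (log n)^2 are illustrative assumptions, not taken from the article.

```python
import math


def expected_complete_rows(n: int, k: int, p: float) -> float:
    """Expected number of rows with no missing cells.

    Assumes (for illustration only) that each of the n*k cells is missing
    independently with probability p.
    """
    return n * (1.0 - p) ** k


def prob_all_rows_dropped(n: int, k: int, p: float) -> float:
    """Probability that listwise deletion leaves zero rows,
    under the same independence assumption."""
    row_complete = (1.0 - p) ** k
    if row_complete >= 1.0:
        # No missingness at all: every row is always kept.
        return 0.0
    # (1 - row_complete)^n, computed via log1p for numerical accuracy
    # when row_complete is very small.
    return math.exp(n * math.log1p(-row_complete))


if __name__ == "__main__":
    p = 0.2  # hypothetical per-cell missingness rate
    for n in (100, 10_000, 1_000_000):
        k = int(math.log(n) ** 2)  # one superlogarithmic growth rate
        print(n, k, expected_complete_rows(n, k, p),
              prob_all_rows_dropped(n, k, p))
```

Running the script prints, for each n, the number of variables k, the expected number of complete rows, and the probability that no row survives. The independence assumption is stronger than condition (i) in the abstract, so this only illustrates the direction of the result and does not reproduce the article's argument.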

Keywords

missing data; listwise deletion; high dimensional inference

Type: Letter
Information: Political Analysis, Volume 31, Issue 1, January 2023, pp. 149–155

DOI: https://doi.org/10.1017/pan.2022.5
Copyright: © The Author(s) 2022. Published by Cambridge University Press on behalf of the Society for Political Methodology

Footnotes

Edited by Jeff Gill
