ON EFFICIENCY GAINS FROM MULTIPLE INCOMPLETE SUBSAMPLES

Saraswata Chaudhuri

doi:10.1017/S0266466619000239

Published online by Cambridge University Press: 04 September 2019

Saraswata Chaudhuri*
Affiliation: McGill University
*Address correspondence to Saraswata Chaudhuri, Department of Economics, McGill University, Montreal, Canada; e-mail: [email protected].

Abstract

Cost-effective survey methods such as multi(R)-phase sampling typically generate samples that are collections of monotonic subsamples, i.e., the variables observed for the units in subsample r are also observed for the units in subsample r + 1 for r = 1,…,R – 1. These subsamples represent subpopulations that can be systematically different if the selection of a unit in each phase of sampling depends on the observed variables for that unit from past phases. Our article is about optimally combining all the subsamples for the efficient estimation of a finite dimensional parameter defined by moment restrictions on a generic target population that is an arbitrary union of these subpopulations. Only the R-th subsample is assumed to contain all the variables that are arguments of the moment function. Semiparametric efficiency bounds for estimation are obtained under a unified framework, allowing for full generality of the selection on observables in the sampling design. The contribution of each subsample toward efficient estimation is analyzed, and it turns out to differ fundamentally from that in setups where the same collection of subsamples is instead generated unplanned by unknown sampling. Uniquely, our setup enables all the subsamples to contribute to the efficient estimation for all the target populations, which we show is not possible in other setups. Efficient estimation is standard. Simulation evidence of substantive efficiency gains from using all the subsamples is provided for all the targets.
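
As a rough illustration of the sampling design described in the abstract, the following Python sketch simulates the simplest case, R = 2, with selection on observables. It is not the article's efficient estimator: it only contrasts a naive complete-case mean with a simple inverse-probability-weighted mean for the full target population. The data-generating process, the selection probability, and all function names are assumptions made for this example.

```python
import math
import random


def simulate_two_phase(n, seed=0):
    """Simulate a hypothetical two-phase (R = 2) sample.

    Phase 1 observes x for every unit. Phase 2 observes y only for units
    selected with a design probability p(x) that depends on the phase-1
    variable, i.e. selection on observables. The result is monotone:
    every unit with y observed also has x observed.
    """
    rng = random.Random(seed)
    units = []
    for _ in range(n):
        x = rng.gauss(0.0, 1.0)
        y = 1.0 + x + rng.gauss(0.0, 1.0)  # population mean of y is 1
        p = 1.0 / (1.0 + math.exp(-x))     # assumed selection probability
        p = min(max(p, 0.1), 0.9)          # keep p away from 0 and 1
        selected = rng.random() < p
        units.append((x, y if selected else None, p, selected))
    return units


def naive_mean(units):
    """Complete-case mean of y, ignoring how phase-2 units were selected."""
    ys = [y for _, y, _, selected in units if selected]
    return sum(ys) / len(ys)


def ipw_mean(units):
    """Normalized inverse-probability-weighted mean of y (full population)."""
    num = sum(y / p for _, y, p, selected in units if selected)
    den = sum(1.0 / p for _, _, p, selected in units if selected)
    return num / den


if __name__ == "__main__":
    sample = simulate_two_phase(20000, seed=0)
    print("naive complete-case mean:", round(naive_mean(sample), 3))
    print("IPW mean:", round(ipw_mean(sample), 3))
```

With a large simulated sample, the weighted mean should land near the population value of 1, while the complete-case mean is pushed upward because units with larger x are more likely to enter the second phase. The sketch uses x only through the weights, so it says nothing about the efficiency gains the article studies.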

Type: ARTICLES
Information: Econometric Theory, Volume 36, Issue 3, June 2020, pp. 488-525

DOI: https://doi.org/10.1017/S0266466619000239
Copyright: Copyright © Cambridge University Press 2019

Footnotes

I am very much grateful to the editor P.C.B. Phillips, the co-editor P. Guggenberger and three anonymous referees for their detailed insightful comments. The article was circulated before as “A Note on Efficiency Gains from Multiple Incomplete Subsamples” but the title was modified at the suggestion of the editor. Previous versions of the article, some of which are available on the author’s webpage, benefitted from the helpful comments of A. Prokhorov, C. Muris, D. Guilkey, D. Frazier, E. Renault, F. Lange, J. Hill, J. Haushofer, J. MacKinnon, J. Wooldridge, M. Carrasco, M. Chemin, P. Saha Chaudhuri, S.J. Lee, and V. Zinde-Walsh, the seminar participants at Brown, Concordia, McGill (Econ and Biostat), Queen’s, U. Canterbury, U. Montreal, U. New South Wales, UNC Chapel Hill, U. Sydney, West Virginia University and the Midwest Econometrics Group meetings (2013).

Chaudhuri supplementary material

Online supplement

PDF 253.4 KB
