Sparse Estimation and Uncertainty with Application to Subgroup Analysis

Marc Ratkovic; Dustin Tingley

doi:10.1017/pan.2016.14

Published online by Cambridge University Press: 22 February 2017

Marc Ratkovic: Assistant Professor, Department of Politics, Princeton University, Princeton NJ 08544, USA. http://www.princeton.edu/~ratkovic
Dustin Tingley: Professor of Government, Harvard University, USA. http://scholar.harvard.edu/dtingley

Abstract

We introduce a Bayesian method, LASSOplus, that unifies recent contributions in the sparse modeling literatures, while substantially extending pre-existing estimators in terms of both performance and flexibility. Unlike existing Bayesian variable selection methods, LASSOplus both selects and estimates effects while returning estimated confidence intervals for discovered effects. Furthermore, we show how LASSOplus easily extends to modeling repeated observations and permits a simple Bonferroni correction to control coverage on confidence intervals among discovered effects. We situate LASSOplus in the literature on how to estimate subgroup effects, a topic that often leads to a proliferation of estimation parameters. We also offer a simple preprocessing step that draws on recent theoretical work to estimate higher-order effects that can be interpreted independently of their lower-order terms. A simulation study illustrates the method’s performance relative to several existing variable selection methods. In addition, we apply LASSOplus to an existing study on public support for climate treaties to illustrate the method’s ability to discover substantive and relevant effects. Software implementing the method is publicly available in the R package sparsereg.
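The Bonferroni correction mentioned in the abstract can be illustrated with a generic sketch. This is not the LASSOplus procedure or the sparsereg API: it assumes approximately normal estimates, and the function name `bonferroni_intervals` and all numbers are hypothetical. With k discovered effects, each interval is built at level alpha / k, so the union bound keeps simultaneous coverage at or above 1 - alpha.

```python
from statistics import NormalDist


def bonferroni_intervals(estimates, std_errors, alpha=0.05):
    """Normal-approximation intervals with joint level alpha over k effects.

    Hypothetical helper for illustration; not part of sparsereg.
    """
    k = len(estimates)
    if k == 0:
        return []
    # Two-sided critical value at per-effect level alpha / k.
    z = NormalDist().inv_cdf(1 - alpha / (2 * k))
    return [(b - z * s, b + z * s) for b, s in zip(estimates, std_errors)]


# Made-up point estimates and standard errors for three selected effects.
estimates = [0.8, -0.5, 0.3]
std_errors = [0.2, 0.2, 0.1]

adjusted = bonferroni_intervals(estimates, std_errors)
# A single effect needs no adjustment, so this is the usual 95% interval.
unadjusted = bonferroni_intervals(estimates[:1], std_errors[:1])

for (lo, hi), b in zip(adjusted, estimates):
    print(f"estimate {b:+.2f}: [{lo:+.3f}, {hi:+.3f}]")
```

The adjusted intervals are wider than the unadjusted one, which is the price of controlling coverage jointly across all discovered effects.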

Type: Articles
Information: Political Analysis, Volume 25, Issue 1, January 2017, pp. 1–40

DOI: https://doi.org/10.1017/pan.2016.14
Copyright: Copyright © The Author(s) 2017. Published by Cambridge University Press on behalf of the Society for Political Methodology.

Footnotes

Authors’ note: We are grateful to Neal Beck, Scott de Marchi, In Song Kim, John Londregan, Luke Miratrix, Michael Peress, Jasjeet Sekhon, Yuki Shiraito, Brandon Stewart, and Susan Athey for helpful comments on an earlier draft. Earlier versions were presented at the 2015 Summer Methods Meeting, Harvard IQSS Applied Statistics Workshop, Princeton Political Methodology Colloquium, DARPA/ISAT Conference “What If? Machine Learning for Causal Inference,” and EITM 2016. We are also grateful to two anonymous reviewers for detailed feedback on an earlier version. All mistakes are the authors’ own. Replication data are available at Ratkovic and Tingley 2016.

Contributing Editor: R. Michael Alvarez

References

Albert, James H., and Chib, Siddhartha. 1993. Bayesian analysis of binary and polychotomous response data. Journal of the American Statistical Association 88:669–679.

Alhamzawi, Rahim, Yu, Keming, and Benoit, Dries F. 2012. Bayesian adaptive Lasso quantile regression. Statistical Modelling 12(3):279–297.

Armagan, Artin, Dunson, David B., and Lee, Jaeyong. 2013. Generalized double Pareto shrinkage. Statistica Sinica 23:119–143.

Bechtel, Michael M., and Scheve, Kenneth F. 2013. Mass support for global climate agreements depends on institutional design. Proceedings of the National Academy of Sciences 110(34):13763–13768.

Belloni, A., Chen, D., Chernozhukov, V., and Hansen, C. 2012. Sparse models and methods for optimal instruments with an application to eminent domain. Econometrica 80(6):2369–2429, doi:10.3982/ECTA9626.

Belloni, Alexandre, and Chernozhukov, Victor. 2013. Least squares after model selection in high-dimensional sparse models. Bernoulli 19(2):521–547.

Belloni, Alexandre, Chernozhukov, Victor, and Hansen, Christian. 2011. Inference for high-dimensional sparse econometric models. CeMMAP Working Papers CWP41/11, Centre for Microdata Methods and Practice, Institute for Fiscal Studies.

Benjamini, Yoav, and Yekutieli, Daniel. 2005. False discovery rate-adjusted multiple confidence intervals for selected parameters. Journal of the American Statistical Association 100(469):71–93.

Berger, J. O., and Bernardo, J. M. 1989. Estimating a product of means: Bayesian analysis with reference priors. Journal of the American Statistical Association 84:200–207.

Berger, James O. 2006. The case for objective Bayesian analysis. Bayesian Analysis 1(3):385–402.

Berger, James O., Bernardo, Jose M., and Sun, Dongchu. 2009. The formal definition of reference priors. The Annals of Statistics 37(2):905–938.

Berger, James O., Wang, Xiaojing, and Shen, Lei. 2015. A Bayesian approach to subgroup identification. Journal of Biopharmaceutical Statistics 24(1):110–129.

Berk, Richard, Brown, Lawrence, Buja, Andreas, Zhang, Kai, and Zhao, Linda. 2013. Valid post-selection inference. Annals of Statistics 41(2):802–837.

Bernardo, J. M. 1979. Reference posterior distributions for Bayesian inference. Journal of the Royal Statistical Society Series B 41:113–147.

Bernardo, Jose M. 2005. Reference analysis. In Handbook of statistics, ed. Dey, D. K., and Rao, C. R. Elsevier.

Berry, Donald. 1990. Subgroup analysis. Biometrics 46(4):1227–1230.

Bhadra, Anindya, Datta, Jyotishka, Polson, Nicholas G., and Willard, Brandon. 2015. The Horseshoe+ estimator of ultra-sparse signals. Working Paper.

Bhattacharya, Anirban, Pati, Debdeep, Pillai, Natesh S., and Dunson, David B. 2015. Dirichlet-Laplace priors for optimal shrinkage. Journal of the American Statistical Association 110:1479–1490.

Bickel, Peter, Ritov, Ya’acov, and Tsybakov, Alexandre. 2009. Simultaneous analysis of Lasso and Dantzig selector. Annals of Statistics 37(4):1705–1732.

Buhlmann, Peter, and van de Geer, Sara. 2013. Statistics for high-dimensional data. Berlin: Springer.

Candes, E., and Tao, T. 2007. The Dantzig selector: Statistical estimation when p is much larger than n (with discussion). Annals of Statistics 35:2313–2404.

Candes, Emmanuel J. 2006. Modern statistical estimation via oracle inequalities. Acta Numerica 15:1–69.

Carvalho, C., Polson, N., and Scott, J. 2010. The Horseshoe estimator for sparse signals. Biometrika 97:465–480.

Chatterjee, A., and Lahiri, S. N. 2011. Bootstrapping lasso estimators. Journal of the American Statistical Association 106(494):608–625.

Chatterjee, Sourav. 2014. Assumptionless consistency of the LASSO. arXiv:1303.5817v5.

Chernozhukov, Victor, Fernández-Val, Iván, and Melly, Blaise. 2013. Inference on counterfactual distributions. Econometrica 81(6):2205–2268.

Datta, Jyotishka, and Ghosh, Jayanta K. 2013. Asymptotic properties of Bayes risk for the Horseshoe prior. Bayesian Analysis 8(1):111–132.

Donoho, David L., and Johnstone, Iain M. 1994. Ideal spatial adaptation by wavelet shrinkage. Biometrika 81(3):425–455.

Efron, Bradley. 2015. Frequentist accuracy of Bayesian estimates. Journal of the Royal Statistical Society Series B 77(3):617–646.

Esarey, Justin, and Sumner, Jane Lawrence. 2015. Marginal effects in interaction models: Determining and controlling the false positive rate. Working Paper.

Fan, Jianqing, and Peng, Heng. 2004. Nonconcave penalized likelihood with a diverging number of parameters. The Annals of Statistics 32(3):928–961.

Fan, Jianqing, and Li, Runze. 2001. Variable selection via nonconcave penalized likelihood and its oracle properties. Journal of the American Statistical Association 96(456):1348–1360.

Figueiredo, Mario. 2004. Lecture Notes on the EM Algorithm. Lecture notes. Instituto de Telecomunicacoes, Instituto Superior Tecnico.

Foster, J. C., Taylor, J. M., and Ruberg, S. J. 2011. Subgroup identification from randomized clinical trial data. Statistics in Medicine 30:2867–2880.

Gelman, Andrew. 2006. Prior distributions for variance parameters in hierarchical models (comment on article by Browne and Draper). Bayesian Analysis 1(3):515–534.

Gelman, Andrew, Jakulin, Aleks, Pittau, Maria Grazia, and Su, Yu-Sung. 2008. A weakly informative default prior distribution for logistic and other regression models. Annals of Applied Statistics 2(4):1360–1383.

Gelman, Andrew, and Hill, Jennifer. 2007. Data analysis using regression and multilevel/hierarchical models. Cambridge: Cambridge University Press.

Gelman, Andrew, Carlin, John B., Stern, Hal S., Dunson, David B., Vehtari, Aki, and Rubin, Donald B. 2014. Bayesian data analysis. Text in statistical science series. Boca Raton, FL: CRC Press.

Gill, Jeff. 2014. Bayesian methods: A social and behavioral sciences approach. 3rd ed. CRC Press.

Gillen, B., Montero, S., Moon, H. R., and Shum, M. 2016. BLP-Lasso for aggregate discrete choice models applied to elections with rich demographic covariates. Working Paper.

Green, Donald P., and Kern, Holger L. 2012. Modeling heterogeneous treatment effects in survey experiments with Bayesian additive regression trees. Public Opinion Quarterly 76:491–511.

Griffin, J. E., and Brown, P. J. 2010. Inference with normal-gamma prior distributions in regression problems. Bayesian Analysis 5(1):171–188.

Griffin, J. E., and Brown, P. J. 2012. Structuring shrinkage: Some correlated priors for regression. Biometrika 99(2):481–487.

Grimmer, Justin, Messing, Solomon, and Westwood, Sean. Forthcoming. Estimating heterogeneous treatment effects and the effects of heterogeneous treatments with ensemble methods. Political Analysis.

Hahn, P. Richard, and Carvalho, Carlos M. 2015. Decoupling shrinkage and selection in Bayesian linear models: A posterior summary perspective. Journal of the American Statistical Association 110(509):435–448.

Hainmueller, Jens, and Hazlett, Chad. 2013. Kernel regularized least squares: Reducing misspecification bias with a flexible and interpretable machine learning approach. Political Analysis 22:143–169.

Hainmueller, Jens, Hopkins, Daniel J., and Yamamoto, Teppei. 2014. Causal inference in conjoint analysis: Understanding multidimensional choices via stated preference experiments. Political Analysis 22(1):1–30.

Hans, Chris. 2009. Bayesian lasso regression. Biometrika 96(4):835–845.

Harding, Matthew, and Lamarche, Carlos. 2016. Penalized quantile regression with semiparametric correlated effects: An application with heterogeneous preferences. Journal of Applied Econometrics, doi:10.1002/jae.2520.

Hastie, Trevor, Tibshirani, Robert, and Friedman, Jerome. 2010. The elements of statistical learning: Data mining, inference, and prediction. New York: Springer.

Imai, Kosuke, and Strauss, Aaron. 2011. Estimation of heterogeneous treatment effects from randomized experiments, with application to the optimal planning of the get-out-the-vote campaign. Political Analysis 19(1):1–19.

Imai, Kosuke, and Ratkovic, Marc. 2013. Estimating treatment effect heterogeneity in randomized program evaluation. The Annals of Applied Statistics 7(1):443–470.

Jackman, Simon. 2009. Bayesian analysis for the social sciences. West Sussex, UK: Wiley.

Jaynes, E. T. 1982. On the rationale of maximum-entropy methods. Proceedings of the IEEE 70:939–952.

Kang, Jian, and Guo, Jian. 2009. Self-adaptive Lasso and its Bayesian estimation. Working Paper.

Kenkel, Brenton, and Signorino, Curtis. 2012. A method for flexible functional form estimation: Bootstrapped basis regression with variable selection. Working Paper.

Kyung, Minjung, Gill, Jeff, Ghosh, Malay, and Casella, George. 2010. Penalized regression, standard errors, and Bayesian lassos. Bayesian Analysis 5(2):369–412.

Leeb, Hannes, and Potscher, Benedikt. 2008. Sparse estimators and the oracle property, or the return of Hodges’ estimator. Journal of Econometrics 142:201–211.

Leeb, Hannes, Potscher, Benedikt, and Ewald, Karl. 2015. On various confidence intervals post-model-selection. Statistical Science 30(2):216–227.

Leng, Chenlei, Tran, Minh-Ngoc, and Nott, David. 2014. Bayesian adaptive LASSO. Annals of the Institute of Statistical Mathematics 66(2):221–244.

Lipkovich, I., Dmitrienko, A., Denne, J., and Enas, G. 2011. Subgroup identification based on differential effect search: A recursive partitioning method for establishing response to treatment in patient subpopulations. Statistics in Medicine 30:2601–2621.

Liu, H., and Yu, B. 2013. Asymptotic properties of Lasso+mLS and Lasso+Ridge in sparse high-dimensional linear regression. Electronic Journal of Statistics 7:3124–3169.

Lockhart, Richard, Taylor, Jonathan, Tibshirani, Ryan J., and Tibshirani, Robert. 2014. A significance test for the lasso. The Annals of Statistics 42(2):413–468.

Loh, Wei-Yin, He, Xu, and Man, Michael. 2015. A regression tree approach to identifying subgroups with differential treatment effects. Statistics in Medicine 34:1818–1833.

Minnier, Jessica, Tian, Lu, and Cai, Tianxi. 2011. A perturbation method for inference on regularized regression estimates. Journal of the American Statistical Association 106:1371–1382.

Mitchell, T. J., and Beauchamp, J. J. 1988. Bayesian variable selection in linear regression. Journal of the American Statistical Association 83(404):1023–1032.

O’Hara, R. B., and Sillanpää, M. J. 2009. A review of Bayesian variable selection methods: What, how and which. Bayesian Analysis 4(1):85–118.

Park, Trevor, and Casella, George. 2008. The Bayesian lasso. Journal of the American Statistical Association 103(482):681–686.

Polson, Nicholas, and Scott, James. 2012. Local shrinkage rules, Levy processes and regularized regression. Journal of the Royal Statistical Society, Series B 74(2):287–311.

Potscher, Benedikt, and Leeb, Hannes. 2009. On the distribution of penalized maximum likelihood estimators: The LASSO, SCAD, and thresholding. Journal of Multivariate Analysis 100(9):2065–2082.

Ratkovic, Marc, and Tingley, Dustin. 2016. Replication data for: Sparse estimation and uncertainty with application to subgroup analysis. doi:10.7910/DVN/RNMB1Q, Harvard Dataverse, September 6, 2016.

Stewart, Brandon M. Latent factor regressions for the social sciences. Working Paper.

Strezhnev, Anton, Hainmueller, Jens, Hopkins, Daniel, and Yamamoto, Teppei. 2014. cjoint: AMCE estimator for conjoint experiments. R package version 1.0.3.

Su, Xiaogang, Tsai, Chih-Ling, Wang, Hansheng, Nickerson, David M., and Li, Bogong. 2009. Subgroup analysis via recursive partitioning. Journal of Machine Learning Research 10:141–158.

Tibshirani, Robert. 1996. Regression shrinkage and selection via the lasso. Journal of the Royal Statistical Society, Series B 58:267–288.

Tierney, Luke. 1994. Markov chains for exploring posterior distributions. The Annals of Statistics 22(4):1701–1728.

Tingley, Dustin, and Tomz, Michael. 2013. Conditional cooperation and climate change. Comparative Political Studies, p. 0010414013509571.

Wager, S., and Athey, S. 2015. Estimation and inference of heterogeneous treatment effects using random forests. Working Paper.

West, M. 1987. On scale mixtures of normal distributions. Biometrika 74:646–648.

Zou, Hui. 2006. The adaptive lasso and its oracle properties. Journal of the American Statistical Association 101(476):1418–1429.
