Retrospective Causal Inference with Machine Learning Ensembles: An Application to Anti-recidivism Policies in Colombia

Cyrus Samii; Laura Paler; Sarah Zukerman Daly

doi:10.1093/pan/mpw019

Published online by Cambridge University Press: 04 January 2017

Cyrus Samii: Department of Politics, New York University, 19 West 14th Street, New York, NY 10012
Laura Paler: Department of Political Science, University of Pittsburgh, 4600 Wesley W. Posvar Hall, Pittsburgh, PA 15260
Sarah Zukerman Daly: Department of Political Science, University of Notre Dame, 217 O’Shaughnessy Hall, Notre Dame, IN 46556

Abstract

We present new methods to estimate causal effects retrospectively from micro data with the assistance of a machine learning ensemble. This approach overcomes two important limitations of conventional methods such as regression modeling or matching: (i) ambiguity about the pertinent retrospective counterfactuals and (ii) potential misspecification, overfitting, and otherwise bias-prone or inefficient use of a large identifying covariate set in the estimation of causal effects. Our method targets the analysis toward a well-defined “retrospective intervention effect” based on hypothetical population interventions and applies a machine learning ensemble that allows the data to guide us, in a controlled fashion, on how to use a large identifying covariate set. We illustrate with an analysis of policy options for reducing ex-combatant recidivism in Colombia.
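The two ingredients named in the abstract can be illustrated together. What follows is a minimal sketch under stated assumptions, not the authors' replication code. It assumes, for illustration, that the retrospective intervention effect compares the observed mean outcome with the mean that would have obtained had every unit received a hypothetical treatment value `a_star`, estimated by plug-in (g-computation) from an outcome model. The outcome model is a super-learner-style stacked ensemble: each candidate learner is fit on cross-validation training folds, and convex weights are chosen to minimize out-of-fold squared error. The candidate learners (`features_mean`, `features_linear`, `features_interact`) and all function names are hypothetical. The authors' actual candidate library, weighting rule, and any treatment-model (doubly robust) component may differ. Only NumPy is used.

```python
import numpy as np

# --- Hypothetical candidate learners: least-squares fits on different feature sets ---

def features_mean(A, X):
    # intercept only: predicts the training mean
    return np.ones((len(A), 1))

def features_linear(A, X):
    # intercept, treatment, and covariate main effects
    return np.column_stack([np.ones(len(A)), A, X])

def features_interact(A, X):
    # adds treatment-by-covariate interactions
    return np.column_stack([np.ones(len(A)), A, X, A[:, None] * X])

LEARNERS = [features_mean, features_linear, features_interact]

def fit_predict(feat, A_tr, X_tr, y_tr, A_te, X_te):
    """Fit least squares on the training rows and predict the test rows."""
    beta, *_ = np.linalg.lstsq(feat(A_tr, X_tr), y_tr, rcond=None)
    return feat(A_te, X_te) @ beta

def project_to_simplex(v):
    """Euclidean projection of v onto {w : w >= 0, sum(w) = 1}."""
    u = np.sort(v)[::-1]
    css = np.cumsum(u)
    k = np.arange(1, len(v) + 1)
    rho = np.nonzero(u - (css - 1) / k > 0)[0][-1]
    theta = (css[rho] - 1) / (rho + 1)
    return np.maximum(v - theta, 0.0)

def ensemble_weights(Z, y, steps=5000):
    """Convex weights minimizing mean squared error of Z @ w, via projected gradient."""
    n, m = Z.shape
    w = np.full(m, 1.0 / m)
    lr = 1.0 / (2 * np.linalg.eigvalsh(Z.T @ Z / n).max())  # 1 / Lipschitz constant
    for _ in range(steps):
        grad = 2 * Z.T @ (Z @ w - y) / n
        w = project_to_simplex(w - lr * grad)
    return w

def super_learner_predict(A, X, y, A_new, X_new, n_folds=5, seed=0):
    """Stack the learners with weights chosen on out-of-fold predictions."""
    rng = np.random.default_rng(seed)
    folds = rng.permutation(len(y)) % n_folds
    Z = np.zeros((len(y), len(LEARNERS)))  # column j: out-of-fold predictions of learner j
    for k in range(n_folds):
        tr, te = folds != k, folds == k
        for j, feat in enumerate(LEARNERS):
            Z[te, j] = fit_predict(feat, A[tr], X[tr], y[tr], A[te], X[te])
    w = ensemble_weights(Z, y)
    # refit every learner on the full sample, then combine with the CV weights
    preds = np.column_stack([fit_predict(f, A, X, y, A_new, X_new) for f in LEARNERS])
    return preds @ w, w

def retrospective_intervention_effect(A, X, y, a_star):
    """Plug-in estimate of mean(Y) - E[Y(a_star)]: observed mean outcome minus the
    predicted mean had every unit been assigned treatment value a_star."""
    A_cf = np.full_like(A, a_star, dtype=float)
    mu_cf, w = super_learner_predict(A, X, y, A_cf, X)
    return y.mean() - mu_cf.mean(), w

# Usage on simulated data with a confounded binary treatment (effect = 2).
rng = np.random.default_rng(1)
n = 2000
X = rng.normal(size=(n, 2))
A = (X[:, 0] + rng.normal(size=n) > 0).astype(float)
y = 1.0 + 2.0 * A + X[:, 0] + 0.5 * X[:, 1] + rng.normal(size=n)
rie, w = retrospective_intervention_effect(A, X, y, a_star=0.0)
print("RIE estimate:", round(rie, 3), "ensemble weights:", np.round(w, 3))
```

In the simulation, treated units gain 2 on the outcome, so the target mean(Y) - E[Y(0)] is about twice the treated share. The intercept-only learner, which ignores treatment, should receive little weight because it predicts poorly out of fold. Restricting the weights to a convex combination keeps the ensemble within the range of its members. For comparison, the default in the SuperLearner R package (`method.NNLS`) fits non-negative least squares and then rescales the coefficients to sum to one.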

Type: Articles
Information: Political Analysis, Volume 24, Issue 4, Autumn 2016, pp. 434–456

DOI: https://doi.org/10.1093/pan/mpw019
Copyright: © The Author 2016. Published by Oxford University Press on behalf of the Society for Political Methodology.

Footnotes

Authors’ note: Authors are listed in reverse alphabetical order and are equal contributors to the project. All replication materials are available at the Political Analysis Dataverse (article url: http://dx.doi.org/10.7910/DVN/QXCFO2). We thank Carolina Serrano for excellent research assistance in Colombia and the team at Fundación Ideas para la Paz for their collaboration in the data collection. We also thank the Organization of American States, Misión de Apoyo al Proceso de Paz and the Agencia Colombiana para la Reintegración for their collaboration. Daly acknowledges funding from the Swedish Foreign Ministry, the Smith Richardson Foundation, and the Carroll L. Wilson Award. For helpful discussions, the authors thank Michael Alvarez, two anonymous Political Analysis reviewers, Deniz Aksoy, Peter Aronow, Neal Beck, Matthew Blackwell, Drew Dimmery, Ryan Jablonski, Michael Peress, Fredrik Savje, Maya Sen, Teppei Yamamoto, Rodrigo Zarazaga, and seminar participants at the American Political Science Association annual meetings, European Political Science Association annual meetings, Empirical Studies of Conflict working group, Massachusetts Institute of Technology, Midwest Political Science Association annual meetings, New York University, and the University of Rochester. Supplementary materials for this article are available on the Political Analysis website.
