Matching with Text Data: An Experimental Evaluation of Methods for Matching Documents and of Measuring Match Quality

Reagan Mozer; Luke Miratrix; Aaron Russell Kaufman; L. Jason Anastasopoulos

doi:10.1017/pan.2020.1

Published online by Cambridge University Press: 17 March 2020

Reagan Mozer*: Bentley University, Department of Mathematical Sciences, Waltham, MA 02452-4713, USA. Email: [email protected]
Luke Miratrix: Harvard Graduate School of Education, Cambridge, MA 02138, USA. Email: [email protected]
Aaron Russell Kaufman: Division of Social Science, New York University Abu Dhabi, Saadiyat Island, Abu Dhabi, United Arab Emirates. Email: [email protected]
L. Jason Anastasopoulos: University of Georgia, Department of Public Administration and Policy and Political Science, Athens, GA 30601, USA. Email: [email protected]
*Email: [email protected]

Abstract

Matching for causal inference is a well-studied problem, but standard methods fail when the units to match are text documents: the high-dimensional and rich nature of the data renders exact matching infeasible, causes propensity scores to produce incomparable matches, and makes assessing match quality difficult. In this paper, we characterize a framework for matching text documents that decomposes existing methods into (1) the choice of text representation and (2) the choice of distance metric. We investigate how different choices within this framework affect both the quantity and quality of matches identified through a systematic multifactor evaluation experiment using human subjects. Altogether, we evaluate over 100 unique text-matching methods along with 5 comparison methods taken from the literature. Our experimental results identify methods that generate matches with higher subjective match quality than current state-of-the-art techniques. We enhance the precision of these results by developing a predictive model to estimate the match quality of pairs of text documents as a function of our various distance scores. This model, which we find successfully mimics human judgment, also allows for approximate and unsupervised evaluation of new procedures in our context. We then employ the identified best method to illustrate the utility of text matching in two applications. First, we engage with a substantive debate in the study of media bias by using text matching to control for topic selection when comparing news articles from thirteen news sources. We then show how conditioning on text data leads to more precise causal inferences in an observational study examining the effects of a medical intervention.
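To make the two-part framework concrete, the following is a minimal sketch, not the authors' implementation. It assumes the simplest possible choices: a bag-of-words term-count representation, cosine distance as the metric, and greedy nearest-neighbor matching without replacement inside a caliper. The function names, the caliper value, and the toy documents are all hypothetical and chosen only for illustration.

```python
import math
from collections import Counter


def term_frequencies(text):
    """Representation step (assumed): lowercase bag-of-words term counts."""
    return Counter(text.lower().split())


def cosine_distance(a, b):
    """Distance step (assumed): 1 minus cosine similarity of two count vectors."""
    dot = sum(a[t] * b[t] for t in a.keys() & b.keys())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 1.0  # treat an empty document as maximally distant
    return 1.0 - dot / (norm_a * norm_b)


def match_documents(treated, control, caliper=0.5):
    """Greedy nearest-neighbor matching without replacement.

    Each treated document is paired with its closest unused control
    document, provided the distance does not exceed the (hypothetical)
    caliper. Treated documents with no close control stay unmatched.
    """
    control_vecs = {name: term_frequencies(text) for name, text in control.items()}
    available = set(control_vecs)
    matches = {}
    for name, text in treated.items():
        vec = term_frequencies(text)
        # Ties are broken by control name so the result is deterministic.
        best = min(
            available,
            key=lambda c: (cosine_distance(vec, control_vecs[c]), c),
            default=None,
        )
        if best is not None and cosine_distance(vec, control_vecs[best]) <= caliper:
            matches[name] = best
            available.discard(best)
    return matches


# Toy corpora, invented for this example.
treated = {
    "t1": "senate passes budget bill",
    "t2": "storm hits coastal towns",
}
control = {
    "c1": "budget bill passes in senate vote",
    "c2": "local team wins championship game",
    "c3": "coastal towns brace as storm hits",
}
matches = match_documents(treated, control)
print(matches)
```

Swapping `term_frequencies` for another representation, or `cosine_distance` for another metric, changes the matching method without touching the matching loop, which is the kind of decomposition the abstract describes.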

Keywords

statistical analysis of texts; matching methods; observational studies

Type: Articles
Information: Political Analysis, Volume 28, Issue 4, October 2020, pp. 445–468

DOI: https://doi.org/10.1017/pan.2020.1
Copyright: Copyright © The Author(s) 2020. Published by Cambridge University Press on behalf of the Society for Political Methodology.

Footnotes

Contributing Editor: Jeff Gill
