
Computer-Assisted Text Analysis for Comparative Politics

Published online by Cambridge University Press: 04 January 2017

Christopher Lucas
Affiliation:
Department of Government and Institute for Quantitative Social Science, Harvard University, 1737 Cambridge St., Cambridge, MA 02138, USA
Richard A. Nielsen
Affiliation:
Department of Political Science, Massachusetts Institute of Technology, 77 Massachusetts Avenue, Cambridge, MA 02139, USA
Margaret E. Roberts
Affiliation:
Department of Political Science, University of California, San Diego, 9500 Gilman Drive, #0521, La Jolla, CA 92093, USA
Brandon M. Stewart
Affiliation:
Department of Government and Institute for Quantitative Social Science, Harvard University, 1737 Cambridge St., Cambridge, MA 02138, USA
Alex Storer
Affiliation:
Graduate School of Business, Stanford University, 655 Knight Way, Stanford, CA 94305, USA
Dustin Tingley*
Affiliation:
Department of Government and Institute for Quantitative Social Science, Harvard University, 1737 Cambridge St., Cambridge, MA 02138, USA
* Corresponding author

Abstract


Recent advances in research tools for the systematic analysis of textual data are enabling exciting new research throughout the social sciences. For comparative politics scholars, who are often interested in non-English and possibly multilingual textual datasets, these advances may be difficult to access. This article discusses practical issues that arise in the processing, management, translation, and analysis of textual data, with a particular focus on how procedures differ across languages. These procedures are combined in two applied examples of automated text analysis using the recently introduced Structural Topic Model. We also show how the model can be used to analyze data that have been translated into a single language via machine translation tools. All the methods we describe here are implemented in open-source software packages available from the authors.

Type
Articles
Copyright
Copyright © The Author 2015. Published by Oxford University Press on behalf of the Society for Political Methodology 

Footnotes

Authors' note: Our thanks to Sam Brotherton and Jetson Leder-Luis for research assistance and Amy Catalinac for discussion about text analyses in comparative politics. We also thank Christopher Blattman, Dan Corstange, Macartan Humphreys, Amaney Jamal, Gary King, Helen Milner, Tamar Mitts, Brendan O’Connor, Arthur Spirling, and the Columbia University Comparative Politics Workshop for comments. Our software discussed in this article is open source and available.
