
Using Word Order in Political Text Classification with Long Short-Term Memory Models

Published online by Cambridge University Press: 23 December 2019

Charles Chang
Affiliation:
Postdoctoral Associate, The Council on East Asian Studies, Yale University, New Haven, CT 06511, USA; Postdoctoral Associate, Center on Religion and Chinese Society, Purdue University, West Lafayette, IN 47907, USA. Email: [email protected]
Michael Masterson
Affiliation:
PhD Candidate in Political Science, University of Wisconsin–Madison, Madison, WI 53706, USA. Email: [email protected]

Abstract

Political scientists often wish to classify documents based on their content to measure variables, such as the ideology of political speeches or whether documents describe a Militarized Interstate Dispute. Simple classifiers often serve well in these tasks. However, if words occurring early in a document alter the meaning of words occurring later in the document, using a more complicated model that can incorporate these time-dependent relationships can increase classification accuracy. Long short-term memory (LSTM) models are a type of neural network model designed to work with data that contains time dependencies. We investigate the conditions under which these models are useful for political science text classification tasks with applications to Chinese social media posts as well as US newspaper articles. We also provide guidance for the use of LSTM models.
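The architecture the abstract describes, in which an embedding layer maps word indices to dense vectors and an LSTM reads those vectors in document order before a classification layer produces a label, can be sketched in a few lines of Python. The sketch below uses the Keras API; it is a minimal illustration rather than the authors' replication code, and the vocabulary size, sequence length, and layer widths are placeholder values.

    import numpy as np
    from tensorflow.keras.models import Sequential
    from tensorflow.keras.layers import Embedding, LSTM, Dense

    VOCAB_SIZE = 20000  # placeholder: number of distinct word indices
    MAX_LEN = 200       # placeholder: documents padded/truncated to this length

    # Each document is a sequence of integer word indices. The embedding layer
    # maps indices to dense vectors, and the LSTM reads those vectors in
    # document order, so words occurring early in a document can condition how
    # later words are interpreted (the word-order information the abstract
    # refers to).
    model = Sequential([
        Embedding(input_dim=VOCAB_SIZE, output_dim=128),
        LSTM(64),
        Dense(1, activation="sigmoid"),  # binary label, e.g. whether a
                                         # document describes a dispute
    ])
    model.compile(optimizer="adam", loss="binary_crossentropy",
                  metrics=["accuracy"])

    # Stand-in data with the right shapes; real inputs would be tokenized,
    # integer-encoded, padded documents paired with hand-coded labels.
    x = np.random.randint(1, VOCAB_SIZE, size=(32, MAX_LEN))
    y = np.random.randint(0, 2, size=(32,))
    model.fit(x, y, epochs=1, batch_size=16)

In practice one would hold out a validation set and use early stopping to decide when to halt training, since recurrent models of this kind can overfit small labeled corpora.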

Type
Articles
Copyright
Copyright © The Author(s) 2019. Published by Cambridge University Press on behalf of the Society for Political Methodology.

Footnotes

Contributing Editor: Daniel Hopkins

Supplementary material

Chang and Masterson supplementary material (File, 1.2 MB)