An Automated Information Extraction Tool for International Conflict Data with Performance as Good as Human Coders: A Rare Events Evaluation Design

Published online by Cambridge University Press: 24 July 2003

Abstract

Despite widespread recognition that aggregated summary statistics on international conflict and cooperation miss most of the complex interactions among nations, the vast majority of scholars continue to employ annual, quarterly, or (occasionally) monthly observations. Daily events data, coded from some of the huge volume of news stories produced by journalists, have not been used much for the past two decades. We offer some reason to change this practice, which we feel should lead to considerably increased use of these data. We address advances in event categorization schemes and software programs that automatically produce data by “reading” news stories without human coders. We design a method that makes it feasible, for the first time, to evaluate these programs when they are applied in areas with the particular characteristics of international conflict and cooperation data, namely event categories with highly unequal prevalences, and where rare events (such as highly conflictual actions) are of special interest. We use this rare events design to evaluate one existing program, and find it to be as good as trained human coders, but obviously far less expensive to use. For large-scale data collections, the program dominates human coding. Our new evaluative method should be of use in international relations, as well as more generally in the field of computational linguistics, for evaluating other automated information extraction tools. We believe that the data created by programs similar to the one we evaluated should see dramatically increased use in international relations research. To facilitate this process, we are releasing with this article data on 3.7 million international events, covering the entire world for the past decade.

Type: Research Notes
Copyright © The IO Foundation 2003
