
Introducing ICBe: an event extraction dataset from narratives about international crises

Published online by Cambridge University Press:  24 May 2024

Rex W. Douglass*
Affiliation:
Department of Political Science, University of California, San Diego, CA, USA
Thomas Leo Scherer
Affiliation:
Department of Political Science, University of California, San Diego, CA, USA
J. Andrés Gannon
Affiliation:
Department of Political Science, Vanderbilt University, Nashville, TN, USA
Erik Gartzke
Affiliation:
Department of Political Science, University of California, San Diego, CA, USA
Jon Lindsay
Affiliation:
School of Cybersecurity and Privacy, Georgia Institute of Technology, Atlanta, GA, USA
Shannon Carcelli
Affiliation:
Department of Government and Politics, University of Maryland, College Park, MD, USA
Jonathan Wilkenfeld
Affiliation:
Department of Government and Politics, University of Maryland, College Park, MD, USA
David M. Quinn
Affiliation:
Department of Government and Politics, University of Maryland, College Park, MD, USA
Catherine Aiken
Affiliation:
Center for Security and Emerging Technology, Georgetown University, Washington, DC, USA
Jose Miguel Cabezas Navarro
Affiliation:
Health and Society Research Center, Universidad Mayor, Santiago, Chile
Neil Lund
Affiliation:
Department of Government and Politics, University of Maryland, College Park, MD, USA
Egle Murauskaite
Affiliation:
Department of Government and Politics, University of Maryland, College Park, MD, USA
Diana Partridge
Affiliation:
Department of Government and Politics, University of Maryland, College Park, MD, USA
*Corresponding author: Rex W. Douglass; Email: [email protected]

Abstract

How do international crises unfold? We conceptualize international relations as a strategic chess game between adversaries and develop a systematic way to measure pieces, moves, and gambits accurately and consistently over a hundred years of history. We introduce a new ontology and dataset of international events called ICBe based on a very high-quality corpus of narratives from the International Crisis Behavior (ICB) Project. We demonstrate that ICBe has higher coverage, recall, and precision than existing state of the art datasets and conduct two detailed case studies of the Cuban Missile Crisis (1962) and the Crimea-Donbas Crisis (2014). We further introduce two new event visualizations (event iconography and crisis maps), an automated benchmark for measuring event recall using natural language processing (synthetic narratives), and an ontology reconstruction task for objectively measuring event precision. We make the data, supplementary appendix, replication material, and visualizations of every historical episode available at a companion website crisisevents.org.

Type
Original Article
Creative Commons
This is an Open Access article, distributed under the terms of the Creative Commons Attribution-NonCommercial-ShareAlike licence (http://creativecommons.org/licenses/by-nc-sa/4.0), which permits non-commercial re-use, distribution, and reproduction in any medium, provided the same Creative Commons licence is used to distribute the re-used or adapted article and the original article is properly cited. The written permission of Cambridge University Press must be obtained prior to any commercial use.
Copyright
Copyright © The Author(s), 2024. Published by Cambridge University Press on behalf of EPS Academic Ltd

If we could record every international interaction in the realms of diplomacy, conflict, economics, and beyond, how much unique information would this chronicle amount to, and how surprised would we be to see something new? In other words, what is the entropy of international relations? While this record could in principle be unbounded, the central conceit of social science is that there are structural regularities that limit what actors can do, their best options, and even which actors are likely to survive (Brecher, 1999; Reiter, 2015). If so, then these events can be recorded and systematically measured by social scientists interested in these regularities.Footnote 1 A large and growing measurement literature seeks to do just that, using human coding and improving natural language processing techniques to capture unstructured streams of events from text such as international news reports.Footnote 2

We advance existing efforts to identify and structure regularized events and actors in international politics by combining human coding with natural language processing to create (1) a large, flexible ontology of international affairs and (2) a fine-grained and structured event dataset of international crises from 1918 to 2017, which we developed by applying our ontology to an unusually high-quality corpus of historical narratives of international crises (Brecher, 1999; Wilkenfeld and Brecher, 2000; Brecher et al., 2016). We then develop several methods for objectively gauging how well these event codings reconstruct the information contained in the original crisis narrative. We conclude by benchmarking our event codings against several current state-of-the-art event data collection efforts. The underlying fine-grained variation in international affairs is unrecognizable through the lens of current quantification efforts. We find that existing models produce data on historical episodes that do not contain enough information to reconstruct the underlying event. In focusing this initial effort on international crises as a proof-of-concept sample, we demonstrate our ontology and method's potential to improve upon existing empirical identifications of patterns of international interactions.

Over the next five sections, this measurement paper makes the following arguments. First, there is a real-world unobserved latent concept known as international relations that can and should be systematically measured. Second, we propose a method for systematic large-scale measurement of the actors and behaviors in international affairs and as a proof of concept apply that method to a well-regarded and salient sample of events known as international crises. Third, in doing so, we confirm that those measurements exhibit several desirable kinds of internal and external validity and out-perform existing approaches. Fourth, this validation can be evaluated in detail via new event visualizations, with examples provided for case studies of the 1962 Cuban Missile Crisis and 2014 Crimea-Donbas Crisis. A final section concludes.

1. Identifying and measuring international relations

1.1 Motivation

Our knowledge of any historical episode, including the participants and their preferences, behaviors, and beliefs, is only indirectly observed from historical records that most often take the form of unstructured natural language text. Despite their complexity, all international interactions fundamentally involve a finite set of actors expressing their interests through at least theoretically observable behaviors. So how can we abstract and measure discrete events that make up a historical episode in international relations? The easiest way to convey the desired product is with an example. Figure 1 shows a narrative account of the Cuban Missile Crisis (1962) in natural language sentences alongside a mapping to discrete machine-readable abstractive events. From this, scholars can identify similarities and differences across events like what foreign policy actions deter versus inflame (Jervis, 1978; Glaser, 2000), when third parties mediate (Haffar, 2002; Quinn et al., 2006), and how actors communicate resolve (Trager, 2016; Lupton, 2018). Identifying patterns of international interactions is not just an inherently interesting enterprise; it is a necessary precondition to important efforts to predict where policymakers should turn their attention to improve global welfare (Ward et al., 2013; Beger et al., 2021).

Figure 1. Comparison of a natural language and machine-readable abstractive account of the Cuban Missile Crisis (1962). The text on the left is a summary of the event from the ICB Crisis Narrative. The mapping on the right shows the corresponding ICBe coding.

1.2 Existing state of the art measurements

We begin by drawing informative prior beliefs about the underlying process of international relations that we expect to govern behavior during historical episodes and their later transcription into the historical record. We organize our prior beliefs along two overarching axes: (1) existing efforts to identify the actors/actions of international relations; and (2) the types of behaviors and information we hope to recover. Table 1 describes these two axes as columns and rows, respectively.

Table 1. Ontological coverage of ICBe versus the existing state of the art

The rows in Table 1 represent the types of information we expect to find in international relations and form the basis for our proposed ontology. We began the ontology by first doing a full natural language processing pass of the corpus and identifying all of the named entities and verbs mentioned in the text. To identify possible behaviors, we matched verbs to the most likely definition found in WordNet (Miller, 1995), tallied them (SI Appendix 1.2), and then aggregated them into a smaller number of behaviors balancing conceptual detail with manageable sparsity for human coding (informed by existing conceptual literature and measurement research). We used the International Crisis Behavior (ICB) project actor level data to identify likely actors for each crisis and location options relative to each actor. For behavior, actor, and location, coders could write in a value if the given options were insufficient. The codebook lists eleven behaviors added post-coding as coders flagged events that were not captured by the initial ontology (e.g., propaganda).

As we are not the first to attempt to measure international relations in a structured manner, the columns of Table 1 compare the ontological coverage of ICBe to existing state of the art systems in production and with global coverage. We choose these datasets and models as they represent frequently used and reputable efforts to structure and describe historical events of interest to scholars of international politics. The first column starts with our contribution, ICBe, alongside other event-level datasets including CAMEO dictionary lookup-based systems (Historical Phoenix (Althaus et al., 2019); ICEWS (Boschee et al., 2015); Terrier (Grant et al., 2017)), the Militarized Interstate Disputes Incidents dataset, the UCDP-GED dataset (Sundberg and Melander, 2013; Davies et al., 2022), and ACLED (Raleigh et al., 2010).Footnote 3 The final set of columns compares episode-level datasets beginning with the original ICB project (Brecher et al., 2016; Brecher and Wilkenfeld, 1982; Beardsley et al., 2020), the Militarized Interstate Disputes dataset (Gibler, 2018; Palmer et al., 2022), and the Correlates of War (Sarkees and Wayman, 2010). We include episode-level datasets as they remain a common and trusted tool for analyzing international relations, and because ICBe is unique among event-level datasets as events are matched to crises and can be aggregated to the episode level.
There is imperfect overlap between their intended depth and scope of coverage; “international crises” are similar, but not identical to, “interstate wars” and “militarized interstate disputes,” which differ yet again from “individual events of organized violence” and “non-violent action.” Even like-concepts require care in comparison, as an “aim” in ICBe is the same as in MIPS, but an “alert” in ICBe is not the same as an “alert” in MIDs.

This comparison is not intended to fault existing data and models for not including every variable in ICBe's ontology, as some of these variables fall outside the scope of a particular dataset's intended purpose. Rather, it serves as an initial basis for identifying the heterogeneity in existing efforts to abstract and measure discrete historical events of interest and to provide theoretical justifications from existing research about what is included in our dataset's ontology and where ICBe's detail about historical events can be compared to the current state of the art.

With the exception of large-scale CAMEO dictionary-based systems (the first grouping of columns), our ontology improves upon the existing state of the art quantitative datasets that ignore important information about international interactions.Footnote 4 We highlight two particular innovations. First, we separate the “chess pieces” from the “chess players” in distinguishing between different actors within a state. By virtue of our ontology, coding military versus civilian actors and national leaders versus bureaucrats, our data can be used to explore important questions concerning civilian-military relations (Narang and Talmadge, 2018), Track Two diplomacy, the role of sub-national actors (Hsu et al., 2020), and the evolution of which actors are engaged in crises, a topic of increasing interest as states engage in gray zone conflict by employing the coast guard or paramilitary mercenaries instead of internationally recognized state militaries (Gannon, 2022). Second, we add information about the domains in which actors behave (whether in land, air, sea, space, or cyber), since they differ in their technology, tactics, geography, and purpose (Gartzke and Lindsay, 2019). Doing so allows researchers to identify and explain patterns in escalation conditional on the military means states use in conflict. Recent concerns about cross-domain conflict, and the effect of new domains of conflict like space and cyber, have made this an endeavor of increased interest to practitioners (Gannon, 2022).

2. Methodology and data

2.1 Corpus

For our corpus, we select a set of unusually high-quality historical narratives from the ICB project (n = 471) with coverage spanning 1918–2017 (SI Appendix 1.1) (Brecher and Wilkenfeld, 1997; Brecher et al., 2016). ICB defines a crisis as meeting three conditions: (1) an actor perceives a threat to one or more of its core values, (2) the actor has a finite time horizon for responding to the perceived threat, and (3) the probability of military hostility has increased (Brecher and Wilkenfeld, 1982). Crises are a significant focus of detailed single case studies and case comparisons because they provide an opportunity to examine behaviors in international relations short of, or at least prior to, full conflict (Holsti, 1965; Paige, 1968; Allison and Zelikow, 1971; Brecher and Wilkenfeld, 1982; Gavin, 2014; Iakhnis and James, 2019). The corpus is also unique in that it was designed to be used in a downstream quantitative coding project, meaning each narrative was written by a small number of scholars using a uniform coding scheme where things like word choice, writing style, and level of specificity were done deliberately and consistently (Hewitt, 2001). Case selection was exhaustive based on a survey of world news archives and region experts, cross-checked against other databases of war and conflict, and non-English sources (Brecher et al., 2016; Kang and Lin, 2019, 59).

2.2 Coding process

The ICBe ontology follows a hierarchical design philosophy where a smaller number of significant decisions are made early on and then progressively refined into more specific details (Brust and Denzler, 2020).Footnote 5 Each coder was instructed to first thoroughly read the full crisis narrative and was then presented with a custom graphical user interface (GUI) (SI Appendix 2.1). Coders then proceeded sentence by sentence, choosing the number of events (0–3) that occurred, the highest behavior (thought, speech, or action), a set of players, whether the means were primarily armed or unarmed, whether there was an increase or decrease in aggression (uncooperative/escalating or cooperative/de-escalating), and finally one or more specific and non-mutually exclusive activities. Some additional details were always collected (e.g., location and timing) while other details were only collected if appropriate (e.g., force size, fatalities, domains, units). While each event was matched to a sentence, coders could fill in details outside that sentence (e.g., antecedents to pronouns). We reviewed, standardized, and normalized where coders listed a behavior, actor, or location outside the ontology.Footnote 6

A unique feature of the ontology is that thought, speech, and do behaviors can be nested into combinations, e.g. an offer for the U.S.S.R. to remove missiles from Cuba in exchange for the U.S. removing missiles from Turkey. Through compounding, the ontology can capture what players were said to have known, learned, or said about other specific fully described actions.
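As a rough illustration of this nesting, the sketch below represents an event that can refer to other fully described events. The class and field names are hypothetical and are not taken from the ICBe codebook; the example encodes the missile-swap offer described above, leaving the speaker unspecified because the text does not name one.

```python
from dataclasses import dataclass, field
from typing import Iterator, List


@dataclass
class Event:
    """One coded event. Field names are illustrative, not ICBe's schema."""
    kind: str                      # "thought", "speech", or "action"
    behavior: str                  # specific activity, e.g. "offer"
    actors: List[str]
    about: List["Event"] = field(default_factory=list)  # nested events


def depth(event: Event) -> int:
    """Number of nesting levels, counting the event itself."""
    return 1 + max((depth(e) for e in event.about), default=0)


def flatten(event: Event) -> Iterator[Event]:
    """Yield the event and every nested event, depth-first."""
    yield event
    for inner in event.about:
        yield from flatten(inner)


remove_cuba = Event("action", "remove missiles from Cuba", ["USSR"])
remove_turkey = Event("action", "remove missiles from Turkey", ["USA"])
# Speaker left empty: the example in the text does not say who made the offer.
offer = Event("speech", "offer", [], about=[remove_cuba, remove_turkey])

print(depth(offer))
print([e.behavior for e in flatten(offer)])
```

A recursive structure of this kind lets a speech or thought event carry its content as ordinary events rather than as free text.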

No existing event data distinguishes thoughts, speeches, and actions. In fact, most only try to code actions and entirely omit thoughts and speech acts despite recognition of their importance in international politics (Smith, 1998). Scholars have opted against coding thoughts and speech acts because of a lack of confidence that the full universe could be readily observed and consequently, at least theoretically, be included.Footnote 7 But the perfect should not be the enemy of the good, and measurement challenges are only overcome after an initial attempt to estimate difficult-to-observe concepts of interest. The ICB narratives are one of the better sources for this endeavor due to the consistent use of high-quality primary source material that takes advantage of qualitative methods well-suited to identifying thoughts and speech acts like archival work and expert interviews.

Each crisis was typically assigned to two expert coders and two novice coders with an additional tie-breaking expert coder assigned to sentences with high disagreement.Footnote 8 For the purposes of measuring intercoder agreement and consensus, we temporarily disaggregate the unit of analysis to the Coder-Crisis-Sentence-Tag (n = 993,731), where a tag is any unique piece of information a coder can associate with a sentence such as an actor, date, behavior, etc. We then aggregate those tags into final events (n = 18,783), using a consensus procedure (SI Appendix 2.2) that requires a tag to have been chosen by at least one expert coder and either a majority of expert or novice coders. This screens noisy tags that no expert considered possible but leverages novice knowledge to tie-break between equally plausible tags chosen by experts. Requiring sentence-tag matching may underestimate agreement but minimizes the inclusion of noise and allows for additional validation. Once filtered for agreement, we find 472 actors and 119 different behaviors: 12 thought, 13 speech, and 94 actions.
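The consensus rule described above can be sketched as a small predicate. This is a minimal illustration only: the function name is hypothetical, and reading "majority" as a strict majority within each coder group is an assumption; SI Appendix 2.2 defines the actual procedure.

```python
from typing import Sequence


def tag_passes_consensus(expert_votes: Sequence[bool],
                         novice_votes: Sequence[bool]) -> bool:
    """Decide whether one tag on one sentence survives screening.

    Each sequence holds one boolean per coder assigned to the sentence,
    True if that coder applied the tag. "Majority" is read here as a
    strict majority (an assumption, not the paper's stated definition).
    """
    def majority(votes: Sequence[bool]) -> bool:
        return len(votes) > 0 and sum(votes) > len(votes) / 2

    at_least_one_expert = any(expert_votes)
    return at_least_one_expert and (majority(expert_votes) or majority(novice_votes))


# Experts split, both novices agree: novices break the tie.
print(tag_passes_consensus([True, False], [True, True]))
# No expert chose the tag: screened out even if all novices chose it.
print(tag_passes_consensus([False, False], [True, True]))
```

Under this reading, novice votes can only promote a tag that at least one expert already considered plausible.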

3. Performance comparison

3.1 Internal consistency

We evaluate the internal validity of the coding process in several ways. For every tag applied we calculate the observed intercoder agreement as the percent of other coders who also applied that same tag (SI Appendix 2.3). Across all concepts, the Top 1 Tag Agreement was low among novices (31 percent), moderate for experts (65 percent), and high (73 percent) following the consensus screening procedure.
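The per-tag agreement measure defined above (the share of other coders who applied the same tag) can be sketched as follows. The function name, coder labels, and tags are invented for illustration; the exact computation is given in SI Appendix 2.3.

```python
from typing import Dict, Set, Tuple


def tag_agreement(tags_by_coder: Dict[str, Set[str]]) -> Dict[Tuple[str, str], float]:
    """For one sentence, map (coder, tag) to the share of OTHER coders
    who applied that same tag."""
    result = {}
    for coder, tags in tags_by_coder.items():
        others = [t for c, t in tags_by_coder.items() if c != coder]
        if not others:
            continue  # agreement is undefined with a single coder
        for tag in tags:
            result[(coder, tag)] = sum(tag in o for o in others) / len(others)
    return result


# Toy sentence coded by three hypothetical coders.
sentence = {
    "coder_a": {"threat"},
    "coder_b": {"threat", "blockade"},
    "coder_c": {"promise"},
}
agreement = tag_agreement(sentence)
print(agreement[("coder_a", "threat")])    # one of two other coders agrees
print(agreement[("coder_b", "blockade")])  # no other coder agrees
```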

We attribute the remaining disagreement primarily to three sources. First, we required coders to rate and justify their confidence in the coding. They reported low confidence for 20 percent of sentences; 45 percent of those were due to a mismatch between the ontology and the text (“survey doesn't fit event”) and 46 percent were from a lack of information or confused writing in the source text (40 percent “more knowledge needed,” 6 percent “confusing sentence”). Observed disagreement varied predictably with self-reported confidence (SI Appendix 2.4). Second, as intended, agreement is higher (75–80 percent) for questions with fewer options near the root of the ontology compared to agreement for questions near the leaves of the ontology (50–60 percent). Third, individual coders exhibit nontrivial coding styles, e.g. some more expressive coders applied many tags per concept while others focused on only the single best match. We further observed unintended synonymity, e.g. the same information can be framed as either a threat to do something or a promise not to do something.

3.2 Improvement over existing efforts

To evaluate our coding process relative to existing datasets, we measure the recall and precision of ICBe events in absolute terms and relative to other existing systems. Recall measures the share of desired information recovered by a sequence of coded events while precision measures the degree to which a sequence of events correctly and usefully describes the information in history. To aid in subjective evaluation of the precision and recall of ICBe for each event, we provide full ICB narratives, ICBe coding in an easy-to-read iconographic form, and a wide range of visualizations for every case on the companion website.

Recall for historical episodes is poorly defined for two reasons. First, history may or may not be written by the victors, but by virtue of being written by someone, there is no genuine ground truth about what occurred, only surviving texts about it (Turberville, 1933). Second, there is no a priori guide to what information is necessary detail and what is ignorable trivia. History suffers from what is known as the Coastline Paradox (Mandelbrot, 1983): it has a fractal dimension greater than one such that the more you zoom in, the more detail you will find about individual events as well as in between any two discrete events. The ICBe ontology is a proposal about what information is important, but we need an independent benchmark to evaluate whether that proposal is a good one and that allows for comparing proposals from event projects that had different goals. We need a yardstick for history.

Our strategy for dealing with both problems is a plausibly objective yardstick called a synthetic historical narrative. We collect a large diverse corpus of narratives spanning timelines, encyclopedia entries, journal articles, news reports, websites, and government documents. Using natural language processing (fully described in SI Appendix 3.1), we identify details that appear across multiple accounts. A detail refers to the smallest textual unit for which we can calculate similarity across corpora to identify whether sentences semantically refer to the same broader observed event (Narayan et al., 2018). The more accounts that mention a detail, the more central it is to understanding the true historical episode. The theoretical motivation is that authors face word limits which force them to pick and choose which details to include, and they choose details that serve the specific context of the document they are producing. With a sufficiently large and diverse corpus of documents, we can vary the context while holding the overall episode constant and see which details tend to be invariant to context. Sufficiently similar details were binned together and then summarized so they could be compared to the coding in ICBe. This presents a harder evaluation baseline than comparing ICBe's recall to just that of ICB since there are non-crisis aspects of these events that may be included in other narratives but are out of the scope of our data. For example, the nationalization of businesses in Cuba may be included as important context in the Cuban Missile Crisis in documents that do not focus on the crisis dimensions like ICB. Using this hard case, a recall measure of ICBe on the synthetic narratives thus serves as a way to evaluate the breadth of ICBe's ontology and potential application to non-crisis international events.
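The binning step can be illustrated with a toy sketch. The pipeline described in SI Appendix 3.1 measures semantic similarity with natural language processing; the version below swaps in plain token-overlap (Jaccard) similarity and a greedy single-pass binning purely to stay self-contained. The threshold, function names, and example sentences are all invented for illustration.

```python
from typing import Dict, List, Set, Tuple


def tokens(text: str) -> Set[str]:
    return set(text.lower().split())


def jaccard(a: Set[str], b: Set[str]) -> float:
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def bin_details(details: List[Tuple[str, str]], threshold: float = 0.5) -> List[Dict]:
    """Greedy binning: each (doc_id, sentence) joins the first bin whose
    seed sentence is similar enough, otherwise it starts a new bin.
    Bins are returned sorted by the number of distinct documents."""
    bins: List[Dict] = []
    for doc_id, sentence in details:
        toks = tokens(sentence)
        for b in bins:
            if jaccard(toks, b["seed"]) >= threshold:
                b["docs"].add(doc_id)
                break
        else:
            bins.append({"seed": toks, "text": sentence, "docs": {doc_id}})
    return sorted(bins, key=lambda b: len(b["docs"]), reverse=True)


# Invented toy sentences standing in for details from three documents.
details = [
    ("doc1", "US announces naval blockade of Cuba"),
    ("doc2", "US announces a naval blockade of Cuba"),
    ("doc3", "US announces naval quarantine of Cuba"),
    ("doc2", "Cuba nationalizes businesses"),
]
bins = bin_details(details)
for b in bins:
    print(len(b["docs"]), b["text"])
```

Counting distinct documents per bin, rather than raw sentence matches, keeps a single verbose source from inflating how central a detail appears.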

We find substantive variation in recall across existing state of the art methods. Mentions of a detail across accounts are exponentially distributed with context-invariant details appearing dozens to hundreds of times more than context-dependent details.Footnote 9 Furthermore, crisis start and stop dates are arbitrary, and the historical record points to many precursor events as necessary detail for understanding later events. Figure 2 compares ICBe's recall with that of existing datasets for the two case studies detailed in Section 4. ICBe strictly dominates all of the systems but ICEWS in recall, though we note that the small sample sizes mean these systems should be considered statistically indistinguishable. Across all existing datasets and ICBe, recall increases with the number of document mentions, which is an important sign of validity for both them and our benchmark. The one outlier is Phoenix, which in the Cuban Missile Crisis case is so noisy that its recall curve is flat to decreasing as mentions increase. The two episode-level datasets (MIDs and ICM) have low coverage of contextual details. The two other dictionary systems, ICEWS and Terrier, have higher coverage, with ICEWS outperforming Terrier. Importantly, our corpus of ICB narratives has high recall of frequently mentioned details, giving us confidence in how those summaries were constructed, and ICBe lags only slightly behind, showing that it left little additional information on the table.Footnote 10

Figure 2. Recall comparison of two cases across existing state of the art efforts. Higher y-axis values represent higher recall and higher x-axis values represent number of times that detail is mentioned across the full corpus used to construct the synthetic narrative.
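Recall as a function of how often a detail is mentioned can be sketched as below. This is an illustrative reading only: computing recall over all details mentioned in at least t documents (a cumulative threshold) is an assumption about how such a curve could be built, and the numbers are invented rather than taken from Figure 2.

```python
from typing import Dict, List, Optional, Tuple


def recall_curve(details: List[Tuple[int, bool]],
                 thresholds: List[int]) -> Dict[int, Optional[float]]:
    """details: (number of documents mentioning the detail,
                 whether the dataset recovered it).
    For each threshold t, return recall among details mentioned in at
    least t documents, or None if no detail reaches t."""
    curve: Dict[int, Optional[float]] = {}
    for t in thresholds:
        subset = [recovered for mentions, recovered in details if mentions >= t]
        curve[t] = sum(subset) / len(subset) if subset else None
    return curve


# Invented toy data: rarely mentioned details are missed, common ones found.
toy = [(5, False), (8, False), (20, True), (50, True), (200, True)]
curve = recall_curve(toy, [5, 20, 100, 1000])
print(curve)
```

A dataset whose recall rises with the threshold is recovering the context-invariant details first, which is the validity pattern the text describes.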

The second component of event measurement validation is precision. It does little good to recall a historical event but too vaguely (e.g., MIDs describes the Cuban Missile Crisis as a blockade, a show of force, and a stalemate) or with too much error to be useful for downstream applications (e.g., ICEWS records 263 “Detonate Nuclear Weapons” events between 1995 and 2019). ICBe's ontology and coding system are designed to strike a balance so that the most important information is recovered accurately but also abstracted to a level that is still useful and interpretable.

We demonstrate ICBe's precision in a number of different ways. First, we develop the iconography system for presenting event codings as coherent statements that can be compared side by side to the original source narrative for every case on the companion website. We further provide a stratified sample of event codings alongside their source text (SI Appendix 4.2). We find both the visualizations of macrostructure and head-to-head comparisons of ICBe codings to the raw text to strongly support the quality of ICBe. Second, we develop a visualization we call a crisis map, a directed graph intersected with a timeline. A researcher should be able to lay out the events of a crisis on a timeline and read off the macrostructure of an episode from each individual move. A crisis map using ICBe for the Cuban Missile Crisis case study is provided in Figure 5, crisis maps for the two case studies using existing event datasets can be found in SI Appendix 4.3 and 4.4, and crisis maps for all crises using all datasets can be found on the companion website. The crisis maps reveal that episode-level datasets like MIDs or the original ICB are too sparse and vague to reconstruct the structure of the crisis (SI Appendix 4.3 and 4.4). On the other end of the spectrum, the high recall dictionary-based event datasets like Terrier and ICEWS produce so many noisy events (several hundred thousand) that even with heavy filtering their crisis maps are completely unintelligible. Further, because of copyright issues, none of these datasets directly provide the original text spans, making event-level precision difficult to verify.

We further want to automatically verify the precision of individual ICBe event codings, which we can do in the case of ICBe because each event is mapped to a specific span of text. Our proposed measure is a reconstruction task to see whether our intended ontology can be recovered through only unsupervised clustering of sentences they were applied to. Figure 3 shows the location of every sentence from the ICBe corpus in semantic space, as embedded using the same large language model as before, and the median location of each ICBe event tag applied to those sentences.Footnote 11 Labels reflect the individual leaves of the ontology and colors reflect the higher level coarse branch nodes of the ontology. If ICBe has high precision, substantively similar tags ought to have been applied to substantively similar source text, which is what we see both in two dimensions in the main plot and via hierarchical clustering on all dimensions in the dendrogram along the right-hand side.Footnote 12

Figure 3. Computational evaluation of the precision of ICBe event codings. The plot on the left is a map of the semantic meaning of every sentence in the corpus (black points) as assigned by a large language model (Paraphrase-MPNET-base-v2) and projected down into two dimensions (UMAP). Overlaid are the median semantic locations of each label assigned by ICBe coders (colored labels). The labels with similar meaning are assigned to sentences with similar semantic meaning, creating an observable structure and pattern we would not observe with low-quality coding where tag location would instead appear random. The plot on the right shows a hierarchical dendrogram clustering labels into groups by their average semantic location with more similar labels being more closely connected on the tree. The clustering by color indicates it closely mirrors the intended ICBe ontology, suggesting high precision in the coding.
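The intuition behind this reconstruction check can be sketched with toy data. The code below computes a per-dimension median location for each tag and then asks how often a tag's nearest neighboring tag belongs to the same coarse branch. That nearest-neighbor check is a simplified stand-in for the hierarchical clustering shown in the dendrogram, not the procedure used to produce Figure 3, and the vectors, tags, and branch labels are invented.

```python
from math import dist
from statistics import median
from typing import Dict, List, Sequence, Set, Tuple


def tag_medians(sentence_vecs: List[Sequence[float]],
                sentence_tags: List[Set[str]]) -> Dict[str, Tuple[float, ...]]:
    """Median embedding (per dimension) of the sentences carrying each tag."""
    by_tag: Dict[str, List[Sequence[float]]] = {}
    for vec, tags in zip(sentence_vecs, sentence_tags):
        for tag in tags:
            by_tag.setdefault(tag, []).append(vec)
    return {tag: tuple(median(dim) for dim in zip(*vecs))
            for tag, vecs in by_tag.items()}


def branch_purity(medians: Dict[str, Tuple[float, ...]],
                  branch_of: Dict[str, str]) -> float:
    """Share of tags whose nearest other tag lies in the same branch."""
    hits = 0
    for tag, loc in medians.items():
        nearest = min((t for t in medians if t != tag),
                      key=lambda t: dist(loc, medians[t]))
        hits += branch_of[tag] == branch_of[nearest]
    return hits / len(medians)


# Invented 2-D "embeddings": speech-like sentences near the origin,
# action-like sentences near (5, 5).
vecs = [(0.0, 0.1), (0.2, 0.0), (0.1, 0.3), (5.0, 5.1), (5.2, 4.9), (4.9, 5.3)]
tags = [{"threaten"}, {"threaten"}, {"promise"},
        {"blockade"}, {"blockade"}, {"mobilize"}]
branch = {"threaten": "speech", "promise": "speech",
          "blockade": "action", "mobilize": "action"}

medians = tag_medians(vecs, tags)
print(branch_purity(medians, branch))
```

With randomly applied tags the medians would collapse toward the corpus center and this share would fall toward chance, which is the contrast the figure caption draws.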

4. Case illustrations

In this section, we focus our validation on two case studies for which we have produced synthetic narratives using the method described in Section 3.2. The first is the Cuban Missile Crisis which took place primarily in the second half of 1962, involved the United States, the Soviet Union, and Cuba, and is widely known for bringing the world to the brink of nuclear war (Figure 1). The second is the Crimea-Donbas Crisis which took place primarily in 2014, involved Russia, Ukraine, and NATO, and within a decade spiraled into a full-scale invasion (SI Appendix 4.1). We choose these cases because they are significant in contemporary international relations, are widely known across academic disciplines as well as among the public, and are sufficiently brief to evaluate in depth. They are similar in that both cases involve a superpower in crisis with a neighbor that changed from a friendly to a hostile regime, both held implications for the economic and military security for the superpower by risking full-scale invasion, and both eventually invited intervention by an opposing superpower.

4.1 Cuban Missile Crisis (1962)

A synthetic historical narrative for the Cuban Missile Crisis appears in Figure 4, with 51 events drawn from 2,020 documents. Each row represents a detail that appeared in at least five documents, along with an approximate start date, a handwritten summary, the number of documents in which it was mentioned, and whether it could be identified in the text of the original ICB corpus, in our ICBe events, and in any of the competing existing datasets.

Figure 4. Synthetic narratives combine several thousand accounts of each crisis into a single timeline of events, taking only those mentioned in at least five documents. Checkmarks represent whether that event could be hand matched to any detail in the ICB corpus, ICBe dataset, or any of the other event datasets (SI Appendix 3.2 and 3.3).
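
The inclusion rule described above (keep a detail only if it appears in at least five documents) can be illustrated with a short sketch. This is not the authors' implementation: the function name `frequent_events`, the input format of `(document_id, event)` pairs, and the toy mentions are assumptions made for illustration, and the step that matches differently worded mentions to the same event is taken as already done.

```python
from collections import Counter


def frequent_events(mentions, min_documents=5):
    """Keep events mentioned in at least `min_documents` distinct documents.

    mentions: iterable of (document_id, event) pairs.
    Returns a list of (event, document_count), most widely mentioned first.
    """
    # Deduplicate so that repeated mentions inside one document count once.
    unique_pairs = set(mentions)
    counts = Counter(event for _, event in unique_pairs)
    kept = [(event, n) for event, n in counts.items() if n >= min_documents]
    return sorted(kept, key=lambda item: (-item[1], item[0]))


# Toy input (hypothetical): document ids paired with hand-summarized events.
mentions = (
    [(doc, "US announces blockade") for doc in range(7)]
    + [(doc, "Soviets agree to withdraw missiles") for doc in range(5)]
    + [(doc, "Minor detail") for doc in range(2)]
    + [(0, "US announces blockade")]  # duplicate mention in document 0
)

timeline = frequent_events(mentions)
```

The sketch orders events by how widely they are mentioned; a published timeline would instead be ordered by each event's approximate start date.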

ICBe's improved recall of the Cuban Missile Crisis relative to the state of the art was summarized in Section 3.2, but the events that explain that improvement can now be seen. Our ground truth ICB narrative contains 17 of the 51 events from the synthetic narrative, for a case whose synthetic narrative includes high-level, previously classified details. ICBe captures nearly all details included in ICB as well as more details from the synthetic narrative than any competing dataset. Phoenix includes some information from earlier in the crisis than ICBe does, such as the nationalization of businesses and back-channel negotiations, but the crisis narrative has a clean canonical end, with the Soviets agreeing to withdraw the missiles. ICBe stands out in including more communicative behavior (do–speech) than existing datasets, such as US threats to attack and later promises not to invade. Given the recognized importance of threat credibility for understanding international conflict, the addition of this information is a substantively important improvement over the existing state of the art (Slantchev, 2011).
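
Recall in this comparison is the share of synthetic-narrative events that can be matched in a given dataset. The sketch below shows that arithmetic on a small table of checkmarks; the function name `recall_by_dataset`, the dataset keys, and the toy rows are hypothetical and do not reproduce the contents of Figure 4. Only the ratio of 17 out of 51 events comes from the text.

```python
def recall_by_dataset(rows):
    """Compute the share of synthetic-narrative events matched by each dataset.

    rows: list of dicts, one per synthetic-narrative event, mapping a dataset
    name to True if the event could be hand matched in that dataset.
    """
    if not rows:
        raise ValueError("need at least one synthetic-narrative event")
    datasets = sorted({name for row in rows for name in row})
    total = len(rows)
    return {
        name: sum(1 for row in rows if row.get(name, False)) / total
        for name in datasets
    }


# Toy checkmark table (hypothetical values, not the contents of Figure 4).
rows = [
    {"ICB": True, "ICBe": True, "Other": False},
    {"ICB": True, "ICBe": True, "Other": True},
    {"ICB": True, "ICBe": False, "Other": False},
    {"ICB": False, "ICBe": False, "Other": False},
]

recall = recall_by_dataset(rows)

# The ratio reported in the text for the ICB narrative: 17 of 51 events.
icb_recall_cuba = 17 / 51
```

Because the denominator is the synthetic narrative rather than the ICB corpus, a dataset derived from ICB cannot be expected to exceed the recall of the ICB narrative itself by much.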

Figure 5 shows the crisis map for the Cuban Missile Crisis. Viewing the crisis on a timeline, one can identify the structure of actors and the environment, along with supporting details, in a way that validates the precision of ICBe. Although harder to measure objectively, this crisis map provides face validity that ICBe's account is neither too vague nor unnecessarily detailed. We include many of the geopolitically important details, such as the Soviet deployment, US discovery of that deployment, heightened alert levels, a blockade, and negotiations that ended with a formal agreement. At the same time, the crisis map indicates that ICBe does not include unnecessary nuances that would preclude useful comparison to other international events.

Figure 5. Crisis map for the Cuban Missile Crisis. The start of the crisis is at the top and the end is at the bottom, with each actor in a column and labeled points identifying their speeches, actions, and thoughts.

4.2 Crimea-Donbas (2014)

A synthetic historical narrative for the 2014 Crimea-Donbas Crisis (30 events drawn from 971 documents) appears in Figure 6. As in the earlier case, each row represents a detail that appeared in at least five documents and shows whether it is identified in ICBe and in existing datasets.

Figure 6. Synthetic narratives combine several thousand accounts of each crisis into a single timeline of events, taking only those mentioned in at least five documents. Checkmarks represent whether that event could be hand matched to any detail in the ICB corpus, ICBe dataset, or any of the other event datasets (SI Appendix 3.2 and 3.3).

As summarized quantitatively in Section 3.2 (Figure 2), our ground truth ICB narrative contains 23 of the 30 events from the synthetic narrative. Like the gray zone precursor to the Cuban Missile Crisis (Cormac and Aldrich, 2018), Ukraine provided several security guarantees to Russia that were potentially undone, e.g., a long-term lease on naval facilities in Crimea. But unlike the Cuban Missile Crisis, the end of this crisis is unclear: it closes meekly with a second cease-fire agreement (Minsk II) but continued fighting. ICBe again recalls more important information about the crisis than any existing dataset, particularly information concerning the behavior of non-state separatist groups like the Donetsk People's Republic (DPR) and Luhansk People's Republic (LPR).

As this more recent case reflects primarily public reporting rather than the previously classified details relevant for the Cuban Missile Crisis, ICBe's improvement relative to the global and real-time coverage of dictionary-based event systems is still present, but less pronounced. We want to take seriously the possibility that some functional transformation of those systems' output could recover the precision of ICBe. For example, Terechshenko (2020) attempts to correct for the mechanically increasing amount of news coverage each year by de-trending violent event counts from Phoenix using a human-coded baseline. Others have focused on verifying precision for ICEWS on specific subsets of details against known ground truths, e.g., geolocation (Cook and Weidmann, 2019), protest events (80 percent) (Wüest and Lorenzini, 2020), and anti-government protest networks (46.1 percent) (Jäger, 2018).

We take the same approach here in Figure 7, selecting four specific CAMEO event codings and checking how often they reflect a true real-world event from the Crimea-Donbas synthetic narrative. We choose four event types around key moments in the crisis. The start of the crisis revolves around Ukraine backing out of a trade deal with the EU in favor of Russia, but “sign formal agreement” events act more like a topic detector, with dozens of events generated by discussions of a possible agreement but none by an actual agreement, which never materialized. The switch is caught by the “reject plan, agreement to settle dispute” event type, but such events continue to be recorded for Viktor Yanukovych even after he was removed from power, because articles retroactively discuss the cause of his removal. Events for “use conventional military force” capture a threshold around the start of hostilities and who the participants were, but not any particular battles or campaigns. Likewise, “impose embargo, boycott, or sanctions” captures the start of waves of sanctions and from whom, but the resulting events are effectively constant because the news coverage does not distinguish between subtle changes or additions. In sum, dictionary-based methods on news corpora tend to have high recall because they parse everything in the news, but for the same reason, their specificity for most event types is too low to recover the individual chess-like sequencing that ICBe aims to record.

Figure 7. The unit of analysis is the dyad-day. The 10 most active dyads per category are shown. Red text shows events from the synthetic narrative relevant to that event category. Blue bars indicate an event recorded by ICEWS for that dyad on that day.
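
The dyad-day comparison can be sketched as follows. This is a hedged illustration, not the procedure used to build Figure 7: it assumes events arrive as `(source, target, day)` tuples, treats dyads as unordered, requires an exact match on the calendar day, and uses placeholder actors and dates. The function names `dyad_day` and `dyad_day_precision` are hypothetical.

```python
from datetime import date


def dyad_day(source, target, day):
    """Key an event by unordered dyad and calendar day (an assumption)."""
    return (frozenset((source, target)), day)


def dyad_day_precision(coded_events, true_events):
    """Share of coded dyad-days that coincide with a ground-truth dyad-day.

    coded_events, true_events: iterables of (source, target, day) tuples.
    Multiple coded events for one dyad on one day collapse to one dyad-day.
    """
    coded = {dyad_day(*event) for event in coded_events}
    truth = {dyad_day(*event) for event in true_events}
    if not coded:
        raise ValueError("no coded events to evaluate")
    return len(coded & truth) / len(coded)


# Toy data (hypothetical actors and dates).
coded_events = [
    ("A", "B", date(2014, 3, 1)),
    ("B", "A", date(2014, 3, 1)),  # same dyad-day as the line above
    ("A", "B", date(2014, 3, 2)),
    ("C", "A", date(2014, 3, 6)),
    ("D", "A", date(2014, 3, 7)),
]
true_events = [
    ("A", "B", date(2014, 3, 1)),
    ("C", "A", date(2014, 3, 6)),
]

precision = dyad_day_precision(coded_events, true_events)
```

A stream of coded events that stays active on most days for a dyad, as described for the sanctions category, would drive this ratio down even when every true event is eventually covered. A tolerance window around each true date would be a natural relaxation of the exact-day match.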

5. Conclusion

The scope and complexity of international politics should not discourage the identification of trends, patterns, and regularities. In undertaking event abstraction from narratives about key historical episodes in international relations, this paper has proposed a mapping between unstructured historical records and a structured ontology of these events with high coverage of concepts of interest. Multiple validity checks find the resulting codings have high internal validity (e.g., intercoder agreement) and external validity (i.e., matching source material in both micro-details at the sentence level and macro-details spanning full historical episodes). Further, these codings perform much better in terms of recall, precision, coverage, and overall coherence in capturing these historical episodes than existing event systems used in international relations.

These data, along with the open-source code, documentation, and companion website, provide several substantive and methodological contributions to the discipline. Substantively, these data are appropriate for statistical analysis of hard questions in the study of crises, like interactions between means of warfare and the preconditions for conflict escalation (Gannon, 2022). Methodologically, our mapping from codings to source text at the sentence level provides a new resource for natural language processing, with access to coder-level disaggregation that furthers the study of uncertainty in the interpretation of international events and in the quantitative coding of historical events. Finally, we provide a companion website (crisisevents.org) that incorporates detailed visualizations of all the data introduced here as a new resource for the study of international crises in a scalable, yet detailed, manner.

Supplementary material

The supplementary material for this article can be found at https://doi.org/10.1017/psrm.2024.17. Replication material for this article is available at https://dataverse.harvard.edu/dataset.xhtml?persistentId=doi%3A10.7910%2FDVN%2FMNVUEP&version=DRAFT

Data

This article's data, supplementary appendix, replication material, and visualizations of every historical episode are available on the GitHub repository ICBEventData and through the companion website crisisevents.org.

Acknowledgements

We thank the ICB Project and its directors and contributors for their foundational work and their help with this effort. We make special acknowledgement of Michael Brecher for helping found the ICB project in 1975, creating a resource that continues to spark new insights to this day. We thank the many undergraduate coders. Thanks to the Center for Peace and Security Studies and its membership for comments. Special thanks to Rebecca Cordell, Philip Schrodt, Zachary Steinert-Threlkeld, and Zhanna Terechshenko for their generous feedback. Thank you to the cPASS research assistants: Helen Chung, Daman Heer, Syeda ShahBano Ijaz, Anthony Limon, Erin Ling, Ari Michelson, Prithviraj Pahwa, Gianna Pedro, Tobias Stodiek, Yiyi ‘Effie’ Sun, Erin Werner, Lisa Yen, and Ruixuan Zhang.

Author contributions

Conceptualization: R. W. D., E. G., J. L.; methodology: R. W. D., T. L. S.; software: R. W. D.; validation: R. W. D., T. L. S.; formal analysis: R. W. D., T. L. S.; investigation: S. C., R. W. D., J. A. G., C. A., N. L., E. M., J. M. C. N., D. P., D. M. Q., J. W.; data curation: R. W. D., D. M. Q., T. L. S., J. W.; writing — original draft: R. W. D., T. L. S.; writing — review and editing: R. W. D., J. A. G., E. G., T. L. S.; visualization: R. W. D., T. L. S.; supervision: E. G.; project administration: S. C., R. W. D., J. A. G., D. M. Q., T. L. S., J. W.; funding acquisition: E. G., J. L.

Financial support

This work was supported by a grant from the Office of Naval Research N00014-19-1-2491 and from the Charles Koch Foundation 20180481. The financial sponsors played no role in the design, execution, analysis and interpretation of data, or writing of the study.

Competing interest

The authors declare that there are no competing interests.

Footnotes

1 See work on crises (Brecher and Wilkenfeld, 1982; Beardsley et al., 2020), militarized disputes (Gibler, 2018; Palmer et al., 2022), wars (Reiter et al., 2016), organized violence (Sundberg and Croicu, 2016; Davies et al., 2022), political violence (Raleigh et al., 2010), sanctions (Felbermayr et al., 2020), international agreements (Kinne, 2020; Owsiak et al., 2018), dispute resolution (Frederick et al., 2017), and diplomacy (Moyer et al., 2021; Sechser, 2011).

3 Other related datasets that insufficiently overlap ICBe's domain for comparison include BCOW (Leng and Singer, 1988), WEIS (McClelland, 1978), CREON (Hermann, 1984), CASCON (Bloomfield and Moulton, 1989), SHERFACS (Sherman, 2000), Real-Time Phoenix (Brandt et al., 2018), and COfEE (Balali et al., 2021) (see histories in Merritt, 1994 and Schrodt and Hall, 2006).

4 See Balali et al. (2021) for a recent review of ontological depth and availability of Gold Standard example text.

5 This process quickly focuses the coder on a smaller number of relevant options while also allowing them to apply multiple tags if the sentence explicitly includes more than one or there is insufficient evidence to choose only one tag. The guided coding process also allows for the possibility that earlier coarse decisions have less error than later fine-grained decisions.

6 See the full codebook on the GitHub repository ICBEventData.

7 Even the coding of overt actions like MIDs is not without contention (Gibler, 2018).

8 Expert coders were graduate students or postgraduates who collaboratively developed the ontology and documentation for the codebook. Undergraduate coders were students who engaged in classroom workshops.

9 As the ICB narratives are intended to explain conflictual behavior in a political context, many of the missing events concern more economic components of conflict (e.g., nationalizing a foreign business). Even when they occur in the context of a crisis, these events largely fall outside the sample of information on which ICBe's ontology is currently trained. Even with this limitation, ICBe is more comprehensive than the existing datasets that do try to code the economic dimensions of these crises. We see expanding the ontology to broader international phenomena as a promising future implementation of our model.

10 Although Figure 2 focuses on only two crises, the synthetic narrative approach and recall comparison can, and should, be applied more broadly to all international crises in a way that could reveal systematic blind spots across datasets.

11 We preprocess sentences to replace named entities with a generic Entity token.

12 Hierarchical clustering on cosine similarity and with Ward's method.

References

Allen, MA, Flynn, ME and Machain, CM (2022) US global military deployments, 1950–2020. Conflict Management and Peace Science 39(3), 351–370. https://doi.org/10.1177/07388942211030885.
Allison, GT and Zelikow, P (1971) Essence of Decision: Explaining the Cuban Missile Crisis. Vol. 327. Boston: Little, Brown, and Co.
Althaus, S, Bajjalieh, J, Carter, JF, Peyton, B and Shalmon, DA (2019) Cline Center Historical Phoenix Event Data Variable Descriptions. Cline Center Historical Phoenix Event Data.
Balali, A, Asadpour, M and Jafari, SH (2021) COfEE: a Comprehensive Ontology for Event Extraction from Text. arXiv. https://doi.org/10.48550/arXiv.2107.10326.
Beardsley, K (2011) The Mediation Dilemma. Ithaca and London: Cornell University Press.
Beardsley, K, James, P, Wilkenfeld, J and Brecher, M (2020) The International Crisis Behavior Project. Oxford Research Encyclopedia of Politics. https://oxfordre.com/politics/view/10.1093/acrefore/9780190228637.001.0001/acrefore-9780190228637-e-1638. https://doi.org/10.1093/acrefore/9780190228637.013.1638.
Beger, A, Morgan, RK and Ward, MD (2021) Reassessing the role of theory and machine learning in forecasting civil conflict. Journal of Conflict Resolution 65, 1405–1426. https://doi.org/10.1177/0022002720982358.
Beieler, J, Brandt, PT, Halterman, A, Schrodt, PA, Simpson, EM and Alvarez, R (2016) Generating political event data in near real time: opportunities and challenges. In Alvarez, RM (ed.), Computational Social Science Discovery and Prediction. Cambridge: Cambridge University Press, pp. 98–120.
Ben-Yehuda, H and Mishali-Ram, M (2006) Ethnic actors and international crises: theory and findings, 1918–2001. International Interactions 32, 49–78.
Bloomfield, LP and Moulton, A (1989) CASCON III: Computer-aided System for Analysis of Local Conflicts. MIT Center for International Studies, Cambridge.
Boschee, E, Lautenschlager, J, O'Brien, S, Shellman, S, Starz, J and Ward, M (2015) ICEWS Coded Event Data. Harvard Dataverse 12.
Brandt, PT, D'Orazio, V, Holmes, J, Khan, L and Ng, V (2018) Phoenix Real-Time Event Data.
Brecher, M (1999) International studies in the twentieth century and beyond: flawed dichotomies, synthesis, cumulation: ISA presidential address. International Studies Quarterly 43, 213–264.
Brecher, M and Wilkenfeld, J (1982) Crises in world politics. World Politics 34, 380–417. https://doi.org/10.2307/2010324.
Brecher, M and Wilkenfeld, J (1997) A Study of Crisis. Ann Arbor, MI: University of Michigan Press.
Brecher, M, Wilkenfeld, J, Beardsley, KC, James, P and Quinn, D (2021) International Crisis Behavior Data Codebook. Codebook Version 14.
Brust, C-A and Denzler, J (2020) Integrating domain knowledge: using hierarchies to improve deep classifiers. arXiv:1811.07125 [Cs], January. https://arxiv.org/abs/1811.07125.
Bush, SS and Hadden, J (2019) Density and decline in the founding of international NGOs in the United States. International Studies Quarterly 63, 1133–1146. https://doi.org/10.1093/isq/sqz061.
Carafano, JJ (2014) Measuring military power. Strategic Studies Quarterly 8, 11–18. https://www.jstor.org/stable/26270616.
Carter, DB (2010) The strategy of territorial conflict. American Journal of Political Science 54, 969–987. https://doi.org/10.1111/j.1540-5907.2010.00471.x.
Chenoweth, E, Hendrix, CS and Hunter, K (2019) Introducing the nonviolent action in violent contexts (NVAVC) dataset. Journal of Peace Research 56, 295–305. https://doi.org/10.1177/0022343318804855.
Cook, SJ and Weidmann, NB (2019) Lost in aggregation: improving event analysis with report-level data. American Journal of Political Science 63, 250–264.
Cormac, R and Aldrich, RJ (2018) Grey is the new black: covert action and implausible deniability. International Affairs 94, 477–494. https://doi.org/10.1093/ia/iiy067.
Davies, S, Pettersson, T and Öberg, M (2022) Organized violence 1989–2021 and drone warfare. Journal of Peace Research 59, 593–610.
Eck, K and Hultman, L (2007) One-sided violence against civilians in war: insights from new fatality data. Journal of Peace Research 44, 233–246. https://doi.org/10.1177/0022343307075124.
Fazal, TM (2011) State Death: The Politics and Geography of Conquest, Occupation, and Annexation. Princeton, NJ: Princeton University Press.
Felbermayr, G, Kirilakha, A, Syropoulos, C, Yalcin, E and Yotov, YV (2020) The global sanctions data base. European Economic Review 129, 103561. https://doi.org/10.1016/j.euroecorev.2020.103561.
Fortna, VP (2018) Peace Time. Princeton, NJ: Princeton University Press.
Frederick, BA, Hensel, PR and Macaulay, C (2017) The issue correlates of war territorial claims data, 1816–2001. Journal of Peace Research 54, 99–108. https://doi.org/10.1177/0022343316676311.
Gannon, JA (2022) One if by land, and two if by sea: cross-domain contests and the escalation of international crises. International Studies Quarterly 66(4), sqac065. https://doi.org/10.1093/isq/sqac065.
Gannon, JA, Gartzke, E, Lindsay, JR and Schram, P (2024) The shadow of deterrence: why capable actors engage in contests short of war. Journal of Conflict Resolution 68(2–3), 230–268.
Gartzke, E and Lindsay, JR (2019) Cross-Domain Deterrence: Strategy in an Era of Complexity. Oxford: Oxford University Press.
Gavin, FJ (2014) History, security studies, and the July crisis. Journal of Strategic Studies 37, 319–331. https://doi.org/10.1080/01402390.2014.912916.
Gibler, DM (2018) International Conflicts, 1816–2010: Militarized Interstate Dispute Narratives. Lanham, MD: Rowman & Littlefield.
Gibler, DM and Sarkees, MR (2004) Measuring alliances: the correlates of war formal interstate alliance dataset, 1816–2000. Journal of Peace Research 41, 211–222. https://doi.org/10.1177/0022343304041061.
Glaser, CL (2000) The causes and consequences of arms races. Annual Review of Political Science 3, 251–276. https://doi.org/10.1146/annurev.polisci.3.1.251.
Goemans, HE, Gleditsch, KS and Chiozza, G (2009) Introducing Archigos: a dataset of political leaders. Journal of Peace Research 46, 269–283.
Goertz, G and Diehl, PF (1986) Measuring military allocations: a comparison of different approaches. Journal of Conflict Resolution 30, 553–581. https://doi.org/10.1177/0022002786030003009.
Goldgeier, J and Tetlock, P (2001) Psychology and international relations theory. Annual Review of Political Science 4, 67–92. https://doi.org/10.1146/annurev.polisci.4.1.67.
Grant, C, Halterman, A, Irvine, J, Liang, Y and Jabr, K (2017) OU Event Data Project, December.
Haffar, W (2002) Emergent peacemakers: cataloguing new patterns of activity in post-cold war conflict. Peace Economics, Peace Science and Public Policy 8(2), 1–42. https://doi.org/10.2202/1554-8597.1054.
Hermann, C (1984) Comparative Research on the Events of Nations (CREON) Project: foreign policy events, 1959–1968: Version 1. ICPSR - Interuniversity Consortium for Political and Social Research. https://doi.org/10.3886/ICPSR05205.V1.
Hewitt, J (2001) Engaging international data in the classroom: using the ICB interactive data library to teach conflict and crisis analysis. International Studies Perspectives 2, 371–383. https://doi.org/10.1111/1528-3577.00066.
Holsti, OR (1965) The 1914 case. The American Political Science Review 59, 365–378. https://doi.org/10.2307/1953055.
Hsu, A, Höhne, N, Kuramochi, T, Vilariño, V and Sovacool, BK (2020) Beyond states: harnessing sub-national actors for the deep decarbonisation of cities, regions, and businesses. Energy Research & Social Science 70, 101738. https://doi.org/10.1016/j.erss.2020.101738.
Iakhnis, E and James, P (2019) Near crises in world politics: a new dataset. Conflict Management and Peace Science 38(2), 224–243. https://doi.org/10.1177/0738894219855610.
Jäger, K (2018) The limits of studying networks with event data: evidence from the ICEWS dataset. Journal of Global Security Studies 3, 498–511.
Jervis, R (1978) Cooperation under the security dilemma. World Politics 30, 167–214. https://doi.org/10.2307/2009958.
Kang, DC and Lin, AY-T (2019) US bias in the study of Asian security: using Europe to study Asia. Journal of Global Security Studies 4, 393–401.
Kinne, BJ (2020) The defense cooperation agreement dataset (DCAD). Journal of Conflict Resolution 64, 729–755. https://doi.org/10.1177/0022002719857796.
Lacina, B (2006) Explaining the severity of civil wars. Journal of Conflict Resolution 50, 276–289. https://doi.org/10.1177/0022002705284828.
LaFree, G and Dugan, L (2007) Introducing the global terrorism database. Terrorism and Political Violence 19, 181–204.
Lai, B (2004) The effects of different types of military mobilization on the outcome of international crises. Journal of Conflict Resolution 48, 211–229.
Leeds, BA (1999) Domestic political institutions, credible commitments, and international cooperation. American Journal of Political Science 43, 979–1002. https://doi.org/10.2307/2991814.
Leeds, BA (2003) Alliance reliability in times of war: explaining state decisions to violate treaties. International Organization 57, 801–827. https://doi.org/10.1017/S0020818303574057.
Leng, RJ and Singer, J (1988) Militarized interstate crises: the BCOW typology and its applications. International Studies Quarterly 32, 155–173. https://doi.org/10.2307/2600625.
Li, Q, Peng, H, Li, J, Hei, Y, Sun, R, Sheng, J and Guo, S (2021) A comprehensive survey on schema-based event extraction with deep learning. arXiv:2107.02126 [Cs], August. https://arxiv.org/abs/2107.02126.
Lindsay, JR and Gartzke, E (2020) Politics by many other means: the comparative strategic advantages of operational domains. Journal of Strategic Studies 0, 1–34. https://doi.org/10.1080/01402390.2020.1768372.
Lupton, DL (2018) Reexamining reputation for resolve: leaders, states, and the onset of international crises. Journal of Global Security Studies 3, 198–216. https://doi.org/10.1093/jogss/ogy004.
Mandelbrot, BB (1983) The Fractal Geometry of Nature. New York: Freeman.
McClelland, C (1978) World Event/Interaction Survey, 1966–1978. WEIS Codebook ICPSR 5211.
McNabb Cochran, K and Long, SB (2017) Measuring military effectiveness: calculating casualty loss-exchange ratios for multilateral wars, 1816–1990. International Interactions 43, 1019–1040. https://doi.org/10.1080/03050629.2017.1273914.
Merritt, RL (1994) Measuring events for international political analysis. International Interactions 20, 3–33.
Miller, GA (1995) WordNet: a lexical database for English. Communications of the ACM 38, 39–41. https://doi.org/10.1145/219717.219748.
Min, E (2021) Interstate war battle dataset (1823–2003). Journal of Peace Research 58, 294–303. https://doi.org/10.1177/0022343320913305.
Moyer, JD, Turner, SD and Meisel, CJ (2021) What are the drivers of diplomacy? Introducing and testing new annual dyadic data measuring diplomatic exchange. Journal of Peace Research 58(6), 1300–1310. https://doi.org/10.1177/0022343320929740.
Narang, V and Talmadge, C (2018) Civil-military pathologies and defeat in war: tests using new data. Journal of Conflict Resolution 62(7), 1379–1405. https://doi.org/10.1177/0022002716684627.
Narayan, S, Cohen, SB and Lapata, M (2018) Don't give me the details, just the summary! Topic-aware convolutional neural networks for extreme summarization. arXiv:1808.08745.
O'Neill, B (2018) International negotiation: some conceptual developments. Annual Review of Political Science 21, 515–533. https://doi.org/10.1146/annurev-polisci-031416-092909.
Owsiak, AP, Cuttner, AK and Buck, B (2018) The international border agreements dataset. Conflict Management and Peace Science 35, 559–576. https://doi.org/10.1177/0738894216646978.
Paige, GD (1968) The Korean Decision, June 24–30, 1950. New York, NY: Free Press.
Palmer, G, McManus, RW, D'Orazio, V, Kenwick, MR, Karstens, M, Bloch, C, Dietrich, N, Kahn, K, Ritter, K and Soules, MJ (2022) The MID5 dataset, 2011–2014: procedures, coding rules, and description. Conflict Management and Peace Science 39(4), 470–482. https://doi.org/10.1177/0738894221995743.
Powell, R (2002) Bargaining theory and international conflict. Annual Review of Political Science 5, 1–30. https://doi.org/10.1146/annurev.polisci.5.092601.141138.
Powell, JM and Thyne, CL (2011) Global instances of coups from 1950 to 2010: a new dataset. Journal of Peace Research 48, 249–259. https://doi.org/10.1177/0022343310397436.
Quinn, D, Wilkenfeld, J, Smarick, K and Asal, V (2006) Power play: mediation in symmetric and asymmetric international crises. International Interactions 32, 441–470. https://doi.org/10.1080/03050620601011107.
Raleigh, C, Linke, A, Hegre, H and Karlsen, J (2010) Introducing ACLED: an armed conflict location and event dataset: special data feature. Journal of Peace Research 47, 651–660.
Ramsay, KW (2017) Information, uncertainty, and war. Annual Review of Political Science 20, 505–527. https://doi.org/10.1146/annurev-polisci-051215-022729.
Reiter, D (2015) Should we leave behind the subfield of international relations? Annual Review of Political Science 18, 481–499. https://doi.org/10.1146/annurev-polisci-053013-041156.
Reiter, D, Stam, AC and Horowitz, MC (2016) A revised look at interstate wars, 1816–2007. Journal of Conflict Resolution 60, 956–976. https://doi.org/10.1177/0022002714553107.
Sarkees, MR and Wayman, F (2010) Resort to War: 1816–2007. Washington, DC: CQ Press.
Schrodt, PA and Hall, B (2006) Twenty years of the Kansas event data system project. The Political Methodologist 14, 2–8.
Sechser, TS (2011) Militarized compellent threats, 1918–2001. Conflict Management and Peace Science 28, 377–401. https://doi.org/10.1177/0738894211413066.
Sherman, FL (2000) SHERFACS: a cross-paradigm, hierarchical, and contextually-sensitive international conflict dataset, 1937–1985: Version 1. ICPSR – Interuniversity Consortium for Political and Social Research. https://doi.org/10.3886/ICPSR02292.V1.
Slantchev, BL (2011) Military Threats: The Costs of Coercion and the Price of Peace. Cambridge: Cambridge University Press.
Smith, A (1998) International crises and domestic politics. American Political Science Review 92, 623–638. https://doi.org/10.2307/2585485.
Spruyt, H (1996) The Sovereign State and Its Competitors: An Analysis of Systems Change. Princeton, NJ: Princeton University Press.
Stein, AA and Russett, BM (1980) Evaluating war: outcomes and consequences. In Handbook of Political Conflict: Theory and Research. New York: Free Press, pp. 399–422.
Steinert-Threlkeld, ZC (2019) The future of event data is images. Sociological Methodology 49, 68–75. https://doi.org/10.1177/0081175019860238.
Sullivan, PL (2007) War aims and war outcomes: why powerful states lose limited wars. Journal of Conflict Resolution 51, 496–524. https://doi.org/10.1177/0022002707300187.
Sundberg, R and Croicu, M (2016) UCDP GED Codebook Version 5.0. Department of Peace and Conflict Research, Uppsala University.
Sundberg, R and Melander, E (2013) Introducing the UCDP georeferenced event dataset. Journal of Peace Research 50, 523–532.
Terechshenko, Z (2020) Hot under the collar: a latent measure of interstate hostility. Journal of Peace Research 57, 764–776. https://doi.org/10.1177/0022343320962546.
Trager, RF (2016) The diplomacy of war and peace. Annual Review of Political Science 19, 205–228. https://doi.org/10.1146/annurev-polisci-051214-100534.
Turberville, A (1933) History objective and subjective. History 17, 289–302. https://www.jstor.org/stable/24400365.
Ward, MD, Metternich, NW, Dorff, CL, Gallop, M, Hollenbach, FM, Schultz, A and Weschle, S (2013) Learning from the past and stepping into the future: toward a new generation of conflict prediction. International Studies Review 15, 473–490.
Wilkenfeld, J and Brecher, M (2000) Interstate crises and violence: twentieth-century findings. In Midlarsky, MI (ed.), Handbook of War Studies II. Ann Arbor, MI: University of Michigan Press, pp. 271–300.
Wüest, B and Lorenzini, J (2020) External Validation of Protest Event Analysis. In Kriesi, H, Lorenzini, J, Wüest, B and Hausermann, S (eds), Contention in Times of Crisis: Recession and Political Protest in Thirty European Countries. Cambridge: Cambridge University Press, pp. 49–74.
Yarhi-Milo, K (2013) In the eye of the beholder: how leaders and intelligence communities assess the intentions of adversaries. International Security 38, 7–51. https://doi.org/10.1162/ISEC_a_00128.
Yarhi-Milo, K, Lanoszka, A and Cooper, Z (2016) To arm or to ally? The patron's dilemma and the strategic logic of arms transfers and alliances. International Security 41, 90–139. https://doi.org/10.1162/ISEC_a_00250.
Zartman, I and Faure, GO (2005) Escalation and Negotiation in International Conflicts. Cambridge: Cambridge University Press.
Zhang, H and Pan, J (2019) CASM: a deep-learning approach for identifying collective action events with text and image data from social media. Sociological Methodology 49, 1–57. https://doi.org/10.1177/0081175019860244.

Figure 1. Comparison of a natural language and machine-readable abstractive account of the Cuban Missile Crisis (1962). The text on the left is a summary of the event from the ICB Crisis Narrative. The mapping on the right shows the corresponding ICBe coding.

Table 1. Ontological coverage of ICBe versus the existing state of the art

Figure 2. Recall comparison of two cases across existing state-of-the-art efforts. Higher y-axis values represent higher recall, and higher x-axis values represent a greater number of mentions of that detail across the full corpus used to construct the synthetic narrative.

Figure 3. Computational evaluation of the precision of ICBe event codings. The plot on the left maps the semantic meaning of every sentence in the corpus (black points), as assigned by a large language model (Paraphrase-MPNET-base-v2) and projected into two dimensions with UMAP. Overlaid are the median semantic locations of each label assigned by ICBe coders (colored labels). Labels with similar meanings are assigned to sentences with similar semantic meaning, producing a visible structure that would not appear under low-quality coding, where label locations would instead look random. The plot on the right shows a hierarchical dendrogram that clusters labels by their average semantic location, with more similar labels more closely connected on the tree. The color-coded clusters closely mirror the intended ICBe ontology, suggesting high precision in the coding.
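
The two summary steps named in the Figure 3 caption (a median location per label, then a hierarchical grouping of labels by location) can be illustrated with a minimal sketch. It assumes toy 2-D points standing in for the projected sentence embeddings, made-up label names, and a plain average-linkage procedure. The embedding model and UMAP projection are not reproduced, and this is not the authors' implementation.

```python
from math import dist
from statistics import median

# Toy 2-D "semantic locations" for coded sentences: (x, y, label).
# Coordinates and label names are invented for illustration only.
sentences = [
    (0.0, 0.1, "threaten"), (0.2, 0.0, "threaten"), (0.1, 0.3, "threaten"),
    (0.4, 0.2, "ultimatum"), (0.5, 0.1, "ultimatum"), (0.6, 0.3, "ultimatum"),
    (5.0, 5.1, "ceasefire"), (5.2, 4.9, "ceasefire"), (5.1, 5.3, "ceasefire"),
    (5.6, 5.0, "withdraw"), (5.5, 5.2, "withdraw"), (5.7, 5.4, "withdraw"),
]


def median_locations(points):
    """Return {label: (median x, median y)} over the sentences with that label."""
    by_label = {}
    for x, y, label in points:
        by_label.setdefault(label, []).append((x, y))
    return {
        label: (median(p[0] for p in pts), median(p[1] for p in pts))
        for label, pts in by_label.items()
    }


def average_linkage(centers):
    """Agglomeratively merge labels; return the merge history (a dendrogram)."""
    clusters = [frozenset([label]) for label in centers]
    merges = []
    while len(clusters) > 1:
        best = None
        for i in range(len(clusters)):
            for j in range(i + 1, len(clusters)):
                # Average pairwise distance between members of the two clusters.
                total = sum(dist(centers[a], centers[b])
                            for a in clusters[i] for b in clusters[j])
                d = total / (len(clusters[i]) * len(clusters[j]))
                if best is None or d < best[0]:
                    best = (d, i, j)
        d, i, j = best
        merges.append((sorted(clusters[i]), sorted(clusters[j]), round(d, 3)))
        merged = clusters[i] | clusters[j]
        clusters = [c for k, c in enumerate(clusters) if k not in (i, j)]
        clusters.append(merged)
    return merges


centers = median_locations(sentences)
merges = average_linkage(centers)
for left, right, d in merges:
    print(left, "+", right, "at distance", d)
```

In this toy input the two coercive labels merge first and the two de-escalatory labels merge second, which is the kind of ontology-consistent grouping the caption describes. With randomly placed labels the merge order would carry no such pattern.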

Figure 4. Synthetic narratives combine several thousand accounts of each crisis into a single timeline of events, taking only those mentioned in at least five documents. Checkmarks represent whether that event could be hand matched to any detail in the ICB corpus, ICBe dataset, or any of the other event datasets (SI Appendix 3.2 and 3.3).
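
The document-frequency filter mentioned in the caption (keep only events mentioned in at least five documents) can be sketched as follows. The sketch assumes each account has already been reduced to a list of normalized event strings, which is a simplification; the event names and the `frequent_events` helper are hypothetical, and the matching of events across accounts is not shown.

```python
from collections import Counter

MIN_DOCUMENTS = 5  # threshold stated in the caption


def frequent_events(documents, min_documents=MIN_DOCUMENTS):
    """Return {event: document count} for events in at least `min_documents` documents."""
    counts = Counter()
    for events in documents:
        # Count each event once per document, however often it is repeated there.
        counts.update(set(events))
    return {event: n for event, n in counts.items() if n >= min_documents}


# Hypothetical toy corpus: one list of normalized event strings per account.
documents = [
    ["event_a", "event_b"],
    ["event_a", "event_b", "event_a"],
    ["event_a"],
    ["event_a", "event_b"],
    ["event_a", "event_b"],
    ["event_c"],
]

kept = frequent_events(documents)
print(kept)
```

Counting distinct documents rather than raw mentions keeps a single repetitive account from pushing an event over the threshold.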

Figure 5. Crisis map for the Cuban Missile Crisis. The crisis starts at the top and ends at the bottom; each actor occupies a column, with labeled points identifying their speeches, actions, and thoughts.

Figure 6. Synthetic narratives combine several thousand accounts of each crisis into a single timeline of events, taking only those mentioned in at least five documents. Checkmarks represent whether that event could be hand matched to any detail in the ICB corpus, ICBe dataset, or any of the other event datasets (SI Appendix 3.2 and 3.3).

Figure 7. The unit of analysis is the dyad-day. The top 10 most active dyads per category are shown. Red text shows events from the synthetic narrative relevant to that event category. Blue bars indicate an event recorded by ICEWS for that dyad on that day.
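
A dyad-day aggregation of the kind the caption refers to can be sketched as below. The record layout, the actor names, and the choice to treat a dyad as an unordered pair are all assumptions made for illustration; the source does not specify whether dyads are directed.

```python
from collections import Counter
from datetime import date

# Hypothetical event records: (day, source actor, target actor).
events = [
    (date(2000, 1, 1), "A", "B"),
    (date(2000, 1, 1), "B", "A"),
    (date(2000, 1, 2), "A", "B"),
    (date(2000, 1, 2), "A", "C"),
    (date(2000, 1, 3), "C", "A"),
    (date(2000, 1, 3), "B", "A"),
]


def dyad_day_counts(records):
    """Count events per (dyad, day); a dyad is an unordered actor pair here."""
    counts = Counter()
    for day, source, target in records:
        dyad = tuple(sorted((source, target)))
        counts[(dyad, day)] += 1
    return counts


def most_active_dyads(counts, top_n=10):
    """Rank dyads by their total number of events across all days."""
    totals = Counter()
    for (dyad, _day), n in counts.items():
        totals[dyad] += n
    return totals.most_common(top_n)


counts = dyad_day_counts(events)
top = most_active_dyads(counts, top_n=10)
print(top)
```

Each key of `counts` corresponds to one cell of a dyad-by-day grid, so a nonzero count marks a day on which at least one event was recorded for that dyad.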

Supplementary material: File

Douglass et al. supplementary material
