Introducing ICBe: an event extraction dataset from narratives about international crises

Rex W. Douglass; Thomas Leo Scherer; J. Andrés Gannon; Erik Gartzke; Jon Lindsay; Shannon Carcelli; Jonathan Wilkenfeld; David M. Quinn; Catherine Aiken; Jose Miguel Cabezas Navarro; Neil Lund; Egle Murauskaite; Diana Partridge

doi:10.1017/psrm.2024.17

Published online by Cambridge University Press: 24 May 2024

Rex W. Douglass*: Affiliation:
Department of Political Science, University of California, San Diego, CA, USA
Thomas Leo Scherer: Affiliation:
Department of Political Science, University of California, San Diego, CA, USA
J. Andrés Gannon: Affiliation:
Department of Political Science, Vanderbilt University, Nashville, TN, USA
Erik Gartzke: Affiliation:
Department of Political Science, University of California, San Diego, CA, USA
Jon Lindsay: Affiliation:
School of Cybersecurity and Privacy, Georgia Institute of Technology, Atlanta, GA, USA
Shannon Carcelli: Affiliation:
Department of Government and Politics, University of Maryland, College Park, MD, USA
Jonathan Wilkenfeld: Affiliation:
Department of Government and Politics, University of Maryland, College Park, MD, USA
David M. Quinn: Affiliation:
Department of Government and Politics, University of Maryland, College Park, MD, USA
Catherine Aiken: Affiliation:
Center for Security and Emerging Technology, Georgetown University, Washington, DC, USA
Jose Miguel Cabezas Navarro: Affiliation:
Health and Society Research Center, Universidad Mayor, Santiago, Chile
Neil Lund: Affiliation:
Department of Government and Politics, University of Maryland, College Park, MD, USA
Egle Murauskaite: Affiliation:
Department of Government and Politics, University of Maryland, College Park, MD, USA
Diana Partridge: Affiliation:
Department of Government and Politics, University of Maryland, College Park, MD, USA
*: Corresponding author: Rex W. Douglass; Email: [email protected]

Abstract

How do international crises unfold? We conceptualize international relations as a strategic chess game between adversaries and develop a systematic way to measure pieces, moves, and gambits accurately and consistently over a hundred years of history. We introduce a new ontology and dataset of international events called ICBe based on a very high-quality corpus of narratives from the International Crisis Behavior (ICB) Project. We demonstrate that ICBe has higher coverage, recall, and precision than existing state of the art datasets and conduct two detailed case studies of the Cuban Missile Crisis (1962) and the Crimea-Donbas Crisis (2014). We further introduce two new event visualizations (event iconography and crisis maps), an automated benchmark for measuring event recall using natural language processing (synthetic narratives), and an ontology reconstruction task for objectively measuring event precision. We make the data, supplementary appendix, replication material, and visualizations of every historical episode available at a companion website crisisevents.org.

Keywords

data collection; measurement; text and content analysis

Type: Original Article
Information: Political Science Research and Methods, Volume 12, Issue 4, October 2024, pp. 729-749

DOI: https://doi.org/10.1017/psrm.2024.17
Creative Commons: This is an Open Access article, distributed under the terms of the Creative Commons Attribution-NonCommercial-ShareAlike licence (http://creativecommons.org/licenses/by-nc-sa/4.0), which permits non-commercial re-use, distribution, and reproduction in any medium, provided the same Creative Commons licence is used to distribute the re-used or adapted article and the original article is properly cited. The written permission of Cambridge University Press must be obtained prior to any commercial use.
Copyright: Copyright © The Author(s), 2024. Published by Cambridge University Press on behalf of EPS Academic Ltd

If we could record every international interaction in the realms of diplomacy, conflict, economics, and beyond, how much unique information would this chronicle amount to, and how surprised would we be to see something new? In other words, what is the entropy of international relations? While this record could in principle be unbounded, the central conceit of social science is that there are structural regularities that limit what actors can do, their best options, and even which actors are likely to survive (Brecher, 1999; Reiter, 2015). If so, then these events can be recorded and systematically measured by social scientists interested in these regularities.¹ A large and growing measurement literature seeks to do just that, using human coding and improving natural language processing techniques to capture unstructured streams of events from text such as international news reports.²

We advance existing efforts to identify and structure regularized events and actors in international politics by combining human coding with natural language processing to create (1) a large, flexible ontology of international affairs and (2) a fine-grained and structured event dataset of international crises from 1918 to 2017, which we developed by applying our ontology to an unusually high-quality corpus of historical narratives of international crises (Brecher, 1999; Wilkenfeld and Brecher, 2000; Brecher et al., 2016). We then develop several methods for objectively gauging how well these event codings reconstruct the information contained in the original crisis narrative. We conclude by benchmarking our event codings against several current state-of-the-art event data collection efforts. The underlying fine-grained variation in international affairs is unrecognizable through the lens of current quantification efforts. We find that existing models produce data on historical episodes that do not contain enough information to reconstruct the underlying event. In focusing this initial effort on international crises as a proof-of-concept sample, we demonstrate our ontology and method's potential to improve upon existing empirical identifications of patterns of international interactions.

Over the next five sections, this measurement paper makes the following arguments. First, there is a real-world unobserved latent concept known as international relations that can and should be systematically measured. Second, we propose a method for systematic large-scale measurement of the actors and behaviors in international affairs and as a proof of concept apply that method to a well-regarded and salient sample of events known as international crises. Third, in doing so, we confirm that those measurements exhibit several desirable kinds of internal and external validity and out-perform existing approaches. Fourth, this validation can be evaluated in detail via new event visualizations, with examples provided for case studies of the 1962 Cuban Missile Crisis and 2014 Crimea-Donbas Crisis. A final section concludes.

1. Identifying and measuring international relations

1.1 Motivation

Our knowledge of any historical episode, including the participants and their preferences, behaviors, and beliefs, is only indirectly observed from historical records that most often take the form of unstructured natural language text. Despite their complexity, all international interactions fundamentally involve a finite set of actors expressing their interests through at least theoretically observable behaviors. So how can we abstract and measure discrete events that make up a historical episode in international relations? The easiest way to convey the desired product is with an example. Figure 1 shows a narrative account of the Cuban Missile Crisis (1962) in natural language sentences alongside a mapping to discrete machine-readable abstractive events. From this, scholars can identify similarities and differences across events like what foreign policy actions deter versus inflame (Jervis, 1978; Glaser, 2000), when third parties mediate (Haffar, 2002; Quinn et al., 2006), and how actors communicate resolve (Trager, 2016; Lupton, 2018). Identifying patterns of international interactions is not just an inherently interesting enterprise; it is a necessary precondition to important efforts to predict where policymakers should turn their attention to improve global welfare (Ward et al., 2013; Beger et al., 2021).

Figure 1. Comparison of a natural language and machine-readable abstractive account of the Cuban Missile Crisis (1962). The text on the left is a summary of the event from the ICB Crisis Narrative. The mapping on the right shows the corresponding ICBe coding.

1.2 Existing state of the art measurements

We begin by drawing informative prior beliefs about the underlying process of international relations that we expect to govern behavior during historical episodes and their later transcription into the historical record. We organize our prior beliefs along two overarching axes: (1) existing efforts to identify the actors/actions of international relations; and (2) the types of behaviors and information we hope to recover. Table 1 describes these two axes as columns and rows, respectively.

Table 1. Ontological coverage of ICBe versus the existing state of the art

The rows in Table 1 represent the types of information we expect to find in international relations and form the basis for our proposed ontology. We began the ontology by first doing a full natural language processing pass of the corpus and identifying all of the named entities and verbs mentioned in the text. To identify possible behaviors, we matched verbs to the most likely definition found in WordNet (Miller, 1995), tallied them (SI Appendix 1.2), and then aggregated them into a smaller number of behaviors balancing conceptual detail with manageable sparsity for human coding (informed by existing conceptual literature and measurement research). We used the International Crisis Behavior (ICB) project actor level data to identify likely actors for each crisis and location options relative to each actor. For behavior, actor, and location, coders could write in a value if the given options were insufficient. The codebook lists eleven behaviors added post-coding as coders flagged events that were not captured by the initial ontology (e.g., propaganda).

As we are not the first to attempt to measure international relations in a structured manner, the columns of Table 1 compare the ontological coverage of ICBe to existing state of the art systems in production and with global coverage. We choose these datasets and models as they represent frequently used and reputable efforts to structure and describe historical events of interest to scholars of international politics. The first column starts with our contribution, ICBe, alongside other event-level datasets including CAMEO dictionary lookup-based systems (Historical Phoenix (Althaus et al., 2019); ICEWS (Boschee et al., 2015); Terrier (Grant et al., 2017)), the Militarized Interstate Disputes Incidents dataset, the UCDP-GED dataset (Sundberg and Melander, 2013; Davies et al., 2022), and ACLED (Raleigh et al., 2010).³ The final set of columns compares episode-level datasets beginning with the original ICB project (Brecher et al., 2016; Brecher and Wilkenfeld, 1982; Beardsley et al., 2020), the Militarized Interstate Disputes dataset (Gibler, 2018; Palmer et al., 2022), and the Correlates of War (Sarkees and Wayman, 2010). We include episode-level datasets as they remain a common and trusted tool for analyzing international relations, and because ICBe is unique among event-level datasets as events are matched to crises and can be aggregated to the episode level. There is imperfect overlap between their intended depth and scope of coverage; “international crises” are similar, but not identical to, “interstate wars” and “militarized interstate disputes,” which differ yet again from “individual events of organized violence” and “non-violent action.” Even like-concepts require care in comparison, as an “aim” in ICBe is the same as in MIPS, but an “alert” in ICBe is not the same as an “alert” in MIDs.

This comparison is not intended to fault existing data and models for not including every variable in ICBe's ontology, as some of these variables fall outside the scope of a particular dataset's intended purpose. Rather, it serves as an initial basis for identifying the heterogeneity in existing efforts to abstract and measure discrete historical events of interest and to provide theoretical justifications from existing research about what is included in our dataset's ontology and where ICBe's detail about historical events can be compared to the current state of the art.

With the exception of large-scale CAMEO dictionary-based systems (the first grouping of columns), our ontology improves upon the existing state of the art quantitative datasets that ignore important information about international interactions.⁴ We highlight two particular innovations. First, we separate the “chess pieces” from the “chess players” in distinguishing between different actors within a state. By virtue of our ontology, coding military versus civilian actors and national leaders versus bureaucrats, our data can be used to explore important questions concerning civilian-military relations (Narang and Talmadge, 2018), Track Two diplomacy, the role of sub-national actors (Hsu et al., 2020), and the evolution of which actors are engaged in crises, a topic of increasing interest as states engage in gray zone conflict by employing the coast guard or paramilitary mercenaries instead of internationally recognized state militaries (Gannon, 2022). Second, we add information about the domains in which actors behave (whether in land, air, sea, space, or cyber) since they differ in their technology, tactics, geography, and purpose (Gartzke and Lindsay, 2019). Doing so allows researchers to identify and explain patterns in escalation conditional on the military means states use in conflict. Recent concerns about cross-domain conflict, and the effect of new domains of conflict like space and cyber, have made this an endeavor of increased interest to practitioners (Gannon, 2022).

2. Methodology and data

2.1 Corpus

For our corpus, we select a set of unusually high-quality historical narratives from the ICB project (n = 471) with coverage spanning 1918–2017 (SI Appendix 1.1) (Brecher and Wilkenfeld, 1997; Brecher et al., 2016). ICB defines a crisis as meeting three conditions: (1) an actor perceives a threat to one or more of its core values, (2) the actor has a finite time horizon for responding to the perceived threat, and (3) the probability of military hostility has increased (Brecher and Wilkenfeld, 1982). Crises are a significant focus of detailed single case studies and case comparisons because they provide an opportunity to examine behaviors in international relations short of, or at least prior to, full conflict (Holsti, 1965; Paige, 1968; Allison and Zelikow, 1971; Brecher and Wilkenfeld, 1982; Gavin, 2014; Iakhnis and James, 2019). The corpus is also unique in that it was designed to be used in a downstream quantitative coding project, meaning each narrative was written by a small number of scholars using a uniform coding scheme where things like word choice, writing style, and level of specificity were done deliberately and consistently (Hewitt, 2001). Case selection was exhaustive based on a survey of world news archives and region experts, cross-checked against other databases of war and conflict, and non-English sources (Brecher et al., 2016; Kang and Lin, 2019, 59).

2.2 Coding process

The ICBe ontology follows a hierarchical design philosophy where a smaller number of significant decisions are made early on and then progressively refined into more specific details (Brust and Denzler, 2020).⁵ Each coder was instructed to first thoroughly read the full crisis narrative and was then presented with a custom graphical user interface (GUI) (SI Appendix 2.1). Coders then proceeded sentence by sentence, choosing the number of events (0–3) that occurred, the highest behavior (thought, speech, or action), a set of players, whether the means were primarily armed or unarmed, whether there was an increase or decrease in aggression (uncooperative/escalating or cooperative/de-escalating), and finally one or more specific and non-mutually exclusive activities. Some additional details were always collected (e.g., location and timing) while other details were only collected if appropriate (e.g., force size, fatalities, domains, units). While each event was matched to a sentence, coders could fill in details outside that sentence (e.g., antecedents to pronouns). We reviewed, standardized, and normalized where coders listed a behavior, actor, or location outside the ontology.⁶

A unique feature of the ontology is that thought, speech, and do behaviors can be nested into combinations, e.g. an offer for the U.S.S.R. to remove missiles from Cuba in exchange for the U.S. removing missiles from Turkey. Through compounding, the ontology can capture what players were said to have known, learned, or said about other specific fully described actions.
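The nesting described above can be illustrated with a small data structure. This is a minimal sketch only: the class `Event`, its field names, and the behavior labels are hypothetical choices made for illustration and are not taken from the ICBe codebook.

```python
from __future__ import annotations

from dataclasses import dataclass, field
from typing import Optional


@dataclass
class Event:
    """One coded event. All field names are illustrative assumptions."""

    behavior_type: str                      # "thought", "speech", or "action"
    behavior: str                           # a leaf behavior, e.g. "offer"
    actors: list[str] = field(default_factory=list)
    location: Optional[str] = None
    # A thought or speech event may be *about* other fully described events.
    about: list[Event] = field(default_factory=list)

    def depth(self) -> int:
        """Nesting depth: 1 for a plain event, 2 for an event about events, ..."""
        return 1 + max((e.depth() for e in self.about), default=0)


# The compound example from the text: an offer that bundles two actions.
soviet_withdrawal = Event("action", "remove missiles", actors=["USSR"], location="Cuba")
us_withdrawal = Event("action", "remove missiles", actors=["USA"], location="Turkey")

# The source sentence does not name who made the offer, so `actors` stays empty.
offer = Event("speech", "offer", about=[soviet_withdrawal, us_withdrawal])

print(offer.depth())                        # nesting depth of the compound event
print([e.location for e in offer.about])    # locations of the nested actions
```

Representing the nested actions as full `Event` objects, rather than as free text, is what lets a compound record say which specific, fully described actions a statement or belief refers to.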

No existing event data distinguishes thoughts, speeches, and actions. In fact, most only try to code actions and entirely omit thoughts and speech acts despite recognition of their importance in international politics (Smith, 1998). Scholars have opted against coding thoughts and speech acts because of a lack of confidence that the full universe could be readily observed and, consequently, at least theoretically included.⁷ But the perfect should not be the enemy of the good, and measurement challenges are only overcome after an initial attempt to estimate difficult-to-observe concepts of interest. The ICB narratives are one of the better sources for this endeavor due to the consistent use of high-quality primary source material that takes advantage of qualitative methods well-suited to identifying thoughts and speech acts like archival work and expert interviews.

Each crisis was typically assigned to two expert coders and two novice coders with an additional tie-breaking expert coder assigned to sentences with high disagreement.⁸ For the purposes of measuring intercoder agreement and consensus, we temporarily disaggregate the unit of analysis to the Coder-Crisis-Sentence-Tag (n = 993,731), where a tag is any unique piece of information a coder can associate with a sentence such as an actor, date, behavior, etc. We then aggregate those tags into final events (n = 18,783), using a consensus procedure (SI Appendix 2.2) that requires a tag to have been chosen by at least one expert coder and by a majority of either expert or novice coders. This screens noisy tags that no expert considered possible but leverages novice knowledge to tie-break between equally plausible tags chosen by experts. Requiring sentence-tag matching may underestimate agreement but minimizes the inclusion of noise and allows for additional validation. Once filtered for agreement, we find 472 actors and 119 different behaviors: 12 thought, 13 speech, and 94 actions.
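The consensus rule can be sketched for a single sentence as follows. This is an illustrative reading of the rule as stated in the text, not the procedure in SI Appendix 2.2: the function name `consensus_tags`, the input layout, and the interpretation of "majority" as strictly more than half of a coder group are all assumptions.

```python
def consensus_tags(expert_tags, novice_tags):
    """Return the tags that survive screening for one sentence.

    expert_tags, novice_tags: lists of sets, one set of tags per coder.
    Assumption: "majority" means strictly more than half of that coder group.
    """
    # Only tags chosen by at least one expert are candidates.
    candidates = set().union(*expert_tags)
    kept = set()
    for tag in candidates:
        expert_share = sum(tag in tags for tags in expert_tags) / len(expert_tags)
        novice_share = (
            sum(tag in tags for tags in novice_tags) / len(novice_tags)
            if novice_tags
            else 0.0
        )
        if expert_share > 0.5 or novice_share > 0.5:
            kept.add(tag)
    return kept


# Toy input: two experts and two novices coding one sentence.
experts = [{"blockade", "threat"}, {"blockade"}]
novices = [{"threat", "propaganda"}, {"threat", "propaganda"}]

result = consensus_tags(experts, novices)
print(sorted(result))
```

In this toy input, "blockade" is kept on expert majority alone, "threat" is kept because one expert chose it and the novices break the tie, and "propaganda" is dropped because no expert selected it.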

3. Performance comparison

3.1 Internal consistency

We evaluate the internal validity of the coding process in several ways. For every tag applied we calculate the observed intercoder agreement as the percent of other coders who also applied that same tag (SI Appendix 2.3). Across all concepts, the Top 1 Tag Agreement was low among novices (31 percent), moderate for experts (65 percent), and high (73 percent) following the consensus screening procedure.
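The per-tag agreement definition can be written out directly. The sketch below covers only the basic quantity (the share of other coders who applied the same tag); how it is aggregated into the reported Top 1 Tag Agreement figures is defined in SI Appendix 2.3 and is not reproduced here. The function name and input layout are assumptions.

```python
def tag_agreement(coder_tags):
    """Observed agreement for every (coder, tag) pair on one sentence.

    coder_tags: dict mapping coder id -> set of tags that coder applied.
    Returns a dict mapping (coder, tag) -> share of *other* coders who
    applied the same tag.
    """
    scores = {}
    for coder, tags in coder_tags.items():
        others = [t for c, t in coder_tags.items() if c != coder]
        if not others:
            continue  # agreement is undefined with a single coder
        for tag in tags:
            scores[(coder, tag)] = sum(tag in t for t in others) / len(others)
    return scores


# Toy input: three coders, one of whom adds a tag nobody else used.
coders = {
    "A": {"blockade"},
    "B": {"blockade"},
    "C": {"blockade", "threat"},
}

scores = tag_agreement(coders)
mean_agreement = sum(scores.values()) / len(scores)
print(scores[("C", "threat")], mean_agreement)
```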

We attribute the remaining disagreement primarily to three sources. First, we required coders to rate and justify their confidence in the coding. They reported low confidence for 20 percent of sentences; 45 percent of those were due to a mismatch between the ontology and the text (“survey doesn't fit event”) and 46 percent were from a lack of information or confused writing in the source text (40 percent “more knowledge needed,” 6 percent “confusing sentence”). Observed disagreement varied predictably with self-reported confidence (SI Appendix 2.4). Second, as intended, agreement is higher (75–80 percent) for questions with fewer options near the root of the ontology compared to agreement for questions near the leaves of the ontology (50–60 percent). Third, individual coders exhibit nontrivial coding styles, e.g. some more expressive coders applied many tags per concept while others focused on only the single best match. We further observed unintended synonymity, e.g. the same information can be framed as either a threat to do something or a promise not to do something.

3.2 Improvement over existing efforts

To evaluate our coding process relative to existing datasets, we measure the recall and precision of ICBe events in absolute terms and relative to other existing systems. Recall measures the share of desired information recovered by a sequence of coded events while precision measures the degree to which a sequence of events correctly and usefully describes the information in history. To aid in subjective evaluation of the precision and recall of ICBe for each event, we provide full ICB narratives, ICBe coding in an easy-to-read iconographic form, and a wide range of visualizations for every case on the companion website.

Recall for historical episodes is poorly defined for two reasons. First, history may or may not be written by the victors, but by virtue of being written by someone there is no genuine ground truth about what occurred, only surviving texts about it (Turberville, 1933). Second, there is no a priori guide to what information is necessary detail and what is ignorable trivia. History suffers from what is known as the Coastline Paradox (Mandelbrot, 1983): it has a fractal dimension greater than one such that the more you zoom in, the more detail you will find about individual events as well as in between any two discrete events. The ICBe ontology is a proposal about what information is important, but we need an independent benchmark to evaluate whether that proposal is a good one and that allows for comparing proposals from event projects that had different goals. We need a yardstick for history.

Our strategy for dealing with both problems is a plausibly objective yardstick called a synthetic historical narrative. We collect a large diverse corpus of narratives spanning timelines, encyclopedia entries, journal articles, news reports, websites, and government documents. Using natural language processing (fully described in SI Appendix 3.1), we identify details that appear across multiple accounts. A detail refers to the smallest textual unit for which we can calculate similarity across corpora to identify whether sentences semantically refer to the same broader observed event (Narayan et al., 2018). The more accounts that mention a detail, the more central it is to understanding the true historical episode. The theoretical motivation is that authors face word limits which force them to pick and choose which details to include, and they choose details that serve the specific context of the document they are producing. With a sufficiently large and diverse corpus of documents, we can vary the context while holding the overall episode constant and see which details tend to be invariant to context. Sufficiently similar details were binned together and then summarized so they could be compared to the coding in ICBe. This presents a harder evaluation baseline than comparing ICBe's recall to just that of ICB since there are non-crisis aspects of these events that may be included in other narratives but are out of the scope of our data. For example, the nationalization of businesses in Cuba may be included as important context in the Cuban Missile Crisis in documents that do not focus on the crisis dimensions like ICB. Using this hard case, a recall measure of ICBe on the synthetic narratives thus serves as a way to evaluate the breadth of ICBe's ontology and potential application to non-crisis international events.
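The binning step can be sketched in a few lines. This is a deliberately simplified stand-in, not the pipeline in SI Appendix 3.1: it swaps the semantic similarity model for token-set Jaccard overlap so that the example runs with the standard library alone, and the greedy assignment, the 0.5 threshold, and the toy sentences are all assumptions made for illustration.

```python
import re


def tokens(text):
    """Lowercased word set; a crude substitute for a sentence embedding."""
    return set(re.findall(r"[a-z]+", text.lower()))


def jaccard(a, b):
    """Token-set overlap in [0, 1]."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def bin_details(sentences, threshold=0.5):
    """Greedily group similar sentences and track which documents mention each.

    sentences: list of (doc_id, text) pairs.
    Returns a list of bins, each with a representative text and a set of doc ids.
    """
    bins = []
    for doc_id, text in sentences:
        toks = tokens(text)
        for b in bins:
            if jaccard(toks, b["tokens"]) >= threshold:
                b["docs"].add(doc_id)
                break
        else:
            # No existing bin is similar enough, so start a new one.
            bins.append({"rep": text, "tokens": toks, "docs": {doc_id}})
    return bins


# Invented toy sentences from three hypothetical documents.
corpus = [
    ("d1", "The United States announced a naval quarantine of Cuba"),
    ("d2", "The United States announced a quarantine of Cuba"),
    ("d3", "United States announced naval quarantine of Cuba"),
    ("d1", "Cuba nationalized foreign businesses"),
]

bins = bin_details(corpus)
mention_counts = [len(b["docs"]) for b in bins]
print(mention_counts)
```

Counting distinct documents per bin, rather than raw sentence matches, keeps a single verbose account from inflating how central a detail appears.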

We find substantive variation in recall across existing state of the art methods. Mentions of a detail across accounts are exponentially distributed with context-invariant details appearing dozens to hundreds of times more than context-dependent details.⁹ Furthermore, crisis start and stop dates are arbitrary, and the historical record points to many precursor events as necessary detail for understanding later events. Figure 2 compares ICBe's recall with that of existing datasets for the two case studies detailed in Section 4. ICBe strictly dominates all of the systems but ICEWS in recall, though we note that the small sample sizes mean these systems should be considered statistically indistinguishable. Across all existing datasets and ICBe, recall increases with the number of document mentions, which is an important sign of validity for both them and our benchmark. The one outlier is Phoenix, which in the Cuban Missile Crisis case is so noisy that its recall curve is flat to decreasing as mentions increase. The two episode-level datasets (MIDs and ICM) have low coverage of contextual details. The two other dictionary systems, ICEWS and Terrier, have higher coverage, with ICEWS outperforming Terrier. Importantly, our corpus of ICB narratives has high recall of frequently mentioned details, giving us confidence in how those summaries were constructed, and ICBe lags only slightly behind, showing that it left little additional information on the table.¹⁰

Figure 2. Recall comparison of two cases across existing state of the art efforts. Higher y-axis values represent higher recall and higher x-axis values represent number of times that detail is mentioned across the full corpus used to construct the synthetic narrative.
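One way to read a recall-versus-mentions comparison is as recall computed over progressively more widely mentioned details. The sketch below assumes that reading; it is not the estimator behind Figure 2, and the function name, the thresholding scheme, and the toy data are hypothetical.

```python
def recall_by_mentions(details, thresholds):
    """Recall among details mentioned in at least k documents, for each k.

    details: list of (n_mentions, matched) pairs, where `matched` is True if
    the dataset under evaluation contains an event corresponding to the detail.
    Returns a dict mapping k -> recall, or None if no detail reaches k mentions.
    """
    curve = {}
    for k in thresholds:
        pool = [matched for n, matched in details if n >= k]
        curve[k] = sum(pool) / len(pool) if pool else None
    return curve


# Invented toy data: rarely mentioned details are missed more often.
toy_details = [
    (5, False),
    (6, True),
    (8, False),
    (20, False),
    (50, True),
    (120, True),
]

curve = recall_by_mentions(toy_details, thresholds=[5, 10, 50])
print(curve)
```

A dataset whose recall rises with the threshold is recovering the details that most accounts agree on, which is the pattern the text treats as a sign of validity.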

The second component of event measurement validation is precision. It does little good to recall a historical event but too vaguely (e.g., MIDs describes the Cuban Missile Crisis as a blockade, a show of force, and a stalemate) or with too much error to be useful for downstream applications (e.g., ICEWS records 263 “Detonate Nuclear Weapons” events between 1995 and 2019). ICBe's ontology and coding system are designed to strike a balance so that the most important information is recovered accurately but also abstracted to a level that is still useful and interpretable.

We demonstrate ICBe's precision in a number of different ways. First, we develop the iconography system for presenting event codings as coherent statements that can be compared side by side to the original source narrative for every case on the companion website. We further provide a stratified sample of event codings alongside their source text (SI Appendix 4.2). We find both the visualizations of macrostructure and head-to-head comparisons of ICBe codings to the raw text to strongly support the quality of ICBe. Second, we develop a visualization we call a crisis map, a directed graph intersected with a timeline. A researcher should be able to lay out the events of a crisis on a timeline and read off the macrostructure of an episode from each individual move. A crisis map using ICBe for the Cuban Missile Crisis case study is provided in Figure 5, crisis maps for the two case studies using existing event datasets can be found in SI Appendix 4.3 and 4.4, and crisis maps for all crises using all datasets can be found on the companion website. The crisis maps reveal that episode-level datasets like MIDs or the original ICB are too sparse and vague to reconstruct the structure of the crisis (SI Appendix 4.3 and 4.4). On the other end of the spectrum, the high recall dictionary-based event datasets like Terrier and ICEWS produce so many noisy events (several hundred thousand) that even with heavy filtering their crisis maps are completely unintelligible. Further, because of copyright issues, none of these datasets directly provide the original text spans, making event-level precision difficult to verify.

We further want to automatically verify the precision of individual ICBe event codings, which we can do in the case of ICBe because each event is mapped to a specific span of text. Our proposed measure is a reconstruction task to see whether our intended ontology can be recovered through only unsupervised clustering of the sentences the tags were applied to. Figure 3 shows the location of every sentence from the ICBe corpus in semantic space, as embedded using the same large language model as before, and the median location of each ICBe event tag applied to those sentences.¹¹ Labels reflect the individual leaves of the ontology and colors reflect the higher level coarse branch nodes of the ontology. If ICBe has high precision, substantively similar tags ought to have been applied to substantively similar source text, which is what we see both in two dimensions in the main plot and via hierarchical clustering on all dimensions in the dendrogram along the right-hand side.¹²

Figure 3. Computational evaluation of the precision of ICBe event codings. The plot on the left is a map of the semantic meaning of every sentence in the corpus (black points) as assigned by a large language model (Paraphrase-MPNET-base-v2) and projected down into two dimensions (UMAP). Overlaid are the median semantic locations of each label assigned by ICBe coders (colored labels). The labels with similar meaning are assigned to sentences with similar semantic meaning, creating an observable structure and pattern we would not observe with low-quality coding where tag location would instead appear random. The plot on the right shows a hierarchical dendrogram clustering labels into groups by their average semantic location with more similar labels being more closely connected on the tree. The clustering by color indicates it closely mirrors the intended ICBe ontology, suggesting high precision in the coding.
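The logic of the reconstruction check can be sketched on toy data. This is a simplified proxy, not the analysis behind Figure 3: it assumes sentence embeddings are already available as plain lists of floats, it replaces the UMAP projection and the dendrogram with a nearest-neighbor test on tag medians, and the tag names, branch assignments, and coordinates are invented.

```python
from math import dist
from statistics import median


def tag_medians(embeddings, sentence_tags):
    """Coordinate-wise median embedding of the sentences carrying each tag.

    embeddings: dict sentence id -> list of floats.
    sentence_tags: dict sentence id -> set of tags applied to that sentence.
    """
    by_tag = {}
    for sid, tags in sentence_tags.items():
        for tag in tags:
            by_tag.setdefault(tag, []).append(embeddings[sid])
    return {tag: [median(col) for col in zip(*vecs)] for tag, vecs in by_tag.items()}


def nearest_neighbor_purity(medians, branch_of):
    """Share of tags whose closest other tag sits in the same ontology branch."""
    hits = 0
    for tag, loc in medians.items():
        nearest = min(
            (t for t in medians if t != tag),
            key=lambda t: dist(loc, medians[t]),
        )
        hits += branch_of[tag] == branch_of[nearest]
    return hits / len(medians)


# Invented 2-D "embeddings" and tags; real embeddings have many more dimensions.
toy_embeddings = {
    "s1": [0.0, 0.0], "s2": [0.1, 0.3], "s3": [0.3, 0.2],
    "s4": [0.2, 0.1], "s5": [5.0, 5.0], "s6": [5.2, 4.9],
}
toy_tags = {
    "s1": {"threat"}, "s2": {"threat"}, "s3": {"threat"},
    "s4": {"promise"}, "s5": {"blockade"}, "s6": {"mobilization"},
}
toy_branches = {
    "threat": "speech", "promise": "speech",
    "blockade": "action", "mobilization": "action",
}

medians = tag_medians(toy_embeddings, toy_tags)
purity = nearest_neighbor_purity(medians, toy_branches)
print(medians["threat"], purity)
```

With random tag assignment the purity score would hover near the share expected by chance, so a value close to one plays the same role as the color-coherent clusters described in the caption.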

4. Case illustrations

In this section, we focus our validation on two case studies for which we have produced synthetic narratives using the method described in Section 3.2. The first is the Cuban Missile Crisis, which took place primarily in the second half of 1962, involved the United States, the Soviet Union, and Cuba, and is widely known for bringing the world to the brink of nuclear war (Figure 1). The second is the Crimea-Donbas Crisis, which took place primarily in 2014, involved Russia, Ukraine, and NATO, and within a decade spiraled into a full-scale invasion (SI Appendix 4.1). We choose these cases because they are significant in contemporary international relations, are widely known across academic disciplines as well as among the public, and are sufficiently brief to evaluate in depth. They are similar in that both cases involve a superpower in crisis with a neighbor that changed from a friendly to a hostile regime, both held implications for the economic and military security of the superpower by risking full-scale invasion, and both eventually invited intervention by an opposing superpower.

4.1 Cuban Missile Crisis (1962)

A synthetic historical narrative for the Cuban Missile Crisis appears in Figure 4, with 51 events drawn from 2,020 documents. Each row represents a detail that appeared in at least five documents along with an approximate start date, a handwritten summary, the number of documents it was mentioned in, and whether it could be identified in the text of the original ICB corpus, our ICBe events, and any of the competing existing models.

Figure 4. Synthetic narratives combine several thousand accounts of each crisis into a single timeline of events, taking only those mentioned in at least five documents. Checkmarks represent whether that event could be hand matched to any detail in the ICB corpus, ICBe dataset, or any of the other event datasets (SI Appendix 3.2 and 3.3).

ICBe's improved recall of the Cuban Missile Crisis relative to the state of the art was summarized in Section 3.2, but the events that explain that improvement can now be seen. Our ground truth ICB narrative contains 17/51 of the events from the synthetic narrative of a case that includes high-level previously classified details. ICBe captures nearly all details included in ICB as well as more details from the synthetic narrative than any competing dataset. Phoenix includes some earlier information than ICBe, like the nationalization of businesses and back-channel negotiations, but the crisis narrative has a clean canonical end with the Soviets agreeing to withdraw missiles. ICBe stands out in including more communicative behavior (do–speech), like US threats to attack and later promises not to invade, than existing datasets. Given the recognized importance of threat credibility for understanding international conflict, the addition of this information is a substantively important improvement over the existing state of the art (Slantchev, 2011).
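The recall figures quoted here reduce to simple set arithmetic once events have been matched. The sketch below assumes that each synthetic-narrative event has an identifier and that matching to a dataset has already been done by hand, as in the paper; the function names and the identifier scheme are illustrative assumptions.

```python
def filter_by_mentions(mention_counts, min_documents=5):
    """Keep events mentioned in at least `min_documents` documents."""
    return {event for event, n in mention_counts.items() if n >= min_documents}


def recall(reference_events, dataset_events):
    """Fraction of reference events that a dataset also records."""
    reference = set(reference_events)
    if not reference:
        raise ValueError("reference narrative is empty")
    return len(reference & set(dataset_events)) / len(reference)


# Placeholder identifiers: 51 reference events, of which a source matches 17.
synthetic_narrative = range(51)
matched_in_source = range(17)
print(round(recall(synthetic_narrative, matched_in_source), 3))
```

With 17 of 51 events matched, recall is one third. Computing the same ratio for each dataset against the same synthetic narrative puts the comparison on a common denominator.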

Figure 5 shows the crisis map for the Cuban Missile Crisis. Looking at the crisis on a timeline, one can now identify the structure of actors and the environment, along with its supporting details, in a way that validates the precision of ICBe. Although harder to measure objectively, this crisis map provides face validity that ICBe's account is not too vague, but also not unnecessarily detailed. We include many of the geopolitically important details like Soviet deployment, US discovery of that deployment, heightened alert levels, a blockade, and negotiations that ended with a formal agreement. At the same time, the crisis map indicates that ICBe does not include unnecessary nuances that preclude useful comparison to other international events.

Figure 5. Crisis map for the Cuban Missile Crisis. The start of the crisis is at the top and end of the crisis is at the bottom, with each actor in a column with labeled points identifying their speeches, actions, and thoughts.

4.2 Crimea-Donbas (2014)

A synthetic historical narrative for the 2014 Crimea-Donbas Crisis (30 events drawn from 971 documents) appears in Figure 6. As in the earlier case, rows represent details that appeared in at least five documents and whether each is identified in ICBe and existing datasets.

Figure 6. Synthetic narratives combine several thousand accounts of each crisis into a single timeline of events, taking only those mentioned in at least five documents. Checkmarks represent whether that event could be hand matched to any detail in the ICB corpus, ICBe dataset, or any of the other event datasets (SI Appendix 3.2 and 3.3).

As quantitatively summarized earlier in Section 3.2 (Figure 2), our ground truth ICB narrative contains 23/30 of the events from the synthetic narrative. Like the gray zone precursor to the Cuban Missile Crisis (Cormac and Aldrich, 2018), Ukraine provided several security guarantees to Russia that were potentially undone, e.g., a long-term lease on naval facilities in Crimea. But unlike the Cuban Missile Crisis, the end of this crisis is unclear, with the event meekly ending with a second cease-fire agreement (Minsk II) but continued fighting. ICBe again recalls more important information about the crisis than any existing dataset, particularly information concerning the behavior of non-state separatist groups like the Donetsk People's Republic (DPR) and Luhansk People's Republic (LPR).

As this more recent case reflects primarily public reporting rather than the previously classified details relevant for the Cuban Missile Crisis, ICBe's improvement relative to the global and real-time coverage of dictionary-based event systems is still present, but less pronounced. We want to take seriously the possibility that some functional transformation could recover the precision of ICBe. For example, Terechshenko (2020) attempts to correct for the mechanically increasing amount of news coverage each year by de-trending violent event counts from Phoenix using a human-coded baseline. Others have focused on verifying precision for ICEWS on specific subsets of details against known ground truths, e.g., geolocation (Cook and Weidmann, 2019), protest events (80 percent) (Wüest and Lorenzini, 2020), and anti-government protest networks (46.1 percent) (Jäger, 2018).

We take the same approach here in Figure 7, selecting four specific CAMEO event codings and checking how often they reflect a true real-world event from the Crimea-Donbas synthetic narrative. We choose four event types around key moments in the crisis. The start of the crisis revolves around Ukraine backing out of a trade deal with the EU in favor of Russia, but “sign formal agreement” events act more like a topic detector, with dozens of events generated by discussions of a possible agreement but not by the actual agreement, which never materialized. The switch is caught by the “reject plan, agreement to settle dispute” event type, but these events also continue for Viktor Yanukovych even after he was removed from power because of articles retroactively discussing the cause of his removal. Events for “use conventional military force” capture a threshold around the start of hostilities and who the participants were but not any particular battles or campaigns. Likewise, “impose embargo, boycott, or sanctions” captures the start of waves of sanctions and from whom, but the events are effectively constant as the news coverage does not distinguish between subtle changes or additions. In sum, dictionary-based methods on news corpora tend to have high recall because they parse everything in the news, but for the same reason, their specificity for most event types is too low to back out the individual chess-like sequencing that ICBe aims to record.

Figure 7. The unit of analysis is the dyad-day. The top 10 most active dyads per category are shown. Red text shows events from the synthetic narrative relative to that event category. Blue bars indicate an event recorded by ICEWS for that dyad on that day.
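The dyad-day check behind Figure 7 can be approximated numerically. The sketch below is illustrative only: it assumes events are represented as (source, target, day) tuples, treats dyads as directed, and counts a coded dyad-day as correct when a narrative event for the same dyad falls within a tolerance window. The function name, the tuple layout, and the `window_days` parameter are assumptions, and the toy input uses placeholder actors and dates.

```python
from datetime import date, timedelta


def dyad_day_precision(coded, narrative, window_days=0):
    """Share of coded dyad-days within `window_days` of a narrative event
    for the same directed dyad.

    coded, narrative: iterables of (source, target, day) tuples.
    """
    truth = {}
    for src, tgt, day in narrative:
        truth.setdefault((src, tgt), []).append(day)
    coded = set(coded)  # one observation per dyad-day
    if not coded:
        raise ValueError("no coded events")
    window = timedelta(days=window_days)
    hits = sum(
        any(abs(day - t) <= window for t in truth.get((src, tgt), []))
        for src, tgt, day in coded
    )
    return hits / len(coded)


# Toy input: one real event, four coded dyad-days of the same event type.
toy_narrative = [("A", "B", date(2000, 1, 1))]
toy_coded = [
    ("A", "B", date(2000, 1, 1)),
    ("A", "B", date(2000, 1, 2)),
    ("A", "B", date(2000, 1, 5)),
    ("B", "A", date(2000, 1, 1)),
]
print(dyad_day_precision(toy_coded, toy_narrative))
print(dyad_day_precision(toy_coded, toy_narrative, window_days=1))
```

An event type that behaves like a topic detector produces many coded dyad-days around a single real event, so its score stays low even when the tolerance window is widened.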

5. Conclusion

The scope and complexity of international politics should not discourage the identification of trends, patterns, and regularities. In undertaking event abstraction from narratives about key historical episodes in international relations, this paper has proposed a mapping between unstructured historical records and a structured ontology of these events with high coverage of concepts of interest. Multiple validity checks find the resulting codings have high internal validity (e.g., intercoder agreement) and external validity (i.e., matching source material in both micro-details at the sentence level and macro-details spanning full historical episodes). Further, these codings perform much better in terms of recall, precision, coverage, and overall coherence in capturing these historical episodes than existing event systems used in international relations.

These data, along with the open-source code, documentation, and companion website, provide several substantive and methodological contributions to the discipline. Substantively, these data are appropriate for statistical analysis of hard questions in the study of crises like interactions between means of warfare and the preconditions for conflict escalation (Gannon, 2022). Methodologically, our mapping from codings to source text at the sentence level provides a new resource for natural language processing with access to coder-level disaggregation that furthers the study of uncertainty in the interpretation of international events and in the quantitative coding of historical events. Finally, we provide a companion website (crisisevents.org) that incorporates detailed visualizations of all the data introduced here as a new resource for the study of international crises in a scalable, yet detailed, manner.

Supplementary material

The supplementary material for this article can be found at https://doi.org/10.1017/psrm.2024.17. To obtain replication material for this article, see https://dataverse.harvard.edu/dataset.xhtml?persistentId=doi%3A10.7910%2FDVN%2FMNVUEP&version=DRAFT

Data

This article's data, supplementary appendix, replication material, and visualizations of every historical episode are available on the GitHub repository ICBEventData and through the companion website crisisevents.org.

Acknowledgements

We thank the ICB Project and its directors and contributors for their foundational work and their help with this effort. We make special acknowledgement of Michael Brecher for helping found the ICB project in 1975, creating a resource that continues to spark new insights to this day. We thank the many undergraduate coders. Thanks to the Center for Peace and Security Studies and its membership for comments. Special thanks to Rebecca Cordell, Philip Schrodt, Zachary Steinert-Threlkeld, and Zhanna Terechshenko for their generous feedback. Thank you to the cPASS research assistants: Helen Chung, Daman Heer, Syeda ShahBano Ijaz, Anthony Limon, Erin Ling, Ari Michelson, Prithviraj Pahwa, Gianna Pedro, Tobias Stodiek, Yiyi ‘Effie’ Sun, Erin Werner, Lisa Yen, and Ruixuan Zhang.

Author contributions

Conceptualization: R. W. D., E. G., J. L.; methodology: R. W. D., T. L. S.; software: R. W. D.; validation: R. W. D., T. L. S.; formal analysis: R. W. D., T. L. S.; investigation: S. C., R. W. D., J. A. G., C. A., N. L., E. M., J. M. C. N., D. P., D. M. Q., J. W.; data curation: R. W. D., D. M. Q., T. L. S., J. W.; writing — original draft: R. W. D., T. L. S.; writing — review and editing: R. W. D., J. A. G., E. G., T. L. S.; visualization: R. W. D., T. L. S.; supervision: E. G.; project administration: S. C., R. W. D., J. A. G., D. M. Q., T. L. S., J. W.; funding acquisition: E. G., J. L.

Financial support

This work was supported by a grant from the Office of Naval Research N00014-19-1-2491 and from the Charles Koch Foundation 20180481. The financial sponsors played no role in the design, execution, analysis and interpretation of data, or writing of the study.

Competing interest

The authors declare that there are no competing interests.

Footnotes

² See Beieler et al. (2016), Boschee et al. (2015), Brandt et al. (2018), Grant et al. (2017), and Li et al. (2021). On event extraction from images and social media see Zhang and Pan (2019) and Steinert-Threlkeld (2019).

³ Other related datasets that insufficiently overlap ICBe's domain for comparison include BCOW (Leng and Singer, 1988), WEIS (McClelland, 1978), CREON (Hermann, 1984), CASCON (Bloomfield and Moulton, 1989), SHERFACS (Sherman, 2000), Real-Time Phoenix (Brandt et al., 2018), and COfEE (Balali et al., 2021) (see histories in Merritt, 1994 and Schrodt and Hall, 2006).

⁴ See Balali et al. (2021) for a recent review of ontological depth and availability of Gold Standard example text.

⁵ This process quickly focuses the coder on a smaller number of relevant options while also allowing them to apply multiple tags if the sentence explicitly includes more than one or there is insufficient evidence to choose only one tag. The guided coding process also allows for the possibility that earlier coarse decisions have less error than later fine-grained decisions.

⁶ See the full codebook on the GitHub repository ICBEventData.

⁷ Even the coding of overt actions like MIDs is not without contention (Gibler, 2018).

⁸ Expert coders were graduate students or postgraduates who collaboratively developed the ontology and documentation for the codebook. Undergraduate coders were students who engaged in classroom workshops.

⁹ As the ICB narratives are intended to explain conflictual behavior in a political context, many of the missing events concern more economic components of conflict (e.g., nationalizing a foreign business). Even when they occur in the context of a crisis, these events largely fall outside the sample of information on which ICBe's ontology is currently trained. Even with this limitation, ICBe is more comprehensive than the existing datasets that do try to code the economic dimensions of these crises. We see expanding the ontology to broader international phenomena as a promising future implementation of our model.

¹⁰ Although Figure 2 focuses only on two crises, the synthetic narrative approach and recall comparison can, and should, be more broadly applied to all international crises in a way that could reveal systematic blindspots across datasets.

¹¹ We preprocess sentences to replace named entities with a generic Entity token.

¹² Hierarchical clustering on cosine similarity and with Ward's method.

References

Allen, MA, Flynn, ME and Machain, CM (2022) US global military deployments, 1950–2020. Conflict Management and Peace Science 39(3), 351–370. https://doi.org/10.1177/07388942211030885

Allison, GT and Zelikow, P (1971) Essence of Decision: Explaining the Cuban Missile Crisis. Vol. 327. Boston: Little, Brown, and Co.

Althaus, S, Bajjalieh, J, Carter, JF, Peyton, B and Shalmon, DA (2019) Cline Center Historical Phoenix Event Data Variable Descriptions. Cline Center Historical Phoenix Event Data.

Balali, A, Asadpour, M and Jafari, SH (2021) COfEE: a Comprehensive Ontology for Event Extraction from Text. arXiv. https://doi.org/10.48550/arXiv.2107.10326

Beardsley, K (2011) The Mediation Dilemma. Ithaca and London: Cornell University Press.

Beardsley, K, James, P, Wilkenfeld, J and Brecher, M (2020) The International Crisis Behavior Project. Oxford Research Encyclopedia of Politics. https://oxfordre.com/politics/view/10.1093/acrefore/9780190228637.001.0001/acrefore-9780190228637-e-1638. https://doi.org/10.1093/acrefore/9780190228637.013.1638

Beger, A, Morgan, RK and Ward, MD (2021) Reassessing the role of theory and machine learning in forecasting civil conflict. Journal of Conflict Resolution 65, 1405–1426. https://doi.org/10.1177/0022002720982358

Beieler, J, Brandt, PT, Halterman, A, Schrodt, PA, Simpson, EM and Alvarez, R (2016) Generating political event data in near real time: opportunities and challenges. In Alvarez, RM (ed.), Computational Social Science Discovery and Prediction. Cambridge: Cambridge University Press, pp. 98–120.

Ben-Yehuda, H and Mishali-Ram, M (2006) Ethnic actors and international crises: theory and findings, 1918–2001. International Interactions 32, 49–78.

Bloomfield, LP and Moulton, A (1989) CASCON III: Computer-aided System for Analysis of Local Conflicts. MIT Center for International Studies, Cambridge.

Boschee, E, Lautenschlager, J, O'Brien, S, Shellman, S, Starz, J and Ward, M (2015) ICEWS Coded Event Data. Harvard Dataverse 12.

Brandt, PT, D'Orazio, V, Holmes, J, Khan, L and Ng, V (2018) Phoenix Real-Time Event Data.

Brecher, M (1999) International studies in the twentieth century and beyond: flawed dichotomies, synthesis, cumulation: ISA presidential address. International Studies Quarterly 43, 213–264.

Brecher, M and Wilkenfeld, J (1982) Crises in world politics. World Politics 34, 380–417. https://doi.org/10.2307/2010324

Brecher, M and Wilkenfeld, J (1997) A Study of Crisis. Ann Arbor, MI: University of Michigan Press.

Brecher, M, Wilkenfeld, J, Beardsley, KC, James, P and Quinn, D (2021) International Crisis Behavior Data Codebook. Codebook Version 14.

Brust, C-A and Denzler, J (2020) Integrating domain knowledge: using hierarchies to improve deep classifiers. arXiv:1811.07125 [Cs], January. https://arxiv.org/abs/1811.07125

Bush, SS and Hadden, J (2019) Density and decline in the founding of international NGOs in the United States. International Studies Quarterly 63, 1133–1146. https://doi.org/10.1093/isq/sqz061

Carafano, JJ (2014) Measuring military power. Strategic Studies Quarterly 8, 11–18. https://www.jstor.org/stable/26270616

Carter, DB (2010) The strategy of territorial conflict. American Journal of Political Science 54, 969–987. https://doi.org/10.1111/j.1540-5907.2010.00471.x

Chenoweth, E, Hendrix, CS and Hunter, K (2019) Introducing the nonviolent action in violent contexts (NVAVC) dataset. Journal of Peace Research 56, 295–305. https://doi.org/10.1177/0022343318804855

Cook, SJ and Weidmann, NB (2019) Lost in aggregation: improving event analysis with report-level data. American Journal of Political Science 63, 250–264.

Cormac, R and Aldrich, RJ (2018) Grey is the new black: covert action and implausible deniability. International Affairs 94, 477–494. https://doi.org/10.1093/ia/iiy067

Davies, S, Pettersson, T and Öberg, M (2022) Organized violence 1989–2021 and drone warfare. Journal of Peace Research 59, 593–610.

Eck, K and Hultman, L (2007) One-sided violence against civilians in war: insights from new fatality data. Journal of Peace Research 44, 233–246. https://doi.org/10.1177/0022343307075124

Fazal, TM (2011) State Death: The Politics and Geography of Conquest, Occupation, and Annexation. Princeton, NJ: Princeton University Press.

Felbermayr, G, Kirilakha, A, Syropoulos, C, Yalcin, E and Yotov, YV (2020) The global sanctions data base. European Economic Review 129, 103561. https://doi.org/10.1016/j.euroecorev.2020.103561

Fortna, VP (2018) Peace Time. Princeton, NJ: Princeton University Press.

Frederick, BA, Hensel, PR and Macaulay, C (2017) The issue correlates of war territorial claims data, 1816–2001. Journal of Peace Research 54, 99–108. https://doi.org/10.1177/0022343316676311

Gannon, JA (2022) One if by land, and two if by sea: cross-domain contests and the escalation of international crises. International Studies Quarterly 66(4), sqac065. https://doi.org/10.1093/isq/sqac065

Gannon, JA, Gartzke, E, Lindsay, JR and Schram, P (2024) The shadow of deterrence: why capable actors engage in contests short of war. Journal of Conflict Resolution 68(2-3), 230–268.

Gartzke, E and Lindsay, JR (2019) Cross-Domain Deterrence: Strategy in an Era of Complexity. Oxford: Oxford University Press.

Gavin, FJ (2014) History, security studies, and the July crisis. Journal of Strategic Studies 37, 319–331. https://doi.org/10.1080/01402390.2014.912916

Gibler, DM (2018) International Conflicts, 1816–2010: Militarized Interstate Dispute Narratives. Lanham, MD: Rowman & Littlefield.

Gibler, DM and Sarkees, MR (2004) Measuring alliances: the correlates of war formal interstate alliance dataset, 1816–2000. Journal of Peace Research 41, 211–222. https://doi.org/10.1177/0022343304041061

Glaser, CL (2000) The causes and consequences of arms races. Annual Review of Political Science 3, 251–276. https://doi.org/10.1146/annurev.polisci.3.1.251

Goemans, HE, Gleditsch, KS and Chiozza, G (2009) Introducing archigos: a dataset of political leaders. Journal of Peace Research 46, 269–283.

Goertz, G and Diehl, PF (1986) Measuring military allocations: a comparison of different approaches. Journal of Conflict Resolution 30, 553–581. https://doi.org/10.1177/0022002786030003009

Goldgeier, J and Tetlock, P (2001) Psychology and international relations theory. Annual Review of Political Science 4, 67–92. https://doi.org/10.1146/annurev.polisci.4.1.67

Grant, C, Halterman, A, Irvine, J, Liang, Y and Jabr, K (2017) OU Event Data Project, December.

Haffar, W (2002) Emergent peacemakers: cataloguing new patterns of activity in post-cold war conflict. Peace Economics, Peace Science and Public Policy 8(2), 1–42. https://doi.org/10.2202/1554-8597.1054

Hermann, C (1984) Comparative Research on the Events of Nations (CREON) Project: foreign policy events, 1959–1968: Version 1. ICPSR – Interuniversity Consortium for Political and Social Research. https://doi.org/10.3886/ICPSR05205.V1

Hewitt, J (2001) Engaging international data in the classroom: using the ICB interactive data library to teach conflict and crisis analysis. International Studies Perspectives 2, 371–383. https://doi.org/10.1111/1528-3577.00066

Holsti, OR (1965) The 1914 case. The American Political Science Review 59, 365–378. https://doi.org/10.2307/1953055

Hsu, A, Höhne, N, Kuramochi, T, Vilariño, V and Sovacool, BK (2020) Beyond states: harnessing sub-national actors for the deep decarbonisation of cities, regions, and businesses. Energy Research & Social Science 70, 101738. https://doi.org/10.1016/j.erss.2020.101738

Iakhnis, E and James, P (2019) Near crises in world politics: a new dataset. Conflict Management and Peace Science 38(2), 224–243. https://doi.org/10.1177/0738894219855610

Jäger, K (2018) The limits of studying networks with event data: evidence from the ICEWS dataset. Journal of Global Security Studies 3, 498–511.

Jervis, R (1978) Cooperation under the security dilemma. World Politics 30, 167–214. https://doi.org/10.2307/2009958

Kang, DC and Lin, AY-T (2019) US bias in the study of Asian security: using Europe to study Asia. Journal of Global Security Studies 4, 393–401.

Kinne, BJ (2020) The defense cooperation agreement dataset (DCAD). Journal of Conflict Resolution 64, 729–755. https://doi.org/10.1177/0022002719857796

Lacina, B (2006) Explaining the severity of civil wars. Journal of Conflict Resolution 50, 276–289. https://doi.org/10.1177/0022002705284828

LaFree, G and Dugan, L (2007) Introducing the global terrorism database. Terrorism and Political Violence 19, 181–204.

Lai, B (2004) The effects of different types of military mobilization on the outcome of international crises. Journal of Conflict Resolution 48, 211–229.

Leeds, BA (1999) Domestic political institutions, credible commitments, and international cooperation. American Journal of Political Science 43, 979–1002. https://doi.org/10.2307/2991814

Leeds, BA (2003) Alliance reliability in times of war: explaining state decisions to violate treaties. International Organization 57, 801–827. https://doi.org/10.1017/S0020818303574057

Leng, RJ and Singer, J (1988) Militarized interstate crises: the BCOW typology and its applications. International Studies Quarterly 32, 155–173. https://doi.org/10.2307/2600625

Li, Q, Peng, H, Li, J, Hei, Y, Sun, R, Sheng, J and Guo, S (2021) A comprehensive survey on schema-based event extraction with deep learning. arXiv:2107.02126 [Cs], August. https://arxiv.org/abs/2107.02126

Lindsay, JR and Gartzke, E (2020) Politics by many other means: the comparative strategic advantages of operational domains. Journal of Strategic Studies 0, 1–34. https://doi.org/10.1080/01402390.2020.1768372

Lupton, DL (2018) Reexamining reputation for resolve: leaders, states, and the onset of international crises. Journal of Global Security Studies 3, 198–216. https://doi.org/10.1093/jogss/ogy004

Mandelbrot, BB (1983) The Fractal Geometry of Nature. New York: Freeman.

McClelland, C (1978) World Event/Interaction Survey, 1966–1978. WEIS Codebook ICPSR 5211.

McNabb Cochran, K and Long, SB (2017) Measuring military effectiveness: calculating casualty loss-exchange ratios for multilateral wars, 1816–1990. International Interactions 43, 1019–1040. https://doi.org/10.1080/03050629.2017.1273914

Merritt, RL (1994) Measuring events for international political analysis. International Interactions 20, 3–33.

Miller, GA (1995) WordNet: a lexical database for English. Communications of the ACM 38, 39–41. https://doi.org/10.1145/219717.219748

Min, E (2021) Interstate war battle dataset (1823–2003). Journal of Peace Research 58, 294–303. https://doi.org/10.1177/0022343320913305

Moyer, JD, Turner, SD and Meisel, CJ (2021) What are the drivers of diplomacy? Introducing and testing new annual dyadic data measuring diplomatic exchange. Journal of Peace Research 58(6), 1300–1310. https://doi.org/10.1177/0022343320929740

Narang, V and Talmadge, C (2018) Civil-military pathologies and defeat in war: tests using new data. Journal of Conflict Resolution 62(7), 1379–1405. https://doi.org/10.1177/0022002716684627

Narayan, S, Cohen, SB and Lapata, M (2018) Don't give me the details, just the summary! Topic-aware convolutional neural networks for extreme summarization. arXiv:1808.08745.

O'Neill, B (2018) International negotiation: some conceptual developments. Annual Review of Political Science 21, 515–533. https://doi.org/10.1146/annurev-polisci-031416-092909

Owsiak, AP, Cuttner, AK and Buck, B (2018) The international border agreements dataset. Conflict Management and Peace Science 35, 559–576. https://doi.org/10.1177/0738894216646978

Paige, GD (1968) The Korean Decision, June 24–30, 1950. New York, NY: Free Press.

Palmer, G, McManus, RW, D'Orazio, V, Kenwick, MR, Karstens, M, Bloch, C, Dietrich, N, Kahn, K, Ritter, K and Soules, MJ (2022) The MID5 dataset, 2011–2014: procedures, coding rules, and description. Conflict Management and Peace Science 39(4), 470–482. https://doi.org/10.1177/0738894221995743

Powell, R (2002) Bargaining theory and international conflict. Annual Review of Political Science 5, 1–30. https://doi.org/10.1146/annurev.polisci.5.092601.141138

Powell, JM and Thyne, CL (2011) Global instances of coups from 1950 to 2010: a new dataset. Journal of Peace Research 48, 249–259. https://doi.org/10.1177/0022343310397436

Quinn, D, Wilkenfeld, J, Smarick, K and Asal, V (2006) Power play: mediation in symmetric and asymmetric international crises. International Interactions 32, 441–470. https://doi.org/10.1080/03050620601011107

Raleigh, C, Linke, A, Hegre, H and Karlsen, J (2010) Introducing ACLED: an armed conflict location and event dataset: special data feature. Journal of Peace Research 47, 651–660.

Ramsay, KW (2017) Information, uncertainty, and war. Annual Review of Political Science 20, 505–527. https://doi.org/10.1146/annurev-polisci-051215-022729

Reiter, D (2015) Should we leave behind the subfield of international relations? Annual Review of Political Science 18, 481–499. https://doi.org/10.1146/annurev-polisci-053013-041156

Reiter, D, Stam, AC and Horowitz, MC (2016) A revised look at interstate wars, 1816–2007. Journal of Conflict Resolution 60, 956–976. https://doi.org/10.1177/0022002714553107

Sarkees, MR and Wayman, F (2010) Resort to War: 1816–2007. Washington, DC: CQ Press.

Schrodt, PA and Hall, B (2006) Twenty years of the Kansas event data system project. The Political Methodologist 14, 2–8.

Sechser, TS (2011) Militarized compellent threats, 1918–2001. Conflict Management and Peace Science 28, 377–401. https://doi.org/10.1177/0738894211413066

Sherman, FL (2000) SHERFACS: a cross-paradigm, hierarchical, and contextually-sensitive international conflict dataset, 1937–1985: Version 1. ICPSR – Interuniversity Consortium for Political and Social Research. https://doi.org/10.3886/ICPSR02292.V1

Slantchev, BL (2011) Military Threats: The Costs of Coercion and the Price of Peace. Cambridge: Cambridge University Press.

Smith, A (1998) International crises and domestic politics. American Political Science Review 92, 623–638. https://doi.org/10.2307/2585485

Spruyt, H (1996) The Sovereign State and Its Competitors: An Analysis of Systems Change. Princeton, NJ: Princeton University Press.

Stein, AA and Russett, BM (1980) Evaluating war: outcomes and consequences. In Handbook of Political Conflict: Theory and Research. New York: Free Press, pp. 399–422.

Steinert-Threlkeld, ZC (2019) The future of event data is images. Sociological Methodology 49, 68–75. https://doi.org/10.1177/0081175019860238CrossRef Google Scholar

Sullivan, PL (2007) War aims and war outcomes: why powerful states lose limited wars. Journal of Conflict Resolution 51, 496–524. https://doi.org/10.1177/0022002707300187CrossRef Google Scholar

Sundberg, R and Croicu, M (2016) UCDP GED Codebook Version 5.0. Department of Peace and Conflict Research, Uppsala University.

Sundberg, R and Melander, E (2013) Introducing the UCDP georeferenced event dataset. Journal of Peace Research 50, 523–532.

Terechshenko, Z (2020) Hot under the collar: a latent measure of interstate hostility. Journal of Peace Research 57, 764–776. https://doi.org/10.1177/0022343320962546

Trager, RF (2016) The diplomacy of war and peace. Annual Review of Political Science 19, 205–228. https://doi.org/10.1146/annurev-polisci-051214-100534

Turberville, A (1933) History objective and subjective. History 17, 289–302. https://www.jstor.org/stable/24400365

Ward, MD, Metternich, NW, Dorff, CL, Gallop, M, Hollenbach, FM, Schultz, A and Weschle, S (2013) Learning from the past and stepping into the future: toward a new generation of conflict prediction. International Studies Review 15, 473–490.

Wilkenfeld, J and Brecher, M (2000) Interstate crises and violence: twentieth-century findings. In Midlarsky, MI (ed.), Handbook of War Studies II. Ann Arbor, MI: University of Michigan Press, pp. 271–300.

Wüest, B and Lorenzini, J (2020) External validation of protest event analysis. In Kriesi, H, Lorenzini, J, Wüest, B and Häusermann, S (eds), Contention in Times of Crisis: Recession and Political Protest in Thirty European Countries. Cambridge: Cambridge University Press, pp. 49–74.

Yarhi-Milo, K (2013) In the eye of the beholder: how leaders and intelligence communities assess the intentions of adversaries. International Security 38, 7–51. https://doi.org/10.1162/ISEC_a_00128

Yarhi-Milo, K, Lanoszka, A and Cooper, Z (2016) To arm or to ally? The patron's dilemma and the strategic logic of arms transfers and alliances. International Security 41, 90–139. https://doi.org/10.1162/ISEC_a_00250

Zartman, I and Faure, GO (2005) Escalation and Negotiation in International Conflicts. Cambridge: Cambridge University Press.

Zhang, H and Pan, J (2019) CASM: a deep-learning approach for identifying collective action events with text and image data from social media. Sociological Methodology 49, 1–57. https://doi.org/10.1177/0081175019860244