1 Introduction
Identifying who is more likely to join an extremist movement is a pressing issue for both political science and public policy. However, empirical research on this topic is beset by methodological challenges. Population surveys offer little insight into the phenomenon as recruits to extremism are tiny minorities in any society, and so are tiny minorities in samples. Surveys also face obvious problems in eliciting truthful responses to questions probing illicit actions. Recent innovations in survey and online digital trace methodologies have allowed researchers to obtain more accurate measures of support for extremism (Bail, Merhout, and Ding 2018; Blair et al. 2013; Corstange 2009; Mitts 2019). However, these approaches capture attitudes rather than behavior. For researchers interested in why some individuals join extremist movements and not others, the most common strategy is to collect a convenience sample of recruits. Using these data, scholars typically either (i) report sample proportions of a given characteristic, for example, the percentage of recruits who have college (university) education, or (ii) assign recruits to meaningful contexts and use the characteristics of those places to explain variation in the recruitment rate. While the first approach is descriptively useful, it fails to account for population baselines and other confounding factors affecting the incidence of recruitment. The second approach does provide a counterfactual and allows for multivariate analysis but suffers from familiar problems of ecological inference (Robinson 1950).
The method we propose in this paper allows researchers to leverage both survey and contextual data to make robust inferences about the individual and ecological correlates of recruitment to extremism. To do so, we take inspiration from the case–control design used in epidemiology and show how it can be adapted to combine a convenience sample of cases (recruits to extremism) with controls (respondents from a representative survey). In this, we build on the recent introduction of case–control methods to political science by Rosenfeld (2017, 2018), who shows how this design can be used to study protest participation and other forms of rare political behavior. Several statistical challenges arising from the nature of extremism remain, however. In particular, popular approaches for modeling rare events (King and Zeng 2001) do not account for hierarchical data structures or spatial autocorrelation in the incidence of recruitment. We also have to account for potential separation issues and the possibility of contamination between cases and controls (Rosenfeld 2018).
Our approach offers a complete solution to these statistical problems and can be described as a hierarchical, Bayesian case–control design that is robust to rare events, contamination, and spatial autocorrelation patterning the incidence of recruitment (Rota et al. 2013). Following Rosenfeld (2018), the Bayesian approach is preferable for a number of reasons. First, it permits the use of informative priors to account for the true prevalence of the event of recruitment, as well as to regularize coefficient estimates to account for separation bias and instability when carrying out regressions (Heinze and Schemper 2002). Second, in the absence of prior knowledge of the overall propensity of being a recruit in a given context, the model can estimate the propensity from the data (Rota et al. 2013). Finally, Bayesian probabilistic programming software provides unique flexibility in the modeling of the complex hierarchical structures characterizing recruitment into extremism.
A great strength of our method—and the open-source software that accompanies this paper—is that applied extremism researchers can choose those parameters most relevant for their case. When sampling from national populations, the risk of contamination between cases and controls may be sufficiently low such that it does not pose a threat to inference. On the other hand, recruitment may not qualify as a rare event when comparing recruits to certain subpopulations. So too, spatial autocorrelation in recruitment may not apply if sampling from a small area or closed context. Our modeling strategy is flexible to the inclusion or exclusion of these parameters, depending on the case at hand. In support of our approach, and to help guide the modeling decisions of future practitioners, we provide practical advice and an extensive simulation study that compares our model to alternative frameworks, and show its robustness and superiority in predicting the true underlying probability of recruitment under various bias-inducing scenarios.
To display some of the key properties of our modeling strategy, we analyze recruitment of Sunni Muslim males in nine MENA countries to the Islamic State in Iraq and Syria (ISIS). We focus our analysis on an individual’s level of education and social status, two key factors associated with recruitment to extremism found in the literature on violent Islamist movements (Gambetta and Hertog 2016; Krueger 2017; Krueger and Maleckova 2003; Mesquita 2005; Morris 2020). We show how our approach can be used to perform two types of analyses. In the first, we leverage a multilevel regression model trained on a cross-national sample of ISIS recruits and non-recruits. This provides a robust descriptive analysis about the individual-level characteristics of recruits across countries and subnational administrative units. A second analysis focuses on two countries for which we have rich contextual information: Egypt and Tunisia. This analysis adds value by adjusting for local heterogeneity with the addition of relevant ecological covariates, allowing us to ascertain the potential sensitivity of individual-level findings to unobserved contextual confounding.
For the purposes of illustration, we implement the complete solution described above, accounting for spatial autocorrelation in recruitment, the possibility of contamination, and separation in our regression coefficients. Overall, we find that high-status males with college education in their early twenties were more likely to join ISIS. We also find that relatively deprived males in Egypt were more likely to join ISIS, but not in Tunisia. This heterogeneity in the individual and contextual correlates of violent extremism demonstrates the importance of accounting for both individual- and context-specific factors.
2 Explaining Recruitment to Extremism
A common strategy available to researchers interested in the correlates of recruitment to extremism is to sample on the dependent variable, obtaining relevant demographic information on individual extremists or members of extremist movements. In the ideal scenario, researchers are able to obtain movement membership lists, which can reveal information on tens of thousands of individuals (e.g., Biggs and Knauss 2012), although in practice such complete data are rare. Absent such lists, a well-established strategy is to leverage data from arrests or killings to generate samples of participants (e.g., Ketchley and Biggs 2017; Krueger and Maleckova 2003; Skare 2022). Alternatively, researchers can look to collect demographic information on extremists by either interviewing former recruits (e.g., Bérubé et al. 2019; della Porta 2013) or by reconstructing the biographical profiles of prominent individuals from open-source information (e.g., Gambetta and Hertog 2016; Jensen, Atwell Seate, and James 2020; Ketchley, Brooke, and Lia 2021). Per Rosenfeld (2018), a principal limitation of these samples is that they do not provide information on individuals outside of the subpopulation of interest, meaning that it is not possible to compare recruits to the population from which they are drawn.
To remedy this, researchers typically either confine attention to variation among recruits (e.g., Morris 2020), or else assign individuals to meaningful contexts, for example, universities, cities, or countries, and then use the characteristics of those units to explain cross-sectional variation in the recruitment rate (e.g., Barrie and Ketchley 2018; Pape 2021). While this latter approach is undoubtedly superior to simply analyzing sample proportions, it inevitably relies on ecological inference.
2.1 A Hierarchical Bayesian Case–Control Design
In what follows, we suggest two new methods for analyzing recruitment to extremism. The first leverages a cross-national, multilevel regression model trained on a complete sample of recruits and survey respondents. This provides a robust descriptive analysis about the individual-level factors which characterize recruits across countries and subnational units. The model uses random effects to control for unobservable subnational heterogeneity; these are preferable to fixed effects due to potentially heavily imbalanced area-level sample sizes (Clark and Linzer 2015; Gelman and Hill 2006). The model further uses a conditionally autoregressive prior (Besag, York, and Mollié 1991; Morris et al. 2019) to account for spatial autocorrelation through local smoothing. The second analysis focuses on single-country studies where rich contextual information is available. The added value of this analysis lies in controlling for local heterogeneity in order to ascertain the robustness of any individual-level findings to contextual confounding. Taken together, our proposed setup thus plots a way forward for researchers to combine survey and ecological information for the robust analysis of recruitment to extremism.
2.2 Simple Case–Control Setup
We begin by describing the backbone of our model, which is a logistic regression that accounts for the case–control sampling protocol via an offset. Borrowing from Rota et al. (2013), we define $r_i \in \{0,1\}$ as the state of observation $i$ in our sample of size $n = n_0 + n_1$, where $r_i = 1$ implies the observation is a “case”, $r_i = 0$ defines a control, $n_1$ is the number of sampled cases, and $n_0$ is the number of sampled controls. In our application, a “case” would refer to a known extremist; a “control” to a survey respondent. Recall that cases are selected entirely on the dependent variable, while controls come from the population that cases are drawn from. Take $N_1$ to represent the number of cases in the population of interest, and $N_0$ the number of controls. The probability of being included in the sample ($s_i = 1$) conditional on the true state of any individual can hence be understood as $P_1 = \mbox{Pr}(s_i = 1 \mid r_i = 1) = \frac{n_1}{N_1}$, while that of being sampled as a control is $P_0 = \mbox{Pr}(s_i = 1 \mid r_i = 0) = \frac{n_0}{N_0}$. The log ratio of these sampling probabilities, $\log(P_1 / P_0)$, can then be used as an “offset” in a logistic regression, to account for the sampling protocol. The hierarchical specification of the model follows, with regression coefficients being assigned a very weakly informative prior;Footnote 1
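Written out, a specification consistent with this description would take roughly the following form. This is a sketch rather than the authors' exact display: the covariate index $k$, the covariate values $x_{i,k}$, the coefficients $\beta_k$, and the generic prior $p(\beta_k)$ are notational assumptions, and the actual prior is the one described in Footnote 1.

```latex
\begin{align*}
r_i &\sim \mathrm{Bernoulli}(\rho_i), \\
\mathrm{logit}(\rho_i) &= \log\!\left(\frac{P_1}{P_0}\right) + \sum_{k} x_{i,k}\,\beta_k, \\
\beta_k &\sim p(\beta_k) \quad \text{(very weakly informative; see Footnote 1)}.
\end{align*}
```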
The above hierarchical model thus contains three layers: layer (1) is a model of the true state of an observation, conditional on its latent propensity $\rho$; layer (2) describes this latent propensity, by accounting for systematic variation due to heterogeneity in covariates; and layer (3) models the effects of each covariate by assigning a prior probabilistic model.
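To make the role of the offset concrete, the following minimal sketch checks that adding $\log(P_1/P_0)$ to the population log odds reproduces the share of cases in the sample. The function names and all counts are hypothetical and chosen only for illustration; they are not taken from the paper or its software.

```python
import math


def logit(p):
    """Log odds of a probability p in (0, 1)."""
    return math.log(p / (1.0 - p))


def case_control_offset(n1, N1, n0, N0):
    """log(P1 / P0), with P1 = n1 / N1 and P0 = n0 / N0 as defined in the text."""
    return math.log((n1 / N1) / (n0 / N0))


def in_sample_probability(population_prob, offset):
    """Probability of being a case within the case-control sample."""
    return 1.0 / (1.0 + math.exp(-(logit(population_prob) + offset)))


# Hypothetical population and sample sizes (assumptions, not real data).
N1, N0 = 2_000, 998_000   # cases and controls in the population
n1, n0 = 1_000, 5_000     # cases and controls in the sample

offset = case_control_offset(n1, N1, n0, N0)
population_prob = N1 / (N1 + N0)
sample_prob = in_sample_probability(population_prob, offset)

print(f"offset = {offset:.3f}")
print(f"population prevalence = {population_prob:.4f}")
print(f"implied in-sample share of cases = {sample_prob:.4f}")
```

In an intercept-only setting the implied in-sample share equals $n_1 / (n_1 + n_0)$, which is why subtracting the offset recovers population-scale log odds.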
2.3 Contaminated Controls
Recall that the case–control setup as described above takes known recruits and combines them with “controls” taken from survey respondents. While we know that our cases are correctly labeled, we do not know whether this is true of our controls. That is, our controls may be “contaminated” as survey respondents may have become recruits (Lancaster and Imbens 1996; Rosenfeld 2018). This is especially concerning when researchers have access to biographical information on tens of thousands of extremists (e.g., Biggs and Knauss 2012) or are comparing recruits to small subpopulations (e.g., Ketchley and Biggs 2017; Ketchley, Brooke, and Lia 2021). Rota et al. (2013) outline a “latent variable” formulation of their contamination model. Below, we present our version of that same model as a mixture, which we find more intuitive.
The “label” of an observation, $y_i \in \{0,1\}$, is observed for all observations, while the true “state” of an observation, $r_i \in \{0,1\}$, is only observed for cases. The implied probability distribution of labels conditional on being a control is:
Due to contamination, it is possible that observations characterized by $y_i=0$ are actually in state $r_i = 1$; hence, we need a probability distribution for $y_i \mid r_i = 1$. Let $\pi = \frac{N_1}{N_1 + N_0}$ be the prevalence of recruits in the population of interest, and let $n_u$ be the number of unlabeled observations. We expect there to be $\pi n_u$ cases among the unlabeled observations. We can then characterize the probability distribution of labels, conditional on being a case, as
Finally, our model for the latent state $r_i$ must reflect the possibility of contamination. We do this by redefining the relative risk of being sampled as
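Collecting the three quantities introduced above, one derivation consistent with the stated definitions is sketched below. It assumes that the sampled cases comprise the $n_1$ labeled cases plus the expected $\pi n_u$ cases among the unlabeled records, and that the sampled controls are the remaining $(1-\pi) n_u$ unlabeled records; the final step uses $N_1 / N_0 = \pi / (1-\pi)$. The symbol $\theta_1$ follows its use later in this section.

```latex
\begin{align*}
\Pr(y_i = 1 \mid r_i = 0) &= 0, &
\Pr(y_i = 0 \mid r_i = 0) &= 1, \\
\theta_1 = \Pr(y_i = 1 \mid r_i = 1) &= \frac{n_1}{n_1 + \pi n_u}, &
\Pr(y_i = 0 \mid r_i = 1) &= 1 - \theta_1 = \frac{\pi n_u}{n_1 + \pi n_u}, \\
\frac{P_1}{P_0} &= \frac{(n_1 + \pi n_u)/N_1}{(1-\pi)\,n_u/N_0}
 = \frac{n_1 + \pi n_u}{\pi n_u}.
\end{align*}
```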
The updated, hierarchical specification for the case–control model accounting for contaminated controls is then
In summary, we derive our labels via two distinct data-generating processes, identified by a latent state $r_i \in \{0,1\}$. In the event that the latent state of a given record is that of a true control, $r_i = 0$, it is then impossible for this record to be labeled $y_i = 1$; conversely, if the latent state is that of a true case, $r_i = 1$, then it is still possible for a record to be labeled $y_i = 0$, with probability $(1-\theta_1)$. This latter case captures the issue of contamination. Note that in our application, $\theta$ is always observed, and fed to the model as data.
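The two data-generating processes can be illustrated by marginalizing over the latent state. The sketch below assumes $\theta_1 = n_1 / (n_1 + \pi n_u)$, which follows from expecting $\pi n_u$ cases among the unlabeled records; the function names and the prevalence value are hypothetical, while the counts of 1,051 recruits and 5,093 unlabeled records are those reported for the cross-national sample.

```python
import math


def theta_1(n1, n_u, pi):
    """Probability that a true case carries the label y = 1 (assumed form)."""
    return n1 / (n1 + pi * n_u)


def label_probabilities(rho, theta1):
    """Marginal label probabilities given the latent-state probability rho.

    A true control (r = 0) is never labeled 1; a true case (r = 1) is
    labeled 1 with probability theta1 and 0 with probability 1 - theta1.
    """
    p_y1 = rho * theta1
    p_y0 = (1.0 - rho) + rho * (1.0 - theta1)
    return p_y0, p_y1


def log_likelihood(labels, rhos, theta1):
    """Sum of log label probabilities over observations."""
    total = 0.0
    for y, rho in zip(labels, rhos):
        p_y0, p_y1 = label_probabilities(rho, theta1)
        total += math.log(p_y1 if y == 1 else p_y0)
    return total


# pi below is a hypothetical prevalence, not a value from the paper.
th1 = theta_1(n1=1051, n_u=5093, pi=0.001)
p_y0, p_y1 = label_probabilities(rho=0.2, theta1=th1)
print(th1, p_y0, p_y1)
```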
2.4 Area-Level Random Effects
Survey data and information on extremists often contain information on the origin or location of residence of individuals. We can understand individuals as nested within geographical units of increasing sizes. Generalizing, we can exploit variance at three levels: the individual, some small-area, and some large-area.
These area effects could be incorporated in the model via fixed effects, by expanding the design matrix to include relevant dummy variables for each area of interest. We consider this strategy unwise when trying to explain recruitment to extremism and prefer a random-effects approach. In the case of rare forms of political behavior, our geographical units at all levels of analysis will have relatively few observations (Gelman and Hill 2006). Additionally, for many units, we will have no cases. Finally, we know that lists of recruits are unlikely to be exhaustive; that is, we will not have data for every recruit hailing from every subnational unit or country. Here, a sample of recruitment data or similar can be treated as a non-probability sample; it is unlikely that we can have complete confidence that the sample constitutes a complete or random sample of the population of interest. Given these concerns, a random-effects approach is preferable as it means: (1) we are able to borrow strength across areas, which also increases efficiency, to produce more realistic estimates for the area-level coefficients (Baio 2012; Clark and Linzer 2015) and (2) in the absence of more detailed knowledge about the data-generating process, the shrinkage effect obtained by partial pooling is more likely to shield our estimates from any systematic sampling bias among our cases (Gelman and Hill 2006).
We can also relax some of the theoretical bias associated with the shrinkage induced by random effects via incorporating observable area-level heterogeneity in the design matrix as fixed effects (Gelman and Hill 2006). This is what we elect to do in single-country analyses. Finally, it is worth highlighting that our goal is not to make inferences about area-level effects. Rather, we seek to strip our individual-level effects estimates of contested variance that may be associated with the provenance of the recruit. The resulting hierarchical model is as follows:
where $\epsilon $ stands for some arbitrary number, chosen as a compromise to minimize the prior information and maximize the Markov chain Monte Carlo (MCMC) convergence speed and stability.
2.5 Spatial Autocorrelation
The network ties connecting actors across space play an important role in recruitment to high-risk activism. Sometimes the ties connecting recruits will be available; more commonly this information will not be recoverable. In the absence of detailed network information, we propose controlling for network effects at levels of varying scale. We work on the assumption that network ties are more likely to form between individuals who are geographically proximate. Depending on the richness of the data on recruits, we may generate distance matrices between geographical units of varying size.
To account for area-level spatial autocorrelation, we incorporate a version of the conditional autoregressive (CAR) model (Besag et al. 1991). This approach has been used in individual-level models of behavior, enabling local smoothing of predictions according to behavior observed in neighboring areas (Selb and Munzert 2011). The key ingredients of a CAR model are $\boldsymbol{\omega}$, a distance-weight matrix; $\alpha$, a parameter governing the degree of autocorrelation, where $\alpha=0$ implies spatial independence, and $\alpha=1$ implies an intrinsic conditional autoregressive (ICAR) model (Besag and Kooperberg 1995); and $\sigma_{\psi}$, the standard deviation of the subnational unit effects. The resulting model for the spatial random effect $\psi_l$, for all $l \in \{1,\dots,L\}$, is then
In practice, we implement the ICAR specification of the model, with $\alpha = 1$, and take $\boldsymbol{\omega}$ to be the neighborhood matrix. The neighborhood matrix has diagonals zero (a unit cannot neighbor itself) and off-diagonal zero or one depending on whether the given units are neighbors. We choose this specification of the distance matrix because of the efficiency gains it affords in a Bayesian context (Morris et al. 2019). This leads to
where $d_{l,l}$ is an entry of the diagonal matrix $D$ of size $L \times L$, whose diagonal is defined as a vector of the number of neighbors of each area. Writing $W$ for the neighborhood matrix $\boldsymbol{\omega}$, the joint distribution of this model is simply a multivariate normal distribution $\boldsymbol{\psi} \sim N(0,[\tau_\psi (D - W)]^{-1})$, $\tau_\psi = \frac{1}{\sigma^2_\psi}$, whose log-density is conveniently proportional to the sum of squared pairwise differences of neighboring effects. Note that the sum-to-zero constraint is needed for identifiability, as in its absence any constant added to the $\psi$s would cancel out in the difference.Footnote 2 Following Morris et al. (2019), setting the precision to $1$ and centering the model such that $\sum^L_{l=1} \psi_l = 0$, we arrive at
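The pairwise-difference property described above can be checked numerically: for a symmetric 0/1 neighborhood matrix $W$ and $D = \mathrm{diag}(W\mathbf{1})$, the quadratic form $\boldsymbol{\psi}^\top (D - W) \boldsymbol{\psi}$ equals the sum of $(\psi_l - \psi_m)^2$ over neighboring pairs, so that with unit precision the log-density is $-\frac{1}{2}$ times that sum, up to a constant. The sketch below uses a hypothetical four-area map and made-up effect values; the function names are illustrative and not taken from the paper's software.

```python
import numpy as np


def icar_quadratic_form(psi, W):
    """psi' (D - W) psi, where D holds the number of neighbors per area."""
    D = np.diag(W.sum(axis=1))
    return float(psi @ (D - W) @ psi)


def sum_sq_neighbor_diffs(psi, W):
    """Sum of squared differences over each unordered pair of neighbors."""
    total = 0.0
    n_areas = len(psi)
    for l in range(n_areas):
        for m in range(l + 1, n_areas):
            if W[l, m] == 1:
                total += (psi[l] - psi[m]) ** 2
    return float(total)


# Hypothetical map: area 1 borders areas 0, 2 and 3; areas 2 and 3 also border.
W = np.array(
    [
        [0, 1, 0, 0],
        [1, 0, 1, 1],
        [0, 1, 0, 1],
        [0, 1, 1, 0],
    ],
    dtype=float,
)

psi = np.array([0.5, -1.0, 0.2, 0.3])  # made-up effects that sum to zero

quad = icar_quadratic_form(psi, W)
pairwise = sum_sq_neighbor_diffs(psi, W)
shifted = icar_quadratic_form(psi + 3.0, W)  # add a constant to every effect

print(quad, pairwise, shifted)
```

Adding a constant to every effect leaves the quadratic form unchanged, which is the non-identifiability that the sum-to-zero constraint removes.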
The hierarchical model we implement to incorporate the spatial component is within the Besag–York–Mollié (BYM) family (Besag et al. 1991). For a given level of analysis, say the city or province in a cross-country analysis, BYM models are characterized by two random effects which explain unobserved heterogeneity: $\phi_l$ defines a non-spatial component, while $\psi_l$ defines systematic variance due to spatial dependency. The typical challenge with BYM is that the two areal effects cannot be identified without imposing some structure since they are mutually dependent, meaning that either component is capable of accounting for contested variance at the area level. This leads to inefficient posterior exploration by any MCMC sampler, and a subsequent lack of convergence (Riebler et al. 2016). To overcome this, we implement a state-of-the-art solution leveraging penalized-complexity priors (Simpson et al. 2017), which proposes modeling the two effects as a scaled mixture such that
where $\phi$ and $\psi$ are random effects scaled to have unit variance and $\lambda \in [0,1]$ is a mixing parameter, defining the proportion of residual variation attributable to spatial dependency. In order for the spatial and unstructured effects to share $\sigma$, they must be on the same scale (Riebler et al. 2016). We must therefore scale the ICAR-distributed effects, as their original scale is defined by the local neighborhood. The scaling factor is chosen such that the geometric mean of the marginal variances over the areal units is $1$, so that $\mbox{Var}(\psi_l) \approx 1$. Note that this scaling factor, $s$ in the equation above, can be calculated directly from the adjacency matrix, and hence it is not to be estimated but passed to the model as data.
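As an illustration of how such a scaling factor can be obtained from the adjacency matrix alone, the sketch below takes the geometric mean of the marginal ICAR variances, using the Moore–Penrose pseudo-inverse of $D - W$ as the covariance under the sum-to-zero constraint. This shortcut is an assumption that holds for a connected graph. The mixture form $\sigma(\sqrt{1-\lambda}\,\phi_l + \sqrt{\lambda/s}\,\psi_l)$ follows the usual BYM2 parameterization and is likewise assumed rather than copied from the authors' display; the function names and the four-area map are hypothetical.

```python
import numpy as np


def icar_scaling_factor(W):
    """Geometric mean of the marginal ICAR variances for a connected graph."""
    Q = np.diag(W.sum(axis=1)) - W       # ICAR precision matrix D - W
    cov = np.linalg.pinv(Q)              # covariance under sum-to-zero (assumption)
    return float(np.exp(np.mean(np.log(np.diag(cov)))))


def bym2_effect(phi, psi, sigma, lam, s):
    """Scaled mixture of unstructured (phi) and spatial (psi) effects."""
    return sigma * (np.sqrt(1.0 - lam) * phi + np.sqrt(lam / s) * psi)


# Hypothetical connected four-area map.
W = np.array(
    [
        [0, 1, 0, 0],
        [1, 0, 1, 1],
        [0, 1, 0, 1],
        [0, 1, 1, 0],
    ],
    dtype=float,
)

s = icar_scaling_factor(W)
print(f"scaling factor s = {s:.4f}")
```

Because $s$ depends only on $W$, it can be computed once and passed to the sampler as data, as the text notes.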
The resulting hierarchical specification of our model follows:
where $\frac{1}{2}N$ denotes a half-normal distribution, which is the recommended prior for the variance of BYM effects (Morris et al. 2019).
2.6 Regularizing Prior Coefficients
Multiple contributions have highlighted problems with logistic regression coefficient estimates under rare events (King and Zeng 2001). The intuition behind these challenges is typically described as some variation on the standard separation problem where any given covariate or simple combination thereof perfectly separates cases from controls. This leads to biased and unstable point estimates with large associated uncertainty (Heinze 2017). A number of regularization techniques have been proposed to reduce bias and stabilize the coefficient estimates. Our preferred regularization method is that proposed by Gelman et al. (2008) and Ghosh, Li, and Mitra (2018). The approach assumes that it should be unlikely to observe unit changes in the (standardized) covariates that would lead to outcome changes as large as $5$ points on the logit scale. Using a slight variation on this approach to ensure sufficient regularization, we use a Cauchy prior with scale parameter set to $1$ for the regression coefficients, and a “looser” scale of $10$ logit points on the intercept to accommodate the rarity of the event in the sample. The advantages of the Cauchy prior lie in its fat tails, which avoid over-shrinkage of large coefficients (Ghosh et al. 2018). We apply this prior to our fixed effects exclusively, as the likelihood of our random effects is already structured and penalized. Our final model specification is then as follows:
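The fat-tail argument can be made concrete by comparing how much prior mass a Cauchy and a normal prior of the same scale place on coefficients larger than $5$ logit points in absolute value. The sketch below uses closed-form tail probabilities; the normal comparison and the function names are illustrative assumptions, not part of the authors' specification.

```python
import math


def cauchy_two_sided_tail(x, scale):
    """Pr(|beta| > x) for a zero-centered Cauchy prior with the given scale."""
    return 1.0 - (2.0 / math.pi) * math.atan(x / scale)


def normal_two_sided_tail(x, sd):
    """Pr(|beta| > x) for a zero-centered normal prior with the given sd."""
    return math.erfc(x / (sd * math.sqrt(2.0)))


coef_cauchy = cauchy_two_sided_tail(5.0, scale=1.0)        # coefficient prior
intercept_cauchy = cauchy_two_sided_tail(5.0, scale=10.0)  # looser intercept prior
coef_normal = normal_two_sided_tail(5.0, sd=1.0)           # comparison only

print(f"Cauchy(0, 1):  Pr(|beta| > 5) = {coef_cauchy:.4f}")
print(f"Cauchy(0, 10): Pr(|beta| > 5) = {intercept_cauchy:.4f}")
print(f"Normal(0, 1):  Pr(|beta| > 5) = {coef_normal:.2e}")
```

The Cauchy prior concentrates mass near zero yet still leaves non-negligible mass on large effects, which is why large coefficients are not shrunk as aggressively as under a normal prior of equal scale.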
2.7 Simulation and Practical Advice
In Section D of the Supplementary Material, we outline an extensive simulation study demonstrating the performance advantage of a hierarchical Bayesian case–control approach relative to competing strategies such as the King and Zeng model (2001), as well as a simple fixed-effects logistic regression. In the simulation study, we explicitly test the performance of our model under varying values for the following parameters: (a) sample size $(n)$; (b) population prevalence $(\pi)$; (c) discrepancy between sample and population prevalence $(\pi - \hat{\pi})$; and (d) spatial autocorrelation (as measured by Moran’s I). Two dimensions of our modeling framework remain untested: (i) the sensitivity of the model to poor prior information about $\pi$, the population prevalence assumed for the contamination layer, and (ii) the model’s ability to deal with non-probability samples resulting from exogenous selection effects (i.e., beyond the “selection on the dependent variable” type). In Section E of the Supplementary Material, we provide actionable advice for researchers and discuss how these untested dimensions may affect the robustness of the model, in light of the results from the simulation study and the robust modeling framework we have adopted.
3 Who Was More Likely to Join ISIS?
To illustrate our approach, we analyze a set of leaked border documents capturing recruitment to ISIS. This leak was widely covered in international news media and has been used to provide descriptive statistics on the geographical distribution and demographic characteristics of ISIS fighters from multiple MENA countries (Devarajan et al. 2016; Sterman and Rosenblatt 2018; Zelin 2018). For the case–control design, we combine individual-level ISIS recruitment data with a nationally representative sample of Muslim males from Wave III of the Arab Barometer (2014) survey. The fieldwork for the Arab Barometer surveys was completed before most recruits recorded in our border documents entered ISIS-held territory, and so the survey controls may be vulnerable to contamination.Footnote 3
Our choice of covariates to use from this survey is constrained by the information included in the border documents. We elect to include covariates for age, age squared, marital status, college education, and student status. We also combine two variables for unemployed and employment in agricultural or manual labor to create a composite variable designed to measure “low status” activity. An interaction between this variable and our college education variable is designed to capture relative deprivation, that is, whether highly educated individuals engaged in low status economic activity are more likely to become recruits. Full details of each covariate are listed in the Supplementary Material.
A first model—which we refer to as the “Bird’s Eye” approach—uses a multilevel regression model trained on the complete sample of $1,051$ recruits and $5,093$ unlabeled records. This first model provides a robust descriptive analysis of the individual-level factors characterizing recruits across countries and subnational units.
A second model—which we refer to as the “Worm’s Eye” approach—incorporates contextual information for Egypt ( $n_1 = 66, n_0 = 551$ complete records) and Tunisia ( $n_1 = 426$ , $n_0 = 589$ complete records) at the district level. We focus on these two countries due to the availability of contextual information at the district level that is not accessible for the other countries in our sample. The added value of this analysis lies in controlling for observable district-level heterogeneity in order to ascertain the robustness of any individual-level findings to contextual confounding. For both Egypt and Tunisia, we include variables to capture subnational differences in demographic and labor-market composition, employment opportunities, as well as more context-specific variables designed to capture support for Islamist political organizations and prehistories of contentious politics. Full details of all covariates are listed in the Supplementary Material.
For the main analyses, we present (1) the posterior density of fixed and random effects according to our models and (2) the posterior predictive distribution across potential recruitment profiles.Footnote 4
3.1 Fixed and Random Effects
Figure 1 presents the posterior density of the individual-level fixed effects in the Bird’s Eye model; Figure 2a and b presents the Worm’s Eye equivalent. These plots contain the main results of our models. Note that all the covariates, including dummies, are centered and scaled; hence, the coefficients are to be interpreted in terms of standard deviations from the mean of each covariate (Supplementary Figures G.1–G.3 are the individual-level posterior densities on the original, non-standardized scale). Since we are principally interested in the robust estimation of individual-level predictors, we display only the posterior density of individual fixed effects for all of our models.Footnote 5
The estimated intercepts for the three models are extremely low. For the Bird’s Eye model, the log odds are in the order of $-11$ . For the Egypt Worm’s Eye model, it is just over $-13$ ; in Tunisia, it is $-9$ . The size of the intercept is primarily driven by the size of the offset, which is in turn determined by the overall prevalence of recruitment. It is therefore not surprising that Egypt’s intercept is so dramatically low, given the close-to-zero prevalence of recruitment when compared to population size ( $\pi = \frac {4}{100,000}$ ) versus Tunisia where this prevalence is higher ( $\pi = \frac {2}{1,000}$ ). For the Bird’s Eye model, a different offset is provided for observations coming from different countries, to account for country-specific prevalence. The large and negative intercept underscores an important challenge in the explanation of why individuals join movements like ISIS: a linear combination of features capable of pushing an individual to become a recruit has to be extremely large, on the log-odds scale, to meaningfully affect the otherwise extremely low probability of recruitment.
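To give a sense of scale, the prevalence figures quoted above can be converted to log odds directly. The short sketch below does only that arithmetic; it does not reproduce the fitted intercepts, which come from the full model, and the variable names are illustrative.

```python
import math


def logit(p):
    """Log odds of a probability p in (0, 1)."""
    return math.log(p / (1.0 - p))


egypt_prevalence = 4 / 100_000
tunisia_prevalence = 2 / 1_000

egypt_log_odds = logit(egypt_prevalence)      # roughly -10.13
tunisia_log_odds = logit(tunisia_prevalence)  # roughly -6.21

print(f"Egypt:   logit(pi) = {egypt_log_odds:.2f}")
print(f"Tunisia: logit(pi) = {tunisia_log_odds:.2f}")
print(f"Gap:     {tunisia_log_odds - egypt_log_odds:.2f} logit points")
```

The gap of roughly four logit points between the two prevalence values matches the gap between the reported intercepts for Egypt and Tunisia.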
We focus primarily on testing the role of education and social status in an individual’s decision to join ISIS. An individual who has college education and low status is assumed to be relatively deprived. We compare predicted log odds, as opposed to predicted probabilities, as these are scarcely comparable due to the powerful effect of the intercept, which drags probabilities of most profiles close to zero (though see Supplementary Figures H.1–H.3 for predicted probabilities of recruitment relative to the “average” profile, and Supplementary Figures H.4–H.6 for expected counts under different relative-deprivation scenarios). The total logit effects on probability of recruitment for different relative-deprivation profiles are shown in Figure 3 for the Bird’s Eye model, and in Figure 4a and b for the Worm’s Eye.
Relative deprivation finds mixed support: at the Bird’s Eye level, we find that being high status plays a key role in increasing the propensity of being recruited, while having college education plays a more minor role. A similar pattern is evident in Tunisia, though the effect of being high status and having college education is starker, meaningfully increasing the propensity to join ISIS by around $3$ points on the log-odds scale compared to relatively deprived individuals. In Egypt, the effects are more consistent with relative deprivation; however, note the large prediction intervals around the total effects of relatively deprived individuals. There is also substantial overlap between the distributions in all plots. This is largely due to the uncertainty around the intercept, which plays a role in marginalizing these effects. Note further that varying prediction intervals on the effects reflect the highly unbalanced prevalence of the groups in our study. All in all, the evidence from these analyses suggests that high-status individuals were more likely to be recruited by ISIS, and that being high status and having a college education further increases the likelihood of recruitment. The large prediction intervals, which result from uncertainty around the intercept, underscore that much remains unknown about the underlying systematic determinants of recruitment.
To fit the ICAR model, we implemented the fully connected graph shown in Figure 5a. The spatially autocorrelated component dominates the governorate-level variance, as shown by the posterior of the mixing parameter $\lambda$ in Figure 5b, estimated via Monte Carlo mean at close to $0.9$, suggesting that around $90\%$ of the variance at the governorate level can be explained by the ICAR model. [Footnote 6]
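As a reading aid for the mixing parameter, one common way to write such a convolution is the BYM2-style form below. This is an assumption made for illustration: the symbols $\gamma_g$, $\sigma$, $\theta_g$, and $\phi_g$ are hypothetical labels, and the sketch presumes that the ICAR component is scaled to have approximately unit marginal variance.

$$
\gamma_g = \sigma\left(\sqrt{1-\lambda}\,\theta_g + \sqrt{\lambda}\,\phi_g\right), \qquad \theta_g \sim \mathcal{N}(0,1), \qquad \phi_g \sim \text{ICAR (scaled)}
$$

$$
\frac{\operatorname{Var}\left(\sigma\sqrt{\lambda}\,\phi_g\right)}{\operatorname{Var}\left(\gamma_g\right)} \approx \frac{\sigma^{2}\lambda}{\sigma^{2}(1-\lambda) + \sigma^{2}\lambda} = \lambda
$$

Under this assumed parametrization, a posterior mean of $\lambda$ close to $0.9$ corresponds to roughly $90\%$ of the area-level variance being attributed to the spatial component.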
We repeat these analyses for Egypt and Tunisia. Figure 6 shows similar mixing among spatial and non-spatial components for the two countries, with around $15\%$ of the district-level variance in Egypt being explained by spatial patterns, and $19\%$ in Tunisia. It is noteworthy that very few of our contextual variables have explanatory power for predicting recruitment. Coupled with the low percentage of variance being explained by the spatial components, our analysis suggests that, in spite of our best efforts to account for observable heterogeneity, there exists a vast array of unobserved, non-spatial district-level effects, which account for over $80\%$ of the unexplained district-level variance in both Egypt and Tunisia. Hence, this contextual variance, while properly accounted for, remains unexplained. In the Supplementary Material, we also describe Moran’s I statistics for the Worm’s Eye analysis as well as point estimates for the district and governorate effects in Egypt and Tunisia (Supplementary Figure I.1).
3.2 Predicted Propensity of Recruitment by Profile
To conclude our analysis, we present inferences derived from the posterior predictive distribution of the out-of-sample probability of recruitment, focusing on individual-level characteristics.
What is the profile of individuals “at risk” of recruitment to ISIS according to our models? We attempt to answer this question by analyzing the predicted probabilities of all possible theoretical profiles, defined by the individual-level characteristics available in our data. Every profile is assumed to come from a hypothetical “average district”. Figure 7 presents point estimates and prediction intervals for the log odds of recruitment, over $160$ possible profiles in the Bird’s Eye model. Similar plots displaying the absolute and relative probabilities of recruitment are available in Supplementary Figures I.14 and I.15. Table 1 presents the $10$ profiles most likely to be recruited, providing four useful metrics to interpret the results: predicted probability; predicted rate per $10,000$ people; predicted odds relative to the average profile; and log odds.
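The four metrics are deterministic transformations of one another once the log odds of a profile and of the average profile are known. A minimal sketch of these conversions follows; the function name and the input values are hypothetical and are not taken from our tables.

```python
import math


def profile_metrics(log_odds: float, avg_log_odds: float) -> dict:
    """Convert predicted log odds into the four reporting metrics.

    log_odds:     predicted log odds of recruitment for a profile
    avg_log_odds: predicted log odds for the reference ("average") profile
    """
    odds = math.exp(log_odds)
    prob = odds / (1.0 + odds)
    return {
        "probability": prob,
        "rate_per_10000": 10_000 * prob,
        # Odds of the profile divided by odds of the average profile.
        "relative_odds": math.exp(log_odds - avg_log_odds),
        "log_odds": log_odds,
    }


# Hypothetical inputs for illustration only.
example = profile_metrics(log_odds=-8.0, avg_log_odds=-11.0)
for metric, value in example.items():
    print(f"{metric}: {value:.6g}")
```

Because recruitment is so rare, odds and probabilities are nearly identical in this range, so relative odds can be read approximately as relative probabilities.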
A note of caution on the interpretation of these visuals: these are useful summaries of the data, but the uncertainty around the point estimates tends to be relatively large. Taking Figure 7 as an example, a qualitative interpretation of the uncertainty would be as follows: it cannot be categorically ruled out that the most likely profile is actually ranked only $30$th (out of $160$), though this would be very unlikely given the evidence implied by the data. In general, we note that profiles which are at high risk of recruitment are endowed with higher levels of certainty around their point estimates, suggesting that: (i) it is possible to distinguish high-risk profiles from low-risk profiles (at least in Tunisia and in the Bird’s Eye view) and (ii) it is easier to distinguish between different high-risk profiles than it is between low-risk profiles. For Egypt, although we do observe a reduction in uncertainty at high levels of risk, we cannot entirely distinguish between low-risk and high-risk profiles, as a significant degree of overlap between posterior distributions is maintained across profiles. This is likely a result of the relatively small sample of cases and the large effect of the unexplained intercept.
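One way to make this kind of rank uncertainty concrete is to compute, draw by draw, the rank of a given profile among all profiles. The sketch below does so on simulated draws; the profile names, means, and standard deviation are hypothetical stand-ins for posterior predictive draws and do not come from our fitted models.

```python
import random


def rank_distribution(draws, target):
    """Rank of `target` within each draw (1 = highest log odds).

    draws: list of dicts mapping profile name -> log odds, one dict per draw.
    """
    ranks = []
    for draw in draws:
        target_value = draw[target]
        # Rank is one plus the number of profiles with strictly higher log odds.
        ranks.append(1 + sum(1 for v in draw.values() if v > target_value))
    return ranks


def simulate_draws(means, sd, n_draws, seed=0):
    """Simulate independent normal draws per profile (illustrative assumption)."""
    rng = random.Random(seed)
    return [
        {name: rng.gauss(mu, sd) for name, mu in means.items()}
        for _ in range(n_draws)
    ]


# Hypothetical profile means on the log-odds scale.
means = {f"profile_{i}": -7.0 - 0.5 * i for i in range(10)}
draws = simulate_draws(means, sd=1.0, n_draws=2000)
ranks = rank_distribution(draws, "profile_0")
share_first = sum(r == 1 for r in ranks) / len(ranks)
print(f"share of draws in which profile_0 ranks first: {share_first:.2f}")
print(f"worst rank observed for profile_0: {max(ranks)}")
```

The spread of `ranks` summarizes how firmly a profile's position in the ordering is supported: a profile whose point estimate ranks first may still fall several places in a non-trivial share of draws.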
From the Bird’s Eye prediction intervals, we notice that the predicted propensity of recruitment is centered around $-15$ on the log-odds scale, again underscoring the rarity of becoming a recruit. A select number of profiles approach predicted log odds of around $-7$ and translate to meaningful rates of recruitment; these are highlighted in the predicted probabilities table, which shows the $10$ most recruitable profiles. Looking at Table 1, we can say that the most likely recruit profile (loosely characterized as a young, high-status, Sunni male with some college education who is unmarried and not currently studying) is around $23$ times as likely to be recruited as an average Sunni male from an average area in the MENA. For every $10,000$ members of the most recruitable profile across the region, we expect five to have joined ISIS. It is worthwhile to note that, consistent with Figure 3, all the most recruitable profiles are high-status individuals, and a majority of them have some college education. Unsurprisingly, all of these profiles are under 25 and not currently studying.
The Bird’s Eye profiles are comparable to the Worm’s Eye profiles for Tunisia (Figure 8 and Table 2), whereas the Egypt analysis points to stronger evidence for the relative deprivation hypothesis. In Egypt, a majority of the likely recruit profiles are relatively deprived (Figure 9 and Table 3). [Footnote 7] The relative recruitment likelihood of the most susceptible profiles in Egypt and Tunisia is also greater than in the Bird’s Eye analysis. In Egypt, the most likely recruit profile (loosely characterized as a young, low-status, Sunni male with some college education who is married and is currently studying) is around $157$ times as likely to be recruited as the average Egyptian Sunni male. The Egypt-specific recruitment propensity is dramatically lower than that of Tunisia, again highlighting the role of contextual effects. In Tunisia, the most likely recruit profile (loosely characterized as a young, high-status, Sunni male who has college education, is unmarried, and is not currently studying) has a predicted probability of recruitment of $0.04$. This profile is over $335$ times as likely as the average Tunisian Sunni male to be recruited, highlighting that though recruitment is still relatively rare in the population, the probability of recruitment is far greater in the top recruitment profiles. Figure I.19 shows that only a handful of profiles have predicted probabilities above $\frac{1}{100}$.
4 Conclusion
Extreme forms of political behavior are rarely ever committed by more than a tiny subsection of any given national population. Despite their small size, these groups often have an outsized influence on state and international politics. Because of their small size, extremists are particularly hard to study using conventional statistical methods and research designs.
To address this, we propose that extremism researchers take inspiration from epidemiology and recent applications of case–control methods in political science (Rosenfeld 2018). Here, we propose a new variant of the case–control design that allows us to combine survey techniques with ecological forms of analysis, allowing for meaningful comparisons with the underlying populations from which recruits are drawn. To implement this, we solve a number of statistical problems when explaining rare and extreme forms of political behavior. In particular, we demonstrate (1) how best to incorporate area-level random effects when the number of recruits for a given unit is small, (2) how to account for spatial autocorrelation in this setup, and (3) how to regularize coefficients to guard against separation. Simulations demonstrate the performance advantage of this new approach over alternatives.
While our analysis focuses on recruitment to ISIS, our hope is that this paper inspires social scientists to apply case–control methods to other instances of extremism where data on recruits and population surveys are available. Examples include participation in the 2021 attack on the Capitol Building in Washington, DC (Pape 2021), recruitment to far-right movements and white supremacist groups (Klandermans and Nonna 2006; Simi et al. 2017), as well as other examples of violent extremism (della Porta 2013). It is in this spirit that we provide the extremeR software package so that extremism researchers working on a range of different cases can easily apply our models (see http://extremeR.info).
Acknowledgments
We received helpful feedback and advice from Sir David Cox, Thomas Hegghammer, Bjørn Høyland, Bent Nielsen, Jacob Aasland Ravndal, and Frank Windmeijer. Hertog, Neumann, and Maher (2021) reached out to us as we were finalizing our manuscript. Their analysis also uses leaked ISIS recruitment data to analyze the socioeconomic correlates of joining the movement. To implement the methods described in this paper, see the associated R package, as well as documentation and vignettes: http://extremeR.info.
Author Contributions
N.K. and C.B. conceived of the study. R.C. developed the models. R.C., C.B., and N.K. contributed to the analysis. R.C., C.B., and N.K. developed the R package and the documentation. C.B., N.K., and A.Y.Z. contributed to the data collection. C.B., N.K., and A.Y.Z. developed the literature review. All authors contributed to the writing.
Data Availability Statement
All data and code required to replicate the results and simulations described in the main article and the Supplementary Material can be found at https://doi.org/10.7910/DVN/HYOQCD (Cerina et al. 2023).
Supplementary Material
For supplementary material accompanying this paper, please visit https://doi.org/10.1017/pan.2023.35.