Social media summary: When should cultural distance between groups correlate – or not correlate – with parochial altruism and warfare?
1. Introduction
Wright (1949) introduced FST as a measure of genetic population structure to assess how genotype frequencies for each subpopulation differ from expectations assuming random mating. Also called the inbreeding coefficient, FST responds to the relative influence of selection, migration, mutation and drift operating between and within groups (Holsinger & Weir, 2009). Like the well-known R² or the ICC calculation in generalised linear modelling, the FST index is a ratio of between-group to total variance, and measures the extent to which group structure ‘explains’ variation across a population on some discrete or continuous trait or set of traits. The popularity of such variance ratios is due in part to their ready interpretation – values near 0 indicate that traits within any specific group are about as variable as within the population as a whole, while values near 1 indicate that almost all variance exists between (mostly homogeneous) groups (Fig. 1).
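As a minimal sketch of this variance-ratio interpretation, the following Python function computes the share of the total variance in a single trait that lies between groups. The function name `variance_ratio` and the toy data are illustrative assumptions, and the calculation omits the sample-size corrections that applied FST estimators typically use.

```python
from statistics import fmean

def variance_ratio(groups):
    """Between-group variance divided by total variance for one trait.

    `groups` is a list of lists of trait values, one inner list per group.
    """
    values = [v for g in groups for v in g]
    grand_mean = fmean(values)
    # Total (population) variance across all individuals.
    total = fmean([(v - grand_mean) ** 2 for v in values])
    if total == 0:
        raise ValueError("the trait does not vary in this population")
    # Variance of group means, weighted by group size.
    between = sum(len(g) * (fmean(g) - grand_mean) ** 2 for g in groups) / len(values)
    return between / total

# Two internally homogeneous groups: all variance lies between groups.
print(variance_ratio([[0, 0, 0, 0], [1, 1, 1, 1]]))  # 1.0
# Two groups as variable as the whole population: no between-group variance.
print(variance_ratio([[0, 1, 0, 1], [1, 0, 1, 0]]))  # 0.0
```

The two toy populations correspond to the extremes described above: a ratio near 1 for homogeneous groups and a ratio near 0 when each group is as variable as the whole.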
Variance measures can similarly provide insights into cultural processes in group-structured populations. While the modes of cultural inheritance are more varied than in genetic evolution, cultural FST, or CFST (Bell et al., 2009), indicates the relative amount of segregation or self-assortment taking place on cultural traits, and can measure the between-group cultural distance caused by cultural selection, migration, social learning and other forces (Boyd & Richerson, 1985; Cavalli-Sforza & Feldman, 1981). The uses of cultural FST have greatly multiplied in recent years, with study systems ranging from chimpanzee tool use (Boesch et al., 2020) to musical diversity (Rzeszutek et al., 2012), folk tales (Ross et al., 2013), the evolution of cooperation (Handley & Mathew, 2020; Smith et al., 2018) and cultural distances between religions (White et al., 2021) and nations (Muthukrishna et al., 2018). By answering what fraction of the total variance is found between groups, cultural measures of FST mirror its broader use in genetics in understanding both the origins and implications of population structure.
In practice, CFST is thought to positively predict the cooperativeness of groups. Smith et al. (2018) write that ‘if FST is large enough, then individually deleterious but group-beneficial traits can evolve’ and Zefferman and Mathew (2015) argue that ‘a high cultural FST promotes cultural predisposition for warfare because, as cultural norms and institutions are concentrated in specific groups, they will spread disproportionately as these groups win resources’.
In light of the recent popularity of CFST, it is essential to strengthen the connection between CFST and specific causal models of social evolution. The Law of Total Variance does not by itself tell us whether a CFST estimate of 0.01, 0.1 or 0.6 is ‘large’ or ‘small’, nor what it means to compare CFST values calculated on different traits or between two populations. Currently, the main justification for the use of FST is its importance in mathematical models of cooperation, such as the Prisoner's Dilemma and the Public Goods Game, in which altruistic individuals pay some fixed cost to produce a fixed benefit within their group. In such additive models of the evolution of altruism, FST becomes a concise measure of the relative scope for between-group and within-group selection (Hamilton, 1975), and serves as a quantitative threshold for the spread of altruistic behaviour, e.g. eq. (5) in Bowles (2006) and eq. (1) in both Bell et al. (2009) and Richerson et al. (2016) (see Supporting Information, SI, Section 1).
Yet, many features of social life are beyond the ability of such additive models of cooperation to articulate (Skyrms, 2004). In particular, the consequences of choosing a cooperative or non-cooperative behaviour are often not fixed, but rather depend on the current prevalence of behaviours within one's group; the interactions have frequency-dependent or synergistic payoffs (Grafen, 1979; Queller, 1985) and so lack dominant strategies. Many culturally transmitted norms have this property, and in such systems, the interests of the individual and the group are not necessarily in opposition. For example, groups residing on opposite sides of an ecological or ethnic frontier often have institutions and cultural norms that are highly differentiated (McElreath et al., 2003), and so will have high pairwise CFST. If social interactions in this meta-population resemble cooperative dilemmas, it is reasonable to expect the emergence of parochial altruism with culturally similar neighbours (Handley & Mathew, 2020) and even organised raiding and warfare against outgroups (Turchin, 2009; Zefferman & Mathew, 2015). Yet if social interactions were more accurately characterised by economic exchange, one might instead expect that potential access to non-local resources (Pisor & Gurven, 2016) or possible risk-buffering against local shocks (Liu & Mostafavi, 2023) would select for norms of generosity towards the out-group and greater cross-cultural competence (Bunce, 2020). Depending on the causal model of the social interaction, i.e. the ‘rules of the game’, the CFST statistic may plausibly hold very different meanings. As with additive altruism, though, confirming such a hypothesis requires the careful reasoning provided by formal mathematical theory.
We here seek to develop the analytical connection between FST, both cultural and genetic, and the evolution of group-beneficial traits (GBTs) across a variety of frequency-dependent social interactions. Our approach proceeds in three steps. First, we review the properties of various coordination and anti-coordination games to build a general frequency-dependent model of social interaction, from which games like Stag Hunt and Hawk–Dove can be viewed as specific cases. Our synthetic model of linear synergy builds on recent work by Allen and Nowak (2015) and Van Cleve (2017), and can describe a continuous spectrum of non-additive games via a single parameter, θ. Second, following the classic derivation of the evolution of altruism using the FST variance ratio (Hamilton, 1975), we analyse our general frequency-dependent system to identify the conditions under which group-beneficial outcomes can evolve, focusing in particular on the role of within- and between-group variance. This extends classic results on FST and assortment in mathematical biology (Gardner et al., 2011; Queller, 1985) into two regions of the synergistic spectrum not previously considered: simple coordination and complementarity. Third, based on these findings, we re-evaluate the existing body of empirical work on cultural FST in light of some testable predictions from the model. By doing so we hope to pair the growing programme of quantifying cultural variation with a suite of models of social interaction framed in the language of evolutionary game theory.
2. The spectrum of social games
The field of evolutionary game theory has contributed substantially to our understanding of human and non-human societies over the last half-century (Gintis, 2000). In this approach, individual agents within a population are treated as expressing behavioural strategies which change in frequency through an evolutionary process, either the survival and reproduction of genetic alleles (Maynard Smith, 1982) or the social transmission of behaviours from demonstrators to learners (Smaldino, 2023). Fitness expressions that define the payoffs of specific strategies, both at the individual level and at the group level, are passed through an evolutionary replicator model. Behavioural options can themselves be treated as dichotomous choices (e.g. ‘cooperate’ or ‘defect’) or quantitative measures that fall on some continuum, e.g. allocations of resources between self and other. Interactions are commonly described as either pairwise, as a group of players forms dyads who each play the same two-player game together, or as a single N-person game in which all players contribute to shared payoffs. In all cases, a substantial amount of information about a social system can be encapsulated by asking whether a particular equilibrium state can be disrupted by intermittent or persistent shocks, and how different mechanisms of assortment and social structure affect the diffusion of cooperative traits, e.g. through reciprocity (Lehmann et al., 2022), punishment (Marlowe et al., 2008) or positive assortment via metapopulation segmentation (Taylor & Nowak, 2007). We here focus on games involving this latter mechanism, reviewing first the standard model of additive cooperation and then the different kinds of synergistic interactions which have been studied by game theorists.
2.1. Additive cooperation
In evolutionary biology, altruism refers to any behaviour that comes at a personal cost to the actor while benefiting others in a population. Much theoretical work on social evolution over the last half-century has focused on the conditions for the emergence of altruism, and has led to fruitful discoveries such as Hamilton's concept of inclusive fitness (Hamilton, 1964) and the gene's-eye perspective. Altruism is also important to the study of population genetics, because it is perhaps the simplest possible representation of a conflict between group and individual interests, and so provides an extreme test of the properties of specific population structures (Rogers, 1990). In both of these contexts, FST plays a central role.
In an additive model of altruism in group-structured populations, we imagine that the ith individual in the jth group has a phenotype xij between 0 and 1, representing their propensity towards cooperation. If xij = 1, then this individual pays a fitness cost c to create a collective benefit b for all members of their group (including themselves). Individuals in the group who are not altruistic (xij = 0) pay no such cost, but experience the group benefits from other altruists. Formally, we can represent the fitness payoff for an individual as
$$w_{ij} = \tilde{w} + b x_j - c x_{ij} \qquad (1)$$
where $\tilde{w}$ is a baseline fitness, c and b are the cost and benefits of altruism, and xj is the mean frequency of altruism within group j. Note that Eq. (1) does not require that interactions be dyadic, and applies to both pairwise Prisoner's Dilemma-type interactions and N-player Public Goods Games (SI Section 1).
Extending the original concept of relatedness by shared ancestry in his famous rule, Hamilton (1975) found that the strength of selection for altruistic behaviours in such a system is directly proportional to the fraction of behavioural variance that exists between groups, such that altruism can evolve under the condition
$$F_{ST} > \frac{c}{b} \qquad (2)$$
This result is a general feature of linear, additive interactions in structured populations, because the FST statistic serves as a complete summary of the extent to which altruists positively assort with one another (Taylor & Nowak, 2007), as genetic relatedness does in systems of interacting kin. This selection threshold is empirically useful, because it suggests that observed FST values in additive interactions will be strongly correlated with the prevalence of cooperative behaviour. Consistent with this, Handley and Mathew (2020) have shown that Kenyan pastoralists are more willing to cooperate with hypothetical strangers who are more culturally similar to them, as measured by CFST between the respective ethnic groups, and Smith et al. (2018) find high CFST across Hadza camps in contributions to multi-person Public Goods Game experiments. As many mechanisms of human cultural transmission can maintain relatively high between-group variation (Boyd & Richerson, 1985), and observed FST ratios for human groups are generally much larger for cultural than genetic traits (Bell et al., 2009), the high degree of parochial altruism observed in human societies is plausibly understood via the transmission of cooperative norms structured by human cultural groups (Richerson et al., 2016; Zefferman & Mathew, 2015).
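The threshold logic can be checked numerically. The sketch below assumes the additive payoff form described for Eq. (1), that is, a hypothetical baseline plus b times the group frequency minus c times the individual phenotype, and uses the covariance between fitness and phenotype as the measure of selection on altruism. The function names and the toy population are illustrative assumptions, not part of the original analysis.

```python
from statistics import fmean

def group_stats(groups):
    """Return (total variance, between-group variance) of the phenotype."""
    values = [x for g in groups for x in g]
    grand_mean = fmean(values)
    total = fmean([(x - grand_mean) ** 2 for x in values])
    between = sum(len(g) * (fmean(g) - grand_mean) ** 2 for g in groups) / len(values)
    return total, between

def selection_on_altruism(groups, b, c, baseline=1.0):
    """Covariance between individual fitness and phenotype under additive payoffs."""
    xs, ws = [], []
    for g in groups:
        xj = fmean(g)  # frequency of altruism in this group
        for xij in g:
            xs.append(xij)
            ws.append(baseline + b * xj - c * xij)  # assumed additive payoff
    mean_x, mean_w = fmean(xs), fmean(ws)
    return fmean([(w - mean_w) * (x - mean_x) for w, x in zip(ws, xs)])

groups = [[1, 1, 1, 0], [0, 0, 0, 1]]  # toy population of two groups
total, between = group_stats(groups)
fst = between / total  # 0.25 for this population

for b, c in [(3.0, 1.0), (5.0, 1.0)]:
    favoured = selection_on_altruism(groups, b, c) > 0
    print(f"b={b}, c={c}: FST > c/b is {fst > c / b}, altruism favoured is {favoured}")
```

In this sketch the covariance equals the total variance multiplied by (b·FST − c), so its sign changes exactly where FST crosses c/b.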
2.2. Four categories of non-additive games
The use of CFST as a quantitative measure of cultural diversity is motivated by its prominence in evolutionary models of altruism, so its significance depends on the extent that real-world social interactions resemble the underlying assumptions of Eq. (1). Yet this model has been criticised for being unrepresentative of most social interactions (Alvard & Nolin, 2002; Skyrms, 2004). While theories of altruism posit a fundamental opposition between the interests of a group and of individuals within the group, many real-world institutions function to align individual and group incentives by adjusting reward structures, e.g. through punishment (Marlowe et al., 2008; Molleman et al., 2019) or reciprocity (Lehmann et al., 2022; Panchanathan & Boyd, 2004).
Further, models of cooperation such as the Prisoner's Dilemma usually assume an additive payoff structure, such that the marginal cost to an individual of switching their behaviour to altruism is the same whether one is in a group entirely of altruists or in a group entirely of defectors, or any mixture of the two. In many naturalistic contexts, though, a social behaviour's consequences are a function of how common it is within the population. Cutting a queue may bring angry, immediate sanctioning in a group where it is rare, but, where common, be a self-reinforcing way to organise turn-taking, or even a social norm. Conversely, an unusual strategy in a competition, or novel product in a marketplace, may derive its success primarily from its rarity relative to commonplace alternatives. As a result, the best response in each situation is always dependent on the behaviours of others, and so there is no dominant strategy. Many kinds of social interaction are non-additive, and linguistic variation, music, sartorial traits and other domains of culture are better described by other game-theoretic models, e.g. Stag Hunt, Snowdrift, Chicken or Hawk–Dove (Camerer, 2003; Gintis, 2000; Maynard Smith, 1982; Skyrms, 2004; Smaldino, 2023). These games can all be characterised by the presence of frequency-dependent or synergistic payoffs, and can be divided broadly into four categories.
2.2.1. Simple coordination
Individuals benefit from coordinating on the same behaviour in many kinds of social interaction. The success of a rowing team in a race, a troupe of dancers, the flow of a traffic system or the functioning of a code of laws often depends on participants all doing exactly the same thing, possibly in the presence of a coordinating authority. In coordination games, the worst outcomes are generally experienced by mixed groups in which different participants act with incompatible behaviours, i.e. groups with high behavioural diversity. The origins of norm psychology (House et al., 2020), economic agglomeration (Krugman, 1991), by-product mutualisms (Hauert et al., 2006) and positive network externalities (Katz & Shapiro, 1985; Liebowitz & Margolis, 1994; Schelling, 1973) are fundamentally rooted in coordination.
In such interactions, a marginal increase in a norm, behaviour or strategy (generically, a ‘trait’) within a group will increase the payoffs of individuals using that trait, and reduce the payoffs of those not using that trait. To distinguish this class of interactions from coordination dilemmas (described below), we refer to these as simple coordination games, although they have also been called ‘correlative coordination’ (Smaldino, 2023), ‘relaxed’ social dilemmas (Allen & Nowak, 2015) or just ‘coordination’ (Cooney, 2022).
In many cases of simple coordination, the choice between alternatives is functionally arbitrary, such as the decision to drive on the left or right side of the road, or to adopt purely symbolic markers of group identity (McElreath et al., 2003). These can be described by the Pure Coordination game, represented by the 2 × 2 payoff matrix:
Here, the payoffs are given for the ‘row-player’, who receives this amount given their trait and the trait of their partner, the ‘column-player’ (whose payoffs are symmetrical). Although the usual framing involves two players making a simultaneous choice following this payoff matrix, Pure Coordination-type interactions can just as easily apply to large groups or entire societies (SI Section 2.3). Regardless of the framing scenario, the essential detail in Pure Coordination is that all players receive the same high payoff if they can successfully coordinate on one of the two equivalent alternatives.
In other kinds of coordination, the two options instead have a clear difference in performance which has consequences for between-group competition. A common example is differing norms about acceptable marriage partners, which may impact the size and cohesion of political alliances (Schulz et al., 2018). Following Boyd and Richerson (2002), we refer to these better-performing traits as group-beneficial traits (GBTs), because the group receives a higher payoff at one pure-strategy equilibrium vs. the other. Group-beneficial traits exist in most synergistic interactions (with exceptions such as Pure Coordination), but the GBT pure-strategy equilibrium is not always the group-optimum trait distribution. The existence of GBTs is important in a structured metapopulation, as groups that coordinate on different equilibria can compete with one another through equilibrium selection (Bowles, 2006; Richerson et al., 2016).
2.2.2. Coordination dilemmas
In contrast to simple coordination, a coordination dilemma exists when all individuals experience a coordination dynamic but, regardless of their own behaviour, benefit from the increased prevalence of the GBT. This could be because one of the two options produces some kind of public good which all individuals in a group benefit from (Boyd & Richerson, 2002), or, alternatively, the other trait produces some kind of ‘public bad’ which is costly to all individuals. Within game theory, the most famous coordination dilemma is the Stag Hunt, initially described by Jean-Jacques Rousseau (Skyrms, 2004), usually defined by a 2 × 2 payoff matrix such as
As Rousseau put it, hunters are better off coordinating to hunt a stag to earn the highest payoff, but may be tempted to hunt hare instead, abandoning their stag-hunting partners who earn nothing as a result. In this classic formulation, a Hare player receives the same mediocre payoff regardless of their partner's behaviour, so technically the Stag Hunt represents a boundary between simple coordination and a coordination dilemma. As with cooperative dilemmas, much research has focused on the circumstances under which group-beneficial traits spread within coordination dilemmas. Although individuals are incentivised to choose the GBT, risk-averse players may require assurance that their partner will do so as well, and for this reason coordination dilemmas are also called ‘assurance games’ (Sen, 1967). Coordination dilemmas as a group are also often referred to generically as ‘stag hunts’ (Cooney, 2022; Taylor & Nowak, 2007; Van Cleve, 2017), although different games have different characteristics with respect to equilibrium selection (Boyd & Richerson, 2002). Here, we reserve the term ‘Stag Hunt’ to refer exclusively to the algebraic form of the specific game structure above (SI Section 3), rather than coordination dilemmas as a category.
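The need for assurance can be made concrete with a small calculation. The sketch below uses hypothetical Stag Hunt payoffs (4 for a joint stag hunt, 0 for a stag hunter whose partner defects, 3 for hunting hare regardless), chosen only to respect the ordering described above, and finds the frequency of Stag players at which both strategies earn the same expected payoff. The function name and payoff values are illustrative assumptions.

```python
def interior_equilibrium(payoffs):
    """Frequency of strategy A at which A and B earn equal expected payoffs.

    payoffs = ((a, b), (c, d)) lists the row player's payoffs, with rows and
    columns ordered (A, B). Returns None when no such frequency lies strictly
    between 0 and 1.
    """
    (a, b), (c, d) = payoffs
    synergy = a - b - c + d
    if synergy == 0:
        return None  # additive payoffs: the payoff gap does not depend on frequency
    k = (d - b) / synergy
    return k if 0 < k < 1 else None

# Hypothetical values: rows and columns are ordered (Stag, Hare).
stag_hunt = ((4, 0), (3, 3))
print(interior_equilibrium(stag_hunt))  # 0.75
```

With these illustrative numbers, hunting stag only pays once more than three-quarters of potential partners also hunt stag, even though an all-Stag group does better than an all-Hare group.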
2.2.3. Anti-coordination dilemmas
Another important class of games involves a negative frequency dependence between strategies. As in coordination dilemmas, an anti-coordination dilemma requires that all individuals benefit from the increased prevalence of the GBT. However, unlike coordination dilemmas, each trait can realise higher payoffs within a group composed of the other trait, so individuals are always incentivised to play the rare strategy. Since neither strategy is stable against invasions by the other, evolutionary systems involving anti-coordination dilemmas tend to approach a stable mixture of strategies, but this mixed equilibrium always realises lower benefits than those of the group-optimum configuration of strategies.
Anti-coordination dilemmas are often described in terms of negative externalities or congestion games (Peña & Nöldeke, 2023), and a popular anti-coordination dilemma is Snowdrift (Doebeli & Hauert, 2005). Here, two individuals are trying to accomplish a group project that produces a shared benefit (e.g. a village well, a road clear of snow or a co-authored manuscript) but each has the temptation to shirk their part in the labour. This game is described algebraically by benefit b and cost 0 < c < b, such that the row-player's payoff table is
$$\begin{array}{c|cc} & \text{Work} & \text{Shirk} \\ \hline \text{Work} & b - c/2 & b - c \\ \text{Shirk} & b & 0 \end{array}$$
As is generally the case in anti-coordination dilemmas, payoff-maximising individuals would prefer to live in a group in which all other players are Workers, but can realise a higher payoff by individually defecting to ‘Shirk’. However, this negative frequency dependence works in both directions: a group of Shirkers pays a very large cost (producing no group project), so the cooperative Work strategy is also able to invade.
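Another important example of an anti-coordination dilemma is Maynard Smith's (1982) Hawk–Dove game, defined by payoff matrix
$$\begin{array}{c|cc} & \text{Hawk} & \text{Dove} \\ \hline \text{Hawk} & (V - C)/2 & V \\ \text{Dove} & 0 & V/2 \end{array}$$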
for resource V and cost of fighting C > V. If both agents employ the Dove strategy, each has an equal chance of getting the resource. If one plays Hawk and the other Dove, the Hawk gets all the resources without a fight, and the Dove nothing. If both play Hawk, though, a fight begins in which one gains the resource at a large cost to the other, again with equal chance to each participant. Overt conflict is the worst outcome, both for groups and for individuals, and as a result each strategy can invade the other when it is rare.
Although derived independently in very different contexts, Hawk–Dove and Snowdrift have essentially identical evolutionary dynamics. Anti-coordination dilemmas as a whole are often referred to in this literature as ‘hawk–doves’ (Cooney, 2022; Taylor & Nowak, 2007; Van Cleve, 2017) or ‘snowdrifts’ (Allen & Nowak, 2015). As above, though, we reserve the terms Snowdrift and Hawk–Dove to refer specifically to the original payoff matrices defined above, distinguishing them from other games in the larger, heterogeneous class of anti-coordination interactions.
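The shared dynamics can be illustrated with a simple simulation. The sketch below assumes the standard textbook payoff forms for Hawk–Dove and Snowdrift, hypothetical parameter values, and a basic Euler-stepped replicator equation within a single well-mixed group. None of these choices is taken from the original analysis.

```python
def payoff_gap(x, payoffs):
    """Expected payoff of strategy A minus that of B when A has frequency x."""
    (a, b), (c, d) = payoffs
    return (a * x + b * (1 - x)) - (c * x + d * (1 - x))

def replicator(x0, payoffs, dt=0.01, steps=20000):
    """Euler steps of dx/dt = x (1 - x) * (payoff of A - payoff of B)."""
    x = x0
    for _ in range(steps):
        x += dt * x * (1 - x) * payoff_gap(x, payoffs)
    return x

# Hawk-Dove with hypothetical V = 2, C = 4; rows and columns are (Hawk, Dove).
V, C = 2.0, 4.0
hawk_dove = (((V - C) / 2, V), (0.0, V / 2))

# Snowdrift with hypothetical b = 4, c = 2; rows and columns are (Work, Shirk).
b, c = 4.0, 2.0
snowdrift = ((b - c / 2, b - c), (b, 0.0))

# Starting from a rare or a common first strategy, each game settles on the
# same interior mixture: V / C for Hawk, and (b - c) / (b - c / 2) for Work.
for start in (0.05, 0.95):
    print(start, replicator(start, hawk_dove), replicator(start, snowdrift))
```

In both games the rare strategy increases from either starting point, so the population converges on a mixed state rather than on either pure strategy.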
2.2.4. Complementarity
Not all forms of anti-coordination are antagonistic. In Lamaleran whale hunting, success depends on the coordinated actions of not only a harpooner, a bailer and a helmsman, but also a sailmaker, a carpenter and a blacksmith (Alvard & Nolin, 2002). Likewise, the specialised roles within an ant colony, an orchestra, a sports team or a sailing crew, or the production and flow of goods and services within a marketplace, depend on behavioural diversity. Because each behaviour experiences a higher payoff when rare, this is essentially an anti-coordination dynamic, but unlike anti-coordination dilemmas above, neither trait benefits from a marginal increase in itself within the group; an increase in the abundance of each trait is always beneficial to individuals choosing the alternative. For this reason, we refer to this category of interaction as complementarity.
Following Adam Smith's famous metaphor, we define the Invisible Hand game as the complementary interaction in which the group-optimal frequency of behaviours is also the mixed equilibrium itself (see SI Section 3.3). An example payoff matrix for this game might be
As a kind of mirror-image of simple coordination, players here can realise the highest payoffs by individually specialising in one or the other trait to generate a synergistic payoff. In economics, complementarity is most famously associated with the principles of ‘gains from trade’ and Ricardian comparative advantage, and in sociology and evolutionary biology, with divisions of labour (Cooper et al., 2021). Confusingly, the phrase ‘strategic complements’ is also used in game theory to describe coordination games, in the sense that two players both using the same trait may generate positive synergies when interacting. Here, we use the concept of complementarity strictly to refer to different behaviours, traits or strategies ‘complementing’ each other.
Some kinds of complementarity may give the same outcome to all participants regardless of who does what, and as with coordination, which behaviour is chosen by which participant may be totally arbitrary. Depending on their role in the interaction, however, each participant may earn different payoffs, which allows complementarity to serve as a model for studying the origins of inequality and unfairness (O'Connor, 2019).
3. A model of linear synergy with mixed equilibria
Although the categories of interaction described above have different characteristics, and the games within those categories are derived from different scenarios, they are all connected by synergistic or frequency-dependent payoffs and (with notable exceptions like the Pure Coordination game) by the presence of GBTs. Unlike additive cooperation, there is no dominant strategy in any such games, as each player's best option always depends on what their partners do. As a result, it is valuable to consider all these interactions simultaneously in the context of a general model. To abstract away from specific framing scenarios such as ‘cooperation’, ‘defection’, ‘hunt stag’ and ‘hunt hare’, etc., we instead define two generic behavioural strategies, A and B, and interpret each phenotype xij as a propensity toward trait A, taking any real value between 0 and 1, inclusive (which strategy to track is arbitrary). As before, we define a structured population with group-average phenotype xj for each group j and individual phenotype xij for each individual i in group j.
To derive a tractable model that incorporates the above phenomena, we make two assumptions. First, as in the metapopulation model in Eq. (1), we assume that payoffs to all individuals are linear with respect to the group frequency of A, xj. Define parameter m as the marginal effect of a within-group increase in trait A on the fitness of a focal individual with A, and n as the marginal effect of such an increase for a focal individual with trait B. For individuals with mixed strategies the marginal effect of an increase in A is xijm + (1 − xij)n, the weighted average of m and n. We assume that m and n are both constant over possible distributions of behaviour within a group, but make no assumption about their signs or relative magnitudes. The assumption of linearity is critical, as non-linear fitness functions require higher-order moments to evaluate the covariance between individual and group phenotype, and FST will no longer be sufficient to describe group structure (Schonmann & Boyd, 2016). One limitation of our approach, however, is that by treating group fitness as a simple average over individual (linear) fitness, it ignores the concept of viscosity or local competition (Hamilton, 1964).
Second, in order to incorporate the concept of frequency dependence, we assume non-additivity or synergy with respect to individual frequency of behaviour, such that a marginal change in xij has a different impact on individual payoff as a function of group frequency xj. This is distinct from the concept of linearity, by which we mean that a change in xj has a constant marginal effect on those in the group, regardless of the group frequency xj. Thus, a model can be both linear and non-additive. Synergistic effects imply that an individual increasing their use of behaviour A can sometimes decrease, and sometimes increase their personal fitness depending on how common A is in the group, so there must be some group frequency k at which this individual effect is zero. Together, these assumptions produce the fitness expression
$$w_{ij} = \tilde{w} + n x_j + (m - n)(x_j - k)\, x_{ij} \qquad (3)$$
for real numbers m, n and $\tilde{w}$ (see Table 2 for a complete list of symbols). This fitness expression describes pairwise interactions whose payoffs depend on the group frequency of traits, but also N-player interactions in which users of each strategy experience constant per capita returns to scale (Peña et al., 2015; detailed in SI Section 2.3).
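The two assumptions can be verified against a candidate fitness function. The sketch below assumes one functional form consistent with the properties stated above: a baseline, a term n·xj, and a synergy term (m − n)(xj − k)·xij. The function name, the default baseline and the parameter values are illustrative assumptions.

```python
def fitness(x_ij, x_j, m, n, k, w_tilde=1.0):
    """Linear-synergy fitness of an individual with phenotype x_ij in a group
    where trait A has frequency x_j (assumed functional form)."""
    return w_tilde + n * x_j + (m - n) * (x_j - k) * x_ij

m, n, k = 2.0, -1.0, 0.4  # hypothetical simple-coordination parameters
low, high = 0.3, 0.5      # two group frequencies of A

# Marginal effect of a group-level increase in A on a pure A individual is m.
effect_on_A = (fitness(1.0, high, m, n, k) - fitness(1.0, low, m, n, k)) / (high - low)

# Marginal effect of the same increase on a pure B individual is n.
effect_on_B = (fitness(0.0, high, m, n, k) - fitness(0.0, low, m, n, k)) / (high - low)

# At group frequency k, fitness does not depend on the individual phenotype.
gap_at_k = fitness(1.0, k, m, n, k) - fitness(0.0, k, m, n, k)

print(effect_on_A, effect_on_B, gap_at_k)
```

Because the expression is linear in xj, the finite differences recover m and n exactly (up to floating-point error), whichever pair of group frequencies is chosen.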
Like additive altruism in Eq. (1), groups of individuals playing synergistic games experience two pure-strategy equilibria, but each of the games described above is distinguished by the presence of a third, mixed equilibrium k at which different strategies co-exist with the same payoffs within a group. Depending on the interaction structure, this mixed-strategy equilibrium may be stable or unstable. In coordination games, k separates the two basins of attraction for each pure-strategy equilibrium. Harsanyi and Selten (1988) define the risk-dominant equilibrium as having the larger basin, which is more likely to be reached by stochastic evolutionary dynamics (Kandori et al., 1993; Young, 1993). In anti-coordination games, in contrast, k represents the stable equilibrium whose basin of attraction covers xj ∈ (0, 1), and in complementarity games, groups at k have higher payoffs than at either of the two pure-strategy equilibria (SI Section 3.3).
In general, we do not require k to be between 0 and 1, and synergistic games can exist without a third, mixed equilibrium. We can also express Eq. (3) as
$$w_{ij} = ( {\tilde{w}-nk} ) + bx_j-cx_{ij} + dx_{ij}x_j \qquad (4)$$
which extends Eq. (1) with a synergistic coefficient d ∈ R. In the above notation, m = b + d while n = b, and all members of a group experience the same fitness payoff at group frequency k = c/d. Assuming further that b > 0, c > 0, and c − b < d < c, this formulation describes the ‘Prisoner's Dilemma with synergy’ (Ohtsuki, Reference Ohtsuki2012; Van Cleve, Reference Van Cleve2017).
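The correspondence between the two parameterisations can be sketched as below. This assumes the (m, n, k) form $\tilde{w} + (x_j - k)[mx_{ij} + n(1 - x_{ij})]$ and the (b, c, d) form with a constant baseline, which agree when b = n, d = m − n, c = kd and the baseline equals $\tilde{w} - nk$. All function names are hypothetical.

```python
def to_bcd(m, n, k):
    """Convert (m, n, k) to benefit b, cost c and synergy coefficient d."""
    b = n
    d = m - n
    c = k * d
    return b, c, d


def to_mnk(b, c, d):
    """Convert (b, c, d) back to (m, n, k); k = c/d needs non-zero synergy."""
    if d == 0:
        raise ValueError("k = c/d is undefined without synergy (d = 0)")
    return b + d, b, c / d


def payoff_mnk(x_ij, x_j, m, n, k, w_tilde=1.0):
    """Payoff in the (m, n, k) parameterisation (assumed form)."""
    return w_tilde + (x_j - k) * (m * x_ij + n * (1.0 - x_ij))


def payoff_bcd(x_ij, x_j, b, c, d, baseline):
    """Payoff in the (b, c, d) parameterisation with a constant baseline."""
    return baseline + b * x_j - c * x_ij + d * x_ij * x_j
```

A quick usage check is to convert a parameter set in one direction and confirm that both payoff functions return the same value for every combination of x_ij and x_j, with `baseline = w_tilde - n * k`.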
We do not use this parameterisation in this analysis, because we want to remain as agnostic as possible about the causal mechanics of traits A and B in order to more easily interpret different synergistic dynamics that might resemble Eq. (3). Our model covers games in which A is GBT, in the sense that a group of A will have a higher average fitness than a group of B (−nk < m(1 − k)), but also games in which B is GBT (−nk > m(1 − k)) and interactions like Pure Coordination, in which neither trait is GBT (−nk = m(1 − k)). The specific details of different interactions may indeed resemble a Prisoner's Dilemma, such as the choice between conserving or over-harvesting a local marine resource, which benefits one individual at the expense of others. However, there are many synergistic interactions in which the concepts of ‘cooperation’ and ‘defection’ do not make sense, such as the choice between using LaTeX or Microsoft Word to write co-authored articles. Here, payoffs are influenced by the ambient number of users within one's collaboration networks, and each alternative works well when commonplace.
3.1. Mapping the four categories of interaction
The general model of linear synergy described by Eq. (3) can define the four categories of synergistic interaction simply by whether the marginal effects m and n are, respectively, either positive or negative. Simple coordination interactions require that an increase in trait A benefits those with A (m > 0) and harms those with trait B (n < 0), while in complementarity the reverse is true (m < 0, n > 0). A coordination dilemma occurs when an increase in the GBT has a positive effect on all, but the marginal benefit is larger for those with the GBT (that is, m > n > 0 if A is GBT). Anti-coordination dilemmas, in contrast, require that the benefit of increasing the GBT is larger for those without the GBT, and vice versa, as each trait has a higher payoff when rare.
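The sign rules in this paragraph can be sketched as a small classifier. The handling of the same-sign quadrants (comparing m with n so that the rule covers both the case where A is GBT and the case where B is GBT) is our reading of the text, and the function name and the 'boundary case' label are hypothetical.

```python
def classify_interaction(m, n):
    """Classify a linear-synergy interaction from the marginal effects m and n.

    m: marginal effect of group frequency of A on those with A.
    n: marginal effect of group frequency of A on those with B.
    """
    if m == 0 or n == 0 or m == n:
        # Axes and the additive diagonal mark transitions between regions.
        return "boundary case"
    if m > 0 and n < 0:
        return "simple coordination"
    if m < 0 and n > 0:
        return "complementarity"
    # Same sign: the GBT helps everyone; who gains more decides the category.
    if m > n:
        return "coordination dilemma"
    return "anti-coordination dilemma"
```

For example, a Hawk–Dove-like game with n > m > 0 falls in the anti-coordination region, while m > n > 0 gives a coordination dilemma in which A is the GBT.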
We are hardly the first to map this space of interactions, or to recognise that different games can be related to each other by transformation through a continuous spectrum. Modelling groups within a cultural metapopulation, Boyd and Richerson (Reference Boyd and Richerson2002) use a similar approach to differentiate coordination dilemmas by the strength of selection and size of the relative basins of attraction. Hauert et al. (Reference Hauert, Michor, Nowak and Doebeli2006) and Taylor and Nowak (Reference Taylor and Nowak2007) show how the Prisoner's Dilemma can be transformed algebraically into other games, defining anti-coordination and coordination dilemmas using inequalities between absolute payoffs. This approach is further developed by Van Cleve (Reference Van Cleve2017) to incorporate the concepts of synergy and reciprocity, while Allen and Nowak (Reference Allen and Nowak2015) and Cooney (Reference Cooney2022) extend this notation to cover complementarity and simple coordination interactions.
We see our approach here as complementary with existing methods. For the broad group of games with mixed equilibria (0 < k < 1), one advantage of our parameterisation is the ability to articulate the differences between games with the same k with only two terms, m and n. The (m, n) space then forms a kind of map for all possible interactions in this model (Fig. 2, right). This is because the specific value of parameter k is not important for the properties of the interactions relevant to our analysis (provided it remains between 0 and 1); a Hawk–Dove game with k = 0.1, k = 0.5, or k = 0.9 is still a Hawk–Dove game per its definition above (Maynard Smith, Reference Maynard Smith1982). By interpreting trait A as ‘Dove’ and trait B as ‘Hawk’, the Hawk–Dove dynamic is captured in this model when m = V/2 and n = (V + C)/2. At Dove frequency k = (C − V)/C, all individuals have the same average fitness, so it holds that
$$k = \frac{C-V}{C} = \frac{n-2m}{n-m},$$
meaning that, in (m, n) space, Hawk–Dove-like interactions exist for any game in which
$$\frac{n}{m} = \frac{2-k}{1-k},$$
where n > 0 and m > 0 and where 0 < k < 1.
We can supply a similar analysis for any coordination or anti-coordination game with a defined payoff structure, and derive a characteristic relationship between m, n, and k in the general model above (Table 1, examples). As a result, for a given k, specific games will appear as vectors within the (m, n) space (Fig. 2, right), and all vectors with the same slope are effectively the ‘same’ game. This means that to differentiate games with the same equilibrium frequency k, we only need one parameter, the polar angle θ, where tan θ = n/m. This gives us the ability to articulate all linear synergistic interactions with the same mixed equilibrium along a single numerical scale. Doing so identifies that several canonical games exist at critical locations within this space marking transitions between regions (Fig. 3).
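The mapping from a concrete game to a point on the θ scale can be sketched as follows, using the Hawk–Dove values given above (m = V/2, n = (V + C)/2, k = (C − V)/C, with trait A read as Dove). Using `math.atan2` rather than a plain arctangent is our choice, so that the four quadrants of (m, n) space stay distinct; the function names are hypothetical.

```python
import math


def polar_angle(m, n):
    """Polar angle theta of the point (m, n), so that tan(theta) = n / m."""
    return math.atan2(n, m)


def hawk_dove_mnk(V, C):
    """Map Hawk-Dove resource value V and fight cost C to (m, n, k)."""
    return V / 2.0, (V + C) / 2.0, (C - V) / C


# Two Hawk-Dove games that differ only by a rescaling of payoffs.
m1, n1, k1 = hawk_dove_mnk(V=2.0, C=4.0)
m2, n2, k2 = hawk_dove_mnk(V=1.0, C=2.0)
theta1 = polar_angle(m1, n1)
theta2 = polar_angle(m2, n2)
```

Both games share the same k and the same θ, which illustrates the sense in which vectors with the same slope are the 'same' game.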
4. The role of FST in the spread of group-beneficial traits
Having defined the model of linear synergy, we now seek an expression similar to that of Hamilton (Reference Hamilton and Fox1975) that illuminates the role of FST in the spread of group-beneficial traits. In a metapopulation model, group interactions structure payoffs to individuals through frequency-dependent feedback, so one reasonable method to analyse such a model is by the multi-level version of the Price equation, which partitions covariance dynamics within and between groups as
$${\bar w}\Delta {\bar x} = {\rm cov}( {w_j, x_j} ) + {\rm E}[ {{\rm cov}( {w_{ij}, x_{ij}} ) } ] \qquad (5)$$
Although initially developed to describe genetic evolution, the Price equation can be equally applied to the decomposition of distinct processes of cultural transmission (Beheim & Baldini, Reference Beheim and Baldini2012; El Mouden et al., Reference El Mouden, Andre, Morin and Nettle2014), including group-level traits (Smaldino, Reference Smaldino2014) and gene–culture coevolutionary systems (Aguilar & Akçay, Reference Aguilar and Akçay2018). Supplying our model of frequency dependence and assuming dichotomous phenotypes, we show in the Appendix that this expression becomes
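$${\bar w}\Delta {\bar x} = {\bar x}( {1-{\bar x}} ) ( {m-n} ) [ {( {\ell -{\bar x}} ) F_{ST} + {\bar x}-k} ] \qquad (6)$$
where payoff ratio ℓ = m/(m − n). The critical variance ratio at which the group-beneficial trait neither spreads nor declines ($\Delta {\bar x} = 0$) is then
$$F_{ST}^\ast = \frac{k-{\bar x}}{\ell -{\bar x}} \qquad (7)$$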
This threshold separates FST values that cause an increase in group-beneficial traits from those that cause a decrease, but the specific causal details of the system (defined by payoff ratio ℓ and equilibrium location k) determine both the size and direction of evolutionary change. We now also require the metapopulation prevalence of A, ${\bar x}$, which was not present in the additive condition of Eq. (2).
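We can assess the effect of a marginal increase in FST on selection for A by taking the derivative of Eq. (6) with respect to FST,
$$\frac{{\rm d}{\bar w}\Delta {\bar x}}{{\rm d}F_{ST}} = {\bar x}( {1-{\bar x}} ) ( {m-n} ) ( {\ell -{\bar x}} ) \qquad (8)$$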
When A is the GBT (−nk < m(1 − k)), positive values of this selection gradient indicate stronger selection for the GBT as FST increases, while negative values imply greater selection against the GBT. If trait B is instead the GBT (−nk > m(1 − k)), the reverse is true. Because ${\bar x}( {1-{\bar x}} )$ is necessarily positive, this phenomenon is thus mediated entirely by the signs and relative magnitudes of m and n, with the selection gradient reversing direction at ${\bar x} = \ell$. We can see the behaviour of Eq. (8) graphically over the four categories of interaction in Fig. 3 by plotting it against the polar angle θ of (m, n).
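The selection expression, its threshold and its gradient can be sketched numerically as below. The sketch assumes the dichotomous-trait form ${\bar w}\Delta {\bar x} = {\bar x}(1 - {\bar x})(m - n)[(\ell - {\bar x})F_{ST} + {\bar x} - k]$ with ℓ = m/(m − n), which follows from the linear-synergy payoffs; it is written with (m − n) multiplied through so that it stays defined when m = n. Function names and the parameter values in the usage note are hypothetical.

```python
def selection(x_bar, f_st, m, n, k):
    """w_bar * delta(x_bar) for trait A, with (m - n) multiplied through."""
    return x_bar * (1.0 - x_bar) * (
        (m - (m - n) * x_bar) * f_st + (m - n) * (x_bar - k)
    )


def selection_gradient(x_bar, m, n):
    """Derivative of the selection expression with respect to F_ST."""
    return x_bar * (1.0 - x_bar) * (m - (m - n) * x_bar)


def critical_fst(x_bar, m, n, k):
    """F_ST at which A neither spreads nor declines, or None if undefined.

    The value can fall outside [0, 1], in which case no feasible F_ST
    changes the direction of selection at this x_bar.
    """
    denominator = m - (m - n) * x_bar
    if denominator == 0:
        return None  # x_bar sits exactly at the payoff ratio l
    return (m - n) * (k - x_bar) / denominator
```

With a pure-coordination-like setting (m = 1, n = −1, k = 0.5), the gradient is positive below ${\bar x} = \ell = 0.5$ and negative above it, which is the sign reversal described in the text.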
In coordination or anti-coordination dilemmas, FST must have a positive marginal effect on selection for the group-beneficial trait, for the same reason Hamilton (Reference Hamilton and Fox1975) described: at high levels of FST, behaviours that raise average group payoffs can positively assort with one another, thereby avoiding some costs from free riding. For example, in a Stag Hunt, a high FST enables groups at different pure-strategy equilibria to compete with one another through direct conflict (Bowles, Reference Bowles2006), differential dispersion (Rogers, Reference Rogers1990) or social influence (Boyd & Richerson, Reference Boyd and Richerson2002) (Fig. 4, Stag Hunt). Consistent with this, Kenyan pastoralists who frequently engage in intergroup conflict show a strong association between pairwise FST and willingness to engage with partners in coordination vignettes (Handley & Mathew, Reference Handley and Mathew2020). In anti-coordination dilemmas such as Hawk–Dove, high FST indicates the ability for group-beneficial traits like Dove to avoid interacting with group-harmful behaviours like Hawk. If FST is high enough, this positive assortment on like-type can prevent Hawk from invading a population altogether (Fig. 4, Hawk–Dove). This positive selection gradient is also present in coordination dilemmas and anti-coordination dilemmas without a third, mixed equilibrium (k < 0 or k > 1) such as the ‘Prisoner's Dilemma with synergy’ (SI Section 2.1).
Outside of coordination and anti-coordination dilemmas, however, we can see in Fig. 3 how FST can either facilitate or hinder the spread of a group-beneficial trait depending on ${\bar x}$. Because the reversal of the selection gradient with respect to FST lies at frequency ℓ, it can only occur in complementarity and simple coordination interactions if 0 < k < 1, because only they allow 0 < ℓ < 1. In these regions, ${\rm d}{\bar w}\Delta {\bar x}/{\rm d}F_{ST}$ will be positive for some ${\bar x}$ and negative for others. When the two coordination alternatives are essentially arbitrary, FST slows movement through each basin of attraction towards a pure-strategy equilibrium, as some groups will coordinate on the minority norm even as it declines in the population overall (Fig. 4, Pure Coordination). Similarly, complementary strategies spread faster when FST is low, as they can more quickly find unlike-types, while with high FST each trait can become stuck inside low-diversity behavioural enclaves that cannot realise the full benefits of complementarity (Fig. 4, Invisible Hand).
Although directly relevant to the study of cultural FST, this phenomenon has not been clearly identified in either the cultural or social evolution literature to date. The role of FST in models of linear synergy was initially described for discrete traits by Queller (Reference Queller1985), and our Eq. (6) is isomorphic with Eq. (10) in Gardner et al. (Reference Gardner, West and Wild2011) and Eq. (A33) in Lehmann et al. (Reference Lehmann, Feldman and Foster2008). We can also re-express Eq. (7) in terms of Queller's (Reference Queller1985) ‘synergy coefficient’ d as
$$F_{ST}^\ast = \frac{c-d{\bar x}}{b + d( {1-{\bar x}} ) }$$
In this version, it is clear that if there are no synergistic effects (d = 0), this simplifies back to Hamilton's (Reference Hamilton and Fox1975) well-known threshold (Van Cleve & Lehmann, Reference Van Cleve and Lehmann2013). Because our approach reduces the essential differences between games with the same k down to a single parameter, θ, we can more readily connect the role of FST to each of the four regions of interaction via Eq. (8) and Fig. 3. Allen and Nowak (Reference Allen and Nowak2015) report that positive assortment between genetic relatives (which is analogous to FST) can inhibit coordination when m + n < 0, assuming A is GBT, and provide an example game at k = 1/7, θ = −1.19 (also see SI Fig. A13). Taking a different approach with Eq. (8), we find that this inhibitory phenomenon is much more general. Because the selection gradient changes sign at ${\bar x} = \ell$ = m/(m − n), it appears whenever $( {1-{\bar x}} ) m + {\bar x}n < 0$ (assuming A is GBT) or $( {1-{\bar x}} ) m + {\bar x}n > 0$ (assuming B is GBT). For games with mixed equilibria (0 < k < 1), cultural distance can slow the spread of GBTs under certain trait frequencies throughout both the simple coordination (m > 0, n < 0) and complementarity (m < 0, n > 0) regions of the interaction spectrum.
5. Discussion
Our results challenge the prevailing view in cultural evolution that cultural distance between groups, as measured by CFST, should positively associate with the prevalence of parochial altruism, homophily preferences, between-group warfare and so forth. Although this finding has been empirically validated, our model results suggest that it should be viewed as valid only within a specific range of a broader spectrum of interaction. Within certain game structures – complementarity and simple coordination – we can instead predict the opposite: that group-beneficial traits spread more slowly, or are prevented from spreading altogether, when cultural distance between groups is relatively high and within-group trait diversity is low (i.e. CFST is high). These results suggest that properly designed experimental or observational studies should be able to show heterophily, a preference for interacting with those different from one's in-group.
Thus, because the causal details of frequency-dependent interactions (as defined by m, n and k) determine the effect of FST on the evolution of group-beneficial traits, caution when interpreting and comparing empirical CFSTs is warranted. Even within a game such as Stag Hunt, in which higher values of FST promote the spread of group-beneficial Stag behaviours, the presence of synergistic effects complicates the ability to compare distance measures across different systems. This is fundamentally because the population average plays a mediating role in Eq. (7). As a result, paradoxically, a large CFST might indicate weaker selection for group-beneficial traits while a small CFST indicates stronger selection (Fig. 4, Stag Hunt). Meta-analyses cataloguing observed cultural FST values would therefore benefit from contextualising these estimates with both the mean prevalence of the behaviours they are tracking and careful descriptions of the causal details of each system under comparison.
By making explicit the relationship between FST and the outcomes of frequency-dependent interactions, we not only better situate empirical measures of cultural distance, but also motivate new theoretical inquiry into synergy, frequency-dependence, network measures, the complexities of identity, the paradox of diversity and other questions that rely heavily on the structure of variation across groups. As such, these results can bridge a number of disparate literatures on social evolution, cultural evolution, graph theory and group identity.
5.1. Dichotomous and continuous traits
Our derivation (Appendix A) generalises Queller's (Reference Queller1985) original analysis of dichotomous traits over finite numbers of groups and individuals, allowing us to consider the effects of selection over the full range of possible values of FST via Eq. (A3). Assuming discrete traits allows us to simplify the system to Eq. (6), but this imposes combinatoric constraints on the possible values of ${\bar x}$ and FST. With M groups each with N individuals, for example, there are $\binom{N + M}{M}$ possible combinations of ${\bar x}$ and FST, with systematic under-representation of corner cases as a function of both M and N (Fig. 5). This indicates that if FST values are calculated pairwise between two groups (M = 2), high FST values cannot be reached at low or high values of ${\bar x}$ regardless of group size. With even moderate numbers of groups and individuals, though, most of the possible range of both ${\bar x}$ and FST is reachable. Thus, Eq. (6) and the resulting selection surfaces (Fig. 4) are best understood in the context of a large number of large groups.
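The combinatoric constraint can be explored with a short enumeration. This sketch assumes that groups are exchangeable, so that a population is described by the multiset of per-group counts of trait A, and that FST is the between-group variance divided by the total variance ${\bar x}(1 - {\bar x})$ for a dichotomous trait. Function names are hypothetical, and distinct compositions may map to the same (${\bar x}$, FST) pair.

```python
from itertools import combinations_with_replacement


def reachable_points(M, N):
    """List (x_bar, F_ST) for every composition of M groups of N individuals.

    F_ST is None when the trait is fixed or absent (total variance is zero).
    """
    points = []
    for counts in combinations_with_replacement(range(N + 1), M):
        freqs = [count / N for count in counts]
        x_bar = sum(freqs) / M
        total_var = x_bar * (1.0 - x_bar)
        if total_var == 0:
            points.append((x_bar, None))
            continue
        between_var = sum((x - x_bar) ** 2 for x in freqs) / M
        points.append((x_bar, between_var / total_var))
    return points


def max_fst_at(points, x_bar, tol=1e-9):
    """Largest defined F_ST among compositions with the given mean frequency."""
    values = [f for x, f in points if f is not None and abs(x - x_bar) < tol]
    return max(values) if values else None
```

For a pairwise comparison (M = 2), calling `max_fst_at(reachable_points(2, N), x_bar)` at a low `x_bar` returns a value well below 1, in line with the under-representation of corner cases noted above.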
5.2. Networks, identities and the complexities of population structure
Our result was derived in the context of a purely hierarchical population, in which each individual has unambiguously one, and only one, group membership. Yet in realistic social settings, individuals often have multiple overlapping group identities. While we expect that the qualitative features of anti-coordination, coordination and complementarity described here are quite general, we also believe that decomposing the multiple memberships may illuminate the relevant variance measures in empirical applications.
Specifically, we may decompose the first covariance in Eq. (5) relating trait value and average group fitness, by asking how an individual's trait value covaries with the multiple groups to which they may belong. If the identities or groups are not competing or mutually exclusive, their dynamics may be treated independently, i.e. a separate Eq. (5) for each group. However, if groups affect each other in some way, we will require a system of N equations for N identities. In such a case the relevant variance measures will reflect the properties of a system of equations, such as stationary distributions if an equilibrium exists, or a cyclical dynamic otherwise. This may be an important theoretical avenue to pursue.
We may also consider a network approach, where we can decompose the first term in Eq. (5) into conditional covariances. That is, the covariance of a trait value and its fitness is conditional on another random variable, which in the network context may be the strength of the tie between two individuals with a certain phenotype. The key statistic – the expectation of conditional covariances across network ties – in general reflects assortment mechanisms central to the literature around group-beneficial traits, and the relevant empirical variance measures will be expressed through parameters prescribing assortment according to individual trait value.
5.3. Resolving the paradox of cultural diversity
Our results also provide theoretical focus to a persistent empirical debate about the relationship between immigration, multiculturalism, assimilation and trust, recently framed as the ‘paradox of diversity’ (Schimmelpfennig et al., Reference Schimmelpfennig, Razek, Schnell and Muthukrishna2021). A substantial literature in sociology and political science has shown that racial and ethnic diversity at the neighbourhood level is associated with decreased levels of generalised trust (Dinesen & Sønderskov, Reference Dinesen and Sønderskov2015) and consequently a decline in civic engagement (Alesina et al., Reference Alesina, Glaeser and Sacerdote2001; Putnam, Reference Putnam2007). Yet, at the same time, the integration of marginalised minorities into a market system is often associated with an increase in expression of pro-social preferences (Henrich et al., Reference Henrich, Ensminger, McElreath, Barr, Barrett, Bolyanatz and Ziker2010) and willingness to trust advice from co-ethnic strangers (Lightner & Hagen, Reference Lightner and Hagen2021), and in urbanising economies local immigration rates have been positively associated with increases in wages (Ottaviano & Peri, Reference Ottaviano and Peri2006) and rates of innovation (Posch et al., Reference Posch, Schulz and Henrich2023).
Since FST serves as an indicator of cultural homogeneity within groups, our model serves as a simple demonstration of this paradox. Consistent with sociological findings, low CFST (i.e. high within-group diversity) erodes the positive assortment necessary to sustain a group-beneficial trait in anti-coordination dilemmas, e.g. Hawk–Dove, or to bootstrap it in coordination dilemmas, e.g. Stag Hunt. However, when interactions are complementary, a low CFST more rapidly promotes socially-optimal outcomes. This mirrors the empirical pattern of higher marginal gains from specialisation in diverse urban economies (Peri & Sparber, Reference Peri and Sparber2009; Posch et al., Reference Posch, Schulz and Henrich2023). Moreover, in simple coordination interactions, higher within-group diversity moves a population faster towards a social optimum, as it becomes harder for some subpopulations to become ‘stuck’ within the basin of attraction of an inferior normative equilibrium. Cultural diversity can have opposite effects in different causal interaction structures. As with the parable of the blind men describing different parts of an elephant, we should expect such contradictory findings to make more sense when placed in the larger context of the spectrum of synergistic interactions (Fig. 3).
5.4. From statics to dynamics
To date, much of the theoretical literature has focused on evaluating broad questions, such as when cooperation will evolve. Therefore, analytical attention has been drawn to assessing the evolutionary scope of a trait via static analysis, e.g. Bell et al. (Reference Bell, Richerson and McElreath2009), which is also the approach used here. As a result, we do not track the change in FST as we do the change in mean frequency ${\bar x}$. This is unsatisfactory when wishing to predict trait variation and selection over time, and in future models, special attention should be paid to the relationship between traits and their inheritance, the multilevel dynamics of selection over time, and how to put them together to estimate the parameters of a particular case (Keller, Reference Keller1999).
For cultural traits, understanding the individual-level transmission of a particular trait is key, as many inheritance pathways are possible, and these may affect the covariance between trait values and the ‘next-generation’ learner. Likewise, the transmission of group-level traits may occur through multiple mechanisms, including selective imitation, migration and natural selection (Richerson et al., Reference Richerson, Baldini, Bell, Demps, Frost, Hillis and Zefferman2016). Given the diverse transmission mechanisms and other evolutionary forces, it is likely that selection on groups, institutions, individuals or other units operates on different time scales. As a result, CFST will change through time and consequently so will the predicted rate and perhaps direction of selection. A dynamic approach also requires more detailed consideration of stochasticity and long-run stability (Foster & Young, Reference Foster and Young1990). In coordination interactions, Van Cleve and Lehmann (Reference Van Cleve and Lehmann2013) show that if selection is relatively weak, the relative size of each basin of attraction, random mutation and positive assortment together determine which trait reaches long-run fixation.
6. Conclusion
We formalise the causal significance of FST across synergistic social interactions through a game theoretic frame, better justifying its use in diverse cultural contexts. Our results demonstrate that there is no single relationship between CFST and the strength of selection for group-beneficial traits. With the increasing popularity of CFST and similar measures of behavioural diversity between groups, our analysis strongly motivates increased attention to the ethnographic contexts affecting cultural variation. Doing so will give greater power to empirical variance measures for inferring or reflecting underlying causal mechanisms.
Data availability statement
n/a
Supplementary material
To view supplementary material for this article, please visit https://doi.org/10.1017/ehs.2024.8
Acknowledgements
We thank Jorge Peña for extensive comments and assistance with derivations. Mitchell Newberry, Laurel Fogarty, Richard McElreath, Hedvig Skirgård, Jeff Andrews, Elspeth Ready, Cody Ross, Alejandro Pérez Velilla, Marco Smolla, and members of the Department of Human Behaviour, Ecology and Culture at MPI-EVA and Department of Anthropology at UC Davis provided valuable feedback. Three anonymous reviewers provided further comments that resulted in significant improvements.
Author contributions
Both authors contributed equally to conceptualising, analysing, writing, and revising the manuscript.
Financial support
n/a
Competing interest
n/a
Research transparency and reproducibility
No data were directly used in this paper. R code to reproduce all figures and calculations is available at https://github.com/babeheim/causality-variance-fst.
Appendix: Evolutionary Decomposition
In a metapopulation in which individual i in group j has phenotype xij and experiences fitness payoff wij, behaviour A will increase according to
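$${\bar w}\Delta {\bar x} = {\rm cov}( {w_j, x_j} ) + {\rm E}[ {{\rm cov}( {w_{ij}, x_{ij}} ) } ] ,$$
where wj and xj are the mean fitness and average phenotypic frequency within group j. Note that these are empirical covariances and expectations calculated over a finite number of groups. A full glossary of mathematical symbols is found in Table 2. Define φ = n/(n − m), so that from Eq. (3) individual fitness becomes
$$w_{ij} = \tilde{w} + ( {m-n} ) ( {x_j-k} ) ( {x_{ij}-\varphi } ) .$$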
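Within any particular group, the covariance between individual fitness and phenotype can be written as
$${\rm cov}( {w_{ij}, x_{ij}} ) = ( {m-n} ) ( {x_j-k} ) [ {{\rm va}{\rm r}_j( {x_{ij}} ) -{\rm cov}( {\varphi , x_{ij}} ) } ] ,$$
where ${\rm va}{\rm r}_j( {x_{ij}} ) = {\rm E}( {x_{ij}^2 } ) -x_j^2$ is the variance within group j, which, because cov(φ, xij) = 0, becomes
$${\rm cov}( {w_{ij}, x_{ij}} ) = ( {m-n} ) ( {x_j-k} ) [ {{\rm E}( {x_{ij}^2 } ) -x_j^2 } ] .$$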
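Substituting this into the second term of Eq. (5), with the within-group variance written as ${\rm E}( {x_{ij}^2 } ) -x_j^2$, gives
$${\rm E}[ {{\rm cov}( {w_{ij}, x_{ij}} ) } ] = ( {m-n} ) {\rm E}[ {( {x_j-k} ) ( {{\rm E}( {x_{ij}^2 } ) -x_j^2 } ) } ] \qquad (A1)$$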
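On the group level, the mean fitness can be written as
$$w_j = \tilde{w} + ( {m-n} ) ( {x_j-k} ) ( {x_j-\varphi } ) ,$$
so the covariance between group fitness and group phenotype simplifies to
$${\rm cov}( {w_j, x_j} ) = ( {m-n} ) [ {{\rm cov}( {x_j^2 , x_j} ) -( {k + \varphi } ) {\rm var}( {x_j} ) } ] \qquad (A2)$$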
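Combining Eq. (A1) and Eq. (A2), the full expression for evolutionary change in mean phenotype is
$${\bar w}\Delta {\bar x} = ( {m-n} ) \{ {{\rm cov}( {x_j^2 , x_j} ) -( {k + \varphi } ) {\rm var}( {x_j} ) + {\rm E}[ {( {x_j-k} ) ( {{\rm E}( {x_{ij}^2 } ) -x_j^2 } ) } ] } \} .$$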
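Define ℓ = 1 − φ and write $\beta = {\rm cov}( {x_j, {\rm E}( {x_{ij}^2 } ) } ) {\rm var}( {x_j} ) ^{-1}$ as the regression coefficient of ${\rm E}( {x_{ij}^2 } )$ on xj. Then, with var(xij) denoting the total variance across the metapopulation and FST = var(xj)/var(xij), evolutionary change in our frequency-dependent model is given as
$${\bar w}\Delta {\bar x} = ( {m-n} ) {\rm var}( {x_{ij}} ) [ {( {\ell -{\bar x} + \beta -1} ) F_{ST} + {\bar x}-k} ] \qquad (A3)$$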
Equation (A3) holds for continuous phenotypes (xij ∈ [0, 1]) as a frequency-dependent extension of Hamilton's Eq. (2). Previous work generally assumes that traits are discrete 0/1 variables, e.g. Eq. (10) in Gardner et al. (Reference Gardner, West and Wild2011), Eq. (A33) in Lehmann et al. (Reference Lehmann, Feldman and Foster2008) and Appendix D in Allen and Nowak (Reference Allen and Nowak2015). In the special case in which individual phenotypes are dichotomous (xij ∈ {0, 1}), ${\rm E}( {x_{ij}^2 } ) = {\rm E}( {x_{ij}} )$, ${\rm cov}( {x_j, {\rm E}( {x_{ij}^2 } ) } ) = {\rm var}( {x_j} )$ and β = 1, while the total variance is ${\rm var}( {x_{ij}} ) = {\bar x}( {1-{\bar x}} )$. Equation (A3) thus simplifies to
$${\bar w}\Delta {\bar x} = {\bar x}( {1-{\bar x}} ) ( {m-n} ) [ {( {\ell -{\bar x}} ) F_{ST} + {\bar x}-k} ] ,$$
which is Eq. (6) in the main text. If −nk < m(1 − k), then positive values of this expression indicate that the GBT (trait A) will be selected for, and negative values indicate that the GBT will be selected against. If −nk > m(1 − k), then B is the GBT and the reverse is true.
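The dichotomous-trait result can be checked numerically. The sketch below builds a random metapopulation of equally sized groups, computes the population covariance between fitness and phenotype directly (the left-hand side of the Price equation, assuming no transmission bias), and compares it with the closed form ${\bar x}(1 - {\bar x})(m - n)[(\ell - {\bar x})F_{ST} + {\bar x} - k]$. The payoff form, function name, default parameter values and seed are all assumptions made for illustration.

```python
import random


def price_vs_closed_form(M=6, N=8, m=1.0, n=-0.5, k=0.4, w_tilde=1.0, seed=1):
    """Return (direct covariance, closed-form expression) for random 0/1 traits."""
    rng = random.Random(seed)
    groups = [[rng.randint(0, 1) for _ in range(N)] for _ in range(M)]
    x_j = [sum(group) / N for group in groups]

    # Flatten individuals and compute each payoff from its own group frequency.
    xs = [x for group in groups for x in group]
    ws = [
        w_tilde + (x_j[j] - k) * (m * x + n * (1 - x))
        for j, group in enumerate(groups)
        for x in group
    ]

    size = M * N
    x_bar = sum(xs) / size
    w_bar = sum(ws) / size
    direct = sum(w * x for w, x in zip(ws, xs)) / size - w_bar * x_bar

    total_var = x_bar * (1.0 - x_bar)
    if total_var == 0:
        raise ValueError("trait is fixed or absent; F_ST is undefined")
    f_st = sum((x - x_bar) ** 2 for x in x_j) / M / total_var
    ell = m / (m - n)
    closed = total_var * (m - n) * ((ell - x_bar) * f_st + x_bar - k)
    return direct, closed
```

The two returned values should agree up to floating-point error for any seed, because with equal group sizes the identity is exact rather than approximate.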