Introduction
In genetics, quantitative traits are traits that can be measured on a continuous scale, such as stature, body weight, milk yield in cattle, yield per acre of a field crop or the outcomes of pencil-and-paper tests for cognitive ability or personality. The theory of the evolution of quantitative traits under selection is well established in plant and animal breeding and in evolutionary biology (Lande, 1976; Falconer, 1981; Walsh & Lynch, 2015). Quantitative genetic theory originated with R. A. Fisher (1919), who modelled a quantitative trait as the additive effects of a large number of genetic loci, allowing for dominance within each locus, and random environmental effects. Fisher’s paper ended the controversy between Mendelians and Biometricians, the two competing approaches to understanding the transmission of heritable traits in the early twentieth century. Many regard this paper, after Darwin’s Origin of Species, as the beginning of modern evolutionary biology (Provine, 2001).
In practice Fisher’s basic model is simple and robust despite potential complications and elaborations. Sufficient conditions for the key results are the stochastic independence of genotypes and environments, the absence of dominance and epistasis (non-additive interactions between genes at different genomic locations) and a linear regression of genotypic values on phenotypes (Nagylaki, 1992). Under these assumptions the breeder’s equation gives the change in the average value of a quantitative trait in response to selection as:

$$ r = h^2 s $$

In this equation r, the response to selection, is the difference between the average phenotype of the population before selection and the average phenotype of offspring after a single generation of selection. The term h² is the narrow-sense heritability, or the fraction of phenotypic variance in the population attributable to the additive effects of genes. Finally, the selection differential, s, denotes the difference between the average phenotype in the population and the average phenotype of parents of the next generation.
Given two terms of the breeder’s equation, one can estimate the third. Agriculture favours estimating heritability through selection experiments. Breeders or experimenters fix the selection differential by choosing parents from the population under study. The response r is obtained by measuring the offspring after selection. The heritability of the trait h 2 immediately follows from the breeder’s equation. Such experiments cannot be done in the social sciences. Instead, heritability is estimated from similarities among relatives.
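The relationship among the three terms can be illustrated with a minimal sketch. The function names and the numerical values below are hypothetical and chosen only for illustration; they are not taken from any selection experiment.

```python
def response_to_selection(h2: float, s: float) -> float:
    """Breeder's equation: response r = h2 * s."""
    return h2 * s


def realized_heritability(r: float, s: float) -> float:
    """Solve the breeder's equation for h2, given a measured response r
    and a selection differential s fixed by the experimenter."""
    if s == 0:
        raise ValueError("selection differential must be non-zero")
    return r / s


# Hypothetical selection experiment: chosen parents exceed the population
# mean by 2.0 units and their offspring exceed it by 1.0 unit.
h2_estimate = realized_heritability(r=1.0, s=2.0)
predicted_r = response_to_selection(h2=h2_estimate, s=2.0)
print(h2_estimate, predicted_r)
```

Any one of the three quantities can be recovered from the other two in the same way.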
Complications can occur in the simple theory. For example, epistatic interactions between loci violate the simple set of assumptions. Remarkably, these tend to cancel each other over time. Crow (2010) suggested that natural selection generally favours changes that lead to the simple additive model given by the breeder’s equation.
Complexities have cast doubt over human heritability estimates. For example, a standard source of data has been pairs of identical and fraternal twins. Heritability estimates then assumed that identical twins separated at birth are genetically identical and raised in random environments. Challenges to this assumption arise from many sources: shared uterine environment of identical twins before birth, non-random placement of adopted twins and others (Kamin & Goldberger, 2002).
Data from the Human Genome Project render many criticisms of twin studies moot. Identical twins were of interest because they shared more or less the same genome, but suspicions of shared environment persisted. Today the state of technology is to type approximately a million single nucleotide polymorphisms (SNPs) in a large sample of subjects and measure the relationship between individuals by the number of SNPs they share. Realized kinship to the order of second cousins and lower is assessed for tens of thousands of typed subjects. This technology renders studies of real pedigree relationship nearly obsolete while simultaneously validating the conclusions about human heritability estimates that had been so widely denounced in the twentieth century (see Visscher et al., 2008; Yang et al., 2010; Davies et al., 2011; and Lee & Chow, 2014, for further discussion).
Thought experiment
Assortative mating can be equivalent to directional selection. Assortment for a trait generates either distinct groups or a continuum: those with more of the trait at one extreme and those with less of the trait at the other, within a population. To illustrate this consider a simple thought experiment about assortative mating for stature in a population. Assign average stature of 175.26 cm for males and 162.56 cm for females, each with a standard deviation of 7.62 cm. These figures are close to observed values in North America. Suddenly assortative mating by height occurs such that those taller than the population mean mate only with others also above the mean while those shorter than the mean mate only with others shorter than the mean.
First, it is assumed that the differences between male and female stature reflect fixed differences in the development of the two sexes. Stature can then be normalized within each sex and measured in standard deviation units (SDs). The pooled population has a mean of zero and a standard deviation of 1.0. The new units, called Z-scores, are obtained by subtracting the sex-specific population mean from everyone’s stature (175.26 cm from each male, 162.56 cm from each female) and dividing by the standard deviation of 7.62 cm. After imposition of the mating rule the population separates into two populations: talls and shorts. Assuming that stature is normally distributed, the overall normal distribution has been split into two half-normal distributions. The mean of the left half-normal distribution is −0.8 SDs and that of the right half is +0.8 SDs. The difference in population means is therefore 1.6 SDs, or 12.19 cm in the original units.
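The standardization and the half-normal means can be checked with a short sketch. The variable names are illustrative only; the means and standard deviation are those of the thought experiment. The exact mean of a standard half-normal distribution is sqrt(2/π), about 0.798, which the text rounds to 0.8.

```python
import math

# Values from the thought experiment (cm).
MEAN_CM = {"male": 175.26, "female": 162.56}
SD_CM = 7.62


def z_score(stature_cm: float, sex: str) -> float:
    """Standardize stature within sex: subtract the sex mean, divide by the SD."""
    return (stature_cm - MEAN_CM[sex]) / SD_CM


# Mean of the upper half of a standard normal distribution.
half_normal_mean = math.sqrt(2.0 / math.pi)

# Difference between the means of talls and shorts, in SDs and in cm.
difference_sd = 2.0 * half_normal_mean
difference_cm = difference_sd * SD_CM

print(round(half_normal_mean, 3), round(difference_sd, 3), round(difference_cm, 2))
```

With the unrounded half-normal mean the difference is slightly under 1.6 SDs; the figure of 12.19 cm in the text follows from rounding the half-normal mean to 0.8 first.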
The new subpopulations mate at random within themselves. The two sets of parents differ by 1.6 SDs. After reproduction, assuming an additive heritability of 0.80 for stature, the group difference will be, following the breeder’s equation, 0.8×1.6=1.28 SDs, corresponding to 1.28×7.62=9.75 cm. After the first generation of reproduction the genotypic group means differ by 9.75 cm without further movement between groups.
Before the new mating system the overall variance of stature was 1.0. After a single generation of the new mating system the two group means lie at ±0.64 SDs, so the between-subpopulation variance is 0.64², or 0.41, on the standardized scale. The fraction of stature variance that is between groups is now 41%. The variance of stature within each of the two subpopulations is the complement, or 0.59. Back on the scale of centimetres, the standard deviation within populations is the square root of 0.59 (about 0.77), corresponding to 5.9 cm.
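The arithmetic of the first generation can be reproduced in a few lines. This is a sketch under the same assumptions as the text: two groups of equal size, parental group means rounded to ±0.8 SDs, an additive heritability of 0.80 and a total variance that remains 1.0. The variable names are illustrative.

```python
import math

SD_CM = 7.62             # phenotypic standard deviation of stature (cm)
H2 = 0.80                # assumed additive heritability of stature
PARENT_GROUP_MEAN = 0.8  # rounded half-normal mean (SDs), as in the text

# Breeder's equation applied to each group: offspring mean = h2 * parental mean.
offspring_group_mean = H2 * PARENT_GROUP_MEAN        # +/- 0.64 SDs
group_difference_sd = 2.0 * offspring_group_mean     # 1.28 SDs
group_difference_cm = group_difference_sd * SD_CM    # about 9.75 cm

# Two equal-sized groups centred at +/- m have between-group variance m**2.
between_variance = offspring_group_mean ** 2         # about 0.41
within_variance = 1.0 - between_variance             # about 0.59
within_sd_cm = math.sqrt(within_variance) * SD_CM    # about 5.9 cm

print(round(group_difference_cm, 2), round(between_variance, 2), round(within_sd_cm, 1))
```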
Richard Lewontin (1972) famously estimated the fraction of genic variance among continental human groups to be about 15% for a collection of presumably neutral genetic markers. That estimate of the ratio of between-group to total variance has remained constant as the number of loci available to estimate it grew from dozens to hundreds of thousands. After a single generation of this thought experiment the stature variance between groups at genetic loci influencing stature is between two and three times as great as the overall neutral genic variance among major human populations. Yet there is absolutely no selection in this thought experiment. Everything is selectively neutral. On the other hand, from the perspective from within subpopulations, there has been strong selection imposed by the mating rule allocating persons to groups according to stature. The mechanism of selecting according to a threshold is called truncation selection. It is, for example, the most common selection rule in animal and plant breeding. Here, the threshold was at the mean height of the overall population.
It is worth pointing out that after assortative mating the genetic variance within each group is somewhat smaller than what it otherwise could be as a result of negative linkage disequilibrium between causal sites (Bulmer, 1980). This may seem somewhat paradoxical because of the overall positive linkage disequilibrium between causal sites in the total population (including both talls and shorts) induced by the assortative mating itself (Fisher, 1919; Crow & Felsenstein, 1968; Nagylaki, 1982). The reader may consult Lee and Chow (2013) for an intuitive explanation of the so-called Bulmer effect of negative disequilibrium. If random mating prevails within each group after assortment, then the Bulmer effect decays; the genetic variance within each group will thus increase. The most important feature of the genetics to keep in mind is that the classical increase in the overall genetic variance due to assortative mating, first described by Fisher himself, corresponds to the divergence in means between the two groups formed by the assortment process.
The mating rule in this thought experiment about the effects of truncation selection acted on a biometric trait. Suppose, instead, individuals in the population were similarly partitioned according to a threshold along a behavioural trait. Bailey (1998) pointed out that behavioural traits like religiosity, extraversion, neuroticism and even divorce are all moderately heritable. Indeed, Eric Turkheimer in an essay titled ‘Three laws of behaviour genetics and what they mean’ (Turkheimer, 2000) gives ‘all human behavioural traits are heritable’ as his first law of behaviour genetics. Rejecting the null hypothesis that behaviours are not heritable is therefore no longer of much interest (Bailey, 1998; Turkheimer, 2011). The genetic effects of the mating rule in the thought experiment on behavioural traits therefore mirror those illustrated using height. Despite this, the social sciences have generally neglected quantitative genetic models even though many other quantitative traits like IQ (Herrnstein & Murray, 1994; Jensen, 1998), the propensity to develop mental illnesses like bipolar disorder and schizophrenia (Owen & O’Donovan, 2003; Barnett & Smoller, 2009; Wray & Visscher, 2010), aggression and antisociality (Barker et al., 2009) and other behavioural traits have significant social and economic consequences.
If the rule had its origin as a purely culturally transmitted preference then the effect on the genome would be exactly the same as those described in the thought experiment. Purely learned preferences or socially imposed assortments would ‘pick up’ any relevant genes, giving an identical genetic effect. Pervasive social transmission of these preferences would therefore probably amplify the heritability of behavioural traits. Consider new popular social traits that rapidly spread like a new religious movement, tofu consumption or crack cocaine use. Initially there may be no genetic basis for the cultural success of these traits, but suppose the trait fostered reproductive success. Standing genetic variation favouring the trait is then subject to selection. Consequently, the traits would become more heritable as initially rare genes present within the population became more common.
An older example of this effect is due to Waddington (1956). He found that flies treated with ether were prone to develop a second thorax. Without the ether treatment the trait, bithorax, was absent. However, bithorax began to appear even in the absence of ether after 20 generations of selection for flies vulnerable to the ether treatment. Alleles that slightly increased the probability of bithorax, but not enough to bring out this phenotype without the large effect of the ether treatment, were apparently positively selected during this experiment. Selection then raised their frequencies high enough to trigger bithorax in the absence of ether. The observed heritability of bithorax increased through a process that Waddington called genetic assimilation. Though superficially like Lamarckian inheritance of acquired characteristics, genetic assimilation has a well-studied quantitative genetic basis.
Genetic accommodation (West-Eberhard, 2003; Braendle & Flatt, 2006) is a generalization of genetic assimilation. Even if the hypothetical examples of tofu or crack fostered reproductive success through purely social mechanisms, individuals whose biochemistry was congenial to the new trait would participate more easily and their genotypes would increase in frequency. The preference would become heritable even though it was not initially. The idea that cultural adaptation renders genetic adaptation irrelevant is widespread in anthropology. Instead, genetic accommodation suggests that the pervasiveness of cultural transmission in our species has increased overall rates of genetic adaptation (see e.g. Hawks et al., 2007; Cochran & Harpending, 2009; Laland et al., 2010; Richerson et al., 2010, for further discussion of cultural pressures increasing the rate of human genetic adaptation).
The Old-Order Amish
Assortative mating
Are there concrete ethnographic examples of assortative mating about a trait analogous to height with the above kind of strong effect? There are several candidates. Cochran et al. (2006) proposed that a similar mechanism operated in the evolution of European Ashkenazi Jews during the Middle Ages: the Ashkenazim were a population closed to immigration, but with the possibility of emigration, that experienced strong selection for economic and managerial success. The Parsi of India, a closed class of managers, scholars, musicians and other economically and culturally successful individuals (Nelson, 2012), are another distinct population in which assortment may have been important in their evolution. Recently Charles Murray in Coming Apart (2013) proposed that assortative mating by education, socioeconomic status and IQ is splitting America into distinct and different classes, the ‘coming apart’ of the title.
The Old-Order Amish, an Anabaptist sect in North America, also follow the pattern of negligible inward, but significant outward, gene flow, making them a potential candidate for selection by assortment. A model is described in which there is an underlying trait that we call AQ (Amish Quotient) by analogy with IQ derived from cognitive tests. It is postulated in this model that AQ predicts the probability of an individual remaining in the community when adolescents decide to remain and be baptized or to decline membership and emigrate. Specifically the measure of AQ is a personality measure associated with the plain and simple lifestyle of the Amish, postulated to indicate an individual’s underlying affinity for membership in the community. Under the proposed mechanism of selection the AQ within the Amish population would increase each generation as those with lower AQs are truncated each generation through ‘boiling-off’ or defection. It is therefore predicted that declining boiling-off rates over time (see below) are a consequence of this selection mechanism increasing overall population AQ each generation within the Amish population.
The model performs well relative to alternative models of Amish defection and retention (see Ericksen et al., 1980; Meyers, 1991, 1994; and Greksa, 2002, for review). While Old-Order endogamy is well described and understood, the trade-offs and constraints of defection are unclear. Reductions in defection rates over the last century (Meyers, 1991, 1994; Hostetler, 1993; Greksa, 2002) need explanation. Previous models of Amish defection emphasize rapid population growth in regions with finite resources, particularly expensive farm land, and the resulting trade-off between culturally sanctioned high fertility and the ability of families to support their own farms (Markle & Pasco, 1977; Ericksen et al., 1979, 1980; Wasao & Donnermeyer, 1996). Other models focus on the effects of occupational shifts consequent to economic constraints on farming. These models predict that increasing exposure to non-Amish lifestyles may tempt youth into defection or interfere with parents’ ability to transmit Amish identity (Meyers, 1991; Greksa, 2002). Unfortunately predictions of these more familiar approaches to Amish defection are difficult to quantify and corroborate.
Endogamy
As the most traditional and ‘plain’ Anabaptist group, the Old-Order Amish are pacifists who reserve baptism for believing adults and have strict shunning practices of deviant church members (Hostetler, 1977, 1993; Roth, 2002). Such non-hostile ethnocentrism and social solidarity introduce a near absolute boundary that culturally uncouples the Amish from the ‘English’, their word for non-Amish (Huntington, 1957; Savells, 1988). Familiar Amish norms like carrying out intragroup discussions in Pennsylvania Dutch, favouring horse-and-buggy over automobiles, traditional ‘humble’ clothing, absolute pacifism and forsaking appliances like telephones, televisions, refrigerators and indoor toilets, are means to signal this uncoupling from their broader society (Fuchs et al., 1990; Kraybill, 2001). Genealogical and genetic analysis reveals the Amish have been significantly isolated since Old-Order founding populations immigrated in the 18th and 19th centuries (Hostetler, 1993; Kraybill, 2001; Pollin et al., 2007; Hurst & McConnell, 2010; Lee et al., 2010). Arguably the Old-Order Amish are the most culturally and reproductively isolated Anabaptist group in the United States (Nolt, 1992; Hostetler, 1993; Kraybill, 2001).
Following this isolation, the Old-Order substitute communal aid for secular public support. Initiatives like welfare, Social Security, life insurance, barn insurance (Hurd, personal communication), government agricultural subsidies or public schools (if Amish schools are available) are therefore prohibited or strongly discouraged (Hostetler, 1969, 1993; Fuchs et al., 1990; Hurst & McConnell, 2010; Kraybill, 2011) in the interests of communal self-reliance. Amish substitutes for institutional support are primarily mediated through kin networks so extended families bear the emotional and financial burden of the welfare of their members (Hurd, 1983; Hostetler, 1993; Hurst & McConnell, 2010). Children, especially males, rely on kin support to start businesses and farms when they begin their own families (Ericksen et al., 1979, 1980; Hurd, 1981; Wasao & Donnermeyer, 1996). Highly fertile Amish mothers draw on relatives to help with their large families (Hewner, 1998; Kraybill, 2001). Costs of disabled, ill, and senior members are mitigated by family and community support (Hewner, 1998). The Amish are a healthy population with low incidence of infectious disease, low infant and adult mortality rates, high fertility and high standards of living (McKusick et al., 1964a, b; Cross, 1969; Cross & McKusick, 1970; Fuchs et al., 1990; Hewner, 1998; Greksa, 2002).
The 16PF
Wittmer (1970) reported the results of a personality questionnaire administered to 25 Amish and 25 non-Amish, 18- to 20-year-old men from Daviess County, Indiana. The questionnaire was a widely used personality assessment called 16PF (for sixteen personality factors) developed in the 1940s by Raymond Cattell and others at the University of Illinois (Cattell & Eber, 1972). It went through several revisions over the years. Each form has 180 questions that are scored to yield sixteen numbers, or factors, supposed to represent the subjects’ sixteen dimensions of personality. These are listed in Table 1 along with the adjectives in common use that supposedly describe each trait.
No use is made of the details of the traits in this paper: the sixteen numbers for each person in the study are simply regarded as self-reports of interesting characteristics. The test and the scoring system are designed to maximize the independence of the factors so the scores are not redundant indicators of the same trait. In some of the literature factor B is not reported nor used since it assesses intelligence rather than a personality trait. Here B is included since these data are treated in the same way as genetic markers like SNPs would be treated, i.e. the authors are agnostic about the interpretation and meaning of the factors.
To make sense out of the array of scores for the Amish and non-Amish sample a conventional method for data reduction is used called principal components analysis (PCA). This provides a least-squares best two-dimensional picture of differences among subjects and differences (i.e. correlations) among variables.
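The data reduction step can be sketched as follows. This is a generic PCA by singular value decomposition using NumPy, not the authors’ own analysis, and the input is random placeholder data with the same shape as the study (50 subjects by 16 factors); none of the numbers it produces are Wittmer’s. The function name `pca_scores` is hypothetical.

```python
import numpy as np


def pca_scores(data: np.ndarray, n_components: int = 2):
    """Project rows (subjects) onto the leading principal components.

    Returns subject scores, variable loadings and the fraction of total
    variance carried by each retained component.
    """
    centered = data - data.mean(axis=0)              # centre each variable
    u, singular, vt = np.linalg.svd(centered, full_matrices=False)
    scores = u[:, :n_components] * singular[:n_components]
    loadings = vt[:n_components].T                   # one column per component
    explained = singular ** 2 / np.sum(singular ** 2)
    return scores, loadings, explained[:n_components]


# Placeholder data only: 50 subjects x 16 factor scores drawn at random.
rng = np.random.default_rng(0)
placeholder = rng.normal(size=(50, 16))

scores, loadings, explained = pca_scores(placeholder)
print(scores.shape, loadings.shape, explained.round(3))
```

Truncating the decomposition at two components gives the least-squares best rank-two approximation of the centred data, which is the sense in which the two-dimensional picture is optimal.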
The process used is analogous to that of estimating IQ. An important finding from cognitive testing is that cognitive ability is essentially one-dimensional. For example, people with high mathematical ability also have large vocabularies. One empirical result of this is that almost anything constructed to test ability functions as a suitable proxy for a proper IQ test. SATs, GREs, MCATs and AFQTs are, for example, nearly interchangeable with IQ tests (Frey & Detterman, 2004; Beaujean et al., 2006). In addition, the United States General Social Survey (GSS) surveys a large sample of Americans every few years. The results are conveniently available on the web (General Social Survey, 2012). Each round of the survey presents each respondent with ten words, asks for their definitions, and then tabulates the variable WORDSUM that varies from 0 to 10 for that subject. WORDSUM is a useful proxy for IQ in the sense that differences among groups along the variable replicate reported differences among the same groups (Jensen, 1998) from more elaborate tests. The PCA of the results from Wittmer’s study may reveal an analogous one-dimensional personality trait corresponding to the AQ.
Scattergrams along principal components (groups) and principal co-ordinates (traits) from the 16PF results are shown in Figs 1 and 2. Figure 1 shows patterns among group means, the Amish and non-Amish from Wittmer’s study, along with a collection of other populations culled from the literature. Details are given in the caption of Fig. 1. The scatter of traits in the left panel is not very informative. There is no clear pattern, presumably reflecting the design goal of independence among the sixteen factors. The scatter among groups, in the right panel, is more interesting. There are three groups that are on the edge of the scatter: Amish, Chinese nurses and candidates for high-level management positions in the UK. The outstanding (AN) and ordinary (EN) Chinese nurses (see Zhang et al., 2013, for data) are nearly identical to each other on these first two dimensions. Generic English (UK), according to data taken from Bartram (1992), are more similar to Indiana rural young men than they are to high-level managers in the UK. Figure 1 shows a broad perspective on group differences, especially the difference between the British norms and the non-Amish Indiana rural males.
The 16PF test itself is proprietary and widely used in personnel selection. Comparable data in the literature are scarce, accounting for this strange choice of Chinese nurses and UK managers. They were simply chosen by convenience to convey a sense of the magnitude of group differences. Even more dismaying is that a standard way to present 16PF results is to normalize everyone in a sample to a fixed mean. Consequently much of the published 16PF data are not comparable between groups.
The interest in Fig. 2 is to show individual differences among young rural Indiana men. The scatter shows a nearly complete separation of the Amish and non-Amish along the horizontal axis (the first principal component). The UK mean is shown for comparison in the right panel as a diamond. Remarkably, the overall UK mean is near the centre of the non-Amish scatter, implying that they are not very different. The Amish are strongly different. In a genetic model the differences are partially the accumulated effects of Amish voluntary endogamy along with boiling-off of the young Amish in each generation who decide to leave the Amish community.
The scatter of traits in the left panel of Fig. 2 helps identify those traits distinguishing the Amish and non-Amish along the horizontal axis. Traits G, Q3 and I on the extreme left of the horizontal axis are in the space of those with the highest AQ. The 16PF scores show that Amish describe themselves, albeit indirectly, as conscientious and persistent (G), tough-minded and self-reliant (I), and controlled (Q3). Correspondingly they describe themselves as low in Q1 and E, scoring respectively as conservative/traditional and humble/co-operative. These imputed self-characterizations agree remarkably well with stereotypes of the Amish that they hold themselves according to ethnographies. The authors are agnostic about the meaning and uses of the 16PF and other similar tests except as indirect self-descriptions by subjects.
A model of (self-)selection
Under this model the horizontal axis in Fig. 2 is a numeric scale of a trait AQ identifying where an individual is in a suite of traits differentiating young Amish men from their non-Amish neighbours. Movement out of the Amish community is open to anyone, especially around adolescence. What would be the consequences for the Amish community if individuals on the low end of AQ were especially prone to leaving the population?
It is informative to bring a quantitative genetics model to the phenomenon as a well understood baseline for evaluating competing models. In this spirit it is assumed that AQ is a partially heritable quantitative trait. There are enough data on behavioural or psychological trait heritabilities to sensibly estimate the additive heritability of AQ, written h², at 0.5. It can be seen below that the conclusions are unaltered if the heritability is set to either plausible limit, 0.25 or 0.75. Boiling-off rates vary among communities but 0.10 (i.e. 10% of youth choose to leave per generation) is a reasonable contemporary value.
Following standard practice in quantitative genetics it is now assumed that AQ has an underlying normal or Gaussian distribution in the Amish population. The 10% boiling-off rate is equivalent to losing the bottom 10% of that distribution each generation. It is also assumed that the boiling-off mechanism is equivalent to truncation selection, but other selection schemes will have some equivalent alternative truncation point.
With 10% with the lowest AQ leaving each generation, the average AQ of those who remain is 0.20 standard deviations if the population mean before emigration was 0. In other words this amount of self-selective emigration leaves behind a population whose average AQ is 0.2 standard deviations greater than it was before the emigration. Assume that mating occurs exclusively within the Amish. Fixing the heritability of AQ at 0.5 means that the offspring will regress halfway back to their parental mean.
The next generation of children will therefore have an average AQ of 0.10 standard deviations greater than their parents did before emigration. The process of selective emigration repeats each generation, increasing the mean Amish AQ by one-tenth of a standard deviation per generation. With 25 years per generation, ‘Amishness’ will increase by a full standard deviation in ten generations, or 250 years, unless changes in gene frequency lead to changes in heritability. This is substantial social evolution on a time scale of a few centuries.
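The per-generation calculation can be sketched with the standard library alone. The sketch assumes, as the text does, a standard normal distribution of AQ, truncation of the lowest 10% and a heritability of 0.5; it further assumes that the variance and heritability stay constant across generations. The function names are hypothetical.

```python
from statistics import NormalDist

STD_NORMAL = NormalDist()  # mean 0, standard deviation 1


def mean_of_retained(fraction_leaving: float) -> float:
    """Mean (in SDs) of those who remain after the lowest fraction leaves.

    For a standard normal trait truncated from below at cutoff z, the mean
    of the retained upper tail is pdf(z) / (1 - cdf(z)).
    """
    cutoff = STD_NORMAL.inv_cdf(fraction_leaving)
    return STD_NORMAL.pdf(cutoff) / (1.0 - fraction_leaving)


def response_per_generation(h2: float, fraction_leaving: float) -> float:
    """Breeder's equation with the truncation selection differential."""
    return h2 * mean_of_retained(fraction_leaving)


selection_differential = mean_of_retained(0.10)    # about 0.195 SDs
response = response_per_generation(0.5, 0.10)      # about 0.0975 SDs
generations_per_sd = 1.0 / response                # a little over 10

print(round(selection_differential, 3), round(response, 4), round(generations_per_sd, 1))
```

The unrounded values are slightly below the 0.20 and 0.10 used in the text, so a gain of one standard deviation takes a little more than ten generations.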
The modelled AQ of an individual is simply the value of that individual along the x-axis of the right panel of Fig. 2. The computed standard deviations of AQ for the Amish and non-Amish subjects are 1.33 and 1.31 respectively, giving the within-group standard deviation to be 1.32 units. Interestingly the variances within groups, i.e. the squared within-group standard deviations, differ very little. It is not an unusual finding in experiments that selection has only small effects on the variance of a selected population. Falconer and Mackay (1995) discussed the outcome of experiments at length in their chapter 12.
The means of the two groups are −1.4 (Amish) and 1.4 (non-Amish) units. Their difference, corresponding to the distance between the two group means, is 2.8 units. Our model implies a difference of this magnitude requires 28 generations, or 700 years, which is substantially longer than the Amish have been in the United States. This may suggest that the heritability of AQ is higher than the assumed 0.5, that the boiling-off rate was higher in the past (for which there is support), or that the process has been going on longer than ten generations. This last possibility is unlikely. Currently no Old-Order exist in Europe. Persecution and political sanctions in the 17th and 18th centuries had spread European Amish families too far apart to form cohesive communities, forcing early European Amish to eventually assimilate with non-Amish (Gascho, 1937; Hostetler, 1977; Crowley, 1978; Nolt, 1992; Roth, 2002). Thus the magnitude of Amish endogamy was probably negligible in Europe compared with Amish endogamous mating in the United States. If correct, this suggests the Old-Order tenure in Europe did not contribute to the differences reported here.
Taking the results shown in Fig. 2 at face value implies the between-group difference (2.8 standard deviations) is too large to be attributable to just simple genetic change with a predicted difference of 1.0 standard deviations, or 1.5 standard deviations if heritability is assumed to be as high as 0.75. The best conclusion from comparing the data with the simple genetic model is that other social forces, e.g. family transmission, are mostly responsible for the difference between the Amish and non-Amish young men.
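The sensitivity of this comparison to the assumed heritability can be laid out in a short sketch. It uses the rounded selection differential of 0.2 standard deviations per generation, ten generations and the observed difference of 2.8 from the text; the function names are hypothetical, and the non-Amish mean is assumed not to have moved over the same period.

```python
SELECTION_DIFFERENTIAL = 0.2  # SDs per generation, rounded as in the text
GENERATIONS = 10              # approximate Amish tenure in North America
OBSERVED_DIFFERENCE = 2.8     # difference between group means in Fig. 2


def predicted_difference(h2: float, generations: int = GENERATIONS) -> float:
    """Cumulative response after repeated truncation with constant h2."""
    return h2 * SELECTION_DIFFERENTIAL * generations


def generations_needed(h2: float, target: float = OBSERVED_DIFFERENCE) -> float:
    """Generations of constant selection required to reach the target."""
    return target / (h2 * SELECTION_DIFFERENTIAL)


for h2 in (0.25, 0.5, 0.75):
    print(h2, round(predicted_difference(h2), 2), round(generations_needed(h2), 1))
```

Even at the upper heritability of 0.75 the predicted difference after ten generations is about 1.5, well short of the observed 2.8.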
Homicide decline in European history
Much of quantitative genetics is about some measurable metric character in a population. A convenient and useful way to study many discrete characters is to treat them as threshold characters that manifest only when an underlying hidden quantitative trait exceeds some threshold. Cleft palate, for example, is a failure of the palate to close at the midline during intrauterine development. The underlying metric trait in the model is the strength of the mechanism leading to midline fusion. Fusion fails for trait values below a threshold strength of this mechanism. The model further specifies what the incidence should be in relatives of varying degree, and it predicts cleft palate risk in relatives of those affected quite closely (Fraser, 1970).
Cloninger et al. (1975b) proposed a threshold model for the prevalence of sociopathy and hysteria in humans. In their model both disorders reflect the same underlying heritable trait but the sociopathy liability threshold in females is higher than the male threshold. They consider hysteria to be a less deviant form of sociopathy in women whose liability is above the male, but below the female, threshold. Sociopathy prevalence in the general population, the higher proportion of sociopathy among relatives of female sociopaths, and the clustering of both male and female sociopaths within families closely fit the predictions of the model (Cloninger et al., 1975a, b).
Eisner (2001) tabulated the historical decline in homicide rates, excluding war deaths, in several European countries from the fourteenth to the twentieth century. A typical finding is that homicide rates fell from 40 per 100,000 people in 1300 to 1 per 100,000 in 1900, a forty-fold decline. This striking social change seems ‘large’. A threshold model helps to sharpen this intuition.
This simple model posits that the propensity to murder is a normally distributed trait in the population. Individuals with high propensity will murder while those with low propensity will not. Following Eisner, the threshold for homicide in 1300 is placed so that 40 per 100,000 people exceed it. Tables of the standard normal distribution show that this corresponds to the right tail of the propensity distribution beyond 3.35 standard deviations. After 600 years the rate had declined to 1 in 100,000, corresponding to the tail beyond 4.26 standard deviations. If the whole distribution moved with no change in the variance, as expected with the simple Fisher model, the shift was just 4.26 − 3.35 ≈ 0.9 standard deviations. For comparison, on a stature scale in which the standard deviation is 7.26 cm this corresponds to a change in mean stature of 6.53 cm in 600 years. This amount of change in 600/25, or 24, generations amounts to just over 1/4 cm per generation, which is on the order of known changes in stature in European history. From this perspective the decline in homicide is not especially remarkable and plausibly reflects genetic change. This is not to claim that genetic change accounts for most of the decline in homicide. However, it does suggest that this model is an informative yardstick for measuring how substantially homicide rates declined.
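The table look-ups in this calculation can be reproduced with the standard normal quantile function. The following is a minimal sketch using only the Python standard library; the function and variable names are illustrative, and the generation length of 25 years and the stature standard deviation of 7.26 cm are the values used in the text. Because the code carries more decimal places than the rounded shift of 0.9, its stature figures come out slightly larger than 6.53 cm.

```python
from statistics import NormalDist

STANDARD_NORMAL = NormalDist()  # mean 0, standard deviation 1


def upper_tail_z(rate_per_100k):
    """Threshold z such that the area to its right equals the given rate."""
    p = rate_per_100k / 100_000
    # By symmetry, the upper-tail threshold is minus the lower-tail quantile.
    return -STANDARD_NORMAL.inv_cdf(p)


z_1300 = upper_tail_z(40)  # threshold implied by 40 homicides per 100,000
z_1900 = upper_tail_z(1)   # threshold implied by 1 homicide per 100,000
shift_sd = z_1900 - z_1300

# Re-express the shift on the stature scale used in the text.
STATURE_SD_CM = 7.26
generations = 600 / 25
cm_per_generation = shift_sd * STATURE_SD_CM / generations

print(f"thresholds: {z_1300:.2f} and {z_1900:.2f} SD")
print(f"shift: {shift_sd:.2f} SD, about {cm_per_generation:.2f} cm per generation")
```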
Modifying the assumptions of this plausible quantitative genetics model yields some insight into its robustness. Assume, for example, that murderers have a variable number of victims and that the average number of victims is two. The proportion of murderers in the population is then half the homicide rate, so the thresholds are estimated from a decline from 20 to 0.5, rather than 40 to 1, per 100,000 people. These proportions correspond to 3.54 and 4.42 standard deviations in the right tail of the normal curve. The change is therefore about 0.9 standard deviations, the same result found in the simple model. The qualitative conclusion of unremarkable change in 600 years remains the same.
A perhaps more realistic elaboration of the model is that murderers murder other murderers. The homicide rate under these conditions reflects not the density, but the square of the density, of murderers, assuming a mass action rate of random encounters between murderers. In this case the relevant quantity is the square root of the homicide rate, so that the tails of the distribution in 1300 and 1900 are respectively at 2.05 and 2.73 standard deviations to the right of the mean. This elaborated model predicts a shift in the distribution of just 0.7 standard deviations, even less than in the simple model. It is concluded that the simple quantitative genetic model is fairly robust to perturbations. Frost and Harpending (2014) examined English homicide and execution rates, in addition to death rates of murderers awaiting trial, during this period in more detail. They also concluded that genetic transmission could account for much of the social change.
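The robustness check can be laid out side by side by recomputing the thresholds for each variant from the rates of 40 and 1 per 100,000. This is a sketch under the assumptions stated in the text: an average of two victims per murderer in the second variant, and a homicide rate proportional to the square of murderer density in the third. The names are illustrative only.

```python
from math import sqrt
from statistics import NormalDist


def upper_tail_z(p):
    """Threshold z such that the area to its right equals p."""
    return -NormalDist().inv_cdf(p)


def shift(p_start, p_end):
    """Movement of the threshold, in SD units, between two tail proportions."""
    return upper_tail_z(p_end) - upper_tail_z(p_start)


RATE_1300 = 40 / 100_000
RATE_1900 = 1 / 100_000

# Proportion of murderers implied by each variant at the start and end.
variants = {
    "simple": (RATE_1300, RATE_1900),
    "two victims per murderer": (RATE_1300 / 2, RATE_1900 / 2),
    "mass action (square root)": (sqrt(RATE_1300), sqrt(RATE_1900)),
}

shifts = {name: shift(start, end) for name, (start, end) in variants.items()}
for name, value in shifts.items():
    print(f"{name}: shift of {value:.2f} SD")
```

All three shifts fall between roughly 0.7 and 0.9 standard deviations, which is the sense in which the model is robust to these perturbations.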
Conclusions
In this paper an index was constructed from responses to a personality questionnaire that distinguishes Old-Order Amish from their non-Amish neighbours. The associated genetic model posited that the differences between the two populations are a consequence of selective ‘boiling-off’ of Amish individuals over time. This process is equivalent to truncation selection from the viewpoint of the Amish population since those who found the community least congenial left. This example illustrates that groups in particular demographic or occupational niches with reproductive assortment can undergo genetic change that significantly differentiates them from other groups.
The data suggest that the simple model incompletely accounts for the observed group differences. Even if the heritability of AQ is high, differences accumulate under genetic transmission at about 0.1 standard deviations per generation. The difference between the group means is 2.8 standard deviations. If it were a consequence of purely genetic change, it would require 28 generations, or 56 generations with a more reasonable heritability estimate of 0.25. Historical evidence suggests that the pattern is no older than ten generations. The implication is that the difference between the Amish and their neighbours mostly reflects strong cultural transmission. Unfortunately, previous cultural transmission models of Amish defection proposed by Markle and Pasco (1977), Ericksen et al. (1979, 1980), Meyers (1991, 1994), Wasao and Donnermeyer (1996) and Greksa (2002) give only plausibility arguments that are difficult to quantify and falsify. On the other hand, quantitative genetic theory is well established in agriculture and other areas of biology like biometrics and conservation biology. The present study illustrates the utility of quantitative genetic models as a yardstick for describing social traits and evaluating transmission models, suggesting that quantitative genetics deserves more attention and use in human evolutionary biology.
The historical decline in European homicide rates since the fourteenth century was also considered. Gregory Clark (2007) described this decline as part of a larger heritable change in Europe leading up to the Industrial Revolution. Several variants of a simple model of selection against homicide indicate such genetic change is entirely plausible.
Acknowledgments
The authors thank Adrian Bell, Stephen Beckerman, Elizabeth Cashdan, Gregory Cochran, James Lee, Alan Rogers and Ryan Schacht for their comments and criticisms, which substantially improved this paper.