1. Introduction
The classical result of Haldane (Reference Haldane1927) that a beneficial mutation with selective advantage s will reach fixation with probability approximately equal to 2s, has been an important contribution to evolutionary theory. The derivation of this simple result was based on assuming a closed population with given constant size as well as Poisson distributed family size. However, determining probabilities of fixation is a lot more complicated under more realistic assumptions, such as in models including spatial variation (Whitlock, Reference Whitlock2002; Cherry & Wakeley, Reference Cherry and Wakelay2003; Wakeley, Reference Wakeley2003; Taylor, Reference Taylor2008) and fluctuations in population size (Ewens, Reference Ewens1967; Otto & Whitlock, Reference Otto and Whitlock1997; Phillips, Reference Phillips1997; Pollak, Reference Pollak2000; Campbell, Reference Campbell2003; Lambert, Reference Lambert2006; Parsons & Quince, Reference Parsons and Quince2007). Here we analyse the effect of fluctuating population size described by a stationary stochastic process. Most real populations often show large fluctuations due to environmental effects on all individuals, and the fate of a new beneficial mutation is mainly determined by the dynamics of the subpopulation with the mutation during the initial several generations after it arises. The process describing the number of mutants is therefore complex, with stochasticity in the gene transmission as well as temporally correlated noise due to stochastic fluctuations in population size.
Otto & Whitlock (Reference Otto and Whitlock1997) derived the fixation probability for a beneficial mutation in a fluctuating population. Their derivation and results are, however, based on the assumption that the sequence of population sizes through time is known. Hence, although their theory is stochastic in the sense that it is based on the classical result of Haldane (Reference Haldane1927) with stochastic variation in gene frequency (random genetic drift), the population dynamical part of their theory is deterministic. Here we extend this theory by incorporating stochastic population dynamics. We show that the expected fixation rate of beneficial mutations depends primarily on the environmental variance causing stochastic population fluctuations, rather than the harmonic mean population size through time. The classical formula of Wright (Reference Wright1931, Reference Wright1969) for effective population size based on harmonic mean population size through time is expected to underestimate the mean probability of fixation when temporal fluctuations are viewed as stochastic rather than deterministic.
Haldane (Reference Haldane1927) assumed that family size is Poisson-distributed with mean 1 for all individuals except those bearing the mutation, for which the Poisson mean is 1+s. We employ a similar model in which the family size produced by an adult individual is defined as the number of offspring surviving to maturity, assuming that the number of surviving offspring per family is Poisson-distributed for each genotype. This model also would apply if all adults of a given genotype produce large, nearly equal numbers of offspring, which suffer a high non-selective mortality, leaving an approximately Poisson distribution of number of surviving offspring per family. Haldane further assumed a large population size, Ns≫1 so that selection is more important than random drift at intermediate frequencies of the mutant allele. For a fluctuating population we accordingly assume that the mean population size is much larger than 1/s. Otto & Whitlock (Reference Otto and Whitlock1997) made the same assumption of Poisson family size and derived the probability u i that a mutation in generation i will be fixed for a given sequence of population sizes N 1, N 2, …. A single mutation in generation i will have a number of copies next generation that is Poisson-distributed with mean λ=(1+s)N i+1/N i. Hence the total probability that the mutation will not be fixed is
(Ewens, Reference Ewens1967; Otto & Whitlock, Reference Otto and Whitlock1997). Haldane's result appears by assuming all population sizes equal, N i=N, and u i=u, in which case the solution is approximately u=2s for a small s. With changing population size eqn (1) can be solved iteratively for a given sequence N i, but only as a function of the unknown initial probability u 1. For cyclic populations, however, the formula defines probabilities of fixation by the fact that u i takes the initial value after one cycle. Hence, computing u i+c as a function of u 1, where c is the cycle length, and solving numerically u 1=u 1+c gives the complete solution. Also for populations approaching a limiting size, the probabilities can be found utilizing the probability at equilibrium, which is the same as for constant population size (Otto & Whitlock, Reference Otto and Whitlock1997).
If a single mutation produces k copies in the next generation, the factor (1−u i+1)k in eqn (1) is based on the assumption that future contributions from these alleles are independent. This is a realistic assumption if the future population sizes are known. However, with stochastic population fluctuations this assumption of independence is no longer valid. Environmental stochasticity driving the fluctuations in population size exerts the same impact on all individuals with the mutant gene, violating the assumption of independence among the branching processes for copies from the original mutation.
This paper focuses on deriving an accurate approximation to the probability of fixation for fluctuating populations described by a stationary Markov process. For such populations the probability of fixation of a beneficial mutation at population size N must depend on N only, because the Markov assumption ensures that all properties of the future population depend only on the initial population size. To derive these probabilities P(A|N)=u(x), where A denotes the event of fixation and x=ln(N/K) is the log of the initial population size scaled by the carrying capacity K, we first consider the probability of fixation conditioned on all future population sizes. However, our primary goal is not the computation of the probability u(x), which is conditioned on the population size at mutation, but rather the expected probability and rate of fixation for mutations averaged over the stationary distribution of population sizes. The expected probability of fixation, denoted ū, will only depend on the population dynamical parameters and the selective advantage s. Using the theta-logistic model (Gilpin & Ayala, Reference Gilpin and Ayala1973), we explore how the environmental variance and the strength and form of density-regulation influence the expected probability and rate of fixation of beneficial mutations.
Model
(i) Formulation and general solution
Writing Ni=(N i, N i+1, …) the recurrence formula of Otto & Whitlock (Reference Otto and Whitlock1997) takes the form
This conditional model assumes known future population sizes. Taking the expectation on both sides with respect to the distribution of Ni+2 conditioned on N i+1 yields
Conditioned on N i+1 the fixation probability P(A|N i+1, Ni+2) is a stochastic variable because it is a function of the unknown sequence of future population sizes Ni+2, and the right-hand side of eqn (2) is its moment generating function with argument −(s+1)(N i+1/N i). Numerical calculations show that its variance is quite small and could be set to zero to give a rough first-order approximation. However, a better approximation can also easily be obtained by including the variance of P(A|N i+1, Ni+2) conditioned on N i+1. This second-order approximation for eqn (2) would be the exact solution if P(A|N i+1, Ni+2) was normally distributed and is generally an accurate approximation if the variance of this variable is small. Although P(A|N i+1, Ni+2) depends on the unknown future expressed by Ni+2 it is likely to have a small variance (later exemplified by numerical solution in Fig. 2). Writing t=−(s+1)(N i+1/N i) and introducing the scaled log population size x i=ln(N i/K), we then have
where u(x i+1)=EP(A|N i+1, Ni+2)=P(A|N i+1) and σ2(x i+1)=var[P(A|N i+1, Ni+2)]. The expectation as well as the variance here refer to the distribution of Ni+2 conditioned on N i+1.
One way of formulating the stationary time series N i is to specify the distribution of the scaled population size in the next generation y=ln(N i+1/K) given the value x=ln(N i/K) the current generation, say f(y|x). Then, multiplying eqn (4) by f(y|x) and integrating over y yields
This equation for the unknown function u(x) can be used to find the first-order approximation, setting σ2(x) to zero and adopting the fixed point method. We then substitute the kth approximation for u(x), say u k(x), on the right side giving 1−u k+1(x) on the left, starting with Haldane's (Reference Haldane1927) result for a constant population u 0(x)=2s and computing subsequently μ1(x), μ2(x), … approaching the solution of eqn (5). To find the more accurate second-order approximation we introduce the second moment of 1−P(A|N i+1, Ni+2), ν(x)=σ2(x)+[1−u(x)]2. The same argument as the one used in the derivation of eqn (5), starting by squaring both sides of eqn (1), then yields
Equations (5) and (6) can now be solved jointly for u(x) as well as σ2(x) by calculating both integrals based on the kth approximation for u(x) and σ2(x), giving on the left-hand side the (k+1)th approximation for u(x) and ν(x), and hence also for σ2(x). The iterations can be initiated by choosing u 0(x)=2s and σ02(x)=0.
(ii) The theta-logistic model
A population dynamical model must now be specified to derive f(y|x). A flexible model describing a wide range of types of density regulation, magnitude of stochasticity and expected return times to equilibrium is the theta-logistic model (Gilpin & Ayala, Reference Gilpin and Ayala1973; Diserud & Engen, Reference Diserud and Engen2000; Lande et al., Reference Lande, Engen and Sæther2003). Writing ΔN for the annual change in population size, the deterministic form of the model is
where r 0 is the growth rate at N=0. However, the model is more flexible if we consider the lowest possible population size to be 1 writing the formula as the expression on the right-hand side, where r 1=r 0(1−K −θ) (Diserud & Engen, Reference Diserud and Engen2000) is the growth rate at N=1. Choosing r 1 as a free parameter and varying θ then defines a number of well-known population models. For θ=1 we recover the classical logistic model, while the limit obtained as θ approaches zero is a linear expression in lnN, often called the Gompertz model. As θ approaches infinity we arrive at a model with a ceiling at K and no density-regulation below this value. Finally, for θ=−1 the model is linear in the population size N. Alternatively, the model may be formulated on the log scale as
For small or moderate fluctuations in the population size, these models are approximately equivalent because ΔlnN≈ΔN/N. The stochastic formulation on the log scale takes the form
where r 1(t) is the growth rate at time t fluctuating between years with mean 1 (May, Reference May1973, Reference May1974; Turelli, Reference Turelli1977; Engen et al., Reference Engen, Bakke and Islam1998; Lande et al., Reference Lande, Engen and Sæther2003). For large populations the demographic stochastic component of r 1(t), due to within year variation in vital rates between individuals, can be ignored by the law of large numbers and the only stochastic components are those generated by fluctuations in the environment producing approximately a constant environmental variance σe2 in r 1(t) between years. For this model 1 is the mean growth rate on the log scale at small population sizes (N=1) called the stochastic growth rate (Lande et al., Reference Lande, Engen and Sæther2003). Again writing x and y for the value of ln(N/K) in two sequential generations, we have var(y|x)≈σe2, and
for θ≠0 and (1−γ)x for θ=0. The parameter γ is the strength of density regulation defined as 1θ/(1−K −θ)=0θ for θ≠0 and 1/ln K for θ=0. The expected return time to equilibrium defined by May (Reference May1974) for this model is T=1/γ.
Finally, the change in log population size between generations can often be approximated by a normal distribution. These assumptions then specify f(y|x) as the normal distribution with the mean given by eqn (7) and constant variance σe2, which we apply in the recurrence relations for the fixed point iterations defined by eqns (5) and (6).
(iii) Mean fixation rate
The rate of production of new mutations is proportional to the population size. Writing g(x) for the stationary distribution of the scaled log population size, the distribution of population size given that a mutation occurs (M) must be proportional to ex, that is,
Hence, for a random beneficial mutation with selective advantage s the probability of fixation is ū=∫u(x)h(x|M)dx.
Although the stationary distribution is known for the theta-logistic model based on the diffusion approximation (Diserud & Engen, Reference Diserud and Engen2000), more accurate values for the fixation probability for our model with discrete generations can be found numerically by using the fixed point method also for g(x), calculating the (k+1)th approximation as g k+1(x)=∫f(x|y)×g k(y)dy, which converges quickly. Figure 1 shows examples of g(x) as well as h(x|M).
For the Gompertz model with θ=0, the population process is a simple first-order autoregressive time series and the stationary distribution f(x) is normal with zero mean and variance υ=σe2/(2γ−γ2). The population size then has a lognormal stationary distribution with coefficient of variation . When θ≠0 the mean population size in the next generation may be linearized at x=0 to give the approximation E(y|x)≈(1−γ)x. Accordingly, the above normal distribution can also be used as an approximation for the stationary distribution in non-linear models defined by θ≠0.
(iv) Fixation rate at a given population size
Since u(x) is the probability of fixation for a single mutation at scaled population size x, the rate of fixation for mutations at population size N is R(x)=2μNu(x). Here 2μ is the genomic mutation rate for beneficial mutations, and we assume that the product of the per locus mutation rate and the population size is small enough to render unlikely any interference among beneficial mutations that could occur if multiple mutations simultaneously segregate at a single locus or among tightly linked loci. For a constant population size K, the fixation rate is approximately R 0=4sKμ. Accordingly, for population size N=K ex, we define the scaled fixation rate R(x)/R 0=u(x)N/(2sK), which measures the reduction in fixation rate due to stochastic fluctuations in population size, relative to the fixation rate for a constant population of size N=K.
3. Results
The proposed fixed point method for the joint solution of u(x) and σ2(x), based on eqns (5) and (6) with initial conditions u 0(x)=2s and σ02(x)=0, converges for relevant values of the parameters γ, θ and σe2. Examples of the function u(x) plus or minus one standard deviation σ(x) are shown in Fig. 2. The results agree with Otto & Whitlock's (Reference Otto and Whitlock1997) findings in the sense that the probability of fixation is larger for populations that are expected to grow (x<0) than for those expected to decrease (x>0). For mutations at equilibrium the probability of fixation is usually a little smaller than the probability 2s of fixation at constant population size.
In Fig. 3, we show a simulation of N/K for a stochastic logistic model together with the probabilities of fixation scaled by 2s, u(x)/(2s). This graph is qualitatively similar to Otto & Whitlock's (Reference Otto and Whitlock1997) Fig. 4, showing the fluctuations of the cyclic snowshoe hare population over a period of 20 generations (Krebs et al., Reference Krebs, Boutin, Boonstra, Sinclair and Smith1995) together with probabilities of fixation fluctuating in an opposite way, assuming deterministic cyclic fluctuations. We see from Fig. 3 that the unconditional probability, that is, the probability of fixation only conditioned on the population size at mutation and averaged over the future stochastic population fluctuations, also fluctuates opposite to the scaled population size.
Figures 2, 4 (upper panel) and 5 (upper panel) show how the environmental variance (Fig. 2), the return time to equilibrium (Fig. 4) and the form of density-regulation defined by θ (Fig. 5) affect u(x). The overall message from these examples is that changing T=1/γ and θ has surprisingly small effects except at very small population sizes. Linearizing the theta-logistic model at equilibrium gives the first-order autoregressive model with stationary variance of ln(N/K) equal to σe2/(2γ−γ2). Hence, in Fig. 4 the stationary distribution of x=ln(N/K) has standard deviations of approximately 0·1 for T=1 and 0·22 for T=10, which implies that the probability of fixation is almost unaffected by T for this magnitude of variation in x. The same argument holds for the effect of changing θ.
Figures 4–6 show examples of reduction in fixation rate measured by R(x)/R 0=u(x)N/(2sK) as functions of the scaled log population size for different values of the environmental variance σe2 (Fig. 6), the return time to equilibrium T=1/γ (Fig. 4, lower panel) and the shape of the density-regulation θ (Fig. 5, lower panel), respectively. Increasing σe2 causes a substantial reduction in the scaled fixation rate for all population sizes, whereas the effects of T and θ are quite small, especially for scaled population sizes x=ln(N/K) in the range (−0·5, 0·5), corresponding to a population size N in the range (0·61K, 1·65K).
In Fig. 7, we show how the mean probability of fixation for a random beneficial mutation ū (averaged over the stationary first-moment distribution h(x|M)) depends on σe2, γ and θ for realistic ranges of these parameters. This confirms that the only substantial influence arises from the environmental variance, an increase of which reduces the probability of fixation. Interestingly, we see indirectly that the stationary variance of the population fluctuations, which is approximately υ=σe2/(2γ−γ2) for the scaled log population size, is not by itself important. Figure 7 shows practically no influence of γ even if a change in γ leads to a large change in υ. On the other hand, an increase in the environmental variance substantially decreases the mean fixation probability. The practical conclusion from these findings is that short-term fluctuations in population size between generations produced by σe2 are much more important than the long-term stationary fluctuations in population size itself (Figs 6 and 7).
As a further illustration we consider the Gompertz model (θ=0) with an exactly lognormal stationary distribution. In Fig. 8 this distribution is kept constant while varying the strength of density regulation γ and the environmental variance σe2 so that υ=σe2/(2γ−γ2) of log population size and the of the population size N are kept constant. Figure 8 shows ū plotted against return time to equilibrium T=1/γ (lower panel) as well as against σe2 (upper panel) for realistic parameter ranges. There is a decrease in ū up to about 25% relative to the maximum value for the chosen parameter intervals although the stationary distribution remains unchanged. Figure 8 (upper panel) also shows the approximation for ū based on using the harmonic mean N e, that is, the exact value of the probability of fixation for a constant population size multiplied by 2N e/, where is the mean population time through time. It appears that this approximation underestimates ū for large values of CV.
4. Discussion
Ewens (Reference Ewens1967) considered a model where the population size cycles deterministically and showed that such fluctuations decreased the probability of fixation for a beneficial mutant. This result was later confirmed by Otto & Whitlock (Reference Otto and Whitlock1997), who concluded that fixation is more likely if the mutation occurs during increase than during decrease in population size. This was supported by a logistic diffusion model of Lambert (Reference Lambert2006) with only demographic stochasticity. He found that fixation probabilities were larger than Haldane's (Reference Haldane1927) result 2s for constant populations if mutations occurred below the carrying capacity and were smaller above. This result was confirmed by Parsons & Quince (Reference Parsons and Quince2007) using a logistic type of birth and death process for the total population size. On the other hand, in a population with immediate return to equilibrium, Campbell (Reference Campbell2003) found that fixation probabilities could be larger than for constant population size. Cyclic fluctuations were also dealt with by Pollak (Reference Pollak2000), who found, in accordance with Otto & Whitlock (Reference Otto and Whitlock1997), that the probability of fixation was approximately proportional to the inverse of population size 1/N at mutation, provided that s multiplied by the cycle length was small.
Many populations undergo fluctuations over long periods of time which often can be realistically described by a stationary stochastic process. Here, we have used the theta-logistic population model, assuming population sizes are large enough for demographic stochasticity to be ignored. This model is flexible, covering a wide range of possible types of population fluctuations, including large fluctuations of large populations generated by a fluctuating environment. The models of Lambert (Reference Lambert2006) and Parsons and Quince (Reference Parsons and Quince2007), on the other hand, have only demographic stochasticity defined through the independent birth and death of individuals and will therefore have a CV in total population size that approaches zero as the carrying capacity increases. Fluctuating environments that affect all individuals in the population in the same way generate stochastic dependence between individuals that leads to a more complicated dynamics than under demographic stochasticity alone.
The accurate approximation to the probability of fixation derived here is the ‘unconditional’ probability, conditioned only on the population size at mutation. This is rather different from the probability used by Ewens (Reference Ewens1967) and Otto & Whitlock (Reference Otto and Whitlock1997), which is conditioned on all future population sizes. Nevertheless, the results are rather similar. We find a large reduction in fixation probability at population sizes above average at the time of mutation (x>0), as these are expected to decrease towards the mean size. Considering the stochastic variation in population size for different models, exemplified by a simulation in Fig. 3, we also see that the variation in population size is large enough for the fixation probabilities to change considerably through time. The shape of these curves for unconditional probabilities is approximately proportional to 1/N, implying a nearly exponential function of x. This is typically the form of the curves shown in Fig. 2 and Figs 4–5 (upper panels) although the shape changes with the population dynamical parameters, especially the environmental variance σe2. Deviation from proportionality to 1/N is seen most clearly from Figs 6 and 8 showing the scaled fixation rates, as these should be constant functions of x if u(x) is proportional to 1/N so that u(x)N is constant.
If all individuals are equally prone to mutate, it is realistic to assume that mutations are more likely to arise in large populations, or more precisely, the probability of a single mutation is proportional to population size. Hence, the population size at mutation does not have the stationary distribution of the population process, but rather the so-called first-moment distribution multiplied by N and then scaled. For the scaled log population size x, the distribution, correspondingly, has to be multiplied by ex. Integrating the fixation probabilities u(x) over this distribution leads to the mean fixation probability for a random mutation ū with advantage s. In this model the mean fixation probability ū is rather insensitive to substantial changes in the form of the density-regulation and the return time to equilibrium (Fig. 7). The most important parameter is the environmental variance σe2 (Fig. 7) rather than the variance of the stationary distribution of population size itself. This occurs because the number of descendants of a beneficial mutation during the first few generations after it originates is crucial for its fate, as pointed out by Crow & Kimura (Reference Crow and Kimura1970) and Phillips (Reference Phillips1997). Short-term variation in population size is therefore of great importance, and is mainly determined by the environmental variance.
These conclusions are further illustrated in Fig. 7. The environmental variance is of the order of 0·01 for most populations, corresponding to a standard deviation of for fluctuations in the population growth rate between generations (Lande et al., Reference Lande, Engen and Sæther2003). With this environmental variance we see that large changes in CV caused by differences in γ produce quite small changes in ū, while changes in CV produced by differences in σe2 create much larger changes in ū.
It is commonly argued that long-term genetic drift and fixation probabilities are determined by the effective population size N e approximated by the harmonic mean population size N through time (Wright Reference Wright1931, Reference Wright1969; Waples, Reference Waples2006). Crow & Kimura (Reference Crow and Kimura1970) claimed that the mean probability of fixation is approximately 2s(N e/). As increasing the amplitude of population fluctuations tends to produce a smaller harmonic mean, the mean fixation probability of a beneficial mutation should then decrease with increasing population fluctuations. We see from Fig. 8 that this is correct, but ū may still be sensitive to changes in environmental variance σe2 when N e/ is kept constant. This occurs because the fixation of a new mutant depends heavily on its fate during the first several generations after its origination. Thus mean fixation probabilities may depend rather strongly on σe2 even if the stationary distribution of population size and the harmonic mean are kept constant.
This study was generously supported by grants from the Research Council of Norway (STORFORSK), the Royal Society of London and cord funding to the Centre for Conservation Biology from the Norwegian University of Science and Technology.