1. Introduction
A central prediction from population genetics theory is that natural selection can have an important influence on levels of linked variation, a process referred to as hitch-hiking. Negative (‘background selection’) (Charlesworth et al., 1993) and positive (‘selective sweeps’) (Maynard Smith & Haigh, 1974) selection both reduce the effective population size of linked sites, reducing neutral variation. Conversely, balancing selection can lead to elevated variation due to the long-term maintenance of selected alleles (Charlesworth et al., 1997). Because the effects of selection on linked variation depend on the amount of linkage disequilibrium between selected and neutral sites, the effects on genetic variation will be more pronounced in genomic regions, or genomes, experiencing low levels of recombination (Braverman et al., 1995; Nordborg et al., 1996). Evidence for a suppression of diversity in regions of low recombination (Begun & Aquadro, 1992), in regions of high gene density per unit of recombination (Nordborg et al., 2005) and in self-fertilizing species (Glemin et al., 2006) is broadly consistent with these predictions.
The organelle genomes of plants and animals are generally considered to be non-recombining (Barr et al., 2005), experiencing uniparental inheritance as an effectively haploid genome (Birky, 1995; Birky et al., 1983). Because of this, the action of natural selection at any site should influence patterns of variation genome-wide, leading to departures from neutral expectation across the genome. In most plants, predominant maternal inheritance of both the mitochondria and the chloroplast implies that they should generally be inherited as a linked unit and that, if frequent enough, selection acting at any region of these organelles can have an important influence on the amount of genetic variation.
To date, few studies have tested the assumptions of neutral equilibrium in organelle genomes. In Drosophila, several studies indicate a reduction in nucleotide variation in mitochondrial genomes relative to the nucleus, controlling for expected differences in effective population size due to maternal inheritance (Ballard et al., Reference Ballard, Hatzidakis, Karr and Kreitman1996; Ballard & Kreitman, Reference Ballard and Kreitman1994). These patterns have been interpreted as reflecting the presence of selective sweeps of mitochondrial variation due to the recent fixation of advantageous mutations. Recent studies have also shown signatures of selection on mitochondria in other insects (Jiggins, Reference Jiggins2003; Jiggins & Tinsley, Reference Jiggins and Tinsley2005) and these have been shown to be associated with male-killing bacteria which are inherited maternally. In humans, mitochondria show an excess of low-frequency variants, in contrast to the nuclear genome (Hey, Reference Hey1997). Although this difference could reflect the action of recent positive selection on the mitochondria, a population bottleneck in the species' history could also explain the data, since genomes with contrasting effective size will be at different stages of recovery from demographic events (Fay & Wu, Reference Fay and Wu1999). Recently, a meta-analysis of animal polymorphism datasets suggested a general lack of correlation between nuclear and cytoplasmic diversity (Bazin et al., Reference Bazin, Glemin and Galtier2006), and this was interpreted as the predominance of hitch-hiking on mitochondrial genomes (but see Wares et al., Reference Wares, Barber, Ross-Ibarra, Sotka and Toonen2006).
In plants, organelle genomes experience low mutation rates (Wolfe et al., Reference Wolfe, Li and Sharp1987; but see Cho et al., Reference Cho, Mower, Qiu and Palmer2004; Barr et al., Reference Barr, Keller, Ingvarsson, Sloan and Taylor2007). This has limited studies of direct nucleotide sequence variation (although restriction fragment length polymorphism in chloroplasts is often used in studies of phylogeography), and almost no data are available directly comparing nucleotide polymorphism relative to divergence at organelle versus nuclear genes. Because the mitochondria and chloroplast should behave as effectively haploid genomes, the effective population size (treated here as the effective number of gene copies, not individuals) should be half that of the nuclear genome in an outcrossing hermaphrodite, in the absence of natural selection (Birky et al., Reference Birky, Maruyama and Fuerst1983). However, this prediction assumes equal effective population sizes through male and female function; if the variance in fitness differs between male and female function, or migration rates through pollen and seed are different, this prediction may not hold (Laporte & Charlesworth, Reference Laporte and Charlesworth2002).
In highly selfing organisms, the effective population size of the nuclear genome is reduced by homozygosity (Nordborg & Donnelly, Reference Nordborg and Donnelly1997; Pollak, Reference Pollak1987), and under neutrality this would equalize effective sizes of the organelles and nuclear genes. Thus, under neutrality, the effective population size in selfing populations is reduced at nuclear but not cytoplasmic genomes relative to outcrossers. However, if selfing populations experience more frequent population bottlenecks than outcrossers, both organelle and nuclear variation could be reduced in highly selfing taxa (Fenster & Ritland, Reference Fenster and Ritland1992; Graustein et al., Reference Graustein, Gaspar, Walters and Palopoli2002). On the other hand, nuclear genes should experience greater hitch-hiking effects in selfing taxa compared with outcrossing relatives due to lower effective recombination rates (Charlesworth & Wright, Reference Charlesworth and Wright2001), and the net result on diversity at cytoplasmic versus nuclear genes will depend on the relative rates and distributions of fitness effects of mutations, which will determine the strength of the hitch-hiking effects.
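The neutral expectations described above can be illustrated with a minimal sketch. It assumes a hermaphroditic population of N individuals, strictly maternal organelle inheritance (so N organelle gene copies), and the standard result that inbreeding scales the effective number of nuclear gene copies to 2N/(1 + F), with F = s/(2 − s) at selfing rate s. The function names are hypothetical, and the sketch ignores bottlenecks, paternal leakage and unequal variance in male and female fitness.

```python
def inbreeding_coefficient(selfing_rate: float) -> float:
    """Equilibrium inbreeding coefficient F = s / (2 - s) for selfing rate s."""
    if not 0.0 <= selfing_rate <= 1.0:
        raise ValueError("selfing_rate must lie in [0, 1]")
    return selfing_rate / (2.0 - selfing_rate)


def neutral_nc_over_nn(selfing_rate: float) -> float:
    """Neutral expectation of N_c/N_n, counting gene copies (not individuals).

    Assumed model: nuclear copies = 2N / (1 + F), organelle copies = N,
    so the ratio is (1 + F) / 2 and does not depend on N.
    """
    f = inbreeding_coefficient(selfing_rate)
    return (1.0 + f) / 2.0


if __name__ == "__main__":
    for s in (0.0, 0.5, 1.0):
        print(f"selfing rate {s:.1f}: expected N_c/N_n = {neutral_nc_over_nn(s):.3f}")
```

Under these assumptions the ratio rises from 0·5 under complete outcrossing to 1 under complete selfing, which are the two neutral reference values used for comparison in this study.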
The low rates of substitution may imply that plant organelles are unlikely to experience frequent hitch-hiking, due to insufficient rates of deleterious and advantageous mutation. However, despite low nucleotide substitution rates, plant mitochondria appear to experience high rates of genome rearrangement (Palmer et al., Reference Palmer, Adams, Cho, Parkinson and Qiu2000), and it is possible that many of these events are subject to both frequent negative and positive selection. Indeed, many mitochondrial rearrangements have been identified that are associated with cytoplasmic male sterility (Schnable & Wise, Reference Schnable and Wise1998), and they can be under strong directional or balancing selection in plant populations (Saur Jacobs & Wade, Reference Saur Jacobs and Wade2003). Similarly, gene order has been inferred to be subject to positive and negative selection in the chloroplast (Cui et al., Reference Cui, Leebens-Mack, Wang, Tang and Rymarquis2006). If genome rearrangements are common and subject to natural selection, this can leave important signatures on patterns of nucleotide variation.
Despite the low rates of substitution, a number of recent studies have found evidence for significant levels of polymorphism in plant mitochondria and chloroplast genomes (Houliston & Olson, Reference Houliston and Olson2006; Stadler & Delph, Reference Stadler and Delph2002). These studies have focused on gynodioecious species that maintain polymorphisms for cytoplasmic male sterility (CMS). The patterns of nucleotide variation are suggestive of the action of long-term balancing selection associated with this sexual system, potentially reflecting frequency-dependent selection on distinct male sterility cytotypes (but see Barr et al., Reference Barr, Keller, Ingvarsson, Sloan and Taylor2007). In contrast, a recent survey of chloroplast variation in dioecious Silene (Muir & Filatov, Reference Muir and Filatov2007) suggested a reduction in chloroplast variation consistent with a selective sweep. However, few comparative data have been available from hermaphroditic taxa, and variation at multiple nuclear genes has not been compared directly with variation at organelle genes, to uncouple effects of species and population history from specific effects on cytoplasmic variation.
Despite the predominance of maternal inheritance in most plants, another open question concerns the extent of recombination occurring in cytoplasmic genomes. Paternal leakage of mitochondria and chloroplasts may be common in plants (Barr et al., Reference Barr, Neiman and Taylor2005), and recent studies using quantitative PCR suggest that this can have significant effects on organelle genotypes in natural populations (Welch et al., Reference Welch, Darnell and McCauley2006). Paternal leakage can generate heteroplasmy, or the presence of multiple cytotypes in a single individual, and if recombination occurs in heteroplasmic individuals this can act to break down linkage disequilibrium within and between the organelle genomes. Recent analyses of polymorphism levels of chloroplast and mitochondria have provided evidence for some historical recombination in plants (Houliston & Olson, Reference Houliston and Olson2006; Marshall et al., Reference Marshall, Newton and Ritland2001; Stadler & Delph, Reference Stadler and Delph2002), although the overall frequency of recombination in organelles remains poorly characterized.
In this study we analyse nucleotide polymorphism in the chloroplast and mitochondria of the predominantly self-incompatible, outcrossing plant Arabidopsis lyrata, in order to quantify the effective population size and test for selection acting on these genomes. Arabidopsis lyrata has a circumpolar distribution, and is characterized by high levels of nucleotide polymorphism at nuclear genes (Savolainen et al., Reference Savolainen, Langley, Lazzarro and Fréville2000; Wright et al., Reference Wright, Lauga and Charlesworth2003; Ramos-Onsins et al., Reference Ramos-Onsins, Stranger, Mitchell-Olds and Aguade2004). Studies to date suggest elevated levels of nucleotide diversity at nuclear genes in A. lyrata compared with the highly selfing model plant A. thaliana (Savolainen et al., Reference Savolainen, Langley, Lazzarro and Fréville2000; Wright et al., Reference Wright, Lauga and Charlesworth2003; Ramos-Onsins et al., Reference Ramos-Onsins, Stranger, Mitchell-Olds and Aguade2004), although the relative role of population history and selection in causing these differences remains unclear. We make use of a polymorphism dataset from 53 nuclear genes in A. lyrata (Wright et al., Reference Wright, Foxe, Derose-Wilson, Kawabe and Looseley2006; S. I. Wright, J. Ross-Ibarra, J. P. Foxe, A. Kawabe, L. DeRose Wilson, G. Gos, D. Charlesworth & B. S. Gaut, submitted) to quantify the relative effective population size of the organelles. We also make use of published datasets on nuclear (Nordborg et al., Reference Nordborg, Hu, Ishino, Jhaveri and Toomajian2005) and chloroplast (Jakobsson et al., Reference Jakobsson, Sall, Lind-Hallden and Hallden2007) variation in the highly selfing Arabidopsis thaliana, to test for effects of selfing on the relative effective population sizes.
2. Materials and methods
(i) Samples
A total of 24 individuals from distinct maternal families, from four populations of Arabidopsis lyrata, were used in the initial survey: 12 individuals from Plech, Germany (from T. Mitchell Olds), 4 individuals from Rondeau Provincial Park, Ontario, Canada (collected by B. Mable), 4 individuals from Indiana Dunes, Indiana, USA (collected by B. Mable) and 4 individuals from Stubbsand, Sweden (collected by O. Savolainen). Our sampling emphasized the Plech population, since this population has been shown to be the most highly polymorphic (Wright et al., 2006; Wright et al., submitted), and may be closest to demographic equilibrium (Clauss & Mitchell-Olds, 2006), while the other populations may have experienced population bottlenecks associated with postglacial recolonization (Wright et al., 2006; Wright et al., submitted). It should also be noted that a breakdown of self-incompatibility and high selfing rates have been documented in the Rondeau population (Mable et al., 2005). Possible effects of this breakdown on our results are addressed in Section 4. In addition, to extend the analysis of polymorphic sites and further test for evidence of recombination and population subdivision, we sequenced all polymorphic loci (see Table 1) in three additional individuals from Sweden, and eight individuals from Karhumäki, Russia (collected by O. Savolainen). Because these samples were sequenced only at the subset of polymorphic loci used for the recombination and differentiation analysis, they do not give an unbiased estimate of overall diversity, and are not included in the comparison of polymorphism levels. DNA from a single plant per maternal family was extracted using the FastDNA kit (MP Biomedicals, Solon, OH, USA) and used as template for PCR amplification.
PCR conditions consisted of 35 cycles of 20 s at 94°C, 20 s at 55°C, 40 s at 72°C, followed by a 4 min elongation at 72°C.
‘.’ indicates nucleotides identical to the top sequence, shown as the outgroup base from A. thaliana; ‘mv’ indicates missing values.
Populations correspond to Plech, Germany (GER); Rondeau, Canada (CAN); Indiana Dunes, USA (USA); Stubbsand, Sweden (SWE); and Karhumäki, Russia (RUS).
syn, synonymous; rep, replacement; non, non-coding.
(ii) Loci and PCR amplification
Primers were designed using PrimerQuest (Integrated DNA Technologies, http://biotools.idtdna.com/primerquest/) to amplify a total of 12 chloroplast fragments and 16 mitochondrial fragments, with product sizes ranging from 500 to 700 bp, using the complete organelle genomes of Arabidopsis thaliana (chloroplast, GenBank accession number AP000423, mitochondria, GenBank accession number Y08501). Loci include both coding and non-coding genomic regions. A summary of the lengths, positions, primer sequences and annotation of the regions is given in Supplementary Table S1. PCR products were sequenced directly on both strands by Cogenics (Houston, TX, USA).
Note: loci in bold are found in the inverted repeat of the chloroplast genome, hence the dual position information.
(iii) Analysis
Summaries of polymorphism data, including Watterson's (1975) estimator of θ = 4N_e u, where N_e is the effective population size and u is the mutation rate; π, the average pairwise differences (Tajima, 1989); Rm, the minimum number of recombination events (Hudson & Kaplan, 1985); Wright's (1951) F_st; and Tajima's (1989) D, were calculated using DnaSP version 4.10 (Rozas et al., 2003). Maximum likelihood estimates of the population parameter θ = 4N_e u given the number of segregating sites were calculated using the equations given in Hudson (1991) as described by Wright and colleagues (2003). In addition, we used a maximum likelihood implementation of the HKA test, mlHKA (Wright & Charlesworth, 2004), to estimate the ratio of effective population sizes in the cytoplasm relative to the nucleus (N_c/N_n). Note that for the purposes of this study we considered N_n as the effective number of nuclear gene copies (not individuals), such that the neutral expectations of N_c/N_n are 0·5 under complete outcrossing, and 1 under complete selfing for a hermaphroditic plant. We used polymorphism and divergence data from a comparable multi-population sample of 53 nuclear loci from large exons (Wright et al., 2006; Wright et al., submitted) combined with the observed polymorphism and divergence data from the chloroplast and mitochondria. The published nuclear sequence data were surveyed in a total of 71 diploid individuals (up to 142 chromosomes) from the same populations surveyed here, as well as a population from Esja Mountain, Iceland.
Given the comparable patterns of species-wide nuclear diversity with the inclusion versus exclusion of Iceland (data not shown), the addition of another population in the nuclear dataset is unlikely to bias our results, and was retained to increase power for estimating N_c/N_n. However, we also explored within-population estimates of N_c/N_n for the German population and the pooled North American population, where sufficient sampling of individuals and samples allows. North American populations show relatively low levels of differentiation, and were pooled to allow sufficient power to estimate N_c/N_n, although it should be noted that we are pooling across populations with distinct outcrossing rates, as addressed in Section 4.
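As an illustration of the summary statistics named above, the following sketch computes the number of segregating sites S, π, Watterson's θ and Tajima's D from a small alignment using the standard published formulas. It is not the DnaSP implementation: the function name and the toy sequences are hypothetical, missing data are not handled, and all values are per locus rather than per site.

```python
from itertools import combinations
from math import sqrt


def polymorphism_summary(seqs):
    """Return S, pi, Watterson's theta and Tajima's D for aligned sequences.

    Assumes equal-length strings with no missing data. Values are per locus;
    divide theta_w and pi by the number of sites for per-site estimates.
    """
    n = len(seqs)
    if n < 4:
        raise ValueError("Tajima's D has zero variance terms for n < 4")
    length = len(seqs[0])

    # Segregating sites: alignment columns with more than one state.
    s = sum(1 for i in range(length) if len({q[i] for q in seqs}) > 1)

    # pi: mean number of differences over all pairs of sequences.
    pairs = list(combinations(seqs, 2))
    pi = sum(sum(a != b for a, b in zip(x, y)) for x, y in pairs) / len(pairs)

    # Watterson's estimator: S divided by the harmonic number a1.
    a1 = sum(1.0 / i for i in range(1, n))
    a2 = sum(1.0 / i ** 2 for i in range(1, n))
    theta_w = s / a1

    # Tajima's (1989) variance constants.
    b1 = (n + 1) / (3.0 * (n - 1))
    b2 = 2.0 * (n * n + n + 3) / (9.0 * n * (n - 1))
    c1 = b1 - 1.0 / a1
    c2 = b2 - (n + 2) / (a1 * n) + a2 / a1 ** 2
    e1 = c1 / a1
    e2 = c2 / (a1 ** 2 + a2)

    d = (pi - theta_w) / sqrt(e1 * s + e2 * s * (s - 1)) if s > 0 else float("nan")
    return {"S": s, "pi": pi, "theta_w": theta_w, "tajima_d": d}


if __name__ == "__main__":
    toy = ["AAAA", "AAAT", "AATT", "ATTT"]  # hypothetical four-sequence sample
    print(polymorphism_summary(toy))
```

A positive D, as in the toy sample, indicates a relative excess of intermediate-frequency variants, and a negative D indicates an excess of rare variants.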
We conducted mlHKA analysis treating the chloroplast, mitochondria and the combined data as single non-recombining datasets, pooling the silent sites across loci, and using the minimum sample sizes as input. Nuclear loci were treated as independent, unlinked loci in the analysis. To avoid upward bias in estimates due to the use of minimum sample sizes for the chloroplast and mitochondrial data, we excluded loci with low sample sizes in the mlHKA analysis. For the species-wide dataset, we excluded loci with sample sizes less than 18. For individual populations, we used cutoffs of 10 for the German population, and 7 for the pooled North American populations. Divergence values were corrected for multiple hits using the Jukes & Cantor (1969) correction for multiple substitutions. We generated a likelihood surface for N_c/N_n by varying the inheritance scalar for the organelle data (i.e. varying assumptions about the ratio of effective sizes in organelles versus the nucleus). Credibility intervals were obtained by assuming the χ² distribution.
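Two of the steps above can be sketched briefly: the Jukes & Cantor correction of an observed proportion of differences, and the extraction of an interval from a likelihood surface evaluated on a grid. This is a sketch, not the mlHKA code. The choice of one degree of freedom for the χ² cutoff is an assumption, and the function names and the toy quadratic surface are hypothetical.

```python
from math import log

# 95% quantile of the chi-square distribution with 1 degree of freedom
# (assumed here; the number of degrees of freedom is not stated in the text).
CHI2_95_DF1 = 3.841


def jukes_cantor(p):
    """Jukes-Cantor distance d = -(3/4) ln(1 - 4p/3) for a proportion p of differing sites."""
    if not 0.0 <= p < 0.75:
        raise ValueError("p must lie in [0, 0.75)")
    return -0.75 * log(1.0 - 4.0 * p / 3.0)


def support_interval(grid, loglik, drop=CHI2_95_DF1 / 2.0):
    """Outer bounds of the grid values whose log-likelihood lies within
    `drop` units of the maximum (1.92 units for the assumed cutoff)."""
    best = max(loglik)
    kept = [x for x, ll in zip(grid, loglik) if ll >= best - drop]
    return min(kept), max(kept)


if __name__ == "__main__":
    print(f"JC-corrected divergence for p = 0.10: {jukes_cantor(0.10):.4f}")
    # Toy likelihood surface for a ratio such as N_c/N_n, peaked at 0.4.
    grid = [i / 100 for i in range(0, 101)]
    surface = [-((x - 0.4) ** 2) / (2 * 0.1 ** 2) for x in grid]
    print("approximate 95% interval:", support_interval(grid, surface))
```

Because the interval is read off a grid, its resolution is limited by the grid spacing, and a flat surface (as reported for populations with little variation) yields a wide interval.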
(iv) Arabidopsis thaliana datasets
We also applied mlHKA to the estimation of N c/N n for Arabidopsis thaliana, using multilocus polymorphism data from nuclear genes (Nordborg et al., Reference Nordborg, Hu, Ishino, Jhaveri and Toomajian2005) and a total of 50 fragments from the chloroplast (Jakobsson et al., Reference Jakobsson, Sall, Lind-Hallden and Hallden2007). For an equivalent comparison, we focused on synonymous diversity from the nuclear loci, since A. lyrata nuclear data were taken only from exons. In order to get divergence data for these fragments, nuclear and chloroplast fragments were submitted to a BLAST (Altschul et al., Reference Altschul, Madden, Schaffer, Zhang and Zhang1997) search against the shotgun genome sequence of A. lyrata (http://www.jgi.doe.gov/sequencing) using the trace archive from NCBI (www.ncbi.nlm.nih.gov). Consensus sequences from multiple shotgun trace files were assembled from the A. lyrata genome dataset, and used to calculate divergence for mlHKA analysis. We only retained loci where A. lyrata alignments included multiple shotgun trace files, using a cutoff of 10 bp in the alignment where only a single trace file was used in divergence calculation. In the interest of computation time, a subset of 99 nuclear loci was used from the survey of polymorphism at nuclear genes for the maximum likelihood analysis, chosen to have the largest numbers of total synonymous sites, to increase power in the estimation of N c/N n.
(v) Analysis of recombination
We used the program LDHAT 2.0 (http://www.stats.ox.ac.uk/~mcvean/LDhat) to estimate the population recombination parameter 4N_e r, where r is the rate of recombination. LDHAT implements the composite likelihood approach of Hudson (2001), allowing a finite sites model (McVean et al., 2002). Since the appropriate recombination model for organelles should be gene conversion (McVean et al., 2002), we implemented a gene conversion model with a tract length of 300 bp. Distances between polymorphic sites were taken as the minimum distance in the circular chromosomes (McVean et al., 2002), using the A. thaliana genome. We also used LDHAT for likelihood permutation tests of recombination, permuting the locations of segregating sites, to test for significantly greater evidence for recombination in the observed data than data permuted by location.
3. Results
(i) Patterns of polymorphism
We surveyed nucleotide variation in a total of 5827 bp (3·8%) of the chloroplast genome and 8445 bp (2·3%) of the mitochondrial genome of A. lyrata. Of the loci surveyed, five of 13 (38%) showed variation in the chloroplast, while five of 16 (31%) mitochondrial loci were variable. Furthermore, polymorphic loci showed uniformly low levels of variation; loci showed at most two segregating silent sites, and no highly polymorphic loci were observed (Supplementary Table S2). We identified a total of 16 segregating nucleotide sites (Table 1), five of which were synonymous, seven of which were non-coding and four of which were replacement. No indels were identified in our dataset.
Notes: Ss is number of segregating silent sites, and Sa is the number of segregating replacement sites.
Within-population data not shown for Sweden, since all values are zero.
Patterns of variation across populations generally conform to results from nuclear loci (Wright et al., 2006; Wright et al., submitted), namely reduced variation in North American populations and highest diversity in Germany (Table 1; Supplementary Table S2). However, levels of differentiation are very high; there are no shared polymorphisms across any populations with the exception of the United States and Canada, and much of the species-wide variation is characterized by fixed or nearly fixed differences between populations. The total level of differentiation, estimated by F_st, is 0·824, which is greater than for any nuclear locus examined in a similar population dataset (average F_st for nuclear genes is 0·6; Wright et al., submitted), suggesting significantly elevated differentiation at cytoplasmic relative to nuclear genes. Interestingly, no chloroplast or mitochondrial variation is observed in the Swedish and Russian populations at the loci surveyed here (Table 1), although these populations were sampled only at polymorphic loci (see Section 2). At nuclear loci, these latter two populations have higher diversity than North American populations, but lower than the German population (Wright et al., 2006).
The maximum likelihood estimates of the species-wide population parameter θ = 4N_e u at the mitochondria and the chloroplast from the number of silent segregating sites are 0·00035 (95% credibility interval 0·00011 to 0·00097) and 0·001 (95% credibility interval 0·00027 to 0·00287), respectively. In contrast, the estimate of θ from the species-wide nuclear genome data is 0·02, approximately 60-fold higher than the mitochondria and 20-fold higher than the chloroplast. Calculation of Tajima's D (Tajima, 1989), a measure of the site frequency spectrum, from the combined data provides no evidence for a significant departure from neutral expectation for the species-wide sample (D=1·41, P>0·05), although the data show a positive skew in the frequency spectrum primarily due to sampling of fixed differences between populations (Table 1). From the nuclear genes, 18% of loci show values of Tajima's D as high as observed here, providing no evidence that this value is unusual compared with the genome average. In Germany, Tajima's D shows a non-significant negative skew (D=−1·17), while in the pooled North American sample Tajima's D is 0·889. Thus, no population shows evidence for an excess of rare or high-frequency variants expected under recent positive selection or long-term balancing selection, respectively.
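For readers unfamiliar with likelihood estimation of θ from the number of segregating sites, the sketch below shows one standard way to do it for a non-recombining locus under the neutral infinite-sites coalescent. While j lineages remain, the number of new mutations is geometrically distributed with success probability (j − 1)/(θ + j − 1), and the distribution of S is the convolution of these terms over j = 2..n. This may differ in detail from the equations of Hudson (1991) used in the study. Here θ is per locus, the function names and input values are hypothetical, and the reported estimates are not reproduced because the numbers of silent sites are not given in this section.

```python
from math import log


def prob_segregating_sites(theta, n, max_s):
    """P(S = k) for k = 0..max_s in a sample of n sequences, no recombination.

    theta is the per-locus value of 4*N_e*u. Built by convolving, for
    j = n, ..., 2 lineages, geometric distributions with success
    probability (j - 1) / (theta + j - 1).
    """
    probs = [1.0] + [0.0] * max_s
    for j in range(2, n + 1):
        stop = (j - 1) / (theta + j - 1)   # next event is a coalescence
        mut = theta / (theta + j - 1)      # next event is a mutation
        geom = [stop * mut ** i for i in range(max_s + 1)]
        probs = [sum(probs[k - i] * geom[i] for i in range(k + 1))
                 for k in range(max_s + 1)]
    return probs


def ml_theta(s_obs, n, grid):
    """Grid-search maximum likelihood estimate of per-locus theta.

    All grid values must be positive. Returns the estimate and the
    log-likelihood at each grid point.
    """
    logliks = [log(prob_segregating_sites(t, n, s_obs)[s_obs]) for t in grid]
    best = max(range(len(grid)), key=logliks.__getitem__)
    return grid[best], logliks


if __name__ == "__main__":
    grid = [i / 100 for i in range(1, 1001)]
    estimate, _ = ml_theta(s_obs=5, n=24, grid=grid)  # hypothetical inputs
    print(f"ML theta per locus: {estimate:.2f}")
```

Dividing the per-locus estimate by the number of silent sites surveyed gives a per-site value, and an approximate interval can be read from the returned log-likelihoods by keeping grid points within a fixed drop from the maximum.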
(ii) Ratio of effective population size in A. lyrata
The above differences in polymorphism across the Arabidopsis lyrata cytoplasmic and nuclear genomes can result from differences in both mutation rates and effective population sizes. To uncouple these, we used a maximum likelihood analysis controlling for divergence between A. lyrata and A. thaliana. Fig. 1 shows estimates of the ratio of effective population sizes at the organelle versus nuclear genomes, N_c/N_n, for the species-wide dataset, as well as for Plech, Germany, and the North American populations pooled as a single sample. Maximum likelihood estimates from the species-wide datasets follow the neutral expectations for both organelles individually, as well as the combined dataset; in all cases, the maximum likelihood estimate of the scaled effective population size is 0·4, and shows no significant departure from the neutral expectation of 0·5. Similar patterns are observed for the German population, although the maximum likelihood estimate is slightly lower at 0·3 for the combined data, but not significantly different from 0·5. Maximum likelihood estimates of N_c/N_n were non-significantly reduced for the North American chloroplast sample, due to the lack of silent variation identified at the chloroplast loci surveyed, although the likelihood surface is flat due to an overall low nucleotide variation in this population. Mitochondrial loci give an elevated estimate of N_c/N_n in North America, and the combined data give a maximum likelihood estimate of N_c/N_n=0·6. In summary, the ratio of effective population sizes appears to be entirely consistent with neutral expectations in our A. lyrata dataset.
The maximum likelihood analysis also allows for a comparison of mutation rate, controlling for differences in effective population size. Our estimate of the scaled mutation rates, taking the maximum likelihood value of θ from the nuclear genes, is 8-fold elevated in the nucleus relative to the chloroplast, which is a considerably larger difference than the 2-fold estimated by Wolfe and colleagues (1987). Our chloroplast dataset includes four loci from the inverted repeat region of the genome, which has been shown to have suppressed substitution rates, and this may in part drive our mutation rate estimates downward. Note that this should not bias our effective size estimation, since this is scaled to divergence between species. Using estimates of substitution rate from nuclear genes (1·5×10⁻⁸ per year) in the Brassicaceae (Koch et al., 2000), this would suggest that the chloroplast has a substitution rate of 1·7×10⁻⁹ per year on average. For the mitochondria, maximum likelihood analysis estimates a 23-fold reduction in mutation rate relative to the nuclear genome, giving a substitution rate estimate of 6·5×10⁻¹⁰ per base pair per year for the mitochondria. Again, this is a considerably larger reduction than the 6-fold reduction suggested by Wolfe and colleagues (1987), but both chloroplast and mitochondrial values are closer to relative rate estimates from polymorphism data in rice (Tian et al., 2006). Although the latter estimates are not controlling for effective population size differences, domesticated rice is highly selfing, so these ratios may be comparable to differences in mutation rate.
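The rate conversions in this paragraph are simple divisions of the nuclear substitution rate by the estimated fold reduction, as the short check below shows. The function name is hypothetical and the inputs are the values quoted in the text. Note that the quoted chloroplast rate of 1·7×10⁻⁹ corresponds to a fold difference of about 8·8, so the "8-fold" figure is presumably rounded.

```python
NUCLEAR_RATE = 1.5e-8  # substitutions per site per year, as quoted in the text


def organelle_rate(nuclear_rate, fold_reduction):
    """Organelle substitution rate implied by a fold reduction relative to the nucleus."""
    return nuclear_rate / fold_reduction


if __name__ == "__main__":
    mito = organelle_rate(NUCLEAR_RATE, 23)   # about 6.5e-10
    cp_from_8 = organelle_rate(NUCLEAR_RATE, 8)   # about 1.9e-09
    implied_fold = NUCLEAR_RATE / 1.7e-9          # about 8.8
    print(f"mitochondria: {mito:.2e} per site per year")
    print(f"chloroplast at exactly 8-fold: {cp_from_8:.2e} per site per year")
    print(f"fold difference implied by 1.7e-9: {implied_fold:.1f}")
```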
(iii) Comparisons with A. thaliana
Maximum likelihood estimates of silent θ from A. thaliana are 0·00123 (95% credibility interval 0·0055 to 0·00279) for the chloroplast and 0·0055 (95% credibility interval 0·00473 to 0·00643) for the nuclear genes from this dataset. Thus, values are slightly higher than those of A. lyrata for the chloroplast, in contrast to patterns from nuclear genes, where a close to 4-fold reduction in polymorphism is evident. Note that controlling for silent differences in divergence between the loci surveyed in the two species in the chloroplast (0·032 for A. thaliana and 0·016 for A. lyrata), there is still a reduction in mean estimates of effective population size in A. thaliana in the chloroplast (θs/Ks=0·038 for A. thaliana, 0·0625 for A. lyrata), but this is not as severe as the 4-fold reduction at nuclear loci. Consistent with these results, the maximum likelihood estimate of N_c/N_n is higher in A. thaliana at 0·75, and is not significantly different from 1 (95% credibility interval 0·5 to 2·0). This is consistent with a role for selfing in elevating the ratio of N_c/N_n, although the credibility intervals overlap between the two taxa.
(iv) Testing for recombination
The similarity in effective population size estimates between the chloroplast and mitochondria in A. lyrata is consistent with these genomes experiencing a single genealogical history due to a lack of recombination. Furthermore, the original combined chloroplast and mitochondrial dataset provided no evidence for recombination using the four-gamete test using the species-wide A. lyrata dataset (Table 1; samples to SWE4); the minimum number of recombination events is inferred to be 0. However, if we include A. thaliana as an outgroup for the analysis, the minimum number of recombination events is inferred to be 1, and several pairs of sites are discordant based on the four-gamete test (e.g. chloroplast position 24037 and chloroplast position 49304). This pattern can only be explained by a recombination event in the history of the sample, or recurrent mutation. Although substitution rates are low, it is possible that this signature of recombination is driven by multiple substitutions since the divergence from A. thaliana. This prompted us to increase our sampling at polymorphic loci, adding a Russian population and larger samples from Sweden (Table 1; samples SWE5 to RUS8). With the additional data, we confirmed some of the outgroup gamete combinations segregating in Russian A. lyrata, and the minimum number of recombination events from within-species analysis is now 1 (Table 2). Overall, six pairs of segregating sites show incongruence using the four-gamete test, including one pair within the chloroplast, and the remaining pairs are between the chloroplast and the mitochondria (Table 2). All incongruent sites involve position 24037 of the chloroplast, which is a high frequency derived variant in our sample. Using a composite likelihood approach, the maximum likelihood estimate of the effective recombination rate ρ is 0·501 for the combined data, 0·501 for the chloroplast alone, and 0 for the mitochondria alone. 
Considering the length of sequence surveyed in this analysis, this implies a per base pair rate of recombination for the chloroplast of 1·3×10⁻⁵ and 0 for the mitochondria. The estimate of the ratio of recombination to mutation can be calculated by the ratio of estimates of ρ/θ. In the chloroplast, this implies a ratio of 0·013, which is two orders of magnitude lower than estimates from the nuclear genome (Wright et al., submitted). Given that the combined dataset gives a comparable estimate of ρ, the ratio of ρ/θ overall should be orders of magnitude lower than this estimate.
a Composite likelihood estimates of the population recombination parameter 4N_e r. Values in parentheses are per base pair estimates.
b Proportion of permuted data having a maximum likelihood value under recombination as high as observed.
c Minimum number of recombination events.
d Pairs of incompatible sites using the four-gamete test. Positions correspond to A. thaliana genomic positions in the chloroplast (c) or mitochondria (m), as shown in Table 1.
If some recurrent mutation has occurred in the history of the sample, recombination rates could be even lower than those estimated here. In particular, the site 24037 in the chloroplast is the primary site involved in evidence for recombination in our data (Table 2). To explore the possibility that this could be a mutation hotspot, we submitted this region to a BLAST search, and checked for multiple substitutions at this site in the Brassicaceae. Indeed, the C-T transition identified in our data is also found to have occurred in Sinapis alba (accession number AJ243754), as well as Brassica rapa (accession number AC189190), suggesting that recurrent mutation could be important. Furthermore, a permutation test of likelihood scores provides no evidence for higher linkage disequilibrium at adjacent polymorphic sites in our dataset (Table 2), even when using the pooled chloroplast and mitochondrial data, suggesting little pattern of recombination breaking down associations for more distant sites. Thus, while we cannot rule out recombination entirely, the results are generally consistent with either a low rate of multiple substitutions due to hotspots, or with very low levels of localized gene conversion events.
4. Discussion
In A. lyrata, our results fit with the standard neutral prediction of a 2-fold reduction in effective population size of the chloroplast and the mitochondria. This result is most evident in the species-wide sample, as well as the Plech, Germany population, where variation at both nuclear and cytoplasmic genes tends to be highest, and our sampling was most focused. Other within-population samples showed no significant rejection of neutrality, despite a complete lack of variation identified in Sweden and Russia, and a lack of silent diversity in the chloroplast genome of North American populations. Evidence for elevated between-population differentiation at cytoplasmic loci is also consistent with the observed reduction in effective population size compared with nuclear loci. The combined data also suggest a non-significant elevation of Nc/Nn at the mitochondria in North American populations. One possible contributor to this is the effect of partial selfing increasing homozygosity in the nuclear genome in the Rondeau population (Mable et al., 2005); increased homozygosity at nuclear genes will tend to increase the ratio of Nc/Nn. More extensive sampling across individuals, populations and loci will be important to further test the importance of partial selfing and local adaptation on the effective size of these genomes.
A recent study of chloroplast variation suggested extensive haplotype sharing between A. lyrata, A. arenosa and A. halleri (Koch & Matschinger, 2007). This result could reflect continuing lineage sorting of ancestral variation, or recent between-species introgression. While our results cannot rule out introgression into A. lyrata, our inference of a neutral ratio of cytoplasmic to nuclear effective size would suggest that, if hybridization is occurring, it is not asymmetric with respect to pollen versus seed migration. More generally, our results are inconsistent with a strong difference in the ratio of pollen to seed migration, which may reflect a general lack of gene flow in relatively fragmented, isolated populations.
In A. thaliana, selfing rates have been estimated at 0·97 in natural populations (Abbott & Gomes, 1989). This would imply an expected effective population size ratio of cytoplasmic relative to nuclear genes of approximately 1, and our results are consistent with this finding. This analysis is thus consistent with the conclusion from the frequency spectrum (Jakobsson et al., 2007) that the A. thaliana chloroplast polymorphism data also follow the neutral expectation.
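The expected ratio can be worked through with a short calculation. This is a sketch under stated assumptions, not necessarily the derivation used here: it assumes a hermaphroditic population with strict maternal inheritance of organelles, the standard equilibrium inbreeding coefficient F = s/(2 − s) for selfing rate s, and a nuclear effective size reduced by a factor of 1/(1 + F), so that Nc/Nn = (1 + F)/2. The function names are illustrative.

```python
def inbreeding_coefficient(selfing_rate):
    """Equilibrium inbreeding coefficient F = s / (2 - s) under partial selfing."""
    return selfing_rate / (2.0 - selfing_rate)


def expected_nc_over_nn(selfing_rate):
    """Expected cytoplasmic-to-nuclear effective size ratio, (1 + F) / 2.

    Assumes a hermaphrodite with maternally inherited organelles, where the
    nuclear effective size is scaled by 1 / (1 + F) and the organelle
    effective size is unaffected by selfing.
    """
    f = inbreeding_coefficient(selfing_rate)
    return (1.0 + f) / 2.0


# Full outcrossing gives the 2-fold reduction; a selfing rate of 0.97
# gives a ratio close to 1.
for s in (0.0, 0.97, 1.0):
    print(f"s = {s:.2f}  F = {inbreeding_coefficient(s):.3f}  "
          f"Nc/Nn = {expected_nc_over_nn(s):.3f}")
```

Under these assumptions, full outcrossing gives a ratio of 0·5 and a selfing rate of 0·97 gives a ratio of about 0·97, in line with the expectation of approximately 1 stated above.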
The overall results suggest that rates of adaptive and deleterious mutation at organelles are low enough that they are not causing extensive hitch-hiking in these species, in contrast to recent suggestions of rampant hitch-hiking of mitochondrial polymorphism in animal taxa (Bazin et al., 2006). The reduction of substitution rates may be sufficient, and genomic rearrangement rates low enough, to cause little effect of background selection and selective sweeps at organelles in many natural plant populations. Alternatively, these processes could be affecting nuclear and cytoplasmic genes equally, giving the neutral expectation when examining the ratio of effective sizes. In either case, given the unusually high diversity found in previous studies, balancing selection associated with male sterility may in fact be important in structuring cytoplasmic variation in gynodioecious species (Houliston & Olson, 2006; Stadler & Delph, 2002), although clearly more multilocus nuclear and cytoplasmic data should be collected from other plant taxa to assess this in more detail. Our data suggesting uniformly low polymorphism levels are also in contrast with recent reports in other systems of high substitution rates at some cytoplasmic loci (Cho et al., 2004; Barr et al., 2007). Combined with evidence for very low to no recombination in our data, these results also suggest that rates of paternal leakage are low in these species.
Controlling for divergence, species-wide polymorphism levels in the chloroplast are slightly reduced in A. thaliana compared with A. lyrata, while nuclear variation is more extensively reduced. This differs from a study of selfing and outcrossing species in the genus Mimulus (Fenster & Ritland, 1992), where both chloroplast and allozyme variation were equally reduced in selfing species, but is comparable to a study in the nematode genus Caenorhabditis (Graustein et al., 2002) where mitochondrial variation was reduced in the selfing species relative to outcrossers, although less extensively than nuclear loci. In these systems, it was inferred that frequent population bottlenecks in selfing species were contributing to the reduction in organelle diversity, explaining suppression of variation in both genomes. Given our results, we can conclude that differences in demographic history may be reducing species-wide variation at both cytoplasmic and nuclear loci, while mating system differences are specifically reducing variation at nuclear genes, leading to an increase in Nc/Nn in A. thaliana.
We thank Asher Cutter for helpful discussion, and Pavel Goldvasser and Neera Singal for technical assistance. This work was supported by an NSERC discovery grant and a Sloan Research Fellowship to S. I. W. S. I. W. also thanks Deborah Charlesworth for extensive guidance in the study of population genetics and molecular evolution.