
Estimating effective population size from samples of sequences: inefficiency of pairwise and segregating sites as compared to phylogenetic estimates

Published online by Cambridge University Press: 14 April 2009

Joseph Felsenstein
Affiliation:
Department of Genetics SK-50, University of Washington, Seattle, Washington 98195

Summary

It is known that, under neutral mutation at a known mutation rate, a sample of nucleotide sequences within which there is assumed to be no recombination allows estimation of the effective size of an isolated population. This paper investigates the case of very long sequences, where each pair of sequences allows a precise estimate of the divergence time of those two gene copies. The average divergence time of all pairs of copies estimates twice the effective population number, and an estimate can also be derived from the number of segregating sites. Alternatively, one can estimate the genealogy of the copies. This paper shows how a maximum likelihood estimate of the effective population number can be derived from such a genealogical tree. The pairwise and segregating sites estimates are shown to be much less efficient than this maximum likelihood estimate, and this is verified by computer simulation. The result implies that there is much to gain by explicitly taking the tree structure of these genealogies into account.
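The comparison described in the summary can be illustrated with a small simulation. The following is a minimal sketch under stated assumptions, not the paper's own code or derivation: it assumes a diploid scaling in which k lineages coalesce at rate k(k-1)/(4N) per generation (so that a pair has mean divergence time 2N), and it assumes the genealogy and its times are known without error, as in the limit of very long sequences. In that limit the number of segregating sites is taken to be proportional to the total branch length of the tree. All function names are hypothetical, and the estimator formulas are standard coalescent results rather than quotations from the paper.

```python
import random
import statistics


def simulate_genealogy(n, N, rng):
    """Simulate one coalescent genealogy of n gene copies.

    Assumption: k lineages coalesce at rate k(k-1)/(4N) per generation.
    Returns (intervals, merges), where intervals[k] is the time spent with
    k lineages and merges lists (time, size_a, size_b) for each coalescence.
    """
    sizes = [1] * n          # number of sampled copies below each lineage
    t = 0.0
    intervals = {}
    merges = []
    while len(sizes) > 1:
        k = len(sizes)
        u = rng.expovariate(k * (k - 1) / (4.0 * N))
        intervals[k] = u
        t += u
        i, j = rng.sample(range(k), 2)   # random pair of lineages coalesces
        a, b = sizes[i], sizes[j]
        merges.append((t, a, b))
        for idx in sorted((i, j), reverse=True):
            sizes.pop(idx)
        sizes.append(a + b)
    return intervals, merges


def ml_estimate(intervals):
    """Maximum likelihood estimate of N from the coalescent intervals."""
    n = max(intervals)
    return sum(k * (k - 1) * u for k, u in intervals.items()) / (4.0 * (n - 1))


def segregating_sites_estimate(intervals):
    """Estimate of N from total tree length (long-sequence limit of S)."""
    n = max(intervals)
    a_n = sum(1.0 / i for i in range(1, n))
    total_length = sum(k * u for k, u in intervals.items())
    return total_length / (4.0 * a_n)


def pairwise_estimate(merges, n):
    """Estimate of N as half the mean pairwise divergence time."""
    n_pairs = n * (n - 1) / 2.0
    # A merge of clusters of sizes a and b fixes the time for a*b pairs.
    mean_time = sum(t * a * b for t, a, b in merges) / n_pairs
    return mean_time / 2.0


def compare_estimators(n=10, N=1000.0, reps=2000, seed=1):
    """Return {name: (mean, variance)} of each estimate over replicates."""
    rng = random.Random(seed)
    results = {"ml": [], "segregating": [], "pairwise": []}
    for _ in range(reps):
        intervals, merges = simulate_genealogy(n, N, rng)
        results["ml"].append(ml_estimate(intervals))
        results["segregating"].append(segregating_sites_estimate(intervals))
        results["pairwise"].append(pairwise_estimate(merges, n))
    return {name: (statistics.mean(v), statistics.variance(v))
            for name, v in results.items()}


if __name__ == "__main__":
    for name, (mean, var) in compare_estimators().items():
        print(f"{name:12s} mean={mean:10.1f} variance={var:12.1f}")
```

Under these assumptions all three estimates are unbiased for N, so the comparison of interest is between their variances across replicates. The sketch uses only the coalescent intervals for the maximum likelihood and segregating sites estimates, while the pairwise estimate also depends on which lineages merge.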

Type
Research Article
Copyright
Copyright © Cambridge University Press 1992
