Estimating effective population size from samples of sequences: a bootstrap Monte Carlo integration method

Joseph Felsenstein

doi:10.1017/S0016672300030962

Published online by Cambridge University Press: 14 April 2009

Affiliation: Department of Genetics SK-50, University of Washington, Seattle, Washington 98195, USA

Summary

We would like to use maximum likelihood to estimate parameters such as the effective population size Ne, or, if we do not know mutation rates, the product 4Neμ of mutation rate per site and effective population size. To compute the likelihood for a sample of unrecombined nucleotide sequences taken from a random-mating population it is necessary to sum over all genealogies that could have led to the sequences, computing for each one the probability that it would have yielded the sequences, and weighting each one by its prior probability. The genealogies vary in tree topology and in branch lengths. Although the likelihood and the prior are straightforward to compute, the summation over all genealogies seems at first sight hopelessly difficult. This paper reports that it is possible to carry out a Monte Carlo integration to evaluate the likelihoods approximately. The method uses bootstrap sampling of sites to create data sets, for each of which a maximum likelihood tree is estimated. The resulting trees are assumed to be sampled from a distribution whose height is proportional to the likelihood surface for the full data. That it will be so depends on a theorem which is not proven, but which seems likely to be true if the sequences are not short. One can use the resulting estimated likelihood curve to make a maximum likelihood estimate of the parameter of interest, Ne or 4Neμ. The method requires at least 100 times the computational effort required for estimation of a phylogeny by maximum likelihood, but is practical on today's workstations. The method does not at present have any way of dealing with recombination.
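The scheme summarised above can be illustrated with a minimal Python sketch. It is not the paper's implementation, and several pieces are assumptions made for illustration only: the function names are hypothetical, the maximum likelihood tree estimation for each bootstrap data set is omitted, and each sampled genealogy is represented just by its coalescence intervals, measured in expected substitutions per site. Under that time scale, Kingman's coalescent gives each pair of lineages a coalescence rate of 2/θ, where θ = 4Neμ. If the sampled trees really are drawn in proportion to the likelihood of the data, as the summary assumes, then the likelihood of θ is proportional to the average of the coalescent prior over the sampled trees.

```python
import math
import random


def bootstrap_sites(alignment, rng):
    """Resample alignment columns (sites) with replacement.

    `alignment` is a list of equal-length sequence strings.
    """
    n_sites = len(alignment[0])
    cols = [rng.randrange(n_sites) for _ in range(n_sites)]
    return ["".join(seq[c] for c in cols) for seq in alignment]


def log_coalescent_prior(intervals, theta):
    """Log prior density of one genealogy under Kingman's coalescent.

    Assumption: intervals[j] is the length of time during which
    k = n - j lineages exist (n = number of sequences), measured in
    expected substitutions per site, and theta = 4*Ne*mu.
    """
    n = len(intervals) + 1
    logp = 0.0
    for j, u in enumerate(intervals):
        k = n - j
        logp += math.log(2.0 / theta) - k * (k - 1) * u / theta
    return logp


def relative_log_likelihood(sampled_intervals, theta):
    """Log of the mean prior over the sampled genealogies.

    This equals the log-likelihood of theta up to an additive constant,
    provided the genealogies were sampled in proportion to the
    likelihood of the data (the unproven assumption in the summary).
    """
    logs = [log_coalescent_prior(u, theta) for u in sampled_intervals]
    m = max(logs)  # log-sum-exp trick for numerical stability
    return m + math.log(sum(math.exp(x - m) for x in logs) / len(logs))


def estimate_theta(sampled_intervals, grid):
    """Return the grid value of theta with the highest relative likelihood."""
    return max(grid, key=lambda th: relative_log_likelihood(sampled_intervals, th))
```

In use, one would call `bootstrap_sites` once per replicate, estimate a clock-constrained maximum likelihood tree for each replicate with a phylogeny program (not shown), extract the coalescence intervals from each tree, and pass the collected intervals to `estimate_theta` with a grid of candidate θ values. A grid search is used here only for simplicity; any one-dimensional optimiser would serve.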

Type: Research Article
Information: Genetics Research, Volume 60, Issue 3, December 1992, pp. 209-220

DOI: https://doi.org/10.1017/S0016672300030962
Copyright: Copyright © Cambridge University Press 1992
