
Minimal clade size and external branch length under the neutral coalescent

Published online by Cambridge University Press: 01 July 2016

Michael G. B. Blum*
Affiliation:
Institut National Polytechnique de Grenoble
Olivier François*
Affiliation:
Institut National Polytechnique de Grenoble
* Postal address: Laboratoire TIMC-TIMB, Institute for Health and Information Engineering, Faculty of Medicine, F38706 La Tronche cedex, France.

Abstract


Given a sample of genes taken from a large population, we consider the neutral coalescent genealogy and study the theoretical and empirical distributions of the size of the smallest clade containing a fixed gene. We show that the theoretical distribution is strongly related to a Yule distribution of parameter 2, and that the empirical count statistics are asymptotically Gaussian as the number of genes grows to infinity. Then we consider external branches of the coalescent tree, and describe their lengths. Using the infinitely many sites model of mutation, we also describe the conditional distribution of the external branch lengths, given the number of pairwise differences between a reference DNA sequence and the sequence of one closest relative in the sample.
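The genealogy described above is straightforward to simulate, which gives a quick empirical look at the two quantities studied. The Python sketch below is an illustration under stated assumptions, not the authors' code. It uses the standard Kingman coalescent with time scaled so that each pair of lineages merges at rate 1, follows the lineage of a fixed gene (gene 0), and records the size M of the smallest clade containing it together with the length of its external branch. The function names `simulate_gene0` and `yule2_pmf` are hypothetical. The sketch also assumes that the "Yule distribution of parameter 2" is the Yule-Simon law with parameter 2, whose mass function is 4/(j(j+1)(j+2)) for j ≥ 1, and prints it next to the empirical frequencies of M − 1 for inspection only; the precise relationship is what the paper establishes and is not asserted here. As a sanity check, the expected total external branch length of the coalescent equals 2 in these time units, so by symmetry the mean external branch of a single gene is 2/n.

```python
import random
from collections import Counter


def simulate_gene0(n, rng):
    """Simulate one Kingman coalescent genealogy of n >= 2 genes (hypothetical helper).

    Returns (size of the smallest clade containing gene 0,
             length of gene 0's external branch).
    Assumption: time is in coalescent units, each pair of lineages merging at rate 1.
    """
    if n < 2:
        raise ValueError("need at least two genes")
    # Each active lineage is stored as the number of sampled genes below it;
    # index 0 always holds the lineage of gene 0 until its first merger.
    sizes = [1] * n
    t = 0.0
    while True:
        k = len(sizes)
        # With k lineages, the waiting time to the next merger is
        # exponential with rate k(k-1)/2.
        t += rng.expovariate(k * (k - 1) / 2)
        # Every pair of lineages is equally likely to merge.
        a, b = sorted(rng.sample(range(k), 2))
        if a == 0:
            # First merger involving gene 0: its external branch ends here, and
            # the merged lineage is the smallest clade containing gene 0.
            return 1 + sizes[b], t
        # Gene 0 is not involved: merge lineage b into lineage a.
        # Since a < b, popping index b leaves index a (and index 0) unchanged.
        merged = sizes.pop(b)
        sizes[a] += merged


def yule2_pmf(j):
    """Yule-Simon pmf with parameter 2 (assumed reading): 4 / (j (j+1) (j+2)), j >= 1."""
    return 4.0 / (j * (j + 1) * (j + 2))


if __name__ == "__main__":
    rng = random.Random(2005)
    n, reps = 50, 20000
    results = [simulate_gene0(n, rng) for _ in range(reps)]
    clade_counts = Counter(m for m, _ in results)
    mean_ext = sum(e for _, e in results) / reps
    print(f"mean external branch of gene 0: {mean_ext:.4f}  (2/n = {2 / n:.4f})")
    print("m  empirical P(M = m)  Yule2(m - 1)")
    for m in range(2, 8):
        print(f"{m}  {clade_counts[m] / reps:.4f}              {yule2_pmf(m - 1):.4f}")
```

The sketch does not simulate mutations. Under the infinitely many sites model, one could extend it, as a further assumption, by drawing a Poisson number of mutations on the external branch with mean proportional to its length, and then condition on the observed number of pairwise differences.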

Type
General Applied Probability
Copyright
Copyright © Applied Probability Trust 2005 
