Hostname: page-component-586b7cd67f-t7fkt Total loading time: 0 Render date: 2024-11-22T19:32:30.782Z Has data issue: false hasContentIssue false

A central limit theorem for the parsimony length of trees

Published online by Cambridge University Press:  01 July 2016

Mike Steel*
Affiliation:
University of Canterbury
Larry Goldstein*
Affiliation:
University of Southern California
Michael S. Waterman*
Affiliation:
University of Southern California
*
Postal address: Department of Mathematics, University of Canterbury, Private Bag, Christchurch, New Zealand.
∗∗ Postal address: Department of Mathematics, University of Southern California, Los Angeles, CA 90089-1113, USA.
∗∗∗ Postal address: Departments of Mathematics and Molecular Biology, University of Southern California, Los Angeles, CA 90089-1113, USA.

Abstract

In phylogenetic analysis it is useful to study the distribution of the parsimony length of a tree under the null model, by which the leaves are independently assigned letters according to prescribed probabilities. Except in one special case, this distribution is difficult to describe exactly. Here we analyze this distribution by providing a recursive and readily computable description, establishing large deviation bounds for the parsimony length of a fixed tree on a single site and for the minimum length (maximum parsimony) tree over several sites. We also show that, under very general conditions, the former distribution converges asymptotically to the normal, thereby settling a recent conjecture. Furthermore, we show how the mean and variance of this distribution can be efficiently calculated. The proof of normality requires a number of new and recent results, as the parsimony length is not directly expressible as a sum of independent random variables, and so normality does not follow immediately from a standard central limit theorem.

Type
General Applied Probability
Copyright
Copyright © Applied Probability Trust 1996 

Access options

Get access to the full version of this content by using one of the access options below. (Log in options will check for institutional or personal access. Content may require purchase if you do not have access.)

Footnotes

The research of Larry Goldstein and Michael S. Waterman is supported in part by NSF grant DMS-9005833 and Larry Goldstein in part by NSF grant DMS-9505075. The research of Michael S. Waterman was also supported in part by NIH grant #6M 36230.

References

Alon, N. and Spencer, J. H. (1992) The Probabilistic Method. Wiley, New York.Google Scholar
Archie, J. and Felsenstein, J. (1993) The number of evolutionary steps on random and minimum length trees for random evolutionary data. Theoret. Pop. Biol. 43, 5279.Google Scholar
Carter, M., Hendy, M., Penny, O., Szekely, L. A. and Wormald, N. C. (1990) On the distribution of lengths of evolutionary trees. SIAM J. Disc. Math. 3, 3847.Google Scholar
Cavalli-Sforza, L. L. and Edwards, A. W. F. (1967) Phylogenetic analysis: Models and estimation procedures. Amer. J. Hum. Genet. 19, 233257.Google Scholar
Charleston, M. and Steel, M. A. (1995) Five surprising properties of parsimoniously colored trees. Bull. Math. Biol. 57, 367375.Google Scholar
Durrett, R. (1991) Probability: Theory and Examples. Wadsworth & Brooks/Cole, Belmont, CA.Google Scholar
Fitch, W. M. (1971) Toward defining the course of evolution: minimum change for a specific tree topology. Syst. Zool. 20, 406416.Google Scholar
Hartigan, J. A. (1973) Minimum mutation fits to a given tree. Biometrics 29, 5365.Google Scholar
Moon, J. W. and Steel, M. A. (1993) A limiting distribution for parsimoniously bicolored trees. Appl. Math. Lett. 6, 58.Google Scholar
Serfling, R. J. (1980) Approximation Theorems of Mathematical Statistics. Wiley, New York.Google Scholar
Steel, M. A. (1993a) Decompositions of leaf-coloured binary trees. Adv. Appl. Math. 14, 124.Google Scholar
Steel, M. A. (1993b) Distributions on bicoloured binary trees arising from the principle of parsimony. Discr. Appl. Math. 41, 245261.Google Scholar
Steele, J. M. (1986) An Efron-Stein inequality for nonsymmetric statistics. Ann. Statist. 14, 753758.Google Scholar