
Identifiability of a Markovian model of molecular evolution with gamma-distributed rates

Published online by Cambridge University Press:  01 July 2016

Elizabeth S. Allman*
Affiliation:
University of Alaska Fairbanks
Cécile Ané**
Affiliation:
University of Wisconsin Madison
John A. Rhodes*
Affiliation:
University of Alaska Fairbanks
* Postal address: Department of Mathematics and Statistics, University of Alaska Fairbanks, PO Box 756660, Fairbanks, AK 99775, USA.
** Postal address: Department of Statistics, University of Wisconsin Madison, Medical Science Center, 1300 University Avenue, Madison, WI 53706, USA. Email address: [email protected]

Abstract


Inference of evolutionary trees and rates from biological sequences is commonly performed using continuous-time Markov models of character change. The Markov process evolves along an unknown tree while observations arise only from the tips of the tree. Rate heterogeneity is present in most real data sets and is accounted for by the use of flexible mixture models where each site is allowed its own rate. Very little has been rigorously established concerning the identifiability of the models currently in common use in data analysis, although nonidentifiability was proven for a semiparametric model and an incorrect proof of identifiability was published for a general parametric model (GTR + Γ + I). Here we prove that one of the most widely used models (GTR + Γ) is identifiable for generic parameters, and for all parameter choices in the case of four-state (DNA) models. This is the first proof of identifiability of a phylogenetic model with a continuous distribution of rates.
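As a rough illustration of the model class named in the abstract, the following sketch computes site-pattern transition probabilities along a single edge under GTR + Γ. It relies on the standard moment generating function of the gamma distribution: if the site rate r has shape α and mean 1, then E[exp(rtQ)] = (I − tQ/α)^(−α). The function names, the mean-1 scaling of the gamma distribution, and the example parameter values are assumptions made for illustration; this is not the argument used in the paper's proof.

```python
import numpy as np

def gtr_rate_matrix(exchange, pi):
    """Hypothetical helper: GTR rate matrix from exchangeabilities and base frequencies."""
    pi = np.asarray(pi, dtype=float)
    n = len(pi)
    S = np.zeros((n, n))
    S[np.triu_indices(n, k=1)] = exchange   # upper-triangle exchangeabilities s_ij
    S = S + S.T                             # GTR requires s_ij = s_ji
    Q = S * pi                              # Q[i, j] = s_ij * pi_j for i != j
    np.fill_diagonal(Q, -Q.sum(axis=1))     # rows of a rate matrix sum to zero
    Q /= -np.dot(pi, np.diag(Q))            # scale to one expected substitution per unit time
    return Q

def gamma_averaged_transition(Q, pi, t, alpha):
    """E[exp(r t Q)] for r ~ Gamma(shape=alpha, mean=1), assuming Q is reversible w.r.t. pi."""
    d = np.sqrt(np.asarray(pi, dtype=float))
    sym = (d[:, None] * Q) / d[None, :]     # D^{1/2} Q D^{-1/2} is symmetric for GTR
    lam, U = np.linalg.eigh((sym + sym.T) / 2.0)
    mgf = (1.0 - lam * t / alpha) ** (-alpha)  # gamma MGF applied to each eigenvalue
    M = (U * mgf) @ U.T                     # U diag(mgf) U^T
    return M * d[None, :] / d[:, None]      # undo the symmetrising similarity transform

# Example with made-up parameters (assumed values, not taken from the paper).
pi = np.array([0.1, 0.2, 0.3, 0.4])
Q = gtr_rate_matrix([1.0, 2.0, 1.0, 1.0, 2.0, 1.0], pi)
P = gamma_averaged_transition(Q, pi, t=0.3, alpha=0.5)
print(P)
```

The eigendecomposition route is chosen here only because GTR matrices are time-reversible, which makes the symmetrised matrix safe to pass to `numpy.linalg.eigh`.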

Type
General Applied Probability
Copyright
Copyright © Applied Probability Trust 2008 

References

Allman, E. S. and Rhodes, J. A. (2006). The identifiability of tree topology for phylogenetic models, including covarion and mixture models. J. Comput. Biol. 13, 1101–1113.
Allman, E. S. and Rhodes, J. A. (2008). Identifying evolutionary trees and substitution parameters for the general Markov model with invariable sites. Math. Biosci. 211, 18–33.
Chang, J. T. (1996). Full reconstruction of Markov models on evolutionary trees: identifiability and consistency. Math. Biosci. 137, 51–73.
Felsenstein, J. (2004). Inferring Phylogenies. Sinauer Associates, Sunderland, MA.
Gascuel, O. and Guindon, S. (2007). Modelling the variability of evolutionary processes. In Reconstructing Evolution: New Mathematical and Computational Advances, eds Gascuel, O. and Steel, M., Oxford University Press, pp. 65–107.
Horn, R. A. and Johnson, C. R. (1985). Matrix Analysis. Cambridge University Press.
Kolaczkowski, B. and Thornton, J. (2004). Performance of maximum parsimony and likelihood phylogenetics when evolution is heterogeneous. Nature 431, 980–984.
Matsen, F. A. and Steel, M. A. (2007). Phylogenetic mixtures on a single tree can mimic a tree of another topology. Syst. Biol. 56, 767–775.
Matsen, F. A., Mossel, E. and Steel, M. (2008). Mixed-up trees: the structure of phylogenetic mixtures. To appear in Bull. Math. Biol.
Pagel, M. and Meade, A. (2004). A phylogenetic mixture model for detecting pattern-heterogeneity in gene sequence or character-state data. Syst. Biol. 53, 571–581.
Rogers, J. S. (2001). Maximum likelihood estimation of phylogenetic trees is consistent when substitution rates vary according to the invariable sites plus gamma distribution. Syst. Biol. 50, 713–722.
Semple, C. and Steel, M. (2003). Phylogenetics (Oxford Lecture Ser. Math. Appl. 24). Oxford University Press.
Steel, M. A., Székely, L. and Hendy, M. D. (1994). Reconstructing trees from sequences whose sites evolve at variable rates. J. Comput. Biol. 1, 153–163.
Štefankovič, D. and Vigoda, E. (2007). Phylogeny of mixture models: robustness of maximum likelihood and non-identifiable distributions. J. Comput. Biol. 14, 156–189.
Štefankovič, D. and Vigoda, E. (2007). Pitfalls of heterogeneous processes for phylogenetic reconstruction. Syst. Biol. 56, 113–124.
Sullivan, J., Swofford, D. L. and Naylor, G. J. P. (1999). The effect of taxon sampling on estimating rate heterogeneity parameters of maximum-likelihood models. Molec. Biol. Evolution 16, 1347–1356.
Yang, Z. (1994). Maximum likelihood phylogenetic estimation from DNA sequences with variable rates over sites: approximate methods. J. Molec. Evol. 39, 306–314.