Setting up and running a phylogenetic analysis

Alexei J. Drummond; Remco R. Bouckaert

doi:10.1017/CBO9781139095112.008

In this chapter, we go through some of the more common decisions involved in setting up a phylogenetic analysis in BEAST. The issues are presented in roughly the order in which a standard analysis is set up in BEAUti. We start with issues involving the alignment, then move on to setting up site and substitution models, clock models and tree priors, together with all of their priors. Some notes on calibrations and miscellanea are followed by some practicalities of running a BEAST analysis. Note that much of the advice in this section is rather general. Since every situation has its special characteristics, the advice should be interpreted in the context of what you know about your data.

Preparing alignments

Some tips on selecting samples and loci for alignments are given by Ho and Shapiro (2011), Mourier et al. (2012) and Silva et al. (2012).

Recombinant sequences: Although horizontal transmission has been shown, under some circumstances, not to affect tree and divergence time estimates (Currie et al. 2010; Greenhill et al. 2010), the models in BEAST cannot handle recombinant sequences properly at the time of writing. It is therefore recommended that such sequences be removed from the alignment. Many programs can help identify recombinant sequences, for example 3seq (Boni et al. 2007) or SplitsTree (Huson and Bryant 2006).

Duplicate sequences: An erroneous argument for removing duplicate sequences from the alignment is that multiple copies lead to ambiguous trees and slow down the analysis. However, a Bayesian approach aims to sample all trees that have an appreciable probability given the data. One of the assumptions underlying common Bayesian phylogenetic models is that the data were generated according to a binary tree. If, for example, three taxa have identical sequences, it does not mean that they represent the same individual, or that they are equally closely related in the true tree. All that can be said is that there were no mutations in the sampled part of the genome during the ancestral history of those three taxa. In this case, BEAST would sample all three subtrees with equal probability: ((1,2),3), (1,(2,3)), ((1,3),2). If you summarise the BEAST output as a single tree (see Section 11.4), you will see one particular subtree over these identical sequences, determined by the selected representative tree.
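
The point about identical sequences can be illustrated with a minimal Python sketch. It is not part of BEAST or BEAUti: the function names group_identical and rooted_topologies and the toy alignment are hypothetical, chosen only for illustration. The sketch assumes sequences are compared as literal strings (case-insensitively), so gaps and ambiguity codes are treated as ordinary characters. It groups taxa that share a sequence and then enumerates the rooted binary subtrees over each group, which for three taxa gives the three resolutions listed above.

```python
from collections import defaultdict
from itertools import combinations


def group_identical(alignment):
    """Return sorted lists of taxa that share exactly the same sequence.

    `alignment` is a hypothetical dict mapping taxon name -> aligned sequence.
    Comparison is case-insensitive and purely literal (an assumption).
    """
    groups = defaultdict(list)
    for taxon, seq in alignment.items():
        groups[seq.upper()].append(taxon)
    return [sorted(taxa) for taxa in groups.values() if len(taxa) > 1]


def rooted_topologies(taxa):
    """Enumerate all rooted binary topologies on `taxa` as Newick-like strings.

    Only practical for small groups: the count grows as (2n - 3)!!.
    """
    taxa = list(taxa)
    if len(taxa) == 1:
        return [taxa[0]]
    first, rest = taxa[0], taxa[1:]
    trees = []
    # The first taxon always goes to the left child, so each unordered
    # split of the taxa into two non-empty sets is visited exactly once.
    for k in range(len(rest)):
        for extra in combinations(rest, k):
            left = [first, *extra]
            right = [t for t in rest if t not in extra]
            for l_sub in rooted_topologies(left):
                for r_sub in rooted_topologies(right):
                    trees.append(f"({l_sub},{r_sub})")
    return trees


# Toy alignment (made up): taxa 1-3 are identical, taxon 4 differs.
alignment = {"1": "ACGT", "2": "ACGT", "3": "ACGT", "4": "ACGA"}

for taxa in group_identical(alignment):
    trees = rooted_topologies(taxa)
    print(f"identical taxa {taxa}: {len(trees)} equally supported subtrees")
    for tree in trees:
        print("  ", tree)
```

The sketch only identifies duplicates and makes the ambiguity explicit; consistent with the argument above, it does not remove any sequences from the alignment.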
