from Part I - Next Generation Phylogenetics
Published online by Cambridge University Press: 05 June 2016
Make no little plan
Attributed to Daniel Burnham, Chicago architect

Introduction
Phylogenetic trees are getting large. Trees based on single loci have been constructed for > 100 000 taxa (Price et al. 2010), and trees based on a handful of loci for > 10 000 taxa (Goloboff et al. 2009; Smith et al. 2011a). Basic counting arguments show that the number of loci needed to reconstruct a tree accurately scales up with the number of leaves, N, in the tree (Mossel and Steel 2005, p. 400). Whether this scaling occurs at the conjectured rate of log(N) or is worse than that, the need for genome-scale datasets is likely to increase. Fortunately, the pace at which new sequence data are accumulating is extraordinary, and the revolutionary impact of these data on systematics has been noted many times (e.g. Goldman and Yang 2008). What is perhaps more noteworthy is that taxon sampling has been keeping pace with advances in sequencing technology, so that the size of phylogenetic datasets has been steadily increasing in both dimensions (taxa and loci). Figure 1.1 shows the expanding wave front of phylogenetic dataset size, a kind of ‘Moore's Law’ for phylogenomics.

This pattern undoubtedly has its limits. Goldman and Yang (2008) documented the exponential growth in the number of sequences in databases, but cautioned that molecular phylogenetic studies are accumulating at a rate that is less than exponential. This is probably due to a combination of the mean number of sequences per study increasing over time (Fig. 1.1) and the inevitably increasing difficulty of obtaining samples of rare taxa. Given the ‘hollow curve’ of distribution, that is, the fact that most species are both geographically restricted and locally uncommon (McGill 2010), it is doubtful that sampling across taxa will be able to keep up with sampling across individual genomes. Nonetheless, today ~ 19% of described biodiversity has at least one sequence in GenBank (355 000 species out of 1.9 million, as of March 2016).
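The two quantitative claims above can be checked with a few lines of arithmetic. The following is only an illustrative sketch: the function names are hypothetical, and the second calculation assumes the conjectured log(N) scaling holds exactly, with an unknown constant that cancels when two tree sizes are compared.

```python
import math

# Figures quoted in the text (GenBank coverage as of March 2016).
SPECIES_WITH_SEQUENCE = 355_000
DESCRIBED_SPECIES = 1_900_000


def genbank_coverage(sequenced: int, described: int) -> float:
    """Fraction of described species with at least one sequence."""
    return sequenced / described


def relative_loci_needed(n_leaves: int, n_reference: int) -> float:
    """Loci needed for n_leaves relative to n_reference leaves.

    ASSUMPTION: loci scale as c * log(N) (the conjectured rate);
    the unknown constant c cancels in the ratio.
    """
    return math.log(n_leaves) / math.log(n_reference)


coverage = genbank_coverage(SPECIES_WITH_SEQUENCE, DESCRIBED_SPECIES)
print(f"GenBank coverage: {coverage:.1%}")

ratio = relative_loci_needed(100_000, 10_000)
print(f"Loci for 100 000 vs 10 000 taxa: {ratio:.2f}x")
```

Under that assumption, a tenfold increase in taxa from 10 000 to 100 000 would call for only about a quarter more loci; if the true scaling is worse than log(N), the requirement grows faster than this sketch suggests.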
There are many reasons to add genome-scale data to phylogenetic inference in local problems in the Tree of Life, or to solidify its deep backbone with a small number of exemplars, but this paper focuses on the task of building large, species-rich, high-resolution phylogenies.