Book contents
- Frontmatter
- Contents
- List of contributors
- Foreword
- Preface
- Section I Introduction
- Section II Data preparation
- 2 Sequence databases and database searching
- 3 Multiple sequence alignment
- Section III Phylogenetic inference
- Section IV Testing models and trees
- Section V Molecular adaptation
- Section VI Recombination
- Section VII Population genetics
- Section VIII Additional topics
- Glossary
- References
- Index
2 - Sequence databases and database searching
from Section II - Data preparation
Published online by Cambridge University Press: 05 June 2012
Summary
THEORY
Introduction
Phylogenetic analyses are often based on sequence data accumulated by many investigators. Because the number of available sequences is growing rapidly, it is no longer possible to rely on the printed literature, and scientists have turned to digital databases. Databases are essential in current bioinformatic research: they serve as information storage and retrieval locations, and modern databases come with powerful query tools and are cross-referenced to other databases. In addition to sequences and search tools, databases contain a considerable amount of accompanying information, the so-called annotation: for example, from which organism and cell type a sequence was obtained, how it was sequenced, and what properties are already known. In this chapter, we provide an overview of the most important publicly available sequence databases and explain how to search them. A list of the database URLs discussed in this section is provided in Box 2.1.
There are three basic strategies for searching sequence databases.
– To retrieve a known sequence, you can rely on its unique sequence identifier.
– To collect a comprehensive set of sequences that share a taxonomic origin or a known property, you can search the annotation by keyword.
– To find the most complete set of homologous sequences, you can search a sequence database by similarity to a selected query sequence, using tools like BLAST or FASTA.
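The three strategies can be sketched as three kinds of request. The minimal Python sketch below only builds request URLs and does not contact any server. It assumes the NCBI E-utilities (`efetch`, `esearch`) and the NCBI BLAST URL API as targets; these services, the helper function names, the accession `U49845`, and the query strings are illustrative assumptions that do not come from the chapter, and other databases expose different interfaces.

```python
from urllib.parse import urlencode

# Assumed endpoints (not named in the chapter): NCBI E-utilities and BLAST URL API.
EUTILS = "https://eutils.ncbi.nlm.nih.gov/entrez/eutils"
BLAST = "https://blast.ncbi.nlm.nih.gov/Blast.cgi"


def fetch_by_identifier_url(accession, db="nucleotide"):
    """Strategy 1: retrieve one known record by its unique identifier."""
    params = {"db": db, "id": accession, "rettype": "fasta", "retmode": "text"}
    return f"{EUTILS}/efetch.fcgi?{urlencode(params)}"


def keyword_search_url(term, db="nucleotide", retmax=20):
    """Strategy 2: search the annotation by keyword (here, an organism name)."""
    params = {"db": db, "term": term, "retmax": retmax}
    return f"{EUTILS}/esearch.fcgi?{urlencode(params)}"


def similarity_search_url(query_sequence, program="blastn", database="nt"):
    """Strategy 3: submit a query sequence for a similarity search."""
    params = {
        "CMD": "Put",
        "PROGRAM": program,
        "DATABASE": database,
        "QUERY": query_sequence,
    }
    return f"{BLAST}?{urlencode(params)}"


if __name__ == "__main__":
    # Illustrative inputs only; replace them with your own identifier,
    # keyword query, and sequence.
    print(fetch_by_identifier_url("U49845"))
    print(keyword_search_url("Saccharomyces cerevisiae[Organism]"))
    print(similarity_search_url("ATGGCGTACGCTTGA"))
```

Keeping URL construction separate from the network call makes each strategy easy to inspect before anything is sent. A similarity search of this kind typically returns a request identifier that has to be polled for results, which this sketch does not cover.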
- Type: Chapter
- Information: The Phylogenetic Handbook: A Practical Approach to Phylogenetic Analysis and Hypothesis Testing, pp. 33-67. Publisher: Cambridge University Press. Print publication year: 2009