Book contents
- Frontmatter
- Contents
- Preface
- Acknowledgments
- 1 The Central Dogma
- 2 RNA Secondary Structure
- 3 Comparing DNA Sequences
- 4 Predicting Species: Statistical Models
- 5 Substitution Matrices for Amino Acids
- 6 Sequence Databases
- 7 Local Alignment and the BLAST Heuristic
- 8 Statistics of BLAST Database Searches
- 9 Multiple Sequence Alignment I
- 10 Multiple Sequence Alignment II
- 11 Phylogeny Reconstruction
- 12 Protein Motifs and PROSITE
- 13 Fragment Assembly
- 14 Coding Sequence Prediction with Dicodons
- 15 Satellite Identification
- 16 Restriction Mapping
- 17 Rearranging Genomes: Gates and Hurdles
- A Drawing RNA Cloverleaves
- B Space-Saving Strategies for Alignment
- C A Data Structure for Disjoint Sets
- D Suggestions for Further Reading
- Bibliography
- Index
6 - Sequence Databases
Published online by Cambridge University Press: 05 June 2012
Summary
Once DNA fragments have been sequenced and assembled, the results must be properly identified and labeled for storage so that their origins are not confused later. As the sequences are studied further, annotations of various sorts will be added. Eventually, it will be appropriate to make the sequences available to a larger audience. Generally speaking, a sequence begins in a database available only to workers in a particular lab, then moves into a database used primarily by workers on a particular organism, and finally arrives in a large, publicly accessible database. By far the most important public database of biological sequences is the one maintained jointly by three organizations:
- the National Center for Biotechnology Information (NCBI), a constituent of the U.S. National Institutes of Health;
- the European Molecular Biology Laboratory (EMBL); and
- the DNA Data Bank of Japan (DDBJ).
At present, the three organizations distribute the same information. Many other organizations maintain copies at internal sites, either for faster searching or to avoid the legal “disclosure” of potentially patentable sequence data that could result from submitting it for search on publicly accessible servers. Although their overall formats are not identical, the three organizations have collaborated since 1987 on a common annotation format. We will focus on NCBI's GenBank database as a representative of the group.
- Type: Chapter
- Information: Genomic Perl: From Bioinformatics Basics to Working Code, pp. 72-92
- Publisher: Cambridge University Press
- Print publication year: 2002