Book contents
- Frontmatter
- Contents
- Preface
- Acknowledgments
- 1 The Central Dogma
- 2 RNA Secondary Structure
- 3 Comparing DNA Sequences
- 4 Predicting Species: Statistical Models
- 5 Substitution Matrices for Amino Acids
- 6 Sequence Databases
- 7 Local Alignment and the BLAST Heuristic
- 8 Statistics of BLAST Database Searches
- 9 Multiple Sequence Alignment I
- 10 Multiple Sequence Alignment II
- 11 Phylogeny Reconstruction
- 12 Protein Motifs and PROSITE
- 13 Fragment Assembly
- 14 Coding Sequence Prediction with Dicodons
- 15 Satellite Identification
- 16 Restriction Mapping
- 17 Rearranging Genomes: Gates and Hurdles
- A Drawing RNA Cloverleaves
- B Space-Saving Strategies for Alignment
- C A Data Structure for Disjoint Sets
- D Suggestions for Further Reading
- Bibliography
- Index
7 - Local Alignment and the BLAST Heuristic
Published online by Cambridge University Press: 05 June 2012
Summary
In Chapters 3 and 5, we saw how to develop an alignment scoring matrix and, given such a matrix, how to find the alignment of two strings with the highest score. In Chapter 6, we learned about some of the large genomic databases available for reference. Perhaps the most commonly performed bioinformatic task is searching a large protein sequence database for entries whose similarity scores may indicate homology with a query sequence, often a newly sequenced protein or a putative protein.
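Stated as a procedure, this search task is an exhaustive scan: score the query against every database entry and rank the entries by score. The minimal Python sketch below assumes a toy +2/-1/-2 scoring scheme in place of a real amino-acid substitution matrix, and scores each pair with a global (Needleman–Wunsch) alignment as in Chapter 3. The database entries and the `search` helper are hypothetical. Scanning every entry this way is the brute-force baseline that heuristics such as BLAST are designed to speed up.

```python
# Toy scoring scheme, assumed for illustration in place of a real
# amino-acid substitution matrix: +2 identity, -1 mismatch, -2 per gap.
MATCH, MISMATCH, GAP = 2, -1, -2


def pair_score(a, b):
    """Score for aligning residue a with residue b."""
    return MATCH if a == b else MISMATCH


def global_score(s, t):
    """Needleman-Wunsch: best score of an alignment of all of s with all of t."""
    # Row 0: each prefix t[:j] aligned entirely against gaps.
    prev = [j * GAP for j in range(len(t) + 1)]
    for i in range(1, len(s) + 1):
        # prev holds row i-1 (first i-1 residues of s); build row i in cur.
        cur = [i * GAP] + [0] * len(t)
        for j in range(1, len(t) + 1):
            cur[j] = max(
                prev[j - 1] + pair_score(s[i - 1], t[j - 1]),  # pair s[i-1] with t[j-1]
                prev[j] + GAP,                                # s[i-1] opposite a gap
                cur[j - 1] + GAP,                             # t[j-1] opposite a gap
            )
        prev = cur
    return prev[len(t)]


def search(query, database):
    """Score query against every entry; return (name, score) pairs, best first."""
    hits = [(name, global_score(query, seq)) for name, seq in database.items()]
    return sorted(hits, key=lambda hit: hit[1], reverse=True)


# Hypothetical database entries, for illustration only.
database = {"entry_a": "HEAGAWGHEE", "entry_b": "PAWHEAE", "entry_c": "GGGGGG"}
for name, score in search("HEAGAWGHEE", database):
    print(name, score)
```

Here `entry_a`, identical to the query, is listed first with the maximum possible score of 20 (ten identities at +2 each). Every entry is aligned in full, which is exactly the assumption the next paragraph questions.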
The Needleman–Wunsch algorithm described in Chapter 3 constructs global alignments, that is, alignments of its two input sequences in their entirety. In practice, however, homologous proteins are often not similar over their entire lengths, because different segments of a protein sequence differ in how important they are to its function. A typical protein has one or more active sites that play crucial roles in the chemical reactions it catalyzes. Mutations in the midst of an active site are rarely accepted, since they are likely to disrupt the protein's function. The segments between the active sites help give the protein its characteristic shape but do not form strong bonds with other molecules as the protein performs its function; mutations in these regions are more easily tolerated and thus more common. Hemoglobin provides a good example: it easily tolerates mutations on its outer surface, but mutations affecting the active sites in its interior can destroy its ability to hold the iron-binding heme group essential to its role as an oxygen carrier.
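To make the contrast concrete, the sketch below compares a global (Needleman–Wunsch) score with a local score computed by the Smith–Waterman recurrence, the standard dynamic-programming method for local alignment; this summary does not itself name the method the chapter develops. The scoring scheme (+2 match, -1 mismatch, -2 per gap) and the sequences are illustrative assumptions: a conserved 10-residue core, HEAGAWGHEE, is flanked by 12 artificial, unrelated residues on each side, mimicking a conserved active-site region surrounded by divergent segments.

```python
# Illustrative scoring scheme (an assumption, not a real substitution matrix).
MATCH, MISMATCH, GAP = 2, -1, -2


def pair_score(a, b):
    """Score for aligning residue a with residue b."""
    return MATCH if a == b else MISMATCH


def alignment_score(s, t, local=False):
    """Best global (Needleman-Wunsch) score, or best local (Smith-Waterman) score if local=True."""
    # Boundary row: a global alignment pays for leading gaps;
    # a local alignment may start anywhere, so its boundary is 0.
    prev = [0 if local else j * GAP for j in range(len(t) + 1)]
    best = 0  # highest cell seen so far (used only for local alignment)
    for i in range(1, len(s) + 1):
        cur = [0 if local else i * GAP] + [0] * len(t)
        for j in range(1, len(t) + 1):
            cur[j] = max(
                prev[j - 1] + pair_score(s[i - 1], t[j - 1]),  # pair s[i-1] with t[j-1]
                prev[j] + GAP,                                # s[i-1] opposite a gap
                cur[j - 1] + GAP,                             # t[j-1] opposite a gap
            )
            if local:
                cur[j] = max(cur[j], 0)   # start a fresh local alignment here
                best = max(best, cur[j])  # a local alignment may end at any cell
        prev = cur
    return best if local else prev[len(t)]


core = "HEAGAWGHEE"               # conserved segment shared by both sequences
s = "P" * 12 + core + "P" * 12    # artificial unrelated flanks
t = "W" * 12 + core + "W" * 12

print("global:", alignment_score(s, t))              # global: -4
print("local: ", alignment_score(s, t, local=True))  # local:  20
```

The global score is dragged below zero by the 24 mismatched flanking positions, even though the two sequences share an identical core; the local score reflects only that core. Flooring each local cell at 0 is what lets an alignment start anywhere, and taking the maximum over all cells lets it end anywhere.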
- Type: Chapter
- Information: Genomic Perl: From Bioinformatics Basics to Working Code, pp. 93-108. Publisher: Cambridge University Press. Print publication year: 2002.