Book contents
- Frontmatter
- Contents
- Preface
- Acknowledgments
- 1 The Central Dogma
- 2 RNA Secondary Structure
- 3 Comparing DNA Sequences
- 4 Predicting Species: Statistical Models
- 5 Substitution Matrices for Amino Acids
- 6 Sequence Databases
- 7 Local Alignment and the BLAST Heuristic
- 8 Statistics of BLAST Database Searches
- 9 Multiple Sequence Alignment I
- 10 Multiple Sequence Alignment II
- 11 Phylogeny Reconstruction
- 12 Protein Motifs and PROSITE
- 13 Fragment Assembly
- 14 Coding Sequence Prediction with Dicodons
- 15 Satellite Identification
- 16 Restriction Mapping
- 17 Rearranging Genomes: Gates and Hurdles
- A Drawing RNA Cloverleaves
- B Space-Saving Strategies for Alignment
- C A Data Structure for Disjoint Sets
- D Suggestions for Further Reading
- Bibliography
- Index
10 - Multiple Sequence Alignment II
Published online by Cambridge University Press: 05 June 2012
Summary
In the previous chapter we saw that finding the very best multiple alignment of a large number of sequences is difficult for two different reasons. First, although the direct method is general in principle, it seems to require a different program text for each number of sequences. Second, if there are K sequences of roughly equal length, then the number of entries in the dynamic programming table grows as fast as the Kth power of the length. The time required to fill each entry also grows as the Kth power of 2, because each entry must be compared with 2^K - 1 neighboring entries.
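To make both growth rates concrete, the short sketch below counts the table entries, (n+1)^K for K sequences of length n, and the neighbors each entry consults. It derives the neighbor count by enumerating every nonempty choice of which sequences advance by one position. The function name dp_cost is our own and is for illustration only.

```python
from itertools import product

def dp_cost(lengths):
    """Return (table entries, neighbors examined per entry) for a direct
    K-dimensional alignment table over sequences of the given lengths."""
    entries = 1
    for n in lengths:
        entries *= n + 1          # positions 0..n along each dimension
    k = len(lengths)
    # A move advances some nonempty subset of the K sequences by one position.
    neighbors = sum(1 for step in product((0, 1), repeat=k) if any(step))
    return entries, neighbors     # neighbors == 2**k - 1

for k in (2, 3, 4, 5):
    entries, neighbors = dp_cost([100] * k)
    print(k, entries, neighbors, entries * neighbors)
```

The last column printed is a rough count of the work needed to fill the whole table, and it grows with both factors at once.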
In this chapter, we will address both of these difficulties to some degree. It is, in fact, possible to create a single program text that works for any number of strings. And, although some inputs seem to require that nearly all of the table entries be filled in, careful analysis of the inputs supplied on a particular run will often help us to avoid filling large sections of the table that have no influence on the final alignment. Part of this analysis can be performed by the heuristic methods described in the previous chapter, since better approximate alignments help our method search more quickly for the best alignment.
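A single program text can handle any number of strings if the table is indexed by a tuple of K positions and the 2^K - 1 possible moves are generated rather than written out. The sketch below is our own illustration in Python, not the book's Perl code. The scoring values (match +1, mismatch -1, gap -2, and 0 for a gap aligned with a gap) and the sum-of-pairs objective are assumptions. It fills every entry, so it makes no attempt at the pruning described above.

```python
from itertools import product

# Hypothetical scoring scheme, chosen only for illustration.
MATCH, MISMATCH, GAP = 1, -1, -2

def pair_score(a, b):
    """Score one column position for a pair of rows; '-' marks a gap."""
    if a == '-' and b == '-':
        return 0                      # gap against gap contributes nothing
    if a == '-' or b == '-':
        return GAP
    return MATCH if a == b else MISMATCH

def column_score(column):
    """Sum-of-pairs score of a single alignment column."""
    total = 0
    for i in range(len(column)):
        for j in range(i + 1, len(column)):
            total += pair_score(column[i], column[j])
    return total

def msa_score(seqs):
    """Best sum-of-pairs score for any number of sequences."""
    k = len(seqs)
    moves = [m for m in product((0, 1), repeat=k) if any(m)]  # 2^k - 1 moves
    best = {}
    # product() yields index tuples in lexicographic order, and every
    # predecessor (index minus a move) is lexicographically smaller,
    # so it has already been filled in when we reach the current entry.
    for idx in product(*(range(len(s) + 1) for s in seqs)):
        if not any(idx):
            best[idx] = 0
            continue
        candidates = []
        for move in moves:
            prev = tuple(i - d for i, d in zip(idx, move))
            if min(prev) < 0:
                continue
            column = [seqs[r][idx[r] - 1] if move[r] else '-' for r in range(k)]
            candidates.append(best[prev] + column_score(column))
        best[idx] = max(candidates)
    return best[tuple(len(s) for s in seqs)]
```

For example, msa_score(["ACG", "ACG", "ACG"]) aligns three identical strings. Nothing in the loop depends on how many strings are passed in.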
Pushing through the Matrix by Layers
As a comfortable context for learning two of the techniques used in our program, we will begin by modifying the subroutine similarity from Chapter 3, which computes the similarity of two sequences.
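The heading suggests filling the table layer by layer and pushing values forward. The sketch below takes one reading of that idea, and that reading is an assumption on our part. A "layer" here is an anti-diagonal i + j = d. Once an entry is final, it pushes its score to its three successors instead of pulling scores from its predecessors. The code is in Python with hypothetical scoring values (match +1, mismatch -1, gap -2). It is not a transcription of the book's Perl subroutine similarity.

```python
# Hypothetical scoring values for illustration.
MATCH, MISMATCH, GAP = 1, -1, -2

def similarity_by_layers(s, t):
    """Global similarity of s and t, visiting the table one anti-diagonal
    layer (i + j == d) at a time and pushing each entry to its successors."""
    m, n = len(s), len(t)
    NEG = float('-inf')
    score = [[NEG] * (n + 1) for _ in range(m + 1)]
    score[0][0] = 0
    for d in range(m + n + 1):                      # layer d
        for i in range(max(0, d - n), min(m, d) + 1):
            j = d - i
            here = score[i][j]   # final: all pushes into it came from layers d-1, d-2
            if i < m:            # s[i] against a gap
                score[i + 1][j] = max(score[i + 1][j], here + GAP)
            if j < n:            # t[j] against a gap
                score[i][j + 1] = max(score[i][j + 1], here + GAP)
            if i < m and j < n:  # s[i] aligned with t[j]
                pair = MATCH if s[i] == t[j] else MISMATCH
                score[i + 1][j + 1] = max(score[i + 1][j + 1], here + pair)
    return score[m][n]
```

An entry in layer d receives contributions only from layers d - 1 and d - 2, so it is final by the time its layer is visited. The same ordering extends to K sequences if the sum of the coordinates is used as the layer index.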
Type: Chapter
Information: Genomic Perl: From Bioinformatics Basics to Working Code, pp. 141-154. Publisher: Cambridge University Press. Print publication year: 2002.