Book contents
- Frontmatter
- Contents
- Extended contents
- Preface
- Acknowledgments
- Editors and contributors
- A computational micro primer
- PART I Genomes
- 1 Identifying the genetic basis of disease
- 2 Pattern identification in a haplotype block
- 3 Genome reconstruction: a puzzle with a billion pieces
- 4 Dynamic programming: one algorithmic key for many biological locks
- 5 Measuring evidence: who's your daddy?
- PART II Gene Transcription and Regulation
- PART III Evolution
- PART IV Phylogeny
- PART V Regulatory Networks
- REFERENCES
- Glossary
- Index
4 - Dynamic programming: one algorithmic key for many biological locks
from PART I - Genomes
Published online by Cambridge University Press: 05 June 2012
Summary
Dynamic programming is an algorithmic technique that finds an optimal solution to many important bioinformatics problems without explicitly considering all possible solutions. This chapter describes the technique in graph-theoretical language and shows how it applies to areas as diverse as DNA and protein alignment, gene recognition, and polymer physics.
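In graph-theoretical terms, aligning two sequences a and b can be viewed as finding a highest-weight path through a grid-shaped directed acyclic graph whose nodes are pairs (i, j). Diagonal edges align a symbol of a with a symbol of b, while horizontal and vertical edges place a gap in one sequence. The Python sketch below illustrates this idea. It uses a standard global-alignment recurrence, which is not necessarily the chapter's own formulation. The function name `best_alignment_score` is hypothetical, and the linear gap penalty and default weights (match = 1, mismatch = -1, gap = -2) are illustrative assumptions. Each node's best score is computed once from its three predecessors, so the work grows with len(a) × len(b) rather than with the exponential number of possible alignments.

```python
def best_alignment_score(a: str, b: str, match: int = 1,
                         mismatch: int = -1, gap: int = -2) -> int:
    """Highest total weight of a path from node (0, 0) to node (len(a), len(b))
    in the alignment grid graph. Assumes a linear gap penalty: every gap
    position costs `gap`. Weights are illustrative, not from the chapter."""
    n, m = len(a), len(b)
    # score[i][j] = best weight of any path from (0, 0) to (i, j)
    score = [[0] * (m + 1) for _ in range(n + 1)]
    for i in range(1, n + 1):          # first column: a[:i] against gaps only
        score[i][0] = score[i - 1][0] + gap
    for j in range(1, m + 1):          # first row: b[:j] against gaps only
        score[0][j] = score[0][j - 1] + gap
    # Row-by-row order is a topological order of the grid DAG, so all three
    # predecessors of (i, j) already hold their final values when it is visited.
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            diag = match if a[i - 1] == b[j - 1] else mismatch
            score[i][j] = max(
                score[i - 1][j - 1] + diag,  # diagonal edge: a[i-1] over b[j-1]
                score[i - 1][j] + gap,       # vertical edge: a[i-1] over a gap
                score[i][j - 1] + gap,       # horizontal edge: a gap over b[j-1]
            )
    return score[n][m]
```

For example, `best_alignment_score("ACGT", "AGT")` returns 1: three matches and one gap, the best this scoring allows.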
Introduction
A major part of computational biology deals with the similarity of sequences, be they DNA fragments or proteins. There are four aspects to this problem: defining the measure of similarity, calculating this measure for given sequences, assessing its statistical significance, and interpreting the results from the biological viewpoint. Biologists are mostly interested in the last of these: similar sequences may have a common origin, as well as similar structure and function. Here, however, we shall deal with a formal problem: how to discover similarity.
Consider two sequences from a finite alphabet (e.g. 4 nucleotides or 20 amino acids) written one under the other, possibly with gaps. This is called an alignment (Figure 4.1).
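One way to make this definition concrete is to store an alignment as two gapped rows of equal length, with a gap written as "-" (a common convention, assumed here). The hypothetical helper `is_alignment` below checks three conditions that are implicit in the definition. The rows must have equal length, no column may pair a gap with a gap, and deleting the gaps from each row must recover the original sequence.

```python
GAP = "-"  # gap symbol; a common convention, assumed here


def is_alignment(top: str, bottom: str, a: str, b: str) -> bool:
    """Check that the gapped rows `top` and `bottom`, written one under
    the other, form an alignment of sequences `a` and `b`."""
    if len(top) != len(bottom):
        return False  # the two rows must line up column by column
    if any(x == GAP and y == GAP for x, y in zip(top, bottom)):
        return False  # a column of two gaps aligns nothing
    # Removing the gaps must give back the original sequences.
    return top.replace(GAP, "") == a and bottom.replace(GAP, "") == b
```

For example, `is_alignment("GA-TC", "GACT-", "GATC", "GACT")` returns True.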
We can calculate the number of matching symbols (nucleotides or amino acids), the number of mismatches, and the number and size of gaps. If we assign a positive weight (premium) to a match and negative weights (penalties) to a mismatch and to a gap of a given size, we can calculate the total score of the alignment as the sum of all weights. Which alignment has the highest score depends on the choice of weights.
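The Python sketch below scores an alignment given as two gapped rows by summing the weights column by column. The function name and default weights are illustrative assumptions. The size-dependent gap penalty is taken to be affine: an opening cost plus an extension cost for each additional position. This is one common choice, not one prescribed by the text.

```python
GAP = "-"  # gap symbol; assumed convention


def alignment_score(top: str, bottom: str, match: int = 1, mismatch: int = -1,
                    gap_open: int = -3, gap_extend: int = -1) -> int:
    """Sum of weights over an alignment given as two equal-length gapped rows.
    A gap of size k (a maximal run of k gap symbols in one row) is assumed
    to cost gap_open + gap_extend * (k - 1), i.e. an affine penalty."""
    total = 0
    run_row, run_len = None, 0  # row holding the current gap run, and its length

    def close_run() -> None:
        """Add the penalty for the gap run just ended, if any, and reset."""
        nonlocal total, run_row, run_len
        if run_len:
            total += gap_open + gap_extend * (run_len - 1)
        run_row, run_len = None, 0

    for x, y in zip(top, bottom):
        if x == GAP or y == GAP:
            row = 0 if x == GAP else 1
            if row != run_row:      # a gap in the other row starts a new gap
                close_run()
                run_row = row
            run_len += 1
        else:
            close_run()
            total += match if x == y else mismatch
    close_run()                     # a gap may end the alignment
    return total


if __name__ == "__main__":
    # Two alignments of AC with AT: one mismatch versus two single gaps.
    for weights in ({"mismatch": -1, "gap_open": -3},
                    {"mismatch": -5, "gap_open": -1}):
        print(weights,
              alignment_score("AC", "AT", **weights),
              alignment_score("AC-", "A-T", **weights))
```

With mismatch = -1 and gap_open = -3, the two alignments score 0 and -5, so the alignment with a mismatch wins. With mismatch = -5 and gap_open = -1, they score -4 and -1, so the gapped alignment wins.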
- Type: Chapter
- Information: Bioinformatics for Biologists, pp. 66-92. Publisher: Cambridge University Press. Print publication year: 2011.