Book contents
- Frontmatter
- Contents
- Preface
- 1 Observed Markov Chains
- 2 Estimation of an Observed Markov Chain
- 3 Hidden Markov Models
- 4 Filters and Smoothers
- 5 The Viterbi Algorithm
- 6 The EM Algorithm
- 7 A New Markov Chain Model
- 8 Semi-Markov Models
- 9 Hidden Semi-Markov Models
- 10 Filters for Hidden Semi-Markov Models
- Appendix A Higher-Order Chains
- Appendix B An Example of a Second-Order Chain
- Appendix C A Conditional Bayes Theorem
- Appendix D On Conditional Expectations
- Appendix E Some Molecular Biology
- Appendix F Earlier Applications of Hidden Markov Chain Models
- References
- Index
Appendix F - Earlier Applications of Hidden Markov Chain Models
Published online by Cambridge University Press: 01 February 2018
Summary
Introduction
This appendix briefly describes some earlier approaches to applying Markov chain models.
Markov chain models provide probability models for sequences of symbols, which can aid genome annotation. Typical questions include the following: Does a particular sequence belong to a particular family, and what can one say about its internal structure? How can one discriminate between two sequences?
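As a minimal sketch of how Markov chain models can discriminate between sequences, the Python below fits a first-order chain to each of two hypothetical training families and scores a new sequence by its log-likelihood ratio. The function names, the toy sequences, and the additive pseudocount are illustrative assumptions, not a method taken from the references cited in this appendix. Transition probabilities are stored in a dictionary keyed by (current, next) base pairs, so the probabilities leaving each current base sum to 1.

```python
import math
from collections import Counter

BASES = "ACGT"


def fit_first_order(seqs, pseudocount=1.0):
    """Estimate P(next | current) from training sequences.

    Counts adjacent base pairs and adds a pseudocount (an assumption made
    here so that unseen transitions do not get probability zero).
    """
    counts = Counter()
    for s in seqs:
        for a, b in zip(s, s[1:]):
            counts[(a, b)] += 1
    probs = {}
    for a in BASES:
        total = sum(counts[(a, b)] for b in BASES) + pseudocount * len(BASES)
        for b in BASES:
            probs[(a, b)] = (counts[(a, b)] + pseudocount) / total
    return probs


def log_odds(seq, model_plus, model_minus):
    """Log-likelihood ratio of seq under two chains (first base ignored).

    Positive values favour model_plus, negative values favour model_minus.
    """
    return sum(math.log(model_plus[(a, b)] / model_minus[(a, b)])
               for a, b in zip(seq, seq[1:]))


# Hypothetical toy training families.
family_a = ["CGCGCGCGAT", "GCGCGCCGCG"]
family_b = ["ATATATTAAT", "TTATAATATA"]
m_a = fit_first_order(family_a)
m_b = fit_first_order(family_b)
print(log_odds("CGCGGCGC", m_a, m_b) > 0)  # True
```

Ignoring the first base (that is, the initial probabilities) has little effect on long sequences; a fuller treatment would include it.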
Some general reviews are given in Durbin et al. (1998, Chapters 2 and 3) and Robin et al. (2005, Chapters 1 and 2), while a more detailed review of observed Markov chains is provided by Koski (2001, Chapter 9). We have added some extra details to Koski's treatment.
A straightforward application of Markov chains to genome sequences. This approach does not seem to work well, for the following reasons:
• The four bases A, T, G, C are not uniformly distributed along a sequence, and the base composition varies both within and between sequences.
• Various k-tuples of bases are not uniformly distributed either; indeed, exons and introns are often distinguished on the basis of dinucleotide frequencies.
• Higher-order chains seem to be needed, because the probability of a base at a particular location can depend on more than the immediately preceding base. In addition, the base composition can vary from one segment to another; the segmentation techniques for decomposing DNA sequences into homogeneous segments include hidden Markov models.
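The compositional points in the list above can be illustrated with a short Python sketch: counting overlapping k-tuples, estimating an order-k chain by maximum likelihood (the relative frequency of each base following each k-base context), and computing a sliding-window G+C fraction. The toy sequence and all function names are hypothetical; no smoothing is applied, so contexts that never occur simply have no entry.

```python
from collections import Counter, defaultdict


def kmer_counts(seq, k):
    """Count overlapping k-tuples (k-mers) in seq."""
    return Counter(seq[i:i + k] for i in range(len(seq) - k + 1))


def fit_order_k(seq, k):
    """Maximum-likelihood estimate of an order-k chain.

    Returns {context: {base: P(base | context)}}, where each context is
    the k bases immediately preceding the predicted base.
    """
    ctx_counts = defaultdict(Counter)
    for i in range(k, len(seq)):
        ctx_counts[seq[i - k:i]][seq[i]] += 1
    return {ctx: {b: n / sum(c.values()) for b, n in c.items()}
            for ctx, c in ctx_counts.items()}


def gc_profile(seq, window, step):
    """Fraction of G or C in successive windows along seq."""
    return [sum(base in "GC" for base in seq[i:i + window]) / window
            for i in range(0, len(seq) - window + 1, step)]


# Hypothetical toy sequence: an A/T-rich half followed by a G/C-rich half.
seq = "ATATATATGCGCGCGC"
print(kmer_counts(seq, 2).most_common(3))
print(fit_order_k(seq, 2)["AT"])  # {'A': 0.75, 'G': 0.25}
print(gc_profile(seq, 8, 4))      # [0.0, 0.5, 1.0]
```

The window fractions rising from 0.0 to 1.0 show the composition changing along the sequence, the kind of heterogeneity that motivates segmenting a sequence into homogeneous pieces.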
Frame-dependent Markov chains. These are used in the GeneMark gene-prediction software; information can be found at
http://genemark.biology.gatech.edu/GeneMark/gm_info.html
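A frame-dependent (three-periodic) chain uses a different transition table for each codon position. The Python below is a minimal sketch of that idea under stated assumptions: first-order tables for brevity, hypothetical toy training sequences that start at the first base of a codon, and a pseudocount of 1. It is not GeneMark's implementation, and all names are hypothetical. A test sequence is scored in each of the three possible frames, and the frame with the highest log-likelihood is taken as the most plausible reading frame.

```python
import math
from collections import Counter

BASES = "ACGT"


def fit_frame_dependent(coding_seqs, pseudocount=1.0):
    """Estimate three first-order tables, one per codon position (0, 1, 2)
    of the base being predicted. Assumes each training sequence starts at
    the first base of a codon."""
    counts = [Counter() for _ in range(3)]
    for s in coding_seqs:
        for i in range(1, len(s)):
            counts[i % 3][(s[i - 1], s[i])] += 1
    tables = []
    for c in counts:
        table = {}
        for a in BASES:
            total = sum(c[(a, b)] for b in BASES) + pseudocount * len(BASES)
            for b in BASES:
                table[(a, b)] = (c[(a, b)] + pseudocount) / total
        tables.append(table)
    return tables


def frame_log_likelihood(seq, tables, frame):
    """Log-likelihood of seq when seq[0] sits at codon position `frame`.
    Base seq[i] is then at codon position (frame + i) % 3; seq[0] is not scored."""
    return sum(math.log(tables[(frame + i) % 3][(seq[i - 1], seq[i])])
               for i in range(1, len(seq)))


# Hypothetical toy coding sequences (start codon, repeated codons, stop codon).
train = ["ATGGCTGCTGCTGCTTAA", "ATGGCAGCAGCAGCATGA"]
tables = fit_frame_dependent(train)
test_seq = "GCTGCAGCTGCA"
scores = [frame_log_likelihood(test_seq, tables, f) for f in range(3)]
best_frame = max(range(3), key=lambda f: scores[f])
print(best_frame)  # 0
```

With these toy data the test sequence scores highest in frame 0, that is, when its first base is read as the first base of a codon.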
Mixture transition distribution chain of order k. These are called MTD(k) models. For a Markov chain of order k with a state space of size N, the transition matrix A has $N^k$ columns, one for each possible history of k states; since each column sums to 1, it has $N - 1$ free entries, so there are $(N-1)N^k$ entries of A to be estimated, plus the initial probabilities. With $N = 4$ and $k = 8$ this gives $3 \cdot 4^8 = 196{,}608$, which is very large. A further implication is that we may not have enough data to calibrate all these entries in A. We comment on estimation using sparse data below.
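The parameter count above, and the saving an MTD(k) model can offer, can be checked with a few lines of Python. The MTD part assumes the usual formulation, which this section does not spell out: a single $N \times N$ transition matrix $Q$ shared by all k lags, mixed with lag weights $\lambda_1, \dots, \lambda_k$ that sum to 1, so that $P(X_t = j \mid X_{t-1}, \dots, X_{t-k}) = \sum_{g=1}^{k} \lambda_g q_{j, X_{t-g}}$. Here Q is column-stochastic to match the convention stated for A. The function names and the small numerical example are hypothetical.

```python
def full_order_k_params(n_states, k):
    """Free transition parameters of a full order-k chain: N**k columns
    (histories), each a distribution over N states with N - 1 free entries."""
    return (n_states - 1) * n_states ** k


def mtd_params(n_states, k):
    """Free parameters of an MTD(k) model under the assumed formulation:
    a shared N x N column-stochastic matrix Q (N(N - 1) free entries)
    plus k lag weights summing to 1 (k - 1 free entries)."""
    return n_states * (n_states - 1) + (k - 1)


def mtd_prob(next_state, history, Q, lam):
    """P(X_t = next_state | history) = sum_g lam[g] * Q[next_state][history[g]].

    history[0] is X_{t-1}, history[1] is X_{t-2}, and so on.
    Q[j][i] = P(next = j | lagged state = i), so each column of Q sums to 1.
    """
    return sum(w * Q[next_state][past] for w, past in zip(lam, history))


# Hypothetical example: N = 2 states, k = 2 lags.
Q = [[0.9, 0.2],
     [0.1, 0.8]]
lam = [0.7, 0.3]
print(full_order_k_params(4, 8))  # 196608
print(mtd_params(4, 8))           # 19
print(mtd_prob(0, [0, 1], Q, lam))
```

Under these assumptions, MTD(8) on four bases has 19 free transition parameters, against 196,608 for the full eighth-order chain; the price is a more restrictive model of how the past influences the next base.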
Type: Chapter. Introduction to Hidden Semi-Markov Models, pp. 165-168. Publisher: Cambridge University Press. Print publication year: 2018.