
Appendix F - Earlier Applications of Hidden Markov Chain Models

Published online by Cambridge University Press:  01 February 2018

John van der Hoek
Affiliation:
University of South Australia
Robert J. Elliott
Affiliation:
University of Calgary

Summary

Introduction

In this appendix, we briefly describe some earlier applications of Markov chain models.

Markov chain models can provide probability models for sequences of symbols, and such models aid genome annotation. Typical questions include the following. Does a particular sequence belong to a particular family, and what can be said about its internal structure? How can one discriminate between two sequences?
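One common way to approach the discrimination question is a log-likelihood ratio between two Markov chain models. The following is a minimal sketch that assumes a first-order chain over the alphabet A, C, G, T. The transition probabilities are estimated from dinucleotide counts with an add-one pseudocount, which is an assumption on our part. The names `fit_transitions`, `log_likelihood` and `log_odds` are hypothetical, and the training strings are made-up toy data rather than real genomic sequences.

```python
import math
from collections import Counter

ALPHABET = "ACGT"

def fit_transitions(seqs, pseudocount=1.0):
    """Estimate first-order transition probabilities P(b | a) from training sequences.

    A pseudocount is added to every dinucleotide count (an assumption) so that
    pairs never seen in training still get non-zero probability."""
    counts = Counter()
    for s in seqs:
        for a, b in zip(s, s[1:]):
            counts[a, b] += 1
    probs = {}
    for a in ALPHABET:
        total = sum(counts[a, b] for b in ALPHABET) + pseudocount * len(ALPHABET)
        for b in ALPHABET:
            probs[a, b] = (counts[a, b] + pseudocount) / total
    return probs

def log_likelihood(seq, probs):
    """Log-probability of seq under the chain, conditioning on its first base."""
    return sum(math.log(probs[a, b]) for a, b in zip(seq, seq[1:]))

def log_odds(seq, model_family, model_background):
    """Positive scores favour the family model, negative scores the background."""
    return log_likelihood(seq, model_family) - log_likelihood(seq, model_background)

# Toy (hypothetical) training data: a GC-rich "family" and an AT-rich background.
family = fit_transitions(["GCGCGGCCGC", "CGGCGCGCCG"])
background = fit_transitions(["ATATTAAATA", "TTATAAATTA"])
print(log_odds("GCGGCG", family, background) > 0)  # True
```

In practice the decision threshold and the order of the chain would both have to be chosen from data. The sketch only illustrates the form of the score.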

Some general reviews are given in (Durbin et al., 1998, Chapters 2 and 3) and (Robin et al., 2005, Chapters 1 and 2), while a more detailed review of observed Markov chains is provided by (Koski, 2001, Chapter 9). We have added some extra details to Koski's treatment.

A first idea is a straightforward application of Markov chains to genome sequence analysis. This approach does not seem to work, for the following reasons:

  • The four bases A, T, G, C are not uniformly distributed in a sequence, and their composition varies within and between sequences.

  • Various k-tuples of bases are not uniformly distributed either. In fact, exons and introns are often distinguished on the basis of dinucleotide frequencies.

  • It seems that higher-order chains are needed, since the probability of a base at a particular location can depend on more than just the immediately adjacent bases. In addition, the base composition can vary from one segment to another; the techniques for decomposing DNA sequences into homogeneous segments include hidden Markov models.
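The hidden Markov model approach to segmentation can be sketched as follows. A hidden state labels the segment type, and the Viterbi algorithm recovers the most probable state path. This is a minimal, hypothetical two-state example with "AT-rich" and "GC-rich" states. All of its parameters (`START`, `TRANS`, `EMIT`) are illustrative assumptions, not values estimated from genomic data, and `viterbi` and `segments` are names introduced here for illustration.

```python
import math

# Hypothetical two-state HMM; every number below is an illustrative assumption.
STATES = ("AT-rich", "GC-rich")
START = {"AT-rich": 0.5, "GC-rich": 0.5}
TRANS = {  # P(next state | current state); large self-transitions favour long segments
    "AT-rich": {"AT-rich": 0.9, "GC-rich": 0.1},
    "GC-rich": {"AT-rich": 0.1, "GC-rich": 0.9},
}
EMIT = {  # P(base | state)
    "AT-rich": {"A": 0.35, "T": 0.35, "G": 0.15, "C": 0.15},
    "GC-rich": {"A": 0.15, "T": 0.15, "G": 0.35, "C": 0.35},
}

def viterbi(seq):
    """Most probable hidden-state path for a non-empty seq (log-space Viterbi)."""
    # delta[s]: best log-probability of any path ending in state s at the current base.
    delta = {s: math.log(START[s]) + math.log(EMIT[s][seq[0]]) for s in STATES}
    backptrs = []
    for base in seq[1:]:
        ptr, new_delta = {}, {}
        for s in STATES:
            prev = max(STATES, key=lambda r: delta[r] + math.log(TRANS[r][s]))
            ptr[s] = prev
            new_delta[s] = delta[prev] + math.log(TRANS[prev][s]) + math.log(EMIT[s][base])
        backptrs.append(ptr)
        delta = new_delta
    # Trace back from the best final state.
    state = max(STATES, key=lambda s: delta[s])
    path = [state]
    for ptr in reversed(backptrs):
        state = ptr[state]
        path.append(state)
    return path[::-1]

def segments(seq):
    """Collapse the Viterbi path into (start, end, state) runs; end is exclusive."""
    path = viterbi(seq)
    runs, start = [], 0
    for i in range(1, len(path) + 1):
        if i == len(path) or path[i] != path[start]:
            runs.append((start, i, path[start]))
            start = i
    return runs

print(segments("ATATTATAAT" + "GCGGCGCCGC" + "TATAATTATA"))
# [(0, 10, 'AT-rich'), (10, 20, 'GC-rich'), (20, 30, 'AT-rich')]
```

In this model the self-transition probability acts as a penalty on switching. A higher value merges short runs into longer homogeneous segments.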

  • Frame-dependent Markov chains. These are used in the GeneMark software; information can be found at

    http://genemark.biology.gatech.edu/GeneMark/gm_info.html

  • Mixture transition distribution chains of order k, called MTD(k) models. For a Markov chain of order $k$ with a state space of size $N$, there are $(N-1)N^k$ free entries in the transition matrix $A$ to be estimated (since the column sums of $A$ are 1), plus the initial probabilities. With $N = 4$ and $k = 8$, this gives $3 \cdot 4^8 = 196{,}608$ entries, which is quite large. A further implication is that we may not have enough data to calibrate all these entries of $A$. We comment on estimation using sparse data below.
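The parameter count above can be checked directly and compared with an MTD(k) model. In its commonly used form, an MTD(k) model mixes a single shared $N \times N$ transition matrix over the $k$ lags with non-negative lag weights that sum to 1. It therefore has $N(N-1) + (k-1)$ free parameters instead of $(N-1)N^k$. The sketch below assumes that form. The function names are hypothetical, and the shared matrix `q` is written row-stochastic (rows sum to 1) for convenience, whereas $A$ in the text has unit column sums.

```python
def full_order_k_params(n: int, k: int) -> int:
    """Free transition parameters of an order-k chain on n symbols:
    n**k histories, each with a probability vector over n symbols summing to 1."""
    return (n - 1) * n**k

def mtd_params(n: int, k: int) -> int:
    """Free parameters of an MTD(k) model (assumed common form): one shared
    n x n stochastic matrix (n*(n-1) free) plus k lag weights summing to 1 (k-1 free)."""
    return n * (n - 1) + (k - 1)

def mtd_prob(history, nxt, q, lam):
    """MTD transition probability P(X_t = nxt | history).

    history[g] is the state at lag g + 1 (X_{t-1}, X_{t-2}, ...), q[i][j] = P(j | i)
    is the shared row-stochastic matrix, and lam are the lag weights."""
    return sum(w * q[h][nxt] for w, h in zip(lam, history))

print(full_order_k_params(4, 8))  # 196608
print(mtd_params(4, 8))           # 19

# Toy two-state example with hypothetical numbers: X_{t-1} = 0, X_{t-2} = 1.
q = [[0.9, 0.1], [0.2, 0.8]]
lam = [0.7, 0.3]
print(mtd_prob((0, 1), 0, q, lam))
```

Mathematically, the last line equals $0.7 \cdot 0.9 + 0.3 \cdot 0.2 = 0.69$, up to floating-point rounding. Because each row of `q` sums to 1 and the weights sum to 1, the mixture is again a probability distribution over the next state.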

Type: Chapter
Publisher: Cambridge University Press
Print publication year: 2018
