
Moderate Deviations for Word Counts in Biological Sequences

Published online by Cambridge University Press: 14 July 2016

Sarah Behrens*
Affiliation: Max Planck Institute for Molecular Genetics
Matthias Löwe**
Affiliation: University of Münster
* Postal address: Max Planck Institute for Molecular Genetics, Department for Computational Molecular Biology, Ihnestraße 63-73, 14195 Berlin, Germany. Email address: [email protected]
** Postal address: Fachbereich Mathematik und Informatik, Universität Münster, Einsteinstr. 62, 48149 Münster, Germany. Email address: [email protected]

Abstract


We derive a moderate deviation principle for word counts (which is extended to counts of multiple patterns) in biological sequences under different models: independent and identically distributed letters, homogeneous Markov chains of order 1 and m, and, in view of the codon structure of DNA sequences, Markov chains with three different transition matrices. This enables us to approximate P-values for the number of word occurrences in DNA and protein sequences in a new manner.
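As a rough illustration of the quantity the abstract refers to, the sketch below counts overlapping occurrences of a word in a sequence and compares the result with its expected count under the simplest of the listed models, independent and identically distributed letters. The function names and the choice to estimate letter probabilities from the sequence itself are assumptions made for this example; it does not reproduce the paper's moderate deviation approximation or its P-value formulas.

```python
from collections import Counter


def count_occurrences(sequence: str, word: str) -> int:
    """Count occurrences of `word` in `sequence`, allowing overlaps."""
    k = len(word)
    return sum(
        1 for i in range(len(sequence) - k + 1) if sequence[i:i + k] == word
    )


def expected_count_iid(sequence: str, word: str) -> float:
    """Expected number of occurrences of `word` under an i.i.d. letter model.

    Assumption for this sketch: letter probabilities are estimated by the
    empirical letter frequencies of `sequence`.
    """
    n = len(sequence)
    freqs = Counter(sequence)
    # Under independence, the probability of the word at a fixed position
    # is the product of its letter probabilities.
    prob = 1.0
    for letter in word:
        prob *= freqs[letter] / n
    # There are n - k + 1 possible start positions for a word of length k.
    return (n - len(word) + 1) * prob


if __name__ == "__main__":
    seq = "ACGACGATTACGA"  # hypothetical toy sequence
    word = "ACGA"
    print(count_occurrences(seq, word), expected_count_iid(seq, word))
```

A large gap between the observed and expected counts is what a P-value for the word count is meant to quantify; how that gap is scaled and converted into a probability depends on the model and on the approximation used.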

Type: Research Article
Copyright: © Applied Probability Trust 2009
