Improved compound Poisson approximation for the number of occurrences of any rare word family in a stationary Markov chain

Etienne Roquain; Sophie Schbath

doi:10.1239/aap/1175266472


Part of: Distribution theory; Combinatorial probability

Published online by Cambridge University Press: 01 July 2016


Etienne Roquain: Institut National de la Recherche Agronomique
Sophie Schbath: Institut National de la Recherche Agronomique
Postal address (both authors): INRA, Unité Mathématique, Informatique et Génome, Domaine de Vilvert, F-78352 Jouy-en-Josas, France.


Abstract


We derive a new compound Poisson distribution with explicit parameters to approximate the number of overlapping occurrences of any set of words in a Markovian sequence. Using the Chen-Stein method, we provide a bound for the approximation error. This error converges to 0 under the rare event condition, even for overlapping families, which improves previous results. As a consequence, we also propose Poisson approximations for the declumped count and the number of competing renewals.
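The quantities named in the abstract can be made concrete with a small sketch. The code below is only an illustration, not the authors' method: it counts all (possibly overlapping) occurrences of a word family in a sequence and the number of clumps (the declumped count). The function names `occurrences` and `count_clumps` are hypothetical, and the clump definition used (a maximal group of occurrences that overlap in the sequence) is the usual informal one, assumed here rather than taken from the paper.

```python
def occurrences(seq, words):
    """Return (start, word) for every occurrence of a word of the family,
    overlapping occurrences included, sorted by start position."""
    hits = []
    for w in words:
        start = seq.find(w)
        while start != -1:
            hits.append((start, w))
            # Advance by one position only, so overlapping hits are kept.
            start = seq.find(w, start + 1)
    return sorted(hits)


def count_clumps(hits):
    """Count clumps: maximal groups of occurrences that overlap in the
    sequence (assumed informal definition of the declumped count)."""
    clumps = 0
    covered_until = -1  # last index covered by the current clump
    for start, w in hits:
        if start > covered_until:
            clumps += 1  # this occurrence overlaps nothing seen so far
        covered_until = max(covered_until, start + len(w) - 1)
    return clumps


# A single self-overlapping word: three occurrences grouped in two clumps.
single = occurrences("AAAACGAAA", ["AAA"])
# A family whose words overlap each other: three occurrences, one clump.
family = occurrences("ACACAGG", ["ACA", "CAC"])
print(len(single), count_clumps(single))
print(len(family), count_clumps(family))
```

In this toy setting the overlapping count is the total number of hits, while the declumped count only registers the start of each clump; the abstract's compound Poisson approximation concerns the former and its Poisson approximations concern counts of the latter kind.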

Keywords

Compound Poisson approximation; Chen-Stein method; multiple word count; clump; period; Markov chain

MSC classification

Primary: 62E17: Approximations to distributions (nonasymptotic)

Secondary: 60C05: Combinatorial probability

Type: General Applied Probability
Information: Advances in Applied Probability, Volume 39, Issue 1, March 2007, pp. 128-140

DOI: https://doi.org/10.1239/aap/1175266472
Copyright: Copyright © Applied Probability Trust 2007

