An improved approximation for assessing the statistical significance of molecular sequence features

Published online by Cambridge University Press: 14 July 2016

S. Mercier*
Affiliation:
Université Toulouse II
D. Cellier*
Affiliation:
Université Toulouse II
F. Charlot**
Affiliation:
Université Rouen
* Postal address: GRIMM, Département de Mathématiques et Informatique, Université Toulouse II, 31058 Toulouse cedex 9, France.
∗∗ Postal address: LMRS, UMR CNRS 6085, Université Rouen, 76821 Mont-Saint-Aignan, France.

Abstract

Using random walk theory, we first establish explicitly the exact distribution of the maximal partial sum of a sequence of independent and identically distributed random variables. This result allows us to obtain a new approximation of the distribution of the local score of one sequence. This approximation improves upon the one given by Karlin et al., which can be deduced from the new formula. We obtain a more accurate asymptotic expression, with additional terms. Examples of application are given.
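
The abstract states no formulas, so the Python sketch below only illustrates the standard objects it refers to; it is not the authors' new approximation. It assumes the usual definitions: the local score H_n is the score of the highest-scoring contiguous segment (computed with the recursion U_k = max(0, U_{k-1} + X_k)), the maximal partial sum is max_k S_k with S_0 = 0, λ is the positive root of E[exp(λX)] = 1 for scores with negative mean, and the classical Karlin-Altschul form is P(H_n ≥ x) ≈ 1 − exp(−K n e^{−λx}), with K supplied by the caller. All function names are hypothetical, and the exact tail for ±1 steps is a textbook special case rather than the paper's general result.

```python
import math
from typing import Dict, Sequence


def local_score(scores: Sequence[float]) -> float:
    """Local score H_n = max over segments of X_i + ... + X_j.
    The empty segment is allowed, so H_n >= 0.
    Uses the Lindley-type recursion U_k = max(0, U_{k-1} + X_k), H_n = max_k U_k."""
    u = 0.0
    best = 0.0
    for x in scores:
        u = max(0.0, u + x)
        best = max(best, u)
    return best


def max_partial_sum(scores: Sequence[float]) -> float:
    """Maximal partial sum M_n = max_{0<=k<=n} S_k, with S_0 = 0."""
    s = 0.0
    best = 0.0
    for x in scores:
        s += x
        best = max(best, s)
    return best


def karlin_lambda(dist: Dict[float, float], tol: float = 1e-12) -> float:
    """Positive root lambda of E[exp(lambda * X)] = 1, found by bisection.
    dist maps each score value to its probability. Requires E[X] < 0 and
    P(X > 0) > 0, so that the root exists and is unique."""
    mean = sum(v * p for v, p in dist.items())
    if mean >= 0 or not any(v > 0 and p > 0 for v, p in dist.items()):
        raise ValueError("need negative mean and a positive score with positive probability")

    def phi(lam: float) -> float:
        return sum(p * math.exp(lam * v) for v, p in dist.items()) - 1.0

    # phi(0) = 0, phi < 0 just right of 0 (negative mean), phi -> +inf later.
    lo, hi = 0.0, 1.0
    while phi(hi) <= 0:  # enlarge the bracket until phi(hi) > 0
        hi *= 2.0
    while hi - lo > tol:  # invariant: phi(lo) <= 0 < phi(hi)
        mid = 0.5 * (lo + hi)
        if phi(mid) > 0:
            hi = mid
        else:
            lo = mid
    return 0.5 * (lo + hi)


def karlin_altschul_tail(x: float, n: int, lam: float, K: float) -> float:
    """Classical approximation P(H_n >= x) ~ 1 - exp(-K * n * exp(-lam * x)).
    K is an input here; computing it is outside this sketch."""
    return 1.0 - math.exp(-K * n * math.exp(-lam * x))


def simple_walk_max_tail(q: float, m: int) -> float:
    """Exact P(sup_{k>=0} S_k >= m) for steps +1 w.p. q and -1 w.p. 1 - q.
    Valid for 0 < q < 1/2 (walk drifts to -infinity); equals (q / (1 - q))**m."""
    if not 0.0 < q < 0.5:
        raise ValueError("need 0 < q < 1/2 so the walk drifts to -infinity")
    return (q / (1.0 - q)) ** m
```

For scores −1 with probability 2/3 and +1 with probability 1/3, karlin_lambda returns approximately ln 2, and simple_walk_max_tail(1/3, m) equals (1/2)^m = exp(−λm): in this special case the exponential tail of the maximal partial sum holds exactly, with constant 1.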

Type
Research Papers
Copyright
Copyright © Applied Probability Trust 2003 

References

Arratia, R., and Waterman, M. S. (1994). A phase transition for the score in matching random sequences allowing deletions. Ann. Appl. Prob. 4, 200–225.
Arratia, R., Goldstein, L., and Gordon, L. (1989). Two moments suffice for Poisson approximations: the Chen-Stein method. Ann. Prob. 17, 9–25.
Arratia, R., Gordon, L., and Waterman, M. S. (1990). The Erdős-Rényi law in distribution for coin tossing and sequence matching. Ann. Statist. 18, 539–570.
Asmussen, S. (1987). Applied Probability and Queues. John Wiley, Chichester.
Asmussen, S. (1995). Stationary distributions via first passage times. In Advances in Queuing, ed. Dshalalow, J. H., CRC Press, Boca Raton, FL, pp. 79–102.
Borovkov, A. A. (1976). Stochastic Processes in Queuing Theory (Appl. Math. 4). Springer, New York.
Daudin, J.-J., and Mercier, S. (1999). Distribution exacte du score local d'une suite de variables indépendantes et identiquement distribuées [Exact distribution of the local score of a sequence of independent and identically distributed variables]. C. R. Acad. Sci. Paris 329, 815–820.
Dembo, A., and Karlin, S. (1991). Strong limit theorems of empirical functionals for large exceedances of partial sums of i.i.d. variables. Ann. Prob. 19, 1737–1755.
Dembo, A., Karlin, S., and Zeitouni, O. (1994). Limit distribution of maximal non-aligned two-sequence segmental score. Ann. Prob. 22, 2022–2039.
Durbin, R., Eddy, S., Krogh, A., and Mitchison, G. (1998). Biological Sequence Analysis: Probabilistic Models of Proteins and Nucleic Acids. Cambridge University Press, Cambridge.
Feller, W. (1966). An Introduction to Probability Theory and Its Applications, Vol. 2. John Wiley, New York.
Iglehart, D. L. (1972). Extreme values in the GI/G/1 queue. Ann. Math. Statist. 43, 627–635.
Karlin, S., and Altschul, S. F. (1990). Methods for assessing the statistical significance of molecular sequence features by using general scoring schemes. Proc. Nat. Acad. Sci. USA 87, 2264–2268.
Karlin, S., and Dembo, A. (1992). Limit distributions of maximal segmental score among Markov-dependent partial sums. Adv. Appl. Prob. 24, 113–140.
Karlin, S., and Taylor, H. M. (1981). A Second Course in Stochastic Processes. Academic Press, New York.
Karlin, S., Dembo, A., and Kawabata, T. (1990). Statistical composition of high-scoring segments from molecular sequences. Ann. Statist. 18, 571–581.
Kyte, J., and Doolittle, R. F. (1982). A simple method for displaying the hydropathic character of a protein. J. Molec. Biol. 157, 105–132.
Mercier, S., and Daudin, J.-J. (2001). Exact distribution for the local score of one i.i.d. random sequence. J. Comput. Biol. 8, 373–380.
Mercier, S., Cellier, D., Charlot, F., and Daudin, J.-J. (2001). Exact and asymptotic distribution of the local score of one i.i.d. random sequence. In Proc. JOBIM 2000 Computational Biology (Lecture Notes Comput. Sci. 2066), eds Gascuel, O. and Sagot, M.-F., Springer, Berlin, pp. 74–83.
Wald, A. (1947). Sequential Analysis. John Wiley, New York.
Waterman, M. S. (1995). Introduction to Computational Biology. Chapman and Hall, London.
Zhang, Y. (1995). A limit theorem for matching random sequences allowing deletions. Ann. Appl. Prob. 5, 1236–1240.