Hostname: page-component-78c5997874-g7gxr Total loading time: 0 Render date: 2024-11-19T03:57:30.116Z Has data issue: false hasContentIssue false

A Phase Transition for the Distribution of Matching Blocks

Published online by Cambridge University Press:  12 September 2008

Claudia Neuhauser
Affiliation:
Department of Mathematics, University of Wisconsin, 480 Lincoln Drive, Madison, WI 53706, USA

Abstract

We show distributional results for the length of the longest matching consecutive subsequence between two independent sequences A1, A2, …, Am and B1, B2, …, Bn whose letters are taken from a finite alphabet. We assume that A1, A2, … are i.i.d. with distribution μ and B1, B2, … are i.i.d. with distribution ν. It is known that if μ and v are not too different, the Chen–Stein method for Poisson approximation can be used to establish distributional results. We extend these results beyond the region where the Chen–Stein method was previously successful. We use a combination of ‘matching by patterns’ results obtained by Arratia and Waterman [1], and the Chen–Stein method to show that the Poisson approximation can be extended. Our method explains how the matching is achieved. This provides an explanation for the formulas in Arratia and Waterman [1] and thus answers one of the questions posed in comment F19 in Aldous [2]. Furthermore, in the case where the alphabet consists of only two letters, the phase transition observed by Arratia and Waterman [1] for the strong law of large numbers extends to the distributional result. We conjecture that this phase transition on the distributional level holds for any finite alphabet.

Type
Research Article
Copyright
Copyright © Cambridge University Press 1996

Access options

Get access to the full version of this content by using one of the access options below. (Log in options will check for institutional or personal access. Content may require purchase if you do not have access.)

References

[1]Arratia, R. and Waterman, M. S. (1985). Critical phenomena in sequence matching. Ann. Probab. 13, 12361249.CrossRefGoogle Scholar
[2]Aldous, D. (1989). Probability approximations via the Poisson clumping heuristic, Applied Mathematical Sciences 77, Springer-Verlag.CrossRefGoogle Scholar
[3]Arratia, R., Gordon, L. and Waterman, M. S. (1990). The Erdős–Rényi law in distribution, for coin tossing and sequence matching, Ann. Statist. 18, 539570.CrossRefGoogle Scholar
[4]Erdős, P. and Rényi, A. (1970). On a new law of large numbers, J. Anal. Math. 22, 103111.CrossRefGoogle Scholar
[5]Arratia, R., Goldstein, L. and Gordon, L. (1989). Two moments suffice for Poisson approximations: The Chen–Stein method, Ann. Probab. 17, 925.CrossRefGoogle Scholar
[6]Arratia, R., Gordon, L. and Waterman, M. S. (1986). An extreme value theory for sequence matching, Ann. Statist. 14, 971993.CrossRefGoogle Scholar
[7]Karlin, S. and Ost, F. (1987). Counts of long aligned word matches among random letter sequences, Adv. Applic. Probab. 19, 293351.CrossRefGoogle Scholar
[8]Karlin, S., Ghandour, G., Ost, F., Tavaré, S. and Korn, L. J. (1983). New approaches for computer analysis of nucleic acid sequences, Proc. Nat. Acad. Sci. USA 80, 56605664.CrossRefGoogle ScholarPubMed
[9]Stein, C. M. (1971). A bound for the error in the normal approximations to the distribution of a sum of dependent random variables, Proc. Sixth Berkeley Symp. Math. Statist. Probab. 3, 583602.Google Scholar
[10]Chen, L. H. Y. (1975). Poisson approximation for dependent trials, Ann. Probab. 3, 534545.CrossRefGoogle Scholar
[11]Dembo, A., Karlin, S. and Zeitouni, O. (1994). Limit distribution of maximal non-aligned two-sequence segmental score. Preprint.CrossRefGoogle Scholar
[12]Stein, C. M. (1986). Approximate Computation of Expectations, IMS, Hayward, CA.CrossRefGoogle Scholar
[13]Dembo, A. and Zeitouni, O. (1993). Large Deviations Techniques and Applications, Jones and Bartlett, Boston.Google Scholar
[14]Blom, G. and Thorburn, D. (1982). How many digits are required until given sequences are obtained? J. Appl. Probab. 19, 518531.CrossRefGoogle Scholar
[15]Chryssaphinou, O. and Papastavridis, S. (1988). A limit theorem for the number of non-overlapping occurrences of a pattern in a sequence of independent trials, J. Appl. Probab. 25, 428431.CrossRefGoogle Scholar