Counts of long aligned word matches among random letter sequences

Published online by Cambridge University Press:  01 July 2016

Samuel Karlin*
Affiliation: Stanford University
Friedemann Ost**
Affiliation: Technische Universität München

* Postal address: Department of Mathematics, Stanford University, Stanford, CA 94305, USA.
** Postal address: Institut für Angewandte Mathematik und Statistik, Technische Universität München, Arcisstr. 21, D-8000 München 2, Germany.

Abstract

Asymptotic distributional properties of the maximal length aligned word (a contiguous set of letters) among multiple random Markov dependent sequences composed of letters from a finite alphabet are given. For sequences of length $N$, $C_{r,s}(N)$, defined as the longest common aligned word found in $r$ or more of $s$ sequences, has order growth $\log N/(-\log\lambda)$, where $\lambda$ is the maximal eigenvalue of $r$-Schur product matrices from among the collections of Markov matrices that generate the sequences. The count $Z_{r,s}(N, k)$ of positions that initiate an aligned match of length exceeding $k = \log N/(-\log\lambda) + x$ but fail to match at the immediately preceding position has a limiting Poisson distribution. Distributional properties of other long aligned word relationships and patterns are also discussed.
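
The growth rate quoted in the abstract can be illustrated numerically. The following is a minimal sketch, not the authors' procedure: it assumes that "r-Schur product" means the elementwise product of r of the s transition matrices, that λ is the largest Perron eigenvalue over all such r-subsets, and that every matrix has strictly positive entries so that plain power iteration converges. The function names (`schur_product`, `spectral_radius`, `max_lambda`, `predicted_order`) are hypothetical and chosen only for this illustration.

```python
from itertools import combinations
from math import log


def schur_product(mats):
    """Elementwise (Schur) product of equally sized square matrices."""
    n = len(mats[0])
    out = [[1.0] * n for _ in range(n)]
    for m in mats:
        for i in range(n):
            for j in range(n):
                out[i][j] *= m[i][j]
    return out


def spectral_radius(mat, iters=1000):
    """Perron eigenvalue by power iteration.

    Assumption: all entries of `mat` are strictly positive.
    """
    n = len(mat)
    v = [1.0 / n] * n  # start vector, normalized to sum to 1
    lam = 0.0
    for _ in range(iters):
        w = [sum(mat[i][j] * v[j] for j in range(n)) for i in range(n)]
        lam = sum(w)  # because v sums to 1, this tends to the eigenvalue
        v = [x / lam for x in w]
    return lam


def max_lambda(matrices, r):
    """Largest Perron eigenvalue over Schur products of r of the matrices."""
    return max(
        spectral_radius(schur_product(subset))
        for subset in combinations(matrices, r)
    )


def predicted_order(n, lam):
    """Order of growth log N / (-log lambda) for sequence length n."""
    return log(n) / (-log(lam))


# Illustrative input (an assumption, not data from the paper): two sequences
# of independent uniform letters over a 4-letter alphabet, written as Markov
# matrices whose rows are all identical.
uniform = [[0.25] * 4 for _ in range(4)]
lam = max_lambda([uniform, uniform], r=2)
order = predicted_order(1024, lam)
```

In this uniform special case the Schur product has every entry equal to 1/16, so the eigenvalue reduces to the probability that two independent letters agree, and the predicted order is the logarithm of N to base 4.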

Type: Research Article
Copyright © Applied Probability Trust 1987

Footnotes

Research supported in part by NIH Grant GM10452-22 and NSF Grant MCS82-15131.
