
r-scan statistics of a Poisson process with events transformed by duplications, deletions, and displacements

Published online by Cambridge University Press:  01 July 2016

Chingfer Chen*
Affiliation:
Stanford University
Samuel Karlin*
Affiliation:
Stanford University
* Postal address: Department of Mathematics, Stanford University, Stanford, CA 94305, USA.

Abstract


A stochastic model of a dynamic marker array, in which markers can disappear, duplicate, and move relative to their original positions, is constructed to reflect the nature of long DNA sequences. The sequence changes (deletions, duplications, and displacements) follow these stochastic rules: (i) the original distribution of the marker array {…, X_{-2}, X_{-1}, X_0, X_1, X_2, …} is a Poisson process on the real line; (ii) each marker is replicated l times, with replication or loss of marker points occurring independently; (iii) each replicated point is independently and randomly displaced by an amount Y relative to its original position, with the displacements Y sampled from a continuous density g(y). Limiting distributions for the maximal and minimal statistics of the r-scan lengths (the collection of distances spanned by r + 1 successive markers) for the l-shift model are derived with the aid of the Chen–Stein method and properties of Poisson processes.
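Rules (i) to (iii) can be illustrated with a small simulation. The following is only a sketch under stated assumptions, not the construction used in the paper: the process is restricted to a finite window [0, length), which introduces edge effects absent on the real line; the number of copies per marker is drawn from a user-supplied rule, with 0 copies standing for deletion; and the displacement density g is supplied by the caller. The function names `simulate_l_shift` and `r_scan_lengths`, and all parameter values in the usage lines, are hypothetical.

```python
import random


def simulate_l_shift(rate, length, copy_dist, displace, rng):
    """Sketch of the l-shift model on the window [0, length).

    rate      -- intensity of the original Poisson process (assumed homogeneous)
    copy_dist -- callable rng -> int, number of copies of a marker (0 = deletion)
    displace  -- callable rng -> float, one displacement Y drawn from g
    """
    # (i) Original markers: a Poisson process has i.i.d. exponential gaps.
    originals = []
    t = rng.expovariate(rate)
    while t < length:
        originals.append(t)
        t += rng.expovariate(rate)

    # (ii) and (iii): each marker independently yields some number of copies,
    # and each copy is independently shifted relative to the original position.
    markers = []
    for x in originals:
        for _ in range(copy_dist(rng)):
            markers.append(x + displace(rng))
    markers.sort()
    return markers


def r_scan_lengths(markers, r):
    """Distances spanned by r + 1 successive markers in a sorted list."""
    return [markers[i + r] - markers[i] for i in range(len(markers) - r)]


# Usage with assumed, purely illustrative choices:
# 0, 1, or 2 copies with equal probability, and Gaussian displacements.
rng = random.Random(0)
markers = simulate_l_shift(
    rate=1.0,
    length=1000.0,
    copy_dist=lambda g: g.choice([0, 1, 2]),
    displace=lambda g: g.gauss(0.0, 0.5),
    rng=rng,
)
scans = r_scan_lengths(markers, r=3)
if scans:
    print("minimal 3-scan:", min(scans), "maximal 3-scan:", max(scans))
```

The minimum and maximum of `scans` are the empirical counterparts of the minimal and maximal r-scan statistics whose limiting distributions the paper derives; the simulation itself says nothing about those limits.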

Type
General Applied Probability
Copyright
Copyright © Applied Probability Trust 2007 
