Expected frequencies of DNA patterns using Whittle's formula

Published online by Cambridge University Press:  14 July 2016

Richard Cowan*
Affiliation:
University of Hong Kong
* Postal address: Department of Statistics, University of Hong Kong, Pokfulam Road, Hong Kong.

Abstract

Given a realisation of a Markov chain, one can count the number of state transitions of each type. One can then ask how many realisations there are with these transition counts and the same initial state. Whittle (1955) answered this question by finding an explicit though complicated formula, and also showed that each such realisation is equally likely. In the analysis of DNA sequences, which comprise letters from the set {A, C, G, T}, it is often useful to count the frequency of a pattern, say ACGCT, in a long sequence and compare this with the expected frequency over all sequences having the same start letter and the same transition counts (or ‘dinucleotide counts’, as they are called in the molecular biology literature). To date, no exact method exists; this paper rectifies that deficiency.
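To make the quantities in the abstract concrete, the following is a minimal brute-force sketch in Python. It is not Whittle's formula and not the exact method of the paper: it simply enumerates every sequence with the same start letter and the same transition counts, then averages the pattern count over them, relying on the stated fact that these sequences are equally likely. The function names (`transition_counts`, `pattern_count`, `expected_pattern_count`) are hypothetical, and overlapping occurrences of the pattern are assumed to count separately.

```python
from collections import Counter
from itertools import product

ALPHABET = "ACGT"


def transition_counts(seq):
    """Count each ordered pair of adjacent letters (the dinucleotide counts)."""
    return Counter(zip(seq, seq[1:]))


def pattern_count(seq, pattern):
    """Count occurrences of `pattern` in `seq`, overlaps counted separately (an assumption)."""
    k = len(pattern)
    return sum(seq[i:i + k] == pattern for i in range(len(seq) - k + 1))


def expected_pattern_count(seq, pattern):
    """Average pattern count over all sequences sharing seq's start letter and
    transition counts, found by exhaustive enumeration (exponential cost)."""
    target = transition_counts(seq)
    matches = []
    for tail in product(ALPHABET, repeat=len(seq) - 1):
        candidate = seq[0] + "".join(tail)
        if transition_counts(candidate) == target:
            matches.append(candidate)
    total = sum(pattern_count(c, pattern) for c in matches)
    # `seq` itself always matches, so `matches` is never empty.
    return total / len(matches), len(matches)
```

For the toy sequence `AACA`, the only sequences with start letter A and transition counts AA, AC, CA each equal to 1 are `AACA` and `ACAA`, so `expected_pattern_count("AACA", "AAC")` returns `(0.5, 2)`. The enumeration visits 4^(n-1) candidates for a sequence of length n, so it is usable only for very short sequences, which is why a closed-form count of the realisations matters for real data.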

Type
Short Communications
Copyright
Copyright © Applied Probability Trust 1991 

References

Avery, P. J. (1987) The analysis of intron data and their use in the detection of short signals. J. Mol. Evol. 26, 335–340.
Bird, A. P. (1980) DNA methylation and the frequency of CpG in animal DNA. Nucleic Acids Res. 8, 1499–1504. doi:10.1093/nar/8.7.1499
Bulmer, M. (1987) A statistical analysis of nucleotide sequences of introns and exons in human genes. Mol. Biol. Evol. 4, 395–405.
Cowan, R. (1992) Whittle's formula on a circle. In preparation.
Gardiner-Garden, M. and Frommer, M. (1987) CpG islands in vertebrate genomes. J. Mol. Biol. 196, 261–282.
Nussinov, R. (1981) The universal dinucleotide asymmetry rules in DNA and the amino acid codon choice. J. Mol. Evol. 17, 237–244. doi:10.1007/BF01732761
Smith, T. F., Waterman, M. S. and Sadler, J. R. (1983) Statistical characterization of nucleic acid sequence functional domains. Nucleic Acids Res. 11, 2205–2220.
Whittle, P. (1955) Some distribution and moment formulae for the Markov chain. J. R. Statist. Soc. B 17, 235–242.