Asymptotics of the allele frequency spectrum and the number of alleles

Published online by Cambridge University Press: 22 November 2024

Ross A. Maller*
Affiliation:
The Australian National University
Soudabeh Shemehsavar**
Affiliation:
Murdoch University and University of Tehran
*Postal address: Research School of Finance, Actuarial Studies and Statistics, Australian National University, Canberra, ACT, 0200, Australia. Email address: Ross.Maller@anu.edu.au
**Postal address: College of Science, Technology, Engineering and Mathematics, and Centre for Healthy Ageing, Health Future Institute, Murdoch University, and School of Mathematics, Statistics & Computer Sciences, University of Tehran. Email address: Soudabeh.Shemehsavar@murdoch.edu.au

Abstract

We derive large-sample and other limiting distributions of components of the allele frequency spectrum vector, $\mathbf{M}_n$, joint with the number of alleles, $K_n$, from a sample of $n$ genes. Models analysed include those constructed from gamma and $\alpha$-stable subordinators by Kingman (thus including the Ewens model), the two-parameter extension by Pitman and Yor, and a two-parameter version constructed by omitting large jumps from an $\alpha$-stable subordinator. In each case the limiting distribution of a finite number of components of $\mathbf{M}_n$ is derived, joint with $K_n$. New results include that, in the Poisson–Dirichlet case, $\mathbf{M}_n$ and $K_n$ are asymptotically independent after centering and norming for $K_n$. It is notable, especially for statistical applications, that in the other cases the limiting distribution of a finite number of components of $\mathbf{M}_n$, after centering and an unusual $n^{\alpha/2}$ norming, conditional on that of $K_n$, is normal.
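
To make the two quantities in the abstract concrete, the following is a minimal simulation sketch, not the authors' method. It assumes the standard convention that the $j$th component of $\mathbf{M}_n$ counts the alleles represented exactly $j$ times in the sample, so that $K_n$ is the sum of the components. It draws the sample partition with the two-parameter Chinese restaurant scheme, a standard way to generate Ewens ($\alpha = 0$) and Pitman–Yor partitions. The function names `sample_partition` and `frequency_spectrum` and the parameter values are illustrative assumptions.

```python
import random
from collections import Counter


def sample_partition(n, alpha=0.0, theta=1.0, seed=None):
    """Allele counts for n genes from the two-parameter Chinese restaurant
    scheme (assumed sampler; alpha = 0 gives the Ewens model)."""
    if not (0.0 <= alpha < 1.0 and theta > -alpha):
        raise ValueError("need 0 <= alpha < 1 and theta > -alpha")
    rng = random.Random(seed)
    sizes = []  # sizes[i] = number of sampled genes carrying allele i
    for i in range(n):  # i genes have been sampled so far
        k = len(sizes)
        # A new allele appears with probability (theta + alpha*k) / (theta + i);
        # the first gene always founds a new allele.
        if k == 0 or rng.random() * (theta + i) < theta + alpha * k:
            sizes.append(1)
        else:
            # Otherwise allele j is chosen with probability
            # proportional to sizes[j] - alpha.
            weights = [s - alpha for s in sizes]
            j = rng.choices(range(k), weights=weights)[0]
            sizes[j] += 1
    return sizes


def frequency_spectrum(sizes):
    """Return M_n as a list: entry j-1 is the number of alleles seen j times."""
    n = sum(sizes)
    counts = Counter(sizes)
    return [counts.get(j, 0) for j in range(1, n + 1)]


if __name__ == "__main__":
    sizes = sample_partition(1000, alpha=0.5, theta=1.0, seed=0)
    m = frequency_spectrum(sizes)
    print("K_n =", len(sizes))
    print("first components of M_n:", m[:5])
```

Repeating the draw for increasing `n` gives an informal, empirical view of how the leading components of $\mathbf{M}_n$ and $K_n$ grow together; it is no substitute for the limit theorems stated above.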

Type
Original Article
Copyright
© The Author(s), 2024. Published by Cambridge University Press on behalf of Applied Probability Trust

References

Arratia, R. and Baxendale, P. (2015). Bounded size bias coupling: a Gamma function bound, and universal Dickman-function behavior. Prob. Theory Related Fields 162, 411–429.
Arratia, R., Barbour, A. and Tavaré, S. (2003). Logarithmic Combinatorial Structures: A Probabilistic Approach (EMS Monographs in Mathematics). European Mathematical Society, Zurich.
Basdevant, A. and Goldschmidt, C. (2008). Asymptotics of the allele frequency spectrum associated with the Bolthausen–Sznitman coalescent. Electron. J. Prob. 13, 486–512.
Berestycki, J., Berestycki, N. and Schweinsberg, J. (2007). Beta-coalescents and continuous stable random trees. Ann. Prob. 35, 1835–1887.
Cereda, G. and Corradi, F. (2023). Learning the two parameters of the Poisson–Dirichlet distribution with a forensic application. Scand. J. Statist. 50, 120–141.
Chegini, S. and Zarepour, M. (2023). Random discrete probability measures based on negative binomial process. Available at arXiv:2307.00176.
Covo, S. (2009). On approximations of small jumps of subordinators with particular emphasis on a Dickman-type limit. J. Appl. Prob. 46, 732–755.
Covo, S. (2009). One-dimensional distributions of subordinators with upper truncated Lévy measure, and applications. Adv. Appl. Prob. 41, 367–392.
Dolera, E. and Favaro, S. (2020). A Berry–Esseen theorem for Pitman’s $\alpha$-diversity. Ann. Appl. Prob. 30, 847–869.
Ewens, W. (1972). The sampling theory of selectively neutral alleles. Theoret. Pop. Biol. 3, 87–112.
Favaro, S. and Feng, S. (2014). Asymptotics for the number of blocks in a conditional Ewens–Pitman sampling model. Electron. J. Prob. 19, 21, 1–15.
Feng, S. (2007). Large deviations associated with Poisson–Dirichlet distribution and Ewens sampling formula. Ann. Appl. Prob. 17, 1570–1595.
Feng, S. (2010). The Poisson–Dirichlet Distribution and Related Topics: Models and Asymptotic Behaviours (Probability and its Applications). Springer.
Freund, F. and Möhle, M. (2009). On the number of allelic types for samples taken from exchangeable coalescents with mutation. Adv. Appl. Prob. 41, 1082–1101.
Gnedenko, B. V. and Kolmogorov, A. N. (1968). Limit Distributions for Sums of Independent Random Variables. Addison-Wesley.
Gregoire, G. (1984). Negative binomial distributions for point processes. Stoch. Process. Appl. 16, 179–188.
Griffiths, R. C. (1979). On the distribution of allele frequencies in a diffusion model. Theoret. Pop. Biol. 15, 140–158.
Griffiths, R. C. (2003). The frequency spectrum of a mutation, and its age, in a general diffusion model. Theoret. Pop. Biol. 64, 241–251.
Grote, M. N. and Speed, T. P. (2002). Approximate Ewens formulae for symmetric overdominance selection. Ann. Appl. Prob. 12, 637–663.
Handa, K. (2009). The two-parameter Poisson–Dirichlet point process. Bernoulli 15, 1082–1116.
Hansen, J. (1994). Order statistics for decomposable combinatorial structures. Random Structures Algorithms 5, 517–533.
Hensley, D. (1982). The convolution powers of the Dickman function. J. London Math. Soc. s2-33, 395–406.
Ipsen, Y. F. and Maller, R. A. (2017). Negative binomial construction of random discrete distributions on the infinite simplex. Theory Stoch. Process. 22, 34–46.
Ipsen, Y. F., Maller, R. A. and Shemehsavar, S. (2020). Limiting distributions of generalised Poisson–Dirichlet distributions based on negative binomial processes. J. Theoret. Prob. 33, 1974–2000.
Ipsen, Y. F., Maller, R. A. and Shemehsavar, S. (2020). Size biased sampling from the Dickman subordinator. Stoch. Process. Appl. 130, 6880–6900.
Ipsen, Y. F., Maller, R. A. and Shemehsavar, S. (2021). A generalised Dickman distribution and the number of species in a negative binomial process model. Adv. Appl. Prob. 53, 370–399.
Ipsen, Y. F., Shemehsavar, S. and Maller, R. A. (2018). Species sampling models generated by negative binomial processes. Available at arXiv:1904.13046.
James, L. F. (2008). Large sample asymptotics for the two-parameter Poisson–Dirichlet process. Pushing the Limits of Contemporary Statistics: Contributions in Honor of Jayanta K. Ghosh, vol. 3, pp. 187–199. Institute of Mathematical Statistics.
Joyce, P., Krone, S. M. and Kurtz, T. G. (2002). Gaussian limits associated with the Poisson–Dirichlet distribution and the Ewens sampling formula. Ann. Appl. Prob. 12, 101–124.
Keith, T. P., Brooks, L. D., Lewontin, R. C., Martinez-Cruzado, J. C. and Rigby, D. L. (1985). Nearly identical allelic distributions of xanthine dehydrogenase in two populations of Drosophila pseudoobscura. Mol. Biol. Evol. 2, 206–216.
Kingman, J. F. C. (1975). Random discrete distributions. J. R. Statist. Soc. B 37, 1–22.
Kingman, J. F. C. (1982). The coalescent. Stoch. Process. Appl. 13, 235–248.
Koriyama, T., Matsuda, T. and Komaki, F. (2023). Asymptotic analysis of parameter estimation for the Ewens–Pitman partition. Available at arXiv:2207.01949v3.
Lijoi, A., Mena, R. H. and Prunster, I. (2005). Mixture modeling with normalized inverse-Gaussian priors. J. Amer. Statist. Assoc. 100, 1278–1291.
Maller, R. A. and Shemehsavar, S. (2023). Generalized Poisson–Dirichlet distributions based on the Dickman subordinator. Theory Prob. Appl. 67, 593–612.
Mas-Sandoval, A., Pope, N. S., Nielsen, K. N., Altinkaya, I., Fumagalli, M. and Korneliussen, T. S. (2022). Fast and accurate estimation of multidimensional site frequency spectra from low-coverage high-throughput sequencing data. Gigascience 11, giac032.
Möhle, M. (2015). The Mittag–Leffler process and a scaling limit for the block counting process of the Bolthausen–Sznitman coalescent. ALEA 12, 35–53.
Perman, M. (1993). Order statistics for jumps of normalised subordinators. Stoch. Process. Appl. 46, 267–281.
Perman, M., Pitman, J. and Yor, M. (1992). Size-biased sampling of Poisson point processes and excursions. Prob. Theory Related Fields 92, 21–39.
Pitman, J. (1995). Exchangeable and partially exchangeable random partitions. Prob. Theory Related Fields 102, 145–158.
Pitman, J. (1997). Partition structures derived from Brownian motion and stable subordinators. Bernoulli 3, 79–96.
Pitman, J. (2006). Combinatorial Stochastic Processes. Springer, Berlin.
Pitman, J. and Yor, M. (1997). The two-parameter Poisson–Dirichlet distribution derived from a stable subordinator. Ann. Prob. 25, 855–900.
Ruggiero, M., Walker, S. G. and Favaro, S. (2013). Alpha-diversity processes and normalized inverse-Gaussian diffusions. Ann. Appl. Prob. 23, 386–425.
Watterson, G. A. (1974). The sampling theory of selectively neutral alleles. Adv. Appl. Prob. 6, 463–468.
Zhang, J. and Dassios, A. (2024). Truncated two-parameter Poisson–Dirichlet approximation for Pitman–Yor process hierarchical models. Scand. J. Statist. 51, 590–611.
Zhou, M., Favaro, S. and Walker, S. G. (2017). Frequency of frequencies distributions and size-dependent exchangeable random partitions. J. Amer. Statist. Assoc. 112, 1623–1635.

Supplementary material

Maller and Shemehsavar supplementary material 1 (File, 360.1 KB)
Maller and Shemehsavar supplementary material 2 (File, 53.3 KB)
Maller and Shemehsavar supplementary material 3 (File, 131.2 KB)