Asymptotic normality for -statistics, with applications to pattern matching in random strings and permutations

Svante Janson

doi:10.1017/apr.2022.51

Asymptotic normality for $\boldsymbol{m}$-dependent and constrained $\boldsymbol{U}$-statistics, with applications to pattern matching in random strings and permutations

Part of: Theory of computing Limit theorems Combinatorics Combinatorial probability

Published online by Cambridge University Press: 28 March 2023

Svante Janson

Show author details

Svante Janson*: Affiliation:
Uppsala University
*: *Postal address: Department of Mathematics, Uppsala University, PO Box 480, SE-751 06 Uppsala, Sweden. Email address: [email protected]

Article contents

Abstract
References

Get access

Rights & Permissions

Abstract

We study (asymmetric) $U$-statistics based on a stationary sequence of $m$-dependent variables; moreover, we consider constrained $U$-statistics, where the defining multiple sum only includes terms satisfying some restrictions on the gaps between indices. Results include a law of large numbers and a central limit theorem, together with results on rate of convergence, moment convergence, functional convergence, and a renewal theory version.

Special attention is paid to degenerate cases where, after the standard normalization, the asymptotic variance vanishes; in these cases non-normal limits occur after a different normalization.

The results are motivated by applications to pattern matching in random strings and permutations. We obtain both new results and new proofs of old results.

Keywords

Random words constrained occurrences vincular patterns

MSC classification

Primary: 60F05: Central limit and other weak theorems

Secondary: 05A05: Permutations, words, matrices 60C05: Combinatorial probability 68Q87: Probability in computer science (algorithm analysis, random structures, phase transitions, etc.)

Type: Original Article
Information: Advances in Applied Probability , Volume 55 , Issue 3 , September 2023 , pp. 841 - 894

DOI: https://doi.org/10.1017/apr.2022.51 [Opens in a new window]
Copyright: © The Author(s), 2023. Published by Cambridge University Press on behalf of Applied Probability Trust

Access options

Get access to the full version of this content by using one of the access options below. (Log in options will check for institutional or personal access. Content may require purchase if you do not have access.)

References

Alsmeyer, G. and Hoefs, V. (2001). Markov renewal theory for stationary

$m$ -block factors. Markov Process. Relat. Fields 7, 325–348.Google Scholar

Baldi, P. and Rinott, Y. (1989). On normal approximations of distributions in terms of dependency graphs. Ann. Prob. 17, 1646–1650.10.1214/aop/1176991178CrossRef Google Scholar

Barbour, A. D., Karoński, M. and Ruciński, A. (1989). A central limit theorem for decomposable random variables with applications to random graphs. J. Combinatorial Theory B 47, 125–145.10.1016/0095-8956(89)90014-2CrossRef Google Scholar

Bender, E. A. and Kochman, F. (1993). The distribution of subword counts is usually normal. Europ. J. Combinatorics 14, 265–275.10.1006/eujc.1993.1030CrossRef Google Scholar

Billingsley, P. (1956). The invariance principle for dependent random variables. Trans. Amer. Math. Soc. 83, 250–268.10.1090/S0002-9947-1956-0090923-6CrossRef Google Scholar

Blom, G. (1976). Some properties of incomplete

$U$ -statistics. Biometrika 63, 573–580.10.1093/biomet/63.3.573CrossRef Google Scholar

Bóna, M. (2007). The copies of any permutation pattern are asymptotically normal. Preprint. Available at https://arxiv.org/0712.2792.Google Scholar

Bóna, M. (2008). Generalized descents and normality. Electron. J. Combinatorics 15, paper no. 21, 8 pp.10.37236/896CrossRef Google Scholar

Bóna, M. (2010). On three different notions of monotone subsequences. In Permutation Patterns, Cambridge University Press, pp. 89–114.10.1017/CBO9780511902499.005CrossRef Google Scholar

Bourdon, J. and Vallée, B. (2002). Generalized pattern matching statistics. In Mathematics and Computer Science II: Algorithms, Trees, Combinatorics and Probabilities, Birkhäuser, Basel, pp. 249–265.Google Scholar

Bourdon, J. and Vallée, B. (2006). Pattern matching statistics on correlated sources. In LATIN 2006: Theoretical Informatics, Springer, Berlin, pp. 224–237.10.1007/11682462_24CrossRef Google Scholar

Bradley, R. C. (2007). Introduction to Strong Mixing Conditions, Vol. 1–3. Kendrick Press, Heber City, UT.Google Scholar

Cameron, R. H. and Martin, W. T. (1945). Transformations of Wiener integrals under a general class of linear transformations. Trans. Amer. Math. Soc. 58, 184–219.10.1090/S0002-9947-1945-0013240-1CrossRef Google Scholar

Chen, L. H. Y., Goldstein, L. and Shao, Q.-M. (2011). Normal Approximation by Stein’s Method. Springer, Berlin. 10.1007/978-3-642-15007-4CrossRef Google Scholar

Dehling, H. and Wendler, M. (2010). Central limit theorem and the bootstrap for

$U$ -statistics of strongly mixing data. J. Multivariate Anal. 101, 126–137.10.1016/j.jmva.2009.06.002CrossRef Google Scholar

Dynkin, E. B. and Mandelbaum, A. (1983). Symmetric statistics, Poisson point processes, and multiple Wiener integrals. Ann. Statist. 11, 739–745.10.1214/aos/1176346241CrossRef Google Scholar

Euler, L. (1743). De summis serierum reciprocarum ex potestatibus numerorum naturalium ortarum dissertatio altera, in qua eaedem summationes ex fonte maxime diverso derivantur. Miscellanea Berolinensia 7, 172–192. Reprinted in Opera Omnia, Ser. 1, Vol. 14, Teubner, Leipzig, 1925, pp. 138–155.Google Scholar

Even-Zohar, C. (2020). Patterns in random permutations. Combinatorica 40, 775–804.10.1007/s00493-020-4212-zCrossRef Google Scholar

Even-Zohar, C., Lakrec, T. and Tessler, R. J. (2021). Spectral analysis of word statistics. Sém. Lothar. Combin. 85B, paper no. 81, 12 pp.Google Scholar

Fang, X. (2016). A multivariate CLT for bounded decomposable random vectors with the best known rate. J. Theoret. Prob. 29, 1510–1523.10.1007/s10959-015-0619-7CrossRef Google Scholar

Fisher, N. I. and Lee, A. J. (1982). Nonparametric measures of angular–angular association. Biometrika 69, 315–321.10.2307/2335405CrossRef Google Scholar

Flajolet, P. and Sedgewick, R. (2009). Analytic Combinatorics. Cambridge University Press.10.1017/CBO9780511801655CrossRef Google Scholar

Flajolet, P., Szpankowski, W. and Vallée, B. (2006). Hidden word statistics. J. Assoc. Comput. Mach. 53, 147–183.10.1145/1120582.1120586CrossRef Google Scholar

Gut, A. (2009). Stopped Random Walks, 2nd edn. Springer, New York.10.1007/978-0-387-87835-5CrossRef Google Scholar

Gut, A. (2013). Probability: A Graduate Course, 2nd edn. Springer, New York.10.1007/978-1-4614-4708-5CrossRef Google Scholar

Han, F. and Qian, T. (2018). On inference validity of weighted U-statistics under data heterogeneity. Electron. J. Statist. 12, 2637–2708.10.1214/18-EJS1462CrossRef Google Scholar

Hoeffding, W. (1948). A class of statistics with asymptotically normal distribution. Ann. Math. Statist. 19, 293–325.10.1214/aoms/1177730196CrossRef Google Scholar

Hoeffding, W. (1961). The strong law of large numbers for

$U$ -statistics. Tech. Rep., University of North Carolina, Institute of Statistics, Mimeograph Series No. 302. Available at https://repository.lib.ncsu.edu/handle/1840.4/2128.Google Scholar

Hofer, L. (2018). A central limit theorem for vincular permutation patterns. Discrete Math. Theoret. Comput. Sci. 19, paper no. 9, 26 pp.Google Scholar

Hsing, T. and Wu, W. B. (2004). On weighted

$U$ -statistics for stationary processes. Ann. Prob. 32, 1600–1631.10.1214/009117904000000333CrossRef Google Scholar

Jacquet, P. and Szpankowski, W. (2015). Analytic Pattern Matching: From DNA to Twitter. Cambridge University Press.10.1017/CBO9780511843204CrossRef Google Scholar

Jammalamadaka, S. R. and Janson, S. (1986). Limit theorems for a triangular scheme of

$U$ -statistics with applications to inter-point distances. Ann. Prob. 14, 1347–1358.10.1214/aop/1176992375CrossRef Google Scholar

Janson, S. (1983). Renewal theory for

$M$ -dependent variables. Ann. Prob. 11, 558–568.10.1214/aop/1176993500CrossRef Google Scholar

Janson, S. (1988). Normal convergence by higher semi-invariants with applications to sums of dependent random variables and random graphs. Ann. Prob. 16, 305–312.10.1214/aop/1176991903CrossRef Google Scholar

Janson, S. (1997). Gaussian Hilbert Spaces. Cambridge University Press.10.1017/CBO9780511526169CrossRef Google Scholar

Janson, S. (2015). On degenerate sums of

$m$ -dependent variables. J. Appl. Prob. 52, 1146–1155.10.1239/jap/1450802758CrossRef Google Scholar

Janson, S. (2018). Renewal theory for asymmetric

$U$ -statistics. Electron. J. Prob. 23, paper no. 129, 27 pp.10.1214/18-EJP252CrossRef Google Scholar

Janson, S. (2022). The number of occurrences of patterns in a random tree or forest permutation. Preprint. Available at https://arxiv.org/2203.04182.Google Scholar

Janson, S., Nakamura, B. and Zeilberger, D. (2015). On the asymptotic statistics of the number of occurrences of multiple permutation patterns. J. Combinatorics 6, 117–143.10.4310/JOC.2015.v6.n1.a8CrossRef Google Scholar

Janson, S. and Szpankowski, W. (2021). Hidden words statistics for large patterns. Electron. J. Combinatorics 28, paper no. P2.36, 26 pp.10.37236/9452CrossRef Google Scholar

Kallenberg, O. (2002). Foundations of Modern Probability, 2nd edn. Springer, New York.10.1007/978-1-4757-4015-8CrossRef Google Scholar

Major, P. (1994). Asymptotic distributions for weighted

$U$ -statistics. Ann. Prob. 22, 1514–1535.10.1214/aop/1176988610CrossRef Google Scholar

Malevich, T. L. and Abdalimov, B. (1982). Refinement of the central limit theorem for

$U$ -statistics of

$m$ -dependent variables. Teor. Veroyat. Primen. 27, 369–373 (in Russian). English translation: Theory Prob. Appl. 27, 391–396.10.1137/1127045CrossRef Google Scholar

Miller, R. G. Jr. and Sen, P. K. (1972). Weak convergence of

$U$ -statistics and von Mises’ differentiable statistical functions. Ann. Math. Statist. 43, 31–41.10.1214/aoms/1177692698CrossRef Google Scholar

Nicodème, P., Salvy, B. and Flajolet, P. (2002). Motif statistics. Theoret. Comput. Sci. 287, 593–617.10.1016/S0304-3975(01)00264-XCrossRef Google Scholar

Olver, F. W. J., Lozier, D. W., Boisvert, R. F. and Clark, C. W. (eds) (2010). NIST Handbook of Mathematical Functions. Cambridge University Press.Google Scholar

O’Neil, K. A. and Redner, R. A. (1993). Asymptotic distributions of weighted

$U$ -statistics of degree 2. Ann. Prob. 21, 1159–1169.Google Scholar

Orey, S. (1958). A central limit theorem for

$m$ -dependent random variables. Duke Math. J. 25, 543–546.10.1215/S0012-7094-58-02548-1CrossRef Google Scholar

Peligrad, M. (1996). On the asymptotic normality of sequences of weak dependent random variables. J. Theoret. Prob. 9, 703–715.10.1007/BF02214083CrossRef Google Scholar

Pike, J. (2011). Convergence rates for generalized descents. Electron. J. Combinatorics 18, paper no. 236, 14 pp.10.37236/723CrossRef Google Scholar

Raič, M. (2004). A multivariate CLT for decomposable random vectors with finite second moments. J. Theoret. Prob. 17, 573–603.10.1023/B:JOTP.0000040290.44087.68CrossRef Google Scholar

Régnier, M. and Szpankowski, W. (1998). On pattern frequency occurrences in a Markovian sequence. Algorithmica 22, 631–649.10.1007/PL00009244CrossRef Google Scholar

Revuz, D. and Yor, M. (1999). Continuous Martingales and Brownian Motion, 3rd edn. Springer, Berlin.10.1007/978-3-662-06400-9CrossRef Google Scholar

Rinott, Y. (1994). On normal approximation rates for certain sums of dependent random variables. J. Comput. Appl. Math. 55, 135–143.10.1016/0377-0427(94)90016-7CrossRef Google Scholar

Rinott, Y. and Rotar, V. (1997). On coupling constructions and rates in the CLT for dependent summands with applications to the antivoter model and weighted

$U$ -statistics. Ann. Appl. Prob. 7, 1080–1105.10.1214/aoap/1043862425CrossRef Google Scholar

Rubin, H. and Vitale, R. A. (1980). Asymptotic distribution of symmetric statistics. Ann. Statist. 8, 165–170.10.1214/aos/1176344898CrossRef Google Scholar

Sen, P. K. (1960). On some convergence properties of

$U$ -statistics. Calcutta Statist. Assoc. Bull. 10, 1–18.10.1177/0008068319600101CrossRef Google Scholar

Sen, P. K. (1963). On the properties of

$U$ -statistics when the observations are not independent. I. Estimation of non-serial parameters in some stationary stochastic process. Calcutta Statist. Assoc. Bull. 12, 69–92.10.1177/0008068319630301CrossRef Google Scholar

Sen, P. K. (1972). Limiting behavior of regular functionals of empirical distributions for stationary mixing processes. Z. Wahrscheinlichkeitsth. 25, 71–82.10.1007/BF00533337CrossRef Google Scholar

Sen, P. K. (1974). Weak convergence of generalized

$U$ -statistics. Ann. Prob. 2, 90–102.10.1214/aop/1176996754CrossRef Google Scholar

Shapiro, C. P. and Hubert, L. (1979). Asymptotic normality of permutation statistics derived from weighted sums of bivariate functions. Ann. Statist. 7, 788–794.10.1214/aos/1176344728CrossRef Google Scholar

Szpankowski, W. (2001). Average Case Analysis of Algorithms on Sequences. John Wiley, New York.10.1002/9781118032770CrossRef Google Scholar

Utev, S. A. (1990). Central limit theorem for dependent random variables. In Probability Theory and Mathematical Statistics: Proc. Fifth Vilnius Conference, June 25–July 1, 1990, Vol. II, Mokslas, Vilnius, pp. 519–528.Google Scholar

Yoshihara, K. (1976). Limiting behavior of

$U$ -statistics for stationary, absolutely regular processes. Z. Wahrscheinlichkeitsth. 35, 237–252.10.1007/BF00532676CrossRef Google Scholar

Yoshihara, K. (1992). Limiting behavior of

$U$ -statistics for strongly mixing sequences. Yokohama Math. J. 39, 107–113.Google Scholar

Zhou, Z. (2014). Inference of weighted

$V$ -statistics for nonstationary time series and its applications. Ann. Statist. 42, 87–114.10.1214/13-AOS1184CrossRef Google Scholar

Article contents

Asymptotic normality for $\boldsymbol{m}$-dependent and constrained $\boldsymbol{U}$-statistics, with applications to pattern matching in random strings and permutations

Abstract

Keywords

MSC classification

Access options

References

Save article to Kindle

Save article to Dropbox

Save article to Google Drive

Reply to: Submit a response

Your details

You have entered the maximum number of contributors

Conflicting interests