1. Introduction
The purpose of the present paper is to give some new results for (asymmetric) $U$ -statistics together with some applications. (See Section 3 for definitions.) The results include a strong law of large numbers and a central limit theorem (asymptotic normality), together with results on rate of convergence, moment convergence, functional convergence, and a renewal theory version.
Many results of these types have been proved for $U$ -statistics under different hypotheses by a large number of authors, from Hoeffding [Reference Hoeffding27] on. The new feature of the results here, which are motivated by applications discussed below, is the combination of the following:
(i) We consider, as in e.g. [Reference Janson35], [Reference Janson37], and [Reference Han and Qian26], but unlike many other authors, asymmetric $U$ -statistics and not just the symmetric case. (See Remark 3.3.)
(ii) We consider also constrained $U$ -statistics, where the summations are restricted as in (3.2) or (3.3).
(iii) The $U$ -statistics are based on an underlying sequence that is not necessarily i.i.d. (as is usually assumed); we assume only that the sequence is stationary and $m$ -dependent. (This case has been studied earlier by e.g. [Reference Sen58], but not in the present asymmetric case.)
The extension to the $m$ -dependent case might be of interest for some applications, but for us the main motivation is that it allows us to reduce the constrained versions to ordinary $U$ -statistics; hence this extension is implicitly used also when we apply the results for constrained $U$ -statistics based on i.i.d. sequences.
Remark 1.1. The combination of the features (i)–(iii) above is new, but they have each been considered separately earlier.
In particular, constrained $U$ -statistics are special cases of the large class of incomplete $U$ -statistics [Reference Blom6]. These are, in turn, special cases of the even more general weighted $U$ -statistics; see e.g. [Reference Shapiro and Hubert61], [Reference O’Neil and Redner47], [Reference Major42], [Reference Rinott and Rotar55], [Reference Hsing and Wu30], [Reference Zhou66], and [Reference Han and Qian26]. (These references show asymptotic normality under various conditions; some also study degenerate cases with non-normal limits; [Reference Han and Qian26] includes the asymmetric case.) In view of our applications, we consider here only the constrained case instead of trying to find suitable conditions for general weights.
Similarly, $U$ -statistics have been considered by many authors for more general weakly dependent sequences than $m$ -dependent ones. In particular, asymptotic normality has been shown under various types of mixing conditions by e.g. [Reference Sen59], [Reference Yoshihara64, Reference Yoshihara65], and [Reference Dehling and Wendler15]. We are not aware of any paper on asymmetric $U$ -statistics with a mixing condition on the variables. Such results might be interesting for future research, but again in view of our applications, we have not pursued this and consider here only the $m$ -dependent case.
There are thus many previous results yielding asymptotic normality for $U$ -statistics under various conditions. One general feature, found already in the first paper [Reference Hoeffding27], is that there are degenerate cases where the asymptotic variance vanishes (typically because of some internal cancellations). In such cases, the theorems only yield convergence to 0 and do not imply asymptotic normality; indeed, typically a different normalization yields a non-normal limit. It is often difficult to calculate the asymptotic variance exactly, and it is therefore of great interest to have simple criteria that show that the asymptotic variance is non-zero. Such a criterion is well known for the standard case of (unconstrained) $U$ -statistics based on i.i.d. variables [Reference Hoeffding27]. We give corresponding (somewhat more complicated) criteria for the $m$ -dependent case studied here, in both the unconstrained and constrained cases. (This is one reason for considering only the $m$ -dependent case in the present paper, and not more general weakly dependent sequences.) We show the applicability of our criteria in some examples.
Like many (but not all) of the references cited above, we base our proof of asymptotic normality on the decomposition method of Hoeffding [Reference Hoeffding27], with appropriate modifications. As pointed out by an anonymous referee, an alternative method is to use dependency graphs together with Stein’s method, which under an extra moment assumption yields our main results on asymptotic normality together with an upper bound on the rate of convergence. We do not use this method in the main parts of the paper, partly because it does not seem to yield simple criteria for non-vanishing of the asymptotic variance; however, as a complement, we use this method to give some results on rate of convergence.
1.1. Applications
The background motivating our general results is given by some parallel results for pattern matching in random strings and in random permutations that have previously been shown by different methods, but easily follow from our results; we describe these results here and return to them (and some new results) in Sections 13 and 14. Further applications to pattern matching in random permutations restricted to two classes of permutations are given in [Reference Janson38].
First, consider a random string $\Xi _n=\xi _1\cdots \xi _n$ consisting of $n$ i.i.d. random letters from a finite alphabet $\mathcal{A}$ (in this context, this is known as a memoryless source), and consider the number of occurrences of a given word $\textbf{w}=w_1\dotsm w_\ell$ as a subsequence; to be precise, an occurrence of $\textbf{w}$ in $\Xi _n$ is an increasing sequence of indices $i_1\lt \ldots \lt i_\ell$ in $[n]=\{{1,\ldots,n}\}$ such that
(1.1) \begin{align} \xi _{i_j}=w_j,\qquad j=1,\ldots,\ell . \end{align}
This number, $N_n(\textbf{w})$ say, was studied by Flajolet, Szpankowski and Vallée [Reference Flajolet, Szpankowski and Vallée23], who proved that $N_n(\textbf{w})$ is asymptotically normal as $n\to \infty$ .
Flajolet, Szpankowski and Vallée [Reference Flajolet, Szpankowski and Vallée23] also studied a constrained version, where we are also given numbers $d_1,\ldots,d_{\ell -1}\in \mathbb{N}\cup \{\infty \}=\{{1,2,\ldots,\infty }\}$ and count only occurrences of $\textbf{w}$ such that
(1.2) \begin{align} i_{j+1}-i_j\leqslant d_j,\qquad j=1,\ldots,\ell -1. \end{align}
(Thus the $j$ th gap in $i_1,\ldots,i_\ell$ has length strictly less than $d_j$ .) We write ${\mathcal{D}}\,:\!=\,(d_1,\ldots,d_{\ell -1})$ , and let $N_n(\textbf{w};\,{\mathcal{D}})$ be the number of occurrences of $\textbf{w}$ that satisfy the constraints (1.2). It was shown in [Reference Flajolet, Szpankowski and Vallée23] that, for any fixed $\textbf{w}$ and $\mathcal{D}$ , $N_n(\textbf{w};\,{\mathcal{D}})$ is asymptotically normal as $n\to \infty$ . See also the book by Jacquet and Szpankowski [Reference Jacquet and Szpankowski31, Chapter 5].
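To make the definitions (1.1)–(1.2) concrete, here is a small brute-force counter for $N_n(\textbf{w})$ and $N_n(\textbf{w};\,{\mathcal{D}})$ . This is our own illustrative sketch (not from [Reference Flajolet, Szpankowski and Vallée23]); it enumerates all index tuples, so it is practical only for small $n$ and $\ell$ .
```python
from itertools import combinations
import random

def count_occurrences(xi, w, D=None):
    """Count occurrences of the word w in the string xi as a subsequence,
    subject to the gap constraints (1.2): i_{j+1} - i_j <= d_j.
    Use float('inf') in D for an unconstrained gap; D=None means D_infty."""
    ell = len(w)
    if D is None:
        D = [float('inf')] * (ell - 1)
    count = 0
    for idx in combinations(range(len(xi)), ell):  # i_1 < ... < i_ell
        if all(xi[idx[j]] == w[j] for j in range(ell)) and \
           all(idx[j + 1] - idx[j] <= D[j] for j in range(ell - 1)):
            count += 1
    return count

xi = [random.choice('ab') for _ in range(40)]   # memoryless source on {a, b}
print(count_occurrences(xi, 'ab'))              # N_n(w), unconstrained
print(count_occurrences(xi, 'ab', D=[1]))       # occurrences as a substring
```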
Remark 1.2. Note that $d_j=\infty$ means no constraint for the $j$ th gap. In particular, $d_1=\ldots =d_{\ell -1}=\infty$ yields the unconstrained case; we denote this trivial (but important) constraint $\mathcal{D}$ by ${\mathcal{D}}_\infty$ .
In the other extreme case, if $d_j=1$ , then $i_j$ and $i_{j+1}$ have to be adjacent. In particular, in the completely constrained case $d_1=\cdots =d_{\ell -1}=1$ , $N_n(\textbf{w};\,{\mathcal{D}})$ counts occurrences of $\textbf{w}$ as a substring $\xi _i\xi _{i+1}\cdots \xi _{i+\ell -1}$ . Substring counts have been studied by many authors; some references with central limit theorems or local limit theorems under varying conditions are [Reference Bender and Kochman4], [Reference Régnier and Szpankowski52], [Reference Nicodème, Salvy and Flajolet45], and [Reference Flajolet and Sedgewick22, Proposition IX.10, p. 660]. See also [Reference Szpankowski62, Section 7.6.2 and Example 8.8] and [Reference Jacquet and Szpankowski31]; the latter book discusses not only substring and subsequence counts but also other versions of substring matching problems in random strings.
Note also that if all $d_j\in \{{1,\infty }\}$ , then $\textbf{w}$ can be regarded as a concatenation $\textbf{w}_1\dotsm \textbf{w}_b$ (with $\textbf{w}$ broken at the positions where $d_j=\infty$ ), and an occurrence is then an occurrence of each $\textbf{w}_i$ as a substring, with these substrings in order and non-overlapping, and with arbitrary gaps in between. (This is a special case of the generalized subsequence problem in [Reference Jacquet and Szpankowski31, Section 5.6]; the general case can be regarded as a sum of such counts over a set of $\textbf{w}$ .)
There are similar results for random permutations. Let $\mathfrak{S}_n$ be the set of the $n!$ permutations of $[n]$ . If $\pi =\pi _1\cdots \pi _n\in \mathfrak{S}_n$ and $\tau =\tau _1\cdots \tau _\ell \in \mathfrak{S}_\ell$ , then an occurrence of the pattern $\tau$ in $\pi$ is an increasing sequence of indices $i_1\lt \cdots \lt i_\ell$ in $[n]=\{{1,\ldots,n}\}$ such that the order relations in $\pi _{i_1}\cdots \pi _{i_\ell }$ are the same as in $\tau _1\cdots \tau _\ell$ , i.e., $\pi _{i_j}\lt \pi _{i_k}\iff \tau _j\lt \tau _k$ .
Let $N_n(\tau )$ be the number of occurrences of $\tau$ in $\boldsymbol{\pi }$ when ${\boldsymbol{\pi }}={\boldsymbol{\pi }}^{(n)}$ is uniformly random in $\mathfrak{S}_n$ . Bóna [Reference Bóna7] proved that $N_n(\tau )$ is asymptotically normal as $n\to \infty$ , for any fixed $\tau$ .
Also for permutations, one can consider, and count, constrained occurrences by again imposing the restriction (1.2) for some ${\mathcal{D}}=(d_1,\ldots,d_{\ell -1})$ . In analogy with strings, we let $N_n(\tau,{\mathcal{D}})$ be the number of constrained occurrences of $\tau$ in ${\boldsymbol{\pi }}^{(n)}$ when ${\boldsymbol{\pi }}^{(n)}$ is uniformly random in $\mathfrak{S}_n$ . This random number seems to have been studied mainly in the case when each $d_i\in \{{1,\infty }\}$ , i.e., some $i_j$ are required to be adjacent to the next one—in the permutation context, such constrained patterns are known as vincular patterns. Hofer [Reference Hofer29] proved asymptotic normality of $N_n(\tau,{\mathcal{D}})$ as $n\to \infty$ , for any fixed $\tau$ and vincular $\mathcal{D}$ . The extreme case with $d_1=\cdots =d_{\ell -1}=1$ was earlier treated by Bóna [Reference Bóna9]. Another (non-vincular) case that has been studied is that of $d$ -descents, given by $\ell =2$ , $\tau =21$ , and ${\mathcal{D}}=(d)$ ; Bóna [Reference Bóna8] shows asymptotic normality and Pike [Reference Pike50] gives a rate of convergence.
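Again for concreteness, here is a brute-force sketch (our own illustration) of the constrained pattern count $N_n(\tau,{\mathcal{D}})$ ; taking $\tau =21$ and ${\mathcal{D}}=(d)$ counts the $d$ -descents mentioned above.
```python
from itertools import combinations
import random

def count_pattern(pi, tau, D=None):
    """Count constrained occurrences of the pattern tau in pi: index tuples
    i_1 < ... < i_ell with i_{j+1} - i_j <= d_j whose values are
    order-isomorphic to tau."""
    ell = len(tau)
    if D is None:
        D = [float('inf')] * (ell - 1)
    count = 0
    for idx in combinations(range(len(pi)), ell):
        if all(idx[j + 1] - idx[j] <= D[j] for j in range(ell - 1)) and \
           all((pi[idx[j]] < pi[idx[k]]) == (tau[j] < tau[k])
               for j in range(ell) for k in range(j + 1, ell)):
            count += 1
    return count

pi = list(range(1, 31))
random.shuffle(pi)                        # a uniformly random permutation
print(count_pattern(pi, (1, 2, 3)))       # N_n(123), unconstrained
print(count_pattern(pi, (2, 1), D=[3]))   # d-descents with d = 3
```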
We unify these results by considering $U$ -statistics. It is well known and easy to see that the number $N_n(\textbf{w})$ of unconstrained occurrences of a given subsequence $\textbf{w}$ in a random string $\Xi _n$ can be written as an asymmetric $U$ -statistic; see Section 13 and (13.2) for details. There are general results on asymptotic normality of $U$ -statistics that extend the basic result by [Reference Hoeffding27] to the asymmetric case; see e.g. [Reference Janson35, Corollary 11.20] and [Reference Janson37]. Hence, asymptotic normality of $N_n(\textbf{w})$ follows directly from these general results. Similarly, it is well known that the pattern count $N_n(\tau )$ in a random permutation also can be written as a $U$ -statistic (see Section 14 for details), and again this can be used to prove asymptotic normality. (See [Reference Janson, Nakamura and Zeilberger39], with an alternative proof by this method of the result by Bóna [Reference Bóna7].)
The constrained case is different, since the constrained pattern counts are not $U$ -statistics. However, they can be regarded as constrained $U$ -statistics, which we define in (3.2) below in analogy with the constrained counts above. As stated above, in the present paper we prove general limit theorems for such constrained $U$ -statistics, which thus immediately apply to the constrained pattern counts discussed above in random strings and permutations.
The basic idea in the proofs is that a constrained $U$ -statistic based on a sequence $(X_i)$ can be written (possibly up to a small error) as an unconstrained $U$ -statistic based on another sequence $(Y_i)$ of random variables, where the new sequence $(Y_i)$ is $m$ -dependent (with a different $m$ ) if $(X_i)$ is. (However, even if $(X_i)$ is independent, $(Y_i)$ is in general not; this is our main motivation for considering $m$ -dependent sequences.) The unconstrained $m$ -dependent case then is treated by standard methods from the independent case, with appropriate modifications.
Section 2 contains some preliminaries. The unconstrained and constrained $U$ -statistics are defined in Section 3, where also the main theorems are stated. The degenerate case, when the asymptotic variance in the central limit theorem Theorem 3.3, 3.4, or 3.8 vanishes, is discussed later in Section 8, when more notation has been introduced; Theorems 8.1, 8.2, and 8.3, respectively, give criteria that can be used to show that the asymptotic variance is non-zero in an application. On the other hand, Example 8.1 shows that the degenerate case can occur in new ways for constrained $U$ -statistics.
The reduction to the unconstrained case and some other lemmas are given in Section 4, and then the proofs of the main theorems are completed in Sections 5–7 and 9–12. Section 13 gives applications to the problem on pattern matching in random strings discussed above. Similarly, Section 14 gives applications to pattern matching in random permutations. Some further comments and open problems are given in Section 15. The appendix contains some further results on subsequence counts in random strings.
2. Preliminaries
2.1. Some notation
A constraint is, as in Section 1, a sequence ${\mathcal{D}}=(d_1,\ldots,d_{\ell -1})\in (\mathbb{N}\cup \{\infty \})^{\ell -1}$ , for some given $\ell \geqslant 1$ . Recall that the special constraint $(\infty,\ldots,\infty )$ is denoted by ${\mathcal{D}}_\infty$ . Given a constraint $\mathcal{D}$ , define $b=b({\mathcal{D}})$ by
(2.1) \begin{align} b=b({\mathcal{D}})\,:\!=\,1+|\{{j\in [\ell -1]\,:\,d_j=\infty }\}|. \end{align}
We say that $b$ is the number of blocks defined by $\mathcal{D}$ ; see further Section 4 below.
For a random variable $Z$ , and $p\gt 0$ , we let $\lVert{Z}\rVert _p\,:\!=\, ({{\mathbb{E}}[{|Z|^p}]} )^{1/p}$ .
We use $\overset{d}{\longrightarrow }$ , $\overset{p}{\longrightarrow }$ , and $\overset{a.s.}{\longrightarrow }$ for convergence of random variables in distribution, in probability, and almost surely (a.s.), respectively. For a sequence of random variables $(Z_n)$ , and a sequence $a_n\gt 0$ , we write $Z_n=o_{p}(a_n)$ when $Z_n/a_n \overset{p}{\longrightarrow } 0$ .
Unspecified limits are as $n\to \infty$ . $C$ denotes an unspecified constant, which may be different at each occurrence. ( $C$ may depend on parameters that are regarded as fixed, for example the function $f$ below; this will be clear from the context.)
We use the convention $\binom nk\,:\!=\,0$ if $n\lt 0$ . (We will always have $k\geqslant 0$ .) Some further standard notation: $[n]\,:\!=\,\{{1,\ldots,n}\}$ , and $\max \emptyset \,:\!=\,0$ . All functions are tacitly assumed to be measurable.
2.2. $m$ -dependent variables
For reasons mentioned in the introduction, we will consider $U$ -statistics not only based on sequences of independent random variables, but also based on $m$ -dependent variables.
Recall that a (finite or infinite) sequence of random variables $(X_i)_{i}$ is $m$ -dependent if the two families $\{{X_i}\}_{i\leqslant k}$ and $\{{X_i}\}_{i\gt k+m}$ of random variables are independent of each other for every $k$ . (Here, $m\geqslant 0$ is a given integer.) In particular, 0-dependent is the same as independent; thus the important independent case is included as the special case $m=0$ below.
It is well known that if $(X_i)_{i\in I}$ is $m$ -dependent, and $I_1,\ldots,I_r\subseteq I$ are sets of indices such that ${dist}(I_j,I_k)\,:\!=\,\inf \{{|i-i^{\prime}|\,:\,i\in I_j,i^{\prime}\in I_k}\}\gt m$ when $j\neq k$ , then the families (vectors) of random variables $(X_i)_{i\in I_1}$ , …, $(X_i)_{i\in I_r}$ are mutually independent of each other. (To see this, note first that it suffices to consider the case when each $I_j$ is an interval; then use the definition and induction on $r$ .) We will use this property without further comment.
In practice, $m$ -dependent sequences usually occur as block factors; i.e. they can be expressed as
(2.2) \begin{align} X_i=h\bigl ({\xi _i,\xi _{i+1},\ldots,\xi _{i+m}}\bigr ),\qquad i\geqslant 1, \end{align}
for some i.i.d. sequence $(\xi _i)$ of random variables (in some measurable space ${\mathcal{S}}_0$ ) and a fixed function $h$ on ${\mathcal{S}}_0^{m+1}$ . (It is obvious that (2.2) then defines a stationary $m$ -dependent sequence.)
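For illustration, a minimal sketch (our own toy example) of (2.2): a moving-window block factor of i.i.d. variables, which is stationary and $m$ -dependent.
```python
import random

def block_factor(n, m, h):
    """X_i = h(xi_i, ..., xi_{i+m}) for i = 1, ..., n, with (xi_i) i.i.d.;
    by (2.2) the sequence (X_i) is stationary and m-dependent."""
    xi = [random.gauss(0.0, 1.0) for _ in range(n + m)]
    return [h(xi[i:i + m + 1]) for i in range(n)]

# example: a 2-dependent moving maximum
X = block_factor(1000, 2, max)
```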
3. $U$ -statistics and main results
Let $X_1,X_2,\ldots$ be a sequence of random variables, taking values in some measurable space $\mathcal{S}$ , and let $f\,:\,{\mathcal{S}}^\ell \to \mathbb{R}$ be a (measurable) function of $\ell$ variables, for some $\ell \geqslant 1$ . Then the corresponding $U$ -statistic is the (real-valued) random variable defined for each $n\geqslant 0$ by
(3.1) \begin{align} U_n=U_n(f)=U_n\bigl ({f;\,(X_i)}\bigr )\,:\!=\,\sum _{1\leqslant i_1\lt \cdots \lt i_\ell \leqslant n} f\bigl ({X_{i_1},\ldots,X_{i_\ell }}\bigr ). \end{align}
$U$ -statistics were introduced by Hoeffding [Reference Hoeffding27], who proved a general central limit theorem; the present paper gives an extension of his result that builds on his methods.
Remark 3.1. Of course, for the definition (3.1) it suffices to have a finite sequence $(X_i)_1^{n}$ , but in the present paper we will only consider the initial segments of an infinite sequence.
Remark 3.2. Many authors, including Hoeffding [Reference Hoeffding27], define $U_n$ by dividing the sum in (3.1) by $\binom n\ell$ , the number of terms in it. We find it more convenient for our purposes to use the non-normalized version above.
Remark 3.3. Many authors, including Hoeffding [Reference Hoeffding27], assume that $f$ is a symmetric function of its $\ell$ variables. In this case, the order of the variables does not matter, and in (3.1) we can sum over all sequences $i_1,\ldots,i_\ell$ of $\ell$ distinct elements of $\{{1,\ldots,n}\}$ , up to an obvious factor of $\ell !$ . (Hoeffding [Reference Hoeffding27] gives both versions.) Conversely, if we sum over all such sequences, we may without loss of generality assume that $f$ is symmetric. However, in the present paper (as in several earlier papers by various authors) we consider the general case of (3.1) without assuming symmetry; for emphasis, we call this an asymmetric $U$ -statistic. (This is essential in our applications to pattern matching.) Note that for independent $(X_i)_1^n$ , the asymmetric case can be reduced to the symmetric case by the trick in Remark 11.21, in particular (11.20), of [Reference Janson35]; see also [Reference Janson, Nakamura and Zeilberger39, (15)] and (A.18) below. However, this trick does not work in the $m$ -dependent or constrained cases studied here, so we cannot use it here.
As stated in the introduction, we also consider constrained $U$ -statistics. Given a constraint ${\mathcal{D}}=(d_1,\ldots,d_{\ell -1})$ , we define the constrained $U$ -statistic
(3.2) \begin{align} U_n(f;\,{\mathcal{D}})\,:\!=\,\sum _{\substack{1\leqslant i_1\lt \cdots \lt i_\ell \leqslant n\\ i_{j+1}-i_j\leqslant d_j,\ j\in [\ell -1]}} f\bigl ({X_{i_1},\ldots,X_{i_\ell }}\bigr ), \end{align}
where we thus impose the constraints (1.2) on the indices.
We define further the exactly constrained $U$ -statistic
(3.3) \begin{align} U_n(f;\,{{\mathcal{D}}{=}})\,:\!=\,\sum _{\substack{1\leqslant i_1\lt \cdots \lt i_\ell \leqslant n\\ i_{j+1}-i_j=d_j\ \text{when}\ d_j\lt \infty }} f\bigl ({X_{i_1},\ldots,X_{i_\ell }}\bigr ), \end{align}
where we thus specify each gap either exactly or (when $d_j=\infty$ ) not at all. In the vincular case, when all $d_j$ are either 1 or $\infty$ , there is no difference and we have $U_n(f;\,{\mathcal{D}})=U_n(f;\,{{\mathcal{D}}{=}})$ .
Note that, trivially, each constrained $U$ -statistic can be written as a sum of exactly constrained $U$ -statistics:
(3.4) \begin{align} U_n(f;\,{\mathcal{D}})=\sum _{{\mathcal{D}}^{\prime}} U_n(f;\,{{\mathcal{D}}^{\prime}{=}}), \end{align}
where we sum over all constraints ${\mathcal{D}}^{\prime}=(d^{\prime}_1,\ldots, d^{\prime}_{\ell -1})$ with
(3.5) \begin{align} d^{\prime}_j\in \{{1,\ldots,d_j}\}\ \text{if}\ d_j\lt \infty,\qquad d^{\prime}_j=\infty \ \text{if}\ d_j=\infty . \end{align}
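The definitions (3.1)–(3.5) are easy to transcribe into code; the following brute-force sketch (ours, with cost of order $n^\ell$ , purely for illustration) also checks the decomposition (3.4) numerically in a small case.
```python
from itertools import combinations
import random

INF = float('inf')

def U(f, ell, X, D=None, exact=False):
    """The U-statistics (3.1)-(3.3): sum f over i_1 < ... < i_ell with
    i_{j+1} - i_j <= d_j (exact=False), or i_{j+1} - i_j == d_j for the
    finite d_j (exact=True); D=None is the unconstrained case (3.1)."""
    D = D or (INF,) * (ell - 1)
    total = 0.0
    for idx in combinations(range(len(X)), ell):
        gaps = [idx[j + 1] - idx[j] for j in range(ell - 1)]
        if exact:
            ok = all(d == INF or g == d for g, d in zip(gaps, D))
        else:
            ok = all(g <= d for g, d in zip(gaps, D))
        if ok:
            total += f(*(X[i] for i in idx))
    return total

X = [random.random() for _ in range(30)]
f = lambda x, y, z: x * y * z
# (3.4)-(3.5) with D = (2, oo): sum the exact constraints D' = (1, oo), (2, oo)
lhs = U(f, 3, X, D=(2, INF))
rhs = sum(U(f, 3, X, D=(d1, INF), exact=True) for d1 in (1, 2))
assert abs(lhs - rhs) < 1e-9
```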
Remark 3.4. As stated in the introduction, the [exactly] constrained $U$ -statistics thus belong to the large class of incomplete $U$ -statistics [Reference Blom6], where the summation in (3.1) is restricted to some, in principle arbitrary, subset of the set of all $\ell$ -tuples $(i_1,\ldots,i_\ell )$ in $[n]$ .
The standard setting, in [Reference Hoeffding27] and many other papers, is to assume that the underlying random variables $X_i$ are i.i.d. In the present paper we consider a more general case, and we will assume only that $X_1,X_2,\ldots$ is an infinite stationary $m$ -dependent sequence, for some fixed integer $m\geqslant 0$ ; see Section 2.2 for the definition, and recall in particular that the special case $m=0$ yields the case of independent variables $X_i$ .
We will consider limits as $n\to \infty$ . The sequence $X_1,X_2,\ldots$ (and thus the space $\mathcal{S}$ and the integer $m$ ) and the function $f$ (and thus $\ell$ ) will be fixed, and do not depend on $n$ .
We will throughout assume the following moment condition for $p=2$ ; in a few places (always explicitly stated) we also assume it for some larger $p$ :
( ${A}_p$ ) $\quad{\mathbb{E}}|f(X_{i_1},\ldots,X_{i_\ell })|^p\lt \infty\quad $ for every $i_1\lt \cdots \lt i_\ell$ .
Note that in the independent case ( $m=0$ ), it suffices to verify $({A}_p)$ for a single sequence $i_1,\ldots,i_\ell$ , for example $1,\ldots,\ell$ . In general, it suffices to verify $({A}_p)$ for all sequences with $i_1=1$ and $i_{j+1}-i_j\leqslant m+1$ for every $j\leqslant{\ell -1}$ , since the stationarity and $m$ -dependence imply that every larger gap can be reduced to $m+1$ without changing the distribution of $f(X_{i_1},\ldots,X_{i_\ell })$ . Since there is only a finite number of such sequences, it follows that ( ${A}_2$ ) is equivalent to the uniform bound
(3.6) \begin{align} {\mathbb{E}} |f(X_{i_1},\ldots,X_{i_\ell })|^2\leqslant C\qquad \text{for all}\ i_1\lt \cdots \lt i_\ell, \end{align}
and similarly for $({A}_p)$ .
3.1. Expectation and law of large numbers
We first make an elementary observation on the expectations ${\mathbb{E}} U_n(f;\,{\mathcal{D}})$ and ${\mathbb{E}} U_n(f;\,{{\mathcal{D}}{=}})$ . These can be calculated exactly by taking the expectation inside the sums in (3.2) and (3.3). In the independent case, all terms have the same expectation, so it remains only to count the number of them. In general, because of the $m$ -dependence of $(X_i)$ , the expectations of the terms in (3.3) are not all equal, but most of them coincide, and it is still easy to find the asymptotics.
Theorem 3.1. Let $(X_i)_1^\infty$ be a stationary $m$ -dependent sequence of random variables with values in a measurable space $\mathcal{S}$ , let $\ell \geqslant 1$ , and let $f\,:\,{\mathcal{S}}^\ell \to \mathbb{R}$ satisfy ( ${A}_2$ ). Then, as $n\to \infty$ , with $\mu$ given by (5.1) below,
(3.7) \begin{align} {\mathbb{E}} U_n(f)=\frac{\mu }{\ell !}\,n^{\ell }+O\bigl ({n^{\ell -1}}\bigr ). \end{align}
More generally, let ${\mathcal{D}}=(d_1,\ldots,d_{\ell -1})$ be a constraint, and let $b\,:\!=\,b({\mathcal{D}})$ . Then, as $n\to \infty$ , for some real numbers $\mu _{\mathcal{D}}$ and $\mu _{{{\mathcal{D}}{=}}}$ given by (5.5) and (5.4),
(3.8) \begin{align} {\mathbb{E}} U_n(f;\,{\mathcal{D}})=\frac{\mu _{\mathcal{D}}}{b!}\,n^{b}+O\bigl ({n^{b-1}}\bigr ), \end{align}
(3.9) \begin{align} {\mathbb{E}} U_n(f;\,{{\mathcal{D}}{=}})=\frac{\mu _{{{\mathcal{D}}{=}}}}{b!}\,n^{b}+O\bigl ({n^{b-1}}\bigr ). \end{align}
If $m=0$ , i.e., the sequence $(X_i)_1^\infty$ is i.i.d., then, moreover,
(3.10) \begin{align} \mu _{{{\mathcal{D}}{=}}}=\mu ={\mathbb{E}} f\bigl ({X_1,\ldots,X_\ell }\bigr ), \end{align}
(3.11) \begin{align} \mu _{\mathcal{D}}=\prod _{j\,:\,d_j\lt \infty }d_j\cdot \mu . \end{align}
The straightforward proof is given in Section 5, where we also give formulas for $\mu _{\mathcal{D}}$ and $\mu _{{\mathcal{D}}{=}}$ in the general case, although in an application it might be simpler to find the leading term of the expectation directly.
Next, we have a corresponding strong law of large numbers, proved in Section 7. This extends well-known results in the independent case; see [Reference Hoeffding28, Reference Janson37, Reference Sen57].
Theorem 3.2. Let $(X_i)_1^\infty$ be a stationary $m$ -dependent sequence of random variables with values in a measurable space $\mathcal{S}$ , let $\ell \geqslant 1$ , and let $f\,:\,{\mathcal{S}}^\ell \to \mathbb{R}$ satisfy ( ${A}_2$ ). Then, as $n\to \infty$ , with $\mu$ given by (5.1),
(3.12) \begin{align} U_n(f)/n^{\ell }\overset{a.s.}{\longrightarrow }\mu /\ell !. \end{align}
More generally, let ${\mathcal{D}}=(d_1,\ldots,d_{\ell -1})$ be a constraint, and let $b\,:\!=\,b({\mathcal{D}})$ . Then, as $n\to \infty$ ,
(3.13) \begin{align} U_n(f;\,{\mathcal{D}})/n^{b}\overset{a.s.}{\longrightarrow }\mu _{\mathcal{D}}/b!, \end{align}
(3.14) \begin{align} U_n(f;\,{{\mathcal{D}}{=}})/n^{b}\overset{a.s.}{\longrightarrow }\mu _{{{\mathcal{D}}{=}}}/b!, \end{align}
where $\mu _{\mathcal{D}}$ and $\mu _{{{\mathcal{D}}{=}}}$ , as in Theorem 3.1, are given by (5.5) and (5.4). Equivalently,
(3.15) \begin{align} \bigl ({U_n(f)-{\mathbb{E}} U_n(f)}\bigr )/n^{\ell }\overset{a.s.}{\longrightarrow }0, \end{align}
(3.16) \begin{align} \bigl ({U_n(f;\,{\mathcal{D}})-{\mathbb{E}} U_n(f;\,{\mathcal{D}})}\bigr )/n^{b}\overset{a.s.}{\longrightarrow }0, \end{align}
(3.17) \begin{align} \bigl ({U_n(f;\,{{\mathcal{D}}{=}})-{\mathbb{E}} U_n(f;\,{{\mathcal{D}}{=}})}\bigr )/n^{b}\overset{a.s.}{\longrightarrow }0. \end{align}
Remark 3.5. For convenience, we assume ( ${A}_2$ ) in Theorem 3.2 as in the rest of the paper, which leads to a simple proof. We conjecture that the theorem holds assuming only ( ${A}_1$ ) (i.e., finite first moments) instead of ( ${A}_2$ ), as in [Reference Hoeffding28, Reference Janson37] for the independent case.
3.2. Asymptotic normality
We have the following theorems yielding asymptotic normality. The proofs are given in Section 6.
The first theorem is for the unconstrained case, and extends the basic theorem by Hoeffding [Reference Hoeffding27] for symmetric $U$ -statistics based on independent $(X_i)_1^\infty$ to the asymmetric and $m$ -dependent case. Note that both these extensions have previously been treated, but separately. For symmetric $U$ -statistics in the $m$ -dependent setting, asymptotic normality was proved by Sen [Reference Sen58] (at least assuming a third moment); moreover, bounds on the rate of convergence (assuming a moment condition) were given by Malevich and Abdalimov [Reference Malevich and Abdalimov43]. The asymmetric case with independent $(X_i)_1^\infty$ has been treated e.g. in [Reference Janson35, Corollary 11.20] and [Reference Janson37]; furthermore, as stated in Remark 3.3, for independent $(X_i)$ , the asymmetric case can be reduced to the symmetric case by the method in [Reference Janson35, Remark 11.21].
Theorem 3.3. Let $(X_i)_1^\infty$ be a stationary $m$ -dependent sequence of random variables with values in a measurable space $\mathcal{S}$ , let $\ell \geqslant 1$ , and let $f\,:\,{\mathcal{S}}^\ell \to \mathbb{R}$ satisfy ( ${A}_2$ ). Then, as $n\to \infty$ ,
(3.18) \begin{align} {Var}\ U_n(f)/n^{2\ell -1}\longrightarrow \sigma ^2 \end{align}
for some $\sigma ^2=\sigma ^2(f)\in [0,\infty )$ , and
(3.19) \begin{align} \frac{U_n(f)-{\mathbb{E}} U_n(f)}{n^{\ell -1/2}}\overset{d}{\longrightarrow }\textsf{N}\bigl ({0,\sigma ^2}\bigr ). \end{align}
The second theorem extends Theorem 3.3 to the constrained cases.
Theorem 3.4. Let $(X_i)_1^\infty$ be a stationary $m$ -dependent sequence of random variables with values in a measurable space $\mathcal{S}$ , let $\ell \geqslant 1$ , and let $f\,:\,{\mathcal{S}}^\ell \to \mathbb{R}$ satisfy ( ${A}_2$ ). Let ${\mathcal{D}}=(d_1,\ldots,d_{\ell -1})$ be a constraint, and let $b\,:\!=\,b({\mathcal{D}})$ . Then, as $n\to \infty$ ,
(3.20) \begin{align} {Var}\ U_n(f;\,{\mathcal{D}})/n^{2b-1}\longrightarrow \sigma ^2 \end{align}
for some $\sigma ^2=\sigma ^2(f;\,{\mathcal{D}})\in [0,\infty )$ , and
(3.21) \begin{align} \frac{U_n(f;\,{\mathcal{D}})-{\mathbb{E}} U_n(f;\,{\mathcal{D}})}{n^{b-1/2}}\overset{d}{\longrightarrow }\textsf{N}\bigl ({0,\sigma ^2}\bigr ). \end{align}
The same holds, with some (generally different) $\sigma ^2=\sigma ^2(f;\,{{\mathcal{D}}{=}})$ , for the exactly constrained $U_n(f;\,{{\mathcal{D}}{=}})$ .
Remark 3.6. It follows immediately by the Cramér–Wold device [Reference Gut25, Theorem 5.10.5] (i.e., considering linear combinations), that Theorem 3.3 extends in the obvious way to joint convergence for any finite number of different $f\,:\,{\mathcal{S}}^\ell \to \mathbb{R}$ , with $\sigma ^2$ now a covariance matrix. Moreover, the proof shows that this holds also for a family of different $f$ with (possibly) different $\ell \geqslant 1$ .
Similarly, Theorem 3.4 extends to joint convergence for any finite number of different $f$ (possibly with different $\ell$ and $\mathcal{D}$ ); this follows by the proof below, which reduces the results to Theorem 3.3.
Remark 3.7. The asymptotic variance $\sigma ^2$ in Theorems 3.3 and 3.4 can be calculated explicitly; see Remark 6.2.
Remark 3.8. Note that it is possible that the asymptotic variance $\sigma ^2=0$ in Theorems 3.3 and 3.4; in this case, (3.19) and (3.21) just give convergence in probability to 0. This degenerate case is discussed in Section 8.
Remark 3.9. We do not consider extensions to triangular arrays where $f$ or $X_i$ (or both) depend on $n$ . In the symmetric $m$ -dependent case, such a result (with fixed $\ell$ but possibly increasing $m$ , under suitable conditions) has been shown by [Reference Malevich and Abdalimov43], with a bound on the rate of convergence. In the independent case, results for triangular arrays are given by e.g. [Reference Rubin and Vitale56] and [Reference Jammalamadaka and Janson32]; see also [Reference Janson and Szpankowski40] for the special case of substring counts $N_n(\textbf{w})$ with $\textbf{w}$ depending on $n$ (and growing in length). It seems to be an interesting (and challenging) open problem to formulate useful general theorems for constrained $U$ -statistics in such settings.
3.3. Rate of convergence
Under stronger moment assumptions on $f$ , an alternative method of proof (suggested by a referee) yields the asymptotic normality in Theorems 3.3 and 3.4 together with an upper bound on the rate of convergence, provided $\sigma ^2\gt 0$ .
In the following theorem of Berry–Esseen type, we assume for simplicity that $f$ is bounded (as it is in our applications in Sections 13–14); see further Remark 9.1. Let $d_K$ denote the Kolmogorov distance between distributions; recall that for two distributions ${\mathcal{L}}_1,{\mathcal{L}}_2$ with distribution functions $F_1(x)$ and $F_2(x)$ , $d_K=d_K({\mathcal{L}}_1,{\mathcal{L}}_2)\,:\!=\,\sup _x|F_1(x)-F_2(x)|$ ; we use also the notation $d_K(X,{\mathcal{L}}_2)\,:\!=\,d_K({\mathcal{L}}(X),{\mathcal{L}}_2)$ for a random variable $X$ .
Theorem 3.5. Suppose in addition to the hypotheses in Theorem 3.3 or 3.4 that $\sigma ^2\gt 0$ and that $f$ is bounded. Then
(3.22) \begin{align} d_K\biggl ({\frac{U_n-{\mathbb{E}} U_n}{({Var}\ U_n)^{1/2}},\,\textsf{N}(0,1)}\biggr )=O\bigl ({n^{-1/2}}\bigr ), \end{align}
where $U_n$ denotes $U_n(f)$ , $U_n(f;\,{\mathcal{D}})$ , or $U_n(f;\,{{\mathcal{D}}{=}})$ .
In the symmetric and unconstrained case, this (and more) was shown by Malevich and Abdalimov [Reference Malevich and Abdalimov43]. The proof of Theorem 3.5 is given in Section 9, together with further remarks.
3.4. Moment convergence
Theorems 3.3 and 3.4 include convergence of the first (trivially) and second moments in (3.19) and (3.21). This extends to higher moments under a corresponding moment condition on $f$ . (The unconstrained case with independent $X_i$ was shown in [Reference Janson37, Theorem 3.15].)
Theorem 3.6. Suppose in addition to the hypotheses in Theorem 3.3 or 3.4 that ( ${A}_{p}$ ) holds for some real $p\geqslant 2$ . Then all absolute and ordinary moments of order up to $p$ converge in (3.19) or (3.21).
The proof is given in Section 10, where we also give related estimates for maximal functions.
3.5. Functional limit theorems
We can extend Theorem 3.4 to functional convergence. For unconstrained $U$ -statistics, this was done by Miller and Sen [Reference Miller and Sen44] in the classical case of independent $X_i$ and symmetric $f$ ; the asymmetric case is [Reference Janson37, Theorem 3.2]; furthermore, Yoshihara [Reference Yoshihara65] proved the case of dependent $X_i$ satisfying a suitable mixing condition (assuming a technical condition on $f$ besides symmetry).
Theorem 3.7. Suppose that ( ${A}_2$ ) holds. Then as $n\to \infty$ , with $b=b({\mathcal{D}})$ , in $D[0,\infty )$ ,
(3.23) \begin{align} \frac{U_{\lfloor nt\rfloor }(f;\,{\mathcal{D}})-{\mathbb{E}} U_{\lfloor nt\rfloor }(f;\,{\mathcal{D}})}{n^{b-1/2}}\overset{d}{\longrightarrow }Z(t), \end{align}
where $Z(t)$ is a continuous centered Gaussian process. Equivalently, in $D[0,\infty )$ ,
(3.24) \begin{align} \frac{U_{\lfloor nt\rfloor }(f;\,{\mathcal{D}})-(\mu _{\mathcal{D}}/b!)\lfloor nt\rfloor ^{b}}{n^{b-1/2}}\overset{d}{\longrightarrow }Z(t). \end{align}
The same holds for exact constraints. Moreover, the results hold jointly for any finite set of $f$ and $\mathcal{D}$ (possibly with different $\ell$ and $b$ ), with limits $Z(t)$ depending on $f$ and $\mathcal{D}$ .
The proof is given in Section 11.
Remark 3.10. A comparison between (3.23) and (3.21) yields $Z(t)\sim \textsf{N}\big(0,t^{2b-1}\sigma ^2\big)$ , with $\sigma ^2$ as in Theorem 3.4. Equivalently, ${Var}\ Z(t) = t^{2b-1}\sigma ^2$ , which can be calculated by Remark 6.2. Covariances ${Cov} ({Z(s),Z(t)} )$ can be calculated by the same method and (11.20) in the proof; we leave the details to the reader. Note that these covariances determine the distribution of the process $Z$ .
3.6. Renewal theory
Assume further that $h\,:\,{\mathcal{S}}\to \mathbb{R}$ is another (fixed) measurable function, with
(3.25) \begin{align} {\mathbb{E}} h(X_1)\gt 0. \end{align}
We define
(3.26) \begin{align} S_n\,:\!=\,\sum _{i=1}^n h(X_i),\qquad n\geqslant 0, \end{align}
and, for $x\gt 0$ ,
(3.27) \begin{align} N_-(x)\,:\!=\,\max \{{n\geqslant 0\,:\,S_n\leqslant x}\}, \end{align}
(3.28) \begin{align} N_+(x)\,:\!=\,\min \{{n\geqslant 0\,:\,S_n\gt x}\}. \end{align}
$N_-(x)$ and $N_+(x)$ are finite a.s. by the law of large numbers for $S_n$ (12.1); see further Lemma 12.1. We let $N_\pm (x)$ denote either $N_-(x)$ or $N_+(x)$ , in statements and formulas that are valid for both.
Remark 3.11. In [Reference Janson37], instead of $h(x)$ we consider more generally a function of several variables, and we define $N_\pm$ using the corresponding $U$ -statistic instead of $S_n$ . We believe that the results of the present paper can be extended to that setting, but we have not pursued this, and leave it as an open problem.
Remark 3.12. If $h(X_1)\geqslant 0$ a.s., which is often assumed in renewal theory, then $N_+(x)=N_-(x)+1$ . However, if $h$ may be negative (still assuming (3.25)), then $N_-(x)$ may be larger than $N_+(x)$ . Nevertheless, the difference is typically small, and we obtain the same asymptotic results for both $N_+$ and $N_-$ . (We can also obtain the same results if we instead use $S_n\lt x$ or $S_n\geqslant x$ in the definitions.)
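A small sketch (ours) of the definitions (3.26)–(3.28), reading $N_-(x)$ as a last-entrance time and $N_+(x)$ as a first-passage time; scanning a finite prefix is justified since $S_n\to \infty$ a.s. when ${\mathbb{E}} h(X_1)\gt 0$ .
```python
import random

def renewal_counts(h_vals, x):
    """Return N_-(x) = max{n >= 0 : S_n <= x} and N_+(x) = min{n : S_n > x}
    from a finite prefix h(X_1), h(X_2), ...; the prefix should be long
    enough that the partial sums S_n have drifted well above x."""
    S, n_minus, n_plus = 0.0, 0, None
    for n, v in enumerate(h_vals, start=1):
        S += v
        if S <= x:
            n_minus = n          # last n seen so far with S_n <= x
        elif n_plus is None:
            n_plus = n           # first n with S_n > x
    return n_minus, n_plus

# nu = E h(X_1) = 0.8 > 0, but h(X_1) may be negative, so N_- >= N_+ can occur
vals = [random.expovariate(1.0) - 0.2 for _ in range(10000)]
print(renewal_counts(vals, 100.0))
```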
In this situation, we have the following limit theorems, which extend results in [Reference Janson37]. Proofs are given in Section 12. For an application, see [Reference Janson38].
Theorem 3.8. With the assumptions and notation of Theorem 3.4, assume ( ${A}_2$ ), and suppose also that $\nu \,:\!=\,{\mathbb{E}} h(X_1)\gt 0$ and ${\mathbb{E}} h(X_1)^2\lt \infty$ . Then, with notation as above, as $x\to \infty$ ,
(3.29) \begin{align} \frac{U_{N_\pm (x)}(f;\,{\mathcal{D}})-(\mu _{\mathcal{D}}/b!)(x/\nu )^{b}}{x^{b-1/2}}\overset{d}{\longrightarrow }\textsf{N}\bigl ({0,\gamma ^2}\bigr ) \end{align}
for some $\gamma ^2=\gamma ^2(f;\,h;\,{\mathcal{D}})\geqslant 0$ .
The same holds for exact constraints. Moreover, the results hold jointly for any finite set of $f$ and $\mathcal{D}$ (possibly with different $\ell$ and $b$ ).
Theorem 3.9. Suppose in addition to the hypotheses in Theorem 3.8 that $h(X_1)$ is integer-valued and that $(X_i)_1^\infty$ are independent. Then (3.29) holds also conditioned on $S_{N_-(x)}=x$ for integers $x\to \infty$ .
We consider here tacitly only $x$ such that $\mathbb{P} ({S_{N_-(x)}=x} )\gt 0$ .
Remark 3.13. We prove Theorem 3.9 only for independent $X_i$ (which, in any case, is our main interest, as stated in the introduction). It seems likely that the result can be extended to at least some $m$ -dependent $(X_i)$ , using a modification of the proof below and the $m$ -dependent renewal theorem (under some conditions) [Reference Alsmeyer and Hoefs1, Corollary 4.2], but we have not pursued this.
Theorem 3.10. Suppose in addition to the hypotheses in Theorem 3.8 that ( ${A}_{p}$ ) holds and ${\mathbb{E}} [{|h(X_1)|^p} ]\lt \infty$ for every $p\lt \infty$ . Then all moments converge in (3.29).
Under the additional hypothesis in Theorem 3.9, this holds also conditioned on $S_{N_-(x)}=x$ .
Remark 3.14. In Theorem 3.10, unlike Theorem 3.6, we assume $p$ th moments for all $p$ , and conclude convergence of all moments. If we only want to show convergence for a given $p$ , some sufficient moment conditions on $f$ and $h$ can be derived from the proof, but we do not know any sharp results and have not pursued this. Cf. [Reference Janson37, Remark 6.1] and the references there.
4. Some lemmas
We give here some lemmas that will be used in the proofs in later sections. In particular, they will enable us to reduce the constrained cases to the unconstrained one.
Let ${\mathcal{D}}=(d_1,\ldots,d_{\ell -1})$ be a given constraint. Recall that $b=b({\mathcal{D}})$ is given by (2.1), and let $1=\beta _1\lt \ldots \lt \beta _b$ be the indices in $[\ell ]$ just after the unconstrained gaps; in other words, $\beta _j$ are defined by $\beta _1\,:\!=\,1$ and $d_{\beta _j-1}=\infty$ for $j=2,\ldots,b$ . For convenience we also define $\beta _{b+1}\,:\!=\,\ell +1$ . We say that the constraint $\mathcal{D}$ separates the index set $[\ell ]$ into the $b$ blocks $B_1,\ldots,B_b$ , where $B_k\,:\!=\,\{{\beta _k,\ldots,\beta _{k+1}-1}\}$ . Note that the constraints (1.2) thus are constraints on $i_j$ for $j$ in each block separately.
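In code, the blocks are obtained directly from these definitions (a direct transcription, ours):
```python
INF = float('inf')

def blocks(D):
    """Return the blocks B_1, ..., B_b of [ell] separated by the
    unconstrained gaps (d_j = oo), where ell = len(D) + 1 and
    b(D) = 1 + #{j : d_j = oo} as in (2.1)."""
    ell = len(D) + 1
    beta = [1] + [j + 2 for j, d in enumerate(D) if d == INF]  # block starts
    beta.append(ell + 1)                                       # beta_{b+1}
    return [list(range(beta[k], beta[k + 1])) for k in range(len(beta) - 1)]

print(blocks((2, INF, 1)))    # [[1, 2], [3, 4]], so b = 2
print(blocks((INF, INF)))     # [[1], [2], [3]], so b = 3 (D_infty, ell = 3)
```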
Lemma 4.1. Let $(X_i)_1^\infty$ be a stationary $m$ -dependent sequence of random variables with values in $\mathcal{S}$ , let $\ell \geqslant 1$ , and let $f\,:\,{\mathcal{S}}^\ell \to \mathbb{R}$ satisfy ( ${A}_2$ ). Let ${\mathcal{D}}=(d_1,\ldots,d_{\ell -1})$ be a constraint, and let $b\,:\!=\,b({\mathcal{D}})$ . Then
(4.1) \begin{align} {Var}\bigl [{U_n(f;\,{\mathcal{D}})}\bigr ]=O\bigl ({n^{2b-1}}\bigr ). \end{align}
Furthermore,
(4.2) \begin{align} {Var}\bigl [{U_n(f;\,{\mathcal{D}})-U_{n-1}(f;\,{\mathcal{D}})}\bigr ]=O\bigl ({n^{2b-2}}\bigr ). \end{align}
Moreover, the same estimates hold for $U_n(f;\,{{\mathcal{D}}{=}})$ .
Proof. The definition (3.2) yields
(4.3) \begin{align} {Var}\bigl [{U_n(f;\,{\mathcal{D}})}\bigr ]=\sum _{i_1\lt \cdots \lt i_\ell }\sum _{j_1\lt \cdots \lt j_\ell }{Cov}\bigl [{f\bigl ({X_{i_1},\ldots,X_{i_\ell }}\bigr ),\,f\bigl ({X_{j_1},\ldots,X_{j_\ell }}\bigr )}\bigr ], \end{align}
where both sums range over the sequences of indices in $[n]$ satisfying the constraints (1.2).
Let $d_*$ be the largest finite $d_j$ in the constraint $\mathcal{D}$ , i.e.,
(4.4) \begin{align} d_*\,:\!=\,\max \{{d_j\,:\,d_j\lt \infty }\}. \end{align}
The constraints imply that for each block $B_q$ and all indices $k\in B_q$ , coarsely,
(4.5) \begin{align} 0\leqslant i_k-i_{\beta _q}\leqslant d_*\ell . \end{align}
It follows that if $|i_{\beta _r}-j_{\beta _s}|\gt d_*\ell +m$ for all $r,s\in [b]$ , then $|i_\alpha -j_\beta |\gt m$ for all $\alpha,\beta \in [\ell ]$ . Since $(X_i)_1^\infty$ is $m$ -dependent, this implies that the two random vectors $ ({X_{i_1},\ldots,X_{i_\ell }} )$ and $ ({X_{j_1},\ldots,X_{j_\ell }} )$ are independent, and thus the corresponding term in (4.3) vanishes.
Consequently, we only have to consider terms in the sum in (4.3) such that
for some $r,s\in [b]$ . For each of the $O(1)$ choices of $r$ and $s$ , we can choose $i_{\beta _1},\ldots,i_{\beta _b}$ in at most $n^b$ ways; then $j_{\beta _s}$ in $O(1)$ ways such that (4.6) holds; then the remaining $j_{\beta _q}$ in $O\big(n^{b-1}\big)$ ways; then, finally, all remaining $i_k$ and $j_k$ in $O(1)$ ways because of (4.5). Consequently, the number of non-vanishing terms in (4.3) is $O\big(n^{2b-1}\big)$ . Moreover, each term is $O(1)$ by (3.6) and the Cauchy–Schwarz inequality, and thus (4.1) follows.
For (4.2), we note that $U_n(f;\,{\mathcal{D}})-U_{n-1}(f;\,{\mathcal{D}})$ is the sum in (3.2) with the extra restriction $i_\ell =n$ . Hence, its variance can be expanded as in (4.3), with the extra restrictions $i_\ell =j_\ell =n$ . We then argue as above, but note that (4.5) and $i_\ell =n$ imply that there are only $O(1)$ choices of $i_b$ , and hence $O\big(n^{b-1}\big)$ choices of $i_1,\ldots,i_b$ . We thus obtain $O \big({n^{2b-2}} \big)$ non-vanishing terms in the sum, and (4.2) follows.
The argument for the exactly constrained $U_n(f;\,{{\mathcal{D}}{=}})$ is the same (and slightly simpler). (Alternatively, we could do this case first, and then use (3.4) to obtain the results for $U_n(f;\,{\mathcal{D}})$ .)
The next lemma is the central step in the reduction to the unconstrained case.
Lemma 4.2. Let $(X_i)_1^\infty$ , $f\,:\,{\mathcal{S}}^\ell \to \mathbb{R}$ , and ${\mathcal{D}}=(d_1,\ldots,d_{\ell -1})$ be as in Lemma 4.1, and let
(4.7) \begin{align} D\,:\!=\,\sum _{j\,:\,d_j\lt \infty }d_j. \end{align}
Let $M\gt D$ and define
(4.8) \begin{align} Y_i\,:\!=\,\bigl ({X_i,X_{i+1},\ldots,X_{i+M-1}}\bigr )\in{\mathcal{S}}^M,\qquad i\geqslant 1. \end{align}
Then there exists a function $g=g_{{{\mathcal{D}}{=}}}\,:\,({\mathcal{S}}^M)^b\to \mathbb{R}$ such that for every $n\geqslant 0$ ,
(4.9) \begin{align} U_n(f;\,{{\mathcal{D}}{=}})=U_{n-D}\bigl ({g;\,(Y_i)}\bigr ), \end{align}
with $U_{n-D}(g)\,:\!=\,0$ when $n\lt D$ . Furthermore,
(4.10) \begin{align} {\mathbb{E}} \bigl |{g\bigl ({Y_{j_1},\ldots,Y_{j_b}}\bigr )}\bigr |^2\lt \infty \end{align}
for every $j_1\lt \cdots \lt j_b$ .
Proof. For each block $B_q=\{{\beta _q,\ldots,\beta _{q+1}-1}\}$ defined by $\mathcal{D}$ , let
Note that $t_{q1}=0$ for every $q$ and that $t_{qr},u_q\lt \infty$ . (We stop the summation in (4.13) just before the next infinite $d_j$ , which occurs for $j=\beta _{q+1}-1$ provided $q\lt b$ .) Note also that
We then rewrite (3.3) as follows, letting $k_q\,:\!=\,i_{\beta _q}$ and grouping the arguments of $f$ according to the blocks of $\mathcal{D}$ (using an obvious notation for this):
Change summation variables by $k_q=j_q+v_q$ . Then, recalling (4.14)–(4.15), (4.16) yields
Define, for $y_i=(y_{ik})_{k=1}^M\in{\mathcal{S}}^M$ ,
(Note that $v_j+t_{jr}+1\leqslant v_j + u_j +1 \leqslant D+1\leqslant M$ .) We have $Y_j=(X_{j+k-1})_{k=1}^M$ , and thus (4.18) yields
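As a worked instance of Lemma 4.2 (our own example): take $\ell =3$ and ${\mathcal{D}}=(2,\infty )$ , so $b=2$ with blocks $\{{1,2}\}$ and $\{{3}\}$ , and (with $D$ as in (4.7)) $D=2$ ; the sketch below verifies (4.9) numerically for one choice of $f$ , with $g$ picking out the used coordinates of the windows $Y_j=(X_j,X_{j+1},X_{j+2})$ .
```python
from itertools import combinations
import random

n = 25
X = [random.random() for _ in range(n)]
f = lambda x1, x2, x3: x1 * x2 + x3

# U_n(f; D=) for D = (2, oo): sum over i_2 = i_1 + 2 and i_3 > i_2
lhs = sum(f(X[i1], X[i1 + 2], X[i3])
          for i1 in range(n - 2) for i3 in range(i1 + 3, n))

# the same sum as an unconstrained U-statistic of Y_j = (X_j, ..., X_{j+2})
Y = [X[j:j + 3] for j in range(n - 2)]
g = lambda y1, y2: f(y1[0], y1[2], y2[2])
rhs = sum(g(Y[j1], Y[j2]) for j1, j2 in combinations(range(len(Y)), 2))

assert abs(lhs - rhs) < 1e-9   # (4.9): U_n(f; D=) = U_{n-D}(g; (Y_i))
```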
Lemma 4.3. Let $(X_i)_1^\infty$ and ${\mathcal{D}}=(d_1,\ldots,d_{\ell -1})$ be as in Lemma 4.1, and let $M$ and $Y_i$ be as in Lemma 4.2. For every $f\,:\,{\mathcal{S}}^\ell \to \mathbb{R}$ such that ( ${A}_2$ ) holds, there exist functions $g_{{\mathcal{D}}},g_{{{\mathcal{D}}{=}}}\,:\,({\mathcal{S}}^M)^b\to \mathbb{R}$ such that (4.10) holds for both, and
Proof. First, letting $g_{{{\mathcal{D}}{=}}}$ be as in Lemma 4.2, we have by (4.9)
Thus (4.21) follows by (4.2) in Lemma 4.1 applied to $g_{{{\mathcal{D}}{=}}}$ , the trivial constraint ${\mathcal{D}}_\infty$ (i.e., no constraint), and $(Y_i)_1^\infty$ .
Next, we recall (3.4) and define
again summing over all constraints ${\mathcal{D}}^{\prime}$ satisfying (3.5). This is a finite sum, and by (3.4) and (4.23),
To avoid some of the problems caused by dependencies between the $X_i$ , we follow Sen [Reference Sen58] and introduce another type of constrained $U$ -statistics, where we require the gaps between the summation indices to be large, instead of small as in (3.2). We need only one case, and define
(4.25) \begin{align} U_n(f;\,\gt m)\,:\!=\,\sum _{\substack{1\leqslant i_1\lt \cdots \lt i_\ell \leqslant n\\ i_{j+1}-i_j\gt m,\ j\in [\ell -1]}} f\bigl ({X_{i_1},\ldots,X_{i_\ell }}\bigr ), \end{align}
summing only over terms where all gaps $i_{j+1}-i_j\gt m$ , $j=1,\ldots,\ell -1$ . (The advantage is that in each term in (4.25), the variables $X_{i_1},\ldots,X_{i_\ell }$ are independent.)
Lemma 4.4. Let $(X_i)_1^\infty$ and $f\,:\,{\mathcal{S}}^\ell \to \mathbb{R}$ be as in Lemma 4.1. Then
(4.26) \begin{align} {Var}\bigl [{U_n(f)-U_n(f;\,\gt m)}\bigr ]=O\bigl ({n^{2\ell -3}}\bigr ). \end{align}
Proof. We can express the type of constrained $U$ -statistic in (4.25) as a combination of constrained $U$ -statistics of the previous type by the following inclusion–exclusion argument:
(4.27) \begin{align} U_n(f;\,\gt m)=\sum _{J\subseteq [\ell -1]}({-}1)^{|J|}U_n(f;\,{\mathcal{D}}_J), \end{align}
where we sum over the $2^{\ell -1}$ subsets $J$ of $[{\ell -1}]$ , and use the constraints
(4.28) \begin{align} {\mathcal{D}}_J\,:\!=\,(d_1,\ldots,d_{\ell -1})\qquad \text{with } d_j\,:\!=\,m \text{ if } j\in J, \text{ and } d_j\,:\!=\,\infty \text{ otherwise}. \end{align}
We have $b({\mathcal{D}}_{J})=\ell -|J|$ , and thus $b({\mathcal{D}}_J)\lt \ell$ unless $J=\emptyset$ . Moreover, ${\mathcal{D}}_\emptyset =(\infty,\ldots,\infty )={\mathcal{D}}_\infty$ , and this means no constraint, so $U_n(f;\,{\mathcal{D}}_\emptyset )=U_n(f)$ , the unconstrained $U$ -statistic. Consequently, by (4.27) and Lemma 4.1,
\begin{align} {Var}\bigl [{U_n(f)-U_n(f;\,\gt m)}\bigr ]\leqslant C\sum _{\emptyset \neq J\subseteq [\ell -1]}{Var}\bigl [{U_n(f;\,{\mathcal{D}}_J)}\bigr ]=O\bigl ({n^{2\ell -3}}\bigr ), \end{align}
which proves the estimate (4.26).
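The inclusion–exclusion identity (4.27) is easy to check numerically; a sketch (ours) for $\ell =2$ , where the only subsets are $J=\emptyset$ and $J=\{{1}\}$ :
```python
from itertools import combinations
import random

m, n = 1, 20
X = [random.random() for _ in range(n)]
f = lambda x, y: x * y
pairs = list(combinations(range(n), 2))

lhs = sum(f(X[i], X[j]) for i, j in pairs if j - i > m)    # U_n(f; >m)
# (4.27): U_n(f; D_empty) - U_n(f; D_{1}) = U_n(f) - U_n(f; (m,))
rhs = sum(f(X[i], X[j]) for i, j in pairs) \
    - sum(f(X[i], X[j]) for i, j in pairs if j - i <= m)

assert abs(lhs - rhs) < 1e-9
```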
4.1. Triangular arrays
We will also use a central limit theorem for $m$ -dependent triangular arrays satisfying the Lindeberg condition, which we state as Theorem 4.1 below. The theorem is implicit in Orey [Reference Orey48]; it follows from his theorem there exactly as his corollary does, although the latter is stated for a sequence and not for a triangular array. See also Peligrad [Reference Peligrad49, Theorem 2.1], which contains the theorem below (at least for $\sigma ^2\gt 0$ ; the case $\sigma ^2=0$ is trivial), and is much more general in that it only assumes strong mixing instead of $m$ -dependence.
Recall that a triangular array is an array $(\xi _{ni})_{1\leqslant i\leqslant n\lt \infty }$ of random variables, such that the variables $(\xi _{ni})_{i=1}^n$ in a single row are defined on a common probability space. (As usual, it is only for convenience that we require that the $n$ th row has length $n$ ; the results extend to arbitrary lengths $N_n$ .) We are here mainly interested in the case when each row is an $m$ -dependent sequence; in this case, we say that $(\xi _{ni})$ is an $m$ -dependent triangular array. (We make no assumption on the relationships between variables in different rows; these may even be defined on different probability spaces.)
Theorem 4.1. (Orey [Reference Orey48].) Let $(\xi _{ni})_{1\leqslant i\leqslant n\lt \infty }$ be an $m$ -dependent triangular array of real-valued random variables with ${\mathbb{E}}\xi _{ni}=0$ . Let $\widehat{S}_n\,:\!=\,\sum _{i=1}^n \xi _{ni}$ . Assume that, as $n\to \infty$ ,
(4.30) \begin{align} {Var}\ \widehat{S}_n\longrightarrow \sigma ^2 \end{align}
for some $\sigma ^2\in [0,\infty )$ , that the $\xi _{ni}$ satisfy the Lindeberg condition
(4.31) \begin{align} \sum _{i=1}^n{\mathbb{E}}\bigl [{\xi _{ni}^2\,\boldsymbol{1}\{{|\xi _{ni}|\gt \varepsilon }\}}\bigr ]\longrightarrow 0\qquad \text{for every }\varepsilon \gt 0, \end{align}
and that
(4.32) \begin{align} \max _{i\leqslant n}{\mathbb{E}}\,\xi _{ni}^2\longrightarrow 0. \end{align}
Then, as $n\to \infty$ ,
(4.33) \begin{align} \widehat{S}_n\overset{d}{\longrightarrow }\textsf{N}\bigl ({0,\sigma ^2}\bigr ). \end{align}
Note that Theorem 4.1 extends the standard Lindeberg–Feller central limit theorem for triangular arrays with row-wise independent variables (see e.g. [Reference Gut25, Theorem 7.2.4]), to which it reduces when $m=0$ .
Remark 4.1. In fact, the assumption (4.32) is not needed in Theorem 4.1; see [Reference Utev63, Theorem 2.1]. However, it is easily verified in our case (and many other applications), so we need only this classical result.
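A quick simulation illustrating Theorem 4.1 (our own toy check, not part of the paper): a 1-dependent triangular array built as a block factor, where ${Var}\ \widehat{S}_n\to \sigma ^2=4/3$ .
```python
import random, statistics

def S_hat(n):
    """Row sum of the 1-dependent array xi_{ni} = (eta_i + eta_{i+1}) / sqrt(n)
    with eta_i i.i.d. uniform on (-1, 1); E xi_{ni} = 0."""
    eta = [random.uniform(-1.0, 1.0) for _ in range(n + 1)]
    return sum(eta[i] + eta[i + 1] for i in range(n)) / n ** 0.5

vals = [S_hat(1000) for _ in range(2000)]
# Var(eta_i) = 1/3, so Var S_hat_n -> 4/3; S_hat_n is approximately N(0, 4/3)
print(statistics.mean(vals), statistics.variance(vals))
```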
5. The expectation
The expectation of a (constrained) $U$ -statistic, and in particular its leading term, is easily found from the definition. Nevertheless, we give a detailed proof of Theorem 3.1, for completeness and for later reference.
Proof of Theorem 3.1. Consider first the unconstrained case. We take expectations in (3.1). The sum in (3.1) has $\binom{n}{\ell }$ terms. We first consider the terms that satisfy the restriction $i_{j+1}\gt i_j+m$ for every $j\in [{\ell -1}]$ (i.e., the terms in (4.25)). As noted above, in each such term, the variables $X_{i_1},\ldots,X_{i_\ell }$ are independent. Hence, let $(\widehat{X}_i)_1^\ell$ be an independent sequence of random variables in $\mathcal{S}$ , each with the same distribution as $X_1$ (and thus as each $X_j$ ), and define
(5.1) \begin{align} \mu \,:\!=\,{\mathbb{E}} f\bigl ({\widehat{X}_1,\ldots,\widehat{X}_\ell }\bigr ). \end{align}
Then
(5.2) \begin{align} {\mathbb{E}} f\bigl ({X_{i_1},\ldots,X_{i_\ell }}\bigr )=\mu \end{align}
for every sequence of indices $i_1,\ldots,i_\ell$ with $i_{j+1}\gt i_j+m$ for all $j\in [{\ell -1}]$ . Moreover, the number of terms in (3.1) that do not satisfy these constraints is $O ({n^{{\ell -1}}} )$ , and their expectations are uniformly $O(1)$ as a consequence of (3.6). Thus, (3.7) follows from (3.1).
Next, consider the exactly constrained case. We use Lemma 4.2 and then apply the unconstrained case just treated to $g$ and $(Y_i)$ ; this yields
(5.3) \begin{align} {\mathbb{E}} U_n(f;\,{{\mathcal{D}}{=}})={\mathbb{E}} U_{n-D}\bigl ({g;\,(Y_i)}\bigr )=\frac{{\mathbb{E}} g\bigl ({\widehat{Y}_1,\ldots,\widehat{Y}_b}\bigr )}{b!}\,n^{b}+O\bigl ({n^{b-1}}\bigr ), \end{align}
with $\widehat{Y}_1,\ldots,\widehat{Y}_b\overset{d}{=} Y_1$ independent. Using (4.19), and the notation there, this yields (3.9) with
(5.4) \begin{align} \mu _{{{\mathcal{D}}{=}}}\,:\!=\,{\mathbb{E}} g\bigl ({\widehat{Y}_1,\ldots,\widehat{Y}_b}\bigr )={\mathbb{E}} g\bigl ({Y_{j_1},\ldots,Y_{j_b}}\bigr ) \end{align}
for any sequence $j_1,\ldots,j_b$ with $j_{k+1}-j_k\geqslant m+M$ for all $k\in [b-1]$ . (Note that $(Y_i)_1^\infty$ is $(m+M-1)$ -dependent.)
for any sequence $j_1,\ldots,j_b$ with $j_{k+1}-j_k\geqslant m+M$ for all $k\in [b-1]$ . (Note that $(Y_i)_1^\infty$ is $(m+M-1)$ -dependent.)
Finally, the constrained case (3.8) follows by (3.9) and the decomposition (3.4), with
(5.5) \begin{align} \mu _{\mathcal{D}}\,:\!=\,\sum _{{\mathcal{D}}^{\prime}}\mu _{{{\mathcal{D}}^{\prime}{=}}}, \end{align}
summing over all ${\mathcal{D}}^{\prime}$ satisfying (3.5).
In the independent case $m=0$ , the results above simplify. First, for the unconstrained case, the formula for $\mu$ in (3.10) is a special case of (5.2). Similarly, in the exactly constrained case, (5.4) yields the formula for $\mu _{{\mathcal{D}}{=}}$ in (3.10). Finally, (3.10) shows that $\mu _{{\mathcal{D}}{=}}$ does not depend on $\mathcal{D}$ , and thus all terms in the sum in (5.5) are equal to $\mu$ . Furthermore, it follows from (3.5) that the number of terms in the sum is $\prod _{d_j\lt \infty }d_j$ , and (3.11) follows.
Alternatively, in the independent case, all terms in the sums in (3.1), (3.2), and (3.3) have the same expectation $\mu$ given by (3.10), and the result follows by counting the number of terms. In particular, exactly,
(5.6) \begin{align} {\mathbb{E}} U_n(f)=\binom{n}{\ell }\mu, \end{align}
and, with $D$ given by (4.7),
(5.7) \begin{align} {\mathbb{E}} U_n\bigl ({f;\,{{\mathcal{D}}{=}}}\bigr )=\binom{n-D}{b}\mu . \end{align}
6. Asymptotic normality
The general idea of the proof of Theorem 3.3 is to use the projection method of Hoeffding [Reference Hoeffding27], together with modifications as in [Reference Sen58] to treat $m$ -dependent variables and modifications as in e.g. [Reference Janson37] to treat the asymmetric case. We then obtain the constrained version, Theorem 3.4, by reduction to the unconstrained case.
Proof of Theorem 3.3. We first note that by Lemma 4.4, it suffices to prove (3.18)–(3.19) for $U_n(f;\,\gt m)$ . (This uses standard arguments with Minkowski’s inequality and the Cramér–Slutsky theorem [Reference Gut25, Theorem 5.11.4], respectively; we omit the details. The same arguments are used several times below without comment.)
As remarked above, the variables inside each term in the sum in (4.25) are independent; this enables us to use Hoeffding’s decomposition for the independent case, which (in the present, asymmetric case) we define as follows.
As in Section 5, let $(\widehat{X}_i)_1^\ell$ be an independent sequence of random variables in $\mathcal{S}$ , each with the same distribution as $X_1$ . Recall $\mu$ defined in (5.1), and, for $i=1,\ldots,\ell$ , define the function $f_i$ as the one-variable projection
(6.1) \begin{align} f_i(x)\,:\!=\,{\mathbb{E}} f\bigl ({\widehat{X}_1,\ldots,\widehat{X}_{i-1},x,\widehat{X}_{i+1},\ldots,\widehat{X}_\ell }\bigr )-\mu . \end{align}
Equivalently,
(6.2) \begin{align} f_i\bigl ({\widehat{X}_i}\bigr )={\mathbb{E}}\bigl [{f\bigl ({\widehat{X}_1,\ldots,\widehat{X}_\ell }\bigr )\,\big |\,\widehat{X}_i}\bigr ]-\mu . \end{align}
(In general, $f_i$ is defined only ${\mathcal{L}}(\widehat{X}_i)$ -almost everywhere (a.e.), but it does not matter which version we choose.) Define also the residual function
(6.3) \begin{align} f_*\bigl ({x_1,\ldots,x_\ell }\bigr )\,:\!=\,f\bigl ({x_1,\ldots,x_\ell }\bigr )-\mu -\sum _{i=1}^\ell f_i(x_i). \end{align}
Note that the variables $f_i(X_j)$ are centered by (5.1) and (6.2):
(6.4) \begin{align} {\mathbb{E}} f_i\bigl ({X_j}\bigr )={\mathbb{E}} f_i\bigl ({\widehat{X}_i}\bigr )=0. \end{align}
Furthermore, ( ${A}_2$ ) implies that $f_i(\widehat{X}_i)$ , and thus each $f_i(X_j)$ , is square integrable.
The essential property of $f_*$ is that, as an immediate consequence of the definitions and (6.4), its one-variable projections vanish:
(6.5) \begin{align} {\mathbb{E}}\bigl [{f_*\bigl ({\widehat{X}_1,\ldots,\widehat{X}_\ell }\bigr )\,\big |\,\widehat{X}_i}\bigr ]=0\quad \text{a.s.},\qquad i\in [\ell ]. \end{align}
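For intuition, the projections (6.1) and the centering (6.4) can be approximated by Monte Carlo in the independent case; a rough sketch (ours, purely illustrative):
```python
import random

random.seed(1)
f = lambda x, y: x * y + x               # a toy kernel with ell = 2
X = lambda: random.gauss(0.0, 1.0)       # distribution of X_1
N = 100000

mu = sum(f(X(), X()) for _ in range(N)) / N        # (5.1); here mu = 0

def f1(x, reps=2000):                    # (6.1): f_1(x) = E f(x, X_2) - mu
    return sum(f(x, X()) for _ in range(reps)) / reps - mu

# (6.4): E f_1(X_1) = 0, up to Monte Carlo error
print(mu, sum(f1(X()) for _ in range(500)) / 500)
```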
We assume from now on for simplicity that $\mu =0$ ; the general case follows by replacing $f$ by $f-\mu$ . Then (4.25) and (6.3) yield, by counting the terms where $i_j=k$ for given $j$ and $k$ ,
Let us first dispose of the last term in (6.6). Let $i_1\lt \ldots \lt i_\ell$ and $j_1\lt \ldots \lt j_\ell$ be two sets of indices such that the constraints $i_{k+1}-i_k\gt m$ and $j_{k+1}-j_k\gt m$ in (4.25) hold. First, as in the proof of Lemma 4.1, if also $|i_\alpha -j_\beta |\gt m$ for all $\alpha,\beta \in [\ell ]$ , then all $X_{i_\alpha }$ and $X_{j_\beta }$ are independent; thus $f_*(X_{i_1},\ldots,X_{i_\ell })$ and $f_*(X_{j_1},\ldots,X_{j_\ell })$ are independent, and
Moreover, suppose that $|i_\alpha -j_\beta |\gt m$ for all but one pair $(\alpha,\beta )\in [\ell ]^2$ , say for $(\alpha,\beta )\neq (\alpha _0,\beta _0)$ . Then the pair $(X_{i_{\alpha _0}},X_{j_{\beta _0}})$ is independent of all the variables $\{{X_{i_\alpha }\,:\,\alpha \neq \alpha _0}\}$ and $\{{X_{j_\beta }\,:\,\beta \neq \beta _0}\}$ , and all these are mutually independent. Hence, recalling (6.5), a.s.
Thus, taking the expectation, we find that unconditionally
Consequently, if we expand ${Var} [{U_n(f_*;\,\gt m)} ]$ in analogy with (4.3), then all terms where $|i_\alpha -j_\beta |\leqslant m$ for at most one pair $(\alpha,\beta )$ will vanish. The number of remaining terms, i.e., those with at least two such pairs $(\alpha,\beta )$ , is $O(n^{2\ell -2})$ , and each term is $O(1)$ , by ( ${A}_2$ ) and the Cauchy–Schwarz inequality. Consequently,
Hence, we may ignore the final term $U_n(f_*;\,\gt m)$ in (6.6).
We turn to the main terms in (6.6), i.e., the double sum; we denote it by $\widehat{U}_n$ and write it as
where we thus define
where the $O$ is uniform over all $k\leqslant n$ and $j\leqslant \ell$ . For $j=1,\ldots,\ell$ , define the polynomial functions
Then (6.12) yields, again uniformly for all $k\leqslant n$ and $j\leqslant \ell$ ,
The expansion (6.11) yields
where all terms with $|k-q|\gt m$ vanish because the sequence $(X_i)$ is $m$ -dependent. Hence, with $r_-\,:\!=\,\max \{{-r,0}\}$ and $r_+\,:\!=\,\max \{{r,0}\}$ ,
The covariance in (6.16) is independent of $k$ ; we thus define, for any $k\gt r_-$ ,
and obtain
Furthermore, by (6.14),
Consequently, (6.18) yields
Since (4.26), (6.6), and (6.10) yield
the result (3.18) follows from (6.20), with
Next, we use (6.11) and write
with
Since $Z_{kn}$ is a function of $X_k$ , it is evident that $(Z_{kn})$ is an $m$ -dependent triangular array with centered variables. Furthermore, ${\mathbb{E}} Z_{kn}=0$ as a consequence of (6.4).
We apply Theorem 4.1 to $(Z_{kn})$ , so $\widehat{S}_n=n^{\frac 12-\ell }\widehat{U}_n$ by (6.23), and verify first its conditions. The condition (4.30) holds by (6.20) and (6.22). Write $Z_{kn}=\sum _{j=1}^\ell Z_{jkn}$ with
Since (6.12) yields $|a_{j,k,n}|\leqslant n^{\ell -1}$ , we have, for $\varepsilon \geqslant 0$ ,
The distribution of $f_j(X_k)$ does not depend on $k$ , and thus the Lindeberg condition (4.31) for each triangular array $(Z_{jkn})_{k,n}$ follows from (6.26). The Lindeberg condition (4.31) for $(Z_{kn})_{k,n}$ then follows easily. Finally, taking $\varepsilon =0$ in (6.26) yields ${\mathbb{E}} Z_{jkn}^2\leqslant Cn^{-1}$ , and thus ${\mathbb{E}} Z_{kn}^2\leqslant Cn^{-1}$ , which shows (4.32).
We have shown that Theorem 4.1 applies, and thus, recalling (6.23) and (6.4),
Proof of Theorem 3.4. Lemma 4.3 implies that it suffices to consider $U_n ({g;\,(Y_i) } )$ instead of $U_n(f;\,{\mathcal{D}})$ or $U_n(f;\,{{\mathcal{D}}{=}})$ . Note that the definition (4.8) implies that $(Y_i)_1^\infty$ is a stationary $m^{\prime}$ -dependent sequence, with $m^{\prime}\,:\!=\,m+M-1$ . Hence, the result follows from Theorem 3.3 applied to $g$ and $(Y_i)_1^\infty$ .
Remark 6.1. The integrals in (6.22) are standard beta integrals [Reference Olver, Lozier, Boisvert and Clark46, 5.12.1]; we have
Remark 6.2. In the unconstrained case Theorem 3.3, the asymptotic variance $\sigma ^2$ is given by (6.22) together with (6.17), (6.1), and (6.28).
In the constrained cases, the proof above shows that $\sigma ^2$ is given by (6.22) applied to the function $g$ given by Lemma 4.3 and $(Y_i)_1^\infty$ given by (4.8) (with $M=D+1$ for definiteness); note that this also entails replacing $\ell$ by $b$ and $m$ by $m+M-1=m+D$ in the formulas above. In particular, in the exactly constrained case (3.3), it follows from (6.1) and (4.18) that, with $y=(x_1,\ldots,x_M)\in{\mathcal{S}}^M$ and other notation as in (4.11)–(4.14) and (5.4),
where the $i$ th group of variables consists of the given $x_i$ , the other $b-1$ groups contain variables $X_i$ , and $j_1,\ldots,j_b$ is any sequence of indices that has large enough gaps: $j_{i+1}-j_i\gt m+M-1= m+D$ .
In the constrained case (3.2), $g=g_{\mathcal{D}}$ is obtained as the sum (4.23), and thus each $g_i$ is a similar sum of functions that can be obtained as (6.29). (Note that $M\,:\!=\,D+1$ works in Lemma 4.2 for all terms by (3.5).) Then, $\sigma ^2$ is given by (6.22) (with substitutions as above).
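When the exact formulas are unwieldy, (3.18) also suggests a simulation estimate of $\sigma ^2$ ; a crude sketch (ours) for the unconstrained i.i.d. case with $\ell =2$ :
```python
import random, statistics
from itertools import combinations

def U2(f, X):    # unconstrained U-statistic (3.1) with ell = 2
    return sum(f(X[i], X[j]) for i, j in combinations(range(len(X)), 2))

def sigma2_estimate(f, n=150, reps=400):
    """Estimate sigma^2 = lim Var U_n(f) / n^{2 ell - 1} (here ell = 2)
    by simulation; crude, and biased at finite n."""
    vals = [U2(f, [random.random() for _ in range(n)]) for _ in range(reps)]
    return statistics.variance(vals) / n ** 3

print(sigma2_estimate(lambda x, y: (x - y) ** 2))
```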
7. Law of large numbers
Proof of Theorem 3.2. Note first that if $R_n$ is any sequence of random variables such that
(7.1) \begin{align} \sum _{n=1}^\infty {\mathbb{E}} R_n^2\lt \infty, \end{align}
then Markov’s inequality and the Borel–Cantelli lemma show that $R_n\overset{a.s.}{\longrightarrow }0$ .
We begin with the unconstrained case, ${\mathcal{D}}={\mathcal{D}}_\infty =(\infty,\ldots,\infty )$ . We may assume, as in the proof of Theorem 3.3, that $\mu =0$ . Then (6.21) holds, and thus by the argument just given, and recalling that ${\mathbb{E}} \widehat{U}_n=0$ by (6.11) and (6.4),
(7.2) \begin{align} n^{-\ell }\bigl ({U_n(f)-{\mathbb{E}} U_n(f)-\widehat{U}_n}\bigr )\overset{a.s.}{\longrightarrow }0. \end{align}
Hence, to prove (3.15), it suffices to prove $n^{-\ell }\widehat{U}_n\overset{a.s.}{\longrightarrow }0$ .
For simplicity, we fix $j\in [\ell ]$ and define, with $f_j$ as above given by (6.1),
(7.3) \begin{align} S_{jn}\,:\!=\,\sum _{k=1}^n f_j\bigl ({X_k}\bigr ), \end{align}
and, using partial summation,
(7.4) \begin{align} \sum _{k=1}^n a_{j,k,n}f_j\bigl ({X_k}\bigr )=\sum _{k=1}^{n-1}\bigl ({a_{j,k,n}-a_{j,k+1,n}}\bigr )S_{jk}+a_{j,n,n}S_{jn}. \end{align}
The sequence $(f_j(X_k))_k$ is $m$ -dependent and stationary, with ${\mathbb{E}} |f_j(X_k)|\lt \infty$ . As is well known, the strong law of large numbers holds for stationary $m$ -dependent sequences with finite means. (This follows by considering the subsequences $(X_{(m+1)n+q})_{n\geqslant 0}$ , each of which is an i.i.d. sequence for fixed $q\in [m+1]$ .) Thus, by (7.3) and (6.4),
(7.5) \begin{align} S_{jn}/n\overset{a.s.}{\longrightarrow }{\mathbb{E}} f_j\bigl ({X_1}\bigr )=0. \end{align}
In other words, a.s. $S_{jn}=o(n)$ , and thus also
(7.6) \begin{align} \max _{k\leqslant n}|S_{jk}|=o(n)\qquad \text{a.s.} \end{align}
Moreover, (6.12) implies $a_{j,k,n}-a_{j,k+1,n}=O(n^{\ell -2})$ . Hence, (7.4) yields
(7.7) \begin{align} \Bigl |{\sum _{k=1}^n a_{j,k,n}f_j\bigl ({X_k}\bigr )}\Bigr |\leqslant Cn^{\ell -2}\sum _{k=1}^{n-1}|S_{jk}|+n^{\ell -1}|S_{jn}|, \end{align}
and thus, using (7.6),
(7.8) \begin{align} \sum _{k=1}^n a_{j,k,n}f_j\bigl ({X_k}\bigr )=o\bigl ({n^{\ell }}\bigr )\qquad \text{a.s.} \end{align}
Consequently,
(7.9) \begin{align} n^{-\ell }\widehat{U}_n=n^{-\ell }\sum _{j=1}^\ell \sum _{k=1}^n a_{j,k,n}f_j\bigl ({X_k}\bigr )\overset{a.s.}{\longrightarrow }0, \end{align}
which together with (7.2) yields the desired result (3.15).
Next, for an exact constraint ${\mathcal{D}}{=}$ , we use Lemma 4.2. Then (4.9) together with the result just shown applied to $g$ and $(Y_i)$ yields
\begin{align} n^{-b}\bigl ({U_n(f;\,{{\mathcal{D}}{=}})-{\mathbb{E}} U_n(f;\,{{\mathcal{D}}{=}})}\bigr )\overset{a.s.}{\longrightarrow }0. \end{align}
This proves (3.17), and (3.16) follows by (3.4).
Finally, using Theorem 3.1, (3.12)–(3.14) are equivalent to (3.15)–(3.17).
8. The degenerate case
As is well known, even in the original symmetric and independent case studied in [Reference Hoeffding27], the asymptotic variance $\sigma ^2$ in Theorem 3.3 may vanish also in non-trivial cases. In such cases, (3.19) is still valid, but says only that the left-hand side converges to 0 in probability. In the present section, we characterize this degenerate case in Theorems 3.3 and 3.4. Note that in applications, it is frequently natural to guess that $\sigma ^2\gt 0$ , but this is sometimes surprisingly difficult to prove. One purpose of the theorems below is to assist in showing $\sigma ^2\gt 0$ ; see the applications in Sections 13 and 14.
For an unconstrained $U$ -statistic and an independent sequence $(X_i)_1^\infty$ (the case $m=0$ of Theorem 3.3), it is known, and not difficult to see, that $\sigma ^2=0$ if and only if every projection $f_i(X_1)$ defined by (6.1) vanishes a.s.; see [Reference Janson37, Corollary 3.5]. (This is included in the theorem below by taking $m=0$ in (iii), and it is also the correct interpretation of (vi) when $m=0$ .) In the $m$ -dependent case, the situation is similar, but somewhat more complicated, as shown by the following theorem. Note that $S_n(f_j)$ defined in (8.8) below equals $S_{jn}$ ; for later applications we find this change of notation convenient.
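A classical degenerate example (ours, with $m=0$ and symmetric $f$ ): $f(x,y)=xy$ with ${\mathbb{E}} X_1=0$ , so that $f_1=f_2=0$ in (6.1) and $\sigma ^2=0$ ; then ${Var}\ U_n$ grows like $n^2$ rather than $n^{2\ell -1}=n^3$ . A quick numerical illustration:
```python
import random, statistics
from itertools import combinations

def U_deg(X):   # U_n(f) with the degenerate kernel f(x, y) = x * y
    return sum(X[i] * X[j] for i, j in combinations(range(len(X)), 2))

for n in (50, 100, 200):
    vals = [U_deg([random.gauss(0.0, 1.0) for _ in range(n)])
            for _ in range(300)]
    v = statistics.variance(vals)
    print(n, v / n ** 3, v / n ** 2)   # first ratio -> 0; second ~ 1/2
```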
Theorem 8.1. With assumptions and notation as in Theorem 3.3, define $f_i$ by (6.1), $\gamma _{i,j,r}$ by (6.17) and $S_{jn}$ by (7.3). Then the following are equivalent:
(i)
(8.1) \begin{align} \sigma ^2=0. \end{align}
(ii)
(8.2) \begin{align} {Var}\ U_n = O\bigl ({n^{2\ell -2}}\bigr ). \end{align}
(iii)
(8.3) \begin{align} \sum _{r=-m}^m\gamma _{i,j,r}=0,\qquad \forall i,j\in [\ell ]. \end{align}
(iv)
(8.4) \begin{align} {Cov}\bigl [{S_{in},S_{jn}}\bigr ]/n \to 0\ \text{as}\ n\to \infty \quad \forall i,j\in [\ell ]. \end{align}
(v)
(8.5) \begin{align} {Var}\bigl [{S_{jn}}\bigr ]/n \to 0\ \text{as}\ n\to \infty \quad \forall j\in [\ell ]. \end{align}
(vi) For each $j\in [\ell ]$ there exists a stationary sequence $(Z_{j,k})_{k=0}^\infty$ of $(m-1)$ -dependent random variables such that a.s.
(8.6) \begin{align} f_j(X_k)=Z_{j,k}-Z_{j,k-1}, \qquad k\geqslant 1. \end{align}
Moreover, suppose that the sequence $(X_k)_1^\infty$ is a block factor given by (2.2) for some function $h$ and i.i.d. $\xi _i$ , and that $\sigma ^2=0$ . Then, in (vi), we may take $Z_{j,k}$ as block factors
(8.7) \begin{align} Z_{j,k}=\varphi _j\bigl ({\xi _{k+1},\ldots,\xi _{k+m}}\bigr ) \end{align}
for some functions $\varphi _j\,:\,{\mathcal{S}}_0^m\to \mathbb{R}$ . Hence, for every $j\in [\ell ]$ and $n\geqslant 1$ ,
(8.8) \begin{align} S_n(f_j)\,:\!=\,\sum _{k=1}^n f_j\bigl ({X_k}\bigr )=Z_{j,n}-Z_{j,0}=\varphi _j\bigl ({\xi _{n+1},\ldots,\xi _{n+m}}\bigr )-\varphi _j\bigl ({\xi _{1},\ldots,\xi _{m}}\bigr ), \end{align}
and thus $S_{n}(f_j)$ is independent of $\xi _{m+1},\ldots,\xi _{n}$ for every $j\in [\ell ]$ and $n\gt m$ .
To prove Theorem 8.1, we begin with a well-known algebraic lemma; for completeness we include a proof.
Lemma 8.1. Let $A=(a_{ij})_{i,j=1}^\ell$ and $B=(b_{ij})_{i,j=1}^\ell$ be symmetric real matrices such that $A$ is positive definite and $B$ is positive semidefinite. Then
(8.9) \begin{align} \sum _{i,j=1}^\ell a_{ij}b_{ij}=0\iff B=0. \end{align}
Proof. Since $A$ is positive definite, there exists an orthonormal basis $(v_k)_1^\ell$ in $\mathbb{R}^\ell$ consisting of eigenvectors of $A$ , in other words satisfying $Av_k=\lambda _kv_k$ ; furthermore, the eigenvalues $\lambda _k\gt 0$ . Write $v_k=(v_{ki})_{i=1}^\ell$ . We then have
Thus
Since $B$ is positive semidefinite, all terms in the last sum are $\geqslant 0$ , so the sum is 0 if and only if every term is, and thus
By the Cauchy–Schwarz inequality for the semidefinite bilinear form $\langle{v,Bw}\rangle$ (or, alternatively, by using $\langle{v_k\pm v_n,B(v_k\pm v_n)}\rangle \geqslant 0$ ), it follows that this condition implies $\langle{v_k,Bv_n}\rangle =0$ for any $k,n\in [\ell ]$ , and thus
Since $(v_k)_1^\ell$ is a basis, this is further equivalent to $\langle{v,Bw}\rangle =0$ for all $v,w\in \mathbb{R}^\ell$, and thus to $B=0$. This yields (8.9).
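The display (8.9) is not reproduced above; from the proof it evidently states that $\sum _{i,j}a_{ij}b_{ij}=0$ if and only if $B=0$. Under that reading, the lemma and the eigenvector identity used in its proof are easy to sanity-check numerically:

```python
import numpy as np

rng = np.random.default_rng(1)
ell = 5

M = rng.standard_normal((ell, ell))
A = M @ M.T + ell * np.eye(ell)      # symmetric positive definite
R = rng.standard_normal((ell, 2))
B = R @ R.T                          # symmetric positive semidefinite, B != 0

# Diagonalize A and recompute sum_{i,j} a_ij b_ij = sum_k lambda_k <v_k, B v_k>,
# as in the proof; every summand is >= 0, and some are > 0 since B != 0.
lam, V = np.linalg.eigh(A)
lhs = np.sum(A * B)                  # = trace(A B) for symmetric A, B
rhs = sum(lam[k] * V[:, k] @ B @ V[:, k] for k in range(ell))
print(np.isclose(lhs, rhs), lhs > 0)   # True True
```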
Proof of Theorem 8.1. The $\ell$ polynomials $\psi _j$ , $j=1,\ldots,\ell$ , of degree $\ell -1$ defined by (6.13) are linearly independent (e.g., since the matrix of their coefficients in the standard basis $\{{1,x,\ldots,x^{\ell -1}}\}$ is upper triangular with non-zero diagonal elements). Hence, the Gram matrix $A=(a_{ij})_{i,j}$ with
is positive definite.
We have by (7.3), similarly to (6.15)–(6.18),
and thus, as $n\to \infty$ ,
Note that (6.22) can be written as
The covariance matrices $ ({{Cov}(S_{in},S_{jn})} )_{i,j=1}^\ell$ are positive semidefinite, and thus so is the limit $B=(b_{ij})$ defined by (8.16). Hence Lemma 8.1 applies and yields, using (8.17) and the definition of $b_{ij}$ in (8.16), the equivalence (i) $\iff$ (iii).
Furthermore, (8.16) yields (iii) $\iff$ (iv).
The implication (iv) $\implies$ (v) is trivial, and the converse follows by the Cauchy–Schwarz inequality.
If (iii) holds, then (6.20) yields ${Var}\ \widehat{U}_n = O \big({n^{2\ell -2}} \big)$ , and (ii) follows by (6.21). Conversely, (ii) $\implies$ (i) by (3.18).
Moreover, for $m\geqslant 1$ , (v) $\iff$ (vi) holds by [Reference Janson36, Theorem 1], recalling ${\mathbb{E}} f_j(X_k)=0$ by (6.4). (Recall also that any stationary sequence $(W_k)_1^\infty$ of real random variables can be extended to a doubly-infinite stationary sequence $(W_k)_{-\infty }^\infty$ .) The case $m=0$ is trivial, since then (v) is equivalent to ${Var} f_j(X_k)=0$ and thus $f_j(X_k)=0$ a.s. by (6.4), while (vi) should be interpreted to mean that (8.6) holds for some non-random $Z_{j,k}=z_j$ .
Finally, suppose that $(X_i)_1^\infty$ is a block factor. In this case, [Reference Janson36, Theorem 2] shows that $Z_{j,k}$ can be chosen as in (8.7). (Again, the case $m=0$ is trivial.) Then (8.8) is an immediate consequence of (8.6)–(8.7).
Remark 8.1. It follows from the proof in [Reference Janson36] that in (vi), we can choose $Z_{j,k}$ such that the random vectors $(Z_{j,k})_{j=1}^\ell$, $k\geqslant 0$, also form a stationary $(m-1)$-dependent sequence.
Theorem 8.2. With assumptions and notation as in Theorem 3.4, define also $g_i$ , $i\in [b]$ , as in Remark 6.2, i.e., by (6.29) in the exactly constrained case and otherwise as a sum of such terms over all ${\mathcal{D}}^{\prime}$ given by (3.5). Also (again as in Remark 6.2), let $D$ be given by (4.7) and $Y_k$ by (4.8) with $M=D+1$ . Then $\sigma ^2=0$ if and only if for every $j\in [b]$ , there exists a stationary sequence $(Z_{j,k})_{k=0}^\infty$ of $(m+D-1)$ -dependent random variables such that a.s.
Moreover, if the sequence $(X_i)_1^\infty$ is independent and $\sigma ^2=0$ , then there exist functions $\varphi _j\,:\,{\mathcal{S}}^{D}\to \mathbb{R}$ such that (8.18) holds with
and consequently a.s.
thus $S_{n}(g_j)$ is independent of $X_{D+1},\ldots,X_n$ for every $j\in [b]$ and $n\gt D$.
Proof. As in the proof of Theorem 3.4, it suffices to consider $U_n(g)$ with $g$ given by Lemma 4.3 (with $M=D+1$ ). The first part then is an immediate consequence of Theorem 8.1(i) $\Leftrightarrow$ (vi) applied to $g$ and $Y_i\,:\!=\,(X_i,\ldots,X_{i+D})$ , with appropriate substitutions $\ell \mapsto b$ and $m\mapsto m+D$ .
The second part follows similarly by the last part of Theorem 8.1, with $\xi _i=X_i$ ; note that then $(Y_i)$ is a block factor as in (2.2), with $m$ replaced by $D$ .
Remark 8.2. Of course, under the assumptions of Theorem 8.2, the other equivalences in Theorem 8.1 hold as well, with the appropriate interpretations, substituting $g$ for $f$ and so on.
We give an example of a constrained $U$ -statistic where $\sigma ^2=0$ in a somewhat non-trivial way.
Example 8.1. Let $(X_i)_1^\infty$ be an infinite i.i.d. symmetric random binary string; i.e., ${\mathcal{S}}=\{{0,1}\}$ and $X_i\sim {Be}(1/2)$ are i.i.d. Let
and consider the constrained $U$ -statistic
which thus has constraint ${\mathcal{D}}=(1,\infty )$ . (In this case, $U_n(f,{\mathcal{D}})=U_n(f;\,{{\mathcal{D}}{=}})$ .) Note that (8.22) is a difference of two constrained subsequence counts.
Although the function (8.21) might look innocuous and non-degenerate at first glance, this turns out to be a degenerate case. In fact, it is easily verified that
Hence, with $m=0$ , $D=1$ , and $M=D+1=2$ , (5.4) yields
while (6.29) yields
Thus $g_2$ vanishes but not $g_1$ . Nevertheless, $g_1(Y_k)=g_1(X_k,X_{k+1})=\frac 12(X_k-X_{k+1})$ is of the type in (8.18)–(8.19) (with $Z_{1,k}\,:\!=\,-\frac 12X_{k+1}$ ). Hence, Theorem 8.2 shows that $\sigma ^2=0$ , and thus Theorem 3.4 and (3.8) yield $n^{-3/2}U_n(f;\,{{\mathcal{D}}{=}})\overset{p}{\longrightarrow }0$ .
In fact, in this example we have by (8.23), for $n\geqslant 3$ ,
Hence, by the law of large numbers for stationary $m$ -dependent sequences,
As a consequence, $n^{-1} U_n(f;\,{\mathcal{D}})$ has a non-degenerate limiting distribution. Note that this example differs in several respects from the degenerate cases that may occur for standard $U$-statistics, i.e. unconstrained $U$-statistics based on independent $(X_i)$. In this example, (8.28) shows that the asymptotic distribution is a linear transformation of a Bernoulli variable, and is thus neither normal nor of the type that appears as limits of degenerate standard $U$-statistics. (The latter are polynomials in independent normal variables, in general infinitely many; see e.g. Theorem A.2 and, in general, [Reference Rubin and Vitale56] and [Reference Janson35, Chapter 11].) Moreover, a.s. convergence to a non-degenerate limit cannot occur for standard $U$-statistics, where the limit is mixing.
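The displays (8.21)–(8.28) are not reproduced above, so the Monte Carlo sketch below uses our own reconstruction of the kernel: it counts occurrences of $101$ minus occurrences of $011$, both with first gap exactly 1. This is a hypothetical stand-in for (8.21), chosen to match the stated properties ($g_1(Y_k)=\frac 12(X_k-X_{k+1})$, $g_2=0$, and an a.s. limit linear in the Bernoulli variable $X_1$); under this reconstruction the a.s. limit of $n^{-1}U_n(f;\,{\mathcal{D}})$ works out to $(2X_1-1)/4=\pm 1/4$.

```python
import numpy as np

rng = np.random.default_rng(2)

def U_n(x):
    # Our stand-in for (8.22): occurrences of 101 minus occurrences of 011,
    # both with i2 = i1 + 1 and i3 > i2 (constraint D = (1, infinity)).
    n = len(x)
    pair = x[:-1] - x[1:]        # 1{x_k x_{k+1} = 10} - 1{x_k x_{k+1} = 01}
    cum = np.cumsum(pair)        # telescopes: cum[j] = x[0] - x[j+1]
    return sum(x[j] * cum[j - 2] for j in range(2, n))

for _ in range(4):
    x = rng.integers(0, 2, size=20000)
    # a.s. limit (2 X_1 - 1)/4 = +-1/4: a linear transformation of a Bernoulli.
    print(x[0], U_n(x) / len(x))
```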
8.1. The degenerate case in renewal theory
In the renewal theory setting in Theorem 3.8, the degenerate case is characterized by a modified version of the conditions above.
Theorem 8.3. With the assumptions and notation of Theorem 3.8, let $g_i$ , $i\in [b]$ , be as in Theorem 8.2 and Remark 6.2. Then $\gamma ^2=0$ if and only if for every $j\in [b]$ , the function
satisfies the condition (8.18). Moreover, if the sequence $(X_i)_1^\infty$ is independent and $\gamma ^2=0$ , then the functions $\widetilde{g}_j$ also satisfy (8.19)–(8.20).
The proof is given in Section 12. Note that ${\mathbb{E}} \widetilde{g}_j(Y_1)=0$ for each $j\in [b]$ by (6.4) and (3.25).
9. Rate of convergence
Here we use a different method from that used in the rest of the paper.
Proof of Theorem 3.5. We consider $U_n(f,{\mathcal{D}})$ ; the argument for $U_n(f;\,{{\mathcal{D}}{=}})$ is identical, and $U_n(f)$ is a special case.
Let $\mathcal{I}$ denote the set of all indices $(i_1,\ldots,i_\ell )$ in the sum (3.2); thus (3.2) can be written $U_n(f;\,{\mathcal{D}})=\sum _{I\in{\mathcal{I}}} Z_I$ , where $Z_{i_1,\ldots,i_\ell }\,:\!=\,f(X_{i_1},\ldots,X_{i_\ell })$ . Note that the size $|{\mathcal{I}}| \sim C n^b$ for some $C\gt 0$ , where $b=b({\mathcal{D}})$ .
We define a graph $\widehat{\mathcal{I}}$ with vertex set $\mathcal{I}$ by putting an edge between $I=(i_1,\ldots,i_\ell )$ and $I^{\prime}=(i^{\prime}_1,\ldots,i^{\prime}_\ell )$ if and only if $|i_j-i^{\prime}_k|\leqslant m$ for some $j,k\in \{1,\ldots,\ell \}$. Let $\Delta$ be one plus the maximum degree of the graph $\widehat{\mathcal{I}}$; it is easy to see that $\Delta =O\big(n^{b-1}\big)$. Moreover, it follows from the $m$-dependence of $(X_i)$ that $\widehat{\mathcal{I}}$ is a dependency graph for the random variables $(Z_I)_I$, meaning that if $A$ and $B$ are two disjoint subsets of $\mathcal{I}$ such that there is no edge between $A$ and $B$, then the two random vectors $(Z_I)_{I\in A}$ and $(Z_I)_{I\in B}$ are independent.
The result now follows from [Reference Rinott54, Theorem 2.2], which in our notation yields the following bound, with $\sigma ^2_n\,:\!=\,{Var}\ U_n \sim \sigma ^2 n^{2b-1}$ and $B\,:\!=\,2\sup |f|$ , which implies $|Z_I-{\mathbb{E}} Z_I|\leqslant B$ a.s. for every $I\in{\mathcal{I}}$ :
since $|{\mathcal{I}}|\Delta \leqslant C n^{2b-1}\leqslant C \sigma ^2_n$ and $B$ is a constant. (Alternatively, one could use the similar bound in [Reference Fang20, Theorem 2.1].)
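The objects in this proof are simple enough to compute by brute force for small $n$; the sketch below (unconstrained case, so $b=\ell$; the parameter choices and helper name are ours) illustrates the bound $\Delta =O\big(n^{b-1}\big)$.

```python
from itertools import combinations

def delta(n, ell, m):
    # 1 + max degree of the dependency graph on increasing ell-tuples,
    # with an edge when some pair of coordinates differs by at most m.
    idx = list(combinations(range(1, n + 1), ell))
    def adjacent(I, J):
        return any(abs(i - j) <= m for i in I for j in J)
    return 1 + max(sum(adjacent(I, J) for J in idx if J != I) for I in idx)

ell, m = 2, 1
for n in [8, 12, 16, 20]:
    print(n, delta(n, ell, m), n ** (ell - 1))   # Delta grows like n^{ell-1}
```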
Remark 9.1. The assumption in Theorem 3.5 that $f$ is bounded can be relaxed to the 6th moment condition ( ${A}_{6}$ ) by using Theorem 2.1 instead of Theorem 2.2 of [Reference Rinott54], together with Hölder’s inequality and straightforward estimates.
The similar bound [Reference Baldi and Rinott2, Corollary 2] gives the weaker estimate $d_K=O ({n^{-1/4}} )$ , assuming again that $f$ is bounded; this can be relaxed to ( ${A}_4$ ) by instead using [Reference Baldi and Rinott2, Theorem 1].
If, instead of the Kolmogorov distance, we use the Wasserstein distance $d_W$ (see e.g. [Reference Chen, Goldstein and Shao14, pp. 63–64] for several equivalent definitions, and for several alternative names), the estimate $d_W=O ({n^{-1/2}} )$ follows similarly from [Reference Barbour, Karoński and Ruciński3, Theorem 1], assuming only the third moment condition ( ${A}_3$ ); we omit the details. (Actually, [Reference Barbour, Karoński and Ruciński3] does not state the result for the Wasserstein distance but for a weaker version called bounded Wasserstein distance; however, the same proof yields estimates for $d_W$ .) See also [Reference Raič51, Theorem 3 and Remark 3], which yield the same estimate under ( ${A}_3$ ), and furthermore imply convergence in distribution assuming only ( ${A}_2$ ). (This thus yields an alternative proof of Theorems 3.3 and 3.4.)
Returning to the Kolmogorov distance, we do not believe that the moment assumption ( ${A}_{6}$ ) in Remark 9.1 is the best possible. For unconstrained and symmetric $U$-statistics, Malevich and Abdalimov [Reference Malevich and Abdalimov43, Theorem 2] have shown bounds for the Kolmogorov distance, which in particular show that then ( ${A}_3$ ) is sufficient to yield $d_K=O ({n^{-1/2}} )$; we conjecture that the same holds in our, more general, setting.
Conjecture 9.1. Theorem 3.5 holds assuming only ( ${A}_3$ ) (instead of $f$ bounded).
Remark 9.2. If we do not care about the rate of convergence, for bounded $f$ we can alternatively obtain convergence in distribution in (3.22), and thus in (3.19) and (3.21), by [Reference Janson34, Theorem 2], using the dependency graph $\widehat{\mathcal{I}}$ in the proof of Theorem 3.5. This can easily be extended to any $f$ satisfying the second moment condition ( ${A}_2$ ) by a standard truncation argument.
10. Higher moments and maximal functions
To prove Theorem 3.6, we will show estimates for maximal functions that will also be used in Sections 11 and 12. Let $p\geqslant 2$ be fixed throughout the section; explicit and implicit constants may thus depend on $p$ . We let
and use similar notation for maximal functions of other sequences of random variables.
We use another decomposition of $f$ and $U_n(f)$ which was used in [Reference Janson37] for the independent case ( $m=0$ ); unlike Hoeffding’s decomposition in Section 6, it focuses on the order of the arguments.
Recall from Section 5 that $(\widehat{X}_i)_1^\ell$ are i.i.d. with the same distribution as $X_1$ . Let $\widehat{F}_0\,:\!=\,\mu$ defined in (5.1) and, for $1\leqslant k\leqslant \ell$ ,
(These are defined at least for ${\mathcal{L}}(X_1)$ -a.e. $x_1,\ldots,x_k\in{\mathcal{S}}$ , which is enough for our purposes.) In other words, a.s.,
and thus $\widehat{F}_k(\widehat{X}_1,\ldots,\widehat{X}_k)$ , $k=0,\ldots,\ell$ , is a martingale, with the martingale differences $F_k(\widehat{X}_1,\ldots,\widehat{X}_k)$ , $k=1,\ldots,\ell$ . Hence, or directly from (10.2)–(10.3), for a.e. $x_1,\ldots,x_{k-1}$ ,
Furthermore, if $({A}_p)$ holds, then by (10.4) and Jensen’s inequality,
and thus by (10.3),
Lemma 10.1. Suppose that $({A}_p)$ holds for some $p\geqslant 2$ , and that $\mu =0$ . Then
Proof. We argue as in [Reference Janson37, Lemmas 4.4 and 4.7] with some minor differences. By (10.2)–(10.3), $f(x_1,\ldots,x_\ell ) =\widehat{F}_\ell (x_1,\ldots,x_\ell ) =\sum _{k=1}^\ell F_k(x_1,\ldots,x_k)$ for a.e. $x_1,\ldots,x_\ell$ , and thus, a.s.,
using a summation by parts and the identity $\binom{n^{\prime}}{\ell -k}-\binom{n^{\prime}-1}{\ell -k}=\binom{n^{\prime}-1}{\ell -k-1}$ . In particular,
Since the right-hand side is weakly increasing in $n$ , it follows that, a.s.,
We thus may consider each $F_k$ separately. Let $1\leqslant k\leqslant \ell$ , and let
By the definition (4.25), $\Delta U_n(F_k;\,\gt m)$ is a sum of $\binom{n-(k-1)m-1}{k-1}\leqslant n^{k-1}$ terms $F_k\big(X_{i_1},\ldots,X_{i_{k-1}},X_n\big)$ that all have the same distribution as $F_k({\widehat{X}_1,\ldots,\widehat{X}_k})$ , and thus by Minkowski’s inequality and (10.7),
Furthermore, in each such term $F_k\big(X_{i_1},\ldots,X_{i_{k-1}},X_n\big)$ we have $i_{k-1}\leqslant n-m-1$ . Hence, if we let ${\mathcal{F}}_i$ be the $\sigma$ -field generated by $X_1,\ldots,X_i$ , then, by $m$ -dependence, $X_n$ is independent of ${\mathcal{F}}_{i_{k-1}}$ , whence (10.5) implies
Consequently,
In the independent case $m=0$ treated in [Reference Janson37], this means that $U_n(F_k)$ is a martingale. In general, as a substitute, we may split $U_n(F_k;\,\gt m)$ into a sum of $m+1$ martingales. For $j=1,\ldots,m+1$ and $i\geqslant 1$, let
Then (10.15) implies that $(M_i^{(k,j)})_{i\geqslant 0}$ is a martingale, for each $k$ and $j$ . Hence, Burkholder’s inequality [Reference Gut25, Theorem 10.9.5] yields, for the maximal function $M_n^{(k,j)*}$ ,
Furthermore, Minkowski’s inequality yields (since $p/2\geqslant 1$ ), using also (10.16) and (10.13), for $n\geqslant 1$ ,
Combining (10.18) and (10.19) yields
It follows from (10.16)–(10.17) that
Hence (coarsely),
and thus (10.20) and Minkowski’s inequality yield
for $k=1,\ldots,\ell$ .
The result (10.8) now follows from (10.23) and (10.11) by a final application of Minkowski’s inequality.
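The splitting device (10.16)–(10.17) rests on the fact that terms whose indices differ by more than $m$ are independent, so that each residue class mod $m+1$ contributes a martingale. A minimal illustration, with a 1-dependent toy sequence of our own choosing:

```python
import numpy as np

rng = np.random.default_rng(8)
reps = 200000
xi = rng.standard_normal((reps, 6))
Y = xi[:, :-1] * xi[:, 1:]   # Y_k = xi_k xi_{k+1}: centered and 1-dependent (m = 1)

# Adjacent terms are dependent (their squares are positively correlated) ...
print(np.cov(Y[:, 0] ** 2, Y[:, 1] ** 2)[0, 1])   # ~ 2
# ... but terms in one residue class mod (m + 1) = 2 are independent, so the
# partial sums along each class are martingales; cf. the splitting (10.16)-(10.17).
print(np.cov(Y[:, 0] ** 2, Y[:, 2] ** 2)[0, 1])   # ~ 0
```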
Theorem 10.1. Suppose that $({A}_p)$ holds for some $p\geqslant 2$ . Then, with $b=b({\mathcal{D}})$ ,
The same results hold for an exact constraint ${\mathcal{D}}{=}$ .
Proof. We use induction on $b$ . We split the induction step into three cases.
Case 1: no constraint, i.e., ${\mathcal{D}}={\mathcal{D}}_\infty$ and $b=\ell$. By (4.27)–(4.28),
Thus,
Suppose first that $\mu =0$ ; then Lemma 10.1 applies to $U^*_n(f;\,\gt m)$ . Furthermore, the induction hypothesis applies to each term in the sum in (10.28), since $b({\mathcal{D}}_J)=\ell -|J|\leqslant \ell -1=b-1$ . Hence, Minkowski’s inequality yields
When $\mu =0$ , (3.7) yields
Combining (10.29) and (10.30) proves (10.24) when $\mu =0$. The general case follows by considering $f-\mu$; this does not affect $U_n(f)-{\mathbb{E}} U_n(f)$.
Finally, both (10.25) and (10.26) follow from (10.24) and (3.7).
Case 2: an exact constraint ${\mathcal{D}}{=}$ , $b\lt \ell$ . This is an immediate consequence of (4.9) in Lemma 4.2 and Case 1 applied to $g$ ; note that $g$ too satisfies $({A}_p)$ by (4.19).
Case 3: a constraint $\mathcal{D}$ , $b\lt \ell$ . This is a consequence of Case 2 by (3.4) and (5.5).
Lemma 10.2. Suppose that $({A}_p)$ holds for some $p\geqslant 2$ . Let $b\,:\!=\,b({\mathcal{D}})$ . Then the sequences
are uniformly $p$ th-power integrable.
The same holds for an exact constraint ${\mathcal{D}}{=}$ .
Proof. We consider the second sequence in (10.31); the proof for the first sequence differs only notationally.
We have so far let $f$ be fixed, so the constants above may depend on $f$ . However, it is easy to see that the proof of (10.26) yields
with $C_p$ independent of $f$ (but depending on $p$ ). (Note that we only have to consider a finite set of indices $(i_1,\ldots,i_\ell )$ , as discussed above (3.6).)
We truncate $f$: for $B\gt 0$, define $f_B(\textbf{x})\,:\!=\,f(\textbf{x})\boldsymbol{1}\{{|f(\textbf{x})|\leqslant B}\}$. Then (10.32) yields
where
as $B\to \infty$ .
Let $q\,:\!=\,2p$ . Since $f_B$ is bounded, we may apply (10.32) (or Theorem 10.1) with $p$ replaced by $q$ and obtain
Hence, for any $B$ , the sequence $n^{-\ell } U^*_n(f_B;\,{\mathcal{D}})$ is uniformly $p$ th-power integrable. Since $ U^*_n(f;\,{\mathcal{D}}) \leqslant U^*_n(f_B;\,{\mathcal{D}})+ U^*_n(f-f_B;\,{\mathcal{D}})$ , the result now follows from the following simple observation.
Lemma 10.3. Let $1\leqslant p\lt \infty$ . Let $(\xi _n)_{n\geqslant 1}$ be a sequence of random variables. Suppose that for every $\varepsilon \gt 0$ , there exist random variables $\eta ^\varepsilon _n$ and $\zeta ^\varepsilon _n$ , $n\geqslant 1$ , such that
-
(i) $|\xi _n|\leqslant \eta ^\varepsilon _n+\zeta ^\varepsilon _n$ ,
-
(ii) the sequence $(|\eta ^\varepsilon _n|^p)_n$ is uniformly integrable, and
-
(iii) $\lVert{\zeta ^\varepsilon _n}\rVert _p\leqslant \varepsilon$ .
Then $({|\xi _n|^p})_{n}$ is uniformly integrable.
Proof. Since (i) implies $|\xi _n|^p \leqslant 2^p|\eta ^\varepsilon _n|^p+2^p|\zeta ^\varepsilon _n|^p$ , it suffices (by appropriate substitutions) to consider the case $p=1$ . This is a simple exercise, using for example [Reference Gut25, Theorem 5.4.1].
11. Functional convergence
We begin by improving (10.8) in a special situation. (We consider only $p=2$ .)
Lemma 11.1. Suppose that ( ${A}_2$ ) holds and that $\mu =0$ and $f_i(X_i)=0$ a.s. for every $i=1,\ldots,\ell$ . Then
Proof. Note that (6.6) and (6.10) immediately give this estimate for $\lVert{U_n(f;\,\gt m)}\rVert _2$ . To extend it to the maximal function $U_n^*(f;\,\gt m)$ , we reuse the proof of Lemma 10.1 (with $p=2$ ), and analyze the terms $U^*_n(F_k;\,\gt m)$ further. First, by (10.4), (6.2), and the assumptions, for every $k\in [\ell ]$ ,
In particular, for $k=1$ , (11.2) yields $\widehat{F}_1(\widehat{X}_1)=0$ a.s., and thus
For $k\geqslant 2$ , as stated in the proof of Lemma 10.1, $\Delta U_n(F_k;\,\gt m)$ is a sum of $\leqslant n^{k-1}$ terms $F_k\big(X_{i_1},\ldots,X_{i_{k-1}},X_n\big)$ . Consider two such terms $F_k\big(X_{i_1},\ldots,X_{i_{k-1}},X_n\big)$ and $F_k\big(X_{i^{\prime}_1},\ldots,X_{i^{\prime}_{k-1}},X_n\big)$ , and suppose that $|i_j-i^{\prime}_{j^{\prime}}|\gt m$ for all $j,j^{\prime}\in [k-1]$ . Then all variables $X_{i_j}, X_{i^{\prime}_{j^{\prime}}}$ , and $X_n$ are independent, and thus a.s.
by (11.2). Hence, taking the expectation,
unless $|i_j-i^{\prime}_{j^{\prime}}|\leqslant m$ for some pair $(j,j^{\prime})$ . For each $(i_1,\ldots,i_{k-1})$ , there are only $O(n^{k-2})$ such $(i^{\prime}_1,\ldots,i^{\prime}_{k-1})$ , and for each of these, the expectation in (11.5) is $O(1)$ by ( ${A}_2$ ) and the Cauchy–Schwarz inequality. Consequently, summing over all appearing $(i_1,\ldots,i_{k-1})$ and $(i^{\prime}_1,\ldots,i^{\prime}_{k-1})$ ,
We have gained a factor of $n$ compared to (10.13). Hence, recalling (10.16) and using (11.6) in (10.18)–(10.19) (which for $p=2$ is essentially Doob’s inequality), we improve (10.20) to
Finally, (11.7) and (10.22) yield
for $2\leqslant k\leqslant \ell$ ; this holds trivially for $k=1$ too by (11.3). The result follows by (10.11) and (11.8).
Proof of Theorem 3.7. We prove (3.23); then (3.24) follows by (3.8). By replacing $f$ by $f-\mu$ , we may assume that $\mu =0$ .
Consider first the unconstrained case. We argue as in [Reference Janson37], with minor modifications. We use (6.6), which we write as follows (cf. (6.11)):
with, as in (6.12),
Lemma 11.1 applies to $f_*$ and shows that
which implies that the last term in (11.9) is negligible, so we concentrate on the sum. Define $\Delta a_{j,i,n} \,:\!=\,a_{j,{i+1},n}- a_{j,i,n}$ and, using a summation by parts,
Donsker’s theorem extends to $m$ -dependent stationary sequences [Reference Billingsley5], and thus, as $n\to \infty$ ,
for a continuous centered Gaussian process $W_j$ (a suitable multiple of Brownian motion); furthermore, as is easily seen, this holds jointly for $j=1,\ldots,\ell$ . Moreover, define
(Thus $\psi_j (s,1)=\psi_j (s)$ as defined in (6.13); the present homogeneous version is more convenient here.) Let $\psi ^{\prime}_j(s,t)\,:\!=\,\frac{\partial }{\partial s}\psi_j (s,t)$. Then straightforward calculations (as in [Reference Janson37, Lemma 4.2]) show that, extending (6.12),
uniformly for all $n,j,i$ that are relevant; moreover, the error terms with negative powers, i.e., $n^{\ell -2}$ for $\ell =1$ and $n^{\ell -3}$ for $\ell \leqslant 2$ , vanish.
By the Skorokhod coupling theorem [Reference Kallenberg41, Theorem 4.30], we may assume that the convergences (11.13) hold a.s., and similarly (see (11.11)), a.s.
It then follows from (11.12)–(11.16) and the homogeneity of $\psi _j$ that a.s., uniformly for $t\in [0,T]$ for any fixed $T$ ,
Summing over $j\in [\ell ]$ , we obtain by (11.9), (11.12), (11.18), and (11.17), a.s. uniformly for $t\in [0,T]$ for any $T$ ,
where
which obviously is a centered Gaussian process. We can rewrite (11.19) as
Finally we use (10.27), which implies
and thus, by Theorem 10.1, recalling $b({\mathcal{D}}_J)=\ell -|J|\leqslant \ell -1$ ,
It follows that, in each $D[0,T]$ and thus in $D[0,\infty )$ ,
Furthermore, recalling the assumption $\mu =0$ , ${\mathbb{E}} U_n(f)=O ({n^{\ell -1}} )$ by (3.7), and thus
The result (3.23) in the unconstrained case follows from (11.21), (11.24), and (11.25).
Joint convergence for several $f$ (in the unconstrained case) follows by the same proof.
Finally, as usual, the exactly constrained case follows by (4.9) in Lemma 4.2 and the constrained case then follows by (3.4), using joint convergence for all $g_{{{\mathcal{D}}^{\prime}{=}}}$ , with notation as in (3.4) and Lemma 4.2. To obtain joint convergence for several $f$ and $\mathcal{D}$ , we only have to choose $M$ in (4.8) large enough to work for all of them.
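The $m$-dependent Donsker ingredient (11.13) is easy to probe numerically. The sketch below uses a 1-dependent moving average of our own choosing, whose long-run variance $\sum _{|r|\leqslant m}{Cov}(Y_0,Y_r)$ equals 1, and checks the variance of the rescaled sum at $t=1$:

```python
import numpy as np

rng = np.random.default_rng(9)
reps = 5000
# 1-dependent moving average Y_k = (xi_k + xi_{k+1})/2; the long-run variance
# sum_{|r|<=1} Cov(Y_0, Y_r) = 1/2 + 2*(1/4) = 1 governs the Brownian limit.
for n in [100, 1000, 4000]:
    xi = rng.standard_normal((reps, n + 1))
    Y = 0.5 * (xi[:, :-1] + xi[:, 1:])
    print(n, (Y.sum(axis=1) / np.sqrt(n)).var())   # -> 1, cf. (11.13) at t = 1
```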
12. Renewal theory
Note first that by the law of large numbers for $m$ -dependent sequences,
As a simple consequence, we have the following (the case $N_+$ is in [Reference Janson33, Theorems 2.1 and 2.2]), which extends the well-known case of independent $X_i$ ; see e.g. [Reference Gut24, Sections 3.4 and 3.10].
Lemma 12.1. Assume that $\nu \,:\!=\,{\mathbb{E}} h(X_1)\gt 0$ and that ${\mathbb{E}} |h(X_1)|^2\lt \infty$ . As $x\to \infty$ ,
Proof. Note that, by the definitions (3.27)–(3.28),
Then (12.2) follows easily from (12.1). Furthermore, for any $\varepsilon \gt 0$ ,
as $x\to \infty$ , and (12.3) follows from (12.4), (12.5), and (12.2). We omit the standard details.
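The definitions (3.27)–(3.28) are not reproduced above; the sketch below assumes the standard first-passage definitions stated in its comments (chosen to be consistent with the overshoot $R(x)=S_{N_+(x)}-x\gt 0$ used in the proof of Theorem 3.9) and illustrates (12.2) with i.i.d. exponential summands:

```python
import numpy as np

rng = np.random.default_rng(3)

def passage_times(h_vals, x):
    # Assumed definitions, consistent with the overshoot R(x) = S_{N+(x)} - x > 0:
    #   N+(x) = min{n : S_n > x},  N-(x) = max{n : S_n <= x}.
    S = np.cumsum(h_vals)
    n_plus = int(np.argmax(S > x)) + 1
    n_minus = int(np.nonzero(S <= x)[0].max()) + 1
    return n_plus, n_minus

nu = 0.5                                        # nu = E h(X_1) > 0
for x in [10.0, 100.0, 1000.0]:
    h_vals = rng.exponential(nu, size=int(50 * x))  # i.i.d. positive summands
    n_plus, n_minus = passage_times(h_vals, x)
    print(x, n_plus / x, n_minus / x, 1 / nu)   # both ratios -> 1/nu, cf. (12.2)
```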
Moreover, still assuming ${\mathbb{E}} |h(X_1)|^2\lt \infty$ , Donsker’s theorem for $m$ -dependent sequences [Reference Billingsley5] yields, as in (11.13),
for a continuous centered Gaussian process $W_h(t)$ .
Proof of Theorem 3.8. Note that (12.6) is the special (unconstrained) case $f=h$ , ${\mathcal{D}}=()$ , $\ell =b=1$ of (3.23). By joint convergence in Theorem 3.7 for $(f,{\mathcal{D}})$ and $h$ , we thus have (3.24) jointly with (12.6). We use again the Skorokhod coupling theorem and assume that (3.24), (12.6), and (12.3), as well as (12.2), hold a.s.
Take $n\,:\!=\,\lceil{x}\rceil$ and $t\,:\!=\,N_\pm (x)/n$ , and let $x\to \infty$ . Then $t\to 1/\nu$ a.s. by (12.2), and thus (3.24) implies, a.s.,
Similarly, (12.6) implies, a.s.,
By (12.8) and (12.3), we have a.s.
Thus, by the binomial theorem, a.s.,
Hence, (12.7) yields, a.s.,
which yields (3.29) with
The exactly constrained case and joint convergence follow similarly.
Proof of Theorem 8.3. We may as in the proof of Theorem 3.8 assume that (3.24), (12.6), and (12.10) hold a.s. Recall that the proofs above use the decomposition (3.4) and Lemma 4.2 applied to every ${\mathcal{D}}^{\prime}{=}$ there, with $b\,:\!=\,b({\mathcal{D}})$ , $D$ given by (4.7), $M\,:\!=\,D+1$ (for definiteness), and $Y_i$ defined by (4.8). Furthermore, (4.20) holds with $g=g_{\mathcal{D}}\,:\,{\mathcal{S}}^b\to \mathbb{R}$ given by (4.23); in (3.23)–(3.24), we thus have the same limit $Z(t)$ for $U_n(f;\,{\mathcal{D}} ;\,(X_i) )$ and $U_{n}({g;\,(Y_i) })$ . We may assume that this limit holds a.s. also for $g$ .
Recall that $h\,:\,{\mathcal{S}}\to \mathbb{R}$ . We abuse notation and extend it to ${\mathcal{S}}^M$ by $h(x_1,\ldots,x_M)\,:\!=\,h(x_1)$ ; thus $h(Y_i)=h(X_i)$ . In particular, $S_n(h;\,(X_i))=S_n(h;\,(Y_i))$ , so we may write $S_n(h)$ without ambiguity. We define $H\,:\,({\mathcal{S}}^M)^b\to \mathbb{R}$ by
Note that (5.1) and (6.1) applied to the function $H$ yield
(Since $H$ is symmetric, $H_j$ is the same for every $j$ .)
In the unconstrained sum (3.1), there are $\binom{n-1}{\ell -1}$ terms that contain $X_i$ , for each $i\in [n]$ . Applying this to $H$ and $(Y_i)$ , we obtain by (12.13)
Hence, for each fixed $t\gt 0$ , by (12.6), a.s.,
Combining (3.23) (for $g$ and $(Y_i)$ ) and (12.17), we obtain that, a.s.,
Taking $t=\nu ^{-1}$ , we see that this converges to the random variable in (12.12). Let $G\,:\!=\,g-\mu _{\mathcal{D}}\nu ^{-1} H$ . Then a comparison with Theorem 3.3 (applied to $G$ ) shows that
In particular, $\gamma ^2=0$ if and only if $\sigma ^2(G)=0$ , and the result follows by Theorem 8.2, noting that $G_j\,:\!=\,g_j-\mu _{\mathcal{D}}\nu ^{-1} H_j=g_j+\mu _{\mathcal{D}}-\mu _{\mathcal{D}}\nu ^{-1} h$ by (12.15).
Proof of Theorem 3.9. This can be proved like [Reference Janson37, Theorem 3.13], by first stopping at $N_+(x_-)$ , with $x_-\,:\!=\,\lfloor{x-\ln x}\rfloor$ , and then continuing to $N_-(x)$ ; we therefore only sketch the details. Let $R(x)\,:\!=\,S_{N_+(x)}-x\gt 0$ be the overshoot at $x$ , and let $\Delta (x)\,:\!=\,x-S_{N_+(x_-)}=x-x_--R(x_-)$ . It is well known (see e.g. [Reference Gut24, Theorem 2.6.2]) that $R(x)$ converges in distribution as $x\to \infty$ . In particular, $\Delta (x)\overset{p}{\longrightarrow }+\infty$ and thus $\mathbb{P}[{\Delta (x)\gt 0}]\to 1$ . Since $N_+(x_-)$ is a stopping time and $(X_i)$ are independent, the increments of the random walk $S_n$ after $N_+(x_-)$ are independent of $U_{N_+(x_-)}(f)$ , and it follows that the overshoot $R(x-1)$ is asymptotically independent of $U_{N_+(x_-)}(f)$ . The event $\{{S_{N_-(x)}=x}\}$ equals $\{{R(x-1)=1}\}$ , and thus the asymptotic distribution of $U_{N_+(x_-)}(f)$ conditioned on $S_{N_-(x)}=x$ is the same as without conditioning, and given by (3.29). Finally, the difference between $U_{N_+(x_-)}(f)$ and $U_{N_\pm (x)}(f)$ is negligible, e.g. as a consequence of (3.24).
In the remainder of the section, we prove moment convergence. Since we here consider different exponents $p$ simultaneously, we let $C_p$ denote constants that may depend on $p$ . (By our standing convention, they may also depend on e.g. $f$ and $h$ .)
Lemma 12.2. Assume that $\nu \,:\!=\,{\mathbb{E}} h(X_1)\gt 0$ and that ${\mathbb{E}} |h(X_1)|^p\lt \infty$ for every $p\lt \infty$ . Then, for every $p\lt \infty$ , $A\geqslant 2/\nu$ , and $x\geqslant 1$ , we have
Proof. We have $N_+(x)\leqslant N_-(x)+1$ ; hence it suffices to consider $N_-(x)$ .
Let $A\geqslant 2/\nu$ . If $N_-(x)\geqslant Ax$ , then (12.4) implies
Hence, for any $p\geqslant 2$ , using (10.24) for $h$ (which is a well-known consequence of Doob’s and Burkholder’s, or Rosenthal’s, inequalities),
We replace $p$ by $2p$ and $A$ by $2^k A$ in (12.22), and sum for $k\geqslant 0$ ; this yields (12.20).
Lemma 12.3. Assume that ( ${A}_{p}$ ) and ${\mathbb{E}} |h(X_1)|^p\lt \infty$ hold for every $p\lt \infty$ , and that $\nu \,:\!=\,{\mathbb{E}} h(X_1)\gt 0$ . Then, for every $p\geqslant 1$ and $x\geqslant 1$ ,
Proof. Let $V_n\,:\!=\,U_n(f;\,{\mathcal{D}})-\frac{\mu _{\mathcal{D}}}{b!}n^b$ , let $B\,:\!=\,2/\nu$ , and choose $q\,:\!=\,2bp$ . Then the Cauchy–Schwarz inequality, (10.25), and Lemma 12.2 yield, with $V^*_x\,:\!=\,\sup _{n\leqslant x}|V_n|$ ,
In other words,
We may here replace $U_n(f;\,{\mathcal{D}})$ by $S_n(h)$ (and thus $b$ by 1 and $\mu _{\mathcal{D}}$ by $\nu$ ). This yields
By the same proof, this holds also if we replace $N_{\pm }(x)$ by ${N_{\pm }(x)}\mp 1$ . Using (12.4), it follows that
In particular, by Minkowski’s inequality,
Consequently, by Minkowski’s and Hölder’s inequalities together with (12.27) and (12.28),
Proof of Theorem 3.10. We have shown that the left-hand side of (3.29) is uniformly bounded in $L^p$ for $x\geqslant 1$ . By replacing $p$ with $2p$ , say, this implies that these left-hand sides are uniformly $p$ th-power integrable, for every $p\lt \infty$ , which implies convergence of all moments in (3.29).
The proof of Theorem 3.9 shows that, under the assumptions there, $\mathbb{P}(S_{N_-(x)}=x)$ converges to a positive limit as $x\to \infty$ ; hence $\mathbb{P}(S_{N_-(x)}=x)\gt c$ for some $c\gt 0$ and all large $x$ . This implies that the uniform $p$ th-power integrability also holds after conditioning (for large $x$ ), and thus all moments also converge in (3.29) after conditioning.
13. Constrained pattern matching in words
As stated in Section 1, Flajolet, Szpankowski and Vallée [Reference Flajolet, Szpankowski and Vallée23] studied the following problem; see also Jacquet and Szpankowski [Reference Jacquet and Szpankowski31, Chapter 5]. Consider a random string $\Xi _n=\xi _1\cdots \xi _n$, where the letters $\xi _i$ are i.i.d. random elements in some finite alphabet $\mathcal{A}$. (We may regard $\Xi _n$ as the initial part of an infinite string $\xi _1\xi _2\ldots$ of i.i.d. letters.) Consider also a fixed word $\textbf{w}=w_1\cdots w_\ell$ from the same alphabet. (Thus, $\ell \geqslant 1$ denotes the length of $\textbf{w}$; we keep $\textbf{w}$ and $\ell$ fixed.) Let $N_n(\textbf{w})$ be the (random) number of occurrences of $\textbf{w}$ in $\Xi _n$. More generally, for any constraint ${\mathcal{D}}=(d_1,\ldots,d_{{\ell -1}})$, let $N_n(\textbf{w};{\mathcal{D}})$ be the number of constrained occurrences. This is a special case of the general setting in (3.1)–(3.2), with $X_i=\xi _i$, and (cf. (1.1))
Consequently,
with $f$ given by (13.1).
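To make the counts concrete, here is a brute-force sketch of $N_n(\textbf{w};\,{\mathcal{D}})$. Since (3.2) is not reproduced above, the gap convention in the comments is our reading of it (a finite $d_j$ bounds the $j$th gap, consistent with Example 8.1, where ${\mathcal{D}}=(1,\infty )$ forces the first gap to be exactly 1); the helper name is ours.

```python
from itertools import combinations
import math, random

def N_constrained(xs, w, D):
    # N_n(w; D) by brute force. Our reading of the constraint in (3.2):
    # i_{j+1} - i_j <= d_j when d_j < infinity; d_j = math.inf leaves the gap
    # unconstrained. (The exact constraint D= would instead require
    # i_{j+1} - i_j = d_j for each finite d_j.)
    ell = len(w)
    return sum(
        all(idx[j + 1] - idx[j] <= D[j] for j in range(ell - 1))
        and all(xs[i] == c for i, c in zip(idx, w))
        for idx in combinations(range(len(xs)), ell)
    )

random.seed(4)
xs = random.choices("ab", k=60)
print(N_constrained(xs, "aba", (1, math.inf)))         # first gap forced to be 1
print(N_constrained(xs, "aba", (math.inf, math.inf)))  # plain subsequence count
```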
Denote the distribution of the individual letters by
We will, without loss of generality, assume $p(x)\gt 0$ for every $x\in{\mathcal{A}}$ . Then (13.2) and the general results above yield the following result from [Reference Flajolet, Szpankowski and Vallée23], with $b=b({\mathcal{D}})$ given by (2.1). The unconstrained case (also in [Reference Flajolet, Szpankowski and Vallée23]) is a special case. Moreover, the theorem also holds for the exactly constrained case, with $\mu _{{\mathcal{D}}{=}}=\prod _i p(w_i)$ and some $\sigma ^2(\textbf{w};\,{{\mathcal{D}}{=}})$ ; we leave the detailed statement to the reader. A formula for $\sigma ^2$ is given in [Reference Flajolet, Szpankowski and Vallée23, (14)]; we show explicitly that $\sigma ^2\gt 0$ except in trivial (non-random) cases, which seems to have been omitted from [Reference Flajolet, Szpankowski and Vallée23].
Theorem 13.1. (Flajolet, Szpankowski and Vallée [Reference Flajolet, Szpankowski and Vallée23].) With notation as above, as $n\to \infty$ ,
for some $\sigma ^2=\sigma ^2(\textbf{w};\,{\mathcal{D}})\geqslant 0$ , with
Furthermore, all moments converge in (13.4).
Moreover, if $|{\mathcal{A}}|\geqslant 2$ , then $\sigma ^2\gt 0$ .
Proof. By (13.2), the convergence (13.4) is an instance of (3.21) in Theorem 3.4 together with (3.8) in Theorem 3.1. The formula (13.5) follows from (3.11) since, by (13.1) and independence,
Moment convergence follows by Theorem 3.6; note that ( ${A}_{p}$ ) is trivial, since $f$ is bounded.
Finally, assume $|{\mathcal{A}}|\geqslant 2$ and suppose that $\sigma ^2=0$ . Then Theorem 8.2 says that (8.20) holds, and thus, for each $n\gt D$ , the sum $S_{n}(g_j)$ is independent of $\xi _{D+1},\ldots,\xi _n$ . We consider only $j=1$ . Choose $a\in{\mathcal{A}}$ with $a\neq w_1$ . Consider first an exact constraint ${\mathcal{D}}{=}$ . Then $g_{{\mathcal{D}}{=}}$ is given by (4.18). Since $f(x_1,\ldots,x_\ell )=0$ whenever $x_1=a$ , it follows from (4.18) that $g(y_1,\ldots,y_b)=0$ whenever $y_1=(y_{1k})_{k=1}^M$ has $y_{11}=a$ . Hence, (6.1) shows that
Consequently, on the event $\xi _1=\ldots =\xi _{n}=a$ , recalling (4.8) and $M=D+1$ , we have $g_1(Y_k)=g_1(\xi _k,\ldots,\xi _{k+D})=-\mu$ for every $k\in [n]$ . Thus,
On the other hand, as noted above, the assumption $\sigma ^2=0$ implies that $S_{n}(g_1)$ is independent of $\xi _{D+1}, \ldots,\xi _n$ . Consequently, (13.8) implies
regardless of $\xi _{D+1},\ldots,\xi _{n+D}$. This is easily shown to lead to a contradiction. For example, we have, by (6.4),
and thus, conditioning on $\xi _1,\ldots,\xi _D$ ,
since all terms with $k\gt D$ are unaffected by the conditioning and thus vanish by (13.10); this contradicts (13.9) for large $n$ , since $\mu \gt 0$ . This contradiction shows that $\sigma ^2\gt 0$ for an exact constraint ${\mathcal{D}}{=}$ .
Alternatively, instead of using the expectation as in (13.11), one might easily show that if $n\gt D+\ell$ , then $\xi _{D+1},\ldots,\xi _n$ can be chosen such that (13.9) does not hold.
For a constraint $\mathcal{D}$ , $g=g_{\mathcal{D}}$ is given by a sum (4.23) of exactly constrained cases ${\mathcal{D}}^{\prime}{=}$ . Hence, by summing (13.7) for these ${\mathcal{D}}^{\prime}{=}$ , it follows that (13.7) also holds for $g_{\mathcal{D}}$ (with $\mu$ replaced by $\mu _{{\mathcal{D}}}$ ). This leads to a contradiction exactly as above.
Theorem 13.1 shows that, except in trivial cases, the asymptotic variance $\sigma ^2\gt 0$ for a subsequence count $N_n(\textbf{w};\,{\mathcal{D}})$ , and thus (13.4) yields a non-degenerate limit, so that it really shows asymptotic normality. By the same proof (see also Remark 3.6), Theorem 13.1 extends to linear combinations of different subsequence counts (in the same random string $\Xi _n$ ), but in this case, it may happen that $\sigma ^2=0$ , and then (13.4) has a degenerate limit and thus yields only convergence in probability to 0. (We consider only linear combinations with coefficients not depending on $n$ .) One such degenerate example with constrained subsequence counts is discussed in Example 8.1. There are also degenerate examples in the unconstrained case. In fact, the general theory of degenerate (in this sense) $U$ -statistics based on independent $(X_i)_1^\infty$ is well understood; for symmetric $U$ -statistics this case was characterized by [Reference Hoeffding27] and studied in detail by [Reference Rubin and Vitale56], and their results were extended to the asymmetric case relevant here in [Reference Janson35, Chapter 11.2]. In Appendix A we apply these general results to string matching and give a rather detailed treatment of the degenerate cases of linear combinations of unconstrained subsequence counts. See also [Reference Even-Zohar, Lakrec and Tessler19] for further algebraic aspects of both non-degenerate and degenerate cases.
Problem 13.1. Appendix A considers only the unconstrained case. Example 8.1 shows that for linear combinations of constrained pattern counts, there are further possibilities to have $\sigma ^2=0$ . It would be interesting to extend the results in Appendix A and characterize these cases, and also to obtain limit theorems for such cases, extending Theorem A.2 (in particular the case $k=2$ ); note again that the limit in Example 8.1 is of a different type than the ones occurring in unconstrained cases (Theorem A.2). We leave these as open problems.
14. Constrained pattern matching in permutations
Consider now random permutations. As usual, we generate a random permutation ${\boldsymbol{\pi }}={\boldsymbol{\pi }}^{(n)}\in \mathfrak{S}_n$ by taking a sequence $(X_i)_1^n$ of i.i.d. random variables with a uniform distribution $X_i\sim U(0,1)$ , and then replacing the values $X_1,\ldots,X_n$ , in increasing order, by $1,\ldots,n$ . Then the number $N_n(\tau )$ of occurrences of a fixed permutation $\tau =\tau _1\cdots \tau _\ell$ in $\boldsymbol{\pi }$ is given by the $U$ -statistic $U_n(f)$ defined by (3.1) with
Similarly, for any constraint ${\mathcal{D}}=(d_1,\ldots,d_{{\ell -1}})$ , we have for the number of constrained occurrences of $\tau$ , with the same $f$ as given by (14.1),
Hence, Theorems 3.3 and 3.4 yield the following result showing asymptotic normality of the number of (constrained) occurrences. As stated in the introduction, the unconstrained case was shown by Bóna [Reference Bóna7], the case $d_1=\cdots =d_{{\ell -1}}=1$ by Bóna [Reference Bóna9], and the general vincular case by Hofer [Reference Hofer29]; we extend the result to general constrained cases. The fact that $\sigma ^2\gt 0$ was shown in [Reference Hofer29] (in vincular cases); we give a shorter proof based on Theorem 8.2. Again, the theorem also holds for the exactly constrained case, with $\mu _{{\mathcal{D}}{=}}=1/\ell !$ and some $\sigma ^2(\tau ;\,{{\mathcal{D}}{=}})$ .
Theorem 14.1. (Largely Bóna [Reference Bóna7, Reference Bóna9] and Hofer [Reference Hofer29].) For any fixed permutation $\tau \in \mathfrak{S}_\ell$ and constraint ${\mathcal{D}}=(d_1,\ldots,d_{{\ell -1}})$ , as $n\to \infty$ ,
for some $\sigma ^2=\sigma ^2(\tau ;\,{\mathcal{D}})\geqslant 0$ and
Furthermore, all moments converge in (14.3).
Moreover, if $\ell \geqslant 2$ , then $\sigma ^2\gt 0$ .
Proof. This is similar to the proof of Theorem 13.1. By (14.2), the convergence (14.3) is an instance of (3.21) together with (3.8). The formula (14.4) follows from (3.11), since $ \mu \,:\!=\,{\mathbb{E}} f(X_1,\ldots,X_\ell )$ by (14.1) is the probability that $X_1,\ldots,X_\ell$ have the same order as $\tau _1,\ldots,\tau _\ell$ , i.e., $1/\ell !$ . Moment convergence follows by Theorem 3.6.
Finally, suppose that $\ell \geqslant 2$ but $\sigma ^2=0$ . Then Theorem 8.2 says that (8.20) holds, and thus, for each $j$ and each $n\gt D$ , the sum $S_{n}(g_j)$ is independent of $X_{D+1},\ldots,X_n$ ; we want to show that this leads to a contradiction. We again choose $j=1$ , but we now consider two cases separately.
Case 1: $d_1\lt \infty$. Recall the notation in (4.11)–(4.14), and note that in this case
Assume for definiteness that $\tau _1\gt \tau _2$ . (Otherwise, we may exchange $\lt$ and $\gt$ in the argument below.) Then (14.1) implies that
Consider first the exact constraint ${\mathcal{D}}{=}$ . Then $g_{{\mathcal{D}}{=}}$ is given by (4.18). Hence, (14.6) and (14.5) imply that
In particular,
By (4.23), the same holds for the constraint $\mathcal{D}$ . Hence, (6.1) shows that, for $g=g_{\mathcal{D}}$ ,
Consequently, on the event $X_1\lt \cdots \lt X_{n+D}$ , recalling (4.8) and $M=D+1$ , we have $g_1(Y_k)=g_1(X_k,\ldots,X_{k+D})=-\mu _{\mathcal{D}}$ for every $k\in [n]$ , and thus
On the other hand, as noted above, the assumption $\sigma ^2=0$ implies that $S_{n}(g_1)$ is independent of $X_{D+1}, \ldots,X_n$ . Consequently, (14.10) implies that a.s.
However, in analogy with (13.10)–(13.11), we have ${\mathbb{E}} g_1(Y_k)=0$ by (6.4), and thus
since all terms with $D\lt k\leqslant n-D$ are unaffected by the conditioning and thus vanish. But (14.12) contradicts (14.11) for large $n$ , since $\mu _{\mathcal{D}}\gt 0$ . This contradiction shows that $\sigma ^2\gt 0$ when $\ell \geqslant 2$ and $d_1\lt \infty$ .
Case 2: $d_1=\infty$. In this case, $\ell _1=1$. Consider first, as before, the exact constraint ${\mathcal{D}}{=}$. Since $(X_i)$ are i.i.d., (4.18) and (6.1) yield, choosing $j_i\,:\!=\,(D+1)i$, say,
(with $\mu =1/\ell !$ ). Thus, recalling (4.8), we have
By Theorem 8.2, the assumption $\sigma ^2=0$ thus implies that the final sum in (14.14) is independent of $X_{D+1}$ , for any $n\geqslant D+1$ . Since $(X_i)$ are independent, this is possible only if $f_1(X_{D+1})=c$ a.s. for some constant $c$ , i.e., if $f_1(x)=c$ for a.e. $x\in (0,1)$ .
However, by (14.1), $f ({x,X_2,\ldots,X_\ell } )=1$ if and only if $\tau _1-1$ prescribed $X_j$ are in $(0,x)$ and in a specific order, and the remaining $\ell -\tau _1$ ones are in $(x,1)$ and in a specific order. Hence, (6.1) yields
Since $\ell \geqslant 2$ , $f_1(x)$ is a non-constant polynomial in $x$ .
This is a contradiction, and shows that $\sigma ^2\gt 0$ also when $d_1=\infty$ .
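As in Section 13, the counts $N_n(\tau ;\,{\mathcal{D}})$ are easy to compute by brute force for small $n$. The sketch below (helper names and the bounded-gap reading of the constraint are ours) includes the unconstrained inversion count $\tau =21$, whose mean $\binom n2/2$ serves as a sanity check.

```python
from itertools import combinations
import random

def rank_pattern(vals):
    # The relative-order pattern of vals, as the permutation of ranks 1..len(vals).
    order = sorted(range(len(vals)), key=lambda i: vals[i])
    ranks = [0] * len(vals)
    for r, i in enumerate(order, start=1):
        ranks[i] = r
    return tuple(ranks)

def N_constrained(perm, tau, D):
    # Occurrences of tau in perm with gaps i_{j+1} - i_j <= d_j (None = free);
    # this mirrors our reading of (14.2). Brute force, for small n only.
    ell = len(tau)
    return sum(
        all(d is None or idx[j + 1] - idx[j] <= d for j, d in enumerate(D))
        and rank_pattern([perm[i] for i in idx]) == tuple(tau)
        for idx in combinations(range(len(perm)), ell)
    )

random.seed(5)
n = 40
perm = random.sample(range(1, n + 1), n)
# Unconstrained tau = 21 counts inversions; E N_n = C(n,2)/2 = n(n-1)/4.
print(N_constrained(perm, (2, 1), (None,)), n * (n - 1) / 4)
print(N_constrained(perm, (2, 1), (1,)))   # adjacent inversions (descents)
```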
Remark 14.1. Although $\sigma ^2\gt 0$ for each pattern count $N_n(\tau ;\,{\mathcal{D}})$ with $\ell \gt 1$, non-trivial linear combinations might have $\sigma ^2=0$, and thus variance of lower order, even in the unconstrained case. (This is similar to the case of patterns in strings in Section 13 and Appendix A.) In fact, for the unconstrained case, it is shown in [Reference Janson, Nakamura and Zeilberger39] that for permutations $\tau$ of a given length $\ell$, the $\ell !$ counts $N_n(\tau )$ converge jointly, after normalization as above, to a multivariate normal distribution of dimension only $(\ell -1)^2$, meaning that there is a linear space of dimension $\ell !-(\ell -1)^2$ of linear combinations that have $\sigma ^2=0$. This is further analyzed in [Reference Even-Zohar18], where the spaces of linear combinations of $N_n(\tau )$ having variance $O ({n^{2\ell -r}} )$ are characterized for each $r=1,\ldots,\ell -1$, using the representation theory of the symmetric group. In particular, the highest degeneracy, with variance $\Theta ({n^{\ell +1}} )$, is obtained for the sign statistic $U_n({sgn})$, where ${sgn}(x_1,\ldots,x_\ell )$ is the sign of the permutation defined by the order of $(x_1,\ldots,x_\ell )$; in other words, $U_n({sgn})$ is the sum of the signs of the $\binom n\ell$ subsequences of length $\ell$ of a random permutation ${\boldsymbol{\pi }}^{(n)}\in \mathfrak{S}_n$. For $\ell =3$, the asymptotic distribution of $n^{-(\ell +1)/2}U_n({sgn})$ is of the type in (A.19); see [Reference Fisher and Lee21] and [Reference Janson, Nakamura and Zeilberger39, Remark 2.7]. For larger $\ell$, using the methods in Appendix A, the asymptotic distribution can be expressed as a polynomial of degree $\ell -1$ in infinitely many independent normal variables, as in (A.17); however, we do not know any concrete such representation.
We expect that, in analogy with Example 8.1, for linear combinations of constrained pattern counts, there are further possibilities to have $\sigma ^2=0$ . We have not pursued this, and we leave it as an open problem to characterize these cases with $\sigma ^2=0$ ; moreover, it would also be interesting to extend the results of [Reference Even-Zohar18] characterizing cases with higher degeneracies to constrained cases.
15. Further comments
We discuss here briefly some possible extensions of the present work. We have not pursued them, and they are left as open problems.
15.1. Mixing and Markov input
We have in this paper studied $U$ -statistics based on a sequence $(X_i)$ that is allowed to be dependent, but only under the rather strong assumption of $m$ -dependence (partly motivated by our application to constrained $U$ -statistics). It would be interesting to extend the results to weaker assumptions on $(X_i)$ , for example that it is stationary with some type of mixing property. (See e.g. [Reference Bradley12] for various mixing conditions and central limit theorems under some of them.)
Alternatively (or possibly as a special case of mixing conditions), it would be interesting to consider $(X_i)$ that form a stationary Markov chain (under suitable assumptions).
In particular, it seems interesting to study constrained $U$ -statistics under such assumptions, since the mixing or Markov assumptions typically imply strong dependence for sets of variables $X_i$ with small gaps between the indices, but not if the gaps are large.
Markov models are popular models for random strings. Substring counts, i.e., the completely constrained case of subsequence counts (see Remark 1.2), have been treated for Markov sources by e.g. [Reference Régnier and Szpankowski52], [Reference Nicodème, Salvy and Flajolet45], and [Reference Jacquet and Szpankowski31].
A related model for random strings is a probabilistic dynamic source; see e.g. [Reference Jacquet and Szpankowski31, Section 1.1]. For substring counts, asymptotic normality has been shown by [Reference Bourdon and Vallée11]. For (unconstrained or constrained) subsequence counts, asymptotic results on mean and variance are special cases of [Reference Bourdon and Vallée10] and [Reference Jacquet and Szpankowski31, Theorem 5.6.1]; we are not aware of any results on asymptotic normality in this setting.
15.2. Generalized $U$ -statistics
Generalized $U$ -statistics (also called multi-sample $U$ -statistics) are defined similarly to (3.1), but are based on two (for simplicity) sequences $(X_i)_1^{n_1}$ and $(Y_j)_1^{n_2}$ of random variables, with the sum in (3.1) replaced by a sum over all $i_1\lt \cdots \lt i_{\ell _1}\leqslant n_1$ and $j_1\lt \cdots \lt j_{\ell _2}\leqslant n_2$ , and $f$ now a function of $\ell _1+\ell _2$ variables. Limit theorems, including asymptotic normality, under suitable conditions are shown in [Reference Sen60], and extensions to asymmetric cases are sketched in [Reference Janson35, Example 11.24]. We do not know any extensions to $m$ -dependent or constrained cases, but we expect that such extensions are straightforward.
Appendix A. Linear combinations for unconstrained subsequence counts
As promised in Section 13, we consider here unconstrained subsequence counts in a random string $\Xi _n$ with i.i.d. letters, normalized as in Theorem 13.1, and study further the case of linear combinations of such normalized counts (with coefficients not depending on $n$ ); in particular, we study in some detail such linear combinations that are degenerate in the sense that the asymptotic variance $\sigma ^2=0$ .
The results are based on the orthogonal decomposition introduced in the symmetric case by Hoeffding [Reference Hoeffding28] (see also Rubin and Vitale [Reference Rubin and Vitale56]); this is extended to the asymmetric case in [Reference Janson35, Chapter 11.2], but the treatment there uses a rather heavy formalism, and we therefore give here a direct treatment in the present special case. (This case is somewhat simpler than the general case since we only have to consider finite-dimensional vector spaces below, but otherwise the general case is similar.) See also [Reference Even-Zohar, Lakrec and Tessler19], which contains a much deeper algebraic study of the asymptotic variance $\sigma ^2(f)$ and the vector spaces below, and in particular a spectral decomposition that refines (A.9).
Fix $\mathcal{A}$ and the random string $(\xi _i)_1^\infty$ . Assume, as in Section 13, that $p(x)\gt 0$ for every $x\in{\mathcal{A}}$ . Let $A\,:\!=\,|{\mathcal{A}}|$ , the number of different letters.
We fix also $\ell \geqslant 1$ and consider all unconstrained subsequence counts $N_n(\textbf{w})$ with $|\textbf{w}|=\ell$ . There are $A^\ell$ such words $\textbf{w}$ , and it follows from (13.2) and (13.1) that the linear combinations of these counts are precisely the asymmetric $U$ -statistics (3.1) for all $f\,:\,{\mathcal{A}}^\ell \to \mathbb{R}$ , by the relation
Note that Theorem 3.3 applies to every $U_n(f)$ , and thus (3.18) and (3.19) hold for some $\sigma ^2=\sigma ^2(f)\geqslant 0$ . (As stated above, this case of Theorem 3.3 with i.i.d. $X_i$ , i.e., the case $m=0$ , is also treated in [Reference Janson35, Corollary 11.20] and [Reference Janson37].)
Let $V$ be the linear space of all functions $f\,:\,{\mathcal{A}}^\ell \to \mathbb{R}$ . Thus $\dim V=A^\ell$ . Similarly, let $W$ be the linear space of all functions $h:{\mathcal{A}}\to \mathbb{R}$ , i.e., all functions of a single letter; thus $\dim W=A$ . Then $V$ can be identified with the tensor product $W^{\otimes \ell }$ , with the identification
We regard $V$ as a (finite-dimensional) Hilbert space with inner product
and, similarly, $W$ as a Hilbert space with inner product
Let $W_0$ be the subspace of $W$ defined by
Thus, $\dim W_0=A-1$ .
For a subset ${\mathcal{B}}\subseteq [\ell ]$, let $V_{\mathcal{B}}$ be the subspace of $V$ spanned by all functions $h_1\otimes \cdots \otimes h_\ell$ as in (A.2) such that $h_i\in W_0$ if $i\in{\mathcal{B}}$, and $h_i=1$ if $i\notin{\mathcal{B}}$. In other words, if for a given $\mathcal{B}$ we define $W'_i\,:\!=\,W_0$ when $i\in{\mathcal{B}}$ and $W'_i=\mathbb{R}$ when $i\notin{\mathcal{B}}$, then
It is easily seen that these $2^\ell$ subspaces of $V$ are orthogonal, and that we have an orthogonal decomposition
Furthermore, for $k=0,\ldots,\ell$ , define
Thus, we also have an orthogonal decomposition (as in [Reference Hoeffding28] and [Reference Rubin and Vitale56])
Note that, by (A.6) and (A.8),
Let $\Pi _{\mathcal{B}}$ and $\Pi _k=\sum _{|{\mathcal{B}}|=k}\Pi _{\mathcal{B}}$ be the orthogonal projections of $V$ onto $V_{\mathcal{B}}$ and $V_k$ . Then, for any $f\in V$ , we may consider its components $\Pi _kf\in V_k$ .
First, $V_0$ is the 1-dimensional space of constant functions in $V$ . Trivially, if $f\in V_0$ , then $U_n(f)$ is non-random, so ${Var}\ U_n(f)=0$ for every $n$ , and $\sigma ^2(f)=0$ . More interesting is that for any $f\in V$ , we have
Next, it is easy to see that taking ${\mathcal{B}}=\{{i}\}$ yields the projection $f_i$ defined by (6.1), except that $\Pi _{\{i\}} f$ is defined as a function on ${\mathcal{A}}^\ell$ ; to be precise,
Recalling (A.1), this leads to the following characterization of degenerate linear combinations of unconstrained subsequence counts.
Theorem A.1. With notation and assumptions as above, if $f\,:\,{\mathcal{A}}^\ell \to \mathbb{R}$ , then the following are equivalent:
-
(i) $\sigma ^2(f)=0$ .
-
(ii) $f_i=0$ for every $i=1,\ldots,\ell$ .
-
(iii) $\Pi _1f=0$ .
Proof. The equivalence (i) $\iff$ (ii) is a special case of Theorem 8.1; as stated above, this special case is also given in [Reference Janson37, Corollary 3.5].
The equivalence (ii) $\iff$ (iii) follows by (A.8) and (A.12), which give
Note that (A.8) and (A.12) also yield
Corollary A.1. The $A^\ell$ different unconstrained subsequence counts $N_n(\textbf{w})$ with $\textbf{w}\in{\mathcal{A}}^\ell$ converge jointly, after normalization as in (13.4), to a centered multivariate normal distribution in $\mathbb{R}^{A^\ell }$ whose support is a subspace of dimension $\ell (A-1)$ .
Proof. By Remark 3.6, we have joint convergence in Theorem 13.1 to some centered multivariate normal distribution in $V=\mathbb{R}^{A^\ell }$ . Let $L$ be the support of this distribution; then $L$ is a subspace of $V$ . Let $f\in V$ . Then, by Theorems 13.1 and A.1,
Hence $L=V_1$ , and the result follows by (A.10).
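Theorem A.1 can be checked mechanically for any given kernel on a finite alphabet: compute the one-variable projections $f_i$ of (6.1) (equivalently $\Pi _1 f$, via (A.12)) and test whether they all vanish. A minimal sketch (function names ours), applied to our transcription $\hat{x}_1\hat{x}_2$ of the kernel in Example A.1 below:

```python
import numpy as np

def coordinate_projections(f, p):
    # f: kernel as an array of shape (A,) * ell; p: letter distribution.
    # Returns mu = E f and the projections f_i(x) of (6.1), i.e. E f with
    # all arguments except the i-th integrated out, minus mu.
    ell = f.ndim
    mu = f
    for _ in range(ell):
        mu = mu @ p                   # integrate out one axis at a time
    fis = []
    for i in range(ell):
        g = np.moveaxis(f, i, 0)      # put slot i first
        for _ in range(ell - 1):
            g = g @ p                 # integrate out the other ell - 1 slots
        fis.append(g - mu)
    return float(mu), fis

# Kernel of Example A.1 (our transcription: f = hat-x_1 * hat-x_2) on A = {0,1}
# with p = (1/2, 1/2): all f_i vanish, so sigma^2(f) = 0 by Theorem A.1.
p = np.array([0.5, 0.5])
hat = np.array([-1.0, 1.0])
f = np.multiply.outer(hat, hat)
mu, fis = coordinate_projections(f, p)
print(mu, [bool(np.allclose(fi, 0)) for fi in fis])   # 0.0 [True, True]
```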
What happens in the degenerate case when $\Pi _1f=0$ and thus $\sigma ^2(f)=0$ ? For symmetric $U$ -statistics, this was considered by Hoeffding [Reference Hoeffding27] (variance) and Rubin and Vitale [Reference Rubin and Vitale56] (asymptotic distribution); see also Dynkin and Mandelbaum [Reference Dynkin and Mandelbaum16]. Their results extend to the present asymmetric situation as follows. We make a final definition of a special subspace of $V$ : let
In particular, $V_{\geqslant 1}$ consists of all $f$ with ${\mathbb{E}} f(\xi _1,\ldots,\xi _\ell )=0$. Note also that $f_*$ in (6.3), by (A.11) and (A.14), equals $f-\Pi _0 f-\Pi _1 f\in V_{\geqslant 2}$.
Lemma A.1. Let $0\leqslant k\leqslant \ell$ . If $f\in V_{\geqslant k}$ , then ${\mathbb{E}} U_n(f)^2 = O ({n^{2\ell -k}} )$ . Moreover, if $f\in V_{\geqslant k}\setminus V_{\geqslant k+1}$ , then ${\mathbb{E}} U_n(f)^2 = \Theta ({n^{2\ell -k}} )$ .
Proof. This is easily seen using the expansion (4.3) without the constraint $\mathcal{D}$ , and is similar to the symmetric case in [Reference Hoeffding27]; cf. also (in the more complicated $m$ -dependent case) the cases $k=1$ in (4.1) and $k=2$ in (6.10). We omit the details.
We can now state a general limit theorem that also includes degenerate cases.
Theorem A.2. Let $k\geqslant 1$ and suppose that $f\in V_{\geqslant k}$ . Then
where $Z$ is some polynomial of degree $k$ in independent normal variables (possibly infinitely many). Moreover, $Z$ is not degenerate unless $f\in V_{\geqslant k+1}$ .
Proof. This follows by [Reference Janson35, Theorem 11.19]. As noted in [Reference Janson35, Remark 11.21], it can also be reduced to the symmetric case in [Reference Rubin and Vitale56] by the following trick. Let $(\eta _i)_1^\infty$ be an i.i.d. sequence, independent of $(\xi _i)_1^\infty$ , with $\eta _i\sim U(0,1)$ ; then
where $\sum ^*$ denotes summation over all distinct $i_1,\ldots,i_\ell \in [n]$ , and the sum in (A.18) can be regarded as a symmetric $U$ -statistic based on $(\xi _i,\eta _i)_1^\infty$ . The result (A.17) then follows by [Reference Rubin and Vitale56].
Remark A.1. The case $k=1$ in Theorem A.2 is just a combination of Theorem 13.1 (in the unconstrained case) and Theorem A.1; then $Z$ is simply a normal variable. When $k=2$ , there is a canonical representation (where the number of terms is finite or infinite)
where $\zeta _i$ are i.i.d. $\textsf{N}(0,1)$ random variables and $\lambda _i$ are the non-zero eigenvalues (counted with multiplicity) of a compact self-adjoint integral operator on $L^2({\mathcal{A}}\times [0,1], \nu \times {d} t)$ , where $\nu \,:\!=\,{\mathcal{L}}(\xi _1)$ is the distribution of a single letter and ${d} t$ is Lebesgue measure; the kernel $K$ of this integral operator can be constructed from $f$ by applying [Reference Janson35, Corollary 11.5(iii)] to the symmetric $U$ -statistic in (A.18). We omit the details, but note that in the particular case $k=\ell =2$ , this kernel $K$ is given by
and thus the integral operator is
When $k\geqslant 3$ , the limit $Z$ can be represented as a multiple stochastic integral [Reference Janson35, Theorem 11.19], but we do not know any canonical representation of it. See also [Reference Rubin and Vitale56] and [Reference Dynkin and Mandelbaum16].
We give two simple examples of limits in degenerate cases; in both cases $k=2$ . The second example shows that although the space $V$ has finite dimension, the representation (A.19) might require infinitely many terms. (Note that the operator $T$ in (A.21) acts in an infinite-dimensional space.)
Example A.1. Let $\Xi _n$ be a symmetric binary string, i.e., ${\mathcal{A}}=\{{0,1}\}$ and $p(0)=p(1)=1/2$ . Consider
with
For convenience, we change notation and consider instead the letters $\hat{\xi }_i\,:\!=\,2\xi _i-1\in \{{\pm 1}\}$ ; then $f$ corresponds to
Thus
By the central limit theorem, $n^{-1/2}\sum _{i=1}^n \hat{\xi }_i\overset{d}{\longrightarrow } \zeta \sim \textsf{N}(0,1)$ , and thus (A.25) implies
This is an example of (A.17), with $k=\ell =2$ and limit given by (A.19), in this case with a single term in the sum and $\lambda _1 =1$ .
Note that in this example, the function $f$ is symmetric, so (A.22) is an example of a symmetric $U$ -statistic, and thus the result (A.26) is also an example of the limit result in [Reference Rubin and Vitale56].
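A quick Monte Carlo check of (A.26): since $\hat{\xi }_i^2=1$, we have $\sum _{i\lt j}\hat{\xi }_i\hat{\xi }_j=\frac 12\big[\big(\sum _{i=1}^n\hat{\xi }_i\big)^2-n\big]$, and we read the limit in (A.26) as $\frac 12(\zeta ^2-1)$, in line with $\lambda _1=1$. The sketch compares moments:

```python
import numpy as np

rng = np.random.default_rng(6)
n, reps = 2000, 5000
xi_hat = rng.choice([-1, 1], size=(reps, n))
S = xi_hat.sum(axis=1)
U = 0.5 * (S ** 2 - n)          # = sum_{i<j} hat-xi_i hat-xi_j, since hat-xi^2 = 1
zeta = rng.standard_normal(reps)
limit = 0.5 * (zeta ** 2 - 1)   # the limit in (A.26), as we read it
print((U / n).mean(), limit.mean())   # both ~ 0
print((U / n).var(), limit.var())     # both ~ 1/2
```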
Example A.2. Let ${\mathcal{A}}=\{{a,b,c,d}\}$ , with $\xi _i$ having the symmetric distribution $p(x)=1/4$ for each $x\in{\mathcal{A}}$ . Consider
with, writing $\boldsymbol{1}_y(x)\,:\!=\,\boldsymbol{1}\{{x=y}\}$ ,
Then $\Pi _0f=\Pi _1 f=0$ by symmetry, so $f\in V_{\geqslant 2}=V_2$ (since $\ell =2$ ).
Consider the integral operator $T$ on $L^2({\mathcal{A}}\times [0,1])$ defined by (A.21). Let $h$ be an eigenfunction with eigenvalue $\lambda \neq 0$ , and write $h_x(t)\,:\!=\,h(x,t)$ . The eigenvalue equation $Th=\lambda h$ then is equivalent to the following, using (A.21) and (A.28):
These equations hold a.e., but we can redefine $h_x(t)$ by these equations so that they hold for every $t\in [0,1]$ . Moreover, although originally we assume only $h_x\in L^2[0,1]$ , it follows from (A.29)–(A.32) that the functions $h_x(t)$ are continuous in $t$ , and then by induction that they are infinitely differentiable on $[0,1]$ . Note also that (A.29) and (A.30) yield $h_b(t)=-h_a(t)$ , and similarly $h_d(t)=-h_c(t)$ . Hence, we may reduce the system to
By differentiation, for $t\in (0,1)$ ,
Hence, with $\omega \,:\!=\,1/(2\lambda )$ ,
Furthermore, (A.34) yields $h_c(0)=0$ , and thus (A.37) has the solution (up to a constant factor that we may ignore)
By (A.36), we then obtain
However, (A.33) also yields $h_a(1)=0$, and thus we must have $\cos (1/(2\lambda ))=0$; hence
Conversely, for every $\lambda$ of the form (A.40), the argument can be reversed to find an eigenfunction $h$ with eigenvalue $\lambda$ . It follows also that all these eigenvalues are simple. Consequently, Theorem A.2 and (A.19) yield
where, as above, $\zeta _N$ are i.i.d. and $\textsf{N}(0,1)$ . A simple calculation, using the product formula for cosine (see [Reference Euler17, Section 12], [Reference Olver, Lozier, Boisvert and Clark46, 4.22.2]), shows that the moment generating function of the limit distribution $Z$ in (A.41) is
It can be shown that $Z\overset{d}{=} \frac 12\int _0^1 B_1(t)\,{d} B_2(t)$ if $B_1(t)$ and $B_2(t)$ are two independent standard Brownian motions; this is, for example, a consequence of (A.42) and the following calculation, using [Reference Cameron and Martin13] or [Reference Revuz and Yor53, p. 445] for the final equality:
We omit the details, but note that this representation of the limit $Z$ is related to the special form (A.28) of $f$ ; we may, intuitively at least, interpret $B_1$ and $B_2$ as limits (by Donsker’s theorem) of partial sums of $\boldsymbol{1}_a(\xi _i)-\boldsymbol{1}_b(\xi _i)$ and $\boldsymbol{1}_c(\xi _i)-\boldsymbol{1}_d(\xi _i)$ . In fact, in this example it is possible to give a rigorous proof of $n^{-1} U_n(f)\overset{d}{\longrightarrow } Z$ by this approach; again we omit the details.
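A numerical consistency check of this representation (the discretization is our own sketch): conditionally on $B_1$, the integral is centered Gaussian, so ${Var}\ Z=\frac 14\int _0^1 t\,{d}t=\frac 18$, and on the eigenvalue side $\sum _{N\geqslant 0}((2N+1)\pi )^{-2}=\frac 18$ as well, matching the second moment of (A.41) under our reading of the normalization in (A.19).

```python
import numpy as np

rng = np.random.default_rng(7)
steps, reps = 400, 20000
dt = 1.0 / steps
dB1 = rng.normal(0.0, np.sqrt(dt), size=(reps, steps))
dB2 = rng.normal(0.0, np.sqrt(dt), size=(reps, steps))
B1 = np.cumsum(dB1, axis=1) - dB1        # left endpoints B1(t_k), Ito convention
Z = 0.5 * np.sum(B1 * dB2, axis=1)       # Z = (1/2) int_0^1 B1 dB2, discretized
print(Z.var(), 1 / 8)                    # Var Z = (1/4) int_0^1 t dt = 1/8
lam = 1.0 / ((2 * np.arange(100000) + 1) * np.pi)
print(np.sum(lam ** 2))                  # sum_N ((2N+1) pi)^{-2} = 1/8 as well
```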
Funding information
This work was supported by the Knut and Alice Wallenberg Foundation.
Competing interests
There were no competing interests to declare during the preparation or publication of this article.
Acknowledgements
I thank Wojciech Szpankowski for stimulating discussions on patterns in random strings over the last 20 years. Furthermore, I thank Andrew Barbour and Nathan Ross for help with references, and the anonymous referees for helpful comments and further references; in particular, Theorem 3.5 is based on suggestions from a referee.