1. Introduction
Let
$X = (X_1, \ldots, X_n)$
be a random vector with coordinates in a Polish space E, and let
$f\,:\, E^n \to \mathbb{R}$
be a measurable function such that, for n large, f(X) is square-integrable. For a large class of such functions f, it is expected that, as n grows without bound, f(X) behaves like a normal random variable. To quantify this behavior, one is interested in bounding the distance between f(X) and the normal random variable
$\mathcal{N} \sim N (m_f, \sigma_f^2)$
where
$m_f = \mathbb{E}[f(X)]$
and
$\sigma_f^2 = Var (f(X))$
. Two such distances of interest are the Kolmogorov distance
$d_K(f(X), \mathcal{N}) := \sup_{t \in \mathbb{R}} \big| \mathbb{P}(f(X) \leq t) - \mathbb{P}(\mathcal{N} \leq t) \big|,$
and the Wasserstein distance
$d_W(f(X), \mathcal{N}) := \sup_{h} \big| \mathbb{E}[h(f(X))] - \mathbb{E}[h(\mathcal{N})] \big|,$
where this last supremum is taken over real-valued Lipschitz functions h such that
$|h(x) - h(y)| \leq |x - y|$
for all
$x, y \in \mathbb{R}$
.
For the case where the components of X are independent random variables, upper bounds on
$d_W(f(X), \mathcal{N})$
were first obtained in [Reference Chatterjee2], and these were extended to
$d_K(f(X), \mathcal{N})$
in [Reference Lachièze-Rey and Peccati14]. Both results rely on a class of difference operators that will be described in Section 2.
Very few results address the (weakly) dependent case, and in the present work we provide estimates on
$d_K(f(X), \mathcal{N})$
and
$d_W(f(X), \mathcal{N})$
when X is generated by a hidden Markov model. Such models are of interest because of their many applications in fields such as computational biology and speech recognition; see e.g. [Reference Durbin, Eddy, Krogh and Mitchison7]. Recall that a hidden Markov model (Z, X) consists of a Markov chain
$Z = (Z_1, \ldots, Z_n)$
which emits the observed variables
$X = (X_1, \ldots, X_n)$
. The possible states in Z are each associated with a distribution on the values of X. In other words the observation X is a mixture model where the choice of the mixture component for each observation depends on the component of the previous observation. The mixture components are given by the sequence Z. Note also that given Z, X is a Markov chain.
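To fix ideas, here is a minimal simulation sketch (ours) of such a model in terms of a transition matrix, an emission matrix, and an initial distribution (denoted P, Q, and μ in Section 2.1); the function name and the numerical values are illustrative only.

```python
import numpy as np

def sample_hmm(P, Q, mu, n, rng=None):
    """Sample (Z, X) from a hidden Markov model.

    P  : (S, S) transition matrix of the hidden chain Z
    Q  : (S, A) emission matrix, Q[s, a] = P(X_i = a | Z_i = s)
    mu : (S,)   initial distribution of Z_1
    """
    rng = np.random.default_rng() if rng is None else rng
    S, A = Q.shape
    Z = np.empty(n, dtype=int)
    X = np.empty(n, dtype=int)
    Z[0] = rng.choice(S, p=mu)
    X[0] = rng.choice(A, p=Q[Z[0]])
    for i in range(1, n):
        Z[i] = rng.choice(S, p=P[Z[i - 1]])   # hidden step
        X[i] = rng.choice(A, p=Q[Z[i]])       # emission from the current hidden state
    return Z, X

# Example: two hidden states, three observable symbols.
P = np.array([[0.9, 0.1], [0.2, 0.8]])
Q = np.array([[0.7, 0.2, 0.1], [0.1, 0.3, 0.6]])
mu = np.array([0.5, 0.5])
Z, X = sample_hmm(P, Q, mu, n=20)
```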
The content of the paper is as follows. Section 2 contains a short overview of results on normal approximation in the independent setting and introduces a simple transformation involving independent and identically distributed (i.i.d.) random variables that allows us to adapt these estimates to the hidden Markov model. Moreover, further quantitative bounds are provided for the special case when f is a Lipschitz function with respect to the Hamming distance. Applications to variants of the problems analyzed in [Reference Chatterjee2] and [Reference Lachièze-Rey and Peccati14] are developed in Sections 3 and 4, leading to an extra logarithmic factor in the various rates obtained there. The majority of the more technical computations are carried out in Section 5.
2. Main results
For a comprehensive review of Stein’s method we refer the reader to [Reference Chen, Goldstein and Shao4]. The exchangeable pairs approach was outlined in [Reference Chatterjee2]; we now recall a few of its main points.
Let
$W := f(X)$
. Originally in [Reference Chatterjee2], and then in [Reference Lachièze-Rey and Peccati14], various bounds on the distance between W and the normal distribution were obtained through a variant of Stein’s method. As is well known, Stein’s method is a way to obtain normal approximations based on the observation that the standard normal distribution
$\mathcal{N}$
is the only centered and unit-variance distribution that satisfies
$\mathbb{E}\big[g^{\prime}(\mathcal{N})\big] = \mathbb{E}\big[\mathcal{N} g(\mathcal{N})\big]$
for all absolutely continuous g with almost-everywhere (a.e.) derivative g
′ such that
$\mathbb{E}|g^{\prime}(\mathcal{N})| < \infty$
(see [Reference Chen, Goldstein and Shao4]), and for the random variable W,
$|\mathbb{E}[ W g(W) - g^{\prime}(W)]|$
can be thought of as a distance measuring the proximity of W to
$\mathcal{N}$
. In particular, for the Kolmogorov distance, the solutions
$g_t$
to the differential equation
$g^{\prime}_t(w) - w g_t(w) = \mathbf{1}_{\{w \leq t\}} - \mathbb{P}(\mathcal{N} \leq t), \qquad w \in \mathbb{R},$
are absolutely continuous with a.e. derivative such that
$\mathbb{E}\big| g^{\prime}_t(\mathcal{N})\big| < \infty$
(see [Reference Chen, Goldstein and Shao4]). Then,
$d_K(W, \mathcal{N}) = \sup_{t \in \mathbb{R}} \big| \mathbb{E}\big[ g^{\prime}_t(W) - W g_t(W)\big] \big| . \qquad (1)$
Further properties of the solutions
$g_t$
allow for upper bounds on
$\mathbb{E}\big[ g^{\prime}_t (W) - W g_t(W)\big]$
using difference operators associated with W that were introduced in [Reference Chatterjee2] (see [Reference Lachièze-Rey and Peccati14]). This is called the generalized perturbative approach in [Reference Chatterjee3], and we describe it next. First, we recall the perturbations used to bound the right-hand side of (1) in [Reference Chatterjee2, Reference Lachièze-Rey and Peccati14]. Let
$X^{\prime} = \big(X^{\prime}_1, \ldots, X^{\prime}_n\big)$
be an independent copy of X, and let
$W^{\prime} = f(X^{\prime})$
. Then (W, W
′) is an exchangeable pair, since it has the same joint distribution as (W
′, W). A perturbation
$W^A = f^A(X) := f\big(X^A\big)$
of W is defined through the change
$X^A$
of X as follows:
$X^A_i = X^{\prime}_i \text{ for } i \in A, \qquad X^A_i = X_i \text{ for } i \notin A,$
for any
$A \subseteq [n] := \{1, \ldots, n\}$
, including
$A = \emptyset$
. With these definitions, still following [Reference Chatterjee2], difference operators are defined for any
$\emptyset \subseteq A \subseteq [n]$
and
$i \notin A$
, as follows:

Moreover, set

and for
$k_{n, A} = 1 \Big/ \Big[ \binom{n}{|A|} (n - |A|) \Big]$
, set

Now, for
$W = f(X_1, \ldots, X_n)$
such that
$\mathbb{E} [W ]= 0$
,
$0 < \sigma^2 = \mathbb{E} \big[W^2\big] < \infty$
, and assuming all the expectations below are finite, [Reference Chatterjee2, Theorem 2.2] gives the bound

while [Reference Lachièze-Rey and Peccati14, Theorem 4.2] yields

where in both cases
$\mathcal{N}$
is now a standard normal random variable.
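To make these perturbative objects concrete, the following sketch (ours; the function f and the toy example are purely illustrative) builds the recombined vector $X^A$ from X and the independent copy X′ and evaluates a first-order difference $f\big(X^A\big) - f\big(X^{A \cup \{j\}}\big)$, the elementary building block of the operators entering (2) and (3).

```python
import numpy as np

def perturb(X, X_prime, A):
    """Return X^A: coordinates in A are taken from the independent copy X'."""
    XA = X.copy()
    idx = list(A)
    XA[idx] = X_prime[idx]
    return XA

def difference(f, X, X_prime, j, A=()):
    """f(X^A) - f(X^(A union {j})): the change caused by resampling coordinate j."""
    return f(perturb(X, X_prime, A)) - f(perturb(X, X_prime, set(A) | {j}))

# Toy example with independent 0/1 coordinates: f counts ascents.
rng = np.random.default_rng(0)
f = lambda x: int(np.sum(x[1:] > x[:-1]))
X, X_prime = rng.integers(0, 2, size=10), rng.integers(0, 2, size=10)
print(difference(f, X, X_prime, j=3))            # with A empty
print(difference(f, X, X_prime, j=3, A={5, 7}))  # on top of the perturbation A = {5, 7}
```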
Our main abstract result generalizes (2) and (3) to the case when X is generated by a hidden Markov model. It is as follows.
Proposition 2.1. Let
$(Z, X)$
be a hidden Markov model with Z an aperiodic time-homogeneous and irreducible Markov chain with finite state space
$\mathcal{S}$
, and X taking values in a non-empty finite alphabet
$\mathcal{A}$
. Let
$W := f(X_1, \ldots, X_n)$
with
$\mathbb{E}[W] = 0$
and
$0 < \sigma^2 = \mathbb{E}\big[W^2\big] < \infty$
. Then there exist a finite sequence of independent random variables
$R = \big(R_0, R_1, \ldots, R_{|\mathcal{S}|(n-1)}\big)$
, with
$R_i$
taking values in
$\mathcal{S} \times \mathcal{A}$
for
$i = 0, \ldots, |\mathcal{S}|(n-1)$
, and a measurable function
$h\,:\, (\mathcal{S} \times \mathcal{A})^{|\mathcal{S}|(n-1)+1} \longrightarrow \mathbb{R}$
such that
$h\big(R_0, \ldots, R_{|\mathcal{S}| (n-1)} \big)$
and
$f(X_1, \ldots, X_n)$
are identically distributed. Therefore,

and

where
$\mathcal{N}$
is a standard normal random variable.
At first glance the above results might appear to be simple corollaries to (2) and (3). Indeed, as is well known, every Markov chain (in a Polish space) admits a representation via i.i.d. random variables
$U_1, \ldots, U_n$
, uniformly distributed on (0, 1), and the inverse distribution function. Therefore,
$f(X_1, \ldots, X_n) \overset{d}{=} h(U_1, \ldots, U_n)$
, for some function h, where, as usual,
$\overset{d}{=}$
indicates equality in distribution. However, providing quantitative estimates for
$\mathbb{E} |\Delta_j h(U_1, \ldots, U_n)|$
via f seems to be out of reach, since passing from f to h involves the ‘unknown’ inverse distribution function. For this reason, we develop for our analysis a more amenable, although more restrictive, choice of i.i.d. random variables, which is described intuitively in the next paragraph and then again in greater detail in Section 2.1.
Consider
$R = \big(R_0, \ldots, R_{|\mathcal{S}|(n-1)}\big)$
as stacks of independent random variables on the
$|\mathcal{S}|$
possible states of the hidden chain that determine the next step in the process, with
$R_0$
specifying the initial state. Each
$R_i$
takes values in
$\mathcal{S} \times \mathcal{A}$
and is distributed according to the transition probability from the present hidden state. Then, one has
$f\big(X_1, \ldots , X_n\big) \overset{d}{=} h \big(R_0, \ldots , R_{|\mathcal{S}|(n-1)}\big)$
, for
$h = f \circ \gamma$
, where the function
$\gamma$
translates between R and X. This construction is carried out in more detail in the next section. Further note that when
$(X_i)_{i \geq 1}$
is a sequence of independent random variables, the hidden chain in the model consists of a single state, and then the function
$\gamma$
is the identity function and so
$h = f$
.
In order for Proposition 2.1 to be meaningful, a further quantitative study of the terms in the upper bounds is necessary. It turns out that the variance terms determine the order of decay; see Remark 5.2. Nevertheless, we obtain additional quantitative estimates for all the terms involved; the proofs are presented in Section 5.
Proposition 2.2. With the notation as above, let f be Lipschitz with respect to the Hamming distance, i.e., let
$|f(x) - f(y)| \leq c \sum_{i=1}^n \mathbf{1}_{x_i \neq y_i}$
for every
$x, y \in \mathcal{A}^n$
and where
$c > 0$
. Then, for any
$r > 0$
,
$\mathbb{E}\big|\Delta_i h(R)\big|^r \leq C_1 (\ln n)^r \quad \text{and} \quad \mathbb{E}\big|f(X) - \mathbb{E}[f(X)]\big|^r \leq C_2\, n^{r/2} (\ln n)^r$
for n large enough and
$C_1, C_2 > 0$
depending on r and the parameters of the model.
Let R
′ and R
′′ be independent copies of R, and let
$\widetilde{\mathbf{R}}$
be the random set of recombinations of R, R
′, and R
′′. The set
$\widetilde{\mathbf{R}}$
consists of
$3^{|R|}$
random vectors of size
$|R|$
:

Let

Then the following general bound for the variance terms holds.
Proposition 2.3. With the notation as above and for
$U = T_{|R|}(h)$
or
$U = T^{\prime}_{|R|}(h)$
, we have

where
$[|R|] := \{1, \ldots, |R| \}$
.
Note that in Proposition 2.3, the underlying function f is not assumed to be Lipschitz. Moreover, the function h is not symmetric, and therefore the expression above cannot be simplified further, in contrast to the analogous results in [Reference Lachièze-Rey and Peccati14]. The proofs of Propositions 2.2 and 2.3 are technical and are deferred to Sections 5.1 and 5.2, respectively.
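The suprema over recombinations of R, R′, and R′′ that enter these variance terms can be enumerated explicitly when |R| is small; a minimal sketch (ours), assuming only that each coordinate of a recombined vector is taken from the corresponding coordinate of R, R′, or R′′:

```python
from itertools import product

def recombinations(R, R1, R2):
    """All 3^|R| vectors whose i-th coordinate is R[i], R1[i], or R2[i]."""
    triples = list(zip(R, R1, R2))                    # (R_i, R'_i, R''_i) for each i
    return [tuple(t[c] for t, c in zip(triples, choice))
            for choice in product(range(3), repeat=len(triples))]

# Toy example with |R| = 3: the recombination set has 3^3 = 27 elements.
print(len(recombinations((1, 2, 3), (4, 5, 6), (7, 8, 9))))  # -> 27
```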
2.1. Additional details on the construction of R
Let (Z, X) be a hidden Markov model with Z an aperiodic time-homogeneous and irreducible Markov chain on a finite state space
$\mathcal{S}$
, and with X taking values in an alphabet
$\mathcal{A}$
. Let P be the transition matrix of the hidden chain, and let Q be the
$|\mathcal{S}| \times |\mathcal{A}|$
probability matrix for the observations; i.e.,
$Q_{ij}$
is the probability of seeing output j if the latent chain is in state i. Let the initial distribution of the hidden chain be
$\mu$
. Then

Next we introduce a sequence of independent random variables
$R_0, \ldots, R_{|\mathcal{S}|(n-1)}$
taking values in
$\mathcal{S} \times \mathcal{A}$
and a function
$\gamma$
such that
$\gamma\big(R_0, \ldots, R_{|\mathcal{S}| (n-1) }\big) = \big(Z_1, \ldots, Z_n;\, X_1, \ldots , X_n\big)$
. For any
$s, s^{\prime} \in \mathcal{S}$
,
$x \in \mathcal{A}$
, and
$i \in \{0, \ldots, n-1\}$
, let

The random variables
$R_i$
are well defined since
$\sum_x Q_{s, x} = 1$
for any
$s \in \mathcal{S}$
and
$\sum_s P_{s^{\prime}, s} = \sum_s \mu(s) = 1$
for any
$s^{\prime} \in \mathcal{S}$
. One can think of the variables
$R_i$
as a set of instructions indicating where the hidden Markov model goes next. The function
$\gamma$
reconstructs the realization
$(Z_i, X_i)_{i \geq 1}$
sequentially from the sequence
$(R_i)_{i \geq 0}$
. In particular,
$\gamma$
captures the following relations:

One can also think of the sequence
$(R_i)_{i \geq 0}$
as
$|\mathcal{S}|$
stacks of random variables on the
$|\mathcal{S}|$
possible states of the latent Markov chain, and the values being rules for the next step in the model. Note that only one variable on the ith level of the stack will be used to determine the
$(i+1)$
th hidden and observed pair. Furthermore, the distribution of the random variables
$R_i$
for
$i \geq 1$
encodes the transition and output probabilities in the P and Q matrices of the original model.
Thus one can write
$f\big(X_1, \ldots, X_n\big) = h\big(R_0, \ldots , R_{|\mathcal{S}| (n-1) }\big)$
, for
$h := f \circ \gamma$
, where the function
$\gamma$
does the translation from
$(R_i)_{i \geq 0}$
to
$\big(Z_i, X_i\big)_{i \geq 1}$
as described above.
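The sketch below (ours; the way R is grouped into stacks is one concrete choice consistent with the description above, not necessarily the paper's exact indexing) makes γ explicit: $R_0$ yields $(Z_1, X_1)$, and at each subsequent step only the instruction attached to the current hidden state is consulted.

```python
import numpy as np

def draw_instruction(row, Q, rng):
    """Draw (next_state, output) with probability row[s] * Q[s, x]."""
    s = rng.choice(len(row), p=row)
    x = rng.choice(Q.shape[1], p=Q[s])
    return s, x

def sample_instructions(P, Q, mu, n, rng=None):
    """R_0 plus, for each of the n-1 remaining steps, one instruction per hidden state."""
    rng = np.random.default_rng() if rng is None else rng
    R0 = draw_instruction(mu, Q, rng)
    stacks = [[draw_instruction(P[s], Q, rng) for s in range(P.shape[0])]
              for _ in range(n - 1)]
    return R0, stacks

def gamma(R0, stacks):
    """Reconstruct (Z, X): at each level, consult the instruction of the current state."""
    Z, X = [R0[0]], [R0[1]]
    for level in stacks:
        s, x = level[Z[-1]]
        Z.append(s)
        X.append(x)
    return np.array(Z), np.array(X)

P = np.array([[0.9, 0.1], [0.2, 0.8]])
Q = np.array([[0.7, 0.2, 0.1], [0.1, 0.3, 0.6]])
mu = np.array([0.5, 0.5])
Z, X = gamma(*sample_instructions(P, Q, mu, n=10))
```

Since only one instruction per level is consulted, resampling a single $R_i$ can change the realization only from the step it governs onward, which is the propagation property exploited repeatedly below.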
Let
$R^{\prime} = \big(R^{\prime}_0, \ldots, R_{|\mathcal{S}| (n-1)}^{\prime}\big)$
be an independent copy of R. Let
$A \subseteq \{0, 1, \ldots, |\mathcal{S}|(n-1)\}$
, and let the change
$R^A$
of R be defined as follows:

where, as before, when
$A = \{j\}$
we write
$R^j$
instead of
$R^{\{j\}}$
.
Recall that the ‘discrete derivative’ of h with a perturbation A is

Then (4) and (5) follow from (2) and (3), respectively, since when (Z, X) is a hidden Markov model one writes

where the sequence
$(R_i)_{i \geq 0}$
is a sequence of independent random variables.
Remark 2.1. (i) The idea of using stacks of independent random variables to represent a hidden Markov model is somewhat reminiscent of Wilson’s cycle popping algorithm for generating a random directed spanning tree; see [Reference Wilson17]. The algorithm has also been related to loop-erased random walks in [Reference Gorodezky and Pak9].
(ii) If S consists of a single state, making the hidden chain redundant, there is a single stack of instructions. This corresponds to the independent setting of [Reference Chatterjee2] and [Reference Lachièze-Rey and Peccati14], and then
$\gamma$
is just the identity function.
(iii) The same approach, via the use of instructions, is also applicable when
$\mathcal{A}$
and
$\mathcal{S}$
are infinite countable. The
$Q_{s,x}$
no longer form a finite matrix, but the same definition holds as long as
$\sum_{x \in \mathcal{A}} Q_{s,x} = 1$
for all
$s \in \mathcal{S}$
. We need a countably infinite number of independent instructions to encode
$(Z_i, X_i)_{1 \leq i \leq n}$
. In particular, let
$R_0$
and
$\big(R_{i, s}\big)_{1 \leq i \leq n, s \in \mathcal{S}}$
be such that

Then the function
$\gamma$
reconstructs
$\big(Z_i, X_i\big)_{1 \leq i \leq n}$
from
$R_0$
and
$\big(R_{i,s}\big)_{1 \leq i \leq n, s \in \mathcal{S}}$
via

(iv) It is possible to obtain bounds on the various terms involved in Proposition 2.2 and Proposition 2.3 in the case when
$|\mathcal{S}|$
is a function of n. Assuming that there exists a general deterministic upper bound
$g_r(n)$
on
$\mathbb{E} |\Delta_i h|^r$
, one can bound the non-variance terms in Proposition 2.2 by
$C |\mathcal{S}| \left(\sqrt{ g_6(n) } + g_3(n) \right) /\sigma^3 $
. The variance terms, using Proposition 2.3, will then be bounded by
$C \sqrt{ |\mathcal{S}| g_4(n) + |\mathcal{S}|^3 ( A ) } /\sigma^2$
, where A is a complicated expression that depends on the particular problem as well as on
$\mathbb{E}|\Delta_i h|^r$
and
$|\mathcal{S}|$
.
A crucial estimate used throughout the analysis is the following: if a change in an instruction propagates X levels (X being a random variable), then $\mathbb{P} (X > K) \leq (1 - \epsilon)^K$ for some absolute $0 < \epsilon < 1$. If $|\mathcal{S}|$ is finite, this holds under standard assumptions on the model. If $|\mathcal{S}|$ grows with n, then under some minor additional restrictions on the hidden Markov model, we have $\mathbb{P}(X > K) \leq (1 - 1/|\mathcal{S}|)^K$. Then K can be chosen to be at most a small power of n (we take $K = \ln n$ in the finite case, which explains the logarithmic factors in our bounds). For meaningful bounds, $|\mathcal{S}|$ can grow at most like $\ln n$. However, much more fine-tuning is necessary in the actual proof.
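For instance, in the finite case the choice of K behind these logarithmic factors comes from the following short computation (with r > 0 the desired polynomial rate, mirroring the optimization in the proof of Proposition 2.2):

$(1 - \epsilon)^{K} = n^{c \ln (1 - \epsilon)} = n^{-r} \quad \text{for } K = c \ln n \text{ and } c = - r / \ln(1 - \epsilon) > 0,$

so the probability that a perturbation propagates more than K levels is polynomially small, at the price of powers of $\ln n$ in the resulting moment bounds.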
For some recent results (different from ours) on normal approximation for functions of general Markov chains we refer the reader to [Reference Chen, Shao and Xu5].
3. Covering process
Although our framework was initially motivated by [Reference Houdré and Kerchev11] and the problem of finding a normal approximation result for the length of the longest common subsequences in dependent random words, some applications to stochastic geometry are presented below. Our methodology can be applied to other related settings, in particular to the variant of the occupancy problem introduced in the recent article [Reference Grabchak, Kelbert and Paris10] (see Remark 4.1).
Let
$(K, \mathcal{K})$
be the space of compact subsets of
$\mathbb{R}^d$
, endowed with the hit-and-miss topology. Let
$E_n$
be a cube of volume n, and let
$C_1, \ldots, C_n$
be random variables in
$E_n$
, called germs. In the i.i.d. setting of [Reference Lachièze-Rey and Peccati14], each
$C_i$
is sampled uniformly and independently in
$E_n$
; i.e., if
$T \subset E_n$
with measure
$|T|$
, then
$\mathbb{P}(C_i \in T) = |T| / n$
for all
$i \in \{1, \ldots, n\}$
.
Here, we consider
$C_1, \ldots, C_n$
, generated by a hidden Markov model in the following way. Let
$Z_1, \ldots, Z_n$
be an aperiodic irreducible Markov chain on a finite state space
$\mathcal{S}$
. Each
$s \in \mathcal{S}$
is associated with a measure
$m_s$
on
$E_n$
. Then for each measurable
$T \subseteq E_n$
,
$\mathbb{P}(C_i \in T \mid Z_i = s) = m_s(T).$
Assume that there are constants
$0 < c_m \leq c_M$
such that for any
$s \in \mathcal{S}$
and measurable
$T \subseteq E_n$
,
$c_m \frac{|T|}{n} \leq m_s(T) \leq c_M \frac{|T|}{n}.$
Note that
$c_m = c_M = 1$
recovers the setting of [Reference Lachièze-Rey and Peccati14].
Let
$K_1, \ldots, K_n$
be compact sets (grains) with
$Vol (K_i) \in (V_1, V_2)$
(absolute constants) for
$i = 1,\ldots, n$
. Let
$X_i = C_i + K_i$
for
$i = 1, \ldots, n$
be the germ–grain process. Consider the closed set formed by the union of the germs translated by the grains

We are interested in the volume covered by
$F_n$
,
$f_V := Vol(F_n),$
and the number of isolated grains,
$f_I := \sum_{i=1}^n \mathbf{1}_{\{ X_i \cap X_j = \emptyset \text{ for all } j \neq i \}}.$
Theorem 3.1. Let
$\mathcal{N}$
be a standard normal random variable. Then, for all
$n \in \mathbb{N}$
,


for some constant
$C > 0$
independent of n.
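To illustrate the two functionals appearing in Theorem 3.1, here is a simulation sketch (ours), with ball-shaped grains of a fixed radius as a simple stand-in for the compact sets $K_i$ and a two-state hidden chain whose second state biases the germs toward one half of the cube (so that the sampling densities remain bounded above and below, as assumed): $f_V$ is estimated by Monte Carlo integration over $E_n$, and $f_I$ counts the balls intersecting no other ball.

```python
import numpy as np
from scipy.spatial.distance import pdist, squareform

def germs(P, mu, n, d=2, rng=None):
    """Germs C_1, ..., C_n in the cube E_n = [0, n^(1/d)]^d, driven by a hidden chain."""
    rng = np.random.default_rng() if rng is None else rng
    L = n ** (1.0 / d)
    z = rng.choice(len(mu), p=mu)
    C = []
    for _ in range(n):
        x = rng.uniform(0, L, d)
        if z == 1 and rng.uniform() < 0.75:   # state 1: biased toward the left half,
            x[0] = rng.uniform(0, L / 2)      # keeping the density bounded above and below
        C.append(x)
        z = rng.choice(P.shape[0], p=P[z])
    return np.array(C)

def f_V(C, radius, n_mc=100_000, rng=None):
    """Monte Carlo estimate of the volume covered by the union of balls B(C_i, radius)."""
    rng = np.random.default_rng() if rng is None else rng
    n, d = C.shape
    L = n ** (1.0 / d)
    pts = rng.uniform(0, L, size=(n_mc, d))
    covered = np.zeros(n_mc, dtype=bool)
    for c in C:                               # a sample point is covered if it lies in some ball
        covered |= ((pts - c) ** 2).sum(axis=1) <= radius ** 2
    return covered.mean() * n                 # fraction covered times Vol(E_n) = n

def f_I(C, radius):
    """Number of isolated grains: balls intersecting no other ball."""
    D = squareform(pdist(C))
    np.fill_diagonal(D, np.inf)
    return int((D.min(axis=1) > 2 * radius).sum())

P, mu = np.array([[0.9, 0.1], [0.2, 0.8]]), np.array([0.5, 0.5])
C = germs(P, mu, n=200)
print(f_V(C, radius=0.5), f_I(C, radius=0.5))
```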
The study of the order of growth of
$Var f_I$
and
$Var f_V$
is beyond the scope of the present paper. In the independent case, there are constants
$0 < c_V \leq C_V$
such that
$c_V n \leq Var f_V \leq C_V n$
and
$c_V n \leq Var f_I \leq C_V n $
for n sufficiently large (see [Reference Kendall and Molchanov13, Theorem 4.4]). In our dependent setting, a variance lower bound of order n will thus provide a rate of order
$(\log n)^4/\sqrt{n}$
.
The proofs of the normal approximations for
$f_V$
and
$f_I$
are carried out in Sections 3.1 and 3.2. Many of the more technical computations are carried out in Section 5.
3.1. Normal approximation for
$f_V$
Write
$f_V\big(X_1, \ldots, X_n\big) = h\big(R_0, \ldots, R_{|\mathcal{S}|(n-1)}\big)$
for a set of instructions R defined as in Section 2.1. The volume of each grain is bounded by
$V_2$
, so
$f_V$
is Lipschitz with respect to the Hamming distance, with constant
$V_2$
. Proposition 2.1 holds, and from Proposition 2.2, the non-variance terms in the bounds in Proposition 2.1 are bounded by
$C (\ln n)^3 / \sqrt{n}$
. Here and below, C is a constant, independent of n, which can vary from line to line. Indeed, for instance,

To analyze the bound on the variance terms given by Proposition 2.3, first note that

using Proposition 2.2. The other terms are bounded as follows.
Proposition 3.1. Let
$A \subsetneq [ | R | ] $
, and let
$B_{|R|}(h)$
,
$B_{|R|}^k(h)$
, and
$B_{|R|}^j(h)$
be as in (6). Then



for some constant
$C > 0$
that does not depend on n.
Proof. See Section 5.3 for the proof of the first bound. The others follow similarly.
The bound on the variance terms in Proposition 2.3 becomes

3.2. Normal approximation for
$f_I$
The proof of (9) is more involved since the function
$f_I$
is not Lipschitz. Abusing notation, write
$f_I\big(X_1, \ldots, X_n\big) = h\big(R_0, \ldots, R_{|\mathcal{S}|(n-1)}\big)$
for a set of instructions R as in Section 2.1. Proposition 2.1 holds, and, as in our analysis for
$f_V$
, we proceed by estimating the non-variance terms in the bounds. The following holds.
Proposition 3.2. For any
$t = 1, 2, \ldots$
and
$i \in \{0, \ldots, |\mathcal{S}|(n-1)\}$
,

where
$C = C(t) > 0$
.
Proof. See Section 5.4. The approach is similar to the one employed for Proposition 2.2 and uses a graph representation.
Therefore, for the non-variance term in Proposition 2.1, we have

We are left to analyze the bound on the variance terms given by Proposition 2.3. First, note that using Proposition 3.2,

Proposition 3.3. Let
$A \subsetneq [ | R | ] $
, and let
$B_{|R|}(h)$
,
$B_{|R|}^k(h)$
, and
$B_{|R|}^j(h)$
be as in (6). Then the bounds (11), (12), and (13) hold in this setting as well.
Proof. See Section 5.5.
The bound on the variance terms in Proposition 2.3 becomes

4. Set approximation with random tessellations
Let
$K \subseteq [0,1]^d$
be compact, and let X be a finite collection of points in K. The Voronoi reconstruction, or the Voronoi approximation, of K based on X is given by

For
$x \in [0,1]^d$
, denote by
$V(x;\, X)$
the Voronoi cell with nucleus x within X, given by

where
$(X, x) = X \cup \{x\}$
, and where, as usual,
$||\cdot||$
is the Euclidean norm in
$\mathbb{R}^d$
. The volume approximation of interest is

In [Reference Lachièze-Rey and Peccati14],
$X = (X_1, \ldots, X_n)$
is a vector of n i.i.d. random variables uniformly distributed on
$[0,1]^d$
. Here, we consider
$X_1, \ldots, X_n$
generated by a hidden Markov model in the following way. Let
$Z_1, \ldots, Z_n$
be an aperiodic irreducible Markov chain on a finite state space
$\mathcal{S}$
. Each
$s \in \mathcal{S}$
is associated with a measure
$m_s$
on
$[0,1]^d$
. Then for each measurable
$T \subseteq [0,1]^d$
,
$\mathbb{P}(X_i \in T \mid Z_i = s) = m_s(T).$
Assume, moreover, that there are constants
$0 < c_m \leq c_M$
such that for any
$s \in \mathcal{S}$
and measurable
$T \subseteq [0,1]^d$
,
$c_m\, Vol(T) \leq m_s(T) \leq c_M\, Vol(T).$
Recall the notions of Lebesgue boundary of K given by

and

where d(x, A) is the Euclidean distance from
$x \in \mathbb{R}^d$
to
$A \subseteq \mathbb{R}^d$
.
Now, for
$\beta > 0$
, let

Next, recall that K is said to satisfy the (weak) rolling ball condition if

Our main result is as follows.
Theorem 4.1. Let
$K \subseteq [0,1]^d$
satisfy the rolling ball condition. Moreover, assume that there exist
$S_-(K), \, S_+(K), \, \alpha > 0$
such that

Then, for
$n \geq 1$
,

where
$C > 0$
is a constant not depending on n.
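Before turning to the proofs, here is a Monte Carlo sketch (ours) of the functional $\varphi$, assuming the standard definition of the Voronoi approximation as the union of the Voronoi cells whose nuclei fall in K: a point of $[0,1]^d$ belongs to the approximation precisely when its nearest nucleus lies in K, so $\varphi(X)$ can be estimated by the fraction of uniformly sampled points with that property.

```python
import numpy as np
from scipy.spatial import cKDTree

def voronoi_volume(X, in_K, n_mc=200_000, rng=None):
    """Monte Carlo estimate of the volume of the Voronoi approximation of K based on X.

    X    : (n, d) array of nuclei in [0, 1]^d
    in_K : vectorized boolean test of membership in K
    """
    rng = np.random.default_rng() if rng is None else rng
    U = rng.uniform(size=(n_mc, X.shape[1]))
    _, nearest = cKDTree(X).query(U)     # index of the nearest nucleus of each sample point
    return in_K(X)[nearest].mean()

# Example: K is the centered ball of radius 1/4 in [0, 1]^2.
rng = np.random.default_rng(2)
X = rng.uniform(size=(500, 2))
in_K = lambda pts: ((pts - 0.5) ** 2).sum(axis=1) <= 0.25 ** 2
print(voronoi_volume(X, in_K, rng=rng))  # close to Vol(K) = pi/16 when there are many nuclei
```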
As in [Reference Lachièze-Rey and Peccati14], we split Theorem 4.1 into two results. The first one establishes a central limit theorem.
Proposition 4.1. Let
$0 < \sigma^2 = Var( \varphi(X))$
. Assume that
$Vol(\partial K^r) \leq S_+(K) r^{\alpha}$
for some
$S_+(K), \alpha > 0$
. Then, for
$n \geq 1$
,

where
$C > 0$
is a constant not depending on n.
The second result introduces bounds on the variance under some additional assumptions.
Proposition 4.2. Let
$K \subseteq [0,1]^d$
satisfy the rolling ball condition. Moreover, assume that there exist
$S_-(K), \, S_+(K), \, \alpha > 0$
such that

Then, for n sufficiently large,

for some
$C_d^-, C_d^+ > 0$
.
It is clear that Theorem 4.1 will be proved once Proposition 4.1 and Proposition 4.2 are established.
4.1. Proof of Proposition 4.1
Again, as before, we introduce a set of instructions R and a function h such that
$h(R) = \varphi(X)$
. We apply Proposition 2.1, and the initial step is to bound
$\mathbb{E}|\Delta_i h(R)|^r$
, where
$ r > 0$
. In fact, the following holds.
Proposition 4.3. Under the assumptions of Proposition 4.1,

where
$c_{d,r , \alpha}$
depends on the parameters of the model and the dimension d, as well as on r and
$\alpha$
. Moreover, for
$n,q \geq 1$
,

for some
$C_{d, r, \alpha} > 0$
.
Before presenting the proof, we introduce some notation. Recall that
$x, y \in [0,1]^d$
are said to be Voronoi neighbors within the set X if
$V(x;\, X) \cap V(y;\, X) \neq \emptyset$
. In general, the Voronoi distance
$d_V(x, y;\, X)$
between x and y within X is given by the smallest
$k \geq 1$
for which there exist
$x = x_0, x_1, \ldots, x_{k-1}, x_k = y, \quad \text{with } x_1, \ldots, x_{k-1} \in X,$
such that
$x_i, x_{i+1}$
are Voronoi neighbors for
$i = 0, \ldots, k-1$
.
Denote by
$v(x, y;\, X) = Vol\big(V(y;\, X) \cap V(x;\, (y, X)) \big)$
the volume that
$V(y;\,X)$
loses when x is added to X. Then, for
$x \notin X$
,

Let
$R_k(x;\, X)$
be the distance from x to the farthest point in the cell of a kth-order Voronoi neighbor in X; i.e., for
$X = (X_1, \ldots, X_n)$
,

with
$R(x;\, X) := R_1(x;\, X)$
. If x does not have kth-order neighbors, take
$R_k(x;\, X) = \sqrt{d}$
. Then

where
$\kappa_d = \pi^{d/2} / \Gamma(d/2 + 1)$
is the volume of the unit ball in
$\mathbb{R}^d$
.
Proof of Proposition 4.3. The proof relies on two results. First, we have the following technical lemma which is established in Section 5.6.
Lemma 4.1. Assume there exist
$S_+(K), \alpha > 0$
such that
$Vol(\partial K^r) \leq S_+(K) r^{\alpha}$
for all
$r > 0$
. Let

Then, for some
$c_{d, qd + \alpha, k} >0$
,

for all
$n \geq 1$
,
$q \geq 1$
.
Second, within our framework, we have the following version of [Reference Lachièze-Rey and Peccati14, Proposition 6.4] where S(R) is the original set of points generated by R and
$S(R^i)$
is the set of points generated after the change in the instruction
$R_i$
.
Proposition 4.4. (i) If, for every
$s \in S(R) \setminus S\big(R^i\big)$
, the set
$R_1(s, S(R))$
, which contains s and all its neighbors, is either entirely in K or entirely in
$K^c$
, then
$\Delta_i h(R)= 0$
. A similar result holds for
$s \in S\big(R^i\big) \setminus S(R)$
and the set
$R_1\big(s, S\big(R^i\big)\big)$
.
(ii) Assume
$|i - j|$
is large enough so that
$(S\big(R^i\big) \setminus S(R) ) \cup (S(R^j) \setminus S(R)) = S\big(R^{ij}\big) \setminus S(R)$
, where
$S\big(R^{ij}\big)$
is the set of points generated after the changes in both
$R_i$
and
$R_j$
. Suppose that for every
$s_1 \in S\big(R^i\big) \Delta S(R)$
and
$s_2 \in S\big(R^j\big) \Delta S(R)$
, at least one of the following holds:
-
1.
$d_V( s_1, s_2;\, S\big(R^{ij}\big) \cap S(R)) \geq 2$ , or
-
2.
$d_V\big(s_1, \partial K;\, S\big(R^{ij}\big) \cap S(R)\big) \geq 2$ and
$d_V\big(s_2, \partial K;\, S\big(R^{ij}\big) \cap S(R)\big) \geq 2$ .
Then
$\Delta_{i,j} h(R) = 0$
.
Now, write

As before, for some
$T > 0$
, there exist an event E and
$\epsilon > 0$
such that, conditioned on E,
$|S(R^i) \setminus S(R)| = |S(R) \setminus S(R^i)| \leq T$
and
$\mathbb{P}(E^c) \leq (1 - \epsilon)^T$
. Then, from Lemma 4.1, there exist
$S_+(K), \alpha > 0$
such that

where
$c_{d,r , \alpha}$
depends on the parameters of the model and the dimension d, as well as on r and
$\alpha$
. If
$T = c \ln n$
for a suitable
$c > 0$
, then

as desired. An application of the rth-moment Efron–Stein inequality (see [Reference Houdré and Ma12, Reference Rhee and Talagrand16]) then yields (23).
For the non-variance term in Proposition 2.1, we have

To analyze the bound on the variance terms given by Proposition 2.3, first note that

again using Proposition 4.3. The other terms are bounded as follows.
Proposition 4.5. Let
$A \subsetneq [ | R | ] $
, and let
$B_{|R|}(h), B_{|R|}^k(h)$
and
$B_{|R|}^j(h)$
be as in (6). Then, for
$\epsilon > 0$
,

for some constant
$C_\epsilon> 0$
that does not depend on n.
Proof. See Section 5.7 for the proof of the first bound. The others follow similarly.
The bounds on the variance terms in Proposition 2.3 become

4.2. Proof of Proposition 4.2
The upper bound follows immediately from Proposition 4.3.
Recall the following result ([Reference Lachièze-Rey and Peccati14, Corollary 2.4]) concerning the variance. Let
$X := (X_1, \ldots, X_n) \in E^n$
, where E is a Polish space. If X
′ is an independent copy of X, and
$f\,:\, E^n \to \mathbb{R}$
is measurable, with
$\mathbb{E}\big[ f(X)^2\big] < \infty$
,

In our setting we take
$f = \varphi$
. Unlike in [Reference Lachièze-Rey and Peccati14], the function
$\varphi$
is not symmetric, and the right-hand side of (26) cannot be simplified. The lower bound provided by [Reference Lachièze-Rey and Peccati14, Corollary 2.4] recovers the correct order of growth of the variance, which is enough for our purposes. However, more precise generic results are also available; see e.g. [Reference Bousquet and Houdré1].
Let H be the realization of the hidden chain for X. By the law of total variance,
$Var(\varphi(X)) \geq \mathbb{E}\big[ Var(\varphi(X) \mid H) \big]$
. Let X
′ be an independent copy of X, given H. Note that, given H,
$(X_i)_{i = 1, \ldots, n}$
and
$\big(X^{\prime}_i\big)_{ i = 1, \ldots, n}$
are independent random variables which are not identically distributed.
Applying (26) to
$\varphi(X)$
conditionally on H, we obtain

where
$X^i = \big(X_1, \ldots, X_{i-1}, X^{\prime}_i, X_{i+1}, \ldots, X_n\big)$
, and
$\mathbb{E}^H$
indicates that H is given. (To simplify notation in what follows we drop the H.) The difference from the proof in [Reference Lachièze-Rey and Peccati14] is that now the variables are no longer identically distributed. Write

where
$X^{(i)} = \big(X_1, \ldots, X_{i-1}, X_{i+1}, \ldots, X_n\big)$
. By Lemma 4.1,

We are left to study
$\mathbb{E} \big[ \varphi\big(X^i\big) - \varphi\big(X^{(i)}\big)\big]$
. Recall that

Now, for the case
$X^{\prime}_i \in K^C$
(the other case being equivalent), we have

since
$v\big(X^{\prime}_i, X_j;\, X^{(i,j)}\big) \geq 0$
. Then

using the independence after conditioning on H and the properties of the model. We want to find an event that implies that
$v\big(x,y;\, X^{(i,j)}\big) \geq c n^{-1}$
. One instance is when no point of
$X^{(i,j)}$
falls in
$B\big(y, 6 \beta n^{-1/d}\big)$
. Indeed, then
$B\big(y, 3 \beta n^{-1/d}\big) \subset V\big(y, X^{(i,j)}\big)$
. The distance between y and x is less than
$\beta n^{-1/d}$
, and so there is
$z \in B\big(y, 3 \beta n^{-1/d}\big)$
, namely
$z = x + \beta n^{-1/d} (x - y)/||x-y||$
, such that

Then
$v\big(x,y;\, X^{(i,j)}\big) \geq Vol\big(B\big(z, \beta n^{-1/d}\big)\big) = \kappa_d \beta^d n^{-1}$
. Finally,

for some
$c_{d, \beta} > 0$
depending on the parameters of the model, the dimension d, and
$ \beta$
. Then

Therefore, by the very definition of
$\gamma(K, r, \beta)$
and since the case
$X^{\prime}_i \in K$
is symmetric,

If the rolling ball condition (18) and the lower bound on
$Vol\big(\partial K^{n^{-1/d}}\big)$
both hold, then

which dominates the contribution (27) from
$\mathbb{E}\big[\varphi (X) - \varphi\big(X^{(i)}\big)\big]$
. Therefore, finally,

as desired.
Remark 4.1. Let us expand a bit on another potential application of our generic framework, namely the occupancy problem as studied in [Reference Grabchak, Kelbert and Paris10]. To set up the notation,
$(Z_1, \ldots, Z_n)$
is an aperiodic, irreducible, and time-homogeneous (hidden) Markov chain that transitions between different alphabets. To each alphabet is associated a distribution over the collection of all possible letters, giving rise to the observed letters
$(X_1, \ldots, X_n)$
We assume that the number of alphabets is finite but that the total number of letters is
$\lfloor\alpha n\rfloor$
, for some fixed
$\alpha >0$
. One studies
$W := f(X_1, \ldots, X_n)$
, the number of letters that have not appeared among the
$X_1, \ldots, X_n$
. Then an analysis as in the proof of Theorem 3.1 leads to the following:

where Var(W) is a function of n,
$\mathcal{N}$
is the standard normal distribution, and
$C > 0$
is a constant depending on the parameters of the model but not on n. As mentioned at the beginning of the section, the study of the precise order of growth of the variance of W, in our dependent framework, is not within the scope of the current paper. For the i.i.d. case one can show (see e.g. [Reference Englund8]) that
$Var(W) \sim \big(\alpha e^{-1/ \alpha} - (1 + \alpha) e^{-2 / \alpha} \big) n$
as
$n \to \infty$
.
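To make the setting concrete, here is a small sketch (ours; all names and parameter values are illustrative) of the functional W: the hidden chain moves between alphabets, each alphabet emits letters according to its own distribution over the $\lfloor \alpha n \rfloor$ letters, and W counts the letters never observed.

```python
import numpy as np

def unseen_letters(P, mu, letter_dists, n, rng=None):
    """W = number of letters never emitted in n steps.

    P            : (S, S) transition matrix between alphabets (the hidden chain)
    letter_dists : (S, m) row s is alphabet s's distribution over the m letters
    """
    rng = np.random.default_rng() if rng is None else rng
    S, m = letter_dists.shape
    seen = np.zeros(m, dtype=bool)
    z = rng.choice(S, p=mu)
    for _ in range(n):
        seen[rng.choice(m, p=letter_dists[z])] = True   # emit a letter from alphabet z
        z = rng.choice(S, p=P[z])                        # move to the next alphabet
    return int(m - seen.sum())

# Example with alpha = 1 (m = n letters) and two alphabets.
n = 500
m = int(np.floor(1.0 * n))
rng = np.random.default_rng(3)
P, mu = np.array([[0.9, 0.1], [0.3, 0.7]]), np.array([0.5, 0.5])
letter_dists = rng.dirichlet(np.ones(m), size=2)
print(unseen_letters(P, mu, letter_dists, n, rng=rng))
```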
5. Technical results
5.1. Bounds on terms involving
$\Delta_i h$
Recall the setting of Proposition 2.2 and Proposition 2.3. Let (Z, X) be a hidden Markov model and let the latent chain Z be irreducible and aperiodic, with finite state space
$\mathcal{S}$
. Assume that Z is started at the stationary distribution.
We start by establishing a technical result regarding Z.
First, note that there exist
$K \geq 1$
and
$\epsilon \in (0,1)$
such that

and thus,

for all
$n \geq 1$
and
$s, s^{\prime} \in \mathcal{S}$
.
Lemma 5.1. Let
$K \geq 1$
and
$\epsilon \in (0,1)$
be as in (28), and let
$(Z_i)_{i \geq 1}$
be an irreducible and aperiodic Markov chain with finite state space
$\mathcal{S}$
. Then

for any
$t \geq 1$
,
$j \geq 1$
, and
$(s_1, \ldots, s_t) \in \mathcal{S}^t$
.
Proof. We show (29) by induction. The case
$t = 1$
follows from (28). Next, for
$(s_1, \ldots, s_{t+1}) \in \mathcal{S}^{t+1}$
,

where we have used the Markov property, (28), and finally the induction hypothesis. This suffices for the proof of (29), and thus the proof of the lemma is complete.
Let
$f\,:\, \mathcal{A}^n \to \mathbb{R}$
be Lipschitz, i.e., be such that
$|f(x) - f(y)| \leq c\sum_{i =1}^n \mathbf{1}_{x_i \neq y_i}$
for every
$x, y \in \mathcal{A}^n$
, where
$c > 0$
. Let
$R = \big(R_0, \ldots, R_{|\mathcal{S}|(n-1)}\big)$
be a vector of independent random variables, and let h be the function such that

Let R ′ be an independent copy of R. The next result provides a tail inequality that is key for Proposition 2.2.
Proposition 5.1. Let
$K > 0$
and
$\epsilon > 0$
be as in (28). Then

for any
$x \in \mathbb{N}$
, where
$C > 0$
depends on the parameters of the model but neither on n nor on x.
Proof. The sequence of instructions
$R^i := \big(R_0, \ldots, R^{\prime}_i, \ldots, R_{|\mathcal{S}|(n-1)}\big)$
may give rise to a different realization (Z
′, X
′) of the hidden Markov model, as compared to (Z, X), the one generated by R. The two realizations are not independent. In particular, if instruction
$R_i$
determines
$\big(Z_j, X_j\big)$
and
$R^{\prime}_i$
determines
$\big(Z^{\prime}_j, X^{\prime}_j\big)$
, then
$\big(Z_k, X_k\big) = \big(Z^{\prime}_k, X^{\prime}_k\big)$
for
$k < j$
. Let s be the smallest nonnegative integer (possibly
$s = \infty$
) such that
$Z_{j+s} = Z^{\prime}_{j+s}$
. Then for any
$k > j+s$
,
$\big(Z_k, X_k\big) = \big(Z^{\prime}_k, X^{\prime}_k\big)$
as well. Finally, if
$k \in \{j, \ldots, j + s - 1\}$
, the pairs
$\big(Z_k, X_k\big)$
and
$\big(Z^{\prime}_k, X^{\prime}_k\big)$
are independent. We show next that for
$K \geq 1$
as in (28), and any
$t \in \mathbb{N}$
,

Indeed,

By independence,

and thus by Lemma 5.1

as desired.
Let E(t) be the event

Note that
$\mathbb{P} ( E(t) ) \leq \mathbb{P}(s \geq tK) \leq (1 - \epsilon)^{t}$
. In particular, if
$|h(R) - h(R^i)| \geq cx$
, where
$c > 0$
is the Lipschitz constant of the associated function f, then
$s \geq x$
, as there are at least x positions k such that
$X_k \neq X^{\prime}_k $
. Thus,

where
$C > 0$
depends on the parameters of the model but not on x. This suffices for the proof of (30).
We now turn to the proof of Proposition 2.2.
Proof of Proposition 2.2. Let
$E_t$
be the event that
$|h(R) - h(R^i)| \geq tK$
. Then

Recall that
$|g(x)| \leq cn$
for all
$x \in \mathcal{A}^n$
, and then
$|h(R) - h(R^i)| \leq 2cn$
. Using (32),

Let
$t = - r \ln n / (\ln (1 - \epsilon)) > 0$
. Then

The order of the bound is optimal for t such that

or

it follows that

and the right-hand side has the same order of growth as (34).
If the growth order of
$(1 - \epsilon)^{t}$
is larger than the one in (35), the bound on the second term in (33) is of larger order as well.
Then
$\mathbb{E}| \Delta_i h(R)|^r \leq C_1 (\ln n)^r$
, for
$ C_1 > 0$
depending on the parameters of the model and r. The first part of Proposition 2.2 is established.
For the upper bound on the central moments of f(X), recall the following generalizations of the Efron–Stein inequality (see [Reference Houdré and Ma12, Reference Rhee and Talagrand16]): for
$r \geq 2$
,

and for
$ r \in (0,2)$
,

Then, for all
$r > 0$
,

where
$C_2>0$
is a function of
$|\mathcal{S}|$
and r.
Remark 5.1. (i) Recall that in the independent setting, there is a single stack, or equivalently, the state space of the latent chain consists of a single element. Then for s as defined in the first paragraph of the above proof,
$\mathbb{P}(s > 1) = 0$
. Thus we can take
$tK = 2$
, and since
$\mathbb{P}(E_t ) \leq \mathbb{P}(s \geq tK) = 0$
, (33) becomes

which recovers the independent case.
(ii) Note that the bound on the central moments also follows from using an exponential bounded difference inequality for Markov chains proved by Paulin [Reference Paulin15]. This holds for the general case when X is a Markov chain (not necessarily time-homogeneous) taking values in a Polish space
$\Lambda = \Lambda_1 \times \cdots \times \Lambda_n$
, with mixing time
$\tau_{min}$
. Then, for any
$ t \geq 0$
,

where f is such that

for any
$x, y \in \mathbb{R}^n$
and some
$c^* = (c_1, \ldots, c_n) \in \mathbb{R}^n$
, and where
$||c^*||^2 = \sum_{i = 1}^n c_i^2$
.
5.2. Proof of Proposition 2.3
In this section we no longer require the underlying function f to be Lipschitz.
Let
$U := \sum_{ \emptyset \subseteq A \subsetneq[|R|]} k_{|R|, A} U_A /2$
for a general family of square-integrable random variables
$U_A(R, R^{\prime})$
. From [Reference Chatterjee2, Lemma 4.4],

As in [Reference Lachièze-Rey and Peccati14], this inequality will be used for both
$U_A = T_A(h)$
and
$U_A = T^{\prime}_A (h)$
. A major difference from the setting in [Reference Lachièze-Rey and Peccati14, Section 5] is that the function h is not symmetric; i.e., if
$\sigma$
is a permutation of
$\{ 0, \ldots, |\mathcal{S}|(n-1)\}$
, it is not necessarily the case that
$h\big(R_0, \ldots, R_{|\mathcal{S}|(n-1)}\big) = h\big(R_{\sigma(0)}, \ldots, R_{\sigma(|\mathcal{S}|(n-1))}\big)$
. Indeed, each variable in R is associated with a transition at a particular step and from a particular state. Fix
$A \subsetneq[|R|]$
and let
$\tilde{R}$
be another independent copy of R. Introduce the substitution operator

Recall that from the Efron–Stein inequality,

where
$\tilde{\Delta}_i U_A(R) = U_A\big(\tilde{S}_i(R)\big) - U_A(R)$
.
Then,

Recall also that
$U_A = \sum_{j \notin A} \Delta_j h(R)\, a \big(\Delta_j h\big(R^A\big)\big)$
, where the function a is either the identity, or
$a ({\cdot}) = | \cdot |$
. Then

Fix
$0 \leq i \leq |R| - 1$
, and note that for
$j \notin A$
,

Then, using
$|\tilde{\Delta}_i a({\cdot}) | \leq |\tilde{\Delta}_i ({\cdot})|$
, the summands in (37) are bounded by

where Y, Y
′, Z, Z
′ are recombinations of
$R, R^{\prime}, \tilde{R}$
; i.e.,
$Y_i \in \big\{R_i, R^{\prime}_i, \tilde{R}_i\big\}$
, for
$i \in [0, |R| -1]$
.
Next, as in [Reference Lachièze-Rey and Peccati14], we bound each type of summand appearing in (37).
If
$i = j = k$
, and using
$\tilde{\Delta}_i \big(\Delta_i ({\cdot})\big) = \Delta_i ({\cdot})$
, (39) is bounded by

If i, j, and k are pairwise distinct, we can switch
$\tilde{R}_i$
and
$R^{\prime}_i $
, and Y is still a recombination. Then (39) is equal to

where the last step follows from the Cauchy–Schwarz inequality.
If
$i \neq j = k$
, (39) is equal to

where we have exchanged
$\tilde{R}_i$
and
$R^{\prime}_i$
and used the Cauchy–Schwarz inequality as in (40).
Similarly, if
$i = j \neq k$
, the bound is

Finally, if
$i = k \neq j$
, the bound is by symmetry

Combining (40), (41), (42), and (43) in (37), we finally get

where

This suffices for the proof of Proposition 2.3.
Remark 5.2. As observed in [Reference Chu, Shao and Zhang6], the terms involving
$\Delta_i h(R)$
in (4) and (5) can be removed, leaving only the variance terms. Here is a different way to establish this fact for the particular framework we consider in Section 3. Recall that the expressions on the right-hand sides of (4) and (5) are bounds on terms of the form

where
$|g^{\prime}_t | \leq 1$
and
$|g_t(W)W - g^{\prime}_t (W)| = |\mathbf{1}_{W \leq t} - \mathbb{P}(\mathcal{N} \leq t)| \leq 1$
(see [Reference Lachièze-Rey and Peccati14] and [Reference Chatterjee3]). First, note that

and

Then, by the triangle inequality and the above,

Let
$\mathbb{E} \big| g^{\prime}_t(W) - g^{\prime}_t(W) T\big| \leq f(n)$
for some function f, with
$f(n) \to \infty$
and such that for
$\sigma^2 = \sigma^2(n)$
,
$f(n) / \sigma^2 \to 0$
. Then

for some constant
$C > 0$
that does not depend on n. Therefore, the asymptotic behavior of the bounds in (4) and (5) is given by the terms corresponding to
$\mathbb{E} | g^{\prime}_t(W) - g^{\prime}_t(W)T|$
, i.e., the terms involving the variance. This modification of the method is also valid in our framework and would ‘improve’ our results. However, it does not have a significant effect on the rates obtained in our applications in Section 3, and so we will not pursue it any further here.
5.3. Proof of Proposition 3.1
Recall that

where the supremum is taken over recombinations of R and its independent copies R ′ and R ′′.
Let E be the event that at least one of the perturbations of the instructions in (44) yields a difference in more than K points. By Proposition 5.1, there is
$\epsilon > 0$
such that
$\mathbb{P}( E) \leq (1 - \epsilon)^K$
. Then, by the Lipschitz property of h,

If S(Y) is the set of points generated by the instructions Y and
$S(Y^i)$
is the set of points generated by Y after the perturbation of
$Y_i$
, let

where
$\Delta$
is the symmetric difference operator. Similarly, let

Note that, conditioned on
$E^c$
,
$|S_i| \leq 2K$
for
$i = 1, 2,3, 4$
. Furthermore, if
$s_1 \cap s_2 = \emptyset$
for all
$(s_1, s_2) \in S_1 \times S_2$
, then
$\Delta_{i, j} h(Y) = 0$
. Then

This bound is meaningful if the sets
$S_1$
and
$S_2$
are disjoint sets of random variables. Conditioned on
$E^c$
, this is the case if
$|i - j| \geq |\mathcal{S}|K$
. We introduce events
$E_1$
,
$E_2$
, and
$E_3$
corresponding to 0, 1, or 2 of the conditions
$\{ |i - j| \geq |\mathcal{S}| K, |j- k| \geq |\mathcal{S}| K\}$
holding, respectively. The events
$E_1$
,
$E_2,$
and
$E_3$
are deterministic. Then we have

First we use the trivial bound
$\mathbf{1}_{\Delta_{i,j} h(Y) \neq 0, \Delta_{j,k} h(Y^{\prime}) \neq 0} \leq 1$
to get

Then, for the term with
$\mathbf{1}_{E_3}$
,

To bound
$\mathbb{E} \big[ \mathbf{1}_{s_1 \cap s_2 \neq \emptyset, s_3 \cap s_4 \neq \emptyset}\big]$
, we condition on
$s_2$
,
$s_3$
, and the values of all hidden variables H. Then, since
$S_1$
and
$S_4$
are disjoint, we have independence:

Therefore,

for some
$C > 0$
independent of K and n, where we have used that
$|S_i| \leq 2K$
for
$i = 1,2,3,4$
.
Finally, for the term with
$E_2$
, we may assume that
$|i - j| \geq |\mathcal{S}|K$
, since the case
$|j - k| \geq |\mathcal{S}|K$
is identical. Write, using the trivial bound on
$\mathbf{1}_{\Delta_{j,k} h(Y^{\prime}) \neq 0}$
,

Next, as before,

Then

Combining (45), (46), (48), and (47), we get the following bound on (44):

Then,

when we choose
$K = c \ln n$
for a suitable
$ c> 0$
, independent of n.
5.4. Proof of Proposition 3.2
As in the proof of Proposition 5.1, the sequence of instructions
$R^i$
may give rise to a different realization (Z
′, X
′). Indeed, if the instruction
$R_i$
determines
$\big(Z_j, X_j\big)$
and
$R^{\prime}_i $
determines
$\big(Z^{\prime}_j, X^{\prime}_j\big)$
, it is possible that
$\big(Z_j, X_j\big) \neq \big(Z^{\prime}_j, X^{\prime}_j\big)$
. Let
$s \geq 0$
be the smallest integer (possibly
$s = \infty$
) such that
$Z_{j + s} = Z^{\prime}_{j+s} $
. Then, as in (31), there is
$\epsilon > 0$
such that for
$K \in \mathbb{N}$
,

Fix K, and let E be the event corresponding to
$\{s \geq K\}$
. Using the trivial bound
$|h(R)| \leq n$
, and thus
$|\Delta_i h(R)| \leq 2n$
, we have

Let S(R) be the set of points generated by the sequence of instructions R, and let
$S(R^j)$
be the points generated by R after the perturbation of
$R_j$
. Set
$S = S(R) \Delta S(R^j)$
for the symmetric difference and
$S^c = S(R) \cap S(R^j)$
. Note that
$E^c$
implies that
$|S| \leq 2K$
. Furthermore,

and

To estimate (49), we need to evaluate

and to do so we proceed as in [Reference Lachièze-Rey and Peccati14] by studying the shape of the relations of
$(s_j, x_{\ell})_{j, \ell \in \{1, \ldots, t\}}$
.
Identify the set
$(s_j, x_{\ell})_{j, \ell \in \{1, \ldots, t\}}$
with the edges of the graph G whose vertices correspond to
$(s_j)_{j \in \{1, \ldots, t\}}$
and
$(x_{\ell})_{ \ell \in \{1, \ldots, t\}}$
. In particular, if
$s_{j_1} = s_{j_2}$
for some
$j_1 \neq j_2$
, we identify them with the same point in the graph G. Conditioned on the realization of the hidden chain Z, we have independence. Then, if G is a tree, fix a root and condition recursively on vertices at different distances from the root. By the restrictions on the volume of the grain and the sampling distribution,

where
$|E(G)|$
is the number of edges in the graph G. Furthermore,

Note that the same result holds if G is a graph without cycles, i.e., a collection of disjoint trees. In general, G might have cycles. Let T be a subgraph of G that contains no cycles. Then

where the product on the right-hand side runs over the edges
$e = (e_1, e_2)$
of the graph T with
$e_1 \in S$
and
$e_2 \in S^c$
. Let
$|s|$
be the number of distinct vertices in
$(s_1, \ldots, s_t)$
, and similarly let
$|x|$
be the number for
$(x_1, \ldots, x_t)$
. The graph G is complete bipartite with
$|s| + |x|$
vertices. We can find a subgraph T of G, also with
$|s| + |x|$
vertices and no cycles. Then

where
$C_t > 0$
is a constant depending on t, and where we have used that
$|s| \leq |S| \leq 2K$
and
$|S^c| \leq 2n$
.
Let
$K = c \ln n$
for a suitable
$c > 0$
; then (49) implies (15) as desired.
5.5. Proof of Proposition 3.3
As before, let E be the event that all perturbations of instructions in (44) propagate at most K levels. We have that
$\mathbb{P}(E^c) \leq (1 - \epsilon)^K$
for some
$\epsilon \in (0,1)$
. Using the trivial bound
$|h(Y)| \leq n$
,

Let
$S(Y^i)$
be the set of points generated by the sequence of instructions Y after the perturbation of
$Y_i$
. Let S be the set of all points in the expectation above, and furthermore let

where
$\Delta$
is the symmetric difference operator. Conditioned on E,
$|S_i| \leq 2K$
for
$i = 1, \ldots, 6$
and
$ |\mathcal{S}| \leq 10n$
.
Conditioned on E, if
$|j - i| \leq |\mathcal{S}| K$
, the perturbation in i might propagate past the position corresponding to the instruction j, leading to difficulties in the analysis of
$\Delta_{i,j} h(Y)$
. This is why we condition further on the events
$E_1, E_2, E_3$
corresponding respectively to 0, 1, or 2 of the conditions
$\{|i - j| \geq |\mathcal{S}|K, |j - k| \geq |\mathcal{S}|K\}$
holding true. Note that
$E_1$
,
$E_2$
, and
$E_3$
are deterministic.
If
$E_1$
holds, we use the trivial bound
$ \mathbf{1}_{\Delta_{i,j} h(Y) \neq 0,\Delta_{j,k} h(Y^{\prime}) \neq 0} \leq 1$
, which leads to

using the Cauchy–Schwarz inequality.
Conditioned on
$E_3$
, the sets
$S_1$
,
$S_2 \cup S_3$
, and
$S_4$
are pairwise disjoint. Next, similarly to an argument presented in [Reference Lachièze-Rey and Peccati14], if
$s_1 \cap s = \emptyset$
and
$s_2 \cap s = \emptyset$
for all
$(s_1, s_2, s) \in S_1 \times S_2 \times S$
, then
$\Delta_{i,j} h(Y) = 0$
. Therefore,

and also

Furthermore,

and

Therefore,

where
$S_{56} = S_5 \cup S_6$
and
$|S_{56}| \leq 4K$
, conditioned on E.
To evaluate the summand expression we use the graph representation. Let
$E_{\ell}$
be the event that there are
$\ell$
distinct points among
$s^{\prime},s^{\prime\prime}, s^{\prime}_5, \ldots, s^{\prime}_8$
, different from
$s_1, \ldots, s_8$
. Note that
$\ell \in [0, 6]$
. Conditioned on
$E_{\ell}$
, we can find a subgraph with no cycles and
$\ell + 2$
edges, of the graph with edges
$\{ \{s_1, s^{\prime}\}, \{s_2, s^{\prime}\}, \{s_3, s^{\prime\prime}\}, \{s_4, s^{\prime\prime}\} \} \cup \{ \big\{s_a, s^{\prime}_b \big\}\,:\, a, b \in [5,8]\}$
. Indeed, note that there are at least 3 different points among
$s_1, \ldots, s_4$
. Next, if there are x points present among s
′, s
′′ and
$\ell - x$
points among
$s^{\prime}_5, \ldots, s^{\prime}_8 $
, we can find a subgraph with no cycles with at least
$\ell - x$
edges among
$\{ \big\{s_a, s^{\prime}_b \big\}\,:\, a, b \in [5,8]\}$
and
$x + 2$
edges among
$\{ \{s_1, s^{\prime}\}, \{s_2, s^{\prime}\}, \{s_3, s^{\prime\prime}\}, \{s_4, s^{\prime\prime}\} \}$
.
Then, if we further condition on the values of the hidden variables H, we get, by independence,

Then (52) is further bounded by

for some
$C > 0$
independent of n and K.
Finally, assume that
$E_2$
holds and that
$|i - j| \geq |\mathcal{S}| K$
. The case
$|j - k| \geq |\mathcal{S}|K$
is identical. As above, using the trivial bound
$\mathbf{1}_{\Delta_{j,k} h(Y^{\prime}) \neq 0} \leq 1$
,

If we condition on
$E_{\ell}$
and the values of the hidden variables H, we get

since in this case
$s_1$
and
$s_2$
are distinct and we can find a subgraph with
$\ell + 1$
edges and no cycles.
Then (54) is bounded by

for some
$C > 0$
.
We get the following bound on
$B_{|R|} (h)$
using (50), (51), (55), and (53):

Then,

where we have chosen
$K = c \ln n$
for a suitable
$c > 0$
, independent of n.
5.6. Proof of Lemma 4.1
To simplify computations, we introduce the process X ′ defined as

Unlike in the independent setting in [Reference Lachièze-Rey and Peccati14], here the law of X
′ is invariant only under integer-valued translations. Note that, almost surely, X
′ has exactly n points in any cube
$[t, t+1]^d$
, where
$t \in \mathbb{R}$
. Let
$T_x = \big\{ [y, y+1]^d \,:\, y \in \mathbb{R}^d, x \in [y, y+1]^d\big\}$
. Define
$\overline{R_k}(x;\, X)$
as

Note that if
$x \in [0,1]^d$
, then
$[0,1]^d \in T_x$
and so
$\overline{R_k}(x;\, X^{\prime}) \geq R_k(x;\, X)$
. When the
$X_i$
are sampled independently and uniformly, as in [Reference Lachièze-Rey and Peccati14], it is the case that
$\overline{R_k}(x;\, X^{\prime})$
does not depend on the position of x. However, in the hidden Markov model case we need to find a further bound on
$\overline{R_k}(x;\, X^{\prime})$
.
For that purpose, consider the cube
$K_0 := [-1/2, 1/2]^d$
of volume 1 centered at
$\mathbf{0} \in \mathbb{R}^d$
. Let
$B_A$
be the open ball of
$\mathbb{R}^d$
centered at
$\mathbf{0}$
and of volume
$A < 1$
, to be chosen later. Next, let
$\tilde{X} = \big(0, \tilde{X}_1, \ldots, \tilde{X}_{n-1}\big)$
be such that
$\tilde{X}_i \in K_0$
for all
$i = 1, \ldots, n-1$
. Furthermore, for any Lebesgue-measurable
$T \subseteq K_0$
, set

for all
$i = 1, \ldots, n-1$
, where
$|\cdot|$
now denotes the Lebesgue measure of the corresponding sets. If
$A = (c_M - 1)/(c_M - c_m)$
, then the above is a well-defined positive measure on
$K_0$
. From the restrictions of the hidden Markov model, if
$\tilde{R}_k = R_k(0;\, \tilde{X})$
,

Indeed,
$\tilde{R}_k$
represents the worst-case scenario where the remaining points of X are least likely to be distributed in the volume closest to x.
Then,

where we have used the upper bound on
$Vol(\partial K^r)$
.
To estimate
$\mathbb{E}\big[ \tilde{R}_k^{qd+ \alpha}\big]$
, note that if
$\tilde{R}_k \geq r$
, there will be an open ball of radius
$r/2k$
in
$K_0$
containing no points of
$\tilde{X}$
. Moreover, there will be
$s_d \in (0,1)$
, depending only on the dimension d, such that every ball of radius $r/2k$ contains a cube of side length
$s_d r/k$
of the form
$[g - s_d r/ 2k, g + s_d r/ 2k]$
, where
$g \in (s_d r / k) \mathbb{Z}^d $
. Then, if
$s_d r / k < 1$
,

If, on the other hand,
$s_d r / k \geq 1$
, then
$ \tilde{X} \cap [g - s_d r/ 2k, g + s_d r/ 2k] = \tilde{X}$
and
$ \mathbb{P}\big(\tilde{R}_k \geq r\big) = 0$
. Using
$1 - x \leq e^{-x}$
, for any
$u > 0$
we have

Applying the above in (56) yields

where
$c_{d, k, qd + \alpha} > 0$
depends only on the parameters of the transition probabilities of the hidden chain and on d, k, and
$qd + \alpha$
, but neither on n nor on i.
5.7. Proof of Proposition 4.5
We analyze

where as before the supremum is taken over recombinations Y, Y
′, Z, Z
′ of R, R
′, R
′′. Let E be the event that all perturbations of the instructions in (57) propagate at most T levels. There is
$\epsilon > 0$
, depending only on the parameters of the models, such that
$\mathbb{P}(E^c) \leq (1 - \epsilon)^T$
.
As before, conditioned on E, if
$|j - i| \leq |\mathcal{S}| K$
, the perturbation in i might propagate past the position corresponding to the instruction j, leading to difficulties in the analysis of
$\Delta_{i,j} h(Y)$
. This is the reason for conditioning further on the events
$E_1, E_2, E_3$
corresponding respectively to 0, 1, or 2 of the conditions
$\{|i - j| \geq |\mathcal{S}|K, |j - k| \geq |\mathcal{S}|K\}$
holding. Note that
$E_1$
,
$E_2$
, and
$E_3$
are deterministic.
In this setting, we also study the event that all Voronoi cells are small. For that purpose, as in [Reference Lachièze-Rey and Peccati14], we introduce the event
$\Omega_n(X)$
, given by

where
$\rho_n = (\ln n)^{1/d + \epsilon^{\prime}}$
for
$\epsilon^{\prime}$
sufficiently small. Then, after conditioning on the realization of the hidden chain, a proof as in [Reference Lachièze-Rey and Peccati14, Lemma 6.8] leads to

as
$n \to \infty$
, for all
$\eta > 0$
.
We now estimate
$B_{|R|}(h)$
. Write

Using
$|\Delta_j h(Z)|, |\Delta_k h\big(Z^{\prime}\big)| \leq 1$
, we get that the first two terms in (59) are bounded by
$\mathbb{P}(E^c) +\mathbb{P}(\Omega_n^c)$
. Next,

where we have used the Cauchy–Schwarz inequality.
Next, define as before

Further, let
$S_0 = S(Y) \cap S\big(Y^i\big) \cap S\big(Y^j\big)$
and
$S^{\prime}_0 = S\big(Y^{\prime}\big) \cap S\big(\big(Y^{\prime}\big)^j\big) \cap S\big(\big(Y^{\prime}\big)^k\big)$
. By Proposition 4.4(ii), it follows that conditioned on
$\Omega_n$
,

Conditioned on
$E_3$
, the sets
$S_1$
,
$S_2 \cup S_3$
, and
$S_4$
are pairwise disjoint:

By conditioning on the realization of all hidden chains H, we obtain

Now, conditioned on H,
$s^{\prime}_1$
, and
$s_2$
, we have independence in the innermost expectation. Therefore, the above is bounded by

Then,

Finally, for the event
$E_2$
, assuming that
$|i - j | \geq |\mathcal{S}| K$
, the other case being identical, we have

Using (59), (60), (62), and (61) leads to

Then,

where we have chosen
$K = c \ln n$
, for a suitable
$c > 0$
, independent of n, using also (58) and the definition of
$\rho_n$
.
Funding information
The first author’s research is supported in part by grant no. 524678 from the Simons Foundation. The second author was partially supported by a TRIAD NSF grant (award 1740776) and the FNR grant APOGee at Luxembourg University (R-AGR-3585-10-C).
Competing interests
There were no competing interests to declare during the preparation or publication of this article.