1. Introduction
The minimum spanning tree (MST) problem ranks among the simplest Combinatorial Optimization problems, with many applications, well beyond its historical introduction for network design [Reference Graham and Hell17] including approximation algorithms for more complex problems [Reference Christofides13, Reference Kou, Markowsky and Berman20] and cluster analysis [Reference Asano, Bhattacharya, Keil and Yao4].
Its formulation is straightforward: given a weighted undirected graph $G=(V,E,w)$ with $w \,:\, E \to (0, \infty )$ , find a subgraph $T\subseteq E$ that connects all nodes $V$ and has a minimal total edge cost
thus defining the MST cost functional $\mathcal{C}_{\text{MST}}(G)$ . Minimality yields that redundant connections can be discarded, so that the resulting subgraph $T$ turns out to be a tree, i.e., connected and without cycles. Several algorithms have been proposed for its solution, from classical greedy to more efficient ones [Reference Chazelle12], possibly randomised [Reference Karger, Klein and Tarjan18].
Despite its apparent simplicity, a probabilistic analysis of the problem, i.e., assuming that weights are random variables with a given joint law and studying the resulting random costs and MST’s yields interesting results. Moreover, it may suggest mathematical tools to deal with more complex problems, such as the Steiner tree problem or the travelling salesperson problem, where one searches instead for a cycle connecting all points having minimum total edge weight.
The most investigated random model is surely that of i.i.d. weights with a regular density, as first studied by Frieze [Reference Frieze16], who showed in particular the following law of large numbers: if $G^n= (V^n, E^n, w^n)$ , with $(V^n,E^n)= K_n$ the complete graph over $n$ nodes and $w^n = \left ( w_{ij} \right )_{i,j =1}^n$ are independent and uniformly distributed on $[0,1]$ , then almost surely
Another well studied setting is provided by Euclidean models, where nodes are i.i.d. sampled points in a region (say uniformly on a cube $[0,1]^d \subseteq{\mathbb{R}}^d$ , for simplicity) and edge weights are functions of their distance, e.g. $w(x,y) = |x-y|^p$ for some parameter $p\gt 0$ . This setting dates back at least to the seminal paper by Beardwood, Halton and Hammersley [Reference Beardwood, Halton and Hammersley8] where they focused on the travelling salesperson problem, but stated that other problems may be as well considered, including the MST one. A full analysis was later performed by Steele [Reference Steele28] who proved that, if the Euclidean graph consists of $n$ nodes, then for every $0\lt p\lt d$ , almost sure convergence holds
where $\beta _{\text{MST}}(p,d) \in (0,\infty )$ is a constant. The rate $n^{1-p/d}$ is intuitively clear due to the fact that there are $n-1$ edges in a tree over $n$ points and the typical distance between two adjacent points is expected to be of order $n^{-1/d}$ . The constraint $p\lt d$ was removed by Aldous and Steele [Reference Aldous and Steele3] and Yukich [Reference Yukich31], so that convergence holds in fact for any $p\gt 0$ . This result can be seen as an application of a general Euclidean additive functional theory [Reference Steele27, Reference Yukich32]. However, such general methods that work for other combinatorial optimisation problems give not much insight on the precise value of the limit constant $\beta _{MST}(p,d)$ . The MST problem is known to be exceptional, for a (sort of) explicit series representation, analogue to (1.1), was obtained by Avram and Bertismas [Reference Avram and Bertsimas1], although only in the range $0\lt p\lt d$ . The latter was used by Penrose [Reference Penrose24], in connection with continuum percolation, to study, among other things, the MST in the high dimensional regime $d\to \infty$ . An alternative approach towards explicit formulas was proposed by Steele [Reference Steele, Chauvin, Flajolet, Gardy and Mokkadem26], but limited to the case of i.i.d. weights, based on Tutte polynomials.
The aim of this paper is to investigate analogous results for bipartite Euclidean random models, i.e., when nodes correspond to two distinct families of sampled points (e.g., visually rendered by red/blue colourings) and weights, still given by a power of the distance, are only defined between points with different colours. Formally, we replace the underlying complete graph $K_n$ with a complete bipartite graph $K_{n_R, n_B}$ with $n_R+n_B=n$ .
A similar question was formulated and essentially solved in the model with independent weights by Frieze and McDiarmid [Reference Frieze and McDiarmid15]. In Euclidean models, however, it is known that such innocent looking variant may in fact cause quantitative differences in the corresponding asymptotic results. For example, in the Euclidean bipartite travelling salesperson problem with $d=1$ and $d=2$ , the correct asymptotic rates (for $p=1$ ) are known to be respectively $\sqrt{n}$ [Reference Caracciolo, Di Gioacchino, Gherardi and Malatesta11] and $\sqrt{n \log n }$ [Reference Capelli, Caracciolo, Di Gioacchino and Malatesta9], larger than the natural $n^{1-1/d}$ for the non-bipartite problem. Similar results are known for other problems, such as the minimum matching problem [Reference Steele27] and its bipartite counterpart, also related to the optimal transport problem [Reference Ajtai, Komlós and Tusnády2, Reference Ambrosio, Stra and Trevisan5, Reference Caracciolo, Lucibello, Parisi and Sicuro10, Reference Talagrand29, Reference Talagrand30]. Barthe and Bordenave proposed a bipartite extension of the Euclidean additive functional theory [Reference Barthe, Bordenave, Donati-Martin, Lejay and Rouault6] that allows to recover an analogue of (1.2) for many relevant combinatorial optimisation problems on bipartite Euclidean random models, although its range of applicability is restricted to $0\lt p\lt d/2$ (the cases $p=1$ , $d\in \left \{ 1,2 \right \}$ are indeed outside this range) and anyway the MST problem does not fit in the theory. The main reason for the latter limitation is that there is no uniform bound on the maximum degree of an MST on a bipartite Euclidean random graph – their theory instead applies to a variant of the problem where a uniform bound on the maximum degree is imposed, which is in fact algorithmically more complex (if the bound is two it recovers essentially the travelling salesperson problem).
1.1. Main results
Our first main result describes precisely the asymptotic maximum degree of an MST on a bipartite Euclidean random graph, showing that it grows logarithmically in the total number of nodes, in the asymptotic regime where a fraction of points $\alpha _R \in (0,1)$ is red and the remaining $\alpha _B = 1-\alpha _R$ is blue.
Theorem 1.1. Let $d\ge 1$ , let $n \ge 1$ and $R^n = \left ( X_i \right )_{i=1}^{n_R}$ , $B^n= \left ( Y_i \right )_{i=1}^{n_B}$ be (jointly) i.i.d. uniformly distributed on $[0,1]^d$ with $n_R+n_B = n$ and
Let $T^n$ denote the MST over the complete bipartite graph with independent sets $R^n$ , $B^n$ and weights $w(X_i,Y_j) = |X_i-Y_j|$ , and let $\Delta (T^n)$ denote its maximum vertex degree. Then, there exists a constant $C= C(d,\alpha _R)\gt 0$ such that
(Indeed, the structure of the MST does not depend on the specific choice of the exponent $p\gt 0$ , so we simply let $p=1$ above). The proof is detailed in Section 3.
Our second main result shows that, although the general theory of Barthe and Bordenave does not apply and the maximum degree indeed grows, the total weight cost for the bipartite Euclidean MST problem turns out to be much closer to the non-bipartite one, since no exceptional rates appear in low dimensions. Before we give the complete statement, let us introduce the following quantity, for $d \ge 1$ , $k_R$ , $k_B \ge 1$ , $\alpha _R \in (0,1)$ ,
where $\alpha _B = 1-\alpha _R$ and we write
for the set of (ordered) points $\left( \left ( r_i \right )_{i=1}^{k_R}, \left ( b_j \right )_{j=1}^{k_B}\right)$ such that, in the associated Euclidean bipartite graph with weights $\left ( |r_i-b_j| \right )_{i,j}$ , the subgraph with all edges having weight less than $1$ is connected (or equivalently, there exists a bipartite Euclidean spanning tree having all edges with length weight less than $1$ ), and for a set $A \subseteq{\mathbb{R}}^d$ , we write
and $|D(A)|$ for its Lebesgue measure. Notice also that the overall integration is performed with respect to the $d$ -dimensional Lebesgue measure over $k_R+k_B-1$ variables and one (either $r_1$ or $b_1$ ) is instead with respect to a Dirac measure at $0$ .
These quantities enter in the explicit formula for the limit constant in the bipartite analogue of (1.2), as our second main result shows.
Theorem 1.2. Let $d\ge 1$ , let $n \ge 1$ and $R^n = \left ( X_i \right )_{i=1}^{n_R}$ , $B^n= \left ( Y_i \right )_{i=1}^{n_B}$ be (jointly) i.i.d. uniformly distributed on $[0,1]^d$ with $n_R+n_B = n$ and
Let $T^n$ denote the MST over the complete bipartite graph with independent sets $R^n$ , $B^n$ and weights $w(X_i,Y_j) = |X_i-Y_j|$ . Then, for every $p \in (0, d)$ , the following convergence holds
and the constant $\beta _{b\text{MST}}(d,p) \in (0, \infty )$ is given by the series
Moreover, if $p\lt d/2$ for $d \in \left \{ 1,2 \right \}$ or $p\lt d$ for $d \ge 3$ , convergence is almost sure:
The proof is detailed in Section 4. The one-dimensional random bipartite Euclidean MST has been recently theoretically investigated in the statistical physics literature by Riva, Caracciolo and Malatesta [Reference Riva, Caracciolo and Malatesta25], together with extensive numerical simulations also in higher dimensions, hinting at the possibility of a non-exceptional rate $n^{1-p/d}$ also for $d=2$ . In particular, our result confirms this asymptotic rate in the two dimensional case, with a.s. convergence if $p\lt 1$ and just convergence of the expected costs if $1\le p \lt 2$ – in fact we also have a general upper bound if $p\ge 2$ (Lemma 4.5).
1.2. Further questions and conjectures
Several extensions of the results contained in this work may be devised, for example by generalising to $k$ -partite models or more general block models, allowing for weights between the same coloured points but possibly with a different function, e.g. the same power of the distance function, but multiplied by a different pre-factor according to pair of blocks. An interesting question, also open for the non-bipartite case, is to extend the series representation for the limiting constant to the case $p\ge d$ . On the other side, we suspect that additivity techniques may yield convergence in (1.5) also in the range $p \ge d$ , without an explicit series, but we leave it for future explorations. A further question, that has no counterpart in the non-bipartite case, is what happens if the laws of different coloured points are different, say with densities $f_R$ and $f_B$ that are regular, uniformly positive and bounded. Assuming that $n_R/n \to 1/2$ , a natural conjecture is that the limit holds with (1.6) obtained by replacing $\alpha _R$ and $\alpha _B$ with the ‘local’ fraction of points $f_R(x)/2$ , $f_R(x)/2$ and then integrating with respect to $x \in [0,1]^d$ , i.e.,
Furthermore, a central limit theorem is known for MST problem [Reference Chatterjee and Sen14, Reference Kesten and Lee19] and it may be interesting to understand the possible role played by the additional fluctuations introduced in bipartite setting for analogue results.
Finally, it may be of interest to strengthen Theorem 1.1, by establishing the limit (e.g. in probability)
for some constant $\gamma (d) \in (0, \infty )$ , and further investigating the vertex degree distribution of $T^n$ .
1.3. Structure of the paper
In Section 2 we collect useful notation and properties of general MST’s, together with crucial observations in the metric setting (including the Euclidean one) and some useful probabilistic estimates. We try here to keep separate as much as possible probabilistic from deterministic results, to simplify the exposition. In Section 3 we prove Theorem 1.1 and in Section 4 we first extend [Reference Avram and Bertsimas1, Theorem 1] to the bipartite case and then apply it in the Euclidean setting. An intermediate step requires to argue on the flat torus $\mathbb{T}$ to exploit further homogeneity. We finally use a concentration result to obtain almost sure convergence: since the vertex degree is not uniformly bounded, the standard inequalities were not sufficient to directly cover the case $p\lt 1$ , so we prove a simple variant of McDiarmid inequality in Appendix A that we did not find in the literature and may be of independent interest.
2. Notation and preliminary results
We always consider the space ${\mathbb{R}}^d$ , $d \ge 1$ , endowed with the Euclidean distance, which we denote with $|x-y|$ for $x$ , $y \in{\mathbb{R}}^d$ . The volume of the unit ball is denoted with $\omega _d$ . We also use throughout the letter $C$ to denote constants that may depend upon parameters (such as the dimension $d$ , exponents $p$ , etc.), warning the reader that to simplify the notation in some proofs, the same letter may be used in different locations to denote possibly different constants.
2.1. Minimum spanning trees
Although our focus is on weighted graphs induced by points in the Euclidean space ${\mathbb{R}}^d$ , the following general definition of minimum spanning trees will be useful.
Definition 2.1. Given a weighted undirected finite graph $G = (V,E, w)$ , with $w\,:\, E \to [0, \infty ]$ , the MST cost functional is defined as
Here and below, connected is in the sense that only edges with finite weight must be considered. We consider only minimisers $T$ in (2.1) that are trees, i.e., connected and acyclic, otherwise removing the most expensive edge in a cycle would give a competitor with smaller cost (since we assume possibly null weights, there may be other minimisers). The following lemma is a special case of the cut property of minimum spanning trees, but will play a crucial role in several occasions, so we state it here.
Lemma 2.2. Let $G = (V, E, w)$ , $v \in V$ and assume that $e \in \operatorname{arg min}\left \{ w(f)\, : \, v \in f \right \}$ is unique. Then, $e$ belongs to every minimum spanning tree of $G$ .
Proof. Assume that $e$ does not belong to a minimum spanning tree $T$ . Addition of $e$ to $T$ induces a cycle that includes necessarily another edge $f$ , with $v \in f$ and by assumption $w(f)\gt w(e)$ . By removing $f$ , the cost of the resulting connected graph is strictly smaller that the cost of $T$ , a contradiction.
We write $\mathsf{n}_G(v) \in V$ , or simply $\mathsf{n}(v)$ if there are no ambiguities, for the closest node to $v$ in $G$ , i.e.,
assuming that such node is unique.
The subgraphs
are strongly related to the minimum spanning tree on $G$ , since the execution of Kruskal’s algorithm yields the identity, already observed in [Reference Avram and Bertsimas1],
where we write $C_G$ denotes the number of connected components of a graph $G$ . Indeed, the function $z \mapsto C_{G(z)}$ is piecewise constant and decreasing from $|V|$ towards $1$ (assuming that all weights are strictly positive and $G$ is connected). Assume for simplicity that all weights $\left \{ w_e \right \}_{e \in E}$ are different, so that $z \mapsto C_{G(z)}$ has only unit jumps, on a set $J_{-}$ . An integration by parts gives the identity
To argue that the right hand side is the cost of an MST, e.g., obtained by Kruskal’s algorithm, we may represent the connected components of $G(z)$ as a function of $z$ in a tree-like graph (see Fig. 1): starting with components consisting of single nodes at $z=0$ , whenever two components merge (i.e., at values $z \in J_{-}$ ) we connect the corresponding segments. This yields a (continuous) tree with leaves given by the nodes and a root at $z =\infty$ . Since Kruskal’s algorithm returns exactly the tree consisting of the edges corresponding to such $z \in J_{-}$ , we obtain (2.2).
Remark 2.3. The construction above also yields that the minimum spanning trees of $G = (V, E, w)$ are also minimum spanning trees associated to the graph $G^\psi = (V,E, \psi \circ w)$ , i.e., weights are $\psi (w(e))$ where $\psi$ is an increasing function. In particular, assuming that $\psi \,:\,[0, \infty )$ is strictly increasing with $\psi (0) = 0$ , then $G^\psi (z) = G(\psi ^{-1}(z))$ , hence
In particular, choosing $\psi (x) = x^p$ and letting $p \to \infty$ , we obtain that any minimum spanning tree $T$ is also a minimum bottleneck spanning tree, i.e., $T$ minimises the functional
A similar argument [Reference Avram and Bertsimas1, Lemma 4] yields an upper bound for a similar quantity where $C_{G}$ is replaced with $C_{k,G}$ , the number of connected components having at least $k$ nodes.
Lemma 2.4. Let $G = (V, E, w)$ be connected with all distinct weights $(w(e))_{e \in E}$ (if finite) and $2 \le k \le |V|$ . Then, there exists a partition $V = \bigcup _{i=1}^{m} C_i$ such that letting $G_k$ be the graph over the node set $\left \{ C_1,\ldots, C_m \right \}$ with weights
then
Moreover, for every $i=1, \ldots, m$ , $|C_i| \ge k$ , hence $m \le |V|/k$ , and there exists $v \in C_i$ such that $\mathsf{n}_G(v) \in C_i$ and $\mathsf{n}_G(\mathsf{n}_G(v)) = v$ .
Proof. The function $z \mapsto C_{k, G(z)}$ is piecewise constant, with jumps of absolute size $1$ , with positive sign on a set $J_+$ and negative sign on a set $J_-$ . An integration by parts gives
We interpret the right hand side above as $\mathcal{C}_{\text{MST}}(G_k)$ for a suitable graph $G_k$ . To define the sets $C_i$ , we let $J_+ = \left \{ z_1, \ldots, z_{m} \right \}$ and define, for every $z_i$ , the ‘seed’ of $C_i$ as the set of nodes that gives an additional component with at least $k$ nodes, i.e., obtained by merging two components in $G(z_i^-)$ , both having less that $k$ nodes. Notice that, since $C_i$ will be then completed by adding nodes to such seeds, the last statement is already fulfilled. Indeed, any seed contains at least $k$ nodes we can always choose $v_1$ , $v_2$ in a seed such that the paths from $v_1$ , $v_2$ merge first (among those from other nodes in the same seed). This gives that $v_2= \mathsf{n}_G(v_1)$ and $v_2 = \mathsf{n}_G(v_1)$ .
To completely determine every $C_i$ , it is simpler to argue graphically on the the tree-like representation (Fig. 1), where we highlight the ‘birth’ of $C_i$ at $z_i \in J_{-}$ by thickening the shortest path from the seed towards the root at $z =\infty$ . At every $z$ such that two thick paths merge, the corresponding two connected components with at least $k$ elements become one, hence $C_{k,G(z)}$ jumps downwards, i.e., $z \in J_{-}$ . Given $v \in V$ , we introduce the following rule in order to determine the set $C_i$ to which $v$ belongs. Consider the shortest path from the trivial component containing only $v$ at $z=0$ towards the root at $z = \infty$ in the tree-like representation. We focus on the first point where such path merges with a thick line and write $z_v$ for the corresponding value on the $z$ -axis and let $\mathcal{S}_v$ denote the subset of ‘seeds’ among the $C_i$ ’s that are located further from the root (i.e., with smaller $z$ values) along the thick path, starting from such point (for example, in Fig. 1 we report the construction for $k=3$ , where we have for $v=7$ , $z_7 = 6$ and $\mathcal{S}_7 = \left \{ \left \{ 1,2,3 \right \}, \left \{ 4,5,6 \right \} \right \}$ ). If $z_v \in J_+$ , then $v$ already becomes part of a seed of some $C_i$ , hence the situation is trivial, and we let $v \in C_i$ . Otherwise the node $v$ will be assigned to a chosen cluster $C_i$ among those in $\mathcal{S}_v$ . We choose to add the vertex $v$ to the $C_i \in \mathcal{S}_v$ with smallest label index $i$ . Our choice ensures that all the connected component of $v$ in the graph $G(z_v^-)$ is added to the same $C_i$ . We highlight that, as long as it is consistent, other choices of $C_i$ ’s would be possible, as such nodes play a role only later in the construction of the MST, when the clusters are already merged. Anyway, the resulting $C_i$ ’s thus well-defined and moreover the induced weight between them defined via (2.4) actually coincides with the weight between the original seeds. Using this fact, to prove (2.5) it is then elementary to check that the graphical representation of Kruskal’s algorithm on the graph $G_k$ gives exactly the thickened tree.
2.2. Metric MST problem
If $(X, \mathsf{d})$ is a metric space and $V \subseteq X$ , then a natural choice for a weight is $w(\{ x,y\}) = \mathsf{d}(x,y)^p$ , where $p\gt 0$ is fixed. If $V$ , $R$ , $B \subseteq X$ , are finite sets and $p\gt 0$ , we write
for the the MST cost functional on the complete graph on $V$ (and respectively, on the complete bipartite graph with independent sets $R$ , $B$ ) and edge weights $w(\{ x,y\}) = \mathsf{d}(x,y)^p$ , for $\{ x,y\} \in E$ . Notice that, by Remark 2.3, the MST does not in fact depend on the choice of $p$ , and moreover we may let $p\to \infty$ and obtain
where $\mathcal{C}_{\text{MST}}^\infty$ is the minimum bottleneck spanning tree cost defined in (2.3) with edge weight given by the distance.
We denote by
and
respectively the distance function from $V$ and the Hausdorff distance between $R$ and $B$ . Clearly, $\mathcal{C}_{\text{MST}}^p(R \cup B) \le \mathcal{C}_{\text{MST}}^p(R, B)$ . The following lemma provides a sort of converse inequality.
Lemma 2.5. Let $p\gt 0$ . There exists a constant $C = C(p) \in (0, \infty )$ such that, for finite sets $R$ , $B \subseteq X$ ,
and, for some constant $C\gt 0$ ,
Proof. For simplicity, we assume that all edge weights are different (otherwise a small perturbation of the weights and a suitable limit gives the thesis). Let $T_R$ denote the MST for the vertex set $R$ and fix $\bar{r} \in R$ . For every $r \in R$ , there exists a unique path in $T_R$ with minimal length connecting $r$ to $\bar{r}$ . We associate to every $r \in R\setminus \left \{ \bar{r} \right \}$ the first edge $e(r)$ of such path (so that $r \in e(r)$ ). The correspondence $r \mapsto e(r)$ is a bijection.
We use this correspondence to define a connected spanning graph $S$ (not necessarily a tree) on the bipartite graph with independent sets $R$ , $B$ . For every $r \in R \setminus \left \{ \bar{r} \right \}$ , if $e(r) = \left \{ r, r^{\prime} \right \}$ , we add the edge $\left \{ \mathsf{n}(r), r^{\prime} \right \}$ to $S$ , where $\mathsf{n}(r) = \mathsf{n}_G(r) \in B$ and $G$ is the complete bipartite graph with independent sets $R$ , $B$ . Moreover, for every $r \in R$ , $b \in B$ we also add the edges $\left \{ r, \mathsf{n}(r) \right \}$ , $\left \{ b, \mathsf{n}(b) \right \}$ . $S$ is connected because every $b \in B$ is connected to $R$ and the vertex set $R$ is connected: any path on $T_R$ naturally corresponds to a path on $S$ using the pair of edges $\left \{ r, \mathsf{n}(r) \right \}$ , $\left \{ \mathsf{n}(r), r^{\prime} \right \}$ instead of an edge $e(r) = \big\{ r,r^{\prime} \big\}$ . The triangle inequality gives
for some constant $C = C(p)\ge 1$ , hence
and the first claim follows. Taking the $p$ -th root both sides and letting $p \to \infty$ yields the second inequality.
Remark 2.6. In the Euclidean setting $X={\mathbb{R}}^d$ , it is known (see [Reference Steele28]) that, if $p\lt d$ , there exists a constant $C =C(d,p)\gt 0$ such that
for any $V \subseteq [0,1]^d$ . A similar uniform bound cannot be true in the bipartite case, as simple examples show.
A second fundamental difference between the usual Euclidean MST and its bipartite variant is that for the latter its maximum vertex degree does not need to be uniformly bounded by a constant $C = C(d)\gt 0$ (again, examples are straightforward). The following result will be crucial to provide an upper bound in the random case. We say that $Q \subseteq{\mathbb{R}}^d$ is a cube if $Q = \prod _{i=1}^d (x_i, x_i+a)$ with $x = (x_i)_{i=1}^d \in{\mathbb{R}}^d$ and $a\gt 0$ is its side length. The diameter of $Q$ is then $\sqrt{d}a$ and its volume $|Q| = a^d$ .
Lemma 2.7. Let $R$ , $B \subseteq X$ , let $T$ be an MST on the bipartite graph with independent sets $R$ , $B$ and edge weight $w(x,y) = \mathsf{d}(x,y)$ and let $\left \{ r,b \right \} \in T$ with $\delta \,:\!=\, \mathsf{d}(r,b) \gt \mathsf{d}(R,B)$ . Then $S \cap R = \emptyset$ , where
In particular, when $X = [0,1]^d$ , $S$ contains a cube $Q\subseteq [0,1]^d$ with volume
Proof. Assume by contradiction that there exists $r^{\prime} \in S \cap R$ , and consider the two connected components of the disconnected graph $T \setminus \left \{ r,b \right \}$ . If $r^{\prime}$ is in the same component as $r$ , then adding $\left \{ r^{\prime}, b \right \}$ to $T \setminus \left \{ r,b \right \}$ yields a tree (hence, connected) with strictly smaller cost, since $\mathsf{d}(r^{\prime},b) \lt \delta$ , hence a contradiction. If $r^{\prime}$ is in the same component as $b$ , then adding $\left \{ \mathsf{n}(r), r^{\prime} \right \}$ to $T\setminus \left \{ r,b \right \}$ again yields a tree with strictly smaller cost, since the triangle inequality gives
To prove the last statement, notice first that by convexity of $[0,1]^d$ , the point $x$ on the segment connecting $r$ and $b$ at $(\delta - \mathsf{d}(R,B))/2$ from $r$ belongs to $[0,1]^d$ . Moreover, the open ball centred at $x$ with radius $(\delta - \mathsf{d}(R,B))/2$ is entirely contained in $S$ . Finally, intersection of any ball with radius $u\ge 0$ and centre $x \in [0,1]^d$ contains at least a cube of side length $\min \{ u/\sqrt{d}, 1\}$ (the worst case is in general when $x$ is a vertex of $[0,1]^d$ ).
2.3. Probabilistic estimates
In this section we collect some basic probabilistic bounds on distances between i.i.d. uniformly distributed random variables $(X_i)_{i=1}^n$ on a cube $Q \subseteq{\mathbb{R}}^d$ . Some of these facts are well known, especially for $d=1$ , since they are related to order statistics, but we provide here short proofs for completeness. The basic observation is that, for every $x\in{\mathbb{R}}^d$ , $0 \le \lambda \le \big( |Q|/\omega _d \big)^{1/d}$ , we have
hence, by independence,
If $x \in Q$ , we also have the upper bound (the worst case being $x$ a vertex of $Q$ )
hence,
Assume $Q = [0,1]^d$ . The standard layer-cake formula $\mathbb{E}\!\left [ Z^p \right ] = \int _0^\infty \mathbb{P}(Z\gt \lambda ) p \lambda ^{p-1} d \lambda$ yields, for every $p\gt 0$ , the existence of a constant $C =C(d,p)\gt 0$ such that, for every $n\ge 1$ ,
For $A \subseteq [0,1]^d$ , write $N(A) = \sum _{i=1}^n I_{A}(X_i)$ . Then $N(A)$ has binomial law with parameters $(n, |A|)$ . In particular, for every $t\gt 0$ ,
An application of Markov’s inequality yields that, letting
then, for every $t\gt 1$ , it holds
and, for $t\lt 1$ ,
We need some uniform bounds on $N(Q)$ for every cube $Q \subseteq [0,1]^d$ with sufficiently large or small volume. We write for brevity, for $v \ge 0$ ,
and
Lemma 2.8. Let $(X_i)_{i=1}^n$ be i.i.d. uniformly distributed on $[0,1]^d$ . For every $v\in (0,1)$ , there exists $C = C(v,d) \ge 1$ such that, for every $n \ge C$ , if $t\gt 2^{2d}$ , then
while, if $t\lt 2^{-2d}$ ,
with $F$ as in (2.10).
Proof. We prove (2.13) first. Let $k \in \mathbb{Z}$ be such that
so that every cube $Q$ with $|Q| =v$ is contained in a dyadic cube $\prod _{\ell =1}^d ( n_\ell 2^{-k}, (n_\ell +1)2^{-k})$ , with $(n_\ell )_{i=1}^d \in \big\{ 0,\ldots, 2^{k}-1 \big\}^d$ . It is then sufficient to consider the event $N(Q) \gt t nv$ for at least one such dyadic cube, i.e., using the union bound (2.11), we bound from above
Using that, for a dyadic cube with $|Q| = 2^{-dk}$ ,
it follows that (2.11) applies with $2^{-2d}t$ instead of $t$ , yielding
The argument for (2.14) is analogous. Let $k \in \mathbb{Z}$ be such that
hence every cube $Q$ with $|Q| \ge v$ contains at least one dyadic cube with volume $2^{-kd}$ , and we are reduced to consider the event that for such a cube $N(Q) \lt t nv$ , i.e.,
Using that
it follows that (2.12) applies with $2^{2d}t$ instead of $t$ , yielding
In the following result we investigate the random variable (for $n \ge 2$ )
where $\mathsf{n}(X_i)$ denotes the closest point to $X_i$ among $\left \{ X_j \right \}_{j \neq i}$ , so that
Heuristically, since $|X_i-\mathsf{n}(X_i)| \sim n^{-1/d}$ , and the random variables are almost independent, we still expect that $M_n \sim n^{-1/d}$ , up to logarithmic factors. This is indeed the case.
Proposition 2.9. Let $n \ge 2$ , $(X_i)_{i=1}^n$ be i.i.d. uniformly distributed on $[0,1]^d$ . Then, for every $a\gt 0$ , there exists a constant $C = C(a,d) \ge 1$ such that
In particular, for every $q\gt 0$ , there exists $C =C(d,q)\gt 0$ such that, for every $n \ge 2$ ,
Proof. It is convenient to replace the Euclidean distance in the definition of $M_n$ with the $\ell ^{\infty }$ distance $|x-y|_\infty = \max _{i=1, \ldots, d} |x_i-y_i|$ , i.e., we consider
Using that $|x-y|_\infty \le |x-y| \le \sqrt{d} | x-y|_\infty$ , we have $M_n^\infty \le M_n \le \sqrt{d} M_n^\infty$ , hence the thesis for the Euclidean case would follow once established for $M_n^\infty$ , since we do not aim for a sharp constant $C$ .
For every $n$ sufficiently large, we choose $\eta = \eta (d,n) \in ((a+1)2^{2d+1}, (a+1)2^{2d+2})$ such that, defining $\delta = \left ( \eta \log ( n)/{n} \right )^{1/d}$ , we have that $1/(3\delta )^d$ is an integer.
Consider a partition of $[0,1]^d$ into cubes $\left \{ Q_j \right \}_{j\in J}$ of volume
with $J = \big\{ 1, \ldots, \delta ^{-d} \big\}$ . Fix $\bar{t}$ large enough, in particular such that $\bar{t}\gt 2^{2d}\eta$ and $\eta 2^d F(\bar{t}2^{-2d}/\eta ) \gt a+1$ . Lemma 2.8 entails that the event
has probability larger than $1- C/n^a$ for some constant $C = C(d,a)$ . Indeed, since $v = \delta ^d = \eta \log (n)/n$ , inequality (2.13) yields
where we used that $\log (n)\gt \log (2)$ .
Similarly, by (2.14), with $t = 2/({\log}\,(n) \eta )$ (that is smaller than $2^{-2d}$ for $n$ large enough) it holds
For $n$ sufficiently large, it holds $F\!\left ( 2^{2d+1}/({\log}\,(n) \eta ) \right )\gt 1/2$ , hence
using that $\eta \gt (a+1)2^{2d+1}$ .
Hence, to prove the thesis, we can assume that $E$ holds. In such a case, it follows immediately that $M_n^\infty \le \delta$ , since the $\ell ^\infty$ -diameter of a cube $Q_j$ for $j \in J$ is $v^{1/d} = \delta$ and each point belongs in one such cube where at least one other point can be found. Hence, we deduce
Next, for each $j \in J$ , we introduce the random variables
i.e., we maximise the minimum $\ell ^\infty$ distances between points in $Q_j$ and those that are at $\ell ^\infty$ distance at most $\delta$ . These are not necessarily in $Q_j$ but must belong to the union of all cubes $Q_\ell$ covering the set
and is easily seen to be a cube of side length $3 \delta$ . Notice also that, since we argue in the event $E$ , each cube $Q_j$ contains at least two elements, hence
We now use the following fact: for every $f\,:\, \left \{ 1, \ldots, n \right \} \to J$ , conditioning upon the event
the random variables $(X_i)_{i=1}^n$ are independent, each $X_{i}$ uniform on $Q_{f(i)}$ . Thus, we further disintegrate upon the events $A_f$ , and since $E$ holds we consider only $f$ ’s such that, for every $j\in J$ ,
We introduce then a subfamily $K \subseteq J$ consisting of $(3 \delta )^{-d}$ cubes such that, for $j,k \in K$ , with $j \neq k$ , $Q_{j,\delta } \cap Q_{k,\delta } = \emptyset$ , so that the random variables $(M_{n,k})_{k \in K}$ are independent (after conditioning upon $A_f$ ). Using
and independence we have, for every $\lambda \gt 0$ ,
The probability $\mathbb{P}\!\left( M_{n,k} \le \lambda | A_f \right)$ clearly depends only on the number of elements in each set $f^{-1}(j)$ , i.e., on the number of points in each $Q_j$ , for $j \in J$ , not their labels. We may therefore assume without loss of generality that $f(1) = k$ , i.e., $X_1 \in Q_k$ and that
with
Then,
hence, further conditioning upon $X_1$ and using independence,
where in the last equality we choose $\lambda = \delta ((1-e^{-u})/ \omega _d)^{1/d}$ with $u =1/\left(2\cdot 3^d \bar{t}\right)$ . Indeed, this choice ensures that, by (2.16) we bound from above, for $n$ sufficiently large,
Finally, (2.15) follows since $M_n^\infty \le 1$ , hence, choosing $a = q/d$ , we bound from above
for a (possibly different) constant $C=C(d,q)$ .
A minor variation of the proof of the previous proposition yields the following bipartite analogue, where we replace $M_n$ with the (random) Hausdorff distance between $R$ and $B$ .
Proposition 2.10. For $n \ge 1$ , let $R^n = \left \{ X_i \right \}_{i=1}^{n_R}$ , $B^n= \left \{ Y_i \right \}_{i=1}^{n_B}$ be (jointly) i.i.d. uniformly distributed on $[0,1]^d$ with $n_R+n_B = n$ and
Then, for every $a\gt 0$ , there exists a constant $C = C(d,\alpha _R, \alpha _B, a)\ge 1$ such that, for every $n$ sufficiently large,
In particular, for every $q\gt 0$ , there exists $C =C(d,q,\alpha _R)\gt 0$ such that, for every $n$ sufficiently large,
Proof. The proof follows the same argument as the previous one, so we briefly describe the needed minor changes. It is again simpler to argue first in the $\ell ^\infty$ distance, hence we replace the variable $M^\infty _n$ with
After considering the same partition of $[0,1]^d$ into cubes $\left \{ Q_j \right \}_{j \in J}$ , we replace the event $E$ with
which has large probability, using first the independence between $R^n$ , $B^n$ and then arguing similarly as in the previous proof. Once reduced to the event $E$ , we repeat the same arguments replacing the variables $M_{n,j}$ , $j \in J$ , with
This eventually yields the bound
Finally, repeating the overall derivations with the roles of $R^n$ and $B^n$ exchanged, we obtain the thesis, because
3. Proof of Theorem 1.1
Throughout this section, for $n \ge 1$ , let $R^n = \left \{ X_i \right \}_{i=1}^{n_R}$ , $B^n= \left \{ Y_i \right \}_{i=1}^{n_B}$ be (jointly) i.i.d. uniformly distributed on $[0,1]^d$ with $n_R+n_B = n$ and
Let $T^n$ denote the Euclidean bipartite MST on $R^n$ , $B^n$ , and write $\Delta (T^n)$ be the maximum vertex degree of $T^n$ .
We split the proof into two separate results.
Lemma 3.1. There exists $\epsilon \gt 0$ such that, as $n \to \infty$ ,
Proof. For $n$ sufficiently large, we have $n_R \ge \alpha _R n/2$ , $n_B \ge \alpha _B n/2$ . Fix any $a\gt 0$ and let $C_1 = C(d,a)\ge 1$ be as in Proposition 2.9 applied to the variables $(X_i)_{i=1}^{n_R}$ , so that the event $E^n$ , such that there exists $X_i \in R^n$ with
(the second inequality being true if $n$ is sufficiently large) has probability $\mathbb{P}(E^n)\ge 1-C/n^a$ as $n \to \infty$ , for a suitable constant $C = C(d,\alpha _R, a)\gt 0$ . In particular, every point in the cube $Q$ centred at $X_i$ with side length
is strictly closer to $X_i$ than any other point in $R^n$ . Notice that $E^n$ and such cube $Q$ depend on the random variables $R^n$ only. By Lemma 2.2, if $E^n$ holds, every $Y_j \in Q$ will be adjacent in $T^n$ to $X_i$ , hence, choosing
and writing $N_{B^n}(Q)$ for the number of elements in $B^n \cap Q$ , we have
By independence, the conditional law of $N_{B^n}(Q)$ is Binomial with parameters $\big( n_B, \delta ^d \big)$ , hence we may use (2.12) to obtain
Since $n_B \ge \alpha _B n/2$ , $\delta ^d = \log (n)/ \big(n \big(4 \sqrt{d} C\big)\big)$ and $F(1/2)\gt 0$ , it follows that, as $n \to \infty$ ,
Lemma 3.2. For every $a\gt 0$ , there exists $C = C(d,a, \alpha _R) \gt 0$ such that, for $n$ sufficiently large,
In particular, for every $q\gt 0$ , there exists $C =C(d, q, \alpha _R)\gt 0$ such that, for every $n$ large enough,
Proof. For $n$ sufficiently large, we have $n_R \ge \alpha _R n/2$ , $n_B \ge \alpha _B n/2$ . Fix $a\gt 0$ and let $C_1 = C_1(d,a, \alpha _R)\ge 1$ be as in Proposition 2.10 applied to the variables $(X_i)_{i=1}^{n_R}$ , $(Y_j)_{j=1}^{n_B}$ so that, if $n$ is sufficiently large, the event
has probability smaller than $C_1/n^a$ . We claim that there exists $\lambda =\lambda (d,a, \alpha _R) \gt 8 C_1$ and $C = C(d,a,\alpha _R)$ such that the following conditions hold, for $n$ sufficiently large:
-
1. letting $v_1 = \left ( \lambda/4 \right )^d \log (n)/n$ , then
\begin{equation*} \mathbb {P}\!\left ( N^*_{R^n}(v_1) \gt \lambda ^d \log n \right ) \le \frac {C}{n^a} \quad \text {and} \quad \mathbb {P}\!\left ( N^*_{B^n}(v_1) \gt \lambda ^d \log n \right ) \le \frac {C}{n^a};\end{equation*} -
2. letting $v_2 = \left [ \left ( \lambda/8 - C_1) \right )/\big(2\sqrt{d}\big) \right ]^{d} \log (n)/n$ , then
\begin{equation*} \mathbb {P}\!\left ( \left ( N_{R^n} \right )_*(v_2) =0 \right ) \le \frac {C}{n^a}, \quad \mathbb {P}\!\left ( \left ( N_{B^n} \right )_*(v_2) =0 \right ) \le \frac {C}{n^a}. \end{equation*}
Once the claim is proved, it is immediate to show that on the event
that satisfies $\mathbb{P}(E^c) \lt C/n^a$ (with a different constant $C$ ), it must be $\Delta (T^n) \le \lambda ^d \log (n)$ . Indeed, if $\Delta (T^n) \gt \lambda ^d \log n$ assuming without loss of generality that $X_i \in R^n$ has degree larger than $\lambda ^d \log n$ , it follows from $N_{B^n}^*(v_1) \le \lambda ^d \log n$ that there must be a node in $Y_j \in B^n$ adjacent to $X_i$ that does not belong to the cube with volume $v_1$ centred at $X_i$ . In particular,
By Lemma 2.7, there exists a cube $Q$ with $Q \cap R^n = \emptyset$ and
contradicting $(N_{R^n})_*(v_2) \gt 0$ .
To prove the claim, we apply Lemma 2.8. Indeed, (2.13) with $t= 4^d n/n_R \ge 2^{2d}$ gives
provided that $n$ is sufficiently large and $\lambda$ is sufficiently large so that $\lambda ^d \alpha _R 2^{-d-1}F(n/n_R) \ge a+1$ . Conversely, we use (2.14) with $t = 1/n_R v_2 = \left [ \left ( \lambda/8 - C_1) \right )/\big( 2\sqrt{d}\big) \right ]^{-d} n/(n_R\log (n)) \lt 2^{-2d}$ (if $n$ is sufficiently large) so that
provided that we choose $\lambda$ large enough such that
since for $n$ sufficiently large, we have $F\Big(t2^{2d}\Big) \gt 1/2$ . Naturally, the same argument works replacing $R^n$ with $B^n$ , thus completing the proof of (1) and (2).
Finally, (3.1) follows since trivially $\Delta (T^n) \le n$ , hence, choosing $a = q$ , we bound from above
4. Proof of Theorem 1.2
We first extend [Reference Avram and Bertsimas1, Theorem 1] to the bipartite case. Let $G =(V, E, w)$ be a random weighted graph, i.e., $(w_e)_{e \in E}$ are random variables. To simplify, we assume throughout this section that $G$ is the complete graph over $V =\left \{ 1,\ldots, m \right \}$ for some $m \ge 1$ , but allow weights $w(e) \in [0,\infty ]$ . Recall that connection between nodes is meant only along paths consisting of edges with finite weight: we assume in particular that $G$ is a.s. connected. The number of connected components of $G(z)$ can be written as
where the random variable $X_{i,k,G(z)} \in \left \{ 0,1 \right \}$ indicates whether $i \in V$ belongs to a component of $G(z)$ having exactly $k$ elements. Similarly, for the number of connected components having at least $k$ nodes,
To estimate the expectation of $X_{i,k,G(z)}$ , in [Reference Avram and Bertsimas1] it is assumed that $\left ( w_{i,j} \right )_{i,j \in V}$ are exchangeable random variables. To extend the validity of their results to the bipartite case we relax this condition by requiring that the joint law of the weights is invariant with respect to the symmetries of an underlying graph model (such as a complete bipartite graph). Let us give the following general definition.
Definition 4.1. On a random weighted graph $G = (V, E, w)$ , nodes $i, j \in V$ are said equivalent in law if there exists a bijection $\pi \,:\, V \to V$ such that $\pi (i) = j$ and $\left ( w_{k,\ell } \right )_{k,\ell \in V}$ have the same joint law as $\left ( w_{\pi (k),\pi (\ell )} \right )_{k,\ell \in V}$ .
Clearly, this defines an equivalence relation, which is relevant for our purposes since, if $i, j \in V$ are equivalent in law, then for every $k$ , $z \ge 0$ the random variables $X_{i,k,G(z)}$ , $X_{j,k,G(z)}$ have the same law. Therefore, when computing the expectation $\mathbb{E}\!\left [ C_{G(z)} \right ]$ using (4.1), we are reduced to a summation upon $k$ and the equivalence classes. If the weights are exchangeable, then there is only one equivalence class, but this is also the case for a random Euclidean bipartite graph with $V = R\cup B$ and $|R| = |B|$ . To deal with bipartite graphs with $|R| \neq |B|$ we consider the case of two (non empty) equivalence classes $R$ and $B$ . We introduce the functions
i.e., the probability that a given node in $R$ (respectively in $B$ ) belongs to a connected component of $G(z)$ with exactly $k$ elements. Taking the expectation in (2.2) and (4.1), we deduce that
Consider now a sequence of random graphs $\left ( G^n \right )_{n=1}^\infty = \left ( (V^n, E^n, w^n) \right )_{n=1}^\infty$ , each with two equivalence classes $V^n = R^n \cup B^n$ , and write, for brevity,
For a (pseudo-dimension) parameter $d \gt 1$ we introduce the following assumptions:
-
a) For any $k\ge 1$ , $y\gt 0$ ,
\begin{equation*} \lim _{n \to \infty } P_{k,R}^n\left ( (y/n)^{1/d} \right ) = f_{k,R}(y), \quad \text {and} \quad \lim _{n \to \infty } P_{k,B}^n\left ( (y/n)^{1/d} \right ) = f_{k,B}(y),\end{equation*}where convergence is pointwise and dominated in the following sense: for every $k\ge 1$ there exists a function $\ell _{k}$ and $n_k \ge 1$ such that, for every $y\gt 0$ ,\begin{equation*} \sup _{n \ge n_k} P_{k,R}^n\left ( (y/n)^{1/d} \right )+ P_{k,B}^n\left ( (y/n)^{1/d} \right ) \le \ell _k(y)\end{equation*}and\begin{equation*} \int _{0}^{\infty } \ell _{k}(y)y^{\frac {1}{d}-1}dy \lt \infty .\end{equation*} -
b) It holds
\begin{equation*} \lim _{k \to \infty } \limsup _{n\to \infty } \left | \frac {1}{n^{1-\frac {1}{d}}} \int _{0}^{\infty } \left ( \mathbb {E}\!\left [ C_{k}^n(z) \right ]-1 \right ) dz \right | = 0.\end{equation*}
The following result extends [Reference Avram and Bertsimas1, Theorem 1] to the bipartite case.
Theorem 4.2. Let $d \gt 1$ and $\left ( G^n \right )_{n=1}^{\infty }$ be a sequence of random graphs, each with two equivalence classes $V^n = R^n \cup B^n$ , satisfying assumptions a), b),
and such that, for some constant $\bar{z} \in (0, \infty )$ , it holds $\mathbb{P}$ -a.s.
Then,
Proof. By (2.2), for any $n \ge 1$ , $k \ge 1$ , we decompose
having used that, for $z \gt \bar{z}$ the integrand in (2.2) is identically zero. Assumption b) gives that in the limit $n \to \infty$ , $k \to \infty$ the last line gives no contribution in the limit. Hence, it is sufficient to let $n \to \infty$ and then $k\to \infty$ in first term. Actually, since $n^{1-1/d} \to \infty$ , we only need to prove that
For fixed $k$ , the limit as $n \to \infty$ follows by dominated convergence applied to each $\ell =1, \ldots, k-1$ , yielding
after the change of variables $z = (y/n)^{1/d}$ and using Lebesgue’s dominated convergence Theorem, as ensured by assumption a). Finally, the limit as $k \to \infty$ follows by monotone convergence.
We now apply the above theorem to the bipartite MST problem on the $d$ -dimensional flat torus $\mathbb{T}^d ={\mathbb{R}}^d/\mathbb{Z}^d$ , endowed with the flat distance
This can be seen as an intermediate step towards the proof of Theorem 1.2, where we exploit the additional homogeneity of the torus. In particular, we use the fact that the uniform distribution on $\mathbb{T}^d$ is invariant with respect to translations. In practice, this means that compared to the case of the cube $[0,1]^d$ , one does not have to consider whether a point is close or far from the boundary (of course $\mathbb{T}^d$ has no boundary): a fact that in the limit plays no role also in $[0,1]^d$ , but it could make a direct derivation much more complicated.
Theorem 4.3. Let $d \ge 1$ , $n\ge 1$ , $R^n = \left \{ X_i \right \}_{i=1}^{n_R}$ , $B^n = \left \{ Y_i \right \}_{i=1}^{n_B}$ be (jointly) i.i.d. uniformly distributed on $\mathbb{T}^d$ with $n_R+n_B = n$ and
Then, for every $p \in (0,d)$ ,
with $\beta _{bMST}(d,p)$ as in Theorem 1.2 .
Proof. We apply Theorem 4.2 to the random bipartite graph $G^n$ over $V^n=R^n\cup B^n$ , and $w^n\big(X_i, Y_j\big) = \mathsf{d}_{\mathbb{T}^d}\big(X_i, Y_j\big)^p$ , that can be naturally identified with a graph over $\left \{ 1,\ldots, n \right \}$ (recall that we allow for infinite weights). We show separately that assumptions a) and b) hold with $d/p\gt 1$ instead of $d$ .
We introduce some notation: for $k_R$ , $k_B \in \mathbb{N}$ , $z\ge 0$ , let
denote the set of (ordered) points $\left( \left ( r_i \right )_{i=1}^{k_R}, \left ( b_j \right )_{j=1}^{k_B}\right)$ such that, in the associated bipartite graph with weights $( \mathsf{d}_{\mathbb{T}^d}(r_i,b_j)^p)_{i,j}$ , the subgraph with all edges having weight less than $z$ is connected (or equivalently, there exists a bipartite spanning tree having all edges with weight less than $z$ ). For a set $A \subseteq \mathbb{T}^d$ , $z\ge 0$ , write
Recall that by definition $P_{k,n,R}(z)$ is the probability that a fixed vertex in $R$ , say $X_1$ , belongs to a component of the subgraph $G^n(z)$ having exactly $k$ nodes. We disintegrate upon the nodes in $R$ and in $B$ belonging to such component. Clearly, only their numbers are relevant, not the precise labels (except for $X_1$ that is fixed). Therefore, we compute the probability $I_R(k_R, k_B, z)$ that $X_1$ belongs to a component with $k_R$ nodes $\left \{ X_i \right \}_{i=1}^{k_R} \subseteq R$ and $k_B$ nodes $\left \{ Y_j \right \}_{j=1}^{k_B} \subseteq B$ , with $k_R+k_B = k$ , which is precisely described as follows:
-
1. the set $\left(\left \{ X_i \right \}_{i=1}^{k_R}, \left \{ Y_j \right \}_{j=1}^{k_B}\right)$ belongs to $\Theta _{\mathbb{T}^d}(k_R, k_B, z)$ ,
-
2. $\mathsf{d}_{\mathbb{T}^d}\left(\left \{ Y_j \right \}_{j=1}^{k_B}, X_i \right) \gt z$ , i.e. $X_i \notin D_{\mathbb{T}^d} \left(\left \{ Y_j \right \}_{j=1}^{n_B}, z\right)$ , for every $i \gt k_R$ ,
-
3. $\mathsf{d}_{\mathbb{T}^d} \left(\left \{ X_i \right \}_{i=1}^{k_R}, Y_j\right) \gt z$ , i.e. $Y_j \notin D_{\mathbb{T}^d} \left ( \left \{ X_i \right \}_{i=1}^{k_R}, z \right )$ , for every $j \gt k_B$ .
Conditioning upon $\left \{ X_i \right \}_{i=1}^{k_R}$ , $\left \{ Y_j \right \}_{j=1}^{k_B}$ and using independence for the events 2) and 3), we obtain the following expression for the probability:
where $d r d b$ stands for integration performed with respect to the variables $\left \{ r_i \right \}_{i=1}^{k_R}$ and $\left \{ b_j \right \}_{j=1}^{k_B}$ . Summing upon all the different choices of labellings (recall that $X_1$ is kept fixed) and upon $k_R \ge 1$ , $k_B$ with $k_R+k_B = k$ gives
We now replace integration in (4.3) from $(\mathbb{T}^{d})^{k}$ to $\big({\mathbb{R}}^d\big)^k$ . This is possible provided that $z$ is small enough, so that only the local structure is relevant. We first notice that, since we are considering the flat torus $\mathbb{T}^d$ , by exploiting the invariance with respect to translations of the uniform distribution, we can always fix one variable, say $r_1 = 0$ . We thus integrate upon the configurations $\left \{ r_i \right \}_{i=2}^{k_R}$ , $\left \{ b_j \right \}_{j=1}^{k_B}$ such that adding $0$ to the set $\left \{ r_i \right \}_{i=2}^{k_R}$ yields a bipartite graph that contains a spanning tree $T$ with edge weights smaller than $z$ . Now, if $z \le 1/(4k)$ , it follows that such tree is contained in a ball of centre $0 \in \mathbb{T}^d$ and radius $1/4$ , hence it can be isometrically lifted to a tree on $(-1/2,1/2)^d \subseteq{\mathbb{R}}^d$ . Similarly, both $D_{\mathbb{T}^d}\left(\left \{ b_j \right \}_{j=1}^{k_B}, z\right)$ , $D_{\mathbb{T}^d}\left(\left \{ r_i \right \}_{i=1}^{k_R},z\right)$ are then contained in a ball of centre $0$ and radius $1/2$ , hence their volumes computed on $\mathbb{T}^d$ coincide with those of their respective lifts on $(-1/2, 1/2)^d \subseteq{\mathbb{R}}^d$ . To parallel the notation, we therefore define the set $\Theta _{{\mathbb{R}}^d}(k_R, k_B, z) \subseteq \big({\mathbb{R}}^d\big)^{k_R} \times \big({\mathbb{R}}^d\big)^{k_B}$ , analogous to $\Theta _{\mathbb{T}^d}(k_R, k_B, z)$ . We notice that
as defined in (1.3). We write
for $A \subseteq{\mathbb{R}}^d$ , and notice that $D_{{\mathbb{R}}^d}(A,1) = D(A)$ as defined in (1.4). As noticed above, for $z \le 1/(4k)$ , we have the identity
where here $drdb$ denotes Lebesgue integration with respect to the remaining $k-1$ variables in ${\mathbb{R}}^d$ .
Since for every $A \subseteq{\mathbb{R}}^d$ and $z\gt 0$ ,
a change of variable in the integration $r_i z^{-1/p} \mapsto r_i$ , $b_i z^{-1/p} \mapsto b_i$ yields, for $z \le 1/(4k)$ ,
(notice the exponent $k-1$ instead of $k$ because of the different integration for $r_1$ ). Given $y\gt 0$ , set $z\,:\!=\,\left ( y/n \right )^{p/d}$ , so that $z \le 1/(4k)$ if $n$ is sufficiently large, hence
Since $\Theta \!\left(k_R, k_B\right)$ has finite measure (with respect to $\delta _0(r_1)drdb$ ), by dominated convergence it follows that
Moreover, we have the limit
so that for every $y\gt 0$ we have pointwise convergence
We next prove that the convergence is dominated in the sense of assumption a), with $d/p$ instead of $d$ . In view of (4.4) and the limit (4.5) it is sufficient to establish an inequality
for some $n_k\ge 1$ and a suitable function $\ell _{R, k_R, k_B}$ such that
We begin by noticing that, if $A \subseteq \mathbb{T}^d$ is not empty, then for $0\lt z \le 2^{-p}$ we have $|D_{\mathbb{T}^d}(A, z)| \ge \omega _d z^{d/p}$ , while for $2^{-p} \lt z \le \sqrt{d}/2$ it holds $\big|D_{\mathbb{T}^d}(A,z)\big| \ge \omega _d 2^{-d}$ and for $z \ge \sqrt{d}/2$ we have $\big|D_{\mathbb{T}^d}(A,z)\big| = 1$ . Therefore, one can find a constant $c=c(d)\gt 0$ such that
Next, we argue that, for every $z \gt 0$ , one has the inequality
Indeed, every tree (even not necessarily bipartite) with edge weights smaller than $z$ can be obtained starting from any given ‘root’ $x_1$ and by recursively choosing a sequence of points such that
where the second inclusion follows by the triangle inequality and a simple induction. Hence, conditionally upon $x_1$ , the set $\Theta _{\mathbb{T}^d}(n_R, n_B, z)$ is contained in the Cartesian product $\left \{ x_1 \right \}\times D_{\mathbb{T}^d}\big(x_1, (k-1)^pz\big)^{(k-1)}$ , and (4.9) follows. Using the inequalities (4.8) and (4.9) and further assuming that $k_B \ge 1$ (recall that we already have $k_R \ge 1$ by hypothesis), we have
Substituting $z = (y/n)^{p/d}$ yields the domination (4.6) with
for $n \ge 2k$ , and clearly (4.7) holds using that $c\gt 0$ and $p/d\lt 1$ . Arguing similarly in the case $k_B=0$ , $k_R=1$ , we obtain
yielding a similar domination also in this case. Finally, in the cases $k_B = 0$ and $k_R \ge 2$ , there is actually nothing to prove, since $\Theta _{\mathbb{T}^d}(k_R, k_B, z)$ is empty.
Performing similar arguments for $P_{k,B}^n$ yields instead
with analogous definitions and dominations. Thus, the validity of assumption a) in this case is fully established.
It is convenient to notice here that, by exchanging the order of integration, we have
This identity, after some simple manipulations and taking into account also the analogous contributions from the terms $\mathcal{I}_B\big(k_R, k_B, \alpha _R, y\big)$ eventually yields the claimed expression for $\beta _{b\text{MST}}(p,d)$ .
We next prove that assumption b) holds. A lower bound is straightforward, since the maximum weight of the edges is uniformly bounded (by some constant $M= M(d,p) \gt 0$ , e.g. $M = d^{p/2}$ ), it follows that $C_k^n(z)=1$ if $z \ge M$ and $n$ is sufficiently large (recall that we must let first $n \to \infty$ and then $k \to \infty$ , so we can assume $n \ge k$ ).
It follows that
To obtain an upper bound we use Lemma 2.4 on each $G^n$ – we can assume $k\ge 2$ and $n \ge k$ . Given the sets $C_i$ , for $i= 1\ldots, m$ with $m \le n/k$ , we choose elements $r_i \in C_i$ such that $b_i\,:\!=\,\mathsf{n}_{G^n}(r_i) \in C_i$ and $\mathsf{n}_{G^n}(b_i) = r_i$ . Without loss of generality, we can assume that $r_i \in R^n$ , $b_i \in B^n$ . We consider the induced subgraph of $G^n_k\subseteq G_k$ obtained by restriction on the nodes $R^n_k \,:\!=\, \left \{ r_i \right \}_{i=1}^m$ , $B^n_k \,:\!=\, \left \{ b_i \right \}_{i=1}^m$ . With the notation of Lemma 2.4, we have
hence
We then use Lemma 2.5 (in fact applied on the metric space $\mathbb{T}^d$ ) to obtain that, for some constant $C = C(p)\gt 0$ ,
where only one summation appears since $\mathsf{n}_{G^n_k} (b_i) = \mathsf{n}_G(b_i) = r_i$ . Bounding from above the distance on $\mathbb{T}^d$ with the Euclidean distance, and using Remark (2.6), we have the inequality, for some constant $C = C(d,p)\gt 0$ ,
where we also used that $b_i = \mathsf{n}_{G^n}(r_i)$ . We finally apply (2.9) with $x = r_i$ and the i.i.d. uniform random variables $(Y_j)_{j=1}^n$ to conclude that, again for some further constant $C = C(d,p)\gt 0$ ,
Dividing by $n^{1-p/d}$ and letting first $n\to \infty$ and then $k \to \infty$ gives the thesis.
To transfer the result from the torus $\mathbb{T}^d$ to the cube $[0,1]^d$ , we use the fact that points in $[0,1]^d$ can be projected to $\mathbb{T}^d$ , and $\mathsf{d}_{\mathbb{T}^d}(x,y) \le |x-y|$ , so that, for any $p\gt 0$ , $R$ , $B \subseteq [0,1]^d$ ,
for any set of points $R$ , $B \subseteq [0,1]^d$ , where $\mathcal{C}_{\text{MST}}^p\left(R,B |\mathbb{T}^d\right)$ denotes the MST cost functional on $\mathbb{T}^d$ . Letting $p \to \infty$ yields also $\mathcal{C}_{\text{MST}}^\infty \left(R,B |\mathbb{T}^d\right) \le \mathcal{C}_{\text{MST}}^\infty (R,B)$ . A converse inequality is the following one.
Lemma 4.4. If $\delta \in (0,1/2)$ is such that $\mathcal{C}_{\text{MST}}^\infty (R,B) \le \delta$ , then, for every $p\gt 0$ ,
where $R_\delta = R\setminus [ \delta, 1- \delta ]^d$ , $B_\delta = B\setminus [ \delta, 1- \delta ]^d$ .
Proof. Indeed, let $T$ be an MST for $R$ , $B$ projected on $\mathbb{T}^d$ . The assumption gives that for every $\left \{ r,b \right \} \in T$ , since $|r-b|\le \delta$ , it must be $\mathsf{d}_{\mathbb{T}^d}(r,b) = |r-b|$ if $r \in R\setminus R_\delta$ or $b \in B \setminus B_\delta$ . Therefore, the only obstacle to bound $\mathcal{C}_{\text{MST}}^p(R,B)$ from above by $\mathcal{C}_{\text{MST}}^p\left(R,B|\mathbb{T}^d\right)$ is due to edges $\left \{ r,b \right \} \in T$ with $r\in R_\delta$ and $b \in B_\delta$ , for which $\mathsf{d}_{\mathbb{T}^d}(r,b)$ may be much smaller than $|r-b|$ . However, removing all these edges and adding all the edges of a bipartite Euclidean MST over $R_\delta$ , $B_\delta$ yields a connected graph and the desired upper bound.
We combine the above lemma with the following asymptotic upper bounds.
Lemma 4.5. For $n \ge 1$ , let $R^n = \left \{ X_i \right \}_{i=1}^{n_R}$ , $B^n= \left \{ Y_i \right \}_{i=1}^{n_B}$ be (jointly) i.i.d. uniformly distributed on $[0,1]^d$ with $n_R+n_B = n$ and
Then, for every $p\gt 0$ , there exists a constant $C = C(d,p, \alpha _R)\gt 0$ such that, for $n$ large enough,
as well as, for some constant $C=C(d,p, \alpha _R) \gt 0$ ,
and finally, for every $a\gt 0$ , there exists $C= C(d,a,\alpha _R)\gt 0$ , such that for $n$ large enough,
Proof. This follows from an application of the space-filling curve technique: consider $\gamma \,:\, [0,1] \to [0,1]^d$ such that the push-forward of the uniform measure on $[0,1]^d$ is the uniform measure on $[0,1]^d$ and it is Hölder continuous with exponent $1/d$ , i.e.,
While many constructions for $d=2$ are historically well known, the case of general $d$ is established in detail e.g. in [Reference Milne23]. Let then $\left ( Z_i \right )_{i=1}^{m}$ be i.i.d. uniform on $[0,1]$ , so that $\left ( \gamma (Z_i) \right )_{i=1}^{m}$ are i.i.d. on $[0,1]^d$ . Consider the order statistics
and let $T$ be the connected graph on $\left \{ \gamma (Z_i) \right \}_{i=1}^{m}$ with edges
We have
The law of each $Z_{(i+1)} - Z_{(i)}$ is beta $B(1,n)$ , so that, for every $q \in \mathbb{N}$ ,
Bounding the $p/d$ -th moment with the $\left \lceil p/d \right \rceil$ -th moment gives that, for $p\gt 0$ , there exists a constant $C(p)$ such that, for every $m$ and $i$ ,
The first inequality of the thesis thus follows by summation upon $i= 1\ldots, m-1$ and letting $m=n_R$ . For the second inequality, we use Lemma 2.5 and (2.9). The remaining inequalities follow analogously, noticing that
with the notation of Proposition 2.9.
The following result entails that the expectation of the bipartite Euclidean minimum spanning tree cost on $\mathbb{T}^d$ and on the cube $[0,1]^d$ are much closer than the rate $n^{1-p/d}$ .
Proposition 4.6. For $n \ge 1$ , let $R^n = \left \{ X_i \right \}_{i=1}^{n_R}$ , $B^n= \left \{ Y_i \right \}_{i=1}^{n_B}$ be (jointly) i.i.d. uniformly distributed on $[0,1]^d$ with $n_R+n_B = n$ and
Then, for every $p\gt 0$ , there exists a constant $C = C(p,d)\gt 0$ such that, for $n$ large enough,
In particular, for every $p \lt d$ ,
with $\beta _{b\text{MST}}(p,d)$ as in Theorem 1.2 .
Proof. Let $C\gt 0$ be as in (4.10) with $a\gt p/d$ and let $\delta = C({\log}\,(n)/n)^{1/d}$ . Since both $n_R$ and $n_B$ grow linearly with $n \to \infty$ , an application of Lemma 2.8 as in the proof of Proposition 2.9 ensures that with probability larger than $1-C/n^a$ , for every fixed $a\gt 0$ , in particular for $a\gt p/d$ (and a suitable constant $C\gt 0$ ) all cubes $Q$ with $|Q| = \delta ^d$ have non-empty intersections both with $R^n$ and $B^n$ . Let $E$ be the event where both these conditions occur as well as $\mathcal{C}_{\text{MST}}^\infty (R^n, B^n) \le \delta$ . In the event $E$ , we may apply Lemma 4.4, otherwise we simply bound $\mathcal{C}_{\text{MST}}^p(R^n, B^n) \le n d^{p/2}$ and use the fact that $\mathbb{P}(E^c)\le C/n^a$ for $a$ arbitrary large.
We are then reduced to bound from above
where $R^n_\delta = R^n \setminus [\delta, 1-\delta ]^d$ , $B^n_\delta = B^n \setminus [\delta, 1-\delta ]^d$ . We decompose $[0,1]^d \setminus [\delta, 1-\delta ]^d$ into $\left ( Q_i \right )_{i=1}^m$ , cubes with side length $\delta$ , with $m \le C \delta ^{-(d-1)}$ , for some $C = C(d)\gt 0$ . We consider a bipartite Euclidean minimum spanning tree $T^n_i$ on $R^n_i \,:\!=\, R^n \cap Q_i$ , $B^n_i \,:\!=\, B^n \cap Q_i$ – both are non empty if the event $E$ occurs – and then add $m-1$ edges, each with one node in a cube and an adjacent one, to connect all these trees (again this is possible since $E$ occurs). This construction leads to a bipartite spanning tree on $R^n_\delta$ , $B^n_\delta$ , so that, in $E$ ,
Taking expectation and rescaling from the cube $Q_i$ to $[0,1]^d$ , we bound each term in the sum using Lemma 4.5, writing $N_R(Q_i) = |R^n_i|$ , $N_B(Q_i) = |B^n_i|$ , that are independent random variables with binomial laws with common parameters $(n, \delta ^d)$ . It follows that
having used that
and
It follows that, for some (other) constant $C = C(p,d)\gt 0$ ,
that eventually gives the thesis.
We end the proof of Theorem 1.2 with a concentration result to improve from convergence of expectations to almost sure convergence.
Proposition 4.7. Let $d \ge 1$ . For $n \ge 1$ , let $R^n = \left \{ X_i \right \}_{i=1}^{n_R}$ , $B^n= \left \{ Y_i \right \}_{i=1}^{n_B}$ be (jointly) i.i.d. uniformly distributed on $[0,1]^d$ with $n_R+n_B = n$ and
Then, for every $p \in (0, d/2)$ if $d \in \left \{ 1,2 \right \}$ or any $p\gt 0$ if $d \ge 3$ , almost sure convergenc holds:
Proof. Consider the function
We argue separately for $p\lt 1$ and $p\ge 1$ . In the former case, if $(x,y)$ , $(x^{\prime},y^{\prime}) \in \left ( [0,1]^{d} \right )^{n_R+n_B}$ differ only on a single coordinate, say for simplicity $x_1 \neq x^{\prime}_1$ , then, letting $T$ denote the Euclidean bipartite minimum spanning tree on $\left \{ x_i \right \}_{i=1}^{n_R}$ , $\left \{ y_j \right \}_{j=1}^{n_B}$ ,
where $\Delta (T)$ denote the maximum degree of $T$ . Arguing symmetrically, we obtain
By Lemma A.1 with $E_i = [0,1]^d$ and $g_i(x,y) = \Delta (T) d^{p/2}$ (for every $i=1,\ldots, n$ ) we obtain, for every $q\ge 2$ , that there exists $C = C(q)\gt 0$ such that
where we used (3.1) and $C^{\prime}= C(d,q,\alpha _R, \alpha _b)\gt 0$ is a constant. Dividing both sides by $n^{q(1-p/d)}$ and using Markov’s inequality yields, for every $\epsilon \gt 0$ ,
that is summable if $p/d-1/2\lt 0$ , i.e., $p\lt d/2$ , and $q$ is sufficiently large.
In the case $p \ge 1$ , we use the fact that $f$ is Lipschitz (being minimum of Lipschitz functions) with a.e. derivative given by
We bound from above, using the Cauchy-Schwartz inequality
It follows that the (Euclidean) norm of the derivative is bounded above by
where we used the fact that the minimum spanning tree $T$ does not depend on the choice of $p$ , see Remark 2.3. For every $q\gt 0$ Lemma 3.2 and Lemma 4.5 yield that, for some constant $C = C(d,p,q)\gt 0$ and $n$ sufficiently large,
It follows that (possibly for a larger constant $C$ )
Poincaré’s inequality for the uniform measure on the unit cube, see e.g. the argument in [Reference Ledoux21, Prop. 2.8], gives that, for some constant $C = C(q)\gt 0$ ,
thus using Markov’s inequality, for every $\epsilon \gt 0$ , we have
which is summable if $q$ is large enough and $d \ge 3$ .
Acknowledgements
Both authors warmly thank the anonymous referees for their feedback and suggestions, which improved the quality of this paper. D.T. is a member of the INdAM GNAMPA group and was partially supported by GNAMPA project 2020 ‘Problemi di ottimizzazione con vincoli via trasporto ottimo e incertezza’.
Appendix A. A concentration inequality in $\textit{L}^{\textit{q}}$
McDiarmid inequality [Reference McDiarmid22] is a simple but effective concentration inequality often used in random combinatorial optimisation problems. An interpretation is that the oscillations of a function of many independent random variables are bounded by its Lipschitz norm (with respect to a Hamming-type distance). The usual proof relies on concentration inequalities for discrete time exponential martingales. In this appendix we show an analogous result where we replace the Lipschitz condition with a ‘Sobolev’ one and use the Burkholder-Gundy inequalities instead.
Lemma A.1. Let $\left ( (E_i, \mathcal{E}_i) \right )_{i=1}^n$ be measurable spaces, set $E = \prod _{i=1}^n E_i$ and let
be such that, for every $i \in \left \{ 1,\ldots, n \right \}$ , for every $x$ , $x^{\prime}\in E$ with
then
For every $q \ge 2$ , there exists $C= C(q)\gt 0$ such that, if $X = (X_i)_{i=1}^n$ are independent random variables, with $X_i\,:\, \Omega \to E_i$ , then
Proof. Write $\mathbb{E}_{i}$ for the conditional expectation with respect to the variables $\left ( X_j \right )_{j\le i}$ , i.e., conditioned on the values of these variables. In particular, $\mathbb{E}_{0} = \mathbb{E}$ and $\mathbb{E}_{n}$ is the identity operator. We write $f(X)- \mathbb{E}\!\left [ f(X) \right ]$ as a sum of martingale differences
The Burkholder-Gundy inequality [Reference Burkholder and Gundy7] gives, for some constant $C=C(q) \ge 0$ ,
To simplify notation, we introduce a copy of the independent variables $\big(X^{\prime}_i\big)_{i=1}^n$ defined on a different space $\Omega ^{\prime}$ , and write $\mathbb{E}^{\prime}$ for expectation with respect to such variables, so that, because of independence, for every $i=0,\ldots, n$ ,
This leads to
Using these inequalities in (A.1) yields
hence the thesis.