
On the support of measures with fixed marginals with applications in optimal mass transportation


Published online by Cambridge University Press:  29 May 2024

Abbas Moameni*
Affiliation:
School of Mathematics and Statistics, Carleton University, Ottawa, ON K1S 5B6, Canada

Abstract

Let $\mu $ and $\nu $ be Borel probability measures on complete separable metric spaces X and Y, respectively. Each Borel probability measure $\gamma $ on $X\times Y$ with marginals $\mu $ and $\nu $ can be described through its disintegration $\big ( \gamma _{x}\big )_{x \in X}$ with respect to the initial distribution $\mu .$ Assume that $\mu $ is continuous, i.e., $\mu \big (\{x\}\big )=0$ for all $x \in X.$ We shall analyze the structure of the support of the measure $\gamma $ provided $\text {card } \big (\mathrm{spt} (\gamma _{x}) \big )$ is finitely countable for $\mu $-a.e. $x\in X.$ We shall also provide an application to optimal mass transportation.

Copyright
© The Author(s), 2024. Published by Cambridge University Press on behalf of Canadian Mathematical Society

1 Introduction

Let X and Y be Polish spaces equipped with Borel probability measures $\mu $ on X and $\nu $ on $Y.$ Recall that a measure is called continuous if $\mu \big (\{x\}\big )=0$ for all $x \in X.$ Let $\Pi (\mu ,\nu )$ be the set of Borel probability measures on $X\times Y$ which have X-marginal $\mu $ and Y-marginal $\nu .$ Let $\gamma \in \Pi (\mu ,\nu ). $ In what follows, we say that $\gamma \in \Pi (\mu ,\nu )$ is concentrated on a set S if the outer measure of its complement is zero, i.e., $\gamma ^*(S^c)=0.$ The support of the measure $\gamma $ is denoted by $\mathrm{spt}(\gamma )$ and is the smallest closed set such that $\gamma $ is zero on its complement. We now define precisely some notation describing measures concentrated on several graphs.

Definition 1.1 Let X and Y be Polish spaces with Borel probability measures $\mu $ on X and $\nu $ on $Y.$ Let $k \in \mathbb {N}\cup \{\infty \}.$ We say that a measure $\gamma \in \Pi (\mu , \nu )$ is concentrated on the graphs of measurable maps $\{G_i\}_{i=1}^k$ from X to Y if there exists a sequence of measurable nonnegative functions $\{\alpha _i\}_{i=1}^k$ from X to $\mathbb {R}$ with $\sum _{i=1}^k \alpha _i(x)=1$ ($\mu $-almost surely) such that for each bounded continuous function $f: X\times Y \to \mathbb {R}$,

$$\begin{align*}\int_{X \times Y} f(x,y)\, d \gamma= \sum_{i=1}^k\int_X \alpha_i(x) f(x,G_ix) \, d\mu. \end{align*}$$

In this case, we write $\gamma =\sum _{i=1}^k (Id \times G_i)_\#(\alpha _i \mu ).$
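To make Definition 1.1 concrete, here is a minimal numerical sketch that is not part of the original argument. It assumes $X=Y=[0,1]$, $\mu $ the Lebesgue measure, two hypothetical maps $G_1(x)=x$ and $G_2(x)=1-x$, and constant weights $\alpha _1=\alpha _2=1/2$; the helper name `integrate_plan` is illustrative. It approximates the right-hand side of the defining identity by the midpoint rule.

```python
from typing import Callable, Sequence


def integrate_plan(
    f: Callable[[float, float], float],
    maps: Sequence[Callable[[float], float]],
    weights: Sequence[Callable[[float], float]],
    n: int = 10_000,
) -> float:
    """Midpoint-rule approximation of sum_i int_0^1 alpha_i(x) f(x, G_i(x)) dx.

    Assumption: mu is Lebesgue measure on [0, 1].
    """
    h = 1.0 / n
    total = 0.0
    for j in range(n):
        x = (j + 0.5) * h  # midpoint of the j-th cell
        for G, alpha in zip(maps, weights):
            total += alpha(x) * f(x, G(x)) * h
    return total


# Hypothetical example: gamma sits on the diagonal and the anti-diagonal.
maps = [lambda x: x, lambda x: 1.0 - x]
weights = [lambda x: 0.5, lambda x: 0.5]

total_mass = integrate_plan(lambda x, y: 1.0, maps, weights)
x_mean = integrate_plan(lambda x, y: x, maps, weights)
y_second_moment = integrate_plan(lambda x, y: y * y, maps, weights)
```

In this example both marginals are Lebesgue measure on $[0,1]$, so the three computed quantities should be close to $1$, $1/2$, and $1/3$, respectively.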

Setting $\Gamma =\mathrm{spt}(\gamma ),$ for every $x \in X$ , we denote by $\Gamma _x$ the x-section of $\Gamma $ , i.e.,

$$\begin{align*}\Gamma_x=\big \{y \in Y; \, (x,y)\in \Gamma\big \}. \end{align*}$$
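The x-sections can be illustrated on a toy finite set. The sketch below is only an assumption-laden illustration: the finite set `Gamma` is a hypothetical stand-in and not the support of a plan with a continuous marginal, and the helper name `sections` is not from the paper. It groups the pairs of `Gamma` by their first coordinate and reports the largest cardinality of a section.

```python
from collections import defaultdict
from typing import Dict, Hashable, Iterable, Set, Tuple


def sections(
    gamma_set: Iterable[Tuple[Hashable, Hashable]]
) -> Dict[Hashable, Set[Hashable]]:
    """Return the map x -> Gamma_x = {y : (x, y) in Gamma} for a finite set Gamma."""
    result: Dict[Hashable, Set[Hashable]] = defaultdict(set)
    for x, y in gamma_set:
        result[x].add(y)
    return dict(result)


# Hypothetical finite stand-in for Gamma.
Gamma = {(0, "a"), (0, "b"), (1, "a"), (2, "c")}
Gamma_x = sections(Gamma)

# Largest number of points lying over a single x.
max_card = max(len(ys) for ys in Gamma_x.values())
```

Here the section over `0` contains two points and every other section contains one, so the bound on the cardinality of the sections in this toy example is two.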

Here is our main result in this paper.

Theorem 1.2 Let $\mu $ and $\nu $ be Borel probability measures on complete separable metric spaces X and Y, respectively. Assume that at least one of $\mu $ or $\nu $ is continuous. Let ${\gamma \in \Pi (\mu ,\nu )}$ and $\Gamma =\mathrm{spt}(\gamma ).$ The following assertions hold:

  1. If there exists $m \in \mathbb {N}$ such that $\text {card} \big (\Gamma _x \big ) \leq m$ for $\mu $-a.e. $x\in X,$ then there exist $k\leq m$ and a sequence of Borel measurable maps $\{G_i\}_{i=1}^k$ from X to Y such that the measure $\gamma $ is concentrated on their graphs.

  2. If $\text {card} \big (\Gamma _x \big )< \infty $ for $\mu $-a.e. $x\in X,$ then there exist $k\in \mathbb {N}\cup \{\infty \}$ and a sequence of Borel measurable maps $\{G_i\}^k_{i=1}$ from X to Y such that the measure $\gamma $ is concentrated on their graphs.

This theorem has direct applications in the theory of optimal transportation as it provides a precise description of the structure of optimal plans [1, 6, 7, 10–12]. Theorem 1.2 has a straightforward generalization to the multi-marginal case (see Corollary 2.9). We refer to [9] for applications of this result in multi-marginal mass transportation. We also remark that a weaker version of Theorem 1.2 is proved implicitly in [8]. The next section is devoted to the proof of the main theorem.

2 Preliminaries and the proof of Theorem 1.2

We shall need some important preliminaries from the theory of measures before proving Theorem 1.2. Let $(X, \mathcal {B}, \mu )$ be a finite, not necessarily complete measure space, and let $(Y, \Sigma )$ be a measurable space. The completion of $\mathcal {B}$ with respect to $\mu $ is denoted by $\mathcal {B}_\mu .$ When necessary, we identify $\mu $ with its completion on $\mathcal {B}_\mu .$ The push forward of the measure $\mu $ by a map $T: (X, \mathcal {B}, \mu ) \to (Y, \Sigma )$ is denoted by $T_\# \mu ,$ i.e.,

$$\begin{align*}T_\# \mu (A)=\mu(T^{-1}(A)), \qquad \forall A \in \Sigma.\end{align*}$$
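For a finitely supported measure the push-forward can be computed directly from this formula. The following sketch is an illustration under that assumption only (the paper works with general finite measure spaces); the function name `push_forward` and the example data are hypothetical.

```python
from collections import defaultdict
from fractions import Fraction
from typing import Callable, Dict, Hashable


def push_forward(
    mu: Dict[Hashable, Fraction], T: Callable[[Hashable], Hashable]
) -> Dict[Hashable, Fraction]:
    """Return T_# mu for a finitely supported measure mu given as {point: mass}."""
    nu: Dict[Hashable, Fraction] = defaultdict(Fraction)
    for x, mass in mu.items():
        nu[T(x)] += mass  # the mass sitting at x is moved to T(x)
    return dict(nu)


# Hypothetical example: uniform measure on {0, 1, 2, 3} pushed forward by x -> x mod 2.
mu = {x: Fraction(1, 4) for x in range(4)}
nu = push_forward(mu, lambda x: x % 2)
```

In this example $T^{-1}(\{0\})=\{0,2\}$, so $T_\# \mu (\{0\})=\mu (\{0,2\})=1/2$, in agreement with the definition; exact rational arithmetic is used so that the total mass is preserved without rounding.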

Definition 2.1 Let $T: X \to Y$ be $(\mathcal {B}, \Sigma )$-measurable, and let $\nu $ be a positive measure on $\Sigma .$ We call a map $F: Y \to X$ a $(\Sigma _\nu ,\mathcal {B})$-measurable section of T if F is $(\Sigma _\nu ,\mathcal {B})$-measurable and $T \circ F = \text {Id}_Y.$

If X is a topological space we denote by $\mathcal {B}(X)$ the set of Borel sets on $X.$ The space of Borel probability measures on a topological space X is denoted by $\mathcal {P}(X)$ . The following definition and proposition are essential in the sequel.

Definition 2.2 Let X be a Polish space, let $T: X \to X$ be a surjective Borel measurable map, and let $\mu $ be a positive finite measure on $\mathcal {B}(X).$ Denote by $\mathcal {S}(T)$ the set of all measurable sections of T, i.e.,

$$\begin{align*}\mathcal{S}(T)=\Big\{ F:\big (X, \mathcal{B}(X)_\mu\big ) \to \big (X,\mathcal{B}(X)\big ); \, \, T \circ F=Id_X \Big \}.\end{align*}$$

Let $\mathcal {K} \subset \mathcal {S}(T).$ We say that a measurable function $F:\big (X, \mathcal {B}(X)_\mu \big ) \to \big (X,\mathcal {B}(X)\big )$ is generated by $\mathcal {K}$ if there exists a sequence $\{F_i\}_{i=1}^\infty \subset \mathcal {K}$ such that

$$\begin{align*}X=\cup_{i=1}^\infty\big \{x \in X; \, \, F(x)=F_i(x)\big \}.\end{align*}$$

We also denote by $\mathcal {G}(\mathcal {K})$ the set of all functions generated by $\mathcal {K}.$ It is easily seen that $\mathcal {K} \subseteq \mathcal {G}(\mathcal {K})\subseteq \mathcal {S}(T).$
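As an illustration only (this example is an assumption added for concreteness and is not part of the original argument), let $X=[0,1)$ with its usual topology, which is a Polish space, and let $T(x)=2x \bmod 1$ be the doubling map. The two maps

$$\begin{align*}F_1(x)=\frac{x}{2} \qquad \text{and} \qquad F_2(x)=\frac{x+1}{2}\end{align*}$$

are Borel sections of T, since $T\circ F_1(x)=x$ and $T\circ F_2(x)=(x+1) \bmod 1=x$ for all $x \in X.$ Taking $\mathcal {K}=\{F_1,F_2\}$, for any Borel set $A\subseteq X$ the map F that equals $F_1$ on A and $F_2$ on $X\,{\backslash}\, A$ is generated by $\mathcal {K}$, because $X=\{x; \, F(x)=F_1(x)\}\cup \{x; \, F(x)=F_2(x)\}.$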

Proposition 2.1 Let X be a Polish space, let $T: X \to X$ be a surjective Borel measurable map, and let $\mu $ be a positive finite measure on $\mathcal {B}(X).$ Let $\mathcal {K}$ be a nonempty subset of $\mathcal {S}(T).$ Then, there exist $ k \in \mathbb {N} \cup \{\infty \}$ and a sequence $\{F_i\}_{i=1}^k \subset \mathcal {G}(\mathcal {K})$ such that the following assertions hold:

  1. For each $i\in \mathbb {N}$ with $i\leq k$, we have $\mu (B_{i})>0,$ where $\{B_i\}_{i=1}^k$ is defined recursively as follows:

    $$\begin{align*}B_1=X \quad \text{and} \quad B_{i+1}=\Big \{x \in B_i; \,\, F_{i+1}(x)\not\in \{F_1(x),\dots, F_i(x) \}\Big\}\quad \text{provided } k>1.\end{align*}$$
  2. For all $F \in \mathcal {G}(\mathcal {K})$, we have

    $$\begin{align*}\mu \Big(\big \{x \in B_{i+1}^c\,{\backslash}\, B^c_{i}; \, \, F(x) \not\in \{F_1(x),\dots, F_i(x) \}\big\} \Big)=0.\end{align*}$$
  3. If $k\not =\infty $, then for all $F \in \mathcal {G}(\mathcal {K}),$

    $$\begin{align*}\mu \Big(\big \{x \in B_k; \, \, F(x) \not\in \{F_1(x),\dots, F_k(x) \}\big\} \Big)=0.\end{align*}$$

Moreover, if either $k\not =\infty $ or, $k=\infty $ and $\mu (\cap _{i=1}^\infty B_i)=0$ , then for every $F \in \mathcal {G}(\mathcal {K}),$ the measure $\varrho _F=F_\#\mu $ is absolutely continuous with respect to the measure $\sum _{i=1}^k \varrho _i $ , where $\varrho _i={F_i}_\#\mu .$

We refer to Proposition 3.1 in [8] for the proof of Proposition 2.1.

The following result shows that every $(\Sigma _\nu ,\mathcal {B}(X))$-measurable map has a $(\Sigma ,\mathcal {B}(X))$-measurable representation (see [2, Corollary 6.7.6]). Recall that a Souslin space is the image of a Polish space under a continuous mapping.

Proposition 2.2 Let $\nu $ be a finite measure on a measurable space $(Y, \Sigma )$ , let X be a Souslin space, and let $F : Y\to X$ be a $(\Sigma _\nu , \mathcal {B}(X))$ -measurable mapping. Then, there exists a mapping $G: Y \to X $ such that $G = F$ $\nu $ -a.e. and $G^{-1}(B) \in \Sigma $ for all $ B \in \mathcal B(X).$

For a measurable map $T: (X, \mathcal {B}(X) ) \to (Y,\Sigma , \nu )$, denote by $\mathcal {M}( T, \nu )$ the set of all probability measures $\lambda $ on $\mathcal {B}(X)$ such that T pushes $\lambda $ forward to $\nu ,$ i.e.,

$$\begin{align*}\mathcal{M}( T, \nu)=\{\lambda \in \mathcal{P}(X); \, T_\# \lambda=\nu\}. \end{align*}$$

Evidently, $\mathcal {M}( T, \nu )$ is a convex set. A measure $\lambda $ is an extreme point of $\mathcal {M}( T, \nu )$ if the identity $ \lambda = \theta \lambda _1+(1-\theta )\lambda _2$ with $\theta \in (0,1) $ and $\lambda _1, \lambda _2 \in \mathcal {M}( T, \nu )$ implies that $ \lambda _1=\lambda _2$. The set of extreme points of $\mathcal {M}( T, \nu )$ is denoted by $\mathrm{ext}\,\mathcal {M}( T, \nu ).$

We recall the following result from [4] in which a characterization of the set $\mathrm{ext}\,\mathcal {M}( T, \nu )$ is given.

Theorem 2.3 Let $(Y,\Sigma , \nu )$ be a probability space, let $(X, \mathcal {B}(X))$ be a Hausdorff space with a Radon probability measure $\lambda $ , and let $T : X\to Y$ be a $(\mathcal {B}(X), \Sigma )$ -measurable mapping. Assume that T is surjective and $\Sigma $ is countably separated. The following conditions are equivalent:

  (i) $\lambda $ is an extreme point of $M(T, \nu )$;

  (ii) there exists a $(\Sigma _\nu ,\mathcal {B}(X))$-measurable section $F : Y \to X$ of the mapping T with $\lambda = F_\# \nu $.
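To see what Theorem 2.3 says in a simple case, consider the following hypothetical example, which is not part of the original argument. Let $Y=[0,1]$ with Lebesgue measure $\nu $, let $X=[0,1]\times \{0,1\}$, and let $T(y,j)=y.$ The maps $F_j(y)=(y,j)$, $j\in \{0,1\}$, are Borel sections of T, so by the implication (ii) $\Rightarrow $ (i) the measures

$$\begin{align*}\lambda_j=(F_j)_\# \nu=\nu \otimes \delta_j, \qquad j \in \{0,1\},\end{align*}$$

are extreme points of $M(T, \nu ).$ In contrast, the measure $\lambda =\frac {1}{2}\lambda _0+\frac {1}{2}\lambda _1$ belongs to $M(T, \nu )$ but is not extreme, because $\lambda _0\not =\lambda _1$; consistently with the theorem, $\lambda $ is not the push-forward of $\nu $ by a single section of T.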

By making use of the Choquet theory in the setting of non-compact sets of measures [13], each $ \lambda \in M(T, \nu )$ can be represented as a Choquet-type integral over $ \mathrm{ext} \, M(T, \nu ).$ Denote by $\Sigma _{ \mathrm{ext} \, M(T, \nu )}$ the $\sigma $-algebra over $\mathrm{ext}\, M(T, \nu )$ generated by the functions $\varrho \to \varrho (B)$, $B \in \mathcal {B}(X).$ We have the following result (see [8] for a proof).

Theorem 2.4 Let X and Y be complete separable metric spaces, and let $\nu $ be a probability measure on $\mathcal {B} (Y).$ Let $T:(X, \mathcal {B}(X)) \to (Y, \mathcal {B}(Y))$ be a surjective measurable mapping, and let $\lambda \in M(T, \nu ).$ Then, there exists a probability measure $\xi $ on $\Sigma _{\mathrm{ext} \, M(T, \nu )}$ such that for each $B \in \mathcal {B}(X)$,

$$\begin{align*}\lambda(B)=\int_{\mathrm{ext} \, M(T, \nu)} \varrho(B)\, d\xi(\varrho), \qquad \big (\varrho \to \varrho(B) \text{ is measurable}\big). \end{align*}$$

We now recall the notion of isomorphisms for measures.

Definition 2.5 Assume that X and Y are topological spaces with Borel probability measures $\mu $ on X and $\nu $ on $Y.$ We say that $(X,B(X), \mu )$ is isomorphic to $(Y,B(Y ), \nu )$ if there exists a one-to-one map T of X onto Y such that for all $A \in B(X),$ we have $T(A) \in B(Y)$ and $\mu (A) = \nu \big (T(A)\big ),$ and for all $B \in B(Y),$ we have $T^{-1}(B) \in B(X)$ and $\mu \big (T^{-1}(B)\big ) = \nu (B)$ .

Here is the well-known measure isomorphism theorem (see Theorem 17.41 in [5] for a proof).

Theorem 2.6 Let $\mu $ be a Borel probability measure on a Polish space X. If $\mu $ is continuous, then $(X,B(X), \mu )$ and $([0, 1], \lambda )$ , where $\lambda $ is Lebesgue measure, are isomorphic.

Lemma 2.7 Let $\gamma \in \Pi (\mu ,\nu ).$ If either $\mu $ or $\nu $ is continuous, then so is $\gamma .$

Proof Assume that $\mu $ is continuous. Take $(x,y)\in X \times Y.$ It follows that

$$\begin{align*}\mu(\{x\})=\gamma\big (\{x\} \times Y\big)\geq \gamma\big(\{x\} \times \{y\}\big),\end{align*}$$

from which the desired result follows. The proof is similar if $\nu $ is continuous.

Proof of Theorem 1.2

We assume that $\mu $ is a continuous measure. It follows from Lemma 2.7 that $\gamma $ is also continuous. It follows from Theorem 2.6 that the Borel measurable spaces $(X, \mathcal {B}(X), \mu )$ and $(X \times Y, \mathcal {B}(X \times Y), \gamma )$ are isomorphic. Thus, there exists an isomorphism $T=(T_1, T_2)$ from $(X, \mathcal {B}(X), \mu )$ onto $(X \times Y, \mathcal {B}(X \times Y), \gamma )$ . It can be easily deduced that $T_1 : X\to X$ and $T_2: X \to Y$ are surjective maps and

$$\begin{align*}{(T_1)}_\# \mu=\mu \quad \& \quad (T_2)_\# \mu=\nu.\end{align*}$$

Consider the convex set

$$\begin{align*}\mathcal{M}(T_1, \mu)=\big\{ \lambda \in \mathcal{P}(X); \, (T_1)_\# \lambda=\mu\big \},\end{align*}$$

and note that $\mu \in \mathcal {M}(T_1, \mu )$ because $(T_1)_\# \mu =\mu .$ It follows from Theorem 2.4 that there exists a probability measure $\xi $ on $\Sigma _{\mathrm{ext} \, M(T_1, \mu )}$ such that for each $B \in \mathcal {B}(X)$,

(1) $$ \begin{align} \mu(B)=\int_{\mathrm{ext} \, M(T_1, \mu)} \varrho(B)\, d\xi(\varrho), \qquad \big (\varrho \to \varrho(B) \text{ is measurable}\big).\end{align} $$

Since $\Gamma =\mathrm{spt}(\gamma ),$ it follows that $T^{-1}(\Gamma )$ is a measurable subset of X with ${\mu \big (T^{-1}(\Gamma ) \big )=1.}$ Let $A_\gamma \in \mathcal {B}(X)$ be the set such that $A_\gamma \subseteq T^{-1}(\Gamma )$ and for all $ x \in A_\gamma $ , the cardinality of the set $\Gamma _x$ does not exceed m. It follows from the assumption that $\mu (A_\gamma )=1.$ Since $\mu (X\,{\backslash}\, A_\gamma )=0,$ it follows from (1) that

$$\begin{align*}\int_{\mathrm{ext} \, M(T_1, \mu)} \varrho(X\,{\backslash}\, A_\gamma)\, d\xi (\varrho)=\mu(X\,{\backslash}\, A_\gamma)=0,\end{align*}$$

and therefore there exists a $\xi $ -full measure subset $K_\gamma $ of $\mathrm{ext} \, M(T_1, \mu )$ such that ${\varrho (X\,{\backslash}\, A_\gamma )=0}$ for all $\varrho \in K_\gamma .$ Let $\mathcal {S}(T_1)$ be the set of all sections of $T_1$ and define

$$\begin{align*}\mathcal{K}:=\big \{F\in \mathcal{S}(T_1);\,\, \exists \varrho \in K_\gamma \text{ with } \varrho=F_\# \mu \big \}.\end{align*}$$

Let $\mathcal {G}(\mathcal {K})$ be the set of all measurable sections of $T_1$ generated by $\mathcal {K}$ as in Definition 2.2. By Proposition 2.1, there exists a sequence $\{F_i\}_{i=1}^k \subset \mathcal {G}(\mathcal {K})$ with $ k \in \mathbb {N} \cup \{\infty \}$ satisfying assertions (1)–(3) in that proposition. Let $B_\gamma :=\cap _{i=1}^{k}F_i^{-1}(A_\gamma ),$ and for each $k \in \mathbb {N}\cup \{\infty \}$, define

$$\begin{align*}\mathbb{N}_k=\begin{cases} \{1,\dots,k\}, & k \in \mathbb{N},\\ \mathbb{N}, & k=\infty. \end{cases}\end{align*}$$

Let $\varrho _i:={F_i}_\# \mu $ for each $i \in \mathbb {N}_k.$ We shall now proceed with the proof in several steps.

Step I: In this step, we show that $\mu \big (B_\gamma \big )=1$ and

(2) $$ \begin{align} \big (x, T_2 \circ F_i( x) \big ) \in \Gamma, \qquad \,\,\forall x \in B_\gamma, \, \, \, \forall i \in \mathbb{N}_k. \end{align} $$

Note first that $\varrho _i(X\,{\backslash}\, A_\gamma )=0$ for each $i \in \mathbb {N}_k$ . In fact, for a fixed $i\in \mathbb {N}_k,$ since ${F_i \in \mathcal {G}(\mathcal {K})}$ there exists a sequence $\{F_{\sigma _j}\}_{j=1}^{\infty } \subset \mathcal {K}$ such that $X=\cup _{j=1}^{\infty }A_j$ , where

$$\begin{align*}A_j=\{x \in X; \, \, F_i(x)=F_{\sigma_j}(x)\}. \end{align*}$$

Let $\sigma _j \in K_{\gamma }$ be such that the section $F_{\sigma _j}$ pushes $\mu $ forward to $\sigma _j$, i.e., $\sigma _j=(F_{\sigma _j})_\# \mu .$ It follows that

$$ \begin{align*} \varrho_i(X\,{\backslash}\, A_\gamma)=\mu\big (F_i^{-1}(X\,{\backslash}\, A_{\gamma})\big )&=\mu\big ((\cup_{j=1}^{\infty}A_j) \cap F_i^{-1}(X\,{\backslash}\, A_{\gamma})\big )\\ &\leq \sum_{j=1}^\infty \mu\big (A_j \cap F_i^{-1}(X\,{\backslash}\, A_{\gamma})\big )\\ &=\sum_{j=1}^\infty \mu\big (A_j \cap F_{\sigma_j}^{-1}(X\,{\backslash}\, A_{\gamma})\big )\\ &\leq \sum_{j=1}^\infty \mu\big ( F_{\sigma_j}^{-1}(X\,{\backslash}\, A_{\gamma})\big )=\sum_{j=1}^\infty \sigma_j(X\,{\backslash}\, A_{\gamma})=0. \end{align*} $$

This proves that $\varrho _i(X\,{\backslash}\, A_\gamma )=0.$ Since $\varrho _i$ is a probability measure, we have that $\varrho _i(A_\gamma )=1$ for every $i \in \mathbb {N}_k.$ Therefore, $\mu \big (F_i^{-1}(A_\gamma )\big )=\varrho _i(A_\gamma )=1$ . This implies that $\mu (B_\gamma )=\mu \big (\cap _{i=1}^{k}F_i^{-1}(A_\gamma )\big )=1.$ We shall now prove that

$$\begin{align*}\big (x, T_2 \circ F_i (x)\big ) \in \Gamma, \qquad \,\,\forall x \in B_\gamma, \, \, \, \forall i \in \mathbb{N}_k. \end{align*}$$

Since for all $x \in A_\gamma $ , we have $T(x)=(T_1 x, T_2x) \in \Gamma $ , it follows that for each $i \in \mathbb {N}_k,$

$$\begin{align*}\big (T_1\circ F_i (x), T_2 \circ F_i (x)\big ) \in \Gamma, \qquad \forall x \in F_i^{-1}(A_\gamma), \end{align*}$$

from which together with $T_1 \circ F_i=Id_X$ one obtains

(3) $$ \begin{align} \big (x, T_2 \circ F_i (x) \big ) \in \Gamma, \qquad \forall x \in F_i^{-1}(A_\gamma). \end{align} $$

Thus,

$$\begin{align*}\big (x, T_2 \circ F_i (x)\big ) \in \Gamma, \qquad \,\,\forall x \in B_\gamma, \, \, \, \forall i \in \mathbb{N}_k. \end{align*}$$

This completes the proof of Step I.

Step II: In this step, we assume that assumption (1) of the theorem holds. In this case, we show that $k \leq m.$

To do this, let us assume that $k>m. $ It follows from Step I that

(4) $$ \begin{align} \big (x, T_2 \circ F_i (x)\big ) \in \Gamma, \qquad \,\,\forall x \in B_\gamma, \,\, \, \forall i\in \{1,\dots,m+1\}. \end{align} $$

Note that by assertion $(1)$ in Proposition 2.1, we have $\mu (B_{m+1})>0.$ Since $\mu (B_\gamma )=1$ and $\mu (B_{m+1})>0,$ it follows that $B_\gamma \cap B_{m+1}\not =\emptyset .$ Take $x \in B_\gamma \cap B_{m+1}.$ We have that the cardinality of the set $\Gamma _x$ is at most m. On the other hand, it follows from (4) that $T_2 \circ F_i(x) \in \Gamma _x$ for all $i \in \{1,2,\dots ,m+1\}.$ Thus, there exist $i, j \in \{1,2,\dots ,m+1\}$ with $i <j$ such that $T_2 \circ F_i(x)=T_2 \circ F_j(x).$ Since $T_1 \circ F_i=T_1 \circ F_j=Id_X$ and the map ${T=(T_1,T_2)}$ is injective, it follows that $F_{i}(x)=F_j(x).$ On the other hand, ${x \in B_{m+1} \subseteq B_j}$ from which we have $F_j(x)\not \in \{F_1(x),\dots ,F_{j-1}(x)\}.$ This leads to a contradiction and therefore $k\leq m$ in this case.

Step III: In this step, we assume that assumption (2) of the theorem holds. In this case, we prove that if $k=\infty $ , then $\mu (\cap _{i=1}^\infty B_i)=0.$

To prove this, let us assume that $ k=\infty $ and $\mu (\cap _{i=1}^\infty B_i)>0.$ By Step I, we have that $\mu (B_\gamma )=1$ and

(5) $$ \begin{align} \big (x, T_2 \circ F_i (x)\big ) \in \Gamma, \qquad \,\,\forall x \in B_\gamma, \, \, \, \forall i \in \mathbb{N}. \end{align} $$

Since $\mu (\cap _{i=1}^\infty B_i)>0$ and $\mu (B_\gamma )=1,$ the set $\big ( \cap _{i=1}^\infty B_i\big ) \cap B_\gamma $ has positive measure and is therefore nonempty. Take $x \in \big ( \cap _{i=1}^\infty B_i\big ) \cap B_\gamma. $ It follows from (5) that $T_2 \circ F_i (x) \in \Gamma _x$ for each $i \in \mathbb {N}.$ On the other hand, by assumption, we have that $\text {card}(\Gamma _x)<\infty .$ Thus, there exist $i, j$ with $i <j$ such that $T_2 \circ F_i(x)=T_2 \circ F_j(x).$ As in Step II, since $T_1 \circ F_i=T_1 \circ F_j=Id_X$ and the map ${T=(T_1,T_2)}$ is injective, it follows that $F_{i}(x)=F_j(x).$ On the other hand, $x \in \cap _{i=1}^\infty B_i \subseteq B_j$ from which we have $F_j(x)\not \in \{F_1(x),\dots ,F_{j-1}(x)\}.$ This leads to a contradiction and Step III follows.

It now follows from Steps II and III that either $k\not =\infty $ or, if $k=\infty $ , then ${\mu (\cap _{i=1}^\infty B_i)=0}$ . On the other hand, Proposition 2.1 yields that if either $k\not =\infty $ or, $k=\infty $ and $\mu (\cap _{i=1}^\infty B_i)=0$ , then for every $F \in \mathcal {G}(\mathcal {K})$ , the measure $\varrho _F=F_\#\mu $ is absolutely continuous with respect to the measure $\sum _{i=1}^k \varrho _i $ , where $\varrho _i={F_i}_\#\mu $ for $i \in \mathbb {N}_k.$ This together with the representation

$$ \begin{align*} \mu(B)=\int_{\mathrm{ext} \, M(T_1, \mu)} \varrho(B)\, d\xi(\varrho)= \int_{K_{\gamma}} \varrho(B)\, d\xi(\varrho), \qquad \big (\forall B \in \mathcal{B}(X)\big), \end{align*} $$

implies that $\mu $ is absolutely continuous with respect to $\sum _{i=1}^k \varrho _i.$ It then follows that there exists a nonnegative measurable function $\alpha $ on X such that

$$\begin{align*}\frac{d \mu}{d \big(\sum_{i=1}^k \varrho_i\big)}=\alpha. \end{align*}$$

Define $\alpha _i=\alpha \circ F_i$ for $i\in \mathbb {N}_k.$ We show that $ \sum _{i=1}^k\alpha _i(x)=1$ for $\mu $ -almost every $x \in X.$ In fact, for each $B \in \mathcal {B}(X),$ we have

$$ \begin{align*} \mu(B)&=\mu(T_1^{-1}(B))\\[5pt]&=\sum_{i=1}^k\int_{T_1^{-1}(B)} \alpha(x) \, d\varrho_i =\sum_{i=1}^k\int_{F_i^{-1}\circ T_1^{-1}(B)} \alpha(F_ix) \, d \mu =\sum_{i=1}^k\int_{B} \alpha_i(x) \, d \mu, \end{align*} $$

from which we obtain $\mu (B)=\sum _{i=1}^k\int _{B} \alpha _i(x) \, d \mu .$ Since this holds for all $B \in \mathcal {B}(X),$ we have

$$\begin{align*}\sum_{i=1}^k\alpha_i(x)=1, \qquad \quad \mu-a.e. \end{align*}$$

It now follows from Proposition 2.2 that each $F_i$ is $\mu $-a.e. equal to a $(\mathcal {B}(X),\mathcal {B}(X))$-measurable function, which we still denote by $F_i.$ For each $i \in \mathbb {N}_k,$ let $G_i=T_2 \circ F_i.$ We now show that $\gamma =\sum _{i=1}^k (\text {Id}\times G_i)_\# (\alpha _i \mu )$. For each bounded continuous function $f: X\times Y \to \mathbb {R}$, it follows that

$$ \begin{align*} \int_{X \times Y} f(x,y) \, d \gamma=\int_X f(T_1x,T_2x) \, d\mu&= \sum_{i=1}^k\int_X \alpha(x)f(T_1x,T_2x) \, d\varrho_i\\[5pt] &= \sum_{i=1}^k\int_X \alpha\big (F_i (x)\big ) f\big (T_1\circ F_i (x),T_2 \circ F_i (x) \big ) \, d\mu\\[5pt] &= \sum_{i=1}^k\int_X \alpha_i(x) f\big (x,G_i (x) \big ) \, d\mu. \end{align*} $$

Therefore,

$$\begin{align*}\gamma =\sum_{i=1}^k (\text{Id}\times G_i)_\# (\alpha_i \mu).\end{align*}$$

Remark 2.8 It follows from the last part of the proof of Theorem 1.2 that if ${G_i(x)=G_j(x)}$ for some $x\in X$ , then $\alpha _i(x)=\alpha _j(x)$ . In fact, let us assume that $G_i(x)=G_j(x)$ for some $x\in X.$ It implies that $T_2\circ F_i(x)=T_2 \circ F_j(x).$ Since $T_1\circ F_i(x)=T_1 \circ F_j(x)=x$ and $T=(T_1, T_2) $ is injective, we obtain that $F_i(x)=F_j(x).$ This yields that

$$\begin{align*}\alpha_i(x)=\alpha \circ F_i(x)=\alpha \circ F_j(x)=\alpha_j(x), \end{align*}$$

as claimed.

It is worth noting that Theorem 1.2 has a straightforward generalization to the multi-marginal case.

Corollary 2.9 Let $\mu _1,\dots ,\mu _n$ be Borel probability measures on complete separable metric spaces $X_1,\dots ,X_n$ respectively. Assume that $\mu _1$ is continuous. Let $\gamma $ be a probability measure on $X_1 \times \cdots \times X_n$ with fixed marginal $\mu _i$ on $X_i,$ and let $\Gamma =\mathrm{spt}(\gamma ).$ The following assertions hold:

  1. If there exists $m \in \mathbb {N}$ such that the cardinality of the set

    $$\begin{align*}\Gamma_{x_1}:=\big \{(x_2,\dots,x_n)\in X_2\times\cdots\times X_n; \, \, (x_1,\dots,x_n)\in \Gamma \big \}\end{align*}$$
    does not exceed m for $\mu _1$-a.e. $x_1\in X_1,$ then there exist $k\leq m$ and a sequence of Borel measurable maps $\{G_i\}_{i=1}^k$ from $X_1$ to $X_2 \times \cdots \times X_n$ such that the measure $\gamma $ is concentrated on their graphs.
  2. If $\text {card} \big (\Gamma _{x_1} \big )< \infty $ for $\mu _1$-a.e. $x_1\in X_1,$ then there exist $k\in \mathbb {N}\cup \{\infty \}$ and a sequence of Borel measurable maps $\{G_i\}^k_{i=1}$ from $X_1$ to $X_2\times \cdots \times X_n$ such that the measure $\gamma $ is concentrated on their graphs.

Proof Let $Y=X_2\times \cdots \times X_n$ and let $\nu $ be the projection of $\gamma $ on $Y.$ It follows that ${\gamma \in \Pi (\mu _1, \nu ).}$ Since $\mu _1$ is continuous, the desired result follows from Theorem 1.2.

3 Applications in optimal transportation

Here, we shall provide an application of Theorem 1.2. Let $\mathcal {T}$ be a $(2,3)$ -torus knot in $\mathbb {R}^3$ (see Figure 1). Our goal is to describe the structure of optimal plans for the cost given by

$$\begin{align*}c(x,y)=\frac{1}{2}|x-y|^2.\end{align*}$$

Let $\mu $ and $\nu $ be two probability measures on $\mathcal {T}$ . Since the function c is bounded and continuous on $\mathcal {T} \times \mathcal {T}$ it follows that the problem

(6) $$ \begin{align} \inf\Big\{\int_{\mathcal{T} \times \mathcal{T}} c(x,y)\, d\gamma; \, \gamma \in \Pi(\mu, \nu)\Big\}, \end{align} $$

admits a solution. We have the following result.

Figure 1 $(2, 3)$ -torus knot $\mathcal {T}$ .

Theorem 3.1 Assume that the nonatomic measure $\mu $ is absolutely continuous in each coordinate chart on $\mathcal {T}$ . Then any optimal plan of (6) is concentrated on the graphs of at most eight measurable maps.

Proof By standard results in the theory of optimal transportation, there exist measurable functions ${\varphi }$ and $\psi $ on $\mathcal {T}$ with

(7) $$ \begin{align}\psi(y)=\inf_{x \in \mathcal{T}}\{c(x,y)-{\varphi}(x)\} \qquad \text{and} \qquad {\varphi}(x)=\inf_{y \in \mathcal{T} }\{c(x,y)-\psi(y)\}, \end{align} $$

such that for any optimal plan $\gamma $ of (6),

$$\begin{align*}\mathrm{spt}(\gamma)\subseteq \big \{(x,y) \in \mathcal{T} \times \mathcal{T}\; :\, {\varphi}(x)+\psi(y)= c(x,y) \big \}.\end{align*}$$

Since $\mathcal {T}$ is bounded, it follows from Lemma C.1 in [3] that ${\varphi }$ is locally Lipschitz on $\mathcal {T}$. Let $M=\mathrm{Dom} (D {\varphi }).$ It follows from Rademacher’s theorem together with the absolute continuity of $\mu $ that $\mu (M)=1.$ For $x_0 \in M$, if there exist $y_0, y \in \mathcal {T}$ with $(x_0,y_0)$ and $ (x_0,y) \in \mathrm{spt}(\gamma ),$ then we must have $D_1c(x_0,y_0)=D_1c(x_0,y),$ because by (7) the function $x \mapsto c(x,y)-{\varphi }(x)$ attains its minimum at $x_0$ whenever $(x_0,y)\in \mathrm{spt}(\gamma ),$ so that both sides equal $D{\varphi }(x_0).$ Let $\vec {N}(x_0)$ be the outward normal vector at $x_0.$ If

$$\begin{align*}D_1c(x_0,y_0)=D_1c(x_0,y),\end{align*}$$

then $y-y_0=\alpha \vec {N}(x_0) $ for some $\alpha \in \mathbb {R}.$ This implies that $y=y_0+ \alpha \vec {N}(x_0).$ The latter argument shows that all the points in the set

$$\begin{align*}\Big\{y \in \mathcal{T};\, D_1c(x_0,y_0)=D_1c(x_0,y)\Big\},\end{align*}$$

lie on a straight line through $y_0$ and parallel to the normal vector $\vec {N}(x_0).$ On the other hand, one can easily observe that any straight line can intersect the manifold $\mathcal {T}$ in at most eight points. This proves that $\text {card} \big (\Gamma _x \big )\leq 8 $ for $\mu $-a.e. $x\in \mathcal {T}$, where ${\Gamma _x=\big \{y \in \mathcal {T}; \, (x,y)\in \mathrm{spt} (\gamma )\big \}.}$ Therefore, by Theorem 1.2, there exist $k\in \{1,2,\dots ,8\}$ and a sequence of Borel measurable maps $\{G_i\}^k_{i=1}$ from $\mathcal {T}$ to $\mathcal {T}$ such that the measure $\gamma $ is concentrated on their graphs.

Data availability statement

Data sharing is not applicable to this article as no datasets were generated or analyzed during the current study.

Competing interests

The author declares no competing financial interests or personal relationships that could have appeared to influence the work reported in this article.

Footnotes

This work is supported by a grant from the Natural Sciences and Engineering Research Council of Canada.

References

[1] Ahmad, N., Kim, H. K., and McCann, R. J., Optimal transportation, topology and uniqueness. Bull. Math. Sci. 1(2011), 13–32.
[2] Bogachev, V. I., Measure theory. Vols. I, II, Springer-Verlag, Berlin, 2007.
[3] Gangbo, W. and McCann, R. J., The geometry of optimal transportation. Acta Math. 177(1996), 113–161.
[4] Graf, S., Induced $\sigma$-homomorphisms and a parametrization of measurable sections via extremal preimage measures. Math. Ann. 247(1980), no. 1, 67–80.
[5] Kechris, A., Classical descriptive set theory, Graduate Texts in Mathematics, 156, Springer-Verlag, New York, 1995.
[6] Levin, V., Abstract cyclical monotonicity and Monge solutions for the general Monge–Kantorovich problem. Set-Valued Var. Anal. 7(1999), no. 1, 7–32.
[7] McCann, R. and Rifford, L., The intrinsic dynamics of optimal transport. J. Ec. polytech. Math. 3(2016), 67–98.
[8] Moameni, A., A characterization for solutions of the Monge–Kantorovich mass transport problem. Math. Ann. 365(2016), nos. 3–4, 1279–1304.
[9] Moameni, A. and Pass, B., Solutions to multi-marginal optimal transport problems supported on several graphs. ESAIM Control Optim. Calc. Var. 23(2017), no. 2, 551–567.
[10] Moameni, A. and Rifford, L., Uniquely minimizing costs for the Kantorovitch problem. Ann. Fac. Sci. Toulouse Math. (6) 29(2020), no. 3, 507–563.
[11] Rachev, S. T. and Rüschendorf, L., Mass transportation problems. Vol. I, Theory. Probability and its Applications (New York), Springer-Verlag, New York, 1998.
[12] Villani, C., Optimal transport, old and new, Grundlehren der Mathematischen Wissenschaften, Springer-Verlag, Berlin, 2009.
[13] von Weizsäcker, H. and Winkler, G., Integral representation in the set of solutions of a generalized moment problem. Math. Ann. 246(1979/80), no. 1, 23–32.