On Stability Estimate of Optimal Transport Maps Using m-th Polynomial Convexity

In 1781, Gaspard Monge first posed the practical problem of relocating building materials while minimizing the workers’ effort. Mathematically, the problem can be restated as finding a mapping $T_0$ that transforms a random variable $X$ following a probability measure $\mu$ into a random variable $Y$ following a probability measure $\nu$, at minimal cost. The problem has since been widely studied and applied in statistics, machine learning, and economics, fields that are concerned with a notion of “distance” between, typically, a pair of probability distributions. This paper investigates and generalizes stability estimates for optimal transport plans, particularly through the lens of strong polynomial convexity. Building on previous research that uses plug-in estimators to strengthen the convergence rate of discrete or semi-discrete estimators for optimal transport plans, this paper introduces a novel stability estimate leveraging $L$-Lipschitz continuity and a methodology based on polynomial convexity, the understanding of which remains limited.


Introduction
Recent studies emphasize the depth of optimal transport theory and the breadth of its applications. The incorporation of optimal transport theory into fields such as machine learning and economics underscores its broad impact. Studies in machine learning, for instance, apply the Wasserstein distance to compute similarities between images [7]. Economic studies apply results from optimal transport theory to marriage and labor markets, where the matching of couples and the assignment of workers can be computed with the theory [8].
Optimization problems and stability estimates for empirical approximations of the optimal transport plan are areas of active study [1], [2]. Stability estimates directly yield rates of convergence for estimators of optimal transport maps constructed by plugging in empirical distributions, which helps quantify the sample complexity required for accurate recovery. Moreover, stability bounds can reveal structural properties relating the geometry or complexity of the underlying distributions to the difficulty of estimating the transport map between them, which provides insight into which problems are statistically harder [3]-[5].
Many regularization methods for optimal transport, e.g., entropic regularization and Sinkhorn-type algorithms, yield dual potentials that satisfy forms of polynomial convexity rather than strict convexity [6]. Studying polynomial convexity therefore provides stability guarantees for transport maps estimated using these common approaches. Polynomial convexity assumptions allow the analysis to cover non-strictly convex optimization problems that arise in applications such as imaging, economics, and machine learning. This expands the applicability beyond traditional strongly convex optimal transport [9].
The contribution of this study is a novel estimation method that handles strong polynomial convexity up to a finite order by leveraging the Taylor remainder approximation. Because polynomial convexity holds in a wide range of settings, assuming it makes the computational analysis more broadly feasible.
Published by IDEAS SPREAD
Now we recall the general definition of the transport plan between probability measures.
Figure 1. Illustration of the probability measures $\mu$ and $\nu$ on the space $\mathbb{R}^d$. The arrow labeled $T_0 = \nabla\phi_0$ represents the optimal transport map $T_0$, which is the gradient of the optimal transport potential $\phi_0$. The arrow pointing from $\mathbb{R}$ to the midpoint between $\mu$ and $\nu$ indicates that $\phi_0$ is a real-valued function defined on $\mathbb{R}^d$.
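Definition II. $\mathcal{P}_{ac}(\mathbb{R}^d)$ refers to the set of probability measures on $\mathbb{R}^d$ which are absolutely continuous with respect to the Lebesgue measure. Specifically:
\[
\mathcal{P}_{ac}(\mathbb{R}^d) = \{\mu : \mu \text{ is a probability measure on } \mathbb{R}^d \text{ such that } \mu(A) = 0 \text{ for any Lebesgue null set } A\}.
\]
When $\mu(A) = \int_A f(x)\,dx$ for every Borel set $A$, we call $f$ the density of $\mu$.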
The Brenier-McCann theorem states that, given an absolutely continuous probability measure $\mu$, there exists a unique (up to $\mu$-almost everywhere equality) optimal transport map $T_0$ that pushes $\mu$ forward to another probability measure $\nu$. This means that for any Borel set $A \subseteq \mathbb{R}^d$, we have $\nu(A) = \mu(T_0^{-1}(A))$.
Furthermore, the optimal transport map $T_0$ is the gradient of a convex function $\phi_0$, which is called the optimal transport potential. The unique minimizer $\pi$ of the optimal transport problem between $\mu$ and $\nu$ (when both are in $\mathcal{P}_2(\mathbb{R}^d)$) is given by $\pi(A \times B) = \mu(A \cap (\nabla\phi_0)^{-1}(B))$ for all Borel sets $A, B \subseteq \mathbb{R}^d$.
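In one dimension the structure described by the theorem can be checked by hand: for the quadratic cost, the optimal matching between two equal-size samples pairs the sorted source points with the sorted target points, and a nondecreasing map on the real line is the derivative of a convex function. The following is a minimal Python sketch under those assumptions. The function name `empirical_ot_map_1d`, the Gaussian samples, and the seed are illustrative choices and do not come from the paper.

```python
import numpy as np


def empirical_ot_map_1d(x, y):
    """Sorted matching between two equal-size 1D samples.

    Returns the sorted source points and their images; for the quadratic
    cost in one dimension this pairing is the optimal one.
    """
    x_sorted = np.sort(np.asarray(x, dtype=float))
    y_sorted = np.sort(np.asarray(y, dtype=float))
    return x_sorted, y_sorted


rng = np.random.default_rng(0)            # arbitrary seed (assumption)
x = rng.normal(size=500)                  # sample standing in for mu
y = 3.0 + 2.0 * rng.normal(size=500)      # sample standing in for nu

src, img = empirical_ot_map_1d(x, y)

# A nondecreasing map in 1D is the derivative of a convex potential.
is_monotone = bool(np.all(np.diff(img) >= 0))

# Push-forward check: by construction the images are exactly the target sample.
pushes_forward = bool(np.array_equal(img, np.sort(y)))

# Squared 2-Wasserstein distance between the two empirical measures.
w2_squared = float(np.mean((img - src) ** 2))
```

The monotonicity check plays the role of the convexity of the potential, and the push-forward check mirrors the identity $\nu(A) = \mu(T_0^{-1}(A))$ at the level of samples. In higher dimensions sorting is no longer available and a general solver would be needed.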
Figure 1 provides a visual representation of the key components and relationships in the Brenier-McCann polar factorization theorem, helping to understand the connection between the probability measures, the optimal transport map, and the optimal transport potential.
Secondly, we have the alternate dual representation, which gives:
(1)
(2)
where $\mathcal{F}$ denotes the space of convex functions over $\mathbb{R}^d$ that are elements of $L^1(\mu)$. Moreover, we let $\phi^*(\cdot)$ denote the standard Legendre-Fenchel dual, which we define as follows:
\[
\phi^*(y) = \sup_{x \in \mathbb{R}^d} \{\langle x, y \rangle - \phi(x)\}.
\]
Definition VI [Strong $m$-th order polynomial convexity [2], [9], [10]]. A function $f: \mathbb{R}^d \to \mathbb{R} \cup \{+\infty\}$ is said to be strongly $m$-th order polynomial convex with parameter $\lambda > 0$ if the function […] is strongly convex with parameter $\lambda > 0$ for every choice of $x, y \in \mathrm{dom}(f)$, where $D^k f(x)$ denotes the $k$-th order differential of $f$ at $x$, and we use the convention that $D^0 f(x) = f(x)$.
Equivalently, $f(\cdot)$ is strongly $m$-th order polynomial convex with parameter $\lambda$ if for all $x, y \in \mathrm{dom}(f)$ and for all $t \in [0,1]$, […]. The key requirement is that the $m$-th order Taylor residual term $(1-t)^m |x-y|^{m+1}$ enters the inequality with modulus $\lambda$. This is a generalization of strong convexity, which corresponds to the case $m = 1$.
Relative to strong convexity, $m$-th order polynomial convexity becomes weaker as $m$ increases, but it allows more flexibility. The proofs proceed analogously to the strongly convex case, but involve more intricate Taylor expansions and remainder-term analysis.
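For orientation, the baseline case can be written out explicitly. The display below is the standard definition of strong convexity with parameter $\lambda$, which the text identifies with the case $m = 1$. Normalization conventions differ between references (some absorb the factor $1/2$ into the constant), so the constants shown here are an assumption and need not match the paper's convention.

\[
f(tx + (1-t)y) \le t f(x) + (1-t) f(y) - \frac{\lambda}{2}\, t(1-t)\, |x-y|^2
\quad \text{for all } x, y \in \mathrm{dom}(f),\ t \in [0,1].
\]

For differentiable $f$ this is equivalent to the first-order condition

\[
f(y) \ge f(x) + \langle \nabla f(x), y - x \rangle + \frac{\lambda}{2}\, |y-x|^2 ,
\]

in which the quadratic term is the lowest-order Taylor remainder. The polynomial convexity assumption replaces this quadratic remainder with one of order $m+1$.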
Theorem. The term $\max|\text{dual integrals}|$ refers to the maximum absolute value of two specific dual integrals that arise from the dual formulation of the optimal transport problem. These dual integrals are used to bound the difference between the squared Wasserstein distances $W_2^2(\tilde\mu, \tilde\nu)$ and $W_2^2(\tilde\mu, \nu)$: […] Here, $\phi^*_{\tilde\mu,\tilde\nu}$ and $\phi^*_{\tilde\mu,\nu}$ are the optimal solutions to the dual problems of the optimal transport between the pairs of measures $(\tilde\mu, \tilde\nu)$ and $(\tilde\mu, \nu)$, respectively. The integrals are taken with respect to the signed measure $(\tilde\nu - \nu)$, which represents the difference between the empirical measure $\tilde\nu$ and the true measure $\nu$.
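The dual potentials in this discussion are Legendre-Fenchel conjugates, $\phi^*(y) = \sup_x \{\langle x, y \rangle - \phi(x)\}$. As a rough illustration, the conjugate of a one-dimensional function can be approximated by restricting the supremum to a finite grid. The sketch below is a minimal Python example under that assumption; the function name `legendre_fenchel_on_grid` and the grids are hypothetical choices, and the test function $\phi(x) = x^2/2$ is used only because its conjugate $\phi^*(y) = y^2/2$ is known in closed form.

```python
import numpy as np


def legendre_fenchel_on_grid(phi_values, x_grid, y_grid):
    """Approximate phi*(y) = sup_x (x * y - phi(x)) by a maximum over x_grid.

    phi_values holds phi evaluated on x_grid; the result has one entry
    per point of y_grid.
    """
    # Rows index y, columns index x; the maximum over columns is the discrete sup.
    candidates = np.outer(y_grid, x_grid) - phi_values[None, :]
    return np.max(candidates, axis=1)


x_grid = np.linspace(-10.0, 10.0, 4001)   # grid wide enough to contain the maximizers
y_grid = np.linspace(-2.0, 2.0, 9)

phi = 0.5 * x_grid ** 2                   # phi(x) = x^2 / 2
phi_star = legendre_fenchel_on_grid(phi, x_grid, y_grid)

# Largest deviation from the closed-form conjugate y^2 / 2 on y_grid.
max_error = float(np.max(np.abs(phi_star - 0.5 * y_grid ** 2)))
```

The grid maximum never exceeds the true supremum, so the approximation is accurate only when the maximizer lies inside `x_grid`; for potentials with fast-growing gradients the grid would have to be widened accordingly.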
Multiplying both sides of (15) by […]. Finally, applying Young's inequality to the last term in (17) gives (18), which leads to the desired result. This completes the proof.
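For reference, Young's inequality in its general form states that for $a, b \ge 0$ and conjugate exponents $p, q > 1$ with $1/p + 1/q = 1$,

\[
ab \le \frac{a^p}{p} + \frac{b^q}{q}.
\]

A frequently used weighted variant for $p = q = 2$ and any $\varepsilon > 0$ is

\[
ab \le \frac{\varepsilon}{2}\, a^2 + \frac{1}{2\varepsilon}\, b^2 ,
\]

which allows a product term to be split so that one factor can be absorbed into the other side of an estimate. Which exponents and which weight are used in the step leading to (18) is not stated here, so both displays should be read as the general tool rather than as the exact form applied in the proof.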
Further discussion of how to compute the upper bound of the $L^2$ error with respect to the Wasserstein distance is left for future work.


An additional term appears involving higher order norms of the transport map difference.


Constants have worse dependence on smoothness parameters such as the Lipschitz constant $L$.

Theorem 2 could potentially allow for the analysis of a class of regularization methods in optimal transport that yield only polynomially convex dual potentials.


The appearance of higher-order difference terms on the right-hand side of Theorem 2 suggests that, under weaker convexity, the stability estimate involves finer properties beyond just the Lipschitz constant of the optimal map $T_0$. Additional smoothness moduli play a role.

Since many statistical and optimization methods satisfy forms of polynomial convexity (e.g., gradient boosting and neural networks), Theorem 2 represents a stepping stone towards stability analyses of plug-in estimators involving learned transport maps.
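The behavior of a plug-in estimator can be illustrated in a setting where the true map is known. The sketch below is a minimal Python example under stated assumptions: it works in one dimension, takes $\mu = N(0,1)$ and $\nu = N(\text{shift}, \text{scale}^2)$ so that the population map is the increasing affine map $T_0(x) = \text{shift} + \text{scale} \cdot x$, and uses sorted matching as the plug-in estimator. The function name `plugin_map_error_1d`, the parameter values, and the seed are hypothetical, and the example does not reproduce the estimator or the bound of Theorem 2.

```python
import numpy as np


def plugin_map_error_1d(n, shift=1.0, scale=2.0, seed=0):
    """Empirical L2 error of the sorted-matching plug-in map.

    Source is N(0, 1), target is N(shift, scale^2); the population optimal
    map is T0(x) = shift + scale * x. The error is averaged over the
    n source sample points.
    """
    rng = np.random.default_rng(seed)
    x = np.sort(rng.normal(size=n))                    # sample from mu, sorted
    y = np.sort(shift + scale * rng.normal(size=n))    # independent sample from nu, sorted
    t_hat = y                        # plug-in map evaluated at the sorted source points
    t_true = shift + scale * x       # population map at the same points
    return float(np.mean((t_hat - t_true) ** 2))


# Error for increasing sample sizes (values depend on the seed).
errors = {n: plugin_map_error_1d(n) for n in (100, 1000, 10000)}
```

Printing `errors` shows how the discrepancy between the estimated and the true map behaves as the sample size grows in this toy case. A stability estimate turns this kind of observation into a bound in terms of the Wasserstein distance between the empirical and the true measures.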

Conclusion
This paper has investigated and extended stability estimates for optimal transport plans, with particular attention to strong polynomial convexity. Building on prior research that uses plug-in estimators to improve the convergence rates of discrete or semi-discrete estimators for optimal transport plans, this study contributes a novel stability estimate that leverages $L$-Lipschitz continuity and a methodology rooted in polynomial convexity, an area that remains not fully explored.

Glossary
$\mu$: A probability measure on a $\sigma$-algebra.
supp($f$): Support set; the set of points in the domain of a real-valued function $f(x)$ at which the function is non-zero.
Borel set: Any set in a topological space that can be formed from open sets through relative complement, countable union, and countable intersection.
Absolutely continuous probability measures: A random variable $X$ is absolutely continuous if there exists a real-valued function $f$ satisfying $\mu_X(A) = \int_A f(x)\,dx$, where $A$ is an arbitrary Borel set.
$\mathcal{P}_2(\mathbb{R}^d)$: The space of all probability measures on $\mathbb{R}^d$ with finite second moments.
$\mathbb{R}^d$: Euclidean space of dimension $d$.
$T_\#$: The push-forward operator. It acts linearly on measures, and $T_\#\mu = \nu$ means that $T$ displaces the mass of $\mu$ onto $\nu$, i.e., $\nu(A) = \mu(T^{-1}(A))$ for every Borel set $A$.
$L$-Lipschitz continuity: Given two metric spaces $(X, d_X)$ and $(Y, d_Y)$, where $d_X$ denotes the metric on the set $X$ and $d_Y$ the metric on the set $Y$, a function $f: X \to Y$ is called Lipschitz continuous if there exists a real constant $K \ge 0$ such that, for all $x_1$ and $x_2$ in $X$, $d_Y(f(x_1), f(x_2)) \le K\, d_X(x_1, x_2)$. The function is $L$-Lipschitz when this holds with $K = L$.
$W_p(a, b)$: The $p$-Wasserstein distance with respect to distance $d$ and ground distance matrix $D$.
$\nabla\phi$: The gradient of a convex function $\phi: \mathbb{R}^d \to \mathbb{R}$.
$\mathcal{P}_{ac}(\mathbb{R}^d)$: The set of probability measures on $\mathbb{R}^d$ which are absolutely continuous with respect to the Lebesgue measure, $\mathcal{P}_{ac}(\mathbb{R}^d) = \{\mu : \mu \text{ is a probability measure on } \mathbb{R}^d \text{ such that } \mu(A) = 0 \text{ for any Lebesgue null set } A\}$.