# Efficiently Calculating GDD

In previous chapters we introduced and defined diffusion distance. A crucial component of calculating GDD is the optimization of the $\alpha$ parameter, which converts between coarse and fine time scales. Optimizing $\alpha$, in addition to optimizing the prolongation matrix $P$ under transitive constraints $\mathcal{C}(P)$, is a nontrivial numerical problem that in our experience seems to require new methods. We develop such methods here.

## Algorithm Development

In this section, we describe the algorithms used to calculate upper bounds on graph distances as the joint optima (over $P$, $t$, and $\alpha$) of the distance equations Equation [defn:exp_defn] and Equation [defn:linear_defn], under orthogonality constraints only, i.e. the case $\mathcal{C}(P) = \{ P | P^T P = I \}$. At the core of both algorithms is a subroutine which repeatedly solves the Linear Assignment Problem (LAP; see Equation [eqn:lap_defn]) in order to find the subpermutation matrix which is optimal at a particular value of $\alpha$. Namely, we are interested in calculating $\tilde{D}$ as

\begin{aligned} \tilde{D}(G_1, G_2) &= \min_\alpha f(\alpha) \quad \text{where} \quad f(\alpha) = \inf_{P | P^T P = I} \left| \left| \frac{1}{\alpha} P L(G_1) - \alpha L(G_2) P \right| \right| \\ \intertext{which, for orthogonality or any other compact constraint} &= \min_{P | P^T P = I} \left| \left| \frac{1}{\alpha} P L(G_1) - \alpha L(G_2) P \right| \right|. \nonumber \end{aligned}

However, we have found that the unique structure of this optimization problem admits a specialized procedure which is faster and more accurate than nested univariate optimization of $\alpha$ and $t$ (where each innermost function evaluation consists of a full optimization over $P$ at some $t, \alpha$). We first briefly describe the algorithm used to find the optimal $P$ and $\alpha$ for $\tilde{D}^2$; the formal description of the algorithm is given by Algorithm [alg:linear_alg]. In both cases, we reduce the computational complexity of the optimization over $P$ by imposing the additional constraint that $P$ must be a subpermutation matrix when rotated into the spectral basis (we define subpermutations in the proof of Theorem [thm:LAP_bound]). This constraint is compatible with the orthogonality constraint: all subpermutation matrices are orthogonal, but not vice versa. The tradeoff of this reduction in computational complexity is that we can only guarantee that our optima are upper bounds of the optima over all orthogonal $P$. In practice, however, this bound seems to be tight: over a large number of empirical evaluations, we did not find any example where orthogonally-constrained optimization was able to improve on the objective function value attained under the subpermutation constraint. Therefore, for the remainder of this paper we refer to the calculated optima as distance values, although strictly they are distance upper bounds. We also note that a distance lower bound can be calculated by relaxing the constraints in $\mathcal{C}(P)$ (for instance, by replacing the optimization over all $P$ with a less constrained matching problem; see Section 2.4).
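As a concrete illustration of the subpermutation-constrained upper bound at a fixed $\alpha$, the following sketch pairs the two eigenvalue spectra with an off-the-shelf LAP solver. This is an illustrative sketch, not the code used in our experiments; the name `lap_upper_bound` is ours.

```python
import numpy as np
from scipy.optimize import linear_sum_assignment

def lap_upper_bound(L1, L2, alpha):
    """Upper bound on inf_P ||(1/alpha) P L(G1) - alpha L(G2) P||_F^2
    over orthogonal P, via the subpermutation (LAP) restriction."""
    lam1 = np.linalg.eigvalsh(L1)  # eigenvalues of L(G_1), ascending
    lam2 = np.linalg.eigvalsh(L2)  # eigenvalues of L(G_2), ascending
    # cost of pairing lambda^(1)_j with lambda^(2)_i
    C = (lam1[None, :] / alpha - alpha * lam2[:, None]) ** 2
    rows, cols = linear_sum_assignment(C)  # optimal subpermutation matching
    return C[rows, cols].sum(), list(zip(rows, cols))
```

For two identical graphs and $\alpha = 1$ the sorted spectra coincide, so the bound is zero.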

[fig:distance_example_fig]

## Optimization of $\tilde{D}^2$

[alg:linear_alg]

1. Compute $\lambda^{(1)}$, $\lambda^{(2)}$ as the eigenvalues of $L_1$ and $L_2$.
2. Compute, by solving a linear assignment problem, $M_\text{low}$ and $M_\text{high}$: the optimal matchings at $\alpha_\text{low}$ and $\alpha_\text{high}$, respectively.
3. Initialize the list of optimal matchings as $\{M_\text{low}, M_\text{high} \}$.
4. Until the current list of matchings is not expanded in the following step, or the entire interval $[\alpha_\text{low}, \alpha_\text{high}]$ is marked as explored:
   1. Attempt to expand the list of optimal matchings by solving a linear assignment problem at the $\alpha$ where the cost curves of two matchings currently in the list intersect.
   2. If no better assignment exists, mark the interval covered by those two matchings as explored, as guaranteed by Theorem [thm:matching_agreement].
5. Return the lowest-cost $M$ and its optimal $\alpha$.

Joint optimization of $\tilde{D}^2$ over $\alpha$ and $P$ is a nested optimization problem (see (Onoyama et al. 2000) and (Sinha, Malo, and Deb 2018) for a description of nested optimization), with a potentially combinatorial optimization over $P$ dependent on each choice of $\alpha$. Furthermore, the function $f(\alpha) = \inf_{P| \mathcal{C}(P)} \tilde{D}^2(G_1, G_2 | \alpha)$ is continuous and multimodal, and in general has a discontinuous derivative (see Figure [fig:distance_example_fig]). Univariate optimization procedures such as Golden Section Search require many iterations of some procedure to optimize over $P$, each of which, in our restricted case, must compute a full solution to a LAP with $n_2 \times n_1$ weights. In our experience, the multimodality also means that these univariate methods have a tendency to get stuck in local optima. We reduce the total number of calls to the LAP solver, as well as the size of the LAPs solved, by taking advantage of several unique properties of the optimization as a function of $\alpha$. When the optimal $P^{(1)}$ and $P^{(2)}$ are known for $\alpha_1$ and $\alpha_2$, then for any $\alpha_c$ such that $\min (\alpha_1, \alpha_2) \leq \alpha_c \leq \max (\alpha_1, \alpha_2)$, the optimal $P^{(c)}$ at $\alpha_c$ must satisfy $P^{(1)}_{ij} = 1 \wedge P^{(2)}_{ij} = 1 \implies P^{(c)}_{ij} = 1$ (see Theorem [thm:matching_agreement]). Thus, the optimization over $P$ at $\alpha_c$ is already partially solved given the solutions at $\alpha_1$ and $\alpha_2$, and we need only re-compute the remaining (smaller) subproblem on the set of assignments where $P^{(1)}$ and $P^{(2)}$ disagree. This has two consequences for our search over $\alpha$. First, the size of the LAPs which must be solved at each step decreases over time (as we find $P$-optima for a denser and denser set of $\alpha$). Second, these theoretical guarantees mean that we can mark intervals of $\alpha$-values as explored (meaning we have provably found the $P$ which are optimal over that interval), and thus do not have to perform the relatively expensive optimization over $P$ for any $\alpha$ in that interval.
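A sketch of this subproblem reduction, assuming matchings are stored as arrays mapping each eigenvalue index $j$ of $G_1$ to its partner index in $G_2$ (the helper name `refine_matching` is ours, not part of our released code):

```python
import numpy as np
from scipy.optimize import linear_sum_assignment

def refine_matching(lam1, lam2, m1, m2, alpha_c):
    """Optimal matching at alpha_c, reusing matchings m1, m2 known to be
    optimal at alpha_1 <= alpha_c <= alpha_2; m[j] is the index of the
    lam2 entry paired with lam1[j]."""
    lam1, lam2 = np.asarray(lam1, float), np.asarray(lam2, float)
    m1, m2 = np.asarray(m1), np.asarray(m2)
    match = m1.copy()
    free_j = np.flatnonzero(m1 != m2)       # assignments still in dispute
    if free_j.size:
        pinned = set(m1[m1 == m2])          # rows fixed by agreement
        free_i = np.array([i for i in range(len(lam2)) if i not in pinned])
        # smaller LAP over only the disagreeing rows/columns
        C = (lam1[free_j][None, :] / alpha_c
             - alpha_c * lam2[free_i][:, None]) ** 2
        r, c = linear_sum_assignment(C)
        match[free_j[c]] = free_i[r]
    return match
```

Only the disputed indices enter the LAP, so the solved problems shrink as the set of known optima grows denser.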

## Optimization of $D^2$

1. Compute $\lambda^{(1)}$, $\lambda^{(2)}$ as the eigenvalues of $L_1$ and $L_2$.
2. Solve the linear version of the problem using Algorithm [alg:linear_alg], obtaining $\alpha^*, M^*$. According to the argument presented in the definition of linear distance (Equation [defn:linear_defn]), this solution holds for very small $t$. Keep the entire frontier of matchings found during the execution of Algorithm [alg:linear_alg].
3. Set $t = 0$ and $d(0) = D(G_1, G_2 | \alpha^*, M^*, t)$.
4. Until $d(t + \epsilon) < d(t)$:
   1. Set $t = t + \epsilon$.
   2. Run the linear algorithm with $e^{t L_1}$ and $e^{t L_2}$ as the input matrices, initializing the list of matchings with those found at the previous $t$.
   3. Set $d(t) = D(G_1, G_2 | \alpha^*, M^*, t)$, where $\alpha^*, M^*$ are the optima from the previous step.
5. Return $\max_t d(t)$.

[alg:exponential_alg]

Many of the theoretical guarantees underlying our algorithm for computing $\tilde{D}^2$ no longer hold for the exponential version of the distance. We adapt our linear-version procedure into an algorithm for computing the exponential distance, with the caveat that the missing guarantees mean the resulting upper bound may be looser than the one computed for the linear objective function. It is still clearly an upper bound, since the $\alpha$ and $P$ found by this procedure satisfy the given constraints $\alpha > 0$ and $P^T P = I$. In particular, we have observed cases where the exponential-distance analog of Theorem [thm:matching_agreement] fails to hold, meaning we cannot rule out $\alpha$-intervals as we can in the linear version.

For the exponential version of the algorithm, we first compute the list of optimal $P$ for the linear version, assuming (since $e^{t L} \approx I + t L$ for very small $t$) that this is also the list of optimal $P$ for the exponential version of the objective function at some low $t$. We proceed to increment $t$ by some step size $\Delta t$, in the manner of a continuation method (Allgower and Georg 2012). At each new $t$ value, we search for new optimal $P$ along the currently known frontier of optima as a function of $\alpha$. When a new $P$ is found at the intersection of two known $P_{i}, P_{i+1}$, it is inserted into the list, which is kept in order of increasing $\alpha$. For each $P$ in this frontier, we find the optimal $\alpha$, keeping $P$ and $t$ constant. Assuming $\inf_P \inf_\alpha D^2(G_1, G_2 | t_c)$ is unimodal as a function of $t_c$, we increase $t_c$ until $\inf_P \inf_\alpha D^2(G_1, G_2 | t_c) \geq \inf_P \inf_\alpha D^2(G_1, G_2 | t_c + \Delta t)$, storing all $P$ matrices found as optima at each $t_c$ value. $P$ which were on the lower convex hull at some prior value of $t$ but not at the current value are retained, as they may regain optimality for some $\alpha$-range at a future value of $t$ (we have observed this in practice). For this list $P_1, P_2, \ldots, P_m$, we then compute $\sup_t \inf_\alpha \inf_i D^2(G_1, G_2 | P_i)$. Since the exponential map is continuous and we increment $t$ by very small steps, we also propose the further computational shortcut of storing the list of optimal $\alpha$ at time $t$ to use as starting points for the optimization at $t + \Delta t$. In practice, this made little difference in the runtime of our optimization procedure.
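The outer continuation loop over $t$ can be sketched as follows, with `inner_cost` standing in for the inner optimization over $P$ and $\alpha$ at a fixed $t$; both names are placeholders, not our actual subroutines, and the unimodality of the objective in $t$ is assumed as in the text.

```python
def continuation_over_t(inner_cost, t0=0.0, dt=1e-3, t_max=10.0):
    """March t upward until the (assumed unimodal) objective stops
    increasing; return the best t seen and its objective value."""
    t = best_t = t0
    best = inner_cost(t0)
    while t + dt <= t_max:
        nxt = inner_cost(t + dt)
        if nxt < best:   # d(t + dt) < d(t): past the peak, so stop
            break
        t += dt
        best_t, best = t, nxt
    return best_t, best
```

The warm-starting shortcut described above corresponds to `inner_cost` caching its frontier of matchings and optimal $\alpha$ values between consecutive calls.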

## Algorithm Correctness Proof

[thm:LAP_bound] For any two graphs $G_1$ and $G_2$ with Laplacians $L(G_1)$ and $L(G_2)$, for fixed $\alpha$, the optimization over $P$ given in the innermost loop of Equation [defn:linear_defn] is upper bounded by a Linear Assignment Problem as defined in Equation [eqn:lap_defn]. This LAP is given by taking $R$ to be the eigenvalues $\lambda^{(1)}_j$ of $L(G_1)$ and $S$ to be the eigenvalues $\lambda^{(2)}_i$ of $L(G_2)$, with the cost of a pair (equivalently, one entry of the cost matrix $C$) given by \begin{aligned} \label{eqn:cost_matrix_entry} C_{ij} = c(s_i, r_j) = c\left(\lambda^{(2)}_i, \lambda^{(1)}_j\right) = {\left( \frac{1}{\alpha} {\lambda^{(1)}_j} - \alpha {\lambda^{(2)}_i} \right)}^2 \end{aligned}

$L(G_1)$ and $L(G_2)$ are both real symmetric matrices, so they may be diagonalized as $L(G_i) = U_i \Lambda_i U_i^T$, where the $U_i$ are rotation matrices, and the $\Lambda_i$ are diagonal matrices with the eigenvalues $\lambda^{(i)}_1, \lambda^{(i)}_2 \ldots \lambda^{(i)}_{n_i}$ along the diagonal. Because the Frobenius norm is invariant under rotation, we have: \begin{aligned} \tilde{D}^2(G_1,G_2) &= \inf_{\alpha > 0} \inf_{P^T P = I} {\left| \left| \frac{1}{\alpha} P L(G_1) - \alpha L(G_2) P \right| \right|}^2_F \nonumber \\ &= \inf_{\alpha > 0} \inf_{P^T P = I} {\left| \left| \frac{1}{\alpha} U_2^T P L(G_1) U_1 - \alpha U_2^T L(G_2) P U_1 \right| \right|}^2_F \nonumber \\ &= \inf_{\alpha > 0} \inf_{P^T P = I} {\left| \left| \frac{1}{\alpha} U_2^T P U_1 \Lambda_1 U_1^T U_1 - \alpha U_2^T U_2 \Lambda_2 U_2^T P U_1 \right| \right|}^2_F \label{eqn:diagonal} \\ &= \inf_{\alpha > 0} \inf_{P^T P = I} {\left| \left| \frac{1}{\alpha} U_2^T P U_1 \Lambda_1 - \alpha \Lambda_2 U_2^T P U_1 \right| \right|}^2_F. 
\nonumber \\ \intertext{Because the $U_i$ are orthogonal, the transformation $\tilde{P} = U_2^T P U_1$ preserves orthogonality, so} \tilde{D}^2(G_1,G_2) &= \inf_{\alpha > 0} \inf_{P^T P = I} {\left| \left| \frac{1}{\alpha} P \Lambda_1 - \alpha \Lambda_2 P \right| \right|}^2_F \nonumber \\ &= \inf_{\alpha > 0} \inf_{P^T P = I} \left( {\left| \left| \frac{1}{\alpha} \Lambda_1 \right| \right|}^2_F + {\left| \left| \alpha \Lambda_2 P \right| \right|}^2_F - 2 \text{Tr} \left[ P^T \Lambda_2 P \Lambda_1 \right] \right) \nonumber \\ &= \inf_{\alpha > 0} \inf_{P^T P = I} \left( \text{Tr} \left[ \frac{1}{\alpha^2} \Lambda_1^2 \right] + \text{Tr} \left[ \alpha^2 P^T \Lambda_2^2 P \right] - 2 \text{Tr} \left[ P^T \Lambda_2 P \Lambda_1 \right] \right) \nonumber \\ \intertext{Writing $P = [p_{ij}]$,} \tilde{D}^2(G_1,G_2) &= \inf_{\alpha > 0} \inf_{P^T P = I} \left( \frac{1}{\alpha^2} \sum\limits_{j=1}^{n_1} {\lambda^{(1)}_j}^2 + \alpha^2 \sum\limits_{i=1}^{n_2}\sum\limits_{j=1}^{n_1} p_{ij}^2 {\lambda^{(2)}_i}^2 \right. \label{eqn:f_terms} \\* &\left. \qquad \qquad \qquad \qquad \qquad - 2 \sum\limits_{i=1}^{n_2}\sum\limits_{j=1}^{n_1} p_{ij}^2 {\lambda^{(2)}_i} {\lambda^{(1)}_j} \right) \nonumber \\* &= \inf_{\alpha > 0} \inf_{P^T P = I} \left( \sum\limits_{i=1}^{n_2}\sum\limits_{j=1}^{n_1} p_{ij}^2 \left( \frac{1}{\alpha^2} {\lambda^{(1)}_j}^2 - 2 {\lambda^{(2)}_i} {\lambda^{(1)}_j} + \alpha^2 {\lambda^{(2)}_i}^2 \right) \right) \nonumber \\ &= \inf_{\alpha > 0} \inf_{P^T P = I} \left( \sum\limits_{i=1}^{n_2}\sum\limits_{j=1}^{n_1} p_{ij}^2 {\left( \frac{1}{\alpha} {\lambda^{(1)}_j} - \alpha {\lambda^{(2)}_i} \right)}^2 \right) \label{eqn:simple_d_tilde} \end{aligned} For any given $\alpha$, \begin{aligned} \inf_{P^T P = I} \left( \sum\limits_{i=1}^{n_2}\sum\limits_{j=1}^{n_1} p_{ij}^2 {\left( \frac{\lambda^{(1)}_j}{\alpha} - \alpha {\lambda^{(2)}_i} \right)}^2 \right) \leq \inf_{\tilde{P} | \text{subperm}(\tilde{P})} \left( \sum\limits_{i=1}^{n_2}\sum\limits_{j=1}^{n_1} \tilde{p}_{ij}^2 {\left( \frac{\lambda^{(1)}_j}{\alpha} - \alpha {\lambda^{(2)}_i} \right)}^2 \right), \nonumber \end{aligned}

where $\text{subperm}(\tilde{P})$ may be any condition stricter than the constraint $\tilde{P}^T \tilde{P} = I$. Here we take this stricter constraint to be the condition that $\tilde{P}$ is a subpermutation matrix: an orthogonal matrix (i.e. $\tilde{P}^T \tilde{P} = I$) for which $\tilde{P} \in \{0,1\}^{n_2 \times n_1}$. Equivalently, a subpermutation matrix is a $\{0,1\}$-valued matrix $[\tilde{p}_{ij}]$ (with $n_1 \leq n_2$) in which each column contains exactly one 1 (for each $j \in \{1, \ldots, n_1\}$, $\sum_{i=1}^{n_2} \tilde{p}_{ij} = 1$) and each row contains at most one 1 (for each $i \in \{1, \ldots, n_2\}$, $\sum_{j=1}^{n_1} \tilde{p}_{ij} \leq 1$).
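A minimal numeric check of the claim that subpermutation matrices satisfy the orthogonality constraint (the particular matrix below is an arbitrary example of ours):

```python
import numpy as np

# A 4x3 subpermutation matrix (n_1 = 3 <= n_2 = 4): every column contains
# exactly one 1, and every row contains at most one 1.
P = np.array([[1, 0, 0],
              [0, 0, 1],
              [0, 0, 0],
              [0, 1, 0]])
gram = P.T @ P   # should equal the 3x3 identity, i.e. P^T P = I
```

The columns are distinct standard basis vectors, which is exactly why the Gram matrix is the identity.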

Furthermore, this optimization is exactly a linear assignment problem matching eigenvalues of $L(G_1)$ to those of $L(G_2)$, with the cost of a pair $\left(\lambda^{(1)}_j, \lambda^{(2)}_i \right)$ given by $c\left(\lambda^{(1)}_j, \lambda^{(2)}_i \right) = {\left( \frac{1}{\alpha} {\lambda^{(1)}_j} - \alpha {\lambda^{(2)}_i} \right)}^2$.

Note also that the same argument applies to the innermost two optimizations of the calculation of $D^2$ (the exponential version of the diffusion distance), as well as $D^2_r$. In the $D^2$ case, the entries of the cost matrix are instead given by $c\left(\lambda^{(1)}_j, \lambda^{(2)}_i \right) = {\left( e^{\frac{1}{\alpha} {\lambda^{(1)}_j}} - e^{\alpha {\lambda^{(2)}_i}} \right)}^2$. If we instead loosen the constraints on $P$, we can calculate a lower bound on the distance; see Section 2.4 for details.

Recall that our definition of a ‘matching’ in Section 1.4.2 was a $P$ matrix representing a particular solution to the linear assignment problem with costs given as in Equation [eqn:cost_matrix_entry]. For given $G_1, G_2$, and some matching $M$, let \begin{aligned} \label{eqn:f_alpha_defn} f_M(\alpha) = \tilde{D}^2 (G_1, G_2 | \alpha, U_2^T M U_1) \end{aligned} where $U_1, U_2$ diagonalize $L_1$ and $L_2$ as in Equation [eqn:diagonal].
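For a fixed matching, $f_M$ reduces to the three sums appearing in Equation [eqn:f_terms]. The sketch below (helper names are ours) evaluates $f_M(\alpha) = B/\alpha^2 + A\alpha^2 - 2C$ and its minimizer $\alpha^* = (B/A)^{1/4}$; this closed form follows by differentiating in $\alpha$ and is noted here as a convenience, not stated explicitly in the text.

```python
import numpy as np

def matching_cost_terms(lam1, lam2, m):
    """Sums B, A, C from Equation [eqn:f_terms] for a matching m,
    where m[j] is the index of the lam2 entry paired with lam1[j]."""
    lam1, lam2 = np.asarray(lam1, float), np.asarray(lam2, float)
    paired = lam2[np.asarray(m)]
    return (lam1 ** 2).sum(), (paired ** 2).sum(), (paired * lam1).sum()

def f_M(alpha, B, A, C):
    # f_M(alpha) = B / alpha^2 + A * alpha^2 - 2 C
    return B / alpha ** 2 + A * alpha ** 2 - 2.0 * C

def optimal_alpha(B, A):
    # d/d(alpha) [B/alpha^2 + A alpha^2] = 0  =>  alpha^4 = B / A
    return (B / A) ** 0.25
```

Since $B$ is shared by every matching of the same pair of graphs, only $A$ and $C$ distinguish two cost curves.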

[lem:unique_soln] For two distinct matchings $M_1$ and $M_2$ (for the same $G_1, G_2$), the equation $f_{M_1}(\alpha) - f_{M_2}(\alpha) = 0$ has at most one real positive solution in $\alpha$. This follows from the fact that when $P$ and $t$ are fixed, the objective function is a rational function of $\alpha$ (see Equation [eqn:f_terms]) whose numerator is quadratic in $\alpha^2$, with a pole at $\alpha = 0$.

By Equation [eqn:f_terms], we have \begin{aligned} & f_{M_1}(\alpha) - f_{M_2}(\alpha) = \nonumber \\ & \left( \frac{1}{\alpha^2} \sum\limits_{j=1}^{n_1} {\lambda^{(1)}_j}^2 + \alpha^2 \sum\limits_{i=1}^{n_2}\sum\limits_{j=1}^{n_1} {[M_1]}_{ij}^2 {\lambda^{(2)}_i}^2 - 2 \sum\limits_{i=1}^{n_2}\sum\limits_{j=1}^{n_1} {[M_1]}_{ij}^2 {\lambda^{(2)}_i} {\lambda^{(1)}_j} \right) \\ & \quad - \left( \frac{1}{\alpha^2} \sum\limits_{j=1}^{n_1} {\lambda^{(1)}_j}^2 + \alpha^2 \sum\limits_{i=1}^{n_2}\sum\limits_{j=1}^{n_1} {[M_2]}_{ij}^2 {\lambda^{(2)}_i}^2 - 2 \sum\limits_{i=1}^{n_2}\sum\limits_{j=1}^{n_1} {[M_2]}_{ij}^2 {\lambda^{(2)}_i} {\lambda^{(1)}_j} \right) \\ &= \alpha^2 \left( \sum\limits_{i=1}^{n_2}\sum\limits_{j=1}^{n_1} {[M_1]}_{ij}^2 {\lambda^{(2)}_i}^2 - \sum\limits_{i=1}^{n_2}\sum\limits_{j=1}^{n_1} {[M_2]}_{ij}^2 {\lambda^{(2)}_i}^2 \right) \\ & \quad + \left( 2 \sum\limits_{i=1}^{n_2}\sum\limits_{j=1}^{n_1} {[M_2]}_{ij}^2 {\lambda^{(2)}_i} {\lambda^{(1)}_j} - 2 \sum\limits_{i=1}^{n_2}\sum\limits_{j=1}^{n_1} {[M_1]}_{ij}^2 {\lambda^{(2)}_i} {\lambda^{(1)}_j} \right) \end{aligned} Abbreviating $A_k = \sum_{i,j} {[M_k]}_{ij}^2 {\lambda^{(2)}_i}^2$ and $C_k = \sum_{i,j} {[M_k]}_{ij}^2 {\lambda^{(2)}_i} {\lambda^{(1)}_j}$, setting this difference to zero gives \begin{aligned} \alpha^2 \left(A_1 - A_2\right) + 2 \left(C_2 - C_1\right) &= 0 \\ \intertext{and so} \alpha &= \pm \sqrt{\frac{2\left(C_1 - C_2\right)}{A_1 - A_2}}. \end{aligned} Since $A_1, A_2, C_1, C_2$ are all nonnegative reals, at most one of these roots is real and positive.
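In code, the crossing point of two matchings' cost curves can be sketched as follows; the shared $B/\alpha^2$ term cancels in the difference, so only the $A_k$ and $C_k$ sums are needed (`crossing_alpha` is an illustrative name, not part of our released code):

```python
import math

def crossing_alpha(A1, C1, A2, C2):
    """Unique positive alpha at which f_{M1}(alpha) = f_{M2}(alpha),
    or None if no positive crossing exists.  A_k and C_k are the matched
    sums of (lam2)^2 and lam1*lam2 for matching M_k."""
    if A1 == A2:
        return None                            # curves differ by a constant
    # from A1*a^2 - 2*C1 = A2*a^2 - 2*C2  =>  a^2 = 2(C1 - C2)/(A1 - A2)
    alpha_sq = 2.0 * (C1 - C2) / (A1 - A2)
    return math.sqrt(alpha_sq) if alpha_sq > 0 else None
```

Returning `None` for non-crossing pairs lets the outer loop of Algorithm [alg:linear_alg] skip those pairs entirely.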

We will say that a matching $M$ “assigns” $j$ to $i$ if and only if $M_{ij} = 1$.

[thm:matching_agreement] If two matchings $M_1$ and $M_3$ which yield optimal upper bounds for the linear distance $\tilde{D}^2$ (at $\alpha_1 \leq \alpha$ and $\alpha_3 \geq \alpha$ respectively) agree on a set of assignments, then the optimal $M$ at $\alpha$ must also agree with that set of assignments.

We need the following lemmas:

[lem:mono_incr] If an optimal matching assigns $i$ to $m(i)$ (so that eigenvalue $\lambda^{(1)}_i$ of $G_1$ is paired with $\lambda^{(2)}_{m(i)}$ of $G_2$ in the sum of costs Equation [eqn:cost_matrix_entry]), then the sequence $m(1), m(2), \ldots, m(n_1)$ is monotonically increasing.

This follows from the fact that the two sequences of eigenvalues are monotonic nondecreasing: if there is a ‘crossing’ ($i_1 < i_2$ but $m(i_2) < m(i_1)$), then the new matching obtained by uncrossing those two pairs (performing a 2-opt step as defined in (Croes 1958)) has strictly smaller objective function value. Hence an optimal matching cannot contain any such crossings.
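A small numeric illustration of the uncrossing argument, using arbitrary sorted spectra of our own choosing:

```python
import numpy as np

def pair_cost(l1, l2, alpha=1.0):
    # cost of pairing eigenvalue l1 of G_1 with l2 of G_2
    return (l1 / alpha - alpha * l2) ** 2

lam1 = np.array([0.0, 1.0, 3.0])   # sorted nondecreasing
lam2 = np.array([0.0, 2.0, 4.0])   # sorted nondecreasing

# a 'crossing': lam1[1] -> lam2[2] and lam1[2] -> lam2[1]
crossed = pair_cost(lam1[1], lam2[2]) + pair_cost(lam1[2], lam2[1])
# the 2-opt 'uncrossing' of the same two pairs
uncrossed = pair_cost(lam1[1], lam2[1]) + pair_cost(lam1[2], lam2[2])
```

Here the crossed assignment costs $9 + 1 = 10$ while the uncrossed one costs $1 + 1 = 2$, consistent with the lemma.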

For positive reals $\alpha^*$ and $\epsilon$ with $\alpha^* \geq \epsilon > 0$, let $M_1$ be an optimal matching at $\alpha^* - \epsilon$ and $M_2$ be optimal at $\alpha^* + \epsilon$. For $1 \leq i \leq n_1$, let $s_1(i)$ and $s_2(i)$ be the indices of $\lambda^{(2)}$ paired with $i$ in $M_1$ and $M_2$, respectively. Then for all $i$, $s_1(i) \geq s_2(i)$.

Define a “run” for $s_1, s_2$ as a maximal sequence of consecutive indices $l, l+1, \ldots, l+k$ in $[1,n_1]$ such that for each consecutive pair $l', l'+1$ in the sequence, $\min(s_1(l'+1), s_2(l'+1)) < \max(s_1(l'), s_2(l'))$. The following must be true about a “run”:

1. Within a run, either $s_1(l) < s_2(l)$ or $s_1(l) > s_2(l)$ for all $l$. Otherwise, we have one or more crossings (as in Lemma [lem:mono_incr]): for some $l$ we have $s_1(l) > s_1(l+1)$ or $s_2(l) > s_2(l+1)$. Any crossing may be uncrossed for a strictly lower objective function value - violating optimality of $M_1$ or $M_2$.

2. Any pair of matchings as defined above consists of a sequence of runs, where we allow a run to be trivial, i.e. a single index.

Next, we show that within a run we must have $s_1(i) > s_2(i)$ for all $i$. Let $S = \{l, l+1, \ldots, l+k \}$ be a run. By optimality of $M_1$, $M_2$ at $\alpha^* - \epsilon$ and $\alpha^* + \epsilon$ respectively, we have:

\begin{aligned} \label{eqn:dist_comp_sum} \sum_{i \in S} {\left(\frac{1}{\alpha^* - \epsilon}\lambda^{(1)}_i - (\alpha^*-\epsilon)\lambda^{(2)}_{s_1(i)} \right)}^2 < \sum_{i\in S} {\left( \frac{1}{\alpha^* - \epsilon}\lambda^{(1)}_i - (\alpha^*-\epsilon)\lambda^{(2)}_{s_2(i)} \right)}^2 \\ \intertext{and} \sum_{i \in S} {\left(\frac{1}{\alpha^* + \epsilon}\lambda^{(1)}_i - (\alpha^*+\epsilon)\lambda^{(2)}_{s_2(i)}\right)}^2 < \sum_{i \in S} {\left(\frac{1}{\alpha^* + \epsilon}\lambda^{(1)}_i - (\alpha^*+\epsilon)\lambda^{(2)}_{s_1(i)}\right)}^2. \end{aligned} Respectively, these simplify to \begin{aligned} - \sum_{i \in S}\left( \lambda^{(2)}_{s_1(i)} - \lambda^{(2)}_{s_2(i)} \right) \left( -2 \lambda^{(1)}_{i} + {(\alpha^* - \epsilon)}^2 \left( \lambda^{(2)}_{s_1(i)} + \lambda^{(2)}_{s_2(i)} \right) \right) > 0 \\ \intertext{and} \sum_{i \in S} \left( \lambda^{(2)}_{s_1(i)} - \lambda^{(2)}_{s_2(i)} \right) \left( -2 \lambda^{(1)}_{i} + {(\alpha^* + \epsilon)}^2 \left( \lambda^{(2)}_{s_1(i)} + \lambda^{(2)}_{s_2(i)} \right) \right) > 0. \end{aligned} Summing these inequalities and cancelling the $-2 \lambda^{(1)}_{i}$ terms, we have: \begin{aligned} &\sum_{i \in S} \left\{ {(\alpha^* + \epsilon)}^2 \left( {\left(\lambda^{(2)}_{s_1(i)}\right)}^2 - {\left(\lambda^{(2)}_{s_2(i)}\right)}^2 \right) \right. - \left. {(\alpha^* - \epsilon)}^2 \left( {\left(\lambda^{(2)}_{s_1(i)}\right)}^2 - {\left(\lambda^{(2)}_{s_2(i)}\right)}^2 \right) \right\}> 0. \end{aligned} Expanding ${(\alpha^* + \epsilon)}^2 - {(\alpha^* - \epsilon)}^2 = 4 \alpha^* \epsilon$ and reducing gives us $4 \alpha^* \epsilon \left( \sum_{i \in S} {\left(\lambda^{(2)}_{s_1(i)}\right)}^2 - \sum_{i \in S} {\left( \lambda^{(2)}_{s_2(i)} \right) }^2 \right) > 0 \quad \text{and so} \quad \sum_{i \in S} {\left(\lambda^{(2)}_{s_1(i)}\right)}^2 > \sum_{i \in S} {\left( \lambda^{(2)}_{s_2(i)} \right) }^2.$

However, since the $\lambda^{(2)}_j$ are monotonic nondecreasing, we cannot have $s_1(i) < s_2(i)$ for all $i \in S$, since that would imply $\sum_{i \in S} {\left(\lambda^{(2)}_{s_1(i)}\right)}^2 \leq \sum_{i \in S} {\left( \lambda^{(2)}_{s_2(i)} \right) }^2.$

Therefore, in a run of arbitrary length, all indices must move ‘backward’ as $\alpha$ increases (meaning that $s_1(i) > s_2(i)$ for all $i$ in the run), and so (since any pair of matchings optimal at such $\alpha$ defines a set of runs, some of them possibly trivial) we must have $s_1(i) \geq s_2(i)$. This completes the proof of the lemma.

Thus, for three matchings $M_1, M_2, M_3$ which are optimal at a sequence of $\alpha_1 \leq \alpha_2 \leq \alpha_3$, we must have $s_1(i) \geq s_2(i) \geq s_3(i)$ for all $i$. In particular, if $s_1(i) = s_3(i)$, we must also have $s_1(i) = s_2(i) = s_3(i)$.

If two matchings $M_1$ and $M_3$ yield optimal upper bounds for the linear distance $\tilde{D}^2$ at $\alpha_1$ and $\alpha_3$ respectively, and $f_{M_1}(\alpha_2) = f_{M_3}(\alpha_2)$ for some $\alpha_2$ s.t. $\alpha_1 \leq \alpha_2 \leq \alpha_3$, then either (1) $M_1$ and $M_3$ are optimal over the entire interval $[\alpha_1, \alpha_3]$, or (2) some other matching $M_2$ improves over $M_1$ and $M_3$ at $\alpha_2$.

This follows directly from the fact that $f_{M_1}(\alpha)$ and $f_{M_3}(\alpha)$ (as defined in Equation [eqn:f_alpha_defn]) can meet at only one real positive value of $\alpha$ (Lemma [lem:unique_soln]). Say that the cost curves for $M_1$ (known to be optimal at $\alpha = \alpha_1$) and $M_3$ (optimal at $\alpha = \alpha_3$) meet at $\alpha = \alpha_2$, and furthermore assume that $\alpha_1 \leq \alpha_2 \leq \alpha_3$. If some other matching $M_2$ improves over (i.e. has lesser objective function value than) $M_1$ or $M_3$ anywhere in the interval $[\alpha_1, \alpha_3]$, it must improve over both at $\alpha = \alpha_2$, since it may intersect each of these cost curves at most once on this interval. If $M_1$ and $M_3$ are both optimal at their intersection point (meaning no such distinct $M_2$ exists), then we know that no other matching improves on either of them over the interval $[\alpha_1, \alpha_3]$, and we may therefore mark it as explored during the outermost loop (optimization over $\alpha$) of Algorithm [alg:linear_alg].

Together, the preceding properties verify that our algorithm will indeed find the joint optimum over all $\alpha$ and $P$ (for fixed $t=c$, for $\tilde{D}$, subject to subpermutation constraints on $P$): they allow us to find the entire set of subpermutation matrices $P$ which appear on the lower convex hull of distance as a function of $\alpha$.

## Implementation Details

We implement Algorithms [alg:linear_alg] and [alg:exponential_alg] in the programming language Python (version 3.6.1) (Rossum 1995). Numerical arrays were stored using the numpy package (Van Der Walt, Colbert, and Varoquaux 2011). Our inner LAP solver was the package lapsolver (Heindl 2018). Univariate optimization over $t$ and $\alpha$ was performed with the ‘bounded’ method of the scipy.optimize package (Jones et al., n.d.), with bounds set at $[0, 10.0]$ for each variable and an input tolerance of $10^{-12}$. Laplacians were computed with the laplacian method from the package networkX (Hagberg, Swart, and S Chult 2008), and their eigenvalues were computed with scipy.linalg.eigh.
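A sketch of the univariate ‘bounded’ optimization call described above; the toy `objective` is a stand-in of ours for the inner optimization over $P$, and the lower bound is offset slightly from $0$ (rather than the exact $[0, 10.0]$ of the text) to avoid the pole at $\alpha = 0$.

```python
from scipy.optimize import minimize_scalar

def objective(alpha):
    # toy stand-in for the inner optimization over P at a given alpha:
    # f(alpha) = B/alpha^2 + A*alpha^2 - 2C with B = 4, A = 1, C = 2
    return 4.0 / alpha ** 2 + alpha ** 2 - 4.0

res = minimize_scalar(objective, bounds=(1e-9, 10.0), method="bounded",
                      options={"xatol": 1e-12})
```

For this toy objective the minimizer is $\alpha = (B/A)^{1/4} = \sqrt{2}$.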

Because of numerical precision issues arising during eigenvalue computation, it can be difficult to determine, by comparing eigenvalues, when two matchings agree. In practice we sidestep this issue and treat two matchings as identical only if they associate the same indices of the two lists of eigenvalues. This means we may accumulate multiple equivalent representations of the same matching (up to multiplicity of eigenvalues) during our sweeps through $t$ and $\alpha$. We leave mitigating this inefficiency for future work.
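The multiplicity issue described above can be seen in a small example of our own construction: two index-wise distinct matchings that pair a $G_1$ eigenvalue with different copies of a repeated $G_2$ eigenvalue have identical cost, yet are counted as distinct under the index-based comparison.

```python
import numpy as np

lam1 = np.array([0.0, 1.0, 2.0])
lam2 = np.array([0.0, 1.0, 1.0, 2.0])   # eigenvalue 1.0 has multiplicity 2

def matching_cost(m, alpha=1.0):
    # m[j] = index of the lam2 entry assigned to lam1[j]
    return sum((lam1[j] / alpha - alpha * lam2[i]) ** 2
               for j, i in enumerate(m))

m_a = [0, 1, 3]   # pairs lam1[1] with the first copy of the repeated eigenvalue
m_b = [0, 2, 3]   # pairs lam1[1] with the second copy
```

Both matchings achieve the same (here zero) cost, so index-based bookkeeping stores two representations of one matching.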

Code for computing diffusion distance, both with our algorithm and with naive univariate optimization, may be found in the Supplementary Information associated with this paper, as well as a maintained GitHub repository (Cory B. Scott 2021).