Efficiently Calculating GDD

In previous chapters we introduced and defined diffusion distance. A crucial component of calculating GDD is the optimization of the $\alpha$ parameter, which converts between coarse and fine time scales. Optimizing $\alpha$, in addition to optimizing the prolongation matrix $P$ under transitive constraints $\mathcal{C}(P)$, is a nontrivial numerical problem that, in our experience, seems to require new methods. We develop such methods here.

Algorithm Development

In this section, we describe the algorithm used to calculate upper bounds on graph distances as the joint optima (over $P$, $t$, and $\alpha$) of the distance equations Equation [defn:exp_defn] and Equation [defn:linear_defn], under orthogonality constraints only, i.e. the case $\mathcal{C}(P) = \{ P \,|\, P^T P = I \}$. At the core of both algorithms is a subroutine which solves the Linear Assignment Problem (LAP; see Equation [eqn:lap_defn]) repeatedly, in order to find the subpermutation matrix which is optimal at a particular value of $\alpha$. Namely, we are interested in calculating $\tilde{D}$ as

$$\begin{aligned} \tilde{D}(G_1, G_2) &= \min_\alpha f(\alpha) \quad \text{where} \quad f(\alpha) = \inf_{P | P^T P = I} \left| \left| \frac{1}{\alpha} P L(G_1) - \alpha L(G_2) P \right| \right| \\ \intertext{which, for orthogonality or any other compact constraint} &= \min_{P | P^T P = I} \left| \left| \frac{1}{\alpha} P L(G_1) - \alpha L(G_2) P \right| \right|. \nonumber \end{aligned}$$

However, we have found that the unique structure of this optimization problem admits a specialized procedure which is faster and more accurate than nested univariate optimization of $\alpha$ and $t$ (where each innermost function evaluation consists of a full optimization over $P$ at some $t, \alpha$). We first briefly describe the algorithm used to find the optimal $P$ and $\alpha$ for $\tilde{D}^2$; the formal description is given by Algorithm [alg:linear_alg]. In both cases, we reduce the computational complexity of the optimization over $P$ by imposing the additional constraint that $P$ must be a subpermutation matrix when rotated into the spectral basis (we define subpermutations in the proof of Theorem [thm:LAP_bound]). This constraint is compatible with the orthogonality constraint: all subpermutation matrices are orthogonal, but not vice versa. The tradeoff of this reduction in computational complexity is that we can only guarantee that our optima are upper bounds of the optima over all orthogonal $P$. In practice, however, this bound seems to be tight: over a large number of empirical evaluations we did not find any example where orthogonally-constrained optimization improved on the objective value attained under the subpermutation constraint. For the remainder of this paper we therefore refer to the calculated optima as distance values, when strictly they are distance upper bounds. We also note that a distance lower bound can be calculated by relaxing the constraints in $\mathcal{C}(P)$ (for instance, by replacing the optimization over all $P$ with a less constrained matching problem; see Section 2.4).
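To make the inner subroutine concrete, the following sketch evaluates the subpermutation-constrained objective at a fixed $\alpha$ by solving the LAP of Theorem [thm:LAP_bound] over the two eigenvalue lists. It is our illustration, not the released implementation: we substitute scipy's `linear_sum_assignment` for `lapsolver`, and the function name is hypothetical.

```python
import numpy as np
from scipy.optimize import linear_sum_assignment

def f_alpha(lam1, lam2, alpha):
    """Objective at fixed alpha, minimized over subpermutation matrices.

    lam1: sorted eigenvalues of L(G_1), length n1.
    lam2: sorted eigenvalues of L(G_2), length n2 >= n1.
    Returns (value, s), where s[j] is the index of the eigenvalue of
    L(G_2) matched to lam1[j]."""
    # Cost matrix C_ij = (lam1_j / alpha - alpha * lam2_i)^2, shape n2 x n1.
    C = (lam1[None, :] / alpha - alpha * lam2[:, None]) ** 2
    rows, cols = linear_sum_assignment(C)  # optimal assignment
    s = np.empty(len(lam1), dtype=int)
    s[cols] = rows
    return C[rows, cols].sum(), tuple(s)
```

Evaluating `f_alpha` on a grid of $\alpha$ values should reproduce curves like those in Figure [fig:distance_example_fig].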

[fig:distance_example_fig] Two-panel figure, (a) and (b): the distance objective as a function of $\alpha$, which is continuous and multimodal with discontinuous derivative (see the discussion below).

Optimization of $\tilde{D}^2$

[alg:linear_alg]

1. Compute $\lambda^{(1)}$, $\lambda^{(2)}$ as the eigenvalues of $L_1$ and $L_2$.
2. Compute, by solving a linear assignment problem, $M_\text{low}$ and $M_\text{high}$, the optimal matchings at $\alpha_\text{low}$ and $\alpha_\text{high}$ respectively.
3. Initialize the list of optimal matchings as $\{M_\text{low}, M_\text{high}\}$.
4. Repeat until the list of matchings is not expanded in the following step, or the entire interval $[\alpha_\text{low}, \alpha_\text{high}]$ is marked as explored: attempt to expand the list of optimal matchings by solving a linear assignment problem at the $\alpha$ where the cost curves of two matchings currently in the list intersect. If no better assignment exists, mark the interval covered by those two matchings as explored, as guaranteed by Theorem [thm:matching_agreement].
5. Return the lowest-cost $M$ and its optimal $\alpha$.
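A minimal sketch of this procedure, under the same substitutions as above (scipy's LAP solver, our own hypothetical function names), is given below. It maintains the frontier of matchings keyed by the $\alpha$ at which each was found, probes crossing points via the closed form of Lemma [lem:unique_soln], and finally minimizes each matching's cost curve over $\alpha$ in closed form:

```python
import numpy as np
from scipy.optimize import linear_sum_assignment

def best_matching(lam1, lam2, alpha):
    # LAP at fixed alpha; returns (cost, s) with s[j] = matched row index.
    C = (lam1[None, :] / alpha - alpha * lam2[:, None]) ** 2
    r, c = linear_sum_assignment(C)
    s = np.empty(len(lam1), dtype=int)
    s[c] = r
    return C[r, c].sum(), tuple(s)

def cost_at(lam1, lam2, s, alpha):
    return np.sum((lam1 / alpha - alpha * lam2[list(s)]) ** 2)

def crossing(lam1, lam2, sa, sb):
    # Lemma lem:unique_soln: the cost curves of two matchings meet where
    # alpha^2 (A1 - A2) + (C2 - C1) = 0, i.e. alpha^2 = (C1 - C2)/(A1 - A2).
    A1, A2 = (lam2[list(sa)] ** 2).sum(), (lam2[list(sb)] ** 2).sum()
    C1 = 2 * (lam1 * lam2[list(sa)]).sum()
    C2 = 2 * (lam1 * lam2[list(sb)]).sum()
    if np.isclose(A1, A2):
        return None
    r = (C1 - C2) / (A1 - A2)
    return float(np.sqrt(r)) if r > 0 else None

def linear_gdd(lam1, lam2, a_lo=0.1, a_hi=10.0, tol=1e-12):
    """Sketch of Algorithm [alg:linear_alg] (illustrative bounds on alpha)."""
    frontier = {a: best_matching(lam1, lam2, a)[1] for a in (a_lo, a_hi)}
    expanded = True
    while expanded:
        expanded = False
        alphas = sorted(frontier)
        for a1, a2 in zip(alphas, alphas[1:]):
            s1, s2 = frontier[a1], frontier[a2]
            if s1 == s2:
                continue  # interval explored (Theorem thm:matching_agreement)
            ac = crossing(lam1, lam2, s1, s2)
            if ac is None or not (a1 < ac < a2) or ac in frontier:
                continue
            cost, s_new = best_matching(lam1, lam2, ac)
            if s_new not in (s1, s2) and cost < cost_at(lam1, lam2, s1, ac) - tol:
                frontier[ac] = s_new
                expanded = True
    # Each matching's cost B/a^2 + A a^2 - 2C is minimized at a* = (B/A)^(1/4).
    B = (lam1 ** 2).sum()
    best = np.inf
    for s in set(frontier.values()):
        A = (lam2[list(s)] ** 2).sum()
        a_star = (B / A) ** 0.25 if A > 0 else a_hi
        a_star = min(max(a_star, a_lo), a_hi)
        best = min(best, cost_at(lam1, lam2, s, a_star))
    return best
```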

Joint optimization of $\tilde{D}^2$ over $\alpha$ and $P$ is a nested optimization problem (see (Onoyama et al. 2000) and (Sinha, Malo, and Deb 2018) for a description of nested optimization), with potentially combinatorial optimization over $P$ dependent on each choice of $\alpha$. Furthermore, the function $f(\alpha) = \inf_{P | \mathcal{C}(P)} \tilde{D}^2(G_1, G_2 | \alpha)$ is multimodal and continuous but, in general, has a discontinuous derivative (see Figure [fig:distance_example_fig]). Univariate optimization procedures such as Golden Section Search require many iterations of some procedure to optimize over $P$, each of which, in our restricted case, must compute a full solution to a LAP with $n_2 \times n_1$ weights. In our experience, these univariate methods also have a tendency to get stuck in local optima. We reduce the total number of calls to the LAP solver, as well as the size of the LAPs solved, by taking advantage of several unique properties of the optimization as a function of $\alpha$. When the optimal $P^{(1)}$ and $P^{(2)}$ are known for $\alpha_1$ and $\alpha_2$, then for any $\alpha_c$ such that $\min(\alpha_1, \alpha_2) \leq \alpha_c \leq \max(\alpha_1, \alpha_2)$, the optimal $P^{(c)}$ at $\alpha_c$ must satisfy $P^{(1)}_{ij} = 1 \wedge P^{(2)}_{ij} = 1 \implies P^{(c)}_{ij} = 1$ (see Theorem [thm:matching_agreement]). Thus, the optimization over $P$ at $\alpha_c$ is already partially solved given the solutions at $\alpha_1$ and $\alpha_2$, and we need only re-solve the remaining (smaller) subproblem on the set of assignments where $P^{(1)}$ and $P^{(2)}$ disagree, as sketched below. This has two consequences for our search over $\alpha$: first, the size of the LAPs which must be solved at each step decreases over time (as we find $P$-optima for a denser and denser set of $\alpha$); second, these theoretical guarantees mean that we can mark intervals of $\alpha$-values as explored (meaning we have provably found the $P$ which are optimal over the interval) and thus need not perform the relatively expensive optimization over $P$ for any $\alpha$ in such an interval.
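Theorem [thm:matching_agreement] translates directly into the following reduced solve (again a sketch with hypothetical names, using the index-vector representation of matchings from the sketches above): given the optimal matchings at two bracketing values of $\alpha$, only the assignments on which they disagree are re-optimized.

```python
import numpy as np
from scipy.optimize import linear_sum_assignment

def matching_between(lam1, lam2, s1, s3, alpha_c):
    """Optimal matching at alpha_c, with alpha_c between the alphas at which
    s1 and s3 are optimal. Assignments on which s1 and s3 agree are fixed
    (Theorem thm:matching_agreement); only the smaller LAP over the
    disagreements is re-solved."""
    n1 = len(lam1)
    agree = [j for j in range(n1) if s1[j] == s3[j]]
    free_j = [j for j in range(n1) if s1[j] != s3[j]]
    taken = {s1[j] for j in agree}
    free_i = [i for i in range(len(lam2)) if i not in taken]
    s = list(s1)
    if free_j:
        C = (lam1[free_j][None, :] / alpha_c
             - alpha_c * lam2[free_i][:, None]) ** 2
        rows, cols = linear_sum_assignment(C)
        for r, c in zip(rows, cols):
            s[free_j[c]] = free_i[r]
    return tuple(s)
```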

Optimization of $D^2$

1. Compute $\lambda^{(1)}$, $\lambda^{(2)}$ as the eigenvalues of $L_1$ and $L_2$.
2. Solve the linear version of the problem using Algorithm [alg:linear_alg], obtaining $\alpha^*, M^*$. By the argument presented in the definition of linear distance (Equation [defn:linear_defn]), this solution holds for very small $t$. Keep the entire frontier of matchings found during the execution of Algorithm [alg:linear_alg].
3. Set $t = 0$ and $d(0) = D(G_1, G_2 | \alpha^*, M^*, t)$.
4. Repeat until $d(t + \epsilon) < d(t)$: set $t = t + \epsilon$; run the linear algorithm with $e^{t L_1}$ and $e^{t L_2}$ as the input matrices, initializing the list of matchings with those found at the previous $t$; set $d(t) = D(G_1, G_2 | \alpha^*, M^*, t)$, where $\alpha^*, M^*$ are the optima from the preceding linear-algorithm run.
5. Return $\max_t d(t)$.

[alg:exponential_alg]

Many of the theoretical guarantees underlying our algorithm for computing $\tilde{D}^2$ no longer hold for the exponential version of the distance. We adapt our linear-version procedure into an algorithm for computing this version, with the caveat that the lack of these guarantees means the resulting upper bound on the exponential distance may be looser than that on the linear distance. It is still clearly an upper bound, since the $\alpha$ and $P$ found by this procedure satisfy the given constraints $\alpha > 0$ and $P^T P = I$. In particular, we have observed cases where the exponential-distance analog of Theorem [thm:matching_agreement] does not hold, meaning we cannot rule out $\alpha$-intervals as we can in the linear version.

For the exponential version of the algorithm, we first compute the list of optimal $P$ for the linear version, assuming (since $e^{tL} \approx I + tL$ for very small $t$) that this is also the list of optimal $P$ for the exponential version of the objective function at some low $t$. We proceed to increment $t$ with some step size $\Delta t$, in the manner of a continuation method (Allgower and Georg 2012). At each new $t$ value, we search for new optimal $P$ along the currently known frontier of optima as a function of $\alpha$. When a new $P$ is found at the intersection of two known $P_i, P_{i+1}$, it is inserted into the list, which is kept in order of increasing $\alpha$. For each $P$ in this frontier, we find the optimal $\alpha$, keeping $P$ and $t$ constant. Assuming $\inf_P \inf_\alpha D^2(G_1, G_2 | t_c)$ is unimodal as a function of $t_c$, we increase $t_c$ until $\inf_P \inf_\alpha D^2(G_1, G_2 | t_c) \geq \inf_P \inf_\alpha D^2(G_1, G_2 | t_c + \Delta t)$, storing all $P$ matrices found as optima at each $t_c$ value. $P$ which were on the lower convex hull at some prior value of $t$ but not the current value are retained, as they may regain optimality on some $\alpha$-range at a future value of $t$ (we have observed this in practice). For this list $P_1, P_2, \ldots, P_m$, we then compute $\sup_t \inf_\alpha \inf_i D^2(G_1, G_2 | P_i)$. Since the exponential map is continuous, and we are incrementing $t$ by very small steps, we also propose the further computational shortcut of storing the list of optimal $\alpha$ at time $t$ to use as starting points for the optimization at $t + \Delta t$. In practice, this made little difference in the runtime of our optimization procedure.
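A compact continuation-style sketch of this loop is below. It assumes the time-scaled exponential costs $\left(e^{(t/\alpha)\lambda^{(1)}_j} - e^{t\alpha\lambda^{(2)}_i}\right)^2$ (the $t = 1$ case of which appears in the proof section below), and, for brevity, it re-optimizes $\alpha$ from scratch at each $t$ with a bounded scalar minimizer, i.e. the naive inner loop that the frontier reuse described above avoids; all function names are ours.

```python
import numpy as np
from scipy.optimize import linear_sum_assignment, minimize_scalar

def exp_objective(lam1, lam2, t, alpha):
    """LAP objective with exponential costs at scale t.

    Uses the fact that the eigenvalues of e^{cL} are e^{c*lambda} for
    symmetric L, so we exponentiate spectra rather than matrices."""
    C = (np.exp((t / alpha) * lam1)[None, :]
         - np.exp((t * alpha) * lam2)[:, None]) ** 2
    rows, cols = linear_sum_assignment(C)
    return C[rows, cols].sum()

def exp_gdd(L1, L2, dt=0.05, t_max=10.0, a_lo=1e-2, a_hi=10.0):
    """Continuation in t: step t upward and stop once the alpha-minimized
    value decreases, assuming unimodality in t as in the text."""
    lam1 = np.linalg.eigvalsh(np.asarray(L1, dtype=float))
    lam2 = np.linalg.eigvalsh(np.asarray(L2, dtype=float))
    t, d_prev = dt, -np.inf
    while t <= t_max:
        d = minimize_scalar(lambda a: exp_objective(lam1, lam2, t, a),
                            bounds=(a_lo, a_hi), method='bounded').fun
        if d < d_prev:
            break  # past the supremum over t
        d_prev, t = d, t + dt
    return d_prev
```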

Algorithm Correctness Proof

[thm:LAP_bound] For any two graphs $G_1$ and $G_2$ with Laplacians $L(G_1)$ and $L(G_2)$, and for fixed $\alpha$, the optimization over $P$ given in the innermost loop of Equation [defn:linear_defn] is upper bounded by a Linear Assignment Problem as defined in Equation [eqn:lap_defn]. This LAP is given by taking $R$ to be the eigenvalues $\lambda^{(1)}_j$ of $L(G_1)$ and $S$ to be the eigenvalues $\lambda^{(2)}_i$ of $L(G_2)$, with the cost of a pair (equivalently, one entry of the cost matrix $C$) given by $$\begin{aligned} \label{eqn:cost_matrix_entry} C_{ij} = c(s_i, r_j) = c\left(\lambda^{(2)}_i, \lambda^{(1)}_j\right) = {\left( \frac{1}{\alpha} {\lambda^{(1)}_j} - \alpha {\lambda^{(2)}_i} \right)}^2 \end{aligned}$$

$L(G_1)$ and $L(G_2)$ are both real symmetric matrices, so they may be diagonalized as $L(G_i) = U_i \Lambda_i U_i^T$, where the $U_i$ are rotation matrices and the $\Lambda_i$ are diagonal matrices with the eigenvalues $\lambda^{(i)}_1, \lambda^{(i)}_2, \ldots, \lambda^{(i)}_{n_i}$ along the diagonal. Because the Frobenius norm is invariant under rotation, we have: $$\begin{aligned} \tilde{D}^2(G_1,G_2) &= \inf_{\alpha > 0} \inf_{P^T P = I} {\left| \left| \frac{1}{\alpha} P L(G_1) - \alpha L(G_2) P \right| \right|}^2_F \nonumber \\ &= \inf_{\alpha > 0} \inf_{P^T P = I} {\left| \left| \frac{1}{\alpha} U_2^T P L(G_1) U_1 - \alpha U_2^T L(G_2) P U_1 \right| \right|}^2_F \nonumber \\ &= \inf_{\alpha > 0} \inf_{P^T P = I} {\left| \left| \frac{1}{\alpha} U_2^T P U_1 \Lambda_1 U_1^T U_1 - \alpha U_2^T U_2 \Lambda_2 U_2^T P U_1 \right| \right|}^2_F \label{eqn:diagonal} \\ &= \inf_{\alpha > 0} \inf_{P^T P = I} {\left| \left| \frac{1}{\alpha} U_2^T P U_1 \Lambda_1 - \alpha \Lambda_2 U_2^T P U_1 \right| \right|}^2_F. \nonumber \\ \intertext{Because the $U_i$ are orthogonal, the transformation $\tilde{P} = U_2^T P U_1$ preserves orthogonality, so} \tilde{D}^2(G_1,G_2) &= \inf_{\alpha > 0} \inf_{P^T P = I} {\left| \left| \frac{1}{\alpha} P \Lambda_1 - \alpha \Lambda_2 P \right| \right|}^2_F \nonumber \\ &= \inf_{\alpha > 0} \inf_{P^T P = I} \left( {\left| \left| \frac{1}{\alpha} \Lambda_1 \right| \right|}^2_F + {\left| \left| \alpha \Lambda_2 P \right| \right|}^2_F - 2 \text{Tr} \left[ P^T \Lambda_2 P \Lambda_1 \right] \right) \nonumber \\ &= \inf_{\alpha > 0} \inf_{P^T P = I} \left( \text{Tr} \left[ \frac{1}{\alpha^2} \Lambda_1^2 \right] + \text{Tr} \left[ \alpha^2 P^T \Lambda_2^2 P \right] - 2 \text{Tr} \left[ P^T \Lambda_2 P \Lambda_1 \right] \right) \nonumber \\ \intertext{writing $P = [p_{ij}]$,} \tilde{D}^2(G_1,G_2) &= \inf_{\alpha > 0} \inf_{P^T P = I} \left( \frac{1}{\alpha^2} \sum\limits_{j=1}^{n_1} {\lambda^{(1)}_j}^2 + \alpha^2 \sum\limits_{i=1}^{n_2}\sum\limits_{j=1}^{n_1} p_{ij}^2 {\lambda^{(2)}_i}^2 \right. \label{eqn:f_terms} \\* &\left. \qquad \qquad \qquad \qquad \qquad - 2 \sum\limits_{i=1}^{n_2}\sum\limits_{j=1}^{n_1} p_{ij}^2 {\lambda^{(2)}_i} {\lambda^{(1)}_j} \right) \nonumber \\* &= \inf_{\alpha > 0} \inf_{P^T P = I} \left( \sum\limits_{i=1}^{n_2}\sum\limits_{j=1}^{n_1} p_{ij}^2 \left( \frac{1}{\alpha^2} {\lambda^{(1)}_j}^2 - 2 {\lambda^{(2)}_i} {\lambda^{(1)}_j} + \alpha^2 {\lambda^{(2)}_i}^2 \right) \right) \nonumber \\ &= \inf_{\alpha > 0} \inf_{P^T P = I} \left( \sum\limits_{i=1}^{n_2}\sum\limits_{j=1}^{n_1} p_{ij}^2 {\left( \frac{1}{\alpha} {\lambda^{(1)}_j} - \alpha {\lambda^{(2)}_i} \right)}^2 \right) \label{eqn:simple_d_tilde} \end{aligned}$$ For any given $\alpha$, $$\begin{aligned} \inf_{P^T P = I} \left( \sum\limits_{i=1}^{n_2}\sum\limits_{j=1}^{n_1} p_{ij}^2 {\left( \frac{\lambda^{(1)}_j}{\alpha} - \alpha {\lambda^{(2)}_i} \right)}^2 \right) \leq \inf_{\tilde{P} | \text{subperm}(\tilde{P})} \left( \sum\limits_{i=1}^{n_2}\sum\limits_{j=1}^{n_1} \tilde{p}_{ij}^2 {\left( \frac{\lambda^{(1)}_j}{\alpha} - \alpha {\lambda^{(2)}_i} \right)}^2 \right), \nonumber \end{aligned}$$

where $\text{subperm}(\tilde{P})$ could be any condition more strict than the constraint $P^T P = I$. Here we take this stricter constraint to be the condition that $\tilde{P}$ is a subpermutation matrix: an orthogonal matrix (i.e. $\tilde{P}^T \tilde{P} = I$) with $\tilde{P} \in \{0,1\}^{n_2 \times n_1}$. Equivalently, a subpermutation matrix is a $\{0,1\}$-valued matrix $[\tilde{p}_{ij}]$ such that each of the $n_1$ columns contains exactly one entry equal to 1 (so $\sum_{j=1}^{n_2} \tilde{p}_{ji} = 1$ for each column index $i \in \{1, \ldots, n_1\}$), and each of the $n_2 \geq n_1$ rows contains either zero or one entry equal to 1 (so $\sum_{i=1}^{n_1} \tilde{p}_{ji} \leq 1$ for each row index $j \in \{1, \ldots, n_2\}$).
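As a quick sanity check of this definition, a matrix can be tested for the subpermutation property directly (an illustrative helper, not part of the released code):

```python
import numpy as np

def is_subpermutation(P):
    """True iff P is {0,1}-valued, every column sums to 1, and every row
    sums to at most 1. Such a P satisfies P^T P = I, since its columns
    are distinct standard basis vectors."""
    P = np.asarray(P)
    return (np.isin(P, (0, 1)).all()
            and (P.sum(axis=0) == 1).all()
            and (P.sum(axis=1) <= 1).all())
```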

Furthermore, this optimization is exactly a linear assignment problem matching eigenvalues of $L(G_1)$ to eigenvalues of $L(G_2)$, with the cost of a pair $\left(\lambda^{(1)}_j, \lambda^{(2)}_i \right)$ given by $$c\left(\lambda^{(1)}_j, \lambda^{(2)}_i \right) = {\left( \frac{1}{\alpha} {\lambda^{(1)}_j} - \alpha {\lambda^{(2)}_i} \right)}^2$$

Note also that the same argument applies to the innermost two optimizations in the calculation of $D^2$ (the exponential version of the diffusion distance), as well as $D^2_r$. In the $D^2$ case the entries of the cost matrix are instead given by $$c\left(\lambda^{(1)}_j, \lambda^{(2)}_i \right) = {\left( e^{\frac{1}{\alpha} \lambda^{(1)}_j} - e^{\alpha \lambda^{(2)}_i} \right)}^2$$ If we instead loosen the constraints on $P$, we can calculate a lower bound on the distance; see Section 2.4 for details.

Recall that our definition of a ‘matching’ in Section 1.4.2 was a $P$ matrix representing a particular solution to the linear assignment problem with costs given as in Equation [eqn:cost_matrix_entry]. For given $G_1, G_2$, and some matching $M$, let $$\begin{aligned} \label{eqn:f_alpha_defn} f_M(\alpha) = \tilde{D}^2 (G_1, G_2 | \alpha, U_2^T M U_1) \end{aligned}$$ where $U_1, U_2$ diagonalize $L_1$ and $L_2$ as in Equation [eqn:diagonal].
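Expanding Equation [eqn:f_terms] for a fixed matching $M$ gives $f_M$ in a closed form that is used repeatedly below: with $B = \sum_{j=1}^{n_1} {\lambda^{(1)}_j}^2$, $A_M = \sum_{M_{ij} = 1} {\lambda^{(2)}_i}^2$, and $C_M = \sum_{M_{ij} = 1} \lambda^{(2)}_i \lambda^{(1)}_j$,

$$f_M(\alpha) = \frac{B}{\alpha^2} + A_M \alpha^2 - 2 C_M, \qquad \frac{d f_M}{d\alpha} = -\frac{2B}{\alpha^3} + 2 A_M \alpha,$$

so that (assuming $A_M > 0$) $f_M$ is minimized at $\alpha^* = (B/A_M)^{1/4}$, with minimum value $2\sqrt{A_M B} - 2 C_M$.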

[lem:unique_soln] For two distinct matchings $M_1$ and $M_2$ (for the same $G_1, G_2$) the equation $f_{M_1}(\alpha) - f_{M_2}(\alpha) = 0$ has at most one real positive solution in $\alpha$. This follows from the fact that when $P$ and $t$ are fixed, the objective function is a rational function of $\alpha$ (see Equation [eqn:f_terms]) whose numerator is quadratic in $\alpha^2$, with an asymptote at $\alpha = 0$.

By Equation [eqn:f_terms], we have $$\begin{aligned} & f_{M_1}(\alpha) - f_{M_2}(\alpha) = \nonumber \\ & \left( \frac{1}{\alpha^2} \sum\limits_{j=1}^{n_1} {\lambda^{(1)}_j}^2 + \alpha^2 \sum\limits_{i=1}^{n_2}\sum\limits_{j=1}^{n_1} {[M_1]}_{ij}^2 {\lambda^{(2)}_i}^2 - 2 \sum\limits_{i=1}^{n_2}\sum\limits_{j=1}^{n_1} {[M_1]}_{ij}^2 {\lambda^{(2)}_i} {\lambda^{(1)}_j} \right) \\ & \quad - \left( \frac{1}{\alpha^2} \sum\limits_{j=1}^{n_1} {\lambda^{(1)}_j}^2 + \alpha^2 \sum\limits_{i=1}^{n_2}\sum\limits_{j=1}^{n_1} {[M_2]}_{ij}^2 {\lambda^{(2)}_i}^2 - 2 \sum\limits_{i=1}^{n_2}\sum\limits_{j=1}^{n_1} {[M_2]}_{ij}^2 {\lambda^{(2)}_i} {\lambda^{(1)}_j} \right) \\ &= \alpha^2 \left( \sum\limits_{i=1}^{n_2}\sum\limits_{j=1}^{n_1} {[M_1]}_{ij}^2 {\lambda^{(2)}_i}^2 - \sum\limits_{i=1}^{n_2}\sum\limits_{j=1}^{n_1} {[M_2]}_{ij}^2 {\lambda^{(2)}_i}^2 \right) \\ & \quad + \left( 2 \sum\limits_{i=1}^{n_2}\sum\limits_{j=1}^{n_1} {[M_2]}_{ij}^2 {\lambda^{(2)}_i} {\lambda^{(1)}_j} - 2 \sum\limits_{i=1}^{n_2}\sum\limits_{j=1}^{n_1} {[M_1]}_{ij}^2 {\lambda^{(2)}_i} {\lambda^{(1)}_j} \right) \end{aligned}$$ $$\begin{aligned} \intertext{Abbreviating the sums, we have} \alpha^2 \left(A_1 - A_2\right) + \left(C_2 - C_1\right) &= 0 \\ \intertext{and so} \alpha = \pm \sqrt{\frac{C_1 - C_2}{A_1 - A_2}} \end{aligned}$$ Since these two roots differ only in sign, at most one of them is real and positive.

We will say that a matching $M$ “assigns” $j$ to $i$ if and only if $M_{ij} = 1$.

[thm:matching_agreement] If two matchings $M_1$ and $M_3$ which yield optimal upper bounds for the linear distance $\tilde{D}^2$ (at $\alpha_1 \leq \alpha$ and $\alpha_3 \geq \alpha$ respectively) agree on a set of assignments, then the optimal $M$ at $\alpha$ must also agree with that set of assignments.

We need the following lemmas:

[lem:mono_incr] If an optimal matching assigns $i$ to $m(i)$ (so that eigenvalue $\lambda^{(1)}_i$ of $G_1$ is paired with $\lambda^{(2)}_{m(i)}$ of $G_2$ in the sum of costs, Equation [eqn:cost_matrix_entry]), then the sequence $m(1), m(2), \ldots, m(n_1)$ is monotonic increasing.

This follows from the fact that the two sequences of eigenvalues are monotonic nondecreasing, so if there is a ‘crossing’ ($i_1 < i_2$ but $m(i_2) < m(i_1)$), then the new matching obtained by uncrossing those two pairs (performing a 2-opt step as defined in (Croes 1958)) has strictly lesser objective function value. Hence an optimal matching cannot contain any such crossings.
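To spell out the uncrossing step: write $x_i = \frac{1}{\alpha}\lambda^{(1)}_i$ and $y_i$ for the corresponding $\alpha \lambda^{(2)}$ values, with $x_1 \leq x_2$ and $y_1 \leq y_2$. Then

$$\left[ (x_1 - y_2)^2 + (x_2 - y_1)^2 \right] - \left[ (x_1 - y_1)^2 + (x_2 - y_2)^2 \right] = 2 (x_2 - x_1)(y_2 - y_1) \geq 0,$$

so uncrossing never increases the total cost, and strictly decreases it whenever both inequalities are strict.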

For all positive reals $\alpha^* > \epsilon > 0$, let $M_1$ be an optimal matching at $\alpha^* - \epsilon$ and $M_2$ be optimal at $\alpha^* + \epsilon$. For $1 \leq i \leq n_1$, let $s_1(i)$ and $s_2(i)$ be the indices of $\lambda^{(2)}$ paired with $i$ in $M_1$ and $M_2$, respectively. Then for all $i$, $s_1(i) \geq s_2(i)$.

Define a “run” for $s_1, s_2$ as a sequence of consecutive indices $l, l+1, \ldots, l+k$ in $[1, n_1]$ such that for each consecutive pair $l', l'+1$ in the run, $\min(s_1(l'+1), s_2(l'+1)) < \max(s_1(l'), s_2(l'))$. The following must be true about a “run”:

  1. Within a run, either $s_1(l) < s_2(l)$ for all $l$ or $s_1(l) > s_2(l)$ for all $l$. Otherwise, we have one or more crossings (as in Lemma [lem:mono_incr]): for some $l$ we have $s_1(l) > s_1(l+1)$ or $s_2(l) > s_2(l+1)$. Any crossing may be uncrossed for a strictly lower objective function value, violating optimality of $M_1$ or $M_2$.

  2. Any pair of matchings as defined above decomposes into a sequence of runs, where we allow a run to be trivial, i.e. a single index.

Next, we show that within a nontrivial run we must have $s_1(i) > s_2(i)$ for all $i$. Let $S = \{l, l+1, \ldots, l+k\}$ be a run. By optimality of $M_1$, $M_2$ at $\alpha^* - \epsilon$ and $\alpha^* + \epsilon$ respectively, we have:

$$\begin{aligned} \label{eqn:dist_comp_sum} \sum_{i \in S} {\left(\frac{1}{\alpha^* - \epsilon}\lambda^{(1)}_i - (\alpha^*-\epsilon)\lambda^{(2)}_{s_1(i)} \right)}^2 < \sum_{i\in S} {\left( \frac{1}{\alpha^* - \epsilon}\lambda^{(1)}_i - (\alpha^*-\epsilon)\lambda^{(2)}_{s_2(i)} \right)}^2 \\ \intertext{and} \sum_{i \in S} {\left(\frac{1}{\alpha^* + \epsilon}\lambda^{(1)}_i - (\alpha^*+\epsilon)\lambda^{(2)}_{s_2(i)}\right)}^2 < \sum_{i \in S} {\left(\frac{1}{\alpha^* + \epsilon}\lambda^{(1)}_i - (\alpha^*+\epsilon)\lambda^{(2)}_{s_1(i)}\right)}^2. \end{aligned}$$ Respectively, these simplify to $$\begin{aligned} - \sum_{i \in S}\left( \lambda^{(2)}_{s_1(i)} - \lambda^{(2)}_{s_2(i)} \right) \left( -2 \lambda^{(1)}_{i} + {(\alpha^* - \epsilon)}^2 \left( \lambda^{(2)}_{s_1(i)} + \lambda^{(2)}_{s_2(i)} \right) \right) > 0 \\ \intertext{and} \sum_{i \in S} \left( \lambda^{(2)}_{s_1(i)} - \lambda^{(2)}_{s_2(i)} \right) \left( -2 \lambda^{(1)}_{i} + {(\alpha^* + \epsilon)}^2 \left( \lambda^{(2)}_{s_1(i)} + \lambda^{(2)}_{s_2(i)} \right) \right) > 0. \end{aligned}$$ Summing these inequalities and cancelling the $-2 \lambda^{(1)}_{i}$ terms, we have: $$\begin{aligned} &\sum_{i \in S} \left\{ {(\alpha^* + \epsilon)}^2 \left( {\left(\lambda^{(2)}_{s_1(i)}\right)}^2 - {\left(\lambda^{(2)}_{s_2(i)}\right)}^2 \right) \right. - \left. {(\alpha^* - \epsilon)}^2 \left( {\left(\lambda^{(2)}_{s_1(i)}\right)}^2 - {\left(\lambda^{(2)}_{s_2(i)}\right)}^2 \right) \right\} > 0. \end{aligned}$$ Reducing gives us $$4 \alpha^* \epsilon \left( \sum_{i \in S} {\left(\lambda^{(2)}_{s_1(i)}\right)}^2 - \sum_{i \in S} {\left( \lambda^{(2)}_{s_2(i)} \right) }^2 \right) > 0 \quad \text{and so} \quad \sum_{i \in S} {\left(\lambda^{(2)}_{s_1(i)}\right)}^2 > \sum_{i \in S} {\left( \lambda^{(2)}_{s_2(i)} \right) }^2.$$

However, since the $\lambda^{(2)}_j$ are monotonic nondecreasing, we cannot also have $s_1(i) < s_2(i)$ for all $i \in S$, since that would imply $$\sum_{i \in S} {\left(\lambda^{(2)}_{s_1(i)}\right)}^2 \leq \sum_{i \in S} {\left( \lambda^{(2)}_{s_2(i)} \right) }^2.$$

Therefore, in a run of arbitrary length, all indices must move ‘backward’ (meaning that $s_1(i) > s_2(i)$ for all $i$ in the run), and so, since any pair of matchings optimal at such $\alpha$ defines a set of runs (some of which may be trivial), we must have $s_1(i) \geq s_2(i)$: as $\alpha$ increases, each $\lambda^{(1)}_i$ is matched to an equal or smaller eigenvalue of $L_2$. This completes the proof of the lemma.

Thus, for three matchings $M_1, M_2, M_3$ which are optimal at a sequence of $\alpha_1 \leq \alpha_2 \leq \alpha_3$, we must have $s_1(i) \geq s_2(i) \geq s_3(i)$ for all $i$. In particular, if $s_1(i) = s_3(i)$, we must also have $s_1(i) = s_2(i) = s_3(i)$.

If two matchings $M_1$ and $M_3$ yield optimal upper bounds for the linear distance $\tilde{D}^2$ at $\alpha_1$ and $\alpha_3$ respectively, and $f_{M_1}(\alpha_2) = f_{M_3}(\alpha_2)$ for some $\alpha_2$ such that $\alpha_1 \leq \alpha_2 \leq \alpha_3$, then either (1) $M_1$ and $M_3$ are optimal over the entire interval $[\alpha_1, \alpha_3]$, or (2) some other matching $M_2$ improves over $M_1$ and $M_3$ at $\alpha_2$.

This follows directly from the fact that $f_{M_1}(\alpha)$ and $f_{M_3}(\alpha)$ (as defined in Equation [eqn:f_alpha_defn]) can meet at only one real positive value of $\alpha$ (Lemma [lem:unique_soln]). Say that the cost curves for $M_1$ (known to be optimal at $\alpha = \alpha_1$) and $M_3$ (optimal at $\alpha = \alpha_3$) meet at $\alpha = \alpha_2$, and furthermore assume that $\alpha_1 \leq \alpha_2 \leq \alpha_3$. If some other matching $M_2$ improves over (meaning: has lesser objective function value than) $M_1$ or $M_3$ anywhere in the interval $[\alpha_1, \alpha_3]$, it must improve over both at $\alpha = \alpha_2$, since it may intersect each of these cost curves at most once on this interval. If $M_1$ and $M_3$ are both optimal at their intersection point (meaning no such distinct $M_2$ exists), then we know that no other matching improves on either of them over the interval $[\alpha_1, \alpha_3]$, and we may therefore mark it as explored during the outermost loop (optimization over $\alpha$) of Algorithm [alg:linear_alg].

Together, the preceding properties verify that our algorithm will indeed find the joint optimum over all $\alpha$ and $P$ (for fixed $t = c$, for $\tilde{D}$, subject to subpermutation constraints on $P$): they allow us to find the entire set of subpermutation matrices $P$ which appear on the lower convex hull of distance as a function of $\alpha$.

Implementation Details

We implement Algorithms [alg:linear_alg] and [alg:exponential_alg] in the programming language Python (version 3.6.1) (Rossum 1995). Numerical arrays were stored using the numpy package (Van Der Walt, Colbert, and Varoquaux 2011). Our inner LAP solver was the package lapsolver (Heindl 2018). Univariate optimization over $t$ and $\alpha$ was performed with the ‘bounded’ method of the scipy.optimize package (Jones et al., n.d.), with bounds set at $[0, 10.0]$ for each variable and an input tolerance of $10^{-12}$. Laplacians were computed with the laplacian method from the package networkX (Hagberg, Swart, and S Chult 2008), and their eigenvalues were computed with scipy.linalg.eigh.
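For reference, a minimal end-to-end setup mirroring this stack (with scipy's `linear_sum_assignment` standing in for lapsolver, and with illustrative graph choices) looks like:

```python
import networkx as nx
import numpy as np
from scipy.linalg import eigh

# Two small test graphs; G1 must be the smaller graph (n1 <= n2).
G1, G2 = nx.path_graph(5), nx.cycle_graph(7)
L1 = nx.laplacian_matrix(G1).toarray().astype(float)
L2 = nx.laplacian_matrix(G2).toarray().astype(float)

# Eigenvalues via scipy.linalg.eigh, as in the text (returned in
# nondecreasing order).
lam1 = eigh(L1, eigvals_only=True)
lam2 = eigh(L2, eigvals_only=True)

# linear_gdd is the sketch given after Algorithm [alg:linear_alg].
print(linear_gdd(lam1, lam2))
```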

Because of numerical precision issues arising during eigenvalue computation, it can be difficult to determine, by comparing eigenvalues, when two matchings agree. In practice we sidestep this issue and treat two matchings as identical only if they associate the same indices of the two lists of eigenvalues. This means we may accumulate multiple equivalent representations of the same matching (up to multiplicity of eigenvalues) during our sweeps through $t$ and $\alpha$. We leave mitigating this inefficiency for future work.

Code for computing diffusion distance, both with our algorithm and with naive univariate optimization, may be found in the Supplementary Information associated with this paper, as well as in a maintained GitHub repository (Cory B. Scott 2021).