In previous chapters we have introduced and defined diffusion distance. A crucial component of the optimization required to calculate GDD is the optimization of $\alpha$, the parameter for conversion between coarse and fine time scales. Optimizing $\alpha$, in addition to optimizing the prolongation matrix $P$ under transitive constraints, is a nontrivial numerical problem that in our experience seems to require new methods. We develop such methods here.
In this section, we describe the algorithm used to calculate upper bounds on graph distances as the joint optima (over $P$, $\alpha$, and $t$) of the distance equations Equation [defn:exp_defn] and Equation [defn:linear_defn], under orthogonality constraints only, i.e. the case where the only constraint on $P$ is $P^T P = I$. At the core of both algorithms is a subroutine which solves the Linear Assignment Problem (LAP; see Equation [eqn:lap_defn]) repeatedly, in order to find the subpermutation matrix which is optimal at a particular value of $\alpha$. Namely, we are interested in calculating $\tilde{D}(G_1, G_2)$ as
$$\begin{aligned} \tilde{D}(G_1, G_2) &= \min_\alpha f(\alpha) \quad \text{where} \quad f(\alpha) = \inf_{P | P^T P = I} \left| \left| \frac{1}{\alpha} P L(G_1) - \alpha L(G_2) P \right| \right| \\ \intertext{which, for orthogonality or any other compact constraint} &= \min_{P | P^T P = I} \left| \left| \frac{1}{\alpha} P L(G_1) - \alpha L(G_2) P \right| \right|. \nonumber \end{aligned}$$
However, we have found that the unique structure of this optimization problem admits a specialized procedure which is faster and more accurate than nested univariate optimization of $\alpha$ and $t$ (where each innermost function evaluation consists of a full optimization over $P$ at some fixed $\alpha$ and $t$). We first briefly describe the algorithm used to find the optimal $P$ and $\alpha$ for the linear distance $\tilde{D}$. The formal description of the algorithm is given by Algorithm [alg:linear_alg]. In both cases, we reduce the computational complexity of the optimization over $P$ by imposing the additional constraint that $P$ must be a subpermutation matrix when rotated into the spectral basis (we define subpermutations in the proof of Theorem [thm:LAP_bound]). This constraint is compatible with the orthogonality constraint (all subpermutation matrices are orthogonal, but not vice versa). The tradeoff of this reduction of computational complexity is that we can only guarantee that our optima are upper bounds of the optima over all orthogonal $P$. However, in practice, this bound seems to be tight: over a large number of empirical evaluations we did not find any example where orthogonally-constrained optimization was able to improve in objective function value over optimization constrained to subpermutation matrices. Therefore, for the remainder of this paper we shall refer to the optima so calculated as distance values, when strictly speaking they are distance upper bounds. We also note here that a distance lower bound is also possible to calculate by relaxing the constraints on $P$ (for instance, by replacing the optimization over all subpermutation matrices with a less constrained matching problem; see Section 2.4).
[Figure [fig:distance_example_fig], panels (a) and (b).]
[alg:linear_alg]
1. Compute $\Lambda_1$, $\Lambda_2$ as the eigenvalues of $L(G_1)$ and $L(G_2)$.
2. Compute, by optimizing a linear assignment, $P_{\alpha_{\min}}$ and $P_{\alpha_{\max}}$ as the optimal matchings at $\alpha_{\min}$, $\alpha_{\max}$ respectively.
3. Initialize the list of optimal matchings as $\{P_{\alpha_{\min}}, P_{\alpha_{\max}}\}$.
4. Until the current list of matchings is not expanded in the following step, or the entire interval $[\alpha_{\min}, \alpha_{\max}]$ is marked as explored:
   (a) Attempt to expand the list of optimal matchings by solving a linear assignment problem at each $\alpha$ where the cost curves of two adjacent matchings (currently in the list) intersect.
   (b) If no better assignment exists at such an $\alpha$, then mark the interval covered by those two matchings as explored, as guaranteed by Theorem [thm:matching_agreement].
5. Return the lowest-cost matching $P$ and its optimal $\alpha$.
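To make the control flow concrete, the following is a minimal Python sketch of this search. It is written under several assumptions: the helper names (`lap_matching`, `curve_coeffs`, `linear_distance_upper_bound`) are ours and not from the released code, the $\alpha$ bracket defaults are placeholders, and `scipy.optimize.linear_sum_assignment` stands in for the LAP solver used in our actual implementation.

```python
import numpy as np
from scipy.optimize import linear_sum_assignment


def lap_matching(alpha, l1, l2):
    """Optimal subpermutation at a fixed alpha: match[j] is the index i of the
    eigenvalue of L(G_2) paired with lambda^(1)_j (cost from Eq. [eqn:cost_matrix_entry])."""
    cost = (l1[:, None] / alpha - alpha * l2[None, :]) ** 2   # shape (n1, n2), n1 <= n2
    rows, cols = linear_sum_assignment(cost)
    match = np.empty(l1.size, dtype=int)
    match[rows] = cols
    return match


def curve_coeffs(match, l1, l2):
    """Cost of a fixed matching, as a function of alpha: A*alpha^2 + B + C/alpha^2."""
    A = (l2[match] ** 2).sum()
    B = -2.0 * (l1 * l2[match]).sum()
    C = (l1 ** 2).sum()
    return A, B, C


def linear_distance_upper_bound(L1, L2, alpha_min=1e-2, alpha_max=1e2):
    """Sketch of Algorithm [alg:linear_alg]: grow a frontier of optimal matchings,
    probing new alphas only where the cost curves of adjacent matchings intersect."""
    l1, l2 = np.linalg.eigvalsh(L1), np.linalg.eigvalsh(L2)
    if l1.size > l2.size:                       # keep the smaller spectrum first
        l1, l2 = l2, l1
    frontier = {a: tuple(lap_matching(a, l1, l2)) for a in (alpha_min, alpha_max)}
    explored = set()                            # intervals known to need no further LAPs
    expanded = True
    while expanded:
        expanded = False
        alphas = sorted(frontier)
        for a_lo, a_hi in zip(alphas, alphas[1:]):
            m_lo, m_hi = frontier[a_lo], frontier[a_hi]
            if (a_lo, a_hi) in explored or m_lo == m_hi:
                continue
            A1, B1, _ = curve_coeffs(np.array(m_lo), l1, l2)
            A2, B2, _ = curve_coeffs(np.array(m_hi), l1, l2)
            ratio = (B2 - B1) / (A1 - A2) if A1 != A2 else -1.0
            if ratio <= 0 or not (a_lo < np.sqrt(ratio) < a_hi):
                explored.add((a_lo, a_hi))
                continue
            a_star = float(np.sqrt(ratio))
            m_new = tuple(lap_matching(a_star, l1, l2))
            if m_new in (m_lo, m_hi):           # Thm [thm:matching_agreement]: interval covered
                explored.add((a_lo, a_hi))
            else:
                frontier[a_star] = m_new
                expanded = True
    best = np.inf                               # minimise each surviving cost curve over alpha
    for m in set(frontier.values()):
        A, B, C = curve_coeffs(np.array(m), l1, l2)
        a_opt = np.clip((C / A) ** 0.25 if A > 0 else alpha_max, alpha_min, alpha_max)
        best = min(best, A * a_opt ** 2 + B + C / a_opt ** 2)
    return float(np.sqrt(max(best, 0.0)))
```

Each pass through the loop solves at most one new LAP per unexplored interval, which is the source of the savings discussed below.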
Joint optimization of the objective over $P$ and $\alpha$ is a nested optimization problem (see (Onoyama et al. 2000) and (Sinha, Malo, and Deb 2018) for a description of nested optimization), with potential combinatorial optimization over $P$ dependent on each choice of $\alpha$. Furthermore, the function $f(\alpha)$ is multimodal and continuous but, in general, has a discontinuous derivative (see Figure [fig:distance_example_fig]). Univariate optimization procedures such as Golden Section Search result in many loops of some procedure to optimize over $P$, which in our restricted case must each time compute a full solution to a LAP with $n_1 \times n_2$ weights. In our experience, this means that these univariate methods have a tendency to get stuck in local optima. We reduce the total number of calls to the LAP solver, as well as the size of the LAPs solved, by taking advantage of several unique properties of the optimization as a function of $\alpha$. When the optimal matchings $P_1$ and $P_2$ are known for $\alpha_1$ and $\alpha_2$, then for any $\alpha_3$ such that $\alpha_1 \leq \alpha_3 \leq \alpha_2$, the optimal matching at $\alpha_3$ must agree with $P_1$ and $P_2$ on every assignment on which they themselves agree (see Theorem [thm:matching_agreement]). Thus, the optimization over $P$ at $\alpha_3$ is already partially solved given the solutions at $\alpha_1$ and $\alpha_2$, and so we need only re-compute the remaining (smaller) subproblem on the set of assignments where $P_1$ and $P_2$ disagree. This has two consequences for our search over $\alpha$: First, the size of the LAP problems which must be solved at each step decreases over time (as we find $P$-optima for a denser and denser set of $\alpha$ values). Secondly, these theoretical guarantees mean that we can mark intervals of $\alpha$-values as being explored (meaning we have provably found the matchings which are optimal over the interval) and thus do not have to perform the relatively expensive optimization over $P$ for any $\alpha$ in that interval.
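As an illustration of the reduced subproblem, the sketch below (again with hypothetical helper names, and scipy in place of lapsolver) re-solves the assignment only over the indices on which the two known matchings disagree, carrying the agreed assignments over unchanged:

```python
import numpy as np
from scipy.optimize import linear_sum_assignment


def restricted_matching(alpha3, match1, match2, l1, l2):
    """Optimal matching at alpha3 between alpha1 and alpha2, given matchings
    match1, match2 optimal at alpha1, alpha2.  By Theorem [thm:matching_agreement]
    the assignments on which they agree can be fixed, leaving a smaller LAP."""
    match1, match2 = np.asarray(match1), np.asarray(match2)
    agree = match1 == match2
    match3 = np.where(agree, match1, -1)              # carry over the agreed assignments
    free_rows = np.flatnonzero(~agree)                # lambda^(1) indices still to assign
    taken = set(match1[agree].tolist())
    free_cols = np.array([i for i in range(l2.size) if i not in taken], dtype=int)
    if free_rows.size:
        cost = (l1[free_rows][:, None] / alpha3 - alpha3 * l2[free_cols][None, :]) ** 2
        rows, cols = linear_sum_assignment(cost)      # smaller sub-problem only
        match3[free_rows[rows]] = free_cols[cols]
    return match3
```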
1. Compute $\Lambda_1$, $\Lambda_2$ as the eigenvalues of $L(G_1)$ and $L(G_2)$.
2. Solve the linear version of the problem using Algorithm [alg:linear_alg], obtaining an optimal matching and $\alpha$. According to the argument presented in the definition of linear distance (Equation [defn:linear_defn]), this solution holds for very small $t$. Keep the entire frontier of matchings found during the execution of Algorithm [alg:linear_alg].
3. Set $t$ to a small initial value $\Delta t$.
4. Until the objective value, as a function of $t$, passes its extremum:
   (a) Use the linear algorithm, with the matrix exponentials of the two Laplacians at the current value of $t$ as the input matrices, initializing the list of matchings with those found at the previous value of $t$.
   (b) Set $t = t + \Delta t$, using the optima from the previous step as the starting points for the optimization over $\alpha$.
5. Return the extremal distance value found, along with its optimal $P$ and $\alpha$.
[alg:exponential_alg]
Many of the theoretical guarantees underlying our algorithm for computing the linear distance no longer hold for the exponential version of the distance. We adapt our linear-version procedure into an algorithm for computing this version, with the caveat that the lack of these guarantees means that our upper bound on the exponential version may be looser than that on the linear version. It is still clearly an upper bound, since the $P$ and $\alpha$ found by this procedure satisfy the given constraints ($P$ is a subpermutation matrix and $\alpha > 0$). In particular, we have observed cases where the exponential-distance analog of Theorem [thm:matching_agreement] would not hold, meaning we cannot rule out $\alpha$-intervals as we can in the linear version. Thus, this upper bound may be looser than that computed for the linear objective function.
For the exponential version of the algorithm, we first compute the list of optimal matchings for the linear version, assuming (since the exponential objective reduces to the linear one for very small $t$) that this is also the list of optimal matchings for the exponential version of the objective function at some low $t$. We proceed to increment $t$ with some step size $\Delta t$, in the manner of a continuation method (Allgower and Georg 2012). At each new value of $t$, we search for new optimal matchings along the currently known frontier of optima as a function of $\alpha$. When a new optimal matching is found at the intersection of the cost curves of two known matchings, it is inserted into the list, which is kept in order of increasing $\alpha$. For each matching in this frontier, we find the optimal $\alpha$, keeping $P$ and $t$ constant. Assuming the objective is unimodal as a function of $t$, we increase $t$ until this extremum is passed, storing all $P$ matrices found as optima at each value of $t$. Matchings which were on the lower convex hull at some prior value of $t$ but not the current value are retained, as they may regain optimality for some $\alpha$-range at a future value of $t$ (we have observed this in practice). For this list of matchings, we then compute the resulting distance value. Since the exponential map is continuous, and we are incrementing $t$ by very small steps, we also propose the further computational shortcut of storing the list of optimal $\alpha$ values at time $t$ to use as starting points for the optimization at $t + \Delta t$. In practice, this made little difference in the runtime of our optimization procedure.
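The sketch below shows the continuation loop in simplified form. It is hedged in several ways: the per-pair cost `exp_pair_cost` is an assumed form, chosen only so that its derivative at $t = 0$ recovers the linear cost; the stopping rule is a fixed $t$ range rather than the unimodality test described above; the step size and bounds are placeholders; and the helper names are ours, with scipy standing in for lapsolver.

```python
import numpy as np
from scipy.optimize import linear_sum_assignment, minimize_scalar


def exp_pair_cost(alpha, t, l1, l2):
    """Assumed per-pair cost for the exponential objective (chosen so that its
    derivative at t = 0 recovers the linear cost of Eq. [eqn:cost_matrix_entry])."""
    return (np.exp(t * l1[:, None] / alpha) - np.exp(alpha * t * l2[None, :])) ** 2


def matching_cost(match, alpha, t, l1, l2):
    """Exponential-version cost of one matching at fixed alpha and t."""
    return float(((np.exp(t * l1 / alpha) - np.exp(alpha * t * l2[match])) ** 2).sum())


def continuation_bound(l1, l2, frontier, t_step=1e-2, t_max=1.0, bounds=(0.1, 10.0)):
    """Continuation over t, warm-started from `frontier`: a set of matchings
    (index tuples) produced by the linear algorithm at very small t."""
    per_t_optima = []
    t = t_step
    while t <= t_max:
        values = []
        for match in list(frontier):
            m = np.asarray(match)
            res = minimize_scalar(lambda a: matching_cost(m, a, t, l1, l2),
                                  bounds=bounds, method='bounded')
            # Probe the full LAP at this (alpha, t).  New optima join the frontier;
            # old matchings are retained, since they may regain optimality later.
            rows, cols = linear_sum_assignment(exp_pair_cost(res.x, t, l1, l2))
            m_new = np.empty(l1.size, dtype=int)
            m_new[rows] = cols
            frontier.add(tuple(m_new))
            values.append(min(res.fun, matching_cost(m_new, res.x, t, l1, l2)))
        per_t_optima.append(min(values))
        t += t_step
    return per_t_optima   # the reported bound is the extremum of these per-t optima
```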
[thm:LAP_bound] For any two graphs $G_1$ and $G_2$ with Laplacians $L(G_1)$ and $L(G_2)$, and for fixed $\alpha$, the optimization over $P$ given in the innermost loop of Equation [defn:linear_defn] is upper bounded by a Linear Assignment Problem as defined in Equation [eqn:lap_defn]. This LAP is given by taking $\lambda^{(1)}_1, \ldots, \lambda^{(1)}_{n_1}$ to be the eigenvalues of $L(G_1)$ and $\lambda^{(2)}_1, \ldots, \lambda^{(2)}_{n_2}$ to be the eigenvalues of $L(G_2)$, with the cost of a pair $\left( \lambda^{(2)}_i, \lambda^{(1)}_j \right)$ (equivalently, one entry of the cost matrix $C$) given by $$c_{ij} = {\left( \frac{1}{\alpha} \lambda^{(1)}_j - \alpha \lambda^{(2)}_i \right)}^2. \label{eqn:cost_matrix_entry}$$
$L(G_1)$ and $L(G_2)$ are both real symmetric matrices, so they may be diagonalized as $L(G_i) = U_i \Lambda_i U_i^T$, where the $U_i$ are rotation matrices, and the $\Lambda_i$ are diagonal matrices with the eigenvalues along the diagonal. Because the Frobenius norm is invariant under rotation, we have: $$\begin{aligned} \tilde{D}^2(G_1,G_2) &= \inf_{\alpha > 0} \inf_{P^T P = I} {\left| \left| \frac{1}{\alpha} P L(G_1) - \alpha L(G_2) P \right| \right|}^2_F \nonumber \\ &= \inf_{\alpha > 0} \inf_{P^T P = I} {\left| \left| \frac{1}{\alpha} U_2^T P L(G_1) U_1 - \alpha U_2^T L(G_2) P U_1 \right| \right|}^2_F \nonumber \\ &= \inf_{\alpha > 0} \inf_{P^T P = I} {\left| \left| \frac{1}{\alpha} U_2^T P U_1 \Lambda_1 U_1^T U_1 - \alpha U_2^T U_2 \Lambda_2 U_2^T P U_1 \right| \right|}^2_F \label{eqn:diagonal} \\ &= \inf_{\alpha > 0} \inf_{P^T P = I} {\left| \left| \frac{1}{\alpha} U_2^T P U_1 \Lambda_1 - \alpha \Lambda_2 U_2^T P U_1 \right| \right|}^2_F. \nonumber \\ \intertext{Because the $U_i$ are orthogonal, the transformation $ \tilde{P} = U_2^T P U_1$ preserves orthogonality, so} \tilde{D}^2(G_1,G_2) &= \inf_{\alpha > 0} \inf_{P^T P = I} {\left| \left| \frac{1}{\alpha} P \Lambda_1 - \alpha \Lambda_2 P \right| \right|}^2_F \nonumber \\ &= \inf_{\alpha > 0} \inf_{P^T P = I} {\left| \left| \frac{1}{\alpha} \Lambda_1 \right| \right|}^2_F + {\left| \left| \alpha \Lambda_2 P \right| \right|}^2_F - 2 \text{Tr} \left[ P^T \Lambda_2 P \Lambda_1 \right] \nonumber \\ &= \inf_{\alpha > 0} \inf_{P^T P = I} \left( \text{Tr} \left[ \frac{1}{\alpha^2} \Lambda_1^2 \right] + \text{Tr} \left[ \alpha^2 P^T \Lambda_2^2 P \right] - 2 \text{Tr} \left[ P^T \Lambda_2 P \Lambda_1 \right] \right) \nonumber \\ \intertext{writing $P = [p_{ij}]$,} \tilde{D}^2(G_1,G_2) &= \inf_{\alpha > 0} \inf_{P^T P = I} \left( \frac{1}{\alpha^2} \sum\limits_{j=1}^{n_1} {\lambda^{(1)}_j}^2 + \alpha^2 \sum\limits_{i=1}^{n_2}\sum\limits_{j=1}^{n_1} p_{ij}^2 {\lambda^{(2)}_i}^2 \right. \label{eqn:f_terms} \\* &\left. \qquad \qquad \qquad \qquad \qquad - 2 \sum\limits_{i=1}^{n_2}\sum\limits_{j=1}^{n_1} p_{ij}^2 {\lambda^{(2)}_i} {\lambda^{(1)}_j} \right) \nonumber \\* &= \inf_{\alpha > 0} \inf_{P^T P = I} \left( \sum\limits_{i=1}^{n_2}\sum\limits_{j=1}^{n_1} p_{ij}^2 \left( \frac{1}{\alpha^2} {\lambda^{(1)}_j}^2 - 2 {\lambda^{(2)}_i} {\lambda^{(1)}_j} + \alpha^2 {\lambda^{(2)}_i}^2 \right) \right) \nonumber \\ &= \inf_{\alpha > 0} \inf_{P^T P = I} \left( \sum\limits_{i=1}^{n_2}\sum\limits_{j=1}^{n_1} p_{ij}^2 {\left( \frac{1}{\alpha} {\lambda^{(1)}_j} - \alpha {\lambda^{(2)}_i} \right)}^2 \right) \label{eqn:simple_d_tilde} \end{aligned}$$ For any given $\alpha$, we therefore have $$\inf_{P | P^T P = I} \sum\limits_{i=1}^{n_2}\sum\limits_{j=1}^{n_1} p_{ij}^2 {\left( \frac{1}{\alpha} {\lambda^{(1)}_j} - \alpha {\lambda^{(2)}_i} \right)}^2 \leq \inf_{P | \mathcal{C}(P)} \sum\limits_{i=1}^{n_2}\sum\limits_{j=1}^{n_1} p_{ij}^2 {\left( \frac{1}{\alpha} {\lambda^{(1)}_j} - \alpha {\lambda^{(2)}_i} \right)}^2,$$
where $\mathcal{C}(P)$ could be any other condition more strict than the constraint $P^T P = I$. Here we take this stricter constraint to be the condition that $P$ is a subpermutation matrix: an orthogonal matrix whose entries are all 0 or 1 (i.e. for which $P^T P = I$ and $p_{ij} \in \{0, 1\}$). Equivalently, a subpermutation matrix is a $\{0,1\}$-valued matrix such that for each $j$, exactly one $p_{ij}$ takes the value 1 rather than 0 (so $\sum_i p_{ij} = 1$), and for each $i$, either zero or one $p_{ij}$ takes the value 1 rather than 0 (so $\sum_j p_{ij} \leq 1$).
Furthermore, this restricted optimization is exactly a linear assignment problem matching the eigenvalues of $L(G_1)$ to those of $L(G_2)$, with the cost of a pair $\left( \lambda^{(2)}_i, \lambda^{(1)}_j \right)$ given by Equation [eqn:cost_matrix_entry].
Note also that the same argument applies to the innermost two optimizations in the calculation of the exponential version of the diffusion distance (Equation [defn:exp_defn]) as well as the linear version. In the exponential case the entries of the cost matrix are instead given by $$c_{ij} = {\left( e^{\frac{t}{\alpha} \lambda^{(1)}_j} - e^{\alpha t \lambda^{(2)}_i} \right)}^2.$$ If we instead loosen the constraints on $P$, we can calculate a lower bound on the distance. See Section 2.4 for lower bound details.
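As a quick numerical check of Equation [eqn:simple_d_tilde] (using synthetic nonnegative spectra rather than any particular graphs, and variable names of our own choosing), the Frobenius objective evaluated at a subpermutation matrix coincides with the corresponding sum of per-pair LAP costs:

```python
import numpy as np

rng = np.random.default_rng(0)
n1, n2, alpha = 4, 6, 0.7

# synthetic nonnegative "spectra" standing in for the two Laplacians' eigenvalues
l1 = np.sort(rng.uniform(0.0, 2.0, n1))
l2 = np.sort(rng.uniform(0.0, 2.0, n2))
Lam1, Lam2 = np.diag(l1), np.diag(l2)

# a subpermutation matrix P (n2 x n1): one 1 per column, at most one 1 per row
cols = rng.choice(n2, size=n1, replace=False)
P = np.zeros((n2, n1))
P[cols, np.arange(n1)] = 1.0
assert np.allclose(P.T @ P, np.eye(n1))          # orthogonality constraint holds

frobenius = np.linalg.norm(P @ Lam1 / alpha - alpha * Lam2 @ P, 'fro') ** 2
per_pair = ((l1 / alpha - alpha * l2[cols]) ** 2).sum()   # Eq. [eqn:simple_d_tilde]
assert np.isclose(frobenius, per_pair)
```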
Recall that our definition of a ‘matching’ in Section 1.4.2 was a matrix representing a particular solution to the linear assignment problem with costs given as in Equation [eqn:cost_matrix_entry]. For given $G_1$, $G_2$, and some matching $P$ (taken in the spectral basis defined by the $U_1$, $U_2$ which diagonalize $L(G_1)$ and $L(G_2)$ as in Equation [eqn:diagonal]), let $$f_P(\alpha) = \sum\limits_{i=1}^{n_2}\sum\limits_{j=1}^{n_1} p_{ij}^2 {\left( \frac{1}{\alpha} \lambda^{(1)}_j - \alpha \lambda^{(2)}_i \right)}^2 \label{eqn:f_alpha_defn}$$ denote the cost of $P$ as a function of $\alpha$.
[lem:unique_soln] For two distinct matchings $P_1$ and $P_2$ (for the same pair of graphs) the equation $f_{P_1}(\alpha) = f_{P_2}(\alpha)$ has at most one real positive solution in $\alpha$. This follows from the fact that when the matching and the two spectra are fixed, the objective function is a rational function in $\alpha$ (see Equation [eqn:f_terms]), with a numerator quadratic in $\alpha^2$ and an asymptote at $\alpha = 0$.
By Equation [eqn:f_terms], setting $f_{P_1}(\alpha) = f_{P_2}(\alpha)$ and cancelling the $\frac{1}{\alpha^2}$ terms (which are identical for any subpermutation matrix) gives $$\begin{aligned} \alpha^2 \sum\limits_{i,j} {p^{(1)}_{ij}}^2 {\lambda^{(2)}_i}^2 - 2 \sum\limits_{i,j} {p^{(1)}_{ij}}^2 \lambda^{(2)}_i \lambda^{(1)}_j &= \alpha^2 \sum\limits_{i,j} {p^{(2)}_{ij}}^2 {\lambda^{(2)}_i}^2 - 2 \sum\limits_{i,j} {p^{(2)}_{ij}}^2 \lambda^{(2)}_i \lambda^{(1)}_j. \\ \intertext{Abbreviating the sums as $A_k = \sum_{i,j} {p^{(k)}_{ij}}^2 {\lambda^{(2)}_i}^2$ and $C_k = 2 \sum_{i,j} {p^{(k)}_{ij}}^2 \lambda^{(2)}_i \lambda^{(1)}_j$, we have} \alpha^2 \left(A_1 - A_2\right) + \left(C_2 - C_1\right) &= 0 \\ \intertext{and so} \alpha &= \pm \sqrt{\frac{C_1 - C_2}{A_1 - A_2}}. \end{aligned}$$ Since the eigenvalues, and hence $A_1$, $A_2$, $C_1$, $C_2$, are all nonnegative reals, at most one of these roots is positive.
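In code, the crossing point of two matchings' cost curves follows directly from these abbreviations. This is a small self-contained sketch; the function names and the toy spectra are ours.

```python
import numpy as np

def curve_abbrevs(match, l1, l2):
    """A_k and C_k from the proof above, for a matching given as match[j] = i."""
    A = (l2[match] ** 2).sum()
    C = 2.0 * (l1 * l2[match]).sum()
    return A, C

def crossing_alpha(match1, match2, l1, l2):
    """Unique positive alpha (if any) at which the two cost curves meet
    (Lemma [lem:unique_soln]); the 1/alpha^2 terms are identical and cancel."""
    A1, C1 = curve_abbrevs(np.asarray(match1), l1, l2)
    A2, C2 = curve_abbrevs(np.asarray(match2), l1, l2)
    if A1 == A2:
        return None                    # the curves never cross at a single point
    ratio = (C1 - C2) / (A1 - A2)
    return float(np.sqrt(ratio)) if ratio > 0 else None

# toy spectra: the two matchings differ only in whether lambda^(1)_3 is paired
# with lambda^(2)_3 or lambda^(2)_4; their cost curves cross near alpha ~ 0.94
l1 = np.array([0.0, 1.0, 2.0])
l2 = np.array([0.0, 0.5, 1.5, 3.0])
print(crossing_alpha([0, 1, 2], [0, 1, 3], l1, l2))
```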
We will say that a matching $P$ “assigns” $\lambda^{(1)}_j$ to $\lambda^{(2)}_i$ if and only if $p_{ij} = 1$.
[thm:matching_agreement] If two matchings $P_1$ and $P_2$ which yield optimal upper bounds for the linear distance (at $\alpha_1$ and $\alpha_2$ respectively, with $\alpha_1 < \alpha_2$) agree on a set of assignments, then the optimal matching at any $\alpha_3$ with $\alpha_1 \leq \alpha_3 \leq \alpha_2$ must also agree with that set of assignments.
We need the following lemmas:
[lem:mono_incr] If an optimal matching assigns $\lambda^{(1)}_j$ to $\lambda^{(2)}_{s(j)}$ for each $j$ (so that eigenvalue $j$ of $L(G_1)$ is paired with eigenvalue $s(j)$ of $L(G_2)$ in the sum of costs given by Equation [eqn:cost_matrix_entry]), then the sequence $s(1), s(2), \ldots, s(n_1)$ is monotonic increasing.
This follows from the fact that the two sequences of eigenvalues are monotonic nondecreasing, so if there is a ‘crossing’ ($j_1 < j_2$ but $s(j_1) > s(j_2)$) then the new matching obtained by uncrossing those two pairs (performing a 2-opt step as defined in (Croes 1958)) has strictly lesser objective function value. Hence an optimal matching can’t contain any such crossings.
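Spelled out for a single crossing (this worked inequality is ours, added for completeness): with $a \leq b$ standing for two values of $\frac{1}{\alpha}\lambda^{(1)}$ and $x \leq y$ for the two values of $\alpha\lambda^{(2)}$ they could be paired with,
$$\left[ (a - y)^2 + (b - x)^2 \right] - \left[ (a - x)^2 + (b - y)^2 \right] = 2 (b - a)(y - x) \;\geq\; 0,$$
so the uncrossed pairing costs no more than the crossed one, and costs strictly less whenever the values involved are distinct.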
For all positive real $\alpha_1 < \alpha_2$, let $P_1$ be an optimal matching at $\alpha_1$ and $P_2$ be optimal at $\alpha_2$. For $i \in \{1, \ldots, n_1\}$, let $s_1(i)$ and $s_2(i)$ be the indices of the eigenvalues of $L(G_2)$ paired with $\lambda^{(1)}_i$ in $P_1$ and $P_2$, respectively. Then for all $i$, $s_2(i) \leq s_1(i)$.
Define a “run” for $P_1$ and $P_2$ as a maximal sequence of consecutive indices $S = \{i, i+1, \ldots, i+k\}$ in $\{1, \ldots, n_1\}$ such that for any $i' \in S$: $s_1(i') \neq s_2(i')$. The following must be true about a “run”:
Within a run, either $s_1(i) < s_2(i)$ for all $i$, or $s_1(i) > s_2(i)$ for all $i$. Otherwise, we have one or more crossings (as in Lemma [lem:mono_incr]): for some $i < j$ in the run we have $s_1(i) < s_2(i)$ but $s_1(j) > s_2(j)$, or $s_1(i) > s_2(i)$ but $s_1(j) < s_2(j)$. Any crossing may be uncrossed for a strictly lower objective function value, violating optimality of $P_1$ or $P_2$.
The disagreements between any pair of matchings as defined above decompose into a sequence of runs, where we allow a run to be trivial, i.e. consist of a single index.
Next, we show that within a run, we must have $s_2(i) < s_1(i)$ for all $i$. Let $S$ be a run, and write $\alpha_1 = \alpha^* - \epsilon$ and $\alpha_2 = \alpha^* + \epsilon$ for some $\alpha^* > \epsilon > 0$. By optimality of $P_1$, $P_2$ at $\alpha^* - \epsilon$ and $\alpha^* + \epsilon$ respectively, we have:
$$\begin{aligned} \label{eqn:dist_comp_sum} \sum_{i \in S} {\left(\frac{1}{\alpha^* - \epsilon}\lambda^{(1)}_i - (\alpha^*-\epsilon)\lambda^{(2)}_{s_1(i)} \right)}^2 < \sum_{i\in S} {\left( \frac{1}{\alpha^* - \epsilon}\lambda^{(1)}_i - (\alpha^*-\epsilon)\lambda^{(2)}_{s_2(i)} \right)}^2 \\ \intertext{and} \sum_{i \in S} {\left(\frac{1}{\alpha^* + \epsilon}\lambda^{(1)}_i - (\alpha^*+\epsilon)\lambda^{(2)}_{s_2(i)}\right)}^2 < \sum_{i \in S} {\left(\frac{1}{\alpha^* + \epsilon}\lambda^{(1)}_i - (\alpha^*+\epsilon)\lambda^{(2)}_{s_1(i)}\right)}^2. \end{aligned}$$ Respectively, these simplify to $$\begin{aligned} - \sum_{i \in S}\left( \lambda^{(2)}_{s_1(i)} - \lambda^{(2)}_{s_2(i)} \right) \left( -2 \lambda^{(1)}_{i} + {(\alpha^* - \epsilon)}^2 \left( \lambda^{(2)}_{s_1(i)} + \lambda^{(2)}_{s_2(i)} \right) \right) > 0 \\ \intertext{and} \sum_{i \in S} \left( \lambda^{(2)}_{s_1(i)} - \lambda^{(2)}_{s_2(i)} \right) \left( -2 \lambda^{(1)}_{i} + {(\alpha^* + \epsilon)}^2 \left( \lambda^{(2)}_{s_1(i)} + \lambda^{(2)}_{s_2(i)} \right) \right) > 0. \end{aligned}$$ Summing these two inequalities, the terms in $\lambda^{(1)}_i$ cancel, and dividing by ${(\alpha^* + \epsilon)}^2 - {(\alpha^* - \epsilon)}^2 = 4 \alpha^* \epsilon > 0$ gives us $$\sum_{i \in S} \left( {\lambda^{(2)}_{s_1(i)}}^2 - {\lambda^{(2)}_{s_2(i)}}^2 \right) > 0.$$
However, since the $\lambda^{(2)}_i$ are monotonic nondecreasing, this means we cannot also have $s_1(i) < s_2(i)$ for all $i \in S$, since that would imply $$\sum_{i \in S} \left( {\lambda^{(2)}_{s_1(i)}}^2 - {\lambda^{(2)}_{s_2(i)}}^2 \right) \leq 0.$$
Therefore, in a run of arbitrary length, all indices must move ‘forward’ together (meaning that $s_2(i) < s_1(i)$ for every $i$ in the run, i.e. every assignment in the run moves toward smaller eigenvalues of $L(G_2)$ as $\alpha$ increases), and so (since any pair of matchings optimal at such $\alpha_1 < \alpha_2$ defines a set of runs) we must have $s_2(i) \leq s_1(i)$ for all $i$. This completes the proof of the lemma.
Thus, for three matchings $P_1$, $P_3$, $P_2$ which are optimal at a sequence of values $\alpha_1 < \alpha_3 < \alpha_2$, we must have $s_2(i) \leq s_3(i) \leq s_1(i)$ for all $i$. In particular, if $s_1(i) = s_2(i)$, we must also have $s_3(i) = s_1(i) = s_2(i)$.
If two matchings $P_1$ and $P_2$ yield optimal upper bounds for the linear distance at $\alpha_1$ and $\alpha_2$ respectively, and $f_{P_1}(\alpha^*) = f_{P_2}(\alpha^*)$ for some $\alpha^*$ s.t. $\alpha_1 < \alpha^* < \alpha_2$, then either (1) $P_1$ and $P_2$ are optimal over the entire interval $[\alpha_1, \alpha_2]$, or (2) some other matching improves over both $P_1$ and $P_2$ at $\alpha^*$.
This follows directly from the fact that $f_{P_1}(\alpha)$ and $f_{P_2}(\alpha)$ (as defined in Equation [eqn:f_alpha_defn]) can only meet at one real positive value of $\alpha$ (Lemma [lem:unique_soln]). Say that the cost curves for $P_1$ (known to be optimal at $\alpha_1$) and $P_2$ (optimal at $\alpha_2$) meet at $\alpha^*$, and furthermore assume that $\alpha_1 < \alpha^* < \alpha_2$. If some other matching $P_3$ improves over $P_1$ (meaning, has lesser objective function value as a function of $\alpha$) or $P_2$ anywhere in the interval $[\alpha_1, \alpha_2]$, it must improve over both at $\alpha^*$, since it may intersect each of these cost curves at most once on this interval. If $P_1$ and $P_2$ are both optimal at their intersection point (meaning no such distinct $P_3$ exists) then we know that no other matching improves on either of them over the interval $[\alpha_1, \alpha_2]$, and we may therefore mark it as explored during the outermost loop (optimization over $\alpha$) of Algorithm [alg:linear_alg].
Together, the preceding properties verify that our algorithm will indeed find the joint optimum over all $P$ and $\alpha$ (for fixed graphs $G_1$ and $G_2$, for the linear distance $\tilde{D}$, subject to subpermutation constraints on $P$): it allows us to find the entire set of subpermutation matrices which appear on the lower convex hull of distance as a function of $\alpha$.
We implement Algorithms [alg:linear_alg] and [alg:exponential_alg] in the programming language Python (version 3.6.1) (Rossum 1995). Numerical arrays were stored using the numpy package (Van Der Walt, Colbert, and Varoquaux 2011). Our inner LAP solver was the package lapsolver (Heindl 2018). Univariate optimization over $\alpha$ and $t$ was performed with the ‘bounded’ method of the scipy.optimize package (Jones et al., n.d.), with fixed bounds for each variable and a fixed input tolerance. Laplacians were computed with the laplacian method from the package networkX (Hagberg, Swart, and S Chult 2008), and their eigenvalues were computed with scipy.linalg.eigh.
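For comparison, the naive nested univariate baseline referred to elsewhere in this paper might look like the following sketch. The bounds and tolerance here are placeholders rather than the values used in our experiments, the function names are ours, and scipy's linear_sum_assignment stands in for lapsolver.

```python
import numpy as np
from scipy.optimize import linear_sum_assignment, minimize_scalar


def linear_objective(alpha, l1, l2):
    """Inner optimization over P: a full LAP at a single value of alpha."""
    cost = (l1[:, None] / alpha - alpha * l2[None, :]) ** 2
    rows, cols = linear_sum_assignment(cost)
    return cost[rows, cols].sum()


def naive_linear_bound(L1, L2, alpha_bounds=(1e-2, 1e2)):
    """Baseline: one bounded univariate search over alpha, paying for a full LAP
    at every function evaluation (the approach Algorithm [alg:linear_alg] avoids)."""
    l1, l2 = np.linalg.eigvalsh(L1), np.linalg.eigvalsh(L2)
    if l1.size > l2.size:
        l1, l2 = l2, l1
    res = minimize_scalar(lambda a: linear_objective(a, l1, l2),
                          bounds=alpha_bounds, method='bounded',
                          options={'xatol': 1e-8})
    return float(np.sqrt(res.fun)), float(res.x)
```

As noted above, this baseline can stall in local optima of the multimodal $f(\alpha)$, which is what motivated Algorithm [alg:linear_alg].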
Because of numerical precision issues arising during eigenvalue computation, it can be difficult to determine, using eigenvalue comparison, when two matchings agree. In practice we ignore this issue and assume that two matchings are identical only if they associate the same indices of the two lists of eigenvalues. This means we may be accumulating multiple equivalent representations of the same matching (up to multiplicity of eigenvalues) during our sweeps through $\alpha$ and $t$. We leave mitigating this inefficiency for future work.
Code for computing diffusion distance, both with our algorithm and with naive univariate optimization, may be found in the Supplementary Information associated with this paper, as well as a maintained GitHub repository (Cory B. Scott 2021).