Diffusion Distance

In this chapter we provide the definition of Graph Diffusion Distance, as well as motivation for why the optimization over $t$ is an essential component of the GDD calculation. We also briefly introduce some variants of GDD which will be covered in more detail in Chapter 3. The diffusion distance calculations presented throughout this thesis depend on an upper bound of the innermost optimization over $P$ and $\alpha$; in Section 2.4 we define a lower bound on the same optimization. This lower bound will be useful in some of the GDD property proofs in Chapter 3.

Diffusion Distance Definition

We generalize the diffusion distance defined by Hammond et al. (Hammond, Gur, and Johnson 2013). This distortion measure between two undirected graphs $G_1$ and $G_2$ of the same size was defined as: $$\begin{aligned} D_{\text{Hammond}} (G_1, G_2) = \sup_{t} { \left| \left| e^{t L_1} - e^{t L_2} \right| \right| }_F^2 \end{aligned}$$ where $L_i$ represents the graph Laplacian of $G_i$. A typical example of this distance, as $t$ is varied, can be seen in Figure [fig:hammond_distance].

Plot of graph distance between two graphs of the same size (a $4 \times 4$ grid and a path graph on 16 vertices), following the definition of Hammond et al. (Hammond, Gur, and Johnson 2013), as the parameter $t$ is varied. Note 1) there is a single global maximum of $1.592$ at $t = 0.891$, and 2) the similarity to the plot of our distance measure as a function of $t$ given in Figure [fig:distance_example_fig](a).


A plot illustrating unimodality of diffusion distance. $D^2$ was calculated between two grid graphs $\text{Sq}_7$ and $\text{Sq}_8$ of size $7 \times 7$ and $8 \times 8$, respectively. The distance is given by the formula $D^2 \left(\left.\text{Sq}_7, \text{Sq}_8 \right| t \right) = \inf_{\alpha > 0} \inf_{P | \mathcal{C}(P)} {\left|\left| P e^{\frac{t}{\alpha} L(\text{Sq}_7)} - e^{t \alpha L(\text{Sq}_8)} P\right| \right|}_F^2$ as a function of $t$. The peak, at $t \approx 0.318$, yields the distance $D^2 \left(\text{Sq}_7, \text{Sq}_8 \right)$.


This may be interpreted as measuring the maximum divergence, as a function of $t$, between diffusion processes starting from each vertex of each graph, as measured by the squared Euclidean distance between the column vectors of $e^{t L_i}$. Each column $v_j$ of $e^{t L_i}$ (which is called the Laplacian exponential kernel) describes a probability distribution of visits (by a random walk of duration $t$, with node transition probabilities given by the columns of $e^{L}$) to the vertices of $G_i$, starting at vertex $j$. This distance metric thus measures the difference between the two graphs by comparing these probability distributions; the motivation behind taking the supremum over all $t$ is that the value of the objective function at the maximum is the most these two distributions can diverge. See Figure [fig:typical_ham_dist] for an example of a distance calculation, with a characteristic peak.
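To make the role of the peak concrete, a curve of this kind can be reproduced numerically. The sketch below (helper names are our own; we assume the sign convention $L = A - D$, so that $e^{tL}$ is a column-stochastic diffusion operator) scans $t$ for the grid-vs-path pair from the figure:

```python
import numpy as np
from scipy.linalg import expm

def laplacian_from_edges(n, edges):
    """Graph Laplacian with the sign convention L = A - D,
    so that e^{tL} is a column-stochastic diffusion operator."""
    A = np.zeros((n, n))
    for i, j in edges:
        A[i, j] = A[j, i] = 1.0
    return A - np.diag(A.sum(axis=0))

def hammond_distance_at_t(L1, L2, t):
    """Squared Frobenius divergence between the two Laplacian
    exponential kernels at diffusion time t."""
    return np.linalg.norm(expm(t * L1) - expm(t * L2), 'fro') ** 2

def grid_edges(rows, cols):
    idx = lambda i, j: i * cols + j
    edges = []
    for i in range(rows):
        for j in range(cols):
            if i + 1 < rows:
                edges.append((idx(i, j), idx(i + 1, j)))
            if j + 1 < cols:
                edges.append((idx(i, j), idx(i, j + 1)))
    return edges

L_grid = laplacian_from_edges(16, grid_edges(4, 4))                  # 4x4 grid
L_path = laplacian_from_edges(16, [(k, k + 1) for k in range(15)])   # path on 16 vertices

ts = np.linspace(0.01, 5.0, 200)
vals = [hammond_distance_at_t(L_grid, L_path, t) for t in ts]
t_star = float(ts[int(np.argmax(vals))])   # location of the characteristic peak
```

The resulting curve vanishes at both ends of the $t$ range and attains an interior maximum, matching the qualitative shape described above.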

For further intuition about why the peak is the most natural place to take as the distance, rather than some other arbitrary time, note that at very early times and very late times the probability distribution of vertex visits is agnostic to graph structure: at early times no diffusion has had a chance to take place, while at very late times the distribution of vertex visits converges to the stationary state1 for each connected component of the graph. Hence we are most interested in the regime of $t$-values between these extremes, where differences in $G_1$ and $G_2$ are apparent in their differing probability distributions.
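The two limiting regimes can be verified directly; a minimal sketch, again assuming the $L = A - D$ sign convention, on a small connected path graph:

```python
import numpy as np
from scipy.linalg import expm

# Path graph on 5 vertices; with L = A - D, e^{tL} is symmetric and
# doubly stochastic, so its late-time limit is the uniform distribution.
n = 5
A = np.diag(np.ones(n - 1), 1) + np.diag(np.ones(n - 1), -1)
L = A - np.diag(A.sum(axis=0))

early = expm(1e-6 * L)    # nearly the identity: no diffusion yet
late = expm(100.0 * L)    # columns near the uniform stationary state

assert np.allclose(early, np.eye(n), atol=1e-4)
assert np.allclose(late, np.full((n, n), 1.0 / n), atol=1e-6)
```

In both regimes the kernel carries essentially no information about the graph's structure, which is why the intermediate-$t$ regime is the informative one.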

Our contribution generalizes this measure to allow for graphs of differing size. We add two variables to this optimization: a prolongation operator, $P$ (represented as a rectangular matrix), and a time-scaling factor, $\alpha$. The dissimilarity between two graphs $G_1$ and $G_2$ (with Laplacians $L_i = L(G_i)$) is then defined as: $$\begin{aligned} \label{defn:exp_defn} D^2(G_1,G_2) &= \sup_{t>0} \inf_{\alpha > 0} \inf_{P | \mathcal{C}(P)} {\left| \left| P e^{\frac{t}{\alpha} L_1} - e^{\alpha t L_2} P \right| \right|}^2_F \end{aligned}$$ where $\mathcal{C}(P)$ represents some set of constraints on the matrix $P$. For the remainder of this work we use $D(G_1,G_2)$ to refer to the distance and $D^2(G_1,G_2)$ to refer to the squared distance; this notation is chosen to simplify the exposition of some proofs. It will be convenient for later calculations to introduce and assume the concept of transitive constraints, by which we mean that for any constraint $\mathcal{C}$, satisfaction of $\mathcal{C}$ by $P_1$ and $P_2$ implies satisfaction of $\mathcal{C}$ by their product $P_1 P_2$ (when such a product is defined). Some (non-exclusive) examples of transitive constraints include orthogonality, particular forms of sparsity, and their conjunctions.

The simplest transitive constraint we will consider is that $P$ should be orthogonal. Intuitively, an orthogonal $P$ represents a norm-preserving map between nodes of $G_1$ and nodes of $G_2$, so we are measuring how well diffusion on $G_1$ approximates diffusion on $G_2$, as projected by $P$. Note that since $P$ is in general a rectangular matrix, it is not necessarily true that $P P^T = I$. We assume that $\left| G_1\right| = n_1 \leq n_2 = \left| G_2 \right|$; if not, the order of the operands is switched, so that $P$ is always at least as tall as it is wide. We also briefly consider a sparsity constraint in Section 3.4 below. Since sparsity is more difficult to treat numerically, our default constraint will be orthogonality alone. Other constraints could include bandedness and other structural constraints. We also note that because $L$ is finite-dimensional, the exponential map is continuous, and therefore we can swap the order of optimization over $t$ and $\alpha$. The optimization procedure outlined in this thesis optimizes these variables in the order presented above (namely: an outermost loop of maximization over $t$, a middle loop of minimization over $\alpha$ given $t$, and an innermost loop of minimization over $P$ given $t$ and $\alpha$).
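A small numerical illustration of this asymmetry, using a hypothetical $5 \times 3$ prolongation with orthonormal columns built via QR:

```python
import numpy as np

rng = np.random.default_rng(0)
n1, n2 = 3, 5            # |G_1| = n1 <= n2 = |G_2|, so P is tall (n2 x n1)

# A feasible prolongation: orthonormal columns from a QR factorization.
P, _ = np.linalg.qr(rng.standard_normal((n2, n1)))

assert np.allclose(P.T @ P, np.eye(n1))       # the constraint C(P): P^T P = I
assert not np.allclose(P @ P.T, np.eye(n2))   # P P^T is a rank-n1 projector, not I
```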

The other additional parameter, $\alpha$, controls dilation between the passage of time in the two graphs, to account for different scales. Again, the intuition is that we are interested in the difference between structural properties of the graphs (from the point of view of single-particle diffusion) independent of the absolute number of nodes in each graph. As an example, diffusion on an $n \times n$ grid is a reasonably accurate approximation of more rapid diffusion on a $2n \times 2n$ grid, especially when $n$ is very large. In our discussion of variants of this dissimilarity score, we will use the notation $D^2(G_1, G_2 | x = c)$ to mean the restriction of any of our distortion measure equations in which variable $x$ is held to a constant value $c$; in cases where it is clear from context which variable is held fixed, we will write $D^2(G_1, G_2 | c)$.

At very early times the second- and higher-order terms of the Taylor series expansion of the matrix exponential vanish, and so $e^{t L} \approx I + t L$. This motivates the early-time or “linear” version of this distance, $\tilde{D}$: $$\begin{aligned} \label{defn:linear_defn} \tilde{D}^2(G_1,G_2) &= \inf_{\alpha > 0} \inf_{P | \mathcal{C}(P)} {\left| \left| \frac{1}{\alpha} P L_1 - \alpha L_2 P \right| \right|}^2_F \\ &\approx \frac{1}{t^2} \left( \inf_{\alpha > 0} \inf_{P | \mathcal{C}(P)} {\left| \left| P e^{\frac{t}{\alpha} L_1} - e^{\alpha t L_2} P \right| \right|}^2_F \right) \end{aligned}$$

(Note that the identity matrices cancel.) The outermost optimization (maximization over $t$) is removed for this version of the distance, as $t$ can be factored out:

$$\begin{aligned} {\left| \left| \frac{t}{\alpha} P L_1 - \alpha t L_2 P \right| \right|}^2_F = t^2 {\left| \left| \frac{1}{\alpha} P L_1 - \alpha L_2 P \right| \right|}^2_F \end{aligned}$$
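This factoring identity is easy to confirm numerically, here with arbitrary symmetric matrices standing in for the Laplacians (all names below are our own):

```python
import numpy as np

rng = np.random.default_rng(1)
n1, n2, t, alpha = 3, 4, 0.7, 1.3

S1 = rng.standard_normal((n1, n1)); L1 = S1 + S1.T   # symmetric stand-in for L(G_1)
S2 = rng.standard_normal((n2, n2)); L2 = S2 + S2.T   # symmetric stand-in for L(G_2)
P, _ = np.linalg.qr(rng.standard_normal((n2, n1)))   # orthogonal prolongation

lhs = np.linalg.norm((t / alpha) * P @ L1 - alpha * t * L2 @ P, 'fro') ** 2
rhs = t ** 2 * np.linalg.norm((1 / alpha) * P @ L1 - alpha * L2 @ P, 'fro') ** 2
assert np.isclose(lhs, rhs)   # t^2 factors out of the squared Frobenius norm
```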

This means that for the linear version, we only optimize $\alpha$ and $P$. For the exponential version of the dissimilarity score, we note briefly that the supremum over $t$ of our objective function must exist, since for any $G_1, G_2$: $$\begin{aligned} \label{eqn:D_limit_1} D^2(G_1, G_2) \leq D^2 \left(G_1, G_2 \left| \alpha = 1, P= \left[ \begin{array}{c} I \\ 0 \end{array}\right] \right. \right) \end{aligned}$$ In other words, the infimum over all $P$ and $\alpha$ is bounded above by any particular choice of values for these variables. Since $$\begin{aligned} \label{eqn:D_limit_2} D^2 &\left(G_1, G_2 \left| t = 0, \alpha = 1, P = \left[ \begin{array}{c} I \\ 0 \end{array}\right] \right. \right) = 0, & \text{and} \\ \lim_{t_c \rightarrow \infty} D^2 & \left(G_1, G_2 \left| t_c, \alpha = 1, P= \left[ \begin{array}{c} I \\ 0 \end{array}\right] \right. \right) = 0, \end{aligned}$$ this upper bound must attain a supremum (possibly $0$) at some $t^* \in [0, \infty)$. Then $$\begin{aligned} D^2 \left(G_1, G_2 \left| t^*, \alpha = 1, P = \left[ \begin{array}{c} I \\ 0 \end{array}\right] \right. \right) \end{aligned}$$ must be finite, and therefore so must the objective function.

Directedness of Distance and Constraints

We note that this distance measure, as defined so far, is directed: the operands $G_1$ and $G_2$ serve differing roles in the objective function. This additionally makes the constraint predicate $\mathcal{C}(P)$ ambiguous: when we state that $\mathcal{C}$ represents orthogonality, it is not clear whether we are referring to $P^T P = I$ or $P P^T = I$ (only one of which can be true for a non-square matrix $P$). To remove this ambiguity, we will, for the computations in the rest of this manuscript, define the distance metric to be symmetric: the distance between $G_1$ and $G_2$ with $\left| G_1 \right| \leq \left| G_2 \right|$ is always $D(G_1, G_2)$. $P$ is then always at least as tall as it is wide, so of the two choices of orthogonality constraint we select $P^T P = I$.

Variants of Distance Measure

Thus far we have avoided referring to this graph dissimilarity function as a “distance metric”. As we shall see later, full optimization of Equations [defn:exp_defn] and [defn:linear_defn] over $\alpha$ and $P$ is too loose, in the sense that the distances $D(G_1, G_2)$, $D(G_2, G_3)$, and $D(G_1, G_3)$ do not necessarily satisfy the triangle inequality. The same is true for $\tilde{D}$. See Section 5.3 for numerical experiments suggesting a particular parameter regime where the triangle inequality is satisfied. We thus define several restricted/augmented versions of both $D$ and $\tilde{D}$ which are guaranteed to satisfy the triangle inequality. These different versions are summarized in Table [tab:dist_versions]; they variously satisfy some of the conditions required of generalized distance metrics.

Additionally, we note here that a distance measure on graphs using Laplacian spectra can at best be a pseudometric, since isospectral, non-isomorphic graphs are well known to exist (Godsil and McKay 1982; Van Dam and Haemers 2003). Characterizing the conditions under which two graphs are isospectral but not isomorphic is an open problem in spectral graph theory. However, previous computational work has led to the conjecture that “almost all” graphs are uniquely defined by their spectra (Brouwer and Haemers 2011; Brouwer and Spence 2009; Van Dam and Haemers 2009), in the sense that the probability of two graphs of size $n$ being isospectral but not isomorphic goes to $0$ as $n \rightarrow \infty$. Furthermore, our numerical experiments seem to show empirically that the violation of the triangle inequality is bounded, in the sense that $D(G_1, G_3) \leq \rho \left( D(G_1, G_2) + D(G_2, G_3) \right)$ for $\rho \approx 2.1$. This means that even in the least restricted case our similarity measure may be a $2.1$-infra-pseudometric on graphs (and, since isospectral, non-isomorphic graphs are relatively rare, it behaves like a $2.1$-inframetric). As we will see in Chapter 3, in some more restricted cases we can prove triangle inequalities, making our measure a pseudometric. In Section 4.1, we discuss an algorithm for computing the optima in Equations ([defn:exp_defn]) and ([defn:linear_defn]). First, we discuss some theoretical properties of this dissimilarity measure.

Spectral Lower Bound

In Theorem [thm:LAP_bound] of Chapter 4 we will derive and make use of an upper bound on the graph distance $\tilde{D}(G_1, G_2)$. This upper bound is calculated by constraining the variable $P$ to be not only orthogonal, but also of the form $P = U_2 M U_1^T$, where $M$ is the solution (i.e. the “matching”, in the terminology of that section) to a linear assignment problem with costs given by a function of the eigenvalues of $L(G_1)$ and $L(G_2)$. In this section we derive a similar lower bound on the distance.

Let $G_1$ and $G_2$ be undirected graphs with Laplacians $L_1 = L(G_1)$ and $L_2 = L(G_2)$, and let $\alpha > 0$ be constant. By Equation [eqn:simple_d_tilde], we have

$$\begin{aligned} \tilde{D}^2(G_1,G_2) &= \inf_{\alpha > 0} \inf_{P^T P = I} \left( \sum\limits_{i=1}^{n_2}\sum\limits_{j=1}^{n_1} p_{ij}^2 {\left( \frac{1}{\alpha} \lambda^{(1)}_j - \alpha \lambda^{(2)}_i \right)}^2 \right). \end{aligned}$$ The following upper bound on $\tilde{D}$ is achieved by constraining $P$ to be not only orthogonal, but related to a constrained matching problem between the two lists of eigenvalues:

$$\label{eqn:upper_bound_2_defn} \begin{array}{ll@{}ll} \tilde{D}^2(G_1,G_2) \leq \inf_{\alpha>0} \inf_{M} & {\left| \left| \frac{1}{\alpha} M \Lambda_1 - \alpha \Lambda_2 M \right| \right|}^2_F &\\ \text{subject to}& \displaystyle\sum\limits_{i=1}^{n_2} m_{ij} \leq 1, & j=1 \ldots n_1\\ & \displaystyle\sum\limits_{j=1}^{n_1} m_{ij} \leq 1, & i=1 \ldots n_2\\ & m_{ij} \geq 0, & i = 1 \ldots n_2, \; j = 1 \ldots n_1, \\ \end{array}$$ where $\Lambda_1$ and $\Lambda_2$ are diagonal matrices of the eigenvalues of $L_1$ and $L_2$, respectively. Here we used the explicit map $\tilde{P} = U_2^T P U_1$ as a change of basis; we then converted the constraints on $P$ into equivalent constraints on $\tilde{P}$, and imposed additional constraints so that the resulting optimization (a linear assignment problem) is an upper bound. See the proof of Theorem [thm:LAP_bound] for the details of this derivation. We show in this section that a less constrained assignment problem is a lower bound on $\tilde{D}^2$. We do this by computing the same mapping $\tilde{P} = U_2^T P U_1$ and then dropping some of the constraints on $\tilde{P}$ (which is equivalent to dropping constraints on $P$, yielding a lower bound). The constraint $P^T P = I$ is the conjunction of $n_1^2$ constraints on the column vectors of $P$: if $\mathbf{p}_i$ is the $i$th column of $P$, then $P^T P = I$ is equivalent to: $$\begin{aligned} \mathbf{p}_i \cdot \mathbf{p}_i = 1 & \qquad & \forall i = 1 \ldots n_1 \label{eqn:p_const_l1}\\ \mathbf{p}_i \cdot \mathbf{p}_j = 0 & \qquad & \forall i = 1 \ldots n_1, \; j = 1 \ldots i-1, i+1 \ldots n_1. \label{eqn:p_const_l2} \end{aligned}$$ If we discard the constraints in Equation [eqn:p_const_l2], we are left with the constraint that every column of $P$ must have unit norm.
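For a fixed $\alpha$, the inner matching problem in the upper bound is a standard rectangular linear assignment and can be solved directly. A sketch with hypothetical spectra, assuming the cost $(\lambda^{(1)}_j/\alpha - \alpha\lambda^{(2)}_i)^2$ that the Frobenius objective induces:

```python
import numpy as np
from scipy.optimize import linear_sum_assignment

def matching_upper_bound(lam1, lam2, alpha):
    """Value of the constrained matching problem for fixed alpha: each
    eigenvalue of L_1 is assigned to a *distinct* eigenvalue of L_2,
    minimizing the total squared mismatch."""
    cost = (lam1[None, :] / alpha - alpha * lam2[:, None]) ** 2  # shape (n2, n1)
    rows, cols = linear_sum_assignment(cost)   # handles rectangular costs
    return float(cost[rows, cols].sum())
```

For instance, with `lam1 = [0, 1]` and `lam2 = [0, 1, 3]` at `alpha = 1`, each eigenvalue is matched to its exact copy and the bound is 0.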

Construct the “spectral lower bound matching” matrix $P^{(\text{SLB})}$ as follows:

$$P^{(\text{SLB})}_{i,j} = \begin{cases} 1 & \text{if } i = \arg \min_k {\left( \frac{1}{\alpha} \lambda^{(1)}_j - \alpha \lambda^{(2)}_k \right)}^2 \\ 0 & \text{otherwise.} \end{cases}$$ For any $\alpha$, this matrix is the solution to a matching problem (less constrained than the original optimization over all $P$) in which each $\lambda^{(1)}_j$ is assigned to the closest $\lambda^{(2)}_i$, allowing collisions. It clearly satisfies the constraints in Equation [eqn:p_const_l1], but may violate those in Equation [eqn:p_const_l2]. Thus, we have

$$\begin{aligned} \tilde{D}^2(G_1,G_2) &= \inf_{\alpha > 0} \inf_{P^T P = I} \left( \sum\limits_{i=1}^{n_2}\sum\limits_{j=1}^{n_1} p_{ij}^2 {\left( \frac{1}{\alpha} \lambda^{(1)}_j - \alpha \lambda^{(2)}_i \right)}^2 \right) \\ &\geq \tilde{D}^2\left(G_1,G_2 \left| P^{(\text{SLB})} \right.\right) \end{aligned}$$
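Because collisions are allowed, the relaxed problem decouples across columns: each $\lambda^{(1)}_j$ simply takes its nearest $\lambda^{(2)}_i$. A sketch of the resulting bound for a fixed $\alpha$, with hypothetical spectra:

```python
import numpy as np

def spectral_lower_bound(lam1, lam2, alpha):
    """Relaxed matching: each lambda^{(1)}_j is paired with its closest
    lambda^{(2)}_i, allowing collisions, so the value can only drop."""
    diffs = (lam1[None, :] / alpha - alpha * lam2[:, None]) ** 2  # shape (n2, n1)
    return float(diffs.min(axis=0).sum())   # column-wise nearest neighbor
```

With `lam1 = [0, 1]` and `lam2 = [0, 2, 3]` at `alpha = 1`, the first eigenvalue matches at cost 0 and the second at cost 1, giving a bound of 1; note that several columns may collide on the same $\lambda^{(2)}_i$.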

Various algorithms exist to rapidly find the member of a set of points which is closest to some reference point (for example, KD-trees (Bentley 1975)). The spectral lower bound can thus be calculated by an outer loop over $\alpha$ and an inner loop which applies one of these methods. We do not consider joint optimization of the lower bound over $P$ and $\alpha$ in this work.
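A sketch of that inner loop using scipy's KD-tree (the eigenvalue lists are hypothetical; the tree indexes the scaled $\lambda^{(2)}$ values and answers one nearest-neighbor query per $\lambda^{(1)}_j / \alpha$):

```python
import numpy as np
from scipy.spatial import cKDTree

alpha = 1.0
lam1 = np.array([0.0, 0.6, 1.9])        # hypothetical spectrum of L_1
lam2 = np.array([0.0, 0.5, 2.0, 3.0])   # hypothetical spectrum of L_2

tree = cKDTree((alpha * lam2)[:, None])            # index the scaled lambda^{(2)}
dists, idx = tree.query((lam1 / alpha)[:, None])   # nearest match per lambda^{(1)}_j
lower_bound = float((dists ** 2).sum())            # value of the relaxed matching
```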

Summary of Distance Metric Versions

Table [tab:dist_versions] summarizes the variants of our distance metric.

Summary of this thesis’s investigation of different forms of our graph dissimilarity measure. In this work, we systematically explore properties of this measure given sparsity parameter $s = 0$, and various regimes of $t$ (fixed at some early time, or maximized over all $t$) and $\alpha$ (fixed at $\alpha = 1$, fixed at a constant power $r$ of the ratio of graph sizes, or minimized over all $\alpha$). We leave exploration of nonzero values of the sparsity parameter to future work. Variants not explicitly called out are not considered. In the case where $\alpha$ and $t$ are both optimized and $s > 0$, it is unclear which of the metric conditions GDD satisfies, hence the corresponding classification is left blank.
| $t$ | $\alpha$ | $s$ | Classification | Treatment in this manuscript |
|---|---|---|---|---|
| Fixed at $t_c < \epsilon$ | Fixed at $\alpha_c = 1$ | $s = 0$ | Pseudometric | Defined in Equation [defn:alpha_1_linear]. Optimized by one pass of LAP solver. Triangle inequality proven in Theorem [thm:constant_alpha_thm]. |
| Fixed at $t_c < \epsilon$ | Fixed at $\alpha_c = (\frac{n_1}{n_2})^r$ | $s = 0$ | Pseudometric | Defined in Equation [eqn:TSGDD_defn]. Optimized by one pass of LAP solver. Triangle inequality proven in Theorem [thm:tsgdd_triangle]. |
| Fixed at $t_c < \epsilon$ | Optimized | $s = 0$ | Premetric | Defined in Equation [defn:linear_defn]. Optimized by Algorithm [alg:linear_alg]. Triangle inequality violations examined experimentally in Section 5.3. |
| Optimized | Fixed at $\alpha_c = 1$ | $s = 0$ | Metric | When $|G_1| = |G_2|$, this is Hammond et al.’s version of graph distance. |
| Optimized | Optimized | $s = 0$ | Premetric | Defined in Equation [defn:exp_defn]. Optimized by Algorithm [alg:exponential_alg]. Graph product upper bound proven in Theorem [thm:sq_grid_lim_lem]. Triangle inequality violations examined experimentally in Section 5.3. Used to calculate graph distances in Sections 5.4 and 5.5. |
| Fixed at $t_c < \epsilon$ | Fixed at $\alpha_c = 1$ | $s > 0$ | Pseudometric | Triangle inequality proven in Theorem [thm:constant_alpha_thm]. |
| Fixed at $t_c < \epsilon$ | Fixed at $\alpha_c = (\frac{n_1}{n_2})^r$ | $s > 0$ | Pseudometric | Triangle inequality proven in Theorem [thm:tsgdd_triangle]. |
| Optimized | Optimized | $s > 0$ |  | Discussed in Section 3.4. |


  1. Because the graphs are undirected, a stationary state is guaranteed to exist.