We say that a vertex a is strongly connected to b if there exist two paths, one from a to b and another from b to a.Note that we allow the two paths to share vertices or even to share edges. We will use a ~ b as shorthand for "a is strongly connected to b". We will allow very short paths, with one vertex and no edges, so that any vertex is strongly connected to itself.
Note that it was critical for our definition that we allowed the paths a-b and b-a to overlap. If we made a small change such as defining two vertices to be connected if they are part of a directed cycle, we wouldn't be able to concatenate the paths and show that the transitive property holds.
[a] = { b | a # b }(in English, the equivalence class of a, which we call "[a]", is defined to be simply the set of things related to a). The equivalence classes for strong connectivity are called strongly connected components.
These sets have the property that they partition the space of all vertices into disjoint subsets.
(This is not hard to prove. First, any vertex a is a member of [a] by reflexivity, so the equivalence classes cover all of the input. And second, if b is in [a] then [a]=[b] (by symmetry and transitivity, any element of one is an element of the other) so any two different equivalence classes must be disjoint.)
If we can find all the strongly connected components of a graph, it would be easy to test whether any two vertices are strongly connected: just see if they're in the same component.
For some graph problems, you can use this idea to get an algorithm that reduces the problem to subproblems on each component, plus one more subproblem on the component graph. Here's an example (this problem isn't in Baase, and I didn't get to this in my lecture, so I won't test you on it):
Suppose we define two vertices a and b to be weakly connected (also known as semiconnected) if there's either a path from a to b or one from b to a (but not necessarily both). We say the graph is weakly connected if this is true for every pair of vertices. Then it's not hard to show that a graph is weakly connected if and only if its component graph is a path. So by computing the strongly connected components, we can also test weak connectivity.
So in O(m) time we can find a single component. Since there are O(n) components, we can find them all in time O(mn). But this slower than necessary. The point of today's lecture is to show how to solve the problem in linear time. (The solution we describe, based on depth first search, was invented by Bob Tarjan in 1972. Baase ex. 4.50 outlines an alternative linear time algorithm.)
One complication (that I forgot to mention in lecture) is that we want to build a DFS tree that involves all the vertices of the graph. If we just start somewhere in the graph, not all vertices might be reachable, and the DFS will not get to them. One solution would be to restart the DFS every time this happens, but to make things a little simpler, I'm going to modify the graph by adding a new vertex connected by outward-going edges to everything else. This doesn't change the strongly connected components (except to add one new component for the one new vertex) but keeps the rest of the algorithm simpler.
DFS(G) { make a new vertex x with edges x->v for all v build directed tree T, initially a single vertex {x} visit(x) } visit(p) { for each edge p->q if q is not already in T { add p->q to T visit(q) } }This version of the pseudo-code makes it obvious that only certain edges can occur: if q is not already in T, p->q gets added, so if p->q does not end up in tree, q must be already in tree. There are three possible places q could be: an ancestor of p (in which case we call p->q a back edge), a descendant of p (in which case we call p->q a forward edge), or in a previous branch of the tree (in which case we call p->q a cross edge). The one case that's ruled out is that q can not be in a later branch of the tree.
Why?
Note that if we have paths a-b and b-a, any two intermediate vertices of those paths would have to be also in the same component (since e.g. if we have a-c-b then we already have a path a-c and by concatenating c-b-a we also get a path c-a).
So suppose one component ended up in two parts of the tree. Then it would have to have edges from one part to the other (the definition of strong connectivity tells us there must be paths, but the observation above about intermediate vertices being part of the same component tells us they would actually just be edges).
The two parts couldn't be in side by side branches of the tree, because there would be no edges in one of the two directions. But on the other hand, if one part contains an ancestor x of a vertex y in the other part, we can use the argument above about intermediate vertices to show that the path in the tree from x to y is also in the same component, contradicting the assumption that x and y are in different parts of the tree. So it is not possible to have a component in two separate parts of the DFS tree, which is what we wanted to prove.
To test this, look at the subtree of the DFS tree, rooted at v. Suppose this subtree does not have any back or cross edges going out of it. Then clearly, v must be the head of [v], since there are no paths from v to any vertex higher in the tree.
Just as clearly, if there is a back edge u-w from this subtree to an ancestor of v, v is not a head. In this case, the edge u-w together with the paths in the DFS tree from w to v and from v to u form a cycle, which must all be part of the same component [v]. But w is higher in the tree than v, so v can not be the head of this component.
The complicated case happens when the only edges going out of the subtree rooted at v are cross edges to other branches of the DFS tree. To make this complicated case a little easier, we'll set up our algorithm so that as soon as the DFS finishes visiting a vertex, if it is a head, we delete it and its component from the graph. We can show that if our algorithm does this, then whenever we see a cross edge out of the subtree from v, v is not a head.
Proof: This is where we use the fact that DFS trees have cross edges only to previously visited branches of the tree, not to later branches. Suppose we see a cross edge u-w. Let z be the head of [w], then z is visited no later than w. If z were in a separate branch of the tree, we'd have finished visiting it and deleted both it and w, contradicting the assumption that we're seeing edge u-w. So z is an ancestor of v, and putting edge u-w together with the paths w-z (by assumption that z is the head of [w]), z-v (since z is an ancestor of v) and v-u (since v is an ancestor of u) gives us a cycle, showing that v is in [z] and therefore v is not a head.Summarizing, we see that we can test whether a vertex is a head by looking for the existence of back or cross edges out of its subtree.
We use one more simple data structure, a stack L (represented as a list) which we use to identify the subtree rooted at a vertex. We simply push each new vertex onto L as we visit it; then when we have finished visiting a vertex, its subtree will be everything pushed after it onto L. If v is a head, and we've already deleted the other heads in that subtree, the remaining vertices left on L will be exactly the component [v].
We are now ready to describe the actual algorithm. It simply performs a DFS, keeping track of the low and dfsnum values defined above, using them to identify heads of components, and when finding a head deleting the whole component from the graph, using L to find the vertices of the component.
DFS(G) { make a new vertex x with edges x->v for all v initialize a counter N to zero initialize list L to empty build directed tree T, initially a single vertex {x} visit(x) } visit(p) { add p to L dfsnum(p) = N increment N low(p) = dfsnum(p) for each edge p->q if q is not already in T { add p->q to T visit(q) low(p) = min(low(p), low(q)) } else low(p) = min(low(p), dfsnum(q)) if low(p)=dfsnum(p) { output "component:" repeat remove last element v from L output v remove v from G until v=p } }We have already seen an explanation for why this algorithm works. It only remains to point out that it takes linear time -- the basic framework is just DFS, and the added manipulations of low, dfsnum, and L do not slow this down at all. So we can find strongly connected components in linear time.
ICS 161 -- Dept.
Information & Computer Science -- UC Irvine
Last update: