Searching a graph is a way of walking along edges of a graph in order to visit each vertex in the graph. We look at depth first search (DFS) and breadth first search (BFS) as the two fundamental exploration strategies.
Our graph my not be connected. That is, there may not be routes between any two vertices. There may be islands of vertices with no edges connecting the islands. Certainly, in this case, we cannot explore the entire graph from a single starting point if we are confined to walking along edges. The best we can do is to explore the entire connected component in which we started. That is, the subgraph of all vertices which are reachable by paths of edges from the starting vertex. Applying DFS to each component gives us a forest of trees. Altogether each tree spans its connected component: to journey by edges between any two vertices in the same connected component, it is sufficient to use only tree edges for the journey (although the other graph edges might provide "short cuts").
Here is the code for depth first searching a single component. First we look at this, then we include it as a subroutine for a full DFS which explores all connected components of the graph.
visit(vertex v in V) { R U= {w} // paint w red while exists edge (v,w) in E such that w is not in R { T U= { (v,w) } // (v,w) is a tree edge visit(w) // recurse on w } }Along with the algorithm is the notion of a vertex being visited. A vertex v is being visited during the lifetime of subroutine call visit(v). The nesting of calls gives rise to a stack of vertices, all of which are currently being visited. Tree edges descend from the root of the connected component through the stack of vertices, in the order the vertices appear on the stack. Hence any two vertices currently being visited are in ancestor relationship, with the ancestor closer to the bottom of the stack. The vertex at the bottom of the stack is the root of T.
Claim: Run on a connected graph, at termination T
is a spanning tree.
Proof: It is clear that it is a tree, we need only
demonstrate that at termination every vertex v in V is in R
(that it has been visited). Suppose w in V is not in R.
Since the graph is connected, there is a path from the root of
T to w. The root node is in R, but w is not, so there is some
edge on the path (v1,v2) where v1 in R but v2 is not in R.
This contradicts the algorithm which will not exit visit(v1) until
all neighbors of v1 are red.
Claim: Any edge (v,w) in E of graph G connects
ancestors in the tree T. That is, the graph edges not used in the
spanning tree will always connect up and down the tree, never
across.
Proof:
Suppose edge (v,w)
is an edge of the graph, not a tree edge.
One of v or w was placed made red first, let's say it was v. Since visit(v)
does not make (v,w) a tree edge, w must be in R when it is considered
as a neighbor of v. Hence w as placed in R while v was being visited,
that is, v was on the stack when w was placed at the top of the stack.
Hence there is a path of tree edges from v to w which follows exactly
the vertices on the stack from v to w. Hence (v,w) is an edge connecting
ancestors in T.
Summary: We have demonstrated that the visit procedure will visit all vertices in the connected component in which it was started. We have shown that it will select graph edges that form a tree T which spans all the vertices in the component, which could be handy for navigating the tree, and we have shown that the edges not selected are of a special type. Finally, to search an entire graph, we start visit in each connected component:
DFS(graph G=(V,E)) { R = { } // initialize red vertices T = { } // initialize tree edges W = { } // initialize root vertices while exists vertx v in V such that v is not in R { W U= { v } // v is a connect component root visit(v) // d.f.s. all of v's component } }
BFS uses a queue to control the order of visiting. In our algorithm we assign depth numbers to vertices as we explore. This is not necessary, but aid in the analysis.
BFS(vertex b in V ) { for every v in V, d(v) is set undefined initialize T, the set of tree edges, to empty select r in V d(r) = 0 // r is the root enqueue(r) while the queue is not empty { v = dequeue // get a vertex from the queue for processing while exists (v,w) in E such that d(w) is undefined { T U= (v,w) // the edge is in the spanning tree d(w) = d(v) + 1 // w's depth is one more than v's enqueue(w) // put w in the queue for later processing } } }
Claim: At termination, if the graph is connected, T is a
spanning tree.
Proof:
T is a spanning tree if every v in V has a valid distance, that is,
d(v) is not undefined. Suppose d(v) is undefined. Since the graph
is connected there is a path from r to v. d(r)=0, so there
is some edge on the path (v1,v2) such that d(v1) is defined, but
d(v2) is not. This contradicts the algorithm.
Claim:
For every vertex v, d(v) is the minimum distance by any path from
r to v.
Proof: Take any path from r to v. If it is shorter than
d(v) then there must be some edge in the path (v1,v2) such that
| d(v1)-d(v2) | > 1. Suppose v1 has the smaller d value. When v1 was
assigned this d value, d(v2) was not assigned, since d values are assigned
ascending. Hence the algorithm would have assigned it d(v1)+1.
A contradiction.
In DFS, edges in G not in T must go up and down the tree, connecting ancestors. They cannot go across the tree. In BFS, edges in G not in T must no go up and down the tree, they must go across, and even going across, they cannot rise or fall too much.
Claim: Let (v,w) in E be not in T. Then (v,w) connects two
vertices in the tree of the same distance from the root, or the levels
differ by one.
Proof:
If there is an edge (v,w) then going to v then taking that edge to
w would give a path of length d(v)+1 to v.
Hence d(w) <= d(v)+1. Similarly d(v) <=d(w)+1. So
| d(v) - d(w) | <= 1.
Copyright © 1998 Burton Rosenberg. All rights reserved.
Last Update:
Fri Nov 7 15:49:21 EST 1997