Depth and Breadth first graph search

Depth and breadth first graph search

Burton Rosenberg
7 November 1977

Introduction

A graph is a useful mathematical model for many practical situations. A graph is a pair of sets: a set of vertices and a set of edges. The edges are pairs of vertices, generally considered as a pair of distinct vertices (not the same vertex twice) in which order is not important. Visually, the nodes are dots and each edge connects two dots by a line. Formally one writes the graph G as a pair (V,E), V the vertex set, E the edge set, and every edge e in E is written as a pair of vertices (v,w) with v and w in V.

Searching a graph is a way of walking along edges of a graph in order to visit each vertex in the graph. We look at depth first search (DFS) and breadth first search (BFS) as the two fundamental exploration strategies.

Depth First Search

DFS is best done using a recursive algorithm which has the brief statement: visit the node, then recurse on all unvisited neighbors of the node. To help think about the process, we consider all vertices black at the start, and the vertices are colored red as they are visited. Also, we record the edges by which vertices became visited. These are called tree edges, because at the termination of the serach they will form a tree rooted at our start vertex.

Our graph my not be connected. That is, there may not be routes between any two vertices. There may be islands of vertices with no edges connecting the islands. Certainly, in this case, we cannot explore the entire graph from a single starting point if we are confined to walking along edges. The best we can do is to explore the entire connected component in which we started. That is, the subgraph of all vertices which are reachable by paths of edges from the starting vertex. Applying DFS to each component gives us a forest of trees. Altogether each tree spans its connected component: to journey by edges between any two vertices in the same connected component, it is sufficient to use only tree edges for the journey (although the other graph edges might provide "short cuts").

Here is the code for depth first searching a single component. First we look at this, then we include it as a subroutine for a full DFS which explores all connected components of the graph.

   visit(vertex v in V) {
      R U= {w}         // paint w red
      while exists edge (v,w) in E such that w is not in R {
         T U= { (v,w) }   // (v,w) is a tree edge
         visit(w)         // recurse on w
      }
   }

Along with the algorithm is the notion of a vertex being visited. A vertex v is being visited during the lifetime of subroutine call visit(v). The nesting of calls gives rise to a stack of vertices, all of which are currently being visited. Tree edges descend from the root of the connected component through the stack of vertices, in the order the vertices appear on the stack. Hence any two vertices currently being visited are in ancestor relationship, with the ancestor closer to the bottom of the stack. The vertex at the bottom of the stack is the root of T.

Claim: Run on a connected graph, at termination T is a spanning tree.
Proof: It is clear that it is a tree, we need only demonstrate that at termination every vertex v in V is in R (that it has been visited). Suppose w in V is not in R. Since the graph is connected, there is a path from the root of T to w. The root node is in R, but w is not, so there is some edge on the path (v1,v2) where v1 in R but v2 is not in R. This contradicts the algorithm which will not exit visit(v1) until all neighbors of v1 are red.

Claim: Any edge (v,w) in E of graph G connects ancestors in the tree T. That is, the graph edges not used in the spanning tree will always connect up and down the tree, never across.
Proof: Suppose edge (v,w) is an edge of the graph, not a tree edge. One of v or w was placed made red first, let's say it was v. Since visit(v) does not make (v,w) a tree edge, w must be in R when it is considered as a neighbor of v. Hence w as placed in R while v was being visited, that is, v was on the stack when w was placed at the top of the stack. Hence there is a path of tree edges from v to w which follows exactly the vertices on the stack from v to w. Hence (v,w) is an edge connecting ancestors in T.

Summary: We have demonstrated that the visit procedure will visit all vertices in the connected component in which it was started. We have shown that it will select graph edges that form a tree T which spans all the vertices in the component, which could be handy for navigating the tree, and we have shown that the edges not selected are of a special type. Finally, to search an entire graph, we start visit in each connected component:

   DFS(graph G=(V,E)) {

      R = { }   // initialize red vertices
      T = { }   // initialize tree edges
      W = { }   // initialize root vertices

      while exists vertx v in V such that v is not in R {
         W U= { v }     // v is a connect component root
         visit(v)       // d.f.s. all of v's component
      }
   }

Breadth First Search

DFS starts at a graph vertex and explores as deeply as possible before trying any alternative paths from the starting vertex. Another possibility is to explore a graph layer by layer, according to distance from the starting vertex as measured by the number of edges traversed. BFS explores a graph in this fashion.

BFS uses a queue to control the order of visiting. In our algorithm we assign depth numbers to vertices as we explore. This is not necessary, but aid in the analysis.

   BFS(vertex b in V ) {

      for every v in V, d(v) is set undefined
      initialize T, the set of tree edges, to empty
      select r in V
      d(r) = 0             // r is the root
      enqueue(r)

      while the queue is not empty {
         v = dequeue        // get a vertex from the queue for processing
         while exists (v,w) in E such that d(w) is undefined {
            T U= (v,w)      // the edge is in the spanning tree
            d(w) = d(v) + 1  // w's depth is one more than v's
            enqueue(w)     // put w in the queue for later processing
         }
      }
   }

Claim: At termination, if the graph is connected, T is a spanning tree.
Proof: T is a spanning tree if every v in V has a valid distance, that is, d(v) is not undefined. Suppose d(v) is undefined. Since the graph is connected there is a path from r to v. d(r)=0, so there is some edge on the path (v1,v2) such that d(v1) is defined, but d(v2) is not. This contradicts the algorithm.

Claim: For every vertex v, d(v) is the minimum distance by any path from r to v.
Proof: Take any path from r to v. If it is shorter than d(v) then there must be some edge in the path (v1,v2) such that | d(v1)-d(v2) | > 1. Suppose v1 has the smaller d value. When v1 was assigned this d value, d(v2) was not assigned, since d values are assigned ascending. Hence the algorithm would have assigned it d(v1)+1. A contradiction.

In DFS, edges in G not in T must go up and down the tree, connecting ancestors. They cannot go across the tree. In BFS, edges in G not in T must no go up and down the tree, they must go across, and even going across, they cannot rise or fall too much.

Claim: Let (v,w) in E be not in T. Then (v,w) connects two vertices in the tree of the same distance from the root, or the levels differ by one.
Proof: If there is an edge (v,w) then going to v then taking that edge to w would give a path of length d(v)+1 to v. Hence d(w) <= d(v)+1. Similarly d(v) <=d(w)+1. So | d(v) - d(w) | <= 1.