Describing a Tree with a String

This information is provided to help students in Math 220/317 with the Huffman Tree (Trie) assignment. The Trie/Tree created to compress the input file needs to be stored with the compressed data, so that the file can be decompressed later.

In general, a tree can be described by a string, using parentheses to demark subtrees. Here's the formula, as a string rewriting system:


   TreeFormat :: S -> (AA).         ; root with two children. 
                                    ; final period to help a dumb lex'er
                 A -> (H) | (AA)    ; child leaf | node with two children
                 H = 1 byte in hex  ; content of leaf
                 characters other than 0-9, a-f, ( and ) are ignored
                 
What this means is that you start with the symbol S, replace it with the symbol string (AA)., and I intended that (, ) and . are literal. Then you replace A with either (H) or (AA), where ( and ) are literal. Then you replace H with a hexidecimal representation of 1 byte, e.g. 3a.

Example:

          +
         / \
       23   +
           / \
          1b  2c

is ((23)((1b)(2c))). by the sequence of substitutions:
   S -> (AA). 
     -> ((H)A). 
     -> ((23)A). 
     -> ((23)(AA)).
     -> ((23)((H)A)).
     -> ((23)((1b)A)).
     -> ((23)((1b)(H))).
     -> ((23)((1b)(2c))).
This whole business is an example of a Context Free Grammar. It will be back to haunt you Junior or Senior year.

I threw a period on the end of the string to make it easy to input the characters without knowing anything about the format. Input characters to you get a period, then pass the string to a parsing routine. The parsing routine takes the string, looks symbol by symbol, reconstructing the tree. Here's the algorithm:

    BUILDTREE 
       get '(', call BUILDTREE_AUX returns root, get ')', done

    BUILDTREE_AUX (recursive)
       (1) create tree node, set leaf false, content 0, 
           local workingOn flag to LEFT. 
       (2) if char. is a hex digit, set leaf true, update content
       (3) if char. is '(', call BUILDTREE_AUX returns left/right
           subtree according to whether workingOn is LEFT or RIGHT.
           Set workingOne to RIGHT.
       (4) if char. is ')', Return pointer to tree node to caller
       (5) everything else, ignore.
Each HuffmanTrieNode has a leftChild, rightChild, a boolean flag marking the node as a leaf or not, and a content location to be filled in with data on the leaft nodes.

The other side of the story is given a tree, to derive the string describing the tree. This is classic recursuion. Very simple:

   DESCRIBE_TREE:
       write '(', 
       if leaf true write content, 
       else recurse on left s/tree then right s/tree,
       write ')', 
       return.
Oops, don't forget the '.' at the end! I'd make a recursive describe-tree and a non-recursive shell which adds the dot at the end, and feeds the recursive version the tree root as a starting node.

Good luck!