General and Special Trees


In this lecture we will look at variations of simple binary trees (beyond
BSTs, Heaps, and AVL trees). Specifically, we will see N-ary trees, for storing
nodes that have an arbitrary number of children; expression trees, for storing
formulas; quadtrees, for storing pictures; and digital trees, for storing
dictionaries (and Sets and Maps). In these trees we will see examples of how to
store a node's children by using multiple references (just as in binary trees)
as well as using the set, list, and map collection classes to store children.


N-ary Trees with Sets and Lists:

To start, we will examine a simple way to store N-ary trees: trees that can
have any number (N) of children.  Examples of information that we can represent
in N-ary trees are inheritance hierarchies (each class can be extended by any
number of classes, but each class has a unique superclass) and file structures
(each folder can contain an arbitrary number of files and other folders). To
store the latter we might declare

public class FolderOrFile {
  public FolderOrFile (String name, String contents, boolean isFolder) {
    this.name = name;
    if (isFolder)
      children      = new ArraySet<FolderOrFile>();
    else
      this.contents = contents;
  }

  public String            name;
  public String            contents;  //Null for folders: see children
  public Set<FolderOrFile> children;  //Null for files  : see contents
}

which can represent an arbitrary file structure.

An example of how we can process such a structure, say to print all the names
inside a folder (including names inside folders inside that folder, etc.), is
shown below: basically it implements a preorder traversal of an N-ary tree.

  public static void printNames (FolderOrFile ff) {
    if (ff == null)
      return;
    System.out.println(ff.name);
    if (ff.children != null)                 //is this a folder, not a file?
      for (FolderOrFile f :  ff.children)
        printNames( f );
  }

Actually, assuming we start at a non-null root folder, printNames is never
called with null: a file's children instance variable is null (so the for loop
is skipped), and a folder's children set contains only actual FolderOrFile
objects (an empty folder stores an empty set).

Another way to build such a structure is that all folders and files refer to a
set of children, but some of those sets have size()=0, meaning the for loop
executes 0 times (its body is never executed).

If we wanted to sort the children, or keep them in a specific (alphabetical?)
order, we could use a List collection instead of a Set collection to do the
job. The for-each loop in the code above for printNames works regardless of the
kind of collection we use to implement children, so long as the collection is
Iterable.
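
As a minimal, self-contained sketch of the ideas above, the code below keeps
the children in a List so that they stay sorted by name, and collects the
names in preorder instead of printing them. It uses java.util.ArrayList in
place of the course's collection classes (an assumption, made so it runs with
only the standard library); the names FolderSketch, Node, add, and
collectNames are hypothetical.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;

class FolderSketch {
  // Hypothetical stand-in for FolderOrFile (contents omitted for brevity).
  static class Node {
    String     name;
    List<Node> children;   // null for files, a (possibly empty) list for folders

    Node (String name, boolean isFolder) {
      this.name = name;
      if (isFolder)
        children = new ArrayList<Node>();
    }

    // Add a child to a folder, keeping the children in alphabetical order.
    // Must only be called on folders; returns this to allow chaining.
    Node add (Node child) {
      children.add(child);
      children.sort(Comparator.comparing((Node n) -> n.name));
      return this;
    }
  }

  // Preorder traversal: a node's name first, then the names inside each child.
  static void collectNames (Node n, List<String> result) {
    result.add(n.name);
    if (n.children != null)          // is this a folder, not a file?
      for (Node c : n.children)
        collectNames(c, result);
  }

  // Build a small folder structure and return its names in preorder.
  static List<String> demo () {
    Node root = new Node("root", true)
      .add(new Node("notes.txt", false))
      .add(new Node("code", true).add(new Node("Main.java", false)))
      .add(new Node("empty", true));
    List<String> names = new ArrayList<String>();
    collectNames(root, names);
    return names;
  }

  public static void main (String[] args) {
    System.out.println(demo());
  }
}
```

Because the for-each loop needs only an Iterable, collectNames would not
change if children were declared as a Set instead.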


N-ary Trees embedded in Binary Trees:

Interestingly enough, we can use a strange kind of binary tree (it has enough
complexity) to store all the information in an N-ary tree. We would define such
a class as follows

  public class NTN {
    public String name;
    public NTN  firstChild;     //First for a linked list of children
    public NTN  nextSibling;    //Next in linked list of siblings

    public NTN (String s, NTN fc, NTN ns)
    {name = s; firstChild = fc; nextSibling = ns;}
  }

Here we use the two "recursive" references differently than in a binary tree
(where we represent references to left and right subtrees) and differently than
in a doubly-linked list (where we represent previous and next references). We
have each node refer to its next sibling and to its first child (and of course,
we can use the first child to reach all of its siblings by following their
nextSibling references, and so eventually find all the children of its parent).
Note that the root node will have no siblings, but all other nodes can have
siblings (but don't have to).

It is interesting to think about how a node with two references can be used to
represent three such different structures as doubly-linked lists, binary trees,
and N-ary trees, each with a very different use of references than the other.

The web reading page shows the equivalent of a small directory represented
with this kind of data structure.

As a translation of the example above, we can write code that will print all
the names inside an NTN tree (again it implements a preorder traversal of an
N-ary tree). Here we print the name for a node, and then all the names
reachable from its children.

  public static void printNames (NTN n) {
    if (n == null)
      return;
    System.out.println(n.name);
    for (NTN r=n.firstChild; r!=null; r=r.nextSibling)
      printNames(r);
  }

In fact, we can rewrite this method purely recursively.

  public static void printNames (NTN n) {
    if (n == null)
      return;
    System.out.println(n.name);
    printNames(n.firstChild);
    printNames(n.nextSibling);
  }

Here we really need to check the base case n == null, to terminate
the recursion (as we checked r!=null in the for loop above). Each node prints
its own name, all the names reachable from its first child, and all the names
reachable from its next sibling (which will include the next sibling's
children).

It would also be useful to have a parent reference in the NTN class, so that
every node could reach its unique parent.
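
As a rough sketch of that idea, the class below is a variant of NTN with a
parent reference added. The names SiblingTreeSketch, Node, addChild,
collectNames, and pathTo are hypothetical, and names are collected into a
list rather than printed so that the result is easy to check.

```java
import java.util.ArrayList;
import java.util.List;

class SiblingTreeSketch {
  // Hypothetical variant of NTN that also stores a parent reference.
  static class Node {
    String name;
    Node   parent;        // null for the root
    Node   firstChild;    // first in the linked list of children
    Node   nextSibling;   // next in the linked list of siblings

    Node (String name) { this.name = name; }

    // Append child at the end of this node's list of children.
    Node addChild (Node child) {
      child.parent = this;
      if (firstChild == null)
        firstChild = child;
      else {
        Node last = firstChild;
        while (last.nextSibling != null)
          last = last.nextSibling;
        last.nextSibling = child;
      }
      return this;
    }
  }

  // Preorder: this node, everything below its first child, then its siblings.
  static void collectNames (Node n, List<String> result) {
    if (n == null)
      return;
    result.add(n.name);
    collectNames(n.firstChild, result);
    collectNames(n.nextSibling, result);
  }

  // Follow parent references upward to build a path like "root/code/Main.java".
  static String pathTo (Node n) {
    return n.parent == null ? n.name : pathTo(n.parent) + "/" + n.name;
  }

  public static void main (String[] args) {
    Node leaf = new Node("Main.java");
    Node root = new Node("root")
      .addChild(new Node("code").addChild(leaf))
      .addChild(new Node("notes.txt"));
    List<String> names = new ArrayList<String>();
    collectNames(root, names);
    System.out.println(names);
    System.out.println(pathTo(leaf));
  }
}
```

The parent reference costs one extra field per node, but lets us walk upward
from any node without searching from the root.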


Quad Trees:

We next will briefly discuss Quad trees, which we can use to represent pictures
that are rendered (drawn) on the screen NOT top to bottom, but from fuzzy
pictures to clear pictures. For pictures that contain large areas of the same
color, a Quad tree can store a picture more compactly than an array of pixels.

In a Quad tree, each node has exactly 4 children. We can represent a quad tree
to store a picture by

  public class QTN {
    public int   size;      //size x size
    public int   avgRed;    //of all pixels it contains
    public int   avgGreen;
    public int   avgBlue;
    public QTN[] children;  //Always 0 or 4 children

    public QTN (int s, int r, int g, int b)
    {size = s; avgRed = r; avgGreen = g; avgBlue = b;}
  }

Each Quad tree node represents a square of size^2 pixels. If the entire square
has a uniform color, then the "avg" instance variables store its red, green,
and blue components, and the children field stores null. (Actually, even if the
square is not one uniform color, the "avg" instance variables store the average
amounts of red, green, and blue.) If the square is not a uniform color, the
children divide the square into four quadrants (numbered 0-3), and each child
represents one quadrant; a child's vertical and horizontal size is 1/2 that of
its parent (so each of the 4 quadrants is 1/2 * 1/2 = 1/4 the area of its
parent).

  +--------+--------+
  |        |        |
  |   0    |   1    |
  |        |        |
  +--------+--------+
  |        |        |
  |   2    |   3    |
  |        |        |
  +--------+--------+

This process of creating children to represent the picture continues, at each
level breaking its picture into four 1/4-sized pictures, until a child QTN is
all one color. This might be because the quadrant is a single pixel (a base
case), or it might be because the quadrant is a square of pixels with a uniform
color.

By using a Quad tree we can render a picture, not so much top to bottom in full
detail, but by refining each quadrant, quadrants in a quadrant, etc. until the
entire picture is rendered. In this way the picture starts out as a blur at the
root, but with the right approximate color distribution, and the picture gets
sharper and sharper as we process each subtree (breadth first) in the picture,
filling in more of its details.

This is accomplished with a breadth first traversal of the Quad tree, where we
render each quadrant at depth 0, then each quadrant at depth 1, then each
quadrant at depth 2, etc. Each increase in depth increases the resolution (the
number of squares that can be drawn) by a factor of 4.

Obviously, storing all the information in this tree can take more space than
just storing all the pixels, but the space requirements can be less if many
small (or a few large) squares store the same color.
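
To make the construction and the breadth first traversal concrete, here is a
small sketch under simplifying assumptions: each pixel is a single gray level
(instead of separate red, green, and blue values), and the picture is a
square whose side is a power of 2. The names QuadTreeSketch, Node, build, and
nodesPerDepth are hypothetical.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.List;
import java.util.Queue;

class QuadTreeSketch {
  // Simplified node: one gray level instead of red/green/blue averages.
  static class Node {
    int    size;       // covers size x size pixels
    int    avgGray;    // average of all pixels it contains
    Node[] children;   // null if uniform, else 4 quadrants numbered 0-3

    Node (int size, int avgGray) { this.size = size; this.avgGray = avgGray; }
  }

  // Build the tree for the size x size square whose top-left pixel is at
  // (row, col). Assumes size is a power of 2.
  static Node build (int[][] pixels, int row, int col, int size) {
    long    sum     = 0;
    boolean uniform = true;
    for (int r = row; r < row + size; r++)
      for (int c = col; c < col + size; c++) {
        sum += pixels[r][c];
        if (pixels[r][c] != pixels[row][col])
          uniform = false;
      }
    Node n = new Node(size, (int)(sum / ((long)size * size)));
    if (!uniform) {                  // a single pixel is always uniform
      int half = size / 2;
      n.children = new Node[] {
        build(pixels, row,        col,        half),   // 0: top left
        build(pixels, row,        col + half, half),   // 1: top right
        build(pixels, row + half, col,        half),   // 2: bottom left
        build(pixels, row + half, col + half, half)    // 3: bottom right
      };
    }
    return n;
  }

  // Breadth first traversal, one depth at a time: returns how many nodes
  // appear at each depth, in the order a progressive renderer would visit them.
  static List<Integer> nodesPerDepth (Node root) {
    List<Integer> counts = new ArrayList<Integer>();
    Queue<Node>   level  = new ArrayDeque<Node>();
    level.add(root);
    while (!level.isEmpty()) {
      counts.add(level.size());
      Queue<Node> next = new ArrayDeque<Node>();
      for (Node n : level)
        if (n.children != null)
          for (Node c : n.children)
            next.add(c);
      level = next;
    }
    return counts;
  }

  public static void main (String[] args) {
    int[][] pixels = { {0,0,9,9}, {0,0,9,9}, {0,0,0,0}, {0,0,0,5} };
    System.out.println(nodesPerDepth(build(pixels, 0, 0, 4)));
  }
}
```

In this 4x4 picture three of the quadrants are uniform, so only the bottom
right quadrant is subdivided further: the tree stores 9 nodes rather than 16
pixels.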


Expression Trees:

Compilers first parse a program (using the syntax of the language) by
converting it into a tree representing its syntax. In this section we will
use expression trees to represent and process expressions.

The web reading shows how this is naturally done, by adding a special subclass
for each operator in an inheritance hierarchy. A program you can download uses
a Stack to translate an expression into such a tree (and evaluate what value the
tree represents). Here we are going to use just a single class to store and
process such trees (simpler but not as elegant, and not as easy to extend to
expressions with more operators).

  public class ExprTree {
    public ExprTree (String oOV, ExprTree left, ExprTree right)
    {opOrVal = oOV; this.left = left; this.right = right;}

    public String opOrVal;        //Either something like "+" or "3"
    public ExprTree left, right;  //non-null for operators (refer to operands)
  }

The trees are drawn just like the ones shown in the web reading, with the
operator or value in the opOrVal instance variable, and if not a value,
references to the sub expressions.

Note that when converting an expression to an ExprTree, the later an operator
is applied, the higher it appears in the tree. So, we have the following
examples.

  2 + 3 * 5          (2+3) * 5          1 + 2 + 3

      +                  *                  +
    /   \              /   \              /    \
   2    *             +     5            +      3
       /  \          /  \               /  \
      3    5        2    3             1   2

Notice in the last example, because + is "left associative", the first + is
evaluated before the second one. To be completely accurate in drawing
expression trees, you must follow associativity rules for equal precedence
operators. Note that there are no null expression trees: the smallest
expression tree contains a value: e.g., new ExprTree("3",null,null).

We will discuss in class that, for a uniprocessor, the number of time steps it
takes to evaluate such an expression is equal to the number of internal nodes
in the tree; for a multiprocessor it is the height of the tree (using the
multiprocessor to simultaneously evaluate all nodes at a depth, going upward).
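
As an illustration of these two measures, the sketch below counts the
internal nodes and computes the height of an expression tree. It assumes
every operator is binary and that a lone value has height 0; the names
ExprStepsSketch, val, internalNodes, and height are hypothetical.

```java
class ExprStepsSketch {
  static class ExprTree {
    String   opOrVal;
    ExprTree left, right;

    ExprTree (String opOrVal, ExprTree left, ExprTree right)
    {this.opOrVal = opOrVal; this.left = left; this.right = right;}
  }

  // Convenience method for building a leaf (a value).
  static ExprTree val (String v) { return new ExprTree(v, null, null); }

  // Uniprocessor time steps: one per operator (internal node).
  static int internalNodes (ExprTree t) {
    if (t.left == null && t.right == null)
      return 0;
    return 1 + internalNodes(t.left) + internalNodes(t.right);
  }

  // Multiprocessor time steps: the height of the tree.
  static int height (ExprTree t) {
    if (t.left == null && t.right == null)
      return 0;
    return 1 + Math.max(height(t.left), height(t.right));
  }

  public static void main (String[] args) {
    // (1+2) * (3+4): the two additions can be done at the same time.
    ExprTree balanced =
      new ExprTree("*", new ExprTree("+", val("1"), val("2")),
                        new ExprTree("+", val("3"), val("4")));
    // 1+2+3+4, left associative: each addition waits for the previous one.
    ExprTree chain =
      new ExprTree("+", new ExprTree("+", new ExprTree("+", val("1"), val("2")),
                                          val("3")),
                        val("4"));
    System.out.println(internalNodes(balanced) + " " + height(balanced));
    System.out.println(internalNodes(chain)    + " " + height(chain));
  }
}
```

Both trees have 3 operators, so a uniprocessor needs 3 steps for either; the
balanced tree has height 2 while the chain has height 3, so only the balanced
one benefits from a multiprocessor.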

Let us see some simple recursive code to evaluate an expression. Note that we
can recognize an ExprTree as a base case value, if null is stored in its
left and right subexpressions. Note that if we introduced unary operators, the
left subexpression for one would be null but the right subexpression would
refer to the value the unary operator operated on.

  public int evaluate (ExprTree t) {
    if (t.left == null && t.right == null)   //Leaf (a value)?
      return Integer.parseInt(t.opOrVal);    //return int equivalent of String

    else {
       if (t.opOrVal.equals("+"))
         return evaluate(t.left) + evaluate(t.right);

       if (t.opOrVal.equals("-"))
         return evaluate(t.left) - evaluate(t.right);

       if (t.opOrVal.equals("*"))
         return evaluate(t.left) * evaluate(t.right);

       if (t.opOrVal.equals("/"))
         return evaluate(t.left) / evaluate(t.right);

       throw new IllegalOperatorException(t.opOrVal);
    }
  }

This code performs a postorder traversal of a tree: computing the values of
the left and right subtrees, then using knowledge of the operator to return the
correct result for the operator at the root of the subtree.

We could rename opOrVal to opOrValOrVar; the value "r" stored in this instance
variable would refer to the variable r. To compute the value of the variable,
we could use a Map[String -> String], so the keys in this map are variable
names and the values in this map are the values stored in these variables.
Such a map is called an "environment"; we can rewrite our evaluate code as
follows (let's assume we write methods called isValue and isVariable)

  public int evaluate (ExprTree t, Map<String,String> env) {
    if (t.left == null && t.right == null) {

      if (isValue(t.opOrValOrVar))                //Value
        return Integer.parseInt(t.opOrValOrVar);  //return int equivalent of String

      if (isVariable(t.opOrValOrVar)              //Variable
          && env.containsKey(t.opOrValOrVar))     //Return its value if in env
        return Integer.parseInt(env.get(t.opOrValOrVar));

      throw new IllegalVariableException(t.opOrValOrVar);

    }else {
       if (t.opOrValOrVar.equals("+"))
         return evaluate(t.left,env) + evaluate(t.right,env);
       if (t.opOrValOrVar.equals("-"))
         return evaluate(t.left,env) - evaluate(t.right,env);
       if (t.opOrValOrVar.equals("*"))
         return evaluate(t.left,env) * evaluate(t.right,env);
       if (t.opOrValOrVar.equals("/"))
         return evaluate(t.left,env) / evaluate(t.right,env);

       throw new IllegalOperatorException(t.opOrValOrVar);
    }
  }

Problem: What if we allowed the = operator (which changes the value associated
with a variable and results in that new value); how could we update evaluate to
include the semantics/meaning of this other operator? We could add the
following code

       if (t.opOrValOrVar.equals("=")) {
         String name = t.left.opOrValOrVar;     //left operand must be a variable
         if (!isVariable(name) || !env.containsKey(name))
           throw new IllegalVariableException(name);
         else {
           int rightValue = evaluate(t.right,env);
           env.put(name,rightValue+"");
           return rightValue;
         }
       }
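
Putting these pieces together, here is a self-contained sketch of an
evaluator with an environment and the = operator. It makes several
assumptions: it uses java.util.HashMap for the environment, it throws the
standard IllegalArgumentException in place of the exception classes named
above, and isValue and isVariable are hypothetical implementations (a value
is an optional minus sign followed by digits, a variable is a sequence of
letters).

```java
import java.util.HashMap;
import java.util.Map;

class EnvEvalSketch {
  static class ExprTree {
    String   opOrValOrVar;
    ExprTree left, right;

    ExprTree (String s, ExprTree left, ExprTree right)
    {opOrValOrVar = s; this.left = left; this.right = right;}
  }

  static ExprTree leaf (String s) { return new ExprTree(s, null, null); }

  // Hypothetical helpers, written here only so the sketch can run.
  static boolean isValue    (String s) { return s.matches("-?[0-9]+"); }
  static boolean isVariable (String s) { return s.matches("[A-Za-z]+"); }

  static int evaluate (ExprTree t, Map<String,String> env) {
    String s = t.opOrValOrVar;
    if (t.left == null && t.right == null) {          // leaf: value or variable
      if (isValue(s))
        return Integer.parseInt(s);
      if (isVariable(s) && env.containsKey(s))
        return Integer.parseInt(env.get(s));
      throw new IllegalArgumentException("Unknown value or variable: " + s);
    }

    if (s.equals("=")) {                              // assignment
      String name = t.left.opOrValOrVar;
      boolean leftIsLeaf = t.left.left == null && t.left.right == null;
      if (!leftIsLeaf || !isVariable(name) || !env.containsKey(name))
        throw new IllegalArgumentException("Cannot assign to: " + name);
      int rightValue = evaluate(t.right, env);
      env.put(name, rightValue + "");
      return rightValue;
    }

    int l = evaluate(t.left, env);                    // binary arithmetic
    int r = evaluate(t.right, env);
    switch (s) {
      case "+": return l + r;
      case "-": return l - r;
      case "*": return l * r;
      case "/": return l / r;
      default : throw new IllegalArgumentException("Unknown operator: " + s);
    }
  }

  public static void main (String[] args) {
    Map<String,String> env = new HashMap<String,String>();
    env.put("x", "2");
    // x = x * 3 + 1
    ExprTree assign =
      new ExprTree("=", leaf("x"),
        new ExprTree("+", new ExprTree("*", leaf("x"), leaf("3")), leaf("1")));
    System.out.println(evaluate(assign, env) + " " + env);
  }
}
```

Note that the right operand of = is evaluated before the environment is
updated, so x = x * 3 + 1 reads the old value of x.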


Digital Trees:

Finally, we will examine Digital Trees (also known as "tries", pronounced as
in the word reTRIEval, so just like "tree"). As an example we can use digital
trees to store a dictionary of words for a spelling correction utility. The
standard way to store such a structure would be as a Set of legal words. So
far, the most efficient way we have seen to store such a collection is an AVL
tree. Recall that an AVL tree is a special kind of BST that is guaranteed to
be well balanced, so the time to search for a word would be at worst O(Log N).

Using a digital tree, we can reduce the complexity of many of its important
operations (add, remove, and lookup) to O(1)! It is the same regardless of how
many words are stored in the tree; it depends not on how many words are in the
tree, but only on how many letters are in that word. So, we might say if a word
contained M letters the time is O(M). Technically, since each comparison in the
AVL tree might require looking at all M letters (String comparisons are
letter by letter comparisons until there is a letter mismatch or one of the
Strings runs out of letters), we would then have to list the AVL tree's
complexity, if using the same metric, as O(M Log N). So digital trees do save
a factor of Log N.

A digital tree means the value that we want to look up can be broken down into
its "digits". For example, to look up the integer 153 we look at the digit 1,
then the digit 5, then the digit 3; likewise, to look up the String "yes" we
look at the "digit" "y", then the "digit" "e", then the "digit" "s". So, the
characters are the "digits" for a String.

To store a digital tree for processing Strings, we might represent it as
follows. Note here, the children are collected in a Map, with each key a String
(really just one char, which we will represent as a String) and each value
another DTN.

  public class DTN {
    public boolean          isWord;
    public String           wordToHere;
    public Map<String,DTN>  children;

    public DTN (boolean iw, String wth)
    {isWord = iw; wordToHere = wth; children = new ArrayMap<String,DTN>();}
  }

How do we add a word?

We always start with a root DTN (whose isWord is false and whose wordToHere is
""). It represents a word of 0 characters (of which there aren't any!). To add
a word, say "ant", we start at the root.

  If its children map contains the first letter, "a", we find the value DTN
  associated with this letter: a subtree containing all the words starting with
  the digit "a". We then repeat this process again from there, with the next
  letter. If we get to the end of the word and we have not needed to create a
  new DTN node to reach that spot, we change isWord of that node to be true.

  If at any time a node's children map does not contain the letter we need, we
  put that new letter into the map (with a value that is a DTN node with isWord
  false and wordToHere containing all the needed letters) and follow it. So, if
  the root's children map DOES NOT contain the first letter, "a", we add to
  that map a key of "a" with a value DTN whose isWord is false and wordToHere
  being ""+"a" (the wordToHere of its parent, extended by its letter). We use
  this node to represent the subtree whose children are all the words starting
  with "a". Then we repeat this process with all subsequent letters: for the
  last node constructed, we set its isWord to true (since we have processed all
  the letters in the word).
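
The two cases above can be sketched as a single loop. This sketch assumes
java.util.HashMap in place of the course's ArrayMap; the names TrieAddSketch,
add, and countNodes are hypothetical.

```java
import java.util.HashMap;
import java.util.Map;

class TrieAddSketch {
  static class DTN {
    boolean         isWord;
    String          wordToHere;
    Map<String,DTN> children = new HashMap<String,DTN>();

    DTN (boolean iw, String wth) { isWord = iw; wordToHere = wth; }
  }

  // Walk down from the root one letter at a time, creating any missing nodes;
  // the node reached after the last letter is marked as a word.
  static void add (DTN root, String word) {
    DTN t = root;
    for (int i = 0; i < word.length(); i++) {
      String letter = word.substring(i, i + 1);
      DTN child = t.children.get(letter);
      if (child == null) {                       // letter not yet in the map
        child = new DTN(false, t.wordToHere + letter);
        t.children.put(letter, child);
      }
      t = child;                                 // follow the letter
    }
    t.isWord = true;
  }

  // Count all nodes in the tree, including the root.
  static int countNodes (DTN t) {
    int count = 1;
    for (DTN c : t.children.values())
      count += countNodes(c);
    return count;
  }

  public static void main (String[] args) {
    DTN root = new DTN(false, "");
    add(root, "anteater");
    add(root, "anthem");
    System.out.println(countNodes(root));   // nodes shared up to "ant"
    add(root, "ant");                       // creates no new nodes
    System.out.println(countNodes(root));
  }
}
```

Adding a word whose letters are already on a path in the tree creates no
nodes at all: it only changes one isWord to true.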

Note that each map will contain at most 26 entries, one for each possible
letter in a word (we'll assume only lowercase letters; of course we could use
both cases and increase the map's size to at most 52). In fact, we could use
an array to store these 26 references: to look up character c we use the array
indexed at c-'a' (Java implicitly converts c into an int, then subtracts the
value of 'a' converted to an int: 'a'-'a' = 0, 'b'-'a' = 1, ... 'z'-'a' = 25).
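
A minimal sketch of the array alternative follows. It assumes every word
contains only the lowercase letters 'a' through 'z' (any other character
would index outside the array), and the names ArrayTrieSketch, Node, add, and
isAWord are hypothetical.

```java
class ArrayTrieSketch {
  static class Node {
    boolean isWord;
    Node[]  children = new Node[26];   // index 0 is 'a', ..., index 25 is 'z'
  }

  // Assumes word contains only the letters 'a' through 'z'.
  static void add (Node root, String word) {
    Node t = root;
    for (char c : word.toCharArray()) {
      int i = c - 'a';                 // char arithmetic produces an int
      if (t.children[i] == null)
        t.children[i] = new Node();
      t = t.children[i];
    }
    t.isWord = true;
  }

  // Assumes word contains only the letters 'a' through 'z'.
  static boolean isAWord (Node root, String word) {
    Node t = root;
    for (char c : word.toCharArray()) {
      t = t.children[c - 'a'];
      if (t == null)                   // ran off the tree
        return false;
    }
    return t.isWord;
  }

  public static void main (String[] args) {
    Node root = new Node();
    add(root, "anteater");
    add(root, "anthem");
    System.out.println(isAWord(root, "anthem") + " " + isAWord(root, "anthe"));
  }
}
```

The array makes each step a single index operation, at the cost of storing
26 references in every node, even when most of them are null.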

Thus, the words "anteater" and "anthem" share the structure "ant"; then
in the children map for "ant", the key "e" leads to a DTN on the path to
"anteater" and the key "h" leads to a different DTN on the path to "anthem".
This sharing is illustrated in the picture below.

                          root
                            |a				  a -> node
                         "a" False
                            |n				  n -> node
                         "an" False (unless "an" added)
                            |t				  t -> node
                         "ant" False (unless "ant" added)
                    /e          \h			  e -> node; h -> node
             "ante" False     "anth" False
               |a               |e			  ...
             "antea" False    "anthe" False
               |t               |m			  ...
             "anteat" False   "anthem" True
               |e					  ...
             "anteate" False
               |r					  ...
             "anteater" True

Note that there is one True for every word. If we put "a" in the digital tree,
the only change would be to change the isWord from False to True in the child
of the root.
             
To search for a word, we use each of its letters, in sequence, to "get" the
next DTN, until we either run off the tree (word not present) or get to the
last letter, in which case isWord tells whether or not it is a word. The method
for this is easy to write.

That is, in the above example, looking up "anthem" would return true;
"anthemly" would return false (we run off the tree); "anthe" would return
false (its node exists, but its isWord is false); and "ant" would return true
only if "ant" had been added.

  public static boolean isAWord (DTN t, String remainingLetters) {
    if (t == null)
      return false;
    else
      if (remainingLetters.equals(""))
        return t.isWord;
      else
        return isAWord(t.children.get(remainingLetters.substring(0,1)),
                       remainingLetters.substring(1));
  }

Here remainingLetters.substring(0,1) is a String containing just the first
letter (the key to the map), and remainingLetters.substring(1) is a String
containing all but the first letter: both the tree and the String are getting
"smaller" in each recursive call: the tree getting closer to the null base
case; the String getting closer to the empty ("") base case.

Removing a word is a bit more subtle: we can search for the word and set its
isWord to false; we can actually remove the node from the tree if it has no
children, and do the same with its ancestors that are not words and have no
other children. We stop when we find a node whose isWord is true or that still
has other children; we must leave this node in the tree so that it, or its
children words, can be found. There are lots of nodes in the tree that don't
represent words.

So, in the tree above (with "ant" added), we can remove "ant" by setting its
isWord to false. We can then remove "anteater" by getting rid of nodes with the
words "anteater", "anteate", "anteat", "antea", "ante", but must stop there,
because if we deleted the node with "ant", the digital tree would not store the
word "anthem".

We can use digital trees to represent sets and maps (instead of isWord, use an
instance variable that refers to the value associated with a key) and very
quickly add/lookup/remove words, all in O(1) - or O(M) where the word we are
operating on has M letters.
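
The removal procedure described earlier (set isWord to false, then unlink
nodes that are neither words nor have children) might be sketched as follows.
This assumes java.util.HashMap for the children, omits wordToHere for
brevity, and uses the hypothetical names TrieRemoveSketch, add, remove, and
countNodes.

```java
import java.util.HashMap;
import java.util.Map;

class TrieRemoveSketch {
  static class DTN {
    boolean         isWord;
    Map<String,DTN> children = new HashMap<String,DTN>();
  }

  static void add (DTN root, String word) {
    DTN t = root;
    for (int i = 0; i < word.length(); i++)
      t = t.children.computeIfAbsent(word.substring(i, i + 1), k -> new DTN());
    t.isWord = true;
  }

  static boolean isAWord (DTN t, String remainingLetters) {
    if (t == null)
      return false;
    if (remainingLetters.equals(""))
      return t.isWord;
    return isAWord(t.children.get(remainingLetters.substring(0, 1)),
                   remainingLetters.substring(1));
  }

  // Remove the word spelled by remainingLetters from the subtree rooted at t.
  // Returns true if t is now useless (not a word and without children), so
  // that the caller can unlink it from its own children map.
  static boolean remove (DTN t, String remainingLetters) {
    if (remainingLetters.equals(""))
      t.isWord = false;
    else {
      String letter = remainingLetters.substring(0, 1);
      DTN    child  = t.children.get(letter);
      if (child != null && remove(child, remainingLetters.substring(1)))
        t.children.remove(letter);
    }
    return !t.isWord && t.children.isEmpty();
  }

  // Count all nodes in the tree, including the root.
  static int countNodes (DTN t) {
    int count = 1;
    for (DTN c : t.children.values())
      count += countNodes(c);
    return count;
  }

  public static void main (String[] args) {
    DTN root = new DTN();
    add(root, "ant");  add(root, "anteater");  add(root, "anthem");
    remove(root, "ant");        // only clears isWord
    remove(root, "anteater");   // prunes back to the node for "ant"
    System.out.println(countNodes(root) + " " + isAWord(root, "anthem"));
  }
}
```

The pruning happens as the recursion returns, so each node on the path is
examined once and the whole operation is still O(M) for a word of M letters.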

Finally, we will review the difference between a data type and a data
structure. Recall that a data type is most like an interface in Java. It
specifies the operations that one can perform, but makes no commitment as to
how the information is represented or how the operations are accomplished. A
data structure is most like a class in Java. There can be many data structures
that implement a data type, each using a different way to encode the data and
perform the operations.

Once a programmer solves a problem using data types, he/she can use any data
structure implementing them to actually run the program. All data structures
should produce the same result, but some will run faster than others, depending
on the complexity class of their operations.

At the start, this course was all about Data Types (collection classes: Queue,
Stack, PriorityQueue, List, Set, Map): we learned to solve problems using
compositions of these data types, modeling the needed data and operations; then
it became about the various ways to implement them (Arrays, Linked Lists,
Heaps, Binary Search Trees, AVL trees, Digital Trees, and coming soon Hash 
Tables, Skip Lists, etc.) where we also characterize these data structures by
the complexity classes of their operations. Once we solve problems (as in
Program #1), we can easily interchange different data structures that implement
these data types to find one that performs the best: typically we just replace
something like new ArraySet<...> by new DigitalTreeSet<...>.
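
As a small illustration of this interchangeability, the sketch below writes a
method against the Set data type only, and then runs it with two different
data structures. It uses the standard java.util.HashSet (a hash table) and
java.util.TreeSet (a balanced search tree) as stand-ins for the course's
classes; the names DataTypeSketch and countMisspelled are hypothetical.

```java
import java.util.Arrays;
import java.util.HashSet;
import java.util.List;
import java.util.Set;
import java.util.TreeSet;

class DataTypeSketch {
  // Written against the Set data type only: any data structure that
  // implements Set will produce the same answer.
  static int countMisspelled (Set<String> dictionary, List<String> text) {
    int count = 0;
    for (String word : text)
      if (!dictionary.contains(word))
        count++;
    return count;
  }

  public static void main (String[] args) {
    List<String> words = Arrays.asList("the", "an", "anteater", "anthem", "sang");
    List<String> text  = Arrays.asList("the", "anteater", "sang", "an", "anthm");

    // Only the constructor call names a data structure.
    Set<String> asHashTable = new HashSet<String>(words);
    Set<String> asTree      = new TreeSet<String>(words);

    System.out.println(countMisspelled(asHashTable, text));
    System.out.println(countMisspelled(asTree, text));
  }
}
```

Both calls give the same count; only their running times can differ.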