Special Trees

Introduction to Computer Science II
ICS-22

In this lecture we will continue our study of trees by examining a few special kinds of trees. First we will discuss heaps, which are trees with a special order and structure property: these trees are perfectly suited for implementing a fast priority queue collection class (and an O(N Log₂N) sorting method). Second we will discuss how to represent and process N-ary trees, where each parent can store any number of children; we can use N-ary trees in applications like file directories. Third, we will discuss how to represent arithmetic expressions in structure trees, including how to create such expression trees (with a stack) and evaluate them using the rules of parentheses and operator precedence. This is like writing a tiny parser for expressions, a topic covered in any course on compiler writing. Finally, time permitting, we will discuss digital trees, a special type of N-ary tree in which searches can be performed exceptionally fast: the search time depends on the length of the value being searched for, not on the number of values stored, so it is O(1) with respect to the number of values stored.

The primary goal of this lecture is to understand these special trees via pictures and algorithms (which modify pictures). We will examine some methods, actually implemented in Java, but that is not the focus of this lecture: the concepts are more important than the code.

Heaps

A heap is a binary tree with a special ordering property and a special structure property. Taken together, these two properties allow for a very efficient implementation of priority queues (in both time and space). Here we will discuss min-heaps, although priority queues (which dequeue the value with the highest priority first) use max-heaps.

Ordering Property (Min-Heap): the value stored at any node must be less than or equal to each of the values stored in its subtrees (for a Max-Heap, it must be greater than or equal to them). By transitivity, we can easily check this property by ensuring that each node stores a value that is less than or equal to the values stored in the roots of its subtrees.
Structure Property: all depths but the deepest must be filled; if the deepest depth is not filled, all its values must occur as far to the left as possible.
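As a sketch of the transitivity argument above, the following Java method checks only the ordering property of a linked binary tree; it does not check the structure property. The HeapNode class is a hypothetical representation introduced for illustration, not one used elsewhere in this lecture.

```java
// Hypothetical linked node, used only to illustrate the ordering check.
class HeapNode {
    int value;
    HeapNode left, right;

    HeapNode(int value, HeapNode left, HeapNode right) {
        this.value = value;
        this.left = left;
        this.right = right;
    }
}

class HeapOrderCheck {
    // Returns true if every node's value is <= the values at the roots of its
    // subtrees; by transitivity this implies the full min-heap ordering property.
    // (The structure property is NOT checked here.)
    static boolean isMinHeapOrdered(HeapNode t) {
        if (t == null)
            return true;
        if (t.left != null && t.left.value < t.value)
            return false;
        if (t.right != null && t.right.value < t.value)
            return false;
        return isMinHeapOrdered(t.left) && isMinHeapOrdered(t.right);
    }
}
```

Checking only parent/child pairs is enough: if each child's root is at least its parent, every deeper value is too, by following the chain of parents upward.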

We will continue examining Min-Heaps in this lecture, but everything we learn easily generalizes to Max-Heaps (which are used for priority queues). By the ordering property, the smallest value in the heap must appear at the root. The following example illustrates a heap storing int values. Notice that both the ordering property and the structure property are satisfied by this tree.

The first observation to make about heaps is that they are useless for binary searching. The ordering property doesn't help us: when searching for a value, we do not know whether to look in the left or right subtree, because the ordering property dictates only that the values in both subtrees are larger than (or equal to) the value at their parent.
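To make this concrete, here is a hedged sketch of searching a min-heap stored as linked nodes (SearchNode and HeapSearch are hypothetical names). Because there is no left/right guidance, the method may have to explore both subtrees, so it is O(N) in the worst case; the ordering property only lets it skip a subtree whose root is already larger than the target.

```java
// Hypothetical linked representation of a min-heap node.
class SearchNode {
    int value;
    SearchNode left, right;

    SearchNode(int value, SearchNode left, SearchNode right) {
        this.value = value;
        this.left = left;
        this.right = right;
    }
}

class HeapSearch {
    // Returns true if target occurs in the min-heap rooted at t.
    static boolean contains(SearchNode t, int target) {
        // Every value in this subtree is >= t.value, so the target cannot be here.
        if (t == null || t.value > target)
            return false;
        if (t.value == target)
            return true;
        // No way to choose a side: search both subtrees.
        return contains(t.left, target) || contains(t.right, target);
    }
}
```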

But the ordering and structure properties of heaps make it very easy (and quick) to add a new value and to remove the smallest value. We will discuss these two algorithms next.

The following algorithm inserts a new value into a heap, ensuring that both the ordering and structure properties are invariant: if true before the insertion, they are true after the insertion. To insert a new value in a heap:

By the structure property, the new node must be added into the tree either (a) following the right-most node at the deepest (unfilled) depth, or (b) as the left-most node at an increased depth if the tree is already perfect.
Once the value has been placed properly in the tree, by the ordering property, continually compare its value to the value in its parent: flip them when they are out of order, and then compare this value to the value of its new parent; stop when the added value has been flipped into the root or is greater than or equal to its parent.
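The insertion steps above can be sketched in Java as follows. This is a minimal illustration, not the lecture's own code: it assumes the array layout described later in this lecture (root at index 1, children of index i at 2*i and 2*i+1, parent at i/2), and the class MinHeapInsert and its helper methods are hypothetical.

```java
import java.util.Arrays;

// Sketch of min-heap insertion by percolating up. Assumes the 1-based array
// layout: root at index 1, children of index i at 2*i and 2*i+1, parent at i/2.
class MinHeapInsert {
    private int[] data = new int[8]; // data[0] is unused
    private int size = 0;            // the heap occupies data[1..size]

    void insert(int value) {
        if (size + 1 == data.length)
            data = Arrays.copyOf(data, 2 * data.length); // grow when full
        size++;
        int i = size;                // structure property: new node at index N+1
        data[i] = value;
        // Ordering property: flip with the parent while the parent is larger.
        while (i > 1 && data[i / 2] > data[i]) {
            int tmp = data[i / 2];
            data[i / 2] = data[i];
            data[i] = tmp;
            i = i / 2;               // continue from the parent's position
        }
    }

    int min() { return data[1]; }    // assumes the heap is not empty
    int size() { return size; }

    // The heap's values in array (level) order, for inspection.
    int[] contents() { return Arrays.copyOfRange(data, 1, size + 1); }
}
```

Building a heap of N values is then just N calls to insert, starting from an empty heap. For example, inserting 20, 15, 30, 10 leaves the array contents 10, 15, 30, 20: the 10 percolates from index 4 up through index 2 to the root.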

Following this algorithm, the newly added value will percolate upwards until it finds its correct resting spot. Each iteration flips a larger-valued parent with a smaller-valued child, so the ordering property continues to hold. The following example illustrates adding the value 18 to the heap depicted above.

Notice that both the ordering property and the structure property remain satisfied after performing the insertion (with the new value, 18, percolating almost to the root of the tree, but not quite). In fact, if we had inserted 10, it would have percolated upwards all the way to the root of the tree (because it would be the smallest value in the entire heap). To build a heap with N values, we would perform this insertion operation N times, starting with an empty tree.

Practice inserting a few more values until you get the general idea of the algorithm.

Note that the complexity class of inserting a value is truly O(Log₂N): by the structure property, heaps are always balanced (complete) trees, so their height is always ⌊Log₂N⌋, and the number of percolations performed is at most the height of the tree.

Next, we will examine an algorithm for removing the minimum value from a Min-Heap, ensuring that both the ordering and structure properties are invariant: if true before the removal, they are true after the removal. To remove the minimum value from a heap:

By the ordering property, the minimum value is at the root of the heap (remove this value from the root node, but leave the root node in place).
By the structure property, the node that must be deleted from the tree is the right-most node at the deepest depth; delete this node, but first put its value in the root node.
By the ordering property, continually compare this value to the values of its children: if either child is smaller, swap the value with its smaller child, and repeat this process; stop when the value is less than or equal to both of its children (or it reaches a node with no children).
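A matching sketch of removing the minimum, again assuming the 1-based array layout described later in this lecture. MinHeapRemove is a hypothetical class whose constructor simply trusts that its argument is already a valid heap listed in level order.

```java
// Sketch of removing the minimum from a min-heap stored in data[1..size].
class MinHeapRemove {
    private final int[] data; // data[0] is unused
    private int size;

    // Assumes levelOrder already satisfies both heap properties.
    MinHeapRemove(int... levelOrder) {
        data = new int[levelOrder.length + 1];
        System.arraycopy(levelOrder, 0, data, 1, levelOrder.length);
        size = levelOrder.length;
    }

    int removeMin() {
        if (size == 0)
            throw new IllegalStateException("empty heap");
        int min = data[1];        // ordering property: the minimum is at the root
        data[1] = data[size];     // structure property: move the last value to the root
        size--;                   // ...and delete the last node
        int i = 1;
        while (2 * i <= size) {   // while node i has at least a left child
            int child = 2 * i;
            if (child + 1 <= size && data[child + 1] < data[child])
                child++;          // pick the smaller child
            if (data[i] <= data[child])
                break;            // in order: stop percolating
            int tmp = data[i];
            data[i] = data[child];
            data[child] = tmp;
            i = child;
        }
        return min;
    }

    int size() { return size; }
}
```

Removing repeatedly from the heap 10, 15, 30, 20 returns 10, 15, 20, 30, i.e., the values in increasing order.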

Following this algorithm, the last value (promoted to the root) will percolate downwards until it finds its correct resting spot. Each iteration replaces a larger-valued parent with its smaller-valued child, so the ordering property continues to hold. The following example illustrates removing the minimum from the heap depicted above.

Note where the values 18, 27, and 29 have ultimately moved. Notice that both the ordering property and the structure property remain satisfied after performing the removal.

Practice removing a few more values until you get the general idea of the algorithm.

Note that the complexity class of removing the minimum value is truly O(Log₂N): by the structure property, heaps are always balanced (complete) trees, so their height is always ⌊Log₂N⌋, and the number of percolations performed is at most the height of the tree.

This means that the complexity class of enqueuing N values and then dequeuing N values (the standard way to measure the complexity class of a collection class) is O(N Log₂N) + O(N Log₂N) = O(N Log₂N). This complexity class is much better than our previous implementations, which were O(N²) (where either enqueue or dequeue was O(1) while the other was O(N)). Having each operation be O(Log₂N) (worse than O(1) but better than O(N)) balances things out better: the logarithm function grows much more like a constant than like a linear function. In fact, we can sort an array in O(N Log₂N) by the equivalent of enqueuing all its values and then dequeuing them.
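The heap-based sort just described can be sketched with java.util.PriorityQueue, which the Java library documents as a priority heap that, with no comparator, removes the least element first (so it behaves like a min-heap). Using the library class here is a substitution for illustration; PQSort is a hypothetical name.

```java
import java.util.PriorityQueue;

class PQSort {
    // Sorts by enqueuing every value into a binary heap and then dequeuing
    // them in order: N insertions and N removals, each O(Log2 N).
    static int[] sort(int[] a) {
        PriorityQueue<Integer> pq = new PriorityQueue<>(); // a min-heap by default
        for (int x : a)
            pq.add(x);
        int[] result = new int[a.length];
        for (int i = 0; i < result.length; i++)
            result[i] = pq.poll();   // poll removes and returns the smallest value
        return result;
    }
}
```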

Finally, the structure property of heaps allows us to easily store them as contiguous values in arrays, without the use of explicit child/parent references: references to left/right children and parents can be calculated via the indexes. To do this, we do the following:

Store the root of the tree at index 1 (leave index 0 unfilled).
Store the left child of the node at index i at index 2*i.
Store the right child of the node at index i at index 2*i+1.

For a heap of size N, the values are stored in an array of length N+1, at indexes 1 through N. The following example illustrates how a heap is stored in an array.

Note that the parent of any node stored at index i (for i > 1) is stored at index i/2, using integer division.

The left child of the root is stored at index 2 (its left and right children are stored at indices 4 and 5 respectively); the right child of the root is stored at index 3 (its left and right children are stored at indices 6 and 7 respectively). By continuing this process, we can observe that every index is filled in and there are no collisions (multiple values stored in the same index).
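The index arithmetic can be written as three one-line methods. The hypothetical mappingConsistent method below checks the claim above for a given N: every index from 2 to N is produced exactly once as a child, always by the parent i/2, so there are no gaps and no collisions.

```java
class HeapIndex {
    static int left(int i)   { return 2 * i; }
    static int right(int i)  { return 2 * i + 1; }
    static int parent(int i) { return i / 2; }   // integer division

    // Checks that each index 2..n is the child of exactly one node, namely i/2.
    static boolean mappingConsistent(int n) {
        boolean[] seen = new boolean[n + 1];
        for (int i = 1; i <= n; i++) {
            for (int c : new int[] {left(i), right(i)}) {
                if (c > n)
                    continue;                     // child lies beyond the heap
                if (seen[c] || parent(c) != i)
                    return false;                 // collision or wrong parent
                seen[c] = true;
            }
        }
        for (int c = 2; c <= n; c++)
            if (!seen[c])
                return false;                     // an index was never filled
        return true;
    }
}
```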

Thus, we can unambiguously store any heap of N values in an array of size N+1, storing its nodes uniquely between indexes 1 and N. As the algorithms above require, we can easily find the location of the node to add (for insertion) or node to remove (deletion): index N+1 and index N respectively.

General N-ary Trees

Binary trees store references to their left and right subtrees: each parent has exactly 0, 1, or 2 children. We will now explore a few interesting ways to generalize trees to allow each parent to store references to any number of children. Such trees are called N-ary trees, and we can use them to represent the tree structure of a file system, where every node is either a file, or a folder (that can include other files and folders).

Here is an example of an N-ary tree representing a directory tree (with folder names in pink and file names in white).

There are many ways to implement an N-ary tree structure. We could, for example, have one instance variable named children that stores a reference to a collection of children: a List if there is an important order among the children, and a Set if there is no important order. Of course, we could use an array for this information too, but using a collection class often makes things simpler.

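  import java.util.HashSet;
  import java.util.Set;

  public class FolderFile {
    public FolderFile (String s)
    {name = s; children = new HashSet<FolderFile>();}

    public String          name;
    public Set<FolderFile> children;   //a file simply has no children
  }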

Given this representation, here is a recursive method that prints the names of all the folders and files stored inside the FolderFile passed as its argument (no matter how many levels deep). It uses a combination of iteration and recursion to reach every node in the tree.

  public static void printNames (FolderFile ff)
  {
     System.out.println(ff.name);
     for (FolderFile aChild : ff.children)  //iterate over the children...
       printNames( aChild );                //...recursing on each one
  }

Note that this is a form of preorder traversal: a node is printed before the recursive call made on its children.
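As a variation on printNames, the following sketch keeps the preorder traversal but also passes the depth along, indenting each name by two spaces per level so the output shows the directory's shape. DirNode is a hypothetical class that stores its children in a List (so the order is predictable), and the output is collected in a StringBuilder rather than printed.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical node: a name plus an ordered list of children.
class DirNode {
    final String name;
    final List<DirNode> children = new ArrayList<>();

    DirNode(String name) { this.name = name; }

    DirNode add(DirNode child) { children.add(child); return this; }
}

class IndentedLister {
    // Preorder: append this node's name (indented by depth) before its children.
    static void list(DirNode n, int depth, StringBuilder out) {
        out.append("  ".repeat(depth)).append(n.name).append('\n');
        for (DirNode child : n.children)
            list(child, depth + 1, out);
    }
}
```

For a folder root containing a folder docs (holding a.txt) and a file b.txt, the output lines are root, then docs and b.txt indented one level, with a.txt indented two levels under docs. String.repeat requires Java 11 or later.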

The program Directory Lister shows how we can explore directory structures in Java, using the File class from Java's standard library. In fact, Java uses arrays to list all the files available in a folder, so they are processed in a manner similar to the code above.
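Along the lines of the Directory Lister program (whose code is not reproduced here), the following sketch walks a real directory with java.io.File. File.listFiles() returns an array for a folder and null for a plain file (or when an I/O error occurs); the entries are sorted because listFiles() guarantees no particular order. DirectoryWalk and collect are hypothetical names.

```java
import java.io.File;
import java.util.Arrays;
import java.util.List;

class DirectoryWalk {
    // Preorder walk: record this entry's name, then recurse into its entries.
    static void collect(File f, List<String> out) {
        out.add(f.getName());
        File[] entries = f.listFiles();  // null for plain files (or on I/O error)
        if (entries == null)
            return;
        Arrays.sort(entries);            // listFiles() order is unspecified
        for (File e : entries)
            collect(e, out);
    }
}
```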

Surprisingly, we can also use a standard binary tree to store an N-ary tree, by assigning the two references different meanings. The basic idea behind N-ary trees stored as binary trees is that each node refers to its first child and its next sibling. Thus, as with regular binary trees, we still define such trees using two recursive references; but their meanings, and how such trees are processed, are very different. The general form for defining N-ary tree nodes is:

  public class NTN {
    public int value;
    public NTN firstChild, sibling;   //first child below; next sibling to the right

    public NTN (int i, NTN fc, NTN s)
    {value = i; firstChild = fc; sibling = s;}
  }
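With the first-child/next-sibling representation, one recursive method can visit every node: one recursive call goes down to the first child and one goes across to the next sibling, while iterating along sibling links lists a node's direct children. The sketch below uses a hypothetical CSNode class shaped like NTN; note that count also counts the later siblings of whatever node it is given, so it should be called on a root (which has no siblings).

```java
// Hypothetical first-child/next-sibling node, shaped like NTN.
class CSNode {
    int value;
    CSNode firstChild, sibling;

    CSNode(int value, CSNode firstChild, CSNode sibling) {
        this.value = value;
        this.firstChild = firstChild;
        this.sibling = sibling;
    }
}

class CSTrees {
    // Counts t, everything below t, and every later sibling of t (and everything below them).
    static int count(CSNode t) {
        if (t == null)
            return 0;
        return 1 + count(t.firstChild) + count(t.sibling);
    }

    // Number of direct children: step once down, then follow sibling links across.
    static int childCount(CSNode t) {
        int n = 0;
        for (CSNode c = t.firstChild; c != null; c = c.sibling)
            n++;
        return n;
    }
}
```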

For example, to represent the directory tree above, we use NTNs with the following references. Notice that each node refers downward (and to the left) to its first child, and rightward to its next sibling.

For actually defining directory data, we will define the following three classes in a small inheritance hierarchy. DirectoryEntry is the superclass of both File and Folder. It supplies code for a variety of methods, including getFirstChild (which always returns null but is overridden by Folder) and getSize (which always returns 0 but is overridden by File). All directory entries can have siblings, but only Folders have children.

  public class DirectoryEntry {
    public DirectoryEntry (String name)
    {this.name = name; next = null;}

    public DirectoryEntry getFirstChild()
    {return null;}

    public DirectoryEntry getNextSibling()
    {return next;}

    public void addSibling(DirectoryEntry de)
    {
       DirectoryEntry c = this;
       for (; c.next != null; c=c.next) 
         {}
       c.next = de;
    }

    public int getSize()
    {return 0;}

    private String         name;
    private DirectoryEntry next;
  }

  
  public class File extends DirectoryEntry {
    public File (String name, int size)
    {super(name); this.size = size;}

    public int getSize()
    {return size;}

    private int size;
  }


  public class Folder extends DirectoryEntry {
    public Folder (String name)
    {super(name); firstChild = null;}

    public DirectoryEntry getFirstChild()
    {return firstChild;}

    public void addChild(DirectoryEntry de)
    {
      if (firstChild == null)
        firstChild = de;
      else
        firstChild.addSibling(de);
    }

    private DirectoryEntry firstChild;
  }

Now suppose that we wanted to compute the height of this N-ary tree (how deep one has to go to get to the deepest folder or file). We could use the following code, which combines iteration (for siblings) with recursion (for children).

  public static int height (DirectoryEntry de)
  {
    if (de == null)
       return -1;
    else {
      int maxChildHeight = -1;
      for (DirectoryEntry c=de.getFirstChild(); c!=null; c=c.getNextSibling()) {
         int childHeight = height(c);
         if (childHeight > maxChildHeight)
           maxChildHeight = childHeight;
      }
      return 1 + maxChildHeight;
    }
  }

We can write this code much more elegantly and compactly by using double recursion, although it requires two methods. First,

  public static int height (DirectoryEntry de)
  {return 1 + heightHelper(de.getFirstChild());}

where heightHelper computes the maximum height of a node or any of its subsequent siblings.

  public static int heightHelper (DirectoryEntry de)
  {
    if (de == null)
      return -1;
    else
      return Math.max( 1+heightHelper(de.getFirstChild()),
                       heightHelper(de.getNextSibling()) );
  }

Finally, if we wanted to compute the total size of all the files in a directory (like adding up the values in all the nodes of a binary tree), we could use the following code (again, two methods):

  public static int size (DirectoryEntry de)
  {
    if (de == null)
      return 0;
    else
      return de.getSize() + sizeHelper(de.getFirstChild());
  }

where sizeHelper computes the total size of a node and all of its subsequent siblings (including everything inside them).

  public static int sizeHelper (DirectoryEntry de)
  {
    if (de == null)
      return 0;
    else
      return de.getSize() + sizeHelper(de.getFirstChild()) + sizeHelper(de.getNextSibling());
  }

Expression Trees

We can also use binary trees to model Java expressions, using binary and unary operators (a unary operator's left subtree is empty and its right subtree contains the expression the operator applies to). In such trees, operators are all internal nodes and literals are all leaf nodes. The following picture illustrates an arithmetic expression written in infix form, and its expression tree.

It is interesting to observe that the Reverse Polish Notation (RPN, or postfix) translation of any expression can be computed by a postorder traversal of its tree (printing the value, operator or constant, of each node).
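To see why RPN needs no parentheses, here is a hedged sketch of evaluating an RPN string with a stack. It assumes tokens are separated by spaces (as the postfix methods below produce them) and that unary negation is written "~", as in makeET; RpnEvaluator is a hypothetical class, and "/" uses Java's integer division.

```java
import java.util.ArrayDeque;
import java.util.Deque;

class RpnEvaluator {
    // Evaluates a space-separated RPN string such as "3 4 + 2 *".
    static int evaluate(String rpn) {
        Deque<Integer> stack = new ArrayDeque<>();
        for (String tok : rpn.trim().split("\\s+")) {
            switch (tok) {
                case "~":                          // unary negation: one operand
                    stack.push(-stack.pop());
                    break;
                case "+": case "-": case "*": case "/": {
                    int right = stack.pop();       // right operand was pushed last
                    int left = stack.pop();
                    stack.push(apply(tok, left, right));
                    break;
                }
                default:                           // a literal
                    stack.push(Integer.parseInt(tok));
            }
        }
        return stack.pop();
    }

    private static int apply(String op, int l, int r) {
        switch (op) {
            case "+": return l + r;
            case "-": return l - r;
            case "*": return l * r;
            default:  return l / r;
        }
    }
}
```

For example, "10 2 8 * + 3 -" evaluates to 23: each operator consumes the two most recent results, which is exactly the order a postorder traversal emits them in.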

Note that there are no parentheses in the tree; operator precedence (and parentheses, which override precedence) determine the structure of the tree, even though they don't appear in the tree proper. Also note that by modeling an expression as a tree, we can answer certain questions about the expression by computing information on the model. For a computer with a single arithmetic unit, the time it takes to evaluate an expression is proportional to the number of internal nodes (one operation per operator). For a computer with many arithmetic units, the time is proportional to the height of the tree (because all operators at the same depth can be computed at the same time by the multiple arithmetic units).
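Both cost measures can be computed directly from the tree. This sketch uses a hypothetical ETNode class in which a literal is a node with no children; it counts operators for the single-unit case and measures the height (in edges, so a lone literal has height 0) for the many-unit case.

```java
// Hypothetical expression-tree node: an operator or literal symbol plus two subtrees.
class ETNode {
    final String symbol;       // operator or literal text
    final ETNode left, right;  // both null for a literal

    ETNode(String symbol, ETNode left, ETNode right) {
        this.symbol = symbol;
        this.left = left;
        this.right = right;
    }

    static ETNode leaf(String s) { return new ETNode(s, null, null); }
}

class ExprCost {
    static boolean isLeaf(ETNode t) { return t.left == null && t.right == null; }

    // One arithmetic unit: one step per operator, i.e., per internal node.
    static int internalNodes(ETNode t) {
        if (t == null || isLeaf(t))
            return 0;
        return 1 + internalNodes(t.left) + internalNodes(t.right);
    }

    // Many arithmetic units: one step per level of operators, i.e., the tree's height.
    static int height(ETNode t) {
        if (t == null || isLeaf(t))
            return 0;
        return 1 + Math.max(height(t.left), height(t.right));
    }
}
```

For (a+b)*(c+d) both measures differ: 3 operators but height 2, because the two additions can run at the same time. For ((a+b)+c)+d both are 3.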

We can use inheritance to model expressions easily. The classes defining expression trees (both abstract and concrete) form the following inheritance hierarchy.

At the top of this hierarchy (descended from Object) is the abstract class ExpressionTree, which is defined as follows.

  public abstract class ExpressionTree {
    public abstract int    evaluate();
    public abstract String postfix();
	
    //This is called a "Factory" Method. It constructs an expression tree
    public static ExpressionTree makeET
      (String op, ExpressionTree left, ExpressionTree right)
    {
      if (op.equals("+"))
        return new Add(left,right);
      else if (op.equals("-"))
        return new Subtract(left,right);
      else if (op.equals("*"))
        return new Multiply(left,right);
      else if (op.equals("/"))
        return new Divide(left,right);
      else if (op.equals("^"))
        return new Power(left,right);
      else if (op.equals("~"))
        return new Negate(right);
      else
        throw new IllegalArgumentException
                    ("ExpressionTree.makeET: illegal operator = " + op);
    }
  }

So the main methods, which are abstract, compute the value of an expression and produce the postfix (RPN) String representation of an expression. We can write the Constant class as a direct, concrete subclass of this one, trivially implementing the postfix and evaluate methods.

  public class Constant extends ExpressionTree {
    public Constant (int value)
    {this.value = value;}
  
    public int evaluate()
    {return value;}
	
    public String postfix()
    {return ""+value+" ";}
	
    private int value;
  }

We add another layer of abstraction when defining the simple Operator subclass (extending ExpressionTree).

  public abstract class Operator extends ExpressionTree {
    public abstract String getOpSymbol();
  }

And add a final abstract layer by defining the BinaryOperator class (the UnaryOperator class is defined similarly). Notice that the abstract postfix method is made concrete at this level of the hierarchy (the getOpSymbol method is still abstract and will be specified in a subclass). It uses a postorder traversal to compute the postfix form of the left and right subexpressions, followed by the current operator.

  public abstract class BinaryOperator extends Operator {
    public BinaryOperator (ExpressionTree left, ExpressionTree right)
    {
      this.left  = left;
      this.right = right;
    }
  
    public ExpressionTree getLeft()
    {return left;}
	
    public ExpressionTree getRight()
    {return right;}
	
    public String postfix()
    {return left.postfix() + right.postfix() + getOpSymbol() + " ";} 
	
    private ExpressionTree left,right;
  }

Finally, with this infrastructure, it is very easy to specify a new class that represents an operator, such as Multiply specified below.

  public class Multiply extends BinaryOperator {
    public Multiply (ExpressionTree left, ExpressionTree right)
    {super(left,right);}
  
    public int evaluate()
    {return getLeft().evaluate() * getRight().evaluate();}

    public String getOpSymbol()
    {return "*";}
  }

Notice that evaluation uses a postorder traversal of the tree: it recursively evaluates the left and right subtrees (their classes determine how they evaluate their values), and then performs the arithmetic operation specified by this node.
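The lecture notes that UnaryOperator is defined like BinaryOperator. The following self-contained sketch condenses the classes above and fills in one plausible UnaryOperator and a Negate subclass; both are assumptions, written to match makeET, which passes a unary operator only its right subtree and uses the symbol "~". The makeET factory is omitted for brevity.

```java
// Condensed versions of the classes above, plus a sketch of UnaryOperator/Negate.
abstract class ExpressionTree {
    public abstract int evaluate();
    public abstract String postfix();
}

class Constant extends ExpressionTree {
    private final int value;
    Constant(int value) { this.value = value; }
    public int evaluate() { return value; }
    public String postfix() { return value + " "; }
}

abstract class Operator extends ExpressionTree {
    public abstract String getOpSymbol();
}

// Sketch: a unary operator keeps a single subtree (makeET passes it the right one).
abstract class UnaryOperator extends Operator {
    private final ExpressionTree operand;
    UnaryOperator(ExpressionTree operand) { this.operand = operand; }
    public ExpressionTree getOperand() { return operand; }
    // Postorder: the operand's postfix form, then this operator's symbol.
    public String postfix() { return operand.postfix() + getOpSymbol() + " "; }
}

class Negate extends UnaryOperator {
    Negate(ExpressionTree operand) { super(operand); }
    public int evaluate() { return -getOperand().evaluate(); }
    public String getOpSymbol() { return "~"; }
}

abstract class BinaryOperator extends Operator {
    private final ExpressionTree left, right;
    BinaryOperator(ExpressionTree left, ExpressionTree right) {
        this.left = left;
        this.right = right;
    }
    public ExpressionTree getLeft() { return left; }
    public ExpressionTree getRight() { return right; }
    public String postfix() { return left.postfix() + right.postfix() + getOpSymbol() + " "; }
}

class Multiply extends BinaryOperator {
    Multiply(ExpressionTree left, ExpressionTree right) { super(left, right); }
    public int evaluate() { return getLeft().evaluate() * getRight().evaluate(); }
    public String getOpSymbol() { return "*"; }
}
```

With these classes, the tree for ~3 * 4 prints its postfix form as "3 ~ 4 * " and evaluates to -12.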

Download Expression Trees for the entire specification of these classes, and a driver program that uses a StringTokenizer and Stack to translate infix expressions (using operator precedence and parentheses) into expression trees, and then print their postfix form and evaluate the expression tree.
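The downloadable driver is not reproduced here, but the idea can be sketched with two stacks: one of pending operators and one of already-built subtrees. Whenever the operator on top of the stack has higher precedence than the incoming one (or equal precedence, for the left-associative operators), it is popped and combined with the top two subtrees. This sketch assumes well-formed input with non-negative integer literals and no unary minus, treats ^ as right-associative, and uses hypothetical class names; it does use StringTokenizer, asking it to return the delimiters as tokens.

```java
import java.util.ArrayDeque;
import java.util.Deque;
import java.util.StringTokenizer;

// Hypothetical tree node: a literal has no children; an operator has two.
class InfixNode {
    final String symbol;
    final InfixNode left, right;

    InfixNode(String symbol, InfixNode left, InfixNode right) {
        this.symbol = symbol;
        this.left = left;
        this.right = right;
    }

    int evaluate() {
        if (left == null)
            return Integer.parseInt(symbol);              // literal
        int l = left.evaluate(), r = right.evaluate();    // postorder
        switch (symbol) {
            case "+": return l + r;
            case "-": return l - r;
            case "*": return l * r;
            case "/": return l / r;
            default: {                                    // "^", assumes r >= 0
                int p = 1;
                for (int i = 0; i < r; i++)
                    p *= l;
                return p;
            }
        }
    }

    String postfix() {
        if (left == null)
            return symbol + " ";
        return left.postfix() + right.postfix() + symbol + " ";
    }
}

class InfixParser {
    private static int prec(String op) {
        switch (op) {
            case "+": case "-": return 1;
            case "*": case "/": return 2;
            default: return 3;                            // "^"
        }
    }

    // Pops one operator and its two operands, pushing the combined subtree.
    private static void reduce(Deque<String> ops, Deque<InfixNode> operands) {
        String op = ops.pop();
        InfixNode right = operands.pop();
        InfixNode left = operands.pop();
        operands.push(new InfixNode(op, left, right));
    }

    static InfixNode parse(String infix) {
        Deque<String> ops = new ArrayDeque<>();
        Deque<InfixNode> operands = new ArrayDeque<>();
        StringTokenizer st = new StringTokenizer(infix, "+-*/^() ", true);
        while (st.hasMoreTokens()) {
            String tok = st.nextToken();
            if (tok.equals(" "))
                continue;
            if (tok.equals("(")) {
                ops.push(tok);
            } else if (tok.equals(")")) {
                while (!ops.peek().equals("("))
                    reduce(ops, operands);
                ops.pop();                                // discard the "("
            } else if ("+-*/^".contains(tok)) {
                while (!ops.isEmpty() && !ops.peek().equals("(")
                       && (prec(ops.peek()) > prec(tok)
                           || (prec(ops.peek()) == prec(tok) && !tok.equals("^"))))
                    reduce(ops, operands);
                ops.push(tok);
            } else {
                operands.push(new InfixNode(tok, null, null));
            }
        }
        while (!ops.isEmpty())
            reduce(ops, operands);
        return operands.pop();
    }
}
```

For example, parsing "1 + 2 * 3" yields a tree whose postfix form is "1 2 3 * + " and whose value is 7, while "(1 + 2) * 3" yields "1 2 + 3 * " and 9.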

We could augment this program by using a Map to store associations between variables and values, so that we could add a Variable class to the hierarchy above. Then we could add the = operator, and evaluate expressions in the context of the variable map; at that point we have built an interpreter for a simple calculator.
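A minimal sketch of that extension, using a hypothetical Expr interface whose evaluate method receives the variable map. Variable looks its name up in the map, and Assign plays the role of the = operator: it evaluates its right-hand side, stores the result under the variable's name, and returns it. These class names are illustrative, not part of the downloadable code.

```java
import java.util.Map;

// Hypothetical mini-hierarchy: every node evaluates in the context of a variable map.
interface Expr {
    int evaluate(Map<String, Integer> env);
}

class Const implements Expr {
    private final int value;
    Const(int value) { this.value = value; }
    public int evaluate(Map<String, Integer> env) { return value; }
}

class Variable implements Expr {
    private final String name;
    Variable(String name) { this.name = name; }
    public int evaluate(Map<String, Integer> env) {
        Integer v = env.get(name);
        if (v == null)
            throw new IllegalStateException("unbound variable: " + name);
        return v;
    }
}

class Sum implements Expr {
    private final Expr left, right;
    Sum(Expr left, Expr right) { this.left = left; this.right = right; }
    public int evaluate(Map<String, Integer> env) {
        return left.evaluate(env) + right.evaluate(env);
    }
}

// The "=" operator: bind the value of the right-hand side to the name, then return it.
class Assign implements Expr {
    private final String name;
    private final Expr value;
    Assign(String name, Expr value) { this.name = name; this.value = value; }
    public int evaluate(Map<String, Integer> env) {
        int v = value.evaluate(env);
        env.put(name, v);
        return v;
    }
}
```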

Digital Trees
(aka Tries)

I will discuss these in class.
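As a preview of the idea described in the introduction, here is a hedged sketch of a digital tree: each node has one child slot per letter, so a search follows one link per character of the word and never compares against the other stored words. This Trie class is a hypothetical illustration and assumes words use only the lowercase letters a to z.

```java
// Sketch of a digital tree (trie) over the lowercase letters 'a'..'z' only.
class Trie {
    private static final int R = 26;            // one child slot per letter
    private final Trie[] children = new Trie[R];
    private boolean isWord;                     // true if a word ends at this node

    void insert(String word) {
        Trie node = this;
        for (int i = 0; i < word.length(); i++) {
            int c = word.charAt(i) - 'a';
            if (node.children[c] == null)
                node.children[c] = new Trie();
            node = node.children[c];
        }
        node.isWord = true;
    }

    // Follows one link per character: the cost depends on word.length(),
    // not on how many words are stored.
    boolean contains(String word) {
        Trie node = this;
        for (int i = 0; i < word.length(); i++) {
            node = node.children[word.charAt(i) - 'a'];
            if (node == null)
                return false;
        }
        return node.isWord;
    }
}
```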

Problem Set

To ensure that you understand all the material in this lecture, please solve the announced problems after you read the lecture.

If you get stumped on any problem, go back and read the relevant part of the lecture. If you still have questions, please get help from the Instructor, a TA, or any other student.

None Yet.
