General and Special Trees

In this lecture we will look at variations of simple binary trees (beyond BSTs, Heaps, and AVL trees). Specifically, we will see N-ary trees, for storing nodes that have an arbitrary number of children; expression trees, for storing formulas; quadtrees, for storing pictures; and digital trees, for storing dictionaries (and Sets and Maps). In these trees we will see examples of how to store a node's children by using multiple references (just as in binary trees) as well as by using the set, list, and map collection classes.

N-ary Trees with Sets and Lists:

To start, we will examine a simple way to store N-ary trees: trees whose nodes can have any number (N) of children. Examples of information that we can represent in N-ary trees are inheritance hierarchies (each class can be extended by any number of classes, but each class has a unique superclass) and file structures (each folder can contain an arbitrary number of files and other folders). To store the latter we might declare

  public class FolderOrFile {
    public FolderOrFile (String name, String contents, boolean isFolder) {
      this.name = name;
      if (isFolder)
        children = new ArraySet<FolderOrFile>();
      else
        this.contents = contents;
    }

    public String            name;
    public String            contents;  //Null for folders: see children
    public Set<FolderOrFile> children;  //Null for files  : see contents
  }

which can represent an arbitrary file structure. An example of how we can process such a structure, say to print all the names inside a folder (including names inside folders inside that folder, etc.), is shown below: basically it implements a preorder traversal of an N-ary tree.

  public static void printNames (FolderOrFile ff) {
    if (ff == null)
      return;
    System.out.println(ff.name);
    if (ff.children != null)       //is this a folder, not a file?
      for (FolderOrFile f : ff.children)
        printNames(f);
  }

Actually, assuming we start at a non-null root folder, printNames is never called with null: recursive calls are made only on the members of a children set. A file has no children by definition, so its children instance variable is null and the loop is skipped. Another way to build such a structure is to have all folders and files refer to a set of children, where some of those sets have size() == 0, meaning the for loop executes 0 times (its body is never executed).

If we wanted to sort the children, or keep them in a specific (alphabetical?) order, we could use a List collection instead of a Set collection to do the job. The for-each loop in the code above for printNames works regardless of the kind of collection we use to implement children, so long as the collection is Iterable.

N-ary Trees embedded in Binary Trees:

Interestingly enough, we can use a strange kind of binary tree (it has enough structure) to store all the information in an N-ary tree. We would define such a class as follows

  public class NTN {
    public String name;
    public NTN    firstChild;   //First for a linked list of children
    public NTN    nextSibling;  //Next in linked list of siblings

    public NTN (String s, NTN fc, NTN ns)
      {name = s; firstChild = fc; nextSibling = ns;}
  }

Here we use the two "recursive" references differently than in a binary tree (where they refer to left and right subtrees) and differently than in a doubly-linked list (where they refer to the previous and next nodes). We have each node refer to its next sibling and to its first child (and of course, we can use the first child to reach all its siblings by following their nextSibling references, and so eventually find all the children of its parent). Note that the root node will have no siblings, but all other nodes can have siblings (but don't have to).
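
As a minimal sketch of the first-child/next-sibling representation described above, the code below builds a small, made-up directory and lists the children of a node by following nextSibling references. The class SiblingDemo, the helper childrenOf, and the folder names are assumptions made for illustration; NTN is restated (with its constructor parameters typed as NTN) only so the block compiles on its own.

```java
import java.util.ArrayList;
import java.util.List;

class SiblingDemo {
  // Hypothetical helper: collect the names of n's children by starting at
  // n.firstChild and walking along the nextSibling references.
  static List<String> childrenOf(NTN n) {
    List<String> names = new ArrayList<>();
    for (NTN c = n.firstChild; c != null; c = c.nextSibling)
      names.add(c.name);
    return names;
  }

  // Made-up directory: root contains docs, pics, and readme; docs contains a and b.
  static NTN sample() {
    NTN b      = new NTN("b", null, null);
    NTN a      = new NTN("a", null, b);        // a's next sibling is b
    NTN readme = new NTN("readme", null, null);
    NTN pics   = new NTN("pics", null, readme);
    NTN docs   = new NTN("docs", a, pics);     // docs's first child is a
    return new NTN("root", docs, null);        // the root has no siblings
  }

  public static void main(String[] args) {
    NTN root = sample();
    System.out.println(childrenOf(root));            // prints [docs, pics, readme]
    System.out.println(childrenOf(root.firstChild)); // prints [a, b]
  }
}

// Restated from the notes so the sketch is self-contained.
class NTN {
  String name;
  NTN firstChild;   // first in a linked list of children
  NTN nextSibling;  // next in the linked list of siblings
  NTN(String s, NTN fc, NTN ns) { name = s; firstChild = fc; nextSibling = ns; }
}
```

Notice that a node never stores a collection of children: only two references per node are needed, no matter how many children it has.
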
It is interesting to think about how a node with two references can be used to represent three such different structures as doubly-linked lists, binary trees, and N-ary trees, each with a very different use of its references than the others. The web reading page shows the equivalent of a small directory represented with this kind of data structure.

As a translation of the example above, we can write code that will print all the names inside an NTN tree (again it implements a preorder traversal of an N-ary tree). Here we print the name for a node, and then all the names reachable from its children.

  public static void printNames (NTN n) {
    if (n == null)
      return;
    System.out.println(n.name);
    for (NTN r=n.firstChild; r!=null; r=r.nextSibling)
      printNames(r);
  }

In fact, we can rewrite this method purely recursively.

  public static void printNames (NTN n) {
    if (n == null)
      return;
    System.out.println(n.name);
    printNames(n.firstChild);
    printNames(n.nextSibling);
  }

Here we really need to check the base case n == null, to terminate the recursion (as we checked r!=null in the for loop above). Each node prints its own name, all the names reachable from its first child, and all the names reachable from its next sibling (which will include the next sibling's children).

It would also be useful to have a parent reference in the NTN class, so that every node could reach its unique parent.

Quad Trees:

We next will briefly discuss Quad trees, which we can use to represent pictures that are rendered (drawn) on the screen NOT top to bottom, but from fuzzy pictures to clear pictures. For pictures that contain large areas of the same color, a Quad tree can store a picture more compactly than an array of pixels. In a Quad tree, each node has either 0 or 4 children.
We can represent a quad tree to store a picture by

  public class QTN {
    public int   size;      //size x size
    public int   avgRed;    //of all pixels it contains
    public int   avgGreen;
    public int   avgBlue;
    public QTN[] children;  //Always 0 or 4 children

    public QTN (int s, int r, int g, int b)
      {size = s; avgRed = r; avgGreen = g; avgBlue = b;}
  }

Each Quad tree node represents a square of size^2 pixels. If the entire square has a uniform color, then the "avg" instance variables store its red, green, and blue components, and the children field stores null. (Actually, even if the square is not one uniform color, the "avg" instance variables store the average amount of red, green, and blue.) If the square is not a uniform color, the children divide the square into four quadrants (numbered 0-3) and each child represents one quadrant of its parent, whose vertical and horizontal size is 1/2 that of its parent (each of the 4 quadrants is 1/2 * 1/2 = 1/4 the size of its parent).

  +--------+--------+
  |        |        |
  |    0   |    1   |
  |        |        |
  +--------+--------+
  |        |        |
  |    2   |    3   |
  |        |        |
  +--------+--------+

This process of creating children to represent the picture continues, at each level breaking its picture into four 1/4-sized pictures, until a child QTN is all one color. This might be because the quadrant is a single pixel (a base case), or it might be because the quadrant is a square of pixels with a uniform color.

By using a Quad tree we can render a picture, not so much top to bottom in full detail, but by refining each quadrant, quadrants in a quadrant, etc. until the entire picture is rendered. In this way the picture starts out as a blur at the root, but with the right approximate color distribution, and the picture gets sharper and sharper as we process each subtree (breadth first) in the picture, filling in more of its details. This is accomplished with a breadth first traversal of the Quad tree, where we render each quadrant at depth 0, then each quadrant at depth 1, then each quadrant at depth 2, etc.
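
The breadth first rendering order described above can be sketched with a queue. This is only an illustration under stated assumptions: "rendering" is simulated by recording each node's size and average red value in the order it would be drawn, the class QuadDemo, the method renderOrder, and the sample 4x4 picture are made up, and QTN is restated so the block compiles on its own.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.List;
import java.util.Queue;

class QuadDemo {
  // Visit the nodes breadth first: every node at depth d is recorded before
  // any node at depth d+1. Each entry is "size:avgRed" for the node drawn.
  static List<String> renderOrder(QTN root) {
    List<String> drawn = new ArrayList<>();
    Queue<QTN> queue = new ArrayDeque<>();
    queue.add(root);
    while (!queue.isEmpty()) {
      QTN q = queue.remove();
      drawn.add(q.size + ":" + q.avgRed);
      if (q.children != null)            // non-uniform square: refine it later
        for (QTN child : q.children)
          queue.add(child);
    }
    return drawn;
  }

  // Made-up 4x4 picture: quadrants 1-3 are uniformly black; quadrant 0 is
  // subdivided into four single pixels (two fully red, two black).
  static QTN sample() {
    QTN q0 = new QTN(2, 127, 0, 0);
    q0.children = new QTN[] { new QTN(1, 255, 0, 0), new QTN(1, 255, 0, 0),
                              new QTN(1, 0, 0, 0),   new QTN(1, 0, 0, 0) };
    QTN root = new QTN(4, 31, 0, 0);
    root.children = new QTN[] { q0, new QTN(2, 0, 0, 0),
                                new QTN(2, 0, 0, 0), new QTN(2, 0, 0, 0) };
    return root;
  }

  public static void main(String[] args) {
    // prints [4:31, 2:127, 2:0, 2:0, 2:0, 1:255, 1:255, 1:0, 1:0]
    System.out.println(renderOrder(sample()));
  }
}

// Restated from the notes so the sketch is self-contained.
class QTN {
  int size, avgRed, avgGreen, avgBlue;
  QTN[] children;                        // null, or exactly 4 children
  QTN(int s, int r, int g, int b) { size = s; avgRed = r; avgGreen = g; avgBlue = b; }
}
```

The first entry is the blurry whole picture; later entries refine only the squares that are not a uniform color.
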
Each increase in depth can increase the number of squares drawn by a factor of 4 (doubling the resolution in each dimension). Obviously, storing all the information in this tree can take more space than just storing all the pixels, but the space requirements can be less if many small (or a few large) squares store the same color.

Expression Trees:

Compilers first parse a program (using the syntax of the language) by converting it into a tree representing its syntax. In this section we will use expression trees to represent and process expressions. The web reading shows how this is naturally done, by adding a special subclass for each operator in an inheritance hierarchy. A program you can download uses a Stack to translate an expression into such a tree (and evaluate what value the tree represents). Here we are going to use just a single class to store and process such trees (simpler but not as elegant, and not as easy to extend to expressions with more operators).

  public class ExprTree {
    public ExprTree (String oOV, ExprTree left, ExprTree right)
      {opOrVal = oOV; this.left = left; this.right = right;}

    public String   opOrVal;      //Either something like "+" or "3"
    public ExprTree left, right;  //non-null for operators (refer to operands)
  }

The trees are drawn just like the ones shown in the web reading, with the operator or value in the opOrVal instance variable, and if not a value, references to the subexpressions. Note that when converting an expression to an ExprTree, the later an operator is applied, the higher it appears in the tree. So, we have the following examples.

  2 + 3 * 5        (2+3) * 5        1 + 2 + 3

      +                *                +
     / \              / \              / \
    2   *            +   5            +   3
       / \          / \              / \
      3   5        2   3            1   2

Notice in the last example, because + is "left associative", the first + is evaluated before the second one. To be completely accurate in drawing expression trees, you must follow associativity rules for equal precedence operators. Note that there are no null expression trees: the smallest expression tree contains a value: e.g., new ExprTree("3",null,null).
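
To make the three pictures above concrete, here is a small sketch that builds each tree with the ExprTree constructor and prints it back fully parenthesized, so the grouping implied by the tree's shape is visible. The class ExprDemo and the helpers leaf, toInfix, first, second, and third are names invented for this illustration; ExprTree is restated so the block compiles on its own.

```java
class ExprDemo {
  // Hypothetical convenience: a value node has no subexpressions.
  static ExprTree leaf(String v) { return new ExprTree(v, null, null); }

  // Print a tree with explicit parentheses around every operator application.
  static String toInfix(ExprTree t) {
    if (t.left == null && t.right == null)
      return t.opOrVal;
    return "(" + toInfix(t.left) + t.opOrVal + toInfix(t.right) + ")";
  }

  // 2 + 3 * 5: * is applied first, so it is lower in the tree than +.
  static ExprTree first() {
    return new ExprTree("+", leaf("2"), new ExprTree("*", leaf("3"), leaf("5")));
  }

  // (2+3) * 5: the parenthesized + is applied first.
  static ExprTree second() {
    return new ExprTree("*", new ExprTree("+", leaf("2"), leaf("3")), leaf("5"));
  }

  // 1 + 2 + 3: + is left associative, so the left + is applied first.
  static ExprTree third() {
    return new ExprTree("+", new ExprTree("+", leaf("1"), leaf("2")), leaf("3"));
  }

  public static void main(String[] args) {
    System.out.println(toInfix(first()));   // prints (2+(3*5))
    System.out.println(toInfix(second()));  // prints ((2+3)*5)
    System.out.println(toInfix(third()));   // prints ((1+2)+3)
  }
}

// Restated from the notes so the sketch is self-contained.
class ExprTree {
  String opOrVal;        // either something like "+" or "3"
  ExprTree left, right;  // non-null for operators (refer to operands)
  ExprTree(String oOV, ExprTree left, ExprTree right) {
    opOrVal = oOV; this.left = left; this.right = right;
  }
}
```
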
We will discuss in class that, for a uniprocessor, the number of time steps it takes to evaluate such an expression is equal to the number of internal nodes in the tree; for a multiprocessor it is the height of the tree (using the multiprocessor to simultaneously evaluate all nodes at a depth, going upward).

Let us see some simple recursive code to evaluate an expression. Note that we can recognize an ExprTree as a base case value if null is stored in both its left and right subexpressions. Note that if we introduced unary operators, the left subexpression for one would be null but the right subexpression would refer to the operand of the unary operator.

  public int evaluate (ExprTree t) {
    if (t.left == null && t.right == null)     //Leaf (a value)?
      return Integer.parseInt(t.opOrVal);      //return int equivalent of String
    else {
      if (t.opOrVal.equals("+"))
        return evaluate(t.left) + evaluate(t.right);
      if (t.opOrVal.equals("-"))
        return evaluate(t.left) - evaluate(t.right);
      if (t.opOrVal.equals("*"))
        return evaluate(t.left) * evaluate(t.right);
      if (t.opOrVal.equals("/"))
        return evaluate(t.left) / evaluate(t.right);
      throw new IllegalOperatorException(t.opOrVal);
    }
  }

This code performs a postorder traversal of a tree: computing the values of the left and right subtrees, then using knowledge of the operator to return the correct result for the operator at the root of the subtree.

We could rename opOrVal to opOrValOrVar; the value "r" stored in this instance variable would refer to the variable r. To compute the value of the variable, we could use a Map[String -> String], so the keys in this map are variable names and the values in this map are the values stored in these variables.
Such a map is called an "environment"; we can rewrite our evaluate code as follows (let's assume we write methods called isValue and isVariable)

  public int evaluate (ExprTree t, Map<String,String> env) {
    if (t.left == null && t.right == null) {
      if (isValue(t.opOrValOrVar))                           //Value
        return Integer.parseInt(t.opOrValOrVar);             //return int equivalent of String
      if (isVariable(t.opOrValOrVar) &&                      //Variable
          env.containsKey(t.opOrValOrVar))                   //Return its value if in env
        return Integer.parseInt(env.get(t.opOrValOrVar));
      throw new IllegalVariableException(t.opOrValOrVar);
    }else {
      if (t.opOrValOrVar.equals("+"))
        return evaluate(t.left,env) + evaluate(t.right,env);
      if (t.opOrValOrVar.equals("-"))
        return evaluate(t.left,env) - evaluate(t.right,env);
      if (t.opOrValOrVar.equals("*"))
        return evaluate(t.left,env) * evaluate(t.right,env);
      if (t.opOrValOrVar.equals("/"))
        return evaluate(t.left,env) / evaluate(t.right,env);
      throw new IllegalOperatorException(t.opOrValOrVar);
    }
  }

Problem: What if we allowed the = operator (which changes the value associated with a variable and results in that new value); how could we update evaluate to include the semantics/meaning of this other operator? We could add the following code

  if (t.opOrValOrVar.equals("=")) {
    String var = t.left.opOrValOrVar;          //left operand must be a variable
    if (!isVariable(var) || !env.containsKey(var))
      throw new IllegalVariableException(var);
    else {
      int rightValue = evaluate(t.right,env);
      env.put(var,rightValue+"");
      return rightValue;
    }
  }

Digital Trees:

Finally, we will examine Digital Trees (also known as "tries", pronounced as in the word reTRIEval, so just like "tree"). As an example we can use digital trees to store a dictionary of words for a spelling correction utility. The standard way to store such a structure would be as a Set of legal words. So far, the most efficient way we have seen to store such a collection is an AVL tree. Recall that an AVL tree is a special kind of BST that is guaranteed to be well balanced, so the time to search for a word would be at worst O(Log N).
Using a digital tree, we can reduce the complexity of many of its important operations (add, remove, and lookup) to O(1) with respect to N, the number of words stored! The time is the same regardless of how many words are stored in the tree; it depends not on how many words are in the tree, but only on how many letters are in the word being processed. So, we might say if a word contained M letters the time is O(M). Technically, since each comparison in the AVL tree might require looking at all M letters (String comparisons are letter by letter comparisons until there is a letter mismatch or one of the Strings runs out of letters), we would then have to list the AVL tree's complexity, if using the same metric, as O(M Log N). So digital trees do save a factor of Log N.

A digital tree means the value that we want to look up can be broken down into its "digits". For example, to look up the integer 153 we look at the digit 1, then the digit 5, then the digit 3; likewise, to look up the String "yes" we look at the "digit" "y", then the "digit" "e", then the "digit" "s". So, the characters are the "digits" for a String.

To store a digital tree for processing Strings, we might represent it as follows. Note here, the children are collected in a Map, with each key a String (really just one char, which we will represent as a String) and each value another DTN.

  public class DTN {
    public boolean         isWord;
    public String          wordToHere;
    public Map<String,DTN> children;

    public DTN (boolean iw, String wth)
      {isWord = iw; wordToHere = wth; children = new ArrayMap<String,DTN>();}
  }

How do we add a word? We always start with a root DTN (whose isWord is false and whose wordToHere is ""). It represents a word of 0 characters (of which there aren't any!). To add a word, say "ant", we start at the root. If its children map contains the first letter, "a", we find the value DTN associated with this letter: a subtree containing all the words starting with the digit "a". We then repeat this process again from there, with the next letter.
If we get to the end of the word and we have not needed to create a new DTN node to reach that spot, we change isWord of that node to be true. If at any time a node's children map does not contain the letter we need, we put that new letter into the map (with a value that is a DTN node with isWord false and wordToHere containing all the letters used to reach it) and follow it. So, if the root's children map DOES NOT contain the first letter, "a", we add to that map a key of "a" with a value DTN whose isWord is false and whose wordToHere is ""+"a" (the wordToHere of its parent, extended by its letter). We use this node to represent the subtree whose children are all the words starting with "a". Then we repeat this process with all subsequent letters: for the last node constructed, we set its isWord to true (since we have processed all the letters in a word).

Note that each map will contain at most 26 entries, one for each possible letter in a word (we'll assume only lowercase letters; of course we could use both cases and increase the map's size to at most 52). In fact, we could use an array to store these 26 references: to look up character c we use the array indexed at c-'a' (Java implicitly converts c into an int, then subtracts the value of 'a' converted to an int: 'a'-'a' = 0, 'b'-'a' = 1, ... 'z'-'a' = 25).

Thus, the words "anteater" and "anthem" share the structure "ant"; then in the children map for "ant", the key "e" leads to a DTN on the path to "anteater" and the key "h" leads to a different DTN on the path to "anthem". This sharing is illustrated in the picture below.

                  root
                   |a                          a -> node
               "a" False
                   |n                          n -> node
               "an" False   (unless "an" added)
                   |t                          t -> node
               "ant" False  (unless "ant" added)
              /e          \h                   e -> node; h -> node
     "ante" False       "anth" False
          |a               |e                  ...
     "antea" False      "anthe" False
          |t               |m                  ...
     "anteat" False     "anthem" True
          |e                                   ...
     "anteate" False
          |r                                   ...
     "anteater" True

Note that there is one True for every word.
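
The add procedure described above can be sketched as follows. This is one possible rendering, not the course's implementation: it uses java.util.HashMap in place of the ArrayMap class mentioned in the notes, and the class TrieAddDemo and the methods add, countNodes, countWords, and sample are hypothetical names. DTN is restated so the block compiles on its own.

```java
import java.util.HashMap;
import java.util.Map;

class TrieAddDemo {
  // Follow (creating where needed) one node per letter, then mark the last
  // node reached as a word. The time depends only on word.length().
  static void add(DTN root, String word) {
    DTN t = root;
    for (int i = 0; i < word.length(); i++) {
      String letter = word.substring(i, i + 1);
      DTN child = t.children.get(letter);
      if (child == null) {               // letter missing: extend the tree
        child = new DTN(false, t.wordToHere + letter);
        t.children.put(letter, child);
      }
      t = child;
    }
    t.isWord = true;
  }

  static int countNodes(DTN t) {         // every node, including the root
    int n = 1;
    for (DTN child : t.children.values())
      n += countNodes(child);
    return n;
  }

  static int countWords(DTN t) {         // nodes whose isWord is true
    int n = t.isWord ? 1 : 0;
    for (DTN child : t.children.values())
      n += countWords(child);
    return n;
  }

  static DTN sample() {
    DTN root = new DTN(false, "");
    add(root, "anteater");
    add(root, "anthem");
    add(root, "ant");                    // creates no new nodes
    return root;
  }

  public static void main(String[] args) {
    DTN root = sample();
    System.out.println(countNodes(root)); // prints 12
    System.out.println(countWords(root)); // prints 3
  }
}

// Restated from the notes, with HashMap assumed in place of ArrayMap.
class DTN {
  boolean isWord;
  String wordToHere;
  Map<String, DTN> children = new HashMap<>();
  DTN(boolean iw, String wth) { isWord = iw; wordToHere = wth; }
}
```

Adding "ant" after "anteater" only flips isWord on an existing node, which is why the node count stays at 12 (the root plus 11 letter nodes) while the word count rises to 3.
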
If we put "a" in the digital tree, the only change would be to change isWord from False to True in the child of the root.

To search for a word, we use each of its letters, in sequence, to "get" the next DTN, until we either run off the tree (word not present) or get to the last letter, in which case isWord tells whether or not it is a word. That is, in the above example, looking up "ant" would return true (assuming "ant" was added); "anthemly" would return false; "anthe" would return false. The method for this is easy to write.

  public static boolean isAWord (DTN t, String remainingLetters) {
    if (t == null)
      return false;
    else if (remainingLetters.equals(""))
      return t.isWord;
    else
      return isAWord(t.children.get(remainingLetters.substring(0,1)),
                     remainingLetters.substring(1));
  }

Here remainingLetters.substring(0,1) is a String containing just the first letter (the key to the map), and remainingLetters.substring(1) is a String containing all but the first letter: both the tree and the String are getting "smaller" in each recursive call: the tree getting closer to the null base case; the String getting closer to the empty ("") base case.

Removing a word is a bit more subtle: we can search for the word and set its isWord to false; we can actually remove the node from the tree if it has no children, and do the same with its ancestors that are not words, until we find a node whose isWord is true or that still has other children; from that point on we must leave this node in the tree so that it, or the words below it, can still be found. There are lots of nodes in the tree that don't represent words. So, in the tree above, we can remove "ant" by setting its isWord to false. We can then remove "anteater" by getting rid of nodes with the words "anteater", "anteate", "anteat", "antea", "ante", but must stop there, because if we deleted the node with "ant", the digital tree would not store the word "anthem".
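
The removal rule described above (clear isWord, then prune childless non-word nodes on the way back up) can be sketched recursively. As before, this is an illustration under assumptions rather than the course's code: java.util.HashMap stands in for ArrayMap, and TrieRemoveDemo, add, remove, countNodes, and sample are hypothetical names.

```java
import java.util.HashMap;
import java.util.Map;

class TrieRemoveDemo {
  static void add(DTN root, String word) {
    DTN t = root;
    for (int i = 0; i < word.length(); i++) {
      String letter = word.substring(i, i + 1);
      DTN child = t.children.get(letter);
      if (child == null) {
        child = new DTN(false, t.wordToHere + letter);
        t.children.put(letter, child);
      }
      t = child;
    }
    t.isWord = true;
  }

  static boolean isAWord(DTN t, String remaining) {
    if (t == null) return false;
    if (remaining.equals("")) return t.isWord;
    return isAWord(t.children.get(remaining.substring(0, 1)), remaining.substring(1));
  }

  // Returns true when t is now useless (not a word, no children), which
  // tells the caller to delete its reference to t. The result for the root
  // itself is simply ignored by callers.
  static boolean remove(DTN t, String remaining) {
    if (remaining.equals(""))
      t.isWord = false;
    else {
      String letter = remaining.substring(0, 1);
      DTN child = t.children.get(letter);
      if (child == null)
        return false;                    // word not present: prune nothing
      if (remove(child, remaining.substring(1)))
        t.children.remove(letter);
    }
    return !t.isWord && t.children.isEmpty();
  }

  static int countNodes(DTN t) {
    int n = 1;
    for (DTN child : t.children.values())
      n += countNodes(child);
    return n;
  }

  static DTN sample() {
    DTN root = new DTN(false, "");
    add(root, "ant"); add(root, "anteater"); add(root, "anthem");
    return root;
  }

  public static void main(String[] args) {
    DTN root = sample();
    remove(root, "ant");                   // only clears isWord
    System.out.println(countNodes(root));  // prints 12
    remove(root, "anteater");              // prunes "anteater" up through "ante"
    System.out.println(countNodes(root));  // prints 7
    System.out.println(isAWord(root, "anthem")); // prints true
  }
}

// Restated from the notes, with HashMap assumed in place of ArrayMap.
class DTN {
  boolean isWord;
  String wordToHere;
  Map<String, DTN> children = new HashMap<>();
  DTN(boolean iw, String wth) { isWord = iw; wordToHere = wth; }
}
```

Pruning stops at "ant" here because that node still has the child "h", even though its own isWord was already cleared.
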
We can use digital trees to represent sets and maps (instead of isWord, use an instance variable that refers to the value associated with a key) and very quickly add/lookup/remove words, all in O(1) with respect to the number of words stored, or O(M) where the word we are operating on has M letters.

Finally, we will review the difference between a data type and a data structure. Recall that a data type is most like an interface in Java. It specifies the operations that one can perform, but makes no commitment as to how the information is represented or how the operations are accomplished. A data structure is most like a class in Java. There can be many data structures that implement a data type, each using a different way to encode the data and perform the operations. Once a programmer solves a problem using data types, he or she can use any data structure implementing them to actually run the program. All data structures should produce the same result, but some will run faster than others, depending on the complexity class of their operations.

At the start, this course was all about Data Types (collection classes: Queue, Stack, PriorityQueue, List, Set, Map): we learned to solve problems using compositions of these data types, modeling the needed data and operations; then it became about the various ways to implement them (Arrays, Linked Lists, Heaps, Binary Search Trees, AVL trees, Digital Trees, and coming soon Hash Tables, Skip Lists, etc.) where we also characterize these data structures by the complexity classes of their operations. Once we solve problems (as in Program #1), we can easily interchange different data structures that implement these data types to find one that performs the best: typically we just replace something like new ArraySet<...> by new DigitalTreeSet<...>.
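
A small sketch of this interchangeability follows. It uses the standard java.util.Set interface with HashSet and TreeSet as stand-ins, since the course's ArraySet and DigitalTreeSet classes are not reproduced here; the class SwapDemo and the method countDistinct are hypothetical.

```java
import java.util.HashSet;
import java.util.Set;
import java.util.TreeSet;

class SwapDemo {
  // Written against the Set data type only: any data structure that
  // implements Set can be passed in, and the answer must be the same.
  static int countDistinct(Set<String> empty, String[] words) {
    for (String w : words)
      empty.add(w);
    return empty.size();
  }

  public static void main(String[] args) {
    String[] words = { "ant", "anthem", "ant", "anteater" };
    System.out.println(countDistinct(new HashSet<>(), words)); // prints 3
    System.out.println(countDistinct(new TreeSet<>(), words)); // prints 3
  }
}
```

Only the expression after new changes between the two calls; the problem-solving code in countDistinct is untouched.
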