Using Collection Classes

Introduction: In this lecture we will discuss the six standard generic data types/collection classes: Stack, Queue, PriorityQueue, List, Set, and Map. You can read about their details in the Javadoc that I have written for these classes (see the "Javadoc of Collections API" link on the course web page). For now, focus on the information in the Interface Summary, which uses Interfaces and Subinterfaces to define all the operations applicable to these data types. We will pay special attention to the iterator() method and how to use the Iterator object that it produces.

In the next lecture we will focus on how to write concrete classes (using inheritance and abstract classes) that define data structures implementing these data types. We will study some of the simple Array data structure implementations of these data types: specifically the constructors in these classes and the code implementing their defined operations. We will also take a brief look at the complexity classes of their methods (and expand on this topic later in the quarter).

This pair of lectures is followed by Programming Assignment #1, which asks you to represent and solve various problems using combinations of these data types. In that assignment, you will focus on understanding/exploiting the data types, using the implementations that I have provided. In assignments #2, #3, and #4, you will reimplement some of these data types using more sophisticated and efficient data structures. In the final assignment, you will once again use these data types (and their efficient implementations) to implement a much more complicated data type: a graph.

After we learn more formally about using analysis of algorithms to study the resource use (performance) of different data structures implementing data types, we will have covered the three major topics in this course. Please take some time and effort to understand the difference between data types and data structures.
Six Data Types/Collection Classes: Today we will examine the six most important data types in three groups, in the following order, based on the similarity of their operations:

  (1) Stack, Queue, PriorityQueue
  (2) List and Set
  (3) Map (and Map.Entry: a trivial data type related to Maps)

We will start discussing iterators in detail with the second group. Although iterators are available in all these collection classes, they are most frequently used with groups 2-3. I suggest that you open Eclipse and write some tiny programs that import these interfaces and classes

  import edu.uci.ics.pattis.ics23.collections.*;

and construct an object from one class and use it to call a few methods, to test out the knowledge you gain.

These six interfaces are arranged into two inheritance hierarchies. Here each interface that defines at least one method (Stack, Queue, and Set define none) is followed by the number of methods defined in that interface (in parentheses).

                  Iterable(1)                          Map(18)
                 /           \
  OrderedCollection(14)       Collection(17)
    /     |      \               /      \
  Stack Queue PriorityQueue(1)  Set    List(8)

----------
Stacks, Queues, and Priority Queues

Stack, Queue, and PriorityQueue are similar, because they are all data types that store their elements in a predefined order. That is, when we remove elements from these collections, the order of removal is determined by the data type of the collection. The orders are: Last-In-First-Out (LIFO, for a stack), First-In-First-Out (FIFO, for a queue), and Highest-Priority-First-Out (for a priority queue).

The interfaces for Stack, Queue, and PriorityQueue are "trivial": their operations are mostly inherited from the OrderedCollection interface, which all three extend. Only the interface for PriorityQueue specifies another operation, named "merge", which requires another PriorityQueue as a parameter. So, let us start by looking at the OrderedCollection interface.
Note that this is a generic interface, with <E> specifying the type of the elements to be placed in this collection, and E used when specifying some of the prototypes of the methods: e.g., add takes an element of type E; remove returns an element of type E. Note that if we specified E to be Object, then we could store references to objects of any type in this collection. I will mostly refer to elements in collections by their simpler name: values. Also notice that this interface extends Iterable (look up this interface by clicking on it in the OrderedCollection Javadoc to see that it just requires that classes implementing this interface include an "iterator" method).

  public interface OrderedCollection<E> extends Iterable<E> {
    //Commands
    public boolean add    (E e);
    public boolean addAll (Iterable<E> es);
    public E       remove () throws NoSuchElementException;
    public void    clear  ();

    //Queries
    public E       peek    () throws NoSuchElementException;
    public boolean isEmpty ();
    public int     size    ();

    //Miscellaneous
    public Iterator<E>          iterator    ();
    public E[]                  toArray     ();
    public OrderedCollection<E> newEmpty    ();
    public OrderedCollection<E> shallowCopy ();

    //Inherited (should override)
    public String  toString ();
    public boolean equals   (Object o);
    public int     hashCode ();
  }

In all my interfaces, I classify methods into two main categories: Commands (also known as "mutators"), which change/mutate the data structure of the objects they act on; and Queries (also known as "accessors"), which examine, but do not change, the data structure of the objects they act on. Methods in either category can return a result. I also specify two other classifications, Miscellaneous and Inherited methods, which are all Queries/accessors: every data structure must implement the former methods; every data structure inherits but SHOULD OVERRIDE the latter methods. We will see implementations of these methods in the classes (both abstract and concrete) that implement these data types. Let's examine each of these methods more closely.
To a large degree, the operations here have almost identical meanings for the Stack, Queue, and PriorityQueue data types: the big difference is in how their remove methods work.

Commands: The "add" method adds (includes) a value into the collection. We don't need to understand here "how" this is accomplished by data structures that implement this interface, but only that it will be correctly accomplished, setting up for "remove" to remove the correct value according to which data type is used. Add returns a boolean result: true if adding the value changes the collection. For SQPQs (Stacks, Queues, and PriorityQueues), this method always returns true, because we can add duplicate values into these collections. Contrast this property with the Set collection, which allows NO DUPLICATES. For sets, sometimes calling "add" will return false (the value to be added was already in the set, so the set remains unchanged) and sometimes will return true (the value to be added was NOT already in the set, so the set changes).

The "addAll" method iterates through all the values produced by its parameter, adding each one into the collection. It also returns a boolean result: true if any added value changed the collection. Note that "addAll" may return false for SQPQs, but only if the iterable parameter produces NO VALUES; if it produces even one value, that value will be added to the SQPQ and change it. Note that because all but the Map interface inherit from Iterable, we can add all the values of a (non-Map) collection into any of these collections. So, if we want to add all the values from a Queue q into a Stack s, we can write s.addAll(q): Java will iterate through every value in q, adding each of these values to the stack s.

The "remove" method removes (discards) the next value from the collection, also returning that value: stacks remove the most recently added value; queues remove the least recently added value; priority queues remove the value with the highest priority.
The add and remove methods work together to accomplish this requirement. Finally, note that the remove operation can fail if there are no values in the collection to remove (size() is 0 and isEmpty() is true): in such cases it indicates its failure by throwing the NoSuchElementException.

The "clear" method removes all the values currently in the SQPQ; it is a void method, so it returns nothing. Calling size() or isEmpty() after calling "clear" will return a result of 0 or true respectively.

Queries: The "peek" method returns the same value as would calling the remove method (or throws the same exception) but DOES NOT REMOVE that value from the collection. Recall that queries DO NOT change the data structure of the object they act on. So calling peek a second time returns the same value as calling peek the first time (if no commands are called in between).

The "size" method returns the number of values currently in the SQPQ. Generally the add method increments this value by 1 and the remove method decrements it by 1 (if there is at least one value in the SQPQ).

The "isEmpty" method returns whether or not the SQPQ contains no values. It is a convenient boolean method equivalent to testing whether size() == 0.

Miscellaneous: The "iterator" method returns an iterator object that we can use to produce all the values stored in a collection. See the Iterator class, and its three methods hasNext, next, and remove, which we will discuss in more detail (along with a special "for loop" for iterators) when we discuss Lists and Sets. Note that the order of values produced by an iterator of a SQPQ is the SAME ORDER in which the values in these collections would be returned by repeatedly calling the remove method.
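The removal orders and the failure of remove on an empty collection can be tried out in a small sketch. It does not use the course's collection classes: as an assumption made only for illustration, it substitutes java.util.ArrayDeque and java.util.PriorityQueue, whose method names and conventions differ in places (noted in the comments). The class name RemovalOrderDemo and its method names are hypothetical.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.List;
import java.util.NoSuchElementException;
import java.util.PriorityQueue;

class RemovalOrderDemo {
    // Stack: the most recently added value is removed first (LIFO).
    // java.util.ArrayDeque names these operations push/pop instead of add/remove.
    static List<String> stackOrder(String... values) {
        ArrayDeque<String> stack = new ArrayDeque<>();
        for (String v : values) stack.push(v);
        List<String> removed = new ArrayList<>();
        while (!stack.isEmpty()) removed.add(stack.pop());
        return removed;
    }

    // Queue: the least recently added value is removed first (FIFO).
    static List<String> queueOrder(String... values) {
        ArrayDeque<String> queue = new ArrayDeque<>();
        for (String v : values) queue.add(v);
        List<String> removed = new ArrayList<>();
        while (!queue.isEmpty()) removed.add(queue.remove());
        return removed;
    }

    // Priority queue: removal order depends on the values, not on insertion order.
    // Assumption to note: java.util.PriorityQueue removes the SMALLEST value under
    // its ordering, so with natural String order "a" is removed before "b".
    static List<String> priorityOrder(String... values) {
        PriorityQueue<String> pq = new PriorityQueue<>();
        for (String v : values) pq.add(v);
        List<String> removed = new ArrayList<>();
        while (!pq.isEmpty()) removed.add(pq.remove());
        return removed;
    }

    // Removing from an empty collection fails by throwing NoSuchElementException.
    static boolean removeFromEmptyThrows() {
        try {
            new ArrayDeque<String>().remove();
            return false;
        } catch (NoSuchElementException e) {
            return true;
        }
    }

    public static void main(String[] args) {
        System.out.println(stackOrder("a", "b", "c"));    // [c, b, a]
        System.out.println(queueOrder("a", "b", "c"));    // [a, b, c]
        System.out.println(priorityOrder("b", "c", "a")); // [a, b, c]
        System.out.println(removeFromEmptyThrows());      // true
    }
}
```

The same three values go in, and only the data type decides the order in which they come out.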
The "toArray" method returns a reference to an array that is exactly big enough (length == size()) to store all the values currently in the collection, which is filled with the values in this collection in the SAME ORDER as these values would be returned by repeatedly calling the remove method.

The "newEmpty" method returns an empty SQPQ, from the same class as the collection it was called on. So, if the type of object that q refers to is an ArrayQueue, calling q.newEmpty() returns an empty ArrayQueue. Likewise, if the type of object that q refers to is a LinkedQueue, calling q.newEmpty() returns an empty LinkedQueue.

The "shallowCopy" method returns a new SQPQ that is filled with the same values as the collection it was called on. The copy is SHALLOW, meaning that the two ordered collections SHARE the same objects; such elements must be mutated carefully (or not at all). So, if the type of q is ArrayQueue, calling q.shallowCopy() returns an ArrayQueue storing the same values. A deep copy would make a copy of the collection object, as well as copies of all the objects stored in the collection. I'll try to draw a picture in class illustrating the difference between a shallow and deep copy. If I forget, ask me to do so.

Inherited (but should be overridden): The "toString" method returns a String representation of a SQPQ collection. It includes any useful information about the concrete data structure that implements this ordered collection: e.g., for an array, it includes the length of the array and the number of indices used in the array (always <= the length).

The "equals" method returns whether or not the Object o stores the same collection that it was called on: meaning the same data TYPE (it is not necessary to be the same data STRUCTURE), with the same number of values, appearing in the same order (because order is important in SQPQs).
So given

  Stack s  = new ArrayStack();
  Queue q1 = new ArrayQueue();
  Queue q2 = new LinkedQueue();

calling s.equals(q1) will always return false; calling q1.equals(q1) will always return true; calling q1.equals(q2) can return either true or false, depending on whether the two queues store .equals values in the same order. It makes no difference that q1 refers to an object from the ArrayQueue class and q2 refers to an object from the LinkedQueue class.

We will return to discuss the meaning and purpose of the "hashCode" method later in the quarter, when we discuss hashing.

Constructors (not in Interfaces, but in classes implementing Interfaces): Interfaces don't specify constructors, but every class implementing an interface must specify at least one constructor. For the SQPQ classes, there will be constructors that

  (1) Construct an empty SQPQ
  (2) Construct a SQPQ initialized by values in an array
  (3) Construct a SQPQ initialized by values in an iterable object

For 2-3, the SQPQ has values added from the array in order from the lowest to the highest index, and from the iterable object in the order it is iterated through. Of special note in all priority queue constructors is a parameter representing an object implementing the Comparator interface, which the priority queue uses to determine the relative priority of its values (and thus knows how to remove the highest-priority value). The Comparator interface contains just the "compare" method.

Simple Uses of Ordered Collections

Now we have examined the meanings of the individual operations for SQPQ. We still must learn how to organize operations to get jobs done, which we will start now. Programming Assignment #1 deals with this topic in much more depth. First, by convention when we use collection classes, we will declare the type of variables using INTERFACES, and construct object references to store in these variables from CLASSES implementing these interfaces.
Recall that there are three kinds of variables in Java: local variables, parameter variables, and instance variables. Thus, we might write

  PriorityQueue<Integer> q = new ArrayPriorityQueue<Integer>(priorityComparator);

and then write lots of code that manipulates q. Of course, this code will consist solely of calling the methods specified in the PriorityQueue interface, and the OrderedCollection interface that it extends. If at a later time we find (and we will!) that the HeapPriorityQueue class provides a PriorityQueue implementation that is much faster for most operations, we will need to change only the right side of this one line in our code (selecting a different PriorityQueue implementation)

  PriorityQueue<Integer> q = new HeapPriorityQueue<Integer>(priorityComparator);

while leaving unchanged all the code that manipulates q. Since the HeapPriorityQueue implementation supports exactly the same methods as the ArrayPriorityQueue (specified by the PriorityQueue and OrderedCollection interfaces), whatever code we wrote for q will still compile in Java and run correctly as well, and most likely compute its answers much faster.

Finally, the use of the generic type parameter <Integer> ensures that the "add" method can add references only to Integer objects. Attempting to "add" a reference to an object of any other class results in a compiler error. So Java ensures that only references to Integer objects are stored in q, because it is declared PriorityQueue<Integer>. Likewise, the "remove"/"peek" methods are guaranteed to return a reference to an Integer object, so we don't need to cast to this class to call any Integer methods. So, by using a generic type parameter, the Java compiler ensures that only references to the specified type of object are put into and removed/peeked from the collection.

Here are a few small examples of such code. Suppose that we have a String[] x and we want to reverse the order in which the values occur. We could easily use a Stack for this job.
  Stack<String> s = new ArrayStack<String>();
  for (int i=0; i<x.length; i++)
    s.add(x[i]);
  for (int i=0; i<x.length; i++)
    x[i] = s.remove();

We could replace the declaration and the first loop by the single declaration

  Stack<String> s = new ArrayStack<String>(x);

because one constructor for ArrayStack takes a String[] as a parameter. Likewise, if we wanted to sort an array of Strings, we could use similar code:

  PriorityQueue<String> p = new ArrayPriorityQueue<String>(new AlphabeticalOrder());
  for (int i=0; i<x.length; i++)
    p.add(x[i]);
  for (int i=0; i<x.length; i++)
    x[i] = p.remove();

where the class AlphabeticalOrder is defined by

  class AlphabeticalOrder implements Comparator<String> {
    public int compare(String s1, String s2)
      {return -s1.compareTo(s2);}
  }

Note for example that we want "A" to have a higher priority than "B" (so it will be removed from the priority queue first, and be put earlier in the array). But "A".compareTo("B") returns a negative number (because "A" is less than "B" in the alphabetical order), so we negate the result, and thus return a positive number, giving "A" the higher priority. Of course, we could have used an anonymous class for this purpose as well.

Finally, it would be possible in the first and second examples to specify

  OrderedCollection<String> s = new ArrayStack<String>();

and

  OrderedCollection<String> p = new ArrayPriorityQueue<String>(new AlphabeticalOrder());

and then -IN BOTH CASES- follow these declarations by the SAME CODE. Notice the operations performed on s and p are identical, just .add and .remove, and both methods are specified in the OrderedCollection interface, which the Stack and PriorityQueue interfaces inherit. I use the Stack, Queue, and PriorityQueue interfaces for the types of variables so that, by looking at a variable's declaration, I can understand the meaning of the .add and .remove methods when applied to that variable.

----------
Lists and Sets

Lists and Sets are similar, because they are data types that primarily allow us to add and remove elements, and query whether those elements are present or absent (something that we cannot easily/efficiently do with SQPQs).
The primary difference between these data types is that Lists store their elements at specific integer indices that we can query and manipulate (like arrays), while Sets have no concept of indices (and later we will see that we can exploit the "unorderedness" of Sets to implement some Set operations very efficiently). Also recall that while a List can store the same element at different indices, a Set cannot have duplicate elements (which is why its "add" method can sometimes return true and sometimes return false).

The interface for Set is "trivial": it inherits all its operations from the Collection interface. The interface for List adds another half a dozen operations to the ones it inherits from the Collection interface; these added operations all concern themselves with the INDICES of elements to examine and/or manipulate. So, let us start by looking at the Collection interface.

Note that this one is also a generic interface, with <E> specifying the type of the elements to be placed in this collection, and E used when specifying some of the prototypes of the methods. Again too, if we specified E to be Object, then we could store objects of any type in this collection. I will continue to refer to elements in collections mostly by their simpler name: values. Also notice that this interface extends Iterable (look up this interface by clicking on it in the OrderedCollection Javadoc to see that it just requires that classes implementing this interface include an "iterator" method).
  public interface Collection<E> extends Iterable<E> {
    //Commands
    public boolean add       (E e);
    public boolean addAll    (Iterable<E> es);
    public boolean remove    (E o);
    public boolean removeAll (Iterable<E> es);
    public boolean retainAll (Collection<E> c);
    public void    clear     ();

    //Queries
    public boolean contains    (Object o);
    public boolean containsAll (Iterable<E> es);
    public boolean isEmpty     ();
    public int     size        ();

    //Miscellaneous
    public Iterator<E>   iterator    ();
    public E[]           toArray     ();
    public Collection<E> newEmpty    ();
    public Collection<E> shallowCopy ();

    //Inherited (should override)
    public String  toString ();
    public boolean equals   (Object o);
    public int     hashCode ();
  }

Let's examine some of these methods more closely.

Commands: The "add" method adds (includes) a value into the collection. We don't need to understand here "how" this is accomplished by data structures that implement this interface, but only that it will be correctly accomplished. When we add a value to a List, it is stored at an index one higher than the last used index (and becomes the last used index). Sets don't have indices, so we say just that the value is included in the Set. Add for a List always returns true (the List is always changed) but add for a Set returns true only if the value was not already in the Set; if the value is already in the Set, the Set doesn't change. Regardless of whether we use a List or a Set, the contains query returns true for any value that we have added to (but not yet removed from) a collection.

The "addAll" method iterates through all the values produced by its parameter, adding each one into the collection. It also returns a boolean result: true if any added value changed the collection. Note that because all but the Map interface inherit from Iterable, we can add all the values of a (non-Map) collection into these collections.

The "remove" method removes (discards) one instance of the specified object from the collection and returns true if it was successful (it found and removed that value).
Sets just have one instance of each value, but Lists can have many (in which case only one is removed: the first one in the list, the one with the lowest index). Note that for an OrderedCollection, the order of the collection determines which value to remove; for a Collection (List and Set) we must specify a parameter that is the value to remove.

The "removeAll" method iterates through all the values produced by its parameter, removing each one from the collection. It also returns a boolean result: true if any removed value changed the collection (i.e., it found and removed any value).

The "retainAll" method iterates through all the values produced by its parameter, retaining only those values in the collection (removing all others). Thus, the collection will consist of only those values produced by the iterator of c that were originally in the collection. It also returns a boolean result: true if the collection changed (any value was removed).

The "clear" method removes all the values currently stored in the collection; it is a void method, so it returns nothing. Calling size() or isEmpty() after calling "clear" will return a result of 0 or true respectively.

Queries: The "contains" method returns whether or not the specified object is stored somewhere in the collection (at most once for Sets; at least once for Lists).

The "containsAll" method returns whether or not all the objects produced by its specified parameter are in the collection.

The "size" method returns the number of values currently in the collection.

The "isEmpty" method returns whether or not the collection contains no values. It is a convenient boolean method equivalent to testing whether size() == 0.

Miscellaneous: The "iterator" method returns an iterator object that we can use to produce all the values stored in a collection.
See the Iterator class, and its three methods hasNext, next, and remove, which we will discuss in more detail (along with a special "for loop" for iterators) after we discuss Lists below. Note that the order of values produced by an iterator of a List is the order that the values are stored in the list: the value at index 0, followed by the value at index 1, etc., up to index size()-1. The order of values produced by an iterator of a Set is undefined (and will depend on the class implementing this interface): its elements are produced in no predictable order, so do not assume a Set iterator produces values in any interesting order; in fact, every time an iterator is created, it can iterate through the values in a DIFFERENT order! We can make few assumptions about the order.

The "toArray" method returns a reference to an array that is exactly big enough (length == size()) to store all the values currently in the collection, which is filled with the values in this collection. In a List, these will be in order from index 0 through index size()-1; in a Set, the order is undefined (and will depend on the class implementing this interface: it uses an iterator to fill in the array values).

The "newEmpty" method returns an empty collection, from the same class as the collection it was called on.

The "shallowCopy" method returns a new collection that is filled with the same values as the collection it was called on. The copy is SHALLOW, meaning that the two collections SHARE the same objects; such elements must be mutated carefully (or not at all). A deep copy would make a copy of the collection object, as well as copies of all the objects it contains.

Inherited (but should be overridden): The "toString" method returns a String representation of a collection.
It includes any useful information about the concrete data structure that implements this collection: e.g., for an array, it includes the length of the array and the number of indices used in the array (always <= the length).

The "equals" method returns whether or not the Object o stores the same collection that it was called on: meaning the same data TYPE (it is not necessary to be the same data STRUCTURE), with the same values. For Lists this means a List with the same number of values, with the values appearing in the same order (because order is important). For Sets this means a Set with the same size and same values (order is not important).

We will return to discuss the meaning and purpose of the "hashCode" method later in the quarter, when we discuss hashing.

Constructors (not in Interfaces): Interfaces don't specify constructors, but every class implementing an interface must specify at least one constructor. For the List and Set classes, there will be constructors that

  (1) Construct an empty List and Set
  (2) Construct a List and Set initialized by values in an array
  (3) Construct a List and Set initialized by values in an iterable object

For 2-3, the Set/List has values added from the array in order from the lowest to the highest index, and from the iterable object in the order it is iterated through. In Lists the values are stored in this same order; in Sets there is no order.

Simple Uses of Sets

Continuing with our convention, here is some code that fills a Set with 5 different String values, gotten by prompting the user.

  Set<String> s = new ArraySet<String>();
  int count = 0;
  while (count < 5) {
    String attempt = Prompt.forString("Enter a String");
    if (! s.contains(attempt) ) {
      s.add(attempt);
      count++;
    }
  }

First, notice that we don't need a local variable to count the number of values in the Set (it has a query for that). So we can simplify this code to be

  Set<String> s = new ArraySet<String>();
  while (s.size() < 5) {
    String attempt = Prompt.forString("Enter a String");
    if (! s.contains(attempt) )
      s.add(attempt);
  }

Second, notice that if we add a String that is already in the Set, the Set remains unchanged. So, we don't need to test whether it is already contained in s before adding it. Thus, we can further simplify this code to be

  Set<String> s = new ArraySet<String>();
  while (s.size() < 5) {
    String attempt = Prompt.forString("Enter a String");
    s.add(attempt);
  }

Finally, we don't really need the variable attempt (now that its value is used in just one place), so we can simplify this code to be

  Set<String> s = new ArraySet<String>();
  while (s.size() < 5)
    s.add(Prompt.forString("Enter a String"));

Here is another interesting equivalence.

  Set<String> s = new ArraySet<String>();
  int successfulRemoves = 0;
  ...
  String value = Prompt.forString("Enter String to try to remove");
  if (s.contains(value)) {
    s.remove(value);
    successfulRemoves++;
  }

Notice that because remove returns a boolean, we can simplify this code to be

  Set<String> s = new ArraySet<String>();
  int successfulRemoves = 0;
  ...
  String value = Prompt.forString("Enter String to remove");
  if (s.remove(value))
    successfulRemoves++;

The more you think about and practice using the methods in these classes, the simpler and more elegant your code will become.

----------
List (only)

The List interface extends the Collection interface with the following methods, each of which specifies an index as one of its parameters.
  public interface List<E> extends Collection<E> {
    //Commands
    public boolean add    (int index, E e)            throws IndexOutOfBoundsException;
    public E       remove (int index)                 throws IndexOutOfBoundsException;
    public E       set    (int index, E e)            throws IndexOutOfBoundsException;
    public boolean addAll (int index, Iterable<E> es) throws IndexOutOfBoundsException;

    //Queries
    public E       get         (int index) throws IndexOutOfBoundsException;
    public int     indexOf     (E e);
    public int     lastIndexOf (E e);
    public List<E> subList     (int startIndex, int stopIndex) throws IndexOutOfBoundsException;
  }

Commands (all involving indexes): The "add" method adds (includes) the specified value into the List at the specified index, moving the values at that index and all subsequent indexes up by one. So long as this method doesn't throw the IndexOutOfBoundsException (the specified index must be in the List or one beyond the end) it always returns true and increments the size by 1.

The "remove" method removes (discards) a value from the List at the specified index, moving the values one beyond that index and all subsequent indexes down by one. It returns the removed value. So long as this method doesn't throw the IndexOutOfBoundsException (the specified index must be in the List) it always decrements the size by 1.

The "set" method sets (replaces) the specified index in the list to store the specified value, MOVING NO OTHER VALUES. It returns the value originally stored at that index. The size doesn't change, but the method may throw the IndexOutOfBoundsException (the specified index must be in the List).

The "addAll" method adds (includes) all the values produced by an iterator on its parameter starting at the specified index, moving the values at that index and all subsequent indexes up by the number of values added.
So long as this method doesn't throw the IndexOutOfBoundsException (the specified index must be in the List or one beyond the end) and the iterator produces one or more values, it always returns true and increments the size by the number of values added.

Queries (all involving indexes): The "get" method returns the value stored in the List at the specified index, throwing the IndexOutOfBoundsException only if the specified index is not in the list.

The "indexOf" method returns the number that is the lowest index that stores the specified value if it is present in the List, and -1 (never a legal index) if it is not present in the List.

The "lastIndexOf" method returns the number that is the highest index that stores the specified value if it is present in the List, and -1 (never a legal index) if it is not present in the List. Note that if indexOf(e) == lastIndexOf(e) then there are no duplicates of the specified value.

The "subList" method returns a new List consisting of all values stored between the specified indices (inclusive). The original list remains unchanged (which is why this method is a query). The lowest index in the returned List is 0 (just like any List). This method throws the IndexOutOfBoundsException if either index is not in the List.

Note that an array is just like a List with only a get/set method:

  a[i]              is equivalent to   a.get(i)
  a[i] = something  is equivalent to   a.set(i,something)

Note that although the syntax needed to retrieve and store a value in a List is more cumbersome than the syntax needed for an array, there are lots of other methods built into Lists that perform more complex List manipulation, which would require writing additional statements if we were using arrays. Therefore, it is typically more convenient to use Lists than arrays, even if doing the simple operations looks a bit more complicated.
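The index-based commands and queries above can be exercised in a short sketch. As an assumption made only for illustration, it uses java.util.ArrayList rather than the course's List classes; the method names match, but java.util's subList excludes its stop index and returns a view of the original list, so the hypothetical helper inclusiveSubList adapts it to the inclusive, new-List meaning described above. The class name ListIndexDemo and its helper names are also hypothetical.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

class ListIndexDemo {
    // True when value appears in list exactly once: indexOf finds the lowest
    // index, lastIndexOf the highest, and they agree only if there are no duplicates.
    static <E> boolean occursExactlyOnce(List<E> list, E value) {
        int first = list.indexOf(value);
        return first != -1 && first == list.lastIndexOf(value);
    }

    // New List holding the values at indices start..stop INCLUSIVE.
    // java.util's subList excludes its end index, hence stop + 1; copying the
    // view into a fresh ArrayList leaves the original list untouched.
    static <E> List<E> inclusiveSubList(List<E> list, int start, int stop) {
        return new ArrayList<>(list.subList(start, stop + 1));
    }

    public static void main(String[] args) {
        List<String> l = new ArrayList<>(Arrays.asList("a", "b", "c", "b"));

        l.add(1, "x");                  // values at index 1 and beyond move up by one
        System.out.println(l);          // [a, x, b, c, b]

        String old = l.set(0, "z");     // replaces index 0, moving no other values
        System.out.println(old);        // a

        String removed = l.remove(1);   // values beyond index 1 move down by one
        System.out.println(removed);    // x
        System.out.println(l);          // [z, b, c, b]

        System.out.println(l.indexOf("b"));            // 1
        System.out.println(l.lastIndexOf("b"));        // 3
        System.out.println(l.indexOf("q"));            // -1
        System.out.println(occursExactlyOnce(l, "c")); // true
        System.out.println(inclusiveSubList(l, 1, 2)); // [b, c]
    }
}
```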
For more proof, also see the Collections (notice it is plural) class, which defines lots of static methods that operate on List (and other kinds of) values. For example, there is a method to sort a List and a method to shuffle (put in random order) a List, etc. Of course, it makes no sense to sort a Set because there is no order of values in a Set.

----------
Iterators: We will now begin our discussion of iterators, which applies in some form to each of the six collection classes. We can use an iterator to produce every value stored in a collection, and in the process also remove selected values from the collection (without calling the remove method on the collection, but instead calling the remove method on the iterator). We will examine the first (and simpler) use first. When we call iterator(), it produces an object from a class that implements the Iterator interface (much like a StringTokenizer, but for collections not Strings).

  public interface Iterator<E> {
    public boolean hasNext();
    public E       next   () throws NoSuchElementException;
    public void    remove () throws IllegalStateException, UnsupportedOperationException;
  }

To produce every value stored in a collection, we need to call just the first two methods. Here is how to print every String in a Set s; we could write the exact same code if s were a Stack, Queue, PriorityQueue, or List. Note that a Set (unlike the others) has no intrinsic order, and a Set (unlike the others) gives no way other than this to examine its elements (no parameterless "remove" like a SQPQ or "get" like a List).

  Iterator<String> is = s.iterator();
  while (is.hasNext())
    System.out.println( is.next() );

Note that our code should always check/call hasNext before calling next; if next is called when hasNext returns false, then next will throw NoSuchElementException. Students often forget this important rule. Also note that every time you call next it produces a new value.
So the boolean expression

  x.equals(is.next()) || y.equals(is.next())

DOES NOT check whether the next value produced by the iterator is equal to x or y: it checks whether the next value produced by the iterator is equal to x, or the value after that one is equal to y. Two calls are made to is.next(), with each returning a different result. Even if we checked is.hasNext() first, this expression might throw an exception, because the second call to next might not be able to produce a value.

What code could we write to print a random value from a Set, by using its iterator? This is easy for a List: generate a random number in the range [0, s.size()) and call get with that random number; but a Set has no indices. Hint: still generate the random number but do something different with it, involving an iterator.

We can make use of a for loop and be a bit clever here and write some more condensed code (combining the iterator declaration and check)

  for (Iterator<String> is = s.iterator(); is.hasNext(); /* see body */)
    System.out.println( is.next() );

Java 5 introduced an even more special iterator-related for loop. It is called the FOR-EACH loop. If s is an object implementing the Iterable interface (meaning only that it contains an iterator() method, which Sets do) we can write the code above as simply as

  for (String e : s)
    System.out.println( e );

Read this as, "for every String e when we iterate through Set s, print e". Generally, when c is an object from a class that contains an iterator() method returning Iterator<T>, we can write

  for (T e : c)
    body using e

which Java translates into the following harder-to-read code

  for (Iterator<T> hiddenI = c.iterator(); hiddenI.hasNext(); /* see body */) {
    final T e = hiddenI.next();
    body using e
  }

We cannot refer to the name hiddenI in our code: it is a secret name that only the Java compiler knows about.
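The rule that each call to next produces a new value suggests a simple habit: call next exactly once per loop iteration and store the result in a local variable. The sketch below shows that habit with an explicit iterator and then with the equivalent FOR-EACH loop. It assumes java.util collections as a stand-in for the course library; the class name NextOnceDemo and both method names are hypothetical.

```java
import java.util.Arrays;
import java.util.HashSet;
import java.util.Iterator;
import java.util.Set;

public class NextOnceDemo {
    // Counts the values equal to x or y, calling next() exactly once per value.
    public static int countMatches(Iterable<String> c, String x, String y) {
        int count = 0;
        for (Iterator<String> i = c.iterator(); i.hasNext(); /* see body */) {
            String v = i.next();              // the only call to next in this iteration
            if (x.equals(v) || y.equals(v))   // both tests examine the same value
                count++;
        }
        return count;
    }

    // The same computation written with a FOR-EACH loop; the iterator is hidden.
    public static int countMatchesForEach(Iterable<String> c, String x, String y) {
        int count = 0;
        for (String v : c)
            if (x.equals(v) || y.equals(v))
                count++;
        return count;
    }

    public static void main(String[] args) {
        Set<String> s = new HashSet<>(Arrays.asList("a", "b", "c"));
        System.out.println(countMatches(s, "a", "c"));         // 2
        System.out.println(countMatchesForEach(s, "a", "c"));  // 2
    }
}
```

Because the FOR-EACH form binds each produced value to a name, it makes the "two calls to next" mistake impossible to write.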
So, if we need to produce all the values stored in an Iterable collection class (all but Map are), and we do not need to remove any values, then we should prefer writing this compact form of the FOR-EACH loop.

As a final example, suppose that we have List<String> a,b; and want to write a method that returns the first index at which the two Lists store different values (and -1 if all indexes store equal Strings). We can write this method symmetrically as

  public static int firstDifference(List<String> l1, List<String> l2) {
    Iterator<String> l1I = l1.iterator();
    Iterator<String> l2I = l2.iterator();
    int i=0;
    for (/* see above */; l1I.hasNext() && l2I.hasNext(); i++)
      if (!l1I.next().equals(l2I.next()))
        return i;

    if (!l1I.hasNext() && !l2I.hasNext())
      return -1;  //Lists the same size with no differences
    else
      return i;   //One List is longer
  }

We can also call the remove method on an iterator: it removes from the collection the value "just returned" by next. This meaning causes a bit of confusion for beginners, but when you understand this rule, you will see that it is an obvious and correct one. Here is one example to help. Suppose we have Set<String> s and we want to remove all Strings whose lengths are > 5. We can write

  for (Iterator<String> is = s.iterator(); is.hasNext(); /* see body */)
    if ( is.next().length() > 5 )
      is.remove();

We cannot use a FOR-EACH loop if we need to call remove, because that form declares no explicit iterator on which we can call remove. Notice that inside the if's condition, Java produces the next value from the Set and checks it for the desired property; it then calls remove, which removes that value (the one "just returned" by next).

This leads to two possibilities of remove throwing IllegalStateException, as illustrated below.

1)  Iterator<String> is = s.iterator();
    is.remove();

    This call throws because next has never been called: there is no "just returned" value.
2)  Iterator<String> is = s.iterator();
    is.next();    //Assume there is one value to produce
    is.remove();  //This statement removes that value
    is.remove();  //This statement throws an IllegalStateException

    The second/final call to remove throws an exception: the "just returned" value has already been removed, and we cannot remove it again; we need to call next again before another value can be removed.

Some classes do not even support calling remove; in those cases (collections where we cannot remove values via the iterator) calling remove throws UnsupportedOperationException.

Finally, there is one more exception related to iterators. If we are iterating through a collection and we change the collection in any way OTHER THAN THROUGH THE REMOVE ON THE ITERATOR (typically by adding or removing a value), a subsequent use of an iterator method will throw a ConcurrentModificationException. The basic idea here is that if you change a collection while you are in the process of iterating over it (but not through the iterator itself), then the meaning of the iterator becomes unclear, so the iterator refuses to work further. Iterators that do this are known as FAIL-FAST iterators, as they fail quickly if their underlying collection is changed. For example, here is the WRONG WAY to remove long Strings from Set<String> s.

  for (Iterator<String> is = s.iterator(); is.hasNext(); /* see body */) {
    String toCheck = is.next();
    if ( toCheck.length() > 5 )
      s.remove(toCheck);
  }

If we call s.remove(...) then the Set is changed -not through the iterator- and then when we call is.next() the next time in the loop, it will throw an exception.

----------
Maps and Map.Entry

Maps are the most interesting of the six data types/collection classes. A map associates "keys" (of some type) with "values" (of some type, which can be the same as or different from the key type). Often the key is a simple type (e.g., String) while the value is some more complicated data type (e.g., Set).
Each key is "associated with"/"mapped to" one value at any time. Typically once we associate/map a value with a key, we will later use the key to retrieve/get its value (and possibly change the value: if the value is a Set, we may add something to that Set). We can also remove a key, ask whether a key or value is in a map, and iterate through all the keys, values, or mappings (represented by an object of some class implementing the Map.Entry interface, defined inside the Map interface).

So, let us look at the Map interface. Note that this is a doubly generic interface, with <K> specifying the type of the keys in the Map, and with <V> specifying the type of the values in the Map; both K and V are used when specifying some of the prototypes of the methods: e.g., put takes a key of type K and a value of type V; remove takes a key of type K and returns a value of type V.

Also notice that this interface DOES NOT extend Iterable, unlike all the other collection classes. It does include three methods that produce Iterable objects: entries, keys, and values. So, given a

  Map<String,Set<String>> m;

(which maps every String to a Set of Strings: e.g., a word mapping to a Set of words that are its synonyms) we can print every key/value association/mapping on its own line by writing either

  for (String k : m.keys())
    System.out.println(k + "->" + m.get(k));

or

  for (Map.Entry<String,Set<String>> e : m.entries())
    System.out.println(e.getKey() + "->" + e.getValue());

We will discuss both of these loops in more detail after discussing the methods that they use from the Map interface below.
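To try the two loops outside the course library, here is a hedged sketch using java.util.Map. This is an assumption: in java.util the corresponding methods are named keySet() and entrySet() instead of keys() and entries(), and HashMap and HashSet stand in for the course's implementation classes. The class name MapIterationDemo and its two helper methods are hypothetical; they collect the printed lines into a Set so the two loops can be compared regardless of iteration order.

```java
import java.util.Arrays;
import java.util.HashMap;
import java.util.HashSet;
import java.util.Map;
import java.util.Set;

public class MapIterationDemo {
    // Builds one "key->value" line per mapping by iterating over the keys
    // and looking each key up with get.
    public static Set<String> linesByKeys(Map<String, Set<String>> m) {
        Set<String> lines = new HashSet<>();
        for (String k : m.keySet())
            lines.add(k + "->" + m.get(k));
        return lines;
    }

    // Builds the same lines by iterating over the entries, so no lookup is needed.
    public static Set<String> linesByEntries(Map<String, Set<String>> m) {
        Set<String> lines = new HashSet<>();
        for (Map.Entry<String, Set<String>> e : m.entrySet())
            lines.add(e.getKey() + "->" + e.getValue());
        return lines;
    }

    public static void main(String[] args) {
        Map<String, Set<String>> m = new HashMap<>();
        m.put("big", new HashSet<>(Arrays.asList("large")));
        m.put("small", new HashSet<>(Arrays.asList("tiny")));

        // The order of the printed lines is not specified for a HashMap.
        for (String line : linesByKeys(m))
            System.out.println(line);
        System.out.println(linesByKeys(m).equals(linesByEntries(m)));  // true
    }
}
```

The loop over entries avoids the extra get for each key, since each Entry already carries both the key and its value.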
  public interface Map<K,V> {
    //Commands
    public V       put           (K key, V value);
    public void    putAll        (Iterable<Entry<K,V>> keysValues);
    public V       remove        (K key);
    public void    clear         ();

    //Queries
    public V                     get           (Object key);
    public boolean               containsKey   (Object key);
    public boolean               containsValue (Object value);
    public Iterable<Entry<K,V>>  entries       ();
    public Iterable<K>           keys          ();
    public Iterable<V>           values        ();
    public boolean               isEmpty       ();
    public int                   size          ();

    //Miscellaneous
    public Entry<K,V>[]  toArray     ();
    public Map<K,V>      newEmpty    ();
    public Map<K,V>      shallowCopy ();

    //Inherited (should override)
    public String  toString ();
    public boolean equals   (Object o);
    public int     hashCode ();

    public interface Entry<K,V> {
      public K getKey   ();
      public V getValue ();
      public V setValue (V newValue);
    }
  }

This interface defines a public nested interface named Entry. Outside the Map interface, we refer to this type as Map.Entry. Each Entry stores one key/value association/mapping. This interface says that given an object from a class implementing Map.Entry (say, values returned by the entries iterator) we can retrieve the key and value, and we can change the value associated with a key (but cannot change the key). See the iterator class above.

I'll try to draw a picture in class illustrating what a simple ArrayMap looks like, using its SimpleEntry class (which implements Map.Entry). If I forget, ask me.

Let's examine each of these methods more closely.

Commands:

The "put" method maps a key to a value (adds an Entry to the Map associating that key and value). If that key was already in the Map, it returns the value that it previously mapped to; if it wasn't already in the Map, it returns null. This method is like "add" in the other collection classes, but it returns not a boolean, but the old value that the key mapped to (and if this returned value is null, it typically means that a new key was added to the Map, because that key did not map to anything before calling put).
The "putAll" method iterates through all the Entrys produced by its parameter, putting each key/value mapping into the Map.

The "remove" method removes (discards) the key and whatever value it maps to in the Map. It also returns the value that the key (now removed) used to map to. If the key is not in the Map, the Map remains unchanged and this method returns null.

The "clear" method removes all the Entrys currently in the Map; it is a void method so returns nothing. Calling size() or isEmpty() after calling "clear" will return a result of 0 or true respectively.

Queries:

The "get" method returns the value that the specified key maps to in the Map; it returns null if the key is not in the Map (maps to no value). The get method in Maps is like the get method in Lists, but uses a key instead of an index.

The "containsKey" method returns whether or not the specified key maps to any value in the Map.

The "containsValue" method returns whether or not any key(s) map to the specified value in the Map.

The "entries", "keys", and "values" methods each return an Iterable object. We can iterate over the returned result to produce all the Entrys, keys, and values respectively. Notice that the keys are unique, but two keys can map to the same value, so the values() object may produce the same value multiple times. Like Sets, there is no special order in which any of these methods will produce their results. If we created a List from any of these Iterable objects, the size of the List would be the same as the size() of the Map. We will see how to put all the keys from a Map into a List, then sort that List and use it to print all the keys and their values where the keys are in sorted order.

The "size" method returns the number of key/value associations/mappings currently in the Map.
Generally the put method increments this value by 1 (but not if the key is already in the Map; then the size stays the same) and the remove method decrements it by 1 (if that key is in the Map; otherwise no change is made to the Map).

The "isEmpty" method returns whether or not the Map contains no key/value associations/mappings. It is a convenient boolean method equivalent to testing whether size() == 0.

Miscellaneous:

The "toArray" method returns a reference to an array that is exactly big enough (length == size()) to store all the Entrys currently in the Map, and which is filled with the Entrys in this Map. The order is undefined (and will depend on the class implementing this interface).

The "newEmpty" method returns an empty Map, from the same class as the Map it was called on.

The "shallowCopy" method returns a new Map that is filled with the same keys and values as the Map it was called on. The copy is SHALLOW, meaning that the two Maps SHARE the same objects; such elements must be mutated carefully (or not at all).

Inherited (but should be overridden):

The "toString" method returns a String representation of a Map. It includes any useful information about the concrete data structure that implements this Map: e.g., for an array implementation, it includes the length of the array and the number of indices used in the array (always <= the length).

The "equals" method returns whether or not the Object o stores the same Map that it was called on. For Maps this means the same data type, the same number of associations/mappings, with the same keys mapping to the same values.

We will return to discuss the meaning and purpose of the "hashCode" method later in the quarter, when we discuss hashing.

Constructors (not in Interfaces):

Interfaces don't specify constructors, but every class implementing an interface must specify at least one constructor.
For the classes implementing Map, there will be constructors that

  (1) Construct an empty Map
  (2) Construct a Map initialized by values in an Entry array
  (3) Construct a Map initialized by values in another Map object

Simple Uses of Maps

Let us assume for simplicity that we have declared

  Map<String,Set<String>> m;

We have seen that the following code prints all the key/value associations in a Map, one per line. It iterates over every key k, printing it and the value the key maps to in the Map.

  for (String k : m.keys())
    System.out.println(k + "->" + m.get(k));

Because the keys() method produces keys in no special order, we need to use the following, more complicated code to print all the key/value associations in a Map, IN ALPHABETICAL ORDER ACCORDING TO k.

  List<String> keyList = new ArrayList<String>(m.keys());
  Collections.sort(keyList);  //We can add a Comparator to sort specially
  for (String k : keyList)
    System.out.println(k + "->" + m.get(k));

Here we start by constructing a List with all the keys in some order: we supply the Iterable produced by keys() to the ArrayList constructor. Then we sort that List. Finally we iterate through the sorted List, printing each key (they are produced by the iterator in alphabetical order) and the value it maps to. Without adding a Comparator argument to Collections.sort(keyList); the keys will be sorted in increasing alphabetical/dictionary order, according to the compareTo method in the String class.

Let's next examine a few ways to add/update an association/mapping in the Map described above. Suppose we have a String key k1 and want a String v1 to be a value in the Set that k1 maps to. There are two cases to consider:

  (1) k1 is a key in the Map (maps to a Set), so we should add v1 to that Set
  (2) k1 is not a key in the Map (maps to NO Set), so we should put in an association/mapping from k1 to a new Set that contains only v1

This code, or some variant of it, appears in most programs whose most basic data type is a Map (most of those in Programming Assignment #1).
In class, I will try to illustrate the execution of the code below with object/instance diagrams to help you understand what is happening. If I forget, ask me to do so. Here is some code that directly implements this algorithm.

  if (m.containsKey(k1))                                //k1 a key in m?
    m.get(k1).add(v1);                                  //add v1 to its associated set
  else {
    Set<String> mappedValues = new ArraySet<String>();  //Create empty set
    mappedValues.add(v1);                               //add v1 to it
    m.put(k1,mappedValues);                             //associate k1 with this set
  }

Notice in this code that we must "search" the Map twice: once for containsKey(), and once for either get() or put(), depending on which part of the if is executed. Instead, the following code always searches the Map once for get(), and once more for put() -but only if the key is not present; if the key is present, it does no second search.

  Set<String> mappedValues = m.get(k1);
  if (mappedValues == null) {
    mappedValues = new ArraySet<String>();
    m.put(k1, mappedValues);
  }
  //mappedValues is now guaranteed to be a set associated with k1 in map m
  mappedValues.add(v1);

Notice that once we have a reference stored in mappedValues to the Set the key maps to (whether from the get() or the new Set put() in), we just add to that Set (mutating the Set that is in the Map).

Finally, the shortest code to accomplish our goal, shown below, takes at worst three searches: containsKey(), put(), and get().

  if (!m.containsKey(k1))
    m.put(k1, new ArraySet<String>());
  m.get(k1).add(v1);

To understand maps, you should ensure that you understand why each of these code fragments does what it must do.

Mutation in PriorityQueues, Sets, and Maps:

PLEASE DO NOT MUTATE any elements in a PriorityQueue or Set, or any KEYS in a Map. It is perfectly OK and frequently useful (as shown above) to mutate the VALUEs in a Map. Advanced/efficient data structures that implement these data types store values based on their hashCode method. Changing the state of such an object will change its hashCode, causing the object to be lost in the collection.
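The "lost object" problem can be observed directly. The following minimal sketch assumes java.util.HashSet as a stand-in for a hash-based Set (it is not one of the course's classes), and the element type Box, the class name MutationDemo, and both method names are hypothetical. Box bases equals and hashCode on its one mutable field, which is exactly the situation the warning above describes.

```java
import java.util.HashSet;
import java.util.Set;

public class MutationDemo {
    // Hypothetical mutable element type: equals and hashCode both depend on its state.
    static class Box {
        private int value;
        Box(int value) { this.value = value; }
        void setValue(int newValue) { value = newValue; }
        @Override public boolean equals(Object o) {
            return o instanceof Box && ((Box) o).value == value;
        }
        @Override public int hashCode() { return value; }
    }

    // Mutates the element while it is stored in the Set, then searches for it.
    public static boolean foundAfterMutatingInPlace() {
        Set<Box> s = new HashSet<>();
        Box element = new Box(1);
        s.add(element);               // stored according to hashCode 1
        element.setValue(2);          // hashCode is now 2, but the Set was not told
        return s.contains(element);   // searched for according to hashCode 2
    }

    // Removes the element first, then mutates it, then adds it back.
    public static boolean foundAfterRemoveMutateAdd() {
        Set<Box> s = new HashSet<>();
        Box element = new Box(1);
        s.add(element);
        s.remove(element);            // found, because its state is still unchanged
        element.setValue(2);
        s.add(element);               // stored according to hashCode 2
        return s.contains(element);
    }

    public static void main(String[] args) {
        System.out.println(foundAfterMutatingInPlace());   // false with OpenJDK's HashSet
        System.out.println(foundAfterRemoveMutateAdd());   // true
    }
}
```

In the first method the object is still physically inside the Set (its size is 1), yet a search for it fails, which is what "lost" means here.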
So, for example, if you wanted to change an object in a Set, you should first remove it, then mutate it, then add it back into the Set.

  Set<Foo> s;
  Foo element = ...;
  s.remove(element);
  element.mutator();
  s.add(element);

Likewise, if you wanted to change a KEY in a Map, you should first remove it, then mutate it, then put it back into the Map.

  Map<Foo,Bar> m;
  Foo element = ...;
  Bar value = m.remove(element);
  element.mutator();
  m.put(element,value);  //same value, but the key is now mutated

We will discuss this issue further, and in greater detail, when we learn about Hashing. In fact, the same problem occurs when using binary search trees, so we will discuss this issue more than once this quarter.

----------

Note that we can examine the code in all these classes (I stripped out the Javadoc comments to make them shorter and more readable) in the collections.jar file, either by unzipping this file and examining the .java files or by disclosing these classes and viewing them in an Editor tab in an Eclipse project. In the next lecture we will discuss some of the simple array implementations of these interfaces. Many of the programming assignments this quarter require you to implement these interfaces with advanced data structures.