Introduction |
This lecture begins a series of lectures that together act as a capstone for
the first half of the quarter.
These lectures explore collection classes, which are a sophisticated and
powerful group of related interfaces and classes that are useful in a wide
variety of programming tasks; they are also designed and built using most
of the high-level Java programming features that we have studied: interfaces,
inheritance, abstract classes, concrete classes, exceptions, and analysis
of algorithms (and iterators and inner classes, which we learn in this
lecture).
Thus, studying collection classes now brings all these features back into focus, each sharing the spotlight with -and complementing- the others. Once you accept the power of these tools, and practice using them, you'll never think about plain old arrays in the same way again (and maybe you'll never think of them at all). This lecture begins with a taxonomy of the collection classes, and an introduction to the interfaces, abstract classes, and concrete classes that are used to define them. The second part of this lecture explores iterators, which are an interesting concept by themselves, but which take on a central role when coupled with collection classes. We can create extremely sophisticated and efficient behavior by composing collection classes (for modelling complex data) and processing them with iterators. |
Design of Collection Classes |
In this section we will briefly examine the standard Java collection classes,
to get a bird's-eye view of how they are designed and implemented.
First we will examine the overall relationships among the interfaces, abstract
classes, and concrete classes; these features naturally arrange themselves
into three vertical levels in a hierarchy.
Then we will examine one example more closely, at each level in the hierarchy.
The following legend explains the three levels and some of the notation used. |
  |
Overall, there are three major groups of "collection" classes, each rooted in one special interface: OrderedCollection, Collection, and Map. Each group is presented using the three vertical levels of interfaces, abstract classes, and concrete classes. We depict the OrderedCollection group as follows. |
  |
Next, we depict the Collection group as follows. |
  | And finally, we depict the Map group as follows. |
  |
In each, interfaces are implemented by abstract classes (which supply some
but not all of the needed methods) which are extended by concrete classes
that inherit some behavior from the abstract classes and define all their
abstract methods.
In some cases, an interface extends another interface:
List and Set extend Collection;
SortedSet extends Set; and
SortedMap extends Map.
In many cases, one abstract class extends another, before being extended by
a concrete subclass.
Recall that concrete subclasses automatically implement the interfaces that
their superclasses implement, and [abstract] classes can implement more
than one interface (but can extend only one superclass); this is a
fundamental difference between interfaces and classes.
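The relationships described above can be observed directly with the standard java.util classes. The following is only a small sketch: the class name HierarchyDemo and the helper countOf are made up for illustration, and the code uses generic type parameters, which the code in this lecture does not.

```java
import java.util.Collection;
import java.util.HashSet;
import java.util.Set;
import java.util.SortedSet;
import java.util.TreeSet;

class HierarchyDemo {
    // Accepts any object whose class implements Collection,
    // whether directly or by inheriting from a superclass.
    static int countOf(Collection<String> c) {
        return c.size();
    }

    public static void main(String[] args) {
        Set<String> hashed = new HashSet<>();        // HashSet extends AbstractSet
        SortedSet<String> sorted = new TreeSet<>();  // SortedSet extends Set
        hashed.add("b");
        hashed.add("a");
        sorted.addAll(hashed);

        // Both concrete classes can be passed where a Collection is expected.
        System.out.println(countOf(hashed));
        System.out.println(countOf(sorted));

        // SortedSet adds ordering methods to those it gets from Set.
        System.out.println(sorted.first());

        // One class, several interfaces: HashSet also implements Serializable.
        Object o = hashed;
        System.out.println(o instanceof java.io.Serializable);
    }
}
```

Because countOf is declared with the interface type, it works unchanged for every concrete class in the Collection group.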
In the next three sections, we will examine the Javadoc of an interface, two abstract classes, and a concrete class, forming a chain from top to bottom, starting with the Collection interface and ending with the HashSet concrete class. We will spend two more lectures covering this same material in more detail. |
Interfaces |
The methods specified in the Collection interface are summarized
in the following Javadoc.
The semantics of most methods should be somewhat intuitive.
Primarily, objects can be added and removed from a collection, and checked for
membership.
Methods like add, contains, and remove, which have
Object parameters, have counterparts addAll,
containsAll, and removeAll, which use another
Collection as a parameter: they add, check for containment, or
remove each of the values in the parameter.
Read these Javadoc descriptions now; we will examine them again, in more detail, in a subsequent lecture (including hashCode). We will discuss the iterator method (and the Iterator interface) at the end of this lecture. |
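To make the pairing of single-value and bulk methods concrete, here is a short sketch that uses HashSet as the Collection. The class name BulkOperationsDemo and the helper difference are hypothetical, and the code uses generic type parameters, which the code in this lecture does not.

```java
import java.util.Arrays;
import java.util.Collection;
import java.util.HashSet;

class BulkOperationsDemo {
    // Returns a new collection holding every value that is in a but not in b.
    static Collection<String> difference(Collection<String> a, Collection<String> b) {
        Collection<String> result = new HashSet<>(a);
        result.removeAll(b);   // bulk counterpart of remove
        return result;
    }

    public static void main(String[] args) {
        Collection<String> letters = new HashSet<>();
        letters.add("a");                              // single-value form
        letters.addAll(Arrays.asList("b", "c", "d"));  // bulk form

        System.out.println(letters.contains("a"));                         // true
        System.out.println(letters.containsAll(Arrays.asList("a", "z")));  // false: "z" is missing

        Collection<String> rest = difference(letters, Arrays.asList("a", "b"));
        System.out.println(rest.size());               // 2
    }
}
```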
  | Now we will examine an abstract class and abstract subclass that implement a surprising number of these methods, leaving the concrete subclass to implement very little (mostly the iterator, which many other methods use). Remember that there are 15 methods specified in this interface. |
Abstract Classes | Now we will examine the Javadoc of an abstract class and its abstract subclass that implement the interface specified above (although some of its methods are abstract). The Collection interface specified 15 methods. The AbstractCollection class specifies one protected constructor and 14 methods; it doesn't define equals or hashCode, which are inherited from the Object class that this one implicitly extends (and are overridden in the abstract subclass in the next section); it adds the specification of a toString method. Of these 14 (=15-2+1) methods, all but two, iterator and size, are defined here (those two are declared abstract), although add is implemented to throw UnsupportedOperationException until a subclass overrides it, and contains and remove are written in terms of the iterator (remove works only if that iterator supports remove). Yet the addAll, containsAll, and removeAll methods are completely written here, using the promised iterator and the eventually-working add, contains, and remove methods: they iterate through a collection, calling the appropriate method for each element. Here is the Javadoc of AbstractCollection (because of size constraints, it appears in a smaller font). |
  |
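To see how much AbstractCollection supplies, we can write a subclass that defines only iterator and size. This is a sketch against java.util.AbstractCollection; the class FixedCollection is hypothetical (it is not one of the standard collection classes), and the code uses generic type parameters, which the code in this lecture does not.

```java
import java.util.AbstractCollection;
import java.util.Arrays;
import java.util.Iterator;
import java.util.NoSuchElementException;

class AbstractCollectionDemo {
    public static void main(String[] args) {
        FixedCollection c = new FixedCollection(new Object[] {"a", "b", "c"});
        System.out.println(c.contains("b"));                         // inherited, uses iterator
        System.out.println(c.containsAll(Arrays.asList("a", "c")));  // inherited, uses contains
        System.out.println(c);                                       // inherited toString
        try {
            c.add("d");                                              // inherited, not overridden
        } catch (UnsupportedOperationException e) {
            System.out.println("add is not supported");
        }
    }
}

// Hypothetical read-only collection: it defines only the two abstract methods.
class FixedCollection extends AbstractCollection<Object> {
    private final Object[] values;

    FixedCollection(Object[] values) {
        this.values = values.clone();
    }

    @Override
    public int size() {
        return values.length;
    }

    @Override
    public Iterator<Object> iterator() {
        return new Iterator<Object>() {
            private int next = 0;

            @Override
            public boolean hasNext() {
                return next < values.length;
            }

            @Override
            public Object next() {
                if (!hasNext())
                    throw new NoSuchElementException("FixedCollection: next - no next value");
                return values[next++];
            }
            // remove is not overridden, so Iterator's default version
            // throws UnsupportedOperationException.
        };
    }
}
```

Everything that main calls besides the constructor is inherited; a concrete class like HashSet makes add work by overriding it.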
  |
Concrete Classes |
Now we will examine the Javadoc of a concrete class that extends
AbstractSet.
This class is implemented by an advanced and efficient technique called
hashing.
We will briefly discuss hash tables at the end of this series of lectures.
As the constructors suggest, hash tables have "initial sizes" and
"load factors"; you will need to use only the first two constructors this
quarter; the other parameters relate to fine-tuning the efficiency of
the underlying hash table, which is a topic you will study in IC-23.
The HashSet class extends AbstractSet. It specifies some public constructors and 8 methods: it defines the two abstract methods that it inherited, iterator and size, and it overrides some other inherited methods as well (it either implements methods like add, which were previously defined to throw UnsupportedOperationException, or improves the performance of others, knowing something about the hash tables that implement this class). Here is the Javadoc of HashSet (because of size constraints, it appears in a smaller font). |
  |
So, the structure leading from the Collection interface to the
HashSet concrete class involved all sorts of interesting
inheritance of abstract and concrete methods.
In some sense, we can use this class without knowing all this information,
by examining the Collection and Set interfaces, and knowing
that it implements their methods efficiently.
Finally, when we discuss the actual implementation of all these methods in concrete classes, we will discuss their performance in terms of big O notation, where n is typically the number of values stored in the collection. |
A Simple Example of Using a Collection Class |
Assume that we have to prompt the user for a sequence of n Strings,
such that they are all different.
The following program uses the Set interface and HashSet class
to solve this problem.
Note that i is incremented only if a new value is successfully
added to the set.
  System.out.println("Enter "+n+" different Strings");
  Set unique = new HashSet();
  for (int i=1; i<=n; ) {
    String attempt = Prompt.forString("Enter unique value #"+i);
    if (!unique.contains(attempt)) {
      unique.add(attempt);
      i++;
    }
  }
Actually, this code can be simplified if you carefully read the real Javadoc (not just the summary listed in this lecture) for the HashSet class. The add method adds the value to the set only if it is not there (generally, sets don't contain duplicates) and returns whether it added it.
  System.out.println("Enter "+n+" different Strings");
  Set unique = new HashSet();
  for (int i=1; i<=n; ) {
    String attempt = Prompt.forString("Enter unique value #"+i);
    if (unique.add(attempt))
      i++;
  }
Generally, we use the name of the interface (Set) for the type of the variable (unique), not the name of the actual class we are using for its implementation (HashSet). But we must use the class name when constructing an object.
Technically, the most elegant solution to this problem uses the fact that the
Set knows its size, so we don't need the counter i at all.
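One way to write the size-based version is sketched below. So that the example runs without the course's Prompt class, the values come from an Iterator<String> parameter standing in for Prompt.forString; that parameter, the class name UniqueStrings, and the method readUnique are all assumptions made for illustration, and the code uses generic type parameters, which the code in this lecture does not.

```java
import java.util.Arrays;
import java.util.HashSet;
import java.util.Iterator;
import java.util.Set;

class UniqueStrings {
    // Keeps consuming candidate values until the set holds n different ones
    // (or the input runs out). The input parameter stands in for Prompt.forString.
    static Set<String> readUnique(int n, Iterator<String> input) {
        Set<String> unique = new HashSet<>();
        while (unique.size() < n && input.hasNext())
            unique.add(input.next());   // adding a duplicate leaves the size unchanged
        return unique;
    }

    public static void main(String[] args) {
        Iterator<String> typed = Arrays.asList("a", "b", "a", "c", "d").iterator();
        Set<String> unique = readUnique(3, typed);
        System.out.println(unique.size());   // 3
        System.out.println(typed.next());    // d: it was never consumed
    }
}
```

The loop condition plays the role of the counter i: it stays true exactly as long as fewer than n different values have been stored.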
|
Iterators (and inner classes) |
The Iterator interface (declared in the java.util package) is
used heavily with collection classes (both to implement the collection
classes and by users of the collection classes), but it is also useful in
other contexts.
Iterators allow us to process, one at a time, a sequence of values
(which we are said to iterate over), either stored in some collection or
generated on the fly, as necessary.
Before we begin to study using iterators with collection classes, we will first
study them independently (and a bit later in simplified collection classes).
The Iterator interface is very simple (as a heuristic, the simpler the
interface, the more useful it is): it comprises just three methods.
  public interface Iterator {
    public boolean hasNext();
    public Object  next   () throws NoSuchElementException;
    public void    remove () throws UnsupportedOperationException, IllegalStateException;
  }
The first two methods are the most interesting and useful; the third method does not have to be implemented to do anything but throw the UnsupportedOperationException. The standard code fragment using the first two methods looks like
  for (Iterator i = an-iterator-object; i.hasNext(); ) {
    Object o = i.next();
    ...process o...
  }
So, the hasNext and next methods work together to query whether the loop should continue, and if so, access and process the next value in the collection. Sometimes the next object retrieved will be cast immediately and stored in a non-Object variable.
Because next can throw an exception, we can also iterate over a sequence of
values with an unbounded loop that calls next repeatedly and terminates when
next throws NoSuchElementException; such a loop is equivalent to the first.
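The two loop forms can be compared side by side. The following sketch is not the lecture's original code; it iterates over a java.util list of Integers, the class and method names are made up for illustration, and it uses generic type parameters, which the code in this lecture does not.

```java
import java.util.Arrays;
import java.util.Iterator;
import java.util.NoSuchElementException;

class ExceptionLoopDemo {
    // Standard idiom: hasNext decides whether the loop continues.
    static int sumWithHasNext(Iterator<Integer> i) {
        int sum = 0;
        while (i.hasNext())
            sum += i.next();
        return sum;
    }

    // Alternative: call next until it throws NoSuchElementException.
    static int sumUntilException(Iterator<Integer> i) {
        int sum = 0;
        try {
            for (;;)
                sum += i.next();
        } catch (NoSuchElementException e) {
            // No more values: control leaves the loop here.
        }
        return sum;
    }

    public static void main(String[] args) {
        System.out.println(sumWithHasNext(Arrays.asList(1, 2, 3).iterator()));     // 6
        System.out.println(sumUntilException(Arrays.asList(1, 2, 3).iterator()));  // 6
    }
}
```

The first form is usually preferred, because it states the stopping condition in the loop header instead of hiding it in a catch clause.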
The Squares class illustrates how to write a simple class that
implements this interface; it contains only a constructor and the methods
required to implement the Iterator interface.
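A class fitting that description might look like the sketch below. It is a guess at the shape of such a class, not the lecture's own code: in particular, the constructor parameter (how many squares to produce) is an assumption, and the code uses generic type parameters, which the code in this lecture does not.

```java
import java.util.Iterator;
import java.util.NoSuchElementException;

class SquaresDemo {
    public static void main(String[] args) {
        // Prints 1, 4, 9, and 16, one per line.
        for (Iterator<Integer> i = new Squares(4); i.hasNext(); )
            System.out.println(i.next());
    }
}

// Generates the squares 1*1, 2*2, ..., last*last on the fly; nothing is stored.
class Squares implements Iterator<Integer> {
    private final int last;   // assumed meaning: the largest value to square
    private int next = 1;     // the value whose square is returned next

    Squares(int last) {
        this.last = last;
    }

    public boolean hasNext() {
        return next <= last;
    }

    public Integer next() {
        if (!hasNext())
            throw new NoSuchElementException("Squares: next - no next value");
        int result = next * next;
        next++;
        return result;
    }

    // There is no stored collection to remove from.
    public void remove() {
        throw new UnsupportedOperationException("Squares: remove - not supported");
    }
}
```

This is an iterator whose values are generated as necessary rather than retrieved from a collection.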
|
Inner Classes to Implement Iterators in Collection Classes |
Now let's start learning about how iterators are implemented in collection
classes.
But, instead of starting with "real" collection classes (involving interfaces
and abstract classes), we will write an iterator in a simpler context: the
SimpleQueue class that we studied in the lecture on
1-d Arrays.
Please find and review this code now; pay close attention to the
class invariant that rear refers to the largest member index
that stores a value.
To illustrate iterators, we will add a method to this class with the prototype
public Iterator iterator(): it returns a reference to an object from
some class that implements Iterator (over the sequence of values that
this collection stores).
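It is simple to use the result returned by this method with the idiom shown
above: to print all the values in the queue (without changing their
order in the queue), we would write just that for loop, initializing the
iterator by calling iterator on the queue and printing each value that next returns.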
In this context, it is useful to discuss the concept of an inner class. Sometimes -as with iterators- it is useful to declare a private class INSIDE another (public) class. By doing so, Java allows some interesting behaviors to emerge.
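So, how can we extend the SimpleQueue to allow iterators?
First, we must define the iterator method in that class so that it returns a
new object constructed from a private inner class, SimpleQueueIterator, that
implements Iterator.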
|
  |
If we wanted to implement the remove method, we would have to declare
an additional removedAlready instance variable and rewrite the
methods as follows.
  private class SimpleQueueIterator implements Iterator {
    private int     next           = 0;
    private boolean removedAlready = true;

    public boolean hasNext()
    {return next <= rear;}

    public Object next () throws NoSuchElementException
    {
      if (!hasNext())
        throw new NoSuchElementException
                    ("SimpleQueueIterator: next - no next value");
      removedAlready = false;
      return q[next++];
    }

    public void remove () throws UnsupportedOperationException,
                                 IllegalStateException
    {
      if (removedAlready)
        throw new IllegalStateException
                    ("SimpleQueueIterator: remove - cannot remove");

      //Backup next: the value at that index will be removed.
      //Shift everything beyond to the left by 1, and decrement
      //  rear too (since removing a value shrinks the queue).
      removedAlready = true;
      next--;
      for (int i=next; i<rear; i++)
        q[i] = q[i+1];
      q[rear--] = null;
    }
  }
The differences are that next resets removedAlready to false when it successfully advances to return another value, and remove checks this value. If remove does remove a value, it must shift the values following it to the left by 1 index, just like the dequeue method: in fact, a generalized helper method
  private void shiftLeftFrom (int shiftStart)
would be useful to simplify both methods. Finally, removedAlready is initialized to true, because we cannot call remove until next has been called; this initialization ensures this requirement.
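To show how the pieces fit together, here is a self-contained sketch of a queue with the iterator method, the inner class, and the shiftLeftFrom helper suggested above. The SimpleQueue shown is a reconstruction for illustration only (the fixed capacity, the front stored at index 0, and the exact method names other than those mentioned in this lecture are assumptions), and it uses Iterator<Object> where the lecture's code uses the plain Iterator type.

```java
import java.util.Iterator;
import java.util.NoSuchElementException;

class SimpleQueueDemo {
    public static void main(String[] args) {
        SimpleQueue queue = new SimpleQueue(10);
        queue.enqueue("a");
        queue.enqueue("b");
        queue.enqueue("c");
        // Print every value without changing the queue.
        for (Iterator<Object> i = queue.iterator(); i.hasNext(); )
            System.out.println(i.next());
    }
}

// Assumed representation: values occupy q[0..rear], with the front at index 0.
class SimpleQueue {
    private final Object[] q;
    private int rear = -1;   // invariant: the largest index that stores a value

    SimpleQueue(int capacity) {
        q = new Object[capacity];
    }

    void enqueue(Object o) {
        if (rear == q.length - 1)
            throw new IllegalStateException("SimpleQueue: enqueue - queue is full");
        q[++rear] = o;
    }

    Object dequeue() {
        if (rear < 0)
            throw new NoSuchElementException("SimpleQueue: dequeue - queue is empty");
        Object front = q[0];
        shiftLeftFrom(0);
        return front;
    }

    int size() {
        return rear + 1;
    }

    // Overwrites q[shiftStart] by shifting every later value left by 1,
    // then shrinks the queue by one value.
    private void shiftLeftFrom(int shiftStart) {
        for (int i = shiftStart; i < rear; i++)
            q[i] = q[i + 1];
        q[rear--] = null;
    }

    Iterator<Object> iterator() {
        return new SimpleQueueIterator();
    }

    // An inner class: its methods can refer to q and rear of the enclosing queue.
    private class SimpleQueueIterator implements Iterator<Object> {
        private int next = 0;
        private boolean removedAlready = true;

        public boolean hasNext() {
            return next <= rear;
        }

        public Object next() {
            if (!hasNext())
                throw new NoSuchElementException("SimpleQueueIterator: next - no next value");
            removedAlready = false;
            return q[next++];
        }

        public void remove() {
            if (removedAlready)
                throw new IllegalStateException("SimpleQueueIterator: remove - cannot remove");
            removedAlready = true;
            shiftLeftFrom(--next);   // back up next, then remove the value at that index
        }
    }
}
```

Because SimpleQueueIterator is declared inside SimpleQueue, it needs no reference to the queue passed to it, and it can stay private: users see only the Iterator type that iterator returns.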
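We can use the remove method to easily remove every odd value from a queue of
Integers: we iterate over the queue, cast each value returned by next to
Integer, and call remove on the iterator whenever that value is odd.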
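The idiom can be sketched as follows. Since the SimpleQueue code is not reproduced here, this example uses java.util.LinkedList as a stand-in (its iterators also support remove); the class name RemoveOddDemo and the method removeOdd are hypothetical, and generic type parameters replace the cast to Integer.

```java
import java.util.Arrays;
import java.util.Collection;
import java.util.Iterator;
import java.util.LinkedList;

class RemoveOddDemo {
    // Removes every odd value, using only the iterator to change the collection.
    static void removeOdd(Collection<Integer> values) {
        for (Iterator<Integer> i = values.iterator(); i.hasNext(); )
            if (i.next() % 2 != 0)
                i.remove();   // removes the value most recently returned by next
    }

    public static void main(String[] args) {
        Collection<Integer> queue = new LinkedList<>(Arrays.asList(1, 2, 3, 4, 5));
        removeOdd(queue);
        System.out.println(queue);   // [2, 4]
    }
}
```

Note that remove is called on the iterator, not on the collection, and at most once after each call to next.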
|
Detecting Concurrent Changes |
There is still one loose end connecting collection classes and iterators.
If we are iterating through a collection class, and we modify it (add or
remove a value), how should this affect the way our iterator works in the
future, when hasNext/next are called?
It turns out that there is no uniformly good way to answer this question
for all collection classes, so instead all Java collection classes (with
iterators) prohibit it from occurring.
Java prohibits it from occurring by forcing the next method to throw the ConcurrentModificationException if any state changes have been made to the object it is iterating over: i.e., we are modifying a collection class while concurrently iterating over it. This is accomplished in a surprisingly simple and efficient way (funny how those two properties often go together) by the iterator class. When this approach is used, the iterators are said to be fail-fast iterators. On the SimpleQueue side, the collection keeps a modCount that it increments on every state change; on the iterator side, each iterator keeps its own expectedModCount, which it compares against the collection's modCount.
Finally, if the remove method in the iterator successfully removes an element from the collection, no other iterator should be able to continue. This is accomplished by incrementing the modCount of the collection first, and then copying this new value into the expectedModCount of this iterator. Therefore, this iterator can continue (because it knows how to ensure that it still iterates over all the remaining values in the collection) but any other iterator is forced to throw ConcurrentModificationException. |
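The modCount bookkeeping described above can be sketched in a small array-backed collection. This is an illustration under assumptions, not the lecture's SimpleQueue code: the class CountedBag and its methods are hypothetical, and only the names modCount and expectedModCount come from the text.

```java
import java.util.ConcurrentModificationException;
import java.util.Iterator;
import java.util.NoSuchElementException;

class FailFastDemo {
    public static void main(String[] args) {
        CountedBag bag = new CountedBag(10);
        bag.add("a");
        bag.add("b");
        Iterator<Object> i = bag.iterator();
        System.out.println(i.next());   // a
        bag.add("c");                   // changes the bag while i is in use
        try {
            i.next();
        } catch (ConcurrentModificationException e) {
            System.out.println("iterator detected the change");
        }
    }
}

// Hypothetical collection; add assumes the fixed capacity is never exceeded.
class CountedBag {
    private final Object[] values;
    private int size = 0;
    private int modCount = 0;   // incremented on every state change

    CountedBag(int capacity) {
        values = new Object[capacity];
    }

    void add(Object o) {
        values[size++] = o;
        modCount++;
    }

    Iterator<Object> iterator() {
        return new BagIterator();
    }

    private class BagIterator implements Iterator<Object> {
        private int next = 0;
        private boolean removedAlready = true;
        private int expectedModCount = modCount;   // snapshot taken at construction

        private void checkForChange() {
            if (expectedModCount != modCount)
                throw new ConcurrentModificationException("BagIterator: bag was modified");
        }

        public boolean hasNext() {
            return next < size;
        }

        public Object next() {
            checkForChange();
            if (!hasNext())
                throw new NoSuchElementException("BagIterator: next - no next value");
            removedAlready = false;
            return values[next++];
        }

        public void remove() {
            checkForChange();
            if (removedAlready)
                throw new IllegalStateException("BagIterator: remove - cannot remove");
            removedAlready = true;
            next--;
            for (int i = next; i < size - 1; i++)
                values[i] = values[i + 1];
            values[--size] = null;
            // Count the change first, then copy the new count, so that this
            // iterator can continue while every other iterator fails.
            expectedModCount = ++modCount;
        }
    }
}
```

The check costs one int comparison per call to next, and the bookkeeping costs one increment per change, which is why this approach is both simple and efficient.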
Problem Set |
To ensure that you understand all the material in this lecture, please solve
the announced problems after you read the lecture.
If you get stumped on any problem, go back and read the relevant part of the lecture. If you still have questions, please get help from the Instructor, a TA, or any other student.
|