Implementing Collection Classes with an Inheritance Hierarchy


Introduction:

In this lecture we will discuss the details of the implementation of Set: one
of the six standard generic data types/collection classes that we have
discussed (Stack, Queue, PriorityQueue, List, Set, and Map). The implementation
uses an Array data structure in a simple manner to store all the elements in the
Set. There are strong similarities between this implementation of Set and the
Array implementations of the other data types; also, Set is of middling
complexity compared to Stack/Queue/PriorityQueue (lower) and List/Map (higher),
so it is a good class to examine in detail. In Programming Assignments 2-4,
you will be writing more complicated (but still similar) implementations of
many of these collection classes.

Generally, the structure of all the Java code (interfaces and classes) is shown
below. First, here is the structure of the interfaces. The lower ones are
subinterfaces that extend the higher ones.

                              Iterable
                             /         \ 
         OrderedCollection                  Collection          Map
       /        |          \                   /   \            
    Stack      Queue     PriorityQueue      Set    List         

Recall the Stack, Queue, and Set interfaces supply names for these types, but
do not add any methods to the OrderedCollection (for Stack and Queue) or
Collection (for Set) interface they extend. The PriorityQueue interface adds
just the "merge" method. The List interface adds half a dozen methods to the
Collection interface it extends: methods that specify an index in their
parameters. Map has no subinterfaces.

On the implementation side, here is the structure of the classes.

            AbstractOrderedCollection
           /           |             \
    AbstractStack  AbstractQueue  AbstractPriorityQueue
          |            |                     |
    ArrayStack     ArrayQueue       ArrayPriorityQueue

and

             AbstractCollection            AbstractMap
               /             \                  |
         AbstractSet     AbstractList           |
             |                |                 |
          ArraySet         ArrayList         ArrayMap

We will see that the AbstractOrderedCollection, AbstractCollection, and
AbstractMap define many methods. Some of these inherited methods will be
overridden in the Array implementation (and later, in other implementation)
classes, to be more efficient once the data structure is actually known. In
the middle, the Abstract subclasses (e.g., AbstractStack) often define just
the equals method, which is different for every data type: i.e., for
ordered collections, stacks can only be .equals to stacks, queues can only be
.equals to queues, etc.

During the quarter we will write other implementations of these classes; each
will extend the Abstract class that it implements (one of the six Abstract
classes shown above) and override some of their inherited methods to be more
efficient for that particular data structure.

We have already examined the "code" inside the interfaces (really just the
headers of methods representing the operations that we can perform on that data
type). We are now going to examine in detail the code in the
AbstractCollection, AbstractSet, and ArraySet classes.

You can view all the code in the interfaces, abstract classes, and concrete
classes by creating an Eclipse project that builds a path to the
collections.jar library; then you can double-click on any of the .class
files in that library, and examine the .java file for that class (which comes
up in an editor tab).

I SUGGEST THAT YOU EXAMINE THESE FILES WHILE YOU READ THIS LECTURE NOTE.

Note that you cannot change the contents of these files when you view them this
way: they are read-only in the editor. If you want to actually experiment with
these classes and change them, you can download "All Collection Files" (and NOT
USE collections.jar at all).

Before we examine the interface, abstract class, and concrete class, here are
some statistics for them:

Interfaces
  Collection        :  35 lines
  Set               :   7

Classes
  AbstractCollection: 193 heavily using iterators; most overridden in ArraySet
  AbstractSet       :  30
  ArraySet          : 264

The .size method is a simple example of a method written concretely in the
AbstractCollection using an iterator, but reimplemented much more efficiently 
in ArraySet. In AbstractCollection we can use an iterator to iterate through 
all the values, counting each one, to compute the result of .size; in ArraySet
we have an instance variable that stores how many array elements are occupied,
so we just return this value.
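
The contrast between the two versions of .size can be sketched as follows. This
is only an illustration, not the code in collections.jar: the class names
CountingCollection and FixedArrayCollection are hypothetical stand-ins for
AbstractCollection and ArraySet.

```java
import java.util.Iterator;
import java.util.NoSuchElementException;

// Hypothetical stand-in for AbstractCollection: size() needs only iterator().
abstract class CountingCollection<E> implements Iterable<E> {
    public int size() {
        int count = 0;
        // Count how many times next can be called before hasNext is false.
        for (Iterator<E> it = iterator(); it.hasNext(); it.next())
            count++;
        return count;
    }
}

// Hypothetical stand-in for ArraySet: the count is stored, so size() is one read.
class FixedArrayCollection<E> extends CountingCollection<E> {
    private final E[] items;
    private final int objectCount;

    FixedArrayCollection(E[] items) {
        this.items = items.clone();
        this.objectCount = items.length;
    }

    @Override
    public int size() {
        return objectCount;          // no iteration needed
    }

    @Override
    public Iterator<E> iterator() {
        return new Iterator<E>() {
            private int next = 0;
            public boolean hasNext() { return next < objectCount; }
            public E next() {
                if (!hasNext())
                    throw new NoSuchElementException();
                return items[next++];
            }
        };
    }
}
```

Both versions return the same answer; the inherited one takes time proportional
to the number of values, while the overriding one just returns a field.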

Review the Collection and Set interfaces, from the previous lecture. Recall
that a Set stores unique (no duplicate) elements. Common operations are adding
values to a Set, checking whether a value is contained in a set, and removing
values from a Set; we can also query for the size of a Set, convert a Set to an
array, and iterate over all the elements in a Set.


AbstractCollection:

Now let's focus on the generic AbstractCollection class, which implements
the Iterable interface (by declaring an iterator() method, even though that
method is abstract in this class). At the summary level:

  This abstract class implements many of the methods (all but add, iterator,
  newEmpty, shallowCopy, and equals). But the implemented methods typically
  iterate over elements and thus might be much slower than necessary. It is
  interesting that we can write so many methods by using an iterator, but most
  of these methods will be overridden in the concrete ArraySet class, to make
  them run more quickly, once we know that the data structure is an array.

The AbstractSet class will define the equals method.

The ArraySet class will define the methods add, iterator, newEmpty, and
shallowCopy; by doing so, ArraySet becomes a working concrete class. If it
only defined these methods (and didn't override the slower ones), it would
still be a concrete class, but run more slowly than it should.

  This abstract class declares the variable modCount, which our code should
  increment every time it modifies/mutates the data structure implementing the
  data type. The iterators must be able to know about such changes (in which
  case they will throw the ConcurrentModificationException). The declaration is

    //Used in subclasses: see their iterators
    protected transient volatile long modCount = 0;

In this abstract class, this variable is typically incremented whenever the
code in this class calls remove on an iterator (right before the removal).
Transient and volatile are "magic" (to us) keywords that tell Java to treat
this variable very carefully: transient excludes it from serialization, and
volatile makes every write to it visible to other threads. The type long is a
64-bit signed integer, so it can count far higher than a 32-bit int (up to
2^63 - 1, instead of 2^31 - 1).
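
To see why iterators need to know about mutations, here is a small sketch that
uses the standard java.util.ArrayList (NOT the course's collection classes,
which is an assumption made only so the example is self-contained). Its
iterator performs the same kind of modification check described above. The
class name FailFastDemo is hypothetical.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.ConcurrentModificationException;
import java.util.List;

class FailFastDemo {
    // Returns true if mutating the list in the middle of a for-each loop
    // makes its iterator throw ConcurrentModificationException.
    static boolean mutationDetected() {
        List<String> values = new ArrayList<>(Arrays.asList("a", "b", "c"));
        try {
            for (String v : values)
                if (v.equals("a"))
                    values.add("d");   // mutates the list behind the iterator's back
            return false;              // the loop finished without complaint
        } catch (ConcurrentModificationException e) {
            return true;               // the next call to next() noticed the change
        }
    }
}
```

The exception is thrown by the call to next that follows the mutation, not by
the mutation itself.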

Now, let's look at the code of the individual methods defined in the
AbstractCollection class.


Commands:

Note the add method is not defined here, but is defined in the ArraySet class.

(1) addAll: This method iterates over es, calling add on every value produced
by the iterator. If any call to add returns true, it will return true; if there
are no calls to add, or all calls to add returned false, it will return false.
Another way to code the body would be

  boolean modified = false;
  for (E e : es)
    modified = add(e) || modified;
    //because of short-circuit evaluation, modified || add(e) would NOT WORK!
  return modified;

This method does NOT need to be overridden in the ArraySet concrete class.
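
The short-circuit pitfall mentioned in the comment can be demonstrated with a
small sketch. It uses java.util.Set as a stand-in for the course's Set (an
assumption for self-containment); the class AddAllDemo and both method names
are hypothetical.

```java
import java.util.Set;

class AddAllDemo {
    // Correct operand order: target.add(e) is evaluated for every value.
    static <E> boolean addAllCorrect(Set<E> target, Iterable<E> es) {
        boolean modified = false;
        for (E e : es)
            modified = target.add(e) || modified;
        return modified;
    }

    // Wrong operand order: once modified is true, || never evaluates
    // target.add(e), so every later value is silently skipped.
    static <E> boolean addAllBroken(Set<E> target, Iterable<E> es) {
        boolean modified = false;
        for (E e : es)
            modified = modified || target.add(e);
        return modified;
    }
}
```

Called on an empty set with three distinct values, addAllCorrect leaves three
values in the set, while addAllBroken leaves only the first one.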


(2) remove: This method iterates over the Set, removing the specified value via
the Iterator (and returning true) if the same value as the specified parameter
o is produced. Note that o == null is treated as a special case in the if,
checking item == null; in the else it checks o.equals(item). Writing this as
item.equals(o) would be equivalent only when the Set stores no null values,
because item itself might be null. If it finds such an item it immediately
removes it and returns true; if it fails to find such an item, it eventually
returns false.

  This extra code complexity exists because we can write s.add(null) and it puts
  a null reference into our Set. If we never do this, the code will always
  execute the else part of the if, checking o.equals(item). Note that calling
  null.equals(...) throws NullPointerException because null refers to no
  object (so Java cannot execute the .equals method of "that" object).

This method should be overridden in the ArraySet concrete class, with a faster
and more direct method.
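
The iterator-based removal described above might look roughly like this sketch.
It is written as a static helper over any Iterable whose iterator supports
remove; the class name IteratorRemoveDemo and the method name removeValue are
hypothetical, not names from the course library.

```java
import java.util.Iterator;

class IteratorRemoveDemo {
    // Removes the first value equal to o (treating null specially) by calling
    // remove on the iterator; returns whether anything was removed.
    static <E> boolean removeValue(Iterable<E> values, Object o) {
        for (Iterator<E> it = values.iterator(); it.hasNext(); ) {
            E item = it.next();
            boolean same;
            if (o == null)
                same = (item == null);    // cannot call null.equals(...)
            else
                same = o.equals(item);    // o is known to be non-null here
            if (same) {
                it.remove();
                return true;
            }
        }
        return false;
    }
}
```

For example, calling removeValue on a java.util.ArrayList holding "a", null,
and "b" with o == null removes the null reference and returns true.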


(3) removeAll: This method iterates over es, calling remove on every value
produced by the iterator. If any call to remove returned true, it will return
true; if all calls to remove returned false, it will return false. 

  See the comment about modified = ... in part (1), which applies here as well.

This method does NOT need to be overridden in the ArraySet concrete class.

Note that the remove method is overridden in the ArraySet class to be faster.
So, when removeAll is called on an ArraySet, it calls the faster remove method
defined in ArraySet (that is how inheritance works).


(4) retainAll: This method iterates over the Set, calling remove (via the
iterator) on every value that is not in the parameter collection. If remove is
called one or more times, this method returns true; if remove is never called,
this method returns false.

This method does NOT need to be overridden in the ArraySet concrete class.
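
A minimal sketch of this retainAll logic, again as a static helper rather than
the actual AbstractCollection code: the class name RetainAllDemo is
hypothetical, and java.util.Collection stands in for the parameter collection.

```java
import java.util.Collection;
import java.util.Iterator;

class RetainAllDemo {
    // Removes (via the iterator) every value that is not in keep; returns
    // true if remove was called one or more times.
    static <E> boolean retainAll(Iterable<E> values, Collection<?> keep) {
        boolean modified = false;
        for (Iterator<E> it = values.iterator(); it.hasNext(); )
            if (!keep.contains(it.next())) {
                it.remove();
                modified = true;
            }
        return modified;
    }
}
```

Note that the removal must go through the iterator: removing directly from the
collection being iterated over would invalidate the iteration.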


(5) clear: This method iterates over the Set, calling remove on every value
produced by the iterator, thus leaving the Set empty.

This method should be overridden in the ArraySet concrete class, with a faster
and more direct method.


Queries:

(6) contains: This method iterates over the Set via the Iterator, looking for
the same value as the specified parameter o (and returning true as soon as
such a value is produced). Note that o == null is treated as a special case in
the if, checking item == null; in the else it checks o.equals(item). Writing
this as item.equals(o) would be equivalent only when the Set stores no null
values. If no such value is produced, it returns false.

This method should be overridden in the ArraySet concrete class, with a faster
and more direct method.


(7) containsAll: This method iterates over es, calling contains on every value
produced by the iterator. If any value is not contained in the Set, it will
immediately return false (without needing to check any other values). If every
value produced by the iterator is contained in the Set, it will return true.

This method does NOT need to be overridden in the ArraySet concrete class.

Note that the contains method is overridden in the ArraySet class, so when
containsAll is called on an ArraySet, it calls the contains method defined in
ArraySet (that is how inheritance works).


(8) isEmpty: This method returns true if the size is 0.

This method does NOT need to be overridden in the ArraySet concrete class.

Note that the size method is overridden in the ArraySet class, so when isEmpty
is called on an ArraySet, it calls the size method defined in ArraySet (that is
how inheritance works).


(9) size: This method uses an iterator to count how many values are in the
Set (how many times we can call next on the iterator before hasNext becomes
false) and returns this value.

This method should be overridden in the ArraySet concrete class, with a faster
and more direct method.


Others:

(10) hashCode: We will skip this method for now (and discuss it later, when we
discuss hashing).


(11) toString: This method uses an iterator to catenate all the values produced
by the iterator (with commas in between them). A StringBuffer is a more
efficient way to catenate a large number of values (which is then converted to
a String when the method returns). We could use String to make the code
simpler, but it would be less efficient.

This method should be overridden in the ArraySet concrete class, with a faster
and more direct method, but mostly one that includes further details about the
Array implementation of this collection class.

So, it is amazing how many methods we can concretely define in this abstract
class, by using an iterator; but many (not all) should be overridden by more
efficient methods in the ArraySet (dealing directly with the Array used to
store the Set).

Finally, at the bottom of the AbstractCollection I specify as abstract those
methods in the interface Set but not defined here. The equals method will be
defined in AbstractSet, the rest are defined in ArraySet.


AbstractSet:

The AbstractSet class defines just the equals method. When is a Set equal to
another arbitrary object? If the other object is a Set (regardless of the data
structure used to implement the Set data type), has the same number of values,
and if every value in one set is in the other set. Note that this code cannot
appear in the AbstractCollection class, because it is specific to Sets (for
example, see the equals method in the AbstractList class: it is different
because in Lists, we must compare to another List, and the order of values in
a List, unlike a Set, is important). Here is a line-by-line analysis of this
equals method.

Lines 10-11: If the Set .equals was called on is == to the parameter o, the
Sets are the same (==), and hence .equals with no further analysis.

Lines 12-13: If o is not some kind of Set (don't worry about the type of its
elements), it cannot be equals to the Set .equals was called on.

Line 14: Cast o to be some kind of Set (don't worry about the type of its
elements); this cast is GUARANTEED TO WORK, because lines 12-13 would return
false if this were not some kind of Set.

Lines 15-16: If the sizes of the Sets are not equal, the Sets cannot be equal.

Lines 19-27: Iterate over every object in s (the Set represented by o); if the
Set .equals was called on fails to contain even one of these values, the Sets
are not equal.

Line 28: If we found that o was a Set, the same size as the set .equals was
called on, and that every value in o was contained in the set .equals was
called on, then the sets are equal.

It doesn't matter which set we iterate over/which set we call .contains on.
We can write

  ...
  for (E e : this)        //this refers to the Set .equals was called on
    if (!s.contains(e))
      return false;
  ...

Note that by the time we execute this code, we know both are Sets and both
are the same size.
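
The steps analyzed above can be sketched as a standalone method. This sketch
uses java.util.Set in place of the course's Set interface (an assumption made
for self-containment), so it is not the AbstractSet code itself; the class name
SetEqualsDemo and the method name setEquals are hypothetical.

```java
import java.util.Set;

class SetEqualsDemo {
    // self plays the role of the Set that .equals was called on.
    static boolean setEquals(Set<?> self, Object o) {
        if (self == o)                    // same object: trivially equal
            return true;
        if (!(o instanceof Set))          // not some kind of Set: not equal
            return false;
        Set<?> s = (Set<?>) o;            // guaranteed to work after the test above
        if (self.size() != s.size())      // different sizes: not equal
            return false;
        for (Object e : s)                // every value in s must be in self
            if (!self.contains(e))
                return false;
        return true;
    }
}
```

Because only the Set operations size, contains, and iteration are used, the
comparison works no matter which data structure implements either Set.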


ArraySet:

For a start, ignore the comments on lines 10-29 until after we discuss
analyzing algorithms a bit. Let's jump down to where the instance variables
are declared.

Lines 224-225: These define the two instance variables for an Array
implementation of a Set: the variable "set" stores an E[] and objectCount
stores an int count of how many elements are currently in the set. This is
NOT THE SAME as set.length (objectCount is always <= set.length). We will
soon examine how the array size is increased; it is typically doubled in
length.


Constructors:

Constructor 1: this(1) uses the second constructor (with the argument 1
matching the int parameter, initialCapacity) to construct an Array with
length 1.

Constructor 2: construct an ArraySet of the specified length, throwing an
IllegalArgumentException if initialCapacity is <= 0. We must start with an
Array that can contain at least 1 value (otherwise doubling fails: 2*0 = 0).

Constructor 3: construct an empty ArraySet and then add all the values
produced by iterating over es.

Constructor 4: construct an ArraySet that adds all the values stored in the
Array es (by first allocating an Array whose size is at least 1 and at least
as long as the Array whose values are being put into the Set: if some values
in the array are duplicates, the set won't be as long as the array).

Now, onto the methods defined here. I will say for each whether it must be
defined here, or is defined here to override an inherited method, to improve
the speed of the operation. After we discuss all the operations, we will
finally discuss the ArraySetIterator class, defined inside the ArraySet class
and used by the iterator() method.

Also note that methods that change the data structure (either or both of the
instance variables set/objectCount) must increment the modCount instance
variable inherited from AbstractCollection. This instance variable is declared
to be
"protected", not "private", so that the code in subclasses can access it
directly.


Methods:

(1) add: This method MUST appear in this concrete class. If the specified value
is already in the Set, it immediately returns false. Otherwise it ensures the
length of the array is big enough for 1 more value (we will discuss the 
ensureCapacity method below), puts the value in the required spot (increasing
the objectCount), increments modCount (because the data structure has changed),
and returns true.

  Note that we start with objectCount = 0; so, the first time we execute
    set[objectCount++] = e;
  objectCount gets incremented to 1 (there is now 1 object in the set Array),
  but its old value is used in the [], setting
    set[0] = e;
  The next time we execute
    set[objectCount++] = e;
  objectCount gets incremented to 2 (there are now 2 objects in the set Array),
  but its old value is used in the [], setting
    set[1] = e;
  Notice that the set Array always stores values in indices 0 through
  objectCount-1. That is, if objectCount is 2, the indices 0 and 1 are used.

  (1a) ensureCapacity: This method is called by add; it ensures the set Array
  instance variable is long enough to contain minCapacity values. If not, it
  remembers the "old" set Array, determines the newCapacity (at least twice as
  big as the current set Array's length), constructs a new array that big and
  stores its reference into the set Array, and finally copies into it all the
  values in the old set Array.

  So, increasing the length of an Array is really accomplished by allocating a
  new, bigger Array, and then copying the needed values into it.
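
The interplay of add and ensureCapacity can be sketched with a stripped-down
class. This is an illustration under stated assumptions, not the ArraySet
source: the class name GrowableSet and the capacity() accessor are
hypothetical, and the linear-scan contains is included only so that add can
reject duplicates.

```java
class GrowableSet<E> {
    private E[] set;                 // the array storing the values
    private int objectCount = 0;     // how many indices (0..objectCount-1) are used
    private long modCount = 0;       // incremented on every mutation

    @SuppressWarnings("unchecked")
    GrowableSet(int initialCapacity) {
        if (initialCapacity <= 0)    // doubling a length of 0 would never grow it
            throw new IllegalArgumentException("initialCapacity must be > 0");
        set = (E[]) new Object[initialCapacity];
    }

    boolean contains(Object o) {
        for (int i = 0; i < objectCount; i++)
            if (o == null ? set[i] == null : o.equals(set[i]))
                return true;
        return false;
    }

    boolean add(E e) {
        if (contains(e))
            return false;                // already present: no change
        ensureCapacity(objectCount + 1); // room for 1 more value
        set[objectCount++] = e;          // old count is the index; then increment
        modCount++;
        return true;
    }

    @SuppressWarnings("unchecked")
    private void ensureCapacity(int minCapacity) {
        if (minCapacity <= set.length)
            return;                      // already long enough
        E[] old = set;
        int newCapacity = Math.max(2 * old.length, minCapacity);
        set = (E[]) new Object[newCapacity];
        for (int i = 0; i < objectCount; i++)
            set[i] = old[i];             // copy the needed values
    }

    int size()     { return objectCount; }
    int capacity() { return set.length; }
}
```

Starting from capacity 1 and adding three distinct values, the array length
goes 1, 2, 4 while the size goes 1, 2, 3.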


(2) remove: This method SHOULD appear in this concrete class, providing a speed
improvement over the method inherited from AbstractCollection. It calls a
private helper method, indexOf (discussed below) to compute the smallest index
containing the value o (of course since sets have no duplicates, the smallest
index is the unique index, if the value is present); if indexOf returns -1, o
is not in the Set so remove just returns false (and does not change the data
structure). Otherwise, it calls the private helper method removeAt (discussed
below) with this index and returns the result returned by removeAt.

  (2a) indexOf: This method uses a for loop to scan the set Array directly
  (instead of using an iterator), looking for o (doing a different comparison
  depending on whether o == null). If it finds that value, it returns the index
  in which it first/uniquely appears; if not it returns -1.

  (2b) removeAt: This method moves the last value in the Set to index i,
  removing the value that was at index i from the Array. In the process it
  decrements objectCount (there is now one less object in the set Array). It
  sets the vacated last index to null, increments modCount (because the data
  structure has changed) and returns true.

    Notice if there are five objects stored in the set Array, then objectCount
    is 5 and the objects are stored in indices 0-4 (call these values a, b, c,
    d, and e). If we call removeAt(1), then the object at index 4 (e) is copied
    to index 1 (so b, the old value at index 1, has been removed), and index 4
    is set to null. So now indices 0-3 (objectCount is decremented to 4) have
    the four remaining values (a, e, c, d).

    Note that because the order of a Set is undefined, it was fine for us to
    move the value at the last index to any other index; if the order were
    important (as in a List), we would have to shift a bunch of values to
    retain their relative ordering, instead of doing this faster operation.
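
The a, b, c, d, e example above can be traced with this small sketch. The class
SwapRemove and its contents() accessor are hypothetical (they are not part of
ArraySet), and modCount is left out to keep the focus on the array
manipulation.

```java
import java.util.Arrays;

class SwapRemove {
    Object[] set;
    int objectCount;

    SwapRemove(Object... values) {
        set = values.clone();
        objectCount = values.length;
    }

    // Scans the array directly; returns the index of o, or -1 if absent.
    int indexOf(Object o) {
        for (int i = 0; i < objectCount; i++)
            if (o == null ? set[i] == null : o.equals(set[i]))
                return i;
        return -1;
    }

    // Overwrites index i with the last value, then clears the last index.
    boolean removeAt(int i) {
        set[i] = set[--objectCount];
        set[objectCount] = null;
        return true;
    }

    boolean remove(Object o) {
        int i = indexOf(o);
        return i != -1 && removeAt(i);
    }

    // The used part of the array, indices 0 through objectCount-1.
    Object[] contents() {
        return Arrays.copyOf(set, objectCount);
    }
}
```

After constructing it with a, b, c, d, e and calling removeAt(1), contents()
holds a, e, c, d. No values had to be shifted, which is why this is faster than
an order-preserving removal.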


(3) clear: This method SHOULD appear in this concrete class, providing a speed
improvement over the method inherited from AbstractCollection. This method
just sets objectCount to 0 and increments modCount because the data structure
has changed. It would be useful to set every index containing an object to null
(that would make garbage collection work better), but it is not necessary to
work correctly, since an objectCount of 0 implies there is nothing useful in
all the array indices. We will talk about garbage collection throughout the
quarter.


(4) contains: This method SHOULD appear in this concrete class, providing a
speed improvement over the method inherited from AbstractCollection. This
method calls a private helper method, indexOf (discussed above as 2a), to
compute the smallest index (of course since sets have no duplicates, the
smallest index is the unique index, if the value is present) at which o is
stored in the Set: -1 means o is not in the Set, any other value means it
is in the Set.


(5) iterator: This method MUST appear in this concrete class. It uses the
ArraySetIterator, which we will discuss below.


(6) size: This method SHOULD appear in this concrete class, providing a speed
improvement over the method inherited from AbstractCollection. The instance
variable objectCount directly stores the number of elements in the instance
variable set, so that number represents the size.


(7) newEmpty: This method MUST appear in this concrete class. It just uses the
first constructor to return a new, empty ArraySet.


(8) shallowCopy: This method MUST appear in this concrete class. It uses the
second constructor to construct a new ArraySet (whose length is the same as the
Set that this method was called on). Then it copies the objectCount of the
ArraySet that this method was called on to the objectCount of answer. Finally,
it copies all the references from the set Array of the Set that this method was
called on into the set Array of answer, and then returns the answer Set.

  Notice that both Sets store references to the same values, so if we mutate
  a value from one Set, the other Set will refer to that mutated value. That
  is why this is called "shallow" copying. "Deep" copying would make a copy
  of every value in the original Set as well.

  Note that some classes, like String, are immutable: they contain no mutator
  methods. So, shallow copies don't cause any potential problems for these
  classes.
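
The difference between shallow and deep copying can be seen with arrays of a
mutable class. The sketch below uses StringBuilder as the mutable value type
(an arbitrary choice for illustration); the class name ShallowCopyDemo and its
methods are hypothetical.

```java
import java.util.Arrays;

class ShallowCopyDemo {
    // Copies the references in the array, not the objects they refer to.
    static StringBuilder[] shallowCopy(StringBuilder[] original) {
        return Arrays.copyOf(original, original.length);
    }

    // Copies the array AND makes a new object for every value in it.
    static StringBuilder[] deepCopy(StringBuilder[] original) {
        StringBuilder[] answer = new StringBuilder[original.length];
        for (int i = 0; i < original.length; i++)
            answer[i] = new StringBuilder(original[i]);
        return answer;
    }
}
```

Appending to an element of a shallow copy also changes what the original array
sees at that index, because both arrays refer to the same object; appending to
an element of a deep copy leaves the original untouched.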


(9) toString: This method SHOULD appear in this concrete class. Mostly it is
here to show information about the data structure implementing the Set, which
is an array. The returned String uses a StringBuffer to efficiently catenate
the class name (ArraySet) with the objectCount instance variable and length of
the set Array instance variable, followed by every index in the set Array that
is being used (and the value at each).


ArraySetIterator:

The iterator method returns an object constructed from the ArraySetIterator
class. The state of this object remembers the state of the iteration (which
elements were seen, which was seen last and which will be seen next). Because
this class is declared inside ArraySet, it can refer to all the instance
variables in ArraySet: the set[], objectCount, and modCount. 

  Technically, when one class is declared inside another it is called a NESTED
  class. Nested classes can be static or non-static; this class is non-static
  so it is called an INNER class. Each object constructed from an inner class
  will have a reference to the object of the outer class that constructs it
  (whichever object the iterator method is called on), which is how it accesses
  the information in the outer class.

  In ArraySetIterator, the outer class instance variables are used in hasNext
  (objectCount), next (set and modCount), and remove (set and modCount). Also,
  modCount appears in the initialization of expectedModCount in line 262.  

We will first examine how ArraySetIterator objects are constructed and how
their hasNext and next methods work, then we will examine the code in the
remove method. The code is short, but a bit complicated.

There is no specific constructor for ArraySetIterator, but its three instance
variables (lines 261-263) are all explicitly initialized. Most important
is nextIteratorIndex (initialized to 0, because 0 is the index of the next
element to be iterated over). Also note that the modCount instance variable of
the Set is copied into the expectedModCount instance variable of the nested
class: if a mutator is called on the Set, its modCount will increment and
become unequal to expectedModCount.

(1) hasNext: This method returns whether or not nextIteratorIndex is strictly
less than objectCount. If it is, then there is still another element in the set
Array that the iterator can produce (this index hasn't exceeded the end of the
array).

(2) next: This method checks for two error conditions: (a) whether the data
structure has changed during the iteration and (b) whether there are no more
elements that the iterator can produce; in either case, it throws an exception.
If it passes both checks, it gets the answer from the nextIteratorIndex
(incrementing that index for the next time hasNext/next is called).

  In addition it sets the instance variable removedAlready to false. This means
  that a new element was produced and returned by the next method in this
  iterator, and that element can be removed by the remove method in this
  iterator.

  Notice that removedAlready is initialized to true, so trying to call remove
  before you even call next will throw an exception. Basically, next sets this
  variable to false and remove sets it to true, forcing next to be called
  at least once between calls to remove.

(3) remove: This method checks for two error conditions: (a) whether the data
structure has changed during the iteration and (b) whether the element returned
by the most recent call to next has already been removed; in either case, it
throws an exception. If it passes both checks, it removes the element at the
index one less than nextIteratorIndex (the index of the element just returned
by next). Then it decrements nextIteratorIndex because there is a new element
for the iterator to produce at that index.

  For example, suppose the set Array stores a, b, c, and d at indices 0-3.
  Now suppose nextIteratorIndex is 1: that means next has already returned
  a (the element in index 0). Now we call remove, to remove a from the Set. The
  call to removeAt removes the element at position 0 by storing d there. Thus,
  the set Array now stores d, b, and c at indices 0-2. The next method, if it
  is called again, should return d, so we need to decrement nextIteratorIndex
  from 1 back to 0,  because a new/different element is now in the set Array at
  index 0.

  I could have written the two lines

    removeAt(nextIteratorIndex-1);
    nextIteratorIndex--;

  as the one line

    removeAt(--nextIteratorIndex);

  but I thought things were complicated enough already.

Note that removeAt will increment modCount, but it is OK for an iterator
to mutate the data structure and keep iterating (because the iterator is doing
all the work and knows how to continue doing it), so the new modCount is
stored back into expectedModCount and removedAlready is set to true, which
requires another call to next before remove can be called again.
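
Putting the pieces together, the whole iterator protocol can be sketched in one
small class. This is a simplified illustration, not the ArraySet source: the
names TinyArraySet and TinyIterator are hypothetical, and the specific
exception classes thrown (chosen to follow the java.util conventions) are an
assumption.

```java
import java.util.Arrays;
import java.util.ConcurrentModificationException;
import java.util.Iterator;
import java.util.NoSuchElementException;

class TinyArraySet<E> implements Iterable<E> {
    private Object[] set = new Object[1];
    private int objectCount = 0;
    private long modCount = 0;

    public boolean add(E e) {
        for (int i = 0; i < objectCount; i++)
            if (e == null ? set[i] == null : e.equals(set[i]))
                return false;
        if (objectCount == set.length)
            set = Arrays.copyOf(set, 2 * set.length);
        set[objectCount++] = e;
        modCount++;
        return true;
    }

    public int size() { return objectCount; }

    private void removeAt(int i) {
        set[i] = set[--objectCount];   // move the last value into index i
        set[objectCount] = null;
        modCount++;
    }

    public Iterator<E> iterator() { return new TinyIterator(); }

    // Inner (non-static nested) class: it can use set, objectCount, modCount.
    private class TinyIterator implements Iterator<E> {
        private int nextIteratorIndex = 0;
        private long expectedModCount = modCount;
        private boolean removedAlready = true;   // nothing to remove before next

        public boolean hasNext() { return nextIteratorIndex < objectCount; }

        @SuppressWarnings("unchecked")
        public E next() {
            if (expectedModCount != modCount)
                throw new ConcurrentModificationException();
            if (!hasNext())
                throw new NoSuchElementException();
            removedAlready = false;
            return (E) set[nextIteratorIndex++];
        }

        public void remove() {
            if (expectedModCount != modCount)
                throw new ConcurrentModificationException();
            if (removedAlready)
                throw new IllegalStateException();
            removeAt(nextIteratorIndex - 1);
            nextIteratorIndex--;           // a different value now sits at that index
            expectedModCount = modCount;   // the iterator made this change itself
            removedAlready = true;
        }
    }
}
```

With a, b, c, d in the set, calling next (returning a), then remove, then next
again returns d: exactly the scenario traced in the example above.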


We have now taken the complete tour through all the .java files relating to
the Set data type and its array implementation. Feel free to examine any/all
of the 5 other data types and their array implementations. Each will have some
unique code, but there will also be much similar code as well.

In Programming Assignment #2 you will write list implementations of various
collection classes: each will implement its data type by using a linked list.
Much of the code will mirror what is written here (converting array access to
linked list accesses). Especially interesting is the code relating to Iterators
(where hints will be given).