Using Collection Classes


Introduction:

In this lecture we will discuss the six standard generic data types/collection
classes: Stack, Queue, PriorityQueue, List, Set, and Map. You can read about
their details in the Javadoc that I have written for these classes (see the
"Javadoc of Collections API" link on the course web page). For now, focus on
the information in the Interface Summary, which uses Interfaces and
Subinterfaces to define all the operations applicable to these data types. We
will pay special attention to the iterator() method and how to use the Iterator
object that it produces.

In the next lecture we will focus on how to write concrete classes (using
inheritance and abstract classes) that define data structures implementing
these data types. We will study some of the simple Array data structure
implementations of these data types: specifically constructors in these
classes, the code for implementing their defined operations, and we will take
a brief look at the complexity classes of their methods (and expand on this
topic later in the quarter).

This pair of lectures is followed by Programming Assignment #1, which asks
you to represent and solve various problems using combinations of these data
types. In that assignment, you will focus on understanding/exploiting the data
types, using the implementations that I have provided. In assignments #2,
#3, and #4, you will reimplement some of these data types using more
sophisticated and efficient data structures. In the final assignment, you will
once again use these data types (and their efficient implementations) to
implement a much more complicated data type: a graph.

After we learn more formally about using analysis of algorithms to study the
resource use (performance) of different data structures implementing data
types, we will have covered the three major topics in this course.

Please take some time and effort to understand the difference between data
types and data structures.


Six Data Types/Collection Classes:

Today we will examine the six most important data types in three groups, in the
following order, based on the similarity of their operations:

   (1) Stack, Queue, PriorityQueue
   (2) List and Set
   (3) Map (and Map.Entry: a trivial data type related to Maps)

We will start discussing iterators in detail with the second group. Although
iterators are available in all these collection classes, they are most
frequently used with groups 2-3.

I suggest that you open Eclipse and write some tiny programs that import these
interfaces and classes
  import edu.uci.ics.pattis.ics23.collections.*;
and construct an object from one class and use it to call a few methods, to
test out the knowledge you gain.

These six interfaces are arranged into two inheritance hierarchies. Here each
interface that defines at least one method (Stack, Queue, and Set define none)
is followed by the number of methods defined in that interface (in parentheses).

                       Iterable(1)                    Map(18)
                    /              \
  OrderedCollection(14)         Collection(17) 
   /     |       \                /     \
Stack  Queue  PriorityQueue(1)   Set    List(8) 


----------


Stacks, Queues, and Priority Queues

Stack, Queue, and PriorityQueue are similar, because they are all data types
that store their elements in a predefined order. That is, when we remove
elements from these collections, the order of removal is determined by the data
type of the collection. The orders are: Last-In-First-Out (LIFO, for a stack),
First-In-First-Out (FIFO, for a queue), and Highest-Priority-First-Out (for a
priority queue).
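
Here is a small sketch of these three removal orders. It does not use the
course library: as a stand-in it assumes the standard java.util classes
(ArrayDeque, PriorityQueue, and the Collections.asLifoQueue view), whose details
differ from our interfaces. The names RemovalOrderDemo, addThenDrain, lifo,
fifo, and largestFirst are made up for this illustration.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Collections;
import java.util.Comparator;
import java.util.List;
import java.util.PriorityQueue;
import java.util.Queue;

// Hypothetical demo class; java.util types stand in for the course library.
class RemovalOrderDemo {
  // Adds the values in the order given, then removes until empty,
  // recording the order in which the values come back out.
  static List<Integer> addThenDrain(Queue<Integer> c, int... values) {
    for (int v : values)
      c.add(v);
    List<Integer> removed = new ArrayList<>();
    while (!c.isEmpty())
      removed.add(c.remove());
    return removed;
  }

  // LIFO: a view of a deque in which add pushes and remove pops.
  static Queue<Integer> lifo() {
    return Collections.asLifoQueue(new ArrayDeque<Integer>());
  }

  // FIFO: add at the rear, remove from the front.
  static Queue<Integer> fifo() {
    return new ArrayDeque<Integer>();
  }

  // Largest value first: java.util.PriorityQueue removes the LEAST value
  // according to its comparator, so the natural order is reversed here.
  static Queue<Integer> largestFirst() {
    return new PriorityQueue<Integer>(Comparator.<Integer>reverseOrder());
  }

  public static void main(String[] args) {
    System.out.println(addThenDrain(lifo(), 3, 1, 2));          // [2, 1, 3]
    System.out.println(addThenDrain(fifo(), 3, 1, 2));          // [3, 1, 2]
    System.out.println(addThenDrain(largestFirst(), 3, 1, 2));  // [3, 2, 1]
  }
}
```

Notice that addThenDrain is written once and calls only add, isEmpty, and
remove; the data type alone determines the order of removal.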

The interfaces for Stack, Queue, and PriorityQueue are "trivial": their
operations are mostly inherited from the OrderedCollection interface, which all
three extend. Only the interface for PriorityQueue specifies another operation
named "merge", which requires another PriorityQueue as a parameter.

So, let us start by looking at the OrderedCollection interface. Note that this
is a generic interface, with <E> specifying the type of the elements to be
placed in this collection, and E used when specifying some of the prototypes of
the methods: e.g., add takes an element of type E; remove returns an element of
type E. Note that if we specified E to be Object, then we could store
references to objects of any type in this collection.

I will mostly refer to elements in collections by their simpler name: values.

Also notice that this interface extends Iterable. Look up Iterable (by
clicking on it in the OrderedCollection Javadoc) to see that it just requires
that classes implementing this interface include an "iterator" method.

public interface OrderedCollection<E> extends Iterable<E> {
	
  //Commands
  public boolean add    (E e);
  public boolean addAll (Iterable<? extends E> es);
  public E       remove ()  throws NoSuchElementException;
  public void    clear  ();
  
  //Queries
  public E        peek    () throws NoSuchElementException;
  public boolean  isEmpty ();
  public int      size    ();
  
  //Miscellaneous
  public Iterator<E>          iterator    ();
  public E[]                  toArray     ();
  public OrderedCollection<E> newEmpty    ();
  public OrderedCollection<E> shallowCopy ();

  
  //Inherited (should override)
  public String   toString ();
  public boolean  equals   (Object o);
  public int      hashCode ();
}

In all my interfaces, I classify methods into two main categories: Commands
(also known as "mutators"), which change/mutate the data structure of the
objects they act on; and Queries (also known as "accessors") which examine,
but do not change the data structure of the objects they act on. Methods in
either category can return a result.

I also specify two other classifications, Miscellaneous and Inherited methods,
which are all Queries/accessors: every data structure must implement the
former methods; every data structure inherits but SHOULD OVERRIDE the latter
methods. We will see implementations of these methods in the classes (both
abstract and concrete) that implement these data types.

Let's examine each of these methods more closely. To a large degree, the
operations here have almost identical meanings for the Stack, Queue, and
PriorityQueue data types: the big difference is in how their remove methods
work.


Commands:
      
The "add" method adds (includes) a value into the collection. We don't need to
understand here "how" this is accomplished by data structures that implement
this interface, but only that it will be correctly accomplished, setting up
for "remove" to remove the correct value according to which data type is used.

  Add returns a boolean result: true if adding the value changes the
  collection. For SQPQs (Stacks, Queues, and PriorityQueues), this method
  always returns true, because we can add duplicate values into these
  collections. Contrast this property with the Set collection, which allows NO
  DUPLICATES. For sets, sometimes calling "add" will return false (the value
  to be added was already in the set, so the set remains unchanged) and
  sometimes will return true (the value to be added was NOT already in the set,
  so the set changes).
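
The difference in the result of "add" can be sketched as follows. This is only
an illustration under an assumption: it uses java.util's ArrayDeque and HashSet
as stand-ins for our Queue and Set classes (their add methods follow the same
true/false convention). AddResultDemo and addTwice are hypothetical names.

```java
import java.util.ArrayDeque;
import java.util.Arrays;
import java.util.Collection;
import java.util.HashSet;

// Hypothetical demo class; java.util types stand in for the course library.
class AddResultDemo {
  // Adds the same value twice and reports what each call to add returned.
  static boolean[] addTwice(Collection<String> c, String value) {
    boolean first  = c.add(value);
    boolean second = c.add(value);
    return new boolean[] {first, second};
  }

  public static void main(String[] args) {
    // A queue accepts duplicates, so both adds change it.
    System.out.println(Arrays.toString(addTwice(new ArrayDeque<String>(), "a")));  // [true, true]
    // A set rejects the duplicate, so the second add leaves it unchanged.
    System.out.println(Arrays.toString(addTwice(new HashSet<String>(), "a")));     // [true, false]
  }
}
```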

The "addAll" method iterates through all the values produced by its parameter,
adding each one into the collection. It also returns a boolean result: true if
any added value changed the collection. Note that "addAll" may return false
for SQPQs, but only if the iterator parameter produces NO VALUES; if it
produces even one value, that value will be added to the SQPQ and change it.

  Note that because all but the Map interface inherit from Iterable, we can
  add all the values of a (non-Map) collection into any of these collections.
  So, if we want to add all the values from a Queue q into a Stack s, we can
  write s.addAll(q): Java will iterate through every value in q, adding each
  of these values to the stack s.
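
To make the behavior of "addAll" concrete, here is one way it might be written
in terms of "add". This is a sketch, not the code in the course library: the
helper addAll is a hypothetical static method, and java.util collections are
assumed as stand-ins (a stack is simulated with Collections.asLifoQueue).

```java
import java.util.ArrayDeque;
import java.util.Arrays;
import java.util.Collection;
import java.util.Collections;
import java.util.Queue;

// Hypothetical demo class; java.util types stand in for the course library.
class AddAllDemo {
  // Adds every value produced by source into target.
  // Returns true if at least one add changed target.
  static <E> boolean addAll(Collection<E> target, Iterable<? extends E> source) {
    boolean changed = false;
    for (E e : source)
      if (target.add(e))
        changed = true;
    return changed;
  }

  public static void main(String[] args) {
    Queue<String> q = new ArrayDeque<>(Arrays.asList("a", "b", "c"));
    Queue<String> s = Collections.asLifoQueue(new ArrayDeque<String>());  // acts as a stack

    System.out.println(addAll(s, q));   // true: three values were added
    System.out.println(s.remove());     // c (the value added last)
    System.out.println(q.size());       // 3: iterating over q does not change q
    System.out.println(addAll(s, Collections.<String>emptyList()));  // false: no values produced
  }
}
```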

The "remove" method removes (discards) the next value from the collection, also
returning that value: stacks remove the most recently added value; queues
remove the least recently added value; priority queues remove the value with
the highest priority. The add and remove methods work together to accomplish
this requirement. Finally, note that the remove operation can fail, if there
are no values in the collection to remove (size() is 0 and isEmpty() is true):
in such cases it indicates its failure by throwing the NoSuchElementException.

The "clear" method removes all the values currently in the SQPQ; it is a void
method so returns nothing. Calling size() or isEmpty() after calling "clear"
will return a result of 0 or true respectively.


Queries:
      
The "peek" method returns the same value as would calling the remove method
(or throws the same exception) but DOES NOT REMOVE that value from the
collection. Recall that queries DO NOT change the data structure of the
object they act on. So calling peek a second time returns the same value as
calling peek the first time (if no commands are called in between).

The "size" method returns the number of values currently in the SQPQ. Generally
the add method increments this value by 1 and the remove method decrements
it by 1 (if there is at least one value in the SQPQ).

The "isEmpty" method returns whether or not there are any values in the SQPQ.
It is a convenient boolean method equivalent to testing whether size() == 0.
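
The following sketch checks these properties on a queue. It assumes
java.util.ArrayDeque as a stand-in for our queue classes; note one difference:
in java.util the query that throws NoSuchElementException on an empty queue is
named "element" (its "peek" returns null instead). QueryDemo and its method
names are hypothetical.

```java
import java.util.ArrayDeque;
import java.util.Arrays;
import java.util.NoSuchElementException;
import java.util.Queue;

// Hypothetical demo class; java.util types stand in for the course library.
class QueryDemo {
  // Peeking twice yields the same value and leaves the size alone.
  // Requires a non-empty queue.
  static boolean peekChangesNothing(Queue<String> q) {
    int sizeBefore = q.size();
    String first  = q.element();
    String second = q.element();
    return first.equals(second) && q.size() == sizeBefore;
  }

  // After clear, size() is 0, isEmpty() is true, and remove must fail.
  static boolean removeFailsAfterClear(Queue<String> q) {
    q.clear();
    if (q.size() != 0 || !q.isEmpty())
      return false;
    try {
      q.remove();
      return false;                 // remove should have thrown
    } catch (NoSuchElementException e) {
      return true;
    }
  }

  public static void main(String[] args) {
    Queue<String> q = new ArrayDeque<>(Arrays.asList("a", "b"));
    System.out.println(peekChangesNothing(q));     // true
    System.out.println(removeFailsAfterClear(q));  // true
  }
}
```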


Miscellaneous:

The "iterator" method returns an iterator object that we can use to produce
all the values stored in a collection. See the Iterator class, and its three
methods hasNext, next, and remove, which we will discuss in more detail (along
with a special "for loop" for iterators) when we discuss Lists and Sets.

  Note that the order of values produced by an iterator of a SQPQ is the SAME
  ORDER in which the values in these collections would be returned by
  repeatedly calling the remove method.

The "toArray" method returns a reference to an array that is exactly big enough
(length == size()) to store all the values currently in the collection, which
is filled with the values in this collection in the SAME ORDER as these
values would be returned by repeatedly calling the remove method.

The "newEmpty" method returns an empty SQPQ, from the same class as the
collection it was called on. So, if the type of object that q refers to is an
ArrayQueue, calling q.newEmpty() returns an empty ArrayQueue. Likewise if the
type of object that q refers to is a LinkedQueue, calling q.newEmpty() returns
an empty LinkedQueue.

The "shallowCopy" method returns a new SQPQ that is filled with the same
values as the collection it was called on.  The copy is SHALLOW, meaning that
the two ordered collections SHARE the same objects; such elements must be
mutated carefully (or not at all). So, if the type of q is ArrayQueue, calling
q.shallowCopy() returns an ArrayQueue storing the same values. A deep copy
would make a copy of the collection object, as well as copies of all the
objects stored in the collection.
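
The difference between a shallow and a deep copy can be sketched in code. This
example assumes java.util.ArrayList as a stand-in collection and StringBuilder
as a convenient mutable element type; CopyDemo, shallowCopy, and deepCopy are
hypothetical helpers, not the library's methods.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical demo class; java.util types stand in for the course library.
class CopyDemo {
  // New list object, but it refers to the SAME element objects.
  static List<StringBuilder> shallowCopy(List<StringBuilder> original) {
    return new ArrayList<>(original);
  }

  // New list object that refers to NEW copies of the element objects.
  static List<StringBuilder> deepCopy(List<StringBuilder> original) {
    List<StringBuilder> copy = new ArrayList<>();
    for (StringBuilder sb : original)
      copy.add(new StringBuilder(sb));
    return copy;
  }

  public static void main(String[] args) {
    List<StringBuilder> original = new ArrayList<>();
    original.add(new StringBuilder("x"));

    List<StringBuilder> shallow = shallowCopy(original);
    List<StringBuilder> deep    = deepCopy(original);

    original.get(0).append("!");         // mutate the shared element

    System.out.println(shallow.get(0));  // x! (shared object, so the change is visible)
    System.out.println(deep.get(0));     // x  (separate object, so it is unaffected)
  }
}
```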

I'll try to draw a picture in class illustrating the difference between a
shallow and deep copy. If I forget, ask me to do so.


Inherited (but should be overridden):

The "toString" method returns a String representation of a SQPQ collection.
It includes any useful information about the concrete data structure that
implements this ordered collection: e.g., for an array, it includes the length
of the array and the number of indices used in the array (always <= the
length).

The "equals" method returns whether or not the Object o stores the same
collection that it was called on: meaning the same data TYPE (it is not
necessary to be the same data STRUCTURE), with the same number of values,
appearing in the same order (because order is important in SQPQs). So given

  Stack<String> s  = new ArrayStack<String>();
  Queue<String> q1 = new ArrayQueue<String>();
  Queue<String> q2 = new LinkedQueue<String>();

Calling s.equals(q1) will always return false; calling q1.equals(q1) will
always return true; calling q1.equals(q2) can return either true or false,
depending on whether the two queues store values that are .equals in the same
order. It makes no difference that q1 refers to an object from the ArrayQueue
class and q2 refers to an object from the LinkedQueue class.


We will return to discuss the meaning and purpose of the "hashCode" method
later in the quarter, when we discuss hashing.


Constructors (not in Interfaces, but in classes implementing Interfaces):

Interfaces don't specify constructors, but every class implementing an
interface must specify at least one constructor. For the SQPQ classes,
there will be constructors that

  (1) Construct an empty SQPQ
  (2) Construct a SQPQ initialized by values in an array
  (3) Construct a SQPQ initialized by values in an iterable object

For 2-3, the SQPQ has values added from the array in order from the lowest to
the highest index, and from the iterable object in the order it is iterated
through.

Of special note in all priority queue constructors is a parameter representing
an object implementing the Comparator interface, which the priority queue uses
to determine the relative priority of its values (and thus knows how to remove
the highest-priority value). The Comparator interface contains just the 
"compare" method.


Simple Uses of Ordered Collections

Now we have examined the meanings of the individual operations for SQPQ.
We still must learn how to organize operations to get jobs done, which we will
start now. Programming Assignment #1 deals with this topic in much more depth.

First, by convention when we use collection classes, we will declare the type
of variables using INTERFACES, and construct object references to store in
these variables from CLASSES implementing these interfaces. Recall that there
are three kinds of variables in Java: local variables, parameter variables,
and instance variables. Thus, we might write

PriorityQueue<Integer> q = new ArrayPriorityQueue<Integer>(priorityComparator);

and then write lots of code that manipulates q. Of course, this code will
consist solely of calling the methods specified in the PriorityQueue interface,
and the OrderedCollection interface that it extends. If at a later time we find
(and we will!) that the HeapPriorityQueue class provides a PriorityQueue
implementation that is much faster for most operations, we will need to change
only the right side of this one line in our code (selecting a different
PriorityQueue implementation).

PriorityQueue<Integer> q = new HeapPriorityQueue<Integer>(priorityComparator);

All the code that manipulates q remains unchanged. Since the
HeapPriorityQueue implementation supports exactly the same methods as the
ArrayPriorityQueue (specified by the PriorityQueue and OrderedCollection
interfaces), whatever code we wrote for q will still compile in Java and
run correctly as well, and most likely compute its answers much faster.
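
The same convention can be tried with the standard library. In this sketch,
java.util's Queue interface with its ArrayDeque and LinkedList classes are
assumed as stand-ins for our interfaces and implementing classes;
SwapImplementationDemo and sumAndDrain are hypothetical names.

```java
import java.util.ArrayDeque;
import java.util.Arrays;
import java.util.LinkedList;
import java.util.Queue;

// Hypothetical demo class; java.util types stand in for the course library.
class SwapImplementationDemo {
  // Written only against the Queue INTERFACE, so it works unchanged
  // for any class that implements that interface.
  static int sumAndDrain(Queue<Integer> q) {
    int sum = 0;
    while (!q.isEmpty())
      sum += q.remove();
    return sum;
  }

  public static void main(String[] args) {
    // Only the right side of each declaration names a concrete class.
    Queue<Integer> q1 = new ArrayDeque<Integer>(Arrays.asList(1, 2, 3));
    Queue<Integer> q2 = new LinkedList<Integer>(Arrays.asList(1, 2, 3));

    System.out.println(sumAndDrain(q1));  // 6
    System.out.println(sumAndDrain(q2));  // 6
  }
}
```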

Finally, the use of the generic type parameter <Integer> ensures that the
"add" method can add references only to Integer objects. Attempting to "add"
a reference to an object of any other class results in a compiler error. So
Java ensures that only references to Integer objects are stored in q, because
it is declared PriorityQueue<Integer>.

Likewise, the "remove"/"peek" methods are guaranteed to return a reference to
an Integer object, so we don't need to cast to this class to call any Integer
methods. So, by using a generic type parameter, the Java compiler ensures that
only references to the specified type of objects are put in and removed/peeked
from the collection.

Here are a few small examples of such code. Suppose that we have a String[] x
and we want to reverse the order in which the values occur. We could easily
use a Stack for this job.

  Stack<String> s = new ArrayStack<String>();
  for (int i=0; i<x.length; i++)
    s.add(x[i]);
  for (int i=0; i<x.length; i++)
    x[i] = s.remove();

Hand simulate this code on a 4 or 5 element array to see that we have used the
LIFO property of stacks to reverse the order of the strings in the array.

In fact, we could rewrite the first three lines as just

  Stack<String> s = new ArrayStack<String>(x);

because one constructor for ArrayStack<String> takes a String[] as a parameter.

Likewise, if we wanted to sort an array of Strings, we could use similar code:

  PriorityQueue<String> p = new ArrayPriorityQueue<String>(new AlphabeticalOrder());
  for (int i=0; i<x.length; i++)
    p.add(x[i]);
  for (int i=0; i<x.length; i++)
    x[i] = p.remove();

where

public class AlphabeticalOrder implements Comparator<String> {
  public int compare(String s1, String s2)
  {return -s1.compareTo(s2);}
}

Note for example that we want "A" to have a higher priority than "B" (so it
will be removed from the priority queue first, and be put earlier in the 
array). But "A".compareTo("B") returns a negative number (because "A" is less
than "B" in the alphabetical order), so we negate the result, and thus return
a positive number, giving "A" the higher priority. Of course, we could also
have used an anonymous class for this purpose as well.
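
For comparison, here is a sketch of the same comparator written three ways: as
the named class, as an anonymous class, and (assuming Java 8 or later) as a
lambda. ComparatorDemo and higherPriority are hypothetical names used only to
check the sign convention described in the text.

```java
import java.util.Comparator;

// The named class from the text (non-public so one file can hold everything).
class AlphabeticalOrder implements Comparator<String> {
  public int compare(String s1, String s2) {
    return -s1.compareTo(s2);
  }
}

// Hypothetical demo class.
class ComparatorDemo {
  // The same comparator as an anonymous class.
  static final Comparator<String> ANONYMOUS = new Comparator<String>() {
    public int compare(String s1, String s2) {
      return -s1.compareTo(s2);
    }
  };

  // The same comparator as a lambda (Java 8 or later).
  static final Comparator<String> LAMBDA = (s1, s2) -> -s1.compareTo(s2);

  // Returns whichever value the comparator ranks higher (a on ties), using
  // the convention in the text: a positive result means that the first
  // argument has the higher priority.
  static String higherPriority(Comparator<String> c, String a, String b) {
    return c.compare(a, b) >= 0 ? a : b;
  }

  public static void main(String[] args) {
    System.out.println(higherPriority(new AlphabeticalOrder(), "A", "B"));  // A
    System.out.println(higherPriority(ANONYMOUS, "B", "A"));                // A
    System.out.println(higherPriority(LAMBDA, "A", "B"));                   // A
  }
}
```

Be careful if you try this comparator with java.util.PriorityQueue instead of
the course classes: that class uses the opposite convention, removing the
LEAST value according to its comparator.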

Finally, it would be possible in the first and second examples to specify

  OrderedCollection<String> s = new ArrayStack<String>();

and

  OrderedCollection<String> p = new ArrayPriorityQueue<String>(new AlphabeticalOrder());

and then -IN BOTH CASES- follow these declarations by the SAME CODE. Notice
the operations performed on s and p are identical, just .add and .remove,
and both methods are specified in the OrderedCollection interface, which the
Stack and PriorityQueue interfaces inherit.

I use the Stack, Queue, and PriorityQueue interfaces for the types of
variables, so that by looking at their declarations, I can understand the
meaning of the .add and .remove methods when applied to the variables they
declare.

----------

Lists and Sets

Lists and Sets are similar, because they are data types that primarily allow us
to add and remove elements, and query whether those elements are present or
absent (something that we cannot easily/efficiently do with SQPQs). The primary
difference between these data types is that Lists store their elements at
specific integer indices that we can query and manipulate (like arrays), while
Sets have no concept of indices (and later we will see that we can leverage off
the "unorderedness" of Sets to implement some Set operations very efficiently).

Also recall that while a List can store the same element at different indices,
a Set cannot have duplicate elements (which is why its "add" method can
sometimes return true and sometimes return false).

The interface for Set is "trivial": it inherits all its operations from the
Collection interface. The interface for List adds another half a dozen
operations to the ones it inherits from the Collection interface; these added
operations all concern themselves with the INDICES of elements to examine
and/or manipulate.

So, let us start by looking at the Collection interface. Note that this one is
also a generic interface, with <E> specifying the type of the elements to be
placed in this collection, and E used when specifying some of the prototypes of
the methods. Again too, if we specified E to be Object, then we could store
objects of any type in this collection.

I will continue to refer to elements in collections mostly by their simpler
name: values.

Also notice that this interface extends Iterable. Look up Iterable (by
clicking on it in the Collection Javadoc) to see that it just requires
that classes implementing this interface include an "iterator" method.

public interface Collection<E> extends Iterable<E> {
	
  //Commands
  public boolean add       (E e);
  public boolean addAll    (Iterable<? extends E> es);
  public boolean remove    (E o);
  public boolean removeAll (Iterable<? extends E> es);
  public boolean retainAll (Collection<? extends E> c);
  public void    clear     ();
  
  //Queries
  public boolean contains    (Object o);
  public boolean containsAll (Iterable<? extends E> es);
  public boolean isEmpty     ();
  public int     size        ();
  
  //Miscellaneous
  public Iterator<E>   iterator    ();
  public E[]           toArray     ();
  public Collection<E> newEmpty    ();
  public Collection<E> shallowCopy ();

  
  //Inherited (should override)
  public String       toString ();
  public boolean      equals   (Object o);
  public int          hashCode ();
}

Let's examine some of these methods more closely.


Commands:
      
The "add" method adds (includes) a value into the collection. We don't need to
understand here "how" this is accomplished by data structures that implement
this interface, but only that it will be correctly accomplished.

  When we add a value to a List, it is stored at an index one higher than the
  last used index (and becomes the last used index). Sets don't have indices,
  so we say just that the value is included in the Set.

  Add for a List always returns true (the List is always changed) but add for a
  Set returns true only if the value was not already in the set; if the value
  is already in the Set, the Set doesn't change.

  Regardless of whether we use a List or a Set, the contains query returns true
  for any value that we have added to (but not yet removed from) a collection.

The "addAll" method iterates through all the values produced by its parameter,
adding each one into the collection. It also returns a boolean result: true if
any added value changed the collection. 

  Note that because all but the Map interface inherit from Iterable, we can
  add all the values of a (non-Map) collection into these collections.

The "remove" method removes (discards) one instance of the specified object
from the collection and returns true if it was successful (it found and removed
that value). Sets just have one instance of each value, but Lists can have many
(in which case only one is removed: the first one in the list, which is the one
with the lowest index).

  Note for OrderedCollection, the order of the collection determines which
  value to remove; for a Collection (List and Set) we must specify a parameter
  that is the value to remove.

The "removeAll" method iterates through all the values produced by its
parameter, removing each one from the collection. It also returns a boolean
result: true if any removed value changed the collection (i.e., it found and
removed any value).

The "retainAll" method iterates through all the values produced by its
parameter, retaining only those values in the collection (removing all others).
Thus, the collection will consist of only those values produced by the iterator
of c that were originally in the collection. It also returns a boolean result:
true if the collection changed (any value was removed).
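
Here is a sketch of "removeAll" and "retainAll" in action. It assumes
java.util's TreeSet as a stand-in for our Set classes (its methods of the same
names take a Collection parameter, but otherwise behave as described here).
BulkRemoveDemo, without, and common are hypothetical names.

```java
import java.util.Arrays;
import java.util.Set;
import java.util.TreeSet;

// Hypothetical demo class; java.util types stand in for the course library.
class BulkRemoveDemo {
  // Copy of a with every value of b removed (a itself is left unchanged).
  static Set<Integer> without(Set<Integer> a, Set<Integer> b) {
    Set<Integer> result = new TreeSet<>(a);
    result.removeAll(b);
    return result;
  }

  // Copy of a keeping only the values that are also in b.
  static Set<Integer> common(Set<Integer> a, Set<Integer> b) {
    Set<Integer> result = new TreeSet<>(a);
    result.retainAll(b);
    return result;
  }

  public static void main(String[] args) {
    Set<Integer> a = new TreeSet<>(Arrays.asList(1, 2, 3, 4));
    Set<Integer> b = new TreeSet<>(Arrays.asList(3, 4, 5));

    System.out.println(without(a, b));                  // [1, 2]
    System.out.println(common(a, b));                   // [3, 4]
    System.out.println(new TreeSet<>(a).retainAll(a));  // false: nothing was removed
    System.out.println(new TreeSet<>(a).removeAll(b));  // true: 3 and 4 were removed
  }
}
```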

The "clear" method removes all the values currently stored in the collection;
it is a void method so returns nothing. Calling size() or isEmpty() after
calling "clear" will return a result of 0 or true respectively.


Queries:
      
The "contains" method returns whether or not the specified object is stored
somewhere in the collection (at most once for Sets; at least once for Lists).

The "containsAll" method returns whether or not all the objects produced by
its specified parameter are in the collection.

The "size" method returns the number of values currently in the collection.

The "isEmpty" method returns whether or not there are any values in the 
collection.  It is a convenient boolean method equivalent to testing whether 
size() == 0.


Miscellaneous:

The "iterator" method returns an iterator object that we can use to produce
all the values stored in a collection. See the Iterator class, and its three
methods hasNext, next, and remove, which we will discuss in more detail (along
with a special "for loop" for iterators) after we discuss Lists below.

  Note that the order of values produced by an iterator of a List is the order
  in which the values are stored in the list: the value at index 0, followed by
  the value at index 1, etc. up to index size()-1. The order of values produced
  by an iterator of a Set is undefined (and will depend on the class
  implementing this interface): its elements are produced in no predictable
  order, so do not assume a Set iterator produces values in any interesting
  order; in fact, every time an iterator is created, it can iterate through
  the values in a DIFFERENT order! We can make few assumptions about the order.
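
As a preview, the three Iterator methods and the special "for loop" might be
used as follows. This sketch assumes java.util's Iterator and ArrayList as
stand-ins for the course classes; IteratorDemo, removeEvens, and sum are
hypothetical names.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collection;
import java.util.Iterator;
import java.util.List;

// Hypothetical demo class; java.util types stand in for the course library.
class IteratorDemo {
  // Explicit use of hasNext, next, and remove: discards every even value.
  // it.remove() removes the value most recently returned by it.next().
  static void removeEvens(Collection<Integer> c) {
    Iterator<Integer> it = c.iterator();
    while (it.hasNext())
      if (it.next() % 2 == 0)
        it.remove();
  }

  // The special "for loop": it calls iterator(), hasNext, and next for us,
  // so it works on anything that is Iterable.
  static int sum(Iterable<Integer> values) {
    int total = 0;
    for (int v : values)
      total += v;
    return total;
  }

  public static void main(String[] args) {
    List<Integer> list = new ArrayList<>(Arrays.asList(1, 2, 3, 4));
    removeEvens(list);
    System.out.println(list);       // [1, 3]
    System.out.println(sum(list));  // 4
  }
}
```

Because sum only adds the values together, it computes the same result for a
Set as for a List, no matter what order the Set's iterator uses.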

The "toArray" method returns a reference to an array that is exactly big enough
(length == size()) to store all the values currently in the collection, which
is filled with the values in this collection. In a List, these will be in
order from index 0 through index size()-1; in a Set, the order is undefined
(and will depend on the class implementing this interface: it uses an iterator
to fill in the array values).

The "newEmpty" method returns an empty collection, from the same class as the
collection it was called on.

The "shallowCopy" method returns a new collection that is filled with the same
values as the collection it was called on.  The copy is SHALLOW, meaning that
the two collections SHARE the same objects; such elements must be
mutated carefully (or not at all).  A deep copy would make a copy of the
collection object, as well as copies of all the objects it contains.


Inherited (but should be overridden):

The "toString" method returns a String representation of a collection. It
includes any useful information about the concrete data structure that
implements this collection: e.g., for an array, it includes the length
of the array and the number of indices used in the array (always <= the
length).

The "equals" method returns whether or not the Object o stores the same
collection that it was called on: meaning the same data TYPE (it is not
necessary to be the same data STRUCTURE), with the same values. For Lists this
means a List with the same number of values, with the values appearing in the
same order (because order is important). For Sets this means a Set with the
same size and same values (order is not important).
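
The standard library follows the same rules for equals, so they can be checked
with a small sketch. It assumes java.util's ArrayList/LinkedList and
HashSet/TreeSet as stand-ins for different data structures implementing the
List and Set data types; EqualsDemo and results are hypothetical names.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashSet;
import java.util.LinkedList;
import java.util.List;
import java.util.Set;
import java.util.TreeSet;

// Hypothetical demo class; java.util types stand in for the course library.
class EqualsDemo {
  static boolean[] results() {
    List<String> arrayList  = new ArrayList<>(Arrays.asList("x", "y"));
    List<String> linkedList = new LinkedList<>(Arrays.asList("x", "y"));
    List<String> reversed   = new ArrayList<>(Arrays.asList("y", "x"));
    Set<String>  hashSet    = new HashSet<>(Arrays.asList("x", "y"));
    Set<String>  treeSet    = new TreeSet<>(Arrays.asList("y", "x"));

    return new boolean[] {
      arrayList.equals(linkedList),  // same data type, same values, same order
      arrayList.equals(reversed),    // Lists: order matters
      hashSet.equals(treeSet),       // Sets: order does not matter
      arrayList.equals(hashSet)      // different data types are never equal
    };
  }

  public static void main(String[] args) {
    System.out.println(Arrays.toString(results()));  // [true, false, true, false]
  }
}
```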

We will return to discuss the meaning and purpose of the "hashCode" method
later in the quarter, when we discuss hashing.


Constructors (not in Interfaces):

Interfaces don't specify constructors, but every class implementing an
interface must specify at least one constructor. For the List and Set classes,
there will be constructors that

  (1) Construct an empty List or Set
  (2) Construct a List or Set initialized by values in an array
  (3) Construct a List or Set initialized by values in an iterable object

For 2-3, the Set/List has values added from the array in order from the lowest
to the highest index, and from the iterable object in the order it is iterated
through. In Lists the values are stored in this same order; in Sets there is no
order.


Simple Uses of Sets

Continuing with our convention, here is some code that fills a Set with 5
different String values, gotten by prompting the user.

  Set<String> s = new ArraySet<String>();
  int count = 0;
  while (count <5) {
    String attempt = Prompt.forString("Enter a String");
    if (! s.contains(attempt) ) {
      s.add(attempt);
      count++;
    }
  }

First, notice that we don't need a local variable to count the number of values
in the Set (it has a query for that). So we can simplify this code to be

  Set<String> s = new ArraySet<String>();
  while (s.size() <5) {
    String attempt = Prompt.forString("Enter a String");
    if (! s.contains(attempt) )
      s.add(attempt);
  }

Second, notice that if we add a String that is already in the Set, the Set
remains unchanged. So, we don't need to test whether it is already contained in
s before adding it. Thus, we can further simplify this code to be

  Set<String> s = new ArraySet<String>();
  while (s.size() <5) {
    String attempt = Prompt.forString("Enter a String");
    s.add(attempt);
  }

Finally, we don't really need the variable attempt (now that its value is used
in just one place), so we can simplify this code to be

  Set<String> s = new ArraySet<String>();
  while (s.size() <5)
    s.add(Prompt.forString("Enter a String"));
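
To try this final version without the Prompt class, here is a runnable sketch
under two assumptions: java.util.HashSet stands in for ArraySet, and the
"user's input" is supplied by an Iterator of Strings rather than by prompting.
DistinctInputDemo and collectDistinct are hypothetical names.

```java
import java.util.Arrays;
import java.util.HashSet;
import java.util.Iterator;
import java.util.Set;

// Hypothetical demo class; java.util types stand in for the course library.
class DistinctInputDemo {
  // Keeps adding inputs until the set holds howMany different values
  // (or the inputs run out). Duplicates are ignored by add itself.
  static Set<String> collectDistinct(Iterator<String> inputs, int howMany) {
    Set<String> s = new HashSet<String>();
    while (s.size() < howMany && inputs.hasNext())
      s.add(inputs.next());
    return s;
  }

  public static void main(String[] args) {
    Iterator<String> inputs =
      Arrays.asList("a", "b", "a", "c", "b", "d", "e", "f").iterator();
    Set<String> s = collectDistinct(inputs, 5);

    System.out.println(s.size());         // 5
    System.out.println(s.contains("e"));  // true
    System.out.println(inputs.next());    // f (never read, because the set was full)
  }
}
```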

Here is another interesting equivalence.

  Set<String> s = new ArraySet<String>();
  int successfulRemoves = 0;
  ...
  String value = Prompt.forString("Enter String to try to remove");
  if (s.contains(value)){
    s.remove(value);
    successfulRemoves++;
  }

Notice that because remove returns a boolean, we can simplify this code to be
    
  Set<String> s = new ArraySet<String>();
  int successfulRemoves = 0;
  ...
  String value = Prompt.forString("Enter String to remove");
  if (s.remove(value))
    successfulRemoves++;
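
The same idea extends to a whole sequence of attempted removals. This sketch
assumes java.util.HashSet in place of ArraySet and takes the values to remove
from an Iterable instead of prompting; RemoveCountDemo and
countSuccessfulRemoves are hypothetical names.

```java
import java.util.Arrays;
import java.util.HashSet;
import java.util.Set;

// Hypothetical demo class; java.util types stand in for the course library.
class RemoveCountDemo {
  // Tries to remove each candidate; counts the removals that succeeded.
  static int countSuccessfulRemoves(Set<String> s, Iterable<String> candidates) {
    int successfulRemoves = 0;
    for (String value : candidates)
      if (s.remove(value))          // true only if value was in s
        successfulRemoves++;
    return successfulRemoves;
  }

  public static void main(String[] args) {
    Set<String> s = new HashSet<String>(Arrays.asList("a", "b", "c"));
    // "x" is never in s, and the second "a" is already gone.
    System.out.println(countSuccessfulRemoves(s, Arrays.asList("a", "x", "a", "c")));  // 2
    System.out.println(s);                                                             // [b]
  }
}
```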

The more you think about and practice using the methods in these classes, the
simpler and more elegant your code will become.


----------

List (only)

The List interface extends the Collection interface with the following methods,
each of which specifies an index as one of its parameters.

public interface List<E> extends Collection<E> {
  //Commands
  public boolean add    (int index, E e)                       throws IndexOutOfBoundsException;
  public E       remove (int index)                            throws IndexOutOfBoundsException;
  public E       set    (int index, E e)                       throws IndexOutOfBoundsException;
  public boolean addAll (int index, Iterable<? extends E> es)  throws IndexOutOfBoundsException;

  //Queries
  public E       get         (int index) throws IndexOutOfBoundsException ;
  public int     indexOf     (E e);
  public int     lastIndexOf (E e);
  public List<E> subList     (int startIndex, int stopIndex) throws IndexOutOfBoundsException ;
}


Commands (all involving indexes):

The "add" method adds (includes) the specified value into the List at the
specified index, moving the values at that index and all subsequent indexes up
by one. So long as this method doesn't throw the IndexOutOfBoundsException (the
specified index must be in the List or one beyond the end) it always returns
true and increments the size by 1.

The "remove" method removes (discards) a value from the List at the specified
index, moving the values one beyond that index and all subsequent indexes down
by one. It returns the removed value. So long as this method doesn't throw the
IndexOutOfBoundsException (the specified index must be in the List) it always
decrements the size by 1.

The "set" method sets (replaces) the specified index in the list to store the
specified value, MOVING NO OTHER VALUES. It returns the value originally stored
at that index.  The size doesn't change, but the method may throw the
IndexOutOfBoundsException (the specified index must be in the List).

The "addAll" method adds (includes) all the values produced by an iterator on
its parameter starting at the specified index, moving the values at that index
and all subsequent indexes up by the number of values added. So long as this
method doesn't throw the IndexOutOfBoundsException (the specified index must be
in the List or one beyond the end) and the iterator produces one or more
values, it always returns true and increments the size by the number of
values added.


Queries (all involving indexes):
      
The "get" method returns the value stored in the List at the specified
index, throwing the IndexOutOfBoundsException only if the specified index
is not in the List.

The "indexOf" method returns the number that is the lowest index that stores
the specified value if it is present in the List, and -1 (never a legal index)
if it is not present in the List.

The "lastIndexOf" method returns the number that is the highest index that
stores the specified value if it is present in the List, and -1 (never a legal
index) if it is not present in the List. Note that if the value is present and
indexOf(e) == lastIndexOf(e), then there are no duplicates of the specified
value.

The "subList" method returns a new List consisting of all values stored
between the specified indices (inclusive). The original list remains unchanged
(which is why this method is a query). The lowest index in the returned List
is 0 (just like any List). This method throws the IndexOutOfBoundsException
if either index is not in the List.
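
Here is a minimal sketch of the index-based commands and queries described
above. It ASSUMES java.util.ArrayList as a stand-in for the course's List
classes; the class name ListIndexDemo is hypothetical. Note one difference:
in java.util, add(int,E) is void rather than returning true.

```java
import java.util.ArrayList;
import java.util.List;

class ListIndexDemo {
  // Builds a small List by applying the index-based commands in turn.
  // Assumption: java.util.ArrayList stands in for the course's List classes.
  static List<String> build() {
    List<String> l = new ArrayList<String>();
    l.add(0, "b");   // [b]
    l.add(0, "a");   // [a, b]       "b" moves up to index 1
    l.add(2, "c");   // [a, b, c]    index == size() (one beyond the end) is legal
    l.set(1, "x");   // [a, x, c]    replaces "b"; no other values move
    l.add(1, "c");   // [a, c, x, c] "x" and "c" move up by one
    l.remove(2);     // [a, c, c]    removes "x"; the last "c" moves down
    return l;
  }

  public static void main(String[] args) {
    List<String> l = build();
    System.out.println(l);                   // [a, c, c]
    System.out.println(l.indexOf("c"));      // 1  (lowest index storing "c")
    System.out.println(l.lastIndexOf("c"));  // 2  (highest index storing "c")
    System.out.println(l.indexOf("z"));      // -1 ("z" is not in the List)
  }
}
```

Because indexOf("c") != lastIndexOf("c") here, we know that "c" is duplicated
in this List.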


Note that an array is just like a List with only get and set methods:

  a[i]              is equivalent to a.get(i)
  a[i] = something  is equivalent to a.set(i,something)

Note that although the syntax needed to retrieve and store a value in a List
is more cumbersome than the syntax needed for an array, there are lots of other
methods built into Lists that perform more complex List manipulation, which
would require writing additional statements if we were using arrays.

Therefore, it is typically more convenient to use Lists than arrays, even if
doing the simple operations looks a bit more complicated.

For more proof, also see the Collections (notice it is plural) class, which
defines lots of static methods that operate on List (and other kinds of) values.
For example there is a method to sort a List and a method to shuffle (put in
random order) a list, etc. Of course, it makes no sense to sort a set because
there is no order of values in a set.
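
As a small sketch of these static methods, the code below sorts and shuffles
a List. It ASSUMES the standard java.util.Collections class and
java.util.ArrayList (the course's own classes may differ); the names
CollectionsDemo and sortedCopy are hypothetical.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.List;

class CollectionsDemo {
  // Returns a sorted copy, leaving the original List unchanged.
  static List<String> sortedCopy(List<String> original) {
    List<String> copy = new ArrayList<String>(original);
    Collections.sort(copy);   // natural (compareTo) order for Strings
    return copy;
  }

  public static void main(String[] args) {
    List<String> fruit = new ArrayList<String>(Arrays.asList("pear", "apple", "fig"));
    System.out.println(sortedCopy(fruit));  // [apple, fig, pear]
    Collections.shuffle(fruit);             // puts fruit in some random order
    System.out.println(fruit);              // order varies from run to run
  }
}
```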


----------

Iterators:

We will now begin our discussion of iterators, which applies in some form to
each of the six collection classes. We can use an iterator to produce every
value stored in a collection, and in the process also remove selected values
from the collection (without calling the remove method on the collection, but
instead calling the remove method on the iterator). We will examine the first
(and simpler) of these uses first.

When we call iterator(), it produces an object from a class that implements
the Iterator interface (much like a StringTokenizer, but for collections not
Strings).

public interface Iterator<E> {
  public boolean hasNext();
  public E       next   () throws NoSuchElementException;
  public void    remove () throws IllegalStateException, UnsupportedOperationException;
}

To produce every value stored in a collection, we need to call just the first
two methods. Here is how to print every String in a Set<String> s; we could
write the exact same code if s were a Stack, Queue, PriorityQueue or List. Note
that a Set (unlike the others) has no intrinsic order, and a Set (unlike the
others) gives no way other than this to examine its elements (no parameterless
"remove" like a SQPQ or "get" like a List).

  Iterator<String> is = s.iterator();
  while (is.hasNext())
    System.out.println( is.next() );

Note that our code should always check/call hasNext before calling next; if
next is called when hasNext returns false, then next will throw 
NoSuchElementException. Students often forget this important rule. Also note
that every time you call next it produces a new value. So the boolean
expression x.equals(is.next()) || y.equals(is.next()) DOES NOT check whether
the next value produced by the iterator is equal to x or y: it checks whether
the next value in the iterator is equal to x, or the value after that one in
the iterator is equal to y. Two calls are made to is.next(), with each
returning a different result. Even if we checked is.hasNext(), this
expression might throw an exception because the second call to next might
not be able to produce a value.
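
One way to avoid this mistake is to call next exactly once per hasNext check
and store its result in a local variable. The sketch below shows this; the
names NextOnceDemo and containsEither are hypothetical, and it ASSUMES the
standard java.util classes as a stand-in for the course's collections.

```java
import java.util.Arrays;
import java.util.Iterator;
import java.util.List;

class NextOnceDemo {
  // Returns whether any value produced by c's iterator equals x or y.
  static boolean containsEither(Iterable<String> c, String x, String y) {
    for (Iterator<String> is = c.iterator(); is.hasNext(); /* see body */) {
      String v = is.next();               // one call to next per hasNext check
      if (x.equals(v) || y.equals(v))     // both tests examine the SAME value
        return true;
    }
    return false;
  }

  public static void main(String[] args) {
    List<String> l = Arrays.asList("a", "b", "c");
    System.out.println(containsEither(l, "z", "c"));  // true
    System.out.println(containsEither(l, "q", "r"));  // false
  }
}
```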

What code could we write to print a random value from a Set, by using its
iterator? This is easy for a List: generate a random number in the range
[0, s.size()) and call get with that random number; but a Set has no
indices. Hint: still generate the random number but do something different with
it, involving an iterator.

We can make use of a for loop and be a bit clever here and write some more
condensed code (combining the iterator declaration and check):
  
  for (Iterator<String> is = s.iterator(); is.hasNext(); /* see body */)
    System.out.println( is.next() );

Java recently introduced an even more special iterator-related for loop. It is
called the FOR-EACH iterator. If s is an object implementing the Iterable 
interface (meaning only that it contains an iterator() method, which Sets do)
we can write the code above as simply as

  for (String e : s)
    System.out.println( e );

Read this as, "for every String e when we iterate through Set<String> s,
print e".

Generally, when c is a class that contains an iterator() method returning
Iterator<T>, we can write

  for (T e : c) 
    body using e

which Java translates into the following code, which is more complicated to read

  for (Iterator<T> hiddenI = c.iterator(); hiddenI.hasNext(); /* see body */) {
    final T e = hiddenI.next();
    body using e
  }

We cannot refer to the name hiddenI in our code: it is a secret name that only
the Java compiler knows about.

So, if we need to produce all the values stored in an Iterable collection class
(all but Map are), and we do not need to remove any values, then we should
prefer writing this compact form of the FOR-EACH loop.

As a final example, suppose that we have two List<String> values, l1 and l2,
and want to write a method that returns the first index at which they store
different values (and -1 if all indexes store equal Strings). We can write this
method symmetrically as

public static int firstDifference(List<String> l1, List<String> l2) {
  Iterator<String> l1I = l1.iterator();
  Iterator<String> l2I = l2.iterator();
  int i=0;
  for (/* see above */; l1I.hasNext() && l2I.hasNext(); i++)
    if (!l1I.next().equals(l2I.next()))
      return i;

  if (!l1I.hasNext() && !l2I.hasNext())
    return -1;  //Lists the same size with no differences
  else
    return i;   //One List is longer
}

We can also call the remove method on an iterator: it removes from the
collection the value "just returned" by next. This meaning causes a 
bit of confusion for beginners, but when you understand this rule, you will
see that it is an obvious and correct one.

Here is one example to help. Suppose we have Set<String> s and we want to
remove all Strings whose lengths are > 5. We can write

  for (Iterator<String> is = s.iterator(); is.hasNext(); /* see body */)
    if ( is.next().length() > 5 )
       is.remove();
      
We cannot use a for-each if we need to call remove, because that form declares
no explicit iterator on which we can call remove.

Notice that inside the if's condition, Java produces the next value from
the Set and checks it for the desired property; it then calls remove which
removes that value (the one "just returned" by next). This leads to two
possibilities of remove throwing IllegalStateException, as illustrated below.

1) Iterator<String> is = s.iterator();
   is.remove(); 

This call throws the exception because next has never been called: there is no
"just returned" value.


2) Iterator<String> is = s.iterator();
   is.next();   //Assume there is one value to produce
   is.remove(); //This statement removes that value
   is.remove(); //This statement throws an IllegalStateException

The second/final call to remove throws an exception; the "just returned" value
has already been removed, and we cannot remove it again; we need to call next
again, before another value can be removed.

Some classes do not even support calling remove; in those cases (collections
where we cannot remove values via the iterator) calling remove throws
UnsupportedOperationException.
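
The sketch below puts these rules together in runnable form. It ASSUMES
java.util.HashSet as a stand-in for the course's Set classes; the names
IteratorRemoveDemo, removeLong, and secondRemoveFails are hypothetical.

```java
import java.util.Arrays;
import java.util.HashSet;
import java.util.Iterator;
import java.util.Set;

class IteratorRemoveDemo {
  // Removes from s every String whose length is > maxLength.
  static void removeLong(Set<String> s, int maxLength) {
    for (Iterator<String> is = s.iterator(); is.hasNext(); /* see body */)
      if (is.next().length() > maxLength)
        is.remove();            // removes the value "just returned" by next
  }

  // Returns whether a second remove (with no next in between) throws
  // IllegalStateException. Assumes s stores at least one value.
  static boolean secondRemoveFails(Set<String> s) {
    Iterator<String> is = s.iterator();
    is.next();
    is.remove();                // fine: removes the value just returned
    try {
      is.remove();              // no "just returned" value remains
      return false;
    } catch (IllegalStateException e) {
      return true;
    }
  }

  public static void main(String[] args) {
    Set<String> s = new HashSet<String>(Arrays.asList("cat", "elephant", "giraffe", "dog"));
    removeLong(s, 5);
    System.out.println(s.size());              // 2 ("cat" and "dog" remain)
    System.out.println(secondRemoveFails(s));  // true
  }
}
```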

Finally, there is one more exception related to iterators. If we are iterating
through a collection and we change the collection in any way OTHER THAN THROUGH
THE REMOVE ON THE ITERATOR (typically by adding or removing a value) a 
subsequent use of an iterator method will throw a 
ConcurrentModificationException. 

The basic idea here is that if you change a collection while you are in
the process of iterating over it (but not through the iterator itself), then
the meaning of the iterator becomes unclear, so the iterator refuses to work
further.  Iterators that do this are known as FAIL-FAST iterators, as they fail
quickly if their underlying collection is changed.

For example, here is the WRONG WAY to remove long Strings from Set<String> s.

  for (Iterator<String> is = s.iterator(); is.hasNext(); /* see body */) {
    String toCheck = is.next();
    if ( toCheck.length() > 5 )
       s.remove(toCheck);
  }

If we call s.remove(...) then the Set is changed -not through the iterator-
and then when we call is.next() the next time in the loop, it will throw an
exception.
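
Here is a runnable sketch of this failure. It ASSUMES java.util.LinkedHashSet
(chosen only because it iterates in insertion order, making the outcome
predictable) as a stand-in for the course's Set classes; the names
FailFastDemo and wrongWayThrows are hypothetical. In java.util this check is
only a best effort: if the removed value happens to be the last one the
iterator produces, the loop just ends and no exception is thrown.

```java
import java.util.Arrays;
import java.util.ConcurrentModificationException;
import java.util.Iterator;
import java.util.LinkedHashSet;
import java.util.Set;

class FailFastDemo {
  // Runs the WRONG WAY loop; returns whether it threw
  // ConcurrentModificationException.
  static boolean wrongWayThrows(Set<String> s) {
    try {
      for (Iterator<String> is = s.iterator(); is.hasNext(); /* see body */) {
        String toCheck = is.next();
        if (toCheck.length() > 5)
          s.remove(toCheck);    // changes the Set, but not through the iterator
      }
      return false;
    } catch (ConcurrentModificationException e) {
      return true;
    }
  }

  public static void main(String[] args) {
    Set<String> s = new LinkedHashSet<String>(Arrays.asList("elephant", "cat", "dog"));
    System.out.println(wrongWayThrows(s));  // true: next fails after s.remove
  }
}
```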

----------

Maps and Map.Entry

Maps are the most interesting of the six data types/collection classes. A
map associates "keys" (of some type) with "values" (of some type, which can be
the same as or different from the key type). Often the key is a simple type (e.g.,
String) while the value is some more complicated data type (e.g., Set). Each
key is "associated with"/"mapped to" one value at any time. Typically once we
associate/map a value with a key, we will later use the key to retrieve/get its
value (and possibly change the value: if the value is a Set, we may add
something to that Set). We can also remove a key, ask whether a key or value is
in a map, and iterate through all the keys, values, or mappings (represented by
an object of some class implementing the Map.Entry interface, defined inside
the Map interface).

So, let us look at the Map interface. Note that this is a doubly generic
interface, with <K> specifying the type of the keys in the Map, and with <V>
specifying the type of the values in the Map; both K and V are used when
specifying some of the prototypes of the methods: e.g., put takes a key of type
K and a value of type V; remove takes a key of type K and returns a value of
type V.

Also notice that this interface DOES NOT extend Iterable, unlike all the other
collection classes. It does include three methods that produce Iterable
objects: entries, keys, and values. So, given a Map<String,Set<String>> m;
(which maps every String to a Set of Strings: e.g., a word mapping to a Set of
words that are its synonyms) we can print every key/value association/mapping
on its own line by writing either

  for (String k : m.keys())
    System.out.println(k + "->" + m.get(k));

or 

  for (Map.Entry<String,Set<String>> e : m.entries())
    System.out.println(e.getKey() + "->" + e.getValue());

We will discuss both of these loops in more detail after discussing the methods
that they use from the Map interface below.


public interface Map<K,V>  {
	
  //Commands
  public V    put    (K key, V value);
  public void putAll (Iterable<Entry<K,V>> keysValues);
  public V    remove (K key);
  public void clear  ();
  
  //Queries
  public V                    get          (Object key);
  public boolean              containsKey  (Object key);
  public boolean              containsValue(Object value);
  public Iterable<Entry<K,V>> entries      ();
  public Iterable<K>          keys         ();
  public Iterable<V>          values       ();
  public boolean              isEmpty      ();
  public int                  size         ();
  
  //Miscellaneous
  public Entry<K,V>[] toArray     ();
  public Map<K,V>     newEmpty    ();
  public Map<K,V>     shallowCopy ();
  
  //Inherited (should override)
  public String       toString ();
  public boolean      equals   (Object o);
  public int          hashCode ();

  
  public interface Entry<K,V> {
    public K getKey   ();
    public V getValue ();
    public V setValue (V newValue);
  }
}

This interface defines a public nested interface named Entry. Outside the Map
interface, we refer to this type as Map.Entry. Each Entry stores one key/value
association/mapping. This interface says that given an object from a class
implementing Map.Entry (say, values returned by the entries iterator) we can
retrieve the key and value, and we can change the value associated with a key
(but cannot change the key). See the iterator class above.

I'll try to draw a picture in class illustrating what a simple ArrayMap looks
like, using its SimpleEntry class (which implements Map.Entry). If I forget,
ask me. 

Let's examine each of these methods more closely.


Commands:
      
The "put" method maps a key to a value (adds an Entry to the Map associating
that key and value). If that key was already in the Map, it returns the value
that it previously mapped to; if it wasn't already in the Map, it returns null.
This method is like "add" in the other collection classes, but it returns not a
boolean, but the old value that the key mapped to (and if this returned value
is null, it typically means that a new key was added to the map, because that
key did not map to anything before calling put).

The "putAll" method iterates through all the Entrys produced by its parameter,
putting each key/value mapping into the Map.

The "remove" method removes (discards) the key and whatever value it maps to
in the Map. It also returns the value that the key (now removed) used to map
to. If the key is not in the Map, the Map remains unchanged and this method
returns null.

The "clear" method removes all the Entrys currently in the Map; it is a void
method so returns nothing. Calling size() or isEmpty() after calling "clear"
will return a result of 0 or true respectively.
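
The return values of these commands can be traced with the short sketch below.
It ASSUMES java.util.HashMap as a stand-in for the course's Map classes (its
put, remove, and clear behave as described above); the names MapCommandsDemo
and trace are hypothetical.

```java
import java.util.HashMap;
import java.util.Map;

class MapCommandsDemo {
  // Returns the results of a sequence of Map commands, separated by spaces.
  static String trace() {
    Map<String,Integer> m = new HashMap<String,Integer>();
    StringBuilder out = new StringBuilder();
    out.append(m.put("a", 1)).append(' ');   // null: "a" was not a key before
    out.append(m.put("a", 2)).append(' ');   // 1: value "a" previously mapped to
    out.append(m.size()).append(' ');        // 1: replacing a value adds no key
    out.append(m.remove("a")).append(' ');   // 2: value "a" used to map to
    out.append(m.remove("a")).append(' ');   // null: "a" is no longer a key
    m.put("b", 3);
    m.clear();                               // discards every Entry
    out.append(m.size()).append(' ').append(m.isEmpty());
    return out.toString();
  }

  public static void main(String[] args) {
    System.out.println(trace());  // null 1 1 2 null 0 true
  }
}
```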


Queries:
      
The "get" method returns the value that the specified key maps to in the Map;
it returns null if the key is not in the Map (maps to no value). The get method
in Maps is like the get method in Lists, but uses a key instead of an index.

The "containsKey" method returns whether or not the specified key maps to any
value in the Map.

The "containsValue" method returns whether or not any key(s) map to the
specified value in the Map.

The "entries", "keys", and "values" methods each return an Iterable object.
We can iterate over the returned result to produce all the Entrys, keys,
and values respectively. Notice that the keys are unique, but two keys can
map to the same value, so the values() object may produce the same value
multiple times.

  Like Sets, there is no special order in which any of these methods will
  produce their results.

  If we created a List from any of these Iterable objects, the size of the List
  would be the same as the size() of the Map.

  We will see how to put all the keys from a Map into a List, then sort that
  list and use it to print all the keys and their values where the keys are
  in sorted order.

The "size" method returns the number of key/value associations/mappings
currently in the Map. Generally the put method increments this value by 1 (but
not if the key is already in the Map; then the size stays the same) and the
remove method decrements it by 1 (if that key is in the Map; otherwise no
change is made to the Map).

The "isEmpty" method returns whether or not there are any key/value
associations/mappings in the Map. It is a convenient boolean method equivalent
to testing whether size() == 0.
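
Here is a small sketch of these queries. It ASSUMES java.util.HashMap as a
stand-in for the course's Map classes, so it calls values() on the standard
Map rather than the course's own Iterable-returning methods; the names
MapQueriesDemo and sample are hypothetical.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

class MapQueriesDemo {
  // Builds a Map in which two different keys map to the same value.
  static Map<String,Integer> sample() {
    Map<String,Integer> m = new HashMap<String,Integer>();
    m.put("a", 1);
    m.put("b", 1);   // keys are unique, but values may repeat
    m.put("c", 2);
    return m;
  }

  public static void main(String[] args) {
    Map<String,Integer> m = sample();
    System.out.println(m.containsKey("a"));    // true
    System.out.println(m.containsKey("z"));    // false
    System.out.println(m.containsValue(2));    // true
    System.out.println(m.containsValue(9));    // false

    // A List built from the values has one value per key, duplicates included.
    List<Integer> valueList = new ArrayList<Integer>(m.values());
    System.out.println(valueList.size() == m.size());  // true
    System.out.println(m.isEmpty());                   // false
  }
}
```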


Miscellaneous:

The "toArray" method returns a reference to an array that is exactly big enough
(length == size()) to store all the Entrys currently in the Map, which
is filled with the Entrys in this Map. The order is undefined (and will depend
on the class implementing this interface).

The "newEmpty" method returns an empty Map, from the same class as the Map
it was called on.

The "shallowCopy" method returns a new Map that is filled with the same values
as the Map it was called on.  The copy is SHALLOW, meaning that the two Maps
SHARE the same objects; such elements must be mutated carefully (or not at
all).


Inherited (but should be overridden):

The "toString" method returns a String representation of a Map. It includes any
useful information about the concrete data structure that implements this Map:
e.g., for an array, it includes the length of the array and the number of
indices used in the array (always <= the length).

The "equals" method returns whether or not the Object o stores the same
Map that it was called on. For Maps this means the same data type, the same
number of associations/mappings, with the same keys mapping to the same values.

We will return to discuss the meaning and purpose of the "hashCode" method
later in the quarter, when we discuss hashing.


Constructors (not in Interfaces):

Interfaces don't specify constructors, but every class implementing an
interface must specify at least one constructor. For classes implementing Map,
there will be constructors that

  (1) Construct an empty Map
  (2) Construct a Map initialized by values in an Entry array
  (3) Construct a Map initialized by values in another Map object


Simple Uses of Maps

Let us assume for simplicity that we have declared Map<String,Set<String>> m;

We have seen that the following code prints all the key/value associations in
a Map, one per line. It iterates over every key k, printing it and the value
the key maps to in the Map.

  for (String k : m.keys())
    System.out.println(k + "->" + m.get(k));

Because the keys() method produces keys in no special order, we need to use the
following, more complicated code to print all the key/value associations in a
Map, IN ALPHABETICAL ORDER ACCORDING TO k.

  List<String> keyList = new ArrayList<String>(m.keys());
  Collections.sort(keyList); //We can add a Comparator to sort specially
  for (String k : keyList)
    System.out.println(k + "->" + m.get(k));

Here we start by constructing a List with all the keys in some order: here
we supply an iterator over keys to the ArrayList constructor. Then we sort that
List. Finally we iterate through the sorted list, printing each key (they are
produced by the iterator in alphabetical order) and the value it maps to.
Without adding a comparator argument to Collections.sort(keyList); the keys
will be sorted in increasing alphabetical/dictionary order, according to
the compareTo method in the String class.
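
As a sketch of sorting "specially", the code below passes a Comparator as a
second argument to Collections.sort. It ASSUMES the standard java.util classes
(so it builds the List from keySet() rather than the course's keys() method)
and Java 8 or later for Comparator.reverseOrder(); the names SortedKeysDemo
and sortedKeys are hypothetical.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.Comparator;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

class SortedKeysDemo {
  // Returns all the keys of m in a List sorted according to order.
  static List<String> sortedKeys(Map<String,Integer> m, Comparator<String> order) {
    List<String> keyList = new ArrayList<String>(m.keySet());
    Collections.sort(keyList, order);
    return keyList;
  }

  public static void main(String[] args) {
    Map<String,Integer> m = new HashMap<String,Integer>();
    m.put("pear", 3);
    m.put("apple", 1);
    m.put("fig", 2);

    // Decreasing alphabetical order: pear->3, fig->2, apple->1
    for (String k : sortedKeys(m, Comparator.<String>reverseOrder()))
      System.out.println(k + "->" + m.get(k));
  }
}
```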


Let's next examine a few ways to add/update an association/mapping in the Map
described above. Suppose we have a String key k1 and want a String v1 to be a
value in the Set that k1 maps to. There are two cases to consider:

  (1) k1 is a key in the Map (maps to a Set) so we should add v1 to that Set

  (2) k1 is not a key in the Map (maps to NO Set), so we should put in an
      association/mapping from k1 to a new Set that contains only v1

This code, or some variant of it, appears in most programs whose most basic
data type is a Map (most of those in Programming Assignment #1).

I will try to illustrate the execution of the code below with object/instance
diagrams to help you understand what is happening in class. If I forget, ask
me to do so.

Here is some code that directly implements this algorithm.

  if (m.containsKey(k1))                                //k1 a key in m?
    m.get(k1).add(v1);                                  //add v1 to its associated set
  else {
    Set<String> mappedValues = new ArraySet<String>();  //create empty set
    mappedValues.add(v1);                               //add v1 to it
    m.put(k1,mappedValues);                             //associate k1 with this set
  }

Notice in this code that we must "search" the Map twice: once for containsKey(),
and once for either get() or put(), depending on which part of the if is executed.

Instead, the following code always searches the Map once for get(), and once
more for put() -but only if the key is not present; if the key is present, it
does no second search.

  Set<String> mappedValues = m.get(k1);
  if (mappedValues == null) {
    mappedValues = new ArraySet<String>();
    m.put(k1, mappedValues);
  }
  //mappedValues is now guaranteed to be the Set associated with k1 in Map m
  mappedValues.add(v1);

Notice that once we have a reference stored in mappedValues to the Set the key
maps to (whether from the get() or the new Set put() in), we just add to that
Set (mutating the Set that is in the Map).

Finally, the shortest code to accomplish our goal, shown below, takes at worst
three searches: containsKey(), put(), and get().

  if (!m.containsKey(k1))
    m.put(k1, new ArraySet<String>());
  m.get(k1).add(v1);

To understand maps, you should ensure that you understand why each of these
code fragments does what it must do.
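
To experiment with these fragments, here is a self-contained sketch that wraps
the get-then-put version in a method. It ASSUMES java.util.HashMap and
java.util.HashSet as stand-ins for the course's Map and ArraySet classes; the
names SynonymMapDemo and addSynonym are hypothetical.

```java
import java.util.HashMap;
import java.util.HashSet;
import java.util.Map;
import java.util.Set;

class SynonymMapDemo {
  // Ensures that v1 is in the Set that k1 maps to, creating the Set if needed.
  static void addSynonym(Map<String,Set<String>> m, String k1, String v1) {
    Set<String> mappedValues = m.get(k1);      // always one search
    if (mappedValues == null) {                // k1 is not yet a key in m
      mappedValues = new HashSet<String>();
      m.put(k1, mappedValues);                 // second search, only for new keys
    }
    mappedValues.add(v1);                      // mutates the Set stored in m
  }

  public static void main(String[] args) {
    Map<String,Set<String>> m = new HashMap<String,Set<String>>();
    addSynonym(m, "big", "large");   // case (2): "big" is a new key
    addSynonym(m, "big", "huge");    // case (1): "big" already maps to a Set
    addSynonym(m, "small", "tiny");
    System.out.println(m.size());               // 2
    System.out.println(m.get("big").size());    // 2
  }
}
```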


Mutation in PriorityQueues, Sets, and Maps:

PLEASE DO NOT MUTATE any elements in a PriorityQueue or Set, or any KEYS in a
Map. It is perfectly OK and frequently useful (as shown above) to mutate the
VALUEs in a Map.

Advanced/efficient data structures that implement these data types store values
based on their hashCode method. Changing the state of such an object will
change its hashCode, causing the object to be lost in the collection.

So, for example, if you wanted to change an object in a Set, you should first
remove it, then mutate it, then add it back into the Set.

  Set<Foo> s;
  Foo element = ...;

  s.remove(element);
  element.mutator();
  s.add(element);

Likewise, if you wanted to change a KEY in a Map, you should first remove it,
then mutate it, then put it back into the Map.

  Map<Foo,Bar> m;
  Foo element = ...;

  Bar value = m.remove(element);
  element.mutator();
  m.put(element,value);  //same value, but the key is now mutated

We will discuss this issue further, and in greater detail, when we learn about
Hashing. In fact, the same problem occurs when using binary search trees, so we
will discuss this issue more than once this quarter.
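
The sketch below shows an object becoming "lost" in a hash-based Set after it
is mutated, and the remove/mutate/add sequence that avoids the problem. It
ASSUMES java.util.HashSet as a stand-in for the course's hashing
implementations; the Counter class and the method names are hypothetical.

```java
import java.util.HashSet;
import java.util.Set;

class MutationDemo {
  // Hypothetical mutable class whose equals and hashCode depend on its state.
  static class Counter {
    int count;
    Counter(int count) { this.count = count; }
    @Override public boolean equals(Object o) {
      return o instanceof Counter && ((Counter) o).count == count;
    }
    @Override public int hashCode() { return count; }
  }

  // Mutates an element while it is in the Set; returns whether it was lost.
  static boolean lostAfterMutation() {
    Set<Counter> s = new HashSet<Counter>();
    Counter element = new Counter(1);
    s.add(element);
    element.count = 2;              // hashCode changes while stored in s
    return !s.contains(element);    // s searches in the wrong place
  }

  // Removes, mutates, then adds back; returns whether the element is found.
  static boolean foundAfterSafeMutation() {
    Set<Counter> s = new HashSet<Counter>();
    Counter element = new Counter(1);
    s.add(element);
    s.remove(element);
    element.count = 2;
    s.add(element);                 // stored according to its new hashCode
    return s.contains(element);
  }

  public static void main(String[] args) {
    System.out.println(lostAfterMutation());       // true
    System.out.println(foundAfterSafeMutation());  // true
  }
}
```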

----------

Note that we can examine the code in all these classes (I stripped out the
Javadoc comments to make them shorter and more readable) in the collections.jar
file, either by unzipping this file and examining the .java files or by
disclosing these classes and viewing them in an Editor tab in an Eclipse
project.

In the next lecture we will discuss some of the simple array implementations
of these interfaces. Many of the programming assignments this quarter require
you to implement these interfaces with advanced data structures.