Self-Referential Classes/Linked Objects

Introduction to Computer Science II
ICS-22


Introduction In this lecture we will begin studying self-referential classes, which lead to linked objects. Initially we will discuss in great detail the simplest kind of linked objects: linear linked lists. Later this week, we will discuss more advanced linked lists: lists with header/trailer nodes, circular lists, doubly-linked lists, etc. By the end of the quarter we will also have fully discussed more general, non-linear, self-referential structures (trees and graphs).

Linear linked structures are used, like arrays, to store a sequence of values. We will soon examine how to use all kinds of linked structures to implement many different collection classes. We have already seen implementations for some of these same collection classes, using arrays. Deciding whether to use an array or a linked structure when implementing a class depends on ease of implementation and performance (time and space): so, complexity classes expressed in big O notation will make a comeback.


The LN Class The major topic of discussion for the next few days is how to use a slight generalization (private static instead of public, with Object replacing int) of the following class:
  public class LN {
    public LN (int i, LN n)
    {value = i; next = n;}

    public int value;
    public LN  next;
  }
Read LN as an abbreviation for List Node. It may seem odd to make such a fuss about such a simple class: one that contains only a constructor, along with two public instance variables (so no accessors or mutators are needed to retrieve or alter their contents). What makes this class interesting is that every object constructed from the LN class contains a next instance variable that stores null or a reference to another object from this same class.

The value null is used to indicate the last node in the list (there is no list node following it). Thus, null references serve an important purpose here; before now, about the only use of null was when Java reported NullPointerException when trying to call a method on a variable (typically a member of an array) that stored null; i.e., did not refer to an object.

The data structures that arise from repeatedly constructing objects from this class are similar in content to arrays: they store some sequence of values. An example of one such structure is shown below.

Notice that the last next instance variable stores null. Although this picture is accurate, it is also cumbersome to draw. Instead, we will often abbreviate such pictures as shown below. Notice that the picture below blurs the important distinction between an object and its instance variables, and in addition the value null appears just as a / (a slash) in the final object. Ensure that you know how to draw the real pictures before adopting this shorthand.

Note that the placement of the LN objects on the page and the exact places that the arrows point to "in" the LN objects (really they refer to the whole object) are irrelevant. What is relevant is being able to follow references from one object to the next one, etc. As with all pictorial representations of data, these will be important when we need to visualize complicated data structures and the operations we perform on them.

Note the following accesses and their classes and values. Any cascaded access chain ending in .value specifies some instance variable storing an int; any cascaded access chain ending in .next specifies some instance variable storing null or a reference to the next LN object.

  • x specifies a variable that stores a reference to the first object of class LN
  • x.value specifies an instance variable (in the first object) that stores the int 5
  • x.next specifies an instance variable (in the first object) that stores a reference to the second object of class LN
  • x.next.value specifies an instance variable (in the second object) that stores the int 3
  • x.next.next specifies an instance variable (in the second object) that stores a reference to the third object of class LN
  • x.next.next.value specifies an instance variable (in the third object) that stores the int 8
  • ...
  • x.next.next.next.next.next.next specifies an instance variable (in the sixth object) that stores null
Only the next instance variable in the last object can store null; otherwise a link is broken.
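The access chains listed above can be checked with a short program. What follows is only a minimal sketch: it assumes a hypothetical three-node list storing 5, 3, and 8 (shorter than the list in the picture), and the names AccessDemo and buildExample are made up for this illustration. The LN class is copied inside the demo class so that the code is self-contained.

```java
class AccessDemo {
  // Copy of the lecture's LN class, nested here so the sketch stands alone
  static class LN {
    public LN (int i, LN n)
    {value = i; next = n;}

    public int value;
    public LN  next;
  }

  // Hypothetical helper: builds the three-node list 5 -> 3 -> 8
  static LN buildExample()
  {return new LN(5, new LN(3, new LN(8, null)));}

  public static void main(String[] args) {
    LN x = buildExample();
    System.out.println(x.value);                   // the int in the first object
    System.out.println(x.next.value);              // the int in the second object
    System.out.println(x.next.next.value);         // the int in the third object
    System.out.println(x.next.next.next == null);  // the last next stores null
  }
}
```

Trying to follow the chain one step further, say x.next.next.next.value, would throw NullPointerException for this three-node list, because x.next.next.next stores null.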

Traversing Linked Lists via Cursors As with arrays, the main processing that we do with linked lists is to traverse them, visiting (and somehow processing) every value that they store. For example, the following code computes the sum of all the values contained in a linked list. Note the similarity to a for loop for processing a sequence of integers stored in an array.
  int sum = 0;
  for (LN r=x; r!=null; r=r.next)
    sum += r.value;
  System.out.println("Sum = " + sum);
The for loop's parts
  • initialize the reference r to refer to the first object in the linked list (using the standard semantics for copying references)
  • test r to ensure that it still refers to some object in the linked list (i.e., its value is not null)
  • advance r to refer to the next object in the linked list (or possibly null, when there are no more objects to refer to)
The statement r=r.next is the key to understanding how to traverse linked lists: it is the linked-list equivalent of i++ when processing arrays.
The following picture is a hand simulation that illustrates how r takes on successive references to the objects in the linked list, summing the value instance variable of each.
Compare this loop with the equivalent loop for adding up all the values in an array. Each uses a special variable (which we shall now call a cursor) to step through every value in its respective data structure. Generally, a cursor is a small value that refers to a specific location in a data structure that can store many values. The most common use of the term cursor refers to the one seen in a text editor: there the cursor refers to the location where characters will be entered/deleted among the many characters in a file.

For arrays, the cursor is an int index; for linked lists, the cursor is a reference to some object in a linked list. We check array cursors numerically to determine whether they are still small enough to index a value in an array; we check linked list cursors against null to determine whether they still refer to some object in the linked list. We advance array cursors by incrementing them; we advance linked list cursors via a statement like r = r.next; which updates r to refer to the next object beyond the one r currently refers to. The for loop in Java is thus general enough to compactly specify all the information needed for traversing both arrays and linked lists.
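To make the comparison concrete, here is a sketch that places the two traversals side by side. The class name CursorDemo, the method names sumArray and sumList, and the sample values are assumptions made for this illustration; only the loop over the linked list comes from the lecture.

```java
class CursorDemo {
  // Copy of the lecture's LN class, nested here so the sketch stands alone
  static class LN {
    public LN (int i, LN n)
    {value = i; next = n;}

    public int value;
    public LN  next;
  }

  // Array traversal: the cursor i is an int index, tested against a.length
  static int sumArray (int[] a)
  {
    int sum = 0;
    for (int i=0; i<a.length; i++)
      sum += a[i];
    return sum;
  }

  // Linked list traversal: the cursor r is a reference, tested against null
  static int sumList (LN x)
  {
    int sum = 0;
    for (LN r=x; r!=null; r=r.next)
      sum += r.value;
    return sum;
  }

  public static void main(String[] args) {
    int[] a = {5, 3, 8};
    LN    x = new LN(5, new LN(3, new LN(8, null)));
    System.out.println("Array sum = " + sumArray(a));  // Array sum = 16
    System.out.println("List sum  = " + sumList(x));   // List sum  = 16
  }
}
```

Both loops have the same three parts (initialize, test, advance); only the kind of cursor differs.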


More Examples of Code for Linked List Processing Here are some more examples of code that processes linked lists via traversals. I encourage you to hand simulate this code until you become comfortable with processing linked lists by traversals. While I can often look at array code and figure it out, with linked list code, especially if it is subtle, I often need to perform a hand simulation to understand or debug it.

The first code fragment prints all the values stored in a linked list, with commas between the values.

  for (LN r=x; r!=null; r=r.next)
    System.out.println(r.value + (r.next!=null ? "," : ""));
Next, let's assume that we declare DecisionInt criteria; and store into it a reference to some object constructed from a class that implements the DecisionInt interface. We can modify the code above to print only the OK values in this linked list (we cannot put commas after values, do you see why?, so we use a commas-before-values approach).
  boolean first = true;
  for (LN r=x; r!=null; r=r.next)
    if (criteria.isOK(r.value)) {
       System.out.println( (first ? "" : ",") + r.value);
       first = false;
    }
Next is a static method for computing the length of a linked list.
  public static int length (LN l)
  {
    int answer = 0;
    for (LN r=l; r!=null; r=r.next)
      answer++;
    return answer;
  }
The following picture illustrates a hand simulation of this method using a static call frame, first transmitting the argument reference to the parameter reference.
  We can "simplify" the code in this method by using the parameter itself to traverse the list.
  public static int length (LN l)
  {
    int answer = 0;
    for (; l!=null; l=l.next)
      answer++;
    return answer;
  }
Executing this code still leaves x referring to the list and still returns a value of 4. The parameter changes (it just receives its initial value from the argument), but its matching argument does not. In many cases, such methods are written inside classes to use one of their instance variables (referring to the beginning of a linked list) and we need to write a for loop that declares a new variable for traversing the list (leaving the instance variable unchanged).
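We can check this claim directly. The sketch below assumes a hypothetical four-node list (the values 5, 3, 8, 2 are made up) and a made-up class name LengthDemo; the length method is the second version shown above, which advances its own parameter.

```java
class LengthDemo {
  // Copy of the lecture's LN class, nested here so the sketch stands alone
  static class LN {
    public LN (int i, LN n)
    {value = i; next = n;}

    public int value;
    public LN  next;
  }

  // The version of length that uses the parameter l itself as the cursor
  public static int length (LN l)
  {
    int answer = 0;
    for (; l!=null; l=l.next)
      answer++;
    return answer;
  }

  public static void main(String[] args) {
    LN x     = new LN(5, new LN(3, new LN(8, new LN(2, null))));
    LN first = x;                      // remember which object x refers to
    System.out.println(length(x));     // 4
    System.out.println(x == first);    // true: x still refers to the first object
    System.out.println(x.value);       // 5
  }
}
```

Inside length the parameter l ends up storing null, but l is a separate variable that held a copy of the reference in x, so x is untouched.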

Next, another static method, this time for computing the number of times that some int value occurs in the list.

  public static int countOccurences (LN l, int toCheck)
  {
    int answer = 0;
    for (LN r=l; r!=null; r=r.next)
      if (r.value == toCheck)
        answer++;
    return answer;
  }

So, it is frequently the case with for loops traversing linked lists (as was the case with for loops traversing arrays), that the same pattern for initializing, testing, and advancing is used. But as also seen with arrays, some code does have slight variants, as is illustrated below in code that computes whether a linked list is sorted in ascending (actually non-descending) order.

  public static boolean isSorted (LN l)
  {
    for (LN r=l; r!=null && r.next!=null; r=r.next)
      if (r.value > r.next.value)
        return false;
    return true;
  }
Notice the more complicated test for continuation/termination: the loop terminates if either r stores null, or r refers to an object whose next instance variable stores null. Examine carefully how this method works when passed an empty list (null) as a parameter and when passed a reference to a linked list that contains just one object: in both cases the linked list is sorted because we cannot find a pair of values out of order. Here, short-circuit evaluation is critical: if r!=null is false, evaluating r.next!=null would throw NullPointerException, if it were evaluated.
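The boundary cases described above are easy to try. Here is a small sketch, assuming made-up sample lists and a made-up class name SortedDemo; isSorted is the method from the lecture.

```java
class SortedDemo {
  // Copy of the lecture's LN class, nested here so the sketch stands alone
  static class LN {
    public LN (int i, LN n)
    {value = i; next = n;}

    public int value;
    public LN  next;
  }

  // Returns false only if some value is followed by a smaller one
  public static boolean isSorted (LN l)
  {
    for (LN r=l; r!=null && r.next!=null; r=r.next)
      if (r.value > r.next.value)
        return false;
    return true;
  }

  public static void main(String[] args) {
    System.out.println(isSorted(null));             // true: empty list, loop body never runs
    System.out.println(isSorted(new LN(5, null)));  // true: one object, no pair to compare
    System.out.println(isSorted(new LN(3, new LN(5, new LN(5, new LN(8, null))))));  // true
    System.out.println(isSorted(new LN(5, new LN(3, new LN(8, null)))));             // false
  }
}
```

The third call shows why "non-descending" is the more accurate description: equal neighboring values (5 and 5) do not count as out of order.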

Building Linked Lists It is easy to update the variable x from null (referring to an empty list) to refer to a list with one value, say 5: x = new LN(5,null); Likewise, we can extend this list to a second value, say 2, by writing x.next = new LN(2,null); And, we can extend this list to a third value, say 7, by writing x.next.next = new LN(7,null); We can continue in this manner to build a linked list manually. In fact, we can even write this as one complicated assignment: x = new LN (5, new LN (2, new LN(7, null))); But, this method requires us to write code manually for every linked list that we must build.

Now let us examine ways to build lists automatically, say by reading values from a file. The following simple code reads all the values from a file (assume TypedBufferReader tbr has been declared and initialized) and places them in a linked list.

  for (;;)
    try {
      x = new LN(tbr.readInt(), x);
    }catch (EOFException eofe) {break;}
Note the x on both sides of the equal sign. Generally, the code x = new LN (someValue, x); adds someValue at the front of the list (whether the list is originally empty or not: try both ways), making its next refer to the original linked list.

Note that executing the statement x = new LN(tbr.readInt(),x); is in the complexity class O(1): no matter how big of a list x refers to, this operation is completed in constant time (independent of the list size). Thus, since this operation is executed N times, the complexity class of reading in a list is O(N).

The only drawback of this code is that the values appear in the linked list in the reverse of the order in which they appear in the file (which may or may not be a problem depending on how we want to process the data); hand simulate this code for reading a file with just a few values in it to verify this statement. If we need the sequence of values stored in the same order as they appeared in the file, we have a few possible ways to accomplish this task: we can reverse the list afterwards (see the next lecture), or we can make the code that builds the list more complicated.
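The reversal is easy to observe without a file. The sketch below is an illustration under stated assumptions: an int array stands in for the values read by tbr.readInt(), and the names FrontInsertDemo, buildAtFront, and asText are made up. The statement inside the loop is the same x = new LN(someValue, x) pattern used above.

```java
class FrontInsertDemo {
  // Copy of the lecture's LN class, nested here so the sketch stands alone
  static class LN {
    public LN (int i, LN n)
    {value = i; next = n;}

    public int value;
    public LN  next;
  }

  // Adds each value at the front of the list, in the order the values arrive
  // (the array is an assumption standing in for the file)
  static LN buildAtFront (int[] values)
  {
    LN x = null;
    for (int i=0; i<values.length; i++)
      x = new LN(values[i], x);
    return x;
  }

  // Hypothetical helper: the values in the list, separated by commas
  static String asText (LN x)
  {
    String answer = "";
    for (LN r=x; r!=null; r=r.next)
      answer += r.value + (r.next!=null ? "," : "");
    return answer;
  }

  public static void main(String[] args) {
    LN x = buildAtFront(new int[]{5, 2, 7});
    System.out.println(asText(x));   // 7,2,5
  }
}
```

The value that arrives last (7) ends up first in the list, because every new node is placed in front of all the nodes built before it.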

Below, let's examine the code needed to place a new list node at the rear of a linked list. The strategy for doing so can be easily described:

  • If the list is empty, change x to refer to the new list node
  • If the list is not empty, locate its last list node (the only one currently storing null in its next instance variable), then store a reference to the new list node in that next field.
In both cases, the new list node stores null in its next instance variable, because it is becoming the "new" last node in the list. Assume that int someValue stores the value we want to add at the end of the list. We can translate this description into the following Java code.
  if (x == null)
    x = new LN (someValue,null);
  else {
    LN r = x;
    for (; r.next!=null; r=r.next)
      {}
    r.next = new LN(someValue,null);
  }
There are a few interesting aspects of this code.
  • The cursor r must be declared outside the for loop, not inside it. The reason is that r (which is made to refer to the current last node in the list) must be altered after the loop terminates; any variable declared inside the loop would not be usable outside the loop body.
  • The purpose of the loop is to store into r a reference to the current last node in the list: so all it does is advance r until it refers to a list node whose next stores null. So note that the body of the loop is just {}: this empty block statement could be replaced by just the empty statement ; but we prefer to emphasize this "lack of actions" with an empty block. Some programmers would even write {/*nothing*/} as the body of the loop.
  • The continuation test is only r.next!=null; the first time we check this test we know that r stores a non-null reference (see the if statement). Likewise, for each subsequent iteration we know that r, when updated to r.next, will also store a non-null value. Thus, this code will never throw a NullPointerException.
An alternative way to write this code (less elegant, in my opinion, but possibly easier to understand) is
  if (x == null)
    x = new LN (someValue,null);
  else {
    for (LN r=x ;; r=r.next)  //Continuation test is always true
      if (r.next == null) {
         r.next = new LN(someValue,null);
         break;
      }
  }
Given this code for adding a new value at the end of a linked list, we can now write code that reads all the values from a file and places them in a linked list in the correct order, using two nested loops.
  for (;;)
    try {
      int someValue = tbr.readInt();
      if (x == null)
        x = new LN (someValue,null);
      else {
        LN r = x;
        for (; r.next!=null; r=r.next)
          {}
        r.next = new LN(someValue,null);
      }
    }catch (EOFException eofe) {break;}
Although this code is correct, it can be very inefficient: its complexity class, if N is the number of values read, is O(N^2). The source of the inefficiency is repeatedly scanning the linked list to find its end. The first time requires scanning past 0 nodes; the second time requires scanning past 1 node; the third time requires scanning past 2 nodes; ... the Nth time requires scanning past N-1 nodes. So, to insert N nodes into this list requires scanning 0+1+2+...+(N-1) nodes.

We have seen this before (and should memorize if we haven't already): the general formula for 1+2+...+N is N(N+1)/2, so in this case the result is (N-1)(N-1+1)/2 = N(N-1)/2. Of course, this is in the complexity class O(N^2). Inserting 1,000 nodes in a list requires scanning 499,500 nodes; inserting 1,000,000 nodes in a list requires scanning about 500,000,000,000 nodes! Even at 1 billion scans per second, that would take about 8 minutes. Everett Dirksen, an ancient Senator from Illinois, was once quoted as saying "A billion dollars here, a billion dollars there, pretty soon it adds up to real money". A latter-day Dirksen, one in computer science, might say the same about nanoseconds adding up to real time.
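We can confirm this count empirically. The sketch below is an illustration, not part of the file-reading code: the names ScanCountDemo and countScans are made up, the values stored are just 0 through n-1, and the counter tallies every node the cursor refers to while looking for the last node (which is how the tally 0+1+2+...+(N-1) above counts).

```java
class ScanCountDemo {
  // Copy of the lecture's LN class, nested here so the sketch stands alone
  static class LN {
    public LN (int i, LN n)
    {value = i; next = n;}

    public int value;
    public LN  next;
  }

  // Builds an n-node list by adding each node at the rear, with no cache,
  // and returns how many nodes the cursor referred to in total
  static long countScans (int n)
  {
    LN   x     = null;
    long scans = 0;
    for (int i=0; i<n; i++)
      if (x == null)
        x = new LN(i, null);
      else {
        LN r = x;
        scans++;                          // the cursor refers to the first node
        for (; r.next!=null; r=r.next)
          scans++;                        // the cursor is about to refer to one more node
        r.next = new LN(i, null);
      }
    return scans;
  }

  public static void main(String[] args) {
    System.out.println(countScans(10));    // 45     = 10*9/2
    System.out.println(countScans(1000));  // 499500 = 1000*999/2
  }
}
```

Doubling n roughly quadruples the count, which is the signature of an O(N^2) process.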

We can drastically speed up this process (changing its complexity class back to O(N)) by "caching" a reference to the last node in a list (and updating it whenever a new node is added at the end). In this way we can eliminate scanning altogether (or another way to look at it is scanning each newly added list node once, and remembering it: this is similar to amortized complexity computations). The technique of caching is a wonderful example of a space for time tradeoff: by increasing the amount of space (by storing an extra reference) we can decrease the amount of running time. The code to accomplish this same task more efficiently (in time; a bit less efficiently in space) is

  LN lastCache = null;
  for (;;)
    try {
      int someValue = tbr.readInt();
      if (lastCache == null)
        lastCache = x = new LN (someValue,null);
      else {
        lastCache.next = new LN(someValue,null);
        lastCache      = lastCache.next;
      }
    }catch (EOFException eofe) {break;}
In fact, as an extra bonus this code is even simpler (less confusing) than the code shown above, if we understand how the lastCache is initialized and used. Java masochists could "simplify" the else block by writing the single statement (similarly to what appears before the else)
 lastCache = lastCache.next = new LN(someValue,null);
whose double assignment does the job of both statements in the block. Notice that
 lastCache.next = lastCache = new LN(someValue,null);
with the order of the two variables receiving the assignment reversed, FAILS TO DO THE JOB.
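Here is a sketch of the cached approach that can be run without a file. As assumptions for this illustration, an int array stands in for the values read by tbr.readInt(), and the names RearInsertDemo, buildAtRear, and asText are made up. The else branch uses the correct one-statement form shown above.

```java
class RearInsertDemo {
  // Copy of the lecture's LN class, nested here so the sketch stands alone
  static class LN {
    public LN (int i, LN n)
    {value = i; next = n;}

    public int value;
    public LN  next;
  }

  // Adds each value at the rear of the list, using lastCache to avoid scanning
  // (the array is an assumption standing in for the file)
  static LN buildAtRear (int[] values)
  {
    LN x         = null;
    LN lastCache = null;
    for (int i=0; i<values.length; i++) {
      int someValue = values[i];
      if (lastCache == null)
        lastCache = x = new LN(someValue, null);
      else
        // The new node is stored first in the old last node's next, then in lastCache
        lastCache = lastCache.next = new LN(someValue, null);
    }
    return x;
  }

  // Hypothetical helper: the values in the list, separated by commas
  static String asText (LN x)
  {
    String answer = "";
    for (LN r=x; r!=null; r=r.next)
      answer += r.value + (r.next!=null ? "," : "");
    return answer;
  }

  public static void main(String[] args) {
    LN x = buildAtRear(new int[]{5, 2, 7});
    System.out.println(asText(x));   // 5,2,7
  }
}
```

Each value is added with a constant number of steps, so building the whole list is O(N), and the values keep the order in which they arrived.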

Problem Set To ensure that you understand all the material in this lecture, please solve the announced problems after you read the lecture.

If you get stumped on any problem, go back and read the relevant part of the lecture. If you still have questions, please get help from the Instructor, a TA, or any other student.

  1. Hand simulate the following code fragment, first on an empty list, then on a list containing one object (x refers to it), and finally on the linked list illustrated at the beginning of this lecture.
      LN answer = null;
      for (;x!=null;) {
        LN toMove   = x;
        x           = x.next;
        toMove.next = answer;
        answer      = toMove; 
      }
        
      x = answer;

  2. Describe the result of using the incorrect statement
     lastCache.next = lastCache = new LN(someValue,null);
    in the code above that reads values from a file and puts them in a list, in the same sequential order. For a file of 3 values does this code throw an exception? If not, what does the final list look like?