Skip Lists

Skip lists can be used to implement sets and maps. We will discuss how skip lists are stored (their order and structure properties), and the various algorithms that operate on them. Skip lists allow relatively quick locate, insert, and delete methods (quick like trees - O(Log2 N); not as quick as hash tables). Also, it is easy and efficient to iterate over all the values in a skip list, including removing values.

The basic material on skip lists and the algorithms that manipulate them is covered in our textbook, Goodrich and Tamassia, on pages 415-423. They show pictures for all the operations, which will not appear in these notes (but I will draw them in class). You should be able to hand-simulate the various skip list algorithms on such pictures.

The exact structure of a skip list depends on a random number generator, not only on the data stored in the skip list (building two skip lists with the same values added in the same order would most likely result in different skip lists, except at the S0 level). This is unlike BSTs and AVL trees, where the tree structure depends exactly on the order in which the values were added to the tree. Since there are excellent random number generators, skip lists should be less pathological than search trees. The textbook states that skip lists in practice are faster than AVL and other balanced trees. But we can use trees to store all sorts of information (e.g., expression trees, not just information to search for), and trees are generally more interesting than skip lists, so we have studied trees much more in ICS-23.

Note that each node in a skip list is quadruply linked: with prev, next, above, and below references. Initially, an empty skip list contains two levels, S0 and S1, both storing only -inf and +inf nodes. Generally, all values in a skip list are stored at level S0 (making it easy to iterate through all the values) and each higher level has ABOUT (determined by the random number generator) 1/2 the values of the level directly below it (except the top level, which always has only the two values -/+inf). See Goodrich and Tamassia, pages 415-416. Since every horizontal list stores a -inf and a +inf node, each is like a linked list with both a header and a trailer node.

Searching: When searching for some value v, the algorithm finds the largest value <= v at level S0: thus, it finds v if it is in the skip list, and the value right before where v should appear if it is not already in the skip list (very useful for insertion). Here is the pseudo-code.

  Set cursor c to -inf at the highest level (the bottom level is S0)
  while c has a below node {       //always true when c is at the highest level
    c = c.below                    //go down 1 level
    while v >= c.next.value        //go right at the same level until you find
      c = c.next;                  //  the node storing the biggest value <= v
  }
  c stores a reference to the node searched for (or before where it belongs)

This algorithm does a linear search at each level, referring to the largest value <= v (comparing to the value one beyond it, which at worst is +inf). At the higher levels the lists are sparse (have few values), so each probe "jumps" ahead many values at the lower level (optimally half ahead, much like a binary search tree optimally eliminates half the subtree values with each comparison). See Goodrich and Tamassia, page 417.
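To make the node structure and the search loop concrete, here is a minimal Java sketch (my own illustration, not the textbook's code), assuming int values and using Integer.MIN_VALUE and Integer.MAX_VALUE to play the roles of -inf and +inf (so this sketch cannot store those two extreme int values themselves); the class and field names are just placeholders.

  class SkipNode {                       //quadruply linked node
      int value;
      SkipNode prev, next, above, below;
      SkipNode(int value) { this.value = value; }
  }

  class SkipList {
      SkipNode head;                     //the -inf node on the highest level

      SkipList() {                       //empty skip list: levels S0 and S1,
          SkipNode negInf0 = new SkipNode(Integer.MIN_VALUE);   //  each storing
          SkipNode posInf0 = new SkipNode(Integer.MAX_VALUE);   //  only -inf/+inf
          SkipNode negInf1 = new SkipNode(Integer.MIN_VALUE);
          SkipNode posInf1 = new SkipNode(Integer.MAX_VALUE);
          negInf0.next = posInf0;   posInf0.prev = negInf0;     //level S0
          negInf1.next = posInf1;   posInf1.prev = negInf1;     //level S1
          negInf1.below = negInf0;  negInf0.above = negInf1;
          posInf1.below = posInf0;  posInf0.above = posInf1;
          head = negInf1;
      }

      SkipNode search(int v) {           //node on S0 with the largest value <= v
          SkipNode c = head;             //start at -inf on the highest level
          while (c.below != null) {
              c = c.below;               //go down 1 level
              while (v >= c.next.value)  //go right while the next value is <= v
                  c = c.next;            //  (+inf always stops this scan)
          }
          return c;                      //v's node, or the node right before
      }                                  //  where v belongs
  }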
Delete: This algorithm starts with searching, and is very simple beyond that: it just removes the value from level S0 and from every level above S0 that it appears in (it can reach the upper nodes by following the above references). The quadruply linked lists make remove a bit tedious to write (code-wise), but all the necessary references are stored to make this operation possible.

  c = search for v                 //See algorithm above
  if c.value != v
    return                         //Cannot delete it; it is not in the skip list
  //Remove c from level S0 and from every higher level it appears in
  while (c != null) {              //Excise the node c refers to from the list
    c.prev.next = c.next           //next of node on left refers to node on right
    c.next.prev = c.prev           //prev of node on right refers to node on left
    c = c.above                    //move up one level
  }

Insert: This algorithm starts with searching too, but afterward its code is more complicated, and involves generating random numbers (flipping coins). So, first we find the location for the node to insert on level S0: searching for a value not in the list finds, in level S0, the node right before where we would add the new node. Then we insert this node and add nodes on top of it (so long as the flip is heads), linking every node added to the ones to its left/right (on that level) and to the same node below - and even adding new "top" levels (with -inf and +inf) if necessary. Here is the pseudo-code (a Java sketch of both delete and insert appears after the Iterator discussion below).

  c = search for v                   //See algorithm above
  if c.value == v
    return                           //No need to add, already in skip list
  put v after c's node at level S0   //Easy, doubly linked
  while flip == heads {              //Going up one level, based on a heads flip
    while (c.above == null)          //Stay or go backwards to find a node with
      c = c.prev                     //  something above it: at worst -inf
    c = c.above
    add a new node between c and what follows c   //at worst between -inf and +inf
    if already at the highest level
      add a new highest level with only -inf and +inf
  }

Iterator: Since level S0 has all the values, we primarily iterate through this list. If we want to remove the value just returned by next in the iterator, we remember the node just returned and perform the excising loop from the delete code shown above, but without calling "c = search for v" or checking whether the value is there, because our iterator already stores a reference to the node that we want to remove.
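Continuing the same illustrative sketch (again, my own code, not the textbook's): here are remove and insert methods that could be added to the SkipList class shown earlier, following the pseudo-code above. The coin flip uses java.util.Random, and the helper names (findNegInf, addEmptyTopLevel) are just placeholders I chose.

  private static final java.util.Random coin = new java.util.Random();

  void remove(int v) {
      SkipNode c = search(v);
      if (c.value != v)
          return;                        //cannot delete it; it is not in the skip list
      while (c != null) {                //excise v's node from every level it is on
          c.prev.next = c.next;
          c.next.prev = c.prev;
          c = c.above;                   //move up one level
      }
  }

  void insert(int v) {
      SkipNode c = search(v);
      if (c.value == v)
          return;                        //no need to add, already in the skip list
      SkipNode n = new SkipNode(v);      //splice v in right after c on level S0
      n.prev = c;  n.next = c.next;
      c.next.prev = n;  c.next = n;
      while (coin.nextBoolean()) {       //heads: grow v's tower one level higher
          while (c.above == null)        //stay or go backwards to find a node
              c = c.prev;                //  with something above it: at worst -inf
          c = c.above;
          SkipNode up = new SkipNode(v); //link the new node left/right and below
          up.prev = c;  up.next = c.next;
          c.next.prev = up;  c.next = up;
          up.below = n;  n.above = up;
          n = up;
          if (findNegInf(c) == head)     //v now appears on the highest level, so
              addEmptyTopLevel();        //  add a new level with only -inf/+inf
      }
  }

  private SkipNode findNegInf(SkipNode node) {    //-inf node on node's level
      while (node.prev != null)
          node = node.prev;
      return node;
  }

  private void addEmptyTopLevel() {
      SkipNode oldPosInf = head;                  //+inf node on the old top level
      while (oldPosInf.next != null)
          oldPosInf = oldPosInf.next;
      SkipNode negInf = new SkipNode(Integer.MIN_VALUE);
      SkipNode posInf = new SkipNode(Integer.MAX_VALUE);
      negInf.next = posInf;      posInf.prev = negInf;
      negInf.below = head;       head.above = negInf;
      posInf.below = oldPosInf;  oldPosInf.above = posInf;
      head = negInf;
  }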
Time Analysis (summarized from the text; a starred/advanced/optional section)

First, what height do we expect for a skip list with N values? The probability that adding a single value leads to a tower of height i is <= 1/2^i (getting i consecutive head flips in a row). So if we add N entries, the probability that the height of the skip list data structure is i or larger is <= N/2^i. The expected height of a skip list with N values is Log2 N.

What does this probability mean in practice? Let's compute the probability that the height of the skip list is two times the expected height (2 Log2 N). It is <= N/2^(2 Log2 N), or <= N/N^2, or <= 1/N. So, for N = 1,000, the probability that the height is 2 Log2 1000 (about 20) is 1 in a thousand. The probability that it is three times the expected height (3 Log2 1000) would be 1 in a million. The probability that it is four times the expected height (4 Log2 1000) would be 1 in a billion. So, we expect the height to be O(Log2 N) with a small constant.

Insert and delete start by doing a locate, so let's think about the complexity class of locate first. We will need to drop down from the highest level to level 0, which by the previous analysis is O(Log2 N). We also need to know how many forward scans we do on each level. Because of the coin flipping, this number is 2 (the number of times we expect to flip a coin before it comes up tails: 50% of the time it takes 1 flip (.5); 25% it takes 2 flips (+.5); 12.5% it takes 3 flips (+.375); 6.25% it takes 4 flips (+.25); 3.125% it takes 5 flips (+.15625) ... this sum converges to 2). So if we expect to examine 2 values per level, and we expect O(Log2 N) levels, we expect to examine O(Log2 N) values (with some small constant). When inserting or removing, we expect to add/remove the tower of values, whose height is expected to be O(Log2 N) as well.
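If you want to check these numbers yourself, here is a small, self-contained Java program (an illustration only; the class name is mine) that evaluates the N/2^i bound for N = 1,000 at 2, 3, and 4 times the expected height, and sums the series .5 + .5 + .375 + .25 + ... to show it converges to 2.

  public class SkipListNumbers {
      public static void main(String[] args) {
          int n = 1000;
          double log2N = Math.log(n) / Math.log(2);        //about 10 for N = 1,000
          for (int k = 2; k <= 4; k++) {                   //2x, 3x, 4x the expected height
              double bound = n / Math.pow(2, k * log2N);   //the N/2^i bound
              System.out.printf("P(height >= %d * Log2 N) <= %.2e%n", k, bound);
          }
          double expectedScans = 0;                        //sum of i * (1/2)^i
          for (int i = 1; i <= 50; i++)
              expectedScans += i / Math.pow(2, i);
          System.out.println("expected scans per level = " + expectedScans);
      }
  }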