Skip Lists

Skip lists can be used to implement sets and maps. We will discuss how skip lists are stored (their order and structure properties), and the various algorithms that operate on them. Skip lists allow relatively quick locate, insert, and delete methods (quick like trees - O(Log2 N); not as quick as hash tables). Also, it is easy and efficient to iterate over all the values in a skip list, including removing values.

The basic material on skip lists and the algorithms that manipulate them is covered in our textbook, Goodrich and Tamassia, on pages 415-423. They show pictures for all the operations, which will not appear in these notes (but I will draw them in class). You should be able to hand-simulate the various skip list algorithms on such pictures.

The exact structure of a skip list depends on a random number generator, not only on the data stored in the skip list (building two skip lists with the same values added in the same order would most likely result in different skip lists, except at the S0 level). This is unlike BSTs and AVL trees, where the tree structure depends exactly on the order in which the values were added to the tree. Since there are excellent random number generators, skip lists should be less pathological than search trees. The textbook states that skip lists in practice are faster than AVL and other balanced trees. But we can use trees to store all sorts of information (e.g., expression trees, not just information to search for), and trees are generally more interesting than skip lists, so we have studied trees much more in ICS-23.

Note that each node in a skip list is quadruply linked: with prev, next, above, and below references. Initially, an empty skip list contains two levels, S0 and S1, both storing only -inf and +inf nodes. Generally, all values in a skip list are stored at level S0 (making it easy to iterate through all the values) and each higher level has ABOUT (determined by the random number generator) 1/2 the values of the level directly below it (except the top level, which always has only the two values -/+inf). See Goodrich and Tamassia, pages 415-416. Since every horizontal list stores a -inf and a +inf node, each is like a linked list with both a header and a trailer node.

Searching: When searching for some value v, the algorithm finds the largest value <= v at level S0: thus, it finds v if it is in the skip list, and the value right before where v should appear if it is not already in the skip list (very useful for insertion). Here is the pseudo-code.

  Set cursor c to -inf at the highest level (the bottom level is S0)
  while c has a below node {       //always true when c is at the highest level
    c = c.below                    //go down 1 level
    while v >= c.next.value        //go right at the same level until you find
      c = c.next;                  //  the node storing the biggest value <= v
  }
  c stores a reference to the node searched for (or before where it belongs)

This algorithm does a linear search at each level, referring to the largest value <= v (comparing to the value one beyond it, which at worst is +inf). At the higher levels the lists are sparse (have few values), so each probe "jumps" ahead many values at the lower level (optimally half ahead, much like a binary search tree optimally eliminates half the subtree values with each comparison). See Goodrich and Tamassia, page 417.
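To make the node structure and the search loop concrete, here is a minimal Java sketch (my own illustration, not the textbook's code), assuming int values and using Integer.MIN_VALUE and Integer.MAX_VALUE to play the roles of -inf and +inf (so this sketch cannot store those two extreme int values themselves); the class and field names are just placeholders.

  class SkipNode {                       //quadruply linked node
      int value;
      SkipNode prev, next, above, below;
      SkipNode(int value) { this.value = value; }
  }

  class SkipList {
      SkipNode head;                     //the -inf node on the highest level

      SkipList() {                       //empty skip list: levels S0 and S1,
          SkipNode negInf0 = new SkipNode(Integer.MIN_VALUE);   //  each storing
          SkipNode posInf0 = new SkipNode(Integer.MAX_VALUE);   //  only -inf/+inf
          SkipNode negInf1 = new SkipNode(Integer.MIN_VALUE);
          SkipNode posInf1 = new SkipNode(Integer.MAX_VALUE);
          negInf0.next = posInf0;   posInf0.prev = negInf0;     //level S0
          negInf1.next = posInf1;   posInf1.prev = negInf1;     //level S1
          negInf1.below = negInf0;  negInf0.above = negInf1;
          posInf1.below = posInf0;  posInf0.above = posInf1;
          head = negInf1;
      }

      SkipNode search(int v) {           //node on S0 with the largest value <= v
          SkipNode c = head;             //start at -inf on the highest level
          while (c.below != null) {
              c = c.below;               //go down 1 level
              while (v >= c.next.value)  //go right while the next value is <= v
                  c = c.next;            //  (+inf always stops this scan)
          }
          return c;                      //v's node, or the node right before
      }                                  //  where v belongs
  }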
Delete: This algorithm starts with searching, and is very simple beyond that: it just removes the value from level S0 and from every level above S0 that it appears in (it can reach the upper nodes by following the above references). The quadruply linked lists make remove a bit tedious to write (code-wise), but all the necessary references are stored to make this operation possible.

  c = search for v                 //See algorithm above
  if c.value != v
    return                         //Cannot delete it; it is not in the skip list
  //Remove c from level S0 and from every higher level it appears in
  while (c != null) {              //Excise the node c refers to from the list
    c.prev.next = c.next           //next of node on left refers to node on right
    c.next.prev = c.prev           //prev of node on right refers to node on left
    c = c.above                    //move up one level
  }

Insert: This algorithm starts with searching too, but afterward its code is more complicated, and involves generating random numbers (flipping coins). So, first we find the location for the node to insert on level S0: searching for a value not in the list finds, in level S0, the node right before where we would add the new node. Then we insert this node and add nodes on top of it (so long as the flip is heads), linking every node added to the ones to its left/right (on that level) and to the same node below - and even adding new "top" levels (with -inf and +inf) if necessary. Here is the pseudo-code (a Java sketch of both delete and insert appears after the Iterator discussion below).

  c = search for v                   //See algorithm above
  if c.value == v
    return                           //No need to add, already in skip list
  put v after c's node at level S0   //Easy, doubly linked
  while flip == heads {              //Going up one level, based on a heads flip
    while (c.above == null)          //Stay or go backwards to find a node with
      c = c.prev                     //  something above it: at worst -inf
    c = c.above
    add a new node between c and what follows c   //at worst between -inf and +inf
    if already at the highest level
      add a new highest level with only -inf and +inf
  }

Iterator: Since level S0 has all the values, we primarily iterate through this list. If we want to remove the value just returned by next in the iterator, we remember the node just returned and perform the excising loop from the delete code shown above, but without calling "c = search for v" or checking whether the value is there, because our iterator already stores a reference to the node that we want to remove.
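Continuing the same illustrative sketch (again, my own code, not the textbook's): here are remove and insert methods that could be added to the SkipList class shown earlier, following the pseudo-code above. The coin flip uses java.util.Random, and the helper names (findNegInf, addEmptyTopLevel) are just placeholders I chose.

  private static final java.util.Random coin = new java.util.Random();

  void remove(int v) {
      SkipNode c = search(v);
      if (c.value != v)
          return;                        //cannot delete it; it is not in the skip list
      while (c != null) {                //excise v's node from every level it is on
          c.prev.next = c.next;
          c.next.prev = c.prev;
          c = c.above;                   //move up one level
      }
  }

  void insert(int v) {
      SkipNode c = search(v);
      if (c.value == v)
          return;                        //no need to add, already in the skip list
      SkipNode n = new SkipNode(v);      //splice v in right after c on level S0
      n.prev = c;  n.next = c.next;
      c.next.prev = n;  c.next = n;
      while (coin.nextBoolean()) {       //heads: grow v's tower one level higher
          while (c.above == null)        //stay or go backwards to find a node
              c = c.prev;                //  with something above it: at worst -inf
          c = c.above;
          SkipNode up = new SkipNode(v); //link the new node left/right and below
          up.prev = c;  up.next = c.next;
          c.next.prev = up;  c.next = up;
          up.below = n;  n.above = up;
          n = up;
          if (findNegInf(c) == head)     //v now appears on the highest level, so
              addEmptyTopLevel();        //  add a new level with only -inf/+inf
      }
  }

  private SkipNode findNegInf(SkipNode node) {    //-inf node on node's level
      while (node.prev != null)
          node = node.prev;
      return node;
  }

  private void addEmptyTopLevel() {
      SkipNode oldPosInf = head;                  //+inf node on the old top level
      while (oldPosInf.next != null)
          oldPosInf = oldPosInf.next;
      SkipNode negInf = new SkipNode(Integer.MIN_VALUE);
      SkipNode posInf = new SkipNode(Integer.MAX_VALUE);
      negInf.next = posInf;      posInf.prev = negInf;
      negInf.below = head;       head.above = negInf;
      posInf.below = oldPosInf;  oldPosInf.above = posInf;
      head = negInf;
  }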
Time Analysis (summarized from the text; a starred/advanced/optional section)

First, what height do we expect for a skip list with N values? The probability that adding a single value leads to a tower of height i is <= 1/2^i (getting i consecutive head flips in a row). So if we add N entries, the probability that the height of the skip list data structure is i or larger is <= N/2^i. The expected height of a skip list with N values is Log2 N.

What does this probability mean in practice? Let's compute the probability that the height of the skip list is two times the expected height (2 Log2 N). It is <= N/2^(2 Log2 N), or <= N/N^2, or <= 1/N. So, for N = 1,000, the probability that the height is 2 Log2 1000 (about 20) is 1 in a thousand. The probability that it is three times the expected height (3 Log2 1000) would be 1 in a million. The probability that it is four times the expected height (4 Log2 1000) would be 1 in a billion. So, we expect the height to be O(Log2 N) with a small constant.

Insert and delete start by doing a locate, so let's think about the complexity class of locate first. We will need to drop down from the highest level to level 0, which by the previous analysis is O(Log2 N). We also need to know how many forward scans we do on each level. Because of the coin flipping, this number is 2 (the number of times we expect to flip a coin before it comes up tails: 50% of the time it takes 1 flip (.5); 25% it takes 2 flips (+.5); 12.5% it takes 3 flips (+.375); 6.25% it takes 4 flips (+.25); 3.125% it takes 5 flips (+.15625) ... this sum converges to 2). So if we expect to examine 2 values per level, and we expect O(Log2 N) levels, we expect to examine O(Log2 N) values (with some small constant). When inserting or removing, we expect to add/remove the tower of values, whose height is expected to be O(Log2 N) as well.
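If you want to check these numbers yourself, here is a small, self-contained Java program (an illustration only; the class name is mine) that evaluates the N/2^i bound for N = 1,000 at 2, 3, and 4 times the expected height, and sums the series .5 + .5 + .375 + .25 + ... to show it converges to 2.

  public class SkipListNumbers {
      public static void main(String[] args) {
          int n = 1000;
          double log2N = Math.log(n) / Math.log(2);        //about 10 for N = 1,000
          for (int k = 2; k <= 4; k++) {                   //2x, 3x, 4x the expected height
              double bound = n / Math.pow(2, k * log2N);   //the N/2^i bound
              System.out.printf("P(height >= %d * Log2 N) <= %.2e%n", k, bound);
          }
          double expectedScans = 0;                        //sum of i * (1/2)^i
          for (int i = 1; i <= 50; i++)
              expectedScans += i / Math.pow(2, i);
          System.out.println("expected scans per level = " + expectedScans);
      }
  }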