Sorting: O(N Log2 N) Sorts and Lower Bounds

In this lecture we will discuss three O(N Log2 N) sorting algorithms (although for Quick Sort, this bound is the average case, not the worst case). We will also discuss a non-trivial lower bound for sorting (to me, this is especially interesting and surprising).

Heap Sort:

First we discuss Heap Sort. As we have seen with Heaps for priority queues, to sort N values we can add each value into the Heap (assume the biggest value has the highest priority: use a Max-Heap) and then remove the values in order, biggest one first. The complexity of the "online" algorithm is N x O(Log2 N) + N x O(Log2 N) = 2N x O(Log2 N) = O(N Log2 N). The complexity of the "offline" algorithm is O(N) + N x O(Log2 N) = O(N + N Log2 N) = O(N Log2 N). It takes this amount of work even in the best case.

1) Worst/Best/Average case is O(N Log2 N)
2) In-place (all the data is stored in the array that is the heap); when we remove a value (biggest first) we swap it with the value in the last used location in the array, and then do not use that location any more.
3) O(N Log2 N) comparisons; O(N Log2 N) swaps in the worst case
4) Unstable (percolating values up and down the tree -across many indexes in the array- produces instability).

As we discussed, we can create a Heap in O(N) "offline" if we already have all N values in the array (instead of adding each one to a Heap). The algorithm operates bottom-up (imagine the Heap values all in a tree), right to left. For every node (scanning the array backwards processes them deepest first, right to left), percolate it downward using the standard Heap algorithm (note that its children will already be heaps, because of the order of processing).

Recall that in an N node binary tree (let's assume all depths are filled), the deepest depth contains about N/2 values (yes, almost half its values are at the bottom); the next depth contains about N/4 nodes, the next depth about N/8 nodes, ..., and the root depth just 1. So each depth has 1/2 as many nodes as the one below it. How far down can each node move in the algorithm? At the bottom, these N/2 nodes never move down; at the depth above, these N/4 nodes can move down 1; at the depth above, these N/8 nodes can move down 2; ...; at the top depth, the root can move down Log2 N depths. So, higher-up nodes can move down farther, but there are fewer and fewer higher nodes. The total number of possible moves is therefore

  1*Log2 N + 2*(Log2 N - 1) + 4*(Log2 N - 2) + ... + N/2*0

This sum is approximately N (try it out in Excel, for example); but instead of trying to prove it, let's analyze the tree directly. Suppose we start with a tree with N nodes; in the worst case each node moves from its depth to the bottom depth. Now imagine a tree with twice as many nodes (2N). It has exactly one deeper depth, and none of the N nodes at this deepest depth move down. Each of the N nodes from the previous tree at most moves one depth deeper, for a total of N extra moves. So, by doubling the size we have doubled the number of operations necessary, which is the signature of the O(N) complexity class.

So, if we take the array as given, then heapify it with the biggest value having the highest priority, when it comes time to remove that value we swap it with the value at the current end of the heap/array and percolate the swapped value down into the heap (which no longer includes the last location in the array). Repeatedly doing this puts the biggest value at the end, the next biggest value right before it, etc., until we have the array in sorted order.
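Here is a minimal Java sketch of this in-place procedure (my own sketch, not the course's HeapPriorityQueue code; the names heapSort, percolateDown, and swap are mine):

  public class HeapSortSketch {
    // Sort a into ascending order, in place, using a max-heap stored in a itself.
    public static void heapSort(int[] a) {
      int n = a.length;
      for (int i = n/2 - 1; i >= 0; i--)      // heapify: O(N), bottom-up, right to left
        percolateDown(a, i, n);
      for (int end = n - 1; end > 0; end--) { // repeatedly move the biggest value to the end
        swap(a, 0, end);                      // biggest value goes into the last used location
        percolateDown(a, 0, end);             // restore the heap in a[0..end-1]
      }
    }

    // Percolate a[i] down within a[0..size-1] so the subtree rooted at i is a max-heap.
    private static void percolateDown(int[] a, int i, int size) {
      while (2*i + 1 < size) {                // while i has at least a left child
        int child = 2*i + 1;
        if (child + 1 < size && a[child + 1] > a[child])
          child++;                            // use the bigger child
        if (a[i] >= a[child])
          break;                              // heap order restored
        swap(a, i, child);
        i = child;
      }
    }

    private static void swap(int[] a, int i, int j) {
      int t = a[i]; a[i] = a[j]; a[j] = t;
    }

    public static void main(String[] args) {
      int[] a = {7, 10, 3, 2, 6, 13, 15, 16, 12, 1, 5, 9, 14, 4, 11, 8};
      heapSort(a);
      System.out.println(java.util.Arrays.toString(a)); // 1 2 3 ... 16
    }
  }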
So, we start and end with an array, treating it like a heap in between. In fact, we discussed using the heapify algorithm in two HeapPriorityQueue constructors (the ones with Iterator and array parameters) to speed up construction of an initial Heap containing some number of values.

Merge Sort:

Next we will discuss Merge Sort. This is a "divide and conquer" sort, implemented simply via recursion. We use recursion to divide up the problem and merging to do the sorting. The array form of this sort is written as

  public static void mergeSort(int[] a)
    { mergeSort(a, 0, a.length-1); }

calling an overloaded mergeSort method that specifies the minimum and maximum index to use when sorting the (sub)array (in the call from the method above, we specify all indexes). This method can be written recursively

  public static void mergeSort(int[] a, int low, int high) {
    if (low >= high)                    //Base case: 1 value to sort -> sorted
      return;                           //(0 possible only on the initial call)
    else {
      int mid = (low + high)/2;         //Approximate midpoint
      mergeSort(a, low, mid);           //Sort low to mid part of array
      mergeSort(a, mid+1, high);        //Sort mid+1 to high part of array
      merge(a, low, mid, mid+1, high);  //Merge sorted parts of array
    }
  }

Note that if low and high are adjacent, say 4 and 5, then mid = 4 and the recursive calls are mergeSort(a, 4, 4) and mergeSort(a, 5, 5), which are both base cases. All the sorting is done in the merge method: mergeSort just recursively computes the positions of each part of the array to sort (and stops at 1 element arrays as the base case, which are by definition sorted).

Suppose that we write an original array of 16 values as follows. We choose 16 because it is a perfect power of 2, but all other sizes work as well.

  7 10 3 2 6 13 15 16 12 1 5 9 14 4 11 8

The first level of recursive calls splits it into 2 arrays of 8 values each (see the | character).

  7 10 3 2 6 13 15 16 | 12 1 5 9 14 4 11 8

The next level of recursive calls splits it into 4 arrays of 4 values each.

  7 10 3 2 | 6 13 15 16 | 12 1 5 9 | 14 4 11 8

The next level of recursive calls splits it into 8 arrays of 2 values each.

  7 10 | 3 2 | 6 13 | 15 16 | 12 1 | 5 9 | 14 4 | 11 8

The bottom level of recursive calls splits it into 16 arrays of 1 value each.

  7 | 10 | 3 | 2 | 6 | 13 | 15 | 16 | 12 | 1 | 5 | 9 | 14 | 4 | 11 | 8

Now each pair of adjacent 1 value sorted arrays is merged into 8 sorted arrays of 2 values each.

  7 10 | 2 3 | 6 13 | 15 16 | 1 12 | 5 9 | 4 14 | 8 11

Now each pair of adjacent 2 value sorted arrays is merged into 4 sorted arrays of 4 values each.

  2 3 7 10 | 6 13 15 16 | 1 5 9 12 | 4 8 11 14

Now each pair of adjacent 4 value sorted arrays is merged into 2 sorted arrays of 8 values each.

  2 3 6 7 10 13 15 16 | 1 4 5 8 9 11 12 14

Finally, the remaining pair of 8 value sorted arrays is merged into 1 sorted array of 16 values.

  1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16

Note that each recursive call does O(1) work outside of merging, and there are Log2 N levels of calls. Merging all the subarrays at each level requires O(N) work (justified below), so the total amount of work is Log2 N x O(N) or O(N Log2 N).

1) Worst/Best/Average case is O(N Log2 N)
2) Not in-place (requires an equal sized extra array; see merge below)
3) O(N Log2 N) comparisons; O(N Log2 N) movements of data in the worst case
4) Stable: when we merge left and right subarrays, equal values are moved first from the left subarray (they were originally to the left of all the equal values in the right subarray, ensuring stability).
Here is pseudo-code for merging

  public static void merge(int[] a, int leftLow, int leftHigh, int rightLow, int rightHigh) {
    Create a temporary array that is the same size as "a"
      (this extra storage is why the algorithm is not in-place)
    for every temporary array index from leftLow to rightHigh
      if there are no more "left" values
        copy to the temporary array the next "right" value
      else if there are no more "right" values
        copy to the temporary array the next "left" value
      else if the next "left" value <= the next "right" value
        copy to the temporary array the next "left" value
      else
        copy to the temporary array the next "right" value
    copy the temporary array back into "a": into positions leftLow to rightHigh
  }

The merge method merges two sorted subarrays (both in a) of size about N/2 into one sorted array of size N (temp). The main loop puts a new value into the temp array on every iteration, sometimes from the left part of "a" and sometimes from the right part of "a". So, the loop iterates N times with O(1) work done during each iteration. The first two ifs test whether all the values from the left/right have been moved, and if so it moves a value from the other part. If there are values in both, it compares them and moves the smaller (using the left part of "a" when they are equal). Finally, all the values are copied back from the temp array into "a".

This method is easy to do with linked lists as well: although dividing the linked list in half takes O(N) time, merging also takes O(N) time, so the O(N Log2 N) complexity bound still holds for linked lists.

Finally, there are iterative (non-recursive) implementations. Such code is more complicated, but not unreasonable for advanced students to write. Also, sometimes other algorithms are faster for small N (say N <= c for some small constant c). So sometimes the base case is an array of size <= c, at which point the other sorting method is called to sort the subarray, instead of calling merge sort recursively.
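Here is one way to turn that pseudo-code into Java (a sketch; it assumes rightLow == leftHigh+1, as in the calls made by mergeSort above, and the local variable names are mine):

  public static void merge(int[] a, int leftLow, int leftHigh, int rightLow, int rightHigh) {
    int[] temp = new int[a.length];            // extra O(N) storage: why Merge Sort is not in-place
    int left  = leftLow;                       // next unmoved value in the left  subarray
    int right = rightLow;                      // next unmoved value in the right subarray
    for (int i = leftLow; i <= rightHigh; i++) {
      if (left > leftHigh)                     // no more "left" values
        temp[i] = a[right++];
      else if (right > rightHigh)              // no more "right" values
        temp[i] = a[left++];
      else if (a[left] <= a[right])            // take from the left on ties (stability)
        temp[i] = a[left++];
      else
        temp[i] = a[right++];
    }
    for (int i = leftLow; i <= rightHigh; i++) // copy the merged values back into a
      a[i] = temp[i];
  }

Allocating temp once outside the recursion (and passing it in) is a common optimization; the version here allocates it on every call to stay close to the pseudo-code.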
Quick Sort:

Finally, we will discuss Quick Sort, which is also a "divide and conquer" sort, implemented simply via recursion. We use partitioning to divide up the problem. The array form of this sort is written much like mergeSort was, first

  public static void quickSort(int[] a)
    { quickSort(a, 0, a.length-1); }

calling an overloaded quickSort method that specifies the minimum and maximum index to use when sorting the array (here, all of them). This method can be written recursively

  public static void quickSort(int[] a, int low, int high) {
    if (low >= high)                            //Base case: 0 or 1 value to sort -> sorted
      return;                                   //(0 possible on initial call and in recursion)
    else {
      int pivotIndex = partition(a, low, high); //Partition and return the pivot's index
      quickSort(a, low, pivotIndex-1);          //Sort values to the left of the pivot
      quickSort(a, pivotIndex+1, high);         //Sort values to the right of the pivot
      //Note that all values to the left of the pivot are <= all values to the
      // right of the pivot, so if both are sorted (with the pivot between),
      // the entire array is sorted
    }
  }

The partition method chooses the pivot value, then partitions the array into those values < pivot (on the left) and those values >= pivot (on the right), finally putting the pivot at an index in between these two. It returns the pivot's index (so the recursive calls know which parts of the array need to be sorted).

Similar to Merge Sort, all the sorting is done in the partition method: quickSort calls partition and figures out, based on the pivotIndex, where to do the recursive calls for more partitioning (and stops at 0 or 1 element arrays, which are by definition sorted). The pseudo-code for partition is

  Choose the pivot value (see the discussion below) and swap it with the value in a[high],
    so the pivot sits at the end while partitioning (it is put back where it belongs at the end)
  Start with l = low and r = high
  while (l < r)
    while (l < r && a[l] < pivot)    //Find a left value >= the pivot
      l++;
    while (l < r && a[r] >= pivot)   //Find a right value < the pivot
      r--;
    if (l < r)
      swap a[l] and a[r]
  swap a[l] and a[high]              //Put the pivot between the values < pivot and the values >= pivot
  return l;                          //the position of the pivot

Let's look at an example of how this works. Suppose that we write an original array of 16 values as follows.

  7 10 3 2 6 13 15 12 16 4 5 9 14 1 11 8

Let's just choose the last value (8) as the pivot; it stays at index high (the last position) until the final swap. l starts at the left end (low) and r starts at the right end (high). It scans l forwards until it indexes a value >= 8; it scans r backwards until it indexes a value < 8.

  7 10 3 2 6 13 15 12 16 4 5 9 14 1 11 8     (l at index 1, value 10; r at index 13, value 1)

Now it swaps those values.

  7 1 3 2 6 13 15 12 16 4 5 9 14 10 11 8

It scans l forwards until it indexes a value >= 8; it scans r backwards until it indexes a value < 8.

  7 1 3 2 6 13 15 12 16 4 5 9 14 10 11 8     (l at index 5, value 13; r at index 10, value 5)

Now it swaps those values.

  7 1 3 2 6 5 15 12 16 4 13 9 14 10 11 8

It scans l forwards until it indexes a value >= 8; it scans r backwards until it indexes a value < 8.

  7 1 3 2 6 5 15 12 16 4 13 9 14 10 11 8     (l at index 6, value 15; r at index 9, value 4)

Now it swaps those values.

  7 1 3 2 6 5 4 12 16 15 13 9 14 10 11 8

It scans l forwards until it indexes a value >= 8; it scans r backwards until it indexes a value < 8, but these indexes stop when they become equal.

  7 1 3 2 6 5 4 12 16 15 13 9 14 10 11 8     (l = r at index 7, value 12)

So, now r = l, so it doesn't swap those values. Instead it swaps index l with index high, putting the pivot after the values smaller than it and at the beginning of the values greater than or equal to it.

  7 1 3 2 6 5 4 8 16 15 13 9 14 10 11 12

The partitioned array looks like the following (with the pivot in ||).

  7 1 3 2 6 5 4 | 8 | 16 15 13 9 14 10 11 12

The partition method returns 7 (the index of the pivot 8 in the array). You should understand the details of how the partition method works, by hand simulating it on other 16 element arrays.

Here we were lucky, as 8 was the middle value in the array. As with merging, partitioning requires a total of O(N) operations to compute all the partitions needed for each level. If we continue choosing "middle" values as pivots, there will be a total of Log2 N levels, just like with Merge Sort, leading to a best case complexity of O(N Log2 N).

Now we recursively partition the left part (indexes 0 to 6) and the right part (indexes 8 to 15). In both we again choose the last value as the pivot: 4 for the left range, 12 for the right range; in both cases the choice is fortunate, as these values are near the middle of each range. After each range is partitioned, it looks as follows (with the pivots in ||).

  2 1 3 | 4 | 6 5 7 | 8 | 11 10 9 | 12 | 14 15 16 13

The result is that we still have arrays of size 3, 3, 3, and 4 to partition. If we keep choosing such good pivots, there will be Log2 N levels, meaning the best case complexity class for Quick Sort is O(N Log2 N): Log2 N levels each requiring O(N) work.
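Here is a minimal Java sketch of partition that matches the pseudo-code and the trace above (a sketch that simply uses a[high] as the pivot; a better pivot choice, discussed below, would just be swapped into a[high] first):

  public static int partition(int[] a, int low, int high) {
    int pivot = a[high];                  // last value is the pivot; it stays at a[high] until the end
    int l = low, r = high;
    while (l < r) {
      while (l < r && a[l] < pivot)       // find a left value >= the pivot
        l++;
      while (l < r && a[r] >= pivot)      // find a right value < the pivot
        r--;
      if (l < r) {                        // swap the out-of-place pair
        int temp = a[l]; a[l] = a[r]; a[r] = temp;
      }
    }
    int temp = a[l]; a[l] = a[high]; a[high] = temp;  // put the pivot between the two groups
    return l;                             // the index of the pivot
  }

Calling it on the 16 value example above (low = 0, high = 15) performs exactly the swaps shown and returns 7.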
Starting over again, here is an example of an array that would continually supply the worst partition choice (the biggest value in the array) if we keep choosing the last value as the pivot.

  1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16

This results in the following array after partitioning

  1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 | 16 |

which has taken 16 operations to partition the array but has not changed it. Now the recursive calls work on an array of size 15 and an array of size 0. If we continue to choose the worst pivot, the next recursive call takes 15 operations to partition the array but does not change it. This continues, requiring 16 + 15 + 14 + ... + 1 operations, which is O(N^2); this is why in the worst case this method is O(N^2). You can see pages 519 and 523 in Goodrich and Tamassia for a full example.

1) Worst case is O(N^2); best and average cases are O(N Log2 N)
2) Not in-place (requires O(Log2 N) stack space for recursive calls on average, O(N) in the worst case); although not in-place, Log2 N extra space isn't much
3) O(N^2) comparisons and movements in the worst case (O(N Log2 N) of each on average)
4) Unstable: partition swaps values over large distances in the array

To summarize, Quick Sort's work is between O(N Log2 N) and O(N^2). Most often it is O(N Log2 N), and its constant is lower than those for either Heap Sort or Merge Sort. The difference between good and bad behavior is picking a good pivot (discussed next).

So, picking a good pivot is important. Sometimes pivots are chosen from the start, middle, or end of the array. By choosing the middle, if the array is already sorted the pivot will be good (choosing either end of an already sorted array results in O(N^2) time). One can also choose a pivot from a random position in the array. Obviously the best pivot is the median value in the array, which splits it in half, but it would take too much work to find the true median over and over again in each call to partition. So, we can approximate the median by choosing the pivot as the median of 3 values in the array (picking the first, middle, and last values to compare, or picking three values at random indexes). Of course, we could better approximate the median by looking at even more values (say a Median of 5), but the time to find such a median is bigger. We need a tradeoff between how long it takes to choose a pivot and how good the chosen pivot is. It has been found in practice that Median of 3 gives the best overall results. With a good pivot, this algorithm also requires only O(Log2 N) extra stack space from recursion.

Finally, to speed up Quick Sort, often the many small arrays at the bottom of the recursion are sorted via a sort that is faster for small arrays, or not sorted at all. Say that we leave arrays of size 4 or less unsorted. After most parts of the array are sorted, we do one call to Insertion Sort. That method runs in O(N) if the data is mostly sorted (which it will be after lots of partitioning: every value will be within 4 of its final index), and doing so is often faster than completely sorting via Quick Sort. Depending on your machine/compiler, you might discover a different optimal minimum size (bigger or smaller) for recursively calling Quick Sort.
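Here is one way to code the Median of 3 choice in Java (a sketch; the method name medianOfThreeIndex and the convention of swapping its result into a[high] before partitioning are my own framing, not a required part of the algorithm):

  // Return the index (low, mid, or high) whose value is the median of those three values.
  public static int medianOfThreeIndex(int[] a, int low, int high) {
    int mid = (low + high)/2;
    int x = a[low], y = a[mid], z = a[high];
    if ((x <= y && y <= z) || (z <= y && y <= x)) return mid;   // a[mid] is the median
    if ((y <= x && x <= z) || (z <= x && x <= y)) return low;   // a[low] is the median
    return high;                                                // otherwise a[high] is the median
  }

partition would then swap a[medianOfThreeIndex(a, low, high)] into a[high] first and proceed exactly as before.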
Final Words on O(N Log2 N) sorting:

The following analysis is based on my implementations of these sorting methods.

Heap Sort is guaranteed to run in O(N Log2 N) and typically runs slowest; it CAN be done in place, but it is unstable.

Merge Sort is guaranteed to run in O(N Log2 N) but typically runs slower than Quick Sort; it CANNOT be done in place (requiring an extra N in space), but it is stable.

Quick Sort is NOT guaranteed to run in O(N Log2 N) -it runs in O(N^2) in bad cases- but if we choose the pivot carefully, it almost always runs in O(N Log2 N) and does so faster than Merge or Heap Sort (with a smaller constant). It CANNOT be done in place, but the extra storage is only of size Log2 N (which is much less than the extra N space needed by Merge Sort). Finally, it too is unstable.

In 2002 a programmer named Tim Peters developed a sorting algorithm (he named it Timsort) that is based on merge sort and insertion sort. It is stable, and at worst runs in O(N Log2 N), but it often runs faster, sometimes as fast as O(N), when the data is not completely random but partially sorted (which is often the case). It does take up some extra space, but not a lot. To get this performance the method is highly tuned and takes lots of code. But since sorting is done so often, Timsort is now the standard sorting algorithm in Python (where it was developed) and is also used in Java (for sorting arrays of objects).

There are hundreds of sorting algorithms. The more you know about how your data is distributed (if it isn't totally random), the better your choice of sorting algorithm can be. Generally, when a library supplies just one general-purpose sorting method, it is some variant of Quick Sort; Java, for example, uses a Quick Sort variant for sorting arrays of primitive values.

Lower Bounds for Comparison Sorting Methods:

Certainly we must look at every value in the array when sorting it (if we left one out, it might be in the wrong spot), so we have a trivial Omega(N) lower bound for sorting when using comparisons. But we can use the idea of a Comparison Tree to compute a much more interesting and useful lower bound for sorting using comparisons.

For every comparison-based algorithm that we develop for sorting, we can translate it into a Comparison Tree, by looking at which values in the array it compares to which other values in the array. Thus, the entire tree specifies how comparisons are made for every possible input (the tree is just a different form of the sorting algorithm). Each internal node of the tree specifies a comparison to make; each leaf shows the resulting order of all the values. Here is a Comparison Tree for an algorithm that sorts the three values x1, x2, x3. I took this tree from David Eppstein's ICS 163 Notes (so, you might see a similar proof again, at a more sophisticated level, when you are more sophisticated).

                           x1:x2
                      <   /     \   >
                         /       \
                    x2:x3         x1:x3
                 <  /   \  >   <  /   \  >
                   /     \       /     \
           x1,x2,x3     x1:x3   x2,x1,x3     x2:x3
                     <  / \  >            <  / \  >
                       /   \                /   \
               x1,x3,x2     x3,x1,x2  x2,x3,x1     x3,x2,x1

At the root we know nothing about the ordering of x1, x2, and x3. At every internal node we perform one comparison (different sorting methods do these comparisons in different orders). After each comparison we know a bit more about the ordering of the values. After we accumulate enough information (do enough comparisons on some path from the root downward), we know the exact ordering of the values. So, for example, if we follow from the root and find that x1 < x2, then find that x2 < x3, we know the order: x1 < x2 < x3. Likewise, if we follow from the root and find that x1 < x2, then find that x2 > x3, then find that x1 < x3, we know the order: x1 < x3 < x2.

For the worst-case input, a Comparison Tree must perform one comparison at each depth along its longest path, and thus in the worst case it performs a number of comparisons equal to its height. So, if we know the height of a Comparison Tree, we know the worst-case number of comparisons its algorithm performs, and therefore its worst-case complexity class.
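To see that a Comparison Tree really is just a different form of a sorting algorithm, here is the tree above transcribed into Java (my own transcription): each if corresponds to an internal node, and each println to a leaf.

  // Print x1, x2, x3 in sorted order by following the Comparison Tree above.
  public static void sortThree(int x1, int x2, int x3) {
    if (x1 < x2) {
      if (x2 < x3)        System.out.println(x1 + " " + x2 + " " + x3);  // x1,x2,x3
      else if (x1 < x3)   System.out.println(x1 + " " + x3 + " " + x2);  // x1,x3,x2
      else                System.out.println(x3 + " " + x1 + " " + x2);  // x3,x1,x2
    } else {
      if (x1 < x3)        System.out.println(x2 + " " + x1 + " " + x3);  // x2,x1,x3
      else if (x2 < x3)   System.out.println(x2 + " " + x3 + " " + x1);  // x2,x3,x1
      else                System.out.println(x3 + " " + x2 + " " + x1);  // x3,x2,x1
    }
  }

No path makes more than 3 comparisons, matching the tree's height, and every one of the 3! = 6 orderings appears as exactly one leaf.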
We can use what we know about tree heights to get an interesting bound, by knowing how many leaves must be in any Comparison Tree. When sorting an N value array, there are N! (N factorial) different arrangements of these values. Each arrangement must occur in at least one leaf of the Comparison Tree; so, the Comparison Tree has at least N! leaves. For example, for 3 values there are 6 different arrangements of the values:

1) x1 < x2 < x3
2) x1 < x3 < x2
3) x2 < x1 < x3
4) x2 < x3 < x1
5) x3 < x1 < x2
6) x3 < x2 < x1

all of which occur in the Comparison Tree above. Note that there are N different choices for the smallest value and (N-1)! arrangements of the remaining values, so there are N*(N-1)! = N! arrangements in all.

Based on a Comparison Tree having at least N! leaves, we can prove that the height of the Comparison Tree (the number of comparisons performed for the worst case input) is Omega(N Log2 N). Here is a chain of inequalities that allows us to prove this fact. Note first that each Comparison Tree is a binary tree.

1) A Comparison Tree is a binary tree that has at least N! leaves.
2) A binary tree with at least N! leaves has at least N! nodes (its internal nodes come in addition to its leaves).
3) The height of a binary tree with N nodes is at least Log2 N, so a binary tree with N! nodes must have a height of at least Log2 N!.
4) N! = N * (N-1) * (N-2) * (N-3) * ... * (N/2) * (N/2 - 1) * (N/2 - 2) * ... * 2 * 1
   N! > N/2 * N/2 * N/2 * ... * N/2     (N/2 factors)
   Here we replaced each of the first N/2 factors (each is at least N/2) by N/2 and dropped the remaining factors (each is at least 1).
   Thus, N! > (N/2)^(N/2). Taking Logs of each side,
   Log2 N! > N/2 * Log2 (N/2) = N/2 * (Log2 N - 1) = N/2 Log2 N - N/2
   so Log2 N! is Omega(N Log2 N).

So, we have

   Height of a Comparison Tree (which has at least N! leaves)
     >= minimum height of a binary tree with N! nodes
     >= Log2 N!
     >  N/2 (Log2 N - 1), which is Omega(N Log2 N)

So, for any Comparison Tree that sorts N values, in the worst case it requires Omega(N Log2 N) comparisons to find the correct ordering (there is some input ordering that requires a number of comparisons equal to the height of the tree). You can see pages 531 and 532 in Goodrich and Tamassia for a similar proof.

Also note that Log2 N! = Log2 1 + Log2 2 + Log2 3 + ... + Log2 N, which can be accurately approximated by integrating Log2 x dx between 1 and N. Stirling's approximation for N! is sqrt(2*pi*N) * N^N * e^(-N). Taking Logs, Log2 N! is approximately Log2 sqrt(2*pi*N) + N Log2 N - N Log2 e; the N Log2 N term dominates the others, so Log2 N! is Omega(N Log2 N) (in fact it is Theta(N Log2 N)).

Since we know sorting with comparisons is Omega(N Log2 N), and we have multiple algorithms that are O(N Log2 N) -Heap Sort and Merge Sort- we have "optimally solved" the sorting problem, at least according to its complexity class. Other algorithms based on comparisons might have smaller constants (which affect the actual speed), but none will be in a smaller complexity class.

Next we will examine two sorting algorithms that seem to violate this lower bound, but they do so by not using comparisons to sort their values! These are strange algorithms that are useful for certain kinds of data; they have interesting upper and lower bounds that we will explore in more detail (and we will see that they don't really violate the lower bound).