Sorting: O(N Log2 N) Sorts and Lower Bounds

In this lecture we will discuss three O(N Log2 N) sorting algorithms (although for Quick Sort, this bound is the average case, not the worst case). We will also discuss a non-trivial lower bound for sorting (to me, this is especially interesting and surprising).

Heap Sort:

First we discuss Heap Sort. As we have seen with Heaps for priority queues, to sort N values we can add each value into the Heap (assume the biggest value has the highest priority: use a Max-Heap) and then remove the values in order, biggest one first. The complexity of the "online" algorithm is N x O(Log2 N) + N x O(Log2 N) = 2N x O(Log2 N) = O(N Log2 N). The complexity of the "offline" algorithm is O(N) + N x O(Log2 N) = O(N + N Log2 N) = O(N Log2 N). It takes this amount of work even in the best case.

1) Worst/Best/Average case is O(N Log2 N)
2) In-place (all the data is stored in the array that is the heap); when we remove a value (biggest first) we swap it with the value in the last used location in the array, and then do not use that location any more.
3) O(N Log2 N) comparisons; O(N Log2 N) swaps in the worst case
4) Unstable (percolating values up and down the tree -across many indexes in the array- produces instability).

As we discussed, we can create a Heap in O(N) "offline" if we already have all N values in the array (instead of adding each one to a Heap). The algorithm operates bottom-up (imagine the Heap values all in a tree), right to left. For every node (scanning the array backwards processes them deepest first, right to left), percolate it downward using the standard Heap algorithm (note that its children will already be heaps, because of the order of processing).

Recall that in an N node binary tree (let's assume all depths are filled), the deepest depth contains about N/2 values (yes, almost half its values are at the bottom); the next depth contains about N/4 nodes, the next depth about N/8 nodes, ..., and the root depth just 1. So each depth has 1/2 as many nodes as the one below it. How far down can each node move in the algorithm? At the bottom, these N/2 nodes never move down; at the depth above, these N/4 nodes can move down 1; at the depth above, these N/8 nodes can move down 2; ...; at the top depth, the root can move down Log2 N depths. So, higher-up nodes can move down farther, but there are fewer and fewer higher nodes. The total number of possible moves is therefore

  1*Log2 N + 2*(Log2 N - 1) + 4*(Log2 N - 2) + ... + N/2*0

This sum is approximately N (try it out in Excel, for example); but instead of trying to prove it, let's analyze the tree directly. Suppose we start with a tree with N nodes; in the worst case each node moves from its depth to the bottom depth. Now imagine a tree with twice as many nodes (2N). It has exactly one deeper depth, and none of the N nodes at this deepest depth move down. Each of the N nodes from the previous tree at most moves one depth deeper, for a total of N extra moves. So, by doubling the size we have doubled the number of operations necessary, which is the signature of the O(N) complexity class.

So, if we take the array as given, then heapify it with the biggest value having the highest priority, when it comes time to remove that value we swap it with the value at the current end of the heap/array and percolate the swapped value down into the heap (which no longer includes the last location in the array). Repeatedly doing this puts the biggest value at the end, the next biggest value right before it, etc., until we have the array in sorted order.
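Here is a minimal Java sketch of this in-place procedure (my own sketch, not the course's HeapPriorityQueue code; the names heapSort, percolateDown, and swap are mine):

  public class HeapSortSketch {
    // Sort a into ascending order, in place, using a max-heap stored in a itself.
    public static void heapSort(int[] a) {
      int n = a.length;
      for (int i = n/2 - 1; i >= 0; i--)      // heapify: O(N), bottom-up, right to left
        percolateDown(a, i, n);
      for (int end = n - 1; end > 0; end--) { // repeatedly move the biggest value to the end
        swap(a, 0, end);                      // biggest value goes into the last used location
        percolateDown(a, 0, end);             // restore the heap in a[0..end-1]
      }
    }

    // Percolate a[i] down within a[0..size-1] so the subtree rooted at i is a max-heap.
    private static void percolateDown(int[] a, int i, int size) {
      while (2*i + 1 < size) {                // while i has at least a left child
        int child = 2*i + 1;
        if (child + 1 < size && a[child + 1] > a[child])
          child++;                            // use the bigger child
        if (a[i] >= a[child])
          break;                              // heap order restored
        swap(a, i, child);
        i = child;
      }
    }

    private static void swap(int[] a, int i, int j) {
      int t = a[i]; a[i] = a[j]; a[j] = t;
    }

    public static void main(String[] args) {
      int[] a = {7, 10, 3, 2, 6, 13, 15, 16, 12, 1, 5, 9, 14, 4, 11, 8};
      heapSort(a);
      System.out.println(java.util.Arrays.toString(a)); // 1 2 3 ... 16
    }
  }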
So, we start and end with an array, treating it like a heap in between. In fact, we discussed using the heapify algorithm in two HeapPriorityQueue constructors (the ones with Iterator and array parameters) to speed up construction of an initial Heap containing some number of values.

Merge Sort:

Next we will discuss Merge Sort. This is a "divide and conquer" sort, implemented simply via recursion. We use recursion to divide up the problem and merging to do the sorting. The array form of this sort is written as

  public static void mergeSort(int[] a)
    { mergeSort(a, 0, a.length-1); }

calling an overloaded mergeSort method that specifies the minimum and maximum index to use when sorting the (sub)array (in the call from the method above, we specify all indexes). This method can be written recursively

  public static void mergeSort(int[] a, int low, int high) {
    if (low >= high)                    //Base case: 1 value to sort -> sorted
      return;                           //(0 possible only on the initial call)
    else {
      int mid = (low + high)/2;         //Approximate midpoint
      mergeSort(a, low, mid);           //Sort low to mid part of array
      mergeSort(a, mid+1, high);        //Sort mid+1 to high part of array
      merge(a, low, mid, mid+1, high);  //Merge sorted parts of array
    }
  }

Note that if low and high are adjacent, say 4 and 5, then mid = 4 and the recursive calls are mergeSort(a, 4, 4) and mergeSort(a, 5, 5), which are both base cases. All the sorting is done in the merge method: mergeSort just recursively computes the positions of each part of the array to sort (and stops at 1 element arrays as the base case, which are by definition sorted).

Suppose that we write an original array of 16 values as follows. We choose 16 because it is a perfect power of 2, but all other sizes work as well.

  7 10 3 2 6 13 15 16 12 1 5 9 14 4 11 8

The first level of recursive calls splits it into 2 arrays of 8 values each (see the | character).

  7 10 3 2 6 13 15 16 | 12 1 5 9 14 4 11 8

The next level of recursive calls splits it into 4 arrays of 4 values each.

  7 10 3 2 | 6 13 15 16 | 12 1 5 9 | 14 4 11 8

The next level of recursive calls splits it into 8 arrays of 2 values each.

  7 10 | 3 2 | 6 13 | 15 16 | 12 1 | 5 9 | 14 4 | 11 8

The bottom level of recursive calls splits it into 16 arrays of 1 value each.

  7 | 10 | 3 | 2 | 6 | 13 | 15 | 16 | 12 | 1 | 5 | 9 | 14 | 4 | 11 | 8

Now each pair of adjacent 1 value sorted arrays is merged into 8 sorted arrays of 2 values each.

  7 10 | 2 3 | 6 13 | 15 16 | 1 12 | 5 9 | 4 14 | 8 11

Now each pair of adjacent 2 value sorted arrays is merged into 4 sorted arrays of 4 values each.

  2 3 7 10 | 6 13 15 16 | 1 5 9 12 | 4 8 11 14

Now each pair of adjacent 4 value sorted arrays is merged into 2 sorted arrays of 8 values each.

  2 3 6 7 10 13 15 16 | 1 4 5 8 9 11 12 14

Finally, the remaining pair of 8 value sorted arrays is merged into 1 sorted array of 16 values.

  1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16

Note that each recursive call does O(1) work outside of merging, and there are Log2 N levels of calls. Merging all the subarrays at each level requires O(N) work (justified below), so the total amount of work is Log2 N x O(N) or O(N Log2 N).

1) Worst/Best/Average case is O(N Log2 N)
2) Not in-place (requires an equal sized extra array; see merge below)
3) O(N Log2 N) comparisons; O(N Log2 N) movements of data in the worst case
4) Stable: when we merge left and right subarrays, equal values are moved first from the left subarray (they were originally to the left of all the equal values in the right subarray, ensuring stability).
Here is pseudo-code for merging

  public static void merge(int[] a, int leftLow, int leftHigh, int rightLow, int rightHigh) {
    Create a temporary array that is the same size as "a"
      (this extra storage is why the algorithm is not in-place)
    for every temporary array index from leftLow to rightHigh
      if there are no more "left" values
        copy to the temporary array the next "right" value
      else if there are no more "right" values
        copy to the temporary array the next "left" value
      else if the next "left" value <= the next "right" value
        copy to the temporary array the next "left" value
      else
        copy to the temporary array the next "right" value
    copy the temporary array back into "a": into positions leftLow to rightHigh
  }

The merge method merges two sorted subarrays (both in a) of size about N/2 into one sorted array of size N (temp). The main loop puts a new value into the temp array on every iteration, sometimes from the left part of "a" and sometimes from the right part of "a". So, the loop iterates N times with O(1) work done during each iteration. The first two ifs test whether all the values from the left/right have been moved, and if so it moves a value from the other part. If there are values in both, it compares them and moves the smaller (using the left part of "a" when they are equal). Finally, all the values are copied back from the temp array into "a".

This method is easy to do with linked lists as well: although dividing the linked list in half takes O(N) time, merging also takes O(N) time, so the O(N Log2 N) complexity bound still holds for linked lists.

Finally, there are iterative (non-recursive) implementations. Such code is more complicated, but not unreasonable for advanced students to write. Also, sometimes other algorithms are faster for small N (say N <= c for some small constant c). So sometimes the base case is an array of size <= c, at which point the other sorting method is called to sort the subarray, instead of calling merge sort recursively.
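Here is one way to turn that pseudo-code into Java (a sketch; it assumes rightLow == leftHigh+1, as in the calls made by mergeSort above, and the local variable names are mine):

  public static void merge(int[] a, int leftLow, int leftHigh, int rightLow, int rightHigh) {
    int[] temp = new int[a.length];            // extra O(N) storage: why Merge Sort is not in-place
    int left  = leftLow;                       // next unmoved value in the left  subarray
    int right = rightLow;                      // next unmoved value in the right subarray
    for (int i = leftLow; i <= rightHigh; i++) {
      if (left > leftHigh)                     // no more "left" values
        temp[i] = a[right++];
      else if (right > rightHigh)              // no more "right" values
        temp[i] = a[left++];
      else if (a[left] <= a[right])            // take from the left on ties (stability)
        temp[i] = a[left++];
      else
        temp[i] = a[right++];
    }
    for (int i = leftLow; i <= rightHigh; i++) // copy the merged values back into a
      a[i] = temp[i];
  }

Allocating temp once outside the recursion (and passing it in) is a common optimization; the version here allocates it on every call to stay close to the pseudo-code.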
Quick Sort:

Finally, we will discuss Quick Sort, which is also a "divide and conquer" sort, implemented simply via recursion. We use partitioning to divide up the problem. The array form of this sort is written much like mergeSort was, first

  public static void quickSort(int[] a)
    { quickSort(a, 0, a.length-1); }

calling an overloaded quickSort method that specifies the minimum and maximum index to use when sorting the array (here, all of them). This method can be written recursively

  public static void quickSort(int[] a, int low, int high) {
    if (low >= high)                            //Base case: 0 or 1 value to sort -> sorted
      return;                                   //(0 possible on initial call and in recursion)
    else {
      int pivotIndex = partition(a, low, high); //Partition and return the pivot's index
      quickSort(a, low, pivotIndex-1);          //Sort values to the left of the pivot
      quickSort(a, pivotIndex+1, high);         //Sort values to the right of the pivot
      //Note that all values to the left of the pivot are <= all values to the
      // right of the pivot, so if both are sorted (with the pivot between),
      // the entire array is sorted
    }
  }

The partition method chooses the pivot value, then partitions the array into those values < pivot (on the left) and those values >= pivot (on the right), finally putting the pivot at an index in between these two. It returns the pivot's index (so the recursive calls know which parts of the array need to be sorted).

Similar to Merge Sort, all the sorting is done in the partition method: quickSort calls partition and figures out, based on the pivotIndex, where to do the recursive calls for more partitioning (and stops at 0 or 1 element arrays, which are by definition sorted). The pseudo-code for partition is

  Choose the pivot value (see the discussion below) and swap it with the value in a[high],
    so the pivot sits at the end while partitioning (it is put back where it belongs at the end)
  Start with l = low and r = high
  while (l < r)
    while (l < r && a[l] < pivot)    //Find a left value >= the pivot
      l++;
    while (l < r && a[r] >= pivot)   //Find a right value < the pivot
      r--;
    if (l < r)
      swap a[l] and a[r]
  swap a[l] and a[high]              //Put the pivot between the values < pivot and the values >= pivot
  return l;                          //the position of the pivot

Let's look at an example of how this works. Suppose that we write an original array of 16 values as follows.

  7 10 3 2 6 13 15 12 16 4 5 9 14 1 11 8

Let's just choose the last value (8) as the pivot; it stays at index high (the last position) until the final swap. l starts at the left end (low) and r starts at the right end (high). It scans l forwards until it indexes a value >= 8; it scans r backwards until it indexes a value < 8.

  7 10 3 2 6 13 15 12 16 4 5 9 14 1 11 8     (l at index 1, value 10; r at index 13, value 1)

Now it swaps those values.

  7 1 3 2 6 13 15 12 16 4 5 9 14 10 11 8

It scans l forwards until it indexes a value >= 8; it scans r backwards until it indexes a value < 8.

  7 1 3 2 6 13 15 12 16 4 5 9 14 10 11 8     (l at index 5, value 13; r at index 10, value 5)

Now it swaps those values.

  7 1 3 2 6 5 15 12 16 4 13 9 14 10 11 8

It scans l forwards until it indexes a value >= 8; it scans r backwards until it indexes a value < 8.

  7 1 3 2 6 5 15 12 16 4 13 9 14 10 11 8     (l at index 6, value 15; r at index 9, value 4)

Now it swaps those values.

  7 1 3 2 6 5 4 12 16 15 13 9 14 10 11 8

It scans l forwards until it indexes a value >= 8; it scans r backwards until it indexes a value < 8, but these indexes stop when they become equal.

  7 1 3 2 6 5 4 12 16 15 13 9 14 10 11 8     (l = r at index 7, value 12)

So, now r = l, so it doesn't swap those values. Instead it swaps index l with index high, putting the pivot after the values smaller than it and at the beginning of the values greater than or equal to it.

  7 1 3 2 6 5 4 8 16 15 13 9 14 10 11 12

The partitioned array looks like the following (with the pivot in ||).

  7 1 3 2 6 5 4 | 8 | 16 15 13 9 14 10 11 12

The partition method returns 7 (the index of the pivot 8 in the array). You should understand the details of how the partition method works, by hand simulating it on other 16 element arrays.

Here we were lucky, as 8 was the middle value in the array. As with merging, partitioning requires a total of O(N) operations to compute all the partitions needed for each level. If we continue choosing "middle" values as pivots, there will be a total of Log2 N levels, just like with Merge Sort, leading to a best case complexity of O(N Log2 N).

Now we recursively partition the left part (indexes 0 to 6) and the right part (indexes 8 to 15). In both we again choose the last value as the pivot: 4 for the left range, 12 for the right range; in both cases the choice is fortunate, as these values are near the middle of each range. After each range is partitioned, it looks as follows (with the pivots in ||).

  2 1 3 | 4 | 6 5 7 | 8 | 11 10 9 | 12 | 14 15 16 13

The result is that we still have arrays of size 3, 3, 3, and 4 to partition. If we keep choosing such good pivots, there will be Log2 N levels, meaning the best case complexity class for Quick Sort is O(N Log2 N): Log2 N levels each requiring O(N) work.
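Here is a minimal Java sketch of partition that matches the pseudo-code and the trace above (a sketch that simply uses a[high] as the pivot; a better pivot choice, discussed below, would just be swapped into a[high] first):

  public static int partition(int[] a, int low, int high) {
    int pivot = a[high];                  // last value is the pivot; it stays at a[high] until the end
    int l = low, r = high;
    while (l < r) {
      while (l < r && a[l] < pivot)       // find a left value >= the pivot
        l++;
      while (l < r && a[r] >= pivot)      // find a right value < the pivot
        r--;
      if (l < r) {                        // swap the out-of-place pair
        int temp = a[l]; a[l] = a[r]; a[r] = temp;
      }
    }
    int temp = a[l]; a[l] = a[high]; a[high] = temp;  // put the pivot between the two groups
    return l;                             // the index of the pivot
  }

Calling it on the 16 value example above (low = 0, high = 15) performs exactly the swaps shown and returns 7.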
Starting over again, here is an example of an array that would continually supply the worst partition choice (the biggest value in the array) if we keep choosing the last value as the pivot.

  1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16

This results in the following array after partitioning

  1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 | 16 |

which has taken 16 operations to partition the array but has not changed it. Now the recursive calls work on an array of size 15 and an array of size 0. If we continue to choose the worst pivot, the next recursive call takes 15 operations to partition the array but does not change it. This continues, requiring 16 + 15 + 14 + ... + 1 operations, which is O(N^2); this is why in the worst case this method is O(N^2). You can see pages 519 and 523 in Goodrich and Tamassia for a full example.

1) Worst case is O(N^2); best and average cases are O(N Log2 N)
2) Not in-place (requires O(Log2 N) stack space for recursive calls on average, O(N) in the worst case); although not in-place, Log2 N extra space isn't much
3) O(N^2) comparisons and movements in the worst case (O(N Log2 N) of each on average)
4) Unstable: partition swaps values over large distances in the array

To summarize, Quick Sort's work is between O(N Log2 N) and O(N^2). Most often it is O(N Log2 N), and its constant is lower than those for either Heap Sort or Merge Sort. The difference between good and bad behavior is picking a good pivot (discussed next).

So, picking a good pivot is important. Sometimes pivots are chosen from the start, middle, or end of the array. By choosing the middle, if the array is already sorted the pivot will be good (choosing either end of an already sorted array results in O(N^2) time). One can also choose a pivot from a random position in the array. Obviously the best pivot is the median value in the array, which splits it in half, but it would take too much work to find the true median over and over again in each call to partition. So, we can approximate the median by choosing the pivot as the median of 3 values in the array (picking the first, middle, and last values to compare, or picking three values at random indexes). Of course, we could better approximate the median by looking at even more values (say a Median of 5), but the time to find such a median is bigger. We need a tradeoff between how long it takes to choose a pivot and how good the chosen pivot is. It has been found in practice that Median of 3 gives the best overall results. With a good pivot, this algorithm also requires only O(Log2 N) extra stack space from recursion.

Finally, to speed up Quick Sort, often the many small arrays at the bottom of the recursion are sorted via a sort that is faster for small arrays, or not sorted at all. Say that we leave arrays of size 4 or less unsorted. After most parts of the array are sorted, we do one call to Insertion Sort. That method runs in O(N) if the data is mostly sorted (which it will be after lots of partitioning: every value will be within 4 of its final index), and doing so is often faster than completely sorting via Quick Sort. Depending on your machine/compiler, you might discover a different optimal minimum size (bigger or smaller) for recursively calling Quick Sort.
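Here is one way to code the Median of 3 choice in Java (a sketch; the method name medianOfThreeIndex and the convention of swapping its result into a[high] before partitioning are my own framing, not a required part of the algorithm):

  // Return the index (low, mid, or high) whose value is the median of those three values.
  public static int medianOfThreeIndex(int[] a, int low, int high) {
    int mid = (low + high)/2;
    int x = a[low], y = a[mid], z = a[high];
    if ((x <= y && y <= z) || (z <= y && y <= x)) return mid;   // a[mid] is the median
    if ((y <= x && x <= z) || (z <= x && x <= y)) return low;   // a[low] is the median
    return high;                                                // otherwise a[high] is the median
  }

partition would then swap a[medianOfThreeIndex(a, low, high)] into a[high] first and proceed exactly as before.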
Final Words on O(N Log2 N) sorting:

The following analysis is based on my implementations of these sorting methods.

Heap Sort is guaranteed to run in O(N Log2 N) and typically runs slowest; it CAN be done in place, but it is unstable.

Merge Sort is guaranteed to run in O(N Log2 N) but typically runs slower than Quick Sort; it CANNOT be done in place (requiring an extra N in space), but it is stable.

Quick Sort is NOT guaranteed to run in O(N Log2 N) -it runs in O(N^2) in bad cases- but if we choose the pivot carefully, it almost always runs in O(N Log2 N) and does so faster than Merge or Heap Sort (with a smaller constant). It CANNOT be done in place, but the extra storage is only of size Log2 N (which is much less than the extra N space needed by Merge Sort). Finally, it too is unstable.

In 2002 a programmer named Tim Peters developed a sorting algorithm (he named it Timsort) that is based on merge sort and insertion sort. It is stable, and at worst runs in O(N Log2 N), but it often runs faster, sometimes as fast as O(N), when the data is not completely random but partially sorted (which is often the case). It does take up some extra space, but not a lot. To get this performance the method is highly tuned and takes lots of code. But since sorting is done so often, Timsort is now the standard sorting algorithm in Python (where it was developed) and is also used in Java (for sorting arrays of objects).

There are hundreds of sorting algorithms. The more you know about how your data is distributed (if it isn't totally random), the better your choice of sorting algorithm can be. Generally, when a library supplies just one general-purpose sorting method, it is some variant of Quick Sort; Java, for example, uses a Quick Sort variant for sorting arrays of primitive values.

Lower Bounds for Comparison Sorting Methods:

Certainly we must look at every value in the array when sorting it (if we left one out, it might be in the wrong spot), so we have a trivial Omega(N) lower bound for sorting when using comparisons. But we can use the idea of a Comparison Tree to compute a much more interesting and useful lower bound for sorting using comparisons.

For every comparison-based algorithm that we develop for sorting, we can translate it into a Comparison Tree, by looking at which values in the array it compares to which other values in the array. Thus, the entire tree specifies how comparisons are made for every possible input (the tree is just a different form of the sorting algorithm). Each internal node of the tree specifies a comparison to make; each leaf shows the resulting order of all the values. Here is a Comparison Tree for an algorithm that sorts the three values x1, x2, x3. I took this tree from David Eppstein's ICS 163 Notes (so, you might see a similar proof again, at a more sophisticated level, when you are more sophisticated).

                           x1:x2
                      <   /     \   >
                         /       \
                    x2:x3         x1:x3
                 <  /   \  >   <  /   \  >
                   /     \       /     \
           x1,x2,x3     x1:x3   x2,x1,x3     x2:x3
                     <  / \  >            <  / \  >
                       /   \                /   \
               x1,x3,x2     x3,x1,x2  x2,x3,x1     x3,x2,x1

At the root we know nothing about the ordering of x1, x2, and x3. At every internal node we perform one comparison (different sorting methods do these comparisons in different orders). After each comparison we know a bit more about the ordering of the values. After we accumulate enough information (do enough comparisons on some path from the root downward), we know the exact ordering of the values. So, for example, if we follow from the root and find that x1 < x2, then find that x2 < x3, we know the order: x1 < x2 < x3. Likewise, if we follow from the root and find that x1 < x2, then find that x2 > x3, then find that x1 < x3, we know the order: x1 < x3 < x2.

For the worst-case input, a Comparison Tree must perform one comparison at each depth along its longest path, and thus in the worst case it performs a number of comparisons equal to its height. So, if we know the height of a Comparison Tree, we know the worst-case number of comparisons its algorithm performs, and therefore its worst-case complexity class.
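To see that a Comparison Tree really is just a different form of a sorting algorithm, here is the tree above transcribed into Java (my own transcription): each if corresponds to an internal node, and each println to a leaf.

  // Print x1, x2, x3 in sorted order by following the Comparison Tree above.
  public static void sortThree(int x1, int x2, int x3) {
    if (x1 < x2) {
      if (x2 < x3)        System.out.println(x1 + " " + x2 + " " + x3);  // x1,x2,x3
      else if (x1 < x3)   System.out.println(x1 + " " + x3 + " " + x2);  // x1,x3,x2
      else                System.out.println(x3 + " " + x1 + " " + x2);  // x3,x1,x2
    } else {
      if (x1 < x3)        System.out.println(x2 + " " + x1 + " " + x3);  // x2,x1,x3
      else if (x2 < x3)   System.out.println(x2 + " " + x3 + " " + x1);  // x2,x3,x1
      else                System.out.println(x3 + " " + x2 + " " + x1);  // x3,x2,x1
    }
  }

No path makes more than 3 comparisons, matching the tree's height, and every one of the 3! = 6 orderings appears as exactly one leaf.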
We can use what we know about tree heights to get an interesting bound, by knowing how many leaves must be in any Comparison Tree. When sorting an N value array, there are N! (N factorial) different arrangements of these values. Each arrangement must occur in at least one leaf of the Comparison Tree; so, the Comparison Tree has at least N! leaves. For example, for 3 values there are 6 different arrangements of the values:

1) x1 < x2 < x3
2) x1 < x3 < x2
3) x2 < x1 < x3
4) x2 < x3 < x1
5) x3 < x1 < x2
6) x3 < x2 < x1

all of which occur in the Comparison Tree above. Note that there are N different choices for the smallest value and (N-1)! arrangements of the remaining values, so there are N*(N-1)! = N! arrangements in all.

Based on a Comparison Tree having at least N! leaves, we can prove that the height of the Comparison Tree (the number of comparisons performed for the worst case input) is Omega(N Log2 N). Here is a chain of inequalities that allows us to prove this fact. Note first that each Comparison Tree is a binary tree.

1) A Comparison Tree is a binary tree that has at least N! leaves.
2) A binary tree with at least N! leaves has at least N! nodes (its internal nodes come in addition to its leaves).
3) The height of a binary tree with N nodes is at least Log2 N, so a binary tree with N! nodes must have a height of at least Log2 N!.
4) N! = N * (N-1) * (N-2) * (N-3) * ... * (N/2) * (N/2 - 1) * (N/2 - 2) * ... * 2 * 1
   N! > N/2 * N/2 * N/2 * ... * N/2     (N/2 factors)
   Here we replaced each of the first N/2 factors (each is at least N/2) by N/2 and dropped the remaining factors (each is at least 1).
   Thus, N! > (N/2)^(N/2). Taking Logs of each side,
   Log2 N! > N/2 * Log2 (N/2) = N/2 * (Log2 N - 1) = N/2 Log2 N - N/2
   so Log2 N! is Omega(N Log2 N).

So, we have

   Height of a Comparison Tree (which has at least N! leaves)
     >= minimum height of a binary tree with N! nodes
     >= Log2 N!
     >  N/2 (Log2 N - 1), which is Omega(N Log2 N)

So, for any Comparison Tree that sorts N values, in the worst case it requires Omega(N Log2 N) comparisons to find the correct ordering (there is some input ordering that requires a number of comparisons equal to the height of the tree). You can see pages 531 and 532 in Goodrich and Tamassia for a similar proof.

Also note that Log2 N! = Log2 1 + Log2 2 + Log2 3 + ... + Log2 N, which can be accurately approximated by integrating Log2 x dx between 1 and N. Stirling's approximation for N! is sqrt(2*pi*N) * N^N * e^(-N). Taking Logs, Log2 N! is approximately Log2 sqrt(2*pi*N) + N Log2 N - N Log2 e; the N Log2 N term dominates the others, so Log2 N! is Omega(N Log2 N) (in fact it is Theta(N Log2 N)).

Since we know sorting with comparisons is Omega(N Log2 N), and we have multiple algorithms that are O(N Log2 N) -Heap Sort and Merge Sort- we have "optimally solved" the sorting problem, at least according to its complexity class. Other algorithms based on comparisons might have smaller constants (which affect the actual speed), but none will be in a smaller complexity class.

Next we will examine two sorting algorithms that seem to violate this lower bound, but they do so by not using comparisons to sort their values! These are strange algorithms that are useful for certain kinds of data; they have interesting upper and lower bounds that we will explore in more detail (and we will see that they don't really violate the lower bound).