Sorting: O(N^2) Sorts

Sorting is one of the most studied problems in Computer Science. Hundreds of different sorting algorithms have been developed, each with relative advantages and disadvantages based on whether the data is truly random or partially sorted, and on other features, some of which we will discuss below. We will start our discussion of sorting by covering general characteristics (applied to each algorithm we later examine), then simple O(N^2) algorithms, more complicated O(N Log2 N) algorithms, non-trivial lower bounds for "comparison"-based sorting, and finally sorting methods that do not use comparisons: their complexity classes, and how to interpret them.

First, we will often examine the following sorting characteristics for the algorithms discussed below.

1) The complexity class of the algorithm (normally worst-case, but sometimes average-case and best-case too).

2) The amount of extra storage needed to run it. If just O(1) extra storage is needed (not proportional to N, the size of the array being sorted), we call the sorting method "In Place".

3) The number of comparisons and data movements needed to sort. For example, if we are sorting a huge amount of information on an external memory that is non-random access (say a hard disk), the cost of a comparison might be a small fraction of the cost of moving data, so we'd prefer an algorithm that does more of the former and less of the latter. This won't change its complexity class, but it can have a large impact on its actual performance.

4) Is the algorithm stable: do equal values in the array keep the same relative order in the sorted array that they had in the original array. Stability is sometimes useful, but there is often a price to pay for it (increased execution time).

Illustrating and Using Stability:

Stability is useful, for example, in the following situation. Imagine we have an array of objects that store a student's name and grade.
We want to sort the array by grade (first all A students, then all B students, etc.), but with all students who have the same grade listed in alphabetical order. With a stable sort we can do this easily.

First, assume the original array contains the following pairs of data in each object (Name-Grade).

  Bob-A  Mary-C  Pat-B  Fred-B  Gail-A  Irving-C  Betty-B  Rich-F

1) Sort on the minor/secondary key (name) first; we don't care whether or not this sort is stable, so assume the result is

  Betty-B  Bob-A  Fred-B  Gail-A  Irving-C  Mary-C  Pat-B  Rich-F

2) Sort on the primary/major key (grade) using a stable sort. So, for example, since Betty (grade B) is to the left of Fred (grade B), who is to the left of Pat (grade B), for the equal keys of B (when sorting by grade with a stable sort) this order will be maintained in the newly sorted array: Betty to the left of Fred, Fred to the left of Pat.

  Bob-A  Gail-A  Betty-B  Fred-B  Pat-B  Irving-C  Mary-C  Rich-F

Thus, the information is finally sorted by grade, with all those students with the same grade (sub)sorted by name.

Another way to accomplish this same ordering is by sorting once, but with a more complicated Comparator (instead of sorting twice, with two simple Comparators). Use a Comparator such that if the grades are different, it indicates which one is smaller; but if they are the same, it indicates which one has the smaller name. So when comparing Betty-B and Bob-A, the grades are different, so Bob-A comes first; when comparing Betty-B and Fred-B, the grades are the same, but Betty's name is smaller than (comes before, in dictionary order) Fred's name.

Here is the comparator, assuming String fields .name and .grade in a class called Student:

public class byGradeName implements Comparator<Student> {
  public int compare(Student a, Student b) {
    int gradeCompare = a.grade.compareTo(b.grade);  //String compare .grades
    if (gradeCompare != 0)
      return gradeCompare;
    else
      return a.name.compareTo(b.name);              //String compare .names
    //or: return (gradeCompare != 0 ?
    //           gradeCompare : a.name.compareTo(b.name));
  }
}

In the spirit of empirical investigation, I have written a small driver that serves as a testbed for sorting applications. It is available off the Programs link for the course (Sorting). It allows us to time various sorting algorithms on various-sized data with various orderings (including random).

Simple to Code O(N^2) Sorts

In Selection Sort, the left part of the array is sorted and the right is unknown. Each iteration around the outer loop ensures the sorted part expands by one index (on the right) and the unsorted part shrinks by one index (on the left). The algorithm scans forward from the 1st unsorted index to the end of the array to find the smallest remaining value in the array, then it swaps that value with the one in the first unsorted index.

1) Worst is O(N^2), best is O(N^2), and average is O(N^2)
2) In-place: needs a few extra local variables in the method
3) O(N^2) comparisons; O(N) swaps
4) Unstable

public static void selectionSort(int[] a) {
  for (int indexToUpdate=0; indexToUpdate<a.length; indexToUpdate++) {
    int indexOfMin = indexToUpdate;
    for (int i=indexToUpdate+1; i<a.length; i++)
      if (a[i] < a[indexOfMin])
        indexOfMin = i;
    //Swap a[indexToUpdate] and a[indexOfMin]
    int temp = a[indexToUpdate];
    a[indexToUpdate] = a[indexOfMin];
    a[indexOfMin] = temp;
  }
}

When indexToUpdate is 3, we have

   0   1   2   3   4   5   6   7   8   9
 +---+---+---+---+---+---+---+---+---+---+
 |   |   |   |   |   |   |   |   |   |   |
 +---+---+---+---+---+---+---+---+---+---+
              ^
  <- Sorted   |   Unsorted ->

meaning the values at indexes 0-2 are sorted (with the 3 smallest array values in order) and the values at indexes 3-9 are unsorted; this loop scans all of the unsorted values to find the smallest one, and immediately after the end of this loop, the code swaps it with the value in the first unsorted index (3). So, the value at array index 3 will store the next value in sorted order, and the dividing line will move one to the right, to between indexes 3 and 4.

Note the body of the inner for-loop is executed N-1 times (N = a.length) when indexToUpdate is 0; N-2 times when indexToUpdate is 1; N-3 times when indexToUpdate is 2; ...; 0 times when indexToUpdate is N-1. So, the total number of times it executes is the sum 0+1+2+...+(N-1) = N(N-1)/2. Note that the body of the inner loop does one comparison; each time the inner loop is finished, the body of the outer loop finally moves data in the array.
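To check this counting argument empirically, here is a small sketch (the class name and counter variables are my additions, not part of the course driver) that instruments selectionSort to tally comparisons and swaps: on any 1,000-value array it performs exactly 1000*999/2 = 499,500 comparisons and 1,000 swaps, regardless of the input order.

```java
public class SelectionSortCounts {
    static long comparisons, swaps;   // instrumentation added for counting

    public static void selectionSort(int[] a) {
        comparisons = 0; swaps = 0;
        for (int indexToUpdate = 0; indexToUpdate < a.length; indexToUpdate++) {
            int indexOfMin = indexToUpdate;
            for (int i = indexToUpdate + 1; i < a.length; i++) {
                comparisons++;                    // one comparison per inner-loop body
                if (a[i] < a[indexOfMin])
                    indexOfMin = i;
            }
            int temp = a[indexToUpdate];          // always swap (possibly with itself)
            a[indexToUpdate] = a[indexOfMin];
            a[indexOfMin] = temp;
            swaps++;
        }
    }

    public static void main(String[] args) {
        int n = 1000;
        int[] a = new int[n];
        java.util.Random rng = new java.util.Random(42);
        for (int i = 0; i < n; i++) a[i] = rng.nextInt();
        selectionSort(a);
        System.out.println("comparisons = " + comparisons);  // N(N-1)/2 = 499500
        System.out.println("swaps       = " + swaps);        // N = 1000
    }
}
```

Rerunning with a sorted or reverse-sorted array produces exactly the same counts, which is why all three cases (worst, best, average) are O(N^2).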
Some students might want to rewrite the swapping by embedding it in an if statement, to "avoid doing extra work":

  if (indexToUpdate != indexOfMin) {
    int temp = a[indexToUpdate];
    a[indexToUpdate] = a[indexOfMin];
    a[indexOfMin] = temp;
  }

adding extra code to avoid swapping a value with itself (such a swap executes correctly, but ultimately makes no changes in the array). The problem is that to save doing a swap that is SOMETIMES unneeded, we must ALWAYS do a comparison in the if. Suppose that the comparison takes 1 computer operation and the swap takes 3; also suppose that when sorting 1,000 values, 95% of the time indexToUpdate is not equal to indexOfMin. Then, the original code takes 3,000 instructions (always swapping for 1,000 values). The conditional code takes 1,000 instructions to test whether to swap and swaps 950 times (so it takes 1,000+2,850 = 3,850 computer instructions, compared to the 3,000 done the "always swap" way). So, the extra code isn't really an "improvement".

How about stability? If you sort the following tiny array (sorted by name) by grade

  Betty-B  Fred-B  Gail-A

the result will be

  Gail-A  Fred-B  Betty-B

which has inverted the order of Betty-B and Fred-B, so the sort is unstable. This algorithm moves data too radically in the array. Generally, swapping the value at indexToUpdate (on the left) with the value at indexOfMin (on the right) might make the value originally at indexToUpdate move to the right of other values that are equal to it (between indexToUpdate and indexOfMin).

This algorithm works equally well for arrays and linked lists (with slight changes in code). It runs in about the same amount of time no matter what the order of the values in the original array. Also note that it is an offline algorithm: it requires all the data to be present in the array before it can start.
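The instability argument above can be run directly. The following sketch (the class and helper names are mine) applies selection sort to the tiny Betty/Fred/Gail array, comparing only grades; it assumes each entry is a string with the grade as the single character after the '-'.

```java
public class SelectionUnstableDemo {
    // grade is the character after the '-' in strings like "Betty-B"
    static char grade(String s) { return s.charAt(s.indexOf('-') + 1); }

    public static void selectionSortByGrade(String[] a) {
        for (int indexToUpdate = 0; indexToUpdate < a.length; indexToUpdate++) {
            int indexOfMin = indexToUpdate;
            for (int i = indexToUpdate + 1; i < a.length; i++)
                if (grade(a[i]) < grade(a[indexOfMin]))
                    indexOfMin = i;
            String temp = a[indexToUpdate];   // swap, as in the int version
            a[indexToUpdate] = a[indexOfMin];
            a[indexOfMin] = temp;
        }
    }

    public static void main(String[] args) {
        String[] students = {"Betty-B", "Fred-B", "Gail-A"};
        selectionSortByGrade(students);
        // Swapping Gail-A into index 0 carried Betty-B past the equal key Fred-B
        System.out.println(java.util.Arrays.toString(students));
        // prints [Gail-A, Fred-B, Betty-B] -- Betty and Fred inverted: unstable
    }
}
```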
The algorithm moves/swaps the value in the 1st unsorted index backwards, until it is >= the value before it. So, only data in the sorted part changes.

1) Worst is O(N^2), best is O(N), average is O(N^2)
2) In-place: needs a few extra local variables in the method
3) O(N^2) comparisons; O(N^2) swaps
4) Stable

public static void insertionSort(int[] a) {
  for (int indexToMove=0; indexToMove<a.length; indexToMove++)
    for (int i=indexToMove-1; i>=0; i--)
      if ( a[i] <= a[i+1] )
        break;
      else { //Swap a[i] and a[i+1]
        int temp = a[i+1];
        a[i+1] = a[i];
        a[i] = temp;
      }
}

When indexToMove is 3, we have

   0   1   2   3   4   5   6   7   8   9
 +---+---+---+---+---+---+---+---+---+---+
 |   |   |   |   |   |   |   |   |   |   |
 +---+---+---+---+---+---+---+---+---+---+
              ^
  <- Sorted   |   Unsorted ->

meaning the values at indexes 0-2 are sorted (although they might not yet contain the 3 smallest array values!) and indexes 3-9 are unsorted; this loop swaps the value at index 3 backwards until it is >= the value to its left, so at the end of this loop, indexes 0-3 will be in order (although they might not yet contain the 4 smallest array values!) and the dividing line will be between 3 and 4.

Note the body of the inner for-loop is executed at most 0 times when indexToMove is 0; at most 1 time when indexToMove is 1; ...; at most N-1 times when indexToMove is N-1 (where N = a.length). So, the maximum number of times it executes is the sum 0+1+2+...+(N-1) = N(N-1)/2. Note that in the best case (where the entire array is already completely sorted), each value at indexToMove will already be bigger than the one before it, so the inner loop will immediately break, requiring just N-1 comparisons in total.

If some values in the array are equal, the one on the right will move left, but stop to the right of any equal values (see the break, controlled by the "<=" operator). This means that the Insertion sorting method is stable.
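A sketch to verify these best-case and worst-case counts (the class name and counter variables are my additions): on already-sorted input the inner loop breaks immediately every time, for N-1 comparisons and 0 swaps; on reverse-sorted input it does the full N(N-1)/2 comparisons and N(N-1)/2 swaps.

```java
public class InsertionSortCounts {
    static long comparisons, swaps;   // instrumentation added for counting

    public static void insertionSort(int[] a) {
        comparisons = 0; swaps = 0;
        for (int indexToMove = 0; indexToMove < a.length; indexToMove++)
            for (int i = indexToMove - 1; i >= 0; i--) {
                comparisons++;
                if (a[i] <= a[i + 1])
                    break;                 // value has reached its position
                int temp = a[i + 1];       // swap a[i] and a[i+1]
                a[i + 1] = a[i];
                a[i] = temp;
                swaps++;
            }
    }

    public static void main(String[] args) {
        int n = 1000;
        int[] sorted = new int[n], reversed = new int[n];
        for (int i = 0; i < n; i++) { sorted[i] = i; reversed[i] = n - i; }

        insertionSort(sorted);
        System.out.println("sorted input:   " + comparisons + " comparisons, "
                           + swaps + " swaps");   // 999 comparisons, 0 swaps

        insertionSort(reversed);
        System.out.println("reversed input: " + comparisons + " comparisons, "
                           + swaps + " swaps");   // 499500 of each
    }
}
```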
So, in the worst case this method does the same number of comparisons as Selection sort, but many more swaps, and therefore it has a higher constant. But if we know the array is sorted (or very close to being sorted: no values far away from where they belong), this method is O(N), whereas Selection sort is always O(N^2): worst, best, and average case. This means that if we know something about the data (e.g., it is almost sorted) we might prefer this algorithm over the previous one (in fact, if the data is almost sorted, this algorithm beats the O(N Log N) algorithms).

This algorithm also works for doubly linked lists, but not simply for singly linked lists (note the inner for-loop runs backwards); but if we remove each value from the first list and insert it into a second list (so that the second list is always sorted), this algorithm works for linked lists (although a sorted list then takes O(N^2) comparisons, while the reverse of a sorted list takes only O(N)). Also note that it is an online algorithm: it doesn't require all the data to be present in the array before it can start: as each new value is "added to the array", the algorithm moves it backward to its correct position.

------------------------------------------------------------------------------

I'm not a big fan of animations, but you might want to check out the sorting animations at http://www.sorting-algorithms.com/. I think these animations are better on the O(N^2) algorithms, which are pretty easy to visualize without a computer anyway.
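Finally, tying back to the stability discussion that opened this section: java.util.Arrays.sort is a stable O(N Log N) sort on object arrays, so the sort-twice technique can be sketched with it directly. The Student class and field names below are my stand-ins for the ones assumed earlier.

```java
import java.util.Arrays;
import java.util.Comparator;

public class TwoPassStableSort {
    static class Student {
        final String name, grade;
        Student(String name, String grade) { this.name = name; this.grade = grade; }
        public String toString() { return name + "-" + grade; }
    }

    static Student[] sortedRoster() {
        Student[] roster = {
            new Student("Bob", "A"),   new Student("Mary", "C"),
            new Student("Pat", "B"),   new Student("Fred", "B"),
            new Student("Gail", "A"),  new Student("Irving", "C"),
            new Student("Betty", "B"), new Student("Rich", "F")
        };
        // Pass 1: minor key (name); stability doesn't matter for this pass
        Arrays.sort(roster, Comparator.comparing((Student s) -> s.name));
        // Pass 2: major key (grade); Arrays.sort is stable on object arrays,
        // so students with equal grades keep their alphabetical order
        Arrays.sort(roster, Comparator.comparing((Student s) -> s.grade));
        return roster;
    }

    public static void main(String[] args) {
        System.out.println(Arrays.toString(sortedRoster()));
        // prints [Bob-A, Gail-A, Betty-B, Fred-B, Pat-B, Irving-C, Mary-C, Rich-F]
    }
}
```

Since Insertion sort is also stable, substituting it for the second Arrays.sort call would give the same grade-then-name ordering (just more slowly).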