Sorting: O(N) Sorting without Comparisons


In this lecture, we will examine two sorting methods that do not use
comparisons between values to sort the data. So, they are not constrained
by the lower bound proved in the previous lecture. The algorithms are called
Bucket Sort and Radix Sort. Both are a bit strange; they work well for integers
(and Radix Sort for Strings) but don't work well for other kinds of data (e.g.,
anything that is not "digital").


Bucket Sort:

Bucket Sort allows us to sort N data values, where each lies in the range 0-M,
in time O(N+M). Note that if we don't know the range of integers, we can shift
everything into such a range: scan the array (O(N)) to find S, the smallest
value, and B, the biggest value; scan it again (O(N)) subtracting S from every
value (the range will be 0 to B-S); sort these values; then scan it a final
time (O(N)) adding S to every value. Finding the range, shifting, and
unshifting are each O(N) (adding three O(N) passes over the data).
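The shifting idea can be sketched in Python as follows. This is only an illustration: the names sort_unknown_range and counting_sort_0_to_m are made up for this sketch, and the helper stands in for the Bucket Sort whose steps are described later in this section.

```python
def counting_sort_0_to_m(values, m):
    """Sort ints known to lie in 0..m using a histogram of m+1 counters."""
    counts = [0] * (m + 1)
    for v in values:
        counts[v] += 1
    out = []
    for index, c in enumerate(counts):
        out.extend([index] * c)
    return out


def sort_unknown_range(values):
    """Shift values into 0..B-S, sort them, then shift them back."""
    if not values:
        return []
    s, b = min(values), max(values)      # O(N) to find smallest S and biggest B
    shifted = [v - s for v in values]    # O(N): range is now 0..B-S
    ordered = counting_sort_0_to_m(shifted, b - s)
    return [v + s for v in ordered]      # O(N): undo the shift


print(sort_unknown_range([1003, 998, 1001, 998]))   # [998, 998, 1001, 1003]
```

Note that the histogram here needs only B-S+1 counters (6 in this call), not 1,004.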

We will analyze a few problems with various sizes of N and M below.

For a first example, suppose that we needed to sort 1,000 exam scores, all in
the range 0 to 100 (so N = 1,000 and M = 100).

First, here is the pseudo-code for this algorithm. It uses an array much like
a histogram (keeping track of the number of times a specific data value is
seen in the array to be sorted).

  1) Declare an int histogram array with indexes/buckets 0 to 100 and 
     initialize each to 0 (there have been 0 of each index value seen so far).

  2) Look at every data value in the array to sort, and increment by 1 the 
     index (specified by that data value) in the histogram array. So, if 78 was
     the next data value in the array to sort, increment the histogram array at
     index 78.

  3) Scan the entire histogram array from 0 to 100; whenever we get to an index
     that stores c (with c != 0), put c values of that index into the next
     positions in the array to sort (starting at the beginning, replacing the
     values already in the array).
     
Step 1 is O(M), based only on the range of values (number of 0s we need to
store in the array indexes) and not on the number of values to sort.

Step 2 is O(N), based only on the number of values to sort, not the range of
values. Of course, we can access index i in an array in O(1) time. 

Step 3 is O(N+M), based both on the number of values to sort (since we are
putting each into the sorted array), and the range of values (since we must
scan through each bucket).
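The three steps can be sketched in Python as shown here. The function name bucket_sort and the sample scores are chosen only for this illustration; the sketch assumes every value is an int in the range 0 to m.

```python
def bucket_sort(data, m):
    """Sort a list of ints in the range 0..m in place, in O(N + M) time."""
    # Step 1: one counter (bucket) per possible value, all starting at 0.  O(M)
    histogram = [0] * (m + 1)

    # Step 2: count how many times each value occurs.  O(N)
    for value in data:
        histogram[value] += 1

    # Step 3: scan the buckets in order, overwriting data from the front.  O(N + M)
    next_position = 0
    for value, count in enumerate(histogram):
        for _ in range(count):
            data[next_position] = value
            next_position += 1


scores = [78, 100, 0, 78, 55, 91, 55, 78]
bucket_sort(scores, 100)
print(scores)   # [0, 55, 55, 78, 78, 78, 91, 100]
```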

Looking at raw numbers, Step 1 requires 100 operations, Step 2 requires 1,000
operations, and Step 3 requires 1,100 operations (so the total is 2,200
operations). If we just sorted the array using an O(N Log2 N) algorithm, it
would require about 10,000 operations. So here bucket sort looks pretty good,
by a factor of almost 5.

For a second example, suppose that we needed to sort 1,000,000 values from 0
up to 1 billion (so N = 1,000,000 and M = 1,000,000,000). We would

  1) Declare an int histogram array with indexes 0 to 1 billion and initialize
     it to 0 (0 of each index value seen so far).

  2) Look at every value in the array to sort, and increment by 1 the index
     (specified by that data value) in the histogram array. So, if 157,000,000
     was the next value in the array to sort, increment the histogram array at
     index 157,000,000.

  3) Scan the entire histogram array from 0 to 1 billion; whenever you get to
     an index that stores c (!= 0) put c values of that index into the next
     position in the array to sort (starting at the beginning, replacing the
     values already in the array).
     
Looking at raw numbers, Step 1 requires 1,000,000,000 operations, Step 2
requires 1,000,000 operations, and Step 3 requires 1,001,000,000 operations
(for a total of about 2,002,000,000 operations). If we just sorted the array
using an O(N Log2 N) algorithm, it would require about 20,000,000 operations.
So here bucket sort looks pretty bad, by a factor of 100. Note we are doing a
tremendous number of operations to scan buckets that are likely empty (a
maximum of 1 in 1,000 buckets is non-0, achieved when every value in the array
to sort is different).
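The rough operation counts used in both examples can be reproduced with a few lines of Python. The two function names are invented for this sketch, and the counts are the same back-of-the-envelope estimates as in the text, not measured running times.

```python
import math


def bucket_sort_ops(n, m):
    """Rough count from the text: M to zero, N to count, N + M to write back."""
    return m + n + (n + m)


def comparison_sort_ops(n):
    """Rough N * Log2 N estimate for a comparison-based sort."""
    return round(n * math.log2(n))


for n, m in [(1_000, 100), (1_000_000, 1_000_000_000)]:
    print(n, m, bucket_sort_ops(n, m), comparison_sort_ops(n))
```

The first line printed shows bucket sort ahead (2,200 operations against roughly 10,000); the second shows it far behind (about 2 billion against roughly 20 million).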

If we had to sort many numbers in a small range (or even billions of numbers in
the full int range: any time N>>M), then Bucket Sort would be more efficient
than comparison-based sorting. But if the range of possible values is very big
compared to the number of values to sort (M>>N), comparison-based sorting would
probably be faster. This is all assuming enough memory to store all N values.

Also, this method doesn't work well for Strings of even moderate size. If we
converted each String of lower-case letters to an int to solve this problem,
note that there are 26^N different Strings of length N, and 26^7 is already
over 8 billion (so M is very big even for 7-character Strings).
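To make the size of M concrete, here is one possible encoding, sketched in Python. The function string_to_int is hypothetical and assumes all Strings have the same length and contain only the letters a-z (Strings of different lengths would collide under this encoding).

```python
def string_to_int(s):
    """Treat a lower-case string as a base-26 number (a=0, ..., z=25)."""
    value = 0
    for ch in s:
        value = value * 26 + (ord(ch) - ord('a'))
    return value


print(26 ** 7)                   # 8031810176 possible 7-letter strings
print(string_to_int("zzzzzzz"))  # 8031810175, the largest bucket index needed
```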


Radix Sort:

Finally, we will examine Radix Sort. It repeatedly does something like Bucket
Sort, but always ensuring that N>>M, so each bucket sort is effective. It is
applicable when we sort values that can be broken into pieces (such as the
digits in a number) that differ in "significance": the pieces are processed
from least to most significant, using a stable algorithm on each piece.

So, Radix Sort works by repeatedly doing something like a Bucket Sort in the
cases where Bucket Sort is efficient: sorting a large number of values each of
which has a small range (say, sorting many numbers by one of their digits,
where the range for each digit is 0-9).

Generally, suppose that we need to sort N non-negative int values. Numbers in
the range 0 to 1.5 billion have at most 10 digits (Log10 1.5 billion ~ 9.2).
Here is the pseudo-code for radix sorting.

Create an array of 10 buckets, each storing an empty queue
for every "place" (10 digit numbers: 1s, 10s, 100s, 1,000s, ... 1,000,000,000s)
  for every value in the array to sort
    add it to the queue in the correct bucket, according to the digit in the
      current "place"
  for every bucket (in order from 0 to 9)
    move each of the values, from its queue, back into the array to sort
      (leaving the queue empty)
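The pseudo-code above might be written in Python roughly as follows. This is one possible sketch, not a reference solution: it assumes non-negative ints, uses collections.deque as the queue, and the parameter names digits and radix are choices made for this illustration.

```python
from collections import deque


def radix_sort(data, digits=10, radix=10):
    """LSD radix sort of non-negative ints, in place, one queue per bucket."""
    buckets = [deque() for _ in range(radix)]
    place = 1                                  # 1s, then 10s, then 100s, ...
    for _ in range(digits):
        # Distribute: append each value to the queue for its digit in this place.
        for value in data:
            buckets[(value // place) % radix].append(value)
        # Collect: empty the queues in bucket order, back into the array.
        i = 0
        for queue in buckets:
            while queue:
                data[i] = queue.popleft()      # FIFO order keeps the pass stable
                i += 1
        place *= radix


values = [170, 45, 75, 90, 802, 24, 2, 66]
radix_sort(values, digits=3)
print(values)   # [2, 24, 45, 66, 75, 90, 170, 802]
```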

For the first "place" (1s), the numbers will be sorted by their last digit
only. For the second "place" (10s), the numbers will be sorted by their last
two digits (because of the "stability" property of the queue).... For the last
"place" (1,000,000,000) the numbers will be sorted by all their digits. 

Note that each iteration takes a result (sorted by all the lower places) and
extends it to the current "place". Because we are using a stable mechanism
(copying from the queues back into the array), numbers with an equal digit in
the current "place" stay in the order established by all the lower places
already sorted.

Let's do a quick example, with much more restricted numbers. Let's use Radix
Sort to sort the following 10 three-digit numbers (so we require three
passes, with the places 1s, 10s, and 100s).

  664, 947, 654, 305, 565, 424, 517, 252, 223, 326

In the pictures below, the queues go downward from each bucket (takes less
space in the pictures). First, we start with the 1s "place" and get

    0   1   2   3   4   5   6   7   8   9
  +---+---+---+---+---+---+---+---+---+---+
  |   |   |   |   |   |   |   |   |   |   |
  +---+---+---+---+---+---+---+---+---+---+
           252 223 664 305 326 947     
                   654 565     517            
                   424                     

Notice that each number is in the bucket based on its 1s "place".

Then we put these values back into the array, from buckets 0-9 (in that order),
removing each value from its queue. Notice that all numbers are sorted by their
last digits, but not any others. If their last digits are the same, their order
is the same order as they appeared in the input array (see 664, 654, and 424).

  252, 223, 664, 654, 424, 305, 565, 326, 947, 517

Then, we continue doing the same process on this new array with the 10s
"place" and get

    0   1   2   3   4   5   6   7   8   9
  +---+---+---+---+---+---+---+---+---+---+
  |   |   |   |   |   |   |   |   |   |   |
  +---+---+---+---+---+---+---+---+---+---+
   305 517 223     947 252 664                
           424         654 565
           326                             

Notice that each number is in the bucket based on its 10s "place".

Then we put these values back into the array, from buckets 0-9 (in that order),
removing each value from its queue. Notice that all numbers are sorted by their
last two digits. If their second to last digits are the same, their order is
the same order as they appeared in the input array, which was sorted on the
last digit (see 223, 424, and 326) so now it is sorted on the last two digits.

  305, 517, 223, 424, 326, 947, 252, 654, 664, 565
                                           
Then, we continue doing the same process on this new array with the 100s
"place" (the most significant "place" and the final iteration) and get

    0   1   2   3   4   5   6   7   8   9
  +---+---+---+---+---+---+---+---+---+---+
  |   |   |   |   |   |   |   |   |   |   |
  +---+---+---+---+---+---+---+---+---+---+
           223 305 424 517 654         947
           252 326     565 664             
                                           
Notice that each number is in the bucket based on its 100s "place" (the most
significant one).

Then we put these values back into the array, from buckets 0-9 (in that order),
removing each value from its queue. Notice that all numbers are sorted by all
their digits.  If all their digits are the same, their order is the same order
as they appeared in the input array.

  223, 252, 305, 326, 424, 517, 565, 654, 664, 947
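The three passes of this example can be checked with a short Python sketch. The function radix_passes is made up for this purpose; it uses plain lists as the queues (appending at the end, reading from the front) and records the array after every pass.

```python
def radix_passes(data, digits):
    """Return the array contents after each pass of a radix-10 sort."""
    current = list(data)
    snapshots = []
    place = 1
    for _ in range(digits):
        buckets = [[] for _ in range(10)]
        for value in current:
            buckets[(value // place) % 10].append(value)
        # Concatenate buckets 0-9 in order; order within a bucket is preserved.
        current = [value for bucket in buckets for value in bucket]
        snapshots.append(current)
        place *= 10
    return snapshots


start = [664, 947, 654, 305, 565, 424, 517, 252, 223, 326]
for snapshot in radix_passes(start, 3):
    print(snapshot)
# [252, 223, 664, 654, 424, 305, 565, 326, 947, 517]
# [305, 517, 223, 424, 326, 947, 252, 654, 664, 565]
# [223, 252, 305, 326, 424, 517, 565, 654, 664, 947]
```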
         
So, this sorting mechanism requires O(N) extra space (the sum of the space
occupied by all the queues is N, and the original array is N) and it is stable.
Actually, if we are using a queue implementation that doesn't shrink its arrays
(ArrayQueue doesn't, but LinkedQueue does), then the space occupied by all the
queues could be much worse than 2N: it could be MN (or 10N, since M is 10
here).

Radix Sort's running time is based on the following: inside the outer loop, we
do N operations to move the N values from the array into their correct queues
(each queue operation is O(1)), and then do N more operations to transfer from
the queues back to the array. But we must multiply this operation count by the
number of outer loop iterations (which we will analyze below).

If there are N distinct numbers to sort, say the integers 1-N, then the biggest
number is N, and it has about Log10 N digits, so the outer loop executes about
Log10 N times and the total complexity is O(N Log10 N). Does the base of the
logarithm make a difference for complexity classes? Mathematically no, because
Log10 N = Log2 N/Log2 10, and Log2 10 is just a constant that is absorbed by
the big-O notation we use for complexity classes. Of course, the actual running
time depends on this constant.

This argument says that we really can just write Log (without any base)
in all our complexity classes, because different Logs, regardless of their
bases, are only constant multiples of each other.
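A quick numeric check of this claim, as a small Python sketch (the helper name log_ratio is made up here): the ratio of the two logarithms is the same constant for every N.

```python
import math


def log_ratio(n):
    """Ratio Log2 N / Log10 N; it should not depend on n."""
    return math.log2(n) / math.log10(n)


for n in (1_000, 1_000_000, 1_000_000_000):
    print(n, round(log_ratio(n), 4))   # the ratio is Log2 10, about 3.3219
```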

Thus, we characterize Radix Sort as 
  1) Worst case is O(N Log10 N)
  2) Requires O(N) extra storage for the queues
  3) No comparisons; O(N Log10 N) data movements in all cases
  4) Stable

Instead of choosing a radix of 10, we can choose a radix of 100. With this
choice, we have 100 buckets and we first sort by the last two digits (1s and
10s), then the next two (100s and 1,000s), etc. Here we trade off space (10
times as much space in the buckets, which is still much less than the space
taken up by the numbers to sort) against loop iterations (1/2 as many).
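The effect of the radix on the number of outer-loop iterations can be sketched in Python. The function passes_needed is a name invented for this illustration; it simply counts the digits of the biggest value when written in the given radix.

```python
def passes_needed(max_value, radix):
    """Number of base-`radix` digits in max_value (outer-loop iterations)."""
    passes = 1
    while max_value >= radix:
        max_value //= radix
        passes += 1
    return passes


for radix in (10, 100, 1000):
    print(radix, passes_needed(1_500_000_000, radix))
# 10 10
# 100 5
# 1000 4
```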

This method works for "digital" keys: keys that can be broken into "digits",
like integers and Strings (just as we did for storing/retrieving information in
"digital trees"). For String Radix Sort we can use a number of buckets equal to
the number of possible ASCII characters (128 works for English characters). Note
that the number of iterations in the outer loop is equal to the length of the
longest String, so this method works best if all the Strings are about the
same size.
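A String version might look like the following Python sketch. It is only one way to do it: the function name is made up, the input is assumed to be ASCII, and one extra bucket (index 0) is added for "no character at this position", so that a shorter String sorts before any longer String that starts with it.

```python
def radix_sort_strings(strings):
    """LSD radix sort of ASCII strings; returns a new sorted list."""
    longest = max((len(s) for s in strings), default=0)
    current = list(strings)
    for position in range(longest - 1, -1, -1):   # last character first
        # Bucket 0 means "no character here"; buckets 1..128 are ASCII codes.
        buckets = [[] for _ in range(129)]
        for s in current:
            index = ord(s[position]) + 1 if position < len(s) else 0
            buckets[index].append(s)
        current = [s for bucket in buckets for s in bucket]
    return current


words = ["pear", "fig", "apple", "figs", "kiwi", "date"]
print(radix_sort_strings(words))
# ['apple', 'date', 'fig', 'figs', 'kiwi', 'pear']
```

Here the outer loop runs 5 times (the length of "apple"), even though most of the Strings are shorter.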

In the next quiz you will implement Radix Sort (easy to do given any simple
queue implementation). When I put this unsophisticated code into my sorting
tester, I found that Radix Sort (on a random array of 1,000,000) took a bit
longer than Mergesort and a bit less than Heapsort. Quicksort still beats them
all (although it is unstable, unlike Mergesort and Radix Sort). When I made the
radix larger (100, 1000, ...), I got close to the quicksort time, but radix
sort still never quite ran as fast for the biggest-sized arrays I could test.
I didn't spend time trying to optimize the Queue implementation.


Some real numbers for sorting:

I ran this on my home computer. I generated 3 arrays (storing Integer wrapper
class values) of length N and called each sorting method 5 times. The results
here are the average times, in seconds.


     N    | Selection| Insertion | Heap | Merge | Quick | Radix10 | Radix100 |
----------+----------+-----------+------+-------+-------+---------+----------+
    10,000|   .454   |   .287    |  *   |  *    |   *   |    *    |    *     |
----------+----------+-----------+------+-------+-------+---------+----------+
   100,000| 47.0     | 27.4      | .059 | .050  | .031  |  .093   | .059     |
----------+----------+-----------+------+-------+-------+---------+----------+
 1,000,000|   +      |   +       |1.359 | .890  | .478  | 1.411   | .919     |
----------+----------+-----------+------+-------+-------+---------+----------+
          |   U,I    |    S,I    | U,I  |  S,N  |  U,I  |   S,N   |   S,N    |

*   = Cannot be timed: <.001 seconds
+   = Not timed      : > 1 minute
U/S = Unstable/Stable
I/N = In place (requires < O(N) more space: O(1) or O(Log N))/Not in place

For more details, look at the Wikipedia article on Sorting Algorithms:

http://en.wikipedia.org/wiki/Sorting_algorithm