Analysis of Algorithms


Analysis of Algorithms: Second Lecture

In this lecture we will first review a bit of the material from the previous
lecture, and then discuss lower-bounds (Omega) and tight bounds (Theta) and how
they apply to analyzing algorithms.

We will also look at some small but real problems through the lens of analysis
of algorithms, and compare fundamentally differerent algorithms for solving the
same problem. Finally, we will compare some time and space metrics for array
vs. linked list implementations of simple ordered collections (e.g., queue).


Big-O:

Recall the formal definition of big-O notation, which bounds a function from
above. A function f(n) is O(g(n)) -often written "in the complexity class of
O(g(n))" if there are  values c and n0 such that  f(n) <= c g(n) for all n>n0.

Typically the "f" function we are interested in measures the effort (often the
amount of time) it takes for some algorithm a (coded in some language as
a method m) to run, which we write either Ta(N) or Tm(N). Note that Ta(N) is
O(N), then am(N) is also O(N^2), O(N^3), etc. because these functions are even
bigger bounds: if f(n) <= c1 n then f(n) <= c2 n^2, etc. Typically, though, we
are looking for the smallest complexity class that bounds some algorithm or
method.


Big-Omega:

Big-Omega notation bounds a function from below instead of from above.
The definition starts similarly to big-O notation: A function f(n) is
Omega(g(n)) if there are values c and n0 such that f(n) >= c g(n) for all
n>n0.

Notice the <= in big-O notation has been replaced with a >= in big-Omega
notation. Although big-O notation is mostly used to analyze "algorithms",
big-Omega notation is mostly used to analyze "problems". With big-O notation
we analyze one SPECIFIC algorithm/method to determine an upper bound on its
performance. In big-Omega notation we analyze all possible algorithms/methods
to determine a lower bound on performance. This second task is much harder.

For example, it is trivial to prove that any algorithm that solves the "find
the maximum of an unordred array problem" is Omega(N) because it has to look
at least at every value in the array ; if it missed looking at some value in
the array, that value might be the biggest, and the algorithm would return the
wrong value.

Interesting lower bounds on problems are much harder to prove than upper bounds
on algorithms. The lower bound on a problem is much more general: it says,
"for ANY ALGORITHM that solves this problem, it will take AT LEAST g(n)
operations". Whereas, for upper bounds we are analyzing something much
more concrete: one actual agorithm: we say, "for this particular algorithm,
it will take AT MOST g(n) operations."

Often the only lower bound that we can get on a problem is trivial -like that
we must examine every value in an array. Later in the quarter we will examine
an interesting/beautiful lower bound for sorting via comparisons: such a
problem is Omega(N Log N). We also will examine sorting algorithms that are
O(N Log N). This means that within comparison based sorting, we have optimally
solved the problem according to complexity class: any algorithm to solve this
problem requires work proportional to N Log N and we have an algorithm that
solves this problem in work proportional to N Log N. So, a new algorithm might
have a better/smaller constant (which is very important, once we have resolved
in which complexity class is the problem), but a better algorithm cannot have a
lower complexity class.

One interesting example of a LACK of an obvious lower and upper bounds concerns
matrix multiplication. When we multiply two NxN matrices we get another NxN
matrix. Since the input matrices have N^2 values and the result has N^2 values,
we know that this problem is Omega(N^2): it must at least look at 2N^2 inputs
and produce N^2 outputs. But, the standard algorithm to multiply matrices is
O(N^3).

So there is a gap between the complexity class of the problem (the lower bound
for the problem is Omega(N^2)) and the complexity class of the solution (the
upper bound for the standard matrix multiplication algorithm is O(N^3)). Either
we should be able to improve the lower bound by raising it: by proving more
work is always needed, or we should be able to improve the upper bound by 
lowering it: finding a better algorithm and proving that it needs to do less
work than the standard one.

In the 60s, a Computer Scientist named Strassen devised an algorithm to solve
this problem in O(N ^ Log 7): N raised to the power of Log (base 2) of 7, which
is ~N^2.8 (recall Log (base 2) of 8 = 3 so Log (base 2) of 7 will be a bit less
than 3), somewhat better than N^3 but still higher than N^2.

In the 90s two Computer Scientists, Coopersmith and Winograd, devised an
algorithm whose complexity is O(N^2.376). Interestingly enough, the constant
on this algorithm is so huge, the n0 for which is starts being faster than
Strassen is bigger than matrices easily storable on today's computers (more
than billions of values). In 2002, a computer scientist named Raz proved a
new lower bound of Omega(N^2 Log N), which is bigger than Omega(N^2).

So, at this point we know the actualy complexity of the problem, call it c(n)
is somewhere between N^2 Log N and N^2.376. Factoring out the N^2, difference
is between Log N and N^.376 (which is about the cube root of N).

For more information, check http://en.wikipedia.org/wiki/Strassen_algorithm

So, better algorithms decrease the big-O complexity class, better lower bound
proofs increase the big-Omega minimal complexity. If the big-O and big-Omega
bounds are the same functions, then we have discovered an optimal algorithm to
solve the problem. Well, best to say "optimal within a constant", as other
algorithms in the same (optimal) complexity class might exhibit a smaller
constant and be faster.

Sometimes we do want to prove just that some function f(n) is Omega(g(n)).
For example, we want to prove that f(n) = 5n^2 + 3nlogn + 2n + 5 is Omega(n^2).
So, we need to prove that c n^2 <= 5n^2 + 3nlogn + 2n + 5 for all n>n0. We can
easily ignore all positive lower order terms:

  f(n) >= f(n) - 3nlogn -2n - 5 (for all n>1) =  5n^2 and
  5n^2 >= 4n^2 (for all n > 1)

following the inequalities (and reversing how it is shown),
4n^2 <= f(n) (for all n>1). That is, 4n^2 <= 5n^2 <= 5n^2 + 3nlogn + 2n + 5
because for all problem sizes (which are positive) 3logn, 2n, and 5 are >=0

By a similar subtraction we can can ignore all positive lower-order  terms.

Likewise, we can see that for f(n) = 2n - 100log n, f(n) is O(n) because we
can choose c to be 1, so we need to know when

  2n - 100log n >= n 
  n - 100log n >= 0   (subtract 1n from each side)
  n >= 100log n       (add 100log n to each side)

It is not easy to solve this inequality, but log 1024 is 10, and 100log 1024
is 1000, so n >= 100log n for n = 1024, and n grows faster than log n, so for
bigger n, n is even bigger than 100log n.
 


Big-Theta:

This brings us to our final notation. Big-Theta notation is a combination of
big-O and big-Omega, which bounds a function from below and above. A function
f(n) is Theta(g(n)) if there are values c1, c2, and n0 such that
   c1 g(n) <= f(n) <= c2 g(n) for all n>n0.

We use Theat notation for two purposes. First, we use it to show that the O
notation is "tight" not only is some function O(g(n)) but we cannot really find
a smaller complexity class because it is Omega(g(n)) too.

For example, we proved f(n) = 5n^2 + 3nlogn + 2n + 5 is O(n^2) (for c2=15 and
n0=1) and we proved above that f(n) is Omega(n^2) (for c1 = 4 and n0=1) so
we have our c1, c2, and n0 (n0 is generally the bigger of the two, but here
both are 1). So talking about f(n) in terms of the n^2 complexity class makes
sense for an upper (O) and lower (Omega) bound.

We also use Theta notation to mean that we have found an optimal (within a
constant) algorithm for a problem: if our algorithm is O(g(n))and the problem
is Omega(g(n)) than our solution's complexity class is as good as we can get.
We will see that as a problem, sorting with comparisons is Omega(N Log N) and
we will see various sorting algorithms (mergesort and heapsort) that are
O(N Log N) so sorting is Theta(N Log N).

Finally, to make matters a bit worse (but more truthful), there is a sorting
algorithm called quicksort that is O(N^2) in the worst case, but O(N Log N)
for almost all inputs. In fact, the constant for a method implementing this
algorithm is typically smaller (by a factor of 2-3) than the constant for
mergesort and heapsort, two other sorting algorithms that guarantee to run in
O(N Log N).

See the sorting demo link in the reading to test various sorting methods. For
O(N^2) sorts, try sorting tens of thousands of values; for O(N Log N) sorts,
try sorting millions of values.

So, even though quicksort's worst complexity class is higher than other fast
sorting algorithms, its average performance is in the same complexity class as
other fast sorting algorithms, and its constant is actually lower. So, choosing
the best algorithm is a bit more complicated than just finding one in the
lowest complexity class. Note that on a few problems, quicksort can take much
longer than mergesort or heapsort.



Composing Complexity Classes: Sequential and Nested Statements

Note that using big-O notation
 O(f(n)) + O(g(n)) is O(f(n) + g(n)) which results in the bigger of the
complexity classes. So O(N) + O(Log N) = O(N + Log N) = O(N).

This rule helps us understand how to compute the complexity of doing some 
sequence of operations: executing some statements that are O(f(n)) followed by
executing some statements that are O(g(n)). Executing all the statemetns is
O(f(n) + g(n)). For example, if some method call m1(); is O(N) and another
method call m2(); is O(N Log N), then doing the sequence

   m1();
   m2();

is O(N + N Log N) which is O(N Log N).


Likewise, using big-O notation
 O(f(n)) * O(g(n)) is O(f(n) * g(n))
So, if we repeat an O(f(N)) process O(N) times, the resulting complexity
is O(N)*O(f(N)) = O(Nf(N)). An example of this is, if some method call m(); is
O(N^2), then executing that call N times (in the following loop)

  for (int i = 0; i<N; i++)
    m();

is O(N)*O(N^2) = O(N*N^2) = O(N^3)

Many compound statements can be analyzed by composing the complexity classes of
their constituent statements.


A Collection Class Example:

W will now look at the solution of a few problems (combining operations on a
priority queue (pq)) and how the complexity class of the result is affected by
three different implementations of priority queues.

For the problems, all we need to know is the complexity class of the "add" and
"remove" operations.

                      add           remove
	         +-------------+-------------+
Implementation 1 |    O(1)     |    O(N)     |
	         +-------------+-------------+
Implementation 2 |    O(N)     |    O(1)     |
	         +-------------+-------------+
Implementation 3 |  O(Log N)   |  O(Log N)   |
	         +-------------+-------------+

So, Implementation 1 works by adding the new value into the pq at the rear of
an array or the front of a list: O(1); it removes the highest priority value by
scanning through the array or list to find the highest: O(N).

Implementation 2 adds the new value into the pq by scanning the array or list
for the right spot to put it (arrays store their highest priority at the rear,
lists at the front); it removes the highest priority value from the rear or
front.

Implementation 3, which we will discuss later, uses a heap to implement both
operations with "middle" complexity, greater than O(1) but less than O(N).


Problem 1: Suppose we wanted to use the priority queue to sort N values: we
enqueue N values and then dequeue N values. Here is the complexity of these
combined operations for each implementation.

Implementation 1: N*O(1) + N*O(N)         = O(N)   + O(N^2)     = O(N^2)
Implementation 2: N*O(N) + N*O(1)         = O(N^2) + O(N)       = O(N^2)
Implementation 3: N*O(Log N) + N*O(Log N) = O(NLogN) + O(NLogN) = O(NLogN)

Here, Implementation 3 has the lowest complexity for the combined operations.
Implementations 1 and 2 each do one operation quickly but the other slowly;
since the slowest operation determines the complexity class, both are equally
slow. The compleixty class O(Log N) is between O(1) and O(N); surprisingly, it
is actually "closer" to O(1) than O(N), even though it does grow -because it
grows so slowly.

Problem 2: Suppose we wanted to use the priority queue to find the 10 biggest
(of N) values: we would enqueue N values and then dequeue 10 values. Here is
the complexity of these combined operations for each implementation..

Implementation 1: N*O(1) + 10*O(N)         = O(N)   + O(N)      = O(N)
Implementation 2: N*O(N) + 10*O(1)         = O(N^2) + O(1)      = O(N^2)
Implementation 3: N*O(Log N) + 10*O(Log N) = O(NLogN) + O(LogN) = O(NLogN)

Here, Implementation 1 has the lowest complexity for the combined operations.
That makes sense, as the operation done many times (add) is very simple (add to
the end of an array/the front of a list) and the operation done a constant
number of times (10, independent of N) is the expensive operation (remove). It
even beats the complexity of Implementation 3. So, as N gets bigger,
implementation 1 becomes faster than the other two.

So, there isn't often a "best all the time" implementation. We need to know
what problem we are solving (the complexity classes of all the operations in
various implementations and the number of times we must do these operations) to
choose the best implementation for solving the problem.


Analyzing Array Doubling (vs Linked List Allocation):

Assume that we are using an array to store a simple ordered collection (e.g., a
queue; and assume we are just adding values, not removing any). We start by
allocating (using "new") a 1 element array.

1) When we add the 1st value, we just store it in index 0 of the array; that
array is now filled.

2) When we add the 2nd value, we must reallocate (again using "new") a 2
element array, and then copy all the values from the old 1 element array into
it, and then we can store the 2nd value into index 1; that array is now filled.

3) When we add the 3rd value, we must reallocate (again using "new") a 4
element array, and then copy all the values from the old 2 element array into
it, and then we can store the 3rd value into index 2; there is 1 more available
index in the array (index 3).

4) When we add the 4th value, we just store it at index 3.

5) When we add the 5th value, we must reallocate (again using "new") an 8
element array, and then copy all the values from the old 4 element array into
it, and then we can store the 5th value into index 4; there are 3 more
available indexes in the array (indexes 5-7).

6-8) When we add the 6th, 7th, and 8th values, we just store them into indexes
5, 6, 7.

9) When we add the 9th value, we must reallocate (again using "new") a 16
element array, and then copy all the values from the old 8 element array into
it, and then we can store the 9th value into index 8; there are 7 more
available indexes in the array (indexes 9-15).

10-16) When we add the 10th, 11th, ... 16th values, we just store them at
indexes 9, ... 15.

17) When we add the 17th value, we must reallocate (again using "new") a 32
element array, and then copy all the values from the old 16 element array into
it, and then store the 17h value into index 16; there are 15 more available
indexes in the array (index 17-31).

We can make a table illustrating the total number of times "new" is called and
the total amount of copying needed to add the Nth value into a queue
represented by an array. Notice that for every value 1 beyond a perfect power
of 2, we must call another "new" and copy some more values.

 adding | times "new" | total copied
        |   called    |  values
--------+-------------+--------------------------
   1    |     1       |    0   
   2    |     2       |    1 (= 0 +  1) 
  3-  4 |     3       |    3 (= 1 +  2)
  5-  8 |     4       |    7 (= 3 +  4)
  9- 16 |     5       |   15 (= 7 +  8)
 17- 32 |     6       |   31 (=15 + 16)
 33- 64 |     7       |   63 (=31 + 32)
 65-128 |     8       |  127 (=63 + 64)
 ...

So, when adding values to an array, to store between 2^(M-1)+1 and 2^M values
takes at most M+1 calls to "new" and 2^M - 1 copied values. Another way to
think of this is to store N values requires about Log2 N calls to "new" and
N copied values.

Another way to think about this is that to get to an array with 1,024 values we
have to double at 512 and do 512 copies; to get to an array with 512 values we
have to double at 256 and do 256 copies; to get to an array with 256 values we
have to double at 128 and do 128 copies; ...

So how many copies in total do we do?
   512 + 256 + 128 + 64 + 32 + 16 + 8 + 2 + 1
Note that the series s  = a + ar + ar^2 + ar^3 + ... ar^n (plug in a=512,r=1/2)
  multiplying by r:  rs = ar + ar^2 + ar^3 + ... ar^n + ar^(n+1)
  or                 rs =     ar + ar^2 + ar^3 + ... ar^n + ar^(n+1)
  subtracting    s - rs = a - ar^(n+1)             (all the other terms cancel)
  or             s(1-r) = a(1 - r^(n+1))
  divide by 1-r  s      = a(1 - r^(n+1))/(1-r)
plug in a=512,r=1/2: s  = 512(1 - 1/2^(n+1))/ 1/2
now for large n, 1/2^(n+1) is very close to 0, so
  s ~ 512(1-0) / 1/2 = 512 * 2 = 1,024
so to get to 1,024 we have to do about 1,024 copies, so again the number of
copies is linear in the number of values added to the array. N adds is O(N)

Contrast this to using a linked list for this collection. To store N values
requires calling "new" N times (allocating a new LN for each value) and
performing 0 copies.

The "new" and copy operations are best thought to be O(1). So the complexity
class for using an array is

  (Log2 N)*O(1) + N*O(1) = O(Log2 N  +  N) = O(N)

and the complexity class for using a linked list is

  N*O(1) = O(N)

So both implementations are O(N). Which is actually faster? Based on complexity
classes we cannot know. But if you measure the times needed for both, we will
typically find that array implementation is faster. Here is some insight into
why.

Although calling "new" and copying values are both O(1), the constants are
different. Let's assume an overly simplistic model that calling "new" requires
20 instructions and copying requires 10 instructions. This means that

  # instructions needed to add N values into an array      = 10N + 20 Log2 N
  # instructions needed to add N values into a linked list = 20N

Let's look at these formulas for a few different values of N (but spanning
a huge range, from 4 to 1,000,000).

For N = 4 (Log2 4 = 2): # array instructions (#ai) = 40 + 40 = 80;
# linked list instructions (#lli) = 80. #lli/#ai = 80/80 = 1 so they run at
about the same speed.

For N = 8 (Log2 8 = 3): #ai = 80 + 60 = 140; #lli = 160.
#lli/#ai = 160/140 = 1.14; so the array is 14% faster.

For N = 64 (Log2 32 = 5): #ai = 320 + 100 = 420; #lli = 640.
#lli/#ai = 640/420 = 1.5; so the array is 50% faster.

For N = 1,000: #ai = 10,000 + 200 = 10,200; #lli = 20,000. #lli/#ai = 1.96;
so the array is 96% faster;

For N = 1,000,000: #ai = 10,000,000 + 400 = 10,000,400; #lli = 20,000,000.
#lli/#ai = 1.99; so the array is 99% faster;

So, as N -> infniity, the logarithmic term is dominated by the linear term in
#ai, so #lli/#ai -> 2; so for large N #ai is 100% faster (at most twice a
fast).

Here the coefficient for allocation is twice what the coefficient for copying
is; if it is higher, using an array becomes even more than twice as fast.


Now let's look a bit at analyzing the space for these data structures. First,
let's not worry about the size of the values actually stored in the queue;
we'll worry only about the storage occupied by the queue itself. We assume all
references occupy 1 word of memory (the same as an int would occupy).

  1) For the array implementation, if we are storing N values we need between
  N+1 and 2N-1 memory locations: e.g., if we need to store 1,024 values we
  would need 1,025 memory locations (1,024 reference + 1 to store the array's
  length), but if we need to store 1,025 values we would need 2,049 memory
  locations, because of array doubling.

  2) For the linked implementation, if we are storing N values at exactly 2N
  memory locations (one each for the value and the .next field).

So, generally we always need LESS storage space to store these values as an
array than as a linked list, even if 1/2 the array contains no values!

One word of caution. At the time a new array is allocated, we are using
N values (the old array) + 2N values (the new array) of data. We can get rid 
of the old array as soon as we copy its values. So, actually in the worst case
we might need about 3N memory locations to store N values, temporarily, when
doubling the array.

Finally, let's consider how much space the data takes up compared to the data
structure storing it. If each value in the data structure were an object
containing 10 memory locations (20-40 characters), then storing 1,024 of these
values would occupy 10,240 memory locations. So, even compared to a a linked
list (needing an additional 2048 memory locations to store these values), the
data structure uses only about 20% of the memory locations needed to store the 
values in a data structure: the space taken up by the values stored dominates
the space taken up to store them in an array/list.