Analysis of Algorithms: First Lecture

Analysis of Algorithms is a mathematical area of Computer Science in which we analyze the resources (mostly time, but sometimes space) used by algorithms to solve problems. An algorithm is a precise procedure for solving a problem, written in any notation that humans understand (and thus can carry out the algorithm): if we write an algorithm as code in some programming language, then a computer can execute it too.

The main tool that we use to analyze algorithms is big-O notation (we will briefly examine the big-Omega and big-Theta notations in the next lecture). We use big-O notation to characterize the performance of an algorithm by placing it in a complexity class (most often based on its WORST-CASE behavior -but sometimes BEST-CASE or AVERAGE-CASE behavior- when solving a problem of size N): once we know the complexity class of an algorithm, we have a good handle on understanding its behavior (within certain limits). Thus, we don't necessarily compute the exact resources needed, but typically a bound on the resources.

Getting to Big-O Notation: Throwing away Irrelevant Details

The web reading starts out by showing two algorithms coded in Java: the first computes the biggest value in an array; the second sorts an array. In both of these examples, the problem size is the length of the array: the number of values to examine to find the maximum, and the number of values to sort.

Often, the problem size is the number of values processed: e.g., the number of elements in an array, linked list, or file. But we can use other metrics as well: when analyzing the complexity of multiplication, for example, a natural problem size is the number of digits in the numbers being multiplied. Thus, there is no single measure of size that fits all problems: instead, we try to choose a measure that makes sense for the problem (we will later see the height of a tree used as a common measure).

For the first Java code/algorithm, the reading shows the Java virtual machine instructions that it is translated into by the Java compiler. We can then examine a function Iaw(N) that, for an array of size N, computes the number of instructions (that is what "I" means) that Java executes while executing this algorithm (that is what "a" means) "in the worst case" (that is what the "w" means). For this code, the worst case means that each value in the array is bigger than all the previous values, so the "if test" in the "for loop" is always true and always executes the new assignment to max. Such an input results in the maximum number of instructions being executed.

The web reading shows Iaw(N) = 14N + 9, and then shows that if we simplify the formula by dropping the + 9, and just write Iaw(N) ~ 14N, we still get a very accurate answer for the number of instructions executed as N gets bigger and bigger. The difference between 14N+9 and 14N is always 9; when N is 1,000 this represents an error of 9/14,009, which is less than .01% (99.99% accurate).

Analysis of Algorithms really should be referred to as ASYMPTOTIC Analysis of Algorithms, as it is mostly concerned with the performance of algorithms as the problem size gets very big (N -> infinity). If the real formula is the sum of a bunch of terms, we can drop any term that doesn't grow as quickly as the most quickly growing term. Here the linear term, 14N, grows more quickly than the next term, 9, which doesn't grow at all (as N grows).

For the sorting code, Iaw(N) = 14N^2 + 7, or Iaw(N) ~ 14N^2 (note that N^2 means N raised to the second power). The difference between 14N^2+7 and 14N^2 is always 7; when N is 1,000 this represents an error of 7/14,000,007, which is less than .0001% (99.9999% accurate).
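For concreteness, here is a sketch of what the find-the-maximum code might look like (a plausible reconstruction for illustration, not necessarily the reading's exact code, so its translation would not necessarily yield exactly 14N + 9 instructions):

  // A sketch of a find-the-maximum method. The worst case occurs when the
  // array is in strictly increasing order: then the if test is true on
  // every iteration, so the assignment to max always executes.
  public static int maximum (int[] a) {
      int max = a[0];
      for (int i = 1; i < a.length; i++)
          if (a[i] > max)
              max = a[i];
      return max;
  }

Whatever the exact counts, the loop body contributes some fixed number of instructions per iteration (the 14), and the code outside the loop contributes a fixed number of instructions once (the 9).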
We now will explain a rationale for dropping the constant in front of N and N^2, and for classifying these algorithms as O(N) and O(N^2). Here O means "grows with the same ORDER" as N and N^2.

1) Let us imagine that every instruction in the computer takes the same amount of time to execute (a reasonable but not totally accurate assumption). Then the time taken by such an algorithm is Taw(N) ~ 14N^2/speed; really we should think about writing Taw(N) as (14/speed)N^2, so 14/speed is the constant in front of N^2 when computing times. Where exactly did the 14 come from? It relates to the number of instructions that the Java compiler generated for the main loop; but different Java compilers might generate different numbers of instructions (better ones generate fewer instructions); thus, it is based on technology and might change. And, of course, "speed" changes based on technology too. Since we are trying to come up with a "science" of algorithms, we don't want our results to depend on technology, so we are also going to drop the constant in front of the biggest term.

Here is another justification for not being concerned with the constant in front of the biggest term.

2) A major question we want answered about any algorithm is, "How much more of a resource does it need when solving a problem twice as big?" In "maximum", when N is big (so we can drop the +9 without losing much accuracy), the ratio of the time to solve a problem of size 2N to the time to solve a problem of size N is easily computed:

  Iaw(2N)   14(2N)
  ------- ~ ------ ~ 2
  Iaw(N)     14N

The ratio is a simple number (no matter how many instructions are in the loop, since the constant 14 appears as a multiplicative factor in both the numerator and denominator). So, we know for this code that if we double the size of the array, we double the number of instructions that are executed, and thus double the amount of time (for whatever the speed of the computer is).

Likewise, for sorting we can write

  Iaw(2N)   14(2N)^2
  ------- ~ -------- ~ 4
  Iaw(N)     14N^2

Again, the ratio is a simple number, with the constant (no matter what it is) disappearing. So, we know for this code that if we double the size of the array, we increase by a factor of 4 the number of instructions that are executed, and thus increase by a factor of 4 the amount of time (for whatever the speed of the computer is). Thus, the constant 14 is irrelevant when asking this "doubling" question.

Note that if we didn't simplify, we'd have

  Iaw(2N)   14(2N)^2 + 7
  ------- = ------------
  Iaw(N)      14N^2 + 7

which doesn't simplify easily; although, as N -> infinity, this ratio gets closer and closer to 4 (and is very close even for small-sized problems). As with air resistance and friction in physics, ignoring the contribution of these negligible factors (for big, slow-moving objects) typically allows us to quickly solve an approximately correct problem.

Using big-O notation, we say that the complexity class of the code to find the maximum is O(N). The big-O means "on the order of" or "the growth rate is" N. For the sorting code, its complexity class is O(N^2).
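We can observe this doubling behavior empirically. Here is a minimal sketch of such an experiment (it repeats the maximum method from above so it compiles on its own; real timings are noisy, and this ignores details like JIT warm-up):

  import java.util.Random;

  public class DoublingTest {
      // Time maximum on arrays of size N, 2N, 4N, ... and print the ratio
      // of successive times: for an O(N) method it should approach 2.
      public static void main (String[] args) {
          Random r    = new Random();
          long   prev = 0;
          for (int n = 1_000_000; n <= 16_000_000; n *= 2) {
              int[] a = new int[n];
              for (int i = 0; i < n; i++)
                  a[i] = r.nextInt();
              long start   = System.nanoTime();
              int  max     = maximum(a);
              long elapsed = System.nanoTime() - start;
              if (prev != 0)
                  System.out.println("N = " + n + "  T(2N)/T(N) ~ "
                                     + (double) elapsed / prev);
              prev = elapsed;
          }
      }

      public static int maximum (int[] a) {
          int max = a[0];
          for (int i = 1; i < a.length; i++)
              if (a[i] > max)
                  max = a[i];
          return max;
      }
  }

Running the same kind of experiment on the sorting code should produce ratios near 4 instead of 2.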
----------
IMPORTANT: To compute the complexity class of code, we approximate the number of times the most frequently executed statement is executed, then drop all the lower terms and drop the constant in front of the remaining term. The maximum code executes the if statement N times, so it is O(N). The sorting code executes the if statement N(N-1)/2 times (we will justify this number below), which is N^2/2 - N/2, so dropping the lower term and the constant 1/2 yields a complexity class of O(N^2).
----------

More formally, we say an algorithm a is O(f(N)) if we can find c and N0 such that we can prove Ta(N) < c f(N) for all N > N0. This means that once we get to a big enough N (> N0), Ta(N) is always bounded by (grows no faster than) f(N) multiplied by some constant. Normally we are interested in the SMALLEST GROWING function possible for f(N). For example, computing the maximum of an array is O(N), but technically it is also O(N sqrt(N)), O(N^2), O(N^3), etc., because all those functions bound (grow more quickly than) 14N+9. Normally we discard all these bigger functions and just say it is O(N). See the picture in the reading that shows the standard way to think about this definition and about comparing algorithms that are in different complexity classes.

A primary consequence of this definition is that if two algorithms, a and b, both solve some problem, and a is in a lower complexity class than b, then for all big enough N, Ta(N) < Tb(N). Note that nothing here is said about small N; which algorithm uses fewer resources there depends on the actual constants (and even the terms that we dropped). For example, if algorithm a is O(N) with a constant of 100, and algorithm b is O(N^2) with a constant of 1, then for values of N in the range [1,100],

  Tb(N) = 1N^2 <= 100N = Ta(N)

but for all values bigger than 100,

  Ta(N) = 100N <= 1N^2 = Tb(N)

in which case algorithm a is better. Again, we use the term "asymptotic" analysis of algorithms to indicate that we are concerned with the speed when N gets very large (going towards infinity).

What about the constants? It is often the case that the constants of different algorithms are close. (They are often just the number of instructions in the main loop of the code.) So the complexity classes are a good indication of faster vs. slower algorithms for all but the smallest values of N.
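To see this crossover concretely, here is a tiny sketch that tabulates the two hypothetical cost functions from this example, Ta(N) = 100N and Tb(N) = N^2:

  public class Crossover {
      // Tabulate Ta(N) = 100N (an O(N) algorithm with a big constant) and
      // Tb(N) = N^2 (an O(N^2) algorithm with a small constant): b wins
      // below N = 100, they tie at N = 100, and a wins above it, with the
      // gap widening as N grows.
      public static void main (String[] args) {
          for (int n = 10; n <= 100_000; n *= 10) {
              long ta = 100L * n;      // cost of algorithm a
              long tb = (long) n * n;  // cost of algorithm b
              System.out.println("N = " + n + "  Ta = " + ta + "  Tb = " + tb
                                 + "  faster: " + (ta < tb ? "a"
                                                 : ta > tb ? "b" : "tie"));
          }
      }
  }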
Although all possible mathematical functions might represent complexity classes (and many strange ones do), we will mostly restrict our attention to the following complexity classes. Note that complexity classes can interchangeably represent computing time, # of machine operations executed, and such more nebulous terms as "effort" or "work" or "resources".

As we saw before, a fundamental question about any algorithm is, "What is the time needed to solve a problem twice as big?" We will call the answer the SIGNATURE of the complexity class (knowing this value empirically often allows us to know the complexity class as well).

Class   | Algorithm Example                                  | Signature
--------+----------------------------------------------------+---------------------
O(1)    | passing parameters                                 | T(2N) = T(N)
O(LogN) | binary searching of an ordered array               | T(2N) = c + T(N)
O(N)    | linear searching an array                          | T(2N) = 2T(N)
O(NLogN)| fast sorting                                       | T(2N) = cN + 2T(N)

Fast algorithms come before here; NLogN grows a bit more quickly than linearly, and nowhere near as quickly as N^2.

O(N^2)  | slow sorting; scanning an array of size N, N times | T(2N) = 4T(N)
O(N^3)  | slow matrix multiplication                         | T(2N) = 8T(N)
O(N^m)  | for some fixed m: 4, 5, ...                        | T(2N) = 2^m T(N)

Tractable algorithms come before here; their work is polynomial in N.

O(2^N)  | finding boolean values that satisfy a formula      | T(2N) = 2^N T(N) = T(N)^2

For example, for an O(N^2) algorithm, doubling the size of the problem quadruples the time required: T(2N) ~ c(2N)^2 = c4N^2 = 4cN^2 = 4T(N).

Note that in Computer Science, logarithms are mostly taken to base 2. (Remember that algorithms and logarithms are very different terms.) You should memorize and be able to use the following facts to compute some logarithms without a calculator.

  Log 1,000 ~ 10  (actually 2^10 = 1,024, so 2^10 is approximately 1,000, with less than a 3% error)
  Log a^b = b Log a, or more usefully, Log 1,000^N = N Log 1,000; so ...
  Log 1,000,000 ~ 20      (because 1,000,000 = 1,000^2)
  Log 1,000,000,000 ~ 30  (because 1,000,000,000 = 1,000^3)

So note that Log is a very slowly growing function. When we increase from Log 1,000 to Log 1,000,000,000 (a factor of 1 million), the result only goes up from 10 to 30 (a factor of 3).

In fact, we can compute these logarithms on any calculator that computes Log in some base. For example, Log (base b) X = Log (base a) X / Log (base a) b. So, Log (base b) X is just a constant times Log (base a) X, and they are really all in the same complexity class (regardless of the base) because they differ only by a multiplicative constant. For example, Log (base 10) X = Log (base 2) X / Log (base 2) 10 ~ .3 Log (base 2) X.

----------
IMPORTANT: If we can demonstrate that doubling the size of the input approximately quadruples the time of the algorithm, then the algorithm is O(N^2). We can use the signatures shown above for other complexity classes as well. Thus, even if we cannot mathematically analyze the complexity class of an algorithm based on its code, we can measure it running on various sized problems, and use the signature information to approximate its complexity class. Please remind me in class to show you a spreadsheet with interesting behavior: when N is small and we double it, the function looks linear (doubling the size doubles the function), but as we continue doubling N, the function looks quadratic (doubling the size quadruples the function). This example can teach us something interesting about complexity classes. Remind me at the start of the second AA lecture to show it.
----------
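Here is a minimal sketch of using these signatures: given measured times for problems of size N and 2N, the ratio T(2N)/T(N) suggests a complexity class. (The cutoff values below are rough guesses of my own, chosen to separate the ideal ratios 1, 2, 4, and 8; they are not part of the reading.)

  public class SignatureGuess {
      // Map the measured ratio T(2N)/T(N) to the closest signature.
      public static String guess (double tN, double t2N) {
          double ratio = t2N / tN;
          if (ratio <  1.5) return "O(1) or O(LogN)";   // ratio ~ 1
          if (ratio <  3.0) return "O(N) or O(NLogN)";  // ratio ~ 2 (or a bit more)
          if (ratio <  6.0) return "O(N^2)";            // ratio ~ 4
          if (ratio < 12.0) return "O(N^3)";            // ratio ~ 8
          return "a higher complexity class";
      }

      public static void main (String[] args) {
          System.out.println(guess(1.0, 2.1));  // prints: O(N) or O(NLogN)
          System.out.println(guess(1.0, 4.2));  // prints: O(N^2)
      }
  }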
We can use the complexity class of an algorithm to easily predict its running time as a function of N. For example, if we know Ta is O(N^2), then we know that Ta(N) ~ (actually <=) cN^2 for some constant c. The constant c represents the "technology" used: the language, compiler, machine speed, etc.; the N^2 (from O(N^2)) represents the "science/math" part. Now, given this information, we can time the algorithm for some large value N. Let's say for N = 10,000 (which is actually a pretty small N these days) we find that Ta(10,000) is 4 seconds.

First, if I asked you to estimate Ta(20,000), you'd immediately know it is about 16 seconds (doubling the input of an O(N^2) algorithm approximately increases the running time by a factor of 4).

Second, we can solve for c: we have Ta(N) ~ cN^2; substituting 10,000 for N and 4 for Ta(N), we have Ta(10,000) = 4 ~ c 10,000^2, so solving for c yields c ~ 4x10^(-8). By measuring the run-time of this code, we can calculate the constant "c", which incorporates all the technology (language, compiler, computer speed, etc.). Roughly, we can think of c as being the amount of time it takes to do one loop (# of instructions per loop / speed of executing instructions), where the algorithm requires N^2 loops to do all its work. Therefore, Ta(N) ~ 4x10^(-8) x N^2.

So, if asked to estimate the time to process 1,000,000 (10^6) values (100 times more than 10,000), we'd have

  Ta(10^6) ~ 4x10^(-8) x (10^6)^2
  Ta(10^6) ~ 4x10^(-8) x 10^12
  Ta(10^6) ~ 4x10^4, or about 40,000 seconds (about half a day)

Notice that solving a problem 100 times as big takes 10,000 (which is 100^2) times as long, as we would expect for an O(N^2) algorithm.

In fact, while we often analyze code to determine its complexity class, if we don't have the code (or find it too complicated to analyze), we can double the input sizes a few times and see whether we can "fit the resulting times" to any of the standard signatures, to estimate the complexity class of the algorithm. We should do this for some N that is as large as reasonable (taking some number of seconds to solve on the computer).

Note that for an O(2^N) algorithm, if we double the size of the problem from 100 to 200 values, the amount of time needed goes up by a factor of 2^100, which is ~ 1.3x10^30. Notice that adding just one more value to process doubles the time: this "exponential" time is the opposite of logarithmic time in terms of its growth rate: it grows incredibly quickly.

Note too that it is important to be able to analyze code like the following. Notice that the upper bound of the inner loop (i) is changed by the outer loop.

  for (int i = 0; i < N; i++)
      for (int j = 0; j < i; j++)
          ...body...

The body executes 0 times when i is 0, 1 time when i is 1, ..., and N-1 times when i is N-1: a total of 0 + 1 + 2 + ... + (N-1) = N(N-1)/2 executions. This is exactly the count claimed above for the sorting code, so dropping the lower term and the constant, such doubly nested loops are O(N^2).
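As a sanity check, here is a minimal sketch that counts how many times the inner-loop body executes and compares the count against the closed form N(N-1)/2:

  public class LoopCount {
      // Count executions of the inner-loop body for several values of N
      // and check the total against the formula N(N-1)/2 derived above.
      public static void main (String[] args) {
          for (int N = 10; N <= 10_000; N *= 10) {
              long count = 0;
              for (int i = 0; i < N; i++)
                  for (int j = 0; j < i; j++)
                      count++;                   // stands in for the loop body
              long formula = (long) N * (N - 1) / 2;
              System.out.println("N = " + N + "  count = " + count
                                 + "  N(N-1)/2 = " + formula);
          }
      }
  }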