Analysis of Algorithms: First Lecture

Analysis of Algorithms is a mathematical area of Computer Science in which we analyze the resources (mostly time, but sometimes space) used by algorithms to solve problems. An algorithm is a precise procedure for solving a problem, written in any notation that humans understand (and thus can carry out the algorithm): if we write an algorithm as code in some programming language, then a computer can execute it too.

The main tool that we use to analyze algorithms is big-O notation (we will briefly examine the big-Omega and big-Theta notations in the next lecture). We use big-O notation to characterize the performance of an algorithm by placing it in a complexity class (most often based on its WORST-CASE behavior -but sometimes BEST-CASE or AVERAGE-CASE behavior- when solving a problem of size N): once we know the complexity class of an algorithm, we have a good handle on understanding its behavior (within certain limits). Thus, we don't necessarily compute the exact resources needed, but typically a bound on the resources.

Getting to Big-O Notation: Throwing away Irrelevant Details

The web reading starts out by showing two algorithms coded in Java: the first computes the biggest value in an array; the second sorts an array. In both of these examples, the problem size is the length of the array: the number of values to examine to find the maximum, and the number of values to sort.

Often, the problem size is the number of values processed: e.g., the number of elements in an array, linked list, or file. But we can use other metrics as well: when analyzing the complexity of multiplication, for example, a natural problem size is the number of digits in the numbers being multiplied. Thus, there is no single measure of size that fits all problems: instead, we try to choose a measure that makes sense for the problem (we will later see the height of a tree used as a common measure).

For the first Java code/algorithm, the reading shows the Java virtual machine instructions that it is translated into by the Java compiler. We can then examine a function Iaw(N) that, for an array of size N, computes the number of instructions (that is what "I" means) that Java executes while executing this algorithm (that is what "a" means) "in the worst case" (that is what the "w" means). For this code, the worst case means that each value in the array is bigger than all the previous values, so the "if test" in the "for loop" is always true and always executes the new assignment to max. Such an input results in the maximum number of instructions being executed.

The web reading shows Iaw(N) = 14N + 9, and then shows that if we simplify the formula by dropping the + 9, and just write Iaw(N) ~ 14N, we still get a very accurate answer for the number of instructions executed as N gets bigger and bigger. The difference between 14N+9 and 14N is always 9; when N is 1,000 this represents an error of 9/14,009, which is less than .01% (99.99% accurate).

Analysis of Algorithms really should be referred to as ASYMPTOTIC Analysis of Algorithms, as it is mostly concerned with the performance of algorithms as the problem size gets very big (N -> infinity). If the real formula is the sum of a bunch of terms, we can drop any term that doesn't grow as quickly as the most quickly growing term. Here the linear term, 14N, grows more quickly than the next term, 9, which doesn't grow at all (as N grows).

For the sorting code, Iaw(N) = 14N^2 + 7, or Iaw(N) ~ 14N^2 (note that N^2 means N raised to the second power). The difference between 14N^2+7 and 14N^2 is always 7; when N is 1,000 this represents an error of 7/14,000,007, which is less than .0001% (99.9999% accurate).
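For concreteness, here is a sketch of what the find-the-maximum code might look like (a plausible reconstruction for illustration, not necessarily the reading's exact code, so its translation would not necessarily yield exactly 14N + 9 instructions):

  // A sketch of a find-the-maximum method. The worst case occurs when the
  // array is in strictly increasing order: then the if test is true on
  // every iteration, so the assignment to max always executes.
  public static int maximum (int[] a) {
      int max = a[0];
      for (int i = 1; i < a.length; i++)
          if (a[i] > max)
              max = a[i];
      return max;
  }

Whatever the exact counts, the loop body contributes some fixed number of instructions per iteration (the 14), and the code outside the loop contributes a fixed number of instructions once (the 9).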
We now will explain a rationale for dropping the constant in front of N and N^2, and for classifying these algorithms as O(N) and O(N^2). Here O means "grows with the same ORDER" as N and N^2.

1) Let us imagine that every instruction in the computer takes the same amount of time to execute (a reasonable but not totally accurate assumption). Then the time taken by such an algorithm is Taw(N) ~ 14N^2/speed; really we should think about writing Taw(N) as (14/speed)N^2, so 14/speed is the constant in front of N^2 when computing times. Where exactly did the 14 come from? It relates to the number of instructions that the Java compiler generated for the main loop; but different Java compilers might generate different numbers of instructions (better ones generate fewer instructions); thus, it is based on technology and might change. And, of course, "speed" changes based on technology too. Since we are trying to come up with a "science" of algorithms, we don't want our results to depend on technology, so we are also going to drop the constant in front of the biggest term.

Here is another justification for not being concerned with the constant in front of the biggest term.

2) A major question we want answered about any algorithm is, "How much more of a resource does it need when solving a problem twice as big?" In "maximum", when N is big (so we can drop the +9 without losing much accuracy), the ratio of the time to solve a problem of size 2N to the time to solve a problem of size N is easily computed:

  Iaw(2N)   14(2N)
  ------- ~ ------ ~ 2
  Iaw(N)     14N

The ratio is a simple number (no matter how many instructions are in the loop, since the constant 14 appears as a multiplicative factor in both the numerator and denominator). So, we know for this code that if we double the size of the array, we double the number of instructions that are executed, and thus double the amount of time (for whatever the speed of the computer is).

Likewise, for sorting we can write

  Iaw(2N)   14(2N)^2
  ------- ~ -------- ~ 4
  Iaw(N)     14N^2

Again, the ratio is a simple number, with the constant (no matter what it is) disappearing. So, we know for this code that if we double the size of the array, we increase by a factor of 4 the number of instructions that are executed, and thus increase by a factor of 4 the amount of time (for whatever the speed of the computer is). Thus, the constant 14 is irrelevant when asking this "doubling" question.

Note that if we didn't simplify, we'd have

  Iaw(2N)   14(2N)^2 + 7
  ------- = ------------
  Iaw(N)      14N^2 + 7

which doesn't simplify easily; although, as N -> infinity, this ratio gets closer and closer to 4 (and is very close even for small-sized problems). As with air resistance and friction in physics, ignoring the contribution of these negligible factors (for big, slow-moving objects) typically allows us to quickly solve an approximately correct problem.

Using big-O notation, we say that the complexity class of the code to find the maximum is O(N). The big-O means "on the order of" or "the growth rate is" N. For the sorting code, its complexity class is O(N^2).
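We can observe this doubling behavior empirically. Here is a minimal sketch of such an experiment (it repeats the maximum method from above so it compiles on its own; real timings are noisy, and this ignores details like JIT warm-up):

  import java.util.Random;

  public class DoublingTest {
      // Time maximum on arrays of size N, 2N, 4N, ... and print the ratio
      // of successive times: for an O(N) method it should approach 2.
      public static void main (String[] args) {
          Random r    = new Random();
          long   prev = 0;
          for (int n = 1_000_000; n <= 16_000_000; n *= 2) {
              int[] a = new int[n];
              for (int i = 0; i < n; i++)
                  a[i] = r.nextInt();
              long start   = System.nanoTime();
              int  max     = maximum(a);
              long elapsed = System.nanoTime() - start;
              if (prev != 0)
                  System.out.println("N = " + n + "  T(2N)/T(N) ~ "
                                     + (double) elapsed / prev);
              prev = elapsed;
          }
      }

      public static int maximum (int[] a) {
          int max = a[0];
          for (int i = 1; i < a.length; i++)
              if (a[i] > max)
                  max = a[i];
          return max;
      }
  }

Running the same kind of experiment on the sorting code should produce ratios near 4 instead of 2.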
----------
IMPORTANT: To compute the complexity class of code, we approximate the number of times the most frequently executed statement is executed, then drop all the lower terms and drop the constant in front of the remaining term. The maximum code executes the if statement N times, so it is O(N). The sorting code executes the if statement N(N-1)/2 times (we will justify this number below), which is N^2/2 - N/2, so dropping the lower term and the constant 1/2 yields a complexity class of O(N^2).
----------

More formally, we say an algorithm a is O(f(N)) if we can find c and N0 such that we can prove Ta(N) < c f(N) for all N > N0. This means that once we get to a big enough N (> N0), Ta(N) is always bounded by (grows no faster than) f(N) multiplied by some constant. Normally we are interested in the SMALLEST GROWING function possible for f(N). For example, computing the maximum of an array is O(N), but technically it is also O(N sqrt(N)), O(N^2), O(N^3), etc., because all those functions bound (grow more quickly than) 14N+9. Normally we discard all these bigger functions and just say it is O(N). See the picture in the reading that shows the standard way to think about this definition and about comparing algorithms that are in different complexity classes.

A primary consequence of this definition is that if two algorithms, a and b, both solve some problem, and a is in a lower complexity class than b, then for all big enough N, Ta(N) < Tb(N). Note that nothing here is said about small N; which algorithm uses fewer resources there depends on the actual constants (and even the terms that we dropped). For example, if algorithm a is O(N) with a constant of 100, and algorithm b is O(N^2) with a constant of 1, then for values of N in the range [1,100],

  Tb(N) = 1N^2 <= 100N = Ta(N)

but for all values bigger than 100,

  Ta(N) = 100N <= 1N^2 = Tb(N)

in which case algorithm a is better. Again, we use the term "asymptotic" analysis of algorithms to indicate that we are concerned with the speed when N gets very large (going towards infinity).

What about the constants? It is often the case that the constants of different algorithms are close. (They are often just the number of instructions in the main loop of the code.) So the complexity classes are a good indication of faster vs. slower algorithms for all but the smallest values of N.
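To see this crossover concretely, here is a tiny sketch that tabulates the two hypothetical cost functions from this example, Ta(N) = 100N and Tb(N) = N^2:

  public class Crossover {
      // Tabulate Ta(N) = 100N (an O(N) algorithm with a big constant) and
      // Tb(N) = N^2 (an O(N^2) algorithm with a small constant): b wins
      // below N = 100, they tie at N = 100, and a wins above it, with the
      // gap widening as N grows.
      public static void main (String[] args) {
          for (int n = 10; n <= 100_000; n *= 10) {
              long ta = 100L * n;      // cost of algorithm a
              long tb = (long) n * n;  // cost of algorithm b
              System.out.println("N = " + n + "  Ta = " + ta + "  Tb = " + tb
                                 + "  faster: " + (ta < tb ? "a"
                                                 : ta > tb ? "b" : "tie"));
          }
      }
  }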
Although all possible mathematical functions might represent complexity classes (and many strange ones do), we will mostly restrict our attention to the following complexity classes. Note that complexity classes can interchangeably represent computing time, # of machine operations executed, and such more nebulous terms as "effort" or "work" or "resources".

As we saw before, a fundamental question about any algorithm is, "What is the time needed to solve a problem twice as big?" We will call the answer the SIGNATURE of the complexity class (knowing this value empirically often allows us to know the complexity class as well).

Class   | Algorithm Example                                  | Signature
--------+----------------------------------------------------+---------------------
O(1)    | passing parameters                                 | T(2N) = T(N)
O(LogN) | binary searching of an ordered array               | T(2N) = c + T(N)
O(N)    | linear searching an array                          | T(2N) = 2T(N)
O(NLogN)| fast sorting                                       | T(2N) = cN + 2T(N)

Fast algorithms come before here; NLogN grows a bit more quickly than linearly, and nowhere near as quickly as N^2.

O(N^2)  | slow sorting; scanning an array of size N, N times | T(2N) = 4T(N)
O(N^3)  | slow matrix multiplication                         | T(2N) = 8T(N)
O(N^m)  | for some fixed m: 4, 5, ...                        | T(2N) = 2^m T(N)

Tractable algorithms come before here; their work is polynomial in N.

O(2^N)  | finding boolean values that satisfy a formula      | T(2N) = 2^N T(N) = T(N)^2

For example, for an O(N^2) algorithm, doubling the size of the problem quadruples the time required: T(2N) ~ c(2N)^2 = c4N^2 = 4cN^2 = 4T(N).

Note that in Computer Science, logarithms are mostly taken to base 2. (Remember that algorithms and logarithms are very different terms.) You should memorize and be able to use the following facts to compute some logarithms without a calculator.

  Log 1,000 ~ 10  (actually 2^10 = 1,024, so 2^10 is approximately 1,000, with less than a 3% error)
  Log a^b = b Log a, or more usefully, Log 1,000^N = N Log 1,000; so ...
  Log 1,000,000 ~ 20      (because 1,000,000 = 1,000^2)
  Log 1,000,000,000 ~ 30  (because 1,000,000,000 = 1,000^3)

So note that Log is a very slowly growing function. When we increase from Log 1,000 to Log 1,000,000,000 (a factor of 1 million), the result only goes up from 10 to 30 (a factor of 3).

In fact, we can compute these logarithms on any calculator that computes Log in some base. For example, Log (base b) X = Log (base a) X / Log (base a) b. So, Log (base b) X is just a constant times Log (base a) X, and they are really all in the same complexity class (regardless of the base) because they differ only by a multiplicative constant. For example, Log (base 10) X = Log (base 2) X / Log (base 2) 10 ~ .3 Log (base 2) X.

----------
IMPORTANT: If we can demonstrate that doubling the size of the input approximately quadruples the time of the algorithm, then the algorithm is O(N^2). We can use the signatures shown above for other complexity classes as well. Thus, even if we cannot mathematically analyze the complexity class of an algorithm based on its code, we can measure it running on various sized problems, and use the signature information to approximate its complexity class. Please remind me in class to show you a spreadsheet with interesting behavior: when N is small and we double it, the function looks linear (doubling the size doubles the function), but as we continue doubling N, the function looks quadratic (doubling the size quadruples the function). This example can teach us something interesting about complexity classes. Remind me at the start of the second AA lecture to show it.
----------
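Here is a minimal sketch of using these signatures: given measured times for problems of size N and 2N, the ratio T(2N)/T(N) suggests a complexity class. (The cutoff values below are rough guesses of my own, chosen to separate the ideal ratios 1, 2, 4, and 8; they are not part of the reading.)

  public class SignatureGuess {
      // Map the measured ratio T(2N)/T(N) to the closest signature.
      public static String guess (double tN, double t2N) {
          double ratio = t2N / tN;
          if (ratio <  1.5) return "O(1) or O(LogN)";   // ratio ~ 1
          if (ratio <  3.0) return "O(N) or O(NLogN)";  // ratio ~ 2 (or a bit more)
          if (ratio <  6.0) return "O(N^2)";            // ratio ~ 4
          if (ratio < 12.0) return "O(N^3)";            // ratio ~ 8
          return "a higher complexity class";
      }

      public static void main (String[] args) {
          System.out.println(guess(1.0, 2.1));  // prints: O(N) or O(NLogN)
          System.out.println(guess(1.0, 4.2));  // prints: O(N^2)
      }
  }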
We can use the complexity class of an algorithm to easily predict its running time as a function of N. For example, if we know Ta is O(N^2), then we know that Ta(N) ~ (actually <=) cN^2 for some constant c. The constant c represents the "technology" used: the language, compiler, machine speed, etc.; the N^2 (from O(N^2)) represents the "science/math" part. Now, given this information, we can time the algorithm for some large value N. Let's say for N = 10,000 (which is actually a pretty small N these days) we find that Ta(10,000) is 4 seconds.

First, if I asked you to estimate Ta(20,000), you'd immediately know it is about 16 seconds (doubling the input of an O(N^2) algorithm approximately increases the running time by a factor of 4).

Second, we can solve for c: we have Ta(N) ~ cN^2; substituting 10,000 for N and 4 for Ta(N), we have Ta(10,000) = 4 ~ c 10,000^2, so solving for c yields c ~ 4x10^(-8). By measuring the run-time of this code, we can calculate the constant "c", which incorporates all the technology (language, compiler, computer speed, etc.). Roughly, we can think of c as being the amount of time it takes to do one loop (# of instructions per loop / speed of executing instructions), where the algorithm requires N^2 loops to do all its work. Therefore, Ta(N) ~ 4x10^(-8) x N^2.

So, if asked to estimate the time to process 1,000,000 (10^6) values (100 times more than 10,000), we'd have

  Ta(10^6) ~ 4x10^(-8) x (10^6)^2
  Ta(10^6) ~ 4x10^(-8) x 10^12
  Ta(10^6) ~ 4x10^4, or about 40,000 seconds (about half a day)

Notice that solving a problem 100 times as big takes 10,000 (which is 100^2) times as long, as we would expect for an O(N^2) algorithm.

In fact, while we often analyze code to determine its complexity class, if we don't have the code (or find it too complicated to analyze), we can double the input sizes a few times and see whether we can "fit the resulting times" to any of the standard signatures, to estimate the complexity class of the algorithm. We should do this for some N that is as large as reasonable (taking some number of seconds to solve on the computer).

Note that for an O(2^N) algorithm, if we double the size of the problem from 100 to 200 values, the amount of time needed goes up by a factor of 2^100, which is ~ 1.3x10^30. Notice that adding just one more value to process doubles the time: this "exponential" time is the opposite of logarithmic time in terms of its growth rate: it grows incredibly quickly.

Note too that it is important to be able to analyze code like the following. Notice that the upper bound of the inner loop (i) is changed by the outer loop.

  for (int i = 0; i < N; i++)
      for (int j = 0; j < i; j++)
          ...body...

The body executes 0 times when i is 0, 1 time when i is 1, ..., and N-1 times when i is N-1: a total of 0 + 1 + 2 + ... + (N-1) = N(N-1)/2 executions. This is exactly the count claimed above for the sorting code, so dropping the lower term and the constant, such doubly nested loops are O(N^2).
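As a sanity check, here is a minimal sketch that counts how many times the inner-loop body executes and compares the count against the closed form N(N-1)/2:

  public class LoopCount {
      // Count executions of the inner-loop body for several values of N
      // and check the total against the formula N(N-1)/2 derived above.
      public static void main (String[] args) {
          for (int N = 10; N <= 10_000; N *= 10) {
              long count = 0;
              for (int i = 0; i < N; i++)
                  for (int j = 0; j < i; j++)
                      count++;                   // stands in for the loop body
              long formula = (long) N * (N - 1) / 2;
              System.out.println("N = " + N + "  count = " + count
                                 + "  N(N-1)/2 = " + formula);
          }
      }
  }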