What's to submit:
Note that there's many optional points in the lab which you can explore. All these can earn you bonus points.
This lab has 2 parts. First you'll generalize the binary search algorithm you wrote in lab2 into "trinary" search (and optionally you can generalize that to "n-ary" search for any n=2,3,...). Second, you'll time all these search algorithms, plot their average running times and compare these to the theoretical analysis of the (worst-case) running time of these n-ary search algorithms.
The point of this lab is that you learn how to:
Modify the binary search algorithm in lab2 so that it does a "trinary" search instead. Recall that the binary search examines the value at the middle m = (f+i)/2 = i + (f-i)/2 of the searched-for interval [i,f] (of ordered values between some initial point i and the final point f), and then recursively calls the binary search on either the [i,m] or the [m,f] half of this [i,f] inverval. The base case of the binary search algorithm is this: If the searched-for interval is so small that it cannot be split into two smaller fragments, i.e. when f-i < 2, then the algorithm "manually" examines each of the values in the interval, i.e. simply the value at point i and the value at point f. (Note that if f-i < 2 then we either have f=i or f=i+1.)
A *trinary* search algorithm searches such [i,f] interval by examining the values at 1/3 and the 2/3 of the interval, i.e. by examining the values at point m1 = i + 1/3*(f-i) and m2 = i + 2/3*(f-i), and then, depending on how do these values compare with the searched-for value, the trinary search algorithm recursively calls the trinary search on *either* the [i,m1] interval, or the [m1,m2] interval, or the [m2,f] interval. The base case of the trinary search algorithm is similar to that of the binary search: If the searched-for interval is so small that it cannot be split into even smaller fragments, then the algorithm examines the remaning values one-by-one.
As you generalize the binary search to trinary one, these are the things you need to think about:
[Optionally, you can generalize this even further, and code a general "n-ary search" algorithm which given an argument n does the search by splitting the searched-for interval into n fragments and searching them recursively via the "n-ary search". (It might be that the base case of the n-ary search will have to depend on n, but maybe that will not be necessary.]
Do the (theoretical) analysis of the (worst-case) running time of the binary search *and* the trinary search algorithms. We want the running time to be expressed in terms of just one variable: n, the *size* of the searched interval. Note that in the case of the savings calculator application in lab2, the size n of the searched interval is equal to n = 100t/m where t is the target amount in cents, and m is the the number of months. In the running time analysis consider m, the number of months, as a constant. For example, set m = 60, which corresponds to dealing with 5-year saving plans. (Note also that the running time of all these algorithmsdoes not depend on the interest rate r.) The analysis will give you some running time functions, T_{bs}(n) and T_{ts} (n), for the binary search and the trinary search, correspondingly. What is the O() notation for both of these functions?
Compare the running times of the binary and the trinary search algorithms. Are they asymptotically the same or one is larger than the other? Even if they are the same in the O() sense, i.e. if T_{bs}(n) = O(T_{ts}(n)) and also T_{ts}(n) = O(T_{bs} (n)), try to think about the constants in fhe functions themselves: Can you hypothesize which function will dominate the other?
[Optionally, you can check what's the asymptotic behaviour of the running time function of general n-ary search, and compare it to the others. What's the optimal value of n?]
[Optionally, you can consider both m and n the variables for the running time function. In the book we talked about the running-time function as only a function T(n) of one single variable n that represents the "size of the problem", but one often wants to express the runnning time as a function of more than just one variable. In the case of the savings calculator application, it'd be convient to consider its running time as a function of two variables, n and m, i.e. some function T(m,n). The definition of the O() notation for the two-variable function is very similar to the single-variable definition: We say that T(m,n) is O(F(m,n)) for some other function F(m,n) if there exist a constant c>0 and two points m0 and n0 s.t. for all m>m0 and all n>n0 we have that T(m,n) <= c * F(m,n). When you do the running-time experiments in the second part of the lab (see below), you can gather the information also as a function of these two variables, m and n, and compare such results to the O() expression for the T(m,n) function you derive here using pen and paper.]
First you need to implement the timer. Use an appropriate timer class from the java Class library to clock the time before and after your search algorithm runs. We are interested in the running time of your implementation as a function of n, where n = 100t/m, where m is fixed, as we said above, to m=60. [Of course optionally you can explore the running time of your algorithm as a function of both n and m.] Since n is not an explicit variable which your algorithm takes as an input, this means that in order to test the time of the algorithm as a function of n you need to run the algorithm on input t = n*m / 100. (Recall that t denotes the target amount.)
Moreover, we want to examine the average running time, as a function of n, and not just a running time of a single instance of your algorithm for a particular interest rate r. (Once n and m are fixed, the only other input that determines how your code is executed will be the interest rate r.) How can we do this? Simply, for every n, you should examine the average running time of your algorithm by averaging the times observed in k executions of your code, each one executing with the same parameter n. It's up to you to determine the useful value of k, but you can start with k=100. In each of these k instances you should run the algorithm with the same n but you should vary the r parameter. For example you can take say that the i-th execution should run with r = 0.05 + 0.01 * i/k, which will effectively range r between 5% and 6%. Or you can assign in each instance r using a random number generator. (But make sure that it falls into some reasonable region, definitely different than 0 or 1, because if r is either 0 or 1 your code might exhibit very non-average behaviour...)
The result of all the above should be a procedure, for example called (binary or trinary search) test , which takes parameters k and n, and outputs the average running time of k instances of the (binary or trinary) search algorithm, each of which executes on the same input size n (and each of which runs on some average-looking interest rate r and on m=60).
Now, to analyze them, gather your results in an array of results for some realistic n values. Pick the n values in the array so that you can easily observe the dependance between the (average) running time and these values. For example, if you have good reasons to believe that the running time of your algorithm is a linear function of n, you will want your n values spread linearly, e.g. n_{i} = a * i for some constant a. If, on the other hand, you think that your running time is logarithmtic, then the running times will be easier to observe if you observe the results for n's spread exponentially, e.g. n_{i} = 2^{i} or n_{i} = 10^{i} . In any event, you should program a procedure which iterates over n values in an array of test cases, runs the above test for each of them, and gathers the outputs in an array of outputs. Then you can populate the array of test cases with linearly-spaced n's and see if the results are informative. If not, populate the array of test cases with exponentially-spaced n's and rerun the process. Note that if you start with too-big values, your code will run forever. If on the other hand you use too-small values, the results will be random-looking and hard to interpret. This is exactly why you need to program the *procedures* that take any array of test cases and runs any number k of tests on each. Using these procedures you can easily then change the k parameter and the test inputs and see what they need to be in order for you to get observable and easy to interpret results!
Finally, once you get some easy-to-interprest results, plot them on a graph. On the x-axis there should be the different values of n, and on the y-axis there should be the average running time of the algorithm for a given n. Your solutions can contain two such plots: One for the binary and one for the trinary algorithms. The two plots should use the same test cases n. Even better, you can draw a single plot, with two lines (or curves): In one color the line representing the average running times of the binary search algorithm, and in the other color the same for the trinary search.
For the code part, you should submit a code you used to test organized as I'm suggesting above, and with a description of how it can be used to do the testing: Where do you set the k value, where do you populate the test cases. The results should be displayed in some easy to read way, but they do not need to be plotted on the screen.
Answer the following questions:
[Optional idea: You can use this experiment to estimate the cost of the arithmetic operations compared to the cost of the other operations (like comparisons, look-ups, variable assignments, and everything else this code does). You can see this by replacing floats with doubles (or vice versa) in the interest-rate operations in the amount-saved procedure, and observing whether the running time changes by much. If you can, maybe you can do all the operations on the integers (it'd be sligtly tricky but it might be possible). You can test if this speeds it up.]