Introduction | In this lecture, we will discuss a variety of general issues in program simplification. This lecture introduces no new Python programming features; instead, it steps back to provide a perspective on one very important programming skill that all competent programmers must acquire. Sometimes simplifying a program is the only way to comprehend it, and hence debug it. Finally, this lecture shows that programs themselves can be studied and manipulated in a formal way (just as expressions are manipulated in algebra). |
Simplification |
We can use the laws of algebra to tell whether two forms are equivalent: using
either one produces the same result.
Thus, equivalence is a mathematical topic.
But as programmers, we must judge which form (the simpler one) to use in our
programs.
The simplest program is the one that is easiest to read, debug, and maintain.
Thus, simplicity is a psychological topic.
As a rule of thumb, smaller forms are often easier to understand (although sometimes a bit of redundancy makes forms easier to understand: smtms lss s nt mr).
In this section we will examine two kinds of algebras for proving equivalences: Boolean Algebra and Relational Algebra.
|
Boolean Algebra | Boolean Algebra concerns equivalences involving the bool type and the logical operators. The following is a list of useful laws (theorems, if you will) of Boolean Algebra. The most practical laws are DeMorgan's laws: one form explains how to simplify the negation of a conjunction (and) and the other form explains how to simplify the negation of a disjunction (or). |
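Here is a representative sample of such laws, written with Python's logical operators (a sketch only, not an exhaustive list; a, b, and c stand for arbitrary bool values):

  # Some useful laws of Boolean Algebra; in each line the two forms are equivalent:
  #   a and True       is equivalent to   a                        (identity)
  #   a or False       is equivalent to   a                        (identity)
  #   a and a          is equivalent to   a                        (idempotence)
  #   a and b          is equivalent to   b and a                  (commutativity)
  #   a and (b or c)   is equivalent to   (a and b) or (a and c)   (distributivity)
  #   not (a and b)    is equivalent to   (not a) or (not b)       (DeMorgan, conjunctive form)
  #   not (a or b)     is equivalent to   (not a) and (not b)      (DeMorgan, disjunctive form)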
We can easily prove laws in Boolean Algebra by trying every combination of
the True and False values for each of the variables.
For example, to prove the conjunctive version of DeMorgan's law, we can start
with the following table of values, called a truth table.
Here we list all the variables in the leftmost columns, and the two
expressions (which we hope to prove equivalent) in the rightmost columns.
We then fill in each column by just computing the values of the expressions (using the semantics of the operators and our knowledge of evaluating expressions: e.g., operator precedence and associativity).
The law is proved if the columns under the two expressions always contain the same pair of values on each line.
This means that for every pair of operands, the expressions compute the same result, so the expressions are equivalent and thus interchangeable in our code.
This approach also illustrates a divide and conquer strategy to proofs: we divide the complicated proof into four different parts (each a line in the truth table); each line is easy to verify by pure calculation; once we verify all of the lines, we have verified the entire proof. |
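This kind of proof is easy to automate. The following sketch checks the conjunctive form of DeMorgan's law by computing and comparing both expressions on every line of the truth table:

  # Check DeMorgan's conjunctive law: not (a and b) is equivalent to (not a) or (not b)
  for a in (False, True):
      for b in (False, True):
          left  = not (a and b)
          right = (not a) or (not b)
          print(a, b, left, right)    # prints one line of the truth table
          assert left == right        # the two rightmost columns must match on every line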
Relational Algebra |
Relational Algebra concerns equivalences mostly involving the int and
float types and the numeric and relational operators.
It is based on the law of trichotomy: for any two values x and y, exactly one of x<y, x==y, or x>y is True.
As a first example, we will prove that max(x,y)+1 is equivalent to max(x+1,y+1). So, we can factor additive constants out of calls to the max function.
Here, knowing the relationship between x and y allows us to compute the answer for each expression.
For example, if we know that x<y, we know max(x,y) evaluates to y, so max(x,y)+1 evaluates to y+1; likewise, if we know that x<y, we know that x+1<y+1, so max(x+1,y+1) evaluates to y+1.
The other two cases can be verified similarly.
Again, the law is proved if the columns under the two expressions always have the same pair of values on each line.
This means that for every pair of values, the expressions compute the same result, so the expressions are interchangeable.
As another example, we can prove that the expression (x<0) == (y<0) is equivalent to (x<0 and y<0) or (x>=0 and y>=0), which is True when x and y have the same sign (assuming the sign of 0 is considered positive); note that the parentheses in the first expression matter, because without them Python would treat x<0 == y<0 as the chained comparison x<0 and 0==y and y<0.
Here we need to list all possibilities of how x and y compare to 0.
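Both equivalences in this subsection can also be spot-checked in Python, using sample values chosen to cover every case; this is only a quick sanity check to supplement the proof, not a replacement for it:

  # Sample pairs chosen so that x<y, x==y, and x>y all occur, as do all sign combinations
  samples = [(-3, 5), (4, 4), (7, -2), (-1, -6), (0, 0), (0, 8), (9, 0)]
  for x, y in samples:
      # factoring an additive constant out of a call to max
      assert max(x, y) + 1 == max(x + 1, y + 1)
      # the "same sign" equivalence (the sign of 0 is considered positive)
      assert ((x < 0) == (y < 0)) == ((x < 0 and y < 0) or (x >= 0 and y >= 0))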
So again, the equivalence is proven.
Here is an interesting case where the smaller/simpler/more efficient code is not necessarily easier to understand.
Most students believe the larger expression is easier to understand, until they intensely study the meaning of == when applied to boolean values generated by relational operators.
In actual code, I would suggest using the simpler form, and then including the larger (easier to understand) form in a comment; we could even include the above proof as part of the comment.
Finally, we can use the law of trichotomy to prove how to simplify the negation of relational operators.
Be careful when you do so, and recall the law of trichotomy: if it is not true that x is less than y, then there are two possibilities left: x is equal to y or x is greater than y.
Beginning programmers make the mistake of negating x<y as x>y, but the correct equivalent expression is x>=y.
In fact, we can use the same kinds of proofs to show how to simplify the negation of every relational operator.
Also notice that each of the relational operators is equivalent to an expression that includes only the < relational operator (along with some logical operators).
Although we don't need the five other relational operators to write Python programs (in a mathematical sense), they are provided because most programmers are aware of them and can use them to write simpler programs (in a psychological sense).
Thus the Python language is made larger (more operators) to make it easier for human minds to use.
Such value judgements (which is better, a smaller language or a language easier for humans to use) are required by programming language designers.
Finally, we can often simplify negated relational operators that arise from DeMorgan's Law simplifications.
For example, we can "simplify" not (x < 0 or y > 10) to not (x < 0) and not (y > 10), which is actually a bit larger, but can be further simplified to x >= 0 and y <= 10, which is truly simpler. |
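The negation simplifications, and this final example, can be summarized and spot-checked with a small sketch (again only a sanity check over a range of sample values):

  # Negating each relational operator (in each line the two forms are equivalent):
  #   not (x <  y)   is   x >= y
  #   not (x <= y)   is   x >  y
  #   not (x == y)   is   x != y
  #   not (x != y)   is   x == y
  #   not (x >  y)   is   x <= y
  #   not (x >= y)   is   x <  y
  # Check the DeMorgan example over a range of values around the constants 0 and 10
  for x in range(-5, 6):
      for y in range(5, 16):
          assert (not (x < 0 or y > 10)) == (x >= 0 and y <= 10)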
Pragmatics |
The formal techniques presented and illustrated above allow us to prove that
two programming forms are equivalent.
It is then up to us to determine which form is simpler, and use it in our
programs.
As a rule of thumb, the smaller the code, the simpler it is.
Also, code with fewer nested statements is generally simpler.
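For example (a hypothetical snippet with made-up sample values), nested if statements whose bodies do nothing else can be flattened by combining their conditions with and:

  x, y = 3, 7    # sample values

  # Nested form: two levels of nesting
  if x > 0:
      if y > 0:
          print('both are positive')

  # Equivalent flatter form: one level of nesting
  if x > 0 and y > 0:
      print('both are positive')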
More generally, we should try to distribute complexity: when deciding between two equivalences, choose the one whose most complicated statement is simplest.
If you ever find yourself duplicating code, there is an excellent chance that some simplification will remove this redundancy (see the small sketch at the end of this section).
Beginners are especially prone to duplicating large chunks of code, missing the important simplification.
We should aggressively simplify our code while we are programming.
We will be amply rewarded, because it is easier to add more code (completing the phases of the enhancements) to an already simplified program.
If we wait until the program is finished before simplifying...well, we may never finish the program because it has become too complex; if we do finish, the context in which to perform each simplification will be much bigger and more complex, making it harder to simplify.
Excessive complexity is one of the biggest problems that a software engineer faces.
Generally, I try not to get distracted when I am writing code; but one of the few times that I will stop writing code is when I see a simplification.
I know that in the long run, taking time to do a simplification immediately will likely allow me to finish the program faster. |
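Here is a small hypothetical illustration of removing duplicated code: both branches below repeat the same print call, and the duplication disappears once we compute the value first and print it once:

  x = -4    # sample value

  # Before: the print call is duplicated in both branches
  if x < 0:
      print('magnitude =', -x)
  else:
      print('magnitude =', x)

  # After: compute the value once, then print it once
  magnitude = -x if x < 0 else x
  print('magnitude =', magnitude)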
Problem Set |
To ensure that you understand all the material in this lecture, please solve
the announced problems after you read the lecture.
If you get stumped on any problem, go back and read the relevant part of the lecture. If you still have questions, please get help from the Instructor, a TA, or any other student.
|