ICS 32 Winter 2022, Notes and Examples: Test-Driven Development

What is test-driven development?

Test-driven development encourages you to build a program one small feature at a time, taking small steps from one piece of stable ground to another. The notion of "small feature" is open to debate, though a good guideline is to prefer features as simple as "The size of a newly-created collection of songs is zero" over features as complex as "A class to represent a collection of songs" or "A graphical user interface." The goal is to write a test that verifies the behavior of the new feature, then to write the code that implements the feature, using the test as a guide to indicate when you're done. At this point, you'll have a feature that is complete and tested, which means you've taken a step onto stable ground; more importantly, you have a test that you can keep until the feature's required behavior changes, which you'll be able to run repeatedly to ensure that your feature still works as you make other changes and add new features to your program. (Contrast this approach with the one you've taken as you've worked on your programs to date. With your current approach, how do you know that some part of your program is finished? How do you ensure that it continues to work correctly as you continue to make changes to your program? The answer, for most students, is some form of rote, mechanical testing and repeated re-testing.)

In lecture, we went through a step-by-step example as a group, developing portions of a SongCollection class using a test-driven development process. We did our best to follow all of the steps, though we sometimes forgot (or took liberties in the interest of time). Because it's so different from the programming style we're accustomed to, it takes a little time to adjust and get into the rhythm of test-driven development. But don't let the learning curve chase you away! It doesn't take long to get adjusted, and the benefits are higher-quality code — in terms of both how well it works and how well it's designed — and the ability to make changes to your program with confidence.

The steps in the test-driven development process are as follows.

Pick a new feature that you want to implement, preferring very simple features that can be verified with a single test. (It's not that you can't implement complex programs using test-driven development; it's just that you have to break them into simpler pieces. This is a good practice whether you're using a test-driven philosophy or not.)
Write a test. The test is intended to verify the behavior of a feature of one or more functions or classes that very likely haven't been created yet, which means you'll potentially be creating objects of classes that don't yet exist, or calling methods or functions that you haven't written. This may seem weird, but it's actually the whole point; pretend like the things you want have already been written. There are at least a couple of benefits to writing the test first:
- You won't need to guess whether your code works; the test will tell you when you've successfully implemented the feature.
- You've tested your design before you've ever implemented it. If you discover that the code that sets up the necessary objects and calls the method(s) you're testing seems more cumbersome than it needs to be, that is a very good indication that your design is probably more convoluted than it needs to be. Your design is at least as important as the code you write; a clean design ensures that your program will be understandable (to the original author and to others), as well as being maintainable and extensible as users request bug fixes and new features. These qualities should not be underestimated; programs in the "real world" often live a good deal longer than the original authors intend (and often stay in an organization long after the original author has moved on to greener pastures), and it's important to be able to introduce changes to a program without it falling down like a house of cards.
It's wise to start with very simple features and work your way up to the somewhat more complex ones, which is why we chose to begin by testing that the size of an empty collection of songs is zero.
Run the test, even though you know it will fail when you do. The point here is to get the tests to tell you what you're missing, rather than guessing at it. After running the tests and reading any messages associated with failure, you'll have a clear idea of what code needs to be added (or rewritten) in order to make the test succeed.
Write the minimum amount of code that will make the test pass, without worrying about whether the code you wrote will affect the next test you write or satisfy the next feature you plan to implement. This is a difficult habit to get yourself into at first, because it often necessitates writing code that works perfectly in the simple case you're testing, but clearly won't work later on. That's okay; you'll be able to write code for the more general case later, and will have all your old tests so that you can verify that the simpler cases, as well as all the other functionality you've already built, still work correctly after the change. The tests are not something you write and then throw away; you'll keep them for as long as you keep your program, so that any time you want to go back and make changes anywhere in the program, your tests will be available to verify that nothing else has been broken as a result.
Run the test again. Hopefully, it will pass, which means that your new feature is implemented! You've now reached stable ground. (With the approach you've been using so far, how often do you feel like you're on stable ground?) If the test fails, that's okay; go back to the previous step and work on the code some more and try again. The tests will tell you when you're successful.
Now that you have your new feature implemented, see whether there are any ways to improve the design of the code. (We're looking for what are often called "code smells": places where the design could be made better.) Have you duplicated code from another part of the class (or from some other class)? Did the code you just added render older code useless? If so, fix the problems now, running the tests after each small change. (There's a name for this process; it's called refactoring.) You can make changes with confidence, because your tests provide a valuable safety net; if some change you've made breaks code that once worked, your tests will tell you so immediately, so you can work on the new problem while the change you just made is still fresh in your mind.
Now start this process again with another feature. Continue this until you believe that all of the features of your program are implemented.

After going through one iteration of this process, you'll have added one new feature to your program, verified that the feature works as expected, and cleaned up any brewing design problems before they become significantly bigger problems later. Each subsequent iteration adds new functionality, while verifiably preserving old functionality. Meanwhile, your design will likely need to be pretty clean — unit testing demands a design in which the individual pieces are broken down and know as little as possible about one another, which is a good goal — and the tests will form a lasting record of your understanding of how your code is supposed to work.
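
To make one trip through these steps concrete, here is a minimal sketch of what the very first iteration might look like for the feature "the size of a newly-created collection of songs is zero." The class name SongCollection and the method name size() come from our lecture example; everything else (the test's name, the decision to put the class and the test in one file) is an assumption made only to keep the sketch self-contained. If you wrote only the test and ran it, you'd see a NameError because SongCollection doesn't exist yet; after adding an empty class, you'd see an AttributeError because it has no size() method. Those failures tell you exactly what to write next.

```python
import unittest


class SongCollection:
    """Hypothetical first cut: just enough code to pass the one test below."""

    def size(self) -> int:
        # Deliberately naive. This is the minimum that makes the test pass;
        # a later test (say, "size is 1 after adding one song") would force
        # us to replace it with something more general.
        return 0


class SongCollectionTests(unittest.TestCase):
    def test_newly_created_collection_has_size_zero(self):
        songs = SongCollection()
        self.assertEqual(songs.size(), 0)
```

Returning a hard-coded 0 looks silly, but it's an honest answer to the only question we've asked so far. The next test we write will make that answer insufficient, and the old test will still be there to confirm that empty collections keep working after we generalize.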

Test-driven development is most likely very different from what you've done in the past, but it leads to a very different kind of result, too.

What is unit testing?

Unit testing is one kind of testing that you might perform on a program you're writing, with the goal of verifying that small, individual pieces of its behavior are correct, independent of the effects of the other pieces around them. We focus our attention not just on individual modules in a Python program, but on individual behaviors; more so than just individual functions, we focus on each way that the functions may behave (i.e., there are usually multiple unit tests that contribute to the testing of one function).

What tools do we need?

Performing unit testing is a valuable thing to be able to do; with it, we can gain a level of confidence in the quality of code we write that is harder to achieve without it. But how do we actually do it?

One way is to start a Python shell, load a module into it, and then start running our tests manually, by typing them in and looking at the output. One nice thing about Python is that the Python shell gives us a tool for this kind of thing; we don't need to write a full-fledged program to see the output of individual functions. However, this should nonetheless strike you as a poor choice. It's boring, tedious work — typing in some expressions, then verifying that the output is what we expected.

People tend to be bad at doing boring, tedious work, which means they're likelier to make mistakes, especially when they've been doing it for a while. This means we're likelier to end up having bugs in our code that we don't notice, or going to look for bugs that aren't actually there.
Every time we make a change to the code that we tested, we'll need to re-run all of the tests. (Have you ever made a change to your program to fix one problem, but that fix introduced a problem somewhere else? If so, you know why re-running all of the tests is so important.) This is going to leave us reluctant to make changes, but we can't have that reluctance if we want to get quality software development done; change is inevitable and necessary.

But the nice thing about boring, tedious work is that it tends to be the kind of work that is most amenable to automation. We should be able to write programs that test our programs for us! Then, any time we want to re-test everything, all we need to do is run our test program and see what happens.

A unit testing framework is a library that helps us to write programs like this. The Python standard library includes one, which is called unittest. It handles a few of the more repetitive chores for us:

Individual tests are simply methods in a class. If we name them in a particular way, unittest will find them for us automatically and run each one, so that all we have to do to add new tests is to add methods to our class.
Kinds of tests can be grouped into modules. If we name the modules in a particular way, unittest will find all of them for us automatically, so we can run all of the tests covering our entire program in one go, even if our program consists of multiple modules (and multiple test modules).
If we use unittest's tools for comparing our output to what is expected, unittest will print its output in a way that makes clear the most important things about each failure: What was attempted, what was expected, and what happened instead. That will help us to understand what's broken, so we have a better chance of finding and fixing the problem.
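
The three chores listed above can be seen in a short sketch. The function format_duration() is hypothetical (it isn't part of our lecture example); it's here only so that there's something small to test. Notice that one function gets several tests, one per behavior we care about, and that each test is a method whose name begins with "test" in a class derived from unittest.TestCase.

```python
import unittest


def format_duration(total_seconds: int) -> str:
    """Hypothetical function under test: turns 205 seconds into '3:25'."""
    minutes, seconds = divmod(total_seconds, 60)
    return f"{minutes}:{seconds:02d}"


class FormatDurationTests(unittest.TestCase):
    # unittest finds every method whose name starts with "test" and runs
    # each one separately, so adding a test means adding a method.
    def test_zero_seconds(self):
        self.assertEqual(format_duration(0), "0:00")

    def test_seconds_are_zero_padded(self):
        self.assertEqual(format_duration(5), "0:05")

    def test_whole_minutes_and_seconds(self):
        self.assertEqual(format_duration(205), "3:25")
```

If this were saved in a file whose name starts with "test" (say, test_durations.py, a name I've made up), then running python -m unittest from that directory would discover the module and run all three tests. If one of the assertEqual() calls failed, the report would name the failing test method and show both the value we got and the value we expected, which is usually enough to point you toward the problem.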

A step-by-step example of test-driven development

In lecture, we worked through several iterations of a test-driven development process, where we wrote portions of two classes we called SongCollection and Song, starting with nothing and using tests to drive our decision-making. We used the unittest module in the Python standard library to write our tests. While it took us most of a lecture to get that code written and tested, that was mainly because I was describing a set of techniques that I expected to be new to you. In practice, each of those iterations would have likely taken no more than a few minutes; if it was me working on my own, I'd have finished the simplest of them in something more like 30-45 seconds, though they aren't usually that simple, of course.

As promised in lecture, I'm providing a step-by-step account of what we did and why we did it. While it's possible that this won't be identical to what we did in lecture — this example tends to turn out a little differently every time I do it — this will certainly capture the spirit of what we were doing, and the "why" is much more important here than the "what."

Step-by-Step Example

What if I still discover a bug?

We didn't talk in lecture about what should be done if you discover a bug in your program, even if you've faithfully adhered to a test-driven strategy. Naturally, using a test-driven development process does not guarantee that a program will work, for a variety of reasons, even if you have no failing unit tests. Following this process allows the tests to help you avoid many mistakes, but there are many other aspects of software development that this process doesn't do much to improve. First of all, your program only works as well as your tests say it will; if one of your tests expects behavior that is incorrect (e.g., the size of an empty collection is 1) and you write code that passes the test, that doesn't mean that the code makes sense in a broader context. Similarly, tests can't verify that the program's requirements are appropriate; if you are tasked with building software that won't meet the business needs of your customer, tests won't help you identify the issue. In short, testing helps verify that a program is correct, but the notion of "correct" often isn't a black-and-white one.

So, unfortunately, there will still be bugs. The question is what should be done when you discover one. The following steps can guide you through your bug-fixing:

Write a test that reproduces the bug and asserts that the unintended behavior shouldn't happen. This step is critical, because it will provide you with a way of being sure that you've actually fixed the bug.
Run the test to verify that it fails because of the bug. If it doesn't, you haven't isolated the problem, so you'll need to go back and write a better test.
Find and fix the bug as you would normally. (If you find that you need to add new features to your program in order to fix the bug, follow the set of steps described above for adding them carefully, writing tests first, writing the minimum amounts of code needed to make them work, and so on.)
Run all of the tests to verify that the bug is fixed and that all of the other tests still pass, as well.

Now you can have confidence that you've not only fixed the problem, but also haven't broken anything else that previously worked. You'll again reach stable ground quickly, and you'll have assurance that you'll know if this bug ever resurfaces; your new test would then start failing again.
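
As an illustration of those bug-fixing steps, suppose (hypothetically; this is not a bug we actually had in lecture) that contains() had been written so that it returned True for any song as soon as the collection was non-empty. The sketch below shows the test you'd write first to reproduce the problem, along with the corrected method. The Song fields and the internal list named _songs are assumptions made for the sake of a self-contained example.

```python
import unittest
from dataclasses import dataclass


@dataclass(frozen=True)
class Song:
    # Assumed fields; the real Song class may look different.
    title: str
    artist: str


class SongCollection:
    def __init__(self):
        self._songs = []

    def add(self, song: Song) -> None:
        self._songs.append(song)

    def contains(self, song: Song) -> bool:
        # Hypothetical buggy version was: return len(self._songs) > 0
        # The fix checks for the specific song instead.
        return song in self._songs


class ContainsRegressionTests(unittest.TestCase):
    def test_contains_is_false_for_song_that_was_never_added(self):
        songs = SongCollection()
        songs.add(Song("Hypothetical Title", "Hypothetical Artist"))
        self.assertFalse(songs.contains(Song("Another Title", "Another Artist")))
```

With the buggy version of contains() in place, this test fails, which confirms that it has isolated the problem. After the fix, it passes, and it stays in the test module from then on as a guard against the same bug creeping back in.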

Testing side effects

Where test-driven development excels most is in testing functions that are pure. Pure functions are those that take inputs and give outputs that are calculated only from those inputs; they're like mathematical functions, in the sense that they always return the same outputs given the same inputs. As you might imagine, these are a lot easier to test than the alternative, because there's no need to think about doing things in a particular sequence, or to worry that the behavior of one function will have affected the outcome of another.

However, functions do quite often have side effects, so it's reasonable to wonder how you might test them. Side effects are anything other than calculating a result from the inputs, which can include printing output to the Python shell, reading input from the keyboard, drawing graphics, writing to files, playing sounds, or even just adding a value to a list. Even the add() method in our SongCollection class had a side effect, because it took the Song object we gave it and added it to a list, which affected the result of subsequently-called methods on that SongCollection.
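
Here is a rough sketch of why add() counts as having a side effect. The method returns nothing useful, so there's no output to compare against; what changes is the state of the collection, and that change shows up only in what size() and contains() say afterward. The method names come from our lecture example, while the internal list named _songs and the use of plain strings in place of Song objects are simplifications I've assumed here.

```python
import unittest


class SongCollection:
    def __init__(self):
        self._songs = []  # assumed internal representation

    def add(self, song) -> None:
        # Side effect: the collection itself is changed; nothing is returned.
        self._songs.append(song)

    def size(self) -> int:
        return len(self._songs)

    def contains(self, song) -> bool:
        return song in self._songs


class AddTests(unittest.TestCase):
    def test_size_is_one_after_adding_one_song(self):
        songs = SongCollection()
        songs.add("Hypothetical Song")  # a string stands in for a Song object
        self.assertEqual(songs.size(), 1)

    def test_collection_contains_song_after_adding_it(self):
        songs = SongCollection()
        songs.add("Hypothetical Song")
        self.assertTrue(songs.contains("Hypothetical Song"))
```

Each test builds its own fresh SongCollection, so the side effects in one test can't leak into another; that keeps the order in which the tests run from mattering.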

So, suffice it to say, we can't avoid writing functions with side effects, which means we need to consider how we might write unit tests for them. How you do it requires a two-pronged approach.

Isolate the part of the problem that requires side effects as best you can. For example, rather than writing a function that takes a path to a file and then returns, say, the first word from every line, you'd write separate functions. One function might open the file and read its text, without manipulating it at all. A separate function might take that text and return the first word from every line. The first of these functions can't easily be unit-tested, but, on the other hand, it's simple enough that there's relatively little that can go wrong with it. The second function, on the other hand, is pure; it can be tested using the techniques we've seen, and it's likelier to be the source of problems, anyway. (If it bothers you that you need to read the entire file into memory so that the second function can process it, note that there are features we've not seen in Python — such as generators — that can clean this issue up, allowing one function to process lines of input while another function is producing them one-by-one.)
When you have a side effect that you want to test, you can often test it by checking what happened afterward. Specifically how you do it depends on what the side effect is.
- In the case of our add() method in SongCollection, we did that by considering its effect on subsequent calls to size() or contains(); the question wasn't "What is the output of add()?", so much as it was "What effect does add() have on a subsequent call to size() or contains()?"; we wrote tests to answer the latter kind of question, as opposed to the former.
- In the case of, say, code that prints text to the Python shell, that's a different story, but not impossible to test. There are ways to redirect shell output to other places, which would allow you to capture that output and compare it to expectations. You could also avoid using the print() function altogether, instead calling a function of your own, then make it possible for that function to be configured to write output to one of two places (e.g., adding strings to a list for testing purposes, or printing them to the shell for display purposes). This is all in the general spirit of something we've been talking about all quarter: keeping separate things separate. Automating testing requires this point of view, but it's a good point of view, anyway. (As it turns out, test-driven development isn't only about testing; it's about putting yourself into a frame of mind where you'll naturally make better design decisions, with the tests themselves as an ancillary, albeit important, benefit.)
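
The two prongs above can be sketched in a few lines. All of the function names here (read_text, first_words, report) are hypothetical, as is my decision to skip blank lines, since a blank line has no first word. The first function does nothing but the side effect of reading a file; the second is pure and gets the tests. The report() function shows both ways of testing printed output: handing it a different place to write to, and redirecting standard output using contextlib.redirect_stdout from the standard library.

```python
import contextlib
import io
import unittest
from pathlib import Path


def read_text(path: Path) -> str:
    """Side-effecting part: reads the file and does nothing else."""
    return path.read_text()


def first_words(text: str) -> list[str]:
    """Pure part: the first word of each non-blank line (an assumed rule)."""
    return [line.split()[0] for line in text.splitlines() if line.strip()]


def report(words, write=print) -> None:
    """Sends each word to 'write', which is print() unless told otherwise."""
    for word in words:
        write(word)


class FirstWordsTests(unittest.TestCase):
    def test_first_word_of_each_line(self):
        self.assertEqual(first_words("hello there\nboo is happy\n"), ["hello", "boo"])

    def test_blank_lines_are_skipped(self):
        self.assertEqual(first_words("one two\n\nthree four"), ["one", "three"])


class ReportTests(unittest.TestCase):
    def test_report_can_collect_into_a_list(self):
        collected = []
        report(["a", "b"], write=collected.append)
        self.assertEqual(collected, ["a", "b"])

    def test_report_prints_one_word_per_line(self):
        captured = io.StringIO()
        with contextlib.redirect_stdout(captured):
            report(["a", "b"])
        self.assertEqual(captured.getvalue(), "a\nb\n")
```

In the real program, you'd combine the pieces with something like report(first_words(read_text(some_path))), and only that one thin line would go untested.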

Additional thoughts

Give this process a genuine try when you work on Project #4, even if it feels less productive — or just plain strange — when compared to your usual strategy for writing your programs. Trust me; for the kind of thing you're building in Project #4 (particularly the game mechanics), if you can get yourself into a rhythm, you will find yourself writing higher-quality code more quickly, with fewer mistakes early on and less debugging to do at the end. As we learned from our experience in lecture, test-driven development works very nicely with pair programming. I sometimes made mistakes in my haste to get code written while still explaining everything to you, but with you folks working collectively as my "partner," we ended up with virtually no mistakes that lasted longer than a few seconds.

You'll definitely find, though, that not all kinds of programs lend themselves to these techniques. For example, some of the graphical portions of Project #5 will probably not be easily testable this way; it's not so simple to write a unit test that demonstrates that the image drawn in a PyGame window is precisely the right image. But to the extent that you can separate this code a bit — the way we did in our PyGame examples in lecture, where we had most of the interesting decisions made in a "model" module (separate from our "view") — you'll find that substantial portions of it might be very testable, even if the outermost layer that talks to PyGame is not.
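
As a rough illustration of that separation, here is a tiny "model" that knows nothing about PyGame. The class PlayerModel, its fields, and the idea of positions running from 0.0 to 1.0 are all hypothetical, invented for this sketch rather than taken from any project; the point is only that the decision-making lives in code that can be tested without opening a window.

```python
import unittest


class PlayerModel:
    """Hypothetical model class: no drawing, no PyGame, just decisions."""

    def __init__(self, x: float, min_x: float = 0.0, max_x: float = 1.0):
        self.x = x
        self._min_x = min_x
        self._max_x = max_x

    def move(self, delta: float) -> None:
        # Clamp the new position so the player never leaves the field.
        self.x = max(self._min_x, min(self._max_x, self.x + delta))


class PlayerModelTests(unittest.TestCase):
    def test_move_changes_position(self):
        player = PlayerModel(x=0.5)
        player.move(0.25)
        self.assertEqual(player.x, 0.75)

    def test_cannot_move_past_right_edge(self):
        player = PlayerModel(x=0.9)
        player.move(0.5)
        self.assertEqual(player.x, 1.0)

    def test_cannot_move_past_left_edge(self):
        player = PlayerModel(x=0.1)
        player.move(-0.5)
        self.assertEqual(player.x, 0.0)
```

The "view" would then read player.x and draw accordingly; that drawing code stays hard to unit test, but there's very little left in it that can go wrong.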

Above all, have fun! Developing software should be an exciting, enjoyable, and stimulating experience. Test-driven development, when used appropriately, can take away a good deal of the frustration involved, allowing you to concentrate on understanding the problem and constructing a clean solution for it. It's not a silver bullet — nothing in software is — but it is nonetheless a wonderfully useful technique to have under your belt.