ICS 65 Fall 2011
Project #2: The Memory of Trees

Due date and time: Friday, November 4, 11:59pm


Introduction

In lecture, we've been discussing memory management issues and how to make your classes "well-behaved." Remember that in C++, classes specify user-defined data types that, unlike in Java, are in many ways functionally equivalent to the built-in types like int. In other words, there's no broad distinction between the handling of "primitive types" like int and that of class types like BinarySearchTree. That means that, just like an int, a BinarySearchTree can be allocated and deallocated as a local variable, a global variable, a static or non-static member variable of a class, or dynamically allocated on and deallocated from the heap using new and delete. A BinarySearchTree can also be copied with an assignment statement, passed as a parameter by value (meaning the contents of the tree should be copied) or by reference, and constructed as a copy of an existing one. Some member functions can be called on a const BinarySearchTree and some can't. In short, there is no fundamental difference between the capabilities of an int and those of a properly-designed BinarySearchTree, other than the obvious difference in terms of what data they store and what operations they can perform.
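As a small illustration of what "behaving like an int" means, here is a minimal sketch. The Scores class and the two functions are hypothetical names invented for this example; they are not part of the project.

```cpp
#include <vector>

// Hypothetical class, used only to illustrate value semantics.
class Scores {
public:
    void add(int score) { values.push_back(score); }

    // const member function: may be called on a const Scores.
    int count() const { return static_cast<int>(values.size()); }

private:
    // vector manages its own memory, so the compiler-generated
    // copy constructor and assignment operator are sufficient here.
    std::vector<int> values;
};

// Pass by value: the parameter is a separate copy of the caller's object.
int countAfterAdding(Scores copy, int score) {
    copy.add(score);
    return copy.count();
}

// Pass by const reference: no copy is made, and only const member
// functions may be called on the parameter.
int countOf(const Scores& scores) {
    return scores.count();
}
```

Calling countAfterAdding leaves the caller's object unchanged, just as passing an int by value would. A class that manages its own nodes with new and delete does not get this behavior for free, which is why the rest of this project matters.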

As is often the case in C++, this provides a great deal of flexibility at the cost of complexity. We talked extensively in class about constructors, destructors, copy constructors, overloaded assignment operators, const and non-const member functions, and how they allow you to build "well-behaved" classes. Now I'd like you to have some practice writing such a class, with an emphasis on learning how to handle managing memory for your own linked data structure. Additionally, I'd like you to be exposed to the clarity and simplicity of the Standard C++ Library by using the vector class.


Maps and multimaps

When you took a data structures course (ICS 22 / CSE 22, Informatics 42, or something equivalent elsewhere), you probably learned about a data structure called a map (sometimes called a table or a dictionary). A map is an indexed set of associations (sometimes called entries), in which each association contains a key that uniquely identifies it and a value, which consists of the remaining data in the entry. For example, you could imagine a student identification system in which there was a map of students. The key would likely be a student ID number and the value would be the information about each student: name, home address, and so on. The key and value don't necessarily have to be stored in separate objects; in the student example, there might be a Student class that encapsulates both.

Abstractly, a map is any data structure that stores such a set of data. Naturally, there are efficient and inefficient ways to implement such a structure. You've likely learned at some point about implementing a map as a binary search tree, where each node in the tree contains one association, and the nodes are organized such that, for every node n with key k, all nodes in the left subtree of n have keys less than k and all nodes in the right subtree of n have keys greater than k. So long as the tree is relatively balanced, insertions, lookups, and removals will run in O(log n) time on a tree with n nodes. Self-balancing techniques such as AVL trees can be used to maintain balance, even in extreme cases such as inserting keys in ascending order.
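To refresh your memory, the ordering rule can be sketched as follows. This is a minimal, unbalanced version with int keys and string values; the names MapNode, mapInsert, mapFind, and mapDestroy are assumptions made for this example and do not come from the provided header.

```cpp
#include <string>

// One association per node: a key and its value.
struct MapNode {
    int key;
    std::string value;
    MapNode* left;
    MapNode* right;

    MapNode(int k, const std::string& v)
        : key(k), value(v), left(nullptr), right(nullptr) {}
};

// Inserts (key, value) into the subtree rooted at 'root' and returns the
// subtree's root. If the key is already present, its value is replaced.
MapNode* mapInsert(MapNode* root, int key, const std::string& value) {
    if (root == nullptr) {
        return new MapNode(key, value);
    }
    if (key < root->key) {
        root->left = mapInsert(root->left, key, value);    // smaller keys go left
    } else if (key > root->key) {
        root->right = mapInsert(root->right, key, value);  // larger keys go right
    } else {
        root->value = value;                               // key must stay unique
    }
    return root;
}

// Returns the node holding 'key', or nullptr if the key is absent.
const MapNode* mapFind(const MapNode* root, int key) {
    while (root != nullptr && root->key != key) {
        root = (key < root->key) ? root->left : root->right;
    }
    return root;
}

// Deletes every node in the subtree, children before parent.
void mapDestroy(MapNode* root) {
    if (root == nullptr) {
        return;
    }
    mapDestroy(root->left);
    mapDestroy(root->right);
    delete root;
}
```

Each step of mapFind discards one subtree, which is where the O(log n) bound comes from when the tree is reasonably balanced.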

A multimap is a similar data structure, whose key difference is that it allows more than one association to contain a particular key. (Or, viewed differently, one key may have multiple values associated with it.) In a student enrollment system, where students may be enrolled in multiple classes simultaneously, there might be an association for each enrollment of a student in a course, where the key is the student ID and the value is the course ID. If a student is enrolled in three courses, there will be three associations containing the same key.

Building such a data structure involves slightly more work, but not as much additional effort as you might think. You can still implement it as a binary search tree, where each node contains a key and a set of all of the values associated with that key; if the number of associations for each key will be relatively small, it is tolerably efficient to use a linear data structure like a linked list or a vector to store them.
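A rough sketch of that idea, assuming int keys and int values: each node holds a vector of values, and adding an association to an existing key appends to that node's vector. The names MultiNode, addAssociation, valuesFor, and destroyMulti are hypothetical and may not match the provided header.

```cpp
#include <vector>

// One node per distinct key; all values for that key live in one vector.
struct MultiNode {
    int key;
    std::vector<int> values;
    MultiNode* left;
    MultiNode* right;

    explicit MultiNode(int k) : key(k), left(nullptr), right(nullptr) {}
};

// Adds the association (key, value) and returns the subtree's root.
MultiNode* addAssociation(MultiNode* root, int key, int value) {
    if (root == nullptr) {
        MultiNode* node = new MultiNode(key);
        node->values.push_back(value);
        return node;
    }
    if (key < root->key) {
        root->left = addAssociation(root->left, key, value);
    } else if (key > root->key) {
        root->right = addAssociation(root->right, key, value);
    } else {
        root->values.push_back(value);  // existing key: just one more value
    }
    return root;
}

// Returns a copy of the values for 'key'; empty if the key is absent.
std::vector<int> valuesFor(const MultiNode* root, int key) {
    while (root != nullptr && root->key != key) {
        root = (key < root->key) ? root->left : root->right;
    }
    return (root != nullptr) ? root->values : std::vector<int>();
}

// Deletes every node; each node's vector cleans up after itself.
void destroyMulti(MultiNode* root) {
    if (root == nullptr) {
        return;
    }
    destroyMulti(root->left);
    destroyMulti(root->right);
    delete root;
}
```

With this layout, a student enrolled in three courses occupies a single node whose vector has three elements, so the tree itself is no larger than the equivalent map.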


The program

It's likely that you've used WebReg to register yourself for courses in the past. I'd like you to build a very simple student registration system. In our system, there are two main entities:

Your registration system should be capable of performing the following tasks:

All of the functionality except for the search and "print a list of all..." functionality should run in logarithmic time on average. To accomplish this, you are required to implement a Multimap class, described below. Once it's completed, you can then create two Multimap objects:

Conveniently, student IDs and course IDs are both to be stored as ints, student names and course names are both to be stored as strings, and the sets of values can be stored in vector<int>'s; this allows us to use the same Multimap class in both situations.

You may design your user interface however you'd like, as long as it's clear to us how to use it when we run it.


Building a "well-behaved" Multimap class

We've spoken extensively in lecture about building "well-behaved" classes, which include constructors (including a default constructor, if one makes sense), destructors (if necessary), copy constructors, overloaded assignment operators, and const member functions (where appropriate). I'd like your Multimap class to be well-behaved, just as our Queue class in lecture was, so that it can be declared as a local variable, allocated and deallocated dynamically with new and delete, passed by value, used in an assignment statement, and have const objects that can still be used in ways that are reasonable.
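The pieces listed above fit together roughly as in the following sketch. IntTree is a hypothetical class that stores only int keys; it is not the Multimap design, and your member functions will differ, but the copy constructor, destructor, and assignment operator follow the same pattern.

```cpp
class IntTree {
public:
    IntTree() : root(nullptr) {}

    // Copy constructor: build an independent copy of every node.
    IntTree(const IntTree& other) : root(copyNodes(other.root)) {}

    // Destructor: release every node this tree owns.
    ~IntTree() { destroyNodes(root); }

    // Assignment: guard against self-assignment, build the new copy,
    // then release the old nodes.
    IntTree& operator=(const IntTree& other) {
        if (this != &other) {
            Node* fresh = copyNodes(other.root);
            destroyNodes(root);
            root = fresh;
        }
        return *this;
    }

    // Non-const: modifies the tree. Duplicate keys are ignored.
    void insert(int key) {
        Node** link = &root;
        while (*link != nullptr) {
            if (key == (*link)->key) return;
            link = (key < (*link)->key) ? &(*link)->left : &(*link)->right;
        }
        *link = new Node(key);
    }

    // const: these only read the tree, so const objects may call them.
    bool contains(int key) const {
        const Node* n = root;
        while (n != nullptr && n->key != key) {
            n = (key < n->key) ? n->left : n->right;
        }
        return n != nullptr;
    }

    int size() const { return countNodes(root); }

private:
    struct Node {
        int key;
        Node* left;
        Node* right;
        explicit Node(int k) : key(k), left(nullptr), right(nullptr) {}
    };

    Node* root;

    static Node* copyNodes(const Node* n) {
        if (n == nullptr) return nullptr;
        Node* copy = new Node(n->key);
        copy->left = copyNodes(n->left);
        copy->right = copyNodes(n->right);
        return copy;
    }

    static void destroyNodes(Node* n) {
        if (n == nullptr) return;
        destroyNodes(n->left);
        destroyNodes(n->right);
        delete n;
    }

    static int countNodes(const Node* n) {
        return (n == nullptr) ? 0 : 1 + countNodes(n->left) + countNodes(n->right);
    }
};
```

Without the copy constructor and assignment operator, the compiler-generated versions would copy only the root pointer, leaving two trees sharing (and eventually double-deleting) the same nodes.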

I've provided a header file for your Multimap class as a starting point, which I'd like you to use as the design for your class; in other words, you are not permitted to change the signatures of any of the public member functions of the provided Multimap class, though you can change any of the private details you'd like. (In general, I'm a fan of providing students with the ability to adjust designs to their liking, though the kinds of details you're faced with in this project are mostly at the implementation level; mandating a design allows me to do automated tests that I can't do if you have full design freedom.)

Pay special attention to my use of the Standard C++ Library vector template. There's a chapter in the Savitch text about vectors. You can also read through the Standard C++ Library Module User's Guide by Rogue Wave Software, which explains the necessary concepts, such as iterators and generic algorithms, eloquently. I suggest reading through the first five chapters — which aren't very long — to understand some of the basics of the Standard C++ Library. We'll also talk about some of these issues in lecture, but the details are too numerous to include in lecture, so you'll really need to read through these chapters if you'd like to gain a thorough understanding of the Standard C++ Library.
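To give a flavor of iterators and generic algorithms before you do that reading, here is a short sketch using vector<int>. The function names removeValue and containsValue are made up for this example.

```cpp
#include <algorithm>
#include <vector>

// Removes the first occurrence of 'target' from 'values'.
// Returns true if something was removed.
bool removeValue(std::vector<int>& values, int target) {
    // std::find is a generic algorithm: it works on any iterator range.
    std::vector<int>::iterator it =
        std::find(values.begin(), values.end(), target);
    if (it == values.end()) {
        return false;  // end() means "not found"
    }
    values.erase(it);  // later elements shift down by one position
    return true;
}

// Read-only search; the vector is taken by const reference, so
// begin() and end() return const iterators here.
bool containsValue(const std::vector<int>& values, int target) {
    return std::find(values.begin(), values.end(), target) != values.end();
}
```

Something along these lines could be useful when removing a single value from the set of values stored for a key.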

The design of the class, as I've provided it, is:

You are not required to keep your tree balanced, but you may if you wish. If you don't, be sure that you aren't assigning student IDs or course IDs consecutively, because inserting keys in ascending order degenerates an unbalanced tree into what is effectively a linked list, which is the worst case for your trees' performance.
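To see the effect of consecutive IDs for yourself, you could measure the height of an unbalanced tree after inserting keys in different orders. The following is a minimal sketch; HeightNode and the function names are hypothetical helpers invented for this experiment.

```cpp
#include <algorithm>
#include <vector>

struct HeightNode {
    int key;
    HeightNode* left;
    HeightNode* right;
    explicit HeightNode(int k) : key(k), left(nullptr), right(nullptr) {}
};

// Plain BST insertion with no rebalancing; duplicate keys are ignored.
void insertKey(HeightNode*& root, int key) {
    HeightNode** link = &root;
    while (*link != nullptr) {
        if (key == (*link)->key) return;
        link = (key < (*link)->key) ? &(*link)->left : &(*link)->right;
    }
    *link = new HeightNode(key);
}

// Height counted in nodes: an empty tree has height 0.
int heightOf(const HeightNode* node) {
    if (node == nullptr) return 0;
    return 1 + std::max(heightOf(node->left), heightOf(node->right));
}

void destroyTree(HeightNode* node) {
    if (node == nullptr) return;
    destroyTree(node->left);
    destroyTree(node->right);
    delete node;
}

// Inserts the keys in the given order and reports the resulting height.
int heightAfterInserting(const std::vector<int>& keys) {
    HeightNode* root = nullptr;
    for (std::vector<int>::size_type i = 0; i < keys.size(); ++i) {
        insertKey(root, keys[i]);
    }
    int height = heightOf(root);
    destroyTree(root);
    return height;
}
```

Inserting 1 through 7 in ascending order produces a tree of height 7, while inserting the same keys in the order 4, 2, 6, 1, 3, 5, 7 produces a tree of height 3.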

I should point out that the design I've suggested is not nearly as generic as it could be. It assumes that each key has a name associated with it, in addition to a set of values; names are not a typical feature of multimaps. It assumes that the keys are integers, the names are strings, and the values are integers. This design fits very nicely into the program that you'll be building in this project, but a multimap is a generic concept which could be implemented in a way that would allow it to be reused in many other contexts. To make the multimap properly generic requires quite a bit of syntax that we haven't learned yet — the C++ equivalent of Java's generics, which are called templates, would be of great use for this — most of which we'll learn later this quarter. For now, I suggest sticking with the non-generic design proposed above, but if your intuition is telling you that there's something wrong, you're right!


A few suggestions

I have a few suggestions that you should read before you embark on your work.

Testing

Testing your Multimap class will likely be easier if you do it separately from your main program. I suggest writing a separate test program — or at least a testing function that can be called to test the Multimap. This will allow you to explore the various issues (passing Multimaps by value, assigning them into one another, creating copies of them, etc.) in an easier-to-understand context. Building your main program will then be much simpler, if you've already verified the functionality of your Multimap.
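One possible shape for such a testing function is sketched below. So that the sketch compiles on its own, it uses a stand-in class, StandInMultimap, whose add and count member functions are hypothetical; in your tester you would use your real Multimap and the member functions declared in Multimap.h instead.

```cpp
#include <utility>
#include <vector>

// Stand-in for the real Multimap so this sketch is self-contained.
// Its member functions are hypothetical, not those in Multimap.h.
class StandInMultimap {
public:
    void add(int key, int value) {
        entries.push_back(std::make_pair(key, value));
    }
    int count() const { return static_cast<int>(entries.size()); }

private:
    std::vector<std::pair<int, int> > entries;
};

// Takes its parameter by value, so it works on a copy.
int countAfterAdd(StandInMultimap m) {
    m.add(0, 0);
    return m.count();
}

// Returns the number of failed checks; 0 means every check passed.
int runCopyTests() {
    int failures = 0;

    StandInMultimap original;
    original.add(1001, 65);
    original.add(1001, 22);

    // Copy construction: changing the copy must not change the original.
    StandInMultimap copy(original);
    copy.add(2002, 65);
    if (original.count() != 2) ++failures;
    if (copy.count() != 3) ++failures;

    // Assignment: old contents are replaced, and the two stay independent.
    StandInMultimap assigned;
    assigned.add(3003, 1);
    assigned = original;
    assigned.add(4004, 2);
    if (assigned.count() != 3) ++failures;
    if (original.count() != 2) ++failures;

    // Pass by value: the caller's object must be untouched afterward.
    if (countAfterAdd(original) != 3) ++failures;
    if (original.count() != 2) ++failures;

    // const access: read-only member functions must still be callable.
    const StandInMultimap& readOnly = original;
    if (readOnly.count() != 2) ++failures;

    return failures;
}
```

Running the same checks against a class that manages its own nodes is where shallow-copy bugs tend to show up, usually as a wrong count or a crash when the second object is destroyed.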

You're free to submit your tester along with your project, but please set your program up so that running it will call into your normal user interface.

Debugging

Debugging your Multimap implementation will be more difficult than debugging a typical Java class. The kinds of problems that you're likely to have with managing memory don't often have visible symptoms; if you forget to delete the nodes when you destroy a Multimap, the program will quietly leak memory, but will likely go right on working fine otherwise. The debugger within Visual Studio's C++ environment can really help, though, and I suggest that you use it when testing your program. When you run your program, run it in debug mode (by pressing F5 to execute it). When in debug mode, the environment checks for memory-related problems, such as accessing unallocated memory or deallocating the same memory more than once. In other words, the debugger provides a means of making visible at least some of the problems that otherwise wouldn't be.
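A complementary way to make leaks visible, independent of any particular debugger, is to count how many nodes are alive at any moment. This is only a sketch of one possible technique, not something the assignment requires; CountedNode, destroyAll, and destroyRootOnly are hypothetical names.

```cpp
// A node type that tracks how many instances currently exist.
struct CountedNode {
    static int live;  // number of CountedNode objects not yet destroyed

    int key;
    CountedNode* left;
    CountedNode* right;

    explicit CountedNode(int k) : key(k), left(nullptr), right(nullptr) {
        ++live;
    }
    // Copies count as new objects too; children are not shared.
    CountedNode(const CountedNode& other)
        : key(other.key), left(nullptr), right(nullptr) {
        ++live;
    }
    ~CountedNode() { --live; }
};

int CountedNode::live = 0;

// Correct cleanup: deletes every node in the subtree.
void destroyAll(CountedNode* node) {
    if (node == nullptr) return;
    destroyAll(node->left);
    destroyAll(node->right);
    delete node;
}

// Deliberately buggy cleanup, for comparison: the children are leaked.
void destroyRootOnly(CountedNode* root) {
    delete root;
}
```

After a tree is destroyed correctly, CountedNode::live should be back to the value it had before the tree was built. If it isn't, some nodes were never deleted, which is exactly the kind of silent leak described above.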


Starting point

I'm providing a header file for your Multimap class called Multimap.h, which suggests a design for your class. You are required to stick with the public "interface" I've provided (i.e., the signatures of the public member functions cannot change), but you're welcome to make any changes you'd like to the private portions.


Deliverables

Submit the C++ source and header files (.cpp and .h) that comprise your program. Do not submit any of the other files used or generated by your development environment. Follow this link for a discussion of how to submit your assignment. Remember that we do not accept paper submissions of your assignments, nor do we accept them via email under any circumstances.


Limitations

The Standard C++ Library provides an implementation of a map in a template class, aptly named map, and a multimap implementation called multimap. Allowing you to use these classes would defeat the purpose of the assignment, which is to have you build a "well-behaved" tree class on your own, using pointers and the dynamic memory management techniques that I described in lecture. So you may not use map or multimap in your solution. I encourage you, however, to use a vector in each tree node to store the values associated with a key, rather than using an array.


Additional challenges

I don't offer any extra credit in this course, but if you'd like to continue working on your project, you might want to take it in one or more of these directions:

I suggest finishing the project as assigned first and making a backup copy before tackling these additional challenges. That way, if you can't get these working, you'll still be able to get credit for the project.