Program 5

Graph Implemented via Collections
and
An Important Graph Algorithm

Fundamental Data Structures
ICS-23


Introduction This programming assignment is designed to improve your knowledge of using collection classes (especially Map and Set), while teaching you important concepts of representing graphs and writing graph-processing algorithms. There are two problems in this assignment.
  • The first problem is to write a class named HashGraph that implements the Graph interface, which includes a wide variety of simple and efficient bookkeeping operations on the nodes and edges in a directed graph.

  • The second problem involves implementing an important graph algorithm using HashGraph. The algorithms is Dijkstra's all-shortest-paths algorithm

When writing HashGraph, you will run it against my JUnit tests to verify (a bit too strong of a word here) that it is correct. You may also find it useful to test your class with the DriverForGraph. With this driver, you can individually test any methods in your classes interactively, and see their results (returned values and state changes, mostly using toString). Both the JUnit test and driver are included in the start project folder for this project. I have also included the Javadoc for the HashGraph class (built from the comments in the .java file.

You are already familiar with classes implementing the Map and Set interfaces; use the standard HashMap and HashSet implementations (actually, I will write the HashSet implementation over the weekend: for now use ArraySet to work on your program, and just substitute HashSet for ArraySet when I send email telling you where to get HashSet; the only difference will be the speed of the operations on these two different set representations).

You will implement the HashGraph class using these interfaces/classes, along with two simple inner classes that I have implemented specially for this assignment: SimpleEdge (which is a lot like SimpleEntry for HashMap: a class implementing Graph.Edge) and LocalInformation. I have provided a HashGraph class with Javadoc comments for all its methods, along with full implementations of the SimpleEdge and LocalInformation classes; change only the former, not the final two (but definitely read and understand their code).

I have also provided an interface named EdgeValueIO (in the file EdgeValueIO.java) and various simple classes implementing this interface for standard edge values; examine these files briefly (mostly their information is used in the load and write methods, which need to translate to/from String information (in files).

Download and unzip the following Eclipse project Start and use it to start working on this program. For each part of this assignmnment you will update and turn in a single .java file in the project (see the Checkmate submission for this assignment for more details).

Only one student should submit the assignment, but both student's names should appear in the comments at the top of each submitted program. Please turn in each program as you finish it, so that I can accurately assess the progress of the class as a whole during this assignment.

Print this document and carefully read it, marking any parts that contain important detailed information that you find (for review while you work on this assignment and before you turn in the files).


The HashGraph Class The start folder contains one main file to examine/update: HashGraph.java; but there are lots of others (most importantly Graph.java) to read and understand. You can download and run a special Executable version of my DriverForGraph to help you understand the specification of the problem and observe the programmer/user interaction that you are to implement. I suggest your create a few small graphs (with int edges), enter them using this driver, and then call various methods on the graph to see their behavior.

The HashGraph class acts as a repository for all the nodes and edges that comprise a graph, organized in such a way that allows us quickly to access information that is useful in graph algorithms. Part of the complexity comes from keeping (and updating) redundant information.

We will represent each graph primarily using two maps: one with nodes as keys and one with edges as keys. With this representation, it is easy to get... all the nodes and edges directly connected to some node, and get... the value associated with any edge.

  • nodeMap maps its key (a node name represented by a String) to its value: a LocalInformation object that stores four sets:
    1. nodes with edges that lead to them from this node (destination nodes with this node is their origin node)
    2. nodes whose edges lead to this node (origin nodes with this node as their destination node),
    3. edges leading to other nodes from this node, (all edges with this node is their origin node)
    4. edges from other nodes leading to this node (all edges with this node is their destination node)

  • edgeMap maps its key (an Edge) to its value (an object of generic type E, often using a String or wrapper class for the edge's value; but we can use any class for this value).
Clarification: Although there is an accessor named getValue in the SimpleEdge class (along with getOrigin and getDestination edge values ARE NOT stored in SimpleEdge objects (examine that class). Instead they are retrieved from the edgeMap in getValue by using the SimpleEdge object as a key (which stores only the origin and destination node).

We partition the connected nodes into two sets: outgoing and incoming, by using the outNodes and inNodes instance variables in the LocalInformation class. Likewise, we partition the edges into two sets: outgoing and incoming, by using the outEdges and inEdges instance variables in the LocalInformation class.

All these sets could be computed when necessary from the main maps, but it is more efficient to cache their values and then be able to use these values directly, without (re)computing them. The downside of caching is that it takes more space and whenever we udpate the main maps, we must also update the related LocalInformation sets too, which is where most of the complexity of this programming assignment comes from.

A variety of methods return Iterable objects: typically this means that they return sets of nodes or edges, because we keep track of most LocalInformation in sets and the Set interface extends Iterable. The code using such an Iterable object can easily turn it into a concrete Set or List by supplying the Iterable object to a constructor from a class.

These iterators should NOT allow their remove method to work; we don't want code with access to an iterator for nodes or edges to remove those nodes or edges directly (which would cause all sorts of inconsistencies with the other cached values; a programmer could wreck our carefully controlled data structures). Removal should be done only by the HashGraph methods. To aid you in this task, I have supplied a decorator for Iterable named Unremovable, which delegates hasNext and next to the original Iterable object supplied to its constructor, but throws UnsupportedOperationException whenever remove is called.

Here is a brief list of all the methods in the Graph interface. All of these methods are documented more fully in the class you are writing. I have already written some of the bodies of these methods; others you must write.

  • There are two constructors: one constructs an empty graph, another constructs a graph that is a copy of its graph parameter.

  • addNode adds a node (with no outgoing or incoming nodes/edges) to the graph.
  • addEdge adds an edge to the graph, updating the LocalInformation for its origin and destination nodes -and adding these nodes automatically, first, if they do not already exist in the graph.
  • removeNode removes a node from the graph, as well as all the edges that refer to that node, and updates the LocalInformation for the affected nodes (affected nodes are origin and destination nodes for the removed edges).
  • removeEdge removes an edge from the graph, updating the LocalInformation for its origin and destination nodes.
  • clear removes all nodes and edges from the graph.

  • load reads a graph from a file (see the comment for details)
  • write writes a graph to a file (see the comment for details) The details ensure that graphs that are written to a file can then be read back in.

  • getNodeCount returns the number of nodes in the graph.
  • getEdgeCount returns the number of edges in the graph.
  • hasNode determines whether or not a node is in the graph.
  • hasEdge determines whether or not an edge is in the graph.
  • getEdgeValue returns the value associated with the edge that is its parameter (or null if that edge is not in the graph).

  • inDegree returns the number of incoming edges to a node.
  • outDegree returns the number of outgoing edges from a node.
  • degree returns the number of edges outgoing or incoming to a node.
  • getAllNodes returns an Iterable object for all the nodes in the graph (whose remove method is disabled).
  • getAllEdges returns an Iterable object for all the edges in the graph (whose remove method is disabled).
  • getOutNodes returns an Iterable object for all the nodes in the graph that are desinations of the given node via some edge (whose remove method is disabled).
  • getInNodes returns an Iterable object for all the nodes in the graph that are origins of the given node via some edge (whose remove method is disabled).
  • getOutEdges returns an Iterable object for all the edges in the graph that have the given nodes as their origin (whose remove method is disabled).
  • getInEdges returns an Iterable object for all the edges in the graph that have the given nodes as their destination (whose remove method is disabled).
  • toString returns the textual representation of a graph by including all the nodes and edges that it contains; the nodes are sorted in alphabetic order; the out edges are sorted by destination nodes; the in edges are sorted by origin nodes.
Although this interface contains many methods, many are similar in functionality (and code). Each is implemented relatively easily via the powerful collection classes used to represent the graph. Some methods, which would normally be void (say addNode) return the entire HashGraph object (see return this; at their end). Using this approach, we can cascade method calls: e.g., g.addNode("a").addNode("b"); instead of writing two sequential statements. g.addNode("a"); g.addNode("b"); This isn't a huge win, but it is a convenience and you should see this style (any void method can instead be made to return such an object).

Often writing, testing, and debugging one method in a class will immediately lead to the correct code for one (or more) other methods; some simpler methods are called inside other more complicated methods. As with many programming problems, understanding the concepts involved will take lots of time; writing code must come afterward, and it should be more straightforward, although your bugs will lead you to discover yet more information.

There generally is no exception throwing in these methods. With non-existant nodes/edges as parameters, most of these methods return empty sets (or other special values like null). In other cases, such a removing non-existent nodes/edges, the methods make no state changes and they do NOT throw exceptions.


Writing/Testing The HashGraph Class Try to write, test, and debug one method at a time, using the driver in the DriverForGraph class. Basically, call toString after every mutator to see if the state of the class has been changed correctly.

I'd start by writing the first constructor and the simplest methods that manipulate nodeMap: addNode and getAllNodes (which is necessary in toString method in HashGraph). Next I'd write addEdge: you can try to have it update the LocalInformation for the origin and destination nodes, or I'd suggest leaving that functionality until it is needed later. We use these two methods to build any graph.

I would next write the getAllEdges method. Then I would write the getNodeCount and getEdgeCount methods, which can be easily computed from their respective maps.

When these work successfully, I'd write the hasNode, hasEdge, and getEdgeValue methods. Once you write the getEdgeValue method, the toString method should show values with the edges. Remember to test adding an edge with some value, and then adding the same edge (but with a different value; the new edge should replace the old one).

I'd next write the clear, load and write methods.

Next, I'd make sure that addEdge updates its LocalInformation and I'd write the inDegree, outDegree, and degree methods. I'd follow this by writing the getOutNodes, getInNodes, getOutEdges, and getInNodes to test that adding edges works corectly.

Finally, I'd write the second constructor, and then the removeEdge method, and then the removeNode method (which will automatically call removeEdge for every edge related to the node being removed): both must make LocalInformation updates. Again, call toString after every mutator to see if the state of the class has been changed correctly.

Of course, you can run JUnit test during this process, and that is what we will grade this part of the assinment on.


Extended Dijkstra's Algorithm
Implementing a Graph Algorithm with the HashGraph class
In this part of the programming assignment, you will implement an extended version of Dijkstra's Algorithm, using the HashGraph class -as well as various other collection classes. This algorithm reads a graph whose edges store integers (representing costs) and computes the minimum cost (shortest) paths to all nodes from a given node (the user is prompted for this node). For any other given node, we can compute up the minimum cost to reach it and list the nodes on this path. FYI, my solution added about 50 lines of code to main.

The start folder contains the file Dijkstra.java, which contains some code and many comments to guide you through the implementation. I will also summarize the extended Dijkstra Algorithm here. Most data is stored as objects from a class named Info, which contains the name of a node, the cost of the minimum path to reach that node (initialzed to +infinity and updated in the algorithm), and the name of the node before it on the shortest path (initialized to "" and updated in the algorithm). Objects of this class are ultimately stored in three collections: as the values of keys in a map, as the values in a priority queue (with the lowest cost having the highest priority: this class contains Comparator for this prioritization), and temporarily as the values in a set.

  1. Declare a map and put each node in the graph as a key in the map, with its value a new object of Info (which is not filled in yet).

  2. Prompt the user for the name of a starting node in the graph; keep prompting if the user enters a node name that is not in the graph. Find the Info for this node in the graph and set its cost to 0: since it is the start node, the cost to reach it is 0.

  3. Put all the Info values (not keys) from the map into the priority queue.

  4. So long as the priority queue is not yet empty, process another node.
    1. Remove the highest priority value in the priority queue (the one whose cost is lowest: initially, the start node). Call this the min node. We now know the least costly path from the start node to min.

    2. For every node d that is a destination from the min node, get d's Info and see if the cost is infinite or greater than the cost of the path from the start node to min, plus the cost of the edge from min to d. If it is, (1) update the cost in Info to this smaller number, (2) set the predecessor of d to be min, and (3) put this Info into a set of "changed" Info objects.

    3. Use an iterator to remove all the changed Info objects from the priority queue, and then re-add these objects (their cost -e.g., priority- is different) to the priority queue.

    4. Continue around the loop

  5. Print the map: note the Info values are filled in with the minimum cost to reach each node and the node preceding it on the minimum path.

  6. Repeatedly prompt the user for the name of a stopping node in the graph; ignore any node name that is not in the graph. Find the cost from the start node to that node and list the nodes on the path by repeatedly "getting" predecessors in the map (a Stack will be useful here, because we visit nodes backwards from the stopping node to the starting node, but want to print the paths forward, from the starting node to the stopping node).
Overall, this algorithm uses a stack, priority queue, set, and map -most of the collections that we have studied and implemented (except queue).

You can download and run a special Executable version of my Dijkstra program to help you understand the specification of the problem and observe the programmer/user interaction that you are to implement. Test it on the flightcost.txt file, typing in later the names of two cities.