From an invited talk by V. T. Rajan at the 5th MSI Workshop on Computational Geometry, Stony Brook, October 20, 1995. Notes by D. Eppstein.
Robustness is very important - the program must work well even for degenerate or somewhat malformed input. The input polygon sides are likely to be mostly axis-parallel, but some diagonals can occur, especially at 45-degree angles. (There appears to be little in the way of theoretical results on such "almost-orthogonal" inputs. Automated layout typically produces axis-parallel layouts, but critical regions are likely to have been hand-optimized with sides at other angles.) Worst-case time complexity is important, but the program should also be asymptotically good in the average case, since invariably people will run his code on much larger inputs than he was anticipating. It is also important that the program run well on small inputs, since the other thing people do with his code is run it on batches of very many small inputs.
Any algorithms used must be simple enough that his programmers can implement them quickly, and so that they can make changes later if necessary. Object-oriented design is important - Rajan has found that imperative-style pseudo-code is a bad way of communicating with his programmers.
One particular feature of the input is problematic, and not adequately treated by the theory literature. The inputs Rajan deals with often come from hierarchical specifications, in which an object at one level of the specification may be repeated several times at a higher level. One can flatten this specification out to form the usual sort of polygonal input, but at a cost of multiplying the size by a large factor, typically from around 5 for random logic to around 50 for memory. It would be helpful to develop algorithms that could deal with this sort of hierarchical representation directly.
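As a toy illustration of the issue (a hypothetical data model, not Rajan's actual input format), a hierarchical layout can be sketched as cells containing their own shapes plus translated instances of other cells; flattening expands every instance, which is exactly what multiplies the input size:

```python
# Hypothetical sketch of a hierarchical layout: a cell holds its own rectangles
# plus translated instances of other cells. Flattening expands every instance,
# which is what multiplies the size of the input.
from dataclasses import dataclass, field

@dataclass
class Cell:
    rects: list = field(default_factory=list)      # [(x0, y0, x1, y1), ...]
    instances: list = field(default_factory=list)  # [(child_cell, dx, dy), ...]

def flatten(cell, dx=0, dy=0, out=None):
    """Recursively expand all instances into one flat list of rectangles."""
    if out is None:
        out = []
    for (x0, y0, x1, y1) in cell.rects:
        out.append((x0 + dx, y0 + dy, x1 + dx, y1 + dy))
    for child, cx, cy in cell.instances:
        flatten(child, dx + cx, dy + cy, out)
    return out

# A memory bit repeated in a 4x4 array: one rectangle becomes 16 after flattening.
bit = Cell(rects=[(0, 0, 2, 1)])
array = Cell(instances=[(bit, 3 * i, 2 * j) for i in range(4) for j in range(4)])
print(len(flatten(array)))   # 16
```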
Inspired by Sugihara's experiments showing that randomized incremental Voronoi diagram construction could be practical for large inputs, Rajan then had a programmer working with him implement Mulmuley's randomized incremental algorithm for constructing trapezoidal decompositions of line segment arrangements. Among the advantages of explicitly constructing a trapezoidal decomposition are that you can re-use it several times, and that you can choose your favorite order to traverse it.
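To make the structure concrete, here is a small Python sketch of the vertical walls that a trapezoidal decomposition records. It deliberately uses a naive O(n^2) construction rather than Mulmuley's randomized incremental algorithm, and it assumes non-crossing, non-vertical segments in general position:

```python
# Naive O(n^2) sketch of the vertical walls of a trapezoidal decomposition.
# Not Mulmuley's randomized incremental algorithm - just an illustration of
# what the decomposition records. Assumes non-crossing, non-vertical segments
# in general position (distinct endpoint x-coordinates).

def y_at(seg, x):
    """y-coordinate of segment seg = ((x0, y0), (x1, y1)) at abscissa x."""
    (x0, y0), (x1, y1) = seg
    t = (x - x0) / (x1 - x0)
    return y0 + t * (y1 - y0)

def vertical_walls(segments):
    """For each segment endpoint, shoot rays up and down to the nearest segment
    (or to +/- infinity); these walls plus the segments bound the trapezoids."""
    walls = []
    for seg in segments:
        for (px, py) in seg:
            above = below = None
            for other in segments:
                if other is seg:
                    continue
                (ax, _), (bx, _) = other
                if min(ax, bx) < px < max(ax, bx):
                    oy = y_at(other, px)
                    if oy > py and (above is None or oy < above):
                        above = oy
                    if oy < py and (below is None or oy > below):
                        below = oy
            walls.append((px, py, below, above))   # (x, y, wall bottom, wall top)
    return walls

segs = [((0, 0), (4, 1)), ((1, 3), (5, 2)), ((2, -2), (6, -1))]
for w in vertical_walls(segs):
    print(w)   # None marks a wall extending to infinity
```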
The choice of a data representation turned out to be important; Rajan first told the programmer to do anything that seemed reasonable, but the project bogged down before he could get it working. Rajan then tried Guibas and Stolfi's quad-edge data structure, which worked much better but perhaps involves more overhead than necessary. At the time the extra overhead was unimportant, because this part of the code was still much faster than the other processing being done to the input; but those other steps have since been improved, and he is thinking of speeding up the trapezoidal decomposition by using a more streamlined data structure.
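For reference, a minimal sketch of the quad-edge primitives (make_edge and splice, following Guibas and Stolfi) shows where the overhead comes from: four directed edge records per undirected edge. This is only an illustration of the data structure, not Rajan's implementation:

```python
# Minimal sketch of the Guibas-Stolfi quad-edge structure: each undirected edge
# is stored as four directed edge records linked in a cycle by rot, and splice
# is the only topological update. Illustrative only, not Rajan's code.

class Edge:
    __slots__ = ("rot", "onext", "origin")

    def sym(self):        # same edge, opposite direction
        return self.rot.rot

    def invrot(self):     # rot applied three times = rot^-1
        return self.rot.rot.rot

    def lnext(self):      # next edge counterclockwise around the left face
        return self.invrot().onext.rot

def make_edge(org=None, dst=None):
    """Create an isolated edge; return the canonical directed version."""
    e = [Edge() for _ in range(4)]
    for i in range(4):
        e[i].rot = e[(i + 1) % 4]
    e[0].onext, e[1].onext, e[2].onext, e[3].onext = e[0], e[3], e[2], e[1]
    e[0].origin, e[2].origin = org, dst
    e[1].origin = e[3].origin = None   # dual records (faces), unused here
    return e[0]

def splice(a, b):
    """Guibas-Stolfi splice: merge or split the edge rings at a's and b's origins."""
    alpha, beta = a.onext.rot, b.onext.rot
    a.onext, b.onext = b.onext, a.onext
    alpha.onext, beta.onext = beta.onext, alpha.onext

# Join two edges sharing origin "o": after splice they lie in the same origin ring.
a = make_edge("o", "p")
b = make_edge("o", "q")
splice(a, b)
assert a.onext is b and b.onext is a
```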
Nowadays, whenever someone comes to Rajan with a geometric problem, he tries to fit it into his trapezoidal decomposition framework. For instance, one can test for overlapping objects (or, after blowing up each object by a given amount, for objects that are too close to each other) simply by keeping a count, for each cell, of the number of objects covering that cell, incrementing or decrementing the count as one traverses the decomposition.
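Here is a hedged sketch of the coverage-count idea, with a uniform grid of cells standing in for the trapezoids (the real code would walk the decomposition's cells instead):

```python
# Sketch of the per-cell coverage count, using a uniform grid in place of the
# trapezoidal decomposition's cells. Rectangles are blown up by half the minimum
# spacing; any cell covered by two or more blown-up objects flags a violation.
from collections import defaultdict

def find_violations(rects, spacing, cell=1.0):
    """rects: [(x0, y0, x1, y1), ...]; returns grid cells covered >= 2 times."""
    margin = spacing / 2.0
    count = defaultdict(int)
    for (x0, y0, x1, y1) in rects:
        gx0 = int((x0 - margin) // cell)
        gx1 = int((x1 + margin) // cell)
        gy0 = int((y0 - margin) // cell)
        gy1 = int((y1 + margin) // cell)
        for gx in range(gx0, gx1 + 1):
            for gy in range(gy0, gy1 + 1):
                count[(gx, gy)] += 1
    return [c for c, n in count.items() if n >= 2]

# Two rectangles 0.5 apart under a 1.0 minimum spacing rule: their blown-up
# footprints share cells, so a violation is reported.
print(find_violations([(0, 0, 2, 2), (2.5, 0, 4, 2)], spacing=1.0))
```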
Rajan also talked at some length about his implementation for testing density ground rules. For the example of such a rule described earlier, he first reduces the complexity of the problem by partitioning the input region into 10-by-10-micron squares and computing the density within each square. He then considers only 100-by-100-micron squares aligned with the 10-by-10 grid. The density in each large square could be found by averaging the 100 small-square values it contains, but standard running-total techniques instead need only O(1) operations per large square. To find the density in each small square, he overlays the original input with a grid, using his trapezoidal decomposition (he cited some GIS-related work here).
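A sketch of the running-total step, assuming the 10-by-10-micron cell densities have already been computed from the layout; a 2D prefix sum (summed-area table) then gives each aligned 100-by-100-micron window's density in O(1) per window:

```python
# Sketch of the running-total step: given the density of each 10x10 micron cell,
# a 2D prefix sum yields the density of every 100x100 micron window aligned to
# the grid in O(1) per window.

def window_densities(cell_density, w=10):
    """cell_density: 2D list of per-cell densities; w: window size in cells."""
    rows, cols = len(cell_density), len(cell_density[0])
    # prefix[i][j] = sum of cell_density[0..i-1][0..j-1]
    prefix = [[0.0] * (cols + 1) for _ in range(rows + 1)]
    for i in range(rows):
        for j in range(cols):
            prefix[i + 1][j + 1] = (cell_density[i][j] + prefix[i][j + 1]
                                    + prefix[i + 1][j] - prefix[i][j])
    out = {}
    for i in range(rows - w + 1):
        for j in range(cols - w + 1):
            total = (prefix[i + w][j + w] - prefix[i][j + w]
                     - prefix[i + w][j] + prefix[i][j])
            out[(i, j)] = total / (w * w)   # average density of the w x w window
    return out

# 30x30 grid of 10-micron cells with one fully dense 10x10 block in the corner.
grid = [[1.0 if i < 10 and j < 10 else 0.0 for j in range(30)] for i in range(30)]
d = window_densities(grid)
print(d[(0, 0)], d[(5, 5)])   # 1.0 and 0.25
```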
A question from the audience asked how, with such large inputs (as much as 2 gigabytes), he tells whether his algorithms really are correct. The answer is of course testing on smaller examples, and especially cross-comparison with previous codes for the same problem. When the two differ, he then has to figure out why, tediously and by hand. His co-workers are very conservative about replacing working code, and even when he finds discrepancies caused by bugs in the old code it is hard to persuade them that the bugs are a problem - they would sometimes prefer that his new code be consistent with the bugs. The final test of correctness is whether the VLSI chip works: if it doesn't, that's a costly $100,000 mistake; if it does, no one cares whether the analysis was perfect.
Later, we discussed these numbers in a conversation with Ken Clarkson and Mike Goodrich (after Rajan had left). Ken seemed to think that the factor of ten was pretty typical for plane sweep vs. randomized incremental techniques. The main advantage Rajan seems to be gaining from the randomized incremental technique is that he has already written the code and doesn't need to change it for each new problem he solves. But we wondered whether it wouldn't make sense to replace his code with code that constructs the same trapezoidal decomposition using a plane sweep method. (The difference between this and where he started is that the original code did not construct a trapezoidalization; instead it apparently required you to fit whatever else you were doing into the plane sweep.)
Part of Geometry in Action, a collection of applications of computational geometry. David Eppstein, Theory Group, ICS, UC Irvine.