We describe an efficient streaming-model construction of epsilon-nets and epsilon-approximations, and use it to find deterministic streaming-model approximation algorithms for iceberg range queries and for various robust statistics problems.
We consider data structures for handling streams of check-in and check-out requests, and that must identify the set of check-ins that are not matched by a corresponding check-out. We show that, if this set has size k, it is possible to handle this problem in space O(k log n) by a deterministic data structure. However, if check-outs may occur that do not match any check-in, then no low-space deterministic solution is possible; instead, we provide a randomized solution with near-optimal space and high probability of answering queries correctly.
We determine the symmetric difference between two similar sets of items, held by different machines on the internet, using an amount of communication bandwidth that is proportional only to the difference between the sets and with low computational overhead. Our solution technique combines the invertible Bloom filter data structure from our previous work on streaming straggler detection with a randomized sampling scheme that allows us to accurately and efficiently estimate the size of the difference.
We show how to use invertible Bloom filters as part of range searching data structures that determine the differences between the members of two sets that lie in a given query range.
(Slides)
Publications – David Eppstein – Theory Group – Inf. & Comp. Sci. – UC Irvine
Semi-automatically filtered from a common source file.