We determine the symmetric difference between two similar sets of items, held by different machines on the internet, using an amount of communication bandwidth that is proportional only to the difference between the sets and with low computational overhead. Our solution technique combines the invertible Bloom filter data structure from our previous work on streaming straggler detection with a randomized sampling scheme that allows us to accurately and efficiently estimate the size of the difference.
Co-authors – Publications – David Eppstein – Theory Group – Inf. & Comp. Sci. – UC Irvine
Semi-automatically filtered from a common source file.