Suppose that you are given the bitmap representation of a set \(S\) of non-negative integers. Describe how to find the smallest non-negative integer \(x\) that is not a member of \(S\), in a constant number of integer operations.
(This is an important operation in combinatorial game theory, where it is called the "minimum excluded value" or "mex". You may assume that the dictionary set2element
from the lecture notes has already been computed and that looking up a key-value pair in that dictionary is allowed among the constant number of operations that you perform. You may also assume that the resulting value \(x\) is within the range of values allowed by your bitmap set representation.)
Suppose we wish to maintain a set of \(n\) items and a numbering of the items (a function mapping the items to a small range of integers), so that each two items in the set are mapped to distinct numbers, and so that we can look up the number for each item in the set in constant time. If the set changes, the numbering can also change. These numbers could be used, for instance, as indexes into arrays to store information about the items. Describe how to do this using a cuckoo-hash based set representation, without using any more storage space than the set itself would already use.
The lecture notes include a lower bound showing that (under standard assumptions) it is not possible to handle disjoint-set queries in substantially sublinear time per query, using a data structure whose construction takes polynomial time and space. However, this does not rule out fast queries using bigger and slower-to-construct data structures.
Suppose that:
We are given as input a family of \(n\) sets \(S_i\), each of size \(k\).
Each set is a subset of the numbers from \(0\) to \(N-1\) for some \(N\) (either because they were given that way or because we used the answer to question 2 to number the set elements)
We wish to handle disjoint-set queries of the following form: given a query set \(Q\) of size \(k\), as a list of its elements, answer ``yes'' if the input sets include a set \(S_i\) that is disjoint from \(Q\), or ``no'' if no such set exists. When the answer is yes, it is not necessary to find which set \(S_i\) is disjoint.
Describe how to use bitmaps to represent a set-of-sets data structure that can answer disjoint-set queries in time \(O(k)\), using a data structure whose space and construction time is exponential in \(N\). You may assume that operations on \(N\)-bit binary numbers, and that array indexing using numbers of this size, take constant time per operation.
We have seen that, to achieve a false positive rate of \(\varepsilon\) for a set of \(n\) items, Bloom filters need \((n\log_2 1/\varepsilon)/\log 2\) bits of storage. We didn't analyze cuckoo filters as precisely, but many sources give a bound of \(n(2+\log_2 1/\varepsilon)\) on the number of bits of storage needed for them. Assuming that this bound is accurate, for which values of \(\varepsilon\) do cuckoo filters use fewer bits than Bloom filters, and for which values of \(\varepsilon\) do cuckoo filters use more bits? (It may be easiest to solve this numerically by writing a short computer program. Logarithms without subscripts are natural logarithms.)