CS 261, Spring 2023, Practice Problem Set 9

  1. Find the suffix arrays for the three words toot, beveled, and pellmell (Include the end-of-string character, as in the lecture slides. Just list the positions of each suffix in the array; do not include the precomputed LCP information for fast searching. You do not have to go through the steps of the linear-time construction algorithm.)

  2. Find a string \(s\) of six symbols from the two-symbol alphabet \(\{0,1\}\) (plus the end-of-string character $) such that the suffix tree of \(s\\)\) has only ten nodes in it. (Hints: for the suffix tree to be this small, every non-leaf node must have exactly three children. And for a non-leaf node to have a child starting with the symbol $, the substring that it represents must be a suffix of \(s\).)

  3. Suppose that we are using a union-find structure for a family of disjoint sets of elements that have a total ordering (that is, we can compare any two elements and determine which one is smaller, in constant time). We wish to modify the find\((x)\) operation so that, instead of returning an arbitrary member of the set containing \(x\), it always returns the smallest member. Describe a way of performing this modification with the same time per operation (in terms of its \(O\)-notation) as the original union-find data structure.

    (Hints: It does not work to change which node gets linked to which in a union in order to make the minimum element be the root of the tree for its set. The reason it doesn't work is that doing this would interfere with the union-by-size property used in the analysis of the data structure. Instead, find a way of adding extra information to the tree nodes, without changing the connectivity of the tree, that can be maintained quickly in a union operation and allows the find operations to return the correct values.)

  4. Suppose that we want to make the union-find data structure partially persistent. We cannot use the path compression analysis, because that uses amortization, which does not work well with persistence. And we cannot directly use path copying persistence, because there are too many starting points for query paths (one for each different set element).

    Instead of using fat nodes, it is possible to obtain a partially persistent structure by using union by size (without path compression), keeping only one parent pointer per node (non-persistently), and recording for each node a timestamp of the update that first changed its parent pointer from None to another parent. Describe how to perform a find query in this structure. Your query should take as arguments the element on which to perform the find, and the timestamp of a version of the structure to be queried.