Data Mining and Multidimensional Analysis
Data mining is the process of querying large databases (such as
point-of-sale records) with the aim of distilling from them broad
patterns and smaller collections of useful information. There seems to
be little work in this area from the computational geometry perspective,
but there are likely good geometric problems to be found in it. One
such problem is coping with high-dimensional data, by condensing
information down to a small number of relevant dimensions and applying
geometric clustering techniques. Any algorithm used in this
context must be fast, but it is perhaps more important that it handle
data sets too large to fit in memory and keep the total number of
I/O operations to a minimum, as considered in recent work of
Goodrich et al. ("External-memory computational geometry",
34th FOCS, 1993, 714-723). There are also interesting connections with
geographic information systems, which face similar
problems of querying large databases with more explicitly geometric
content.
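
As a rough illustration of the two concerns above (condensing to a few relevant dimensions, and touching the data in as few sequential passes as possible), here is a minimal Python sketch. It is not taken from Goodrich et al. or any other cited work. The helper names (read_chunks, top_variance_dims, grid_counts), the choice of "highest variance" as a stand-in for "relevant", and the use of a uniform grid as the clustering step are all assumptions made for this example. Each function makes one sequential pass over the rows in fixed-size chunks, so only one chunk plus a small summary is held in memory at a time.

```python
from collections import Counter
from itertools import islice


def read_chunks(rows, chunk_size):
    """Yield lists of at most chunk_size rows, standing in for block reads from disk."""
    it = iter(rows)
    while True:
        chunk = list(islice(it, chunk_size))
        if not chunk:
            return
        yield chunk


def top_variance_dims(rows, dim, k, chunk_size=1000):
    """One pass: return the indices of the k coordinates with the largest variance.

    Assumption for this sketch: high variance is used as a crude proxy for
    a 'relevant' dimension.
    """
    n = 0
    total = [0.0] * dim
    total_sq = [0.0] * dim
    for chunk in read_chunks(rows, chunk_size):
        for row in chunk:
            n += 1
            for j, x in enumerate(row):
                total[j] += x
                total_sq[j] += x * x
    if n == 0:
        raise ValueError("no rows to summarize")
    var = [total_sq[j] / n - (total[j] / n) ** 2 for j in range(dim)]
    best = sorted(range(dim), key=lambda j: var[j], reverse=True)[:k]
    return sorted(best)


def grid_counts(rows, dims, cell, chunk_size=1000):
    """One pass: project each row onto dims and count points per grid cell.

    Dense cells in the returned Counter are candidate clusters.
    """
    counts = Counter()
    for chunk in read_chunks(rows, chunk_size):
        for row in chunk:
            key = tuple(int(row[j] // cell) for j in dims)
            counts[key] += 1
    return counts
```

In use, rows would be something that can be iterated twice (for example a list, or a file reopened for each pass): first call top_variance_dims to pick the coordinates, then grid_counts with those coordinates and a cell width suited to the data. The whole computation costs two sequential scans, whatever the number of rows.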
Part of Geometry in Action, a collection of applications of computational geometry.
David Eppstein, Theory Group, ICS, UC Irvine.