Geometry in Action

Data Mining and Multidimensional Analysis

Data mining is the process of querying large databases (such as point-of-sale records) in order to distill from them broad patterns and smaller collections of useful information. There seems to be little work in this area from the computational geometry perspective, but it likely contains good geometric problems. One such problem is coping with high-dimensional data: condensing the information down to a small number of relevant dimensions and then applying geometric clustering techniques. Any algorithm used in this context must be fast, but it is perhaps even more important that it handle amounts of data too large to fit in memory while keeping the total number of I/O operations to a minimum, as considered in recent work of Goodrich et al. ("External-memory computational geometry", 34th FOCS, 1993, pp. 714-723). There are also interesting connections with geographic information systems, which face similar problems of querying large databases, but with more explicitly geometric content.
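The paragraph above leaves the dimension-reduction and clustering steps unspecified. As one hedged illustration, the Python sketch below assumes a Gaussian random projection (Johnson-Lindenstrauss style) to condense the dimensions and MacQueen-style online k-means to cluster the result; neither technique is prescribed by the text, and the names random_projection, project, streaming_kmeans, and synthetic_records are hypothetical. Each record is read once, in a single sequential pass, and only the projection matrix and the k centers are kept in memory. This hints at the I/O concern but is not the external-memory technique of Goodrich et al.

```python
import math
import random
from typing import Iterable, Iterator, List

Vector = List[float]


def random_projection(dim_in: int, dim_out: int, seed: int = 0) -> List[Vector]:
    """Assumed technique: a dim_out x dim_in Gaussian matrix scaled by
    1/sqrt(dim_out), so squared distances are preserved in expectation."""
    rng = random.Random(seed)
    scale = 1.0 / math.sqrt(dim_out)
    return [[rng.gauss(0.0, 1.0) * scale for _ in range(dim_in)]
            for _ in range(dim_out)]


def project(row: Vector, matrix: List[Vector]) -> Vector:
    """Map one high-dimensional record into the low-dimensional space."""
    return [sum(m * x for m, x in zip(mrow, row)) for mrow in matrix]


def sq_dist(a: Vector, b: Vector) -> float:
    return sum((x - y) ** 2 for x, y in zip(a, b))


def streaming_kmeans(points: Iterable[Vector], k: int) -> List[Vector]:
    """Assumed technique: MacQueen-style online k-means. One sequential pass
    over the data; only k centers and their counts are kept in memory."""
    centers: List[Vector] = []
    counts: List[int] = []
    for p in points:
        if len(centers) < k:
            # The first k points seed the centers.
            centers.append(list(p))
            counts.append(1)
            continue
        j = min(range(k), key=lambda i: sq_dist(centers[i], p))
        counts[j] += 1
        # Step 1/count keeps each center equal to the running mean
        # of the points assigned to it so far.
        step = 1.0 / counts[j]
        centers[j] = [c + step * (x - c) for c, x in zip(centers[j], p)]
    return centers


def synthetic_records(n: int, dim: int, seed: int = 1) -> Iterator[Vector]:
    """Hypothetical stand-in for reading records sequentially from disk:
    alternates between two Gaussian blobs centred at 0 and at 10."""
    rng = random.Random(seed)
    for i in range(n):
        base = 0.0 if i % 2 == 0 else 10.0
        yield [base + rng.gauss(0.0, 1.0) for _ in range(dim)]


if __name__ == "__main__":
    R = random_projection(dim_in=50, dim_out=5)
    stream = (project(r, R) for r in synthetic_records(1000, 50))
    for c in streaming_kmeans(stream, k=2):
        print([round(x, 2) for x in c])
```

A one-pass method like this trades clustering quality for fewer I/O operations: the result depends on the order of the records and on which points seed the centers, so in practice one might follow it with further passes or a more careful seeding rule.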

Part of Geometry in Action, a collection of applications of computational geometry.
David Eppstein, Theory Group, ICS, UC Irvine.

Semi-automatically filtered from a common source file.