                  ______________
                 |              |
parameters ----> |  DATA MODEL  |
                 |______________|
                        |
                        | data
                        V
                  ______________
                 |              |
                 | NOISE MODEL  |
                 |______________|
                        |
                        | observations
                        V
                  ______________
                 |              |
                 |  ALGORITHM   | ----> estimate
                 |______________|
To some extent the division between the data model and the noise model is arbitrary (both describe physical processes that are part of the real world); it comes down to which parameters we want to estimate and which we prefer to treat as noise and ignore.
The data model sets the general flavor of the problem being considered: single point estimation (data model: pass the parameters unchanged as data coordinates), regression (data model: the parameters are the coefficients of a linear function relating independent and dependent variables), clustering (data model: choose randomly among several different points or point distributions), or hierarchical clustering (e.g. evolutionary tree reconstruction, in which the parameters define an evolutionary tree and the data model uses this tree to define mutated gene sequences for each leaf of the tree).
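To make the pipeline in the diagram concrete, here is a minimal Python sketch (not part of the original notes; the function names and numbers are invented for illustration) of a regression-flavored data model and a point-estimation data model, composed with a Gaussian noise model and a least-squares estimation algorithm.

    import numpy as np

    def regression_data_model(coeffs, x):
        # Data model: the parameters are the slope and intercept of a linear
        # function relating the independent variable x to the dependent one.
        slope, intercept = coeffs
        return slope * x + intercept

    def point_data_model(theta, n):
        # Data model for single point estimation: every data item is just a
        # copy of the parameter vector theta.
        return np.tile(theta, (n, 1))

    def gaussian_noise_model(ideal_data, sigma, rng):
        # Noise model: perturb the ideal data with independent Gaussian noise.
        return ideal_data + rng.normal(scale=sigma, size=ideal_data.shape)

    rng = np.random.default_rng(0)
    x = np.linspace(0.0, 1.0, 20)
    observations = gaussian_noise_model(regression_data_model((2.0, -1.0), x), 0.1, rng)
    # Algorithm: least squares turns the observations back into an estimate.
    slope_hat, intercept_hat = np.polyfit(x, observations, 1)

Swapping in a different data model (for instance the clustering one) changes the flavor of the problem while the rest of the pipeline keeps the same shape.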
More complicated data models of interest to the AI students in the course are hidden Markov models for time series data and belief propagation methods for decoding error-correcting codes ("turbocodes"). The problem of devising a good error-correcting code can be seen as choosing a data model in such a way that whatever noise you have doesn't destroy too much of your data.
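As a rough sketch of what a hidden Markov model looks like when viewed as a data model (the two-state chain and all numbers below are made up for illustration, not taken from the course):

    import numpy as np

    def sample_hmm(transition, means, sigma, length, rng):
        # Data model: the parameters are the transition matrix and the
        # per-state emission means; the model generates a time series.
        states = [0]
        for _ in range(length - 1):
            states.append(rng.choice(len(transition), p=transition[states[-1]]))
        # Emission step: here the noise model (Gaussian jitter around each
        # state's mean) is folded into the data model.
        return [rng.normal(means[s], sigma) for s in states]

    rng = np.random.default_rng(1)
    series = sample_hmm(np.array([[0.9, 0.1], [0.2, 0.8]]), [0.0, 3.0], 0.5, 50, rng)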
The noise model determines which of several estimation algorithms returning the same sort of output one should choose. For example, if one is estimating a single point value, one might choose among least squares (Gaussian noise), the centroid (for noise distributions with known zero expectation and unknown radial components), the circumcenter (for arbitrary bounded-radius noise distributions), or the minimum circumcenter of n/2 points (for robust outlier elimination).
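For intuition, here is a one-dimensional analogue of these choices in Python (a sketch, not from the original text; the true two-dimensional versions would use the centroid and minimum enclosing circles):

    import numpy as np

    def mean_estimate(xs):
        # Least squares / centroid: appropriate for Gaussian or zero-mean noise.
        return float(np.mean(xs))

    def midrange_estimate(xs):
        # 1-D analogue of the circumcenter: the center of the smallest interval
        # containing all the points, for arbitrary bounded noise.
        return 0.5 * (float(np.min(xs)) + float(np.max(xs)))

    def shortest_half_estimate(xs):
        # 1-D analogue of the minimum circumcenter of n/2 points: the center of
        # the shortest interval containing about half the points, which lets
        # the estimate ignore gross outliers.
        xs = np.sort(np.asarray(xs, dtype=float))
        h = len(xs) // 2
        widths = xs[h:] - xs[:len(xs) - h]
        i = int(np.argmin(widths))
        return 0.5 * (xs[i] + xs[i + h])

    samples = [0.1, -0.2, 0.05, 0.15, -0.1, 25.0]   # one gross outlier
    print(mean_estimate(samples), midrange_estimate(samples), shortest_half_estimate(samples))

The outlier drags the mean and the midrange far from zero, while the shortest-half estimate stays near the true value; settling that kind of trade-off is exactly the job of the noise model.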
The simplest way of optimizing for speed without compromising statistical quality is to work on improved algorithms for already known statistical methods: that is, to find fast algorithms that produce exactly the same output as other slower algorithms already in use. One can also look for approximation algorithms that quickly find a result "almost as good as" the original slower algorithm, but one must be careful to preserve the other important statistical qualities of the original algorithm -- generally it's safest to approximate the estimated values themselves, rather than to approximate whatever objective function the estimation algorithm is optimizing.
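A small illustration of the first strategy (a sketch; the median here is just a stand-in for a more elaborate estimator): quickselect returns exactly the same median as the sorting-based computation, but in expected linear rather than O(n log n) time, so the statistical behavior of the estimator is untouched.

    import random

    def median_by_sorting(xs):
        # Slow but obviously correct: sort, then take the upper median.
        return sorted(xs)[len(xs) // 2]

    def median_by_selection(xs):
        # Quickselect: expected linear time, exactly the same answer.
        k = len(xs) // 2
        xs = list(xs)
        while True:
            pivot = random.choice(xs)
            lows = [x for x in xs if x < pivot]
            pivots = [x for x in xs if x == pivot]
            if k < len(lows):
                xs = lows
            elif k < len(lows) + len(pivots):
                return pivot
            else:
                k -= len(lows) + len(pivots)
                xs = [x for x in xs if x > pivot]

    data = [5, 1, 9, 3, 7, 2, 8]
    assert median_by_sorting(data) == median_by_selection(data)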
David Eppstein,
Theory Group,
Dept. Information & Computer Science,
UC Irvine.