Geometry in Action: Sequence Alignment

Biological Sequence Alignment

Sequence alignment is a problem of taking multiple sequences of characters, representing DNA sequences or proteins, and finding subsequences which match well with each other (according to some measure involving the lengths of the subsequences, the number and lengths of the gaps in them, etc). A common method of performing this is by finding small fragments that match (e.g. by exact string matching techniques) and then stringing the fragments together to form a longer alignment. By considering the positions of the fragments in each string to be coordinates in some Euclidean space, the problem becomes one amenable to orthogonal range searching techniques from computational geometry (and the decomposition of space formed by, for each point, determining the best previous point to chain it to, has strong resemblances to a rectilinear Voronoi diagram). Similar ideas also apply to a problem of comparing sequences of gene markers (where now the coordinates are positions of markers on a chromasome).

Chaining Multiple-Alignment Fragments in Sub-Quadratic Time, G. Myers and W. Miller, U. Arizona, from SODA '95.
Graph partitioning. Geometric methods for finding graph separators. Lecture notes from CS267, UC Berkeley, on high performance computing. This problem has applications in scientific computation, and, apparently, in DNA sequencing.
Sparse dynamic programming, D. Eppstein et al., SODA 1990 and JACM 1992.

Part of Geometry in Action, a collection of applications of computational geometry.
David Eppstein, Theory Group, ICS, UC Irvine.

Semi-automatically filtered from a common source file.