Home | Trees | Indices | Help |
---|
|
BaseKernel.BaseKernel --+ | FGSKernel
Test kernel to calculate similarity between pairs of strings. Similarity is based on the "k-mers" (all substrings of length k) that the two strings have in common, with a de-emphasis on carbon groups.
Conceptually, a feature vector covering all possible k-mers is created for each string, with each element holding the count of the corresponding k-mer in that string. The dot product of the two vectors is then taken as the similarity score. Any k-mer containing a carbon is given half the weight of a normal k-mer.
This is a very large vector of length n^k, where n is the number of letters in the "alphabet" of the string, that is, the number of possible distinct characters the string can contain. Because the vector is sparse (mostly 0's), actual arrays are not used to represent it. Instead, a "feature dictionary" containing only the k-mers found and their counts is created.
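A minimal sketch of the feature-dictionary approach described above, not the class's actual implementation. The names `kmer_counts`, `weighted_similarity`, `CARBON`, and `carbon_weight` are hypothetical. Two assumptions are made: a carbon is marked by the character "C", and the half weight is applied once per shared k-mer inside the dot product.

```python
from collections import Counter

CARBON = "C"  # assumption: a carbon is marked by the character "C"


def kmer_counts(s, k):
    """Return a feature dictionary mapping each k-mer found in s to its count."""
    # Slide a window of width k over the string; strings shorter than k
    # produce an empty dictionary.
    return Counter(s[i:i + k] for i in range(len(s) - k + 1))


def weighted_similarity(a, b, k, carbon_weight=0.5):
    """Sparse dot product of the two k-mer feature dictionaries.

    Assumption: a k-mer containing a carbon contributes carbon_weight
    times the product of its two counts; all other k-mers contribute
    the plain product.
    """
    counts_a = kmer_counts(a, k)
    counts_b = kmer_counts(b, k)
    score = 0.0
    # Only k-mers present in both strings can contribute to the dot product.
    for kmer, count_a in counts_a.items():
        count_b = counts_b.get(kmer, 0)
        if count_b:
            weight = carbon_weight if CARBON in kmer else 1.0
            score += weight * count_a * count_b
    return score
```

Under these assumptions, `weighted_similarity("ABAB", "ABAB", 2)` gives 5.0 (the 2-mer "AB" occurs twice and "BA" once, so 2*2 + 1*1), while `weighted_similarity("ACAC", "ACAC", 2)` gives 2.5 because every shared 2-mer contains a carbon.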
k = -1
Generated by Epydoc 3.0beta1 on Thu Nov 8 17:49:31 2007 | http://epydoc.sourceforge.net |