Package CHEM :: Package Kernel :: Module FGSKernel :: Class FGSKernel
[hide private]
[frames] | no frames]

Class FGSKernel



BaseKernel.BaseKernel --+
                        |
                       FGSKernel

Test kernel to calculate similarity between pairs of strings. Similarity is based on "k-mers" in common between the two strings with a de-emphasis on carbon groups. All substrings of length k.

Conceptually, a feature vector of all possible k-mers is created for each string and has counts assigned to the elements for each respective k-mer that exists in the string. The dot-product between these two vectors is then taken as the similarity score. Any k-mer with a carbon will be given half the weight of a normal 2-mer

This is a very large vector of length (n^k) where n is the number of letters in the "alphabet" of the string. That is, the number of possible distinct characters the string can contain. This is a sparse vector, mostly 0's, thus actual such arrays are not used to represent these arrays. Instead, a "feature dictionary" containing only found k-mers and their counts is created.

Instance Methods [hide private]
 
__init__(self, k)
Constructor.
 
similarity(self, obj1, obj2)
Primary abstract method where, given two objects, should return an appropriate, non-negative, similarity score between the two.
 
buildFeatureDictionary(self, aString)
Create a dictionary keyed by all the k-mers (k-length substrings) of aString, with values equal to the number of times that k-mer appears in aString.

Inherited from BaseKernel.BaseKernel: dictionaryDotProduct, dictionaryEuclideanDistanceSquared, ensureListCapacity, getFeatureDictionary, normalizeFeatureDictionary, outputMatrix, prepareFeatureDictionaryList

Class Variables [hide private]
  k = -1

Inherited from BaseKernel.BaseKernel: featureDictList, objIndex1, objIndex2

Method Details [hide private]

__init__(self, k)
(Constructor)

 
Constructor. Takes the value k as an argument to specify the length of the "k-mer" substrings to find in common.

similarity(self, obj1, obj2)

 
Primary abstract method where, given two objects, should return an appropriate, non-negative, similarity score between the two. Up to the implementing class to define what this is.
Overrides: BaseKernel.BaseKernel.similarity
(inherited documentation)

buildFeatureDictionary(self, aString)

 
Create a dictionary keyed by all the k-mers (k-length substrings) of aString, with values equal to the number of times that k-mer appears in aString.
Overrides: BaseKernel.BaseKernel.buildFeatureDictionary