Package CHEM :: Package Kernel :: Module FunctionalGroupAggregateSpectrumKernel :: Class FunctionalGroupAggregateSpectrumKernel

Class FunctionalGroupAggregateSpectrumKernel



BaseKernel.BaseKernel --+
                        |
                       FunctionalGroupAggregateSpectrumKernel

Simple kernel to calculate similarity between pairs of strings. Similarity is based on the number of "k-mers" (substrings of length k) that the two strings have in common.

Conceptually, each string is mapped to a feature vector with one element per possible k-mer, where each element holds the number of times that k-mer occurs in the string. The similarity score is the dot product of the two vectors.

This vector is very large: its length is n^k, where n is the size of the string's "alphabet", that is, the number of distinct characters the string can contain. Because the vector is sparse (mostly zeros), no such array is actually built. Instead, a "feature dictionary" containing only the k-mers found in the string and their counts is created.
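The feature-dictionary idea can be sketched in a few lines of Python. This is an illustrative sketch only, not this class's implementation: the helper names kmer_counts and sparse_dot are hypothetical, k is passed explicitly, and no length-based weighting is applied.

```python
from collections import Counter


def kmer_counts(a_string, k):
    """Count every k-length substring of a_string (overlapping windows).

    Hypothetical helper; returns an empty Counter when the string is
    shorter than k.
    """
    return Counter(a_string[i:i + k] for i in range(len(a_string) - k + 1))


def sparse_dot(counts1, counts2):
    """Dot product over only the k-mers present in both dictionaries."""
    # Iterate over the smaller dictionary to minimize lookups.
    if len(counts1) > len(counts2):
        counts1, counts2 = counts2, counts1
    return sum(
        count * counts2[kmer]
        for kmer, count in counts1.items()
        if kmer in counts2
    )


features1 = kmer_counts("ABABC", 2)  # AB: 2, BA: 1, BC: 1
features2 = kmer_counts("ABCAB", 2)  # AB: 2, BC: 1, CA: 1
score = sparse_dot(features1, features2)  # 2*2 (AB) + 1*1 (BC) = 5
```

Only the k-mers that actually occur are stored, so memory grows with the string length rather than with n^k, and k-mers missing from either dictionary contribute zero to the score.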

Instance Methods
 
__init__(self, weightFactor=1)
Constructor.
 
similarity(self, obj1, obj2)
Primary abstract method: given two objects, return an appropriate non-negative similarity score between the two.
 
buildFeatureDictionary(self, aString)
Create a dictionary keyed by all the k-mers (k-length substrings) of aString, with values equal to the number of times that k-mer appears in aString.
 
weightCalc(self, stringLen)
Determine the weight that a string of length stringLen (int) should be given.

Inherited from BaseKernel.BaseKernel: dictionaryDotProduct, dictionaryEuclideanDistanceSquared, ensureListCapacity, getFeatureDictionary, normalizeFeatureDictionary, outputMatrix, prepareFeatureDictionaryList

Class Variables
  MIN_K = 2
  weightFactor = 1

Inherited from BaseKernel.BaseKernel: featureDictList, objIndex1, objIndex2

Method Details

__init__(self, weightFactor=1)
(Constructor)

 
Constructor. Takes the value k as an argument to specify the length of the "k-mer" substrings to find in common.

similarity(self, obj1, obj2)

 
Primary abstract method: given two objects, return an appropriate non-negative similarity score between the two. It is up to the implementing class to define what this score is.
Overrides: BaseKernel.BaseKernel.similarity
(inherited documentation)

buildFeatureDictionary(self, aString)

 
Create a dictionary keyed by all the k-mers (k-length substrings) of aString, with values equal to the number of times that k-mer appears in aString.
Overrides: BaseKernel.BaseKernel.buildFeatureDictionary