Class FGSKernel

BaseKernel.BaseKernel --+
                        |
                       FGSKernel

Test kernel to calculate similarity between pairs of strings. Similarity is based on the "k-mers" (all substrings of length k) that the two strings have in common, with a de-emphasis on carbon groups.

Conceptually, a feature vector covering all possible k-mers is created for each string, with each element holding the number of times the corresponding k-mer occurs in that string. The dot product of the two vectors is then taken as the similarity score. Any k-mer containing a carbon is given half the weight of a normal k-mer.

This is a very large vector of length n^k, where n is the number of letters in the "alphabet" of the string, that is, the number of distinct characters the string can contain. Because the vector is sparse (mostly 0s), actual arrays are not used to represent it. Instead, a "feature dictionary" containing only the k-mers found and their counts is created.
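As a minimal sketch of the idea above, and not the class's actual implementation, the weighted dot product can be computed directly on two feature dictionaries. The function name weighted_dot_product, the carbon_weight parameter, and the rule that a k-mer "has a carbon" when it contains the character "C" are all assumptions made for illustration; the example counts are invented.

```python
def weighted_dot_product(dict1, dict2, carbon_weight=0.5):
    """Dot product of two sparse k-mer count dictionaries.

    Assumption: a k-mer "has a carbon" if it contains the character "C",
    and its term in the sum is scaled once by carbon_weight.
    """
    # Iterate over the smaller dictionary; a k-mer missing from either
    # dictionary has an implicit count of 0 and contributes nothing.
    if len(dict1) > len(dict2):
        dict1, dict2 = dict2, dict1

    total = 0.0
    for kmer, count1 in dict1.items():
        count2 = dict2.get(kmer, 0)
        if count2:
            weight = carbon_weight if "C" in kmer else 1.0
            total += weight * count1 * count2
    return total


# Hypothetical 2-mer counts for two strings.
counts_a = {"CC": 2, "CO": 1, "ON": 1}
counts_b = {"CC": 1, "ON": 3, "NN": 1}
score = weighted_dot_product(counts_a, counts_b)
print(score)
```

Only the k-mers present in both dictionaries ("CC" and "ON" here) contribute to the score, so the cost depends on the number of distinct k-mers found rather than on n^k. Note that the naive "C" check would also match a two-letter symbol such as "Cl"; whether the real kernel distinguishes these is not stated in this page.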

Instance Methods

__init__(self, k)
Constructor.

similarity(self, obj1, obj2)
Primary abstract method: given two objects, return an appropriate non-negative similarity score between the two.

buildFeatureDictionary(self, aString)
Create a dictionary keyed by all the k-mers (k-length substrings) of aString, with values equal to the number of times that k-mer appears in aString.

Inherited from BaseKernel.BaseKernel: dictionaryDotProduct, dictionaryEuclideanDistanceSquared, ensureListCapacity, getFeatureDictionary, normalizeFeatureDictionary, outputMatrix, prepareFeatureDictionaryList

Class Variables

k = -1

Inherited from BaseKernel.BaseKernel: featureDictList, objIndex1, objIndex2

Method Details

__init__(self, k)
(Constructor)

Constructor. Takes the value k as an argument to specify the length of the "k-mer" substrings to find in common.

similarity(self, obj1, obj2)

Primary abstract method: given two objects, return an appropriate non-negative similarity score between the two. It is up to the implementing class to define what this score is.

Overrides: BaseKernel.BaseKernel.similarity: (inherited documentation)

buildFeatureDictionary(self, aString)

Create a dictionary keyed by all the k-mers (k-length substrings) of aString, with values equal to the number of times that k-mer appears in aString.
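The behavior described above can be sketched as follows. This is an illustrative stand-alone function, not the method's actual source: the name build_feature_dictionary, the explicit k argument (the real method presumably reads k from the instance), and the use of collections.Counter are assumptions.

```python
from collections import Counter


def build_feature_dictionary(a_string, k):
    """Count every k-length substring (k-mer) of a_string.

    Overlapping k-mers are counted separately. A string shorter than k
    has no k-mers, so an empty dictionary is returned.
    """
    if k < 1:
        raise ValueError("k must be a positive integer")
    # Slide a window of width k across the string, one character at a time.
    kmers = (a_string[i:i + k] for i in range(len(a_string) - k + 1))
    return dict(Counter(kmers))


features = build_feature_dictionary("CCOCC", 2)
print(features)
```

For the string "CCOCC" with k = 2, the windows are "CC", "CO", "OC", and "CC", so "CC" is counted twice and the other two k-mers once each.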

Overrides: BaseKernel.BaseKernel.buildFeatureDictionary
