Class FunctionalGroupAggregateSpectrumKernel

BaseKernel.BaseKernel --+
                        |
                       FunctionalGroupAggregateSpectrumKernel

Simple kernel to calculate similarity between pairs of strings. Similarity is based on the number of "k-mers" (substrings of length k) that the two strings have in common.

Conceptually, a feature vector with one element per possible k-mer is created for each string, and each element holds the number of times that k-mer occurs in the string. The dot product of the two vectors is then taken as the similarity score.
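The conceptual (dense) form can be sketched as follows. This is only an illustration, not the class's implementation: the helper names dense_kmer_vector and dot, the explicit k and alphabet parameters, and the sample strings are all assumptions made for the example.

```python
from itertools import product


def dense_kmer_vector(a_string, k, alphabet):
    """Return a count vector with one slot per possible k-mer over `alphabet`.

    Hypothetical helper for illustration; assumes every character of
    a_string is in `alphabet`.
    """
    # Enumerate every possible k-mer in a fixed order: len(alphabet) ** k slots.
    index = {"".join(p): i for i, p in enumerate(product(alphabet, repeat=k))}
    vector = [0] * len(index)
    # Slide a window of width k over the string and count each k-mer.
    for start in range(len(a_string) - k + 1):
        vector[index[a_string[start:start + k]]] += 1
    return vector


def dot(u, v):
    """Plain dot product of two equal-length vectors."""
    return sum(a * b for a, b in zip(u, v))


# Slots are ordered CC, CO, OC, OO for the two-letter alphabet "CO".
v1 = dense_kmer_vector("CCOC", 2, "CO")  # [1, 1, 1, 0]
v2 = dense_kmer_vector("COCO", 2, "CO")  # [0, 2, 1, 0]
score = dot(v1, v2)                      # 3
```

Even for this toy alphabet the vector has a slot for "OO", which neither string contains; with realistic alphabets almost every slot is zero.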

The full vector would have length n^k, where n is the size of the string's "alphabet", that is, the number of distinct characters the string can contain. Because the vector is sparse (mostly 0's), no such array is actually allocated. Instead, a "feature dictionary" containing only the k-mers found and their counts is created.
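A minimal sketch of the sparse representation, assuming feature dictionaries that map each found k-mer to its count. The function sparse_dot is a hypothetical stand-in written for this example; it is not the inherited dictionaryDotProduct, whose implementation is not shown here. The alphabet size and k used for the size comparison are likewise illustrative.

```python
def sparse_dot(dict1, dict2):
    """Dot product of two sparse count vectors stored as dictionaries.

    k-mers missing from a dictionary count as 0, so only k-mers present
    in both dictionaries contribute to the sum.
    """
    # Iterate over the smaller dictionary to minimise lookups.
    if len(dict1) > len(dict2):
        dict1, dict2 = dict2, dict1
    return sum(count * dict2.get(kmer, 0) for kmer, count in dict1.items())


# Feature dictionaries for the 2-mers of "CCOC" and "COCO".
features1 = {"CC": 1, "CO": 1, "OC": 1}
features2 = {"CO": 2, "OC": 1}
score = sparse_dot(features1, features2)  # 1*2 + 1*1 = 3

# Size of the dense vector that is avoided: n ** k slots.
alphabet_size, k = 20, 5
dense_length = alphabet_size ** k  # 3200000
```

The dictionaries grow with the number of distinct k-mers actually found, which is at most len(aString) - k + 1 per string, rather than with n^k.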

Instance Methods

__init__(self, weightFactor=1)
Constructor.

similarity(self, obj1, obj2)
Primary abstract method which, given two objects, should return an appropriate non-negative similarity score between the two.

buildFeatureDictionary(self, aString)
Create a dictionary keyed by all the k-mers (k-length substrings) of aString, with values equal to the number of times that k-mer appears in aString.

weightCalc(self, stringLen)
Determine the weight that a string of length stringLen (int) should be given.

Inherited from BaseKernel.BaseKernel: dictionaryDotProduct, dictionaryEuclideanDistanceSquared, ensureListCapacity, getFeatureDictionary, normalizeFeatureDictionary, outputMatrix, prepareFeatureDictionaryList

Class Variables

MIN_K = 2

weightFactor = 1

Inherited from BaseKernel.BaseKernel: featureDictList, objIndex1, objIndex2

Method Details

__init__(self, weightFactor=1)
(Constructor)

Constructor. The signature takes weightFactor (default 1); the docstring's statement that it takes the value k, the length of the "k-mer" substrings to find in common, does not match this signature.

similarity(self, obj1, obj2)

Primary abstract method which, given two objects, should return an appropriate non-negative similarity score between the two. It is up to the implementing class to define what this is.

Overrides: BaseKernel.BaseKernel.similarity: (inherited documentation)

buildFeatureDictionary(self, aString)

Create a dictionary keyed by all the k-mers (k-length substrings) of aString, with values equal to the number of times that k-mer appears in aString.

Overrides: BaseKernel.BaseKernel.buildFeatureDictionary
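The k-mer counting described above can be sketched for a single value of k as follows. This is an assumption-laden illustration rather than the method's code: the documented signature takes only aString, so the explicit k parameter and the function name build_kmer_counts are hypothetical.

```python
from collections import Counter


def build_kmer_counts(a_string, k):
    """Map each k-length substring of a_string to its number of occurrences.

    Hypothetical helper; overlapping occurrences are counted separately.
    """
    if k < 1:
        raise ValueError("k must be a positive integer")
    # A string shorter than k yields an empty range, hence an empty dictionary.
    windows = (a_string[i:i + k] for i in range(len(a_string) - k + 1))
    return dict(Counter(windows))


counts = build_kmer_counts("banana", 2)
# {'ba': 1, 'an': 2, 'na': 2}
```

A string of length m contributes m - k + 1 windows, so the counts always sum to that number when m >= k.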
