Package CHEM :: Package Kernel :: Module BaseKernel :: Class BaseKernel

Class BaseKernel

Abstract base class for all kernel classes. Kernel classes are those that can take a pair of objects and calculate a similarity score between them for use in a support vector machine (SVM) style machine-learning application.

These objects need not be of any particular type as far as this interface is concerned. They may be a pair of strings, molecules (OEMolBase), vectors, etc. It is up to the implementing class to make those distinctions.

Ultimately, the scores generated from these kernels will typically be used to build a "Gram matrix" of scores for a list of source objects against itself. This abstract class provides a convenience method for generating this matrix, given an iterator factory for the list, and outputting it as a tab-delimited file.

An object iterator factory, that is, an object that can produce fresh iterators over the object list, must be used rather than a simple iterator, because nested loops iterate over the objects multiple times. For example, a plain file object would be a problem: after the first pass the end-of-file has been reached, so later passes would yield nothing. The Common.IteratorFactory module contains a couple of classes for generating such factories from common source types (files, arrays, oemolistream).
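The nested iteration described above can be sketched as follows. This is a minimal illustration, not the actual implementation: ListIteratorFactory, output_gram_matrix and shared_chars are hypothetical names, and the assumption that a factory hands out fresh iterators through `__iter__` is made only for this example (the real Common.IteratorFactory classes may expose a different interface).

```python
import io


class ListIteratorFactory:
    """Hypothetical factory: every iter() call returns a fresh iterator."""

    def __init__(self, items):
        self.items = list(items)

    def __iter__(self):
        return iter(self.items)


def output_gram_matrix(similarity, obj_iter_factory, out_file):
    """Write similarity(a, b) for every pair of objects as tab-delimited rows."""
    for obj_a in obj_iter_factory:  # outer loop uses one fresh iterator
        # The inner loop asks the factory for a new iterator on every row.
        row = [similarity(obj_a, obj_b) for obj_b in obj_iter_factory]
        out_file.write("\t".join(str(score) for score in row) + "\n")


def shared_chars(a, b):
    """Toy similarity (an assumption): count of distinct shared characters."""
    return len(set(a) & set(b))


out = io.StringIO()
output_gram_matrix(shared_chars, ListIteratorFactory(["ab", "bc"]), out)
gram_text = out.getvalue()
```

Passing a plain iterator such as `iter(["ab", "bc"])` in place of the factory would exhaust it during the first inner loop, so the resulting matrix would be incomplete.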

Instance Methods

similarity(self, obj0, obj1)
Primary abstract method: given two objects, return an appropriate, non-negative similarity score between the two.

dictionaryDotProduct(self, featureDict1, featureDict2)
Given two dictionaries, treat these like vectors and take the "dot-product" between them.

dictionaryEuclideanDistanceSquared(self, featureDict1, featureDict2)
Given two dictionaries, treat these like vectors and calculate the Euclidean distance between them, squared.

buildFeatureDictionary(self, obj)
Optional abstract method.

getFeatureDictionary(self, obj, objIndex)
See if a feature dictionary has already been created for the object at the specified objIndex.

normalizeFeatureDictionary(self, featureDict)
Given a dictionary, interpret it as a feature vector whose values are numeric.

ensureListCapacity(self, aList, targetSize)
Ensure that the given list is at least the given size.

prepareFeatureDictionaryList(self, objIter)
Pre-processing step.

outputMatrix(self, objIterFactory, outFile)
Utility method to calculate a similarity for every pair of objects that come out of the iterators of the objIterFactory and output them to the outFile as a tab-delimited matrix of values.

Class Variables

objIndex1 = -1

objIndex2 = -1
Temporary storage for the feature dictionaries calculated for any objects.

featureDictList = <CHEM.DB.rdb.search.NameRxnPatternMatchingMo...

Method Details

similarity(self, obj0, obj1)

Primary abstract method: given two objects, return an appropriate, non-negative similarity score between the two. It is up to the implementing class to define what this score is.

dictionaryDotProduct(self, featureDict1, featureDict2)

Given two dictionaries, treat these like vectors and take the "dot-product" between them. That is, find all items whose key is found in both dictionaries and expect the item values to be some kind of count or number. For each such key, take the product of the two values, then sum all such products.
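A minimal sketch of this sparse dot product, written as a standalone function rather than the actual method. The name dictionary_dot_product and the choice to iterate over the smaller dictionary are assumptions made for illustration.

```python
def dictionary_dot_product(feature_dict1, feature_dict2):
    """Sum the products of values whose keys occur in both dictionaries."""
    # Iterate over the smaller dictionary so fewer lookups are needed.
    if len(feature_dict1) > len(feature_dict2):
        feature_dict1, feature_dict2 = feature_dict2, feature_dict1
    return sum(
        value * feature_dict2[key]
        for key, value in feature_dict1.items()
        if key in feature_dict2
    )


# Only "C" is shared, so the result is 2 * 3.
dot = dictionary_dot_product({"C": 2, "N": 1}, {"C": 3, "O": 4})
```

Keys present in only one dictionary contribute nothing, which is equivalent to treating the missing value as zero.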

dictionaryEuclideanDistanceSquared(self, featureDict1, featureDict2)

Given two dictionaries, treat these like vectors and calculate the Euclidean distance between them, squared. That is, find all items whose key is found in both dictionaries and expect the item values to be some kind of count or number. Sum the squared differences between each of these pairs of values. Furthermore, if an item exists in one dictionary but not the other, treat the other dictionary as having the item with a value of 0.0.
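The calculation can be sketched as below. This is an illustrative standalone function under the assumptions stated in the description (missing keys count as 0.0), not the actual method body; the name dictionary_euclidean_distance_squared is hypothetical.

```python
def dictionary_euclidean_distance_squared(feature_dict1, feature_dict2):
    """Squared Euclidean distance between two sparse feature dictionaries."""
    total = 0.0
    # Visit every key seen in either dictionary; a missing key counts as 0.0.
    for key in set(feature_dict1) | set(feature_dict2):
        diff = feature_dict1.get(key, 0.0) - feature_dict2.get(key, 0.0)
        total += diff * diff
    return total


# C: (2 - 3)^2 = 1, N: (1 - 0)^2 = 1, O: (0 - 4)^2 = 16
dist_sq = dictionary_euclidean_distance_squared({"C": 2, "N": 1}, {"C": 3, "O": 4})
```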

buildFeatureDictionary(self, obj)

Optional abstract method. Given some object to compare, build a dictionary of its important features such that it is easy to compare any two objects' feature dictionaries.

Combined with the getFeatureDictionary(...) and prepareFeatureDictionaryList(...) methods, this can save a lot of time as the features need only be calculated once for an object, then stored in self.featureDictList instead of being recalculated for every similarity call.

getFeatureDictionary(self, obj, objIndex)

See if a feature dictionary has already been created for the object at the specified objIndex. If so, just return that one. Otherwise build it and store it for future use.

If objIndex < 0, then it is unused, just build and return.
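The caching behavior described here might look like the following sketch. CachingKernelSketch is a hypothetical stand-in, not the real BaseKernel; the character-count feature scheme and the buildCount counter exist only to make the example observable.

```python
class CachingKernelSketch:
    """Hypothetical stand-in for a BaseKernel subclass; not the real class."""

    def __init__(self):
        self.featureDictList = []
        self.buildCount = 0  # illustration only: how often features were built

    def buildFeatureDictionary(self, obj):
        # Assumed feature scheme: character counts of a string.
        self.buildCount += 1
        featureDict = {}
        for char in obj:
            featureDict[char] = featureDict.get(char, 0) + 1
        return featureDict

    def getFeatureDictionary(self, obj, objIndex):
        if objIndex < 0:
            # No usable index: build and return without storing.
            return self.buildFeatureDictionary(obj)
        # Grow the cache with None placeholders until objIndex is valid.
        while len(self.featureDictList) <= objIndex:
            self.featureDictList.append(None)
        if self.featureDictList[objIndex] is None:
            self.featureDictList[objIndex] = self.buildFeatureDictionary(obj)
        return self.featureDictList[objIndex]


kernel = CachingKernelSketch()
first = kernel.getFeatureDictionary("CCO", 0)
second = kernel.getFeatureDictionary("CCO", 0)     # served from the cache
uncached = kernel.getFeatureDictionary("CCO", -1)  # rebuilt, not stored
```

In this sketch the second call with index 0 returns the stored dictionary itself, so the features are built only once per indexed object.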

normalizeFeatureDictionary(self, featureDict)

Given a dictionary, interpret it as a feature vector whose values are numeric. The vector can then be interpreted as having a magnitude (length). Divide all elements (values) by this magnitude to normalize the vector to a length of 1.0.
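A minimal sketch of this normalization. Whether the actual method modifies the dictionary in place or returns a new one is not stated, so returning a new dictionary is an assumption here, as is leaving an all-zero vector unchanged to avoid division by zero. The function name is hypothetical.

```python
import math


def normalize_feature_dictionary(feature_dict):
    """Return a copy of feature_dict scaled to unit Euclidean length."""
    magnitude = math.sqrt(sum(value * value for value in feature_dict.values()))
    if magnitude == 0.0:
        # Assumption: a zero vector cannot be normalized, so return it as is.
        return dict(feature_dict)
    return {key: value / magnitude for key, value in feature_dict.items()}


# Magnitude is sqrt(3^2 + 4^2) = 5, so the values become 3/5 and 4/5.
normalized = normalize_feature_dictionary({"C": 3, "O": 4})
```

After normalization, the dot product of a vector with itself is 1.0 (up to floating-point rounding), which keeps kernel scores on a comparable scale.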

ensureListCapacity(self, aList, targetSize)

Ensure that the given list is at least the given size. If it is not currently, then keep appending None elements until that size is achieved.
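As a short illustration of the padding behavior (a sketch with a hypothetical standalone function name, assuming the list is modified in place):

```python
def ensure_list_capacity(a_list, target_size):
    """Append None until a_list holds at least target_size elements."""
    while len(a_list) < target_size:
        a_list.append(None)


cache = ["x"]
ensure_list_capacity(cache, 3)   # pads with two None placeholders
already_big = [1, 2, 3, 4]
ensure_list_capacity(already_big, 2)  # long enough already, left untouched
```

Padding with None lets a later lookup distinguish "not yet calculated" slots from stored feature dictionaries.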

prepareFeatureDictionaryList(self, objIter)

Pre-processing step. Calculate the feature dictionary for each object, and do so only once here. Subsequent calls to getFeatureDictionary(...) can then just get the stored object from the list instead of recalculating it.

Class Variable Details

objIndex2

Temporary storage for the feature dictionaries calculated for any objects. If the members objIndex1 or objIndex2 are set, then fill in these values accordingly. That way, the next time they're needed, they can just be accessed directly instead of requiring another calculation.

Value:

-1

featureDictList

Value:

None