Module CHEM.ML.Similarity.BaseSimilarity

Class BaseSimilarity

Abstract base class for all similarity measure classes. Similarity classes should take pairs of objects and calculate some similarity score between them for use in a support vector machine (SVM) style machine-learning application.

Though this should generalize to any input object type, as a genuine kernel function would, in current practice the data objects are expected to have been run through one of the "kernel" classes that map them into feature vectors. More specifically, those classes extract features into a feature dictionary that represents a sparse feature vector. Once the dictionaries are ready, these similarity classes can apply any of a variety of similarity measures to them.
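As an illustration of the dictionary representation described above, here is a minimal sketch that assumes features are plain hashable labels. The helper names `featureDict` and `tanimoto` are hypothetical and not part of CHEM.ML; the Tanimoto (extended Jaccard) coefficient is used only as one example of a measure that works directly on such sparse dictionaries.

```python
from collections import Counter


def featureDict(features):
    """Hypothetical helper: map an iterable of feature labels to a sparse count dict."""
    # Features that never occur are simply absent, i.e. implicitly 0.
    return dict(Counter(features))


def tanimoto(d1, d2):
    """Tanimoto (extended Jaccard) similarity between two sparse count dicts."""
    # Iterate over the smaller dict; only keys present in both contribute.
    if len(d1) > len(d2):
        d1, d2 = d2, d1
    dot = sum(v * d2[k] for k, v in d1.items() if k in d2)
    norm1 = sum(v * v for v in d1.values())
    norm2 = sum(v * v for v in d2.values())
    denom = norm1 + norm2 - dot
    # Two empty vectors: return 0.0 instead of dividing by zero.
    return float(dot) / denom if denom else 0.0
```

For non-negative counts the score lies in [0, 1], which fits the non-negativity requirement placed on similarity scores.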

Ultimately, the scores generated from these kernels will probably be used to build a "Gram matrix" of scores. This abstract class provides a convenience method for generating this matrix, given iterator factories over the two lists to compare, and writes the matrix out as a tab-delimited file.

Instance Methods


__init__(self)
Default constructor.

loadOptions(self, options)
Given an options object derived from an optparse.OptionParser, load any options of interest into the state of this object.

loadArgs(self, args)
Similar to loadOptions, handle the args that come out of optparse.OptionParser.

main(self, argv)
Main method, callable from the command line.

__call__(self, obj1, obj2)
Primary abstract method that, given two objects, should return an appropriate, non-negative similarity score between them.

outputMatrix(self, inputIterFactory1, inputIterFactory2, outFile)
Utility method to calculate a similarity for every pair of objects that come out of the input iterator factories and output them to the outFile as a tab-delimited matrix of values.

Class Variables


parser = <CHEM.DB.rdb.search.NameRxnPatternMatchingModel.Searc...

inputIterFactory1 = <CHEM.DB.rdb.search.NameRxnPatternMatching...

inputIterFactory2 = <CHEM.DB.rdb.search.NameRxnPatternMatching...

outFile = <CHEM.DB.rdb.search.NameRxnPatternMatchingModel.Sear...

Method Details


__init__(self)
(Constructor)

Default constructor. Sets up the expected command-line options. Subclasses can add their own options on top of these, but should take care not to reuse an option letter that is already taken.
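The warning about option letters can be seen with the standard optparse module. This is a minimal sketch only: the `-o/--outFile` option and the helpers `makeBaseParser` and `addSubclassOption` are hypothetical stand-ins, because the actual options registered by this constructor are not listed here.

```python
from optparse import OptionParser, OptionConflictError


def makeBaseParser():
    """Hypothetical stand-in for the options a base constructor might register."""
    parser = OptionParser(usage="%prog [options] <input1> <input2>")
    parser.add_option("-o", "--outFile", dest="outFile", default="-",
                      help="Output file for the matrix ('-' for stdout)")
    return parser


def addSubclassOption(parser, letter):
    """Try to register a subclass option; return True if the letter was free."""
    try:
        parser.add_option(letter, dest="extra", action="store_true", default=False)
    except OptionConflictError:
        # optparse's default conflict handler ("error") rejects a reused option string.
        return False
    return True
```

With optparse's default settings, a clash is reported immediately as an `OptionConflictError` instead of silently replacing the base option.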

loadOptions(self, options)

Given an options object derived from an optparse.OptionParser, load any options of interest into the state of this object. Subclasses should override this method to handle any options they added to the command-line parser in their constructor.
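One way a subclass might honor this contract is to delegate to the base implementation and then read its own options. The sketch below assumes that pattern; both classes, their options, and the attribute names `outFileName` and `normalize` are hypothetical and only mimic the documented shape of `loadOptions`.

```python
from optparse import OptionParser


class BaseSimilaritySketch(object):
    """Hypothetical base class: registers one option and stores it in loadOptions."""

    def __init__(self):
        self.parser = OptionParser()
        self.parser.add_option("-o", "--outFile", dest="outFile", default="-")
        self.outFileName = None

    def loadOptions(self, options):
        # Copy only the options this class registered into object state.
        self.outFileName = options.outFile


class NormalizedSimilaritySketch(BaseSimilaritySketch):
    """Hypothetical subclass: adds an option, then handles it after the base does."""

    def __init__(self):
        BaseSimilaritySketch.__init__(self)
        self.parser.add_option("-n", "--normalize", dest="normalize",
                               action="store_true", default=False)
        self.normalize = False

    def loadOptions(self, options):
        # Let the base class load its options first, then load our own.
        BaseSimilaritySketch.loadOptions(self, options)
        self.normalize = options.normalize
```

Calling the base method first keeps each class responsible only for the options it registered.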

loadArgs(self, args)

Similar to loadOptions, but handles the positional args that come out of optparse.OptionParser. The subclass is responsible for translating the command-line arguments into actual input iterators and an output file object.

Note that simple streaming file iterators are not sufficient. Iterator "factories" are needed so that fresh iterators over the same data can be obtained multiple times.
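The reason a factory is needed becomes clear when every pair is visited with nested loops. In this minimal sketch a list stands in for re-openable data, and the helper names are hypothetical; with a real file, the factory could simply reopen it, for example `lambda: open(fileName)`.

```python
def makeIterFactory(items):
    """Hypothetical helper: a zero-argument callable returning a fresh iterator."""
    def factory():
        return iter(items)
    return factory


def countPairs(factory1, factory2):
    """Visit every (obj1, obj2) pair, requesting a new inner iterator per row."""
    n = 0
    for obj1 in factory1():
        for obj2 in factory2():
            n += 1
    return n


def countPairsNaive(iter1, iter2):
    """Broken variant: iter2 is exhausted after the first outer object."""
    n = 0
    for obj1 in iter1:
        for obj2 in iter2:
            n += 1
    return n
```

With three objects on each side the factory version visits all 9 pairs, while the naive version stops after 3 because the inner iterator cannot be restarted.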

A default implementation is provided here, which assumes the arguments are names of feature dictionary files parseable by FeatureDictReader. If this is not the case, for example if the inputs should be read through something like an oemolistream or as a simple file, the subclass should override this method.

This implementation automatically handles the likely scenario in which both input iterators are over the same source file, and can even accommodate both arguments being "-" (standard input).
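Standard input can be read only once, so one way to support "-" for both arguments is to cache its contents and let both factories iterate over the cached copy. This is a minimal sketch under that assumption; `makeIterFactories` is hypothetical, and it yields stripped text lines where the real default implementation would parse feature dictionaries with FeatureDictReader.

```python
import sys


def makeIterFactories(name1, name2, stdin=None):
    """Hypothetical sketch: build two iterator factories over the named inputs.

    "-" means standard input (or any iterable of lines passed as `stdin`).
    Since it can only be read once, its contents are cached and shared.
    Regular files are reopened on every call instead.
    """
    if stdin is None:
        stdin = sys.stdin
    cache = {}

    def factoryFor(name):
        if name == "-":
            if "-" not in cache:
                # Read stdin a single time; both factories reuse this list.
                cache["-"] = [line.rstrip("\n") for line in stdin]
            lines = cache["-"]
            return lambda: iter(lines)

        def openFile():
            # Reopen the file so each call starts from the beginning.
            with open(name) as f:
                for line in f:
                    yield line.rstrip("\n")
        return openFile

    return factoryFor(name1), factoryFor(name2)
```

When both names refer to the same regular file, each factory simply reopens it, so no special handling is needed in that case.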

main(self, argv)

Main method, callable from the command line. Sets up several common options that all of the subclasses share.
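The methods on this page suggest a control flow for `main`: parse `argv`, then call loadOptions, loadArgs and outputMatrix in turn. The toy class below is a sketch of that assumed flow, not the actual implementation; `ToySimilarity`, its `-s/--scale` option, its comma-separated argument format and its scoring rule are all hypothetical.

```python
import sys
from optparse import OptionParser


class ToySimilarity(object):
    """Hypothetical toy class showing one possible main() control flow."""

    def __init__(self):
        self.parser = OptionParser(usage="%prog [options] <items1> <items2>")
        self.parser.add_option("-s", "--scale", dest="scale", type="float",
                               default=1.0, help="Multiplier applied to every score")
        self.scale = 1.0
        self.inputIterFactory1 = self.inputIterFactory2 = None
        self.outFile = sys.stdout

    def loadOptions(self, options):
        self.scale = options.scale

    def loadArgs(self, args):
        if len(args) != 2:
            self.parser.error("expected two arguments")  # prints usage and exits
        # Hypothetical argument format: comma-separated lists of items.
        items1, items2 = args[0].split(","), args[1].split(",")
        self.inputIterFactory1 = lambda: iter(items1)
        self.inputIterFactory2 = lambda: iter(items2)

    def __call__(self, obj1, obj2):
        # Toy, non-negative score: the scale for identical items, else 0.0.
        return self.scale * (1.0 if obj1 == obj2 else 0.0)

    def outputMatrix(self, inputIterFactory1, inputIterFactory2, outFile):
        for obj1 in inputIterFactory1():
            row = [str(self(obj1, obj2)) for obj2 in inputIterFactory2()]
            outFile.write("\t".join(row) + "\n")

    def main(self, argv):
        # argv mirrors sys.argv, so argv[0] (the program name) is skipped.
        options, args = self.parser.parse_args(argv[1:])
        self.loadOptions(options)
        self.loadArgs(args)
        self.outputMatrix(self.inputIterFactory1, self.inputIterFactory2, self.outFile)
```

A script would typically end with `ToySimilarity().main(sys.argv)` under an `if __name__ == "__main__":` guard.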

__call__(self, obj1, obj2)
(Call operator)

Primary abstract method that, given two objects, should return an appropriate, non-negative similarity score between them. It is up to the implementing class to define what this score is.

This uses the "callable" interface, which means the object is used like a function. For example:

    >>> from DotProduct import DotProduct
    >>> similarity = DotProduct()
    >>> featureDict1 = {"a": 1, "b": 2, "c": 3}
    >>> featureDict2 = {"z": 3, "b": 2, "c": 1}
    >>> print(similarity(featureDict1, featureDict2))  # The object is called like a function
    7
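To show how a subclass might fill in `__call__`, here is a minimal sketch of a sparse dot product over feature dictionaries. `DotProductSketch` is hypothetical and is not the library's DotProduct class. It derives from `object` only so the sketch is self-contained; a real implementation would subclass BaseSimilarity. The score is non-negative only if the feature values are non-negative, which is assumed here.

```python
class DotProductSketch(object):
    """Hypothetical callable similarity: sparse dot product of two feature dicts."""

    def __call__(self, featureDict1, featureDict2):
        # Iterate over the smaller dict; keys missing from the other contribute 0.
        if len(featureDict1) > len(featureDict2):
            featureDict1, featureDict2 = featureDict2, featureDict1
        return sum(value * featureDict2.get(key, 0)
                   for key, value in featureDict1.items())
```

On the dictionaries from the example, only the shared keys "b" (2 * 2) and "c" (3 * 1) contribute, giving 7.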

outputMatrix(self, inputIterFactory1, inputIterFactory2, outFile)

Utility method to calculate a similarity for every pair of objects that come out of the input iterator factories and output them to the outFile as a tab-delimited matrix of values.

The output has n rows, one per object from the first input iterator, and m columns, one per object from the second input iterator.
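The n x m layout can be sketched as follows, assuming no header row and `str()` formatting of each score. Both choices are assumptions, since this page does not specify them. `matrixLines` and the free-function form of `outputMatrix` are hypothetical; the documented method takes `self` in place of the `similarity` argument.

```python
def matrixLines(similarity, inputIterFactory1, inputIterFactory2):
    """Yield one tab-delimited row per object from the first factory."""
    for obj1 in inputIterFactory1():
        # A fresh inner iterator is requested for every row.
        yield "\t".join(str(similarity(obj1, obj2)) for obj2 in inputIterFactory2())


def outputMatrix(similarity, inputIterFactory1, inputIterFactory2, outFile):
    """Write the n x m score matrix to outFile, one line per row."""
    for line in matrixLines(similarity, inputIterFactory1, inputIterFactory2):
        outFile.write(line + "\n")
```

Generating rows separately from writing them keeps the formatting easy to check without a real output file.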

Class Variable Details


parser

Value:

None

inputIterFactory1

Value:

None

inputIterFactory2

Value:

None

outFile

Value:

None
