Home | Trees | Indices | Help |
---|
Abstract base class for all similarity measure classes. Similarity classes take pairs of objects and calculate a similarity score between them, for use in support vector machine (SVM) style machine-learning applications.
Although this should generalize to any input object type, as a genuine kernel function would, in current practice the data objects are expected to have been run through one of the "kernel" classes, which map each object into a feature vector. More specifically, those classes extract features into a feature dictionary that represents a sparse feature vector. Once the dictionaries are ready, these similarity classes can apply any of a variety of similarity measures to them.
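As an illustration of applying similarity measures to feature dictionaries, the sketch below treats each dictionary as a sparse vector whose missing keys count as zero. The helper names (sparse_dot, cosine, tanimoto) are hypothetical, and cosine and Tanimoto are example measures chosen here, not necessarily ones this package provides. Assuming non-negative feature values, both scores fall between 0 and 1.

```python
import math
from typing import Dict

# Hypothetical alias: a feature dictionary is a sparse vector keyed by feature name.
FeatureDict = Dict[str, float]


def sparse_dot(a: FeatureDict, b: FeatureDict) -> float:
    """Dot product of two sparse vectors; features missing from either side count as zero."""
    # Iterate over the smaller dictionary so the cost depends on the sparser vector.
    if len(a) > len(b):
        a, b = b, a
    return sum(value * b[key] for key, value in a.items() if key in b)


def cosine(a: FeatureDict, b: FeatureDict) -> float:
    """Cosine similarity: dot(a, b) / (|a| * |b|), or 0.0 if either vector is empty."""
    denom = math.sqrt(sparse_dot(a, a) * sparse_dot(b, b))
    return sparse_dot(a, b) / denom if denom else 0.0


def tanimoto(a: FeatureDict, b: FeatureDict) -> float:
    """Continuous Tanimoto: dot(a, b) / (|a|^2 + |b|^2 - dot(a, b)), or 0.0 for empty input."""
    dot = sparse_dot(a, b)
    denom = sparse_dot(a, a) + sparse_dot(b, b) - dot
    return dot / denom if denom else 0.0


feature_dict1 = {"a": 1, "b": 2, "c": 3}
feature_dict2 = {"z": 3, "b": 2, "c": 1}
print(cosine(feature_dict1, feature_dict2))    # 0.5
print(tanimoto(feature_dict1, feature_dict2))  # one third
```

With these two dictionaries, only the shared features "b" and "c" contribute, giving a dot product of 2*2 + 3*1 = 7 and squared norms of 14 each, so the cosine is 7/14 = 0.5 and the Tanimoto score is 7/(14 + 14 - 7) = 1/3.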
Ultimately, the scores generated by these similarity classes will probably be used to build a "Gram matrix" of scores. This abstract class provides a convenience method for generating this matrix, given iterators over the two lists to compare, and writes the matrix out as a tab-delimited file.
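A minimal sketch of what the Gram matrix contains, assuming both lists fit in memory and that the similarity measure is any two-argument callable. Entry [i][j] is the score between the i-th object of the first list and the j-th object of the second; comparing a list against itself gives a symmetric matrix, which is the form an SVM kernel matrix takes. The names dot_product and gram_matrix are hypothetical, and this in-memory version is only for illustration, since the class's own convenience method writes the matrix to a tab-delimited file.

```python
from typing import Callable, Dict, List, Sequence

# Hypothetical aliases for readability.
FeatureDict = Dict[str, float]
Similarity = Callable[[FeatureDict, FeatureDict], float]


def dot_product(a: FeatureDict, b: FeatureDict) -> float:
    """Sparse dot product; features missing from b count as zero."""
    return sum(value * b.get(key, 0) for key, value in a.items())


def gram_matrix(similarity: Similarity,
                rows: Sequence[FeatureDict],
                cols: Sequence[FeatureDict]) -> List[List[float]]:
    """Return the n x m matrix whose [i][j] entry is similarity(rows[i], cols[j])."""
    return [[similarity(r, c) for c in cols] for r in rows]


examples = [{"a": 1, "b": 2, "c": 3}, {"z": 3, "b": 2, "c": 1}]
print(gram_matrix(dot_product, examples, examples))  # [[14, 7], [7, 14]]
```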
parser = <CHEM.DB.rdb.search.NameRxnPatternMatchingModel.Searc
inputIterFactory1 = <CHEM.DB.rdb.search.NameRxnPatternMatching
inputIterFactory2 = <CHEM.DB.rdb.search.NameRxnPatternMatching
outFile = <CHEM.DB.rdb.search.NameRxnPatternMatchingModel.Sear
Similar to loadOptions, this handles the args that come out of optparse.OptionParser. The subclass is responsible for translating the command-line arguments into actual input iterators and an output file object. Note that simple streaming file iterators are not enough: iterator "factories" are needed so that fresh iterators over the same data can be obtained multiple times. A default implementation is provided here, which assumes the arguments are names of feature dictionary files parseable by FeatureDictReader. If this is not the case, for example if the inputs should be read with something like an oemolistream or a simple file, the subclass should override this method. This implementation automatically accounts for the likely scenario that both input iterators are over the same source file, and can even accommodate both arguments being "-" (stdin).
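The sketch below shows one way iterator factories could be built from two command-line arguments. It assumes a hypothetical one-record-per-line format of whitespace-separated feature:value tokens; the real FeatureDictReader format is not documented on this page, and parse_feature_line, make_factory and make_factories are illustrative names. A factory here is simply a zero-argument callable that returns a new iterator each time it is called. Because stdin can only be read once, the "-" case buffers its records in memory, and identical arguments share a single factory so stdin is never read twice.

```python
import sys
from typing import Callable, Dict, Iterator, Tuple

FeatureDict = Dict[str, float]
IterFactory = Callable[[], Iterator[FeatureDict]]


def parse_feature_line(line: str) -> FeatureDict:
    """Hypothetical format: whitespace-separated "feature:value" tokens."""
    features = {}
    for token in line.split():
        key, _, value = token.rpartition(":")
        features[key] = float(value)
    return features


def make_factory(source: str) -> IterFactory:
    """Return a callable that yields a fresh iterator over the records in source."""
    if source == "-":
        # stdin can only be consumed once, so read it eagerly and replay from memory.
        cached = [parse_feature_line(line) for line in sys.stdin if line.strip()]
        return lambda: iter(cached)

    def factory() -> Iterator[FeatureDict]:
        # Each call reopens the file, so every iterator is independent.
        with open(source) as handle:
            for line in handle:
                if line.strip():
                    yield parse_feature_line(line)

    return factory


def make_factories(source1: str, source2: str) -> Tuple[IterFactory, IterFactory]:
    """Build both factories, sharing one when both arguments name the same source."""
    factory1 = make_factory(source1)
    # Reusing the factory means "-" "-" reads stdin only once.
    factory2 = factory1 if source2 == source1 else make_factory(source2)
    return factory1, factory2
```

In an option-handling method, something like `factory1, factory2 = make_factories(args[0], args[1])` would then give the two inputs for the Gram matrix.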
Primary abstract method: given two objects, it should return an appropriate, non-negative similarity score between them. It is up to the implementing class to define what this score is. This uses the "callable" interface, which means the object itself is used like a function. For example:

    >>> from DotProduct import DotProduct
    >>> similarity = DotProduct()
    >>> featureDict1 = {"a":1,"b":2,"c":3}
    >>> featureDict2 = {"z":3,"b":2,"c":1}
    >>> print(similarity(featureDict1, featureDict2))  # Note that the object looks like a function call
    7

Only the shared features "b" and "c" contribute, giving 2*2 + 3*1 = 7.
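A minimal sketch of the callable interface, assuming Python 3's abc module. BaseSimilarity is a hypothetical stand-in for this abstract class, and the DotProduct below is an illustration rather than the module's actual implementation. Declaring __call__ abstract means the base class cannot be instantiated directly, and a subclass only has to supply the scoring logic.

```python
from abc import ABC, abstractmethod
from typing import Dict

FeatureDict = Dict[str, float]


class BaseSimilarity(ABC):
    """Hypothetical stand-in for the abstract similarity base class."""

    @abstractmethod
    def __call__(self, obj1, obj2) -> float:
        """Return a non-negative similarity score between obj1 and obj2."""


class DotProduct(BaseSimilarity):
    """Illustrative subclass: sparse dot product of two feature dictionaries."""

    def __call__(self, obj1: FeatureDict, obj2: FeatureDict) -> float:
        # Iterate over the smaller dictionary; missing features count as zero.
        if len(obj1) > len(obj2):
            obj1, obj2 = obj2, obj1
        return sum(value * obj2[key] for key, value in obj1.items() if key in obj2)


similarity = DotProduct()
print(similarity({"a": 1, "b": 2, "c": 3}, {"z": 3, "b": 2, "c": 1}))  # 7
```

A dot product is only guaranteed to be non-negative when the feature values are non-negative, as with feature counts.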
Utility method that calculates the similarity for every pair of objects produced by the two input iterator factories and writes the scores to outFile as a tab-delimited matrix. The output has n rows, one per object from the first input iterator, and m columns, one per object from the second input iterator.
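A sketch of this method shows why iterator factories are required: every row needs its own complete pass over the second input. The names write_gram_matrix and dot_product are hypothetical, the factories are assumed to be zero-argument callables returning fresh iterators, and scores are written with Python's default str() formatting, which may differ from what the real method emits.

```python
import io
from typing import Callable, Dict, Iterable, TextIO

FeatureDict = Dict[str, float]
Similarity = Callable[[FeatureDict, FeatureDict], float]
IterFactory = Callable[[], Iterable[FeatureDict]]


def write_gram_matrix(similarity: Similarity, factory1: IterFactory,
                      factory2: IterFactory, out: TextIO) -> None:
    """Write one tab-delimited row per object from factory1, one column per object from factory2."""
    for row_obj in factory1():
        # A fresh iterator over the second input is requested for every row;
        # a single one-shot iterator would be exhausted after the first row.
        scores = [similarity(row_obj, col_obj) for col_obj in factory2()]
        out.write("\t".join(str(score) for score in scores) + "\n")


def dot_product(a: FeatureDict, b: FeatureDict) -> float:
    """Sparse dot product; features missing from b count as zero."""
    return sum(value * b.get(key, 0) for key, value in a.items())


data = [{"a": 1, "b": 2, "c": 3}, {"z": 3, "b": 2, "c": 1}]
buffer = io.StringIO()
write_gram_matrix(dot_product, lambda: iter(data), lambda: iter(data), buffer)
print(buffer.getvalue(), end="")  # two rows: 14<TAB>7 and 7<TAB>14
```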
Generated by Epydoc 3.0beta1 on Thu Nov 8 17:49:31 2007 (http://epydoc.sourceforge.net)