
Class SpectrumExtractor



BaseFeatureExtractor.BaseFeatureExtractor --+
                                            |
                                           SpectrumExtractor

Simple class to extract substrings as features of string objects. Features are based on the counts of "k-mers" in a string, that is, all substrings of length k.

Conceptually, a feature vector with one element per possible k-mer is created for each string, and each element holds the number of times its k-mer occurs in the string.

This vector is very large: its length is n^k, where n is the size of the string's "alphabet", that is, the number of distinct characters the string can contain. Because the vector is sparse (mostly zeros), it is never materialized as an actual array. Instead, a "feature dictionary" containing only the k-mers found and their counts is created.
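The contrast between the conceptual dense vector and the feature dictionary can be illustrated with a small sketch. This is not the module's code: the function names `dense_kmer_vector` and `sparse_kmer_counts`, the four-letter alphabet, and the sample string are all assumptions chosen for illustration.

```python
from itertools import product


def dense_kmer_vector(s, k, alphabet):
    """Conceptual dense vector: one slot per possible k-mer (n**k slots)."""
    # Enumerate every possible k-mer over the alphabet (hypothetical helper).
    kmers = ["".join(p) for p in product(alphabet, repeat=k)]
    index = {kmer: i for i, kmer in enumerate(kmers)}
    vector = [0] * len(kmers)
    # Slide a window of width k over the string and count each substring.
    for i in range(len(s) - k + 1):
        vector[index[s[i:i + k]]] += 1
    return kmers, vector


def sparse_kmer_counts(s, k):
    """Feature dictionary: only the k-mers actually found, with counts."""
    counts = {}
    for i in range(len(s) - k + 1):
        kmer = s[i:i + k]
        counts[kmer] = counts.get(kmer, 0) + 1
    return counts


kmers, dense = dense_kmer_vector("ACGTAC", 2, "ACGT")
sparse = sparse_kmer_counts("ACGTAC", 2)

print(len(dense))  # 4**2 slots in the dense form
print(sparse)      # only the k-mers that occur
```

Here the dense form needs 16 slots to record 4 distinct k-mers, and the gap widens exponentially as k grows, which is the motivation for the dictionary representation.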

Instance Methods
 
__init__(self)
Constructor.
 
loadOptions(self, options)
Load relevant options derived from an optparse.OptionParser into the state of this object.
 
__call__(self, obj)
Create a dictionary keyed by all the k-mers (k-length substrings) of the input string object, with values equal to the number of times that k-mer appears in the string.
 
objectDescription(self, obj)
The input is itself a string, so the input object is returned unchanged.

Inherited from BaseFeatureExtractor.BaseFeatureExtractor: getNameID, loadArgs, main, outputFeatures

Class Variables
  k = <CHEM.DB.rdb.search.NameRxnPatternMatchingModel.SearchSent...

Inherited from BaseFeatureExtractor.BaseFeatureExtractor: inputFunction, inputIter, outFile, parser

Method Details

__init__(self)
(Constructor)

 
Constructor. Initializes expected command-line options.
Overrides: BaseFeatureExtractor.BaseFeatureExtractor.__init__

loadOptions(self, options)

 

Load relevant options derived from an optparse.OptionParser into the state of this object.

Subclasses should have this method handle any options they added to the command-line parser in the constructor.
Overrides: BaseFeatureExtractor.BaseFeatureExtractor.loadOptions
(inherited documentation)
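The constructor/loadOptions pairing described above might look roughly like the following sketch. It uses only the standard-library optparse module. The class name `OptionsSketch`, the `-k`/`--kmerLength` flag, and its default value are hypothetical and are not taken from the actual module.

```python
from optparse import OptionParser


class OptionsSketch:
    """Hypothetical stand-in showing the constructor/loadOptions pairing."""

    k = None  # mirrors the documented class variable, which defaults to None

    def __init__(self):
        # The constructor registers the command-line options it expects.
        self.parser = OptionParser()
        self.parser.add_option(
            "-k", "--kmerLength", dest="k", type="int", default=3,
            help="Length of the substrings to count (assumed option name).",
        )

    def loadOptions(self, options):
        # Copy the parsed option values into the state of this object.
        self.k = options.k


extractor = OptionsSketch()
options, args = extractor.parser.parse_args(["-k", "2", "input.txt"])
extractor.loadOptions(options)
print(extractor.k, args)
```

Keeping option registration in the constructor and option loading in a separate method lets a base class own the parser while each subclass handles only the options it added.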

__call__(self, obj)
(Call operator)

 
Create a dictionary keyed by all the k-mers (k-length substrings) of the input string object, with values equal to the number of times that k-mer appears in the string.
Overrides: BaseFeatureExtractor.BaseFeatureExtractor.__call__

objectDescription(self, obj)

 
The input is itself a string, so the input object is returned unchanged.
Overrides: BaseFeatureExtractor.BaseFeatureExtractor.objectDescription
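As a rough illustration of how the call operator and objectDescription fit together, the sketch below pairs each input's description with its feature dictionary. `SpectrumSketch` is a hypothetical stand-in written for this example; it does not inherit from BaseFeatureExtractor and is not the module's implementation. The sample strings are arbitrary.

```python
from collections import Counter


class SpectrumSketch:
    """Hypothetical stand-in, not the real SpectrumExtractor."""

    k = None  # set before use, e.g. from parsed options

    def __call__(self, obj):
        # Count every k-length substring of the input string.
        windows = (obj[i:i + self.k] for i in range(len(obj) - self.k + 1))
        return dict(Counter(windows))

    def objectDescription(self, obj):
        # The input is already a string, so it describes itself.
        return obj


extractor = SpectrumSketch()
extractor.k = 3
rows = [(extractor.objectDescription(s), extractor(s)) for s in ["CCOCC", "CC"]]
for description, features in rows:
    print(description, features)
```

In this sketch, a string shorter than k simply yields an empty feature dictionary. Whether the real class behaves the same way is not stated in this documentation.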

Class Variable Details

k

Value:
None