CHEM.ML.Util.FeatureDictReader

Decodes data feature vectors (represented as index:value lines in a text file) into string:value feature dictionary objects.

These files are typically produced by the matching "FeatureDictWriter" class. The basic strategy is to translate the index numbers back into the string representations of the features. This class (which extends the dict class) stores the index:feature mappings for subsequent access (note that this is the inverse of the FeatureDictWriter's feature:index mappings).
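As a rough illustration of the back-translation described above, the sketch below decodes one index:value line using an index:feature mapping. The line layout (whitespace-separated index:value tokens), the numeric values, and the names decode_line and index_to_feature are assumptions made for illustration only; the actual file format written by FeatureDictWriter may differ.

```python
def decode_line(line, index_to_feature):
    """Translate 'index:value' tokens into a feature:value dictionary.

    Assumes whitespace-separated tokens and numeric values; both are
    illustrative guesses, not the documented file format.
    """
    feature_dict = {}
    for token in line.split():
        index_str, value_str = token.split(":", 1)
        # Look up the feature string that this index stands for.
        feature = index_to_feature[int(index_str)]
        feature_dict[feature] = float(value_str)
    return feature_dict


# Hypothetical index:feature mapping (the inverse of a writer's feature:index map).
index_to_feature = {0: "C-C", 1: "C=O", 2: "N-H"}
decoded = decode_line("0:2 2:1.5", index_to_feature)
print(decoded)
```

Features absent from a line are simply left out of the resulting dictionary, which keeps sparse vectors compact.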

Note that the default implementation assumes the features are represented as simple text strings. If you want a more sophisticated object representation, you'll have to extend this class and override the parseFeature method to translate the string into the feature object.

__init__(self, infile)
(Constructor)

Constructor expects an input file (object, not filename) to read from.

Returns:

new empty dictionary

Overrides: dict.__init__

parseFeature(featureStr)
Static Method

Given the string representation of a feature, as extracted from the feature file, return the actual feature object used to key the dictionaries. For string-based kernels (and by default), this can just be the string itself.
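The override pattern might look roughly like the following. This is a sketch only: StringFeatureReader is a stand-in written for this example rather than the real FeatureDictReader, and registerFeature and the comma-separated feature syntax are hypothetical. Only the idea of a static parseFeature that a subclass replaces comes from the description above.

```python
class StringFeatureReader(dict):
    """Stand-in for the reader: keeps features as plain strings."""

    @staticmethod
    def parseFeature(featureStr):
        # Default behavior: the string itself is the feature object.
        return featureStr

    def registerFeature(self, index, featureStr):
        # Hypothetical helper: store an index:feature mapping, routing
        # the raw string through parseFeature first.
        self[index] = self.parseFeature(featureStr)


class TupleFeatureReader(StringFeatureReader):
    """Subclass that turns 'a,b,c' strings into tuples."""

    @staticmethod
    def parseFeature(featureStr):
        return tuple(featureStr.split(","))


plain = StringFeatureReader()
plain.registerFeature(0, "C,C,O")
structured = TupleFeatureReader()
structured.registerFeature(0, "C,C,O")
print(plain[0], structured[0])
```

Because the feature objects end up as dictionary keys, whatever parseFeature returns must be hashable; a tuple qualifies, a list would not.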

__iter__(self)

Produce an iterator over the feature dictionary objects parsed out of the input file.

This method implements the __iter__ interface, which means iteration can be as simple as:

>>> from cStringIO import StringIO
>>> reader = FeatureDictReader(StringIO(""))
>>> for (featureDict, objDescr) in reader:
...     print objDescr, featureDict

However, this overrides the normal meaning of the dictionary __iter__ method, which would ordinarily iterate over the *keys* stored in the reader dictionary rather than over data pairs. To iterate over the keys, call the iterkeys() method explicitly.
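The effect of overriding __iter__ on a dict subclass can be seen in a small stand-in class. RecordReader below is hypothetical and is not the real FeatureDictReader. The sketch is written for Python 3, where keys() plays the role that iterkeys() plays in Python 2.

```python
class RecordReader(dict):
    """Stand-in dict subclass whose __iter__ yields parsed records."""

    def __init__(self, records):
        dict.__init__(self)
        self._records = records

    def __iter__(self):
        # Yields (featureDict, objDescr) pairs instead of dictionary keys.
        for record in self._records:
            yield record


reader = RecordReader([({"C=O": 1.0}, "molecule-1")])
reader[0] = "C=O"  # an index:feature mapping stored in the dict itself

pairs = list(reader)         # goes through the overridden __iter__
keys = list(reader.keys())   # explicit key access bypasses the override
print(pairs, keys)
```

Plain iteration returns the data pairs, while the explicit key method still returns the stored mapping keys.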

This method can only be called once, because it consumes the source file as it iterates and there is no guarantee that the file can be rewound to its start afterwards. To produce multiple iterators over the same data, use the FeatureDictReaderFactory, which creates a temp file as needed to generate as many FeatureDictReader iterators as requested.
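The one-shot behavior, and the temp-file workaround, can be sketched with plain file objects. Everything here is illustrative: iter_lines and ReplayableSource are hypothetical names, and the real FeatureDictReaderFactory interface is not documented in this section, so this should not be read as its API.

```python
import tempfile
from io import StringIO


def iter_lines(infile):
    """One-shot iteration: consumes the file object as it goes."""
    for line in infile:
        yield line.rstrip("\n")


class ReplayableSource:
    """Hypothetical helper: copies a stream to a temp file for re-reading."""

    def __init__(self, infile):
        self._spool = tempfile.TemporaryFile(mode="w+")
        self._spool.write(infile.read())

    def new_iterator(self):
        # Rewind the private copy before each pass.
        self._spool.seek(0)
        return iter_lines(self._spool)


source = StringIO("0:1\n1:2\n")
first_pass = list(iter_lines(source))
second_pass = list(iter_lines(source))  # the stream is already exhausted

replayable = ReplayableSource(StringIO("0:1\n1:2\n"))
passes = [list(replayable.new_iterator()) for _ in range(2)]
print(first_pass, second_pass, passes)
```

In this simplified version every iterator shares one file handle, so each pass should be finished before the next is requested.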

Overrides: dict.__iter__

Class FeatureDictReader

__init__(self, infile)
(Constructor)

parseFeature(featureStr)
Static Method

__iter__(self)

infile

objDescriptions

objNameIDs
