Class FeatureDictReader

object --+    
         |    
      dict --+
             |
            FeatureDictReader

Utility class to decode data feature vectors (represented as index:count lines in a text file) into string:count feature dictionary objects. These were probably encoded in the first place with the matching "FeatureDictWriter" class. The basic strategy is to back-translate the index numbers into the string representations of the features. This class (which extends the dict class) stores the index:feature mappings for subsequent access (note that this is the inverse of the FeatureDictWriter's feature:index mappings).
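The back-translation strategy described above can be sketched roughly as follows. This is a minimal stand-in written for Python 3, not the actual FeatureDictReader code: the function name iter_feature_dicts is hypothetical, and the line format ("# index feature" header lines followed by "description index:count ..." data lines) is inferred from the example usage on this page. It also assumes that header lines precede the data lines that use them and that neither features nor descriptions contain whitespace.

```python
from io import StringIO


def iter_feature_dicts(infile):
    """Yield (feature_dict, description) pairs from an index:count text stream.

    Hypothetical stand-in, not the real FeatureDictReader; the line format
    is an assumption inferred from the example usage.
    """
    index_to_feature = {}  # index -> feature string, filled from header lines
    for line in infile:
        tokens = line.split()
        if not tokens:
            continue  # skip blank lines
        if tokens[0] == "#":
            # Header line (assumed format): "# <index> <feature>"
            index_to_feature[int(tokens[1])] = tokens[2]
            continue
        # Data line (assumed format): "<description> <index>:<count> ..."
        description = tokens[0]
        feature_dict = {}
        for pair in tokens[1:]:
            index, count = pair.split(":")
            # Back-translate the index into its feature string
            feature_dict[index_to_feature[int(index)]] = int(count)
        yield feature_dict, description


sample = StringIO("# 0 a\n# 1 s\n# 2 d\nasd 0:1 1:2 2:1\n")
for feature_dict, description in iter_feature_dicts(sample):
    # Sort the items so the printed order does not depend on dict ordering
    print(description, sorted(feature_dict.items()))
```

Sorting the items before printing sidesteps the ordering problem that the example usage warns about.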

Note that this current implementation assumes the features are represented as simple text strings. If you want a more sophisticated object representation, you'll have to add your own layer above that can encode / decode the features into strings for storage in the plain text files.
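One possible shape for such a layer, assuming features are flat tuples of JSON-compatible scalars, is sketched below in Python 3. The helper names encode_feature, decode_feature and decode_feature_dict are hypothetical and are not part of this module. Percent-encoding the JSON text is a design choice made here so that each encoded feature is a single whitespace-free token; whether the real file format requires that is an assumption.

```python
import json
from urllib.parse import quote, unquote


def encode_feature(feature):
    """Encode a flat tuple feature as a whitespace-free text token."""
    # safe="" percent-encodes spaces, quotes, brackets and commas as well
    return quote(json.dumps(list(feature)), safe="")


def decode_feature(token):
    """Invert encode_feature, recovering the original tuple."""
    return tuple(json.loads(unquote(token)))


def decode_feature_dict(string_dict):
    """Translate a string:count dictionary into a tuple:count dictionary."""
    return {decode_feature(token): count for token, count in string_dict.items()}


feature = ("bond", "C", "N", 2)   # illustrative feature, not from the source
token = encode_feature(feature)
assert decode_feature(token) == feature
print(decode_feature_dict({token: 3}))
```

Nested tuples would come back as lists after the JSON round trip, so this sketch only suits flat features.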

Example usage. Note that this may fail as a doctest, since the feature:index mapping order is arbitrary, based on the "random" traversal of feature keys through the feature dictionaries.

>>> from cStringIO import StringIO
>>> infile = StringIO();        # doctest can't handle multi-line strings well
>>> print >> infile, "# 0 a"    # So write it out as a StringIO first
>>> print >> infile, "# 1 s"
>>> print >> infile, "# 2 d"
>>> print >> infile, "# 3 g"
>>> print >> infile, "# 4 f"
>>> print >> infile, "# 5 A"
>>> print >> infile, "# 6 F"
>>> print >> infile, "# 7 S"
>>> print >> infile, "# 8 D"
>>> print >> infile, "# 9 h"
>>> print >> infile, "# 10 G"
>>> print >> infile, "# 11 H"
>>> print >> infile, "asdfsdfg 0:1 1:2 2:2 3:1 4:2 "
>>> print >> infile, "asdfasdfASDF 0:2 1:2 2:2 4:2 5:1 6:1 7:1 8:1 "
>>> print >> infile, "dfghDFGH 2:1 3:1 4:1 6:1 8:1 9:1 10:1 11:1 "
>>> infile = StringIO(infile.getvalue());
>>>
>>> featureReader = FeatureDictReader(infile);
>>> # Read out and print the contents of each feature dictionary
>>> for (featureDict, description) in featureReader.iterFeatureDicts():
...     print description, str(featureDict)
asdfsdfg {'a': 1, 's': 2, 'd': 2, 'g': 1, 'f': 2}
asdfasdfASDF {'a': 2, 'A': 1, 'd': 2, 'F': 1, 'f': 2, 'S': 1, 's': 2, 'D': 1}
dfghDFGH {'D': 1, 'g': 1, 'F': 1, 'h': 1, 'f': 1, 'G': 1, 'H': 1, 'd': 1}

Instance Methods

__init__(self, infile)
Constructor expects an input file (object, not filename) to read from.

iterFeatureDicts(self)
Produce an iterator over the feature dictionary objects parsed out of the input file.

Inherited from dict: __cmp__, __contains__, __delitem__, __eq__, __ge__, __getattribute__, __getitem__, __gt__, __hash__, __iter__, __le__, __len__, __lt__, __ne__, __new__, __repr__, __setitem__, clear, copy, fromkeys, get, has_key, items, iteritems, iterkeys, itervalues, keys, pop, popitem, setdefault, update, values

Inherited from object: __delattr__, __reduce__, __reduce_ex__, __setattr__, __str__

Properties

Inherited from object: __class__

Method Details

__init__(self, infile)
(Constructor)

Constructor expects an input file (object, not filename) to read from.

Returns:

new empty dictionary

Overrides: dict.__init__

iterFeatureDicts(self)

Produce an iterator over the feature dictionary objects parsed out of the input file. In the current implementation, this method can only be called once, because it iterates through the source file and there is no guarantee afterwards that the file can be rewound to its start.

Note that the items returned are actually 2-tuples. The first component is the feature dictionary itself, while the second component is the description string specified for the data item.
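Because the iterator is single-pass, callers that need the results more than once may want to cache them. The Python 3 sketch below uses a hypothetical stub generator in place of featureReader.iterFeatureDicts() so that it runs on its own. It assumes the real iterator behaves like an ordinary Python generator, which this page does not confirm.

```python
def iter_feature_dicts_stub():
    """Hypothetical stand-in for featureReader.iterFeatureDicts()."""
    yield {"a": 1, "s": 2}, "asdfsdfg"
    yield {"d": 1, "g": 1}, "dfghDFGH"


pairs = iter_feature_dicts_stub()
cached = list(pairs)        # consume the single pass once and keep the results
assert list(pairs) == []    # an exhausted generator yields nothing further

# Each item unpacks as (feature dictionary, description), in that order
by_description = {description: feature_dict for feature_dict, description in cached}
print(by_description["asdfsdfg"])
```

Holding every dictionary in memory trades space for the ability to revisit the data, which may not suit very large input files.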
