    object --+
             |
          dict --+
                 |
                FeatureDictReader
Utility class to decode data feature vectors (represented as index:count lines in a text file) into string:count feature dictionary objects. These vectors were presumably encoded in the first place with the matching "FeatureDictWriter" class. The basic strategy is simply to back-translate the index numbers into the actual string representations of the features. This class (which extends the dict class) stores the index:feature mappings for subsequent access (note that this is the inverse of FeatureDictWriter's feature:index mappings).
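The back-translation step can be illustrated with a minimal Python 3 sketch. The class name IndexFeatureMap and its decode and inverse methods are hypothetical and are not part of this module's documented API; the sketch only assumes, as described above, a dict of index:feature entries and whitespace-separated index:count tokens with integer counts.

```python
class IndexFeatureMap(dict):
    """Hypothetical sketch: a dict mapping integer index -> feature string."""

    def decode(self, tokens):
        """Back-translate "index:count" tokens into a feature:count dict."""
        feature_dict = {}
        for token in tokens:
            index, count = token.split(":")
            # Look up the feature string stored under this index.
            feature_dict[self[int(index)]] = int(count)
        return feature_dict

    def inverse(self):
        """Return the feature -> index mapping that a writer would hold."""
        return {feature: index for index, feature in self.items()}


mapping = IndexFeatureMap({0: "a", 1: "s", 2: "d"})
decoded = mapping.decode("0:1 1:2 2:2".split())
writer_side = mapping.inverse()
```

Subclassing dict means the index:feature entries are available through ordinary dictionary access, which matches the description of the class storing its mappings "for subsequent access".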
Note that the current implementation assumes the features are represented as simple text strings. If you need a more sophisticated object representation, you will have to add your own layer on top that encodes features to strings (and decodes them back) for storage in the plain text files.
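One possible encoding layer, sketched in Python 3 under stated assumptions: features are flat tuples of strings and numbers, and each stored string must not contain whitespace (an assumption based on the whitespace-separated lines in the example file format). The helper names encode_feature and decode_feature are hypothetical. JSON with compact separators captures the structure, and urllib.parse.quote percent-escapes spaces and other special characters inside the values.

```python
import json
from urllib.parse import quote, unquote


def encode_feature(feature):
    """Turn a flat tuple feature into a whitespace-free string token."""
    # Compact separators drop the space json.dumps puts after commas;
    # quote() then percent-escapes any spaces inside the values themselves.
    return quote(json.dumps(list(feature), separators=(",", ":")))


def decode_feature(token):
    """Invert encode_feature, restoring a tuple."""
    # JSON has no tuple type, so the decoded list is converted back.
    return tuple(json.loads(unquote(token)))


token = encode_feature(("new york", 3))
restored = decode_feature(token)
```

Only the top-level tuple is restored; nested tuples would come back as lists, which is one reason the sketch is limited to flat features.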
Example usage (note that this may have problems as a doctest, since the feature:index mapping order is arbitrary, depending on the effectively "random" traversal order of the feature keys in the feature dictionaries):

    >>> from cStringIO import StringIO
    >>> infile = StringIO()  # doctest can't handle multi-line strings well,
    >>> print >> infile, "# 0 a"  # so write the input out to a StringIO first
    >>> print >> infile, "# 1 s"
    >>> print >> infile, "# 2 d"
    >>> print >> infile, "# 3 g"
    >>> print >> infile, "# 4 f"
    >>> print >> infile, "# 5 A"
    >>> print >> infile, "# 6 F"
    >>> print >> infile, "# 7 S"
    >>> print >> infile, "# 8 D"
    >>> print >> infile, "# 9 h"
    >>> print >> infile, "# 10 G"
    >>> print >> infile, "# 11 H"
    >>> print >> infile, "asdfsdfg 0:1 1:2 2:2 3:1 4:2 "
    >>> print >> infile, "asdfasdfASDF 0:2 1:2 2:2 4:2 5:1 6:1 7:1 8:1 "
    >>> print >> infile, "dfghDFGH 2:1 3:1 4:1 6:1 8:1 9:1 10:1 11:1 "
    >>> infile = StringIO(infile.getvalue())
    >>> featureReader = FeatureDictReader(infile)
    >>> # Read out and print the contents of each feature dictionary
    >>> for (featureDict, description) in featureReader.iterFeatureDicts():
    ...     print description, str(featureDict)
    asdfsdfg {'a': 1, 's': 2, 'd': 2, 'g': 1, 'f': 2}
    asdfasdfASDF {'a': 2, 'A': 1, 'd': 2, 'F': 1, 'f': 2, 'S': 1, 's': 2, 'D': 1}
    dfghDFGH {'D': 1, 'g': 1, 'F': 1, 'h': 1, 'f': 1, 'G': 1, 'H': 1, 'd': 1}
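The example above is Python 2 (cStringIO and the print >> statement no longer exist in Python 3). As a hedged illustration only, here is a minimal Python 3 sketch that parses the same input; it is not the FeatureDictReader implementation. The function name read_feature_dicts is hypothetical, and it assumes mapping lines of the form "# <index> <feature>" and data lines of the form "<description> <index>:<count> ...", as in the example. Checking results with dictionary equality rather than printed output sidesteps the ordering problem noted above, since == ignores key order.

```python
import io


def read_feature_dicts(stream):
    """Yield (feature_dict, description) pairs from index:count text input.

    Hypothetical Python 3 sketch; assumes every "# <index> <feature>" line
    appears before any data line that refers to that index.
    """
    index_to_feature = {}
    for line in stream:
        fields = line.split()
        if not fields:
            continue  # skip blank lines
        if fields[0] == "#":
            # Mapping line: "# <index> <feature>"
            index_to_feature[int(fields[1])] = fields[2]
            continue
        # Data line: "<description> <index>:<count> ..."
        feature_dict = {}
        for pair in fields[1:]:
            index, count = pair.split(":")
            feature_dict[index_to_feature[int(index)]] = int(count)
        yield feature_dict, fields[0]


# The same input as the doctest, built as one string instead of print calls.
features = ["a", "s", "d", "g", "f", "A", "F", "S", "D", "h", "G", "H"]
lines = ["# %d %s" % (i, f) for i, f in enumerate(features)]
lines += [
    "asdfsdfg 0:1 1:2 2:2 3:1 4:2 ",
    "asdfasdfASDF 0:2 1:2 2:2 4:2 5:1 6:1 7:1 8:1 ",
    "dfghDFGH 2:1 3:1 4:1 6:1 8:1 9:1 10:1 11:1 ",
]
results = list(read_feature_dicts(io.StringIO("\n".join(lines) + "\n")))

for feature_dict, description in results:
    print(description, feature_dict)
```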
iterFeatureDicts(self)
Produce an iterator over the feature dictionary objects parsed from the input file. In the current implementation, this method can be called only once: it iterates through the source file, after which there is no guarantee that it can go back to the start of the file. Note that the items returned are actually 2-tuples: the first component is the feature dictionary itself, and the second is the description string specified for the data item.
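The one-pass restriction follows from iterating over a file object, which consumes it. The Python 3 sketch below is hypothetical (SketchFeatureDictReader and iter_feature_dicts are illustrative names, and the same line format as in the example is assumed). Like the class described above, it extends dict to hold the index:feature mappings, yields (feature dictionary, description) 2-tuples, and shows that a second pass finds the stream exhausted unless the caller rewinds a seekable stream.

```python
import io


class SketchFeatureDictReader(dict):
    """Hypothetical stand-in: a dict of index -> feature that is filled in
    while the input stream is read."""

    def __init__(self, stream):
        super().__init__()
        self.stream = stream

    def iter_feature_dicts(self):
        # Iterating a file object consumes it and nothing here seeks back,
        # so a later call resumes from wherever the stream was left.
        for line in self.stream:
            fields = line.split()
            if not fields:
                continue
            if fields[0] == "#":
                # Mapping line: remember index -> feature in the dict itself.
                self[int(fields[1])] = fields[2]
                continue
            feature_dict = {}
            for pair in fields[1:]:
                index, count = pair.split(":")
                feature_dict[self[int(index)]] = int(count)
            # Each item is a 2-tuple: (feature dictionary, description).
            yield feature_dict, fields[0]


stream = io.StringIO("# 0 a\n# 1 s\nas 0:1 1:1\n")
reader = SketchFeatureDictReader(stream)
first_pass = list(reader.iter_feature_dicts())
second_pass = list(reader.iter_feature_dicts())  # stream already consumed
stream.seek(0)  # possible only because StringIO is seekable
third_pass = list(reader.iter_feature_dicts())
```

Callers that need several passes over a source that cannot be rewound can instead materialize the results once, for example with list(), and iterate over that list as often as needed.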
Generated by Epydoc 3.0beta1 on Thu Nov 8 17:49:31 2007 (http://epydoc.sourceforge.net)