object --> dict --> FeatureDictReader
Decodes data feature vectors (represented as index:value lines in a text file) into string:value feature dictionary objects.
These vectors were probably encoded in the first place with the matching "FeatureDictWriter" class. The basic strategy is to back-translate the index numbers into the string representations of the features. This class (which extends the dict class) stores the index:feature mappings for subsequent access; note that this is the inverse of the FeatureDictWriter's feature:index mappings.
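The back-translation strategy can be sketched as follows. This is a minimal illustration and not the class's actual implementation: the function name `decode_feature_lines` is hypothetical, and the file layout (header lines of the form `# index feature`, then data lines of the form `description id index:value ...`) is inferred from the example usage on this page. The sketch is written for Python 3, whereas the examples on this page use Python 2 syntax.

```python
from io import StringIO


def decode_feature_lines(infile):
    """Hypothetical stand-in for the reader's decoding loop (not the CHEM code)."""
    index_to_feature = {}  # index -> feature string, the inverse of the writer's mapping
    descriptions = []
    feature_dicts = []
    for line in infile:
        tokens = line.split()
        if not tokens:
            continue  # tolerate blank lines
        if tokens[0] == "#":
            # Header line, assumed to look like "# <index> <feature>"
            index_to_feature[int(tokens[1])] = tokens[2]
            continue
        # Data line, assumed to look like "<description> <id> <index>:<value> ..."
        descriptions.append(tokens[0])
        feature_dict = {}
        for pair in tokens[2:]:
            index, value = pair.split(":")
            # Back-translate the index into its feature string
            feature_dict[index_to_feature[int(index)]] = float(value)
        feature_dicts.append(feature_dict)
    return index_to_feature, descriptions, feature_dicts


sample = StringIO(
    "\n"
    "# 0 a\n"
    "# 1 s\n"
    "# 2 d\n"
    "asdsd UNKNOWN_ID 0:1 1:2 2:2 \n"
)
mapping, descriptions, feature_dicts = decode_feature_lines(sample)
print(feature_dicts[0])
```

The index:feature dictionary is kept alongside the decoded rows so that later lines can be translated with the same mapping.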
Note that the default implementation assumes the features are represented as simple text strings. If you want a more sophisticated object representation, you'll have to extend this class and override the parseFeature method to translate the string into the feature object.
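The override pattern described above might look like the sketch below. Because FeatureDictReader itself is not importable here, `MiniFeatureReader` is a hypothetical stand-in, and the `parseFeature(self, featureStr)` signature and the `addMapping` helper are assumptions made for illustration; only the method name `parseFeature` comes from this page. The sketch is written for Python 3.

```python
class MiniFeatureReader(dict):
    """Hypothetical stand-in for FeatureDictReader; only the hook pattern is shown."""

    def parseFeature(self, featureStr):
        # Default behaviour: features are plain text strings, returned unchanged.
        return featureStr

    def addMapping(self, index, featureStr):
        # Hypothetical helper: store index -> feature object via the hook.
        self[index] = self.parseFeature(featureStr)


class IntPairFeatureReader(MiniFeatureReader):
    """Interprets each feature string as an 'x,y' pair and stores a tuple of ints."""

    def parseFeature(self, featureStr):
        x, y = featureStr.split(",")
        return (int(x), int(y))


plain = MiniFeatureReader()
plain.addMapping(0, "3,4")

pairs = IntPairFeatureReader()
pairs.addMapping(0, "3,4")

print(plain[0], pairs[0])
```

Only the subclass changes; the code that stores and looks up mappings keeps calling the same hook.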
Example Usage: (Note this may have problems as a doctest since the feature:index mapping order is arbitrary, based on the "random" traversal of feature keys through the feature dictionaries.)

>>> from cStringIO import StringIO
>>> infile = StringIO()  # doctest can't handle multi-line strings well
>>> print >> infile, ""  # Test blank line robustness
>>> print >> infile, "# 0 a"  # So write it out as a StringIO first
>>> print >> infile, "# 1 s"
>>> print >> infile, "# 2 d"
>>> print >> infile, "# 3 g"
>>> print >> infile, "# 4 f"
>>> print >> infile, "# 5 A"
>>> print >> infile, "# 6 F"
>>> print >> infile, "# 7 S"
>>> print >> infile, "# 8 D"
>>> print >> infile, "# 9 h"
>>> print >> infile, "# 10 G"
>>> print >> infile, "# 11 H"
>>> print >> infile, "asdfsdfg UNKNOWN_ID 0:1 1:2 2:2 3:1 4:2 "
>>> print >> infile, "asdfasdfASDF UNKNOWN_ID 0:2 1:2 2:2 4:2 5:1 6:1 7:1 8:1 "
>>> print >> infile, "dfghDFGH UNKNOWN_ID 2:1 3:1 4:1 6:1 8:1 9:1 10:1 11:1 "
>>> infile = StringIO(infile.getvalue())
>>>
>>> featureReader = FeatureDictReader(infile)
>>> # Read out and print the contents of each feature dictionary
>>> for featureDict in featureReader:
...     print str(featureDict)
{'a': 1.0, 's': 2.0, 'd': 2.0, 'g': 1.0, 'f': 2.0}
{'a': 2.0, 'A': 1.0, 'd': 2.0, 'F': 1.0, 'f': 2.0, 'S': 1.0, 's': 2.0, 'D': 1.0}
{'D': 1.0, 'g': 1.0, 'F': 1.0, 'h': 1.0, 'f': 1.0, 'G': 1.0, 'H': 1.0, 'd': 1.0}
>>> for description in featureReader.objDescriptions:
...     print description
asdfsdfg
asdfasdfASDF
dfghDFGH
infile = <CHEM.DB.rdb.search.NameRxnPatternMatchingModel.Searc
objDescriptions = <CHEM.DB.rdb.search.NameRxnPatternMatchingMo
objNameIDs = <CHEM.DB.rdb.search.NameRxnPatternMatchingModel.S
Produce an iterator over the feature dictionary objects parsed out of the input file. This method implements the __iter__ interface, which means you can do something as simple as:

>>> from cStringIO import StringIO
>>> reader = FeatureDictReader(StringIO(""))
>>> for (featureDict, objDescr) in reader:
...     print objDescr, featureDict

However, this overrides the normal meaning of the dictionary __iter__ method. Normally, __iter__ would yield the feature *keys* stored in the reader dictionary rather than data pairs. To iterate over the keys, call the iterkeys() method explicitly.

This method can only be called once, because it iterates through the source file and there is no guarantee that the reader can rewind to the start of the file afterwards. To produce multiple iterators over the same data, use the FeatureDictReaderFactory, which will create a temp file as needed to generate as many FeatureDictReader iterators as requested.
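The single-pass limitation and the temp-file workaround can be illustrated with a small sketch. The interface of FeatureDictReaderFactory is not shown on this page, so `ReplayableSource`, `new_iterator`, and `iter_lines_once` below are hypothetical names that only demonstrate the general idea, under the assumption that the input is spooled to a temporary file once and re-read for each new iterator. The sketch is written for Python 3.

```python
import tempfile
from io import StringIO


def iter_lines_once(infile):
    """Single-pass iteration: the stream is consumed as lines are yielded."""
    for line in infile:
        if line.strip():
            yield line.strip()


class ReplayableSource:
    """Hypothetical sketch of the factory idea: copy the input to a temp file once,
    then hand out a fresh single-pass iterator on every request."""

    def __init__(self, infile):
        self._spool = tempfile.TemporaryFile(mode="w+")
        for line in infile:
            self._spool.write(line)

    def new_iterator(self):
        # Rewind the spool; iterators share one handle, so finish one before
        # requesting the next.
        self._spool.seek(0)
        return iter_lines_once(self._spool)


# Iterating the raw stream twice: the second pass finds nothing left to read.
source = StringIO("row1\nrow2\n")
first_pass = list(iter_lines_once(source))
second_pass = list(iter_lines_once(source))

# Going through the spooled copy: every pass sees the full data.
factory = ReplayableSource(StringIO("row1\nrow2\n"))
pass_a = list(factory.new_iterator())
pass_b = list(factory.new_iterator())

print(first_pass, second_pass, pass_a, pass_b)
```

Spooling to a temporary file trades some disk I/O for the ability to re-read input that cannot be rewound, such as a pipe.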
infile
objDescriptions
objNameIDs
Generated by Epydoc 3.0beta1 on Thu Nov 8 17:49:32 2007 | http://epydoc.sourceforge.net |