Package CHEM :: Package ML :: Package featureAnalysis :: Module FeatureDictListToMatrix :: Class FeatureDictListToMatrix

Class FeatureDictListToMatrix



Read the feature dict list of a molecule dataset (e.g., combinedFeatures.dict) and output a selected number of features (top, middle, bottom, or random) as a matrix, with features ranked in descending order of their frequencies across the whole dataset.

Instance Methods
 
__init__(self, bitOrCount, selectionType, selectedFeatureNum, featureDictListFile)
Constructor.
 
sortFeatures(self, featureDictListFile)
Sort features in ascending order of their frequencies across the whole dataset.
 
getSelectedFeatureArray(self)
Return a matrix: MatrixNewA, which contains only selected features for all molecules.
Method Details

__init__(self, bitOrCount, selectionType, selectedFeatureNum, featureDictListFile)
(Constructor)

 
Constructor. Sets up default parameter values.

The following 3 attributes are used in the first pass through all feature dicts.
- componentdict1: A map from each feature to its column number.
- componentdict2: Inverse of componentdict1, i.e., a map from each column number
                  to the corresponding feature.
- assignedcolumn: Total number of feature columns.

- selectedFeatureNum: Number of selected features of interest.
- bitFlag: Boolean. True if bits are used;
                    False if counts are used.

The following attributes are used in another pass through all feature dicts to 
get selected features.
- newComponentdict1: Similar to componentdict1, except that newComponentdict1 
                     only maps selected features to the corresponding column number.
- newComponentdict2: Inverse of newComponentdict1
- newAssignedcolumn: 
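
As an illustration of how the first-pass attributes relate to one another, here is a minimal sketch. It assumes each feature dict maps a feature to its count for one molecule and that columns are assigned in first-seen order; the helper name build_column_maps and the sample data are hypothetical and not taken from the module.

```python
def build_column_maps(feature_dicts):
    """Assign a column number to every distinct feature, in first-seen order."""
    componentdict1 = {}   # feature -> column number
    componentdict2 = {}   # column number -> feature (inverse map)
    assignedcolumn = 0    # running total of feature columns
    for feature_dict in feature_dicts:
        for feature in feature_dict:
            if feature not in componentdict1:
                componentdict1[feature] = assignedcolumn
                componentdict2[assignedcolumn] = feature
                assignedcolumn += 1
    return componentdict1, componentdict2, assignedcolumn


# Hypothetical two-molecule dataset (feature -> count).
feature_dicts = [{"C-C": 2, "C=O": 1}, {"C-C": 1, "N-H": 3}]
componentdict1, componentdict2, assignedcolumn = build_column_maps(feature_dicts)
print(componentdict1, assignedcolumn)
```

The new* attributes would presumably be built the same way, restricted to the selected features.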

sortFeatures(self, featureDictListFile)

 
Sort features in ascending order of their frequencies across the whole dataset, then set selectedIndices to the indices of the selected features.
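
The sorting and selection step might look roughly like the sketch below. Several details are assumptions rather than documented behavior: frequency is taken to be the number of molecules containing a feature, the selection types are spelled "top", "middle", "bottom", and "random", and "top" is read as the most frequent features (the end of the ascending list). The function name select_feature_indices is hypothetical.

```python
import random
from collections import Counter


def select_feature_indices(feature_dicts, selection_type, k, seed=0):
    """Return (features sorted by ascending frequency, indices of the selected ones)."""
    freq = Counter()
    for feature_dict in feature_dicts:
        freq.update(feature_dict.keys())  # one count per molecule containing the feature
    # Ascending frequency; ties are broken by feature name so the order is deterministic.
    ordered = sorted(freq, key=lambda f: (freq[f], f))
    n = len(ordered)
    k = min(k, n)
    if selection_type == "top":        # assumed: the k most frequent features
        indices = list(range(n - k, n))
    elif selection_type == "bottom":   # assumed: the k least frequent features
        indices = list(range(k))
    elif selection_type == "middle":   # assumed: k features centered in the ranking
        start = (n - k) // 2
        indices = list(range(start, start + k))
    elif selection_type == "random":
        indices = sorted(random.Random(seed).sample(range(n), k))
    else:
        raise ValueError("unknown selection type: %r" % (selection_type,))
    return ordered, indices


# Hypothetical dataset: "a" occurs in 3 molecules, "b" in 2, "c" in 1.
data = [{"a": 1, "b": 1, "c": 1}, {"a": 1, "b": 1}, {"a": 1}]
ordered, top_idx = select_feature_indices(data, "top", 1)
print([ordered[i] for i in top_idx])
```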

getSelectedFeatureArray(self)

 

Return a matrix: MatrixNewA, which contains only selected features for all molecules.

Makes one pass through all feature dicts to collect the top features of interest, then saves them into a matrix, MatrixNewA (row count = number of molecules, column count = topFeatureNum).
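
A minimal sketch of this second pass follows, assuming each feature dict maps a feature to its count and that a bit is 1 whenever the count is nonzero. The function name build_selected_matrix is hypothetical, and plain nested lists stand in for whatever matrix type the class actually returns.

```python
def build_selected_matrix(feature_dicts, selected_features, bit_flag):
    """Build a molecules x selected-features matrix of bits or counts."""
    # Analogue of newComponentdict1: selected feature -> column number.
    new_componentdict1 = {f: col for col, f in enumerate(selected_features)}
    matrix = [[0] * len(new_componentdict1) for _ in feature_dicts]
    for row, feature_dict in enumerate(feature_dicts):
        for feature, count in feature_dict.items():
            col = new_componentdict1.get(feature)
            if col is None:
                continue  # feature was not selected; skip it
            matrix[row][col] = (1 if count else 0) if bit_flag else count
    return matrix


# Hypothetical two-molecule dataset and selection.
molecules = [{"C-C": 2, "C=O": 1}, {"C-C": 1, "N-H": 3}]
selected = ["C-C", "N-H"]
count_matrix = build_selected_matrix(molecules, selected, bit_flag=False)
bit_matrix = build_selected_matrix(molecules, selected, bit_flag=True)
print(count_matrix, bit_matrix)
```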