
Class BondHistogramKernel



BaseKernel.BaseKernel --+
                        |
                       BondHistogramKernel

Calculates a similarity score for molecules by comparing histograms of their bond lengths. Presently considers C-C, C-N and C-O bonds only (of any order), though this could easily change or become parameterized in the future.

The histogram counts the bonds whose length falls into each successive interval of width binWidth. For example, with a binWidth of 0.1, it counts all bonds with length in [0.0, 0.1), then those in [0.1, 0.2), then [0.2, 0.3), and so on until all bonds are accounted for. This yields a vector (histogram) of bond counts. The dot product of the vectors from two molecules then sums, bin by bin, the product of the two counts, which in effect measures how many bond lengths the two molecules have in common.

Note that such a feature vector / histogram would be very sparse, mostly counts of 0, so a dictionary is built of only the non-zero count values instead.
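As an illustration of the sparse representation described above, here is a minimal sketch of a dot product over two histograms stored as dictionaries. The function name sparse_dot and the sample counts are hypothetical; the class itself inherits dictionaryDotProduct from BaseKernel, whose actual implementation is not shown in this documentation.

```python
def sparse_dot(hist_a, hist_b):
    """Dot product of two sparse histograms stored as {bin_key: count}.

    Hypothetical stand-in for the inherited dictionaryDotProduct method.
    Keys missing from a dictionary are treated as a count of 0.
    """
    # Iterate over the smaller dictionary to minimise lookups.
    if len(hist_a) > len(hist_b):
        hist_a, hist_b = hist_b, hist_a
    return sum(count * hist_b.get(key, 0) for key, count in hist_a.items())


# Made-up bin counts for two molecules (bin index -> number of bonds).
mol_a = {13: 2, 15: 4}
mol_b = {12: 1, 13: 1, 15: 3}

score = sparse_dot(mol_a, mol_b)  # 2*1 + 4*3
```

Only bins that are non-zero in both molecules contribute to the score, which is why storing the zero counts is unnecessary.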

Instance Methods
 
__init__(self, binWidth)
 
similarity(self, obj1, obj2)
Primary abstract method: given two objects, returns an appropriate non-negative similarity score between the two.
 
buildFeatureDictionary(self, mol)
Given an OEMolBase molecule object, look for all bonds of each bondType.
 
bondTypeMatches(self, bond, bondType)
Determine if the OEBondBase is of the bondType.
 
bondLength(self, mol, bond)
Returns the length of an OEBondBase, by calculating the distance between the endpoint atoms.

Inherited from BaseKernel.BaseKernel: dictionaryDotProduct, dictionaryEuclideanDistanceSquared, ensureListCapacity, getFeatureDictionary, normalizeFeatureDictionary, outputMatrix, prepareFeatureDictionaryList

Class Variables
  BOND_TYPES = ("C", "C"), ("C", "N"), ("C", "O")
  binWidth = -1.0
Width, in angstroms, of the bond length histograms

Inherited from BaseKernel.BaseKernel: featureDictList, objIndex1, objIndex2

Method Details

similarity(self, obj1, obj2)

 
Primary abstract method: given two objects, returns an appropriate non-negative similarity score between the two. It is up to the implementing class to define what this score is.
Overrides: BaseKernel.BaseKernel.similarity
(inherited documentation)

buildFeatureDictionary(self, mol)

 

Given an OEMolBase molecule object, look for all bonds of each bondType. A bondType is just a 2-tuple containing the expected atomic symbols of the bond atoms.

Create a dictionary keyed by the bond type and the histogram bin index that the found bonds should be placed in, with values equal to the number of bonds with length that fit into that bin.

The bin index is the number of whole times binWidth fits into the bond length, i.e. floor(length / binWidth). For example, for binWidth = 0.1, a bond length of 1.32 is placed under bin index 13. Equivalently, bin index 13 holds the count of all bonds with length in [1.3, 1.4).
Overrides: BaseKernel.BaseKernel.buildFeatureDictionary
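The binning rule above can be sketched as follows. This is not the class's actual code: the function name build_feature_dictionary is hypothetical, and plain (bondType, length) pairs with made-up lengths stand in for the bonds that the real method would extract from an OEMolBase.

```python
import math
from collections import Counter


def build_feature_dictionary(bonds, bin_width):
    """Count bonds per (bond_type, bin_index) key.

    `bonds` is an iterable of (bond_type, length) pairs, an assumed
    simplification of walking the bonds of a molecule object.
    """
    features = Counter()
    for bond_type, length in bonds:
        # Number of whole bin widths that fit into the bond length.
        # Lengths lying exactly on a bin boundary may land in the lower
        # bin because of floating-point rounding.
        bin_index = math.floor(length / bin_width)
        features[(bond_type, bin_index)] += 1
    return dict(features)


# Illustrative lengths in angstroms.
bonds = [
    (("C", "C"), 1.32),
    (("C", "C"), 1.36),
    (("C", "C"), 1.54),
    (("C", "O"), 1.21),
]
features = build_feature_dictionary(bonds, 0.1)
```

Here the two C-C bonds of length 1.32 and 1.36 share the key (("C", "C"), 13), while bins with no bonds never appear in the dictionary.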

bondTypeMatches(self, bond, bondType)

 
Determine if the OEBondBase is of the bondType. A bondType is just a 2-tuple containing the expected atomic symbols of the bond atoms. This comparison ignores the order of the atoms (CN matches CN or NC) as well as bond order (single bond, double bond, etc.).
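A minimal sketch of the order-insensitive comparison, assuming the atomic symbols of the two endpoint atoms have already been read from the bond. The function name and signature are hypothetical and differ from the documented bondTypeMatches(self, bond, bondType), which takes the bond object itself.

```python
def bond_type_matches(symbol_a, symbol_b, bond_type):
    """Return True if the two endpoint symbols match bond_type in either order.

    Bond order (single, double, ...) plays no part in the comparison.
    """
    return (symbol_a, symbol_b) == bond_type or (symbol_b, symbol_a) == bond_type


forward = bond_type_matches("C", "N", ("C", "N"))   # same order
reverse = bond_type_matches("N", "C", ("C", "N"))   # swapped order
mismatch = bond_type_matches("C", "C", ("C", "O"))  # different element
```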

bondLength(self, mol, bond)

 
Returns the length of an OEBondBase, by calculating the distance between the endpoint atoms. Requires a reference to the parent molecule to access coordinates.
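The distance calculation can be sketched with plain coordinate tuples. How the coordinates are obtained from the parent molecule is not shown in this documentation, so the sketch assumes they are already available as (x, y, z) tuples in angstroms; the function name is hypothetical.

```python
import math


def bond_length(coords_a, coords_b):
    """Euclidean distance between two (x, y, z) coordinate tuples."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(coords_a, coords_b)))


# Made-up endpoint coordinates forming a 3-4-5 right triangle in the xy plane.
length = bond_length((0.0, 0.0, 0.0), (3.0, 4.0, 0.0))
```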