Class BondHistogramKernel
BaseKernel.BaseKernel --+
|
BondHistogramKernel
Calculates a similarity score for molecules by comparing histograms of
their bond lengths. Presently considers C-C, C-N and C-O bonds only (of
any order), though this could easily change or become parameterized in
the future.
The histogram counts the bonds whose lengths fall into successive bins
of width "binWidth". For example, with a binWidth of 0.1, it counts all
bonds with length in [0.0,0.1) then all those with length in [0.1,0.2)
then [0.2,0.3), etc. until all bonds are accounted for. This yields a
vector / histogram of bond counts. A dot product can then be computed
across these vectors from two molecules to essentially count up the
number of common bond lengths in the two.
Note that such a feature vector / histogram would be very sparse,
mostly counts of 0, so a dictionary is built of only the non-zero count
values instead.
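The sparse histogram and dot product described above can be sketched in plain Python. This is only an illustration, not the class's actual code: the helper names `histogram` and `sparse_dot` and the sample bond lengths are hypothetical, and no OpenEye objects are involved.

```python
from collections import Counter


def histogram(bond_lengths, bin_width):
    """Sparse histogram: bin index -> count, storing only non-empty bins."""
    return Counter(int(length // bin_width) for length in bond_lengths)


def sparse_dot(hist1, hist2):
    """Dot product over the bins that appear in both histograms."""
    return sum(count * hist2[b] for b, count in hist1.items() if b in hist2)


# Hypothetical bond lengths (in angstroms) for two molecules.
mol_a = histogram([1.54, 1.53, 1.34, 1.47], 0.1)  # bins 15, 15, 13, 14
mol_b = histogram([1.52, 1.39, 1.21], 0.1)        # bins 15, 13, 12

# Bins 15 and 13 are shared: 2*1 + 1*1
score = sparse_dot(mol_a, mol_b)
print(score)  # 3
```

Only bins present in both dictionaries contribute, so the cost depends on the number of non-empty bins rather than on the full histogram range.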
similarity(self, obj1, obj2)
Primary abstract method: given two objects, it should return an
appropriate, non-negative similarity score between the two.
bondTypeMatches(self, bond, bondType)
Determine if the OEBondBase is of the bondType.
bondLength(self, mol, bond)
Returns the length of an OEBondBase, by calculating the distance
between the endpoint atoms.
Inherited from BaseKernel.BaseKernel:
dictionaryDotProduct,
dictionaryEuclideanDistanceSquared,
ensureListCapacity,
getFeatureDictionary,
normalizeFeatureDictionary,
outputMatrix,
prepareFeatureDictionaryList
BOND_TYPES = ("C", "C"), ("C", "N"), ("C", "O")
binWidth = -1.0
Width, in angstroms, of the bond length histogram bins
Inherited from BaseKernel.BaseKernel:
featureDictList,
objIndex1,
objIndex2
similarity(self, obj1, obj2)
Primary abstract method: given two objects, it should return an
appropriate, non-negative similarity score between the two. It is up to
the implementing class to define what this is.
- Overrides: BaseKernel.BaseKernel.similarity (inherited documentation)
buildFeatureDictionary(self, mol)
Given an OEMolBase molecule object, look for all bonds of each
bondType. A bondType is a 2-tuple containing the expected atomic
symbols of the bond atoms.
Create a dictionary keyed by the bond type and the histogram bin index
in which each matching bond is placed, with values equal to the number
of bonds whose length falls into that bin.
The bin index is the number of whole times the binWidth fits into the
bond length. For example, for binWidth = 0.1, a bond length of 1.32 is
placed under bin index 13. Equivalently, bin index 13 holds the count
of all bonds with length in [1.3,1.4).
- Overrides: BaseKernel.BaseKernel.buildFeatureDictionary
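The binning rule can be sketched as follows. This is a minimal illustration under stated assumptions, not the class's implementation: bonds are given as hypothetical `(symbol1, symbol2, length)` tuples rather than read from an OEMolBase, and the `(bondType, binIndex)` key layout is a guess at what "keyed by the bond type and the histogram bin index" means.

```python
# Bond types of interest, each stored with its symbols in sorted order.
BOND_TYPES = (("C", "C"), ("C", "N"), ("C", "O"))


def build_feature_dictionary(bonds, bin_width):
    """Count bonds per (bond type, bin index); only non-empty bins are stored."""
    features = {}
    for sym1, sym2, length in bonds:
        # Sorting the symbols makes ("N", "C") and ("C", "N") the same type.
        bond_type = tuple(sorted((sym1, sym2)))
        if bond_type not in BOND_TYPES:
            continue  # e.g. C-H bonds are not counted
        # Number of whole bin widths that fit into the bond length.
        bin_index = int(length // bin_width)
        key = (bond_type, bin_index)
        features[key] = features.get(key, 0) + 1
    return features


# Hypothetical bond list: 1.32 and 1.38 both fall in bin 13, i.e. [1.3, 1.4).
bonds = [("C", "C", 1.32), ("N", "C", 1.47), ("C", "C", 1.38), ("C", "H", 1.09)]
features = build_feature_dictionary(bonds, 0.1)
print(features)  # {(('C', 'C'), 13): 2, (('C', 'N'), 14): 1}
```

Lengths lying exactly on a bin boundary can fall into the neighbouring bin because of floating-point rounding, so a real implementation may want to handle that case explicitly.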
bondTypeMatches(self, bond, bondType)
Determine if the OEBondBase is of the bondType. A bondType is a
2-tuple containing the expected atomic symbols of the bond atoms. This
comparison ignores sequence (CN matches CN or NC) as well as bond order
(single bond, double bond, etc.).
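An order-insensitive comparison of this kind might look like the sketch below. It assumes the two atomic symbols have already been extracted from the bond, so the hypothetical `bond_type_matches` helper takes a pair of symbols instead of an OEBondBase.

```python
def bond_type_matches(symbols, bond_type):
    """True if the two atomic symbols equal bond_type in either order.

    Bond order is never consulted, so single, double and other bonds
    between the same elements all match.
    """
    return sorted(symbols) == sorted(bond_type)


print(bond_type_matches(("N", "C"), ("C", "N")))  # True
print(bond_type_matches(("C", "C"), ("C", "N")))  # False
```

Comparing sorted pairs, rather than sets, keeps ("C", "C") distinct from a pair that merely contains a carbon atom.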
bondLength(self, mol, bond)
Returns the length of an OEBondBase, by calculating the distance
between the endpoint atoms. Requires a reference to the parent molecule
to access coordinates.
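The distance calculation itself is a plain Euclidean norm. The sketch below assumes the coordinates are available as a mapping from a hypothetical atom index to an (x, y, z) tuple in angstroms; how the real method obtains them from the parent molecule is not shown here.

```python
import math


def bond_length(coords, bgn_idx, end_idx):
    """Euclidean distance between the two endpoint atoms of a bond."""
    bgn = coords[bgn_idx]
    end = coords[end_idx]
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(bgn, end)))


# Hypothetical coordinates for a two-atom fragment.
coords = {0: (0.0, 0.0, 0.0), 1: (1.2, 0.9, 0.0)}
length = bond_length(coords, 0, 1)  # sqrt(1.44 + 0.81), about 1.5
```

The coordinates belong to the molecule rather than to the bond, which is why a reference to the parent molecule is needed alongside the bond.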