Given a set of SMARTS patterns and molecules, counts how many
times each SMARTS pattern (i.e. functional group) is found in
each molecule.
Also includes a script to generate the output in a format easily inserted
into the application database. Starting from molecule and SMARTS files that
have NOT yet been inserted into the database, a complete run, including
inserting the resulting match info into the database, could be
accomplished with the following from the command line:
===========================================================================
python PatternMatchCounter.py molecule.smi example.smarts match.counter
python DBUtil.py -imolecule.smi -tMOLECULE -omolecule.smi.id CAN_SMILES LABEL
python DBUtil.py -iexample.smarts -tPATTERN -oexample.smarts.id SMARTS LABEL
python PatternMatchCounter.py -dmatch.txt -cmatch.counter molecule.smi.id example.smarts.id
python DBUtil.py -imatch.txt -tPATTERN_MATCH -omatch.txt.id MOLECULE_ID PATTERN_ID COUNT
===========================================================================
Alternatively, if you wish to use reactants and SMARTS already stored in the database, use something like this:
===========================================================================
python DBUtil.py "select CAN_SMILES, LABEL, MOLECULE_ID from MOLECULE" molecule.smi
python DBUtil.py "select SMARTS, LABEL, PATTERN_ID from PATTERN" example.smarts
python PatternMatchCounter.py molecule.smi example.smarts match.counter
python PatternMatchCounter.py -dmatch.txt -cmatch.counter molecule.smi example.smarts
python DBUtil.py -imatch.txt -tPATTERN_MATCH -omatch.txt.id MOLECULE_ID PATTERN_ID COUNT
===========================================================================
Input:
- Molecule file
    Can be any format understandable by oemolistream, assuming a properly
    named extension. For example, "molecules.smi" for SMILES format.
- SMARTS pattern file
    File containing one SMARTS pattern string per line that will
    be used to search the molecules.
Either of the above can take stdin as its source by specifying the
filename "-" or ".smi" or something similar. See the documentation of
oemolistream for more information.
Output:
- Match counter file
    For each molecule read from the molecule file, one line of
    tab-delimited counts is written. Each line holds one count per
    SMARTS pattern read. The values appear in the same order as
    the SMARTS patterns were read, and each value equals the number
    of times that SMARTS pattern was matched in the respective molecule.
    Again, redirection to stdout is possible by specifying the filename "-".
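
The match counter layout described above can be read back with a few lines of Python. This is a minimal sketch, not part of the module: the name `read_match_counts` is hypothetical, and skipping blank lines and empty fields is an assumption made here for robustness rather than documented behavior.

```python
import io


def read_match_counts(counter_file):
    """Parse a match counter file object into a list of integer rows.

    Hypothetical helper: one row per molecule, one integer per SMARTS
    pattern, in the order the patterns were read.
    """
    rows = []
    for line in counter_file:
        line = line.rstrip("\n")
        if not line:
            # Assumption: blank lines carry no data and are skipped.
            continue
        # Assumption: empty fields (e.g. from a trailing tab) are ignored.
        rows.append([int(value) for value in line.split("\t") if value])
    return rows


# Two molecules, three patterns each.
example_file = io.StringIO("2\t0\t1\n0\t0\t3\n")
counts = read_match_counts(example_file)
```

Here `counts[i][j]` is the number of times pattern `j` matched molecule `i`.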
Primary method. Reads the source files and counts pattern matches for the output file. See the module documentation for more information. Note: this method takes actual File objects, not filenames, so the caller can pass "virtual Files" for testing and interfacing. Use the "main" method to have the module open the files from filenames. One extra catch: the molecule source is not a file but an oemolistream, which is needed to take advantage of that class's high-level management of different molecule file formats.
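
The benefit of accepting File objects rather than filenames can be shown without OpenEye. The sketch below is illustrative only: `write_count_rows` is a hypothetical function, not this module's API, and the counts are made-up sample values. It writes the tab-delimited layout to any file-like object, so a test can hand it an in-memory `io.StringIO` instead of a file on disk.

```python
import io


def write_count_rows(count_rows, out_file):
    """Write one tab-delimited line of counts per molecule to out_file.

    Hypothetical helper: out_file is any object with a write() method,
    such as an open file, sys.stdout, or io.StringIO.
    """
    for counts in count_rows:
        out_file.write("\t".join(str(count) for count in counts))
        out_file.write("\n")


# A "virtual File": nothing touches the filesystem.
virtual_file = io.StringIO()
write_count_rows([[2, 0, 1], [0, 0, 3]], virtual_file)
result_text = virtual_file.getvalue()
```

Passing `sys.stdout` in place of `virtual_file` would print the same lines to the console.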
Read the contents of the smartsFile as a list of SMARTS strings.
Comment lines prefixed with "#" will be ignored.
Expects one SMARTS string per line of the file. Each SMARTS string can be followed
by any title / comment, etc. separated by whitespace. These will be ignored.
Returns a list of OESubSearch objects, instantiated with the respective SMARTS string.
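
The line-handling rules above can be sketched in plain Python. This is an approximation under stated assumptions, not the module's code: `read_smarts_strings` is a hypothetical name, it returns the raw SMARTS strings rather than OESubSearch objects so that it runs without OpenEye, and skipping blank lines is an assumption.

```python
import io


def read_smarts_strings(smarts_file):
    """Return the SMARTS string from each non-comment line of smarts_file."""
    smarts_list = []
    for line in smarts_file:
        stripped = line.strip()
        if not stripped:
            # Assumption: blank lines are skipped.
            continue
        if stripped.startswith("#"):
            # Comment line, ignored.
            continue
        # The SMARTS is the first whitespace-delimited token;
        # any trailing title or comment is dropped.
        smarts_list.append(stripped.split()[0])
    return smarts_list


example_file = io.StringIO(
    "# functional groups\n"
    "[OH] hydroxyl\n"
    "C(=O)O carboxylic acid\n"
    "\n"
    "[#7]\n"
)
smarts_strings = read_smarts_strings(example_file)
```

In the real method, each of these strings would be used to instantiate an OESubSearch object.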
Given the database IDs of molecules and patterns (SMARTS), and a counter matrix relating the two, generate a simple text file that should be very easy to import into the database to persist that association information. To trim the output, set the sparse option to True to skip rows for matches with a count of 0 (no matches, which will be the most common case). Each line produced should correspond to a row in the PATTERN_MATCH table, with values for MOLECULE_ID, PATTERN_ID and COUNT.
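
The effect of the sparse option can be sketched as follows. The function name `pattern_match_rows`, the sample IDs, and the tab delimiter used when formatting are all assumptions for illustration; only the column order (MOLECULE_ID, PATTERN_ID, COUNT) comes from the description above.

```python
def pattern_match_rows(molecule_ids, pattern_ids, count_matrix, sparse=False):
    """Yield (molecule_id, pattern_id, count) tuples from a counter matrix.

    Hypothetical helper: count_matrix[i][j] is the count of pattern j
    in molecule i. With sparse=True, zero counts are omitted.
    """
    for molecule_id, counts in zip(molecule_ids, count_matrix):
        for pattern_id, count in zip(pattern_ids, counts):
            if sparse and count == 0:
                continue
            yield (molecule_id, pattern_id, count)


# Made-up database IDs and counts for two molecules and three patterns.
molecule_ids = [101, 102]
pattern_ids = [7, 8, 9]
count_matrix = [[2, 0, 1], [0, 0, 3]]

dense_rows = list(pattern_match_rows(molecule_ids, pattern_ids, count_matrix))
sparse_rows = list(
    pattern_match_rows(molecule_ids, pattern_ids, count_matrix, sparse=True)
)

# Assumption: tab-delimited text lines, one per PATTERN_MATCH row.
sparse_lines = ["\t".join(str(value) for value in row) for row in sparse_rows]
```

With most counts equal to zero, the sparse form produces far fewer rows than the full molecules-by-patterns grid.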
| Generated by Epydoc 3.0beta1 on Thu Nov 8 17:49:21 2007 | http://epydoc.sourceforge.net |