Given a set of SMARTS patterns and molecules, counts how many times each SMARTS pattern (i.e. functional group) is found in each molecule. Also includes a script to generate the output in a format that is easily inserted into the application database.

Starting from molecule and SMARTS files that have NOT yet been inserted into the database, a complete run, including inserting the results into the database, can be accomplished with the following from the command line:

===========================================================================

    python PatternMatchCounter.py molecule.smi example.smarts match.counter
    python DBUtil.py -imolecule.smi -tMOLECULE -omolecule.smi.id CAN_SMILES LABEL
    python DBUtil.py -iexample.smarts -tPATTERN -oexample.smarts.id SMARTS LABEL
    python PatternMatchCounter.py -dmatch.txt -cmatch.counter molecule.smi.id example.smarts.id
    python DBUtil.py -imatch.txt -tPATTERN_MATCH -omatch.txt.id MOLECULE_ID PATTERN_ID COUNT

===========================================================================

Alternatively, if you wish to use reactants and SMARTS from the database, use something like this:

===========================================================================

    python DBUtil.py "select CAN_SMILES, LABEL, MOLECULE_ID from MOLECULE" molecule.smi
    python DBUtil.py "select SMARTS, LABEL, PATTERN_ID from PATTERN" example.smarts
    python PatternMatchCounter.py molecule.smi example.smarts match.counter
    python PatternMatchCounter.py -dmatch.txt -cmatch.counter molecule.smi example.smarts
    python DBUtil.py -imatch.txt -tPATTERN_MATCH -omatch.txt.id MOLECULE_ID PATTERN_ID COUNT

===========================================================================

Input:

- Molecule file
  Can be any format understandable by oemolistream, assuming a properly named extension. For example, "molecules.smi" for SMILES format.
- SMARTS pattern file
  File containing one SMARTS pattern string per line that will be used to search the molecules.

Either of the above can take stdin as its source by specifying the filename "-" or ".smi" or something similar. See the documentation of oemolistream for more information.

Output:

- Match counter file
  For each molecule read from the molecule file, one line of tab-delimited counts is written. Each line holds one count per SMARTS pattern read. The values appear in the same order as the SMARTS patterns were read, and each value equals the number of times that SMARTS pattern was matched in the respective molecule.

Again, redirection to stdout is possible by specifying the filename "-".
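The match counter format described above (one line per molecule, one tab-delimited count per SMARTS pattern) can be read back with a few lines of Python. This is only a sketch: the function name `read_match_counts` is hypothetical and not part of the module, blank lines are assumed to carry no data, and the code targets Python 3 rather than the Python version the module was written for.

```python
import io

def read_match_counts(counter_file):
    """Parse a match counter file into a list of per-molecule count lists.

    Hypothetical helper: each non-blank line is assumed to hold
    tab-delimited integer counts, one per SMARTS pattern.
    """
    rows = []
    for line in counter_file:
        line = line.rstrip("\n")
        if not line:
            continue  # assumption: blank lines carry no data
        rows.append([int(value) for value in line.split("\t")])
    return rows

# Two molecules, three SMARTS patterns, passed as an in-memory file.
sample = io.StringIO("2\t0\t1\n0\t0\t3\n")
counts = read_match_counts(sample)
print(counts)
```

Row `i`, column `j` of the result is the number of times pattern `j` matched molecule `i`, following the ordering rules stated above.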
Primary method. Reads the source files and counts pattern matches, writing the results to the output file. See the module documentation for more information.

Note: This method takes actual File objects, not filenames, so that the caller can pass "virtual Files" for testing and interfacing. Use the "main" method to have the module take care of opening files from filenames. One extra catch: the molecule source is not a file but an oemolistream, which is needed to take advantage of that class's high-level management of different molecule file formats.
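Because the method works on file objects rather than filenames, it can be exercised entirely in memory. The sketch below illustrates that calling pattern only; it does not use OEChem. The names `count_matches` and `toy_count` are hypothetical, the molecule source is a plain list standing in for an oemolistream, and the counting callable uses naive substring counting, which is NOT SMARTS matching and serves purely as a placeholder.

```python
import io

def count_matches(molecules, smarts_file, out_file, count_fn):
    """Write one tab-delimited line of counts per molecule.

    molecules   : iterable of molecules (stand-in for an oemolistream)
    smarts_file : file-like object, one pattern per line
    out_file    : file-like object that receives the counts
    count_fn    : hypothetical callable (pattern, molecule) -> int
    """
    patterns = []
    for line in smarts_file:
        stripped = line.strip()
        if stripped and not stripped.startswith("#"):
            patterns.append(stripped.split()[0])  # drop trailing title text
    for mol in molecules:
        counts = [count_fn(pattern, mol) for pattern in patterns]
        out_file.write("\t".join(str(c) for c in counts) + "\n")

def toy_count(pattern, mol):
    # Placeholder only: substring counting, not real substructure search.
    return mol.count(pattern)

# "Virtual files" built from strings instead of files on disk.
smarts = io.StringIO("# toy patterns\nO oxygen\nN nitrogen\n")
out = io.StringIO()
count_matches(["CCO", "NCCN", "OCCO"], smarts, out, toy_count)
result = out.getvalue()
print(result)
```

Passing `io.StringIO` objects keeps a test free of temporary files, which is the use case the note above describes.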
Reads the contents of the smartsFile as a list of SMARTS strings. Comment lines prefixed with "#" are ignored. Expects one SMARTS string per line of the file. Each SMARTS string may be followed by a title, comment, or similar text, separated by whitespace; this trailing text is ignored. Returns a list of OESubSearch objects, each instantiated with the respective SMARTS string.
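The parsing rules above can be sketched without OEChem by returning the raw SMARTS strings instead of OESubSearch objects. The function name `read_smarts_strings` is hypothetical, and skipping blank lines is an assumption that the description does not state.

```python
import io

def read_smarts_strings(smarts_file):
    """Return the SMARTS string from each non-comment line.

    Hypothetical stand-in: the documented function wraps each string
    in an OESubSearch object; this sketch stops at the plain strings.
    """
    smarts_list = []
    for line in smarts_file:
        stripped = line.strip()
        if not stripped:
            continue  # assumption: blank lines are skipped
        if stripped.startswith("#"):
            continue  # comment line
        # Keep only the first whitespace-separated token; any trailing
        # title or comment text is ignored.
        smarts_list.append(stripped.split()[0])
    return smarts_list

example = io.StringIO(
    "# functional groups\n"
    "[OX2H] hydroxyl\n"
    "C(=O)[OH]\tcarboxylic acid\n"
    "\n"
)
patterns = read_smarts_strings(example)
print(patterns)
```

With OEChem available, each returned string would presumably be passed to the OESubSearch constructor, as the description indicates.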
Given the database IDs of molecules and patterns (SMARTS) and a counter matrix relating the two, generates a simple text file that should be easy to import into the database to persist that association information.

To trim the output, set the sparse option to True so that no rows are generated for counts of 0 (no matches, which will be the most common case).

Each line produced corresponds to a row in the PATTERN_MATCH table, with values for the MOLECULE_ID, PATTERN_ID and COUNT columns, respectively.
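A minimal sketch of this row generation, including the sparse option, might look like the following. The function name `write_match_rows`, its argument layout, and the tab delimiter are all assumptions; the description only promises a simple text file with MOLECULE_ID, PATTERN_ID and COUNT on each line.

```python
import io

def write_match_rows(molecule_ids, pattern_ids, count_matrix, out_file, sparse=False):
    """Write one line per (molecule, pattern) pair: MOLECULE_ID, PATTERN_ID, COUNT.

    count_matrix[i][j] is the count for molecule i and pattern j.
    The tab delimiter is an assumption, not taken from the module.
    """
    for mol_id, counts in zip(molecule_ids, count_matrix):
        for pat_id, count in zip(pattern_ids, counts):
            if sparse and count == 0:
                continue  # sparse mode: omit rows with no matches
            out_file.write("%s\t%s\t%d\n" % (mol_id, pat_id, count))

# Two molecules, two patterns, written to an in-memory file.
out = io.StringIO()
write_match_rows([101, 102], [7, 8], [[2, 0], [0, 3]], out, sparse=True)
rows = out.getvalue()
print(rows)
```

With `sparse=False` the same call would emit four rows, including the two zero-count pairs.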
Generated by Epydoc 3.0beta1 on Thu Nov 8 17:49:21 2007 | http://epydoc.sourceforge.net |