Module CHEM.DB.rdb.BeautifulSoup

Class MinimalSoup



          PageElement --+            
                        |            
                      Tag --+        
                            |        
markupbase.ParserBase --+   |        
                        |   |        
       sgmllib.SGMLParser --+        
                            |        
           BeautifulStoneSoup --+    
                                |    
                    BeautifulSoup --+
                                    |
                                   MinimalSoup
Known Subclasses:
RobustInsanelyWackAssHTMLParser

The MinimalSoup class is for parsing HTML that contains pathologically bad markup. It makes no assumptions about tag nesting (both NESTABLE_TAGS and RESET_NESTING_TAGS are empty dictionaries, as listed under Class Variables). It does know which tags are self-closing, that <script> tags contain JavaScript and should not be parsed, that META tags may contain encoding information, and so on.

This combination of HTML-specific knowledge without nesting rules also makes it a better base class for subclassing than BeautifulStoneSoup or BeautifulSoup.
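To make the description above concrete, here is a minimal, hypothetical sketch built on Python 3's standard `html.parser` module rather than on BeautifulSoup or `sgmllib` (which Python 3 no longer ships). The names `MinimalTreeBuilder`, `Node`, `SELF_CLOSING` and `find` are illustrative only, and the self-closing tag set is a small assumed sample, not MinimalSoup's `SELF_CLOSING_TAGS`. The sketch applies no nesting rules beyond matching an end tag to the nearest open tag of the same name. It keeps `<script>` contents as raw text (the standard parser already does this) and records a charset declared in a META tag.

```python
from html.parser import HTMLParser

# Hypothetical sample of self-closing (void) tags; an assumption for this
# sketch, not MinimalSoup's SELF_CLOSING_TAGS.
SELF_CLOSING = {"br", "hr", "img", "input", "link", "meta"}


class Node:
    """A bare-bones element: a name, an attribute dict, and children."""

    def __init__(self, name, attrs=None):
        self.name = name
        self.attrs = dict(attrs or [])
        self.children = []  # Node objects and text strings


class MinimalTreeBuilder(HTMLParser):
    """Builds a tree with no nesting rules (hypothetical illustration)."""

    def __init__(self):
        super().__init__(convert_charrefs=True)
        self.root = Node("[document]")
        self.stack = [self.root]  # currently open elements
        self.declared_encoding = None

    def handle_starttag(self, tag, attrs):
        node = Node(tag, attrs)
        self.stack[-1].children.append(node)
        if tag == "meta":
            self._sniff_charset(node.attrs)
        # Self-closing tags never become the parent of later content.
        if tag not in SELF_CLOSING:
            self.stack.append(node)

    def handle_endtag(self, tag):
        # Close the nearest open tag with this name; ignore stray end tags.
        for i in range(len(self.stack) - 1, 0, -1):
            if self.stack[i].name == tag:
                del self.stack[i:]
                return

    def handle_data(self, data):
        # html.parser already delivers <script> bodies as raw text, so
        # markup inside a script is never turned into elements.
        children = self.stack[-1].children
        if children and isinstance(children[-1], str):
            children[-1] += data  # merge adjacent text chunks
        else:
            children.append(data)

    def _sniff_charset(self, attrs):
        # Handles <meta charset=...> and the older http-equiv form.
        if attrs.get("charset"):
            self.declared_encoding = attrs["charset"]
        elif (attrs.get("http-equiv") or "").lower() == "content-type":
            for part in (attrs.get("content") or "").split(";"):
                key, _, value = part.strip().partition("=")
                if key.lower() == "charset" and value:
                    self.declared_encoding = value.strip()


def find(node, name):
    """Depth-first search for the first descendant element called `name`."""
    for child in node.children:
        if isinstance(child, Node):
            if child.name == name:
                return child
            found = find(child, name)
            if found is not None:
                return found
    return None


builder = MinimalTreeBuilder()
builder.feed('<html><head><meta http-equiv="Content-Type" '
             'content="text/html; charset=ISO-8859-1">'
             "<script>document.write('<b>hi</b>')</script></head>"
             "<body><p>one<br>two</i></p></body></html>")
builder.close()
print(builder.declared_encoding)  # ISO-8859-1
```

In this sketch the stray `</i>` is simply dropped, the `<br>` never swallows the text after it, and the `<b>` inside the script stays part of the script's text rather than becoming an element.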

Instance Methods

Inherited from BeautifulSoup: __init__, start_meta

Inherited from BeautifulStoneSoup: __getattr__, endData, handle_charref, handle_comment, handle_data, handle_decl, handle_entityref, handle_pi, isSelfClosingTag, parse_declaration, popTag, pushTag, reset, unknown_endtag, unknown_starttag

Inherited from Tag: __call__, __contains__, __delitem__, __eq__, __getitem__, __iter__, __len__, __ne__, __nonzero__, __repr__, __setitem__, __str__, __unicode__, append, childGenerator, fetch, fetchText, find, findAll, findChild, findChildren, first, firstText, get, has_key, prettify, recursiveChildGenerator, renderContents

Inherited from Tag (private): _getAttrMap

Inherited from PageElement: extract, fetchNextSiblings, fetchParents, fetchPrevious, fetchPreviousSiblings, findAllNext, findAllPrevious, findNext, findNextSibling, findNextSiblings, findParent, findParents, findPrevious, findPreviousSibling, findPreviousSiblings, insert, nextGenerator, nextSiblingGenerator, parentGenerator, previousGenerator, previousSiblingGenerator, replaceWith, setup, substituteEncoding, toEncoding

Inherited from PageElement (private): _findAll, _findOne, _lastRecursiveChild

Inherited from sgmllib.SGMLParser: close, convert_charref, convert_codepoint, convert_entityref, error, feed, finish_endtag, finish_shorttag, finish_starttag, get_starttag_text, goahead, handle_endtag, handle_starttag, parse_endtag, parse_pi, parse_starttag, report_unbalanced, setliteral, setnomoretags, unknown_charref, unknown_entityref

Inherited from sgmllib.SGMLParser (private): _convert_ref

Inherited from markupbase.ParserBase: getpos, parse_comment, parse_marked_section, unknown_decl, updatepos

Inherited from markupbase.ParserBase (private): _parse_doctype_attlist, _parse_doctype_element, _parse_doctype_entity, _parse_doctype_notation, _parse_doctype_subset, _scan_name

Class Variables
  RESET_NESTING_TAGS = {}
  NESTABLE_TAGS = {}
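MinimalSoup defines both of these itself (they are not in the list inherited from BeautifulSoup), overriding them with empty dictionaries. The sketch below is a hypothetical, simplified model of how a table-driven builder might consult such a table, and of how a subclass can change behavior by overriding only a class variable. It is not BeautifulSoup's code. `TableDrivenBuilder`, `MinimalBuilder`, `open_tag`, `open_all` and the sample tag set are assumptions, and the table is reduced to a flat set of names instead of a dictionary.

```python
# Hypothetical sketch of a table-driven nesting rule; not BeautifulSoup's
# code. The real NESTABLE_TAGS is a dictionary; here it is a flat set.


class TableDrivenBuilder:
    # Assumed sample: tags allowed to contain another tag of the same name.
    NESTABLE_TAGS = {"div", "span", "ul", "ol", "table"}

    def __init__(self):
        self.stack = ["[document]"]  # names of currently open tags

    def open_tag(self, name):
        """Open `name`, first closing an earlier open tag of the same name
        (and everything inside it) if the table says `name` cannot nest."""
        if name not in self.NESTABLE_TAGS and name in self.stack[1:]:
            last = len(self.stack) - 1 - self.stack[::-1].index(name)
            del self.stack[last:]
        self.stack.append(name)


class MinimalBuilder(TableDrivenBuilder):
    # Subclassing only swaps the class-level table: with it empty, no tag
    # may contain another open tag of the same name in this sketch.
    NESTABLE_TAGS = set()


def open_all(builder, names):
    """Open each tag in turn and return the resulting stack of names."""
    for name in names:
        builder.open_tag(name)
    return builder.stack


print(open_all(TableDrivenBuilder(), ["div", "p", "div"]))
# ['[document]', 'div', 'p', 'div']
print(open_all(MinimalBuilder(), ["div", "p", "div"]))
# ['[document]', 'div']
```

Under these assumptions the base class keeps the nested `div`, while the minimal subclass closes the first `div` when the second one opens. Whether MinimalSoup behaves exactly this way depends on its actual parsing logic, which this sketch does not reproduce.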

Inherited from BeautifulSoup: CHARSET_RE, NESTABLE_BLOCK_TAGS, NESTABLE_INLINE_TAGS, NESTABLE_LIST_TAGS, NESTABLE_TABLE_TAGS, NON_NESTABLE_BLOCK_TAGS, QUOTE_TAGS, SELF_CLOSING_TAGS

Inherited from BeautifulStoneSoup: HTML_ENTITIES, MARKUP_MASSAGE, ROOT_TAG_NAME, XML_ENTITIES, XML_ENTITY_LIST, i

Inherited from Tag: XML_SPECIAL_CHARS_TO_ENTITIES

Inherited from sgmllib.SGMLParser: entity_or_charref, entitydefs

Inherited from sgmllib.SGMLParser (private): _decl_otherchars