
Class ICantBelieveItsBeautifulSoup



          PageElement --+            
                        |            
                      Tag --+        
                            |        
markupbase.ParserBase --+   |        
                        |   |        
       sgmllib.SGMLParser --+        
                            |        
           BeautifulStoneSoup --+    
                                |    
                    BeautifulSoup --+
                                    |
                                   ICantBelieveItsBeautifulSoup
Known Subclasses:
RobustWackAssHTMLParser

The BeautifulSoup class is oriented towards skipping over
common HTML errors like unclosed tags. However, sometimes it makes
errors of its own. For instance, consider this fragment:

 <b>Foo<b>Bar</b></b>

This is perfectly valid (if bizarre) HTML. However, the
BeautifulSoup class will implicitly close the first b tag when it
encounters the second 'b'. It will think the author wrote
"<b>Foo<b>Bar", and didn't close the first 'b' tag, because
there's no real-world reason to bold something that's already
bold. When it encounters '</b></b>' it will close two more 'b'
tags, for a grand total of three tags closed instead of two. This
can throw off the rest of your document structure. The same is
true of a number of other tags, listed below.

It's much more common for someone to forget to close a 'b' tag
than to actually use nested 'b' tags, and the BeautifulSoup class
handles the common case. This class handles the not-so-common
case: where you can't believe someone wrote what they did, but
it's valid HTML and BeautifulSoup screwed up by assuming it
wouldn't be.
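
The difference between the two behaviours described above can be sketched
with a toy stack-based parser. This is not the library's code: parse and its
nestable parameter are hypothetical names used only for illustration, and the
sketch handles nothing beyond bare tags and text. With the default empty
nestable set it implicitly closes an open tag when a tag of the same name
starts; with 'b' marked as nestable it keeps the nesting.

```python
import re

# Matches "<name>", "</name>", or a run of text. Attributes are not handled.
TOKEN_RE = re.compile(r"<(/?)(\w+)>|([^<]+)")


def parse(markup, nestable=()):
    """Return the parsed tree as nested lists of the form [tag, child, ...]."""
    root = ["[document]"]
    stack = [root]
    for closing, name, text in TOKEN_RE.findall(markup):
        if text:
            stack[-1].append(text)
            continue
        open_names = [node[0] for node in stack[1:]]
        if closing:
            # Pop back to the most recent matching open tag.
            # A stray end tag with no open partner is ignored.
            if name in open_names:
                while stack.pop()[0] != name:
                    pass
            continue
        if name not in nestable and name in open_names:
            # Assume the author forgot to close the earlier tag.
            while stack.pop()[0] != name:
                pass
        node = [name]
        stack[-1].append(node)
        stack.append(node)
    return root[1:]


fragment = "<b>Foo<b>Bar</b></b>"
print(parse(fragment))                  # two sibling 'b' elements
print(parse(fragment, nestable={"b"}))  # one 'b' nested inside the other
```

In the first call the second 'b' ends up as a sibling of the first, and the
final end tag has nothing left to close. In the second call the structure the
author wrote is preserved.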



Instance Methods

Inherited from BeautifulSoup: __init__, start_meta

Inherited from BeautifulStoneSoup: __getattr__, endData, handle_charref, handle_comment, handle_data, handle_decl, handle_entityref, handle_pi, isSelfClosingTag, parse_declaration, popTag, pushTag, reset, unknown_endtag, unknown_starttag

Inherited from Tag: __call__, __contains__, __delitem__, __eq__, __getitem__, __iter__, __len__, __ne__, __nonzero__, __repr__, __setitem__, __str__, __unicode__, append, childGenerator, fetch, fetchText, find, findAll, findChild, findChildren, first, firstText, get, has_key, prettify, recursiveChildGenerator, renderContents

Inherited from Tag (private): _getAttrMap

Inherited from PageElement: extract, fetchNextSiblings, fetchParents, fetchPrevious, fetchPreviousSiblings, findAllNext, findAllPrevious, findNext, findNextSibling, findNextSiblings, findParent, findParents, findPrevious, findPreviousSibling, findPreviousSiblings, insert, nextGenerator, nextSiblingGenerator, parentGenerator, previousGenerator, previousSiblingGenerator, replaceWith, setup, substituteEncoding, toEncoding

Inherited from PageElement (private): _findAll, _findOne, _lastRecursiveChild

Inherited from sgmllib.SGMLParser: close, convert_charref, convert_codepoint, convert_entityref, error, feed, finish_endtag, finish_shorttag, finish_starttag, get_starttag_text, goahead, handle_endtag, handle_starttag, parse_endtag, parse_pi, parse_starttag, report_unbalanced, setliteral, setnomoretags, unknown_charref, unknown_entityref

Inherited from sgmllib.SGMLParser (private): _convert_ref

Inherited from markupbase.ParserBase: getpos, parse_comment, parse_marked_section, unknown_decl, updatepos

Inherited from markupbase.ParserBase (private): _parse_doctype_attlist, _parse_doctype_element, _parse_doctype_entity, _parse_doctype_notation, _parse_doctype_subset, _scan_name

Class Variables
  I_CANT_BELIEVE_THEYRE_NESTABLE_INLINE_TAGS = ['em', 'big', 'i'...
  I_CANT_BELIEVE_THEYRE_NESTABLE_BLOCK_TAGS = ['noscript']
  NESTABLE_TAGS = {'abbr': [], 'acronym': [], 'b': [], 'bdo': []...

Inherited from BeautifulSoup: CHARSET_RE, NESTABLE_BLOCK_TAGS, NESTABLE_INLINE_TAGS, NESTABLE_LIST_TAGS, NESTABLE_TABLE_TAGS, NON_NESTABLE_BLOCK_TAGS, QUOTE_TAGS, RESET_NESTING_TAGS, SELF_CLOSING_TAGS

Inherited from BeautifulStoneSoup: HTML_ENTITIES, MARKUP_MASSAGE, ROOT_TAG_NAME, XML_ENTITIES, XML_ENTITY_LIST, i

Inherited from Tag: XML_SPECIAL_CHARS_TO_ENTITIES

Inherited from sgmllib.SGMLParser: entity_or_charref, entitydefs

Inherited from sgmllib.SGMLParser (private): _decl_otherchars

Class Variable Details

I_CANT_BELIEVE_THEYRE_NESTABLE_INLINE_TAGS

Value:
['em',
 'big',
 'i',
 'small',
 'tt',
 'abbr',
 'acronym',
 'strong',
...

NESTABLE_TAGS

Value:
{'abbr': [],
 'acronym': [],
 'b': [],
 'bdo': [],
 'big': [],
 'blockquote': [],
 'center': [],
 'cite': [],
...
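
The truncated values above suggest that NESTABLE_TAGS maps each tag name to a
list (empty in every entry shown). A minimal sketch of how a map of that shape
could be assembled from tag lists follows. It assumes the map is derived from
such lists, which this page does not confirm. build_tag_map is a hypothetical
helper, and the lists contain only the entries visible in the truncated
listings above, not the full values.

```python
def build_tag_map(default, *tag_lists):
    """Map every tag name in the given lists to its own copy of `default`."""
    tag_map = {}
    for tags in tag_lists:
        for tag in tags:
            tag_map[tag] = list(default)
    return tag_map


# Only the entries visible on this page; the real lists are longer.
inline_tags = ['em', 'big', 'i', 'small', 'tt', 'abbr', 'acronym', 'strong']
block_tags = ['noscript']

nestable = build_tag_map([], inline_tags, block_tags)
print(sorted(nestable))
```

Giving each key its own list avoids accidental sharing if one entry is later
modified in place.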