Class hierarchy:
PageElement --> Tag --> BeautifulStoneSoup
markupbase.ParserBase --> sgmllib.SGMLParser --> BeautifulStoneSoup
BeautifulStoneSoup --> BeautifulSoup --> ICantBelieveItsBeautifulSoup
The BeautifulSoup class is oriented towards skipping over common HTML errors like unclosed tags. However, sometimes it makes errors of its own. For instance, consider this fragment:

 <b>Foo<b>Bar</b></b>

This is perfectly valid (if bizarre) HTML. However, the BeautifulSoup class will implicitly close the first 'b' tag when it encounters the second 'b'. It will think the author wrote "<b>Foo<b>Bar" and didn't close the first 'b' tag, because there's no real-world reason to bold something that's already bold. When it encounters '</b></b>' it will close two more 'b' tags, for a grand total of three tags closed instead of two. This can throw off the rest of your document structure. The same is true of a number of other tags, listed below.

It's much more common for someone to forget to close a 'b' tag than to actually use nested 'b' tags, and the BeautifulSoup class handles the common case. This class handles the not-so-common case: where you can't believe someone wrote what they did, but it's valid HTML and BeautifulSoup screwed up by assuming it wouldn't be.
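The tag counting described above can be reproduced with a small toy model. This is only a sketch of the stack behaviour the description implies, not the parser's actual code: the function name close_events, the nestable parameter, and the event labels are all hypothetical, and the regular expression only recognises bare tags without attributes.

```python
import re

# Toy tokenizer: matches bare tags such as <b> and </b> (no attributes).
TAG_RE = re.compile(r"<(/?)(\w+)>")


def close_events(markup, nestable):
    """Return the list of close events a simple stack-based parser produces.

    `nestable` is the set of tag names allowed to nest inside themselves.
    Hypothetical helper for illustration only.
    """
    stack = []   # currently open tag names, innermost last
    events = []  # (kind, tag name) for every close that happens

    def pop_to(name):
        # Drop the most recent open `name` and anything opened after it.
        idx = len(stack) - 1 - stack[::-1].index(name)
        del stack[idx:]

    for slash, name in TAG_RE.findall(markup):
        if not slash:
            if name in stack and name not in nestable:
                # Assume the author forgot the end tag: close the open one.
                pop_to(name)
                events.append(("implicit", name))
            stack.append(name)
        elif name in stack:
            pop_to(name)
            events.append(("explicit", name))
        else:
            # An end tag with nothing left to close.
            events.append(("unmatched", name))
    return events


fragment = "<b>Foo<b>Bar</b></b>"
print(len(close_events(fragment, nestable=set())))   # 3: 'b' may not nest
print(len(close_events(fragment, nestable={"b"})))   # 2: 'b' may nest
```

With 'b' treated as non-nestable, the model records one implicit close plus two end tags, giving the three closes described above. Allowing 'b' to nest yields the two closes the author intended.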
I_CANT_BELIEVE_THEYRE_NESTABLE_INLINE_TAGS =
I_CANT_BELIEVE_THEYRE_NESTABLE_BLOCK_TAGS =
NESTABLE_TAGS =
Generated by Epydoc 3.0beta1 on Thu Nov 8 17:49:29 2007 | http://epydoc.sourceforge.net |