sunlabs.brazil.util
public class LexHTML extends LexML
This class differs slightly from LexML as follows: after certain tags, like the <script> tag, the body that follows is uninterpreted data and ends only at the corresponding closing tag (in this case </script>), not at just the next "<" or ">" character. This is one way that HTML is not fully compliant with XML.
The default set of tags that have this special processing is <script>, <style>, and <xmp>. The user can change this by retrieving the Vector of special tags via getClosingTags and modifying it as needed.
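The special body handling described above can be illustrated with a small, self-contained sketch. This is not the LexHTML implementation: the class RawBodySketch and its methods closingTags and findBodyEnd are hypothetical names invented for this example, and the matching is deliberately simplified (it looks only for the text "</" followed by the tag name, ignoring case). It shows how a body that contains "<" and ">" characters is still treated as one uninterpreted run up to the closing tag, and how a Vector of special tag names could be extended by the caller.

```java
import java.util.Vector;

// Hypothetical illustration only; not part of sunlabs.brazil.util.
class RawBodySketch {
    // Assumed default list, mirroring the three tags named in the text above.
    static Vector<String> closingTags() {
        Vector<String> tags = new Vector<>();
        tags.add("script");
        tags.add("style");
        tags.add("xmp");
        return tags;
    }

    // Returns the index at which the uninterpreted body starting at bodyStart
    // ends: the position of the next "</tag" (compared case-insensitively),
    // or html.length() if no such closing tag exists.
    static int findBodyEnd(String html, int bodyStart, String tag) {
        String needle = "</" + tag;
        for (int i = bodyStart; i + needle.length() <= html.length(); i++) {
            if (html.regionMatches(true, i, needle, 0, needle.length())) {
                return i;
            }
        }
        return html.length();
    }

    public static void main(String[] args) {
        String html = "<script>if (a < b) { go(); }</SCRIPT><p>done</p>";

        // A caller could add further special tags to the list.
        Vector<String> tags = closingTags();
        tags.add("textarea");

        int bodyStart = html.indexOf('>') + 1;
        int bodyEnd = findBodyEnd(html, bodyStart, "script");

        // The "<" inside the script body does not end the body.
        System.out.println(html.substring(bodyStart, bodyEnd));
    }
}
```

With the real class, the equivalent customization would go through the Vector returned by getClosingTags; the sketch only mirrors that idea with its own list.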
Version: 2.2
Constructor Summary | |
---|---|
LexHTML(String str) | Creates a new HTML parser, which can be used to iterate over the tokens in the given string. |
Method Summary | |
---|---|
Vector | getClosingTags(): Gets the set of HTML tags that have the special body-processing behavior mentioned above. |
String | getTag(): Gets the tag name at the beginning of the current tag. |
boolean | nextToken(): Advances to the next token, correctly handling HTML tags that have the special body-processing behavior mentioned above. |
void | replace(String str): Changes the string that this LexHTML is parsing. |
Parameters: str The HTML to parse.
Parameters: tags The array of case-insensitive tag names that are only closed by seeing their "slashed" version.
Returns: The lower-cased tag name, or null if the current token does not have a tag name.
See Also: LexML
This method returns the uninterpreted data making up the body of a special HTML tag as a token of type LexML.STRING, even if the body was actually a comment or another tag.
Returns: true if a token was found, false if there were no more tokens left.
Parameters: str The string that this LexHTML should now parse.