com.arsdigita.util
Class HtmlToText

java.lang.Object
  extended by com.arsdigita.util.HtmlToText

public class HtmlToText
extends Object

Generates a best-guess plain text version of an HTML fragment. Parses the HTML and does some simple formatting. The parser and formatting are pretty stupid, but it's better than nothing.

Based on the ACS 4.0 Tcl conversion routines by Lars Pind and Aaron Swartz. In fact, it's a direct port of that code to Java with very few changes.

The intended usage is to allocate an HtmlToText object statically and then reuse it by calling its convert method. The class is not thread-safe, so you should synchronize on your conversion object if concurrent calls are possible.

Example:

 static HtmlToText htmlToText = new HtmlToText();

 // html holds the HTML fragment to convert
 synchronized (htmlToText) {
     String text = htmlToText.convert(html);
 }
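
One way to keep the locking in a single place, rather than repeating a synchronized block at every call site, is a small wrapper. The sketch below is an assumption, not part of this class: SynchronizedConverter is a hypothetical helper, and the converter is represented by a java.util.function.UnaryOperator<String> so the code compiles without the library. In real code the delegate would presumably be htmlToText::convert.

```java
import java.util.function.UnaryOperator;

/** Hypothetical helper: serializes calls to a converter that is not thread-safe. */
final class SynchronizedConverter {
    // Stand-in for the real converter, e.g. htmlToText::convert (assumption).
    private final UnaryOperator<String> delegate;
    private final Object lock = new Object();

    SynchronizedConverter(UnaryOperator<String> delegate) {
        this.delegate = delegate;
    }

    /** Runs one conversion at a time, so callers need no synchronized block. */
    String convert(String html) {
        synchronized (lock) {
            return delegate.apply(html);
        }
    }
}
```

This only protects the converter if every caller goes through the wrapper; code that reaches the underlying object directly would still need its own synchronization.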
 

Version:
$Id: //core-platform/dev/src/com/arsdigita/util/HtmlToText.java#9 $

Constructor Summary
HtmlToText()
          Constructor.
 
Method Summary
 String convert(String input)
          Convert HTML input to plain text output.
static String generateHTMLText(String text, String formatType)
          Returns HTML text, converted from one of the following formats: HTML - returns the input; pre-formatted - returns the input wrapped in <pre> tags; plain - returns the input converted to HTML.
 void setMaxLength(int maxlen)
          Sets the maximum line length for wrapping text.
 void setShowTags(boolean showtags)
          Sets the flag that controls whether unrecognized HTML tags are copied to the output.
 String toString()
          Returns the last converted text block as a String.
 
Methods inherited from class java.lang.Object
clone, equals, finalize, getClass, hashCode, notify, notifyAll, wait, wait, wait
 

Constructor Detail

HtmlToText

public HtmlToText()
Constructor.

Method Detail

setMaxLength

public void setMaxLength(int maxlen)
Sets the maximum line length for wrapping text. The value of maxlen must be greater than zero; otherwise it is ignored. Must be set prior to calling convert().

Parameters:
maxlen - the maximum number of characters in an output line.
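
To illustrate the two rules stated above (non-positive values are ignored, and the limit applies per output line), here is a minimal sketch. It is not the library's wrapping code: WrapSketch, its default of 70, and the greedy word-wrap strategy are all assumptions made for the example.

```java
/** Illustrative only: not the wrapping code used by HtmlToText. */
final class WrapSketch {
    private int maxLength = 70; // assumed default, chosen for this sketch

    /** Mirrors the documented rule: values of zero or less are ignored. */
    void setMaxLength(int maxlen) {
        if (maxlen > 0) {
            maxLength = maxlen;
        }
    }

    int getMaxLength() {
        return maxLength;
    }

    /**
     * Greedy word wrap: starts a new line when the next word would not fit.
     * A single word longer than the limit is left on its own line unbroken.
     */
    String wrap(String text) {
        StringBuilder out = new StringBuilder();
        int lineLength = 0;
        for (String word : text.trim().split("\\s+")) {
            if (lineLength > 0 && lineLength + 1 + word.length() > maxLength) {
                out.append('\n');
                lineLength = 0;
            } else if (lineLength > 0) {
                out.append(' ');
                lineLength++;
            }
            out.append(word);
            lineLength += word.length();
        }
        return out.toString();
    }
}
```

With the limit set to 10, wrapping "one two three four" in this sketch yields two lines, "one two" and "three four". A later call to setMaxLength(0) leaves the limit at 10.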

setShowTags

public void setShowTags(boolean showtags)
Sets the flag that controls whether unrecognized HTML tags are copied to the output. If set to false, these tags are simply ignored. Must be set prior to calling convert().
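
The effect of the flag can be sketched as follows. This is a simplified illustration under stated assumptions, not the parser used by this class: ShowTagsSketch, its KNOWN tag set, and the regular expression are hypothetical, and recognized tags are merely dropped here, whereas the real converter applies formatting for them.

```java
import java.util.Arrays;
import java.util.HashSet;
import java.util.Set;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

/** Illustrative only: shows what "copy unrecognized tags" could mean. */
final class ShowTagsSketch {
    // Hypothetical set of tags the converter knows how to format.
    private static final Set<String> KNOWN =
            new HashSet<>(Arrays.asList("p", "br", "b", "i"));

    // Matches an opening or closing tag and captures its name.
    private static final Pattern TAG =
            Pattern.compile("</?([A-Za-z][A-Za-z0-9]*)[^>]*>");

    static String strip(String html, boolean showTags) {
        Matcher m = TAG.matcher(html);
        StringBuilder out = new StringBuilder();
        int last = 0;
        while (m.find()) {
            out.append(html, last, m.start()); // text before the tag
            boolean known = KNOWN.contains(m.group(1).toLowerCase());
            if (!known && showTags) {
                out.append(m.group()); // copy the unrecognized tag verbatim
            }
            last = m.end();
        }
        out.append(html, last, html.length()); // trailing text
        return out.toString();
    }
}
```

In this sketch, the input `<b>Hi</b> <blink>there</blink>` keeps the `<blink>` tags when showTags is true and loses them when it is false.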


toString

public String toString()
Returns the last converted text block as a String.


convert

public String convert(String input)
Converts HTML input to plain text output. If the input does not contain any embedded HTML tags, this returns a new String that is equal to the input String, with an optional newline character at the end.


generateHTMLText

public static String generateHTMLText(String text,
                                      String formatType)
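Returns HTML text, converted from one of the following formats:

HTML - returns the input
pre-formatted - returns the input wrapped in <pre> tags
plain - returns the input converted to HTML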

Parameters:
formatType - one of the types defined in MessageType.
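
The three cases can be sketched as a simple dispatch. Everything named here is an assumption for illustration: FormatSketch and its Format enum stand in for the real method and the MessageType constants, whose actual values are not shown on this page, and the plain-text branch (escaping markup characters and turning newlines into `<br>`) is only one plausible reading of "converted to HTML".

```java
/** Illustrative only: not the implementation of generateHTMLText. */
final class FormatSketch {
    /** Hypothetical stand-in for the format types defined in MessageType. */
    enum Format { HTML, PRE_FORMATTED, PLAIN }

    static String toHtml(String text, Format format) {
        switch (format) {
            case HTML:
                return text;                      // already HTML: returned as-is
            case PRE_FORMATTED:
                return "<pre>" + text + "</pre>"; // wrapped in <pre> tags
            case PLAIN:
            default:
                // Assumed conversion: escape markup, keep line breaks visible.
                return escape(text).replace("\n", "<br>\n");
        }
    }

    /** Escapes the characters that would otherwise be read as markup. */
    static String escape(String s) {
        return s.replace("&", "&amp;")
                .replace("<", "&lt;")
                .replace(">", "&gt;");
    }
}
```

Escaping `&` first matters: doing it last would re-escape the ampersands introduced by `&lt;` and `&gt;`.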


Copyright (c) 2004 Red Hat, Inc. Corporation. All Rights Reserved. Generated at July 20 2004:2337 UTC