JTidy and UTF-8 (international characters)

To make JTidy handle UTF-8 strings correctly and process international characters properly, use the following code:

JAVA:

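  // Encode and decode with the same charset: IOUtils.toInputStream(html) on its own
  // uses the platform default encoding and garbles non-ASCII characters on non-UTF-8 systems.
  Document doc = Tidy.createEmptyDocument();
  try {
      doc = tidy.parseDOM(new InputStreamReader(IOUtils.toInputStream(html, "UTF-8"), "UTF-8"), new NullWriter());
  } catch (IOException e) {
      log.error("Could not read the HTML as UTF-8", e);
  }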
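Why the charset has to match on both ends: the stdlib-only sketch below encodes a string to bytes with one charset and decodes those bytes with another. That is what happens when a String is turned into a stream with the platform default encoding and then read back as UTF-8. The class CharsetRoundTrip and its methods are hypothetical helpers for illustration, not part of JTidy or Commons IO. The sketch also shows that a StringReader, which passes the String's characters through with no byte conversion at all, preserves the text; whether feeding that straight to parseDOM suits your setup is an assumption worth checking.

```java
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.InputStreamReader;
import java.io.Reader;
import java.io.StringReader;
import java.io.UncheckedIOException;
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

public class CharsetRoundTrip {

    // Hypothetical helper: encode text to bytes with one charset, then decode with another.
    // With a mismatch this mimics a default-encoded stream read back through an
    // InputStreamReader configured for UTF-8.
    static String roundTrip(String text, Charset encodeWith, Charset decodeWith) {
        ByteArrayInputStream in = new ByteArrayInputStream(text.getBytes(encodeWith));
        return readAll(new InputStreamReader(in, decodeWith));
    }

    // Hypothetical helper: drain a Reader into a String, closing it afterwards.
    static String readAll(Reader reader) {
        StringBuilder sb = new StringBuilder();
        char[] buf = new char[256];
        try (Reader r = reader) {
            int n;
            while ((n = r.read(buf)) != -1) {
                sb.append(buf, 0, n);
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        // German and Japanese text written as Unicode escapes so the source compiles in any encoding
        String html = "<p>Gr\u00fc\u00dfe, \u65e5\u672c\u8a9e</p>";

        // Same charset on both ends: every character survives
        System.out.println(roundTrip(html, StandardCharsets.UTF_8, StandardCharsets.UTF_8).equals(html));
        // Latin-1 bytes read back as UTF-8: the non-ASCII characters are corrupted
        System.out.println(roundTrip(html, StandardCharsets.ISO_8859_1, StandardCharsets.UTF_8).equals(html));
        // A StringReader hands over the characters without any byte conversion
        System.out.println(readAll(new StringReader(html)).equals(html));
    }
}
```

Running main prints true, false, true. Plain ASCII text would survive the mismatched case too, which is why this kind of bug often only shows up once international characters appear in the input.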
