Java String charset parsing

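I'm using the Jsoup API to navigate through some web pages, but I get the page in one charset and have to convert it to another.

Question: how do I convert the first string into the second?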

String str1 = "Um grupo ligado &agrave; al-Qaeda assumiu o "
    + "ataque e amea&ccedil;ou fazer outros.";

String str2 = "Um grupo ligado à al-Qaeda assumiu o "
    + "ataque e ameaçou fazer outros.";

// (The text above translates to some news about the WTC)
Answers

I'm no expert on this, but I believe the answer you're looking for is at http://www.davidcraddock.net/tag/beautifulsoup/

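Somewhat like the JTidy solution: named entities such as &agrave; are defined in the .dtd files at w3c.org that the HTML <!DOCTYPE ... declaration refers to. Copy those files locally and parse them (easy). You can then either replace the entities with Unicode characters right away, or turn them into numeric entities.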
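The replacement step described above can be sketched with the standard library alone. This is only an illustration: EntityDecoder and its two methods are hypothetical names, not part of Jsoup or JTidy, and the entity table is a tiny hand-written subset that stands in for a table parsed from the W3C entity definitions.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper for illustration; not part of Jsoup or JTidy.
final class EntityDecoder {

    // Assumption: a small hand-written subset of the named entities.
    // A real table would be built by parsing the W3C entity definitions.
    private static final Map<String, Integer> NAMED = new HashMap<>();
    static {
        NAMED.put("agrave", 0x00E0);
        NAMED.put("ccedil", 0x00E7);
        NAMED.put("eacute", 0x00E9);
        NAMED.put("atilde", 0x00E3);
        NAMED.put("amp", 0x0026);
    }

    private static final Pattern ENTITY = Pattern.compile("&([A-Za-z][A-Za-z0-9]*);");

    // Replaces known named entities with the Unicode character itself.
    static String toUnicode(String html) {
        return replace(html, false);
    }

    // Replaces known named entities with decimal numeric entities such as &#224;.
    static String toNumericEntities(String html) {
        return replace(html, true);
    }

    private static String replace(String html, boolean numeric) {
        Matcher m = ENTITY.matcher(html);
        StringBuffer out = new StringBuffer();
        while (m.find()) {
            Integer codePoint = NAMED.get(m.group(1));
            String replacement;
            if (codePoint == null) {
                replacement = m.group(); // unknown entity: leave it untouched
            } else if (numeric) {
                replacement = "&#" + codePoint + ";";
            } else {
                replacement = new String(Character.toChars(codePoint));
            }
            m.appendReplacement(out, Matcher.quoteReplacement(replacement));
        }
        m.appendTail(out);
        return out.toString();
    }

    public static void main(String[] args) {
        String str1 = "Um grupo ligado &agrave; al-Qaeda assumiu o "
                + "ataque e amea&ccedil;ou fazer outros.";
        System.out.println(toUnicode(str1));
        System.out.println(toNumericEntities(str1));
    }
}
```

Entities missing from the table are passed through unchanged, so the sketch never silently drops text it does not recognise.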

I haven't really tested Jsoup (http://jsoup.org/), but JTidy (http://jtidy.sourceforge.net/) was very useful when I needed to convert HTML to XML using the class org.w3c.tidy.Tidy. It converts the entities automatically.

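import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.UnsupportedEncodingException;
import org.w3c.tidy.Tidy;

// The class name is arbitrary; the original snippet showed only the members.
public class TidyExample {
    static String str1 = "Um grupo ligado &agrave; al-Qaeda assumiu o "
            + "ataque e amea&ccedil;ou fazer outros.";

    public static void main(String[] args) throws Exception {
        System.out.println(cleanData(str1));
    }

    private static String cleanData(String data) throws UnsupportedEncodingException {
        Tidy tidy = new Tidy();
        tidy.setNumEntities(true); // emit numeric entities instead of named ones
        tidy.setPrintBodyOnly(true); // print only the body content
        tidy.setWraplen(Integer.MAX_VALUE); // effectively disable line wrapping
        ByteArrayInputStream inputStream = new ByteArrayInputStream(data.getBytes("UTF-8"));
        ByteArrayOutputStream outputStream = new ByteArrayOutputStream();
        tidy.parseDOM(inputStream, outputStream); // parse the input and write the cleaned markup
        return outputStream.toString("UTF-8");
    }
}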

If you wish, you can also get the Document instance, since parseDOM returns it:

public org.w3c.dom.Document parseDOM(Reader in, Writer out)
public org.w3c.dom.Document parseDOM(InputStream in, OutputStream out)



