Reading UTF-8 encoded XML from URL in java

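I am trying to read XML data from the Google weather web service. The response contains some Spanish characters. The problem is that these characters are not displayed correctly. I tried converting everything to UTF-8, but that does not seem to help. The code is given below.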

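import java.io.BufferedReader;
import java.io.IOException;
import java.io.InputStreamReader;
import java.net.HttpURLConnection;
import java.net.URL;

public class Main {
    public static void main(String[] args) {
        try {
            URL url = new URL("http://www.google.com/ig/api?weather=Noja&hl=es");
            HttpURLConnection con = (HttpURLConnection) url.openConnection();

            // Decodes the response as UTF-8 regardless of what the server actually sent
            BufferedReader in = new BufferedReader(new InputStreamReader(
                    con.getInputStream(), "UTF-8"));
            String str = in.readLine();
            //this does not work even
            //String str = new String(in.readLine().getBytes("UTF-8"),"UTF-8");
            System.out.println(str);

            in.close();
        } catch (IOException e) {
            e.printStackTrace();
        }
    }
}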

The output is given below (trimmed to keep the post within the character limit).

<day_of_week data="mi�"/><day_of_week data="s�b"/><low data="11"/><high data="16"/><icon data="/ig/images/weather/chance_of_rain.gif"/><condition data="Posibilidad de lluvia"/></forecast_conditions></weather></xml_api_reply>
Best answer

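If the page is XML, you should usually pass the InputStream directly to an XML parser and let it detect the encoding automatically. Otherwise, look at the charset parameter of the content type response header (URLConnection.getContentType(): http://docs.oracle.com/javase/6/docs/api/java/net/URLConnection.html#getContentType%28%29) to determine the correct encoding, and create an appropriate InputStreamReader.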
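A minimal sketch of the first option, assuming the document names its encoding in the XML declaration (a parser reading a raw InputStream sees the bytes, not the HTTP Content-Type header). The class name WeatherXmlSketch, the helper readDaysOfWeek and the sample document are made up for illustration; only the day_of_week element and its data attribute come from the output shown in the question.

```java
import java.io.ByteArrayInputStream;
import java.io.InputStream;
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.NodeList;

public class WeatherXmlSketch {

    // Hypothetical helper: collects the "data" attribute of every <day_of_week> element.
    static List<String> readDaysOfWeek(InputStream xml) throws Exception {
        DocumentBuilder builder = DocumentBuilderFactory.newInstance().newDocumentBuilder();
        // The parser works on raw bytes and takes the encoding from the
        // byte order mark or the XML declaration; no InputStreamReader is involved.
        Document doc = builder.parse(xml);
        NodeList nodes = doc.getElementsByTagName("day_of_week");
        List<String> days = new ArrayList<>();
        for (int i = 0; i < nodes.getLength(); i++) {
            days.add(((Element) nodes.item(i)).getAttribute("data"));
        }
        return days;
    }

    public static void main(String[] args) throws Exception {
        // Made-up sample encoded as ISO-8859-1, standing in for con.getInputStream()
        String sample = "<?xml version=\"1.0\" encoding=\"ISO-8859-1\"?>"
                + "<xml_api_reply><day_of_week data=\"mi\u00e9\"/>"
                + "<day_of_week data=\"s\u00e1b\"/></xml_api_reply>";
        byte[] bytes = sample.getBytes(StandardCharsets.ISO_8859_1);
        System.out.println(readDaysOfWeek(new ByteArrayInputStream(bytes)));
    }
}
```

With a live connection, con.getInputStream() would take the place of the ByteArrayInputStream. If the response carries no encoding declaration and is not UTF-8, this approach alone is not enough and the Content-Type header still has to be consulted.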

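Edit: The server does indeed respond with different encodings for browsers and for the Java client, probably depending on the Accept-Charset request header. For Firefox, this header has the following value: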

Accept-Charset: ISO-8859-1,utf-8;q=0.7,*;q=0.7

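This means both charsets are accepted (ISO-8859-1 at the default weight q=1, utf-8 at q=0.7). The server responds to Firefox with the header Content-Type: text/xml; charset=UTF-8. The Java client does not send an Accept-Charset header, and the server responds with text/xml; charset=ISO-8859-1.

To use the charset supplied by the server, you can use the following code: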

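String contentType = con.getContentType(); // e.g. "text/xml; charset=ISO-8859-1"; may be null
System.out.println(contentType);

String charset = "utf-8"; // default when the header carries no charset parameter
if (contentType != null) {
    // Backslashes must be doubled inside a Java string literal
    Matcher matcher = Pattern.compile("charset\\s*=\\s*([^ ;]+)").matcher(contentType);
    if (matcher.find()) {
        charset = matcher.group(1);
    }
}

BufferedReader in = new BufferedReader(new InputStreamReader(
    con.getInputStream(), charset));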

Edit 2: It turns out the server chooses the charset based on the User-Agent header. If you add the following line, it responds with a charset of utf-8.

con.setRequestProperty("User-Agent", "Mozilla/5.0");

In any case, the Content-Type response header contains the correct charset.

Answers

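Your input is likely correct, although I would use an XML parser to read the XML rather than trying to interpret it as a line-by-line feed. Your output, however, is likely to be incorrect.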

  1. What's the default char encoding of your JVM? Check (and set) the confusingly named property -Dfile.encoding=UTF-8
  2. Do the requisite fonts etc. exist on your system? Can you check the actual character codes you're outputting rather than relying on your terminal settings? I suspect this is the case, since the encoding/decoding appears to work and you're just missing those individual characters.
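Both checks can be sketched in a few lines. This is only an illustration: the class name OutputEncodingCheck, the helper codePoints and the sample string are assumptions, not part of the original post. It prints the JVM default charset, dumps the code points of a string so the terminal cannot hide a decoding problem, and then writes the string through a PrintStream that encodes as UTF-8 explicitly.

```java
import java.io.PrintStream;
import java.nio.charset.Charset;

public class OutputEncodingCheck {

    // Hypothetical helper: renders each code point as U+XXXX, independent of fonts and terminal encoding.
    static String codePoints(String s) {
        StringBuilder sb = new StringBuilder();
        s.codePoints().forEach(cp -> {
            if (sb.length() > 0) {
                sb.append(' ');
            }
            sb.append(String.format("U+%04X", cp));
        });
        return sb.toString();
    }

    public static void main(String[] args) throws Exception {
        // Reflects -Dfile.encoding or the platform default
        System.out.println("Default charset: " + Charset.defaultCharset());

        // Sample value standing in for a line read from the response
        String str = "s\u00e1b";
        // A correctly decoded string shows U+00E1 here; a failed decode typically shows U+FFFD
        System.out.println(codePoints(str));

        // Write with an explicit encoding instead of relying on the default
        PrintStream out = new PrintStream(System.out, true, "UTF-8");
        out.println(str);
    }
}
```

If the code points are right but the last line still looks wrong on screen, the problem is on the output side (terminal encoding or font) rather than in the reading code.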



