Unicode is wrong when getting a file from the server

I want to download a file from the Google Translate server (the same URL that the PHP script below fetches).

When I do it in the browser, the Unicode is correct and everything is right, but when I do it with curl or file_get_contents, the output contains bad characters. What is the difference, and how should I solve it?

Downloaded with the browser:

[[["سلام","hello","",""]],[["interjection",["سلام","هالو","الو"],[["سلام",["hello","hi","aloha","all hail"]],["هالو",["hallo","hello","halloo"]],["الو",["hello"]]]]],"en",,[["سلام",[5],0,0,1000,0,1,0]],[["hello",4,,,""],["hello",5,[["سلام",1000,0,0],["خوش",0,0,0],["میهمان گرامی",0,0,0],["خوش آمدید",0,0,0],["درود کاربر",0,0,0]],[[0,5]],"hello"]],,,[["en"]],65]

Downloaded with the following PHP script:

<html>
<head>
<meta http-equiv="Content-Type" content="text/html; charset=UTF-8">
</head>
<body>
<?php
$t = file_get_contents("http://translate.google.com/translate_a/t?client=t&hl=en&sl=auto&tl=fa&multires=1&prev=btn&ssel=0&tsel=3&uptl=fa&alttl=en&sc=1&text=hello");
$f = fopen("t.txt", "w+");
fwrite($f, $t);
fclose($f);
?>
</body></html>
[[["ÓáÇã","hello","",""]],[["interjection",["ÓáÇã","åÇáæ","Çáæ"],[["ÓáÇã",["hello","hi","aloha","all hail"]],["åÇáæ",["hallo","hello","halloo"]],["Çáæ",["hello"]]]]],"en",,[["ÓáÇã",[5],0,0,1000,0,1,0]],[["hello",4,,,""],["hello",5,[["ÓáÇã",1000,0,0],["ÎæÔ",0,0,0],["ã\u06CCåãÇä ÑÇã\u06CC",0,0,0],["ÎæÔ ÂãÏ\u06CCÏ",0,0,0],["ÏÑæÏ ÇÑÈÑ",0,0,0]],[[0,5]],"hello"]],,,[["en"]],4]

The response headers are:
HTTP/1.1 200 OK
Pragma: no-cache
Date: Fri, 25 May 2012 22:29:12 GMT
Expires: Fri, 25 May 2012 22:29:12 GMT
Cache-Control: private, max-age=600
Content-Type: text/javascript; charset=UTF-8
Content-Language: fa
Set-Cookie: PREF=ID=b6c08a0545f50594:TM=1337984952:LM=1337984952:S=Sf1xcow2qPZrFeu0; expires=Sun, 25-May-2014 22:29:12 GMT; path=/; domain=.google.com
X-Content-Type-Options: nosniff
Content-Disposition: attachment
Server: HTTP server (unknown)
X-XSS-Protection: 1; mode=block
Transfer-Encoding: chunked

Best answer

Add the parameters ie=UTF-8 and oe=UTF-8 to the URL's query string:

$t = file_get_contents("http://translate.google.com/translate_a/t?ie=UTF-8&oe=UTF-8&client=t&hl=en&sl=auto&tl=fa&multires=1&prev=btn&ssel=0&tsel=3&uptl=fa&alttl=en&sc=1&text=hello");
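If the text to translate is not plain ASCII, it also has to be percent-encoded in the query string, and ie=UTF-8 tells the server how to read it. A minimal sketch, using a hypothetical helper build_translate_url() and PHP's built-in http_build_query(), that produces the same URL as the line above:

```php
<?php
// Hypothetical helper: builds the translate_a/t URL from the question with
// ie=UTF-8 and oe=UTF-8 added, so the server reads the query as UTF-8 and
// is asked to answer in UTF-8.
function build_translate_url(string $text): string
{
    $params = [
        'ie' => 'UTF-8',   // encoding of the query text we send
        'oe' => 'UTF-8',   // encoding we want for the response
        'client' => 't',
        'hl' => 'en',
        'sl' => 'auto',
        'tl' => 'fa',
        'multires' => '1',
        'prev' => 'btn',
        'ssel' => '0',
        'tsel' => '3',
        'uptl' => 'fa',
        'alttl' => 'en',
        'sc' => '1',
        'text' => $text,
    ];
    // http_build_query() percent-encodes every value (spaces, non-ASCII bytes, ...)
    return 'http://translate.google.com/translate_a/t?' . http_build_query($params);
}

echo build_translate_url('hello'), "\n";
// $t = file_get_contents(build_translate_url('hello')); // network call, not run here
```

Passing a Persian word as $text yields text=%D8%B3... in the URL, i.e. the UTF-8 bytes of the word, which is what ie=UTF-8 declares.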
Other answers

This worked for me once, just as I was about to throw a lot of code in the trash!

iconv('CP1252', 'UTF-8', $string);
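The first argument to iconv() has to be the encoding the bytes actually arrived in; CP1252 is only right if the data really is Western European. A minimal sketch with a hypothetical to_utf8() helper, assuming the iconv extension supports the Windows code pages (it does with glibc iconv), showing that converting the CP1256 (Arabic) bytes of a Persian word from CP1252 still gives mojibake:

```php
<?php
// Hypothetical helper around iconv(): $sourceEncoding must match the
// encoding the bytes are really in, otherwise the result is valid UTF-8
// that still shows the wrong characters.
function to_utf8(string $bytes, string $sourceEncoding): string
{
    $converted = iconv($sourceEncoding, 'UTF-8', $bytes);
    if ($converted === false) {
        throw new RuntimeException("Input is not valid $sourceEncoding");
    }
    return $converted;
}

// The Persian word "سلام" encoded in Windows code page 1256.
$raw = "\xD3\xE1\xC7\xE3";

echo to_utf8($raw, 'CP1256'), "\n"; // سلام (correct source encoding)
echo to_utf8($raw, 'CP1252'), "\n"; // ÓáÇã (wrong source encoding)
```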

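Echoing the content you get from file_get_contents into the PHP output should work fine, because you are passing a UTF-8 JSON response through into a UTF-8 HTML response. It works for me with the given URL.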

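When you store it to a file, you have to worry about the encoding assumed by whatever tool you use to read the file. The fwrite is fine as far as it goes, as long as the text editor you view the file in knows the output is UTF-8. On Windows, Notepad may try to read it in the locale-specific default ("ANSI") code page, which won't be UTF-8. On a Western European install that will be code page 1252, and you will get garbled output.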
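To tell whether the problem is in the bytes you saved or in the tool reading them, you can check whether the saved data is valid UTF-8 at all. A minimal sketch with a hypothetical is_valid_utf8() helper, relying on the fact that preg_match() with the /u modifier returns false for a subject that is not valid UTF-8:

```php
<?php
// Hypothetical diagnostic: returns true only if $bytes is well-formed UTF-8.
// preg_match() with the /u modifier fails (returns false) on invalid UTF-8.
function is_valid_utf8(string $bytes): bool
{
    return preg_match('//u', $bytes) === 1;
}

var_dump(is_valid_utf8("سلام"));             // bool(true): UTF-8 text
var_dump(is_valid_utf8("\xD3\xE1\xC7\xE3")); // bool(false): CP1256 bytes
```

If the check passes on what you wrote to t.txt, the editor is misreading the file; if it fails, the data was not UTF-8 before fwrite ever saw it.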

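(One way around this is to put a UTF-8 faux-BOM at the front of the file, using fwrite($f, "\xEF\xBB\xBF");. This is a bit of a hack, because UTF-8 does not need a byte order mark (its byte order is already fixed) and the BOM breaks UTF-8's ASCII compatibility, but Windows tools like faux-BOMs. The other way is to get a better text editor that lets you treat files as UTF-8 by default.)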
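A minimal sketch of the faux-BOM approach, with a hypothetical write_utf8_with_bom() helper; it only helps if the content being written is already UTF-8, and the file goes to the system temp directory here rather than the current directory:

```php
<?php
// Hypothetical helper: writes the UTF-8 BOM (bytes EF BB BF) and then the
// text, so Windows tools such as Notepad detect the file as UTF-8.
function write_utf8_with_bom(string $path, string $utf8Text): void
{
    $f = fopen($path, 'w');
    if ($f === false) {
        throw new RuntimeException("Cannot open $path for writing");
    }
    fwrite($f, "\xEF\xBB\xBF"); // the BOM itself
    fwrite($f, $utf8Text);      // the already-UTF-8 payload
    fclose($f);
}

$path = sys_get_temp_dir() . '/t.txt';
write_utf8_with_bom($path, '[[["سلام","hello","",""]]]');
echo bin2hex(substr(file_get_contents($path), 0, 3)), "\n"; // efbbbf
```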

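Something is a bit different here, though, because ÓáÇã is what you get when you save سلام in the Windows default Arabic encoding (code page 1256) and then read it back in the Windows default Western encoding (code page 1252). That implies there is some extra save-and-load step in your test that is messing up the encoding.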
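The code page 1256 / code page 1252 mix-up can be reproduced directly. A minimal sketch with a hypothetical misread() helper, assuming iconv supports both Windows code pages (true with glibc iconv): encode the text in one code page, then decode those bytes as another.

```php
<?php
// Hypothetical helper: simulates saving $utf8Text in $savedAs and then
// opening the file in a tool that assumes $readAs.
function misread(string $utf8Text, string $savedAs, string $readAs): string
{
    $bytes = iconv('UTF-8', $savedAs, $utf8Text); // bytes written to disk
    if ($bytes === false) {
        throw new RuntimeException("Cannot encode text as $savedAs");
    }
    $shown = iconv($readAs, 'UTF-8', $bytes);      // what the reader displays
    if ($shown === false) {
        throw new RuntimeException("Bytes are not valid $readAs");
    }
    return $shown;
}

echo misread('سلام', 'CP1256', 'CP1252'), "\n"; // ÓáÇã
echo misread('هالو', 'CP1256', 'CP1252'), "\n"; // åÇáæ
```

Both results match the strings in the question's garbled output, which supports the code page 1256 / 1252 explanation.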

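If Windows command-line tools are involved, you might as well give up, because the Command Prompt and MSVCRT-based applications are thoroughly broken when it comes to Unicode.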
