English 中文(简体)
How to avoid inadvertent encoding of UTF-8 files as ASCII/ANSI?
原标题:

In the process of editing a file encoded as UTF-8 w/o [spurious] BOM the content might become devoid of any Unicode characters outside the ASCII or ANSI ranges. At the next reopening of the file, some text editors (Notepad++) will interpret it as ASCII/ANSI encoded and open it as such. Unaware of the change the user will continue editing, now adding non-ANSI Unicode characters, rendered however useless, since saved in ANSI. A menu option can exist (Notepad++) to open ANSI files as UTF-8 w/o BOM, but leading to the reverse issue of inadvertently overriding ANSI files with Unicode encoding.

问题回答

One workaround is to add a character outside the ANSI range to a comment in the file. Depending on the decoding algorithm, it might force the editor (Notepad++) to recognize the file as encoded in UTF-8 w/o BOM.

In a HTML document for example you could follow the charset definition in the header with such a Unicode comment, here the U+05D0 HEBREW LETTER ALEF: <meta http-equiv="Content-Type" content="text/html; charset=utf-8"> <!-- א -->

How would you suggest that an editor tell the difference between ASCII/ANSI and UTF-8 w/o BOM, when the files look the same?

If you want guaranteed recognition of UTF-8 as UTF-8, either add the BOM, or force the file to contain UTF-8 characters.

Configure your editor to always use UTF-8 if possible, if not, complain to the creators of your editor. Charsets not targeting unicode are, IMO, deprecated and should be treated as such.

Files using only characters in the ASCII space (the 7-bit one) would be pretty much the same in UTF-8 anyway, so if you HAVE to deliver something in ASCII encoding, just don t type any unicode characters.





相关问题
How do you think Google is handling this encoding issue?

I recently came across an encoding issue specific to how Firefox encodes URLs directly entered into the address bar. It basically looks like the default Firefox character encoding for URLs is NOT UTF-...

PHP - UTF8 problem with German characters

I m at my wits end with this one, then I remember stack overflow. I have a site, http://bridgeserver3.co.uk/disklavier/de/ , the language is stored in a simple PHP file which looks like.. $...

URL Decoding in PHP

I am trying to decode this URL string using PHP s urldecode function: urldecode("Ant%C3%B4nio+Carlos+Jobim"); This is supposed to output... Antônio Carlos Jobim ...but instead is ouptutting this ...

File open error by using codec utf-8 in python

I execute following code on windows xp and python 2.6.4 But it show IOError. How to open file whose name has utf-8 codec. >>> open( unicode( 한글.txt , euc-kr ).encode( utf-8 ) ) Traceback ...

热门标签