When I open a multi-byte file, I get this:
I can see some duplicate characters in Unicode. For example, the character C can be represented by the code points U+0043 and U+0421. Why is this so?
When I open a multi-byte file, I get this:
If memory serves, Emacs will prompt the User for an encoding if it cannot determine one. When it makes a wrong determination you can use
C-x RET f coding RET
which will use coding
as the coding system for the visited file in the current buffer.
Short term, you can revisit the file with an alternate coding system with revert-buffer-with-coding-system
(select utf-16le then).
Middle term, you can bump the priority of that utf-16le encoding on load with prefer-coding-system
.
Long term, however, you d better try to understand why emacs did not pick the right encoding. I m not sure how I can help there though, short of digging inside the coding system guts, or at least have a file to reproduce.
EDIT: Does this file have a BOM ?
In xml files, Emacs takes this is big endian, while Windows takes this as little endian.
<?xml version="1.0" encoding="UTF-16"?>
<hi />
Trying something like encoding="UTF-16LE" or encoding="UTF16-16BE" will ruin the xml file after saving. It will take off the BOM. utf-16le no bom can be opened in Notepad.
I can see some duplicate characters in Unicode. For example, the character C can be represented by the code points U+0043 and U+0421. Why is this so?
Need to extract the initial character from a Korean word in MS-Excel and MS-Access. When I use Left("한글",1) it will return the first syllable i.e 한, what I need is the initial character i.e ㅎ . Is ...
I execute following code on windows xp and python 2.6.4 But it show IOError. How to open file whose name has utf-8 codec. >>> open( unicode( 한글.txt , euc-kr ).encode( utf-8 ) ) Traceback ...
I used lxml to parse some web page as below: >>> doc = lxml.html.fromstring(htmldata) >>> element in doc.cssselect(sometag)[0] >>> text = element.text_content() >>>...
The XML specification lists a bunch of Unicode characters that are either illegal or "discouraged". Given a string, how can I remove all illegal characters from it? I came up with the following ...
I am using Sandcastle Helpfile Builder to produce a helpfile (.chm). The project is a .shfbproj file, which is XML format, works with msbuild. I want to automatically update the Footer text that ...
When I open a multi-byte file, I get this:
• 如何在java印刷0x13Unicode nature?