Question 1

One of the most widely used libraries for character conversion is ICU (http://icu-project.org/). It is used, for example, by some Boost (http://www.boost.org/) libraries.

Question 2

Would converting UTF-16 (Visual C++'s format) to UTF-8, and then possibly from UTF-8 to UCS-4 (GCC's format), be an acceptable answer?

If so, then on Windows you could use the WideCharToMultiByte function (with CP_UTF8 for the CodePage parameter) for the first part of the conversion. Then you could either paste the resulting UTF-8 strings directly into your program or convert them further. Here is a message showing how one person did it; you can also write your own code or do it manually (the official spec, with a section on exactly how to convert UTF-8 to UCS-4, can be found here). There may be an easier way; I'm not overly familiar with the conversion facilities on Linux yet.
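For the second step, writing your own code is not much work. The following is only a minimal sketch of a UTF-8 to UCS-4 decoder in portable C++11; the function name utf8_to_ucs4 is made up for illustration and is not part of any library mentioned here. It assumes the input is a complete UTF-8 string and throws on malformed input rather than trying to recover.

```cpp
#include <cstddef>
#include <stdexcept>
#include <string>

// Hypothetical helper (name chosen for this sketch): decodes UTF-8 bytes
// into UCS-4 code points. Throws std::invalid_argument on bad lead bytes,
// truncated sequences, bad continuation bytes, overlong forms, surrogates,
// and values above 0x10FFFF.
std::u32string utf8_to_ucs4(const std::string& in) {
    // Smallest code point that legitimately needs a sequence of each length.
    static const char32_t min_cp[] = {0, 0, 0x80, 0x800, 0x10000};

    std::u32string out;
    std::size_t i = 0;
    while (i < in.size()) {
        const unsigned char b0 = static_cast<unsigned char>(in[i]);
        char32_t cp = 0;
        std::size_t len = 0;

        // The lead byte tells us the sequence length and carries the top bits.
        if (b0 < 0x80)                { cp = b0;        len = 1; }
        else if ((b0 & 0xE0) == 0xC0) { cp = b0 & 0x1F; len = 2; }
        else if ((b0 & 0xF0) == 0xE0) { cp = b0 & 0x0F; len = 3; }
        else if ((b0 & 0xF8) == 0xF0) { cp = b0 & 0x07; len = 4; }
        else throw std::invalid_argument("invalid UTF-8 lead byte");

        if (i + len > in.size())
            throw std::invalid_argument("truncated UTF-8 sequence");

        // Each continuation byte has the form 10xxxxxx and adds six bits.
        for (std::size_t k = 1; k < len; ++k) {
            const unsigned char b = static_cast<unsigned char>(in[i + k]);
            if ((b & 0xC0) != 0x80)
                throw std::invalid_argument("invalid UTF-8 continuation byte");
            cp = (cp << 6) | (b & 0x3F);
        }

        if (cp < min_cp[len] || cp > 0x10FFFF || (cp >= 0xD800 && cp <= 0xDFFF))
            throw std::invalid_argument("overlong or out-of-range UTF-8 sequence");

        out.push_back(cp);
        i += len;
    }
    return out;
}
```

For example, passing the three bytes E2 82 AC (the euro sign) yields the single code point 0x20AC.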

Question 3

You only need to worry about 16-bit values between 0xD800 and 0xDFFF inclusive (the surrogate range). Every other value maps to exactly the same code point in UCS-4 when zero-extended to 32 bits.
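As a rough sketch of that rule in C++11: everything outside the surrogate range is copied through unchanged, and a high/low surrogate pair is combined into one code point. The function name utf16_to_ucs4 is invented for this example, and it uses char16_t and char32_t rather than wchar_t so that it behaves the same under Visual C++ and GCC.

```cpp
#include <cstddef>
#include <stdexcept>
#include <string>

// Hypothetical helper (name chosen for this sketch): widens UTF-16 code
// units to UCS-4. Only units in 0xD800..0xDFFF need special handling.
std::u32string utf16_to_ucs4(const std::u16string& in) {
    std::u32string out;
    for (std::size_t i = 0; i < in.size(); ++i) {
        char32_t u = in[i];
        if (u >= 0xD800 && u <= 0xDBFF) {
            // High surrogate: it must be followed by a low surrogate.
            if (i + 1 >= in.size())
                throw std::invalid_argument("unpaired high surrogate");
            const char32_t lo = in[i + 1];
            if (lo < 0xDC00 || lo > 0xDFFF)
                throw std::invalid_argument("unpaired high surrogate");
            // Ten payload bits from each half, offset by 0x10000.
            u = 0x10000 + ((u - 0xD800) << 10) + (lo - 0xDC00);
            ++i;  // the low surrogate has been consumed
        } else if (u >= 0xDC00 && u <= 0xDFFF) {
            throw std::invalid_argument("unpaired low surrogate");
        }
        // Every other unit is simply zero-extended to 32 bits.
        out.push_back(u);
    }
    return out;
}
```

For instance, the pair 0xD83D 0xDE00 becomes the single code point 0x1F600, while u"A" comes back as the one-element string 0x41.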

Question 4

Ignacio is right: if you don't use some rare Chinese characters (or some extinct scripts), then the mapping is one-to-one. (The official "lingo" is "if you don't have characters outside the BMP".)

This is the algorithm, just in case: http://unicode.org/faq/utf_bom.html#utf16-3. But again, it is most likely unnecessary for your real case.

You can also use the free resources from Unicode (ftp://ftp.unicode.org/Public/PROGRAMS/CVTUTF).
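In case the other direction is ever needed, here is a small sketch of how a code point outside the BMP is split into a surrogate pair, following the arithmetic described in the Unicode FAQ entry linked above. The function name ucs4_to_utf16 is made up for illustration; this is not the CVTUTF code.

```cpp
#include <stdexcept>
#include <string>

// Hypothetical helper (name chosen for this sketch): narrows UCS-4 code
// points to UTF-16. BMP code points map one to one; anything from 0x10000
// up to 0x10FFFF becomes a high/low surrogate pair.
std::u16string ucs4_to_utf16(const std::u32string& in) {
    std::u16string out;
    for (char32_t cp : in) {
        if (cp > 0x10FFFF || (cp >= 0xD800 && cp <= 0xDFFF))
            throw std::invalid_argument("not a valid Unicode scalar value");
        if (cp < 0x10000) {
            // Inside the BMP: the value is the code unit.
            out.push_back(static_cast<char16_t>(cp));
        } else {
            // Outside the BMP: subtract 0x10000, then split the remaining
            // 20 bits into two 10-bit halves.
            const char32_t v = cp - 0x10000;
            out.push_back(static_cast<char16_t>(0xD800 + (v >> 10)));
            out.push_back(static_cast<char16_t>(0xDC00 + (v & 0x3FF)));
        }
    }
    return out;
}
```

For example, the code point 0x1F600 comes out as the two code units 0xD83D and 0xDE00.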
