Cross-platform strings (and Unicode) in C++


Straight away I've been hit by the problem that wchar_t is 16 bits on Windows but 32 bits on the Mac. This is a problem because all of our strings are represented as wchar_t, and string data will be going back and forth between Windows and Mac machines (both as on-disk data and as network data). Because of the way the code works, it wouldn't be totally straightforward to convert the strings into some common format before sending and receiving the data.


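Now, I could talk here about the multiple ideas I've had, but they raise more questions than they answer, which is why I'm asking. It seems that storing all of our in-memory data as UTF-8 makes a lot of sense. It solves the problem of wchar_t being different sizes, it means we can easily support multiple languages, and it also dramatically reduces our memory footprint (we have a LOT of strings loaded, mostly English). But it doesn't seem like many people are doing this. Is there something we're missing? There's the obvious problem you have to deal with, where the string length can be less than the size of the memory storing that string data.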
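To make the length problem concrete, here is a minimal sketch of counting code points in a UTF-8 `std::string`. The function name `count_code_points` is made up for illustration, and the sketch assumes the input is already valid UTF-8.

```cpp
#include <cstddef>
#include <string>

// Counts Unicode code points in a UTF-8 string by counting every byte that
// is NOT a continuation byte (continuation bytes look like 10xxxxxx).
// Assumption: the input is valid UTF-8; no validation is performed.
std::size_t count_code_points(const std::string& utf8) {
    std::size_t count = 0;
    for (unsigned char byte : utf8) {
        if ((byte & 0xC0) != 0x80) {
            ++count;
        }
    }
    return count;
}
```

For the euro sign stored as the bytes `"\xE2\x82\xAC"`, `size()` reports 3 while `count_code_points` reports 1. Note that a code point is still not necessarily what a user perceives as one character (combining marks, for example), so this only narrows the gap.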

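Or is using UTF-16 a better idea? Or should we stick with wchar_t and write code to convert between wchar_t and, say, Unicode in the places where we read from or write to the disk or the network?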



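Always use a protocol defined down to the byte when a file or network connection is involved. Do not rely on how a C++ compiler stores anything in memory. For Unicode text, this means choosing both an encoding and a byte order (okay, UTF-8 doesn't care about byte order). Even if the platforms you currently want to support have similar architectures, another popular platform with different behavior, or even a new OS for one of your existing platforms, is likely to come along.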
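As a rough sketch of what "defined down to the byte" can look like, here is a hypothetical wire format: a 4-byte big-endian length followed by the UTF-8 payload. The format and the names `encode_string` and `decode_string` are assumptions for illustration, not something taken from the question.

```cpp
#include <cstddef>
#include <cstdint>
#include <stdexcept>
#include <string>
#include <vector>

// Hypothetical wire format (an assumption for this example):
//   4 bytes  payload length in bytes, big-endian
//   N bytes  UTF-8 payload
std::vector<std::uint8_t> encode_string(const std::string& utf8) {
    if (utf8.size() > 0xFFFFFFFFu) {
        throw std::length_error("string too long for a 32-bit length prefix");
    }
    const std::uint32_t length = static_cast<std::uint32_t>(utf8.size());
    std::vector<std::uint8_t> out;
    out.reserve(4 + utf8.size());
    // Most significant byte first, regardless of the host's byte order.
    out.push_back(static_cast<std::uint8_t>((length >> 24) & 0xFF));
    out.push_back(static_cast<std::uint8_t>((length >> 16) & 0xFF));
    out.push_back(static_cast<std::uint8_t>((length >> 8) & 0xFF));
    out.push_back(static_cast<std::uint8_t>(length & 0xFF));
    for (char c : utf8) {
        out.push_back(static_cast<std::uint8_t>(c));
    }
    return out;
}

std::string decode_string(const std::vector<std::uint8_t>& in) {
    if (in.size() < 4) {
        throw std::runtime_error("truncated length prefix");
    }
    const std::uint32_t length = (static_cast<std::uint32_t>(in[0]) << 24) |
                                 (static_cast<std::uint32_t>(in[1]) << 16) |
                                 (static_cast<std::uint32_t>(in[2]) << 8) |
                                 static_cast<std::uint32_t>(in[3]);
    if (in.size() - 4 < length) {
        throw std::runtime_error("truncated payload");
    }
    std::string out;
    out.reserve(length);
    for (std::size_t i = 0; i < length; ++i) {
        out.push_back(static_cast<char>(in[4 + i]));
    }
    return out;
}
```

Because the length is written one byte at a time with shifts, the bytes on the wire come out the same on any host, whatever its byte order and whatever `sizeof(wchar_t)` happens to be.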

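I tend to use UTF-8 as the internal representation. You only lose string length checking, which isn't really useful anyway. For Windows API conversion, I use my own Win32 conversion functions. Mac and Linux are (for the most part) standard UTF-8-aware, so there is no need to convert anything there. Free bonuses: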

  1. Use plain old std::string.
  2. Byte-wise network/stream transport.
  3. For most languages, nice memory footprint.
  4. For more functionality: utf8cpp
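The answerer's own Win32 conversion functions aren't shown here. On Windows, this kind of conversion is commonly done with `MultiByteToWideChar` and the `CP_UTF8` code page. As a platform-independent illustration of the same step, here is a minimal hand-rolled sketch that converts UTF-8 to UTF-16. The name `utf8_to_utf16` is made up for this example, and it does not reject overlong encodings.

```cpp
#include <cstddef>
#include <stdexcept>
#include <string>

// Converts UTF-8 to UTF-16 code units (illustrative sketch only).
// Rejects bad lead/continuation bytes, truncated sequences, surrogate code
// points and values above U+10FFFF. Does NOT reject overlong encodings.
std::u16string utf8_to_utf16(const std::string& utf8) {
    std::u16string out;
    std::size_t i = 0;
    while (i < utf8.size()) {
        const unsigned char lead = static_cast<unsigned char>(utf8[i]);
        char32_t cp = 0;
        std::size_t extra = 0;  // number of continuation bytes expected
        if (lead < 0x80)                { cp = lead;        extra = 0; }
        else if ((lead & 0xE0) == 0xC0) { cp = lead & 0x1F; extra = 1; }
        else if ((lead & 0xF0) == 0xE0) { cp = lead & 0x0F; extra = 2; }
        else if ((lead & 0xF8) == 0xF0) { cp = lead & 0x07; extra = 3; }
        else throw std::invalid_argument("invalid UTF-8 lead byte");

        if (utf8.size() - i <= extra) {
            throw std::invalid_argument("truncated UTF-8 sequence");
        }
        for (std::size_t k = 1; k <= extra; ++k) {
            const unsigned char c = static_cast<unsigned char>(utf8[i + k]);
            if ((c & 0xC0) != 0x80) {
                throw std::invalid_argument("invalid UTF-8 continuation byte");
            }
            cp = (cp << 6) | (c & 0x3F);
        }
        i += extra + 1;

        if (cp > 0x10FFFF || (cp >= 0xD800 && cp <= 0xDFFF)) {
            throw std::invalid_argument("invalid Unicode code point");
        }
        if (cp < 0x10000) {
            out.push_back(static_cast<char16_t>(cp));
        } else {
            // Code points above the BMP become a surrogate pair.
            cp -= 0x10000;
            out.push_back(static_cast<char16_t>(0xD800 + (cp >> 10)));
            out.push_back(static_cast<char16_t>(0xDC00 + (cp & 0x3FF)));
        }
    }
    return out;
}
```

On Windows, where wchar_t is 16 bits, the resulting code units could be copied into a `std::wstring` before calling the wide-character APIs. In production code a tested library such as utf8cpp is probably the safer choice.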


Sure, any rule can be broken and this one is not carved in stone. But you have to know when it is ok to break it.

For instance, it might be a good idea to use something else if the environment you are using wants something else. But Mac OS X APIs use UTF-16, same as Windows, so UTF-16 makes more sense. It is more straightforward to convert just before you put things on the net or get them from it (because you probably do that in 2-3 routines) than to do a conversion for every OS API call.
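If the in-memory representation is UTF-16, one of those "2-3 routines" at the network boundary might look like the sketch below. The name `utf16_to_utf8` is hypothetical. The sketch assumes strings are held as `std::u16string` and throws on unpaired surrogates.

```cpp
#include <cstddef>
#include <stdexcept>
#include <string>

// Converts UTF-16 code units to UTF-8 bytes (illustrative sketch only).
std::string utf16_to_utf8(const std::u16string& utf16) {
    std::string out;
    for (std::size_t i = 0; i < utf16.size(); ++i) {
        char32_t cp = utf16[i];
        if (cp >= 0xD800 && cp <= 0xDBFF) {
            // High surrogate: must be followed by a low surrogate.
            if (i + 1 >= utf16.size()) {
                throw std::invalid_argument("unpaired high surrogate");
            }
            const char32_t low = utf16[i + 1];
            if (low < 0xDC00 || low > 0xDFFF) {
                throw std::invalid_argument("unpaired high surrogate");
            }
            cp = 0x10000 + ((cp - 0xD800) << 10) + (low - 0xDC00);
            ++i;  // the low surrogate has been consumed
        } else if (cp >= 0xDC00 && cp <= 0xDFFF) {
            throw std::invalid_argument("unpaired low surrogate");
        }

        if (cp < 0x80) {
            out.push_back(static_cast<char>(cp));
        } else if (cp < 0x800) {
            out.push_back(static_cast<char>(0xC0 | (cp >> 6)));
            out.push_back(static_cast<char>(0x80 | (cp & 0x3F)));
        } else if (cp < 0x10000) {
            out.push_back(static_cast<char>(0xE0 | (cp >> 12)));
            out.push_back(static_cast<char>(0x80 | ((cp >> 6) & 0x3F)));
            out.push_back(static_cast<char>(0x80 | (cp & 0x3F)));
        } else {
            out.push_back(static_cast<char>(0xF0 | (cp >> 18)));
            out.push_back(static_cast<char>(0x80 | ((cp >> 12) & 0x3F)));
            out.push_back(static_cast<char>(0x80 | ((cp >> 6) & 0x3F)));
            out.push_back(static_cast<char>(0x80 | (cp & 0x3F)));
        }
    }
    return out;
}
```

The receiving side would need the inverse routine, so the total amount of conversion code stays small as long as all disk and network traffic goes through these few functions.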

The type of application you develop also matters. If it is something with very little text processing and very few calls to the system (something like an email server that mostly moves things around without changing them), then UTF-8 might be a good choice.



