Cross-platform strings (and Unicode) in C++

So I've finally gotten back to my main task: porting a rather large C++ project from Windows to the Mac.

Straight away I've been hit by the problem where wchar_t is 16 bits on Windows but 32 bits on the Mac. This is a problem because all of the strings are represented by wchar_t, and there will be string data going back and forth between Windows and Mac machines (both as on-disk data and as network data). Because of the way it works, it wouldn't be totally straightforward to convert the strings into some common format before sending and receiving the data.

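We've also really started to support a lot more languages now, so we're starting to deal with a lot of Unicode data (as well as dealing with right-to-left languages).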

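Now, I could be conflating multiple ideas here and causing more problems for myself than necessary, which is why I'm asking this question. We're thinking that storing all of our in-memory string data as UTF-8 makes a lot of sense. It solves the problem of wchar_t being different sizes, it means we can easily support multiple languages, and it also dramatically reduces our memory footprint (we have a LOT of strings loaded, mostly English). But it doesn't seem like many people are doing this. Is there something we're missing? There's the obvious problem you have to deal with, where the string length can be less than the size of the memory storing that string data.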
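The length mismatch mentioned above can be made concrete with a small sketch. The helper name `utf8_code_points` is hypothetical, and the code assumes the input is already valid UTF-8; it counts code points by skipping continuation bytes, while `size()` keeps reporting bytes.

```cpp
#include <cstddef>
#include <string>

// Hypothetical helper: counts Unicode code points in a UTF-8 string.
// Assumes valid UTF-8. Continuation bytes have the bit pattern 10xxxxxx,
// so every byte that does NOT match that pattern starts a new code point.
std::size_t utf8_code_points(const std::string& s) {
    std::size_t count = 0;
    for (unsigned char c : s) {
        if ((c & 0xC0) != 0x80) {
            ++count;
        }
    }
    return count;
}
```

For the string `"na\xC3\xAFve"` (the word "naïve"), `size()` is 6 bytes while `utf8_code_points` returns 5. Note that code points are still not the same thing as user-visible characters when combining marks are involved.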

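Or is using UTF-16 a better idea? Or should we stick with wchar_t and write code to convert between wchar_t and, say, Unicode in the places where we read/write to the disk or the network?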
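If the wchar_t route were taken, the conversion at the disk/network boundary might look roughly like the sketch below. The names `append_utf8` and `wide_to_utf8` are made up for illustration, and the code assumes well-formed input: it treats 16-bit wchar_t as UTF-16 (combining surrogate pairs) and 32-bit wchar_t as UTF-32, and it does not reject unpaired surrogates or out-of-range values.

```cpp
#include <cstddef>
#include <cstdint>
#include <string>

// Appends one code point to `out` encoded as UTF-8 (1 to 4 bytes).
void append_utf8(std::string& out, std::uint32_t cp) {
    if (cp < 0x80) {
        out += static_cast<char>(cp);
    } else if (cp < 0x800) {
        out += static_cast<char>(0xC0 | (cp >> 6));
        out += static_cast<char>(0x80 | (cp & 0x3F));
    } else if (cp < 0x10000) {
        out += static_cast<char>(0xE0 | (cp >> 12));
        out += static_cast<char>(0x80 | ((cp >> 6) & 0x3F));
        out += static_cast<char>(0x80 | (cp & 0x3F));
    } else {
        out += static_cast<char>(0xF0 | (cp >> 18));
        out += static_cast<char>(0x80 | ((cp >> 12) & 0x3F));
        out += static_cast<char>(0x80 | ((cp >> 6) & 0x3F));
        out += static_cast<char>(0x80 | (cp & 0x3F));
    }
}

// Converts a wide string to UTF-8 regardless of whether wchar_t is
// 16 bits (assumed UTF-16) or 32 bits (assumed UTF-32).
std::string wide_to_utf8(const std::wstring& in) {
    std::string out;
    for (std::size_t i = 0; i < in.size(); ++i) {
        std::uint32_t cp = static_cast<std::uint32_t>(in[i]);
        // With 16-bit wchar_t, a high surrogate followed by a low
        // surrogate encodes a single code point above U+FFFF.
        if (sizeof(wchar_t) == 2 && cp >= 0xD800 && cp <= 0xDBFF &&
            i + 1 < in.size()) {
            const std::uint32_t lo = static_cast<std::uint32_t>(in[i + 1]);
            if (lo >= 0xDC00 && lo <= 0xDFFF) {
                cp = 0x10000 + ((cp - 0xD800) << 10) + (lo - 0xDC00);
                ++i;
            }
        }
        append_utf8(out, cp);
    }
    return out;
}
```

Because the output bytes are the same on both platforms, `wide_to_utf8(L"\u20AC")` should produce the three bytes `E2 82 AC` whether wchar_t is 16 or 32 bits wide.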

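I realize this is dangerously close to asking for opinions, but we're nervous that we're overlooking something obvious, because it doesn't seem like there are many Unicode string classes (for example), and yet there's plenty of code for converting to/from Unicode, such as boost.locale, iconv, utf-cpp and ICU.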

Answers

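Always use a protocol defined down to the byte when a file or network connection is involved. Do not rely on how a C++ compiler stores anything in memory. For Unicode text, this means choosing both an encoding and a byte order (okay, UTF-8 doesn't care about byte order). Even if the platforms you currently want to support have similar architectures, another popular platform with different behavior, or even a new OS for one of your existing platforms, is likely to come along.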
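As a rough illustration of "defined down to the byte", here is a sketch of one possible wire format: a 4-byte big-endian length followed by the UTF-8 payload. The format and the names `encode_string` and `decode_string` are assumptions made for this example, not something the answer prescribes.

```cpp
#include <cstddef>
#include <cstdint>
#include <stdexcept>
#include <string>
#include <vector>

// Hypothetical wire format: 4-byte big-endian byte count, then UTF-8 bytes.
// Each byte is written explicitly, so the result does not depend on the
// host's endianness or on sizeof(wchar_t).
std::vector<std::uint8_t> encode_string(const std::string& utf8) {
    if (utf8.size() > 0xFFFFFFFFu) {
        throw std::length_error("string too long for 32-bit length prefix");
    }
    const std::uint32_t n = static_cast<std::uint32_t>(utf8.size());
    std::vector<std::uint8_t> out;
    out.reserve(4 + utf8.size());
    out.push_back(static_cast<std::uint8_t>(n >> 24));
    out.push_back(static_cast<std::uint8_t>(n >> 16));
    out.push_back(static_cast<std::uint8_t>(n >> 8));
    out.push_back(static_cast<std::uint8_t>(n));
    out.insert(out.end(), utf8.begin(), utf8.end());
    return out;
}

// Reads back one string written by encode_string, checking for truncation.
std::string decode_string(const std::vector<std::uint8_t>& buf) {
    if (buf.size() < 4) {
        throw std::runtime_error("truncated header");
    }
    const std::uint32_t n = (static_cast<std::uint32_t>(buf[0]) << 24) |
                            (static_cast<std::uint32_t>(buf[1]) << 16) |
                            (static_cast<std::uint32_t>(buf[2]) << 8) |
                            static_cast<std::uint32_t>(buf[3]);
    if (buf.size() - 4 < n) {
        throw std::runtime_error("truncated payload");
    }
    const auto first = buf.begin() + 4;
    return std::string(first, first + static_cast<std::ptrdiff_t>(n));
}
```

The length prefix counts bytes, not characters, which sidesteps the question of what "length" means for a UTF-8 string.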

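I tend to use UTF-8 as the internal representation. You only lose string length checking, which isn't really useful anyway. For Windows API conversion, I use my own Win32 conversion functions. Since Mac and Linux are (for the most part) standard UTF-8-aware, there's no need to convert anything there. Free bonuses: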

  1. Use plain old std::string.
  2. Byte-wise network/stream transport.
  3. For most languages, nice memory footprint.
  4. For more functionality: utf8cpp
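The answer's own Win32 conversion functions are not shown here. As a stand-in, the following is a portable sketch of the UTF-8 to UTF-16 step that calling a UTF-16 API would need. The name `utf8_to_utf16` is hypothetical, and the code assumes valid UTF-8 input (no handling of overlong forms, stray continuation bytes, or truncated sequences beyond not reading past the end).

```cpp
#include <cstddef>
#include <cstdint>
#include <string>

// Hypothetical helper: decodes valid UTF-8 and re-encodes it as UTF-16.
std::u16string utf8_to_utf16(const std::string& in) {
    std::u16string out;
    std::size_t i = 0;
    while (i < in.size()) {
        const unsigned char b0 = static_cast<unsigned char>(in[i]);
        std::uint32_t cp = 0;
        std::size_t extra = 0;  // number of continuation bytes to read
        if (b0 < 0x80)      { cp = b0;        extra = 0; }
        else if (b0 < 0xE0) { cp = b0 & 0x1F; extra = 1; }
        else if (b0 < 0xF0) { cp = b0 & 0x0F; extra = 2; }
        else                { cp = b0 & 0x07; extra = 3; }
        ++i;
        // Each continuation byte contributes its low 6 bits.
        for (std::size_t k = 0; k < extra && i < in.size(); ++k, ++i) {
            cp = (cp << 6) | (static_cast<unsigned char>(in[i]) & 0x3F);
        }
        if (cp < 0x10000) {
            out += static_cast<char16_t>(cp);
        } else {
            // Code points above U+FFFF become a surrogate pair.
            cp -= 0x10000;
            out += static_cast<char16_t>(0xD800 + (cp >> 10));
            out += static_cast<char16_t>(0xDC00 + (cp & 0x3FF));
        }
    }
    return out;
}
```

On Windows, where wchar_t is 16 bits, the resulting code units could be copied into a std::wstring before being passed to a wide-character API; in production code the platform's own conversion routines or a library such as utf8cpp would likely be a safer choice than a hand-rolled decoder.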

As a rule of thumb: UTF-16 for processing, UTF-8 for communication & storage.

Sure, any rule can be broken and this one is not carved in stone. But you have to know when it is ok to break it.

For instance it might be a good idea to use something else if the environment you are using wants something else. But Mac OS X APIs use UTF-16, same as Windows. So UTF-16 makes more sense. It is more straightforward to convert before you put/get things on the net (because you probably do it in 2-3 routines) than doing all the conversions to call OS APIs.

The type of application you develop also matters. If it is something with very little text processing and very few calls to the system (something like an email server that mostly moves things around without changing them), then UTF-8 might be a good choice.

So, as you may have guessed, the answer is "it depends".

ICU has a C++ string class, UnicodeString.




