Question

Consider the following code:

NSString *string = @"ä";
const char *str1 = [string cStringUsingEncoding:NSUTF8StringEncoding];
const char *str2 = "ä";
NSLog(@"C string comparison: %d", strcmp(str1, str2));
NSLog(@"str1: \"%s\"", str1);
NSLog(@"str2: \"%s\"", str2);

If I run this from a brand-new Foundation project, the output is:

C string comparison: 0
str1: "√§"
str2: "√§"

This is exactly what I expect, since the strings should be identical.

However, if I run this exact same code deep inside another code base, I get this output:

C string comparison: 31
str1: "‚àö¬ß"
str2: "√§"

I'm pretty sure both files are UTF-8 encoded. A different file encoding would be the only plausible explanation for this behavior, right?

Any ideas what could be going wrong in the second case? How can I fix it?

(Maybe I should mention that in the second case the code runs in a .mm file, i.e. it is compiled as Objective-C++. Could that explain it?)

Answer 1

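How a source file is encoded on disk is one thing. How the compiler believes it is encoded is another. By default, GCC assumes UTF-8, but it can be told to use a different encoding, either through the locale or through the -finput-charset=<charset> option. I would expect Clang to support the same thing.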

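Xcode has its own notion of a source file's encoding. I don't know whether it adjusts the compile command to pass the above option, but I wouldn't be surprised.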

GCC also has the concept of an execution character set. This determines how it writes string literals into the binary. See the -fexec-charset=<charset> option.

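So the compiler interprets the bytes of the file according to the input character set and writes string literals into the binary in the execution character set. If the two differ, a conversion takes place. This happens per translation unit, so it can work out differently for different source files.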

Also, there are two possible representations of "ä" in Unicode. You can use LATIN SMALL LETTER A WITH DIAERESIS (U+00E4), or you can use LATIN SMALL LETTER A (U+0061) followed by COMBINING DIAERESIS (U+0308). In UTF-8, "ä" can therefore be either 0xC3 0xA4 or 0x61 0xCC 0x88. Your two source files may express the same character in different ways. That means they really do contain different strings (both as C strings and as NSString objects), although NSString's comparison methods will ignore this difference unless NSLiteralSearch is specified.

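So you need to track down the specific source files that contain the strings in question. Check a hex dump to see exactly which bytes they contain. Check the commands used to compile them (and possibly the environment, since the locale can play a role) to see what the compiler believes the input and execution character sets are.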

Answer 2

Could you try replacing your character with its Unicode escape instead?

i.e.:

NSString *string1 = @"\u00e4";

See http://blog.ablepear.com/2010/07/c-tuesday-unicode-string.html

Answer 3

From the documentation:

The returned C string is guaranteed to be valid only until either the receiver is freed, or until the current autorelease pool is emptied, whichever occurs first.

I think in your case either the receiver is freed or the current autorelease pool is emptied.
For example:

NSString *string = @"ä";
NSAutoreleasePool *pool = [[NSAutoreleasePool alloc] init];
const char *str3 = [string cStringUsingEncoding:NSUTF8StringEncoding];
[pool release]; // the buffer behind str3 may be freed here
NSLog(@"str1: \"%s\"", str3);
const char *str2 = "ä";
NSLog(@"C string comparison: %d", strcmp(str3, str2));
NSLog(@"str2: \"%s\"", str2);

The output is:

2012-05-22 17:14:50.069 test[32895:a0f] str1: "√§"
2012-05-22 17:14:50.071 test[32895:a0f] C string comparison: -195
2012-05-22 17:14:50.074 test[32895:a0f] str2: "√§" 



NSString *string = @"ä";
NSAutoreleasePool *pool = [[NSAutoreleasePool alloc] init];
const char *str3 = [string cStringUsingEncoding:NSUTF8StringEncoding];
[pool release]; // the buffer behind str3 may be freed here
const char *str2 = "ä";
NSLog(@"C string comparison: %d", strcmp(str3, str2));
NSLog(@"str1: \"%s\"", str3);
NSLog(@"str2: \"%s\"", str2);

The output is:

2012-05-22 17:19:13.226 test[33153:a0f] C string comparison: 0
2012-05-22 17:19:13.228 test[33153:a0f] str1: ""
2012-05-22 17:19:13.229 test[33153:a0f] str2: "√§"
