How can char[] represent a UTF-8 string?

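In C11, a new string literal has been added with the prefix u8. It yields an array of chars with the text encoded as UTF-8. How is this even possible? Isn't a normal char signed, meaning it has one bit less of information to use because of the sign bit? My logic says that a string of UTF-8 text would need to be an array of unsigned chars.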

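Best answer

> Isn't a normal char signed?

Whether char is signed or unsigned is implementation-defined.

Further, the sign bit is not "lost": it can still be used to represent information. Also, char is not necessarily 8 bits wide (it may be larger on some platforms).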
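As a minimal sketch of the point above, assuming a C11 compiler and CHAR_BIT == 8: the bytes of a u8 literal stored in a char array can be read back as unsigned values without losing anything, whether or not plain char is signed. The names utf8_bytes and e_acute are hypothetical and exist only for this illustration.

```c
#include <stddef.h>
#include <string.h>

/* "é" (U+00E9) as a UTF-8 literal; it occupies two bytes plus the terminator. */
const char e_acute[] = u8"\u00E9";

/* Copy the bytes of a char string into out[] (at most cap of them),
   stopping before the terminator. Returns the number of bytes copied. */
size_t utf8_bytes(const char *s, unsigned char *out, size_t cap)
{
    size_t n = strlen(s);
    if (n > cap)
        n = cap;
    memcpy(out, s, n);  /* copies the object representation; signedness plays no role */
    return n;
}
```

Calling utf8_bytes(e_acute, out, sizeof out) fills out with the two bytes 0xC3 and 0xA9, which is the UTF-8 encoding of U+00E9.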

Other answers

There is a potential problem here:

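If an implementation with CHAR_BIT == 8 uses a sign-magnitude representation for char (so char is signed), then when UTF-8 requires the bit pattern 10000000, that is a negative 0. So if the implementation further does not support negative 0, a given UTF-8 string might contain an invalid (trap) value of char, which is a problem. Even if it does support negative zero, the bit pattern 10000000 compares equal, as a char, to the bit pattern 00000000 (the nul terminator), which is liable to cause problems when using UTF-8 data in a char[].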

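I think this means that for sign-magnitude C11 implementations, char needs to be unsigned. Normally it is up to the implementation whether char is signed or unsigned, but of course if making char signed means failing to implement UTF-8 literals correctly, then the implementer just has to pick unsigned. As an aside, this has been the case for non-2's-complement implementations of C++ all along, since C++ allows char as well as unsigned char to be used to access object representations. C only allows unsigned char.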

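In 2's complement and 1s' complement, the bit patterns required for UTF-8 data are valid values of signed char, so the implementation is free to make char either signed or unsigned and still be able to represent UTF-8 strings in char[]. That is because all 256 bit patterns are valid 2's complement values, and UTF-8 happens not to use the byte 11111111 (1s' complement negative zero).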
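The 2's complement claim can be checked with a small sketch. It assumes CHAR_BIT == 8 and a 2's complement signed char (the only representation most current platforms use); the function name is hypothetical. It copies each of the 256 byte patterns into a signed char and confirms that every pattern maps to a distinct value, so in particular no nonzero byte can be mistaken for the nul terminator.

```c
#include <limits.h>
#include <string.h>

/* Returns 1 if all 256 byte patterns give distinct signed char values,
   0 if two patterns collide (as 10000000 and 00000000 would on a
   sign-magnitude machine that supports negative zero). */
int signed_char_distinguishes_all_bytes(void)
{
    unsigned char seen[UCHAR_MAX + 1] = {0};

    for (unsigned v = 0; v <= 0xFF; v++) {
        unsigned char in = (unsigned char)v;
        signed char sc;

        memcpy(&sc, &in, 1);                        /* reinterpret the bit pattern */
        unsigned idx = (unsigned)(sc - SCHAR_MIN);  /* 0..255 when CHAR_BIT == 8 */
        if (seen[idx])
            return 0;                               /* two patterns share one value */
        seen[idx] = 1;
    }
    return 1;
}
```

On a 2's complement platform the function is expected to return 1.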

In any case, a sign bit is still a bit. And the UTF-8 specification itself does not say that the chars must be unsigned.

PS: What kind of name is kookwekker?

The signedness of char does not matter; UTF-8 can be handled with only shift and mask operations (which may be cumbersome for signed types, but not impossible). But UTF-8 needs at least 8 bits per code unit, so "assert (CHAR_BIT >= 8);"
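A minimal sketch of the mask-only idea, taking a plain char (signed or not) rather than unsigned char. The function name utf8_seq_len is hypothetical, and the sketch stops at 4-byte sequences, the limit set by RFC 3629; it is an illustration, not a validator.

```c
#include <assert.h>
#include <limits.h>

static_assert(CHAR_BIT >= 8, "UTF-8 code units need at least 8 bits");

/* Length in bytes of the sequence introduced by lead byte c:
   1..4, or 0 if c is a continuation byte or not a valid lead byte. */
int utf8_seq_len(char c)
{
    /* Converting to unsigned and masking keeps the low 8 bits,
       so a negative char yields the same result as the raw byte. */
    unsigned b = (unsigned)c & 0xFFu;

    if ((b & 0x80u) == 0x00u) return 1;  /* 0xxxxxxx */
    if ((b & 0xE0u) == 0xC0u) return 2;  /* 110xxxxx */
    if ((b & 0xF0u) == 0xE0u) return 3;  /* 1110xxxx */
    if ((b & 0xF8u) == 0xF0u) return 4;  /* 11110xxx */
    return 0;                            /* 10xxxxxx or 11111xxx */
}
```

For the euro sign "\xE2\x82\xAC", the first byte classifies as a 3-byte lead and the second as a continuation byte, even where plain char is signed and both bytes are negative.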

To illustrate my point: the following fragment contains no arithmetic operations on the character's value, only shift and mask.

/* Decode one UTF-8 sequence from str[0..len-1] into *target (if non-NULL).
** Returns the number of bytes consumed, 0 if len is 0, -1 for an invalid
** lead byte, or -N if N continuation bytes are needed but not available. */
static int eat_utf8(unsigned char *str, unsigned len, unsigned *target)
{
    unsigned val = 0;
    unsigned todo;

    if (!len) return 0;

    /* The lead byte determines how many continuation bytes follow. */
    val = str[0];
    if ((val & 0x80) == 0x00) { if (target) *target = val; return 1; }
    else if ((val & 0xe0) == 0xc0) { val &= 0x1f; todo = 1; }
    else if ((val & 0xf0) == 0xe0) { val &= 0x0f; todo = 2; }
    else if ((val & 0xf8) == 0xf0) { val &= 0x07; todo = 3; }
    else if ((val & 0xfc) == 0xf8) { val &= 0x03; todo = 4; }
    else if ((val & 0xfe) == 0xfc) { val &= 0x01; todo = 5; }
    else {  /* Default (Not in the spec) */
        if (target) *target = val;
        return -1;
    }

    len--; str++;
    if (todo > len) { return -(int)todo; }  /* cast first: todo is unsigned */

    for (len = todo; todo--; ) {
        /* For validity checking we should also
        ** test if ((*str & 0xc0) == 0x80) here */
        val <<= 6;
        val |= *str++ & 0x3f;
    }

    if (target) *target = val;
    return 1 + len;
}