For example, I have: 11100011 10000010 10100010. It is the binary representation of ア; its number (the Unicode code point) is 12450.
How can I get this number from the binary?
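The byte sequence you show is the UTF-8 encoded version of the character.
You need to decode the UTF-8 to get the Unicode code point.
For this exact byte sequence, the following bits make up the code point: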
11100011 10000010 10100010
    ****   ******   ******
So, concatenating the starred bits, we get the number 0011000010100010, which equals 0x30a2, or 12450 in decimal.
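To make the bit extraction concrete, here is a minimal sketch for this one case only. The helper name `decode_three_bytes` is hypothetical (it is not from any library), and it assumes the three bytes are already known to form a well-formed 1110xxxx 10xxxxxx 10xxxxxx sequence, so it does no validation:

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical helper: combines the payload bits of one three-byte UTF-8 sequence.
// Assumption: the caller has already checked that the bytes are well formed.
inline std::uint32_t decode_three_bytes(std::uint8_t b0, std::uint8_t b1, std::uint8_t b2) {
    return (static_cast<std::uint32_t>(b0 & 0x0F) << 12)  // low 4 bits of the lead byte
         | (static_cast<std::uint32_t>(b1 & 0x3F) << 6)   // low 6 bits of the 1st continuation byte
         | static_cast<std::uint32_t>(b2 & 0x3F);         // low 6 bits of the 2nd continuation byte
}

// Checks the helper against the bytes from the question (0xE3 0x82 0xA2).
inline void check_example() {
    assert(decode_three_bytes(0xE3, 0x82, 0xA2) == 0x30A2);  // 0x30A2 == 12450
}
```

The masks 0x0F and 0x3F keep exactly the starred bits, and the shifts place them in the order in which they are concatenated.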
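See the Wikipedia description of UTF-8 for details on how to interpret the encoding.
In short: if bit 7 is set in the first byte, then the number of adjacent bits that are also set (call it m; here m = 2) gives the number of follow-on bytes for this code point. The number of bits taken from the first byte is (8-1-1-m), and 6 bits are taken from each follow-on byte. So here we get (8-1-1-2) = 4 bits from the first byte plus 2*6 = 12 bits from the two follow-on bytes, for 16 bits in total.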
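The counting rule above can be sketched as follows. The names `adjacent_set_bits` and `payload_bits` are hypothetical, and the sketch assumes the lead byte starts a two-, three- or four-byte sequence (m between 1 and 3); it does not check anything else:

```cpp
#include <cassert>

// Hypothetical helper: counts the set bits directly below bit 7 in a lead byte
// (the "m" of the rule). Returns -1 when bit 7 is clear, i.e. for a plain ASCII byte.
inline int adjacent_set_bits(unsigned char lead) {
    if ((lead & 0x80) == 0) return -1;
    int m = 0;
    // Walk down from bit 6 until the first clear bit.
    for (unsigned mask = 0x40; mask != 0 && (lead & mask) != 0; mask >>= 1) ++m;
    return m;
}

// Payload size in bits: (8 - 1 - 1 - m) from the lead byte plus 6 per follow-on byte.
// Assumption: lead is a multi-byte lead byte, so m is 1, 2 or 3.
inline int payload_bits(unsigned char lead) {
    const int m = adjacent_set_bits(lead);
    return (8 - 1 - 1 - m) + m * 6;
}
```

For the lead byte 0xE3 from the question, `adjacent_set_bits` gives 2 and `payload_bits` gives 16, matching the count worked out by hand.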
As pointed out in the comments, there are many libraries that can do this, so you probably don't need to implement it yourself.
From the Wikipedia page, I came up with this:
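#include <stdexcept>

// Exception type used below; the original snippet used it without defining it.
struct unicode_error : std::runtime_error {
    using std::runtime_error::runtime_error;
};

// Decodes the single UTF-8 sequence starting at ptr and returns its code point.
unsigned utf8_to_codepoint(const char* ptr) {
    // Read bytes as unsigned so that values >= 0x80 compare correctly.
    const unsigned char* p = reinterpret_cast<const unsigned char*>(ptr);
    if (*p < 0x80) return *p;  // single-byte (ASCII) sequence
    if (*p < 0xC0) throw unicode_error("invalid utf8 lead byte");
    unsigned result = 0;
    int shift = 0;  // number of continuation bytes still to read
    if (*p < 0xE0)      { result = *p & 0x1F; shift = 1; }
    else if (*p < 0xF0) { result = *p & 0x0F; shift = 2; }
    else if (*p < 0xF8) { result = *p & 0x07; shift = 3; }
    else throw unicode_error("invalid utf8 lead byte");
    for (; shift > 0; --shift) {
        ++p;
        // Continuation bytes must have the form 10xxxxxx.
        if (*p < 0x80 || *p >= 0xC0)
            throw unicode_error("invalid utf8 continuation byte");
        result <<= 6;
        result |= *p & 0x3F;  // keep the low 6 payload bits
    }
    return result;
}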
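Note that this is a very rough implementation (treat it as a sketch, not as tested code), and it parses many invalid values that probably should not be parsed, such as overlong encodings and surrogate code points. I am posting it only to show that this is a lot harder than you might think, and that you should use a good Unicode library.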