Convert string to UTF-8 escape sequence

In my C++ program, I want to convert a std::string such as:

abc €

into a UTF-8 escape sequence:

abc%20%E2%82%AC

I need this to be platform independent! All the solutions I have found are Windows-only. Surely there is a solution out there?

Best answer

Before C++11, the standard did not specify any support for UTF-8.

There are two steps here:

  • convert to UTF-8 (unless it's already in UTF-8)
  • URL-escape the result (update: James Kanze covers this part)

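Assuming you know which character encoding is used for the input, neither step is very hard to write in a portable way. That means other people have already done so, and you shouldn't have to write it yourself. If you search for the two steps individually, you may have better luck finding platform-independent code for each.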

Note there are two different ways to URL-escape a space character, either as + or as %20. Your example uses %20, so if that's important to you then don't accidentally use a URL-escape routine that does the other.
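If the only escaping routine at hand produces the + form, its output can be rewritten afterwards. The following is a minimal sketch, not a definitive implementation. The function name plusToPercent20 is hypothetical, and it assumes the input really came from a form-style encoder, where a literal plus sign has already been escaped as %2B, so every remaining + stands for a space.

```cpp
#include <string>

// Hypothetical helper: rewrites form-style escaped text, in which a space
// was encoded as '+', into the %20 style used in the question's example.
// Assumes a literal '+' in the source data was already escaped as %2B.
std::string plusToPercent20(std::string const& formEncoded)
{
    std::string result;
    result.reserve(formEncoded.size());
    for (std::string::size_type i = 0; i < formEncoded.size(); ++i) {
        if (formEncoded[i] == '+') {
            result += "%20";          // space in the %20 convention
        } else {
            result += formEncoded[i]; // everything else is copied unchanged
        }
    }
    return result;
}
```

For example, plusToPercent20("abc+%E2%82%AC") yields "abc%20%E2%82%AC".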

The input encoding isn't ISO-Latin-1, since that has no Euro sign[**], but it might be Windows CP-1252.

[**] Unless it's been added recently. Anyway, your example codes the Euro sign as UTF-8 bytes 0xE2 0x82 0xAC, which represent the Unicode code point 0x20AC, not code point 0x80 which it has in CP1252. So if it was originally a single-byte encoding then clearly an intelligent single-byte-to-unicode-code-point conversion has been applied along the way. You could say there are three steps:

  • convert the std::string to Unicode code points (depends on input encoding).
  • convert the Unicode to UTF-8
  • URL-escape the UTF-8
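The first two of these steps can be sketched in portable C++ as follows. This is an illustration under a stated assumption, namely that the input is Windows CP-1252; the names cp1252ToCodePoint, appendUtf8 and cp1252ToUtf8 are made up for this example. The five byte values that CP-1252 leaves undefined (0x81, 0x8D, 0x8F, 0x90, 0x9D) are mapped here to the control code points of the same value, which is one common convention and not the only possible choice.

```cpp
#include <string>

// Step 1 (assumes CP-1252 input): map one byte to a Unicode code point.
unsigned long cp1252ToCodePoint(unsigned char byte)
{
    // CP-1252 differs from ISO-8859-1 only in the range 0x80..0x9F.
    static unsigned short const high[32] = {
        0x20AC, 0x0081, 0x201A, 0x0192, 0x201E, 0x2026, 0x2020, 0x2021,
        0x02C6, 0x2030, 0x0160, 0x2039, 0x0152, 0x008D, 0x017D, 0x008F,
        0x0090, 0x2018, 0x2019, 0x201C, 0x201D, 0x2022, 0x2013, 0x2014,
        0x02DC, 0x2122, 0x0161, 0x203A, 0x0153, 0x009D, 0x017E, 0x0178
    };
    if (byte >= 0x80 && byte <= 0x9F) {
        return high[byte - 0x80];
    }
    return byte; // ASCII and 0xA0..0xFF map to the same code point value
}

// Step 2: append the UTF-8 encoding of a code point (up to 0x10FFFF).
void appendUtf8(std::string& out, unsigned long cp)
{
    if (cp < 0x80) {
        out += static_cast<char>(cp);
    } else if (cp < 0x800) {
        out += static_cast<char>(0xC0 | (cp >> 6));
        out += static_cast<char>(0x80 | (cp & 0x3F));
    } else if (cp < 0x10000) {
        out += static_cast<char>(0xE0 | (cp >> 12));
        out += static_cast<char>(0x80 | ((cp >> 6) & 0x3F));
        out += static_cast<char>(0x80 | (cp & 0x3F));
    } else {
        out += static_cast<char>(0xF0 | (cp >> 18));
        out += static_cast<char>(0x80 | ((cp >> 12) & 0x3F));
        out += static_cast<char>(0x80 | ((cp >> 6) & 0x3F));
        out += static_cast<char>(0x80 | (cp & 0x3F));
    }
}

// Steps 1 and 2 combined: CP-1252 bytes in, UTF-8 bytes out.
std::string cp1252ToUtf8(std::string const& input)
{
    std::string utf8;
    for (std::string::size_type i = 0; i < input.size(); ++i) {
        appendUtf8(utf8, cp1252ToCodePoint(static_cast<unsigned char>(input[i])));
    }
    return utf8;
}
```

With this sketch, the CP-1252 byte 0x80 becomes code point 0x20AC and then the three bytes 0xE2 0x82 0xAC, matching the question's example. The third step, URL-escaping, then operates on the resulting bytes.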
Other answers

It seems rather straightforward to me. Your string is a sequence of bytes. Certain byte values (most, actually, but not the most common) are not permitted, and should be replaced with a three-character sequence: % followed by two hex characters representing the byte value. So something like:

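#include <string>

std::string
toEscaped( std::string const& original )
{
    //  Entries elided in the original answer: allowed[b] should be true
    //  for each byte value b that may appear unescaped. As written, every
    //  entry is false, so every byte is escaped.
    static bool const allowed[256] =
    {
        //  Define the 256 entries...
    };
    static char const hexChars[] = "0123456789ABCDEF";
    std::string results ;
    for ( std::string::const_iterator iter = original.begin();
            iter != original.end();
            ++ iter ) {
        //  Use the unsigned byte value for both indexing and shifting.
        unsigned char const byte = static_cast<unsigned char>(*iter);
        if ( allowed[byte] ) {
            results += *iter;
        } else {
            results += '%';
            results += hexChars[byte >> 4];
            results += hexChars[byte & 0x0F];
        }
    }
    return results;
}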
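The answer leaves the 256-entry table undefined. One possible way to fill that gap, shown here only as a sketch, is to replace the table with a predicate. The choice of which bytes pass through is an assumption: this version allows only the "unreserved" characters of RFC 3986 (letters, digits, and - . _ ~), and the names isUnreserved and percentEncode are invented for the example. The range comparisons also assume an ASCII-compatible execution character set.

```cpp
#include <string>

// Assumption: only RFC 3986 "unreserved" characters are left unescaped.
bool isUnreserved(unsigned char c)
{
    return (c >= 'A' && c <= 'Z')
        || (c >= 'a' && c <= 'z')
        || (c >= '0' && c <= '9')
        || c == '-' || c == '.' || c == '_' || c == '~';
}

// Percent-encodes every byte that is not unreserved. The input is treated
// as raw bytes, so it should already be UTF-8 if UTF-8 escapes are wanted.
std::string percentEncode(std::string const& bytes)
{
    static char const hexChars[] = "0123456789ABCDEF";
    std::string result;
    for (std::string::size_type i = 0; i < bytes.size(); ++i) {
        unsigned char const byte = static_cast<unsigned char>(bytes[i]);
        if (isUnreserved(byte)) {
            result += bytes[i];
        } else {
            result += '%';
            result += hexChars[byte >> 4];   // high nibble
            result += hexChars[byte & 0x0F]; // low nibble
        }
    }
    return result;
}
```

Given the UTF-8 bytes of the question's input, percentEncode("abc \xE2\x82\xAC") produces "abc%20%E2%82%AC". A different allowed set, for instance one that keeps / unescaped inside paths, only requires changing the predicate.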


For platform-independent, feature-rich Unicode handling, the de facto standard library is ICU, which is used by many Fortune 500 companies and open-source projects. Its license is open source and friendly to commercial development.

If you only need a simple conversion, though, it may be overkill.

http://site.icu-project.org

If you only need a simple, portable UTF-8 library for C++, you could try

hth




