Percent-encoded UTF-8 to ASCII (8-bit) conversion

I'm reading in URLs, and they often contain percent-encoded characters.

Example: %C3%A9 is actually é

According to http://www.microsystools.com/products/sitemap-generator/faq/character-percentage-url-encoding/ , characters in the upper half of 8-bit ASCII (128-255) are encoded as UTF-8, and the resulting bytes are then written as hex. So when I get my URL, those bytes appear as %XX hex sequences, and I need to convert them back to their true 8-bit ASCII characters. Is there a function or library I can use? If not, how would I go about the conversion?

I'm using C/C++.

Answers

First you need to URL-decode the string. Standard, cross-platform C++ has no function for this, but luckily it is not a hard problem. Copy bytes from source to target. Any byte other than % is copied as-is. When you hit %XX, convert XX from hex characters to a binary value, and you have your byte.
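
The copy loop described above could look roughly like the following. This is only a minimal sketch: the names `url_decode` and `hex_value` are made up for illustration, and it assumes that a malformed escape (a % not followed by two hex digits) should simply be copied through unchanged, which may not be the policy you want.

```cpp
#include <cstddef>
#include <string>

// Returns the numeric value of a hex digit, or -1 if c is not a hex digit.
static int hex_value(char c) {
    if (c >= '0' && c <= '9') return c - '0';
    if (c >= 'a' && c <= 'f') return c - 'a' + 10;
    if (c >= 'A' && c <= 'F') return c - 'A' + 10;
    return -1;
}

// Hypothetical helper: replaces every %XX escape with the byte it encodes.
// Assumption: malformed escapes are copied through unchanged.
std::string url_decode(const std::string& in) {
    std::string out;
    out.reserve(in.size());
    for (std::size_t i = 0; i < in.size(); ++i) {
        if (in[i] == '%' && i + 2 < in.size()) {
            int hi = hex_value(in[i + 1]);
            int lo = hex_value(in[i + 2]);
            if (hi >= 0 && lo >= 0) {
                // Two valid hex digits: combine them into a single byte.
                out.push_back(static_cast<char>(hi * 16 + lo));
                i += 2;  // skip the two digits just consumed
                continue;
            }
        }
        out.push_back(in[i]);  // ordinary byte, or a malformed escape
    }
    return out;
}
```

For the example in the question, `url_decode("%C3%A9")` yields the two bytes 0xC3 0xA9, which is é in UTF-8. The sketch does not turn + into a space, because that convention applies to form-encoded query strings rather than to percent-encoding in general.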

This gives you a buffer of text in UTF-8. You say you want ASCII, that is, ISO-646. In that case you can't have an accented e. I can think of several possibilities for what you really want:

  1. ISO-8859-1. You can use ICU to convert UTF-8 to ISO-8859-1.
  2. ISO-646. You can also use ICU, and I believe it will make accented chars into their ISO-646 equivalents.
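
If pulling in ICU is too heavy for option 1, the UTF-8 to ISO-8859-1 step can also be done by hand, because ISO-8859-1 covers exactly the code points U+0000 to U+00FF. The sketch below is a hand-rolled substitute and not ICU's API. The name `utf8_to_latin1` and the choice of '?' as the replacement are assumptions made for illustration, and the code does not do full UTF-8 validation.

```cpp
#include <cstddef>
#include <string>

// Hypothetical helper: converts UTF-8 text to ISO-8859-1 (Latin-1).
// Assumption: code points above U+00FF and malformed sequences each
// become a single `replacement` byte.
std::string utf8_to_latin1(const std::string& utf8, char replacement = '?') {
    std::string out;
    out.reserve(utf8.size());
    std::size_t i = 0;
    while (i < utf8.size()) {
        unsigned char b0 = static_cast<unsigned char>(utf8[i]);
        if (b0 < 0x80) {
            // Plain ASCII is identical in both encodings.
            out.push_back(static_cast<char>(b0));
            ++i;
            continue;
        }
        // U+0080..U+00FF is encoded as lead byte 0xC2 or 0xC3
        // followed by exactly one continuation byte (10xxxxxx).
        if ((b0 == 0xC2 || b0 == 0xC3) && i + 1 < utf8.size()) {
            unsigned char b1 = static_cast<unsigned char>(utf8[i + 1]);
            if ((b1 & 0xC0) == 0x80) {
                out.push_back(static_cast<char>(((b0 & 0x03) << 6) | (b1 & 0x3F)));
                i += 2;
                continue;
            }
        }
        // Not representable in Latin-1: emit one replacement byte and
        // skip any continuation bytes belonging to the same sequence.
        out.push_back(replacement);
        ++i;
        while (i < utf8.size() &&
               (static_cast<unsigned char>(utf8[i]) & 0xC0) == 0x80) {
            ++i;
        }
    }
    return out;
}
```

With the question's example, the UTF-8 bytes 0xC3 0xA9 become the single byte 0xE9, which is é in ISO-8859-1. A character outside that range, such as the euro sign, comes out as one '?'.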



