Unicode generated by toEscapedUnicode method is without spaces

For the word चौरेउत्तमयादव, the escaped Unicode is: u0938u0941u0916u091Au0948u0928u093Eu0928u0940 u0930u0940u091Du0941u092Eu0932 u091Cu093Fu0935u0924u0930u093Eu092E

Note that it has spaces before u0930 and u091C.

But when I try this in my code:

String tempString=Strings.toEscapedUnicode(strString); 

the conversion gives a result without spaces: u0938u0941u0916u091Au0948u0928u093Eu0928u0940u0930u0940u091Du0941u092Eu0932u091Cu093Fu0935u0924u0930u093Eu092E

That is why the two do not match. My toEscapedUnicode method generates Unicode without spaces. I want the spaces; how do I get them?

Answers

It isn't a whole answer, but when I copy and paste the Unicode characters "चौरेउत्तमयादव " and then use a couple of tools to analyze what's there, I do not see any spaces:

echo "चौरेउत्तमयादव " | odx

This produces a hex dump of the data; there's a blank at the end, but none in the middle.

0x0000: E0 A4 9A E0 A5 8C E0 A4 B0 E0 A5 87 E0 A4 89 E0   ................
0x0010: A4 A4 E0 A5 8D E0 A4 A4 E0 A4 AE E0 A4 AF E0 A4   ................
0x0020: BE E0 A4 A6 E0 A4 B5 20 0A                        ....... .
0x0029:

And the second command decodes UTF-8 data:

echo "चौरेउत्तमयादव " | utf8-unicode

It produces:

0xE0 0xA4 0x9A = U+091A
0xE0 0xA5 0x8C = U+094C
0xE0 0xA4 0xB0 = U+0930
0xE0 0xA5 0x87 = U+0947
0xE0 0xA4 0x89 = U+0909
0xE0 0xA4 0xA4 = U+0924
0xE0 0xA5 0x8D = U+094D
0xE0 0xA4 0xA4 = U+0924
0xE0 0xA4 0xAE = U+092E
0xE0 0xA4 0xAF = U+092F
0xE0 0xA4 0xBE = U+093E
0xE0 0xA4 0xA6 = U+0926
0xE0 0xA4 0xB5 = U+0935
0x20 = U+0020
0x0A = U+000A

So, it seems that your problem might be with the input to toEscapedUnicode rather than with its output.
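The same check can be made from Java before the string ever reaches toEscapedUnicode. What follows is only a minimal sketch: CodePointDump, describe and containsSpace are names made up for illustration, and nothing here comes from the question's own code. It lists the code points in the U+XXXX form used above, so a blank shows up as U+0020 if the input really has one.

```java
// Hypothetical helper (not part of the question's code): lists the code
// points of a string so that a blank (U+0020) in the input is easy to spot.
class CodePointDump {
    static String describe(String s) {
        StringBuilder out = new StringBuilder();
        // Step by code point, not by char, so surrogate pairs stay together.
        for (int i = 0; i < s.length(); ) {
            int cp = s.codePointAt(i);
            if (out.length() > 0) {
                out.append(' ');
            }
            out.append(String.format("U+%04X", cp));
            i += Character.charCount(cp);
        }
        return out.toString();
    }

    // True if the string contains at least one ordinary space character.
    static boolean containsSpace(String s) {
        return s.indexOf(' ') >= 0;
    }

    public static void main(String[] args) {
        // First three code points of the pasted word, followed by a blank.
        String sample = "\u091A\u094C\u0930 ";
        System.out.println(describe(sample));
        System.out.println("contains space: " + containsSpace(sample));
    }
}
```

If containsSpace returns false for the string you pass in, the escaping method has no spaces to preserve, and the place to look is wherever that string is built.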


Also, it seems that what I copy and paste from the question doesn't match what you say is in the string:

Yours     Mine

u0938    U+091A
u0941    U+094C
u0916    U+0930
u091A    U+0947
u0948    U+0909
u0928    U+0924
u093E    U+094D
u0928    U+0924
u0940    U+092E
u0020
u0930    U+092F
u0940    U+093E
u091D    U+0926
u0941    U+0935
u092E
u0932
u0020
u091C
u093F
u0935
u0924

So, the pasted text does not match the claimed translation for other reasons too.


I believe that the Unicode string you specify should look like:

सुखचैनानी रीझुमल जिवतराम
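If the input really is this string, blanks included, then an escaper only has to pass U+0020 through unchanged to give the spaced form from the question. Here is a rough sketch of that idea. It is not the real Strings.toEscapedUnicode, whose implementation isn't shown in the question; SpacePreservingEscaper and escape are invented names, and the uXXXX format (no backslash) simply copies what the question shows.

```java
// Illustrative stand-in for the question's Strings.toEscapedUnicode, whose
// real implementation is not shown; class and method names are made up.
class SpacePreservingEscaper {
    static String escape(String s) {
        StringBuilder out = new StringBuilder();
        for (int i = 0; i < s.length(); i++) {
            char c = s.charAt(i);
            if (c == ' ') {
                // Pass blanks through so word boundaries survive.
                out.append(c);
            } else {
                // One uXXXX group per UTF-16 code unit, as in the question.
                out.append(String.format("u%04X", (int) c));
            }
        }
        return out.toString();
    }

    public static void main(String[] args) {
        // Start of the first word, a blank, then the start of the second word.
        String sample = "\u0938\u0941\u0916 \u0930\u0940";
        System.out.println(escape(sample));
    }
}
```

An input with no blanks still comes out with no blanks, so this only helps once the input string itself is right.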

I used a file containing the values you claimed, minus the u prefixes and with 0020 in place of the blanks:

0938
0941
0916
091A
0948
0928
093E
0928
0940
0020
0930
0940
091D
0941
092E
0932
0020
091C
093F
0935
0924
0930
093E
092E

And then I used this pure home-brew Perl script to generate the UTF-8 string I propose as the equivalent of your escaped Unicode string. I'm sure there are mechanisms available in Perl to do it otherwise (using Unicode-related modules), but this worked for me. (It would be less verbose if I didn't leave the debug code in there.)

#!/bin/perl -w

use strict;
use constant debug => 0;

# Reads one hexadecimal code point per line and writes it out as UTF-8.
while (<>)
{
    chomp;
    my $i = hex;
    printf STDERR "0x%04X = %4d\n", $i, $i if debug;
    if ($i < 0x80)
    {
        # 1-byte UTF-8 (ASCII only; U+0080 and above need two bytes)
        printf STDERR "  0x%02X (%3d)\n", $i, $i if debug;
        printf "%c", $i;
    }
    elsif ($i < 0x800)
    {
        # 2-byte UTF-8: 110xxxxx 10xxxxxx
        my($b1) = 0xC0 | (($i >> 6) & 0x1F);
        my($b2) = 0x80 | ($i & 0x3F);
        printf STDERR "  0x%02X (%3d)\n", $b1, $b1 if debug;
        printf STDERR "  0x%02X (%3d)\n", $b2, $b2 if debug;
        printf "%c%c", $b1, $b2;
    }
    elsif ($i < 0x10000)
    {
        # 3-byte UTF-8: 1110xxxx 10xxxxxx 10xxxxxx
        my($b1) = 0xE0 | (($i >> 12) & 0x0F);
        my($b2) = 0x80 | (($i >>  6) & 0x3F);
        my($b3) = 0x80 | ( $i        & 0x3F);
        printf STDERR "  0x%02X (%3d)\n", $b1, $b1 if debug;
        printf STDERR "  0x%02X (%3d)\n", $b2, $b2 if debug;
        printf STDERR "  0x%02X (%3d)\n", $b3, $b3 if debug;
        printf "%c%c%c", $b1, $b2, $b3;
    }
    else
    {
        # 4-byte UTF-8 or error
        die "Oh bother!";
    }
}
print "\n";

You can fill in the 4-byte UTF-8 and error handling stuff. I don't diagnose invalid code points (notably the UTF-16 surrogates), so if you put bogus Unicode code points in, you will get bogus UTF-8 values out of the script. If you need to know more about that, read Chapter 3 of the Unicode book (available for download, as a chapter, from Unicode.org) or the FAQ on UTF-8, UTF-16, UTF-32 and BOM.
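For anyone who would rather see the missing pieces in Java, here is one way the 4-byte case and the surrogate check could look. Treat it as a sketch under my own assumptions, not a port of the script: Utf8Encoder, encode and hex are illustrative names, and in real code String.getBytes with StandardCharsets.UTF_8 already does the encoding for you.

```java
// Sketch of the cases the Perl script leaves out; names are illustrative.
class Utf8Encoder {
    // Encodes one Unicode code point as UTF-8 bytes.
    static byte[] encode(int cp) {
        if (cp < 0 || cp > 0x10FFFF) {
            throw new IllegalArgumentException("not a Unicode code point: " + cp);
        }
        if (cp >= 0xD800 && cp <= 0xDFFF) {
            // UTF-16 surrogates must not be encoded in UTF-8.
            throw new IllegalArgumentException(String.format("surrogate: U+%04X", cp));
        }
        if (cp < 0x80) {
            return new byte[] { (byte) cp };
        }
        if (cp < 0x800) {
            return new byte[] {
                (byte) (0xC0 | (cp >> 6)),
                (byte) (0x80 | (cp & 0x3F)) };
        }
        if (cp < 0x10000) {
            return new byte[] {
                (byte) (0xE0 | (cp >> 12)),
                (byte) (0x80 | ((cp >> 6) & 0x3F)),
                (byte) (0x80 | (cp & 0x3F)) };
        }
        // 4-byte form: 11110xxx 10xxxxxx 10xxxxxx 10xxxxxx
        return new byte[] {
            (byte) (0xF0 | (cp >> 18)),
            (byte) (0x80 | ((cp >> 12) & 0x3F)),
            (byte) (0x80 | ((cp >> 6) & 0x3F)),
            (byte) (0x80 | (cp & 0x3F)) };
    }

    // Formats bytes as space-separated uppercase hex, like the dump above.
    static String hex(byte[] bytes) {
        StringBuilder out = new StringBuilder();
        for (byte b : bytes) {
            if (out.length() > 0) {
                out.append(' ');
            }
            out.append(String.format("%02X", b & 0xFF));
        }
        return out.toString();
    }

    public static void main(String[] args) {
        System.out.println(hex(encode(0x091A)));   // 3-byte case
        System.out.println(hex(encode(0x1F600)));  // 4-byte case
    }
}
```

Rejecting surrogates up front is a design choice; another reasonable option is to substitute U+FFFD, as many decoders do.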

I had a similar situation where I had to display data like "U0928U093eU0936U092aU093eU0924U0940", which should render as नाशपाती.

I searched a lot for a way to convert it, but the answer I found myself was very simple.

All I had to do was assign the string coming from the JSON to a UILabel (or whatever you want to display it in). In my case, it was something like this:

let meaning = array[indexPath.row] as! NSDictionary
cell.textLabel?.text = meaning.value(forKey: "key") as? String



