I guess the codepoints of UCS and Unicode are the same, am I right?
If so, why do we need two standards (UCS and Unicode)?
They are not two standards. The Universal Character Set (UCS) is not a standard but something defined in a standard, namely ISO 10646. This should not be confused with encodings, such as UCS-2.
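It is hard to guess whether you actually mean different encodings or different standards. As for the latter, Unicode and ISO 10646 started out as two separate standardization efforts with different goals and strategies. In the early 1990s they were harmonized to avoid the problems that two diverging standards would cause, and as a result the code points are indeed the same.
They still differ, though, partly because Unicode is defined by an industry consortium that can work flexibly and has a strong interest in standardizing things beyond simple code point assignments. The Unicode Standard defines many principles and processing rules, not just the characters. ISO 10646 is a formal standard that can be referenced in standards and other documents of ISO and its members.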
The codepoints are the same but there are some differences. From the Wikipedia entry about the differences between Unicode and ISO 10646 (i.e. UCS):
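The difference between them is that Unicode adds rules and specifications that are outside the scope of ISO 10646. ISO 10646 is a simple character map, an extension of previous standards like ISO 8859. In contrast, Unicode adds rules for collation, normalization of forms, and the bidirectional algorithm for scripts like Hebrew and Arabic.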
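To make the "rules beyond the character map" point concrete, here is a minimal Python sketch using the standard `unicodedata` module. The helper name `same_after_nfc` and the sample strings are only illustrative assumptions; the point is that normalization and bidirectional classes are properties the Unicode Standard specifies on top of the shared code point assignments.

```python
import unicodedata

# Two ways to write "é": one precomposed code point, or "e" plus a combining accent.
composed = "\u00e9"      # LATIN SMALL LETTER E WITH ACUTE
decomposed = "e\u0301"   # "e" followed by COMBINING ACUTE ACCENT


def same_after_nfc(a: str, b: str) -> bool:
    """Compare two strings after normalizing both to NFC (hypothetical helper)."""
    return unicodedata.normalize("NFC", a) == unicodedata.normalize("NFC", b)


# As raw code point sequences the two strings differ...
print(composed == decomposed)
# ...but Unicode normalization treats them as equivalent.
print(same_after_nfc(composed, decomposed))

# The bidirectional class is another property defined by Unicode.
print(unicodedata.bidirectional("A"))        # a Latin letter
print(unicodedata.bidirectional("\u05d0"))   # HEBREW LETTER ALEF
```

A plain character map only tells you which number belongs to which character; the comparison and the directionality lookup above rely on the additional data and algorithms that Unicode publishes.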
See also The Absolute Minimum Every Software Developer Absolutely, Positively Must Know About Unicode and Character Sets (No Excuses!): http://www.joelonsoftware.com/articles/Unicode.html
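I think the differences come from how the code points are encoded. The UCS-x encodings use a fixed number of bytes per code point. For example, UCS-2 uses two bytes, which means it cannot represent code points that need more than two bytes. The UTF encodings, on the other hand, use a variable number of bytes. For example, UTF-8 uses at least one byte (for ASCII characters) and more bytes for characters outside the ASCII range.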
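The fixed versus variable width distinction can be checked with a short Python sketch. The helper names `utf8_length` and `fits_in_ucs2` are made up for illustration, and the UCS-2 check is a simplification that only tests whether the code point lies in the range U+0000 to U+FFFF; Python itself ships no UCS-2 codec.

```python
def utf8_length(ch: str) -> int:
    """Number of bytes UTF-8 needs for a single character."""
    return len(ch.encode("utf-8"))


def fits_in_ucs2(ch: str) -> bool:
    """Simplified check (assumption): UCS-2 has one 16-bit unit per code point,
    so only U+0000..U+FFFF can be represented."""
    return ord(ch) <= 0xFFFF


# ASCII letter, accented Latin letter, Hangul syllable, emoji outside the BMP.
for ch in ["A", "\u00e9", "\ud55c", "\U0001F600"]:
    print(f"U+{ord(ch):04X}  utf-8 bytes: {utf8_length(ch)}  fits in UCS-2: {fits_in_ucs2(ch)}")
```

The sample characters need one, two, three and four bytes in UTF-8 respectively, while only the last one falls outside what a two-byte fixed-width encoding can address.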