Better to use the original specs than Wikipedia. Allowed characters in XML 1.0:
Char ::= #x9 | #xA | #xD | [#x20-#xD7FF] | [#xE000-#xFFFD] | [#x10000-#x10FFFF]
/* any Unicode character, excluding the surrogate blocks, FFFE, and FFFF. */
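As a rough illustration, the XML 1.0 `Char` production above can be turned into a code-point predicate. This is a minimal sketch: the function names `is_xml10_char` and `is_xml10_text` are hypothetical, and it checks only the `Char` production, not tag names or escaping.

```python
def is_xml10_char(cp: int) -> bool:
    """True if code point cp matches the XML 1.0 Char production."""
    return (
        cp in (0x9, 0xA, 0xD)
        or 0x20 <= cp <= 0xD7FF
        or 0xE000 <= cp <= 0xFFFD
        or 0x10000 <= cp <= 0x10FFFF
    )


def is_xml10_text(s: str) -> bool:
    """True if every code point in s is an XML 1.0 Char."""
    return all(is_xml10_char(ord(ch)) for ch in s)


print(is_xml10_text("tab\there"))  # True: U+0009 is allowed
print(is_xml10_text("bell\x07"))   # False: U+0007 is a C0 control
```

Lone surrogates (which a Python `str` can hold) and U+FFFE/U+FFFF are rejected because they fall outside every range.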
The set of allowed characters in XML tag names is even more restricted.
The set of allowed character sequences in RDF literals is defined as "being a Unicode [UNICODE] string, which SHOULD be in Normal Form C [NFC]". Unicode code points run from U+0000 to U+10FFFF (minus the 66 noncharacters, depending on your point of view).
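The figure of 66 comes from Unicode's noncharacters: the 32 code points U+FDD0..U+FDEF plus the last two code points of each of the 17 planes (U+xxFFFE and U+xxFFFF). A small sketch that counts them (the helper name `is_noncharacter` is hypothetical):

```python
def is_noncharacter(cp: int) -> bool:
    """True if cp is one of Unicode's 66 noncharacters."""
    # U+FDD0..U+FDEF, or a code point whose low 16 bits are FFFE or FFFF
    return 0xFDD0 <= cp <= 0xFDEF or (cp & 0xFFFE) == 0xFFFE


noncharacters = [cp for cp in range(0x110000) if is_noncharacter(cp)]
print(len(noncharacters))  # 66 (32 + 17 * 2)
```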
Either way, the set of Unicode characters allowed in RDF literals includes characters explicitly forbidden in XML. See also the SO question Why are "control" characters illegal in XML 1.0?. In XML 1.1 the set of characters was broadened to:
Char ::= [#x1-#xD7FF] | [#xE000-#xFFFD] | [#x10000-#x10FFFF]
/* any Unicode character, excluding the surrogate blocks, FFFE, and FFFF. */
Even so, the NUL character (U+0000), for example, still cannot be expressed in XML.
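To make the gap concrete, here is a minimal sketch that applies the XML 1.1 `Char` production to a string that is a perfectly valid Unicode string (and therefore a valid RDF literal value) yet contains U+0000. The names `is_xml11_char` and `first_unrepresentable` are hypothetical, and only the `Char` production is checked.

```python
def is_xml11_char(cp: int) -> bool:
    """True if code point cp matches the XML 1.1 Char production."""
    return (
        0x1 <= cp <= 0xD7FF
        or 0xE000 <= cp <= 0xFFFD
        or 0x10000 <= cp <= 0x10FFFF
    )


def first_unrepresentable(s: str):
    """Index of the first character outside the XML 1.1 Char production, or None."""
    for i, ch in enumerate(s):
        if not is_xml11_char(ord(ch)):
            return i
    return None


literal = "abc\x00def"  # a legal Unicode string, hence a legal RDF literal value
print(first_unrepresentable(literal))  # 3: the NUL character
print(is_xml11_char(0x07))             # True in XML 1.1 (not a Char in XML 1.0)
```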