.NET DataSet.GetXml() - what's the default encoding?

An existing app passes XML to a sproc in SQL Server 2000; the input parameter's data type is TEXT. The XML is derived from DataSet.GetXml(), but I notice it doesn't specify an encoding.

So when the user sneaks an inappropriate character into the dataset, specifically character 146 (which appears to be an apostrophe) instead of ASCII 39 (single quote), the sproc fails.

One approach is to prefix the result of GetXml() with

<?xml version="1.0" encoding="ISO-8859-1"?>

It works in this case, but what would be a more correct approach to ensure the sproc does not crash (if other unforeseen characters pop up)?

PS. I suspect the user is typing text into MS Word or a similar editor, then copying and pasting it into the input fields of the app. I would probably want to allow the user to continue working this way; I just need to prevent the crashes.

EDIT: I am looking for answers that confirm or deny a few aspects. For example:
- As per the title, what's the default encoding if none is specified in the XML?
- Is ISO-8859-1 the right encoding to use?
- Is there a better encoding that would encompass more characters in the English-speaking world and thus be less likely to cause an error in the sproc?
- Would you filter at the app's UI level for standard ASCII (0 to 127 only), and not allow extended ASCII?
- Any other pertinent details.

Best answer

DataSet.GetXml() returns a string. In .NET, strings are internally encoded using UTF-16, but that is not really relevant here.

The reason why there's no <?xml encoding=...> declaration in the string is that the declaration is only useful or needed to parse XML in a byte stream. A .NET string is not a byte stream, it's just text with well-defined codepoint semantics (which is Unicode), so it is not needed there.

If there is no XML encoding declaration, an XML parser is to assume UTF-8 in the absence of a BOM. In your case, however, that is also entirely irrelevant, since the problem is not with an XML parser (XML isn't parsed by SQL Server when it's stored in a TEXT column). The problem is that your XML contains some Unicode characters, and TEXT is a non-Unicode SQL type.

You can encode a string to any encoding using the Encoding.GetBytes() method.
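
As a rough illustration of that point, here is a minimal sketch, assuming the troublesome character is U+2019 (the curly apostrophe Word inserts, which is byte 146 in Windows-1252). The sample string and the class name are made up for the example. It encodes the string as UTF-8, then asks a strict ISO-8859-1 encoder for the same text so that an unrepresentable character raises an exception instead of being silently substituted.

```csharp
using System;
using System.Text;

class EncodingDemo
{
    static void Main()
    {
        // Hypothetical input: U+2019 RIGHT SINGLE QUOTATION MARK between "user" and "s".
        string text = "user\u2019s";

        // UTF-8 can represent every Unicode character; U+2019 becomes the bytes E2 80 99.
        byte[] utf8 = Encoding.UTF8.GetBytes(text);
        Console.WriteLine(BitConverter.ToString(utf8));

        // Request ISO-8859-1 with exception fallbacks so that characters
        // outside the encoding are reported rather than replaced.
        Encoding strictLatin1 = Encoding.GetEncoding(
            "ISO-8859-1",
            EncoderFallback.ExceptionFallback,
            DecoderFallback.ExceptionFallback);

        try
        {
            strictLatin1.GetBytes(text);
            Console.WriteLine("Representable in ISO-8859-1");
        }
        catch (EncoderFallbackException ex)
        {
            // CharUnknown is the character the encoder could not map.
            Console.WriteLine("Not representable: U+" + ((int)ex.CharUnknown).ToString("X4"));
        }
    }
}
```

A check like this could run before the XML is handed to the sproc, so the app can decide whether to reject, replace, or escape the character instead of letting the call fail.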

Other answers

I believe your approach should be to use WriteXml instead of GetXml. That should allow you to specify the encoding.

However, note that you will have to write through an intermediate stream: if you output directly to a string, it will always use UTF-16. Since you are using a TEXT column, a UTF-16 string can carry characters that are not valid for TEXT.
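
A minimal sketch of that idea, assuming an XmlWriter over a MemoryStream is an acceptable intermediate stream. The ToXml helper, the DataSet contents, and the choice of ISO-8859-1 are illustrative only, not something the original app is known to use.

```csharp
using System;
using System.Data;
using System.IO;
using System.Text;
using System.Xml;

class WriteXmlDemo
{
    // Hypothetical helper: serializes the DataSet through a byte stream
    // so that the requested encoding is actually applied.
    static string ToXml(DataSet ds, Encoding encoding)
    {
        var settings = new XmlWriterSettings { Encoding = encoding, Indent = true };
        using (var stream = new MemoryStream())
        {
            // The writer leaves the stream open on dispose (CloseOutput defaults to false).
            using (XmlWriter writer = XmlWriter.Create(stream, settings))
            {
                ds.WriteXml(writer);
            }

            // Decode the bytes back into a string for display or for a string parameter.
            stream.Position = 0;
            using (var reader = new StreamReader(stream, encoding))
            {
                return reader.ReadToEnd();
            }
        }
    }

    static void Main()
    {
        // Made-up sample data containing a curly apostrophe (U+2019).
        var ds = new DataSet("Notes");
        DataTable table = ds.Tables.Add("Note");
        table.Columns.Add("Body", typeof(string));
        table.Rows.Add("user\u2019s text");

        Console.WriteLine(ToXml(ds, Encoding.GetEncoding("ISO-8859-1")));
    }
}
```

With a restricted encoding such as ISO-8859-1, I would expect the writer to emit characters it cannot encode as numeric character references rather than raw bytes, but that is worth confirming against your own data before relying on it.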




