You can use any encoding you want. However, from your question it sounds like you're typically using UTF-8 but sometimes receiving data in a different encoding (e.g., Internet Explorer tends to send data to the web server as ISO-8859-1).
If you're going to serve UTF-8 encoded text and you get non-UTF-8 text from somewhere, you have to convert it to UTF-8 before you send it down the line. A good practice is to automatically sanitize all data received from the web browser and re-encode it as UTF-8. Unfortunately, the browser doesn't always tell you what encoding it's using; if none is supplied, you can probably assume it's UTF-8 or ISO-8859-1.
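As a rough illustration of the "assume UTF-8 or ISO-8859-1" fallback, the sketch below keeps input that is already valid UTF-8 and treats everything else as ISO-8859-1. The function name assume_utf8_or_latin1 is made up for this example, and it assumes the mbstring extension is available; it is one possible reading of the advice above, not the only way to do it.

```php
<?php
// Hypothetical helper (the name is an assumption, not a built-in PHP function).
// Requires the mbstring extension.
function assume_utf8_or_latin1(string $text): string
{
    // Valid UTF-8 (which includes plain ASCII) is returned untouched,
    // so already-correct input is never double-encoded.
    if (mb_check_encoding($text, 'UTF-8')) {
        return $text;
    }
    // Anything else is assumed to be ISO-8859-1, where each byte
    // maps directly to one code point.
    return mb_convert_encoding($text, 'UTF-8', 'ISO-8859-1');
}

$fromLatin1 = assume_utf8_or_latin1("caf\xE9");      // ISO-8859-1 bytes for "café"
$fromUtf8   = assume_utf8_or_latin1("caf\xC3\xA9");  // UTF-8 bytes for "café"
```

Both calls end up with the same UTF-8 bytes. The trade-off is that text in some third encoding (Windows-1252, Shift_JIS, and so on) would be silently misread as ISO-8859-1.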
If you're using a server-side language, you'll want to look into how to convert encodings with that language. For example, PHP has the iconv() function, and a very nice function, mb_detect_encoding($text), which does a pretty decent job of guessing the encoding of a given bit of data when you don't already know it.
Something like this would be in order (presuming PHP on the server side):
$text = iconv(mb_detect_encoding($text), 'UTF-8', $text);
Do this with all user input before you do anything else with it (e.g., use array_map to convert user inputs automatically):
function convert_to_utf8($text) {
    return iconv(mb_detect_encoding($text), 'UTF-8', $text);
}
$_GET = array_map('convert_to_utf8', $_GET);
$_POST = array_map('convert_to_utf8', $_POST);
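One caveat with a flat array_map: request parameters such as tags[]=a&tags[]=b arrive as nested arrays, and passing an array to iconv() fails. A minimal sketch of a recursive variant follows. The names map_strings_deep and $latin1ToUtf8 are invented for this example, and the converter shown is only a stand-in (it assumes mbstring and an ISO-8859-1 fallback); any string-to-string conversion function could be passed instead.

```php
<?php
// Hypothetical helper: applies $convert to every string value in a
// possibly nested array, leaving keys and non-string values as they are.
function map_strings_deep(array $input, callable $convert): array
{
    $output = [];
    foreach ($input as $key => $value) {
        if (is_array($value)) {
            // Recurse into nested parameters such as tags[]=...
            $output[$key] = map_strings_deep($value, $convert);
        } elseif (is_string($value)) {
            $output[$key] = $convert($value);
        } else {
            $output[$key] = $value;
        }
    }
    return $output;
}

// Stand-in converter for the demonstration (an assumption, requires mbstring):
// keep valid UTF-8, treat anything else as ISO-8859-1.
$latin1ToUtf8 = function (string $text): string {
    return mb_check_encoding($text, 'UTF-8')
        ? $text
        : mb_convert_encoding($text, 'UTF-8', 'ISO-8859-1');
};

// Sample data shaped like a $_GET array with a nested parameter.
$sample = ['name' => "caf\xE9", 'tags' => ["na\xEFve", 'plain']];
$clean  = map_strings_deep($sample, $latin1ToUtf8);
```

In a real request handler the same call would be applied to $_GET and $_POST. Array keys are left unconverted here, which is usually fine because they come from your own form field names.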
Better yet, determine whether the browser is supplying an encoding (for example, in the charset parameter of the Content-Type request header), and use that as the first argument to iconv() instead of the result of mb_detect_encoding().
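A sketch of that idea, assuming the encoding (when present) shows up as a charset parameter in the request's Content-Type header, which PHP typically exposes as $_SERVER['CONTENT_TYPE']. Both function names are made up for this example, and the fallback for a missing charset is the same UTF-8-or-ISO-8859-1 assumption discussed earlier rather than anything the browser guarantees.

```php
<?php
// Hypothetical helper: extracts the charset parameter from a Content-Type
// header value, or returns null when none is declared.
function charset_from_content_type(?string $contentType): ?string
{
    if ($contentType === null) {
        return null;
    }
    if (preg_match('/;\s*charset\s*=\s*"?([A-Za-z0-9._:-]+)"?/i', $contentType, $m)) {
        return strtoupper($m[1]);
    }
    return null;
}

// Hypothetical helper: converts $text to UTF-8 using the declared charset
// when there is one, and a simple guess otherwise. Requires iconv and mbstring.
function convert_with_declared_charset(string $text, ?string $contentType): string
{
    $from = charset_from_content_type($contentType);
    if ($from === null) {
        // Assumption: undeclared input is either UTF-8 or ISO-8859-1.
        $from = mb_check_encoding($text, 'UTF-8') ? 'UTF-8' : 'ISO-8859-1';
    }
    // The header is client-supplied, so the charset name may be unknown to
    // iconv or may not match the bytes; iconv() returns false in those cases.
    $converted = @iconv($from, 'UTF-8', $text);
    return $converted === false ? $text : $converted;
}

$header = 'application/x-www-form-urlencoded; charset=ISO-8859-1';
$result = convert_with_declared_charset("caf\xE9", $header);
```

In a request handler the second argument would be something like $_SERVER['CONTENT_TYPE'] ?? null. Returning the original text when conversion fails is a deliberate simplification; rejecting the request may be the safer choice depending on the application.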