Should I support Unicode in passwords?

I would like to allow my users to use Unicode for their passwords.

However, I see that a lot of sites don't support that (e.g. Gmail, Hotmail).

So I'm wondering if there's some technical or usability issue that I'm overlooking.

I'm thinking that, if anything, it must be a usability issue, since .NET accepts Unicode by default and if Hotmail--er, the new Live mail--is built on that, I don't see why they would restrict it.

Has anyone encountered similar issues?

Answers

I am sure there is no technical problem, but maybe Gmail and Hotmail deliberately don't support it. Websites like these have a wide audience and should be accessible from everywhere.

Imagine a user whose password is in Japanese. If they are travelling and go to a cyber cafe where there is no Japanese input support, they won't be able to log in.

Another problem is analyzing password complexity. It's not so difficult to make sure the user didn't type a common English word, but what about Chinese, Russian, or Thai? Analyzing the complexity of a password gets much harder as you add more languages.

So if you want your system to be accessible, it's better to ensure that users can type their password on every kind of device, OS, and environment. Alphanumeric passwords with the most common symbols (!<>"#$%& etc.) are a good set of characters that is available everywhere.

Generally, I am strongly in favor of not restricting what kinds of characters are allowed in passwords. However, remember that you have to compare the input to something stored, which may be the password itself or a hash. In the former case you have to make sure that the comparison is done correctly, which is much more complex with Unicode than with ASCII alone; in the latter case you have to ensure that you are hashing exactly the same byte sequence whenever the password is entered. Normalization forms may help here or be a curse, depending on who applies them.

For example, in an application I'm working on, I hash the UTF-8 encoding of the password, which is normalized beforehand to weed out potential problems with combining characters and the like.
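As a rough illustration of that approach, here is a minimal Python sketch. The answer does not say which normalization form or hash it uses, so NFC, salted PBKDF2-HMAC-SHA256, the iteration count, and the function names (`prepare`, `hash_password`, `verify_password`) are all assumptions made for this example.

```python
import hashlib
import hmac
import os
import unicodedata

ITERATIONS = 200_000  # assumption: tune this for your own hardware


def prepare(password):
    """Normalize to NFC (an assumed choice), then encode as UTF-8 bytes."""
    return unicodedata.normalize("NFC", password).encode("utf-8")


def hash_password(password, salt=None):
    """Return (salt, digest) for the normalized, UTF-8 encoded password."""
    if salt is None:
        salt = os.urandom(16)
    digest = hashlib.pbkdf2_hmac("sha256", prepare(password), salt, ITERATIONS)
    return salt, digest


def verify_password(password, salt, expected):
    """Re-hash the candidate with the stored salt and compare in constant time."""
    _, digest = hash_password(password, salt)
    return hmac.compare_digest(digest, expected)


# Stored with a precomposed n-with-tilde, entered later as "n" + combining tilde.
salt, digest = hash_password("pa\u00f1uelo")
print(verify_password("pan\u0303uelo", salt, digest))
```

Because both spellings are normalized before encoding, they produce the same bytes and therefore the same hash, which is the point of normalizing first.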

The biggest problem the user may face is that they can't enter it in some places, such as on another keyboard layout. This is already the case for one of my passwords, but it has never been a problem so far. After all, that's a decision the user has to make in choosing their password, not one the application should make on their behalf. I doubt there are users who happily use arbitrary Unicode in their passwords without thinking of the problems that may arise when using another keyboard layout. (This may be an issue for web-based services more than anything else, though.)

There are instances where Unicode is rightly forbidden, though. One such example is TrueCrypt which forces the use of the US keyboard layout for boot-time passwords (for full-volume encryption). There is no other layout there and therefore Unicode or any other keyboard layout only produces problems.

However, that doesn't explain why they forbid Unicode in normal passwords. A warning might be nice, but outright forbidding it is wrong in my eyes.

So I'm wondering if there's some technical or usability issue that I'm overlooking.

There's a technical issue with non-ASCII passwords (and usernames, for that matter) with HTTP Basic Authentication. As far as I know, the sites you mentioned don't generally use Basic Authentication, but it might be a hangover from systems that do.

The HTTP Basic Authentication standard defines a base64-encoded username:password token. This means if you have a colon in the username or password the results are ambiguous. Also, base64-decoding the token gives you only bytes, with no direction of how to convert those bytes to characters. And guess what? The different browsers use different encodings to do it.

  • Opera and Chrome use UTF-8.

  • IE uses the client system's default code page (which is of course never UTF-8) and mangles characters that don't fit in it using the Windows standard Try To Find A Character That Looks A Bit Like It, Or Maybe Just Not (Who Cares) algorithm.

  • Safari uses ISO-8859-1, and silently refuses to send any auth token at all when the username or password has characters that don't fit.

  • Mozilla takes the lowest 8 bits of the code point (similar to ISO-8859-1, but more broken). See bug 41489 for tortuous discussion with no outcome or progress.

So if you allow non-ASCII usernames or passwords then the Basic Authentication process will be at best complicated and inconsistent, with users wondering why it randomly works or fails when they use different computers or browsers.
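To make the encoding problem concrete, here is a small Python sketch that builds the Basic Authentication token for the same credentials under two of the encodings mentioned above. The helper name `basic_token` and the sample credentials are made up for illustration; this is not how any particular browser is implemented.

```python
import base64


def basic_token(username, password, encoding):
    """Build the base64 'username:password' token using the given text encoding."""
    raw = f"{username}:{password}".encode(encoding)
    return base64.b64encode(raw).decode("ascii")


# Same credentials, two client-side encodings (the password contains a-umlaut).
utf8_token = basic_token("user", "p\u00e4ss", "utf-8")
latin1_token = basic_token("user", "p\u00e4ss", "iso-8859-1")

print(utf8_token)
print(latin1_token)
print(utf8_token == latin1_token)  # False: the server sees different bytes

# A server that assumes ISO-8859-1 but receives the UTF-8 token gets mojibake.
print(base64.b64decode(utf8_token).decode("iso-8859-1"))
```

With a pure ASCII password both encodings yield identical bytes, which is why the problem only surfaces once non-ASCII characters are allowed.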

No. Restrict passwords to ASCII characters.

When you input a password, bullets are displayed to conceal the password.

But when you input Japanese and other languages, you must go through an input method, converting the keystrokes into the desired characters. This requires you to see what the characters are.

I support Unicode passwords in all of my web applications. If using a recent browser the visitor can use any code point in their preferred or native scripts.

For enhanced security I store a salted hash rather than using reversible encryption.

The important thing is to correctly normalize and encode the password string before adding the byte sequence to the hash (I prefer UTF-8 for endian independence).

Unicode sucks if you have to do programmatic matching. The "minus sign" and "dash" look the same, but might be separate codes. "n with a funny tilde over it" might be one letter, or a diacritic and a letter.

If people use different encoding methods, then their passwords might not match, even though the passwords look the same. See omg-ponies aka humanity=epic fail.
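A short Python sketch of both cases, using the standard `unicodedata` module. The helper name `same_after` is made up for this example. It shows that normalization reconciles the two spellings of "n with a tilde", but does not make a minus sign equal to an ASCII hyphen.

```python
import unicodedata


def same_after(form, a, b):
    """Compare two strings after applying the given Unicode normalization form."""
    return unicodedata.normalize(form, a) == unicodedata.normalize(form, b)


composed = "\u00f1"      # n-with-tilde as a single code point
decomposed = "n\u0303"   # "n" followed by COMBINING TILDE

print(composed == decomposed)                  # False: different code points
print(same_after("NFC", composed, decomposed)) # True: normalization unifies them

hyphen = "\u002d"        # HYPHEN-MINUS, the ASCII "-"
minus = "\u2212"         # MINUS SIGN

print(same_after("NFKC", hyphen, minus))       # False: still distinct characters
```

So normalization handles the combining-character case, while look-alike characters remain different passwords as far as the hash is concerned.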

You can normalize, but what happens when:

  • the normalization rules change
  • you have some users with diacritics in their password
  • you have some users with combined letters in their password
  • the passwords are hashed, so you can't change the passwords

Guess what - you need to force a password reset on some of your users.

Good idea.

It makes passwords stronger and gives users more freedom. It is also already done by Windows (since at least Windows 2000), Active Directory and LDAP, and Novell (since at least 2004).

Some customers want it (http://mailman.mit.edu/pipermail/kerberos/2008-July/013923.html), and there is even a standard on how to do it right (https://www.rfc-editor.org/rfc/rfc8265, which supersedes https://www.rfc-editor.org/rfc/rfc4013; thanks John).
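For a feel of what RFC 8265 asks for, here is a simplified Python approximation of the preparation steps in its OpaqueString profile for passwords: map non-ASCII spaces to U+0020, apply NFC, and reject empty input. The function name `opaque_string_sketch` is made up, the control-character and unassigned-code-point check covers only part of what the RFC disallows, and the result depends on the Unicode version bundled with your Python. Treat it as a sketch, not a conforming implementation.

```python
import unicodedata


def opaque_string_sketch(password):
    """Approximate RFC 8265 OpaqueString preparation (incomplete, for illustration)."""
    # Map every space separator (general category Zs) to an ASCII space.
    mapped = "".join(
        " " if unicodedata.category(ch) == "Zs" else ch for ch in password
    )
    # Normalize to NFC; no case folding and no width mapping are applied.
    normalized = unicodedata.normalize("NFC", mapped)
    if not normalized:
        raise ValueError("password must not be empty")
    for ch in normalized:
        # Partial check only: reject control characters and unassigned code points.
        if unicodedata.category(ch) in ("Cc", "Cn"):
            raise ValueError(f"disallowed code point U+{ord(ch):04X}")
    return normalized


print(opaque_string_sketch("correct\u00a0horse") == "correct horse")
```

The output of a function like this is what you would then encode as UTF-8 and feed to the password hash.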

I'm sure that the multilingual counterparts of those sites do support Unicode. It sounds like a user requirements issue rather than a technical challenge.

I would not be surprised if there is a technical issue with the server not being certain of the encoding the client is sending the password in.

However, I would guess that, say, sites with mainly native-speaking Japanese, Chinese or Russian audiences would use the commonly used respective non-ASCII character set (Big5, EUC-KR, koi8, etc.) for passwords. Maybe you can research what they are doing to cope with older web clients using any of the non-Unicode stuff.

With HTML5 and the ability to send a font to your users, you can integrate a visual keyboard into your system, so users will be able to type in your language.

Hint: use the DejaVu font and modify it with FontForge to make it smaller; then, with a visual JavaScript keyboard, you can make it possible ;)

Look here, it is a project where I did the trick.




