Unicode console I/O in Haskell on Windows

It seems fairly difficult to get console I/O to work with Unicode characters in Haskell under Windows. Here is the tale of woe:

  1. (Preliminary.) Before you even consider doing Unicode I/O in the console under Windows, you need to make sure that you're using a console font that can render the characters you want. The raster fonts (the default) have infinitely poor coverage (and don't allow copy-pasting of characters they can't represent), and the TrueType options MS provides (Consolas, Lucida Console) have not-great coverage (though these will allow copy-pasting of characters they cannot represent). You might consider installing DejaVu Sans Mono (follow the instructions at the bottom here; you may have to reboot before it works). Until this is sorted, no apps will be able to do much Unicode I/O; not just Haskell.
  2. Having done this, you will notice that some apps will be able to do Unicode console I/O under Windows. But getting it to work remains quite complicated. There are basically two ways to write to the console under Windows. (What follows is true for any language, not just Haskell; don't worry, Haskell will enter the picture in a bit!)
  3. Option A is to use the usual C-library-style, byte-based I/O functions; the hope is that the OS will interpret these bytes according to some encoding that can encode all the weird and wonderful characters you want. For instance, with the equivalent technique on Mac OS X, where the standard system encoding is usually UTF-8, this works great: you send out UTF-8 output, you see pretty symbols.
  4. On Windows, it works less well. The default encoding that Windows expects will generally not be one that covers all the Unicode symbols. So if you want to see pretty symbols this way, one way or another, you need to change the encoding. One possibility is for your program to call the SetConsoleCP Win32 function (or SetConsoleOutputCP, which sets the output code page), which means you need to bind to the Win32 library. Or, if you'd rather not do that, you can expect your program's user to change the code page for you (they would then have to run the chcp command before they run your program).
  5. Option B is to use the Unicode-aware Win32 console API functions like WriteConsoleW. Here you send UTF-16 directly to Windows, which renders it happily: there's no danger of an encoding mismatch because Windows always expects UTF-16 with these functions.
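To make the difference between the two options concrete, here is a minimal, portable Haskell sketch that uses GHC's `GHC.Foreign.withCStringLen`, which marshals a `String` under an explicit `TextEncoding`. It prints the UTF-8 bytes a byte-based (Option A) writer would emit next to the UTF-16LE bytes that form the code units a WriteConsoleW-style (Option B) call receives. The helper name `encodeBytes` is hypothetical, chosen for illustration only.

```haskell
-- Sketch: the bytes Option A emits (UTF-8 here) versus the UTF-16
-- code units that Option B hands to WriteConsoleW.
module Main (main) where

import Data.Word (Word8)
import Foreign.Marshal.Array (peekArray)
import Foreign.Ptr (castPtr)
import qualified GHC.Foreign as GHC
import System.IO (TextEncoding, utf16le, utf8)

-- Hypothetical helper: encode a String with the given TextEncoding
-- and return the raw bytes (no terminator, no byte-order mark).
encodeBytes :: TextEncoding -> String -> IO [Word8]
encodeBytes enc s =
  GHC.withCStringLen enc s $ \(ptr, len) -> peekArray len (castPtr ptr)

main :: IO ()
main = do
  encodeBytes utf8    "\x3bb"  >>= print  -- U+03BB (λ): [206,187]
  encodeBytes utf16le "\x3bb"  >>= print  -- one code unit 0x03BB: [187,3]
  encodeBytes utf8    "\x2200" >>= print  -- U+2200 (∀): [226,136,128]
  encodeBytes utf16le "\x2200" >>= print  -- one code unit 0x2200: [0,34]
```

Under Option A the console has to be told (via the code page) that these bytes are UTF-8; under Option B there is nothing to agree on, because the API only accepts UTF-16.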

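Unfortunately, neither of these options works very well in Haskell. First, I know of no libraries that use Option B, so that's not very easy. That leaves Option A. If you use Haskell's I/O library (putStrLn and so on), this is what the library will do. In modern versions of Haskell, it will carefully ask Windows what the current code page is and output your strings in the correct encoding. There are two problems with this approach: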

  • One is not a showstopper, but it is annoying. As mentioned above, the default encoding will almost never encode the characters you want: you need the user to change to an encoding that does. Thus your user needs to run chcp 65001 before they run your program (you may find it distasteful to force your users to do this). Or, you need to bind to SetConsoleCP and do the equivalent inside your program (and then use hSetEncoding so that the Haskell libraries will send output using the new encoding), which means you need to wrap the relevant part of the Win32 libraries to make them Haskell-visible.
  • Much more seriously, there is a bug in Windows (resolution: won't fix) which leads to a bug in Haskell, which means that if you have selected any code page like cp65001 that can cover all of Unicode, Haskell's I/O routines will malfunction and fail. So essentially, even if you (or your user) set the encoding properly to one that covers all the wonderful Unicode characters, and then do everything right in telling Haskell to output things using that encoding, you still lose.
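For reference, the in-program variant of Option A from the first bullet might look roughly like the following sketch. It assumes the `Win32` package (bundled with GHC on Windows) and its `System.Win32.Console` bindings `setConsoleCP` and `setConsoleOutputCP`; on other platforms the CPP guard skips the code-page calls. With the GHC and Windows versions discussed here, this is precisely the configuration that runs into the cp65001 bug from the second bullet, so it illustrates the mechanism rather than offering a fix.

```haskell
{-# LANGUAGE CPP #-}
-- Sketch of Option A: switch the console to code page 65001 (UTF-8)
-- and tell the GHC I/O library to encode stdout as UTF-8.
module Main (main) where

import System.IO

#if defined(mingw32_HOST_OS)
-- Assumption: Win32 package, which ships with GHC on Windows.
import System.Win32.Console (getConsoleOutputCP, setConsoleCP, setConsoleOutputCP)
#endif

main :: IO ()
main = do
  -- Which encoding did GHC derive from the current code page or locale?
  hGetEncoding stdout >>= print
#if defined(mingw32_HOST_OS)
  getConsoleOutputCP >>= \cp -> putStrLn ("output code page: " ++ show cp)
  -- chcp 65001 changes both the input and the output code page.
  setConsoleCP 65001
  setConsoleOutputCP 65001
#endif
  -- Make the stdout Handle emit UTF-8 bytes to match the new code page.
  hSetEncoding stdout utf8
  putStrLn "\x3bb x. \x2200 a. a \x2192 a"
```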

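The above bug remains unresolved and is listed as low priority; the basic conclusion is that Option A (in my classification above) is unworkable, and one needs to move to Option B to get reliable results. It is unclear what the timeframe for resolving this will be, as it looks like a considerable amount of work.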
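As a rough illustration of what moving to Option B involves, the sketch below binds `GetStdHandle` and `WriteConsoleW` directly through the FFI instead of going through the GHC `Handle` layer. The bindings, the `putStrConsole` name and the fallback policy are hypothetical choices made for this sketch, not an existing library API; it relies on `Foreign.C.String.withCWStringLen` producing UTF-16 on Windows, where `wchar_t` is 16 bits, and it has not been tested against the GHC versions discussed here.

```haskell
{-# LANGUAGE CPP #-}
{-# LANGUAGE ForeignFunctionInterface #-}
-- Sketch of Option B: write straight to the Windows console with
-- WriteConsoleW, falling back to ordinary hPutStr elsewhere.
module Main (main) where

import System.IO (hFlush, hPutStr, hSetEncoding, stdout, utf8)

#if defined(mingw32_HOST_OS)
import Data.Word (Word32)
import Foreign.C.String (withCWStringLen)
import Foreign.C.Types (CInt (..), CWchar)
import Foreign.Marshal.Alloc (alloca)
import Foreign.Ptr (Ptr, nullPtr)

type HANDLE = Ptr ()
type DWORD  = Word32

-- Hand-written bindings (hypothetical, not from any library).
-- GHC treats stdcall as ccall on 64-bit Windows.
foreign import stdcall unsafe "windows.h GetStdHandle"
  c_GetStdHandle :: DWORD -> IO HANDLE

foreign import stdcall unsafe "windows.h WriteConsoleW"
  c_WriteConsoleW :: HANDLE -> Ptr CWchar -> DWORD -> Ptr DWORD -> Ptr () -> IO CInt

-- STD_OUTPUT_HANDLE is (DWORD)-11.
stdOutputHandle :: DWORD
stdOutputHandle = 0xFFFFFFF5

putStrConsole :: String -> IO ()
putStrConsole s = do
  -- Flush buffered Haskell output first so the ordering is preserved.
  hFlush stdout
  h <- c_GetStdHandle stdOutputHandle
  -- On Windows a CWString is UTF-16, so len counts UTF-16 code units,
  -- which is the unit WriteConsoleW expects.
  ok <- withCWStringLen s $ \(buf, len) ->
          alloca $ \written ->
            c_WriteConsoleW h buf (fromIntegral len) written nullPtr
  -- WriteConsoleW fails when stdout is not a console (for example when
  -- redirected to a file or pipe); fall back to the normal Handle path.
  if ok /= 0 then return () else hPutStr stdout s
#else
-- Non-Windows: byte-based output through the Handle is fine here.
putStrConsole :: String -> IO ()
putStrConsole = hPutStr stdout
#endif

main :: IO ()
main = do
#if !defined(mingw32_HOST_OS)
  hSetEncoding stdout utf8
#endif
  putStrConsole "\x3bb x. \x2200 a. a \x2192 a\n"
```

The fallback matters: WriteConsoleW only works when the handle really is a console, so output redirected to a file or pipe still goes through the ordinary Handle and therefore through the code-page encoding.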

The question is: in the meantime, can anyone suggest a workaround that allows Unicode console I/O in Haskell under Windows?

See also this Python bug tracker database entry (http://bugs.python.org/issue1602), grappling with the same problem in Python 3 (a fix has been proposed, but not yet accepted into the codebase), and