How to avoid garbled text when outputting Chinese characters

Question

I found a strange problem when writing Chinese characters to a file with writeFile.

> writeFile "r.txt" "过"  -- output as expected.

> writeFile "r.txt" "图"  -- not displayed as expected.

Then something strange happened:

> writeFile "r.txt" "图画"  -- output is normal.

More examples of garbled output:

> writeFile "r.txt" "士"
> writeFile "r.txt" "十"
> writeFile "r.txt" "千"
> writeFile "r.txt" "一"
> writeFile "r.txt" "一千十士图" -- This is displayed as garbled text.

But the following is fine:

> writeFile "r.txt" "一千十士图画" -- This is normal.

So the characters above come out fine when they are written together with other characters that display normally, e.g. writeFile "r.txt" "十过".

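I don't understand why this happens:

- Why are some characters output as garbled text while others are not? In fact, "一千十士图" are among the most commonly used Chinese characters.

- Why do the characters that come out garbled on their own display normally when written together with other characters that display normally?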

I would be grateful if anyone could shed some light on this.

Answer 1

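First of all, this is a good question. Even today, encoding is still a problem. Windows uses UTF-16 by default now, while most of Haskell was developed on UTF-8 platforms. The actual encoding used by the System.IO functions is not explicitly defined at run time: it is taken from the platform environment (the locale, or the active code page on Windows), or chosen arbitrarily if the environment does not set one.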
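Because the default comes from the environment, one way to take it out of the equation is to open the handle yourself and set its encoding explicitly with `hSetEncoding`. The sketch below assumes a hypothetical helper name `writeFileWith` and reuses the question's file name `r.txt`; it also prints `localeEncoding`, the default System.IO would otherwise apply.

```haskell
import System.IO

-- Hypothetical helper: write a String to a file with an explicitly chosen
-- encoding instead of the locale-derived default that writeFile would use.
writeFileWith :: TextEncoding -> FilePath -> String -> IO ()
writeFileWith enc path s =
  withFile path WriteMode $ \h -> do
    hSetEncoding h enc -- override the default encoding for this handle only
    hPutStr h s

main :: IO ()
main = do
  -- The encoding System.IO applies to new handles by default. On Windows it
  -- typically comes from the active code page (e.g. CP936 on Chinese Windows).
  print localeEncoding
  writeFileWith utf8 "r.txt" "一千十士图"
```

If the file is then opened in an editor that guesses the encoding, passing `utf8_bom` instead of `utf8` writes a byte-order mark first, which many Windows editors use to recognise UTF-8.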

The first thing to do is switch to Data.Text for text handling. Not only is it more aware of encodings, it is also considerably more efficient than the "list of characters" model that String uses. It has its own I/O functions, which also take a specific encoding into account.
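Note that `Data.Text.IO` still follows the handle's encoding, just like System.IO. A minimal sketch of one way to make the bytes on disk independent of the locale, assuming the `text` and `bytestring` packages that ship with GHC: encode explicitly with `Data.Text.Encoding.encodeUtf8` and write a ByteString. The helper names `writeUtf8File` and `readUtf8File` are hypothetical.

```haskell
import qualified Data.ByteString as BS
import qualified Data.Text as T
import qualified Data.Text.Encoding as TE

-- Hypothetical helper: encode Text to UTF-8 bytes ourselves, so the file
-- contents do not depend on the locale or code page the program runs under.
writeUtf8File :: FilePath -> T.Text -> IO ()
writeUtf8File path = BS.writeFile path . TE.encodeUtf8

-- Hypothetical helper: read the raw bytes back and decode them as UTF-8
-- (decodeUtf8 throws if the bytes are not valid UTF-8).
readUtf8File :: FilePath -> IO T.Text
readUtf8File path = TE.decodeUtf8 <$> BS.readFile path

main :: IO ()
main = do
  let s = T.pack "一千十士图"
  writeUtf8File "r.txt" s
  s' <- readUtf8File "r.txt"
  print (s == s') -- the text survives the round trip unchanged
```

The design choice here is to treat files as bytes and do the text/bytes conversion in one explicit place, rather than relying on whatever encoding a handle was given.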

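To make this easier, enabling OverloadedStrings will help a lot. Also, since you are using string literals, it is worth checking that the source file's encoding matches the environment GHC compiles it in. There are many places where this data gets processed, and past a certain point it ends up being less of a headache to load your strings from a known-good file than to put them in the source file.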
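A sketch of both suggestions, under stated assumptions: OverloadedStrings lets a literal be used directly as Text, and a hypothetical helper `loadStrings` reads one string per line from a UTF-8 file (`strings.txt` is an assumed name; `main` creates it only so the example runs on its own).

```haskell
{-# LANGUAGE OverloadedStrings #-}

import qualified Data.ByteString as BS
import qualified Data.Text as T
import qualified Data.Text.Encoding as TE

-- With OverloadedStrings this literal is a Text, not a String.
-- GHC reads source files as UTF-8, so the .hs file must be saved as UTF-8.
greeting :: T.Text
greeting = "图画"

-- Hypothetical helper: load one string per line from a UTF-8 file
-- instead of embedding the strings in the source code.
loadStrings :: FilePath -> IO [T.Text]
loadStrings path = T.lines . TE.decodeUtf8 <$> BS.readFile path

main :: IO ()
main = do
  -- Create the assumed "known-good" file so this example is self-contained.
  BS.writeFile "strings.txt" (TE.encodeUtf8 (T.unlines [greeting, "过"]))
  xs <- loadStrings "strings.txt"
  print (map T.length xs) -- number of characters on each line
```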

