How to upgrade from R16B to 17? list_to_binary breaks if there are Chinese characters inside

We are using R16B03-1 and are trying to upgrade to R17.

`iolist_to_binary` and `list_to_binary` break (they raise `badarg`) when the list contains Chinese characters.

I searched on Google and found the following links, which explain the problem.

  1. http://www.erlang.org/news/71

> The default encoding of Erlang files has been changed from ISO-8859-1 to UTF-8. The encoding of XML files has also been changed to UTF-8.
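One likely reason for the breakage, assuming your source files are saved as UTF-8: R16B read them as ISO-8859-1, so a literal such as "中文" became six byte-sized code points that `list_to_binary/1` accepted, whereas R17 reads the same literal as two code points above 255. A minimal sketch of the R17 reading (the module name `encoding_demo` is hypothetical):

```erlang
%% -*- coding: utf-8 -*-
%% Hypothetical demo module. The coding comment above makes the compiler
%% read this file as UTF-8, which is the default from OTP 17 onwards.
-module(encoding_demo).
-export([literal_info/0]).

%% Returns the length and the elements of a string literal containing
%% Chinese characters. Read as UTF-8, "中文" is two code points.
literal_info() ->
    S = "中文",
    {length(S), S}.
```

Here `encoding_demo:literal_info()` returns `{2, [20013, 25991]}`. Replacing the first line with `%% -*- coding: latin-1 -*-` should instead give `{6, [228,184,173,230,150,135]}`, i.e. the R16B behaviour.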

  2. http://www.erlang.org/doc/apps/stdlib/unicode_usage.html

> Only if a string contains code points < 256, can it be directly converted to a binary by using i.e. erlang:iolist_to_binary/1 or can be sent directly to a port. If the string contains Unicode characters > 255, an encoding has to be decided upon and the string should be converted to a binary in the preferred encoding using unicode:characters_to_binary/{1,2,3}. Strings are not generally lists of bytes, as they were before Erlang/OTP R13. They are lists of characters. Characters are not generally bytes, they are Unicode code points.
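The rule in the quoted paragraph can be checked directly. A minimal sketch with hypothetical module and function names: `fits_latin1/1` tests whether every element is below 256, and `demo/0` shows `list_to_binary/1` failing on the code points of "中文" while `unicode:characters_to_binary/1` encodes them as UTF-8.

```erlang
-module(latin1_check).
-export([fits_latin1/1, demo/0]).

%% True when every element is an integer in 0..255, the only case in which
%% a flat character list can be turned into a binary directly.
fits_latin1(String) when is_list(String) ->
    lists:all(fun(C) -> is_integer(C) andalso C >= 0 andalso C =< 255 end,
              String).

demo() ->
    S = [20013, 25991],  %% the code points of "中文"
    Direct = try list_to_binary(S) of
                 Bin -> {ok, Bin}
             catch
                 error:badarg -> {error, badarg}
             end,
    %% characters_to_binary/1 encodes the code points as UTF-8.
    Utf8 = unicode:characters_to_binary(S),
    {fits_latin1(S), Direct, Utf8}.
```

Calling `latin1_check:demo()` returns `{false, {error, badarg}, <<228,184,173,230,150,135>>}`.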

My question is: do we have to change every `list_to_binary` call to `unicode:characters_to_binary`?

Thanks

From http://www.erlang.org/doc/man/unicode.html:

> Other Unicode encodings than integers representing codepoints or UTF-8 in binaries are referred to as "external encodings". The ISO-latin-1 encoding is in binaries and lists referred to as latin1-encoding.
>
> It is recommended to only use external encodings for communication with external entities where this is required. When working inside the Erlang/OTP environment, it is recommended to keep binaries in UTF-8 when representing Unicode characters.

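You do not need to change `list_to_binary` to `unicode:characters_to_binary` everywhere. You need it only where you interface with the outside world and are not sure whether the string is represented in UTF-8 (or know for sure that the encoding is not UTF-8). After the conversion, you can use the standard BIFs.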
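Keeping UTF-8 inside the system and converting only at the edges could look like the following minimal sketch. The module `boundary` and its functions are hypothetical; only `unicode:characters_to_binary/3` is the real OTP call, taking the data, its input encoding and the desired output encoding.

```erlang
-module(boundary).
-export([from_external/2, to_external/2]).

%% Hypothetical boundary helpers: everything inside the system is kept as
%% UTF-8 binaries; conversion happens only where data enters or leaves.
%% Encoding is whatever the external side uses, e.g. latin1 or {utf16, big}.

%% Decode bytes received from an external entity into UTF-8.
from_external(Bin, Encoding) when is_binary(Bin) ->
    convert(Bin, Encoding, utf8).

%% Encode an internal UTF-8 binary for an external entity.
to_external(Utf8, Encoding) when is_binary(Utf8) ->
    convert(Utf8, utf8, Encoding).

%% Wraps characters_to_binary/3 so callers get {ok, Bin} or {error, Reason}.
convert(Data, In, Out) ->
    case unicode:characters_to_binary(Data, In, Out) of
        Result when is_binary(Result) -> {ok, Result};
        {error, _Converted, _Rest} -> {error, conversion_failed};
        {incomplete, _Converted, _Rest} -> {error, incomplete_input}
    end.
```

For example, `boundary:from_external(<<233>>, latin1)` returns `{ok, <<195,169>>}`, the UTF-8 encoding of "é", and `boundary:to_external(<<195,169>>, latin1)` turns it back into `{ok, <<233>>}`.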

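Example: if a list contains a single character with code point 52974 (`[52974]`), `list_to_binary([52974]).` raises a bad-argument (`badarg`) exception. But once you do

```erlang
1> A = unicode:characters_to_binary([52974], utf8).
<<"컮">>
```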

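After this conversion, you can use the faster built-in functions in your business logic. Note that `binary_to_list(A)` returns the UTF-8 bytes of `A` (`[236,187,174]`), not the code point 52974, which is why `list_to_binary/1` accepts it: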

```erlang
2> B = binary_to_list(A).
"컮"
3> list_to_binary(B).
<<"컮">>
```
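Going the other way needs the same care: `binary_to_list/1` gives back bytes, while `unicode:characters_to_list/1` decodes a UTF-8 binary into code points. A minimal sketch (the module `roundtrip` is hypothetical) that repeats the shell steps inside a function and adds the decoding step:

```erlang
-module(roundtrip).
-export([demo/0]).

demo() ->
    A = unicode:characters_to_binary([52974], utf8),
    Bytes = binary_to_list(A),              %% UTF-8 bytes of A: [236,187,174]
    Back = list_to_binary(Bytes),           %% accepted: every byte is < 256
    Chars = unicode:characters_to_list(A),  %% decoded again: [52974]
    {A, Bytes, Back =:= A, Chars}.
```

Calling `roundtrip:demo()` returns `{<<236,187,174>>, [236,187,174], true, [52974]}`.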