Postgres invalid byte sequence for encoding "UTF8": 0xc3 0x2f

Question

I am using a payment API, and it returns some XML. For logging, I want to save the API response in my database.

One word in the API response should be "manhã", but the API returns "manh�". Other characters such as á or ç are returned correctly, so I guess this is a bug in the API.

But when I try to save it in my database, I get:

Postgres invalid byte sequence for encoding "UTF8": 0xc3 0x2f

How can I solve this?

I tried

response.encode("UTF-8") and also force_encoding, but all I get is:

Encoding::UndefinedConversionError ("\xC3" from ASCII-8BIT to UTF-8)

I need to either remove this bad character or convert it somehow.

Answer 1

You are on the right track: you should be able to solve the problem with the encode method. When the source encoding is known, you can simply use:

response.encode('UTF-8', 'ISO-8859-1')

Sometimes the source data contains invalid or undefined characters, and to avoid exceptions you can tell Ruby how to handle them:

# This will transcode the string to UTF-8 and replace any invalid/undefined characters with '' (empty string)
response.encode('UTF-8', 'ISO-8859-1', invalid: :replace, undef: :replace, replace: '')
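
As a minimal sketch of the two calls above, the snippet below runs them on made-up sample bytes. The variable names and sample strings are assumptions for illustration, not the real API response, and ISO-8859-1 is assumed as the source encoding.

```ruby
# Hypothetical sample data, not the real API response.
latin1_bytes = "manh\xE3".b    # "manhã" as ISO-8859-1 bytes, tagged ASCII-8BIT
broken_bytes = "manh\xC3/".b   # ends with 0xC3 0x2F, as in the Postgres error

# Without a source encoding, Ruby cannot map the binary byte to a character.
begin
  latin1_bytes.encode('UTF-8')
rescue Encoding::UndefinedConversionError => e
  puts "encode('UTF-8') alone raised #{e.class}"
end

# Naming the source encoding transcodes 0xE3 to the two-byte UTF-8 form of "ã".
clean = latin1_bytes.encode('UTF-8', 'ISO-8859-1')
puts clean.valid_encoding?   # true

# The option form also returns valid UTF-8 for the broken sample.
patched = broken_bytes.encode('UTF-8', 'ISO-8859-1',
                              invalid: :replace, undef: :replace, replace: '')
puts patched.valid_encoding? # true
```

In this sketch the 0xC3 byte is not dropped: ISO-8859-1 assigns a character to every byte, so it is transcoded to "Ã" and the replace options have nothing to act on. The result is valid UTF-8 that Postgres should accept, though the text itself may still look wrong.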

This is all laid out in the Ruby docs for String - check them out!

----

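Note that many people mistakenly assume force_encoding will somehow fix encoding problems. force_encoding only tags the string with the specified encoding; it does not transcode, and it does not replace or remove invalid characters. Converting between encodings requires transcoding, so that characters in one character set are represented correctly in the other.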
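
The difference can be seen by comparing bytes before and after each call. This is a small sketch on a hypothetical one-word sample, assuming the input is ISO-8859-1:

```ruby
# Hypothetical sample: "manhã" as ISO-8859-1 bytes (0xE3 is "ã"), tagged binary.
latin1 = "manh\xE3".b

# force_encoding only changes the label; the bytes stay exactly the same.
relabeled = latin1.dup.force_encoding('UTF-8')
puts relabeled.bytes == latin1.bytes   # true
puts relabeled.valid_encoding?         # false, a lone 0xE3 is not valid UTF-8

# encode rewrites the bytes so the same character is represented in UTF-8.
transcoded = latin1.encode('UTF-8', 'ISO-8859-1')
puts transcoded.bytes.inspect          # [109, 97, 110, 104, 195, 163]
puts transcoded.valid_encoding?        # true
```

The relabeled string is the kind of value that would still trigger the Postgres error, because the stored bytes are unchanged.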

As pointed out in the comments, force_encoding can be part of transcoding your string if you write response.force_encoding('ISO-8859-1').encode('UTF-8'), which is equivalent to the first encode example above.
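
A quick way to check that equivalence on a hypothetical sample (the variable names are illustrative only):

```ruby
# Hypothetical sample: "manhã" as ISO-8859-1 bytes, tagged binary.
response = "manh\xE3".b

# One step: tell encode what the source encoding is.
one_step = response.encode('UTF-8', 'ISO-8859-1')

# Two steps: label the bytes first, then transcode from that label.
# dup is used here because force_encoding modifies its receiver in place.
two_step = response.dup.force_encoding('ISO-8859-1').encode('UTF-8')

puts one_step == two_step   # true
puts two_step.encoding      # UTF-8
```

The one-step form leaves the original string untouched, which may be preferable if the raw response is still needed elsewhere.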

postgresql

ruby-on-rails

utf-8