How do I convert a Python 3 byte-string variable into a regular string?

Question

I have read in an XML email attachment with

bytes_string=part.get_payload(decode=False)

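As my variable name suggests, the payload comes in as a byte string.

I am trying to use the recommended Python 3 approach to turn this into a usable string that I can manipulate.

The example shows: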

str(b'abc','utf-8')

How can I apply the b (bytes) prefix to my variable bytes_string and use the recommended approach?

What I tried does not work:

str(bbytes_string, 'utf-8')

Answer 1

Call decode() on a bytes instance to get the text that it encodes.

text = bytes_string.decode()  # decodes as UTF-8 by default
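
A minimal sketch of the same call with the encoding spelled out. The sample payload is made up for illustration and stands in for whatever part.get_payload(...) returns; it assumes the data really is UTF-8.

```python
# Hypothetical sample payload standing in for the email attachment bytes.
bytes_string = "<note>héllo</note>".encode("utf-8")

# bytes.decode() defaults to UTF-8; naming the encoding makes the intent explicit.
text = bytes_string.decode("utf-8")

print(type(bytes_string).__name__)  # bytes
print(type(text).__name__)          # str
print(text)                         # <note>héllo</note>
```

Using a name such as text rather than str avoids shadowing the built-in str type.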

Answer 2

You almost had it right in the last line. You want

str(bytes_string, 'utf-8')

because the type of bytes_string is bytes, the same as the type of b'abc'.
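
As a rough illustration, the snippet below shows that the b prefix belongs to literal syntax only, so a variable that already holds bytes is passed as is. The value b'abc' is just a stand-in for the real payload.

```python
bytes_string = b'abc'  # stand-in for the email payload

# Both spellings decode the same bytes to the same str.
via_str = str(bytes_string, 'utf-8')
via_decode = bytes_string.decode('utf-8')

print(via_str == via_decode)               # True
print(type(bytes_string) is type(b'abc'))  # True

# Prefixing the variable name with b just creates a different, undefined name.
try:
    str(bbytes_string, 'utf-8')
    error_name = None
except NameError as exc:
    error_name = type(exc).__name__

print(error_name)  # NameError
```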

Answer 3

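Update:

To avoid having the b prefix and the quotes at the start and end:

How to convert bytes as seen to strings, even in weird situations.

Because your data may contain bytes that are not valid in the 'utf-8' encoding, it is better to use str on its own, without any other arguments: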

some_bad_bytes = b'\x02-\xdfI#)'
text = str( some_bad_bytes )[2:-1]

print(text)

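Output: \x02-\xdfI#)

If you add the 'utf-8' argument for these particular bytes, you should get an error.

As the Python 3 standard says, text is now an ordinary str and can be treated as Unicode text without concern. Note that bytes outside printable ASCII appear in it as literal escape sequences such as \x02 and \xdf, not as the characters they encoded.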
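
A small sketch of what this approach actually produces, using the same sample bytes. Calling str() without an encoding returns the repr of the bytes object, so the result is escape-sequence text and is longer than the input.

```python
some_bad_bytes = b'\x02-\xdfI#)'

# str() with no encoding gives the repr, "b'...'"; the slice drops b' and the final quote.
text = str(some_bad_bytes)[2:-1]

print(len(some_bad_bytes))  # 6
print(len(text))            # 12, because \x02 and \xdf each became four characters

# Strict UTF-8 decoding of the same bytes fails, as the answer says.
try:
    some_bad_bytes.decode('utf-8')
    strict_failed = False
except UnicodeDecodeError:
    strict_failed = True

print(strict_failed)  # True
```

This is reasonable for logging or display; it does not recover the original characters.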

Answer 4

How to filter (skip) non-UTF8 characters from array?

To address this comment in @uname01's post and the OP, ignore the errors:

Code

>>> b'\x80abc'.decode("utf-8", errors="ignore")
'abc'

Details

From the docs, here are more examples using the same errors parameter:

>>> b'\x80abc'.decode("utf-8", "replace")
'\ufffdabc'
>>> b'\x80abc'.decode("utf-8", "backslashreplace")
'\\x80abc'
>>> b'\x80abc'.decode("utf-8", "strict")
Traceback (most recent call last):
    ...
UnicodeDecodeError: 'utf-8' codec can't decode byte 0x80 in position 0:
  invalid start byte

The errors argument specifies the response when the input string can’t be converted according to the encoding’s rules. Legal values for this argument are 'strict' (raise a UnicodeDecodeError exception), 'replace' (use U+FFFD, REPLACEMENT CHARACTER), or 'ignore' (just leave the character out of the Unicode result).
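
One possible way to combine these handlers is sketched below. The helper name decode_lenient is hypothetical and not part of any library; it assumes you prefer a strict decode and only want the U+FFFD fallback when the data is invalid.

```python
def decode_lenient(data: bytes, encoding: str = "utf-8") -> str:
    """Hypothetical helper: decode strictly, fall back to U+FFFD replacement."""
    try:
        return data.decode(encoding)
    except UnicodeDecodeError:
        return data.decode(encoding, errors="replace")


raw = b'\x80abc'

# Compare the non-raising handlers on the same invalid input.
results = {
    handler: raw.decode("utf-8", errors=handler)
    for handler in ("ignore", "replace", "backslashreplace")
}
for handler, value in results.items():
    print(handler, repr(value))

print(repr(decode_lenient(raw)))     # '\ufffdabc'
print(repr(decode_lenient(b'abc')))  # 'abc'
```

Trying strict first keeps valid input untouched and makes it easy to log when the fallback was needed.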

