What happens when encode is used on str in Python?

Question

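I get the gist of unicode, encoding, and decoding. What I don't understand is why the encode function works on the str type at all. I expected it to work only on the unicode type. So my question is: how does encode behave when it is used on a str rather than on a unicode object?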

Answer 1

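Python realizes that it cannot encode a str directly, so it first tries to decode it! It uses the 'ascii' codec for that step, which fails if the str contains any byte with a value above 0x7f.

That is why you sometimes see a decode error when you are trying to encode.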
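The two-step behaviour can be approximated in Python 3, where bytes plays the role of the old str. This is only a sketch of the idea, not what the interpreter literally runs: `py2_style_encode` is a hypothetical helper name, and the 'ascii' default is assumed from the description above.

```python
def py2_style_encode(raw: bytes, encoding: str) -> bytes:
    """Hypothetical helper: mimic Python 2's str.encode() for text codecs."""
    # Step 1: the hidden decode, using the assumed default 'ascii' codec.
    text = raw.decode("ascii")
    # Step 2: the encode that the caller actually asked for.
    return text.encode(encoding)


# Pure-ASCII input survives the hidden decode step.
ascii_result = py2_style_encode(b"hi", "utf-8")

# A byte above 0x7f makes the hidden decode fail before any encoding happens.
error_name = None
failing_codec = None
try:
    py2_style_encode(b"hi\xff", "utf-8")
except UnicodeDecodeError as exc:
    error_name = type(exc).__name__
    failing_codec = exc.encoding  # names the codec used for the decode step
```

The exception comes from step 1, which is why the error names the decode codec rather than the encoding that was requested.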

Answer 2

In Python 3, encoding a byte string simply does not work.

>>> b'hi'.encode('utf-8')
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
AttributeError: 'bytes' object has no attribute 'encode'

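Python 2 tries to be helpful when you call encode on a str: it first decodes the str using sys.getdefaultencoding() (usually ascii) and then encodes the resulting string.

That is why, when you try to encode with utf-8, you get the rather strange error message that something could not be decoded with ascii.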

>>> 'hi\xFF'.encode('utf-8')
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
UnicodeDecodeError: 'ascii' codec can't decode byte 0xff in position 2: ordinal not in range(128)

Ned explains it better than I can; watch this, starting at 16:20.

Answer 3

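In Python 2 there are two kinds of codecs available: those that convert between str and unicode, and those that convert from str to str. Examples of the latter are the base64 and rot13 codecs.

The str.encode() method exists to support the latter: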

'binary data'.encode('base64')

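But since it exists, people also use it for the unicode -> str codecs. Encoding can only go from unicode to str (and decoding goes the other way). To support these codecs, Python first implicitly decodes your str value to unicode using the ASCII codec, and only then encodes it.

Incidentally, when you use a str -> str codec on a unicode object, Python first implicitly encodes the object to str using the same ASCII codec.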
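The reverse case can be sketched in Python 3 as well, with str standing in for the old unicode type. `py2_style_unicode_base64` is a hypothetical helper that only imitates the implicit ASCII encode described above; it is not how Python 2 is implemented internally.

```python
import codecs


def py2_style_unicode_base64(text: str) -> bytes:
    """Hypothetical helper: mimic Python 2's u'...'.encode('base64')."""
    # Step 1: the hidden encode to a byte string, using the ASCII codec.
    raw = text.encode("ascii")
    # Step 2: the bytes-to-bytes transform that was actually requested.
    return codecs.encode(raw, "base64")


# ASCII-only text passes through the hidden encode step.
ok_result = py2_style_unicode_base64("binary data")

# Non-ASCII text fails in the hidden step, before base64 is ever applied.
hidden_encode_failed = False
try:
    py2_style_unicode_base64("caf\u00e9")
except UnicodeEncodeError:
    hidden_encode_failed = True
```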

In Python 3, this has been solved by a) removing the bytes.encode() and str.decode() methods (remember that bytes is sort of the old str, and str is the new unicode), and b) moving the str -> str encodings to the codecs module only, using the codecs.encode() and codecs.decode() functions. Which codecs transform between the same type has also been clarified and updated; see the Python Specific Encodings section. Note that the 'text' encodings listed there, where available in Python 2, encode to str instead.
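A short Python 3 sketch of both points, using only the standard codecs module. The sample values are illustrative; the variable names are not from the original answer.

```python
import codecs

# a) The methods that allowed the confusing round trip are gone.
bytes_has_encode = hasattr(bytes, "encode")
str_has_decode = hasattr(str, "decode")

# b) Same-type transforms go through the codecs module functions instead.
encoded = codecs.encode(b"binary data", "base64")  # bytes -> bytes
decoded = codecs.decode(encoded, "base64")         # back to the original bytes

rotated = codecs.encode("hello", "rot13")          # str -> str
restored = codecs.decode(rotated, "rot13")
```

Because the input and output types of each transform are now explicit, there is no implicit ASCII step that could raise an unexpected error.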

