When creating bytes with a "b" prefix before a string, what encoding does Python use?

Question

From the Python docs:

Bytes literals are always prefixed with 'b' or 'B'; they produce an instance of the bytes type instead of the str type. They may only contain ASCII characters; bytes with a numeric value of 128 or greater must be expressed with escapes.

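I know that I can create a bytes object with a b-prefixed expression such as b'cool', which converts the Unicode string 'cool' into bytes. I also know that a bytes instance can be created with the bytes() function, but then you need to specify the encoding argument: bytes('cool', 'utf-8').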

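As I understand it, translating a string into a sequence of bytes requires an encoding rule. I did some experiments, and it seems that the b prefix converts the string to bytes using UTF-8: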

>>> a = bytes('a', 'utf-8')
>>> b'a' == a
True
>>> b = bytes('a', 'utf-16')
>>> b'a' == b
False

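My question is: when a bytes object is created with the b prefix, what encoding does Python use? Is there any documentation that specifies this? Does it default to UTF-8 or ASCII?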

Answer 1

The bytes type can hold arbitrary data. For example, (the beginning of) a JPEG image:

>>> with open('Bilder/19/01/IMG_3388.JPG', 'rb') as f:
...     head = f.read(10)

You should think of it as a sequence of integers. That is also how the type behaves in many respects:

>>> list(head)
[255, 216, 255, 225, 111, 254, 69, 120, 105, 102]
>>> head[0]
255
>>> sum(head)
1712
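As a small sketch of the "sequence of integers" view (the variable names are illustrative and not part of the original session), a bytes object can be built directly from a list of integers and converted back to it. Indexing gives an int, while slicing gives another bytes object:

```python
# The ten byte values of the JPEG header listed above, written out by hand.
values = [255, 216, 255, 225, 111, 254, 69, 120, 105, 102]

head = bytes(values)          # build a bytes object from integers in 0..255
assert list(head) == values   # and get the same integers back

first = head[0]               # indexing yields an int
prefix = head[:2]             # slicing yields a bytes object
print(type(first).__name__, type(prefix).__name__)

# Values outside 0..255 are rejected, because each item is a single byte.
try:
    bytes([256])
    out_of_range_rejected = False
except ValueError:
    out_of_range_rejected = True
```

No encoding is involved anywhere in this example; the integers are the data.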

For convenience (and, I guess, for historical reasons), the standard representation of bytes and its literal form resemble strings:

>>> head
b'\xff\xd8\xff\xe1o\xfeExif'

It uses printable ASCII characters where applicable and \xNN escapes otherwise. This is handy if the bytes object represents text:

>>> 'Zoë'.encode('utf8')
b'Zo\xc3\xab'
>>> 'Zoë'.encode('utf16')
b'\xff\xfeZ\x00o\x00\xeb\x00'
>>> 'Zoë'.encode('latin1')
b'Zo\xeb'
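The reverse direction makes the same point: a bytes object carries no record of which encoding produced it, so the caller has to supply the right one when decoding. A minimal sketch, with illustrative variable names:

```python
text = 'Zoë'
utf8_bytes = text.encode('utf-8')      # b'Zo\xc3\xab'
latin1_bytes = text.encode('latin-1')  # b'Zo\xeb'

# Decoding with the matching codec restores the original string.
assert utf8_bytes.decode('utf-8') == text
assert latin1_bytes.decode('latin-1') == text

# Decoding with the wrong codec can silently produce different text...
mojibake = utf8_bytes.decode('latin-1')

# ...or fail outright: b'Zo\xeb' is not a valid UTF-8 byte sequence.
try:
    latin1_bytes.decode('utf-8')
    decode_failed = False
except UnicodeDecodeError:
    decode_failed = True
```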

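When you type a bytes literal, Python interprets it as ASCII: each character stands for the byte with that character's ASCII code. Characters in the ASCII range are encoded the same way in UTF-8, which is why you observed that b'a' == bytes('a', 'utf8'). The expression b'a' == bytes('a', 'ascii') would be less misleading.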
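The literal rules quoted in the question can be checked directly. The sketch below is only an illustration (names such as `literal_accepted` are made up for it); it uses the built-in compile() so that the syntax error can be caught rather than stopping the script:

```python
# An ASCII-only literal holds exactly the ASCII codes of its characters.
assert b'cool' == 'cool'.encode('ascii')
assert list(b'cool') == [99, 111, 111, 108]

# Bytes with a value of 128 or more must be written as escapes.
# The escapes are raw byte values; no encoding is applied to them.
e_diaeresis_utf8 = b'\xc3\xab'
assert e_diaeresis_utf8 == 'ë'.encode('utf-8')

# A non-ASCII character inside a bytes literal is rejected at compile time.
try:
    compile("b'Zoë'", "<example>", "eval")
    literal_accepted = True
except SyntaxError:
    literal_accepted = False
```

So the b prefix does not fall back to UTF-8 for characters outside ASCII; it refuses them.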
