Vectorized byte-position conversion with numpy?

I have a list of bytes objects, like this:

b1 = int(123).to_bytes(length=2, byteorder="little", signed=False)
b2 = int(-987).to_bytes(length=2, byteorder="little", signed=True)

b = b1 + b2

stream = [b] * 10

You can think of it as an array of rows, like:

b'{\x00%\xfc'
b'{\x00%\xfc'
b'{\x00%\xfc'
b'{\x00%\xfc'
b'{\x00%\xfc'
b'{\x00%\xfc'
b'{\x00%\xfc'
b'{\x00%\xfc'
b'{\x00%\xfc'
b'{\x00%\xfc'

To convert each position correctly, I do the following (knowing which positions are signed and which are unsigned):

for line in stream:
    c1 = int.from_bytes(line[0:2], byteorder="little", signed=False)
    c2 = int.from_bytes(line[2:4], byteorder="little", signed=True)

But this loop is very inefficient. Given that I know the positions of the "columns", how can I do this with numpy in a column-wise, vectorized way?

You can do this with a structured array. So given:

In [1]: b1 = int(123).to_bytes(length=2, byteorder="little", signed=False)
   ...: b2 = int(-987).to_bytes(length=2, byteorder="little", signed=True)
   ...:
   ...: b = b1 + b2
   ...:
   ...: stream = [b] * 10

In [2]: for line in stream:
   ...:     c1 = int.from_bytes(line[0:2], byteorder="little", signed=False)
   ...:     c2 = int.from_bytes(line[2:4], byteorder="little", signed=True)
   ...:     print(c1, c2)
   ...:
123 -987
123 -987
123 -987
123 -987
123 -987
123 -987
123 -987
123 -987
123 -987
123 -987

Then create a single buffer by joining the bytes, and pass a structured dtype to the numpy.frombuffer helper:

In [3]: import numpy as np

In [4]: buffer = b''.join(stream)

In [5]: arr = np.frombuffer(buffer, dtype=np.dtype([('x','<u2'), ('y','<i2')]))

In [6]: arr
Out[6]:
array([(123, -987), (123, -987), (123, -987), (123, -987), (123, -987),
       (123, -987), (123, -987), (123, -987), (123, -987), (123, -987)],
      dtype=[('x', '<u2'), ('y', '<i2')])

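Note that the names I gave, 'x' and 'y', are just placeholders. Use whatever you want. Whatever names you choose, you can use them to index into the structured array: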

In [8]: arr['x']
Out[8]: array([123, 123, 123, 123, 123, 123, 123, 123, 123, 123], dtype=uint16)

In [9]: arr['y']
Out[9]:
array([-987, -987, -987, -987, -987, -987, -987, -987, -987, -987],
      dtype=int16)
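As a sanity check, here is a minimal sketch that compares the structured-array result against the original int.from_bytes loop. The helper names loop_decode and numpy_decode are made up for illustration; they are not part of numpy.

```python
import numpy as np

# Rebuild the sample stream from the question: 10 records of 4 bytes each.
b1 = int(123).to_bytes(length=2, byteorder="little", signed=False)
b2 = int(-987).to_bytes(length=2, byteorder="little", signed=True)
stream = [b1 + b2] * 10

def loop_decode(records):
    """Reference implementation: decode each record with int.from_bytes."""
    return [
        (
            int.from_bytes(r[0:2], byteorder="little", signed=False),
            int.from_bytes(r[2:4], byteorder="little", signed=True),
        )
        for r in records
    ]

def numpy_decode(records):
    """Decode all records at once via a structured dtype."""
    record_dtype = np.dtype([("x", "<u2"), ("y", "<i2")])
    return np.frombuffer(b"".join(records), dtype=record_dtype)

arr = numpy_decode(stream)
expected = loop_decode(stream)

# Both approaches should agree column by column.
assert arr["x"].tolist() == [x for x, _ in expected]
assert arr["y"].tolist() == [y for _, y in expected]

# np.frombuffer on an immutable bytes object gives a read-only array;
# take a copy if the values need to be modified in place.
writable = arr.copy()
writable["x"] += 1
```

Each field access such as arr["x"] is an ordinary numpy array, so the usual vectorized operations apply to it directly.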

Note, if you don't care about the names, you can use the shorthand dtype specification:

In [10]: np.frombuffer(buffer, dtype=np.dtype('<u2,<i2'))
Out[10]:
array([(123, -987), (123, -987), (123, -987), (123, -987), (123, -987),
       (123, -987), (123, -987), (123, -987), (123, -987), (123, -987)],
      dtype=[('f0', '<u2'), ('f1', '<i2')])

You can read more about specifying data types in the official docs.
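One thing the dtype documentation covers is the dictionary form, which lets you give explicit byte offsets. That may help if your real records contain bytes you want to skip. The sketch below assumes a hypothetical 6-byte record layout (uint16, two ignored bytes, int16) that is not part of the question; it only illustrates the mechanism.

```python
import struct

import numpy as np

# Hypothetical record layout (an assumption for this example only):
# little-endian uint16, 2 bytes to ignore, little-endian int16.
record = struct.pack("<H2xh", 123, -987)
buffer = record * 3

# Dictionary-form dtype: field names, formats, byte offsets within a record,
# and the total record size in bytes.
dt = np.dtype({
    "names": ["x", "y"],
    "formats": ["<u2", "<i2"],
    "offsets": [0, 4],
    "itemsize": 6,
})

arr = np.frombuffer(buffer, dtype=dt)
print(arr["x"])  # [123 123 123]
print(arr["y"])  # [-987 -987 -987]
```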

Use a structured numpy array:

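import numpy as np

# Build 100 records, each a little-endian uint16 ('a') and int16 ('b')
data = np.zeros(100, dtype=[('a', '<u2'),('b','<i2')])
data['a'] = 123
data['b'] = -987
stream = data.tobytes()  # serialize the records to raw bytes

# Decode the raw bytes back into a structured array
data = np.frombuffer(stream, dtype=[('a', '<u2'),('b','<i2')])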
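To confirm that the round trip is lossless, a small sketch along these lines should work. The names original, raw, restored, and record_dtype are mine, chosen so the encoded and decoded arrays stay distinct.

```python
import numpy as np

record_dtype = np.dtype([("a", "<u2"), ("b", "<i2")])

# Encode: fill a structured array and serialize it to raw bytes.
original = np.zeros(100, dtype=record_dtype)
original["a"] = 123
original["b"] = -987
raw = original.tobytes()

# Each record is 2 + 2 = 4 bytes, so 100 records occupy 400 bytes.
assert len(raw) == original.size * record_dtype.itemsize

# Decode: reinterpret the bytes with the same dtype and compare per field.
restored = np.frombuffer(raw, dtype=record_dtype)
assert np.array_equal(original["a"], restored["a"])
assert np.array_equal(original["b"], restored["b"])
```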