Vectorized byte-position conversion with numpy?
I have a list of bytes like this:
b1 = int(123).to_bytes(length=2, byteorder="little", signed=False)
b2 = int(-987).to_bytes(length=2, byteorder="little", signed=True)
b = b1 + b2
stream = [b] * 10
You can think of it as an array like:
b'{\x00%\xfc'
b'{\x00%\xfc'
b'{\x00%\xfc'
b'{\x00%\xfc'
b'{\x00%\xfc'
b'{\x00%\xfc'
b'{\x00%\xfc'
b'{\x00%\xfc'
b'{\x00%\xfc'
b'{\x00%\xfc'
To convert each position correctly, I do the following (I know which positions are signed and which are unsigned):
for line in stream:
    c1 = int.from_bytes(line[0:2], byteorder="little", signed=False)
    c2 = int.from_bytes(line[2:4], byteorder="little", signed=True)
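But this loop is very inefficient. Given that I know the positions of the "columns", how would I do this with numpy in a column-vectorized way?
You can do this with a structured array. So, given: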
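For comparison, the standard-library struct module can describe the same 4-byte record with the format string '<Hh' (little-endian unsigned 16-bit, then signed 16-bit). This is a minimal sketch, assuming every record has exactly this layout. It still iterates in Python, so it only makes the record layout explicit; it is not the vectorized solution being asked for:

```python
import struct

# Rebuild the example stream: 10 records of (uint16 123, int16 -987), little-endian
b1 = int(123).to_bytes(length=2, byteorder="little", signed=False)
b2 = int(-987).to_bytes(length=2, byteorder="little", signed=True)
stream = [b1 + b2] * 10

# '<' = little-endian with no padding, 'H' = unsigned 16-bit, 'h' = signed 16-bit
record = struct.Struct("<Hh")

# iter_unpack walks the joined buffer one 4-byte record at a time
rows = list(record.iter_unpack(b"".join(stream)))
c1 = [r[0] for r in rows]
c2 = [r[1] for r in rows]
```

The same '<Hh' layout corresponds to the numpy dtype '<u2,<i2' used in the answers.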
In [1]: b1 = int(123).to_bytes(length=2, byteorder="little", signed=False)
   ...: b2 = int(-987).to_bytes(length=2, byteorder="little", signed=True)
   ...:
   ...: b = b1 + b2
   ...:
   ...: stream = [b] * 10
In [2]: for line in stream:
   ...:     c1 = int.from_bytes(line[0:2], byteorder="little", signed=False)
   ...:     c2 = int.from_bytes(line[2:4], byteorder="little", signed=True)
   ...:     print(c1, c2)
   ...:
123 -987
123 -987
123 -987
123 -987
123 -987
123 -987
123 -987
123 -987
123 -987
123 -987
Then create a buffer by joining the bytes, and use a structured dtype with the numpy.frombuffer helper:
In [3]: import numpy as np
In [4]: buffer = b''.join(stream)
In [5]: arr = np.frombuffer(buffer, dtype=np.dtype([('x','<u2'), ('y','<i2')]))
In [6]: arr
Out[6]:
array([(123, -987), (123, -987), (123, -987), (123, -987), (123, -987),
       (123, -987), (123, -987), (123, -987), (123, -987), (123, -987)],
      dtype=[('x', '<u2'), ('y', '<i2')])
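Here is a self-contained version of the session above, as a sketch assuming the same '<u2,<i2' record layout. One detail worth knowing: np.frombuffer does not copy, so an array built over an immutable bytes object is read-only, and you need .copy() before modifying it:

```python
import numpy as np

b1 = int(123).to_bytes(length=2, byteorder="little", signed=False)
b2 = int(-987).to_bytes(length=2, byteorder="little", signed=True)
stream = [b1 + b2] * 10

# Each record: little-endian uint16 followed by little-endian int16 (4 bytes total)
record = np.dtype([("x", "<u2"), ("y", "<i2")])
buffer = b"".join(stream)

# frombuffer reinterprets the bytes in place; there is no per-record Python loop
arr = np.frombuffer(buffer, dtype=record)

# The result is a read-only view over the bytes object; copy it before writing
writable = arr.copy()
writable["x"] += 1
```

If len(buffer) is not a multiple of record.itemsize (4 here), np.frombuffer raises ValueError, which doubles as a cheap sanity check on the stream.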
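Note that the names 'x' and 'y' I gave are just placeholders; use whatever you like. Whatever names you choose, you can use them to index into the structured array: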
In [8]: arr['x']
Out[8]: array([123, 123, 123, 123, 123, 123, 123, 123, 123, 123], dtype=uint16)
In [9]: arr['y']
Out[9]:
array([-987, -987, -987, -987, -987, -987, -987, -987, -987, -987],
      dtype=int16)
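The field arrays keep the narrow on-the-wire types (uint16 and int16), so arithmetic on them can silently wrap around. A minimal sketch, assuming you want ordinary int64 columns for further computation: widen each field with astype, which also gives you a contiguous copy instead of a strided view into the buffer:

```python
import numpy as np

record = np.dtype([("x", "<u2"), ("y", "<i2")])
buffer = b"{\x00%\xfc" * 10  # same bytes as b''.join(stream)
arr = np.frombuffer(buffer, dtype=record)

# Field access returns a view: 2-byte items spaced 4 bytes apart in the buffer
x_view = arr["x"]
print(x_view.strides)  # (4,)

# uint16 arithmetic wraps modulo 2**16: 123 * 1000 = 123000 does not fit
wrapped = x_view * np.uint16(1000)

# Widen first; astype copies into a new, contiguous int64 array
x = arr["x"].astype(np.int64)
y = arr["y"].astype(np.int64)
scaled = x * 1000
```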
Note that if you don't care about the names, you can use the shorthand dtype specification:
In [10]: np.frombuffer(buffer, dtype=np.dtype('<u2,<i2'))
Out[10]:
array([(123, -987), (123, -987), (123, -987), (123, -987), (123, -987),
       (123, -987), (123, -987), (123, -987), (123, -987), (123, -987)],
      dtype=[('f0', '<u2'), ('f1', '<i2')])
You can read more about specifying data types in the official docs.
Use a structured numpy array:
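Since the byte positions of the "columns" are known, the dict form of a structured dtype lets you state each field's byte offset directly, which also covers records with gaps between fields. A sketch with a hypothetical 8-byte record (x at bytes 0-1, y at bytes 4-5, bytes 2-3 and 6-7 unused); the layout is an assumption for illustration, not part of the question:

```python
import numpy as np

# Hypothetical 8-byte record: x at bytes 0-1, y at bytes 4-5, the rest unused
b1 = int(123).to_bytes(length=2, byteorder="little", signed=False)
b2 = int(-987).to_bytes(length=2, byteorder="little", signed=True)
pad = b"\xff\xff"  # filler bytes that should be ignored
buffer = (b1 + pad + b2 + pad) * 10

# Each field gets an explicit byte offset; itemsize fixes the full record
# length, so consecutive records are read 8 bytes apart
record = np.dtype({
    "names": ["x", "y"],
    "formats": ["<u2", "<i2"],
    "offsets": [0, 4],
    "itemsize": 8,
})

arr = np.frombuffer(buffer, dtype=record)
```

The unused bytes are never read, so they need no placeholder fields.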
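import numpy as np

# Build 100 (uint16, int16) records and fill each column with a single assignment
data = np.zeros(100, dtype=[('a', '<u2'), ('b', '<i2')])
data['a'] = 123
data['b'] = -987
# Serialize to raw little-endian bytes, then decode them back with the same dtype
stream = data.tobytes()
data = np.frombuffer(stream, dtype=[('a', '<u2'), ('b', '<i2')])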
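A quick round-trip check of this approach, as a sketch: because this dtype has no padding, each record serializes to exactly the b'{\x00%\xfc' pattern from the question, and np.frombuffer recovers the same column values:

```python
import numpy as np

record = np.dtype([("a", "<u2"), ("b", "<i2")])

data = np.zeros(100, dtype=record)
data["a"] = 123
data["b"] = -987

stream = data.tobytes()
decoded = np.frombuffer(stream, dtype=record)

# 100 records x 4 bytes each; every record matches the question's byte pattern
assert len(stream) == 100 * record.itemsize
assert stream[:4] == b"{\x00%\xfc"
assert np.array_equal(decoded["a"], data["a"])
assert np.array_equal(decoded["b"], data["b"])
```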