Python readinto: How to convert from an array.array to a custom ctypes structure

Question

I have created an array of integers, and I would like them to be interpreted by the structure definition I have created:

from ctypes import *
from array import array

class MyStruct(Structure):
    _fields_ = [("init", c_uint),
                ("state", c_char),
                ("constant", c_int),
                ("address", c_uint),
                ("size", c_uint),
                ("sizeMax", c_uint),
                ("start", c_uint),
                ("end", c_uint),
                ("timestamp", c_uint),
                ("location", c_uint),
                ("nStrings", c_uint),
                ("nStringsMax", c_uint),
                ("maxWords", c_uint),
                ("sizeFree", c_uint),
                ("stringSizeMax", c_uint),
                ("stringSizeFree", c_uint),
                ("recordCount", c_uint),
                ("categories", c_uint),
                ("events", c_uint),
                ("wraps", c_uint),
                ("consumed", c_uint),
                ("resolution", c_uint),
                ("previousStamp", c_uint),
                ("maxTimeStamp", c_uint),
                ("threshold", c_uint),
                ("notification", c_uint),
                ("version", c_ubyte)]

# arr = array('I', [1])
# How can I do this?
# mystr = MyStruct(arr) magic
# (mystr.init == 1) == True

I can do the following:

mystr = MyStruct()
rest = array('I')  # `array` is imported directly via `from array import array`
with open('myfile.bin', 'rb') as binaryFile:
    binaryFile.readinto(mystr)
    rest.fromstring(binaryFile.read())

# Now create another struct with rest
rest.readinto(mystr) # Does not work

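If the data is contained in an array.array('I'), how can I convert the array of ints into the struct without going through a file? I'm not sure what the Structure constructor accepts or how readinto works.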

Answer 1

Does this have to be an array? Could you use a list? You can unpack the contents of a list into a function call with the * operator:

mystr = MyStruct(*arr)

Or, for a dictionary keyed by field name, with the ** operator:

mystr = MyStruct(**arr)
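
As a minimal sketch of both forms, assuming Python 3 and a hypothetical two-field `Point` structure (not the question's `MyStruct`): positional unpacking fills the fields in `_fields_` order, while `**` needs a mapping keyed by field name, so it would not apply to an array or list directly.

```python
from ctypes import Structure, c_uint

class Point(Structure):  # hypothetical structure, for illustration only
    _fields_ = [("x", c_uint), ("y", c_uint)]

values = [3, 4]
p = Point(*values)               # positional: fills fields in _fields_ order
q = Point(**{"x": 3, "y": 4})    # keyword: requires a dict keyed by field name

print(p.x, p.y, q.x, q.y)
```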

Answer 2

Solution #1: Star unpacking for one-line initialization

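Star unpacking will work, but only if all the fields in your structure are integer types. In Python 2.x, c_char can't be initialized from an int (it works fine in 3.5). If you change the type of state to c_byte, then you can just do: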

mystr = MyStruct(*myarr)

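This doesn't actually benefit from any array-specific magic (the values are briefly converted to Python ints during the unpacking step, so you're not reducing peak memory usage), so you'd only bother with an array if initializing said array is, for whatever reason, easier than reading directly into the structure.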
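
A small sketch of that one-liner, assuming Python 3 and a hypothetical, shortened `Header` structure standing in for `MyStruct`. It also shows that a field declared as c_byte reads back as an int:

```python
from array import array
from ctypes import Structure, c_byte, c_uint

class Header(Structure):  # hypothetical, shortened stand-in for MyStruct
    _fields_ = [("init", c_uint),
                ("state", c_byte),     # c_byte rather than c_char, so an int is accepted
                ("constant", c_uint)]

myarr = array('I', [1, 65, 7])
hdr = Header(*myarr)  # each element passes through a temporary Python int

# state comes back as an int, not as a one-character string
print(hdr.init, hdr.state, hdr.constant)
```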

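If you go the star unpacking route, reading .state will now get you int values instead of len 1 str values. If you want to initialize with an int but read it back as a one-character str, you can use a protected name wrapped in a property: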

class MyStruct(Structure):
    _fields_ = [...
                ("_state", c_byte),  # "Protected" name int-like; constructor expects int
                ...]

    @property
    def state(self):
        return chr(self._state)

    @state.setter
    def state(self, x):
        if isinstance(x, basestring):
            x = ord(x)
        self._state = x

A similar technique can be used without a property by defining your own __init__ that converts the state argument passed in:

class MyStruct(Structure):
    _fields_ = [("init", c_uint),
                ("state", c_char),
                ...]

    def __init__(self, init=0, state=b'\0', *args, **kwargs):
        if not isinstance(state, basestring):
            state = chr(state)
        super(MyStruct, self).__init__(init, state, *args, **kwargs)

Solution #2: Direct `memcpy`-like solutions to reduce temporaries

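You can use some array-specific magic to avoid the temporary Python-level ints (and avoid the need to change state to c_byte). No real file object is required; a fake (in-memory) file-like object will do: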

import io

mystr = MyStruct()  # Default initialize

# Use BytesIO to gain the ability to write the raw bytes to the struct
# because BytesIO's readinto isn't finicky about exact buffer formats
io.BytesIO(myarr.tostring()).readinto(mystr)

# In Python 3, where array implements the buffer protocol, you can simplify to:
io.BytesIO(myarr).readinto(mystr)
# This still performs two memcpys (one occurs internally in BytesIO), but
# it's faster by avoiding a Python level method call

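This only works because your non-c_int-width attributes are followed by c_int-width attributes (so they're padded out to four bytes anyway); if you had two c_ubyte/c_char/etc. fields back to back, you'd have problems (because one value of the array would initialize two fields in the struct, which doesn't appear to be what you want).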
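
The padding point can be checked by inspecting field offsets. The sketch below assumes Python 3, default (unpacked) alignment, and two hypothetical structures, `Padded` and `TwoNarrow`; the `layout` helper is also made up for illustration. It prints where each field sits and how many array('I') items cover the structure:

```python
from array import array
from ctypes import Structure, c_ubyte, c_uint, sizeof

class Padded(Structure):     # hypothetical: one narrow field, then a c_uint
    _fields_ = [("a", c_uint), ("flag", c_ubyte), ("b", c_uint)]

class TwoNarrow(Structure):  # hypothetical: two narrow fields back to back
    _fields_ = [("a", c_uint), ("flag1", c_ubyte), ("flag2", c_ubyte), ("b", c_uint)]

def layout(struct_type):
    """Return (field name, byte offset, byte size) for each field."""
    return [(name,
             getattr(struct_type, name).offset,
             getattr(struct_type, name).size)
            for name, _ctype in struct_type._fields_]

item = array('I').itemsize
for struct_type in (Padded, TwoNarrow):
    print(struct_type.__name__, layout(struct_type),
          "array items needed:", sizeof(struct_type) // item,
          "fields:", len(struct_type._fields_))
```

In `Padded`, each field starts at its own array-item boundary, so a raw copy lines up one item per field. In `TwoNarrow`, `flag1` and `flag2` share a single item, so the number of array items no longer matches the number of fields.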

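If you're on Python 3, you can benefit from array-specific magic to avoid both the cost of unpacking and the two-step memcpy of the BytesIO technique (array -> bytes -> struct). It works in Py3 because Py3's array type supports the buffer protocol (it didn't in Py2), and because Py3's memoryview has a cast method that lets you change the format of the memoryview to make it directly compatible with array: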

mystr = MyStruct()  # Default initialize

# Make a view on mystr's underlying memory that behaves like a C array of
# unsigned ints in native format (matching array's type code)
# then perform a "memcpy" like operation using empty slice assignment
# to avoid creating any Python level values.
memoryview(mystr).cast('B').cast('I')[:] = myarr

Like the BytesIO solution, this only works because your fields all happen to pad out to four bytes in size.
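
Because a raw copy is only meaningful when the byte sizes agree, it may be worth guarding the slice assignment with an explicit size check. A self-contained sketch, assuming Python 3 and a hypothetical all-c_uint `Triple` structure:

```python
from array import array
from ctypes import Structure, c_uint, sizeof

class Triple(Structure):  # hypothetical structure, for illustration only
    _fields_ = [("a", c_uint), ("b", c_uint), ("c", c_uint)]

myarr = array('I', [10, 20, 30])
t = Triple()

# Refuse to copy unless the array covers the structure exactly
if sizeof(t) != len(myarr) * myarr.itemsize:
    raise ValueError("array and structure differ in byte size")

# View the struct as bytes, then as unsigned ints, and copy in one step
memoryview(t).cast('B').cast('I')[:] = myarr

print(t.a, t.b, t.c)
```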

Performance

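Performance-wise, star unpacking wins for small numbers of fields, but for large numbers of fields (your case has a couple dozen), the direct memcpy-based approaches win out; in tests with a 23-field class, the BytesIO solution beat star unpacking on my Python 2.7 install by a factor of 2.5x (star unpacking took 2.5 µs, BytesIO took 1 µs).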

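The memoryview solution scales similarly to the BytesIO solution, though as of 3.5 it's slightly slower than the BytesIO approach (likely because several temporary memoryviews must be constructed to perform the necessary casts, and/or because the memoryview slice assignment code is general purpose for many possible formats, so it isn't a simple memcpy in implementation). memoryview might scale better for much larger copies (if the losses are due to the fixed cast overhead), but it's rare to have a struct big enough for that to matter; memoryview would only stand a chance of winning in more general-purpose copying scenarios (to and from ctypes arrays or the like).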
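
To reproduce this kind of comparison on your own interpreter, a rough timeit sketch follows. It assumes Python 3 and a hypothetical 23-field, all-c_uint `Wide` structure; it is not the benchmark used for the numbers above, and timings will vary by machine and Python version.

```python
import io
import timeit
from array import array
from ctypes import Structure, c_uint

N_FIELDS = 23   # same field count as the test described above
NUMBER = 20000  # calls per measurement; an arbitrary choice

class Wide(Structure):  # hypothetical all-c_uint structure
    _fields_ = [("f%d" % i, c_uint) for i in range(N_FIELDS)]

myarr = array('I', range(N_FIELDS))

def star_unpack():
    return Wide(*myarr)

def via_bytesio():
    s = Wide()
    io.BytesIO(myarr).readinto(s)
    return s

def via_memoryview():
    s = Wide()
    memoryview(s).cast('B').cast('I')[:] = myarr
    return s

for fn in (star_unpack, via_bytesio, via_memoryview):
    seconds = timeit.timeit(fn, number=NUMBER)
    per_call_us = seconds / NUMBER * 1e6
    print("%-15s %.3f us per call" % (fn.__name__, per_call_us))
```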
