Struct w/ bit field: CPython and C padding differs

Question

Consider the following program:

#include <stddef.h>
#include <stdio.h>

typedef struct
{
    unsigned        bit:1;
    unsigned char   str[8];
} test;

#pragma pack(1)
typedef struct
{
    unsigned        bit:1;
    unsigned char   str[8];
} test_pack;

int main(int argc, char **argv) {

    /* offsetof and sizeof yield size_t, so %zu is the matching conversion */
    printf("%3zu str offset\n", offsetof(test, str));
    printf("%3zu total\n", sizeof(test));

    printf("%3zu str_pack offset\n", offsetof(test_pack, str));
    printf("%3zu total\n", sizeof(test_pack));

    return 0;
}

It outputs

  1 str offset
 12 total
  1 str_pack offset
  9 total

on my Ubuntu 14.04.3 system, with GCC 4.8.4.

The (what I think is) equivalent Python program,

#!/usr/bin/python3

from ctypes import *

class Test(Structure):
    _fields_ = [
        ('bit', c_uint, 1),
        ('str', c_ubyte * 8),
    ]

class TestPacked(Structure): 
    _pack_ = 1
    _fields_ = [
        ('bit_p', c_uint, 1),
        ('str_p', c_ubyte * 8),
    ]

if __name__ == "__main__":
    print("%3lu str offset" % Test.str.offset)
    print("%3lu total" % sizeof(Test))

    print("%3lu str_p offset" % TestPacked.str_p.offset)
    print("%3lu total_p" % sizeof(TestPacked))

produces

  4 str offset
 12 total
  4 str_p offset
 12 total_p

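on the same system, running Python 3.4.0.

As I understand it, the bit field should take up 1 bit. Both C and Python pad it out to 32 bits, adding 3 empty bytes to the struct for better alignment.

However, Python puts the padding before the string, while C puts it after.

Also, with #pragma pack(1), C removes the padding, but Python doesn't.

Can I get Python to add the 3 bytes of padding after the string rather than before it?
Failing that, can Python pack the struct correctly so that the two layouts line up?

This doesn't need to work across different systems over a network protocol or anything like that... I'm just trying to get the bits to line up on one system, even if I have to reconfigure it somehow. Thanks!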

Answer 1

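AFAICT, the rules for bit fields in C are pretty fuzzy (and compiler/ABI dependent); you can never be sure how they will be padded or aligned (especially once compiler-specific packing pragmas come into play). You can make them more portable by explicitly defining the extra padding bit-field pieces yourself (so that the total number of bits equals that of the underlying data type), but it's all going to be hackery. Since it's hackery either way, solving your problem of keeping the padding but moving it after the string, the way gcc does, is pretty easy. Define the padding manually: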

class Test(Structure):
    _fields_ = [
        ('bit', c_uint8, 1),
        ('str', c_ubyte * 8),
        ('', c_ubyte * 3), # Unnamed fields apparently work, go figure
    ]

>>> sizeof(Test)
12
>>> Test.str.offset
1
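
The "explicit padding bit-field pieces" idea mentioned above can be sketched as follows. This is only an illustration, not a guaranteed-portable layout: the field names bit_pad and tail_pad are made up for this example, and the result should still be compared against what your C compiler reports.

```python
from ctypes import Structure, c_ubyte, c_uint8, sizeof

class Test(Structure):
    _fields_ = [
        ('bit', c_uint8, 1),         # the 1-bit flag
        ('bit_pad', c_uint8, 7),     # hypothetical name: fills the rest of the byte
        ('str', c_ubyte * 8),
        ('tail_pad', c_ubyte * 3),   # hypothetical name: trailing padding, as gcc adds it
    ]

# Compare these with offsetof(test, str) and sizeof(test) from the C program.
print("%3u str offset" % Test.str.offset)
print("%3u total" % sizeof(Test))
```

Because both bit fields share one c_uint8 storage unit and add up to exactly 8 bits, there is no leftover space for ctypes to place differently from the compiler.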

Answer 2

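The problem is that the implementation of bit fields is explicitly implementation-dependent. The C standard (reference: the n1256 draft of C99) says in 6.7.2.1 Structure and union specifiers: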

...
4 A bit-field shall have a type that is a qualified or unqualified version of _Bool, signed int, unsigned int, or some other implementation-defined type.
...
9 A bit-field is interpreted as a signed or unsigned integer type consisting of the specified number of bits.

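OK, you want an unsigned type using one bit, but the next paragraph says:

10 An implementation may allocate any addressable storage unit large enough to hold a bit-field.

Here one byte is enough, so gcc uses one byte. But MSVC in 32-bit mode uses 4 bytes, and the output of the C program would be: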

  4 str offset
 12 total
  4 str_pack offset
 12 total

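which is exactly what the CPython implementation outputs.

That means the ctypes implementers chose to follow MSVC here. Besides the fact that most examples use Windows, two sentences in the module documentation also hint at this.

The warning in 16.16.1.10. Structures and unions: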

Warning: ctypes does not support passing unions or structures with bit-fields to functions by value. While this may work on 32-bit x86, it’s not guaranteed by the library to work in the general case.

and the next paragraph, in 16.16.1.11. Structure/union alignment and byte order, says:

By default, Structure and Union fields are aligned in the same way the C compiler does it... This is what #pragma pack(n) also does in MSVC

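The problem is that the ctypes module implementers had to choose one convention, independent of the platform (they cannot rely directly on the compiler here, because the size of a bit field has to be a constant). You could consider this a problem for non-MSVC implementations and file a bug report, but I also think the ctypes module is mainly used on Windows.

So the only way to handle your C structs here is to explicitly force the underlying type of the bit field by declaring the first field as an unsigned byte and, where needed, to specify explicit padding: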

class Test(Structure):
    _fields_ = [
        ('bit', c_ubyte),       # unsigned byte standing in for the bit field
        ('str', c_ubyte * 8),
        ('', c_ubyte * 3),      # explicit padding
    ]

class TestPacked(Structure):
    _fields_ = [
        ('bit_p', c_ubyte),     # unsigned byte standing in for the bit field
        ('str_p', c_ubyte * 8),
    ]

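But this is only a workaround, because bit and bit_p are plain bytes here, whereas the C code asks for only 1 bit to be used, with the other 7 bits being padding.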
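
If the field is kept as a plain byte, one way to still expose only the single bit is to wrap the byte in a property. The sketch below assumes the bit lives in the least-significant bit of the byte, which is what GCC does on little-endian x86 but is not guaranteed elsewhere; the field name _bit_byte is made up for this example.

```python
from ctypes import Structure, c_ubyte, sizeof

class TestPacked(Structure):
    _fields_ = [
        ('_bit_byte', c_ubyte),   # hypothetical name: raw byte that holds the bit
        ('str_p', c_ubyte * 8),
    ]

    @property
    def bit_p(self):
        # Assumption: the bit field occupies the least-significant bit.
        return self._bit_byte & 1

    @bit_p.setter
    def bit_p(self, value):
        # Leave the 7 padding bits untouched and replace only bit 0.
        self._bit_byte = (self._bit_byte & 0xFE) | (value & 1)

t = TestPacked()
t.bit_p = 1
print(t.bit_p, sizeof(TestPacked), TestPacked.str_p.offset)
```

This keeps the 9-byte layout of the packed C struct while callers only ever see a 0/1 value.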

Answer 3

This may be a duplicate of this well-known issue: https://bugs.python.org/issue29753 A patch was validated a few months ago but has not been merged into the official sources yet.
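
Since the layout ctypes produces for such structs may change between Python versions, it can be worth asserting the layout up front rather than trusting it. A minimal sketch, where field_offsets and check_layout are hypothetical helpers written for this example and the expected numbers are the ones printed by the C program for test_pack:

```python
from ctypes import Structure, c_ubyte, sizeof

def field_offsets(struct_cls):
    """Return {field_name: byte_offset} for a ctypes Structure subclass."""
    return {f[0]: getattr(struct_cls, f[0]).offset for f in struct_cls._fields_}

def check_layout(struct_cls, expected_offsets, expected_size):
    """Raise AssertionError if ctypes disagrees with the values reported by C."""
    actual = field_offsets(struct_cls)
    assert actual == expected_offsets, (struct_cls.__name__, actual)
    assert sizeof(struct_cls) == expected_size, (struct_cls.__name__, sizeof(struct_cls))

class TestPacked(Structure):
    _fields_ = [
        ('bit_p', c_ubyte),       # plain byte standing in for the bit field
        ('str_p', c_ubyte * 8),
    ]

# Expected values taken from offsetof/sizeof in the C program (packed variant).
check_layout(TestPacked, {'bit_p': 0, 'str_p': 1}, 9)
```

Running the check at import time makes a layout mismatch fail loudly instead of silently corrupting data.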

