Is it safe to detect endianness with a union?

In other words, is this code safe according to the C standard? (Assume uint8_t is one byte.)

#include <stdint.h>
#include <stdio.h>

void detectEndianness(void){
    union {
        uint16_t w;
        uint8_t b;
    } a;
    a.w = 0x00FFU;
    if (a.b == 0xFFU) {
        puts("Little endian.");
    }
    else if (a.b == 0U) {
        puts("Big endian.");
    }
    else {
        puts("Stack Overflow endian.");
    }
}

What if I change it to this? Note the third if case, which I am aware of.

a.w = 1U;
if (a.b == 1U) { puts("Little endian."); }
else if (a.b == 0U) { puts ("Big endian."); }
else if (a.b == 0x80U) { /* Special potential */ }
else { puts("Stack Overflow endian."); }

Quoting from n1570:

6.5.2.3 Structure and union members - p3

A postfix expression followed by the . operator and an identifier designates a member of a structure or union object. The value is that of the named member, and is an lvalue if the first expression is an lvalue.

6.2.6 Representations of types / 1 General - p7

When a value is stored in a member of an object of union type, the bytes of the object representation that do not correspond to that member but do correspond to other members take unspecified values.

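It is allowed. If note 95 is taken into account (even though it is only informative), your use case can even be considered one of the intended purposes: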

If the member used to read the contents of a union object is not the same as the member last used to store a value in the object, the appropriate part of the object representation of the value is reinterpreted as an object representation in the new type as described in 6.2.6 (a process sometimes called "type punning"). This might be a trap representation.

Now, since the uintN_t family of types is defined to have no padding bits:

7.20.1.1 Exact-width integer types - p2

The typedef name uintN_t designates an unsigned integer type with width N and no padding bits. Thus, uint24_t denotes such an unsigned integer type with a width of exactly 24 bits.

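All of their bit representations are valid values, so no trap representation is possible. We must therefore conclude that the code does indeed check the endianness of uint16_t.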
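As a rough illustration of that conclusion, here is a minimal sketch of the same union check that returns a result instead of printing it. The enum byte_order and the function name detect_u16_order are hypothetical names introduced for this example only; the static_assert simply restates the "no padding bits" guarantee quoted above.

```c
#include <assert.h>   /* static_assert (C11) */
#include <limits.h>   /* CHAR_BIT */
#include <stdint.h>

/* Hypothetical result type, introduced only for this sketch. */
enum byte_order { ORDER_LITTLE, ORDER_BIG, ORDER_UNKNOWN };

/* uint16_t has width 16 and no padding bits, so its storage is exactly 16 bits. */
static_assert(sizeof(uint16_t) * CHAR_BIT == 16, "uint16_t must have no padding");

/* Classify the byte order of uint16_t by reading its lowest-addressed byte
   through another union member (type punning, see footnote 95). */
enum byte_order detect_u16_order(void)
{
    union {
        uint16_t w;
        uint8_t  b;   /* shares its address with the first byte of w */
    } a;

    a.w = 0x00FFU;
    if (a.b == 0xFFU) return ORDER_LITTLE;  /* low-order byte stored first */
    if (a.b == 0x00U) return ORDER_BIG;     /* high-order byte stored first */
    return ORDER_UNKNOWN;                   /* bits split in some other way */
}
```

Returning an enum rather than calling puts keeps the check usable from other code; the ORDER_UNKNOWN branch corresponds to the "Stack Overflow endian" case in the question.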

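The standard (as available in the linked online draft) says in a footnote that it is allowed to access a member of a union other than the member that was last written: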

95) If the member used to read the contents of a union object is not the same as the member last used to store a value in the object, the appropriate part of the object representation of the value is reinterpreted as an object representation in the new type as described in 6.2.6 (a process sometimes called "type punning"). This might be a trap representation.

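But the footnote also mentions a possible trap representation, and the only data type that the standard guarantees to be safe with respect to trap representations is unsigned char. Accessing a trap representation may be undefined behavior; although I do not think a uintN_t type such as the one read here can yield a trap representation on your platform, whether accessing this member is UB is actually implementation-dependent.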
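One way to sidestep the trap-representation question entirely is to look at the object representation only through unsigned char. The following is a minimal sketch of that idea, not something taken from the answer itself; the function name u16_first_byte is hypothetical, and 8-bit bytes are assumed.

```c
#include <assert.h>   /* static_assert (C11) */
#include <limits.h>   /* CHAR_BIT */
#include <stdint.h>
#include <string.h>   /* memcpy */

static_assert(CHAR_BIT == 8, "this sketch assumes 8-bit bytes");

/* Return the byte stored at the lowest address of a uint16_t.
   Only unsigned char is used to inspect the representation, and
   unsigned char has no trap representations. */
unsigned char u16_first_byte(uint16_t w)
{
    unsigned char bytes[sizeof w];

    memcpy(bytes, &w, sizeof w);   /* copy the object representation */
    return bytes[0];
}
```

With this helper, u16_first_byte(0x00FFU) == 0xFFU would indicate that the low-order byte is stored first, and a result of 0 that the high-order byte is stored first.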

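There is no requirement that the order of bits within a byte correspond to the order of the corresponding bits in a larger type. For example, a conforming implementation that defines uint32_t and has an 8-bit unsigned char could store the upper 16 bits of a uint32_t using four bits from each byte, and store the lower 16 bits using the remaining four bits of each byte. From the point of view of the Standard, any of the 32! permutations of the bits would be equally acceptable.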

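That said, any implementation that is not being deliberately obtuse and is designed to run on a commonplace platform will use one of two orderings [treating bytes as groups of 8 consecutive bits, in the order 0123 or 3210], and one that uses neither of those yet targets a platform that is not totally obscure will use 2301 or 1032. The Standard does not forbid other orderings, but failing to accommodate them is very unlikely to cause any trouble except with obtusely designed implementations.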
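The four byte orderings named above can be told apart by storing a value whose bytes are all different and reading them back in address order. The sketch below assumes one particular labeling, which the answer does not spell out: digit k stands for the byte of significance k (0 = least significant), and the position in the string is the address offset. The function name describe_u32_layout is hypothetical.

```c
#include <assert.h>   /* static_assert (C11) */
#include <limits.h>   /* CHAR_BIT */
#include <stddef.h>   /* size_t */
#include <stdint.h>
#include <string.h>   /* memcpy */

static_assert(CHAR_BIT == 8, "this sketch assumes 8-bit bytes");

/* Write the byte layout of uint32_t into out as four characters plus a
   terminator. Assumed labeling: digit k is the byte of significance k
   (0 = least significant); string position is the address offset.
   A byte that is not a whole byte of the value is written as '?'. */
void describe_u32_layout(char out[5])
{
    const uint32_t v = UINT32_C(0x03020100);  /* byte k of the value holds k */
    unsigned char bytes[sizeof v];

    memcpy(bytes, &v, sizeof v);              /* inspect via unsigned char */
    for (size_t i = 0; i < sizeof v; ++i) {
        out[i] = (bytes[i] <= 3U) ? (char)('0' + bytes[i]) : '?';
    }
    out[4] = '\0';
}
```

Under this labeling a little-endian machine such as x86-64 yields "0123" and a big-endian one "3210". A bit-interleaved layout like the one described in the previous paragraph would show up as '?' characters.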