Reading double to platform endianness with union and bit shift, is it safe?

Question

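All the examples I've seen of reading a double of known endianness from a buffer into platform endianness involve detecting the current platform's endianness and performing a byte swap when necessary.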

On the other hand, I've seen another way of doing the same thing, but for integers, that uses bit shifting (one such example).

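This got me thinking that it might be possible to use a union and the bit-shift technique to read doubles (and floats) from a buffer, and a quick test implementation seems to work (at least on x86_64):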

#include <stdio.h>
#include <stdint.h>
#include <stdbool.h>

double read_double(char * buffer, bool le) {
    union {
        double d;
        uint64_t i;
    } data;
    data.i = 0;

    int off = le ? 0 : 7;
    int add = le ? 1 : -1;
    for (int i = 0; i < 8; i++) {
        data.i |= ((uint64_t)(buffer[off] & 0xFF) << (i * 8));
        off += add;
    }
    return data.d;
}

int main() {
    char buffer_le[] = {0x6E, 0x86, 0x1B, 0xF0, 0xF9, 0x21, 0x09, 0x40};
    printf("%f\n", read_double(buffer_le, true)); // 3.141590

    char buffer_be[] = {0x40, 0x09, 0x21, 0xF9, 0xF0, 0x1B, 0x86, 0x6E};
    printf("%f\n", read_double(buffer_be, false)); // 3.141590

    return 0;
}

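But my question is: is this safe to do, or is there undefined behavior involved here? And if both this method and the byte-swap method involve undefined behavior, is one safer than the other?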

Answer 1

Reinterpreting via a union

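Constructing a uint64_t value by shifting and ORing bytes is of course supported by the C standard. (Shifting carries some hazards, because the left operand must have the correct size and type to avoid overflow and shift-width problems, but the code in the question correctly converts to uint64_t before shifting.) The remaining question for the code is whether the C standard permits the reinterpretation via a union. The answer is yes.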

C 6.5.2.3 3 says:

A postfix expression followed by the . operator and an identifier designates a member of a structure or union object. The value is that of the named member,⁹⁹⁾…

Footnote 99 says:

If the member used to read the contents of a union object is not the same as the member last used to store a value in the object, the appropriate part of the object representation of the value is reinterpreted as an object representation in the new type as described in 6.2.6 (a process sometimes called "type punning")…

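This reinterpretation of course relies on the object representations used in the C implementation. Notably, double must use the expected format, matching the bytes read from the input stream.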
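The dependence on object representation can be made explicit in code. The following is only a sketch, assuming a C11 compiler and an input stream that carries IEEE 754 binary64 values; the helper name double_matches_uint64_layout is hypothetical and not part of the question's code. It checks at compile time that double has binary64 parameters, and at run time that double and uint64_t share a byte order, which the union technique silently relies on.

```c
#include <assert.h>   /* static_assert (C11) */
#include <float.h>
#include <limits.h>
#include <stdint.h>

/* Compile-time checks that double has the parameters of IEEE 754 binary64. */
static_assert(CHAR_BIT == 8, "8-bit bytes expected");
static_assert(sizeof(double) == sizeof(uint64_t), "double must be 64 bits wide");
static_assert(FLT_RADIX == 2 && DBL_MANT_DIG == 53 && DBL_MAX_EXP == 1024,
              "double must have binary64 parameters");

/* Run-time check that double and uint64_t use the same byte order.
   In binary64, 1.0 is encoded as 0x3FF0000000000000. */
static int double_matches_uint64_layout(void) {
    union { double d; uint64_t i; } probe;
    probe.d = 1.0;
    return probe.i == UINT64_C(0x3FF0000000000000);
}
```

If either check fails, the union reinterpretation would still be permitted by the standard, but the resulting double would not be the value the input bytes were meant to encode.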

Modifying the bytes of an object

C permits modifying an object by modifying its bytes (for example, through a pointer to unsigned char). C 2018 6.5 7 says:

An object shall have its stored value accessed only by an lvalue expression that has one of the following types: [list of various types], or a character type.

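Although one of the comments asserts that you may "access" but not "modify" the bytes of an object this way (apparently interpreting "access" to mean only reading, not writing), C 2018 3.1 defines "access" as: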

to read or modify the value of an object.

Therefore, it is permitted either to read or to write the bytes of an object through a character type.
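As an illustration of writing an object's bytes through a character type, here is a minimal sketch, assuming double and uint64_t have the same size and byte order on the target. The helper name bits_to_double is hypothetical. It copies the bytes of an assembled uint64_t into a double through unsigned char pointers, which has the same effect as the union in the question.

```c
#include <assert.h>   /* static_assert (C11) */
#include <stddef.h>
#include <stdint.h>

static_assert(sizeof(double) == sizeof(uint64_t), "double must be 64 bits wide");

/* Copy the object representation of bits into a double, one byte at a time.
   Both objects are accessed through unsigned char lvalues, which 6.5 7 allows. */
static double bits_to_double(uint64_t bits) {
    double d;
    const unsigned char *src = (const unsigned char *)&bits;
    unsigned char *dst = (unsigned char *)&d;
    for (size_t k = 0; k < sizeof d; k++) {
        dst[k] = src[k];
    }
    return d;
}
```

The loop is what a call to memcpy(&d, &bits, sizeof d) would do; it is spelled out here only to show the byte-wise access explicitly.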

Answer 2

Reading double to platform endianness with union and bit shift, is it safe?

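This sort of thing only makes sense when dealing with data that comes from outside the program (for example, from a file or a network). In that case you have a strict data format, defined in a file format specification or a network protocol specification, that may have nothing to do with the format C uses, may have nothing to do with what the CPU uses, and may not be an IEEE 754 format either.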

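On the other hand, C provides no guarantees at all. As a simple example, it is perfectly legal for a compiler to use a BCD format for float, where 0x12345e78 = 1.2345 * 10**78, even if the CPU itself happens to support "IEEE 754".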

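The result is that you get a "whatever the spec says" format from outside the program, and you convert it into a different "whatever the compiler felt like" format for use inside the program; and every assumption you make (including sizeof(double)) could be wrong.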
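One way to avoid assuming anything about the compiler's representation is to decode the external format by value rather than by reinterpreting bits. The sketch below is an illustration only, not code from the question or from either answer: it assumes the external format is IEEE 754 binary64 stored big-endian, and the function name decode_binary64_be is hypothetical. It extracts the sign, exponent, and fraction fields with integer operations and rebuilds the value with ordinary floating-point arithmetic. On an implementation whose double cannot hold the value, the result would overflow or lose precision, and the INFINITY and NAN macros are assumed to be usable.

```c
#include <assert.h>
#include <math.h>     /* INFINITY and NAN macros only */
#include <stddef.h>
#include <stdint.h>

/* Decode a big-endian IEEE 754 binary64 value from buf[0..7] arithmetically,
   without relying on how the host represents double. */
static double decode_binary64_be(const unsigned char *buf) {
    assert(buf != NULL);

    /* Assemble the 64 bits, most significant byte first. */
    uint64_t bits = 0;
    for (int k = 0; k < 8; k++) {
        bits = (bits << 8) | buf[k];
    }

    int negative = (int)(bits >> 63);
    int exponent = (int)((bits >> 52) & 0x7FF);
    uint64_t fraction = bits & UINT64_C(0x000FFFFFFFFFFFFF);

    double value;
    if (exponent == 0x7FF) {
        /* All-ones exponent encodes infinity or NaN. */
        value = fraction ? NAN : INFINITY;
    } else {
        int scale;
        if (exponent == 0) {
            scale = -1074;                    /* zero or subnormal */
        } else {
            fraction |= UINT64_C(1) << 52;    /* restore the implicit leading 1 */
            scale = exponent - 1075;          /* bias 1023 plus 52 fraction bits */
        }
        value = (double)fraction;
        /* Scale by powers of two with plain multiplication. */
        for (; scale > 0; scale--) value *= 2.0;
        for (; scale < 0; scale++) value *= 0.5;
    }
    return negative ? -value : value;
}
```

Passing the question's buffer_be bytes (as unsigned char) to this function should yield approximately 3.14159. The trade-off is speed: the scaling loops can run for up to roughly a thousand iterations, whereas the union approach is a handful of shifts.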
