Is it undefined behavior to read and compare padding bytes of a POD type?

Today I came across some code roughly similar to the following snippet. Both valgrind and UndefinedBehaviorSanitizer detected reads of uninitialized data.

#include <cstddef>
#include <type_traits>

template <typename T>
void foo(const T& x)
{
    static_assert(std::is_pod_v<T> && sizeof(T) > 1);
    auto p = reinterpret_cast<const char*>(&x);

    std::size_t i = 1; 
    for(; i < sizeof(T); ++i)
    {
        if(p[i] != p[0]) { break; }
    }

    // ...
}

The aforementioned tools complain about the p[i] != p[0] comparison when an object containing padding bytes is passed to foo. Example:

struct obj { char c; int* i; };
foo(obj{'b', nullptr});
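To see which bytes of obj the tools are complaining about, a small sketch can locate the padding with offsetof. The constant names are illustrative, not from the question, and the layout is ABI-dependent: on common 64-bit ABIs c sits at offset 0 and i at offset 8, so bytes 1 through 7 are padding.

```cpp
#include <cassert>
#include <cstddef>

struct obj { char c; int* i; };

// Bytes between the end of c and the start of i: padding inserted so that
// i is suitably aligned. (Illustrative names, not part of the question.)
constexpr std::size_t interior_padding =
    offsetof(obj, i) - (offsetof(obj, c) + sizeof(char));

// Any bytes after i up to sizeof(obj) are tail padding.
constexpr std::size_t tail_padding =
    sizeof(obj) - (offsetof(obj, i) + sizeof(int*));

// The first member of a standard-layout struct is always at offset 0.
static_assert(offsetof(obj, c) == 0, "first member starts the object");
```

On the usual ABIs, interior_padding equals alignof(int*) - 1 and tail_padding is 0.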

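Is it undefined behavior to read padding bytes from a POD type and compare them to something else? I couldn't find a definitive answer either in the Standard or on StackOverflow.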

The behaviour of your program is implementation-defined on two counts:


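1) Prior to C++20: because your char might be a signed type with a ones' complement or sign-magnitude representation, you might get a surprising result from comparing a +0 with a -0 (in ones' complement, for example, the bytes 0x00 and 0xFF both represent zero, so they compare equal even though their bits differ).

The truly watertight way is to use a const unsigned char* pointer. This removes any concern about a ones' complement or sign-magnitude char, which C++20 has since abolished by requiring two's complement.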

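Since (i) you own the memory, (ii) you are taking a pointer to x, (iii) an unsigned char cannot contain a trap representation, and (iv) char, unsigned char, and signed char are exempt from the strict aliasing rule, the behaviour of reading uninitialised memory through a const unsigned char* is perfectly well-defined.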
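A sketch of foo rewritten along those lines reads through const unsigned char* instead of const char*. The name all_bytes_equal is hypothetical, and std::is_trivially_copyable_v stands in for the question's std::is_pod_v, which is deprecated in C++20. Whether this makes reading indeterminate padding well-defined is exactly what the other answers dispute, so the checks here use objects without padding.

```cpp
#include <cassert>
#include <cstddef>
#include <type_traits>

// Returns true if every byte of x's object representation equals the first.
// Reading through const unsigned char* sidesteps the ones' complement and
// sign-magnitude concerns of a possibly signed char.
template <typename T>
bool all_bytes_equal(const T& x)
{
    static_assert(std::is_trivially_copyable_v<T> && sizeof(T) > 1,
                  "need a trivially copyable type of more than one byte");
    const auto* p = reinterpret_cast<const unsigned char*>(&x);
    for (std::size_t i = 1; i < sizeof(T); ++i)
    {
        if (p[i] != p[0]) { return false; }
    }
    return true;
}
```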


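2) But since you don't know what that uninitialised memory contains, the behaviour on reading it is unspecified, which means the program behaviour is implementation-defined, since the char types cannot contain trap representations.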

It depends.

If x is zero-initialized, the padding consists of zero bits, so this case is well defined (8.5/6 in C++14):

To zero-initialize an object or reference of type T means:

— if T is a scalar type (3.9), the object is initialized to the value obtained by converting the integer literal 0 (zero) to T;

— if T is a (possibly cv-qualified) non-union class type, each non-static data member and each base-class subobject is zero-initialized and padding is initialized to zero bits;

— if T is a (possibly cv-qualified) union type, the object’s first non-static named data member is zero-initialized and padding is initialized to zero bits;

— if T is an array type, each element is zero-initialized;

— if T is a reference type, no initialization is performed.
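The zero-initialization case can be checked with a short sketch. It assumes a platform where the null pointer is represented as all-zero bits (true on mainstream ABIs, but not guaranteed by the standard). It also relies on obj() rather than obj{}: for an aggregate, the braced form is aggregate initialization rather than value-initialization, so by the letter of the wording quoted above it would not necessarily zero the padding. every_byte_zero and static_obj are hypothetical names.

```cpp
#include <cassert>
#include <cstddef>

struct obj { char c; int* i; };

// Returns true if every byte of o, padding included, is zero.
// (Hypothetical helper; only meaningful for zero-initialized objects.)
inline bool every_byte_zero(const obj& o)
{
    const auto* p = reinterpret_cast<const unsigned char*>(&o);
    for (std::size_t k = 0; k < sizeof(obj); ++k)
    {
        if (p[k] != 0) { return false; }
    }
    return true;
}

// Objects with static storage duration are zero-initialized, padding included.
static obj static_obj;
```

A local obj value = obj(); is value-initialized, and because obj has no user-provided constructor that means zero-initialized, padding included.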

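However, if x is default-initialized, the padding is not initialized, so it has an indeterminate value (inferred from the fact that padding is not mentioned here) (8.5/7):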

To default-initialize an object of type T means:

— if T is a (possibly cv-qualified) class type (Clause 9), the default constructor (12.1) for T is called (and the initialization is ill-formed if T has no default constructor or overload resolution (13.3) results in an ambiguity or in a function that is deleted or inaccessible from the context of the initialization);

— if T is an array type, each element is default-initialized;

— otherwise, no initialization is performed.

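In that case, comparing the indeterminate values is UB, because none of the listed exceptions apply: you are comparing an indeterminate value with something (8.5/12):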

If no initializer is specified for an object, the object is default-initialized. When storage for an object with automatic or dynamic storage duration is obtained, the object has an indeterminate value, and if no initialization is performed for the object, that object retains an indeterminate value until that value is replaced (5.17). [ Note: Objects with static or thread storage duration are zero-initialized, see 3.6.2. — end note ] If an indeterminate value is produced by an evaluation, the behavior is undefined except in the following cases:

— If an indeterminate value of unsigned narrow character type (3.9.1) is produced by the evaluation of:

    — the second or third operand of a conditional expression (5.16),

    — the right operand of a comma expression (5.18),

    — the operand of a cast or conversion to an unsigned narrow character type (4.7, 5.2.3, 5.2.9, 5.4), or

    — a discarded-value expression (Clause 5),

  then the result of the operation is an indeterminate value.

— If an indeterminate value of unsigned narrow character type is produced by the evaluation of the right operand of a simple assignment operator (5.17) whose first operand is an lvalue of unsigned narrow character type, an indeterminate value replaces the value of the object referred to by the left operand.

— If an indeterminate value of unsigned narrow character type is produced by the evaluation of the initialization expression when initializing an object of unsigned narrow character type, that object is initialized to an indeterminate value.
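By contrast, those exceptions do permit copying an indeterminate byte from one unsigned char lvalue to another. The sketch below copies an object byte by byte using only such assignments, which stays within the listed cases even when some source bytes are indeterminate padding; comparing those bytes, as foo does, would not. copy_object_bytes is a hypothetical helper.

```cpp
#include <cassert>
#include <cstddef>

struct obj { char c; int* i; };

// Each step is "d[k] = s[k]" with unsigned char on both sides: a simple
// assignment whose right operand may be an indeterminate unsigned char,
// one of the cases the quoted exceptions allow. Nothing here compares,
// branches on, or otherwise inspects the copied bytes.
inline void copy_object_bytes(obj& dst, const obj& src)
{
    const auto* s = reinterpret_cast<const unsigned char*>(&src);
    auto* d = reinterpret_cast<unsigned char*>(&dst);
    for (std::size_t k = 0; k < sizeof(obj); ++k)
    {
        d[k] = s[k];
    }
}
```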

Bathsheba's answer correctly describes the letter of the C++ standard.

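The bad news is that all of the modern compilers I have tested (GCC, Clang, MSVC, and ICC) ignore the letter of the standard on this point. Instead, they treat the bald statement in Annex J.2 of the C standard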

[the behavior is undefined if] the value of an object with automatic storage duration is used while it is indeterminate

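as if it were 100% normative, in both C and C++, even though Annex J is not normative. This applies to all possible read accesses to uninitialized storage, including ones carefully performed through an unsigned char *, and, yes, including read accesses to padding bytes.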

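Furthermore, if you were to file a bug report, I'm sure you would be told that if the normative text of the standard disagrees with what the compilers do, it is the standard that is defective.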

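The good news is that you only incur UB from padding bytes if you actually inspect their contents. Copying them around is fine. In particular, if you initialize all the named fields of a POD structure, it is safe to copy it by structure assignment and memcpy, but it is not safe to compare it with another such structure using memcmp.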
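Following that advice, a sketch can compare two obj values member by member, so the padding is never inspected, while still copying them by structure assignment or memcpy. The operator== for obj and copy_via_memcpy are illustrative additions, not part of the question's code.

```cpp
#include <cassert>
#include <cstring>

struct obj { char c; int* i; };

// Compare only the named members. A memcmp over sizeof(obj) bytes would
// also read the padding, which is what the answer above warns against.
inline bool operator==(const obj& a, const obj& b)
{
    return a.c == b.c && a.i == b.i;
}

// Copying is fine: memcpy may carry the padding bytes along, but nothing
// ever looks at them.
inline obj copy_via_memcpy(const obj& src)
{
    obj dst;
    std::memcpy(&dst, &src, sizeof dst);
    return dst;
}
```

Structure assignment (b = a;) is equally safe here; only the comparison has to avoid the raw bytes.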