UB When Dereferencing Array of Unions

Which of the following is undefined behaviour:

template <class T> struct Struct { T t; };

template <class T> union Union { T t; };

template <class T> void function() {
  Struct<T> aS[10];
  Union<T> aU[10];

  // do something with aS[9].t and aU[9].t including initialization

  T *aSP = reinterpret_cast<T *>(aS);
  T *aUP = reinterpret_cast<T *>(aU);

  // so here is this undefined behaviour?
  T valueS = aSP[9];
  // use valueS in whatever way

  // so here is this undefined behaviour?
  T valueU = aUP[9];
  // use valueU in whatever way

  // now is accessing aS[9].t or aU[9].t now UB?
}

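So yeah, which of the last 3 operations is UB here?

(My reasoning: I don't know whether there is any requirement for the struct's size to be the same as that of its single element, but AFAIK a union has to be the same size as its element. I don't know the alignment requirements for the union, but I am guessing they are the same. For the struct I have no idea. In the case of the union, I would guess that it is not UB, but as I said, I am really, really not sure. For the struct I actually have no idea.)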

I found c++ - Is sizeof(T) == sizeof(int). This specifies that structs do not have to have the same size as their elements (sigh). As for unions, the same would probably apply (after reading the answers, I am led to believe so). This alone is enough to make this situation UB. However, if sizeof(Struct) == sizeof(T), and "It's well-established that" in , a pointer to aSP[9] would be the same location as that of aS[9] (at least I think so), and reinterpret_cast'ing that is guaranteed by the standard (according to the quote in ).

EDIT: This is actually wrong. The correct answer is .

So if we look at the documentation for reinterpret_cast (here):

5) Any object pointer type T1* can be converted to another object pointer type cv T2*. This is exactly equivalent to static_cast<cv T2*>(static_cast<cv void*>(expression)) (which implies that if T2's alignment requirement is not stricter than T1's, the value of the pointer does not change and conversion of the resulting pointer back to its original type yields the original value). In any case, the resulting pointer may only be dereferenced safely if allowed by the type aliasing rules (see below).

Now what do the aliasing rules say?

Whenever an attempt is made to read or modify the stored value of an object of type DynamicType through a glvalue of type AliasedType, the behavior is undefined unless one of the following is true:

  1. AliasedType and DynamicType are similar.
  2. AliasedType is the (possibly cv-qualified) signed or unsigned variant of DynamicType.
  3. AliasedType is std::byte (since C++17), char, or unsigned char: this permits examination of the object representation of any object as an array of bytes.

So it's neither 2 nor 3. Maybe 1?

Similar:

Informally, two types are similar if, ignoring top-level cv-qualification:

  1. they are the same type; or
  2. they are both pointers, and the pointed-to types are similar; or
  3. they are both pointers to member of the same class, and the types of the pointed-to members are similar; or
  4. they are both arrays of the same size or both arrays of unknown bound, and the array element types are similar.

And, from the C++17 draft:

Two objects a and b are pointer-interconvertible if:

  • they are the same object, or
  • one is a union object and the other is a non-static data member of that object ([class.union]), or
  • one is a standard-layout class object and the other is the first non-static data member of that object, or, if the object has no non-static data members, any base class subobject of that object ([class.mem]), or
  • there exists an object c such that a and c are pointer-interconvertible, and c and b are pointer-interconvertible.

If two objects are pointer-interconvertible, then they have the same address, and it is possible to obtain a pointer to one from a pointer to the other via a reinterpret_cast. [ Note: An array object and its first element are not pointer-interconvertible, even though they have the same address. — end note]
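As a rough illustration of the quoted rule, here is a minimal C++17 sketch using the Struct and Union templates from the question. The helper names first_member_shares_address, union_member_shares_address and run_checks are made up for this example. It only looks at a single object and its member, and assumes T is a trivial type such as int or double; it says nothing about arrays.

```cpp
#include <cassert>
#include <type_traits>

template <class T> struct Struct { T t; };
template <class T> union Union { T t; };

// Hypothetical helper: a standard-layout class object and its first
// non-static data member are pointer-interconvertible, so the cast is
// expected to yield a pointer to s.t.
template <class T>
bool first_member_shares_address() {
  static_assert(std::is_standard_layout_v<Struct<T>>,
                "this sketch assumes a standard-layout Struct<T>");
  Struct<T> s{};
  T *p = reinterpret_cast<T *>(&s);
  return p == &s.t;
}

// Hypothetical helper: a union object and its non-static data member are
// pointer-interconvertible as well.
template <class T>
bool union_member_shares_address() {
  Union<T> u{};  // empty braces initialize the first (only) member
  T *p = reinterpret_cast<T *>(&u);
  return p == &u.t;
}

// Usage: both checks are expected to hold for simple types.
inline void run_checks() {
  assert(first_member_shares_address<int>());
  assert(union_member_shares_address<double>());
}
```

Note that both helpers cast a pointer to one object, not a pointer obtained from an array of ten of them.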

So, for me:

T *aSP = reinterpret_cast<T *>(aS); // Is OK
T *aUP = reinterpret_cast<T *>(aU); // Is OK. 

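tl;dr: The last two statements in the code above will always invoke undefined behavior. Simply casting a pointer to a union to a pointer to one of its member types is generally fine because it doesn't really do anything (it is unspecified at worst, but never undefined behavior; note: we are talking only about the cast itself here; using the result of the cast to access an object is a whole different story).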


Depending on what T ends up being, Struct<T> may be a standard-layout struct [class.prop]/3, in which case

T *aSP = reinterpret_cast<T *>(aS);

would be well-defined, because a Struct<T> would be pointer-interconvertible with its first member (which is of type T) [basic.compound]/4.3. The reinterpret_cast above is equivalent to [expr.reinterpret.cast]/7

T *aSP = static_cast<T *>(static_cast<void *>(aS));

This will invoke the array-to-pointer conversion [conv.array], resulting in a Struct<T>* pointing to the first element of aS. This pointer is then converted to void* (via [expr.static.cast]/4 and [conv.ptr]/2), which is then converted to T*, which would be legal via [expr.static.cast]/13:

A prvalue of type “pointer to cv1 void” can be converted to a prvalue of type “pointer to cv2 T”, where T is an object type and cv2 is the same cv-qualification as, or greater cv-qualification than, cv1. If the original pointer value represents the address A of a byte in memory and A does not satisfy the alignment requirement of T, then the resulting pointer value is unspecified. Otherwise, if the original pointer value points to an object a, and there is an object b of type T (ignoring cv-qualification) that is pointer-interconvertible with a, the result is a pointer to b. Otherwise, the pointer value is unchanged by the conversion.
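A small sketch of that equivalence, assuming a standard-layout Struct<T> and a simple T such as int (the function name casts_agree is hypothetical): both spellings of the cast are expected to produce the same pointer, and that pointer points to aS[0].t, i.e. to a single T object, not to an element of an array of T.

```cpp
#include <cassert>
#include <type_traits>

template <class T> struct Struct { T t; };

// Hypothetical helper comparing the reinterpret_cast with the equivalent
// chain of static_casts through void*.
template <class T>
bool casts_agree() {
  static_assert(std::is_standard_layout_v<Struct<T>>,
                "this sketch assumes a standard-layout Struct<T>");
  Struct<T> aS[10] = {};
  // Array-to-pointer conversion first yields a Struct<T>* to aS[0].
  T *viaReinterpret = reinterpret_cast<T *>(aS);
  T *viaStatic = static_cast<T *>(static_cast<void *>(aS));
  // aS[0] and aS[0].t are pointer-interconvertible, so both results
  // point to aS[0].t.
  return viaReinterpret == viaStatic && viaStatic == &aS[0].t;
}

// Usage: expected to hold for simple types.
inline void run_cast_check() {
  assert(casts_agree<int>());
}
```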

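Similarly,

T *aUP = reinterpret_cast<T *>(aU);

would be well-defined in C++17 if Union<T> is a standard-layout union, and looks to be well-defined in general with the coming version of C++ based on the current standard draft, where a union and one of its members are always pointer-interconvertible [basic.compound]/4.2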

All of the above is irrelevant, however, because

T valueS = aSP[9];

T valueU = aUP[9];

will invoke undefined behavior no matter what. aSP[9] and aUP[9] are (by definition) the same as *(aSP + 9) and *(aUP + 9) respectively [expr.sub]/1. The pointer arithmetic in these expressions is subject to [expr.add]/4:

When an expression J that has integral type is added to or subtracted from an expression P of pointer type, the result has the type of P.

  • If P evaluates to a null pointer value and J evaluates to 0, the result is a null pointer value.
  • Otherwise, if P points to element x[i] of an array object x with n elements, the expressions P + J and J + P (where J has the value j) point to the (possibly-hypothetical) element x[i+j] if 0≤i+j≤n and the expression P - J points to the (possibly-hypothetical) element x[i−j] if 0≤i−j≤n.
  • Otherwise, the behavior is undefined.

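aSP and aUP do not point to an element of an array. Even if aSP and aUP would be pointer-interconvertible with T, you would only ever be allowed to access element 0 and compute the address of (but not access) element 1 of the hypothetical single-element array…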
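One way to stay within these rules, sketched here under the assumption that T is a simple copyable type and that t is the active union member, is to do the indexing on the real array of Struct<T> or Union<T> and only then reach for the member. The function names below are hypothetical, and the fixed size 10 just mirrors the question.

```cpp
#include <cassert>
#include <type_traits>

template <class T> struct Struct { T t; };
template <class T> union Union { T t; };

// Index the array of Struct<T>, then read the member. No arithmetic is
// ever done on a T*.
template <class T>
T read_struct_element(const Struct<T> (&aS)[10], int i) {
  assert(i >= 0 && i < 10);
  return aS[i].t;
}

// Same idea for the union; assumes t is the active member of aU[i].
template <class T>
T read_union_element(const Union<T> (&aU)[10], int i) {
  assert(i >= 0 && i < 10);
  return aU[i].t;
}

// Alternative: do the arithmetic on the Struct<T>* and cast only the
// resulting element pointer. For a standard-layout Struct<T>, aS[i] and
// aS[i].t are pointer-interconvertible, so the cast points to aS[i].t.
template <class T>
T read_struct_element_via_cast(Struct<T> (&aS)[10], int i) {
  static_assert(std::is_standard_layout_v<Struct<T>>,
                "this sketch assumes a standard-layout Struct<T>");
  assert(i >= 0 && i < 10);
  return *reinterpret_cast<T *>(aS + i);
}
```

The difference from the question's code is the order of operations: aS + i is arithmetic inside an actual array of Struct<T>, whereas aSP + 9 is arithmetic on a pointer to a lone T.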