Unaligned load/store in gcc vector extension
I need to access unaligned values using the GCC vector extensions.
The program below crashes when built with either clang or gcc:
typedef int __attribute__((vector_size(16))) int4;
typedef int __attribute__((vector_size(16),aligned(4))) *int4p;
int main()
{
    int v[64] __attribute__((aligned(16))) = {};
    int4p ptr = reinterpret_cast<int4p>(&v[7]);
    int4 val = *ptr;
}
But if I change
typedef int __attribute__((vector_size(16),aligned(4))) *int4p;
to
typedef int __attribute__((vector_size(16),aligned(4))) int4u;
typedef int4u *int4up;
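the generated assembly is correct (it uses unaligned loads) in both clang and gcc.
What is wrong with the single definition, or what am I missing? Could this be the same bug in both clang and gcc?
Note: this happens in both clang and gcc.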
TL;DR
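You changed the alignment of the pointer type itself, not of the pointee type. This has nothing to do with the vector_size attribute and everything to do with the aligned attribute. It is also not a bug, and it is implemented correctly in both GCC and Clang.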
Long story
From the GCC documentation, § 6.33.1 Common Type Attributes (emphasis added):
aligned (alignment)
This attribute specifies a minimum alignment (in bytes) for variables of the specified type. [...]
The type in question is the type being declared, not the type that the declared type points to. Therefore,
typedef int __attribute__((vector_size(16),aligned(4))) *int4p;
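declares a new type T that points to objects of type *T, where:
- *T is a 16-byte vector with the default alignment for its size (16 bytes)
- T is a pointer type, and variables of this type may, exceptionally, be stored at addresses aligned to as little as a 4-byte boundary (even though what they point to, a *T, is far more strictly aligned).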
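The distinction above can be inspected at compile time. The following is a minimal sketch, not part of the original answer: the helper names pointee_align_single, pointee_align_chain and check_chain are hypothetical, and it assumes a GCC or Clang compiler in C++11 mode or later. It asks for the alignment of what each pointer type points to, rather than the alignment of the pointer itself.

```cpp
#include <cassert>
#include <cstddef>

typedef int __attribute__((vector_size(16))) int4;
typedef int __attribute__((vector_size(16), aligned(4))) *int4p;  // single typedef from the question
typedef int __attribute__((vector_size(16), aligned(4))) int4u;   // typedef chain, step 1
typedef int4u *int4up;                                            // typedef chain, step 2

// decltype(*p) is a reference type, and alignof applied to a reference
// yields the alignment of the referenced type, i.e. of the pointee.
// (Hypothetical helper names, for illustration only.)
constexpr std::size_t pointee_align_single() { return alignof(decltype(*int4p())); }
constexpr std::size_t pointee_align_chain()  { return alignof(decltype(*int4up())); }

// The chain's pointee is int4u by construction, so the two must agree.
inline void check_chain() {
    assert(pointee_align_chain() == alignof(int4u));
}
```

If the reading above is right, pointee_align_chain() reports the reduced alignment requested by aligned(4), while pointee_align_single() still reports the vector's default alignment, because in the single typedef the attribute landed on the pointer type.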
Meanwhile, § 6.49 Using Vector Instructions through Built-in Functions says (emphasis added):
On some targets, the instruction set contains SIMD vector instructions which operate on multiple values contained in one large register at the same time. For example, on the x86 the MMX, 3DNow! and SSE extensions can be used this way.
The first step in using these extensions is to provide the necessary data types. This should be done using an appropriate typedef:
typedef int v4si __attribute__ ((vector_size (16)));
The int type specifies the base type, while the attribute specifies the vector size for the variable, measured in bytes. For example, the declaration above causes the compiler to set the mode for the v4si type to be 16 bytes wide and divided into int sized units. For a 32-bit int this means a vector of 4 units of 4 bytes, and the corresponding mode of foo is V4SI.
The vector_size attribute is only applicable to integral and float scalars, although arrays, pointers, and function return values are allowed in conjunction with this construct. Only sizes that are a power of two are currently allowed.
Demo
#include <stdio.h>
typedef int __attribute__((aligned(128))) * batcrazyptr;
struct batcrazystruct{
    batcrazyptr ptr;
};
int main()
{
    printf("Ptr: %zu\n", sizeof(batcrazyptr));
    printf("Struct: %zu\n", sizeof(batcrazystruct));
}
Output:
Ptr: 8
Struct: 128
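This is consistent with the alignment requirement applying to batcrazyptr ptr itself rather than to its pointee, and it agrees with the documentation.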
Solution
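I'm afraid you will be forced to use a chain of typedefs, as you did with int4u. It would be unreasonable to have a separate attribute for specifying the alignment of each pointer level within a single typedef.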
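As an illustration of the typedef chain in use, here is a minimal sketch built on the int4u and int4up definitions from the question. The helpers load_unaligned, store_unaligned and roundtrip_ok are hypothetical names introduced only for this example, and the code assumes GCC or Clang with 32-bit int.

```cpp
typedef int __attribute__((vector_size(16))) int4;
typedef int __attribute__((vector_size(16), aligned(4))) int4u;  // under-aligned vector type
typedef int4u *int4up;                                           // pointer to it

// Load four ints starting at p; p only needs the alignment of int.
inline int4 load_unaligned(const int *p) {
    return *reinterpret_cast<const int4u *>(p);
}

// Store four ints starting at p; p only needs the alignment of int.
inline void store_unaligned(int *p, int4 v) {
    *reinterpret_cast<int4up>(p) = v;
}

// Round-trip check on addresses that are not 16-byte aligned.
inline bool roundtrip_ok() {
    int v[64] __attribute__((aligned(16))) = {};
    for (int i = 0; i < 64; ++i) v[i] = i;

    int4 val = load_unaligned(&v[7]);   // byte offset 28, not a multiple of 16
    bool loaded = val[0] == 7 && val[1] == 8 && val[2] == 9 && val[3] == 10;

    store_unaligned(&v[21], val + val); // byte offset 84, not a multiple of 16
    bool stored = v[21] == 14 && v[22] == 16 && v[23] == 18 && v[24] == 20
               && v[20] == 20 && v[25] == 25;  // neighbours left untouched
    return loaded && stored;
}
```

Keeping the under-aligned type behind small helpers like these means the rest of the code can keep using the normally aligned int4 for values and locals, and only the memory accesses go through int4u.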