Clang fails to vectorize when comparing sequential array elements

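I am trying to vectorize the inner loop of my alignment function and have run into a problem I don't understand. When the loop compares two consecutive elements of the input array it is not vectorized, but when the compared elements are offset by 2 it vectorizes successfully. A minimal example: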

#include <stdlib.h>

int *vec_test(int *input) {
  int i, n1, n2;
  int *out = (int *) malloc(100 * sizeof(int));

  // This loop fails to vectorize
  for(i=1;i<100;i++) {
    n1 = input[i-1];
    n2 = input[i];
    out[i] = n1 > n2 ? n1 : n2;
  }

  // This loop successfully vectorizes
  for(i=1;i<100;i++) {
    n1 = input[i-1];
    n2 = input[i+1];
    out[i] = n1 > n2 ? n1 : n2;
  }

  return(out);
}

When I compile this code with clang (clang++ -O2 -Rpass=loop-vectorize -Rpass-analysis=loop-vectorize -c minimal.cpp), the second loop is vectorized but the first is not.

minimal.cpp:17:17: remark: loop not vectorized: value that could not be identified as reduction is used outside the loop

minimal.cpp:23:3: remark: vectorized loop (vectorization factor: 4, unrolling interleave factor: 1) [-Rpass=loop-vectorize]

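The only difference is that the compared elements are adjacent in the first loop and offset by 2 in the second. Why does the first loop fail to vectorize?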

Edit: Replacing int with a type of a different width (int64_t, int32_t, or int16_t) gives the same result: the bottom loop vectorizes and the top loop fails to vectorize.

This failure looks like a bug in clang ~3.8 that was fixed by 3.9.0. With 3.9.0, both loops are vectorized:

$ clang++ -O2 -Rpass=loop-vectorize -Rpass-analysis=loop-vectorize -c minimal.cpp
minimal.cpp:8:3: remark: the cost-model indicates that interleaving is not beneficial [-Rpass-analysis=loop-vectorize]
  for(i=1;i<100;i++) {
  ^
minimal.cpp:8:3: remark: vectorized loop (vectorization width: 4, interleaved count: 1) [-Rpass=loop-vectorize]
minimal.cpp:15:3: remark: the cost-model indicates that interleaving is not beneficial [-Rpass-analysis=loop-vectorize]
  for(i=1;i<100;i++) {
  ^
minimal.cpp:15:3: remark: vectorized loop (vectorization width: 4, interleaved count: 1) [-Rpass=loop-vectorize]

$ clang++ --version
clang version 3.9.0 (tags/RELEASE_390/final)
Target: x86_64-unknown-linux-gnu
Thread model: posix
InstalledDir: /opt/clang-latest/bin

See also https://godbolt.org/g/Nw0kk1
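
One plausible reading of the 3.8 remark, offered here as an assumption rather than something the output above confirms: because input[i-1] in one iteration is the same value as input[i] in the previous one, the optimizer may forward the load and carry it across iterations, and the older vectorizer may not have recognized that carried value as an induction or a reduction. The sketch below only illustrates what such a carried-value form looks like in source. The function names max_adjacent_two_loads and max_adjacent_carried are hypothetical, and the two functions are equivalent only when out does not overlap input.

```c
/* Form written in the question: two loads from input per iteration. */
void max_adjacent_two_loads(const int *input, int *out, int n) {
  for (int i = 1; i < n; i++) {
    int n1 = input[i - 1];
    int n2 = input[i];
    out[i] = n1 > n2 ? n1 : n2;
  }
}

/* Hypothetical hand-written equivalent (assumes out does not alias input):
   prev carries the previous iteration's load, so each iteration depends on
   a value produced by the one before it. */
void max_adjacent_carried(const int *input, int *out, int n) {
  if (n < 2) return;            /* nothing to compare */
  int prev = input[0];
  for (int i = 1; i < n; i++) {
    int cur = input[i];
    out[i] = prev > cur ? prev : cur;
    prev = cur;                 /* value reused by the next iteration */
  }
}
```

Compiling both functions with the -Rpass flags shown above on the compiler version in question is a quick way to check whether the carried form is what triggers the remark. In the offset-by-2 loop the two loads never coincide between adjacent iterations, so no such carried value would arise there.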