OpenCL dot product: I am trying to store the result in a local variable, but it resets to zero every time

Question

My kernel code:

__kernel void OUT__1__1527__(__constant float *A, __constant float *B, __global float *res)
{
    int i = get_global_id(0);
    float C = 0;
    if (i <= 5 - 1) {
        C += (A[i] * B[i]);
        *res = C;
    }
}

A and B both hold the values {1,2,3,4,5}. With this kernel I get 25, i.e. 5*5, whereas I expect 55 (1*1+2*2+3*3+4*4+5*5).

What code do I need to insert for synchronization, and where?

Answer 1

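There is no silver bullet for this kind of thing. This is a typical reduction problem (you want to combine multiple results into a single variable).

If this is not the bottleneck of your algorithm (i.e. you are doing other, more expensive processing elsewhere in the kernel), you can use atomics (though not directly on floating-point values). But if this is the core of your kernel, you should completely change your algorithm.

You can start by reading this: http://developer.amd.com/resources/documentation-articles/articles-whitepapers/opencl-optimization-case-study-simple-reductions/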
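As a rough illustration of the reduction pattern described above, here is a sequential C emulation of what one work-group would do in parallel. It is a sketch, not a kernel: the name dot_tree_reduce and the GROUP_SIZE constant are made up for this example, the scratch array stands in for __local memory, and each inner loop over lid stands in for the work-items of a group. The comments mark where a real kernel would need barrier(CLK_LOCAL_MEM_FENCE).

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical work-group size for this sketch; must be a power of two. */
#define GROUP_SIZE 8

/* Sequential emulation of a work-group tree reduction for a dot product.
 * Each iteration of an inner "lid" loop stands in for one work-item;
 * in a real kernel those iterations would run in parallel. */
float dot_tree_reduce(const float *a, const float *b, size_t n)
{
    assert(a != NULL && b != NULL);
    float total = 0.0f;

    /* One outer iteration per emulated work-group. */
    for (size_t base = 0; base < n; base += GROUP_SIZE) {
        float scratch[GROUP_SIZE]; /* stands in for __local memory */

        /* Step 1: every work-item stores its own product; padding items store 0. */
        for (size_t lid = 0; lid < GROUP_SIZE; ++lid) {
            size_t gid = base + lid;
            scratch[lid] = (gid < n) ? a[gid] * b[gid] : 0.0f;
        }
        /* A real kernel would need barrier(CLK_LOCAL_MEM_FENCE) here. */

        /* Step 2: halve the number of active work-items each round. */
        for (size_t stride = GROUP_SIZE / 2; stride > 0; stride /= 2) {
            for (size_t lid = 0; lid < stride; ++lid)
                scratch[lid] += scratch[lid + stride];
            /* ...and another barrier after each round. */
        }

        /* Step 3: work-item 0 now holds the group's partial sum. */
        total += scratch[0];
    }
    return total;
}
```

With A = B = {1,2,3,4,5}, the scratch array starts as {1,4,9,16,25,0,0,0} and collapses to 55 in three rounds. In an actual kernel, each group would typically write its partial sum to a per-group output slot, and the host or a second kernel pass would add those up.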

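Your code is also wrong: it never actually "merges" any data into res. It only sets res to the value of C. C is private to each thread (work-item), so nothing is ever summed into it. Every thread writes its own product to res, and only the last one wins the data race, which gives the answer 25.

There is a trick, which I do not recommend, based on a union and on reading/writing global memory multiple times, that gives you atomics on floats: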

inline void AtomicAdd(volatile __global float *source, const float operand) {
    union {
        unsigned int intVal;
        float floatVal;
    } newVal;
    union {
        unsigned int intVal;
        float floatVal;
    } prevVal;
    do {
        prevVal.floatVal = *source;
        newVal.floatVal = prevVal.floatVal + operand;
    } while (atomic_cmpxchg((volatile __global unsigned int *)source, prevVal.intVal, newVal.intVal) != prevVal.intVal);
}

__kernel void OUT__1__1527__(__constant float *A, __constant float *B, __global float *res)
{
    int i = get_global_id(0);
    /* *res must be set to 0 by the host before the kernel is launched. */
    if (i <= 5 - 1) {
        AtomicAdd(res, (A[i] * B[i]));
    }
}
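For readers who want to try the compare-and-swap loop outside OpenCL, here is a host-side C11 analogue of the same idea. It is only a sketch under stated assumptions: it uses <stdatomic.h> rather than OpenCL's atomic_cmpxchg, it stores the float's bit pattern in an _Atomic uint32_t, and the names atomic_add_float and load_float are invented for this example.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdint.h>
#include <string.h>

/* Add `operand` to the float whose bit pattern is stored in *target.
 * Same retry loop as the OpenCL AtomicAdd, written with C11 atomics. */
void atomic_add_float(_Atomic uint32_t *target, float operand)
{
    assert(target != NULL);
    uint32_t old_bits = atomic_load(target);
    uint32_t new_bits;
    do {
        float old_val, new_val;
        memcpy(&old_val, &old_bits, sizeof old_val); /* bits -> float */
        new_val = old_val + operand;
        memcpy(&new_bits, &new_val, sizeof new_bits); /* float -> bits */
        /* On failure, old_bits is refreshed with the current contents
         * of *target and the loop recomputes the sum. */
    } while (!atomic_compare_exchange_weak(target, &old_bits, new_bits));
}

/* Read the accumulated value back as a float. */
float load_float(_Atomic uint32_t *target)
{
    uint32_t bits = atomic_load(target);
    float value;
    memcpy(&value, &bits, sizeof value);
    return value;
}
```

The accumulator has to start at the bit pattern of 0.0f (all zero bits), just as res has to be zeroed before the kernel runs. Every failed compare-and-swap costs another round trip to memory, which is why this approach scales poorly when many threads hit the same address.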


synchronization

opencl