OpenCL - Insert values in every n elements of an array

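I have an array of 100 elements, and I want to copy these 100 elements into every nth element of another array.

Let's say n is 3.

After copying the values into every nth element, the new array would contain [val1 0 0 val2 0 0 val3 0 0 ...]. In OpenCL, I tried to create a pointer that holds the current index, and each time I just add n to that value. However, the current index always keeps the same value. Below is my code.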

__kernel void ddc(__global float *inputArray, __global float *outputArray,  __const int interpolateFactor, __global int *currentIndex){
    int i = get_global_id(0);
    outputArray[currentIndex[0]] = inputArray[i];
    currentIndex[0] = currentIndex[0] + (interpolateFactor - 1);
    printf("index %i \n", currentIndex[0]);    
}

Host code for the currentIndex part:

int  *index;
index = (int*)malloc(2*sizeof(int));
index[0] = 0;

cl_mem currentIndex;
currentIndex = clCreateBuffer(
    context,
    CL_MEM_WRITE_ONLY,
    2 * sizeof(int),
    NULL,
    &status);
status = clEnqueueWriteBuffer(
    cmdQueue,
    currentIndex,
    CL_FALSE,
    0,
    2 * sizeof(int),
    index,
    0,
    NULL,
    NULL);
printf("Index enqueueWriteBuffer status: %i \n", status);
status |= clSetKernelArg(
    kernel,
    4,
    sizeof(cl_mem),
    &currentIndex);
printf("Kernel Arg currentIndex Factor status: %i \n", status);

If you are wondering why I am using an array with two elements, it's because I wasn't sure how to just reference a single variable. I just implemented it the same way I had the input and output array working. When I run the kernel with an interpolateFactor of 3, currentIndex is always printing 2.

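So if I understand correctly, what you are trying to do is save the next index to use in currentIndex. This will not work. Work items run concurrently, and an update written by one work item is not immediately visible to the others, so many of them read the same starting value. If you wanted to do it this way, you would have to execute all the work items sequentially.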
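One way to picture why the printout is always 2 is a simplified model in plain C, sketched below. It is not OpenCL and makes a strong assumption: every work item reads the shared counter before any update becomes visible. Real devices give no ordering guarantee at all, so this is only one possible outcome. The name `simulate_shared_counter` is hypothetical.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical model, not OpenCL: every "work item" reads the shared
   counter before any work item's update becomes visible. */
static int simulate_shared_counter(const float *in, size_t count,
                                   float *out, int factor, int *shared_index)
{
    assert(factor >= 1);
    int last_written = *shared_index;
    for (size_t i = 0; i < count; ++i) {
        int seen = *shared_index;           /* stale value, identical for every work item */
        out[seen] = in[i];                  /* so every work item writes the same slot */
        last_written = seen + (factor - 1); /* same increment as in the question's kernel */
    }
    *shared_index = last_written;           /* updates land only after all reads */
    return *shared_index;
}
```

Under this model, with a starting index of 0 and a factor of 3, every work item computes 0 + 2, which matches the constant 2 reported in the question, and all inputs collide in `out[0]`.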

What you can do is this:

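__kernel void ddc(__global float *inputArray, __global float *outputArray,  __const int interpolateFactor, int start){
    int i = get_global_id(0);
    // Each work item derives its own output slot from its global ID, so no shared counter is needed.
    // The stride is interpolateFactor (not interpolateFactor - 1) to get [val1 0 0 val2 0 0 ...] for a factor of 3.
    outputArray[start + i * interpolateFactor] = inputArray[i];
}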

This assumes you may want to start from some point other than 0. Otherwise you can drop start entirely.
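For checking the kernel's output on the host, a CPU reference can be handy. The sketch below assumes the layout from the question, [val1 0 0 val2 0 0 ...], so element i lands at start + i * n. The helper name `scatter_every_n` and its bounds check are illustrative assumptions, not part of the original code.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* CPU reference (hypothetical helper): place in[i] at out[start + i * factor]
   and leave zeros elsewhere. Returns 0 on success, -1 if out is too small. */
static int scatter_every_n(const float *in, size_t count,
                           float *out, size_t out_len,
                           size_t factor, size_t start)
{
    assert(factor >= 1);
    /* The last element written is at start + (count - 1) * factor. */
    if (count > 0 && start + (count - 1) * factor >= out_len)
        return -1;
    memset(out, 0, out_len * sizeof *out);  /* gaps hold zeros */
    for (size_t i = 0; i < count; ++i)      /* i plays the role of get_global_id(0) */
        out[start + i * factor] = in[i];
    return 0;
}
```

For 100 inputs, a factor of 3, and start = 0, the last write goes to index 297, so the output buffer needs at least 298 elements. The device-side output buffer also has to be zeroed before the kernel runs, since the kernel only writes every nth slot.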

To set it up on the host the way you had it:

int start = 0;
status |= clSetKernelArg(
    kernel,
    3, // This should be 3 right? Might have been the problem to begin with.
    sizeof(int),
    &start);

Hope this helps.