Partial sorting of a small array in OpenCL private memory

The task is to find the N largest elements in an array. The array is quite small (about 40 items). I am using this algorithm:

    float max1 = -INFINITY;
    int max1I = -1;
    float max2 = -INFINITY;
    int max2I = -1;
    float max3 = -INFINITY;
    int max3I = -1;
    float max4 = -INFINITY;
    int max4I = -1;
    float max5 = -INFINITY;
    int max5I = -1;
    float performances[MAX_NUMBER_OF_SELECTIONS];
    for (int i = 0; i < numberOfSelections; ++i) {
        float performance = /*some calculations*/;
        performances[i] = performance;
        if (performance > max1) {
            max5 = max4; max5I = max4I;
            max4 = max3; max4I = max3I;
            max3 = max2; max3I = max2I;
            max2 = max1; max2I = max1I;
            max1 = performance; max1I = i;
        } else if (performance > max2) {
            max5 = max4; max5I = max4I;
            max4 = max3; max4I = max3I;
            max3 = max2; max3I = max2I;
            max2 = performance; max2I = i;
        } else if (performance > max3) {
            max5 = max4; max5I = max4I;
            max4 = max3; max4I = max3I;
            max3 = performance; max3I = i;
        } else if (performance > max4) {
            max5 = max4; max5I = max4I;
            max4 = performance; max4I = i;
        } else if (performance > max5) {
            max5 = performance; max5I = i;
        }
    }

This approach works well enough, but now I need the top 10 instead of the top 5. Should I just copy-paste this pattern, or is there something better?

If you want to do this on a large array, this code is not suitable. I assume you have many small arrays, with each work-item working on one of them.

I would do something like this:

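    // Init: 10 result slots plus one scratch slot at index 0
    float maxs[10+1];
    for (int i = 0; i < 10+1; i++) {
        maxs[i] = -INFINITY;
    }

    for (int i = 0; i < size; i++) {
        // Is it higher than element 0 (the smallest value kept)?
        if (data[i] > maxs[0]) {
            maxs[0] = data[i];
            // Bubble the new value up until the array is ascending again
            for (int j = 0; j < 10; j++) {
                if (maxs[j] > maxs[j+1]) {
                    float tmp = maxs[j];
                    maxs[j] = maxs[j+1];
                    maxs[j+1] = tmp;
                } else {
                    break;
                }
            }
        }
    }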

You now have an array of 11 elements sorted in ascending order; take only the last 10 (maxs[1] through maxs[10]).

The code can be optimized further, but it is simple.
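
Note that the question also records the index of each maximum, which the snippet above does not. Below is a minimal sketch of one way to keep both, written as plain C so it can be tried on the host. The names `TOP_K` and `top_k` are hypothetical, chosen only for illustration, and it uses a shifting insertion into a descending array rather than the bubbling loop above. In an OpenCL kernel the same loop body should carry over, but `assert` is not available there and would have to be dropped.

    ```c
    #include <assert.h>  /* assert: host-side precondition check only */
    #include <math.h>    /* INFINITY */

    #define TOP_K 10     /* hypothetical constant: how many maxima to keep */

    /* Fills vals[] with the TOP_K largest values of data[0..size) in
       descending order and idxs[] with their positions in data.
       Slots that are never filled stay at -INFINITY / -1. */
    void top_k(const float *data, int size, float vals[TOP_K], int idxs[TOP_K])
    {
        assert(size >= 0);

        for (int k = 0; k < TOP_K; ++k) {
            vals[k] = -INFINITY;
            idxs[k] = -1;
        }

        for (int i = 0; i < size; ++i) {
            float v = data[i];

            /* Skip values that do not beat the smallest kept maximum. */
            if (!(v > vals[TOP_K - 1]))
                continue;

            /* Shift smaller entries down one slot to open a gap for v. */
            int j = TOP_K - 1;
            while (j > 0 && vals[j - 1] < v) {
                vals[j] = vals[j - 1];
                idxs[j] = idxs[j - 1];
                --j;
            }
            vals[j] = v;
            idxs[j] = i;
        }
    }
    ```

Like the chain of `if`/`else if` in the question, the comparisons are strict, so on ties the element seen first keeps the higher rank. Changing `TOP_K` is then the only edit needed to go from top 5 to top 10.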