Deallocate part of an array in CUDA

Question

Suppose I have an array of numbers on the device (CUDA), allocated like this:

float *d_x;
cudaMalloc(&d_x, N*sizeof(float));

where d_x holds something like [0,0,3,0,3,0,1,5,1,0].

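I perform two operations on the array. The details don't matter, but the first acts as a kind of preprocessing step: it rearranges the values of d_x and returns an index n. The second then performs some operation on only the first n values of the array, where n is the value returned by the first operation.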
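
For concreteness, a first operation of the kind described later in the pseudocode (move the nonzero values to the front, keep their order, return how many there are) can be written with Thrust's stable_partition. This is a minimal sketch, not the asker's actual code: it assumes Thrust, which ships with the CUDA Toolkit, and compactNonzeros and IsNonzero are hypothetical names.

```cuda
#include <thrust/execution_policy.h>
#include <thrust/partition.h>

// Predicate: true for values that should end up at the front of the array.
struct IsNonzero {
    __host__ __device__ bool operator()(float v) const { return v != 0.0f; }
};

// Hypothetical stand-in for operation1: stably moves the nonzero values of
// d_x[0..N) to the front (relative order preserved) and returns their count n.
// d_x must point to device memory, e.g. obtained from cudaMalloc.
int compactNonzeros(float* d_x, int N)
{
    // With the thrust::device policy, raw pointers are treated as device pointers.
    float* mid = thrust::stable_partition(thrust::device, d_x, d_x + N, IsNonzero());
    return static_cast<int>(mid - d_x);
}
```

For the example input [0,0,3,0,3,0,1,5,1,0], this leaves [3,3,1,5,1] in the first five slots and returns 5.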

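My concern is that the second operation is inherently much more computationally expensive and takes far longer, yet it only ever involves the first n values of the array.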

So, something like:

int operation1(float* d_x)
{
    int n = 0;
    // Launch some kernel and wait for it to finish.
    // The kernel reorders d_x into [3,3,1,5,1,0,0,0,0,0]
    // and n is set to the number of nonzero values it found.
    return n; // n is 5 in this case, because d_x holds 5 nonzero values
}
void operation2(float* d_x, int n)
{
    // Launch another kernel that sorts the subarray [3,3,1,5,1] and never
    // touches the values at index n or above.
    // In other words, sort the subarray of values *d_x, *(d_x + 1), ... *(d_x + n - 1) to get
    // [1,1,3,3,5]
}

int main()
{
    float* d_x;
    // cudaMalloc d_x and fill it with input data
    int n = operation1(d_x);
    // many more lines of code doing other things with d_x
    operation2(d_x, n);
    // more code
    cudaFree(d_x);
}
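
Note that operation2 only needs to be handed the range [d_x, d_x + n); the tail beyond n is simply never passed to it. A minimal sketch, again assuming Thrust, where sortPrefix is a hypothetical name standing in for operation2:

```cuda
#include <thrust/execution_policy.h>
#include <thrust/sort.h>

// Hypothetical stand-in for operation2: sorts only d_x[0..n) in ascending
// order on the device. Elements at index n and above are never read or written.
void sortPrefix(float* d_x, int n)
{
    thrust::sort(thrust::device, d_x, d_x + n);
}
```

Because the work is bounded by n, the unused tail costs memory but no extra computation.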

My question is twofold:

Is it a good idea to deallocate the part of the array that is no longer used after operation1?
If so, what is the safest and cleanest way to do it?

Answer 1

Is it a good idea to deallocate the part of the array that will no longer be used after operation1?

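It isn't so much a "good" idea as a completely unsupported one. There is no realloc-style operation in the CUDA API: memory obtained from cudaMalloc can only be released as a whole with cudaFree. And given the cost and synchronizing nature of memory allocation on the GPU, it would not be a good idea from a performance point of view even if such an operation existed (or if you wrote your own equivalent allocate-copy-free implementation).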
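
To make that cost concrete, here is a minimal sketch of the allocate-copy-free approach, using only the CUDA runtime calls cudaMalloc, cudaMemcpy and cudaFree. shrinkDeviceArray is a hypothetical helper, not part of the CUDA API:

```cuda
#include <cuda_runtime.h>

// Hypothetical helper: "shrinks" a cudaMalloc'd buffer to its first n floats.
// CUDA cannot free part of an allocation, so this allocates a new, smaller
// buffer, copies the prefix device-to-device, and frees the whole old buffer.
// On success *d_x points to the new buffer; if the allocation or the copy
// fails, *d_x is left pointing at the untouched original buffer.
cudaError_t shrinkDeviceArray(float** d_x, size_t n)
{
    float* d_new = nullptr;
    cudaError_t err = cudaMalloc(&d_new, n * sizeof(float));
    if (err != cudaSuccess) return err;

    err = cudaMemcpy(d_new, *d_x, n * sizeof(float), cudaMemcpyDeviceToDevice);
    if (err != cudaSuccess) {
        cudaFree(d_new);  // undo the new allocation, keep the original
        return err;
    }

    // The data now lives in d_new, so switch to it even if freeing the old
    // buffer reports an error; the caller can still inspect the return value.
    err = cudaFree(*d_x);
    *d_x = d_new;
    return err;
}
```

Each call pays for a new allocation, a device-to-device copy and a free, and for a short while both buffers exist at once, so peak memory use actually goes up. Unless the freed tail is large and the memory is genuinely needed elsewhere, keeping the original buffer, passing n to the kernels and calling cudaFree once at the end is simpler and cheaper.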

If yes, what is the safest and cleanest way to go about this?

See above.
