Change cuda::GpuMat values through custom kernel

Question

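I am using a kernel to "loop" over a live camera stream in order to highlight regions of a specific color. These regions cannot always be reconstructed with a few cv::threshold calls, which is why I use a kernel.

The current kernel is as follows: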

// Writes white for every pixel that fails the "red" test and black for every pixel that passes it.
// Assumes 8-bit data with 3 bytes per pixel; the two width-step arguments are row pitches in bytes.
__global__ void custom_kernel(const unsigned char* input, unsigned char* output, int width, int height, int colorWidthStep, int outputWidthStep) {
    // One thread per pixel
    const int xIndex = blockIdx.x * blockDim.x + threadIdx.x;
    const int yIndex = blockIdx.y * blockDim.y + threadIdx.y;

    // Threads of partially filled edge blocks fall outside the image and do nothing
    if ((xIndex < width) && (yIndex < height)) {
        // Byte offset of this pixel: row pitch times row, plus 3 bytes per column
        const int color_tid = yIndex * colorWidthStep + (3*xIndex);
        const int output_tid = yIndex * outputWidthStep + (3*xIndex);
        const unsigned char red   = input[color_tid+0];
        const unsigned char green = input[color_tid+1];
        const unsigned char blue  = input[color_tid+2];
        if (!(red > 100 && blue < 50 && red > 1.0f*green)) {
            output[output_tid] = 255;
            output[output_tid+1] = 255;
            output[output_tid+2] = 255;
        } else {
            output[output_tid] = 0;
            output[output_tid+1] = 0;
            output[output_tid+2] = 0;
        }
    }
}

The kernel is called here:

extern "C" void myFunction(cv::cuda::GpuMat& input, cv::cuda::GpuMat& output) {
    // Calculate total number of bytes of input and output image
    const int colorBytes = input.step * input.rows;
    const int outputBytes = output.step * output.rows;

    unsigned char *d_input, *d_output;

    // Allocate device memory
    SAFE_CALL(cudaMalloc<unsigned char>(&d_input,colorBytes),"CUDA Malloc Failed");
    SAFE_CALL(cudaMalloc<unsigned char>(&d_output,outputBytes),"CUDA Malloc Failed");

    // Copy data from OpenCV input image to device memory
    SAFE_CALL(cudaMemcpy(d_input,input.ptr(),colorBytes,cudaMemcpyHostToDevice),"CUDA Memcpy Host To Device Failed");

    // Specify a reasonable block size
    const dim3 block(16,16);

    // Calculate grid size to cover the whole image
    const dim3 grid((input.cols + block.x - 1)/block.x, (input.rows + block.y - 1)/block.y);

    // Launch the color conversion kernel
    custom_kernel<<<grid,block>>>(d_input,d_output,input.cols,input.rows,input.step,output.step);

    // Synchronize to check for any kernel launch errors
    SAFE_CALL(cudaDeviceSynchronize(),"Kernel Launch Failed");

    // Copy back data from destination device memory to OpenCV output image
    SAFE_CALL(cudaMemcpy(output.ptr(),d_output,outputBytes,cudaMemcpyDeviceToHost),"CUDA Memcpy Device To Host Failed");

    // Free the device memory
    SAFE_CALL(cudaFree(d_input),"CUDA Free Failed");
    SAFE_CALL(cudaFree(d_output),"CUDA Free Failed");
}
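The grid-size computation in the host function rounds up, so partially filled blocks at the right and bottom edges are still launched; that is why the kernel needs its `xIndex < width && yIndex < height` check. Below is a minimal host-side sketch of that arithmetic. The helper names `ceilDiv` and `idleThreads` are hypothetical and do not come from the original code.

```cpp
#include <cassert>

// Ceiling division: number of blocks of size `block` needed to cover `n` items.
// Same expression as (input.cols + block.x - 1) / block.x in the host function.
constexpr int ceilDiv(int n, int block) {
    return (n + block - 1) / block;
}

// Number of launched threads along one axis that lie outside the image.
// These are the threads the bounds check inside the kernel has to reject.
int idleThreads(int n, int block) {
    assert(n > 0 && block > 0);
    return ceilDiv(n, block) * block - n;
}
```

For example, assuming a 1920x1080 frame and the 16x16 block used above, `ceilDiv(1080, 16)` gives 68 rows of blocks, which leaves 8 rows of threads past the bottom edge.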

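I have included a sample image showing the kernel's result on a red car. As you can see, there are vertical red lines, even though I try to access the RGB/BGR values and set them to either 0 or 255.

I used the following sample as a starting point, but I suspect that cv::Mat and cv::cuda::GpuMat do not store their values in the same way. I read that GpuMat holds only a pointer to its data and assumed that it would work together with the blockIdx and blockDim parameters. https://github.com/sshniro/opencv-samples/blob/master/cuda-bgr-grey.cpp

Specific questions:

What causes the red lines?
How do I change the RGB values correctly?

I am using CUDA 10.2 on Ubuntu 18.04 on an NVIDIA Xavier NX.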

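As suggested in the comments, I changed the arguments of the cudaMemcpy calls and removed the cudaMalloc and cudaFree parts. I also reminded myself that OpenCV stores colors in BGR order, so I changed the (+0, +1, +2) offsets in the kernel. I loaded the red car image directly via cv::imread to rule out any earlier format errors. Success: the kernel works.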
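To make the effect of the channel order concrete, here is a small host-side sketch of the same color test with the byte offsets made explicit. `ChannelOrder`, `kRGB`, `kBGR`, and `isTargetRed` are hypothetical names introduced for illustration; the thresholds are the ones from the kernel.

```cpp
#include <cassert>
#include <cstdint>

// Byte offsets of each color channel within one pixel.
struct ChannelOrder { int r, g, b; };

constexpr ChannelOrder kRGB{0, 1, 2};
constexpr ChannelOrder kBGR{2, 1, 0};  // order of 3-channel images loaded by cv::imread

// Same test as in the kernel: true when the pixel counts as "red".
// (red > 1.0 * green in the kernel is equivalent to red > green.)
bool isTargetRed(const std::uint8_t* pixel, ChannelOrder order) {
    assert(pixel != nullptr);
    const std::uint8_t red   = pixel[order.r];
    const std::uint8_t green = pixel[order.g];
    const std::uint8_t blue  = pixel[order.b];
    return red > 100 && blue < 50 && red > green;
}
```

A strongly red pixel stored as B=10, G=20, R=200 passes the test when read with `kBGR` but fails when read with `kRGB`, because the blue byte is then taken for red.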

Answer 1

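As @sgarizvi mentioned in the comments, a cv::cuda::GpuMat already resides in GPU memory, so I had to use cudaMemcpyDeviceToDevice instead of cudaMemcpyHostToDevice.

There is also no need to allocate new memory, so I removed the cudaMalloc and cudaFree parts of the code above.

Finally (this applies to my case only; yours may differ), my input images come from a StereoLabs ZED 2, which publishes images in RGBA format. The memory layout is therefore R -> G -> B -> A, which converted to OpenCV is B -> G -> R -> A, with a stride of 4 per pixel: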

const int color_tid = yIndex * colorWidthStep + (4*xIndex);
const int output_tid = yIndex * outputWidthStep + (4*xIndex);

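So, to address each pixel correctly, you have to advance the pointer by four times xIndex. Use three times xIndex for a plain BGR/RGB image and one times xIndex for a grayscale image.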
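The addressing rule above can be sketched on the host as follows. This is only an illustration under stated assumptions: `pixelOffset` and `channelAt` are hypothetical helpers, the data is assumed to be 8-bit, and the row pitch of 7680 bytes used in the example corresponds to an assumed 1920-pixel-wide, 4-channel image without padding.

```cpp
#include <cassert>
#include <cstddef>

// Byte offset of pixel (x, y) in a row-pitched 8-bit image.
// `step` is the row pitch in bytes (what GpuMat::step reports);
// `channels` is the number of bytes per pixel.
std::size_t pixelOffset(int x, int y, std::size_t step, int channels) {
    assert(x >= 0 && y >= 0 && channels > 0);
    return static_cast<std::size_t>(y) * step
         + static_cast<std::size_t>(x) * static_cast<std::size_t>(channels);
}

// Which channel of the real layout a byte offset within a row lands on.
int channelAt(std::size_t offsetInRow, int actualChannels) {
    assert(actualChannels > 0);
    return static_cast<int>(offsetInRow % static_cast<std::size_t>(actualChannels));
}
```

With 4-channel data but a multiplier of 3, `pixelOffset(1, 0, 7680, 3)` is 3, which is the alpha byte of pixel 0 rather than the first byte of pixel 1. The misalignment repeats every four columns, which is consistent with a regular vertical pattern like the one described in the question.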

c++

opencv

cuda