Lambda expressions with CUDA

If I use thrust::transform with thrust::host, the lambda works fine:

thrust::transform(thrust::host, a, a+arraySize,b,d,[](int a, int b)->int
{
    return a + b;
});

However, if I change thrust::host to thrust::device, the code no longer compiles. This is the error on VS2013:

The closure type for a lambda ("lambda [](int, int)->int") cannot be used in the template argument type of a __global__ function template instantiation, unless the lambda is defined within a __device__ or __global__ function

So the question is how to use __device__ or __global__ in connection with device lambdas.

In CUDA 7 this is not possible. Quoting Mark Harris:

That isn't supported today in CUDA, because the lambda is host code. Passing lambdas from host to device is a challenging problem, but it is something we will investigate for a future CUDA release.

What you can do in CUDA 7 is call thrust algorithms from your device code, and in that case you can pass lambdas to them...

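With CUDA 7, Thrust algorithms can be called from device code (for example, CUDA kernels or __device__ functors). In those situations, you can use (device) lambdas with Thrust. An example is given in the parallelforall blog post.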
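A minimal sketch of calling a Thrust algorithm with a lambda from inside a kernel is given below. It is not the example from the blog post. The names add_kernel and add_in_kernel are hypothetical, the thrust::seq policy is my choice so that a single thread runs the whole transform, and error checking is left out for brevity. It assumes compilation with nvcc -std=c++11.

```cuda
#include <cstdio>
#include <vector>
#include <thrust/execution_policy.h>
#include <thrust/transform.h>

// Hypothetical kernel: a single thread runs the whole transform sequentially.
// The lambda is defined inside a __global__ function, so it is device code
// and needs no extra annotation.
__global__ void add_kernel(const int* a, const int* b, int* d, int n)
{
    thrust::transform(thrust::seq, a, a + n, b, d,
                      [](int x, int y) { return x + y; });
}

// Hypothetical host wrapper: copies the inputs to the device, launches the
// kernel with one thread, and copies the result back.
// Assumes a and b have the same length; CUDA error checking is omitted.
std::vector<int> add_in_kernel(const std::vector<int>& a,
                               const std::vector<int>& b)
{
    const int n = static_cast<int>(a.size());
    const size_t bytes = n * sizeof(int);

    int *d_a = nullptr, *d_b = nullptr, *d_d = nullptr;
    cudaMalloc(&d_a, bytes);
    cudaMalloc(&d_b, bytes);
    cudaMalloc(&d_d, bytes);
    cudaMemcpy(d_a, a.data(), bytes, cudaMemcpyHostToDevice);
    cudaMemcpy(d_b, b.data(), bytes, cudaMemcpyHostToDevice);

    add_kernel<<<1, 1>>>(d_a, d_b, d_d, n);

    std::vector<int> d(n);
    // This blocking copy also waits for the kernel to finish.
    cudaMemcpy(d.data(), d_d, bytes, cudaMemcpyDeviceToHost);

    cudaFree(d_a);
    cudaFree(d_b);
    cudaFree(d_d);
    return d;
}

int main()
{
    std::vector<int> a = {1, 2, 3, 4};
    std::vector<int> b = {10, 20, 30, 40};
    std::vector<int> d = add_in_kernel(a, b);

    for (size_t i = 0; i < d.size(); i++)
        std::printf("%d + %d = %d\n", a[i], b[i], d[i]);
    return 0;
}
```

Because the lambda never crosses from host to device code, this form does not run into the closure-type error quoted in the question. The trade-off is that the parallelism has to come from the kernel launch rather than from Thrust itself.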

However, CUDA 7.5 introduces an experimental device lambda feature. The feature is described as follows:

CUDA 7.5 introduces an experimental feature: GPU lambdas. GPU lambdas are anonymous device function objects that you can define in host code, by annotating them with a __device__ specifier.

To enable compilation of this feature (currently, with CUDA 7.5), you need to specify --expt-extended-lambda on the nvcc command line.

This simple code using a device lambda works under CUDA 8.0 RC, although device lambdas are still experimental in this version of CUDA:

#include <iostream>
#include <thrust/device_vector.h>
#include <thrust/functional.h>
#include <thrust/transform.h>

int main(void)
{
    // --- Input data 
    float a = 2.0f;
    float x[4] = { 1, 2, 3, 4 };
    float y[4] = { 1, 1, 1, 1 };

    thrust::device_vector<float> X(x, x + 4);
    thrust::device_vector<float> Y(y, y + 4);

    thrust::transform(X.begin(), 
                      X.end(),  
                      Y.begin(), 
                      Y.begin(),
                      [=] __host__ __device__ (float x, float y) { return a * x + y; }      // --- Lambda expression 
                     );        

    for (size_t i = 0; i < 4; i++) std::cout << a << " * " << x[i] << " + " << y[i] << " = " << Y[i] << std::endl;

    return 0;
}

Remember to compile with

--expt-extended-lambda
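Applied to the integer addition from the question, the same technique might look like the sketch below. The helper name add_on_device is hypothetical, and the sketch assumes the inputs are first copied into thrust::device_vector containers, since thrust::device expects data that lives on the device. It uses the same __host__ __device__ lambda form as the code above and needs the same --expt-extended-lambda flag.

```cuda
#include <iostream>
#include <vector>
#include <thrust/copy.h>
#include <thrust/device_vector.h>
#include <thrust/execution_policy.h>
#include <thrust/transform.h>

// Hypothetical helper: adds two host vectors element-wise on the device.
// Assumes a and b have the same length.
std::vector<int> add_on_device(const std::vector<int>& a,
                               const std::vector<int>& b)
{
    // Copy the inputs to device memory.
    thrust::device_vector<int> A(a.begin(), a.end());
    thrust::device_vector<int> B(b.begin(), b.end());
    thrust::device_vector<int> D(a.size());

    // The lambda is written in host code but annotated so that nvcc
    // also compiles it for the device.
    thrust::transform(thrust::device,
                      A.begin(), A.end(), B.begin(), D.begin(),
                      [] __host__ __device__ (int x, int y) { return x + y; });

    // Copy the result back to the host.
    std::vector<int> d(D.size());
    thrust::copy(D.begin(), D.end(), d.begin());
    return d;
}

int main()
{
    std::vector<int> a = {1, 2, 3, 4};
    std::vector<int> b = {10, 20, 30, 40};
    std::vector<int> d = add_on_device(a, b);

    for (size_t i = 0; i < d.size(); i++)
        std::cout << a[i] << " + " << b[i] << " = " << d[i] << std::endl;
    return 0;
}
```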