Lambda expressions with CUDA
If I use thrust::transform with thrust::host, the lambda usage works fine:

thrust::transform(thrust::host, a, a + arraySize, b, d, [](int a, int b) -> int
{
    return a + b;
});
But if I change thrust::host to thrust::device, the code no longer compiles. This is the error on VS2013:

The closure type for a lambda ("lambda [](int, int)->int") cannot be used in the template argument type of a __global__ function template instantiation, unless the lambda is defined within a __device__ or __global__ function

So the question is how to use __device__ or __global__ in connection with a device lambda.
This is not possible in CUDA 7. Quoting Mark Harris:
That isn't supported today in CUDA, because the lambda is host code. Passing lambdas from host to device is a challenging problem, but it is something we will investigate for a future CUDA release.
What you can do in CUDA 7 is call thrust algorithms from your device code, and in that case you can pass lambdas to them...
With CUDA 7, Thrust algorithms can be called from device code (e.g., a CUDA kernel or a __device__ functor), and in those cases you can use (device) lambdas with Thrust. An example is given in the parallelforall blog post here.
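As a rough sketch of that approach (my own illustration, not taken from the blog post; the kernel name addKernel and the test values are arbitrary), the lambda is defined inside the __global__ function itself and passed to a Thrust algorithm with the sequential execution policy, so the restriction quoted in the error message above does not apply:

#include <thrust/execution_policy.h>
#include <thrust/transform.h>
#include <cstdio>

// The lambda is defined within a __global__ function, i.e. in device code,
// so it can legally be passed to a Thrust algorithm called from the kernel.
__global__ void addKernel(const int *a, const int *b, int *c, int n)
{
    thrust::transform(thrust::seq, a, a + n, b, c,
                      [](int x, int y) -> int { return x + y; });
}

int main()
{
    const int n = 4;
    int ha[n] = { 1, 2, 3, 4 }, hb[n] = { 10, 20, 30, 40 }, hc[n];
    int *da, *db, *dc;
    cudaMalloc(&da, n * sizeof(int));
    cudaMalloc(&db, n * sizeof(int));
    cudaMalloc(&dc, n * sizeof(int));
    cudaMemcpy(da, ha, n * sizeof(int), cudaMemcpyHostToDevice);
    cudaMemcpy(db, hb, n * sizeof(int), cudaMemcpyHostToDevice);
    addKernel<<<1, 1>>>(da, db, dc, n);   // one thread runs the sequential transform
    cudaMemcpy(hc, dc, n * sizeof(int), cudaMemcpyDeviceToHost);
    for (int i = 0; i < n; i++) printf("%d + %d = %d\n", ha[i], hb[i], hc[i]);
    cudaFree(da); cudaFree(db); cudaFree(dc);
    return 0;
}

(Compile with -std=c++11; --expt-extended-lambda is not needed here, because the lambda never crosses the host/device boundary.)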
However, CUDA 7.5 introduces an experimental device lambda feature. This feature is described here:

CUDA 7.5 introduces an experimental feature: GPU lambdas. GPU lambdas are anonymous device function objects that you can define in host code, by annotating them with a __device__ specifier.
To enable compilation of this feature (at present, with CUDA 7.5), you need to specify --expt-extended-lambda on the nvcc command line.
This simple code using a device lambda works under CUDA 8.0 RC, although device lambdas are still experimental in that CUDA version:
#include <iostream>

#include <thrust/device_vector.h>
#include <thrust/functional.h>
#include <thrust/transform.h>

using namespace thrust::placeholders;

int main(void)
{
    // --- Input data
    float a = 2.0f;
    float x[4] = { 1, 2, 3, 4 };
    float y[4] = { 1, 1, 1, 1 };

    thrust::device_vector<float> X(x, x + 4);
    thrust::device_vector<float> Y(y, y + 4);

    thrust::transform(X.begin(),
                      X.end(),
                      Y.begin(),
                      Y.begin(),
                      [=] __host__ __device__ (float x, float y) { return a * x + y; }   // --- Lambda expression
                      );

    for (size_t i = 0; i < 4; i++) std::cout << a << " * " << x[i] << " + " << y[i] << " = " << Y[i] << std::endl;

    return 0;
}
Remember to compile with --expt-extended-lambda.
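For instance, the compile command would look something like the following (the file name lambda_test.cu is just a placeholder; -std=c++11 may already be implied by newer toolkits but is harmless to pass explicitly):

nvcc -std=c++11 --expt-extended-lambda lambda_test.cu -o lambda_test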