OpenCL: single compute device on GPU?
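I'm running my OpenCL program on a GeForce GT 610. I know CUDA would be the better choice, and I may write a CUDA version of my code later, but for now I'm writing it in OpenCL so that it can also run on AMD cards.
During initialization I select a device to run on. This is what my program prints at that stage: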
OpenCL Platform 0: NVIDIA CUDA
----- OpenCL Device # 0: GeForce GT 610-----
Gflops: 1.620000
Max Compute Units: 1
Max Clock Frequency: 1620
Total Memory of Device (bytes): 1072889856
Max Size of Memory Object Allocation (bytes): 268222464
Max Work Group Size: 1024
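My question is: why does it say the max compute units is only 1? According to the spec details on the GeForce website, it has 48 CUDA cores. I know CUDA runs better on Nvidia cards, but is it really limited this much? Does Nvidia limit OpenCL to 1/48 of the power?
My code looks like this: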
if (clGetPlatformInfo(platforms[platform], CL_PLATFORM_NAME, sizeof(name), name, NULL)) Fatal("Cannot get OpenCL platform name\n");
if (verbose) printf("OpenCL Platform %d: %s\n", platform, name);
... inside the for loop ...
cl_uint compUnits, freq;
cl_ulong memSize, maxAlloc;
size_t maxWorkGrps;
if (clGetDeviceInfo(id[devId], CL_DEVICE_MAX_COMPUTE_UNITS, sizeof(compUnits), &compUnits, NULL)) Fatal("Cannot get OpenCL device units\n");
if (clGetDeviceInfo(id[devId], CL_DEVICE_MAX_CLOCK_FREQUENCY, sizeof(freq), &freq, NULL)) Fatal("Cannot get OpenCL device frequency\n");
if (clGetDeviceInfo(id[devId], CL_DEVICE_NAME, sizeof(name), name, NULL)) Fatal("Cannot get OpenCL device name\n");
if (clGetDeviceInfo(id[devId], CL_DEVICE_GLOBAL_MEM_SIZE, sizeof(memSize), &memSize, NULL)) Fatal("Cannot get OpenCL memory size.\n");
if (clGetDeviceInfo(id[devId], CL_DEVICE_MAX_MEM_ALLOC_SIZE, sizeof(maxAlloc), &maxAlloc, NULL)) Fatal("Cannot get OpenCL max allocation size.\n");
if (clGetDeviceInfo(id[devId], CL_DEVICE_MAX_WORK_GROUP_SIZE, sizeof(maxWorkGrps), &maxWorkGrps, NULL)) Fatal("Cannot get OpenCL max work group size\n");
int Gflops = compUnits * freq;
if (verbose) printf(" ----- OpenCL Device # %d: %s-----\n"
"Gflops: %f\n"
"Max Compute Units: %u\n"
"Max Clock Frequency: %u\n"
"Total Memory of Device (bytes): %lu\n"
"Max Size of Memory Object Allocation (bytes): %lu\n"
"Max Work Group Size: %zu\n",
devId,
name,
1e-3*Gflops,
compUnits,
freq,
memSize,
maxAlloc,
maxWorkGrps);
My question is why does it say the max compute unit is only 1?
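The compute unit referred to here corresponds to an NVIDIA GPU SM (streaming multiprocessor). That GPU has exactly one SM, with 48 CUDA cores in it.
So you are not limited to a single core or to 1/48 of that GPU's capability. Access to that compute unit means your program has access to the 48 cores it contains.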
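As a rough illustration of why the printed "Gflops" figure looks so low: it multiplies compute units by clock frequency and leaves out the cores inside each compute unit. The sketch below is not part of the original program. `cores_per_sm` and `peak_gflops` are hypothetical helpers; the cores-per-SM table is an assumption based on NVIDIA's published architecture figures (core OpenCL does not report it, although on NVIDIA platforms the `cl_nv_device_attribute_query` extension should expose the compute capability), and the factor of 2 assumes one fused multiply-add per core per clock.

```c
/* Hypothetical helper: CUDA cores per SM for a few NVIDIA compute
 * capabilities. This table is an assumption and is deliberately incomplete;
 * it returns 0 for anything it does not know. */
unsigned cores_per_sm(unsigned major, unsigned minor)
{
    if (major == 1) return 8;
    if (major == 2) return (minor == 0) ? 32 : 48;
    if (major == 3) return 192;
    if (major == 5) return 128;
    return 0; /* unknown architecture */
}

/* Hypothetical helper: theoretical single-precision peak in GFLOPS.
 * compute_units and clock_mhz are the values clGetDeviceInfo reports for
 * CL_DEVICE_MAX_COMPUTE_UNITS and CL_DEVICE_MAX_CLOCK_FREQUENCY.
 * The factor 2.0 assumes one fused multiply-add (2 flops) per core per clock. */
double peak_gflops(unsigned compute_units, unsigned cores_per_unit,
                   unsigned clock_mhz)
{
    return 2.0 * compute_units * cores_per_unit * clock_mhz * 1e-3;
}
```

With the values reported in the question (1 compute unit, 1620 MHz) and 48 cores per SM, assuming the GT 610 is a compute capability 2.1 part, `peak_gflops(1, cores_per_sm(2, 1), 1620)` comes to about 155.5, compared with the 1.62 the program printed.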