CUDA Python Error: TypingError: cannot determine Numba type of <class 'object'>
Background: I'm trying to create a simple bootstrap function that samples with replacement. I want to parallelize the function because I will eventually deploy it on data with millions of data points, and will want much larger sample sizes. I have also run other examples, such as the Mandelbrot example. In the code below you'll see that I have a CPU version of the code, which runs fine.
I have read through a few resources while trying to solve this problem:
Problem: This is my first foray into CUDA programming, and I believe I have everything set up correctly. I'm getting a single error that I can't seem to figure out:
TypingError: cannot determine Numba type of <class 'object'>
I think the offending line of code is:
bootstrap_rand_gpu[threads_per_block, blocks_per_grid](rng_states, dt_arry_device, n_samp, out_mean_gpu)
Attempts to solve the problem: I won't go into detail, but here is what I have tried:
- Thought it had something to do with cuda.to_device(). I changed it around, and I also called cuda.to_device_array_like(). I've used to_device() on all of the arguments, and on only a few. I've seen code examples where it's used for every argument, and others where it isn't, so I'm not sure what should be done (a minimal sketch of the pattern I mean is right after this list).
- I removed the GPU random number generator (create_xoroshiro128p_states) and just used a static value, to test.
- Explicitly cast integers with int() (and without it). Not sure why I tried this; I read that Numba only supports a limited set of data types, so I made sure they were ints.
- Several other things I can't remember...
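For reference, this is the transfer pattern I thought I was supposed to follow (a minimal sketch of my understanding; some_kernel, the sizes, and the variable names are placeholders, not my real code):
import numpy as np
from numba import cuda

host_in = np.random.rand(50)                # stand-in for my dt_arry

dev_in = cuda.to_device(host_in)            # explicit host -> device copy
dev_out = cuda.device_array_like(host_in)   # device allocation, no host copy

# a kernel launch would go here, e.g.:
# some_kernel[blocks_per_grid, threads_per_block](dev_in, dev_out)

host_out = dev_out.copy_to_host()           # explicit device -> host copy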
Apologies for the messy code. I'm somewhat at my wits' end here.
Below is the full code:
import numpy as np
from numpy import random
from numpy.random import randn
import pandas as pd
from timeit import default_timer as timer
from numba import cuda
from numba.cuda.random import create_xoroshiro128p_states, xoroshiro128p_uniform_float32
from numba import *
def bootstrap_rand_cpu(dt_arry, n_samp, boot_samp, out_mean):
    for i in range(boot_samp):
        rand_idx = random.randint(n_samp-1,size=(50)) #get random array of indices 0-49, with replacement
        out_mean[i] = dt_arry[rand_idx].mean()
@cuda.jit
def bootstrap_rand_gpu(rng_states, dt_arry, n_samp, out_mean):
    thread_id = cuda.grid(1)
    stride = cuda.gridsize(1)
    for i in range(thread_id, dt_arry.shape[0], stride):
        for k in range(0,n_samp-1,1):
            rand_idx_arry[k] = int(xoroshiro128p_uniform_float32(rng_states, thread_id) * 49)
        out_mean[thread_id] = dt_arry[rand_idx_arry].mean()
mean = 10
rand_fluc = 3
n_samp = int(50)
boot_samp = int(1000)
dt_arry = (random.rand(n_samp)-.5)*rand_fluc + mean
out_mean_cpu = np.empty(boot_samp)
out_mean_gpu = np.empty(boot_samp)
##################
# RUN ON CPU
##################
start = timer()
bootstrap_rand_cpu(dt_arry, n_samp, boot_samp, out_mean_cpu)
dt = timer() - start
print("CPU Bootstrap mean of " + str(boot_samp) + " mean samples: " + str(out_mean_cpu.mean()))
print("Bootstrap CPU in %f s" % dt)
##################
# RUN ON GPU
##################
threads_per_block = 64
blocks_per_grid = 24
#create random state for each state in the array
rng_states = create_xoroshiro128p_states(threads_per_block * blocks_per_grid, seed=1)
start = timer()
dt_arry_device = cuda.to_device(dt_arry)
out_mean_gpu_device = cuda.to_device(out_mean_gpu)
bootstrap_rand_gpu[threads_per_block, blocks_per_grid](rng_states, dt_arry_device, n_samp, out_mean_gpu_device)
out_mean_gpu_device.copy_to_host()
dt = timer() - start
print("GPU Bootstrap mean of " + str(boot_samp) + " mean samples: " + str(out_mean_gpu.mean()))
print("Bootstrap GPU in %f s" % dt)
You appear to have at least 4 issues:
- In your kernel code, rand_idx_arry is undefined.
- You can't do .mean() in CUDA device code.
- Your kernel launch configuration parameters are reversed.
- The range of your kernel's grid-stride loop is incorrect. dt_arry.shape[0] is 50, so you were only populating the first 50 locations in the GPU output array. Just like your host code, the range of this grid-stride loop should be the size of the output array (i.e. boot_samp). A generic sketch of this loop pattern follows this list.
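For reference, here is the canonical grid-stride loop over an output array (a generic runnable sketch with made-up names, not your bootstrap kernel); note also that the launch bracket takes [blocks_per_grid, threads_per_block], in that order:
from numba import cuda

@cuda.jit
def fill_example(out):              # illustrative kernel only
    start = cuda.grid(1)            # absolute index of this thread in the grid
    stride = cuda.gridsize(1)       # total number of threads in the grid
    for i in range(start, out.shape[0], stride):  # bound is the OUTPUT size
        out[i] = i

dev_out = cuda.device_array(1000)   # device-side output, analogous to out_mean
fill_example[24, 64](dev_out)       # [blocks_per_grid, threads_per_block]
print(dev_out.copy_to_host()[:5])   # -> [0. 1. 2. 3. 4.]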
There may well be other issues, but when I refactor your code like this to address those items, it seems to run without error:
$ cat t65.py
#import matplotlib.pyplot as plt
import numpy as np
from numpy import random
from numpy.random import randn
from timeit import default_timer as timer
from numba import cuda
from numba.cuda.random import create_xoroshiro128p_states, xoroshiro128p_uniform_float32
from numba import *
def bootstrap_rand_cpu(dt_arry, n_samp, boot_samp, out_mean):
    for i in range(boot_samp):
        rand_idx = random.randint(n_samp-1,size=(50)) #get random array of indices 0-49, with replacement
        out_mean[i] = dt_arry[rand_idx].mean()
@cuda.jit
def bootstrap_rand_gpu(rng_states, dt_arry, n_samp, out_mean):
    thread_id = cuda.grid(1)
    stride = cuda.gridsize(1)
    for i in range(thread_id, out_mean.shape[0], stride):
        my_sum = 0.0
        for k in range(0,n_samp-1,1):
            my_sum += dt_arry[int(xoroshiro128p_uniform_float32(rng_states, thread_id) * 49)]
        out_mean[i] = my_sum/(n_samp-1)  # index by i so the grid-stride pattern stays correct for any launch size
mean = 10
rand_fluc = 3
n_samp = int(50)
boot_samp = int(1000)
dt_arry = (random.rand(n_samp)-.5)*rand_fluc + mean
#plt.plot(dt_arry)
#figureData = plt.figure(1)
#plt.title('Plot ' + str(n_samp) + ' samples')
#plt.plot(dt_arry)
#figureData.show()
out_mean_cpu = np.empty(boot_samp)
out_mean_gpu = np.empty(boot_samp)
##################
# RUN ON CPU
##################
start = timer()
bootstrap_rand_cpu(dt_arry, n_samp, boot_samp, out_mean_cpu)
dt = timer() - start
print("CPU Bootstrap mean of " + str(boot_samp) + " mean samples: " + str(out_mean_cpu.mean()))
print("Bootstrap CPU in %f s" % dt)
#figureMeanCpu = plt.figure(2)
#plt.title('Plot '+ str(boot_samp) + ' bootstrap means - CPU')
#plt.plot(out_mean_cpu)
#figureData.show()
##################
# RUN ON GPU
##################
threads_per_block = 64
blocks_per_grid = 24
#create random state for each state in the array
rng_states = create_xoroshiro128p_states(threads_per_block * blocks_per_grid, seed=1)
start = timer()
dt_arry_device = cuda.to_device(dt_arry)
out_mean_gpu_device = cuda.to_device(out_mean_gpu)
bootstrap_rand_gpu[blocks_per_grid, threads_per_block](rng_states, dt_arry_device, n_samp, out_mean_gpu_device)
out_mean_gpu = out_mean_gpu_device.copy_to_host()
dt = timer() - start
print("GPU Bootstrap mean of " + str(boot_samp) + " mean samples: " + str(out_mean_gpu.mean()))
print("Bootstrap GPU in %f s" % dt)
$ python t65.py
CPU Bootstrap mean of 1000 mean samples: 10.148048544038735
Bootstrap CPU in 0.037496 s
GPU Bootstrap mean of 1000 mean samples: 10.145088765532936
Bootstrap GPU in 0.416822 s
$
Notes:
- I commented out a bunch of stuff that seemed irrelevant. You may wish to do something similar when posting code in the future (strip out anything not relevant to your question).
- I also fixed a few issues with the final GPU printout at the end.
- I haven't studied your code carefully. I'm not suggesting anything here is defect-free; I'm simply pointing out some issues and providing a guide for how to address them. I can see that the results don't match between CPU and GPU, but since I don't know exactly what you're doing, and since the random number generators don't match between the CPU and GPU code, it's not obvious to me that things should match. A statistical comparison like the sketch below may be a more meaningful check.
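If you want a rough sanity check anyway, one option (my suggestion, not something from your code) is to compare the two bootstrap distributions statistically rather than element by element, for example by appending something like this to the script above; the 0.1 tolerance is an arbitrary choice:
# Compare summary statistics of the two bootstrap-mean arrays (out_mean_cpu
# and out_mean_gpu from the script above) instead of individual elements,
# since the CPU and GPU random streams are unrelated.
print("CPU: mean=%f std=%f" % (out_mean_cpu.mean(), out_mean_cpu.std()))
print("GPU: mean=%f std=%f" % (out_mean_gpu.mean(), out_mean_gpu.std()))
assert abs(out_mean_cpu.mean() - out_mean_gpu.mean()) < 0.1  # arbitrary tolerance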