高效计算 3D 点集之间的矢量

Efficient calculation of vector between sets of 3D points

我正在 Python 中编写特定版本的光线追踪,我正在尝试计算不同平面上的点之间的矢量。

我正在使用点光源集来模拟非点光源。每个光源为 "camera" 平面上的每个像素生成一条光线。通过对每个像素使用 for 循环进行迭代,我设法计算出每条光线的矢量:

for sensor_point in sensor_points:    
    sp_min_ro = sensor_point - rayorigins #Vectors between the points
    normalv = normalize(sp_min_ro) #Normalized vector between the points

其中 sensor_points 是一个大的 numpy 数组,具有不同像素位置的 [x,y,z] 坐标,rayorigins 是一个具有 [x,y,z] 坐标的 numpy 数组] 不同点源的坐标

这种 for 循环方法有效,但速度极慢。我尝试去掉for循环,直接用整个数组计算spr_min_ro = sensor_points - rayorigins,但是numpy无法操作:

ValueError: operands could not be broadcast together with shapes (1002001,3) (36,3)

有没有办法加快查找所有点之间的向量的过程?


编辑:添加我一直在使用的规范化函数定义,因为它也有问题:

def normalize(v):
    norm = np.linalg.norm(v, axis=1)
    return v / norm[:,None]

当我尝试从@aganders3 解决方案传递新的 (1002001, 36, 3) 数组时,它失败了,我想是因为轴?

查看 NumPy broadcasting 的规则。我认为在 sensor_points 数组的中间添加一个新轴会起作用:

>> sp_min_ro = sensor_points[:, np.newaxis, :] - rayorigins
>> sp_min_ro.shape
(1002001, 36, 3)

Numpy 解决方案

import numpy as np

sensor_points=np.random.randn(1002001,3)#.astype(np.float32)
rayorigins=np.random.rand(36,3)#.astype(np.float32)

sp_min_ro = sensor_points[:, np.newaxis, :] - rayorigins
norm=np.linalg.norm(sp_min_ro,axis=2)
sp_min_ro/=norm[:,:,np.newaxis]

计时

np.float64: 1.76 s ± 26.6 ms per loop (mean ± std. dev. of 7 runs, 1 loop each)
np.float32: 1.42 s ± 9.83 ms per loop (mean ± std. dev. of 7 runs, 1 loop each)

Numba 解决方案

import numba as nb

@nb.njit(fastmath=True,error_model="numpy",parallel=True)
def normalized_vec(sensor_points,rayorigins):
    res=np.empty((sensor_points.shape[0],rayorigins.shape[0],3),dtype=sensor_points.dtype)
    for i in nb.prange(sensor_points.shape[0]):
        for j in range(rayorigins.shape[0]):
            vec_x=sensor_points[i,0]-rayorigins[j,0]
            vec_y=sensor_points[i,1]-rayorigins[j,1]
            vec_z=sensor_points[i,2]-rayorigins[j,2]
            dist=np.sqrt(vec_x**2+vec_y**2+vec_z**2)
            res[i,j,0]=vec_x/dist
            res[i,j,1]=vec_y/dist
            res[i,j,2]=vec_z/dist
    return res

计时

%timeit res=normalized_vec(sensor_points,rayorigins)
np.float64: 208 ms ± 4.41 ms per loop (mean ± std. dev. of 7 runs, 10 loops each)
np.float32: 104 ms ± 515 µs per loop (mean ± std. dev. of 7 runs, 10 loops each)

具有预分配内存的 Numba 解决方案

内存分配可能非常昂贵。这个例子应该表明,为什么有时尽可能避免使用大型临时数组是个好主意。

@nb.njit(fastmath=True,error_model="numpy",parallel=True)
def normalized_vec(sensor_points,rayorigins,res):
    for i in nb.prange(sensor_points.shape[0]):
        for j in range(rayorigins.shape[0]):
            vec_x=sensor_points[i,0]-rayorigins[j,0]
            vec_y=sensor_points[i,1]-rayorigins[j,1]
            vec_z=sensor_points[i,2]-rayorigins[j,2]
            dist=np.sqrt(vec_x**2+vec_y**2+vec_z**2)
            res[i,j,0]=vec_x/dist
            res[i,j,1]=vec_y/dist
            res[i,j,2]=vec_z/dist
    return res

计时

res=np.empty((sensor_points.shape[0],rayorigins.shape[0],3),dtype=sensor_points.dtype)
%timeit res=normalized_vec(sensor_points,rayorigins)
np.float64: 66.6 ms ± 131 µs per loop (mean ± std. dev. of 7 runs, 10 loops each)
np.float32: 33.8 ms ± 375 µs per loop (mean ± std. dev. of 7 runs, 10 loops each)