Subtracting two-dimensional arrays using numpy broadcasting

Question

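I'm new to numpy in general, so this is probably a simple question, but I can't figure out how to solve it.
I'm trying to implement the K-nearest-neighbors algorithm to classify a dataset.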

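There are two arrays named new_points and points, with shapes (30,4) and (120,4) respectively (4 is the number of attributes of each element).
So I'm trying to use numpy broadcasting

to compute the distance between each new point and all of the old points: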

def calc_no_loop(new_points, points):
    return np.sum((new_points - points)**2, axis=1)
# This doesn't work; here is the error:

ValueError: operands could not be broadcast together with shapes (30,4) (120,4)

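According to the broadcasting rules, two arrays of shapes (30,4) and (120,4) are incompatible, so I would appreciate any insight on how to solve this (perhaps using .reshape, I'm not sure).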

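Please note: I have already implemented the same function using one loop and using two loops, but I can't manage to implement it without any loop.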

def calc_two_loops(new_points, points):
    m, n = len(new_points), len(points)
    d = np.zeros((m, n))
    for i in range(m):
        for j in range(n):
            # squared distance between new point i and old point j
            d[i, j] = np.sum((new_points[i] - points[j])**2)
    return d


def calc_one_loop(new_points, points):
    m, n = len(new_points), len(points)
    d = np.zeros((m, n))
    for i in range(m):
        # (4,) - (n, 4) broadcasts to (n, 4); axis=1 keeps one value per old point
        d[i] = np.sum((new_points[i] - points)**2, axis=1)
    return d

Answer 1

Let's create a smaller example:

import numpy as np

nNew, nOld = 3, 5    # Number of new / old points
# New points
new_points = np.arange(100, 100 + nNew * 4).reshape(nNew, 4)
# Old points
points = np.arange(10, 10 + nOld * 8, 2).reshape(nOld, 4)

To compute just the differences, run:

dfr = new_points[:, np.newaxis, :] - points[np.newaxis, :, :]

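At this point we have the difference in each attribute for every pair of points (each new point against each old point).

The shape of dfr is (3, 5, 4):

first dimension: the number of new points,
second dimension: the number of old points,
third dimension: the difference in each attribute.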
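
The shapes involved can be checked directly. The following is a minimal sketch on the same small example; the names a and b are only introduced here for illustration and are not part of the answer's code.

```python
import numpy as np

nNew, nOld = 3, 5
new_points = np.arange(100, 100 + nNew * 4).reshape(nNew, 4)
points = np.arange(10, 10 + nOld * 8, 2).reshape(nOld, 4)

a = new_points[:, np.newaxis, :]   # shape (3, 1, 4)
b = points[np.newaxis, :, :]       # shape (1, 5, 4)
dfr = a - b                        # size-1 axes are stretched: shape (3, 5, 4)

print(a.shape, b.shape, dfr.shape)

# dfr[i, j] holds the attribute-wise difference between
# new point i and old point j
assert np.array_equal(dfr[2, 4], new_points[2] - points[4])
```

Broadcasting compares shapes from the last axis backwards and only stretches axes of length 1, which is why (30,4) against (120,4) fails while (30,1,4) against (1,120,4) works.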

Then, to sum the squared differences for each pair of points, run:

d = np.power(dfr, 2).sum(axis=2)

And this is your result.

For my sample data, the result is:

array([[31334, 25926, 21030, 16646, 12774],
       [34230, 28566, 23414, 18774, 14646],
       [37254, 31334, 25926, 21030, 16646]], dtype=int32)
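
The same two steps should carry over to the shapes in the question. Below is a sketch that wraps them in a function and checks the result against the two-loop version from the question; the random data and the seed are assumptions made only for this check.

```python
import numpy as np

def calc_no_loop(new_points, points):
    # (m, 1, k) - (1, n, k) -> (m, n, k), then sum over the attribute axis
    dfr = new_points[:, np.newaxis, :] - points[np.newaxis, :, :]
    return np.sum(dfr ** 2, axis=2)

def calc_two_loops(new_points, points):
    # reference implementation, as in the question
    m, n = len(new_points), len(points)
    d = np.zeros((m, n))
    for i in range(m):
        for j in range(n):
            d[i, j] = np.sum((new_points[i] - points[j]) ** 2)
    return d

rng = np.random.default_rng(0)      # arbitrary seed, for reproducibility only
new_points = rng.random((30, 4))
points = rng.random((120, 4))

d = calc_no_loop(new_points, points)
print(d.shape)  # (30, 120)
assert np.allclose(d, calc_two_loops(new_points, points))
```

Note that these are squared distances; for ranking nearest neighbors that ordering is the same as for the distances themselves, since the square root is monotonic.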

Answer 2

So you have 30 new points and 120 old points, and if I understand correctly you want the result to be an array of distances with shape (120, 30).

You can do:

import numpy as np

points = np.random.random(120*4).reshape(120,4)
new_points = np.random.random(30*4).reshape(30,4)

def calc_no_loop(new_points, points):
    # one row per old point, one column per new point
    res = np.zeros((len(points), len(new_points)))
    for idx in range(len(points)):
        res[idx, :] = np.sum((points[idx, :] - new_points)**2, axis=1)
    return np.sqrt(res)

test = calc_no_loop(new_points,points)
print(np.shape(test))
print(test)

which gives

(120, 30)
[[0.67166838 0.78096694 0.94983683 ... 1.00960301 0.48076185 0.56419991]
 [0.88156338 0.54951826 0.73919191 ... 0.87757896 0.76305462 0.52486626]
 [0.85271938 0.56085692 0.73063341 ... 0.97884167 0.90509791 0.7505591 ]
 ...
 [0.53968258 0.64514941 0.89225849 ... 0.99278462 0.31861253 0.44615026]
 [0.51647526 0.58611128 0.83298535 ... 0.86669406 0.64931403 0.71517123]
 [1.08515826 0.64626221 0.6898687  ... 0.96882542 1.08075076 0.80144746]]

But from the name of your function above I gather that you don't want a loop? In that case you can do this:

def calc_no_loop(new_points, points):
    # expand both arrays to the common shape (120, 30, 4)
    new_points1 = np.repeat(new_points[np.newaxis, ...], len(points), axis=0)
    points1 = np.repeat(points[:, np.newaxis, :], len(new_points), axis=1)
    return np.sqrt(np.sum((new_points1 - points1)**2, axis=2))

test = calc_no_loop(new_points,points)
print(np.shape(test))
print(test)

with the output:

(120, 30)
[[0.67166838 0.78096694 0.94983683 ... 1.00960301 0.48076185 0.56419991]
 [0.88156338 0.54951826 0.73919191 ... 0.87757896 0.76305462 0.52486626]
 [0.85271938 0.56085692 0.73063341 ... 0.97884167 0.90509791 0.7505591 ]
 ...
 [0.53968258 0.64514941 0.89225849 ... 0.99278462 0.31861253 0.44615026]
 [0.51647526 0.58611128 0.83298535 ... 0.86669406 0.64931403 0.71517123]
 [1.08515826 0.64626221 0.6898687  ... 0.96882542 1.08075076 0.80144746]]

i.e. the same result. Note that I added np.sqrt() to the result, which you may have forgotten in your example above.
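
As a side note, the explicit np.repeat calls are probably not needed, because broadcasting stretches the length-1 axes on its own. The sketch below compares the two approaches; the function names dist_with_repeat and dist_with_broadcast are hypothetical and chosen only for this comparison, and the seeded random data is an assumption.

```python
import numpy as np

rng = np.random.default_rng(0)      # arbitrary seed, for reproducibility only
points = rng.random((120, 4))
new_points = rng.random((30, 4))

def dist_with_repeat(new_points, points):
    # materialise both arrays at the full shape (120, 30, 4)
    new_points1 = np.repeat(new_points[np.newaxis, ...], len(points), axis=0)
    points1 = np.repeat(points[:, np.newaxis, :], len(new_points), axis=1)
    return np.sqrt(np.sum((new_points1 - points1) ** 2, axis=2))

def dist_with_broadcast(new_points, points):
    # (1, 30, 4) - (120, 1, 4) -> (120, 30, 4) without copying the inputs
    diff = new_points[np.newaxis, :, :] - points[:, np.newaxis, :]
    return np.sqrt(np.sum(diff ** 2, axis=2))

a = dist_with_repeat(new_points, points)
b = dist_with_broadcast(new_points, points)
assert a.shape == b.shape == (120, 30)
assert np.allclose(a, b)
```

The result here has shape (120, 30), that is, old points along the rows. Transposing it gives the (30, 120) layout used by the loop versions in the question.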
