计算后续成对坐标之间的累积欧氏距离
Compute cumulative euclidean distances between subsequent pairwise coordinates
我有以下两个数组:
X = array([37., 42., 31., 27., 37.])
Y = array([52., 57., 62., 68., 69.])
我也可以将它们组合如下:
XY = np.array((X, Y)).T
产生
([[37., 52.],
[42., 57.],
[31., 62.],
[27., 68.],
[37., 69.]])
我想计算所有点对之间的距离
例如我想这样做:
(
np.linalg.norm(np.array([37, 52]) - np.array([42, 57]))
+ np.linalg.norm(np.array([42, 57]) - np.array([31, 62]))
+ np.linalg.norm(np.array([31, 62]) - np.array([27, 68]))
+ np.linalg.norm(np.array([27, 68]) - np.array([37, 69]))
+ np.linalg.norm(np.array([37, 69]) - np.array([37, 52]))
)
然后生成 53.41509195750892
我写了一个这样做的函数:
def distance(X, Y):
N = len(X)
T = 0
oldx, oldy = X[-1], Y[-1]
for x, y in zip(X, Y):
T += np.linalg.norm((np.array([x,y])-np.array([oldx,oldy])))
oldx = x
oldy = y
return T
print(distance(X, Y))
还生产 53.41509195750891
我很想知道是否有更多 elegant/efficient 方法来处理 numpy 数组操作。
编辑: 对不起,我给出的原始示例函数是错误的,现在应该是正确的
编辑: 谢谢大家的回答!这是我的大小为 50 的数组的基准测试,看起来 Dani 的答案是最快的,尽管 Akshay 的答案对于大小为 5 的数组更快。
def distance_charel(X, Y):
N = len(X)
T = 0
oldx, oldy = X[-1], Y[-1]
for x, y in zip(X, Y):
T += np.linalg.norm((np.array([x,y])-np.array([oldx,oldy])))
oldx = x
oldy = y
return T
def distance_dani(X, Y):
XY = np.array((X, Y)).T
diff = np.diff(XY, axis=0, prepend=XY[-1].reshape((1, -1)))
ss = np.power(diff, 2).sum(axis=1)
res = np.sqrt(ss).sum()
return res
def distance_akshay(X, Y):
XY = np.array((X, Y)).T
pairwise = pairwise = np.sqrt(np.sum(np.square(np.subtract(XY[:,None,:],XY[None,:,:])), axis=-1))
total = np.sum(np.diag(pairwise,k=1))+pairwise[0,-1]
return total
def distance_gwang(X, Y):
XY = np.array((X, Y)).T
return sum([sum((p1 - p2) ** 2) ** .5 for p1, p2 in zip(XY, XY[1:])])
def distane_andy(X, Y):
arr = np.array((X, Y)).T
return np.linalg.norm(arr - np.roll(arr, -1, axis=0), axis=1).sum()
然后
print(distance_charel(X, Y))
print(distance_dani(X, Y))
print(distance_akshay(X, Y))
print(distance_gwang(X, Y)) # I think it misses the distance between last and first element
print(distane_andy(X, Y))
%timeit distance_charel(X, Y)
%timeit distance_dani(X, Y)
%timeit distance_akshay(X, Y)
%timeit distance_gwang(X, Y)
%timeit distane_andy(X, Y)
产出
2586.769647563161
2586.76964756316
2586.7696475631597
2568.8811037431624
2586.7696475631597
2.49 ms ± 117 µs per loop (mean ± std. dev. of 7 runs, 100 loops each)
29.9 µs ± 191 ns per loop (mean ± std. dev. of 7 runs, 10000 loops each)
385 µs ± 12.3 µs per loop (mean ± std. dev. of 7 runs, 1000 loops each)
1.09 ms ± 4.6 µs per loop (mean ± std. dev. of 7 runs, 1000 loops each)
31.2 µs ± 133 ns per loop (mean ± std. dev. of 7 runs, 10000 loops each)
编辑: 我现在接受了 Dani 的回答,因为我发现他的代码对我来说是最好的(使用 numpy 向量运算,可读性强,而且速度最快(略有差距))情况。感谢大家的回答!
编辑: 我使用 280 coordinates
更新了基准
from scipy.spatial.distance import euclidean
X = np.array([37., 42., 31., 27., 37.])
Y = np.array([52., 57., 62., 68., 69.])
XY = np.array((X, Y)).T
sum1 = euclidean(XY[0],XY[-1])
for i in range(len(XY)-1):
sum1 += euclidean(XY[i],XY[i+1])
这应该可以做到,从最难的项开始算起。然后迭代更简单的。将它们全部加在一起。
作为支票 euclidean(XY[0],XY[1]) = 7.0710678118654755
与您提供的值相同。
您可以通过 hand 以向量化方式计算公式,方法是使用 diff, power and sqrt:
import numpy as np
# setup
X = np.array([37., 42., 31., 27., 37.])
Y = np.array([52., 57., 62., 68., 69.])
XY = np.array((X, Y)).T
# find the differences, prepend the last value at the front
diff = np.diff(XY, axis=0, prepend=XY[-1].reshape((1, -1)))
# raise to the power of 2 and sum
ss = np.power(diff, 2).sum(axis=1)
# find the square root and sum
res = np.sqrt(ss).sum()
print(res)
输出
53.41509195750891
第一步:
# find the differences, prepend the last value at the front
diff = np.diff(XY, axis=0, prepend=XY[-1].reshape((1, -1)))
计算x1 - y1
和x2 - y2
,第二步:
# raise to the power of 2 and sum
ss = np.power(diff, 2).sum(axis=1)
将这些值提高到 2 的幂,即 (x1 - y1)^2
,然后求和,最后:
# find the square root and sum
res = np.sqrt(ss).sum()
如其所说求平方根。
为了更好地理解它,让我们看一个更小的例子:
# setup
X = np.array([37., 42.])
Y = np.array([52., 57])
XY = np.array((X, Y)).T
diff = np.diff(XY, axis=0)
# [[5. 5.]] (42 - 37) (57 - 52)
ss = np.power(diff, 2).sum(axis=1)
# [50.] 5^2 + 5^2
res = np.sqrt(ss).sum()
# 7.0710678118654755
In [2]: df = pd.DataFrame([[37., 42., 31., 27., 37.],
...: [52., 57., 62., 68., 69.]]).T.rename(columns={0:"X", 1:"y"})
...: df
Out[2]:
X y
0 37.0 52.0
1 42.0 57.0
2 31.0 62.0
3 27.0 68.0
4 37.0 69.0
In [3]: from scipy.spatial.distance import euclidean
...: np.sum([euclidean(df.iloc[i], df.iloc[i+1]) for i in range(len(df)-1)])
Out[3]: 36.41509195750892
您可以使用任何循环将其完全矢量化为一行代码,如下所示,使用广播 -
- 首先,
(5,1,2) broadcasted with (1,5,2) -> (5,5,2)
- 用这个广播减去得到
(5,5,2)
- 然后对
(5,5,2)
中的每个元素求平方
- 对最后一个轴求和得到
(5,5)
- 终于开平方了!
接下来,你可以只取保持(1,2), (2,3) ...
之间距离的移动对角线数组。总结一下,因为你想把距离加回到第一个,把它加到 [0,-1]
的值
#This get all pairwise distances calculated with broadcasting
pairwise = np.sqrt(np.sum(np.square(np.subtract(XY[:,None,:],XY[None,:,:])), axis=-1))
#This takes sum of the first diagonal elements instead of 0th
total = np.sum(np.diag(pairwise,k=1))+pairwise[0,-1]
print(total)
53.41509195750892
另一种方法如下,但上述方法仍然更快 -
np.sum(np.sqrt(np.sum(np.square(np.diff(np.vstack([XY,XY[0]]), axis=0)), axis=-1)))
#The np.vstack adds the first coordinate into the array so that you can
#calculate the distance from the last to the first again
基准-
- Akshay Sehgal - 每个循环 19.9 µs ± 2.53 µs(7 次运行的平均值 ± 标准差,每次 10000 次循环)
- Gwang - 每个循环 21.5 µs ± 1.01 µs(7 次运行的平均值 ± 标准偏差,每次 10000 次循环)
- ombk - 每个循环 60.4 µs ± 5.72 µs(7 次运行的平均值 ± 标准偏差,每次 10000 次循环)
- Dani Mesejo - 每个循环 16.4 µs ± 6.12 µs(7 次运行的平均值 ± 标准偏差,每次 10000 次循环)
- Andy L - 每个循环 17.6 µs ± 3.08 µs(7 次运行的平均值 ± 标准偏差,每次 10000 次循环)
正如预期的那样,numpy 向量化始终占据主导地位! Gj丹妮!
#preparation:
x = np.array([37., 42., 31., 27., 37.])
y = np.array([52., 57., 62.,68.,69.])
xy = np.array((x, y)).T
def euclidean_distance(p1, p2):
return sum((p1 - p2) ** 2) ** .5
您可以使用函数式编程更优雅地完成它。
在这里,您想 reduce
遍历 xy
:
中连续元素对的列表
from functools import reduce
from operator import add
reduce(add, [euclidean_distance(p1, p2) for p1, p2 in zip(xy, xy[1:])])
## 36.41509195750892
reduce
遍历列表 [1, 2, 3, 4, ..., k]
通过应用二元函数 func(a, b)
可以做到这一点:
func( ... func(func(func(func(1, 2), 3), 4), 5) ..., k)
.
@DaniMesejo 指出 reduce(add, lst)
只是 sum(lst)
.
所以就更简单了:
sum([euclidean_distance(p1, p2) for p1, p2 in zip(xy, xy[1:])])
这里最好的技巧实际上是 zip(xy, xy[1:])
从列表 [1, 2, 3, 4, ..., k]
创建
对:[(1, 2), (2, 3), (3, 4), ... (k-1, k)]
您可以将 np.roll
与 np.linalg.norm
和 sum
一起使用
#arr = np.stack([X,Y], axis=1)
arr = np.array((X, Y)).T #as suggested in the comment
Out[50]:
array([[37., 52.],
[42., 57.],
[31., 62.],
[27., 68.],
[37., 69.]])
In [52]: np.linalg.norm(arr - np.roll(arr, -1, axis=0), axis=1).sum()
Out[52]: 53.41509195750892
我有以下两个数组:
X = array([37., 42., 31., 27., 37.])
Y = array([52., 57., 62., 68., 69.])
我也可以将它们组合如下:
XY = np.array((X, Y)).T
产生
([[37., 52.],
[42., 57.],
[31., 62.],
[27., 68.],
[37., 69.]])
我想计算所有点对之间的距离
例如我想这样做:
(
np.linalg.norm(np.array([37, 52]) - np.array([42, 57]))
+ np.linalg.norm(np.array([42, 57]) - np.array([31, 62]))
+ np.linalg.norm(np.array([31, 62]) - np.array([27, 68]))
+ np.linalg.norm(np.array([27, 68]) - np.array([37, 69]))
+ np.linalg.norm(np.array([37, 69]) - np.array([37, 52]))
)
然后生成 53.41509195750892
我写了一个这样做的函数:
def distance(X, Y):
N = len(X)
T = 0
oldx, oldy = X[-1], Y[-1]
for x, y in zip(X, Y):
T += np.linalg.norm((np.array([x,y])-np.array([oldx,oldy])))
oldx = x
oldy = y
return T
print(distance(X, Y))
还生产 53.41509195750891
我很想知道是否有更多 elegant/efficient 方法来处理 numpy 数组操作。
编辑: 对不起,我给出的原始示例函数是错误的,现在应该是正确的
编辑: 谢谢大家的回答!这是我的大小为 50 的数组的基准测试,看起来 Dani 的答案是最快的,尽管 Akshay 的答案对于大小为 5 的数组更快。
def distance_charel(X, Y):
N = len(X)
T = 0
oldx, oldy = X[-1], Y[-1]
for x, y in zip(X, Y):
T += np.linalg.norm((np.array([x,y])-np.array([oldx,oldy])))
oldx = x
oldy = y
return T
def distance_dani(X, Y):
XY = np.array((X, Y)).T
diff = np.diff(XY, axis=0, prepend=XY[-1].reshape((1, -1)))
ss = np.power(diff, 2).sum(axis=1)
res = np.sqrt(ss).sum()
return res
def distance_akshay(X, Y):
XY = np.array((X, Y)).T
pairwise = pairwise = np.sqrt(np.sum(np.square(np.subtract(XY[:,None,:],XY[None,:,:])), axis=-1))
total = np.sum(np.diag(pairwise,k=1))+pairwise[0,-1]
return total
def distance_gwang(X, Y):
XY = np.array((X, Y)).T
return sum([sum((p1 - p2) ** 2) ** .5 for p1, p2 in zip(XY, XY[1:])])
def distane_andy(X, Y):
arr = np.array((X, Y)).T
return np.linalg.norm(arr - np.roll(arr, -1, axis=0), axis=1).sum()
然后
print(distance_charel(X, Y))
print(distance_dani(X, Y))
print(distance_akshay(X, Y))
print(distance_gwang(X, Y)) # I think it misses the distance between last and first element
print(distane_andy(X, Y))
%timeit distance_charel(X, Y)
%timeit distance_dani(X, Y)
%timeit distance_akshay(X, Y)
%timeit distance_gwang(X, Y)
%timeit distane_andy(X, Y)
产出
2586.769647563161
2586.76964756316
2586.7696475631597
2568.8811037431624
2586.7696475631597
2.49 ms ± 117 µs per loop (mean ± std. dev. of 7 runs, 100 loops each)
29.9 µs ± 191 ns per loop (mean ± std. dev. of 7 runs, 10000 loops each)
385 µs ± 12.3 µs per loop (mean ± std. dev. of 7 runs, 1000 loops each)
1.09 ms ± 4.6 µs per loop (mean ± std. dev. of 7 runs, 1000 loops each)
31.2 µs ± 133 ns per loop (mean ± std. dev. of 7 runs, 10000 loops each)
编辑: 我现在接受了 Dani 的回答,因为我发现他的代码对我来说是最好的(使用 numpy 向量运算,可读性强,而且速度最快(略有差距))情况。感谢大家的回答!
编辑: 我使用 280 coordinates
更新了基准from scipy.spatial.distance import euclidean
X = np.array([37., 42., 31., 27., 37.])
Y = np.array([52., 57., 62., 68., 69.])
XY = np.array((X, Y)).T
sum1 = euclidean(XY[0],XY[-1])
for i in range(len(XY)-1):
sum1 += euclidean(XY[i],XY[i+1])
这应该可以做到,从最难的项开始算起。然后迭代更简单的。将它们全部加在一起。
作为支票 euclidean(XY[0],XY[1]) = 7.0710678118654755
与您提供的值相同。
您可以通过 hand 以向量化方式计算公式,方法是使用 diff, power and sqrt:
import numpy as np
# setup
X = np.array([37., 42., 31., 27., 37.])
Y = np.array([52., 57., 62., 68., 69.])
XY = np.array((X, Y)).T
# find the differences, prepend the last value at the front
diff = np.diff(XY, axis=0, prepend=XY[-1].reshape((1, -1)))
# raise to the power of 2 and sum
ss = np.power(diff, 2).sum(axis=1)
# find the square root and sum
res = np.sqrt(ss).sum()
print(res)
输出
53.41509195750891
第一步:
# find the differences, prepend the last value at the front
diff = np.diff(XY, axis=0, prepend=XY[-1].reshape((1, -1)))
计算x1 - y1
和x2 - y2
,第二步:
# raise to the power of 2 and sum
ss = np.power(diff, 2).sum(axis=1)
将这些值提高到 2 的幂,即 (x1 - y1)^2
,然后求和,最后:
# find the square root and sum
res = np.sqrt(ss).sum()
如其所说求平方根。
为了更好地理解它,让我们看一个更小的例子:
# setup
X = np.array([37., 42.])
Y = np.array([52., 57])
XY = np.array((X, Y)).T
diff = np.diff(XY, axis=0)
# [[5. 5.]] (42 - 37) (57 - 52)
ss = np.power(diff, 2).sum(axis=1)
# [50.] 5^2 + 5^2
res = np.sqrt(ss).sum()
# 7.0710678118654755
In [2]: df = pd.DataFrame([[37., 42., 31., 27., 37.],
...: [52., 57., 62., 68., 69.]]).T.rename(columns={0:"X", 1:"y"})
...: df
Out[2]:
X y
0 37.0 52.0
1 42.0 57.0
2 31.0 62.0
3 27.0 68.0
4 37.0 69.0
In [3]: from scipy.spatial.distance import euclidean
...: np.sum([euclidean(df.iloc[i], df.iloc[i+1]) for i in range(len(df)-1)])
Out[3]: 36.41509195750892
您可以使用任何循环将其完全矢量化为一行代码,如下所示,使用广播 -
- 首先,
(5,1,2) broadcasted with (1,5,2) -> (5,5,2)
- 用这个广播减去得到
(5,5,2)
- 然后对
(5,5,2)
中的每个元素求平方
- 对最后一个轴求和得到
(5,5)
- 终于开平方了!
接下来,你可以只取保持(1,2), (2,3) ...
之间距离的移动对角线数组。总结一下,因为你想把距离加回到第一个,把它加到 [0,-1]
#This get all pairwise distances calculated with broadcasting
pairwise = np.sqrt(np.sum(np.square(np.subtract(XY[:,None,:],XY[None,:,:])), axis=-1))
#This takes sum of the first diagonal elements instead of 0th
total = np.sum(np.diag(pairwise,k=1))+pairwise[0,-1]
print(total)
53.41509195750892
另一种方法如下,但上述方法仍然更快 -
np.sum(np.sqrt(np.sum(np.square(np.diff(np.vstack([XY,XY[0]]), axis=0)), axis=-1)))
#The np.vstack adds the first coordinate into the array so that you can
#calculate the distance from the last to the first again
基准-
- Akshay Sehgal - 每个循环 19.9 µs ± 2.53 µs(7 次运行的平均值 ± 标准差,每次 10000 次循环)
- Gwang - 每个循环 21.5 µs ± 1.01 µs(7 次运行的平均值 ± 标准偏差,每次 10000 次循环)
- ombk - 每个循环 60.4 µs ± 5.72 µs(7 次运行的平均值 ± 标准偏差,每次 10000 次循环)
- Dani Mesejo - 每个循环 16.4 µs ± 6.12 µs(7 次运行的平均值 ± 标准偏差,每次 10000 次循环)
- Andy L - 每个循环 17.6 µs ± 3.08 µs(7 次运行的平均值 ± 标准偏差,每次 10000 次循环)
正如预期的那样,numpy 向量化始终占据主导地位! Gj丹妮!
#preparation:
x = np.array([37., 42., 31., 27., 37.])
y = np.array([52., 57., 62.,68.,69.])
xy = np.array((x, y)).T
def euclidean_distance(p1, p2):
return sum((p1 - p2) ** 2) ** .5
您可以使用函数式编程更优雅地完成它。
在这里,您想 reduce
遍历 xy
:
from functools import reduce
from operator import add
reduce(add, [euclidean_distance(p1, p2) for p1, p2 in zip(xy, xy[1:])])
## 36.41509195750892
reduce
遍历列表 [1, 2, 3, 4, ..., k]
通过应用二元函数 func(a, b)
可以做到这一点:
func( ... func(func(func(func(1, 2), 3), 4), 5) ..., k)
.
@DaniMesejo 指出 reduce(add, lst)
只是 sum(lst)
.
所以就更简单了:
sum([euclidean_distance(p1, p2) for p1, p2 in zip(xy, xy[1:])])
这里最好的技巧实际上是 zip(xy, xy[1:])
从列表 [1, 2, 3, 4, ..., k]
创建
对:[(1, 2), (2, 3), (3, 4), ... (k-1, k)]
您可以将 np.roll
与 np.linalg.norm
和 sum
#arr = np.stack([X,Y], axis=1)
arr = np.array((X, Y)).T #as suggested in the comment
Out[50]:
array([[37., 52.],
[42., 57.],
[31., 62.],
[27., 68.],
[37., 69.]])
In [52]: np.linalg.norm(arr - np.roll(arr, -1, axis=0), axis=1).sum()
Out[52]: 53.41509195750892