Improve tuple distance computation algorithm for time efficiency
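I have an algorithm that computes the distance from each point p (a coordinate pair that I represent as a tuple) to every other tuple in a list of tuples.
List of points: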
centerList = [(54, 2991),
(1717, 2989),
(1683, 2991),
(1604, 2991),
(114, 2991),
(919,222),
(930,233)]
Distance function:
import math

def getDistance(p0, p1):
    return math.sqrt((p0[0] - p1[0])**2 + (p0[1] - p1[1])**2)
The algorithm that computes the distance from point p to all the other points in the list of tuples:
i = 0
distanceList = []
for p in range(len(centerList)):
    # i is reset to p + 2 after each pass, so the next point starts at its successor
    while i < len(centerList):
        distance = getDistance(centerList[p], centerList[i])
        print(centerList[p], centerList[i], distance)
        if distance < 20:
            distanceList.append(distance)
        i += 1
    i = p + 2
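My current algorithm iterates in a non-redundant way, but in its current state it is too brute-force for practical use. My problem is that my actual centerList contains thousands of tuples.
What can be done to improve the time efficiency of this tuple comparison algorithm?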
You can combine sklearn.metrics.euclidean_distances with numpy boolean indexing to do the computation:
>>> from sklearn.metrics import euclidean_distances
>>> import numpy as np
>>> centerList = np.array(centerList)
>>> distances = euclidean_distances(centerList)
>>> distances[distances<20]
array([ 0. , 0. , 0. , 0. ,
0. , 0. , 15.55634919, 15.55634919, 0. ])
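The boolean mask above also selects the zero self-distances on the diagonal and reports each close pair twice, because the matrix is symmetric. One possible way to keep every pair only once is to look at the strict upper triangle. This is a rough sketch, not part of the original answer: the matrix is rebuilt here with plain numpy broadcasting only so the snippet runs on its own (the result of euclidean_distances could be used the same way), and the names rows, cols, close_pairs and close_distances are made up for illustration.

```python
import numpy as np

centerList = np.array([(54, 2991), (1717, 2989), (1683, 2991), (1604, 2991),
                       (114, 2991), (919, 222), (930, 233)])

# Full pairwise distance matrix via broadcasting
# (a stand-in for euclidean_distances(centerList)).
diff = centerList[:, None, :] - centerList[None, :, :]
distances = np.sqrt((diff ** 2).sum(axis=-1))

# Index pairs (i, j) with i < j: each unordered pair once, no diagonal.
rows, cols = np.triu_indices(len(centerList), k=1)
pair_distances = distances[rows, cols]

# Keep only the pairs closer than the threshold used in the question.
close = pair_distances < 20
close_pairs = list(zip(rows[close].tolist(), cols[close].tolist()))
close_distances = pair_distances[close]

print(close_pairs, close_distances)
```

For the sample list this should leave only the pair of points at indices 5 and 6.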
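The distances are computed with numpy matrix algebra, which is implemented in fast C. The documentation also highlights the efficiency of the underlying mathematical technique: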
For efficiency reasons, the euclidean distance between a pair of row
vector x and y is computed as:
dist(x, y) = sqrt(dot(x, x) - 2 * dot(x, y) + dot(y, y))
This formulation has two advantages over other ways of computing
distances. First, it is computationally efficient when dealing with
sparse data. Second, if one argument varies but the other remains
unchanged, then dot(x, x) and/or dot(y, y) can be pre-computed.
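To make the quoted formula concrete, here is a minimal numpy sketch of the same identity applied to all rows at once. It is an illustration of the formula only, not scikit-learn's actual implementation; the function name pairwise_distances_dot is hypothetical.

```python
import numpy as np

def pairwise_distances_dot(X):
    """All pairwise Euclidean distances between the rows of X, using
    dist(x, y) = sqrt(dot(x, x) - 2 * dot(x, y) + dot(y, y))."""
    X = np.asarray(X, dtype=float)
    sq_norms = np.einsum("ij,ij->i", X, X)  # dot(x, x) for every row
    gram = X @ X.T                          # dot(x, y) for every pair of rows
    sq = sq_norms[:, None] - 2.0 * gram + sq_norms[None, :]
    np.maximum(sq, 0.0, out=sq)             # clip tiny negatives caused by rounding
    return np.sqrt(sq)

points = [(54, 2991), (1717, 2989), (1683, 2991), (1604, 2991),
          (114, 2991), (919, 222), (930, 233)]
D = pairwise_distances_dot(points)
print(D[5, 6])
```

The squared row norms are computed once and reused for every pair, which is the pre-computation the quoted passage refers to.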
With numpy alone:
import numpy
centerList = [(54, 2991), (1717, 2989), (1683, 2991), (1604, 2991), (114, 2991), (919,222), (930,233)]
centerList = numpy.array(centerList)
def getDistance(p0, p1):
    return numpy.linalg.norm(p0 - p1)
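This returns the same result as your getDistance function. Note that p0 and p1 must be numpy arrays (for example, rows of centerList), since plain tuples do not support subtraction.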
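Calling getDistance once per pair still leaves the looping in Python. As a hedged sketch of how the same numpy call might be vectorized, numpy.linalg.norm accepts an axis argument, so the distances from one point to every row can be computed in a single call. The helper name distances_from and the variables d and near are assumptions made for this example.

```python
import numpy as np

centerList = np.array([(54, 2991), (1717, 2989), (1683, 2991), (1604, 2991),
                       (114, 2991), (919, 222), (930, 233)])

def distances_from(points, index):
    """Distances from points[index] to every row of points (itself included)."""
    return np.linalg.norm(points - points[index], axis=1)

# Distances from the point at index 5 to all points.
d = distances_from(centerList, 5)

# Indices of the other points closer than 20, excluding the point itself.
near = np.flatnonzero((d < 20) & (np.arange(len(centerList)) != 5))
print(near, d[near])
```

Looping over index this way needs only one row of distances in memory at a time, which may matter with thousands of points, where a full n-by-n matrix grows quadratically.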