Euclidean distance: results differ between Python and NumPy with a large number of instances
I am trying two ways of computing the squared Euclidean distance.
With NumPy:
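import numpy as np
from sklearn.metrics.pairwise import euclidean_distances

def inference(feature_list):
    distances = np.zeros(len(feature_list))
    for idx, pair in enumerate(feature_list):
        # euclidean_distances expects 2-D input, so each vector becomes a single row
        distances[idx] = euclidean_distances(pair[0].reshape((1, -1)), pair[1].reshape((1, -1))).item()
        # square the distance to get the squared Euclidean distance
        distances[idx] = distances[idx] * distances[idx]
    return distances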
With pure Python:
def inference1(feature_list):
    distances = np.zeros(len(feature_list))
    for idx, pair in enumerate(feature_list):
        for pair_idx in range(len(pair[0])):
            tmp = pair[0][pair_idx] - pair[1][pair_idx]
            distances[idx] += tmp * tmp
    return distances
The code that tests the results is:
def main(args):
    d = 128
    n = 100
    array2 = [(np.random.rand(d)/4, np.random.rand(d)/3) for x in range(n)]
    result = sample.inference(array2)
    print(list(result))  # print result 1
    result = sample.inference1(array2)
    print(list(result))  # print result 2
When n reaches 100000 the results differ; for smaller n they are the same.
Why does this happen? How can I get the same results?
In this minimal example, we see that the difference between the two results is negligible.
import numpy as np
from sklearn.metrics.pairwise import euclidean_distances

def inference_sklearn(feature_list):
    distances = np.zeros(len(feature_list))
    for idx, pair in enumerate(feature_list):
        distances[idx] = euclidean_distances(pair[0].reshape((1, -1)), pair[1].reshape((1, -1))).item()
        distances[idx] = distances[idx] * distances[idx]
    return distances

def inference_python(feature_list):
    distances = np.zeros(len(feature_list))
    for idx, pair in enumerate(feature_list):
        for pair_idx in range(len(pair[0])):
            tmp = pair[0][pair_idx] - pair[1][pair_idx]
            distances[idx] += tmp * tmp
    return distances

d = 128
ns = [100, 1000, 10000, 100000, 200000]
for n in ns:
    print("n =", n)
    test_array = [(np.random.rand(d)/4, np.random.rand(d)/3) for x in range(n)]
    result_sklearn = inference_sklearn(test_array)
    result_python = inference_python(test_array)
    # distance between the two result vectors; 0.0 means they agree
    print(euclidean_distances([result_sklearn], [result_python])[0][0])
Output:
n = 100
0.0
n = 1000
0.0
n = 10000
0.0
n = 100000
0.0
n = 200000
1.52587890625e-05
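A possible explanation for a residue of this size, offered as a sketch rather than a confirmed diagnosis: the two functions do the same mathematics in a different order, and floating-point arithmetic is not associative. The scikit-learn documentation describes euclidean_distances as computing sqrt(dot(x, x) - 2 * dot(x, y) + dot(y, y)), whereas the pure-Python loop sums squared differences directly. The snippet below uses only the standard library to compare the two formulas; the helper names squared_distance_direct and squared_distance_expanded are made up for this illustration and are not part of any library.

```python
import math
import random

# Floating-point addition is not associative, so the order of operations matters.
left_to_right = (0.1 + 0.2) + 0.3
right_to_left = 0.1 + (0.2 + 0.3)
print(left_to_right == right_to_left)
print(abs(left_to_right - right_to_left))

def squared_distance_direct(x, y):
    """Sum of squared element-wise differences, like the pure-Python loop."""
    total = 0.0
    for a, b in zip(x, y):
        diff = a - b
        total += diff * diff
    return total

def squared_distance_expanded(x, y):
    """The same quantity written as dot(x, x) - 2 * dot(x, y) + dot(y, y)."""
    xx = sum(a * a for a in x)
    yy = sum(b * b for b in y)
    xy = sum(a * b for a, b in zip(x, y))
    return xx - 2.0 * xy + yy

# Same shape of data as in the question: 128 values scaled by 1/4 and 1/3.
rng = random.Random(0)
x = [rng.random() / 4 for _ in range(128)]
y = [rng.random() / 3 for _ in range(128)]

direct = squared_distance_direct(x, y)
expanded = squared_distance_expanded(x, y)
print(direct, expanded, abs(direct - expanded))
```

Both formulas agree to many significant digits, but not necessarily to the last bit, so an exact equality test between them can fail even though neither result is wrong.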
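When you want to test for equality, don't just print your results. You can also use numpy.set_printoptions to control the precision with which arrays are printed.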
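As a rough illustration of that advice, the sketch below compares two arrays with a tolerance instead of by eye. The arrays result_a and result_b are hypothetical stand-ins for the two result vectors, and the tolerances are arbitrary values chosen for the example; pick ones that suit your data.

```python
import numpy as np

# Hypothetical stand-ins for the two result arrays: identical values
# plus a perturbation on the order of 1e-12.
rng = np.random.default_rng(0)
result_a = rng.random(1000)
result_b = result_a + 1e-12 * rng.standard_normal(1000)

# Exact, element-by-element equality.
exactly_equal = np.array_equal(result_a, result_b)
print("exactly equal:", exactly_equal)

# Equality within a tolerance, which is usually what you want for floats.
close = np.allclose(result_a, result_b, rtol=0.0, atol=1e-9)
print("close:", close)

# The largest element-wise difference tells you how far apart they are.
max_abs_diff = np.max(np.abs(result_a - result_b))
print("max abs diff:", max_abs_diff)

# In a test, assert_allclose raises with a detailed report on mismatch.
np.testing.assert_allclose(result_a, result_b, rtol=0.0, atol=1e-9)

# Print more digits so that small differences become visible.
np.set_printoptions(precision=17)
print(result_a[:3])
print(result_b[:3])
```

Taking the maximum absolute element-wise difference is also a more direct measure than a distance between the two result vectors, since a distance computation carries its own rounding error.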