Python: how to find elements in array x which have values close to elements in array y?

Question

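The elements of arrays x and y are floats. I want to find the elements of x whose values are as close as possible to the elements of y (one element of x for each element of y). Array x contains more than 10^6 elements and array y about 10^3. This runs inside a for loop, so it should be as fast as possible.

I wanted to avoid writing it as another for loop, so I did the following, but it is very slow when y is large: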

import numpy as np

x = np.array([0, 0.2, 1, 2.4, 3, 5])
y = np.array([0, 1, 2])
diff_xy = x.reshape(1, len(x)) - y.reshape(len(y), 1)  # shape (len(y), len(x))
diff_xy_abs = np.fabs(diff_xy)
args_x = np.argmin(diff_xy_abs, axis=1)  # index of the closest x for each y
x_new = x[args_x]

I am new to Python, so any suggestions are welcome!

Answer 1

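Perhaps sort the larger array, then binary search it for each value of the smaller array. If the value is found, that is the closest value, and any other close values sit at the adjacent indices. If it is not found, the closest value is one of the neighbours of the point where the search failed.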
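A minimal sketch of that idea, assuming NumPy's np.searchsorted as the binary search and assuming x has at least two elements. The helper name nearest_in_x is hypothetical, not something from the answer.

```python
import numpy as np

def nearest_in_x(x, y):
    """For each value in y, return the closest value from x (hypothetical helper)."""
    x_sorted = np.sort(x)  # sort once; skip if x is already sorted
    # Binary search: index of the first element of x_sorted that is >= each y
    pos = np.searchsorted(x_sorted, y)
    # Clamp so that both pos - 1 and pos are valid indices (needs len(x) >= 2)
    pos = np.clip(pos, 1, len(x_sorted) - 1)
    left = x_sorted[pos - 1]
    right = x_sorted[pos]
    # Keep whichever neighbour is closer; ties go to the left neighbour
    return np.where(y - left <= right - y, left, right)

x = np.array([0, 0.2, 1, 2.4, 3, 5])
y = np.array([0, 1, 2])
x_new = nearest_in_x(x, y)
print(x_new)
```

This needs one sort of x plus one binary search per element of y, and it never builds the len(y) by len(x) difference matrix.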

Answer 2

The following gives the desired result.

x[abs((np.tile(x, (len(y), 1)).T - y).T).argmin(axis=1)]

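It tiles x once for each element of y (len(y) times), transposes (.T) the tiled array, subtracts y, transposes back, takes the absolute value of the differences, uses argmin (over axis=1) to find the index of the minimum in each row, and finally takes the values of x at those indices.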
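To make the shapes concrete, here is a small sketch that runs the one-liner on the example data from the question and compares it with an equivalent broadcasting form. The broadcasting variant is an addition for illustration, not part of the answer. Both build a len(y) by len(x) intermediate array, which for 10^3 by 10^6 float64 values is about 8 GB.

```python
import numpy as np

x = np.array([0, 0.2, 1, 2.4, 3, 5])
y = np.array([0, 1, 2])

# The one-liner from the answer, for comparison
tiled = x[abs((np.tile(x, (len(y), 1)).T - y).T).argmin(axis=1)]

# Same computation with broadcasting: y[:, None] has shape (len(y), 1),
# so the difference has shape (len(y), len(x)), one row per element of y
diff = np.abs(y[:, None] - x)
broadcast = x[diff.argmin(axis=1)]

print(diff.shape)
print(broadcast)
```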

Answer 3

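This comes at the cost of the original ordering of x and y, but does this code meet your performance needs? Note: the same value from x can be used for more than one value of y.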

import numpy as np

# x = np.array([0, 0.2, 1, 2.4, 3,  5]);
# y = np.array([0, 1, 2]);
x = np.random.rand(10**6)*5000000
y = (np.random.rand(10**3)*5000000).astype(int)

x_new = np.zeros(len(y))  # Create an 'empty' array for the result

x.sort()  # could be skipped if already sorted
y.sort()  # could be skipped if already sorted

len_x = len(x)
idx_x = 0
cur_x = x[0]

for idx_y, cur_y in enumerate(y):
    while True:
        if idx_x == len_x-1: 
            # If we are at the end of x, the last value is the best value
            x_new[idx_y] = cur_x
            break
        next_x = x[idx_x+1]
        if abs(cur_y - cur_x) < abs(cur_y - next_x):
            # If the current value of x is better than the next, keep it
            x_new[idx_y] = cur_x
            break
        # Check for the next value
        idx_x += 1
        cur_x = next_x

print(x_new)
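
If the original order of y matters, one possible way to recover it is to sort through an index array instead of sorting in place. The sketch below assumes np.argsort for this, and uses a brute-force lookup on the small example from the question as a stand-in for the loop above.

```python
import numpy as np

x = np.array([0, 0.2, 1, 2.4, 3, 5])
y = np.array([2, 0, 1])  # original, unsorted order

order = np.argsort(y)    # indices that would sort y
y_sorted = y[order]      # what the loop above would iterate over

# Stand-in for the loop: nearest x for each value of the sorted y
x_new_sorted = x[np.abs(y_sorted[:, None] - x).argmin(axis=1)]

# Scatter the results back to the positions y had before sorting
x_new = np.empty_like(x_new_sorted)
x_new[order] = x_new_sorted
print(x_new)
```

After the scatter step, x_new[i] is the value from x closest to the original y[i].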
