How to find two specific values in a large 6-column array in Python?

I have a 1213x5 array in which each row contains 3 to 5 values (see the example below).

Example array:

2 3 6 nan nan
8 6 2 nan nan
9 8 6 5 nan
9 5 2 1 nan
2 3 4 1 6
6 8 5 3 2

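I have two values and want to find the indices of the rows in which both of them occur. I wrote the code below, but it takes ~0.2 seconds to run. That is not ideal, because I have to call it thousands of times. I'm new to Python, so I'm still getting used to Pythonic code and the fastest ways of doing things.

Current code: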

def rowIdx(array, m, n):
    idxList = []
    m = int(m)
    n = int(n)
    for x in range(len(array)):
        if (array[x,0]== m or array[x,0] == n) and (array[x,1] == m or array[x,1] == n):
            idxList.append(x)
        if (array[x,0] == m or array[x,0] == n) and (array[x,2] == m or array[x,2] == n):
            idxList.append(x)
        if (array[x,1] == m or array[x,1] == n) and (array[x,2] == m or array[x,2] == n):
            idxList.append(x)
        if (array[x,0]== m or array[x,0] == n) and (array[x,3] == m or array[x,3] == n):
            idxList.append(x)
        if (array[x,1] == m or array[x,1] == n) and (array[x,3] == m or array[x,3] == n):
            idxList.append(x)
        if (array[x,2] == m or array[x,2] == n) and (array[x,3] == m or array[x,3] == n):
            idxList.append(x)
        if (array[x,0] == m or array[x,0] == n) and (array[x,4] == m or array[x,4] == n):
            idxList.append(x)
        if (array[x,1]== m or array[x,1] == n) and (array[x,4] == m or array[x,4] == n):
            idxList.append(x)
        if (array[x,2] == m or array[x,2] == n) and (array[x,4] == m or array[x,4] == n):
            idxList.append(x)
        if (array[x,3] == m or array[x,3] == n) and (array[x,4] == m or array[x,4] == n):
            idxList.append(x)            
    return idxList 

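Any help would be greatly appreciated. Thanks!

I think it should be simple to iterate over the rows of the array and use in to check for the desired values: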

def rowIdx(array, m, n):
    idxList = []
    m = int(m)
    n = int(n)
    for rx, aRow in enumerate(array):
        if m in aRow and n in aRow:
            idxList.append(rx)
    return idxList

enumerate is a standard built-in function and well worth getting to know.

If you're comfortable with it, you can do the same thing as a list comprehension:

def rowIdx(array, m, n):
    m = int(m)
    n = int(n)
    return [rx for rx, aRow in enumerate(array) if m in aRow and n in aRow]

This is very Pythonic and efficient:

def row_index(array,m,n):
    for index, row in enumerate(array):
        if m in row and n in row:
            yield index

If you want the result as a list of indices, call it like this instead:

list(row_index(array,m,n))

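You can enumerate the whole list (a numpy.array or a regular Python list). Instead of checking every element at every position, check whether the value is a member of the row: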

def newRowIdx(array, m, n):
    idxList = []
    m = int(m)
    n = int(n)
    for idx, row in enumerate(array):
        if (m in row and n in row):
            idxList.append(idx)
    return idxList

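This approach takes 0.372 s, while yours takes 0.976 s. I measured both in an online environment, which is slow either way, but the improvement is still clear.

None of the answers given above is very efficient, since they all rely on Python for loops, nor would I consider them Pythonic. In a case like this you are better off using NumPy functions only. Under the hood they run C code that is heavily optimized for exactly this kind of operation.

I hope the code below is useful to you. It should be about an order of magnitude faster than the loop-based solutions proposed so far.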

import numpy as np

# Example array
array = np.array([[1,2,90,91,90],[90,2,3,91,5],[1,2,3,np.nan,np.nan]])

# Adapt to your values.
# Find rows per value where the element is there. Easy to extend to more than two values
msk_value_1 = (array == 90).any(axis=1)
msk_value_2 = (array == 91).any(axis=1)

both_true = msk_value_1 & msk_value_2

# find the indices
np.where(both_true)
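
One detail worth noting: np.where called with a single argument returns a tuple of arrays, so the row indices are its first element. Below is a minimal sketch of the same masking idea wrapped in a function that returns a flat index array and accepts any number of values. The name rows_containing_all is hypothetical (not from the answer above), and the sketch assumes values is non-empty.

```python
import numpy as np

def rows_containing_all(array, values):
    """Return indices of rows that contain every value in `values`.

    Hypothetical helper for illustration; assumes `values` is non-empty.
    """
    array = np.asarray(array)
    # One boolean mask per value: True where the row contains that value.
    # Comparisons against NaN entries are simply False.
    masks = [(array == v).any(axis=1) for v in values]
    # A row qualifies only if every mask is True for it.
    all_true = np.logical_and.reduce(masks)
    # flatnonzero returns a 1-D index array instead of a tuple.
    return np.flatnonzero(all_true)

array = np.array([[1, 2, 90, 91, 90],
                  [90, 2, 3, 91, 5],
                  [1, 2, 3, np.nan, np.nan]])
idx = rows_containing_all(array, [90, 91])
print(idx)
```

Calling idx.tolist() converts the result to a plain Python list if that is more convenient downstream.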

Benchmark

Here are some benchmarks for comparison with most of the solutions provided:

# Create a random matrix for inputs
array = np.random.randint(0,1000, (3000, 4))

# solution here
def find_two_values(array, m, n):
    msk_value_1 = (array == int(m)).any(axis=1)
    msk_value_2 = (array == int(n)).any(axis=1)

    both_true = msk_value_1 & msk_value_2

    # find the indices
    return np.where(both_true)

# takes 900 microseconds on my computer
_ = find_two_values(array, 90, 91)

# other solution proposed
def rowIdx(array, m, n):
    idxList = []
    m = int(m)
    n = int(n)
    for rx, aRow in enumerate(array):
        if m in aRow and n in aRow:
            idxList.append(rx)
    return idxList

# takes 9.8 ms on my computer
_ = rowIdx(array, 90, 91)

That is about 1 ms versus 10 ms, so roughly ten times faster.
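
To reproduce the comparison on your own machine, something like the following could be used. It is only a sketch built on the standard timeit module: the two functions are restated in slightly condensed form so the block is self-contained (find_two_values returns a flat index array here, and rowIdx is written as a list comprehension), the seed and the number of runs are arbitrary choices, and the timings will differ with hardware and data.

```python
import timeit
import numpy as np

# Random integer matrix with the same shape as in the benchmark above.
rng = np.random.default_rng(0)  # arbitrary seed, for repeatability only
array = rng.integers(0, 1000, size=(3000, 4))

def find_two_values(array, m, n):
    # Vectorized: one mask per value, combined row-wise.
    both = (array == int(m)).any(axis=1) & (array == int(n)).any(axis=1)
    return np.flatnonzero(both)

def rowIdx(array, m, n):
    # Loop-based: membership test on every row.
    m, n = int(m), int(n)
    return [rx for rx, aRow in enumerate(array) if m in aRow and n in aRow]

# Both approaches should agree before their timings are compared.
assert find_two_values(array, 90, 91).tolist() == rowIdx(array, 90, 91)

runs = 20  # arbitrary; more runs give steadier averages
t_numpy = timeit.timeit(lambda: find_two_values(array, 90, 91), number=runs) / runs
t_loop = timeit.timeit(lambda: rowIdx(array, 90, 91), number=runs) / runs
print(f"numpy: {t_numpy * 1e3:.3f} ms per call, loop: {t_loop * 1e3:.3f} ms per call")
```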