How to find two specific values in a large 6-column array in Python?
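I have an array of 1213 rows by 5 columns, where each row contains 3 to 5 values (see the example below).
Array example:
2 3 6 nan nan
8 6 2 nan nan
9 8 6 5 nan
9 5 2 1 nan
2 3 4 1 6
6 8 5 3 2
I have two values and want to find the indices of the rows in which both values occur. I wrote the code below, but it takes ~0.2 seconds to complete. That is not ideal, because I have to run it thousands of times. I am new to Python, so I am still getting used to Pythonic code and the fastest ways of doing things.
Current code: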
def rowIdx(array, m, n):
    idxList = []
    m = int(m)
    n = int(n)
    for x in range(len(array)):
        if (array[x,0] == m or array[x,0] == n) and (array[x,1] == m or array[x,1] == n):
            idxList.append(x)
        if (array[x,0] == m or array[x,0] == n) and (array[x,2] == m or array[x,2] == n):
            idxList.append(x)
        if (array[x,1] == m or array[x,1] == n) and (array[x,2] == m or array[x,2] == n):
            idxList.append(x)
        if (array[x,0] == m or array[x,0] == n) and (array[x,3] == m or array[x,3] == n):
            idxList.append(x)
        if (array[x,1] == m or array[x,1] == n) and (array[x,3] == m or array[x,3] == n):
            idxList.append(x)
        if (array[x,2] == m or array[x,2] == n) and (array[x,3] == m or array[x,3] == n):
            idxList.append(x)
        if (array[x,0] == m or array[x,0] == n) and (array[x,4] == m or array[x,4] == n):
            idxList.append(x)
        if (array[x,1] == m or array[x,1] == n) and (array[x,4] == m or array[x,4] == n):
            idxList.append(x)
        if (array[x,2] == m or array[x,2] == n) and (array[x,4] == m or array[x,4] == n):
            idxList.append(x)
        if (array[x,3] == m or array[x,3] == n) and (array[x,4] == m or array[x,4] == n):
            idxList.append(x)
    return idxList
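Any help would be appreciated. Thanks!
I think it should be straightforward to iterate over the rows of the array and use the in operator to check for the desired values: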
def rowIdx(array, m, n):
    idxList = []
    m = int(m)
    n = int(n)
    # enumerate yields (row index, row) pairs
    for rx, aRow in enumerate(array):
        if m in aRow and n in aRow:
            idxList.append(rx)
    return idxList
enumerate is a standard built-in function and is well worth getting to know.
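To make the idea concrete, here is a small sketch run on the sample rows from the question. The names sample and matches are mine, and padding the short rows with NaN is an assumption about how the array is stored.

```python
import numpy as np

# Sample rows from the question; short rows are assumed to be padded with NaN.
sample = np.array([
    [2, 3, 6, np.nan, np.nan],
    [8, 6, 2, np.nan, np.nan],
    [9, 8, 6, 5, np.nan],
    [9, 5, 2, 1, np.nan],
    [2, 3, 4, 1, 6],
    [6, 8, 5, 3, 2],
])

matches = []
# enumerate yields (row index, row) pairs, so no manual counter is needed.
for rx, row in enumerate(sample):
    # For a 1-D NumPy row, `value in row` is True if any element equals value.
    if 2 in row and 6 in row:
        matches.append(rx)

print(matches)  # [0, 1, 4, 5]
```

NaN never compares equal to anything, so the padding cannot produce a false match.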
If you are comfortable with it, you can write the same thing as a list comprehension:
def rowIdx(array, m, n):
    m = int(m)
    n = int(n)
    idxList = [rx for rx, aRow in enumerate(array) if m in aRow and n in aRow]
    return idxList
This is very Pythonic and efficient:
def row_index(array, m, n):
    for index, row in enumerate(array):
        if m in row and n in row:
            yield index
If you want the result as a list of indices, call it like this:
list(row_index(array,m,n))
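Because row_index is a generator, it only does work as results are requested. The sketch below illustrates that on plain Python lists; the rows data and the use of next and itertools.islice are my additions, not part of the answer above.

```python
from itertools import islice

def row_index(rows, m, n):
    # Lazily yield the index of every row that contains both m and n.
    for index, row in enumerate(rows):
        if m in row and n in row:
            yield index

# Sample rows from the question, as ragged Python lists (no NaN padding).
rows = [[2, 3, 6], [8, 6, 2], [9, 8, 6, 5], [9, 5, 2, 1], [2, 3, 4, 1, 6], [6, 8, 5, 3, 2]]

# Stop at the first matching row; None is returned if nothing matches.
first_match = next(row_index(rows, 5, 9), None)

# Take only the first two matches without scanning the remaining rows.
first_two = list(islice(row_index(rows, 2, 6), 2))

print(first_match, first_two)  # 2 [0, 1]
```

This only pays off when you do not need every match. If you always need the full list, the generator does the same amount of work as the loop version.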
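You can enumerate the whole array (a numpy.array or a regular Python list). Instead of checking every element and position, check whether each value belongs to the row: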
def newRowIdx(array, m, n):
    idxList = []
    m = int(m)
    n = int(n)
    for idx, row in enumerate(array):
        if (m in row and n in row):
            idxList.append(idx)
    return idxList
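This approach takes 0.372 s, while yours takes 0.976 s (I measured in an online environment, where everything runs slowly), but the improvement is still clear.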
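None of the answers given above is very efficient, since they all rely on Python for loops, and I would not consider them Pythonic either. In this case you are better off using only numpy functions. Under the hood they run C code that is heavily optimized for exactly what you want to do.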
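Hopefully the code below is useful to you. It should be considerably faster than the loop-based solutions proposed so far (about ten times faster in the benchmark further down).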
import numpy as np
# Example array
array = np.array([[1,2,90,91,90],[90,2,3,91,5],[1,2,3,np.nan,np.nan]])
# Adapt to your values.
# Find rows per value where the element is there. Easy to extend to more than two values
msk_value_1 = (array == 90).any(axis=1)
msk_value_2 = (array == 91).any(axis=1)
both_true = msk_value_1 & msk_value_2
# find the indices
np.where(both_true)
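Note that np.where called with a single argument returns a tuple of arrays, so np.where(both_true)[0] is the actual array of row indices. As a rough sketch of the "more than two values" extension mentioned in the comment, assuming a hypothetical helper named rows_containing_all (not part of numpy), one could combine one mask per value:

```python
import numpy as np

def rows_containing_all(array, values):
    # Hypothetical helper: indices of rows that contain every value in `values`.
    mask = np.ones(len(array), dtype=bool)
    for value in values:
        # One boolean per row: does this row contain `value` anywhere?
        mask &= (array == value).any(axis=1)
    # flatnonzero returns the indices of the True entries as a 1-D array.
    return np.flatnonzero(mask)

example = np.array([[1, 2, 90, 91, 90],
                    [90, 2, 3, 91, 5],
                    [1, 2, 3, np.nan, np.nan]])

print(rows_containing_all(example, [90, 91]))     # [0 1]
print(rows_containing_all(example, [90, 91, 5]))  # [1]
```

The loop here runs once per searched value, not once per row, so the per-row work still happens inside numpy.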
Benchmark
Here are some benchmarks comparing this against most of the solutions proposed above:
# Create a random matrix for inputs
array = np.random.randint(0,1000, (3000, 4))
# solution here
def find_two_values(array, m, n):
    msk_value_1 = (array == int(m)).any(axis=1)
    msk_value_2 = (array == int(n)).any(axis=1)
    both_true = msk_value_1 & msk_value_2
    # find the indices
    return np.where(both_true)
# takes 900 microseconds on my computer
_ = find_two_values(array, 90, 91)
# other solution proposed
def rowIdx(array, m, n):
    idxList = []
    m = int(m)
    n = int(n)
    for rx, aRow in enumerate(array):
        if m in aRow and n in aRow:
            idxList.append(rx)
    return idxList
# takes 9.8 ms on my computer
_ = rowIdx(array, 90, 91)
That is roughly 1 ms versus 10 ms, so about ten times faster.
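If you want to reproduce the comparison on your own machine, a minimal timing sketch with the standard timeit module could look like the following. The seed, the number of runs, and the name row_idx_loop are arbitrary choices of mine, and the timings will differ between machines.

```python
import timeit
import numpy as np

# Reproducible random input with the same shape as in the benchmark above.
rng = np.random.default_rng(0)
data = rng.integers(0, 1000, size=(3000, 4))

def find_two_values(array, m, n):
    # Vectorized: one boolean mask per value, combined with a logical AND.
    both = (array == m).any(axis=1) & (array == n).any(axis=1)
    return np.flatnonzero(both)

def row_idx_loop(array, m, n):
    # Loop-based: membership test on each row in Python.
    return [rx for rx, row in enumerate(array) if m in row and n in row]

# Check that both versions agree before timing them.
assert find_two_values(data, 90, 91).tolist() == row_idx_loop(data, 90, 91)

runs = 20
t_vec = timeit.timeit(lambda: find_two_values(data, 90, 91), number=runs) / runs
t_loop = timeit.timeit(lambda: row_idx_loop(data, 90, 91), number=runs) / runs
print(f"vectorized: {t_vec * 1e3:.3f} ms per call, loop: {t_loop * 1e3:.3f} ms per call")
```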