Fastest way to build a NumPy array from another NumPy array containing selection indices

Question

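I have to create a large array from a known array, using a set of indices into the original array. The indices are stored as an ndarray, and to build the new array I am doing something like this: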

import numpy as np

dim_1       = 200
high_index  = 1000 
dim_2       = 300

masks_array = np.random.randint( low = 0, high = high_index - 1, size=(high_index, dim_1) )
the_array   = np.random.rand( high_index, dim_2 )

new_array   = np.array( [ the_array[ masks_array[ j, : ], :  ] for j in range(high_index) ]  )

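Is this the fastest way to generate new_array from masks_array? Is there a way to do it without a loop? Also, out of curiosity: since the "for" loop sits inside the np.array constructor, does it translate into an efficient loop in Python (similar to a list comprehension)?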

Answer 1

In [198]: dim_1       = 200
     ...: high_index  = 1000
     ...: dim_2       = 300
     ...: 
     ...: masks_array = np.random.randint( low = 0, high = high_index - 1, size=(high_index, dim_1) )
     ...: the_array   = np.random.rand( high_index, dim_2 )
     ...: 
     ...: new_array   = np.array( [ the_array[ masks_array[ j, : ], :  ] for j in range(high_index) ]  )
In [199]: new_array.shape
Out[199]: (1000, 200, 300)
In [200]: masks_array.shape
Out[200]: (1000, 200)
In [201]: the_array.shape
Out[201]: (1000, 300)

Let's try indexing the_array directly with masks_array:

In [205]: arr = the_array[masks_array,:]
In [206]: arr.shape
Out[206]: (1000, 200, 300)
In [207]: np.allclose(new_array, arr)
Out[207]: True
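
The equivalence can also be checked on a tiny example. The following is a minimal sketch, not part of the original session: the sizes and the seeded generator are arbitrary choices made here for illustration, and np.take is shown only as an alternative spelling of the same row selection.

```python
import numpy as np

# Small, arbitrary sizes chosen for this sketch; names mirror the question's.
rng = np.random.default_rng(0)
n_rows, dim_1, dim_2 = 6, 4, 3
the_array = rng.random((n_rows, dim_2))
masks_array = rng.integers(0, n_rows, size=(n_rows, dim_1))

# Loop version from the question: one indexing call per row of masks_array.
looped = np.array([the_array[masks_array[j, :], :] for j in range(n_rows)])

# Single indexing call: every integer in masks_array selects one row of the_array,
# so the result has shape masks_array.shape + (dim_2,).
indexed = the_array[masks_array]

# np.take along axis 0 performs the same selection.
taken = np.take(the_array, masks_array, axis=0)

print(indexed.shape)
print(np.array_equal(looped, indexed), np.array_equal(looped, taken))
```

Each entry indexed[i, k] is simply the row the_array[masks_array[i, k]], which is why no explicit loop over i is needed.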

Timing comparison:

In [213]: timeit new_array = np.array([the_array[masks_array[j,:],:] for j in range(high_index)])
658 ms ± 17.6 ms per loop (mean ± std. dev. of 7 runs, 1 loop each)
In [214]: timeit arr = the_array[masks_array,:]
292 ms ± 65.1 ms per loop (mean ± std. dev. of 7 runs, 1 loop each)

That is not much of a time saving, I suspect because of the sheer size of the result.
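
To put a number on that size, here is a rough back-of-the-envelope sketch. It assumes the default float64 dtype that np.random.rand produces; it only counts bytes and does not by itself show where the time goes.

```python
import numpy as np

high_index, dim_1, dim_2 = 1000, 200, 300

# np.random.rand returns float64 values, 8 bytes each.
itemsize = np.dtype(np.float64).itemsize

# The result has shape (high_index, dim_1, dim_2).
result_bytes = high_index * dim_1 * dim_2 * itemsize

# The source array has shape (high_index, dim_2).
source_bytes = high_index * dim_2 * itemsize

print(f"result: {result_bytes / 1e6:.0f} MB, source: {source_bytes / 1e6:.1f} MB")
```

Whichever indexing method is used, roughly 480 MB of output has to be allocated and written, compared with a 2.4 MB source array.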

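This is Python, and np.array is an ordinary function. So

[the_array[masks_array[j,:],:] for j in range(high_index)]

is evaluated first, as a regular list comprehension, and the resulting list is then passed to np.array.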

In [215]: timeit [the_array[masks_array[j,:],:] for j in range(high_index)]
369 ms ± 7.94 ms per loop (mean ± std. dev. of 7 runs, 1 loop each)
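
The two steps can be timed separately. This is a minimal sketch under assumptions made here: the sizes are reduced so it runs quickly, and the helper names build_list and stack_list are hypothetical, not from the original session. Absolute timings will differ by machine.

```python
import timeit
import numpy as np

# Reduced, arbitrary sizes for this sketch (the question uses 1000, 200, 300).
rng = np.random.default_rng(0)
n_rows, dim_1, dim_2 = 100, 50, 60
the_array = rng.random((n_rows, dim_2))
masks_array = rng.integers(0, n_rows, size=(n_rows, dim_1))

def build_list():
    # Step 1: the comprehension runs as ordinary Python code
    # and produces a list of 2-D arrays.
    return [the_array[masks_array[j, :], :] for j in range(n_rows)]

pieces = build_list()

def stack_list():
    # Step 2: np.array receives the finished list and copies
    # its elements into one new 3-D array.
    return np.array(pieces)

t_list = timeit.timeit(build_list, number=20)
t_stack = timeit.timeit(stack_list, number=20)
print(f"list comprehension: {t_list:.4f} s, np.array on the list: {t_stack:.4f} s")
```

In the timings above, the comprehension alone accounts for 369 ms of the 658 ms total, which suggests that the rest is spent in np.array copying the list into a single array.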
