Understanding NumPy Multi-dimensional Array Indexing

Question

Can someone explain the difference between these three indexing operations:

import numpy as np
y = np.arange(35).reshape(5,7)

# Operation 1
y[np.array([0,2,4]),1:3]
# Operation 2
y[np.array([0,2,4]), np.array([[1,2]])]
# Operation 3
y[np.array([0,2,4]), np.array([[1],[2]])]

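What I don't get is:

Why does Operation 2 fail while Operation 1 works fine?
Why does Operation 3 work, but return the transpose of what I expect (that is, the result of Operation 1)?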

According to the numpy reference:

If the index arrays do not have the same shape, there is an attempt to broadcast them to the same shape. If they cannot be broadcast to the same shape, an exception is raised.

OK, so this means I cannot do:

y[np.array([0,2,4]), np.array([1,2])]

But the numpy reference also says this about Operation 1:

In effect, the slice is converted to an index array np.array([[1,2]]) (shape (1,2)) that is broadcast with the index array to produce a resultant array of shape (3,2).

So why can't I do:

y[np.array([0,2,4]), np.array([[1,2]])]

I get the error:

IndexError: shape mismatch: indexing arrays could not be broadcast together with shapes (3,) (1,2)

Answer 1

In [1]: import numpy as np; y = np.arange(35).reshape(5,7)

Operation 1

In [2]: y[np.array([0,2,4]), 1:3]
Out[2]: 
array([[ 1,  2],
       [15, 16],
       [29, 30]])

Here we mix advanced indexing (with an array) and basic indexing (with a slice), and there is only one advanced index. According to the reference:

[a] single advanced index can for example replace a slice and the result array will be the same [...]

as the following code shows:

In [3]: y[::2, 1:3]
Out[3]: 
array([[ 1,  2],
       [15, 16],
       [29, 30]])

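The only difference between Out[2] and Out[3] is that the former is a copy of the data in y (advanced indexing always produces a copy), while the latter is a view sharing the same memory with y (basic indexing with slices only always produces a view).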
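The copy-versus-view distinction can be checked directly. The following is a minimal sketch, assuming `np.shares_memory` is available in the installed NumPy; the variable names `adv` and `bas` are introduced here for illustration only.

```python
import numpy as np

y = np.arange(35).reshape(5, 7)

adv = y[np.array([0, 2, 4]), 1:3]   # one advanced index plus a slice
bas = y[::2, 1:3]                   # slices only (basic indexing)

# Both selections hold the same values ...
same_values = np.array_equal(adv, bas)

# ... but only the basic-indexing result shares memory with y.
adv_shares = np.shares_memory(adv, y)
bas_shares = np.shares_memory(bas, y)

print(same_values, adv_shares, bas_shares)
```

In practice this means that writing into `bas` would modify `y`, while writing into `adv` would not.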

Hence, for Operation 1 we select the rows via np.array([0,2,4]) and the columns via 1:3.

Operation 2

In [4]: y[np.array([0,2,4]), np.array([[1,2]])]
---------------------------------------------------------------------------
IndexError                                Traceback (most recent call last)
<ipython-input-4-bf9ee1361144> in <module>()
----> 1 y[np.array([0,2,4]), np.array([[1,2]])]

IndexError: shape mismatch: indexing arrays could not be broadcast together with shapes (3,) (1,2)

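This fails, and to understand why, we first have to realize that the nature of the indexing in this example is fundamentally different from that in Operation 1. Now we have only advanced indexing (and more than one advanced index). This means the index arrays must have the same shape, or at least shapes that are compatible for broadcasting. Let's look at the shapes.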

In [5]: np.array([0,2,4]).shape
Out[5]: (3,)
In [6]: np.array([[1,2]]).shape
Out[6]: (1, 2)

This means that the broadcasting mechanism will try to combine these two arrays:

np.array([0,2,4])  (1d array):     3
np.array([[1,2]])  (2d array): 1 x 2
Result             (2d array): 1 x F

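The F at the end of the last line indicates that the shapes are not compatible: the trailing dimensions are 3 and 2, and neither of them is 1. This is the reason for the IndexError in Operation 2.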
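The shape check can be reproduced without indexing y at all. This is a small sketch, assuming `np.broadcast` raises `ValueError` for incompatible shapes (the indexing machinery reports the same problem as an `IndexError`); the names `rows`, `cols`, and `compatible` are illustrative.

```python
import numpy as np

rows = np.array([0, 2, 4])     # shape (3,)
cols = np.array([[1, 2]])      # shape (1, 2)

# Broadcasting aligns shapes from the right: 3 vs 2 is a mismatch,
# because the sizes differ and neither of them is 1.
try:
    np.broadcast(rows, cols)
    compatible = True
except ValueError:
    compatible = False

print(rows.shape, cols.shape, compatible)
```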

Operation 3

In [7]: y[np.array([0,2,4]), np.array([[1],[2]])]
Out[7]: 
array([[ 1, 15, 29],
       [ 2, 16, 30]])

Again, we have only advanced indexing. Now let's see whether the shapes are compatible:

In [8]: np.array([0,2,4]).shape
Out[8]: (3,)
In [9]: np.array([[1],[2]]).shape
Out[9]: (2, 1)

This means that broadcasting will work like this:

np.array([0,2,4])     (1d array):     3
np.array([[1],[2]])   (2d array): 2 x 1
Result                (2d array): 2 x 3

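So broadcasting works now! Since our index arrays are broadcast to a 2x3 array, this will also be the shape of the result. This also explains why the shape of the result differs from that of Operation 1.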
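To make the element-by-element meaning of this concrete, here is a rough sketch that spells out what indexing with two broadcast index arrays computes. It assumes `np.broadcast_arrays` to materialize the broadcast indices; the explicit loop and the names `rows_b`, `cols_b`, and `manual` are for illustration only and are not how NumPy implements it internally.

```python
import numpy as np

y = np.arange(35).reshape(5, 7)
rows = np.array([0, 2, 4])       # shape (3,)
cols = np.array([[1], [2]])      # shape (2, 1)

# Broadcast both index arrays to the common shape (2, 3).
rows_b, cols_b = np.broadcast_arrays(rows, cols)

# result[i, j] is y[rows_b[i, j], cols_b[i, j]]
manual = np.empty(rows_b.shape, dtype=y.dtype)
for i in range(rows_b.shape[0]):
    for j in range(rows_b.shape[1]):
        manual[i, j] = y[rows_b[i, j], cols_b[i, j]]

result = y[rows, cols]
print(np.array_equal(manual, result), result.shape)
```

Because the row indices vary along the last axis of the broadcast shape, the rows of y end up as columns of the result, which is why the output looks transposed.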

To get a result of shape 3x2, as in Operation 1, we can do:

In [10]: y[np.array([[0],[2],[4]]), np.array([1, 2])]
Out[10]: 
array([[ 1,  2],
       [15, 16],
       [29, 30]])

Now the broadcasting mechanism works like this:

np.array([[0],[2],[4]])  (2d array): 3 x 1
np.array([1, 2])         (1d array):     2
Result                   (2d array): 3 x 2

which gives a 3x2 array. Using np.array([[1, 2]]) instead of np.array([1, 2])

In [11]: y[np.array([[0],[2],[4]]), np.array([[1, 2]])]
Out[11]: 
array([[ 1,  2],
       [15, 16],
       [29, 30]])

would also work, because

np.array([[0],[2],[4]])  (2d array): 3 x 1
np.array([[1, 2]])       (2d array): 1 x 2
Result                   (2d array): 3 x 2
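As a side note, the pair of 3x1 and 1x2 index arrays does not have to be written by hand. The sketch below assumes `np.ix_`, a NumPy helper that builds such an open mesh from 1-D sequences; it is offered as a convenience and was not part of the original answer. The names `rows_ix` and `cols_ix` are illustrative.

```python
import numpy as np

y = np.arange(35).reshape(5, 7)

# np.ix_ reshapes the 1-D sequences so that they broadcast against each other.
rows_ix, cols_ix = np.ix_([0, 2, 4], [1, 2])
print(rows_ix.shape, cols_ix.shape)

result = y[rows_ix, cols_ix]
print(np.array_equal(result, y[::2, 1:3]))
```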
