NumPy indexing: broadcasting with Boolean arrays

Question

Related to another question, I came across an indexing behaviour via Boolean arrays and broadcasting that I do not understand. We know it's possible to index a NumPy array in 2 dimensions using integer indices and broadcasting. This is specified in the docs:

import numpy as np

a = np.array([[ 0,  1,  2,  3],
              [ 4,  5,  6,  7],
              [ 8,  9, 10, 11]])

b1 = np.array([False, True, True])
b2 = np.array([True, False, True, False])

c1 = np.where(b1)[0]  # i.e. [1, 2]
c2 = np.where(b2)[0]  # i.e. [0, 2]

a[c1[:, np.newaxis], c2]  # or a[c1[:, None], c2]

array([[ 4,  6],
       [ 8, 10]])

But this does not work for Boolean arrays.

a[b1[:, None], b2]

IndexError: too many indices for array

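The alternative, numpy.ix_, works for both integer and Boolean arrays. This seems to be because ix_ performs specific manipulation on Boolean arrays to ensure consistent treatment.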

assert np.array_equal(a[np.ix_(b1, b2)], a[np.ix_(c1, c2)])

array([[ 4,  6],
       [ 8, 10]])
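To see why the two calls agree, it can help to inspect what np.ix_ returns for Boolean inputs. The following is a minimal sketch using the b1 and b2 arrays from above; the names rows and cols are only illustrative.

```python
import numpy as np

b1 = np.array([False, True, True])
b2 = np.array([True, False, True, False])

# np.ix_ turns each Boolean array into integer positions and reshapes
# them so that they broadcast against each other as an open mesh.
rows, cols = np.ix_(b1, b2)

print(rows.shape, cols.shape)  # (2, 1) (1, 2)
print(rows.ravel())            # positions of True in b1
print(cols.ravel())            # positions of True in b2
```

The Boolean arrays never reach the indexing machinery as Booleans; only the reshaped integer arrays do.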

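So my question is: why does broadcasting work with integers, but not with Boolean arrays? Is this behaviour documented? Or am I misunderstanding a more fundamental issue?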

Answer 1

As @Divakar noted, Boolean advanced indices behave as if they were first fed through np.nonzero and then broadcast together; see the relevant documentation for extensive explanations. To quote the docs:

In general if an index includes a Boolean array, the result will be identical to inserting obj.nonzero() into the same position and using the integer array indexing mechanism described above. x[ind_1, boolean_array, ind_2] is equivalent to x[(ind_1,) + boolean_array.nonzero() + (ind_2,)].
[...]
Combining multiple Boolean indexing arrays or a Boolean with an integer indexing array can best be understood with the obj.nonzero() analogy. The function ix_ also supports boolean arrays and will work without any surprises.
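The equivalence stated in the quote can be checked on a small case. This is only a sketch: the array x, the mask, and the integer indices 0 and 1 are made up for illustration.

```python
import numpy as np

x = np.arange(24).reshape(2, 3, 4)
mask = np.array([True, False, True])  # Boolean index for the middle axis

# Boolean array mixed with integer indices ...
direct = x[0, mask, 1]

# ... versus inserting mask.nonzero() at the same position.
via_nonzero = x[(0,) + mask.nonzero() + (1,)]

print(direct)
print(np.array_equal(direct, via_nonzero))
```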

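In your case broadcasting would not necessarily be the problem, since both arrays have exactly two nonzero elements. The problem is the number of dimensions in the result: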

>>> len(b1[:,None].nonzero())
2
>>> len(b2.nonzero())
1

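Consequently, the indexing expression a[b1[:,None], b2] would be equivalent to a[b1[:,None].nonzero() + b2.nonzero()], which puts a length-3 tuple inside a, corresponding to a 3d array index. Hence the error you see about "too many indices".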
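The nonzero analogy can be replayed by hand. The sketch below builds the index tuple explicitly and then shows one manual workaround, which is to convert to integers first and add the new axis afterwards; the variable names are illustrative.

```python
import numpy as np

a = np.arange(12).reshape(3, 4)
b1 = np.array([False, True, True])
b2 = np.array([True, False, True, False])

# b1[:, None] is 2-D, so nonzero() yields two index arrays (rows, columns);
# b2 contributes a third one.
idx = b1[:, None].nonzero() + b2.nonzero()
print(len(idx))  # three index arrays for a two-dimensional array

try:
    a[idx]
    raised = False
except IndexError:
    raised = True
print(raised)

# Workaround: take the integer positions first, then add the axis.
r = b1.nonzero()[0]
c = b2.nonzero()[0]
manual = a[r[:, None], c]
print(manual)
```

This is what the question's c1 and c2 version already does, which is why that one works.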

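The surprises mentioned in the docs are very close to your example: what if you hadn't injected that singleton dimension? Starting from a length-3 and a length-4 Boolean array, you would have ended up with a length-2 advanced index, i.e. a 1d array of shape (2,). This is never what you want, which leads us to another piece of trivia on the subject.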
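A short sketch of that surprise, using the arrays from the question. The mask b3 is a hypothetical addition, included only to show what happens when the numbers of True entries differ.

```python
import numpy as np

a = np.arange(12).reshape(3, 4)
b1 = np.array([False, True, True])
b2 = np.array([True, False, True, False])

# Without the singleton axis the nonzero positions are paired up
# element by element: this picks a[1, 0] and a[2, 2].
paired = a[b1, b2]
print(paired, paired.shape)

# Hypothetical mask with three True entries: [1, 2] and [0, 1, 2]
# cannot be broadcast together, so the indexing fails.
b3 = np.array([True, True, True, False])
try:
    a[b1, b3]
    mismatch = False
except IndexError:
    mismatch = True
print(mismatch)
```

So the silent 1d result only appears when the masks happen to have matching counts of True values.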

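There has been a lot of discussion about plans to revamp advanced indexing; see the work-in-progress draft NEP 21. The gist of the issue is that fancy indexing in numpy, while clearly documented, has some very quirky features that are not practically useful for anything, but that can bite you if you make a mistake, by producing surprising results rather than errors.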

A relevant quote from the NEP:

Mixed cases involving multiple array indices are also surprising, and only less problematic because the current behavior is so useless that it is rarely encountered in practice. When a boolean array index is mixed with another boolean or integer array, boolean array is converted to integer array indices (equivalent to np.nonzero()) and then broadcast. For example, indexing a 2D array of size (2, 2) like x[[True, False], [True, False]] produces a 1D vector with shape (1,), not a 2D sub-matrix with shape (1, 1).
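The (2, 2) example from the quote can be reproduced as follows. This sketch uses an explicit Boolean array m instead of Python lists, and adds the np.ix_ form for comparison.

```python
import numpy as np

x = np.arange(4).reshape(2, 2)
m = np.array([True, False])

legacy = x[m, m]         # nonzero-then-broadcast: 1-D result
outer = x[np.ix_(m, m)]  # outer indexing: 2-D sub-matrix

print(legacy.shape)  # (1,)
print(outer.shape)   # (1, 1)
```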

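Now, I emphasize that the NEP is very much a work in progress, but one of the suggestions in its current state is to forbid Boolean arrays in advanced indexing cases such as the above, and to allow them only in "outer indexing" scenarios, i.e. exactly what np.ix_ would help you with for your Boolean arrays: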

Boolean indexing is conceptionally outer indexing. Broadcasting together with other advanced indices in the manner of legacy indexing [i.e. the current behaviour] is generally not helpful or well defined. A user who wishes the "nonzero" plus broadcast behaviour can thus be expected to do this manually.

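My point is that the behaviour of Boolean advanced indexing, and its deprecation status (or lack thereof), may change in the not-so-distant future.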
