NumPy indexing: broadcasting with Boolean arrays

I came across an indexing behaviour involving Boolean arrays and broadcasting that I do not understand. We know it's possible to index a NumPy array in 2 dimensions using integer indices and broadcasting. This is specified in the docs:

import numpy as np

a = np.array([[ 0,  1,  2,  3],
              [ 4,  5,  6,  7],
              [ 8,  9, 10, 11]])

b1 = np.array([False, True, True])
b2 = np.array([True, False, True, False])

c1 = np.where(b1)[0]  # i.e. [1, 2]
c2 = np.where(b2)[0]  # i.e. [0, 2]

a[c1[:, np.newaxis], c2]  # or a[c1[:, None], c2]

array([[ 4,  6],
       [ 8, 10]])

However, this does not work with Boolean arrays.

a[b1[:, None], b2]

IndexError: too many indices for array

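The alternative numpy.ix_ works for both integer and Boolean arrays. This seems to be because ix_ performs a specific manipulation on Boolean arrays to ensure consistent treatment.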

assert np.array_equal(a[np.ix_(b1, b2)], a[np.ix_(c1, c2)])

array([[ 4,  6],
       [ 8, 10]])

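So my question is: why does broadcasting work with integers, but not with Boolean arrays? Is this behaviour documented? Or am I misunderstanding a more fundamental issue?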

As @Divakar noted, Boolean advanced indices behave as if they were first fed through np.nonzero and then broadcast together; see the relevant documentation for extensive explanations. To quote the docs,

In general if an index includes a Boolean array, the result will be identical to inserting obj.nonzero() into the same position and using the integer array indexing mechanism described above. x[ind_1, boolean_array, ind_2] is equivalent to x[(ind_1,) + boolean_array.nonzero() + (ind_2,)].
[...]
Combining multiple Boolean indexing arrays or a Boolean with an integer indexing array can best be understood with the obj.nonzero() analogy. The function ix_ also supports boolean arrays and will work without any surprises.
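To make the last sentence of that quote concrete, here is a minimal sketch of what np.ix_ hands back for the two Boolean arrays from the question. The names rows and cols are only illustrative.

```python
import numpy as np

b1 = np.array([False, True, True])
b2 = np.array([True, False, True, False])

# np.ix_ turns each Boolean array into integer positions and reshapes
# them so that they broadcast against each other as an outer product.
rows, cols = np.ix_(b1, b2)

print(rows.shape, cols.shape)      # (2, 1) (1, 2)
print(rows.ravel(), cols.ravel())  # positions of the True entries
```

So by the time the index reaches a[...], it is already the integer-plus-broadcasting form that works in the question.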

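In your case broadcasting would not necessarily be a problem, since both arrays have only two nonzero elements. The problem is the number of dimensions in the result: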

>>> len(b1[:,None].nonzero())
2
>>> len(b2.nonzero())
1

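Consequently the indexing expression a[b1[:,None], b2] would be equivalent to a[b1[:,None].nonzero() + b2.nonzero()], which puts a length-3 tuple inside a, corresponding to a 3d array index. Hence the error you see regarding "too many indices".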
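The equivalence can be checked by hand. The following is a rough sketch using the arrays from the question; idx, raised, rows, cols and sub are names made up for illustration.

```python
import numpy as np

a = np.arange(12).reshape(3, 4)
b1 = np.array([False, True, True])
b2 = np.array([True, False, True, False])

# b1[:, None] is 2-D, so nonzero() yields two index arrays; b2 adds a third.
idx = b1[:, None].nonzero() + b2.nonzero()
print(len(idx))  # 3 index arrays for a 2-D array

try:
    a[idx]
    raised = False
except IndexError:
    raised = True  # same failure as a[b1[:, None], b2]

# Doing "nonzero, then add the singleton axis" manually works as intended.
rows = b1.nonzero()[0][:, None]
cols = b2.nonzero()[0]
sub = a[rows, cols]
print(sub)
```

The order matters: the singleton dimension has to be added after the conversion to integer positions, not before.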

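The surprises mentioned in the docs are very close to your example: what if you hadn't injected that singleton dimension? Starting from a length-3 and a length-4 Boolean array you would have ended up with a length-2 advanced index, i.e. a 1d array of shape (2,). This is never what you want, which leads us to another piece of trivia on the subject.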
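For illustration, this is what that looks like with the arrays from the question (paired is just a name chosen here):

```python
import numpy as np

a = np.arange(12).reshape(3, 4)
b1 = np.array([False, True, True])
b2 = np.array([True, False, True, False])

# Without the singleton axis the two masks become [1, 2] and [0, 2],
# which are paired elementwise: a[1, 0] and a[2, 2].
paired = a[b1, b2]
print(paired.shape)  # (2,)
print(paired)        # [ 4 10]
```

No error is raised, which is exactly why this case is easy to miss.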

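There has been a lot of discussion about plans to revamp advanced indexing; see the work-in-progress draft NEP 21. The gist of the issue is that fancy indexing in numpy, while clearly documented, has some very quirky features which aren't practically useful for anything, but which can bite you if you make a mistake, by producing surprising results rather than errors.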

A relevant quote from the NEP:

Mixed cases involving multiple array indices are also surprising, and only less problematic because the current behavior is so useless that it is rarely encountered in practice. When a boolean array index is mixed with another boolean or integer array, boolean array is converted to integer array indices (equivalent to np.nonzero()) and then broadcast. For example, indexing a 2D array of size (2, 2) like x[[True, False], [True, False]] produces a 1D vector with shape (1,), not a 2D sub-matrix with shape (1, 1).
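The example from that quote can be reproduced in a few lines. This is a small sketch under current (legacy) indexing behaviour; x, m, legacy and outer are illustrative names.

```python
import numpy as np

x = np.arange(4).reshape(2, 2)
m = np.array([True, False])

# Legacy behaviour: each mask becomes [0], and the two are paired up.
legacy = x[m, m]
print(legacy.shape)  # (1,)

# Outer indexing via np.ix_ keeps both dimensions.
outer = x[np.ix_(m, m)]
print(outer.shape)   # (1, 1)
```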

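Now, I emphasize that the NEP is very much a work in progress, but one of the suggestions in its current state is to forbid Boolean arrays in cases of advanced indexing such as the above, and only allow them in "outer indexing" scenarios, i.e. exactly what np.ix_ helps you do with your Boolean arrays: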

Boolean indexing is conceptionally outer indexing. Broadcasting together with other advanced indices in the manner of legacy indexing [i.e. the current behaviour] is generally not helpful or well defined. A user who wishes the "nonzero" plus broadcast behaviour can thus be expected to do this manually.

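My point is that the behaviour of Boolean advanced indices and their deprecation status (or lack thereof) may change in the not-so-distant future.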