NumPy indexing: broadcasting with Boolean arrays
I came across an indexing behaviour with Boolean arrays and broadcasting that I do not understand. We know it's possible to index a NumPy array in two dimensions using integer indices and broadcasting; this is specified in the docs:
```python
import numpy as np

a = np.array([[ 0,  1,  2,  3],
              [ 4,  5,  6,  7],
              [ 8,  9, 10, 11]])
b1 = np.array([False, True, True])
b2 = np.array([True, False, True, False])
c1 = np.where(b1)[0]  # i.e. [1, 2]
c2 = np.where(b2)[0]  # i.e. [0, 2]
a[c1[:, np.newaxis], c2]  # or a[c1[:, None], c2]
# array([[ 4,  6],
#        [ 8, 10]])
```
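To make the integer case concrete, here is a small sketch that expands the two index arrays to their common broadcast shape before indexing. It uses `np.broadcast_arrays`, which does not appear in the question, and rebuilds `a` with `np.arange(12).reshape(3, 4)` (same values):

```python
import numpy as np

a = np.arange(12).reshape(3, 4)  # same values as the matrix in the question
c1 = np.array([1, 2])            # row indices (np.where(b1)[0])
c2 = np.array([0, 2])            # column indices (np.where(b2)[0])

# c1[:, None] has shape (2, 1) and c2 has shape (2,), so they broadcast to (2, 2)
rows, cols = np.broadcast_arrays(c1[:, None], c2)

# Each output element is a[rows[i, j], cols[i, j]]
result = a[rows, cols]
print(result)
```

Indexing with the explicitly broadcast `rows` and `cols` gives the same 2x2 sub-matrix as `a[c1[:, None], c2]`; NumPy just performs the broadcasting implicitly.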
However, this does not work with Boolean arrays:
```python
a[b1[:, None], b2]
# IndexError: too many indices for array
```
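The alternative `numpy.ix_` works for both integer and Boolean arrays. This seems to be because `ix_` performs specific operations on Boolean arrays to ensure consistent handling.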
```python
assert np.array_equal(a[np.ix_(b1, b2)], a[np.ix_(c1, c2)])
a[np.ix_(b1, b2)]
# array([[ 4,  6],
#        [ 8, 10]])
```
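One way to see what `np.ix_` does with Boolean input is to inspect what it returns. Per the NumPy documentation, a Boolean sequence passed to `ix_` is treated like its `np.nonzero` indices, and each input is then reshaped to occupy its own axis. A minimal sketch:

```python
import numpy as np

b1 = np.array([False, True, True])
b2 = np.array([True, False, True, False])

# np.ix_ turns each Boolean mask into its nonzero indices and reshapes
# them into an open mesh, one axis per input array
rows, cols = np.ix_(b1, b2)
print(rows.shape, cols.shape)      # (2, 1) (1, 2)
print(rows.ravel(), cols.ravel())  # [1 2] [0 2]
```

The `(2, 1)` and `(1, 2)` shapes are exactly the singleton-axis trick from the integer example, which is why the result is an outer (sub-matrix) selection.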
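So my question is: why does broadcasting work with integers but not with Boolean arrays? Is this behaviour documented? Or am I misunderstanding a more fundamental issue?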
As @Divakar noted, Boolean advanced indices behave as if they were first fed through `np.nonzero` and then broadcast together; see the relevant documentation for extensive explanations. To quote the docs:
> In general if an index includes a Boolean array, the result will be identical to inserting `obj.nonzero()` into the same position and using the integer array indexing mechanism described above. `x[ind_1, boolean_array, ind_2]` is equivalent to `x[(ind_1,) + boolean_array.nonzero() + (ind_2,)]`.
>
> [...]
>
> Combining multiple Boolean indexing arrays or a Boolean with an integer indexing array can best be understood with the `obj.nonzero()` analogy. The function `ix_` also supports boolean arrays and will work without any surprises.
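The equivalence quoted from the docs can be checked directly. The sketch below uses a made-up 3-D array `x` and hypothetical index arrays `ind_1`, `boolean_array` and `ind_2` (none of these values come from the docs); the mask has two `True` entries so that all three index arrays have length 2 and broadcast together:

```python
import numpy as np

x = np.arange(24).reshape(2, 3, 4)
ind_1 = np.array([0, 1])                       # integer index for axis 0
boolean_array = np.array([False, True, True])  # mask for axis 1, nonzero -> [1, 2]
ind_2 = np.array([3, 0])                       # integer index for axis 2

mixed = x[ind_1, boolean_array, ind_2]

# The mask is replaced by its nonzero() tuple, then all integer arrays
# are broadcast together: this selects x[0, 1, 3] and x[1, 2, 0]
expanded = x[(ind_1,) + boolean_array.nonzero() + (ind_2,)]

assert np.array_equal(mixed, expanded)
print(mixed)  # [ 7 20]
```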
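In your case broadcasting is not necessarily the problem, since both arrays have only two nonzero elements. The problem is the number of dimensions in the result: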
```python
>>> len(b1[:,None].nonzero())
2
>>> len(b2.nonzero())
1
```
So the indexing expression `a[b1[:,None], b2]` is equivalent to `a[b1[:,None].nonzero() + b2.nonzero()]`, which puts a length-3 tuple inside `a`, corresponding to a 3d array index. Hence the error about "too many indices".
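The substitution can be carried out by hand to confirm that it fails in the same way. A minimal sketch, rebuilding `a` with `np.arange(12).reshape(3, 4)` (same values as in the question):

```python
import numpy as np

a = np.arange(12).reshape(3, 4)
b1 = np.array([False, True, True])
b2 = np.array([True, False, True, False])

# Replace each Boolean index by its nonzero() tuple, as the docs describe:
# b1[:, None] is 2-D, so it contributes two index arrays; b2 contributes one
equivalent_index = b1[:, None].nonzero() + b2.nonzero()
print(len(equivalent_index))  # 3

# Three index arrays for a 2-D array is one too many
raised = False
try:
    a[equivalent_index]
except IndexError as exc:
    raised = True
    print("IndexError:", exc)
```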
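The surprise mentioned in the docs is very close to your example: what if you hadn't injected that singleton dimension? Starting with a length-3 and a length-4 Boolean array, you end up with a length-2 advanced index, i.e. a 1d result of shape `(2,)`. This is never what you want, which leads us to another piece of trivia on the subject.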
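Here is a short sketch of that case, again rebuilding `a` as `np.arange(12).reshape(3, 4)`, contrasted with `np.ix_`:

```python
import numpy as np

a = np.arange(12).reshape(3, 4)
b1 = np.array([False, True, True])          # nonzero -> [1, 2]
b2 = np.array([True, False, True, False])   # nonzero -> [0, 2]

# Without the singleton axis, the two nonzero() results are paired
# element-wise rather than combined as an outer product
paired = a[b1, b2]         # picks a[1, 0] and a[2, 2]
outer = a[np.ix_(b1, b2)]  # picks the full 2x2 sub-matrix

print(paired, paired.shape)  # [ 4 10] (2,)
print(outer.shape)           # (2, 2)
```

This only runs at all because both masks happen to have two `True` entries. With different counts, the nonzero arrays would not broadcast and NumPy would raise an `IndexError` instead.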
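There has been a lot of discussion about plans to revamp advanced indexing; see the work-in-progress draft NEP 21. The gist of the issue is that fancy indexing in NumPy, while clearly documented, has some very quirky features that aren't really useful for anything but can bite you when you make a mistake, by producing surprising results instead of errors.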
A relevant quote from the NEP:
> Mixed cases involving multiple array indices are also surprising, and only less problematic because the current behavior is so useless that it is rarely encountered in practice. When a boolean array index is mixed with another boolean or integer array, boolean array is converted to integer array indices (equivalent to `np.nonzero()`) and then broadcast. For example, indexing a 2D array of size `(2, 2)` like `x[[True, False], [True, False]]` produces a 1D vector with shape `(1,)`, not a 2D sub-matrix with shape `(1, 1)`.
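The NEP's example is easy to reproduce. This sketch uses a made-up 2x2 array and passes the masks as NumPy Boolean arrays rather than the plain Python lists in the quote, to sidestep any ambiguity in how a list of bools is interpreted:

```python
import numpy as np

x = np.arange(4).reshape(2, 2)
mask = np.array([True, False])

# Legacy behaviour: both masks become [0] via nonzero() and are broadcast together
legacy = x[mask, mask]
print(legacy, legacy.shape)  # [0] (1,)

# Outer indexing via np.ix_ keeps one axis per mask
outer = x[np.ix_(mask, mask)]
print(outer, outer.shape)    # [[0]] (1, 1)
```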
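Now, I'd like to stress that the NEP is very much a work in progress, but one of the suggestions in its current state is to forbid Boolean arrays in advanced indexing cases like the above, and to allow them only in "outer indexing" scenarios, i.e. exactly what `np.ix_` would help you do with your Boolean arrays: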
> Boolean indexing is conceptionally outer indexing. Broadcasting together with other advanced indices in the manner of legacy indexing [i.e. the current behaviour] is generally not helpful or well defined. A user who wishes the "nonzero" plus broadcast behaviour can thus be expected to do this manually.
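Doing it "manually" could look like the following sketch: convert the masks to integer indices first (here with `np.flatnonzero`, one of several ways to do it), then choose element-wise pairing or outer selection explicitly. `a` is rebuilt as `np.arange(12).reshape(3, 4)`:

```python
import numpy as np

a = np.arange(12).reshape(3, 4)
b1 = np.array([False, True, True])
b2 = np.array([True, False, True, False])

# Step 1: turn the Boolean masks into integer indices
rows = np.flatnonzero(b1)  # [1, 2]
cols = np.flatnonzero(b2)  # [0, 2]

# Step 2a: element-wise pairing (what mixed Boolean indexing does today)
paired = a[rows, cols]

# Step 2b: outer selection via an explicit singleton axis
outer = a[rows[:, None], cols]

# The outer selection matches np.ix_ applied to the original masks
assert np.array_equal(outer, a[np.ix_(b1, b2)])
```

Making the choice explicit means a mistake shows up in the code you wrote rather than as a silently reshaped result.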
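My point is that the behaviour of Boolean advanced indices and their deprecation status (or lack thereof) may change in the not-so-distant future.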